
Interpreting Office Document Macros with Bi-Directional Transformer Models

  • Mahesh Kalappattil
  • Varghese Mathew Vaidyan
  • Gurcan Comert
  • Yong Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Microsoft Office document malware is prevalent today, even though some of the macros were developed 30 years ago. This paper provides a novel method to classify malicious office document macros with interpretability. Our approach combines function semantics and keyword contexts to leverage the self-attention mechanism of transformers. This research focuses on Bidirectional Encoder Representations from Transformers (BERT) model variants to evaluate and compare the accuracy and interpretability of transformer models in detecting malicious office document macros. The models are evaluated on a dataset collected using Common Crawl. The results show that our method using BERT model variants provides more than 99% accuracy in detecting malicious office document macros. Our research also shows that the BERT models can accurately attribute the classification outcome to the input tokens. Finally, we propose a novel solution that scans email attachments for malicious office document macros and provides attribution reports, which not only label the email as malicious but also identify which tokens in the document contribute positively to the classification. This solution is integrated with Gmail as a Workspace add-on. We hope that such solutions improve the trust of cyber security personnel in the model and in threat detection mechanisms, and help fine-tune the model to eliminate false positives and biases.
Original language: English
Title of host publication: 2025 Cyber Awareness and Research Symposium, CARS 2025
State: Published - 2025
