Publication Date

Spring 2016

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Thomas Austin

Second Advisor

Chris Pollett

Third Advisor

Jon Pearce


Keywords

JavaScript, Malware Detection, N-grams

Abstract

The Internet is immensely important in our day-to-day lives, but it has also become a medium for infecting computers, attacking users, and distributing malicious code. Because JavaScript is the principal language of client-side programming, it is frequently used to conduct such attacks. Various approaches have been proposed to address JavaScript security issues. Some advanced approaches use machine learning in combination with de-obfuscation and emulation. Many methods combine static analysis with dynamic analysis. Our solution is based entirely on static analysis, which avoids the runtime overhead of executing the code under inspection.

The central objective of this project is to integrate the work done by Eunjin (EJ) Jung et al. on Towards A Robust Detection of Malicious JavaScript (TARDIS) into the web browser via a Firefox add-on and to demonstrate the usability of our add-on in defending against such attacks. TARDIS uses statistical language modeling for automatic feature extraction and combines it with structural features from an abstract syntax tree [1]. We have developed a Firefox add-on that extracts JavaScript code from the visited page and classifies that code as either malicious or benign. We leverage the benefit of a pre-compiled training model stored in JavaScript Object Notation (JSON). JSON is lightweight and does not consume much memory on a user's machine. Moreover, it stores data as key-value pairs and maps easily to the data structures used in modern programming languages. The principal advantage of a pre-compiled training model is better performance, since no training happens in the browser. Our model achieves 98% accuracy on our sample dataset.
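
The idea of classifying a script against a pre-compiled JSON model can be illustrated with a minimal sketch. It is not the TARDIS feature set or the add-on's actual code: the model layout (`n`, `unseen`, and the per-class tables), the character-trigram features, the log-probability values, and the function names `extractNgrams`, `score`, and `classify` are all assumptions made for illustration.

```javascript
// Hypothetical pre-compiled model. In an add-on this string would be
// shipped as a JSON file; the trigrams and log-probabilities are made up.
const modelJson = `{
  "n": 3,
  "unseen": -10.0,
  "malicious": { "eva": -1.0, "val": -1.2, "une": -1.5 },
  "benign":    { "var": -1.0, "fun": -1.3, "ret": -1.4 }
}`;

// Slide a window of length n over the text and collect every n-gram.
function extractNgrams(text, n) {
  const grams = [];
  for (let i = 0; i + n <= text.length; i++) {
    grams.push(text.slice(i, i + n));
  }
  return grams;
}

// Sum the log-probability of each n-gram under one class table,
// using the "unseen" penalty for n-grams missing from the table.
function score(grams, table, unseen) {
  const has = (key) => Object.prototype.hasOwnProperty.call(table, key);
  return grams.reduce((sum, g) => sum + (has(g) ? table[g] : unseen), 0);
}

// Pick the class with the higher total score; ties fall back to "benign".
function classify(source, model) {
  const grams = extractNgrams(source, model.n);
  const malicious = score(grams, model.malicious, model.unseen);
  const benign = score(grams, model.benign, model.unseen);
  return malicious > benign ? "malicious" : "benign";
}

// Parsing happens once; every later classification is a table lookup.
const model = JSON.parse(modelJson);
```

Under these assumptions, `classify(scriptText, model)` would be called once per script extracted from the page. Because the model is parsed into plain key-value objects, scoring needs only property lookups, which is the performance benefit the abstract attributes to a pre-compiled model.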