Publication Date
Spring 2016
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Thomas Austin
Second Advisor
Chris Pollett
Third Advisor
Jon Pearce
Keywords
Javascript Malware Detection N-grams
Abstract
The Internet has an immense importance in our day to day life, but at the same time, it has become the medium of infecting computers, attacking users, and distributing malicious code. As JavaScript is the principal language of client side pro- gramming, it is frequently used in conducting such attacks. Various approaches have been made to overcome the JavaScript security issues. Some advanced approaches utilize machine learning technology in combination with de-obfuscation and emula- tion. Many methods of analysis incorporate static analysis and dynamic analysis. Our solution is entirely based on static analysis, which avoids unnecessary runtime overhead.
The central objective of this project is to integrate the work done by Eunjin (EJ) Jung et al. on Towards A Robust Detection of Malicious JavaScript (TARDIS) into the web browser via a Firefox add-on and to demonstrate the usability of our add- on in defending against such attacks. TARDIS uses statistical language modeling for an automatic feature extraction and combines it with structural features from an abstract syntax tree [1]. We have developed a Firefox add-on that is capable of extracting JavaScript code from the page visited and classifying the JavaScript code as either malicious or benign. We leverage the bene t of using a pre-compiled training model in JavaScript Object Notation (JSON). JSON is lightweight and does not consume much memory on a user’s machine. Moreover, it stores the data as key-value pairs and easily maps to the data structures used in modern programming languages. The principle advantage of using a pre-compiled training model is better performance. Our model can achieve 98% accuracy on our sample dataset.
Recommended Citation
Shah, Anumeha, "Malicious JavaScript Detection using Statistical Language Model" (2016). Master's Projects. 476.
DOI: https://doi.org/10.31979/etd.nujz-hf4a
https://scholarworks.sjsu.edu/etd_projects/476