Publication Date

Fall 2021

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Katerina Potika

Second Advisor

Mark Stamp

Third Advisor

Thomas Austin

Keywords

code2vec, code analysis, programming language labeling

Abstract

Software development is an expensive and difficult process. Mistakes are easily made, and without an extensive review process, they can reach production code, where they may have unintended and disastrous consequences.

This is why various automated code review services have arisen in recent years, from AWS’s CodeGuru and Microsoft’s Code Analysis to more integrated code assistants such as IntelliCode and autocompletion tools, all designed to assist developers with their work and catch overlooked bugs.

Thanks to recent advances in machine learning, these services have grown tremendously in sophistication, to the point where they can catch bugs that often go unnoticed even in traditional code reviews.

This project investigates the use of code2vec [1], a probabilistic machine learning model of source code, for correctly labeling methods from different programming language families. We extend this model to work with more languages, train the resulting models, and compare the performance of static and dynamic languages.

As a by-product, we create new datasets from the top-starred open-source GitHub projects in various languages. Different approaches for static and dynamic languages are applied, along with improvement techniques such as transfer learning. Finally, different parsers are used to examine their effect on the model’s performance.
