Publication Date

Winter 2018

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


Crime has long been prevalent in our society, and it remains so today. The San Francisco Police Department registers numerous crime cases daily and has released this data to the public as part of the open data initiative. In this paper, Big Data analysis is applied to this dataset, and a tool that predicts crime in San Francisco is provided. The project focuses on an in-depth analysis of the major types of crime that occurred in the city, the trends in those crimes over the years, and how various attributes, such as seasons, contribute to specific crimes. Furthermore, a model is proposed that builds on the results of this predictive analytics by identifying the attributes that directly affect the prediction. More specifically, the model predicts the type of crime that will occur in each district of the city. After preprocessing the dataset, the problem reduces to a multi-class classification problem. Various classification techniques, such as K-Nearest Neighbor, Multi-class Logistic Regression, Decision Tree, Random Forest, and Naïve Bayes, are applied. Lastly, our results are experimentally evaluated and compared against previous work. The proposed model finds applications in the allocation of law enforcement resources in a Smart City.
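To illustrate how predicting the crime type for a district becomes a multi-class classification problem, here is a minimal sketch of one of the listed techniques, Naïve Bayes. It uses plain Python with Laplace smoothing and two categorical features: the district and a season derived from the month of the incident. This is not the project's actual pipeline. The records, the feature choice, the month-to-season mapping, and names such as `CategoricalNaiveBayes` and `month_to_season` are hypothetical and are not taken from the SFPD dataset.

```python
import math
from collections import Counter, defaultdict

def month_to_season(month: int) -> str:
    """Map a calendar month (1-12) to a season (assumed meteorological scheme)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"

class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        n_features = len(X[0])
        # feature_counts[j][c][v]: how often feature j took value v in class c
        self.feature_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.feature_values = [set() for _ in range(n_features)]
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.feature_counts[j][c][v] += 1
                self.feature_values[j].add(v)
        return self

    def log_scores(self, x):
        """Unnormalized log posterior, log P(c) + sum_j log P(x_j | c), per class."""
        scores = {}
        for c, count_c in self.class_counts.items():
            score = math.log(count_c / self.n)
            for j, v in enumerate(x):
                k = len(self.feature_values[j])
                numerator = self.feature_counts[j][c][v] + self.alpha
                score += math.log(numerator / (count_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)

# Hypothetical toy records: (district, month, crime category).
RAW = [
    ("MISSION", 7, "LARCENY/THEFT"),
    ("MISSION", 8, "LARCENY/THEFT"),
    ("MISSION", 1, "ASSAULT"),
    ("TENDERLOIN", 12, "DRUG/NARCOTIC"),
    ("TENDERLOIN", 2, "DRUG/NARCOTIC"),
    ("TENDERLOIN", 6, "ASSAULT"),
]

# Preprocessing step: derive the season attribute from the month.
X = [(district, month_to_season(month)) for district, month, _ in RAW]
y = [category for _, _, category in RAW]
model = CategoricalNaiveBayes(alpha=1.0).fit(X, y)

print(model.predict(("MISSION", "summer")))     # LARCENY/THEFT
print(model.predict(("TENDERLOIN", "winter")))  # DRUG/NARCOTIC
```

Each crime category is one class, so the classifier picks the category with the highest posterior for a given district and season. The other listed techniques (K-Nearest Neighbor, Multi-class Logistic Regression, Decision Tree, Random Forest) fit the same input and label framing once the categorical features are encoded numerically.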