Publication Date

Spring 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Melody Moh

Second Advisor

Teng Moh

Third Advisor

Ching-Seh Wu

Keywords

City Traffic, CNNs, Spark

Abstract

Obtaining information from historical data to forecast traffic flow in a city can be difficult because precise forecasting demands a large amount of data and accurate pattern analysis. It is nevertheless worthwhile because it provides users with detailed and accurate point-to-point predictions. In this project, I use a CNN (Convolutional Neural Network) to train a model on images captured by webcams in New York City. I then deploy the training process on a distributed Spark cluster to accelerate training. To combine the CNN and Apache Spark efficiently, the prediction model is redesigned and optimized, and the distributed cluster is tuned. Using 5-fold cross-validation, multiple test results are presented to support the analysis of the model optimization and the distributed cluster tuning. The aim of this project is to find the most accurate traffic flow prediction model at an acceptable time cost.
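
The 5-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's code: the helper name `k_fold_indices`, the sample count, and the seed are hypothetical, and the sketch only shows how a dataset might be split so that each sample is used for validation exactly once.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the project.
    """
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# Example: 23 samples (an arbitrary number) split into 5 folds.
folds = list(k_fold_indices(23, k=5))
```

In such a setup, a model would be trained on each `train_idx` and scored on the matching `val_idx`, and the five scores would then be averaged to compare model or cluster configurations.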