Publication Date

Spring 5-20-2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Chris Pollett

Second Advisor

Thomas Austin

Third Advisor

Mike Wu


Subprime mortgage, credit derivatives, linear regression, hidden markov model, long short-term memory


The 2008 housing crisis was caused by faulty banking policies and the use of credit derivatives of mortgages for investment purposes. In this project, we look into datasets that are the markers to a typical housing crisis. Using those data sets we build three machine learning techniques which are, Linear regression, Hidden Markov Model, and Long Short-Term Memory. After building the model we did a comparative study to show the prediction done by each model. The linear regression model did not predict a housing crisis, instead, it showed that house prices would be rising steadily and the R-squared score of the model is 0.76. The Hidden Markov Model predicted a fall in the house prices and the R-squared score for this model is 0.706. Lastly, the Long Short-Term Memory showed that the house price would fall briefly but would stabilize after that. Also, fall is not as sharp as what was predicted by the HMM model. The R- squared scored for this model is 0.9, which is the highest among all other models. Although the R-squared score doesn’t say how accurate a model it definitely says how closely a model fits the data. From our model R-square score the model that best fits the data was LSTM. As the dataset used in all the models are the same therefore it is safe to say the prediction made by LSTM is better than the other ones.