Publication Date

Spring 2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Ching-seh (Mike) Wu

Second Advisor

Nada Attar

Third Advisor

Ajay Wisawe


Keywords

Machine learning, predictive modelling, predictive analysis, sports analytics


Abstract

Data is growing in both volume and variety every day. Historical data can therefore be leveraged to predict the likelihood of future events. This process of using statistical or other forms of data to predict future outcomes is commonly termed predictive modelling. Predictive modelling is growing in importance for several reasons, chiefly that it enables businesses and individual users to gain accurate insights and to choose suitable actions for a profitable outcome.

Machine learning techniques are generally used to build these predictive models. Examples range from time-series-based regression models, which can be used to predict the volume of airline traffic, to linear regression-based models, which can be used to predict fuel efficiency. Many domains can gain a competitive advantage by using predictive modelling with machine learning. These include, but are not limited to, banking and financial services, retail, insurance, fraud detection, stock market analysis, and sentiment analysis.

In this research project, predictive analysis is applied to the sports domain, an emerging area where machine learning can help make better predictions. Numerous sports events take place around the globe every day, and the data gathered from them can be used both to predict and to improve future events. In this project, machine learning and statistics are used to perform quantitative and predictive analysis of a soccer dataset. The resulting models are also compared to assess how effective each one is. In addition, a few big data tools and techniques are used to optimize these predictive models and increase their accuracy to over 90%.