This writing project addresses the use of machine learning on very large data sets on cloud servers. The project consists of two phases. The first phase is to develop a machine learning system that learns from the data provided by IBM for the “IBM Watson Great minds Challenge SJSU Pilot” competition and produces the best possible results on the evaluation data set, also provided by the IBM Watson team. This serves as the basis for the second phase, whose objective is to move the machine learning system onto a cloud server so that future students can use it as a service. The innovation in this project is the use of machine learning based data classification techniques on the cloud to solve a real-world classification problem. The first challenge involved is deploying and testing, on the cloud, the classification algorithm that was developed in CS297. The project involves not only studying different machine learning techniques and their applications, but also identifying the algorithm and the environment best suited to this particular classification problem.
Bhawe, Chinmay, "BIG DATA CLASSIFICATION USING DECISION TREES ON THE CLOUD" (2013). Master's Projects. 317.