Dhruv Jalota

Publication Date

Spring 2013

Degree Type

Master's Project


This writing project applies the supervised machine learning technique known as Support Vector Machines (SVM) to a large labeled data set, uses the resulting trained model to classify an unlabeled data set, and analyzes the results obtained with different Amazon Elastic Compute Cloud (EC2) instances, input data set sizes, and SVM parameters and kernels. The data set is relatively large for SVM and for the tool used, LIBSVM: it has approximately 1.3 million training examples and 341 attributes, with binary classification labels, i.e., true (+1) and false (-1). By deploying this open source tool to the cloud, we make use of the available computing power to obtain the best possible classification results. Finally, we give a detailed analysis of the performance of all the experiments conducted and draw conclusions from these results.
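
To make the input to the SVM tool concrete, the following is a minimal sketch of the sparse text format that the libsvm command-line tools read: one example per line, a label followed by 1-based `index:value` pairs, with zero-valued attributes omitted. The helper names `to_libsvm_line` and `parse_libsvm_line` are hypothetical and are not taken from the project; only the 341-attribute width and the +1/-1 labels come from the abstract.

```python
def to_libsvm_line(label, features):
    """Format one dense example as a line of libsvm sparse text.

    label    -- +1 (true) or -1 (false), as in the abstract
    features -- sequence of numeric attribute values (341 in this data set)
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 or -1")
    parts = ["+1" if label == 1 else "-1"]
    # Indices are 1-based; zero-valued attributes are left out.
    for idx, value in enumerate(features, start=1):
        if value != 0:
            # ':g' keeps about 6 significant digits (an illustrative choice).
            parts.append(f"{idx}:{value:g}")
    return " ".join(parts)


def parse_libsvm_line(line, n_features):
    """Parse one libsvm sparse line back into (label, dense feature list)."""
    tokens = line.split()
    label = int(tokens[0])
    features = [0.0] * n_features
    for token in tokens[1:]:
        idx, value = token.split(":")
        features[int(idx) - 1] = float(value)
    return label, features


if __name__ == "__main__":
    # Hypothetical example row: only the first and last attributes are nonzero.
    row = [0.0] * 341
    row[0], row[340] = 0.5, 2.0
    print(to_libsvm_line(1, row))  # +1 1:0.5 341:2
```

With roughly 1.3 million examples, a sparse representation of this kind keeps the training file smaller than a dense one whenever many attributes are zero; whether that holds for this particular data set is not stated in the abstract.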