Publication Date

Spring 2018

Degree Type

Master's Project


Computer Science


With the large amount of data of different types available today, the number of features that can be extracted from it is huge. The ever-increasing popularity of multimedia applications has been a major factor in this, especially in the case of image data. Image data is used for several applications, such as classification, retrieval, object recognition, and annotation. Often, using the entire feature set for each of these tasks can be not only time-consuming but also detrimental to performance. Given the large number of features, it is difficult to find the subset of features that is useful for a given task. Genetic Algorithms (GA) can be used to alleviate this problem by searching the space of feature subsets for those features that are not only essential but also improve performance. In this project, we explore various approaches to using GA to select features for different applications, and develop a solution that uses a reduced feature set (selected by GA) to classify images based on their domain/genre. The increased interest in Machine Learning applications has led to the design and development of multiple classification algorithms. In this project, we explore three such classification algorithms: Random Forest (RF), Support Vector Machine (SVM), and Neural Networks (NN). We perform 10-fold cross-validation with all three methods. The idea is to evaluate the performance of each classifier with the reduced feature set and analyze the impact of feature selection on the accuracy of the model. We observe that RF is insensitive to feature selection, while SVM and NN show considerable improvement in accuracy with the reduced feature set. The use of this solution is demonstrated in image retrieval, and a possible application in image tampering detection is introduced.
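To make the idea of GA-based feature selection concrete, here is a minimal, standard-library-only sketch. It is not the project's implementation. Several pieces are assumptions made for illustration. The synthetic dataset (`make_data`) stands in for image features. A nearest-centroid classifier replaces RF/SVM/NN as the fitness model so the example has no dependencies. The GA operators (tournament selection, uniform crossover, bit-flip mutation, elitism) and the small per-feature `penalty` are generic choices. Each individual is a bitmask over features, and its fitness is the 10-fold cross-validated accuracy on the selected features.

```python
import random

def make_data(n_samples=100, n_features=10, n_informative=3, seed=0):
    """Hypothetical synthetic two-class data: only the first n_informative
    features depend on the label; the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def cv_accuracy(X, y, mask, k=10):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    (a stand-in for RF/SVM/NN) using only the features selected by mask."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    n, correct = len(X), 0
    for fold in range(k):
        test = set(range(fold, n, k))  # every sample lands in exactly one test fold
        centroids = {}
        for label in set(y):
            rows = [X[i] for i in range(n) if i not in test and y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
        for i in test:
            pred = min(centroids, key=lambda c: sum(
                (X[i][j] - m) ** 2 for j, m in zip(idx, centroids[c])))
            correct += pred == y[i]
    return correct / n

def genetic_feature_selection(X, y, pop_size=16, generations=10,
                              p_mut=0.1, penalty=0.01, seed=1):
    """Evolve feature bitmasks; return the best mask and its fitness."""
    rng = random.Random(seed)
    n_features = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            # Reward CV accuracy, lightly penalise larger subsets.
            cache[key] = cv_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[key]

    # Start from the full feature set plus random subsets.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0], ranked[1]]  # elitism: keep the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) picks each parent.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover, then bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    X, y = make_data()
    mask, score = genetic_feature_selection(X, y)
    print("selected features:", [j for j, b in enumerate(mask) if b],
          "fitness:", round(score, 3))
```

Because the full feature set is in the initial population and the best individuals are always carried forward, the returned subset scores at least as well as using every feature under this fitness function. In the setting the abstract describes, `cv_accuracy` would instead wrap a real classifier trained on extracted image features.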