Master of Science (MS)
This paper applies a novel technique for parallelizing data-classification problems to the task of finding genes in DNA sequences. The technique combines ensemble classification methods, such as Bagging and Select Best, with MapReduce, which distributes classifier training and prediction. A novel sequence classification voting algorithm is evaluated within the Bagging method and compared against the Select Best method.
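As a rough illustration of the general idea only, the following sketch shows how Bagging and Select Best might be arranged as map and reduce steps. It is not the project's implementation: the map and reduce phases run locally rather than on a MapReduce cluster, the base learner is a toy lookup table, and plain per-position majority voting stands in for the paper's novel voting algorithm, which the abstract does not describe. All function names, labels, and data here are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_toy_classifier(sample):
    """Hypothetical base learner: map each base to its most common label."""
    counts = defaultdict(Counter)
    for base, label in sample:
        counts[base][label] += 1
    table = {base: c.most_common(1)[0][0] for base, c in counts.items()}
    # Fall back to the overall majority label for bases absent from the sample.
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda base: table.get(base, default)

def map_train(task):
    """Map step: train one classifier on a bootstrap sample (with replacement)."""
    seed, data = task
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    return train_toy_classifier(sample)

def map_predict(classifier, sequence):
    """Map step: emit one (position, label) pair per base in the sequence."""
    return [(pos, classifier(base)) for pos, base in enumerate(sequence)]

def reduce_votes(emitted):
    """Reduce step: plain majority vote per position (an assumed stand-in)."""
    votes = defaultdict(Counter)
    for pos, label in emitted:
        votes[pos][label] += 1
    return [votes[pos].most_common(1)[0][0] for pos in sorted(votes)]

def bagging_predict(data, sequence, n_models=5):
    """Bagging: train n_models classifiers, then vote on every position."""
    classifiers = list(map(map_train, [(seed, data) for seed in range(n_models)]))
    emitted = [pair for clf in classifiers for pair in map_predict(clf, sequence)]
    return reduce_votes(emitted)

def select_best_predict(data, held_out, sequence, n_models=5):
    """Select Best: keep only the classifier with the best held-out accuracy."""
    classifiers = list(map(map_train, [(seed, data) for seed in range(n_models)]))
    def accuracy(clf):
        return sum(clf(base) == label for base, label in held_out) / len(held_out)
    best = max(classifiers, key=accuracy)
    return [best(base) for base in sequence]

if __name__ == "__main__":
    # Made-up training pairs of (base, label); real gene finding uses richer features.
    data = [("A", "gene"), ("C", "gene"), ("G", "other"), ("T", "other")] * 5
    print(bagging_predict(data, "ACGT"))
    print(select_best_predict(data, data, "ACGT"))
```

Because each call to `map_train` and `map_predict` depends only on its own inputs, these steps could in principle be handed to separate workers, which is the property that makes ensemble methods a natural fit for MapReduce.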
Jahnke, Glenn, "MRCRAIG: MapReduce and Ensemble Classifiers for Parallelizing Data Classification Problems" (2009). Master's Projects. 143.