Publication Date

2009

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

This paper applies a novel technique for parallelizing data-classification problems to finding genes in DNA sequences. The technique uses ensemble classification methods, such as Bagging and Select Best, and distributes classifier training and prediction using MapReduce. A novel sequence-classification voting algorithm is evaluated within the Bagging method and compared against the Select Best method.
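As a rough illustration of how Bagging can be laid out in a MapReduce style, the sketch below trains one toy classifier per "map" task on a bootstrap resample and combines the predictions in a "reduce" step. It is not the project's implementation: the GC-content base learner, the plain majority vote (standing in for the novel voting algorithm, which the abstract does not specify), and the names `train`, `map_task`, `reduce_votes`, and `bagging_predict` are all assumptions made for illustration. The map tasks run sequentially here, and Select Best is not shown.

```python
import random
from collections import Counter, defaultdict


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def train(sample):
    """Toy base learner (assumption): nearest class mean of GC fraction."""
    by_label = defaultdict(list)
    for seq, label in sample:
        by_label[label].append(gc_fraction(seq))
    means = {label: sum(v) / len(v) for label, v in by_label.items()}
    return lambda seq: min(means, key=lambda lab: abs(means[lab] - gc_fraction(seq)))


def map_task(seed, train_set, test_set):
    """Map step: train on a bootstrap resample, emit (test index, label) pairs."""
    rng = random.Random(seed)
    sample = [rng.choice(train_set) for _ in train_set]  # sample with replacement
    model = train(sample)
    return [(i, model(seq)) for i, seq in enumerate(test_set)]


def reduce_votes(emitted):
    """Reduce step: group votes by test index and take a simple majority."""
    votes = defaultdict(Counter)
    for i, label in emitted:
        votes[i][label] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


def bagging_predict(train_set, test_set, n_models=5):
    """Run n_models map tasks (sequentially here) and reduce their votes."""
    emitted = []
    for seed in range(n_models):
        emitted.extend(map_task(seed, train_set, test_set))
    return reduce_votes(emitted)
```

In a real MapReduce deployment each call to `map_task` would run on a separate worker and the framework would group the emitted pairs by key before the reduce step; the loop in `bagging_predict` only mimics that flow on one machine.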
