Publication Date


Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


In this work, we present an approach to predicting transcription units based on Bayesian classifiers. The predictor uses publicly available data to train the classifier, such as genome sequence data from Genbank, expression values from microarray experiments, and a collection of experimentally verified transcription units. We have studied the importance of each of the data source on the performance of the predictor by developing three classifier models and evaluating their outcomes. The predictor was trained and validated on the E. coli genome, but can be extended to other organisms. Using the full Bayesian classifier, we were able to correctly identify 80% of gene pairs belonging to operons.