Publication Date

Spring 5-23-2016

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Sami Khuri

Second Advisor

Chris Pollett

Third Advisor

Vidya Rangasayee


The human genome contains many patterns and sequences of biological significance. Capturing these patterns can help answer open questions about the genome, such as how genomes evolve, how genetic mutations cause disease, how viruses mutate to cause new diseases, and how these diseases might be cured. All of these applications fall within the study of bioinformatics.

One of the most common tasks in bioinformatics is the simultaneous alignment of several biological sequences, widely known as multiple sequence alignment. Multiple sequence alignments help group together organisms that share an evolutionary history. They also help in learning the properties of a new sequence by aligning it with previously studied homologous sequences.

This project covers a probabilistic modeling method for performing multiple sequence alignment (MSA). Using hidden Markov models in MSA significantly improves computational speed, especially for sequences that contain overlapping regions. We used the Baum-Welch expectation-maximization algorithm to train the hidden Markov models and the Viterbi algorithm to align the sequences. Our results are comparable to those obtained by publicly available packages such as ClustalW and Clustal Omega.
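
As a rough illustration of the decoding step, the following Python sketch runs the Viterbi algorithm on a toy two-state HMM over nucleotides. The function name, the states "H" and "L", and all probabilities are assumptions made for this example only; they are not the model or parameters used in the project, where the parameters would come from Baum-Welch training.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_path) for an observation sequence.

    Works in log space to avoid underflow on long sequences.
    Assumes every probability used is strictly positive.
    """
    # V[t][s]: log-probability of the best state path ending in s at position t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at position t
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return V[-1][last], path

# Hypothetical toy model: "H" favors G/C, "L" favors A/T.
STATES = ("H", "L")
START_P = {"H": 0.5, "L": 0.5}
TRANS_P = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT_P = {
    "H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

log_prob, path = viterbi("GGCACTGAA", STATES, START_P, TRANS_P, EMIT_P)
```

Here `path` holds one state label per input symbol and `log_prob` is the log-probability of that path. In an alignment setting the states would typically be the match, insert, and delete states of a profile HMM, so the decoded path determines the column in which each residue is placed.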