Publication Date

2010

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

Clustering is a popular method for gleaning useful information from microarray data. Unfortunately, the results obtained from common clustering algorithms are not consistent, and even with multiple runs of different algorithms a further validation step is required. Because well-defined class labels are absent and the number of clusters is unknown, the unsupervised learning problem of finding an optimal clustering is hard. Obtaining a consensus of judiciously obtained clusterings not only provides stable results but also lends a high level of confidence in their quality. Several base algorithm runs are used to generate clusterings, and a co-association matrix of pairs of points is obtained using a configurable majority criterion. Using this consensus as a similarity measure, we generate a clustering using four algorithms. Synthetic as well as real-world datasets are used in the experiments, and the results are compared using various internal and external validity measures. Results on real-world datasets showed a marked improvement over those obtained by other researchers with the same datasets.
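
The co-association step described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names (`co_association`, `majority_consensus`), the `threshold` parameter, and the toy label vectors are all assumptions made for illustration, and the final grouping uses plain connected components as a stand-in, since the abstract does not name the four algorithms applied to the consensus similarity.

```python
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    labelings = np.asarray(labelings)            # shape: (n_runs, n_points)
    n_runs, n_points = labelings.shape
    counts = np.zeros((n_points, n_points))
    for labels in labelings:
        # Pairwise co-membership for this run; label ids themselves are irrelevant.
        counts += labels[:, None] == labels[None, :]
    return counts / n_runs

def majority_consensus(coassoc, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs,
    then label the connected components of the resulting graph.
    (Connected components are an illustrative stand-in, not the thesis method.)"""
    n = coassoc.shape[0]
    linked = coassoc > threshold
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                              # depth-first traversal
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical output of three base runs on six points.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],   # disagrees about point 2 and uses different label ids
    [0, 0, 0, 1, 1, 1],
]
C = co_association(runs)
consensus = majority_consensus(C, threshold=0.5)
```

Here `threshold` plays the role of the configurable majority criterion: raising it demands agreement from more base runs before two points are treated as belonging together. The matrix `C` can also be passed as a similarity measure to any clustering algorithm that accepts precomputed similarities.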
