Biomedical data sets often contain a mix of categorical and numerical attributes, where the former represent semantic information about the objects and the latter represent experimental results. We present BILCOM, an algorithm for Bi-Level Clustering of Mixed categorical and numerical data types. BILCOM performs a pseudo-Bayesian process in which categorical clustering serves as the prior. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease, and yeast gene expression data with Gene Ontology annotations, more accurately than clustering on either data type alone.
Bill Andreopoulos, Aijun An, and Xiaogang Wang. "Bi-level Clustering of Mixed Categorical and Numerical Biomedical Data." International Journal of Data Mining and Bioinformatics (2006): 19-56. doi:10.1504/IJDMB.2006.009920