Publication Date

Spring 2021

Degree Type

Thesis - Campus Access Only

Degree Name

Master of Science (MS)


Computer Engineering


Carlos Rojas


Autoencoders, Comparing Hi-C Data, Deep Learning, Denoising, Hi-C Data, Synthetic Hi-C Data

Subject Areas

Computer engineering; Computer science; Bioinformatics


The rapidly growing volume of three-dimensional, genome-wide data produced by chromosome conformation capture poses many challenges for computational biology. Methods such as high-throughput chromosome conformation capture (Hi-C) are used to understand the role that three-dimensional organizational structures play in gene expression. In recent years, the field has characterized spatial structures such as A/B compartments, topologically associating domains (TADs), and chromatin loops. Studying cell lines exposed to various biological conditions can reveal the role of 3D structure. However, the sequencing process behind Hi-C data introduces noise that prevents the effective comparison of cell lines. Methods such as distance-centric and linear models help identify differences between pairs of Hi-C datasets, but they do not account for the noise introduced during sequencing, so their results are biased by it. We propose a novel method that detects areas of interest between pairs of Hi-C datasets using convolutional autoencoders that reduce the noise in Hi-C data. The proposed deep learning framework can compare diseased and normal genomes of two different cell types. Our preliminary experiments, which analyze various similarity measures, provide evidence for the advantage of using a convolutional autoencoder for Hi-C comparisons.
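The idea of comparing a pair of Hi-C contact maps with a similarity measure, before and after noise reduction, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy contact maps (`toy_contact_map`), the choice of Pearson correlation as the similarity measure, and in particular `smooth`, a simple 3x3 mean filter that stands in for the trained convolutional autoencoder.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equally sized matrices, flattened."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def smooth(m):
    """Hypothetical stand-in denoiser: a 3x3 mean filter.

    This is NOT the convolutional autoencoder from the abstract; it only
    marks the place where a trained denoiser would be applied.
    """
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            window = [m[p][q]
                      for p in range(max(0, i - 1), min(n, i + 2))
                      for q in range(max(0, j - 1), min(n, j + 2))]
            out[i][j] = mean(window)
    return out


def toy_contact_map(n, noise, rng):
    """Symmetric toy map: contacts decay with genomic distance, plus noise."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = 100.0 / (1 + abs(i - j)) + rng.gauss(0, noise)
            m[i][j] = m[j][i] = max(value, 0.0)  # contact counts are non-negative
    return m


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rep_a = toy_contact_map(40, 5.0, rng)
rep_b = toy_contact_map(40, 5.0, rng)

# Similarity of the two maps before and after the stand-in denoising step.
raw_similarity = pearson(rep_a, rep_b)
denoised_similarity = pearson(smooth(rep_a), smooth(rep_b))
print(f"raw: {raw_similarity:.3f}  denoised: {denoised_similarity:.3f}")
```

Because both toy maps share the same underlying signal and differ only in their noise, a similarity score that rises after denoising suggests that noise was masking their agreement. In a real comparison the two maps would come from different cell lines, and the regions that still differ after denoising would be the candidate areas of interest.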