Publication Date
Spring 2024
Degree Type
Master's Project
Degree Name
Master of Science in Bioinformatics (MSBI)
Department
Computer Science
First Advisor
Dr. Wendy Lee
Second Advisor
Dr. William Andreopoulos
Third Advisor
Dr. Fabio Di Troia
Keywords
Nanopore Sequencing, Deep Learning, Sequencing Artifacts, Sequence Context, Cancer Diagnosis
Abstract
Oxford Nanopore sequencing is a revolutionary technology for sequencing DNA molecules in long stretches. However, its error rate is significantly higher than that of conventional short-read sequencing, which produces numerous sequencing artifacts. These artifacts can be indistinguishable from low-frequency somatic variants, which is a roadblock for cancer diagnosis using liquid biopsies. In this study, benchmarked human genome samples from Genome in a Bottle were used to create a dataset of labeled variant calls, including both artifacts and true variants. Variant features, including sequence context, were used to train various deep learning models. A multi-input neural network combining sequence context features with other variant features achieved a higher validation accuracy (0.871) than a model using the non-sequence-context features alone (0.853), demonstrating that the sequence context surrounding a variant has some predictive power for whether a called variant is a sequencing artifact or a true variant.
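The abstract describes a multi-input model that takes sequence context alongside other variant features. The snippet below is a minimal sketch of how such inputs could be prepared. It assumes the context is a fixed window of reference bases centred on the variant, padded with 'N' at sequence edges and one-hot encoded over A, C, G, T. The function names (`context_window`, `one_hot`, `build_input`) and the default flank width are hypothetical illustrations, not the project's actual encoding.

```python
# Hypothetical sketch: encode the reference sequence context around a called
# variant as a one-hot matrix and pair it with other (non-context) variant
# features, mirroring the two inputs of a multi-input model.
# Window shape, 'N' padding, and one-hot encoding are assumptions.
from typing import List, Sequence, Tuple

BASES = "ACGT"


def context_window(reference: str, pos: int, flank: int) -> str:
    """Return the 2*flank+1 bases centred on 0-based `pos`, padding with 'N' past either end."""
    left = max(0, pos - flank)
    right = min(len(reference), pos + flank + 1)
    pad_left = "N" * (left - (pos - flank))
    pad_right = "N" * ((pos + flank + 1) - right)
    return pad_left + reference[left:right] + pad_right


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode each base over A, C, G, T; ambiguous bases such as N become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def build_input(
    reference: str, pos: int, other_features: Sequence[float], flank: int = 5
) -> Tuple[List[List[int]], List[float]]:
    """Return (context matrix, other features) as two separate model inputs."""
    return one_hot(context_window(reference, pos, flank)), list(other_features)
```

Keeping the two inputs separate, rather than concatenating them into one flat vector, lets a model apply a sequence-oriented branch (for example, a convolutional layer) to the context matrix and a dense branch to the remaining features before merging them.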
Recommended Citation
Zhou, David, "Characterizing Nanopore Sequencing Artifacts with Deep Learning" (2024). Master's Projects. 1417.
DOI: https://doi.org/10.31979/etd.3jm8-bx3d
https://scholarworks.sjsu.edu/etd_projects/1417