Publication Date
Fall 2019
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Leonard Wesley
Second Advisor
Philip Heller
Third Advisor
Skyler Payne
Keywords
Next Generation Sequencing, Mutations, Homopolymer Regions, Pancreatic Cancer, Evidential Reasoning, Support Vector Classifier
Abstract
This study examines how an evidential reasoning approach can be used as a diagnostic tool for early detection of pancreatic cancer. The evidential reasoning model combines the output of a linear Support Vector Classifier (SVC) with factors such as smoking history, health history, biopsy location, and the NGS technology used to predict the likelihood of the disease. The SVC was trained on genomic data from pancreatic cancer patients obtained from the National Cancer Institute (NCI) Genomic Data Commons (GDC). To test the evidential reasoning model, a variety of synthetic data was compiled to measure the impact of different combinations of factors. Through experimentation, we monitored how the evidential interval for pancreatic cancer fluctuated with the inputs provided. The evidential interval for pancreatic cancer increased, supporting the machine learning prediction of pancreatic cancer, when the input changed from a non-smoker and non-drinker to an individual with a highly active smoking and drinking history. Similarly, with the machine learning prediction for pancreatic cancer held high, the evidential interval increased significantly when the sequencing read quality input was changed from high guanine-cytosine (GC) content and high homopolymer regions to moderate GC content and low homopolymer regions. This indicates that the initial reads carried a higher likelihood of sequencing error, which made the machine learning output less accurate. This experiment shows that an evidence-based approach has the potential to serve as a diagnostic tool for screening high-risk groups. Future work should focus on improving the machine learning model by using a larger pancreatic cancer genomic database. Next steps will involve programmatically analyzing real sequencing reads for irregular GC content and high homopolymer regions.
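As an illustration of the read-level analysis named as a next step, the following is a minimal sketch of how GC content and homopolymer regions might be measured for a single read. The function names, the example read, and the minimum run length of 4 are assumptions made here for illustration; they are not taken from the project.

```python
from itertools import groupby


def gc_fraction(read: str) -> float:
    """Fraction of bases in the read that are G or C (0.0 for an empty read)."""
    read = read.upper()
    if not read:
        return 0.0
    return sum(base in "GC" for base in read) / len(read)


def homopolymer_runs(read: str, min_length: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every run of a single base
    that is at least min_length long. min_length=4 is a hypothetical cutoff."""
    runs = []
    position = 0
    # groupby yields consecutive identical bases as one group
    for base, group in groupby(read.upper()):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


# Hypothetical 20-base read used only to exercise the two functions
read = "ACGTTTTTGGCCAAAAAAGC"
print(gc_fraction(read))
print(homopolymer_runs(read))
```

Per-read values like these could then be binned (for example into low, moderate, and high) before being passed to an evidential reasoning model as a read quality input; the binning thresholds would need to be chosen for the sequencing technology in use.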
Recommended Citation
Sharagi, Omid, "Toward Early Detection Of Pancreatic Cancer: An Evidence-Based Approach" (2019). Master's Projects. 896.
DOI: https://doi.org/10.31979/etd.2tzh-x2j9
https://scholarworks.sjsu.edu/etd_projects/896