Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

William Andreopoulos

Second Advisor

Wendy Lee

Third Advisor

Anurag Wasankar

Keywords

Insecticidal Genes, Bacterial Genomics, Functional Gene Prediction, Deep Learning, Transformer Models, Biological Language Models, K-mer Tokenization.

Abstract

Identifying bacterial gene sequences with agricultural applications has the potential to transform agricultural biotechnology. These genes can be used in environmentally friendly pest control strategies. One such use case is identifying genes with potential insecticidal properties. With an increasing amount of genomic information and a decreasing number of available annotated sequences, finding new insecticidal genes has become more challenging. Traditional methods that rely on sequence alignment and annotated databases are not effective at detecting functionally relevant genes that lack close homology to known cases. This project investigates the data-driven classification of genes through sequence modeling. The research focuses on learning DNA sequence motifs and transferring them to distinguish between insecticidal and non-insecticidal genes. The study shows that functional information useful for decision-making can be obtained from DNA with state-of-the-art machine learning methods and that deep models can generalize to low-resource settings.

Available for download on Saturday, May 30, 2026
