Comparison between Ribosomal Assembly and Machine Learning Tools for Microbial Identification of Organisms with Different Characteristics
Publication Date
1-1-2025
Document Type
Article
Publication Title
Current Bioinformatics
Volume
20
Issue
7
DOI
10.2174/0115748936299440240709070105
First Page
595
Last Page
619
Abstract
Background: Genome assembly tools are used to reconstruct genomic sequences from raw sequencing data, which are then used for identifying the organisms present in a metagenomic sample. Methodology: More recently, machine learning approaches have been applied to a variety of bioinformatics problems, and in this paper, we explore their use for organism identification. We start by evaluating several commonly used metagenomic assembly tools, including PhyloFlash, MEGAHIT, MetaSPAdes, Kraken2, Mothur, UniCycler, and PathRacer, and compare them against state-of-the-art deep learning-based machine learning classification approaches represented by DNABERT and DeLUCS, in the context of two synthetic mock community datasets. Results: Our analysis focuses on determining whether ensembling metagenome assembly tools with machine learning tools has the potential to improve identification performance relative to using the tools individually. Conclusion: We find that this is indeed the case, and analyze how effective tool ensembling could be for organisms with different characteristics (based on factors such as repetitiveness, genome size, and GC content).
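The abstract does not say how the tools' outputs are combined. As an illustration only, the sketch below assumes the simplest possible scheme, a majority vote over per-tool organism calls. The function name `ensemble_vote`, the tool keys, and the tie-handling rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Combine per-tool organism calls by simple majority vote.

    `predictions` maps a tool name to the label it assigned to a sample,
    or to None when the tool made no call. This is an assumed scheme for
    illustration, not the ensembling method used in the article.
    """
    # Ignore tools that did not produce a call.
    calls = [label for label in predictions.values() if label is not None]
    if not calls:
        return None

    counts = Counter(calls)
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]

    # Report a label only when a single label has the most votes;
    # a tie is treated as "no consensus".
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Hypothetical calls for one sample from three tools.
    sample_calls = {
        "assembler_a": "Escherichia coli",
        "assembler_b": "Escherichia coli",
        "ml_classifier": "Bacillus subtilis",
    }
    print(ensemble_vote(sample_calls))
```

A vote like this can only help when the tools make different kinds of errors, which is consistent with the abstract's interest in how ensembling behaves for organisms that differ in repetitiveness, genome size, and GC content.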
Funding Number
DE-AC02-05CH11231
Funding Sponsor
San José State University
Keywords
Kraken2, MEGAHIT, MetaSPAdes, Mothur, PathRacer, PhyloFlash, UniCycler
Department
Computer Engineering
Recommended Citation
Stephanie Chau, Carlos Rojas, Jorjeta G. Jetcheva, Mary Markart, Sudha Vijayakumar, Sophia Yuan, Vincent Stowbunenko, Amanda N. Shelton, and William B. Andreopoulos. "Comparison between Ribosomal Assembly and Machine Learning Tools for Microbial Identification of Organisms with Different Characteristics" Current Bioinformatics (2025): 595-619. https://doi.org/10.2174/0115748936299440240709070105