Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Ching-Seh Wu

Second Advisor

Navrati Saxena

Third Advisor

Robert Chun

Keywords

Bird Species Identification, Deep Learning, Audio Classification, MixIT, Source Separation, Noise Reduction, Transformers, Audio Spectrogram Transformer, EfficientNet

Abstract

Identifying bird species from audio recordings with deep learning is a novel approach in bioacoustics that can advance both our understanding of bird species recognition and our capabilities in it. Automated recording devices deployed for remote wildlife sensing highlight the value of audio over visual data for monitoring ecological patterns in birds, offering a more cost-effective, non-invasive, and practical solution. However, processing and classifying this audio remains challenging because bird audio is complex, with diverse vocalizations and pervasive environmental noise that hinder effective classification. The rapidly evolving field of machine learning, particularly deep learning, has shown promising results in processing and interpreting complex audio data. Using Mixture Invariant Training (MixIT), a sound separation technique in audio processing, enhances the potential for accurate and efficient bird species identification. Bird audio classification was performed with several deep learning models: Convolutional Neural Network (CNN) models such as EfficientNet and ResNet, and Transformer-based models such as Audio Spectrogram Transformer (AST), Vision Transformer, and Wav2Vec2 Transformer. The findings show that applying deep learning to MixIT-processed data improved accuracy by about 12 percentage points, from 70.53% to 82.49%, for Audio Spectrogram Transformer and by about 16 percentage points, from 65.22% to 81.19%, for EfficientNet.

Available for download on Friday, May 23, 2025