Publication Date
Spring 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Ching-Seh Wu
Second Advisor
Navrati Saxena
Third Advisor
Robert Chun
Keywords
Bird Species Identification, Deep Learning, Audio Classification, MixIT, Source Separation, Noise Reduction, Transformers, Audio Spectrogram Transformer, EfficientNet
Abstract
The identification of bird species using deep learning techniques presents a novel approach in bioacoustics, advancing our understanding of, and capabilities in, bird species recognition from audio recordings. The value of audio over visual data for monitoring ecological patterns in birds is highlighted by the deployment of automated recording devices in remote wildlife sensing, which offer a more cost-effective, non-invasive, and practical solution. However, processing and classifying the audio remains challenging because bird audio is complex, characterized by diverse vocalizations and inherent environmental noise, both of which make effective classification difficult. The rapidly evolving field of machine learning, particularly deep learning, has shown promising results in processing and interpreting complex audio data. Applying Mixture Invariant Training (MixIT), a sound separation technique in audio processing, enhances the potential for accurate and efficient bird species identification. Bird audio classification was performed with several deep learning models: Convolutional Neural Network (CNN) models such as EfficientNet and ResNet, and Transformer-based models such as Audio Spectrogram Transformer (AST), Vision Transformer, and Wav2Vec2. The findings show that applying deep learning to MixIT-processed data improved accuracy by about 12 percentage points, from 70.53% to 82.49%, for the Audio Spectrogram Transformer and by about 16 percentage points, from 65.22% to 81.19%, for EfficientNet.
Recommended Citation
Kosuru, Sasanka, "BIRDSONG CLASSIFICATION USING DEEP LEARNING AND MIXIT" (2024). Master's Projects. 1376.
DOI: https://doi.org/10.31979/etd.7v78-vrv5
https://scholarworks.sjsu.edu/etd_projects/1376