Exploring Music Representation Learning for Detection of Finer-Grained Details

Publication Date

2-25-2026

Document Type

Article

Publication Title

Multimedia Tools and Applications

Volume

85

Issue

3

DOI

10.1007/s11042-026-21302-w

Abstract

The foundational tonic in Indian classical music presents a crucial element for algorithmic understanding. This research investigates the efficacy of modern machine learning techniques for detecting this fine-grained musical attribute from diverse audio representations. Addressing limitations in prior work that used specialized techniques, this study pioneers the integration of audio representations with non-linear dimensionality reduction and machine learning classifiers for tonic classification. The methodology employs various machine learning algorithms on music represented by audio features such as Mel-frequency cepstral coefficients and Mel spectrograms. Spectral analysis using Uniform Manifold Approximation and Projection (UMAP) offers novel insights into music representation learning. The results from tonic classification using permutations of classifiers and dimensionality reduction techniques demonstrate the feasibility of automated detection, with UMAP-reduced data achieving a peak accuracy of 99.56%, significantly outperforming Principal Component Analysis (76.82%) and full data without any dimensionality reduction (55.19%). Experiments are also performed on a public dataset with the authors’ renditions for comparison and to increase the diversity and representation of Indian classical music data.
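
Both audio features named in the abstract, Mel-frequency cepstral coefficients and Mel spectrograms, rest on the mel scale, which spaces frequency bands roughly the way pitch is perceived. The sketch below is illustrative only and is not taken from the article: it assumes the common HTK-style formula m = 2595 * log10(1 + f / 700), and the helper names (hz_to_mel, mel_to_hz, mel_band_centers) are hypothetical. The article's actual feature-extraction settings are not stated in the abstract.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> List[float]:
    """Return centre frequencies (Hz) of n_bands bands equally spaced in mels.

    The two band edges f_min and f_max are excluded, so the result has
    exactly n_bands interior points.
    """
    if n_bands < 1 or not (0.0 <= f_min < f_max):
        raise ValueError("need n_bands >= 1 and 0 <= f_min < f_max")
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_bands + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(n_bands)]


if __name__ == "__main__":
    # Hypothetical settings: 10 bands between 0 Hz and 8 kHz.
    for centre in mel_band_centers(10, 0.0, 8000.0):
        print(f"{centre:8.1f} Hz")
```

Because the bands are equally spaced in mels rather than in Hz, they are narrow at low frequencies and widen toward high frequencies, which gives finer resolution in the range where a tonic pitch typically lies.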

Keywords

Boosting classifiers, Cepstral coefficients, Machine learning, Random forests, Support vector machines, Uniform manifold approximation and projection

Department

Applied Data Science
