Publication Date
8-1-2024
Document Type
Article
Publication Title
Data in Brief
Volume
55
DOI
10.1016/j.dib.2024.110730
Abstract
Few Indian classical music (ICM) datasets are both large enough and annotated with subtler attributes, such as the tonic, to train classification or prediction models. The dataset described in this paper provides tonic annotations to fill this gap. The tonic pitch, or base pitch, plays such an important role in music that it is sometimes called the keynote. The vocalists and the accompanying instrumental ensemble are tuned to this keynote to render the composition. The first and second authors of this paper, who are vocalists themselves, recorded songs in four different tonics: F#, G, G#, and A. Using the Python library pydub, each song of three minutes or more was segmented into 20-second snippets, with any remainder kept as a separate snippet. The raw audio snippets are available in folders separated by tonic, and a directory lists each snippet's file path and tonic. The dataset can be reused for future tonic classification work, as well as for training other automated systems that target higher-level attributes of ICM, such as the melodic framework, since the tonic can serve as the basis for them all.
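The segmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' script: the function names `snippet_bounds` and `export_snippets`, the output file naming, and the WAV export format are assumptions made for illustration. Only the 20-second snippet length, the separate remainder snippet, and the use of pydub come from the abstract.

```python
from pathlib import Path


def snippet_bounds(duration_ms, snippet_ms=20_000):
    """Return (start, end) millisecond pairs that cover the whole recording.

    Every pair spans snippet_ms except possibly the last one, which holds
    whatever remainder is left over.
    """
    if snippet_ms <= 0:
        raise ValueError("snippet_ms must be positive")
    return [
        (start, min(start + snippet_ms, duration_ms))
        for start in range(0, duration_ms, snippet_ms)
    ]


def export_snippets(song_path, out_dir, snippet_ms=20_000):
    """Cut one song into snippets with pydub and write them to out_dir.

    Hypothetical helper: the naming scheme and WAV format are assumptions.
    """
    # Imported here so snippet_bounds stays usable without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(song_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    # len(audio) is the duration in milliseconds; slicing is also in ms.
    for i, (start, end) in enumerate(snippet_bounds(len(audio), snippet_ms)):
        path = out / f"{Path(song_path).stem}_{i:03d}.wav"
        audio[start:end].export(str(path), format="wav")
        paths.append(path)
    return paths
```

Under these assumptions, a song lasting 3 minutes 10 seconds (190,000 ms) yields nine full 20-second snippets plus one 10-second remainder snippet.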
Keywords
Indian classical music, Machine learning, Music tonic, Raw audio snippets
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Applied Data Science
Recommended Citation
Samhita Konduri, Kriti V. Pendyala, and Vishnu S. Pendyala. "KritiSamhita: A machine learning dataset of South Indian classical music audio clips with tonic classification" Data in Brief (2024). https://doi.org/10.1016/j.dib.2024.110730