Publication Date

August 1, 2024

Document Type

Article

Publication Title

Data in Brief

Volume

55

DOI

10.1016/j.dib.2024.110730

Abstract

There are currently few Indian classical music (ICM) datasets that are large enough, and annotated richly enough, to train classification or prediction models. Subtler annotations such as the tonic are especially scarce. The dataset described in this paper fills this gap with tonic annotations. The tonic, or base pitch, plays such an important role in the music that it is sometimes called the keynote: the vocalists and the accompanying instrumental ensemble tune to it when rendering a composition. The first and second authors of this paper, who are vocalists themselves, recorded songs in four different tonics: F#, G, G#, and A. Using the Python library pydub, each song (more than three minutes long) was segmented into 20-second snippets, with the remainder kept as a separate, shorter snippet. The raw audio snippets are stored in folders separated by tonic, and a directory lists each snippet's file path and tonic. The dataset can be reused for future tonic classification work and for training other automated systems that target higher-level ICM attributes, such as the melodic framework, since the tonic underlies them all.
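The segmentation and indexing steps described in the abstract can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the authors' script: the WAV export format, the `root/<tonic>/` folder layout, the file-naming scheme, the CSV index with `file_path` and `tonic` columns, and the helper names `snippet_bounds`, `split_song`, and `write_index` are all hypothetical. Only `AudioSegment.from_file`, millisecond slicing, `len()` in milliseconds, and `export` are taken from pydub's public API.

```python
import csv
import math
from pathlib import Path

SNIPPET_MS = 20_000  # 20-second snippets, as stated in the abstract


def snippet_bounds(duration_ms: int, snippet_ms: int = SNIPPET_MS) -> list[tuple[int, int]]:
    """Return (start, end) millisecond pairs covering the whole recording.

    Full-length snippets come first; any leftover audio becomes a final,
    shorter snippet instead of being discarded.
    """
    count = math.ceil(duration_ms / snippet_ms)
    return [(i * snippet_ms, min((i + 1) * snippet_ms, duration_ms)) for i in range(count)]


def split_song(song_path: Path, out_dir: Path) -> list[Path]:
    """Cut one recording into snippets with pydub and export them.

    Hypothetical helper: the WAV format and "<song>_<index>.wav" naming
    are assumptions, not taken from the dataset description.
    """
    # Imported lazily so snippet_bounds/write_index work without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(str(song_path))  # len(audio) is in milliseconds
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for idx, (start, end) in enumerate(snippet_bounds(len(audio))):
        target = out_dir / f"{song_path.stem}_{idx:03d}.wav"
        audio[start:end].export(str(target), format="wav")  # slicing is by milliseconds
        written.append(target)
    return written


def write_index(root: Path, index_path: Path) -> int:
    """Write a CSV listing every snippet under root/<tonic>/ with its tonic label.

    Assumes (hypothetically) that each tonic folder is named after the tonic,
    e.g. "F#", "G", "G#", "A", so the parent folder name is the label.
    """
    rows = sorted(
        (p.relative_to(root).as_posix(), p.parent.name)
        for p in root.glob("*/*.wav")
    )
    with index_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file_path", "tonic"])
        writer.writerows(rows)
    return len(rows)
```

For example, `split_song(Path("song.mp3"), Path("snippets/G"))` followed by `write_index(Path("snippets"), Path("index.csv"))` would produce the per-tonic folders and the index. Decoding non-WAV input requires ffmpeg to be installed alongside pydub. pydub also provides `pydub.utils.make_chunks`, which slices a segment into fixed-length chunks with the same remainder behaviour; the explicit `snippet_bounds` helper is used here only so the boundary logic can be inspected and tested without audio files.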

Keywords

Indian classical music, Machine learning, Music tonic, Raw audio snippets

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Applied Data Science
