In this work, evaluation is performed using labels from Pandora’s Music Genome Project® (MGP) across more than 1.2 million examples.
Human Labeled Attributes (1): The first method uses only a selection of human-labeled attributes from Pandora's MGP as features to classify the presence of a genre.
Audio Features (2a): The second method uses only audio features directly to classify the presence of a genre.
Predicted Attributes (2b): The third method uses audio features to predict the presence of the human-labeled attributes from (1). These attribute predictions are used to classify the presence of a genre.
Hybrid Method (2c): The hybrid method uses audio features to predict the presence of the human-labeled attributes from (1). These attribute predictions are used along with the raw audio features to classify the presence of a genre.
Model Setup: Each genre task is formulated as an individual binary classification and uses logistic regression as the learning model. Each of the attribute learning tasks is formulated as a binary-valued classification using logistic regression or a continuous-valued regression using linear regression. Tasks that incorporate audio use the following features: MFCCs (460), Mellin Scale Transform (230), Beat Profiles (108), and Tempogram Ratios (39).
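Under this setup, each genre is an independent binary task. A minimal sketch of methods (1) and (2a) using scikit-learn on synthetic data — the feature dimensions follow the text, but the data here is random, so the AUC values themselves are not meaningful:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_songs = 1000
# Audio feature dimensions from the text: 460 + 230 + 108 + 39 = 837.
audio = rng.normal(size=(n_songs, 460 + 230 + 108 + 39))
attributes = rng.integers(0, 2, size=(n_songs, 48))  # 48 human-labeled attributes
is_jazz = rng.integers(0, 2, size=n_songs)           # one binary genre task
train, test = slice(0, 800), slice(800, n_songs)

# One logistic regression per feature set, evaluated with ROC-AUC.
aucs = {}
for name, X in [("attributes (1)", attributes), ("audio (2a)", audio)]:
    clf = LogisticRegression(max_iter=1000).fit(X[train], is_jazz[train])
    aucs[name] = roc_auc_score(is_jazz[test], clf.predict_proba(X[test])[:, 1])
```

In the real experiments this loop would run once per genre, over each of the 12 basic genres and 47 sub-genres.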
The Music Genome Project
In this work, attribute and genre labels are derived from Pandora’s Music Genome Project® (MGP). The MGP contains 500+ expert labels across 1.2M+ songs. In this work we use a subset of the MGP containing 48 musical attributes, 12 “Basic” genres and 47 sub-genres.
Attribute Examples: Male Vocals, Female Vocals, Distorted Guitar, Triple Meter, Syncopation, Live Recording, etc.
Basic Genre Examples: Rock, Rap, Latin, Jazz, etc.
Sub-genre Examples: Light Rock, Hard Rock, Punk Rock, Bebop Jazz, Afro-Cuban Jazz, etc.
In order to take a closer look at musical attributes and their relation to genre, we'll explore some components of Jazz. The results in Figure 2 show how well each individual attribute performs as a single-dimensional feature when classifying sub-genres within Jazz. Classification ROC-AUC values are shown for each of the attributes on the left, across the sub-genres listed below the figure.
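The per-attribute analysis in Figure 2 amounts to scoring each attribute, by itself, as a one-dimensional ranker of a sub-genre label. A sketch with synthetic data — the attribute values and their correlations with the label are invented for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 500
is_swing_jazz = rng.integers(0, 2, size=n)
# Invent one attribute that tracks the label and one that does not.
attrs = {
    "Swing": np.clip(is_swing_jazz + rng.normal(0.0, 0.5, size=n), 0.0, 1.0),
    "Triple Meter": rng.random(n),
}
# ROC-AUC needs no trained model here: each attribute's value is used
# directly as the ranking score for the sub-genre label.
aucs = {name: roc_auc_score(is_swing_jazz, values) for name, values in attrs.items()}
```

With this construction, "Swing" scores well above 0.5 while the unrelated "Triple Meter" hovers near chance, mirroring the kind of contrast Figure 2 shows.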
One of the more obvious correlations is with the swing attribute. Notice how the presence of swing is a good predictor of both "Swing Jazz" and "Boogie." However, swing isn't always an important attribute of jazz. In "Afro-Cuban Jazz", swing is not a good predictor, but syncopation is, owing to the syncopated, straight-time clave rhythms present. "Afro-Cuban Jazz" also features distinctive instrumentation: the presence of auxiliary percussion instruments beyond the standard drum set (e.g., congas, claves) is a defining factor. More interestingly, the backbeat attribute is a good predictor of "Free Jazz". This is because "Free Jazz" is one of the few styles of music without a backbeat, making that negative correlation a powerful predictive attribute.
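That last point can be made concrete: an attribute that is anti-correlated with a genre is just as useful to a classifier, which simply learns a negative weight for it. A sketch with synthetic data (the 90%/10% backbeat rates are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 1000
is_free_jazz = rng.integers(0, 2, size=n)
# Backbeat present in ~90% of other songs but only ~10% of Free Jazz songs.
backbeat = (rng.random(n) < np.where(is_free_jazz == 1, 0.1, 0.9)).astype(float)

# Used directly as a score, the attribute ranks Free Jazz *low*: AUC far below 0.5.
raw_auc = roc_auc_score(is_free_jazz, backbeat)

# A logistic regression learns a negative weight, flipping the ranking:
# model_auc = 1 - raw_auc, so the anti-correlation is exactly as informative.
clf = LogisticRegression().fit(backbeat.reshape(-1, 1), is_free_jazz)
model_auc = roc_auc_score(is_free_jazz, clf.predict_proba(backbeat.reshape(-1, 1))[:, 1])
```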
Classification results for all four model types from Figure 1 are shown in Figure 3. The top of the figure shows the results for the Jazz sub-genre group. The bottom plot shows the average ROC-AUC for all genre groups. In all cases, the 48 musical attributes are the best representation to use when classifying genre. This shows that this low-dimensional representation (1) is powerful and contains strong correlations with genre.
The audio features alone (2a) also do reasonably well, but with audio features alone there is no way to know what aspect of genre each feature is capturing. That insight is gained by learning the musical attributes from audio and using the estimated attributes to classify genre (2b). While this does not perform as well as the direct audio features, we gain interpretability, as well as a significant and meaningful reduction in dimensionality. There is also considerable room for improvement here: better models of each individual attribute will improve the musical-attribute layer of this model, and therefore improve genre classification overall. Lastly, the hybrid model (2c) is second only to the human-labeled attributes alone. This suggests that the audio features and the attribute models contain complementary information, with each making up for shortcomings in the other.
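The two-stage pipeline (2b) and the hybrid (2c) discussed above can be sketched as follows, again assuming scikit-learn and synthetic data. The dimensions here are reduced for brevity (the text uses 837 audio dimensions and 48 attributes), and with random data the AUC values themselves are not meaningful; the point is the structure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n, n_audio, n_attrs = 400, 100, 48   # audio dim reduced from 837 for brevity
audio = rng.normal(size=(n, n_audio))
attributes = rng.integers(0, 2, size=(n, n_attrs))
is_jazz = rng.integers(0, 2, size=n)
train, test = slice(0, 300), slice(300, n)

# Stage 1: one model per attribute, predicting it from the audio features.
attr_models = [
    LogisticRegression(max_iter=1000).fit(audio[train], attributes[train, j])
    for j in range(n_attrs)
]
pred_attrs = np.column_stack([m.predict_proba(audio)[:, 1] for m in attr_models])

# Stage 2: classify the genre from the predicted attributes alone (2b),
# or from the predicted attributes concatenated with the raw audio (2c).
aucs = {}
for name, X in [("2b", pred_attrs), ("2c", np.hstack([pred_attrs, audio]))]:
    clf = LogisticRegression(max_iter=1000).fit(X[train], is_jazz[train])
    aucs[name] = roc_auc_score(is_jazz[test], clf.predict_proba(X[test])[:, 1])
```

Improving any single stage-1 attribute model improves the intermediate representation that both (2b) and (2c) consume, which is why better attribute models lift genre classification overall.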
Cite This Work