Music Similarity Analysis

Research Day Poster April 17, 2007
By: Donald S. Williamson
Advisor: Dr. Youngmoo Kim

Introduction


This project explores a computer algorithm that assesses song similarity based solely on the sound waveform. The system extracts acoustic features that represent timbre, or sound quality, and assesses similarity by computing a quantitative distance between the features of two songs. The system is evaluated by comparing the algorithm's results to responses from human subjects.


Features: Mel-Frequency Cepstral Coefficients (MFCCs)


Spectrograms contain hundreds of redundant, linearly spaced frequency channels. The first two figures show the spectrograms of a Bryan Adams song and a U2 song, respectively. It is hard to distinguish between these two songs based on the spectrograms alone.

Frequency domain signal
   

The next two figures display the MFCC-spectrograms for the same Bryan Adams and U2 songs. In the MFCC domain, the two songs are easier to tell apart.

MFCC domain signal
   


Because they are computed on the mel scale, MFCCs more accurately reflect the information perceived by our auditory system. The mel scale is a nonlinear frequency scale with higher resolution at low frequencies, similar to human perception.
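The nonlinearity of the mel scale can be illustrated with a short sketch. It assumes the commonly used conversion m = 2595 log10(1 + f/700); the poster does not state which mel variant was used, and the function names and the 0 to 8000 Hz range are chosen only for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mels (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 band-edge frequencies (Hz), equally spaced in mels."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]

# Illustrative range and band count (assumptions, not the poster's settings).
edges = mel_band_edges(0.0, 8000.0, 10)

# Widths in Hz between neighboring edges: narrow at low frequencies,
# wide at high frequencies.
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:8.1f} Hz  to {hi:8.1f} Hz")
```

Because the edges are equally spaced in mels, the bands are narrow in Hz at low frequencies and grow wider toward high frequencies, which is the higher low-frequency resolution described above.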


Feature Comparison: Kullback-Leibler (KL) Divergence


KL divergence measures how much one probability distribution differs from another. Here, the features of each song are modeled as a single Gaussian distribution, so a small divergence signifies that the two distributions, i.e. songs, are similar.
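For two Gaussians the KL divergence has a closed form, which the sketch below evaluates with NumPy. Several details are assumptions rather than the poster's stated method: full covariance matrices, a symmetrized divergence (KL itself is not symmetric), and synthetic random features standing in for real MFCC frames. All function names are illustrative.

```python
import numpy as np

def fit_gaussian(features):
    """Fit one Gaussian to a (n_frames, n_coeffs) feature matrix."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mean, cov

def kl_gaussian(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(p || q) for two multivariate Gaussians."""
    d = mean_p.shape[0]
    diff = mean_q - mean_p
    cov_q_inv = np.linalg.inv(cov_q)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    trace_term = np.trace(cov_q_inv @ cov_p)
    maha_term = diff @ cov_q_inv @ diff
    return 0.5 * (trace_term + maha_term - d + logdet_q - logdet_p)

def symmetric_kl(mean_p, cov_p, mean_q, cov_q):
    """Symmetrized divergence KL(p || q) + KL(q || p) (an assumed choice)."""
    return (kl_gaussian(mean_p, cov_p, mean_q, cov_q)
            + kl_gaussian(mean_q, cov_q, mean_p, cov_p))

# Synthetic stand-ins for per-song MFCC frames (not real data).
rng = np.random.default_rng(0)
song_a = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
song_b = rng.normal(loc=0.1, scale=1.0, size=(500, 4))  # close to song_a
song_c = rng.normal(loc=3.0, scale=2.0, size=(500, 4))  # far from song_a

gauss_a = fit_gaussian(song_a)
gauss_b = fit_gaussian(song_b)
gauss_c = fit_gaussian(song_c)

d_ab = symmetric_kl(*gauss_a, *gauss_b)
d_ac = symmetric_kl(*gauss_a, *gauss_c)
print(f"A vs B: {d_ab:.3f}   A vs C: {d_ac:.3f}")
```

In this sketch the pair with similar statistics (A and B) yields the smaller divergence, matching the interpretation that small values indicate similar songs.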


Graphical Feature Comparison


The top image displays two song distributions with similar mean and covariance values, thus producing a small KL value.


The KL divergence value between the two song distributions in the bottom image is larger because their means and covariances differ significantly.


Similarity Assessment


The figure below displays a three-dimensional visualization of how song similarity is assessed. Each dot represents a song from an artist.

   

First, the KL divergence was computed between every pair of songs in the data set. The plot was then generated by performing multidimensional scaling (MDS) on the resulting matrix of KL values.
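The MDS step can be sketched as follows. This is a minimal example using classical (Torgerson) MDS in NumPy. The poster does not say which MDS variant was used, so that choice is an assumption, as is treating symmetric KL values as distances. The 4 x 4 matrix is invented purely for illustration.

```python
import numpy as np

def classical_mds(dist, n_dims=3):
    """Embed n items in n_dims dimensions from an (n, n) symmetric
    dissimilarity matrix using classical (Torgerson) MDS."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared dissimilarities to get a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Dissimilarities need not be Euclidean, so clip negative eigenvalues.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical symmetric KL values for four songs (two per artist).
kl_matrix = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.5, 5.5],
    [4.0, 4.5, 0.0, 1.2],
    [5.0, 5.5, 1.2, 0.0],
])

coords = classical_mds(kl_matrix, n_dims=3)
print(coords)  # one row of 3-D coordinates per song
```

Each row of `coords` is a point that could be drawn as one dot in a three-dimensional plot; songs with small mutual KL values land close together.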


Similarity Evaluations


A survey was used to evaluate the results of our automatic song-similarity algorithm. The algorithm generated the lists of songs that subjects compared with each other.

Human subjects were also asked to rate the specific similarities and differences between various songs.


Preliminary Results


User ratings on a 10-point scale for Green Day and Aerosmith are plotted against the KL values between each test song and its comparison songs.

Ideally, a small KL value should correspond to a high user rating and vice versa. The correlation between our system's KL values and the user ratings is -0.35, where -1 would signify perfect negative correlation, the ideal outcome here.
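A correlation of this kind can be computed as in the sketch below. It assumes the Pearson correlation coefficient, since the poster does not name the type of correlation. The KL values and ratings are invented for illustration and are not the study's data.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: KL value for each test/comparison pair and the
# corresponding user rating on a 10-point scale.
kl_values = [0.5, 1.2, 2.0, 3.1, 4.4]
user_ratings = [8, 9, 5, 6, 2]

r = pearson_correlation(kl_values, user_ratings)
print(f"correlation = {r:.2f}")
```

A negative value means ratings tend to fall as KL values rise; the closer the value is to -1, the more closely the algorithm's ranking follows the listeners' judgments.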


Future Work


Expanding the data set to include a larger variety of music may enhance our system. Users may also benefit from a multi-touch display screen that will enable them to visually quantify song similarity.