Research Day Poster April 17, 2007
By: Donald S. Williamson
Advisor: Dr. Youngmoo Kim
This project explores a computer algorithm that assesses song similarity based solely on the audio waveform. The algorithm extracts acoustic features that represent timbre, or sound quality, and assesses similarity by measuring the quantitative distance between the features of different songs. We evaluate the system by comparing its results with responses from human subjects.
Features: Mel-Frequency Cepstral Coefficients (MFCCs)
Spectrograms contain hundreds of redundant, linearly spaced frequency channels. The first two figures show the spectrograms of a Bryan Adams song and a U2 song, respectively. As the figures show, it is hard to distinguish between the two songs from their spectrograms alone.
Frequency domain signal
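To make "linearly spaced frequency channels" concrete, here is a minimal sketch of a power spectrogram. The poster does not state its frame length, hop size, or window, so `frame_len`, `hop`, and the Hann window below are assumptions, and a naive DFT stands in for an FFT to keep the sketch dependency-free. Every frame has `frame_len // 2 + 1` bins spaced exactly `sample_rate / frame_len` Hz apart.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def power_spectrogram(signal, frame_len=512, hop=256):
    """Return a list of frames, each holding frame_len // 2 + 1 power values.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        frame = []
        for k in range(n_bins):
            # DFT coefficient of bin k, then its squared magnitude (power)
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(chunk))
            frame.append(abs(acc) ** 2)
        frames.append(frame)
    return frames


def bin_frequencies(sample_rate, frame_len=512):
    """Frequency (Hz) of each bin; bins are evenly spaced sample_rate / frame_len apart."""
    return [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
```

With the assumed `frame_len=512`, each frame has 257 channels, which is the "hundreds of linearly spaced channels" the paragraph describes; at a 16 kHz sample rate they are 31.25 Hz apart from 0 Hz up to 8 kHz.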
The next two figures show the MFCC representations of the same Bryan Adams and U2 songs. In the MFCC domain, the two songs are easier to tell apart.
MFCC domain signal
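The step from a spectrogram frame to MFCCs can be sketched as: pool the linear power bins into triangular mel-spaced bands, take the log of each band energy, and apply a DCT-II to decorrelate the result. This is a minimal sketch under common conventions that the poster does not specify: the `2595 * log10(1 + f / 700)` mel formula, 26 filters, 13 coefficients, and an unnormalized DCT are all assumptions, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale.

    n_bins is the number of power-spectrum bins (frame_len // 2 + 1).
    """
    frame_len = 2 * (n_bins - 1)
    top_mel = hz_to_mel(sample_rate / 2)
    # n_filters + 2 edge points, equally spaced in mel, mapped to spectrum bins
    mel_points = [top_mel * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [min(int(math.floor(mel_to_hz(m) * frame_len / sample_rate)), n_bins - 1)
            for m in mel_points]
    filters = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, centre):  # rising edge
            weights[k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right + 1):  # falling edge
            weights[k] = (right - k) / max(right - centre, 1)
        filters.append(weights)
    return filters


def mfcc_frame(power_frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one power-spectrum frame (e.g. one frame of a spectrogram)."""
    filters = mel_filterbank(n_filters, len(power_frame), sample_rate)
    # Mel-band energies, log-compressed; the small floor avoids log(0)
    log_energies = [math.log(max(sum(w * p for w, p in zip(f, power_frame)), 1e-10))
                    for f in filters]
    # Unnormalized DCT-II of the log energies; keep the first n_coeffs
    m = len(log_energies)
    return [sum(e * math.cos(math.pi * c * (i + 0.5) / m) for i, e in enumerate(log_energies))
            for c in range(n_coeffs)]
```

One consequence of the log-then-DCT design: scaling the whole frame by a constant gain (a louder recording) shifts only coefficient 0, leaving the remaining coefficients, which describe spectral shape, unchanged.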
MFCCs more closely reflect the information perceived by the human auditory system because they use the mel scale. The mel scale is a nonlinear frequency scale with higher resolution at low frequencies, similar to human pitch perception.
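The nonlinearity of the mel scale can be seen by splitting a frequency range into bands of equal width in mels and converting the band edges back to Hz. This sketch assumes the widely used `2595 * log10(1 + f / 700)` formula; the poster does not say which mel formula was used, and `mel_band_edges` is a hypothetical helper.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mels with the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_max_hz, n_bands):
    """Hz edges of n_bands bands that are equally wide on the mel scale."""
    top = hz_to_mel(f_max_hz)
    return [mel_to_hz(top * i / n_bands) for i in range(n_bands + 1)]


edges = mel_band_edges(8000.0, 10)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
for lo, w in zip(edges, widths):
    print(f"band starting at {lo:7.1f} Hz is {w:6.1f} Hz wide")
```

The printed widths grow steadily with frequency: equal steps in mels cover narrow Hz ranges at the low end and wide ranges at the high end, which is the "higher resolution at low frequencies" property.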
Feature Comparison: Kullback-Leibler (KL) Divergence
KL divergence measures how much two probability distributions differ. Here, each song's features are modeled by a single Gaussian distribution, and a small divergence between two songs' distributions signifies that the songs are similar.
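For single Gaussians, KL divergence has a closed form, so no numerical integration is needed. The sketch below fits a mean vector and covariance matrix to a song's feature frames, then evaluates $KL(p\|q) = \tfrac{1}{2}\left[\operatorname{tr}(\Sigma_q^{-1}\Sigma_p) + (\mu_q-\mu_p)^\top\Sigma_q^{-1}(\mu_q-\mu_p) - d + \ln\frac{|\Sigma_q|}{|\Sigma_p|}\right]$. KL divergence is not symmetric; the poster does not say how that was handled, so the `symmetric_kl` helper (the sum of both directions) is an assumption, as are all function names. The small Cholesky helpers are written out only to keep the sketch stdlib-only.

```python
import math


def fit_gaussian(frames):
    """Mean vector and sample covariance matrix of a list of feature vectors (needs >= 2)."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    d = len(L)
    y = [0.0] * d
    for i in range(d):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, d))) / L[i][i]
    return x


def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) for multivariate Gaussians p = N(mu_p, cov_p), q = N(mu_q, cov_q)."""
    d = len(mu_p)
    Lp, Lq = cholesky(cov_p), cholesky(cov_q)
    logdet_p = 2 * sum(math.log(Lp[i][i]) for i in range(d))
    logdet_q = 2 * sum(math.log(Lq[i][i]) for i in range(d))
    # trace(cov_q^-1 cov_p): solve against each column of cov_p, keep the diagonal
    trace = sum(solve(Lq, [cov_p[r][j] for r in range(d)])[j] for j in range(d))
    diff = [mq - mp for mp, mq in zip(mu_p, mu_q)]
    mahal = sum(a * b for a, b in zip(diff, solve(Lq, diff)))
    return 0.5 * (trace + mahal - d + logdet_q - logdet_p)


def symmetric_kl(p, q):
    """KL(p || q) + KL(q || p), where p and q are (mean, cov) pairs."""
    return kl_gaussian(*p, *q) + kl_gaussian(*q, *p)
```

In use, each song's MFCC frames would go through `fit_gaussian` once, and every pair of songs would then be compared with `symmetric_kl`; identical distributions give 0.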
Graphical Feature Comparison
The top image shows two song distributions with similar means and variances, which produce a small KL value.
In the bottom image, the KL divergence between the two song distributions is higher because their means and variances differ significantly.
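The two panels can be reproduced numerically with the one-dimensional form of the Gaussian KL divergence. The parameter values below are made up for illustration and are not read from the poster's figures.

```python
import math


def kl_1d(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for one-dimensional Gaussians N(mu, var)."""
    return 0.5 * (var_p / var_q + (mu_q - mu_p) ** 2 / var_q - 1.0 + math.log(var_q / var_p))


# Hypothetical parameters, not taken from the poster's figures.
similar = kl_1d(0.0, 1.0, 0.1, 1.1)    # close means, close variances (top panel)
different = kl_1d(0.0, 1.0, 3.0, 4.0)  # means and variances far apart (bottom panel)
print(f"similar pair:   {similar:.3f}")
print(f"different pair: {different:.3f}")
```

The similar pair gives roughly 0.007 and the different pair roughly 1.44: the squared mean difference and the variance ratio both push the divergence up.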
The figure below displays a three-dimensional visualization of how song similarity is assessed. Each dot represents a song from an artist.
First, the KL divergence was computed between every pair of songs in the data set. The plots were then generated by performing multidimensional scaling (MDS) on the resulting matrix of KL values.
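The MDS step turns the pairwise KL matrix into 3-D coordinates whose distances approximate the KL values. The poster does not say which MDS variant was used, so this sketch assumes classical (Torgerson) MDS, computing the top eigenvectors by shifted power iteration with deflation to stay stdlib-only. It also assumes a symmetric dissimilarity matrix (for example, symmetrized KL). Because KL is not a true metric, some eigenvalues may be negative; those axes are clipped to zero.

```python
import math
import random


def classical_mds(dist, k=3, iters=1000, seed=0):
    """Embed n items in k dimensions from a symmetric n x n dissimilarity matrix.

    Classical MDS: double-centre the squared dissimilarities, then take the
    top-k eigenvectors, found here by power iteration with deflation.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # B = -1/2 * J D^2 J, where J is the centring matrix
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # Shift by a bound on the spectral radius (+1) so B + shift*I is positive
    # definite and power iteration finds the largest algebraic eigenvalue
    shift = max(sum(abs(x) for x in row) for row in B) + 1.0
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the unshifted B
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues contribute nothing
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate: remove the eigen-component just found
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

When the dissimilarities really are Euclidean distances, classical MDS recovers the original configuration up to rotation and reflection; with KL values, the 3-D plot is an approximation in which nearby dots are songs with small divergence.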
The similarity algorithm generated the lists of songs that were compared with one another in the survey.
This survey was used to evaluate the results of our automatic song-similarity algorithm. Human subjects were also asked to rate the specific similarities and differences between various songs.
User ratings on a 10-point scale for Green Day and Aerosmith are plotted against the KL values between each test song and its comparison songs.
Ideally, a small KL value should correspond to a high user rating, and vice versa. The correlation between our system's KL values and the user ratings is -0.35; a value of -1 would indicate perfect (negative) correlation.
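The agreement measure can be sketched as a correlation coefficient between the KL values and the ratings. The poster does not name the coefficient, so Pearson's correlation is an assumption here, and the numbers in the example are made up for illustration; they are not the survey data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up numbers for illustration only; these are not the survey data.
kl_values = [0.8, 1.5, 2.1, 3.0, 4.2]
ratings = [9, 7, 6, 4, 2]
print(round(pearson(kl_values, ratings), 3))
```

A coefficient of -1 would mean the ratings fall exactly linearly as KL rises; values closer to 0, such as the reported -0.35, mean the KL values explain only part of how listeners rated the songs.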
Expanding the data set to include a wider variety of music may improve our system. Users may also benefit from a multi-touch display that would let them assess song similarity visually.