Predicting Time-Varying Musical Mood Distributions

Overview


The appeal of music lies in its ability to express emotion, and it is natural for us to organize music in terms of its emotional associations. But the inherent ambiguity of emotion makes determining a single, unequivocal mood label for a piece of music unrealistic. We address this lack of specificity by modeling human response labels to music in the arousal-valence (A-V) representation of affect as a stochastic distribution. Based on our collected data, we present and evaluate methods that use multiple sets of acoustic features to estimate these mood distributions parametrically via multivariate regression. Furthermore, since the emotional content of music often varies within a song, we also estimate these A-V distributions in a time-varying context, demonstrating that our system can track changes on a short-time basis.
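
To make the regression step concrete, the sketch below shows one way multivariate linear regression (MLR) could map a per-second acoustic feature vector to the parameters of a two-dimensional Gaussian over the A-V plane. It is a minimal illustration only: the synthetic data, the 20-dimensional feature vector, and the five-parameter Gaussian encoding (mean plus the Cholesky factor of the covariance) are assumptions for this sketch, not the published implementation.

    # Minimal sketch (not the authors' code): MLR from a per-second acoustic
    # feature vector to the parameters of a 2-D Gaussian over arousal-valence.
    # All shapes, names, and the Cholesky parameterization are illustrative.
    import numpy as np

    def fit_mlr(X, Y):
        """Least-squares fit of W so that [X | 1] @ W approximates Y."""
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
        return W

    def predict_mlr(X, W):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return Xb @ W

    def to_gaussian(params):
        """Unpack a 5-vector into (mean, covariance) for the A-V plane."""
        mean = params[:2]
        # Lower-triangular Cholesky factor keeps the covariance positive semi-definite.
        L = np.array([[params[2], 0.0],
                      [params[3], params[4]]])
        return mean, L @ L.T

    # Hypothetical data: per-second feature vectors and, for each second, the
    # distribution parameters summarizing the collected A-V labels.
    rng = np.random.default_rng(0)
    n_frames, n_feats = 600, 20
    X = rng.normal(size=(n_frames, n_feats))
    Y = rng.normal(size=(n_frames, 5))

    W = fit_mlr(X[:500], Y[:500])                         # train on 500 frames
    mean, cov = to_gaussian(predict_mlr(X[500:], W)[0])   # predict one held-out frame
    print(mean, cov)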


    Collected A-V labels and distribution projections resulting from regression analysis: second-by-second labels per song (gray bullets), standard deviation of the collected labels (red ellipse), standard deviation of the MLR projection from spectral contrast features (blue dash-dot ellipse), and standard deviation of the MLR multi-level combined projection (green dashed ellipse).


    Time-varying emotion distribution regression results for three example 15-second music clips (markers become darker as time advances): second-by-second labels per song (gray bullets), standard deviation of the collected labels over 1-second intervals (red ellipse), and standard deviation of the distribution projected from acoustic features in 1-second intervals (blue ellipse).
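
The short-time tracking illustrated above can be smoothed temporally with a Kalman filter, in the spirit of the ICMLA 2010 paper listed below. The following is a minimal sketch under simplifying assumptions (random-walk state dynamics, fixed noise levels, hypothetical variable names); it is not the published system.

    # Minimal sketch: Kalman filtering of noisy per-second A-V estimates with
    # identity (random-walk) dynamics. Noise levels and names are assumptions.
    import numpy as np

    def kalman_filter_av(observations, q=0.01, r=0.1):
        """Filter a sequence of noisy 2-D A-V estimates, one per second."""
        x = observations[0].astype(float)   # state: [arousal, valence]
        P = np.eye(2)                       # state covariance
        Q = q * np.eye(2)                   # process noise
        R = r * np.eye(2)                   # observation noise
        filtered = [x.copy()]
        for z in observations[1:]:
            # Predict: the mood is assumed to drift slowly between seconds.
            P = P + Q
            # Update with the regression output for this 1-second frame.
            K = P @ np.linalg.inv(P + R)    # Kalman gain
            x = x + K @ (z - x)
            P = (np.eye(2) - K) @ P
            filtered.append(x.copy())
        return np.array(filtered)

    # Hypothetical 15 seconds of noisy per-second A-V predictions.
    rng = np.random.default_rng(1)
    raw = np.cumsum(rng.normal(scale=0.05, size=(15, 2)), axis=0)
    print(kalman_filter_av(raw))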

Published Work:


  • Scott, J., Schmidt, E. M., Prockup, M., Morton, B., and Kim, Y. E. (2012). Predicting time-varying musical emotion distributions from multi-track audio. Proceedings of the International Symposium on Computer Music Modeling and Retrieval, London, U.K.: CMMR. [PDF]

  • Schmidt, E. M. and Kim, Y. E. (2011). Modeling the acoustic structure of musical emotion with deep belief networks. NIPS Workshop on Music and Machine Learning, Sierra Nevada, Spain: NIPS-MML. [Oral Presentation]

  • Schmidt, E. M. and Kim, Y. E. (2011). Modeling musical emotion dynamics with conditional random fields. Proceedings of the 2011 International Society for Music Information Retrieval Conference, Miami, FL: ISMIR. [PDF]

  • Speck, J. A., Schmidt, E. M., Morton, B. G., and Kim, Y. E. (2011). A comparative study of collaborative vs. traditional annotation methods. Proceedings of the 2011 International Society for Music Information Retrieval Conference, Miami, FL: ISMIR. [PDF]

  • Schmidt, E. M. and Kim, Y. E. (2011). Learning emotion-based acoustic features with deep belief networks. Proceedings of the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY: WASPAA. [PDF]

  • Schmidt, E. M. and Kim, Y. E. (2010). Prediction of time-varying musical mood distributions using Kalman filtering. Proceedings of the 2010 IEEE International Conference on Machine Learning and Applications, Washington, D.C.: ICMLA. [PDF]

  • Schmidt, E. M. and Kim, Y. E. (2010). Prediction of time-varying musical mood distributions from audio. Proceedings of the 2010 International Society for Music Information Retrieval Conference, Utrecht, Netherlands: ISMIR. [PDF]