Music is often called the “language of emotion”, and one’s mood can be transformed by a selection of songs. This project explores a method of automatically generating playlists designed to change a listener’s mood. We use data from MoodSwings, a collaborative game for music mood label collection, to predict the emotional content of music. MoodSwings represents emotion using the arousal-valence (A-V) space, where arousal reflects intensity and valence indicates emotional polarity. We then present an automated playlist generation algorithm targeting various trajectories in the A-V space. Human subjects then evaluate the computer-generated playlists through a survey which measures the accuracy and effectiveness of the playlists in transforming the listener’s mood. We describe the mood data collection process and applications of the mood prediction system to automated playlist generation.
The automatic playlist generation algorithm presented is based on the A-V representation of emotions in songs. Valence measures the polarity of an emotion (i.e., positivity or negativity). Arousal represents the intensity (low vs. high). This A-V space is used for our data collection tool, MoodSwings.
MoodSwings is a collaborative game incorporating two players’ judgements of the moods of songs into gameplay. Players are partnered anonymously over the internet, with the goal of dynamically and continuously reaching agreement on the mood of five 30-second song clips drawn from a music database. The game board is analogous to the A-V space.
Using MoodSwings data, we attempt to quantify a song’s mood. Support vector regression is a regression technique that can model nonlinear relationships between dependent and independent variables. Mel-frequency cepstral coefficients (MFCCs) are used to account for the nonlinear frequency sensitivity of the human ear.
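The nonlinear frequency sensitivity that MFCCs account for can be illustrated with the mel scale itself. The sketch below uses one common Hz-to-mel formula (2595 · log10(1 + f/700)); the source does not say which variant or which MFCC settings were used, so the formula, the function names, and the 0 to 8000 Hz range are assumptions for illustration only.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (one common formula; an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Return `count` frequencies in Hz that are evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]


# Hypothetical range: ten points between 0 Hz and 8000 Hz.
centers = mel_spaced_frequencies(0.0, 8000.0, 10)

# Gaps between neighbouring points, measured in Hz.
gaps = [b - a for a, b in zip(centers, centers[1:])]

for freq, gap in zip(centers, gaps):
    print(f"{freq:8.1f} Hz, next point is {gap:7.1f} Hz away")
```

Points that are equally spaced in mels grow further apart in Hz as frequency rises, which mirrors the ear's coarser resolution at high frequencies. A full MFCC pipeline would additionally apply a filterbank at such frequencies, take logarithms, and apply a discrete cosine transform.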
After collecting A-V labels, songs following predetermined A-V trajectories are selected. For each trajectory, a list of songs within ±10 units of the trajectory is returned. The trajectory is broken into 5 segments, with songs chosen to match each segment. These songs form the automatically generated playlist.
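The selection step above might be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the trajectory is taken to be a straight line, each of the 5 segments is represented by its midpoint, "within ±10 units" is read as Euclidean distance of at most 10 in the A-V plane, and the closest unused song is picked per segment. The song titles, coordinates, and function names are hypothetical.

```python
import math


def segment_targets(start, end, n_segments=5):
    """Midpoints of n_segments equal pieces of a straight line from start to end.

    Points are (valence, arousal) pairs. The straight-line shape is an assumption.
    """
    (v0, a0), (v1, a1) = start, end
    targets = []
    for i in range(n_segments):
        t = (i + 0.5) / n_segments
        targets.append((v0 + t * (v1 - v0), a0 + t * (a1 - a0)))
    return targets


def generate_playlist(songs, start, end, n_segments=5, tolerance=10.0):
    """Pick one song per segment from `songs` (title -> (valence, arousal)).

    A song qualifies if it lies within `tolerance` units of the segment target.
    A segment with no qualifying song yields None.
    """
    playlist = []
    used = set()
    for tv, ta in segment_targets(start, end, n_segments):
        candidates = []
        for title, (v, a) in songs.items():
            distance = math.hypot(v - tv, a - ta)
            if title not in used and distance <= tolerance:
                candidates.append((distance, title))
        if not candidates:
            playlist.append(None)
            continue
        _, best = min(candidates)  # closest qualifying song
        used.add(best)
        playlist.append(best)
    return playlist


# Hypothetical predicted A-V values for six songs.
example_songs = {
    "song_a": (-42.0, -38.0),
    "song_b": (-18.0, -25.0),
    "song_c": (3.0, -2.0),
    "song_d": (22.0, 19.0),
    "song_e": (45.0, 41.0),
    "song_f": (80.0, -80.0),  # far from the trajectory, never selected
}

# Trajectory from low valence and low arousal to high valence and high arousal.
print(generate_playlist(example_songs, (-50.0, -50.0), (50.0, 50.0)))
```

Returning None for an empty segment makes gaps in the song database visible rather than silently reusing a song; a different policy, such as widening the tolerance, would be equally reasonable.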
- Mean of overall rating for all the playlists was 3.6111 (scale from 1 to 7).
- Mean distance between the survey response values and the mean of the predicted valence-arousal values of the songs was 0.4152.
- Playlists from the ‘Group C’ trajectory performed the best.
- This trajectory corresponds to playlists attempting to alter the mood from the “depressing” quadrant to the “joyful” quadrant.
- Mean transition value of 4.3611 for Group C (scale from 1 to 7).
Improving Data Collection
Addressing some known issues could produce more meaningful data from MoodSwings. Two problems skew the prediction results: initial clustering of labels at the beginning of a song clip, as shown in Figure 3, and insufficient data for many of the songs, caused by a lack of game replay.
Conclusions & Future Work
We seek to avoid clustering of player responses near the A-V origin at the beginning of clips:
- Determine appropriate song clip time intervals for analysis
We would also like to encourage replay with more entertaining gameplay:
- Use more, shorter song clips to boost excitement levels
- Enhance competitive aspects of gameplay
The playlist generation method may also benefit from the following in the future:
- Incorporating personal tastes in music
- Modifying trajectories of playlists