Automatic Multi-Track Mixing Using Linear Dynamical Systems

This page presents the audio examples referenced in Section 5 of the paper "Automatic Multi-Track Mixing Using Linear Dynamical Systems", published in the proceedings of the 2011 Sound and Music Computing conference. [PDF]

Abstract
Over the past several decades music production has evolved from something that was only possible with multi-room, multi-million dollar studios into the province of the average person's living room. New tools for digital production have revolutionized the way we consume and interact with music on a daily basis. We propose a system based on a structured audio framework that can generate a basic mix-down of a set of multi-track audio files using parameters learned through supervised machine learning. Given the new surge of mobile content consumption, we extend this system to operate on a mobile device as an initial measure towards an integrated interactive mixing platform for multi-tracked music.

Overview
Our dataset for this work is a set of multi-track stems from the RockBand video game. Since we do not have the actual gain parameters from the studio sessions used to mix the tracks together, we estimate them by minimizing the error between the spectrum of the target track and a weighted sum of the individual instrument spectra. These are the baseline values we use as labels for a supervised machine learning task.
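The estimation step described above can be illustrated with a minimal sketch. It assumes per-frame magnitude spectra and an unconstrained least-squares fit; the paper may constrain or smooth the weights differently, and the name `estimate_frame_weights` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_frame_weights(track_spectra, target_spectrum):
    """Least-squares mixing weights for one analysis frame.

    track_spectra: (n_bins, n_tracks) magnitude spectra, one column per instrument.
    target_spectrum: (n_bins,) magnitude spectrum of the target mix.
    """
    # Minimize ||track_spectra @ w - target_spectrum||^2 over w (assumption: no constraints).
    weights, *_ = np.linalg.lstsq(track_spectra, target_spectrum, rcond=None)
    return weights

# Synthetic check: build a target from known weights, then recover them.
rng = np.random.default_rng(0)
track_spectra = rng.random((513, 4))
true_weights = np.array([0.8, 0.5, 0.3, 0.6])
target_spectrum = track_spectra @ true_weights
estimated = estimate_frame_weights(track_spectra, target_spectrum)
```

On real recordings the magnitude spectra of the stems do not add exactly to the spectrum of the mix, so the recovered weights are approximations rather than the true studio gains.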

A linear dynamical system (LDS) and a multiple linear regression (MLR) model are then trained to estimate the mixing coefficients for a set of multi-track data. Each system is trained separately on a corpus of 48 songs using leave-one-out cross-validation. The following features are used as input to each system.

  • Spectral centroid
  • RMS energy
  • Slope and intercept of a line fit to the spectrum of each frame
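As a rough illustration of the features listed above, the sketch below computes them for a single frame of one track. The frame length, the Hann window, and fitting the line to linear magnitude against frequency in Hz are all assumptions, since this page does not state those details; `frame_features` is a hypothetical name.

```python
import numpy as np

def frame_features(frame, sample_rate):
    """Return [spectral centroid (Hz), RMS energy, spectral slope, spectral intercept]."""
    # Magnitude spectrum of the windowed frame (Hann window is an assumption).
    magnitude = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Spectral centroid: magnitude-weighted mean frequency.
    total = magnitude.sum()
    centroid = (freqs * magnitude).sum() / total if total > 0 else 0.0

    # RMS energy of the time-domain frame.
    rms = np.sqrt(np.mean(frame ** 2))

    # Slope and intercept of a first-order fit to the magnitude spectrum.
    slope, intercept = np.polyfit(freqs, magnitude, 1)
    return np.array([centroid, rms, slope, intercept])

# Example: one 1024-sample frame of a 1 kHz sine at 44.1 kHz.
sample_rate = 44100
t = np.arange(1024) / sample_rate
features = frame_features(np.sin(2 * np.pi * 1000.0 * t), sample_rate)
```

For the sine example the centroid lands close to 1 kHz and the RMS close to 0.707, which is a quick sanity check before running the function on real stems.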

Fig. 1 System diagram of multi-track weight prediction.
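The training setup can be sketched for the regression model as follows. This is a minimal illustration that assumes the held-out unit is a whole song and uses ordinary least squares with a bias term; the function names and the synthetic corpus are hypothetical, and the LDS model is not covered here.

```python
import numpy as np

def fit_mlr(features, weights):
    """Fit multiple linear regression with a bias term via least squares."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, weights, rcond=None)
    return coef

def predict_mlr(coef, features):
    """Predict mixing weights for each frame from its feature vector."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    return design @ coef

def leave_one_song_out(song_features, song_weights):
    """Train on all songs but one, predict the held-out song, repeat for every song."""
    predictions = []
    for held_out in range(len(song_features)):
        train_x = np.vstack([x for i, x in enumerate(song_features) if i != held_out])
        train_y = np.vstack([y for i, y in enumerate(song_weights) if i != held_out])
        coef = fit_mlr(train_x, train_y)
        predictions.append(predict_mlr(coef, song_features[held_out]))
    return predictions

# Synthetic stand-in for the corpus: 5 songs, 20 frames each, 4 features, 3 tracks.
rng = np.random.default_rng(1)
true_map = rng.normal(size=(4, 3))
song_features = [rng.normal(size=(20, 4)) for _ in range(5)]
song_weights = [x @ true_map + 0.5 for x in song_features]
predictions = leave_one_song_out(song_features, song_weights)
```

Because the synthetic weights are an exact linear function of the features, the held-out predictions match the targets; on real data the gap between them is the cross-validation error.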

Audio Examples
Each table below contains the audio examples for one song. The tracks for each song were generated as follows:

  • Source Mix - The audio content obtained from the gaming console.
  • Baseline - This file was generated from the estimated ground truth weights calculated in Section 4.1.
  • Average Tracks - Each track was averaged to form this mix. This is an example of an essentially unmixed or poorly mixed track.
  • LDS Predicted Weights - This track was generated using the estimated weights from the Linear Dynamical System model in Section 4.5.
  • MLR Predicted Weights - This track was generated using the predicted weights from the Multiple Linear Regression model in Section 4.4.
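The difference between the averaged and weighted mixes listed above comes down to which gains are applied before summing the tracks. The sketch below assumes time-aligned tracks stored as rows of an array; `mix_tracks` and `average_mix` are hypothetical names, not functions from the paper.

```python
import numpy as np

def mix_tracks(tracks, weights):
    """Weighted sum of time-aligned tracks.

    tracks: (n_tracks, n_samples) audio.
    weights: (n_tracks,) for static gains, or (n_tracks, n_samples) for time-varying gains.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.ndim == 1:
        # Broadcast one gain per track across all samples.
        weights = weights[:, np.newaxis]
    return np.sum(weights * tracks, axis=0)

def average_mix(tracks):
    """Equal-weight mix, corresponding to the 'Average Tracks' example."""
    n_tracks = tracks.shape[0]
    return mix_tracks(tracks, np.full(n_tracks, 1.0 / n_tracks))

tracks = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0]])
weighted = mix_tracks(tracks, [0.5, 1.0])   # -> [3.5, 3.0, 2.5]
averaged = average_mix(tracks)              # -> [2.0, 2.0, 2.0]
```

Predicted weights that change from frame to frame would first need to be interpolated to one gain per sample before being passed in as the two-dimensional `weights` form.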




NOTE: The examples cannot be played in Firefox, since it does not license MP3 decoding.

Mixing Method B-52s - Roam
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights


Mixing Method Blue Oyster Cult - Don't Fear the Reaper
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights




Mixing Method Dream Theater - Constant Motion
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights


Mixing Method Rush - Closer To the Heart
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights


Mixing Method Stone Temple Pilots - Interstate Love Song
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights


Mixing Method The Who - Who Are You
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights


Mixing Method Weezer - Say It Ain't So
Source Mix
Baseline
Average Tracks
LDS Predicted Weights
MLR Predicted Weights