Modeling Instruments As Dynamic Textures

Overview
In this work we introduce the concept of modeling instrument tones as dynamic textures. Dynamic textures are multi-dimensional signals whose temporally stationary characteristics allow them to be modeled as observations from a linear dynamical system (LDS). Previous dynamic-texture research has shown that sequences with these characteristics can be modeled and re-synthesized well by an LDS. The short-time Fourier transform (STFT) coefficients of certain instrument tones (e.g. piano, guitar) exhibit similar stationary properties in their temporal evolution. We show that these instruments can be re-synthesized with high fidelity using an LDS, even with low-dimensional models. In addition, we analyze the connections between musical qualities such as articulation and the LDS model parameters. Furthermore, we investigate how the model parameters can be used to control and alter musical qualities.
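For readers unfamiliar with the model, an LDS is commonly written in the state-space form sketched here. The notation is an assumption borrowed from the standard dynamic-texture formulation and is not taken from this page: y_t is one STFT frame, x_t is the hidden state (whose dimension n is the model order), A is the dynamics matrix, C is the observation matrix, and v_t, w_t are noise terms assumed to be Gaussian.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + v_t, & v_t &\sim \mathcal{N}(0, Q),\\
y_t     &= C\,x_t + w_t, & w_t &\sim \mathcal{N}(0, R),
\end{aligned}
\qquad x_t \in \mathbb{R}^{n},\; y_t \in \mathbb{R}^{m},\; n \ll m .
```

Under this reading, re-synthesis amounts to choosing an initial state x_0, iterating the first equation, and mapping each state to an STFT frame with the second.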

This page provides a multitude of audio samples produced by applying our LDS modeling approach to a corpus of instrument samples.


Audio Examples for Individually Learned LDS Models
The tables below allow the LDS-reconstructed audio files to be compared with the original recordings from which they were analyzed. The synthetic examples were created by varying the following parameters:

  • Model Order - The number of dimensions in the hidden state vector used to regenerate the signal's short-time Fourier transform (STFT). The model order is typically much smaller than the size of the discrete Fourier transform (DFT) used to compute the STFT.
  • Hankel Observations - Varying the number of Hankel observations allows each hidden state variable to account for not only the current output but also future outputs. We find that using additional observations reduces the noise in the synthetic signals.
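The two parameters above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a real-valued feature matrix `Y` (features x frames, for example STFT magnitudes), the SVD-based fit commonly used for dynamic textures, and a least-squares estimate of the dynamics matrix. The function names `stack_hankel`, `fit_lds`, and `synthesize`, and the synthetic toy data, are hypothetical.

```python
import numpy as np

def stack_hankel(Y, n_obs):
    """Stack n_obs consecutive frames of Y (features x frames) on top of each
    other, so each column holds the current frame plus n_obs - 1 future frames."""
    n_cols = Y.shape[1] - n_obs + 1
    return np.vstack([Y[:, i:i + n_cols] for i in range(n_obs)])

def fit_lds(Y, order, n_obs=1):
    """Fit (A, C, x0) by SVD. n_obs=1 is the plain SVD fit; n_obs>1 is the
    Hankel variant. `order` is the hidden-state dimension (model order)."""
    n_features = Y.shape[0]
    H = stack_hankel(Y, n_obs)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    C = U[:n_features, :order]             # rows that map state -> current frame
    X = s[:order, None] * Vt[:order]       # hidden states, one column per frame
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])  # least-squares dynamics matrix
    return A, C, X[:, 0]

def synthesize(A, C, x0, n_frames):
    """Roll the noise-free LDS forward and return a (features x frames) matrix."""
    x, frames = x0, []
    for _ in range(n_frames):
        frames.append(C @ x)
        x = A @ x
    return np.stack(frames, axis=1)

# Toy data (assumption): a decaying oscillation observed through 16 features.
rng = np.random.default_rng(0)
theta = 0.3
A_true = 0.97 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
C_true = rng.standard_normal((16, 2))
Y = synthesize(A_true, C_true, np.array([1.0, 0.5]), 60)

A_svd, C_svd, x0_svd = fit_lds(Y, order=2)                    # "SVD, order 2"
A_hankel, C_hankel, x0_hankel = fit_lds(Y, order=2, n_obs=10)  # "Hankel, order 2, 10 observations"
Y_svd = synthesize(A_svd, C_svd, x0_svd, Y.shape[1])
Y_hankel = synthesize(A_hankel, C_hankel, x0_hankel, Y.shape[1])
```

On real recordings the matrix `Y` would come from an STFT of the tone, and the model order would be far smaller than the number of frequency bins, as in the order 25 and order 50 examples listed on this page.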



NOTE: The examples cannot be played in Firefox, since it does not license MP3 decoding.


Piano Samples

Reconstruction types for Piano - Hard Key Press - G3:
  • Original
  • SVD, order 25
  • SVD, order 50
  • Hankel, order 25, 10 observations
  • Hankel, order 50, 25 observations

Reconstruction types for Piano - Medium Key Press - G3:
  • Original
  • SVD, order 25
  • SVD, order 50
  • Hankel, order 25, 10 observations
  • Hankel, order 50, 25 observations

Reconstruction types for Piano - Soft Key Press:
  • Original
  • SVD, order 25
  • SVD, order 50
  • Hankel, order 25, 10 observations
  • Hankel, order 50, 25 observations


Marimba Samples

Reconstruction types for Marimba - A3:
  • Original
  • SVD, order 25
  • SVD, order 50

Reconstruction types for Marimba - A3:
  • Original
  • SVD, order 25
  • SVD, order 50

Reconstruction types for Marimba - A3:
  • Original
  • SVD, order 25
  • SVD, order 50


Audio Examples For Jointly Learned Models
To reduce the parameter space required for our LDS models, we employ a joint modeling approach in which a common dynamics matrix describes the different articulations of a particular note. Each articulation is then produced from the common dynamics matrix by varying only the observation matrix and the initial state value. The audio examples below compare each original note articulation with the tone reconstructed using the joint approach.

The tones were modeled using the following techniques:

  • Performing a joint singular value decomposition (SVD) on the STFT matrices for each articulation to yield the hidden state vectors for each tone.
  • Estimating the resulting dynamics matrix from the hidden state vectors.
  • Using a reduced model order of 25 for the hidden state variables.
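The steps above can be sketched in NumPy under some loudly stated assumptions. The page does not say how the STFT matrices are combined, so this sketch takes one possible reading: the real-valued matrices are concatenated along the time axis before the SVD, the common dynamics matrix is fit by least squares over transitions within each tone, and each tone's observation matrix is then refit against its rolled-out states. The names `rollout` and `fit_joint_lds` and the toy data are hypothetical, and a model order of 2 is used only to keep the example small.

```python
import numpy as np

def rollout(A, x0, n_frames):
    """Iterate x_{t+1} = A x_t from x0 and return the states as columns."""
    X = np.empty((x0.size, n_frames))
    x = x0
    for t in range(n_frames):
        X[:, t] = x
        x = A @ x
    return X

def fit_joint_lds(tones, order):
    """tones: list of real (features x frames) matrices, one per articulation.
    Returns a common dynamics matrix A and a per-tone list of (C, x0)."""
    lengths = [Y.shape[1] for Y in tones]
    # Joint SVD (assumed here to mean: SVD of the time-concatenated matrices).
    U, s, Vt = np.linalg.svd(np.hstack(tones), full_matrices=False)
    X_all = s[:order, None] * Vt[:order]
    states = np.split(X_all, np.cumsum(lengths)[:-1], axis=1)
    # Common dynamics: least squares over within-tone transitions only,
    # so no transition crosses the boundary between two recordings.
    past = np.hstack([X[:, :-1] for X in states])
    future = np.hstack([X[:, 1:] for X in states])
    A = future @ np.linalg.pinv(past)
    models = []
    for Y, X in zip(tones, states):
        x0 = X[:, 0]                       # per-tone initial state
        X_hat = rollout(A, x0, Y.shape[1])
        C = Y @ np.linalg.pinv(X_hat)      # per-tone observation matrix
        models.append((C, x0))
    return A, models

# Toy data (assumption): three "articulations" sharing one set of dynamics,
# differing in gain and duration.
rng = np.random.default_rng(0)
theta = 0.3
A_true = 0.97 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
C_true = rng.standard_normal((16, 2))
tones = [gain * C_true @ rollout(A_true, np.array([1.0, 0.5]), n)
         for gain, n in [(1.0, 50), (0.6, 40), (0.3, 30)]]

A, models = fit_joint_lds(tones, order=2)
recon = [C @ rollout(A, x0, Y.shape[1]) for (C, x0), Y in zip(models, tones)]
```

The parameter saving comes from storing one dynamics matrix for all articulations of a note, with only an observation matrix and an initial state kept per articulation.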

Articulations for Piano - G3:
  • Original Hard Key Press
  • Reconstructed Hard Key Press
  • Original Medium Key Press
  • Reconstructed Medium Key Press
  • Original Soft Key Press
  • Reconstructed Soft Key Press