Enabling Humanoids to Dance to Music

Motivation
Humanoids capable of dancing autonomously could lead to novel and interesting interactions with humans. For example, such robots could react seamlessly to changes in music, allowing them to dance during live performances. They could also serve as dance prototyping tools, letting choreographers test new motion sequences and refine them rapidly. To produce a satisfactory dance, the robot must generate finely controlled, smooth motions that can be concatenated in a sensible order. Motion-capture data might be needed to ensure that the motions look especially human-like. The robot must also identify high-level features in music, such as beat locations and emotional content, so that it can move congruently with the audio. Finally, to function in the real world, the robot must cope with noisy acoustic environments and must run causally and in real time.
Music Information Retrieval Systems
For the robot to respond to musical audio, it must be able to extract high-level features from an acoustic waveform. These features, such as beat locations, tempo, and emotional content, can be analyzed to determine a sequence of motions that is congruent with the audio.
Beat tracker
The basic structure of our beat tracker is as follows. As each audio frame is passed into the system, it is divided into subbands by a triangle filterbank. Triangle filters are used because they make the frequency division easy to calculate in the frequency domain. The energy in each subband is computed and then autocorrelated. The autocorrelation of a periodic signal is highest at delays corresponding to its period, and a piece of music can be treated as periodic with a period equal to the interval between beats. The peaks in the autocorrelation can therefore be used to find the tempo of the audio. Finally, the total energy in the frame is calculated and compared to the energies of previous frames. To decide whether the current frame contains a beat, the system uses a heuristic that combines the energy of the current frame relative to previous frames with the spacing of previous beat locations relative to the tempo.
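The autocorrelation step can be illustrated with a minimal sketch. It is not our beat tracker: it skips the triangle filterbank and the beat-decision heuristic, and it works on a single synthetic energy envelope. The function names, the 60 to 180 BPM search range, and the frame rate are assumptions chosen for illustration only.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation of a mean-removed sequence, lags 0..max_lag."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    return [
        sum(centered[i] * centered[i + lag] for i in range(len(centered) - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_tempo_bpm(energies, frame_rate_hz, min_bpm=60.0, max_bpm=180.0):
    """Estimate tempo from the strongest autocorrelation peak in a BPM range.

    energies: per-frame energy values (hypothetical input format).
    frame_rate_hz: number of energy frames per second.
    """
    # Convert the BPM search range into a range of lags, measured in frames.
    min_lag = max(1, round(frame_rate_hz * 60.0 / max_bpm))
    max_lag = round(frame_rate_hz * 60.0 / min_bpm)
    if max_lag >= len(energies):
        raise ValueError("need more frames than the longest lag searched")
    acf = autocorrelation(energies, max_lag)
    # The lag with the largest autocorrelation is taken as the beat period.
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return 60.0 * frame_rate_hz / best_lag


# Synthetic envelope: an energy burst every 50 frames at 100 frames per second,
# which corresponds to one beat every 0.5 s.
frame_rate_hz = 100.0
energies = [1.0 if i % 50 == 0 else 0.1 for i in range(1000)]
tempo = estimate_tempo_bpm(energies, frame_rate_hz)
print(tempo)
```

Restricting the search to a plausible tempo range is one simple way to avoid picking a multiple of the true beat period; a real system would also need to handle envelopes far less regular than this synthetic one.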
Our beat tracker has achieved >98% accuracy on clean, digital audio. We are currently investigating ways to make the beat tracker function accurately in noisy acoustic environments. We have achieved >92% accuracy with a baseline system that uses spectral subtraction to eliminate noisy spectral bins, where a bin is judged 'noisy' by comparison to a noise threshold. We are also studying more sophisticated systems, including the use of Gaussians to model the audio in such a way that the underlying beat structure is clear.
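As a rough illustration of the baseline idea, the sketch below applies threshold-based spectral subtraction to one frame of magnitude-spectrum bins. The function name, the per-bin noise estimate, and the threshold ratio of 1.5 are assumptions made for this example; they are not the parameters of our system.

```python
def spectral_subtract(magnitudes, noise_estimate, threshold_ratio=1.5):
    """Zero bins near the noise floor and subtract the noise from the rest.

    magnitudes: magnitude spectrum of one frame (hypothetical input format).
    noise_estimate: per-bin noise magnitude, e.g. averaged over silent frames.
    threshold_ratio: a bin counts as 'noisy' if it is below ratio * noise.
    """
    if len(magnitudes) != len(noise_estimate):
        raise ValueError("spectrum and noise estimate must have the same length")
    cleaned = []
    for mag, noise in zip(magnitudes, noise_estimate):
        if mag < threshold_ratio * noise:
            cleaned.append(0.0)          # bin judged to be dominated by noise
        else:
            cleaned.append(mag - noise)  # keep the part above the noise floor
    return cleaned


frame = [0.2, 5.0, 0.3, 2.0]
noise = [0.25, 0.5, 0.25, 0.5]
cleaned = spectral_subtract(frame, noise)
print(cleaned)
```

In this toy frame, the two weak bins fall below the threshold and are removed, while the two strong bins are kept with the noise estimate subtracted.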
Mood detection
We are exploring methods of using acoustic features to estimate the mood of the music. We model emotion in a two-dimensional space, with one axis being arousal (how energetic the music is) and the other being valence (how positive or negative it is). We have found that the 'spectral contrast' feature, a measure of the peaks and valleys in acoustic subbands, can be mapped to both the arousal and valence of music. The spectral contrast feature gave 48.67% accuracy, plus or minus 6.10%, on a testing set of 240 song clips.
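To make the 'peaks and valleys' idea concrete, the following sketch computes a contrast value for a single subband, using one common formulation: the log of the mean of the strongest bins minus the log of the mean of the weakest bins. The function name, the fraction `alpha`, and the small constant `eps` are assumptions for illustration, and the exact feature computation in our system may differ.

```python
import math


def spectral_contrast(subband_magnitudes, alpha=0.2, eps=1e-10):
    """Log-ratio of the strongest to the weakest bins in one subband.

    alpha: fraction of bins averaged for the peak and for the valley.
    eps: small constant that keeps the logarithm defined for silent bins.
    """
    ordered = sorted(subband_magnitudes)
    k = max(1, int(alpha * len(ordered)))
    valley = math.log(eps + sum(ordered[:k]) / k)   # mean of the k weakest bins
    peak = math.log(eps + sum(ordered[-k:]) / k)    # mean of the k strongest bins
    return peak - valley


# A subband with one strong harmonic versus a flat, noise-like subband.
peaky = [0.1, 0.1, 0.1, 10.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
flat = [1.0] * 10
print(spectral_contrast(peaky), spectral_contrast(flat))
```

A subband dominated by a few strong components yields a large contrast, while a flat subband yields a contrast of zero. A mood classifier would then be trained on such values across subbands and frames.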
Robot motions
We have enabled our robots to move in response to the musical features that our algorithms detect. The Hubo, for example, can move to the beat of music and can parameterize its gestures according to the mood of the audio. The Hubo's motions have been found to be congruent with both the beat and the mood of the test audio.
The Hubo can also dance according to pre-programmed, choreographed gestures. These include motion-capture data, which helps us enable the Hubo to move in a highly human-like manner. The Hubo was provided with motion-capture data taken from a human dancer and then performed those dance moves at Drexel University's convocation. The robot was able to move smoothly and correctly, acting as a capable partner for the human performers.
Future Work
Videos:
RoboNova dancing (using real-time audio beat tracking)
Hubo dances to music (using real-time audio beat tracking to adapt to changes in the music)
Hubo dancing (side-by-side with model simulation)
Papers: