Dancing Humanoids

Enabling Humanoids to Respond to Music


Motivation

Humanoids with the capability to dance or otherwise respond to music autonomously could lead to novel and interesting interactions with humans. While various artists are interested in incorporating robots into musical performances, it is vital that those robots be able to analyze and respond to the music so that their performances fit with what the humans are doing. In order to produce a satisfactory response, the robot must be able to identify high-level features in music, such as beat locations, so that it can move congruently with the audio. To function in the real world, the robot must also overcome the obstacles imposed by noisy acoustic environments, and should therefore be robust to the various types of noise that may occur during a performance. Ultimately, our objective is to enable human-robot musical performances in which both the humans and the robots can build off of each other to produce novel and interesting music.



Music Information Retrieval Systems
In order for the robot to respond to musical audio, it must be able to extract high-level features from the music heard over the robot's microphones. These features, such as beat locations, tempo, and emotional content, can then be analyzed to determine a sequence of motions that is congruent with the audio.


Beat tracking

Our system uses a beat tracker that was developed to be robust to noise. Audio recorded through the robot's microphones is first passed through a Harmonic-Percussive Source Separation (HPSS) algorithm, which allows the system to filter out the less-informative harmonic component of the signal and retain only the percussive component. The audio is then processed with a Probabilistic Latent Component Analysis (PLCA) algorithm, which decomposes a signal, such as the audio the robot hears, into its component elements. Decomposing the audio in this manner helps to separate components containing beat information, such as drums, from components containing noise or non-beat musical content. In order to model the time-varying nature of a musical beat, the PLCA algorithm processes three consecutive columns of the percussive component's spectrogram at once, thereby determining the spectral characteristics of each component over three consecutive forty-millisecond chunks. The PLCA algorithm also determines the probability of each component being active during each time frame (its activation probability), and these probabilities are used as potential accent signals that can indicate the presence of beats.
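
The sketch below illustrates this front end in simplified form: an STFT with roughly forty-millisecond hops, HPSS to keep only the percussive component, and a basic PLCA decomposition of that component. It is an assumption-laden illustration rather than our implementation; it factors single spectrogram frames instead of three-frame patches, uses off-the-shelf HPSS from librosa, and the file name, sample rate, and component count are placeholders.

```python
# Simplified sketch of the beat-tracking front end (not the exact system):
# STFT -> HPSS -> PLCA on the percussive spectrogram.  Single-frame PLCA is
# used here for brevity; the actual system factors three-frame patches.
import numpy as np
import librosa

y, sr = librosa.load("performance.wav", sr=16000)         # placeholder input file
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=640))   # 640 / 16000 Hz = 40 ms hops
_, S_perc = librosa.decompose.hpss(S)                     # keep only the percussive part

def plca(V, n_components=8, n_iter=100, eps=1e-12):
    """Basic PLCA via EM: V(f, t) ~ sum_z P(z) * P(f|z) * P(t|z)."""
    F, T = V.shape
    V = V / (V.sum() + eps)                                # treat the spectrogram as a distribution
    Pz = np.full(n_components, 1.0 / n_components)
    Pf_z = np.random.dirichlet(np.ones(F), n_components).T    # (F, Z) spectral templates
    Pt_z = np.random.dirichlet(np.ones(T), n_components).T    # (T, Z) temporal activations
    for _ in range(n_iter):
        # E-step: posterior over components for every time-frequency bin
        joint = Pf_z[:, None, :] * Pt_z[None, :, :] * Pz[None, None, :]   # (F, T, Z)
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: reweight the posteriors by the observed spectrogram
        W = V[:, :, None] * post
        Pz = W.sum(axis=(0, 1)) + eps
        Pf_z = W.sum(axis=1) / Pz
        Pt_z = W.sum(axis=0) / Pz
        Pz = Pz / Pz.sum()
    return Pz, Pf_z, Pt_z

Pz, Pf_z, Pt_z = plca(S_perc)
activations = (Pt_z * Pz).T    # (Z, T): per-frame activation strength of each component
```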

The activation probability of each component is then correlated with many impulse trains, each with a slightly different period. This operation produces the strongest result when an impulse train of a given period is correlated with an activation probability that has the same period. The component whose activation probability produces the strongest response with any of the impulse trains is therefore judged to have the most periodic, or beat-like, activation probability, and thus most likely represents the beat. Additionally, the impulse train that produces the strongest response is judged to have the same period as the music. That activation probability and tempo estimate are finally passed into a dynamic programming algorithm, which attempts to locate a sequence of beats positioned near local maxima of the activation probability and spaced according to the estimated tempo.
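
A rough sketch of these two stages appears below, assuming the `activations` matrix from the previous snippet and a 40 ms frame hop. The period range, tightness constant, and Ellis-style formulation of the dynamic programming step are illustrative choices, not the exact values and procedure used in our system.

```python
# Illustrative periodicity analysis and dynamic-programming beat selection.
import numpy as np

HOP_SEC = 0.040                                     # frame hop in seconds

def best_component_and_period(activations, min_period=5, max_period=25):
    """Correlate each activation with impulse trains (5-25 frames = 0.2-1.0 s periods)."""
    best_score, best_z, best_period = 0.0, 0, min_period
    n_frames = activations.shape[1]
    for z, act in enumerate(activations):
        act = act - act.mean()                      # so non-beat frames count against a train
        for period in range(min_period, max_period + 1):
            for phase in range(period):
                train = np.zeros(n_frames)
                train[phase::period] = 1.0          # impulse train with this period and phase
                score = float(np.dot(act, train))
                if score > best_score:
                    best_score, best_z, best_period = score, z, period
    return best_z, best_period

def dp_beats(accent, period, tightness=10.0):
    """Pick beats near accent peaks, spaced roughly `period` frames apart."""
    accent = np.asarray(accent, dtype=float)
    accent = accent / (accent.max() + 1e-12)
    n = len(accent)
    score, backlink = accent.copy(), np.full(n, -1)
    for t in range(n):
        lo, hi = max(0, t - 2 * period), t - period // 2
        if hi <= lo:
            continue
        prev = np.arange(lo, hi)
        # penalize candidate predecessors whose spacing deviates from the period
        penalty = -tightness * np.log((t - prev) / period) ** 2
        candidates = score[prev] + penalty
        best_prev = int(np.argmax(candidates))
        score[t] += candidates[best_prev]
        backlink[t] = prev[best_prev]
    beats = [int(np.argmax(score))]                 # backtrace from the best-scoring frame
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return np.array(beats[::-1]) * HOP_SEC          # beat times in seconds

z_best, period = best_component_and_period(activations)
tempo_bpm = 60.0 / (period * HOP_SEC)
beat_times = dp_beats(activations[z_best], period)
```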



Mood detection

Acoustic features are used to estimate the mood of the music. Emotion is modeled in terms of arousal (how energetic the music is) and valence (how positive or negative it is). We have found that the spectral contrast feature, a measure of the difference between peaks and valleys in acoustic subbands, can reliably map to both the arousal and valence of music. The spectral contrast feature achieved 48.67% accuracy (plus or minus 6.10%) on a testing set of 240 song clips.
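
As a rough illustration, the snippet below extracts spectral contrast statistics from a clip and fits a simple classifier over mood labels in the arousal/valence plane. The use of librosa and a support vector machine, along with the `clips` and `labels` variables, are assumptions made for the example; they are not the exact features, model, or data pipeline behind the accuracy figure above.

```python
# Hedged sketch: spectral contrast statistics as mood features plus a simple
# classifier.  Classifier choice, label set, and data loading are placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC

def mood_features(path, sr=22050):
    """Mean and standard deviation of spectral contrast (peak-valley difference per subband)."""
    y, _ = librosa.load(path, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)   # (subbands, frames)
    return np.concatenate([contrast.mean(axis=1), contrast.std(axis=1)])

def train_mood_classifier(clips, labels):
    """`clips` are paths to song excerpts; `labels` are arousal/valence quadrant names."""
    X = np.vstack([mood_features(p) for p in clips])
    clf = SVC(kernel="rbf")
    clf.fit(X, labels)
    return clf
```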



Robot motions

We have enabled our robots to move in response to the musical features that our algorithms detect. The Hubo, for example, can move to the beat of music and can parameterize its gestures according to the mood of the audio. The Hubo's motions have been found to be congruent with both the beat and the mood of the test audio.
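
A minimal sketch of beat-synchronized gesturing is shown below. The `robot.play_gesture` interface, the gesture name, and the mood-to-amplitude mapping are hypothetical placeholders standing in for the Hubo's actual motion interface and mood parameterization.

```python
# Hypothetical example of triggering one gesture per detected beat, scaled by mood.
import time

def dance(robot, beat_times, mood, audio_start_time):
    """Play a gesture at each detected beat time (seconds relative to the audio start)."""
    amplitude = 1.0 if mood == "high-arousal" else 0.5        # assumed mood-to-motion mapping
    for beat in beat_times:
        delay = (audio_start_time + beat) - time.time()       # wait until the beat is due
        if delay > 0:
            time.sleep(delay)
        robot.play_gesture("sway", amplitude=amplitude)       # placeholder robot API call
```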

The Hubo can also dance according to pre-programmed, choreographed gestures, including gestures derived from motion-capture data, which allow the Hubo to move in a highly human-like manner. The Hubo was provided with motion-capture data taken from a human dancer, and then displayed those dance moves in a performance for Drexel University's convocation. The robot was able to move smoothly and correctly, acting as a capable partner for the human performers.



Future Work

  • Incorporate more components into the beat tracker's analysis.
  • Analyze the components themselves to classify beats (e.g., as downbeats, offbeats, etc.).


Videos:

    RoboNova dancing (using real-time audio beat tracking)

    Hubo dances to music (using real-time audio beat tracking to adapt to changes in the music)

    Hubo dancing (side-by-side with model simulation)

    Papers:


  • D. K. Grunberg and Y. E. Kim, "Rapidly Learning Musical Beats in the Presence of Environmental and Robot Ego Noise," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014. [PDF]

  • D. K. Grunberg, A. M. Batula, E. M. Schmidt, and Y. E. Kim, "Affective Gesturing with Music Mood Recognition," in Proceedings of the International Conference on Humanoid Robotics, 2012. [PDF]

  • D. K. Grunberg, A. M. Batula, E. M. Schmidt, and Y. E. Kim, "Synthetic Emotions for Humanoids: Perceptual Effects of Size and Number of Robot Platforms," Journal of Synthetic Emotions: Special Issue on Music, Robots, and Emotion (invited paper), 2012. [PDF]

  • D. K. Grunberg, A. M. Batula, and Y. E. Kim, "Towards the Development of Robot Musical Audition," in Proceedings of the 2012 Music, Mind, and Invention Workshop (MMI), 2012. [PDF]

  • D. K. Grunberg, D. M. Lofaro, P. Y. Oh, and Y. E. Kim, "Robot Audition and Beat Identification in Noisy Environment," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011. [PDF]

  • Y. E. Kim, D. K. Grunberg, A. M. Batula, D. M. Lofaro, J.-H. Oh, and P. Y. Oh, "Enabling Humanoid Musical Interaction and Performance," in Proceedings of the 2011 International Conference on Collaboration Technologies and Systems, 2011. [PDF]

  • D. K. Grunberg, R. Ellenberg, I.-H. Kim, J.-H. Oh, P. Y. Oh, and Y. E. Kim, "Development of an Autonomous Dancing Robot," International Journal of Hybrid Information Technology, 2010. [PDF]

  • E. M. Schmidt, D. Turnbull, and Y. E. Kim, "Feature Selection for Content-Based, Time-Varying Musical Emotion Regression," in Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval, 2010. [PDF]

  • Y. E. Kim, A. M. Batula, D. K. Grunberg, D. M. Lofaro, J.-H. Oh, and P. Y. Oh, "Developing Humanoids for Musical Interaction," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, October, 2010. [PDF]

  • R. Ellenberg, D. K. Grunberg, P. Y. Oh, and Y. E. Kim, "Using Miniature Humanoids as Surrogate Research Platforms," in Proceedings of the IEEE-RAS Conference on Humanoid Robotics, 2009. [PDF]

  • R. Ellenberg, D. K. Grunberg, P. Y. Oh, and Y. E. Kim, "Creating an Autonomous Dancing Robot," in Proceedings of the International Conference on Hybrid Information Technology, August, 2009. [PDF]

  • D. K. Grunberg, R. Ellenberg, Y. E. Kim, and P. Y. Oh, "From RoboNova to HUBO: Platforms in Robot Dance," in Proceedings of the International Conference of Advanced Humanoid Robotics Research (ICAHRR) 2009, August, 2009. [PDF]

  • R. Ellenberg, D. K. Grunberg, P. Y. Oh, and Y. E. Kim, "Exploring Creativity Through Humanoids and Dance," in Proceedings of the 5th International Conference on Ubiquitous Robotics and Ambient Intelligence, November, 2008. [PDF]