Embedding Embodied Music Generation

Dr Charles Martin - The Australian National University

web: charlesmartin.com.au     twitter/github: @cpmpercussion

Ngunnawal & Ngambri Country

Embodied Music Generation

  • note generation: generate “symbolic” music—notes (A, B, C, half-note, quaver, etc.). Abstract version of sounds created by some musical instruments.

  • embodied gesture generation: generate the movements a performer makes to operate a particular musical instrument.

this project explores embodied gesture generation in an improvised electronic music context!

Why do this?

  • Lots of musical instruments don’t use “notes”

  • e.g., turntable, mixer, modular synthesiser, effects pedal, etc.

  • what do “intelligence” and “co-creation” look like for these instruments?

  • can we incorporate generative AI into a longer-term performance practice?

Embodied Predictive Musical Instrument (EMPI)

  • Predicts the next movement and its timing, and represents the prediction physically.
  • Experiments with interaction mappings; mainly focussed on call-response
  • Weird and confusing/fun?

Training Data

[Figure: training data examples (human, sine, square, saw, and noise)]

Generated Data

[Figure: generated data examples (human, synth, and noise models)]

Improvisations with EMPI

  • 12 participants

  • two independent factors: model and feedback

  • model: human, synthetic, noise

  • feedback: motor on, motor off

Results: Survey

Changing the ML model had a significant effect on survey questions Q2, Q4, Q5, Q6, and Q7.

Results: Survey

  • human model most “related”, noise was least

  • human model most “musically creative”

  • human model easiest to “influence”

  • noise model not rated badly!

Participants generally preferred human or synth, but not always!

Results: Performance Length

Human and synth models: a wider range of performance lengths with the motor on.

Noise model: a wider range with the motor off.


  • Studied a self-contained intelligent instrument in genuine performance.

  • Physical representation could be polarising.

  • Performers work hard to understand and influence the ML model.

  • A constrained, intelligent instrument can produce a compelling experience.

Generative AI System

  • gestural predictions are made by a Mixture Density Recurrent Neural Network (implemented using “Interactive Music Prediction System”—IMPS)

  • MDRNN: an extension of common LSTM/RNN designs to allow expressive predictions of multiple continuous variables.

  • MDRNN specs: two 32-unit LSTM layers and a 9-dimensional mixture density layer (8 knobs + time)

  • IMPS: A CLI Python program that provides MDRNN, data collection, training and interaction features.

  • communicates with music software over OSC (Open Sound Control)

  • in this case, MDRNN is configured for “call-and-response” interaction (or “continuation”)
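The prediction step described above can be sketched in plain numpy: the network emits mixture weights, means, and scales over the 9 output dimensions, and one gesture frame is drawn by choosing a component and sampling its Gaussian. The shapes and the clipping rules here are assumptions for illustration, not IMPS’s exact internals.

```python
import numpy as np

def sample_mdn(pi_logits, mu, sigma, rng=None):
    """Sample one 9-D gesture frame (8 knob values + time delta) from
    mixture density parameters.
    Assumed shapes: pi_logits (K,), mu (K, 9), sigma (K, 9)."""
    if rng is None:
        rng = np.random.default_rng()
    # softmax over mixture weights
    pi = np.exp(pi_logits - pi_logits.max())
    pi /= pi.sum()
    # pick a mixture component, then sample its diagonal Gaussian
    k = rng.choice(len(pi), p=pi)
    sample = rng.normal(mu[k], sigma[k])
    # knob values are clipped to [0, 1]; the time delta must be non-negative
    knobs = np.clip(sample[:8], 0.0, 1.0)
    dt = max(sample[8], 0.0)
    return knobs, dt

# example with K = 3 hypothetical mixture components
K = 3
rng = np.random.default_rng(42)
knobs, dt = sample_mdn(rng.normal(size=K),
                       rng.uniform(0, 1, size=(K, 9)),
                       np.full((K, 9), 0.05), rng)
```

In practice the sampled knob values are sent to the synthesiser and the time delta schedules when the next prediction fires.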
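For the OSC link, a message is just an address pattern, a type-tag string, and big-endian arguments, each NUL-padded to a 4-byte boundary. A minimal stdlib encoder is sketched below; the `/prediction` address is hypothetical, and in practice IMPS would rely on an OSC library rather than hand-rolled packing.

```python
import struct

def osc_pad(s: bytes) -> bytes:
    """NUL-terminate and pad to a multiple of 4 bytes (OSC 1.0 rule)."""
    return s + b"\x00" * (4 - len(s) % 4)

def osc_message(address: str, floats) -> bytes:
    """Encode an OSC message carrying float32 arguments."""
    type_tags = "," + "f" * len(floats)
    return (osc_pad(address.encode())
            + osc_pad(type_tags.encode())
            + b"".join(struct.pack(">f", v) for v in floats))

# e.g. one predicted 8-knob frame on a hypothetical address
msg = osc_message("/prediction", [0.5] * 8)
# msg can be sent to the music software as a single UDP datagram
```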

Performances and Experiences

  • deployed in performance since 2019

  • so it works! and it’s practical!

  • but is it better than a random walk generator?
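For comparison, a random-walk generator over the 8 knobs can be sketched as below. The step size and clipping are assumptions about what such a baseline might look like, not a description of the actual noise model used in the study.

```python
import random

def random_walk_gesture(steps=16, n_knobs=8, step_size=0.05, seed=None):
    """Generate a sequence of knob frames by a clipped random walk:
    each knob drifts by a small uniform step and stays in [0, 1]."""
    rng = random.Random(seed)
    knobs = [rng.random() for _ in range(n_knobs)]
    frames = []
    for _ in range(steps):
        knobs = [min(1.0, max(0.0, k + rng.uniform(-step_size, step_size)))
                 for k in knobs]
        frames.append(list(knobs))
    return frames

frames = random_walk_gesture(steps=16, seed=7)
```

Unlike the MDRNN, such a baseline cannot pick up data-driven habits (pauses, moving one knob at a time), which is what the comparison probes.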

Influence and Co-Creation

  • can be steered (a little bit) by performer’s gestures

  • tends to continue adjusting knobs the performer last used

  • learns interesting behaviours from data (moving one vs multiple knobs, pauses, continuous changes)

  • good for the performer to have a different task to work on.

  • also important to allow performer to “just listen”

Small Data and Co-Adaptation

  • interactions from each performance are saved

  • some of these have been incorporated into training datasets

  • co-adaptive: system grows and changes along with the performer (yet to be studied rigorously)