An Interactive Musical Prediction System with MDRNNs

Dr Charles Martin - The Australian National University

web: charlesmartin.au     mastodon: @[email protected]

What is this?

Learning to Predict Sequences

Interacting with Musical Predictions

Why is this needed?

Creative Deep Learning Systems | NIMEs
Focus on MIDI data (e.g., Magenta Studio) | Yes MIDI, but also many custom sensors
Focus on digital audio | Focus on performer gestures
Focus on composition/artefact generation | Focus on interaction
Rhythm on 16th note grid | Complex or no rhythm
Focus on categorical data | Continuous data more interesting

IMPS: Interactive Musical Prediction System

  • An opinionated deep learning model for NIMEs
  • An environment for making NIMEs that play themselves
  • “Wekinator” for deep learning?

How does it work?

Mixture Density Recurrent Neural Network

Mixture Density RNN

Good at predicting creative, continuous, multi-dimensional data: handwriting, sketches… musical gestures?
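
As a rough illustration of what a mixture density output means, the sketch below draws one multi-dimensional sample from a mixture of diagonal Gaussians. The function name `sample_mixture` and the example parameters are made up for this illustration; they are not taken from the IMPS code, and a real MDRNN would produce the weights, means and standard deviations from its LSTM layers at every step.

```python
import random

def sample_mixture(weights, means, stds, rng=random):
    """Draw one vector from a mixture of diagonal Gaussians.

    weights: mixing coefficients (non-negative, summing to 1)
    means, stds: one list per component, each with one entry per dimension
    """
    # Pick a mixture component according to the mixing weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each dimension independently from the chosen component.
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]

# Hypothetical 2-component mixture over (time delta, sensor value).
weights = [0.7, 0.3]
means = [[0.1, 0.2], [0.5, 0.8]]
stds = [[0.01, 0.05], [0.05, 0.1]]
sample = sample_mixture(weights, means, stds, random.Random(0))
```

Because the output is a full distribution rather than a single value, the same input gesture can lead to several plausible continuations.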

What to do with predictions?

  1. Call-and-Response: Continue gestures when performer stops
  2. Layered predictions: Always predict next move from current gesture
  3. Duet: Two interdependent processes
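
The call-and-response hand-over in item 1 can be sketched as a simple idle timer. This is only an assumption about how such a switch might look: the name `who_plays` and the 2-second threshold are hypothetical and not taken from IMPS.

```python
def who_plays(now, last_input_time, idle_threshold=2.0):
    """Decide who controls the sound in a call-and-response setup.

    The performer keeps control while they are active; once no input has
    arrived for more than `idle_threshold` seconds, the model continues.
    """
    if now - last_input_time > idle_threshold:
        return "model"
    return "performer"

# The performer last touched the interface at t = 10.0 s.
during_gesture = who_plays(now=10.5, last_input_time=10.0)
after_pause = who_plays(now=13.0, last_input_time=10.0)
```

In the layered and duet modes the model would sound continuously, so no hand-over of this kind would be needed.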

Ok, how do I use it?

Three easy steps…

  1. Collect some data: IMPS logs interactions automatically to build up a dataset
  2. Train an MDRNN: IMPS includes good presets, no need to train for days/weeks
  3. Perform! IMPS includes three interaction modes, with scope to extend in future!
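
As one possible picture of step 1 feeding step 2, the sketch below turns a time-stamped interaction log into rows of (time delta, values) that a sequence model could train on. The log layout and the name `to_training_rows` are assumptions for illustration only; the actual IMPS log format may differ.

```python
def to_training_rows(log):
    """Convert [(timestamp_seconds, [values]), ...] into [dt, values...] rows.

    Each row stores the time elapsed since the previous event, so rhythm is
    kept as continuous data instead of being snapped to a grid. The first
    event has no predecessor and only serves as the starting time.
    """
    rows = []
    for (t_prev, _), (t, values) in zip(log, log[1:]):
        rows.append([t - t_prev] + list(values))
    return rows

# Hypothetical log of a one-dimensional controller.
log = [(0.0, [0.5]), (0.25, [0.6]), (1.0, [0.7])]
rows = to_training_rows(log)
```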

Is this even practical?

Deep Learning in NIMEs??

  • Is it practical for real-time use?
  • How do the MDRNN parameters affect time per prediction?
  • What are “good defaults” for training parameters?
  • Do you need a powerful/expensive computer?

Test Systems

Test computers

Results: Time per prediction

Time per prediction vs LSTM units

Time per prediction (ms) with different sizes of LSTM layers.

Results: Time per prediction

Time per prediction vs MDN dimension

Time per prediction (ms) with different MDN output dimensions. (64 LSTM units)
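
A measurement of this kind can be sketched with a wall-clock timer around each prediction call. The helper `time_per_call_ms` and the stand-in `fake_predict` are hypothetical; in practice the callable would wrap one forward pass and sample from the trained MDRNN.

```python
import statistics
import time

def time_per_call_ms(predict, n_calls=100):
    """Return (mean, standard deviation) of the call duration in milliseconds."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        predict()
        durations.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(durations), statistics.stdev(durations)

def fake_predict():
    # Stand-in workload; replace with a real model prediction.
    return sum(i * i for i in range(1000))

mean_ms, std_ms = time_per_call_ms(fake_predict, n_calls=50)
```

Timing each call separately, not the whole loop, also exposes occasional slow predictions, which matter more than the average in live performance.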

Results: Training Error vs Validation Set Error

12K sample dataset (15 minutes of performance)

Takeaway: Smallest model best for small datasets. Don’t bother training for too long.

Results: Training Error vs Validation Set Error

100K sample dataset (120 minutes of performance)

Takeaway: 64- and 128-unit models still best!

Results: Exploring Generation

Takeaway: Make Gaussians less diverse, make categorical more diverse.
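
One common way to control diversity at sampling time is a pair of temperature parameters, one for the mixture weights and one for the Gaussian widths. The sketch below assumes that approach; the function names are illustrative and not taken from IMPS, and the weights are assumed to be strictly positive.

```python
import math

def adjust_categorical(weights, temperature):
    """Re-weight mixing coefficients; temperature > 1 flattens them."""
    logits = [math.log(w) / temperature for w in weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adjust_gaussian(stds, sigma_temperature):
    """Scale the variance by the temperature; values < 1 narrow the Gaussian."""
    return [s * math.sqrt(sigma_temperature) for s in stds]

# More diverse component choice, less diverse samples within a component.
flatter = adjust_categorical([0.9, 0.1], temperature=2.0)
narrower = adjust_gaussian([1.0, 0.2], sigma_temperature=0.25)
```

With these settings the model switches between mixture components more freely while each sample stays closer to its component mean, matching the takeaway above.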

Try it out!

  • Available on GitHub
  • Try with your NIMEs!
  • Hack if you want!
  • Add an issue with problems/results!

Twitter: @cpmpercussion

Website: creativeprediction.xyz/imps