MUSAIC: Multi-USer Algorithmic Interactive Composer
by Lance Putnam
December 8, 2003

Objective

The MUSAIC system aims to combine the judgments of many people about short, algorithmically generated snippets of music into a collective composition that will be played back either alongside the users' activity or at a later time. The system will consist of a single server and multiple clients, all running the music programming and sound synthesis software SuperCollider 3. MUSAIC uses distributed processing over a network to have music evaluated by (ideally) many users and collects this information to produce music that represents the collective taste of the users.

The system is designed to transfer as little data as possible between server and client while still maintaining high-quality sound output on the client side. The problem that plagues current networked multimedia applications is the trade-off between speed of transfer and quality of data. This is evident in streaming audio. MUSAIC attempts to reduce network overhead by describing an entire song with a small seed of 14 digits. The data-quality problem is also avoided because the song is synthesized on the client's computer.

Related Work

SAOL

SAOL, the Structured Audio Orchestra Language, is part of the MPEG-4 standard. This language is similar to the unit-generator models of SuperCollider 3 and Max/MSP. SAOL addresses the problem of transferring high-quality sound over a low-bandwidth network by describing the data rather than compressing it.

Network Architecture

System Overview

MUSAIC is based upon the voting of multiple performers on short snippets of algorithmically composed music to formulate a master composition. The snippets of music are continuously previewed and voted upon by performers wearing headphones. After a certain number of these snippets, the snippet with the most votes is integrated into the master composition, which is played over loudspeakers for an audience.



The main form of exchange between server and client is the song ID. A song ID is a short string used to identify the synthesized instruments and composition of a song. For the sake of simplicity and the limitations of human memory, a 7-digit integer is used for each of the synthesis and composition seeds.
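As a minimal sketch of how such an ID could be packed and unpacked: the function names, the zero-padding, and the ordering (synthesis seed first, composition seed second) are assumptions made for illustration and are not specified above.

```python
def make_song_id(synth_seed: int, comp_seed: int) -> str:
    """Join two 7-digit seeds into one 14-digit song ID (assumed layout)."""
    for seed in (synth_seed, comp_seed):
        if not 0 <= seed <= 9_999_999:
            raise ValueError("each seed must fit in 7 decimal digits")
    # Zero-pad so the ID is always exactly 14 characters long.
    return f"{synth_seed:07d}{comp_seed:07d}"


def parse_song_id(song_id: str) -> tuple[int, int]:
    """Split a 14-digit song ID back into (synthesis seed, composition seed)."""
    if len(song_id) != 14 or not (song_id.isascii() and song_id.isdigit()):
        raise ValueError("song ID must be exactly 14 decimal digits")
    return int(song_id[:7]), int(song_id[7:])
```

With this layout, a client only needs the 14-character string to recover both seeds and regenerate the song locally.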

Client Functions

The client is responsible for the following tasks:

1) establishing and breaking connection with the server
2) algorithmic sound synthesis and composition based on seeds

Server

The server will act as a general record keeper of the activities during a composition session. Primary responsibilities include:

1) generating song IDs and sending them to connected clients
2) logging votes for song IDs in database
3) broadcasting the winning song ID after a specified number of song IDs have been judged
4) listening for new clients

The server flow is illustrated in the following diagram.  T is the time between snippets and N is the number of snippets per voting session.
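Separately from the diagram, the vote-logging and winner-selection responsibilities can be sketched as follows. The class name `VoteLog`, the in-memory counter standing in for the database, and the tie-breaking rule (earliest presented song ID wins) are all assumptions made for illustration.

```python
from collections import Counter


class VoteLog:
    """Counts votes for the N song IDs presented in one voting session."""

    def __init__(self, song_ids):
        # Start every presented song at zero votes; insertion order is kept.
        self.counts = Counter({song_id: 0 for song_id in song_ids})

    def vote(self, song_id):
        if song_id not in self.counts:
            raise KeyError(f"song ID {song_id!r} was not presented this session")
        self.counts[song_id] += 1

    def winner(self):
        # max() returns the first maximum it meets, so ties go to the
        # song ID that was presented earliest (an assumed rule).
        return max(self.counts, key=self.counts.get)
```

After the N-th snippet has been judged, the server would broadcast `winner()` to all connected clients and start a new session.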

Synthesis

Overview

Instruments are synthesized by randomizing the parameters that define the timbre of the instrument. For each parameter of the instrument, a range of values is defined to select from. For example, the duration of the bass drum will be selected from a range of 200-800 ms. By using a seeded random number generator to pick these values, a distinct-sounding, reproducible instrument can be synthesized.
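The idea can be sketched in a few lines. Only the 200-800 ms duration range comes from the text; the other parameter names and ranges, and the use of a uniform distribution, are hypothetical.

```python
import random

# Parameter ranges for one instrument. Only "duration_ms" is taken from the
# text; the other two entries are made-up placeholders.
BASS_DRUM_RANGES = {
    "duration_ms": (200.0, 800.0),
    "pitch_hz": (40.0, 80.0),      # assumed
    "noise_level": (0.0, 0.5),     # assumed
}


def make_instrument(ranges, seed):
    """Pick one value per parameter using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
```

Calling `make_instrument(BASS_DRUM_RANGES, seed)` twice with the same seed yields identical parameters, which is what allows a client to rebuild an instrument from the seed alone.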

The following is a list of instruments that will be used to construct a song.

Percussion        Tonal
Bass Drum         Bass
Snare Drum        Arpeggiator
Clap
Closed Hi-hat
Opened Hi-hat

Drum

The drum is synthesized in two parts: the transient hit and the tone.  The transient hit is the initial sound of the drum when it is hit.  This is rich in frequencies and is simulated by a short burst of noise.  The tone is the sound the skin of the drum makes as it vibrates after the initial hit.  Two modes of this vibration are simulated, (0,0) and (0,1).  These are the lowest frequency concentric vibrations of the skin.  Also, when a drum is first hit it is tight and thus produces a higher pitched sound.  As time progresses, the skin loosens and produces the normal modes of vibration.  This behavior can be reproduced by applying an exponentially decaying envelope to the fundamental pitch of the drum.
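The pitch behavior described above can be sketched for the tone part alone. This illustration uses a single sine partial and leaves out the noise burst and the second vibration mode; every numeric default is an assumption chosen only to make the example concrete.

```python
import math


def drum_pitch(t, base_hz=60.0, pitch_drop_hz=120.0, pitch_decay_s=0.03):
    """Pitch at time t: starts high and decays exponentially to base_hz."""
    return base_hz + pitch_drop_hz * math.exp(-t / pitch_decay_s)


def drum_tone(duration_s=0.4, amp_decay_s=0.15, sample_rate=44100):
    """Render one sine partial whose pitch and amplitude both decay."""
    samples = []
    phase = 0.0
    for n in range(round(duration_s * sample_rate)):
        t = n / sample_rate
        # Accumulate phase so the changing frequency stays click-free.
        phase += 2.0 * math.pi * drum_pitch(t) / sample_rate
        samples.append(math.exp(-t / amp_decay_s) * math.sin(phase))
    return samples
```

The returned list holds floating-point samples in the range -1 to 1; a full drum voice would add the short noise burst and a second partial on top.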

Snare

The snare drum operates in a similar fashion to the drum and is thus synthesized using similar methods.  Additional modes of vibration and a longer noise burst were used to give the snare a stronger presence.  The snare is also run through a resonant low-pass filter to give a larger palette of sound.

Clap

The clap sound is simulated quite easily with a simple filtered noise burst. To give the clap its distinct sound, a series of equally spaced bursts of noise is followed by a longer decaying envelope.

Hi-hat

Both an opened hi-hat and closed hi-hat are produced using a resonant high-pass filtered noise burst.  The opened hi-hat is simply given a longer envelope length than the closed hi-hat.

Tonal Instruments

The tonal instruments, bass and arpeggiator, use a dual-oscillator subtractive synthesis method. Three waveforms are possible for each of the two oscillators: a sine wave, a triangle wave, and a saw wave. Each oscillator has independent controls for amplitude, frequency, and amplitude envelope. Oscillator 1 can also be phase modulated by oscillator 2 to add overtones. The two oscillators are run through a resonant low-pass filter with envelope control over the cutoff frequency.

Composition

Overview

Composition is done in two steps: phrase generation and song generation. A phrase, within the scope of this project, is a short sequence of notes that defines an instrument's playback throughout the song. During song generation, these phrases are layered together to create measures, and the measures are then combined sequentially in a probabilistic manner.

Phrase Generation

The first step in composing the song is creating a phrase for each of the parts. The phrases vary in size, but are generally short, on the order of 8-32 steps. Probability sequences are used to determine when an instrument will sound for a specific step. A probability sequence for the drum part might be [1,0,0,0.2,0.2,0.2,0,0]. This means on the first step of the phrase the drum will always be played and on steps 4-6 the drum will play only 20% of the time.
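A probability sequence of this kind could be turned into a concrete on/off pattern as sketched below. The function name is hypothetical, and whether the pattern is drawn once per phrase or redrawn on every repetition is not stated above; this sketch draws it once.

```python
import random


def realize_phrase(probabilities, rng):
    """Return one True/False trigger per step, drawn from its probability."""
    # rng.random() lies in [0, 1), so p = 1 always fires and p = 0 never does.
    return [rng.random() < p for p in probabilities]


drum_probabilities = [1, 0, 0, 0.2, 0.2, 0.2, 0, 0]
pattern = realize_phrase(drum_probabilities, random.Random(7))
```

Here `pattern[0]` is always `True`, steps with probability 0 are always `False`, and steps 4-6 each fire in roughly one of every five draws.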

For tonal instruments, such as the bass and arpeggiator lines, pitch must also be considered. First, a root frequency is randomly selected from a specified range of values. Successive notes are then selected according to a certain spread factor. For instance, the bass line may first select a root frequency from a range of 60-120 Hz and then create successive notes within the range root/spread to root*spread. After a sequence of notes is generated in this fashion, the phrase may be extended by appending random transpositions of itself.

Of course, a random set of frequencies will not produce musically pleasing results. To solve this, the entire sequence of frequencies is quantized to a randomly chosen scale.
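The note selection and quantization steps can be sketched together. Several details here are assumptions not given in the text: frequencies are drawn uniformly, the scale is a fixed major scale rather than a randomly chosen one, and quantization uses 12-tone equal temperament measured from the root.

```python
import math
import random

MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets; an assumed example scale


def random_notes(rng, count, root_range=(60.0, 120.0), spread=1.5):
    """Pick a root, then further notes between root/spread and root*spread."""
    root = rng.uniform(*root_range)
    notes = [root] + [rng.uniform(root / spread, root * spread)
                      for _ in range(count - 1)]
    return root, notes


def quantize(freq, root, scale=MAJOR_SCALE):
    """Snap freq to the nearest scale tone, measured in semitones from root."""
    semitones = 12.0 * math.log2(freq / root)
    octave = math.floor(semitones / 12.0)
    # Consider scale tones in the neighbouring octaves as well.
    candidates = [12 * (octave + k) + degree
                  for k in (-1, 0, 1) for degree in scale]
    nearest = min(candidates, key=lambda c: abs(c - semitones))
    return root * 2.0 ** (nearest / 12.0)
```

A phrase would then be built as `[quantize(f, root) for f in notes]`, after which every pitch lies on the chosen scale.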

Song Generation

After the phrases are generated, the song is composed using a simple first-order Markov chain.  The states of the Markov chain define what phrases will be added in parallel to construct a measure and what state to go to next.  The phrases are put into a measure according to a set of probabilities defined by the state.  An over-simplified Markov chain is presented in the following diagram.
State   Next state (p)      Rhythm p   Bass p   Lead p
A       A 0.7, B 0.3        1          0.5      0.0
B       B 0.7, C 0.3        0.1        0.4      0.5
C       C 0.8, D 0.2        1          1        1
D       D 0.5, end 0.5      0.1        0.8      0.1
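The example chain above can be walked with a short sketch. The probabilities are copied from the listed states; starting in state A, the data layout, and the `max_measures` safety cap are assumptions made for illustration.

```python
import random

# Transition and instrument probabilities for the four example states.
STATES = {
    "A": {"next": {"A": 0.7, "B": 0.3},
          "parts": {"Rhythm": 1.0, "Bass": 0.5, "Lead": 0.0}},
    "B": {"next": {"B": 0.7, "C": 0.3},
          "parts": {"Rhythm": 0.1, "Bass": 0.4, "Lead": 0.5}},
    "C": {"next": {"C": 0.8, "D": 0.2},
          "parts": {"Rhythm": 1.0, "Bass": 1.0, "Lead": 1.0}},
    "D": {"next": {"D": 0.5, "end": 0.5},
          "parts": {"Rhythm": 0.1, "Bass": 0.8, "Lead": 0.1}},
}


def generate_song(rng, start="A", max_measures=1000):
    """Walk the chain, returning a list of (state, parts in that measure)."""
    measures = []
    state = start
    while state != "end" and len(measures) < max_measures:
        spec = STATES[state]
        # Each phrase joins the measure with its own probability.
        parts = [name for name, p in spec["parts"].items() if rng.random() < p]
        measures.append((state, parts))
        # First-order chain: the next state depends only on the current one.
        names = list(spec["next"])
        weights = list(spec["next"].values())
        state = rng.choices(names, weights=weights)[0]
    return measures
```

Because the generator is seeded, `generate_song(random.Random(seed))` returns the same measure sequence for the same composition seed.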

Results

Testing MUSAIC produced mixed results. Although many interesting musical textures seemed possible at first, it became obvious after extensive listening that most of the output sounded similar. Fortunately, the synthesis and composition algorithms have much room for improvement. Improved synthesis algorithms would provide a richer and more fulfilling palette of timbres. More complex composition algorithms could help reduce repetitiveness and add more movement to the song as a whole. The voting system worked as planned; however, not much testing was done with multiple users. Technically speaking, this seems to be a minor issue, as the system can effectively be tested by a single user judging the music.

Given the considerations above and the fact that this project is still in its infancy, it has proven to be a worthwhile effort to pursue further.

Future Work

Templates

Which genre of music is appropriate depends on the situation. For clubs and DJs, techno and trance genres would be most suitable. If the system were to be used in an installation, something more abstract or experimental might be better.

Timing

Timing applies both to the local activities of the client and to the amount of time it takes to send packets over the network. One issue is that a user with a large packet round-trip time (RTT) will not have as much time to vote on a song. A solution might be to have the server track the average RTT of each client. The other timing issue concerns when the master composition should be played.

Better Synthesis

At the moment, only a handful of instruments are available.  More instruments would provide a larger array of sound and also help create more genres of electronic music.

Maximize Info/Minimize Seed

A major problem with representing large amounts of data with small identifiers is that you limit freedom of choice. MUSAIC currently uses two 7-digit integers to represent the song. This means there are 10^7 x 10^7 = 100,000,000,000,000 distinct songs (per template). While 100 trillion choices is not bad, this can be easily increased by including upper and lower case letters in the seed. This gives 62^7 x 62^7 = 12,401,769,434,657,526,912,139,264 choices.
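The seed-space arithmetic above can be checked with a short calculation. The helper name `seed_space` is hypothetical; the alphabet of 62 symbols corresponds to 10 digits plus 26 lower-case and 26 upper-case letters.

```python
def seed_space(alphabet_size, seed_length=7, seeds_per_song=2):
    """Number of distinct songs addressable by the given seed format."""
    return alphabet_size ** (seed_length * seeds_per_song)


digits_only = seed_space(10)    # two 7-digit decimal seeds
alphanumeric = seed_space(62)   # digits plus upper and lower case letters
```

Here `digits_only` is 10^14 and `alphanumeric` is 62^14, so the larger alphabet multiplies the number of addressable songs by a factor of roughly 1.2 x 10^11 without lengthening the ID.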