The MUSAIC system aims to combine the judgments of many people about short
algorithmically generated snippets of music into a collective composition that
is played back either alongside the users' activity or at a later time.
The system consists of a single server and multiple clients, all running
the music programming and sound synthesis software SuperCollider 3. MUSAIC
distributes the evaluation of music over a network to (ideally) many users
and collects their votes to produce music that represents the users'
collective taste.
The system is designed to transfer as little data as possible between server
and client while still maintaining high-quality sound output on the client
side. Current networked multimedia applications are plagued by a trade-off
between transfer speed and data quality, which is especially evident in
streaming audio. MUSAIC reduces network overhead by describing an entire
song with a small seed of 14 digits. The data-quality problem is also avoided
because the song is synthesized on the client's computer.
SAOL, the Structured Audio Orchestra Language, is part of the MPEG-4 standard. The language is similar to the unit-generator models of SuperCollider 3 and Max/MSP. SAOL addresses the transfer of high-quality sound over a low-bandwidth network by describing the sound rather than compressing it.
MUSAIC is based on multiple performers voting on short snippets of
algorithmically composed music to form a master composition. The
snippets are continuously previewed and voted on by performers wearing
headphones. After a certain number of snippets, the snippet with the most
votes is integrated into the master composition, which is played over
loudspeakers for an audience.

The server and clients mainly exchange song IDs. A song ID is a short
string that identifies the synthesized instruments and composition of a
song. For simplicity, and given the limits of human memory, a 7-digit
integer is used for each of the synthesis and composition seeds.
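As an illustration of how two 7-digit seeds could form one 14-digit song ID, here is a minimal Python sketch. The function names, the zero-padding, and the ordering (synthesis seed first) are assumptions made for this example, not details taken from the MUSAIC implementation.

```python
from typing import Tuple


def make_song_id(synth_seed: int, comp_seed: int) -> str:
    """Join two 7-digit seeds into one 14-digit song ID string."""
    for seed in (synth_seed, comp_seed):
        if not 0 <= seed <= 9_999_999:
            raise ValueError("each seed must fit in 7 decimal digits")
    # Assumed layout: synthesis seed first, both zero-padded to 7 digits.
    return f"{synth_seed:07d}{comp_seed:07d}"


def split_song_id(song_id: str) -> Tuple[int, int]:
    """Recover (synthesis seed, composition seed) from a 14-digit song ID."""
    if len(song_id) != 14 or not (song_id.isascii() and song_id.isdigit()):
        raise ValueError("song ID must be exactly 14 decimal digits")
    return int(song_id[:7]), int(song_id[7:])
```

Because the ID fully determines both seeds, it is the only data a client would need in order to rebuild the song locally.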
The client is responsible for the following tasks:
1) establishing and breaking connection with the server
2) algorithmic sound synthesis and composition based on seeds
The server will act as a general record keeper of the activities during a
composition session. Primary responsibilities include:
1) generating song IDs and sending them to connected clients
2) logging votes for song IDs in a database
3) broadcasting the winning song ID after a specified number of song IDs have
been judged
4) listening for new clients
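The vote-logging and winner-selection steps can be sketched in Python as below. This is a simplified illustration: it assumes each vote is a single approval of one song ID and that ties go to the earliest candidate, neither of which is specified in the text. The name `pick_winner` is hypothetical.

```python
from collections import Counter


def pick_winner(votes, candidates):
    """Return the candidate song ID with the most votes.

    votes      -- iterable of song IDs, one entry per vote cast
    candidates -- list of the song IDs sent out during this voting session
    Ties (including the no-votes case) go to the earliest candidate; this
    tie-breaking rule is an assumption.
    """
    allowed = set(candidates)
    tally = Counter(v for v in votes if v in allowed)
    # max() returns the first maximal element, which gives the tie rule above.
    return max(candidates, key=lambda song_id: tally[song_id])
```

For example, with candidates `["00000010000002", "00000030000004"]` and two of three votes cast for the second ID, the second ID is the one the server would broadcast.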
The server flow is illustrated in the following diagram. T is the time
between snippets and N is the number of snippets per voting session.

Instruments are synthesized by randomizing the parameters that define the
timbre of the instrument. For each parameter of the instrument a range
of values is defined to select from. For example, the duration of the
bass drum is selected from a range of 200-800 ms. By using a
seeded random number generator to pick these values, a distinct sounding,
reproducible instrument can be synthesized.
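A minimal Python sketch of this idea follows. Only the 200-800 ms duration range comes from the text; the other parameter names and ranges are made up for illustration, and a uniform distribution is assumed.

```python
import random

# Only duration_ms (200-800) is from the text; the other entries are assumed.
BASS_DRUM_RANGES = {
    "duration_ms": (200.0, 800.0),
    "base_freq_hz": (40.0, 80.0),   # hypothetical
    "noise_level": (0.1, 0.5),      # hypothetical
}


def synthesize_params(seed, ranges):
    """Pick one value per parameter using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
```

Calling `synthesize_params` twice with the same seed yields the same parameter set, which is what lets every client reconstruct an identical instrument from the synthesis seed alone.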
The following is a list of instruments that will be used to construct a song.
| Percussion | Tonal |
|---|---|
| Bass Drum | Bass |
| Snare Drum | Arpeggiator |
| Clap | |
| Closed Hi-hat | |
| Opened Hi-hat | |
The bass drum is synthesized in two parts: the transient hit and the tone. The transient hit is the initial sound of the drum when it is struck. It is rich in frequencies and is simulated by a short burst of noise. The tone is the sound the skin of the drum makes as it vibrates after the initial hit. Two modes of this vibration are simulated, (0,0) and (0,1). These are the lowest-frequency concentric vibrations of the skin. In addition, when a drum is first hit the skin is tight and thus produces a higher-pitched sound. As time progresses, the skin loosens and produces the normal modes of vibration. This behavior can be reproduced by applying an exponentially decaying envelope to the fundamental pitch of the drum.
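The two-part drum model can be sketched in plain Python as follows. This is not the SuperCollider code used by MUSAIC: every numeric constant (resting pitch, mode ratio, decay times, mix levels) is an assumption chosen only to make the example concrete.

```python
import math
import random


def bass_drum(seed, sample_rate=8000, duration_ms=400):
    """Return a list of samples in [-1, 1]: a noise transient plus a two-mode tone."""
    rng = random.Random(seed)
    base_freq = 55.0     # assumed resting pitch of the lower mode, in Hz
    mode_ratio = 2.3     # assumed frequency ratio between the two modes
    pitch_drop = 2.0     # assumed: pitch starts at (1 + pitch_drop) * base_freq
    n = sample_rate * duration_ms // 1000
    phase1 = phase2 = 0.0
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Exponentially decaying pitch envelope: high at the hit, then settling.
        freq = base_freq * (1.0 + pitch_drop * math.exp(-t / 0.03))
        phase1 += 2 * math.pi * freq / sample_rate
        phase2 += 2 * math.pi * freq * mode_ratio / sample_rate
        # Tone: two sine modes under a shared amplitude decay.
        tone = (math.sin(phase1) + 0.5 * math.sin(phase2)) * math.exp(-t / 0.15)
        # Transient hit: white noise that dies away within a few milliseconds.
        hit = rng.uniform(-1.0, 1.0) * math.exp(-t / 0.005)
        samples.append((tone + 0.5 * hit) / 2.0)
    return samples
```

Accumulating phase sample by sample, rather than computing `sin(2*pi*freq*t)` directly, keeps the waveform continuous while the frequency glides downward.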
The snare drum behaves much like the bass drum and is therefore synthesized using similar methods. Additional modes of vibration and a longer noise burst are used to give the snare a stronger presence. The snare is also run through a resonant low-pass filter to give a larger palette of sound.
The clap sound is simulated quite easily with a simple filtered noise burst. To give the clap its distinct sound, a series of equally spaced bursts of noise is followed by a longer decaying envelope.
Both an opened hi-hat and closed hi-hat are produced using a resonant high-pass filtered noise burst. The opened hi-hat is simply given a longer envelope length than the closed hi-hat.
The tonal instruments, bass and arpeggiator, use a dual-oscillator subtractive synthesis method. Three waveforms are possible for each of the two oscillators: a sine wave, a triangle wave, and a saw wave. Each oscillator has independent controls for amplitude, frequency, and amplitude envelope. Oscillator 1 can also be phase modulated by oscillator 2 to add overtones. The two oscillators are run through a resonant low-pass filter with envelope control over the cutoff frequency.
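The oscillator stage can be sketched in Python as below. The sketch covers only the three waveforms and the phase modulation of oscillator 1 by oscillator 2; the amplitude envelopes and the resonant low-pass filter are left out to keep it short. The function names and the `mod_index` parameter are assumptions for this example.

```python
import math


def wave(shape, phase):
    """Evaluate a waveform in [-1, 1] at a phase measured in cycles."""
    x = phase % 1.0
    if shape == "sine":
        return math.sin(2 * math.pi * x)
    if shape == "triangle":
        return 4 * abs(x - 0.5) - 1
    if shape == "saw":
        return 2 * x - 1
    raise ValueError(f"unknown waveform: {shape}")


def dual_osc(freq1, freq2, shape1, shape2, mod_index, n, sample_rate=8000):
    """Mix two oscillators; oscillator 2 also modulates oscillator 1's phase."""
    out = []
    for i in range(n):
        t = i / sample_rate
        osc2 = wave(shape2, freq2 * t)
        # Phase modulation: oscillator 2's output offsets oscillator 1's phase.
        osc1 = wave(shape1, freq1 * t + mod_index * osc2)
        out.append(0.5 * (osc1 + osc2))
    return out
```

With `mod_index` set to 0 the two oscillators are simply mixed; raising it adds overtones to oscillator 1, which a filter stage could then shape.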
Composition is done in two steps: phrase generation and song generation. A phrase, within the scope of this project, is a small sequence of notes that defines each instrument's playback throughout the song. During the song generation phase, these phrases are layered together to create a measure. Also as part of the song generation phase, the measures are combined sequentially in a probabilistic manner.
The first step in composing the song is creating a phrase for each of the
parts. The phrases vary in size, but are generally short, on the order of 8-32
steps. Probability sequences are used to determine when an instrument will
sound for a specific step. A probability sequence for the drum part might be
[1,0,0,0.2,0.2,0.2,0,0]. This means on the first step of the phrase the drum
will always be played and on steps 4-6 the drum will play only 20% of the
time.
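The probability sequence above can be turned into a concrete trigger pattern with a few lines of Python. The function name `render_phrase` is hypothetical, and drawing one uniform random number per step is an assumption about how the probabilities are applied.

```python
import random


def render_phrase(probabilities, rng):
    """Return one True/False trigger per step of the phrase."""
    # rng.random() lies in [0, 1), so p = 1 always fires and p = 0 never does.
    return [rng.random() < p for p in probabilities]


drum_probabilities = [1, 0, 0, 0.2, 0.2, 0.2, 0, 0]
drum_hits = render_phrase(drum_probabilities, random.Random(1234567))
```

Seeding the generator with the composition seed would make the resulting pattern reproducible on every client.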
For tonal instruments, such as the bass and arpeggiator lines, pitch must also
be considered. First, a root frequency is randomly selected from a specified
range of values. Successive notes are then selected according to a certain
spread factor. For instance, the bass line may first select a root frequency
from a range of 60-120 Hz and then create successive notes within the range
from root/spread to root*spread. After a sequence of notes is generated in this
fashion, the phrase may be extended by adding random transpositions of itself
in series.
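A rough Python sketch of this pitch-selection step follows. The 60-120 Hz root range is the bass-line example from the text; the uniform distribution, the spread value of 1.5, and the set of transposition intervals are assumptions for illustration.

```python
import random


def make_pitches(rng, n_notes, root_range=(60.0, 120.0), spread=1.5):
    """Pick a root, then further notes between root/spread and root*spread."""
    root = rng.uniform(*root_range)
    rest = [rng.uniform(root / spread, root * spread) for _ in range(n_notes - 1)]
    return [root] + rest


def extend_with_transpositions(rng, notes, copies, semitone_choices=(-5, 0, 7)):
    """Append `copies` randomly transposed repeats of the phrase."""
    out = list(notes)
    for _ in range(copies):
        # Assumed: transposition by a whole number of equal-tempered semitones.
        ratio = 2 ** (rng.choice(semitone_choices) / 12)
        out.extend(freq * ratio for freq in notes)
    return out
```

For example, an 8-note phrase extended with two transposed copies gives a 24-note sequence whose first 8 notes are the original phrase.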
Of course, a random set of frequencies will not produce musically pleasing
results. To solve this, the entire sequence of frequencies is quantized to a
randomly chosen scale.
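Quantization to a scale can be sketched in Python as follows. The two scales listed, the 55 Hz tonic, and the use of 12-tone equal temperament are assumptions; the text only says that a scale is chosen at random.

```python
import math
import random

# Scale degrees in semitones above the tonic; this selection is an assumption.
SCALES = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "minor_pentatonic": (0, 3, 5, 7, 10),
}


def quantize(freq, scale, tonic_hz=55.0):
    """Snap a frequency to the nearest scale tone in any octave."""
    semis = 12 * math.log2(freq / tonic_hz)
    octave = math.floor(semis / 12)
    # Consider scale tones in the surrounding octaves and keep the closest.
    candidates = [12 * (octave + o) + d for o in (-1, 0, 1) for d in scale]
    nearest = min(candidates, key=lambda c: abs(c - semis))
    return tonic_hz * 2 ** (nearest / 12)


def quantize_phrase(freqs, rng, tonic_hz=55.0):
    """Choose a scale at random and quantize every frequency to it."""
    name = rng.choice(sorted(SCALES))
    return name, [quantize(f, SCALES[name], tonic_hz) for f in freqs]
```

Distances are measured in semitones rather than hertz so that "nearest" matches how pitch is perceived.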
After the phrases are generated, the song is composed using a simple
first-order Markov chain. The states of the Markov chain define what
phrases will be added in parallel to construct a measure and what state to go
to next. The phrases are put into a measure according to a set of
probabilities defined by the state. An over-simplified Markov chain is
presented in the following diagram.
[Diagram: Markov chain with states A, B, C, and D]
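A first-order Markov chain of this kind can be sketched in Python as below. The four states mirror the diagram labels, but every probability, the part names, and the dictionary layout are invented for illustration and are not the values used by MUSAIC.

```python
import random

# Hypothetical chain. For each state: the probability that each part's phrase
# is included in the measure, and the weights for choosing the next state.
CHAIN = {
    "A": {"parts": {"bass_drum": 1.0, "bass": 0.2}, "next": {"A": 0.5, "B": 0.5}},
    "B": {"parts": {"bass_drum": 1.0, "bass": 0.9, "hi_hat": 0.5}, "next": {"B": 0.4, "C": 0.6}},
    "C": {"parts": {"bass_drum": 0.8, "bass": 1.0, "arpeggiator": 0.9}, "next": {"C": 0.5, "D": 0.5}},
    "D": {"parts": {"arpeggiator": 1.0, "clap": 0.3}, "next": {"A": 1.0}},
}


def compose(chain, start, n_measures, rng):
    """Return a list of (state, parts sounding in that measure) tuples."""
    state, song = start, []
    for _ in range(n_measures):
        node = chain[state]
        # Layer phrases in parallel according to the state's probabilities.
        measure = [part for part, p in node["parts"].items() if rng.random() < p]
        song.append((state, measure))
        # First-order: the next state depends only on the current state.
        states = list(node["next"])
        weights = list(node["next"].values())
        state = rng.choices(states, weights=weights)[0]
    return song
```

Running `compose(CHAIN, "A", 16, random.Random(seed))` with the composition seed would give every client the same 16-measure arrangement.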
Testing MUSAIC produced mixed results. Although many interesting musical
textures seemed possible at first, extensive listening made it obvious that
most of the output sounded similar. Fortunately, the synthesis and
composition algorithms have much room for improvement. Improved synthesis
algorithms would provide a richer and more satisfying palette of timbres.
More complex composition algorithms could reduce repetitiveness and add more
movement to the song as a whole. The voting system worked as planned,
although little testing was done with multiple users. Technically speaking,
this seems to be a minor issue, as the system can effectively be tested by a
single user judging the music. Given these considerations and the fact that
this project is still in its infancy, it has proven to be an effort worth
pursuing further.
The appropriate genre of music depends on the situation. For clubs and DJs, techno and trance would be most suitable. If the system were used in an installation, something more abstract or experimental might be better.
Timing applies both to the local activities of the client and to the time it takes to send packets over the network. One issue is that a user with a large packet round-trip time (RTT) has less time to vote on a song. A solution might be to have the server track the average RTT of each client. The other timing issue is when the master composition should be played.
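One way the server might track average RTT per client is sketched below in Python. The class and function names are hypothetical, and extending each client's voting deadline by its average RTT is only one possible use of the measurement, not something the text specifies.

```python
from collections import defaultdict


class RttTracker:
    """Keep a running mean of round-trip times, in seconds, per client."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, client_id, rtt_s):
        self._total[client_id] += rtt_s
        self._count[client_id] += 1

    def average(self, client_id):
        n = self._count.get(client_id, 0)
        return self._total[client_id] / n if n else 0.0


def voting_deadline(tracker, client_id, window_s):
    """Assumed policy: give slower clients extra time equal to their mean RTT."""
    return window_s + tracker.average(client_id)
```

A client with no recorded measurements simply gets the base voting window.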
At the moment, only a handful of instruments are available. More instruments would provide a wider range of sounds and also help cover more genres of electronic music.
A major problem with representing large amounts of data with small identifiers is that freedom of choice is limited. MUSAIC currently uses two 7-digit integers to represent the song. This means there are 10^7 x 10^7 = 10^14 = 100,000,000,000,000 distinct songs (per template). While 100 trillion choices is not bad, the number can easily be increased by allowing upper- and lower-case letters in the seed, for 62 possible characters per position. This gives 62^7 x 62^7 = 62^14 = 12,401,769,434,657,526,912,139,264 choices.
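The seed-space arithmetic can be checked with a short Python calculation. The function name `song_count` is made up for this example; the alphabet sizes follow from 10 digits plus 26 lower-case and 26 upper-case letters.

```python
def song_count(alphabet_size, seed_length=7, seeds=2):
    """Number of distinct song IDs for the given alphabet and seed layout."""
    return alphabet_size ** (seed_length * seeds)


digits_only = song_count(10)             # two 7-digit seeds
alphanumeric = song_count(10 + 26 + 26)  # digits plus upper and lower case

print(f"{digits_only:,}")
print(f"{alphanumeric:,}")
```

Python integers have arbitrary precision, so the 26-digit result for the alphanumeric case is computed exactly.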