Join us in a deep dive into the sound design for Elastic Man - a 100% procedural web experience produced for Adult Swim in collaboration with digital artist David Li.
We present a concise but thorough overview of how we approached the sound design for this hyper-realistic and fairly gross simulation of Morty's face (of Rick and Morty).
This article focuses on how we reconciled both technical obstacles and aesthetic requirements, while leaving room for creative exploration and happy accidents.
But first, if you haven't done so already, try out the experience here.
We received one of the most entertaining sound design briefs we have ever laid our eyes on for this project. The requested sounds originally included belly gloop ("don't ask"), fleshy slapping, muddy sloshing, rubbery stretching and squeaky toy sounds. The goal was to create a hyper-realistic and gross rendition of Morty's face.
Morty's skin can be stretched away from his face, making a rubbery stretching sound that's modulated based on how hard and fast you're pulling:
After being stretched the skin snaps back when let go, making a fleshy slapping sound:
This slapping action causes ripples to travel around the head like a water balloon, resulting in mud-like sloshing sounds that follow the viscous movement:
While the sloshing sounds provide a good basis for the head ripples, an extra layer of gloop would follow larger head deformations. David wanted Morty's face to be like a stomach suspended in space, and so we created a belly gloop model based on some footage from our mood board.
|Parameter|Type|Description|
|---|---|---|
|Physical Volume|Float|Approximate volume of Morty's face|
|Physical Volume Delta|Float|Volume change over time|
|Dragging State|Boolean|Whether the mouse is dragging the skin or not|
|Dragging Delta|3D Vector|Distance from the point of dragging|
|Dragging Velocity|Float|Dragging velocity|
|Surface Area|Float|Total surface area of Morty's face|
|Surface Area Delta|Float|Surface area change over time|
|Slap|Event|Sends an event when Morty's face has been slapped|
The heavy patch was developed as a singleton - in other words, it contains multiple sound modules, and it handles all parameter routing, mapping and post effects including levels, ducking, compression and so forth. Some parameters in the patch such as volume levels were exposed, allowing David or any other developer to easily fine-tune these at a later stage.
The sound modules are described below in order of increasing complexity. This post focuses primarily on the sound design process: how the designs of our models and their corresponding mapping (integration) strategies were shaped by our aesthetic priorities and choices.
This model generates a fleshy slapping sound at variable intensity when the face is pinched and let go. The visualisation system sends a message to the sound model each time a slap has occurred. The severity of the slap is based on the stretching distance at the point of letting go.
We implemented a very simple signal chain to recreate this sound, consisting of white noise, envelopes and dynamic filters. As will be explained momentarily, the type of sound we created consists of two very short but distinct temporal sections that we named 'floop' and 'tail' (for lack of better terminology). In a similar vein, the model consists of two parts, controlled by the same incoming parameters from the visualisation.
While this turned out to be the simplest model in terms of both parameterisation and signal flow, the sound was difficult to emulate without the aid of reference material. This tends to be the case with most transient sounds: because we are not used to working in such fine temporal detail, it is difficult to rely solely on our intuition when emulating them procedurally. Two slapping sounds were created in a DAW using conventional sound design techniques and used as the main points of reference in the development of the model. These two sounds represented slaps at the highest and lowest intensities. With the aid of spectral analysis and careful listening, these samples were separated into smaller components (or layers) that could easily be recreated procedurally. By using the same signal chain to recreate both sounds, we can interpolate between the two states to create a continuous parameter space representing the severity of the slap.
The sound was separated into 'floop' and 'tail' sections.
The 'floop' section is a very short (10ms) burst of filtered noise with a reversed exponential envelope. Besides shaping the amplitude of the noise, the envelope also modulates the cut-off frequency of a resonant band-pass filter. This emulates the initial impact on the skin and, by scaling the affected ranges of the envelope, can be made to sound like a small tap, a heavy slap and everything in between.
The 'tail' section represents a downward sloping envelope of noise passed through a separate bank of filters, emulating an almost reverb-like decay often found in cartoon soundtracks. The length of the tail section, frequency ranges of the filters and general volume are all modulated based on the intended severity of the slap.
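The two sections and the interpolation between the softest and hardest slap can be sketched in a few lines of Python. This is a rough illustration rather than the patch we shipped: the parameter names and values in `SOFT` and `HARD` are made up, the filters are left out entirely, and "reversed exponential" is read here as an envelope that rises exponentially towards the end of the burst.

```python
import math
import random

SAMPLE_RATE = 44100

# Hypothetical parameter sets for the softest and hardest slap.
SOFT = {"floop_gain": 0.2, "tail_gain": 0.1, "tail_ms": 40.0}
HARD = {"floop_gain": 1.0, "tail_gain": 0.6, "tail_ms": 180.0}


def slap_params(severity):
    """Linearly interpolate every parameter between SOFT and HARD."""
    s = min(max(severity, 0.0), 1.0)
    return {k: SOFT[k] + s * (HARD[k] - SOFT[k]) for k in SOFT}


def render_slap(severity, floop_ms=10.0, rng=None):
    """Return unfiltered noise shaped by a 'floop' and a 'tail' envelope."""
    rng = rng or random.Random(0)
    p = slap_params(severity)
    n_floop = int(SAMPLE_RATE * floop_ms / 1000)
    n_tail = int(SAMPLE_RATE * p["tail_ms"] / 1000)
    out = []
    for i in range(n_floop):
        # Reversed exponential: quiet at the start, loudest at the end.
        env = math.exp(-5.0 * (1.0 - (i + 1) / n_floop))
        out.append(p["floop_gain"] * env * rng.uniform(-1.0, 1.0))
    for i in range(n_tail):
        # Downward slope: exponential decay towards silence.
        env = math.exp(-5.0 * i / n_tail)
        out.append(p["tail_gain"] * env * rng.uniform(-1.0, 1.0))
    return out
```

Calling `render_slap(0.3)` yields a light tap and `render_slap(1.0)` a heavy slap with a longer tail, both produced by the same chain with different parameter values.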
Have a go with the interactive model here:
The gloop module is based on a common synthesis technique known as granular synthesis. This technique relies on randomness and probability distributions to create dynamic textures out of smaller microscopic 'grains'. A grain can be generated out of a fixed waveform (granular sampling) or out of primitive synthesis components such as a sinusoidal oscillator, as is the case here.
Each grain of gloop generated by our model consists of a random bubble-like sinusoidal chirp that varies in frequency range, duration and density as a function of movement. In other words, each time a chirp is triggered, its amplitude, duration and start/end frequencies are randomised within fixed ranges. This is demonstrated in the following interactive example:
When triggered in fast succession these chirps resemble some of the gloopy bursts of bubbles we were aiming for. Chirp intervals are randomised (within a range of 20-300 milliseconds) on each new trigger, resulting in an arbitrary and uninterrupted stream of bubbles.
To change the stream density we generate another random number on each trigger determining whether a bubble should be off or on. The density can now be modulated interactively by setting a probability threshold expressed as a percentage.
The maximum density can be increased by using multiple copies of the above process simultaneously (i.e. multiple streams). Even at low densities a higher quality of sound can be achieved with multiple voices due to the ability to overlap bubbles. This implementation only required two voices, as we didn't need very dense clusters of bubbles and valuable CPU cycles were better spent on other aspects of the sound.
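The bubble stream described above can be approximated with the following Python sketch. The 20-300 millisecond interval range comes from the text; the chirp ranges (duration, frequencies, amplitude), the upward sweep and the sine-shaped fade are assumptions chosen for illustration, and density is expressed as a probability between 0 and 1 rather than a percentage.

```python
import math
import random


def make_chirp(rng, sample_rate=44100):
    """Render one bubble: a sine sweep with randomised parameters."""
    duration = rng.uniform(0.01, 0.06)       # seconds (assumed range)
    f_start = rng.uniform(200.0, 600.0)      # Hz (assumed range)
    f_end = f_start * rng.uniform(1.5, 3.0)  # assumed upward sweep
    amp = rng.uniform(0.2, 1.0)
    n = int(duration * sample_rate)
    phase, out = 0.0, []
    for i in range(n):
        t = i / n
        freq = f_start + (f_end - f_start) * t
        phase += 2.0 * math.pi * freq / sample_rate
        env = math.sin(math.pi * t)  # fade in and out to avoid clicks
        out.append(amp * env * math.sin(phase))
    return out


def schedule_stream(rng, density, total_ms):
    """Return onset times (ms) of the bubbles kept by the density gate."""
    onsets, now = [], 0.0
    while now < total_ms:
        if rng.random() < density:       # second random number: on or off
            onsets.append(now)
        now += rng.uniform(20.0, 300.0)  # randomised trigger interval
    return onsets
```

Two voices then amount to two independent streams, for example `[schedule_stream(random.Random(seed), 0.6, 5000.0) for seed in (1, 2)]`, whose bubbles are free to overlap.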
The next step is to map parameters from the physical simulation to the sound model. This is the fundamental step that transforms the sound model from a monotonous texture generator into a dynamic and interactive gloop simulator. Each aspect of the sound model described above can be made to react to an incoming stream of values from the visualisation engine.
Initially, bubble density was linearly mapped to the velocity of face movement (volume delta). This creates the impression that a larger number of bubbles is generated when there is a lot of movement, and no bubbles when there is no movement. As any professional belly glooper might confess, this is quite close to what one would expect in reality. But some more variation and nuance is required to perfect the sonic interaction.
After some trial and error the mouse dragging velocity was used to amplify the bubble density, making the gloop seem more responsive to mouse movement.
This has the primary advantage of allowing it to come through in the mix on top of some of the other sound modules described below. Bubble density increases with the velocity of face movement (volume delta) and mouse dragging speed.
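As a rough sketch of this mapping, the function below derives a density between 0 and 1 from the volume delta and lets the dragging velocity amplify it. The multiplicative form and the constants `delta_max` and `drag_boost` are assumptions for illustration; the real values were found by ear.

```python
def bubble_density(volume_delta, drag_velocity,
                   delta_max=1.0, drag_boost=2.0):
    """Map simulation values to a bubble probability in [0, 1]."""
    # Linear mapping: no movement gives no bubbles.
    base = min(abs(volume_delta) / delta_max, 1.0)
    # Dragging faster amplifies the density that is already there.
    boosted = base * (1.0 + drag_boost * max(drag_velocity, 0.0))
    return min(boosted, 1.0)
```

Because the dragging velocity only scales the base density, a still face stays silent no matter how fast the mouse moves.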
Like the belly gloop module, the sloshing sound suggests viscous liquid squanching on itself - in other words, producing lots of tiny little bubbles in response to movement or deformation.
To emulate this sound we would need a much higher density of grains and we would need to follow the face's movement more closely.
In theory it would be possible to use the same synthesis module as above with many more voices and different parameter ranges, but a simpler and more computationally efficient solution can be found instead.
As one might expect, the solution we found is less capable of generating convincing belly gloop noises, but can instead efficiently generate the dense sloshing sounds that are required here. Let's pinpoint some of the fundamental differences to our belly gloop generator:
Therefore, instead of considering the model as aggregated voices of individual bubble-producing modules, we can instead try to create a single module that generates or approximates a very fast sequence of sinusoidal bursts (at the cost of lower definition per grain). From a practical perspective this would drastically reduce the computational cost of the model. From a creative perspective this opens up new opportunities for exploration and discovery. This also keeps us from unintentionally homogenising the sound image by reusing the same model.
The synthesis method that we used to create this effect has some similarities to a sample-and-hold operation. First off, we go back to our trusted friend, the white noise generator. All that a white noise generator does is generate a random number on every new audio sample. We can perform logical operations on these numbers, so as to generate a new sequence of numbers constituting a new and useful signal.
In this case we are checking if the numbers fall below a specified threshold t. When the number is below t we output a one, otherwise we output a zero. The resulting signal is a random sequence of impulses that gets denser as t increases.
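In Python, the thresholding step might look like the sketch below. It assumes the noise source produces uniform random numbers between 0 and 1, so that t can be read directly as the probability of an impulse on any given sample.

```python
import random


def impulse_train(threshold, n_samples, rng):
    """Output 1 when a fresh random number falls below the threshold."""
    return [1 if rng.random() < threshold else 0
            for _ in range(n_samples)]
```

With `threshold=0.001` at a 44.1 kHz sample rate this gives roughly 44 impulses per second on average; raising the threshold makes the stream denser.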
We've got some basic granular behaviour now, but there's still some work to be done before we start nearing the sloshing sound that we're after. Instead of outputting it directly to the speakers, we use the stream of ones and zeroes as a gating signal, causing a separate pitched signal to rapidly turn on and off. When the number falls below a specified density threshold, a sinusoidal oscillator is toggled and set to a random frequency.
This brings us closer to our reference sloshes, but the overall sound is still quite harsh: the grains are way too short and contain a lot of ugly transients. Because the noise generator is producing new numbers at such a fast rate, sinusoidal bursts are extremely short and abrupt. The resulting sound resembles the sound of frying eggs. By smoothing the signal that is toggling the sinusoidal oscillator we end up with longer and smoother bursts with more perceivable pitches. The sound becomes squelchy and suggests a change in wetness or viscosity.
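The gating and smoothing steps can be sketched as follows. The one-pole low-pass used to smooth the gate, the `smoothing` coefficient and the 300-3000 Hz frequency range are assumptions for illustration, not the values used in the final patch.

```python
import math
import random


def slosh(threshold, smoothing, n_samples, sample_rate=44100, seed=0):
    """Gate a randomly retuned sine with a smoothed noise-derived signal."""
    rng = random.Random(seed)
    gate_smooth, phase, freq = 0.0, 0.0, 440.0
    out = []
    for _ in range(n_samples):
        gate = 1.0 if rng.random() < threshold else 0.0
        if gate:
            # Each new grain picks a random pitch (assumed range).
            freq = rng.uniform(300.0, 3000.0)
        # One-pole low-pass: smoothing close to 1.0 gives longer bursts.
        gate_smooth += (1.0 - smoothing) * (gate - gate_smooth)
        phase += 2.0 * math.pi * freq / sample_rate
        out.append(gate_smooth * math.sin(phase))
    return out
```

With `smoothing=0.0` the gate passes through untouched and the result is the harsh frying-egg texture; values such as `0.995` stretch each grain into a short pitched burst.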
Density, wetness and frequency range were mapped to react to changes in surface area and volume of Morty's face. The resulting sound suggests viscous fluid squelching as it is forced through air cavities inside the wobbling head:
The sound of rubbery materials stretching is the result of a process known as stick-slip friction. As two surfaces rub against each other they periodically slip away from each other and stick together again due to dynamic friction forces.
Friction simulations can easily become a technical rabbit hole. Numerical simulations that simulate the underlying forces have been developed in academia, most notably by Stefania Serafin in a musical real-time context. These are known to be very difficult to parameterise and control in a sound design context. They can also be computationally expensive and require the physics simulation to be calculated at audio rate.
Instead, we can try to imitate the resulting sound directly, as proposed by Andy Farnell in Designing Sound (Chapter 32). A combination of a pulse-train oscillator and a sawtooth oscillator was used to simulate individual 'sticks' and 'slips'. The frequency of each increases as a function of stretching velocity. Filtered noise is used to randomly modulate this frequency value and generate a rougher sound (like screeching tires).
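A very loose sketch of this idea is shown below. It is neither Farnell's patch nor ours: the velocity-to-frequency mapping, the pulse width, the noise filter coefficient and the equal mix of the two oscillators are all assumptions made for the sake of a short example.

```python
import random


def friction_osc(velocity, n_samples, sample_rate=44100, seed=0):
    """Mix a sawtooth and a pulse train whose rate follows velocity."""
    rng = random.Random(seed)
    base = 20.0 + 400.0 * velocity  # Hz, assumed mapping
    phase, wobble, out = 0.0, 0.0, []
    for _ in range(n_samples):
        # Low-pass filtered noise roughens the frequency.
        wobble += 0.01 * (rng.uniform(-1.0, 1.0) - wobble)
        freq = base * (1.0 + 0.5 * wobble)
        phase = (phase + freq / sample_rate) % 1.0
        saw = 2.0 * phase - 1.0               # ramp, read as the 'stick'
        pulse = 1.0 if phase < 0.1 else 0.0   # short burst, the 'slip'
        out.append(0.5 * saw + 0.5 * pulse)
    return out
```

Pulling faster raises `base`, so sticks and slips follow each other more quickly and the perceived pitch climbs.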
To add more detail and realism to the sound, a similar signal chain to the one described above is used to modulate the first set of oscillators. This makes it possible to have macro and micro friction sounds - in other words, stick-slip motions happening within a larger stick-slip window generated by a lower-frequency oscillator.
As it stands, the general stick-slip behaviour of the model is suitable; however, the output has an extremely flat spectral signature, which makes it sound very unnatural. A resonator is required to give the impression of a virtual object amplifying the sticking and slipping motion we have modelled.
We used our ears to tune a set of filters, each taking as input the output from our friction oscillator. Centre frequencies, bandwidths and gains of the filters were tuned to suggest the sound of a rubbery band.
We then introduced a size parameter to scale these frequencies proportionally. As the filters scale up in frequency they approximate the sound of a virtual object being stretched. This is the final ingredient that is needed to create the sound of rubbery stretching we were after. You can hear the improvement in sound quality in the interactive demo at the end of this section.
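The resonator and its size parameter can be sketched as a small bank of two-pole filters. The mode table below is invented for illustration (the real filters were tuned by ear), and the two-pole resonator is simply one convenient filter type, not necessarily the one used in the patch.

```python
import math

# Hypothetical hand-tuned modes: (centre Hz, bandwidth Hz, gain).
MODES = [(180.0, 30.0, 1.0), (410.0, 50.0, 0.6), (950.0, 90.0, 0.3)]


def scaled_modes(size):
    """Scale every centre frequency by the same size factor."""
    return [(f * size, bw, g) for f, bw, g in MODES]


def resonate(signal, size, sample_rate=44100):
    """Sum the outputs of one two-pole resonator per mode."""
    out = [0.0] * len(signal)
    for f, bw, gain in scaled_modes(size):
        r = math.exp(-math.pi * bw / sample_rate)  # pole radius below 1
        a1 = 2.0 * r * math.cos(2.0 * math.pi * f / sample_rate)
        a2 = -r * r
        y1 = y2 = 0.0
        for i, x in enumerate(signal):
            y = (1.0 - r) * x + a1 * y1 + a2 * y2
            y2, y1 = y1, y
            out[i] += gain * y
    return out
```

Sweeping `size` upwards while the friction oscillator runs moves all resonances together, which is what suggests a single object being stretched.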
Further down the line we decided to pass all of the other modules through the same resonator to create a more believable and holistic sound image.
As in all the other modules described here, a lot of the detail and finesse lies in the way that data extracted from the physical simulation is mapped to the sound model. In this case we developed a workflow where mappings were stored into look-up tables that could easily be modified by hand in the design process. For example, the relationship between stretching velocity and oscillator frequencies can be changed with a single mouse gesture, rather than expressing it mathematically. This made it possible to generate very complex behaviours while retaining a high degree of artistic control. Fast iteration is the key here; technical nitpicking distracts from aesthetic problems, which are better solved through a process of creative exploration and trial-and-error.
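A look-up table mapping of this kind can be sketched as below. The table contents are hypothetical, and the input is assumed to be normalised to the range 0 to 1 before the table is read.

```python
def lookup(table, x):
    """Read a table at position x in [0, 1] with linear interpolation."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(table) - 1)
    i = min(int(pos), len(table) - 2)  # keep i + 1 inside the table
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


# Hypothetical hand-drawn curve: stretching velocity -> frequency (Hz).
VELOCITY_TO_FREQ = [20.0, 25.0, 40.0, 90.0, 220.0]
```

Redrawing the curve only means changing the numbers in `VELOCITY_TO_FREQ`; the code that reads it stays the same, which is what keeps iteration fast.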
Further down the line we felt that the stretching sounds could use a bit more density. We were still far from maxing out our CPU budget, so we solved the problem by simply running three of these modules simultaneously. As the micro-textures are all generated by independent noise generators, each module ends up producing a slightly different sound.
It is worth noting that while some of these modules rely on randomness at a microscopic level, the way that they react to interactions is highly deterministic. We deem this to be very important when creating interactive sound experiences - create variation and diversity by increasing the degrees of freedom, rather than adding random variance. This leads to a greater sense of control and immersion, where repeating the same gesture does result in the same output. (Of course, the real power of interactive computational audio only manifests itself when the available input interaction is sufficiently nuanced).
When dealing with multiple models on a single object the overall sound quality can become monotonous if each model is dependent on the same input parameter and mapping strategy. If the interaction has multiple degrees of freedom then one way of keeping a diverse and responsive sound image is to create distinct mappings for each sound module.
For example, in the case of the belly gloop module we decided to exaggerate its responsiveness to mouse dragging velocity in order for it to stand out during particular interactions. In the slapping module we decided to take the distance along both horizontal and vertical axes into account to introduce more variation to the sound image. While the choice of variation is more or less arbitrary, the relationship to the input data is set in stone - so a slap against Morty's masseteric fascia from a 30 degree angle and a distance of 88 pixels WILL sound the same way twice.
The purpose of this blog post was to provide a quick but deep dive into the creative process behind computational sound design. For the sake of clarity we have omitted some details that made it into the final implementation, but don't hesitate to get in touch if you have any questions.