The Sound Authoring Process
In Chapters 4–6 we'll examine the first part of the communication chain described in Chapter 1, where we author sound to create a particular effect:

CASTING different sounds according to function (Chapter 4);

RECORDING, STORAGE and PLAYBACK of sound (Chapter 5); and

DIGITIZATION and EDITING methods for digital sound (Chapter 6).

Sound waveforms, as described in Chapter 1, can now be seen in the context of recording and digitizing a specific sound to create a virtual sound experience. The process involves choosing from among the many types of sound sources available to the composer, and then storing and editing these sounds via analog-to-digital (A-D) and digital-to-analog (D-A) conversion. These are the basic functions of a digital audio workstation (DAW) for desktop audio production, i.e., the sound hardware and software present within your computer. In subsequent chapters we’ll examine more advanced methods for processing audio.

The compositional process involves capturing many different types of sounds, then editing, processing, and assembling them into a final product. But first we have to have something worthwhile to record. Casting involves finding sounds that are both appropriate and worth recording for a particular audio production, and then assigning a function to each sound; the sound can then be termed an audio cast member. What’s appropriate to cast is completely up to the composer; only the imagination should limit what sounds one might want to use for a particular application. What’s sometimes more difficult is identifying sound sources that have something special about them: a certain sonic richness. Even more important is identifying sounds that will transfer successfully to the desktop audio medium. It’s somewhat akin to photography; you can be in the midst of a beautiful landscape, but only so many of the potential scenes will transfer successfully according to the composer’s intent. Furthermore, they need to be sounds that can be captured by a recording device with a minimum of inherent noise and distortion. It is tempting to use pre-recorded sounds, and indeed “second hand” sampled sounds in the hands of an expert can yield incredibly satisfying results (as with many types of pop music). But in many multimedia productions you hear the same types of music and effects over and over again, which bores the user instead of exciting her.

A composer needs to be especially careful that a sound is not overused, owned, or clichéd; otherwise the immediate reaction of some listeners will be exactly like their reaction to a radio station they don't like. Listeners are capable of making extremely fast judgments to “tune out” upon hearing certain sounds. Hence, it’s as important to become proficient at recording your own original sonic material as it is to know how to capture and manipulate the sounds made by other people.

There are many types of sound sources that one will usually need to cast in a multimedia environment. For our purposes, we'll divide them into three categories, seen at the left of Figure 4.1. The first is sounds made by humans, in particular vocal music and the spoken voice. The second category is “sound sources of definite pitch,” which includes musical instrument sound sources according to your personal definition of music: “traditional,” “non-traditional,” samples, dubs, acoustic sounds, or electronic sounds produced by samplers, synthesizers, and so on. Usually, musical sounds have a definite pitch, except for certain percussion instruments. Music can be defined for present purposes as a layer of sound that is not directly connected to an action within a visual metaphor or to an object that is seen. The third category is for sounds mostly used for sound effects, which sometimes but not always have an indefinite pitch. This includes sounds that are directly connected to a visual action, metaphor, or object; making such connections is practically a science for the class of cinematic sound effects known as Foley sound (click here for an example). The number of sounds that can be connected to visual objects is infinite: toasters, wind, automobiles, latches, etc.

FIGURE 4.1. Relationship between different classifications of sound sources, and their eventual application within an audio production.

Ultimately, the problem with categorizing sounds according to characteristics of the original sound source is that it is arbitrary; many sounds could fit into any of these three categories from moment to moment. For instance, non-traditional sound sources can easily be manipulated to sound like musical instruments, and both music and sound effects can be used to suggest a mood or attitude to the listener. As an alternative, it is easier to categorize sounds as possible cast members according to their function. Figure 4.1 also shows how sound source characteristics map onto the Narration, Music and Sound Effect functions. Note also that the boundaries between these functional categories are not permanently fixed; a single sound source can act as any of these three cast types, even in combination.

In the discussion that follows, we'll adopt some fairly narrow working definitions of the narration, music and sound effect functions, using traditional examples. The main point is that these three functions are commonly used simultaneously in the process of composing all of the audio for a multimedia production. But there's nothing wrong with imagining less traditional alternatives, such as pounding a garbage can lid to form an understandable, primitive narrative that's also musical (click here).

Narration is defined here as the use of the spoken word for the purpose of presenting a story to a listener, that is, a sequence of ideas that forms an image within the listener's imagination. While we use a number of different media to “tell the story” in a multimedia production, the power of the spoken voice can override other types of cues, since the subtle power of different types of narrative style is profound.

A narrator's voice is more similar to an actor's performance in a movie than to a written narrative, in that we can immediately understand important information regarding the characterization of the speaker and the mood of the situation just by listening to the narrator's delivery. A monologue or a dialogue in a play or movie script is only words on paper; the characterization arises through the narrative performance. The manner in which characterization affects delivery is manifested in such things as timing, variation in pitch, volume, and other features within the range of variation available to a single person. Additionally, there are features of the voice that are indicative of, or unique to, a particular individual or group of people (including stereotypes of particular people). We can guess a person's age, whether they're happy or sad, their language, dialect, and even cultural standing.

Most people use a range of characterizations in their everyday speech without being conscious of the fact. We adopt a certain type of character that directly affects our vocal delivery towards particular persons: one voice for people we’re intimate with, another for strangers, another for parents or family, and so on. Most of these characterizations are probably absorbed culturally by learning from others, but they are also absorbed from mass media, especially television. These characterizations also help define societal gender roles. Note how a certain “authoritative intimacy” is often found in the monologue used in personal care product advertisements aimed at women, but not for a cross-gender product such as toothpaste.

Listen to the following examples of different narrations of the same simple text, “here it comes now,” and imagine the different scenarios that would accompany each delivery.

Click here for sound example one

Click here for sound example two

Click here for sound example three

Click here for sound example four

What are the sonic characteristics that allow us to differentiate between these different examples? Using the methods for describing sound outlined in Chapters 1–3, it’s possible to see how varying pitch, intensity, the type of frequency modulation, and the temporal sequence of the words help us associate certain scenarios and situations with the narrative. We match these physical acoustical variations with cultural associations of fear, anger, or boredom.

Some people think of music in terms of a recognizable “tune,” while to others any assemblage of sound or even silence (thanks to John Cage) can be viewed as music. As if tied to the old adage “I don’t know much about art but I know what I like,” many people are quite limited in their perceptions of what constitutes acceptable music. For commercial multimedia audio production, it’s often safer to aim for the lowest common denominator. It’s a whole art form to create musical tunes that sound vaguely familiar but not so familiar that one would have to pay royalties for the use of the song. Or you can purchase the rights to certain tunes, or use ones that are in the public domain. The danger in using popular music arises when a production is intended to have a lifetime longer than the music's popularity.

Traditionally one considers melody, harmony and rhythm as separate domains, with melody considered the most important. This is a Western notion; for instance, in Indian classical music, the importance of rhythm is paramount (click here).

One technique commonly used in multimedia is the ostinato. In fact, it’s easy to create a simple ostinato by taking a snippet of sound and then repeating it (click here to hear a snippet; click here to hear the sound looped a few times). This gets overdone since most multimedia packages allow continuous sound loops. You can save on disc space by looping a sound but it will become noticeably repetitious. Usually the function of an ostinato pattern is to set a mood without being noticed.
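The looping idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the snippet is a synthetic decaying tone rather than a recorded sound, and the names `make_snippet`, `loop_snippet` and `write_wav`, along with the 22,050 Hz sample rate, are hypothetical choices that do not come from any particular multimedia package.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # samples per second (an assumed, illustrative value)


def make_snippet(freq_hz=220.0, seconds=0.25):
    """Build a short decaying tone to stand in for a recorded snippet."""
    n = int(SAMPLE_RATE * seconds)
    return [
        math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE) * (1.0 - i / n)
        for i in range(n)
    ]


def loop_snippet(snippet, repeats):
    """Repeat the snippet end to end; only one copy needs to be stored."""
    return list(snippet) * repeats


def write_wav(path, samples):
    """Write mono float samples in the range -1..1 as a 16-bit WAV file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


snippet = make_snippet()
ostinato = loop_snippet(snippet, 4)
print(len(snippet), "samples stored;", len(ostinato), "samples heard")
```

Calling `write_wav("ostinato.wav", ostinato)` would save the result for listening. Because every repetition is identical, the pattern becomes noticeably repetitious very quickly, which is the trade-off the text describes.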

Complex ostinati are much more interesting. For instance, the sound examples used at the start of Chapter 1 were based on an ostinato. In the first example, a single ostinato pattern is heard, played on the lower notes of the guitar (click here to hear the example). In the second, there are two ostinato patterns, one in the piano and one on the marimba (click here to hear the example).

We’ve already shown how music can change the mood of a particular text, in Chapter 1 on page 2. If we vary the type of music used with a particular narration, we can alter the emotional content as well. For instance, click here to listen to a “cute” setting of a little boy’s voice. But if you use this type of music instead (click here), the feeling is completely transformed.

Note that without music, we would have needed to change the style of delivery used in the narrative to create a different effect. Through the combination of a narration with musical accompaniment, we can manipulate the “deeper meaning” of the dialogue.

Musical accompaniment in advertising is a good source of examples of the psychological effects of harmony. For instance, which of the three chord-change combinations makes you feel more confident about a product?


here ... or ... here

here ... or ... here

The first example uses what musical harmonic theory terms an “unresolved” harmonic sequence. The second example uses no particular harmonic sequence at all; instead, a consonant harmonic combination is followed by a much more dissonant harmonic combination. In the third example, the harmonic sequence is resolved, using the consonant (and sometimes sickeningly sweet) sound of major 7th chords.

Sound Effects (efx)
Sound effects (“efx” for short) are used within multimedia applications for two purposes, distinguished by the type of visual object or action they are associated with. First, there are efx that respond not to a visual object on the screen but to a physical action by the user, such as clicking the mouse. These create a sensation of interactivity, owing to our real-world familiarity with physical actions resulting in sonic feedback.

The following are some of the stereo sounds developed for mouse interaction on a Macintosh by Jay Boersma. Bored with ordinary monaural system beeps, Jay developed these sounds for various system alerts in order to exploit stereo playback.

click here

click here

click here

Many other programs for all types of platforms have methods for attaching different sound efx to various computer alerts. While the above examples are very good, the fact that people frequently turn the volume off for many computer alerts or change them after a period of time points to the need for flexibility and a compositional approach.

The term “found sound” comes from the equivalent concept in art, the “found object.” Sound designers obtain many of their best sound efx by gathering whatever they “find” during the collection phase of the creative process. This is a period where an artist collects materials without regard to the execution of a particular goal or project; it can be one of the most enjoyable and creative periods of the artistic process.

In that spirit, I recently had to kill an hour or two while waiting in a friend's office for them to return from a lesson. Luckily, I was equipped with a good microphone and a digital tape recorder; it's amazing what you can find in a simple space such as an office. Try guessing what the following sounds are; they are identified on the following page. Note that the literal realism of the sounds will be affected by the quality of your playback system, in particular the loudspeakers. Sounds such as these are “rich” in potential for sound efx, as you shall hear on the next page.

Click here to listen to the first found sound
write down what you think it is
Click here to listen to the second found sound
write down what you think it is
Click here to listen to the third found sound
write down what you think it is
Click here to listen to the fourth found sound
write down what you think it is
Click here to listen to the fifth found sound
write down what you think it is
Click here to listen to the sixth found sound

Click here to listen to the first found sound again; this is a fire alarm bell that was hanging on the wall. Note that with a struck object, the timbre can vary radically as a function of the physical make-up of the striking device and, to a lesser degree, as a function of the force used in striking the object. A familiar example is how a snare drum can immediately sound more “rock” or more “jazz” depending on whether the drummer uses “sticks” (drum beaters made of wood) or “brushes” (drum beaters made of thin wires). In this example, I used a coin to strike the metal. The actual fire alarm would never be heard in this way; a metal beater strikes the bell repeatedly, never allowing the amplitude envelope to die out completely.

Click here to listen to the second found sound again; you may have guessed that it was made by rubbing paper together. But if your ears (or playback system) are really good, you'd recognize this as the sound of money (specifically, US currency) being counted with two hands. As you might or might not guess, bankers, professional gamblers and blind adults are quite good at distinguishing the sound of the “real thing” from mere paper. A finely developed sense of timbre is cultivated out of practical necessity; here, it is the timbre that results from the fiber content and size of money. A more obvious example is the sound of a coin rattling (click here).

Click here to listen to the third found sound again. This one is pretty easy; most of you who have ever had a desk job will recognize the sound of a length of transparent tape being pulled out of a dispenser and then cut with the dispenser's teeth. But there's more to this sound than first meets the ear. Note that the sound overall is composed of three distinct components: first, the sound of the glue separating the length of tape from the roll; second, the silence associated with orienting the tape up and over the teeth once it has been pulled out; and third, the quick tearing sound of the teeth cutting the tape. The second, silent component is important to the realism of the effect; remember that you have to pull the tape at an upward angle away from the teeth, both to extend its length and to gain enough distance to apply the force needed to separate the tape. Dyed-in-the-wool sound efx designers can find it fascinating how a detailed study of sound reveals the complexity of what might otherwise be considered a mundane and forgettable everyday action. But the power of understanding this is revealed when you try to create this sound incorrectly.

Click here to listen to the fourth found sound again. You may have guessed that this is some sort of water sound, but you may be surprised by the source. This sound was produced by simply rocking a plastic water bottle back and forth with just a bit of water inside, and then playing back the sound at half speed. The resulting illusion is of a much larger body of water being displaced by a large object, a bit like the sound of a houseboat tied to a dock as someone steps aboard.
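Half-speed playback of a digitized sound can be sketched as reading the samples at half the normal rate. The sketch below is a minimal illustration, not the method used for the example above: the function name `change_speed` is hypothetical, the sound is assumed to be a plain list of sample values, and linear interpolation between neighboring samples is an assumed, simple way of filling in the in-between values.

```python
def change_speed(samples, speed):
    """Resample so the sound plays at `speed` times its original rate.

    A speed of 0.5 doubles the duration and, when the result is played
    at the original sample rate, lowers the pitch by one octave.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int(len(samples) / speed)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * speed               # fractional read position in the source
        i0 = int(pos)                 # sample just before the read position
        i1 = min(i0 + 1, last)        # sample just after, clamped at the end
        frac = pos - i0
        out.append((1.0 - frac) * samples[i0] + frac * samples[i1])
    return out


slowed = change_speed([0, 2, 4, 6], 0.5)
print(len(slowed), "samples from 4")
```

Slowing a sound this way stretches every event in time and shifts all of its frequencies downward together, which is consistent with a small bottle of water suggesting a much larger body of water.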

Click here to listen to the fifth found sound again. This is an example of slightly more complicated processing. If you guessed that this is the sound of a whip striking something, you've heard the intended effect. But you may be surprised to know that the sound source was originally a page being turned in a book (actually, a music score, which has larger and thicker paper than most books). Click here to listen to the original sound. For processing, the pitch was shifted 50% lower, and then equalization was used to emphasize frequencies in the region between 2 and 6 kHz, yielding the slightly distorted “biting” sound as the whip makes contact.
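The equalization step can be sketched as a single peaking filter that boosts a band of frequencies and leaves the rest largely untouched. The equalizer actually used for the example is not specified, so this sketch swaps in the widely used "Audio EQ Cookbook" peaking biquad. The function names, the 44.1 kHz sample rate, the 12 dB boost, and the choice of a center frequency at the geometric mean of 2 and 6 kHz are all assumptions made for illustration. The pitch-shifting step is not included.

```python
import math


def peaking_eq_coeffs(sample_rate, center_hz, q, gain_db):
    """Return normalized biquad coefficients (b0, b1, b2, a1, a2)."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a
    b0 = (1.0 + alpha * a) / a0
    b1 = (-2.0 * math.cos(w0)) / a0
    b2 = (1.0 - alpha * a) / a0
    a1 = (-2.0 * math.cos(w0)) / a0
    a2 = (1.0 - alpha / a) / a0
    return b0, b1, b2, a1, a2


def apply_biquad(samples, coeffs):
    """Run the samples through a second-order (biquad) filter."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def emphasize_band(samples, sample_rate=44100, low_hz=2000.0,
                   high_hz=6000.0, gain_db=12.0):
    """Boost roughly the low_hz..high_hz region by up to gain_db."""
    center = math.sqrt(low_hz * high_hz)   # geometric mean of the band edges
    q = center / (high_hz - low_hz)        # rough bandwidth-to-Q conversion
    coeffs = peaking_eq_coeffs(sample_rate, center, q, gain_db)
    return apply_biquad(samples, coeffs)
```

Applied to a recording held as a list of samples, `emphasize_band(samples)` raises the level of the 2 to 6 kHz region, the range the text credits with the “biting” quality, while leaving low frequencies close to their original level.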

Click here to listen to the sixth found sound again. This was a latch on a case for transporting electronic equipment. If we paste the sound of keys jingling in front of the sound of the latch opening, we can create a type of non-verbal sonic narrative (click here). We have someone searching through a set of keys, followed by the successful opening of something that was locked. What’s missing is the sound of the key inside the latch itself, as it passes the lock tumblers (click here). Putting it all together via software editing, we get this: click here.
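Pasting one sound in front of another amounts to joining lists of samples end to end. The following is a minimal sketch of that editing step, assuming each sound is a list of samples at the same sample rate. The function name `assemble_sequence`, the optional gap of silence, and the tiny stand-in clips are hypothetical; the ordering (keys, then key in lock, then latch) is one reading of the sequence described above.

```python
def assemble_sequence(clips, gap_samples=0):
    """Join clips end to end, with optional silence between them."""
    out = []
    for index, clip in enumerate(clips):
        if index > 0:
            out.extend([0.0] * gap_samples)  # silence before each later clip
        out.extend(clip)
    return out


# Stand-in clips; real ones would hold thousands of recorded samples.
keys_jingling = [0.2, -0.1, 0.3]
key_in_lock = [0.5, -0.5]
latch_opening = [0.9, -0.7, 0.1]

narrative = assemble_sequence(
    [keys_jingling, key_in_lock, latch_opening], gap_samples=2
)
print(len(narrative), "samples in the assembled sequence")
```

The length of the gaps matters to the narrative: as with the silent middle component of the tape dispenser sound, the pauses between events help the listener hear them as separate actions.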