4.1.6.1 Frequency Perception

In Chapter 3, we discussed the non-linear nature of pitch perception when we looked at octaves as defined in traditional Western music. The A above middle C (call it A4) on a piano keyboard sounds very much like the note that is 12 semitones above it, A5, except that A5 has a higher pitch. A5 is one octave higher than A4. A6 sounds like A5 and A4, but it’s an octave higher than A5. The progression between octaves is not linear with respect to frequency. A2’s frequency is twice the frequency of A1. A3’s frequency is twice the frequency of A2, and so forth. A simple way to think of this is that as the frequencies increase by multiplication, the perception of the pitch change increases by addition. In any case, the relationship is non-linear, as you can clearly see if you plot frequencies against octaves, as shown in Figure 4.7.
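
To make the doubling relationship concrete, here is a minimal Python sketch (our own illustration, not part of the original text) that computes the frequencies of A1 through A7 from A4 = 440 Hz; each octave up multiplies the frequency by two while the perceived pitch rises by equal octave steps.

```python
import math

A4 = 440.0  # reference pitch in Hz

for octave in range(1, 8):
    freq = A4 * 2 ** (octave - 4)            # each octave doubles the frequency
    steps_above_a1 = math.log2(freq / 55.0)  # perceived pitch rises by equal octave steps
    print(f"A{octave}: {freq:7.1f} Hz  ({steps_above_a1:.0f} octaves above A1)")
```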

Figure 4.7 Non-linear nature of pitch perception

The fact that this is a non-linear relationship implies that the higher up you go in frequencies, the bigger the difference in frequency between neighboring octaves. The difference between A2 and A1 is 110 – 55 = 55 Hz while the difference between A7 and A6 is 3520 – 1760 = 1760 Hz. Because of the non-linearity of our perception, frequency response graphs often show the frequency axis on a logarithmic scale, or you’re given a choice between a linear and a logarithmic scale, as shown in Figure 4.8. Notice that you can select or deselect “linear” in the upper left hand corner. In the figure on the right, the distance between 10 and 100 Hz on the horizontal axis is the same as the distance between 100 and 1000, which is the same as 1000 and 10000. This is more in keeping with how our perception of the pitch changes as the frequencies get higher. You should always pay attention to the scale of the frequency axis in graphs such as this.
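
If you want to see the effect of the axis scaling yourself, the following short matplotlib sketch (an illustration we have added, using the A-note frequencies from the previous example) plots the octave frequencies on a linear axis and on a logarithmic axis; on the logarithmic axis the octaves come out evenly spaced, matching the way we hear pitch.

```python
import matplotlib.pyplot as plt

octaves = list(range(1, 8))                      # A1 through A7
freqs = [55.0 * 2 ** (o - 1) for o in octaves]   # 55, 110, 220, ... 3520 Hz

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

ax_lin.plot(freqs, octaves, "o-")
ax_lin.set_xlabel("Frequency (Hz), linear scale")
ax_lin.set_ylabel("Octaves above A1")

ax_log.semilogx(freqs, octaves, "o-")            # logarithmic frequency axis
ax_log.set_xlabel("Frequency (Hz), log scale")

plt.tight_layout()
plt.show()
```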

Figure 4.8 Frequency response graphs with linear and nonlinear scales for frequency

The range of frequencies within human hearing is, at best, 20 Hz to 20,000 Hz. The range varies with individuals and diminishes with age, especially for high frequencies. Our hearing is less sensitive to low frequencies than to high; that is, low frequencies have to be more intense for us to hear them than high frequencies.

Frequency resolution (also called frequency discrimination) is our ability to distinguish between two close frequencies. It varies with frequency, loudness, the duration of the sound, the suddenness of the frequency change, and the acuity and training of the listener’s ears. The smallest frequency change that can be noticed as a pitch change is referred to as a just-noticeable-difference (jnd). Within the 1000 Hz to 4000 Hz range, it’s possible for a person to hear a jnd of as little as 1/12 of a semitone. (Note that the same musical interval covers more Hertz at higher frequencies: a 1/12-semitone step from 1000 Hz is about 5 Hz, while a 1/12-semitone step from 4000 Hz is about 19 Hz.) At low frequencies, tones that are separated by just a few Hertz can be distinguished as separate pitches, while at high frequencies, two tones must be separated by hundreds of Hertz before a difference is noticed.
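
The numbers in the parenthetical can be checked directly: a semitone corresponds to a frequency ratio of $$2^{1/12}$$, so 1/12 of a semitone corresponds to $$2^{1/144}$$. A quick sketch (our own, for illustration):

```python
# Size in Hz of a 1/12-semitone step at several starting frequencies.
ratio = 2 ** (1 / 144)   # 1/12 of a semitone; a full semitone is 2**(1/12)

for f in (100, 1000, 4000, 10000):
    step_hz = f * (ratio - 1)
    print(f"1/12 semitone above {f:>5} Hz is about {step_hz:4.1f} Hz higher")
```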

You can test your own frequency range and discrimination with a sound processing program like Audacity or Audition by generating and listening to pure tones, as shown in Figure 4.9. Be aware, however, that the monitors or headphones you use have an impact on your ability to hear the frequencies.

Figure 4.9 Creating a single-frequency tone in Adobe Audition

4.1.6.2 Critical Bands

One part of the ear’s anatomy that is helpful to consider more closely is the area in the inner ear called the basilar membrane. It is here that sound vibrations are detected, separated by frequencies, and transformed from mechanical energy to electrical impulses sent to the brain.   The basilar membrane is lined with rows of hair cells and thousands of tiny hairs emanating from them. The hairs move when stimulated by vibrations, sending signals to their base cells and the attached nerve fibers, which pass electrical impulses to the brain.   In his pioneering work on frequency perception, Harvey Fletcher discovered that different parts of the basilar membrane resonate more strongly to different frequencies. Thus, the membrane can be divided into frequency bands, commonly called critical bands. Each critical band of hair cells is sensitive to vibrations within a certain band of frequencies. Continued research on critical bands has shown that they play an important role in many aspects of human hearing, affecting our perception of loudness, frequency, timbre, and dissonance vs. consonance. Experiments with critical bands have also led to an understanding of frequency masking, a phenomenon that can be put to good use in audio compression.

Critical bands can be characterized by the band of frequencies they cover. Fletcher discovered their existence in his work on the cochlear response, and they are the source of our ability to distinguish one frequency from another. When a complex sound arrives at the basilar membrane, each critical band acts as a kind of bandpass filter, responding only to vibrations within its portion of the frequency spectrum. In this way, the sound is divided into frequency components. If two frequencies are received within the same band, the louder frequency can overpower the quieter one. This is the phenomenon of masking, first observed in Fletcher’s original experiments.

[aside]A bandpass filter allows only the frequencies in a defined band to pass through, filtering out all other frequencies. Bandpass filters are studied in Chapter 7.[/aside]

Critical bands within the ear are not fixed areas but instead are created during the experience of sound. Any audible sound can create a critical band centered on it. However, experimental analyses of critical bands have arrived at approximations that are useful guidelines in designing audio processing tools. Table 4.4 is one model, based on the independent experiments of Fletcher, Zwicker, and Barkhausen, as cited in (Tobias, 1970). Here, the basilar membrane is divided into 25 bands, each with a center frequency and a bandwidth that varies across the audible spectrum. The width of each band is given in Hertz, semitones, and octaves. (The widths in semitones and octaves were derived from the widths in Hertz, as explained in Section 4.3.1.) The center frequencies are graphed against the critical bands in Hertz in Figure 4.10.

You can see from the table and figure that, measured in Hertz, the critical bands are wider for higher frequencies than for lower. This implies that there is better frequency resolution at lower frequencies because a narrower band results in less masking of frequencies in a local area.

The table shows that critical bands are generally two to four semitones wide, most of them three. This observation is significant as it relates to our experience of consonance vs. dissonance. Recall from Chapter 3 that a major third consists of four semitones. For example, the third from C to E is separated by four semitones (stepping from C to C#, C# to D, D to D#, and D# to E). Thus, the notes that are played simultaneously in a third generally occupy separate critical bands. This helps to explain why thirds are generally considered consonant – each of the notes having its own critical band. Seconds, which exist in the same critical band, are considered dissonant. At very low and very high frequencies, thirds begin to lose their consonance to most listeners. This is consistent with the fact that the critical bands at the low frequencies (100-200 and 200-300 Hz) and high frequencies (over 12000 Hz) span more than a third, so that at these frequencies, a third lies within a single critical band.

[table caption=”Table 4.4 An estimate of critical bands using the Bark scale” width=”80%”]

Critical Band,Center Frequency in Hertz,Range of Frequencies in Hertz,Bandwidth in Hertz,Bandwidth in Semitones Relative to Start*,Bandwidth in Octaves Relative to Start*
1,50,0–100,100,,-
2,150,100–200,100,12,1
3,250,200–300,100,7,0.59
4,350,300–400,100,5,0.42
5,450,400–510,110,4,0.31
6,570,510–630,120,4,0.3
7,700,630–770,140,3,0.29
8,840,770–920,150,3,0.26
9,1000,920–1080,160,3,0.23
10,1170,1080–1270,190,3,0.23
11,1370,1270–1480,210,3,0.22
12,1600,1480–1720,240,3,0.22
13,1850,1720–2000,280,3,0.22
14,2150,2000–2320,320,3,0.21
15,2500,2320–2700,380,3,0.22
16,2900,2700–3150,450,3,0.22
17,3400,3150–3700,550,3,0.23
18,4000,3700–4400,700,3,0.25
19,4800,4400–5300,900,3,0.27
20,5800,5300–6400,1100,3,0.27
21,7000,6400–7700,1300,3,0.27
22,8500,7700–9500,1800,4,0.3
23,10500,9500–12000,2500,4,0.34
24,13500,12000–15500,3500,4,0.37
25,18775,15500–22050,6550,6,0.5
*See Section 4.3.2 for an explanation of how the last two columns of this table were derived.[attr colspan=”6″]

[/table]
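
The last two columns of Table 4.4 can be reproduced from the frequency ranges in the third column: a band running from $$f_{low}$$ to $$f_{high}$$ spans $$\log_{2}\left ( \frac{f_{high}}{f_{low}} \right )$$ octaves, or 12 times that many semitones. Here is a small sketch (ours, using a few band edges copied from the table) that reproduces those values before rounding:

```python
import math

# (low, high) band edges in Hz, copied from Table 4.4
bands = [(100, 200), (200, 300), (510, 630), (2000, 2320), (12000, 15500)]

for low, high in bands:
    octaves = math.log2(high / low)
    semitones = 12 * octaves
    print(f"{low:>5}-{high:<5} Hz: {high - low:>4} Hz wide, "
          f"{semitones:4.1f} semitones, {octaves:4.2f} octaves")
```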

Figure 4.10 Critical bands graphed from Table 4.4

4.1.6.3 Amplitude Perception

In the early 1930s at Bell Laboratories, groundbreaking experiments by Fletcher and Munson clarified the extent to which our perception of loudness varies with frequency (Fletcher and Munson 1933). Their results, refined by later researchers (Robinson and Dadson, 1956) and adopted as International Standard ISO 226, are illustrated in a graph of equal-loudness contours shown in Figure 4.11. In general, the graph shows how much you have to “turn up” or “turn down” a single frequency tone to make it sound equally loud to a 1000 Hz tone. Each curve on the graph represents an n-phon contour. One phon is defined as a 1000 Hz sound wave at a loudness of 1 dBSPL. An n-phon contour is created as follows:

  • Frequency is on the horizontal axis and sound pressure level in decibels is on the vertical axis.
  • One curve is drawn for each loudness level n.
  • The n-phon curve shows, for each frequency across the audible spectrum, the intensity level necessary to make that frequency sound equal in loudness to a 1000 Hz wave at n dBSPL.

Let’s consider, for example, the 10-phon contour. This contour was created by playing a 1000 Hz pure tone at a loudness level of 10 dBSPL and then asking groups of listeners to say when they thought pure tones at other frequencies matched the loudness of the 1000 Hz tone. Notice that low-frequency tones had to be increased by 60 to 75 dB to sound equally loud. Some of the higher-frequency tones – in the vicinity of 3000 Hz – actually had to be turned down in volume to sound equally loud to the 10 dBSPL 1000 Hz tone. Also notice that the louder the 1000 Hz tone is, the less lower-frequency tones have to be turned up to sound equal in loudness. For example, the 90-phon contour goes up only about 30 dB to make the lowest frequencies sound equal in loudness to 1000 Hz at 90 dBSPL, whereas the 10-phon contour has to be turned up about 75 dB.

Figure 4.11 Equal loudness contours (Figure derived from a program by Jeff Tacket, posted at the MATLAB Central File Exchange)

With the information captured in the equal loudness contours, devices that measure the loudness of sounds – for example, SPL meters (sound pressure level meters) – can be designed so that they compensate for the fact that low frequency sounds seem less loud than high frequency sounds at the same amplitude. This compensation is called “weighting.” Figure 4.12 graphs three weighting functions – A, B, and C. The A, B, and C-weighting functions are approximately inversions of the 40-phon, 70-phon, and 100-phon loudness contours, respectively. This implies that applying A-weighting in an SPL meter causes the meter to measure loudness in a way that matches our differences in loudness perception at 40-phons.

To understand how this works, think of the graphs of the weighting as frequency filters – also called frequency response graphs. When a weighting function is applied by an SPL meter, the meter uses a filter to reduce the influence of frequencies to which our ears are less sensitive, and conversely to increase the weight of frequencies that our ears are sensitive to. The fact that the A-weighting graph is lower on the left side than on the right means that an A-weighted SPL meter reduces the influence of low-frequency sounds as it takes its overall loudness measurement. On the other hand, it boosts the amplitude of frequencies around 3000 Hz, as seen by the bump above 0 dB around 3000 Hz. It doesn’t matter that the SPL meter meddles with frequency components as it measures loudness. After all, it isn’t measuring frequencies. It’s measuring how loud the sounds seem to our ears. The use of weighted SPL meters is discussed further in Section 4.2.2.2.
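
To give a numerical feel for what an A-weighting filter does, here is a sketch of the standard A-weighting approximation defined in IEC 61672 (the filter in any particular SPL meter may be implemented differently; the function name and the sample frequencies are ours):

```python
import math

def a_weight_db(f):
    """Approximate A-weighting gain in dB at frequency f (Hz), per IEC 61672."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00   # normalized so A(1000 Hz) is 0 dB

for f in (50, 100, 1000, 3000, 10000):
    print(f"{f:>6} Hz: {a_weight_db(f):+6.1f} dB")  # low frequencies cut, ~3 kHz slightly boosted
```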

Figure 4.12 Graphs of A, B, and C-weighting functions (Figure derived from a program by Jeff Tacket, posted at the MATLAB Central File Exchange)

Sometimes it’s convenient to simplify our understanding of sound by considering how it behaves when there is nothing in the environment to impede it. An environment with no physical influences to absorb, reflect, diffract, refract, reverberate, resonate, or diffuse sound is called a free field. A free field is an idealization of real world conditions that facilitates our analysis of how sound behaves. Sound in a free field can be pictured as radiating out from a point source, diminishing in intensity as it gets farther from the source. A free field is partially illustrated in Figure 4.13. In this figure, sound is radiating out from a loudspeaker, with the colors indicating highest to lowest intensity sound in the order red, orange, yellow, green, and blue. The area in front of the loudspeaker might be considered a free field. However, because the loudspeaker partially blocks the sound from going behind itself, the sound is lower in amplitude there. You can see that there is some sound behind the loudspeaker, resulting from reflection and diffraction.

Figure 4.13 Sound radiation from a loudspeaker, viewed from top

4.1.7.1 Absorption, Reflection, Refraction, and Diffraction

In the real world, there are any number of things that can get in the way of sound, changing its direction, amplitude, and frequency components. In enclosed spaces, absorption plays an important role. Sound absorption is the conversion of sound’s energy into heat, thereby diminishing the intensity of the sound. The diminishing of sound intensity is called attenuation. A general mathematical formulation for the way sound attenuates as it moves through the air is captured in the inverse square law, which shows that sound decreases in intensity in proportion to the square of the distance from the source. (See Section 4.2.1.6.) The attenuation of sound in the air is due to the air molecules themselves absorbing and converting some of the energy to heat. The amount of attenuation depends in part on the air temperature and relative humidity. Thick, porous materials can absorb and attenuate the sound even further, and they’re often used in architectural treatments to modify and control the acoustics of a room. Even hard, solid surfaces absorb some of the sound energy, although most of it is reflected back. The material of walls and ceilings, the number and material of seats, the number of persons in an audience, and all solid objects have to be taken into consideration acoustically in sound setups for live performance spaces.

Sound that is not absorbed by objects is instead reflected from, diffracted around, or refracted into the object. Hard surfaces reflect sound more than soft ones, which are more absorbent. The law of reflection states that the angle of incidence of a wave is equal to the angle of reflection. This means that if a wave were to propagate in a straight line from its source, it reflects in the way pictured in Figure 4.14. In reality, however, sound radiates out spherically from its source. Thus, a wavefront of sound approaches objects and surfaces from various angles. Imagine a cross-section of the moving wavefront approaching a straight wall, as seen from above. Its reflection would be as pictured in Figure 4.15, like a mirror reflection.

Figure 4.14 Angle of incidence equals angle of reflection
Figure 4.15 Sound radiating from source and reflecting off flat wall, as seen from above

In a special case, if the wavefront were to approach a concave curved solid surface, it would be reflected back to converge at one point in the room, the location of that point depending on the angle of the curve. This is how whispering rooms are constructed, such that two people whispering in the room can hear each other perfectly if they’re positioned at the sound’s focal points, even though the focal points may be at the far opposite ends of the room. A person positioned elsewhere in the room cannot hear their whispers at all. A common shape found with whispering rooms is an ellipse, as seen in Figure 4.16. The shape and curve of these walls cause any and all sound emanating from one focal point to reflect directly to the other.

Figure 4.16 Sound reflects directly between focal points in a whispering room

[aside]

Diffraction also has a lot to do with microphone and loudspeaker directivity. Consider how microphones often have different polar patterns at different frequencies. Even with a directional mic, you’ll often see lower frequencies behave more omnidirectionally, and sometimes an omnidirectional mic may be more directional at high frequencies. That’s largely because of the size of the wavelength compared to the size of the microphone diaphragm. It’s hard for high frequencies to diffract around a larger object, so for a mic to have a truly omnidirectional pattern, the diaphragm has to be very small.

[/aside]

Diffraction is the bending of a sound wave as it moves past an obstacle or through a narrow opening. The phenomenon of diffraction allows us to hear sounds from sources that are not in direct line-of-sight, such as a person standing around a corner or on the other side of a partially obstructing object. The amount of diffraction is dependent on the relationship between the size of the obstacle and the size of the sound’s wavelength. Low frequency sounds (i.e., long-wavelength sounds) are diffracted more than high frequencies (i.e., short wavelengths) around the same obstacle. In other words, low frequency sounds are better able to travel around obstacles. In fact, if the wavelength of a sound is significantly larger than an obstacle that the sound encounters, the sound wave continues as if the obstacle isn’t even there. For example, your stereo speaker drivers are probably protected behind a plastic or metal grill, yet the sound passes through it intact and without noticeable coloration. The obstacle presented by the wire mesh of the grill (perhaps a millimeter or two in diameter) is even smaller than the smallest wavelength we can hear (about 2 centimeters for 20 kHz, 10 to 20 times larger than the wire), so the sound diffracts easily around it.
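
The wavelength-versus-obstacle comparison is easy to quantify, since wavelength is just the speed of sound divided by frequency. A brief sketch (assuming a speed of sound of 343 m/s; the function name and sample values are ours):

```python
SPEED_OF_SOUND = 343.0  # m/s, a typical value for air at room temperature

def wavelength_cm(freq_hz):
    """Wavelength in centimeters of a tone at freq_hz."""
    return SPEED_OF_SOUND / freq_hz * 100

for f in (100, 1000, 20000):
    print(f"{f:>6} Hz -> wavelength about {wavelength_cm(f):7.1f} cm")

# Even the ~1.7 cm wavelength of 20 kHz dwarfs a 1-2 mm grill wire,
# so audible sound diffracts around the grill essentially unchanged.
```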

Refraction is the bending of a sound wave as it moves through different media. Typically we think of refraction with light waves, as when we look at something through glass or under water. In acoustics, the refraction of sound waves tends to be more gradual, as the properties of the air change subtly over longer distances. This causes a bending in sound waves over a long distance, primarily due to gradients in temperature, humidity, and in some cases wind over distance and altitude. This bending can result in noticeable differences in sound levels, either a boost or an attenuation; a region of attenuation is also referred to as a shadow zone.

4.1.7.2 Reverberation, Echo, Diffusion, and Resonance

Reverberation is the result of sound waves reflecting off of many objects or surfaces in the environment. Imagine an indoor room in which you make a sudden burst of sound. Some of that sound is transmitted through or absorbed by the walls or objects, and the rest is reflected back, bouncing off the walls, ceilings, and other surfaces in the room. The sound wave that travels straight from the sound source to your ears is called the direct signal. The first few instances of reflected sound are called primary or early reflections. Early reflections arrive at your ears about 60 ms or sooner after the direct sound, and play a large part in imparting a sense of space and room size to the human ear. Early reflections may be followed by a handful of secondary and higher-order reflections. At this point, the sound waves have had plenty of opportunity to bounce off of multiple surfaces, multiple times. As a result, the reflections that are arriving now are more numerous, closer together in time, and quieter. Much of the initial energy of the reflections has been absorbed by surfaces or expended in the distance traveled through the air. This dense collection of reflections is reverberation, illustrated in Figure 4.17. Assuming that the sound source is only momentary, the generated sound eventually decays as the waves lose energy, the reverberation becoming less and less loud until the sound is no longer discernable. Typically, reverberation time is defined as the time it takes for the sound to decay in level by 60 dB from its direct signal.

Figure 4.17 Sound reflections and reverberation

Single, strong reflections that reach the ear a significant amount of time – about 100 ms – after the direct signal can be perceived as an echo – essentially a separate recurrence of the original sound. Even reflections as little as 50 ms apart can cause an audible echo, depending on the type of sound and room acoustics. While echo is often employed artistically in music recordings, echoes tend to be detrimental and distracting in a live setting and are usually avoided or require remediation in performance and listening spaces.

Diffusion is another property that interacts with reflections and reverberation. Diffusion is the scattering of sound energy so that it is distributed more evenly throughout a listening space. While a flat, even surface reflects sound strongly in a predictable direction, uneven surfaces or convex curved surfaces diffuse sound more randomly and evenly. Like absorption, diffusion is often used to treat a space acoustically to help break up harsh reflections that interfere with the natural sound. Unlike absorption, however, which attempts to eliminate the unwanted sound waves by reducing the sound energy, diffusion attempts to redirect the sound waves in a more natural manner. A room with lots of absorption has less overall reverberation, while diffusion maintains the sound’s intensity and helps turn harsh reflections into more pleasant reverberation. Usually a combination of absorption and diffusion is employed to achieve the optimal result. There are many types of diffusing surfaces and panels manufactured based on mathematical algorithms to provide the most random, diffuse reflections possible.

Putting these concepts together, we can say that the amount of time it takes for a particular sound to decay depends on the size and shape of the room, its diffusive properties, and the absorptive properties of the walls, ceilings, and objects in the room. In short, all the aforementioned properties determine how sound reverberates in a space, giving the listener a “sense of place.”

Reverberation in an auditorium can enhance the listener’s experience, particularly in the case of a music hall where it gives the individual sounds a richer quality and helps them blend together. Excessive reverberation, however, can reduce intelligibility and make it difficult to understand speech. In Chapter 7, you’ll see how artificial reverberation is applied in audio processing.

A final important acoustical property to be considered is resonance. In Chapter 2, we defined resonance as an object’s tendency to vibrate or oscillate at a certain frequency that is basic to its nature. Like a musical instrument, a room has a set of resonant frequencies, called its room modes. Room modes result in locations in a room where certain frequencies are boosted or attenuated, making it difficult to give all listeners the same audio experience. We’ll talk more about how to deal with room modes in Section 4.2.2.5.

4.2.1.1 Real-World Considerations

We now turn to practical considerations related to the concepts introduced in Section 1. We first return to the concept of decibels.

An important part of working with decibel values is learning to recognize and estimate decibel differences. If a sound isn’t loud enough, how much louder does it need to be? Until you can answer that question with a dB value, you will have a hard time figuring out what to do. It’s also important to understand the kind of dB differences that are audible. The average listener cannot distinguish a difference in sound pressure level that is less than 3 dB. With training, you can learn to recognize differences in sound pressure level of 1 dB, but differences of less than 1 dB are indistinguishable even to well-trained listeners.

Understanding the limitations to human hearing is very important when working with sound. For example, when investigating changes you can make to your sound equipment to get higher sound pressure levels, you should be aware that unless the change amounts to 3 dB or more, most of your listeners will probably not notice. This concept also applies when processing audio signals. When manipulating the frequency response of an audio signal using an equalizer, unless you’re making a difference of 3 dB with one of your filters, the change will be imperceptible to most listeners.

Having a reference to use when creating audio material or sound systems is also helpful. For example, there are usually loudness requirements imposed by the television network for television content. If these requirements are not met, there will be level inconsistencies between the various programs on the television station that can be very annoying to the audience. These requirements could be as simple as limiting peak levels to -10 dBFS or as strict as meeting a specified dBFS average across the duration of the show.

You might also be putting together equipment that delivers sound to a live audience in an acoustic space. In that situation you need to know how loud in dBSPL the system needs to perform at the distance of the audience. There is a minimum dBSPL level you need to achieve in order to get the signal above the noise floor of the room, but there is also a maximum dBSPL level you need to stay under in order to avoid damaging people’s hearing or violating laws or policies of the venue. Once you know these requirements, you can begin to evaluate the performance of the equipment to verify that it can meet these requirements.

4.2.1.2 Rules of Thumb

Table 4.2 gives you some rules of thumb for how changes in dB are perceived as changes in loudness. Turn a sound up by 10 dB and it sounds about twice as loud. Turn it up by 3 dB, and you’ll hardly notice any difference.

Similarly, Table 4.5 gives you some rules of thumb regarding power and voltage changes. These rules give you a quick sense of how boosts in power and voltage affect sound levels.

[table caption=”Table 4.5 Rules of thumb for changes in power, voltage, or distance in dB” width=”60%”]

“change in power, voltage, or distance”,approximate change in dB
power $$\ast$$ 2,3 dB increase
power ÷ 2,3 dB decrease
power $$\ast$$ 10,10 dB increase
power ÷ 10,10 dB decrease
voltage $$\ast$$ 2,6 dB increase
voltage ÷ 2,6 dB decrease
voltage $$\ast$$ 10,20 dB increase
voltage ÷ 10,20 dB decrease
distance away from source $$\ast$$ 2,6 dB decrease

[/table]
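
These rules of thumb are just rounded values of the decibel formulas used throughout this chapter ($$10\log_{10}$$ for power ratios, $$20\log_{10}$$ for voltage and distance ratios). A quick sketch that confirms them (our own illustration):

```python
import math

def power_db(ratio):
    return 10 * math.log10(ratio)     # for power ratios

def voltage_db(ratio):
    return 20 * math.log10(ratio)     # for voltage (and pressure) ratios

print(f"power x2:     {power_db(2):+6.2f} dB")        # about +3 dB
print(f"power x10:    {power_db(10):+6.2f} dB")       # +10 dB
print(f"voltage x2:   {voltage_db(2):+6.2f} dB")      # about +6 dB
print(f"voltage x10:  {voltage_db(10):+6.2f} dB")     # +20 dB
print(f"distance x2:  {voltage_db(1 / 2):+6.2f} dB")  # about -6 dB (inverse square law)
```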

In the following sections, we’ll give examples of how these rules of thumb come into practice. A mathematical justification of these rules is given in Section 3.

4.2.1.3 Determining Power and Voltage Differences and Desired Changes in Power Levels

Decibels are also commonly used to compare the power levels of loudspeakers and amplifiers. For power, Equation 4.6 applies — $$\Delta Power \: dB = 10\log_{10}\left ( \frac{P_{1}}{P_{0}} \right )$$.

Based on this equation, how much more powerful is an 800 W amplifier than a 200 W amplifier, in decibels?

$$!10\log_{10}\left ( \frac{800\, W}{200\, W} \right )=10\log_{10}4=6\: dB\: increase \: in \:power$$

For voltages, Equation 4.4 is used ($$\Delta Voltage\:dB=20\log_{10}\left ( \frac{V_{1}}{V_{0}} \right )$$). If you increase a voltage level from 100 V to 1000 V, what is the increase in decibels?

$$!20\log_{10}\left ( \frac{1000\:V}{100\: V} \right )=20\log_{10}10=20\:dB \: increase\:in\:voltage$$

[aside]

Multiplying power times 2 corresponds to multiplying voltage times $$\sqrt{2}$$ because power is proportional to voltage squared: $$P\propto V^{2}$$

Thus

$$!10\log_{10}\left ( \frac{2\ast P_{0}}{P_{0}} \right )=$$

$$!10\log_{10}\left ( \frac{\sqrt{2} \ast V_{0}}{V_{0}} \right )^{2}=3\:dB\:increase.$$

[/aside]

It’s worth pointing out here that because the definition of decibels-sound-pressure-level was derived from the power decibel definition, a 3 dB increase in the power of an amplifier produces a corresponding 3 dB increase in the sound pressure level. We know that a 3 dB increase in sound pressure level is barely detectable, so the implication is that doubling the power of an amplifier doesn’t increase the loudness of the sounds it produces very much. You have to multiply the power of the amplifier by ten in order to get sounds that are approximately twice as loud.

The fact that doubling the power gives about a 3 dB increase in sound pressure level has implications with regard to how many speakers you ought to use for a given situation. If you double the speakers (assuming identical speakers), you double the power, but you get only a 3 dB increase in sound level. If you quadruple the speakers, you get a 6 dB increase in sound because each time you double, you go up by 3 dB. If you double the speakers again (eight speakers now), you hypothetically get a 9 dB increase, not taking into account other acoustical factors that may affect the sound level.
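
Putting that doubling rule into a formula: the ideal level increase from $$n$$ identical speakers relative to one speaker is $$10\log_{10}(n)$$ dB, ignoring the other acoustical factors mentioned above. A sketch (the function name is ours):

```python
import math

def gain_from_n_speakers(n):
    """Ideal level increase in dB from n identical speakers compared to one speaker."""
    return 10 * math.log10(n)

for n in (1, 2, 4, 8):
    print(f"{n} speakers: +{gain_from_n_speakers(n):.1f} dB")  # 0, 3, 6, 9 dB
```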

Often, your real world problem begins with a dB increase you’d like to achieve in your live sound setup. What if you want to increase the level by ΔdB? You can figure out how to do this with the power ratio formula, derived in Equation 4.11.

[equation caption=”Equation 4.11 Derivation of power ratio formula”]

$$!\Delta dB=10\log_{10}\left ( \frac{P_{1}}{P_{0}} \right )$$

$$!\frac{\Delta dB}{10}=\log_{10}\left ( \frac{P_{1}}{P_{0}} \right )$$

Thus

$$!\frac{P_{1}}{P_{0}}=10^{\frac{\Delta dB}{10}}$$

where $$P_{0}$$ is the starting power, $$P_{1}$$ is the new power level, and ΔdB is the desired change in decibels

[/equation]

It may help to recast the equation to clarify that for the problem we’ve described, the desired decibel change and the beginning power level are known, and we wish to compute the new power level needed to get this decibel change.

[equation caption=”Equation 4.12 Power ratio formula”]

$$!P_{1}=P_{0}\ast 10^\frac{\Delta dB}{10}$$

where $$P_{0}$$ is the starting power, $$P_{1}$$ is the new power level, and ΔdB is the desired change in decibels

[/equation]

Applying this formula, what if you start with a 300 W amplifier and want to get one that is 15 dB louder?

$$!P_{1}=300\,W\ast10^{\frac{15}{10}}=9486\,W$$

You can see that it takes quite an increase in wattage to increase the power by 15 dB.
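
Equation 4.12 translates directly into a one-line function. Here is a sketch that reproduces the 300 W example (the function name is ours):

```python
def new_power_for_db_change(p0_watts, delta_db):
    """Power needed to raise the output of p0_watts by delta_db dB (Equation 4.12)."""
    return p0_watts * 10 ** (delta_db / 10)

print(new_power_for_db_change(300, 15))  # about 9487 W for a 15 dB increase
print(new_power_for_db_change(300, 3))   # about 600 W: doubling power buys only 3 dB
```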

Instead of trying to get more watts, a better strategy would be to choose different loudspeakers that have a higher sensitivity. The sensitivity of a loudspeaker is defined as the sound pressure level that is produced by the loudspeaker with 1 watt of power when measured 1 meter away. Also, because the voltage gain in a power amplifier is fixed, before you go buy a bunch of new loudspeakers, you may also want to make sure that you’re feeding the highest possible voltage signal into the power amplifier. It’s quite possible that the 15 dB increase you’re looking for is hiding somewhere in the signal chain of your sound system due to inefficient gain structure between devices. If you can get 15 dB more voltage into the amplifier by optimizing your gain structure, the power amplifier quite happily amplifies that higher voltage signal assuming you haven’t exceeded the maximum input voltage for the power amplifier. Chapter 8 includes a Max demo on gain structure that may help you with this concept.

4.2.1.4 Converting from One Type of Decibels to Another

A similar problem arises when you have two pieces of sound equipment whose nominal output levels are measured in decibels of different types. For example, you may want to connect two devices where the nominal voltage output of one is given in dBV and the nominal voltage output of the other is given in dBu. You first want to know if the two voltage levels are the same. If they are not, you want to know how much you have to boost the one of lower voltage to match the higher one.

The way to do this is to convert both dBV and dBu back to voltage. You can then compare the two voltage levels in dB. From this you know how much the lower voltage hardware needs to be boosted. Consider an example where one device has an output level of −10 dBV and the other operates at 4 dBu.

Convert −10 dBV to voltage:

$$!-10=20\log_{10}\left ( \frac{v}{1} \right )$$

$$!\frac{-10}{20}=\log_{10}v$$

$$!-0.5=\log_{10}v$$

$$!10^{-0.5}=v\approx 0.316$$

Thus, −10 dBV converts to 0.316 V.

By a similar computation, we get the voltage corresponding to 4 dBu, this time using 0.775 V as the reference value in the denominator.

Convert 4 dBu to voltage:

$$!4=20\log_{10}\left ( \frac{v}{0.775} \right )$$

$$!\frac{4}{20}=\log_{10}\left ( \frac{v}{0.775} \right )$$

$$!0.2=\log_{10}\left ( \frac{v}{0.775} \right )$$

$$!10^{0.2}\ast0.775=v\approx 1.228$$

Thus, 4 dBu converts to 1.228 V.

Now that we have the two voltages, we can compute the decibel difference between them.

Compute the voltage difference between 0.316 V and 1.228 V:

$$!\Delta dB=20\log_{10}\left ( \frac{1.228}{0.316} \right ) \approx 12\: dB$$

From this you see that the lower-voltage device needs to be boosted by 12 dB in order to match the other device.
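
The same conversion can be wrapped into a couple of helper functions, using the reference voltages from above (1 V for dBV, 0.775 V for dBu). This is a sketch of the procedure just shown, with names of our own choosing:

```python
import math

def dbv_to_volts(dbv):
    return 10 ** (dbv / 20) * 1.0      # dBV is referenced to 1 V

def dbu_to_volts(dbu):
    return 10 ** (dbu / 20) * 0.775    # dBu is referenced to 0.775 V

v_low = dbv_to_volts(-10)    # about 0.316 V
v_high = dbu_to_volts(4)     # about 1.228 V
boost_db = 20 * math.log10(v_high / v_low)
print(f"{v_low:.3f} V vs. {v_high:.3f} V: boost the lower device by about {boost_db:.0f} dB")
```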

4.2.1.5 Combining Sound Levels from Multiple Sources

In the last few sections, we’ve been discussing mostly power and voltage decibels. These decibel computations are relevant to our work because power levels and voltages produce sounds. But we can’t hear volts and watts. Ultimately, what we want to know is how loud things sound. Let’s return now to decibels as they measure audible sound levels.

Think about what happens when you add one sound to another in the air or on a wire and want to know how loud the combined sound is in decibels. In this situation, you can’t just add the two decibel levels. For example, if you add an 85 dBSPL lawnmower on top of a 110 dBSPL symphony orchestra, how loud is the sound? It isn’t 85 dBSPL + 110 dBSPL = 195 dBSPL.   Instead, we derive the sum of decibels $$d_{1}$$ and $$d_{2}$$ as follows:

Convert $$d_{1}$$ to air pressure:

$$!85=20\log_{10}\left ( \frac{x}{0.00002} \right )$$

$$!x=10^{\frac{85}{20}}\ast \left ( 0.00002 \right )\approx 0.36\: Pa$$

Convert $$d_{2}$$ to air pressure:

$$!110=20\log_{10}\left ( \frac{x}{0.00002} \right )$$

$$!x=10^{\frac{110}{20}}\ast \left ( 0.00002 \right )\approx 6.32\: Pa$$

Sum the air pressure amplitudes and convert back to dBSPL:

$$!dBSPL=20\log_{10}\left ( \frac{0.36+6.32}{0.00002} \right )$$

$$!dBSPL\approx 110.5\,dB$$

The combined sounds in this case are not perceptibly louder than the louder of the two original sounds being combined!
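
Here is the same computation wrapped into a reusable sketch. It follows the procedure shown above, converting each level to a pressure amplitude, summing, and converting back to dBSPL (the function name is ours):

```python
import math

P_REF = 0.00002  # 20 micropascals, the dBSPL reference pressure

def combine_dbspl(*levels_db):
    """Combine dBSPL levels by summing their pressure amplitudes."""
    total_pressure = sum(P_REF * 10 ** (db / 20) for db in levels_db)
    return 20 * math.log10(total_pressure / P_REF)

print(combine_dbspl(85, 110))   # about 110.5 dBSPL: barely louder than the orchestra alone
print(combine_dbspl(80, 80))    # two equal sounds combine to about 6 dB more than one
```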

4.2.1.6 Inverse Square Law

The last row of Table 4.5 is known as the inverse square law, which states that the intensity of sound from a point source is proportional to the inverse of the square of the distance r from the source. Perhaps of more practical use is the related rule of thumb that for every doubling of distance from a sound source, you get a decrease in sound level of 6 dB. We can informally prove the inverse square law by the following argument.

For simplification, imagine a sound as coming from a point source. This sound radiates spherically (equally in all directions) from the source. Sound intensity is defined as sound power passing through a unit area. The fact that intensity is measured per unit area is what is significant here. You can picture the sound spreading out as it moves away from the source. The farther the sound gets away from the source, the more it has “spread out,” and thus its intensity lessens per unit area as the sphere representing the radiating sound gets larger. This is illustrated in Figure 4.18.

Figure 4.18 Sphere representing sound radiating from a point source; radii representing two different distances from this sound
Figure 4.19 Applying the inverse square law

This phenomenon of sound attenuation as sound moves from a source is captured in the inverse square law, illustrated in Figure 4.18:

[equation caption=”Equation 4.13 Inverse square law”]

$$!I_{1}-I_{0}=10\log_{10}\left ( \frac{{r_{0}}^{2}}{{r_{1}}^{2}} \right )=20\log_{10}\left ( \frac{r_{0}}{r_{1}} \right )dB$$

where $$r_{0}$$ is the initial distance from the sound, $$r_{1}$$ is the new distance from the sound, $$I_{0}$$ is the intensity level of the sound in decibels at distance $$r_{0}$$, and $$I_{1}$$ is the intensity level of the sound in decibels at distance $$r_{1}$$

[/equation]

What this means in practical terms is the following. Say you have a sound source, a singer, who is a distance $$r_{0}=$$ 7′ 11″ from the microphone, as shown in Figure 4.19. The microphone detects her voice at a level of $$l_{0}=$$50 dBSPL. The listener is a distance $$r_{1}=$$ 49′ 5″ from the singer. Then the sound reaching the listener from the singer has an intensity of

$$!I_{1}-I_{0}=20\log_{10}\left ( \frac{r_{0}}{r_{1}} \right )$$

$$!I_{1}=I_{0}+20\log_{10}\left ( \frac{r_{0}}{r_{1}} \right )=50+20\log_{10}\left ( \frac{7.0833}{49.4167} \right )=50-16.8728=33.12\, dBSPL$$

Notice that when $$r_{1}>r_{0}$$ the logarithm gives a negative number, which makes sense because the sound is less intense as you move away from the source.

[wpfilebase tag=file id=19 tpl=supplement /]

The inverse square law is a handy rule of thumb. Each time we double the distance from our source, we decrease the sound level by 6 dB. The first doubling of distance is a perceptible but not dramatic decrease in sound level. Another doubling of distance (which would be four times the original distance from the source) yields a 12 dB decrease, which makes the source sound less than half as loud as it did from the initial distance. These numbers are only approximations for ideal free-field conditions. Many other factors intervene in real-world acoustics. But the inverse square law gives a general idea of sound attenuation that is useful in many situations.
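
Equation 4.13 is easy to apply in code. The sketch below (the names and the second example distance are ours) reproduces the singer example, with the distances expressed as decimal feet:

```python
import math

def level_at_distance(level_db, r0, r1):
    """Sound level at distance r1, given level_db measured at distance r0 (Equation 4.13)."""
    return level_db + 20 * math.log10(r0 / r1)

# Singer: 50 dBSPL measured at 7' 11" (7.0833 ft); listener at 49' 5" (49.4167 ft)
print(level_at_distance(50, 7.0833, 49.4167))   # about 33.1 dBSPL

# Rule of thumb: each doubling of distance drops the level by about 6 dB
print(level_at_distance(50, 10, 20))            # about 44 dBSPL
```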

4.2.2.1 Potential Acoustic Gain (PAG)

[wpfilebase tag=file id=115 tpl=supplement /]

The acoustic gain of an amplification system is the difference between the loudness as perceived by the listener when the sound system is turned on as compared to when the sound system is turned off. One goal of the sound engineer is to achieve a high potential acoustic gain, or PAG – the gain in decibels that can be added to the original sound without causing feedback. This potential acoustic gain is the entire reason the sound system is installed and the sound engineer is hired. If you can’t make the sound louder and more intelligible, you fail as a sound engineer. The word “potential” is used here because the PAG represents the maximum gain possible without causing feedback. Feedback can occur when the loudspeaker sends an audio signal back through the air to the microphone at the same level or louder than the source. In this situation, the two similar sounds arrive at the microphone at the same level but at a different phase. The first frequency from the loudspeaker to combine with the source at a 360 degree phase relationship is reinforced by 6 dB. The 6 dB reinforcement at that frequency happens over and over in an infinite loop. This sounds like a single sine wave that gets louder and louder. Without intervention on the part of the sound engineer, this sound continues to get louder until the loudspeaker is overloaded. To stop a feedback loop, you need to interrupt the electro-acoustical path that the sound is traveling by either muting the microphone on the mixing console or turning off the amplifier that is driving the loudspeaker. If feedback happens too many times, you’ll likely not be hired again. When setting up for a live performance, an important function of the sound engineer operating the amplification/mixing system is to set the initial sound levels.

The equation for PAG is given below.

[equation caption=”Equation 4.14 Potential acoustic gain (PAG)”]

$$!PAG=20\log_{10}\left ( \frac{D_{1}\ast D_{0}}{D_{s}\ast D_{2}} \right )$$

where $$D_{s}$$ is the distance from the sound source to the microphone,

$$D_{0}$$ is the distance from the sound source to the listener,

$$D_{1}$$ is the distance from the microphone to the loudspeaker, and

$$D_{2}$$ is the distance from the loudspeaker to the listener

[/equation]

PAG is the limit. The amount of gain added to the signal by the sound engineer in the sound booth must be less than this. Otherwise, there will be feedback.

In typical practice, you should stay 6 dB below this limit in order to avoid the initial sounds of the onset of feedback. This is sometimes described as sounding “ringy” because the sound system is in a situation where it is trying to cause feedback but hasn’t quite found a frequency at exactly a 360° phase offset. This 6 dB safety factor should be applied to the result of the PAG equation. The amount of acoustic gain needed for any situation varies, but as a rule of thumb, if your PAG is less than 12 dB, you need to make some adjustments to the physical locations of the various elements of the sound system in order to increase the acoustic gain. In the planning stages of your sound system design, you’ll be making guesses on how much gain you need. Generally you want the highest possible PAG, but in your efforts to increase the PAG you will eventually get to a point where the compromises required to increase the gain are unacceptable. These compromises could include financial cost and visual aesthetics. Once the sound system has been purchased and installed, you’ll be able to test the system to see how close your PAG predictions are to reality. If you find that the system causes feedback before you’re able to turn the volume up to the desired level, you don’t have enough PAG in your system. You need to make adjustments to your sound system in order to increase your gain before feedback.
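
Equation 4.14 and the 6 dB safety margin can be combined into a small planning sketch. The distances below are made-up numbers purely for illustration, and the function name is ours:

```python
import math

def potential_acoustic_gain(d_s, d_0, d_1, d_2):
    """PAG in dB (Equation 4.14): d_s source-to-mic, d_0 source-to-listener,
    d_1 mic-to-loudspeaker, d_2 loudspeaker-to-listener (all in the same units)."""
    return 20 * math.log10((d_1 * d_0) / (d_s * d_2))

# Hypothetical setup, distances in feet
pag = potential_acoustic_gain(d_s=2, d_0=50, d_1=20, d_2=40)
usable_gain = pag - 6   # stay 6 dB below the feedback limit
print(f"PAG = {pag:.1f} dB, usable gain about {usable_gain:.1f} dB")
if pag < 12:
    print("Rule of thumb: less than 12 dB of PAG calls for repositioning the system")
```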

Figure 4.20 Potential acoustic gain, $$PAG=20\log_{10}\left ( \frac{D_{1}\ast D_{0}}{D_{s}\ast D_{2}} \right )$$

Increasing the PAG can be achieved by a number of means, including:

  • Moving the source closer to the microphone
  • Moving the loudspeaker farther from the microphone
  • Moving the loudspeaker closer to the listener.

It’s also possible to use directional microphones and loudspeakers or to apply filters or equalization, although these methods do not yield the same level of success as physically moving the various sound system components. These issues are illustrated in the interactive Flash tutorial associated with this section.

Note that PAG is the “potential” gain. Not all aspects of the sound need to be amplified by this much. The gain just gives you “room to play.” Faders in the mixer can still bring down specific microphones or frequency bands in the signal. But the potential acoustic gain lets you know how much louder than the natural sound you will be able to achieve.

The Flash tutorial associated with this section helps you to visualize how acoustic gain works and what its consequences are.

4.2.2.2 Checking and Setting Sound Levels

One fundamental part of analyzing an acoustic space is checking sound levels at various locations in the listening area. In the ideal situation, you want everything to sound similar at various listening locations. A realistic goal is to have each listening location be within 6 dB of the other locations. If you find locations that are outside that 6 dB range, you may need to reposition some loudspeakers, add loudspeakers, or apply acoustic treatment to the room. With the knowledge of decibels and acoustics that you gained in Section 1, you should have a better understanding now of how this works.

There are two types of sound pressure level (SPL) meters for measuring sound levels in the air. The most common is a dedicated handheld SPL meter like the one shown in Figure 4.21. These meters have a built-in microphone and operate on battery power. They have been specifically calibrated to convert the voltage level coming from the microphone into a value in dBSPL.

There are some options to configure that can make your measurements more meaningful. One option is the response time of the meter. A fast response allows you to see level changes that are short, such as peaks in the sound wave. A slow response shows you more of an average SPL. Another option is the weighting of the meter. The concept of SPL weighting comes from the equal loudness contours explained in Section 4.1.6.3. Since the frequency response of the human hearing system changes with the SPL, a number of weighting contours are offered, each modeling the human frequency response with a slightly different emphasis. A-weighting has a rather steep roll-off at low frequencies. This means that the low frequencies are attenuated more than they are in B or C weighting. B-weighting has less roll-off at low frequencies. C-weighting is almost a flat frequency response except for a little attenuation at low frequencies. The rules of thumb are that if you’re measuring levels of 90 dBSPL or lower, A-weighting gives you the most accurate representation of what you’re hearing. For levels between 90 dBSPL and 110 dBSPL, B-weighting gives you the most accurate indication of what you hear. Levels in excess of 110 dBSPL should use C-weighting. If your SPL meter doesn’t have an option for B-weighting, you should use C-weighting for all measurements higher than 90 dBSPL.

Figure 4.21 Handheld SPL meter

The other type of SPL meter is one that is part of a larger acoustic analysis system. As described in Chapter 2, these systems can consist of a computer, audio interface, analysis microphone, and specialized audio analysis software. When using this analysis software to make SPL measurements, you need to calibrate the software. The issue here is that because the software has no knowledge or control over the microphone sensitivity and the preamplifier on the audio interface, it has no way of knowing which analog voltage levels and corresponding digital sample values represent actual SPL levels. To solve this problem, an SPL calibrator is used. An SPL calibrator is a device that generates a 1 kHz sine wave at a known SPL level (typically 94 dBSPL) at the transducer. The analysis microphone is inserted into the round opening on the calibrator creating a tight seal. At this point, the tip of the microphone is up against the transducer in the calibrator, and the microphone is receiving a known SPL level. Now you can tell the analysis software to interpret the current signal level as a specific SPL level. As long as you don’t change microphones and you don’t change the level of the preamplifier, the calibrator can then be removed from the microphone, and the software is able to interpret other varying sound levels relative to the known calibration level. Figure 4.22 shows an SPL calibrator and the calibration window in the Smaart analysis software.

Figure 4.22 Analysis software needs to be calibrated for SPL

4.2.2.3 Impulse Responses and Reverberation Time

In addition to sound amplitude levels, it’s important to consider frequency levels in a live sound system. Frequency measurements are taken to set up the loudspeakers and levels such that the audience experiences the sound and balance of frequencies in the way intended by the sound designer.

One way to do frequency analysis is to have an audio device generate a sudden burst or “impulse” of sound and then use appropriate software to graph the audio signal in the form of a frequency response. The frequency response graph, with frequency on the x-axis and the magnitude of the frequency component on the y-axis, shows the amount of each frequency in the audio signal in one window of time. An impulse response graph is generated in the same way that a frequency response graph is generated, using the same hardware and software. The impulse response graph (or simply impulse response) has time on the x-axis and amplitude of the audio signal on the y-axis. It is this graph that helps us to analyze the reverberations in an acoustic space.

An impulse response measured in a small chamber music hall is shown in Figure 4.23. Essentially what you are seeing is the occurrences of the stimulus signal arriving at the measurement microphone over a period of time. The first big spike at around 48 milliseconds is the arrival of the direct sound from the loudspeaker. In other words, it took 48 milliseconds for the sound to arrive back at the microphone after the analysis software sent out the stimulus audio signal. The delay results primarily from the time it takes for sound to travel through the air from the loudspeaker to the measurement microphone, with a small amount of additional latency resulting from the various digital and analog conversions along the way. The next tallest spike at 93 milliseconds represents a reflection of the stimulus signal from some surface in the room. There are a few small reflections that arrive before that, but they’re not large enough to be of much concern. The reflection at 93 milliseconds arrives 45 milliseconds after the direct sound and is approximately 9 dB quieter than the direct sound. This is an audible reflection that is outside the precedence zone and may be perceived by the listener as an audible echo. (The precedence effect is explained in Section 4.2.2.6.) If this reflection proves to be problematic, you can try to absorb it. You can also diffuse it and convert it into the reverberant energy shown in the rest of the graph.

Figure 4.23 Impulse response of small chamber music hall

Before you can take any corrective action, you need to identify the surface in the room causing the reflection. The detective work can be tricky, but it helps to consider that you’re looking for a surface that is visible to both the loudspeaker and the microphone, positioned so that the reflected path is about 50 feet longer than the direct path between the loudspeaker and the microphone. In this case, the loudspeaker is up on the stage and the microphone out in the audience seats. More than likely, the reflection is coming from the upstage wall behind the loudspeaker. If you measure approximately 25 feet between the loudspeaker and that wall, you’ve probably found the culprit. To see if this is indeed the problem, you can put some absorptive material on that wall and take another measurement. If you’ve guessed correctly, you should see that spike disappear or get significantly smaller. If you wanted to give a speech or perform percussion instruments in this space, this reflection would probably cause intelligibility problems. However, in this particular scenario, where the room is primarily used for chamber music, this reflection is not of much concern. In fact, it might even be desirable, as it makes the room sound larger.
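
The arithmetic behind that detective work is simply delay multiplied by the speed of sound. A sketch using 1130 ft/s (about 1.13 feet per millisecond); the variable names and layout are ours:

```python
SPEED_FT_PER_MS = 1.13   # roughly 1130 ft/s expressed per millisecond

direct_ms = 48.0         # arrival time of the direct sound in the impulse response
reflection_ms = 93.0     # arrival time of the suspect reflection

extra_path_ft = (reflection_ms - direct_ms) * SPEED_FT_PER_MS
print(f"Reflected path is about {extra_path_ft:.0f} ft longer than the direct path")

# The reflected sound travels to the surface and back, so look for a surface
# roughly half that distance (about 25 ft) beyond the loudspeaker.
print(f"Suspect surface roughly {extra_path_ft / 2:.0f} ft from the loudspeaker")
```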

[aside]

RT60 is the time it takes for reflections of a direct sound to decay by 60 dB.

[/aside]

As you can see in the graph, the overall sound energy decays very slowly over time. Some of that sound energy can be defined as reverberant sound. In a chamber music hall like this, a longer reverberation time might be desirable. In a lecture hall, a shorter reverberation time is better. You can use this impulse response data to determine the RT60 reverberation time of the room as shown in Figure 4.24. RT60 is the time it takes for reflections of a sound to decay by 60 dB. In the figure, RT60 is determined for eight separate frequency bands. As you can see, the reverberation time varies for different frequency bands. This is due to the varying absorption rates of high versus low frequencies. Because high frequencies are more easily absorbed, the reverberation time of high frequencies tends to be lower. On average, the reverberation time of this room is around 1.3 seconds.

Figure 4.24 RT60 reverberation time of small chamber music hall

The music hall in this example is equipped with curtains on the wall that can be lowered to absorb more sound and reduce the reverberation time. Figure 4.25 shows the impulse response measurement taken with the curtains in place. At first glance, this data doesn’t look very different from Figure 4.23, when the curtains were absent. There is a slight difference, however, in the rate of decay for the reverberant energy. The resulting reverberation time is shown in Figure 4.26. Adding the curtains reduces the average reverberation time by around 0.2 seconds.

Figure 4.25 Impulse response of small chamber music hall with curtains on some of the walls
Figure 4.26 RT60 reverberation time of small chamber music hall with curtains on some of the walls

4.2.2.4 Frequency Levels and Comb Filtering

When working with sound in acoustic space, you discover that there is a lot of potential for sound waves to interact with each other. If the waves are allowed to interact destructively – causing frequency cancelations – the result can be detrimental to the sound quality perceived by the audience.

Destructive sound wave interactions can happen when two loudspeakers generate identical sounds that are directed to the same acoustic space. They can also occur when a sound wave combines in the air with its own reflection from a surface in the room.

Let’s say there are two loudspeakers aimed at you, both generating the same sound. Loudspeaker A is 10 feet away from you, and Loudspeaker B is 11 feet away. Because sound travels at a speed of approximately one foot per millisecond, the sound from Loudspeaker B arrives at your ears one millisecond after the sound from Loudspeaker A, as shown in Figure 4.27. That one millisecond of difference doesn’t seem like much. How much damage can it really inflict on your sound? Let’s again assume that both sounds arrive at the same amplitude. Since the position of your ears relative to the two loudspeakers is directly related to the timing difference, let’s also assume that your head is stationary, as if you are sitting relatively still in your seat at a theater. In this case, a one millisecond difference causes the two sounds to interact destructively. In Chapter 2 you read about what happens when two identical sounds combine out-of-phase. In real life, phase differences can occur as a result of an offset in time. That extra one millisecond that it takes for the sound from Loudspeaker B to arrive at your ears results in a phase difference relative to the sound from Loudspeaker A. The audible result of this depends on the type of sound being generated by the loudspeakers.

Figure 4.27 Sound from two loudspeakers arriving at a listener one millisecond apart

Let’s assume, for the sake of simplicity, that both loudspeakers are generating a 500 Hz sine wave, and the speed of sound is 1000 ft/s. (As stated in Section 1.1.1, the speed of sound in air varies depending upon temperature and air pressure, so you don’t always get a perfect 1130 ft/s.) Recall that wavelength equals velocity multiplied by period ($$\lambda =cT$$). Then with this speed of sound, a 500 Hz sine wave has a wavelength λ of two feet.

$$!\lambda =cT=\left ( \frac{1000\, ft}{s} \right )\left ( \frac{1\, s}{500\: cycles} \right )=\frac{2\: ft}{cycle}$$

At a speed of 1000 ft/s, sound travels one foot each millisecond, which implies that with a one millisecond delay, a sound wave is delayed by one foot. For 500 Hz, this is half the frequency’s wavelength. If you remember from Chapter 2, half a wavelength is the same thing as a 180° phase offset. In sum, a one millisecond delay between Loudspeaker A and Loudspeaker B results in a 180° phase difference between the two 500 Hz sine waves. In a free-field environment with your head stationary, this results in a cancellation of the 500 Hz frequency when the two sine waves arrive at your ear. This phase relationship is illustrated in Figure 4.28.
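The same arithmetic works for any frequency. Here is a quick sketch of the calculation, using the simplified 1000 ft/s speed of sound from this example:

```python
speed_of_sound = 1000.0   # ft/s, the simplified value used in this example
delay = 0.001             # one millisecond between the two arrivals

for freq in (500.0, 1000.0):
    wavelength = speed_of_sound / freq          # feet
    path_difference = speed_of_sound * delay    # one foot of extra travel
    phase_offset = (path_difference / wavelength) * 360.0
    print(f"{freq:.0f} Hz: wavelength {wavelength:.1f} ft, "
          f"phase offset {phase_offset:.0f} degrees ({phase_offset % 360:.0f} effective)")
```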

Figure 4.28 Phase relationship between two 500 Hz sine waves one millisecond apart
Figure 4.29 Phase relationship between two 1000 Hz sine waves one millisecond apart

If we switch the frequency to 1000 Hz, we’re now dealing with a wavelength of one foot. An analysis similar to the one above shows that the one millisecond delay results in a 360° phase difference between the two sounds. For sine waves, two sounds combining at a 360° phase difference behave the same as at a 0° phase difference. For all intents and purposes, these two sounds are coherent, so when they combine at your ear they reinforce each other, and the reinforcement is perceived as an increase in amplitude. In other words, the totally in-phase frequencies get louder. This phase relationship is illustrated in Figure 4.29.

Simple sine waves serve as convenient examples for how sound works, but they are rarely encountered in practice. Almost all sounds you hear are complex sounds made up of multiple frequencies. Continuing our example of the one millisecond offset between two loudspeakers, consider the implications of sending two identical sine wave sweeps through the two loudspeakers. A sine wave sweep contains all frequencies in the audible spectrum. When those two identical complex sounds arrive at your ear one millisecond apart, each matching pair of frequency components combines at a different phase relationship. Frequencies that combine at an odd multiple of 180° cancel. Frequencies that combine at a multiple of 360° reinforce. All the other frequencies combine at phase relationships in between, resulting in amplitude changes somewhere between complete cancellation and full reinforcement. This phenomenon is called comb filtering, which can be defined as a regularly repeating pattern of frequencies being attenuated or boosted as you move through the frequency spectrum. (See Figure 4.32.)

To understand comb filtering, let’s look at how we detect and analyze it in an acoustic space. First, consider what the frequency response of the sine wave sweep would look like if we measured it coming from one loudspeaker that is 10 feet away from the listener. This is the black line in Figure 4.30. As you can see, the line in the audible spectrum (20 to 20,000 Hz) is relatively flat, indicating that all frequencies are present, at an amplitude level just over 100 dBSPL. The gray line shows the frequency response for an identical sine sweep, but measured at a distance of 11 feet from the one loudspeaker. This frequency response is a little bumpier than the first. Neither frequency response is perfect because environmental conditions affect the sound as it passes through the air. Keep in mind that these two frequency responses, represented by the black and gray lines on the graph, were measured at different times, each from a single loudspeaker, and at distances from the loudspeaker that varied by one foot – the equivalent of offsetting them by one millisecond. Since the two sounds happened at different moments in time, there is of course no comb filtering.

Figure 4.30 Frequency response of two sound sources 1 millisecond apart

The situation is different when the sound waves are played at the same time through two loudspeakers that are not equidistant from the listener, such that the frequency components arrive at the listener in different phases. Figure 4.31 is a graph of frequency vs. phase for this situation. You can understand the graph in this way: for each frequency on the x-axis, consider a pair of frequency components of the sound being analyzed, the first belonging to the sound coming from the closer loudspeaker and the second belonging to the sound coming from the farther one. The graph shows the degree to which these pairs of frequency components are out-of-phase, which ranges between -180° and 180°.

Figure 4.31 Phase relationship per frequency for two sound sources one millisecond apart

Figure 4.32 shows the resulting frequency response when these two sounds are combined. Notice that the frequencies that have a 0° relationship are now louder, at approximately 110 dB. On the other hand, frequencies that are out-of-phase are now substantially quieter, some by as much as 50 dB depending on the extent of the phase offset. You can see in the graph why the effect is called comb filtering. The scalloped effect in the graph is how comb filtering appears in frequency response graphs – a regularly repeating pattern of frequencies being attenuated or boosted as you move through the frequency spectrum.

Figure 4.32 Comb filtering frequency response of two sound sources one millisecond apart
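The scalloped pattern in Figure 4.32 can be approximated numerically by summing a signal with a copy of itself delayed by one millisecond. The sketch below assumes the two arrivals have equal amplitude in a free field; with a one millisecond delay, the deep notches land at 500 Hz, 1500 Hz, 2500 Hz, and so on, and the +6 dB peaks at multiples of 1000 Hz.

```python
import numpy as np

delay = 0.001                              # seconds between the two arrivals
freqs = np.linspace(20, 20000, 2000)       # audible spectrum, Hz

# Gain of a signal summed with a delayed copy of itself: |1 + e^(-j*2*pi*f*delay)|
gain = np.abs(1 + np.exp(-2j * np.pi * freqs * delay))
gain_db = 20 * np.log10(np.maximum(gain, 1e-6))   # floor the deep notches to avoid log(0)

for f in (250, 500, 1000, 1500, 2000):
    i = np.argmin(np.abs(freqs - f))
    print(f"{f:5d} Hz: {gain_db[i]:+6.1f} dB relative to a single arrival")
```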

[wpfilebase tag=file id=38 tpl=supplement /]

We can run a similar experiment to hear the phenomenon of comb filtering using noise as our sound source. Recall that noise consists of random combinations of sound frequencies, usually sound that is not wanted as part of a signal. Two types of noise that a sound processing or analysis system can generate artificially are white noise and pink noise (and there are others). In white noise, there’s approximately equal energy at each frequency across the range of frequencies within the signal. In pink noise, there’s approximately equal energy in each octave of frequencies. (Octaves, as defined in Chapter 3, are spaced such that the beginning frequency of one octave is ½ the beginning frequency of the next octave. Although each octave is twice as wide as the previous one – in the distance between its upper and lower frequencies – octaves sound like they are about the same width to human hearing.) The learning supplements to this chapter include a demo of comb filtering using white and pink noise.
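If your sound editor can't generate these test signals, they're easy to approximate yourself. The sketch below makes white noise directly and shapes a copy of it into pink noise by weighting its spectrum by 1/√f, which is one common method among several; the sampling rate, duration, and scaling are arbitrary choices for illustration.

```python
import numpy as np

fs = 48000
n = 2 * fs
rng = np.random.default_rng(1)

# White noise: approximately equal energy per hertz
white = rng.standard_normal(n)

# Pink noise: approximately equal energy per octave, made here by weighting
# the white noise spectrum by 1/sqrt(f)
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n, d=1 / fs)
weights = np.ones_like(freqs)
weights[1:] = 1 / np.sqrt(freqs[1:])      # leave the DC bin alone
pink = np.fft.irfft(spectrum * weights, n)
pink *= np.max(np.abs(white)) / np.max(np.abs(pink))   # roughly match peak levels

print("white noise RMS:", round(float(np.sqrt(np.mean(white ** 2))), 3))
print("pink noise RMS: ", round(float(np.sqrt(np.mean(pink ** 2))), 3))
```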

Comb filtering in the air is very audible, but it is also very inconsistent. In a comb-filtered environment of sound, if you move your head just slightly to the right or left, you find that the timing difference between the two sounds arriving at your ear changes. With a change in timing comes a change in phase differences per frequency, resulting in comb filtering of some frequencies but not others. Add to this the fact that the source sound is constantly changing, and, all things considered, comb filtering in the air becomes something that is very difficult to control.

One way to tackle comb filtering in the air is to increase the delay between the two sound sources. This may seem counter-intuitive since the difference in time is what caused this problem in the first place. However, a larger delay results in comb filtering that starts at lower frequencies, and as you move up the frequency scale, the cancellations and reinforcements get close enough together that they happen within critical bands. The sum of cancellations and reinforcements within a critical band essentially results in the same overall amplitude as would have been there had there been no comb filtering. Since all frequencies within a critical band are perceived as the same frequency, your brain glosses over the anomalies, and you end up not noticing the destructive interference. (This is an oversimplification of the complex perceptual influence of critical bands, but it gives you a basic understanding for our purposes.) In most cases, once you get a timing difference that is larger than five milliseconds on a complex sound that is constantly changing, the comb filtering in the air is not heard anymore. We explain this point mathematically in Section 3.
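A small calculation shows why a longer delay is easier for the ear to ignore: the comb filtering starts at a lower frequency and the notches crowd closer together, so several of them fall within a single critical band. (The notch positions below assume two equal-level arrivals, as in the earlier example.)

```python
# For a delay t between two equal arrivals, cancellations fall at odd multiples
# of 1/(2t), and adjacent notches are spaced 1/t apart.
for delay_ms in (1, 5, 20):
    t = delay_ms / 1000.0
    first_notch_hz = 1 / (2 * t)
    spacing_hz = 1 / t
    print(f"{delay_ms:2d} ms delay: first notch at {first_notch_hz:5.0f} Hz, "
          f"then every {spacing_hz:5.0f} Hz")
```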

The other strategy to fix comb filtering is to simply prevent identical sound waves from interacting. In a perfect world, loudspeakers would have shutter cuts that would let you put the sound into a confined portion of the room. This way the coverage pattern for each loudspeaker would never overlap with another. In the real world, loudspeaker coverage is very difficult to control. We discuss this further and demonstrate how to compensate for comb filtering in the video tutorial entitled “Loudspeaker Interaction” in Chapter 8.

Comb filtering in the air is not always the result of two loudspeakers. The same thing can happen when a sound reflects from a wall in the room and arrives in the same place as the direct sound. Because the reflection takes a longer path, it arrives slightly behind the direct sound. If the reflection is strong enough, the amplitudes of the direct and reflected sound are close enough to cause comb filtering. In really large rooms, the timing difference between the direct and reflected sound is large enough that the comb filtering is not very problematic. Our hearing system is quite good at compensating for the anomalies that result from this kind of sound interaction. In smaller rooms, such as recording studios and control rooms, it’s quite possible for reflections to cause audible comb filtering. In those situations, you need to either absorb the reflection or diffuse it at the wall.

The worst kind of comb filtering isn’t the kind that occurs in the air but the kind that occurs on a wire. Let’s reverse our scenario: instead of two sound sources, we have a single sound source such as a singer, picked up by two microphones. Microphone A is one foot away from the singer, and Microphone B is two feet away. In this case, Microphone B catches the sound from the singer one millisecond after Microphone A. When you mix the signals from those two microphones (which happens all the time), you now have a one millisecond comb filter imposed on an electronic signal that is then delivered in that condition to all the loudspeakers in the room, and from there to all the listeners in the room equally. Now your problem can be heard no matter where you sit and no matter how much you move your head around. Just one millisecond of delay causes a very audible problem that no one can mask or hide from. The best way to avoid this kind of problem is never to allow two microphones to pick up the same signal at the same time. A good sound engineer at a mixing console ensures that only one microphone is on at a time, thereby avoiding this kind of destructive interaction. If you must have more than one microphone, you need to keep those microphones far away from each other. If that’s not possible, you can achieve modest success by adding some extra delay to one of the microphone signals. This changes the phase relationship of the two signals when they combine, but it doesn’t mimic the difference in level that would exist if the microphones were physically farther apart.

4.2.2.5 Resonance and Room Modes

In Chapter 2, we discussed the concept of resonance. Now we consider how resonance comes into play in real, hands-on applications.

Resonance plays a role in sound perception in a room. One practical example of this is the standing wave phenomenon, which in an acoustic space produces room modes. Room modes are collections of resonances that result from sound waves reflecting from the surfaces of an acoustical space, producing places where sounds are amplified or attenuated. Places where the reflections of a particular frequency reinforce each other, amplifying that frequency, are the frequency’s antinodes. Places where the frequency’s reflections cancel each other are the frequency’s nodes. Consider this simplified example – a 10-foot-wide room with parallel walls that are good sound reflectors. Let’s assume again that the speed of sound is 1000 ft/s. Imagine a sound wave emanating from the center of the room. The sound waves reflecting off the walls either constructively or destructively interfere with each other at any given location in the room, depending on the relative phase of the sound waves at that point in time and space. If the sound wave has a wavelength that is exactly twice the width of the room, then the sound waves reflecting off opposite walls cancel each other in the center of the room but reinforce each other at the walls. Thus, the center of the room is a node for this wavelength, and the walls are antinodes.

We can again apply the wavelength equation, $$\lambda = c/f$$, to find the frequency f that corresponds to a wavelength λ that is exactly twice the width of the room, 2 * 10 = 20 feet.

$$!\lambda =c/f$$

$$!20\frac{ft}{cycle}=\frac{1000\frac{ft}{sec}}{f}$$

$$!f=\frac{50\, cycles}{s}$$

At the antinodes, the signals are reinforced by their reflections, so that the 50 Hz sound is unnaturally loud at the walls.   At the node in the center, the signals reflecting off the walls cancel out the signal from the loudspeaker. Similar cancellations and reinforcements occur with harmonic frequencies at 100 Hz, 150 Hz, 200 Hz, and so forth, whose wavelengths fit evenly between the two parallel walls. If listeners are scattered around the room, standing closer to either the nodes or antinodes, some hear the harmonic frequencies very well and others do not. Figure 4.33 illustrates the node and antinode positions for room modes when the frequency of the sound wave is 50 Hz, 100 Hz, 150 Hz, and 200 Hz. Table 4.6 shows the relationships among frequency, wavelength, number of nodes and antinodes, and number of harmonics.

The cancellation and reinforcement of frequencies in the room mode phenomenon is also an example of comb filtering.

Figure 4.33 Room mode

[table caption=”Table 4.6 Room mode, nodes, antinodes, and harmonics” width=”50%”]

Frequency,Antinodes,Nodes,Wavelength,Harmonics
$$f_{1}=\frac{c}{2L}$$,2,1,$$\lambda =2L$$,1st harmonic
$$f_{2}=\frac{c}{L}$$,3,2,$$\lambda =L$$,2nd harmonic
$$f_{3}=\frac{3c}{2L}$$,4,3,$$\lambda =\frac{2L}{3}$$,3rd harmonic
$$f_{k}=\frac{kc}{2L}$$,k + 1,k,$$\lambda =\frac{2L}{k}$$,kth harmonic

[/table]
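The last row of Table 4.6 is easy to apply directly. Here is a short sketch for the 10-foot example room with the simplified 1000 ft/s speed of sound:

```python
def axial_mode_frequencies(width_ft, speed_ft_per_s=1000.0, count=4):
    """Axial room mode frequencies between two parallel walls: f_k = k * c / (2 * L)."""
    return [k * speed_ft_per_s / (2 * width_ft) for k in range(1, count + 1)]

for k, f in enumerate(axial_mode_frequencies(10), start=1):
    wavelength = 1000.0 / f
    print(f"harmonic {k}: {f:5.0f} Hz, wavelength {wavelength:4.1f} ft")
```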

This example is more complicated than shown because a real room has multiple pairs of parallel surfaces. Room modes can exist that involve all four walls of a room plus the floor and ceiling. The problem can be minimized by eliminating parallel walls whenever possible in the building design. Often the simplest solution is to hang material on the walls at selected locations to absorb or diffuse the sound.

[wpfilebase tag=file id=132 tpl=supplement /]

The standing wave phenomenon can be illustrated with a concrete example that also relates to instrument vibrations and resonances. Figure 4.34 shows an example of a standing wave pattern on a vibrating plate. In this case, the flat plate is resonating at 95 Hz, a frequency that fits evenly with the size of the plate. As the plate bounces up and down, the sand on the plate keeps moving until it finds a place that isn’t bouncing. In this case, the sand collects in the nodes of the standing wave. (These are called Chladni patterns, after the German scientist who originated the experiments in the late 1700s.) If a similar resonance occurred in a room, the sound would get noticeably quieter in the areas corresponding to the pattern of sand, because those would be the places in the room where the air molecules simply aren’t moving (neither compression nor rarefaction). For a more complete demonstration of this example, see the video demo called Plate Resonance linked in this section.

Figure 4.34 Resonant frequency on a flat plate

4.2.2.6 The Precedence Effect

When two or more similar sound waves interact in the air, not only does the perceived frequency response change, but your perception of the location of the sound source can change as well. This phenomenon is called the precedence effect. The precedence effect occurs when two similar sound sources arrive at a listener at different times from different directions, causing the listener to perceive both sounds as if they were coming from the direction of the sound that arrived first.

[wpfilebase tag=file id=39 tpl=supplement /]

The precedence effect is sometimes intentionally created within a sound space. For example, it might be used to reinforce the live sound of a singer on stage without making it sound as if some of the singer’s voice is coming from a loudspeaker. However, certain conditions must be in place for the precedence effect to occur. The first is that the difference in arrival time at the listener between the two sound sources needs to be more than one millisecond. Also, depending on the type of sound, the difference in time needs to be less than 20 to 30 milliseconds, or the listener perceives an audible echo. Short transient sounds start to echo around 20 milliseconds, but longer sustained sounds don’t start to echo until around 30 milliseconds. The final condition is that the two sounds cannot differ in level by more than 10 dB. If the second arrival is more than 10 dB louder than the first, even if the timing is right, the listener begins to perceive the sound as coming from the direction of the louder arrival.
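These conditions can be summarized in a small helper function. This is only a rough encoding of the rules of thumb just described (the thresholds are perceptual tendencies, not hard limits), and the function name is ours:

```python
def precedence_effect_likely(delay_ms, level_difference_db, sustained=True):
    """Rough check of the precedence-effect conditions described above.

    delay_ms: how much later the second arrival is
    level_difference_db: second arrival level minus first arrival level, in dB
    sustained: True for longer sustained sounds, False for short transients
    """
    echo_threshold_ms = 30 if sustained else 20
    return 1 < delay_ms < echo_threshold_ms and level_difference_db <= 10

print(precedence_effect_likely(delay_ms=15, level_difference_db=4))    # True
print(precedence_effect_likely(delay_ms=40, level_difference_db=4))    # False: heard as an echo
print(precedence_effect_likely(delay_ms=15, level_difference_db=12))   # False: second arrival too loud
```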

When you intentionally apply the precedence effect, you have to keep in mind that comb filtering still applies in this scenario. For this reason, it’s usually best to keep the arrival differences to more than five milliseconds because our hearing system is able to more easily compensate for the comb filtering at longer time differences.

The advantage to the precedence effect is that although you perceive the direction of both sounds as arriving from the direction of the first arrival, you also perceive an increase in loudness as a result of the sum of the two sound waves. This effect has been around for a long time and is a big part of what gives a room “good acoustics.” There exist rooms where sound seems to propagate well over long distances, but this isn’t because the inverse square law is magically being broken. The real magic is the result of reflected sound. If sound is reflecting from the room surfaces and arriving at the listener within the precedence time window, the listener perceives an increase in sound level without noticing the direction of the reflected sound. One goal of an acoustician is to maximize the good reflections and minimize the reflections that would arrive at the listener outside of the precedence time window, causing an audible echo.

The fascinating part of the precedence effect is that multiple arrivals can be daisy-chained, and the effect still works. There could be three or more distinct arrivals at the listener, and as long as each arrival is within the precedence time window of the previous arrival, all the arrivals sound like they’re coming from the direction of the first arrival. From the perspective of acoustics, this is equivalent to having several early reflections arrive at the listener. For example, a listener might hear a reflection 20 milliseconds after the direct sound arrives. This reflection would image back to the first arrival of the direct sound, but the listener would perceive an increase in sound level. A second reflection might arrive 40 milliseconds after the direct sound. Alone, this 40 millisecond reflection would cause an audible echo, but when it’s paired with the first 20 millisecond reflection, no echo is perceived by the listener because the second reflection arrives within the precedence time window of the first reflection. Because the first reflection arrives within the precedence time window of the direct sound, the sound of both reflections images back to the direct sound. The result is that the listener perceives an overall increase in level along with a summation of the frequency response of the three sounds.

The precedence effect can be replicated in sound reinforcement systems. It is common practice now in live performance venues to put a microphone on a performer and relay that sound out to the audience through a loudspeaker system in an effort to increase the overall sound pressure level and intelligibility perceived by the audience. Without some careful attention to detail, this process can lead to a very unnatural sound. Sometimes this is fine, but in some cases the goal might be to improve the level and intelligibility while still allowing the audience to perceive all the sound as coming from the actual performer. Using the concept of the precedence effect, a loudspeaker system can be designed so that the sound from multiple loudspeakers arrives at the listener from various distances and directions. As long as the sound from each loudspeaker arrives at the listener within 5 to 30 milliseconds and within 10 dB of the previous sound, with the natural sound of the performer arriving first, all the sound from the loudspeaker system images in the listener’s mind back to the location of the actual performer. When the precedence effect is handled well, it simply sounds to the listener like the performer is naturally loud and clear, and that the room has good acoustics.

As you can imagine from the issues discussed above, designing and setting up a sound system for a live performance is a complicated process. A good amount of digital signal processing knowledge is required to manipulate the delay, level, and frequency response of each loudspeaker in the system so that they line up properly at all the listening points in the room. The details of this process are beyond the scope of this book. For more information, see (Davis and Patronis, 2006) and (McCarthy, 2009).

4.2.2.7 Effects of Temperature

In addition to the physical obstructions with which sound interacts, the air through which sound travels can have an effect on the listener’s experience.

As discussed in Chapter 2, the speed of sound increases with higher air temperatures. It seems fairly simple to say that if you can measure the temperature in the air you’re working in, you should be able to figure out the speed of sound in that space. In actual practice, however, air temperature is rarely uniform throughout an acoustic space. When sound is played outdoors, in particular, the wave front encounters varying temperatures as it propagates through the air.
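For reference, a commonly used approximation for the speed of sound in dry air is about 331.4 + 0.6T meters per second, with T in degrees Celsius; the exact coefficients vary slightly from source to source. A quick sketch of what that means in the units used in this chapter:

```python
def speed_of_sound_ft_per_s(temp_celsius):
    """Approximate speed of sound in dry air: c ~ 331.4 + 0.6 * T m/s, converted to ft/s."""
    meters_per_second = 331.4 + 0.6 * temp_celsius
    return meters_per_second * 3.281   # meters to feet

for temp_c in (10, 20, 30, 40):
    print(f"{temp_c} deg C: about {speed_of_sound_ft_per_s(temp_c):.0f} ft/s")
```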

Consider the scenario where the sun has been shining down on the ground all day. The sun warms up the ground. When the sun sets at the end of the day (which is usually when you start an outdoor performance), the air cools down. The ground is still warm, however, and affects the temperature of the air near the ground. The result is a temperature gradient that gets warmer the closer you get to the ground. When a sound wave front tries to propagate through this temperature gradient, the portion of the wave front that is closer to the ground travels faster than the portion that is higher up in the air. This causes the wave front to curve upwards towards the cooler air. Usually, the listeners are sitting on the ground, and therefore the sound is traveling away from them. The result is a quieter sound for those listeners. So if you spent the afternoon setting your sound system volume to a comfortable listening level, when the performance begins at sundown, you’ll have to increase the volume to maintain those levels because the sound is being refracted up towards the cooler air.

Figure 4.35 shows a diagram representing this refraction. Recall that sound is a longitudinal wave where the air pressure amplitude increases and decreases, vibrating the air molecules back and forth in the same direction in which the energy is propagating. The vertical lines represent the wave fronts of the air pressure propagation. Because the sound travels faster in warmer air, the propagation of the air pressure is faster as you get closer to the ground. This means that the wave fronts closer to the ground are ahead of those farther from the ground, causing the sound wave to refract upwards.

Figure 4.35 Sound refracted toward cooler air

A similar thing can happen indoors in a movie theater or other live performance hall. Usually, sound levels are set when the space is empty, prior to an audience arriving. When an audience arrives and fills all the seats, things suddenly get a lot quieter, as any sound engineer will tell you. Most attribute this to sound absorption, in the sense that a human body absorbs sound much better than an empty chair. Absorption does play a role, but it doesn’t entirely explain the loss of perceived sound level. Even if human bodies are absorbing some of the sound, the sound arriving at the ears directly from the loudspeaker, with no intervening obstructions, arrives without having been dampened by absorption. It’s the reflected sound that gets quieter. Also, most theater seats are designed with padding and perforation on the underside of the seat so that they absorb sound at a similar rate to a human body. This way, when you’re setting sound levels in an empty theatre, you should be able to hear sound being absorbed the way it will be absorbed when people are sitting in those seats, allowing you to set the sound properly. Thus, absorption can’t be the only reason for the sudden drop in sound level when the audience fills the seats. Temperature is also a factor here. Not only is the human body a good absorber of acoustic energy, but it is also very warm. Fill a previously empty audience area with several hundred warm bodies, turn on the air conditioning that vents out from the ceiling, and you’re creating a temperature gradient that is even more dramatic than the one created outdoors at sundown. As the sound wave front travels toward the listeners, the air nearest to the listeners allows the sound to travel faster, while the air up near the air conditioning vents slows the propagation of that portion of the wave front. Just as in the outdoor example, the wave front is refracted upward toward the cooler air, and there may be a loss in sound level perceived by the listeners. There isn’t much that can be done about the temperature effects; eventually the temperature will even out as the air conditioning does its job. The important thing to remember is to listen for a while before you try to fix the sound levels. The change in sound level as a result of temperature will likely fix itself over time.

4.2.2.8 Modifying and Adapting to the Acoustical Space

An additional factor to consider when you’re working with indoor sound is the architecture of the room, which greatly affects the way sound propagates. When a sound wave encounters a surface (walls, floors, etc.), several things can happen. The sound can reflect off the surface and begin traveling in another direction, it can be absorbed by the surface, it can be transmitted through the surface into a room on the opposite side, or it can be diffracted around the surface if the surface is small relative to the wavelength of the sound.

Typically some combination of all four of these things happens each time a sound wave encounters a surface. Reflection and absorption are the two most important issues in room acoustics. A room that is too acoustically reflective is not very good at propagating sound intelligibly. This is usually described as the room being too “live.” A room that is too acoustically absorptive is not very good at propagating sound with sufficient amplitude. This is usually described as the room being too “dead.” The ideal situation is a good balance between reflection and absorption to allow the sound to propagate through the space loudly and clearly.

The kinds of reflections that can help you are called early reflections, which arrive at the listener within 30 milliseconds of the direct sound. The direct sound arrives at the listener straight from the source. An early reflection can help with the perceived loudness of the sound because the two sounds combine at the listener’s ear in a way that reinforces, creating a precedence effect. Because the reflection sounds like the direct sound and arrives shortly after it, the listener assumes both sounds come from the source and perceives the result as louder because of the combined amplitudes. If you have helpful early reflections, it’s important not to do anything to the room that would eliminate them, such as covering the reflecting surfaces with absorptive material. You can create more early reflections by adding reflective surfaces to the room that are angled so that sound hitting them is reflected toward the listener.

If you have reflections that arrive at the listener more than 30 milliseconds after the direct sound, you’ll want to fix that because these reflections sound like echoes and destroy the intelligibility of the sound. You have two options when dealing with late reflections. The first is simply to absorb them by attaching to the reflective surface something absorptive like a thick curtain or acoustic absorption tile (Figure 4.36). The other option is to diffuse the reflection.

Figure 4.36 Acoustic absorption tile

When reflections get close enough together, they cause reverberation. Reverberant sound can be a very nice addition to the sound as long as the reverberant sound is quieter than the direct sound. The relationship between the direct and reverberant sound is called the direct to reverberant ratio. If that ratio is too low, you’ll have intelligibility problems.

Diffusing a late reflection using diffusion tiles (Figure 4.37) generates several random reflections instead of a single one. If done correctly, diffusion converts the late reflection into reverberation. If the reverberant sound in the room is already at a sufficient level and duration, then absorbing the late reflection is probably the best route. For more information on identifying reflections in the room, see Section 4.2.2.3.

Figure 4.37 Acoustic diffusion tile

If you’ve exhausted all the reasonable steps you can take to improve the acoustics of the room, the only thing that remains is to increase the level of the direct sound in a way that doesn’t increase the reflected sound. This is where sound reinforcement systems come in. If you can use a microphone to pick up the direct sound very close to the source, you can then play that sound out of a loudspeaker that is closer to the listener in a way that sounds louder to the listener. If you can do this without directing too much of the sound from the loudspeaker at the room surfaces, you can increase the direct to reverberant ratio, thereby increasing the intelligibility of the sound.

8.2.4.1 Designing a Sound Delivery System

Theatre and concert performances introduce unique challenges in pre-production not present in sound for CD, DVD, film, or video because the sound is delivered live. One of the most important parts of the process in this context is the design of a sound delivery system. The purpose of the design is to ensure clarity of sound and a uniform experience among audience members.

In a live performance, it’s quite possible that when the performers on the stage create their sound, that sound does not arrive at the audience loudly or clearly enough to be intelligible. A sound designer or sound engineer is hired to design a sound reinforcement system to address this problem. The basic process is to use microphones near the performers to pick up whatever sound they’re making and then play that sound out of strategically-located loudspeakers.

There are several things to consider when designing and operating a sound reinforcement system:

  • The loudspeakers must faithfully generate a loud enough sound.
  • The microphones must pick up the source sound as faithfully as possible without getting in the way.
  • The loudspeakers must be positioned in a way that will direct the sound to the listeners without sending too much sound to the walls or back to the microphones. This is because reflections and reverberations affect intelligibility and gain.
  • Ideally, the sound system will deliver a similar listening experience to all the listeners regardless of where they sit.

Many of these considerations can be analyzed before you purchase the sound equipment so that you can spend your money wisely. Also, once the equipment is installed, the system can be tested and adjusted for better performance. These adjustments include repositioning microphones and loudspeakers to improve gain and frequency response, replacing equipment with something else that performs better, and adjusting the settings on equalizers, compressors, crossovers, and power amplifiers.

Most loudspeakers have a certain amount of directivity. Loudspeaker directivity is described in terms of the 6 dB down point – a horizontal and vertical angle off-axis corresponding to the location where the sound is reduced by 6 dB.  The 6 dB down point is significant because, as a rule of thumb, you want the loudness at any two points in the audience to differ by no more than 6 dB. In other words, the seat on the end of the aisle shouldn’t sound more than 6 dB quieter or louder than the seat in the middle of the row, or anywhere else in the audience.

The issue of loudspeaker directivity is complicated by the fact that loudspeakers naturally have a different directivity for each frequency. A single circular loudspeaker driver becomes more directional as the frequency increases because the loudspeaker diameter gets larger relative to the wavelength of the frequency. This high-frequency directivity effect is illustrated in Figure 8.21. Each of the six plots in the figure represents a different frequency produced by the same circular loudspeaker driver. In the figures, λ is the wavelength of the sound. (Recall that the higher the frequency, the smaller the wavelength. See Chapter 2 for the definition of wavelength, and see Chapter 1 for an explanation of how to read a polar plot.)

Going from top to bottom, left to right in Figure 8.21, the wavelengths being depicted get smaller (that is, the frequencies get higher). Notice that frequencies having a wavelength longer than the diameter of the loudspeaker are dispersed very widely, as shown in the first two polar plots. Once the frequency has a wavelength that is equal to the diameter of the loudspeaker, the loudspeaker begins to exercise some directional control over the sound. This directivity gets narrower as the frequency increases and the wavelength decreases.

Figure 8.21 Directivity of circular radiators. Diagrams created from actual measured sound

This varying directivity per frequency for a single loudspeaker driver partially explains why most full-range loudspeakers have multiple drivers. The problem is not that a single loudspeaker can’t produce the entire audible spectrum. Any set of headphones uses a single driver for the entire spectrum. The problem with using one loudspeaker driver for the entire spectrum is that you can’t distribute all the frequencies uniformly across the listening area. The listeners sitting right in front of the loudspeaker will hear everything fine, but for the listeners sitting to the side of the loudspeaker, the low frequencies will be much louder than the high ones. To distribute frequencies more uniformly, a second loudspeaker driver can be added, considerably smaller than the first. Then an electronic unit called a crossover directs the high frequencies to the small driver and the low frequencies to the large driver. With two different-size drivers, you can achieve a much more uniform directional dispersion, as shown in Figure 8.22. In this case, the larger driver is 5″ in diameter and the smaller one is 1″ in diameter. Frequencies of 500 Hz and 1000 Hz have wavelengths larger than 5″, so they are fairly omnidirectional. The reason that frequencies of 2000 Hz and above have consistent directivity is that the frequencies are distributed to the two loudspeaker drivers in a way that keeps the relationship consistent between the wavelength and the diameter of the driver. The 2000 Hz and 4000 Hz frequencies would be directed to the 5″ diameter driver because their wavelengths are between 6″ and 3″. The 8000 Hz and 16,000 Hz frequencies would be directed to the 1″ diameter driver because their wavelengths are between 2″ and 1″. This way the two different-size drivers are able to exercise directional control over the frequencies they are radiating.

Figure 8.22 Directivity of 2-way loudspeaker system with 5″ and 1″ diameter drivers

There are many other strategies used by loudspeaker designers to get consistent pattern control, but all must take into account the size of the loudspeaker drivers and the way in which that size affects different frequencies. You can simply look at any loudspeaker and easily determine the lowest possible directional frequency based on the loudspeaker’s size.
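That rule of thumb can be sketched directly: a circular driver begins to exercise directional control roughly at the frequency whose wavelength equals the driver's diameter, f ≈ c/d. The driver sizes below are just examples.

```python
SPEED_OF_SOUND_IN_PER_S = 1130 * 12   # about 1130 ft/s, expressed in inches per second

def lowest_directional_frequency(driver_diameter_inches):
    """Frequency whose wavelength equals the driver diameter: f = c / d."""
    return SPEED_OF_SOUND_IN_PER_S / driver_diameter_inches

for diameter in (1, 5, 12):
    f = lowest_directional_frequency(diameter)
    print(f'{diameter}" driver: directional control starts around {f:.0f} Hz')
```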

Understanding how a loudspeaker exercises directional control over the sound it radiates can also help you decide where to install and aim a loudspeaker to provide consistent sound levels across the area of your audience. Using the inverse square law in conjunction with the loudspeaker directivity information, you can find a solution that provides even sound coverage over a large audience area using a single loudspeaker. (The inverse square law is introduced in Chapter 4.)

Consider the example 1000 Hz vertical polar plot for a loudspeaker shown in Figure 8.23. If you’re going to use that loudspeaker in the theatre shown in Figure 8.24, where do you aim the loudspeaker?

Figure 8.23 Vertical 1000 Hz polar plot for a loudspeaker
Figure 8.24 Section view of audience area with distances and angles for a loudspeaker

Most beginning sound system designers will choose to aim the loudspeaker at seat B thinking that it will keep the entire audience as close as possible to the on-axis point of the loudspeaker. To test the idea, we can calculate the dB loss over distance using the inverse square law for each seat and then subtract any additional dB loss incurred by going off-axis from the loudspeaker. Seat B is directly on axis with the loudspeaker, and according to the polar plot there is a loss of approximately 2 dB at 0 degrees. Seat A is 33 degrees down from the on-axis point of the loudspeaker, corresponding to 327 degrees on the polar plot, which shows an approximate loss of 3 dB. Seat C is 14 degrees off axis from the loudspeaker, resulting in a loss of 6 dB according to the polar plot. Assuming that the loudspeaker is outputting 100 dBSPL at 1 meter (3.28 feet), we can calculate the dBSPL level for each seat as shown in Table 8.1.

[listtable width=50% caption=””]

  • A
    • $$Seat\: A\: dBSPL = 100 dB + \left ( 20\log_{10}\frac{3.28′}{33.17′} \right )-3 dB$$
    • $$Seat\: A\: dBSPL = 100 dB + \left ( 20\log_{10}0.1 \right )-3 dB$$
    • $$Seat\: A\: dBSPL = 100 dB + \left ( 20\ast -1 \right )-3 dB$$
    • $$Seat\: A\: dBSPL = 100 dB + \left ( -20\right )-3 dB$$
    • $$Seat\: A\: dBSPL = 77\, dBSPL$$
  • B
    • $$Seat\: B\: dBSPL = 100 dB + \left ( 20\log_{10}\frac{3.28′}{50.53′} \right )-2 dB$$
    • $$Seat\: B\: dBSPL = 100 dB + \left ( 20\log_{10}0.06 \right )-2 dB$$
    • $$Seat\: B\: dBSPL = 100 dB + \left ( 20\ast -1.19 \right )-2 dB$$
    • $$Seat\: B\: dBSPL = 100 dB + \left ( -23.75\right )-2 dB$$
    • $$Seat\: B\: dBSPL = 74.25\, dBSPL$$
  • C
    • $$Seat\: C\: dBSPL = 100 dB + \left ( 20\log_{10}\frac{3.28′}{77.31′} \right )-6 dB$$
    • $$Seat\: C\: dBSPL = 100 dB + \left ( 20\log_{10}0.04 \right )-6 dB$$
    • $$Seat\: C\: dBSPL = 100 dB + \left ( 20\ast -1.37 \right )-6 dB$$
    • $$Seat\: C\: dBSPL = 100 dB + \left ( -27.45\right )-6 dB$$
    • $$Seat\: C\: dBSPL = 66.55\, dBSPL$$

[/listtable]

Table 8.1 Calculating dBSPL of a given loudspeaker aimed on-axis with seat B

In this case the loudest seat is seat A at 77 dBSPL, and seat C is the quietest at 66.55 dBSPL, with a 10.45 dB difference. As discussed, we want all the audience locations to be within a 6 dB range. But before we throw this loudspeaker away and try to find one that works better, let’s take a moment to examine the reasons for such a poor result. Seat C is so much quieter than the other seats because it is the farthest from the loudspeaker and receives the largest reduction due to directivity. By comparison, seat A is the closest to the loudspeaker, resulting in the lowest loss over distance and only a 3 dB reduction due to directivity. To even this out, let’s try making the farthest seat the one with the least directivity loss and the closest seat the one with the most directivity loss.

The angle with the least directivity loss is around 350 degrees, so if we aim the loudspeaker so that seat C lines up with that 350 degree point, that seat will have no directivity loss. With that aim point, seat B will then have a directivity loss of 3 dB, and seat A will have a directivity loss of 10 dB. Now we can recalculate the dBSPL for each seat as shown in Table 8.2.

[listtable width=50% caption=””]

  • A
    • $$Seat\: A\: dBSPL = 100 dB + \left ( 20\log_{10}\frac{3.28′}{33.17′} \right )-10 dB$$
    • $$Seat\: A\: dBSPL = 100 dB + \left ( 20\log_{10}0.1 \right )-10 dB$$
    • $$Seat\: A\: dBSPL = 100 dB + \left ( 20\ast -1 \right )-10 dB$$
    • $$Seat\: A\: dBSPL = 100 dB + \left ( -20\right )-10 dB$$
    • $$Seat\: A\: dBSPL = 70\, dBSPL$$
  • B
    • $$Seat\: B\: dBSPL = 100 dB + \left ( 20\log_{10}\frac{3.28′}{50.53′} \right )-3 dB$$
    • $$Seat\: B\: dBSPL = 100 dB + \left ( 20\log_{10}0.06 \right )-3 dB$$
    • $$Seat\: B\: dBSPL = 100 dB + \left ( 20\ast -1.19 \right )-3 dB$$
    • $$Seat\: B\: dBSPL = 100 dB + \left ( -23.75\right )-3 dB$$
    • $$Seat\: B\: dBSPL = 73.25\, dBSPL$$
  • C
    • $$Seat\: C\: dBSPL = 100 dB + \left ( 20\log_{10}\frac{3.28′}{77.31′} \right )-0 dB$$
    • $$Seat\: C\: dBSPL = 100 dB + \left ( 20\log_{10}0.04 \right )-0 dB$$
    • $$Seat\: C\: dBSPL = 100 dB + \left ( 20\ast -1.37 \right )-0 dB$$
    • $$Seat\: C\: dBSPL = 100 dB + \left ( -27.45\right )-0 dB$$
    • $$Seat\: C\: dBSPL = 72.55\, dBSPL$$

[/listtable]

Table 8.2 Calculating dBSPL of a given loudspeaker aimed on-axis with seat C

In this case our loudest seat is seat B at 73.25 dBSPL, and our quietest seat is seat A at 70 dBSPL, for a difference of 3.25 dB. Compared with the previous difference of 10.45 dB, we now have a much more even distribution of sound, to the point where most listeners will hardly notice the difference. Before we fully commit to this plan, we have to test these angles at several different frequencies, but this example serves to illustrate an important rule of thumb when aiming loudspeakers. In most cases, the best course of action is to aim the loudspeaker at the farthest seat and let the closest seat be the farthest off-axis from the loudspeaker. This way, as you move from the closest seat to the farthest seat, while you’re losing dB over the extra distance you’re also gaining dB by moving more directly on-axis with the loudspeaker.
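The two aiming scenarios are easy to reproduce. The sketch below combines the inverse square law with the off-axis losses read from the polar plot; the seat distances and per-seat directivity losses are the ones used in Tables 8.1 and 8.2, and small differences from the table values come from the rounding in the intermediate steps shown there.

```python
import math

def seat_spl(on_axis_dbspl_at_1m, distance_ft, directivity_loss_db, ref_ft=3.28):
    """Inverse square law loss over distance plus the off-axis (directivity) loss."""
    distance_loss = 20 * math.log10(ref_ft / distance_ft)
    return on_axis_dbspl_at_1m + distance_loss - directivity_loss_db

seats = {"A": 33.17, "B": 50.53, "C": 77.31}    # distances in feet (Figure 8.24)
aim_at_B = {"A": 3, "B": 2, "C": 6}             # directivity losses in dB (Table 8.1)
aim_at_C = {"A": 10, "B": 3, "C": 0}            # directivity losses in dB (Table 8.2)

for label, losses in (("aimed at seat B", aim_at_B), ("aimed at seat C", aim_at_C)):
    levels = {s: seat_spl(100, seats[s], losses[s]) for s in seats}
    spread = max(levels.values()) - min(levels.values())
    print(label, {s: round(v, 2) for s, v in levels.items()}, f"spread {spread:.2f} dB")
```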

[aside]EASE was developed by German engineers ADA (Acoustic Design Ahnert) in 1990 and introduced at the 88th AES Convention.  That’s also the same year that Microsoft announced Windows 3.0.[/aside]

Fortunately there are software tools that can help you determine the best loudspeakers to use and the best way to deploy them in your space. These tools range in price from free solutions such as MAPP Online Pro from Meyer Sound shown in Figure 8.25 to relatively expensive commercial products like EASE from the Ahnert Feistel Media Group, shown in Figure 8.26. These programs allow you to create a 2D or 3D drawing of the room and place virtual loudspeakers in the drawing to see how they disperse the sound. The virtual loudspeaker files come in several formats. The most common is the EASE format. EASE is the most expensive and comprehensive solution out there, and fortunately most other programs have the ability to import EASE loudspeaker files. Another format is the Common Loudspeaker Format (CLF). CLF files use an open format, and many manufacturers are starting to publish their loudspeaker data in CLF. Information on loudspeaker modeling software that uses CLF can be found at the website for the Common Loudspeaker Format Group http://www.clfgroup.org.

Figure 8.25 MAPP Online Pro software from Meyer Sound
Figure 8.26 EASE software

8.2.4.2 System Documentation

Once you’ve decided on a loudspeaker system that distributes the sound the way you want, you need to begin the process of designing the systems that capture the sound of the performance and feed it into the loudspeaker system. Typically this involves creating a set of drawings that give you the opportunity to think through the entire sound system and explain to others – installers, contractors, or operators, for example – how the system will function.

[aside]You can read the entire USITT document on System Diagram guidelines by visiting the USITT website.[/aside]

The first diagram to create is the System Diagram. This is similar in function to an electrical circuit diagram, showing you which parts are used and how they’re wired up.  The sound system diagram shows how all the components of a sound system connect together in the audio signal chain, starting from the microphones and other input devices all the way through to the loudspeakers that reproduce that sound. These diagrams can be created digitally with vector drawing programs such as AutoCAD and VectorWorks or diagramming programs such as Visio and OmniGraffle.

The United States Institute for Theatre Technology has published some guidelines for creating system diagrams. The most common symbol or block used in system diagrams is the generic device block shown in Figure 8.27. The EQUIPMENT TYPE label should be replaced with a descriptive term such as CD PLAYER or MIXING CONSOLE. You can also specify the exact make and model of the equipment in the label above the block.

Figure 8.27 A generic device block for system diagrams

There are also symbols to represent microphones, power amplifiers, and loudspeakers. You can connect all the various symbols to represent an entire sound system. Figure 8.28 shows a very small sound system, and Figure 8.29 shows a full system diagram for a small musical theatre production.

Figure 8.28 A small system diagram
Figure 8.29 System diagram for a full sound system

While the system diagram shows the basic signal flow for the entire sound system, there is a lot of detail missing about the specific interconnections between devices. This is where a patch plot can be helpful. A patch plot is essentially a spreadsheet that shows every connection point in the sound system. You should be able to use the patch plot to determine which cables, and how many of them, you’ll need for the sound system. It can also be a useful tool in troubleshooting a sound system that isn’t behaving properly. The majority of the time when things go wrong with your sound system or something isn’t working, it’s because something isn’t connected properly or one of the cables has been damaged. A good patch plot can help you find the problem by showing you where all the connections are located in the signal path. There is no industry standard for creating a patch plot, but the rule of thumb is to err on the side of too much information. You want every possible detail about every audio connection made in the sound system. Sometimes color coding can make the patch plot easier to understand. Figure 8.30 shows an example patch plot for the sound system in Figure 8.28.

Figure 8.30 Patch plot for a simple sound system

8.2.4.3 Sound Analysis Systems

[aside]Acoustic systems are systems in which the sounds produced depend on the shape and material of the sound-producing instruments. Electroacoustic systems produce sound through electronic technology such as amplifiers and loudspeakers.[/aside]

Section 8.2.4.1 discussed mathematical methods and tools that help you to determine where loudspeakers should be placed to maximize clarity and minimize the differences in what is heard in different locations in an auditorium. However, even with good loudspeaker placement, you’ll find there are differences between the original sound signal and how it sounds when it arrives at the listener. Different frequency components respond differently to their environment, and frequency components interact with each other as sounds from multiple sources combine in the air. The question is, how are these frequencies heard by the audience once they pass through loudspeakers and travel through space, encountering obstructions, varying air temperatures, comb filtering, and so forth? Is each frequency arriving at the audience’s ears at the desired amplitude? Are certain frequencies too loud or too quiet? If the high frequencies are too quiet, you could sacrifice the brightness or clarity of the sound. Low frequencies that are too quiet could result in muffled voices. There are no clear guidelines on what the “right” frequency response is because it usually boils down to personal preference, artistic considerations, performance styles, and so forth. In any case, before you can decide if you have a problem, the first step is to analyze the frequency response in your environment. With practice you can hear and identify frequencies, but sometimes being able to see the frequencies can help you to diagnose and solve problems. This is especially true when you’re setting up the sound system for a live performance in a theatre.

A sound analysis system is one of the fundamental tools for ensuring that frequencies are being received at proper levels. The system consists of a computer running the analysis software, an audio interface with inputs and outputs, and a special analysis microphone.  An analysis microphone is different from a traditional recording microphone. Most recording microphones have a varying response or sensitivity at different frequencies across the spectrum. This is often a desired result of their manufacturing and design, and part of what gives each microphone its unique sound. For analyzing acoustic or electroacoustic systems, you need a microphone that measures all frequencies equally.  This is often referred to as having a flat response.  In addition, most microphones are directional. They pick up sound better in the front than in the back. A good analysis microphone should be omnidirectional so it can pick up the sound coming at it from all directions. Figure 8.31 shows a popular analysis microphone from Earthworks.

Figure 8.31 Earthworks M30 analysis microphone

There are many choices for analysis software, but they all fall into two main categories: signal dependent and signal independent. Signal dependent sound analysis systems rely on a known stimulus signal that the software generates – e.g., a sine wave sweep. A sine wave sweep is a sound that begins as a low frequency sine wave and smoothly moves up in frequency to some given high frequency limit. The sweep, lasting a few seconds or less, is sent by a direct cable connection to the loudspeaker. You then place your analysis microphone at the listening location you want to analyze. The microphone picks up the sound radiated by the loudspeaker so that you can compare what the microphone picks up with what was actually sent out.

The analysis software records and stores the information in a file called an impulse response.  The impulse response is a graph of the sound wave with time on the x-axis and the amplitude of the sound wave on the y-axis.  This same information can be displayed in a frequency response graph, which has frequencies on the x-axis and the amplitude of each frequency on the y-axis.  (In Chapter 7, we’ll explain the mathematics that transforms the impulse response graph to the frequency response graph, and vice versa.) Figure 8.32 shows an example frequency response graph created by the procedure just described.

Figure 8.32 Frequency response graph created from a signal dependent sound analysis system
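The relationship between the two graphs can be sketched with a Fourier transform. The impulse response below is synthetic (a direct arrival plus a quieter reflection five milliseconds later) rather than a measurement of the system in Figure 8.32, but the same transform applies to measured data.

```python
import numpy as np

fs = 48000                                   # assumed sampling rate, Hz
ir = np.zeros(int(0.5 * fs))                 # half a second of impulse response

# Synthetic impulse response: a direct arrival plus a quieter reflection 5 ms later
ir[0] = 1.0
ir[int(0.005 * fs)] = 0.5

# The frequency response is the Fourier transform of the impulse response
spectrum = np.fft.rfft(ir)
freqs = np.fft.rfftfreq(len(ir), d=1 / fs)
magnitude_db = 20 * np.log10(np.abs(spectrum) + 1e-12)

for f in (100, 1000, 4000, 10000):
    i = np.argmin(np.abs(freqs - f))
    print(f"{f:5d} Hz: {magnitude_db[i]:+5.1f} dB")
```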

Figure 8.33 shows a screenshot from FuzzMeasure Pro, a signal dependent analysis program that runs on the Mac operating system. The frequency response is on the top, and the impulse response is at the bottom. As you recall from Chapter 2, the frequency response has frequencies on the horizontal axis and amplitudes of these frequency components on the vertical axis. It shows how the frequencies “responded” to their environment as they moved from the loudspeaker to the microphone. We know that the sine wave sweep emitted had frequencies distributed evenly across the audible spectrum, so if the sound was not affected in passage, the frequency response graph should be flat. But notice in the graph that the frequencies between 30 Hz and 500 Hz are 6 to 10 dB louder than the rest, which is their response to the environment.

Figure 8.33 FuzzMeasure Pro sound analysis software

When you look at an analysis such as this, it’s up to you to decide if you’ve identified a problem that you want to solve. Keep in mind that the goal isn’t necessarily to make the frequency response graph be a straight line, indicating all frequencies are of equal amplitude. The goal is to make the right kind of sound. Before you can decide what to do, you need to determine why the frequency response sounds like this. There are many possible reasons.  It could be that you’re too far off-axis from the loudspeaker generating the sound. That’s not a problem you can really solve when you’re analyzing a listening space for a large audience, since not everyone can sit in the prime location. You could move the analysis microphone so that you’re on-axis with the loudspeaker, but you can’t fix the off-axis frequency response for the loudspeaker itself.  In the example shown in Figure 8.34 the loudspeaker system that is generating the sound uses two sets of sound radiators. One set of loudspeakers generates the frequencies above 500 Hz. The other set generates the frequencies below 500 Hz. Given that information, you could conclude that the low-frequency loudspeakers are simply louder than the high frequency ones. If this is causing a sound that you don’t want, you could fix it by reducing the level of the low-frequency loudspeakers.

Figure 8.34 Frequency response graph showing a low frequency boost

Figure 8.35 shows the result after this correction. The grey line shows the original frequency response, and the black line shows the frequency response after reducing the amplitude of the low-frequency loudspeakers by 6 dB.

Figure 8.35 Frequency response graph after reducing the low frequency level by 6 dB
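
To put that 6 dB adjustment in concrete terms, a level change in decibels corresponds to a linear amplitude factor of 10 raised to the power of (dB/20). A minimal Python sketch of the arithmetic:

    import numpy as np

    def db_to_gain(db):
        """Convert a level change in dB to a linear amplitude factor."""
        return 10 ** (db / 20.0)

    # Reducing the low-frequency loudspeakers by 6 dB roughly halves their amplitude.
    print(db_to_gain(-6.0))   # ~0.501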

The previous example gives you a sketch of how a sound analysis system might be used. You place yourself in a chosen position in a room where sound is to be performed or played, generate sound that is played through loudspeakers, and then measure the sound as it is received at your chosen position. The frequencies that are actually detected may not be precisely the frequency components of the original sound that was generated or played.   By looking at the difference between what you played and what you are able to measure, you can analyze the frequency response of your loudspeakers, the acoustics of your room, or a combination of the two. The frequencies that are measured by the sound analysis system are dependent not only on the sound originally produced, but also on the loudspeakers’ types and positions, the location of the listener in the room, and the acoustics of the room. Thus, in addition to measuring the frequency response of your loudspeakers, the sound analysis system can help you to determine if different locations in the room vary significantly in their frequency response, leaving it to you to decide if this is a problem and what factor might be the source.

The advantage to a signal dependent system is that it’s easy to use, and with it you can get a good general picture of how frequencies will sound in a given acoustic space with certain loudspeakers. You also can save the frequency response graphs to refer to and analyze later. The disadvantage to a signal dependent analysis system is that it uses only artificially-generated signals like sine sweeps, not real music or performances.

If you want to analyze actual music or performances, you need to use a signal independent analysis system. These systems allow you to analyze the frequency response of recorded music, voice, sound effects, or even live performances as they sound in your acoustic space. In contrast to systems like FuzzMeasure, which know the precise sweep of frequencies they’re generating, signal independent systems must be given a direct copy of the sound being played so that the original sound can be compared with the sound that passes through the air and is received by the analysis microphone. This is accomplished by sending one copy of the original sound to the loudspeakers while a second copy is sent directly, via cable, to the sound analysis software. The software presumably is running on a computer that has a sound card attached with two sound inputs. One of the inputs is the analysis microphone and one is a direct feed from the sound source. The software compares the two signals in real time – as the music or sound is played – and tells you what is different about them.
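
A highly simplified sketch of the comparison such a system performs is shown below: it takes a block of the reference (direct-feed) signal and a block of the microphone signal captured at the same time and reports the level difference, frequency by frequency, in decibels. This assumes numpy and two already-captured signal blocks; real analyzers such as Smaart do this continuously, with windowing and averaging over time.

    import numpy as np

    def level_difference_db(reference, measured, fs, n_fft=4096):
        """Return (frequencies, dB difference) between a measured block and
        the reference block that was sent to the loudspeakers."""
        window = np.hanning(n_fft)
        ref_spec = np.abs(np.fft.rfft(reference[:n_fft] * window))
        mic_spec = np.abs(np.fft.rfft(measured[:n_fft] * window))
        freqs = np.fft.rfftfreq(n_fft, 1 / fs)
        diff_db = 20 * np.log10((mic_spec + 1e-12) / (ref_spec + 1e-12))
        return freqs, diff_db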

The advantage of the signal independent system is that it can analyze “real” sound as it is being played or performed. However, real sound has frequency components that constantly change, as we can tell from the constantly changing pitches that we hear. Thus, there isn’t one fixed frequency response graph that gives you a picture of how your loudspeakers and room are dealing with the frequencies of the sound. The graph changes dynamically over the entire time that the sound is played. For this reason, you can’t simply save one graph and carry it off with you for analysis. Instead, your analysis consists of observing the constantly-changing frequency response graph in real time, as the sound is played. If you wanted to save a single frequency response graph, you’d have to do what we did to generate Figure 8.36 – that is, get a “screen capture” of the frequency response graph at a specific moment in time – and the information you have is about only that moment. Another disadvantage of signal independent systems is that they analyze the noise in the environment along with the desired sound.

Figure 8.36 was produced from a popular signal independent analysis program called Smaart Live, which runs on Windows and Mac operating systems. The graph shows the difference, in decibels, between the amplitudes of the frequencies played vs. those received by the analysis microphone. Because this is only a snapshot in time, coupled with the fact that noise is measured as well, it isn’t very informative to look at just one graph like this. Being able to glean useful information from a signal independent sound analysis system comes from experience in working with real sound – learning how to compare what you want, what you see, what you understand is going on mathematically, and – most importantly – what you hear.

Figure 8.36 Smaart Live sound analysis software

8.2.4.4 System Optimization

Once you have the sound system installed and everything is functioning, the system needs to be optimized. System optimization is a process of tuning and adjusting the various components of the sound system so that

  • they’re operating at the proper volume levels,
  • the frequency response of the sound system is consistent and desirable,
  • destructive interactions between system components and the acoustical environment have been minimized, and
  • the timing of the various system components has been adjusted so the audience hears the sounds at the right time.

The first optimization you should perform applies to the gain structure of the sound system. When working with sound systems in either a live performance or recording situation, gain structure is a big concern. In a live performance situation, the goal is to amplify sound. In order to achieve the highest potential for loudness, you need to get each device in your system operating at the highest level possible so you don’t lose any volume as the sound travels through the system. In a recording situation, you’re primarily concerned with signal-to-noise ratio. In both of these cases, good gain structure is the solution.

In order to understand gain structure, you first need to understand that all sound equipment makes noise. All sound devices also contain amplifiers. What you want to do is amplify the sound without amplifying the noise. In a sound system with good gain structure, every device is receiving and sending sound at the highest level possible without clipping. Lining up the gain for each device involves lining up the clip points. You can do this by starting with the first device in your signal chain – typically a microphone or some sort of playback device. It’s easier to set up gain structure using a playback source because you can control the output volume. Start by playing something on the CD player, synthesizer, computer, iPod, or whatever your playback device is in a way that outputs the highest volume possible. This is usually done with either normalized pink noise or a normalized sine wave. Turn up the gain preamplifier on the mixing console or sound card input so that the level coming from the playback source clips the input. Then back off the gain until that sound is just below clipping. If you’re recording this sound, your gain structure is now complete. Just repeat this process for each input. If it’s a live performer on a microphone, ask them to perform at the highest volume they expect to generate and adjust the input gain accordingly.
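
As a small illustration of the level arithmetic involved, the sketch below generates a full-scale (normalized) 1 kHz sine wave of the kind you might use as a gain-staging test signal and checks how far a copy of it sits below digital clipping. It is a minimal sketch assuming numpy; the actual adjustment of preamp and output levels is done on the hardware itself, by meter and by ear.

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs                     # one second of samples
    test_tone = np.sin(2 * np.pi * 1000 * t)   # normalized 1 kHz sine, peaks at +/-1.0 (0 dBFS)

    def headroom_db(signal):
        """How far below digital clipping the loudest sample sits, in dB."""
        peak = np.max(np.abs(signal))
        return -20 * np.log10(peak + 1e-12)

    # A signal arriving at the next device 3 dB below its clip point:
    received = 10 ** (-3 / 20) * test_tone
    print(headroom_db(received))   # ~3 dB of headroom left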

[wpfilebase tag=file id=145 tpl=supplement /]

If you’re in a live situation, the mixing console will likely feed its sound into another device such as a processor or power amplifier. With the normalized audio from your playback source still running, adjust the output level of the mixing console so it’s also just below clipping. Then adjust the input level of the next device in the signal chain so that it’s receiving this signal at just below its clipping point. Repeat this process until you’ve adjusted every input and output in your sound system. At this point, everything should clip at the same time. If you increase the level of the playback source or input preamplifier on the mixing console, you should see every meter in your system register a clipped signal. If you’ve done this correctly, you should now have plenty of sound coming from your sound system without any hiss or other noise. If the sound system is too loud, simply turn down the last device in the signal chain. Usually this is the power amplifier.

Setting up proper gain structure in a sound system is fairly simple once you’re familiar with the process. The Max demo on gain structure associated with this section gives you an opportunity to practice the technique. Then you should be ready to line up the gain for your own systems.

Once you have the gain structure optimized, the next thing you need to do is try to minimize destructive interactions between loudspeakers. One reason that loudspeaker directivity is important is the potential for multiple loudspeakers to interact destructively if their coverage overlaps in physical space. Most loudspeakers can exercise some directional control over frequencies higher than 1 kHz, but frequencies lower than 1 kHz tend to be fairly omnidirectional, which means they will more easily run into each other in the air. The basic strategy for avoiding destructive interactions is to adjust the angle between two loudspeakers so their coverage zones intersect at the same dBSPL, at the point in each coverage pattern where the level is 6 dB quieter than the on-axis level, as shown in Figure 8.37. This overlap point is the only place where the two loudspeakers combine at the same level. If you can pull that off, you can then adjust the timing of the loudspeakers so they’re perfectly in phase at that overlap point. Because each loudspeaker is 6 dB down at that point, the in-phase combination produces a 6 dB boost that restores the on-axis level and eliminates the dip in high-frequency sound that would otherwise occur there. The result is even sound across the covered area. The small number of listeners who happen to be sitting in an area of overlap between two loudspeakers will effectively be covered by a virtual coherent loudspeaker.

When you move away from that perfect overlap point, one loudspeaker gets louder as you move closer to it, while the other gets quieter as you move farther away. This is handy for two reasons. First, the overall combined level remains fairly consistent at any angle as you move through the perfect overlap point. Second, at any angle outside of that perfect overlap point, the timing relationship between the two loudspeaker arrivals begins to differ, but the loudspeakers also differ more and more in level. Since the deepest comb filtering requires the two interacting signals to be at the same amplitude, the level difference greatly reduces the effect of the comb filtering introduced by the shift in timing. The place where the sound from the two loudspeakers arrives at the same amplitude, and therefore comb filters the most, is at the center of the overlap – but that is exactly where we aligned the timing to prevent comb filtering in the first place. With this technique, not only do you get the wider coverage that comes with multiple loudspeakers, but you also avoid the comb filtering!
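
The level arithmetic behind this is easy to check. A worked Python sketch (assuming numpy) showing that two coherent, in-phase signals that are each 6 dB below the on-axis level combine back to approximately the on-axis level:

    import numpy as np

    def db_to_amp(db):
        return 10 ** (db / 20.0)

    def amp_to_db(amp):
        return 20 * np.log10(amp)

    # Each loudspeaker is 6 dB below its on-axis level at the overlap point.
    a1 = db_to_amp(-6.0)
    a2 = db_to_amp(-6.0)

    # Perfectly in phase, the two pressures simply add:
    combined = a1 + a2
    print(amp_to_db(combined))   # ~0 dB, i.e., back to the on-axis level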

Figure 8.37 Minimizing comb filtering between two loudspeakers

[wpfilebase tag=file id=134 tpl=supplement /]

What about the low frequencies in this example? Well, they’re going to run into each other at similar amplitudes all around the room because they’re more omnidirectional than the high frequencies. However, they also have longer wavelengths, which means they require much larger offsets in time to cause destructive interaction. Consequently, they largely reinforce each other, giving an overall low frequency boost. Sometimes this free bass boost sounds good. If not, you can easily fix it with a system EQ adjustment by adding a low shelf filter that reduces the low frequencies by a certain amount to flatten out the frequency response of the system. This process is demonstrated in our video on loudspeaker interaction.
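
Below is a sketch of what such a low shelf adjustment looks like in code, using the widely published Audio EQ Cookbook biquad coefficients with scipy. The 500 Hz corner and the 6 dB cut are only illustrative values; a real installation would use the system processor's built-in filters and choose settings by measurement and by ear.

    import numpy as np
    from scipy.signal import lfilter

    def low_shelf_coefficients(fs, f0, gain_db, S=1.0):
        """Biquad low-shelf coefficients (Audio EQ Cookbook formulation)."""
        A = 10 ** (gain_db / 40.0)
        w0 = 2 * np.pi * f0 / fs
        alpha = np.sin(w0) / 2 * np.sqrt((A + 1 / A) * (1 / S - 1) + 2)
        cos_w0 = np.cos(w0)
        b0 = A * ((A + 1) - (A - 1) * cos_w0 + 2 * np.sqrt(A) * alpha)
        b1 = 2 * A * ((A - 1) - (A + 1) * cos_w0)
        b2 = A * ((A + 1) - (A - 1) * cos_w0 - 2 * np.sqrt(A) * alpha)
        a0 = (A + 1) + (A - 1) * cos_w0 + 2 * np.sqrt(A) * alpha
        a1 = -2 * ((A - 1) + (A + 1) * cos_w0)
        a2 = (A + 1) + (A - 1) * cos_w0 - 2 * np.sqrt(A) * alpha
        b = np.array([b0, b1, b2]) / a0
        a = np.array([1.0, a1 / a0, a2 / a0])
        return b, a

    # Reduce everything below roughly 500 Hz by 6 dB.
    fs = 48000
    b, a = low_shelf_coefficients(fs, f0=500.0, gain_db=-6.0)
    # filtered = lfilter(b, a, audio)   # apply to an audio signal array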

You should work with your loudspeakers in smaller groups, sometimes called systems. A center cluster of loudspeakers being used to cover the entire listening area from a single point source would be considered a system. You need to work with all the loudspeakers in that cluster to ensure they are working well together. A row of front fill loudspeakers at the edge of the stage being used to cover the front few rows will also need to be optimized as an individual system.

Once you have each loudspeaker system optimized, you need to work with all the systems together to ensure they don’t destructively interact with each other. This typically involves manipulating the timing of each system. There are two main strategies for time aligning loudspeaker systems: you can line the system up for coherence, or you can line it up for precedence imaging. The coherence strategy involves working with each loudspeaker system to ensure that their coverage areas are as isolated as possible. This process is very similar to the one described above for aligning the splay angles of two loudspeakers; in this case, you’re doing the same thing for two loudspeaker systems. If you can line up two different systems so that the 6 dB down point of each system lands at the same point in space, you can then apply delay to the system arriving first so that both systems arrive at the same time, causing a perfect reinforcement. If you can pull this off for the entire sound system and the entire listening area, the listeners will effectively be listening to a single, giant loudspeaker with optimal coherence.

The natural propagation of sound in an acoustic space is inherently not very coherent due to the reflection and absorption of sound, resulting in destructive and constructive interactions that vary across the listening area. This lack of natural coherence is often the reason that a sound reinforcement system is installed in the first place. A sound system that has been optimized for coherence has the characteristic of sounding very clear and very consistent across the listening area. These can be very desirable qualities in a sound system where clarity and intelligibility are important. The downside to this optimization strategy is that it sometimes does not sound very natural. This is because in coherence optimized sound systems, the direct sound from the original source (i.e., a singer or performer on stage) typically has little to no impact on the audience, and so the audience perceives the sound as coming directly from the loudspeakers. If you’re close enough to the stage and the singer, and the loudspeakers are way off to the side or far overhead, it can be strange to see the actual source yet hear the sound come from somewhere else. In an arena or stadium setting, or at a rock concert where you likely wouldn’t hear much direct sound in the first place, this isn’t as big a problem. Sound designers are sometimes willing to accept a slightly unnatural sound if it means that they can solve the clarity and intelligibility problems that occur in the acoustic space.

[aside]While your loudspeakers might sit still for the whole show, the performers usually don’t.  Out Board’s TiMax tracker and soundhub delay matrix system use radar technology to track actors and performers around a stage in three dimensions, automating and adjusting the delay times to maintain precedence and deliver natural, realistic sound throughout the performance.[/aside]

Optimizing the sound system for precedence imaging is completely opposite to the coherence strategy. In this case, the goal is to increase the clarity and loudness of the sound system while maintaining a natural sound as much as possible. In other words, you want the audience to be able to hear and understand everything in the performance but you want them to think that what they are hearing is coming naturally from the performer instead of coming from loudspeakers in a sound system. In a precedence imaging sound system, each loudspeaker system behaves like an early reflection in an acoustic space. For this strategy to work, you want to maximize the overlap between the various loudspeaker systems. Each listener should be able to hear two or three loudspeaker systems from a single seat. The danger here is that these overlapping loudspeaker systems can easily comb filter in a way that will make the sound unpleasant or completely unintelligible. Using the precedence effect described in Chapter 4, you can manipulate the delay of each loudspeaker system so they arrive at the listener at least five milliseconds apart but no more than 30 milliseconds apart. The signals still comb filter, but in a way that our hearing system naturally compensates for. Once all of the loudspeakers are lined up, you’ll also want to delay the entire sound system back to the performer position on stage. As long as the natural sound from the performer arrives first, followed by a succession of similar sounds from the various loudspeaker systems each within this precedence timing window, you can get an increased volume and clarity as perceived by the listener while still maintaining the effect of a natural acoustic sound. If that natural sound is a priority, you can achieve acceptable results with this method, but you will sacrifice some of the additional clarity and intelligibility that comes with a coherent sound system.
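
A sketch of the delay arithmetic involved is shown below, in plain Python: each loudspeaker system is delayed so that its sound arrives just after the natural sound from the stage, staying within the 5 to 30 millisecond precedence window described in Chapter 4. The distances and offsets are made up for illustration.

    SPEED_OF_SOUND = 343.0   # meters per second, at roughly room temperature

    def travel_time_ms(distance_m):
        """Time for sound to travel a given distance, in milliseconds."""
        return distance_m / SPEED_OF_SOUND * 1000.0

    # Hypothetical distances from a listener to the performer and to two loudspeaker systems
    performer_distance = 20.0      # meters
    main_system_distance = 12.0    # meters
    delay_fill_distance = 6.0      # meters

    natural_arrival = travel_time_ms(performer_distance)

    # Delay each system so its sound arrives 5-10 ms after the natural sound,
    # keeping every arrival inside the precedence window.
    main_delay = (natural_arrival + 5.0) - travel_time_ms(main_system_distance)
    fill_delay = (natural_arrival + 10.0) - travel_time_ms(delay_fill_distance)
    print(round(main_delay, 1), round(fill_delay, 1))   # delay settings in ms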

Both of these optimization strategies are valid, and you’ll need to evaluate your situation in each case to decide which kind of optimized system best addresses the priorities of your situation. In either case, you need some sort of system processor to perform the EQ and delay functions for the loudspeaker systems. These processors usually take the form of a dedicated digital signal-processing unit with multiple audio inputs and outputs. These system processors typically require a separate computer for programming, but once the system has been programmed, the units perform quite reliably without any external control. Figure 8.38 shows an example of a programming interface for a system processor.

Figure 8.38 Programming interface for a digital system processor

8.2.4.5 Multi-Channel Playback

Mid-Side can also be effective as a playback technique for delivering stereo sound to a large listening area. One of the limitations to stereo sound is that the effect relies on having the listener perfectly centered between the two loudspeakers. This is usually not a problem for a single person listening in a small living room. If you have more than one listener, such as in a public performance space, it can be difficult if not impossible to get all the listeners perfectly centered between the two loudspeakers. The listeners who are positioned to the left or right of the center line will not hear a stereo effect. Instead they will perceive most of the sound to be coming from whichever loudspeaker they are closest to. A more effective strategy would be to set up three loudspeakers. One would be your Mid loudspeaker and would be positioned in front of the listeners. The other two loudspeakers would be positioned directly on either side of the listeners as shown in Figure 8.39.

Figure 8.39 Mid Side loudspeaker setup

If you have an existing audio track that has been mixed in stereo, you can create a reverse Mid-Side matrix to convert the stereo information to a Mid-Side format. The Mid loudspeaker gets an L+R audio signal, equivalent to summing the two stereo tracks to a single mono signal. The Side+ loudspeaker gets an L-R audio signal, equivalent to inverting the right channel polarity and summing the two channels to a mono signal. This cancels out anything that is equal in the two channels, essentially removing all the Mid information. The Side- loudspeaker gets an R-L audio signal, which you can create either by inverting the left channel polarity and summing to mono or by simply inverting the Side+ signal. The listeners in this scenario will all hear something similar to a stereo effect. The right channel stereo audio will cancel out in the air between the Mid and Side+ loudspeakers, and the left channel stereo audio will cancel out in the air between the Mid and Side- loudspeakers. Because the Side+/- loudspeakers are directly to the side of the listeners, they will all hear this stereo effect regardless of whether they are directly in front of the Mid loudspeaker. Just like Mid-Side recording, the stereo image can be widened or narrowed as the balance between the Mid loudspeaker and Side loudspeakers is adjusted.
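
The matrix itself is only a few lines of arithmetic. Below is a minimal sketch assuming numpy, the soundfile package, and a hypothetical stereo file name, producing three output channels for the Mid, Side+, and Side- loudspeakers:

    import numpy as np
    import soundfile as sf

    # Hypothetical input file; audio has shape (samples, 2) for a stereo track
    audio, fs = sf.read("stereo_mix.wav")
    left, right = audio[:, 0], audio[:, 1]

    mid = left + right          # L+R feeds the Mid loudspeaker
    side_plus = left - right    # L-R feeds the Side+ loudspeaker
    side_minus = right - left   # R-L feeds the Side- loudspeaker (inverted Side+)

    # Scale by 0.5 to avoid clipping after the channels are summed
    outputs = 0.5 * np.column_stack([mid, side_plus, side_minus])
    sf.write("mid_side_playback.wav", outputs, fs)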

You don’t need to stop at just three loudspeakers. As long as you have more outputs on your playback system, you can continue to add loudspeakers to help you create more interesting soundscapes. Mid-Side playback illustrates an important point: having multiple loudspeakers doesn’t mean you have surround sound. If you play the same sound out of each loudspeaker, the precedence effect takes over and each listener localizes the sound to the closest loudspeaker. To create surround sound effects, you need to have different sounds in each loudspeaker. Mid-Side playback demonstrates how you can modify a single sound to have different properties in three loudspeakers, but you could also have completely different sounds playing from each loudspeaker. For example, instead of having a single track of raindrops playing out of ten loudspeakers, you could have ten different recordings of water dripping onto various surfaces. This creates a much more realistic and immersive rain effect. You can also mimic acoustic effects using multiple loudspeakers. You could have the dry sound of a recorded musical instrument playing out of the loudspeakers closest to the stage and then play various reverberant or wet versions of the recording out of the loudspeakers near the walls. With multiple playback channels and multiple loudspeakers, you can also create the effect of a sound moving around the room by automating volume changes over time.
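
That last idea, moving a sound around the room by automating volume changes, amounts to a set of time-varying gains, one per loudspeaker. A minimal Python sketch assuming numpy, using an equal-power crossfade between two adjacent loudspeakers as the sound travels from one to the next:

    import numpy as np

    fs = 48000
    duration = 4.0                       # seconds for the sound to move between loudspeakers
    n = int(fs * duration)
    mono = np.random.randn(n) * 0.1      # stand-in for any mono source signal

    # Equal-power crossfade: cosine/sine gains keep the combined level roughly constant
    position = np.linspace(0, 1, n)      # 0 = fully at loudspeaker A, 1 = fully at loudspeaker B
    gain_a = np.cos(position * np.pi / 2)
    gain_b = np.sin(position * np.pi / 2)

    channel_a = mono * gain_a
    channel_b = mono * gain_b
    outputs = np.column_stack([channel_a, channel_b])   # route to two physical outputs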

8.2.4.6 Playback and Control

Sound playback has evolved greatly in the past decades, and it’s safe to say tape decks with multiple operators and reel changes are a thing of history. While some small productions may still use CD players, MiniDiscs, or even MP3 players to play back their sound, it’s also safe to say that computer-based playback is the system of choice, especially in any professional production. Already an integral part of the digital audio workflow, computers offer flexibility, scalability, predictability, and unprecedented control over audio playback. Being able to consistently run a performance and reduce operator error is a huge advantage of computer playback. Yet as simple as it may be to operate on the surface, the potential complexity behind a single click of a button can be enormous.

Popular computer sound playback software systems include SFX by Stage Research for Windows operating systems, and QLab by Figure 53 on a Mac. These playback tools allow for many methods of control and automation, including sending and receiving MIDI commands, scripting, telnet, and more, allowing them to communicate with almost any other application or device. These playback systems also allow you to use multiple audio outputs, sending sound out anywhere you want, be it a few specific locations or the entire sound system. This is essential for creating immersive and dynamic surround effects. You’ll need a separate physical output channel from your computer audio interface for each loudspeaker location (or group of loudspeakers, depending on your routing) in your system that you want to control individually.

Controlling these systems can be as simple as using the mouse pointer on your computer to click a GO button. Yet that single click could trigger layers and layers of sound and control cues, with specifically timed sequences that execute an entire automated scene change or special effect. Theme parks use these kinds of playback systems to automatically control an entire show or environment, including sound playback, lighting effects, mechanical automation, and any other special effects. In these cases, sometimes the simple GO isn’t even triggered by a human operator, but by a timed script, making the entire playback and control a consistent and self-reliant process. Using MIDI or Open Sound Control, you can build very complex control systems. Other possibilities include using sensors built into scenery or costumes for actor control, as well as synchronizing sound, lighting, and projection systems to keep precisely timed sequences operating together and exactly on cue, such as a simulated lightning strike. Outside of an actual performance, these control systems can benefit you as a designer by providing a means of wireless remote control from a laptop or tablet, allowing you to make changes to cues while listening from various locations in the theatre.
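
As one small example of this kind of control, the sketch below sends an Open Sound Control message from Python using the python-osc package. The IP address, port, and OSC address shown are assumptions for illustration; check your playback software's documentation (QLab, for instance, publishes its own OSC command set) for the actual addresses and port it listens on.

    from pythonosc.udp_client import SimpleUDPClient

    # Hypothetical address and port of the computer running the playback software
    client = SimpleUDPClient("192.168.1.10", 53000)

    # Hypothetical OSC message asking the playback system to start cue number 1
    client.send_message("/cue/1/start", [])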

Using tools such as Max or PD, you can capture input from all kinds of sources such as cameras, mobile devices, or even video game controllers, and use that control data to generate MIDI commands to control sound playback.  You’ll always learn more actually doing it than simply reading about it, so included in this section are several exercises to get you going making your own custom control and sound playback systems.
