Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
eng.doc
Скачиваний:
23
Добавлен:
15.06.2014
Размер:
653.82 Кб
Скачать

5 Basic sound theory and synthesis

5.1 What is sound?

Sound is what you hear. You hear what your ears capture, and your ears capture subtle and rapid changes in air pressure. These changes in pressure, or pressure waves travelling through the air, typically come from moving objects or parts of objects. If something moves, it pushes on the air in front of it and that causes a wave of pressure that spreads out and grows weaker the further it goes. It's essentially the same phenomenon that can be seen by playing around with your hands on a water surface.

These pressure waves can be of different intensity (experienced as volume, very loud sounds can even be felt in the rest of your body), and they fluctuate back and forth at the same speed as the object that caused them (heard as frequency or pitch). Our ears can interpret sound frequencies ranging between approximately 20 Hz (Hertz) to 20000 Hz (or 20 kHz, for kilo-hertz). One Hertz corresponds to one back-and-forth motion in a second.

A single Hertz can easily be produced by waving your hand back and forth, and it doesn't appear as sound since our ears can't "go that low". It does however generate a movement of air that you can easily feel. If you were able to wave your hand faster, at more than 20 back-and-forths per second, it would start sounding like a very deep bass tone.

As an example of what you actually can hear, a Swedish dial-tone is usually 440 Hz, or the musical note 'A'. That means that it is generated by an object swinging back and forth 440 times each second, which is pretty fast as far as everyday motion goes. Apparently the American equivalent is a mix of two tones at 350 and 440 Hz, but the result is still in the same range.

When sound needs to be illustrated graphically, it's common to use a horizontal time axis and show the position of an imagined moving object on the vertical axis. This means that the graph can be interpreted as a "movie reel" showing how an object would have to move to recreate the sound. For a simple tone, the graph looks like a wavy string, and if you travel to the right in the image you see that the object would move up and down at a regular interval which will correspond to the frequency of the tone.

5.2 What about sound synthesis, how do we make sounds?

So, to create sound we apparently need to move an object. Conveniently enough, there are objects widely available that can be moved back and forth in a quick and arbitrary way. They're called speakers. A speaker is basically an electromagnet attached to a mobile cone, in such a way that when a varying electrical signal is fed to it, the cone will move to follow the signal.

The traditional way of creating sounds is to reproduce naturally occuring sounds. This is done using a microphone, which is essentially a reversed speaker, that generates a signal from the sounds it is exposed to. That signal is then either stored or passed directly through an amplifier to one or several speakers. When the speakers receive the signal, they move in the same way as the microphone membrane (cone) moved when it was exposed to the original sound, so the end result is yet again that same sound, but with a small time delay and a modified volume (typically louder).

Reproducing natural sounds is all well and good, but that's not what we're here for. If you want a specific type of sound you'd have to search for a way of generating it using actual physical objects, which can often be close to impossible. Old-school sound effect technicians in movie production have done some remarkable work with hands-on physical sound generation though. Popular examples are the Tie Fighter laser sound from Star Wars (hitting metal wires) and the T-Rex growl in Jurassic Park (large moving machinery, a flight simulator I believe). Obviously there are infinite other effects ranging from explosions to monster noises and of course the ever-popular fist fights.

Most of these aim to capture "life-like" sounds in some way though - things that could potentially be heard in the real world. For imaginary sounds that have no correlation to the real world, it would be handy to bypass the microphone stage completely and just create our own electrical signal for the speakers.

Once again, there are convenient systems in place for this. A computer soundcard has what's called a digital-to-analog converter (D/A or DAC), which is able to generate an electrical signal from a digital number that is given to it. The number must be within a certain range, say -1.0 to +1.0 (with arbitrary precision in between). Generally, zero gives us the rest/idle position of the speaker element, and positive numbers push it forward while negative numbers push it back in the other direction. It doesn't actually matter a whole lot which way it goes though, as the sound you hear will be more or less the same.

So this means our task becomes that of coming up with interesting sequences of numbers to feed into the soundcard. If we just put in one number and never change it, the speaker cone will move to that position and stay there, resulting in maybe a small pop and then silence. Interesting stuff happen only when we repeatedly change the number according to some pattern.

In practical terms, and somewhat ironically, you can't really expect a modern computer to maintain a steady enough timed loop for it to explicitly send out each number to the soundcard. Instead, the standard way of doing it is to supply a long sequence of numbers, maybe a few hundred, to the soundcard at once, and then the soundcard goes through that sequence with exact timing and feeds each number to the D/A converter. This all happens at a predetermined rate, usually at 44100 samples per second (a sample is one of these numbers) for CD-quality audio. This sample rate is sometimes referred to as "Hz", even though it's not technically describing actual back-and-forth (oscillating) motion.

5.3 A simple waveform

All right, we need to come up with numbers. Repeating patterns of numbers. How on earth do we do that? Well, it's absolutely trivial - any repeating pattern of numbers will create an audible waveform (if you make them span a large enough range, going between 0.0 and 0.00001 is hardly going to catch anyone's attention).

* Any pattern you say? Well, what about 1,0,1,0,1,0,1,0,...?

Yeah, that'll work. But unfortunately you won't hear that one either, even though it's using a large part of the range. Remember that the samples are fed out through the D/A converter at 44100 per second, and those numbers describe an oscillation (back-and-forth, remember?).

...but the pattern repeats itself every two samples, so the resulting speaker oscillation is at 22050 Hz. That's just above 20 kHz which you may recall as the upper limit for human hearing. That's actually the reason for using 44.1 kHz as the sample rate - so we can reproduce sounds over the entire range of the human ear, but not more.

* Ok, what if I hold each of the numbers for a while before changing?

You mean something like 1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,...?

* Yeah, like that. Or even more of the same before changing.

Whee, you've just created a square wave, which is probably the most classic digitally synthesized waveform. How many times do we repeat a number before changing to the other one? If we increase that count, the waveform will be more stretched out in time, and the oscillations will be slower since it takes more time to complete a full back-and-forth. The repeat count (also called "period" if measured in time) is a direct way of controlling the frequency of the tone that we are generating. A longer period means a lower frequency and vice versa.

A small note of worth here is that you generally want a waveform to oscillate evenly around the zero-point, otherwise there will be an audible pop/click at the start and end of the tone. This is caused by the average signal level being shifted suddenly. It is illustrated rather clearly by how audio systems are constructed. They usually filter out anything that moves slower than a handful of Hz, to always keep the speaker cone oscillating around the physical rest point of the speaker. This is so you'll get the maximum available movement range out of it, even if your audio signal has a DC offset (meaning a rest level other than actual zero).

5.4 Sine wave

While the square wave has the nice property of being very simple to describe with a digital computer, it is in fact very complex from an auditory or musical view. It contains a lot of resonant frequencies above the one we create by selecting the period length, which can make it sound "full" and characteristic. Actually some may describe it more as "hollow" or "airy", but it's still got character.

From a physical standpoint, the simplest waveform is the sine wave. It is quite simply a description of ideal oscillation similar to the kind you can get from a pendulum or a weight suspended by a spring, for instance. Objects tend to prefer this shape of movement when they're swinging back and forth across a midpoint.

Subjectively, a sine wave is rather dull as it can only be described as a very pure tone and nothing else. For making the cleanest possible sound it has some merit, and it's also useful as a component in building more complex sounds. Most of all it's an important theoretical and mathematical construct.

A sine wave is somewhat complex to generate computationally, but in most cases there will be a sin() function providing it at a reasonable speed (and in an easy-to-use manner). A full oscillation period of sin(t) is attained by cycling t through the values 0.0 to 2*pi (which is approximately 6.283), and it outputs values ranging from -1.0 to +1.0. The reasoning for using such a seemingly odd parameter range is mathematical and related to the circle, traditional home of both sine and cosine. A circle of radius 1 has a circumference of exactly 2*pi.

5.5 Triangle

A triangle wave is an attempt to mimic the sine wave more closely than you could with square, but still using rather simple calculations. It is just a rising and falling ramp of values. The result does sound very much like a sine wave, so in that sense it's useful. Still sounds boring though, naturally.

5.6 Sawtooth

This is half a triangle wave, resetting back to the starting value right when it reaches the top. From that jump in signal level you get a harsh quality similar to the square wave, but there's only one such jump in a period (compared to the square, which has two - both up and down) and the rest of it is smoothly shaped, so it tends to sound a bit warmer and more organic than the square.

5.7 Noise

If you want more chaotic sounds, resembling explosions or general thrashing, you won't have much luck with simple repeating patterns (unless you vary them a lot, and then they cease to be simple repeating patterns). A straight-forward way of getting chaotic patterns is to employ a random number generator. Most programming languages have one. In C you would do something like this:

next_number = (float)(rand()%20000-10000)/10000

...to get a random number between -1.0 and +1.0. Look up rand() and srand() for details.

Despite this pattern being completely random and without order, you can still control the apparent "frequency" or lightness of the sound. This can be done in several ways, but the simplest is to delay the switch to a new random value based on whatever quality you're after. It is very similar to the square wave in that respect, except you will have to keep the period lengths much shorter for noise.

5.8 General waveforms (for programmers)

To try out different kinds of waveforms using the same basic control code, you can keep a variable that will pass from 0.0 to 1.0 and wrap around for each full period of the sound. This period length would be determined by your desired frequency. To get the actual sample values, you can use different functions (depending on what waveform you want) to look up a suitable value based on the current period position.

For square, you would just check if it's less than or greater than 0.5. If it's less, you output -1.0, else +1.0 (or the other way around). To get a sawtooth, you just output something like (-1.0+period_pos*2.0), which will result in a signal ramp from -1.0 to +1.0 over the {0.0 - 1.0} period position range.

One thing I've found useful for noise is to regenerate a buffer of random samples each time the period counter wraps around, and continuously read out values from that buffer based on the current position within a period. You can choose the size of the buffer to be some suitable value, based on which frequency values you want to use. It doesn't have to be long but a few hundred samples is probably a good idea to avoid recomputing it all the time.

To change sound frequency you just set a new value for period length (though this will only affect the rate at which period position increments, it will always span between 0.0 and 1.0). For controlling volume you multiply the output sample by a volume value (say between 0.0 for silence and 1.0 for maximum volume). You can also mix several independent waveforms by adding their samples together, though make sure that you're not exceeding the allowed range of your soundcard (which would probably be the familiar -1.0 to +1.0 range). In other words, you'll have to use a lower volume for each waveform (or "channel") if you are going to mix (add) several of them together... or you can just multiply the final sum with a master volume value if you don't need volume control over each individual channel. This saves some multiply operations.

Another small heads-up worth considering is that you shouldn't use floating point numbers as if they were integral datatypes. That is; don't increment them in small steps over a large range. They have limited precision and will misbehave if you do that. It can manifest itself as a slight but noticable change in pitch or worse. You can increment floats as long as you keep them within a reasonable range, like between 0.0 and 1.0. The problems occur when the size of increments is much smaller than the current value (for example 50000.0 + 0.1).

Соседние файлы в предмете Английский язык