What sounds do humans love most? There are plenty of surveys and studies on the subject. Some of the most popular sounds are waves breaking on the shore, a crackling fire, white noise, children’s laughter… Back in 1933, Harvey Fletcher and Wilden A. Munson published a paper called “Loudness, its definition, measurement and calculation”, in which they found that the human ear is most sensitive between 2 and 5 kHz. It turns out that the frequency band that is critical for speech recognition falls right into that range¹. In the end, it’s not surprising that the sound we love most is that of the human voice.
Getting closer to the human voice
Besides the frequency range mentioned above, two other elements bring a sound closer to the human voice:
-A timbre similar to that of the human voice
-Ability to imitate or emulate human vocal effects
Some musical instruments, such as the cello or the trumpet, have a timbre that is similar to that of our voice. Others can imitate vocal effects, like an electric guitar with a “wah-wah” effect. The sarangi (सारंगी) is considered to be the closest match, having both a similar timbre and the ability to imitate vocal effects. In this article, we are mainly focusing on the latter.
Emulating vocal effects
When our vocal cords produce a sound, it is then articulated and filtered in the vocal tract, producing a wide range of complex sounds. For instance, pronouncing “wah” alters the space between the top of your tongue and the larynx. Thus, the harmonics that resonate in your mouth shift from low (‘w’) to high (‘ah’), which is exactly what a “wah-wah” effect does².
Taking the “wah-wah” pedal as a starting point, modulating a sound with a single filter produces a sound similar to that of a person saying “wah”. To emulate what happens inside the mouth, we can use either a low-pass or a band-pass filter, sweeping its frequency from low to high and back. Depending on the source sound, it may also be worth adjusting the resonance of the filter.
The modulation can be done automatically, either by sweeping the frequency at a fixed pace with the help of a low-frequency oscillator (LFO) or by tying it to a property of the sound itself, such as the amplitude of the input signal.
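To make the two automatic options more concrete, here is a minimal sketch in plain Python (standard library only). It is not how any particular pedal or plugin works: the filter is a textbook Chamberlin state-variable band-pass, and the function names, the 400 to 2000 Hz sweep range, the resonance `q` and the release time are all assumptions chosen for illustration.

```python
import math

def swept_bandpass(samples, cutoffs, sample_rate=44100, q=5.0):
    """Chamberlin state-variable band-pass whose centre frequency changes per sample."""
    low = band = 0.0
    out = []
    for x, cutoff in zip(samples, cutoffs):
        f = 2.0 * math.sin(math.pi * cutoff / sample_rate)
        low += f * band
        high = x - low - band / q
        band += f * high
        out.append(band / q)  # divide by q so the resonant peak has roughly unit gain
    return out

def lfo_cutoffs(count, sample_rate=44100, lfo_hz=2.0, f_min=400.0, f_max=2000.0):
    """Automatic sweep: a sine LFO moves the cutoff from f_min up to f_max and back."""
    ratio = f_max / f_min
    return [
        f_min * ratio ** (0.5 - 0.5 * math.cos(2.0 * math.pi * lfo_hz * n / sample_rate))
        for n in range(count)
    ]

def envelope_cutoffs(samples, sample_rate=44100, f_min=400.0, f_max=2000.0, release_s=0.05):
    """Signal-controlled sweep: louder input pushes the cutoff higher (auto-wah style)."""
    decay = math.exp(-1.0 / (release_s * sample_rate))
    ratio = f_max / f_min
    env = 0.0
    cutoffs = []
    for x in samples:
        env = max(abs(x), env * decay)  # instant attack, exponential release
        cutoffs.append(f_min * ratio ** min(env, 1.0))
    return cutoffs
```

Calling `swept_bandpass(signal, lfo_cutoffs(len(signal)))` gives the fixed-pace sweep, while `swept_bandpass(signal, envelope_cutoffs(signal))` lets the input level drive the filter. The sweep is exponential rather than linear because that tends to sound more even to the ear; a list of cutoff values read from a knob or pedal could be passed in the same way.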
I personally prefer manual modulation, shifting the frequency with my own hand by using a knob, a MIDI controller, an expression pedal or even the laptop’s touchpad. Keep in mind that, in order to get a “wah” effect, the sweep generally has to be fast.
What is the difference between saying “ah” and “ooh”? What determines which vowel you are saying? When you speak, you modify the shape of your mouth and that changes the resonance in your vocal tract. It’s like having a series of band-pass filters with different frequencies combined together with the original sound. The resonance added by the filters determines the vowel. Each language has a different set of vowels, which can be represented by their formant frequencies (the lowest frequency is called F1, the second F2, and the third F3). Normally, the first two formant frequencies are enough to disambiguate a vowel.
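The “series of band-pass filters combined with the original sound” idea can be sketched as follows. This is only an illustration under stated assumptions: the formant values are rounded, commonly quoted averages for adult male speakers of American English and vary a lot between speakers and languages, and the names `FORMANTS`, `bandpass` and `formant_filter`, as well as the `q` and `mix` defaults, are made up for this example.

```python
import math

# Approximate first two formants (F1, F2) in Hz. Rounded textbook averages,
# used here as illustrative assumptions rather than exact values.
FORMANTS = {
    "a": (730.0, 1090.0),
    "i": (270.0, 2290.0),
    "u": (300.0, 870.0),
}

def bandpass(samples, center_hz, sample_rate=44100, q=8.0):
    """Fixed Chamberlin state-variable band-pass, scaled to roughly unit peak gain."""
    f = 2.0 * math.sin(math.pi * center_hz / sample_rate)
    low = band = 0.0
    out = []
    for x in samples:
        low += f * band
        high = x - low - band / q
        band += f * high
        out.append(band / q)
    return out

def formant_filter(samples, vowel, sample_rate=44100, mix=0.8):
    """Blend the dry signal with two resonances placed at F1 and F2 of the vowel."""
    f1, f2 = FORMANTS[vowel]
    r1 = bandpass(samples, f1, sample_rate)
    r2 = bandpass(samples, f2, sample_rate)
    return [
        (1.0 - mix) * x + mix * 0.5 * (a + b)
        for x, a, b in zip(samples, r1, r2)
    ]
```

For example, `formant_filter(signal, "a")` emphasises the regions around 730 Hz and 1090 Hz. A harmonically rich source, such as a sawtooth wave, makes the vowel colour easier to hear than a pure tone does.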
Applying a formant filter
There are several plugins on the market, each with its own character. The effect of a fixed formant is not always easy to notice, so to get a sound that is closer to human vocalization, the formant has to change constantly, for instance from “a” to “u” and back to “a”. As with the “wah” sweep, this can be done manually or automatically.
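A moving formant can be sketched by sliding two resonances between the formant pairs of two vowels. The example below is a rough illustration, not a description of any plugin: the “a” and “u” formant values are approximate assumptions, and `morph`, `moving_formant_filter` and their default parameters are hypothetical names and settings.

```python
import math

VOWEL_A = (730.0, 1090.0)  # assumed approximate F1, F2 in Hz for "a"
VOWEL_U = (300.0, 870.0)   # assumed approximate F1, F2 in Hz for "u"

def morph(position):
    """Interpolate the formants geometrically: 0.0 gives "a", 1.0 gives "u"."""
    return tuple(a * (u / a) ** position for a, u in zip(VOWEL_A, VOWEL_U))

def moving_formant_filter(samples, sample_rate=44100, lfo_hz=1.0, q=8.0, mix=0.8):
    """Sweep two band-pass resonances from "a" to "u" and back with a sine LFO."""
    state = [[0.0, 0.0] for _ in VOWEL_A]  # [low, band] for each resonator
    out = []
    for n, x in enumerate(samples):
        # 0.0 at the start ("a"), 1.0 half an LFO cycle later ("u"), then back
        position = 0.5 - 0.5 * math.cos(2.0 * math.pi * lfo_hz * n / sample_rate)
        wet = 0.0
        for s, center in zip(state, morph(position)):
            f = 2.0 * math.sin(math.pi * center / sample_rate)
            s[0] += f * s[1]
            high = x - s[0] - s[1] / q
            s[1] += f * high
            wet += s[1] / q  # scaled to roughly unit peak gain
        out.append((1.0 - mix) * x + mix * wet / len(state))
    return out
```

For manual control, the `position` value computed from the LFO could instead be read from a knob, a MIDI controller or a touchpad, as long as it stays between 0.0 and 1.0.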
The most fun
While applying a formant filter to lead instruments and sounds is common practice, it’s also a lot of fun to apply it to unconventional sources, for instance a low-pitched bass line or a whole drum set. Doing the latter can get you a sort of beatbox effect that works very well on drum rolls.
There are also other methods for getting voice-like sounds, such as articulatory speech synthesis or voice modulation with vocoders. These will be covered in future articles. What makes formant filters interesting is that, since we are naturally attracted to human vocal sounds, they grab attention and can spice up your track if you use them creatively.
Now that we have mentioned articulatory speech synthesis, you can have lots of fun playing with Neil Thapen’s “Pink Trombone”: https://dood.al/pinktrombone/
Hint: it works better on multi-touch devices.
1: Some consider it to be wider, from 300 Hz to 3.4 kHz. Source: http://www.uoverip.com/voice-fundamentals-human-speech-frequency/
2: For a more detailed explanation of the “wah-wah” effect, see: http://theweek.com/articles/454797/science-making-guitar-sound-like-human-voice