Web audio by Thump Audio


Post by PatientZero »

Thump's Audio for the Web 101


Okay, so you've got this .wav file you've whipped up in ACID, and now it's time to turn the masses on to your vision via Internet distribution. The first choice you need to make is which format you're going to use. The "Big Three" are Windows Media v7 (v8 on its way), RealMedia v8 (featuring RealAudio v8.5 codecs), and MPEG Layer-3 (mp3). There are other formats around like AAC (sounds friggin' awesome), Liquid Audio (a proprietary version of AAC), Ogg Vorbis (the open-source answer to mp3; the Ogg Vorbis v4 spec was just released), QuickTime (the work of the devil), and PCA (SOFO's proprietary Perfect Clarity Audio format - lossless, 2:1 data compression only); but for the sake of discussion we're going to ignore them.

How do you decide which format you're going to use? First look at your target audience and how the file will be delivered. Is the tune going to be a download? If it is, mp3 performs the best of the Big Three. Are you going to stream the tune? If so, use wma for low bandwidth (56kbps and under) and RealAudio for high bandwidth (128kbps and up). Why the change? Why not wma for everything? Simple - RealAudio 8 at high bandwidths uses Sony's ATRAC3 compression scheme, the same one your MiniDisc player uses. It's a nice codec for a lossy one, but at low bandwidth it removes too much of the frequency range. At 96kbps+ it does sound superior to wma (to my ears at least). Okay, we know the format - now we need to decide the exact bandwidth we're going to use.
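If it helps to see that decision spelled out, here's the same logic as a quick Python sketch (the function name and the 56kbps cutoff are just my shorthand for the advice above, nothing official):

```python
# Toy version of the format decision above; names and cutoff are my own shorthand.
def pick_format(delivery: str, bitrate_kbps: int) -> str:
    if delivery == "download":
        return "mp3"          # best of the Big Three for downloads
    if delivery == "stream":
        # wma at low bandwidth; RealAudio's ATRAC3 wins at roughly 96kbps and up
        return "wma" if bitrate_kbps <= 56 else "RealAudio"
    raise ValueError("delivery must be 'download' or 'stream'")
```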

You might think, "Well, I'm doing this for a 56k modem, so I'll encode using a 56k template". Wrong. If you look at the specs of that "56k FM Radio Stereo" template you'll see that it's actually using 36kbps (kilobits per second), not 56 as you might expect. Why is that? You need to leave headroom for the protocol overhead passing between your box and the ISP - and a "56k" modem rarely connects at the full 56kbps anyway. 36kbps is playing it a little too cautious IMHO tho. You can get away with 40-42kbps as a 56k stream without any problem, maximizing the amount of data available in the packet stream for your audio. There's another advantage to upping the available bandwidth as well - we'll be hitting that in a second. Here's a quick reference thingy for connection types and my recommended bitrates:

28k Dial-Up: 22kbps
56k Dial-Up: 40 - 42kbps
Cable Modem: 100kbps
DSL: 300kbps+
LAN: 300kbps+
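And the same table as a Python dict, if you're scripting your encodes:

```python
# The reference table above, as data. Values are in kbps.
RECOMMENDED_BITRATE_KBPS = {
    "28k dial-up": 22,
    "56k dial-up": (40, 42),   # a range; see the headroom discussion above
    "cable modem": 100,
    "dsl": 300,                # "and up"
    "lan": 300,                # "and up"
}
```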

You also might have noticed that there's a number in kHz listed after the bandwidth in the template. What that number represents depends on which format you're using. For mp3 and wma that number is the sampling rate. For RealMedia that number is the frequency response. What's the difference between the two?? In practice - none - the sampling rate is just 2x the frequency response (more on that below), so they're describing the same thing two different ways.

WARNING: Esoteric Tech ahead - skip to ****** if you want!!

Sampling rate: according to the Nyquist Theorem, the sampling rate must be 2x the highest frequency desired. If you want to capture frequencies up to 11kHz, use 22kHz as yer sampling rate (2x11=22). That's where the 44.1kHz CD sampling rate comes from: the range of human hearing is 20Hz-20kHz, so to capture the full range we can hear we need to sample at a minimum of 40kHz. We use 44.1kHz to give us some extra headroom - and while we can't hear those 22kHz fundamental frequencies, we can hear their effect on content at 18kHz+.

What happens when your sampling rate is lower than needed for the desired sound? Say you're recording a 16kHz sine wave (for simplicity's sake) but you're using a 22kHz sampling rate. As we've seen, the frequency response of that sampling rate is 11kHz - how do we fit a 16kHz tone into only 11kHz??? Simple answer: we don't. The sound is changed, somewhat drastically in this case. What happens is this - the captured frequency climbs to 11kHz, and since that's the highest it can go, the leftover 5kHz folds back down from the top. The resulting tone you'd hear on playback would be around 6kHz (22 - 16 = 6), not 11kHz or 16kHz. This effect is known as aliasing - it's the "washy" or "chimey" sound you hear on compressed files. To check this out for yourself in Forge follow these steps:
1. Use Tools/Simple Synthesis to create a 16kHz sine wave at 44.1kHz sampling rate.
2. Go to Process/Resample and select 22kHz as the new sampling rate. Check the box that says "Set the sample rate only (no resample)" and apply.
3. Normalize the file using peak normalization at -0.1dB (done only to make hearing it easier).
4. Select Tools/Spectral Analysis and look at the file - you should see a single line around 7.6kHz.
Why not 6kHz, you say? I'm simplifying the math - I don't work in 32-bit like yer computer does ;) Let me know if you try this and get different results.
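If you'd rather check the folding math in code instead of Forge, here's a minimal NumPy sketch (assuming you have NumPy installed - this is the textbook alias calculation, nothing Forge-specific):

```python
# Sample a 16kHz sine at 22kHz and find where the tone actually lands.
import numpy as np

fs = 22_000                       # sampling rate (Hz)
f = 16_000                        # tone we're trying to capture (Hz)

t = np.arange(fs) / fs            # one second of sample times
x = np.sin(2 * np.pi * f * t)     # the 16kHz sine, sampled too slowly

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
print(freqs[spectrum.argmax()])   # ~6000.0 - folded back down past the 11kHz Nyquist

# Same answer from the folding rule: alias = fs - f when fs/2 < f < fs.
print(fs - f)                     # 6000
```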

/End Esoteric Tech Stuff *************************************************


Okay - there was a point to all of that. The biggest problem with compressed audio is aliasing - and now that we know why it occurs, we can try to fight it. The biggest tool in your arsenal is EQ. If you're going to be creating wma with a 32kHz sampling rate, that means the highest frequency you can capture is 16kHz before aliasing starts to occur. Fire up your favorite EQ plugin and set a low-pass filter (or a high-shelf cut doing the same job) at 16kHz - removing everything above that. Voila - much less aliasing!!! If you can get a brickwall filter (looks like the profile of a cliff) you're really smoking - but those are fairly rare - so try to get as close to that as you can. This is why we need to know the target sampling rate/frequency response of our encoded file. If you're going for a 22kHz sampling rate, set the filter at 11kHz, etc.....
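If you're doing this in code rather than a plugin, here's a minimal sketch using SciPy (the function name is mine; a 10th-order Butterworth isn't a true brickwall, but it's steep enough to show the idea):

```python
# Low-pass a track at the Nyquist frequency of the encoder's sampling rate.
from scipy.signal import butter, sosfiltfilt

def prefilter_for_encode(audio, src_rate, target_rate, order=10):
    """Remove everything above target_rate/2 before handing off to the encoder."""
    cutoff = target_rate / 2                 # e.g. 16kHz for a 32kHz wma encode
    sos = butter(order, cutoff, btype="low", fs=src_rate, output="sos")
    return sosfiltfilt(sos, audio)           # zero-phase, so transients stay put

# e.g. filtered = prefilter_for_encode(wav_data, 44_100, 32_000)
```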

Also, all compression codecs use "perceptual encoding" to determine what data gets thrown out to make the file smaller. What this means is that if you have a kick drum and a hi-hat being played at the same time and at equal volumes, the hi-hat's going to be tossed out since our ears find the kick drum more prominent (I refer you to the Fletcher-Munson curves - equal-loudness contours describing how sensitive our ears are to different frequencies at a given volume). EQ can also help keep this process from messing with that cool hi-hat pattern you made. Adjust the EQ to actually decrease the bass - it'll sound funny when you listen to the wav, and not very musical - but it will keep the encoder from removing or reducing the things you want prominent. What you're shooting for is not "musical" sound but balanced sound. When doing this, run some test encodes until you find the EQ settings that make the compressed file sound as close to how it should as you can. Always trust what your ears tell you over math!!
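One way to do that bass trim in code, if you're curious: a low-shelf cut built from the standard RBJ audio EQ cookbook biquad (the function name and the 150Hz/-3dB defaults are my own picks for illustration - dial it in with your ears):

```python
# Low-shelf cut (RBJ audio EQ cookbook biquad) to tame the bass before encoding.
import numpy as np
from scipy.signal import lfilter

def low_shelf(audio, fs, corner_hz=150.0, gain_db=-3.0, q=0.7071):
    """Boost or cut (negative dB = cut) everything below corner_hz."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * corner_hz / fs
    cosw, alpha = np.cos(w0), np.sin(w0) / (2 * q)
    k = 2 * np.sqrt(A) * alpha
    b = [A * ((A + 1) - (A - 1) * cosw + k),
         2 * A * ((A - 1) - (A + 1) * cosw),
         A * ((A + 1) - (A - 1) * cosw - k)]
    a = [(A + 1) + (A - 1) * cosw + k,
         -2 * ((A - 1) + (A + 1) * cosw),
         (A + 1) + (A - 1) * cosw - k]
    return lfilter(b, a, audio)              # lfilter normalizes by a[0] for us
```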

Okay - remember way back when I said there was going to be an advantage to upping the bandwidth beyond the maker's recommendations?? Here's why: Microsoft recommends 36kbps for a 56k dialup. At that bandwidth tho there's only room for a 22kHz sampling rate, which means a frequency response of 11kHz - pretty dull and lifeless sounding. At 40kbps tho, a new sampling rate becomes available - 32kHz - for only a slight increase in bandwidth cost. A 16kHz file will sound much more lifelike than an 11kHz file.

So far we've seen how format choice, sampling rate, and EQ affect the quality of the compressed audio file - we've got one more trick left to cover: compression. Normally the rule of thumb for compression while tracking, mixing, and mastering is "if you can hear it, you're using too much". That goes out the window when it comes to encoding a compressed audio file. Your goal should be to make it as loud as possible, to maximize the number of available bits describing the file. I recommend a dynamic range of no more than 6dB - that means no more than 6dB of difference between the "meat" or RMS value (RMS = root mean square = average = fancy audio engineer talk we use to sound smart; I've got about 6 different terms I can use to mean volume) and the highest peak in the file. Once again, let your ears be the guide and experiment! Compression helps combat the effects of perceptual encoding by raising the volume of the EQ'ed file (oh yeah, EQ first - then compress), which means the Fletcher-Munson curves will act differently, ergo there won't be as many frequencies removed, ergo the tune will sound more balanced - not bass-heavy, as many low-bandwidth files are.
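If you want to sanity-check that 6dB figure on your own mix, here's a quick way to measure the peak-to-RMS gap (assuming NumPy and your audio as a float array):

```python
# Measure the gap between the highest peak and the RMS "meat" of the file, in dB.
import numpy as np

def dynamic_range_db(audio):
    peak = np.max(np.abs(audio))
    rms = np.sqrt(np.mean(np.square(audio)))
    return 20 * np.log10(peak / rms)

# Aim for roughly 6dB or less after compression; a pure sine measures ~3dB.
```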
The final step is to normalize the file (once again, maximum volume is good when encoding) and then actually encode it. Listen to it 4 hours after you create it, to give your ears time to rest and to remove any preconceptions you might have about what you're actually hearing.

There we have it in a nutshell - how to create good-sounding compressed files. The things I cannot stress enough are: always trust your ears, always check how your work sounds while you do it - not just after you think you're done - and always trust your ears.

Did this help you? Am I full of feces? Do you want more info on anything discussed? Do you need clarification on anything? Why is the sky blue? Why do drive-up ATMs have Braille? Is invisible ink erasable? Would any of you buy a prosthetic appendix??

I think that about covers it for now...let the flames begin!!!!!!

Peace and Groove,
The Thump