5.1.6 Digital Audio File Types

[aside]In our discussion of file types, we’ll use capital letters like AIFF and WAV to refer to different formats. Generally there is a corresponding suffix, called a file extension, used on file names – e.g., .aiff or .wav. However, in some cases, more than one suffix can refer to the same basic file type. For example, .aiff, .aif, and .aifc are all variants of the AIFF format.[/aside]

You saw in the previous section that the digital audio stream moves through various pieces of software and hardware during a recording session, but eventually you’re going to want to save the stream as a file in permanent storage. At this point you have to decide the format in which to save the file. Your choice depends on how and where you’re going to use the recording.

Audio file formats differ along a number of axes. They can be free or proprietary, platform-restricted or cross-platform, compressed or uncompressed, container files or simple audio files, and copy-protected or unprotected. (Copy protection is more commonly referred to as digital rights management or DRM.)

Proprietary file formats are controlled by a company or an organization. The particulars of a proprietary format and how the format is produced are often not made public, and its use may be subject to patents. Some proprietary file formats are associated with commercial software for audio processing. Such files can be opened and used only in the software with which they’re associated. Some examples are CWP files for Cakewalk Sonar, SES for Adobe Audition multitrack sessions, and PTF for Pro Tools. AUP projects for Audacity are similarly tied to one program, although Audacity itself is open source. These are project file formats that include meta-information about the overall organization of an audio project. Other proprietary formats – e.g., MP3 – may have patent restrictions on their use, but they have openly documented standards and can be licensed for use on a variety of platforms. As an alternative, there exist some free, open source audio file formats, including OGG and FLAC.

Platform-restricted files can be used only under certain operating systems. For example, WMA files are associated with Windows, AIFF files with Apple operating systems, and AU files with Unix and Linux. The MP3 format is cross-platform. AAC is a cross-platform format that has become widely popular from its use on phones, tablets, digital radio, and video game consoles.

[aside]Pulse code modulation was introduced by British scientist A. Reeves in the 1930s. Reeves patented PCM as a way of transmitting messages in “amplitude-dichotomized, time-quantized” form – what we now call “digital.”[/aside]

You can’t tell from the file extension whether or not a file is compressed, and if it is compressed, you can’t necessarily tell what compression algorithm (called a codec) was used. There are both compressed and uncompressed versions of WAV, AIFF, and AU files. When you save an audio file, you can choose which type you want.

The basic format for uncompressed audio data is called PCM (pulse code modulation). The term pulse code modulation is derived from the way in which raw audio data is generated and communicated. That is, it is generated by the process of sampling and quantization described in Section 5.1 and communicated as binary data by electronic pulses representing 0s and 1s. WAV, AIFF, AU, RAW, and PCM files can store uncompressed audio data. RAW files contain only the audio data, without even a header on the file.
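As a rough illustration of the difference between headerless PCM data and a file with a header, the following Python sketch samples a sine wave, quantizes it to 16-bit integers, and then wraps the same bytes in a WAV file using the standard library's `wave` module. The function names, the 440 Hz tone, and the 10 ms duration are assumptions chosen for the example, not part of any standard.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (the CD rate)
NUM_SAMPLES = 441    # 10 ms of audio, kept short for illustration


def sine_pcm16(freq_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Sample a sine wave and quantize each sample to a signed 16-bit integer."""
    frames = bytearray()
    for n in range(num_samples):
        value = math.sin(2 * math.pi * freq_hz * n / sample_rate)
        frames += struct.pack("<h", round(value * 32767))  # little-endian int16
    return bytes(frames)


def wrap_in_wav(pcm_bytes, sample_rate=SAMPLE_RATE):
    """Wrap mono 16-bit PCM bytes in a WAV file held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # two bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


raw_bytes = sine_pcm16(440, NUM_SAMPLES)  # what a RAW file would contain
wav_bytes = wrap_in_wav(raw_bytes)        # same samples plus a header

print("raw size:", len(raw_bytes))
print("header size:", len(wav_bytes) - len(raw_bytes))
```

The audio bytes are identical in both cases. The WAV version simply adds a header in front, which is what lets a player discover the sampling rate, bit depth, and channel count that a RAW file leaves unstated.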

One basic reason that WAV and AIFF files come in compressed and uncompressed versions is that, in reality, these are container file formats rather than simple audio files. A container file wraps a simple audio file in a meta-format which specifies blocks of information that should be included in the header along with the size and position of chunks of data following the header. The container file may allow options for the format of the actual audio data, including whether or not it is compressed. If the audio is compressed, the system that tries to open and play the container file must have the appropriate codec in order to decompress and play it. AIFF files are container files based on a standardized format called IFF. WAV files are based on the RIFF format. MP3 is a container format that is part of the more general MPEG standard for audio and video. WMA is a Windows container format. OGG is an open source, cross-platform alternative.
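To make the idea of a header followed by sized chunks more concrete, here is a minimal Python sketch that builds a tiny WAV file in memory and then walks its RIFF structure by hand. The helper names (`make_wav_bytes`, `list_riff_chunks`, `read_fmt_chunk`) are hypothetical, and the sketch assumes a simple file whose `fmt ` chunk starts with the usual 16 bytes of fields.

```python
import io
import struct
import wave


def make_wav_bytes(num_frames=100):
    """Build a small stereo 16-bit WAV file in memory (silence only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(2)
        wav_file.setsampwidth(2)
        wav_file.setframerate(44100)
        wav_file.writeframes(b"\x00" * (num_frames * 4))  # 4 bytes per stereo frame
    return buffer.getvalue()


def list_riff_chunks(data):
    """Return the RIFF form type and a list of (chunk id, body offset, body size)."""
    riff_id, _riff_size, form_type = struct.unpack("<4sI4s", data[:12])
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF container")
    chunks = []
    offset = 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack("<4sI", data[offset:offset + 8])
        chunks.append((chunk_id.decode("ascii"), offset + 8, chunk_size))
        # Chunk bodies are padded to an even number of bytes.
        offset += 8 + chunk_size + (chunk_size % 2)
    return form_type.decode("ascii"), chunks


def read_fmt_chunk(data, body_offset):
    """Decode the first 16 bytes of the 'fmt ' chunk."""
    tag, channels, rate, _byte_rate, _block_align, bits = struct.unpack(
        "<HHIIHH", data[body_offset:body_offset + 16]
    )
    return {"format_tag": tag, "channels": channels,
            "sample_rate": rate, "bits_per_sample": bits}


wav_bytes = make_wav_bytes()
form_type, chunks = list_riff_chunks(wav_bytes)
fmt_offset = next(off for cid, off, _size in chunks if cid == "fmt ")
fmt_info = read_fmt_chunk(wav_bytes, fmt_offset)

print(form_type, [(cid, size) for cid, _off, size in chunks])
print(fmt_info)  # a format_tag of 1 indicates uncompressed PCM
```

The format tag in the `fmt ` chunk is how a player learns which codec, if any, it needs in order to interpret the `data` chunk. The file extension alone does not carry that information.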

In addition to audio data, container files can include information like the names of songs, artists, song genres, album names, copyrights, and other annotations. The metadata may itself be in a standardized format. For example, MP3 files use the ID3 format for metadata.
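As one hedged example of standardized metadata, the sketch below reads the oldest and simplest ID3 variant, ID3v1, which is a fixed 128-byte block at the end of a file beginning with the characters `TAG`. Most current MP3 files carry the more elaborate ID3v2 tag at the start of the file instead, which this sketch does not handle. The helper names, the placeholder audio bytes, and the song details are invented for illustration.

```python
import struct

# ID3v1 layout: marker, title, artist, album, year, comment, genre index.
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"  # 128 bytes in total


def build_id3v1(title, artist, album, year, comment="", genre=17):
    """Pack text fields into a 128-byte ID3v1 block (short fields are zero-padded)."""
    def enc(text):
        return text.encode("latin-1")
    return struct.pack(ID3V1_LAYOUT, b"TAG", enc(title), enc(artist),
                       enc(album), enc(year), enc(comment), genre)


def read_id3v1(file_bytes):
    """Return the ID3v1 fields found in the last 128 bytes, or None if absent."""
    tag = file_bytes[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None
    _marker, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_LAYOUT, tag
    )

    def text(raw):
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre_index": genre}


# Placeholder bytes standing in for compressed audio; not real MP3 frames.
fake_audio = bytes(256)
file_bytes = fake_audio + build_id3v1("Example Song", "Example Artist",
                                      "Example Album", "2012")

info = read_id3v1(file_bytes)
print(info)
```

Because the tag sits outside the audio data, a program can read or rewrite the song title without decoding or re-encoding the audio itself.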

Compression is inherent in some container file types and optional in others. MP3, AAC, and WMA files are always compressed. Compression is important if one of your main considerations is the ability to store lots of files. Consider the size of a CD-quality audio file, which consists of two channels of 44,100 samples per second with two bytes per sample. For a five-minute recording, this gives

$$!2\ast \frac{44100\: samples}{sec}\ast \frac{2\: bytes}{sample}\ast 60\frac{sec}{min}\ast 5\: min=52920000\: bytes\approx 50.5\: MB$$

[aside]Why are 52,920,000 bytes equal to about 50.5 MB? You might expect a megabyte to be 1,000,000 bytes, but in the realm of computers, things are generally done in powers of 2.   Thus, we use the following definitions:

kilo = 2^10 = 1024
mega = 2^20 = 1,048,576

You should become familiar with the following abbreviations:
[table th=”0″ width=”100%”]
kilobits,kb,2^10 bits
kilobytes,kB,2^10 bytes
megabits,Mb,2^20 bits
megabytes,MB,2^20 bytes[/table]

Based on these definitions, 52,920,000 bytes is converted to megabytes by dividing by 1,048,576, which gives approximately 50.5 MB.

Unfortunately, usage is not entirely consistent.  You’ll sometimes see “kilo” assumed to be 1000 and “mega” assumed to be 1,000,000, e.g., in the specification of the storage capacity of a CD.[/aside]

A five-minute piece of music, uncompressed, takes up over 50 MB of storage. MP3 and AAC compression can reduce this to less than a tenth of the original size. Thus, MP3 files are popular for portable music players, since compression makes it possible to store many more songs.
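The size arithmetic can be checked with a few lines of Python. This is a minimal sketch, and the function name `pcm_size_bytes` is an assumption made for the example.

```python
def pcm_size_bytes(channels, sample_rate, bytes_per_sample, seconds):
    """Size of uncompressed PCM audio data, ignoring any file header."""
    return channels * sample_rate * bytes_per_sample * seconds


# CD quality: 2 channels, 44,100 samples/sec, 2 bytes per sample, 5 minutes.
size = pcm_size_bytes(channels=2, sample_rate=44100, bytes_per_sample=2,
                      seconds=5 * 60)

binary_mb = size / 2**20   # "mega" taken as 2^20 = 1,048,576
decimal_mb = size / 10**6  # "mega" taken as 1,000,000
tenth = size / 10          # rough upper bound after 10:1 compression

print(size)                 # 52920000
print(round(binary_mb, 1))  # 50.5
print(decimal_mb)           # 52.92
```

The two megabyte figures differ by about 5 percent, which is why it helps to know which definition of "mega" a given specification is using.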

Compression algorithms are of two basic types: lossless or lossy. In the case of a lossless compression algorithm, no audio information is lost from compression to decompression. The audio information is compressed, making the file smaller for storage and transmission. When it is played or processed, it is decompressed, and the exact data that was originally recorded is restored. In the case of a lossy compression algorithm, it’s impossible to get back exactly the original data upon decompression. Examples of lossy compression formats are MP3, AAC, Ogg Vorbis, and the μ-law and A-law compression used in AU files. Examples of lossless compression algorithms include FLAC (Free Lossless Audio Codec), Apple Lossless, MPEG-4 ALS (Audio Lossless Coding), Monkey’s Audio, and TTA (True Audio). More details of audio codecs are given in Section 5.2.1.
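The distinction can be demonstrated with a short Python sketch. Two stand-ins are assumed here. The lossless case uses the general-purpose `zlib` compressor rather than an audio codec such as FLAC. The lossy case uses the continuous μ-law companding curve with 8-bit codes rather than the exact segmented encoding found in telephony standards. The function names are hypothetical.

```python
import math
import struct
import zlib

MU = 255  # companding constant used by mu-law


def mu_law_encode(sample):
    """Map a signed 16-bit sample to a signed 8-bit code using the mu-law curve."""
    x = sample / 32768
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)


def mu_law_decode(code):
    """Expand an 8-bit code back to an approximate 16-bit sample."""
    y = code / 127
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return max(-32768, min(32767, round(x * 32768)))


# One thousand 16-bit samples of a 440 Hz sine wave.
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 44100))
           for n in range(1000)]
pcm_bytes = struct.pack("<1000h", *samples)

# Lossless: decompression restores every byte exactly.
restored = zlib.decompress(zlib.compress(pcm_bytes))
print("lossless round trip exact:", restored == pcm_bytes)

# Lossy: each sample is stored in one byte instead of two, and the
# decoded values only approximate the originals.
codes = [mu_law_encode(s) for s in samples]
decoded = [mu_law_decode(c) for c in codes]
max_error = max(abs(a - b) for a, b in zip(samples, decoded))
print("lossy round trip exact:", decoded == samples)
print("largest sample error:", max_error)
```

The μ-law version halves the storage per sample but cannot reproduce the original values, while the lossless round trip returns the identical byte sequence.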

With the introduction of portable music players, copy-protected audio files became more prevalent. Apple introduced iTunes in 2001 and opened its online music store in 2003, allowing users to purchase and download music. The audio files, encoded in a proprietary version of the AAC format and using the .m4p file extension, were protected with Apple’s FairPlay DRM system. DRM enforces limits on where the file can be played and whether it can be shared or copied. In 2009, Apple lifted restrictions on music sold from its iTunes store, offering an unprotected .m4a file as an alternative to .m4p. Copy protection is generally embedded within container file formats like MP3. WMA (Windows Media Audio) files are another example, based on the Advanced Systems Format (ASF) and providing DRM support.

Common audio file types are summarized in Table 5.1.

[table caption=”Table 5.1 Common audio file types” width=”90%”]

File Type,Platform,File Extensions,Compression,Container,Proprietary,DRM
PCM,cross,.pcm,no,no,no,no
RAW,cross,.raw,no,no,no,no
WAV,cross,.wav,Optional (lossy),”yes, RIFF format”,no,no
AIFF,Mac,”.aif, .aiff”,no,”yes, IFF format”,no,no
AIFF-C,Mac,.aifc,”yes, with various codecs (lossy)”,”yes, IFF format”,no,
CAF,Mac,.caf,yes,yes,no,no
AU,Unix/Linux,”.au, .snd”,optional μ-law (lossy),yes,no,no
MP3,cross,.mp3,MPEG (lossy),yes,license required for~~distribution or sale of~~codec but not for use,optional
AAC,cross,”.m4a, .m4b, .m4p,~~.m4v, .m4r, .3gp,~~.mp4, .aac”,AAC (lossy),more of a compression~~standard than a container;~~ADIF is container,license required for~~distribution or sale of~~codec but not for use,
WMA,Windows,.wma,WMA (lossy),yes,yes,optional
OGG Vorbis,cross,”.ogg, .oga”,Vorbis (lossy),yes,”no, open source”,optional
FLAC,cross,.flac,FLAC (lossless),yes,”no, open source”,optional

[/table]

AIFF and WAV have been the most commonly used file types in recent years. CAF files are an extension of AIFF files without AIFF’s 4 GB size limit. This additional file size was needed for all the looping and other metadata used in GarageBand and Logic.

[wpfilebase tag=file id=21 tpl=supplement /]

All along the way as you work with digital audio, you’ll have to make choices about the format in which you save your files. A general strategy is this:

  • When you’re processing the audio in a software environment such as Audition, Logic, or Pro Tools, save your work in the software’s proprietary format until you’re finished working on it. These formats retain meta-information about non-destructive processes – filters, EQ, etc. – applied to the audio as it plays. Non-destructive processes do not change the original audio samples. Thus, they are easily undone, and you can always go back and edit the audio data in other ways for other purposes if needed.
  • The originally recorded audio is the best information you have, so it’s always good to keep an uncompressed copy of this. Generally, you should keep as much data as possible as you edit an audio file, retaining the highest bit depth and sampling rate appropriate for the work.
  • At the end of processing, save the file in the format suitable for your platform of distribution (e.g., CD, DVD, web, or live performance). This may be compressed or uncompressed, depending on your purposes.