If you’d like to dig into the MIDI specification more deeply – perhaps writing a program that can either generate a MIDI file in the correct format or interpret one and turn it into digital audio – you need to know more about how the files are formatted.
Standard MIDI Files (SMF), with the .mid or .smf suffix, encode MIDI data in a prescribed format for the header, timing information, and events. Format 0 files hold a single track, Format 1 files hold multiple tracks that are played together, and Format 2 files hold multiple tracks where each track can represent a separate song performance. (Format 2 is not widely used.) SMF files are platform-independent, interpretable on PCs, Macs, and Linux machines.
Blocks of information in SMF files are called chunks. The first chunk is the header chunk, followed by one or more track chunks. Every chunk begins with four bytes (a four-character ASCII string) identifying its type: MThd for the header chunk and MTrk for a track chunk. The header chunk then has four bytes giving the length of the remaining fields of the chunk (which is always 6 for the header), two bytes giving the format (0, 1, or 2), two bytes giving the number of tracks, and two bytes describing how timing is handled.
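The header layout just described (4-byte type, 4-byte length, then three 2-byte fields) can be unpacked in a few lines. The following is a minimal sketch in Python, not a complete reader; the function name parse_header_chunk and the dictionary keys are hypothetical names chosen for illustration, and the sample bytes are made up for the example.

```python
import struct

def parse_header_chunk(data: bytes) -> dict:
    """Parse the 14-byte SMF header chunk at the start of `data`.

    Hypothetical helper for illustration; the field names are our own.
    """
    # ">" = big-endian with no padding; 4s = 4-character type,
    # I = 4-byte length, H = each of the three 2-byte fields.
    chunk_type, length, fmt, num_tracks, division = struct.unpack(
        ">4sIHHH", data[:14]
    )
    if chunk_type != b"MThd":
        raise ValueError("missing MThd header chunk")
    return {
        "length": length,          # expected to be 6
        "format": fmt,             # 0, 1, or 2
        "num_tracks": num_tracks,
        "division": division,      # raw timing field, not interpreted here
    }

# Made-up example: Format 1, two tracks, timing field 0x01E0 (480).
sample_header = b"MThd" + bytes([0, 0, 0, 6,  0, 1,  0, 2,  0x01, 0xE0])
header = parse_header_chunk(sample_header)
print(header["format"], header["num_tracks"], header["division"])
```

The sketch leaves the two timing bytes uninterpreted, since their meaning depends on which timing scheme the file uses.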
Data are stored in track chunks. A track chunk also begins with four bytes telling the type of chunk, followed by four bytes telling how much data is in the chunk. The data then follow. The bytes that constitute the data are track events: regular MIDI events like Note On; meta-events like changes of tempo, key, or time signature; or system exclusive (sys-ex) events. Each event begins with a timestamp telling when it is to happen, followed by the event itself. Regular MIDI events have the format described in Section 1. The structure of an SMF file is illustrated in Figure 6.47.
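Because every chunk starts with the same eight bytes (type, then length), a program can step through a file chunk by chunk without understanding the events inside. Here is a rough Python sketch under that assumption; iter_chunks is a hypothetical helper name, and the sample file is a hand-built example containing a header and one track whose only event is the End of Track meta-event (FF 2F 00) at delta time 0.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_type, payload) pairs from the bytes of an SMF file.

    Hypothetical helper for illustration; it does not decode the events.
    """
    offset = 0
    while offset + 8 <= len(data):
        # 4-byte type string followed by a big-endian 4-byte length.
        chunk_type, length = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        end = start + length
        if end > len(data):
            raise ValueError("chunk length runs past end of file")
        yield chunk_type, data[start:end]
        offset = end

# Hand-built sample: header (Format 0, 1 track, timing field 96)
# followed by one track chunk holding a single End of Track event.
sample_file = (
    b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)
    + b"MTrk" + struct.pack(">I", 4) + bytes([0x00, 0xFF, 0x2F, 0x00])
)
chunks = list(iter_chunks(sample_file))
for chunk_type, payload in chunks:
    print(chunk_type.decode("ascii"), len(payload))
```

A fuller reader would go on to decode each track payload into timestamped events.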
MIDI event timestamps give the change in time between the previous event and the current one, measured in the units defined in the header: either ticks per beat, or a frame rate in frames/s together with ticks per frame. The timestamp itself is stored in a variable-length field. To accomplish this, the first bit of each byte in the timestamp indicates whether another byte follows. If the bit is a 0, then this is the last byte of the timestamp. If the bit is a 1, then the next byte is also to be considered part of the timestamp value. The remaining seven bits of each byte carry the value itself, with the most significant bits first. This ultimately saves space. Most timestamps are small enough to fit in one byte, so it would be wasteful to dedicate four bytes to every timestamp just to take care of the few cases where there is a long pause between one MIDI event and the next one.
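The variable-length scheme is easier to see in code than in words. The sketch below, in Python, decodes and encodes such a field by treating the top bit of each byte as a "more bytes follow" flag and the low seven bits as data, most significant group first. The function names read_vlq and write_vlq are hypothetical, and the sketch does not enforce any upper limit on the number of bytes.

```python
def read_vlq(data: bytes, offset: int = 0):
    """Decode a variable-length value starting at `offset`.

    Returns (value, offset_of_next_byte). Hypothetical helper name.
    """
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        # Append the low seven bits to the value built so far.
        value = (value << 7) | (byte & 0x7F)
        # A clear top bit marks the last byte of the field.
        if (byte & 0x80) == 0:
            return value, offset

def write_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length field."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]            # last byte: top bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(groups))

print(read_vlq(bytes([0x40])))        # one byte is enough for 64
print(read_vlq(bytes([0x81, 0x00])))  # two bytes encode 128
print(write_vlq(128).hex())
```

A delta time of 64 therefore costs one byte, while 128 needs two (81 00 in hexadecimal), which is the space saving the paragraph above describes.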
An important consideration in writing or reading SMF files is the issue of whether bytes are stored in little-endian or big-endian format. In big-endian format, the most significant byte is stored first in a sequence of bytes that make up one value. In little-endian, the least significant byte is stored first. SMF files store multiple-byte values in big-endian format. If you’re writing an SMF-interpreting program, you need to know the endianness of the processor on which you’ll be running the program. A PC/Windows combination is generally little-endian, so a program on that platform that copies raw bytes into a native integer has to swap the byte order when determining the value of a fixed-width multiple-byte field such as a chunk length. (Variable-length timestamps are assembled one byte at a time, so they need no swapping.)
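To make the byte-order issue concrete, the short Python sketch below interprets the same four bytes both ways. The bytes are a made-up example of a four-byte field holding the value 300; sys.byteorder reports the native order of the machine running the script.

```python
import sys

# Made-up four-byte field as it would appear in an SMF file.
raw = bytes([0x00, 0x00, 0x01, 0x2C])

# Correct reading: most significant byte first.
big = int.from_bytes(raw, byteorder="big")

# What a little-endian machine would get without swapping the bytes.
little = int.from_bytes(raw, byteorder="little")

print("native byte order:", sys.byteorder)
print("big-endian reading:", big)
print("little-endian reading:", hex(little))
```

Naming the byte order explicitly, as here, gives the same result on any platform, which avoids having to test the machine's endianness at all.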
More details about SMF files can be found at www.midi.org. To see the full MIDI specification, you have to order and pay for the documentation. Messick (1998) is a good source to help you write a C++ program that reads and interprets SMF files.