Audio Compact Disk - Writing and Reading the data
CD/ROM -- An extension of the CD audio standard
Other disk formats of interest
Two lectures of material
The conventional audio compact disk is a high-density medium for storing digitally sampled audio. A CD audio disk holds approximately 74 minutes of stereo music recorded with 16-bit resolution -- and incorporates a number of error reduction, detection and correction techniques.
I. The disk itself
A. Size and overall construction
The CD disk is a 120 mm diameter disk of polycarbonate. The center contains a hole 15 mm in diameter. The innermost part of the disk does not hold data. The active data area starts at the 46 mm diameter location and ends at the 117 mm diameter location. The 46-50 mm range is the lead-in area and the 116-117 mm range is the lead-out area[1]. Disks are written from the center to the outside (this increases manufacturing yield, and also allows for changes in disk size).

A CD disk contains a long string of pits written in a single continuous spiral on the disk. The edges of the pits correspond to binary "1"s.

Each pit is approximately 0.5 microns wide and 0.83 microns to 3.56 microns long. (Remember that the wavelength of green light is approximately 0.5 micron) Each track is separated from the next track by 1.6 microns.
The area between the pits is termed "land". So, a highly magnified section of track might look something like:

Pits are formed in the polycarbonate disk by an injection molding process. As such, they represent some of the smallest mechanically fabricated objects made by humans. The width of a CD pit is approximately the wavelength of green light. The tracks are separated by approximately three times the wavelength of green light. Diffraction from these features (so very close to the wavelength of light) is what gives CD disks their beautiful colors.

A thin layer (50-100 nm) of metal (aluminum, gold or silver) covers the pits. An additional thin layer (10-30 microns) of polymer covers the metal. Finally, a label is silk-screened on the top. Notice that the pits are far closer to the silk-screened side of the disk (about 20 microns) than they are to the read side of the disk (about 1.2 mm, the thickness of the polycarbonate). Thus, it is easier to permanently damage a disk by scratching the top -- than the bottom!
B. Making the disk
The fabrication of a CD disk is a fascinating process. This process is discussed in some detail in The Compact Disk Handbook, Chapter 7 and only the high points are summarized here[2].
The process begins by making the "glass master". To do this, a glass plate about 300 mm in diameter is lapped flat and polished. The plate is coated with photoresist.

A mastering tape is made containing the information to be written on the disk. A laser then writes the pattern from the master tape into the photoresist.

The photoresist is developed. A layer of metal (typically silver over a nickel flash) is evaporated over the photoresist. The master is then checked for accuracy by playing the disk.

The master is then subject to an electroforming process. In this electrochemical process, additional metal is deposited on the silver layer.

When the metal is thick enough (typically a few mm's) the metal layer is separated from the glass master. This results in a metal negative impression of the disk -- called a father.

The electroplating process is then repeated on the father. This typically generates 3-6 positive metal impressions from the father before the quality of the father degrades unacceptably. These impressions are called "mothers".

The electroplating process is repeated again on the mothers. Each mother typically makes 3-6 negative metal impressions called sons or stampers. The sons are suitable as molds for injection molding.

Polycarbonate is used to injection mold the CD disks.

Once the disks are molded, a metal layer is used to coat the disks. Aluminum, gold, copper and silver are all reflective enough to be optically acceptable. Gold is typically too expensive and copper has a peculiar appearance. Thus, aluminum and silver are the most commonly used metals.

Following metal deposition, a thin plastic layer (1-30 microns) is spin-coated on over the metal. This can be a nitrocellulose layer suitable for air drying, or an acrylic plastic that is cured in UV.

Finally, the logo and other information is silk screened on the top.
C. Reading the pits
The CD disk is actually read from the bottom. Thus, from the viewpoint of the laser beam reading the disk, the "pit" in the CD is actually a "bump".

The polycarbonate itself is part of the optical system for reading the pits. The index of refraction of air is 1.0 while the index of refraction of the polycarbonate is 1.55. Laser light incident on the polycarbonate surface is refracted toward the normal as it enters the disk. The beam is still about 800 microns wide where it enters the polycarbonate and converges to a spot of about 1.7 microns at the metal surface. This is a major win, as it minimizes the effects of dust and scratches on the surface: a speck of dust blocks only a small fraction of the wide, out-of-focus beam.

The laser used for the CD player is typically an AlGaAs laser diode with a wavelength in air of 780 nm. (Near infrared -- your vision cuts out at about 720 nm). The wavelength inside the polycarbonate is a factor of n=1.55 smaller -- or about 500 nm.
The pit/bump is carefully fabricated so that it is a quarter of a wavelength (notice a wavelength INSIDE the polycarbonate) high. The idea here is that light striking the land travels 1/4 + 1/4 = 1/2 of a wavelength further than light striking the top of the pit. The light reflected from the land is then delayed by 1/2 a wavelength -- and so is exactly out of phase with the light reflected from the pit. These two waves will interfere destructively -- so effectively no light has been reflected.
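The arithmetic behind the quarter-wave pit can be checked with a few lines of Python. This is only a sketch using the two numbers quoted above (780 nm and n = 1.55); the variable names are illustrative.

```python
# Wavelength inside the polycarbonate and the resulting quarter-wave pit height.
WAVELENGTH_AIR_NM = 780.0   # AlGaAs laser diode wavelength in air, from the text
N_POLYCARBONATE = 1.55      # index of refraction of polycarbonate, from the text

wavelength_in_disk_nm = WAVELENGTH_AIR_NM / N_POLYCARBONATE
pit_height_nm = wavelength_in_disk_nm / 4.0

# Light reflected from the land travels down and back past the pit height,
# so its extra path is twice the pit height (expressed here in wavelengths).
extra_path_wavelengths = 2 * pit_height_nm / wavelength_in_disk_nm

print(f"wavelength in polycarbonate: {wavelength_in_disk_nm:.0f} nm")
print(f"quarter-wave pit height:     {pit_height_nm:.0f} nm")
print(f"extra round-trip path:       {extra_path_wavelengths:.2f} wavelengths")
```

The half-wavelength result is what produces the destructive interference described above.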

The spacing between pits is equally carefully selected. Recall from basic optics that the image of a beam passing through a round aperture will form a characteristic pattern called an Airy disk. The FWHM (full-width half-maximum) center of the Airy disk pattern is a spot about 1.7 um wide and falls neatly on top of the pit track. The nulls in the Airy pattern are carefully situated to fall on the neighboring pit tracks. This minimizes crosstalk from neighboring pits[3].

D. The optical train -- three beam pick-up
The most common optical train in modern CD players is the three beam pick-up, depicted below[4].
The light is emitted by the laser diode and enters a diffraction grating. The grating converts the light into a central peak plus side peaks. The main central peak and two side peaks are important in the tracking mechanism.
The three beams go through a polarizing beam splitter. This only transmits polarizations parallel to the page. The emerging light (now polarized parallel to the page) is then collimated.
The collimated light goes through a 1/4 wave plate. This converts it into circularly polarized light.
The circularly polarized light is then focused down onto the disk. If the light strikes "land" it is reflected back into the objective lens. (If the light strikes the pit, now a bump, it is not reflected.)
The light then passes through the 1/4 wave plate again. Since it is going the reverse direction, it will be polarized perpendicular to the original beam (in other words, the light polarization is now vertical with respect to the paper).
When the vertically polarized light hits the polarizing beam splitter this time, it will be reflected (not transmitted as before). Thus, it will reflect through the focusing lens and then the cylindrical lens and be imaged on the photodetector array. The cylindrical lens is important in the auto-focusing mechanism.
E. Three beam autofocus
If the objective lens is closer to the compact disk than the focal length of the objective lens, then the cylindrical lens creates an elliptical image on the photodetector array.

If the objective lens is further away from the compact disk than the focal length of the objective lens, then the cylindrical lens again creates an elliptical image on the photodetector array. However, this elliptical image is perpendicular to the first image.

Of course, if the disk is right at the focal length of the objective lens, then the cylindrical lens does not affect the image and it is perfectly circular.

So, if the disk is too far away -- then quadrants D and B will get more light than quadrants A and C. Similarly, if the disk is too close -- then quadrants A and C will get more light than D and B. A simple circuit generates an autofocus signal based upon the output of the photodetector[5].
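The quadrant comparison just described can be sketched as a small function. The normalization by total intensity and the sign convention are assumptions made for illustration; the text only says that a simple circuit compares the A and C quadrants against B and D.

```python
def focus_error(a, b, c, d):
    """Normalized focus error from the four quadrant intensities.

    Positive: A and C are brighter (disk too close, per the text).
    Negative: B and D are brighter (disk too far away).
    Zero:     circular spot, disk in focus.
    """
    total = a + b + c + d
    if total == 0:
        return 0.0                      # no light at all: report no error
    return ((a + c) - (b + d)) / total

print(focus_error(1.0, 1.0, 1.0, 1.0))  # in focus
print(focus_error(2.0, 1.0, 2.0, 1.0))  # too close
print(focus_error(1.0, 2.0, 1.0, 2.0))  # too far
```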

The output of this correction signal can be used to drive a simple auto-focus servo. A typical example of such a servo is illustrated below[6].

F. Three beam tracking
When the laser beam goes through the diffraction grating, it is split up into a central bright beam plus a number of side beams. The central beam and one beam on each side are used by the CD for the tracking system.

Consider a segment of the CD containing several tracks.

If the optical head is on track, then the primary beam will be centered on a track (with pits and bumps) and the two secondary beams will be centered on land. The three spots are deliberately offset approximately 20 microns with respect to each other.

Two additional detectors are placed alongside the main quadrant detector in order to pick up these subsidiary beams. If the three beams are on track, then the two subsidiary photodetectors have equal amounts of light and will be quite bright because they are only tracking on land. The central beam will be reduced in brightness because it is tracking on both land and pits.

However, if the optical head is off track, then the center spot gets more light (because there are fewer pits off track) and the side detectors will be misbalanced.
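A minimal sketch of the tracking comparison follows. The detector names E and F, the sign convention and the 5% tolerance are all hypothetical; the text only states that the two side detectors are balanced when the head is on track.

```python
def tracking_error(side_e, side_f):
    """Difference of the two side-beam detector outputs (hypothetical names E and F)."""
    return side_e - side_f

def on_track(side_e, side_f, tolerance=0.05):
    """True when the side detectors are balanced to within the chosen tolerance."""
    scale = max(side_e, side_f, 1e-12)   # avoid dividing the comparison by zero
    return abs(tracking_error(side_e, side_f)) <= tolerance * scale

print(on_track(1.0, 1.0))    # balanced side beams
print(on_track(1.0, 0.7))    # one side beam has drifted onto the pits
```

The sign of `tracking_error` would tell a servo which way to move the head.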


Part I -- A Brief Overview
Data storage in CD format is not a simple thing. Typically, a user pictures the "1s" and "0s" in the memory of the computer as being directly transferred to "pits" and "bumps" on the CD disk. Unfortunately, it is far from that easy.
To begin with, the incoming data is subjected to a series of coding operations. These coding operations add a number of additional parity bits to the data for error detection and correction purposes. The data is also subject to an interleaving process (which means that adjacent data on the disk is not adjacent data from the incoming file).
Additionally, the physical form of the data is changed (EFM coding) to eliminate the possibility of adjacent "1s". (This is done because it is the edges of the pit -- not the pit itself -- that represent 1s in the data stream.)
A. Simple error detection and correction codes
Error detection and correction codes are fundamental to the operation of any digital storage system. There are literally thousands of such codes. These codes typically rely on using additional bits (usually called parity bits) to carry the error detection and correction information.
In a simple binary parity check, a parity bit is a single bit that represents whether the total number of "1s" in a particular data stream is even (0) or odd (1). (This is just the modulo-two sum of the bits.)
For example, assume that you are setting a parity bit over all the digits of the following word.
1101 0000
The total number of "1s" is odd, so the parity bit would be 1. The word might then be written as
1101 0000 1
where the last digit is the parity bit.
Even simple binary parity checks can become quite complex if more than one parity bit is used. For example, you may elect to have two parity bits -- one on the first four bits of the word and one on the last four.
1 1 0 1 0 0 0 0 P1 P2
x x x x         1
        x x x x    0
If enough parity bits are used, then errors can not only be detected -- they can also be corrected. For example, consider what happens if you use four parity bits. The first one is on the first four bits, the second one is on the second four bits, the third one is on bits 1, 2, 5 and 6, and the fourth one is on bits 2, 3, 6 and 7.
1 1 0 1 0 0 0 0 P1 P2 P3 P4
x x x x         1
        x x x x    0
x x     x x           0
  x x     x x            1
Now, assume that there was an error in the final bit.
1 2 3 4 5 6 7 8 P1 P2 P3 P4
1 1 0 1 0 0 0 1 1  0  0  1
x x x x         1
        x x x x    1
x x     x x           0
  x x     x x            1
Recomputing the parity bits from the received data gives P1 = 1, P2 = 1, P3 = 0 and P4 = 1. P1 agrees with the parity bit in the transmitted word, P2 does NOT agree, and P3 and P4 agree. Since P2 is the only parity bit not agreeing with the transmitted word -- and the 8th bit is the only data bit covered by P2 alone -- the error must be in the 8th bit.
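The four-parity-bit example can be reproduced in Python. This is a sketch of the toy code from these notes only (not of the codes actually used on a CD); the function names are illustrative.

```python
# Which data bits (1-based positions) each parity bit covers, as in the text.
COVERAGE = {
    "P1": (1, 2, 3, 4),
    "P2": (5, 6, 7, 8),
    "P3": (1, 2, 5, 6),
    "P4": (2, 3, 6, 7),
}

def parity_bits(data):
    """Return [P1, P2, P3, P4] for a list of eight data bits (modulo-two sums)."""
    return [sum(data[i - 1] for i in cover) % 2 for cover in COVERAGE.values()]

def failing_checks(data, received_parity):
    """Names of the parity bits that disagree with the recomputed values."""
    recomputed = parity_bits(data)
    return [name
            for name, got, want in zip(COVERAGE, received_parity, recomputed)
            if got != want]

sent = [1, 1, 0, 1, 0, 0, 0, 0]
parity = parity_bits(sent)             # [1, 0, 0, 1]
damaged = sent[:7] + [1]               # flip the 8th bit
print(parity, failing_checks(damaged, parity))   # [1, 0, 0, 1] ['P2']
```

A lone P2 failure points at bit 8 because bit 8 is the only data bit that P2 covers and no other parity bit does.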
Unfortunately, the majority of error-detection and correction algorithms used in CD players are not as simple as the binary check codes discussed above. Although an overview of these codes will be presented, in-depth analysis of the codes is beyond the scope of this course. (Interested students should consult more advanced references, such as W. Peterson, Error-Correcting Codes, MIT Press.)
B. Simple interleaving
Interleaving is a very simple and powerful idea. To illustrate interleaving, assume that you have a frame consisting of several characters of information,
U N I V E R S I T Y O F W A S H I N G T O N
Assume that you spit on the disk and destroy several of the characters.
_ _ _ _ _ R S I T Y O F W A S H I N G T O N
The first word is then very hard to reconstruct! However, you can take the original frame and scramble it as,
U N I V E R S I T Y O F W A S H I N G T O N
O N S T H U G R F S I I O T W N N V E I Y A
Then you can damage it,
_ _ _ _ _ U G R F S I I O T W N N V E I Y A
Then you can unscramble it,
U N I V E R _ I _ Y O F W A S _ I _ G T _ N
It is much easier to "interpolate" or "guess" the missing letters. (A bit like the later stages of "hangman"!)
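The same effect can be demonstrated in Python. This sketch uses a simple stride-based shuffle with an assumed depth of 5, which is not the scramble used in the example above, but it shows the same behavior: a burst of adjacent errors on the disk turns into isolated errors after unscrambling.

```python
def interleave_order(n, depth):
    """Read positions 0, depth, 2*depth, ... first, then 1, depth+1, ... and so on."""
    return sorted(range(n), key=lambda i: (i % depth, i))

def interleave(text, depth):
    """Reorder the characters of text for writing to the disk."""
    return "".join(text[i] for i in interleave_order(len(text), depth))

def deinterleave(scrambled, depth):
    """Put every character back in its original position."""
    out = [""] * len(scrambled)
    for j, i in enumerate(interleave_order(len(scrambled), depth)):
        out[i] = scrambled[j]
    return "".join(out)

message = "UNIVERSITYOFWASHINGTON"
on_disk = interleave(message, depth=5)
damaged = "_" * 5 + on_disk[5:]           # a burst destroys five adjacent symbols
recovered = deinterleave(damaged, depth=5)
print(recovered)                           # _NIVE_SITY_FWAS_INGT_N
```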
C. Concealment
In practice some errors are so large that they cannot be corrected by the error-detection and correction algorithms. Unless these errors are handled by some other means, they can result in audible clicks in the audio output. In order to avoid these clicks, several methods are used to conceal uncorrectable errors:
Interpolation: In this technique, some average is constructed using the valid data around an error. This average is then substituted in for the erroneous data. Since most music (with the possible exception of heavy metal!) is continuous -- this method works well for concealing relatively short errors.
Muting: Muting is a last ditch technique -- as it effectively creates a brief period of silence in the audio train. However, it is not effective to simply set all the binary digits to zero --as this produces exactly the click that we are trying to avoid! Instead, the volume is faded out and then back in again to conceal the error.
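Interpolation can be sketched as follows. Straight-line interpolation between the nearest good samples is an assumption made for illustration; the notes only say that "some average" of the surrounding valid data is used.

```python
def conceal(samples, bad):
    """Replace samples flagged as bad by a straight line between good neighbors."""
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if not bad[i]:
            i += 1
            continue
        start = i
        while i < n and bad[i]:            # find the end of this run of bad samples
            i += 1
        # Nearest good values on each side (fall back to the other side at the ends).
        left = out[start - 1] if start > 0 else (out[i] if i < n else 0)
        right = out[i] if i < n else left
        gap = i - start + 1
        for k in range(start, i):
            out[k] = left + (right - left) * (k - start + 1) / gap
    return out

print(conceal([0, 10, 999, 30, 40], [False, False, True, False, False]))
```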
D. EFM modulation
EFM means Eight to Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to minimize the number of 0 to 1 and 1-0 transitions -- thus avoiding small pits. In EFM only those 14-bit combinations are used in which at least two and at most ten zeros appear between successive ones.
For example, a digital 10 given as a binary 0000 1010 is an EFM 1001 0001 0000 00
(See attached table for the complete list of EFM codes[1].)
The use of EFM coding means that pits come in discrete lengths ranging from 3 bits long (often written 3T) to 11 bits long (11T).
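The run-length rule is easy to check in code. The sketch below measures the spacing between successive "1s" in a string of channel bits; the function names are illustrative, and the only codeword used is the one quoted in the example above.

```python
def run_lengths(channel_bits):
    """Distances, in channel-bit periods T, between successive 1s (pit or land lengths)."""
    ones = [i for i, bit in enumerate(channel_bits) if bit == "1"]
    return [later - earlier for earlier, later in zip(ones, ones[1:])]

def obeys_efm_rule(channel_bits):
    """True if every run between successive 1s is between 3T and 11T long."""
    return all(3 <= run <= 11 for run in run_lengths(channel_bits))

efm_10 = "10010001000000"            # EFM word for binary 0000 1010, from the text
print(run_lengths(efm_10))           # [3, 4]
print(obeys_efm_rule(efm_10))        # True
print(obeys_efm_rule("0110"))        # False: adjacent 1s would be a 1T feature
```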
As the laser beam scans across these pits, a very distinct RF signal is formed. The shortest wavelength in this signal (highest frequency) is produced by the 3T pits. The longest wavelength in the signal (lowest frequency) is produced by the 11T pits. The zero crossings of the RF signal represent the edges of the pits -- and thus the binary "1s" in the data stream[2]. (Notice that the longer the wavelength, the larger the amplitude of the signal.)
It is common to display the photodetector output on a scope with a conventional trigger. This results in a display where the nine possible frequencies (3T to 11T) all add up on top of each other. This type of display is termed an "eye" pattern and provides valuable information about the various alignment parameters of the CD player. Notice that the relationship between size and wavelength is very distinct in the eye pattern[3].

The RF output is converted to a square wave, which is then used to phase lock a clock with the period T. The CD player then begins to hunt for the characteristic start-of-frame symbol, which is three transitions separated by 11T (100000000001000000000010, plus 3 merge bits). Then the player isolates the 33 symbols of 17 channel bits each that follow, and strips off the 3 merge bits from each -- leaving 33 active symbols of 14 channel bits each.
PART II -- IEC - 908 ... The BIG Picture
The encoding of digital audio on a CD is governed by IEC 908. This standard is available in the library for your perusal. (Notice that every other page is missing -- this is because the standard is written in both French and English and I took out the French pages!) This information is also covered more generally in Chapter 3 of Ken Pohlman's book The Compact Disk Handbook, (A-R Editions, 1992).
CD players use parity and interleaving techniques to minimize the effects of an error on the disk. In theory, the combination of parity and interleaving in a CD player can detect and correct a burst error of up to 4000 bad bits -- or a physical defect 2.47 mm long. Interpolation can conceal errors of up to 13,700 bits, or physical defects up to 8.5 mm long.
The entire error detection and correction algorithm is summarized on the following table.
This is Figure 12 from the IEC 908 standard. This table will be described in more detail below.
The original musical signal is a waveform in time. A sample of this waveform in time is taken and "digitized" into two 16-bit words, one for the left channel and one for the right channel.
For example, a single sample of the musical signal might look like:
L1 = 0111 0000 1010 1000
R1 = 1100 0111 1010 1000
Six samples (six of the left and six of the right for a total of twelve) are taken to form a frame.
L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6
The frame is then encoded in the form of 8-bit words. Each 16-bit audio signal turns into two 8-bit words.
L1 L1 R1 R1 L2 L2 R2 R2 L3 L3 R3 R3 L4 L4 R4 R4 L5 L5 R5 R5 L6 L6 R6 R6
This gives a grand total of 24 8-bit words. This is column two on the IEC 908 table.
The even words are then delayed by two blocks and the resulting "word" scrambled. This delay and scramble is the first part of the interleaving process.
The resulting 24 byte word (remember, it has an included two block delay -- so some symbols in this word are from blocks two blocks behind) has 4 bytes of parity added. This particular parity is called "Q" parity. It is generated by the C2 encoder, so parity errors found in this part of the algorithm are called C2 errors. More on the Q parity later.
Now, the resulting 24 + 4Q = 28 byte word is interleaved. Each of the 28 bytes is delayed by a different period. Each period is an integral multiple of 4 blocks. So the first byte might be delayed by 4 blocks, the second by 8 blocks, the third by 12 blocks and so on. The interleaving spreads the word over a total of 28 x 4 = 112 blocks.
The resulting 28 byte words are again subjected to a parity operation. This generates four more parity bytes called P bytes which are placed at the end of the 28 byte data word. The word is now a total of 28 + 4 = 32 bytes long. This parity is generated by the C1 encoder, so parity errors found in this part of the algorithm are called C1 errors. More on the P parity later too.
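The delay interleave of the 28 byte word described above can be sketched with one delay line per byte position. The sketch assumes that byte i is delayed by 4*i blocks (so the first byte is not delayed at all); the exact delay assigned to each position is set by the standard and is not reproduced here.

```python
WORD_LEN = 28   # 24 data bytes + 4 Q parity bytes
DELAY = 4       # blocks of delay per byte position (assumed: byte i waits DELAY*i blocks)

def delay_interleave(words):
    """Byte i of each output block comes from the input word DELAY*i blocks earlier.

    Positions whose source word does not exist are filled with None.
    """
    n_out = len(words) + DELAY * (WORD_LEN - 1)
    out = []
    for t in range(n_out):
        block = []
        for i in range(WORD_LEN):
            src = t - DELAY * i
            block.append(words[src][i] if 0 <= src < len(words) else None)
        out.append(block)
    return out

# Feed in one marked word and see which output blocks carry any of its bytes.
blocks = delay_interleave([["X"] * WORD_LEN])
touched = [t for t, block in enumerate(blocks) if "X" in block]
print(touched[0], touched[-1], len(touched))   # 0 108 28
```

Under this assumption the 28 bytes of one word land in 28 different blocks, four blocks apart, so a burst that wipes out a few neighboring blocks costs each word at most one byte.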
Finally, another odd-even delay is performed -- but this time by just a single block. Both the P and Q parity bits are inverted (turning the "1s" into "0s" and vice versa) to assist data readout during muting.
An 8-bit subcode is then added to the front end of the word. The subcode specifies such things as the total number of selections on the disk, their length, and so on. More on this later.
Next the data words are converted to EFM format. As described in Part I, each 8-bit byte is replaced by a 14-bit word in which at least two and at most ten zeros appear between successive ones.
For example, a digital 10 given as a binary 0000 1010 is an EFM 1001 0001 0000 00
Each frame finally has a 24-bit synchronization word attached to the very front end (just for completeness, the word is 100000000001000000000010), and each 14-bit symbol is then coupled to the next by three merge bits.
These merge bits are chosen to meet two goals:
1. No adjacent 1's from neighboring EFM encoded words
Remember that there are lots of EFM words which end in "1" -- as one example, all the eight-bit binary words from 128 to 152 end in "1". Similarly, there are EFM words that start in "1". Thus, it is relatively straightforward to have adjacent EFM words that create adjacent "1s".
For example -- binary 128 followed by binary 57:
128 = 10000000 in EFM is 01001000100001
 57 = 00111001 in EFM is 10000000001000
2. The digital sum value is kept near zero
Minimizing the digital sum value is just an attempt to keep the average number of "0's" and "1's" about the same. The value of +1 is assigned to the "1" states and the value of -1 is assigned to the "0" states. Then, the value of the merge bit is chosen to maintain the average near zero.
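Both goals can be sketched in Python using the two EFM words from the example. The sketch enumerates all eight 3-bit merge patterns, keeps those that leave at least two and at most ten zeros between successive ones, and then picks the one with the smallest digital sum. The digital sum convention used here (each "1" toggles the level between +1 and -1, starting from -1) is an assumption for illustration, as are the function names.

```python
from itertools import product

def gaps_ok(bits, min_zeros=2, max_zeros=10):
    """True if successive 1s are separated by min_zeros..max_zeros zeros."""
    ones = [i for i, b in enumerate(bits) if b == "1"]
    return all(min_zeros <= b - a - 1 <= max_zeros for a, b in zip(ones, ones[1:]))

def valid_merge_bits(prev_word, next_word):
    """All 3-bit merge patterns that keep the joined stream within the run-length rule."""
    candidates = ("".join(p) for p in product("01", repeat=3))
    return [m for m in candidates if gaps_ok(prev_word + m + next_word)]

def digital_sum(bits, level=-1):
    """Running sum of +1/-1 levels, with each 1 toggling the level (assumed convention)."""
    total = 0
    for b in bits:
        if b == "1":
            level = -level
        total += level
    return total

def pick_merge_bits(prev_word, next_word):
    """Among the valid patterns, choose the one giving the smallest |digital sum|."""
    return min(valid_merge_bits(prev_word, next_word),
               key=lambda m: abs(digital_sum(prev_word + m + next_word)))

efm_128 = "01001000100001"
efm_57 = "10000000001000"
print(valid_merge_bits(efm_128, efm_57))   # ['000']
```

For this pair only 000 is legal, so the digital sum plays no part; it matters only when several patterns pass the run-length test.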
SO! The final frame (which started at 6*16*2 = 192 data bits) now contains:
1 sync word              24 bits
1 subcode signal         14 bits
6*2*2*14 data bits      336 bits
8*14 parity bits        112 bits
34*3 merge bits         102 bits
GRAND TOTAL             588 bits
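The bit budget can be tallied in a few lines. The 44.1 kHz sampling rate is the standard CD audio rate and is brought in here as an outside fact (it is not stated in the table); everything else comes from the list above.

```python
SAMPLE_RATE_HZ = 44_100        # CD audio sampling rate (assumed, not from the table)
SAMPLES_PER_FRAME = 6          # samples per channel in one frame

frame_bits = {
    "sync word": 24,
    "subcode": 14,
    "data": 6 * 2 * 2 * 14,    # 24 bytes, 14 channel bits each
    "parity": 8 * 14,          # 4 Q + 4 P bytes
    "merge": 34 * 3,           # 33 symbols + sync word, 3 bits each
}

total_bits = sum(frame_bits.values())
audio_bits = 6 * 16 * 2
frames_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_FRAME
channel_bit_rate = total_bits * frames_per_second

print(f"channel bits per frame: {total_bits}")
print(f"audio bits per frame:   {audio_bits} ({audio_bits / total_bits:.1%} of the frame)")
print(f"frames per second:      {frames_per_second}")
print(f"channel bit rate:       {channel_bit_rate / 1e6:.4f} Mbit/s")
```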
Part III - IEC 908 -- Now for the little details
A. P and Q parity
The eight parity symbols are calculated from the following equations:
$H_p \cdot V_p = 0$
$H_q \cdot V_q = 0$
Definitions for H and V are as follows.

V is pretty straightforward, just being the shifted and interleaved data bits in the data word (including the parity bits). However, H is more complex. H is defined on the Galois field $GF(2^8)$ by the polynomial:

$P(x) = x^8 + x^4 + x^3 + x^2 + 1$

(The $\alpha$'s in the definitions for the H vector come from the field elements of the Galois field.)
Unfortunately, the Galois field of $2^8$ elements, $GF(2^8)$, defined by this polynomial is a set of 255 $\alpha$'s.
However, to illustrate the principle, the Galois field of $2^4$ elements, $GF(2^4)$, formed as the field of polynomials over GF(2) modulo

$x^4 + x + 1$

is given below[4].
$\alpha^{0} = 1$ = 0001
$\alpha^{1} = \alpha$ = 0010
$\alpha^{2} = \alpha^2$ = 0100
$\alpha^{3} = \alpha^3$ = 1000
$\alpha^{4} = \alpha + 1$ = 0011
$\alpha^{5} = \alpha^2 + \alpha$ = 0110
$\alpha^{6} = \alpha^3 + \alpha^2$ = 1100
$\alpha^{7} = \alpha^3 + \alpha + 1$ = 1011
$\alpha^{8} = \alpha^2 + 1$ = 0101
$\alpha^{9} = \alpha^3 + \alpha$ = 1010
$\alpha^{10} = \alpha^2 + \alpha + 1$ = 0111
$\alpha^{11} = \alpha^3 + \alpha^2 + \alpha$ = 1110
$\alpha^{12} = \alpha^3 + \alpha^2 + \alpha + 1$ = 1111
$\alpha^{13} = \alpha^3 + \alpha^2 + 1$ = 1101
$\alpha^{14} = \alpha^3 + 1$ = 1001
$\alpha^{15} = 1 = \alpha^{0}$
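The small field can be generated mechanically. The sketch below assumes the modulus x^4 + x + 1, which is the polynomial consistent with the table (it is what makes the fourth power of the generator equal to "generator + 1" = 0011). Each element is held as a 4-bit integer, and multiplying by the generator is a left shift followed by a conditional XOR with the modulus.

```python
def gf16_powers(modulus=0b10011):
    """Return [alpha^0, ..., alpha^14] as 4-bit integers, reducing modulo x^4 + x + 1."""
    powers = []
    value = 1
    for _ in range(15):
        powers.append(value)
        value <<= 1                  # multiply by alpha
        if value & 0b10000:          # degree reached 4: subtract (XOR) the modulus
            value ^= modulus
    return powers

for k, value in enumerate(gf16_powers()):
    print(f"alpha^{k:<2} = {value:04b}")
```

The same loop with an 8-bit register and the corresponding degree-8 modulus would generate the 255 nonzero elements of the larger field.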
B. Subcodes
The 8-bit subcode is a very peculiar creature. Each 588 bit frame has an eight bit subcode. These bits are named P-Q-R-S-T-U-V-W. So, for each 588 bit frame, there is one P bit (not the same as P parity), one Q bit (not the same as Q parity), one R bit, one S bit and so on[5].

Now, the P-Q-R-S-T-U-V-W bits from 98 consecutive frames are collected together. These 98 bits are called a subcoding channel or just channel. Thus, there is a P-channel of 98 bits (no relation to the P parity), a Q-channel of 98 bits (no relation to Q parity), an R-channel of 98 bits and so on.
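Collecting the channels amounts to transposing 98 bytes into eight 98-bit strings. The sketch below assumes that P is the most significant bit of each subcode byte and W the least significant; that bit ordering is an assumption made for illustration.

```python
CHANNEL_NAMES = "PQRSTUVW"

def split_subcode(subcode_bytes):
    """Turn 98 subcode bytes (one per frame) into eight 98-bit channels.

    Assumes P is the most significant bit of each byte and W the least.
    """
    if len(subcode_bytes) != 98:
        raise ValueError("expected the subcode bytes of 98 consecutive frames")
    channels = {name: [] for name in CHANNEL_NAMES}
    for byte in subcode_bytes:
        for position, name in enumerate(CHANNEL_NAMES):
            channels[name].append((byte >> (7 - position)) & 1)
    return channels

channels = split_subcode(bytes([0x80] * 98))   # P bit set in every frame
print(sum(channels["P"]), sum(channels["Q"]))  # 98 0
```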
Unfortunately (just to maximize confusion with the P and Q parity bits) only the P and Q subcode channels are used. The R-W subcode channels are not yet assigned -- being held for later expansion of the standard.
P Channel -- The P channel simply designates the starting and stopping of tracks. Music data is denoted by all zeros, the start flag before the musical selection by 2-3 seconds of "1's". The lead out at the end of the disk is a 2 Hz alternating 1 and 0[6].

Q channel. The Q channel contains the majority of program and timing information. The first two bits (S0 and S1) are synchronization bits. The next four (bits 3-6) are the control bits. Bit 3 controls the number of channels (2 or 4), bit 4 is unassigned, bit 5 is the copy protect and bit 6 is the pre-emphasis bit. The next four bits control the mode (three defined modes). The next 72 bits are data -- and the last 16 are a cyclic redundancy check on the channel data.
Mode 1 -- contains the primary selection timing information. In the lead-in area, this information consists of the number of tracks and the absolute starting time of each track. This information is continually repeated in the lead-in area, and allows the CD player to build the table of contents[7].
In the program and lead-out areas, the Mode 1 information is track number, index numbers within a track, time within a track, and absolute time[8].

Mode 2 contains a catalog number of the disk -- plus a continuation of the absolute time count[9].

Mode 3 contains ISRC codes for identifying each track -- allowing for such things as automatic copyright logging. Mode 3 also contains a continuation of the absolute time count. Mode 3 is irregularly used at this time[10].
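The Q-channel layout described above (2 sync bits, 4 control bits, 4 mode bits, 72 data bits and a 16-bit cyclic redundancy check) can be sketched as a simple field splitter. Reading each field most-significant-bit first is an assumption, and the sketch does not verify the CRC.

```python
def parse_q_channel(bits):
    """Split a 98-bit Q channel (a list of 0/1 values) into the fields named in the text."""
    if len(bits) != 98:
        raise ValueError("a Q channel is 98 bits long")

    def to_int(field):
        value = 0
        for bit in field:
            value = (value << 1) | bit   # assumed: first bit is most significant
        return value

    return {
        "sync": bits[0:2],               # S0 and S1
        "control": to_int(bits[2:6]),
        "mode": to_int(bits[6:10]),
        "data": bits[10:82],             # 72 bits, meaning depends on the mode
        "crc": to_int(bits[82:98]),
    }

example = [0, 0] + [0, 0, 1, 0] + [0, 0, 0, 1] + [0] * 72 + [1] * 16
fields = parse_q_channel(example)
print(fields["control"], fields["mode"], len(fields["data"]), hex(fields["crc"]))
```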

Part I -- A Brief Overview
The CD/ROM format offers a magnificent solution to the problem of storage of large digital files. Interactive CD (CD-i) has also made its appearance for more elaborate CD-based presentations. Recent advances in CD-R (CD recordable) have extended the CD/ROM format into the realm of archival data storage.
There are a number of standards in existence governing the CD-audio and CD/ROM data structures. So far, we have only discussed IEC 908 (audio CD or the "red" book). The remaining important standards are summarized below.
Physical format
CD-Audio         IEC 908                 Red book*
CD-ROM           ISO/IEC 10149           Yellow book*
CD-I                                     Green book*
Video CD                                 White book*
CD-recordable    ISO/IEC 11172/1/2/3     Orange book (1990)*
Logical format
CD-ROM           ISO-9660                High Sierra
CD-Recordable    ECMA 168 / IS 13490     Frankfurt proposal
PART II -- IEC - 10149 ... The BIG Picture
The encoding of CD/ROM information is governed by IEC 10149. This standard is available in the library for your perusal. (This one is in English even!) This information is also covered more generally in Chapter 6 of Ken Pohlman's book The Compact Disk Handbook, (A-R Editions, 1992).
Hard as this may be to believe, the error-detection and correction strategies employed in CD/ROM (IEC 10149) are even more elaborate than those employed in conventional CD-audio! The reasons for this are relatively simple. An error in a CD-audio disk might result in an audible "click". However an error in a CD/ROM disk might mean the failure of operation of a piece of valuable software. Thus, a second layer of error detection and correction encoding is employed.
The following notes review IEC 10149. The majority of information is taken directly from IEC 10149 and is not specifically referenced. However, other information sources and illustrations are referenced.
A. Sectors, frames and modes
The fundamental data structure on the CD/ROM is organized differently than with the CD-audio disk. Recall that the CD-audio disk has a fundamental frame of 588 channel bits (192 data bits) with the frame architecture given below[1].
Notice however, that this fundamental frame only encodes 192 original data bits. Although a very nice format for audio, this is just too restrictive for CD/ROM.
Therefore, in IEC 10149 a superstructure of 98 frames is used in order to provide more user data space. This superstructure provides 6*2*2 = 24 bytes per frame over 98 frames, for a grand total of 6*2*2*98 = 2352 bytes.
This superstructure is called a sector. Thus, a frame contains 24 data bytes and a sector contains 2352 data bytes. Of these 2352 data bytes, at least 2048 bytes are typically reserved for user data (2048 in Mode 1 and 2336 in Mode 2) -- and the remainder are for various system functions. (Notice that there are 75 sectors per second, giving a final channel data rate of 4.3218 Mbit/sec.)
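These figures follow directly from the frame and sector sizes. The short calculation below is a sketch using only numbers from these notes (588 channel bits and 24 data bytes per frame, 98 frames per sector, 75 sectors per second, 74 minutes per disk).

```python
FRAMES_PER_SECTOR = 98
BYTES_PER_FRAME = 24
CHANNEL_BITS_PER_FRAME = 588
SECTORS_PER_SECOND = 75

sector_bytes = FRAMES_PER_SECTOR * BYTES_PER_FRAME
channel_bit_rate = CHANNEL_BITS_PER_FRAME * FRAMES_PER_SECTOR * SECTORS_PER_SECOND
mode1_user_rate = 2048 * SECTORS_PER_SECOND          # bytes per second
mode2_user_rate = 2336 * SECTORS_PER_SECOND          # bytes per second
mode1_capacity = mode1_user_rate * 60 * 74           # bytes on a 74 minute disk

print(f"bytes per sector:        {sector_bytes}")
print(f"channel bit rate:        {channel_bit_rate / 1e6:.4f} Mbit/s")
print(f"Mode 1 user data rate:   {mode1_user_rate} bytes/s")
print(f"Mode 2 user data rate:   {mode2_user_rate} bytes/s")
print(f"Mode 1 capacity, 74 min: {mode1_capacity} bytes")
```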
Sectors are organized in the following way:
Byte number (starting at 0)    Contents
0                              0000 0000
1-10                           1111 1111
11                             0000 0000
12                             Minutes - 74 max (+ hex A0)
13                             Seconds - 59 max
14                             Block number within sec. (75 blocks/sec)
15                             Mode
16-2063                        User data
2064-2351                      Error detection and correction data
The first 12 bytes hold synchronizing information.
Bytes 12, 13 and 14 hold addressing data. For example, an address of A4 20 45 says that this is the 4th minute, 20th second, and that this is the 45th block (out of 75) in the second (thus, just slightly more than half-way through the second).
(Note, the minutes field has a hexadecimal A0 added to it -- see page 19 of IEC 10149)
(Note, this information does repeat information that will be stored in the Q-channel of the frames also.)
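The address example can be decoded as follows. The sketch assumes that each address byte holds two decimal digits (binary-coded decimal), which is what the example A4 20 45 implies, and it follows the note above in removing the A0 offset from the minutes byte when it is present. The function names are illustrative.

```python
def bcd(byte):
    """Decode one packed binary-coded-decimal byte, e.g. 0x45 -> 45."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"not a BCD byte: {byte:#04x}")
    return 10 * high + low

def decode_address(minute_byte, second_byte, block_byte):
    """Return (minute, second, block), removing the A0 offset when it is present."""
    if minute_byte >= 0xA0:
        minute_byte -= 0xA0
    return bcd(minute_byte), bcd(second_byte), bcd(block_byte)

print(decode_address(0xA4, 0x20, 0x45))   # (4, 20, 45)
```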
Byte 15 defines the data mode. Three modes are permitted.
Mode 0 is used for null data (and really isn't very interesting!). Notice that this data will eventually also be CIRC encoded (using the same CIRC encoder as for digital audio). Thus, this data is considered to be protected only by CIRC coding.
MODE 0 --
Byte number      Contents
0                0000 0000
1-10             1111 1111
11               0000 0000
12               Minutes - 74 max
13               Seconds - 59 max
14               Block number within sec. (75 blocks/sec)
15               Mode 00
16-2063          0000 0000
2064-2351        0000 0000
Mode 2 is a reduced error checking data storage form. Mode 2 has 2336 available user bytes -- trading off additional parity checks for more user data. Notice that this data will eventually also be CIRC encoded (using the same CIRC encoder as for digital audio). Thus, this data is considered to be protected only by CIRC coding.
MODE 2 --
Byte number      Contents
0                0000 0000
1-10             1111 1111
11               0000 0000
12               Minutes - 74 max
13               Seconds - 59 max
14               Block number within sec. (75 blocks/sec)
15               Mode 02
16-2351          User data
Mode 1 is the full CD/ROM standard, making use of a second level of error detection and correction. Mode 1 has 2048 available user bytes -- and then has 4 bytes of additional EDC, some "0's" (called an "intermediate" in IEC 10149), and then 172 bytes of P parity and 104 bytes of Q parity. Notice that this data will eventually also be CIRC encoded (using the same CIRC encoder as for digital audio). Thus, this data is considered to be protected by EDC coding, ECC coding (P and Q parity) and CIRC coding.
MODE 1 --
Byte number      Contents
0                0000 0000
1-10             1111 1111
11               0000 0000
12               Minutes - 74 max
13               Seconds - 59 max
14               Block number within sec. (75 blocks/sec)
15               Mode 01
16-2063          User data
2064-2067        4 bytes of EDC
2068-2075        8 bytes of 0000 0000
2076-2247        172 bytes of P-parity
2248-2351        104 bytes of Q-parity
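The Mode 1 layout maps naturally onto byte slices. The sketch below splits an unscrambled 2352-byte sector into the fields of the table; the field names are illustrative, and no parity or EDC checking is attempted.

```python
MODE1_FIELDS = [                 # (name, first byte, last byte), inclusive, from the table
    ("sync", 0, 11),
    ("address", 12, 14),
    ("mode", 15, 15),
    ("user_data", 16, 2063),
    ("edc", 2064, 2067),
    ("intermediate", 2068, 2075),
    ("p_parity", 2076, 2247),
    ("q_parity", 2248, 2351),
]
SYNC_PATTERN = bytes([0x00] + [0xFF] * 10 + [0x00])

def split_mode1_sector(sector):
    """Return a dict of the named fields of one unscrambled Mode 1 sector."""
    if len(sector) != 2352:
        raise ValueError("a sector is 2352 bytes long")
    if sector[:12] != SYNC_PATTERN:
        raise ValueError("sync pattern not found")
    if sector[15] != 0x01:
        raise ValueError("not a Mode 1 sector")
    return {name: sector[first:last + 1] for name, first, last in MODE1_FIELDS}

# A blank Mode 1 sector at address 00:02:00, just to exercise the function.
blank = SYNC_PATTERN + bytes([0x00, 0x02, 0x00, 0x01]) + bytes(2336)
parts = split_mode1_sector(blank)
print({name: len(value) for name, value in parts.items()})
```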
B. EDC coding
Bytes 2064 to 2067 are used for EDC coding in Mode 1. (They are 00s in Mode 00, and user data in Mode 02.)
EDC coding is a 32-bit (8*4=32) cyclic redundancy code applied to bytes 0-2063. The least significant bit of each data byte is used first, and the EDC codeword must be divisible by the check polynomial given by

$P(x) = (x^{16} + x^{15} + x^2 + 1)(x^{16} + x^2 + x + 1)$

The least significant parity bit ($x^0$) is stored in the most significant bit of byte 2067.
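A bit-at-a-time sketch of such a cyclic redundancy code is given below. It assumes the check polynomial is the product (x^16 + x^15 + x^2 + 1)(x^16 + x^2 + x + 1), that the register starts at zero, and that the least significant bit of each byte is processed first. How the four result bytes are ordered inside the sector is not addressed here.

```python
def clmul(a, b):
    """Multiply two polynomials over GF(2), each held as an integer bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

# (x^16 + x^15 + x^2 + 1) * (x^16 + x^2 + x + 1), including the x^32 term
EDC_POLY = clmul(0x18005, 0x10007)

def reflect32(value):
    """Reverse the order of the 32 bits of value."""
    return int(f"{value:032b}"[::-1], 2)

# Bit-reversed form of the low 32 bits, for an LSB-first shift register.
EDC_POLY_REFLECTED = reflect32(EDC_POLY & 0xFFFFFFFF)

def edc(data):
    """CRC of data, least significant bit of each byte first, register starting at 0."""
    register = 0
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:
                register = (register >> 1) ^ EDC_POLY_REFLECTED
            else:
                register >>= 1
    return register

payload = bytes(range(256))
check = edc(payload)
# Appending the check value makes the whole codeword divisible by the polynomial.
print(edc(payload + check.to_bytes(4, "little")))   # 0
```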
C. The P and Q parity fields.
These sector P and Q parity fields add additional parity to the P and Q parity fields in the frame.
As with the P and Q parity bits in the frames, the 172 "P" and 104 "Q" parity bytes are calculated from the following equations:
$H_p \cdot V_p = 0$
$H_q \cdot V_q = 0$
Notice however, that the encoding operation is somewhat more complex on the sectors than on the frames -- due primarily to the larger amount of data. The input data is divided into words composed of two 8 bit bytes each. The encoding algorithm is applied twice, first on the most significant bytes and then again on the least significant bytes.
Definitions for H and V are attached (pg 31 and 32 of IEC 10149)
As with CD audio, H is defined on the Galois field $GF(2^8)$ by the polynomial:

$P(x) = x^8 + x^4 + x^3 + x^2 + 1$

(Where the $\alpha$'s in the definitions for the H vector come from the field elements of the Galois field.)
D. Scrambling
Bytes 12-2351 of each sector (that is, all the bytes after those 0000 0000 and 1111 1111 sync bytes) are scrambled. Notice that the addressing bytes are scrambled also -- as well as the mode 1 parity and EDC bytes.
The reason for this is that in certain cases, the merging bits (recall that these are the three bits added between the 14-bit EFM words to avoid adjacent 1s and to minimize the digital sum value) are not sufficient to minimize the digital sum value.
Quoting from IEC 10149 Annex B: "The scrambler reduces this risk by converting the bits in bytes 12 to 2351 of a sector in a prescribed way. Each bit in the input stream of the scrambler is added modulo 2 to the least significant bit of a maximum length register. The least significant bit of each byte comes first in the input stream. The 15-bit register is of the parallel block synchronized type, and is fed back according to the polynomial
x^15 + x + 1
After the sync of the sector, the register is preset with the value 0000 0000 0000 0001 where the 1 is the least significant bit."
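The scrambler described in the quotation can be sketched as a 15-bit shift register. This sketch assumes the feedback polynomial x^15 + x + 1 and one common way of realizing it: the output is the least significant register bit, and the XOR of the two lowest bits is fed back into the top bit. The function name is illustrative.

```python
def scramble_sector(sector: bytes) -> bytes:
    """XOR bytes 12-2351 of a sector with the scrambler sequence.

    Bytes 0-11 (the sync field) are left untouched. Because the operation
    is an XOR with a fixed sequence, applying it twice restores the input.
    """
    if len(sector) != 2352:
        raise ValueError("a CD/ROM sector is 2352 bytes long")
    register = 0x0001                      # preset after the sector sync
    out = bytearray(sector)
    for index in range(12, 2352):
        key = 0
        for bit in range(8):               # least significant bit first
            key |= (register & 1) << bit
            feedback = (register ^ (register >> 1)) & 1
            register = (register >> 1) | (feedback << 14)
        out[index] ^= key
    return bytes(out)


blank = bytes(2352)
scrambled = scramble_sector(blank)
print(scrambled[12:16].hex())              # start of the scrambling sequence
```

Scrambling an all-zero sector exposes the raw scrambling sequence, which is a convenient way to compare an implementation against a reference table.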

E. Making the F1 frame
Each scrambled sector is mapped onto a series of consecutive frames. Each frame is the typical audio frame format -- consisting of 24 8-bit bytes. However, the starting point of the sector is not necessarily the starting point of the frame. Byte 0 of the sector is placed in byte 4n of a frame where n = 0, 1, 2, 3, 4 or 5. Consecutive bytes in the sector are then placed in consecutive bytes of the frames.

Next, the byte order of each even-odd numbered pair of bytes in the frame is reversed. That is, the original byte order of 0, 1, 2, 3, 4, 5, ... is converted to 1, 0, 3, 2, 5, 4, .... The frame after interchanging is called an F1 frame.

Notice that the original position 0 of the sector is now in the F1 frame at a position of 4n + 1, where n = 0, 1, 2, 3, 4 or 5.
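The pairwise byte interchange that turns a 24-byte frame into an F1 frame can be sketched in a few lines. The function name is an illustrative assumption.

```python
def to_f1_frame(frame: bytes) -> bytes:
    """Swap each even/odd byte pair: 0,1,2,3,... becomes 1,0,3,2,..."""
    if len(frame) != 24:
        raise ValueError("a frame holds 24 bytes before CIRC encoding")
    out = bytearray(24)
    out[0::2] = frame[1::2]    # odd-numbered bytes move to even positions
    out[1::2] = frame[0::2]    # even-numbered bytes move to odd positions
    return bytes(out)


f1 = to_f1_frame(bytes(range(24)))
print(list(f1[:6]))

# Sector byte 0 placed at frame position 4n (here n = 2) ends up at 4n + 1.
frame = bytearray(24)
frame[8] = 0xAA
print(to_f1_frame(bytes(frame)).index(0xAA))
```

Applying the swap a second time restores the original order, so the same routine serves the reader and the writer.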
F. Making the F2 frame
Each F1 frame is then fed into a conventional CIRC. This is exactly the same CIRC encoder we discussed for audio CD (page 50 of IEC 908 and page 36 of IEC 10149).
(Note: For a more complete explanation of the CIRC encoder -- see my course notes 498_95_7.doc entitled Audio Compact Disk - Writing and Reading the data.)
The input to the encoder is 24 8-bit bytes. The output of the encoder is 32 8-bit bytes. (Notice, EFM modulation has not been done yet!)
G. Subcodes on the F3 frame
A single control byte is added to the 32 byte F2 frame. This byte defines the subcodes. These are the same subcodes as used in audio CD. The subcode byte consists of 8 bits -- each bit defining a P, Q, R, S, T, U, V, W subcode channel. The operation of creating the subcode generates a group of 98 F3 frames related by the subcode information. This group of 98 frames is termed a section. Notice that sections have nothing to do with sectors!
Essentially, each bit from the subcode (P, Q, R, S, T, U, V, W) for 98 F3 frames is collected into a buffer and used as a 98-bit word for control information storage. The first two bits of this 98 bit subcode word are reserved for synchronization.
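Thus, the control byte for F3 frame 0 is replaced by the 14-bit sync pattern 00100000000001, and the control byte for F3 frame 1 is replaced by the 14-bit sync pattern 00000000010010. (These are 14-bit channel words, not 8-bit data bytes.)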
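Collecting one bit per frame into eight 98-bit subcode words can be sketched as follows. The bit order within the control byte is an assumption here (P in the most significant bit, W in the least significant). The two sync frames are modelled as ordinary placeholder bytes, although on disk they are special channel patterns.

```python
CHANNELS = "PQRSTUVW"   # assumed order: P = bit 7 ... W = bit 0


def split_subcode(control_bytes):
    """Turn the 98 control bytes of one section into eight 98-bit words."""
    if len(control_bytes) != 98:
        raise ValueError("a section holds 98 F3 frames")
    words = {}
    for position, name in enumerate(CHANNELS):
        shift = 7 - position
        words[name] = [(byte >> shift) & 1 for byte in control_bytes]
    return words


# Two placeholder sync frames followed by 96 frames with only Q set.
control = [0x00, 0x00] + [0x40] * 96
words = split_subcode(control)
print(sum(words["Q"]), sum(words["P"]))
```

Bits 0-1 of each word correspond to the two synchronization frames, leaving 96 usable bits per channel per section.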
The P subcode channel is virtually identical to the P subcode channel in audio CD. The P channel simply designates the starting and stopping of an information track. Data is denoted by all zeros, and the start flag before the data by 2-3 seconds of 1s. The lead out at the end of the disk is marked by 1s and 0s alternating at 2 Hz.
The Q subcode channel is similar to the Q subcode channel in audio CD. The contents of the channel are as follows:
Bit number   Contents
0-1          Synchronization bits
2-5          Control bits
             0100 - digital data, copy protected
             0110 - digital data, OK to copy
             (other settings are only used for audio)
6-9          Mode number
10-81        Mode data
82-97        CRC check code
As with audio CD, Mode 1 and Mode 2 refer to different types of addressing information. Mode 3 is only used by audio CD. In the user data and lead out areas, Mode 1 contains timing information. The layout is identical to Mode 1 in audio and is given below:
Bit number   Contents
0-1          Synchronization bits
2-5          Control bits
             0100 - digital data, copy protected
             0110 - digital data, OK to copy
             (other settings are only used for audio)
6-9          Mode number - 0001
10-17        TNO - track number
18-25        Index (00 = pause)
26-33        Minutes within track
34-41        Seconds within track
42-49        Frame in second (1/75 increments)
50-57        Zeros
58-65        Absolute minutes
66-73        Absolute seconds
74-81        Absolute frame in second (1/75 increments)
82-97        CRC code
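The minutes/seconds/frame addressing used in the Q channel (and in the sector header) maps directly onto a running block count at 75 blocks per second. A minimal sketch, with illustrative function names:

```python
BLOCKS_PER_SECOND = 75


def msf_to_block(minutes: int, seconds: int, frame: int) -> int:
    """Convert a minutes/seconds/frame address to a running block count."""
    if not (0 <= seconds < 60 and 0 <= frame < BLOCKS_PER_SECOND):
        raise ValueError("seconds must be 0-59 and frame 0-74")
    return (minutes * 60 + seconds) * BLOCKS_PER_SECOND + frame


def block_to_msf(block: int) -> tuple:
    """Inverse of msf_to_block."""
    minutes, rest = divmod(block, 60 * BLOCKS_PER_SECOND)
    seconds, frame = divmod(rest, BLOCKS_PER_SECOND)
    return minutes, seconds, frame


print(msf_to_block(0, 2, 0))      # two seconds into the disk
print(block_to_msf(333000))       # block count at the 74-minute mark
```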
Mode 2 contains catalog and additional timing information. The layout is identical to Mode 2 in audio and is given below:
Bit number   Contents
0-1          Synchronization bits
2-5          Control bits
             0100 - digital data, copy protected
             0110 - digital data, OK to copy
             (other settings are only used for audio)
6-9          Mode number - 0010
10-61        Catalog number of the disk
62-73        12 bits of 0
74-81        Frame in second
82-97        CRC code
The F3 frame is now 33 bytes long and consists of 32 bytes plus the subcode.
H. EFM modulation
All 33 bytes of the F3 frame are now 8-14 bit (EFM) modulated. The resulting 33 14-bit words are separated by 3 merge bits. This operation is identical to that performed in CD audio. Again, the merge bits are determined to avoid adjacent "1s" and minimize the DSV.
(Note: For additional explanation of EFM modulation and merge bits -- see my course notes 498_95_7.doc entitled Audio Compact Disk - Writing and Reading the data.)
I. Sync header on the F3 frame
Each F3 frame finally has a 24-bit synchronization word attached to the very front end. Just for completeness, the word is 100000000001000000000010, and it is spaced from the other symbols by three merge bits.
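The frame bookkeeping above can be checked with a little arithmetic. The sketch below uses only numbers from these notes: 33 symbols of 14 bits, 3 merge bits after each word, a 24-bit sync word, 98 frames per sector (2352 / 24) and 75 sectors per second. The variable names are illustrative.

```python
SYMBOLS_PER_F3_FRAME = 33        # 32 CIRC output bytes + 1 subcode byte
EFM_BITS = 14
MERGE_BITS = 3
SYNC_BITS = 24
USER_BYTES_PER_FRAME = 24
FRAMES_PER_SECTOR = 2352 // USER_BYTES_PER_FRAME
SECTORS_PER_SECOND = 75

# Sync word plus its merge bits, then 33 EFM words each with merge bits.
channel_bits_per_frame = (SYNC_BITS + MERGE_BITS
                          + SYMBOLS_PER_F3_FRAME * (EFM_BITS + MERGE_BITS))
channel_bits_per_sector = FRAMES_PER_SECTOR * channel_bits_per_frame
channel_bit_rate = SECTORS_PER_SECOND * channel_bits_per_sector

# Fraction of channel bits that carry the original 24 data bytes.
efficiency = USER_BYTES_PER_FRAME * 8 / channel_bits_per_frame

print(channel_bits_per_frame, channel_bits_per_sector, channel_bit_rate)
print(round(efficiency, 3))
```

Only about a third of the bits written to the disk carry the original frame data. The rest is parity, subcode, modulation overhead and synchronization.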
Part III - Error signals
Although the error detection and correction algorithms used in CD players are very robust, CD manufacturers still must pay special attention to disk quality to assure that disk errors do not exceed the error detection and correction capabilities of the encoding algorithm.
If we look at the frame level for the CIRC decoder -- there are two principal error correcting stages -- the one that checks the P parity and the one that checks the Q parity. These stages are called C1 for the P parity (frame level) and C2 for the Q parity (frame level).
Four flags are output from a typical CD disk measuring system. These are C1F1, C1F2, C2F1 and C2F2. These flags (and the information they provide) are summarized below[2].
C1F1  C1F2  Meaning
L     L     No errors in C1
H     L     1 error corrected in C1
L     H     2 errors corrected in C1
H     H     More than 2 errors in C1 (passed to C2)

C2F1  C2F2  Meaning
L     L     No errors in C2
H     L     1 error corrected in C2
L     H     2 errors corrected in C2
H     H     More than 2 errors in C2
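The two flag tables amount to a small lookup. A minimal sketch, with an illustrative function name and flag levels written as the strings "L" and "H":

```python
# (F1, F2) flag levels -> meaning, identical for the C1 and C2 stages.
FLAG_MEANING = {
    ("L", "L"): "no errors",
    ("H", "L"): "1 error corrected",
    ("L", "H"): "2 errors corrected",
    ("H", "H"): "more than 2 errors",
}


def describe_flags(stage: str, f1: str, f2: str) -> str:
    """Return a readable description for one decoder stage ('C1' or 'C2')."""
    return f"{stage}: {FLAG_MEANING[(f1, f2)]}"


print(describe_flags("C1", "H", "L"))
print(describe_flags("C2", "H", "H"))
```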
BLER -- The Block error rate (BLER) is a rather complex measure of the errors in a CD. The block error rate is determined by counting up all the various types of C1 errors (i.e. one bad symbol, two bad symbols, more than two bad symbols) over a given time interval. Thus, the block error rate is in errors per second. Notice it does not differentiate between correctable and uncorrectable errors. It also does not differentiate between 2 errors in the C1 decoder and 588 errors. The typical maximum BLER is 220 errors per second.
Burst error -- The burst error rate measures the number of consecutive bad frames. Typically the threshold for a burst error is held at 7 frames. The burst error is often given over the entire disk rather than as a rate. Generally, disks with any burst errors are not shipped.
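The two measures can be sketched as simple counts over a per-frame log. The sketch assumes a playback rate of 75 x 98 = 7350 frames per second. It leaves the definition of a "bad" frame for burst counting to the caller, since these notes do not pin it down. The function names are illustrative.

```python
FRAMES_PER_SECOND = 75 * 98      # 98 frames per sector, 75 sectors per second


def bler(c1_flags) -> float:
    """Frames with any C1 error (corrected or not), scaled to one second.

    c1_flags: sequence of (C1F1, C1F2) level pairs, one per frame.
    """
    bad = sum(1 for flags in c1_flags if flags != ("L", "L"))
    return bad * FRAMES_PER_SECOND / len(c1_flags)


def longest_burst(bad_frames) -> int:
    """Length of the longest run of consecutive bad frames."""
    longest = run = 0
    for bad in bad_frames:
        run = run + 1 if bad else 0
        longest = max(longest, run)
    return longest


# One second of playback with a single run of 7 frames flagged H/H.
log = [("L", "L")] * FRAMES_PER_SECOND
log[10:17] = [("H", "H")] * 7
print(bler(log), longest_burst(f == ("H", "H") for f in log))
```

This log would pass a 220 errors-per-second BLER limit comfortably, yet its 7-frame run sits right at the typical burst threshold, which is why the two measures are reported separately.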
One lecture of material
CD-recordable and WORMs
The CD audio and CD/ROM standards were originally developed for a read-only medium. However, mechanical and optical technologies were rapidly developed to permit a writable CD format. These writable formats are variously called CD-R (CD-recordable), Write-Once Read-Many (WORM) or CD-WO (CD Write Once).
In spite of the fact that these names originally all meant the same thing, subtle differences between them have emerged in the culture. As a consequence, write-once read-many techniques that produce disks readable only on special machines (such as DMM or DRAW techniques) are usually referred to as WORMs. Techniques that produce disks readable on any CD-audio or CD/ROM reader (such as cyanine-based organic dyes) are called CD-Rs.
DMM techniques[1]
Direct metal mastering (DMM) techniques are the oldest of the CD writable technologies. This process creates glass master disks which can be read on special readers (thus justifying calling it a WORM disk) but are most often used as masters to create mothers and sons for small production runs of disks.
In this process, a glass substrate disk is covered with a thin (few nm) nickel flash. Then a copper layer (several hundred nm thick) is sputtered or evaporated over the nickel. A piezoelectric mechanical stylus is used to emboss pits in the disk. By correctly designing the diamond embossing tip and cleverly optimizing the write speed, it is possible to create disks that mimic conventional photoetched master disks.
Notice however, that although this technology creates disks that can be read -- the use of a glass master plate limits the number of machines that they can be read in.
DRAW technologies[2]
Like DMM technologies, Direct Read After Write (DRAW) technologies were originally developed as an alternative to conventional mastering techniques. In this process, the glass master disk is covered with a thin layer of plastic. A relatively high power laser (such as a 50 mW argon ion) is used to vaporize pits in the polycarbonate layer. A smaller HeNe laser runs right behind the argon laser to check pit depth and length and to adjust the feedback system accordingly.
Notice however, that although this technology creates disks that can be read -- the use of a glass master plate with no reflective layer essentially means that it can only be read in the mastering machine that created it.
CD-R systems
A number of different techniques have been tried to create a writable disk technology such that the disks can be read in any CD-audio or CD-ROM reader[3].
Ablative: These techniques are similar to the DRAW process in that a high powered laser is used to physically ablate a thin metal film. Tellurium and its alloys are often used because of the low melting point of the metal.
Bubbling: These techniques rely on creating a bubble or blister in a material by means of heating specialized materials with a laser beam.
Optical methods: In these techniques, irreversible optical changes are made in the materials. For example, the reflectivity, index of refraction, or polarization characteristics can be changed.
Dye-based methods: The most popular current CD-R technologies are the dye-based technologies which use cyanine and phthalocyanine dyes. The dyes are sandwiched between the gold and the polycarbonate substrate. Interaction with the laser heats the dye and then creates a pit in the polycarbonate[4].
The main advantage of the dye-based technologies is that both the disks and the recorders are relatively cheap to manufacture. As of today, you can get a modest quality 2x dye-based recorder for under $2000. Top-of-the-line units run around $5000. Within a year, it is expected that 1x units will be available to the general consumer for less than $1000. There is a clear trend for dye-based recording materials to take over the consumer market in CD-R.
Archival use of dye-based recording materials.
The advent of the dye-based CD-R materials marks the first time that an economical CD recording technology was available to the general public.
One of the key applications that immediately appeared was the use of the CD-R materials for archival data storage. This type of data storage may be as simple as backing up your home PC -- or as complex as storing income tax records for all Americans. Either way, the issue of archival data storage and CD-R is a very hot issue at the present time.
There are several factors which may limit the archival lifetime of CD-R media. These are listed below in order of decreasing importance:
Aging of the phthalocyanine is a critical issue in evaluating the lifespan of CD-recordable disks. The optical characteristics of the phthalocyanine dye are critical in the recording process. A small change in the absorbance properties of this material can result in a disk error.
The photosensitivity of the cyanine and phthalocyanine dyes may be an issue in the archival stability of CD-R media. Cyanine and phthalocyanine dyes are organic materials that may be subject to damage under blue or UV lighting conditions. Although the Orange Book standard does include a "light fastness" test -- it is not clear that the spectrum of this light is a good match to that found in typical office environments.
Anecdotal reports by 3M researchers suggest high block error rates when CD-R disks are exposed to sunlight. Although not formally confirmed or published -- such informal testing shows reason for concern.
Speed is clearly a critical issue for certification in an archival application. When using single and double speed access, virtually all manufacturers seem to assume that the Orange Book standard is adequate. However, at higher speeds, it is not so clear. The faster speed means higher power on the laser, and an associated higher risk with the writing process.
It is interesting to note that the recently released Yamaha Expert CD-100 recommends special media (beyond Orange Book standards) carrying Yamaha's 4xS (4 speed) logo. Yamaha has established a complete certification program for its 4xS disks and will certify other manufacturers' disks.
Additional evidence of write-speed issues comes from tests by APEX company, which suggest that the optimal writing power changes significantly with write speed[5]. Additionally, there are currently efforts underway by the Orange Book Study Group of Japan (OSJ) to include a "write-speed" specification in the subcode for the CD-R standard[6].
Both physical and logical standards for CD-recordable appear to be in a severe state of flux.
Physical standards: The current de facto physical standard for CD-recordable is the Orange Book. However, the recent introduction of the well-received Yamaha Expert CD-100 CD-recorder and the Philips CDD 522 is rapidly forcing development in the direction of 4x standards[7]. At this point, the Yamaha 4xS standard looks like the best contender for the standard of the next five years. However, Sony is also discussing another 4x standard, perhaps for disk release in 1995.
Logical standards: The current de facto logical standard is ISO-9660. However, there are signs that this may be changing. ISO-9660 was developed in the mid-1980s, prior to multi-session recording capabilities, and does not support multi-session. Developments in multi-media technology are going to force multi-session development -- thus moving the technology away from ISO-9660. There is a new standard under discussion (Frankfurt, ISO 13490) which will support multisession. However, ISO 13490 is not currently well received.
As just one example, a key consideration in support of a new standard is support of the major industrial software houses -- in particular, Microsoft. Unfortunately, Microsoft has not indicated any plans to write a new version of MSCDEX to support ISO 13490 and has not announced any plans to support it in Chicago. Unfortunately, the problem of single versus multisession will NOT just go away. It does not appear that the ISO 13490 will triumph -- and it is not at all clear what will. This is a critical question for archival purposes, because it is possible that a new multisession standard may be more geared to multimedia applications where a higher BLER is acceptable in a tradeoff for speed, and less geared toward low BLER archival applications. Notice that the ISO or IEC standards are public domain -- but (at least currently) Kodak Photo CD is proprietary. Again -- this poses interesting questions for evaluation of disks for archival applications.
Qualifying CD-R for archival applications requires more than just qualifying the CD-R media. The problem of qualifying CD-recordable (CD-R) for archival application is somewhat different from the broader problems of qualifying CD-ROM and CD-R media alone. In archival applications -- the goal is to be able to reliably read data at some future date (typically several decades from the present). This means that qualifying CD-recordable for archival applications is a linked problem between media, recorders, readers, and software. Putting it bluntly, it is not enough for the media to have survived error-free for several decades if there is no technology that has survived to read it. This is important to consider, because several US companies (e.g., 3M and Kodak) are developing qualification standards for media alone. Valuable as these standards will be -- they do not address the archivist's problems.
CD-recordable is right on the edge of a rapid expansion phase. This generally means that formats, standards, and CD-R technologies may be obsolete in the near future. The consumer can now purchase a high quality 1x CD-recordable unit for under $2000. 4x units sell for around $5000 -- and the prices dropped nearly 30% in 1994. Philips has recently announced plans to introduce a 2x recordable with a 4x player for less than $1000 by mid-1995. This is a familiar pattern reminiscent of the period of very rapid growth following the release of the first consumer VCRs. Although this rapid growth is great for the consumer, it means that the archivist must consider the various available CD-R alternatives carefully. The hot new 4x CD-recordable technology of 1995 -- may be totally and completely obsolete in 2005. Thus, development must be planned to gracefully phase between technologies -- with the end goals of minimum error rates and maximum compatibilities -- rather than speed.
The technological development of the CD-R recorders is primarily overseas (Netherlands and Japan). As such, there is still not much effort being spent on developing CD recordable qualification standards within the US. Although there are several U.S. organizations working on development of standards for evaluating CD/ROMs (in particular, OSTA and AES) -- these are primarily oriented toward CD/ROM media.
CD-I
Compact disk interactive is a combined media which permits the simultaneous storage of audio, video, graphics, text, and data. In essence, CD-I is a multimedia CD/ROM data format.
CD-I systems are intended to be single media systems -- one where the disk contains both the program and the data. The inclusion of program data on the disk distinguishes them from the more conventional storage of audio and video format material on a CD/ROM. CD-I materials are typically intended to play on augmented television sets, rather than requiring a full PC system.
The CD-I data standard is essentially identical to the CD/ROM standard. The major distinguishing difference is an 8-byte subheader following the regular 16-byte CD/ROM sync/address/mode header.
Byte number (starting at 0) Contents
0 0000 0000
1-10 1111 1111
11 0000 0000
12 Minutes - 74 max (+ hex A)
13 Seconds - 59 max
14 Block number within sec. (75 block/sec)
15 Mode
16-23 CD-I subheader (8-bytes)
24-2351 User data and EDC, ECC
In Mode 1, the 8 "blank" bytes of the CD/ROM layout (bytes 2068-2075) are dropped to make room for the subheader, so that the 2048 bytes of user data move to bytes 24-2071:
Byte number Contents
0 0000 0000
1-10 1111 1111
11 0000 0000
12 Minutes - 74 max
13 Seconds - 59 max
14 Block number within sec. (75 block/sec)
15 Mode 01
16-23 CD-I subheader (8-bytes)
24-2071 User data (2048)
2072 - 2075 4 bytes of EDC
2076 - 2247 172 bytes of P-parity
2248 - 2351 104 bytes of Q-parity
CD-I also offers various levels of audio quality in order to maximize disk space utilization[8]:
Name    Coding   Sample rate   Bits per word   Bandwidth   Channels             Size of 1 second of sound
CD-DA   PCM      44.1 kHz      16 bits         20 kHz      1 stereo             171.1 kB
A       ADPCM    37.8 kHz      8 bits          17 kHz      2 stereo / 4 mono    85.1 kB / 42.5 kB
B       ADPCM    37.8 kHz      4 bits          17 kHz      4 stereo / 8 mono    42.5 kB / 21.3 kB
C       ADPCM    18.9 kHz      4 bits          8.5 kHz     8 stereo / 16 mono   21.3 kB / 10.6 kB
Information phonetic
CD-I disks can also store video information compatible with either the US NTSC or the European PAL/SECAM format. Three standards of video resolution are supported[9]:
Name                     Horizontal pixels   Vertical pixels
NTSC                     360                 240
PAL                      384                 280
Double resolution NTSC   720                 240
Double resolution PAL    768                 280
High resolution NTSC     720                 480
High resolution PAL      768                 560
Three picture encoding schemes are supported[10]:
DYUV: Delta Y-U/V coding; reduces a standard single picture to 85 kB (NTSC) or 105 kB (PAL).
RGB 5:5:5: 16-bit color (5 bits each for R, G, and B, one for transparency); 170 kB (NTSC), 210 kB (PAL).
CLUT: Color look-up table; permits 4-bit/16, 7-bit/128 and 8-bit/256 color animation. With reduction, 85 kB (NTSC) or 105 kB (PAL).
MPEG: CD-I was the first consumer introduction of MPEG, and a CD-I disk can hold over 60 minutes of encoded MPEG plus audio.
Photo CD[11]:
Photo CD is a special CD format developed jointly by Eastman Kodak and by Philips. The idea behind Photo CD is to provide a digital storage method for consumer photographs. The consumer would use conventional 35 mm film which would then be scanned by a Photo CD system. The digitized image would then be processed both for compression and to correct for exposure and color balance. These images would then be written to a CD-R for permanent storage. A typical CD would hold approximately 100 digitized photographs.
The Photo CD system scans images at 2048 pixels per inch. In relation to a 35 mm negative, this means 3072 pixels per line for 2048 lines. The RGB information is stored with 12 bit quantization. The encoding process is similar to NTSC, in that the color is separated into a single 8-bit luminance component and two 8-bit chrominance components.
A full Photo CD image would then include 2048 * 3072 * 24 bits = 18.87 MB. In order to reduce the file size, the chroma channels are two-times undersampled in both the vertical and horizontal directions. Thus, with undersampling, the image is reduced in size to 9.4 MB.
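The two file sizes quoted above follow directly from the image dimensions. A short check, using decimal megabytes (10^6 bytes) and illustrative variable names:

```python
WIDTH, HEIGHT = 3072, 2048       # pixels per line, lines
BYTES_PER_COMPONENT = 1          # 8 bits each for luminance and two chroma

# Full resolution in all three components: 24 bits per pixel.
full_bytes = WIDTH * HEIGHT * 3 * BYTES_PER_COMPONENT

# Chroma undersampled by two both horizontally and vertically.
luma_bytes = WIDTH * HEIGHT * BYTES_PER_COMPONENT
chroma_bytes = 2 * (WIDTH // 2) * (HEIGHT // 2) * BYTES_PER_COMPONENT
subsampled_bytes = luma_bytes + chroma_bytes

print(round(full_bytes / 1e6, 2), "MB full")
print(round(subsampled_bytes / 1e6, 1), "MB with chroma undersampling")
```

Each chroma plane shrinks to a quarter of its size, so the total drops from three full planes to one and a half, exactly half the original.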
A clever hierarchical form is used to compress the data. Three low-resolution images are stored uncompressed. These low resolution images are Base = 768 pixels x 512 lines, Base/4 = 384 pixels x 256 lines, and Base/16 = 192 pixels x 128 lines. Higher resolution images (4Base and 16Base) are formed by successively interpolating and correcting high resolution residuals relative to the base file.
For example, to display a 4Base image, the Base image is interpolated along both the vertical and horizontal directions to obtain an uncorrected 1536 pixels by 1024 line image. The 4Base residual is then decompressed from its Huffman encoded form and used to correct the interpolated high resolution image. The result is a high resolution image at a fraction of the storage space.
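The interpolate-and-correct idea can be illustrated on a tiny single-channel image. This is only a sketch of the principle: nearest-neighbour upsampling and simple decimation stand in for whatever filters Photo CD actually uses, the Huffman coding of the residual is omitted, and all function names are illustrative.

```python
def downsample2x(image):
    """Keep every second pixel of every second line (stand-in decimation)."""
    return [row[::2] for row in image[::2]]


def upsample2x(image):
    """Nearest-neighbour 2x interpolation (stand-in for the real filter)."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def make_residual(full, base):
    """Difference between the true image and the interpolated base."""
    predicted = upsample2x(base)
    return [[f - p for f, p in zip(frow, prow)]
            for frow, prow in zip(full, predicted)]


def reconstruct(base, residual):
    """Interpolate the base image, then correct it with the residual."""
    predicted = upsample2x(base)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predicted, residual)]


full = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(4)]
base = downsample2x(full)
residual = make_residual(full, base)
print(reconstruct(base, residual) == full)
```

For smooth images the residual values cluster near zero, which is what makes an entropy code such as Huffman coding effective on them.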
Photo CD images are stored on the disk in conjunction with a construct called an Image Pac. The Image Pac contains photofinishing information, the encoded form of the image, and some microcontroller readable sectors to permit low end machines to read the image.
An Image Pac ranges from 4-6 MB in size. Additionally, the Photo CD contains an overview Pac with reduced copies of all of the Image Pacs on the disk.