Sampling and Nyquist's Theorem for Audio and Video

Updated 10/29/03

All line doublers, good comb filters, and DVD players process video as digital and must sooner or later convert it back into analog form.

This is an explanation of why video resolution and audio frequency response might not be as great as Nyquist's theorem and sampling theory would otherwise suggest.

**In a Nutshell**

All analog to digital conversion of audio and video is accomplished by sampling. Nyquist's theorem states that it is only necessary to take samples at a rate of slightly over two per cycle of the highest frequency component of the source analog signal.

That is only half of the story. The other half is that when the highest frequency content of the subject is close to half the sampling rate, the process of turning the samples (digital representation) back into a waveform or video scan line (analog representation) is complicated and sometimes not done right.

For video, we take samples by dividing up the picture into a grid of evenly spaced spots we call pixels. The highest frequency component stands for the finest details. A cycle consists of one dark detail followed by one light detail.

The pitfall in video or audio reproduction is that the analog signal might be recovered incorrectly, simply by "connecting the dots" (the samples). I believe that the terminology used is "analog domain" for the original waveform and "digital domain" for the conglomeration of samples (pixels in the case of video). Proper conversion uses what I believe are called sinc functions (not sine, not sync) to transform the video signal back into the analog domain. Simply connecting the dots with nearly straight lines that together make a semi-smooth overall curve produces an analog signal, but it is really still in the digital domain; the pixel footprint remains impressed on the video. This explains why we lose resolution when samples straddle rather than coincide with details.
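The difference between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: the test signal (a sine wave at 0.4 cycles per sample), the function names, and the choice to evaluate halfway between two samples are all made up for the example, and a finite run of samples stands in for the infinite run that the ideal sinc sum calls for.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct_sinc(samples, t):
    """Sum one sinc curve per sample; t is measured in sample periods."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

def reconstruct_linear(samples, t):
    """Connect the dots: a straight line between the two nearest samples."""
    n = int(math.floor(t))
    frac = t - n
    return samples[n] * (1.0 - frac) + samples[n + 1] * frac

FREQ = 0.4  # cycles per sample (assumed test value; the limit is 0.5)
samples = [math.sin(2 * math.pi * FREQ * n) for n in range(2001)]

t = 1000.5  # halfway between two samples, near the middle of the run
true_value = math.sin(2 * math.pi * FREQ * t)
sinc_error = abs(reconstruct_sinc(samples, t) - true_value)
linear_error = abs(reconstruct_linear(samples, t) - true_value)

print(f"sinc error:   {sinc_error:.4f}")
print(f"linear error: {linear_error:.4f}")
```

In this setup the straight-line value misses the true waveform by a large fraction of its amplitude, while the sinc sum lands very close to it.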

**A Few Rules:**

1. Fourier's theorem states that any complex waveform is the sum of sinusoids (sine waves, the simplest kind of waveform). Any cycle of the subject waveform can be selected and considered to be one cycle of a periodic (repeating) waveform where the fundamental frequency is represented by a sine wave of that wavelength. If the selected cycle is not itself a sine wave, then the other sine waves making up the sum are all multiples of the fundamental frequency.

2. Nyquist's theorem states that, if we sample the complex waveform uniformly at a rate just a tad over twice the frequency of the highest frequency component sine wave contained within, the conglomeration of samples thus obtained is sufficient information to reconstruct the waveform.

3. A mathematical function, when given a particular set of inputs, always produces the same result. Although more than one set of inputs can yield the same result, one set of inputs cannot cause a function to deliver one output some of the time and another output some other time (unless time is itself one of the inputs).

The Nyquist frequency, which is half of the sampling frequency, is the frequency that no part of the source material may attain or exceed.

Consider the following situation:

Let's pretend that in the left diagram below we are sampling at very slightly over twice the frequency of the subject waveform, and in the fashion as shown. (For video pretend that black is "up" and white is "down".) We'll have to use some imagination here because the computer monitor can't reproduce the diagram with enough accuracy to show the difference between sampling at exactly twice the frequency (not high enough) and sampling at slightly over twice the frequency as we wish to discuss.

Suppose that we reconstruct the analog output by "connecting the dots" as in the center diagram. Here's a secret: the angular waveform in the center then has a fundamental frequency equal to half the sampling frequency (not exactly the frequency of the original waveform). It also possesses many harmonics, or sine waves whose frequencies are multiples of the fundamental. (Restating Rule 1 above, any repeating waveform that is itself not a plain sine wave has harmonics of its fundamental frequency.) A real life system with analog circuits has a limited bandwidth. For the highest fundamental frequencies the system is designed for or constrained to, the harmonics won't pass through, so content whose fundamental frequency is more than about half the bandwidth is reduced to (de facto low pass filtered into) pure sine waves, represented by the waveform on the right.
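The harmonic content of an angular waveform can be checked numerically. The sketch below assumes an idealized triangle wave as a stand-in for the connect the dots output and measures its harmonics with a plain discrete Fourier sum; the names `triangle` and `harmonic_amplitude` and the 1024-point resolution are choices made for this example only.

```python
import math

N = 1024  # points across one cycle of the angular waveform (assumed)

def triangle(x):
    """Straight lines from -1 at x=0 up to +1 at x=0.5 and back down."""
    return 1.0 - 4.0 * abs(x - 0.5)

wave = [triangle(n / N) for n in range(N)]

def harmonic_amplitude(signal, k):
    """Amplitude of the k-th harmonic from a direct discrete Fourier sum."""
    count = len(signal)
    re = sum(v * math.cos(2 * math.pi * k * n / count)
             for n, v in enumerate(signal))
    im = sum(v * math.sin(2 * math.pi * k * n / count)
             for n, v in enumerate(signal))
    return 2.0 * math.hypot(re, im) / count

amplitudes = {k: harmonic_amplitude(wave, k) for k in range(1, 8)}
for k, a in amplitudes.items():
    print(f"harmonic {k}: amplitude {a:.4f}")
```

For this shape the odd harmonics (3, 5, 7) come out nonzero and shrink roughly as one over the square of the harmonic number, while the even ones vanish. A plain sine wave would show only harmonic 1.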

Proper reconstruction of the original waveform can be expressed in English as "curve fitting". For this latest discussion, let's again declare (or at least imagine) that the dots depicting the samples in the above diagram represent a sampling frequency slightly greater than twice the original analog signal frequency, as we explained earlier. The curve we want to fit must hit all the dots and represents the finished waveform, but we have an added constraint:

4. The finished reconstructed waveform must not possess any content (such as harmonics) greater than or equal to half the sampling frequency, as would be revealed if that waveform were re-decomposed in accordance with Fourier's theorem.

If we did not have Rule 4, then more than one curve could be found that went through all the samples and we would have violated Rule 3.

Also if Rule 2 was obeyed, a final output signal that violated Rule 4 could not be the same as the input.

Thus I suppose (I am not a Fourier and Nyquist expert) that the one and only curve that obeys Rule 4 and connects all the dots is the original waveform.

If the output was the angular waveform in the middle, using what we explained earlier, it clearly violates Rule 4 due to the harmonic content. We can't see from the small diagrams above but if we re-use the rightmost diagram to represent exactly half the sampling frequency itself, it too as a finished reconstructed waveform does not satisfy Rule 4; the highest frequency allowed in the original material is an infinitesimal tad less.

**Lookahead**

In reconstructing the original analog waveform (and also for manual curve fitting using French curve templates), the process can't take just one dot at a time but instead must size up several upcoming dots (looking ahead) at all times. For example we may work on samples 1-8, then work on samples 2-9, then 3-10, and so on. The closer any of the actual frequency content is to half the sampling frequency, the more lookahead is needed (I don't know the formula for how much). Insufficient lookahead means that the reconstructed waveform might come out different from the original; for example, the original signal on the left in the above diagram might still be reconstructed as the waveform on the right. The less lookahead there is, the greater the chance that the reconstruction logic runs into a situation where it cannot "connect the dots" and still conform with Rule 4. What usually happens is that the reconstruction logic inserts a discontinuity (a sudden bend that violates Rule 4) and then keeps on going.
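The effect of too little lookahead can be illustrated with a rough experiment. This sketch assumes the simplest possible scheme, a sinc sum cut off after a fixed number of samples on each side of the point being rebuilt; real converters use more refined filters, and the function name and test frequencies here are invented for the example.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def midpoint_relative_error(freq, lookahead):
    """Rebuild cos(2*pi*freq*t) at t = 0.5 using `lookahead` samples per side.

    freq is in cycles per sample, so 0.5 is half the sampling frequency.
    Returns the error as a fraction of the true value.
    """
    t = 0.5
    estimate = 0.0
    for n in range(1 - lookahead, lookahead + 1):
        sample = math.cos(2 * math.pi * freq * n)
        estimate += sample * sinc(t - n)
    true_value = math.cos(2 * math.pi * freq * t)
    return abs(estimate - true_value) / abs(true_value)

for freq in (0.25, 0.45):
    for lookahead in (4, 16, 64):
        err = midpoint_relative_error(freq, lookahead)
        print(f"freq {freq:.2f}, {lookahead:2d} samples per side: "
              f"relative error {err:.3f}")
```

With only four samples per side, content at 0.45 cycles per sample is rebuilt far less accurately than content at 0.25, and it takes many more samples per side to bring the error down.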

**Output Oversampling**

Some D/A conversion techniques use digital processing where several graduated steps, or subsamples, are used to shape the output waveform between the samples that underwent digital processing. Were it not for these graduated steps, that process could not do better than a rudimentary "connect the dots" process that violates Rule 4. This process with graduated intermediate steps is referred to as output oversampling.

**Output Low Pass Filtering**

Even though it may make an excellent attempt to conform with Rule 4, the D/A conversion may still produce an output waveform with spurious high frequency content. Jaggies, stairstepping, squared off peaks, and peaks of the right width but too sharp, all visible if the waveform is graphed, are examples of such "noise". For example, the process using output oversampling as described immediately above will still have tiny jaggies in the waveform emerging from the D/A converter. The process is still considered successful when an analog low pass filter follows and the output after that conforms with Rule 4.

Of course the analog waveform is reconstructed from the samples using mathematical formulas, not pure trial and error.

To conform with Rule 2, the original material must not possess any harmonic content in excess of (or equal to) half the sampling frequency. It is necessary to low pass filter the input material to prevent violation of this rule. Should noise entering the system after this filtering represent higher frequency content, the samples can be corrupted and at the other end, the D/A conversion, the finished waveform that obeys Rules 3 and 4 above will not match the input.

It is common in audio applications for sample values to be accurate to 16 bits or one part in 65536, or for video applications, just 8 bits or one part in 256. When the content is very close to half the sampling frequency, slight inaccuracies in the samples can result in major inaccuracies in the finished analog waveform. There could even be large peaks in the output that were not present in the original material.
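To put numbers on "one part in 65536" versus "one part in 256", here is a small sketch of rounding a sample to a given number of bits. It assumes sample values scaled to the range 0 to 1 and simple rounding to the nearest level; the `quantize` helper and the sample value are hypothetical.

```python
def quantize(value, bits):
    """Round a value in the range 0..1 to the nearest of 2**bits levels."""
    top_code = 2 ** bits - 1
    return round(value * top_code) / top_code

original = 0.123456789  # hypothetical sample value
errors = {}
for bits in (8, 16):
    stored = quantize(original, bits)
    errors[bits] = abs(stored - original)
    print(f"{bits:2d} bits: {2 ** bits:6d} levels, "
          f"stored {stored:.6f}, error {errors[bits]:.2e}")
```

The worst case error is half of one step, so the 8 bit video sample can be off by hundreds of times more than the 16 bit audio sample.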

The closer the highest frequency content comes to half the sampling frequency, the more complicated the D/A conversion calculations must be. Practically speaking, this means more expensive circuitry. And the more correct the D/A conversion is, the more likely and more profound artifacts from noise and inaccurate sampling will be with respect to the highest frequency content.

Thus noise, sampling inaccuracy, and D/A conversion accuracy all interact to govern the quality of the output. By choosing a higher sampling frequency relative to the highest frequency in the content, inaccuracies in sampling and less than perfect D/A conversion become less important.

Now back to the world of video, and the bad news.

5. Digital to analog converters, like comb filters and line doublers, come in different levels of quality. Nowadays every DVD player (and every line doubler) needs a D/A converter to get the video back into analog form to send down a component video or S-video cable. The ultra simple D/A converters merely connect the dots, that is, they look ahead zero pixels beyond the pixel being worked on.

6. The video source material may contain details finer than the pixel spacing. In mathematical, Fourier, and Nyquist terms, that means the material being sampled contains frequencies in excess of half the sampling frequency. When this happens, all bets are off regarding the ability to recreate the input from the samples. Incidentally, recording the video as analog at first and low pass filtering it will remove all details that are too fine to sample.

7. Video samples typically have an accuracy of no better than one part in 256. (The luminance or dark/light value for each pixel on DVD is an 8 bit number.) More lookahead is then needed to properly reconstruct the highest frequencies, and if the high frequency content is only brief (fine detail for a short horizontal distance across the screen) accurate reconstruction is not possible.

8. As part of the MPEG compression used to make the entire movie fit on the disk, the samples (one sample equals one pixel) may be manipulated or substituted with similar but not identical samples from another field or frame. Then turning them back into the analog waveform, even with perfect Fourier formulas, can no longer produce the original waveform. However if simple connect the dots D/A conversion were used, the difference between the incorrect results obtained from manipulated and compressed data and the incorrect results obtained from unmanipulated and uncompressed data is less profound.

**Another "Connect the Dots" Example**

Let's imagine a picket fence subject where thirteen pixels correspond to six pickets and six spaces between pickets. What we see with connect the dots conversion is portions of the fence with pickets appearing slightly closer together than they really are (at the pixel spacing, not the picket spacing) and other portions of the fence blurred out where the pixels and pickets are out of phase. A correct D/A conversion would show the fence with all the pickets nice and crisp at their original pitch (the video signal is analog when it gets to the picture tube).

Here is a diagram showing details going in and out of phase with the pixels. Actually the nature of analog video is such that, near the resolution limit, the reproduced bars will never be as crisp as the top row in the diagram below. The reproduction at the extreme lower left and lower right is excellent. The purpose of the diagram is to show the profound loss of detail in some places due to pixel straddling.
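The picket fence numbers can be worked through directly. This sketch assumes each pixel records the average brightness across its own width (1 for picket, 0 for gap) and that the first picket starts exactly at the left edge of the first pixel; both are simplifications chosen for the example, as are the names used.

```python
from fractions import Fraction

PIXELS = 13
CYCLES = 6  # six pickets plus six gaps across the same width

def picket_coverage(pixel):
    """Fraction of one pixel's width that is covered by pickets."""
    start = Fraction(CYCLES * pixel, PIXELS)      # left edge, in picket cycles
    end = Fraction(CYCLES * (pixel + 1), PIXELS)  # right edge
    covered = Fraction(0)
    for k in range(CYCLES):
        # Picket k occupies the first half of cycle k.
        lo = max(start, Fraction(k))
        hi = min(end, k + Fraction(1, 2))
        if hi > lo:
            covered += hi - lo
    return covered / (end - start)

pixel_values = [picket_coverage(i) for i in range(PIXELS)]
contrast = [abs(pixel_values[i + 1] - pixel_values[i])
            for i in range(PIXELS - 1)]

print("pixel values:", " ".join(f"{float(v):.2f}" for v in pixel_values))
print("contrast:    ", " ".join(f"{float(c):.2f}" for c in contrast))
```

Under these assumptions the pixels at both ends alternate between nearly full picket and nearly full gap, while the pixels in the middle all sit near half brightness, so the pickets there blur into gray.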

**The good news:** Connect the dots D/A conversion doesn't look too bad

Nobody counts the pickets in a fence when watching a movie.

If details were lost to pixel straddling in one video frame, chances are they will show up in the next frame in normal motion.

**Foldover Aliasing**

What happens, if we may ask, if we sample material that does contain content greater than half the sampling frequency?

It so happens that if we sample at twice X hertz (so the highest frequency that is supposed to be present is less than X hertz) and the source material happens to contain X+Y hertz, the samples obtained will be the same as if the source material contained X-Y hertz instead. For example, suppose we sample some audio at 10 kHz. (For telephones, we don't need frequencies greater than 5 kHz.) If by chance there was some 6 kHz content, the D/A converter will recreate the analog audio with 4 kHz (5 minus 1 kHz) content instead of the original 6 kHz (5 plus 1 kHz), in accordance with Rule 4 above. For audio, this produces sour notes. So for audio, it is necessary to low pass filter the original source material prior to A/D conversion.

The phenomenon of frequencies that are more than half the sampling frequency coming back as frequencies that are less than half the sampling frequency is referred to as foldover. It is a kind of aliasing in that the result after conversion back to analog looks like something else, namely what could have been another original source with such lower frequencies already present.
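The 6 kHz and 4 kHz example is easy to verify. The sketch below samples two cosine waves at 10 kHz and compares the results; the helper names are made up, and cosines are used because with sine waves the folded samples come out with the sign flipped rather than identical.

```python
import math

SAMPLE_RATE = 10_000  # hertz, as in the telephone example

def folded_frequency(freq_hz, sample_rate_hz):
    """Frequency between 0 and half the sample rate giving the same samples."""
    f = freq_hz % sample_rate_hz
    return sample_rate_hz - f if f > sample_rate_hz / 2 else f

def sample_cosine(freq_hz, count=20):
    """Take `count` samples of a cosine wave at SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(count)]

six_khz = sample_cosine(6_000)
four_khz = sample_cosine(4_000)
largest_difference = max(abs(a - b) for a, b in zip(six_khz, four_khz))

print("6 kHz folds to", folded_frequency(6_000, SAMPLE_RATE), "Hz")
print(f"largest difference between the sample sets: {largest_difference:.1e}")
```

The two sets of samples agree to within floating point rounding, so nothing downstream of the sampler can tell the 6 kHz tone from a 4 kHz one.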

We won't explain thoroughly here the reason for wagon wheels seemingly rotating in reverse as seen in the movies which is also a situation where the sampling frequency is too low. But as a hint we will state that the sampling frequency is the frames per second rate and the frequency we are trying to reproduce is the rate at which spokes pass the "6 o'clock position" on the wheel.
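For the curious, the hint can be turned into arithmetic. This sketch assumes a film rate of 24 frames per second and a hypothetical wheel with 12 spokes turning 1.9 times per second; none of these numbers come from a particular movie, and the function name is invented for the example.

```python
def apparent_rate(true_rate, sample_rate):
    """Fold a rate into the range -sample_rate/2 .. +sample_rate/2."""
    half = sample_rate / 2
    return (true_rate + half) % sample_rate - half

FRAMES_PER_SECOND = 24  # assumed film frame rate (the sampling frequency)
SPOKES = 12             # hypothetical wheel
turns_per_second = 1.9  # hypothetical speed

# Rate at which spokes pass the 6 o'clock position.
spoke_rate = SPOKES * turns_per_second
apparent_spoke_rate = apparent_rate(spoke_rate, FRAMES_PER_SECOND)
apparent_turns = apparent_spoke_rate / SPOKES

print(f"true spoke rate:     {spoke_rate:.1f} per second")
print(f"apparent spoke rate: {apparent_spoke_rate:.1f} per second")
print(f"apparent rotation:   {apparent_turns:.2f} turns per second")
```

A negative result means the wheel appears to turn backward: the spoke rate is above half the frame rate, so it folds over to a slow rate in the opposite direction.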

**A De-Rating Factor for Digital Video**

In this writer's opinion, we need to take three samples for each light and dark detail pair (line pair; waveform cycle) for still pictures such as from a computer scanner. Thus a photograph that supposedly has at most 200 dots per inch of resolution would need to be scanned at 300 DPI. In video the subject is usually in motion so details that are straddled in one video frame would likely be clear in the next video frame. The Kell factor represents the ratio between lines of resolution and pixels (spanning the same distance used for reference). It is subjective, and is said to be about 0.7 for still pictures and early (1950's) monochrome video, and about 0.85 to 0.9 for full motion video.
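The de-rating arithmetic above can be written out as follows. The three samples per line pair figure is this writer's opinion as stated, and the 480 pixel example fed to the Kell factor calculation is a hypothetical value chosen only to show the multiplication.

```python
SAMPLES_PER_LINE_PAIR = 3  # suggested figure for still pictures

def scan_dpi_needed(detail_dpi):
    """Two dots (one dark, one light) make a line pair; sample each pair 3 times."""
    line_pairs_per_inch = detail_dpi / 2
    return line_pairs_per_inch * SAMPLES_PER_LINE_PAIR

def lines_of_resolution(pixels, kell_factor):
    """Usable lines of resolution for a given pixel count and Kell factor."""
    return pixels * kell_factor

print(scan_dpi_needed(200))  # 300.0

# Hypothetical 480 pixel span at the Kell factors mentioned in the text.
for kell in (0.7, 0.85, 0.9):
    print(f"Kell {kell}: about {lines_of_resolution(480, kell):.0f} lines")
```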

All parts (c) copyright 2000-2003, Allan W. Jayne, Jr. unless otherwise noted or other origin stated.
