Embedded Controller Hardware Design (Ken Arnold, 2000)

CHAPTER FOUR

Memory Technologies and Interfacing

Error Detection and Correction

Error detection circuitry stops an operation before erroneous data can be used, for example by triggering a parity error trap. Error correction, on the other hand, uses redundant data to reconstruct the original data when operation must continue without interruption. Error detection and correction are not often used in small systems, because of the relatively low probability of error and the high cost of the required hardware. In systems like PCs and workstations, the larger RAM arrays make error detection a minimum requirement, and systems requiring high reliability add error correction as well. In most PCs, a ninth bit in each byte stores parity information; if a parity error occurs, an interrupt trap stops operation and displays an error message.

There are two types of errors: hard errors and soft errors. If an error occurs only once, due to noise or a transient error condition, it is referred to as a soft error. A hard error is one that always occurs, such as a read/write memory bit that is stuck in one state and can’t be changed.

Error Sources

Hard errors are usually caused by a permanent hardware defect, while soft errors can be caused by any one of several events, including timing errors, synchronization problems, software bugs, or even the passage through an IC of a charged subatomic particle produced by the decay of trace radioactive materials. As the designer of an embedded system, you must allow for the occurrence of these events and minimize the severity of their effect on the overall system. To accomplish that goal, it is necessary, at a minimum, to detect the occurrence of such an event.

Confidence Checks

The confidence check is frequently used to detect these errors, and can be modified to correct certain subsets of them as well. Probably the best known of the detection techniques is parity; its widespread use is due to the simplicity of its implementation. In the most common form, a single bit containing the parity check bit is added to every word. The parity bit is set or cleared depending on whether there is an even or odd number of ones in the original word to be checked. Whenever the data is handled, its contents are checked against the parity bit. If any one bit in the word has changed, the parity of the data will no longer match the accompanying parity bit, indicating an error. Simple parity detects only an odd number of bit errors, so it relies on the assumption that at most one bit is in error. For a single byte or word, this is usually a reasonable assumption; for a large block of data, it is not. Horizontal parity refers to the parity of a single word of data, while vertical parity refers to the parity of one bit position across multiple words. The two are combined to form block parity, which assigns one parity bit to each word horizontally and one parity bit to each bit position in the block of words.
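The single-word parity scheme described above can be sketched in C. This assumes 8-bit words and odd parity; the function names are illustrative, not taken from the text:

```c
#include <stdint.h>

/* Compute an odd-parity bit for an 8-bit word: the bit is chosen so
   that the data bits plus the parity bit contain an odd number of ones. */
uint8_t odd_parity_bit(uint8_t word)
{
    uint8_t ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (word >> i) & 1;
    return (ones % 2 == 0) ? 1 : 0;   /* make the total count odd */
}

/* A stored byte is checked by recomputing parity on the data bits
   and comparing the result with the stored parity bit. */
int parity_error(uint8_t word, uint8_t stored_parity)
{
    return odd_parity_bit(word) != stored_parity;
}
```

Any single flipped bit changes the recomputed parity, so `parity_error` reports it; two flipped bits cancel out and go undetected, as noted above.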

Block parity allows the detection and correction of single bit errors. Since a single bit will cause one horizontal and one vertical parity error to occur, correcting the bit in error requires only complementing the bit belonging to the row and column corresponding to the parity errors. Note that multiple errors may not be corrected or even detected, depending on where they occur.

Here is an example using odd parity:

data:        horizontal parity:

1 0 1 1      p=0  odd horizontal parity
1 1 1 1      p=1  even horizontal parity + 1 = odd parity
1 0 0 1      p=1  even horizontal parity + 1 = odd parity
1 0 1 1      p=0  odd horizontal parity

1 0 0 1   <- the odd vertical parity bits for the four words above
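The row-and-column correction procedure can be sketched in C for 4-bit words as in the example. The function names, and the convention that the first vertical parity bit covers the most significant column, are illustrative assumptions:

```c
#include <stdint.h>

/* Odd-parity bit over the low 4 bits of a word. */
static uint8_t odd_par4(uint8_t w)
{
    uint8_t ones = 0;
    for (int i = 0; i < 4; i++)
        ones += (w >> i) & 1;
    return (ones & 1) ? 0 : 1;
}

/* Locate and fix a single flipped bit: the row whose horizontal parity
   fails and the column whose vertical parity fails intersect at the
   erroneous bit, which is corrected by complementing it.
   Returns 1 if a correction was made, 0 otherwise.
   vpar[0] covers the most significant bit column. */
int correct_single_bit(uint8_t data[4], const uint8_t hpar[4],
                       const uint8_t vpar[4])
{
    int bad_row = -1, bad_col = -1;

    for (int r = 0; r < 4; r++)
        if (odd_par4(data[r]) != hpar[r])
            bad_row = r;

    for (int c = 0; c < 4; c++) {          /* c = 0 is bit 3 (MSB) */
        uint8_t ones = 0;
        for (int r = 0; r < 4; r++)
            ones += (data[r] >> (3 - c)) & 1;
        uint8_t p = (ones & 1) ? 0 : 1;
        if (p != vpar[c])
            bad_col = c;
    }

    if (bad_row >= 0 && bad_col >= 0) {
        data[bad_row] ^= (uint8_t)(1 << (3 - bad_col));  /* complement it */
        return 1;
    }
    return 0;
}
```

With two errors in the same row the horizontal parity check passes, so only the columns flag a problem and no unambiguous correction is possible, matching the caveat above.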

Another version of parity checking is called Hamming code after its inventor, R.W. Hamming. It is a code in which multiple parity bits are appended to each word in such a way that a single bit error will generate a group of parity bits having a value equal to the data bit number in error.
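The syndrome-equals-position property can be illustrated with a small Hamming(7,4) sketch. The layout below (even parity bits at codeword positions 1, 2, and 4, data bits at positions 3, 5, 6, and 7) is one classic arrangement, assumed here for illustration; in it the recomputed parity bits form the 1-based position of the bit in error within the codeword:

```c
#include <stdint.h>

/* Extract the bit at 1-based codeword position pos (pos 1 = LSB). */
static uint8_t bit(uint8_t v, int pos) { return (v >> (pos - 1)) & 1; }

/* Encode 4 data bits (d3 d2 d1 d0) into a 7-bit Hamming codeword. */
uint8_t hamming74_encode(uint8_t d)
{
    uint8_t c = 0;
    c |= ((d >> 3) & 1) << 2;   /* data bit at position 3 */
    c |= ((d >> 2) & 1) << 4;   /* data bit at position 5 */
    c |= ((d >> 1) & 1) << 5;   /* data bit at position 6 */
    c |= (d & 1) << 6;          /* data bit at position 7 */
    /* each parity bit covers the positions whose index contains it */
    uint8_t p1 = bit(c, 3) ^ bit(c, 5) ^ bit(c, 7);
    uint8_t p2 = bit(c, 3) ^ bit(c, 6) ^ bit(c, 7);
    uint8_t p4 = bit(c, 5) ^ bit(c, 6) ^ bit(c, 7);
    return c | (p1 << 0) | (p2 << 1) | (p4 << 3);
}

/* Recompute the parity checks; the result is 0 for a clean codeword,
   otherwise the 1-based position of the single bit in error. */
int hamming74_syndrome(uint8_t c)
{
    int s1 = bit(c, 1) ^ bit(c, 3) ^ bit(c, 5) ^ bit(c, 7);
    int s2 = bit(c, 2) ^ bit(c, 3) ^ bit(c, 6) ^ bit(c, 7);
    int s4 = bit(c, 4) ^ bit(c, 5) ^ bit(c, 6) ^ bit(c, 7);
    return s1 | (s2 << 1) | (s4 << 2);
}
```

Correcting the error is then just complementing the bit at the syndrome position, and a zero syndrome means the word can be used as-is.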

A checksum is another technique that can be used to detect an error in a group of characters. The idea is simple enough: sum all the data words and keep the least significant bits of the sum. (For you math majors, that’s summing the data modulo 2^n, for n-bit words.) Checksums are frequently used by various types of memory and logic device programmers to verify that the desired program has been “burned” into the device. A checksum will detect some, but not all, of the common errors in a block of data. For example, it won’t detect errors due to the data being stored in the wrong sequence, since the sum of the numbers is the same regardless of the order. A practical