Error control

From Citizendium
Revision as of 15:00, 23 August 2008 by Howard C. Berkowitz
This article is developing and not approved.

In both networking and computer storage devices, a family of techniques called error control is used. There is no single best strategy, because different applications that use the data respond differently to errors. Indeed, different kinds of error can have different effects on the same system.

The core technology is error detection. Its most common case is finding that a bit has changed in transmission or storage, corrupting the data structure that contains it. In some cases, such as Voice over Internet Protocol (VoIP), it is quite adequate to silently discard an occasional unit containing errored bits. When the digitized voice is converted back to sound, the human ear is quite tolerant of an occasional interruption. Hearing is much less tolerant of variations in the delay between quanta of sound, but delay variations are not generally considered within the scope of error control.

If the application cannot tolerate bit errors, the next question is how to correct the error. The most common method, at least in networking, is to retransmit the data received in error, until it is received correctly or other mechanisms determine that the communications channel is unusable.

Error detection

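Perhaps the most basic error detection method is parity checking. Assume that the units of information come in groups of 7 data bits. On transmission, the sender counts the number of "one" bits in the data. If that total is odd, the 8th "parity bit" is set to one; if the total is even, the parity bit is set to zero. Either way, the total number of one bits in the 8-bit group, including the parity bit, is then even, so this convention is called "even parity". Under "odd parity" the rule is reversed, so that the total is always odd.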

At the other end of the channel, the receiver counts the "one" bits in the received data, computes their parity, and compares it to the parity bit. If the parity bit does not match the parity of the data bits, the entire group of 8 bits is assumed to be errored. This also covers the case where the data bits are actually correct but the parity bit itself was corrupted.
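The parity scheme described above (parity bit set to one when the 7 data bits contain an odd number of one bits) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes each group is held in an ordinary integer with the parity bit in the eighth, high-order position, and the function names `parity_bit`, `encode`, and `check` are hypothetical.

```python
def parity_bit(data7: int) -> int:
    """Return 1 if the 7 data bits contain an odd number of one bits, else 0.

    Setting the eighth bit this way makes the total number of one bits
    in the 8-bit group even (even parity).
    """
    if not 0 <= data7 < 0x80:
        raise ValueError("expected a 7-bit value")
    return bin(data7).count("1") % 2


def encode(data7: int) -> int:
    """Sender: place the parity bit in the eighth (high-order) position."""
    return (parity_bit(data7) << 7) | data7


def check(byte: int) -> bool:
    """Receiver: recompute the parity of the data bits and compare with bit 8.

    A mismatch marks the whole 8-bit group as errored, whether the fault
    is in a data bit or in the parity bit itself.
    """
    data7 = byte & 0x7F
    received_parity = (byte >> 7) & 1
    return parity_bit(data7) == received_parity
```

For example, `encode(0b1010001)` sees three one bits in the data, so it sets the parity bit and returns `0b11010001`. Flipping any single bit of that result, including the parity bit, makes `check` return `False`.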

Simple parity has fairly basic limitations. It can detect a single-bit error (more generally, any odd number of bit errors), but if two bits are changed, the parity remains the same and the error goes undetected.
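An exhaustive check makes this limitation concrete. The sketch below assumes 7 data bits plus an even-parity bit, held together in one 8-bit integer; the helper names `parity_ok` and `detection_counts` are hypothetical. It flips every single bit and every pair of bits in one valid group and counts how many of those corruptions the parity check notices.

```python
from itertools import combinations


def parity_ok(byte: int) -> bool:
    """True if the 8-bit group has an even number of one bits (even parity)."""
    return bin(byte & 0xFF).count("1") % 2 == 0


def detection_counts(codeword: int) -> tuple:
    """Count detected single-bit and double-bit errors for one valid group."""
    if not parity_ok(codeword):
        raise ValueError("start from a correctly encoded group")
    # Flip each of the 8 bits on its own.
    singles = sum(not parity_ok(codeword ^ (1 << i)) for i in range(8))
    # Flip each of the 28 possible pairs of bits.
    doubles = sum(
        not parity_ok(codeword ^ (1 << i) ^ (1 << j))
        for i, j in combinations(range(8), 2)
    )
    return singles, doubles
```

Starting from any valid group, such as `0b11010001`, `detection_counts` returns `(8, 0)`: all 8 single-bit errors are caught and none of the 28 double-bit errors are.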

There are a variety of more powerful error detection algorithms, such as cyclic redundancy checks (CRCs), which produce error-checking fields longer than one bit; 16- or 32-bit fields are common. With some mechanisms, a unit of data found to be in error is simply discarded. Alternatively, some methods can reconstruct the correct information; to do that, redundant error-control bits must be sent with the data. There is a constant tradeoff between the overhead of sending, with every unit of data, enough information to reconstruct it, and the cost of simply having the errored data retransmitted.
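As an illustration of a longer error-checking field, the sketch below appends a 32-bit cyclic redundancy check (CRC-32), computed with Python's standard `zlib.crc32`, to each unit of data. The receiver discards any unit whose check fails. The framing (CRC in the last four bytes, big-endian) and the names `add_crc` and `strip_crc` are assumptions for this example, not the format of any particular protocol.

```python
import zlib
from typing import Optional


def add_crc(payload: bytes) -> bytes:
    """Sender: append a 4-byte CRC-32 of the payload (big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def strip_crc(frame: bytes) -> Optional[bytes]:
    """Receiver: return the payload if the CRC matches, else None (discard)."""
    if len(frame) < 4:
        return None  # too short to even hold the check field
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received else None
```

Unlike single-bit parity, a CRC-32 catches the two-bit error that parity misses. For example, flipping the low bit of both of the first two bytes of `add_crc(b"hello")` makes `strip_crc` return `None`.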

Different strategies apply to storage devices and communications networks. If there was an error in writing data to disk, or if the disk has become corrupted, repeated reads will keep returning the same faulty data. In networks, however, the error quite likely took place in transmission, so retransmitting the data may well deliver a correct copy.

Error correction

As mentioned, there are a number of methods of correcting errors, each of which has its own performance tradeoffs.

Retransmission

One of the most basic retransmission methods is called "stop and wait", or "ACK-NAK", ACK standing for acknowledgement and NAK for negative acknowledgement. In this method, the data are sent with an error-detecting field. The transmitter will not send another unit of data until it receives a positive acknowledgement that the data were received correctly. A given error-correcting protocol may or may not use negative acknowledgements, which are rarely used.
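A minimal stop-and-wait exchange can be sketched as below. It assumes a hypothetical in-memory channel that may corrupt frames but never loses them, and a receiver that answers every frame with either an ACK or a NAK. CRC-32 from Python's `zlib` stands in for the error-detecting field. The names `frame`, `receiver`, `stop_and_wait`, and the `corrupt_on` parameter, which scripts which transmissions are damaged, are illustrative only.

```python
import zlib


def frame(data: bytes) -> bytes:
    """Attach a CRC-32 error-detecting field to a unit of data."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def receiver(received: bytes, delivered: list) -> str:
    """Check the frame; deliver good data and reply ACK, otherwise NAK."""
    data, crc = received[:-4], int.from_bytes(received[-4:], "big")
    if zlib.crc32(data) == crc:
        delivered.append(data)
        return "ACK"
    return "NAK"


def stop_and_wait(units, corrupt_on=frozenset(), max_tries=5):
    """Send each unit, waiting for an ACK before sending the next one.

    corrupt_on holds the 0-based transmission numbers that the simulated
    channel damages by flipping one bit.
    Returns (delivered units, total transmissions).
    """
    delivered, transmissions = [], 0
    for data in units:
        for _ in range(max_tries):
            wire = bytearray(frame(data))
            if transmissions in corrupt_on:
                wire[0] ^= 0x01  # simulated bit error in transit
            transmissions += 1
            if receiver(bytes(wire), delivered) == "ACK":
                break  # positive acknowledgement: go on to the next unit
            # NAK: loop round and retransmit the same unit
        else:
            raise RuntimeError("too many errors; channel considered unusable")
    return delivered, transmissions
```

For instance, `stop_and_wait([b"a", b"b"], corrupt_on={1})` damages the first copy of `b"b"`, draws a NAK, resends it, and returns `([b"a", b"b"], 3)`.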

Even in systems with explicit acknowledgements, the transmitter starts a timer when it sends the data. If the timer expires before the data have been acknowledged, the data are retransmitted. Conceivably, if a NAK could be delivered much sooner than the retransmission timer would expire, there might be a performance benefit, but the transmitter still needs the timer to guard against an ACK or NAK being lost on the return path. The Transmission Control Protocol (TCP) is a common example of a protocol in which only positive acknowledgements are sent.
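The role of the retransmission timer can be shown with a small simulation. The sketch is hypothetical throughout: time is counted in abstract ticks, a successful round trip is assumed to take `rtt` ticks, the timer is assumed to expire after `timeout` ticks, and the two lists script whether the data and the returning ACK survive each attempt. When an ACK is lost, the receiver ends up with a second copy of the same data, which is one reason protocols number their units.

```python
def send_with_timer(forward_ok, ack_ok, timeout=3, rtt=1, max_tries=5):
    """Stop-and-wait transmission of one unit, driven by a retransmission timer.

    forward_ok[i] / ack_ok[i]: whether the data / its ACK survive attempt i.
    Returns (attempts, copies_received, elapsed_ticks).
    """
    elapsed = 0
    copies_received = 0
    attempts = list(zip(forward_ok, ack_ok))[:max_tries]
    for attempt, (data_arrives, ack_arrives) in enumerate(attempts, start=1):
        if data_arrives:
            copies_received += 1  # receiver gets a copy, possibly a duplicate
            if ack_arrives:
                elapsed += rtt  # ACK returns before the timer expires
                return attempt, copies_received, elapsed
        elapsed += timeout  # timer expired with no ACK: retransmit
    raise TimeoutError("no acknowledgement; channel considered unusable")
```

With `send_with_timer([False, True, True], [True, False, True])`, the first copy of the data is lost, the second arrives but its ACK is lost, and the third succeeds. The result is `(3, 2, 7)`: three attempts, two copies at the receiver, and seven ticks elapsed.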

Stop-and-wait is inherently inefficient, especially when traffic flows in both directions, since the sender has to wait while its data are transmitted and checked and the response is transmitted back. There are several techniques, which can be used in combination, to increase efficiency. All require that the units of information be numbered, with a separate sequence number space for each direction of transmission.
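Numbering is what lets a receiver recognise a retransmitted duplicate. The sketch below is a minimal receiver for one direction of transmission. It assumes unbounded integer sequence numbers starting at zero, whereas real protocols use a finite, wrapping number space. Each reply is a cumulative acknowledgement naming the next sequence number the receiver expects. The class and method names are hypothetical.

```python
class Receiver:
    """Receiver for one direction, using sequence numbers to drop duplicates."""

    def __init__(self):
        self.expected = 0  # sequence number of the next new unit
        self.delivered = []

    def on_frame(self, seq: int, data: bytes) -> int:
        """Accept a frame and return the acknowledgement to send back."""
        if seq == self.expected:
            self.delivered.append(data)  # new, in-order unit: deliver it
            self.expected += 1
        # seq < expected: a duplicate caused by a lost ACK; discard the data
        # but acknowledge again so the sender stops retransmitting.
        # seq > expected: out of order; this simple receiver discards it.
        return self.expected
```

If the ACK for unit 0 is lost and the sender resends it, `on_frame(0, ...)` simply returns `1` again without delivering the data a second time.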

Assuming a TCP connection between A and B, as B receives traffic from A and sends its own data to A, the messages it sends can contain an acknowledgement of the number of units of data that have been successfully received from A. This "piggybacking" technique allows data and acknowledgements to flow simultaneously.
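Piggybacking can be sketched as an endpoint that keeps two counters: one numbering the data it sends, and one tracking the data it has received from its peer. The two directions therefore use separate number spaces. Every outgoing frame carries the endpoint's own sequence number plus an acknowledgement of the peer's data. The `Frame` and `Endpoint` layouts are assumptions for illustration; TCP's actual header numbers bytes rather than units and contains many more fields.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    seq: int  # sender's sequence number for this unit of data
    ack: int  # next unit the sender expects from its peer (cumulative ACK)
    data: bytes


@dataclass
class Endpoint:
    next_seq: int = 0  # numbering of the data this endpoint sends
    next_expected: int = 0  # numbering of the data received from the peer
    peer_acked: int = 0  # how many of our units the peer has confirmed
    received: list = field(default_factory=list)

    def send(self, data: bytes) -> Frame:
        """Build a data frame that also piggybacks an ACK for the peer."""
        frame = Frame(seq=self.next_seq, ack=self.next_expected, data=data)
        self.next_seq += 1
        return frame

    def receive(self, frame: Frame) -> None:
        """Take in the peer's data and the acknowledgement riding with it."""
        if frame.seq == self.next_expected:
            self.received.append(frame.data)
            self.next_expected += 1
        self.peer_acked = max(self.peer_acked, frame.ack)
```

In a short exchange, A sends two units to B. B's first data frame back, `b.send(b"B0")`, is `Frame(seq=0, ack=2, data=b"B0")`, so A learns that both of its units arrived without B sending a separate acknowledgement.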

Redundant transmission

Forward error correction