
6. Data compression
Information theory provides a method for determining exactly how many bits are required to specify a given message to a given precision. This method is called the theory of data compression or, more technically, rate-distortion theory. Often the message selected at the source need not (or cannot) be transmitted perfectly to the destination. For example, in a telephone conversation, extremely high sound quality is not necessary. It is usually sufficient that the two parties recognize and understand each other. Similarly, if the message is a scientific measurement, such as the value of the number π = 3.14159265…, it is not possible to transmit all of its digits in a finite amount of time. However, a useful approximation of π can be transmitted with a relatively small number of bits.
The general idea of data compression theory is illustrated in the graph below. The horizontal axis measures the distortion, or imprecision, that is tolerable in a given message. The vertical axis gives the minimum possible number of bits required to specify the message with this distortion. The graph shows that as the acceptable distortion becomes smaller, the required number of bits becomes larger; conversely, as the allowed distortion grows, the required number of bits decreases, ultimately reaching zero at the point where the allowed distortion can be achieved by merely guessing at the message.
[Figure: rate-distortion curve, plotting the minimum number of required bits against the allowed distortion]
Consider the following simple example. If the message is the outcome of the toss of an ordinary coin and it is acceptable to be wrong 50 percent of the time, then no bits are required. In this case, one can simply guess heads and be sure of being right 50 percent of the time, on average.
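The coin-toss example is the zero-rate endpoint of a standard result not spelled out in the text: for a fair coin under Hamming distortion (fraction of wrong guesses), the rate-distortion function is R(D) = 1 − H_b(D) for 0 ≤ D ≤ 1/2, where H_b is the binary entropy function. A minimal Python sketch of this formula (function names are mine, chosen for illustration):

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in bits; equals 0 at p = 0 or p = 1."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_binary(d):
    """Minimum bits per symbol to describe a fair-coin toss while being
    wrong at most a fraction d of the time.
    Standard result: R(D) = 1 - H_b(D) for 0 <= D <= 1/2."""
    if d >= 0.5:
        return 0.0  # guessing alone already achieves 50 percent error
    return 1.0 - binary_entropy(d)

print(rate_distortion_binary(0.0))  # 1 bit: the outcome must be sent losslessly
print(rate_distortion_binary(0.5))  # 0 bits: guessing suffices
```

At D = 0 the full bit must be transmitted; at D = 1/2 no bits are needed, matching the example above. In between, the curve falls smoothly, as the graph describes.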
The relationship shown by the rate-distortion curve represents the absolute minimum number of bits, on average, required to represent a message with a given distortion. The exact shape of the curve depends on the details of the particular situation. Shannon’s Fundamental Theorem of Data Compression states that, in principle, it is possible to compress a message down to the limit given by this curve, but no further. Exactly how this compression should be done Shannon did not say, but later researchers developed practical techniques for data compression based on his theorems. These techniques use mathematical algorithms, or step-by-step computational procedures, to compress data. Many of these methods come very close to achieving the ultimate limits set forth by information theory.
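One of the best-known practical techniques of the kind described here is Huffman coding, which assigns shorter codewords to more frequent symbols and approaches the entropy limit for symbol-by-symbol lossless coding. A minimal sketch, not any particular production implementation:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for the symbols in text: frequent symbols
    get shorter codewords. Returns a dict symbol -> bit string."""
    freq = Counter(text)
    # Heap entries are (frequency, tiebreak, tree); a tree is either
    # a single symbol or a (left, right) pair of subtrees.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # merge the two rarest trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # edge case: only one symbol
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
encoded = "".join(codes[c] for c in "abracadabra")
# The frequent 'a' receives a shorter codeword than the rare 'c' or 'd',
# so the encoded string is much shorter than 8 bits per character.
```

Because no codeword is a prefix of another, the bit stream can be decoded unambiguously by reading bits until a complete codeword is recognized.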
7. Information theory applications
Information theory began as a theoretical science. Nevertheless, its insights led to a revolution in the design of digital transmission and storage systems. Three major areas that have directly benefited from information theory are transmission systems, storage systems, and the Internet.
The first transmission systems to be influenced by information theory were spacecraft communication systems. Since the late 1960s, long-range space probes, such as Pioneer, Voyager, Galileo, and Cassini, have used digital communication systems enhanced by advances made through research in information theory. In the early days of space exploration, transmission speeds for signals from distant probes were measured in tens of bits per second. Thanks in large part to information theory, these rates have increased to hundreds of thousands, and in some cases millions, of bits per second. Computer modems have also benefited from information theory. In the 1980s modems often operated at speeds no faster than 300 bits per second. By the late 1990s, modems routinely operated at speeds up to 56,000 bits per second.
Computer storage systems are also designed using the guidelines provided by information theory. The random-access memory, or RAM, in modern computers would be impossible without error-control coding designed by information theorists. High-capacity hard disks and CD-ROMs are similarly protected. Many of today’s consumer electronic devices would also be impossible without information theory. Recording engineers have used concepts such as channel capacity and entropy to guide the design of compact disc, DAT (digital audio tape), and DVD (digital video disc) players and recorders.
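The error-control coding mentioned above can be illustrated with the classic Hamming (7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit; memory systems use related, more elaborate codes. A minimal sketch:

```python
def hamming74_encode(d):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword.
    Parity bits occupy positions 1, 2, and 4 (1-indexed)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed error position; 0 = no error
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = hamming74_encode([1, 0, 1, 1])
word[3] ^= 1  # simulate a single memory bit flip
# The decoder locates and repairs the flipped bit automatically.
```

The three parity checks pinpoint the position of a single error, which is why a corrupted read from RAM can be silently repaired.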
The Internet and the World Wide Web are computer networks that store and transmit large amounts of data. Sending and receiving large amounts of information accurately over these networks require large amounts of channel capacity. Information theory, especially its data compression algorithms, has played a large part in making the Internet practical. For example, sending and receiving still or moving color images require large amounts of memory and would ordinarily overwhelm the capacity of the Internet. With data compression algorithms, large images can be reduced to an efficient and manageable size, making rapid exchanges of information possible.
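As a toy illustration of how redundancy makes image compression possible, consider run-length encoding of a scanline containing large uniform regions. Real image formats use far more sophisticated schemes, so this is only a sketch of the underlying idea:

```python
def rle_encode(pixels):
    """Run-length encode a sequence into [(value, run_length), ...]."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([p, 1])    # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Invert rle_encode, reproducing the original sequence exactly."""
    return [v for v, n in runs for _ in range(n)]

# A 100-pixel scanline with two uniform regions collapses to two runs.
row = [255] * 90 + [0] * 10
runs = rle_encode(row)  # [(255, 90), (0, 10)]
```

A hundred pixel values shrink to two (value, length) pairs, and the decoder recovers the row exactly; lossless schemes like this, combined with lossy ones, are what keep image transfers within the network's capacity.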