rwdgutenberg 0.09 → 0.12
- data/Readme.txt +20 -7
- data/code/01rwdcore/01rwdcore.rb +3 -0
- data/code/01rwdcore/openhelpwindow.rb +1 -1
- data/code/01rwdcore/runopentinkerdocument.rb +1 -1
- data/code/01rwdcore/rwdtinkerversion.rb +1 -1
- data/code/superant.com.gutenberg/0uninstallapplet.rb +17 -11
- data/code/superant.com.gutenberg/changegutenbergname.rb +2 -2
- data/code/superant.com.gutenberg/clearbookscreendisplay.rb +5 -5
- data/code/superant.com.gutenberg/cleargutenbergfiles.rb +0 -0
- data/code/superant.com.gutenberg/cleargutrecordfiles.rb +0 -0
- data/code/superant.com.gutenberg/copyfilename.rb +2 -2
- data/code/superant.com.gutenberg/createnewnote.rb +13 -12
- data/code/superant.com.gutenberg/deletegutenbergrecord.rb +9 -8
- data/code/superant.com.gutenberg/gutenbergcreatefile.rb +22 -10
- data/code/superant.com.gutenberg/helptexthashload.rb +21 -0
- data/code/superant.com.gutenberg/launchurl.rb +13 -0
- data/code/superant.com.gutenberg/listdirectories.rb +32 -0
- data/code/superant.com.gutenberg/listnamerecord.rb +10 -10
- data/code/superant.com.gutenberg/listnotedirshtml3.rb +57 -0
- data/code/superant.com.gutenberg/listtextfilesgutenberg.rb +72 -47
- data/code/superant.com.gutenberg/loadbookrecord.rb +55 -17
- data/code/superant.com.gutenberg/loadconfigurationrecord.rb +4 -4
- data/code/superant.com.gutenberg/loadconfigurationvariables.rb +19 -9
- data/code/superant.com.gutenberg/loadhtmlnoterecord.rb +31 -0
- data/code/superant.com.gutenberg/openhelpwindowgutenberg.rb +8 -2
- data/code/superant.com.gutenberg/resetdir.rb +7 -0
- data/code/superant.com.gutenberg/runbackwindow.rb +16 -10
- data/code/superant.com.gutenberg/rungutenbergwindow.rb +89 -71
- data/code/superant.com.gutenberg/rwdgutenbergbackward.rb +27 -27
- data/code/superant.com.gutenberg/rwdtinkerversion.rb +10 -10
- data/code/superant.com.gutenberg/saveconfigurationrecord.rb +4 -4
- data/code/superant.com.gutenberg/savegutenbergrecord.rb +13 -11
- data/code/superant.com.gutenberg/updir.rb +7 -0
- data/code/superant.com.rwdtinkerbackwindow/initiateapplets.rb +110 -108
- data/code/superant.com.rwdtinkerbackwindow/installgemapplet.rb +10 -8
- data/code/superant.com.rwdtinkerbackwindow/listzips.rb +8 -2
- data/code/superant.com.rwdtinkerbackwindow/removeappletvariables.rb +6 -6
- data/code/superant.com.rwdtinkerbackwindow/viewappletcontents.rb +1 -1
- data/code/superant.com.rwdtinkerbackwindow/viewgemappletcontents.rb +1 -1
- data/code/superant.com.rwdtinkerbackwindow/viewlogfile.rb +13 -0
- data/configuration/rwdtinker.dist +4 -8
- data/configuration/rwdwgutenberg.dist +23 -0
- data/configuration/tinkerwin2variables.dist +17 -7
- data/gui/00coreguibegin/applicationguitop.rwd +1 -1
- data/gui/frontwindow0/{viewlogo/cc0openphoto.rwd → cc0openphoto.rwd} +0 -0
- data/gui/{frontwindowselectionbegin/selectiontabbegin → frontwindowselections}/00selectiontabbegin.rwd +0 -0
- data/gui/frontwindowselections/jumplinkcommands.rwd +15 -0
- data/gui/{frontwindowselectionzend/viewselectionzend → frontwindowselections}/wwselectionend.rwd +0 -0
- data/gui/{frontwindowselectionzend/viewselectionzend/zzdocumentbegin.rwd → frontwindowtdocuments/00documentbegin.rwd} +0 -0
- data/gui/frontwindowtdocuments/{superant.com.documents/tinkerdocuments.rwd → tinkerdocuments.rwd} +0 -0
- data/gui/{helpaboutbegin/superant.com.helpaboutbegin → frontwindowtdocuments}/zzdocumentend.rwd +0 -0
- data/gui/helpaboutbegin/{superant.com.helpaboutbegin/zzzrwdlasttab.rwd → zzzrwdlasttab.rwd} +0 -0
- data/gui/helpaboutbegin/{superant.com.helpaboutbegin/zzzzhelpscreenstart.rwd → zzzzhelpscreenstart.rwd} +0 -0
- data/gui/{helpaboutinstalled/superant.com.tinkerhelpabout/helpabouttab.rwd → helpaboutbegin/zzzzzzhelpabouttab.rwd} +0 -0
- data/gui/helpaboutzend/{superant.com.helpaboutend/helpscreenend.rwd → helpscreenend.rwd} +0 -0
- data/gui/helpaboutzend/{superant.com.helpaboutend/zhelpscreenstart2.rwd → zhelpscreenstart2.rwd} +0 -0
- data/gui/helpaboutzend/{superant.com.helpaboutend/zzzzhelpabout2.rwd → zzzzhelpabout2.rwd} +0 -0
- data/gui/helpaboutzend/{superant.com.helpaboutend/zzzzhelpscreen2end.rwd → zzzzhelpscreen2end.rwd} +0 -0
- data/gui/tinkerbackwindows/superant.com.backgutenberg/10appletbegin.rwd +4 -0
- data/gui/tinkerbackwindows/{superant.com.gutenberg → superant.com.backgutenberg}/1tabfirst.rwd +0 -0
- data/gui/tinkerbackwindows/{superant.com.gutenberg → superant.com.backgutenberg}/20listfiles.rwd +5 -4
- data/gui/tinkerbackwindows/{superant.com.gutenberg → superant.com.backgutenberg}/30booklistutilities.rwd +0 -0
- data/gui/tinkerbackwindows/superant.com.backgutenberg/35displaytab.rwd +26 -0
- data/gui/tinkerbackwindows/{superant.com.gutenberg → superant.com.backgutenberg}/67viewconfiguration.rwd +0 -0
- data/gui/{frontwindowselections/superant.com.rwdtinkerwin2selectiontab/jumplinkcommands.rwd → tinkerbackwindows/superant.com.backgutenberg/81jumplinkcommands.rwd} +2 -0
- data/gui/tinkerbackwindows/superant.com.backgutenberg/9end.rwd +6 -0
- data/gui/tinkerbackwindows/superant.com.gutenberg/10htmlnote.rwd +46 -0
- data/gui/tinkerbackwindows/superant.com.gutenberg/12tabfirst.rwd +39 -0
- data/gui/tinkerbackwindows/superant.com.gutenberg/35displaytab.rwd +4 -1
- data/gui/tinkerbackwindows/superant.com.gutenberg/50listfiles.rwd +37 -0
- data/gui/tinkerbackwindows/superant.com.gutenberg/81jumplinkcommands.rwd +1 -1
- data/gui/tinkerbackwindows/superant.com.tinkerbackwindow/75rwdlogfile.rwd +20 -0
- data/gui/tinkerbackwindows/superant.com.tinkerbackwindow/81jumplinkcommands.rwd +1 -1
- data/gui/zzcoreguiend/{tinkerapplicationguiend/yy9rwdend.rwd → yy9rwdend.rwd} +0 -0
- data/init.rb +15 -10
- data/installed/gutenbergdata02.inf +2 -2
- data/installed/{rwdwgutenberg-0.09.inf → rwdwgutenberg.inf} +3 -2
- data/lang/en/rwdcore/languagefile.rb +4 -3
- data/lang/es/rwdcore/languagefile-es.rb +1 -0
- data/lang/fr/rwdcore/languagefile.rb +1 -0
- data/lang/jp/rwdcore/languagefile.rb +1 -0
- data/lang/nl/rwdcore/languagefile.rb +1 -0
- data/{extras → lib}/rconftool.rb +13 -6
- data/{ev → lib/rwd}/browser.rb +2 -2
- data/{ev → lib/rwd}/ftools.rb +0 -0
- data/{ev → lib/rwd}/mime.rb +0 -0
- data/{ev → lib/rwd}/net.rb +18 -7
- data/{ev → lib/rwd}/ruby.rb +1 -1
- data/{ev → lib/rwd}/rwd.rb +108 -625
- data/{ev → lib/rwd}/sgml.rb +1 -1
- data/{ev → lib/rwd}/thread.rb +1 -1
- data/{ev → lib/rwd}/tree.rb +2 -2
- data/{ev → lib/rwd}/xml.rb +1 -1
- data/lib/rwdthemes/default.rwd +317 -0
- data/lib/rwdthemes/pda.rwd +72 -0
- data/lib/rwdthemes/windowslike.rwd +171 -0
- data/lib/rwdtinker/rwdtinkertools.rb +24 -0
- data/{extras → lib}/zip/ioextras.rb +0 -0
- data/{extras → lib}/zip/stdrubyext.rb +0 -0
- data/{extras → lib}/zip/tempfile_bugfixed.rb +0 -0
- data/{extras → lib}/zip/zip.rb +2 -2
- data/{extras → lib}/zip/zipfilesystem.rb +0 -0
- data/{extras → lib}/zip/ziprequire.rb +0 -0
- data/rwd_files/Books/marip10.lnk +6 -0
- data/{Books → rwd_files/Books}/marip10.txt +0 -0
- data/{Books → rwd_files/Books}/shannon1948.html +0 -0
- data/{Books/Shannon.gut → rwd_files/Books/shannon1948.lnk} +1 -1
- data/rwd_files/Books/shannon1948.txt +2667 -0
- data/rwd_files/HowTo_Gutenberg.txt +21 -1
- data/rwd_files/HowTo_Tinker.txt +58 -1
- data/rwd_files/log/rwdtinker.log +2082 -0
- data/{code/superant.com.gutenberg/helptexthashrwdgutenberg.rb → rwd_files/rwdgutenberghelpfiles.txt} +26 -19
- data/rwdconfig.dist +14 -13
- data/tests/makedist-rwdwgutenberg.rb +9 -7
- data/tests/makedist.rb +2 -2
- data/zips/rwdwcalc-0.63.zip +0 -0
- data/zips/rwdwfoldeditor-0.05.zip +0 -0
- data/zips/rwdwgutenberg-0.12.zip +0 -0
- data/zips/rwdwruby-1.08.zip +0 -0
- data/zips/wrubyslippers-1.07.zip +0 -0
- metadata +74 -59
- data/Books/Mariposa.gut +0 -6
- data/code/superant.com.gutenberg/rwdhypernotehelpabout.rb +0 -14
- data/code/superant.com.rwdtinkerbackwindow/installapplet.rb +0 -27
- data/configuration/language.dist +0 -8
- data/configuration/rwdapplicationidentity.dist +0 -3
- data/configuration/rwdwgutenberg-0.09.dist +0 -20
- data/gui/tinkerbackwindows/superant.com.gutenberg/36displaytab.rwd +0 -15
- data/gui/tinkerbackwindows/superant.com.gutenberg/40rwdgutenberg.rwd +0 -16
- data/gui/tinkerbackwindows/superant.com.gutenberg/40rwdgutenberghtml.rwd +0 -16
- data/lib/temp.rb +0 -1
- data/lib/rwdtinker/rwdtinkertools.rb
ADDED
@@ -0,0 +1,24 @@
+module RwdtinkerTools
+  # tools to use in rwdtinker
+
+  def RwdtinkerTools.tail(filename, lines=12)
+    begin
+      tmpFile = File.open(filename, 'r')
+      return tmpFile.readlines.reverse!.slice(0, lines)
+      tmpFile.close
+    rescue
+      return "error in opening log"
+      $rwdtinkerlog.error "RwdtinkerTools.tail: file open error"
+    end
+  end
+end
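Two lines of this new helper are dead code: `tmpFile.close` sits after the `return`, so the file handle is never closed, and the log call in the rescue clause is unreachable for the same reason. A minimal corrected sketch (ours, not what ships in 0.12):

  module RwdtinkerTools
    # tools to use in rwdtinker
    # return the last `lines` lines of filename, newest first
    def RwdtinkerTools.tail(filename, lines=12)
      # the block form closes the handle and returns the block's value
      File.open(filename, 'r') { |f| f.readlines.reverse.slice(0, lines) }
    rescue
      $rwdtinkerlog.error "RwdtinkerTools.tail: file open error"  # log first, then report
      "error in opening log"
    end
  end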
- data/{extras → lib}/zip/zip.rb
RENAMED
@@ -4,8 +4,8 @@ require 'singleton'
 require 'tempfile'
 require 'ftools'
 require 'zlib'
-require '
-require '
+require 'lib/zip/stdrubyext'
+require 'lib/zip/ioextras'
 
 if Tempfile.superclass == SimpleDelegator
   require 'zip/tempfile_bugfixed'
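The rewritten requires resolve only if the gem's data directory is on Ruby's load path, presumably arranged by init.rb at startup. A minimal sketch of that assumption (hypothetical, not the gem's actual init code):

  # assumption: the data directory containing lib/ is prepended to $LOAD_PATH,
  # so require 'lib/zip/...' resolves relative to it
  $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__)))
  require 'lib/zip/zip'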
- data/rwd_files/Books/shannon1948.txt
ADDED
@@ -0,0 +1,2667 @@

Reprinted with corrections from The Bell System Technical Journal,
Vol. 27, pp. 379–423, 623–656, July, October, 1948.

A Mathematical Theory of Communication

By C. E. SHANNON

INTRODUCTION

The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist [1] and Hartley [2] on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley the most natural choice is the logarithmic function. Although this definition must be generalized considerably when we consider the influence of the statistics of the message and when we have a continuous range of messages, we will in all cases use an essentially logarithmic measure.

The logarithmic measure is more convenient for various reasons:

1. It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities. For example, adding one relay to a group doubles the number of possible states of the relays. It adds 1 to the base 2 logarithm of this number. Doubling the time roughly squares the number of possible messages, or doubles the logarithm, etc.

2. It is nearer to our intuitive feeling as to the proper measure. This is closely related to (1) since we intuitively measure entities by linear comparison with common standards. One feels, for example, that two punched cards should have twice the capacity of one for information storage, and two identical channels twice the capacity of one for transmitting information.

3. It is mathematically more suitable. Many of the limiting operations are simple in terms of the logarithm but would require clumsy restatement in terms of the number of possibilities.

The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits, since the total number of possible states is 2^N and \log_2 2^N = N. If the base 10 is used the units may be called decimal digits. Since

  \log_2 M = \log_{10} M / \log_{10} 2 = 3.32 \log_{10} M,

a decimal digit is about 3 1/3 bits. A digit wheel on a desk computing machine has ten stable positions and therefore has a storage capacity of one decimal digit. In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by \log_b a.

[1] Nyquist, H., "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, April 1924, p. 324; "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Trans., v. 47, April 1928, p. 617.
[2] Hartley, R. V. L., "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.
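A quick numeric check of the unit conversions above; a minimal Ruby sketch, ours rather than anything in the paper or the gem:

  # a decimal digit carries log2(10) bits
  bits_per_decimal_digit = Math.log(10) / Math.log(2)  # => 3.3219... (about 3 1/3)
  # changing from base a to base b multiplies the measure by log_b(a);
  # e.g. natural units (base e) to bits:
  bits_per_nat = 1.0 / Math.log(2)                     # => 1.4427...
  puts bits_per_decimal_digit, bits_per_nat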
[Fig. 1 -- Schematic diagram of a general communication system: information source -> transmitter -> signal (with noise source) -> received signal -> receiver -> destination, the message entering the transmitter and leaving the receiver.]

By a communication system we will mean a system of the type indicated schematically in Fig. 1. It consists of essentially five parts:

1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal. The message may be of various types: (a) A sequence of letters as in a telegraph or teletype system; (b) A single function of time f(t) as in radio or telephony; (c) A function of time and other variables as in black and white television -- here the message may be thought of as a function f(x, y, t) of two space coordinates and time, the light intensity at point (x, y) and time t on a pickup tube plate; (d) Two or more functions of time, say f(t), g(t), h(t) -- this is the case in "three-dimensional" sound transmission or if the system is intended to service several individual channels in multiplex; (e) Several functions of several variables -- in color television the message consists of three functions f(x, y, t), g(x, y, t), h(x, y, t) defined in a three-dimensional continuum -- we may also think of these three functions as components of a vector field defined in the region -- similarly, several black and white television sources would produce "messages" consisting of a number of functions of three variables; (f) Various combinations also occur, for example in television with an associated audio channel.

2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel. In telephony this operation consists merely of changing sound pressure into a proportional electrical current. In telegraphy we have an encoding operation which produces a sequence of dots, dashes and spaces on the channel corresponding to the message. In a multiplex PCM system the different speech functions must be sampled, compressed, quantized and encoded, and finally interleaved properly to construct the signal. Vocoder systems, television and frequency modulation are other examples of complex operations applied to the message to obtain the signal.

3. The channel is merely the medium used to transmit the signal from transmitter to receiver. It may be a pair of wires, a coaxial cable, a band of radio frequencies, a beam of light, etc.

4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal.

5. The destination is the person (or thing) for whom the message is intended.

We wish to consider certain general problems involving communication systems. To do this it is first necessary to represent the various elements involved as mathematical entities, suitably idealized from their physical counterparts. We may roughly classify communication systems into three main categories: discrete, continuous and mixed. By a discrete system we will mean one in which both the message and the signal are a sequence of discrete symbols. A typical case is telegraphy where the message is a sequence of letters and the signal a sequence of dots, dashes and spaces. A continuous system is one in which the message and signal are both treated as continuous functions, e.g., radio or television. A mixed system is one in which both discrete and continuous variables appear, e.g., PCM transmission of speech.

We first consider the discrete case. This case has applications not only in communication theory, but also in the theory of computing machines, the design of telephone exchanges and other fields. In addition the discrete case forms a foundation for the continuous and mixed cases which will be treated in the second half of the paper.
PART I: DISCRETE NOISELESS SYSTEMS

1. THE DISCRETE NOISELESS CHANNEL

Teletype and telegraphy are two simple examples of a discrete channel for transmitting information. Generally, a discrete channel will mean a system whereby a sequence of choices from a finite set of elementary symbols S_1, ..., S_n can be transmitted from one point to another. Each of the symbols S_i is assumed to have a certain duration in time t_i seconds (not necessarily the same for different S_i, for example the dots and dashes in telegraphy). It is not required that all possible sequences of the S_i be capable of transmission on the system; certain sequences only may be allowed. These will be possible signals for the channel. Thus in telegraphy suppose the symbols are: (1) A dot, consisting of line closure for a unit of time and then line open for a unit of time; (2) A dash, consisting of three time units of closure and one unit open; (3) A letter space consisting of, say, three units of line open; (4) A word space of six units of line open. We might place the restriction on allowable sequences that no spaces follow each other (for if two letter spaces are adjacent, it is identical with a word space). The question we now consider is how one can measure the capacity of such a channel to transmit information.

In the teletype case where all symbols are of the same duration, and any sequence of the 32 symbols is allowed, the answer is easy. Each symbol represents five bits of information. If the system transmits n symbols per second it is natural to say that the channel has a capacity of 5n bits per second. This does not mean that the teletype channel will always be transmitting information at this rate -- this is the maximum possible rate and whether or not the actual rate reaches this maximum depends on the source of information which feeds the channel, as will appear later.

In the more general case with different lengths of symbols and constraints on the allowed sequences, we make the following definition:

Definition: The capacity C of a discrete channel is given by

  C = \lim_{T \to \infty} \frac{\log N(T)}{T}

where N(T) is the number of allowed signals of duration T.

It is easily seen that in the teletype case this reduces to the previous result. It can be shown that the limit in question will exist as a finite number in most cases of interest. Suppose all sequences of the symbols S_1, ..., S_n are allowed and these symbols have durations t_1, ..., t_n. What is the channel capacity? If N(t) represents the number of sequences of duration t we have

  N(t) = N(t - t_1) + N(t - t_2) + \cdots + N(t - t_n).

The total number is equal to the sum of the numbers of sequences ending in S_1, S_2, ..., S_n and these are N(t - t_1), N(t - t_2), ..., N(t - t_n), respectively. According to a well-known result in finite differences, N(t) is then asymptotic for large t to X_0^t where X_0 is the largest real solution of the characteristic equation

  X^{-t_1} + X^{-t_2} + \cdots + X^{-t_n} = 1

and therefore

  C = \log X_0.

In case there are restrictions on allowed sequences we may still often obtain a difference equation of this type and find C from the characteristic equation. In the telegraphy case mentioned above

  N(t) = N(t-2) + N(t-4) + N(t-5) + N(t-7) + N(t-8) + N(t-10)

as we see by counting sequences of symbols according to the last or next to the last symbol occurring. Hence C is \log \mu_0 where \mu_0 is the positive root of

  1 = \mu^{-2} + \mu^{-4} + \mu^{-5} + \mu^{-7} + \mu^{-8} + \mu^{-10}.

Solving this we find C = 0.539.

A very general type of restriction which may be placed on allowed sequences is the following: We imagine a number of possible states a_1, a_2, ..., a_m. For each state only certain symbols from the set S_1, ..., S_n can be transmitted (different subsets for the different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted. The telegraph case is a simple example of this. There are two states depending on whether or not a space was the last symbol transmitted. If so, then only a dot or a dash can be sent next and the state always changes. If not, any symbol can be transmitted and the state changes if a space is sent, otherwise it remains the same. The conditions can be indicated in a linear graph as shown in Fig. 2. The junction points correspond to the states and the lines indicate the symbols possible in a state and the resulting state.

[Fig. 2 -- Graphical representation of the constraints on telegraph symbols.]

In Appendix 1 it is shown that if the conditions on allowed sequences can be described in this form C will exist and can be calculated in accordance with the following result:

Theorem 1: Let b^{(s)}_{ij} be the duration of the s-th symbol which is allowable in state i and leads to state j. Then the channel capacity C is equal to \log W where W is the largest real root of the determinant equation

  \left| \textstyle\sum_s W^{-b^{(s)}_{ij}} - \delta_{ij} \right| = 0

where \delta_{ij} = 1 if i = j and is zero otherwise.

For example, in the telegraph case (Fig. 2) the determinant is

  \begin{vmatrix} -1 & W^{-2} + W^{-4} \\ W^{-3} + W^{-6} & W^{-2} + W^{-4} - 1 \end{vmatrix} = 0.

On expansion this leads to the equation given above for this case.
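The root \mu_0 is easy to find numerically. A minimal Ruby sketch, ours rather than the paper's, using bisection on the characteristic equation and a base-2 logarithm so the capacity comes out in bits per unit time:

  durations = [2, 4, 5, 7, 8, 10]  # exponents from the telegraph equation above
  f = lambda { |w| durations.inject(0.0) { |s, t| s + w**-t } - 1.0 }
  lo, hi = 1.0001, 2.0
  40.times do
    mid = (lo + hi) / 2.0
    f.call(mid) > 0 ? lo = mid : hi = mid  # f decreases in w; root where f = 0
  end
  puts Math.log((lo + hi) / 2.0) / Math.log(2)  # => 0.539...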
2. THE DISCRETE SOURCE OF INFORMATION

We have seen that under very general conditions the logarithm of the number of possible signals in a discrete channel increases linearly with time. The capacity to transmit information can be specified by giving this rate of increase, the number of bits per second required to specify the particular signal used.

We now consider the information source. How is an information source to be described mathematically, and how much information in bits per second is produced in a given source? The main point at issue is the effect of statistical knowledge about the source in reducing the required capacity of the channel, by the use of proper encoding of the information. In telegraphy, for example, the messages to be transmitted consist of sequences of letters. These sequences, however, are not completely random. In general, they form sentences and have the statistical structure of, say, English. The letter E occurs more frequently than Q, the sequence TH more frequently than XP, etc. The existence of this structure allows one to make a saving in time (or channel capacity) by properly encoding the message sequences into signal sequences. This is already done to a limited extent in telegraphy by using the shortest channel symbol, a dot, for the most common English letter E; while the infrequent letters, Q, X, Z are represented by longer sequences of dots and dashes. This idea is carried still further in certain commercial codes where common words and phrases are represented by four- or five-letter code groups with a considerable saving in average time. The standardized greeting and anniversary telegrams now in use extend this to the point of encoding a sentence or two into a relatively short sequence of numbers.

We can think of a discrete source as generating the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question. A physical system, or a mathematical model of a system which produces such a sequence of symbols governed by a set of probabilities, is known as a stochastic process.[3] We may consider a discrete source, therefore, to be represented by a stochastic process. Conversely, any stochastic process which produces a discrete sequence of symbols chosen from a finite set may be considered a discrete source. This will include such cases as:

1. Natural written languages such as English, German, Chinese.

2. Continuous information sources that have been rendered discrete by some quantizing process. For example, the quantized speech from a PCM transmitter, or a quantized television signal.

3. Mathematical cases where we merely define abstractly a stochastic process which generates a sequence of symbols.

The following are examples of this last type of source.

(A) Suppose we have five letters A, B, C, D, E which are chosen each with probability .2, successive choices being independent. This would lead to a sequence of which the following is a typical example. B D C B C E C C C A D C B D D A A E C E E A A B B D A E E C A C E E B A E E C B C E A D. This was constructed with the use of a table of random numbers.[4]

(B) Using the same five letters let the probabilities be .4, .1, .2, .2, .1, respectively, with successive choices independent. A typical message from this source is then: A A A C D C B D C E A A D A D A C E D A E A D C A B E D A D D C E C A A A A A D.

(C) A more complicated structure is obtained if successive symbols are not chosen independently but their probabilities depend on preceding letters. In the simplest case of this type a choice depends only on the preceding letter and not on ones before that. The statistical structure can then be described by a set of transition probabilities p_i(j), the probability that letter i is followed by letter j. The indices i and j range over all the possible symbols. A second equivalent way of specifying the structure is to give the "digram" probabilities p(i, j), i.e., the relative frequency of the digram i j. The letter frequencies p(i), (the probability of letter i), the transition probabilities p_i(j) and the digram probabilities p(i, j) are related by the following formulas:

  p(i) = \sum_j p(i, j) = \sum_j p(j, i) = \sum_j p(j) p_j(i),
  p(i, j) = p(i) p_i(j),
  \sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i, j) = 1.

As a specific example suppose there are three letters A, B, C with the probability tables:

  p_i(j)   j=A   j=B   j=C      p(i)        p(i, j)   j=A    j=B    j=C
  i=A       0    4/5   1/5      A  9/27     i=A        0     4/15   1/15
  i=B      1/2   1/2    0       B 16/27     i=B       8/27   8/27    0
  i=C      1/2   2/5   1/10     C  2/27     i=C       1/27   4/135  1/135

A typical message from this source is the following: A B B A B A B A B A B A B A B B B A B B B B B A B A B A B A B A B B B A C A C A B B A B B B B A B B A B A C B B B A B A.

The next increase in complexity would involve trigram frequencies but no more. The choice of a letter would depend on the preceding two letters but not on the message before that point. A set of trigram frequencies p(i, j, k) or equivalently a set of transition probabilities p_{ij}(k) would be required. Continuing in this way one obtains successively more complicated stochastic processes. In the general n-gram case a set of n-gram probabilities p(i_1, i_2, ..., i_n) or of transition probabilities p_{i_1 i_2 ... i_{n-1}}(i_n) is required to specify the statistical structure.

(D) Stochastic processes can also be defined which produce a text consisting of a sequence of "words." Suppose there are five letters A, B, C, D, E and 16 "words" in the language with associated probabilities:

  .10 A      .16 BEBE   .11 CABED   .04 DEB
  .04 ADEB   .04 BED    .05 CEED    .15 DEED
  .05 ADEE   .02 BEED   .08 DAB     .01 EAB
  .01 BADD   .05 CA     .04 DAD     .05 EE

Suppose successive "words" are chosen independently and are separated by a space. A typical message might be: DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE CABED BEBE BED DAB DEED ADEB.

If all the words are of finite length this process is equivalent to one of the preceding type, but the description may be simpler in terms of the word structure and probabilities. We may also generalize here and introduce transition probabilities between words, etc.

These artificial languages are useful in constructing simple problems and examples to illustrate various possibilities. We can also approximate to a natural language by means of a series of simple artificial languages. The zero-order approximation is obtained by choosing all letters with the same probability and independently. The first-order approximation is obtained by choosing successive letters independently but each letter having the same probability that it has in the natural language.[5] Thus, in the first-order approximation to English, E is chosen with probability .12 (its frequency in normal English) and W with probability .02, but there is no influence between adjacent letters and no tendency to form the preferred digrams such as TH, ED, etc.

[3] See, for example, S. Chandrasekhar, "Stochastic Problems in Physics and Astronomy," Reviews of Modern Physics, v. 15, No. 1, January 1943, p. 1.
[4] Kendall and Smith, Tables of Random Sampling Numbers, Cambridge, 1939.
[5] Letter, digram and trigram frequencies are given in Secret and Urgent by Fletcher Pratt, Blue Ribbon Books, 1939. Word frequencies are tabulated in Relative Frequency of English Speech Sounds, G. Dewey, Harvard University Press, 1923.
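The independent-letter sources of examples (A) and (B) are straightforward to simulate. A minimal Ruby sketch, ours rather than the paper's, drawing 40 letters with the probabilities of example (B):

  def draw(probs)
    r = rand
    probs.each { |letter, p| return letter if (r -= p) <= 0 }
    probs.keys.last  # guard against floating-point leftovers
  end
  probs = { 'A' => 0.4, 'B' => 0.1, 'C' => 0.2, 'D' => 0.2, 'E' => 0.1 }
  puts Array.new(40) { draw(probs) }.join(' ')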
In the second-order approximation, digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation, trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.

3. THE SERIES OF APPROXIMATIONS TO ENGLISH

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol "alphabet," the 26 letters and a space.

1. Zero-order approximation (symbols independent and equiprobable). XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text). OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

3. Second-order approximation (digram structure as in English). ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English). IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies. REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included. THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.

The first two samples were constructed by the use of a book of random numbers in conjunction with (for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5), since digram, trigram and word frequency tables are available, but a simpler equivalent method was used. To construct (3) for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5) and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.

4. GRAPHICAL REPRESENTATION OF A MARKOFF PROCESS

Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature.[6] The general case can be described as follows: There exist a finite number of possible "states" of a system; S_1, S_2, ..., S_n. In addition there is a set of transition probabilities; p_i(j) the probability that if the system is in state S_i it will next go to state S_j. To make this Markoff process into an information source we need only assume that a letter is produced for each transition from one state to another. The states will correspond to the "residue of influence" from preceding letters.

The situation can be represented graphically as shown in Figs. 3, 4 and 5. The "states" are the junction points in the graph and the probabilities and letters produced for a transition are given beside the corresponding line.

[Fig. 3 -- A graph corresponding to the source in example B.]
[Fig. 4 -- A graph corresponding to the source in example C.]

Figure 3 is for the example B in Section 2, while Fig. 4 corresponds to the example C. In Fig. 3 there is only one state since successive letters are independent. In Fig. 4 there are as many states as letters. If a trigram example were constructed there would be at most n^2 states corresponding to the possible pairs of letters preceding the one being chosen. Figure 5 is a graph for the case of word structure in example D. Here S corresponds to the "space" symbol.

[6] For a detailed treatment see M. Fréchet, Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles. Paris, Gauthier-Villars, 1938.
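A minimal Ruby sketch of such a Markoff source, ours rather than the paper's, using the transition table of example (C); each letter drawn depends only on the one before:

  trans = {
    'A' => { 'A' => 0.0, 'B' => 0.8, 'C' => 0.2 },  # p_i(j) from example (C)
    'B' => { 'A' => 0.5, 'B' => 0.5, 'C' => 0.0 },
    'C' => { 'A' => 0.5, 'B' => 0.4, 'C' => 0.1 }
  }
  state = 'B'
  letters = (1..60).map do
    r = rand
    pair = trans[state].find { |_, p| (r -= p) <= 0 }
    state = pair ? pair.first : 'A'
  end
  puts letters.join(' ')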
5. ERGODIC AND MIXED SOURCES

As we have indicated above a discrete source for our purposes can be considered to be represented by a Markoff process. Among the possible discrete Markoff processes there is a group with special properties of significance in communication theory. This special class consists of the "ergodic" processes and we shall call the corresponding sources ergodic sources. Although a rigorous definition of an ergodic process is somewhat involved, the general idea is simple. In an ergodic process every sequence produced by the process is the same in statistical properties. Thus the letter frequencies, digram frequencies, etc., obtained from particular sequences, will, as the lengths of the sequences increase, approach definite limits independent of the particular sequence. Actually this is not true of every sequence but the set for which it is false has probability zero. Roughly the ergodic property means statistical homogeneity.

All the examples of artificial languages given above are ergodic. This property is related to the structure of the corresponding graph. If the graph has the following two properties[7] the corresponding process will be ergodic:

1. The graph does not consist of two isolated parts A and B such that it is impossible to go from junction points in part A to junction points in part B along lines of the graph in the direction of arrows and also impossible to go from junctions in part B to junctions in part A.

2. A closed series of lines in the graph with all arrows on the lines pointing in the same orientation will be called a "circuit." The "length" of a circuit is the number of lines in it. Thus in Fig. 5 series BEBES is a circuit of length 5. The second property required is that the greatest common divisor of the lengths of all circuits in the graph be one.

[Fig. 5 -- A graph corresponding to the source in example D.]

If the first condition is satisfied but the second one violated by having the greatest common divisor equal to d > 1, the sequences have a certain type of periodic structure. The various sequences fall into d different classes which are statistically the same apart from a shift of the origin (i.e., which letter in the sequence is called letter 1). By a shift of from 0 up to d - 1 any sequence can be made statistically equivalent to any other. A simple example with d = 2 is the following: There are three possible letters a, b, c. Letter a is followed with either b or c with probabilities 1/3 and 2/3 respectively. Either b or c is always followed by letter a. Thus a typical sequence is

  a b a c a c a c a b a c a b a b a c a c.

This type of situation is not of much importance for our work.

If the first condition is violated the graph may be separated into a set of subgraphs each of which satisfies the first condition. We will assume that the second condition is also satisfied for each subgraph. We have in this case what may be called a "mixed" source made up of a number of pure components. The components correspond to the various subgraphs. If L_1, L_2, L_3, ... are the component sources we may write

  L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots

where p_i is the probability of the component source L_i.

Physically the situation represented is this: There are several different sources L_1, L_2, L_3, ... which are each of homogeneous statistical structure (i.e., they are ergodic). We do not know a priori which is to be used, but once the sequence starts in a given pure component L_i, it continues indefinitely according to the statistical structure of that component. As an example one may take two of the processes defined above and assume p_1 = .2 and p_2 = .8. A sequence from the mixed source

  L = .2 L_1 + .8 L_2

would be obtained by choosing first L_1 or L_2 with probabilities .2 and .8 and after this choice generating a sequence from whichever was chosen.

Except when the contrary is stated we shall assume a source to be ergodic. This assumption enables one to identify averages along a sequence with averages over the ensemble of possible sequences (the probability of a discrepancy being zero). For example the relative frequency of the letter A in a particular infinite sequence will be, with probability one, equal to its relative frequency in the ensemble of sequences.

If P_i is the probability of state i and p_i(j) the transition probability to state j, then for the process to be stationary it is clear that the P_i must satisfy equilibrium conditions:

  P_j = \sum_i P_i p_i(j).

In the ergodic case it can be shown that with any starting conditions the probabilities P_j(N) of being in state j after N symbols approach the equilibrium values as N \to \infty.

[7] These are restatements in terms of the graph of conditions given in Fréchet.
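The equilibrium probabilities can be found by simply iterating the transition matrix from any starting distribution, as the last paragraph suggests. A minimal Ruby sketch, ours rather than the paper's, for example (C); the result matches the p(i) column given earlier:

  trans = [[0.0, 0.8, 0.2],  # p_i(j) for i, j ranging over (A, B, C)
           [0.5, 0.5, 0.0],
           [0.5, 0.4, 0.1]]
  dist = [1.0, 0.0, 0.0]     # arbitrary start: certainly in state A
  100.times do
    dist = (0..2).map { |j| (0..2).inject(0.0) { |s, i| s + dist[i] * trans[i][j] } }
  end
  puts dist.inspect          # => ~[0.3333, 0.5926, 0.0741] = [9/27, 16/27, 2/27]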
have represented a discrete information source as a Markoff process. Can we
|
529
|
+
define a quantity whichwill measure, in some sense, how much information is
|
530
|
+
"produced" by such a process, or better, at what rateinformation is produced?
|
531
|
+
Suppose we have a set of possible events whose probabilities of occurrence are
|
532
|
+
p1 p2 pn. These ;
|
533
|
+
|
534
|
+
;
|
535
|
+
|
536
|
+
: : : ;
|
537
|
+
|
538
|
+
probabilities are known but that is all we know concerning which event will
|
539
|
+
occur. Can we find a measureof how much "choice" is involved in the selection
|
540
|
+
of the event or of how uncertain we are of the outcome? If there is such a
|
541
|
+
measure, say H p1 p2 pn, it is reasonable to require of it the following
|
542
|
+
properties: ;
|
543
|
+
|
544
|
+
;
|
545
|
+
|
546
|
+
: : : ;
|
547
|
+
|
548
|
+
1. Hshould be continuous in the pi. 2. If all the p 1 iare equal, pi , then
|
549
|
+
Hshould be a monotonic increasing function of n. With equally = n likely events
|
550
|
+
there is more choice, or uncertainty, when there are more possible events. 3.
|
551
|
+
If a choice be broken down into two successive choices, the original Hshould be
|
552
|
+
the weighted sum of the individual values of H. The meaning of this is
|
553
|
+
illustrated in Fig. 6. At the left we have three 1 2 1 2 1 2 1 3 2 3 1 3 1 2 1
|
554
|
+
6 1 3 1 6 Fig. 6 -- Decomposition of a choice from three possibilities.
|
555
|
+
possibilities p 1 1 1 1 , p2 , p3 . On the right we first choose between two
|
556
|
+
possibilities each with = 2 = 3 = 6 probability 1 , and if the second occurs
|
557
|
+
make another choice with probabilities 2 , 1 . The final results 2 3 3 have the
|
558
|
+
same probabilities as before. We require, in this special case, that H1 1 1 H1
|
559
|
+
1 1 H2 1 2 ;
|
560
|
+
|
561
|
+
3 ;
|
562
|
+
|
563
|
+
6 = 2 ;
|
564
|
+
|
565
|
+
2 + 2 3 ;
|
566
|
+
|
567
|
+
3 : The coefficient 1 is because this second choice only occurs half the time.
|
568
|
+
2 10
|
569
|
+
===============================================================================
|
570
|
+
In Appendix 2, the following result is established: Theorem 2:The only
|
571
|
+
Hsatisfying the three above assumptions is of the form: n H K pilog pi = , i1 =
|
572
|
+
where Kis a positive constant. This theorem, and the assumptions required for
|
573
|
+
its proof, are in no way necessary for the present theory. It is given chiefly
|
574
|
+
to lend a certain plausibility to some of our later definitions. The real
|
575
|
+
justification of thesedefinitions, however, will reside in their implications.
|
576
|
+
Quantities of the form H pilog pi(the constant Kmerely amounts to a choice of a
|
577
|
+
unit of measure) = , play a central role in information theory as measures of
|
578
|
+
information, choice and uncertainty. The form of Hwill be recognized as that of
|
579
|
+
entropy as defined in certain formulations of statistical mechanics8 where
|
580
|
+
piisthe probability of a system being in cell iof its phase space. His then,
|
581
|
+
for example, the Hin Boltzmann'sfamous Htheorem. We shall call H pilog pithe
|
582
|
+
entropy of the set of probabilities p1 pn. If xis a = , ;
|
583
|
+
|
584
|
+
: : : ;
|
585
|
+
|
586
|
+
chance variable we will write H xfor its entropy;
|
587
|
+
|
588
|
+
thus xis not an argument of a function but a label for a number, to
|
589
|
+
differentiate it from H ysay, the entropy of the chance variable y. The entropy
|
590
|
+
in the case of two possibilities with probabilities pand q 1 p, namely = , H
|
591
|
+
plog p qlogq = , + is plotted in Fig. 7 as a function of p. 1.0 .9 .8 .7 H BITS
|
592
|
+
.6 .5 .4 .3 .2 .1 0 0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1.0 p Fig. 7 -- Entropy in the
|
593
|
+
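A minimal Ruby sketch of this entropy function, ours rather than the paper's, checking the decomposition of Fig. 6 numerically; logarithms are base 2, so H is in bits:

  def entropy(probs)
    probs.inject(0.0) { |h, p| p > 0 ? h - p * Math.log(p, 2) : h }
  end
  left  = entropy([1.0/2, 1.0/3, 1.0/6])
  right = entropy([0.5, 0.5]) + 0.5 * entropy([2.0/3, 1.0/3])
  puts left, right  # both ~= 1.459 bits, as property 3 requires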
The quantity H has a number of interesting properties which further substantiate it as a reasonable measure of choice or information.

1. H = 0 if and only if all the p_i but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.

2. For a given n, H is a maximum and equal to \log n when all the p_i are equal (i.e., 1/n). This is also intuitively the most uncertain situation.

3. Suppose there are two events, x and y, in question with m possibilities for the first and n for the second. Let p(i, j) be the probability of the joint occurrence of i for the first and j for the second. The entropy of the joint event is

  H(x, y) = -\sum_{i,j} p(i, j) \log p(i, j)

while

  H(x) = -\sum_{i,j} p(i, j) \log \sum_j p(i, j)
  H(y) = -\sum_{i,j} p(i, j) \log \sum_i p(i, j).

It is easily shown that

  H(x, y) \le H(x) + H(y)

with equality only if the events are independent (i.e., p(i, j) = p(i) p(j)). The uncertainty of a joint event is less than or equal to the sum of the individual uncertainties.

4. Any change toward equalization of the probabilities p_1, p_2, ..., p_n increases H. Thus if p_1 < p_2 and we increase p_1, decreasing p_2 an equal amount so that p_1 and p_2 are more nearly equal, then H increases. More generally, if we perform any "averaging" operation on the p_i of the form

  p'_i = \sum_j a_{ij} p_j

where \sum_i a_{ij} = \sum_j a_{ij} = 1, and all a_{ij} \ge 0, then H increases (except in the special case where this transformation amounts to no more than a permutation of the p_j with H of course remaining the same).

5. Suppose there are two chance events x and y as in 3, not necessarily independent. For any particular value i that x can assume there is a conditional probability p_i(j) that y has the value j. This is given by

  p_i(j) = \frac{p(i, j)}{\sum_j p(i, j)}.

We define the conditional entropy of y, H_x(y), as the average of the entropy of y for each value of x, weighted according to the probability of getting that particular x. That is

  H_x(y) = -\sum_{i,j} p(i, j) \log p_i(j).

This quantity measures how uncertain we are of y on the average when we know x. Substituting the value of p_i(j) we obtain

  H_x(y) = -\sum_{i,j} p(i, j) \log p(i, j) + \sum_{i,j} p(i, j) \log \sum_j p(i, j)
         = H(x, y) - H(x)

or

  H(x, y) = H(x) + H_x(y).

The uncertainty (or entropy) of the joint event (x, y) is the uncertainty of x plus the uncertainty of y when x is known.

6. From 3 and 5 we have

  H(x) + H(y) \ge H(x, y) = H(x) + H_x(y).

Hence

  H(y) \ge H_x(y).

The uncertainty of y is never increased by knowledge of x. It will be decreased unless x and y are independent events, in which case it is not changed.
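Properties 3, 5 and 6 are easy to verify numerically. A minimal Ruby sketch, ours rather than the paper's, computing the four entropies in bits from an arbitrary joint table p(i, j):

  joint = [[0.125, 0.375],  # rows: values of x; columns: values of y
           [0.375, 0.125]]
  h = lambda do |probs|
    -probs.inject(0.0) { |s, p| p > 0 ? s + p * Math.log(p, 2) : s }
  end
  h_xy = h.call(joint.flatten)
  h_x  = h.call(joint.map { |row| row.inject(:+) })
  h_y  = h.call(joint.transpose.map { |col| col.inject(:+) })
  hx_y = h_xy - h_x       # conditional entropy H_x(y), property 5
  puts h_xy <= h_x + h_y  # property 3 => true
  puts hx_y <= h_y        # property 6 => true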
7. THE ENTROPY OF AN INFORMATION SOURCE

Consider a discrete source of the finite state type considered above. For each possible state i there will be a set of probabilities p_i(j) of producing the various possible symbols j. Thus there is an entropy H_i for each state. The entropy of the source will be defined as the average of these H_i weighted in accordance with the probability of occurrence of the states in question:

  H = \sum_i P_i H_i = -\sum_{i,j} P_i p_i(j) \log p_i(j).

This is the entropy of the source per symbol of text. If the Markoff process is proceeding at a definite time rate there is also an entropy per second

  H' = \sum_i f_i H_i

where f_i is the average frequency (occurrences per second) of state i. Clearly

  H' = mH

where m is the average number of symbols produced per second. H or H' measures the amount of information generated by the source per symbol or per second. If the logarithmic base is 2, they will represent bits per symbol or per second.

If successive symbols are independent then H is simply -\sum p_i \log p_i where p_i is the probability of symbol i. Suppose in this case we consider a long message of N symbols. It will contain with high probability about p_1 N occurrences of the first symbol, p_2 N occurrences of the second, etc. Hence the probability of this particular message will be roughly

  p = p_1^{p_1 N} p_2^{p_2 N} \cdots p_n^{p_n N}

or

  \log p \doteq N \sum_i p_i \log p_i = -NH,
  H \doteq \frac{\log 1/p}{N}.

H is thus approximately the logarithm of the reciprocal probability of a typical long sequence divided by the number of symbols in the sequence. The same result holds for any source. Stated more precisely we have (see Appendix 3):

Theorem 3: Given any \epsilon > 0 and \delta > 0, we can find an N_0 such that the sequences of any length N \ge N_0 fall into two classes:

1. A set whose total probability is less than \epsilon.

2. The remainder, all of whose members have probabilities satisfying the inequality

  \left| \frac{\log p^{-1}}{N} - H \right| < \delta.

In other words we are almost certain to have \log p^{-1} / N very close to H when N is large.

A closely related result deals with the number of sequences of various probabilities. Consider again the sequences of length N and let them be arranged in order of decreasing probability. We define n(q) to be the number we must take from this set starting with the most probable one in order to accumulate a total probability q for those taken.
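The relation H = log(1/p)/N for a typical long message can be checked directly. A minimal Ruby sketch, ours rather than the paper's, for an independent source with the probabilities of example (B):

  probs = [0.4, 0.1, 0.2, 0.2, 0.1]
  h = -probs.inject(0.0) { |s, p| s + p * Math.log(p, 2) }  # bits per symbol
  n = 1000
  # a typical message of n symbols contains about p_i * n copies of symbol i,
  # so log2 of its probability is about sum of p_i * n * log2(p_i) = -n * h
  log_p = probs.inject(0.0) { |s, p| s + p * n * Math.log(p, 2) }
  puts h             # ~= 2.12
  puts(-log_p / n)   # the same ~= 2.12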
===============================================================================
|
714
|
+
Theorem 4:

$$\lim_{N\to\infty}\frac{\log n(q)}{N} = H$$

when q does not equal 0 or 1.

We may interpret log n(q) as the number of bits required to specify the sequence when we consider only the most probable sequences with a total probability q. Then (log n(q))/N is the number of bits per symbol for the specification. The theorem says that for large N this will be independent of q and equal to H. The rate of growth of the logarithm of the number of reasonably probable sequences is given by H, regardless of our interpretation of "reasonably probable." Due to these results, which are proved in Appendix 3, it is possible for most purposes to treat the long sequences as though there were just 2^{HN} of them, each with a probability 2^{−HN}.

The next two theorems show that H and H' can be determined by limiting operations directly from the statistics of the message sequences, without reference to the states and transition probabilities between states.

Theorem 5: Let p(B_i) be the probability of a sequence B_i of symbols from the source. Let

$$G_N = -\frac{1}{N}\sum_i p(B_i)\log p(B_i)$$

where the sum is over all sequences B_i containing N symbols. Then G_N is a monotonic decreasing function of N and

$$\lim_{N\to\infty} G_N = H.$$

Theorem 6: Let p(B_i, S_j) be the probability of sequence B_i followed by symbol S_j and p_{B_i}(S_j) = p(B_i, S_j)/p(B_i) be the conditional probability of S_j after B_i. Let

$$F_N = -\sum_{i,j} p(B_i, S_j)\log p_{B_i}(S_j)$$

where the sum is over all blocks B_i of N − 1 symbols and over all symbols S_j. Then F_N is a monotonic decreasing function of N,

$$F_N = N G_N - (N-1)G_{N-1},$$
$$G_N = \frac{1}{N}\sum_{n=1}^{N} F_n,$$
$$F_N \le G_N,$$

and Lim_{N→∞} F_N = H.

These results are derived in Appendix 3. They show that a series of approximations to H can be obtained by considering only the statistical structure of the sequences extending over 1, 2, …, N symbols. F_N is the better approximation. In fact F_N is the entropy of the Nth order approximation to the source of the type discussed above. If there are no statistical influences extending over more than N symbols, that is if the conditional probability of the next symbol knowing the preceding N − 1 is not changed by a knowledge of any before that, then F_N = H. F_N of course is the conditional entropy of the next symbol when the N − 1 preceding ones are known, while G_N is the entropy per symbol of blocks of N symbols.
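These approximations are easy to estimate from data. The sketch below (ours, not the paper's) computes G_N from N-gram frequencies of a sample string and F_N from the relation F_N = N G_N − (N−1) G_{N−1}; the sample text is arbitrary, and a short sample only crudely approximates the true source statistics.

    from collections import Counter
    from math import log2

    def G(text, N):
        """G_N = -(1/N) sum p(B) log p(B), estimated from N-gram frequencies."""
        blocks = [text[i:i + N] for i in range(len(text) - N + 1)]
        counts = Counter(blocks)
        total = sum(counts.values())
        return -sum(c / total * log2(c / total) for c in counts.values()) / N

    def F(text, N):
        """F_N via F_N = N G_N - (N-1) G_{N-1}; F_1 = G_1."""
        return N * G(text, N) - (N - 1) * G(text, N - 1) if N > 1 else G(text, 1)

    sample = "the quick brown fox jumps over the lazy dog " * 50
    for n in (1, 2, 3):
        print(n, round(G(sample, n), 3), round(F(sample, n), 3))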
The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. This is the maximum compression possible when we encode into the same alphabet. One minus the relative entropy is the redundancy. The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write English half of what we write is determined by the structure of the language and half is chosen freely. The figure 50% was found by several independent methods which all gave results in this neighborhood. One is by calculation of the entropy of the approximations to English. A second method is to delete a certain fraction of the letters from a sample of English text and then let someone attempt to restore them. If they can be restored when 50% are deleted the redundancy must be greater than 50%. A third method depends on certain known results in cryptography.
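A first-order version of this calculation is easy to reproduce; the sketch below (ours) estimates the single-letter entropy of a sample and the corresponding relative entropy and redundancy over a 26-letter alphabet. Note that this captures only letter frequencies, so it gives a redundancy well below the 50% figure, which also reflects longer-range structure.

    from collections import Counter
    from math import log2

    def first_order_redundancy(text):
        """Single-letter entropy, relative entropy, and redundancy (26 letters)."""
        letters = [c for c in text.lower() if c.isalpha()]
        counts = Counter(letters)
        n = len(letters)
        H1 = -sum(c / n * log2(c / n) for c in counts.values())
        Hmax = log2(26)
        return H1, H1 / Hmax, 1 - H1 / Hmax

    # Any English sample will do; longer samples give steadier estimates.
    sample = ("information theory studies the transmission of messages "
              "over noisy and noiseless channels ") * 20
    print(first_order_redundancy(sample))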
Two extremes of redundancy in English prose are represented by Basic English and by James Joyce's book "Finnegans Wake". The Basic English vocabulary is limited to 850 words and the redundancy is very high. This is reflected in the expansion that occurs when a passage is translated into Basic English. Joyce on the other hand enlarges the vocabulary and is alleged to achieve a compression of semantic content.

The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.

8. REPRESENTATION OF THE ENCODING AND DECODING OPERATIONS
We have yet to represent mathematically the operations performed by the transmitter and receiver in encoding and decoding the information. Either of these will be called a discrete transducer. The input to the transducer is a sequence of input symbols and its output a sequence of output symbols. The transducer may have an internal memory so that its output depends not only on the present input symbol but also on the past history. We assume that the internal memory is finite, i.e., there exist a finite number m of possible states of the transducer and that its output is a function of the present state and the present input symbol. The next state will be a second function of these two quantities. Thus a transducer can be described by two functions:

$$y_n = f(x_n, \alpha_n)$$
$$\alpha_{n+1} = g(x_n, \alpha_n)$$

where x_n is the nth input symbol, α_n is the state of the transducer when the nth input symbol is introduced, and y_n is the output symbol (or sequence of output symbols) produced when x_n is introduced if the state is α_n.

If the output symbols of one transducer can be identified with the input symbols of a second, they can be connected in tandem and the result is also a transducer. If there exists a second transducer which operates on the output of the first and recovers the original input, the first transducer will be called non-singular and the second will be called its inverse.
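The pair of functions translates directly into code. Below is a minimal sketch (ours, not the paper's); the example machine, whose state remembers the previous input bit and whose output is the XOR of the current and previous bits, is invented for illustration. It is non-singular: an accumulating transducer recovers the original input.

    def run_transducer(f, g, inputs, state):
        """Drive a transducer: y_n = f(x_n, a_n), a_{n+1} = g(x_n, a_n)."""
        outputs = []
        for x in inputs:
            outputs.append(f(x, state))  # output from present input and state
            state = g(x, state)          # next state from the same two quantities
        return outputs

    f = lambda x, a: x ^ a   # output: current bit XOR previous bit
    g = lambda x, a: x       # state: remember the current bit
    print(run_transducer(f, g, [1, 0, 1, 1, 0], state=0))  # [1, 1, 1, 0, 1]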
Theorem 7: The output of a finite state transducer driven by a finite state statistical source is a finite state statistical source, with entropy (per unit time) less than or equal to that of the input. If the transducer is non-singular they are equal.

Let α represent the state of the source, which produces a sequence of symbols x_i; and let β be the state of the transducer, which produces, in its output, blocks of symbols y_j. The combined system can be represented by the "product state space" of pairs (α, β). Two points in the space, (α₁, β₁) and (α₂, β₂), are connected by a line if α₁ can produce an x which changes β₁ to β₂, and this line is given the probability of that x in this case. The line is labeled with the block of y_j symbols produced by the transducer. The entropy of the output can be calculated as the weighted sum over the states. If we sum first on β each resulting term is less than or equal to the corresponding term for α, hence the entropy is not increased. If the transducer is non-singular let its output be connected to the inverse transducer. If H′₁, H′₂ and H′₃ are the output entropies of the source, the first and second transducers respectively, then H′₁ ≥ H′₂ ≥ H′₃ = H′₁ and therefore H′₁ = H′₂.
Suppose we have a system of constraints on possible sequences of the type which can be represented by a linear graph as in Fig. 2. If probabilities p_{ij}^{(s)} were assigned to the various lines connecting state i to state j this would become a source. There is one particular assignment which maximizes the resulting entropy (see Appendix 4).

Theorem 8: Let the system of constraints considered as a channel have a capacity C = log W. If we assign

$$p_{ij}^{(s)} = \frac{B_j}{B_i} W^{-\ell_{ij}^{(s)}}$$

where ℓ_{ij}^{(s)} is the duration of the sth symbol leading from state i to state j and the B_i satisfy

$$B_i = \sum_{s,j} B_j W^{-\ell_{ij}^{(s)}}$$

then H is maximized and equal to C.

By proper assignment of the transition probabilities the entropy of symbols on a channel can be maximized at the channel capacity.
9. THE FUNDAMENTAL THEOREM FOR A NOISELESS CHANNEL

We will now justify our interpretation of H as the rate of generating information by proving that H determines the channel capacity required with most efficient coding.

Theorem 9: Let a source have entropy H (bits per symbol) and a channel have a capacity C (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate C/H − ε symbols per second over the channel where ε is arbitrarily small. It is not possible to transmit at an average rate greater than C/H.

The converse part of the theorem, that C/H cannot be exceeded, may be proved by noting that the entropy of the channel input per second is equal to that of the source, since the transmitter must be non-singular, and also this entropy cannot exceed the channel capacity. Hence H′ ≤ C and the number of symbols per second = H′/H ≤ C/H.

The first part of the theorem will be proved in two different ways. The first method is to consider the set of all sequences of N symbols produced by the source. For N large we can divide these into two groups, one containing less than 2^{(H+η)N} members and the second containing less than 2^{RN} members (where R is the logarithm of the number of different symbols) and having a total probability less than μ. As N increases η and μ approach zero. The number of signals of duration T in the channel is greater than 2^{(C−θ)T} with θ small when T is large. If we choose

$$T = \left(\frac{H}{C} + \lambda\right)N$$

then there will be a sufficient number of sequences of channel symbols for the high probability group when N and T are sufficiently large (however small λ) and also some additional ones. The high probability group is coded in an arbitrary one-to-one way into this set. The remaining sequences are represented by larger sequences, starting and ending with one of the sequences not used for the high probability group. This special sequence acts as a start and stop signal for a different code. In between a sufficient time is allowed to give enough different sequences for all the low probability messages. This will require

$$T_1 = \left(\frac{R}{C} + \varphi\right)N$$

where φ is small. The mean rate of transmission in message symbols per second will then be greater than

$$\left[(1-\delta)\frac{T}{N} + \delta\frac{T_1}{N}\right]^{-1} = \left[(1-\delta)\left(\frac{H}{C}+\lambda\right) + \delta\left(\frac{R}{C}+\varphi\right)\right]^{-1}.$$
As N increases δ, λ and φ approach zero and the rate approaches C/H.

Another method of performing this coding and thereby proving the theorem can be described as follows: Arrange the messages of length N in order of decreasing probability and suppose their probabilities are p₁ ≥ p₂ ≥ p₃ ≥ … ≥ p_n. Let P_s = Σ_{i=1}^{s−1} p_i; that is, P_s is the cumulative probability up to, but not including, p_s. We first encode into a binary system. The binary code for message s is obtained by expanding P_s as a binary number. The expansion is carried out to m_s places, where m_s is the integer satisfying:

$$\log_2\frac{1}{p_s} \le m_s < 1 + \log_2\frac{1}{p_s}.$$

Thus the messages of high probability are represented by short codes and those of low probability by long codes. From these inequalities we have

$$\frac{1}{2^{m_s}} \le p_s < \frac{1}{2^{m_s-1}}.$$

The code for P_s will differ from all succeeding ones in one or more of its m_s places, since all the remaining P_i are at least 2^{−m_s} larger and their binary expansions therefore differ in the first m_s places. Consequently all the codes are different and it is possible to recover the message from its code. If the channel sequences are not already sequences of binary digits, they can be ascribed binary numbers in an arbitrary fashion and the binary code thus translated into signals suitable for the channel.
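This construction is short enough to carry out directly; the sketch below (ours) computes m_s as the smallest integer not less than log₂(1/p_s) and expands each cumulative probability P_s to m_s binary places. On the dyadic example of Section 10 it reproduces the code given there.

    from math import ceil, log2

    def shannon_code(probs):
        """Code words from binary expansions of cumulative probabilities.
        probs must be sorted in decreasing order."""
        codes, P = [], 0.0
        for p in probs:
            m = ceil(log2(1 / p))        # log2(1/p) <= m < 1 + log2(1/p)
            word, frac = "", P
            for _ in range(m):           # expand P_s to m_s binary places
                frac *= 2
                word += str(int(frac))
                frac -= int(frac)
            codes.append(word)
            P += p
        return codes

    print(shannon_code([0.5, 0.25, 0.125, 0.125]))  # ['0', '10', '110', '111']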
The average number H′ of binary digits used per symbol of original message is easily estimated. We have

$$H' = \frac{1}{N}\sum m_s p_s.$$

But,

$$\frac{1}{N}\sum\left(\log_2\frac{1}{p_s}\right)p_s \le \frac{1}{N}\sum m_s p_s < \frac{1}{N}\sum\left(1 + \log_2\frac{1}{p_s}\right)p_s$$

and therefore,

$$G_N \le H' < G_N + \frac{1}{N}.$$

As N increases G_N approaches H, the entropy of the source, and H′ approaches H.

We see from this that the inefficiency in coding, when only a finite delay of N symbols is used, need not be greater than 1/N plus the difference between the true entropy H and the entropy G_N calculated for sequences of length N. The per cent excess time needed over the ideal is therefore less than

$$\frac{G_N}{H} + \frac{1}{HN} - 1.$$

This method of encoding is substantially the same as one found independently by R. M. Fano.9 His method is to arrange the messages of length N in order of decreasing probability. Divide this series into two groups of as nearly equal probability as possible. If the message is in the first group its first binary digit will be 0, otherwise 1. The groups are similarly divided into subsets of nearly equal probability and the particular subset determines the second binary digit. This process is continued until each subset contains only one message. It is easily seen that apart from minor differences (generally in the last digit) this amounts to the same thing as the arithmetic process described above.
10. DISCUSSION AND EXAMPLES

In order to obtain the maximum power transfer from a generator to a load, a transformer must in general be introduced so that the generator as seen from the load has the load resistance. The situation here is roughly analogous. The transducer which does the encoding should match the source to the channel in a statistical sense. The source as seen from the channel through the transducer should have the same statistical structure as the source which maximizes the entropy in the channel. The content of Theorem 9 is that, although an exact match is not in general possible, we can approximate it as closely as desired. The ratio of the actual rate of transmission to the capacity C may be called the efficiency of the coding system. This is of course equal to the ratio of the actual entropy of the channel symbols to the maximum possible entropy.

9 Technical Report No. 65, The Research Laboratory of Electronics, M.I.T., March 17, 1949.

In general, ideal or nearly ideal encoding requires a long delay in the transmitter and receiver. In the noiseless case which we have been considering, the main function of this delay is to allow reasonably good matching of probabilities to corresponding lengths of sequences. With a good code the logarithm of the reciprocal probability of a long message must be proportional to the duration of the corresponding signal, in fact

$$\left|\frac{\log p^{-1}}{T} - C\right|$$

must be small for all but a small fraction of the long messages.

If a source can produce only one particular message its entropy is zero, and no channel is required. For example, a computing machine set up to calculate the successive digits of π produces a definite sequence with no chance element. No channel is required to "transmit" this to another point. One could construct a second machine to compute the same sequence at the point. However, this may be impractical. In such a case we can choose to ignore some or all of the statistical knowledge we have of the source. We might consider the digits of π to be a random sequence in that we construct a system capable of sending any sequence of digits. In a similar way we may choose to use some of our statistical knowledge of English in constructing a code, but not all of it. In such a case we consider the source with the maximum entropy subject to the statistical conditions we wish to retain. The entropy of this source determines the channel capacity which is necessary and sufficient. In the example the only information retained is that all the digits are chosen from the set 0, 1, …, 9. In the case of English one might wish to use the statistical saving possible due to letter frequencies, but nothing else. The maximum entropy source is then the first approximation to English and its entropy determines the required channel capacity.

As a simple example of some of these results consider a source which produces a sequence of letters chosen from among A, B, C, D with probabilities 1/2, 1/4, 1/8, 1/8, successive symbols being chosen independently. We have

$$H = -\left(\tfrac12\log\tfrac12 + \tfrac14\log\tfrac14 + \tfrac28\log\tfrac18\right) = \tfrac74 \text{ bits per symbol}.$$

Thus we can approximate a coding system to encode messages from this source into binary digits with an average of 7/4 binary digits per symbol. In this case we can actually achieve the limiting value by the following code (obtained by the method of the second proof of Theorem 9):

A  0
B  10
C  110
D  111

The average number of binary digits used in encoding a sequence of N symbols will be

$$N\left(\tfrac12\times1 + \tfrac14\times2 + \tfrac28\times3\right) = \tfrac74 N.$$

It is easily seen that the binary digits 0, 1 have probabilities 1/2, 1/2 so the H for the coded sequences is one bit per symbol. Since, on the average, we have 7/4 binary symbols per original letter, the entropies on a time basis are the same. The maximum possible entropy for the original set is log 4 = 2, occurring when A, B, C, D have probabilities 1/4, 1/4, 1/4, 1/4. Hence the relative entropy is 7/8. We can translate the binary sequences into the original set of symbols on a two-to-one basis by the following table:

00  A'
01  B'
10  C'
11  D'

This double process then encodes the original message into the same symbols but with an average compression ratio 7/8.
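A quick numerical check of this example (ours, not the paper's): the source entropy, the average length of the code above, and the empirical balance of 0's and 1's in the coded stream.

    import random
    from math import log2

    probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}
    code  = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

    H = -sum(p * log2(p) for p in probs.values())
    avg_len = sum(p * len(code[s]) for s, p in probs.items())
    print(H, avg_len)  # both 1.75 = 7/4

    random.seed(0)
    msg = random.choices(list(probs), weights=list(probs.values()), k=100_000)
    bits = "".join(code[s] for s in msg)
    print(bits.count('1') / len(bits))  # close to 0.5, as claimed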
As a second example consider a source which produces a sequence of A's and B's with probability p for A and q for B. If p ≪ q we have

$$H = -\log p^p(1-p)^{1-p} = -p\log\left[p(1-p)^{(1-p)/p}\right] \doteq p\log\frac{e}{p}.$$

In such a case one can construct a fairly good coding of the message on a 0, 1 channel by sending a special sequence, say 0000, for the infrequent symbol A and then a sequence indicating the number of B's following it. This could be indicated by the binary representation with all numbers containing the special sequence deleted. All numbers up to 16 are represented as usual; 16 is represented by the next binary number after 16 which does not contain four zeros, namely 17 = 10001, etc. It can be shown that as p → 0 the coding approaches ideal provided the length of the special sequence is properly adjusted.
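The approximation is already good for modest p; a small check (ours), using base-2 logarithms:

    from math import e, log2

    for p in (0.1, 0.01, 0.001):
        q = 1 - p
        exact = -p * log2(p) - q * log2(q)   # H in bits per symbol
        approx = p * log2(e / p)             # p log (e/p)
        print(p, round(exact, 5), round(approx, 5))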
PART II: THE DISCRETE CHANNEL WITH NOISE

11. REPRESENTATION OF A NOISY DISCRETE CHANNEL

We now consider the case where the signal is perturbed by noise during transmission or at one or the other of the terminals. This means that the received signal is not necessarily the same as that sent out by the transmitter. Two cases may be distinguished. If a particular transmitted signal always produces the same received signal, i.e., the received signal is a definite function of the transmitted signal, then the effect may be called distortion. If this function has an inverse -- no two transmitted signals producing the same received signal -- distortion may be corrected, at least in principle, by merely performing the inverse functional operation on the received signal.

The case of interest here is that in which the signal does not always undergo the same change in transmission. In this case we may assume the received signal E to be a function of the transmitted signal S and a second variable, the noise N:

$$E = f(S, N).$$

The noise is considered to be a chance variable just as the message was above. In general it may be represented by a suitable stochastic process. The most general type of noisy discrete channel we shall consider is a generalization of the finite state noise-free channel described previously. We assume a finite number of states and a set of probabilities

$$p_{\alpha,i}(\beta, j).$$

This is the probability, if the channel is in state α and symbol i is transmitted, that symbol j will be received and the channel left in state β. Thus α and β range over the possible states, i over the possible transmitted signals and j over the possible received signals. In the case where successive symbols are independently perturbed by the noise there is only one state, and the channel is described by the set of transition probabilities p_i(j), the probability of transmitted symbol i being received as j.

If a noisy channel is fed by a source there are two statistical processes at work: the source and the noise. Thus there are a number of entropies that can be calculated. First there is the entropy H(x) of the source or of the input to the channel (these will be equal if the transmitter is non-singular). The entropy of the output of the channel, i.e., the received signal, will be denoted by H(y). In the noiseless case H(y) = H(x). The joint entropy of input and output will be H(xy). Finally there are two conditional entropies H_x(y) and H_y(x), the entropy of the output when the input is known and conversely. Among these quantities we have the relations

$$H(x, y) = H(x) + H_x(y) = H(y) + H_y(x).$$

All of these entropies can be measured on a per-second or a per-symbol basis.
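These identities are easy to verify numerically for any joint distribution; below is a minimal sketch (ours) with an arbitrary made-up p(x, y).

    import numpy as np

    def H(p):
        """Entropy in bits of an array of probabilities."""
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # Arbitrary joint distribution p(x, y): 2 inputs, 3 outputs.
    pxy = np.array([[0.30, 0.10, 0.05],
                    [0.05, 0.20, 0.30]])
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    Hxy = H(pxy.ravel())
    Hx_y = -(pxy * np.log2(pxy / px[:, None])).sum()  # H_x(y)
    Hy_x = -(pxy * np.log2(pxy / py[None, :])).sum()  # H_y(x)
    print(Hxy, H(px) + Hx_y, H(py) + Hy_x)            # all three agree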
12. EQUIVOCATION AND CHANNEL CAPACITY

If the channel is noisy it is not in general possible to reconstruct the original message or the transmitted signal with certainty by any operation on the received signal E. There are, however, ways of transmitting the information which are optimal in combating noise. This is the problem which we now consider.

Suppose there are two possible symbols 0 and 1, and we are transmitting at a rate of 1000 symbols per second with probabilities p₀ = p₁ = 1/2. Thus our source is producing information at the rate of 1000 bits per second. During transmission the noise introduces errors so that, on the average, 1 in 100 is received incorrectly (a 0 as 1, or 1 as 0). What is the rate of transmission of information? Certainly less than 1000 bits per second since about 1% of the received symbols are incorrect. Our first impulse might be to say the rate is 990 bits per second, merely subtracting the expected number of errors. This is not satisfactory since it fails to take into account the recipient's lack of knowledge of where the errors occur. We may carry it to an extreme case and suppose the noise so great that the received symbols are entirely independent of the transmitted symbols. The probability of receiving 1 is 1/2 whatever was transmitted and similarly for 0. Then about half of the received symbols are correct due to chance alone, and we would be giving the system credit for transmitting 500 bits per second while actually no information is being transmitted at all. Equally "good" transmission would be obtained by dispensing with the channel entirely and flipping a coin at the receiving point.

Evidently the proper correction to apply to the amount of information transmitted is the amount of this information which is missing in the received signal, or alternatively the uncertainty when we have received a signal of what was actually sent. From our previous discussion of entropy as a measure of uncertainty it seems reasonable to use the conditional entropy of the message, knowing the received signal, as a measure of this missing information. This is indeed the proper definition, as we shall see later. Following this idea the rate of actual transmission, R, would be obtained by subtracting from the rate of production (i.e., the entropy of the source) the average rate of conditional entropy.

$$R = H(x) - H_y(x)$$

The conditional entropy H_y(x) will, for convenience, be called the equivocation. It measures the average ambiguity of the received signal.

In the example considered above, if a 0 is received the a posteriori probability that a 0 was transmitted is .99, and that a 1 was transmitted is .01. These figures are reversed if a 1 is received. Hence

$$H_y(x) = -[.99\log .99 + 0.01\log 0.01] = .081 \text{ bits/symbol}$$

or 81 bits per second. We may say that the system is transmitting at a rate 1000 − 81 = 919 bits per second. In the extreme case where a 0 is equally likely to be received as a 0 or 1 and similarly for 1, the a posteriori probabilities are 1/2, 1/2 and

$$H_y(x) = -\left[\tfrac12\log\tfrac12 + \tfrac12\log\tfrac12\right] = 1 \text{ bit per symbol}$$

or 1000 bits per second. The rate of transmission is then 0 as it should be.
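The arithmetic of this example in code (ours):

    from math import log2

    def equivocation(err):
        """H_y(x) per symbol for a binary channel with error probability err."""
        return -(err * log2(err) + (1 - err) * log2(1 - err))

    rate = 1000  # symbols, and here bits, per second
    print(equivocation(0.01) * rate)         # about 81 bits per second
    print(rate - equivocation(0.01) * rate)  # about 919 bits per second
    print(equivocation(0.5) * rate)          # 1000: nothing gets through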
The following theorem gives a direct intuitive interpretation of the equivocation and also serves to justify it as the unique appropriate measure. We consider a communication system and an observer (or auxiliary device) who can see both what is sent and what is recovered (with errors due to noise). This observer notes the errors in the recovered message and transmits data to the receiving point over a "correction channel" to enable the receiver to correct the errors. The situation is indicated schematically in Fig. 8.

Theorem 10: If the correction channel has a capacity equal to H_y(x) it is possible to so encode the correction data as to send it over this channel and correct all but an arbitrarily small fraction ε of the errors. This is not possible if the channel capacity is less than H_y(x).

Fig. 8 -- Schematic diagram of a correction system. [An observer compares the original message M with the recovered message M′ and sends correction data to a correcting device at the receiving point.]

Roughly then, H_y(x) is the amount of additional information that must be supplied per second at the receiving point to correct the received message.

To prove the first part, consider long sequences of received message M′ and corresponding original message M. There will be logarithmically T H_y(x) of the M's which could reasonably have produced each M′. Thus we have T H_y(x) binary digits to send each T seconds. This can be done with ε frequency of errors on a channel of capacity H_y(x).

The second part can be proved by noting, first, that for any discrete chance variables x, y, z

$$H_y(x, z) \ge H_y(x).$$

The left-hand side can be expanded to give

$$H_y(z) + H_{yz}(x) \ge H_y(x)$$
$$H_{yz}(x) \ge H_y(x) - H_y(z) \ge H_y(x) - H(z).$$

If we identify x as the output of the source, y as the received signal and z as the signal sent over the correction channel, then the right-hand side is the equivocation less the rate of transmission over the correction channel. If the capacity of this channel is less than the equivocation the right-hand side will be greater than zero and H_{yz}(x) > 0. But this is the uncertainty of what was sent, knowing both the received signal and the correction signal. If this is greater than zero the frequency of errors cannot be arbitrarily small.

Example: Suppose the errors occur at random in a sequence of binary digits: probability p that a digit is wrong and q = 1 − p that it is right. These errors can be corrected if their position is known. Thus the correction channel need only send information as to these positions. This amounts to transmitting from a source which produces binary digits with probability p for 1 (incorrect) and q for 0 (correct). This requires a channel of capacity

$$-[p\log p + q\log q]$$

which is the equivocation of the original system.

The rate of transmission R can be written in two other forms due to the identities noted above. We have

$$R = H(x) - H_y(x) = H(y) - H_x(y) = H(x) + H(y) - H(x, y).$$
The first defining expression has already been interpreted as the amount of information sent less the uncertainty of what was sent. The second measures the amount received less the part of this which is due to noise. The third is the sum of the two amounts less the joint entropy and therefore in a sense is the number of bits per second common to the two. Thus all three expressions have a certain intuitive significance.

The capacity C of a noisy channel should be the maximum possible rate of transmission, i.e., the rate when the source is properly matched to the channel. We therefore define the channel capacity by

$$C = \operatorname{Max}\left(H(x) - H_y(x)\right)$$

where the maximum is with respect to all possible information sources used as input to the channel. If the channel is noiseless, H_y(x) = 0. The definition is then equivalent to that already given for a noiseless channel since the maximum entropy for the channel is its capacity.

13. THE FUNDAMENTAL THEOREM FOR A DISCRETE CHANNEL WITH NOISE
It may seem surprising that we should define a definite capacity C for a noisy channel since we can never send certain information in such a case. It is clear, however, that by sending the information in a redundant form the probability of errors can be reduced. For example, by repeating the message many times and by a statistical study of the different received versions of the message the probability of errors could be made very small. One would expect, however, that to make this probability of errors approach zero, the redundancy of the encoding must increase indefinitely, and the rate of transmission therefore approach zero. This is by no means true. If it were, there would not be a very well defined capacity, but only a capacity for a given frequency of errors, or a given equivocation; the capacity going down as the error requirements are made more stringent. Actually the capacity C defined above has a very definite significance. It is possible to send information at the rate C through the channel with as small a frequency of errors or equivocation as desired by proper encoding. This statement is not true for any rate greater than C. If an attempt is made to transmit at a higher rate than C, say C + R₁, then there will necessarily be an equivocation equal to or greater than the excess R₁. Nature takes payment by requiring just that much uncertainty, so that we are not actually getting any more than C through correctly.

The situation is indicated in Fig. 9. The rate of information into the channel is plotted horizontally and the equivocation vertically. Any point above the heavy line in the shaded region can be attained and those below cannot. The points on the line cannot in general be attained, but there will usually be two points on the line that can.

These results are the main justification for the definition of C and will now be proved.

Theorem 11: Let a discrete channel have the capacity C and a discrete source the entropy per second H. If H ≤ C there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors (or an arbitrarily small equivocation). If H > C it is possible to encode the source so that the equivocation is less than H − C + ε where ε is arbitrarily small. There is no method of encoding which gives an equivocation less than H − C.

The method of proving the first part of this theorem is not by exhibiting a coding method having the desired properties, but by showing that such a code must exist in a certain group of codes. In fact we will

Fig. 9 -- The equivocation possible for a given input entropy to a channel. [Equivocation H_y(x) is plotted against input entropy H(x); the attainable region lies above a line of slope 1.0 beginning at H(x) = C.]
average the frequency of errors over this group and show that this average can be made less than ε. If the average of a set of numbers is less than ε there must exist at least one in the set which is less than ε. This will establish the desired result.

The capacity C of a noisy channel has been defined as

$$C = \operatorname{Max}\left(H(x) - H_y(x)\right)$$

where x is the input and y the output. The maximization is over all sources which might be used as input to the channel.

Let S₀ be a source which achieves the maximum capacity C. If this maximum is not actually achieved by any source let S₀ be a source which approximates to giving the maximum rate. Suppose S₀ is used as input to the channel. We consider the possible transmitted and received sequences of a long duration T. The following will be true:

1. The transmitted sequences fall into two classes, a high probability group with about 2^{T H(x)} members and the remaining sequences of small total probability.

2. Similarly the received sequences have a high probability set of about 2^{T H(y)} members and a low probability set of remaining sequences.

3. Each high probability output could be produced by about 2^{T H_y(x)} inputs. The probability of all other cases has a small total probability.

All the ε's and δ's implied by the words "small" and "about" in these statements approach zero as we allow T to increase and S₀ to approach the maximizing source.

The situation is summarized in Fig. 10 where the input sequences are points on the left and output sequences points on the right. The fan of cross lines represents the range of possible causes for a typical output.

Fig. 10 -- Schematic representation of the relations between inputs and outputs in a channel. [There are 2^{H(x)T} high probability messages M, 2^{H(y)T} high probability received signals E, 2^{H_y(x)T} reasonable causes for each E, and 2^{H_x(y)T} reasonable effects for each M.]

Now suppose we have another source producing information at rate R with R < C. In the period T this source will have 2^{TR} high probability messages. We wish to associate these with a selection of the possible channel inputs in such a way as to get a small frequency of errors. We will set up this association in all possible ways (using, however, only the high probability group of inputs as determined by the source S₀) and average the frequency of errors for this large class of possible coding systems. This is the same as calculating the frequency of errors for a random association of the messages and channel inputs of duration T. Suppose a particular output y₁ is observed. What is the probability of more than one message in the set of possible causes of y₁? There are 2^{TR} messages distributed at random in 2^{T H(x)} points. The probability of a particular point being a message is thus

$$2^{T(R - H(x))}.$$

The probability that none of the points in the fan is a message (apart from the actual originating message) is

$$P = \left[1 - 2^{T(R-H(x))}\right]^{2^{T H_y(x)}}.$$

Now R < H(x) − H_y(x) so R − H(x) = −H_y(x) − η with η positive. Consequently

$$P = \left[1 - 2^{-T H_y(x) - T\eta}\right]^{2^{T H_y(x)}}$$

approaches (as T → ∞)

$$1 - 2^{-T\eta}.$$

Hence the probability of an error approaches zero and the first part of the theorem is proved.

The second part of the theorem is easily shown by noting that we could merely send C bits per second from the source, completely neglecting the remainder of the information generated. At the receiver the neglected part gives an equivocation H(x) − C and the part transmitted need only add ε. This limit can also be attained in many other ways, as will be shown when we consider the continuous case.

The last statement of the theorem is a simple consequence of our definition of C. Suppose we can encode a source with H(x) = C + a in such a way as to obtain an equivocation H_y(x) = a − ε with ε positive. Then R = H(x) = C + a and

$$H(x) - H_y(x) = C + \varepsilon$$

with ε positive. This contradicts the definition of C as the maximum of H(x) − H_y(x).

Actually more has been proved than was stated in the theorem. If the average of a set of numbers is within ε of their maximum, a fraction of at most √ε can be more than √ε below the maximum. Since ε is arbitrarily small we can say that almost all the systems are arbitrarily close to the ideal.
14. DISCUSSION

The demonstration of Theorem 11, while not a pure existence proof, has some of the deficiencies of such proofs. An attempt to obtain a good approximation to ideal coding by following the method of the proof is generally impractical. In fact, apart from some rather trivial cases and certain limiting situations, no explicit description of a series of approximations to the ideal has been found. Probably this is no accident but is related to the difficulty of giving an explicit construction for a good approximation to a random sequence.

An approximation to the ideal would have the property that if the signal is altered in a reasonable way by the noise, the original can still be recovered. In other words the alteration will not in general bring it closer to another reasonable signal than the original. This is accomplished at the cost of a certain amount of redundancy in the coding. The redundancy must be introduced in the proper way to combat the particular noise structure involved. However, any redundancy in the source will usually help if it is utilized at the receiving point. In particular, if the source already has a certain redundancy and no attempt is made to eliminate it in matching to the channel, this redundancy will help combat noise. For example, in a noiseless telegraph channel one could save about 50% in time by proper encoding of the messages. This is not done and most of the redundancy of English remains in the channel symbols. This has the advantage, however, of allowing considerable noise in the channel. A sizable fraction of the letters can be received incorrectly and still reconstructed by the context. In fact this is probably not a bad approximation to the ideal in many cases, since the statistical structure of English is rather involved and the reasonable English sequences are not too far (in the sense required for the theorem) from a random selection.
As in the noiseless case a delay is generally required to approach the ideal encoding. It now has the additional function of allowing a large sample of noise to affect the signal before any judgment is made at the receiving point as to the original message. Increasing the sample size always sharpens the possible statistical assertions.

The content of Theorem 11 and its proof can be formulated in a somewhat different way which exhibits the connection with the noiseless case more clearly. Consider the possible signals of duration T and suppose a subset of them is selected to be used. Let those in the subset all be used with equal probability, and suppose the receiver is constructed to select, as the original signal, the most probable cause from the subset, when a perturbed signal is received. We define N(T, q) to be the maximum number of signals we can choose for the subset such that the probability of an incorrect interpretation is less than or equal to q.

Theorem 12:

$$\lim_{T\to\infty}\frac{\log N(T, q)}{T} = C,$$

where C is the channel capacity, provided that q does not equal 0 or 1.

In other words, no matter how we set our limits of reliability, we can distinguish reliably in time T enough messages to correspond to about CT bits, when T is sufficiently large. Theorem 12 can be compared with the definition of the capacity of a noiseless channel given in Section 1.
15. EXAMPLE OF A DISCRETE CHANNEL AND ITS CAPACITY

A simple example of a discrete channel is indicated in Fig. 11. There are three possible symbols. The first is never affected by noise. The second and third each have probability p of coming through undisturbed, and q of being changed into the other of the pair.

Fig. 11 -- Example of a discrete channel. [Three transmitted symbols; the first passes undisturbed, while the second and third each go to the corresponding received symbol with probability p and to the other member of the pair with probability q.]

We have (letting α = −[p log p + q log q] and P and Q be the probabilities of using the first and second symbols)

$$H(x) = -P\log P - 2Q\log Q$$
$$H_y(x) = 2Q\alpha.$$

We wish to choose P and Q in such a way as to maximize H(x) − H_y(x), subject to the constraint P + 2Q = 1. Hence we consider

$$U = -P\log P - 2Q\log Q - 2Q\alpha + \lambda(P + 2Q)$$
$$\frac{\partial U}{\partial P} = -1 - \log P + \lambda = 0$$
$$\frac{\partial U}{\partial Q} = -2 - 2\log Q - 2\alpha + 2\lambda = 0.$$

Eliminating λ

$$\log P = \log Q + \alpha$$
$$P = Qe^{\alpha} = Q\beta.$$
$$P = \frac{\beta}{\beta+2} \qquad Q = \frac{1}{\beta+2}.$$

The channel capacity is then

$$C = \log\frac{\beta+2}{\beta}.$$

Note how this checks the obvious values in the cases p = 1 and p = 1/2. In the first, β = 1 and C = log 3, which is correct since the channel is then noiseless with three possible symbols. If p = 1/2, β = 2 and C = log 2. Here the second and third symbols cannot be distinguished at all and act together like one symbol. The first symbol is used with probability P = 1/2 and the second and third together with probability 1/2. This may be distributed between them in any desired way and still achieve the maximum capacity.

For intermediate values of p the channel capacity will lie between log 2 and log 3. The distinction between the second and third symbols conveys some information but not as much as in the noiseless case. The first symbol is used somewhat more frequently than the other two because of its freedom from noise.
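With base-2 logarithms the same result reads β = 2^α; the sketch below (ours) evaluates C = log((β+2)/β) and reproduces the two checks above.

    from math import log2

    def capacity(p):
        """Capacity in bits of the three-symbol channel of Fig. 11."""
        q = 1 - p
        alpha = 0.0 if p in (0, 1) else -(p * log2(p) + q * log2(q))
        beta = 2 ** alpha
        return log2((beta + 2) / beta)

    print(capacity(1.0))  # log 3 = 1.585...: noiseless, three symbols
    print(capacity(0.5))  # log 2 = 1.0: symbols 2 and 3 indistinguishable
    print(capacity(0.9))  # an intermediate value between log 2 and log 3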
16. THE CHANNEL CAPACITY IN CERTAIN SPECIAL CASES

If the noise affects successive channel symbols independently it can be described by a set of transition probabilities p_{ij}. This is the probability, if symbol i is sent, that j will be received. The maximum channel rate is then given by the maximum of

$$-\sum_{i,j} P_i p_{ij}\log\sum_i P_i p_{ij} + \sum_{i,j} P_i p_{ij}\log p_{ij}$$

where we vary the P_i subject to Σ P_i = 1. This leads by the method of Lagrange to the equations,

$$\sum_j p_{sj}\log\frac{p_{sj}}{\sum_i P_i p_{ij}} = \mu \qquad s = 1, 2, \ldots.$$

Multiplying by P_s and summing on s shows that μ = C. Let the inverse of p_{sj} (if it exists) be h_{st} so that Σ_s h_{st} p_{sj} = δ_{tj}. Then:

$$\sum_{s,j} h_{st}\,p_{sj}\log p_{sj} - \log\sum_i P_i p_{it} = C\sum_s h_{st}.$$

Hence:

$$\sum_i P_i p_{it} = \exp\left[-C\sum_s h_{st} + \sum_{s,j} h_{st}\,p_{sj}\log p_{sj}\right]$$

or,

$$P_i = \sum_t h_{it}\exp\left[-C\sum_s h_{st} + \sum_{s,j} h_{st}\,p_{sj}\log p_{sj}\right].$$

This is the system of equations for determining the maximizing values of P_i, with C to be determined so that Σ P_i = 1. When this is done C will be the channel capacity, and the P_i the proper probabilities for the channel symbols to achieve this capacity.

If each input symbol has the same set of probabilities on the lines emerging from it, and the same is true of each output symbol, the capacity can be easily calculated. Examples are shown in Fig. 12. In such a case H_x(y) is independent of the distribution of probabilities on the input symbols, and is given by −Σ p_i log p_i where the p_i are the values of the transition probabilities from any input symbol. The channel capacity is

$$\operatorname{Max}\left[H(y) - H_x(y)\right] = \operatorname{Max} H(y) + \sum p_i\log p_i.$$

The maximum of H(y) is clearly log m where m is the number of output symbols, since it is possible to make them all equally probable by making the input symbols equally probable. The channel capacity is therefore

$$C = \log m + \sum p_i\log p_i.$$
Fig. 12 -- Examples of discrete channels with the same transition probabilities for each input and for each output. [The original figure shows three channels, a, b and c, built from the transition probabilities 1/2, 1/3 and 1/6; the structures consistent with the capacities computed below are: (a) four inputs, each reaching two outputs with probabilities 1/2, 1/2; (b) four inputs, each reaching four outputs with probabilities 1/3, 1/3, 1/6, 1/6; (c) three inputs, each reaching three outputs with probabilities 1/2, 1/3, 1/6.]

In Fig. 12a it would be

$$C = \log 4 - \log 2 = \log 2.$$

This could be achieved by using only the 1st and 3d symbols. In Fig. 12b

$$C = \log 4 - \tfrac23\log 3 - \tfrac13\log 6 = \log 4 - \log 3 - \tfrac13\log 2 = \log\tfrac{2^{5/3}}{3}.$$

In Fig. 12c we have

$$C = \log 3 - \tfrac12\log 2 - \tfrac13\log 3 - \tfrac16\log 6 = \log\frac{3}{2^{1/2}\,3^{1/3}\,6^{1/6}}.$$
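These three values follow from C = log m + Σ p_i log p_i; a sketch (ours), with the transition-probability sets inferred above:

    from math import log2

    def uniform_channel_capacity(m, p):
        """C = log m + sum p_i log p_i, for a channel in which every input
        (and every output) sees the same set of transition probabilities p."""
        return log2(m) + sum(pi * log2(pi) for pi in p)

    print(uniform_channel_capacity(4, [1/2, 1/2]))            # Fig. 12a: 1.0 bit
    print(uniform_channel_capacity(4, [1/3, 1/3, 1/6, 1/6]))  # Fig. 12b: ~0.082
    print(uniform_channel_capacity(3, [1/2, 1/3, 1/6]))       # Fig. 12c: ~0.126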
Suppose the symbols fall into several groups such that the noise never causes a symbol in one group to be mistaken for a symbol in another group. Let the capacity for the nth group be C_n (in bits per second) when we use only the symbols in this group. Then it is easily shown that, for best use of the entire set, the total probability P_n of all symbols in the nth group should be

$$P_n = \frac{2^{C_n}}{\sum 2^{C_n}}.$$

Within a group the probability is distributed just as it would be if these were the only symbols being used. The channel capacity is

$$C = \log\sum 2^{C_n}.$$

17. AN EXAMPLE OF EFFICIENT CODING
The following example, although somewhat unrealistic, is a case in which exact matching to a noisy channel is possible. There are two channel symbols, 0 and 1, and the noise affects them in blocks of seven symbols. A block of seven is either transmitted without error, or exactly one symbol of the seven is incorrect. These eight possibilities are equally likely. We have

$$C = \operatorname{Max}\left[H(y) - H_x(y)\right] = \tfrac17\left[7 + 8\cdot\tfrac18\log\tfrac18\right] = \tfrac47 \text{ bits/symbol}.$$

An efficient code, allowing complete correction of errors and transmitting at the rate C, is the following (found by a method due to R. Hamming):

Let a block of seven symbols be X₁, X₂, …, X₇. Of these X₃, X₅, X₆ and X₇ are message symbols and chosen arbitrarily by the source. The other three are redundant and calculated as follows:

X₄ is chosen to make α = X₄ + X₅ + X₆ + X₇ even
X₂ is chosen to make β = X₂ + X₃ + X₆ + X₇ even
X₁ is chosen to make γ = X₁ + X₃ + X₅ + X₇ even.

When a block of seven is received α, β and γ are calculated and if even called zero, if odd called one. The binary number α β γ then gives the subscript of the X_i that is incorrect (if 0 there was no error).
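A direct transcription of this construction into code (ours); bits are indexed X₁ … X₇ as in the text.

    def hamming_encode(x3, x5, x6, x7):
        """Fill in X4, X2, X1 so that each of the three parity sums is even."""
        x4 = (x5 + x6 + x7) % 2
        x2 = (x3 + x6 + x7) % 2
        x1 = (x3 + x5 + x7) % 2
        return [x1, x2, x3, x4, x5, x6, x7]

    def hamming_correct(block):
        """Recompute the parities; the binary number (alpha beta gamma)
        gives the subscript of the incorrect symbol (0 means no error)."""
        x1, x2, x3, x4, x5, x6, x7 = block
        alpha = (x4 + x5 + x6 + x7) % 2
        beta  = (x2 + x3 + x6 + x7) % 2
        gamma = (x1 + x3 + x5 + x7) % 2
        pos = 4 * alpha + 2 * beta + gamma
        if pos:
            block[pos - 1] ^= 1   # flip the offending symbol
        return block

    sent = hamming_encode(1, 0, 1, 1)
    received = sent.copy()
    received[4] ^= 1              # corrupt X5 in transmission
    print(hamming_correct(received) == sent)  # True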
APPENDIX 1

THE GROWTH OF THE NUMBER OF BLOCKS OF SYMBOLS WITH A FINITE STATE CONDITION

Let N_i(L) be the number of blocks of symbols of length L ending in state i. Then we have

$$N_j(L) = \sum_{i,s} N_i\left(L - b_{ij}^{(s)}\right)$$

where b_{ij}^{(1)}, b_{ij}^{(2)}, …, b_{ij}^{(m)} are the lengths of the symbols which may be chosen in state i and lead to state j. These are linear difference equations and the behavior as L → ∞ must be of the type

$$N_j = A_j W^L.$$

Substituting in the difference equation

$$A_j W^L = \sum_{i,s} A_i W^{L - b_{ij}^{(s)}}$$

or

$$A_j = \sum_{i,s} A_i W^{-b_{ij}^{(s)}}$$
$$\sum_i\left(\sum_s W^{-b_{ij}^{(s)}} - \delta_{ij}\right)A_i = 0.$$

For this to be possible the determinant

$$D(W) = |a_{ij}| = \left|\sum_s W^{-b_{ij}^{(s)}} - \delta_{ij}\right|$$

must vanish and this determines W, which is, of course, the largest real root of D = 0. The quantity C is then given by

$$C = \lim_{L\to\infty}\frac{\log\sum_j A_j W^L}{L} = \log W$$

and we also note that the same growth properties result if we require that all blocks start in the same (arbitrarily chosen) state.
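As a worked instance (ours, not in the paper): for a single-state system with two symbols of durations 1 and 2, the determinant reduces to D(W) = W⁻¹ + W⁻² − 1, whose largest real root is the golden ratio, so C = log φ ≈ 0.694 bits.

    from math import log2

    def D(W):
        # Single state, symbol durations 1 and 2: D(W) = W^-1 + W^-2 - 1.
        return W**-1 + W**-2 - 1

    # Bisection for the largest real root; D is decreasing for W > 0.
    lo, hi = 1.0, 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if D(mid) > 0 else (lo, mid)
    W = (lo + hi) / 2
    print(W, log2(W))  # 1.618... and C = 0.694... bits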
APPENDIX 2

DERIVATION OF H = −Σ p_i log p_i

Let H(1/n, 1/n, …, 1/n) = A(n). From condition (3) we can decompose a choice from s^m equally likely possibilities into a series of m choices from s equally likely possibilities and obtain

$$A(s^m) = mA(s).$$

Similarly

$$A(t^n) = nA(t).$$

We can choose n arbitrarily large and find an m to satisfy

$$s^m \le t^n < s^{m+1}.$$

Thus, taking logarithms and dividing by n log s,

$$\frac{m}{n} \le \frac{\log t}{\log s} \le \frac{m}{n} + \frac{1}{n} \quad\text{or}\quad \left|\frac{m}{n} - \frac{\log t}{\log s}\right| < \varepsilon$$

where ε is arbitrarily small. Now from the monotonic property of A(n),

$$A(s^m) \le A(t^n) \le A(s^{m+1})$$
$$mA(s) \le nA(t) \le (m+1)A(s).$$

Hence, dividing by nA(s),

$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n} \quad\text{or}\quad \left|\frac{m}{n} - \frac{A(t)}{A(s)}\right| < \varepsilon$$
$$\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| < 2\varepsilon \qquad A(t) = K\log t$$

where K must be positive to satisfy (2).

Now suppose we have a choice from n possibilities with commeasurable probabilities p_i = n_i / Σ n_i where the n_i are integers. We can break down a choice from Σ n_i possibilities into a choice from n possibilities with probabilities p₁, …, p_n and then, if the ith was chosen, a choice from n_i with equal probabilities. Using condition (3) again, we equate the total choice from Σ n_i as computed by two methods

$$K\log\sum n_i = H(p_1,\ldots,p_n) + K\sum p_i\log n_i.$$

Hence

$$H = K\left[\sum p_i\log\sum n_i - \sum p_i\log n_i\right] = -K\sum p_i\log\frac{n_i}{\sum n_i} = -K\sum p_i\log p_i.$$

If the p_i are incommeasurable, they may be approximated by rationals and the same expression must hold by our continuity assumption. Thus the expression holds in general. The choice of coefficient K is a matter of convenience and amounts to the choice of a unit of measure.
APPENDIX 3

THEOREMS ON ERGODIC SOURCES

If it is possible to go from any state with P > 0 to any other along a path of probability p > 0, the system is ergodic and the strong law of large numbers can be applied. Thus the number of times a given path p_{ij} in the network is traversed in a long sequence of length N is about proportional to the probability of being at i, say P_i, and then choosing this path, P_i p_{ij} N. If N is large enough the probability of percentage error ±δ in this is less than ε so that for all but a set of small probability the actual numbers lie within the limits

$$\left(P_i p_{ij} \pm \delta\right)N.$$

Hence nearly all sequences have a probability p given by

$$p = \prod p_{ij}^{(P_i p_{ij} \pm \delta)N}$$

and (log p)/N is limited by

$$\frac{\log p}{N} = \sum\left(P_i p_{ij} \pm \delta\right)\log p_{ij}$$

or

$$\left|\frac{\log p}{N} - \sum P_i p_{ij}\log p_{ij}\right| < \eta.$$

This proves Theorem 3.

Theorem 4 follows immediately from this on calculating upper and lower bounds for n(q) based on the possible range of values of p in Theorem 3. In the mixed (not ergodic) case if

$$L = \sum p_i L_i$$

and the entropies of the components are H₁ ≥ H₂ ≥ … ≥ H_n we have the

Theorem: Lim_{N→∞} (log n(q))/N = φ(q) is a decreasing step function,

$$\varphi(q) = H_s \quad\text{in the interval}\quad \sum_{i=1}^{s-1}\alpha_i < q < \sum_{i=1}^{s}\alpha_i.$$

To prove Theorems 5 and 6 first note that F_N is monotonic decreasing because increasing N adds a subscript to a conditional entropy. A simple substitution for p_{B_i}(S_j) in the definition of F_N shows that

$$F_N = NG_N - (N-1)G_{N-1}$$

and summing this for all N gives G_N = (1/N) Σ F_n. Hence G_N ≥ F_N and G_N monotonic decreasing. Also they must approach the same limit. By using Theorem 3 we see that Lim_{N→∞} G_N = H.
set of constraints on sequences of symbols that is of the finite state type and
|
1435
|
+
can be s represented therefore by a linear graph. Let be the lengths of the
|
1436
|
+
various symbols that can occur in `i j s passing from state ito state j. What
|
1437
|
+
distribution of probabilities P ifor the different states and p for i j
|
1438
|
+
choosing symbol sin state iand going to state jmaximizes the rate of generating
|
1439
|
+
information under theseconstraints? The constraints define a discrete channel
|
1440
|
+
and the maximum rate must be less than or equal tothe capacity Cof this
|
1441
|
+
channel, since if all blocks of large length were equally likely, this rate
|
1442
|
+
would result,and if possible this would be best. We will show that this rate
|
1443
|
+
can be achieved by proper choice of the Piand s p . i j The rate in question is
|
1444
|
+
s s P i p log p N , i j i j = : s s P M i pij`i j s s s Let i j . Evidently for
|
1445
|
+
a maximum p kexp . The constraints on maximization are Pi ` = s`i j i j= `i j =
|
1446
|
+
1, j pi j 1, Pi pi j i j 0. Hence we maximize = , = Pipijlog pij , U Pi ipij
|
1447
|
+
jPi pij ij = P + + + , i pi j i j ` i U MPi1 log pi j NPi i j + + ` i iPi 0 = ,
|
1448
|
+
+ = : pi j M2 + + 30
|
1449
|
+
===============================================================================
|
1450
|
+
Solving for pi j pi j AiB jD,`ij = : Since p 1 i j 1 A, BjD,`ij = ;
|
1451
|
+
|
1452
|
+
i = j j B jD,`ij pi j= : s BsD,`is The correct value of Dis the capacity Cand
|
1453
|
+
the B jare solutions of B i j i BjC,` = for then B j pi j C,`ij = Bi Bj Pi
|
1454
|
+
C,`ij Pj = Bi or Pi Pj C,`ij= : Bi B j So that if isatisfy iC,`ij j = Pi Bi i =
|
1455
|
+
: Both the sets of equations for Biand ican be satisfied since Cis such that
|
1456
|
+
C,`ij i j 0 j , j = : In this case the rate is B B P j j i pi jlog C,`ij P B i
|
1457
|
+
pi jlog i B C i , = , Pi pi j i j Pipij ij ` ` but
|
1458
|
+
PipijlogBjlogBiPjlogBjPilogBi0 , = , = j Hence the rate is Cand as this could
|
1459
|
+
never be exceeded this is the maximum, justifying the assumed solution. 31
|
1460
|
+
===============================================================================
|
1461
|
+
PART III: MATHEMATICAL PRELIMINARIES In this final installment of the paper we
|
1462
|
+
consider the case where the signals or the messages or both arecontinuously
|
1463
|
+
variable, in contrast with the discrete nature assumed heretofore. To a
|
1464
|
+
considerable extent thecontinuous case can be obtained through a limiting
|
1465
|
+
process from the discrete case by dividing the continuumof messages and signals
|
1466
|
+
into a large but finite number of small regions and calculating the various
|
1467
|
+
parametersinvolved on a discrete basis. As the size of the regions is decreased
|
1468
|
+
these parameters in general approach aslimits the proper values for the
|
1469
|
+
continuous case. There are, however, a few new effects that appear and alsoa
|
1470
|
+
general change of emphasis in the direction of specialization of the general
|
1471
|
+
results to particular cases. We will not attempt, in the continuous case, to
|
1472
|
+
obtain our results with the greatest generality, or with the extreme rigor of
|
1473
|
+
pure mathematics, since this would involve a great deal of abstract measure
|
1474
|
+
theoryand would obscure the main thread of the analysis. A preliminary study,
|
1475
|
+
however, indicates that the theorycan be formulated in a completely axiomatic
|
1476
|
+
and rigorous manner which includes both the continuous anddiscrete cases and
|
1477
|
+
many others. The occasional liberties taken with limiting processes in the
|
1478
|
+
present analysiscan be justified in all cases of practical interest. 18. SETS
|
1479
|
+
AND ENSEMBLES OF FUNCTIONS We shall have to deal in the continuous case with
|
1480
|
+
sets of functions and ensembles of functions. A set offunctions, as the name
|
1481
|
+
implies, is merely a class or collection of functions, generally of one
|
1482
|
+
variable, time.It can be specified by giving an explicit representation of the
|
1483
|
+
various functions in the set, or implicitly bygiving a property which functions
|
1484
|
+
in the set possess and others do not. Some examples are: 1. The set of
|
1485
|
+
functions: f t sin t = + : Each particular value of determines a particular
|
1486
|
+
function in the set. 2. The set of all functions of time containing no
|
1487
|
+
frequencies over Wcycles per second. 3. The set of all functions limited in
|
1488
|
+
band to Wand in amplitude to A. 4. The set of all English speech signals as
|
1489
|
+
functions of time. An ensembleof functions is a set of functions together with
|
1490
|
+
a probability measure whereby we may determine the probability of a function in
|
1491
|
+
the set having certain properties.1 For example with the set, f t sin t = + ;
|
1492
|
+
|
1493
|
+
we may give a probability distribution for , P . The set then becomes an
|
1494
|
+
ensemble. Some further examples of ensembles of functions are: 1. A finite set
|
1495
|
+
of functions fk t(k 1 2 n) with the probability of fkbeing pk. = ;
|
1496
|
+
|
1497
|
+
;
|
1498
|
+
|
1499
|
+
: : : ;
|
1500
|
+
|
1501
|
+
2. A finite dimensional family of functions f 1 2 n;
|
1502
|
+
|
1503
|
+
t ;
|
1504
|
+
|
1505
|
+
;
|
1506
|
+
|
1507
|
+
: : : ;
|
1508
|
+
|
1509
|
+
with a probability distribution on the parameters i: p 1 n ;
|
1510
|
+
|
1511
|
+
: : : ;
|
1512
|
+
|
1513
|
+
: For example we could consider the ensemble defined by n f a1 an1 n;
|
1514
|
+
|
1515
|
+
t aisini t i ;
|
1516
|
+
|
1517
|
+
: : : ;
|
1518
|
+
|
1519
|
+
;
|
1520
|
+
|
1521
|
+
;
|
1522
|
+
|
1523
|
+
: : : ;
|
1524
|
+
|
1525
|
+
= ! + i1 = with the amplitudes aidistributed normally and independently, and
|
1526
|
+
the phases idistributed uniformly (from 0 to 2 ) and independently. 1In
|
1527
|
+
mathematical terminology the functions belong to a measure space whose total
|
1528
|
+
measure is unity. 32
|
1529
|
+
===============================================================================
|
1530
|
+
3. The ensemble + sin 2W t n f a , i t an ;
|
1531
|
+
|
1532
|
+
= 2W t n n , =, p with the ainormal and independent all with the same standard
|
1533
|
+
deviation N. This is a representation of "white" noise, band limited to the
|
1534
|
+
band from 0 to Wcycles per second and with average power N.2 4. Let points be
|
1535
|
+
distributed on the taxis according to a Poisson distribution. At each selected
|
1536
|
+
point the function f tis placed and the different functions added, giving the
|
1537
|
+
ensemble f t tk + k =, where the tkare the points of the Poisson distribution.
|
1538
|
+
This ensemble can be considered as a type ofimpulse or shot noise where all the
|
1539
|
+
impulses are identical. 5. The set of English speech functions with the
|
1540
|
+
probability measure given by the frequency of occurrence in ordinary use. An
|
1541
|
+
ensemble of functions f tis stationaryif the same ensemble results when all
|
1542
|
+
functions are shifted any fixed amount in time. The ensemble f t sin t = + is
|
1543
|
+
stationary if is distributed uniformly from 0 to 2 . If we shift each function
|
1544
|
+
by t1 we obtain f t t1 sin t t1 + = + + sin t = + ' with distributed uniformly
|
1545
|
+
from 0 to 2 . Each function has changed but the ensemble as a whole is '
|
1546
|
+
invariant under the translation. The other examples given above are also
|
1547
|
+
stationary. An ensemble is ergodicif it is stationary, and there is no subset
|
1548
|
+
of the functions in the set with a probability different from 0 and 1 which is
|
1549
|
+
stationary. The ensemble sin t + is ergodic. No subset of these functions of
|
1550
|
+
probability 0 1 is transformed into itself under all time trans- 6= ;
|
1551
|
+
|
1552
|
+
lations. On the other hand the ensemble asin t + with adistributed normally and
|
1553
|
+
uniform is stationary but not ergodic. The subset of these functions with
|
1554
|
+
abetween 0 and 1 for example is stationary. Of the examples given, 3 and 4 are
|
1555
|
+
ergodic, and 5 may perhaps be considered so. If an ensemble is ergodic we may
|
1556
|
+
say roughly that each function in the set is typical of the ensemble. More
|
1557
|
+
precisely it isknown that with an ergodic ensemble an average of any statistic
|
1558
|
+
over the ensemble is equal (with probability1) to an average over the time
|
1559
|
+
translations of a particular function of the set.3 Roughly speaking,
|
1560
|
+
eachfunction can be expected, as time progresses, to go through, with the
|
1561
|
+
proper frequency, all the convolutionsof any of the functions in the set. 2This
|
1562
|
+
representation can be used as a definition of band limited white noise. It has
|
1563
|
+
certain advantages in that it involves fewer limiting operations than do
|
1564
|
+
definitions that have been used in the past. The name "white noise," already
|
1565
|
+
firmly entrenched in theliterature, is perhaps somewhat unfortunate. In optics
|
1566
|
+
white light means either any continuous spectrum as contrasted with a
|
1567
|
+
pointspectrum, or a spectrum which is flat with wavelength(which is not the
|
1568
|
+
same as a spectrum flat with frequency). 3This is the famous ergodic theorem or
|
1569
|
+
rather one aspect of this theorem which was proved in somewhat different
|
1570
|
+
formulations by Birkoff, von Neumann, and Koopman, and subsequently generalized
|
1571
|
+
by Wiener, Hopf, Hurewicz and others. The literature onergodic theory is quite
|
1572
|
+
extensive and the reader is referred to the papers of these writers for precise
|
1573
|
+
and general formulations;
|
1574
|
+
|
1575
|
+
e.g.,E. Hopf, "Ergodentheorie," Ergebnisse der Mathematik und ihrer
|
1576
|
+
Grenzgebiete,v. 5;
|
1577
|
+
|
1578
|
+
"On Causality Statistics and Probability," Journalof Mathematics and Physics,v.
|
1579
|
+
XIII, No. 1, 1934;
|
1580
|
+
|
1581
|
+
N. Wiener, "The Ergodic Theorem," Duke Mathematical Journal,v. 5, 1939. 33
|
1582
|
+
===============================================================================
|
1583
|
+
Just as we may perform various operations on numbers or functions to obtain new
|
1584
|
+
numbers or functions, we can perform operations on ensembles to obtain new
|
1585
|
+
ensembles. Suppose, for example, we have anensemble of functions f tand an
|
1586
|
+
operator Twhich gives for each function f ta resulting function g t: g t T f t
|
1587
|
+
= : Probability measure is defined for the set g tby means of that for the set
|
1588
|
+
f t. The probability of a certain subset of the g tfunctions is equal to that
|
1589
|
+
of the subset of the f tfunctions which produce members of the given subset of
|
1590
|
+
gfunctions under the operation T. Physically this corresponds to passing the
|
1591
|
+
ensemblethrough some device, for example, a filter, a rectifier or a modulator.
|
1592
|
+
The output functions of the deviceform the ensemble g t. A device or operator
|
1593
|
+
Twill be called invariant if shifting the input merely shifts the output, i.e.,
|
1594
|
+
if g t T f t = implies g t t1 T f t t1 + = + for all f tand all t1. It is
|
1595
|
+
easily shown (see Appendix 5 that if Tis invariant and the input ensemble is
|
1596
|
+
stationary then the output ensemble is stationary. Likewise if the input is
|
1597
|
+
ergodic the output will also beergodic. A filter or a rectifier is invariant
|
1598
|
+
under all time translations. The operation of modulation is not since the
|
1599
|
+
carrier phase gives a certain time structure. However, modulation is invariant
|
1600
|
+
under all translations whichare multiples of the period of the carrier. Wiener
|
1601
|
+
has pointed out the intimate relation between the invariance of physical
|
1602
|
+
devices under time translations and Fourier theory.4 He has shown, in fact,
|
1603
|
+
that if a device is linear as well as invariant Fourieranalysis is then the
|
1604
|
+
appropriate mathematical tool for dealing with the problem. An ensemble of
|
1605
|
+
functions is the appropriate mathematical representation of the messages
|
1606
|
+
produced by a continuous source (for example, speech), of the signals produced
|
1607
|
+
by a transmitter, and of the perturbingnoise. Communication theory is properly
|
1608
|
+
concerned, as has been emphasized by Wiener, not with operationson particular
|
1609
|
+
functions, but with operations on ensembles of functions. A communication
|
1610
|
+
system is designednot for a particular speech function and still less for a
|
1611
|
+
sine wave, but for the ensemble of speech functions. 19. BAND LIMITED ENSEMBLES
|
1612
|
+
OF FUNCTIONS If a function of time f tis limited to the band from 0 to Wcycles
|
1613
|
+
per second it is completely determined by giving its ordinates at a series of
|
1614
|
+
discrete points spaced 1 seconds apart in the manner indicated by the 2W
|
1615
|
+
following result.5 Theorem 13:Let f tcontain no frequencies over W. Then sin 2W
|
1616
|
+
t n f t X , n = 2W t n , , where n Xn f = : 2W 4Communication theory is heavily
|
1617
|
+
indebted to Wiener for much of its basic philosophy and theory. His classic
|
1618
|
+
NDRC report, The Interpolation, Extrapolation and Smoothing of Stationary Time
|
1619
|
+
Series(Wiley, 1949), contains the first clear-cut formulation ofcommunication
|
1620
|
+
theory as a statistical problem, the study of operations on time series. This
|
1621
|
+
work, although chiefly concerned with thelinear prediction and filtering
|
1622
|
+
problem, is an important collateral reference in connection with the present
|
1623
|
+
paper. We may also referhere to Wiener's Cybernetics(Wiley, 1948), dealing with
|
1624
|
+
the general problems of communication and control. 5For a proof of this theorem
|
1625
|
+
and further discussion see the author's paper "Communication in the Presence of
|
1626
|
+
Noise" published in the Proceedings of the Institute of Radio Engineers,v. 37,
|
1627
|
+
No. 1, Jan., 1949, pp. 10�21. 34
|
1628
|
+
===============================================================================
|
1629
|
+
In this expansion f tis represented as a sum of orthogonal functions. The
|
1630
|
+
coefficients Xnof the various terms can be considered as coordinates in an
|
1631
|
+
infinite dimensional "function space." In this space eachfunction corresponds
|
1632
|
+
to precisely one point and each point to one function. A function can be
|
1633
|
+
considered to be substantially limited to a time Tif all the ordinates
|
1634
|
+
Xnoutside this interval of time are zero. In this case all but 2TWof the
|
1635
|
+
coordinates will be zero. Thus functions limited toa band Wand duration
|
1636
|
+
Tcorrespond to points in a space of 2TWdimensions. A subset of the functions of
|
1637
|
+
band Wand duration Tcorresponds to a region in this space. For example, the
|
1638
|
+
functions whose total energy is less than or equal to Ecorrespond to points in
|
1639
|
+
a 2TWdimensional sphere p with radius r 2W E. = An ensembleof functions of
|
1640
|
+
limited duration and band will be represented by a probability distribution p
|
1641
|
+
x1 xnin the corresponding ndimensional space. If the ensemble is not limited in
|
1642
|
+
time we can consider ;
|
1643
|
+
|
1644
|
+
: : : ;
|
1645
|
+
|
1646
|
+
the 2TWcoordinates in a given interval Tto represent substantially the part of
|
1647
|
+
the function in the interval Tand the probability distribution p x1 xnto give
|
1648
|
+
the statistical structure of the ensemble for intervals of ;
|
1649
|
+
|
1650
|
+
: : : ;
|
1651
|
+
|
1652
|
+
that duration. 20. ENTROPY OF A CONTINUOUS DISTRIBUTION The entropy of a
|
1653
|
+
discrete set of probabilities p1 pnhas been defined as: ;
|
1654
|
+
|
1655
|
+
: : : ;
|
1656
|
+
|
1657
|
+
H pilogpi = , : In an analogous manner we define the entropy of a continuous
|
1658
|
+
distribution with the density distributionfunction p xby: Z H p xlog p x dx = ,
|
1659
|
+
: , With an ndimensional distribution p x1 xnwe have ;
|
1660
|
+
|
1661
|
+
: : : ;
|
1662
|
+
|
1663
|
+
Z Z H p x1 xnlog p x1 xn dx1 dxn = , ;
|
1664
|
+
|
1665
|
+
: : : ;
|
1666
|
+
|
1667
|
+
;
|
1668
|
+
|
1669
|
+
: : : ;
|
1670
|
+
|
1671
|
+
: If we have two arguments xand y(which may themselves be multidimensional) the
|
1672
|
+
joint and conditionalentropies of p x yare given by ;
|
1673
|
+
|
1674
|
+
Z Z H x y p x ylog p x y dx dy ;
|
1675
|
+
|
1676
|
+
= , ;
|
1677
|
+
|
1678
|
+
;
|
1679
|
+
|
1680
|
+
and Z Z p x y ;
|
1681
|
+
|
1682
|
+
Hx y p x ylog dx dy = , ;
|
1683
|
+
|
1684
|
+
p x Z Z p x y H ;
|
1685
|
+
|
1686
|
+
y x p x ylog dx dy = , ;
|
1687
|
+
|
1688
|
+
p y where Z p x p x y dy = ;
|
1689
|
+
|
1690
|
+
Z p y p x y dx = ;
|
1691
|
+
|
1692
|
+
: The entropies of continuous distributions have most (but not all) of the
|
1693
|
+
properties of the discrete case. In particular we have the following: 1. If xis
|
1694
|
+
limited to a certain volume vin its space, then H xis a maximum and equal to
|
1695
|
+
log vwhen p x is constant (1 v) in the volume. = 35
|
1696
|
+
===============================================================================
|
1697
|
+
2. With any two variables x, ywe have H x y H x H y ;
|
1698
|
+
|
1699
|
+
+ with equality if (and only if) xand yare independent, i.e., p x y p x p y
|
1700
|
+
(apart possibly from a ;
|
1701
|
+
|
1702
|
+
= set of points of probability zero). 3. Consider a generalized averaging
|
1703
|
+
operation of the following type: Z p0 y a x y p x dx = ;
|
1704
|
+
|
1705
|
+
with Z Z a x y dx a x y dy 1 a x y 0 ;
|
1706
|
+
|
1707
|
+
= ;
|
1708
|
+
|
1709
|
+
= ;
|
1710
|
+
|
1711
|
+
;
|
1712
|
+
|
1713
|
+
: Then the entropy of the averaged distribution p0 yis equal to or greater than
|
1714
|
+
that of the original distribution p x. 4. We have H x y H x Hx y H y Hy x ;
|
1715
|
+
|
1716
|
+
= + = + and Hx y H y : 5. Let p xbe a one-dimensional distribution. The form of
|
1717
|
+
p xgiving a maximum entropy subject to the condition that the standard
|
1718
|
+
deviation of xbe fixed at is Gaussian. To show this we must maximize Z H x p
|
1719
|
+
xlog p x dx = , with Z Z 2 p x x2 dx and 1 p x dx = = as constraints. This
|
1720
|
+
requires, by the calculus of variations, maximizing Z p xlog p x p x x2 p x dx
|
1721
|
+
, + + : The condition for this is 1 log p x x2 0 , , + + = and consequently
|
1722
|
+
(adjusting the constants to satisfy the constraints) 1 p x e x2 2 2 , = p = : 2
|
1723
|
+
Similarly in ndimensions, suppose the second order moments of p x1 xnare fixed
|
1724
|
+
at Ai j: ;
|
1725
|
+
|
1726
|
+
: : : ;
|
1727
|
+
|
1728
|
+
Z Z Ai j xix j p x1 xn dx1 dxn = ;
|
1729
|
+
|
1730
|
+
: : : ;
|
1731
|
+
|
1732
|
+
: Then the maximum entropy occurs (by a similar calculation) when p x1 xnis the
|
1733
|
+
ndimensional ;
|
1734
|
+
|
1735
|
+
: : : ;
|
1736
|
+
|
1737
|
+
Gaussian distribution with the second order moments Ai j. 36
|
1738
|
+
===============================================================================
|
1739
|
+
6. The entropy of a one-dimensional Gaussian distribution whose standard
|
1740
|
+
deviation is is given by p H x log 2 e = : This is calculated as follows: 1 p x
|
1741
|
+
e x2 2 2 , = p = 2 x2 p log p x log 2 , = + 2 2 Z H x p xlog p x dx = , Z Z x2
|
1742
|
+
p p xlog 2 dx p x dx = + 2 2 2 p log 2 = + 2 2 p p log 2 log e = + p log 2 e =
|
1743
|
+
: Similarly the ndimensional Gaussian distribution with associated quadratic
|
1744
|
+
form ai jis given by 1 a i j2 j j p x 1 1 xn exp aijxixj ;
|
1745
|
+
|
1746
|
+
: : : ;
|
1747
|
+
|
1748
|
+
= , 2 n2 2 = and the entropy can be calculated as 1 H log 2 e n2 = a, i j 2 = j
|
1749
|
+
j where ai jis the determinant whose elements are ai j. j j 7. If xis limited
|
1750
|
+
to a half line (p x 0 for x 0) and the first moment of xis fixed at a: = Z a p
|
1751
|
+
x x dx = ;
|
1752
|
+
|
1753
|
+
0 then the maximum entropy occurs when 1 p x e x a , = = a and is equal to log
|
1754
|
+
ea. 8. There is one important difference between the continuous and discrete
|
1755
|
+
entropies. In the discrete case the entropy measures in an absoluteway the
|
1756
|
+
randomness of the chance variable. In the continuouscase the measurement is
|
1757
|
+
relative to the coordinate system. If we change coordinates the entropy willin
|
1758
|
+
general change. In fact if we change to coordinates y1 ynthe new entropy is
|
1759
|
+
given by Z Z x x H y p x1 xn J log p x1 xn J dy1 dyn = ;
|
1760
|
+
|
1761
|
+
: : : ;
|
1762
|
+
|
1763
|
+
;
|
1764
|
+
|
1765
|
+
: : : ;
|
1766
|
+
|
1767
|
+
y y , where J xis the Jacobian of the coordinate transformation. On expanding
|
1768
|
+
the logarithm and chang- y ing the variables to x1 xn, we obtain: Z Z x H y H x
|
1769
|
+
p x1 xnlog J dx1 dxn = , ;
|
1770
|
+
|
1771
|
+
: : : ;
|
1772
|
+
|
1773
|
+
: : : : y 37
|
1774
|
+
===============================================================================
|
1775
|
+
Thus the new entropy is the old entropy less the expected logarithm of the
|
1776
|
+
Jacobian. In the continuouscase the entropy can be considered a measure of
|
1777
|
+
randomness relative to an assumed standard, namelythe coordinate system chosen
|
1778
|
+
with each small volume element dx1 dxngiven equal weight. When we change the
|
1779
|
+
coordinate system the entropy in the new system measures the randomness when
|
1780
|
+
equalvolume elements dy1 dynin the new system are given equal weight. In spite
|
1781
|
+
of this dependence on the coordinate system the entropy concept is as important
|
1782
|
+
in the con-tinuous case as the discrete case. This is due to the fact that the
|
1783
|
+
derived concepts of information rateand channel capacity depend on the
|
1784
|
+
differenceof two entropies and this difference does notdependon the coordinate
|
1785
|
+
frame, each of the two terms being changed by the same amount. The entropy of a
|
1786
|
+
continuous distribution can be negative. The scale of measurements sets an
|
1787
|
+
arbitraryzero corresponding to a uniform distribution over a unit volume. A
|
1788
|
+
distribution which is more confinedthan this has less entropy and will be
|
1789
|
+
negative. The rates and capacities will, however, always be non-negative. 9. A
|
1790
|
+
particular case of changing coordinates is the linear transformation y j aijxi
|
1791
|
+
= : i In this case the Jacobian is simply the determinant a 1 , i j and j j H y
|
1792
|
+
H x log ai j = + j j: In the case of a rotation of coordinates (or any measure
|
1793
|
+
preserving transformation) J 1 and H y = = H x. 21. ENTROPY OF AN ENSEMBLE OF
|
1794
|
+
FUNCTIONS Consider an ergodic ensemble of functions limited to a certain band
|
1795
|
+
of width Wcycles per second. Let p x1 xn ;
|
1796
|
+
|
1797
|
+
: : : ;
|
1798
|
+
|
1799
|
+
be the density distribution function for amplitudes x1 xnat nsuccessive sample
|
1800
|
+
points. We define the ;
|
1801
|
+
|
1802
|
+
: : : ;
|
1803
|
+
|
1804
|
+
entropy of the ensemble per degree of freedom by 1 Z Z H0 Lim p x1 xnlog p x1
|
1805
|
+
xn dx1 dxn = , ;
|
1806
|
+
|
1807
|
+
: : : ;
|
1808
|
+
|
1809
|
+
;
|
1810
|
+
|
1811
|
+
: : : ;
|
1812
|
+
|
1813
|
+
: : : : n n ! We may also define an entropy Hper second by dividing, not by n,
|
1814
|
+
but by the time Tin seconds for nsamples. Since n 2TW, H 2W H0. = = With white
|
1815
|
+
thermal noise pis Gaussian and we have p H0 log 2 eN = ;
|
1816
|
+
|
1817
|
+
H Wlog 2 eN = : For a given average power N, white noise has the maximum
|
1818
|
+
possible entropy. This follows from the maximizing properties of the Gaussian
|
1819
|
+
distribution noted above. The entropy for a continuous stochastic process has
|
1820
|
+
many properties analogous to that for discrete pro- cesses. In the discrete
|
1821
|
+
case the entropy was related to the logarithm of the probabilityof long
|
1822
|
+
sequences,and to the numberof reasonably probable sequences of long length. In
|
1823
|
+
the continuous case it is related ina similar fashion to the logarithm of the
|
1824
|
+
probability densityfor a long series of samples, and the volumeofreasonably
|
1825
|
+
high probability in the function space. More precisely, if we assume p x1
|
1826
|
+
xncontinuous in all the xifor all n, then for sufficiently large n ;
|
1827
|
+
|
1828
|
+
: : : ;
|
1829
|
+
|
1830
|
+
log p H0 , n 38
|
1831
|
+
===============================================================================
|
1832
|
+
for all choices of x1 xnapart from a set whose total probability is less than ,
|
1833
|
+
with and arbitrarily ;
|
1834
|
+
|
1835
|
+
: : : ;
|
1836
|
+
|
1837
|
+
small. This follows form the ergodic property if we divide the space into a
|
1838
|
+
large number of small cells. The relation of Hto volume can be stated as
|
1839
|
+
follows: Under the same assumptions consider the n dimensional space
|
1840
|
+
corresponding to p x1 xn. Let Vn qbe the smallest volume in this space which ;
|
1841
|
+
|
1842
|
+
: : : ;
|
1843
|
+
|
1844
|
+
includes in its interior a total probability q. Then logVn q Lim H0 = n n !
|
1845
|
+
provided qdoes not equal 0 or 1. These results show that for large nthere is a
|
1846
|
+
rather well-defined volume (at least in the logarithmic sense) of high
|
1847
|
+
probability, and that within this volume the probability density is relatively
|
1848
|
+
uniform (again in thelogarithmic sense). In the white noise case the
|
1849
|
+
distribution function is given by 1 1 p x1 xn exp x2 ;
|
1850
|
+
|
1851
|
+
: : : ;
|
1852
|
+
|
1853
|
+
= , 2 N n2 i: = 2N Since this depends only on x2 the surfaces of equal
|
1854
|
+
probability density are spheres and the entire distri- i p bution has spherical
|
1855
|
+
symmetry. The region of high probability is a sphere of radius nN. As n the ! p
|
1856
|
+
probability of being outside a sphere of radius n N approaches zero and 1 times
|
1857
|
+
the logarithm of the + n p volume of the sphere approaches log 2 eN. In the
|
1858
|
+
continuous case it is convenient to work not with the entropy Hof an ensemble
|
1859
|
+
but with a derived quantity which we will call the entropy power. This is
|
1860
|
+
defined as the power in a white noise limited to thesame band as the original
|
1861
|
+
ensemble and having the same entropy. In other words if H0 is the entropy of
|
1862
|
+
anensemble its entropy power is 1 N1 exp 2H0 = : 2 e In the geometrical picture
|
1863
|
+
this amounts to measuring the high probability volume by the squared radius of
|
1864
|
+
asphere having the same volume. Since white noise has the maximum entropy for a
|
1865
|
+
given power, the entropypower of any noise is less than or equal to its actual
|
1866
|
+
power. 22. ENTROPY LOSS IN LINEAR FILTERS Theorem 14:If an ensemble having an
|
1867
|
+
entropy H1 per degree of freedom in band Wis passed through a filter with
|
1868
|
+
characteristic Y fthe output ensemble has an entropy 1 Z H 2 2 H1 log Y f d f =
|
1869
|
+
+ j j : W W The operation of the filter is essentially a linear transformation
|
1870
|
+
of coordinates. If we think of the different frequency components as the
|
1871
|
+
original coordinate system, the new frequency components are merely the oldones
|
1872
|
+
multiplied by factors. The coordinate transformation matrix is thus essentially
|
1873
|
+
diagonalized in termsof these coordinates. The Jacobian of the transformation
|
1874
|
+
is (for nsine and ncosine components) n J Y f2 i = j j i1 = where the fiare
|
1875
|
+
equally spaced through the band W. This becomes in the limit 1 Z exp log Y f2 d
|
1876
|
+
f j j : W W Since Jis constant its average value is the same quantity and
|
1877
|
+
applying the theorem on the change of entropywith a change of coordinates, the
|
1878
|
+
result follows. We may also phrase it in terms of the entropy power. Thusif the
|
1879
|
+
entropy power of the first ensemble is N1 that of the second is 1 Z N 2 1 exp
|
1880
|
+
log Y f d f j j : W W 39
|
1881
|
+
===============================================================================
|
1882
|
+
TABLE I ENTROPY ENTROPY GAIN POWER POWER GAIN IMPULSE RESPONSE FACTOR IN
|
1883
|
+
DECIBELS 1 1 1 sin2 t2 , ! 8 69 = e2 , : t2 2 = ! 0 1 1 1 2 2 4 sint cos t , !
|
1884
|
+
5 33 2 , : e t3 , t2 ! 0 1 1 1 3 cos t 1 cos t sint , , ! 0 411 3 87 6 : , : t4
|
1885
|
+
, 2t2 + t3 ! 0 1 1 p 1 2 2 2 J1 t , ! 2 67 e , : 2 t ! 0 1 1 1 1 8 69 cos 1 t
|
1886
|
+
cos t : , , e2 , t2 ! 0 1 The final entropy power is the initial entropy power
|
1887
|
+
multiplied by the geometric mean gain of the filter. Ifthe gain is measured in
|
1888
|
+
db, then the output entropy power will be increased by the arithmetic mean
|
1889
|
+
dbgainover W. In Table I the entropy power loss has been calculated (and also
|
1890
|
+
expressed in db) for a number of ideal gain characteristics. The impulsive
|
1891
|
+
responses of these filters are also given for W 2 , with phase assumed = to be
|
1892
|
+
0. The entropy loss for many other cases can be obtained from these results.
|
1893
|
+
For example the entropy power factor 1 e2 for the first case also applies to
|
1894
|
+
any gain characteristic obtain from 1 by a measure = , ! preserving
|
1895
|
+
transformation of the axis. In particular a linearly increasing gain G , or a
|
1896
|
+
"saw tooth" ! ! = ! characteristic between 0 and 1 have the same entropy loss.
|
1897
|
+
The reciprocal gain has the reciprocal factor.Thus 1 has the factor e2. Raising
|
1898
|
+
the gain to any power raises the factor to this power. =! 23. ENTROPY OF A SUM
|
1899
|
+
OF TWO ENSEMBLES If we have two ensembles of functions f tand g twe can form a
|
1900
|
+
new ensemble by "addition." Suppose the first ensemble has the probability
|
1901
|
+
density function p x1 xnand the second q x1 xn. Then the ;
|
1902
|
+
|
1903
|
+
: : : ;
|
1904
|
+
|
1905
|
+
;
|
1906
|
+
|
1907
|
+
: : : ;
|
1908
|
+
|
1909
|
+
40
|
1910
|
+
===============================================================================
|
1911
|
+
density function for the sum is given by the convolution: Z Z r x1 xn p y1 yn q
|
1912
|
+
x1 y1 xn yn dy1 dyn ;
|
1913
|
+
|
1914
|
+
: : : ;
|
1915
|
+
|
1916
|
+
= ;
|
1917
|
+
|
1918
|
+
: : : ;
|
1919
|
+
|
1920
|
+
, ;
|
1921
|
+
|
1922
|
+
: : : ;
|
1923
|
+
|
1924
|
+
, : Physically this corresponds to adding the noises or signals represented by
|
1925
|
+
the original ensembles of func-tions. The following result is derived in
|
1926
|
+
Appendix 6. Theorem 15:Let the average power of two ensembles be N1 and N2 and
|
1927
|
+
let their entropy powers be N1 and N2. Then the entropy power of the sum, N3,
|
1928
|
+
is bounded by N1 N2 N3 N1 N2 + + : White Gaussian noise has the peculiar
|
1929
|
+
property that it can absorb any other noise or signal ensemble which may be
|
1930
|
+
added to it with a resultant entropy power approximately equal to the sum of
|
1931
|
+
the white noisepower and the signal power (measured from the average signal
|
1932
|
+
value, which is normally zero), provided thesignal power is small, in a certain
|
1933
|
+
sense, compared to noise. Consider the function space associated with these
|
1934
|
+
ensembles having ndimensions. The white noise corresponds to the spherical
|
1935
|
+
Gaussian distribution in this space. The signal ensemble corresponds to
|
1936
|
+
anotherprobability distribution, not necessarily Gaussian or spherical. Let the
|
1937
|
+
second moments of this distributionabout its center of gravity be ai j. That
|
1938
|
+
is, if p x1 xnis the density distribution function ;
|
1939
|
+
|
1940
|
+
: : : ;
|
1941
|
+
|
1942
|
+
Z Z ai j p xi i x j j dx1 dxn = , , where the iare the coordinates of the
|
1943
|
+
center of gravity. Now ai jis a positive definite quadratic form, and we can
|
1944
|
+
rotate our coordinate system to align it with the principal directions of this
|
1945
|
+
form. ai jis then reducedto diagonal form bii. We require that each biibe small
|
1946
|
+
compared to N, the squared radius of the sphericaldistribution. In this case
|
1947
|
+
the convolution of the noise and signal produce approximately a Gaussian
|
1948
|
+
distribution whose corresponding quadratic form is N bii + : The entropy power
|
1949
|
+
of this distribution is h i1 n = N bii + or approximately h i1 n = N n b n1 ,
|
1950
|
+
ii N = + 1 : N bii = + : n The last term is the signal power, while the first
|
1951
|
+
is the noise power. PART IV: THE CONTINUOUS CHANNEL 24. THE CAPACITY OF A
|
1952
|
+
CONTINUOUS CHANNEL In a continuous channel the input or transmitted signals
|
1953
|
+
will be continuous functions of time f tbelonging to a certain set, and the
|
1954
|
+
output or received signals will be perturbed versions of these. We will
|
1955
|
+
consideronly the case where both transmitted and received signals are limited
|
1956
|
+
to a certain band W. They can thenbe specified, for a time T, by 2TWnumbers,
|
1957
|
+
and their statistical structure by finite dimensional distributionfunctions.
|
1958
|
+
Thus the statistics of the transmitted signal will be determined by P x1 xn P x
|
1959
|
+
;
|
1960
|
+
|
1961
|
+
: : : ;
|
1962
|
+
|
1963
|
+
= 41
|
1964
|
+
===============================================================================
|
1965
|
+
and those of the noise by the conditional probability distribution Px y y P y 1
|
1966
|
+
xn 1 n x ;
|
1967
|
+
|
1968
|
+
: : : ;
|
1969
|
+
|
1970
|
+
= : ;
|
1971
|
+
|
1972
|
+
:::;
|
1973
|
+
|
1974
|
+
The rate of transmission of information for a continuous channel is defined in
|
1975
|
+
a way analogous to that for a discrete channel, namely R H x Hy x = , where H
|
1976
|
+
xis the entropy of the input and Hy xthe equivocation. The channel capacity Cis
|
1977
|
+
defined as the maximum of Rwhen we vary the input over all possible ensembles.
|
1978
|
+
This means that in a finite dimensionalapproximation we must vary P x P x1
|
1979
|
+
xnand maximize = ;
|
1980
|
+
|
1981
|
+
: : : ;
|
1982
|
+
|
1983
|
+
Z ZZ P x y P xlog P x dx P x ylog ;
|
1984
|
+
|
1985
|
+
dx dy , + ;
|
1986
|
+
|
1987
|
+
: P y This can be written Z Z P x y P x ylog ;
|
1988
|
+
|
1989
|
+
dx dy ;
|
1990
|
+
|
1991
|
+
P x P y Z Z Z using the fact that P x ylog P x dx dy P xlog P x dx. The channel
|
1992
|
+
capacity is thus expressed as ;
|
1993
|
+
|
1994
|
+
= follows: 1 ZZ P x y C Lim Max P x ylog ;
|
1995
|
+
|
1996
|
+
dx dy = ;
|
1997
|
+
|
1998
|
+
: T P x T P x P y ! It is obvious in this form that Rand Care independent of
|
1999
|
+
the coordinate system since the numerator P x y and denominator in log ;
|
2000
|
+
|
2001
|
+
will be multiplied by the same factors when xand yare transformed in P x P y
|
2002
|
+
any one-to-one way. This integral expression for Cis more general than H x Hy
|
2003
|
+
x. Properly interpreted , (see Appendix 7) it will always exist while H x Hy
|
2004
|
+
xmay assume an indeterminate form in some , , cases. This occurs, for example,
|
2005
|
+
if xis limited to a surface of fewer dimensions than nin its
|
2006
|
+
ndimensionalapproximation. If the logarithmic base used in computing H xand Hy
|
2007
|
+
xis two then Cis the maximum number of binary digits that can be sent per
|
2008
|
+
second over the channel with arbitrarily small equivocation, just as inthe
|
2009
|
+
discrete case. This can be seen physically by dividing the space of signals
|
2010
|
+
into a large number ofsmall cells, sufficiently small so that the probability
|
2011
|
+
density Px yof signal xbeing perturbed to point yis substantially constant over
|
2012
|
+
a cell (either of xor y). If the cells are considered as distinct points the
|
2013
|
+
situation isessentially the same as a discrete channel and the proofs used
|
2014
|
+
there will apply. But it is clear physically thatthis quantizing of the volume
|
2015
|
+
into individual points cannot in any practical situation alter the final
|
2016
|
+
answersignificantly, provided the regions are sufficiently small. Thus the
|
2017
|
+
capacity will be the limit of the capacitiesfor the discrete subdivisions and
|
2018
|
+
this is just the continuous capacity defined above. On the mathematical side it
|
2019
|
+
can be shown first (see Appendix 7) that if uis the message, xis the signal,
|
2020
|
+
yis the received signal (perturbed by noise) and vis the recovered message then
|
2021
|
+
H x Hy x H u Hv u , , regardless of what operations are performed on uto obtain
|
2022
|
+
xor on yto obtain v. Thus no matter how weencode the binary digits to obtain
|
2023
|
+
the signal, or how we decode the received signal to recover the message,the
|
2024
|
+
discrete rate for the binary digits does not exceed the channel capacity we
|
2025
|
+
have defined. On the otherhand, it is possible under very general conditions to
|
2026
|
+
find a coding system for transmitting binary digits at therate Cwith as small
|
2027
|
+
an equivocation or frequency of errors as desired. This is true, for example,
|
2028
|
+
if, when wetake a finite dimensional approximating space for the signal
|
2029
|
+
functions, P x yis continuous in both xand y ;
|
2030
|
+
|
2031
|
+
except at a set of points of probability zero. An important special case occurs
|
2032
|
+
when the noise is added to the signal and is independent of it (in the
|
2033
|
+
probability sense). Then Px yis a function only of the difference n y x, = , Px
|
2034
|
+
y Q y x = , 42
|
2035
|
+
===============================================================================
|
2036
|
+
and we can assign a definite entropy to the noise (independent of the
|
2037
|
+
statistics of the signal), namely theentropy of the distribution Q n. This
|
2038
|
+
entropy will be denoted by H n. Theorem 16:If the signal and noise are
|
2039
|
+
independent and the received signal is the sum of the transmitted signal and
|
2040
|
+
the noise then the rate of transmission is R H y H n = , ;
|
2041
|
+
|
2042
|
+
i.e., the entropy of the received signal less the entropy of the noise. The
|
2043
|
+
channel capacity is C Max H y H n = , : P x We have, since y x n: = + H x y H x
|
2044
|
+
n ;
|
2045
|
+
|
2046
|
+
= ;
|
2047
|
+
|
2048
|
+
: Expanding the left side and using the fact that xand nare independent H y Hy
|
2049
|
+
x H x H n + = + : Hence R H x Hy x H y H n = , = , : Since H nis independent of
|
2050
|
+
P x, maximizing Rrequires maximizing H y, the entropy of the received signal.
|
2051
|
+
If there are certain constraints on the ensemble of transmitted signals, the
|
2052
|
+
entropy of the receivedsignal must be maximized subject to these constraints.
|
2053
|
+
25. CHANNEL CAPACITY WITH AN AVERAGE POWER LIMITATION A simple application of
|
2054
|
+
Theorem 16 is the case when the noise is a white thermal noise and the
|
2055
|
+
transmittedsignals are limited to a certain average power P. Then the received
|
2056
|
+
signals have an average power P N + where Nis the average noise power. The
|
2057
|
+
maximum entropy for the received signals occurs when they alsoform a white
|
2058
|
+
noise ensemble since this is the greatest possible entropy for a power P Nand
|
2059
|
+
can be obtained + by a suitable choice of transmitted signals, namely if they
|
2060
|
+
form a white noise ensemble of power P. Theentropy (per second) of the received
|
2061
|
+
ensemble is then H y Wlog 2 e P N = + ;
|
2062
|
+
|
2063
|
+
and the noise entropy is H n Wlog 2 eN = : The channel capacity is P N C H y H
|
2064
|
+
n Wlog + = , = : N Summarizing we have the following: Theorem 17:The capacity
|
2065
|
+
of a channel of band Wperturbed by white thermal noise power Nwhen the average
|
2066
|
+
transmitter power is limited to Pis given by P N + C Wlog = : N This means that
|
2067
|
+
by sufficiently involved encoding systems we can transmit binary digits at the
|
2068
|
+
rate P N Wlog + 2 bits per second, with arbitrarily small frequency of errors.
|
2069
|
+
It is not possible to transmit at a N higher rate by any encoding system
|
2070
|
+
without a definite positive frequency of errors. To approximate this limiting
|
2071
|
+
rate of transmission the transmitted signals must approximate, in statistical
|
2072
|
+
properties, a white noise.6 A system which approaches the ideal rate may be
|
2073
|
+
described as follows: Let 6This and other properties of the white noise case
|
2074
|
+
are discussed from the geometrical point of view in "Communication in the
|
2075
|
+
Presence of Noise," loc. cit. 43
|
2076
|
+
===============================================================================
|
2077
|
+
M 2ssamples of white noise be constructed each of duration T. These are
|
2078
|
+
assigned binary numbers from = 0 to M 1. At the transmitter the message
|
2079
|
+
sequences are broken up into groups of sand for each group , the corresponding
|
2080
|
+
noise sample is transmitted as the signal. At the receiver the Msamples are
|
2081
|
+
known andthe actual received signal (perturbed by noise) is compared with each
|
2082
|
+
of them. The sample which has theleast R.M.S. discrepancy from the received
|
2083
|
+
signal is chosen as the transmitted signal and the correspondingbinary number
|
2084
|
+
reconstructed. This process amounts to choosing the most probable (a
|
2085
|
+
posteriori) signal.The number Mof noise samples used will depend on the
|
2086
|
+
tolerable frequency of errors, but for almost all selections of samples we have
|
2087
|
+
log M T P N ;
|
2088
|
+
|
2089
|
+
+ Lim Lim Wlog = ;
|
2090
|
+
|
2091
|
+
0 T T N ! ! so that no matter how small is chosen, we can, by taking
|
2092
|
+
Tsufficiently large, transmit as near as we wish P N to TWlog + binary digits
|
2093
|
+
in the time T. N P N Formulas similar to C Wlog + for the white noise case have
|
2094
|
+
been developed independently = N by several other writers, although with
|
2095
|
+
somewhat different interpretations. We may mention the work ofN. Wiener,7 W. G.
|
2096
|
+
Tuller,8 and H. Sullivan in this connection. In the case of an arbitrary
|
2097
|
+
perturbing noise (not necessarily white thermal noise) it does not appear that
|
2098
|
+
the maximizing problem involved in determining the channel capacity Ccan be
|
2099
|
+
solved explicitly. However,upper and lower bounds can be set for Cin terms of
|
2100
|
+
the average noise power Nthe noise entropy power N1.These bounds are
|
2101
|
+
sufficiently close together in most practical cases to furnish a satisfactory
|
2102
|
+
solution to theproblem. Theorem 18:The capacity of a channel of band Wperturbed
|
2103
|
+
by an arbitrary noise is bounded by the inequalities P N1 P N Wlog + C Wlog +
|
2104
|
+
N1 N1 where P average transmitter power = N average noise power = N1 entropy
|
2105
|
+
power of the noise. = Here again the average power of the perturbed signals
|
2106
|
+
will be P N. The maximum entropy for this + power would occur if the received
|
2107
|
+
signal were white noise and would be Wlog 2 e P N. It may not + be possible to
|
2108
|
+
achieve this;
|
2109
|
+
|
2110
|
+
i.e., there may not be any ensemble of transmitted signals which, added to
|
2111
|
+
theperturbing noise, produce a white thermal noise at the receiver, but at
|
2112
|
+
least this sets an upper bound to H y. We have, therefore C Max H y H n = ,
|
2113
|
+
Wlog 2 e P N Wlog 2 eN1 + , : This is the upper limit given in the theorem. The
|
2114
|
+
lower limit can be obtained by considering the rate if wemake the transmitted
|
2115
|
+
signal a white noise, of power P. In this case the entropy power of the
|
2116
|
+
received signalmust be at least as great as that of a white noise of power P N1
|
2117
|
+
since we have shown in in a previous + theorem that the entropy power of the
|
2118
|
+
sum of two ensembles is greater than or equal to the sum of theindividual
|
2119
|
+
entropy powers. Hence Max H y Wlog 2 e P N1 + 7Cybernetics, loc.
|
2120
|
+
cit.8"Theoretical Limitations on the Rate of Transmission of Information,"
|
2121
|
+
Proceedings of the Institute of Radio Engineers,v. 37, No. 5, May, 1949, pp.
|
2122
|
+
468�78. 44
|
2123
|
+
===============================================================================
|
2124
|
+
and C Wlog 2 e P N1 Wlog 2 eN1 + , P N1 + Wlog = : N1 As Pincreases, the upper
|
2125
|
+
and lower bounds approach each other, so we have as an asymptotic rate P N Wlog
|
2126
|
+
+ : N1 If the noise is itself white, N N1 and the result reduces to the formula
|
2127
|
+
proved previously: = P C Wlog 1 = + : N If the noise is Gaussian but with a
|
2128
|
+
spectrum which is not necessarily flat, N1 is the geometric mean of the noise
|
2129
|
+
power over the various frequencies in the band W. Thus 1 Z N1 exp log N f d f =
|
2130
|
+
W W where N fis the noise power at frequency f. Theorem 19:If we set the
|
2131
|
+
capacity for a given transmitter power Pequal to P N + , C Wlog = N1 then is
|
2132
|
+
monotonic decreasing as Pincreases and approaches 0 as a limit. Suppose that
|
2133
|
+
for a given power P1 the channel capacity is P1 N 1 Wlog + , : N1 This means
|
2134
|
+
that the best signal distribution, say p x, when added to the noise
|
2135
|
+
distribution q x, gives a received distribution r ywhose entropy power is P1 N
|
2136
|
+
1 . Let us increase the power to P1 Pby + , + adding a white noise of power Pto
|
2137
|
+
the signal. The entropy of the received signal is now at least H y Wlog 2 e P1
|
2138
|
+
N 1 P = + , + by application of the theorem on the minimum entropy power of a
|
2139
|
+
sum. Hence, since we can attain theHindicated, the entropy of the maximizing
|
2140
|
+
distribution must be at least as great and must be monotonic decreasing. To
|
2141
|
+
show that 0 as P consider a signal which is white noise with a large P.
|
2142
|
+
Whatever ! ! the perturbing noise, the received signal will be approximately a
|
2143
|
+
white noise, if Pis sufficiently large, in thesense of having an entropy power
|
2144
|
+
approaching P N. + 26. THE CHANNEL CAPACITY WITH A PEAK POWER LIMITATION In
|
2145
|
+
some applications the transmitter is limited not by the average power output
|
2146
|
+
but by the peak instantaneouspower. The problem of calculating the channel
|
2147
|
+
capacity is then that of maximizing (by variation of theensemble of transmitted
|
2148
|
+
symbols) H y H n , p subject to the constraint that all the functions f tin the
|
2149
|
+
ensemble be less than or equal to S, say, for all t. A constraint of this type
|
2150
|
+
does not work out as well mathematically as the average power limitation. The S
|
2151
|
+
most we have obtained for this case is a lower bound valid for all , an
|
2152
|
+
"asymptotic" upper bound (valid N S S for large ) and an asymptotic value of
|
2153
|
+
Cfor small. N N 45
|
2154
|
+
===============================================================================
|
2155
|
+
Theorem 20:The channel capacity Cfor a band Wperturbed by white thermal noise
|
2156
|
+
of power Nis bounded by 2 S C Wlog ;
|
2157
|
+
|
2158
|
+
e3 N S where Sis the peak allowed transmitter power. For sufficiently large N 2
|
2159
|
+
S N + C Wlog e 1 + N S where is arbitrarily small. As 0 (and provided the band
|
2160
|
+
Wstarts at 0) ! N . S C Wlog 1 1 + ! : N S We wish to maximize the entropy of
|
2161
|
+
the received signal. If is large this will occur very nearly when N we maximize
|
2162
|
+
the entropy of the transmitted ensemble. The asymptotic upper bound is obtained
|
2163
|
+
by relaxing the conditions on the ensemble. Let us suppose that the power is
|
2164
|
+
limited to Snot at every instant of time, but only at the sample points. The
|
2165
|
+
maximum entropy ofthe transmitted ensemble under these weakened conditions is
|
2166
|
+
certainly greater than or equal to that under theoriginal conditions. This
|
2167
|
+
altered problem can be solved easily. The maximum entropy occurs if the
|
2168
|
+
different p p samples are independent and have a distribution function which is
|
2169
|
+
constant from Sto S. The entropy , + can be calculated as Wlog 4S: The received
|
2170
|
+
signal will then have an entropy less than Wlog 4S 2 eN1 + + S with 0 as and
|
2171
|
+
the channel capacity is obtained by subtracting the entropy of the white noise,
|
2172
|
+
! ! N Wlog 2 eN: 2 S N + Wlog 4S 2 eN1 Wlog 2 eN Wlog e 1 + + , = + : N This is
|
2173
|
+
the desired upper bound to the channel capacity. To obtain a lower bound
|
2174
|
+
consider the same ensemble of functions. Let these functions be passed through
|
2175
|
+
an ideal filter with a triangular transfer characteristic. The gain is to be
|
2176
|
+
unity at frequency 0 and declinelinearly down to gain 0 at frequency W. We
|
2177
|
+
first show that the output functions of the filter have a peak sin 2 W t power
|
2178
|
+
limitation Sat all times (not just the sample points). First we note that a
|
2179
|
+
pulse going into 2 W t the filter produces 1 sin2 W t 2 W t2 in the output.
|
2180
|
+
This function is never negative. The input function (in the general case) can
|
2181
|
+
be thought of asthe sum of a series of shifted functions sin 2 W t a 2 W t p
|
2182
|
+
where a, the amplitude of the sample, is not greater than S. Hence the output
|
2183
|
+
is the sum of shifted functions of the non-negative form above with the same
|
2184
|
+
coefficients. These functions being non-negative, the greatest p positive value
|
2185
|
+
for any tis obtained when all the coefficients ahave their maximum positive
|
2186
|
+
values, i.e., S. p In this case the input function was a constant of amplitude
|
2187
|
+
Sand since the filter has unit gain for D.C., the output is the same. Hence the
|
2188
|
+
output ensemble has a peak power S. 46
|
2189
|
+
===============================================================================
|
2190
|
+
The entropy of the output ensemble can be calculated from that of the input
|
2191
|
+
ensemble by using the theorem dealing with such a situation. The output entropy
|
2192
|
+
is equal to the input entropy plus the geometricalmean gain of the filter: Z W
|
2193
|
+
Z W W f2 log G2 d f log , d f 2W = = , : 0 0 W Hence the output entropy is 4S
|
2194
|
+
Wlog 4S 2W Wlog , = e2 and the channel capacity is greater than 2 S Wlog : e3 N
|
2195
|
+
S We now wish to show that, for small (peak signal power over average white
|
2196
|
+
noise power), the channel N capacity is approximately S C Wlog 1 = + : N . S S
|
2197
|
+
More precisely C Wlog 1 1 as 0. Since the average signal power Pis less than or
|
2198
|
+
equal + ! ! N N S to the peak S, it follows that for all N P S C Wlog 1 Wlog 1
|
2199
|
+
+ + : N N S Therefore, if we can find an ensemble of functions such that they
|
2200
|
+
correspond to a rate nearly Wlog 1 + Nand are limited to band Wand peak Sthe
|
2201
|
+
result will be proved. Consider the ensemble of functions of the p p following
|
2202
|
+
type. A series of tsamples have the same value, either Sor S, then the next
|
2203
|
+
tsamples have + , p p the same value, etc. The value for a series is chosen at
|
2204
|
+
random, probability 1 for Sand 1 for S. If 2 + 2 , this ensemble be passed
|
2205
|
+
through a filter with triangular gain characteristic (unit gain at D.C.), the
|
2206
|
+
output ispeak limited to S. Furthermore the average power is nearly Sand can be
|
2207
|
+
made to approach this by taking t sufficiently large. The entropy of the sum of
|
2208
|
+
this and the thermal noise can be found by applying the theoremon the sum of a
|
2209
|
+
noise and a small signal. This theorem will apply if S p t N S is sufficiently
|
2210
|
+
small. This can be ensured by taking small enough (after tis chosen). The
|
2211
|
+
entropy power N will be S Nto as close an approximation as desired, and hence
|
2212
|
+
the rate of transmission as near as we wish + to S N Wlog + : N PART V: THE
|
2213
|
+
RATE FOR A CONTINUOUS SOURCE 27. FIDELITY EVALUATION FUNCTIONS In the case of a
|
2214
|
+
discrete source of information we were able to determine a definite rate of
|
2215
|
+
generatinginformation, namely the entropy of the underlying stochastic process.
|
2216
|
+
With a continuous source the situationis considerably more involved. In the
|
2217
|
+
first place a continuously variable quantity can assume an infinitenumber of
|
2218
|
+
values and requires, therefore, an infinite number of binary digits for exact
|
2219
|
+
specification. Thismeans that to transmit the output of a continuous source
|
2220
|
+
with exact recoveryat the receiving point requires, 47
|
2221
|
+
===============================================================================
|
2222
|
+
in general, a channel of infinite capacity (in bits per second). Since,
|
2223
|
+
ordinarily, channels have a certainamount of noise, and therefore a finite
|
2224
|
+
capacity, exact transmission is impossible. This, however, evades the real
|
2225
|
+
issue. Practically, we are not interested in exact transmission when we have a
|
2226
|
+
continuous source, but only in transmission to within a certain tolerance. The
|
2227
|
+
question is, can weassign a definite rate to a continuous source when we
|
2228
|
+
require only a certain fidelity of recovery, measured ina suitable way. Of
|
2229
|
+
course, as the fidelity requirements are increased the rate will increase. It
|
2230
|
+
will be shownthat we can, in very general cases, define such a rate, having the
|
2231
|
+
property that it is possible, by properlyencoding the information, to transmit
|
2232
|
+
it over a channel whose capacity is equal to the rate in question, andsatisfy
|
2233
|
+
the fidelity requirements. A channel of smaller capacity is insufficient. It is
|
2234
|
+
first necessary to give a general mathematical formulation of the idea of
|
2235
|
+
fidelity of transmission. Consider the set of messages of a long duration, say
|
2236
|
+
Tseconds. The source is described by giving theprobability density, in the
|
2237
|
+
associated space, that the source will select the message in question P x. A
|
2238
|
+
given communication system is described (from the external point of view) by
|
2239
|
+
giving the conditional probabilityPx ythat if message xis produced by the
|
2240
|
+
source the recovered message at the receiving point will be y. The system as a
|
2241
|
+
whole (including source and transmission system) is described by the
|
2242
|
+
probability function P x y ;
|
2243
|
+
|
2244
|
+
of having message xand final output y. If this function is known, the complete
|
2245
|
+
characteristics of the systemfrom the point of view of fidelity are known. Any
|
2246
|
+
evaluation of fidelity must correspond mathematicallyto an operation applied to
|
2247
|
+
P x y. This operation must at least have the properties of a simple ordering of
|
2248
|
+
;
|
2249
|
+
|
2250
|
+
systems;
|
2251
|
+
|
2252
|
+
i.e., it must be possible to say of two systems represented by P1 x yand P2 x
|
2253
|
+
ythat, according to ;
|
2254
|
+
|
2255
|
+
;
|
2256
|
+
|
2257
|
+
our fidelity criterion, either (1) the first has higher fidelity, (2) the
|
2258
|
+
second has higher fidelity, or (3) they haveequal fidelity. This means that a
|
2259
|
+
criterion of fidelity can be represented by a numerically valued function: , v
|
2260
|
+
P x y ;
|
2261
|
+
|
2262
|
+
whose argument ranges over possible probability functions P x y. ;
|
2263
|
+
|
2264
|
+
, We will now show that under very general and reasonable assumptions the
|
2265
|
+
function v P x y can be ;
|
2266
|
+
|
2267
|
+
written in a seemingly much more specialized form, namely as an average of a
|
2268
|
+
function x yover the set ;
|
2269
|
+
|
2270
|
+
of possible values of xand y: Z Z , v P x y P x y x y dx dy ;
|
2271
|
+
|
2272
|
+
= ;
|
2273
|
+
|
2274
|
+
;
|
2275
|
+
|
2276
|
+
: To obtain this we need only assume (1) that the source and system are ergodic
|
2277
|
+
so that a very long samplewill be, with probability nearly 1, typical of the
|
2278
|
+
ensemble, and (2) that the evaluation is "reasonable" in thesense that it is
|
2279
|
+
possible, by observing a typical input and output x1 and y1, to form a
|
2280
|
+
tentative evaluationon the basis of these samples;
|
2281
|
+
|
2282
|
+
and if these samples are increased in duration the tentative evaluation
|
2283
|
+
will,with probability 1, approach the exact evaluation based on a full
|
2284
|
+
knowledge of P x y. Let the tentative ;
|
2285
|
+
|
2286
|
+
evaluation be x y. Then the function x yapproaches (as T ) a constant for
|
2287
|
+
almost all x ywhich ;
|
2288
|
+
|
2289
|
+
;
|
2290
|
+
|
2291
|
+
! ;
|
2292
|
+
|
2293
|
+
are in the high probability region corresponding to the system: , x y v P x y ;
|
2294
|
+
|
2295
|
+
! ;
|
2296
|
+
|
2297
|
+
and we may also write Z Z x y P x y x y dx dy ;
|
2298
|
+
|
2299
|
+
! ;
|
2300
|
+
|
2301
|
+
;
|
2302
|
+
|
2303
|
+
since Z Z P x y dx dy 1 ;
|
2304
|
+
|
2305
|
+
= : This establishes the desired result. The function x yhas the general nature
|
2306
|
+
of a "distance" between xand y.9 It measures how undesirable ;
|
2307
|
+
|
2308
|
+
it is (according to our fidelity criterion) to receive ywhen xis transmitted.
|
2309
|
+
The general result given abovecan be restated as follows: Any reasonable
|
2310
|
+
evaluation can be represented as an average of a distance functionover the set
|
2311
|
+
of messages and recovered messages xand yweighted according to the probability
|
2312
|
+
P x yof ;
|
2313
|
+
|
2314
|
+
getting the pair in question, provided the duration Tof the messages be taken
|
2315
|
+
sufficiently large. The following are simple examples of evaluation functions:
|
2316
|
+
9It is not a "metric" in the strict sense, however, since in general it does
|
2317
|
+
not satisfy either x y y xor x y y z x z. ;
|
2318
|
+
|
2319
|
+
= ;
|
2320
|
+
|
2321
|
+
;
|
2322
|
+
|
2323
|
+
+ ;
|
2324
|
+
|
2325
|
+
;
|
2326
|
+
|
2327
|
+
48
|
2328
|
+
===============================================================================
|
2329
|
+
1. R.M.S. criterion.

    v = \overline{\bigl(x(t) - y(t)\bigr)^2}

In this very commonly used measure of fidelity the distance function \rho(x,y) is (apart from a constant factor) the square of the ordinary Euclidean distance between the points x and y in the associated function space.

    \rho(x,y) = \frac{1}{T}\int_0^T \bigl[x(t) - y(t)\bigr]^2\,dt

2. Frequency weighted R.M.S. criterion. More generally one can apply different weights to the different frequency components before using an R.M.S. measure of fidelity. This is equivalent to passing the difference x(t) - y(t) through a shaping filter and then determining the average power in the output. Thus let

    e(t) = x(t) - y(t)

and

    f(t) = \int_{-\infty}^{\infty} e(\tau)\,k(t - \tau)\,d\tau

then

    \rho(x,y) = \frac{1}{T}\int_0^T f(t)^2\,dt.

3. Absolute error criterion.

    \rho(x,y) = \frac{1}{T}\int_0^T \bigl|x(t) - y(t)\bigr|\,dt

4. The structure of the ear and brain determine implicitly an evaluation, or rather a number of evaluations, appropriate in the case of speech or music transmission. There is, for example, an "intelligibility" criterion in which \rho(x,y) is equal to the relative frequency of incorrectly interpreted words when message x(t) is received as y(t). Although we cannot give an explicit representation of \rho(x,y) in these cases it could, in principle, be determined by sufficient experimentation. Some of its properties follow from well-known experimental results in hearing, e.g., the ear is relatively insensitive to phase and the sensitivity to amplitude and frequency is roughly logarithmic.

5. The discrete case can be considered as a specialization in which we have tacitly assumed an evaluation based on the frequency of errors. The function \rho(x,y) is then defined as the number of symbols in the sequence y differing from the corresponding symbols in x divided by the total number of symbols in x.
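Criteria 1, 3 and 5 above are immediate to compute on sampled signals; a short Python sketch (the signals, sampling grid and symbol sequences are assumed for illustration):

    import numpy as np

    def rms_distance(x, y, T):
        # Example 1: (1/T) * integral of (x - y)^2, approximated on samples.
        dt = T / len(x)
        return np.sum((x - y) ** 2) * dt / T

    def absolute_error(x, y, T):
        # Example 3: (1/T) * integral of |x - y|.
        dt = T / len(x)
        return np.sum(np.abs(x - y)) * dt / T

    def error_frequency(x_symbols, y_symbols):
        # Example 5: fraction of symbols in y differing from those in x.
        return np.mean(np.asarray(x_symbols) != np.asarray(y_symbols))

    # Toy usage: a sine wave and a noisy copy over T = 1 second.
    t = np.linspace(0.0, 1.0, 1000, endpoint=False)
    x = np.sin(2 * np.pi * 5 * t)
    y = x + 0.1 * np.random.default_rng(0).standard_normal(len(t))
    print(rms_distance(x, y, 1.0), absolute_error(x, y, 1.0))
    print(error_frequency([0, 1, 1, 0], [0, 1, 0, 0]))   # 0.25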
28. THE RATE FOR A SOURCE RELATIVE TO A FIDELITY EVALUATION

We are now in a position to define a rate of generating information for a continuous source. We are given P(x) for the source and an evaluation v determined by a distance function \rho(x,y) which will be assumed continuous in both x and y. With a particular system P(x,y) the quality is measured by

    v = \iint \rho(x,y)\,P(x,y)\,dx\,dy.

Furthermore the rate of flow of binary digits corresponding to P(x,y) is

    R = \iint P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}\,dx\,dy.

We define the rate R_1 of generating information for a given quality v_1 of reproduction to be the minimum of R when we keep v fixed at v_1 and vary P_x(y). That is:

    R_1 = \min_{P_x(y)} \iint P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}\,dx\,dy
subject to the constraint:

    v_1 = \iint P(x,y)\,\rho(x,y)\,dx\,dy.

This means that we consider, in effect, all the communication systems that might be used and that transmit with the required fidelity. The rate of transmission in bits per second is calculated for each one and we choose that having the least rate. This latter rate is the rate we assign the source for the fidelity in question. The justification of this definition lies in the following result:
Theorem 21: If a source has a rate R_1 for a valuation v_1 it is possible to encode the output of the source and transmit it over a channel of capacity C with fidelity as near v_1 as desired provided R_1 \le C. This is not possible if R_1 > C.

The last statement in the theorem follows immediately from the definition of R_1 and previous results. If it were not true we could transmit more than C bits per second over a channel of capacity C. The first part of the theorem is proved by a method analogous to that used for Theorem 11. We may, in the first place, divide the (x,y) space into a large number of small cells and represent the situation as a discrete case. This will not change the evaluation function by more than an arbitrarily small amount (when the cells are very small) because of the continuity assumed for \rho(x,y). Suppose that P_1(x,y) is the particular system which minimizes the rate and gives R_1. We choose from the high probability y's a set at random containing

    2^{(R_1 + \epsilon) T}

members where \epsilon \to 0 as T \to \infty. With large T each chosen point will be connected by a high probability line (as in Fig. 10) to a set of x's. A calculation similar to that used in proving Theorem 11 shows that with large T almost all x's are covered by the fans from the chosen y points for almost all choices of the y's. The communication system to be used operates as follows: The selected points are assigned binary numbers. When a message x is originated it will (with probability approaching 1 as T \to \infty) lie within at least one of the fans. The corresponding binary number is transmitted (or one of them chosen arbitrarily if there are several) over the channel by suitable coding means to give a small probability of error. Since R_1 \le C this is possible. At the receiving point the corresponding y is reconstructed and used as the recovered message.

The evaluation v_1' for this system can be made arbitrarily close to v_1 by taking T sufficiently large. This is due to the fact that for each long sample of message x(t) and recovered message y(t) the evaluation approaches v_1 (with probability 1).

It is interesting to note that, in this system, the noise in the recovered message is actually produced by a kind of general quantizing at the transmitter and not produced by the noise in the channel. It is more or less analogous to the quantizing noise in PCM.
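The encoding in this proof amounts to a random-codebook quantizer: choose roughly 2^{R_1 T} reproduction points at random, transmit the binary number of one lying close to the message, and reconstruct that point at the receiver. A minimal Python sketch under assumed parameters (Gaussian source, squared-error distance, and a block length small enough that the codebook can be enumerated):

    import numpy as np

    rng = np.random.default_rng(1)

    T = 10                        # block length standing in for the duration T
    R = 1.0                       # bits per sample; codebook size 2^(R*T) = 1024
    n_codewords = int(2 ** (R * T))

    # Random codebook: reproduction points y drawn from the high-probability
    # region of the source (here an assumed unit-variance Gaussian).
    codebook = rng.standard_normal((n_codewords, T))

    def encode(x):
        # Transmit the binary number of the nearest codeword (squared error).
        distances = np.sum((codebook - x) ** 2, axis=1)
        return int(np.argmin(distances))

    def decode(index):
        # The receiver reconstructs the corresponding y.
        return codebook[index]

    x = rng.standard_normal(T)            # message block
    y = decode(encode(x))                 # recovered message
    print(np.mean((x - y) ** 2))          # the "quantizing noise"

Note that the distortion here is created entirely at the transmitter's quantizing step, matching the remark above: the channel itself, if coded suitably, contributes almost no error.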
29. THE CALCULATION OF RATES

The definition of the rate is similar in many respects to the definition of channel capacity. In the former

    R = \min_{P_x(y)} \iint P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}\,dx\,dy

with P(x) and v_1 = \iint P(x,y)\,\rho(x,y)\,dx\,dy fixed. In the latter

    C = \max_{P(x)} \iint P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}\,dx\,dy

with P_x(y) fixed and possibly one or more other constraints (e.g., an average power limitation) of the form K = \iint P(x,y)\,\lambda(x,y)\,dx\,dy.

A partial solution of the general maximizing problem for determining the rate of a source can be given. Using Lagrange's method we consider

    \iint \Bigl[ P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)} + \mu\,P(x,y)\,\rho(x,y) + \nu(x)\,P(x,y) \Bigr]\,dx\,dy.
The variational equation (when we take the first variation on P(x,y)) leads to

    P_y(x) = B(x)\,e^{-\lambda \rho(x,y)}

where \lambda is determined to give the required fidelity and B(x) is chosen to satisfy

    \int B(x)\,e^{-\lambda \rho(x,y)}\,dx = 1.

This shows that, with best encoding, the conditional probability of a certain cause for various received y, P_y(x), will decline exponentially with the distance function \rho(x,y) between the x and y in question.
In the special case where the distance function \rho(x,y) depends only on the (vector) difference between x and y,

    \rho(x,y) = \rho(x - y)

we have

    \int B(x)\,e^{-\lambda \rho(x - y)}\,dx = 1.

Hence B(x) is constant, say \alpha, and

    P_y(x) = \alpha\,e^{-\lambda \rho(x - y)}.
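For a difference kernel the normalization condition can be checked by direct quadrature. A small Python check, assuming \rho(x - y) = (x - y)^2 and \lambda = 2 (values chosen for illustration), in which case the constant is \alpha = \sqrt{\lambda/\pi}:

    import numpy as np

    lam = 2.0
    alpha = np.sqrt(lam / np.pi)            # constant B(x) for this assumed rho

    x = np.linspace(-30.0, 30.0, 60_001)    # wide grid standing in for the real line
    dx = x[1] - x[0]
    for y in (-1.0, 0.0, 2.5):
        # integral over x of alpha * exp(-lam * rho(x - y)) dx should be 1
        integral = np.sum(alpha * np.exp(-lam * (x - y) ** 2)) * dx
        print(y, integral)                  # ~1.0 for every received y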
Unfortunately these formal solutions are difficult to evaluate in particular cases and seem to be of little value. In fact, the actual calculation of rates has been carried out in only a few very simple cases.
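For discrete alphabets the minimization defining R_1 can nevertheless be carried out numerically. The sketch below uses the Blahut-Arimoto iteration, a method published in 1972 (long after this paper, and not Shannon's own); it alternates between the exponential conditional derived above and the output distribution it induces. The source, distortion matrix and \lambda value are assumed for illustration:

    import numpy as np

    def rate_distortion_point(p_x, rho, lam, iterations=500):
        """One point of the rate-distortion curve for a discrete source.

        Blahut-Arimoto: for fixed lambda, alternate between the exponential
        conditional (the form derived in the text) and the output
        distribution q(y) it implies.
        """
        n_y = rho.shape[1]
        q = np.full(n_y, 1.0 / n_y)                 # initial output distribution
        for _ in range(iterations):
            # Conditional proportional to q(y) * exp(-lambda * rho(x, y)).
            w = q * np.exp(-lam * rho)
            cond = w / w.sum(axis=1, keepdims=True)
            q = p_x @ cond                          # updated output distribution
        distortion = np.sum(p_x[:, None] * cond * rho)
        with np.errstate(divide="ignore", invalid="ignore"):
            log_term = np.where(cond > 0, np.log2(cond / q), 0.0)
        rate = np.sum(p_x[:, None] * cond * log_term)
        return rate, distortion

    # Assumed example: binary symmetric source with error-frequency
    # (Hamming) distortion, where the known answer is R(D) = 1 - H(D).
    p_x = np.array([0.5, 0.5])
    rho = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
    print(rate_distortion_point(p_x, rho, lam=2.0))

Sweeping lam traces out the whole curve of R_1 against v_1; for the binary example above the computed points match the closed form R(D) = 1 - H(D).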
If the distance function \rho(x,y) is the mean square discrepancy between x and y and the message ensemble is white noise, the rate can be determined. In that case we have

    R = \min \bigl[ H(x) - H_y(x) \bigr] = H(x) - \max H_y(x)

with N = \overline{(x - y)^2}. But the \max H_y(x) occurs when y - x is a white noise, and is equal to W_1 \log 2\pi e N where W_1 is the bandwidth of the message ensemble. Therefore

    R = W_1 \log 2\pi e Q - W_1 \log 2\pi e N = W_1 \log \frac{Q}{N}

where Q is the average message power. This proves the following:

Theorem 22: The rate for a white noise source of power Q and band W_1 relative to an R.M.S. measure of fidelity is

    R = W_1 \log \frac{Q}{N}

where N is the allowed mean square error between original and recovered messages.

More generally with any message source we can obtain inequalities bounding the rate relative to a mean square error criterion.

Theorem 23: The rate for any source of band W_1 is bounded by

    W_1 \log \frac{Q_1}{N} \le R \le W_1 \log \frac{Q}{N}

where Q is the average power of the source, Q_1 its entropy power and N the allowed mean square error.

The lower bound follows from the fact that the \max H_y(x) for a given \overline{(x - y)^2} = N occurs in the white noise case. The upper bound results if we place points (used in the proof of Theorem 21) not in the best way but at random in a sphere of radius \sqrt{Q - N}.
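Theorem 22 is directly computable; a one-line check with assumed numbers (W_1 = 5000 cycles per second and Q/N = 100):

    import math

    # Worked example of Theorem 22 (the numbers are assumed, for illustration):
    # white noise source of band W1, ratio of message power Q to allowed
    # mean square error N equal to 100.
    W1 = 5000.0                     # cycles per second
    R = W1 * math.log2(100.0)       # rate in bits per second
    print(R)                        # about 33,219 bits per second

For a non-white source, Theorem 23 says the same formula with Q replaced by the entropy power Q_1 gives a lower bound on the rate.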
ACKNOWLEDGMENTS

The writer is indebted to his colleagues at the Laboratories, particularly to Dr. H. W. Bode, Dr. J. R. Pierce, Dr. B. McMillan, and Dr. B. M. Oliver for many helpful suggestions and criticisms during the course of this work. Credit should also be given to Professor N. Wiener, whose elegant solution of the problems of filtering and prediction of stationary ensembles has considerably influenced the writer's thinking in this field.
APPENDIX 5

Let S_1 be any measurable subset of the g ensemble, and S_2 the subset of the f ensemble which gives S_1 under the operation T. Then

    S_1 = T S_2.

Let H^\lambda be the operator which shifts all functions in a set by the time \lambda. Then

    H^\lambda S_1 = H^\lambda T S_2 = T H^\lambda S_2

since T is invariant and therefore commutes with H^\lambda. Hence if m[S] is the probability measure of the set S

    m[H^\lambda S_1] = m[T H^\lambda S_2] = m[H^\lambda S_2] = m[S_2] = m[S_1]

where the second equality is by definition of measure in the g space, the third since the f ensemble is stationary, and the last by definition of g measure again.

To prove that the ergodic property is preserved under invariant operations, let S_1 be a subset of the g ensemble which is invariant under H^\lambda, and let S_2 be the set of all functions f which transform into S_1. Then

    H^\lambda S_1 = H^\lambda T S_2 = T H^\lambda S_2 = S_1

so that H^\lambda S_2 is included in S_2 for all \lambda. Now, since

    m[H^\lambda S_2] = m[S_1]

this implies H^\lambda S_2 = S_2 for all \lambda with m[S_2] \ne 0, 1. This contradiction shows that S_1 does not exist.
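The commutation step H^\lambda T = T H^\lambda can be checked concretely for a simple invariant operation; a Python sketch, assuming a cyclic moving-average filter as T and a circular shift as H^\lambda:

    import numpy as np

    rng = np.random.default_rng(2)

    def T(f):
        # An invariant operation (assumed example): a 3-tap moving average
        # applied cyclically, which commutes with time shifts.
        return (f + np.roll(f, 1) + np.roll(f, 2)) / 3.0

    def shift(f, lam):
        # H^lambda: shift the function by lambda time steps.
        return np.roll(f, lam)

    f = rng.standard_normal(1000)     # one member of the f ensemble
    lam = 7

    # Shifting then filtering equals filtering then shifting, the key step
    # in the proof above.
    print(np.allclose(T(shift(f, lam)), shift(T(f), lam)))   # True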
APPENDIX 6

The upper bound, N_3 \le N_1 + N_2, is due to the fact that the maximum possible entropy for a power N_1 + N_2 occurs when we have a white noise of this power. In this case the entropy power is N_1 + N_2.

To obtain the lower bound, suppose we have two distributions in n dimensions p(x_i) and q(x_i) with entropy powers N_1 and N_2. What form should p and q have to minimize the entropy power N_3 of their convolution r(x_i):

    r(x_i) = \int p(y_i)\,q(x_i - y_i)\,dy_i.

The entropy H_3 of r is given by

    H_3 = -\int r(x_i)\,\log r(x_i)\,dx_i.

We wish to minimize this subject to the constraints

    H_1 = -\int p(x_i)\,\log p(x_i)\,dx_i
    H_2 = -\int q(x_i)\,\log q(x_i)\,dx_i.
We consider then

    U = -\int \bigl[ r(x)\,\log r(x) + \lambda\,p(x)\,\log p(x) + \mu\,q(x)\,\log q(x) \bigr]\,dx

    \delta U = -\int \bigl[ (1 + \log r(x))\,\delta r(x) + \lambda\,(1 + \log p(x))\,\delta p(x) + \mu\,(1 + \log q(x))\,\delta q(x) \bigr]\,dx.

If p(x) is varied at a particular argument x_i = s_i, the variation in r(x) is

    \delta r(x) = q(x_i - s_i)

and

    \delta U = -\int q(x_i - s_i)\,\log r(x_i)\,dx_i - \lambda\,\log p(s_i) = 0

and similarly when q is varied. Hence the conditions for a minimum are

    \int q(x_i - s_i)\,\log r(x_i)\,dx_i = -\lambda\,\log p(s_i)
    \int p(x_i - s_i)\,\log r(x_i)\,dx_i = -\mu\,\log q(s_i).

If we multiply the first by p(s_i) and the second by q(s_i) and integrate with respect to s_i we obtain

    H_3 = -\lambda H_1
    H_3 = -\mu H_2

or solving for \lambda and \mu and replacing in the equations

    H_1 \int q(x_i - s_i)\,\log r(x_i)\,dx_i = H_3\,\log p(s_i)
    H_2 \int p(x_i - s_i)\,\log r(x_i)\,dx_i = H_3\,\log q(s_i).

Now suppose p(x_i) and q(x_i) are normal

    p(x_i) = \frac{|A_{ij}|^{1/2}}{(2\pi)^{n/2}} \exp\Bigl(-\tfrac{1}{2}\sum A_{ij}\,x_i x_j\Bigr)
    q(x_i) = \frac{|B_{ij}|^{1/2}}{(2\pi)^{n/2}} \exp\Bigl(-\tfrac{1}{2}\sum B_{ij}\,x_i x_j\Bigr).

Then r(x_i) will also be normal with quadratic form C_{ij}. If the inverses of these forms are a_{ij}, b_{ij}, c_{ij} then

    c_{ij} = a_{ij} + b_{ij}.

We wish to show that these functions satisfy the minimizing conditions if and only if a_{ij} = K b_{ij} and thus give the minimum H_3 under the constraints. First we have

    \log r(x) = \tfrac{1}{2}\,\log\frac{|C_{ij}|}{(2\pi)^n} - \tfrac{1}{2}\sum C_{ij}\,x_i x_j

    \int q(x_i - s_i)\,\log r(x_i)\,dx_i = \tfrac{1}{2}\,\log\frac{|C_{ij}|}{(2\pi)^n} - \tfrac{1}{2}\sum C_{ij}\,s_i s_j - \tfrac{1}{2}\sum C_{ij}\,b_{ij}.

This should equal

    \frac{H_3}{H_1}\Bigl[ \tfrac{1}{2}\,\log\frac{|A_{ij}|}{(2\pi)^n} - \tfrac{1}{2}\sum A_{ij}\,s_i s_j \Bigr]

which requires A_{ij} = \frac{H_1}{H_3}\,C_{ij}. In this case A_{ij} = \frac{H_1}{H_2}\,B_{ij} and both equations reduce to identities.
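As a check on the bound (a worked special case, not in the text): in one dimension with Gaussian p and q, the entropy power of a Gaussian equals its variance, and variances add under convolution, so the lower bound N_3 \ge N_1 + N_2 holds with equality:

    H = \tfrac{1}{2}\log 2\pi e \sigma^2 \;\Rightarrow\; N = \frac{1}{2\pi e}\,e^{2H} = \sigma^2,
    \qquad
    \sigma_r^2 = \sigma_p^2 + \sigma_q^2 \;\Rightarrow\; N_3 = N_1 + N_2.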
APPENDIX 7

The following will indicate a more general and more rigorous approach to the central definitions of communication theory. Consider a probability measure space whose elements are ordered pairs (x,y). The variables x, y are to be identified as the possible transmitted and received signals of some long duration T. Let us call the set of all points whose x belongs to a subset S_1 of x points the strip over S_1, and similarly the set whose y belong to S_2 the strip over S_2. We divide x and y into a collection of non-overlapping measurable subsets X_i and Y_i and approximate the rate of transmission R by

    R_1 = \frac{1}{T} \sum_i P(X_i, Y_i)\,\log\frac{P(X_i, Y_i)}{P(X_i)\,P(Y_i)}

where
    P(X_i) is the probability measure of the strip over X_i
    P(Y_i) is the probability measure of the strip over Y_i
    P(X_i, Y_i) is the probability measure of the intersection of the strips.
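On a finite subdivision R_1 is just the mutual information of the discretized pair, scaled by 1/T; a Python sketch with an assumed joint measure:

    import numpy as np

    def r1(joint, T):
        # R1 = (1/T) * sum over cells of P(Xi,Yi) log2[P(Xi,Yi) / (P(Xi) P(Yi))].
        p_x = joint.sum(axis=1)           # measure of the strips over Xi
        p_y = joint.sum(axis=0)           # measure of the strips over Yi
        terms = np.where(joint > 0,
                         joint * np.log2(joint / np.outer(p_x, p_y)),
                         0.0)
        return terms.sum() / T

    # Assumed joint measure over a 2x2 subdivision of (x, y), duration T = 1.
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
    print(r1(joint, T=1.0))               # about 0.278 bits

Refining the subdivision can only increase this sum, which is the monotonicity argument made next.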
A further subdivision can never decrease R_1. For let X_1 be divided into X_1 = X_1' + X_1'' and let

    P(Y_1) = a \qquad P(X_1) = b + c
    P(X_1') = b \qquad P(X_1', Y_1) = d
    P(X_1'') = c \qquad P(X_1'', Y_1) = e
    P(X_1, Y_1) = d + e.

Then in the sum we have replaced (for the X_1, Y_1 intersection)

    (d + e)\,\log\frac{d + e}{a(b + c)}

by

    d\,\log\frac{d}{ab} + e\,\log\frac{e}{ac}.

It is easily shown that with the limitation we have on b, c, d, e,

    \Bigl[\frac{d + e}{b + c}\Bigr]^{d + e} \le \frac{d^d\,e^e}{b^d\,c^e}

and consequently the sum is increased. Thus the various possible subdivisions form a directed set, with R monotonic increasing with refinement of the subdivision. We may define R unambiguously as the least upper bound for R_1 and write it

    R = \frac{1}{T} \iint P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}\,dx\,dy.
+
|
2583
|
+
: T P x P y This integral, understood in the above sense, includes both the
|
2584
|
+
continuous and discrete cases and of coursemany others which cannot be
|
2585
|
+
represented in either form. It is trivial in this formulation that if xand
|
2586
|
+
uarein one-to-one correspondence, the rate from uto yis equal to that from xto
|
2587
|
+
y. If vis any function of y(notnecessarily with an inverse) then the rate from
|
2588
|
+
xto yis greater than or equal to that from xto vsince, inthe calculation of the
|
2589
|
+
approximations, the subdivisions of yare essentially a finer subdivision of
|
2590
|
+
those forv. More generally if yand vare related not functionally but
|
2591
|
+
statistically, i.e., we have a probability measurespace y v, then R x v R x y.
|
2592
|
+
This means that any operation applied to the received signal, even though ;
|
2593
|
+
|
2594
|
+
;
|
2595
|
+
|
2596
|
+
;
|
2597
|
+
|
2598
|
+
it involves statistical elements, does not increase R. Another notion which
|
2599
|
+
should be defined precisely in an abstract formulation of the theory is that of
|
2600
|
+
"dimension rate," that is the average number of dimensions required per second
|
2601
|
+
to specify a member ofan ensemble. In the band limited case 2Wnumbers per
|
2602
|
+
second are sufficient. A general definition can beframed as follows. Let f tbe
|
2603
|
+
an ensemble of functions and let T f t f t be a metric measuring ;
|
2604
|
+
|
2605
|
+
54
|
2606
|
+
===============================================================================
|
2607
|
+
the "distance" from fto fover the time T(for example the R.M.S. discrepancy
|
2608
|
+
over this interval.) LetN Tbe the least number of elements fwhich can be chosen
|
2609
|
+
such that all elements of the ensemble ;
|
2610
|
+
|
2611
|
+
;
|
2612
|
+
|
2613
|
+
apart from a set of measure are within the distance of at least one of those
|
2614
|
+
chosen. Thus we are covering the space to within apart from a set of small
|
2615
|
+
measure . We define the dimension rate for the ensemble by the triple limit log
|
2616
|
+
N T Lim Lim Lim ;
|
2617
|
+
|
2618
|
+
;
|
2619
|
+
|
2620
|
+
= : 0 0 T Tlog ! ! ! This is a generalization of the measure type definitions
|
2621
|
+
of dimension in topology, and agrees with the intu-itive dimension rate for
|
2622
|
+
simple ensembles where the desired result is obvious. 55
|
2623
|
+
===============================================================================
|
2624
|
+
************ Document Outline ************

* A Mathematical Theory of Communication
* Introduction
* Part I: Discrete Noiseless Systems
  o The Discrete Noiseless Channel
  o The Discrete Source of Information
  o The Series of Approximations to English
  o Graphical Representations of a Markoff Process
  o Ergodic and Mixed Sources
  o Choice, Uncertainty and Entropy
  o Representation of the Encoding and Decoding Operation
  o The Fundamental Theorem of a Noiseless Channel
  o Discussion and Examples
* Part II: The Discrete Channel with Noise
  o Representation of a Noisy Discrete Channel
  o The Fundamental Theorem for a Discrete Channel with Noise
  o Discussion
  o Example of a Discrete Channel and its Capacity
  o The Channel Capacity in Certain Special Cases
  o An Example of Efficient Coding
  o A1. The Growth of the Number of Blocks of Symbols with a Finite State Condition
  o A2. The Derivation of Entropy
  o A3. Theorems on Ergodic Sources
  o A4. Maximizing the Rate for a System of Constraints
* Part III: Mathematical Preliminaries
  o Sets and Ensembles of Functions
  o Band Limited Ensembles of Functions
  o Entropy of a Continuous Distribution
  o Entropy of an Ensemble of Functions
  o Entropy Loss in Linear Filters
  o Entropy of a Sum of Two Ensembles
* Part IV: The Continuous Channel
  o The Capacity of a Continuous Channel
  o Channel Capacity with an Average Power Limitation
  o The Channel Capacity with a Peak Power Limitation
* Part V: The Rate for a Continuous Source
  o Fidelity Evaluation Functions
  o The Rate for a Source Relative to a Fidelity Evaluation
  o The Calculation of Rates
  o A5
  o A6
  o A7