Information Theory#

Fundamental Limits on Communications#

In the study of communications, fundamental limits refer to the theoretical boundaries that dictate the feasibility of two core tasks: compression and transmission.

Compression reduces the amount of data required to represent an information source, while transmission pertains to reliably sending this data over a communication channel.

These tasks are governed by principles from information theory, which provides a mathematical framework for quantifying information, assessing channel capabilities, and determining the efficiency of coding schemes.

Types of Information Sources#

Information sources, which generate data for communication systems, can be broadly categorized as follows:

  • Analog Sources: These produce continuous-time signals (e.g., audio waveforms, video signals), characterized by a theoretically infinite range of possible values within a given interval.

    For example, a microphone capturing sound generates an analog signal that varies smoothly over time.

  • Discrete Sources: These emit sequences of symbols from a finite set, known as an alphabet.

    Examples include text messages composed of letters or binary data streams of 0s and 1s.

    Because the symbols are drawn from a finite alphabet, discrete sources lend themselves more readily to mathematical modeling and digital processing.

Source Coding#

Source coding is the process of converting the output of an information source into a digital format, typically a sequence of binary digits (bits).

This task is performed by a source encoder, which maps the source output to its binary representation.

For instance, an analog audio signal might be sampled and quantized to produce a bitstream, whereas a discrete source like a text file can be encoded into bits using a predefined scheme (e.g., ASCII).

The overarching goal is to minimize the number of bits used to represent the source while still preserving the ability to reconstruct the original information—either perfectly (lossless coding) or with acceptable distortion (lossy coding).
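
As a minimal sketch of this idea for a discrete text source, the snippet below (in Python, with illustrative helper names `encode_ascii` and `decode_ascii`) maps a short string to its fixed-length 8-bit ASCII representation and recovers it exactly, i.e., a lossless encoding; practical source coders go further and also remove statistical redundancy.

```python
# Minimal sketch of lossless source encoding/decoding for a text source,
# using the fixed-length 8-bit ASCII representation mentioned above.
# (Illustrative only; real source coders also exploit the source statistics.)

def encode_ascii(text: str) -> str:
    """Map each character to its 8-bit ASCII code, returning a bit string."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))

def decode_ascii(bits: str) -> str:
    """Invert encode_ascii: group bits into bytes and map back to characters."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")

message = "HI"
bits = encode_ascii(message)
print(bits)                            # 0100100001001001
assert decode_ascii(bits) == message   # lossless: perfect reconstruction
```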

Modeling Communication Channels#

The transmission of information across a channel is analyzed through the lens of information theory, a field pioneered by Claude Shannon.

Channels, whether wired (e.g., copper lines, optical fibers) or wireless (e.g., radio links), introduce noise, distortion, or other impairments that degrade signal integrity.

Information theory provides mathematical models to describe these effects, enabling the determination of how much information can be transmitted accurately under specified conditions.

Key Parameters of Communication Channels#

Two critical parameters define the performance limits of communication channels:

  • Channel Capacity:
    This is the maximum rate at which information can be reliably transmitted, usually measured in bits per second (bps).

    It depends on factors such as bandwidth and noise level.

Shannon’s capacity formulas quantify this limit for specific channel models, and the noisy channel coding theorem guarantees that reliable transmission is possible at any rate below capacity (a worked example follows this list).

  • Channel Cutoff Rate:
    This parameter serves as a practical upper bound on the rate at which information can be transmitted with an acceptable error probability using a particular coding scheme.

    It is typically lower than the channel capacity and is used to benchmark achievable performance in real-world systems.
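
To make the capacity limit concrete, the sketch below evaluates the well-known Shannon capacity of a band-limited additive white Gaussian noise (AWGN) channel, \(C = W \log_2(1 + \mathrm{SNR})\) bits per second. The bandwidth and SNR values are assumptions chosen for illustration and do not come from the text above.

```python
import math

def awgn_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon capacity of an AWGN channel: C = W * log2(1 + SNR) in bits/s."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Illustrative (assumed) parameters: a 3 kHz telephone-like channel at 30 dB SNR.
W = 3000.0                      # bandwidth in Hz
snr_db = 30.0
snr = 10.0 ** (snr_db / 10.0)   # convert dB to a linear power ratio

C = awgn_capacity_bps(W, snr)
print(f"Capacity ≈ {C:.0f} bits per second")   # ≈ 29,902 bps
```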

Mathematical Models for Information Sources#

Random Nature of Information Sources#

Every information source generates an output that is inherently random, meaning its exact sequence cannot be predicted with certainty.

This randomness is described statistically, using probabilities to characterize the likelihood of different outcomes.

For example, a weather sensor might produce temperature readings that fluctuate unpredictably, while a text generator outputs words based on probabilistic rules.

The presence of randomness implies that the source conveys new information—if the output were fully predictable, there would be no need for transmission.

Necessity of Transmission#

If the output of a source were deterministic and known beforehand, the receiver could simply reproduce it locally without any actual communication.

The need for communication arises precisely because the source output is uncertain, requiring a system that can convey this unpredictability.

Discrete Source: Letters and Alphabet#

A discrete source emits a sequence of symbols, or letters, from a finite set known as the alphabet.

For example, an alphabet might be \(\{A, B, C\}\) for a basic text source or \(\{0, 1\}\) for a binary source. The output of this source is a string of these symbols generated over time.

A binary source might produce a sequence like \(100101110\ldots\), where each symbol (0 or 1) is chosen according to some probabilistic rule.

Example of a Binary Source
A binary source, as a concrete example of a discrete source, has an alphabet \(\{0, 1\}\). It generates a sequence such as \(100101110\ldots\), where each position is occupied by either a 0 or a 1.

The specific pattern in the sequence reflects the source’s statistical properties, such as the relative frequency of 0s and 1s.

Binary sources are fundamental in digital communications because any form of data—text, images, audio, or video—can ultimately be represented in binary form.

General Discrete Information Source#

In a more general setting, a discrete information source operates with an alphabet of size \(K\), denoted

\[ \{\,x_1, x_2, \ldots, x_K\}. \]

The source emits a sequence of letters, where each letter \(x_k\) (for \(1 \leq k \leq K\)) is selected from this alphabet.

For instance, a source with an alphabet \(\{A, B, C, D\}\) might produce a sequence like \(ABBDCA\ldots\).

The value of \(K\) specifies the size of the alphabet, while the probability distribution over these letters determines the sequence’s statistical properties.

Mathematical Model for a Discrete Source#

Probability Assignment to Alphabet Letters#

To build a mathematical model for a discrete source, we assign a probability

\[ P[x_k] = P(X = x_k) \]

to each letter \(x_k\) in the alphabet \(\{x_1, x_2, \ldots, x_K\}\).

This quantity represents the likelihood that the source emits the letter \(x_k\) at any given time.

These probabilities must satisfy the normalization condition

\[ \sum_{k=1}^{K} P[x_k] = 1, \]

ensuring the total probability across all possible letters equals \(1\).

For example, a binary source with \(P[0] = 0.6\) and \(P[1] = 0.4\) satisfies \(0.6 + 0.4 = 1\).
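
A minimal sketch of such a probability assignment, using the binary example above: the distribution is stored as a mapping from letters to probabilities, and the normalization condition is checked explicitly.

```python
# Probability assignment for a discrete source with alphabet {0, 1},
# using the example values P[0] = 0.6 and P[1] = 0.4 from the text.
P = {"0": 0.6, "1": 0.4}

# Normalization condition: the probabilities must sum to 1.
assert abs(sum(P.values()) - 1.0) < 1e-12
```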

Statistical Independence Assumption#

We assume that the letters emitted by the source are statistically independent of one another.

In other words, the occurrence of a letter at any position does not depend on the letters that came before or after it.

Mathematically, for a sequence \(X_1, X_2, X_3, \ldots\), the joint probability factors as:

\[ P(X_1 = x_1, X_2 = x_2, X_3 = x_3) \;=\; P[X_1 = x_1] P[X_2 = x_2] P[X_3 = x_3]. \]

This assumption greatly simplifies analysis and forms a key basis for many theoretical communication models.
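
Under this assumption, the probability of any finite sequence is just the product of the individual letter probabilities. A minimal sketch, reusing the binary distribution from the previous snippet (the helper name `sequence_probability` is illustrative):

```python
from functools import reduce
from operator import mul

def sequence_probability(seq: str, P: dict) -> float:
    """Joint probability of a letter sequence under the independence assumption:
    P(X1 = s1, ..., Xn = sn) = P[s1] * ... * P[sn]."""
    return reduce(mul, (P[s] for s in seq), 1.0)

P = {"0": 0.6, "1": 0.4}
print(sequence_probability("100", P))   # 0.4 * 0.6 * 0.6 ≈ 0.144
```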

Memoryless Source#

A source for which each output letter is statistically independent of all past and future outputs is called memoryless.

In such a source, the “history” of the sequence has no effect on the current output.

For instance, in a binary memoryless source with \(\{0, 1\}\), if the source has just emitted 101, the next letter (0 or 1) depends solely on \(P[0]\) and \(P[1]\), and not on the specific sequence 101.

Discrete Memoryless Source and Stationarity#

Discrete Memoryless Source (DMS)#

A discrete source that is memoryless is termed a discrete memoryless source (DMS).

Formally, a DMS is represented by a sequence of independent and identically distributed (i.i.d.) random variables \(\{X_i\}\), where each \(X_i\) takes values in the alphabet \(\{\,x_1, x_2, \ldots, x_K\}\) according to the same probability distribution \(P[x_k]\).

The i.i.d. property means:

  • Independence: Each letter is statistically independent of every other letter in the sequence.

  • Identical Distribution: Each letter follows the same probability distribution.
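
A DMS is straightforward to simulate: draw every letter independently from the same distribution. The sketch below assumes NumPy and an illustrative four-letter alphabet with assumed probabilities, and checks that the empirical letter frequencies approach the assigned distribution \(P[x_k]\).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

alphabet = np.array(["A", "B", "C", "D"])      # alphabet of size K = 4
probs = np.array([0.5, 0.25, 0.15, 0.10])      # assumed P[x_k], summing to 1

# i.i.d. draws: every letter uses the same distribution, independent of the past.
sequence = rng.choice(alphabet, size=100_000, p=probs)

# Empirical relative frequencies converge to P[x_k] as the sequence grows.
for letter, p in zip(alphabet, probs):
    freq = np.mean(sequence == letter)
    print(f"P[{letter}] = {p:.2f}, empirical frequency ≈ {freq:.3f}")
```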

Statistically Dependent Discrete Sources#

Not all discrete sources are memoryless.

For example, in English text, the letter “h” is more likely to follow “t” than “q.” In such cases, the output exhibits statistical dependence, and a different model is required.

Here, we use the concept of statistical stationarity: although letters may depend on their context, the overall statistical properties do not change over time.

Definition of Stationarity#

A discrete source is said to be stationary if the joint probabilities of any sequence of \(n\) letters remain the same when shifted in time.

Formally, letting \(a_i\) denote the letter emitted at time instant \(i\), stationarity requires that the sequences \(\{a_1, a_2, \ldots, a_n\}\) and \(\{a_{1+m}, a_{2+m}, \ldots, a_{n+m}\}\) have identical joint probabilities:

\[ P(a_1, a_2, \ldots, a_n) \;=\; P(a_{1+m}, a_{2+m}, \ldots, a_{n+m}), \]

for all \(n \ge 1\) and all time shifts \(m\).

This invariance means that the source’s statistical behavior does not evolve over time. Many natural and physical processes exhibit this property.

Interpretation of Stationarity
In practical terms, stationarity ensures that the probability of observing a particular sequence (e.g., “ABC”) at some initial time is the same as that of observing the same sequence starting at any other time.

This consistency allows for simpler, more robust modeling over long sequences, even if there are dependencies among the letters.
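
The sketch below illustrates this invariance for a simple source with memory: a two-letter first-order Markov source started in its stationary distribution (the transition probabilities are assumptions chosen for illustration). The probability of the pair \((0, 1)\) is estimated at the start of the sequence and again after a time shift of \(m\) positions; the two estimates agree up to simulation noise, as the definition requires.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Assumed first-order Markov source over {0, 1} with transition matrix T,
# where T[i, j] = P(next letter = j | current letter = i).
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Stationary distribution pi satisfies pi = pi @ T; for this T, pi = (0.75, 0.25).
pi = np.array([0.75, 0.25])

def run(length: int) -> np.ndarray:
    """One realization of the source, started in its stationary distribution."""
    seq = np.empty(length, dtype=int)
    seq[0] = rng.choice(2, p=pi)
    for t in range(1, length):
        seq[t] = rng.choice(2, p=T[seq[t - 1]])
    return seq

# Estimate P(a_1 = 0, a_2 = 1) and its shifted version P(a_{1+m} = 0, a_{2+m} = 1).
trials = 20_000
m = 5
hits_start, hits_shifted = 0, 0
for _ in range(trials):
    s = run(m + 2)
    hits_start += (s[0] == 0 and s[1] == 1)
    hits_shifted += (s[m] == 0 and s[m + 1] == 1)

print(hits_start / trials, hits_shifted / trials)   # both ≈ pi[0] * T[0, 1] = 0.075
```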

Mathematical Model for Analog Sources#

Analog Source Output#

An analog source generates a continuous-time output waveform \(x(t)\), which can be viewed as a sample function (or realization) of a stochastic process \(X(t)\).

A stochastic process is essentially a family of random variables indexed by time; the function \(x(t)\) is one particular outcome of this process.

For instance, \(X(t)\) could represent the voltage of a sensor over time, and \(x(t)\) would then be the specific voltage trace measured during a given experiment.

Stationary Stochastic Process#

We assume \(X(t)\) is a stationary stochastic process, meaning its statistical properties—such as mean and variance—remain the same over time.

Two key functions characterize a stationary process:

  • Autocorrelation Function, \(R_X(\tau)\):
    Measures the correlation between \(X(t)\) and \(X(t + \tau)\).

  • Power Spectral Density, \(S_X(f)\):
    Describes how the process’s power is distributed across different frequencies.

For a stationary process, these functions are time-invariant, depending only on the shift \(\tau\) (in the case of \(R_X\)) or frequency \(f\) (in the case of \(S_X\)), but not on the absolute time \(t\).
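
As a sketch of how these functions can be estimated from data, the snippet below (assuming NumPy and a synthetic stationary process built by low-pass filtering white Gaussian noise) estimates \(R_X(\tau)\) by averaging lagged products and \(S_X(f)\) with a simple periodogram; the sampling rate and filter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic stationary process: white Gaussian noise smoothed by a moving average.
fs = 1000.0                                            # sampling rate in Hz (assumed)
n = 2 ** 14
white = rng.standard_normal(n)
x = np.convolve(white, np.ones(8) / 8, mode="same")    # simple low-pass filtering

def autocorr(x: np.ndarray, lag: int) -> float:
    """Estimate R_X(tau) at an integer lag by averaging lagged products."""
    x = x - x.mean()
    return np.mean(x[: len(x) - lag] * x[lag:])

for lag in range(5):
    print(f"R_X({lag / fs:.3f} s) ≈ {autocorr(x, lag):.4f}")

# Periodogram estimate of the power spectral density S_X(f).
X = np.fft.rfft(x - x.mean())
psd = (np.abs(X) ** 2) / (fs * n)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
# For this low-pass process, the PSD estimate peaks near f = 0.
print("peak of the PSD estimate at", freqs[np.argmax(psd)], "Hz")
```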

Band-Limited Process and Sampling Theorem#

If \(X(t)\) is band-limited, its power spectral density is zero for \(\lvert f \rvert \ge W\), where \(W\) is the bandwidth of the process.

According to the sampling theorem, such a signal can be exactly represented by discrete samples taken at the Nyquist rate,

\[ f_s = 2W \quad \text{(samples per second)}. \]

The classical reconstruction formula states:

\[\boxed{ X(t) \;=\; \sum_{n=-\infty}^{\infty} X\!\bigl(\tfrac{n}{2W}\bigr) \,\text{sinc}\!\Bigl[\,2W\Bigl(t - \tfrac{n}{2W}\Bigr)\Bigr], } \]

where \(\text{sinc}(x) = \tfrac{\sin(\pi x)}{\pi x}\), and \(X\!\bigl(\tfrac{n}{2W}\bigr)\) are the samples of \(X(t)\) taken at times \(t = \tfrac{n}{2W}\).

This equation shows that if you sample at or above \(2W\), you can perfectly reconstruct the continuous signal.
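
A numerical sketch of this reconstruction, assuming NumPy: a band-limited test signal (two sinusoids below \(W\)) is sampled at the Nyquist rate \(2W\) and rebuilt with the interpolation formula above. Only a finite number of samples can be used, so the result is exact up to the truncation error of the infinite sum.

```python
import numpy as np

W = 100.0          # bandwidth in Hz (assumed); signal content lies below W
fs = 2 * W         # Nyquist rate: 2W samples per second
T = 1.0 / fs       # sampling interval, 1 / (2W)

def x(t):
    """Band-limited test signal: sinusoids at 30 Hz and 70 Hz (both < W)."""
    return np.sin(2 * np.pi * 30 * t) + 0.5 * np.cos(2 * np.pi * 70 * t)

# Samples X(n / 2W) for a finite window of indices n.
n = np.arange(-500, 501)
samples = x(n * T)

def reconstruct(t):
    """Apply the sinc interpolation formula at time(s) t (sum truncated to the window)."""
    # Each term: X(n/2W) * sinc(2W * (t - n/2W)); np.sinc(u) = sin(pi*u)/(pi*u).
    return np.sum(samples * np.sinc(2 * W * (np.atleast_1d(t)[:, None] - n * T)), axis=1)

t_test = np.linspace(-1.0, 1.0, 7)          # times well inside the sampled window
error = np.max(np.abs(reconstruct(t_test) - x(t_test)))
print(f"max reconstruction error ≈ {error:.2e}")  # small, limited by truncating the sum
```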

Conversion to Discrete-Time Source#

By applying the sampling theorem, an analog source’s output can be converted into an equivalent discrete-time source.

Specifically, the samples

\[ \bigl\{X\!\bigl(\tfrac{n}{2W}\bigr)\bigr\}_{n=-\infty}^{\infty} \]

form a sequence of random variables, which we can denote as

\[ X_n \;=\; X\!\Bigl(\tfrac{n}{2W}\Bigr)\quad \text{for } n\in \mathbb{Z}. \]

The statistical properties of this discrete-time source are then described by the joint probability density function (PDF)

\[ P\bigl(x_1, x_2, \ldots, x_m\bigr), \]

for any \(m \ge 1\). This PDF captures any dependencies (e.g., correlation) among the samples in the sequence.