Functions of a Random Variable#
Expectation#
Expectation Overview
The probability density function (PDF), \( f_X(x) \), provides a complete statistical description of a continuous random variable (RV), \( X \). While this level of detail is comprehensive, it often exceeds practical requirements. For most real-world scenarios, simpler measures, such as statistical averages, provide sufficient information for characterizing the behavior of \( X \). Among these, the focus is placed on the first-order average, known as the expected value or mean, due to its practical relevance. Higher-order averages, such as variance and covariance, are explored separately.
Definition of the Mean
The expected value or mean of a continuous random variable \( X \) is a measure of its central tendency. It is mathematically defined as:
\[ \mu_X = \mathbb{E}[X] = \int_{-\infty}^{\infty} x f_X(x) dx, \]
where:
\( \mu_X \) represents the mean.
\( \mathbb{E}[X] \) denotes the expectation or averaging operator.
\( f_X(x) \) is the PDF of \( X \).
The mean (or expected value) serves as a foundational concept in probability and statistics due to its wide application in summarizing data and modeling real-world phenomena.
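As a quick illustration, here is a minimal numerical sketch (assuming Python with NumPy and SciPy, and a hypothetical exponential PDF) that evaluates the defining integral and compares it with a simple sample average:

```python
import numpy as np
from scipy import integrate

# Hypothetical example: exponential PDF f_X(x) = lam * exp(-lam * x) for x >= 0.
lam = 2.0
f_X = lambda x: lam * np.exp(-lam * x)

# Mean via the defining integral mu_X = ∫ x f_X(x) dx (analytic value is 1/lam = 0.5).
mu_integral, _ = integrate.quad(lambda x: x * f_X(x), 0, np.inf)

# Mean via Monte Carlo averaging of samples drawn from the same distribution.
rng = np.random.default_rng(0)
mu_mc = rng.exponential(scale=1 / lam, size=100_000).mean()

print(mu_integral, mu_mc)   # both close to 0.5
```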
Notations for Expected Value:#
The terms average, mean, expectation, and first moment are synonymous and represent the same concept of expected value. These terms will be used interchangeably, as they all describe the measure of the central tendency of a random variable (RV).
Common Notations:
\( \mathbb{E}[X] \): The expectation operator applied to a random variable \( X \).
\( \mu_X \): The mean or expected value of \( X \).
\( \overline{X} \): Another notation indicating the mean or expected value.
Consistency Across Representations:
Each of these notations communicates the same concept but may be used in different contexts for clarity, convenience, or alignment with specific conventions.
Interpretation as a Function of the PDF#
The expectation operator \( \mathbb{E} \), when applied to a continuous random variable \( X \), yields a unique scalar value. This value is computed using the probability density function \( f_X(x) \) of \( X \), as shown by:
\[ \mathbb{E}[X] = \int_{-\infty}^{\infty} x f_X(x) dx. \]
This highlights that the expected value is inherently tied to the distribution of the random variable and summarizes its central location based on the probabilities of different outcomes.
Expectation for Discrete Random Variables#
For a discrete random variable (RV), the probability density function (PDF) can be expressed in terms of the probability mass function (PMF) using the delta function:
\[ f_X(x) = \sum_{k} p_X(x_k) \delta(x - x_k), \]
where:
\( p_X(x_k) \) represents the PMF, assigning probabilities to the discrete values \( x_k \).
\( \delta(x - x_k) \) is the delta function that isolates contributions at the specific points \( x_k \).
Definition of Expected Value for Discrete RVs
The expected value, or mean, of a discrete RV \( X \) is defined as:
\[ \mu_X = \mathbb{E}[X] = \sum_{k} x_k p_X(x_k), \]
where:
\( x_k \): Possible values of the RV \( X \).
\( p_X(x_k) \): The probability mass associated with \( x_k \).
We can interpret this definition as follows: the expected value of a discrete RV is a weighted average of its possible values, with the probabilities \( p_X(x_k) \) serving as the weights. Each \( x_k \) contributes to the expected value in proportion to how likely it is to occur.
Notes on the Existence of Expected Value
The expected value of a discrete random variable exists only if the sum \( \sum_{k} x_k p_X(x_k) \) converges absolutely. Whether it does depends on the specific values \( x_k \) and their associated probabilities \( p_X(x_k) \).
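For instance, here is a minimal sketch of the weighted-average interpretation above, assuming a hypothetical fair six-sided die:

```python
import numpy as np

# Hypothetical fair six-sided die: values x_k with equal PMF weights.
x_k = np.arange(1, 7)
p_k = np.full(6, 1 / 6)

# Expected value as the probability-weighted sum of the possible values.
mean = np.sum(x_k * p_k)
print(mean)   # 3.5
```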
Properties of the Expectation Operator#
Property 1: Linearity of Expectation#
The linearity property of the expectation operator states that the expected value of the sum of random variables (RVs) is equal to the sum of their individual expected values. Mathematically, this is expressed as:
\[ \mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]. \]
Extension to Multiple Random Variables#
This property can be generalized to the sum of multiple random variables using the principle of induction:
\[ \mathbb{E}\left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} \mathbb{E}[X_i]. \]
This means that the expectation operator distributes over addition, making it a highly convenient tool in statistical analysis. In simple terms:
The expectation of a sum of RVs is equal to the sum of their individual expectations.
This property is valid for both discrete and continuous random variables, regardless of whether the variables are dependent or independent. It facilitates a number of statistical and probabilistic analyses, such as finding the mean of a combined random process.
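A short numerical check of this property, using hypothetical dependent variables \( X \sim \mathrm{Uniform}(0,1) \) and \( Y = X^2 \), illustrates that linearity holds even without independence:

```python
import numpy as np

rng = np.random.default_rng(1)

# X uniform on [0, 1]; Y = X**2 is strongly dependent on X.
x = rng.uniform(0, 1, size=200_000)
y = x**2

# E[X + Y] matches E[X] + E[Y] even though X and Y are dependent.
print(np.mean(x + y))            # ~ 0.5 + 1/3
print(np.mean(x) + np.mean(y))   # same value
```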
Property 2: Statistical Independence#
The statistical independence of random variables (RVs) leads to a simplification of the expectation of their product. If \( X \) and \( Y \) are independent, the expectation of their product \( Z = XY \) is given by:
\[ \mathbb{E}[Z] = \mathbb{E}[XY] = \mathbb{E}[X] \mathbb{E}[Y]. \]
Generalization to Multiple Random Variables#
This property extends to the product of multiple independent random variables. By induction, for independent \( X_1, X_2, \dots, X_n \), we have:
\[ \mathbb{E}\left[ \prod_{i=1}^{n} X_i \right] = \prod_{i=1}^{n} \mathbb{E}[X_i]. \]
We can interpret this property as follows: the expectation of the product of independent RVs equals the product of their individual expectations.
This result is foundational in probability theory and has wide applications, particularly in communication systems and machine learning, where independence assumptions often simplify computations.
Independence Condition#
The property holds only when the random variables are statistically independent, meaning the joint probability distribution of \( X_1, X_2, \dots, X_n \) can be factored into the product of their individual distributions:
\[ f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n). \]
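The sketch below, assuming hypothetical independent Gaussian variables, illustrates the factorization and contrasts it with a dependent case where the factorization fails:

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent X ~ N(1, 1) and Y ~ N(2, 1): E[XY] factors as E[X] * E[Y].
x = rng.normal(1.0, 1.0, size=500_000)
y = rng.normal(2.0, 1.0, size=500_000)
print(np.mean(x * y))            # ~ 2.0
print(np.mean(x) * np.mean(y))   # ~ 2.0

# Counterexample: with Y = X the variables are dependent and the factorization fails.
print(np.mean(x * x), np.mean(x) ** 2)   # ~ 2.0 vs ~ 1.0
```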
Expected Values of Functions of Random Variables#
Definition
For a random variable (RV) \( X \), the expected value of a function \( g(X) \) is a generalization of the concept of expectation. It represents the average or mean value of the function \( g(X) \) weighted by the probability distribution of \( X \).
For Continuous RVs
If \( X \) is a continuous RV with probability density function (PDF) \( f_X(x) \), the expected value of \( g(X) \) is given by:
\[ \mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) dx. \]
For Discrete RVs
If \( X \) is a discrete RV with probability mass function (PMF) \( p_X(x_k) \), the expected value of \( g(X) \) becomes:
\[ \mathbb{E}[g(X)] = \sum_{k} g(x_k) p_X(x_k). \]
We can see that the expectation \( \mathbb{E}[g(X)] \) calculates the weighted average of the function \( g(X) \), where the weights correspond to the probabilities of \( X \).
This generalized expectation forms the basis for analyzing transformations of random variables and for deriving key statistical measures, such as variance, moments, and covariance, by appropriately choosing \( g(X) \).
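As an illustration, the following sketch (assuming the hypothetical choices \( X \sim \mathcal{N}(0,1) \) and \( g(x) = x^2 \)) evaluates \( \mathbb{E}[g(X)] \) both by integrating against the PDF and by direct sampling:

```python
import numpy as np
from scipy import integrate

# Hypothetical example: X ~ N(0, 1) and g(x) = x**2, so E[g(X)] should equal 1.
f_X = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
g = lambda x: x**2

# Weighted-average form: integrate g(x) against the PDF directly.
eg_integral, _ = integrate.quad(lambda x: g(x) * f_X(x), -np.inf, np.inf)

# Sampling form: average g over draws of X.
rng = np.random.default_rng(3)
eg_mc = np.mean(g(rng.normal(size=100_000)))

print(eg_integral, eg_mc)   # both close to 1.0
```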
Linearity of Expectation#
The expectation operator is inherently linear, meaning it satisfies specific properties when applied to linear combinations of random variables or functions.
Theorem: Expectation of a Linear Function
For any constants \( a \) and \( b \), the expected value of the linear transformation \( aX + b \) is:
\[ \mathbb{E}[aX + b] = a \mathbb{E}[X] + b, \]
where:
The constant \( a \) scales the expectation of \( X \).
The constant \( b \) adds a fixed value to the result, reflecting the shift in the distribution.
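One way to verify this result for the continuous case is to expand the defining integral and use the normalization of the PDF:
\[ \mathbb{E}[aX + b] = \int_{-\infty}^{\infty} (a x + b) f_X(x) dx = a \int_{-\infty}^{\infty} x f_X(x) dx + b \int_{-\infty}^{\infty} f_X(x) dx = a \mathbb{E}[X] + b. \]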
Expectation of a Sum of Functions
If a function \( g(x) \) can be expressed as a sum of \( N \) component functions, \( g(x) = g_1(x) + g_2(x) + \dots + g_N(x) \), then:
\[ \mathbb{E}[g(X)] = \mathbb{E}[g_1(X)] + \mathbb{E}[g_2(X)] + \dots + \mathbb{E}[g_N(X)]. \]
This property extends the linearity of expectation to sums of arbitrary functions of the random variable \( X \).
We can see that expectation is a linear operation, meaning it can be interchanged with addition and scalar multiplication. This simplifies computations and forms the foundation for many probabilistic and statistical analyses.
Moments#
Definition of Moments
The \( n \)-th moment of a random variable (RV) \( X \) provides a measure of the distribution’s shape by considering powers of \( X \).
Moments are mathematically defined as follows:
For Continuous Random Variables
\[ \mathbb{E}[X^n] = \int_{-\infty}^{\infty} x^n f_X(x) dx, \]
where \( f_X(x) \) is the probability density function (PDF) of \( X \).
For Discrete Random Variables
\[ \mathbb{E}[X^n] = \sum_{k} x_k^n p_X(x_k), \]
where \( p_X(x_k) \) is the probability mass function (PMF).
Zeroth Moment#
The zeroth moment represents the total area under the PDF, which must equal 1 for any valid probability distribution:
\[ \mathbb{E}[X^0] = \int_{-\infty}^{\infty} f_X(x) dx = 1. \]
Commonly Used Moments#
First Moment (Mean):
\[ \mathbb{E}[X] = \mu_X. \]
Represents the central tendency of the distribution.
For symmetric distributions, like noise centered around zero, the mean is zero, indicating no bias.
Second Moment (Mean Squared Value):
\[ \mathbb{E}[X^2]. \]
Measures the average squared value of \( X \).
For noise, this provides a measure of its strength or energy.
Example: Noise Distribution
If \( X \) represents a noise waveform:
A zero mean (\( \mathbb{E}[X] = 0 \)) indicates that the noise is symmetric and unbiased.
The second moment (\( \mathbb{E}[X^2] \)) quantifies the noise’s strength or power.
It is noted that moments are critical in characterizing distributions:
The first moment describes the location (mean).
The second moment relates to variability or spread (variance is derived from it).
Higher-order moments provide insights into skewness and kurtosis.
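Returning to the noise example, a minimal numerical sketch (assuming zero-mean Gaussian noise with standard deviation 2, a hypothetical choice) estimates the first and second moments from samples:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical zero-mean Gaussian noise samples with standard deviation 2.
noise = rng.normal(loc=0.0, scale=2.0, size=500_000)

first_moment = np.mean(noise)        # ~ 0: no DC bias
second_moment = np.mean(noise**2)    # ~ 4: average power of the noise

print(first_moment, second_moment)
```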
Central Moments#
Central moments help characterize the variability or randomness of a random variable (RV) more effectively, especially in cases where the RV \( Y \) combines a deterministic part \( a \) and a random part \( X \), such as:
\[ Y = a + X. \]
If the random part \( X \) is small compared to the deterministic part \( a \), the moments of \( Y \) are dominated by the fixed part (\( a \)). In such cases, central moments are used to focus on the random fluctuations by subtracting the mean.
Definition of Central Moments
The \( n \)-th central moment of a random variable \( X \) is defined as:
For Continuous RVs:
\[ \mathbb{E}[(X - \mu_X)^n] = \int_{-\infty}^{\infty} (x - \mu_X)^n f_X(x) dx, \]
where \( \mu_X \) is the mean (\( \mathbb{E}[X] \)) of \( X \).
For Discrete RVs:
\[ \mathbb{E}[(X - \mu_X)^n] = \sum_{k} (x_k - \mu_X)^n p_X(x_k), \]
where \( p_X(x_k) \) is the probability mass function (PMF) of \( X \).
This can be interpreted as:
Subtracting the Mean: By subtracting \( \mu_X \) (the mean), central moments remove the bias introduced by the location of the distribution. This ensures that the higher moments reflect variability around the mean.
Zeroth Central Moment: The zeroth central moment equals the total probability, which is always 1 for a valid probability distribution.
Higher-Order Central Moments:
The second central moment (\( \mathbb{E}[(X - \mu_X)^2] \)) is the variance, a measure of the spread or dispersion of the distribution.
Higher-order central moments provide insights into features like skewness (asymmetry) and kurtosis (peakedness).
Central moments are particularly useful when studying distributions where the mean does not adequately capture the randomness or variability. They are fundamental in noise analysis, signal processing, and statistical characterization of random processes.
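The following sketch, assuming a hypothetical offset \( a = 100 \) and a small zero-mean Gaussian fluctuation, shows how the raw second moment is dominated by the deterministic part while the second central moment isolates the random variation:

```python
import numpy as np

rng = np.random.default_rng(5)

# Y = a + X with a large deterministic offset a and a small random part X.
a = 100.0
x = rng.normal(0.0, 0.1, size=200_000)
y = a + x

# Raw second moment is dominated by the offset; the central moment isolates the fluctuation.
print(np.mean(y**2))                    # ~ 10000
print(np.mean((y - np.mean(y))**2))     # ~ 0.01, the variance of the random part
```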
Conditional Expected Values#
Definition
The conditional expected value of a random variable (RV) provides the average value of the RV under the condition that a specific event \( A \) has occurred. It adjusts the expectation by weighting the possible values of the RV based on the conditional probability distribution.
For Continuous Random Variables: If \( X \) is a continuous RV, the conditional expected value is:
\[ \mathbb{E}[X|A] = \int_{-\infty}^{\infty} x f_{X|A}(x) dx, \]
where \( f_{X|A}(x) \) is the conditional probability density function (PDF) of \( X \), given \( A \).
For Discrete Random Variables: If \( X \) is a discrete RV, the conditional expected value is:
\[ \mathbb{E}[X|A] = \sum_{k} x_k p_{X|A}(x_k), \]
where \( p_{X|A}(x_k) \) is the conditional probability mass function (PMF) of \( X \), given \( A \).
Conditional Expectation of a Function of a RV#
The conditional expectation extends naturally to functions of a random variable. For a function \( g(X) \):
Continuous RV:
\[ \mathbb{E}[g(X)|A] = \int_{-\infty}^{\infty} g(x) f_{X|A}(x) dx. \]
Discrete RV:
\[ \mathbb{E}[g(X)|A] = \sum_{k} g(x_k) p_{X|A}(x_k). \]
It is noted that the conditional expectation is computed similarly to regular expectation but uses the conditional PDF or PMF instead of the marginal distributions. It effectively recalculates the average value based on the knowledge that event \( A \) has occurred.
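A minimal sampling-based sketch, assuming \( X \sim \mathcal{N}(0,1) \) and the hypothetical conditioning event \( A = \{X > 0\} \), illustrates how the conditional mean differs from the unconditional mean of zero:

```python
import numpy as np

rng = np.random.default_rng(6)

# X ~ N(0, 1) conditioned on the event A = {X > 0}.
x = rng.normal(size=1_000_000)
x_given_A = x[x > 0]

# Conditional mean: average only over outcomes consistent with A.
print(np.mean(x_given_A))      # ~ sqrt(2/pi) ≈ 0.798
print(np.sqrt(2 / np.pi))
```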
Main applications are:
Probabilistic Modeling: Useful for updating predictions when new information (event \( A \)) is available.
Bayesian Inference: Forms the basis of posterior expectations.
Signal Processing: Helps in filtering and prediction of signals with known prior events.
Characteristic Functions#
The characteristic function of a random variable (RV) is a mathematical tool closely related to the Fourier transform of the probability density function (PDF). It provides a frequency-domain representation of the RV, offering an alternative perspective on its statistical properties.
Definition
The characteristic function of a random variable \( X \) is defined as:
\[ \Phi_X(\omega) = \mathbb{E}\left[ e^{j\omega X} \right] = \int_{-\infty}^{\infty} e^{j\omega x} f_X(x) dx, \]
where:
\( \omega \) is the frequency variable.
\( f_X(x) \) is the PDF of \( X \).
\( \mathbb{E}[\cdot] \) represents the expectation operator.
We can see that:
Connection to Fourier Transform:
The characteristic function \( \Phi_X(\omega) \) resembles the Fourier transform of the PDF \( f_X(x) \).
In electrical engineering literature, the Fourier transform of \( f_X(x) \) is typically expressed as \( \Phi_X(-\omega) \).
Inverse Relationship:
The PDF \( f_X(x) \) can be recovered from its characteristic function \( \Phi_X(\omega) \) using the inverse Fourier transform:
\[ f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-j\omega x} \Phi_X(\omega) d\omega. \]
Use Cases#
Analysis of RVs:
Characteristic functions are useful for analyzing the properties of random variables, such as moments and distributions.
Simplifying Operations:
Operations such as sums of independent random variables become straightforward in the frequency domain because the characteristic functions of the summed variables multiply.
Frequency-Domain Perspective:
While \( \omega \) does not represent a physical frequency, the frequency-domain representation provides insights into the distribution’s behavior, such as its spread and symmetry.
In short, characteristic functions offer a Fourier-based approach to analyzing random variables, providing a bridge between the time-domain representation (PDF) and the frequency domain. Through the inverse Fourier transform, all information about the random variable’s PDF is preserved and can be recovered.
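As a concrete check, the sketch below estimates \( \Phi_X(\omega) = \mathbb{E}[e^{j\omega X}] \) by sample averaging for a hypothetical standard Gaussian \( X \), whose characteristic function is known to be \( e^{-\omega^2/2} \):

```python
import numpy as np

rng = np.random.default_rng(7)

# Characteristic function Phi_X(w) = E[exp(j*w*X)] estimated from samples of X ~ N(0, 1).
x = rng.normal(size=200_000)
w = np.linspace(-3, 3, 7)
phi_est = np.array([np.mean(np.exp(1j * wi * x)) for wi in w])

# For a standard Gaussian the exact characteristic function is exp(-w**2 / 2).
phi_exact = np.exp(-w**2 / 2)
print(np.max(np.abs(phi_est - phi_exact)))   # small estimation error
```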
Probability Generating Functions#
Connection to Signal Analysis
In signal analysis, Fourier transforms are widely used for analyzing continuous-time signals, while the z-transform is the standard tool for discrete-time signals.
Similarly, in probability theory:
The characteristic function serves as a Fourier-like tool for continuous random variables (RVs).
The probability generating function (PGF) plays an analogous role for discrete RVs, offering a convenient way to work with their distributions.
Definition of PGF
The probability generating function (PGF) of a discrete random variable \( X \) with a probability mass function (PMF) \( p_X(k) \), defined for nonnegative integers \( k = 0, 1, 2, \dots \), is given by:
\[ G_X(z) = \mathbb{E}\left[ z^X \right] = \sum_{k=0}^{\infty} p_X(k) z^k. \]
We can see that this formulation closely resembles the unilateral z-transform (with \( z \) in place of \( z^{-1} \)), since both represent a discrete sequence, here the PMF values \( p_X(k) \), as a power series in \( z \).
Deriving the Mean Using the PGF#
The mean (expected value) of a discrete RV \( X \) can be obtained from the first derivative of its PGF, evaluated at \( z = 1 \):
\[ \mathbb{E}[X] = \left. \frac{d}{dz} G_X(z) \right|_{z=1}. \]
Higher-Order Derivatives and Factorial Moments#
The factorial moments of a discrete RV \( X \) are derived from the higher-order derivatives of the PGF, evaluated at \( z = 1 \):
\[ \left. \frac{d^k}{dz^k} G_X(z) \right|_{z=1} = \mathbb{E}[X(X-1)\cdots(X-k+1)]. \]
Proof. Differentiating the PGF \( k \) times gives:
\[ \frac{d^k}{dz^k} G_X(z) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1) p_X(n) z^{n-k}. \]
At \( z = 1 \), the result simplifies to:
\[ \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1) p_X(n), \]
which is equivalent to \(\mathbb{E}[X(X-1)\cdots(X-k+1)]\). \(\hspace{1em} \blacksquare\)
These factorial moments are useful for analyzing higher-order properties of the distribution, such as variability and dispersion.
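A small symbolic sketch (assuming SymPy and a hypothetical Poisson(\( \lambda \)) variable, whose PGF is \( e^{\lambda(z-1)} \)) recovers the mean and the second factorial moment by differentiating at \( z = 1 \):

```python
import sympy as sp

# Hypothetical Poisson(lam) example: its PGF is G(z) = exp(lam * (z - 1)).
z, lam = sp.symbols('z lam', positive=True)
G = sp.exp(lam * (z - 1))

mean = sp.diff(G, z).subs(z, 1)       # first derivative at z = 1 -> lam
fact2 = sp.diff(G, z, 2).subs(z, 1)   # E[X(X-1)] -> lam**2

print(sp.simplify(mean), sp.simplify(fact2))
```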
Applications
The PGF provides a tool to obtain key statistical measures, such as the mean and higher-order moments, in a compact and systematic form.
It also offers a convenient way to model discrete quantities that arise in practice, e.g., the number of successful transmissions in a wireless system.
Moment Generating Functions#
Laplace Transform and Moment Generating Functions
Many real-world random quantities are nonnegative, such as:
Frequency of a random signal.
Time intervals (e.g., between arrivals in a queue).
Nonnegative outcomes like scores in a game.
For these one-sided distributions, the Laplace transform is a standard tool in signal analysis, and the moment generating function (MGF) serves as its equivalent in probability theory.
Definition of MGF
The moment generating function of a nonnegative random variable \( X \) is defined as:
\[ M_X(u) = \mathbb{E}\left[ e^{uX} \right] = \int_{0}^{\infty} e^{ux} f_X(x) dx, \]
where:
\( M_X(u) \): The MGF.
\( f_X(x) \): The probability density function (PDF) of \( X \).
The MGF resembles the Laplace transform of the PDF, providing a frequency-domain representation for random variables.
Deriving the PDF from the MGF#
The PDF can, in principle, be recovered from the MGF using an operation analogous to an inverse Laplace transform:
\[ f_X(x) = \frac{1}{2\pi j} \int_{c - j\infty}^{c + j\infty} M_X(u) e^{-ux} du. \]
The integral is computed along the Bromwich contour, which must be to the left of all poles of the MGF due to the sign convention in the exponential term.
Moments from the MGF#
The moments of the random variable \( X \) are derived from the derivatives of the MGF, evaluated at \( u = 0 \):
\[ \mathbb{E}[X^n] = \left. \frac{d^n M_X(u)}{du^n} \right|_{u=0}. \]
This property gives the MGF its name, as it directly generates the moments of \( X \).
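For example, a short symbolic sketch (assuming SymPy and a hypothetical exponential(\( \lambda \)) variable, whose MGF is \( \lambda/(\lambda - u) \) for \( u < \lambda \)) recovers the first two moments by differentiating at \( u = 0 \):

```python
import sympy as sp

# Hypothetical exponential(lam) example; its MGF is M_X(u) = lam / (lam - u) for u < lam.
u, lam = sp.symbols('u lam', positive=True)
M = lam / (lam - u)

# Moments from the derivatives of the MGF evaluated at u = 0.
m1 = sp.diff(M, u).subs(u, 0)       # E[X]   = 1 / lam
m2 = sp.diff(M, u, 2).subs(u, 0)    # E[X^2] = 2 / lam**2
print(m1, m2)
```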
Applications
The MGF approach is a mathematical technique commonly used in communication theory to evaluate the average error probability in digital communication systems.