Tail Probability#

The tail probability of a random variable (RV) describes the likelihood that the RV takes on values far away from its mean, specifically in the “tails” of its distribution.

The Q-function is commonly used to express the tail probability of a Gaussian RV and is closely related to the cumulative distribution function (CDF).

Definition
The tail probability refers to the probability that \( X \) takes a value greater than (or less than) a certain threshold. Mathematically:

  • Upper tail probability: \( \Pr(X > a) \)

  • Lower tail probability: \( \Pr(X < b) \)

Tail Probability of Standard Normal RV#

For the standard normal distribution (\( \mu = 0, \sigma^2 = 1 \)):

\[ \Pr(X > a) = \int_a^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} dx \]
\[ \Pr(X < b) = \int_{-\infty}^b \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} dx \]

Recall that:

\[ Q(x) = \Pr(X > x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} dt \]

where \( x \) is the threshold.

Properties

  • Symmetry relationship:

    \[ Q(-x) = 1 - Q(x) \]
  • For large \( x \), \( Q(x) \) is well approximated by the leading term of its asymptotic expansion:

    \[ Q(x) \approx \frac{1}{\sqrt{2\pi}x} e^{-\frac{x^2}{2}}, \quad x \gg 1 \]

    The right-hand side is also an upper bound on \( Q(x) \) for every \( x > 0 \).
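The two properties above can be checked numerically. The following is a minimal sketch, assuming SciPy is available and using its survival function `norm.sf` as \( Q(x) \); the helper names `q_function` and `q_asymptotic` are illustrative and not part of the original notes.

```python
import numpy as np
from scipy.stats import norm

def q_function(x):
    """Q(x) = Pr(X > x) for a standard normal X, via SciPy's survival function."""
    return norm.sf(x)

def q_asymptotic(x):
    """Leading-order approximation exp(-x^2/2) / (sqrt(2*pi) * x), meant for x >> 1."""
    return np.exp(-x**2 / 2) / (np.sqrt(2 * np.pi) * x)

# Symmetry check: Q(-x) should equal 1 - Q(x)
x_check = 1.3
symmetry_gap = abs(q_function(-x_check) - (1 - q_function(x_check)))
print(f"|Q(-x) - (1 - Q(x))| = {symmetry_gap:.2e}")

# Relative error of the approximation at increasing thresholds
xs = np.array([1.0, 2.0, 3.0, 5.0])
rel_error = (q_asymptotic(xs) - q_function(xs)) / q_function(xs)
for x, err in zip(xs, rel_error):
    print(f"x = {x:.0f}: relative error = {err:.3f}")
```

The relative error is positive and shrinks as \( x \) grows, which is why the approximation is only recommended for \( x \gg 1 \).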

Examples

  • Find \( \Pr(X > 1) \) for a standard normal RV:

    \[ \Pr(X > 1) = Q(1) = \int_1^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} dt \approx 0.1587 \]
  • Approximate \( Q(3) \) using the large-\( x \) approximation:

    \[ Q(3) \approx \frac{1}{\sqrt{2\pi} \cdot 3} e^{-\frac{3^2}{2}} = \frac{1}{\sqrt{2\pi} \cdot 3} e^{-4.5} \approx 1.48 \times 10^{-3} \]

    For comparison, the exact value is \( Q(3) \approx 1.35 \times 10^{-3} \).
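Both examples can be evaluated numerically. The sketch below assumes SciPy and integrates the standard normal PDF with `scipy.integrate.quad`; the function name `std_normal_pdf` is introduced here for illustration only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def std_normal_pdf(t):
    """PDF of the standard normal distribution."""
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# Q(1): integrate the PDF from 1 to infinity
q1_numeric, _ = quad(std_normal_pdf, 1, np.inf)

# Q(3): large-x approximation versus the value from SciPy
q3_approx = np.exp(-4.5) / (np.sqrt(2 * np.pi) * 3)
q3_exact = norm.sf(3)

print(f"Q(1) by integration   = {q1_numeric:.6f}")
print(f"Q(3) by approximation = {q3_approx:.6e}")
print(f"Q(3) from norm.sf     = {q3_exact:.6e}")
```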

Significance Level#

The tail probability of an RV is closely tied to the significance level \( \alpha \) in hypothesis testing and statistical inference.

Significance Level \( \alpha \)#

In hypothesis testing:

  • \( \alpha \) is the probability of rejecting the null hypothesis (\( H_0 \)) when it is true (Type I error).

  • Typically, \( \alpha \) is chosen as a small value (e.g., \( 0.05 \) or \( 0.01 \)), representing the maximum acceptable probability of a Type I error.

The significance level determines the critical region or threshold beyond which the null hypothesis is rejected.

Tail Probability and \( \alpha \)
In a hypothesis test:

  • For a one-tailed test, \( \alpha \) corresponds directly to the tail probability.

  • For a two-tailed test, \( \alpha \) is split equally between the two tails of the distribution.

One-Tailed Test#

In a one-tailed test, we are interested in the probability of a value exceeding a certain critical value (\( z_\alpha \)) or falling below a threshold:

\[ \Pr(X > z_\alpha) = \alpha \quad \text{or} \quad \Pr(X < -z_\alpha) = \alpha \]

Here, \( z_\alpha \) is the critical value such that the tail probability beyond \( z_\alpha \) is exactly \( \alpha \).

Two-Tailed Test#

In a two-tailed test, the significance level \( \alpha \) is divided equally between the two tails:

\[ \Pr(X > z_{\alpha/2}) = \frac{\alpha}{2} \quad \text{and} \quad \Pr(X < -z_{\alpha/2}) = \frac{\alpha}{2} \]

The total tail probability is the sum of the probabilities in both tails:

\[ \Pr(X > z_{\alpha/2} \, \text{or} \, X < -z_{\alpha/2}) = \alpha \]

Critical Value#

Recall that, in a one-tailed test, the critical value \( z_\alpha \) is the value that satisfies:

\[ \Pr(X > z_\alpha) = \alpha. \]

In a two-tailed test, the critical value \( z_{\alpha/2} \) is the value that satisfies:

\[ \Pr(X > z_{\alpha/2}) = \frac{\alpha}{2} \quad \text{and} \quad \Pr(X < -z_{\alpha/2}) = \frac{\alpha}{2}. \]

Thus, the total probability in the tails for a two-tailed test is:

\[ \Pr(X > z_{\alpha/2} \, \text{or} \, X < -z_{\alpha/2}) = \alpha. \]

Critical Value for Standard Normal RV#

Considering the case of the standard normal RV, the critical value \( z_\alpha \) (or \( z_{\alpha/2} \)) is related to the Q-function for a standard normal distribution:

\[ \Pr(X > z_\alpha) = Q(z_\alpha) = \alpha \]

To find \( z_\alpha \), we use the inverse of the Q-function, that is, the inverse of the complementary cumulative distribution function (CCDF):

\[ z_\alpha = Q^{-1}(\alpha) \]

For a two-tailed test:

\[ z_{\alpha/2} = Q^{-1}(\alpha/2) \]

Example of A Specific \( \alpha \)#

Suppose \( \alpha = 0.05 \):

  • For a one-tailed test:

    \[ \Pr(X > z_\alpha) = \alpha = 0.05 \]

    Using a standard normal table or calculator, \( z_\alpha = 1.645 \).

  • For a two-tailed test:

    \[ \Pr(X > z_{\alpha/2}) = \Pr(X < -z_{\alpha/2}) = \frac{\alpha}{2} = 0.025 \]

    From the standard normal table, \( z_{\alpha/2} = 1.96 \).
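Instead of a table lookup, the inverse Q-function can be evaluated directly. This is a minimal sketch assuming SciPy, whose `norm.isf` (inverse survival function) plays the role of \( Q^{-1} \) for the standard normal distribution; the variable names are illustrative.

```python
from scipy.stats import norm

alpha = 0.05

# Q^{-1}(alpha): critical value for a one-tailed test
z_one_tailed = norm.isf(alpha)

# Q^{-1}(alpha / 2): critical value for a two-tailed test
z_two_tailed = norm.isf(alpha / 2)

print(f"one-tailed: {z_one_tailed:.3f}, two-tailed: {z_two_tailed:.3f}")
```

`norm.isf(alpha)` is equivalent to `norm.ppf(1 - alpha)`, but it avoids forming \( 1 - \alpha \) explicitly, which helps when \( \alpha \) is very small.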

Interpretation in Hypothesis Testing#

In hypothesis testing, the tail probability quantifies the extremity of the observed test statistic under the null hypothesis. If the observed test statistic falls in the tail region defined by \( \alpha \), the null hypothesis is rejected.

For example:

  • If the test statistic \( T \) is compared to \( z_\alpha \) in a one-tailed test:

    \[ \text{Reject } H_0 \text{ if } T > z_\alpha \]
  • In a two-tailed test:

    \[ \text{Reject } H_0 \text{ if } |T| > z_{\alpha/2} \]

Thus, the tail probability provides the basis for determining statistical significance.
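The two rejection rules can be written as a small helper. This is only a sketch: it assumes the test statistic \( T \) is standard normal under \( H_0 \) and that the one-tailed alternative is in the upper tail; the function name `reject_null` and the sample statistic values are hypothetical.

```python
from scipy.stats import norm

def reject_null(test_statistic, alpha=0.05, two_tailed=False):
    """Return True if H0 is rejected at level alpha (T assumed N(0, 1) under H0)."""
    if two_tailed:
        # Reject if |T| exceeds z_{alpha/2}
        return bool(abs(test_statistic) > norm.isf(alpha / 2))
    # Upper one-tailed test: reject if T exceeds z_alpha
    return bool(test_statistic > norm.isf(alpha))

# Hypothetical observed statistics
print(reject_null(1.8))                    # one-tailed, compared with z_alpha
print(reject_null(1.8, two_tailed=True))   # two-tailed, compared with z_{alpha/2}
```

The same statistic can lead to different decisions in the two tests, because the two-tailed threshold \( z_{\alpha/2} \) is larger than \( z_\alpha \).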

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Given significance level alpha
alpha = 0.05

# Calculate z_alpha for one-tailed test
# z_alpha = Q^{-1}(alpha)
z_alpha_one_tail = norm.ppf(1 - alpha)

# Calculate z_alpha for two-tailed test
z_alpha_two_tail = norm.ppf(1 - alpha / 2)

# Define the range for plotting and the standard normal PDF
x = np.linspace(-4, 4, 1000)  # Range of z values
pdf = norm.pdf(x)  # Standard normal PDF

# Plot
plt.figure(figsize=(6, 4))

# Standard normal distribution curve
plt.plot(x, pdf, label="Standard Normal Distribution", color="blue")

# Highlight one-tailed region (z_alpha = 1.645)
plt.fill_between(x, 0, pdf, where=(x >= z_alpha_one_tail), color="orange", alpha=0.5, label=f"One-tailed region (z={z_alpha_one_tail:.3f})")

# Highlight two-tailed regions (z_alpha = 1.96)
plt.fill_between(x, 0, pdf, where=(x >= z_alpha_two_tail), color="green", alpha=0.5, label=f"Right tail (z={z_alpha_two_tail:.3f})")
plt.fill_between(x, 0, pdf, where=(x <= -z_alpha_two_tail), color="green", alpha=0.5, label=f"Left tail (z=-{z_alpha_two_tail:.3f})")

# Add vertical lines for critical z-values
plt.axvline(z_alpha_one_tail, color="orange", linestyle="--", label=f"z={z_alpha_one_tail:.3f} (one-tailed)")
plt.axvline(z_alpha_two_tail, color="green", linestyle="--", label=f"z={z_alpha_two_tail:.3f} (two-tailed)")
plt.axvline(-z_alpha_two_tail, color="green", linestyle="--")

# Labels, legend, and grid
plt.title("Tail Probability for Numerically Derived z_alpha Values", fontsize=14)
plt.xlabel("z", fontsize=12)
plt.ylabel("Probability Density", fontsize=12)
plt.legend(fontsize=10)
plt.grid(alpha=0.3)

# Show plot
plt.show()
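[Figure: standard normal PDF with the one-tailed (\( z_\alpha \approx 1.645 \)) and two-tailed (\( \pm z_{\alpha/2} \approx \pm 1.960 \)) critical regions shaded for \( \alpha = 0.05 \).]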

Evaluating Tail Probabilities#

As mentioned earlier, it is essential to determine the probability that a random variable (RV) exceeds a specified threshold, expressed as \( \Pr(X > x_0) \). Alternatively, one might be interested in the probability of the RV deviating significantly from its mean, \( \Pr(|X - \mu_X| > x_0) \). These probabilities are collectively referred to as tail probabilities, as they measure the likelihood of outcomes occurring in the tails of the distribution.

Challenges in Calculating Tail Probabilities#

While tail probabilities can often be derived directly from the cumulative distribution function (CDF) of the RV, practical challenges may arise:

  • The CDF may not be available in a closed form or may be computationally intensive to evaluate.

  • In such cases, numerical integration of the probability density function (PDF) can be employed. However:

    • Numerical integration over a semi-infinite region can be computationally demanding or unstable.

    • The PDF itself might not be explicitly known; the RV could instead be described in an alternative form.
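A related numerical pitfall, offered here as an illustration rather than as part of the list above, is that even when the CDF is available, forming \( 1 - F_X(x_0) \) in floating point can lose the entire tail. The sketch assumes SciPy and compares that subtraction with the survival function `norm.sf`, which computes the tail directly.

```python
from scipy.stats import norm

x0 = 10.0

# The CDF is so close to 1 that it rounds to exactly 1.0 in double precision
naive_tail = 1.0 - norm.cdf(x0)

# The survival function evaluates Pr(X > x0) without the subtraction
stable_tail = norm.sf(x0)

print(f"1 - cdf(x0) = {naive_tail:.3e}")
print(f"sf(x0)      = {stable_tail:.3e}")
```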

Inequalities for Bounding Tail Probabilities#

When exact calculation is infeasible, inequalities provide valuable tools for bounding tail probabilities. Two widely used results are:

Theorem: Markov’s Inequality#

If \( X \) is a nonnegative random variable (i.e., \( X \geq 0 \)), the probability that \( X \) exceeds a threshold \( x_0 > 0 \) is bounded by:

\[ \Pr(X \geq x_0) \leq \frac{\mathbb{E}[X]}{x_0}. \]
  • Application: Markov’s inequality is useful when only the expected value \( \mathbb{E}[X] \) of the RV is known.

Example: Continued Example of Standard Normal RV#

We compute \( \frac{\mathbb{E}[X]}{z_\alpha} \) as follows:

  • \( \mathbb{E}[X] \): The expected value of \( X \). For a standard normal random variable \( X \), \( \mathbb{E}[X] = 0 \), since the mean of a standard normal distribution is 0.

  • \( z_\alpha = 1.645 \): The critical value corresponding to \( \alpha = 0.05 \) for a one-tailed test in the standard normal distribution.

Now compute:

\[ \frac{\mathbb{E}[X]}{z_\alpha} = \frac{0}{1.645} = 0. \]

If we plug this into Markov’s inequality, we have:

\[ \Pr(X \geq x_0) \leq \frac{\mathbb{E}[X]}{x_0} = \frac{0}{x_0} = 0, \]

which is clearly nonsensical since \( \Pr(X > 1.645) = 0.05 > 0 \).

The zero bound arises because \( \mathbb{E}[X] = 0 \) for a distribution centered at zero; the underlying problem is that the hypotheses of the inequality are not met.

Why Markov Inequality Doesn’t Apply in This Example

Markov’s inequality applies only to non-negative random variables. A standard normal random variable \( X \) is not non-negative, as it can take negative values. Therefore, the inequality is not valid for \( X \sim \mathcal{N}(0, 1) \).
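One way to still obtain a valid Markov bound in this setting, added here as an illustration and not taken from the example above, is to apply the inequality to the non-negative variable \( |X| \). For a standard normal RV, \( \mathbb{E}[|X|] = \sqrt{2/\pi} \), so the bound applies to the two-sided tail \( \Pr(|X| \geq x_0) \). The sketch assumes SciPy; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

x0 = norm.isf(0.05)              # z_alpha for alpha = 0.05

# E[|X|] for a standard normal RV (half-normal mean)
mean_abs = np.sqrt(2 / np.pi)

# Markov's inequality applied to |X| >= 0
markov_bound = mean_abs / x0

# Exact two-sided tail: Pr(|X| >= x0) = 2 * Q(x0)
exact_two_sided = 2 * norm.sf(x0)

print(f"exact = {exact_two_sided:.4f}, Markov bound = {markov_bound:.4f}")
```

The bound now holds, although it is loose, and it concerns the two-sided tail rather than \( \Pr(X \geq x_0) \).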

Example: Markov’s Inequality for Exponential RV#

An exponential random variable (RV) is non-negative: it takes values in \( [0, \infty) \). Its probability density function (PDF) is given by:

\[\begin{split} f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0, \\ 0, & x < 0, \end{cases} \end{split}\]

where:

  • \( \lambda > 0 \) is the rate parameter.

The cumulative distribution function (CDF) of the exponential RV is:

\[\begin{split} F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & x \geq 0, \\ 0, & x < 0. \end{cases} \end{split}\]

The CDF \( F_X(x) \) gives the probability that the random variable \( X \) is less than or equal to \( x \), that is:

\[ F_X(x) = \Pr(X \leq x). \]

Because the PDF is zero for \( x < 0 \), the exponential random variable cannot take negative values.

Thus, the exponential RV satisfies the non-negativity condition required for applying inequalities such as Markov’s inequality.

Compute the critical value for an exponential RV

Let the significance level be \( \alpha = 0.05 \).

For \( z \geq 0 \), the CDF of an exponential random variable \( X \) gives \( \Pr(X \leq z) \):

\[ F_X(z) = 1 - e^{-\lambda z}, \quad z \geq 0. \]

For a one-tailed test, the critical value \( z_\alpha \) satisfies:

\[ \Pr(X > z_\alpha) = \alpha. \]

The probability \( \Pr(X > z_\alpha) \) can be written as:

\[ \Pr(X > z_\alpha) = 1 - F_X(z_\alpha) = e^{-\lambda z_\alpha}. \]

Set this equal to \( \alpha \):

\[ e^{-\lambda z_\alpha} = \alpha. \]

Taking the natural logarithm on both sides:

\[ -\lambda z_\alpha = \ln(\alpha). \]

Rearrange to solve for \( z_\alpha \):

\[ z_\alpha = -\frac{\ln(\alpha)}{\lambda}. \]

Given \( \alpha = 0.05 \) and assuming the rate parameter \( \lambda > 0 \), the critical value is:

\[ z_\alpha = -\frac{\ln(0.05)}{\lambda}. \]

If \( \lambda = 1 \):

\[ z_\alpha = -\ln(0.05) \approx 2.9957. \]

Calculate \( \frac{\mathbb{E}[X]}{z_\alpha} \)

Recall that the critical value \( z_\alpha \) for a one-tailed test at significance level \( \alpha \) is:

\[ z_\alpha = -\frac{\ln(\alpha)}{\lambda}. \]

Substitute \( \mathbb{E}[X] = \frac{1}{\lambda} \) and \( z_\alpha = -\frac{\ln(\alpha)}{\lambda} \) into the ratio:

\[ \frac{\mathbb{E}[X]}{z_\alpha} = \frac{\frac{1}{\lambda}}{-\frac{\ln(\alpha)}{\lambda}} = -\frac{1}{\ln(\alpha)}. \]

For \( \alpha = 0.05 \):

\[ \ln(0.05) \approx -2.9957. \]

Thus:

\[ \frac{\mathbb{E}[X]}{z_\alpha} = -\frac{1}{-2.9957} \approx 0.334. \]

Comment on Markov’s Inequality for Exponential RV

Using the exact tail probability for \( \alpha = 0.05 \) and \( \lambda = 1 \):

\[ z_\alpha = 2.9957, \quad \Pr(X \geq z_\alpha) = e^{-z_\alpha} = e^{-2.9957} \approx 0.05. \]

This is expected, because \( z_\alpha \) was defined so that the tail probability equals \( \alpha = 0.05 \).

From Markov’s inequality:

\[ \Pr(X \geq z_\alpha) \leq \frac{\mathbb{E}[X]}{z_\alpha} \approx 0.334. \]

Clearly:

\[ \Pr(X \geq z_\alpha) = 0.05 \leq 0.334. \]

Markov’s inequality holds for exponential random variables and provides a valid upper bound for tail probabilities. However:

  • The bound is often loose, as seen here (\( 0.334 \) is much larger than the true tail probability \( 0.05 \)).

  • The inequality is more useful for providing rough estimates or when only the mean of the distribution is known.

import numpy as np

# Parameters
alpha = 0.05
lambda_param = 1

# Number of simulations
num_samples = 10**6

# Generate exponential random samples
samples = np.random.exponential(1/lambda_param, num_samples)

# Numerically compute z_alpha (critical value)
z_alpha_simulated = np.percentile(samples, 100 * (1 - alpha))

# Compute E[X]
expected_value = np.mean(samples)

# Compute the ratio E[X] / z_alpha
ratio = expected_value / z_alpha_simulated

# Empirical tail probability at z_alpha (fraction of samples at or above it)
tail_probability = np.mean(samples >= z_alpha_simulated)

# Markov bound at z_alpha
markov_bound = expected_value / z_alpha_simulated

# Results
z_alpha_simulated, ratio, tail_probability, markov_bound
(2.994081024033943, 0.33372327249576017, 0.05, 0.33372327249576017)

Theorem: Chebyshev’s Inequality#

If \( X \) is a random variable with mean \( \mu_X \) and variance \( \sigma_X^2 \), the probability that \( X \) deviates from its mean by more than \( x_0 > 0 \) satisfies:

\[ \Pr(|X - \mu_X| \geq x_0) \leq \frac{\sigma_X^2}{x_0^2}. \]
  • Application: Chebyshev’s inequality provides a bound on tail probabilities based on the mean and variance of the RV. It is particularly useful for distributions with unknown shapes.

When direct computation is impractical, inequalities such as Markov’s and Chebyshev’s provide effective upper bounds, enabling insights into the behavior of the RV based on limited statistical information.
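As a concrete check of Chebyshev’s inequality, the sketch below reuses the exponential RV with \( \lambda = 1 \), for which \( \mu_X = 1 \) and \( \sigma_X^2 = 1 \). It assumes SciPy; the threshold \( x_0 = 2 \) is chosen here purely for illustration.

```python
from scipy.stats import expon

lambda_param = 1.0
dist = expon(scale=1 / lambda_param)   # exponential RV with rate lambda_param

mu, var = dist.mean(), dist.var()
x0 = 2.0                               # illustrative deviation threshold

# Exact probability of deviating from the mean by at least x0 (both sides)
exact = dist.sf(mu + x0) + dist.cdf(mu - x0)

# Chebyshev bound sigma^2 / x0^2
chebyshev_bound = var / x0**2

print(f"exact = {exact:.4f}, Chebyshev bound = {chebyshev_bound:.4f}")
```

As with Markov’s inequality, the bound is valid but loose, which is the price of using only the mean and variance.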

Chernoff Bound#

The Chernoff bound provides a powerful tool for bounding the tail probabilities of a random variable (RV) \( X \) using its moment generating function (MGF).

Theorem: Chernoff Bound#

If \( X \) is a random variable with moment generating function \( M_X(s) = \mathbb{E}[e^{sX}] \), then:

\[ \Pr(X \geq x_0) \leq \min_{s \geq 0} e^{-sx_0} M_X(s). \]

Key Points:

  • This inequality gives an upper bound on the probability that \( X \) exceeds a threshold \( x_0 \).

  • The bound is derived using the properties of the MGF, without requiring the explicit probability density function (PDF) of \( X \).

  • The minimization over \( s \geq 0 \) selects the tightest bound within this family of exponential bounds.
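The minimization in the Chernoff bound can be carried out numerically. The sketch below does so for a standard normal RV, whose MGF is \( M_X(s) = e^{s^2/2} \), so the result can be compared with the closed-form minimum \( e^{-x_0^2/2} \) attained at \( s = x_0 \). It assumes SciPy; the helper `chernoff_bound` and the search limit `s_max` are illustrative choices, and the search only covers \( 0 \leq s \leq s_{\max} \).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def chernoff_bound(x0, log_mgf, s_max=50.0):
    """Minimize exp(-s*x0) * M_X(s) over 0 <= s <= s_max via its logarithm."""
    def objective(s):
        return -s * x0 + log_mgf(s)
    result = minimize_scalar(objective, bounds=(0.0, s_max), method="bounded")
    return float(np.exp(result.fun)), float(result.x)

def log_mgf_normal(s):
    """Log-MGF of a standard normal RV: log M_X(s) = s^2 / 2."""
    return s**2 / 2

x0 = norm.isf(0.05)                      # threshold with exact tail 0.05
bound, s_star = chernoff_bound(x0, log_mgf_normal)

print(f"optimal s = {s_star:.4f}, Chernoff bound = {bound:.4f}")
print(f"exact tail = {norm.sf(x0):.4f}")
```

Working with the logarithm of the objective keeps the optimization well scaled when \( e^{-s x_0} M_X(s) \) spans many orders of magnitude.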

Precise Tail Probability Expression#

If an exact computation of the tail probability \( \Pr(X \geq x_0) \) is required, it can be obtained directly from the MGF using the following integral representation:

Theorem: Exact Tail Probability from MGF#

For a random variable \( X \) with moment generating function \( M_X(u) \), the exact tail probability \( \Pr(X \geq x_0) \) is given by:

\[ \Pr(X \geq x_0) = \frac{1}{2\pi j} \int_{c-j\infty}^{c+j\infty} \frac{M_X(u)}{u} e^{-ux_0} \, du, \]

where:

  • \( c \) is chosen such that the contour of integration lies in the region of convergence of the MGF:

    • To the right of the origin (\( \Re(u) > 0 \)).

    • To the left of all singularities of \( M_X(u) \) in the right half-plane.
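The contour integral can be evaluated numerically by parameterizing the vertical line as \( u = c + j\omega \). The sketch below does this for a standard normal RV, whose MGF \( e^{u^2/2} \) has no singularities, so any \( c > 0 \) is admissible. It assumes SciPy; the helper `tail_from_mgf` and the choice \( c = x_0 \) are illustrative and not prescribed by the theorem.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def tail_from_mgf(x0, mgf, c):
    """Evaluate (1 / (2*pi*j)) * integral of M(u)/u * exp(-u*x0) along Re(u) = c."""
    def integrand(omega):
        u = complex(c, omega)
        # With u = c + j*omega, du = j*d(omega); the imaginary part of the
        # integrand is odd in omega and integrates to zero, so keep the real part
        return (mgf(u) * np.exp(-u * x0) / u).real
    value, _ = quad(integrand, -np.inf, np.inf)
    return value / (2 * np.pi)

def mgf_normal(u):
    """MGF of a standard normal RV, valid for complex u."""
    return np.exp(u**2 / 2)

x0 = 1.645
tail = tail_from_mgf(x0, mgf_normal, c=x0)

print(f"contour integral = {tail:.6f}")
print(f"norm.sf(x0)      = {norm.sf(x0):.6f}")
```

For an MGF with singularities in the right half-plane, such as the exponential RV's \( \lambda / (\lambda - u) \), \( c \) would have to be restricted to \( 0 < c < \lambda \).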

Comparison and Applications

  1. Chernoff Bound:

    • Provides a computationally simpler upper bound for tail probabilities.

    • Useful when the MGF is known and minimization over \( s \geq 0 \) is computationally feasible.

    • Less precise but faster to compute.

  2. Exact Tail Probability:

    • Offers a precise value for \( \Pr(X \geq x_0) \).

    • Requires evaluation of a complex integral along a vertical contour in the complex plane, which may be computationally intensive.