Estimators Based on Sufficient Statistics#

Statistic#

Let $T(\mathbf{y})$ be a statistic, which is a function of the observed random variables $y_1, y_2, \ldots, y_m$ generated by a probability density function (PDF) $p(y_1, y_2, \ldots, y_m; \alpha)$, where $\alpha$ is an unknown parameter.

Note that

  • When random samples depend on $\alpha$, the statistic derived from these samples is used to infer $\alpha$.

  • A useful statistic should summarize all the information from the measurements efficiently, often by reducing multiple random variables to a single manageable form.

  • Such a statistic can serve as a good estimator of $\alpha$.

For instance:

  • The estimator $\hat{\alpha} = y_1$ does not capture all the information about $\alpha$.

  • However, the sample mean-based estimator $\hat{\alpha} = T(\mathbf{y}) = \bar{y}$, derived from independent, identically distributed normal random variables, retains all relevant information about $\alpha$ and is considered a sufficient statistic.

  • This implies that $T(\mathbf{y}) = \bar{y}$ encapsulates all the information about $\alpha$ available in the sample (a simulation sketch contrasting the two estimators follows this list).
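
To see the contrast concretely, here is a minimal Monte Carlo sketch (the parameter values are arbitrary, chosen only for illustration; NumPy is assumed): both estimators are unbiased for the mean, but $\hat{\alpha} = y_1$ is far noisier than $\hat{\alpha} = \bar{y}$ because it discards most of the sample.

```python
import numpy as np

# Compare alpha_hat = y_1 (first observation only) with alpha_hat = y_bar (sample mean)
# for i.i.d. N(alpha, sigma^2) data. Values below are illustrative, not from the text.
rng = np.random.default_rng(0)
alpha_true, sigma, m, n_trials = 2.0, 1.0, 20, 100_000

samples = rng.normal(alpha_true, sigma, size=(n_trials, m))
est_first = samples[:, 0]          # uses a single observation
est_mean = samples.mean(axis=1)    # uses all observations

print("mean of y_1 estimator  :", est_first.mean())   # ~ alpha_true
print("mean of y_bar estimator:", est_mean.mean())    # ~ alpha_true
print("var  of y_1 estimator  :", est_first.var())    # ~ sigma^2
print("var  of y_bar estimator:", est_mean.var())     # ~ sigma^2 / m
```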

Definition of Sufficient Statistic#

In the context of statistical estimation, a sufficient statistic is a function of the data that captures all the information necessary to estimate a parameter.

Mathematically speaking, a statistic $T(\mathbf{y})$ is considered sufficient for $\alpha$ if the conditional pdf of the data given $T(\mathbf{y})$, i.e., $p(y_1, y_2, \ldots, y_m \mid T(\mathbf{y}))$, does not depend on $\alpha$.

Formally, if you have a set of observations $\mathbf{y} = (y_1, y_2, \ldots, y_m)$ and a parameter of interest $\alpha$, then a statistic $T(\mathbf{y})$ is said to be sufficient for $\alpha$ if the conditional distribution of the data $\mathbf{y}$, given the statistic $T(\mathbf{y})$ and the parameter $\alpha$, is independent of $\alpha$, i.e.:

$$P(\mathbf{y} \mid T(\mathbf{y}), \alpha) = P(\mathbf{y} \mid T(\mathbf{y}))$$

This means that once you know $T(\mathbf{y})$, the original data $\mathbf{y}$ provides no additional information about the parameter $\alpha$. In other words, $T(\mathbf{y})$ captures all the information that $\mathbf{y}$ contains about $\alpha$.
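
For the i.i.d. normal example used throughout this section, this property can be checked by simulation: given $\bar{y}$, the remaining variability in the sample lives in the residuals $y_i - \bar{y}$, and their distribution is the same no matter which $\mu$ generated the data. The sketch below is a rough numerical illustration (parameter values are assumptions made only for the demo):

```python
import numpy as np

# Rough illustration that p(y | T(y) = y_bar) does not depend on mu for i.i.d. N(mu, sigma^2):
# given y_bar, the sample is determined by the residuals y_i - y_bar, and their
# distribution is identical whether the data were generated with mu = 0 or mu = 5.
rng = np.random.default_rng(1)
sigma, m, n_trials = 1.0, 5, 200_000

def residuals(mu):
    y = rng.normal(mu, sigma, size=(n_trials, m))
    return (y - y.mean(axis=1, keepdims=True)).ravel()

r0, r5 = residuals(mu=0.0), residuals(mu=5.0)
print("std of residuals (mu=0 vs mu=5):", r0.std(), r5.std())
print("95th percentile  (mu=0 vs mu=5):", np.percentile(r0, 95), np.percentile(r5, 95))
```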

Determine If a Statistic is Sufficient#

One method to determine whether a statistic $T(\mathbf{y})$ is sufficient for a parameter $\alpha$ is through the Fisher factorization theorem.

According to the theorem, if the probability density function (pdf) $p(y_1, y_2, \ldots, y_m; \alpha)$ can be factored into two parts:

  • $g(T(\mathbf{y}), \alpha)$: a function that depends on the statistic $T(\mathbf{y})$ and the parameter $\alpha$.

  • $h(y_1, y_2, \ldots, y_m)$: a function that depends only on the observed data and not on the parameter $\alpha$.

then $T(\mathbf{y})$ is a sufficient statistic for $\alpha$.

Mathematically, the factorization can be expressed as:

$$p(y_1, \ldots, y_m; \alpha) = g\big(T(y_1, \ldots, y_m), \alpha\big)\, h(y_1, \ldots, y_m)$$

Here, $g(\cdot)$ encapsulates the dependency on the statistic and the parameter, while $h(\cdot)$ isolates the data-only dependency, confirming the sufficiency of $T(\mathbf{y})$.

Therefore, a statistic (or estimate) $T(\mathbf{y})$ is considered sufficient for a parameter $\alpha$ if it satisfies the Fisher factorization theorem.

The converse is also true: if $T(\mathbf{y})$ is a sufficient statistic for the parameter $\alpha$, then the probability density function (pdf) $p(\mathbf{y}; \alpha)$ can be factored according to the Fisher factorization theorem.

This means that if $T(\mathbf{y})$ is sufficient, the pdf can be expressed as:

$$p(\mathbf{y}; \alpha) = g(T(\mathbf{y}), \alpha)\, h(\mathbf{y})$$

In general, $T(\mathbf{y})$ is a sufficient statistic for $\alpha$ if and only if the probability distribution (or likelihood) $p(\mathbf{y} \mid \alpha)$ can be factored into the product of two functions:

$$p(\mathbf{y} \mid \alpha) = g(T(\mathbf{y}), \alpha)\, h(\mathbf{y})$$

Here:

  • $g(T(\mathbf{y}), \alpha)$ is a function that depends on the data only through the statistic $T(\mathbf{y})$ and the parameter $\alpha$,

    • It encapsulates all the information about the parameter $\alpha$ through the statistic $T(\mathbf{y})$.

  • $h(\mathbf{y})$ is a function that depends on the data $\mathbf{y}$ but not on the parameter $\alpha$.

    • It does not provide any information about $\alpha$.

    • It only contributes to the overall likelihood through the data distribution, not through the parameter estimation.

Example: Sample Mean as a Sufficient Statistic#

Consider the scenario where you have a set of $m$ independent and identically distributed (i.i.d.) observations $\mathbf{y} = (y_1, y_2, \ldots, y_m)$, where each random variable follows a normal distribution $\mathcal{N}(\mu, \sigma^2)$; the mean $\mu$ is the parameter of interest, and $\sigma^2$ is known.

The likelihood function is:

$$p(\mathbf{y}; \mu, \sigma^2) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$$

This can be rewritten as:

$$p(y_1, \ldots, y_m; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \mu)^2\right)$$

The sum of squares is expanded as

$$\sum_{i=1}^{m} (y_i - \mu)^2 = \sum_{i=1}^{m} y_i^2 - 2\mu \sum_{i=1}^{m} y_i + m\mu^2$$

The sample mean is defined as

$$\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$$

Substituting this into the likelihood function, we get:

$$p(\mathbf{y}; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{1}{2\sigma^2} \left(\sum_{i=1}^{m} y_i^2 - 2m\mu\bar{y} + m\mu^2\right)\right)$$

Using the identity $\sum_{i=1}^{m} y_i^2 - 2m\mu\bar{y} + m\mu^2 = \sum_{i=1}^{m} (y_i - \bar{y})^2 + m(\bar{y} - \mu)^2$ and applying the Fisher factorization theorem, we get

$$p(\mathbf{y}; \mu, \sigma^2) = \underbrace{\exp\left(-\frac{m(\bar{y} - \mu)^2}{2\sigma^2}\right)}_{g(\bar{y},\, \mu)} \; \underbrace{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \bar{y})^2\right)}_{h(\mathbf{y})}$$

where

  • $g(T(\mathbf{y}), \alpha) = g(\bar{y}, \mu)$:

    • Depends on $\mu$: Through $\mu$ in $(\bar{y} - \mu)^2$.

    • Depends on the data only through $T(\mathbf{y}) = \bar{y}$.

  • $h(\mathbf{y})$:

    • Depends only on the data $\mathbf{y}$: Through $\sum_{i=1}^{m} (y_i - \bar{y})^2$ and the constant term $\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m}$ (note that $\bar{y}$ can be computed from $\mathbf{y}$).

    • Does not depend on $\mu$.

Thus, this factorization confirms that the sample mean $\bar{y}$ is a sufficient statistic for the parameter $\mu$.
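
As a quick sanity check, the factorization can be evaluated numerically on simulated data; the sketch below (with assumed parameter values) confirms that $g(\bar{y}, \mu)\, h(\mathbf{y})$ reproduces the likelihood.

```python
import numpy as np

# Numerical check that p(y; mu, sigma^2) = g(y_bar, mu) * h(y) for the factorization above.
# The parameter values are illustrative only.
rng = np.random.default_rng(2)
mu, sigma2, m = 1.5, 0.8, 12
y = rng.normal(mu, np.sqrt(sigma2), size=m)
y_bar = y.mean()

likelihood = np.prod(np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))
g = np.exp(-m * (y_bar - mu) ** 2 / (2 * sigma2))
h = (2 * np.pi * sigma2) ** (-m / 2) * np.exp(-np.sum((y - y_bar) ** 2) / (2 * sigma2))

print(likelihood, g * h)              # the two values agree
print(np.isclose(likelihood, g * h))  # True up to floating-point error
```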

Derivation of the likelihood function (joint PDF)#

Note that the likelihood function is the joint PDF of the observations $y_1, y_2, \ldots, y_m$.

Specifically, given that $y_1, y_2, \ldots, y_m$ are independent and identically distributed (i.i.d.) random variables from a normal distribution $\mathcal{N}(\mu, \sigma^2)$, the joint pdf of these observations can be written as:

$$f(\mathbf{y}; \mu) = f(y_1, y_2, \ldots, y_m; \mu) = \prod_{i=1}^{m} f(y_i; \mu)$$

Since each $y_i$ is normally distributed with mean $\mu$ and variance $\sigma^2$, the pdf of each $y_i$ is:

$$f(y_i; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$$

Thus, the joint pdf is:

$$f(\mathbf{y}; \mu) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$$

This can be simplified to:

$$f(\mathbf{y}; \mu) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \mu)^2\right)$$

This is the joint pdf of the observations $y_1, y_2, \ldots, y_m$, and it is also the likelihood function when $\mu$ is considered the parameter to be estimated from the data.
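
If `scipy` is available, the equivalence between the product of individual pdfs and the compact expression can be verified directly (a small sketch; the specific values are assumptions for the demo):

```python
import numpy as np
from scipy.stats import norm

# Check that prod_i f(y_i; mu) equals
# (1 / sqrt(2*pi*sigma^2))^m * exp(-sum((y_i - mu)^2) / (2*sigma^2)).
rng = np.random.default_rng(3)
mu, sigma2, m = -0.7, 2.0, 8
y = rng.normal(mu, np.sqrt(sigma2), size=m)

joint_from_product = np.prod(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))
joint_compact = (2 * np.pi * sigma2) ** (-m / 2) * np.exp(-np.sum((y - mu) ** 2) / (2 * sigma2))

print(np.isclose(joint_from_product, joint_compact))  # True
```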

Discussion: Sufficient Statistic $T(\mathbf{y})$ vs. Estimator $\hat{\alpha}(\mathbf{y})$#

  • A sufficient statistic is a function of the data that captures all the information available in the data about a particular parameter.

  • An estimator is a rule or function that provides an estimate of a parameter based on the observed data. It is also typically a statistic (a function of the data) that is used to infer the value of an unknown parameter.

Loosely speaking, a sufficient statistic is a statistic (a function of the data) that can serve as an estimator; it often plays a dual role, as a summary of the data and as an estimator of the parameter.

The Sample Variance as a Sufficient Statistic#

Consider independent and identically distributed (i.i.d.) Gaussian random variables, where each individual observation $y_i$ is normally distributed with mean $\mu$ and variance $\sigma^2$ as above.

We know that the sample mean $\bar{y}$ is a sufficient statistic for estimating the true mean $\mu$. Now, what is a sufficient statistic for estimating the true variance $\sigma^2$?

Consider the sample variance $s_y^2$, defined as

$$s_y^2 = \frac{1}{m} \sum_{i=1}^{m} (y_i - \bar{y})^2$$

Recall that the likelihood function is given by

$$p(\mathbf{y}; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \mu)^2\right)$$

Given that $\mu$ is known (or already estimated), our goal is to identify a statistic that encapsulates all the information about $\sigma^2$ contained in the data $\mathbf{y}$.

Since $\mu$ is known, the deviations are taken about $\mu$ rather than $\bar{y}$, so the realization of the sample variance is

$$s_y^2 = \frac{1}{m} \sum_{i=1}^{m} (y_i - \mu)^2$$

Thus, the sum of squared deviations from the mean is:

$$\sum_{i=1}^{m} (y_i - \mu)^2 = m s_y^2$$

Substitute the expression for the sum of squared deviations into the likelihood function:

$$p(\mathbf{y}; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{m s_y^2}{2\sigma^2}\right)$$

Applying the Fisher factorization theorem, we obtain

$$p(\mathbf{y}; \mu, \sigma^2) = \underbrace{\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{m} \exp\left(-\frac{m s_y^2}{2\sigma^2}\right)}_{g(s_y^2,\, \sigma^2)} \cdot \underbrace{1}_{h(\mathbf{y})}$$

where

  • $g(T(\mathbf{y}), \alpha) = g(s_y^2, \sigma^2)$:

    • Depends on $\sigma^2$: Through the exponential term and the coefficient.

    • Depends on the data only through $s_y^2$: Since $s_y^2$ is a function of $\mathbf{y}$.

  • $h(\mathbf{y}) = 1$:

    • Independence from $\sigma^2$: This function does not involve $\sigma^2$.

    • In this case, $h(\mathbf{y})$ is simply 1, meaning it does not contribute any additional information about $\sigma^2$.

Thus, the sample variance $s_y^2$ is a sufficient statistic for the variance $\sigma^2$.
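
A concrete way to see this: two different datasets that happen to share the same $s_y^2$ (computed about the known $\mu$) yield exactly the same likelihood for every candidate value of $\sigma^2$, so the data influence inference about $\sigma^2$ only through $s_y^2$. Below is a minimal sketch under assumed values.

```python
import numpy as np

# Two different datasets engineered to share the same s_y^2 about the known mean mu
# produce identical likelihood values for any sigma^2: the data matter only through s_y^2.
rng = np.random.default_rng(4)
mu, m = 0.0, 10
y1 = rng.normal(mu, 1.0, size=m)
y2 = rng.normal(mu, 3.0, size=m)

def s2(y):
    return np.mean((y - mu) ** 2)    # sample variance about the known mean

y2 = mu + (y2 - mu) * np.sqrt(s2(y1) / s2(y2))   # rescale so that s2(y2) == s2(y1)

def likelihood(y, sigma2):
    return (2 * np.pi * sigma2) ** (-len(y) / 2) * np.exp(-np.sum((y - mu) ** 2) / (2 * sigma2))

for sigma2 in (0.5, 1.0, 2.0):
    print(sigma2, np.isclose(likelihood(y1, sigma2), likelihood(y2, sigma2)))  # True each time
```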

In summary:

  • The sample mean $\bar{y}$ is a sufficient statistic for the mean $\mu$ of the distribution.

    • This means it contains all the necessary information about $\mu$ present in the data.

    • Moreover, $\bar{y}$ is an unbiased estimator of the true mean $\mu$, meaning that its expected value equals $\mu$.

  • The sample variance $s_y^2$, on the other hand, is also a sufficient statistic for the variance $\sigma^2$.

    • It captures all the relevant information about the variance from the data.

    • However, when used as an estimator of the true variance $\sigma^2$, $s_y^2$ is a biased estimator.

    • Specifically, the expectation of $s_y^2$ is $\frac{m-1}{m}\sigma^2$, which is slightly lower than the true variance $\sigma^2$.

    • To correct this bias, the unbiased estimator of the variance is $\frac{m}{m-1} s_y^2$, often denoted as $s_{\text{unbiased}}^2$ (illustrated numerically in the sketch below).
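
A short Monte Carlo experiment (a sketch with assumed values, NumPy only) illustrates both the downward bias of $s_y^2$ and the effect of the $\frac{m}{m-1}$ correction.

```python
import numpy as np

# Monte Carlo illustration: E[s_y^2] is about (m-1)/m * sigma^2, while the corrected
# estimator m/(m-1) * s_y^2 is (approximately) unbiased. Values are illustrative only.
rng = np.random.default_rng(5)
mu, sigma2, m, n_trials = 0.0, 4.0, 5, 200_000

y = rng.normal(mu, np.sqrt(sigma2), size=(n_trials, m))
s_y2 = ((y - y.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)   # biased sample variance

print("average s_y^2          :", s_y2.mean())                   # ~ (m-1)/m * sigma2 = 3.2
print("(m-1)/m * sigma^2      :", (m - 1) / m * sigma2)
print("average m/(m-1) * s_y^2:", (m / (m - 1) * s_y2).mean())    # ~ sigma2 = 4.0
```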