Estimators Based on Sufficient Statistics#
Statistic#
Let $X_1, X_2, \ldots, X_n$ be random samples (observations) drawn from a distribution that depends on an unknown parameter $\theta$.

Note that a statistic $T = T(X_1, X_2, \ldots, X_n)$ is a function of the random samples alone; it does not involve the unknown parameter $\theta$.

When random samples depend on $\theta$, a statistic derived from these samples is used to infer $\theta$. A useful statistic should summarize all the information from the measurements efficiently, often by reducing multiple random variables to a single manageable form.

Such a statistic can serve as a good estimator of $\theta$.
For instance:
An estimator based on a single observation, e.g., $\hat{\mu} = X_1$, does not capture all the information about $\mu$. However, the sample-mean-based estimator $\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, derived from independent, identically distributed normal random variables $X_1, \ldots, X_n$ with mean $\mu$, retains all relevant information about $\mu$ and is a sufficient statistic. This implies that $\bar{X}$ encapsulates all the information about $\mu$ available in the sample.
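As a quick illustration of the difference, the sketch below (a minimal simulation assuming i.i.d. normal data with illustrative parameter values) compares the single-observation estimator $X_1$ with the sample mean $\bar{X}$: both are centered at $\mu$, but $\bar{X}$ has a much smaller spread because it uses all of the data.

```python
import numpy as np

# Minimal sketch: compare the single-observation estimator X1 with the sample
# mean Xbar as estimators of mu.  Parameter values are illustrative.

rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(trials, n))
est_x1 = samples[:, 0]           # uses only the first observation
est_xbar = samples.mean(axis=1)  # uses all n observations

print("X1  : mean =", est_x1.mean(), "  variance =", est_x1.var())     # ~ sigma^2
print("Xbar: mean =", est_xbar.mean(), " variance =", est_xbar.var())  # ~ sigma^2 / n
```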
Definition of Sufficient Statistic#
In the context of statistical estimation, a sufficient statistic is a function of the data that captures all the information necessary to estimate a parameter.
Mathematically speaking, a statistic $T = T(X_1, X_2, \ldots, X_n)$ is sufficient for a parameter $\theta$ if the conditional distribution of the data given $T$ does not depend on $\theta$.

Formally, if you have a set of observations $X_1, X_2, \ldots, X_n$ whose joint distribution depends on $\theta$, then $T$ is a sufficient statistic for $\theta$ when

$$p(x_1, x_2, \ldots, x_n \mid T = t; \theta)$$

does not depend on $\theta$.

This means that once you know the value of $T$, the remaining details of the data provide no additional information about $\theta$.
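For Gaussian data with known variance, this definition can be illustrated informally: the information in the data beyond the sample mean lives in the residuals $X_i - \bar{X}$, and their distribution does not involve $\mu$. The simulation sketch below (illustrative parameter values only) draws data under two very different values of $\mu$ and shows that the residuals look the same in both cases.

```python
import numpy as np

# Minimal sketch: for N(mu, sigma^2) data, the residuals X_i - Xbar have a
# distribution that does not depend on mu.  The mu values are illustrative.

rng = np.random.default_rng(4)
sigma, n, trials = 1.0, 5, 100_000

for mu in [0.0, 10.0]:
    samples = rng.normal(mu, sigma, size=(trials, n))
    residuals = samples - samples.mean(axis=1, keepdims=True)
    # The summary statistics of the residuals match (up to Monte Carlo noise)
    # for both values of mu.
    print(f"mu = {mu:4.1f}: residual mean = {residuals.mean():+.4f}, "
          f"residual std = {residuals.std():.4f}")
```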
Determine If a Statistic is Sufficient#
One method to determine whether a statistic is sufficient, e.g., a statistic $T(\mathbf{X})$ for a parameter $\theta$, is the Fisher factorization theorem.

According to the theorem, if the probability density function (pdf) of the observed data, $p(\mathbf{x}; \theta)$, can be factored into two parts:

- $g(T(\mathbf{x}), \theta)$: a function that depends on the statistic $T(\mathbf{x})$ and the parameter $\theta$.
- $h(\mathbf{x})$: a function that depends only on the observed data and not on the parameter $\theta$.

Then, $T(\mathbf{X})$ is a sufficient statistic for $\theta$.

Mathematically, the factorization can be expressed as:

$$p(\mathbf{x}; \theta) = g(T(\mathbf{x}), \theta)\, h(\mathbf{x}).$$

Here, $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ denotes the observed data and $p(\mathbf{x}; \theta)$ is its joint pdf (the likelihood function).

Therefore, an estimate (or statistic) $T(\mathbf{X})$ is a sufficient statistic for $\theta$ whenever such a factorization exists.

The converse is also true. If $T(\mathbf{X})$ is a sufficient statistic for $\theta$, then the joint pdf can always be factored in this form. This means that if no such factorization exists, $T(\mathbf{X})$ cannot be a sufficient statistic for $\theta$.
In general, the factorization takes the form

$$p(\mathbf{x}; \theta) = g(T(\mathbf{x}), \theta)\, h(\mathbf{x}).$$

Here:

- $g(T(\mathbf{x}), \theta)$ is a function that depends on the data only through the statistic $T(\mathbf{x})$ and on the parameter $\theta$. It encapsulates all the information about the parameter $\theta$ that is available through the statistic $T(\mathbf{x})$.
- $h(\mathbf{x})$ is a function that depends on the data but not on the parameter $\theta$. It does not provide any information about $\theta$; it only contributes to the overall likelihood through the data distribution, not through the parameter estimation.
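A useful consequence of the factorization is that if two data sets $\mathbf{x}$ and $\mathbf{y}$ yield the same value of the statistic, $T(\mathbf{x}) = T(\mathbf{y})$, then their likelihood ratio no longer involves $\theta$:

$$\frac{p(\mathbf{x}; \theta)}{p(\mathbf{y}; \theta)} = \frac{g(T(\mathbf{x}), \theta)\, h(\mathbf{x})}{g(T(\mathbf{y}), \theta)\, h(\mathbf{y})} = \frac{h(\mathbf{x})}{h(\mathbf{y})}.$$

In other words, data sets that agree on a sufficient statistic are indistinguishable as far as $\theta$ is concerned, which also provides a convenient numerical check of sufficiency.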
Example: Sample Mean as a Sufficient Statistic#
Consider the scenario where you have a set of $n$ independent, identically distributed observations $X_1, X_2, \ldots, X_n$ drawn from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$.
The likelihood function is:

$$p(\mathbf{x}; \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$

This can be rewritten as:

$$p(\mathbf{x}; \mu) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$

The sum of squares is expanded as

$$\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2.$$

The sample mean is defined as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Substituting this into the likelihood function, we get:

$$p(\mathbf{x}; \mu) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right) \exp\!\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right).$$
Applying the Fisher factorization theorem, we get

$$p(\mathbf{x}; \mu) = \underbrace{\exp\!\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right)}_{g(\bar{x},\, \mu)} \cdot \underbrace{\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)}_{h(\mathbf{x})},$$

where

- $g(\bar{x}, \mu)$:
  - Depends on $\mu$: through the term $(\bar{x} - \mu)^2$ in the exponent.
  - Depends on the data only through the sample mean $\bar{x}$.
- $h(\mathbf{x})$:
  - Depends only on the data $\mathbf{x}$: through $\sum_{i=1}^{n}(x_i - \bar{x})^2$ and the constant coefficient (note that from $\mathbf{x}$ we can compute $\bar{x}$).
  - Does not depend on $\mu$.
Thus, this factorization confirms that the sample mean $\bar{X}$ is a sufficient statistic for the mean $\mu$.
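As a quick sanity check, the factorization can be verified numerically. The sketch below is a minimal illustration (the helper names `likelihood`, `g`, and `h` are defined here for this example only); it evaluates the joint pdf and the product $g(\bar{x}, \mu)\, h(\mathbf{x})$ for several values of $\mu$ and shows that they agree.

```python
import numpy as np

# Numerical check of p(x; mu) = g(xbar, mu) * h(x) for i.i.d. N(mu, sigma^2)
# data with known sigma.  Data and parameter values are illustrative.

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(loc=3.0, scale=sigma, size=10)   # some observed data
n, xbar = len(x), x.mean()

def likelihood(x, mu, sigma):
    """Joint pdf of the data: product of N(mu, sigma^2) densities."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2))

def g(xbar, mu, n, sigma):
    """Factor that carries the dependence on mu, through xbar only."""
    return np.exp(-n * (xbar - mu) ** 2 / (2 * sigma**2))

def h(x, sigma):
    """Factor that depends on the data but not on mu."""
    n = len(x)
    return (2 * np.pi * sigma**2) ** (-n / 2) * np.exp(-np.sum((x - x.mean()) ** 2) / (2 * sigma**2))

for mu in [0.0, 1.5, 3.0, 10.0]:
    print(f"mu = {mu:5.1f}: likelihood = {likelihood(x, mu, sigma):.3e}, "
          f"g*h = {g(xbar, mu, n, sigma) * h(x, sigma):.3e}")
```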
Derivation of the likelihood function (joint PDF)#
Note that the likelihood function is the joint PDF of the observations $X_1, X_2, \ldots, X_n$, viewed as a function of the parameter $\mu$.

Specifically, given that each observation follows a normal distribution, $X_i \sim \mathcal{N}(\mu, \sigma^2)$, its individual pdf is

$$p(x_i; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$

Since each $X_i$ is drawn independently, the joint pdf is the product of the individual pdfs.

Thus, the joint pdf is:

$$p(\mathbf{x}; \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right).$$

This can be simplified to:

$$p(\mathbf{x}; \mu) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$

This is the joint pdf of the observations $x_1, x_2, \ldots, x_n$, which, treated as a function of $\mu$, is exactly the likelihood function used above.
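To make the product-versus-closed-form step concrete, here is a short sketch (arbitrary data and parameter values) comparing the product of the individual normal pdfs with the simplified expression:

```python
import numpy as np
from scipy.stats import norm

# The product of individual N(mu, sigma^2) pdfs equals the closed-form joint
# pdf derived above.  All numbers are illustrative.

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.5
x = rng.normal(mu, sigma, size=8)

product_form = np.prod(norm.pdf(x, loc=mu, scale=sigma))
closed_form = (2 * np.pi * sigma**2) ** (-len(x) / 2) * np.exp(
    -np.sum((x - mu) ** 2) / (2 * sigma**2)
)

print(product_form, closed_form)  # agree up to floating-point error
```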
Discussion: Sufficient Statistic vs. Estimator#
A sufficient statistic is a function of the data that captures all the information available in the data about a particular parameter.
An estimator is a rule or function that provides an estimate of a parameter based on the observed data. It is also typically a statistic (a function of the data) that is used to infer the value of an unknown parameter.
Loosely speaking, a sufficient statistic is a statistic (a function of the data) that can itself be used as an estimator. A sufficient statistic therefore often plays a dual role: as a summary of the data and as an estimator of the parameter.
The Sample Variance as a Sufficient Statistic#
Consider $n$ independent and identically distributed (i.i.d.) Gaussian random variables, where each individual observation $X_i \sim \mathcal{N}(\mu, \sigma^2)$.

We know that the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a sufficient statistic for the mean $\mu$.

We have that the sample variance is

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

Recall that the likelihood function is given by

$$p(\mathbf{x}; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$

Given that the sum of squares decomposes as $\sum_{i=1}^{n}(x_i - \mu)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$, we can rewrite the exponent in terms of the sample mean and the sample variance.

The realization of the sample variance is

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

Thus, the sum of squared deviations from the mean is:

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = n s^2.$$

Substitute the expression for the sum of squared deviations into the likelihood function:

$$p(\mathbf{x}; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{n s^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right).$$
Applying the Fisher factorization theorem, we obtain

$$p(\mathbf{x}; \mu, \sigma^2) = \underbrace{\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left(-\frac{n s^2 + n(\bar{x} - \mu)^2}{2\sigma^2}\right)}_{g(\bar{x},\, s^2;\, \mu,\, \sigma^2)} \cdot \underbrace{1}_{h(\mathbf{x})},$$

where

- $g(\bar{x}, s^2; \mu, \sigma^2)$:
  - Depends on $\sigma^2$: through the exponential term and the coefficient.
  - Depends on the data only through $\bar{x}$ and $s^2$: since both are functions of the observations $\mathbf{x}$.
- $h(\mathbf{x})$:
  - Independence from $\sigma^2$: this function does not involve $\sigma^2$. In this case, $h(\mathbf{x})$ is simply 1, meaning it does not contribute any additional information about $\sigma^2$.

Thus, the sample variance $S^2$ (together with the sample mean $\bar{X}$) is a sufficient statistic for the variance $\sigma^2$.
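The practical meaning of $h(\mathbf{x}) = 1$ is that the likelihood depends on the data only through $\bar{x}$ and $s^2$. The sketch below (the helper `gaussian_likelihood` and all numbers are illustrative) constructs two different data sets with the same sample mean and sample variance and shows that their likelihoods coincide for any choice of $(\mu, \sigma^2)$.

```python
import numpy as np

# Two different data sets with identical sample mean and sample variance have
# identical Gaussian likelihoods for every (mu, sigma^2).  Values are illustrative.

rng = np.random.default_rng(2)

def gaussian_likelihood(x, mu, sigma2):
    """Joint N(mu, sigma^2) pdf of an i.i.d. sample x."""
    n = len(x)
    return (2 * np.pi * sigma2) ** (-n / 2) * np.exp(-np.sum((x - mu) ** 2) / (2 * sigma2))

x = rng.normal(0.0, 1.0, size=12)

# Build a second data set y with exactly the same sample mean and (n-divisor)
# sample variance by standardizing an unrelated sample and rescaling it.
z = rng.normal(5.0, 3.0, size=12)
y = (z - z.mean()) / z.std() * x.std() + x.mean()

for mu, sigma2 in [(0.0, 1.0), (2.0, 0.5), (-1.0, 4.0)]:
    print(gaussian_likelihood(x, mu, sigma2), gaussian_likelihood(y, mu, sigma2))
```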
In summary:
- The sample mean $\bar{X}$ is a sufficient statistic for the mean $\mu$ of the distribution. This means it contains all the necessary information about $\mu$ present in the data. Moreover, $\bar{X}$ is an unbiased estimator of the true mean $\mu$, meaning that its expected value equals $\mu$.
- The sample variance $S^2$, on the other hand, is also a sufficient statistic for the variance $\sigma^2$. It captures all the relevant information about the variance from the data. However, when used as an estimator of the true variance $\sigma^2$, $S^2$ is a biased estimator. Specifically, the expectation of $S^2$ is $\frac{n-1}{n}\sigma^2$, which is slightly lower than the true variance $\sigma^2$.
- To correct this bias, the unbiased estimator of the variance is $\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$, often denoted as $S_{n-1}^2$.
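These bias statements are easy to confirm by simulation. The following sketch (arbitrary parameter values) estimates the expectations of $\bar{X}$, the $n$-divisor sample variance $S^2$, and the $(n-1)$-divisor correction:

```python
import numpy as np

# Monte Carlo check of the claims above:
#   E[Xbar] = mu,  E[S^2] = (n-1)/n * sigma^2,  E[S_{n-1}^2] = sigma^2.
# Parameter values are illustrative.

rng = np.random.default_rng(3)
mu, sigma2, n, trials = 2.0, 4.0, 10, 200_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
xbar = samples.mean(axis=1)
s2_biased = samples.var(axis=1, ddof=0)      # divisor n   -> biased
s2_unbiased = samples.var(axis=1, ddof=1)    # divisor n-1 -> unbiased

print("E[Xbar]         ~", xbar.mean(),        "(true mu =", mu, ")")
print("E[S^2] (ddof=0) ~", s2_biased.mean(),   "((n-1)/n * sigma^2 =", (n - 1) / n * sigma2, ")")
print("E[S^2] (ddof=1) ~", s2_unbiased.mean(), "(true sigma^2 =", sigma2, ")")
```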