Maximum a Posteriori (MAP) Estimation#

Motivation

When the cost function in Bayesian estimation is unspecified, or when all estimation errors beyond a small threshold are penalized equally, the estimation problem reduces to Maximum a Posteriori (MAP) estimation.

In this special case of Bayesian estimation, the cost function is uniform, and the optimal estimator minimizes the expected loss by selecting the parameter value that maximizes the posterior density.

To apply the MAP estimator, we assume prior knowledge of the parameter's probability distribution, represented by the prior PDF $p(\alpha)$.

MAP Estimator Definition#

The MAP estimate is computed by maximizing the posterior probability density function (PDF) $p(\alpha|y)$ with respect to the parameter $\alpha$:

$$\hat{\alpha}_{MAP} = \arg\max_{\alpha} \{ p(\alpha|y) \}$$

According to Bayes’ theorem, the posterior PDF is given by:

$$p(\alpha|y) = \frac{p(y|\alpha)\, p(\alpha)}{p(y)}$$

Since the denominator $p(y)$ does not depend on $\alpha$, it can be omitted in the maximization.

Therefore, the MAP estimate simplifies to finding the value of $\alpha$ that maximizes the product $p(y|\alpha)\, p(\alpha)$:

$$\hat{\alpha}_{MAP} = \arg\max_{\alpha} \{ p(y|\alpha)\, p(\alpha) \}$$

This simplification makes MAP estimation computationally convenient, as it avoids calculating the often intractable evidence term $p(y) = \int p(y|\alpha)\, p(\alpha)\, d\alpha$.
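To make this concrete, here is a minimal numerical sketch of MAP estimation by grid search, assuming a Gaussian likelihood with known noise standard deviation and a Gaussian prior (all values below are illustrative, not from the text):

```python
# Minimal sketch: MAP by grid search over the product p(y|alpha) p(alpha).
# Assumptions (illustrative): Gaussian likelihood with known sigma,
# Gaussian prior N(0, 2^2); the evidence p(y) is never computed.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha_true, sigma = 1.5, 1.0
y = alpha_true + sigma * rng.standard_normal(20)   # simulated observations

alphas = np.linspace(-5, 5, 10_001)                # candidate parameter values
prior = norm.pdf(alphas, loc=0.0, scale=2.0)       # assumed prior p(alpha)
# log-likelihood of all observations for every candidate alpha
loglik = norm.logpdf(y[:, None], loc=alphas[None, :], scale=sigma).sum(axis=0)
unnorm_posterior = np.exp(loglik) * prior          # p(y|alpha) p(alpha)

alpha_map = alphas[np.argmax(unnorm_posterior)]
print(f"MAP estimate: {alpha_map:.3f}")            # close to alpha_true
```

Note that `np.exp(loglik)` can underflow when many observations are involved; the logarithm trick discussed later in this section avoids this.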

MAP Estimator Formulation#

Conditional Cost

Recall that the conditional cost $C_c(\hat{\alpha}|y)$ represents the expected cost of estimating the parameter $\alpha$ given the observation $y$:

$$C_c(\hat{\alpha}|y) = \int_{-\infty}^{\infty} C(\alpha, \hat{\alpha}(y))\, p(\alpha|y)\, d\alpha$$

where

  • $C(\alpha, \hat{\alpha}(y))$ is the cost of estimating $\alpha$ as $\hat{\alpha}(y)$.

  • $p(\alpha|y)$ is the posterior probability density function of $\alpha$ given $y$.

Uniform Cost Function

Recall that the uniform cost function penalizes any estimation error beyond a small threshold $\Delta/2$:

$$C_U(\alpha, \hat{\alpha}(y)) = \begin{cases} 0, & |\alpha - \hat{\alpha}(y)| < \dfrac{\Delta}{2} \\ 1, & |\alpha - \hat{\alpha}(y)| \ge \dfrac{\Delta}{2} \end{cases}$$

where $\Delta$ is a small positive number representing the acceptable estimation error margin.
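As a quick illustration, the uniform cost is straightforward to implement; the threshold value below is an arbitrary assumption:

```python
# Uniform cost: 0 if the error is within Delta/2 of the estimate, 1 otherwise.
def uniform_cost(alpha, alpha_hat, delta):
    return 0.0 if abs(alpha - alpha_hat) < delta / 2 else 1.0

print(uniform_cost(1.02, 1.0, delta=0.1))  # 0.0: error within the margin
print(uniform_cost(1.20, 1.0, delta=0.1))  # 1.0: error exceeds Delta/2
```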

Expressing the Average Risk with the Uniform Cost Function

Recall that the average risk $R$ is the expected cost over all possible observations $y$:

$$R = \int_{-\infty}^{\infty} p(y)\, C_c(\hat{\alpha}(y)|y)\, dy$$

where $p(y)$ is the probability density function of the observation $y$.

Using the uniform cost function in the conditional cost equation, we have

$$C_c(\hat{\alpha}|y) = \int_{-\infty}^{\infty} C_U(\alpha, \hat{\alpha}(y))\, p(\alpha|y)\, d\alpha$$

Since $C_U(\alpha, \hat{\alpha}(y)) = 0$ when $|\alpha - \hat{\alpha}(y)| < \Delta/2$, and $C_U(\alpha, \hat{\alpha}(y)) = 1$ otherwise, the integral simplifies to:

$$C_c(\hat{\alpha}|y) = \int_{|\alpha - \hat{\alpha}(y)| \ge \Delta/2} p(\alpha|y)\, d\alpha$$

This expression is the probability that the estimation error exceeds $\Delta/2$.

We can split the integral into two regions where the estimation error is beyond the acceptable threshold:

$$C_c(\hat{\alpha}|y) = \int_{-\infty}^{\hat{\alpha}(y) - \Delta/2} p(\alpha|y)\, d\alpha + \int_{\hat{\alpha}(y) + \Delta/2}^{\infty} p(\alpha|y)\, d\alpha$$

Substituting the expression for $C_c(\hat{\alpha}|y)$ back into the average risk:

$$R = \int_{-\infty}^{\infty} p(y) \left[ \int_{-\infty}^{\hat{\alpha}(y) - \Delta/2} p(\alpha|y)\, d\alpha + \int_{\hat{\alpha}(y) + \Delta/2}^{\infty} p(\alpha|y)\, d\alpha \right] dy$$

Recognizing that the total probability integrates to 1:

$$\int_{-\infty}^{\infty} p(\alpha|y)\, d\alpha = 1$$

We can rewrite the sum of the two integrals as:

$$\int_{-\infty}^{\hat{\alpha}(y) - \Delta/2} p(\alpha|y)\, d\alpha + \int_{\hat{\alpha}(y) + \Delta/2}^{\infty} p(\alpha|y)\, d\alpha = 1 - \int_{\hat{\alpha}(y) - \Delta/2}^{\hat{\alpha}(y) + \Delta/2} p(\alpha|y)\, d\alpha$$

Substituting back into the average risk expression, we have:

$$R = \int_{-\infty}^{\infty} p(y) \left[ 1 - \int_{\hat{\alpha}(y) - \Delta/2}^{\hat{\alpha}(y) + \Delta/2} p(\alpha|y)\, d\alpha \right] dy$$

Note that

  • The term $\int_{\hat{\alpha}(y) - \Delta/2}^{\hat{\alpha}(y) + \Delta/2} p(\alpha|y)\, d\alpha$ represents the probability that the true parameter $\alpha$ lies within the acceptable error margin of the estimate $\hat{\alpha}(y)$.

  • By subtracting this probability from 1, we obtain the probability that the estimation error exceeds the acceptable threshold, which is exactly what the conditional cost measures under the uniform cost function.
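The complement identity above is easy to check numerically. The sketch below uses an illustrative Gaussian stand-in for the posterior $p(\alpha|y)$ and arbitrary values of $\hat{\alpha}(y)$ and $\Delta$:

```python
# Numeric check: the two tail integrals equal 1 minus the interval integral.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

post = lambda a: norm.pdf(a, loc=2.0, scale=1.0)   # stand-in posterior p(alpha|y)
alpha_hat, delta = 1.5, 0.4                        # arbitrary estimate and margin

left_tail, _ = quad(post, -np.inf, alpha_hat - delta / 2)
right_tail, _ = quad(post, alpha_hat + delta / 2, np.inf)
interval, _ = quad(post, alpha_hat - delta / 2, alpha_hat + delta / 2)

print(left_tail + right_tail)   # conditional cost: P(|error| >= Delta/2)
print(1.0 - interval)           # same value via the complement identity
```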

The Optimal Estimate $\hat{\alpha}_{MAP}$: Minimizing the Average Risk

To minimize $R$, we need to maximize the integral $\int_{\hat{\alpha}(y) - \Delta/2}^{\hat{\alpha}(y) + \Delta/2} p(\alpha|y)\, d\alpha$ for each $y$.

  • Since $p(y) \ge 0$, the only way to minimize $R$ is to maximize the integral inside the brackets.

  • This integral measures how concentrated the posterior probability $p(\alpha|y)$ is around the estimate $\hat{\alpha}(y)$.

As $\Delta$ approaches zero (i.e., for very small acceptable errors), the integral becomes proportional to the value of the posterior probability density at $\hat{\alpha}(y)$:

$$\int_{\hat{\alpha}(y) - \Delta/2}^{\hat{\alpha}(y) + \Delta/2} p(\alpha|y)\, d\alpha \approx \Delta\, p(\hat{\alpha}(y)|y)$$

Thus, minimizing $R$ is equivalent to maximizing $p(\hat{\alpha}(y)|y)$.

Therefore, the optimal estimate $\hat{\alpha}_{MAP}$ is the value of $\alpha$ that maximizes the posterior probability density function given $y$:

$$\hat{\alpha}_{MAP} = \arg\max_{\alpha} \{ p(\alpha|y) \}$$
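The limiting argument can also be seen numerically. In the sketch below, a deliberately skewed Gamma posterior (an illustrative assumption) shows the maximizer of the interval probability converging to the posterior mode as $\Delta$ shrinks:

```python
# As Delta -> 0, the alpha_hat maximizing the interval probability
# approaches the posterior mode (for Gamma(a=3), the mode is a - 1 = 2).
import numpy as np
from scipy.stats import gamma

post = gamma(a=3.0)                        # stand-in skewed posterior
alphas = np.linspace(0.01, 10.0, 5000)     # candidate estimates

for delta in (2.0, 0.5, 0.01):
    # P(alpha_hat - Delta/2 < alpha < alpha_hat + Delta/2) via the CDF
    prob = post.cdf(alphas + delta / 2) - post.cdf(alphas - delta / 2)
    print(delta, alphas[np.argmax(prob)])  # tends to 2.0 as delta shrinks
```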

Finding The MAP Estimate#

For small $\Delta$, the integral

$$\int_{\hat{\alpha}(y) - \Delta/2}^{\hat{\alpha}(y) + \Delta/2} p(\alpha|y)\, d\alpha$$

is maximized when the estimate $\hat{\alpha}_{MAP}$ corresponds to the point where the posterior density $p(\alpha|y)$ reaches its maximum.

Essentially, we want to choose $\hat{\alpha}_{MAP}$ such that the probability of the true parameter $\alpha$ lying within the interval $[\hat{\alpha}_{MAP} - \Delta/2,\; \hat{\alpha}_{MAP} + \Delta/2]$ is as high as possible.

In the case of a unimodal posterior density (that is, a distribution with a single peak), the estimate $\hat{\alpha}_{MAP}$ is the mode of $p(\alpha|y)$.

Note that the mode of a probability distribution is the value at which its PDF attains its maximum.

Setting the Derivative to Zero

Recall from calculus that the maxima (and minima) of a differentiable function occur at critical points, where the first derivative is zero.

This is because the slope of the tangent to the function at these points is horizontal, indicating a potential maximum or minimum.

Therefore, in our MAP estimation,

  • To find the maximum of the posterior density $p(\alpha|y)$, we look for the point where the function attains its highest value with respect to $\alpha$.

  • Setting the derivative to zero helps us locate the critical points of $p(\alpha|y)$, i.e.:

$$\frac{\partial}{\partial \alpha} p(\alpha|y) = 0$$

  • For a unimodal distribution, this critical point corresponds to the global maximum.

  • By solving $\frac{\partial}{\partial \alpha} p(\alpha|y) = 0$, we find the value of $\alpha$ where $p(\alpha|y)$ is at its peak, which is the most probable estimate given the observed data $y$ (see the symbolic sketch after this list).
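As a sanity check of this derivative condition, the symbolic sketch below solves $\frac{\partial}{\partial \alpha} p(\alpha|y) = 0$ for an assumed Gaussian posterior with mean $m$ and standard deviation $s$; the single critical point is the mean, i.e., the mode:

```python
# Symbolic sketch: the critical point of an assumed Gaussian posterior.
import sympy as sp

alpha, m = sp.symbols("alpha m", real=True)
s = sp.symbols("s", positive=True)

# Assumed posterior p(alpha|y) = N(m, s^2)
post = sp.exp(-(alpha - m) ** 2 / (2 * s ** 2)) / sp.sqrt(2 * sp.pi * s ** 2)

critical = sp.solve(sp.diff(post, alpha), alpha)
print(critical)   # [m]: the mode coincides with the mean
```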

Using $\ln(\cdot)$

Note that:

  • The logarithm simplifies the optimization problem, especially when the posterior density involves exponential functions or products of multiple terms.

  • The logarithm turns products into sums and exponents into multipliers, making differentiation more straightforward.

In our case, since the natural logarithm is a strictly increasing function, the location of the maximum of $p(\alpha|y)$ remains the same when considering $\ln p(\alpha|y)$; i.e., maximizing $p(\alpha|y)$ is equivalent to maximizing $\ln p(\alpha|y)$.

Therefore, we often obtain a simpler equation to solve for $\hat{\alpha}_{MAP}$ by setting:

$$\frac{\partial}{\partial \alpha} \ln p(\alpha|y) = 0$$

This approach is particularly useful when $p(\alpha|y)$ contains exponential terms, which are common in probability distributions such as the Gaussian.

Note that these equations assume that the derivatives exist and that $p(\alpha|y)$ is differentiable with respect to $\alpha$.
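In practice, one typically minimizes the negative log posterior numerically. The sketch below reuses the illustrative Gaussian likelihood and prior from the earlier grid-search example; the optimizer and all parameter values are assumptions for demonstration:

```python
# Sketch: alpha_MAP by minimizing -ln p(alpha|y) = -(ln p(y|alpha) + ln p(alpha)),
# up to the constant ln p(y). The log turns the product into a smooth sum.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)
y = 1.5 + rng.standard_normal(20)      # observations, true mean 1.5, sigma = 1

def neg_log_posterior(alpha):
    loglik = norm.logpdf(y, loc=alpha, scale=1.0).sum()   # ln p(y|alpha)
    logprior = norm.logpdf(alpha, loc=0.0, scale=2.0)     # ln p(alpha)
    return -(loglik + logprior)

result = minimize_scalar(neg_log_posterior)
print(f"alpha_MAP ~ {result.x:.3f}")
```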

Example C3.9: MAP Estimation of The True Mean#

Problem Statement

Using Example C3.7, show that the a posteriori PDF of Eq. (C3.98) has a maximum when

$$\hat{\mu}_{MAP} = \gamma^2 \omega = \frac{\beta^2 \bar{y} + m_1 \sigma^2 / m}{\beta^2 + \sigma^2 / m}$$

Also show that, because the a posteriori PDF $p(\mu|y)$ is Gaussian, its mode and mean are identical, so that the MAP estimate is the same as the MMSE estimate.

Solution

To find the MAP estimate $\hat{\mu}_{MAP}$ of the unknown random mean $\mu$ using the given observations $y_1, y_2, \ldots, y_m$, we'll use the posterior PDF provided in Eq. (C3.98):

$$p(\mu|y) = \frac{1}{\sqrt{2\pi\gamma^2}} \exp\left( -\frac{[\mu - \gamma^2 \omega]^2}{2\gamma^2} \right)$$

This posterior PDF is a normal distribution with mean $\gamma^2 \omega$ and variance $\gamma^2$.

Since the normal distribution is unimodal and symmetric, the maximum of $p(\mu|y)$ occurs at its mean.

Note that, in this example, we do not need to set the derivative of the posterior density to zero to find the MAP estimate; specifically:

  • Because we know the maximum occurs at the mean for a normal distribution, we can directly identify the MAP estimate without further calculations.

  • There's no need to set $\frac{\partial}{\partial \mu} p(\mu|y) = 0$ or $\frac{\partial}{\partial \mu} \ln p(\mu|y) = 0$ because we already know where the maximum lies.

Therefore, the MAP estimate μ^MAP is:

$$\hat{\mu}_{MAP} = \gamma^2 \omega$$

Here we briefly revisit the derivation of $\gamma^2$ and $\omega$.

Calculating $\gamma^2$ and $\omega$:

Recall that the parameters $\gamma^2$ and $\omega$ are given by:

$$\gamma^2 = \left( \frac{m}{\sigma^2} + \frac{1}{\beta^2} \right)^{-1} = \frac{1}{\frac{m}{\sigma^2} + \frac{1}{\beta^2}}$$

To simplify, combine the terms in the denominator:

$$\gamma^2 = \frac{1}{\frac{m}{\sigma^2} + \frac{1}{\beta^2}} = \frac{\sigma^2 \beta^2}{m\beta^2 + \sigma^2}$$

and

$$\omega = \frac{m\bar{y}}{\sigma^2} + \frac{m_1}{\beta^2}$$

where $\bar{y}$ is the sample mean:

$$\bar{y} = \frac{1}{m} \sum_{k=1}^{m} y_k$$

Multiplying $\gamma^2$ and $\omega$, we have

$$\hat{\mu}_{MAP} = \gamma^2 \omega = \left( \frac{\sigma^2 \beta^2}{m\beta^2 + \sigma^2} \right) \left( \frac{m\bar{y}}{\sigma^2} + \frac{m_1}{\beta^2} \right)$$

Next, multiply the numerators and denominators:

$$\hat{\mu}_{MAP} = \frac{\sigma^2 \beta^2 \left( \frac{m\bar{y}}{\sigma^2} + \frac{m_1}{\beta^2} \right)}{m\beta^2 + \sigma^2}$$

Simplify the terms inside the parentheses:

$$\frac{m\bar{y}}{\sigma^2} + \frac{m_1}{\beta^2} = \frac{m\beta^2 \bar{y} + m_1 \sigma^2}{\sigma^2 \beta^2}$$

Substitute back:

$$\hat{\mu}_{MAP} = \frac{\sigma^2 \beta^2 \left( \frac{m\beta^2 \bar{y} + m_1 \sigma^2}{\sigma^2 \beta^2} \right)}{m\beta^2 + \sigma^2}$$

The $\sigma^2 \beta^2$ terms cancel out:

$$\hat{\mu}_{MAP} = \frac{m\beta^2 \bar{y} + m_1 \sigma^2}{m\beta^2 + \sigma^2}$$

Thus, the MAP estimate is the same as the MMSE estimate obtained in the previous section.
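As a closing check, the sketch below evaluates both $\gamma^2 \omega$ and the simplified closed form under assumed values of $m$, $\sigma$, $\beta$, and $m_1$; the two agree, confirming the algebra above:

```python
# Numeric check of Example C3.9: gamma^2 * omega equals the simplified form.
import numpy as np

rng = np.random.default_rng(2)
m, sigma, beta, m1 = 25, 1.0, 2.0, 0.5   # assumed sample size and parameters
mu_true = rng.normal(m1, beta)           # mu drawn from the prior N(m1, beta^2)
y = rng.normal(mu_true, sigma, size=m)   # m observations centered on mu_true
y_bar = y.mean()

gamma2 = 1.0 / (m / sigma**2 + 1.0 / beta**2)
omega = m * y_bar / sigma**2 + m1 / beta**2

mu_map = gamma2 * omega
closed_form = (m * beta**2 * y_bar + m1 * sigma**2) / (m * beta**2 + sigma**2)
print(mu_map, closed_form)               # identical up to floating-point error
```

Since the posterior is Gaussian, this value is simultaneously its mode (the MAP estimate) and its mean (the MMSE estimate).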