Posterior Distribution Derivation

Author : Payam Parvazmanesh
Contact : payam.manesh@gmail.com
Pattern Recognition

We assume the following:

Prior Distribution for $\mu$: $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$, a normal distribution with mean $\mu_0$ and variance $\sigma_0^2$.
Likelihood Function: Each data point in $\mathbf{D} = \{x_1, x_2, \dots, x_n\}$ is assumed to be drawn independently from a normal distribution with mean $\mu$ and known variance $\sigma^2$, i.e., $x_i \sim \mathcal{N}(\mu, \sigma^2)$.

Now, we want to find the posterior distribution of $\mu$ given the data.

1. Prior Distribution

The prior distribution of $\mu$ is:

\[ p(\mu) = \frac{1}{\sqrt{2\pi \sigma_0^2}} \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right) \]

2. Likelihood Function

Since the data points $x_1, x_2, \dots, x_n$ are independent, the likelihood is the product of the normal densities of the individual data points:

\[ p(\mathbf{D} | \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

The likelihood function simplifies to:

\[ p(\mathbf{D} | \mu) = \frac{1}{(2\pi \sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \]
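As a quick sanity check, the product form and the simplified form of the likelihood can be compared numerically. The sketch below uses only the Python standard library; the data values, the parameter values, and the function names `likelihood_product` and `likelihood_simplified` are illustrative assumptions, not part of the derivation.

```python
import math

def likelihood_product(data, mu, sigma2):
    """Likelihood as a product of per-point normal densities."""
    result = 1.0
    for x in data:
        density = math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        result *= density
    return result

def likelihood_simplified(data, mu, sigma2):
    """Likelihood with the product collapsed into a single exponential."""
    n = len(data)
    squared_sum = sum((x - mu) ** 2 for x in data)
    return math.exp(-squared_sum / (2 * sigma2)) / (2 * math.pi * sigma2) ** (n / 2)

# Hypothetical data and parameters, chosen only for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
mu, sigma2 = 5.0, 0.25

lhs = likelihood_product(data, mu, sigma2)
rhs = likelihood_simplified(data, mu, sigma2)
print(math.isclose(lhs, rhs, rel_tol=1e-12))
```

For larger $n$ the raw product underflows quickly, so in practice one would usually work with the log-likelihood instead; the direct form is kept here only because it mirrors the equations.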

3. Posterior Distribution

The posterior distribution is proportional to the product of the prior and the likelihood:

\[ p(\mu | \mathbf{D}) \propto p(\mu) p(\mathbf{D} | \mu) \]

Substituting the expressions for $p(\mu)$ and $p(\mathbf{D} | \mu)$ and dropping the normalizing factors, which do not depend on $\mu$:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right) \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \]

We can combine the terms inside the exponentials:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{(\mu - \mu_0)^2}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \right) \]
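Because the expression above is known only up to a constant, one way to inspect it is to evaluate it on a grid of $\mu$ values and normalize numerically. The following is a minimal sketch under stated assumptions: the data, the hyperparameter values, the grid range, and the name `log_unnormalized_posterior` are all hypothetical choices made for illustration.

```python
import math

def log_unnormalized_posterior(mu, data, mu0, sigma0_sq, sigma_sq):
    """Log of the combined exponential: prior term plus likelihood term."""
    prior_term = (mu - mu0) ** 2 / sigma0_sq
    data_term = sum((x - mu) ** 2 for x in data) / sigma_sq
    return -0.5 * (prior_term + data_term)

# Hypothetical data and hyperparameters, chosen only for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
mu0, sigma0_sq, sigma_sq = 4.0, 1.0, 0.25

# Evenly spaced grid that comfortably covers the region of non-negligible mass.
lo, hi, steps = 3.0, 7.0, 4001
h = (hi - lo) / (steps - 1)
grid = [lo + i * h for i in range(steps)]

log_vals = [log_unnormalized_posterior(m, data, mu0, sigma0_sq, sigma_sq) for m in grid]
shift = max(log_vals)  # subtract the maximum before exponentiating for stability
weights = [math.exp(v - shift) for v in log_vals]

# Normalize with a simple Riemann sum so the density integrates to one.
total = sum(weights) * h
density = [w / total for w in weights]

grid_mean = sum(m * d for m, d in zip(grid, density)) * h
grid_var = sum((m - grid_mean) ** 2 * d for m, d in zip(grid, density)) * h
print(grid_mean, grid_var)
```

The grid-based mean and variance give a numerical reference against which an analytical result for the same inputs can be compared.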

4. Completing the Square (Detailed Derivation)

The goal of completing the square is to rewrite the exponent of the posterior $p(\mu | \mathbf{D})$ so that it matches the form of a normal density in $\mu$, from which the posterior mean and variance can be read off.

4.1 Expanding Terms in the Exponent

Recall the posterior is proportional to: $$ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{(\mu - \mu_0)^2}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \right). $$

First, expand the terms in the exponent:

Expand $\frac{(\mu - \mu_0)^2}{\sigma_0^2}$: $$ \frac{(\mu - \mu_0)^2}{\sigma_0^2} = \frac{\mu^2}{\sigma_0^2} - \frac{2\mu\mu_0}{\sigma_0^2} + \frac{\mu_0^2}{\sigma_0^2}. $$
Expand $\frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$, starting with the sum itself: $$ \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2. $$

Substituting this back: $$ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{\sigma^2} \left( \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2 \right). $$

Distribute $\frac{1}{\sigma^2}$ and order the terms by powers of $\mu$: $$ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{n\mu^2}{\sigma^2} - \frac{2\mu}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2}. $$

4.2 Combine All Terms

Now, substitute both expansions back into the posterior:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{\mu^2}{\sigma_0^2} - \frac{2\mu\mu_0}{\sigma_0^2} + \frac{\mu_0^2}{\sigma_0^2} + \frac{n\mu^2}{\sigma^2} - \frac{2\mu}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2} \right) \right). \]

Group terms involving $\mu^2$, $\mu$, and constants:

Coefficient of $\mu^2$: $$ \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}. $$
Coefficient of $\mu$: $$ -\frac{2\mu_0}{\sigma_0^2} - \frac{2}{\sigma^2} \sum_{i=1}^{n} x_i. $$
Constant terms: $$ \frac{\mu_0^2}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2}. $$

Thus, the posterior becomes: $$ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right) \mu^2 - 2\left( \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2} \right)\mu + \text{(constant terms)} \right) \right). $$
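The grouping can be checked numerically by comparing the grouped quadratic with the original bracketed expression at a few values of $\mu$. This is only a sketch: the data, the hyperparameters, and the helper names `exponent_original` and `quadratic_coefficients` are assumptions introduced for illustration.

```python
import math

def exponent_original(mu, data, mu0, sigma0_sq, sigma_sq):
    """Bracketed expression in the exponent before any expansion."""
    return (mu - mu0) ** 2 / sigma0_sq + sum((x - mu) ** 2 for x in data) / sigma_sq

def quadratic_coefficients(data, mu0, sigma0_sq, sigma_sq):
    """Return (a, b, c) such that the bracket equals a*mu**2 - 2*b*mu + c."""
    n = len(data)
    a = 1 / sigma0_sq + n / sigma_sq
    b = mu0 / sigma0_sq + sum(data) / sigma_sq
    c = mu0 ** 2 / sigma0_sq + sum(x ** 2 for x in data) / sigma_sq
    return a, b, c

# Hypothetical data and hyperparameters, chosen only for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
mu0, sigma0_sq, sigma_sq = 4.0, 1.0, 0.25

a, b, c = quadratic_coefficients(data, mu0, sigma0_sq, sigma_sq)

# Compare both forms at several test points.
all_match = all(
    math.isclose(
        a * mu ** 2 - 2 * b * mu + c,
        exponent_original(mu, data, mu0, sigma0_sq, sigma_sq),
        rel_tol=1e-9,
    )
    for mu in (-1.0, 0.0, 2.5, 5.0)
)
print(all_match)
```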

4.3 Completing the Square for $\mu$

To bring the exponent into the form of a normal density, complete the square in $\mu$.

The quadratic expression is: $$ a\mu^2 - 2b\mu + \text{(constant terms)}, $$ where: $$ a = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}, \quad b = \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}. $$

Complete the square: $$ a\mu^2 - 2b\mu = a\left( \mu^2 - \frac{2b}{a}\mu \right) = a\left( \left( \mu - \frac{b}{a} \right)^2 - \left( \frac{b}{a} \right)^2 \right). $$

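Substitute this back. The term $-a\left( \frac{b}{a} \right)^2 = -\frac{b^2}{a}$ and the other constant terms do not depend on $\mu$, so they are absorbed into the proportionality constant: $$ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} a \left( \mu - \frac{b}{a} \right)^2 \right). $$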
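Proportionality means that the original exponent and the completed square differ only by a term that is constant in $\mu$. A small sketch of that check follows; the data, the hyperparameters, and the function names are hypothetical, chosen only to illustrate the step.

```python
import math

# Hypothetical data and hyperparameters, chosen only for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
mu0, sigma0_sq, sigma_sq = 4.0, 1.0, 0.25

n = len(data)
a = 1 / sigma0_sq + n / sigma_sq
b = mu0 / sigma0_sq + sum(data) / sigma_sq

def log_unnormalized(mu):
    """Log of the posterior kernel before completing the square."""
    bracket = (mu - mu0) ** 2 / sigma0_sq + sum((x - mu) ** 2 for x in data) / sigma_sq
    return -0.5 * bracket

def log_completed_square(mu):
    """Log of the posterior kernel after completing the square."""
    return -0.5 * a * (mu - b / a) ** 2

# If the two kernels are proportional, their log-difference is the same for every mu.
diffs = [log_unnormalized(mu) - log_completed_square(mu) for mu in (3.0, 4.5, 5.0, 6.5)]
is_constant = all(math.isclose(d, diffs[0], rel_tol=1e-9) for d in diffs)
print(is_constant)
```

The common difference equals $-\frac{1}{2}\left(c - \frac{b^2}{a}\right)$, where $c$ denotes the constant terms collected in Section 4.2.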

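4.4 Identify the Posterior Mean and Variance

A normal density with mean $m$ and variance $s^2$ is proportional to $\exp\left( -\frac{(\mu - m)^2}{2 s^2} \right)$. Matching this with the completed square gives $m = \frac{b}{a}$ and $s^2 = \frac{1}{a}$, so the posterior distribution is normal: $$ \mu | \mathbf{D} \sim \mathcal{N}\left( \frac{b}{a}, \frac{1}{a} \right). $$

Substitute $a$ and $b$: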

Posterior Mean: $$ \mu_{\text{post}} = \frac{b}{a} = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2}. $$
Posterior Variance: $$ \sigma^2_{\text{post}} = \frac{1}{a} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2}. $$
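The two equivalent routes to the posterior parameters, via $a$ and $b$ or via the closed-form expressions, can be sketched as follows. The function names `posterior_params` and `posterior_params_closed_form`, as well as the data and hyperparameter values, are assumptions made for illustration.

```python
import math

def posterior_params(data, mu0, sigma0_sq, sigma_sq):
    """Posterior mean and variance computed as b/a and 1/a."""
    n = len(data)
    a = 1 / sigma0_sq + n / sigma_sq
    b = mu0 / sigma0_sq + sum(data) / sigma_sq
    return b / a, 1 / a

def posterior_params_closed_form(data, mu0, sigma0_sq, sigma_sq):
    """Posterior mean and variance from the simplified closed-form expressions."""
    n = len(data)
    x_bar = sum(data) / n
    denom = sigma_sq + n * sigma0_sq
    mean = (sigma_sq * mu0 + n * sigma0_sq * x_bar) / denom
    variance = sigma_sq * sigma0_sq / denom
    return mean, variance

# Hypothetical data and hyperparameters, chosen only for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
mu0, sigma0_sq, sigma_sq = 4.0, 1.0, 0.25

mean_ab, var_ab = posterior_params(data, mu0, sigma0_sq, sigma_sq)
mean_cf, var_cf = posterior_params_closed_form(data, mu0, sigma0_sq, sigma_sq)

print(mean_ab, var_ab)
print(math.isclose(mean_ab, mean_cf, rel_tol=1e-12), math.isclose(var_ab, var_cf, rel_tol=1e-12))
```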

5. Resulting Posterior Distribution

After completing the square, the posterior distribution of $\mu$ is a normal distribution with the following mean and variance:

Mean: $$ \mu_{\text{post}} = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2} $$ where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the sample mean.
Variance: $$ \sigma^2_{\text{post}} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} $$

Therefore, the posterior distribution of $\mu$ given the data $\mathbf{D}$ is:

\[ \mu | \mathbf{D} \sim \mathcal{N}\left( \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2}, \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} \right) \]

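This result shows that the posterior distribution of $\mu$ is normal, with a mean that is a weighted average of the prior mean $\mu_0$ and the sample mean $\bar{x}$: the weight on $\mu_0$ is $\frac{\sigma^2}{\sigma^2 + n \sigma_0^2}$ and the weight on $\bar{x}$ is $\frac{n \sigma_0^2}{\sigma^2 + n \sigma_0^2}$, so the data dominate as the sample size $n$ grows or as the prior variance $\sigma_0^2$ increases. The posterior precision is the sum of the prior precision and the data precision, $\frac{1}{\sigma^2_{\text{post}}} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}$, so the posterior variance is smaller than both the prior variance $\sigma_0^2$ and the variance of the sample mean $\sigma^2 / n$.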
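To illustrate how the weighting behaves, the sketch below evaluates the posterior for increasing sample sizes. It assumes, purely for illustration, that the sample mean stays fixed at a hypothetical value as $n$ changes; the function name `posterior_summary` and all numeric values are likewise assumptions.

```python
def posterior_summary(n, x_bar, mu0, sigma0_sq, sigma_sq):
    """Return (weight on the prior mean, posterior mean, posterior variance)."""
    denom = sigma_sq + n * sigma0_sq
    w_prior = sigma_sq / denom
    mean = w_prior * mu0 + (1 - w_prior) * x_bar
    variance = sigma_sq * sigma0_sq / denom
    return w_prior, mean, variance

# Hypothetical hyperparameters and a sample mean held fixed for illustration.
mu0, sigma0_sq, sigma_sq, x_bar = 4.0, 1.0, 0.25, 5.1

results = []
for n in (0, 1, 5, 50, 500):
    w_prior, mean, variance = posterior_summary(n, x_bar, mu0, sigma0_sq, sigma_sq)
    results.append((n, w_prior, mean, variance))
    print(f"n={n:4d}  prior weight={w_prior:.4f}  mean={mean:.4f}  variance={variance:.5f}")
```

With $n = 0$ the formulas return the prior itself, and as $n$ increases the posterior mean moves toward $\bar{x}$ while the posterior variance shrinks.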