Posterior Distribution Derivation

We assume the following:

  • Prior Distribution for \( \mu \): \( \mu \sim \mathcal{N}(\mu_0, \sigma_0^2) \), a normal distribution with mean \( \mu_0 \) and variance \( \sigma_0^2 \).

  • Likelihood Function: The data \( \mathbf{D} = \{x_1, x_2, \dots, x_n\} \) are assumed to be independent draws from a normal distribution with mean \( \mu \) and known variance \( \sigma^2 \), i.e., \( x_i \sim \mathcal{N}(\mu, \sigma^2) \).

Now, we want to find the posterior distribution of \( \mu \) given the data.

1. Prior Distribution

The prior distribution of \( \mu \) is:

\[ p(\mu) = \frac{1}{\sqrt{2\pi \sigma_0^2}} \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right) \]

2. Likelihood Function

Since the data points \( x_1, x_2, \dots, x_n \) are independent, the likelihood is the product of the normal densities of the individual data points:

\[ p(\mathbf{D} | \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \]

Collecting the \( n \) identical normalising factors and summing the exponents, the likelihood simplifies to:

\[ p(\mathbf{D} | \mu) = \frac{1}{(2\pi \sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \]
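As a quick sanity check on this simplification, the following minimal Python sketch evaluates the log-likelihood two ways: as a sum of per-point log-densities (the log of the product form) and from the collapsed expression. The data values, `mu`, and `sigma2` are made-up illustrative numbers, and the function names are hypothetical.

```python
import math

def log_lik_product(xs, mu, sigma2):
    """Sum of per-point normal log-densities (log of the product form)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
        for x in xs
    )

def log_lik_collapsed(xs, mu, sigma2):
    """Log of the simplified form with a single normalising factor."""
    n = len(xs)
    squared_dev = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_dev / (2 * sigma2)

xs = [4.8, 5.1, 5.6, 4.9]  # illustrative data, not from the text
ll_product = log_lik_product(xs, mu=5.0, sigma2=0.25)
ll_collapsed = log_lik_collapsed(xs, mu=5.0, sigma2=0.25)
print(ll_product, ll_collapsed)  # the two values should agree up to rounding
```

Working on the log scale is a design choice here: it avoids the underflow that a literal product of many small densities can cause.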

3. Posterior Distribution

The posterior distribution is proportional to the product of the prior and the likelihood:

\[ p(\mu | \mathbf{D}) \propto p(\mu) p(\mathbf{D} | \mu) \]

Substituting the expressions for \( p(\mu) \) and \( p(\mathbf{D} | \mu) \), and dropping the normalising factors that do not depend on \( \mu \):

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right) \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \]

We can combine the terms inside the exponentials:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{(\mu - \mu_0)^2}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \right) \]

4. Completing the Square (Detailed Derivation)

The goal of completing the square is to rewrite the posterior \( p(\mu | \mathbf{D}) \) in the form of a normal density in \( \mu \).

4.1 Expanding Terms in the Exponent

Recall that the posterior is proportional to:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{(\mu - \mu_0)^2}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \right). \]

First, expand the terms in the exponent:

  1. Expand \( \frac{(\mu - \mu_0)^2}{\sigma_0^2} \):

    \[ \frac{(\mu - \mu_0)^2}{\sigma_0^2} = \frac{\mu^2}{\sigma_0^2} - \frac{2\mu\mu_0}{\sigma_0^2} + \frac{\mu_0^2}{\sigma_0^2}. \]

  2. Expand \( \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \):

    \[ \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2. \]

    Substituting this back:

    \[ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{\sigma^2} \left( \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2 \right). \]

Combine the terms:

\[ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{n\mu^2}{\sigma^2} - \frac{2\mu}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2}. \]

4.2 Combine All Terms

Now, substitute both expansions back into the posterior:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{\mu^2}{\sigma_0^2} - \frac{2\mu\mu_0}{\sigma_0^2} + \frac{\mu_0^2}{\sigma_0^2} + \frac{n\mu^2}{\sigma^2} - \frac{2\mu}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2} \right) \right). \]

Group terms involving \( \mu^2 \), \( \mu \), and constants:

  1. Coefficient of \( \mu^2 \):

    \[ \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}. \]

  2. Coefficient of \( \mu \):

    \[ -\frac{2\mu_0}{\sigma_0^2} - \frac{2}{\sigma^2} \sum_{i=1}^{n} x_i. \]

  3. Constant terms:

    \[ \frac{\mu_0^2}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2}. \]

Thus, the posterior becomes:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right) \mu^2 - 2\left( \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2} \right)\mu + \text{(constant terms)} \right) \right). \]
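The grouping can be checked numerically. The sketch below, a minimal illustration with made-up data and hypothetical function names, evaluates the bracketed exponent once directly from the prior and data terms and once from the grouped coefficients, then compares the two at a few values of \( \mu \).

```python
def exponent_direct(mu, xs, mu0, sigma0_sq, sigma_sq):
    """Bracketed exponent before expansion: prior term plus data term."""
    prior_term = (mu - mu0) ** 2 / sigma0_sq
    data_term = sum((x - mu) ** 2 for x in xs) / sigma_sq
    return prior_term + data_term

def exponent_grouped(mu, xs, mu0, sigma0_sq, sigma_sq):
    """Same quantity after grouping by powers of mu."""
    n = len(xs)
    coef_mu2 = 1 / sigma0_sq + n / sigma_sq              # multiplies mu^2
    half_coef_mu = mu0 / sigma0_sq + sum(xs) / sigma_sq  # enters as -2 * (...) * mu
    const = mu0 ** 2 / sigma0_sq + sum(x * x for x in xs) / sigma_sq
    return coef_mu2 * mu ** 2 - 2 * half_coef_mu * mu + const

xs = [4.8, 5.1, 5.6, 4.9]  # illustrative data, not from the text
for mu in (-1.0, 0.0, 2.5, 5.0):
    direct = exponent_direct(mu, xs, mu0=0.0, sigma0_sq=4.0, sigma_sq=0.25)
    grouped = exponent_grouped(mu, xs, mu0=0.0, sigma0_sq=4.0, sigma_sq=0.25)
    print(mu, abs(direct - grouped) < 1e-9)
```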

4.3 Completing the Square for \( \mu \)

To rewrite this as a standard quadratic form, complete the square in \( \mu \).

The quadratic expression is:

\[ a\mu^2 - 2b\mu + \text{(constant terms)}, \]

where:

\[ a = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}, \quad b = \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}. \]

Complete the square:

\[ a\mu^2 - 2b\mu = a\left( \mu^2 - \frac{2b}{a}\mu \right) = a\left( \left( \mu - \frac{b}{a} \right)^2 - \left( \frac{b}{a} \right)^2 \right). \]

The leftover term \( -a \left( \frac{b}{a} \right)^2 = -\frac{b^2}{a} \) does not depend on \( \mu \), so, like the other constant terms, it is absorbed into the proportionality constant. Substituting back:

\[ p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} a \left( \mu - \frac{b}{a} \right)^2 \right). \]
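To see why the leftover term can be dropped, the short sketch below checks that the gap between \( a\mu^2 - 2b\mu \) and \( a(\mu - b/a)^2 \) is the same number, \( -b^2/a \), at every \( \mu \). The values of `a` and `b` are arbitrary illustrative choices with `a > 0`, and the function names are hypothetical.

```python
def quadratic(mu, a, b):
    """The mu-dependent part of the exponent: a*mu^2 - 2*b*mu."""
    return a * mu ** 2 - 2 * b * mu

def squared_part(mu, a, b):
    """The completed square without the constant: a*(mu - b/a)^2."""
    return a * (mu - b / a) ** 2

a, b = 16.25, 81.6  # illustrative values, not from the text
gaps = [quadratic(mu, a, b) - squared_part(mu, a, b) for mu in (-2.0, 0.0, 3.0, 7.5)]
print(gaps)        # each entry should equal -b**2 / a up to rounding
print(-b ** 2 / a)
```

Because the gap is constant in \( \mu \), it only rescales the unnormalised density and disappears once the posterior is normalised.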

4.4 Identify the Posterior Mean and Variance

Matching the completed square against the exponent \( -\frac{(\mu - \mu_{\text{post}})^2}{2\sigma^2_{\text{post}}} \) of a normal density gives mean \( \frac{b}{a} \) and variance \( \frac{1}{a} \), so the posterior distribution is normal:

\[ \mu | \mathbf{D} \sim \mathcal{N}\left( \frac{b}{a}, \frac{1}{a} \right). \]

Substitute \( a \) and \( b \), using \( \sum_{i=1}^{n} x_i = n\bar{x} \) and multiplying numerator and denominator by \( \sigma_0^2 \sigma^2 \):

  1. Posterior Mean:

    \[ \mu_{\text{post}} = \frac{b}{a} = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2}. \]

  2. Posterior Variance:

    \[ \sigma^2_{\text{post}} = \frac{1}{a} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2}. \]
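The two equivalent forms of the mean and variance can be compared in code. This is a minimal sketch with hypothetical function names and made-up inputs: one function works from \( a \) and \( b \), the other from the simplified closed-form expressions.

```python
def posterior_from_ab(xs, mu0, sigma0_sq, sigma_sq):
    """Posterior (mean, variance) computed as b/a and 1/a."""
    n = len(xs)
    a = 1 / sigma0_sq + n / sigma_sq
    b = mu0 / sigma0_sq + sum(xs) / sigma_sq
    return b / a, 1 / a

def posterior_closed_form(xs, mu0, sigma0_sq, sigma_sq):
    """Posterior (mean, variance) from the simplified expressions."""
    n = len(xs)
    xbar = sum(xs) / n                      # sample mean
    denom = sigma_sq + n * sigma0_sq
    mean = (sigma_sq * mu0 + n * sigma0_sq * xbar) / denom
    var = sigma_sq * sigma0_sq / denom
    return mean, var

xs = [4.8, 5.1, 5.6, 4.9]  # illustrative data, not from the text
print(posterior_from_ab(xs, mu0=0.0, sigma0_sq=4.0, sigma_sq=0.25))
print(posterior_closed_form(xs, mu0=0.0, sigma0_sq=4.0, sigma_sq=0.25))
```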

5. Resulting Posterior Distribution

After completing the square, the posterior distribution of \( \mu \) is a normal distribution with the following mean and variance:

  • Mean:

    \[ \mu_{\text{post}} = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2} \]

    where \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \) is the sample mean.

  • Variance:

    \[ \sigma^2_{\text{post}} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} \]

Therefore, the posterior distribution of \( \mu \) given the data \( \mathbf{D} \) is:

\[ \mu | \mathbf{D} \sim \mathcal{N}\left( \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2}, \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} \right) \]
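One way to gain confidence in this closed form is to bypass the algebra: evaluate prior times likelihood on a grid of \( \mu \) values, normalise numerically, and compare the resulting mean and variance with the formulas. The sketch below does this under stated assumptions: the data, the prior settings, and the grid range `[0, 10]` are all illustrative choices, and `grid_posterior_moments` is a hypothetical helper name.

```python
import math

def grid_posterior_moments(xs, mu0, sigma0_sq, sigma_sq, lo, hi, steps=20001):
    """Approximate the posterior mean and variance of mu on an even grid."""
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]
    # Unnormalised log posterior: log prior + log likelihood, constants dropped.
    log_post = [
        -(m - mu0) ** 2 / (2 * sigma0_sq)
        - sum((x - m) ** 2 for x in xs) / (2 * sigma_sq)
        for m in grid
    ]
    peak = max(log_post)  # subtract the peak before exponentiating to limit underflow
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    mean = sum(m * w for m, w in zip(grid, weights)) / total
    var = sum((m - mean) ** 2 * w for m, w in zip(grid, weights)) / total
    return mean, var

xs = [4.8, 5.1, 5.6, 4.9]  # illustrative data, not from the text
mu0, sigma0_sq, sigma_sq = 0.0, 4.0, 0.25
n, xbar = len(xs), sum(xs) / len(xs)
denom = sigma_sq + n * sigma0_sq
closed_mean = (sigma_sq * mu0 + n * sigma0_sq * xbar) / denom
closed_var = sigma_sq * sigma0_sq / denom
grid_mean, grid_var = grid_posterior_moments(xs, mu0, sigma0_sq, sigma_sq, lo=0.0, hi=10.0)
print(closed_mean, grid_mean)
print(closed_var, grid_var)
```

The grid must comfortably cover the region where the posterior has mass; a range that cuts off a tail would bias both moments.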

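This result shows that the posterior distribution of \( \mu \) is normal, with a mean that is a weighted average of the prior mean \( \mu_0 \) and the sample mean \( \bar{x} \). The weights are \( \frac{\sigma^2}{\sigma^2 + n\sigma_0^2} \) on \( \mu_0 \) and \( \frac{n\sigma_0^2}{\sigma^2 + n\sigma_0^2} \) on \( \bar{x} \), so they depend on the prior variance \( \sigma_0^2 \), the sample size \( n \), and the variance of the data \( \sigma^2 \). The posterior precision (inverse variance) is the sum of the prior precision and the data precision, \( \frac{1}{\sigma^2_{\text{post}}} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \), so the posterior variance is smaller than both the prior variance \( \sigma_0^2 \) and \( \frac{\sigma^2}{n} \).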
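The weighted-average reading can be made concrete with a small sketch. The numbers below are made up for illustration, and `posterior_summary` is a hypothetical helper: it returns the weight placed on the sample mean together with the posterior mean and variance, for a growing sample size.

```python
def posterior_summary(n, xbar, mu0, sigma0_sq, sigma_sq):
    """Return (weight on sample mean, posterior mean, posterior variance)."""
    denom = sigma_sq + n * sigma0_sq
    w_data = n * sigma0_sq / denom              # weight on the sample mean
    mean = (1 - w_data) * mu0 + w_data * xbar   # weighted average of mu0 and xbar
    var = sigma_sq * sigma0_sq / denom
    return w_data, mean, var

for n in (1, 10, 100):
    w, mean, var = posterior_summary(n, xbar=5.0, mu0=0.0, sigma0_sq=1.0, sigma_sq=4.0)
    print(n, round(w, 3), round(mean, 3), round(var, 4))
```

With these settings the weight on the sample mean is 0.2 for a single observation and moves toward 1 as \( n \) grows, so the posterior mean shifts from the prior mean toward \( \bar{x} \) while the posterior variance shrinks.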