Posterior Distribution Derivation#
Author : Payam Parvazmanesh
Contact : payam.manesh@gmail.com
Pattern Recognition
We assume the following:
Prior Distribution for \(( \mu )\): \(( \mu \sim \mathcal{N}(\mu_0, \sigma_0^2) )\), a normal distribution with mean \(( \mu_0 )\) and variance \(( \sigma_0^2 )\).
Likelihood Function: The data \(( \mathbf{D} = \{x_1, x_2, \dots, x_n\} )\) are assumed to be drawn independently from a normal distribution with unknown mean \(( \mu )\) and known variance \(( \sigma^2 )\), i.e., \(( x_i \sim \mathcal{N}(\mu, \sigma^2) )\).
Now, we want to find the posterior distribution of \(( \mu )\) given the data.
1. Prior Distribution#
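The prior distribution of \(( \mu )\) is: $\( p(\mu) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right). \)$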
2. Likelihood Function#
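Since the data points \(( x_1, x_2, \dots, x_n )\) are independent, the likelihood is the product of normal distributions for each data point: $\( p(\mathbf{D} | \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right). \)$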
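The likelihood function simplifies to: $\( p(\mathbf{D} | \mu) = \left( 2\pi\sigma^2 \right)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right). \)$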
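As a quick numerical sanity check, the sketch below compares the product of individual normal densities with the single-exponential form of the likelihood. The function names and the sample values are made up for illustration and are not part of the original derivation.

```python
import math

def likelihood_product(data, mu, sigma_sq):
    """Likelihood as a product of one normal density per data point."""
    result = 1.0
    for x in data:
        density = math.exp(-(x - mu) ** 2 / (2 * sigma_sq)) / math.sqrt(2 * math.pi * sigma_sq)
        result *= density
    return result

def likelihood_simplified(data, mu, sigma_sq):
    """Same likelihood written with a single exponential of the summed squares."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return (2 * math.pi * sigma_sq) ** (-n / 2) * math.exp(-sum_sq / (2 * sigma_sq))

# Hypothetical sample and parameter values, chosen only for the check.
data = [1.2, 0.7, 2.1, 1.5]
print(likelihood_product(data, 1.0, 0.5))
print(likelihood_simplified(data, 1.0, 0.5))
```

The two printed values should agree up to floating-point rounding.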
3. Posterior Distribution#
The posterior distribution is proportional to the product of the prior and the likelihood: $\( p(\mu | \mathbf{D}) \propto p(\mathbf{D} | \mu) \, p(\mu). \)$
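Substituting the expressions for \(( p(\mu) )\) and \(( p(\mathbf{D} | \mu) )\), and dropping the normalizing constants, which do not depend on \(( \mu )\): $\( p(\mu | \mathbf{D}) \propto \exp\left( -\frac{(\mu - \mu_0)^2}{2\sigma_0^2} \right) \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right). \)$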
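We can combine the terms inside the exponentials: $\( p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{(\mu - \mu_0)^2}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \right). \)$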
4. Completing the Square (Detailed Derivation)#
The goal of completing the square is to rewrite the posterior \(( p(\mu | \mathbf{D}) )\) in the standard form of a normal density, from which its mean and variance can be read off directly.
4.1 Expanding Terms in the Exponent#
Recall the posterior is proportional to: $\( p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{(\mu - \mu_0)^2}{\sigma_0^2} + \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right) \right). \)$
First, expand the terms in the exponent:
Expand \((\frac{(\mu - \mu_0)^2}{\sigma_0^2})\): $\( \frac{(\mu - \mu_0)^2}{\sigma_0^2} = \frac{\mu^2}{\sigma_0^2} - \frac{2\mu\mu_0}{\sigma_0^2} + \frac{\mu_0^2}{\sigma_0^2}. \)$
Expand \((\frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2)\): $\( \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2. \)$
Substituting this back: $\( \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{\sigma^2} \left( \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2 \right). \)$
Combine the terms: $\( \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{n\mu^2}{\sigma^2} - \frac{2\mu}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2}. \)$
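The expansion of the sum of squares can be checked numerically. The following is a minimal sketch with made-up sample values; the function names are illustrative only.

```python
import math

def sum_of_squares_direct(data, mu):
    """Sum of (x_i - mu)^2 computed term by term."""
    return sum((x - mu) ** 2 for x in data)

def sum_of_squares_expanded(data, mu):
    """Expanded form: sum(x_i^2) - 2*mu*sum(x_i) + n*mu^2."""
    n = len(data)
    return sum(x * x for x in data) - 2 * mu * sum(data) + n * mu ** 2

# Hypothetical sample; any real numbers would do.
data = [1.2, 0.7, 2.1, 1.5]
for mu in (-1.0, 0.0, 1.3):
    print(mu, sum_of_squares_direct(data, mu), sum_of_squares_expanded(data, mu))
```

For each value of mu the two columns should match up to rounding error.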
4.2 Combine All Terms#
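Now, substitute both expansions back into the posterior: $\( p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \frac{\mu^2}{\sigma_0^2} - \frac{2\mu\mu_0}{\sigma_0^2} + \frac{\mu_0^2}{\sigma_0^2} + \frac{n\mu^2}{\sigma^2} - \frac{2\mu}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2} \right) \right). \)$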
Group terms involving \((\mu^2)\), \((\mu)\), and constants:
Coefficient of \((\mu^2)\): $\( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}. \)$
Coefficient of \((\mu)\): $\( -\frac{2\mu_0}{\sigma_0^2} - \frac{2}{\sigma^2} \sum_{i=1}^{n} x_i. \)$
Constant terms: $\( \frac{\mu_0^2}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i^2}{\sigma^2}. \)$
Thus, the posterior becomes: $\( p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} \left( \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right) \mu^2 - 2\left( \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2} \right)\mu + \text{(constant terms)} \right) \right). \)$
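The grouping of terms can be verified by evaluating the exponent both ways. This sketch assumes hypothetical values for the data and the parameters; the helper names are not from the original text.

```python
import math

def exponent_original(mu, data, mu0, sigma0_sq, sigma_sq):
    """Bracketed term of the exponent before expansion (without the -1/2 factor)."""
    return (mu - mu0) ** 2 / sigma0_sq + sum((x - mu) ** 2 for x in data) / sigma_sq

def quadratic_coefficients(data, mu0, sigma0_sq, sigma_sq):
    """Coefficients of a*mu^2 - 2*b*mu + c after grouping terms."""
    n = len(data)
    a = 1 / sigma0_sq + n / sigma_sq
    b = mu0 / sigma0_sq + sum(data) / sigma_sq
    c = mu0 ** 2 / sigma0_sq + sum(x * x for x in data) / sigma_sq
    return a, b, c

def exponent_grouped(mu, data, mu0, sigma0_sq, sigma_sq):
    """Same bracketed term, evaluated from the grouped coefficients."""
    a, b, c = quadratic_coefficients(data, mu0, sigma0_sq, sigma_sq)
    return a * mu ** 2 - 2 * b * mu + c

# Hypothetical inputs for the check.
data = [4.8, 5.6, 5.1, 4.4, 5.9]
mu0, sigma0_sq, sigma_sq = 0.0, 4.0, 1.0
for mu in (-2.0, 0.5, 5.0):
    print(exponent_original(mu, data, mu0, sigma0_sq, sigma_sq),
          exponent_grouped(mu, data, mu0, sigma0_sq, sigma_sq))
```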
4.3 Completing the Square for \((\mu)\)#
To rewrite this as a standard quadratic form, complete the square for \((\mu)\).
The quadratic expression is: $\( a\mu^2 - 2b\mu + \text{(constant terms)}, \)$ where: $\( a = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}, \quad b = \frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}. \)$
Complete the square: $\( a\mu^2 - 2b\mu = a\left( \mu^2 - \frac{2b}{a}\mu \right) = a\left( \left( \mu - \frac{b}{a} \right)^2 - \left( \frac{b}{a} \right)^2 \right). \)$
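Substitute this back. The term \(( -b^2/a )\) and the other constant terms do not depend on \(( \mu )\), so they are absorbed into the proportionality constant: $\( p(\mu | \mathbf{D}) \propto \exp\left( -\frac{1}{2} a \left( \mu - \frac{b}{a} \right)^2 \right). \)$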
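A small check of the completing-the-square identity, written as a sketch with arbitrary illustrative values for a and b. It also shows that the leftover term, -b^2/a, is the same for every value of mu, which is why it can be dropped under proportionality.

```python
import math

def raw_quadratic(mu, a, b):
    """a*mu^2 - 2*b*mu, the mu-dependent part of the exponent."""
    return a * mu ** 2 - 2 * b * mu

def completed_square(mu, a, b):
    """a*(mu - b/a)^2 - b^2/a, the same expression after completing the square."""
    return a * (mu - b / a) ** 2 - b ** 2 / a

# Illustrative coefficients (a must be positive).
a, b = 5.25, 25.8
for mu in (-3.0, 0.0, 4.9, 10.0):
    leftover = raw_quadratic(mu, a, b) - a * (mu - b / a) ** 2
    print(mu, raw_quadratic(mu, a, b), completed_square(mu, a, b), leftover)
```

The last column should print the same number, -b^2/a, on every row.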
4.4 Identify the Posterior Mean and Variance#
From the completed square, the posterior distribution is normal: $\( \mu | \mathbf{D} \sim \mathcal{N}\left( \frac{b}{a}, \frac{1}{a} \right). \)$
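One way to gain confidence in this result is to normalize the unnormalized posterior on a grid and compare its mean and variance with b/a and 1/a. The sketch below does this with a plain Riemann sum; the grid bounds, step count, and sample values are assumptions chosen for illustration.

```python
import math

def grid_posterior_moments(data, mu0, sigma0_sq, sigma_sq, lo, hi, steps=20001):
    """Mean and variance of the posterior, approximated on an evenly spaced grid."""
    h = (hi - lo) / (steps - 1)
    grid = [lo + i * h for i in range(steps)]

    def log_unnormalized(mu):
        # Exponent of prior times likelihood, constants dropped.
        return -0.5 * ((mu - mu0) ** 2 / sigma0_sq
                       + sum((x - mu) ** 2 for x in data) / sigma_sq)

    logs = [log_unnormalized(m) for m in grid]
    top = max(logs)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    mean = sum(m * w for m, w in zip(grid, weights)) / total
    var = sum((m - mean) ** 2 * w for m, w in zip(grid, weights)) / total
    return mean, var

# Hypothetical inputs; the grid must comfortably cover the posterior mass.
data = [4.8, 5.6, 5.1, 4.4, 5.9]
mu0, sigma0_sq, sigma_sq = 0.0, 4.0, 1.0
a = 1 / sigma0_sq + len(data) / sigma_sq
b = mu0 / sigma0_sq + sum(data) / sigma_sq
print(grid_posterior_moments(data, mu0, sigma0_sq, sigma_sq, lo=-5.0, hi=15.0))
print((b / a, 1 / a))
```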
Substitute \((a)\) and \((b)\):
Posterior Mean: $\( \mu_{\text{post}} = \frac{b}{a} = \frac{\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2}. \)$
Posterior Variance: $\( \sigma^2_{\text{post}} = \frac{1}{a} = \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2}. \)$
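The two ways of writing the posterior parameters can be computed side by side. This is a minimal sketch; the function names and the sample inputs are hypothetical.

```python
import math

def posterior_from_precisions(data, mu0, sigma0_sq, sigma_sq):
    """Posterior mean and variance as b/a and 1/a."""
    n = len(data)
    a = 1 / sigma0_sq + n / sigma_sq
    b = mu0 / sigma0_sq + sum(data) / sigma_sq
    return b / a, 1 / a

def posterior_closed_form(data, mu0, sigma0_sq, sigma_sq):
    """Posterior mean and variance from the simplified closed-form expressions."""
    n = len(data)
    xbar = sum(data) / n  # sample mean
    denom = sigma_sq + n * sigma0_sq
    mean = (sigma_sq * mu0 + n * sigma0_sq * xbar) / denom
    var = sigma_sq * sigma0_sq / denom
    return mean, var

# Hypothetical prior and data.
data = [4.8, 5.6, 5.1, 4.4, 5.9]
print(posterior_from_precisions(data, mu0=0.0, sigma0_sq=4.0, sigma_sq=1.0))
print(posterior_closed_form(data, mu0=0.0, sigma0_sq=4.0, sigma_sq=1.0))
```

Both calls should return the same pair up to rounding, since the closed form is obtained by multiplying the numerator and denominator of b/a and 1/a by the product of the two variances.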
5. Resulting Posterior Distribution#
After completing the square, the posterior distribution of \(( \mu )\) is a normal distribution with the following mean and variance:
Mean: $\( \mu_{\text{post}} = \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2} \)$ where \(( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i )\) is the sample mean.
Variance: $\( \sigma^2_{\text{post}} = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} \)$
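Therefore, the posterior distribution of \(( \mu )\) given the data \(( \mathbf{D} )\) is: $\( \mu | \mathbf{D} \sim \mathcal{N}\left( \frac{\sigma^2 \mu_0 + n \sigma_0^2 \bar{x}}{\sigma^2 + n \sigma_0^2}, \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n \sigma_0^2} \right). \)$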
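This result shows that the posterior distribution of \(( \mu )\) is normal, with a mean that is a weighted average of the prior mean \(( \mu_0 )\) and the sample mean \(( \bar{x} )\). The weight on \(( \mu_0 )\) is \(( \sigma^2 / (\sigma^2 + n\sigma_0^2) )\) and the weight on \(( \bar{x} )\) is \(( n\sigma_0^2 / (\sigma^2 + n\sigma_0^2) )\), so the data dominate as the sample size \(( n )\) grows or as the prior variance \(( \sigma_0^2 )\) increases. The posterior precision is the sum of the prior precision and the data precision, \(( 1/\sigma^2_{\text{post}} = 1/\sigma_0^2 + n/\sigma^2 )\), so for \(( n \geq 1 )\) the posterior variance is smaller than both \(( \sigma_0^2 )\) and \(( \sigma^2 / n )\).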
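The weighted-average reading of the posterior mean can be illustrated with a short sketch. The values of the prior mean, sample mean, and variances below are made up, and the function names are illustrative only.

```python
import math

def data_weight(n, sigma0_sq, sigma_sq):
    """Weight placed on the sample mean; the prior mean gets 1 minus this."""
    return n * sigma0_sq / (sigma_sq + n * sigma0_sq)

def posterior_mean(n, xbar, mu0, sigma0_sq, sigma_sq):
    """Posterior mean written explicitly as a weighted average."""
    w = data_weight(n, sigma0_sq, sigma_sq)
    return (1 - w) * mu0 + w * xbar

def posterior_variance(n, sigma0_sq, sigma_sq):
    """Posterior variance sigma^2 * sigma0^2 / (sigma^2 + n * sigma0^2)."""
    return sigma_sq * sigma0_sq / (sigma_sq + n * sigma0_sq)

# Hypothetical setting: prior centred at 0, sample mean fixed at 5.
mu0, xbar, sigma0_sq, sigma_sq = 0.0, 5.0, 4.0, 1.0
for n in (1, 10, 100, 1000):
    print(n,
          data_weight(n, sigma0_sq, sigma_sq),
          posterior_mean(n, xbar, mu0, sigma0_sq, sigma_sq),
          posterior_variance(n, sigma0_sq, sigma_sq))
```

As n increases, the weight on the sample mean approaches 1, the posterior mean moves toward the sample mean, and the posterior variance shrinks toward 0.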