State Space Models - Kalman Filters#

Kalman filters solve the filtering problem in the special case of linear transitions and emissions with Gaussian noise. In the notation of the previous lecture, the model can be expressed either as equations:

  • Initial conditions:

\[ \mathbf{x}_0 = \boldsymbol{\mu}_0 + \mathbf{z}_0, \]

where \(\boldsymbol{\mu}_0\) is a parameter and \(\mathbf{z}_0\sim N(\mathbf{0},\mathbf{V}_0)\) with \(\mathbf{V}_0\) an appropriately sized covariance matrix.

  • Transitions:

\[ \mathbf{x}_{t+1} = \mathbf{A}\mathbf{x}_t + \mathbf{B}\mathbf{u}_t + \mathbf{z}_t, \]

where \(\mathbf{A}\) and \(\mathbf{B}\) are matrices and \(\mathbf{z}_t\sim N(\mathbf{0},\mathbf{Q})\) with \(\mathbf{Q}\) an appropriately sized covariance matrix (known as process covariance).

  • Emissions:

\[ \mathbf{y}_t = \mathbf{C}\mathbf{x}_t + \mathbf{w}_t, \]

where \(\mathbf{C}\) is a matrix and \(\mathbf{w}_t\sim N(\mathbf{0},\mathbf{R})\) with \(\mathbf{R}\) an appropriately sized covariance matrix (known as measurement covariance).

or as probabilities:

  • Initial conditions:

\[ p(\mathbf{x}_0) = N(\mathbf{x}_0|\boldsymbol{\mu}_0,\mathbf{V}_0), \]
  • Transitions:

\[ p(\mathbf{x}_{t+1}|\mathbf{x}_t,\mathbf{u}_t) = N(\mathbf{x}_{t+1}|\mathbf{A}\mathbf{x}_t+\mathbf{B}\mathbf{u}_t,\mathbf{Q}). \]
  • Emissions:

\[ p(\mathbf{y}_{t}|\mathbf{x}_t) = N(\mathbf{y}_t|\mathbf{C}\mathbf{x}_t,\mathbf{R}). \]
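To make this concrete, here is a minimal sketch (in Python with NumPy) of how we could encode such a model. The example is entirely made up for illustration: a one-dimensional object whose state collects position and velocity, driven by an acceleration control, with noisy position measurements.

```python
import numpy as np

dt = 0.1  # time step between measurements (made-up value)

# Transition matrix A: constant-velocity dynamics for the state x = [position, velocity].
A = np.array([[1.0, dt],
              [0.0, 1.0]])

# Control matrix B: an acceleration input u pushes both position and velocity.
B = np.array([[0.5 * dt ** 2],
              [dt]])

# Emission matrix C: we only observe the position.
C = np.array([[1.0, 0.0]])

# Process covariance Q and measurement covariance R (made-up values).
Q = 1e-3 * np.eye(2)
R = np.array([[0.05]])

# Initial state distribution N(mu0, V0).
mu0 = np.array([0.0, 0.0])
V0 = np.eye(2)
```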

The complete details of the Kalman filter are somewhat involved. If you want to dive deeper into the technical details, I suggest you read Section 13.3 of Bishop (2006). Another excellent, albeit rather extensive, resource is the open-source book Kalman and Bayesian Filters in Python, in particular Chapter 7.

The Kalman filter algorithm#

The Kalman filter algorithm is a recursive algorithm that computes the posterior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t}, \mathbf{u}_{1:t})\). It is composed of two steps:

  • Prediction step: This step computes the prior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\).

  • Update step: This step computes the posterior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t}, \mathbf{u}_{1:t})\).

We will now go through each of these steps in detail.

Prediction step#

Suppose we have computed the posterior distribution \(p(\mathbf{x}_{t-1}|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t-1})\). It turns out that this distribution is a Gaussian distribution with mean \(\boldsymbol{\mu}_{t-1}\) and covariance matrix \(\mathbf{V}_{t-1}\). We will see how we can compute these parameters recursively in the update step. For now assume we have them. So:

\[ p(\mathbf{x}_{t-1}|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t-1}) = N(\mathbf{x}_{t-1}|\boldsymbol{\mu}_{t-1},\mathbf{V}_{t-1}). \]

We can now compute the prior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\) by marginalizing over \(\mathbf{x}_{t-1}\):

\[\begin{split} \begin{align} p(\mathbf{x}_t|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t}) &= \int p(\mathbf{x}_t|\mathbf{x}_{t-1},\mathbf{u}_t)p(\mathbf{x}_{t-1}|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t-1})d\mathbf{x}_{t-1}\\ &= \int N(\mathbf{x}_t|\mathbf{A}\mathbf{x}_{t-1}+\mathbf{B}\mathbf{u}_t,\mathbf{Q})N(\mathbf{x}_{t-1}|\boldsymbol{\mu}_{t-1},\mathbf{V}_{t-1})d\mathbf{x}_{t-1}\\ &= N(\mathbf{x}_t|\mathbf{A}\boldsymbol{\mu}_{t-1}+\mathbf{B}\mathbf{u}_t,\mathbf{Q}+\mathbf{A}\mathbf{V}_{t-1}\mathbf{A}^T). \end{align} \end{split}\]

In the last step we have used the standard marginalization property of Gaussian distributions under linear transformations: integrating a Gaussian conditional against a Gaussian yields another Gaussian. We write the mean and covariance matrix of the prior distribution as:

\[\begin{split} \begin{align} \boldsymbol{\mu}^p_t &= \mathbf{A}\boldsymbol{\mu}_{t-1}+\mathbf{B}\mathbf{u}_t\\ \mathbf{V}^p_t &= \mathbf{Q}+\mathbf{A}\mathbf{V}_{t-1}\mathbf{A}^T. \end{align} \end{split}\]

We have used a superscript \(p\) to denote that these are computed in the prediction step.
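In code, the prediction step is just these two formulas. Here is a minimal sketch that reuses the example matrices defined above; `mu` and `V` stand for the filtering mean \(\boldsymbol{\mu}_{t-1}\) and covariance \(\mathbf{V}_{t-1}\):

```python
def kalman_predict(mu, V, u, A, B, Q):
    """Prediction step: push the filtering distribution through the dynamics."""
    mu_p = A @ mu + B @ u    # prior mean: A mu_{t-1} + B u_t
    V_p = Q + A @ V @ A.T    # prior covariance: Q + A V_{t-1} A^T
    return mu_p, V_p
```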

Update step#

Now suppose we have computed the prior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\) during the prediction step. It is a Gaussian distribution with mean \(\boldsymbol{\mu}^p_t\) and covariance matrix \(\mathbf{V}^p_t\). We want to condition on a new measurement \(\mathbf{y}_t\). We apply Bayes’ rule:

\[\begin{split} \begin{align} p(\mathbf{x}_t|\mathbf{y}_{1:t}, \mathbf{u}_{1:t}) &\propto p(\mathbf{y}_t|\mathbf{x}_t)p(\mathbf{x}_t|\mathbf{y}_{1:t-1}, \mathbf{u}_{1:t})\\ &= N(\mathbf{y}_t|\mathbf{C}\mathbf{x}_t,\mathbf{R})N(\mathbf{x}_t|\boldsymbol{\mu}^p_t,\mathbf{V}^p_t). \end{align} \end{split}\]

Again we have a product of two Gaussian densities, viewed as a function of \(\mathbf{x}_t\), so the resulting posterior is also Gaussian. Its mean and covariance matrix are given by:

\[\begin{split} \begin{align} \boldsymbol{\mu}_t &= \boldsymbol{\mu}^p_t + \mathbf{K}_t(\mathbf{y}_t-\mathbf{C}\boldsymbol{\mu}^p_t)\\ \mathbf{V}_t &= \mathbf{V}^p_t - \mathbf{K}_t\mathbf{C}\mathbf{V}^p_t, \end{align} \end{split}\]

where \(\mathbf{K}_t\) is the Kalman gain matrix:

\[ \mathbf{K}_t = \mathbf{V}^p_t\mathbf{C}^T(\mathbf{C}\mathbf{V}^p_t\mathbf{C}^T+\mathbf{R})^{-1}. \]
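The update step is equally short in code. Here is a sketch; it uses `np.linalg.solve` instead of forming the inverse explicitly, which is numerically preferable but otherwise identical to the formulas above:

```python
def kalman_update(mu_p, V_p, y, C, R):
    """Update step: condition the prior N(mu_p, V_p) on the measurement y."""
    S = C @ V_p @ C.T + R              # innovation covariance C V_p C^T + R
    K = np.linalg.solve(S, C @ V_p).T  # Kalman gain K = V_p C^T S^{-1}
    mu = mu_p + K @ (y - C @ mu_p)     # posterior mean
    V = V_p - K @ C @ V_p              # posterior covariance
    return mu, V
```

Alternating the two functions over the data gives the full filter. For instance, with hypothetical arrays `ys` of measurements and `us` of controls (one row per time step), we could run:

```python
mu, V = mu0, V0
means, covs, prior_means, prior_covs = [], [], [], []
for y, u in zip(ys, us):
    mu_p, V_p = kalman_predict(mu, V, u, A, B, Q)
    mu, V = kalman_update(mu_p, V_p, y, C, R)
    prior_means.append(mu_p)
    prior_covs.append(V_p)
    means.append(mu)
    covs.append(V)
```

We also store the prediction-step quantities, because the smoother discussed below needs them.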

Smoothing#

The Kalman filter algorithm computes the posterior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t}, \mathbf{u}_{1:t})\). This is the distribution of the state at time \(t\) given all measurements and control inputs up to time \(t\). This is useful for many applications. However, sometimes we want to compute the posterior distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:T}, \mathbf{u}_{1:T})\). This is the distribution of the state at time \(t\) given all measurements and control inputs up to time \(T\). This is called the smoothing problem. The difference is that we use information from the future to improve our estimate of the state at time \(t\).

The smoothing problem can be solved by first computing the filtering distribution \(p(\mathbf{x}_t|\mathbf{y}_{1:t}, \mathbf{u}_{1:t})\) for all \(t\). This can be done using the Kalman filter algorithm. Once we are done with the Kalman filter, we have the desired distribution for the last time step \(p(\mathbf{x}_T|\mathbf{y}_{1:T}, \mathbf{u}_{1:T})\). It is a Gaussian distribution with mean \(\boldsymbol{\mu}_T\) and covariance matrix \(\mathbf{V}_T\). How do we get the distribution for the previous time step \(p(\mathbf{x}_{T-1}|\mathbf{y}_{1:T}, \mathbf{u}_{1:T})\)?

The joint distribution of \(\mathbf{x}_{T-1}\) and \(\mathbf{x}_T\) is given by:

\[ p(\mathbf{x}_{T-1},\mathbf{x}_T|\mathbf{y}_{1:T}, \mathbf{u}_{1:T}) = p(\mathbf{x}_{T-1}|\mathbf{x}_T,\mathbf{y}_{1:T}, \mathbf{u}_{1:T})p(\mathbf{x}_T|\mathbf{y}_{1:T}, \mathbf{u}_{1:T}). \]

We already have the right-most term from the Kalman filter. Let's just worry about the first term on the right-hand side. Think about it: the measurement \(\mathbf{y}_T\) depends only on \(\mathbf{x}_T\), so once we condition on \(\mathbf{x}_T\) it tells us nothing new about \(\mathbf{x}_{T-1}\) and we can drop it. We have:

\[ p(\mathbf{x}_{T-1}|\mathbf{x}_T,\mathbf{y}_{1:T}, \mathbf{u}_{1:T}) = p(\mathbf{x}_{T-1}|\mathbf{x}_T,\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T}). \]

Okay. Now we need to make the transition probability appear. Let’s use Bayes’ rule:

\[ p(\mathbf{x}_{T-1}|\mathbf{x}_T,\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T}) = \frac{p(\mathbf{x}_T|\mathbf{x}_{T-1},\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T})p(\mathbf{x}_{T-1}|\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T})}{p(\mathbf{x}_T|\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T})}. \]

This is starting to look good. The denominator is the prediction step of the Kalman filter. It is:

\[ p(\mathbf{x}_T|\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T}) = N(\mathbf{x}_T|\boldsymbol{\mu}^p_T,\mathbf{V}^p_T). \]

The second term in the numerator is the filtering distribution at time \(T-1\); the future control \(\mathbf{u}_T\) carries no information about \(\mathbf{x}_{T-1}\), so:

\[ p(\mathbf{x}_{T-1}|\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T}) = p(\mathbf{x}_{T-1}|\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T-1}) = N(\mathbf{x}_{T-1}|\boldsymbol{\mu}_{T-1},\mathbf{V}_{T-1}). \]

For the first term in the numerator, we use the Markov property. The state at time \(T\) only depends on the state at time \(T-1\) and the control at time \(T\). So:

\[ p(\mathbf{x}_T|\mathbf{x}_{T-1},\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T}) = p(\mathbf{x}_T|\mathbf{x}_{T-1},\mathbf{u}_T) = N(\mathbf{x}_T|\mathbf{A}\mathbf{x}_{T-1}+\mathbf{B}\mathbf{u}_T,\mathbf{Q}). \]

Let’s put everything together:

\[\begin{split} \begin{align*} p(\mathbf{x}_{T-1},\mathbf{x}_T|\mathbf{y}_{1:T}, \mathbf{u}_{1:T}) &= p(\mathbf{x}_{T-1}|\mathbf{x}_T,\mathbf{y}_{1:T-1}, \mathbf{u}_{1:T})p(\mathbf{x}_T|\mathbf{y}_{1:T}, \mathbf{u}_{1:T})\\ &= \frac{N(\mathbf{x}_T|\mathbf{A}\mathbf{x}_{T-1}+\mathbf{B}\mathbf{u}_T,\mathbf{Q})N(\mathbf{x}_{T-1}|\boldsymbol{\mu}_{T-1},\mathbf{V}_{T-1})}{N(\mathbf{x}_T|\boldsymbol{\mu}^p_T,\mathbf{V}^p_T)}N(\mathbf{x}_T|\boldsymbol{\mu}_T,\mathbf{V}_T). \end{align*} \end{split}\]

From this point on, it is just boring algebra. You have to show that the joint is a Gaussian and then you need to integrate out \(\mathbf{x}_T\) to get the distribution of \(\mathbf{x}_{T-1}\). The result is a Gaussian with mean \(\boldsymbol{\mu}_{T-1|T}\) and covariance matrix \(\mathbf{V}_{T-1|T}\):

\[\begin{split} \begin{align*} \boldsymbol{\mu}_{T-1|T} &= \boldsymbol{\mu}_{T-1} + \mathbf{J}_T(\boldsymbol{\mu}_T-\boldsymbol{\mu}^p_T)\\ \mathbf{V}_{T-1|T} &= \mathbf{V}_{T-1} + \mathbf{J}_T(\mathbf{V}_T-\mathbf{V}^p_T)\mathbf{J}_T^T, \end{align*} \end{split}\]

where \(\mathbf{J}_T\) is the smoother gain matrix:

\[ \mathbf{J}_T = \mathbf{V}_{T-1}\mathbf{A}^T(\mathbf{V}^p_T)^{-1}. \]

You then keep going back in time, applying the same formulas at each step \(t\) with the smoothed mean and covariance of step \(t+1\) in place of \(\boldsymbol{\mu}_T\) and \(\mathbf{V}_T\), until you have the smoothed distribution for all time steps.
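Putting the backward recursion in code, here is a minimal sketch of the resulting smoother (known as the Rauch-Tung-Striebel smoother). It assumes we stored the filtered and prediction-step quantities at every step, as in the filtering loop above:

```python
def rts_smoother(means, covs, prior_means, prior_covs, A):
    """Backward smoothing pass over stored Kalman filter quantities.

    means[t], covs[t]             : filtering mean/covariance at step t
    prior_means[t], prior_covs[t] : prediction-step mean/covariance at step t
    """
    T = len(means)
    # At the last step the smoothing and filtering distributions coincide.
    smoothed_means = list(means)
    smoothed_covs = list(covs)
    for t in range(T - 2, -1, -1):
        # Smoother gain J = V_t A^T (V^p_{t+1})^{-1}.
        J = covs[t] @ A.T @ np.linalg.inv(prior_covs[t + 1])
        smoothed_means[t] = means[t] + J @ (smoothed_means[t + 1] - prior_means[t + 1])
        smoothed_covs[t] = covs[t] + J @ (smoothed_covs[t + 1] - prior_covs[t + 1]) @ J.T
    return smoothed_means, smoothed_covs
```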