Kalman Filter

Proof of Kalman Filter

\usepackageamsmath\usepackageamssymb

Linear System

The figure below shows a discrete-time linear system.

Discrete-time linear system

k represents the discrete time instants. xk, uk and zk represent the state of the system, input to the system and measurement of the system at time k respectively. The system is governed by the following equations -

xk+1=Fkxk+Gkuk+wk(State Equation)zk+1=Hk+1xk+1+vk+1(Sensor Equation)

wk is the modeling noise of the state and vk is the measurement noise of the sensors. x, z, w and v are random variables, and u, F, G and H are deterministic. The task of filtering is to obtain an estimate of state xk, given observations z1,z2,...,zk. The Kalman filter is a linear unbiased estimator which minimizes the mean squared error. We present the proof of Kalman filter here for a discrete-time linear system.

Notations

We use the following notations:

Zk=z1,z2,...,zkx^k|j=E[xk|Zj]Pk|j=Var[xk|Zj]=E[(xkx^k|j)(xkx^k|j)T|Zj]

Assumptions

We assume the following:

  1. wk and vk are zero-mean white noise.

    E[wk]=0E[vk]=0Cov[wk,wl]=E[wkwlT]={Qkk=l0klCov[vk,vl]=E[vkvlT]={Rkk=l0kl

    Qk and Rk are symmetric positive-definite matrices.

  2. wk and vk are uncorrelated with xk and Zk.

  3. E[x0]=x0|0 and Var[x0]=E[(x0x0|0)(x0x0|0)T]=P0|0 are the given initial state conditions. x0 is uncorrelated with w0 and v0.

Kalman Filter

Kalman filter gives a recursive formula for the estimates in prediction and update steps.

Prediction

x^k+1|k=Fkx^k|k+GkukPk+1|k=FkPk|kFkT+Qk

Update

Kk+1=Pk+1|kHk+1T(Hk+1Pk+1|kHk+1T+Rk+1)1x^k+1|k+1=x^k+1|k+Kk+1(zk+1Hk+1x^k+1|k)Pk+1|k+1=(IKk+1Hk+1)Pk+1|k

The Kalman filter propagates the conditional mean and variance of the state, by expressing x^k+1|k+1 and Pk+1|k+1 in terms of x^k|k and Pk|k.

x^k+1|k is the prior estimate of xk+1 before observing zk+1, and x^k+1|k+1 is the posterior estimate of xk+1 after the measurement zk+1 of the state. Similarly Pk+1|k and Pk+1|k+1 are the prior and posterior estimates of the variance of the state.

Kk+1 is called the Kalman Gain at time k+1, because it weighs the information gained from the measurement zk+1 by taking its difference with the predicted measurement Hk+1x^k+1|k.

We prove three lemmas in the next section, two of which involve taking the gradient of trace operations and the other concerns with uncorrelated random variables. We use those to prove the Kalman Filter equations in section.

Lemmas

Lemma 1

E[(X+Y)(X+Y)T]=E[XXT]+E[YYT], if X and Y are uncorrelated, and one of E[X] or E[Y] is 0.

Proof

Proof

X and Y are uncorrelated, therefore E[XYT]=E[X]E[YT].
One of E[X] or E[Y] equals 0, therefore E[XYT]=0.

Similarly, E[YXT]=0.

Expanding E[(X+Y)(X+Y)T],

E[(X+Y)(X+Y)T]=E[XXT]+E[YYT]+E[XYT]+E[YXT]=E[XXT]+E[YYT](E[XYT]=E[YXT]=0)

Therefore E[(X+Y)(X+Y)T]=E[XXT]+E[YYT].

Lemma 2

ATr(AB)=BT
Proof

Proof

Let A=[a]n×p and B=[b]p×n.

Tr(AB)=k=1n(AB)kk=k=1nl=1paklblk

Taking gradient with respect to aij,

aijTr(AB)=aijk=1nl=1paklblk=k=1nl=1paijaklblk=aijaijbji=bji

Therefore ATr(AB)=BT.

Lemma 3

ATr(ABAT)=A(B+BT)=B=BT2AB
Proof

Proof

Let A=[a]n×p and B=[b]p×p.

Tr(ABAT)=k=1n(ABAT)kk=k=1nl=1pakl(BAT)lk=k=1nl=1paklr=1pblr(AT)rk=k=1nl=1pr=1paklblrakr

Taking gradient with respect to aij,

aijTr(ABAT)=aijk=1nl=1pr=1paklblrakr=k=1nl=1pr=1paijaklblrakr=l=1pr=1paijailblrair=aijaij2bjj+l=1,ljpaijailbljaij+r=1,rjpaijaijbjrair=2aijbij+l=1,ljpailblj+r=1,rjpbjrair=l=1pailblj+r=1pairbjr

Therefore,

ATr(ABAT)=AB+ABT=2ABif B=BT

Proof

We prove the prediction and update equations of Kalman Filter.

Let us rewrite the equations governing the discrete-time linear system.

(1)xk+1=Fkxk+Gkuk+wk(State Equation)(2)zk+1=Hk+1xk+1+vk+1(Sensor Equation)

We find x^k+1|k using the Kalman Filter state equation.

x^k+1|k=E[xk+1|Zk]=E[Fkxk+Gkuk+wk|Zk](using Eq 1)=FkE[xk|Zk]+Gkuk+E[wk|Zk](uk is not random)=FkE[xk|Zk]+Gkuk(E[wk]=0)(3)=Fkx^k|k+Gkuk(definition of x^k|k)

We substitute x^k+1|k in the definition of Pk+1|k using equation 3.

Pk+1|k=E[(xk+1x^k+1|k)(xk+1x^k+1|k)T|Zk]=E[(Fkxk+Gkuk+wkFkx^k|kGkuk)(Fkxk+Gkuk+wkFkx^k|kGkuk)T|Zk](substituting xk+1 and x^k+1|k using Eq 1 and Eq 3 resp.)=E[(Fk(xkx^k|k)+wk)(Fk(xkx^k|k)+wk)T|Zk]

From definition, we know that x^k|k=g(Zk) for some function g, and wk is uncorrelated with Zk and xk.

Therefore wk is uncorrelated with Fk(xkx^k|k). E[wk]=0, and we can use Lemma 1.

Pk+1|k=E[(Fk(xkx^k|k)+wk)(Fk(xkx^k|k)+wk)T|Zk]=FkE[(xkx^k|k)(xkx^k|k)T|Zk]FkT+E[wkwkT|Zk](using Lemma 1)=FkPk|kFkT+Qk(definition of Pk|k and Qk)

Kalman filter is an unbiased linear optimal estimator. This implies the following:

  1. Unbiased - The expected value of an unbiased estimator equals the expected value of the parameter. Therefore, E[x^k+1|k+1]=E[xk+1] and E[x^k|k]=E[xk].

  2. Linear - A linear estimator means it is a linear combination of the observations, which in this case is Zk+1.

    x^k+1|k is a function of Zk. Therefore let x^k+1|k+1=Kk+1x^k+1|k+Kk+1zk+1, for some matrix Kk+1 and Kk+1.

  3. Optimal - The estimator minimizes the mean squared error. Therefore, x^k+1|k+1=argminE[(xk+1x^k+1|k+1)T(xk+1x^k+1|k+1)|Zk+1].

We use these three conditions to find Kk+1 and Kk+1.

E[xk+1]=E[x^k+1|k+1](x^k+1|k+1 is unbiased)=E[Kk+1x^k+1|k+Kk+1zk+1](x^k+1|k+1 is a linear estimator)=Kk+1E[Fkx^k|k+Gkuk]+Kk+1E[Hk+1xk+1+vk+1](substituting zk+1 and x^k+1|k using Eq 2 and Eq 3 resp.)=Kk+1(FkE[x^k|k]+Gkuk)+Kk+1Hk+1E[xk+1](E[vk+1]=0)=Kk+1(FkE[xk]+Gkuk)+Kk+1Hk+1E[xk+1](x^k|k is unbiased)=Kk+1E[Fkxk+Gkuk+wk]+Kk+1Hk+1E[xk+1](E[wk]=0)=Kk+1E[xk+1]+Kk+1Hk+1E[xk+1](using Eq 1)=(Kk+1+Kk+1Hk+1)E[xk+1]I=Kk+1+Kk+1Hk+1(4)Kk+1=IKk+1Hk+1

Substituting Kk+1 in the expression of x^k+1|k+1,

x^k+1|k+1=Kk+1x^k+1|k+Kk+1zk+1=(IKk+1Hk+1)x^k+1|k+Kk+1zk+1(using Eq 4)(5)=x^k+1|k+Kk+1(zk+1Hk+1x^k+1|k)

We substitute x^k+1|k+1 in the definition of Pk+1|k+1 using equation 5,

Pk+1|k+1=E[(xk+1x^k+1|k+1)(xk+1x^k+1|k+1)T|Zk+1]=E[(xk+1x^k+1|kKk+1(zk+1Hk+1x^k+1|k))(xk+1x^k+1|kKk+1(zk+1Hk+1x^k+1|k))T|Zk+1](substituting xk+1 using Eq 5)=E[(xk+1x^k+1|kKk+1(Hk+1xk+1+vk+1Hk+1x^k+1|k))(xk+1x^k+1|kKk+1(Hk+1xk+1+vk+1Hk+1x^k+1|k))T|Zk+1](substituting zk+1 using Eq 2)=E[((IKk+1Hk+1)(xk+1x^k+1|k)Kk+1vk+1)((IKk+1Hk+1)(xk+1x^k+1|k)Kk+1vk+1)T|Zk+1]

x^k+1|k+1=g(Zk+1) for some function g, and vk+1 is uncorrelated with Zk+1 and xk+1. Therefore Kk+1wk+1 is uncorrelated with (IKk+1Hk+1)(xk+1x^k+1|k). E[vk+1]=0. Therefore we can use Lemma 1.

Pk+1|k+1=E[((IKk+1Hk+1)(xk+1x^k+1|k)Kk+1vk+1)((IKk+1Hk+1)(xk+1x^k+1|k)Kk+1vk+1)T|Zk+1]=(IKk+1Hk+1)E[(xk+1x^k+1|k)(xk+1x^k+1|k)T|Zk+1]=(IKk+1Hk+1)T+Kk+1E[vk+1vk+1T|Zk+1]Kk+1T(using Lemma 1)=(IKk+1Hk+1)Pk+1|k(IKk+1Hk+1)T+Kk+1Rk+1Kk+1T(6)(definition of Pk+1|k and Rk+1)

We find Kk+1 by minimizing the conditional mean squared error of x^k+1|k+1, L=E[(xk+1x^k+1|k+1)T(xk+1x^k+1|k+1)|Zk+1].

We express L as a trace of Pk+1|k+1, and use equation 6 to write it in terms of Kk+1. We then minimize L and find Kk+1.

L=E[(xk+1x^k+1|k+1)T(xk+1x^k+1|k+1)|Zk+1]=Tr(E[(xk+1x^k+1|k+1)(xk+1x^k+1|k+1)T|Zk+1])=Tr(Pk+1|k+1)(definition of Pk+1|k+1)=Tr((IKk+1Hk+1)Pk+1|k(IKk+1Hk+1)T+Kk+1Rk+1KT)(using Eq 6)=Tr(Pk+1|kPk+1|kHk+1TKk+1TKk+1Hk+1Pk+1|k+Kk+1Hk+1Pk+1|kHk+1TKk+1T+Kk+1Rk+1Kk+1T)=Tr(Pk+1|k)Tr(Pk+1|kHk+1TKk+1T)Tr(Kk+1Hk+1Pk+1|k)+Tr(Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T)(Tr is a linear operator)=Tr(Pk+1|k)Tr(Kk+1Hk+1Pk+1|k)Tr(Kk+1Hk+1Pk+1|k)+Tr(Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T)(Tr(A)=Tr(AT))=Tr(Pk+1|k)2Tr(Kk+1Hk+1Pk+1|k)(7)+Tr(Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T)

Taking the gradient of L with respect to Kk+1 and setting it equal to 0,

Kk+1L=00=Kk+1(Tr(Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T)2Tr(Kk+1Hk+1Pk+1|k)+Tr(Pk+1|k))(using Eq 7)=Kk+1(Tr(Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T)2Tr(Kk+1Hk+1Pk+1|k))=2Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)2Pk+1|kHk+1T(using Lemma 2 and 3)(8)Kk+1=Pk+1|kHk+1T(Hk+1Pk+1|kHk+1T+Rk+1)1

We can simplify the expression of Pk+1|k+1 in equation 6 using equation 8.

Pk+1|k+1=(IKk+1Hk+1)Pk+1|k(IKk+1Hk+1)T+Kk+1Rk+1Kk+1T=Pk+1|kPk+1|kHk+1TKk+1TKk+1Hk+1Pk+1|k+Kk+1Hk+1Pk+1|kHk+1TKk+1T+Kk+1Rk+1Kk+1T=Pk+1|kPk+1|kHk+1TKk+1TKk+1Hk+1Pk+1|k+Kk+1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T=Pk+1|kPk+1|kHk+1TKk+1TKk+1Hk+1Pk+1|k+Pk+1|kHk+1T(Hk+1Pk+1|kHk+1T+Rk+1)1(Hk+1Pk+1|kHk+1T+Rk+1)Kk+1T(substituing Kk+1 using Eq 8)=Pk+1|kPk+1|kHk+1TKk+1TKk+1Hk+1Pk+1|k+Pk+1|kHk+1TKk+1T=Pk+1|kKk+1Hk+1Pk+1|k=(IKk+1Hk+1)Pk+1|k

Arranging all results together,

x^k+1|k=Fkx^k|k+GkukPk+1|k=FkPk|kFkT+QkKk+1=Pk+1|kHk+1T(Hk+1Pk+1|kHk+1T+Rk+1)1x^k+1|k+1=x^k+1|k+Kk+1(zk+1Hk+1x^k+1|k)Pk+1|k+1=(IKk+1Hk+1)Pk+1|k

Footnotes