Quantifying Variation
Developing Models for Kalman Filters
Through the series on State Observers, we dealt with hidden random variations in state variables in an entirely qualitative manner. Observer designs were chosen so that the tracking of system outputs looked good. Whatever that means! To some extent, you don't care what it means. You do the cut-and-try analysis, you get an observer that works well enough, and if the results are well within the tolerances you need, you are allowed to stop right there.
However, that situation is not completely satisfying. How can you be sure that you didn't miss something really important? Is there a way to bypass the cut-and-try experimentation, and go directly to a best observer design (whatever that means)? The objectives of the observer design are to make it insensitive to noise (whatever that means) while producing the best tracking of the real system state (ditto). In general, these goals are incompatible.
To quantify all of this, we need a means to represent and analyze variability. In classic Kalman filtering, the tool for doing this is variance.
Review of variance and covariance
For model building, we used a correlation function to identify linear relationships between two data sequences for a specified time separation. The sequences under study were typically input data and output data after a specified delay. Correlation was estimated by multiplying terms from each sequence, pairwise, and averaging the product terms to estimate the statistical expected value.
Given a vector of variables, each of which is zero mean and random, we can select any two of these variables and perform the same kind of correlation analysis on them. For Kalman filters, noise sources are presumed to be white; that is, the correlation with any other term is zero for every time shift other than zero. Consequently, attention can be restricted to zero time shift. Selecting the variable pairs systematically and repeating the analysis, the results can be collected into a matrix, with rows/columns of the matrix indicating the first/second variable selected for analysis. This matrix is called a covariance matrix. We can observe that the termwise products are commutative, and this results in a covariance matrix that is symmetric with respect to the main diagonal.[1]
As with correlation, if the resulting values in the covariance matrix are distinctly positive or distinctly negative, this indicates that there is a linear relationship between the variables at each instant of time. As a special case, if the covariance is calculated for one of the terms and itself, every product term is a square and therefore non-negative. Consequently, the main diagonal terms of the covariance matrix are always positive and tend to dominate. If two random terms are statistically independent, the corresponding covariance matrix terms are zero.
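As a concrete illustration, here is a minimal NumPy sketch of that construction. The data, dimensions, and variable names are invented for the example; the only point is that each matrix entry is an average of pairwise products at zero time shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Invented zero-mean sequences: the first two terms are linearly related,
# the third is independent of the others.
x1 = rng.standard_normal(n_samples)
x2 = 0.8 * x1 + 0.2 * rng.standard_normal(n_samples)
x3 = rng.standard_normal(n_samples)
data = np.vstack([x1, x2, x3])     # one row per variable, one column per time instant

# Each covariance entry is the average of pairwise products at zero time shift.
n_vars = data.shape[0]
cov = np.zeros((n_vars, n_vars))
for i in range(n_vars):
    for j in range(n_vars):
        cov[i, j] = np.mean(data[i, :] * data[j, :])

print(cov)   # symmetric; the (0,1) and (1,0) entries are distinctly positive,
             # while the entries involving the independent third term are near zero
```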
A widely-adopted practice in Kalman Filter applications is to ignore off-diagonal terms in covariance matrices[2].
Important properties of variance
Covariance calculations and rank-one updates
Observe that for column vector x, the rank-one matrix x x^T contains the pairwise products of every term with every other term at a given time instant. Averaging these rank-one matrices over a long sequence provides an alternate formulation for estimating the covariance matrix.
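A small self-contained sketch (again with made-up NumPy data) confirms that averaging the rank-one outer products gives exactly the same matrix as averaging the pairwise products directly:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((3, 10_000))   # zero-mean samples, one column per instant

# Average of the rank-one matrices x x^T, one per time instant.
cov_rank1 = sum(np.outer(x, x) for x in data.T) / data.shape[1]

# Identical to averaging the pairwise products directly.
cov_pairwise = data @ data.T / data.shape[1]
print(np.allclose(cov_rank1, cov_pairwise))   # True
```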
Covariance and expected values
As the number of terms used to estimate variance increases toward infinity, the estimates converge to the statistical expected value, indicated by the notation E( · ).
Covariance and constant vectors
For constant vector terms, the expected values are the same as the averages and the same as the values. For any constant vector x, its covariance is x x^T.
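In the product-average sense used here, that is just the outer product of the vector with itself, as a short sketch with an arbitrary vector shows:

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5])   # arbitrary constant vector
print(np.outer(x, x))            # its covariance in the product-average sense: x x^T
```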
Vector addition and covariance
Variances are additive. If you have a random vector x1 with variance matrix V1, and another, statistically independent random vector x2 with variance matrix V2, the covariance of the sum vector x1 + x2 is the matrix V1 + V2.
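A quick numeric check of this additivity, using made-up covariance matrices V1 and V2 and independently generated samples, looks like this (the sampled estimate only approximates V1 + V2, improving with the number of samples):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two independent zero-mean random vectors with assumed covariances V1 and V2.
V1 = np.array([[2.0, 0.5], [0.5, 1.0]])
V2 = np.array([[1.0, -0.3], [-0.3, 0.5]])
x1 = rng.multivariate_normal([0.0, 0.0], V1, size=n)
x2 = rng.multivariate_normal([0.0, 0.0], V2, size=n)

# Covariance of the sum, estimated by averaging the rank-one products.
s = x1 + x2
cov_sum = s.T @ s / n
print(cov_sum)     # close to V1 + V2
print(V1 + V2)
```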
Covariances under transformations
Suppose that we know that matrix V is a covariance matrix characterizing the relationships between the variables in a vector x. Suppose that we then apply matrix M to transform vector x into some new vector q = M x. Using the rank-one scheme for computing the new variance, we can determine that the covariance of the transformed vector is

cov( q ) = cov( M x ) = E( [M x] [M x]^T ) = M E( x x^T ) M^T = M V M^T
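The same identity can be checked numerically. In this sketch V, M, and the sample count are arbitrary choices; the estimated covariance of q = M x lands close to M V M^T:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

V = np.array([[1.0, 0.4], [0.4, 2.0]])    # assumed covariance of x
M = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, -1.0]])               # transformation q = M x (3x2)

x = rng.multivariate_normal([0.0, 0.0], V, size=n)   # one sample of x per row
q = x @ M.T                                           # apply q = M x to each sample

cov_q = q.T @ q / n
print(cov_q)             # close to M V M^T
print(M @ V @ M.T)
```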
Covariance and dynamic system noise
When we first introduced the dynamic state transition models for linear systems, we reserved the notations w and v to represent random effects. For the N state variables in the dynamic model, there are N terms in the random noise vector w to represent disturbances that directly affect next-state values. These random influences can be characterized by covariance matrix Q. For the M output variables in the observation equation, there will be M corresponding terms in the random noise vector v to represent disturbances that occur during the process of observing the output values. These random influences can be characterized by covariance matrix V.
Since the dynamic equations are linear, the effects on state due to inputs and the effects on state due to randomness can be separated, at least for purposes of analysis. The subsequent cumulative effects on the state variables are also random, and these random effects too can be separated from the driven response effects, and described by a reserved covariance matrix, P. The difference is that this particular covariance, being an unobservable property of hidden variables, is extraordinarily difficult to pin down. Also, because random effects are propagated by state transition matrix operations, the random component persists over time, and this state noise is not white.
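To see why this accumulated randomness is not white, the transformation and additivity properties above can be combined in a quick numeric sketch. This is only an illustration: the state transition matrix A, the noise covariance Q, and the repeated update P ← A P A^T + Q below are assumptions standing in for the next-state behavior described above, not values or equations taken from this series.

```python
import numpy as np

# Made-up 2-state example: A and Q are placeholders, chosen only for illustration.
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])      # state transition matrix
Q = np.diag([0.001, 0.010])       # covariance of the next-state disturbance w

# Covariance P of the accumulated random part of the state, starting from zero.
# Each step applies the transformation rule (A P A^T) plus the additivity rule (+ Q).
P = np.zeros((2, 2))
for step in range(200):
    P = A @ P @ A.T + Q

print(P)   # settles to a non-zero matrix: the state noise carries over from
           # step to step, so it is correlated in time rather than white
```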
Coming up next
We need to follow up on the basic ideas in this installment and examine what they mean for random influences included in the dynamic state transition equations.
[1] Statisticians will be appalled at this tail-wags-the-dog way of describing variance. Variance is considered a fundamental property of statistical distributions, characterizing the "spread" of the distribution, while correlation is something that engineers use to describe a time series. But at the end of the day, they are both just averages of product-terms.
[2] This practice is known as "CReative Assignment of Parameters." The justification usually given is... well, nonexistent. And vexing problems can result, as we will see.