Quantifying Variation
Developing Models for Kalman Filters
Through the series on State Observers, we dealt with hidden random variations in state variables in an entirely qualitative manner. Observer designs were chosen so that the tracking of system outputs looked good. Whatever that means! To some extent, you don't care what it means. You do the cut-and-try analysis, you get an observer that works well enough, and if the results are well within the tolerances you need, you are allowed to stop right there.
However, that situation is not completely satisfying. How can you be sure that you didn't miss something really important? Is there a way to bypass the cut-and-try experimentation, and go directly to a best observer design (whatever that means)? The objectives of the observer design are to make it insensitive to noise (whatever that means) while producing the best tracking of the real system state (ditto). In general, these goals are incompatible.
To quantify all of this, we need a means to represent and analyze variability. In classic Kalman filtering, the tool for doing this is variance.
Review of variance and covariance
For model building, we used a correlation function to identify linear relationships between two data sequences for a specified time separation. The sequences under study were typically input data and output data after a specified delay. Correlation was estimated by multiplying terms from each sequence, pairwise, and averaging the product terms to estimate the statistical expected value.
Given a vector of variables, each of which is zero mean and random, we can select any two of these variables and perform the same kind of correlation analysis on them. For Kalman filters, noise sources are presumed to be white; that is, the correlation with any other term is zero for every time shift other than zero. Consequently, attention can be restricted to zero time shift. Selecting the variable pairs systematically and repeating the analysis, the results can be collected into a matrix, with rows/columns of the matrix indicating the first/second variable selected for analysis. This matrix is called a covariance matrix. We can observe that the termwise products are commutative, and this results in a covariance matrix that is symmetric with respect to the main diagonal.[1]
As with correlation, if the resulting values in the covariance matrix are distinctly positive or distinctly negative, this indicates that there is a linear relationship between the variables at each instant of time. As a special case, if the covariance is calculated for one of the terms and itself, every product term is a square and therefore non-negative. Consequently, the main diagonal terms of the covariance matrix are always positive and tend to dominate. If two random terms are statistically independent, the corresponding covariance matrix terms are zero.
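As a concrete illustration, here is a minimal NumPy sketch of that construction. The data, dimensions, and variable names are invented for the example; the only point is that each matrix entry is an average of pairwise products at zero time shift.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Invented zero-mean sequences: the first two terms are linearly related,
# the third is independent of the others.
x1 = rng.standard_normal(n_samples)
x2 = 0.8 * x1 + 0.2 * rng.standard_normal(n_samples)
x3 = rng.standard_normal(n_samples)
data = np.vstack([x1, x2, x3])     # one row per variable, one column per time instant

# Each covariance entry is the average of pairwise products at zero time shift.
n_vars = data.shape[0]
cov = np.zeros((n_vars, n_vars))
for i in range(n_vars):
    for j in range(n_vars):
        cov[i, j] = np.mean(data[i, :] * data[j, :])

print(cov)   # symmetric; the (0,1) and (1,0) entries are distinctly positive,
             # while the entries involving the independent third term are near zero
```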
A widely-adopted practice in Kalman Filter applications is to ignore off-diagonal terms in covariance matrices[2].
Important properties of variance
Covariance calculations and rank-one updates
Observe that for column vector x, the rank-one matrix x x^T contains the pairwise products of every term with every other term at a given time instant. Averaging these rank-one matrices over a long sequence provides an alternate formulation for estimating the covariance matrix.
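A small self-contained sketch (again with made-up NumPy data) confirms that averaging the rank-one outer products gives exactly the same matrix as averaging the pairwise products directly:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((3, 10_000))   # zero-mean samples, one column per instant

# Average of the rank-one matrices x x^T, one per time instant.
cov_rank1 = sum(np.outer(x, x) for x in data.T) / data.shape[1]

# Identical to averaging the pairwise products directly.
cov_pairwise = data @ data.T / data.shape[1]
print(np.allclose(cov_rank1, cov_pairwise))   # True
```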
Covariance and expected values
As the number of terms used to estimate variance increases toward infinity, the estimates converge to the statistical expected value, indicated by the notation E( · ).
Covariance and constant vectors
For constant vector terms, the expected values are the same as the averages and the same as the values. For any constant vector x, its covariance is x x^T.
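In the product-average sense used here, that is just the outer product of the vector with itself, as a short sketch with an arbitrary vector shows:

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5])   # arbitrary constant vector
print(np.outer(x, x))            # its covariance in the product-average sense: x x^T
```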
Vector addition and covariance
Variances are additive. If you have a random vector x1 with variance matrix V1, and another, statistically independent random vector x2 with variance matrix V2, the covariance of the sum vector x1 + x2 is the matrix V1 + V2.
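A quick numeric check of this additivity, using made-up covariance matrices V1 and V2 and independently generated samples, looks like this (the sampled estimate only approximates V1 + V2, improving with the number of samples):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Two independent zero-mean random vectors with assumed covariances V1 and V2.
V1 = np.array([[2.0, 0.5], [0.5, 1.0]])
V2 = np.array([[1.0, -0.3], [-0.3, 0.5]])
x1 = rng.multivariate_normal([0.0, 0.0], V1, size=n)
x2 = rng.multivariate_normal([0.0, 0.0], V2, size=n)

# Covariance of the sum, estimated by averaging the rank-one products.
s = x1 + x2
cov_sum = s.T @ s / n
print(cov_sum)     # close to V1 + V2
print(V1 + V2)
```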
Covariances under transformations
Suppose that we know that matrix V is a covariance matrix characterizing the relationships between the variables in a vector x. Suppose that we then apply matrix M to transform vector x into some new vector q = M x. Using the rank-one scheme for computing the new variance, we can determine that the covariance of the transformed vector is

cov( q ) = cov( M x ) = E( [M x] [M x]^T ) = M E( x x^T ) M^T = M V M^T
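The same identity can be checked numerically. In this sketch V, M, and the sample count are arbitrary choices; the estimated covariance of q = M x lands close to M V M^T:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

V = np.array([[1.0, 0.4], [0.4, 2.0]])    # assumed covariance of x
M = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [2.0, -1.0]])               # transformation q = M x (3x2)

x = rng.multivariate_normal([0.0, 0.0], V, size=n)   # one sample of x per row
q = x @ M.T                                           # apply q = M x to each sample

cov_q = q.T @ q / n
print(cov_q)             # close to M V M^T
print(M @ V @ M.T)
```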
Covariance and dynamic system noise
When we first introduced the dynamic state transition models for linear systems, we reserved the notations w and v to represent random effects. For the N state variables in the dynamic model, there are N terms in the random noise vector w to represent disturbances that directly affect next-state values. These random influences can be characterized by covariance matrix Q. For the M output variables in the observation equation, there will be M corresponding terms in the random noise vector v to represent disturbances that occur during the process of observing the output values. These random influences can be characterized by covariance matrix V.
Since the dynamic equations are linear, the effects on state due to inputs and the effects on state due to randomness can be separated, at least for purposes of analysis. The subsequent cumulative effects on the state variables are also random, and these random effects too can be separated from the driven response effects, and described by a reserved covariance matrix, P. The difference is that this particular covariance, being an unobservable property of hidden variables, is extraordinarily difficult to pin down. Also, because random effects are propagated by state transition matrix operations, the random component persists over time, and this state noise is not white.
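To see why this accumulated randomness is not white, the transformation and additivity properties above can be combined in a quick numeric sketch. This is only an illustration: the state transition matrix A, the noise covariance Q, and the repeated update P ← A P A^T + Q below are assumptions standing in for the next-state behavior described above, not values or equations taken from this series.

```python
import numpy as np

# Made-up 2-state example: A and Q are placeholders, chosen only for illustration.
A = np.array([[0.95, 0.10],
              [0.00, 0.90]])      # state transition matrix
Q = np.diag([0.001, 0.010])       # covariance of the next-state disturbance w

# Covariance P of the accumulated random part of the state, starting from zero.
# Each step applies the transformation rule (A P A^T) plus the additivity rule (+ Q).
P = np.zeros((2, 2))
for step in range(200):
    P = A @ P @ A.T + Q

print(P)   # settles to a non-zero matrix: the state noise carries over from
           # step to step, so it is correlated in time rather than white
```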
Coming up next
We need to follow up on the basic ideas in this installment and examine what they mean for random influences included in the dynamic state transition equations.
[1] Statisticians will be appalled at this tail-wags-the-dog way of describing variance. Variance is considered a fundamental property of statistical distributions, characterizing the "spread" of the distribution, while correlation is something that engineers use to describe a time series. But at the end of the day, they are both just averages of product-terms.
[2] This practice is known as "CReative Assignment of Parameters." The justification usually given is... well, nonexistent. And vexing problems can result, as we will see.