Restructuring State Models
Developing Models for Kalman Filters
In earlier installments, we mentioned that state transition models are not unique. While this sometimes causes problems, it also provides some extra degrees of freedom that can be used to restructure the state transition model without changing any external results.
General state transition transformations
Start with the original state transition equation system. Select an arbitrary nonsingular square matrix Q of the same size as the state transition matrix. By definition, because it is nonsingular, an inverse exists. Therefore, we can use this Q matrix as a transformation operator that maps every possible state vector x into a new vector q, or we can invert this mapping to restore the original x vector.
Using these new notations, we can replace references to the original x state vector in the state transition equations. Since the Q matrix is nonsingular, we lose no information multiplying both sides of the transition equations by Q.
Now we can define some new names that simplify the notation.
Substituting these simplified notations, we get the following equation system.
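In symbols, assuming the conventional discrete-time model used throughout this series, with transition matrix A, input coupling matrix B, and observation matrix C (the "hat" names for the transformed matrices are illustrative), the substitution works out as:

```latex
\begin{aligned}
x_{k+1} &= A\,x_k + B\,u_k, & y_k &= C\,x_k
  && \text{(original model)} \\
q_k &\equiv Q\,x_k, & x_k &= Q^{-1} q_k
  && \text{(new state definition)} \\
q_{k+1} &= \underbrace{Q A Q^{-1}}_{\hat{A}}\, q_k
          + \underbrace{Q B}_{\hat{B}}\, u_k, &
y_k &= \underbrace{C Q^{-1}}_{\hat{C}}\, q_k
  && \text{(transformed model)}
\end{aligned}
```

The input sequence u and output sequence y appear unchanged in the transformed system; only the internal state representation differs.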
The result is another discrete time state transition model, but in terms of different state variables. The input sequences are exactly the same. The output sequences are exactly the same. Clearly the model is an exact equivalent, even though the internal states and the internal model parameters are entirely different.
Same results. Who cares?
There are many possible reasons you might care. There can be important benefits from using the transformed state variables.
- The restructuring can be used to orthogonalize the state transition matrix for improved numerical accuracy.
- The transformations can impose useful scalings on the magnitudes of the internal state values.
- The restructuring can be used to impose a matrix structure in which certain terms exactly equal zero.
- The restructuring can factor some or all states into a modal form with reduced internal interactions between state variables.
- A model can be transformed into a conventional canonical form for which certain theoretical results are more easily applied.
Some useful transformations
Here are some particular transformations that are likely to prove useful.
Scaling transformation
You already know about this one from prior installments of this series.[1] Construct a square matrix with positive scaling values along the main diagonal. Calculating the inverse of the diagonal scaling matrix is trivial.
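As a quick sketch of the scaling case (in Python/NumPy here, rather than the Octave used elsewhere in this series; the matrix and scale values are made up for illustration):

```python
import numpy as np

# Hypothetical 2-state transition matrix and positive per-state scale factors.
A = np.array([[0.98, -0.03],
              [0.04,  0.97]])
scales = np.array([10.0, 0.5])

Q = np.diag(scales)            # diagonal scaling transformation
Qinv = np.diag(1.0 / scales)   # trivial inverse: reciprocal of each diagonal term

# Transformed state transition matrix, per A_new = Q A Q^-1.
A_new = Q @ A @ Qinv

# The transformation loses no information: inverting it restores A exactly.
assert np.allclose(Qinv @ A_new @ Q, A)
```

Notice that the diagonal entries of A are untouched by a diagonal scaling; only the off-diagonal coupling terms are rescaled.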
Row-permutation transformation
Start with an identity matrix, and swap any two columns i and j. When you left-multiply any square matrix (or compatible column vector) by this matrix, the effect is to swap the terms in the i-th and j-th rows; in effect, this repositions the state variables in the state vector, and repositions the associated model parameters, without changing any values.
For example, let's consider transforming the following arbitrary state transition matrix by moving the second state variable to the position of the third, and the third state variable to the position of the second.
mmat =

   9.8051e-01  -2.5922e-02  -2.8938e-02   3.8595e-02
  -3.2968e-02   1.0265e+00  -1.7103e-02  -1.4628e-03
  -2.5834e-02  -8.4132e-04   9.7193e-01  -1.5873e-02
   3.4865e-02  -1.7536e-02  -1.3940e-02   9.6629e-01

pmat =

   1   0   0   0
   0   0   1   0
   0   1   0   0
   0   0   0   1

qmat = pmat' * mmat * pmat

qmat =

   9.8051e-01  -2.8938e-02  -2.5922e-02   3.8595e-02
  -2.5834e-02   9.7193e-01  -8.4132e-04  -1.5873e-02
  -3.2968e-02  -1.7103e-02   1.0265e+00  -1.4628e-03
   3.4865e-02  -1.3940e-02  -1.7536e-02   9.6629e-01
The net result of the system transformation is that parameters related to the second and third rows and columns are swapped. No values are changed: only their locations. None of the other terms change value or position.
Transposition matrices play nicely with each other. You can do a major reordering by applying a cascade of transposition operations. You can also consolidate the permutation operations and apply them all in one lump.
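The swap and the consolidation of cascaded transpositions can be checked numerically. A sketch in Python/NumPy, reusing the mmat values from the example above (the `transposition` helper is a made-up name for illustration):

```python
import numpy as np

# State transition matrix from the example above.
mmat = np.array([
    [ 9.8051e-01, -2.5922e-02, -2.8938e-02,  3.8595e-02],
    [-3.2968e-02,  1.0265e+00, -1.7103e-02, -1.4628e-03],
    [-2.5834e-02, -8.4132e-04,  9.7193e-01, -1.5873e-02],
    [ 3.4865e-02, -1.7536e-02, -1.3940e-02,  9.6629e-01]])

def transposition(n, i, j):
    """Identity matrix of size n with rows (equivalently, columns) i and j swapped."""
    p = np.eye(n)
    p[[i, j]] = p[[j, i]]
    return p

# Swap the second and third state variables (0-based indices 1 and 2).
pmat = transposition(4, 1, 2)
qmat = pmat.T @ mmat @ pmat

# Parameters trade places but keep their values: the old (3,3) term
# now sits at position (2,2), and the old (2,2) term at (3,3).
assert np.isclose(qmat[1, 1], 9.7193e-01)
assert np.isclose(qmat[2, 2], 1.0265e+00)

# Cascaded transpositions consolidate into a single permutation matrix:
# applying pmat then p2 equals one transform by (pmat @ p2).
p2 = transposition(4, 0, 3)
combined = pmat @ p2
assert np.allclose(combined.T @ mmat @ combined,
                   p2.T @ (pmat.T @ mmat @ pmat) @ p2)
```

Because each transposition matrix is its own inverse and transpose, the cascade is easy to undo as well.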
The most likely reasons to use this kind of transformation:
- to isolate variables deemed secondary or inconsequential from other variables deemed important
- to isolate "clusters" of closely related variables
- to facilitate theoretical analysis of matrix operations
An output observation transformation
Suppose that in your state transition model the observation matrix C is a row vector having many nonzero terms. This says that the observed output is determined by a direct combination of several internal states. The goal is to define a new set of states such that under the new definitions the output is determined by a direct readout of one of the internal state variables.
How might this be done? Well, we need the Q^-1 matrix to have the property that C Q^-1 = [ 1 0 0 ... 0 ], so that the product picks out the first transformed state variable directly.
Let's consider how to do this for the state transition equations used previously in this installment, using a process known as orthogonalization.[2] Suppose that the original observation matrix is the following.
cmat = [ 8.2520e+00 2.3353e+00 0.0000e+00 -1.0449e+01 ];
Initialize a completely random transformation matrix with values in the range -0.5 to +0.5.
qinv = rand(4,4)-ones(4,4)*0.5;
Calculate a scaling factor from the initial C matrix.
cscale = 1.0 / (cmat*cmat');
Replace the first column of the random matrix with a vector that is a scaled version of the transpose of the C matrix.
qinv(1:4,1) = cscale * cmat';
For all of the other columns, modify the column by subtracting a certain multiple of the vector C^T such that the product of the modified column and the original C matrix is zero.
for icol=2:4
  colscale = cmat * qinv(1:4,icol) * cscale;
  qinv(1:4,icol) = qinv(1:4,icol) - colscale*cmat';
end
Now verify that the inverse transformation works as intended by forming the C Q^-1 matrix product.
newcmat = cmat * qinv

newcmat =

   1.0000e+00  -7.5460e-17  -8.4893e-17  -1.5213e-16
Knowing the Q^-1 matrix, you can invert it and complete the rest of the state equation transformation.
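The whole orthogonalization procedure can be sketched end to end in Python/NumPy (the random seed is an arbitrary choice for repeatability, not part of the original procedure):

```python
import numpy as np

rng = np.random.default_rng(1234)   # seeded only so the run is repeatable

# Original observation matrix from the example above.
cmat = np.array([8.2520e+00, 2.3353e+00, 0.0000e+00, -1.0449e+01])

# Completely random starting transformation, values in -0.5 .. +0.5.
qinv = rng.random((4, 4)) - 0.5

# Scaling factor from the initial C matrix.
cscale = 1.0 / (cmat @ cmat)

# First column: scaled transpose of C, so cmat @ qinv[:, 0] equals 1.
qinv[:, 0] = cscale * cmat

# Remaining columns: subtract the component along C^T so that the
# product of each modified column with the original C matrix is zero.
for icol in range(1, 4):
    colscale = (cmat @ qinv[:, icol]) * cscale
    qinv[:, icol] -= colscale * cmat

# Verify: the transformed observation matrix reads out state 1 directly.
newcmat = cmat @ qinv
assert np.allclose(newcmat, [1.0, 0.0, 0.0, 0.0])

# Knowing Q^-1, invert it to complete the state equation transformation.
# (A random start is nonsingular with probability 1.)
qmat = np.linalg.inv(qinv)
```

The loop is one step of a Gram-Schmidt-style projection: each column is made orthogonal to C^T, while the first column is scaled so its inner product with C is exactly one.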