Faster RLS Solution

Last time we began a search for modifications to the Normal Equations updating method that could make recalculation of the parameter values more efficient after each update. We also presented an obscure mathematical theorem, with a promise to explain the relevance.

We obtained the following formula for calculating incremental adjustments to parameter values.

({M_{n + 1}}^{T} M_{n + 1}) \cdot Δ p = {m_{n + 1}}^{T} \cdot (y_{n + 1} - m_{n + 1} \cdot p_{n})

But so far, a numerically intensive matrix inversion calculations for the parameter updates is still required.

Δ p = {({M_{n + 1}}^{T} M_{n + 1})}^{- 1} {m_{n + 1}}^{T} \cdot (y_{n + 1} - m_{n + 1} \cdot p_{n})

Applying the Matrix Inversion Lemma

Restating the special form of the Matrix Inversion Lemma^[1] here:

{(F + A B^{- 1} A^{T})}^{- 1} \Leftrightarrow F^{- 1} - F^{- 1} A {(B + A^{T} F^{- 1} A)}^{- 1} A^{T} F^{- 1}

To apply this specifically to the problem of updating and solving Normal Equations systems with rank one updates, define the following changes of notation.

F^{- 1} \Leftrightarrow {({M_{n}}^{T} M_{n})}^{- 1}

B \Leftrightarrow I

A \Leftrightarrow {m_{n + 1}}^{T}

Inserting these expressions systematically into the Matrix Inversion Lemma formula yields the following.

{(M_{n}^{T} M_{n} + {m_{n + 1}}^{T} I m_{n + 1})}^{- 1} =

{(M_{n}^{T} M_{n})}^{- 1} - {(M_{n}^{T} M_{n})}^{- 1} {m_{n + 1}}^{T} {(I + m_{n + 1} {(M_{n}^{T} M_{n})}^{- 1} {m_{n + 1}}^{T})}^{- 1} m_{n + 1} {(M_{n}^{T} M_{n})}^{- 1}

The expression on the left (the first line) is the updated inverse matrix that we are trying to obtain. On the right side (the second line), the previously-constructed inverse Normal Equations matrix M_n^TM_n^-1 is used to calculate the updated inverse matrix.

There is some initial disappointment in seeing that matrix inversions are still involved. However:

The (M_n^TM_n)^-1 matrix terms are known from the previous update step and requires no extra calculations. They carry forward from the previous step.
For a rank-one update, the complicated double-parenthesized sum expression reduces to a scalar value, making its inversion trivial.

The calculations on the right side of the expression start to look much simpler after some additional notation changes.

X1 == (M_n+1^TM_n+1)^-1 · (m_n+1)^T
X2 == (m_n+1) · (M_n+1^TM_n+1)^-1
X3 == (m_n+1) · X1

Substituting these intermediate terms into the expression above:

= {(M_{n}^{T} M_{n})}^{- 1} - X_{1} {(I + X_{3})}^{- 1} X_{2}

When the parenthesized sum term is a scalar, this expression can be simplified even further.

{(M_{n + 1}^{T} M_{n + 1})}^{- 1} = {(M_{n}^{T} M_{n})}^{- 1} - \frac{X_{1} X_{2}}{(I + X_{3})}

That kind of expression is also something that you might have seen before. It is now clear that the expression on the right is a rank-one update applied directly to the inverse matrix we had previously. In other words, updating the inverse matrix is almost as easy as updating the original matrix. There are just a few additional preliminary calculations for intermediate X1, X2, and X3 expressions prior to applying the update.

The strategy consisting of

calculation of incremental parameter adjustments
maintaining the Normal Equations matrix in an inverse form
rank-one updates

in combination, is known as the Recursive Least Squares method (RLS).

Verifying the method

Do you suppose this actually works? Let's give it a try. Suppose that we want to determine the parameters for a second order polynomial model.

    y  =  a·1 +  b·x +  c·x²

Though this model is not linear, it is linear in the parameters, so given the values of the powers of x as the input data, we can apply the linear least squares methods to determine the best fit for the observed output values y. We will let Octave do all of the hard calculations. Here is the set of input and output values, consisting of powers of dependent variable x.

    M7 =
    1   -2    4
    1   -1    1
    1    0    0
    1    1    1
    1    2    4
    1    3    9
    1    4   16

   Y7 =
   1.3800
   1.1600
   1.4100
   2.0800
   2.9800
   4.4100
   6.2400

Now construct the Normal Equations in the direct way for the best fit.

m7tm7 = M7'*M7
m7ty7 = M7'*Y7
m7inv = m7tm7^-1
fit7  = m7inv*m7ty7

    m7tm7 =
     7     7    35
     7    35    91
    35    91   371

    m7ty7 =
    19.660
    42.310
   160.210

    m7inv =
   0.285714   0.035714  -0.035714
   0.035714   0.083333  -0.023810
  -0.035714  -0.023810   0.011905

    fit7 =
   1.40643
   0.41345
   0.19774

Lets do the same things again, but this time with one additional constraint row added to the matrix and right hand side in the first row.

    M8 =
    1    5   25
    1   -2    4
    1   -1    1
    1    0    0
    1    1    1
    1    2    4
    1    3    9
    1    4   16

    Y8 =
   8.4900
   1.3800
   1.1600
   1.4100
   2.0800
   2.9800
   4.4100
   6.2400

m8tm8 = M8'*M8
m8ty8 = M8'*Y8
m8inv = m8tm8^-1
fit8  = m8inv*m8ty8

    m8inv =
   0.2321429   0.0178571  -0.0178571
   0.0178571   0.0773810  -0.0178571
  -0.0178571  -0.0178571   0.0059524

    fit8 =
   1.39732
   0.41042
   0.20077

We can see that after including the additional new constraint, the calculated best fit parameter values are a little different (as expected).

Now lets go back to the original problem and exercise the RLS updating formulas. We know what the inverse matrix was for the original problem, so we will start with that. Introduce the new constraint data. Calculate the intermediate terms, and using these, update the Normal Equations matrix in inverse form.

m8 = [1.0  5.0  25.0]
y8 = 8.49
X1 = m7inv * m8'
X3 = m8 * X1
divis = (1.0 + X3)
newm8 = m7inv - (X1 * X1') / divis

    X1 =
  -0.42857
  -0.14286
   0.14286

   X3 =  2.4286
   divis =  3.4286

    newm8 =
   0.2321429   0.0178571  -0.0178571
   0.0178571   0.0773810  -0.0178571
  -0.0178571  -0.0178571   0.0059524

A quick inspection shows that this calculation has yielded exactly the same inverted matrix results as the direct method. Next, apply the incremental update equation to determine the updated parameter values.

resid = y8 - m8*fit7
delfit8 = newinv8 * m8' * resid
newfit8 = fit7 + delfit8

   resid =  0.072857

   delfit8 =
  -0.0091071
  -0.0030357
   0.0030357

   newfit8 =
   1.39732
   0.41042
   0.20077

Once again, this is exactly the same solution that we would obtain using the direct construction method for the Normal Equations. Thus, we have verified the equivalence of the direct construction and the RLS inverse matrix updating approach.

Startup details

Until you have collected enough updates, the normal equations matrix (M^TM) is not full rank, and therefore an inverse matrix does not exist. For the RLS updating, one way to work around this limitation is to collect a few updates and invert the matrix explicitly one time to get things started.

Another very popular trick is to fudge the initial (M^TM) matrix. Instead of starting with an (M^TM) matrix of zero and trying to invert it, which will of course fail, start with an (M^TM) matrix that is not very far from zero but full rank, and invert that. Something like the following will do the job.

MTM = eye(7,7) * 1.0e-7;
MTMinv = MTM^-1;

Using this trick, the inverse matrix remains well defined from the beginning, even if the first few parameter estimates generated don't mean much. This introduces a deliberate small numerical error, but so small that it is overwhelmed by numerical rounding errors or residual random noise disturbances coming in from raw input/output data.

Coming next

The ability of the RLS formulation to incorporate new information efficiently, as it becomes available, explains why this method is appealing for real-time adaptive systems. Model adjustments can go into effect as soon as possible. If you have followed this series closely, you already know why this idea could lead to disaster. Next time, we will examine why this is so, and what can be done to set things right.

Footnotes:

[1] See the previous article in this series.

Efficient "Recursive Least Squares" Computation

Applying the Matrix Inversion Lemma

Verifying the method

Startup details

Coming next