# Least Squares Derivative Estimator

Numerical Methods for Derivative Estimation

In previous sections, we found that it was feasible
to produce elegant derivative estimators that combine the
noise-excluding features of a maxflat filter with the
derivative-estimating behavior of the central differences method.
While these designs have excellent performance
at the frequency extremes, they compromise accuracy and noise
attenuation in the middle bands. Promising but insufficient
designs based on a *least-squares best fit* of polynomial
curves produced generally better accuracy across the lower frequency
range, but lacked control of noise rejection properties at higher
frequencies.

In this section, we will consider a completely
different approach. Instead of designing with polynomials
and evaluating the frequency response, we will design the
filter directly according to the desired frequency
response,^{[1]} again
using "least squares best fit" methods. The
resulting estimators offer a more balanced tradeoff between
accuracy of the estimates and noise rejection.

## The least squares strategy

We have seen previously that the frequency response of a derivative filter must have the following properties.

- Odd symmetry. Any smooth function with even symmetry, like a cosine wave, has a derivative of zero at its point of symmetry; in particular, the response at frequency zero must be zero. This can be achieved when the curves used in the approximation formula have odd symmetry, and naturally pass through zero. Instead of polynomials, sine waves are used.
- For sine and cosine wave functions, the magnitude of the derivative of these functions at frequency `w` is equal to `w`.
- The filter must attempt to match this linear, frequency-proportional response characteristic for frequencies from 0 through `0.2 π`.
- Responses for frequencies beyond `0.2 π` through to `π` should ideally be zero, so that high frequency noise is attenuated.
- The desired frequency response can be specified by a vector with a large number `M` of points spanning the representable frequency range from `0.0` to `π`. Only the first `0.2 * M` of these `M` terms are nonzero.
- We can try to match this response shape using a sum of a limited number `N` of sine waves. Using more sine waves gives possibilities for a more accurate fit with better noise rejection, but at a cost in computational effort when the estimator is applied.
- Only positive frequencies need to be analyzed. A corresponding fit for "negative frequencies" is implied automatically by the selection of sine waves for fitting.
- For each of the sine waves used for the approximation, the values along that waveform are evaluated at all `M` of the frequencies from 0 to `π`.
- A perfect fit would produce `N` parameter values that, when multiplied with the `M` sets of expanded values of the full waveforms, would yield the `M` desired ideal derivative response values in `V`. In general, it is not possible to satisfy a relatively large number `M` of constraints exactly using a relatively small number `N` of adjustable parameters, so the best that can be done is to "almost" solve them, as closely as possible.
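The constraint system described above is easy to set up numerically. Here is a minimal sketch in Python with NumPy (an illustration, not the series' own Octave code; the grid size `M` and basis count `N` are arbitrary choices):

```python
import numpy as np

M = 512                                  # number of frequency grid points from 0 to pi
N = 10                                   # number of sine waves used for fitting
w = np.linspace(0.0, np.pi, M)           # positive frequency grid
V = np.where(w <= 0.2 * np.pi, w, 0.0)   # ideal response: w in the low band, 0 above

# Column k of the design matrix holds sin(k*w) evaluated at all M frequencies.
MMat = np.column_stack([np.sin(k * w) for k in range(1, N + 1)])
print(MMat.shape)   # (512, 10)
```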

## Obtaining a "least squares best fit"

The expansions of the fitting waveforms are placed into column
vectors of length `M`, and the `N` column vectors
are then collected to form an `M x N` matrix `MMat`.

The `M` desired response
gain values are collected into column vector `V`.

The values of the filter coefficients to be determined constitute
an unknown vector `F` with `N` terms. The linear
system to be solved is then

(LLS constraint equation) MMat · F ≈ V

An exact solution is generally not available, but the classic *Normal Equations*
can be applied to determine the necessary conditions for the "least squares best
fit" values of the parameters (the solution that minimizes the sum of the
squared fitting errors).

(LLS normal equation) F = (MMat^{T}MMat)^{-1}(MMat^{T}) · V

In a well-posed problem, the best-fit solution exists and is unique.
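In code, the normal equations reduce to a one-line solve. Continuing the Python/NumPy illustration (sizes are arbitrary; note that in practice `numpy.linalg.lstsq` performs the same minimization with better numerical conditioning than forming the normal equations explicitly):

```python
import numpy as np

# Rebuild the constraint system: sine-wave columns and the ideal response.
M, N = 512, 10
w = np.linspace(0.0, np.pi, M)
V = np.where(w <= 0.2 * np.pi, w, 0.0)
MMat = np.column_stack([np.sin(k * w) for k in range(1, N + 1)])

# Normal equations: F = (MMat^T MMat)^{-1} MMat^T V
F = np.linalg.solve(MMat.T @ MMat, MMat.T @ V)

# Cross-check against NumPy's dedicated least squares solver.
F_lstsq, *_ = np.linalg.lstsq(MMat, V, rcond=None)
print(np.allclose(F, F_lstsq))   # True
```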

Let's try this — call it a "naive least squares fit" — and see what happens. Here is a typical plot of the kind of filter characteristic produced.

There is no need to provide the coefficients of this particular design, since it is deficient in many ways:

- It has poor asymptotic accuracy near zero.
- Its accuracy falls off seriously, well before reaching the upper end of the "20% of Nyquist" band.
- It is much longer than previous designs, hence computationally inefficient.

However, it is worth observing that the noise rejection is far better than anything we have yet seen. Perhaps we can retain some of this behavior, and mitigate the poor fit in the lower frequencies.

## Weighting in least squares problems

Suppose that you arbitrarily take one of the `M` rows
in the constraint equation system, and scale every term on both the
left and right hand side of that equation by an arbitrary non-negative
number `q`. This doesn't affect the validity of the constraint,
since multiplying both sides of an equality expression by the same
number yields an equivalent equality expression. However, the weighting can
have a big effect on the least squares fitting problem. In the extreme
case, weighting factor `q = 0` renders the constraint
equation moot, because "zero times anything is still zero" and there
is never any error in fitting this constraint exactly.
For small but nonzero values of `q`, the constraints have some
influence, but much larger matching errors are tolerated.

The most serious problem with the naive fit is that it expends too much effort trying to accurately fit the severe discontinuity at 20% of the Nyquist frequency. This particular shape is not important. So, let's apply a weighting factor of zero to the band from 20% to 40% of the Nyquist range. In effect, this says "I don't care where the curve goes in this region, as long as it ends up low and doesn't act too wacky." Let's make this change and see what happens.
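The weighting scheme is a small change to the fitting code. A Python/NumPy sketch (band edges and sizes are illustrative): each row of the system, both sides, is scaled by its weight before the ordinary least squares solve.

```python
import numpy as np

# The same constraint system as before.
M, N = 512, 10
w = np.linspace(0.0, np.pi, M)
V = np.where(w <= 0.2 * np.pi, w, 0.0)
MMat = np.column_stack([np.sin(k * w) for k in range(1, N + 1)])

# Weight vector q: zero out the 20%-40% "don't care" transition band.
q = np.ones(M)
q[(w > 0.2 * np.pi) & (w < 0.4 * np.pi)] = 0.0

# Scale each constraint row (matrix and right-hand side) by its weight, then fit.
F = np.linalg.lstsq(MMat * q[:, None], V * q, rcond=None)[0]
```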

This result is rather stunning. But keep in mind that this 21-term filter
requires significant computational effort. Let's call this the *overkill
least squares derivative estimator*.

-2.415514e-03 -3.498196e-03 3.733505e-04 8.819803e-03 1.416449e-02 5.951338e-03 -1.850546e-02 -4.727302e-02 -5.953525e-02 -4.163835e-02 0.0 4.163835e-02 5.953525e-02 4.727302e-02 1.850546e-02 -5.951338e-03 -1.416449e-02 -8.819803e-03 -3.733505e-04 3.498196e-03 2.415514e-03

Maybe you don't need *that much* accuracy, or *that
much* low-frequency bandwidth, or quite *that much* noise
attenuation. Shorter and computationally more efficient filters can be
obtained, if you are willing to compromise a little on the other performance
characteristics. Here are some adjustment options.

- The "20% of Nyquist" band is not absolute. Small deviations can be tolerated in that band length.
- It is probably sufficient to allow the noise in the upper bands to be *just small enough* by applying smaller weighting factors there, allowing tighter control at other frequencies.
- Allow a slightly wider transition zone by setting a few more weighting factors to zero between 20% and 50% of the Nyquist limit.

Here are the coefficients for a 13-term design that I like.

0.021028 0.018163 -0.025149 -0.089145 -0.124062 -0.090751 0.0 0.090751 0.124062 0.089145 0.025149 -0.018163 -0.021028
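To use an estimator like this, run it across the samples as an ordinary FIR filter. A Python/NumPy sketch (the convolution convention and test signal are my own illustration, not from the series): with the tap set quoted above, the estimate at sample `n` is the sum of `h[k] * x[n+k]` over the filter window.

```python
import numpy as np

# The 13-term least squares derivative estimator quoted above.
h = np.array([0.021028, 0.018163, -0.025149, -0.089145, -0.124062, -0.090751,
              0.0,
              0.090751, 0.124062, 0.089145, 0.025149, -0.018163, -0.021028])

# Estimate[n] = sum_k h[k] * x[n+k]; np.convolve flips its kernel, so flip first.
n = np.arange(200)
x = np.sin(0.05 * np.pi * n)              # slow sine, well inside the accurate band
dx_est = np.convolve(x, h[::-1], mode='valid')   # estimates for n = 6 .. 193

# Compare against the exact derivative at those sample positions.
dx_true = 0.05 * np.pi * np.cos(0.05 * np.pi * n[6:-6])
print(np.max(np.abs(dx_est - dx_true)))
```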

Here is the response of this particular least squares design, comparing it to the best robust maxflat design I could find.

Points to notice:

- This least-squares estimator only uses 4 more terms than the maxflat estimator. (This is not intended to be an "unfair" comparison. A shorter maxflat estimator gives much worse noise attenuation, while a longer maxflat estimator has worse low band accuracy.)
- The response of the least-squares estimator is allowed to degrade a little at the 20% of Nyquist limit band edge, to keep midband gain lower.
- The least-squares estimator extends good accuracy to about 15% of the Nyquist limit, while the comparable range for the maxflat design is about 5% of Nyquist limit.
- The LS estimator is better at resolving and attenuating noise in middle frequencies between 40% and 60% of the Nyquist limit.
- The maxflat designs have better asymptotic accuracy near frequency zero than least squares designs, even though this is not apparent from the plot. That means better performance for estimating very small derivative values, but only in a low-noise scenario.
- The LS estimator is less aggressive for attenuating noise in the extreme high frequencies, in exchange for the other benefits.

As a final comparison, let's go back to the "noisy data set" example used in previous sections of this series. If you go back and compare to all of the previous results, it will become apparent just how much benefit the least squares estimator provides.

Finally, if you know that your derivative is relatively small, and the 20% of Nyquist limit bandwidth is too wide for your needs, you can get smoother estimates with good accuracy through about 8% of the Nyquist limit using the following estimator.

Nominal "10% of Nyquist" Least Squares Derivative Estimator, 19 terms:

0.0237475 0.0195601 0.0018814 -0.0279459 -0.0626650 -0.0909902 -0.1016449 -0.0879622 -0.0510464 0.0 0.0510464 0.0879622 0.1016449 0.0909902 0.0626650 0.0279459 -0.0018814 -0.0195601 -0.0237475

At this point, it is reasonable to say: "We are done. Just take the least squares estimator design and use it." However, it is possible to optimize further for a particular application. You can do this! Run this Octave (Matlab) design script. Fiddle with the parameters at the top of the script and rerun it until you find something that you like.

Some hints about what to look for as you adjust parameters.

- Try starting with a filter length 15. Making the filter length longer improves numerical performance, but at the cost of more computation. A longer filter needs more "start up" values, so it will require a slightly longer input data set to produce the same number of estimate points. A narrower low-frequency band will require more terms, because the filter will need to be more selective.
- Try starting with the low frequency response band equal to the frequency range that you want, and then tweak up and down from there. Reducing the flat response band usually allows a smoother transition and less noise in the middle frequencies in exchange for slight impairment at the end of the band.
- Try starting with an attenuation band above 35% of the Nyquist limit, and then adjust up or down from there. A smaller band lets more mid-band noise through, but promotes accuracy at the low frequencies.
- Try a weighting factor of 0.25 for the attenuation band, 1.0 for the accurate band, and then adjust these weights up or down.
- There is an additional weighting that can be applied to the low band to encourage additional low-frequency accuracy. Start it at 0, and try adjusting it to 10 or 20 after the rest of the filter characteristics look good.
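The design script itself is not reproduced here, but the whole procedure fits in a few lines. The following Python/NumPy function is a hypothetical sketch of such a script; the function name, parameter names, and defaults are invented to match the hints above, not taken from the actual Octave code.

```python
import numpy as np

def design_lsq_derivative(n_taps=15, pass_frac=0.20, stop_frac=0.35,
                          pass_weight=1.0, stop_weight=0.25,
                          low_boost=0.0, M=1024):
    """Sketch of a weighted least squares derivative estimator designer.

    n_taps     : odd filter length (e.g. 13, 15, 21)
    pass_frac  : accurate band edge, as a fraction of the Nyquist limit
    stop_frac  : start of the attenuation band, as a fraction of Nyquist
    pass_weight, stop_weight : fitting weights in those bands
    low_boost  : optional extra weight near frequency zero
    """
    K = (n_taps - 1) // 2
    w = np.linspace(0.0, np.pi, M)
    V = np.where(w <= pass_frac * np.pi, w, 0.0)      # ideal derivative response

    # Band weights; the transition zone keeps weight zero ("don't care").
    q = np.zeros(M)
    q[w <= pass_frac * np.pi] = pass_weight
    q[w >= stop_frac * np.pi] = stop_weight
    q[: M // 20] += low_boost      # extra emphasis on the lowest frequencies

    # Antisymmetric filter: response magnitude is 2 * sum_k h[k] * sin(k*w).
    A = np.column_stack([2.0 * np.sin(k * w) for k in range(1, K + 1)])
    h = np.linalg.lstsq(A * q[:, None], V * q, rcond=None)[0]

    # Assemble the full antisymmetric tap set: [-h reversed, 0, +h].
    return np.concatenate([-h[::-1], [0.0], h])

taps = design_lsq_derivative(n_taps=13)
```

Adjust the arguments as described in the hints above, plot the resulting frequency response, and iterate until the tradeoff suits your application.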

And with that, this series comes to an end... for now.