Autocorrelation and Test Signals
Developing Models for Kalman Filters
The time scale of the state transition model is "one step at a time." The internal state is updated for each input, and then activity moves ahead to the next time step. A functional response that appears in one time step and disappears the next is basically a kind of operation at the Nyquist limit of the sampled data. That is not the kind of behavior we are interested in for the dynamic model.
As we found last time, it is easy to generate randomized test signals that avoid excessive noise variance by limiting the bandwidth of the test input sequence. However, this comes with side effects. Because the filtered signal is obtained by a kind of "weighted average" of neighboring random values, there is a linear relationship between nearby points along the input signal sequence. That is, there is correlation. This correlation will mask any correlation effects in the extreme short term, but not the long term dynamic behaviors. We are going to explore that further, starting with an abbreviated review that you can skip if you are thoroughly versed in correlation statistics.
The meaning of correlation
Suppose we take two unbiased (zero mean) signal sequences, u and y.

u0, u1, u2, u3, etc.
y0, y1, y2, y3, etc.

Take the two signals and multiply them term-by-term to obtain a product sequence.

u0·y0, u1·y1, u2·y2, u3·y3, etc.

Calculating the mean value of this product sequence, as the number of product terms extends to infinity, yields the correlation between signals u and y. [1]
A useful approximation to the correlation can be obtained by using a large finite number of product terms and averaging them.
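In Octave, that finite-sample estimate is a one-liner. Here is a minimal sketch (the helper name corr_est is ours, not a standard function):

% Finite-sample correlation estimate: average a large but finite
% number of term-by-term products of two zero-mean sequences.
corr_est = @(u, y) mean(u .* y);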
First, consider two special extreme cases.
- If the sequences u and y happen to be exactly equal, term for term, every term in the product sequence turns into a squared value, and contributes a non-negative amount to the average. This produces a very strong positive correlation value.
- When the sequence y equals the negative of the u sequence term for term, every product term in the correlation calculation yields a non-positive value, producing a very strong negative correlation value.
In general, when the sequence y consists of a constant multiple of the sequence u, the sign and magnitude of the multiplier determine the sign and magnitude of the correlation.
Now consider the opposite extreme, when the sequences are extremely dissimilar.
- When sequences y and u are unbiased and independently random, the signs and magnitudes of the product terms could fall anywhere. The highs and the lows balance. The correlation is zero.
- When sequence u is random and unbiased, and the terms of sequence y are independent of the random terms in u but not necessarily random, the correlation is still zero.
- When sequence y is exactly the same random sequence as u but shifted by one or more index positions, the correlation is zero. The terms that are involved in the products are independently random.
We sometimes apply the special name autocorrelation for correlations obtained between two signals that are actually shifted versions of the same signal sequence. We sometimes apply the special name crosscorrelation for correlations obtained between shifted or unshifted versions of signals considered to be from different sources. Sometimes, we don't worry about the distinction, and both are just correlations.
Correlation shows a property of separability. If signal sequence y is composed of two parts, a part that is linearly related to u, and another part that has no linear relationship to u, the correlation between y and u will be exactly the same as if the parts of y without any relationship had not been present.
That final observation is the key to finding linear models. A strong correlation between a system's input sequence and output sequence indicates a strong linear relationship.[2]
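A quick numeric check of the separability property, as a sketch; the gain of 0.5 on the linearly related part is an arbitrary choice for illustration:

% Separability: the unrelated part of y contributes nothing.
n = 100000;
u = randn(n,1);
v = randn(n,1);                    % part with no linear relationship to u
y = 0.5*u + v;                     % linearly related part plus unrelated part
mean(u .* y)                       % about 0.5 ...
mean(u .* (0.5*u))                 % ... same as with the unrelated part removed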
Obfuscation by autocorrelation
Suppose we take the degenerate case of a system that merely takes the input value and reports that as the output value at the next time step (a very minimal system of order one). From what we already know, the correlation between the input and output values at time shift 0 will be zero if the input sequence u is made equal to a pure random white noise sequence r.
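Here is a sketch of that experiment in Octave, with the delay system and the white noise sequence built inline:

% Minimal order-one system: output equals input delayed one step.
n = 100000;
r = randn(n,1);                    % white noise input, u = r
y = [0; r(1:n-1)];                 % y(k) = u(k-1)
mean(r .* y)                       % correlation at shift 0: near zero
mean(r(1:n-1) .* y(2:n))           % correlation at shift 1: strong, about 1.0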
But now, suppose we substitute for u the filtered band-limited signal generated by the rsig20 function that we discussed last time. Now when we measure the input-to-output correlation at shift 0, there is a strong correlation. What happened? Well, the system hasn't fundamentally changed how it responds to things. The apparent correlations only reflect the fact that consecutive input signal values share a similarity to each other.
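To see this numerically, here is a sketch using a stand-in for rsig20 (a plain 20-tap moving average of white noise; the real rsig20 function differs in its coefficient details, but the character is the same):

% The same delay system, excited with a band-limited signal.
n = 100000;
r = randn(n,1);
u = filter(ones(20,1)/20, 1, r);   % stand-in for rsig20: 20-tap smoothing
y = [0; u(1:n-1)];                 % y(k) = u(k-1), system unchanged
mean(u .* y)                       % shift-0 correlation: now clearly nonzero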
Let's state this more formally. We want to evaluate the crosscorrelation between input signal u and output signal y, estimated as the average of N term-by-term products:

corr(u,y) ≈ (1/N) Σk uk·yk

We know that the u signal is generated from a random number sequence r by convolving with a known filter vector H with coefficients h:

uk = Σj hj·r(k-j)

Now perform some algebraic manipulations on this expression. Substituting the convolution into the correlation sum and exchanging the order of the summations:

corr(u,y) ≈ (1/N) Σk Σj hj·r(k-j)·yk = Σj hj·[ (1/N) Σk r(k-j)·yk ] = Σj hj·corrj
This tells us what has happened to our correlations. The corrj terms are values of correlation calculated when the input and output signals are shifted by j positions. This says that the results using the filtered signal are the same as you would get by exciting the system with the unfiltered white noise sequence r (without the band-limiting filtering) and then applying the band-limiting filter to the correlation sequence.
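This prediction can be verified numerically. A sketch, again with a 20-tap moving average standing in for the band-limiting filter: the left-hand side measures the correlation using the filtered input directly, while the right-hand side applies the coefficients h to the r-to-y correlation terms.

% Verify: corr(u,y) equals the h-weighted sum of the corrj terms.
n = 200000;
h = ones(20,1)/20;                 % stand-in filter coefficients
r = randn(n,1);                    % underlying white noise sequence
u = filter(h, 1, r);               % band-limited input signal
y = [0; u(1:n-1)];                 % one-step delay system output
lhs = mean(u .* y);                % correlation of u and y at shift 0
rhs = 0;
for m = 1:20
  s = m - 1;                       % shift between r and y for this term
  rhs = rhs + h(m) * mean(r(1:n-s) .* y(1+s:n));
end
[lhs, rhs]                         % the two estimates agree closely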
This filtering is a smoothing or low-pass operation that will obscure high-frequency effects in the correlation sequence. That is mostly a good thing, since variations that occur very rapidly are presumed to be unrelated to the system responses we care about. (Otherwise, a higher sample rate is called for.) However, it also has the side effect of attenuating any physically real variations at those same high frequencies. You want your sampling rate fast enough that meaningful variations do not fall into that attenuated range.
The filtering is very predictable. In fact, we know that the rsig20 band-limiting filter function has side effects that are limited to 20 shift positions. Let's verify this with Octave.
Generate a random signal using the rsig20 command. Now calculate estimates of the autocorrelation in this signal at time shifts of 0 samples, 1 sample, 2 samples, and so on.
nterms = 25000;                      % products averaged per estimate
ncorr  = 30;                         % number of shift positions examined
nsum   = nterms + ncorr;
rs = rsig20(nsum);                   % band-limited random test signal
cm = zeros(ncorr,1);
for icol = 1:ncorr
  acc = 0.0;                         % accumulate the product terms
  for irow = 1:nterms
    acc = acc + rs(irow)*rs(irow+icol-1);
  end
  cm(icol) = acc/nterms;             % correlation estimate at shift icol-1
end
cm
Here are the calculated correlation terms.
 1:   0.2487413   0.2355682   0.1995715   0.1499528   0.0980362
 6:   0.0533403   0.0210098   0.0014457  -0.0082210  -0.0119388
11:  -0.0129718  -0.0131467  -0.0130204  -0.0124904  -0.0113681
16:  -0.0096733  -0.0076556  -0.0056736  -0.0040548  -0.0030012
21:  -0.0025524  -0.0025997  -0.0029380  -0.0033386  -0.0036193
etc.
It is evident that from shift position 20 onward (the 21st term of cm and beyond), all that is left of the correlations is residual noise. (With an infinite data set, those positions would go to zero exactly.) When you see this particular pattern, it can be recognized as a side effect.
Getting practical with correlation
The band-limit filtering will suppress small and rapid variations but has no effect on correlation relationships over an extended interval. Since the band-limit filtering cannot produce any effects more than 20 terms removed, and the original white input data prior to filtering contributes no relationship at all from one term to the next, any wider patterns that appear in the correlation must result from state effects within the system. Depending on just how slow the system responses are, the effects could persist visibly for many hundreds of samples. The starting point for finding these patterns is to obtain the input/output data sets for your system.
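As a concrete starting point, here is a sketch of scanning a recorded data set for these wider patterns. The sequences uin and yout are hypothetical recorded input/output signals, and the lag range is an arbitrary choice:

% Scan input/output records for correlation over an extended range.
% uin, yout: recorded input and output sequences (hypothetical names).
nlags  = 200;                      % how far out to look (arbitrary)
nterms = length(uin) - nlags;      % products averaged per estimate
cxy = zeros(nlags+1, 1);
for s = 0:nlags
  cxy(s+1) = mean(uin(1:nterms) .* yout(1+s:nterms+s));
end
plot(0:nlags, cxy)                 % look for patterns persisting past shift 20
xlabel('shift'); ylabel('correlation estimate');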
Footnotes:
[1] Only a cursory review is provided here. For a more complete survey of correlation analysis, you can start at the Wikipedia article on correlation functions.
[2] I've always found it curious that the way to detect a linear relationship is by using quadratic calculations. Would a quadratic relationship reveal itself using third-order calculations? Feel free to explore the seldom-visited world of higher-order statistics.