Autocorrelation and Test Signals
Developing Models for Kalman Filters
The time scale of the state transition model is "one step at a time." The internal state is updated for each input, and then activity moves ahead to the next time step. A functional response that appears in one time step and disappears the next is basically a kind of operation at the Nyquist limit of the sampled data. That is not the kind of behavior we are interested in for the dynamic model.
As we found last time, it is easy to generate randomized test signals that avoid excessive noise variance by limiting the bandwidth of the test input sequence. However, this comes with side effects. Because the filtered signal is obtained by a kind of "weighted average" of neighboring random values, there is a linear relationship between nearby points along the input signal sequence. That is, there is correlation. This correlation will mask any correlation effects in the extreme short term, but not the long term dynamic behaviors. We are going to explore that further, starting with an abbreviated review that you can skip if you are thoroughly versed in correlation statistics.
The meaning of correlation
Suppose we take two unbiased (zero mean) signal sequences, u and y.

u0, u1, u2, u3, etc.
y0, y1, y2, y3, etc.

Take the two signals and multiply them term-by-term to obtain a product sequence.

u0·y0, u1·y1, u2·y2, u3·y3, etc.

Calculating the mean value of this product sequence, as the number of product terms extends to infinity, yields the correlation between signals u and y. [1]
A useful approximation to the correlation can be obtained by using a large finite number of product terms and averaging them.
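In Octave, that finite-sample estimate is a one-liner. Here is a minimal sketch (the helper name corr_est is ours, not a standard function):

% Finite-sample correlation estimate: average a large but finite
% number of term-by-term products of two zero-mean sequences.
corr_est = @(u, y) mean(u .* y);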
First, consider two special extreme cases.
- If the sequences u and y happen to be exactly equal, term for term, every term in the product sequence turns into a squared value, and contributes a non-negative amount to the average. This produces a very strong positive correlation value.
- When the sequence y equals the negative of the u sequence term for term, every product term in the correlation calculation yields a non-positive value, producing a very strong negative correlation value.
In general, when the sequence y consists of a constant multiple of the sequence u, the sign and magnitude of the multiplier determine the sign and magnitude of the correlation.
Now consider the opposite extreme, when the sequences are extremely dissimilar.
- When sequences y and u are unbiased and independently random, the signs and magnitudes of the product terms could fall anywhere. The highs and the lows balance. The correlation is zero.
- When sequence u is random and unbiased, and the terms of sequence y are independent of the random terms in u but not necessarily random, the correlation is still zero.
- When sequence y is exactly the same random sequence as u but shifted by one or more index positions, the correlation is zero. The terms that are involved in the products are independently random.
We sometimes apply the special name autocorrelation for correlations obtained between two signals that are actually shifted versions of the same signal sequence. We sometimes apply the special name crosscorrelation for correlations obtained between shifted or unshifted versions of signals considered to be from different sources. Sometimes, we don't worry about the distinction, and both are just correlations.
Correlation shows a property of separability. If signal sequence y is composed of two parts, a part that is linearly related to u, and another part that has no linear relationship to u, the correlation between y and u will be exactly the same as if the parts of y without any relationship had not been present.
That final observation is the key to finding linear models. A strong correlation between a system's input sequence and output sequence indicates a strong linear relationship.[2]
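A quick numeric check of the separability property, as a sketch; the gain of 0.5 on the linearly related part is an arbitrary choice for illustration:

% Separability: the unrelated part of y contributes nothing.
n = 100000;
u = randn(n,1);
v = randn(n,1);                    % part with no linear relationship to u
y = 0.5*u + v;                     % linearly related part plus unrelated part
mean(u .* y)                       % about 0.5 ...
mean(u .* (0.5*u))                 % ... same as with the unrelated part removed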
Obfuscation by autocorrelation
Suppose we take the degenerate case of a system that merely takes the input value and reports that as the output value at the next time step (a very minimal system of order one). From what we already know, the correlation between the input and output values at time shift 0 will be zero if the input sequence u is made equal to a pure random white noise sequence r.
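Here is a sketch of that experiment in Octave, with the delay system and the white noise sequence built inline:

% Minimal order-one system: output equals input delayed one step.
n = 100000;
r = randn(n,1);                    % white noise input, u = r
y = [0; r(1:n-1)];                 % y(k) = u(k-1)
mean(r .* y)                       % correlation at shift 0: near zero
mean(r(1:n-1) .* y(2:n))           % correlation at shift 1: strong, about 1.0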
But now, suppose we substitute for u the filtered band-limited signal generated by the rsig20 function that we discussed last time. Now when we measure the input-to-output correlation at shift 0, there is a strong correlation. What happened? Well, the system hasn't fundamentally changed how it responds to things. The apparent correlations only reflect the fact that consecutive input signal values share a similarity to each other.
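To see this numerically, here is a sketch using a stand-in for rsig20 (a plain 20-tap moving average of white noise; the real rsig20 function differs in its coefficient details, but the character is the same):

% The same delay system, excited with a band-limited signal.
n = 100000;
r = randn(n,1);
u = filter(ones(20,1)/20, 1, r);   % stand-in for rsig20: 20-tap smoothing
y = [0; u(1:n-1)];                 % y(k) = u(k-1), system unchanged
mean(u .* y)                       % shift-0 correlation: now clearly nonzero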
Let's state this more formally. We want to evaluate the crosscorrelation between input signal u and output signal y, estimated as the average of N term-by-term products:

corr(u,y) ≈ (1/N) Σk uk·yk

We know that the u signal is generated from a random number sequence r by convolving with a known filter vector H with coefficients h:

uk = Σj hj·r(k-j)

Now perform some algebraic manipulations on this expression. Substituting the convolution into the correlation sum and exchanging the order of the summations:

corr(u,y) ≈ (1/N) Σk Σj hj·r(k-j)·yk = Σj hj·[ (1/N) Σk r(k-j)·yk ] = Σj hj·corrj
This tells us what has happened to our correlations. The corrj terms are values of correlation calculated when the input and output signals are shifted by j positions. This says that the results using the filtered signal are the same as you would get by exciting the system with the unfiltered white noise sequence r (without the band-limiting filtering) and then applying the band-limiting filter to the correlation sequence.
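This prediction can be verified numerically. A sketch, again with a 20-tap moving average standing in for the band-limiting filter: the left-hand side measures the correlation using the filtered input directly, while the right-hand side applies the coefficients h to the r-to-y correlation terms.

% Verify: corr(u,y) equals the h-weighted sum of the corrj terms.
n = 200000;
h = ones(20,1)/20;                 % stand-in filter coefficients
r = randn(n,1);                    % underlying white noise sequence
u = filter(h, 1, r);               % band-limited input signal
y = [0; u(1:n-1)];                 % one-step delay system output
lhs = mean(u .* y);                % correlation of u and y at shift 0
rhs = 0;
for m = 1:20
  s = m - 1;                       % shift between r and y for this term
  rhs = rhs + h(m) * mean(r(1:n-s) .* y(1+s:n));
end
[lhs, rhs]                         % the two estimates agree closely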
This filtering is a smoothing or low-pass operation that will obscure high-frequency effects in the correlation sequence. That is mostly a good thing, since variations that occur very rapidly are presumed to be unrelated to the system responses we care about. (Otherwise, a higher sample rate is called for.) However, it also has the side effect of attenuating any physically real variations at those same high frequencies. You want your sampling rate fast enough that meaningful variations do not fall into that attenuated range.
The filtering is very predictable. In fact, we know that the rsig20 band-limiting filter function has side effects that are limited to 20 shift positions. Let's verify this with Octave.
Generate a random signal using the rsig20 command. Now calculate estimates of the autocorrelation in this signal at time shifts of 0 samples, 1 sample, 2 samples, and so on.
nterms = 25000;                      % products averaged per estimate
ncorr  = 30;                         % number of shift positions examined
nsum   = nterms + ncorr;
rs = rsig20(nsum);                   % band-limited random test signal
cm = zeros(ncorr,1);
for icol = 1:ncorr
  acc = 0.0;                         % accumulate the product terms
  for irow = 1:nterms
    acc = acc + rs(irow)*rs(irow+icol-1);
  end
  cm(icol) = acc/nterms;             % correlation estimate at shift icol-1
end
cm
Here are the calculated correlation terms.
 1:   0.2487413   0.2355682   0.1995715   0.1499528   0.0980362
 6:   0.0533403   0.0210098   0.0014457  -0.0082210  -0.0119388
11:  -0.0129718  -0.0131467  -0.0130204  -0.0124904  -0.0113681
16:  -0.0096733  -0.0076556  -0.0056736  -0.0040548  -0.0030012
21:  -0.0025524  -0.0025997  -0.0029380  -0.0033386  -0.0036193
etc.
It is evident that from shift position 20 onward (the 21st term of cm and beyond), all that is left of the correlations is residual noise. (With an infinite data set, those positions would go to zero exactly.) When you see this particular pattern, it can be recognized as a side effect.
Getting practical with correlation
The band-limit filtering will suppress small and rapid variations but has no effect on correlation relationships over an extended interval. Since the band-limit filtering cannot produce any effects more than 20 terms removed, and the original white input data prior to filtering contributes no relationship at all from one term to the next, any wider patterns that appear in the correlation must result from state effects within the system. Depending on just how slow the system responses are, the effects could persist visibly for many hundreds of samples. The starting point for finding these patterns is to obtain the input/output data sets for your system.
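As a concrete starting point, here is a sketch of scanning a recorded data set for these wider patterns. The sequences uin and yout are hypothetical recorded input/output signals, and the lag range is an arbitrary choice:

% Scan input/output records for correlation over an extended range.
% uin, yout: recorded input and output sequences (hypothetical names).
nlags  = 200;                      % how far out to look (arbitrary)
nterms = length(uin) - nlags;      % products averaged per estimate
cxy = zeros(nlags+1, 1);
for s = 0:nlags
  cxy(s+1) = mean(uin(1:nterms) .* yout(1+s:nterms+s));
end
plot(0:nlags, cxy)                 % look for patterns persisting past shift 20
xlabel('shift'); ylabel('correlation estimate');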
Footnotes:
[1] Only a cursory review is provided here. For a more complete survey of correlation analysis, you can start at the Wikipedia article on correlation functions.
[2] I've always found it curious that the way to detect a linear relationship is by using quadratic calculations. Would a quadratic relationship reveal itself using third-order calculations? Feel free to explore the seldom-visited world of higher-order statistics.