05-22-09 - Fast Means

I know of three fast ways to track the mean of a variable. At time t you observe a value x_t. You want to track the mean of x over time quickly (eg. without a divide!). I generally want some kind of local mean, not the full mean since time 0.

1. "weighted decay". You want to do something like M_t = 0.9 * M_t-1 + 0.1 * x_t;

int accum = 0;

accum := ((15 * accum)>>4) + x_t;

mean = (accum>>4);

This is the same as doing the true (total/count) arithmetic mean, except that once count gets to some value (here 16) then instead of doing total += x_t, instead you do total = (15/16) * total + x_t; so that count sticks at 16.

2. "sliding window". Basically just track a window of the last N values, then you can add them to get the local mean. If N is power of 2 you don't divide. Of course the fast way is to only track inputs & exits, don't do the sum all the time.

int window[32];
int window_i = 0;
int accum = 0;

accum += x_t - window[window_i];
window[window_i] = x_t;
window_i = (window_i + 1) & 0x1F;

mean = (accum>>5);

3. "deferred summation". This is the technique I invented for arithmetic coding long ago (I'm sure it's an old technique in other fields). Basically you just do a normal arithmetic mean, but you wait until count is a power of two before doing the divide.

int accum = 0;
int next_count = 0;
int next_shift = 3;

accum += x_t;

if ( --next_count == 0 )
    mean = accum >> next_shift;
    accum = 0;
    next_shift = MIN( 10 , next_shift+1 );
    next_count = 1 << next_shift;

mean is not changing during each chunk, basically you use the mean from the last chunk and wait for another power-of-2 group to be done before you update mean. (also this codelet is resetting accum to zero each time but in practice you probably want to carry over some of the last accum).

For most purposes "deferred summation" is actually really bad. For the thing I invented it for - order 0 coding of stable sources - it's good, but that application is rare and not useful. For real context coding you need fast local adaptation and good handling of sparse contexts with very few statistics, so delaying until the next pow-2 of symbols seen is not okay.

The "weighted decay" method has a similar sort of flaw. The problem is you only have one tune parameter - the decay multiplier (which here became the shift down amount). If you tune it one way, it adapts way too fast to short term fluctuations, but if you tune it the other way it clings on to values from way way in the past much too long.

The best almost always is the sliding window, but the sliding window is not exactly cheap. It takes a lot of space, and it's not awesomely fast to update.


ryg said...

Variant 4: Use an IIR lowpass filter.

The general form of an IIR filter is out(t) = a_0 * in(t-0) + a_1 * in(t-1) + ... + a_m * in(t-m) + b_1 * out(t-1) + ... + b_n * out(t-n); Variant 1 is a very simple (and bad) special case of this, but the points is that there's tons of know-how on designing good lowpass filters (=moving averages) that have a frequency response close to 1 in the passband (=the values you want to average over are summed with equal weight) and sharp cutoff (=the weights from "old" samples rapidly decrease past a certain point). IIR filter design is a fairly basic topic in DSP and control theory, so there's actually lots of usable introductory material around. A nice no-nonsense introduction to filter design is "Yehars digital signal processing tutorial for the braindead".

cbloom said...

I dunno, my experience with signal processing filters has not been positive. I understand the basic theory, but turning it into something actually useful just seems like black magic.

cbloom said...

I mean I'd love to have some little code to get a moving average with proper filter code, especially if it has parameters I can tweak to control how wide the filter is and how sharp the cutoff and all that. But every page I find on the net is like "you take the fourier transform..." ugh.

old rants