Least Squares Filtering and Smoothing, including SavitzkyGolay
This page describes filtering using a least squares curve fit to the data over time, including the SavitzkyGolay smoother and an analogous filter. This page is part of the section on Filtering that is part of A Guide to Fault Detection and Diagnosis.
Overview
Another “moving average” (MA) approach to filtering of data is to perform a least squares curve fit of the input data. Since this filter just depends on recent input values, it is in the general category of MA filters, also called finite impulse response (FIR) filters. You can estimate the current value (“filtering”) or previous values anywhere within the time window of the input data (“smoothing”).
Overshoot in the step response for least squares filtering
Like any MA filter, a leastsquares filter completes a step response in finite time. (Click the plot for fullsized image.) The example for this plot was a filter based on the nine most recent filter input values, with sampling at equal time intervals. This filter performed a linear curve fit. Filters based on higher order polynomials can also be used, but are not recommended for filtering. Higher order polynomials, usually second or third order, are fine for smoothing.
The step response for a least squares filter shows “overshoot” in response to a sudden input change: the filter output can change more than the input for a period of time. This can lead to quicker detection of problems, but it also can lead to temporary false alarms. For this reason, caution should be exercised in using these filters. The above comments on overshoot only apply to filtering to estimate the current value. This will not be a problem when “smoothing” to estimate the value at the midpoint of the time window.
A benefit of a linear least squares filter is that it will exactly track a ramp input without the lag associated with other filters, once all the input data include the ramp.
Least squares filter frequency response
Least squares filters are best used mainly for slowly changing variables, because they can give quirky results for signals with higher frequencies. (A step input can be thought of as containing all frequencies). Higherorder polynomial filters should probably be avoided for filtering because the response to higher frequencies gets even more quirky, This is less of an issue for smoothing.
SavitzkyGolay filtering and smoothing when sampling at a fixed time interval
When sampling analog data at fixed time intervals for filtering, it is not necessary to solve a full least squares problem. Each recent input value can simply be multiplied by a constant coefficient value that only depends on the number of points in the filter. This surprising result means that the filter form and parameters are independent of the input data. This is far less computationally intensive at run time. This approach to calculate constant coefficients was first derived by Savitzky and Golay in 1962, occasionally misspelled SavitskyGolay. The original paper only addressed smoothing, not filtering, and is summarized well in the Wikipedia article. However, simple formulas exist for coefficients to estimate the value at either the current time (“filtering”), or at the midpoint of the time intervals (“smoothing”). The slope (time derivative) is also easily estimated with these filters, described later in the section on estimating derivatives. The fixed coefficients for linear filtering with between 2 and 13 input data points are shown below:


For instance, when using 4 data points, the linear SavitzkyGolay filter is y(k) = .7 x(k) + .4 x(k1) + .1 x(k2)  .2 x(k3)
where x(k) is the newest (current) input x(k  1) is the previous input x(k2) is the next oldest input x(k3) is the oldest input used for the 4point filter y(k) is the newest (current) filtered (output) value.
The special case of a 2point filter is just included for completeness  you wouldn’t ever need to use that. The least squares fit of a straight line for two points goes through both those points, so the least squares estimate is simply the newest input.
The sum of all the coefficients is always 1.0 (within rounding error in the tables above), is a requirement because the response to a steady input (or step response) must match the output when the input is steady. You can see that the oldest data always enters the filter with negative coefficients, so that the sum of earlier data in response to a step input of 1.0 is greater than one. That’s why there is always overshoot to a step response.
For some filter sizes (5, 8, 11), one of the data inputs is completely ignored in each time step. For the most effective noise reduction for the amount of computation, you might as well use filters one size smaller or larger. If you want to use a time period that requires more than 13 data points, consider sampling more slowly at the input to this filter, taking care to avoid aliasing by first filtering with an exponential filter at the higher sample rate.
The SavitzkyGolay smoother has advantages over the filter version. One big benefit is that there is no lead or lag. There is no overshoot, and the locations of the peaks in curves are maintained. Also, the area under the curve is maintained. In general, when not using filtered values immediately for control, if a delay in the application can be tolerated, smoothing will be better. Diagnostics can often tolerate some delay, worth the wait if false alarms can be avoided.
IIR approximations
A linear least squares filter makes an assumption that the input signal has a constant time derivative (fixed slope). If the input signal becomes a ramp, then this filter output will track the ramp, reaching it exactly by the end of the time window. If the slope of the ramp is changed, the filter output will again approach the new ramp. For that reason, this filter is competitive with double exponential smoothing, or alpha beta filtering, which also make a straight line assumption. Those are IIR approximations of the FIR linear least squares filter.
Copyright 2010  2020, Greg Stanley
Return to Filtering Next: Spike Filtering
Return to A Guide to Fault Detection and Diagnosis
