
Least Squares Filtering and Smoothing, including Savitzky-Golay


This page describes filtering using a least squares curve fit to the data over time, including the Savitzky-Golay filter. It is part of the section on Filtering in A Guide to Fault Detection and Diagnosis.

Overview

Another “moving average” (MA) approach to filtering is to perform a least squares curve fit to the input data over a moving time window. Because the filter output depends only on recent input values, it falls in the general category of MA filters, also called finite impulse response (FIR) filters. You can estimate the current value (“filtering”) or a previous value anywhere within the time window of the input data (“smoothing”).
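
As a concrete illustration, here is a minimal Python sketch of this kind of moving-window least squares estimator. The function name, the default 9-sample window, and the use of numpy's polyfit are illustrative choices, not part of any particular product: the window slides over the most recent samples, and the fitted polynomial is evaluated either at the newest sample (“filtering”) or at the middle of the window (“smoothing”).

    import numpy as np

    def least_squares_filter(x, window=9, order=1, mode="filter"):
        # Moving-window least squares estimate of a sampled signal.
        #   x      : 1-D sequence of input samples, equally spaced in time
        #   window : number of most recent samples used in each fit
        #   order  : polynomial order of the fit (1 = straight line)
        #   mode   : "filter" estimates the value at the newest sample;
        #            "smooth" estimates the value at the middle of the window
        #            (a point half a window back in time).
        x = np.asarray(x, dtype=float)
        t = np.arange(window)                    # sample positions within the window
        t_eval = window - 1 if mode == "filter" else (window - 1) / 2.0
        y = x.copy()                             # pass inputs through until a full window exists
        for k in range(window - 1, len(x)):
            coeffs = np.polyfit(t, x[k - window + 1 : k + 1], order)
            y[k] = np.polyval(coeffs, t_eval)
        return y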

Overshoot in the step response for least squares filtering

Like any MA filter, it completes a step response in finite time:

[Figure: least squares filter step response, showing overshoot]

The example in this plot used a filter based on the nine most recent input values, sampled at equal time intervals, with a linear (straight-line) curve fit. Filters based on higher-order polynomials can also be used, but are not recommended for filtering (they may be fine for smoothing).

The step response for a least squares filter shows “overshoot” in response to a sudden input change: the filter output can change more than the input for a period of time. This can lead to quicker detection of problems, but it also can lead to temporary false alarms. For this reason, caution should be exercised in using these filters. The above comments on overshoot only apply to filtering to estimate the current value. This will not be a problem when “smoothing” to estimate the value at the midpoint of the time window.

A benefit of a linear least squares filter is that it will exactly track a ramp input without the lag associated with other filters, once all the input data include the ramp.
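
A quick check of both behaviors, reusing the least_squares_filter sketch above (the 9-sample window matches the plotted example; the specific signals here are made up for illustration):

    import numpy as np

    # Step input: the linear 9-point filter temporarily overshoots 1.0.
    step = np.concatenate([np.zeros(10), np.ones(20)])
    print(least_squares_filter(step, window=9).max())    # prints a value above 1.0

    # Ramp input: once the whole 9-sample window lies on the ramp,
    # the filter output equals the ramp exactly, with no lag.
    ramp = np.arange(30, dtype=float)
    out = least_squares_filter(ramp, window=9)
    print(np.allclose(out[9:], ramp[9:]))                # True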

Least squares filter frequency response

Least squares filters are best used for slowly changing variables, because they can give quirky results for signals with higher-frequency content. (A step input can be thought of as containing all frequencies.) Higher-order polynomial filters should probably be avoided for filtering, because their response to higher frequencies is even more quirky. This is less of an issue for smoothing.

Savitzky-Golay filtering and smoothing when sampling at a fixed time interval

When sampling analog data at fixed time intervals, it is not necessary to solve a full least squares problem at each step. Each recent input value can simply be multiplied by a constant coefficient that depends only on the number of points in the filter. This surprising result means that the filter form and parameters are independent of the input data, so the approach is far less computationally intensive at run time. These constant coefficients were first derived by Savitzky and Golay in 1964 (the name is occasionally misspelled Savitsky-Golay). The original paper only addressed smoothing, not filtering, and is summarized well in the Wikipedia article. However, simple formulas exist for coefficients to estimate the value at either the current time (“filtering”) or at the midpoint of the time window (“smoothing”). The slope (time derivative) is also easily estimated with these filters, as described later in the section on estimating derivatives. The fixed coefficients for filtering with between 2 and 13 input data points are shown below:

 

[Tables: least squares filter coefficients for 2 to 7 data points, and for 8 to 13 data points]

 

For instance, when using 4 data points, the Savitzky-Golay filter is:
y(k) = 0.7 x(k) + 0.4 x(k-1) + 0.1 x(k-2) - 0.2 x(k-3)

where
x(k) is the newest (current) input
x(k-1) is the previous input
x(k-2) is the next oldest input
x(k-3) is the oldest input used for the 4-point filter
y(k) is the newest (current) filtered (output) value.

The special case of a 2-point filter is just included for completeness - you wouldn’t ever need to use that. The least squares fit of a straight line for two points goes through both those points, so the least squares estimate is simply the newest input. 
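
If the tables are not readable here, the filtering coefficients for a straight-line fit can be generated from a standard closed form, c(i) = 2(2n - 1 - 3i) / (n(n + 1)), where i = 0 is the newest sample and i = n - 1 the oldest. This formula is an addition here rather than something taken from the tables, but it reproduces the 4-point example above. A minimal Python sketch:

    from fractions import Fraction

    def sg_filter_coefficients(n):
        # Coefficients for a linear least squares "filtering" estimate
        # (value at the newest sample) over n equally spaced samples.
        # Index 0 is the newest sample, index n-1 the oldest.
        return [Fraction(2 * (2 * n - 1 - 3 * i), n * (n + 1)) for i in range(n)]

    for n in range(2, 14):
        c = sg_filter_coefficients(n)
        assert sum(c) == 1                      # coefficients always sum to exactly 1
        print(n, [float(ci) for ci in c])

    # n = 2 gives [1.0, 0.0]: the estimate is just the newest input, as noted above.
    # n = 4 gives [0.7, 0.4, 0.1, -0.2], matching the worked example above.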

The sum of all the coefficients is always 1.0 (within rounding error in the tables above). This is a requirement because the filter output must match a steady input once the input has been steady for the full time window. You can see that the oldest data always enter the filter with negative coefficients, so the coefficients on the newer data must sum to more than one. In response to a step input of 1.0, the output therefore temporarily rises above one. That’s why there is always overshoot in the step response.

For some filter sizes (5, 8, 11), one of the data inputs is completely ignored (its coefficient is zero). For the most effective noise reduction for a given amount of computation, you might as well use a filter one size smaller or larger. If you want to cover a time period that would require more than 13 data points, consider sampling more slowly at the input to this filter, taking care to avoid aliasing by first filtering with an exponential filter at the higher sample rate (see the sketch below).
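
One rough sketch of that idea (the filter constant and the 4:1 decimation factor here are arbitrary placeholders, to be chosen for the actual sample rates): run a simple exponential filter at the fast rate, then keep only every Nth output as the input to the least squares filter.

    import numpy as np

    def exponential_filter(x, alpha=0.2):
        # First-order exponential filter: y[k] = alpha*x[k] + (1 - alpha)*y[k-1].
        # alpha is a placeholder; pick it based on the fast sample rate.
        x = np.asarray(x, dtype=float)
        y = np.empty_like(x)
        y[0] = x[0]
        for k in range(1, len(x)):
            y[k] = alpha * x[k] + (1 - alpha) * y[k - 1]
        return y

    fast = np.random.default_rng(0).normal(size=400)   # fast-rate raw samples (made up)
    slow = exponential_filter(fast)[::4]               # pre-filter, then keep every 4th sample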

The Savitzky-Golay smoother has advantages over the filter version. One big benefit is that there is no lead or lag - and no overshoot, as illustrated in the sketch below. Also, the area under the curve is preserved. In general, when filtered values are not being used immediately for control, and a delay in the application can be tolerated, smoothing will be better. Diagnostics can often tolerate some delay; the wait is worthwhile if false alarms can be avoided.
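
For a straight-line fit, the smoothed estimate at the midpoint of an odd-length window is simply the average of the window, so its step response can never exceed the input range. A minimal sketch (the 9-sample window is illustrative):

    import numpy as np

    # Smoothing with a 9-point window: for a linear fit, the midpoint estimate
    # equals the window average, so there is no overshoot - only a delay of
    # half the window (4 samples here).
    step = np.concatenate([np.zeros(10), np.ones(20)])
    window = 9
    smoothed = np.convolve(step, np.ones(window), mode="valid") / window
    print(smoothed.max() <= 1.0)    # True: the smoothed step never overshoots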

 

Copyright 2010 - 2015, Greg Stanley
