For instance, when using 4 data points, the Savitzky-Golay filter is
y(k) = 0.7 x(k) + 0.4 x(k-1) + 0.1 x(k-2) - 0.2 x(k-3)
where
x(k) is the newest (current) input
x(k-1) is the previous input
x(k-2) is the next oldest input
x(k-3) is the oldest input used for the 4-point filter
y(k) is the newest (current) filtered (output) value.
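As a minimal sketch of how the 4-point filter could be run on streaming data, the Python class below keeps the last four inputs and applies the coefficients quoted above. The class name `FourPointFilter`, the `update` method, and the choice to return `None` until four inputs have arrived are assumptions made for illustration; they are not part of the original description.

```python
from collections import deque

# Coefficients of the 4-point filter, newest input first.
COEFFS_4PT = (0.7, 0.4, 0.1, -0.2)


class FourPointFilter:
    """Hypothetical streaming wrapper around the 4-point filter equation."""

    def __init__(self):
        # Holds at most the 4 most recent inputs, newest at index 0.
        self._window = deque(maxlen=4)

    def update(self, x):
        """Add the newest input x(k) and return y(k).

        Returns None during startup, before 4 inputs are available
        (an assumed startup policy, chosen for simplicity).
        """
        self._window.appendleft(x)
        if len(self._window) < len(COEFFS_4PT):
            return None
        return sum(c * v for c, v in zip(COEFFS_4PT, self._window))


if __name__ == "__main__":
    f = FourPointFilter()
    for sample in [0.0, 1.0, 2.0, 3.0, 4.0]:
        print(sample, f.update(sample))
```

Feeding in a noise-free ramp is a quick sanity check: because the filter is a straight-line fit evaluated at the newest point, the output should reproduce the newest input once the window is full.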
The special case of a 2-point filter is included just for completeness; you wouldn’t ever need to use it. The least squares fit of a straight line to two points passes through both of those points, so the least squares estimate is simply the newest input.
The sum of all the coefficients is always 1.0 (within rounding error in the tables above). This is a requirement because, once the input is steady, the output must settle at the same value as the input. You can see that the oldest data always enters the filter with negative coefficients, so the coefficients on the newer data sum to more than one. After a step input of 1.0, the output is therefore greater than one until the step has reached the oldest points in the filter. That’s why there is always overshoot in the step response.
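The overshoot can be seen by working through the step response of the 4-point filter quoted earlier. The short Python sketch below assumes the input was 0 for all time before the step and 1.0 afterwards; the function name `step_response` is made up for this illustration.

```python
# Coefficients of the 4-point filter, newest input first.
COEFFS = (0.7, 0.4, 0.1, -0.2)


def step_response(coeffs, n_steps):
    """Output of the filter for an input that is 0 before k = 0 and 1.0 after.

    At step k, only the newest k + 1 inputs (at most len(coeffs)) equal 1.0,
    so the output is the sum of the first k + 1 coefficients.
    """
    return [sum(coeffs[: k + 1]) for k in range(n_steps)]


response = step_response(COEFFS, 6)
print([round(y, 3) for y in response])
# prints [0.7, 1.1, 1.2, 1.0, 1.0, 1.0]
```

The output peaks at 1.2 while the negative coefficient is still multiplying a pre-step zero, then settles at 1.0 once all four points are at the new value, since the coefficients sum to 1.0.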
For some filter sizes (5, 8, 11), one of the data inputs has a coefficient of zero, so it is completely ignored in each time step. For the most effective noise reduction for the amount of computation, you might as well use a filter one size smaller or larger. If you want to cover a time period that requires more than 13 data points, consider sampling more slowly at the input to this filter, taking care to avoid aliasing by first filtering with an exponential filter at the higher sample rate.
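You can check which filter sizes ignore an input by recomputing the coefficients. The Python sketch below assumes the filter is a least squares straight-line fit over equally spaced points, evaluated at the newest point; that assumption reproduces the 4-point coefficients quoted earlier and the 2-point special case, but the tables themselves are not reproduced here. The function names are hypothetical, and exact fractions are used so that a zero coefficient is exactly zero rather than a rounding artifact.

```python
from fractions import Fraction


def endpoint_line_fit_coeffs(n):
    """Weights (newest input first) of a least squares straight-line fit
    over n equally spaced points, evaluated at the newest point."""
    if n < 2:
        raise ValueError("need at least 2 points to fit a line")
    # Sample times, with the newest point at t = 0 and older points at -1, -2, ...
    times = [Fraction(-i) for i in range(n)]
    t_mean = sum(times) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    # The fitted value at t = 0 is sum_i w_i * x_i, where
    # w_i = 1/n + (0 - t_mean) * (t_i - t_mean) / s_tt
    return [Fraction(1, n) + (0 - t_mean) * (t - t_mean) / s_tt for t in times]


def ignored_positions(n):
    """Indices (0 = newest) whose coefficient is exactly zero."""
    return [i for i, c in enumerate(endpoint_line_fit_coeffs(n)) if c == 0]


if __name__ == "__main__":
    for n in range(2, 14):
        coeffs = endpoint_line_fit_coeffs(n)
        print(n, [str(c) for c in coeffs], "ignored:", ignored_positions(n))
```

Under this assumption, the 2-point case shows the same effect in its simplest form: the older point gets a coefficient of zero, which is why that filter just returns the newest input.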
The Savitzky-Golay smoother has advantages over the filter version. One big benefit is that there is no lead or lag, and no overshoot. Also, the area under the curve is maintained. In general, when the filtered values are not needed immediately for control and the application can tolerate a delay, smoothing will be better. Diagnostics can often tolerate some delay, and the wait is worthwhile if false alarms can be avoided.
Copyright 2010-2015, Greg Stanley
