Tuesday 29 December 2015

signal analysis - Need a better step detection algorithm


I have a time series with lots of steps/jumps (data file here). A plot is given below. I would like to subtract an appropriate value for each of these square wave features to bring them back down to the baseline of the signal. A median filter works really well for removing a small number of outliers in a row, but in this case I probably need a different approach since the square wave jumps can have different durations as seen. A common method I've seen for doing this is to compute first differences of adjacent samples, and look for large differences to detect jumps. I implemented this method but the problem is it often fails, since the one tunable parameter for the method is a threshold value $t$ which the first differences must cross in order to detect a jump: $$ | x_{i+1} - x_i | > t $$ As can be seen in the plot below, the jumps I have are often different sizes, so a constant threshold value isn't the best approach. In particular, in some cases there is an interesting signal where adjacent samples can change by large values without being a jump! I have highlighted such a region in red.


Time series



Below is a zoomed in view of the red box area. You can see there is a square wave jump followed by an interesting signal. The red arrows depict a place where adjacent samples from an interesting signal have a larger distance between them than some of the jumps in the signal. Therefore a constant threshold method with finite differencing will not work for me.


Does anyone know of a robust procedure to detect and subtract the square wave jumps to end up with a smoothly varying signal with no jumps? I'm sure this must be a solved problem but I haven't had much luck searching online.


enter image description here



Answer



I managed to get a fairly reliable solution to this problem using the following steps:



  1. Smooth and differentiate the signal with a 2nd order Gaussian filter: $$ y(t) = \frac{d^2}{dt^2}G_{\sigma}(t) * x(t) $$

  2. Search for zero crossings in $y(t)$ and record their positions (a first derivative filter will show peaks at each edge/jump. A second order filter will have zero crossings at the positions of the edge/jump).

  3. Loop through the detected zero crossings $i$ (i.e. edge positions) and push the magnitude $|x_{i+1}-x_i|$ into a double-ended queue. I keep a cumulative sum of all elements in the queue extending backwards in time by some maximum amount (dt_max). If the next jump magnitude $|x_{i+1}-x_i|$ is within some relative error tolerance of the cumulative sum, the algorithm determines that a complete sequence of jumps has occurred leading back to the original signal baseline.

  4. When a complete jump sequence is found, offsets are calculated for each portion of the jump to bring all the segments down to the signal baseline. They are popped off the back of the double-ended queue as they are processed.



This procedure is not perfect, and still misses a few jumps, but manages to capture most of them, even in the presence of significant noise.


No comments:

Post a Comment

readings - Appending 内 to a company name is read ない or うち?

For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...