The task that I have is to remove the annual and semiannual oscillation from a set of temperature measurements, taken over several years, by means of least squares method.
I found the method described here and since I have to wait for my colleague to first make some analysis on the data and pass them on to me, I decided to make a test run in Matlab, using a signal with two simple sine and cosine functions and random noise.
%% Creating the data
g=2.5;
t=linspace(0,g,g*365*24*60); %n-years of 1min data
o=rand(size(t)); %random noise
p=2*sin(2*pi*t)+1.5*cos(pi*t); %we create two sinusoids
x=p+o; %and add noise to them
%% The method
N=length(x); %no of data
n=(1:N);
T=g*365*24*60; %time that the data covers
f1=1/(365*24*60); %frequency of 1 yr oscillations
f2=1/((365*24*60)/2); %frequency of 1/2 year oscillations
a1=f1*T;
a2=f2*T;
A1=(2/N)*sum(x.*cos((2*pi*a1*n)/N));
A2=(2/N)*sum(x.*cos((2*pi*a2*n)/N));
B1=(2/N)*sum(x.*sin((2*pi*a1*n)/N));
B2=(2/N)*sum(x.*sin((2*pi*a2*n)/N));
C1=sqrt(A1^2+B1^2); %amplitude
C2=sqrt(A2^2+B2^2);
fi1=atan(B1/A1); %phase shift
fi2=atan(B2/A2);
v=(C1*cos(2*pi*t-fi1))+(C2*cos(2*pi*t-fi2)); %the reconstructed set of data
for these two frequencies
r=x-v; %I wasn't sure what was meant by removing, so i just subtracted
%% Visualization
figure %now lets plot the data
plot(t,x)
xlabel('t[min]')
ylabel('Arb. units :)')
title('Data')
figure %and, original signal and noise separately
plot(t,p,'b')
hold on
plot(t,o,'-r')
xlabel('t[min]')
ylabel('Arb. units :)')
title('Data separated')
figure %and once again data (blue), reconstructed signal
plot(t,x,'b-') %(green), and the final result afer subtraction
hold on %(red)
plot(t,v,'g--')
plot(t,r,'r:')
xlabel('t[min]')
ylabel('w/e')
The problem now is, that the reconstructed signal (v) does not resemble the original signal so much. Even when I remove the noise and work only with the pure sinusoid signal, what I get is not really a good representation of it. So, what I'm most interested is:
- Is this at all the correct use of this method? Am I using the correct equations?
- Is there an error somewhere in the code? By error I mostly consider the wrong definition, or calculation of some of the variables.
- In this "ideal case" (sinusoid without noise) should the method be able to reproduce the original curve (almost) exactly?
I hope I've laid out my problem clearly and understandably, if anyone has additional questions, please ask. I know this is rather basic stuff, but my knowledge and experience in this area is very limited (up to non-exsistant), so please bear with me. Thank you in advance for any help.
Answer
Their are some problems in the code and same algorithmic issues: Code:
- You are estimating the wrong frequency:
f2=1/((365*24*60)/2);should bef2=1/((365*24*60))/2;
Algorithm: Basically you are using a fourier series (http://en.wikipedia.org/wiki/Fourier_series) with those terms corresponding to the known frequencies. But you should be aware of the limitations of that approach:
You have to estimate the dc part (frequency 0), because your signals' dc value is not zero!
In theory this approach works only for full periods of each frequency. That's why f1 is estimated quite good, but
f2is not. To work arround that you need either more data, because the more periods are in the data the smaller will be the estimation error. And you should use a windowing function, to get rid of some side effects by cutting the signal at the beginnig and the end.
The improved code could look like this:
clear
doWindow = true;
scaleFreq = 1;
%% Creating the data
g=2.5;
t=linspace(0,g,g*365*24*60); %n-years of 1min data
o=rand(size(t)); %random noise
%we create two sinusoids
p1=2*sin(2*pi*t*scaleFreq);
p2=1.5*cos(1*pi*t*scaleFreq);
p=p1+p2;
x=p+o; %and add noise to them
%% The method
N=length(x); %no of data
n=(1:N);
T=g*365*24*60; %time that the data covers
f1=scaleFreq*1/(365*24*60); %frequency of 1 yr oscillations
f2=scaleFreq*1/((365*24*60))/2; %frequency of 1/2 year oscillations
a1=f1*T;
a2=f2*T;
if(doWindow)
window = hanning(numel(x)).';
else
window = ones(size(x));
end
xWin = x.*window;%windowing
xMean = mean(xWin);%calculate dc
xWin = xWin-xMean;%subtract dc
buildFunkCos1 = cos((2*pi*a1*n)/N);
buildFunkCos2 = cos((2*pi*a2*n)/N);
buildFunkSin1 = sin((2*pi*a1*n)/N);
buildFunkSin2 = sin((2*pi*a2*n)/N);
A1=(1/N)*sum(xWin.*buildFunkCos1)
A2=(1/N)*sum(xWin.*buildFunkCos2)
B1=(1/N)*sum(xWin.*buildFunkSin1)
B2=(1/N)*sum(xWin.*buildFunkSin2)
if(doWindow)%do correction due to windowing
A1 = 2*A1;
A2 = 2*A2;
B1 = 2*B1;
B2 = 2*B2;
end
C1=2*sqrt(A1^2+B1^2)%amplitude
C2=2*sqrt(A2^2+B2^2)
fi1=atan(B1/A1)%phase shift
if(A1<0)
fi1 = fi1 + pi;
end
fi2=atan(B2/A2)
if(A2<0)
fi2 = fi2 + pi;
end
%the reconstructed set of data for these two frequencies
v1=C1*cos(2*pi*t*scaleFreq-fi1);
v2=(C2*cos(1*pi*t*scaleFreq-fi2));
v=v1+v2+xMean;%sum estimates (freq 1,freq 2, dc)
r=x-v; %I wasn't sure what was meant by removing, so i just subtracted
%% Visualization
figure(1) %now lets plot the data
plot(t,[p1;p2;p]);
legend('p1','p2','p');
title('single sine signals and sum');
figure(2) %and, original signal and noise separately
hold off
plot(t,[v1;v2;v]);
legend('v1','v2','v');
title('estimated sine signals and sum');
figure(3) %and once again data (blue), reconstructed signal
hold off
plot(t,x,'b-') %(green), and the final result afer subtraction
hold on %(red)
plot(t,v1+v2,'g--')
plot(t,r,'r:')
xlabel('t[min]')
ylabel('w/e')
legend('measured','reconstructed','result');
figure(4)
hold off
plot(t,[buildFunkCos1;buildFunkCos2;buildFunkSin1;buildFunkSin2]),
legend('buildFunkCos1','buildFunkCos2','buildFunkSin1','buildFunkSin2');
Here you can switch the windowing and also test what happens if more periods are in the signal. Try, scaleFrequ=10 und doWindow=true and the result should be what you are looking for.
No comments:
Post a Comment