Sunday 1 January 2017

matlab - how does a filter remove noise of the same frequencies (LMS)?


I am sharing my understanding below. Please correct and guide me if I am wrong; I can't tell where my assumptions go wrong.


We can decompose a voice signal or a noise signal into a number of sinusoidal signals of different frequencies using the Fourier transform, from which we can extract our desired frequencies.


So how do we filter noise from voice when they share the same frequencies?


Let's say the particular signal we want has the following frequencies:



Say the letter "A" spoken by man x has the following frequencies when decomposed. (Am I right that each letter or word we speak is a mix of different frequencies, depending on who speaks it, even for a single letter?)


$$f_\mathbf{A,x}= \{200,\,900,\,1200\}\ \text{Hz}\\ f_\text{noise}=\{300,\,900,\,1200\}\ \text{Hz}$$


Even where they share a frequency, the amplitudes may differ.


So to reduce the noise, although we can't eliminate whole frequencies (the frequencies are almost the same, so no bandpass filter etc. will help), we can reduce the amplitude of the noise relative to the amplitude of the desired signal, so that the noise effectively disappears.


So how do we reduce the noise amplitude of a particular frequency component?


In the above example, let's say the 900 Hz component of the desired signal "A" has an amplitude of 2.5 and the noise has an amplitude of 6. So how do we reduce this noise amplitude to eliminate the noise from the signal?


How does a filter even distinguish noise amplitude from signal amplitude? And how does it reduce only the noise amplitude? (They are at the same frequency, so although there are many algorithms like LMS, Wiener, etc., how do they pick out the noise amplitude and remove it from the signal amplitude at the same frequency? What other parameter are they using to distinguish them?)


When I looked at adaptive LMS, it says:




    error (gives voice) = mixedsignal (voice + noise) - w (filter coefficient) * noisereference
        (the noise reference is close to the noise; let's say here it is exactly equal to the noise, so the noise can be cancelled completely)
    mu = convergence factor
    w = w + 2 * mu * error * noisereference
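To make the update loop concrete, here is a minimal MATLAB sketch of it. Everything here is invented for illustration (the tone frequencies, the broadband noise, the step size mu = 0.001, and the assumption that a reference correlated with the noise but not with the voice is available):

    % Minimal single-tap LMS noise-canceller sketch; all values are illustrative.
    fs = 8000;  t = (0:fs-1)'/fs;           % one second at 8 kHz
    voice = 2.5*sin(2*pi*900*t) + sin(2*pi*200*t) + sin(2*pi*1200*t);
    noise = 6*randn(size(t));               % broadband noise overlapping the voice band
    noiseref = noise/6;                     % correlated with the noise, NOT with the voice
    mixed = voice + noise;                  % what the microphone picks up

    mu = 0.001;                             % convergence factor (step size)
    w  = 0;                                 % single filter coefficient, starting at 0
    err = zeros(size(mixed));               % the "error" output approximates the voice
    for n = 1:length(mixed)
        err(n) = mixed(n) - w*noiseref(n);  % error = mixed - scaled noise reference
        w = w + 2*mu*err(n)*noiseref(n);    % the LMS update from the formula above
    end

In this sketch w converges toward 6 and err then approximates the voice: the filter separates voice from noise through correlation with the reference, not through frequency, which is why overlapping frequencies are not a problem in this setup.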

From the above formula, the noise is removed from the mixed signal and we get the voice signal.


The filter coefficient is updated from feedback so as to make w * noisereference exactly equal to the noise in the mixed signal.


Starting from the formula with w = 0:


If the voice is stronger than the noise, we get a high error:



    error = voice (high) + noise (low) - w * noisereference


(low for that particular time instant)

So at the next instant (sample) we get an even higher error, since w (the filter coefficient) increases:



    w = w + 2 * mu * error * noisereference

w gets larger, and more is subtracted from the mixed signal to get the desired signal.


What is it really doing? If the voice is strong, it subtracts a lot to remove the noise. And if the voice is weak, we get a smaller error, so smaller filter coefficients, and then it subtracts less at the next time instant to remove the noise from the voice.


Is it subtracting more on the assumption that the next time instant (sample) contains more signal, because the present instant (sample) has a larger signal amplitude?


Even though the present instant or sample is at the same frequency, how is it even separating the amplitude of that particular frequency, and why does it subtract more amplitude if the voice is stronger in that particular sample?


If it is subtracting amplitude, how does it differentiate components of the same frequency? If the voice is stronger, why is it subtracting more at all, and what exactly is it subtracting?



I also came across another variable called phase. I can understand amplitude and frequency in terms of the voice signal, or what we speak, but I can't understand how phase is related to voice and what exactly it does.


In the above example we take a non-stationary signal (one varying with time) and loop over a number of samples. Our aim is to separate the voice from the noise (a background of the same frequencies). I want to know how this formula works.



Answer




We can decompose a voice signal or a noise signal into a number of sinusoidal signals of different frequencies using the Fourier transform, from which we can extract our desired frequencies.



Not 100% true: only for band-limited signals can you say "into a number of sinusoidal signals", because, for example, "white noise" by definition has uncountably infinite frequency content.


However, you're talking about digitized signals, which inherently imposes a band-limitation, and thus the frequencies necessary to represent the signal are finite in number.
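As a quick illustration of that finiteness (the sample rate, length and test tone are arbitrary choices): N samples are fully described by exactly N DFT bins.

    fs = 8000; N = 8000;                   % arbitrary sample rate and length
    t = (0:N-1)'/fs;
    x = sin(2*pi*900*t) + 0.5*randn(N,1);  % a tone buried in noise
    X = fft(x);                            % exactly N frequency bins, no more
    f = (0:N-1)*fs/N;                      % bin frequencies in Hz
    plot(f(1:N/2), abs(X(1:N/2))/N);       % one-sided magnitude spectrum
    xlabel('Frequency (Hz)'); ylabel('|X(f)|/N');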



So how do we filter noise from voice when they share the same frequencies?




Not with what is usually meant when people say "filter", at all.


You just "trim" away the noise in areas where we know that there's no desired signal, and thus, for the whole spectrum, the noise power drops, but the signal power stays the same, thus increasing Signal-to-Noise-Ratio (SNR).


Thus, your writing $f_\text{noise}=f_\mathbf{A,x}$ is the main misunderstanding here. Noise will be present everywhere, including where there's no speech signal. It would be more accurate to understand it as $f_\mathbf{A,x}\subset f_\text{noise}$, with a lot more frequency content in $f_\text{noise}$ than in $f_\mathbf{A,x}$.
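A rough MATLAB sketch of that trimming idea (the band edges, noise level and tones are invented, and butter/filter assume the Signal Processing Toolbox):

    fs = 8000; t = (0:fs-1)'/fs;
    voice = sin(2*pi*200*t) + sin(2*pi*900*t) + sin(2*pi*1200*t);
    noise = 2*randn(size(t));                  % broadband: present everywhere in the spectrum
    mixed = voice + noise;

    % Keep only 150-1300 Hz, where (we assume) the speech content lives.
    [b,a] = butter(4, [150 1300]/(fs/2), 'bandpass');
    trimmed = filter(b, a, mixed);

    % In-band noise is untouched, but out-of-band noise is gone, so overall
    % SNR improves even though nothing was removed at the speech frequencies.
    snr_before = 10*log10(sum(voice.^2)/sum(noise.^2));
    noise_left = filter(b, a, noise);          % the noise that survives the trim
    snr_after  = 10*log10(sum(voice.^2)/sum(noise_left.^2));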



In the above example, let's say the 900 Hz component of the desired signal "A" has an amplitude of 2.5 and the noise has an amplitude of 6. So how do we reduce this noise amplitude to eliminate the noise from the signal?



Simple: we cannot do that. That's where speech models come into play, which not only assume that voice is composed of different tones, but also model its temporal and statistical behaviour.


Also, note that an SNR where the noise dominates even when considering only the frequencies belonging to the actual signal is typically very bad; you can't decode the result unless you can observe the desired signal for long enough to gain information through processing gain (e.g. accumulating observations, so that uncorrelated noise tends to cancel out while the correlated voice adds up).
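A minimal sketch of that processing-gain idea, under the strong (and here invented) assumption that the desired waveform repeats identically in every observation:

    fs = 8000; t = (0:fs-1)'/fs;
    voice = sin(2*pi*900*t);               % assumed identical in every observation
    K = 100;                               % number of repeated observations
    acc = zeros(size(voice));
    for k = 1:K
        acc = acc + (voice + 6*randn(size(voice))); % fresh, uncorrelated noise each time
    end
    avg = acc/K;                           % voice adds coherently, noise averages out
    % Noise power drops roughly K-fold: about 10*log10(K) = 20 dB of SNR gain here.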





Personal note, not really part of the answer: I think you're trying to solve complicated problems before you've fully understood your signal model. I'd very strongly recommend getting a cup of tea and reclining with a book on signals. Your mathematical notation for frequency content was terrible before I edited your question, and that might be an indication of further misunderstandings.


Also, I'd encourage you to play with signals a bit. Get a microphone, record the vowel "A" with very little noise, and use MATLAB, Audacity or whatever to look at its spectrum. Then extract different parts of the recording and look at the spectrum of those alone (at the start, end and middle, for example).


Then use MATLAB to generate a noise signal with an average amplitude, say, twice as high as that of the dominant frequencies in the speech signal, and add this noise to your speech recording. Can you still understand anything?
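That experiment might look roughly like this in MATLAB ('vowel_A.wav' is a hypothetical file name for your own recording, and the factor of 2 is the arbitrary noise level suggested above):

    [x, fs] = audioread('vowel_A.wav');  % hypothetical recording of the vowel "A"
    x = x(:,1);                          % use one channel
    A = max(abs(x));                     % rough amplitude of the dominant content
    noisy = x + 2*A*randn(size(x));      % add noise about twice as strong
    soundsc(noisy, fs);                  % listen: can you still understand it?

    % Compare the spectra of the clean and noisy versions.
    N = length(x); f = (0:N-1)*fs/N; half = 1:floor(N/2);
    X = abs(fft(x)); Y = abs(fft(noisy));
    plot(f(half), X(half), f(half), Y(half));
    legend('clean', 'clean + noise'); xlabel('Frequency (Hz)');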


The point here is that most of your misunderstandings could have been resolved in literature you should read anyway before starting to process speech, or by simple experimentation (which in turn could have led to very precise questions that you could ask here!).



For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...