Wednesday 7 October 2015

Phase change rate of human speech


I have a .wav file of human speech, sampled at 48kHz, with useful signal up to 5kHz, and almost no noise. If I do Hilbert transform in MatLab, I get the Analytic signal:


$$ s_\mathrm{a}[n] = s_\mathrm{real}[n] + j \cdot s_\mathrm{hi}[n], $$


where $s_\mathrm{hi}[n]$ is a Hilbert transformation of $s_\mathrm{real}[n]$ from the .wav file. Now for each sample I can calculate the phase:


$$ \phi[n] = \operatorname{Arctan}\left( \frac{s_\mathrm{hi}[n]}{s_\mathrm{real}[n]} \right) $$


and take time derivative from it. That will be the speed of phase changing. I don't want to call it "instantaneous frequency", that's too controversial.


Is any smarter way of doing this in MatLab? Probably this:



Fs/(2*pi)*diff(unwrap(angle(hilbert(s_real))));

I did this in Adobe Audition plugin, in Delphi. Way too many spikes, every time when the envelope $\sqrt{s_\mathrm{real}^2[n] + s_\mathrm{hi}^2[n]} $ was around zero.



Answer



I will not address the question whether or not it is meaningful to compute the instantaneous frequency of a speech signal. Instead I will show you a better method for computing the instantaneous frequency from a given analytic signal. This method avoids the phase unwrapping problem by directly computing the instantaneous frequency from the real and imaginary parts of the (discrete-time) analytic signal


$$s[n]=s_R[n]+js_I[n]\tag{1}$$


where


$$ s_I[n] = \mathscr{H} \big\{ s_R[n] \big\} $$


with $\mathscr{H}\{\cdot\}$ denoting the (discrete) Hilbert transform.


First, note that the formula for computing the phase given in your question only works if the real part of the analytic signal is greater than zero. In general you have to use the atan2 function implemented in most programming languages. In Matlab you can of course use angle, as shown in your example code.



Second, the phase becomes irrelevant and undefined if the magnitude of the respective complex number is (close to) zero. So it's pointless trying to compute the phase angle of a complex number with a negligible magnitude. This is true in general, so it doesn't matter how smart your method for computing the instantaneous frequency is.


Let's define the instantaneous frequency of a discrete-time analytic signal as the forward difference


$$f[n]=\frac{\phi[n+1]-\phi[n]}{2\pi T}\tag{2}$$


where $\phi[n]$ is the phase angle of the analytic signal $s[n]$, and $T$ is the sampling period. Note that other definitions such as the backward difference or central difference are also possible and useful.


We will not use $(2)$ directly for computing $f[n]$ because it requires phase unwrapping, which is especially problematic in the presence of noise. With


$$s[n]= \Big|s[n]\Big|e^{j\phi[n]}\tag{3}$$


we get


$$s[n+1]s^*[n]=\Big|s[n+1]\Big|e^{j\phi[n+1]}\cdot \Big|s[n]\Big|e^{-j\phi[n]}=\Big|s[n+1]\Big| \Big|s[n]\Big|e^{j(\phi[n+1]-\phi[n])}\tag{4}$$


(where $^*$ denotes complex conjugation), and, consequently,


$$f[n]=\frac{1}{2\pi T}\arg\Big\{s[n+1]s^*[n]\Big\}\tag{5}$$



The argument should be computed using the atan2 function. The result of the argument computation is only defined for $s[n]\neq 0$ and $s[n+1]\neq 0$.


Note that using backward or central differences in the definition of the instantaneous frequency would have resulted in the following expressions:


$$\begin{align}f_b[n]&=\frac{1}{2\pi T}\arg\Big\{s[n]s^*[n-1]\Big\}\\f_c[n]&=\frac{1}{4\pi T}\arg\Big\{s[n+1]s^*[n-1]\Big\}=\frac12\big(f[n]+f_b[n]\big)\end{align}$$


Expressing Eq. $(5)$ in terms of the real and imaginary parts of the analytic signal gives:


$\begin{align} f[n] &=\frac{1}{2\pi T}\arg\Big\{\big(s_R[n+1]+js_I[n+1]\big)\big(s_R[n]-js_I[n]\big) \Big\}\\ \\ &=\frac{1}{2\pi T}\arg\Big\{s_R[n]s_R[n+1]+s_I[n]s_I[n+1]+j\big(s_R[n]s_I[n+1]-s_R[n+1]s_I[n]\big)\Big\}\\ \end{align}\tag{6}$


More details about instantaneous frequency (and instantaneous bandwidth) can be found in the paper "The calculation of instantaneous frequency and instantaneous bandwidth" by A.E. Barnes.


Also take a look at this related answer.


No comments:

Post a Comment

readings - Appending 内 to a company name is read ない or うち?

For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...