Thursday, 6 August 2015

fft - MFCC feature extraction


I have a block of audio and I have performed an FFT on it. Now what I want to do is convert this FFT into a set of MFCCs, however while I know I need to do something with weighting the FFT samples based upon a set f triangular filters I haven't really got a clue what this involves.


Can anyone go into some good detail on it? If you are going to give me a matlab example please explain each step well as matlab allows you to cut quite a few corners and I'm trying to implement MFCC extraction on iPhone.



Answer



There's a lot of literature on MFCCs on the web, so it would be a bit easier if you could be more specific as to which part of the processing you don't understand. But I'll give an overview of what needs to be done, hoping this is helpful for you:



  1. compute the squared magnitudes of the FFT bins

  2. weigh the bins using triangular windows; usually the windows are chosen such that the centers of the triangles are equidistant on a mel-frequency scale, and such that each triangle begins and ends at the centers of the two adjacent triangles. The mel-frequency scale is defined by $$m=2595\log_{10}(1+f/700)$$ where $f$ is the frequency in Hz. Look at the figure to see how it works: enter image description here Note that the kernels are normalized such that the sum of the weights per triangle equals 1. Usually around 20 such triangular windows are used.


  3. Take the logarithm of the weighted coefficients.

  4. Compute the DCT.


No comments:

Post a Comment

readings - Appending 内 to a company name is read ない or うち?

For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...