Wednesday, 23 November 2016

audio - Why are MFCCs of two equivalent signals completely different?



I have an audio file for which I calculate 16 MFCCs in R (actually 15, because I omit the first one). When I stream this file via VLC-Player and an Icecast2 server, receive it in Java (with the Player of the Javazoom library), and then pass it back to R for MFCC calculation, I get completely different values. Does anyone have an idea why this could be?


Additional info:



  • When I write the received data to a file again and view it next to the original file, the two look almost the same (waveform and spectrogram).

  • The file has a length of 3 seconds and contains the sound of a passing vehicle (car).

  • MFCCs are calculated over the central 44100 samples of the file.

  • I need valid data in order to automatically classify cars and trucks with an SVM. This classifier is trained with features calculated from 150 audio files (each containing one vehicle).

  • Since R had problems reading and processing mp3 files directly, I first converted wav to mp3 and back again (in order to "simulate" the loss of information due to compression).


[Figure: MFCC values]



Furthermore, I automatically detect the vehicles in a continuous audio stream, so (for the particular example data above) the waveforms used for calculating the MFCCs are offset by about 4400 samples. Does this matter with an analysis window of 44100 samples?



Answer



You say:



I stream this file via VLC-Player and Icecast2-Server, receive it in Java (with the Player of Javazoom-Lib)



When you receive the stream in Java, what is the stream format?


You say that it is the "same" because the waveform and/or spectrogram look similar, but the MFCCs will come out different if the stream format (sample rate, bit depth, etc.) differs.


Can you confirm by printing out the stream formats:




  1. Before sending to Java

  2. Upon receiving in Java?


I recommend using soxi (the file-info tool that ships with SoX) to print out the stream format, if you cannot otherwise dump it from R or Java.
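If you would rather dump the format from Java, here is a minimal sketch using the standard `javax.sound.sampled` API. It assumes that both the original and the received audio have been written to WAV files; the class name `FormatDump` and the helper `describe` are made up for illustration and are not part of the Javazoom library.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.Locale;

class FormatDump {
    // Summarize the format fields that can change the computed MFCCs.
    static String describe(AudioFormat f) {
        return String.format(Locale.ROOT,
                "encoding=%s rate=%.0f Hz bits=%d channels=%d bigEndian=%b",
                f.getEncoding(), f.getSampleRate(), f.getSampleSizeInBits(),
                f.getChannels(), f.isBigEndian());
    }

    // Usage (file names are placeholders): java FormatDump original.wav received.wav
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            AudioFileFormat aff = AudioSystem.getAudioFileFormat(new File(path));
            System.out.println(path + ": " + describe(aff.getFormat()));
        }
    }
}
```

If the two printed lines differ in sample rate, bit depth, or channel count, that mismatch is the first thing to fix before comparing MFCCs.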


If you can eliminate stream format issues and mp3<->wav issues (deal only in wav), and if your analysis windows are identical (as jojek says), then your MFCCs should come out identical.
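To check whether the analysis windows really are identical, a rough sketch along the following lines may help. It assumes both signals are already decoded into `double[]` arrays at the same sample rate; the class `WindowCheck` and its methods are hypothetical helpers, and the brute-force cross-correlation is only one simple way to estimate the offset between two recordings.

```java
class WindowCheck {
    // Start index of a window of length win centered in a signal of length n.
    static int centeredStart(int n, int win) {
        return (n - win) / 2;
    }

    // Fraction of samples shared by two equal-length windows
    // whose start positions differ by offset samples.
    static double overlap(int win, int offset) {
        return Math.max(0, win - Math.abs(offset)) / (double) win;
    }

    // Lag in [-maxLag, maxLag] that maximizes the raw cross-correlation
    // sum of a[i] * b[i + lag]. A positive result means b is delayed
    // relative to a by that many samples.
    static int bestLag(double[] a, double[] b, int maxLag) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double score = 0;
            for (int i = 0; i < a.length; i++) {
                int j = i + lag;
                if (j >= 0 && j < b.length) {
                    score += a[i] * b[j];
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = lag;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Window size and offset taken from the question.
        System.out.println(overlap(44100, 4400));
    }
}
```

With the numbers from the question, a 4400-sample offset leaves the two 44100-sample windows sharing about 90% of their samples, so they are similar but not identical, and bit-identical MFCCs should not be expected until the windows are aligned.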

