Saturday, 25 July 2015

audio - Libraries for Voice Activity Detection (Not Speech Recognition)


As a follow-up to my previous question, I was wondering whether any speech detection libraries exist. By speech detection I mean passing in an audio buffer and getting back the sample indices where speech starts and stops. So if I have 10 seconds of audio sampled at 44 kHz, I would expect an array of numbers such as:


44000
88000
123000
190334
...

This would indicate, for example, that speech starts one second in and finishes at the two-second point, and so on.


What I'm not looking for is speech recognition, which transcribes spoken words into text. Unfortunately, that is most of what I see when I google 'speech detection'.



It would be great if the library were in C, C++ or even Objective-C, as I'm writing an app for the iPhone.


Thanks!



Answer



In my answer to that question, I mentioned that Voice Activity Detection (VAD) is a standard feature of speech codecs such as G.729.


You should look for reference encoders and decoders for codecs that apply it.


One such example is - http://www.voiceage.com/openinit_g729.php


Another possible source is the Speex codec, which implements VAD.


BTW: You should google "Voice Activity Detection" or "Talk Spurt" rather than "Speech Detection".
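
To make the requested output format concrete, here is a minimal sketch of a VAD that returns sample indices where speech starts and stops. It is not the G.729 or Speex algorithm; it assumes a much cruder technique, a fixed threshold on short-time frame energy. The function name detect_speech_boundaries, its parameters, and the threshold value are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from G.729 or Speex): a frame counts as speech
 * when its mean squared amplitude exceeds energy_threshold. Each time the
 * speech/non-speech state flips, the frame's first sample index is written
 * to out. Returns the number of indices written (at most max_out). */
size_t detect_speech_boundaries(const int16_t *samples, size_t n_samples,
                                size_t frame_len, double energy_threshold,
                                size_t *out, size_t max_out)
{
    assert(frame_len > 0);

    size_t n_out = 0;
    size_t start = 0;
    int in_speech = 0;

    /* Walk the buffer in whole frames; a trailing partial frame is ignored. */
    for (; start + frame_len <= n_samples; start += frame_len) {
        double energy = 0.0;
        for (size_t i = 0; i < frame_len; i++) {
            double s = (double)samples[start + i];
            energy += s * s;
        }
        energy /= (double)frame_len;

        int is_speech = energy > energy_threshold;
        if (is_speech != in_speech) {
            if (n_out == max_out)
                return n_out;      /* output array is full */
            out[n_out++] = start;  /* speech starts or stops here */
            in_speech = is_speech;
        }
    }

    /* If speech runs to the end, close it at the last processed sample. */
    if (in_speech && n_out < max_out)
        out[n_out++] = start;

    return n_out;
}
```

The indices come out in start/stop pairs, as in the question's example. At the question's 44,000 samples per second, a frame_len of 440 corresponds to 10 ms frames. A fixed energy threshold is easily fooled by background noise, which is why the codec VADs mentioned above are the better starting point for real use.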

