audio - Libraries for Voice Activity Detection (Not Speech Recognition)

Saturday, 25 July 2015

As a follow-up to my previous question, I was wondering whether any speech detection libraries exist. By speech detection I mean passing in an audio buffer and getting back the indexes where speech starts and stops. So if I have 10 seconds of audio sampled at 44 kHz, I would expect an array of numbers such as:

This would indicate, for example, that speech starts one second in and finishes at the two-second point, and so on.

What I'm not looking for is speech recognition, which writes out text from the spoken word. Unfortunately, that is mostly what I find when I google 'speech detection'.

It would be great if the library was in C, C++ or even Objective-C as I'm writing an app for the iPhone.

Thanks!

Answer

In my answer to that question, I mentioned that Voice Activity Detection (VAD) is a standard feature of codecs such as G.729.

You should look for reference encoders and decoders for codecs that apply it.

One such example is - http://www.voiceage.com/openinit_g729.php

Another possible source is the Speex codec, which implements VAD.

BTW: You should google "Voice Activity Detection" or "Talk Spurt" rather than "Speech Detection".
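To make the kind of interface you describe concrete, here is a minimal sketch in C. It is not the VAD used by G.729 or Speex, which are considerably more elaborate; it swaps in a plain energy threshold per frame just to show the shape of the result. The names vad_segment and vad_find_segments, the frame length, and the threshold are all hypothetical and would need tuning for real recordings.

```c
#include <stddef.h>
#include <stdint.h>

/* One detected stretch of speech, as sample indexes into the buffer. */
typedef struct {
    size_t start;  /* first sample of the segment */
    size_t end;    /* one past the last sample of the segment */
} vad_segment;

/*
 * Hypothetical helper, for illustration only: splits 16-bit PCM into
 * frames of frame_len samples and treats a frame as "active" when its
 * mean squared amplitude exceeds threshold. Consecutive active frames
 * are merged into one segment. Returns the number of segments written
 * to out (at most max_out). A trailing partial frame is ignored.
 */
size_t vad_find_segments(const int16_t *samples, size_t n_samples,
                         size_t frame_len, double threshold,
                         vad_segment *out, size_t max_out)
{
    size_t count = 0;
    int in_speech = 0;

    if (frame_len == 0) {
        return 0;
    }

    for (size_t pos = 0; pos + frame_len <= n_samples; pos += frame_len) {
        /* Mean energy of this frame. */
        double energy = 0.0;
        for (size_t i = 0; i < frame_len; i++) {
            double s = (double)samples[pos + i];
            energy += s * s;
        }
        energy /= (double)frame_len;

        int active = energy > threshold;

        if (active && !in_speech) {
            if (count == max_out) {
                break;              /* no room for another segment */
            }
            out[count].start = pos; /* speech begins at this frame */
            in_speech = 1;
        } else if (!active && in_speech) {
            out[count].end = pos;   /* speech ended before this frame */
            count++;
            in_speech = 0;
        }
    }

    /* Buffer ended while still inside speech: close at the last full frame. */
    if (in_speech) {
        out[count].end = (n_samples / frame_len) * frame_len;
        count++;
    }
    return count;
}
```

With audio sampled at 44 kHz, speech running from the one-second mark to the two-second mark would come back as a single segment with start 44000 and end 88000. A fixed threshold like this breaks down quickly with background noise, which is why the codec reference implementations are the better place to look.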
