Thursday 28 April 2016

audio - Identify Where Singing Starts in a Voice Only Recording


I have a small karaoke-style app in which a user sings four lines of a song, with a one-second gap between lines. There is no backing music, so the recording is voice only, which should make the problem easier to solve.


I am looking for the most robust way to detect exactly where in my recording the user starts and ends singing line 1, starts and ends singing line 2, etc.


I have cobbled together a simple-minded algorithm that works when the recording has very little background noise (which is rare in practice), but it falls to pieces as soon as there is even slight noise.
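
The algorithm itself is not shown here. For illustration only, one common simple approach of this kind is short-time energy thresholding against an estimated noise floor. The sketch below assumes mono samples as a list of floats in the range -1 to 1; the function name `find_sung_segments` and every parameter and default value are hypothetical, chosen for the example rather than taken from the app.

```python
import math


def find_sung_segments(samples, sample_rate, frame_ms=20,
                       threshold_db=12.0, min_gap_s=0.5, min_len_s=0.2):
    """Return (start_s, end_s) pairs where frame energy exceeds the noise floor.

    Hypothetical helper: all names and defaults are illustrative assumptions.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return []

    # Short-time RMS level of each frame in dB (small offset avoids log(0)).
    levels = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        levels.append(20 * math.log10(rms + 1e-10))

    # Estimate the noise floor as the 10th-percentile frame level, so the
    # threshold adapts to the recording instead of being a fixed constant.
    noise_floor = sorted(levels)[n_frames // 10]
    active = [lvl > noise_floor + threshold_db for lvl in levels]

    # Collect runs of consecutive active frames (sentinel False closes the last run).
    frame_s = frame_len / sample_rate
    segments = []
    start = None
    for i, is_active in enumerate(active + [False]):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append([start * frame_s, i * frame_s])
            start = None

    # Merge runs separated by less than min_gap_s (breaths, quiet consonants),
    # then drop runs shorter than min_len_s (clicks, coughs).
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap_s:
            merged[-1][1] = seg[1]
        else:
            merged.append(seg)
    return [(s, e) for s, e in merged if e - s >= min_len_s]
```

With a one-second gap between lines, a `min_gap_s` below one second keeps the four lines separate while still bridging short pauses inside a line. A scheme like this depends entirely on the singing being clearly louder than the background, which is where it tends to break down on noisy recordings.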


Can anybody point me towards something more robust?





