I recorded sounds with a microphone and I am trying to distinguish them in my Java program. Distinguishing them by frequency works quite well, but looking at the Fourier transforms, it seems like there should be more features that tell the sounds apart. I don't know very much about signal processing, so maybe you can help me. Here is a picture of two Fourier transforms.
I know that the frequency is determined by the index of the maximum magnitude (I hope that is the right term). In the first Fourier transform curve it is at index 100, in the second at index 12 (the frequencies are 1102.5 Hz and 132.3 Hz). But the two sounds look so different once transformed: what else could I use to distinguish them?
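As a worked check on those numbers, assuming the index k counts FFT bins starting at 0 and N is the FFT length, bin k corresponds to the frequency below, and both peaks imply the same bin spacing:

```latex
f_k = k\,\frac{f_s}{N}, \qquad
\Delta f = \frac{f_s}{N} = \frac{1102.5\ \text{Hz}}{100} = \frac{132.3\ \text{Hz}}{12} = 11.025\ \text{Hz}
```

The question does not state the sampling rate f_s or N separately, only their ratio. A peak's frequency is therefore only known to within about one bin (11.025 Hz here); a longer FFT gives finer spacing.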
Answer
Two remarks:
- I am assuming you are plotting the real (or imaginary) part of the Fourier transform. It is much more common to work with the magnitude or squared magnitude (power spectrum).
- The peak in the spectrum is a very poor measure of fundamental frequency (pitch). Take a piano note at 440 Hz and apply a notch filter to remove the 440 Hz component. The peak in the spectrum is now (roughly) at 880 Hz, yet the note will still sound like a 440 Hz note, i.e. its pitch lies at a frequency totally absent from the spectrum!
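Both remarks can be checked on a synthetic signal. The Java sketch below is only an illustration under stated assumptions: it builds a tone from the 2nd, 3rd and 4th harmonics of 440 Hz with the fundamental itself missing, takes the magnitude of a naive DFT (a slow stand-in for a proper FFT library), and compares the spectral-peak frequency with a crude autocorrelation pitch estimate. Autocorrelation is one common pitch estimator swapped in here for illustration; the answer does not prescribe it. The sampling rate of 44000 Hz is an assumption chosen so that 440 Hz has a period of exactly 100 samples, and the harmonic amplitudes are arbitrary.

```java
public class MissingFundamental {

    // Sum of harmonics 2, 3 and 4 of 440 Hz; the 440 Hz fundamental itself is left out.
    static double[] harmonicsWithoutFundamental(double fs, int n) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            double time = t / fs;
            x[t] = 1.0 * Math.sin(2 * Math.PI * 880 * time)
                 + 0.8 * Math.sin(2 * Math.PI * 1320 * time)
                 + 0.6 * Math.sin(2 * Math.PI * 1760 * time);
        }
        return x;
    }

    // Magnitude of a naive O(N^2) DFT for bins 0..N/2 (use an FFT library for real recordings).
    static double[] magnitudeSpectrum(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    // Index of the largest magnitude, skipping the DC bin 0.
    static int peakBin(double[] mag) {
        int best = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[best]) best = k;
        }
        return best;
    }

    // Crude pitch estimate: fs divided by the lag in [minLag, maxLag] with the largest autocorrelation.
    static double autocorrelationPitch(double[] x, double fs, int minLag, int maxLag) {
        int bestLag = minLag;
        double bestR = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double r = 0;
            for (int t = 0; t + lag < x.length; t++) r += x[t] * x[t + lag];
            if (r > bestR) {
                bestR = r;
                bestLag = lag;
            }
        }
        return fs / bestLag;
    }

    public static void main(String[] args) {
        double fs = 44000;  // assumed rate: 440 Hz then has a period of exactly 100 samples
        int n = 2200;       // 50 ms window, bin spacing fs / n = 20 Hz
        double[] x = harmonicsWithoutFundamental(fs, n);
        double peakHz = peakBin(magnitudeSpectrum(x)) * fs / n;
        double pitchHz = autocorrelationPitch(x, fs, 20, 400);
        System.out.printf("spectral peak: %.1f Hz%n", peakHz);
        System.out.printf("autocorrelation pitch: %.1f Hz%n", pitchHz);
    }
}
```

With these settings it should report a spectral peak at 880 Hz (the strongest component, the second harmonic) but an autocorrelation pitch of 440 Hz, because the waveform still repeats every 100 samples. The lag range of 20 to 400 samples limits the search to pitches between 110 Hz and 2200 Hz.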
To answer your question: timbre is the property that distinguishes sounds of the same loudness and pitch. However, it is not a one-dimensional characteristic with a well-defined unit. What you can extract are features chosen so that things that "sound similar" have similar features.
Common features used for characterizing timbre include:
- Spectral centroid, which indicates how "dark" or "bright" a sound will be perceived; spectral spread (a measure of bandwidth, tonality vs. noisiness); and the moments of order 3 and 4 (skewness and kurtosis).
- Energy in each Bark band, or the ratio of energies between consecutive Bark bands.
- Any low-dimensional descriptor of the spectral envelope, such as autoregressive coefficients or Mel-frequency cepstral coefficients (MFCCs).
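The first two groups of features can be computed directly from a magnitude spectrum like the ones plotted in the question. The sketch below is one possible implementation under assumptions worth stating: bins are weighted by their magnitude when computing the moments (some definitions weight by power instead), kurtosis is the plain (non-excess) value, the Bark scale is approximated with Zwicker's formula z = 13·arctan(0.00076·f) + 3.5·arctan((f/7500)²), and bands are taken as one Bark wide. The class and method names are illustrative, not from any library.

```java
import java.util.Arrays;

public class TimbreFeatures {

    // {centroid Hz, spread Hz, skewness, kurtosis} of a magnitude spectrum with bin spacing binHz.
    static double[] spectralMoments(double[] mag, double binHz) {
        double total = 0, centroid = 0;
        for (int k = 0; k < mag.length; k++) {
            total += mag[k];
            centroid += k * binHz * mag[k];
        }
        centroid /= total;
        double m2 = 0, m3 = 0, m4 = 0;
        for (int k = 0; k < mag.length; k++) {
            double d = k * binHz - centroid;  // distance from the centroid in Hz
            double w = mag[k] / total;        // bin magnitude used as a probability weight
            m2 += w * d * d;
            m3 += w * d * d * d;
            m4 += w * d * d * d * d;
        }
        double spread = Math.sqrt(m2);
        return new double[] {centroid, spread, m3 / (m2 * spread), m4 / (m2 * m2)};
    }

    // Zwicker's approximation of the Bark scale.
    static double hzToBark(double f) {
        return 13 * Math.atan(0.00076 * f) + 3.5 * Math.atan(Math.pow(f / 7500, 2));
    }

    // Energy (sum of squared magnitudes) in each one-Bark-wide band, bands 0..24.
    static double[] barkBandEnergies(double[] mag, double binHz) {
        double[] energy = new double[25];
        for (int k = 0; k < mag.length; k++) {
            int band = (int) Math.floor(hzToBark(k * binHz));
            if (band < energy.length) energy[band] += mag[k] * mag[k];
        }
        return energy;
    }

    public static void main(String[] args) {
        // Toy spectrum: 10 Hz bins, a Gaussian-shaped bump centred on 1000 Hz.
        double binHz = 10;
        double[] mag = new double[2206];
        for (int k = 0; k < mag.length; k++) {
            mag[k] = Math.exp(-Math.pow((k * binHz - 1000) / 200, 2));
        }
        System.out.println("moments: " + Arrays.toString(spectralMoments(mag, binHz)));
        System.out.println("bark energies: " + Arrays.toString(barkBandEnergies(mag, binHz)));
    }
}
```

For this toy spectrum the centroid comes out at 1000 Hz, the skewness near 0 and the kurtosis near 3, and most of the energy lands in band 8 (roughly 920 to 1080 Hz). Ratios of consecutive band energies are then simply energy[b + 1] / energy[b].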
See section 6 of this document.
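Once each recording is reduced to a vector of such features, something still has to decide which vectors count as "similar". A minimal option, assumed here rather than taken from the answer, is nearest-neighbour matching against a few labelled recordings. Because the features have very different scales (a centroid in Hz versus a dimensionless kurtosis), each dimension is divided by its standard deviation before computing Euclidean distances. The labels and feature values in `main` are hypothetical, and `List.of` requires Java 9 or later.

```java
import java.util.List;

public class NearestNeighbour {

    // Per-dimension (population) standard deviation over the examples; 1.0 for a constant dimension.
    static double[] stdDev(List<double[]> examples) {
        int d = examples.get(0).length;
        int m = examples.size();
        double[] mean = new double[d];
        double[] std = new double[d];
        for (double[] e : examples)
            for (int i = 0; i < d; i++) mean[i] += e[i] / m;
        for (double[] e : examples)
            for (int i = 0; i < d; i++) std[i] += (e[i] - mean[i]) * (e[i] - mean[i]) / m;
        for (int i = 0; i < d; i++) std[i] = std[i] > 0 ? Math.sqrt(std[i]) : 1.0;
        return std;
    }

    // Label of the example closest to the query, each feature scaled by its standard deviation.
    static String classify(double[] query, List<double[]> examples, List<String> labels) {
        double[] std = stdDev(examples);
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < examples.size(); j++) {
            double dist = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = (query[i] - examples.get(j)[i]) / std[i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = labels.get(j);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical {spectral centroid Hz, spectral spread Hz} vectors for two labelled recordings.
        List<double[]> examples = List.of(new double[] {1200, 300}, new double[] {400, 900});
        List<String> labels = List.of("whistle", "knock");
        System.out.println(classify(new double[] {1100, 350}, examples, labels));
    }
}
```

With only a handful of labelled recordings this is crude, but it makes it easy to see which features actually separate your sounds: drop a feature and check whether the matches change.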