Monday, 28 December 2015

Understanding Voss-McCartney pink noise generation algorithm


I'm implementing the Voss-McCartney pink noise generation algorithm.


If you follow the link above, you can read:




from James McCartney 2 Sep 1999 21:00:30 -0600:



The top end of the spectrum wasn't as good. The cascade of sin(x)/x shapes that I predicted in my other post was quite obvious. Ripple was only about 2dB up to Fs/8 and 4dB up to Fs/5. The response was about 5dB down at Fs/4 (one of the sin(x)/x nulls), and there was a deep null at Fs/2. (These figures are a bit rough. More averaging would have helped.)



You can improve the top octave somewhat by adding a white noise generator at the same amplitude as the others. Which fills in the diagram as follows:


xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
x x x x x x x x x x x x x x x x
 x   x   x   x   x   x   x   x
   x       x       x       x
       x               x
               x

It'll still be bumpy up there, but the nulls won't be as deep.



If I understand it correctly, this algorithm generates pink noise by adding random (white?) noise sources at different frequencies.¹


However, I don't fully understand the explanation given in the quote above for the extra white noise generator on the "top row". Can someone clarify how and why it improves the algorithm? Does that make it a good algorithm for generating pink noise in audio applications? In particular, shouldn't I discard the first samples until all the "rows" have been mixed into the signal (in the ASCII art quoted above, that would mean discarding the first 15 samples)?




¹ I'm not sure of the wording here. Do not hesitate to correct me if I'm wrong.



Answer



So let's look at what the author of the article you linked to says further down: output samples are on the top row, and are the sum of all the other rows at that time.




Output  /---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\
        \___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/

Row -1  /---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\/---\
        \___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/\___/

Row 0   /--------\/--------\/--------\/--------\/--------\/--------\/--------\/--------\/--------\
        \________/\________/\________/\________/\________/\________/\________/\________/\________/

Row 1   --------------\/------------------\/------------------\/------------------\/--------------
        ______________/\__________________/\__________________/\__________________/\______________

Row 2   ------------------------\/--------------------------------------\/------------------------
        ________________________/\______________________________________/\________________________

Row 3   --------------------------------------------\/--------------------------------------------
        ____________________________________________/\____________________________________________

Row 4   ------------------------------------------------------------------------------------\/----
        ____________________________________________________________________________________/\____

This means that the diagram above contains several different white sequences, each of which changes only occasionally. Let's formalize that. Start with only the two top rows:



  • Row -1 is simply white noise

  • Row 0 is white noise, interpolated by a factor of 2 with a 2-sample boxcar filter (i.e. sample-and-hold). That gives its spectrum an (aliased) sinc shape, which is essentially a low-pass shape.


Rows 1…N do the same, with the sincs becoming narrower by a factor of 2 per row.
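To make the row structure concrete, here is a minimal Python sketch of such a generator. It is an illustration under stated assumptions rather than the article's reference code: the function name `voss_mccartney`, its parameters, the uniform amplitude distribution, and the choice to pre-fill every row before the first output are all assumptions made here. A single white source feeds every row, and the number of trailing zero bits of the sample counter picks which row is refreshed, so row 0 changes every 2 samples, row 1 every 4, and so on.

```python
import random

def voss_mccartney(n_samples, n_rows=5, add_white=True, seed=None):
    """Sketch of a Voss-McCartney style generator (names are illustrative)."""
    rng = random.Random(seed)          # the single white source for all rows

    def white():
        return rng.uniform(-1.0, 1.0)

    # Assumption: pre-fill every row so the first outputs already mix all rows.
    rows = [white() for _ in range(n_rows)]
    running = sum(rows)
    out = []
    for n in range(1, n_samples + 1):
        # The number of trailing zero bits of n picks the row to refresh:
        # row 0 every 2 samples, row 1 every 4, row 2 every 8, ...
        k = (n & -n).bit_length() - 1
        if k < n_rows:
            new = white()
            running += new - rows[k]   # update the sum incrementally
            rows[k] = new
        # Optional extra white row, refreshed on every sample (Row -1).
        out.append(running + white() if add_white else running)
    return out
```

A call such as `voss_mccartney(48000, n_rows=12, seed=0)` returns a list of samples. Because the rows are pre-filled in this sketch, there is no start-up stretch in which some rows are still missing from the sum.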


Thinking about the discrete PSD of this:




  • Row -1 has a constant discrete PSD

  • Row 0 adds sinc(2f)²-shaped power to that

  • Row 1 adds sinc(4f)²-shaped power to that

  • and so on
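The bullets above can be checked numerically. Assuming each row is unit-variance white noise held for M = 1, 2, 4, … samples, and averaging over the phase of the hold, the discrete-time PSD of one row is the aliased sinc² $\frac{1}{M}\,\frac{\sin^2(\pi f M)}{\sin^2(\pi f)}$ with f in cycles per sample. The sketch below sums these terms and fits a log-log slope. The function names are made up for illustration, and the phase averaging is an assumption made here, not something taken from the linked article.

```python
import math

def held_noise_psd(f, hold):
    """PSD of unit-variance white noise held for `hold` samples (f in cycles/sample)."""
    s = math.sin(math.pi * f)
    if abs(s) < 1e-12:                 # limit of the ratio below as f -> 0
        return float(hold)
    return (math.sin(math.pi * f * hold) / s) ** 2 / hold

def summed_psd(f, n_rows):
    """White row (hold = 1) plus n_rows held rows with hold lengths 2, 4, 8, ..."""
    return sum(held_noise_psd(f, 2 ** k) for k in range(n_rows + 1))

def loglog_slope(f_lo, f_hi, n_rows, points=2000):
    """Least-squares slope of log10(PSD) against log10(f) on a log-spaced grid."""
    xs, ys = [], []
    for i in range(points):
        f = f_lo * (f_hi / f_lo) ** (i / (points - 1))
        xs.append(math.log10(f))
        ys.append(math.log10(summed_psd(f, n_rows)))
    mx, my = sum(xs) / points, sum(ys) / points
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

For instance, `loglog_slope(2**-12, 2**-4, 16)` should come out close to -1, i.e. roughly 1/f once the ripple is averaged out by the fit.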


All in all, I don't have a proof at hand that this becomes perfectly pink (within a finite observation it probably doesn't), but intuitively, close to 0 Hz all the main lobes of these sinc²s add up, and with every doubling of frequency you get closer to the zeros of more sinc²s.


The proposed algorithm really doesn't seem so elegant. Generating good (discrete) white (pseudorandom) noise is actually surprisingly hard for longer observation windows (which is what you need if you want to assess the quality of something). Hence, having a pseudorandom generator¹ run at asymptotically twice the sampling rate seems like more effort than letting it run at the sampling rate and then applying an appropriate low-pass filter that approximates the desired spectral shape (in this case, power falling as $\lvert H(f)\rvert^2 \propto \frac1f$, i.e. $\lvert H(f)\rvert \propto \frac{1}{\sqrt f}$). At least on modern CPUs, which have excellent SIMD instructions (i.e. they are highly optimized for running filters, not so much for running pseudorandom noise generators), the only difference between holding and adding up many noise values and running an FIR filter is that the FIR requires multiplying the held values by constants (the filter taps), and that can typically be done in a fused multiply-accumulate operation.
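As a rough illustration of the filter route, here is one possible FIR design, which is not necessarily the filter the paragraph above has in mind: truncate the impulse response of the fractional integrator $(1 - z^{-1})^{-1/2}$, whose power response is proportional to $1/\sin(\pi f)$ and therefore close to 1/f at low frequencies. The names `pink_fir_taps`, `fir_filter` and `pink_noise_fir` are hypothetical, and the plain-Python loop only shows the arithmetic; it makes no attempt at SIMD-level speed.

```python
import random

def pink_fir_taps(n_taps):
    """First n_taps impulse-response values of (1 - z^-1)^(-1/2)."""
    taps = [1.0]
    for k in range(1, n_taps):
        # Recurrence h[k] = h[k-1] * (k - 1/2) / k  ->  1, 1/2, 3/8, 5/16, ...
        taps.append(taps[-1] * (k - 0.5) / k)
    return taps

def fir_filter(x, taps):
    """Direct-form FIR: y[n] = sum_j taps[j] * x[n - j] (multiply-accumulate)."""
    return [
        sum(taps[j] * x[n - j] for j in range(min(len(taps), n + 1)))
        for n in range(len(x))
    ]

def pink_noise_fir(n_samples, n_taps=256, seed=None):
    """White noise at the sampling rate, shaped by the truncated filter."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return fir_filter(white, pink_fir_taps(n_taps))
```

Truncating to `n_taps` limits how far down in frequency the 1/f slope extends, so the tap count here plays a role similar to the number of rows in the summing scheme.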


Now, on an ASIC or FPGA, things might look different. If the amplitude distribution of the noise doesn't matter (i.e. there's no need to add up anything but uniformly drawn, uncorrelated samples), then you can actually save on complexity by doing the "simpler" thing: the logic operations needed to generate, e.g., XOROSHIRO128** can very likely be clocked much higher than the multipliers needed for a nice FIR filter.




¹ You don't need multiple generators; you just ask the one white generator more often. White samples are uncorrelated under any subsampling!
