Tuesday 8 December 2015

Why is random noise amplitude inversely proporional to the frequency?


http://astarte.csustan.edu/~tom/SFI-CSSS/info-theory/info-lec.pdf compares a bacteria DNA vs a random string:




My first step was to generate for myself a "random genome" of comparable size to compare things with. In this case, I simply used the Unix `random' function to generate a file containing a random sequence of about 4 million A, C, G, T. In the actual genome, these letters stand for the nucleotides adenine, cytosine, guanine, and thymine.


We want to find "interesting" sections (and features) of a genome. As a starting place, we can slide a "window" over the genome, and estimate the entropy within the window. The plot below shows the entropy estimates for the E. coli genome, within a window of size 6561 (= 3^8 ). The window is slid in steps of size 81 (= 3^4 ). This results in 57,194 values, one for each placement of the window. For comparison, the values for a "random" genome are also shown.



enter image description here



At this level, we can make the simple observation that the actual genome values are quite different from the comparative random string. The values for E. coli range from about 5.8 to about 5.96, while the random values are clustered quite closely above 5.99 (the maximum possible is log2 (64) = 6).


There are various "data massage" techniques we could use. For example, we could take the fourier transform of the entropy estimates, and explore that. Below is an example of such a fourier transform. Notice that it has some interesting "periodic" features which might be worth exploring. It is also interesting to note that the fourier


transform of the entropy of a "random" genome has the shape of approximately $1/f = 1/f^1$ (not unexpected), whereas the E. coli data are closer to $1/f^{1.5}$



enter image description here



I must draw you attention that the fourier transform is applied to the entropy rather than to the random signal itself. Why is it expected that resulting spectrum is pink rather than absolutely random? I do not understand why it is expected that the random frequency amplitude is inverse to the frequency, why not $amplitude=1/freq^{1.5}$? The higher low frequencies mean that the next number has higher probability to be closer to the current one and random is not quite random, right?


I have also noted that, in the end of the book, author says that "random" is naturally band-limited by the power law, $1/f^\alpha,$ and, in SETI project, we better watch after the white noise because optimal use of the channel is a sign of intelligent life.




No comments:

Post a Comment

readings - Appending 内 to a company name is read ない or うち?

For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...