Wednesday 21 December 2016

fft - Why are magnitudes normalised during synthesis (IDFT), not analysis (DFT)?


In most examples and FFT code that I've seen, the output (frequency magnitudes) of the forward DFT operation is scaled by N -- i.e. instead of giving you the magnitude of each frequency bin, it gives you N times the magnitude.


Operationally, this is simply because the DFT is calculated by taking the inner product of the signal with each basis sine (i.e. un-normalised correlation). However, that doesn't answer the philosophical question of why don't we just divide by N before returning the output?


Instead, most algorithms divide by N when re-synthesising.


This seems counter-intuitive to me, and (unless I'm missing something) it makes all explanations of the DFT very confusing.



In every scenario I can dream up, the actual magnitude (not the magnitude * N) is the value I need from a DFT operation, and the normalised magnitude is the value I want to input into an IDFT operation.


Why isn't the DFT defined as DFT/N, and the IDFT defined as a simple sum of normalised-magnitude sinusoids?



Answer



Whether you scale the output of your DFT, forward or inverse, has nothing to do with convention or what is mathematically convenient. It has everything to do with the input to the DFT. Allow me to show some examples where scaling is either required or not required for both the forward and inverse transform.


Must scale a forward transform by 1/N.


To start with, it ought to be clear that to analyze a simple sine wave, the length of the transform should be irrelevant, mathematically speaking. Suppose N=1024, Freq=100 and your signal is:


f(n) = cos(Freq * 2*Pi * n/N)


If you take a 1024 point DFT of f(n), you will find that bin[100] = 512. But this isn't a meaningful value until you scale it by N. 512/1024 = 1/2 and of course, the other 1/2 is in the negative spectrum in bin[924].


If you double the length of the DFT, N=2048, the output values would be twice those of the 1024 point DFT, which again, makes the results meaningless unless we scale by 1/N. The length of the DFT should not be a factor in these sorts of analysis. So in this example, you must scale scale the DFT by 1/N.


Must not scale a forward transform.



Now suppose you have the impulse response of a 32 tap FIR filter and want to know its frequency response. For convenience, we will assume a low pass filter with a gain of 1. We know that for this filter, the DC component of the DFT must be 1. And it should be clear that this will be the case no matter the size of the DFT because the DC component is simply the sum of the input values (i.e. the sum of the FIR coefficients).


Thus, for this input, the DFT is not scaled by 1/N to get a meaningful answer. This is why you can zero pad an impulse response as much as you want without affecting the outcome of the transform.


What is the fundamental difference between these two examples?


The answer is simple. In the first case, we supplied energy for every input sample. In other words, the sine wave was present for all 1024 samples, so we needed to scale the DFT's output by 1/1024.


In the second example, by definition, we only supplied energy for 1 sample (the impulse at n=0). It took 32 samples for the impulse to work its way through the 32 tap filter, but this delay is irrelevant. Since we supplied energy for 1 sample, we scale the DFT's output by 1. If an impulse were defined with 2 units of energy instead of 1, we would scale the output by 1/2.


Must not scale an inverse transform.


Now let's consider an inverse DFT. As with the forward DFT, we must consider the number of samples we are supplying energy to. Of course, we have to be a bit more careful here because we must fill both the positive and negative frequency bins appropriately. However, if we place an impulse (i.e. a 1) in two appropriate bins, then the resulting output of the inverse DFT will be a cosine wave with an amplitude of 2 no matter how many points we use in the inverse DFT.


Thus, as with the forward DFT, we don't scale the inverse DFT's output if the input is an impulse.


Must scale an inverse transform.


Now consider the case where you know the frequency response of a low pass filter and want to do an inverse DFT to get its impulse response. In this case, since we are supplying energy at all points, we must scale the DFT's output by 1/N to get a meaningful answer. This isn't quite as obvious because the input values will be complex, but if you work through an example, you will see that this is true. If you don't scale by 1/N you will have peak impulse response values on the order of N which can't be the case if the gain is 1.



The four situations I've just detailed are end point examples where it is clear how to scale the DFT's output. However, there is a lot of gray area between the end points. So let’s consider another simple example.


Suppose we have the following signal, with N=1024, Freq=100:


f(n) = 6 * cos(1*Freq * 2*Pi * n/N)  n = 0 - 127
f(n) = 1 * cos(2*Freq * 2*Pi * n/N) n = 128 - 895
f(n) = 6 * cos(4*Freq * 2*Pi * n/N) n = 896 - 1023

Time Domain


Notice the amplitude, frequency, and duration differences for the three components. Unfortunately, the DFT of this signal will show all three components at the same power level, even though the 2nd component has 1/36 the power level of the other two.


UnWindowed Freq Domain


The fact that all three components are supplying the same amount of energy is obvious, which explains the DFT results, but there is an important point to be made here.



If we know the duration for the various frequency components, then we can scale the various frequency bins accordingly. In this case, we would do this to accurately scale the DFT's output: bin[100] /= 128; bin[200] /= 768; bin[400] /= 128;


Which brings me to my final point; in general, we have no idea how long a particular frequency component is present at the input to our DFT, so we can't possibly do this sort of scaling. In general however, we do supply energy for every sample point, which is why we should scale the forward DFT by 1/N when analyzing a signal.


To complicate matters, we would almost certainly apply a window to this signal to improve the DFT's spectral resolution. Since the first and third frequency components are at the beginning and end of the signal, they get attenuated by 27 dB while the center component gets attenuated by only 4 dB (Hanning window).


Windowed Spectrum


To be clear, the output of the DFT can be a pretty poor representation of the input, scaled or not.


In the case of the inverse DFT, which is usually a pure mathematics problem, as opposed to the analysis of an unknown signal, the input to the DFT is clearly defined, so you know how to scale the output.


When analyzing a signal with a spectrum analyzer, analog or FFT, the problems are similar. You don't know the power of the signal displayed unless you also know its duty cycle. But even then, the windowing, span, sweep rates, filtering, the detector type, and other factors all work to goof the result.


Ultimately, you have to be quite careful when moving between the time and frequency domains. The question you asked regarding scaling is important, so I hope I have made it clear that you must understand the input to the DFT to know how to scale the output. If the input isn't clearly defined, the DFT's output has to be regarded with a great deal of skepticism, whether you scale it or not.


No comments:

Post a Comment

readings - Appending 内 to a company name is read ない or うち?

For example, if I say マイクロソフト内のパートナーシップは強いです, is the 内 here read as うち or ない? Answer 「内」 in the form: 「Proper Noun + 内」 is always read 「ない...