Notes on spectral analysis
 
Jason Antenucci
 
Some fundamentals on time-series analysis, written primarily for my own benefit some time ago ...

Fourier transforms
 

Wavelet transforms


 
Fourier transforms

Periodic functions and Fourier series

We wish to investigate a function y1(t) of the form
 

y1(t) = sin(pt)            (1)

This function is periodic - that is, y1(t+r) = y1(t). The period of this function is 2p /p, and the frequency p/(2p ). We now investigate a more complex function y(t) of the form
 

y(t) = sin(pt) + sin(qt)        (2)

The function now has two frequency (or periodic) components, the frequencies being  p/(2p ) and q/(2p ).

These functions and their components are plotted in the figure below. Note the interaction of the two periodic functions in the bottom panel.

In investigating physical systems, it is likely that we will be interested in knowing the periodicity (or frequency content) of the processes under investigation. In collecting a set of data, we are likely to have available a discrete function y(t), without knowing anything about the dominant frequencies in the system. How can we determine these?

Our basic notion is that we wish to represent any function y(t) of period 2L as a combination of periodic components, namely sines and cosines. The general representation of such an idea is
 

                (3)

For the first example given above, all coefficients a and b are zero except for b1 , which is unity. In the second example, both b1 and b2 are equal to unity. The periods for the first and second term in equation (2) are 2L/np and 2L/nq, where n is an integer.

Doing a term-by-term integration between negative and positive L allows us to determine the coefficients of (3).
 

                    (4)
 

Equation (3) is known as the Fourier series of y(t), with equation (4)  the Fourier coefficients of y(t). Series with bn = 0 for all n are known as Fourier cosine series, whereas series with an = 0 for all n are known as Fourier sine series.
 
We can thus estimate any periodic function y(t) as a collection of sine and cosine functions, with the coefficients given by equation (4).
 

Non-periodic functions: Fourier integrals and transforms

What about non-periodic functions? Our approach here is to assume that the function y(t) is periodic in the limit as x becomes infinite. Instead of representing y(t) as a summation of sines and cosines, we introduce an integral representation.
 

                (5)
                (6)

Equation (5) is the equivalent of equation (3), with equation (6) being the equivalent of equation (4). We have made no assumption about the periodicity of the function y(t). It is sometimes useful to represent (5) in complex form. We use the Euler equation
 

exp(it) = cos t + i sin t            (7)

and substitute (6) into (5), to derive the complex Fourier integral

              (8)

We rearrange the double integral in (8) to give
 

            (9)

The expression in brackets becomes the Fourier transform of y
 

                (10)

and equation (9) becomes the inverse Fourier transform of y.
 

            (11)

 
Discrete sampling effects

Our usual method of describing a physical process is in the time domain, where we describe the value of some function y as a function of time: y(t). An alternate method is to describe processes in the frequency domain, such that a function Y is described as a function of frequency Y(f). The two functions y(t) and Y(f) are related via the Fourier transform equations (10) and (11) above.

These refer, however, to continuous functions. In reality, our measurements are discrete, that is evenly spaced in time. An important result of our discrete measurements is the introduction of a minimum resolvable frequency. If a time series is collected with a sampling interval of 10 seconds, the minimum resolvable period is 20 seconds, as we require two points to determine a sine wave. That is, a time series collected at 0.1 Hz (1/10 sec) can only resolve frequencies less than 0.05 Hz (1/20 sec). In this example, 0.05 Hz is known as the Nyquist, or cut-off, frequency. The Nyquist frequency is defined as

             (12)

This becomes important when choosing the sampling interval at which to make measurements of a physical process. If we wish to measure tidal ranges, we will have to measure with greater frequency than the tide itself. For example, measuring tidal amplitudes every 2 days will give us very little useful information, whereas measuring every 6 hours will give a far better indication of the tidal oscillation, as the tide is likely to have a period of 12 and/or 24 hours, depending on our geographic location.

Sampling at discrete intervals for a certain period of time gives us N measurements. The frequencies at which we can calculate the Fourier transform of f is now limited to a set of N frequencies, determined by

             (13)

We now need to replace our continuous representation of the Fourier transform (10) with a discrete representation, in keeping with our discrete sampling.

             (14)

The angular frequency w in (10) has now been replaced by 2p f, with t in (10) replaced by nD t. We have also substituted a discretised form of f, with

             (15)

 

Calculating the power spectrum and estimating the periodogram

Exactly how do we determine which are the "dominant" frequencies in the time series? We define the power spectral density as

             (16)

where T is the length of the time series, and Y(f) as above. This is the continuous representation of the power spectral density, and gives an estimate of the "power" in the signal y(t) at a particular frequency f. The discrete representation of the power spectral density is

             (17)

Analysis of the power spectral density G(f) allows us to investigate the dominant frequencies in a signal, as it is the dominant frequencies that are likely to be important to the physical process.

How do we estimate the power spectrum? One technique is to use a periodogram estimate of the power spectrum, which is defined at N/2 +1 frequencies as

 

         (18)

Note that our estimate of the power at each frequency fk is representative of some sort of average value between halfway of the preceeding frequency 'bin' to halfway to the next one. This is not always particularly accurate, due to a phenomena known as 'leakage', that is, there is a 'leakage' of energy that is at one frequency to another in the periodogram estimate. We can reduce the effects of frequency leakage through a technique known as data windowing.    

Reducing frequency leakage through data windowing

Why does leakage occur? In practice, we take a signal y(t) over a certain time period T. By considering the period T, what we have effectively done is multiply y(t) by a box function that is equal to unity over the period T, and equal to zero elsewhere. This box function (or square window function) turns on and off very rapidly. If we estimate this function as a sum of sines and cosines, we find that the estimate has to have significant higher frequency components to deal with the jumps between zero and unity. It is these higher frequency components that cause contamination of the original signal.

To overcome this problem, we apply a windowing function that is more suitable to our requirements than the square window function. Our requirement for the window function is that it preserves all the power at the centre of the bin, and next to none of the power at the extremities of the bin, with their being a slow transition between the two. Thus the object of data windowing is to "round off" potential discontinuities at each end of the finite segment of the time history being analysed. The introduction of a data window sees the transformation of (14) to

         (19)

where w(j) is the window function.

Several common windows and their leakage function are shown in the figure below. The square (or boxcar) window is defined by

         (20)

The Hanning (or cosine squared) window is defined by

         (21)

The Welch window is defined by

         (22)

 

The cosine taper window is defined by

             (23)

where m is usually equal to 0.1*N.

The effect of using windows and discrete sampling can be seen through examination of the figure below. A sine wave was generated with period 2p (frequency 1/2p Hz), with a sampling interval of 5p /256. The blue and red lines refer to a data record length of 2048 points, whereas the cyan and magenta lines refer to a data record of 2304 points. For the first two series, the dominant frequency was exactly discretised by the frequency fk with k = 20. For the last two series, the dominant frequency fell exactly between two frequencies with k = 22, 23. The top axis shows the power spectrum plotted on a linear scale, with the bottom axis plotted on a log scale.

There are several features to note. Firstly, when the dominant frequency is exactly discretised by the FFT, our best result is obtained by applying the boxcar window. The second two series show the more likely case of the dominant freqeuncy not being exactly discretised by the FFT. In this case, applying the Boxcar window results in a lower estimate of the energy at the dominant peak, as some of the energy is leaked to other frequencies. This is seen in the figure with the magenta line dropping to zero more rapidly than the cyan line. Thus when the dominant frequency is not exactly discretised by our sampling strategy (the usual case), applying a data window results in a better estimate of the power spectral density distribution.

 

Smoothing spectral estimates in the time and frequency domain

An inherent problem of periodogram estimates is that the variance is large, on the order of the PSD squared. There are two major approaches to decreasing the variance of our periodogram estimate. The first is to average the periodogram in the time domain, and the second is to average in the frequency domain.

Averaging in the time domain is known as the Welch method, and involves breaking the signal into sections and averaging the periodograms of these sections. Thus a 2048 point signal can be broken up into 8 segments of 256 points, assuming no overlap. Our final spectrum will have 128 frequencies, whereas our initial spectrum will have 1028 frequencies. Thus the inherent feature of this technique is that low frequency resolution will be lost. The more sections averaged will result in a lower variance, and more sections can be obtained by overlapping sections - the optimum is usually to overlap sections by half the length of the FFT. Best results are achieved by applying a data window to sections prior to computing the periodogram.

Averaging in the frequency domain is also known as band averaging. The spectrum is smoothed locally in the region of the target frequency, as a weighted average of m values to the right and left of a target frequency fk. The variance of the spectrum will decrease as mn increases, that is as the number of frequencies used in the smoothing increases. The value of mn is directly related to the width of the spectral window, and is also known as the bandwidth of the window. As the bandwidth increases, more spectral ordinates are averaged, and hence the resulting estimator becomes smoother, more stable and has smaller variance.

In practice, we increase the bandwidth as we move to higher frequencies. This results in smoother spectra at high frequency, whilst still keeping the low frequencies with decent resolution. This resolution is lost when using Welch's method, as the number of frequencies you can resolve depends on the length of your FFT. This is much smaller than your time series length when using Welch's method, and so the low frequencies are generally 'lost' as your frequency resolution is lost, as explained previously.
 

The coherence and phase between signals
 
It is sometimes of interest to investigate the joint structure of two series, that is the dependence of either series on the other. We are only able to observe relationships at the same frequency in both series. We define the cross-spectrum as

 

 

The "*" refers to the complex conjugate of the function, which is simply

conj(y) = real(y) - i * imag(y)

for a complex function y. Unlike the power spectrum, the cross-spectrum is complex valued as it contains amplitude and phase information. The real part of the cross-spectrum is known as the coincident spectrum (or co-spectrum), and the complex part of the cross-spectrum is known as the quadrature spectrum (or quad-spectrum). From the cross-spectrum, we are able to estimate the amplitude and phase relationship between the signals. The amplitude relationship is quantified through the coherency squared, calculated as

 

A coherency squared value of unity indicates complete dependence of one signal on another, whereas a coherencey squared value of zero refers to no dependence of one signal on another. Two signals can only be coherent at the same frequency.

Phase is calculated directly from the co-spectrum and quadrature spectrum as follows.

 

Wavelet transforms

Fourier transforms allow us to look at signals averaged over a certain period T. There is an inherent assumption of stationarity of the signal. In many natural processes, signals are rarely steady. For example, in a lake the water heats up over a season, constantly changing the background stratification conditions.

One way to get time-frequency information, as opposed to the frequency information returned by the Fourier transform, is to calculate a short-time, or windowed, Fourier transform. This approach is to break the signal up into smaller time intervals, carry out the usual Fourier transform, and then move on to the next time interval. The problem with this technique is that you have to apply a window function to each time interval, and so data is being continually lost.

An alternate technique is to apply a wavelet transform. The idea is that we multiply our function f(t) by another function y (s,t), where y  is a function of scale s and time t. We define a mother wavelet y (t), which may take the form

This type of mother wavelet is known as a Morlet wavelet. Other possible wavelets include the Haar, Mexican hat, and Daubechies wavelets, to name a few. We introduce the scale dimension by redefining the mother wavelet as (replacing t with the dummy time variable u)

We define the continuous wavelet transform as

where c are the wavelet coefficients, and in this case are real-valued. The coefficients c represent the detail in the signal f(u) at scale s.

In application to wave theory, we generally chose the Morlet mother wavelet as it has a well-defined frequency, and so we can make the transformation between scales and frequency. For the Morlet wavelet defined above, the transformation is

where s is the scale of the wavelet, h is the sampling interval of the series f(u), and f is the frequency associated with the series and scale.