Get Your Data On

2025-06-08T14:38:00.000-07:00

Interpolation in the Frequency Domain Improved!

Introduction

This post describes interpolation in the frequency domain (IFD) and an improvement to IFD. IFD is a method of interpolation that is easy to use and produces good results on most time series data (data samples equally spaced in time). It is easy to use because it does not require any prior knowledge of the data. Also, since IFD uses the Fast Fourier Transform (FFT), it is fast (even on large data sets). The drawback to IFD is that it sometimes produces an interpolation that is not good at the start and end of the sequence (referred to as “end effects”). The improved IFD reduces the end effects. While end effects are not always an issue, it is always better to not have them.

The first section briefly introduces interpolation. The next section defines the standard IFD algorithm and shows examples of its use. The final section shows an improved IFD that reduces end effects.

Interpolation Example

Let’s say we check a rain gauge at 7:00 and then once an hour for five hours, resulting in the following measurements: 0.01 cm, 0.19 cm, 0.31 cm, 0.40 cm, 0.52 cm, and 0.59 cm. We didn’t measure the rain at 7:30, but we could estimate the rainfall at 7:30 using a linear spline to interpolate (connecting the points with straight lines).

The figure below shows the five rainfall values from above and an illustration of interpolating a rainfall value at 730. The figure shows the five measurements as black circles, the interpolating blue line connecting the points 700 and 800, and the interpolated point at 0730 shown as a red circle. The red circle shows that the rainfall at 730 is 0.1 cm.

Interpolation in the Frequency Domain

Linear spline interpolation is effective if the underlying data can be modeled by a straight line. IFD works well regardless of the type of data.

The steps in the algorithm for IFD are enumerated below.

Find the discrete Fourier transform (DFT) of the data using the built-in function fft, which is R’s implementation of the FFT. The data started in the time domain (data indexed by time). After the transform, the data is in the frequency domain (data indexed by frequency).
Add zeros to the data. This is called zero padding, and it forces the inverse DFT (IDFT) in the next step to evaluate the data at more points. When adding zeros to the frequency domain data, the zeros must go in the middle. This step is slightly different for even and odd length sequences.
Perform the IDFT on the zero padded DFT sequence.

These steps are explained below.

Find the DFT of the Sequence

To illustrate this step, we use the rain fall values from above.

First, we use the fft function to find the DFT of the sequence. Below, I’m going to stop saying, “Use the fft function to find the DFT.” I’ll just say find the FFT. Note: to take up less space on the screen, I’ll round the output to two decimal places.

N <- length(x)
X <- fft(x)
round(X,digits = 2)

## [1]  1.43+0.00i -0.34+0.37i -0.34+0.11i -0.34-0.11i -0.34-0.37i

Zero Pad the Data

In this case, we want to double the number of points, so we add five zeros to X. The zeros have to be added in the “middle” of the sequence. If you are familiar with Fourier transform theory, you’ll note that this is between the positive and negative frequencies.

Xzp <- c(X[1],X[2:3],rep(0,5),X[4:5])
round(Xzp,digits = 2)

##  [1]  1.43+0.00i -0.34+0.37i -0.34+0.11i  0.00+0.00i  0.00+0.00i  0.00+0.00i
##  [7]  0.00+0.00i  0.00+0.00i -0.34-0.11i -0.34-0.37i

Note if the DFT samples X had an even number of samples, for instance 6, then zero padded DFT Xzp would be c(X[1], X[2:3], X[4]/2, rep(0, N), X[4]/2, X[5:6]). The X[N/2+1] sample (in this case, X[4]) is used twice but cut in half to keep the energy the same. I show an example of this in the IFD Section.

Inverse DFT

Now, we find the inverse FFT. The inverse FFT is scaled by one over the number of samples. Notice we are dividing by the number of actual data points (five) and not the total number of points in the padded sequence (ten). If we divide by ten, we bias the result low because the zeros don’t add any energy to the sequence.

xinterp <- fft(Xzp,inverse = TRUE)/N
round(xinterp,digits=2)

##  [1] 0.01+0i 0.00+0i 0.19+0i 0.33+0i 0.31+0i 0.29+0i 0.40+0i 0.55+0i 0.52+0i
## [10] 0.26+0i

Since we started with a real sequence, we should end with a real sequence. However, the fft output is complex, so we convert the sequence to real using the Re function.

xinterp <- Re(xinterp)
round(xinterp,digits = 2)

##  [1] 0.01 0.00 0.19 0.33 0.31 0.29 0.40 0.55 0.52 0.26

The figure below contains two plots. The interpolated sequence in blue x’s and the original sequence in black circles (the circles are hard to see because they have x’s on them). Note the interpolation (especially the points at 7:30 and 12:30) is not great. This method suffers from end effects and performs poorly with short sequences. We show a method that mitigates the end effects in the IFD Function Improved section.

Example IFD with a Longer Sequence

Now, let’s try IFD with a longer sequence. Below is a 49 sample ramp.

N <- 49
x <-1:N

Since we now know the algorithm, I’ll go through the steps with less explanation. First, find the FFT.

X <-fft(x)

Second, zero pad by 2.

Xzp <- c(X[1:ceiling(N/2)],rep(0,N),X[(ceiling(N/2)+1):N])

Finally, the IDFT.

xInterp <- Re(fft(Xzp,inverse = TRUE)/N)

The figure below contains two plots. The interpolated sequence is shown as a blue line, and the ramp sequence is shown as black circles. You can see the end effects at the start and end of the plot. After the end effects, the interpolated points are close to the ramp.

The figure below contains a zoomed-in view of the one above. A blue line shows the interpolated sequence, and black circles show the original ramp samples. You can see that after the effects at the beginning, the interpolation is very good.

IFD Function

Here, we put the IFD steps into the function findIfd, which inputs a data sequence dataSet and the interpolation factor M. To keep the function simple, M is restricted to an integer value. The function first checks to see if the interpolation factor is an integer, not less than 1. Next, it finds the FFT of dataSet and sets N to the length of dataSet. The function determines if dataSet has an even or odd number of samples and then places the zeros in the middle. After zero padding, it finds the IFFT of the zero-padded sequence. Finally, it returns the interpolated sequence, converting the sequence to real if the input dataSet was real. Note the R function is.numeric returns true if the sequence is real.

findIfd <- function(dataSet,M=2) {
  
  if (round(M)!=M || M<1) {
    stop("M must be an integer larger than zero.")
  }
  
  X <- fft(dataSet)
  N <- length(X)
  
  if(floor(N/2)==N/2){
    Xzp <- c(X[1:(N/2)],X[N/2+1]/2,rep(0,(M-1)*N-1),X[N/2+1]/2,X[(N/2+2):N])
    
  } else {
    Xzp <- c(X[1:ceiling(N/2)],rep(0,((M-1)*N)),X[(ceiling(N/2)+1):N])
  }
  xint <- fft(Xzp,inverse = TRUE)/N
  
  if (is.numeric(dataSet))
      Re(xint[1:(M*N)])
  else
      xint[1:(M*N)]
}

Example: IFD Function and the Ramp

Below, we use the findIfd to interpolate the ramp. You can see we started with a 50 sample ramp and used an interpolation factor of 8, so the length of the interpolated sequence is 400.

N <- 50
x <-1:N
xInterp <- findIfd(x,M=8)

The plot below shows the interpolated sequence in blue (800 points make the plot hard to read), and the ramp sequence is shown in black points. As shown above, we can see that the interpolation is good away from the middle.

Example: Interpolating a Sine Wave

IFD works better when the function is periodic and smooth. In this example, we interpolate a sine sequence, and since it is periodic and smooth, we should get a better result.

N <- 50
n <- 0:(N-1)
f <- 0.02
x <- sin(2*pi*f*n)
M <- 4
xInterp <- findIfd(x,M)
Ndx <- seq(0,N-1/M,1/M)

The figure below contains two plots. The black circles are the original sine sequence, and the blue line shows the interpolation. You can see this is a very good interpolation with no end effects.

IFD Function Improved

The improved version of the IFD function “mirrors” the sequence before performing the interpolation. By mirror, I mean appending a reversed version of the sequence to the end of the original sequence. See below for an example of mirroring.

x <- 1:5
c(x,rev(x))

##  [1] 1 2 3 4 5 5 4 3 2 1

The function findIfdImproved inputs a sequence x and the interpolation factor M and then outputs an interpolated sequence. The difference between this function and findIfd is this function mirrors the sequence before implementing the IFD algorithm. The extra samples at the end of the interpolated sequence (caused by the mirroring) are removed before returning the interpolated sequence. Note mirroring doubles the sequence, making it even. So, we only need to use the zero padding method for even length sequences. Note that findIfdImproved doubles the sequence but does not take twice as long due to the proprieties of the FFT.

findIfdImproved <- function(x,M) {
  
  if (round(M)!=M || M<1 ) {
    stop("M must be an integer larger than zero.")
  }
  
  X <- fft(c(x,rev(x)))
  N <- length(X)
  
  Xzp <- c(X[1:(N/2)],X[N/2+1]/2,rep(0,(M-1)*N-1),X[N/2+1]/2,X[(N/2+2):N])

  xint <- fft(Xzp,inverse = TRUE)/N
  
  if (is.numeric(x)) {
    xint <- Re(xint[1:(M*length(x))]) }
  else{ 
    xint <- xint[1:(M*length(x))] }
}

Below, we use the findIfdImproved to interpolate the 50 point ramp, with an interpolation factor of 8.

N <- 50
x <-1:N
xInterp <- findIfdImproved(x,M=8)

The plot below shows the interpolated sequence as a blue line and the ramp sequence as black points. We see the findIfdImproved produces an improved interpolation with almost no end effects (you can see slight end effects at the end of the sequence).

Conclution

IFD is an easy-to-use method of interpolation that produces good results, except for the end effects. The improved method significantly reduces the end effects, at the cost of additional processing.

Testing and False Positives

2025-06-01T13:59:00.000-07:00

It is impossible to design a test that can detect something without the possibility of a false alarm (false positive). The only way to guarantee no false alarms is to never detect anything. When you design a test, you select the highest false alarm rate you can live with and then calculate the true positive rate based on the signal to noise ratio (SNR) and the type of test.

The relationship between the probability of false alarm and the probability of detection is shown by a receiver operator characteristic (ROC) curve (receiver, as in a radar receiver). ROC curves are used in radar, sonar, and many other applications. Below is a great example of a ROC curve.

ROC curve by Martin Thoma

Imaginary Dust

2025-03-02T14:21:00.000-08:00

Numerical error can sometimes cause what should be real output to be complex. Below we take the inverse fast Fourier transform (FFT) of the FFT of a real sequence. The result should be real (we should get back what we started with), but the result is a complex sequence, which is not what we started with!

x <- fft(fft(1:10),inverse = TRUE)/10
x

##  [1]  1+0i  2+0i  3-0i  4+0i  5-0i  6+0i  7-0i  8+0i  9-0i 10+0i

The complex part is small but shouldn’t be there. We could use the Re function to remove the imaginary part, but getting a large imaginary part when we were expecting a real result indicates an error. If we use the Re function, the error is hidden.

The function removeDust removes the small imaginary part but returns an error if the imaginary part is larger than expected. How small is up to you, but the default is 1e-10.

removeDust <- function(x, imTol=1e-10){
  if (max(abs(Im(x))) > imTol) {
    stop("The imaginary part was > ", imTol)
  }
  Re(x)
}

Below, we use the removeDust function to remove the imaginary dust.

##  [1]  1+0i  2+0i  3-0i  4+0i  5-0i  6+0i  7-0i  8+0i  9-0i 10+0i

removeDust(x)

##  [1]  1  2  3  4  5  6  7  8  9 10

DeepSeek

2025-01-29T16:09:00.000-08:00

DeepSeek is a big deal because the AI luminaries you've heard from are (were) counting on AI requiring giant warehouses of computers to build and run the models. This provides a large barrier to entry (or moat, in tec guy speak). The moat would mean they have no competition and can charge large subscription fees.

DeepSeek was made in less time and with much less money than the AIs you've heard about. And it can run locally. This means lots of companies could compete, and because it runs locally, no subscription fees (or at least much lower ones)

The time and money could be propaganda, but the running locally has been tested.

November Fog

2023-11-18T20:14:00.000-08:00

Pumpkins!

2023-10-29T20:00:00.001-07:00

Wednesday Linkorama!

2023-07-26T18:48:00.002-07:00

Round up of things I found interesting.

The image above came from a Gizmodo article explaining how measuring helium from distant galaxies offers clues to the matter/antimatter imbalance.

The four-color problem asks, "Are four colors sufficient to color the countries of a map so that no two adjacent countries share the same color?" A few minutes of trying this with some paper and markers would lead you to think that four is enough. But it was hard to rigorously prove that 4 is the minimum. In 1976, with the help of a computer program, the question was answered. This article talks about the problem and the controversy surrounding the use of computer programs in math proofs.

The news about the room temperature superconductor would be the biggest science and technology news of the week, but there is quite a bit of skepticism about the result. If true, it would be a huge deal. It could speed up computers, improve batteries, bigger electro magnetic (floating trains!) and so on.

Autoregressive Moving Average Models and Power Spectral Densities

2023-07-25T17:33:00.005-07:00

1 Introduction

The simplest way to estimate a power spectral density (PSD) is to use a periodogram, defined as

periodogram <- function(x) (1/length(x))*abs(fft(x))^2

where x is the data samples. The periodogram estimates the power level at $N$ frequency locations where $N$ is the number of samples in x. The periodogram uses $N$ data samples to estimate $N$ power levels. This leads to high variance estimates. There are ways to reduce the variance, but they also reduce the resolution.

A different approach is to estimate the PSD using a model. A good model can capture the important features of the PSD and be controlled by a few parameters. The autoregressive moving average (ARMA) model and the closely related autoregressive (AR) and moving average (MA) models can model a wide variety of PSDs and can be controlled by a few parameters.

This article defines the ARMA, AR, and MA models and then shows how to calculate their PSDs. We then cover how they are used to estimate PSDs, how to select between the three models, and how to choose the number of parameters (the model order). By the end of this article, you will understand how ARMA/AR/MA models are used for PSD estimation.

Much of this article comes from Chapter 8 of Power Spectral Density Estimation by Example Carbone (2022).

2 ARMA Model

An ARMA model of order $p$, $q$ is defined as

\[ a[0]y[n] = -\sum_{k=1}^p a[k]y[n-k] +\sum_{k=0}^q b[k]x[n-k], \tag{1}\]

where $y[n]$ is the current output, $a[k]$ and $b[k]$ are the coefficients of the model, $p$ is the number of $a$ coefficients, $q$ is number of $b$ coefficients, $y[n − k]$ are the past outputs, and $x[n − k]$ are past (and present) inputs. The coefficient $a[0]$ is always assumed to be 1, so from here on; we replace $a[0]$ with 1. If $a[k]$ $\neq$ 1 we could divide Equation 1 by $a[0]$.

The input $x$ is a sequence of samples from a white noise process with variance $\sigma^2$. The input process could be any white noise process, but for the sake of specificity, we assume the process is white Gaussian noise (WGN). The ARMA model of order $p$, $q$ is written as ARMA[$p,q$]. The parameters of the ARMA model are the autoregressive coefficients $a[k]$, the moving average coefficients $b[k]$, and the variance of the input $\sigma^2$.

Below, we use the ARMA[1,1] model to produce simulated data. In this case, we let $a[1] = 0.9$, $b[1] = −0.9$, and the input $x[n]$ is WGN with variance 1. The model specified above is written as \[ y[n] = −0.9y[n − 1] − 0.9x[n − 1] + x[n]. \tag{2}\]

Note the $a[1]$ coefficient is 0.9. The negative sign in front of 0.9 is part of Equation 1. Below we use rnorm(N) to produce N samples of WGN with variance 1.

set.seed(1) 
N <- 512 
x <- rnorm(N)

Figure 1 is a plot of x, the input into the ARMA[1,1] process defined in Equation 2.

Figure 1: The figure is a plot of the input x to the ARMA[1,1] process defined in Equation 2

Below is the ARMA[1,1] model from Equation 2 implemented in R.

y_arma11 <- rep(0,N)
a <- 0.9 
b <- -0.9 

for (n in 2:N) 
  y_arma11[n] <- -a[1]*y_arma11[n-1]+ b[1]*x[n-1] + x[n]

Figure 2 is a plot of y_arma11, the model’s output.

Figure 2: The figure is a plot of the output produced by the ARMA[1,1] process in Equation 2.

The PSD $P(f)$ of an ARMA model is \[ P_{\text{ARMA}}(f) = \sigma^2 \left |\frac{ 1 + \sum_{k=1}^q b[k]\exp(-j2\pi fk) } { 1 + \sum_{k=1}^p a[k]\exp(-j2\pi fk) } \right |^2, \tag{3}\] where the $a$’s and $b$’s are the coefficients of the model and $\sigma^2$ is the variance of the WGN input. Notice the numerator and denominator in Equation 3 are polynomials. The roots (zeros) of these polynomials control the shape of the PSD. The denominator’s roots are called poles, and they control the peaks (high parts). The numerator’s roots are called zeros and control the valleys (low parts). Note that the peaks are more important than the valleys in most (not all!) applications.

Because ARMA models have poles and zeros, they are good at modeling PSDs with peaks and valleys. When used for PSD estimation, ARMA estimators are called “high resolution” estimators because they are better than the periodogram at resolving closely spaced peaks.

If we let \[B(f) = 1 + \sum_{k=1}^q b[k]\exp(-j2\pi fk)\] and \[A(f) = 1 + \sum_{k=1}^p a[k]\exp(-j2\pi fk),\] then we can write Equation 3 as \[ P_{\text{ARMA}}(f) = \sigma^2 \left |\frac{B(f) } { A(f) } \right |^2 = \sigma^2 |B(f)|^2 \frac{1}{|A(f)|^2}. \tag{4}\] Notice $A(f)$ is the Fourier transform of the $a$’s, and $B(f)$ is the Fourier transform of the $b$’s. Equation 4 makes clear that the ARMA PSD is the product of the input variance $\sigma^2$, $|B(f)|^2$, and $\frac{1}{|A(f)|^2}$. Note the PSD of white noise with variance $\sigma^2$ is $P_{\sigma^2}(f) = \sigma^2$, which is a constant.

The function psdArma inputs the ARMA parameters and output the PSD calculated at evenly spaced point along the frequency axis. The function inputs: the AR coefficients arCoefficients, the MA coefficients maCoefficients, the variance of the input process inputVar, and the number of points in the fft used to calculate the PSD. The function uses the fft function to find $A(f)$ and $B(f)$ in Equation 4. The function uses zeroPad (see below) to zero pad the AR and MA coefficients to produce Nfs frequency samples. The parameters must be known or estimated before using psdArma. If the parameters are known, the function outputs the PSD. If the parameters are estimated, the function outputs an estimated PSD.

zeroPad <- function(x,N) {
  c(x,rep(0,max(N-length(x),0)))
}

psdArma <- function(arCoefficients,maCoefficients,inputVar,Nfs=256){
  A <- fft(zeroPad(arCoefficients,Nfs))
  B <- fft(zeroPad(maCoefficients,Nfs))
  inputVar*abs(B)^2/abs(A)^2
}

Let’s use psdArma to calculate the PSD of the ARMA[1,1] process in Equation 2.

psdArma11   <- 10*log10(psdArma(c(1,a),c(1,b),inputVar=1,Nfs=length(x)))

Figure 3 contains two plots. The blue plot is the PSD of the ARMA[1,1] process that produced the data in Figure 2. The black plot is the PSD of the WGN input.

Figure 3: This figure contains two PSDs. The black line at 0 dB is the PSD of the WGN input, and the blue line is the PSD of the ARMA[1,1] process from Equation 2. The ARMA[1,1] PSD has a peak at 0.5 and a valley at 0.0 (or 1.0). Remember, the PSD is periodic.

There are three things to notice about the PSD plots. First, the frequency units on the x-axis are a fraction of the sampling frequency (cycles/sample). If you know the sampling frequency (samples/second), you could recover frequency in Hz (cycles/second) by multiplying frequency by the sampling frequency, as \[ (\text{cycles/sample})(\text{samples/second})=(\text{cycles/second}).\]

Second, the y-axis is power in decibels (dB), calculated as \[\text{power dB} = 10\log_{10}(\text{power}).\]

Third, PSDs are periodic with the sampling frequency. So, they can be plotted along any region that shows one period. Below is an example of the PSD from above plotted from -0.5 to 0.5.

Figure 4: This figure is the same as the one above, except it is plotted -0.5 to 0.5.

3 Autoregresive Model

ARMA[$p$,0] models are known as autoregressive (AR) models of order $p$. The AR[$p$] model is written as \[ y[n] = -\sum_{k=1}^{p} a[k]y[n-k] + x[n], \tag{5}\] where the $a[k]$’s are the AR coefficients, $x$ is the WGN input, and $p$ is the process’s order.

We use the AR[1] model below to produce simulated data. We let $a[1]=0.9$, and the input $x[n]$ is WGN with variance 1. The model is written as \[ y[n] = -0.9y[n-1] + x[n], \tag{6}\]

We use a for loop to implement the ARMA model. We could do the same thing for the AR model, but using the filter command from the gsignal is faster and makes the code more readable. The function inputs the AR coefficients a, the MA coefficients b, and the samples to be filtered x. The function outputs the data filtered by the ARMA coefficients.

Figure 5 is a plot of the samples produced by the code below.

set.seed(1)
N <- 512
x <- rnorm(N)
a <- c(1.0, 0.9)
b <- 1
y_ar1 <- gsignal::filter(b,a,x)

Figure 5: This is a plot of the samples produced by the AR[1] model in Equation 6 produced by the code above.

The PSD of an AR process is \[ P_{\text{AR}}(f) = \frac{\sigma^2 } {|1 + \sum_{k=1}^p a[k]\exp(-j2\pi fk)|^2 }. \tag{7}\] Since Equation 7 has no zeros in the numerator; the AR PSD only has poles. The poles make AR models good at modeling PSDs with peaks. AR PSD estimators are high resolution estimators because they are good at resolving closely spaced peaks. AR models are good at modeling peaks and are simpler than ARMA models, so they are used more often than the other two.

Below we find the PSD of the AR[1] model above.

psdAr1   <- 10*log10(psdArma(a,b,inputVar=1,Nfs=length(x)))

Figure 5 contains two plots. The blue plot is the PSD of the AR[1] process that produced the data in Figure 5. The black plot is the PSD of the WGN input.

psdAr1   <- 10*log10(psdArma(a,b,inputVar=1,Nfs=length(x)))

Figure 6: This figure contains two PSDs. The black line at 0 dB is the PSD of the WGN input, and the blue line is the PSD of the AR[1] process from Equation 7. This PSD has one pole at 0.5 and no zeros.

4 Moving Average Model

ARMA[0,$q$] models are known as moving average (MA) model of order $q$. An MA[$q$] is written as \[ y[n] = \sum_{k=0}^q b[k]x[n-k], \tag{8}\] where the $b[k]$’s are the MA coefficients, $x[k]$ is WGN input, and the $q$ is the process’s order. The simple averaging filter is a special case of the MA model, where the MA parameters are $\frac{1}{q+1}$. The simple averaging filter is written as \[ y[n] = \frac{1}{q+1}\sum_{k=0}^q x[n-k]. \] Below we use the MA[1] model to produce simulated data. We let $b[1]=-0.9$, and the input $x[n]$ is WGN with variance 1. The model is written as \[ y[n] = -0.9x[n-1] + x[n], \tag{9}\] Now we filter the complex WGN samples.

set.seed(1)
N <- 512
x <- rnorm(N)
a <- 1
b <- c(1,-0.9)
y_ma1 <- gsignal::filter(b,a,x)

Figure 7 is a plot of the samples produced by the code above.

Figure 7: This is a plot of the samples produced by the Ma[1] model in Equation 9.

The PSD of an MA process is \[ P_{\text{MA}}(f) = \sigma^2 |1 + \sum_{k=1}^q b[k]\exp(-j2\pi fk)|^2. \tag{10}\] The MA PSD has zeros, but no poles. So, the MA model is good at modeling PSDs with deep valleys. The MA PSD estimator is not able to resolve closely spaced peaks.

Below we find the PSD of the MA[1] model above.

psdMa1   <- 10*log10(psdArma(a,b,inputVar=1,Nfs=length(x)))

Figure 8 contains two plots. The blue plot is the PSD of the MA[1] process that produced the data in Figure 7. The black plot is the PSD of the WGN input.

Figure 8: This figure contains two PSDs. The black line at 0 dB is the PSD of the WGN input, and the blue line is the PSD of the MA[1] process from Equation 9. You can see the zero at 0.0 (or 1.0).

5 Estimating PSDs using Models

ARMA/AR/MA PSD estimation is performed using the following steps. First, choose the model you wish to use. Second, use the data to estimate the model parameters. Parameter estimation is beyond the scope of this article. You can find information in various places, including Carbone (2022), Kay (1988), or Marple (1987). Finally, calculate the estimated PSD using the estimated parameters in the psdArma function.

6 Choosing a Model

If you are only interested in peaks, use the AR model. If only the valleys are important, use the MA model. If both the peaks and valleys are important, use the ARMA model. As mentioned above, peaks are usually the more important feature, so the AR model is often chosen.

7 Model Order Selection

An ARMA model can model at most $p$ peaks and $q$ nulls. So, if you know the number of peaks and valleys in the PSD, choosing the model order is straightforward. For instance, if the PSD had one peak and two nulls, you would use an ARMA[1,2] model. Unfortunately, we don’t always know the number of peaks and nulls, but we can use a model order estimator to help us choose the model order.

Model order estimators are mathematical tools that let you estimate the order of the process that produced the data. The Akaike Information Criterion (AIC) is a model order estimator built into the arima function. The AIC calculates how well the model fits the data (lower is better).

The arima function produces several outputs, but we will use the AIC. So we use the arima function with $aic, which will only output the AIC value. The arima$aic function inputs the data and the model order and then outputs the AIC associated with that model order. Below is an example of using the function to find the model order.

Below we use the AR[1] data shown in Figure 5 and calculate the AIC for AR model orders 1 to 10.

aicAR <- rep(0,10)
for (modelOrder in 1:10) {
  aicAR[modelOrder] <- arima(y_ar1, order = c(AR=modelOrder, 0, MA=0))$aic
}

Figure 9 is a plot of the AIC for AR models 1 to 10. Note the minimum AIC is at AR[1], which is the correct model order.

Figure 9: This figure contains a plot of the AIC for model orders 1 to 10. The lowest AIC value is at AR[1], which is the true model order.

However, we choose the model order; we don’t want to overfit the model. We want to choose the minimum model order that can model the important features. Overfitting the model requires more parameters and can add bias and variance to our estimate.

References

Carbone, C. 2022. Power Spectral Density Estimation by Example. Amazon.

Kay, Steven M. 1988. Modern Spectral Estimation. Englewood Cliffs, NJ: Prentice-Hall.

Marple, S. L. 1987. Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall.

Wednesday Linkorama!

2023-07-18T19:04:00.011-07:00

Round up of things I found interesting.

The image above came from a Sciencenews article and it is possibly the first predator.

As a big fan of diet soda, I'm not happy to hear the bad news on aspartame. On the other hand "The ruling does not mean you need to stop consuming all aspartame-containing products, nor does it change the acceptable daily intake put forth by the World Health Organization (WHO). In a separate ruling released at the same time, the Joint Food and Agriculture Organization/WHO Expert Committee on Food Additives (JECFA), which evaluates the levels of exposure to food additives that pose a risk, found there was no convincing evidence for harm with aspartame consumption below the current acceptable daily intake limit of 40 milligrams per kilogram of body weight. That limit was established by JECFA in 1981. For a 70-kilogram (155-pound) person, it’s equivalent to about 14.5 cans of Diet Coke."

This video shows 40 years of Boston Dynamics progress. It shows that 40 years of incremental progress really adds up.

I liked season 1 of Foundation, and I'm looking forward to Foundation season 2.

Autoregressive Estimation and the AIC

2023-07-16T20:19:00.003-07:00

1 Model Order Selection

This section provides a brief introduction to autoregressive (AR) model order estimation. An AR model of order p (AR[p]) can model up to p peaks in a PSD. So, if we wanted to estimate a PSD with 4 peaks we would use an AR[4] model. Unfortunately, we don’t always know the number of peaks. Note, however we choose the model order we don’t want to over fit the model. Meaning we want to choose the minimum model order that can model the important features. Over fitting the model requires more parameters and adds bias and variance to our estimate.

The arima function inputs the model order and the data. The function produces a number of outputs, but we only use the AIC. Using the arima function with $aic outputs the AIC associated with the input model order. Below is an example of using the arima function to find the model order.

2 Example

In this example we use data from an AR model and then use the AIC to estimate the order of the AR model that produced the data. Note the example is implemented in the R programing language. First, we create data using an AR[1] model.

set.seed(1) 
N <- 512
y_ar1 <- rep(0,N)
x <- rnorm(N)
a <- 0.9 

for (n in 2:N) 
  y_ar1[n] <- -a*y_ar1[n-1] + x[n]

Next, we use the data to estimate the model order. We calculate the AIC for AR models AR[1] to AR[10]. The model with the lowest AIC is the model that fits the data. If two models have similar AICs we should use the lower order model.

aicAR <- rep(0,10)
for (modelOrder in 1:10) {
  aicAR[modelOrder] <- arima(y_ar1, order = c(AR=modelOrder, 0, MA=0))$aic
}

Figure 1 show the AIC value for AR model orders from 1 to 10.

Figure 1: This figure contains a plot of the AIC for model orders 1 to 10. The lowest AIC value is from the AR[1] model, which is the true model order. The AIC is not perfect and sometimes will not get the exact right answer.

Note a similar method could be use to find the model order of moving average (MA) and autoregressive moving avarage (ARMA) models.

Wednesday Linkorama!

2022-12-07T17:54:00.006-08:00

Round up of things I found interesting.

The photo above came from a Gizomodo article about NASA’s Latest Artemis 1 Moon Images. Highly recomed you take a look!

San Francisco narrowly avoids killer robots! Look like the city was thinking about deploying police robots that carried explosives.

How damn became darn. The explanation of how and why use substitute swears.

The Periodogram or Something Else?

2022-12-04T08:57:00.002-08:00

The periodogram is the simplest and most common spectral estimator and, in R is implemented as

findPer <- function(x){(1/length(x))*abs(fft(x))^2},

but there are may othere spectral estimators^*.

So, which spectral estimator should you pick? The flow chart below presents a reasonable way to pick between spectral estimators. The first question you should ask is, do you want to assume a model for the data? If you don’t know anything about the data or don’t trust what you know, then move to the left side of the flow chart. If the Fourier methods have enough resolution, use the periodogram or the Blackman-Tukey spectral estimator (BTSE). If they don’t, you could try the minimum variance spectral estimator (MVSE). If the MVSE does not have enough resolution, try a high order autoregressive (AR) estimate. The AR method does assume a model for the data, but a high enough AR model can estimate any PSD.

If you want to assume a model, we move to the right side of the flow chart. If we know the data consists of sinusoids in noise, then the multiple signal clarifier (MUSIC) estimator is a good choice. If the important features are peaks, but the data is not made up of sinusoids in noise, the AR estimator is a good choice. If nulls are the important features, then use a moving average (MA) estimator. If the important features are nulls and peaks, use the autoregressive moving average (ARMA) estimator. Finally, if you have no idea about the PSD shape and want to use a model, you could try a high order AR.

* For more info on spectral estimation, see my new book! -> Power Spectral Density Estimation by Example

Wednesday Linkorama

2022-11-30T17:35:00.001-08:00

Round up of things I found interesting.

Space elevators have been a part of SiFi for a long time, this article talks about possibly building them for real! Space elevators are cables that go from the ground into space that a vehicle can use to climb into space. This article describes the basic ideas about how they work and why you may want to use one instead of rockets.

Crypto is starting to get the attention of regulators.

Diplomacy is a classic social board game focused on trickery and negotiations between players. The first games computers got really good at (like chess) required a lot of calculation. This article talks about Cicero, an AI that (allegedly) has human level performance in the game Diplomacy. So, computers may be getting better at other types of games.

Wednesday Linkorama

2022-11-02T18:57:00.002-07:00

Round up of things I found interesting.

A Reddit user bought an old Netflix server and reconfigured it for personal storage.

New compression claims 10x over MP3. MP3 is already compressed so ... If this is 10x and sounds as good then they really have something.

Normally autonomous trucks are 5 years out, but Aurora says they are coming in 2024. So, maybe they are on to something?

Wednesday Linkorama

2022-10-26T17:46:00.010-07:00

Round up of things I found interesting.

gsignal is an R package that is an improvement over the signal package. The signal package is a group of signal processing functions like you might find in Matlab. It was helpful but was missing some common functions, and some functions did not play well with complex numbers. The gsingal packaged added some key functions and works with complex numbers. It has the poly function but seems to be missing the roots function.

Ventura, the new mac operating system, is available. I'm going for it. Wish me luck!

Copilot is an app that helps write code. It uses what you wrote to suggest code snippets as you write the code. It looks interesting; I'm going to give it a try.

2022-10-25T10:08:00.010-07:00

My new book is published!

This book provides the tools to use, understand, and apply power spectral density (PSD) estimation. By the end of this book, you will be able to select an appropriate PSD estimation method based on your data, perform the estimation, and understand the results. To accomplish this, we cover the methods and just enough theory so you can use the estimation techniques correctly and interpret the results meaningfully.

Get it here -> Spectral Density Estimation by Example

Filtering Complex Data with R

2020-08-27T19:59:00.009-07:00

Filtering

Base R has a function filter that can perform moving average (MA) or filters (AR). But it cannot do both at the same time; it cannot do an autoregresive ARMA filter. It also only works on real data. The filter function in the signal package can perform ARMA filters, but is is built on the filter function in the base package. So, it also cannot filter complex data. Below is an example of filtering some data with the filter from the signal package. We are using a 6 sample MA filter as specified by the b=rep(1/6,6).

N <- 64
x <- rnorm(N)
b <- rep(1/6,6)
a <- 1
y <- signal::filter(b,a,x)

The figure below has two plots. The blue circles are the Gaussian samples. The green circles are the filtered Gaussian samples. The filtered data has less variance.

Complex Data

In engineering the most common kind of complex data is the circularly symmetric gaussian noise. The circularly symmetric part means that the real and complex part are independent and identically distributed. I described the makeCG function in another post, but as you can see the real part and the imaginary part are made using separate calls to the rnorm function.

makeCG <- function(N,v=1) {(sqrt(v/2))*rnorm(N) + (sqrt(v/2))*1i*rnorm(N)}

The function filterComplex filters real or complex data. The function inputs the MA parameters b and the AR parameters a, and the data x. If the data is complex, it makes two filter calls: one with the real part and one with the imaginary part. If the data is real it make one call.

filterComplex <- function(b, a, x) {
  if (is.complex(x)) {
    y <- signal::filter(b,a,Re(x)) + 1i*signal::filter(b,a,Im(x))
  } else y <- signal::filter(b,a,x)
  return(y)
}

Here we simulate complex Gaussian data and use the function filterComplex to filter it.

N <- 64
x <- makeCG(N)
b <- rep(1/6,6)
a <- 1
y <- filterComplex(b,a,x)

The figure below has two plots. The blue circles are the real part of the complex Gaussian samples. The green circles are the real part of filtered complex Gaussian samples. As expected, the filtered data has less variance.

Now the imaginary part. The blue circles are the imaginary part of the complex Gaussian samples. The green circles are the imaginary part of filtered complex Gaussian samples.

Dungeons and Dragons: Advantage

2020-06-24T11:17:00.002-07:00

D20 and Random Events

In the game Dungeons and Dragons, the success or failure of an event is determined by rolling a 20 sided die (D20): higher is better. If you need to roll 11 or higher you have a 50% chance of success. If another event requires 10 or better you now have a 55% chance for success. Each point the roll goes up or down is worth 5%. Often times, rolling a 20 is a critical success and a 1 is a critical failure; critical means it's extra good/bad.

Advantage

Sometimes things are really going your way and you roll with advantage. Sometimes things are not in your favor and you roll with disadvantage. When rolling with advantage, roll two dice and pick the higher. When rolling with disadvantage, roll two dice and pick the lower. How do advantage and disadvantage affect the chance of getting a 20 or 1?

When rolling without advantage or disadvantage, the probability of getting a 20 or a 1 is 1/20 = 0.05 or 5%. The probability of not getting a twenty 1 - 1/20 = 19/20.

P20    <- 0.05
Pnot20 <- 1-P20

When you have advantage, to not get a 20, you have to not roll a 20 twice. The probability of getting a 20 is (1 - the proability of not getting a 20 twice), and as you can see below, is almost 10%. Advantage about doubles you chance of getting a 20. But you probability guessed that since you're rolling twice. :) The chance of getting a 1 is 0.05^2=0.0025, almost 0!

Pnot20*Pnot20

## [1] 0.9025

1 - Pnot20*Pnot20

## [1] 0.0975

How will advantage and disadvantage affect the average roll? This time let's estimate the answer using a simulation. A 1000 trials will give a good estimate.

Trials <-10000

Below we simulate rolling two D20s

set.seed(1)
x <- sample(1:20,Trials,replace = TRUE)
y <- sample(1:20,Trials,replace = TRUE)

When we have advantage, roll two D20 and pick the max. With disadvantage roll two D20 and pick the min.

RollsWithAdvantage <- apply(cbind(x,y), 1,max)
Advantage    <-mean(RollsWithAdvantage)
RollsWithDisadvantage <- apply(cbind(x,y), 1,min)
Disadvantage <-mean(RollsWithDisadvantage)

Below is the simulated advantage mean, the calculated mean of a regular D20 roll, and the simulated disadvantage mean. Advantage adds about 3.4 points and disadvantage subtracts 3.4 or about +/- 17%.

Advantage

## [1] 13.8863

mean(1:20)

## [1] 10.5

Disadvantage

## [1] 7.1287

Below are 3 figures with histograms. The first histogram is made with 10,000 simulated D20 advantage rolls. Advantage moves a lot of probability to the right. The second histogram is made using 10,000 simulated D20 regular rolls. The histogram is approximately flat and each number is at about 0.05% and that's what we calculated. The last histogram is made with 10,000 simulated D20 disadvantage rolls. Disadvantage moves an equal amount to the left.

The probability of getting 11 or better with advantage is 1- probability of getting 10 or less twice. Rolling with advantage moves a 50% chance to a 75% chance!

p10orLess <- 0.5
1 - p10orLess^2

## [1] 0.75

Complex Normal Samples In R

2020-06-20T08:03:00.001-07:00

Normal Samples

If we want 10 samples from a Gaussian or normal random process with variance 4 can use rnorm(10,sd=2). Remember the standard deviation (sd) is the square root of the variance.

set.seed(2)
x <- rnorm(10,sd=2)
x

##  [1] -1.7938291  0.3696984  3.1756907 -2.2607513 -0.1605035  0.2648406
##  [7]  1.4159095 -0.4793960  3.9689479 -0.2775740

var(x)

## [1] 3.880803

The var() function produces an estimate of the variance, if we want a better estimate we need more samples.

var(rnorm(1000,sd=2))

## [1] 4.105966

Complex Normal Samples

If we are using base R and want complex normal (CN) samples, we need to write our own function. When the signal processing literature refers to CN they are usually referring to circularly-symmetric CN. Circularly-symmetric means the samples are independent and their mean is 0.

The function produces N CN samples with variance v. The real and imaginary parts are independent, because they are produced by different calls to rnorm(). Let x,y be independent. The var(ax) = a^2var(x) and var(x+y)=var(x)+var(y). So, if we want a variance of 1 would have to start a variance of sqrt(1/2).

makeCN <- function(N,v=1) {(sqrt(v/2))*rnorm(N) + (sqrt(v/2))*1i*rnorm(N)}
makeCN(10)

##  [1]  0.0023376-0.2079938i  0.7613032+0.6053620i  0.3946671-0.4049715i
##  [4]  0.4892950-0.1207824i  0.4651165-0.2871364i -0.2312504+0.9408834i
##  [7] -0.2153405-0.9648887i -1.0994866+1.0119199i  1.0396552+0.7824796i
## [10]  0.1147878+0.9059002i

If we want to check the variance, we can't use var() directly.

var(makeCN(10))

## Warning in var(makeCN(10)): imaginary parts discarded in coercion

## [1] 0.6039204

But the real and imaginary parts are independent, so we can calculate the variance separately.

z <- makeCN(10)
var(Re(z)) + var(Im(z))

## [1] 0.5623849

To make this easier, we can create a function to find the variance.

varComplex <- function(z) var(Re(z)) + var(Im(z))

To get a good estimate we need a-lot of samples.

varComplex(makeCN(1000))

## [1] 1.016615

Let's set the variance to 2 and then estimate the variance of the samples.

varComplex(makeCN(1000,v=2))

## [1] 1.925119

Success!

Book Review: Introductory Time Series with R

2020-05-12T18:47:00.000-07:00

I'm a big fan of R and time series analysis, so I was excited to read the book "Introductory Time Series with R. I've been using the book for about 9 years, so I thought it was about time for a review! In this review, I'm going to cover the following topics: the amount of R content, the subject content, who is the book for, and my overall recommendation.

R Content
The R content is high. All the ideas in the book are heavily illustrated with R code. At the beginning of the book, the authors point out that they use Sweave to embed the code and plots. They also make use of online data sets, so you can type in examples from the book and reproduce the calculations and figures.

Time Series Content
The book covers all the time series topics you'd want in an introduction, plus a few specialty topics like multivariate models. Each chapter is a solid introduction to a topic in time series analysis.

Who is the Book For
1) People who what to learn time series analysis. It covers the theory, application, and has plenty of opportunities for hands-on learning. 2) People who want to learn to do time series analysis using R. That's why I bought the book! I'm familiar with time series analysis and I bought the book to learn how to do time series stuff with R. The book definitely delivered on that account. 3) People who want to teach a course on time series analysis. The book has plenty of examples and exercises (not sure if there is a solution manual).

Overall Recommendation
The book is well written. The R code is clear and well presented. The figures are numerous and informative. The book is wonderful and I highly recommend it!

https://www.amazon.com/gp/product/0387886974/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

Book Review: Advanced R

2019-07-29T18:16:00.000-07:00

When I first started using R, my code looked just like MATLAB code. I read an intro to R programming, did some data analysis using R, and I was using the arrow assignment operator! But otherwise, my R code looked very much like MATLAB. Then I read Advanced R by Hadley Wickham. I found out, while I wasn't doing anything wrong, I wasn't getting the most out of R.

Advanced R is very readable if you've used R and maybe some other programming experience. If you're self-taught in R and have no other programming experience, you may find the book a bit tough. The book does a great job describing the important features and how to make the most of them. I particularly liked the chapter on functions and the section on functional programming.

I highly recommend it!

https://www.amazon.com/dp/1466586966/ref=cm_sw_su_dp?tag=devtools-20

Update: A commenters pointed out there is a new version of the book out. The link is below.

https://www.amazon.com/Advanced-Second-Chapman-Hall-CRC/dp/0815384572/ref=pd_lpo_sbs_14_t_0?_encoding=UTF8&psc=1&refRID=11FT84CEP3CE0YQ6XYBJ

Blackman-Tukey Spectral Estimator in R

2019-06-26T16:58:00.001-07:00

Blackman-Tukey Spectral Estimator in R!

There are two definitions of the power spectral density (PSD). Both definitions are mathematically nearly identical and define a function that describes the distribution of power over the frequency components in our data set. The periodogram PSD estimator is based on the first definition of the PSD (see periodogram post). The Blackman-Tukey spectral estimator (BTSE) is based on the second definition. The second definition says, find the PSD by calculating the Fourier transform of the autocorrelation sequence (ACS). In R this definition is written as

PSD <- function(rxx) {
  fft(rxx)
}

where fft is the R implementation of the fast Fourier transform, rxx is the autocorrelation sequence (ACS), the k'th element of the ACS rxx[k] = E[x[0]x[k]], k -infinity to +infinity, and E is the expectation operator. The xx in rxx[k] is a reminder r is a correlation between x and itself. The rxx[k]s are sometimes called lags. The ACS has the propriety that rxx[-k]=rxx[k]*, where * is the complex conjugate. In the post, we will only use real numbers and I'll drop the * from here forward.

So, to find the PSD we just calculate rxx and take its fft! Unfortunately, in practice, we cannot do this. Calculating the expected value requires the probability density function (PDF) of x, which we don't know and we need an infinite amount of data, which we don't have. So, we can't calculate the PSD: we're doomed!

No, we are not doomed. We can't calculate the PSD, but we estimate it! We can derive an estimator for the PSD from the definition of the PSD. First, we replace rxx with an estimate of rxx. We replace the expected value, which gives the exact rxx, with an average, which gives us an estimate of rxx. The E[x[0]x[k]] is replaced with (1/N)(x[1]x[1+k]+x[2]x[2+k]+...+x[N-1-k]x[N-1]+x[N-k]x[N]), where N is the number of data samples. For example; if k=0, then rxx[k]=(1/N)*sum(x*x). In R code the estimate is written as

lagEstimate <-function(x,k,N=length(x)){
   (1/N)*sum(x[1:(N-k)]*x[(k+1):N])
}

If we had an infinite amount of data, N=infinity, we could use lagEstimate to estimate the entire infinite ACS. Unfortunately we don't have an infinite amount of data, even if we did it wouldn't fit into a computer. So, we can only estimate a finite amount of ASC elements. The function below calculates lags 0 to kMax.

Lags <-function(x,kMax) {
  sapply(0:kMax,lagEstimate,x=x)
}

Before we can try these functions out we need data. In this case the data came from a random process with the PSD plotted in the figure below. The x-axis is normalized frequency(frequency divided by the sampling rate). So, if the sampling rate was 1000 Hz, you could multiply the normalized frequency by 1000 Hz and then the frequency axis would read 0 Hz to 1000 Hz. The y-axis in in dB (10log10(amplitude)). You can see six large sharp peaks in the plot and a gradual dip towards 0 Hz and then back up. Some of the peaks are close together and will be hard to resolve.

The data produced by the random process is plotted below. This is the data we will use through this post. The data is posted here: x.RData

Let's calculate the the ACS up to the 5th lag using the data.

Lags(x,kMax=5)

## [1]  6.095786 -1.368336  3.341608  1.738122 -1.737459  3.651765

A kMax of 5 gives us 6 lags: {r[0], r[1], r[2], r[3], r[4], r[5]}. These 6 lags are not an ACS, but are part of an ACS.

We used Lags to estimate the positive lags up to kMax, but the ACS is even sequence, r[-k]=r[k] for all k. So, let's write a function to make a sequence consisting of lags from r[-kMax] to r[kMax]. This is a windowed ACS, values outside of the +/- kMax are replaced with 0. Where it won't cause confusion, I'll refer to the windowed ACS, as the ACS.

acsWindowed <- function(x,kMax,Nzero=0){
  rHalf <- c(Lags(x,kMax),rep(0,Nzero))
  c(rev(rHalf[2:length(rHalf)]),rHalf)
}

Let's try this function out.

acsW <- acsWindowed(x,9)

In the figure below you can see the r[0] lag, the maximum, is plotted in the middle of the plot.

The ACS in the figure above is how the ACS is usually plotted in textbooks. In textbooks the sum in the Fourier transform ranges from -N/2 to (N-1)/2. So, the r[0] lag should be in the center of the plot. In R the sum in the Fourier transform ranges from 1 to N, so the 0'th lag has to be first. We could just make the sequence in R form, but it is often handy to start in textbook from and switch to R form. We can write a function to make switching from textbook to R easy.

Textbook2R <- function(x,N=length(x),foldN=ceiling(N/2)) {
  c(x[foldN:N],x[1:(foldN-1)])
}

Notice in the figure below the maximum lag r[0], is plotted at the beginning.

Let's imagine we have an infinite amount of data and used it to estimated and infinite number of ACS lags. Let's call that sequence rAll. We make a windowed ACS by setting rW=rAll*W, where W=1 for our 9 lags and 0 everywhere else. W is called the rectangular window, because, as you can see in the plot below, it's plot looks like a rectangle. By default when we estimate a finite number of lags we are using a rectangular window.

W <- c(rep(0,9),rep(1,9),rep(0,9))

The reason we can not use a rectangular window is its Fourier Transform is not always positive. As you can see in the plot below there are several values below zero, indicated with dotted line. Re() functions removes some small imaginary numbers due to numerical error, some imaginary dust we have to sweep up.

FFT_W <- Re(fft(Textbook2R(W)))

Even though the fft of the ACS rAll is positive , the produce rAll and a rectangular window might not be positive! The Bartlett window is a simple window whos fft is positive.

BartlettWindow <- function(N,n=seq(0, (N-1)))  {
  1 - abs( (n-(N-1)/2) / ( (N-1)/2) )
}
Wb <- BartlettWindow(19)

As you can see in the plot below the Fourier transform of the Bartlett window is positive.

WbFft <- Re(fft(Textbook2R(Wb)))

Calculating the BTSE with R

Now that we can estimate the ACS and window our estimate, we are ready to estimate the PSD of our data. The BTSE is written as

Btse <- function(rHat,Wb) {
  Re(fft(rHat*Wb))
}

Note the Re() is correcting for numerical error.

In the first example we use a 19 point ACS lag sequence.

rHat   <- Textbook2R(acsWindowed(x,kMax=9))
Wb     <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse9 <- Btse(rHat,Wb)

In the figure below is the BTSE calculated with a maximum lag of 9. The dotted lines indicate the locations of the peaks in the PSD we are trying to estimate. The estimate with a maximum lage of only 9 produces a poor estimate.

We calculate a new estimate with a maximum lag of 18.

rHat    <- Textbook2R(acsWindowed(x,kMax=18))
Wb      <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse18 <- Btse(rHat,Wb)

The next estimate is made with a maximum lag of 18. This estimate is better, the peaks around 0.4 and 0.6 are not resolved. We still need to increase the maximum lag.

Finally we increase the maximum lag to 65 and recalculate the estimate.

rHat    <- Textbook2R(acsWindowed(x,kMax=65))
Wb      <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse65 <- Btse(rHat,Wb)

This finial estimate is very good. All six peaks are resolved and the location of our estimated peaks are very close to the true peak locations locations.

Final Thoughts

Could we use 500 lags in the BTSE? In this case we could, since we have a lot of data, but the higher lags get estimated with less data and therefore have more variance. Using the high variance lags will produce a higher variance estimate.

Are there other ways to improve the BTSE other than using more lags? Yes! There are a few other ways. For instance, we could zero pad the lags. Basically add zeros to the end of our lag sequence. This will make the fft, in the BTSE estimator, evaluate the estimate at more frequencies and we will be able to see more details in the estimated PSD.

Also keep in mind there are other PSD estimation methods that do better on other PSD features. For instance, if you we more interested finding deep nulls rather than peaks, a moving average PSD estimation would be better.

Periodogram with R

2019-06-13T18:05:00.000-07:00

Periodogram with R
The power spectral density (PSD) is a function that describes the distribution of power over the frequency components composing our data set. If we knew the process that generated the data, we could just calculate the PSD; we would not have to estimate it. Unfortunately, in practice we won't have access to the random process, only the samples (data) produced by the process. So, we can't get the true PSD, we can only get an estimate of the true PSD. In our example below, we made the data, so we know what the true PSD should look like.
The PSD is written as
where -1/2 <= f < 1/2 x is the data, e is the complex exponential, the sum of the data x multiplied by e is the Fourier transform. In English, calculate the PSD by finding the expected value of the magnitude squared of the Fourier transform as the amount of data increases to infinity. Well, that's a mouthful! The frequency f is expressed in normalized frequency. Normalized frequency is the frequency divided by the sampling rate. Normalized frequency is often used for simplicity. To convert to frequency in Hz multiply by the sampling rate.
The periodogram estimator is based on the definition in equation above. The definition has a limit and an expected value. To use the definition directly we would need an infinite amount of data, which we don't have, and we would need to know the probability density function (PDF) of the data which we don't know. To use this definition to get an estimator of the PSD, we must drop the limit and we must drop the expectation operator ($E$). We also can only evaluate a finite number of frequencies. After dropping the limit and the expectation we are left with
where 0 <= f < 1 and N is the number of data samples. Notice, the range of f changes from (-1/2,1/2) in the first equation to (0,1) in the second equation. Textbooks usually use the former and R uses the latter.
The periodogram is very easy to implement in R, but before we do we need to simulate some data. The code below first uses the set.seed command so R will produce the same "random" numbers each time. Then it creates a 32 normally distributed numbers and 32 points of a sine wave with a normalized frequency of 0.4 and a amplitude of 2. The signal is made up of a sine wave and the random points added together.
The figure below is a plot of the data generated above. To me it looks random and it is not obvious there is a sine wave in there.

set.seed(0)
N <-32
n<- 0:(N-1)
w <- rnorm(1:N)
f1 <- 0.4
A1 <- 2
s <- A1*sin(2*pi*f1*n)
x <- s + w

Now we are going to implement the periodogram in R. It is pretty much a direct translation from the math. We start with the 1/N and multiply that by the absolute value of the square of the Fourier transform of the data. Writing that in English was more work than implementing the periodogram in R!

xPer <- (1/N)*abs(fft(x)^2)
f    <- seq(0,1.0-1/N,by=1/N)

The figure below is a plot of the periodogram of the data. The dotted line marks the location of the frequency f1, the frequency of the sine wave of the data. Now the sine wave component really stands out!. Notice two things. First, the peaks of the periodogram seems to be a bit off the true values. Second, the plot looks jagged. Both of these things are caused by the same thing: the periodogram is only evaluated at 32 frequency bins. The frequencies we evaluate the periodogram on are called bins. We can fix both problems by evaluating the periodogram at more bins. One way to evaluate the periodogram at more points is to get more data! That would certainly fix both problems. Unfortunately, we often are struck with the data we have. There is another way.
Since f is a continuous variable we can evaluate the periodogram at as many points as we want. The way we do this is to pretend we have more data by sticking zeros at the end of our data, we zero pad it. The R code below adds 968 zeros to the end of x zero-padding it to a total of 1000 "data" points. As you can see in the figure below, we have fixed both problems. In the 32 point periodogram missed the peak, because the periodogram was not evaluated at enough points. The zero-padding does not guarantee we hit the peak, but it will get closer. Also, we fixed the jaggedness by evaluating the periodogram at many more bins.

xzp <- c(x,rep(0,1000-N))
Nzp <- length(xzp)

xPerZp <- (1/N)*abs(fft(xzp)^2)
fzp    <- seq(0,1.0-1/Nzp,by=1/Nzp)

JavaScript is Fast?

2019-06-03T23:42:00.001-07:00

JavaScript is Fast?

OK, I admit it. I'm always looking for the one true programming language. Mostly I use programming languages for signal processing, data science, that kind of thing. So, for me I want a language to be good for prototyping, flexible, fast, have lots of math/statistics libraries. The first language I use for signal processing was C, not much to say about that. I then became a heavy MATLAB user, great for a lot of things. I used MathCad. It was great for making readable code! I even used Mathematica. I never got the hang of it's syntax, it always felt award. Lately, I've been doing most of my stuff in R.

So, I find myself at the Julia site, at the time is was the next new mathematical programming language. They have a benchmark page showing how fast it is and it is fast! On that page they show JavaScript as faster than a bunch of standard math mathematically oriented languages. I would not have guessed that. Maybe, JavaScript should be the shiny "new" programming language for math!

What is a Statistic?

2019-06-02T17:15:00.001-07:00

What is a Statistic?

A statistic is a mathematical operation on a data set, performed to get information from the data.

Below is the R code that generates 20 random samples. The samples are uniformly distributed between 0 and 1. Uniformly means all the data samples are equally likely, like when you flip a coin heads and tails are equally likely.

x <- round(runif(20),2)
x

##  [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63 0.06 0.21 0.18 
## [14] 0.69 0.38 0.77 0.50 0.72 0.99 0.38

One thing we might want to know about this data set it the expected value (EV). The EV is the typical value of the data. Since we know the data is uniformly distributed between 0 and 1, the EV is just the middle of the range, 0.5. In practice, we have the data, but don't know the distribution the data came from. So, we can't calculate the EV, but we could use the average as an estimate of the EV. The R code below is the average.

sum(x)/20

## [1] 0.5615

The average of the data is about 0.56. The average is not the only way to estimate the EV, there are many ways! We could just use the first element of the data set 0.90 as the estimate. Another way is to find the maximum of the data and divide it by 2. The code below finds the max of the data and divides it by 2. Notice the max/2 produces an estimate that is closer to the true value of 0.5 than the average.

max(x)/2

## [1] 0.495

There are many way to estimate the EV, but some are better than others. Estimators of random data are also random. So, the only way to compare estimators is statistically. The case above where the max/2 produced a better estimate than the average could have been luck. The code below generates 20 data samples from the same uniform distribution above 20 times. We then find the estimate the EV in three ways: the average, the first sample, and the max(x)/2.

trials <- 100
avg     <- matrix(0,1,trials)
first   <- matrix(0,1,trials)
halfMax <- matrix(0,1,trials)
for (indx in 1:trials) {
  x <- runif(20)
  avg[indx]     <- sum(x)/20
  first[indx]   <- x[1]
  halfMax[indx] <- max(x)/2
}

The figure below contains 100 averages plotted with circles and true EV plotted in a line at 0.5. Most of the averages are between 0.4 and 0.6 and many are closer. The average seems to be a good estimate of the EV. Which shouldn't be surprising since the average is the most common statistic.

This figure is the estimate based only on the first element of the data set. The estimates are between just over 0 and almost 1. A few estimates are close to 0.5, but most are not. This is not a good estimator for EV. This is probably not surprising since this estimate is based on only one sample and the average is based on 20.

This figure is the plot of the estimate based on the max/2. Most of the estimates are very close to the true value. It looks even better than the average! This may seem surprising, since it looks like it is based on only one sample, the maximum sample, but it is actually makes use of all the samples. This is because you need to look at all the samples to know which one is the maximum.

The average is a good choice (and often the best choice) for estimating the EV, but it is not always the best estimator for EV. In the case of this particular distribution, the max/2 is better estimator of EV than the average.

So, we now know what a statistic is and we worked with a few. As a field of study, statistics uses visualization, probability and other math tools to find the best ways to get information from data.