Get Your Data On

Interpolation in the Frequency Domain Improved!

Introduction

This post describes interpolation in the frequency domain (IFD) and an improvement to IFD. IFD is a method of interpolation that is easy to use and produces good results on most time series data (data samples equally spaced in time). It is easy to use because it does not require any prior knowledge of the data. Also, since IFD uses the Fast Fourier Transform (FFT), it is fast (even on large data sets). The drawback to IFD is that it sometimes produces an interpolation that is not good at the start and end of the sequence (referred to as “end effects”). The improved IFD reduces the end effects. While end effects are not always an issue, it is always better to not have them.

The first section briefly introduces interpolation. The next section defines the standard IFD algorithm and shows examples of its use. The final section shows an improved IFD that reduces end effects.

Interpolation Example

Let’s say we check a rain gauge at 7:00 and then once an hour for five hours, resulting in the following measurements: 0.01 cm, 0.19 cm, 0.31 cm, 0.40 cm, 0.52 cm, and 0.59 cm. We didn’t measure the rain at 7:30, but we could estimate the rainfall at 7:30 using a linear spline to interpolate (connecting the points with straight lines).

The figure below shows the five rainfall values from above and an illustration of interpolating a rainfall value at 730. The figure shows the five measurements as black circles, the interpolating blue line connecting the points 700 and 800, and the interpolated point at 0730 shown as a red circle. The red circle shows that the rainfall at 730 is 0.1 cm.

Interpolation in the Frequency Domain

Linear spline interpolation is effective if the underlying data can be modeled by a straight line. IFD works well regardless of the type of data.

The steps in the algorithm for IFD are enumerated below.

Find the discrete Fourier transform (DFT) of the data using the built-in function fft, which is R’s implementation of the FFT. The data started in the time domain (data indexed by time). After the transform, the data is in the frequency domain (data indexed by frequency).
Add zeros to the data. This is called zero padding, and it forces the inverse DFT (IDFT) in the next step to evaluate the data at more points. When adding zeros to the frequency domain data, the zeros must go in the middle. This step is slightly different for even and odd length sequences.
Perform the IDFT on the zero padded DFT sequence.

These steps are explained below.

Find the DFT of the Sequence

To illustrate this step, we use the rain fall values from above.

First, we use the fft function to find the DFT of the sequence. Below, I’m going to stop saying, “Use the fft function to find the DFT.” I’ll just say find the FFT. Note: to take up less space on the screen, I’ll round the output to two decimal places.

N <- length(x)
X <- fft(x)
round(X,digits = 2)

## [1]  1.43+0.00i -0.34+0.37i -0.34+0.11i -0.34-0.11i -0.34-0.37i

Zero Pad the Data

In this case, we want to double the number of points, so we add five zeros to X. The zeros have to be added in the “middle” of the sequence. If you are familiar with Fourier transform theory, you’ll note that this is between the positive and negative frequencies.

Xzp <- c(X[1],X[2:3],rep(0,5),X[4:5])
round(Xzp,digits = 2)

##  [1]  1.43+0.00i -0.34+0.37i -0.34+0.11i  0.00+0.00i  0.00+0.00i  0.00+0.00i
##  [7]  0.00+0.00i  0.00+0.00i -0.34-0.11i -0.34-0.37i

Note if the DFT samples X had an even number of samples, for instance 6, then zero padded DFT Xzp would be c(X[1], X[2:3], X[4]/2, rep(0, N), X[4]/2, X[5:6]). The X[N/2+1] sample (in this case, X[4]) is used twice but cut in half to keep the energy the same. I show an example of this in the IFD Section.

Inverse DFT

Now, we find the inverse FFT. The inverse FFT is scaled by one over the number of samples. Notice we are dividing by the number of actual data points (five) and not the total number of points in the padded sequence (ten). If we divide by ten, we bias the result low because the zeros don’t add any energy to the sequence.

xinterp <- fft(Xzp,inverse = TRUE)/N
round(xinterp,digits=2)

##  [1] 0.01+0i 0.00+0i 0.19+0i 0.33+0i 0.31+0i 0.29+0i 0.40+0i 0.55+0i 0.52+0i
## [10] 0.26+0i

Since we started with a real sequence, we should end with a real sequence. However, the fft output is complex, so we convert the sequence to real using the Re function.

xinterp <- Re(xinterp)
round(xinterp,digits = 2)

##  [1] 0.01 0.00 0.19 0.33 0.31 0.29 0.40 0.55 0.52 0.26

The figure below contains two plots. The interpolated sequence in blue x’s and the original sequence in black circles (the circles are hard to see because they have x’s on them). Note the interpolation (especially the points at 7:30 and 12:30) is not great. This method suffers from end effects and performs poorly with short sequences. We show a method that mitigates the end effects in the IFD Function Improved section.

Example IFD with a Longer Sequence

Now, let’s try IFD with a longer sequence. Below is a 49 sample ramp.

N <- 49
x <-1:N

Since we now know the algorithm, I’ll go through the steps with less explanation. First, find the FFT.

X <-fft(x)

Second, zero pad by 2.

Xzp <- c(X[1:ceiling(N/2)],rep(0,N),X[(ceiling(N/2)+1):N])

Finally, the IDFT.

xInterp <- Re(fft(Xzp,inverse = TRUE)/N)

The figure below contains two plots. The interpolated sequence is shown as a blue line, and the ramp sequence is shown as black circles. You can see the end effects at the start and end of the plot. After the end effects, the interpolated points are close to the ramp.

The figure below contains a zoomed-in view of the one above. A blue line shows the interpolated sequence, and black circles show the original ramp samples. You can see that after the effects at the beginning, the interpolation is very good.

IFD Function

Here, we put the IFD steps into the function findIfd, which inputs a data sequence dataSet and the interpolation factor M. To keep the function simple, M is restricted to an integer value. The function first checks to see if the interpolation factor is an integer, not less than 1. Next, it finds the FFT of dataSet and sets N to the length of dataSet. The function determines if dataSet has an even or odd number of samples and then places the zeros in the middle. After zero padding, it finds the IFFT of the zero-padded sequence. Finally, it returns the interpolated sequence, converting the sequence to real if the input dataSet was real. Note the R function is.numeric returns true if the sequence is real.

findIfd <- function(dataSet,M=2) {
  
  if (round(M)!=M || M<1) {
    stop("M must be an integer larger than zero.")
  }
  
  X <- fft(dataSet)
  N <- length(X)
  
  if(floor(N/2)==N/2){
    Xzp <- c(X[1:(N/2)],X[N/2+1]/2,rep(0,(M-1)*N-1),X[N/2+1]/2,X[(N/2+2):N])
    
  } else {
    Xzp <- c(X[1:ceiling(N/2)],rep(0,((M-1)*N)),X[(ceiling(N/2)+1):N])
  }
  xint <- fft(Xzp,inverse = TRUE)/N
  
  if (is.numeric(dataSet))
      Re(xint[1:(M*N)])
  else
      xint[1:(M*N)]
}

Example: IFD Function and the Ramp

Below, we use the findIfd to interpolate the ramp. You can see we started with a 50 sample ramp and used an interpolation factor of 8, so the length of the interpolated sequence is 400.

N <- 50
x <-1:N
xInterp <- findIfd(x,M=8)

The plot below shows the interpolated sequence in blue (800 points make the plot hard to read), and the ramp sequence is shown in black points. As shown above, we can see that the interpolation is good away from the middle.

Example: Interpolating a Sine Wave

IFD works better when the function is periodic and smooth. In this example, we interpolate a sine sequence, and since it is periodic and smooth, we should get a better result.

N <- 50
n <- 0:(N-1)
f <- 0.02
x <- sin(2*pi*f*n)
M <- 4
xInterp <- findIfd(x,M)
Ndx <- seq(0,N-1/M,1/M)

The figure below contains two plots. The black circles are the original sine sequence, and the blue line shows the interpolation. You can see this is a very good interpolation with no end effects.

IFD Function Improved

The improved version of the IFD function “mirrors” the sequence before performing the interpolation. By mirror, I mean appending a reversed version of the sequence to the end of the original sequence. See below for an example of mirroring.

x <- 1:5
c(x,rev(x))

##  [1] 1 2 3 4 5 5 4 3 2 1

The function findIfdImproved inputs a sequence x and the interpolation factor M and then outputs an interpolated sequence. The difference between this function and findIfd is this function mirrors the sequence before implementing the IFD algorithm. The extra samples at the end of the interpolated sequence (caused by the mirroring) are removed before returning the interpolated sequence. Note mirroring doubles the sequence, making it even. So, we only need to use the zero padding method for even length sequences. Note that findIfdImproved doubles the sequence but does not take twice as long due to the proprieties of the FFT.

findIfdImproved <- function(x,M) {
  
  if (round(M)!=M || M<1 ) {
    stop("M must be an integer larger than zero.")
  }
  
  X <- fft(c(x,rev(x)))
  N <- length(X)
  
  Xzp <- c(X[1:(N/2)],X[N/2+1]/2,rep(0,(M-1)*N-1),X[N/2+1]/2,X[(N/2+2):N])

  xint <- fft(Xzp,inverse = TRUE)/N
  
  if (is.numeric(x)) {
    xint <- Re(xint[1:(M*length(x))]) }
  else{ 
    xint <- xint[1:(M*length(x))] }
}

Below, we use the findIfdImproved to interpolate the 50 point ramp, with an interpolation factor of 8.

N <- 50
x <-1:N
xInterp <- findIfdImproved(x,M=8)

The plot below shows the interpolated sequence as a blue line and the ramp sequence as black points. We see the findIfdImproved produces an improved interpolation with almost no end effects (you can see slight end effects at the end of the sequence).

Conclution

IFD is an easy-to-use method of interpolation that produces good results, except for the end effects. The improved method significantly reduces the end effects, at the cost of additional processing.