Interpolation in the Frequency Domain Improved!
Introduction
This post describes interpolation in the frequency domain (IFD) and an improvement to IFD. IFD is a method of interpolation that is easy to use and produces good results on most time series data (data samples equally spaced in time). It is easy to use because it does not require any prior knowledge of the data. Also, since IFD uses the Fast Fourier Transform (FFT), it is fast (even on large data sets). The drawback to IFD is that it sometimes produces an interpolation that is not good at the start and end of the sequence (referred to as “end effects”). The improved IFD reduces the end effects. While end effects are not always an issue, it is always better to not have them.
The first section briefly introduces interpolation. The next section defines the standard IFD algorithm and shows examples of its use. The final section shows an improved IFD that reduces end effects.
Interpolation Example
Let’s say we check a rain gauge at 7:00 and then once an hour for five hours, resulting in the following measurements: 0.01 cm, 0.19 cm, 0.31 cm, 0.40 cm, 0.52 cm, and 0.59 cm. We didn’t measure the rain at 7:30, but we could estimate the rainfall at 7:30 using a linear spline to interpolate (connecting the points with straight lines).
The figure below shows the five rainfall values from above and an
illustration of interpolating a rainfall value at 730. The figure shows
the five measurements as black circles, the interpolating blue line
connecting the points 700 and 800, and the interpolated point at 0730
shown as a red circle. The red circle shows that the rainfall at 730 is
0.1 cm.
Interpolation in the Frequency Domain
Linear spline interpolation is effective if the underlying data can be modeled by a straight line. IFD works well regardless of the type of data.
The steps in the algorithm for IFD are enumerated below.
- Find the discrete Fourier transform (DFT) of the data using the
built-in function
fft
, which is R’s implementation of the FFT. The data started in the time domain (data indexed by time). After the transform, the data is in the frequency domain (data indexed by frequency).
- Add zeros to the data. This is called zero padding, and it forces the inverse DFT (IDFT) in the next step to evaluate the data at more points. When adding zeros to the frequency domain data, the zeros must go in the middle. This step is slightly different for even and odd length sequences.
- Perform the IDFT on the zero padded DFT sequence.
These steps are explained below.
Find the DFT of the Sequence
To illustrate this step, we use the rain fall values from above.
First, we use the fft
function to find the DFT of the
sequence. Below, I’m going to stop saying, “Use the fft
function to find the DFT.” I’ll just say find the FFT. Note: to take up
less space on the screen, I’ll round the output to two decimal
places.
N <- length(x)
X <- fft(x)
round(X,digits = 2)
## [1] 1.43+0.00i -0.34+0.37i -0.34+0.11i -0.34-0.11i -0.34-0.37i
Zero Pad the Data
In this case, we want to double the number of points, so we add five zeros to X. The zeros have to be added in the “middle” of the sequence. If you are familiar with Fourier transform theory, you’ll note that this is between the positive and negative frequencies.
Xzp <- c(X[1],X[2:3],rep(0,5),X[4:5])
round(Xzp,digits = 2)
## [1] 1.43+0.00i -0.34+0.37i -0.34+0.11i 0.00+0.00i 0.00+0.00i 0.00+0.00i
## [7] 0.00+0.00i 0.00+0.00i -0.34-0.11i -0.34-0.37i
Note if the DFT samples X had an even number of samples, for instance 6, then zero padded DFT Xzp would be c(X[1], X[2:3], X[4]/2, rep(0, N), X[4]/2, X[5:6]). The X[N/2+1] sample (in this case, X[4]) is used twice but cut in half to keep the energy the same. I show an example of this in the IFD Section.
Inverse DFT
Now, we find the inverse FFT. The inverse FFT is scaled by one over the number of samples. Notice we are dividing by the number of actual data points (five) and not the total number of points in the padded sequence (ten). If we divide by ten, we bias the result low because the zeros don’t add any energy to the sequence.
xinterp <- fft(Xzp,inverse = TRUE)/N
round(xinterp,digits=2)
## [1] 0.01+0i 0.00+0i 0.19+0i 0.33+0i 0.31+0i 0.29+0i 0.40+0i 0.55+0i 0.52+0i
## [10] 0.26+0i
Since we started with a real sequence, we should end with a real
sequence. However, the fft
output is complex, so we convert
the sequence to real using the Re
function.
xinterp <- Re(xinterp)
round(xinterp,digits = 2)
## [1] 0.01 0.00 0.19 0.33 0.31 0.29 0.40 0.55 0.52 0.26
The figure below contains two plots. The interpolated sequence in
blue x’s and the original sequence in black circles (the circles are
hard to see because they have x’s on them). Note the interpolation
(especially the points at 7:30 and 12:30) is not great. This method
suffers from end effects and performs poorly with short sequences. We
show a method that mitigates the end effects in the IFD Function
Improved section.
Example IFD with a Longer Sequence
Now, let’s try IFD with a longer sequence. Below is a 49 sample ramp.
N <- 49
x <-1:N
Since we now know the algorithm, I’ll go through the steps with less explanation. First, find the FFT.
X <-fft(x)
Second, zero pad by 2.
Xzp <- c(X[1:ceiling(N/2)],rep(0,N),X[(ceiling(N/2)+1):N])
Finally, the IDFT.
xInterp <- Re(fft(Xzp,inverse = TRUE)/N)
The figure below contains two plots. The interpolated sequence is
shown as a blue line, and the ramp sequence is shown as black circles.
You can see the end effects at the start and end of the plot. After the
end effects, the interpolated points are close to the ramp.
The figure below contains a zoomed-in view of the one above. A blue
line shows the interpolated sequence, and black circles show the
original ramp samples. You can see that after the effects at the
beginning, the interpolation is very good.
IFD Function
Here, we put the IFD steps into the function findIfd
,
which inputs a data sequence dataSet
and the interpolation
factor M
. To keep the function simple, M
is
restricted to an integer value. The function first checks to see if the
interpolation factor is an integer, not less than 1. Next, it finds the
FFT of dataSet
and sets N
to the length of
dataSet
. The function determines if dataSet
has an even or odd number of samples and then places the zeros in the
middle. After zero padding, it finds the IFFT of the zero-padded
sequence. Finally, it returns the interpolated sequence, converting the
sequence to real if the input dataSet
was real. Note the R
function is.numeric
returns true if the sequence is
real.
findIfd <- function(dataSet,M=2) {
if (round(M)!=M || M<1) {
stop("M must be an integer larger than zero.")
}
X <- fft(dataSet)
N <- length(X)
if(floor(N/2)==N/2){
Xzp <- c(X[1:(N/2)],X[N/2+1]/2,rep(0,(M-1)*N-1),X[N/2+1]/2,X[(N/2+2):N])
} else {
Xzp <- c(X[1:ceiling(N/2)],rep(0,((M-1)*N)),X[(ceiling(N/2)+1):N])
}
xint <- fft(Xzp,inverse = TRUE)/N
if (is.numeric(dataSet))
Re(xint[1:(M*N)])
else
xint[1:(M*N)]
}
Example: IFD Function and the Ramp
Below, we use the findIfd
to interpolate the ramp. You
can see we started with a 50 sample ramp and used an interpolation
factor of 8, so the length of the interpolated sequence is 400.
N <- 50
x <-1:N
xInterp <- findIfd(x,M=8)
The plot below shows the interpolated sequence in blue (800 points
make the plot hard to read), and the ramp sequence is shown in black
points. As shown above, we can see that the interpolation is good away
from the middle.
Example: Interpolating a Sine Wave
IFD works better when the function is periodic and smooth. In this example, we interpolate a sine sequence, and since it is periodic and smooth, we should get a better result.
N <- 50
n <- 0:(N-1)
f <- 0.02
x <- sin(2*pi*f*n)
M <- 4
xInterp <- findIfd(x,M)
Ndx <- seq(0,N-1/M,1/M)
The figure below contains two plots. The black circles are the
original sine sequence, and the blue line shows the interpolation. You
can see this is a very good interpolation with no end effects.
IFD Function Improved
The improved version of the IFD function “mirrors” the sequence before performing the interpolation. By mirror, I mean appending a reversed version of the sequence to the end of the original sequence. See below for an example of mirroring.
x <- 1:5
c(x,rev(x))
## [1] 1 2 3 4 5 5 4 3 2 1
The function findIfdImproved
inputs a sequence
x
and the interpolation factor M
and then
outputs an interpolated sequence. The difference between this function
and findIfd
is this function mirrors the sequence before
implementing the IFD algorithm. The extra samples at the end of the
interpolated sequence (caused by the mirroring) are removed before
returning the interpolated sequence. Note mirroring doubles the
sequence, making it even. So, we only need to use the zero padding
method for even length sequences. Note that findIfdImproved
doubles the sequence but does not take twice as long due to the
proprieties of the FFT.
findIfdImproved <- function(x,M) {
if (round(M)!=M || M<1 ) {
stop("M must be an integer larger than zero.")
}
X <- fft(c(x,rev(x)))
N <- length(X)
Xzp <- c(X[1:(N/2)],X[N/2+1]/2,rep(0,(M-1)*N-1),X[N/2+1]/2,X[(N/2+2):N])
xint <- fft(Xzp,inverse = TRUE)/N
if (is.numeric(x)) {
xint <- Re(xint[1:(M*length(x))]) }
else{
xint <- xint[1:(M*length(x))] }
}
Below, we use the findIfdImproved
to interpolate the 50
point ramp, with an interpolation factor of 8.
N <- 50
x <-1:N
xInterp <- findIfdImproved(x,M=8)
The plot below shows the interpolated sequence as a blue line and the
ramp sequence as black points. We see the findIfdImproved
produces an improved interpolation with almost no end effects (you can
see slight end effects at the end of the sequence).
Conclution
IFD is an easy-to-use method of interpolation that produces good results, except for the end effects. The improved method significantly reduces the end effects, at the cost of additional processing.
No comments:
Post a Comment