Wednesday, June 24, 2020

Dungeons and Dragons: Advantage

D20 and Random Events

In the game Dungeons and Dragons, the success or failure of an event is determined by rolling a 20-sided die (D20): higher is better. If you need to roll 11 or higher, you have a 50% chance of success. If another event requires 10 or better, you have a 55% chance of success. Each point the required roll goes up or down is worth 5%. Often, rolling a 20 is a critical success and a 1 is a critical failure; critical means it's extra good/bad.
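
These probabilities are easy to check in R. Below is a quick sketch; pSuccess is just a helper name used here for illustration, not something from the game.

pSuccess <- function(target) mean(1:20 >= target)  # fraction of D20 faces that meet the target
pSuccess(11)
## [1] 0.5
pSuccess(10)
## [1] 0.55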

Advantage

Sometimes things are really going your way and you roll with advantage. Sometimes things are not in your favor and you roll with disadvantage. When rolling with advantage, roll two dice and pick the higher. When rolling with disadvantage, roll two dice and pick the lower. How do advantage and disadvantage affect the chance of getting a 20 or 1?

When rolling without advantage or disadvantage, the probability of getting a 20 or a 1 is 1/20 = 0.05, or 5%. The probability of not getting a 20 is 1 - 1/20 = 19/20.

P20    <- 0.05
Pnot20 <- 1-P20

When you have advantage, to not get a 20 you have to not roll a 20 twice. The probability of getting a 20 is therefore (1 - the probability of not getting a 20 twice), and as you can see below, it is almost 10%. Advantage about doubles your chance of getting a 20. But you probably guessed that, since you're rolling twice. :) The chance of getting a 1 is 0.05^2 = 0.0025, almost 0!

Pnot20*Pnot20
## [1] 0.9025
1 - Pnot20*Pnot20
## [1] 0.0975

How will advantage and disadvantage affect the average roll? This time let's estimate the answer using a simulation. Ten thousand trials will give a good estimate.

Trials <- 10000

Below we simulate rolling two D20s.

set.seed(1)
x <- sample(1:20,Trials,replace = TRUE)
y <- sample(1:20,Trials,replace = TRUE)

When we have advantage, roll two D20 and pick the max. With disadvantage roll two D20 and pick the min.

RollsWithAdvantage    <- apply(cbind(x,y), 1, max)
Advantage             <- mean(RollsWithAdvantage)
RollsWithDisadvantage <- apply(cbind(x,y), 1, min)
Disadvantage          <- mean(RollsWithDisadvantage)

Below are the simulated advantage mean, the calculated mean of a regular D20 roll, and the simulated disadvantage mean. Advantage adds about 3.4 points and disadvantage subtracts about 3.4 points, or roughly +/- 17% of the die's 20-point range.

Advantage
## [1] 13.8863
mean(1:20)
## [1] 10.5
Disadvantage
## [1] 7.1287
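
We can also check the simulation against the exact means. The sketch below averages over all 400 equally likely pairs of rolls using base R.

rolls <- expand.grid(d1 = 1:20, d2 = 1:20)  # all 400 equally likely pairs of rolls
mean(pmax(rolls$d1, rolls$d2))              # exact mean with advantage
## [1] 13.825
mean(pmin(rolls$d1, rolls$d2))              # exact mean with disadvantage
## [1] 7.175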

Below are three figures with histograms. The first histogram is made with 10,000 simulated D20 advantage rolls. Advantage moves a lot of probability to the right. The second histogram is made using 10,000 simulated regular D20 rolls. That histogram is approximately flat, with each number coming up with probability of about 0.05, which is what we calculated. The last histogram is made with 10,000 simulated D20 disadvantage rolls. Disadvantage moves an equal amount of probability to the left.
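
The histograms themselves aren't reproduced here, but figures like them can be made from the simulated rolls along these lines (the breaks and titles are just one reasonable choice):

hist(RollsWithAdvantage,    breaks = 0:20, freq = FALSE, main = "Advantage")
hist(x,                     breaks = 0:20, freq = FALSE, main = "Regular")
hist(RollsWithDisadvantage, breaks = 0:20, freq = FALSE, main = "Disadvantage")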

The probability of getting 11 or better with advantage is 1 minus the probability of getting 10 or less twice. Rolling with advantage moves a 50% chance to a 75% chance!

p10orLess <- 0.5
1 - p10orLess^2
## [1] 0.75

Saturday, June 20, 2020

Complex Normal Samples In R

Normal Samples

If we want 10 samples from a Gaussian or normal random process with variance 4, we can use rnorm(10, sd=2). Remember, the standard deviation (sd) is the square root of the variance.

set.seed(2)
x <- rnorm(10,sd=2)
x
##  [1] -1.7938291  0.3696984  3.1756907 -2.2607513 -0.1605035  0.2648406
##  [7]  1.4159095 -0.4793960  3.9689479 -0.2775740
var(x)
## [1] 3.880803

The var() function produces an estimate of the variance; if we want a better estimate, we need more samples.

var(rnorm(1000,sd=2))
## [1] 4.105966

Complex Normal Samples

If we are using base R and want complex normal (CN) samples, we need to write our own function. When the signal processing literature refers to CN, it is usually referring to circularly-symmetric CN. Circularly-symmetric means the real and imaginary parts are independent and the mean is 0.

The function below produces N CN samples with variance v. The real and imaginary parts are independent because they are produced by different calls to rnorm(). Let x and y be independent. Then var(a*x) = a^2*var(x) and var(x+y) = var(x) + var(y). So, if we want a total variance of v, each part needs a variance of v/2, which means a standard deviation of sqrt(v/2).

makeCN <- function(N,v=1) {(sqrt(v/2))*rnorm(N) + (sqrt(v/2))*1i*rnorm(N)}
makeCN(10)
##  [1]  0.0023376-0.2079938i  0.7613032+0.6053620i  0.3946671-0.4049715i
##  [4]  0.4892950-0.1207824i  0.4651165-0.2871364i -0.2312504+0.9408834i
##  [7] -0.2153405-0.9648887i -1.0994866+1.0119199i  1.0396552+0.7824796i
## [10]  0.1147878+0.9059002i

If we want to check the variance, we can't use var() directly.

var(makeCN(10))
## Warning in var(makeCN(10)): imaginary parts discarded in coercion
## [1] 0.6039204

But the real and imaginary parts are independent, so we can calculate the variance separately.

z <- makeCN(10)
var(Re(z)) + var(Im(z))
## [1] 0.5623849

To make this easier, we can create a function to find the variance.

varComplex <- function(z) var(Re(z)) + var(Im(z))

To get a good estimate we need a lot of samples.

varComplex(makeCN(1000))
## [1] 1.016615

Let's set the variance to 2 and then estimate the variance of the samples.

varComplex(makeCN(1000,v=2))
## [1] 1.925119

Success!

Tuesday, May 12, 2020

Book Review: Introductory Time Series with R

I'm a big fan of R and time series analysis, so I was excited to read the book "Introductory Time Series with R." I've been using the book for about 9 years, so I thought it was about time for a review! In this review, I'm going to cover the following topics: the amount of R content, the subject content, who the book is for, and my overall recommendation.

R Content
The R content is high. All the ideas in the book are heavily illustrated with R code. At the beginning of the book, the authors point out that they use Sweave to embed the code and plots. They also make use of online data sets, so you can type in examples from the book and reproduce the calculations and figures.

Time Series Content
The book covers all the time series topics you'd want in an introduction, plus a few specialty topics like multivariate models. Each chapter is a solid introduction to a topic in time series analysis.

Who is the Book For
1) People who want to learn time series analysis. The book covers the theory and applications, and has plenty of opportunities for hands-on learning. 2) People who want to learn to do time series analysis using R. That's why I bought the book! I was already familiar with time series analysis, and I bought the book to learn how to do time series work with R. The book definitely delivered on that account. 3) People who want to teach a course on time series analysis. The book has plenty of examples and exercises (I'm not sure if there is a solution manual).

Overall Recommendation
The book is well written. The R code is clear and well presented. The figures are numerous and informative. The book is wonderful and I highly recommend it!

https://www.amazon.com/gp/product/0387886974/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1

Monday, July 29, 2019

Book Review: Advanced R

When I first started using R, my code looked just like MATLAB code. I had read an intro to R programming, done some data analysis using R, and I was using the arrow assignment operator! But otherwise, my R code looked very much like MATLAB. Then I read Advanced R by Hadley Wickham. I found out that, while I wasn't doing anything wrong, I wasn't getting the most out of R.

Advanced R is very readable if you've used R and have some other programming experience. If you're self-taught in R and have no other programming experience, you may find the book a bit tough. The book does a great job describing R's important features and how to make the most of them. I particularly liked the chapter on functions and the section on functional programming.

I highly recommend it!

 https://www.amazon.com/dp/1466586966/ref=cm_sw_su_dp?tag=devtools-20

Update: A commenter pointed out that there is a new version of the book. The link is below.


https://www.amazon.com/Advanced-Second-Chapman-Hall-CRC/dp/0815384572/ref=pd_lpo_sbs_14_t_0?_encoding=UTF8&psc=1&refRID=11FT84CEP3CE0YQ6XYBJ

Wednesday, June 26, 2019

Blackman-Tukey Spectral Estimator in R


There are two definitions of the power spectral density (PSD). The two definitions are mathematically nearly identical, and both define a function that describes the distribution of power over the frequency components in our data set. The periodogram PSD estimator is based on the first definition of the PSD (see the periodogram post). The Blackman-Tukey spectral estimator (BTSE) is based on the second definition, which says: find the PSD by calculating the Fourier transform of the autocorrelation sequence (ACS). In R this definition is written as

PSD <- function(rxx) {
  fft(rxx)
}

where fft is the R implementation of the fast Fourier transform, rxx is the autocorrelation sequence (ACS), the k'th element of the ACS is rxx[k] = E[x[0]x[k]] for k from -infinity to +infinity, and E is the expectation operator. The xx in rxx[k] is a reminder that r is a correlation between x and itself. The rxx[k]s are sometimes called lags. The ACS has the property that rxx[-k] = rxx[k]*, where * is the complex conjugate. In this post we will only use real numbers, so I'll drop the * from here forward.

So, to find the PSD we just calculate rxx and take its fft! Unfortunately, in practice we cannot do this. Calculating the expected value requires the probability density function (PDF) of x, which we don't know, and an infinite amount of data, which we don't have. So, we can't calculate the PSD: we're doomed!

No, we are not doomed. We can't calculate the PSD, but we can estimate it! We can derive an estimator for the PSD from the definition of the PSD. First, we replace rxx with an estimate of rxx. We replace the expected value, which gives the exact rxx, with an average, which gives an estimate of rxx. The E[x[0]x[k]] is replaced with (1/N)(x[1]x[1+k]+x[2]x[2+k]+...+x[N-1-k]x[N-1]+x[N-k]x[N]), where N is the number of data samples. For example, if k=0, then rxx[0] = (1/N)*sum(x*x). In R code the estimate is written as

lagEstimate <-function(x,k,N=length(x)){
   (1/N)*sum(x[1:(N-k)]*x[(k+1):N])
}

If we had an infinite amount of data, N = infinity, we could use lagEstimate to estimate the entire infinite ACS. Unfortunately we don't have an infinite amount of data, and even if we did, it wouldn't fit into a computer. So, we can only estimate a finite number of ACS elements. The function below calculates lags 0 to kMax.

Lags <-function(x,kMax) {
  sapply(0:kMax,lagEstimate,x=x)
}

Before we can try these functions out, we need data. In this case the data came from a random process with the PSD plotted in the figure below. The x-axis is normalized frequency (frequency divided by the sampling rate). So, if the sampling rate was 1000 Hz, you could multiply the normalized frequency by 1000 Hz and the frequency axis would read 0 Hz to 1000 Hz. The y-axis is in dB (10*log10(amplitude)). You can see six large sharp peaks in the plot and a gradual dip towards 0 Hz and then back up. Some of the peaks are close together and will be hard to resolve.

The data produced by the random process is plotted below. This is the data we will use throughout this post. The data is posted here: x.RData

Let's calculate the ACS up to the 5th lag using the data.

Lags(x,kMax=5)
## [1]  6.095786 -1.368336  3.341608  1.738122 -1.737459  3.651765

A kMax of 5 gives us 6 lags: {r[0], r[1], r[2], r[3], r[4], r[5]}. These 6 lags are not an ACS, but are part of an ACS.

We used Lags to estimate the positive lags up to kMax, but the ACS is an even sequence: r[-k] = r[k] for all k. So, let's write a function to make a sequence consisting of lags from r[-kMax] to r[kMax]. This is a windowed ACS; values outside of +/- kMax are replaced with 0. Where it won't cause confusion, I'll refer to the windowed ACS as the ACS.

acsWindowed <- function(x,kMax,Nzero=0){
  rHalf <- c(Lags(x,kMax),rep(0,Nzero))
  c(rev(rHalf[2:length(rHalf)]),rHalf)
}

Let's try this function out.

acsW <- acsWindowed(x,9)

In the figure below you can see the r[0] lag, the maximum, is plotted in the middle of the plot.

The ACS in the figure above is plotted the way the ACS is usually plotted in textbooks. In textbooks, the sum in the Fourier transform ranges from -N/2 to (N-1)/2, so the r[0] lag sits in the center of the plot. In R, the sum in the Fourier transform ranges from 1 to N, so the 0'th lag has to come first. We could just build the sequence in R form, but it is often handy to start in textbook form and switch to R form. We can write a function to make switching from textbook form to R form easy.

Textbook2R <- function(x,N=length(x),foldN=ceiling(N/2)) {
  c(x[foldN:N],x[1:(foldN-1)])
}

Notice in the figure below that the maximum lag, r[0], is plotted at the beginning.

Let's imagine we had an infinite amount of data and used it to estimate an infinite number of ACS lags. Let's call that sequence rAll. We make a windowed ACS by setting rW = rAll*W, where W = 1 for our estimated lags and 0 everywhere else. W is called the rectangular window because, as you can see in the plot below, its plot looks like a rectangle. By default, when we estimate a finite number of lags we are using a rectangular window.

W <- c(rep(0,9),rep(1,9),rep(0,9))

The reason we cannot use a rectangular window is that its Fourier transform is not always positive. As you can see in the plot below, there are several values below zero, indicated by the dotted line. The Re() function removes some small imaginary values due to numerical error, some imaginary dust we have to sweep up.

FFT_W <- Re(fft(Textbook2R(W)))

Even though the fft of the ACS rAll is positive, the fft of the product of rAll and a rectangular window might not be positive! The Bartlett window is a simple window whose fft is positive.

BartlettWindow <- function(N,n=seq(0, (N-1)))  {
  1 - abs( (n-(N-1)/2) / ( (N-1)/2) )
}
Wb <- BartlettWindow(19)

As you can see in the plot below the Fourier transform of the Bartlett window is positive.

WbFft <- Re(fft(Textbook2R(Wb)))

Calculating the BTSE with R

Now that we can estimate the ACS and window our estimate, we are ready to estimate the PSD of our data. The BTSE is written as

Btse <- function(rHat,Wb) {
  Re(fft(rHat*Wb))
}

Note the Re() is correcting for numerical error.

In the first example we use a 19 point ACS lag sequence.

rHat   <- Textbook2R(acsWindowed(x,kMax=9))
Wb     <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse9 <- Btse(rHat,Wb)

The figure below shows the BTSE calculated with a maximum lag of 9. The dotted lines indicate the locations of the peaks in the PSD we are trying to estimate. With a maximum lag of only 9, the BTSE produces a poor estimate.

We calculate a new estimate with a maximum lag of 18.

rHat    <- Textbook2R(acsWindowed(x,kMax=18))
Wb      <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse18 <- Btse(rHat,Wb)

The estimate with a maximum lag of 18 is better, but the peaks around 0.4 and 0.6 are still not resolved. We need to increase the maximum lag further.

Finally we increase the maximum lag to 65 and recalculate the estimate.

rHat    <- Textbook2R(acsWindowed(x,kMax=65))
Wb      <- Textbook2R(BartlettWindow(length(rHat)))
Pbtse65 <- Btse(rHat,Wb)

This final estimate is very good. All six peaks are resolved, and the locations of the estimated peaks are very close to the true peak locations.

Final Thoughts

Could we use 500 lags in the BTSE? In this case we could, since we have a lot of data, but the higher lags are estimated with less data and therefore have more variance. Using the high-variance lags will produce a higher-variance estimate.

Are there other ways to improve the BTSE besides using more lags? Yes! There are a few. For instance, we could zero pad the lags, that is, add zeros to the end of our lag sequence. This makes the fft in the BTSE evaluate the estimate at more frequencies, so we can see more detail in the estimated PSD.
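
As a minimal sketch of one way to do this, reusing the functions above: window the lags first, then pad zeros symmetrically on both ends so r[0] stays centered. The amount of padding (500 zeros per side) is just an illustrative choice.

rT      <- acsWindowed(x, kMax = 65)         # textbook-form lags, -65..65
rTW     <- rT * BartlettWindow(length(rT))   # apply the Bartlett window to the lags
rTWzp   <- c(rep(0, 500), rTW, rep(0, 500))  # pad both ends, keeping r[0] in the center
PbtseZp <- Re(fft(Textbook2R(rTWzp)))        # BTSE evaluated on a denser frequency grid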

Also, keep in mind there are other PSD estimation methods that do better on other PSD features. For instance, if you were more interested in finding deep nulls rather than peaks, a moving average PSD estimator would be a better choice.

Thursday, June 13, 2019

Periodogram with R

The power spectral density (PSD) is a function that describes the distribution of power over the frequency components composing our data set. If we knew the process that generated the data, we could just calculate the PSD; we would not have to estimate it. Unfortunately, in practice we won't have access to the random process, only the samples (data) produced by the process. So, we can't get the true PSD, we can only get an estimate of the true PSD. In our example below, we made the data, so we know what the true PSD should look like.
The PSD is written as
PSD(f) = lim (N -> infinity) E[ (1/N)|sum over n of x[n]e^(-i 2 pi f n)|^2 ]
where -1/2 <= f < 1/2, x is the data, e^(-i 2 pi f n) is the complex exponential, and the sum of the data x multiplied by the complex exponential is the Fourier transform. In English: calculate the PSD by finding the expected value of the magnitude squared of the Fourier transform as the amount of data increases to infinity. Well, that's a mouthful! The frequency f is expressed in normalized frequency. Normalized frequency is the frequency divided by the sampling rate, and it is often used for simplicity. To convert to frequency in Hz, multiply by the sampling rate.
The periodogram estimator is based on the definition in the equation above. The definition has a limit and an expected value. To use the definition directly we would need an infinite amount of data, which we don't have, and we would need to know the probability density function (PDF) of the data, which we don't know. To turn the definition into an estimator of the PSD, we drop the limit and drop the expectation operator (E). We also can only evaluate a finite number of frequencies. After dropping the limit and the expectation, we are left with the periodogram
Per(f) = (1/N)|sum over n of x[n]e^(-i 2 pi f n)|^2
where 0 <= f < 1 and N is the number of data samples. Notice the range of f changes from (-1/2, 1/2) in the first equation to (0, 1) in the second. Textbooks usually use the former and R uses the latter.
The periodogram is very easy to implement in R, but before we do, we need to simulate some data. The code below first uses the set.seed command so R will produce the same "random" numbers each time. Then it creates 32 normally distributed numbers and 32 points of a sine wave with a normalized frequency of 0.4 and an amplitude of 2. The data is the sine wave and the random points added together.
set.seed(0)
N  <- 32
n  <- 0:(N-1)
w  <- rnorm(N)               # 32 normally distributed points
f1 <- 0.4                    # normalized frequency of the sine wave
A1 <- 2                      # amplitude of the sine wave
s  <- A1*sin(2*pi*f1*n)
x  <- s + w                  # data: sine wave plus noise
The figure below is a plot of the data generated above. To me it looks random and it is not obvious there is a sine wave in there.
Now we are going to implement the periodogram in R. It is pretty much a direct translation of the math: multiply 1/N by the magnitude squared of the Fourier transform of the data. Writing that in English was more work than implementing the periodogram in R!
xPer <- (1/N)*abs(fft(x)^2)
f    <- seq(0,1.0-1/N,by=1/N)
The figure below is a plot of the periodogram of the data. The dotted line marks the location of f1, the frequency of the sine wave in the data. Now the sine wave component really stands out! Notice two things. First, the peak of the periodogram seems to be a bit off the true value. Second, the plot looks jagged. Both problems have the same cause: the periodogram is only evaluated at 32 frequency bins. (The frequencies we evaluate the periodogram on are called bins.) We can fix both problems by evaluating the periodogram at more bins. One way to evaluate the periodogram at more points is to get more data! That would certainly fix both problems. Unfortunately, we are often stuck with the data we have. There is another way.
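The periodogram figure isn't reproduced here, but a plot like it can be made along these lines (the axis labels and the dotted line at f1 are just one reasonable choice):
plot(f, xPer, type = "l", xlab = "normalized frequency", ylab = "periodogram")
abline(v = f1, lty = 3)  # dotted line at the sine wave's frequency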
Since f is a continuous variable, we can evaluate the periodogram at as many points as we want. The way we do this is to pretend we have more data by sticking zeros at the end of our data: we zero pad it. The R code below adds 968 zeros to the end of x, zero-padding it to a total of 1000 "data" points. As you can see in the figure below, this fixes both problems. The 32-point periodogram missed the peak because the periodogram was not evaluated at enough points. Zero-padding does not guarantee we hit the peak, but it gets us closer. It also fixes the jaggedness by evaluating the periodogram at many more bins.
xzp <- c(x,rep(0,1000-N))
Nzp <- length(xzp)
xPerZp <- (1/N)*abs(fft(xzp)^2)
fzp    <- seq(0,1.0-1/Nzp,by=1/Nzp)

Monday, June 3, 2019

JavaScript is Fast?


OK, I admit it. I'm always looking for the one true programming language. Mostly I use programming languages for signal processing, data science, that kind of thing. So, for me, a language should be good for prototyping, flexible, fast, and have lots of math/statistics libraries. The first language I used for signal processing was C; not much to say about that. I then became a heavy MATLAB user, which is great for a lot of things. I used MathCad. It was great for making readable code! I even used Mathematica, but I never got the hang of its syntax; it always felt awkward. Lately, I've been doing most of my stuff in R.

So, I found myself at the Julia site; at the time it was the next new mathematical programming language. They have a benchmark page showing how fast Julia is, and it is fast! On that page they show JavaScript as faster than a bunch of standard mathematically oriented languages. I would not have guessed that. Maybe JavaScript should be the shiny "new" programming language for math!