sample with dbinom beats rbinom in generating random numbers from a binomial distribution
A few years back, in a Stochastic Processes paper, we were examining
Gambler's Ruin or some such, which required generating what was
essentially a Bernoulli random variable (in truth, we needed to generate
-1 or 1 with equal probability). Now I remember there being a
discussion on an efficient way to generate such numbers in R, and I
suggested using rbinom(n, 1, 0.5) * 2 - 1, which doesn't
seem too bad, right? rbinom generates 0's and 1's (and
surely rbinom must be quite efficient?), and we can
transform these to -1's and 1's with * 2 - 1.
Armed with my current R knowledge, however, I wondered if there was
a faster (perhaps even easier) way. The results may (or may not)
surprise you. First, we could use rbinom as I suggested all
those years ago:
> rbinom(100, 1, 0.5) * 2 - 1
[1] -1 1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 1 1 1 1 1 1 -1
[26] 1 1 -1 1 1 1 -1 -1 1 -1 -1 -1 -1 -1 1 1 -1 -1 1 -1 1 1 1 1 -1
[51] 1 -1 1 1 -1 1 1 -1 1 -1 1 1 -1 1 1 -1 1 -1 1 1 -1 -1 -1 -1 -1
[76] -1 1 1 1 1 -1 1 1 -1 1 1 1 -1 -1 1 1 -1 1 1 1 -1 -1 -1 1 -1
An alternative is to use sample.
> sample(c(-1, 1), 100, TRUE) # TRUE for sampling with replacement
[1] 1 -1 1 1 -1 -1 1 1 1 -1 1 1 1 -1 -1 1 -1 -1 1 -1 1 -1 -1 1 1
[26] -1 -1 1 1 1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 1
[51] 1 -1 -1 1 1 1 1 -1 -1 1 -1 -1 1 -1 -1 -1 1 -1 1 -1 1 1 1 1 1
[76] 1 -1 -1 1 -1 -1 -1 1 -1 1 1 1 -1 -1 1 1 1 -1 -1 1 1 1 -1 1 1
So how do these stack up against each other?
> system.time(rbinom(10^8, 1, 0.5) * 2 - 1)
user system elapsed
10.11 0.38 10.49
> system.time(sample(c(-1, 1), 10^8, TRUE))
user system elapsed
2.86 0.07 2.94
For reference:
> system.time(rbinom(10^8, 1, 0.5))
user system elapsed
9.75 0.38 10.13
> system.time(sample(c(0, 1), 10^8, TRUE))
user system elapsed
2.73 0.19 2.92
So sample beats rbinom quite
resoundingly, even for generating a binomial with n = 1 and p = 0.5.
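As a quick sanity check that the two approaches really target the same distribution, we can tabulate a large draw from each (a sketch; the seed is arbitrary, purely for reproducibility):

```r
set.seed(42)                       # arbitrary seed, just for reproducibility
x <- rbinom(10^5, 1, 0.5) * 2 - 1  # the old rbinom approach
y <- sample(c(-1, 1), 10^5, TRUE)  # the sample approach
prop.table(table(x))               # both tables should be close to 0.5 / 0.5
prop.table(table(y))
```

With 10^5 draws, both proportions should sit within sampling error of 0.5.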
Now here's where it really gets interesting: what if it's no longer
a simple Bernoulli and we try a binomial with, say, n = 2 and p = 0.5?
> system.time(rbinom(10^7, 2, 0.5))
user system elapsed
0.95 0.00 1.00
We can specify the binomial probabilities to sample to
generate the same results:
> system.time(sample(0:2, 10^7, TRUE, dbinom(0:2, 2, 0.5)))
user system elapsed
0.34 0.01 0.36
So sample beats rbinom even for larger n!
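It's worth verifying that the sample call with dbinom weights really does reproduce the Binomial(2, 0.5) distribution, whose exact probabilities are 0.25, 0.50, 0.25. A quick empirical check (seed arbitrary, for reproducibility):

```r
set.seed(1)                                        # arbitrary seed
z <- sample(0:2, 10^5, TRUE, dbinom(0:2, 2, 0.5))  # the sample + dbinom trick
prop.table(table(z))                               # should be near 0.25 0.50 0.25
dbinom(0:2, 2, 0.5)                                # the exact probabilities
```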
Now the question is: how long does this relationship hold? Behold, a function to test it:
sampletestfunc = function(startn = 2, maxn = 30, pr = 0.5){
    ## A function to test if sample is better than rbinom
    samplebetter = TRUE
    n = startn
    while(samplebetter && n < maxn){
        binomt = system.time(rbinom(10^6, n, pr))[3]
        samplet = system.time(sample(0:n, 10^6, TRUE, dbinom(0:n, n, pr)))[3]
        samplebetter = binomt > samplet
        cat("n =", n, samplebetter, binomt, "vs", samplet, "\n")
        n = n + 1
    }
}
The results of this seem to suggest that sample
beats rbinom resoundingly until we reach a point where
n is so large that the probabilities get a little small and
the sample function returns a warning.
sampletestfunc()
sampletestfunc(pr = 0.2)
sampletestfunc(pr = 0.7)
sampletestfunc(100, 110)
sampletestfunc(1000, 1010)
sampletestfunc(10000, 10010) ## Warning produced by sample
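To see what "the probabilities get a little small" means concretely, note that by n = 10000 the binomial weights in the extreme tails underflow to exactly zero in double precision (dbinom(0, 10000, 0.5) is 2^-10000, far below the smallest representable double). Whether this underflow is what actually triggers sample's warning is an assumption on my part; sample also switches algorithms for long weight vectors, which is another plausible source. A quick look at the weights:

```r
p <- dbinom(0:10000, 10000, 0.5)  # the weight vector passed to sample
range(p)                          # smallest entries are exactly 0 (underflow)
sum(p == 0)                       # count how many weights have underflowed
```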
The more you know...