-
R Programming (7) - Discrete random variableR Programming 2020. 3. 28. 04:55728x90
Discrete random variable Discrete random variable
Discrete random variable in R
R has built-in functions for handling the most commonly encountered probability distributions.
Let \(X\) be of typedist
with parametersp1
,p2
, …, thenddist(x, p1, p2, ...)
equals \(P(X=x)\) for \(X\) discrete, or the density of \(X\) at \(x\) for \(X\) continuous.pdist(q, p1, p2, ...)
equals \(P(X \leq q)\)qdist(p, p1, p2, ...)
equals the smallest \(q\) for which \(P(X \leq q) \geq p\) (the 100\(p\)%-point)rdist(n, p1, p2, ...)
is a vector ofn
pseudo-random numbers from distribution type dist.
The input
x
,p
,q
can all be vector valued, in which case the output is vector valued.Some of the discrete distributions
Some of the discrete distributions provided by R, together with the names of their parameter inputs.
Distribution R name (dist) Paramter names Binomial binom
size, prob
Geometric geom
prob
Negative binomial nbinom
size, prob
Poisson pois
lambda
\(P(X \leq 5)\) for \(X\) which follows binomial distribution with
size=10
,prob =0.5
.pbinom(5, size = 10, prob = 0.5)
## [1] 0.6230469
Bernoulli distribution
A Bernoulli random variable \(B\) is based on a single trial
takes on the value 1 if the trial is a success
0 otherwise.
We use notation \(B\) ~ \(Bernoulli(p)\).
\(P(B=x) = p\) for \(x = 1\) and \(P(B=x) = 1-p\) for \(x = 0\).
\(E[B] = p\).
\(Var(B) = p(1-p)\).
Binomial distribution
Let \(X\) be the number of success in n independent trials, with probability of success \(p\).
Then \(X\) is said to have a binomial distribution with parameters \(n\) and \(p\).
Let \(B_1, ..., B_n\) be independent \(Bernoulli(p)\) r.v.s, then
- \(X = B_1 + ... + B_n\) ~ \(binom(n,p)\)
For \(x = 0, 1, ..., n,\) we have
The Bernoulli distribution is the same as a \(binom(1,p)\).
Binomial probability mass function
With different paramter settings, observe the probability mass functions.
par(mfrow=c(2, 2)) x <- seq(0, 20, 1) ps <- c(0.5, 0.5, 0.2, 0.8) trials <- c(10, 20, 10, 10) for (i in seq(1,4)){ y <- dbinom(x, trials[i], ps[i]) plot(x,y,xlim=c(0, trials[i]), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="h") points(x, y) title(paste0("binomial(", trials[i], "," , ps[i], ")")) }
Binomial probability distribution function
par(mfrow=c(2, 2)) x <- seq(0, 20, 1) ps <- c(0.5, 0.5, 0.2, 0.8) trials <- c(10, 20, 10, 10) for (i in seq(1,4)){ y <- pbinom(x, trials[i], ps[i]) plot(x, y, xlim=c(0, trials[i]), ylim=c(0,1), xlab="x", ylab="P(X<=x)", type="s") title( paste("binomial(", trials[i], ",", ps[i], ")") ) }
Generate random variables
- Generates \(N\) binomial random variables using
rbinom
.
N <- 100; n <- 20; p <- 0.4 (brv <- rbinom(N, n, p) )
## [1] 9 10 8 12 6 7 10 8 9 7 8 8 6 6 10 5 9 9 6 10 8 9 8 ## [24] 9 7 10 5 4 9 9 8 7 9 9 2 11 8 7 9 5 10 12 8 6 9 7 ## [47] 5 6 12 10 8 15 8 7 12 9 10 12 5 3 6 9 9 12 7 4 8 8 11 ## [70] 7 8 7 9 8 6 5 8 8 6 6 8 13 10 7 7 4 9 10 8 10 9 14 ## [93] 9 10 8 8 8 9 9 4
table
counts the number of values for each observation.
table(brv)
## brv ## 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ## 1 1 4 6 10 12 22 21 12 2 6 1 1 1
- Using
factor
, you can specifylevels
.
table(factor(brv, levels=0:20))
## ## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ## 0 0 1 1 4 6 10 12 22 21 12 2 6 1 1 1 0 0 0 0 0
Comparison between r.v. and pmf
- If \(N\) is large, then the relative frequency of the random variables is similar to the pmf.
N <- 500; n <- 20; p <- 0.4 frequency <- table(factor(rbinom(N, n, p), levels = 0:n)) relative_frequency <- as.numeric(frequency/N) x <- 0:n pmf <- dbinom(x, n, p) par(mfrow=c(1,2)) plot(x, relative_frequency, 'h') plot(x, pmf, 'h')
Example
- What is the probability that a thousand randomly selected items will have less than 20 failures when the failure rate is 0.01?
pbinom(19, size=1000, prob=0.01)
## [1] 0.9967116
Geometric distribution
Let \(B_1, B_2, ...\) be an infinite sequence of independent \(Bernoulli(p)\) random variables
and let \(Y\) be such that \(B_1 = ... = B_Y = 0\) and \(B_{Y+1} = 1\)
Then \(Y\) is said to have a geometric distribution with parameter \(p\).
\(Y\) is the number of trials up to (but not including) the first success.
We write \(Y\) ~ geom\((p)\), and we have,
- \(P(Y=y) = p(1-p)^y\)
- \(E[Y] = (1-p)/p\)
- \(Var(Y) = (1-p)/p^2\)
pmf of geometric distribution
par(mfrow=c(2,2)) x <- seq(0,20,1) ps <- seq(0.2,0.8,0.2) for (p in ps){ y <- dgeom(x, p) plot(x,y,xlim=c(0,20),ylim=c(0,1),xlab="x", ylab="P(X=x)",type="h") points(x,y) title(paste("geometric(",p,")")) }
cdf of geometric distribution
Comparison with simulation
N <- 500; p <- 0.4; xmax <- 20 x <- 0:xmax; pmf <- dgeom(x, p) frequency <- table(factor(rgeom(N, p), levels=x)) # generated by rgeom function relative_frequency <- as.numeric(frequency/N) par(mfrow=c(1,2)) plot(x, relative_frequency, 'h'); plot(x, pmf, 'h')
Comparison with simulation based on Bernoulli trial
N <- 500; p <- 0.4; xmax <- 20; # generate random values from rbinom geom_rv <- numeric(N) # preallocation, zero vector with length N for (i in 1:N){ # N is sample size n <- 0 while(TRUE){ trial <- rbinom(1, 1, 0.4) # Bernoulli trial if (trial == 1) break # if trial is success, stop while n <- n + 1 # if trial is failure, increase number of trials } geom_rv[i] <- n } # for probability mass function x <- 0:xmax; pmf <- dgeom(x, p) frequency <- table(factor(geom_rv, levels = x)) relative_frequency <- as.numeric(frequency/N) par(mfrow=c(1,2)) plot(x, relative_frequency, 'h'); plot(x, pmf, 'h')
Negative binomial distribution
Let \(Z\) be the number of failures before the \(r\)-th success, in a sequence of iid \(Bernoulli(p)\) trials.
Then \(Z\) is said to have a negative binomial distribution and we write \(Z \sim nbinomial(r,p)\).
- Let \(Y_1, ..., Y_r\) be i.i.d. geom\((p)\) r.v.s, then
- \(Z = Y_1 + ... + Y_r\) ~ nbinom\((r,p)\)
- It follows
- \(E[Z] = r(1-p)/p\)
- \(Var(X) = r(1-p)/p^2\)
Negative binomial probability mass function
par(mfrow=c(2,2)) x <- seq(0,20,1) sizes = c(2, 3, 10, 10) probs = c(0.5, 0.5, 0.5, 0.8) for (i in seq(1,4)){ y<- dnbinom(x, sizes[i], probs[i]) plot(x, y, xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="h") points(x, y) title(paste("neg-binomial(", sizes[i], ",", probs[i], ")")) }
Negative binomial distribution function
par(mfrow=c(2,2)) x <- seq(0,20,1) sizes = c(2, 3, 10, 10) probs = c(0.5, 0.5, 0.5, 0.8) for (i in seq(1,4)){ y<- pnbinom(x, sizes[i], probs[i]) plot(x, y, xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X<=x)", type="s") title(paste("neg-binomial(", sizes[i], ",", probs[i], ")")) }
Comparison with simulation based on Bernoulli trial
In the following code, we use the total number of successes and total number of trials.
Rewrite the code using the total number of successes and total number of failures.
N <- 500; r <- 3; p <- 0.4; xmax <- 20; nbinom_rv <- numeric(N) # preallocation for (i in 1:N){ total_trial <- total_success <- 0 while(TRUE){ total_success <- total_success + rbinom(1, 1, p) total_trial <- total_trial + 1 if (total_success == r) break } nbinom_rv[i] <- total_trial - total_success } x <- 0:xmax; pmf <- dnbinom(x, r, p) relative_frequency <- as.numeric(table(factor(nbinom_rv, levels=x))/N) par(mfrow=c(1,2)) plot(x, relative_frequency, 'h'); plot(x, pmf, 'h')
Example: quality control
A manufacturer tests the production quality of its product by randomly selecting 100 from each batch.
If there are more than two faulty items, then they stop the production.
The probability of fault : \(p\)
Check the probability of stopping when we check 100 items.
- Let \(Z\) be the number of items we check before we find three faults, then \(Z \sim nbinomial(3,p)\) and
- \(P(\)stopping production\() = P(Z+3 \leq 100)\)
pnbinom(97, size = 3, prob = 0.01)
## [1] 0.0793732
This value also can be represented by the binomial distribution:
- The probability of three or more faulty items from 100 items.
1-pbinom(2, size = 100, prob = 0.01)
## [1] 0.0793732
If \(Z\) follows \(nbinomial(r,p)\), then \(P(Z < z) = 1 - P(X \leq r - 1)\) where \(X\) follows \(binom(z + r, p)\)
Poisson distribution
The Poisson distribution is used as a model for events occurring at random over time or space.
number of accidents in a year
number of misprints on a page
number of gamma particles released in a second
number of phone calls at an exchange in an hour
number of companies going bankrupt in a year
We say \(X\) has a Poisson distribution with parameter \(\lambda\), and write \(X \sim\) pois\((\lambda)\).
\(E[X] = \lambda\)
\(Var(X) = \lambda\)
Poisson distribution in R
dpois(x, lambda, log = FALSE)
- probability mass function for the Poisson such that \(P(X=x)\)
x
can be a vector
lambda
: (non-negative) mean, can be a vector
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
- Poisson distribution function such that \(P(X \leq q)\)
qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)
- quantile function
rpois(n, lambda)
- random generation for the Poisson distribution
Poisson probability mass function
par(mfrow=c(2, 2)) x <- seq(0, 20, 1) lambdas = c(0.1, 1, 5, 10) for (lambda in lambdas){ y <- dpois(x, lambda) plot(x,y,xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="h") points(x, y) title(paste("Poisson(", lambda, ")")) }
Poisson probability distribution function
par(mfrow=c(2, 2)) x <- seq(0, 20, 1) lambdas = c(0.1, 1, 5, 10) for (lambda in lambdas){ y <- ppois(x, lambda) plot(x,y,xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="s") title(paste("Poisson(", lambda, ")")) }
Comparison with simulation
N <- 500; lambda <- 10; xmax <- 20 x <- 0:xmax pmf <- dpois(x, lambda) frequency <- table(factor(rpois(N, lambda), levels=x)) relative_frequency <- as.numeric(frequency/N) par(mfrow=c(1,2)) plot(x, relative_frequency, ylab = "relative frequency", 'h') plot(x, pmf, 'h')
Poisson approximation to binomial
When \(n\) is large and \(p\) is small, \(binom(n,p)\) is approximated by \(pois(np)\).
n <- 200; p <- 0.03; x<-0:20 par(mfrow=c(1,2)) plot(x, dbinom(x, n, p), 'h') plot(x, dpois(x, n*p), 'h')
728x90'R Programming' 카테고리의 다른 글
R Programming (9) - More on simulation (0) 2020.03.28 R Programming (8) - Continuous random variable (0) 2020.03.28 R Programming (6) - Numerical integration (0) 2020.03.28 R Programming (5) - Sophisticated data structures (0) 2020.03.28 R Programming (4) - Function (0) 2020.03.28