R Programming (7) - Discrete random variable

R Programming 2020. 3. 28. 04:55

728x90

Discrete random variable

Discrete random variable in R

R has built-in functions for handling the most commonly encountered probability distributions.
Let $X$ be of type dist with parameters p1, p2, …, then

ddist(x, p1, p2, ...) equals $P(X=x)$ for $X$ discrete, or the density of $X$ at $x$ for $X$ continuous.
pdist(q, p1, p2, ...) equals $P(X \leq q)$
qdist(p, p1, p2, ...) equals the smallest $q$ for which $P(X \leq q) \geq p$ (the 100 $p$ %-point)
rdist(n, p1, p2, ...) is a vector of n pseudo-random numbers from distribution type dist.

The input x, p, q can all be vector valued, in which case the output is vector valued.

Some of the discrete distributions

Some of the discrete distributions provided by R, together with the names of their parameter inputs.

Distribution	R name (dist)	Paramter names
Binomial	`binom`	`size, prob`
Geometric	`geom`	`prob`
Negative binomial	`nbinom`	`size, prob`
Poisson	`pois`	`lambda`

$P(X \leq 5)$ for $X$ which follows binomial distribution with size=10, prob =0.5.

pbinom(5, size = 10, prob = 0.5)

## [1] 0.6230469

Bernoulli distribution

A Bernoulli random variable $B$ is based on a single trial

takes on the value 1 if the trial is a success
0 otherwise.

We use notation $B$ ~ $Bernoulli(p)$ .

$P(B=x) = p$ for $x = 1$ and $P(B=x) = 1-p$ for $x = 0$ .
$E[B] = p$ .
$Var(B) = p(1-p)$ .

Binomial distribution

Let $X$ be the number of success in n independent trials, with probability of success $p$ .

Then $X$ is said to have a binomial distribution with parameters $n$ and $p$ .

Let $B_1, ..., B_n$ be independent $Bernoulli(p)$ r.v.s, then

$X = B_1 + ... + B_n$ ~ $binom(n,p)$

For $x = 0, 1, ..., n,$ we have

The Bernoulli distribution is the same as a $binom(1,p)$ .

Related functions in R

R online document for Binomial

probability mass function for the binomial such that $P(X=x)$

Usage

dbinom(x, size, prob, log=FALSE)

x can be a vector

Example

dbinom(c(0,1,2,3,4,5), size=5, prob=0.4)

## [1] 0.07776 0.25920 0.34560 0.23040 0.07680 0.01024

probability distribution function for the binomial such that $P(X \leq q)$

Usage

pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)

q can be a vector

Example

pbinom(c(0,1,2,3,4,5), 5, 0.4)

## [1] 0.07776 0.33696 0.68256 0.91296 0.98976 1.00000

quantile function

Usage

qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)

random number generator

Usage

rbinom(n, size, prob)

n observations of of the binomial random variable
- size : number of trial
- prob : probability of success on each trial

Example

Generate 5 samples from a Binomial distribution $binom(10, 0.4)$ .

rbinom(5, size = 10, prob = 0.4)

## [1] 3 5 4 3 4

We can generate a Bernoulli random variable using rbinom function.

# repeat this statement
rbinom(1, size = 1, prob = 0.4)

## [1] 1

The sum of Bernoulli random variable follows a Binomial distribution.

# repeat this statement
sum(rbinom(10, size = 1, prob = 0.4))

## [1] 5

# the same distribution with above
rbinom(1, size = 10, prob = 0.4)

## [1] 5

Binomial probability mass function

With different paramter settings, observe the probability mass functions.

par(mfrow=c(2, 2))
x <- seq(0, 20, 1)

ps <- c(0.5, 0.5, 0.2, 0.8)
trials <- c(10, 20, 10, 10)

for (i in seq(1,4)){
   y <- dbinom(x, trials[i], ps[i])
   plot(x,y,xlim=c(0, trials[i]), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="h")
   points(x, y)
   title(paste0("binomial(", trials[i], "," , ps[i], ")"))
}

Binomial probability distribution function

par(mfrow=c(2, 2))
x <- seq(0, 20, 1)

ps <- c(0.5, 0.5, 0.2, 0.8)
trials <- c(10, 20, 10, 10)

for (i in seq(1,4)){
   y <- pbinom(x, trials[i], ps[i])
   plot(x, y, xlim=c(0, trials[i]), ylim=c(0,1), xlab="x",
                           ylab="P(X<=x)", type="s")
   title( paste("binomial(", trials[i], ",", ps[i], ")") )
}

Generate random variables

Generates $N$ binomial random variables using rbinom.

N <- 100; n <- 20; p <- 0.4
(brv <- rbinom(N, n, p) )

##   [1]  9 10  8 12  6  7 10  8  9  7  8  8  6  6 10  5  9  9  6 10  8  9  8
##  [24]  9  7 10  5  4  9  9  8  7  9  9  2 11  8  7  9  5 10 12  8  6  9  7
##  [47]  5  6 12 10  8 15  8  7 12  9 10 12  5  3  6  9  9 12  7  4  8  8 11
##  [70]  7  8  7  9  8  6  5  8  8  6  6  8 13 10  7  7  4  9 10  8 10  9 14
##  [93]  9 10  8  8  8  9  9  4

table counts the number of values for each observation.

table(brv)

## brv
##  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
##  1  1  4  6 10 12 22 21 12  2  6  1  1  1

Using factor, you can specify levels.

table(factor(brv, levels=0:20))

## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  0  0  1  1  4  6 10 12 22 21 12  2  6  1  1  1  0  0  0  0  0

Comparison between r.v. and pmf

If $N$ is large, then the relative frequency of the random variables is similar to the pmf.

N <- 500; n <- 20; p <- 0.4

frequency <- table(factor(rbinom(N, n, p), levels = 0:n))
relative_frequency <- as.numeric(frequency/N)

x <- 0:n
pmf <- dbinom(x, n, p)

par(mfrow=c(1,2))
plot(x, relative_frequency, 'h')
plot(x, pmf, 'h')

Example

What is the probability that a thousand randomly selected items will have less than 20 failures when the failure rate is 0.01?

pbinom(19, size=1000, prob=0.01)

## [1] 0.9967116

Geometric distribution

Let $B_1, B_2, ...$ be an infinite sequence of independent $Bernoulli(p)$ random variables
and let $Y$ be such that $B_1 = ... = B_Y = 0$ and $B_{Y+1} = 1$
Then $Y$ is said to have a geometric distribution with parameter $p$ .
$Y$ is the number of trials up to (but not including) the first success.
We write $Y$ ~ geom $(p)$ , and we have,
- $P(Y=y) = p(1-p)^y$
- $E[Y] = (1-p)/p$
- $Var(Y) = (1-p)/p^2$

pmf of geometric distribution

par(mfrow=c(2,2))
x <- seq(0,20,1)
ps <- seq(0.2,0.8,0.2)
for (p in ps){
    y <- dgeom(x, p)
    plot(x,y,xlim=c(0,20),ylim=c(0,1),xlab="x",     ylab="P(X=x)",type="h")
    points(x,y)
    title(paste("geometric(",p,")"))
}

cdf of geometric distribution

Comparison with simulation

N <- 500; p <- 0.4; xmax <- 20

x <- 0:xmax; pmf <- dgeom(x, p)

frequency <- table(factor(rgeom(N, p), levels=x))   # generated by rgeom function
relative_frequency <- as.numeric(frequency/N)

par(mfrow=c(1,2))
plot(x, relative_frequency, 'h'); plot(x, pmf, 'h')

Comparison with simulation based on Bernoulli trial

N <- 500; p <- 0.4; xmax <- 20;

# generate random values from rbinom
geom_rv <- numeric(N)  # preallocation, zero vector with length N
for (i in 1:N){        # N is sample size
  n <- 0
  while(TRUE){
    trial <- rbinom(1, 1, 0.4)  # Bernoulli trial
    if (trial == 1) break   # if trial is success, stop while
    n <- n + 1              # if trial is failure, increase number of trials
  }
  geom_rv[i] <- n
}

# for probability mass function
x <- 0:xmax;  pmf <- dgeom(x, p)
frequency <- table(factor(geom_rv, levels = x))
relative_frequency <- as.numeric(frequency/N)

par(mfrow=c(1,2))
plot(x, relative_frequency, 'h');   plot(x, pmf, 'h')

Negative binomial distribution

Let $Z$ be the number of failures before the $r$ -th success, in a sequence of iid $Bernoulli(p)$ trials.
Then $Z$ is said to have a negative binomial distribution and we write $Z \sim nbinomial(r,p)$ .
Let $Y_1, ..., Y_r$ be i.i.d. geom $(p)$ r.v.s, then
- $Z = Y_1 + ... + Y_r$ ~ nbinom $(r,p)$
It follows
- $E[Z] = r(1-p)/p$
- $Var(X) = r(1-p)/p^2$

Negative binomial probability mass function

par(mfrow=c(2,2))
x <- seq(0,20,1)
sizes = c(2, 3, 10, 10)
probs = c(0.5, 0.5, 0.5, 0.8)

for (i in seq(1,4)){
  y<- dnbinom(x, sizes[i], probs[i])
    plot(x, y, xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="h")
    points(x, y)
    title(paste("neg-binomial(", sizes[i], ",", probs[i], ")"))
}

Negative binomial distribution function

par(mfrow=c(2,2))
x <- seq(0,20,1)
sizes = c(2, 3, 10, 10)
probs = c(0.5, 0.5, 0.5, 0.8)

for (i in seq(1,4)){
  y<- pnbinom(x, sizes[i], probs[i])
    plot(x, y, xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X<=x)", type="s")
    title(paste("neg-binomial(", sizes[i], ",", probs[i], ")"))
}

Comparison with simulation based on Bernoulli trial

In the following code, we use the total number of successes and total number of trials.

Rewrite the code using the total number of successes and total number of failures.

N <- 500; r <- 3; p <- 0.4; xmax <- 20;

nbinom_rv <- numeric(N)    # preallocation
for (i in 1:N){
  total_trial <- total_success <- 0
  while(TRUE){
    total_success <- total_success + rbinom(1, 1, p)
    total_trial <- total_trial + 1
    if (total_success == r) break
  }
  nbinom_rv[i] <- total_trial - total_success
}

x <- 0:xmax;   pmf <- dnbinom(x, r, p)
relative_frequency <- as.numeric(table(factor(nbinom_rv, levels=x))/N)

par(mfrow=c(1,2))
plot(x, relative_frequency, 'h'); plot(x, pmf, 'h')

Example: quality control

A manufacturer tests the production quality of its product by randomly selecting 100 from each batch.
If there are more than two faulty items, then they stop the production.
The probability of fault : $p$
Check the probability of stopping when we check 100 items.
Let $Z$ be the number of items we check before we find three faults, then $Z \sim nbinomial(3,p)$ and
- $P($ stopping production $) = P(Z+3 \leq 100)$

pnbinom(97, size = 3, prob = 0.01)

## [1] 0.0793732

This value also can be represented by the binomial distribution:

The probability of three or more faulty items from 100 items.

1-pbinom(2, size = 100, prob = 0.01)

## [1] 0.0793732

If $Z$ follows $nbinomial(r,p)$ , then $P(Z < z) = 1 - P(X \leq r - 1)$ where $X$ follows $binom(z + r, p)$

Poisson distribution

The Poisson distribution is used as a model for events occurring at random over time or space.

number of accidents in a year
number of misprints on a page
number of gamma particles released in a second
number of phone calls at an exchange in an hour
number of companies going bankrupt in a year

We say $X$ has a Poisson distribution with parameter $\lambda$ , and write $X \sim$ pois $(\lambda)$ .

$E[X] = \lambda$
$Var(X) = \lambda$

Poisson distribution in R

dpois(x, lambda, log = FALSE)

probability mass function for the Poisson such that $P(X=x)$
- x can be a vector
lambda : (non-negative) mean, can be a vector

ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)

Poisson distribution function such that $P(X \leq q)$

qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)

quantile function

rpois(n, lambda)

random generation for the Poisson distribution

Poisson probability mass function

par(mfrow=c(2, 2))
x <- seq(0, 20, 1)
lambdas = c(0.1, 1, 5, 10)

for (lambda in lambdas){
    y <- dpois(x, lambda)
    plot(x,y,xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="h")
    points(x, y)
    title(paste("Poisson(", lambda, ")"))
}

Poisson probability distribution function

par(mfrow=c(2, 2))
x <- seq(0, 20, 1)
lambdas = c(0.1, 1, 5, 10)

for (lambda in lambdas){
    y <- ppois(x, lambda)
    plot(x,y,xlim=c(0,20), ylim=c(0,1), xlab="x", ylab="P(X=x)", type="s")
    title(paste("Poisson(", lambda, ")"))
}

Comparison with simulation

N <- 500; lambda <- 10; xmax <- 20

x <- 0:xmax
pmf <- dpois(x, lambda)

frequency <- table(factor(rpois(N, lambda), levels=x))
relative_frequency <- as.numeric(frequency/N)

par(mfrow=c(1,2))
plot(x, relative_frequency, ylab = "relative frequency", 'h')
plot(x, pmf, 'h')

Poisson approximation to binomial

When $n$ is large and $p$ is small, $binom(n,p)$ is approximated by $pois(np)$ .

n <- 200; p <- 0.03; x<-0:20
par(mfrow=c(1,2))
plot(x, dbinom(x, n, p), 'h')
plot(x, dpois(x, n*p), 'h')

728x90

'R Programming' 카테고리의 다른 글

R Programming (9) - More on simulation (0)	2020.03.28
R Programming (8) - Continuous random variable (0)	2020.03.28
R Programming (6) - Numerical integration (0)	2020.03.28
R Programming (5) - Sophisticated data structures (0)	2020.03.28
R Programming (4) - Function (0)	2020.03.28

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

인기포스트

ABOUT ME

oziguyo oziguyo

Discrete random variable

Discrete random variable in R

Some of the discrete distributions

Bernoulli distribution

Binomial distribution

Binomial probability mass function

Binomial probability distribution function

Generate random variables

Comparison between r.v. and pmf

Example

Geometric distribution

pmf of geometric distribution

cdf of geometric distribution

Comparison with simulation

Comparison with simulation based on Bernoulli trial

Negative binomial distribution

Negative binomial probability mass function

Negative binomial distribution function

Comparison with simulation based on Bernoulli trial

Example: quality control

Poisson distribution

Poisson distribution in R

Poisson probability mass function

Poisson probability distribution function

Comparison with simulation

Poisson approximation to binomial

'R Programming' 카테고리의 다른 글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

인기포스트

ABOUT ME

Discrete random variable

Discrete random variable in R

Some of the discrete distributions

Bernoulli distribution

Binomial distribution

Related functions in R

probability mass function for the binomial such that P(X=x)P(X=x)

Usage

Example

probability distribution function for the binomial such that P(X≤q)P(X \leq q)

Usage

Example

quantile function

Usage

random number generator

Usage

Example

Binomial probability mass function

Binomial probability distribution function

Generate random variables

Comparison between r.v. and pmf

Example

Geometric distribution

pmf of geometric distribution

cdf of geometric distribution

Comparison with simulation

Comparison with simulation based on Bernoulli trial

Negative binomial distribution

Negative binomial probability mass function

Negative binomial distribution function

Comparison with simulation based on Bernoulli trial

Example: quality control

Poisson distribution

Poisson distribution in R

Poisson probability mass function

Poisson probability distribution function

Comparison with simulation

Poisson approximation to binomial

'R Programming' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

probability mass function for the binomial such that $P(X=x)$

probability distribution function for the binomial such that $P(X \leq q)$