R Programming (1) - R Intro
R Programming, 2020. 3. 25. 09:06
Introduction to R
R as a calculating environment
R can be used as a powerful calculator. For arithmetic calculations:
(1 + 1 / 100) ^ 100
## [1] 2.704814
17 %% 5
## [1] 2
17 %/% 5
## [1] 3
The [1] prefix indicates that this is item 1 in a vector of output.
symbol  meaning
+       addition
-       subtraction
*       multiplication
/       division
^       exponentiation
%%      modulus
%/%     integer division
Built-in functions in R
R has a number of built-in functions:
sin(x), cos(x), tan(x), exp(x), log(x), sqrt(x), floor(x), ceiling(x), round(x)
exp(1)
## [1] 2.718282
options(digits = 16)
exp(1)
## [1] 2.718281828459045
pi
## [1] 3.141592653589793
sin(pi/6)
## [1] 0.4999999999999999
floor(2.3)
## [1] 2
ceiling(2.3)
## [1] 3
round(2.3)
## [1] 2
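A few of these functions take optional arguments that are worth knowing early. The sketch below is only illustrative; the variable names are made up for the example.

```r
# round() takes a digits argument giving the number of decimal places
rounded_pi <- round(pi, digits = 3)      # 3.142

# log() is the natural logarithm by default; the base can be changed
log_base10 <- log(100, base = 10)
log_base2  <- log2(8)

# a value exactly halfway is rounded to the even neighbour, not always up
halves <- round(c(0.5, 1.5, 2.5))        # 0 2 2
```

The last line is why round(2.5) gives 2 rather than 3.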
Variable
We can assign a value to a variable and use the variable.
- For the assignment, we use the command <-
- Usually, this is pronounced as “gets”.
- Variable names are made up of letters, numbers, . or _
- provided the name starts with a letter, or with . followed by a letter.
- names are case sensitive.
- for example,
- x, y, my_variable, a1, a2, .important_variable, x.input
- invalid names:
- 2016_income, .1grade, _x, y@gmail.com
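One way to check these rules without triggering a syntax error is make.names(), which rewrites any string into a syntactically valid name. A string that comes back unchanged was already valid. This is a small sketch; the names candidates and is_valid are chosen only for illustration.

```r
# the first four candidates follow the rules above, the last four do not
candidates <- c("x", "my_variable", ".important_variable", "x.input",
                "2016_income", ".1grade", "_x", "y@gmail.com")

# compare each string with its repaired version
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid
```

Reserved words such as if are also changed by make.names(), which matches the fact that they cannot be used as variable names.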
To display the value of a variable x, we type x, or print(x), or show(x).
x <- 100
x
## [1] 100
print(x)
## [1] 100
show(x)
## [1] 100
Wrapping an assignment in parentheses also prints the assigned value.
(y <- (1 + 1 / x) ^ x)
## [1] 2.704813829421528
When assigning, the right-hand side is evaluated first, then that value is placed in the variable on the left-hand side.
n <- 1
n <- n + 1
n
## [1] 2
R allows the use of = for variable assignment, in common with most programming languages.
The following also works.
3 -> three
three
## [1] 3
You can have multiple assignments.
v <- w <- z <- 1
v
## [1] 1
w
## [1] 1
z
## [1] 1
Functions
A function takes one or more arguments (inputs) and produces an output (return value).
seq(from = 1, to = 9, by =2)
## [1] 1 3 5 7 9
seq(from = 1, to = 9)
## [1] 1 2 3 4 5 6 7 8 9
You can access the built-in help by help(function name) or ?function name.
Every function has a default order for arguments.
If you provide arguments in this order, then they do not need to be named.
seq(1, 9, 2)
## [1] 1 3 5 7 9
seq(to = 9, from = 1)
## [1] 1 2 3 4 5 6 7 8 9
seq(by=-2, 9, 1)
## [1] 9 7 5 3 1
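Positional and named arguments can also be mixed in one call. As a small illustration (the variable names a and b are arbitrary), the length.out argument of seq asks for a given number of points instead of a step size:

```r
# from and to are given by position, length.out by name
a <- seq(1, 9, length.out = 5)

# the same call with every argument named, in a different order
b <- seq(length.out = 5, to = 9, from = 1)

identical(a, b)   # TRUE: named arguments can come in any order
a                 # 1 3 5 7 9
```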
We will study more about functions in Chapter 4.
Vectors
- the basic data structure in R
- also called atomic vector
- indexed set of variables
- the i-th element of vector x is denoted by x[i]
- the types of all elements of an atomic vector should be the same
- logical, integer, double (often called numeric), character
- three basic functions for constructing vectors: seq, rep, c
- Three basic properties:
- Type, typeof(), what it is.
- Length, length(), how many elements it contains.
- Attributes, attributes(), additional arbitrary metadata.
# short for sequence
(x <- seq(1, 20, by = 2))
## [1] 1 3 5 7 9 11 13 15 17 19
(x2 <- seq(1.1, 2, by = 0.1))
## [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
# short for repeat
(y <- rep(3, 4))
## [1] 3 3 3 3
# short for combine
(z <- c(y, x))
## [1] 3 3 3 3 1 3 5 7 9 11 13 15 17 19
- another method for sequence
(x <- 100:110)
## [1] 100 101 102 103 104 105 106 107 108 109 110
(y <- 110:100)
## [1] 110 109 108 107 106 105 104 103 102 101 100
length(x)
## [1] 11
letters, LETTERS
# the 26 lower-case letters of the Roman alphabet
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
# the 26 upper-case letters of the Roman alphabet
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
seq_along()
seq_along(x) is the same as 1:length(x) when x is not empty.
x <- letters[1:5] # a b c d e
y <- seq_along(x)
print(y)
## [1] 1 2 3 4 5
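The two forms differ for a vector of length zero, which is the main reason seq_along() is usually preferred. A minimal sketch (variable names are illustrative):

```r
empty <- c()                    # NULL, which has length 0

counts_down <- 1:length(empty)  # 1:0 counts down and gives 1 0
no_indices  <- seq_along(empty) # integer(0), so a loop over it runs zero times

counts_down
length(no_indices)
```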
- vector and index
(x<- 100:110)
## [1] 100 101 102 103 104 105 106 107 108 109 110
# second element
x[2]
## [1] 101
# last element
x[length(x)]
## [1] 110
(x<- 100:110)
## [1] 100 101 102 103 104 105 106 107 108 109 110
i <- c(1, 3, 2)
x[i]
## [1] 100 102 101
x[1:5]
## [1] 100 101 102 103 104
negative index (drops the given positions):
j <- c(-1, -2, -3)
x[j]
## [1] 103 104 105 106 107 108 109 110
But you cannot mix positive and negative indices; the following code raises an error.
x[c(-1, 2)]
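To see the failure without stopping a script, the call can be wrapped in tryCatch(). This is only a sketch; the exact wording of the error message depends on the R version, so it is captured rather than assumed.

```r
x <- 100:110

# capture the error message instead of aborting
mixed_result <- tryCatch(
  x[c(-1, 2)],
  error = function(e) conditionMessage(e)
)
mixed_result              # a character string describing the error

# zero may be combined with either sign; it selects nothing
with_zero_pos <- x[c(0, 2)]    # 101
with_zero_neg <- x[c(0, -1)]   # everything except the first element
```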
empty vector:
x <- c()
elementwise algebraic operation:
x <- c(1, 2, 3)
y <- c(4, 5, 6)
x * y
## [1] 4 10 18
y ^ x
## [1] 4 25 216
with vectors of unequal length, the shorter vector is recycled:
c(1, 2, 3, 4) + c(1, 2)
## [1] 2 4 4 6
2 * c(1, 2, 3)
## [1] 2 4 6
(1:3)^2
## [1] 1 4 9
This works but with warning message:
c(1, 2, 3) + c(1, 2)
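The result is still computed by recycling the shorter vector; R only warns because 3 is not a multiple of 2. One way to observe both the value and the warning is sketched below, using the standard condition-handling functions (the name outcome is arbitrary):

```r
outcome <- withCallingHandlers(
  c(1, 2, 3) + c(1, 2),
  warning = function(w) {
    # report the warning text, then continue with the computation
    message("warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
outcome   # 2 4 4, i.e. 1+1, 2+2, 3+1
```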
functions taking vectors
sqrt(1:3)
## [1] 1.000000000000000 1.414213562373095 1.732050807568877
mean(1:6)
## [1] 3.5
sort(c(5, 1, 3, 4, 2))
## [1] 1 2 3 4 5
logical vector
logi <- c(TRUE, FALSE, T, F)
typeof(logi)
## [1] "logical"
character vector
char <- c("a", "vector", "of", "characters")
typeof(char)
## [1] "character"
integer vector
integ <- c(1L, 2L, 3L, 4L)
typeof(integ)
## [1] "integer"
double vector
doub <- c(1, 2, 3, 4)
typeof(doub)
## [1] "double"
typeof()
typeof() returns an (internal) object type:
- logical - logical vector
- double - double vector
- integer - integer vector
- character - character vector
- list - see Ch.5
- builtin : R built-in function
- closure : user defined function
- and so on.
Put simply, R objects are implemented in C, and these internal types correspond to C-level data types.
Coercion
If we try to combine different types of elements in a vector, they will be coerced to the most flexible type.
- from least to most flexible: logical, integer, double, character
c("a", 1)
## [1] "a" "1"
c(TRUE, FALSE, 0)
## [1] 1 0 0
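Coercion also happens inside functions, and it can be requested explicitly with the as.*() family. A short sketch with made-up variable names:

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)

n_true     <- sum(flags)    # TRUE counts as 1, FALSE as 0, giving 3
share_true <- mean(flags)   # proportion of TRUE values, 0.75

as_text   <- as.character(3.5)   # double to character
truncated <- as.integer(3.9)     # drops the fractional part, giving 3

# text that is not a number becomes NA (R would normally warn here)
not_a_number <- suppressWarnings(as.numeric("abc"))
```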
Example : mean and variance
compare computed mean and variance with built-in functions
x <- c(1.2, 0.9, 0.8, 1, 1.2)
x.mean <- sum(x)/length(x)
x.mean - mean(x)
## [1] 0
x.var <- sum((x-x.mean)^2)/(length(x)-1)
x.var - var(x)
## [1] 0
Example : simple numerical integration
- The basic problem in numerical integration is to compute an approximate solution to a definite integral.
dt <- 0.005
t <- seq(0, pi/6, by = dt)
ft <- cos(t)
(I <- sum(ft) * dt)
## [1] 0.5015486506255458
t is a vector and ft is also a vector.
I - sin(pi/6)
## [1] 0.001548650625545822
Note the difference between the numerical integration and theoretical value.
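Two things contribute to this difference: seq(0, pi/6, by = dt) stops at 0.52 rather than at pi/6, and the sum gives every grid point, including the end points, a full weight of dt. A possible refinement, shown here only as a sketch with illustrative names (n, tgrid, I_trap), is to place the grid exactly on the interval and use the trapezoidal rule:

```r
n     <- 100                                # number of subintervals (arbitrary choice)
tgrid <- seq(0, pi/6, length.out = n + 1)   # grid that ends exactly at pi/6
h     <- tgrid[2] - tgrid[1]
f     <- cos(tgrid)

# trapezoidal rule: the two end points get half weight
I_trap   <- h * (sum(f) - (f[1] + f[n + 1]) / 2)
err_trap <- abs(I_trap - sin(pi/6))

# built-in adaptive quadrature, for comparison
I_builtin <- integrate(cos, lower = 0, upper = pi/6)$value
```

With this grid the error should be several orders of magnitude smaller than the one shown above.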
Example : exponential limit
x <- seq(10, 200, by = 10)
y <- (1 + 1/x)^x
exp(1) - y
## [1] 0.124539368359042779 0.064984123314622888 0.043963052588742446
## [4] 0.033217990069081882 0.026693799385437256 0.022311689128828860
## [7] 0.019165457482859694 0.016796887705717634 0.014949367400859170
## [10] 0.013467999037516609 0.012253746954290712 0.011240337596801542
## [13] 0.010381746740967479 0.009645014537900565 0.009005917124194074
## [16] 0.008446252151229849 0.007952077235180433 0.007512532619638357
## [19] 0.007119033847887923 0.006764705529727966
plot(x, y)
Missing data
In R, missing data is represented by NA.
a <- NA # assign NA to variable a
is.na(a) # is it missing?
## [1] TRUE
a <- c(11, NA, 13)
is.na(a)
## [1] FALSE TRUE FALSE
mean(a)
## [1] NA
mean(a, na.rm = TRUE) #NAs can be removed
## [1] 12
NA is logical
typeof(NA)
## [1] "logical"
and can be coerced to numeric, integer, or character
x <- c(NA, 10)
typeof(x[1])
## [1] "double"
y <- c(NA, "abc")
typeof(y[1])
## [1] "character"
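Because NA stands for an unknown value, comparing anything with NA gives NA rather than TRUE or FALSE, which is why is.na() is needed. A minimal sketch (variable names are illustrative):

```r
a <- c(11, NA, 13)

compared    <- a == NA          # NA NA NA: never TRUE, even for the missing entry
n_missing   <- sum(is.na(a))    # number of missing values
has_missing <- anyNA(a)         # TRUE if at least one value is missing
complete    <- a[!is.na(a)]     # keep only the observed values
```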
Inf and -Inf
If a computation result is too big, R will return Inf.
2 ^ 1024
## [1] Inf
- 2 ^ 1024
## [1] -Inf
1 / 0
## [1] Inf
NaN
If a computation result makes little sense, then R will return NaN, not a number.
Inf - Inf
## [1] NaN
0 / 0
## [1] NaN
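NA, NaN, and the infinities can be told apart with a small family of test functions. The sketch below applies them to one vector (the name v is arbitrary):

```r
v <- c(1, NA, NaN, Inf, -Inf)

na_flags       <- is.na(v)        # TRUE for both NA and NaN
nan_flags      <- is.nan(v)       # TRUE only for NaN
finite_flags   <- is.finite(v)    # TRUE only for ordinary numbers
infinite_flags <- is.infinite(v)  # TRUE for Inf and -Inf

# the largest finite double; doubling it overflows to Inf
largest <- .Machine$double.xmax
```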
Expression and assignment
An expression is a phrase of code that can be executed.
seq(10, 20, by=3)
## [1] 10 13 16 19
4
## [1] 4
mean(c(1,2,3))
## [1] 2
1 > 2
## [1] FALSE
If the value of the expression is saved using <-, then it is called an assignment.
x1 <- seq(10, 20, by=3)
x2 <- 1>2
A logical expression is formed using
- the comparison operators <, >, <=, >=, ==, and != (not equal to)
- and the logical operators & (and), | (or), and ! (not).
The value of a logical expression is either TRUE or FALSE.
- Numbers can also be used in place of TRUE or FALSE: 0 is treated as FALSE and any nonzero number, such as 1, as TRUE.
c(0, 0, 1, 1) | c(0, 1, 0, 1)
## [1] FALSE TRUE TRUE TRUE
x[subset]
R’s subsetting operators are powerful and fast.
Subsetting is hard to learn.
We can extract a subvector by indexing with a vector of TRUE/FALSE values.
x <- 1:10
x%%4 == 0
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
( y <- x[ x%%4==0 ] )
## [1] 4 8
R also provides the subset function, which ignores NA,
- whereas x[subset] preserves NA.
x <- c(1, NA, 3, 4)
x[x > 2]
## [1] NA 3 4
subset(x, subset = x>2 )
## [1] 3 4
For the index positions of TRUE elements, use which(x)
x <- c(1, 1, 2, 3, 5, 8, 13)
which(x %% 2 == 0)
## [1] 3 6
Explore more about subsetting:
x <- c(2.1, 4.2, 3.3, 5.4)
x[c(TRUE,FALSE,TRUE,FALSE)]
## [1] 2.1 3.3
x[x > 3]
## [1] 4.2 3.3 5.4
x[order(x)]
## [1] 2.1 3.3 4.2 5.4
Example : rounding error
Many floating-point numbers are subject to rounding errors in digital computers.
2 * 2 == 4
## [1] TRUE
sqrt(2) * sqrt(2) == 2
## [1] FALSE
The solution is to use all.equal(x,y), which returns TRUE if the difference between x and y is smaller than some tolerance.
all.equal(sqrt(2) * sqrt(2), 2)
## [1] TRUE
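When the values differ, all.equal() does not return FALSE; it returns a character string describing the difference. For use in a condition it is therefore usually wrapped in isTRUE(). A minimal sketch (names are illustrative):

```r
a <- sqrt(2) * sqrt(2)

same      <- isTRUE(all.equal(a, 2))     # TRUE
different <- isTRUE(all.equal(1, 1.1))   # FALSE
report    <- all.equal(1, 1.1)           # a description, not FALSE

# an explicit tolerance check is another option; 1e-8 is an arbitrary choice
manual <- abs(a - 2) < 1e-8
```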
Sequential && and ||
To evaluate x && y, R first evaluates x. If x is FALSE then R returns FALSE without evaluating y.
To evaluate x || y, R first evaluates x. If x is TRUE then R returns TRUE without evaluating y.
Sequential evaluation of x and y is useful when y is not always well defined.
x <- 0
x == 0 || sin(1/x) == 0
## [1] TRUE
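The same idea is often used to guard a test that would otherwise produce NA or fail. The helper below is hypothetical and only meant as a sketch: each condition is evaluated only if all the earlier ones were TRUE.

```r
# hypothetical helper: is the first element present, non-missing and positive?
positive_first <- function(v) {
  length(v) > 0 && !is.na(v[1]) && v[1] > 0
}

r1 <- positive_first(c(3, -1))     # TRUE
r2 <- positive_first(numeric(0))   # FALSE: stops at the length check
r3 <- positive_first(c(NA, 2))     # FALSE: stops at the is.na check
```

Without the guards, numeric(0)[1] > 0 evaluates to NA, which cannot be used as a condition in if.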
Note that && and || only work on scalars, whereas & and | work on vectors elementwise.
Subsetting and assignment
Subsetting operators can be combined with the assignment operator.
x <- 1:5
x[c(1, 2)] <- 2:3
x
## [1] 2 3 3 4 5
x[-1] <- 4:1
x
## [1] 2 4 3 2 1
x[x %% 2 == 0] <- 0
x
## [1] 0 0 3 0 1
x[which(x == max(x))] <- 100
x
## [1] 0 0 100 0 1
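A common practical use of this pattern is cleaning data in place. The sketch below, with made-up data, fills missing values with the mean of the observed ones and then caps large values:

```r
scores <- c(4, NA, 7, NA, 10)

# replace each NA by the mean of the non-missing values (here 7)
scores[is.na(scores)] <- mean(scores, na.rm = TRUE)
scores                    # 4 7 7 7 10

# cap values above 8 (the threshold is an arbitrary choice for the example)
capped <- scores
capped[capped > 8] <- 8
capped                    # 4 7 7 7 8
```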
Names
You can name a vector:
(x <- c(a = 1, b = 2, c = 3))
## a b c
## 1 2 3
The names function returns a character vector:
names(x)
## [1] "a" "b" "c"
y <- 1:3
names(y) <- c("first", "second", "third")
y
##  first second  third
##      1      2      3
and select elements by names:
y["first"]
## first
##     1
y[c("first", "third")]
## first third
##     1     3
Matrix
A matrix is created from a vector using the function matrix:
matrix( data, nrow =1, ncol=1, byrow=FALSE )
- data : vector of length at most nrow*ncol
- if length of vector < nrow*ncol, then data is reused as many times as is needed
- nrow : number of rows
- ncol : number of columns
- byrow = TRUE : fill the matrix up row-by-row
- byrow = FALSE : fill the matrix up column-by-column, default
- diag(x) : create a diagonal matrix
- rbind(...) : join matrices with rows of the same length
- cbind(...) : join matrices with columns of the same length
Example:
(A <- matrix(1:6, nrow = 2, ncol = 3))
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
(A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE))
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
A[1, 3] <- 0
A
##      [,1] [,2] [,3]
## [1,]    1    2    0
## [2,]    4    5    6
A[, 2:3]
##      [,1] [,2]
## [1,]    2    0
## [2,]    5    6
(B <- diag(c(1, 2, 3)))
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    2    0
## [3,]    0    0    3
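The reuse of data when it is shorter than nrow*ncol can be seen in a small sketch (the names M and Z are arbitrary):

```r
# the two values 1, 2 are reused for every column
M <- matrix(1:2, nrow = 2, ncol = 3)
M
dims <- dim(M)     # number of rows and columns: 2 3

# a single value fills the whole matrix
Z <- matrix(0, nrow = 2, ncol = 2)
```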
Matrix operation
- Usual algebraic operations, including *, act elementwise.
- To perform matrix multiplication, we use %*%.
- nrow(x), ncol(x) : number of rows and columns of x
- det(x) : determinant of x
- t(x) : transpose of x
- solve(A, B) : returns x such that A %*% x = B
- If A is invertible, then solve(A) is the inverse of A.
A <- matrix(c(3, 5, 2, 3), nrow=2, ncol=2)
B <- matrix(c(1, 1, 0, 1), nrow=2, ncol=2)
A %*% B
##      [,1] [,2]
## [1,]    5    2
## [2,]    8    3
A.inv <- solve(A)
A %*% A.inv # we observe rounding error
##      [,1]                   [,2]
## [1,]    1 -8.881784197001252e-16
## [2,]    0  1.000000000000000e+00
A^(-1) # This is not an inverse. ^(-1) applies elementwise.
##                    [,1]               [,2]
## [1,] 0.3333333333333333 0.5000000000000000
## [2,] 0.2000000000000000 0.3333333333333333
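The two-argument form solve(A, b) solves a linear system directly, without forming the inverse first. A brief sketch using the same A; the right-hand side b is made up for the example:

```r
A <- matrix(c(3, 5, 2, 3), nrow = 2, ncol = 2)
b <- c(1, 2)

# solves 3*x1 + 2*x2 = 1 and 5*x1 + 3*x2 = 2
sol <- solve(A, b)
sol                           # 1 -1, up to rounding

# check: A %*% sol should reproduce b
residual <- A %*% sol - b
```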
cbind() : Combine R objects by columns
rbind() : Combine R objects by rows
(C <- cbind(A,B))
##      [,1] [,2] [,3] [,4]
## [1,]    3    2    1    0
## [2,]    5    3    1    1
(D <- rbind(A,B))
##      [,1] [,2]
## [1,]    3    2
## [2,]    5    3
## [3,]    1    0
## [4,]    1    1
colnames(C) <- c("a", "b", "c", "d")
C
##      a b c d
## [1,] 3 2 1 0
## [2,] 5 3 1 1
rownames(C) <- c("first", "second")
C
##        a b c d
## first  3 2 1 0
## second 5 3 1 1
# number of rows
nrow(C)
## [1] 2
# number of columns
ncol(C)
## [1] 4
Subsetting of matrix
select columns:
(partC1 <- C[, c(1, 2)])
##        a b
## first  3 2
## second 5 3
select columns by column names:
(partC1 <- C[, c("a", "b")])
##        a b
## first  3 2
## second 5 3
partC1 is a matrix
class(partC1)
## [1] "matrix"
(In R 4.0.0 and later, class() of a matrix returns "matrix" "array".)
When you select one column, it becomes a vector, not a matrix
this_is_vector_now <- C[, "a"]
class(this_is_vector_now)
## [1] "numeric"
When you want to preserve matrix form:
still_matrix <- C[, "a", drop = FALSE]
class(still_matrix)
## [1] "matrix"
select rows:
(partC2 <- C[1,])
## a b c d
## 3 2 1 0
Or, in a combined way:
(partC2 <- C[1, c(2, 3)])
## b c
## 2 1
class(partC2)
## [1] "numeric"
When we use a single index, R counts the index columnwise:
C[1]
## [1] 3
C[2]
## [1] 5
C[3]
## [1] 2
C[4]
## [1] 3
is.vector(partC2)
## [1] TRUE
is.matrix(partC2)
## [1] FALSE
(Cm <- as.matrix(partC2))
##   [,1]
## b    2
## c    1
class(Cm)
## [1] "matrix"
(Cv <- as.vector(Cm))
## [1] 2 1
class(Cv)
## [1] "numeric"
Objects and classes
R is an object-oriented language.
Every object in R is a member of a class.
You can use the class function to determine the class of an object.
# numeric vector
class(c(1, 2, 3))
## [1] "numeric"
# character vector
class(c("c", "B", "z"))
## [1] "character"
# function
class(sin)
## [1] "function"
class(matrix(c(1,2)))
## [1] "matrix"
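An object can have more than one class, and what class() prints can differ between R versions (from R 4.0.0 on, a matrix reports "matrix" "array"). For that reason, tests on the class are often written with inherits() or an is.*() function instead of comparing strings. A small sketch with illustrative names:

```r
m <- matrix(1:4, nrow = 2)

is_mat_by_class <- inherits(m, "matrix")  # TRUE regardless of how many classes m has
is_mat_by_test  <- is.matrix(m)           # TRUE
storage_type    <- typeof(m)              # "integer": the internal type, not the class
```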
Workspace
The objects that you create using R remain in existence until you explicitly delete them.
- rm(x) : remove object x
- rm(list=ls()) : remove all objects
Working directory
When you run R, it uses one of the directories on your hard drive as a working directory,
- where it looks for user-written programs and data files.
Check the working directory.
getwd()
Change the working directory to “dir”
setwd("dir")
/ is the separator in directory and file paths, . refers to the current directory, and .. refers to the parent directory.
Writing script
We can type and evaluate all possible R expressions at the prompt, but it is much more convenient to write scripts,
- which simply comprise collections of R expressions.
- We use the terms program and code synonymously with script.
You can use the built-in editor in Rgui or RStudio.
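A script saved to a file can be run from the prompt with source(). The sketch below writes a tiny script to a temporary file so that it is self-contained; the file name demo_script.R and the variables inside it are made up for the example.

```r
# create a two-line script in the session's temporary directory
script_path <- file.path(tempdir(), "demo_script.R")
writeLines(c(
  "radius <- 2",
  "area <- pi * radius^2"
), script_path)

# run every expression in the file, in order
source(script_path)
area   # the variable created by the script is now available
```

In practice the script would be written in the editor and saved in the working directory, so that source("demo_script.R") finds it.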
Help
To find out more about an R command or function x, you can type help(x) or just ?x.
If you cannot remember the exact name, then use help.search("x").
HTML help command : help.start()
Packages
R provides various useful packages to help you.
https://cran.r-project.org/web/packages/
To install a package:
install.packages("packagename")
To access the package:
library("packagename")
Or use package menu.
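A common pattern, sketched here under the assumption that installing on demand is acceptable, is to install a package only when it is missing. The example uses stats, which ships with R, so that it runs without a network connection; replace it with the package you need.

```r
pkg <- "stats"   # placeholder: ships with R, so nothing is downloaded here

# install only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() take the name from a variable
library(pkg, character.only = TRUE)

# a function can also be called with an explicit package prefix
med <- stats::median(c(5, 1, 3))
```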
R documentation : https://www.rdocumentation.org/