R Programming (1) - R Intro

R Programming 2020. 3. 25. 09:06


R as a calculating environment

R can be used as a powerful calculator. For arithmetic calculations:

(1 + 1 / 100) ^ 100

## [1] 2.704814

17 %% 5

## [1] 2

17 %/% 5

## [1] 3

The [1] prefix indicates that the line of output starts with item 1 of the output vector.

symbol	meaning
+	addition
-	subtraction
*	multiplication
/	division
^	exponentiation
%%	modulus (remainder)
%/%	integer division
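As a small illustration of how the last two operators relate, the sketch below checks that integer division and modulus together reconstruct the original number, and shows what happens with a negative operand. The variable names are chosen here for illustration only.

```r
# Integer division and modulus together reconstruct the original number.
a <- 17
b <- 5
quotient  <- a %/% b   # 3
remainder <- a %% b    # 2
quotient * b + remainder == a   # TRUE

# With a negative operand, %/% rounds down (toward -Inf) and the
# result of %% takes the sign of the divisor.
neg_quotient  <- (-17) %/% 5   # -4
neg_remainder <- (-17) %% 5    # 3
```

The parentheses around -17 are not strictly required, but they make the intended order of evaluation explicit.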

Built-in functions in R

R has a number of built-in functions, including:
sin(x), cos(x), tan(x), exp(x), log(x), sqrt(x), floor(x), ceiling(x), round(x)

exp(1)

## [1] 2.718282

options(digits = 16)
exp(1)

## [1] 2.718281828459045

pi

## [1] 3.141592653589793

sin(pi/6)

## [1] 0.4999999999999999

floor(2.3)

## [1] 2

ceiling(2.3)

## [1] 3

round(2.3)

## [1] 2
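The rounding functions have a few details worth seeing side by side. The following is a minimal sketch, not part of the original examples, showing the digits argument of round(), the difference between floor() and trunc() for negative numbers, and the fact that R rounds exact halves to the nearest even integer.

```r
r2 <- round(3.14159, digits = 2)   # 3.14
s2 <- signif(123456, digits = 2)   # 120000 (two significant digits)

# For negative numbers, floor() and trunc() differ.
f_neg <- floor(-2.7)   # -3, rounds toward -Inf
t_neg <- trunc(-2.7)   # -2, drops the fractional part

# Exact halves are rounded to the nearest even integer.
half_a <- round(0.5)   # 0
half_b <- round(2.5)   # 2

# log() takes an optional base; the default is the natural logarithm.
log_two <- log(8, base = 2)   # 3
log_ten <- log10(1000)        # 3
```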

Variable

We can assign a value to a variable and then use the variable in later expressions.
For assignment, we use the operator <-
- Usually, this is pronounced as “gets”.
Variable names are made up of letters, numbers, . or _
- provided the name starts with a letter, or with . not followed by a digit.
- names are case sensitive.
- for example,
  - x, y, my_variable, a1, a2, .important_variable, x.input
- invalid names:
  - 2016_income, .1grade, _x, y@gmail.com
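One way to check the naming rules programmatically is sketched below. It relies on the base function make.names(), which returns its input unchanged only when the input is already a syntactically valid name. The vector name candidates is made up for this illustration, and the check also rejects reserved words such as if, which goes slightly beyond the rules listed above.

```r
candidates <- c("x", "my_variable", ".important_variable", "x.input",
                "2016_income", ".1grade", "_x", "y@gmail.com")

# make.names() repairs invalid names, so a name is valid exactly
# when make.names() leaves it untouched.
is_valid <- candidates == make.names(candidates)

data.frame(name = candidates, valid = is_valid)
```

The first four names come out as valid and the last four as invalid, matching the lists above.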

To display the value of a variable x, we type x, print(x) or show(x).

x <- 100
x

## [1] 100

print(x)

## [1] 100

show(x)

## [1] 100

Wrapping an assignment in parentheses also prints the assigned value.

(y <- (1 + 1 / x) ^ x)

## [1] 2.704813829421528

When assigning, the right-hand side is evaluated first, then that value is placed in the variable on the left-hand side.

n <- 1
n <- n + 1
n

## [1] 2

R also allows the use of = for variable assignment, in common with most programming languages.

The following also works.

3 -> three
three

## [1] 3

You can chain multiple assignments.

v <- w <- z <- 1
v

## [1] 1

w

## [1] 1

z

## [1] 1

Functions

A function takes one or more arguments (inputs) and produces one or more outputs (return values).

seq(from = 1, to = 9, by =2)

## [1] 1 3 5 7 9

seq(from = 1, to = 9)

## [1] 1 2 3 4 5 6 7 8 9

You can access the built-in help with help(function_name) or ?function_name, for example help(seq) or ?seq.

Every function has a default order for arguments.
If you provide arguments in this order, then they do not need to be named.

seq(1, 9, 2)

## [1] 1 3 5 7 9

seq(to = 9, from = 1)

## [1] 1 2 3 4 5 6 7 8 9

seq(by=-2, 9, 1)

## [1] 9 7 5 3 1
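The same matching rules apply to functions you write yourself. The sketch below defines a hypothetical helper called power (not a base R function, introduced only for this illustration) to show default values, positional matching and named matching together.

```r
# Hypothetical helper, defined only for this illustration.
power <- function(base, exponent = 2) {
  base ^ exponent
}

p1 <- power(3)                       # 9: exponent falls back to its default
p2 <- power(2, 3)                    # 8: arguments matched by position
p3 <- power(exponent = 3, base = 2)  # 8: named arguments in any order

# seq() can also be told how many values to produce instead of the step.
s <- seq(1, 9, length.out = 5)       # 1 3 5 7 9
```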

We will study more about functions in Chapter 4.

Vectors

A vector is the basic data structure in R
- also called an atomic vector
It is an indexed set of values
The i-th element of a vector x is denoted by x[i]
All elements of an atomic vector must have the same type
- logical, integer, double (often called numeric), character
Three basic functions for constructing vectors
- seq, rep, c
Three basic properties:
- Type, typeof(), what it is.
- Length, length(), how many elements it contains.
- Attributes, attributes(), additional arbitrary metadata.
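The three properties can be inspected on any vector. A minimal sketch, using vector names chosen here for illustration:

```r
v <- c(first = 1.5, second = 2.5, third = 3.5)

v_type  <- typeof(v)       # "double"
v_len   <- length(v)       # 3
v_attrs <- attributes(v)   # a list holding the names attribute

# A plain integer sequence has no attributes at all.
plain       <- 1:3
plain_type  <- typeof(plain)       # "integer"
plain_attrs <- attributes(plain)   # NULL
```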

# short for sequence
(x <- seq(1, 20, by = 2))

##  [1]  1  3  5  7  9 11 13 15 17 19

(x2 <- seq(1.1, 2, by = 0.1))

##  [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

# short for repeat
(y <- rep(3, 4))

## [1] 3 3 3 3

# short for combine
(z <- c(y, x))

##  [1]  3  3  3  3  1  3  5  7  9 11 13 15 17 19

Another way to create a sequence is the : operator:

(x <- 100:110)

##  [1] 100 101 102 103 104 105 106 107 108 109 110

(y <- 110:100)

##  [1] 110 109 108 107 106 105 104 103 102 101 100

length(x)

## [1] 11

letters, LETTERS

# the 26 lower-case letters of the Roman alphabet
letters

##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

# the 26 upper-case letters of the Roman alphabet;
LETTERS

##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

seq_along()

For a non-empty vector x, seq_along(x) is the same as 1:length(x)

x <- letters[1:5]  # a b c d e
y <- seq_along(x)
print(y)

## [1] 1 2 3 4 5
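The two forms differ when the vector is empty, which is the main reason seq_along() is usually preferred in loops. A short sketch, with variable names made up for this example:

```r
empty <- character(0)

colon_form <- 1:length(empty)    # 1 0, because 1:0 counts down
along_form <- seq_along(empty)   # integer(0), an empty sequence

# A loop over seq_along() therefore runs zero times for an empty vector.
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
count   # 0
```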

vector and index

(x<- 100:110)

##  [1] 100 101 102 103 104 105 106 107 108 109 110

# second element
x[2]

## [1] 101

# last element
x[length(x)]

## [1] 110

(x<- 100:110)

##  [1] 100 101 102 103 104 105 106 107 108 109 110

i <- c(1, 3, 2)

x[i]

## [1] 100 102 101

x[1:5]

## [1] 100 101 102 103 104

Negative index (excludes the elements at the given positions):

j <- c(-1, -2, -3)
x[j]

## [1] 103 104 105 106 107 108 109 110

But you can’t mix positive and negative indices; the following code produces an error.

x[c(-1, 2)]
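To see the error without stopping a script, the call can be wrapped in tryCatch(). This is only a sketch of one way to inspect the failure; the two-step alternative at the end is a suggestion added here, assuming the intent was to drop the first element and then take the second of what remains.

```r
x <- 100:110

# Capture the error message instead of aborting.
result <- tryCatch(
  x[c(-1, 2)],
  error = function(e) conditionMessage(e)
)
result   # the error message as a character string

# Applying the two indices one after the other is allowed.
second_of_rest <- x[-1][2]   # 102
```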

Empty vector:

x <- c()
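A few properties of the empty vector, shown as a small sketch. Calling c() with no arguments returns NULL, which has length 0 and can be extended with c().

```r
x <- c()
x_is_null <- is.null(x)   # TRUE
x_len     <- length(x)    # 0

# Elements can be appended by combining.
x <- c(x, 5)
x <- c(x, 7)
x   # 5 7

# Typed empty vectors can be created directly.
empty_num <- numeric(0)
empty_chr <- character(0)
```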

elementwise algebraic operation:

x <- c(1, 2, 3)
y <- c(4, 5, 6)
x * y

## [1]  4 10 18

y ^ x

## [1]   4  25 216

With vectors of unequal length, the shorter vector is recycled:

c(1, 2, 3, 4) + c(1, 2)

## [1] 2 4 4 6

2 * c(1, 2, 3)

## [1] 2 4 6

(1:3)^2

## [1] 1 4 9

When the longer length is not a multiple of the shorter one, this still works but gives a warning message:

c(1, 2, 3) + c(1, 2)
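The sketch below shows one way to look at both the value and the warning from this expression. It uses withCallingHandlers() from base R; the handler simply reports the warning text and then lets the computation continue.

```r
result <- withCallingHandlers(
  c(1, 2, 3) + c(1, 2),
  warning = function(w) {
    # Report the warning text, then continue the computation.
    message("caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
result   # 2 4 4: the shorter vector was recycled as 1, 2, 1
```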

functions taking vectors

sqrt(1:3)

## [1] 1.000000000000000 1.414213562373095 1.732050807568877

mean(1:6)

## [1] 3.5

sort(c(5, 1, 3, 4, 2))

## [1] 1 2 3 4 5

logical vector

logi <- c(TRUE, FALSE, T, F)
typeof(logi)

## [1] "logical"

character vector

char <- c("a", "vector", "of", "characters")
typeof(char)

## [1] "character"

integer vector

integ <- c(1L, 2L, 3L, 4L)
typeof(integ)

## [1] "integer"

double vector

doub <- c(1, 2, 3, 4)
typeof(doub)

## [1] "double"

`typeof()`

typeof() returns the (internal) type of an object:

logical : logical vector
double : double vector
integer : integer vector
character : character vector
list : see Ch.5
builtin : built-in (primitive) function implemented in C, such as sum
closure : function written in R, including user-defined functions
and so on.

Simply speaking, objects in R are implemented in C, and the internal types correspond to C-level data types.

Coercion

If we try to combine different types of elements in a vector, they will be coerced to the most flexible type.

The types, from least to most flexible, are: logical, integer, double, character

c("a", 1)

## [1] "a" "1"

c(TRUE, FALSE, 0)

## [1] 1 0 0
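Coercion can also be requested explicitly with the as.*() family, and it happens implicitly when logical values are used in arithmetic. A brief sketch:

```r
int_from_lgl <- as.integer(TRUE)        # 1
num_from_chr <- as.numeric("3.14")      # 3.14
chr_from_num <- as.character(10)        # "10"
lgl_from_num <- as.logical(c(0, 1, 2))  # FALSE TRUE TRUE

# Logical values count as 0 and 1 in arithmetic.
n_true    <- sum(c(TRUE, FALSE, TRUE))          # 2
prop_true <- mean(c(TRUE, FALSE, TRUE, TRUE))   # 0.75

# A conversion that cannot succeed yields NA (with a warning).
failed <- suppressWarnings(as.numeric("abc"))   # NA
```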

Example : mean and variance

Compare the manually computed mean and variance with the built-in functions:

x <- c(1.2, 0.9, 0.8, 1, 1.2)

x.mean <- sum(x)/length(x)

x.mean - mean(x)

## [1] 0

x.var <- sum((x-x.mean)^2)/(length(x)-1)
x.var - var(x)

## [1] 0

Example : simple numerical integration

The basic problem in numerical integration is to compute an approximate solution to a definite integral.

dt <- 0.005
t <- seq(0, pi/6, by = dt)
ft <- cos(t)
(I <- sum(ft) * dt)

## [1] 0.5015486506255458

t is a vector and ft is also a vector.

I - sin(pi/6)

## [1] 0.001548650625545822

Note the difference between the numerical result and the theoretical value sin(pi/6) = 0.5.
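The size of that difference depends on the rule used. As a sketch under assumptions not in the original example (a grid that ends exactly at pi/6, and 100 subintervals chosen arbitrarily), the code below compares a left Riemann sum with the trapezoidal rule for the same integral. The trapezoidal rule is a swapped-in technique shown for comparison only.

```r
f <- cos
a <- 0
b <- pi / 6
n <- 100                                 # number of subintervals (arbitrary)
grid <- seq(a, b, length.out = n + 1)    # grid ends exactly at b
h <- (b - a) / n
y <- f(grid)

# Left Riemann sum: uses the left endpoint of each subinterval.
left_sum <- sum(y[-length(y)]) * h

# Trapezoidal rule: averages the two endpoints of each subinterval.
trapezoid <- sum((y[-1] + y[-length(y)]) / 2) * h

exact <- sin(b) - sin(a)
left_error <- abs(left_sum - exact)
trap_error <- abs(trapezoid - exact)
```

Comparing left_error and trap_error shows that averaging the endpoints gives a noticeably smaller error for the same grid.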

Example : exponential limit

x <- seq(10, 200, by = 10)

y <- (1 + 1/x)^x

exp(1) - y

##  [1] 0.124539368359042779 0.064984123314622888 0.043963052588742446
##  [4] 0.033217990069081882 0.026693799385437256 0.022311689128828860
##  [7] 0.019165457482859694 0.016796887705717634 0.014949367400859170
## [10] 0.013467999037516609 0.012253746954290712 0.011240337596801542
## [13] 0.010381746740967479 0.009645014537900565 0.009005917124194074
## [16] 0.008446252151229849 0.007952077235180433 0.007512532619638357
## [19] 0.007119033847887923 0.006764705529727966

plot(x, y)

Missing data

In R, missing data is represented by NA.

a <- NA   # assign NA to variable a
is.na(a)     # is it missing?

## [1] TRUE

a <- c(11, NA, 13)
is.na(a)

## [1] FALSE  TRUE FALSE

mean(a)

## [1] NA

mean(a, na.rm = TRUE) #NAs can be removed

## [1] 12
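A few common operations on missing values, shown as a minimal sketch. Note that comparing with == does not detect NA, which is why is.na() is needed.

```r
a <- c(11, NA, 13, NA)

na_compare <- NA == NA            # NA, not TRUE
n_missing  <- sum(is.na(a))       # 2
where_na   <- which(is.na(a))     # 2 4
complete   <- a[!is.na(a)]        # 11 13
total      <- sum(a, na.rm = TRUE)   # 24
```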

NA is logical

typeof(NA)

## [1] "logical"

and can be coerced to numeric, integer, or character

x <- c(NA, 10)
typeof(x[1])

## [1] "double"

y <- c(NA, "abc")
typeof(y[1])

## [1] "character"

Inf and -Inf

If a computation result is too big, R will return Inf.

2 ^ 1024

## [1] Inf

- 2 ^ 1024

## [1] -Inf

1 / 0

## [1] Inf

NaN

If a computation result makes little sense, then R will return NaN, not a number.

Inf - Inf

## [1] NaN

0 / 0

## [1] NaN
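R provides test functions for these special values. The sketch below applies them to one vector so the differences are visible; in particular, is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN.

```r
v <- c(1, Inf, -Inf, NA, NaN)

finite_flags   <- is.finite(v)     # TRUE  FALSE FALSE FALSE FALSE
infinite_flags <- is.infinite(v)   # FALSE TRUE  TRUE  FALSE FALSE
nan_flags      <- is.nan(v)        # FALSE FALSE FALSE FALSE TRUE
na_flags       <- is.na(v)         # FALSE FALSE FALSE TRUE  TRUE
```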

Expression and assignment

An expression is a phrase of code that can be evaluated.

seq(10, 20, by=3)

## [1] 10 13 16 19

## [1] 4

mean(c(1,2,3))

## [1] 2

1 > 2

## [1] FALSE

If the value of an expression is saved using <-, then it is called an assignment.

x1 <- seq(10, 20, by=3)

x2 <- 1>2

A logical expression is formed using

the comparison operators
- <, >, <=, >=, ==, and != (not equal to)
and the logical operators
- & (and), | (or), and ! (not).

The value of a logical expression is either TRUE or FALSE.

Numbers can also be used in logical expressions: 0 is treated as FALSE and any nonzero number as TRUE.

c(0, 0, 1, 1) | c(0, 1, 0, 1)

## [1] FALSE  TRUE  TRUE  TRUE
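Comparison and logical operators are often combined and then summarised. A short sketch, with a made-up vector:

```r
x <- c(3, 8, 12, 7)

in_range <- x > 5 & x < 10   # FALSE TRUE FALSE TRUE
outside  <- !in_range        # TRUE FALSE TRUE FALSE

any_in <- any(in_range)   # TRUE: at least one element is in range
all_in <- all(in_range)   # FALSE: not every element is in range
n_in   <- sum(in_range)   # 2: TRUE counts as 1

exclusive <- xor(TRUE, FALSE)   # TRUE: exactly one side is TRUE
```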

x[subset]

R’s subsetting operators are powerful and fast.

Subsetting is hard to learn.

We can extract a subvector by indexing with a logical vector of TRUE/FALSE values; the elements at the TRUE positions are kept.

x <- 1:10
x%%4 == 0

##  [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

( y <- x[ x%%4==0 ] )

## [1] 4 8

R also provides the subset function, which drops NA,

whereas x[subset] preserves NA.

x <- c(1, NA, 3, 4)
x[x > 2]

## [1] NA  3  4

subset(x, subset = x>2 )

## [1] 3 4

For the index positions of the TRUE elements of a logical vector x, use which(x)

x <- c(1, 1, 2, 3, 5, 8, 13)
which(x %% 2 == 0)

## [1] 3 6
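Two relatives of which() return the position of the largest and smallest element. A minimal sketch on the same vector:

```r
x <- c(1, 1, 2, 3, 5, 8, 13)

big_positions <- which(x > 4)   # 5 6 7
pos_max <- which.max(x)         # 7
pos_min <- which.min(x)         # 1, the first of the tied minima

# Indexing with which() gives the same values as logical indexing
# when there are no NAs.
big_values <- x[which(x > 4)]   # 5 8 13
```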

Explore more about subsetting:

x <- c(2.1, 4.2, 3.3, 5.4)

x[c(TRUE,FALSE,TRUE,FALSE)]

## [1] 2.1 3.3

x[x > 3]

## [1] 4.2 3.3 5.4

x[order(x)]

## [1] 2.1 3.3 4.2 5.4

Example : rounding error

Many floating-point numbers are subject to rounding errors in digital computers.

2 * 2 == 4

## [1] TRUE

sqrt(2) * sqrt(2) == 2

## [1] FALSE

The solution is to use all.equal(x,y), which returns TRUE if the difference between x and y is smaller than some tolerance.

all.equal(sqrt(2) * sqrt(2), 2)

## [1] TRUE
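One detail to keep in mind: when the values differ, all.equal() returns a character string describing the difference rather than FALSE, so inside a condition it is usually wrapped in isTRUE(). A small sketch:

```r
exact_compare <- 0.1 + 0.2 == 0.3                    # FALSE
near_compare  <- isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# When the values differ, the result is a description, not FALSE.
msg <- all.equal(1, 1.1)
msg_is_text <- is.character(msg)                     # TRUE
safe_compare <- isTRUE(all.equal(1, 1.1))            # FALSE

# The tolerance can be loosened when a coarser comparison is wanted.
loose_compare <- isTRUE(all.equal(1, 1.001, tolerance = 0.01))   # TRUE
```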

Sequential `&&` and `||`

To evaluate x && y, R first evaluates x. If x is FALSE then R returns FALSE without evaluating y.

To evaluate x || y, R first evaluates x. If x is TRUE then R returns TRUE without evaluating y.

Sequential evaluation of x and y is useful when y is not always well defined.

x <- 0
x == 0 || sin(1/x) == 0

## [1] TRUE

Note that && and || only work on scalars, whereas & and | work on vectors elementwise.
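The short-circuit behaviour can be made visible by putting a call to stop() on the right-hand side: if the right-hand side were evaluated, the script would fail. This is a sketch for illustration; the guard at the end uses a made-up variable.

```r
# The right-hand side is never evaluated, so stop() is never called.
and_result <- FALSE && stop("this is never evaluated")   # FALSE
or_result  <- TRUE  || stop("this is never evaluated")   # TRUE

# The elementwise operators work on whole vectors.
and_vec <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)    # TRUE FALSE FALSE
or_vec  <- c(TRUE, FALSE, TRUE) | c(FALSE, FALSE, TRUE)   # TRUE FALSE TRUE

# Typical use: guard a test that is only meaningful in some cases.
value <- NULL
label <- if (!is.null(value) && value > 0) "positive" else "not positive or missing"
```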

subsetting and assignment

Subsetting operators can be combined with the assignment operator.

x <- 1:5
x[c(1, 2)] <- 2:3
x

## [1] 2 3 3 4 5

x[-1] <- 4:1
x

## [1] 2 4 3 2 1

x[x %% 2 == 0] <- 0
x

## [1] 0 0 3 0 1

x[which(x == max(x))] <- 100
x

## [1]   0   0 100   0   1
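A common practical use of this pattern is replacing missing or out-of-range values. The following sketch uses a made-up vector of scores:

```r
scores <- c(70, NA, 85, 40, NA)

# Replace missing values with 0.
scores[is.na(scores)] <- 0
after_na <- scores            # 70 0 85 40 0

# Raise every value below 50 up to 50.
scores[scores < 50] <- 50
after_floor <- scores         # 70 50 85 50 50
```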

Names

You can name a vector:

(x <- c(a = 1, b = 2, c = 3))

## a b c 
## 1 2 3

The names function returns a character vector:

names(x)

## [1] "a" "b" "c"

y <- 1:3
names(y) <- c("first", "second", "third")
y

##  first second  third 
##      1      2      3

and select elements by names:

y["first"]

## first 
##     1

y[c("first", "third")]

## first third 
##     1     3
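Because names can be repeated in the index, a named vector can serve as a simple lookup table. The sketch below uses made-up grade data to illustrate the idea:

```r
points <- c(A = 4, B = 3, C = 2)
grades <- c("B", "A", "B", "C")

looked_up <- points[grades]       # named vector with values 3 4 3 2
plain     <- unname(looked_up)    # 3 4 3 2 without the names

has_a <- "A" %in% names(points)   # TRUE
```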

Matrix

A matrix is created from a vector using the function matrix:

matrix(data, nrow = 1, ncol = 1, byrow = FALSE)
- data : vector of length at most nrow*ncol
  - if length of vector < nrow*ncol, then data is reused as many times as is needed
- nrow : number of rows
- ncol : number of columns
- byrow = TRUE : fill the matrix up row-by-row
- byrow = FALSE : fill the matrix up column-by-column, default
diag(x) : create diagonal matrix
rbind(...) : join matrices with rows of the same length
cbind(...) : join matrices with columns of the same length
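The sketch below illustrates the points in this list that the later examples do not cover: reuse of data when it is shorter than nrow*ncol, and building matrices from vectors with rbind and cbind.

```r
# Data shorter than nrow * ncol is reused to fill the matrix.
recycled <- matrix(1:2, nrow = 2, ncol = 3)   # every column is 1, 2
zeros    <- matrix(0, nrow = 2, ncol = 2)     # a 2 x 2 matrix of zeros

# rbind() stacks vectors as rows, cbind() places them as columns.
by_rows <- rbind(1:3, 4:6)   # 2 rows, 3 columns
by_cols <- cbind(1:3, 4:6)   # 3 rows, 2 columns

dim(by_rows)   # 2 3
dim(by_cols)   # 3 2
```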

Example:

(A <- matrix(1:6, nrow = 2, ncol = 3))

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

(A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE))

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

A[1, 3] <- 0
A

##      [,1] [,2] [,3]
## [1,]    1    2    0
## [2,]    4    5    6

A[, 2:3]

##      [,1] [,2]
## [1,]    2    0
## [2,]    5    6

(B <- diag(c(1, 2, 3)))

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    2    0
## [3,]    0    0    3

Matrix operation

The usual algebraic operations, including *, act elementwise.
To perform matrix multiplication, we use %*%.
nrow(x), ncol(x) : number of rows and columns of x
det(x) : determinant of x
t(x) : transpose of x
solve(A, B) : returns x such that A %*% x = B
If A is invertible, then solve(A) is the inverse of A.

A <- matrix(c(3, 5, 2, 3), nrow=2, ncol=2)
B <- matrix(c(1, 1, 0, 1), nrow=2, ncol=2)

A %*% B

##      [,1] [,2]
## [1,]    5    2
## [2,]    8    3

A.inv <- solve(A)
A %*% A.inv   # we observe rounding error

##      [,1]                   [,2]
## [1,]    1 -8.881784197001252e-16
## [2,]    0  1.000000000000000e+00

A^(-1)   #This is not an inverse. ^(-1) applies elementwise.

##                    [,1]               [,2]
## [1,] 0.3333333333333333 0.5000000000000000
## [2,] 0.2000000000000000 0.3333333333333333
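The two-argument form of solve() is the usual way to solve a linear system without forming the inverse. A sketch using the same matrix A and a right-hand side b chosen here for illustration:

```r
A <- matrix(c(3, 5, 2, 3), nrow = 2, ncol = 2)
b <- c(1, 2)

# Solve A %*% x = b, that is 3*x1 + 2*x2 = 1 and 5*x1 + 3*x2 = 2.
x <- solve(A, b)
x                         # approximately 1 and -1

check <- as.vector(A %*% x)   # reproduces b up to rounding error
d <- det(A)                   # approximately -1, so A is invertible
```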

cbind() : Combine R objects by columns

rbind() : Combine R objects by rows

(C <- cbind(A,B))

##      [,1] [,2] [,3] [,4]
## [1,]    3    2    1    0
## [2,]    5    3    1    1

(D <- rbind(A,B))

##      [,1] [,2]
## [1,]    3    2
## [2,]    5    3
## [3,]    1    0
## [4,]    1    1

colnames(C) <- c("a", "b", "c", "d")
C

##      a b c d
## [1,] 3 2 1 0
## [2,] 5 3 1 1

rownames(C) <- c("first", "second")
C

##        a b c d
## first  3 2 1 0
## second 5 3 1 1

# number of rows
nrow(C)

## [1] 2

# number of columns
ncol(C)

## [1] 4

Subsetting of matrix

select columns:

(partC1 <- C[, c(1, 2)])

##        a b
## first  3 2
## second 5 3

select columns by column names:

(partC1 <- C[, c("a", "b")])

##        a b
## first  3 2
## second 5 3

partC1 is a matrix (in R 4.0.0 and later, class() of a matrix returns "matrix" "array"):

class(partC1)

## [1] "matrix"

When you select one column, it becomes a vector, not a matrix

this_is_vector_now <- C[, "a"]
class(this_is_vector_now)

## [1] "numeric"

When you want to preserve matrix form:

still_matrix <- C[, "a", drop = FALSE]
class(still_matrix)

## [1] "matrix"

select rows:

(partC2 <- C[1,])

## a b c d 
## 3 2 1 0

Or, in a combined way:

(partC2 <- C[1, c(2, 3)])

## b c 
## 2 1

class(partC2)

## [1] "numeric"

When we use a single index, R counts the index columnwise:

C[1]

## [1] 3

C[2]

## [1] 5

C[3]

## [1] 2

C[4]

## [1] 3
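The relationship between a single index and a (row, column) pair can be written out explicitly. The sketch below uses a separate small matrix M, made up for this example, and the base functions arrayInd() and which() with arr.ind = TRUE.

```r
M <- matrix(11:16, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]   11   13   15
# [2,]   12   14   16

row <- 2
col <- 2
linear <- (col - 1) * nrow(M) + row   # 4: columns are counted first
same <- M[linear] == M[row, col]      # TRUE, both are 14

# Going the other way: from a single index to row and column.
position <- arrayInd(4, dim(M))       # row 2, column 2

# which() can report matrix positions directly.
large <- which(M > 14, arr.ind = TRUE)   # positions of 15 and 16
```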

is.vector(partC2)

## [1] TRUE

is.matrix(partC2)

## [1] FALSE

(Cm <- as.matrix(partC2))

##   [,1]
## b    2
## c    1

class(Cm)

## [1] "matrix"

(Cv <- as.vector(Cm))

## [1] 2 1

class(Cv)

## [1] "numeric"

Objects and classes

R is an object oriented language.

Every object in R is a member of a class.

You can use the class function to determine the class of object.

# numeric vector
class(c(1, 2, 3))

## [1] "numeric"

# character vector
class(c("c", "B", "z"))

## [1] "character"

# function
class(sin)

## [1] "function"

class(matrix(c(1,2)))

## [1] "matrix"
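Since the exact value returned by class() can contain more than one entry, tests on the class of an object are often written with inherits() or the is.*() functions instead of comparing strings. A minimal sketch:

```r
m <- matrix(1:4, nrow = 2)

storage   <- typeof(m)              # "integer": how the data is stored
is_mat    <- inherits(m, "matrix")  # TRUE
is_mat2   <- is.matrix(m)           # TRUE
is_num    <- is.numeric(m)          # TRUE

# class() of simple values
class_int <- class(1L)     # "integer"
class_dbl <- class(1)      # "numeric"
class_lgl <- class(TRUE)   # "logical"
```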

Workspace

The objects that you create using R remain in existence until you explicitly delete them.

rm(x) : remove object x
rm(list=ls()) : remove all objects
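The sketch below shows ls(), exists() and rm() working together on two throwaway objects. The object names are made up for this example and chosen so that they are unlikely to clash with anything in your own workspace.

```r
demo_value_1 <- 1
demo_value_2 <- 2

before <- exists("demo_value_1")        # TRUE
rm(demo_value_1)                        # remove a single object by name
after  <- exists("demo_value_1")        # FALSE

listed <- "demo_value_2" %in% ls()      # TRUE: ls() lists existing objects
rm(list = c("demo_value_2"))            # remove objects given as a character vector
gone   <- !exists("demo_value_2")       # TRUE
```

Unlike rm(list = ls()), this removes only the named objects and leaves the rest of the workspace alone.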

Working directory

When you run R, it uses one of the directories on your hard drive as a working directory, where it looks for user-written programs and data files.

Check the working directory.

getwd()

Change the working directory to “dir”

setwd("dir")

/ separates directories and file names in a path, . refers to the current directory, and .. refers to the parent directory.
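A sketch of a typical pattern: remember the current working directory, change it temporarily, and restore it afterwards. It uses tempdir() so that it runs on any machine; the path "data/input.csv" is a hypothetical example, not a file that needs to exist.

```r
old_dir <- getwd()           # remember where we started

setwd(tempdir())             # move to R's temporary directory
inside_temp <- getwd()

# file.path() joins path components with "/".
example_path <- file.path("data", "input.csv")   # "data/input.csv"

setwd("..")                  # move up to the parent directory
setwd(old_dir)               # restore the original working directory
```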

Writing script

Although we can type and evaluate any R expression at the prompt, it is much more convenient to write scripts,

which are simply collections of R expressions.
We use the terms program and code synonymously with script.

You can use the built-in editor in Rgui or RStudio.

Help

To find out more about an R command or function x, you can type help(x) or just ?x.

If you cannot remember the exact name, use help.search("x").

HTML help command : help.start()
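Two more base functions can help when only part of a name is known. This is a small sketch added for illustration: apropos() lists objects whose names match a pattern, and args() shows the argument list of a function.

```r
# All visible objects whose names start with "seq".
matches <- apropos("^seq")
matches

# The argument list of a function, without opening the help page.
args(seq_len)
```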

package

R provides various useful packages to help you.

https://cran.r-project.org/web/packages/

To install a package:

install.packages("packagename")

To access the package:

library("packagename")
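A common pattern is to install a package only when it is missing. The sketch below uses the stats package, which ships with R, purely so that the example runs anywhere without downloading anything; in practice you would substitute the name of the package you need.

```r
pkg <- "stats"   # stand-in package name; ships with R

# Install only if the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package by name held in a variable.
library(pkg, character.only = TRUE)

# A function can also be called without attaching, using package::function.
med <- stats::median(c(3, 1, 2))   # 2
```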

Or use package menu.

R documentation : https://www.rdocumentation.org/
