  • R Programming (1) - R Intro
    R Programming 2020. 3. 25. 09:06

     

    R as a calculating environment

    R can be used as a powerful calculator. For arithmetic calculations:

    (1 + 1 / 100) ^ 100
    ## [1] 2.704814
    17 %% 5
    ## [1] 2
    17 %/% 5
    ## [1] 3

    The [1] indicates that the value that follows is element 1 of the output vector.

    operator  meaning
    +         addition
    -         subtraction
    *         multiplication
    /         division
    ^         exponentiation (power)
    %%        modulus (remainder)
    %/%       integer division
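    The two integer operators are tied together by the identity a = (a %/% b) * b + a %% b, where %/% rounds the quotient down. A short check on illustrative vectors a and b, including negative operands, shows that the result of %% takes the sign of the divisor:

    ```r
    a <- c(17, -17, 17)
    b <- c(5, 5, -5)
    q <- a %/% b   # quotient rounded down:  3 -4 -4
    r <- a %% b    # matching remainder:     2  3 -3
    # the defining identity holds for every pair
    all(q * b + r == a)   # TRUE
    ```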

    Built-in functions in R

    R has a number of built-in functions, for example:
    sin(x), cos(x), tan(x), exp(x), log(x), sqrt(x), floor(x), ceiling(x), round(x)

    exp(1)
    ## [1] 2.718282
    options(digits = 16)   # print up to 16 significant digits from now on
    exp(1)
    ## [1] 2.718281828459045
    pi
    ## [1] 3.141592653589793
    sin(pi/6)
    ## [1] 0.4999999999999999
    floor(2.3)
    ## [1] 2
    ceiling(2.3)
    ## [1] 3
    round(2.3)
    ## [1] 2
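    floor() and ceiling() round toward minus and plus infinity, which matters for negative numbers, while trunc() simply drops the fractional part. round() rounds a 5 to the even digit (the IEC 60559 convention described in R's documentation), so round(2.5) is 2. A small comparison on illustrative values:

    ```r
    x <- c(-2.7, -2.3, 2.3, 2.7)
    floor(x)     # -3 -3  2  2
    ceiling(x)   # -2 -2  3  3
    trunc(x)     # -2 -2  2  2
    round(x)     # -3 -2  2  3
    # halves go to the nearest even integer
    round(c(0.5, 1.5, 2.5, 3.5))   # 0 2 2 4
    ```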

    Variable

    • We can assign a value to a variable and then use the variable.

    • For assignment, we use the operator <-
      • This is usually pronounced “gets”.
    • Variable names are made up of letters, numbers, . and _
      • provided the name starts with a letter, or with a . that is not followed by a number.
      • Names are case sensitive: x and X are different variables.
      • for example,
        • x, y, my_variable, a1, a2, .important_variable, x.input
      • invalid names, for example:
        • 1x, _x, .2x, my variable
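    The base function make.names() turns arbitrary strings into syntactically valid names, which makes the naming rules easy to check. A name that breaks the rules can still be used when wrapped in backticks, although this is rarely a good idea. The strings below are illustrative:

    ```r
    # invalid: starts with a digit, starts with "_", contains a space, reserved word
    make.names(c("1x", "_x", "my variable", "if"))
    # "X1x" "X_x" "my.variable" "if."

    # valid names are returned unchanged
    make.names(c("x", "my_variable", ".important_variable", "x.input"))

    # backticks allow a non-syntactic name
    `my variable` <- 5
    `my variable` * 2   # 10
    ```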

    To display the value of a variable x, type x, print(x), or show(x):

    x <- 100
    x
    ## [1] 100
    print(x)
    ## [1] 100
    show(x)
    ## [1] 100

    Wrapping an assignment in parentheses assigns the value and also prints it:

    (y <- (1 + 1 / x) ^ x)
    ## [1] 2.704813829421528

    When assigning, the right-hand side is evaluated first, then that value is placed in the variable on the left-hand side.

    n <- 1
    n <- n + 1
    n
    ## [1] 2

    R also allows = for assignment, as most programming languages do, although <- is the conventional choice in R code.

    Rightward assignment with -> also works:

    3 -> three
    three
    ## [1] 3

    Assignments can be chained:

    v <- w <- z <- 1
    v
    ## [1] 1
    w
    ## [1] 1
    z
    ## [1] 1

    Functions

    A function takes arguments (inputs) and returns a value (output):

    seq(from = 1, to = 9, by = 2)
    ## [1] 1 3 5 7 9
    seq(from = 1, to = 9)
    ## [1] 1 2 3 4 5 6 7 8 9

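    You can access the built-in help with help(seq) or ?seq.

    Every function has a fixed order for its arguments.
    Arguments given in this order do not need to be named, and named arguments can be given in any order. In the last example below, by is matched by name, and the remaining unnamed values 9 and 1 are matched in order to from and to.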

    seq(1, 9, 2)
    ## [1] 1 3 5 7 9
    seq(to = 9, from = 1)
    ## [1] 1 2 3 4 5 6 7 8 9
    seq(by=-2, 9, 1)
    ## [1] 9 7 5 3 1

    We will study more about functions in Chapter 4.

    Vectors

    • the basic data structure in R
      • also called an atomic vector
    • an indexed set of values
    • the i-th element of a vector x is denoted by x[i]
    • all elements of an atomic vector must have the same type
      • logical, integer, double (often called numeric), or character
    • three basic functions for constructing vectors
      • seq, rep, c
    • three basic properties:
      • type, typeof(): what it is
      • length, length(): how many elements it contains
      • attributes, attributes(): additional arbitrary metadata
    # short for sequence
    (x <- seq(1, 20, by = 2))
    ##  [1]  1  3  5  7  9 11 13 15 17 19
    (x2 <- seq(1.1, 2, by = 0.1))
    ##  [1] 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
    # short for repeat
    (y <- rep(3, 4))
    ## [1] 3 3 3 3
    # short for combine
    (z <- c(y, x))
    ##  [1]  3  3  3  3  1  3  5  7  9 11 13 15 17 19
    • the colon operator a:b, another way to create a sequence (step 1 or -1)
    (x <- 100:110)
    ##  [1] 100 101 102 103 104 105 106 107 108 109 110
    (y <- 110:100)
    ##  [1] 110 109 108 107 106 105 104 103 102 101 100
    length(x)
    ## [1] 11
    • letters, LETTERS
    # the 26 lower-case letters of the Roman alphabet
    letters
    ##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
    ## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
    # the 26 upper-case letters of the Roman alphabet
    LETTERS
    ##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
    ## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
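    • seq_along()

    seq_along(x) gives the same result as 1:length(x), except when x is empty: then seq_along(x) returns integer(0), while 1:length(x) returns c(1, 0).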

    x <- letters[1:5]  # a b c d e
    y <- seq_along(x)
    print(y)
    ## [1] 1 2 3 4 5
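    seq_along() and 1:length(x) differ only for an empty vector, and that is exactly the case where a loop goes wrong: 1:0 counts down to c(1, 0). A minimal check with an illustrative empty character vector:

    ```r
    x <- character(0)   # an empty vector
    1:length(x)         # 1 0: counts down, not what a loop wants
    seq_along(x)        # integer(0)

    # a loop over seq_along(x) simply runs zero times
    n_iter <- 0
    for (i in seq_along(x)) n_iter <- n_iter + 1
    n_iter              # 0
    ```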
    • vectors and indexing
    (x <- 100:110)
    ##  [1] 100 101 102 103 104 105 106 107 108 109 110
    # second element
    x[2]
    ## [1] 101
    # last element
    x[length(x)]
    ## [1] 110
    (x <- 100:110)
    ##  [1] 100 101 102 103 104 105 106 107 108 109 110
    i <- c(1, 3, 2)
    
    x[i]
    ## [1] 100 102 101
    x[1:5]
    ## [1] 100 101 102 103 104

    Negative indices exclude elements:

    j <- c(-1, -2, -3)
    x[j]
    ## [1] 103 104 105 106 107 108 109 110

    However, positive and negative indices cannot be mixed; the following code is an error:

    x[c(-1, 2)]
    ## Error in x[c(-1, 2)] : only 0's may be mixed with negative subscripts

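    Empty vector: c() with no arguments returns NULL, which behaves like a vector of length 0. Typed empty vectors can be created with numeric(0), character(0), and so on.

    x <- c()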
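    c() with no arguments returns NULL, a length-0 object that c() can later extend. A short sketch of starting from it and growing a vector element by element (the loop and values are illustrative):

    ```r
    x <- c()
    is.null(x)     # TRUE
    length(x)      # 0
    # typed empty vectors
    numeric(0)
    character(0)
    # grow a vector one element at a time with c()
    for (k in 1:3) x <- c(x, k^2)
    x              # 1 4 9
    ```

    Growing a vector this way is fine for small vectors; for long ones, creating the vector at full length first (for example numeric(n)) and then filling it is faster.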

    Arithmetic operations act elementwise:

    x <- c(1, 2, 3)
    y <- c(4, 5, 6)
    x * y
    ## [1]  4 10 18
    y ^ x
    ## [1]   4  25 216

    When vectors have unequal lengths, the shorter one is recycled:

    c(1, 2, 3, 4) + c(1, 2)
    ## [1] 2 4 4 6
    2 * c(1, 2, 3)
    ## [1] 2 4 6
    (1:3)^2
    ## [1] 1 4 9

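    If the longer length is not a multiple of the shorter one, R still recycles but gives a warning:

    c(1, 2, 3) + c(1, 2)
    ## [1] 2 4 4
    ## Warning message:
    ## In c(1, 2, 3) + c(1, 2) :
    ##   longer object length is not a multiple of shorter object length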

    Many functions take whole vectors as input:

    sqrt(1:3)
    ## [1] 1.000000000000000 1.414213562373095 1.732050807568877
    mean(1:6)
    ## [1] 3.5
    sort(c(5, 1, 3, 4, 2))
    ## [1] 1 2 3 4 5

    logical vector

    logi <- c(TRUE, FALSE, T, F)
    typeof(logi)
    ## [1] "logical"

    character vector

    char <- c("a", "vector", "of", "characters")
    typeof(char)
    ## [1] "character"

    integer vector

    integ <- c(1L, 2L, 3L, 4L)
    typeof(integ)
    ## [1] "integer"

    double vector

    doub <- c(1, 2, 3, 4)
    typeof(doub)
    ## [1] "double"

    typeof()

    typeof() returns the (internal) type of an object:

    • logical - logical vector
    • double - double vector
    • integer - integer vector
    • character - character vector
    • list - list (see Ch.5)
    • builtin - built-in function implemented in C, such as sin
    • closure - function written in R, including user-defined functions
    • and so on.

    Roughly speaking, R objects are implemented in C, and these internal types correspond to the C-level data types.

    Coercion

    If we combine elements of different types in a vector, they are coerced to the most flexible type. From least to most flexible:

    • logical, integer, double, character
    c("a", 1)
    ## [1] "a" "1"
    c(TRUE, FALSE, 0)
    ## [1] 1 0 0
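    Each step of the ordering can be checked with typeof(), and the as.*() functions perform the same conversions explicitly. A short sketch on illustrative values:

    ```r
    typeof(c(TRUE, 1L))    # "integer"
    typeof(c(1L, 2.5))     # "double"
    typeof(c(2.5, "a"))    # "character"
    # explicit conversion
    as.numeric(c("1", "2.5"))      # 1 2.5
    as.integer(c(TRUE, FALSE))     # 1 0
    # a string that is not a number becomes NA
    # (without suppressWarnings(), R warns "NAs introduced by coercion")
    suppressWarnings(as.numeric("abc"))   # NA
    # logicals are coerced to numbers in arithmetic
    sum(c(TRUE, FALSE, TRUE))      # 2
    ```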

    Example : mean and variance

    Compute the mean and variance directly and compare them with the built-in functions mean() and var():

    x <- c(1.2, 0.9, 0.8, 1, 1.2)
    
    x.mean <- sum(x)/length(x)
    
    x.mean - mean(x) 
    ## [1] 0
    x.var <- sum((x-x.mean)^2)/(length(x)-1)
    x.var - var(x)
    ## [1] 0
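    The two computed quantities are the sample mean and the sample variance, whose denominator is n - 1 rather than n, which is also what var() uses:

    ```latex
    \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
    \qquad
    s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
    ```

    For the data above, n = 5, the mean is 5.1 / 5 = 1.02, the squared deviations sum to 0.128, and so s^2 = 0.128 / 4 = 0.032.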

    Example : simple numerical integration

    • The basic problem in numerical integration is to compute an approximate value of a definite integral. Here the integral of cos(t) from 0 to pi/6 is approximated by a sum of rectangles of width dt:
    dt <- 0.005
    t <- seq(0, pi/6, by = dt)
    ft <- cos(t)
    (I <- sum(ft) * dt)
    ## [1] 0.5015486506255458

    Since cos() is vectorised, ft is a vector of the same length as t, and sum(ft) * dt adds up the areas of the rectangles.

    I - sin(pi/6)
    ## [1] 0.001548650625545822

    The exact value is sin(pi/6) = 0.5, so the numerical approximation overestimates it by about 0.0015.
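    The rectangle sum is the simplest rule, and because seq(0, pi/6, by = dt) stops at the last grid point not exceeding pi/6, its grid does not end exactly at the upper limit. A sketch of the trapezoidal rule on a grid that includes both endpoints, compared with integrate() from the stats package that R loads by default; trapezoid() is a hypothetical helper written for this illustration:

    ```r
    # trapezoidal rule on n equal subintervals of [a, b] (hypothetical helper)
    trapezoid <- function(f, a, b, n = 100) {
      t <- seq(a, b, length.out = n + 1)   # grid includes both endpoints
      h <- (b - a) / n
      y <- f(t)
      h * (sum(y) - (y[1] + y[n + 1]) / 2)
    }

    I_trap <- trapezoid(cos, 0, pi / 6, n = 100)
    I_trap - sin(pi / 6)    # roughly -1.1e-06

    # adaptive quadrature from the stats package
    integrate(cos, lower = 0, upper = pi / 6)$value - sin(pi / 6)
    ```

    With about the same number of function evaluations as the rectangle sum, the trapezoidal error is several hundred times smaller.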

    Example : exponential limit

    As x grows, (1 + 1/x)^x approaches e = exp(1), so the differences below shrink toward 0.

    x <- seq(10, 200, by = 10)
    
    y <- (1 + 1/x)^x
    
    exp(1) - y
    ##  [1] 0.124539368359042779 0.064984123314622888 0.043963052588742446
    ##  [4] 0.033217990069081882 0.026693799385437256 0.022311689128828860
    ##  [7] 0.019165457482859694 0.016796887705717634 0.014949367400859170
    ## [10] 0.013467999037516609 0.012253746954290712 0.011240337596801542
    ## [13] 0.010381746740967479 0.009645014537900565 0.009005917124194074
    ## [16] 0.008446252151229849 0.007952077235180433 0.007512532619638357
    ## [19] 0.007119033847887923 0.006764705529727966
    plot(x, y)
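    How fast does the gap close? A standard expansion gives (1 + 1/x)^x = e (1 - 1/(2x) + O(1/x^2)), so e - (1 + 1/x)^x should behave like e / (2x) for large x. Rescaling the differences by 2x / e makes this visible, since the ratios approach 1:

    ```r
    x <- seq(10, 200, by = 10)
    y <- (1 + 1/x)^x
    # ratio of the actual gap to the leading-order prediction e / (2x)
    ratio <- (exp(1) - y) * 2 * x / exp(1)
    round(ratio[c(1, 10, 20)], 3)   # about 0.916 0.991 0.995
    ```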

    Missing data

    In R, missing data is represented by NA.

    a <- NA      # assign NA to variable a
    is.na(a)     # is it missing?
    ## [1] TRUE
    a <- c(11, NA, 13)
    is.na(a)
    ## [1] FALSE  TRUE FALSE
    mean(a)
    ## [1] NA
    mean(a, na.rm = TRUE) #NAs can be removed
    ## [1] 12
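    Comparing with NA also gives NA, so a test such as a == NA never finds the missing values; is.na() is the right tool. A short sketch reusing the vector a from above:

    ```r
    a <- c(11, NA, 13)
    a == NA         # NA NA NA: comparing with NA gives NA, never TRUE
    is.na(a)        # FALSE TRUE FALSE: the correct test
    sum(is.na(a))   # 1: number of missing values
    a[!is.na(a)]    # 11 13: drop the missing values
    ```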

    NA is of type logical:

    typeof(NA)
    ## [1] "logical"

    and can be coerced to double, integer, or character:

    x <- c(NA, 10)
    typeof(x[1])
    ## [1] "double"
    y <- c(NA, "abc")
    typeof(y[1])
    ## [1] "character"

    Inf and -Inf

    If a result is too large to be represented as a double, R returns Inf (or -Inf for a large negative result).

    2 ^ 1024
    ## [1] Inf
    - 2 ^ 1024
    ## [1] -Inf
    1 / 0
    ## [1] Inf
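    The largest finite double is stored in .Machine$double.xmax, and is.finite() and is.infinite() test for overflow. A quick sketch on illustrative values:

    ```r
    .Machine$double.xmax       # about 1.8e+308, the largest finite double
    .Machine$double.xmax * 2   # Inf
    is.finite(c(1, Inf, -Inf, NA))     # TRUE FALSE FALSE FALSE
    is.infinite(c(1, Inf, -Inf, NA))   # FALSE TRUE TRUE FALSE
    1 / Inf                    # 0
    ```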

    NaN

    If a computation has no well-defined result, R returns NaN (not a number).

    Inf - Inf
    ## [1] NaN
    0 / 0
    ## [1] NaN
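    NaN also counts as missing: is.na() returns TRUE for both NA and NaN, while is.nan() is TRUE only for NaN. A quick check on an illustrative vector v:

    ```r
    v <- c(1, NA, NaN)
    is.na(v)    # FALSE TRUE TRUE
    is.nan(v)   # FALSE FALSE TRUE
    # without suppressWarnings(), R also warns "NaNs produced"
    suppressWarnings(sqrt(-1))   # NaN
    ```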

    Expression and assignment

    An expression is a piece of code that R can evaluate to produce a value.

    seq(10, 20, by=3)
    ## [1] 10 13 16 19
    4
    ## [1] 4
    mean(c(1,2,3))
    ## [1] 2
    1 > 2
    ## [1] FALSE

    If the value of an expression is saved in a variable with <-, the statement is called an assignment. An assignment does not print its value.

    x1 <- seq(10, 20, by=3)
    x2 <- 1>2

    A logical expression is formed using

    • the comparison operators
      • <, >, <=, >=, ==, and != (not equal to)
    • and the logical operators
      • & (and), | (or), and ! (not).

    The value of a logical expression is either TRUE or FALSE.

    • In logical operations, 0 is treated as FALSE and any nonzero number as TRUE.
    c(0, 0, 1, 1) | c(0, 1, 0, 1)
    ## [1] FALSE  TRUE  TRUE  TRUE
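    The conversion also works the other way: in arithmetic TRUE counts as 1 and FALSE as 0, so sum() and mean() of a logical vector give the number and proportion of TRUE values, while any() and all() reduce a logical vector to a single value. A sketch on an illustrative vector:

    ```r
    x <- c(3, 8, 1, 9, 4)
    big <- x > 3
    big          # FALSE TRUE FALSE TRUE TRUE
    sum(big)     # 3 values exceed 3
    mean(big)    # 0.6 of the values exceed 3
    any(x > 8)   # TRUE
    all(x > 0)   # TRUE
    ```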

    x[subset]

    R’s subsetting operators are powerful and fast, but they take some practice to master.

    We can extract a subvector by indexing with a logical (TRUE/FALSE) vector:

    x <- 1:10
    x %% 4 == 0
    ##  [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
    (y <- x[x %% 4 == 0])
    ## [1] 4 8

    R also provides the subset() function, which drops elements for which the condition is NA, whereas x[condition] keeps them as NA.
    x <- c(1, NA, 3, 4)
    x[x > 2]
    ## [1] NA  3  4
    subset(x, subset = x>2 )
    ## [1] 3 4

    To get the index positions of the TRUE elements, use which():

    x <- c(1, 1, 2, 3, 5, 8, 13)
    which(x %% 2 == 0)
    ## [1] 3 6
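    which() returns only the positions where the condition is TRUE, skipping NA, so it offers one way to reproduce the behaviour of subset() with ordinary indexing. Excluding the missing values explicitly is another. A sketch reusing the vector with an NA from the subset() example:

    ```r
    x <- c(1, NA, 3, 4)
    x > 2                  # FALSE NA TRUE TRUE
    x[x > 2]               # NA 3 4: the NA condition keeps an NA
    x[which(x > 2)]        # 3 4
    x[!is.na(x) & x > 2]   # 3 4: FALSE & NA is FALSE
    ```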

    Explore more about subsetting:

    x <- c(2.1, 4.2, 3.3, 5.4)
    x[c(TRUE,FALSE,TRUE,FALSE)]
    ## [1] 2.1 3.3
    x[x > 3]
    ## [1] 4.2 3.3 5.4
    x[order(x)]
    ## [1] 2.1 3.3 4.2 5.4

    Example : rounding error

    Most real numbers cannot be represented exactly as floating-point numbers, so computations with them are subject to rounding error.

    2 * 2 == 4
    ## [1] TRUE
    sqrt(2) * sqrt(2) == 2
    ## [1] FALSE

    The solution is to use all.equal(x, y), which returns TRUE if x and y differ by less than a small tolerance (by default sqrt(.Machine$double.eps), about 1.5e-8).

    all.equal(sqrt(2) * sqrt(2), 2)
    ## [1] TRUE
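    When the values differ, all.equal() does not return FALSE but a character string describing the difference, so inside if() it should be wrapped in isTRUE(). An explicit tolerance check is an alternative; the tolerance 1e-8 below is an arbitrary choice for illustration:

    ```r
    a <- sqrt(2) * sqrt(2)
    all.equal(a, 2)              # TRUE
    all.equal(1, 1.1)            # a string such as "Mean relative difference: 0.1"
    isTRUE(all.equal(1, 1.1))    # FALSE: safe to use in if()
    abs(a - 2) < 1e-8            # TRUE: explicit absolute tolerance
    if (isTRUE(all.equal(a, 2))) print("equal up to rounding")
    ```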

    Sequential && and ||

    To evaluate x && y, R first evaluates x. If x is FALSE, R returns FALSE without evaluating y.

    To evaluate x || y, R first evaluates x. If x is TRUE, R returns TRUE without evaluating y.

    This sequential (short-circuit) evaluation is useful when y is not always well defined. Below, with x equal to 0, sin(1/x) would be NaN with a warning, but it is never evaluated:

    x <- 0
    x == 0 || sin(1/x) == 0
    ## [1] TRUE

    Note that && and || are meant for single TRUE/FALSE values, whereas & and | work on vectors elementwise. Since R 4.3.0, giving && or || a vector of length greater than one is an error.
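    A direct way to see short-circuiting is to put stop(), which always raises an error, on the right-hand side: if the call succeeds, the right side was never evaluated. A sketch with an illustrative empty vector v as a typical guard:

    ```r
    FALSE && stop("never evaluated")   # FALSE
    TRUE || stop("never evaluated")    # TRUE
    # a typical guard: check the length before looking at the first element
    v <- numeric(0)
    length(v) > 0 && v[1] > 0          # FALSE, v[1] is never examined
    # & and | work elementwise and evaluate both sides
    c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
    ```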

    subsetting and assignment

    Subsetting operators can be combined with assignment operator.

    x <- 1:5
    x[c(1, 2)] <- 2:3
    x
    ## [1] 2 3 3 4 5
    x[-1] <- 4:1
    x
    ## [1] 2 4 3 2 1
    x[x %% 2 == 0] <- 0
    x
    ## [1] 0 0 3 0 1
    x[which(x == max(x))] <- 100
    x
    ## [1]   0   0 100   0   1

    Names

    You can name a vector:

    (x <- c(a = 1, b = 2, c = 3))
    ## a b c 
    ## 1 2 3

    The names() function returns a character vector:

    names(x)
    ## [1] "a" "b" "c"
    y <- 1:3
    names(y) <- c("first", "second", "third")
    y
    ##  first second  third 
    ##      1      2      3

    and select elements by names:

    y["first"]
    ## first 
    ##     1
    y[c("first", "third")]
    ## first third 
    ##     1     3

    Matrix

    A matrix is created from a vector with the function matrix():

    • matrix(data, nrow = 1, ncol = 1, byrow = FALSE)
      • data : a vector of length at most nrow * ncol
        • if the length of data is less than nrow * ncol, data is recycled as many times as needed
      • nrow : number of rows
      • ncol : number of columns
      • byrow = TRUE : fill the matrix row by row
      • byrow = FALSE : fill the matrix column by column (the default)
    • diag(x) : create a diagonal matrix

    • rbind(...) : join matrices by rows (they must have the same number of columns)

    • cbind(...) : join matrices by columns (they must have the same number of rows)

    Example:

    (A <- matrix(1:6, nrow = 2, ncol = 3))
    ##      [,1] [,2] [,3]
    ## [1,]    1    3    5
    ## [2,]    2    4    6
    (A <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE))
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    ## [2,]    4    5    6
    A[1, 3] <- 0
    A
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    0
    ## [2,]    4    5    6
    A[, 2:3]
    ##      [,1] [,2]
    ## [1,]    2    0
    ## [2,]    5    6
    (B <- diag(c(1, 2, 3)))
    ##      [,1] [,2] [,3]
    ## [1,]    1    0    0
    ## [2,]    0    2    0
    ## [3,]    0    0    3

    Matrix operation

    • The usual arithmetic operators, including *, act elementwise.

    • For matrix multiplication, use %*%.

    • nrow(x), ncol(x) : number of rows and columns of x

    • det(x) : determinant of x

    • t(x) : transpose of x

    • solve(A, B) : returns x such that A %*% x = B

    • If A is invertible, then solve(A) returns the inverse of A.

    A <- matrix(c(3, 5, 2, 3), nrow=2, ncol=2)
    B <- matrix(c(1, 1, 0, 1), nrow=2, ncol=2)
    A %*% B
    ##      [,1] [,2]
    ## [1,]    5    2
    ## [2,]    8    3
    A.inv <- solve(A)
    A %*% A.inv   # we observe rounding error
    ##      [,1]                   [,2]
    ## [1,]    1 -8.881784197001252e-16
    ## [2,]    0  1.000000000000000e+00
    A^(-1)   # not the inverse: ^(-1) applies elementwise
    ##                    [,1]               [,2]
    ## [1,] 0.3333333333333333 0.5000000000000000
    ## [2,] 0.2000000000000000 0.3333333333333333
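    For a linear system A x = b it is generally better to call solve(A, b) directly than to compute solve(A) and multiply, because the inverse never has to be formed. A small check with the same A and an illustrative right-hand side b, together with det() and t():

    ```r
    A <- matrix(c(3, 5, 2, 3), nrow = 2, ncol = 2)
    b <- c(1, 2)
    x <- solve(A, b)   # solves A %*% x = b
    x                  # 1 -1
    A %*% x            # recovers b, as a 2 x 1 matrix
    det(A)             # -1 up to rounding: 3 * 3 - 2 * 5
    t(A)               # rows and columns swapped
    ```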

    cbind() : Combine R objects by columns

    rbind() : Combine R objects by rows

    (C <- cbind(A,B))
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    3    2    1    0
    ## [2,]    5    3    1    1
    (D <- rbind(A,B))
    ##      [,1] [,2]
    ## [1,]    3    2
    ## [2,]    5    3
    ## [3,]    1    0
    ## [4,]    1    1
    colnames(C) <- c("a", "b", "c", "d")
    C
    ##      a b c d
    ## [1,] 3 2 1 0
    ## [2,] 5 3 1 1
    rownames(C) <- c("first", "second")
    C
    ##        a b c d
    ## first  3 2 1 0
    ## second 5 3 1 1
    # number of rows
    nrow(C)
    ## [1] 2
    # number of columns
    ncol(C)
    ## [1] 4

    Subsetting of matrix

    select columns:

    (partC1 <- C[, c(1, 2)])
    ##        a b
    ## first  3 2
    ## second 5 3

    select columns by column names:

    (partC1 <- C[, c("a", "b")])
    ##        a b
    ## first  3 2
    ## second 5 3

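    partC1 is a matrix. (The class() outputs for matrices in this post come from R 3.x; since R 4.0.0, class() of a matrix returns "matrix" "array".)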

    class(partC1)
    ## [1] "matrix"

    Selecting a single column drops the dimension, so the result is a vector, not a matrix:

    this_is_vector_now <- C[, "a"]
    class(this_is_vector_now)
    ## [1] "numeric"

    To keep the matrix form, use drop = FALSE:

    still_matrix <- C[, "a", drop = FALSE]
    class(still_matrix)
    ## [1] "matrix"

    select rows:

    (partC2 <- C[1,])
    ## a b c d 
    ## 3 2 1 0

    Rows and columns can also be selected together:

    (partC2 <- C[1, c(2, 3)])
    ## b c 
    ## 2 1
    class(partC2)
    ## [1] "numeric"

    With a single index, R treats the matrix as a vector stored column by column:

    C[1]
    ## [1] 3
    C[2]
    ## [1] 5
    C[3]
    ## [1] 2
    C[4]
    ## [1] 3

    Converting between vectors and matrices:

    is.vector(partC2)
    ## [1] TRUE
    is.matrix(partC2)
    ## [1] FALSE
    (Cm <- as.matrix(partC2))
    ##   [,1]
    ## b    2
    ## c    1
    class(Cm)
    ## [1] "matrix"
    (Cv <- as.vector(Cm))
    ## [1] 2 1
    class(Cv)
    ## [1] "numeric"
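    Because a matrix is stored column by column, a single index k and a position (i, j) in a matrix with nr rows are related by i = (k - 1) %% nr + 1, j = (k - 1) %/% nr + 1, and conversely k = (j - 1) * nr + i. A sketch that rebuilds the 2 x 4 matrix C from the example above and checks the formula against base R's arrayInd():

    ```r
    C <- matrix(c(3, 5, 2, 3, 1, 1, 0, 1), nrow = 2)
    nr <- nrow(C)
    k <- 1:length(C)
    i <- (k - 1) %% nr + 1    # row of each single index
    j <- (k - 1) %/% nr + 1   # column of each single index
    cbind(k, i, j)
    # base R performs the same conversion
    all(arrayInd(k, dim(C)) == cbind(i, j))   # TRUE
    # and the reverse: element in row 2, column 3
    C[2, 3] == C[(3 - 1) * nr + 2]            # TRUE
    ```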

    Objects and classes

    R is an object-oriented language.

    Every object in R belongs to a class.

    You can use the class() function to find the class of an object.

    # numeric vector
    class(c(1, 2, 3))
    ## [1] "numeric"
    # character vector
    class(c("c", "B", "z"))
    ## [1] "character"
    # function
    class(sin)
    ## [1] "function"
    class(matrix(c(1,2)))
    ## [1] "matrix"
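    class() and typeof() answer different questions: class() gives the class R uses to choose methods, while typeof() gives the internal type listed earlier. Comparing them side by side (the matrix m is illustrative):

    ```r
    class(1)        # "numeric"
    typeof(1)       # "double"
    class(1L)       # "integer"
    class(sin)      # "function"
    typeof(sin)     # "builtin"
    typeof(mean)    # "closure": mean() is written in R
    m <- matrix(1:4, nrow = 2)
    class(m)        # "matrix" "array" in R 4.0.0 and later
    typeof(m)       # "integer": the storage type of the elements
    inherits(m, "matrix")   # TRUE in any R version
    ```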

    Workspace

    Objects that you create remain in the workspace until you explicitly delete them or the R session ends.

    • rm(x) : remove object x

    • rm(list = ls()) : remove all objects
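    ls() lists the objects in the current workspace, and exists() checks for one by name, so the effect of rm() can be observed directly. A short sketch; tmp_value is a hypothetical object name:

    ```r
    tmp_value <- 42            # hypothetical object
    exists("tmp_value")        # TRUE
    "tmp_value" %in% ls()      # TRUE: ls() lists the workspace objects
    rm(tmp_value)
    exists("tmp_value")        # FALSE
    ```

    rm(list = ls()) removes every object listed by ls() at once, so it is worth running ls() first to see what would be deleted.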

    Working directory

    When you run R, it uses one of the directories on your hard drive as its working directory, where it looks for user-written programs and data files by default.

    Check the working directory.

    getwd()

    Change the working directory to “dir”

    setwd("dir")

    In paths, / separates directories and file names, . refers to the current directory, and .. refers to the parent directory.
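    file.path() builds paths from components using /, which also works on Windows, and a relative path is resolved against the working directory. The directory and file names below are hypothetical:

    ```r
    # file.path() joins path components with "/"
    file.path("data", "input.csv")           # "data/input.csv"
    file.path("..", "data", "input.csv")     # "../data/input.csv": one level up
    # the same file as an absolute path under the working directory
    file.path(getwd(), "data", "input.csv")
    ```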

    Writing script

    Although we can type and evaluate any R expression at the prompt, it is much more convenient to write scripts, which are simply collections of R expressions.

    • We use the terms program and code synonymously with script.

    You can use the built-in editor in RGui or RStudio.
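    A saved script is run with source(), which evaluates every expression in the file. A self-contained sketch: the two-line script is made up for illustration and written to a throwaway path from tempfile(), standing in for a real .R file:

    ```r
    # write a two-line script to a temporary file
    script <- tempfile(fileext = ".R")
    writeLines(c("a <- 2", "b <- a^10"), script)

    # run every expression in the script
    source(script)
    b          # 1024

    unlink(script)   # delete the temporary file
    ```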

    Help

    To find out more about an R command or function x, you can type help(x) or just ?x.

    If you cannot remember the exact name, use help.search("x") or ??x.

    HTML help command : help.start()

    package

    Many contributed packages extend R; they are listed on CRAN:

    https://cran.r-project.org/web/packages/

    To install a package:

    install.packages("packagename")

    To load an installed package:

    library("packagename")

    Or use the package menu of your GUI (RGui or RStudio).
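    A common pattern is to install a package only if it is missing and then load it; requireNamespace() returns FALSE instead of failing when a package is not installed. A sketch, where use_package() is a hypothetical helper name; the example call uses stats, which ships with R, so nothing is downloaded:

    ```r
    # hypothetical helper: install a package only if it is missing, then load it
    use_package <- function(pkg) {
      if (!requireNamespace(pkg, quietly = TRUE)) {
        install.packages(pkg)
      }
      library(pkg, character.only = TRUE)
    }

    use_package("stats")

    # pkg::fun calls a function without attaching the whole package
    stats::median(c(5, 1, 3))   # 3
    ```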

    R documentation : https://www.rdocumentation.org/
