Getting started with R

Overview

  • What is R?
  • R’s correspondence with S
  • R features
  • Useful URLs
  • Installing R, RStudio
  • R and Statistics
  • Using R – Getting Started

What is R?

Contd…

  • Useful R books:
    • R in Action by Robert I. Kabacoff. Pub.: Manning Publications
    • Statistical Analysis with R by John M. Quick. Pub.: PACKT Publishing
    • Many more R e-books available through Books24X7 (available to CDAC through MCIT consortium).

Contd…

  • R and statistics:
    • A comprehensive statistical platform providing all sorts of data analytics techniques.
    • Strong graphics capabilities to visualize complex data.
    • Designed to support interactive data analysis and exploration.
    • Capable of reading data from a variety of sources.
    • Facility to program new statistical methods and packages.
  • Some disadvantages too…
    • Objects are stored in primary memory, which may cause performance bottlenecks with large datasets.
    • Limited built-in support for dynamic or 3D graphics, but external packages such as plot3D and scatterplot3d are available.
    • Similarly, no built-in support for web-based processing; this can be done through third-party packages.
    • Functionality is scattered among packages.

Using R – Getting started

  • Launch R Interface/RStudio depending on your platform.
  • Utility commands/functions:
    • setwd() – sets working directory.
    setwd("C:/RDemo1")
    • getwd() – gets current working directory.
    getwd()
    ## [1] "C:/RDemo1"
    • dir() – lists the contents of current working directory.
    dir()
    ## [1] "R-Basics.html" "R-Basics.Rmd"
    • ls() – lists names of objects in R environment
    ls()
    ## [1] "metadata"
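The utility commands above can be tried safely in a temporary folder. This is only a sketch: the folder name RDemo_sketch, the file notes.R and the object scratch_value are made up for the illustration.

```r
# Work inside a temporary directory so no real folder is touched
old_wd <- getwd()                                  # remember where we started
demo_dir <- file.path(tempdir(), "RDemo_sketch")   # hypothetical folder name
dir.create(demo_dir, showWarnings = FALSE)
setwd(demo_dir)                                    # change working directory

writeLines("x <- 1", "notes.R")                    # create a small file to list
files_here <- dir()                                # contents of working directory
files_here

scratch_value <- 42                                # create an object
"scratch_value" %in% ls()                          # ls() now reports its name

setwd(old_wd)                                      # restore original directory
```

Restoring the original working directory at the end keeps the rest of the session unaffected.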

Contd…

  • help.start() – provides general help.
  • help("foo") or ?foo – help on function "foo", e.g. help("mean") or ?mean.
  • help.search("foo") or ??foo – searches the help system for the string "foo", e.g. help.search("mean") or ??mean.
  • example("foo") – runs the examples from the help page of function "foo".
    example("mean")
    ## 
    ## mean> x <- c(0:10, 50)
    ## 
    ## mean> xm <- mean(x)
    ## 
    ## mean> c(xm, mean(x, trim = 0.10))
    ## [1] 8.75 5.50
  • data() – lists all example datasets in currently loaded packages.
  • library() – lists all installed packages.

Contd…

  • data(foo) – loads dataset "foo" into R, e.g. data(mtcars).
  • library(foo) – loads package "foo" into R, e.g. library(plyr).
  • rm(objectlist) – removes one or more objects from the R workspace.
  • options() – shows or sets options for the current session.
  • history(#) – lists the last # commands (default 25).
  • install.packages("foo") – installs package "foo", e.g. install.packages("reshape2").
  • help(package="package-name") – provides a brief description of the package and an index of its functions and datasets.
  • print(x) or x – prints object 'x' on the terminal.
  • q() – quits the current R session.
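A short, hedged sketch of rm() and options() from the list above. The object names tmp_a and tmp_b are invented for the example, and the digits option is just one convenient setting to demonstrate with.

```r
tmp_a <- 1
tmp_b <- 2
rm(tmp_a, tmp_b)                                   # remove both objects
still_there <- exists("tmp_a", inherits = FALSE)   # FALSE after rm()

old_opts <- options(digits = 4)   # options() returns the previous settings
short_pi <- format(pi)            # "3.142" while digits = 4
options(old_opts)                 # restore the previous settings
```

Saving the value returned by options() makes it easy to undo a temporary change.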

Using R – Data types

  • The five basic types in R are: character, numeric, integer, complex and logical (TRUE/FALSE).
  • Common data objects are – vector, matrix, list, factor, data frame, table.
  • Creating and assigning to a variable:
x<-1
  • Checking the type of variable:
class(x)
## [1] "numeric"
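To see all five basic types side by side, one possible sketch is to collect one value of each and ask for its class. The values themselves are arbitrary; note that a plain number such as 1 is numeric, while the L suffix requests an integer.

```r
# One example value per basic type
basic_values <- list("hello", 3.14, 7L, 2+4i, TRUE)

# class() applied to each element
basic_classes <- sapply(basic_values, class)
basic_classes
# "character" "numeric" "integer" "complex" "logical"
```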

Contd…

  • Printing a variable:
x #auto-printing
## [1] 1
print(x) #explicit printing
## [1] 1
  • Creating a vector: a vector contains objects of the same class.
x<-c(1,2,3) #using c() function
y<-vector("logical", length=10) #using vector() function
length(x) #length of vector x
## [1] 3
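Because a vector holds a single class, mixing types in c() makes R convert the elements to a common class. The sketch below illustrates this with made-up values; the variable names are only for the example.

```r
chr_vec <- c(1, "a", TRUE)        # everything becomes character
chr_vec                           # "1" "a" "TRUE"

num_vec <- c(1.5, TRUE, FALSE)    # logicals become 1 and 0
num_vec                           # 1.5 1.0 0.0

empty_lgl <- vector("logical", length = 3)  # pre-filled with FALSE
empty_lgl                         # FALSE FALSE FALSE
```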

Contd…

  • Vector operations: Various arithmetic operations can be performed member-wise.
y<-c(4,5,6)
5*x #multiplication by a scalar
## [1]  5 10 15
x+y #addition of two vectors
## [1] 5 7 9
x*y #multiplication of two vectors
## [1]  4 10 18
x^y #x to the power y
## [1]   1  32 729

Contd…

  • Creating a matrix: a two-dimensional array whose elements are all of the same class.
m<-matrix(c(1,2,3,11,12,13), nrow=2,ncol=3) #using matrix() function.
m
##      [,1] [,2] [,3]
## [1,]    1    3   12
## [2,]    2   11   13
dim(m) #dimensions of matrix m
## [1] 2 3
attributes(m) #attributes of matrix m
## $dim
## [1] 2 3

Contd…

  • By default, the elements of a matrix are filled by column. The "byrow" argument of matrix() can be used to fill elements by row.
m<-matrix(c(1,2,3,11,12,13), nrow=2,ncol=3, byrow = TRUE)
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]   11   12   13

Contd…

  • cbind-ing and rbind-ing: cbind() combines vectors as columns and rbind() combines them as rows.
x<-c(1,2,3)
y<-c(11,12,13)
cbind(x,y)
##      x  y
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
rbind(x,y)
##   [,1] [,2] [,3]
## x    1    2    3
## y   11   12   13

Contd…

  • Matrix operations/functions:
p<-3*m #multiplication by a scalar
n<-matrix(c(4,5,6,14,15,16), nrow=2,ncol=3)
q<-m+n #addition of two matrices
o<-matrix(c(4,5,6,14,15,16), nrow=3,ncol=2)
r<-m %*% o #matrix multiplication by using %*%
mdash<-t(m) #transpose of matrix
s<-matrix(c(4,5,6,14,15,16,24,25,26), nrow=3,ncol=3,
          byrow=TRUE)
s_det<-det(s) #determinant of s (s is singular; a tiny nonzero result is floating-point round-off)
m_row_sum<-rowSums(m)
m_col_sum<-colSums(m)

Contd…

p
##      [,1] [,2] [,3]
## [1,]    3    6    9
## [2,]   33   36   39
q
##      [,1] [,2] [,3]
## [1,]    5    8   18
## [2,]   16   26   29
r
##      [,1] [,2]
## [1,]   32   92
## [2,]  182  542

Contd…

mdash
##      [,1] [,2]
## [1,]    1   11
## [2,]    2   12
## [3,]    3   13
s_det
## [1] 1.110223e-14
m_row_sum
## [1]  6 36
m_col_sum
## [1] 12 14 16

Contd…

  • List: A special type of vector containing elements of different classes
x<-list(1,"p",TRUE,2+4i) #using list() function
x
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "p"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 2+4i

Contd…

  • Factor: Represents categorical data. Can be ordered or unordered.
    status<-c("low","high","medium","high","low")
    x<-factor(status, ordered=TRUE,
            levels=c("low","medium","high")) #using factor() function
    x
    ## [1] low    high   medium high   low   
    ## Levels: low < medium < high
    • ‘levels’ argument is used to set the order of levels.
    • First level forms the baseline level.
    • Without any order, levels are called nominal. Ex. – Type1, Type2, …
    • With order, levels are called ordinal. Ex. – low, medium, …
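A brief sketch of what the ordering gives in practice, reusing the status values from above. The variable names status_factor, above_low and status_counts are chosen only for this illustration.

```r
status <- c("low", "high", "medium", "high", "low")
status_factor <- factor(status, ordered = TRUE,
                        levels = c("low", "medium", "high"))

levels(status_factor)        # "low" "medium" "high"
as.integer(status_factor)    # underlying codes: 1 3 2 3 1

# Comparisons are meaningful only for ordered factors
above_low <- status_factor > "low"
above_low                    # FALSE TRUE TRUE TRUE FALSE

# Counts per level, reported in level order
status_counts <- table(status_factor)
status_counts                # low 2, medium 1, high 2
```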

Contd…

  • Data frame: Used to store tabular data. Can contain different classes
student_id<-c(1,2,3)
student_names<-c("Ram","Shyam","Laxman")
position<-c("First","Second","Third")
data<-data.frame(student_id,student_names,position) #using data.frame() function
data
##   student_id student_names position
## 1          1           Ram    First
## 2          2         Shyam   Second
## 3          3        Laxman    Third
data$student_id #accessing a particular column
## [1] 1 2 3

Contd…

nrow(data) #no. of rows in data
## [1] 3
ncol(data) #no. of columns in data
## [1] 3
names(data) #column names of data
## [1] "student_id"    "student_names" "position"
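To check that each column of a data frame keeps its own class, a sketch like the following can be used. It rebuilds the same table under the hypothetical name students and sets stringsAsFactors explicitly, because the default changed from TRUE to FALSE in R 4.0.0.

```r
students <- data.frame(
  student_id    = c(1, 2, 3),
  student_names = c("Ram", "Shyam", "Laxman"),
  position      = c("First", "Second", "Third"),
  stringsAsFactors = FALSE   # keep text columns as character
)

column_classes <- sapply(students, class)  # class of every column
column_classes   # student_id: numeric; the other two: character

dim(students)    # 3 rows, 3 columns
```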

Using R – Control structures

  • R provides the usual control structures: if-else, for, while, repeat, break, next, return.
  • Mainly used within functions/scripts.
x<-5
if(x > 7) { #if-else structure
  y<-TRUE
} else {
  y<-FALSE
}
y
## [1] FALSE
for(i in 1:10) #for loop
  print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Contd…

count<-0
while(count < 10) #while loop
  count<-count+1
count
## [1] 10
  • repeat creates an infinite loop; it is normally terminated by a call to break.
  • next skips the current iteration of a loop.
  • return returns a value from a function.
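The three keywords above could be combined as in the following sketch. The loop bound of 10 and the helper first_negative() are invented for the illustration.

```r
i <- 0
odd_values <- c()
repeat {
  i <- i + 1
  if (i > 10) break         # leave the infinite loop
  if (i %% 2 == 0) next     # skip even numbers
  odd_values <- c(odd_values, i)
}
odd_values                  # 1 3 5 7 9

# return() leaves the function as soon as a match is found
first_negative <- function(v) {
  for (value in v) {
    if (value < 0) return(value)
  }
  NA                        # reached only when nothing is negative
}
first_negative(c(3, 5, -2, -7))   # -2
```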

Using R – looping functions

  • These functions can be used to loop over various types of objects.
  • lapply – loops over a list and evaluates a function on each element.
  • sapply – same as lapply, but tries to simplify the result.
  • apply – applies a function over the margins of an array.
  • tapply – applies a function over subsets of a vector.
x<-list(a=1:5,b=rnorm(20))
lapply(x,sum) #lapply returns a list
## $a
## [1] 15
## 
## $b
## [1] -0.8801658
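For comparison, here is a small sketch of sapply next to lapply. It uses fixed numbers rather than rnorm() so the result is reproducible; the list number_list is made up for the example.

```r
number_list <- list(a = 1:5, b = c(10, 20, 30))

as_list <- lapply(number_list, mean)     # a list with elements a and b
as_vector <- sapply(number_list, mean)   # simplified to a named vector

is.list(as_list)   # TRUE
as_vector          # a = 3, b = 20
```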

Contd…

x<-matrix(c(1,2,3,11,12,13), nrow=2, ncol=3,byrow=TRUE)
# MARGIN=1 for rows, MARGIN=2 for columns
apply(x,MARGIN=1,FUN=sum)
## [1]  6 36
y<-c(rnorm(20),runif(20),rnorm(20,1))
f<-gl(3,20) #generate factor levels as per given pattern
tapply(y,f,mean)
##          1          2          3 
## -0.2668254  0.5382292  0.9893389
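Since the tapply result above depends on random numbers, a deterministic sketch may be easier to check by hand. The scores and the group labels A and B are invented; the apply call shows the column-wise case (MARGIN=2) on the same matrix.

```r
x <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3, byrow = TRUE)
col_sums <- apply(x, MARGIN = 2, FUN = sum)   # one sum per column
col_sums                                      # 12 14 16

scores <- c(10, 20, 30, 40, 50, 60)
groups <- gl(2, 3, labels = c("A", "B"))      # A A A B B B
group_means <- tapply(scores, groups, mean)   # mean of scores per group
group_means                                   # A = 20, B = 50
```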

Using R – Subsetting

  • Refers to extracting a subset of the data from R objects.
  • Important while working with large datasets.
  • There are various operators.
  • [ returns an object of the same class as the original; generally used with vectors and matrices.
  • [[ extracts a single element of a list or data frame.
  • $ extracts an element of a list or data frame by name.
x<-c(1,2,3,4)
x[2]
## [1] 2
x[1:3]
## [1] 1 2 3
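Besides positions, [ also accepts logical conditions and negative indices. A minimal sketch on the same vector, with result names chosen only for the example:

```r
x <- c(1, 2, 3, 4)

greater_than_two <- x[x > 2]                    # keep elements passing a test
greater_than_two                                # 3 4

without_first <- x[-1]                          # negative index drops an element
without_first                                   # 2 3 4

by_mask <- x[c(TRUE, FALSE, TRUE, FALSE)]       # explicit logical mask
by_mask                                         # 1 3
```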

Contd…

  • Subsetting a matrix:
x<-matrix(c(1,2,3,11,12,13), nrow=2, ncol=3,byrow=TRUE)
x[1,2]
## [1] 2
x[1,]
## [1] 1 2 3
x[,2]
## [1]  2 12
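Extracting a single row or column with [ returns a plain vector by default. If a matrix is needed instead, the drop argument can be set to FALSE, as this sketch suggests:

```r
x <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3, byrow = TRUE)

row_as_vector <- x[1, ]                 # dimensions are dropped
dim(row_as_vector)                      # NULL

row_as_matrix <- x[1, , drop = FALSE]   # stays a 1 x 3 matrix
dim(row_as_matrix)                      # 1 3
```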

Contd…

  • Subsetting a list:
x<-list(a=1,b="p",c=TRUE,d=2+4i)
x[[1]]
## [1] 1
x$d
## [1] 2+4i
x[["c"]]
## [1] TRUE
x["b"]
## $b
## [1] "p"
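The difference between [ and [[ on a list is easiest to see from the class of the result. The following sketch reuses the list above; the variable wanted is a made-up name holding an element name computed at run time.

```r
x <- list(a = 1, b = "p", c = TRUE, d = 2+4i)

class(x["b"])        # "list": [ returns a smaller list
class(x[["b"]])      # "character": [[ returns the element itself

two_elements <- x[c("a", "c")]   # [ can extract several elements at once
length(two_elements)             # 2

wanted <- "d"
x[[wanted]]          # [[ works with a computed name; $ needs a literal name
```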

Contd…

  • Subsetting a data frame
data[1,]
##   student_id student_names position
## 1          1           Ram    First
data$student_names
## [1] Ram    Shyam  Laxman
## Levels: Laxman Ram Shyam
data[data$position=="Second",]
##   student_id student_names position
## 2          2         Shyam   Second
  • Using logical ANDs and ORs
    data[data$student_id>=2 & data$position=="Third",]
    ##   student_id student_names position
    ## 3          3        Laxman    Third
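The example above shows a logical AND; an OR condition works the same way with |. This sketch rebuilds the table under the hypothetical name students so that it runs on its own, and also shows subset() as one possible alternative.

```r
students <- data.frame(
  student_id    = c(1, 2, 3),
  student_names = c("Ram", "Shyam", "Laxman"),
  position      = c("First", "Second", "Third"),
  stringsAsFactors = FALSE
)

# Rows where either condition holds (logical OR): rows 1 and 3
first_or_third <- students[students$student_id == 1 |
                           students$position == "Third", ]
first_or_third

# Rows and one column at the same time
later_names <- students[students$student_id >= 2, "student_names"]
later_names          # "Shyam" "Laxman"

# subset() evaluates the condition inside the data frame
same_as_and <- subset(students, student_id >= 2 & position == "Third")
same_as_and          # row 3 only
```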

Using R – Functions

  • Created using the function() directive.
  • Can be passed as arguments to other functions. Can be nested.
  • The return value is the last expression evaluated inside the function body.
  • Can have named arguments with default values.
  • Some arguments can be omitted in function calls.
add<-function(a=1,b=2,c=3) {
  s<-a+b+c
  print(s)
}
add()
## [1] 6
add(10,11,12)
## [1] 33
add(10)
## [1] 15
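The remaining points in the list (functions as arguments, the last expression as return value, named and missing arguments) might be sketched as follows. The functions add_values(), apply_twice() and describe() are hypothetical names introduced only for this example.

```r
# The last expression is returned; no print() or return() needed
add_values <- function(a = 1, b = 2, c = 3) {
  a + b + c
}
add_values(b = 10)          # named argument: 1 + 10 + 3 = 14
add_values(c = 0, a = 5)    # names allow any order: 5 + 2 + 0 = 7

# A function passed as an argument to another function
apply_twice <- function(f, value) f(f(value))
apply_twice(function(v) v * 2, 3)   # 12

# missing() tests whether an argument was supplied
describe <- function(x, label) {
  if (missing(label)) label <- "value"
  paste(label, x)
}
describe(5)                 # "value 5"
```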

R Source files

  • Should be saved with the .R extension.
  • Can be used to store functions, commands that must be executed sequentially, etc.
  • The source() function is used to load such R scripts into the R workspace.
source("C:/RDemo/test.R")
add()
## [1] 6

Contd…

source("C:/RDemo/test1.R", echo=TRUE)
## 
## > x <- 1
## 
## > y <- 2
## 
## > x + y
## [1] 3
source("C:/RDemo/test1.R", print.eval=TRUE)
## [1] 3
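The scripts under C:/RDemo are not reproduced here, so the following self-contained sketch writes a small script to a temporary file and sources it. The script contents and the names demo_add and demo_total are assumptions made for the illustration.

```r
# Write a two-line script to a temporary .R file
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "demo_add <- function(a = 1, b = 2, c = 3) a + b + c",
  "demo_total <- demo_add(10)"
), script_path)

source(script_path)   # runs the script; its objects land in the workspace
demo_total            # 15

unlink(script_path)   # remove the temporary file
```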

References