Getting started with R

Overview

  • What is R?
  • R’s correspondence with S
  • R features
  • Useful URLs
  • Installing R, RStudio
  • R and Statistics
  • Using R – Getting Started

What is R?

Contd…

  • Useful R books:
    • R in Action by Robert I. Kabacoff. Pub.: Manning Publications
    • Statistical Analysis with R by John M. Quick. Pub.: PACKT Publishing
    • Many more R e-books available through Books24X7 (available to CDAC through MCIT consortium).

Contd…

  • R and statistics:
    • A comprehensive statistical platform providing all sorts of data analytics techniques.
    • Strong graphics capabilities to visualize complex data.
    • Designed to support interactive data analysis and exploration.
    • Capable of reading data from a variety of sources.
    • Facility to program new statistical methods and packages.
  • Some disadvantages too…
    • Objects are stored in primary memory, which may cause performance bottlenecks with large datasets.
    • No built-in dynamic or 3D graphics, but external packages such as plot3D and scatterplot3d are available.
    • Similarly, no built-in support for web-based processing; this can be done through third-party packages.
    • Functionality is scattered among packages.

Using R – Getting started

  • Launch R Interface/RStudio depending on your platform.
  • Utility commands/functions:
    • setwd() – sets working directory.
    setwd("C:/RDemo1")
    • getwd() – gets current working directory.
    getwd()
    ## [1] "C:/RDemo1"
    • dir() – lists the contents of current working directory.
    dir()
    ## [1] "R-Basics.html" "R-Basics.Rmd"
    • ls() – lists names of objects in R environment
    ls()
    ## [1] "metadata"
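
The utility commands above can be tried without touching any real folder. A minimal sketch, assuming a throwaway folder under tempdir(); the names demo_dir, RDemo_sketch, demo.R and demo_obj are made up for illustration:

```r
# Hypothetical demo folder inside the session's temporary directory
demo_dir <- file.path(tempdir(), "RDemo_sketch")
dir.create(demo_dir, showWarnings = FALSE)

old_wd <- setwd(demo_dir)        # setwd() invisibly returns the previous directory
writeLines("x <- 1", "demo.R")   # create a file so dir() has something to list
files_here <- dir()              # contents of the current working directory
files_here

demo_obj <- 42                   # create an object in the current environment
"demo_obj" %in% ls()             # ls() now includes its name

setwd(old_wd)                    # restore the original working directory
```

Saving the value returned by setwd() makes it easy to switch back afterwards.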

Contd…

  • help.start() – provides general help.
  • help("foo") or ?foo – help on function "foo". For ex. help("mean") or ?mean.
  • help.search("foo") or ??foo – searches for string "foo" in help system. For ex. help.search("mean") or ??mean.
  • example("foo") – shows examples of function "foo".
    example("mean")
    ## 
    ## mean> x <- c(0:10, 50)
    ## 
    ## mean> xm <- mean(x)
    ## 
    ## mean> c(xm, mean(x, trim = 0.10))
    ## [1] 8.75 5.50
  • data() – lists all example datasets in currently loaded packages.
  • library() – lists all available packages

Contd…

  • data(foo) – loads dataset "foo" in R. For ex. data(mtcars).
  • library(foo) – loads package "foo" in R. For ex. library(plyr).
  • rm(objectlist) – removes one or more objects from R workspace.
  • options() – shows/sets current options for workspace.
  • history(#) – lists last # commands. Default is 25.
  • install.packages("foo") – installs package "foo". For ex. install.packages("reshape2").
  • help(package="package-name") – provides brief description of package, an index of functions and datasets in package.
  • print(x) or x – prints object 'x' on terminal.
  • q() – quits current R session.
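
A few of these commands used together, as a rough sketch. The object name scratch_obj is made up for illustration, and the install step is left commented out because it needs a network connection:

```r
data(mtcars)                       # load a built-in example dataset
head(mtcars, 3)                    # look at its first three rows

scratch_obj <- 1:10                # a throwaway object
rm(scratch_obj)                    # remove it from the workspace
exists("scratch_obj", inherits = FALSE)   # FALSE once removed

old_opts <- options(digits = 4)    # options() returns the previous values
print(pi)                          # printed with 4 significant digits
options(old_opts)                  # restore the earlier setting

# Install a package only if it is missing:
# if (!requireNamespace("reshape2", quietly = TRUE)) install.packages("reshape2")
```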

Using R – Data types

  • Five basic types in R are – character, numeric, integer, complex, logical (TRUE/FALSE).
  • Common data objects are – vector, matrix, list, factor, data frame, table.
  • Creating and assigning to a variable:
x<-1
  • Checking the type of variable:
class(x)
## [1] "numeric"
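
For reference, class() can be called on one value of each of the five basic types listed above. A short sketch; the variable name basic_classes is only for illustration:

```r
class("a")      # "character"
class(1)        # "numeric"
class(1L)       # "integer" (the L suffix requests an integer)
class(2+4i)     # "complex"
class(TRUE)     # "logical"

# The same checks collected in one character vector
basic_classes <- c(class("a"), class(1), class(1L), class(2+4i), class(TRUE))
basic_classes
```

Note that a plain number such as 1 is stored as numeric unless the L suffix is used.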

Contd…

  • Printing a variable:
x #auto-printing
## [1] 1
print(x) #explicit printing
## [1] 1
  • Creating Vector: contains objects of same class.
x<-c(1,2,3) #using c() function
y<-vector("logical", length=10) #using vector() function
length(x) #length of vector x
## [1] 3
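
Because a vector holds objects of one class only, R converts mixed inputs to a common class. A small sketch of this implicit coercion, with v1 and v2 as illustrative names:

```r
v1 <- c(1, "a", TRUE)      # mixed with a string: everything becomes character
v1                         # "1" "a" "TRUE"
class(v1)                  # "character"

v2 <- c(1.5, TRUE, FALSE)  # logical mixed with numeric: TRUE -> 1, FALSE -> 0
v2                         # 1.5 1.0 0.0
class(v2)                  # "numeric"

as.integer(v2)             # explicit coercion truncates: 1 1 0
```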

Contd…

  • Vector operations: Various arithmetic operations can be performed member-wise.
y<-c(4,5,6)
5*x #multiplication by a scalar
## [1]  5 10 15
x+y #addition of two vectors
## [1] 5 7 9
x*y #multiplication of two vectors
## [1]  4 10 18
x^y #x to the power y
## [1]   1  32 729
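
Member-wise operations also work when the vectors differ in length: the shorter one is recycled. A brief sketch, with a and b as illustrative names:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)

recycled <- a + b   # b is reused as 10 20 10 20
recycled            # 11 22 13 24

a > 2               # comparisons are member-wise too: FALSE FALSE TRUE TRUE
sum(a * a)          # sum of squares: 30
```

R warns when the longer length is not a multiple of the shorter one.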

Contd…

  • Creating Matrix: Two-dimensional array having elements of same class.
m<-matrix(c(1,2,3,11,12,13), nrow=2,ncol=3) #using matrix() function.
m
##      [,1] [,2] [,3]
## [1,]    1    3   12
## [2,]    2   11   13
dim(m) #dimensions of matrix m
## [1] 2 3
attributes(m) #attributes of matrix m
## $dim
## [1] 2 3

Contd…

  • By default, elements in a matrix are filled by column. The "byrow" argument of matrix() can be used to fill elements by row.
m<-matrix(c(1,2,3,11,12,13), nrow=2,ncol=3, byrow = TRUE)
m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]   11   12   13

Contd…

  • cbind-ing and rbind-ing: cbind() combines vectors as columns, rbind() combines them as rows.
x<-c(1,2,3)
y<-c(11,12,13)
cbind(x,y)
##      x  y
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
rbind(x,y)
##   [,1] [,2] [,3]
## x    1    2    3
## y   11   12   13

Contd…

  • Matrix operations/functions:
p<-3*m #multiplication by a scalar
n<-matrix(c(4,5,6,14,15,16), nrow=2,ncol=3)
q<-m+n #addition of two matrices
o<-matrix(c(4,5,6,14,15,16), nrow=3,ncol=2)
r<-m %*% o #matrix multiplication by using %*%
mdash<-t(m) #transpose of matrix
s<-matrix(c(4,5,6,14,15,16,24,25,26), nrow=3,ncol=3,
          byrow=TRUE)
s_det<-det(s) #determinant of s (s is singular, so the exact value is 0; the printed result is floating-point round-off)
m_row_sum<-rowSums(m)
m_col_sum<-colSums(m)

Contd…

p
##      [,1] [,2] [,3]
## [1,]    3    6    9
## [2,]   33   36   39
q
##      [,1] [,2] [,3]
## [1,]    5    8   18
## [2,]   16   26   29
r
##      [,1] [,2]
## [1,]   32   92
## [2,]  182  542

Contd…

mdash
##      [,1] [,2]
## [1,]    1   11
## [2,]    2   12
## [3,]    3   13
s_det
## [1] 1.110223e-14
m_row_sum
## [1]  6 36
m_col_sum
## [1] 12 14 16
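
Two points from the matrix operations above are easy to mix up: * multiplies member-wise while %*% is the matrix product, and a determinant that should be zero usually prints as a tiny non-zero number. A sketch reusing the same matrices; the names elementwise and product and the tolerance 1e-8 are arbitrary choices for illustration:

```r
m <- matrix(c(1,2,3,11,12,13), nrow=2, ncol=3, byrow=TRUE)
n <- matrix(c(4,5,6,14,15,16), nrow=2, ncol=3)

elementwise <- m * n      # member-wise product, needs equal dimensions
product <- m %*% t(n)     # matrix product: (2 x 3) times (3 x 2) gives 2 x 2
dim(elementwise)          # 2 3
dim(product)              # 2 2

s <- matrix(c(4,5,6,14,15,16,24,25,26), nrow=3, ncol=3, byrow=TRUE)
abs(det(s)) < 1e-8        # TRUE: consecutive rows differ by (10,10,10), so s is singular
```

Comparing against a small tolerance is safer than testing det(s) == 0 directly.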

Contd…

  • List: A special type of vector containing elements of different classes
x<-list(1,"p",TRUE,2+4i) #using list() function
x
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "p"
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] 2+4i

Contd…

  • Factor: Represents categorical data. Can be ordered or unordered.
    status<-c("low","high","medium","high","low")
    x<-factor(status, ordered=TRUE,
            levels=c("low","medium","high")) #using factor() function
    x
    ## [1] low    high   medium high   low   
    ## Levels: low < medium < high
    • ‘levels’ argument is used to set the order of levels.
    • First level forms the baseline level.
    • Without any order, levels are called nominal. Ex. – Type1, Type2, …
    • With order, levels are called ordinal. Ex. – low, medium, …
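
A short sketch of what the level order gives in practice, using the same status values as above. The names level_counts, codes and below_high are only for illustration:

```r
status <- c("low","high","medium","high","low")
x <- factor(status, ordered=TRUE, levels=c("low","medium","high"))

level_counts <- table(x)   # counts per level, shown in level order
level_counts               # low 2, medium 1, high 2

codes <- as.integer(x)     # underlying integer codes follow the level order
codes                      # 1 3 2 3 1

below_high <- x < "high"   # comparisons are allowed only for ordered factors
below_high                 # TRUE FALSE TRUE FALSE TRUE
```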

Contd…

  • Data frame: Used to store tabular data. Can contain different classes
student_id<-c(1,2,3)
student_names<-c("Ram","Shyam","Laxman")
position<-c("First","Second","Third")
data<-data.frame(student_id,student_names,position) #using data.frame() function
data
##   student_id student_names position
## 1          1           Ram    First
## 2          2         Shyam   Second
## 3          3        Laxman    Third
data$student_id #accessing a particular column
## [1] 1 2 3

Contd…

nrow(data) #no. of rows in data
## [1] 3
ncol(data) #no. of columns in data
## [1] 3
names(data) #column names of data
## [1] "student_id"    "student_names" "position"
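
A data frame can be inspected and extended after it is created. A minimal sketch on the same student data; the marks column and its values are hypothetical and added only for illustration:

```r
data <- data.frame(student_id = c(1,2,3),
                   student_names = c("Ram","Shyam","Laxman"),
                   position = c("First","Second","Third"))

str(data)                      # compact view: one line per column with its class

data$marks <- c(91, 84, 78)    # hypothetical values; adds a fourth column
dim(data)                      # 3 4
summary(data$marks)            # basic statistics of the new column
```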

Using R – Control structures

  • R provides all types of control structures: if-else, for, while, repeat, break, next, return.
  • Mainly used within functions/scripts.
x<-5
if(x > 7) { #if-else structure
  y<-TRUE
} else {
  y<-FALSE
}
y
## [1] FALSE
for(i in 1:10) #for loop
  print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Contd…

count<-0
while(count < 10) #while loop
  count<-count+1
count
## [1] 10
  • repeat is used to create an infinite loop. It is normally terminated through a call to break (return also exits it when the loop is inside a function).
  • next is used to skip an iteration in a loop.
  • return is used to return a value from a function.
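
repeat, break, next and return can be sketched as follows. The names odd_values and first_negative are made up for this example:

```r
i <- 0
odd_values <- c()
repeat {                       # loops forever unless break is reached
  i <- i + 1
  if (i > 10) break            # leave the loop
  if (i %% 2 == 0) next        # skip even numbers
  odd_values <- c(odd_values, i)
}
odd_values                     # 1 3 5 7 9

# return() leaves the function as soon as a negative value is found
first_negative <- function(v) {
  for (value in v) {
    if (value < 0) return(value)
  }
  NA                           # reached only when no negative value exists
}
first_negative(c(3, 5, -2, 7, -9))   # -2
```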

Using R – looping functions

  • These functions can be used to loop over various types of objects.
  • lapply – loops over a list and evaluates a function on each element.
  • sapply – same as lapply but tries to simplify the result.
  • apply – applies a function over the margins of an array.
  • tapply – applies a function over subsets of a vector.
x<-list(a=1:5,b=rnorm(20))
lapply(x,sum) #lapply returns a list
## $a
## [1] 15
## 
## $b
## [1] -0.8801658
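
For comparison, sapply on a similar list. This sketch uses fixed values instead of rnorm() so the result is the same on every run; res_list, res_vec and ranges are illustrative names:

```r
x <- list(a = 1:5, b = c(2, 4, 6))

res_list <- lapply(x, mean)    # a list: $a is 3, $b is 4
res_vec <- sapply(x, mean)     # simplified to a named numeric vector
class(res_list)                # "list"
class(res_vec)                 # "numeric"

ranges <- sapply(x, range)     # each result has length 2, so a 2 x 2 matrix comes back
ranges
```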

Contd…

x<-matrix(c(1,2,3,11,12,13), nrow=2, ncol=3,byrow=TRUE)
# MARGIN=1 for rows, MARGIN=2 for columns
apply(x,MARGIN=1,FUN=sum)
## [1]  6 36
y<-c(rnorm(20),runif(20),rnorm(20,1))
f<-gl(3,20) #generate factor levels as per given pattern
tapply(y,f,mean)
##          1          2          3 
## -0.2668254  0.5382292  0.9893389
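
The same functions with MARGIN=2 and with fixed data, as a sketch. The marks and group values are hypothetical. Since rnorm() and runif() draw random numbers, outputs like the ones above differ between runs unless set.seed() is called first:

```r
x <- matrix(c(1,2,3,11,12,13), nrow=2, ncol=3, byrow=TRUE)
col_sums <- apply(x, MARGIN=2, FUN=sum)     # column sums: 12 14 16
col_sums

marks <- c(60, 70, 80, 90, 50, 100)          # hypothetical values
group <- factor(c("A","A","B","B","C","C"))
group_means <- tapply(marks, group, mean)    # A 65, B 85, C 75
group_means

set.seed(42); first_draw <- rnorm(3)         # fixing the seed...
set.seed(42); second_draw <- rnorm(3)        # ...reproduces the same numbers
identical(first_draw, second_draw)           # TRUE
```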

Using R – Subsetting

  • Refers to extracting a subset of data from R objects.
  • Important while working with large datasets.
  • Several operators are available:
  • [ generally returns an object of the same class as the original; commonly used on vectors and matrices.
  • [[ extracts a single element from a list or data frame.
  • $ extracts an element from a list or data frame by name.
x<-c(1,2,3,4)
x[2]
## [1] 2
x[1:3]
## [1] 1 2 3
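
Besides positions, [ also accepts negative indices and logical vectors. A short sketch on the same vector:

```r
x <- c(1,2,3,4)

x[-1]                 # drop the first element: 2 3 4
x[x > 2]              # keep elements where the condition is TRUE: 3 4
x[c(TRUE, FALSE)]     # the logical index is recycled: 1 3
x[c(4, 1)]            # positions in any order: 4 1
```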

Contd…

  • Subsetting a matrix:
x<-matrix(c(1,2,3,11,12,13), nrow=2, ncol=3,byrow=TRUE)
x[1,2]
## [1] 2
x[1,]
## [1] 1 2 3
x[,2]
## [1]  2 12
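
Selecting a single row or column returns a plain vector, as the outputs above show. A sketch of keeping the matrix shape with drop = FALSE; row_vec, row_mat and two_cols are illustrative names:

```r
x <- matrix(c(1,2,3,11,12,13), nrow=2, ncol=3, byrow=TRUE)

row_vec <- x[1, ]                  # dimensions are dropped: a plain vector
row_mat <- x[1, , drop = FALSE]    # stays a 1 x 3 matrix
is.matrix(row_vec)                 # FALSE
dim(row_mat)                       # 1 3

two_cols <- x[, c(1, 3)]           # columns 1 and 3: a 2 x 2 matrix
two_cols
```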

Contd…

  • Subsetting a list:
x<-list(a=1,b="p",c=TRUE,d=2+4i)
x[[1]]
## [1] 1
x$d
## [1] 2+4i
x[["c"]]
## [1] TRUE
x["b"]
## $b
## [1] "p"
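
The difference between [ and [[ on a list is the class of the result. A short sketch on the same list; sub and name are illustrative names:

```r
x <- list(a=1, b="p", c=TRUE, d=2+4i)

class(x["b"])          # "list": [ returns a smaller list
class(x[["b"]])        # "character": [[ returns the element itself

sub <- x[c("a","c")]   # [ can pick several elements at once
length(sub)            # 2

name <- "d"
x[[name]]              # [[ works with a name held in a variable; $ does not
```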

Contd…

  • Subsetting a data frame
data[1,]
##   student_id student_names position
## 1          1           Ram    First
data$student_names #a factor here (default before R 4.0); a character vector in R >= 4.0 unless stringsAsFactors=TRUE
## [1] Ram    Shyam  Laxman
## Levels: Laxman Ram Shyam
data[data$position=="Second",]
##   student_id student_names position
## 2          2         Shyam   Second
  • Using logical ANDs and ORs
    data[data$student_id>=2 & data$position=="Third",]
    ##   student_id student_names position
    ## 3          3        Laxman    Third
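
Only the AND case is shown above. A sketch of the OR case, plus the base function subset() as an alternative way to write such filters; either and picked are illustrative names:

```r
data <- data.frame(student_id = c(1,2,3),
                   student_names = c("Ram","Shyam","Laxman"),
                   position = c("First","Second","Third"))

# OR: rows where either condition holds (rows 1 and 3)
either <- data[data$student_id == 1 | data$position == "Third", ]
either

# subset() filters rows and picks columns without repeating data$
picked <- subset(data, student_id >= 2, select = c(student_names, position))
picked
```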

Using R – Functions

  • Created using the function() directive.
  • Can be passed as arguments to other functions. Can be nested.
  • Return value is the last expression to be evaluated inside function body.
  • Have named arguments with default values.
  • Some arguments can be missing during function calls.
add<-function(a=1,b=2,c=3) {
  s<-a+b+c
  print(s)
}
add()
## [1] 6
add(10,11,12)
## [1] 33
add(10)
## [1] 15
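
The points about named arguments and passing functions to other functions can be sketched as follows. This variant of add returns the sum instead of printing it, and apply_twice is a made-up helper for illustration:

```r
add <- function(a=1, b=2, c=3) {
  a + b + c                    # last evaluated expression is the return value
}

add(c = 10)                    # only c supplied by name: 1 + 2 + 10 = 13
add(b = 0, a = 5)              # named arguments in any order: 5 + 0 + 3 = 8

# A function that takes another function as its first argument
apply_twice <- function(f, value) f(f(value))
apply_twice(function(v) v * 2, 3)    # (3 * 2) * 2 = 12

sapply(1:3, add)               # add(1), add(2), add(3): 6 7 8
```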

R Source files

  • Should be saved/created with .R extension.
  • Can be used to store functions, commands required to be executed sequentially etc.
  • source() function used to load such R scripts into R workspace.
source("C:/RDemo/test.R")
add()
## [1] 6

Contd…

source("C:/RDemo/test1.R", echo=TRUE)
## 
## > x <- 1
## 
## > y <- 2
## 
## > x + y
## [1] 3
source("C:/RDemo/test1.R", print.eval=TRUE)
## [1] 3
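
The contents of the scripts sourced above are not shown. A self-contained sketch that writes a small script to a temporary file and then sources it; the file name sketch_test.R and its contents are assumptions for illustration:

```r
# Hypothetical script written to the session's temporary directory
script_path <- file.path(tempdir(), "sketch_test.R")
writeLines(c("add <- function(a=1, b=2, c=3) a + b + c",
             "total <- add(10)"),
           script_path)

source(script_path)    # runs the file: defines add() and total
total                  # 10 + 2 + 3 = 15
```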

Best Practices for Using R Securely

If you download R (or R packages) using an unencrypted Internet connection, there is a possibility that a malicious actor could modify the code in transit (or substitute their own file), if they have access to the connection linking you and the CRAN server delivering the code. (This is possible, for example, when you download R using an unsecured Wi-Fi network.) This could potentially give an attacker the same rights you have to execute code on your system.

To eliminate the possibility of such an attack, the R Consortium recommends that all R users always download R and R packages over an encrypted HTTPS connection from a secure server.
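
One way to follow this recommendation for packages is to point the session at an HTTPS CRAN mirror before installing anything. A minimal sketch, assuming the cloud mirror at https://cloud.r-project.org; any other HTTPS mirror would work the same way:

```r
# Use an HTTPS CRAN mirror for this session
options(repos = c(CRAN = "https://cloud.r-project.org"))

cran_url <- getOption("repos")[["CRAN"]]
startsWith(cran_url, "https://")     # TRUE when the mirror uses HTTPS

# install.packages("reshape2")       # would now download over HTTPS
```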
