‘Strong Signal’ Stirs Interest in Hunt for Alien Life

A “strong signal” detected by a radio telescope in Russia that is scanning the heavens for signs of extraterrestrial life has stirred interest among the scientific community. The signal comes from the direction of HD164595, a star about 95 light-years from Earth.

AlphaGo: Possible repercussions and India

ET’s editorial of March 11, 2016 discusses AlphaGo and sketches some interesting scenarios. At one point it says: “An AI-run factory, goes a joke, employs just a man and a dog. The dog’s job is to keep the man away from the factory. Why have the man at all, in that case? Someone has to feed the dog.”
The same editorial outlines a possible scenario for India: “AI will enhance productivity and profits for all companies that can master it and deploy it. Much of India’s advanced IT services industry might get replaced by AI, unless industry itself deploys AI. Indian universities have to teach and advance AI in all its myriad forms. India’s human intelligence potential must be realized, for the Indian economy to benefit from AI rather than be its victim.”
Let’s wait and watch how it unfolds.

Data Handling with R

Overview

  • Raw Vs Processed data
  • Reading data into R
  • Pre-processing
  • Summary analysis
  • Useful data sources

Raw Vs Processed data

  • Raw data:
    • Original source of data
    • Comes in a wide variety of forms
    • Hard to use directly for data analysis.
    • Needs to be processed.
  • Processed data
    • Ready for data analysis.
    • Processing involves transforming, subsetting, merging etc.
    • Processing should be performed as per set standards.
    • All processing steps should be recorded.
  • Ingredients of data analysis pipeline:
    • Raw data
    • Tidy (processed) data ready for analysis
    • Codebook describing each variable and its values in tidy dataset, other variables not in dataset, summary choices, experimental study design etc.
    • Explicit and exact step-by-step approach for data analysis against said objectives.

Reading data into R

  • Downloading from Internet:
    • download.file() function: download.file(url="fileurl", destfile="filename", method="method-name")
    • methods: curl, wget, lynx, internal
    #fileurl<-"https://data.gov.in/resources/weekly-wholesale-price-turarhar-dal-upto-2012/download"
    #download.file(url=fileurl, destfile="tur-dal-price-upto-2012.csv", method="auto")
    #list.files("./")
  • Reading local files: Use read.table(), read.csv() functions.
    dataset<-read.csv("fdata.csv")
    class(dataset)
    ## [1] "data.frame"
    dim(dataset)
    ## [1] 14931   239
    • Important parameters: sep, header, quote, na.strings, nrows, skip.
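
The parameters listed above can be tried out on a small throwaway file. The following is a minimal sketch, assuming a hypothetical semicolon-separated file with one junk line at the top; the file contents and the `scores` name are invented for illustration.

```r
# Write a small semicolon-separated file to a temporary location (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("exported file, first line is not data",
             "id;name;score",
             "1;Ram;81",
             "2;Shyam;missing",
             "3;Laxman;77"), tmp)

# skip = 1 drops the junk line, header = TRUE reads the column names,
# sep = ";" sets the delimiter, na.strings marks "missing" as NA,
# and nrows caps the number of data rows read.
scores <- read.table(tmp, sep = ";", header = TRUE, skip = 1,
                     na.strings = "missing", nrows = 3,
                     stringsAsFactors = FALSE)
str(scores)

unlink(tmp)  # remove the temporary file
```

Setting nrows to a small value is also a cheap way to inspect the layout of a very large file before reading all of it.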

Contd…

  • R has many more methods for reading from other data sources such as Excel, XML, JSON, MySQL and PostgreSQL, as well as from the web and APIs.

Pre-processing

  • Subsetting revisited
    • Using logical ANDs and ORs
    student_id<-c(1,2,3)
    student_names<-c("Ram","Shyam","Laxman")
    position<-c("First","Second","Third")
    data<-data.frame(student_id,student_names,position) #using data.frame() function
    data[data$student_id>=2 & data$position=="Third",]
    ##   student_id student_names position
    ## 3          3        Laxman    Third
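
The example above uses a logical AND. A minimal sketch of the OR case follows, using a fresh copy of the same student table under the assumed name `students` so that it can be run on its own.

```r
# Rebuild the small student table (same values as the example above).
students <- data.frame(student_id = c(1, 2, 3),
                       student_names = c("Ram", "Shyam", "Laxman"),
                       position = c("First", "Second", "Third"))

# Logical OR: keep rows where either condition holds.
either <- students[students$student_id == 1 | students$position == "Third", ]
either

# subset() expresses the same filter without repeating the data frame name.
same <- subset(students, student_id == 1 | position == "Third")
same
```

Both calls keep the rows for Ram and Laxman. Note the single `|` and `&`, which work element by element; the doubled forms `||` and `&&` are meant for single TRUE/FALSE values.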

Contd…

  • Sorting and ordering: use the sort() and order() functions.
    sort(data$student_names)
    ## [1] Laxman Ram    Shyam 
    ## Levels: Laxman Ram Shyam
    data[order(data$student_names),]
    ##   student_id student_names position
    ## 3          3        Laxman    Third
    ## 1          1           Ram    First
    ## 2          2         Shyam   Second
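
Sorting in descending order and ordering by more than one column follow the same pattern. This is a minimal sketch on a hypothetical `marks` table; the names and scores are invented for illustration.

```r
# Hypothetical marks table.
marks <- data.frame(name = c("Ram", "Shyam", "Laxman", "Sita"),
                    year = c(2015, 2014, 2015, 2014),
                    score = c(81, 92, 77, 88))

# sort() returns the sorted values themselves.
sort(marks$score, decreasing = TRUE)

# order() returns row positions, so it can reorder a whole data frame.
# Here: year ascending, then score descending within each year.
ranked <- marks[order(marks$year, -marks$score), ]
ranked
```

Negating a numeric column inside order() is a common way to reverse the direction for that column only.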

Contd…

  • Handling missing values: NA – missing value, NaN – undefined mathematical expression (is.na() is TRUE for both, is.nan() only for NaN)
x<-c(1,2,NA,20,55,NaN)
#checking for NAs,NaNs
is.na(x)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
is.nan(x)
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE

Contd…

#removing NAs
bad<-is.na(x)
x[!bad]
## [1]  1  2 20 55
#taking subset with no missing values
good<-complete.cases(x) #returns a logical vector: TRUE where the case has no missing value
good
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
x[good]
## [1]  1  2 20 55
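
Missing values also affect computations and whole data frames. The following is a minimal sketch, assuming a hypothetical data frame `df` with gaps in two columns.

```r
v <- c(1, 2, NA, 20, 55, NaN)

# Missing values propagate through arithmetic unless removed explicitly.
mean(v)                 # result is missing
mean(v, na.rm = TRUE)   # mean of 1, 2, 20 and 55

# Hypothetical data frame with gaps in two columns.
df <- data.frame(id = 1:4,
                 height = c(150, NA, 165, 170),
                 weight = c(50, 60, NA, 72))

# complete.cases() on a data frame flags rows with no missing value in any column.
clean <- df[complete.cases(df), ]
clean

# na.omit() drops the same rows in one step.
clean2 <- na.omit(df)
```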

Contd…

  • Reshaping data: data often needs to be changed from one format to another.
    • Using reshape2 package:
    library(reshape2)
    ## Warning: package 'reshape2' was built under R version 3.0.3
    #converts to flat format, unique id-variable combination
    mdata<-melt(data)
    ## Using student_names, position as id variables
    mdata
    ##   student_names position   variable value
    ## 1           Ram    First student_id     1
    ## 2         Shyam   Second student_id     2
    ## 3        Laxman    Third student_id     3

Contd…

dcast(mdata, student_names~variable) #casts a molten data frame to a data frame or array
##   student_names student_id
## 1        Laxman          3
## 2           Ram          1
## 3         Shyam          2
split(data,data$student_id) #splits data into groups
## $`1`
##   student_id student_names position
## 1          1           Ram    First
## 
## $`2`
##   student_id student_names position
## 2          2         Shyam   Second
## 
## $`3`
##   student_id student_names position
## 3          3        Laxman    Third

Contd…

#adding a new variable
data$year<-c(2015,2015,2015)
data
##   student_id student_names position year
## 1          1           Ram    First 2015
## 2          2         Shyam   Second 2015
## 3          3        Laxman    Third 2015
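
New variables are often computed from existing ones rather than typed in. This is a minimal sketch on a hypothetical `marks` table; the pass mark of 50 and the grade cut-off of 90 are assumptions made for illustration.

```r
# Hypothetical marks table.
marks <- data.frame(name = c("Ram", "Shyam", "Laxman"),
                    score = c(81, 92, 47))

# Derive a logical column from an existing one.
marks$passed <- marks$score >= 50

# transform() adds columns using the data frame's own column names.
marks <- transform(marks, grade = ifelse(score >= 90, "A", "B"))
marks
```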

Contd…

  • Another important package for reshaping is plyr (split-apply-combine paradigm for R).
library(plyr)
## Warning: package 'plyr' was built under R version 3.0.3
#ddply() function - takes a data frame as input, returns a data frame
#the grouping column is given by name, so it is looked up inside the data frame
ddply(data,"student_id",count)
##   student_id student_names position year freq
## 1          1           Ram    First 2015    1
## 2          2         Shyam   Second 2015    1
## 3          3        Laxman    Third 2015    1
  • Merging – merge(), intersect() etc.
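
A minimal sketch of merging follows, assuming two hypothetical tables that share a `student_id` column; the `marks` table and its scores are invented for illustration.

```r
# Two hypothetical tables sharing the key column student_id.
students <- data.frame(student_id = c(1, 2, 3),
                       student_names = c("Ram", "Shyam", "Laxman"))
marks <- data.frame(student_id = c(2, 3, 4),
                    score = c(92, 77, 65))

# Inner join: only ids present in both tables.
inner <- merge(students, marks, by = "student_id")
inner

# Left join: keep every student, with NA where no score exists.
left <- merge(students, marks, by = "student_id", all.x = TRUE)
left

# intersect() returns the key values the two tables have in common.
common <- intersect(students$student_id, marks$student_id)
common
```

Setting all = TRUE instead of all.x = TRUE would keep unmatched rows from both tables.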

Summary analysis

  • Datasets are often very large, so it’s important to collect summary statistics.
dim(ToothGrowth) #dimensions of dataset
## [1] 60  3
head(ToothGrowth) #shows first part of dataset. try tail().
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Contd…

summary(ToothGrowth) #reports summary of dataset
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
str(ToothGrowth) #more information
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Contd…

#computes summary statistics on subsets of data
aggregate(ToothGrowth,by=list(ToothGrowth$dose),class)
##   Group.1     len   supp    dose
## 1     0.5 numeric factor numeric
## 2     1.0 numeric factor numeric
## 3     2.0 numeric factor numeric
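
The call above applies class() to each group, which only reports column types. As a minimal sketch of an actual per-group statistic, the formula interface of aggregate() can compute the mean tooth length for each dose; the names `by_dose` and `by_both` are chosen here for illustration.

```r
# Mean tooth length for each dose level.
by_dose <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
by_dose

# Mean tooth length for each combination of supplement and dose.
by_both <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
by_both
```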

Contd…

quantile(ToothGrowth$len, na.rm=TRUE) #quantiles
##     0%    25%    50%    75%   100% 
##  4.200 13.075 19.250 25.275 33.900
table(ToothGrowth$dose, ToothGrowth$supp) #tabulate data based on parameters
##      
##       OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10
object.size(ToothGrowth) #size of dataset
## 2568 bytes

Contd…

mean(ToothGrowth$len) #mean
## [1] 18.81333
median(ToothGrowth$len) #median
## [1] 19.25
var(ToothGrowth$len) #variance
## [1] 58.51202
sd(ToothGrowth$len) #standard deviation
## [1] 7.649315

Contd…

range(ToothGrowth$len) #range
## [1]  4.2 33.9
  • Some other important functions to try: xtabs(), ftable(), prop.table(), margin.table() etc.
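
A minimal sketch of those functions on the same ToothGrowth dataset; the name `counts` is chosen here for illustration.

```r
# Cross-tabulate dose against supplement using a formula.
counts <- xtabs(~ dose + supp, data = ToothGrowth)
counts

# Row proportions: each row sums to 1.
prop.table(counts, margin = 1)

# Column totals.
margin.table(counts, margin = 2)

# Flat, print-friendly version of the same table.
ftable(counts)
```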

Simulation, sequencing and sampling

  • Simulation: Useful for drawing inferences from data analysis
    • Functions for probability (normal) distribution: rnorm(), dnorm(), pnorm(), qnorm()
    • r – random number generation, d – density, p – cumulative distribution, q – quantile
set.seed(3) #sets random no. seed
x<-rnorm(5)
x
## [1] -0.9619334 -0.2925257  0.2587882 -1.1521319  0.1957828
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.1520 -0.9619 -0.2925 -0.3904  0.1958  0.2588
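
The other three prefixes can be illustrated in the same way. This is a minimal sketch for the standard normal distribution (mean 0, standard deviation 1, the defaults); the variable names are chosen here for illustration.

```r
# d: density of the standard normal at 0, which equals 1 / sqrt(2 * pi).
dens <- dnorm(0)

# p: cumulative probability P(X <= 1.96).
prob <- pnorm(1.96)

# q: the quantile function is the inverse of p.
quant <- qnorm(0.975)

c(dens, prob, quant)

# Resetting the seed reproduces exactly the same random draws.
set.seed(3)
first <- rnorm(5)
set.seed(3)
second <- rnorm(5)
identical(first, second)
```

Fixing the seed is what makes a simulation reproducible from one run to the next.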

iPhone 6 Review: Meet The New Best Smartphone

TechCrunch

Apple has two new iPhones debuting today, including the iPhone 6. The iPhone 6 is the heir apparent to the flagship line of Apple smartphones, as it comes in at the same price point as the iPhone 5s, but Apple has done something new this year by introducing a premium priced iPhone 6 Plus. The iPhone 6 is still plenty premium, however, and its 4.7-inch screen is likely going to be a better fit for most users, which is why it earns our vote as the best smartphone currently available.

Basics

  • 4.7-inch, 1334 x 750 display, 326 ppi with 1400:1 contrast
  • 16, 64 or 128GB storage
  • A8 processor (64-bit)
  • 8MP iSight camera (rear) with 1.5 micron pixels, 1.2 megapixel FaceTime camera (front)
  • Dual-band 802.11ac Wi-Fi
  • 20-band LTE support
  • MSRP: 16GB for $199 on contract/$649 contract free; 64GB for $299 on contract/$749 contract free; 128GB for $399 on…

iOS 8 Review: Refinements And Relaxed Limitations Add Up For A Better Experience

TechCrunch

Apple’s iOS 8 is arriving tomorrow, and while it isn’t as overtly dramatic a change as iOS 7 was last year, it’s still a big update with lots of new features and tweaks. Using it on the new iPhone 6 hardware revealed lots to love in the new mobile OS from Apple, some easing of restrictions that could lead to big advantages for third-party apps, and a lot of potential to change the basic mechanics of the iOS ecosystem.

Messages

Apple’s new Messages app in iOS 8 more closely resembles the various messaging networks that have sprung up, and that’s a very good thing, because it means you get access to some fun features, and as long as you’re chatting with someone who already has an iPhone or iPad capable of running iOS 8, they’re also already on board without any kind of download or sign-up.

The new features in…

iPhone 6 Plus Review: The First Truly Well-Designed Big Smartphone

TechCrunch

Apple is launching not one, but two premium smartphones today, and the iPhone 6 Plus is the one many probably were skeptical even existed just a few short months ago. With a screen size measuring 5.5-inches across the diagonal, it’s well into the territory labeled “phablet” on the ancient sea charts of mariners who’ve braved the Android waters. However, Apple’s version of a smartphone that strains the inclusion of “phone” in any word describing it might surprise even those dead set against the trend toward ever-bigger mobile screens.

Basics

  • 5.5-inch, 1920 x 1080 display, 401 ppi with 1300:1 contrast
  • 16, 64 or 128GB storage
  • A8 processor (64-bit)
  • 8MP iSight camera (rear) with 1.5 micron pixels and optical image stabilization, 1.2 megapixel FaceTime camera (front)
  • Dual-band 802.11ac Wi-Fi
  • 20-band LTE support
  • MSRP: 16GB for $299 on contract/$749 contract free; 64GB for $399 on contract/$849 contract free; 128GB…
