R is a tool for statistics and data modeling. The R programming language is elegant, versatile, and has a highly expressive syntax designed around working with data. R is more than that, though — it also includes extremely powerful graphics capabilities.
变量
定义: x<-42,x赋值为42
向量(Vectors)
定义: c(1,T,”three”)
Sequence Vectors
输入:seq(5,9)或5:9
输出:[1] 5 6 7 8 9
设置步长-increments
1 | >seq(5,9,0.5) |
向量访问
1 | > sentence <- c('walk', 'the', 'plank') |
索引访问:R索引从1开始
1
2> sentence[3]
[1] "plank"定义向量访问:
1
2> sentence[c(1,3)]
[1] "walk" "dog"范围访问:
1
2> sentence[2:4]
[1] "the" "dog" "to"修改向量值
- 单个修改
1
> sentence[3]<-"dog"
- 批量修改
1
> sentence[5:7]<-c('the','poop','deck')
- 新增向量
1
sentence[4]<-"to"
绘图
barplot直方图1
2> vesselsSunk <- c(4, 5, 1)
> barplot(vesselsSunk)
scatter plots散点图1
2
3> values <- -10:10
> absolutes <- abs(values)
> plot(values, absolutes)
NA
1 | > a <- c(1, 3, NA, 7, 9) |
matrix
定义:3行4列,值都为0
matrix(0,3,4)
升维:dim设置matrix维度1
2
3
4
5
6> plank <- 1:8
> dim(plank) <- c(2,4)
> print(plank)
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
赋值:> plank[1,4] <- 0
matrix访问
1 | > print(plank) |
- 单个元素
1
2> plank[2,3]
[1] 6 列
1
2> plank[,4]
[1] 7 8指定范围
1
2
3
4> plank[,2:4]
[,1] [,2] [,3]
[1,] 3 5 7
[2,] 4 6 8
绘图
1 | > elevation <- matrix(1, 10, 10) |
轮廓图
1
> contour(elevation)
三维轮廓图
1
2> persp(elevation)
> persp(elevation, expand=0.2)
统计函数
Mean 平均值
1 | limbs <- c(4, 3, 4, 3, 2, 4, 4, 4) |
Median 中位数
1 | > limbs <- c(4, 3, 4, 3, 2, 4, 4, 14) |
It may be factually accurate to say that our crew has an average of 4.75 limbs, but it’s probably also misleading.
For situations like this, it’s probably more useful to talk about the “median” value.
The median is calculated by sorting the values and choosing the middle one (for sets with an even number of values, the middle two values are averaged).
1 | > median(limbs) |
Standard Deviation 标准偏差
Statisticians use the concept of “standard deviation” from the mean to describe the range of typical values for a data set. For a group of numbers, it shows how much they typically vary from the average value. To calculate the standard deviation, you calculate the mean of the values, then subtract the mean from each number and square the result, then average those squares, and take the square root of that average.
Data Frames
R has a structure for just this purpose: the data frame. You can think of a data frame as something akin to a database table or an Excel spreadsheet. It has a specific number of columns, each of which is expected to contain values of a particular type. It also has an indeterminate number of rows - sets of related values for each column.
1 | > treasure <- data.frame(weights, prices, types) |
数据访问
1 | > treasure[[2]] |
文件IO
1 | > piracy <- read.csv("piracy.csv") |
数据分析
cor.test
R can test for correlation between two vectors with the cor.test function.1
> cor.test(countries$GDP, countries$Piracy)
lm
if we calculate the linear model that best represents all our data points (with a certain degree of error).
The lm function takes a model formula, which is represented by a response variable (piracy rate), a tilde character (~), and a predictor variable (GDP). (Note that the response variable comes first.)
1 | > line <- lm(countries$Piracy ~ countries$GDP) |
扩展安装
install.packages(“ggplot2”)