R语言基础编程技巧汇编 - 16

时间：2015-04-06 08:59:44 阅读：287 评论：0 收藏：0 [点我收藏+]

标签：r语言数据分析数据挖掘机器学习

1. 利用magrittr包进行管道操作

很多情况下，管道操作符可以很大程度的简化代码，并且使其更加直观、易读、易懂，下面就简单说明了useR2014上颇受R用户喜爱的magrittr包。

the pipe operatoris one (if not THE) most important innovation introduced, this year, to the Recosystem

Intro

类似于linux中的|，可以把前一个表达式（函数）的输出（而不用定义中间变量来表示）

直接传递给后面的函数或者表达式，省去不必要中间变量或者括号,代码更加清晰易读。

magrittr提供了一系列的管道操作符，如%>%,x %>% f等价于f(x)。

2014年useR会议介绍magrittr之后，有人评价：the pipe operator is one (if

not THE) mostimportant innovation introduced, this year, to the R ecosystem

Basics

Produce values

通过使用管道操作符，可以从语义上改变我们写代码的方式，使代码更加简洁易读,管道操作符的功能为

· 默认情况下，左边的结果默认作为右边函数的第一个参数，可省略，如下面的transform

· as.Data和format.

· %>%可以以嵌套的方式使用，例如它可以作用于表达式来计算函数的参数，如

· Date = paste(1973, Month, Day, sep ="-") %>% as.Date

· 如果不是传递作为第一个参数用.表示，如aggregate(. ~ Date%>% format("%W"), ., mean)

· 公式中的.并不影响管道操作符，如aggregate(. ~ Date%>% format("%W"), ., mean)

· 如果传递的右边函数只需要一个参数，那么可以省略参数和括号，head,当然head(.)和head()也可以

· 管道操作符左边的函数（包含从上一步传递过来的.)为一元函数如aggregate(.~ Date %>%format("%W"), ., mean)

 library(magrittr)

 weekly <- airquality %>% transform(Date = paste(1973, Month, Day, sep = "-") %>% as.Date) %>% aggregate(. ~ Date %>% format("%W"), .,mean) %>% head

上面代码包括三个部分，输入(airquality)，一系列的数据转换(transform,aggregate)和输出(weekly)，类似函数的定义,所以它可以看成是一个函数的定义和调用的过程，容易读写和理解。当然你也可以不用%>%

    weekly <- aggregate(. ~ format(Date,"%W"),  transform(airquality, Date = as.Date(paste(1973, Month, Day, sep = "-"))), mean)

显然这种写法可读性较差，难于理解，含有多对圆括号，更不利于别人的阅读；或者使用中间变量来避免圆括号的使用，也不如利用管道操作符容易理解。此外，如果想在代码中间添加新的计算，使用管道操作显然特别方便。

Produce functions

此外，利用%>%也可以构造简单的函数，与其基本用法其实是一样的（仅仅是基本用法定义函数的时候即时调用返回结果），在构造函数的时候没有输入变量，用.替代输入变量即成功构造了一个函数

    mae <- . %>% abs %>% mean(na.rm = TRUE)

    mae(rnorm(10))

    ## [1] 0.949347

    ##等价于

    mae <- function(x) {

      mean(abs(x), na.rm = TRUE)

匿名函数和lambda表达式

   # 标准的函数定义

    mtcars %>% (function(x) {

      if (nrow(x) > 2)

        rbind(head(x, 1), tail(x, 1))

      else x

})

    ##            mpg cyl disp  hp drat   wt  qsec vs am gear carb

    ## Mazda RX4  21.0   6  160 110 3.90 2.62 16.46  0  1    4    4

    ## Volvo 142E 21.4   4  121 109 4.11 2.78 18.60  1  1    4    2

    # lambda表达式，一元函数的参数用<code>.</code>表示

    mtcars %>%

      if (nrow(.) > 0)

        rbind(head(., 1), tail(., 1))

      else .

    ##             mpg cyl disp  hp drat   wt  qsec vs am gear carb

    ## Mazda RX4  21.0   6  160 110 3.90 2.62 16.46  0  1    4    4

    ## Volvo 142E 21.4   4  121 109 4.11 2.78 18.60  1  1    4    2

右边的匿名函数用括号包装起来，括号在管道操作符产生作用前优先计算右边括起来的表达式或者函数,这时候默认情况下第一个参数应该为省略的.就不起作用了。

    1:5 %>% {paste(letters[.])}

    ## [1] "a" "b" "c" "d" "e"

此外匿名函数最有用的就是可以用于*pply系列函数

   list(1,2,3) %>% sapply(. %>% length)

    ## [1] 1 1 1

嵌套的函数调用

.在管道操作符的右边可以嵌套调用

  1:5 %>% paste(letters[.])

    ## [1] "1 a" "2 b" "3 c" "4 d" "5 e"

    # 等价于

    1:5 %>% paste(.,letters[.])

    ## [1] "1 a" "2 b" "3 c" "4 d" "5 e"

Advances

magrittr提供了三个其他的，辅助的管道操作符，在某些特定情况下，它们使我们更加方便的实现各种操作

1. %T>%和%>%类似，只是它返回管道操作符左侧的值，通常用于流程中产生其他副作用的步骤(临时的，不改变左侧的结果继续传递给下一个步骤），如(print,plot, logging, etc)

   rnorm(200) %>%

    matrix(ncol = 2) %T>%

    plot %>% # plot usually does not return anything.

    colSums

    ## [1] -9.25270 -7.98688

    ## 其实是下面代码的简写形式

    # plot usually does not return anything.

       rnorm(200) %>% matrix(ncol = 2) %>% {plot(.);.} %>% colSums

     ## [1] 11.197969  7.683612

2. %$%使左侧数据中变量的名称加载到流程的环境中方便我们直接提取，显然它等价于函数with

    mtcars %$% plot(mpg, wt)

3. %<>%必须是流程中的第一个管道操作符，最后的计算结果赋值给其左边变量，它主要作用是把形如foo\<- foo %\>% bar%\>% baz的计算流程简化为foo %\<\>% bar %\>%baz\`

此外为了使R中的基本操作符与管道操作更友好的结合起来，为这些操作符提供了别名：

<blockquote>extract [

extract2 [[

use_series $

add +

subtract -

multiply_by *

raise_to_power ^

multiply_by_matrix %*%

divide_by /

divide_by_int %/%

mod %%

and &

or |

equals ==

is_greater_than >

is_weakly_greater_than >=

is_less_than <

is_weakly_less_than <=

not !

set_colnames colnames<-

set_rownames rownames<-

set_names names<-

</blockquote>

    rnorm(100) %>% <code>*</code>(5) %>% <code>+</code>(5) %>%

      cat("Mean:", mean(.), "Variance:", var(.),  "\n")

      head(.)

    ## Mean: 4.785676 Variance: 23.8642

    ## [1]  7.806526 -4.416972 -1.273823  2.976003  7.568857 12.097451

    # 可以写为

    rnorm(1000)    %>%

    multiply_by(5) %>%

    add(5)         %>%

       cat("Mean:", mean(.),

           "Variance:", var(.), "\n")

       head(.)

    ## Mean: 4.982161 Variance: 26.18578

    ## [1]  0.06189078  8.67467189  6.37717763  6.09563550 -0.65720166  4.04583594

2. 判断字符串内的字母个数

library(data.table)

sum(between(charToRaw("abc1234xyz"),charToRaw("a"), charToRaw("z")))

3. 为图形加上边框

attach(mtcars)

opar <- par(no.readonly=TRUE)

par(mar=c(3, 4, 0, 2)+1, oma=c(2, 2, 2, 2))

plot(wt, mpg,

xlab="Miles Per Gallon",

ylab="Car Weight")

box(which="figure", lty=2,col="red")

box(which="inner", lty=2,col="green")

box(which="outer", lty=2,col="yellow")

box(which="plot",col="blue")

mtext("plot", side=3, line=-0.5,col="red")

mtext("figure/inner", side=3,line=-0.5, col="red", outer=TRUE)

mtext("outer", side=3, line=1.3,col="red", outer=TRUE)

par(opar)

detach(mtcars)

技术分享

4. 生成正交表

gen.orthogonal.design <-function(listFactors,numCards){

library(AlgDesign)

FactorsNames<-c("A","B","C","D","E","F","G","H","J","K","L")

numFactors<-length(listFactors)

dat<-gen.factorial(listFactors,center=FALSE,varNames=FactorsNames[1:numFactors])

desPB<-optFederov(~.,dat,nRepeats=20,approximate=FALSE,nTrials=numCards)

design<-desPB$design#[,2:(numFactors+1)]

cat("Number of trials: ", fill=TRUE, length(design[,1]),append=TRUE)

print(cor(design))

return(design)

}

design.test <-gen.orthogonal.design(c(2,2,3,3,3,3,2,2),numCards=16)

design.test

Number of trials: 16

A B C D E F G H

A 1 0 0 0 0 0 0 0

B 0 1 0 0 0 0 0 0

C 0 0 1 0 0 0 0 0

D 0 0 0 1 0 0 0 0

E 0 0 0 0 1 0 0 0

F 0 0 0 0 0 1 0 0

G 0 0 0 0 0 0 1 0

H 0 0 0 0 0 0 0 1

design.test

A B C D EF G H

106 2 1 3 3 3 1 1 1

218 2 1 1 1 1 3 1 1

219 1 2 1 1 1 3 1 1

324 2 2 3 3 3 3 1 1

351 1 2 1 3 1 1 2 1

400 2 2 1 1 3 1 2 1

405 1 1 3 1 3 1 2 1

573 1 1 3 3 1 3 2 1

657 1 1 3 1 1 1 1 2

660 2 2 3 1 1 1 1 2

747 1 2 1 3 3 1 1 2

961 1 1 1 3 3 3 1 2

998 2 1 1 3 1 1 2 2

1224 2 2 3 3 1 3 2 2

1262 2 1 1 1 3 3 2 2

1271 1 2 3 1 3 3 2 2

5. 趣味实现:调用金山词霸网络版

ciba<-function(x){

link=url(paste(‘http://dict.youdao.com/m/search?keyfrom=dict.mindex&vendor=&q=‘,iconv(x,to=‘UTF-8‘)),encoding=‘UTF-8‘)

readLines(link)->a

gsub(‘(<[^<>]*>)|(^ )|(\t)‘,‘‘,a)->a;gsub(‘ {2,}‘,‘‘,a)->a

head(a,-11)->a;tail(a,-35)->a;a[a!=‘‘]->a

paste(a,collapse=‘\n‘)->a

gsub(‘(\n *){2,}‘,‘\n‘,a)->a;gsub(‘ *\n *‘,‘\n‘,a)->a

cat(a)

}

> ciba(‘china‘)

china

[‘t?ain?]

. 瓷器

adj

. 瓷制的

网络释义

1. China: 中国 | 中华人民共和国 | 瓷器 | 瓷料

2. Air China: 中国国际航空公司 | 中国国际航空 | 中国国航 | 中国航空

3. Mainland China: 中国大陆 | 大陆中国 | 大陆地区 | 中国内地

更多释义

例句

1.This ship was made in China.这艘轮船是中国制造的。dict.yoduao.com

2.We set sail from China for Japan.我们从中国启航驶往日本。dict.yoduao.com

3.This car is made in China.这辆汽车是中国制造的。www.rmbzj.com

> ciba(‘中国‘)

中国

[zhōng guó]

China

网络释义

1. 中国: China | CN | Chinese | CHN

2. 中国结: Chinese knot | Chinese knotting | The artof chinese knots | Korean knot

3. 中国石化: SINOPEC | China Petroleum &amp;Chemical | China Petroleum &amp; Chemical Corporation | China Petroleum

更多释义

例句

1.这辆汽车是中国制造的。This car is made in China.www.rmbzj.com

2.这艘轮船是中国制造的。This ship was made in China.www.rmbzj.com

3.我们从中国启航驶往日本。We set sail from China forJapan.www.rmbzj.com

6. sequence函数生成序列

sequence函数对向量中每一个数n，都生成一个1:n的序列，如下例：

y <- c(1,2,3,4,5)

sequence(y)

[1] 1 1 2 1 2 3 1 2 3 4 1 2 3 4 5

7. R6：新的面向对象语法

一直觉得 S3/S4 挺反直觉的，刚知道还有这个：R6

http://cran.r-project.org/web/packages/R6/

http://rpubs.com/wch/24456

摘抄部分代码如下：

library(R6)

Person <- R6Class("Person",

public = list(

name = NA,

hair = NA,

initialize = function(name, hair) {

if (!missing(name)) self$name <- name

if (!missing(hair)) self$hair <- hair

self$greet()

set_hair = function(val) {

self$hair <- val

greet = function() {

cat(paste0("Hello, my name is ", self$name, ".\n"))

}

)

ann <- Person$new("Ann","black")

#> Hello, my name is Ann.

ann

#> <Person>

#> Public:

#> greet: function

#> hair: black

#> initialize: function

#> name: Ann

#> set_hair: function

ann$hair

#> [1] "black"

ann$greet()

#> Hello, my name is Ann.

ann$set_hair("red")

ann$hair

#> [1] "red"

8. pdf中显示中文字符

pdf.options(family="GB1") #繁体字的话family="CNS1"

pdf(file="1.pdf",width=10)

plot(1,main="哈哈")

dev.off()

pdf.options(reset=TRUE)

技术分享

9. 实现序列左移右移的函数

shift <- function(x, k = 1, fill = 0) {

if (abs(k) > length(x))

rep(fill, length(x))

else

cbind(fill, x, fill)[seq_along(x) + length(x) - k]

}

shift(1:8)

#[1] 0 1 2 3 4 5 6 7

shift(1:8, 5)

#[1] 0 0 0 0 0 1 2 3

shift(1:8, 7)

#[1] 0 0 0 0 0 0 0 1

shift(1:8, 0)

#[1] 1 2 3 4 5 6 7 8

shift(1:8, -1)

#[1] 2 3 4 5 6 7 8 0

shift(1:8, -5)

#[1] 6 7 8 0 0 0 0 0

shift(1:8, 10)

#[1] 0 0 0 0 0 0 0 0

shift(1:8, -10)

#[1] 0 0 0 0 0 0 0 0

10. 让R启动时就自动加载指定文件夹下的R程序

经常编好一些程序，需要在运行R时就直接载入这些函数，在R安装目录的etc\Rprofile.site下可以直接将函数放进去或者source()一下，我觉得这样做还是有些麻烦，干脆将需要载入的程序直接放在某一个文件夹下（我的是F:\\My Documents\\R），然后在etc\Rprofile.site中加入如下代码（根据自己的文件夹修改path参数），即可成功载入该文件下所有的.r文件。

setwd("F:\\My Documents\\R")

myFun <-lapply(list.files(path="F:\\My Documents\\R", pattern=‘\\.r$‘),source)

11. **和^都是幂运算符

system.time(for(i in seq(1,120,0.00001))i**i)

# 用户系统流逝

# 18.196 0.454 18.933

system.time(for(j in seq(1,120,0.00001))j^j)

# 用户系统流逝

# 17.922 0.624 18.682

理论上**还是比^要差一点点, 在base::Arithmetic中找到了这样一个Note, 在解析时**是先转换成^再算的:

Note

** is translated in the parser to ^, butthis was undocumented for many years. It appears as an index entry in Becker etal (1988), pointing to the help for Deprecated but is not actually mentioned onthat page. Even though it had been deprecated in S for 20 years, it was stillaccepted in R in 2008.

12. 常见的时间格式

这些格式用于strptime函数中，可以对时间字符串进行格式化。

格式说明

%a Abbreviated weekday name

%A Full weekday name

%b Abbreviated month name

%B Full month name

%c Date and time, locale-speci?c.

%d Day of the month as decimal number(01-31).

%H Hours as decimal number (00-23).

%I Hours as decimal number (01-12).

%j Day of year as decimal number (001-366).

%m Month as decimal number (01-12).

%M Minute as decimal number (00-59).

%p AM/PM indicator in the locale.

Used in conjuction with ’%I’ and *not* with’%H’.

%S Second as decimal number (00-61),allowing for up to two leap-seconds

%U Week of the year as decimal number(00-53)

using the ?rst Sunday as day 1 of week 1.

%w Weekday as decimal number (0-6, Sundayis 0).

%W Week of the year as decimal number(00-53)

using the ?rst Monday as day 1 of week 1.

%x Date, locale-speci?c.

%X Time, locale-speci?c.

%y Year without century (00-99).

If you use this on input, which century youget is system-speci?c. So don’t!

Often values up to 69 (or 68) are pre?xedby 20 and 70-99 by 19.

%Y Year with century.

%z (output only.) Offset from Greenwich, so’-0800’ is 8 hours west of Greenwich.

%Z (output only.) Time zone as a characterstring (empty if not available).

%F Equivalent to %Y-%m-%d (the ISO 8601date format).

%g The last two digits of the week-basedyear (see ’%V’).

%G The week-based year (see ’%V’) as adecimal number.

%u Weekday as a decimal number (1-7, Mondayis 1).

%V Week of the year as decimal number(00-53).

If the week (starting on Monday) containing1 January

has four or more days in the new year, thenit is considered week 1.

Otherwise, it is the last week of theprevious year,

and the next week is week 1.

%D Locale-speci?c date format such as’%m/%d/%y’.

%k The 24-hour clock time with singledigits preceded by a blank.

%l The 12-hour clock time with singledigits preceded by a blank.

%n Newline on output, arbitrary whitespaceon input.

%r The 12-hour clock time (using thelocale’s AM or PM).

%R Equivalent to ’%H:%M’.

%t Newline on output, arbitrary whitespaceon input.

%T Equivalent to ’%H:%M:%S’.

13. 打印积分符号

plot(c(1, 1))

text(1.5, 1, expression(integral(f(x)*dx,a, b)))

技术分享

14. 同时绘制多幅图

要同时画多个图就要设置par中mfrow或mfcol参数，假设画四个图，可以如下操作：

par(mfrow=c(2,2))

plot(1:10)

plot(2:11)

plot(3:12)

plot(4:13)

技术分享

注意别忘了恢复原来的设置，比如：

par(mfrow=c(1,1))

15. 日期相减

D1 <- as.Date(d1,"%m/%d/%y")

D2 <- as.Date(d2,"%m/%d/%y")

difftime(D1,D2,units = "days")

#or

D1 - D2

R语言基础编程技巧汇编 - 16

标签：r语言数据分析数据挖掘机器学习

原文地址：http://blog.csdn.net/liu7788414/article/details/44890969

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行