24

What's your favorite one-liner in R?

Include a short companion example, and limit to one tip per post, please. Note, ; is cheating.

Example: calculate x[i] / x[i-1] for a vector x,

x <- 1:10
Reduce("/", as.data.frame(embed(x, 2)))

(collected from R-help, I forget who/when)
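To see why this works, it may help to look at what embed() returns. A minimal sketch, assuming the same x as above (the names lagged, ratios and direct are only for illustration):

```r
x <- 1:10

# embed(x, 2) builds a two-column matrix:
# column 1 holds x[2:10], column 2 holds x[1:9]
lagged <- embed(x, 2)

# Reduce("/", ...) divides the first column by the second, i.e. x[i] / x[i-1]
ratios <- Reduce("/", as.data.frame(lagged))

# The same ratios computed by plain vector indexing, for comparison
direct <- x[-1] / x[-length(x)]
all.equal(ratios, direct)
```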

Edit: after some initial controversy, it looks like the question is now reopened for entries.

Brad Larson
baptiste
  • 2
    Too bad, I really would've liked to see what else people come up with. Yours was really cool. Thanks! – Vincent Jun 06 '11 at 00:42
  • 4
    Why is this subjective and argumentative? It's either one line or it isn't, and it does not seem argumentative to me either. – G. Grothendieck Jun 06 '11 at 00:58
  • 1
    @G.Grothendieck: What's the question? What's _your_ favorite one-liner in R? It's a poll, it's not a question that could objectively be answered. – Jeff Mercado Jun 06 '11 at 01:23
  • 3
    Out of curiosity, had I phrased it "I'm looking for a concise piece of R code, limited to one line, yet providing a most elaborate example of R's functional programming paradigm in manipulating data."; would *that* have *fit in*? – baptiste Jun 06 '11 at 01:38
  • 13
    The most upvoted question in the R label is "What statistics should a programmer (or computer scientist) know?" and the second most upvoted one is "What is the most useful R trick?". If they are acceptable then I think this one should be too. – G. Grothendieck Jun 06 '11 at 01:54
  • 3
    Now that this question is open again, it should be CW because you won't get a single best answer – Sacha Epskamp Jun 06 '11 at 10:09
  • 3
    Voted to close, not a real question. Please put this on your blog instead. – user7116 Jun 06 '11 at 16:41
  • Am I the only one perplexed that regular R users don't seem to mind this unreasonable question? – baptiste Jun 08 '11 at 07:17
  • 1
    @sixlettervariables, @BlueRaja, @KirkWoll, @dmckee, @Graviton [Very similar question in javascript tag](http://stackoverflow.com/questions/472644/javascript-collection-of-one-line-useful-functions) and no closing votes. So why close this one? (more examples in [tips-and-tricks](http://stackoverflow.com/questions/tagged/tips-and-tricks) and [one-liner](http://stackoverflow.com/questions/tagged/one-liner) tags). – Marek Jun 08 '11 at 15:44
  • @Marek: thank you for bringing those to my attention. However, some are old, thus "grandfathered". Others are on topic, others are not and I'll vote to close them. – user7116 Jun 08 '11 at 15:56
  • 3
    Dear officious intermeddlers; I know you all are trying to do God's work by closing this question. However the [r] community on Stack Overflow likes this question quite a lot. Would you mind backing down from your religious zealotry on this one? Thanks. – JD Long Jun 08 '11 at 16:30

17 Answers

14

If you want to record the time that you created a file in its name (perhaps to make it unique, or prevent overwriting), then try this one-line function.

timestamp <- function(format = "%y%m%d%H%M%S")
{
  strftime(Sys.time(), format)
}

Usage is, e.g.,

write.csv(
   some_data_frame, 
   paste("some data ", timestamp(), ".csv", sep = "")
)
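A self-contained variant of the usage above, as a rough sketch: it writes to a temporary directory and uses a built-in data set so it can be run as is (out_file and the file name prefix are just illustrative choices):

```r
# Same helper as in the answer: current time formatted as yymmddHHMMSS
timestamp <- function(format = "%y%m%d%H%M%S") strftime(Sys.time(), format)

# Build a time-stamped file name inside the session's temporary directory
out_file <- file.path(tempdir(), paste0("some_data_", timestamp(), ".csv"))

# Write a few rows of a built-in data set to that file
write.csv(head(iris), out_file, row.names = FALSE)
file.exists(out_file)
```

Note that defining a function called timestamp masks utils::timestamp in the current session.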
Richie Cotton
9

Get odd or even indices.

odds <- function(x) seq_along(x) %% 2 > 0
evens <- function(x) seq_along(x) %% 2 == 0

Usage is, e.g.,

odds(1:5)
evens(1:5)
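Both helpers return logical vectors, so they are most useful as subsetting masks. A small sketch (the vector v is made up for illustration):

```r
odds <- function(x) seq_along(x) %% 2 > 0
evens <- function(x) seq_along(x) %% 2 == 0

v <- c("a", "b", "c", "d", "e")

# Keep the elements at positions 1, 3, 5
odd_items <- v[odds(v)]

# Keep the elements at positions 2, 4
even_items <- v[evens(v)]
```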
Richie Cotton
6

I often need fake data to illustrate, say, a regression problem. Instead of

X <- replicate(2, rnorm(100))
y <- X[,1] + X[,2] + rnorm(100)
df <- data.frame(y=y, X=X)

we can use

df <- transform(X <- as.data.frame(replicate(2, rnorm(100))), 
                y = V1+V2+rnorm(100))

to generate two uncorrelated predictors associated with the outcome y.

chl
6

Replacing NaNs with NA in a vector or data frame (NaNs are a nuisance every once in a while; found some time ago on R-help)

is.na(x) <- is.na(x)

Example:

> x <- c(1, NaN, 2, NaN, 3, NA)
> is.na(x) <- is.na(x)
> x
[1]  1 NA  2 NA  3 NA
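The trick relies on is.na() being TRUE for both NA and NaN, whereas is.nan() is TRUE only for NaN. A short sketch of that distinction, using the same x as in the example:

```r
x <- c(1, NaN, 2, NaN, 3, NA)

# is.nan() flags only the NaN entries; is.na() flags NaN and NA alike
nan_before <- sum(is.nan(x))
na_before <- sum(is.na(x))

# Assigning NA at every position where is.na() is TRUE turns the NaNs into NAs
is.na(x) <- is.na(x)

nan_after <- sum(is.nan(x))
na_after <- sum(is.na(x))
```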
Mark Heckmann
5

Convert Excel dates to R dates. Answer adapted from code by Paul Murrell.

excel_date_to_r_date <- function(excel_date, format)
{
  #excel_date is the number of days since the 0th January 1900.  See
  #http://www.stat.auckland.ac.nz/~paul/ItDT/HTML/node67.html
  strftime(as.Date(as.numeric(excel_date) - 2, origin = "1900-01-01"), format)
}

Usage is, e.g.,

excel_date_to_r_date(40700, "%d-%m-%Y")
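A quick sanity check on the "minus 2", assuming Excel's Windows (1900) date system and serial numbers later than February 1900: one day is lost because serial 1 is 1 January 1900, and another because Excel counts a 29 February 1900 that never existed. Using 1899-12-30 as the origin should therefore give the same date (the variable names are illustrative):

```r
excel_serial <- 40700

# The conversion used in the answer
via_answer <- as.Date(excel_serial - 2, origin = "1900-01-01")

# Equivalent formulation with the offset folded into the origin
via_1899 <- as.Date(excel_serial, origin = "1899-12-30")

identical(via_answer, via_1899)
format(via_answer, "%d-%m-%Y")
```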
Richie Cotton
  • Did you check whether it works the same for Windows and Mac? – chl Jun 06 '11 at 13:25
  • That minus 2 was killing me in some work I'm doing. So glad I stumbled on this. – JD Long Jun 06 '11 at 14:36
  • @chl: Good point. I think, for a Mac, you can just change the `origin` value to `1904-01-01` and don't subtract the 2. Volunteers with Macs appreciated to test this. – Richie Cotton Jun 06 '11 at 17:34
4

Not quite what you are after, but fitting a multiple linear regression model in one line is great:

lm(y ~ x1 + x2)
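For a runnable version, here is a sketch with simulated data; the coefficients, noise level and seed are arbitrary choices for illustration:

```r
set.seed(42)

# Two simulated predictors and an outcome with known coefficients
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- 1 + 2 * x1 - x2 + rnorm(100, sd = 0.1)

# The one-liner itself
fit <- lm(y ~ x1 + x2)

# Estimated intercept and slopes; these should be close to 1, 2 and -1
coef(fit)
```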
csgillespie
4

Reduce() is a new kid on the block. The same can be done using do.call(), and it is a little bit quicker (on my system at least):

do.call("/", as.data.frame(embed(1:10, 2)))

R> do.call("/", as.data.frame(embed(1:10, 2)))
[1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000
[9] 1.111111
R> Reduce("/", as.data.frame(embed(1:10, 2)))
[1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000
[9] 1.111111
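One difference between the two is worth keeping in mind: do.call() passes every column as an argument to a single call of "/", which works here only because there are exactly two columns, while Reduce() folds over the columns pairwise. A sketch with three columns (names are illustrative):

```r
# Three lagged columns: x[3:10], x[2:9], x[1:8]
cols <- as.data.frame(embed(1:10, 3))

# Reduce() computes (V1 / V2) / V3
folded <- Reduce("/", cols)

# do.call() tries to call "/" with three arguments, which is an error;
# tryCatch() captures the condition instead of stopping
res <- tryCatch(do.call("/", cols), error = function(e) e)
inherits(res, "error")
```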
Gavin Simpson
3

A function to summarize the amount of missing data for each variable in a data frame. Returns a list.

propmiss <- function(dataframe) lapply(dataframe,function(x) data.frame(nmiss=sum(is.na(x)), n=length(x), propmiss=sum(is.na(x))/length(x)))

Not a one-liner, but returning this info as a data frame is more useful.

propmiss <- function(dataframe) {
    m <- sapply(dataframe, function(x) {
        data.frame(
            nmiss=sum(is.na(x)), 
            n=length(x), 
            propmiss=sum(is.na(x))/length(x)
        )
    })
    d <- data.frame(t(m))
    d <- sapply(d, unlist)
    d <- as.data.frame(d)
    d$variable <- row.names(d)
    row.names(d) <- NULL
    d <- cbind(d[ncol(d)],d[-ncol(d)])
    return(d[order(d$propmiss), ])
}
Stephen Turner
3

Wipes the slate clean: removes all objects, including hidden ones, from the workspace.

rm(list=ls(all=TRUE))

Shreyas Karnik
2

Editing multiple columns at once is one of my favourites.

E.g. to change all numeric columns to characters:

X <- iris
X[id] <- lapply(X[id <- sapply(X, is.numeric)], as.character)

or standardize them

X[id] <- lapply(X[id <- sapply(X, is.numeric)], scale)
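The one-liner works because R evaluates the right-hand side first, so id is already defined by the time it is used as the index on the left. A two-step sketch of the same idea, which may be easier to read (num_cols is an illustrative name):

```r
X <- iris

# Logical mask marking the numeric columns
num_cols <- sapply(X, is.numeric)

# Convert only those columns; the other columns are left untouched
X[num_cols] <- lapply(X[num_cols], as.character)

sapply(X, class)
```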
Marek
1

I'd say look at plyr for a package full of slick oneliners!

Sacha Epskamp
1

Well, not really a oneliner but textConnection is great!

x <- "1,3
1,a
1,g,4
3,d,6
2,X,1,3
2,K"
read.table(textConnection(x), sep=",", header=FALSE, na.strings="", fill=TRUE)

result

  V1 V2 V3 V4
1  1  3 NA NA
2  1  a NA NA
3  1  g  4 NA
4  3  d  6 NA
5  2  X  1  3
6  2  K NA NA
> 
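In more recent versions of R, read.table() also accepts a text argument, which avoids opening a connection explicitly. A sketch using the same input as above (d is an illustrative name):

```r
x <- "1,3
1,a
1,g,4
3,d,6
2,X,1,3
2,K"

# text = x reads directly from the string; no connection to open or close
d <- read.table(text = x, sep = ",", header = FALSE, na.strings = "", fill = TRUE)
dim(d)
```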
jrara
  • 2
    one annoying side-effect of using `textConnection` in such a one-liner is that you get warnings when the connection closes later on. I usually have three lines; one to open, one to read, one to close the connection. – baptiste Jun 06 '11 at 10:31
  • @baptiste you can make it a one line by assigning *inline* `read.table(con – Gavin Simpson Jun 06 '11 at 10:36
  • 1
    @Gavin Simpson yeah, I suppose, though I wouldn't do that. Also, "real" one-liners should not use `;` I reckon. – baptiste Jun 06 '11 at 10:40
  • See `text_to_table` for a convenience wrapper to `textConnection`. http://stackoverflow.com/questions/3936285/is-there-a-way-to-use-read-csv-to-read-from-a-string-value-rather-than-a-file-in/3941145#3941145 – Richie Cotton Jun 06 '11 at 13:09
1

Here's another tip collected from R-help (if memory serves, by Romain François).

Remove variables from the workspace only if they exist (this avoids warnings about missing objects):

rm( list = Filter( exists, c("a", "b") ) )
baptiste
1

My favorite one-liner can be found in the help pages of the %in% function and is basically its opposite.

f.wo <- function(x, y) x[!x %in% y]

Wrapped up into a nice, small function, it comes in really handy. E.g.

R> f.wo(c("a", "b", "c"), "b")
[1] "a" "c"
R> f.wo(1:8, c(2,7))
[1] 1 3 4 5 6 8
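Compared with setdiff(), which looks similar at first glance, this version keeps duplicated values in x. A small sketch of the difference (the vector x is made up for illustration):

```r
f.wo <- function(x, y) x[!x %in% y]

x <- c(1, 1, 2, 3, 3)

# Drops the 2 but keeps the repeated 1s and 3s
kept <- f.wo(x, 2)

# setdiff() also drops the 2, but returns unique values only
uniq <- setdiff(x, 2)
```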
mropa
1

Function to read space delimited data from the clipboard

read.cb <- function(...) read.table(file="clipboard", ...)

e.g.

# read data from the clipboard with a header
d <- read.cb(header = TRUE)

# read data from the clipboard without a header
d <- read.cb()
Stephen Turner
1

Function to convert columns of data in a data frame to factor variables

factorcols <- function(d, ...) lapply(d, function(x) factor(x, ...))

E.g. convert columns 1-4 in data frame d to factor variables

d[1:4] <- factorcols(d[1:4])
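A self-contained sketch with a small made-up data frame, converting only the first two of three columns:

```r
factorcols <- function(d, ...) lapply(d, function(x) factor(x, ...))

d <- data.frame(
  grp = c("x", "y", "x"),
  level = c(1, 2, 1),
  value = c(0.5, 1.5, 2.5),
  stringsAsFactors = FALSE
)

# Convert columns 1-2 to factors; column 3 stays numeric
d[1:2] <- factorcols(d[1:2])
sapply(d, class)
```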
Stephen Turner
1

Return a new matrix in which each row of the original matrix is sorted in increasing order:

newmat <- t(apply(orimat, 1, sort))
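A small worked example with a made-up 2 x 3 matrix. apply() returns each sorted row as a column, which is why the result is transposed back with t():

```r
orimat <- matrix(c(3, 1, 2,
                   9, 7, 8), nrow = 2, byrow = TRUE)

# Sort within each row, then transpose back to the original orientation
newmat <- t(apply(orimat, 1, sort))

# Expected result, written out by hand for comparison
expected <- rbind(c(1, 2, 3), c(7, 8, 9))
all(newmat == expected)
```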
bill_080