Order a data.table

Question

I have a data.frame

> variable_importance
      Overall
x.1  87.30483
x.2  88.59212
x.3  34.16171
x.4  35.72880
x.5  50.62831
x.6  44.76673
x.7  31.12285
x.8  43.04628
x.9  33.01750
x.10 30.72718

I would like to order the data frame by the Overall variable in decreasing order, such that the x.? identifiers (the row names) stay with their respective values.

That is, it should end up as:

x.2  88.59212
x.1  87.30483
x.5  50.62831
[...]

order just gives me the indices of the rearranged data frame, and I lose the row identifiers.

How can I do this and is there a solution using the data.table library?

Minor nitpick: `data.table` is a "package," and the directory where packages are stored is a "library." It's an unfortunate naming convention, but we're stuck with it. — shadowtalker, Feb 22 '15 at 16:20
The solution using the `data.table` package looks like `variable_importance[order(-Overall)]`, as in the answer below. The `drop` thing isn't required. This only works if you load the package and then convert your data.frame to a data.table. — Frank, Feb 22 '15 at 16:48
I had just written an answer at http://stats.stackexchange.com, but in the meantime the question was moved here and locked, so I could not post it. I just saw that Frank beat me to it. — Marco Breitig, Feb 22 '15 at 16:58
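A minimal sketch of the approach described in Frank's comment, assuming the data.table package is installed. The data.frame is rebuilt from the values printed in the question; keep.rownames = TRUE moves the x.? identifiers into a column named rn, because a data.table does not keep row names.

```r
library(data.table)

# Rebuild the data.frame from the question (identifiers stored as row names)
variable_importance <- data.frame(
  Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831,
              44.76673, 31.12285, 43.04628, 33.01750, 30.72718),
  row.names = paste0("x.", 1:10)
)

# Convert to a data.table; the row names become the character column "rn"
dt <- as.data.table(variable_importance, keep.rownames = TRUE)

# Inside [ ], Overall refers to the column; order(-Overall) sorts decreasingly
sorted <- dt[order(-Overall)]
print(sorted)
```

dt[order(-Overall)] returns a new, reordered data.table and leaves dt unchanged; setorder(dt, -Overall) reorders dt in place instead.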

score 1 · Answer 1 · answered Feb 22 '15 at 15:28

Use order to index into variable_importance, but also use drop = FALSE to avoid simplifying the one-column data frame to a vector and losing the row names:

> variable_importance[order(-variable_importance$Overall), , drop = FALSE]
      Overall
x.2  88.59212
x.1  87.30483
x.5  50.62831
x.6  44.76673
x.8  43.04628
x.4  35.72880
x.3  34.16171
x.9  33.01750
x.7  31.12285
x.10 30.72718
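To see why drop = FALSE matters, here is a small base-R sketch (no packages needed) comparing both forms on the question's data. With a single column, df[i, ] simplifies the result to a plain numeric vector, and the row names are discarded along with the data frame.

```r
variable_importance <- data.frame(
  Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831,
              44.76673, 31.12285, 43.04628, 33.01750, 30.72718),
  row.names = paste0("x.", 1:10)
)

idx <- order(-variable_importance$Overall)  # row positions, largest value first

# Default drop = TRUE: the one-column result is simplified to an unnamed vector
as_vector <- variable_importance[idx, ]

# drop = FALSE: the result stays a data.frame, so row names travel with values
as_frame <- variable_importance[idx, , drop = FALSE]

str(as_vector)
print(as_frame)
```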

score 0 · Answer 2 · answered Feb 22 '15 at 16:54

A data.table example:

library(data.table)
variable_importance <- data.frame(Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831, 44.76673, 31.12285, 43.04628, 33.01750, 30.72718), row.names = paste0("x.", 1:10))
variable_importance  # show the data.frame
dt <- as.data.table(variable_importance, keep.rownames = TRUE)  # new data.table, by value (copy)
#dt <- setDT(variable_importance, keep.rownames = TRUE)  # alternative: convert by reference (variable_importance itself becomes the same data.table)
setorder(dt, -Overall)  # reorder dt in place, decreasing by Overall
setnames(dt, "rn", "")  # blank out the column name "rn" (cosmetic only)
dt  # show the data.table

setDT converts variable_importance in place (by reference) instead of copying it, which is much faster on huge data sets. When you convert the data.frame to a data.table you have to specify keep.rownames = TRUE; you then get a new column called rn holding the original row names, because a data.table does not keep row names and simply numbers the rows. Normally, when working with data.table, you should not assign empty names to columns you still need to work with. It is better practice to make a new column called id.

setnames(dt, "", "rn")  # give the column back its name so we can work with it
dt[, id := as.integer(substr(rn, start = 3, stop = nchar(rn)))]  # extract the numbers from the row names
dt[, rn := NULL]  # delete column rn
setcolorder(dt, c("id", "Overall"))  # reorder columns
dt  # show the data.table
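The copy-versus-reference difference between as.data.table() and setDT() mentioned above can be checked directly. A minimal sketch, assuming data.table is installed; the helper make_df and the object names df_copy and df_ref are illustrative only.

```r
library(data.table)

# Illustrative helper: build a fresh copy of the question's data.frame
make_df <- function() {
  data.frame(
    Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831,
                44.76673, 31.12285, 43.04628, 33.01750, 30.72718),
    row.names = paste0("x.", 1:10)
  )
}

df_copy <- make_df()
dt1 <- as.data.table(df_copy, keep.rownames = TRUE)  # new object; df_copy is untouched
class(df_copy)  # still "data.frame"

df_ref <- make_df()
setDT(df_ref, keep.rownames = TRUE)  # converts df_ref itself, no assignment needed
class(df_ref)   # now c("data.table", "data.frame")
names(df_ref)   # "rn" "Overall"
```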

score 0 · Answer 3 · answered Feb 22 '15 at 17:20

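If the identifiers are stored in an ordinary column (V1) instead of as row names, you can sort your data by the mentioned column by indexing the rows with order. (apply(data, 2, sort) would sort each column independently, after converting the data frame to a character matrix, so the x.? identifiers would no longer line up with their values.)

data<- structure(list(V1 = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 2L), .Label = c("x.1", "x.10", "x.2", "x.3", "x.4", "x.5", 
"x.6", "x.7", "x.8", "x.9"), class = "factor"), V2 = c(87.30483, 
88.59212, 34.16171, 35.7288, 50.62831, 44.76673, 31.12285, 43.04628, 
33.0175, 30.72718)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-10L))

data[order(data$V2, decreasing = TRUE), ]  # reorder whole rows, largest V2 first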

score 0 · Answer 4 · edited May 23 '17 at 12:14

From version 1.9.5 of data.table (currently devel), you can also use setorder() on a data.frame. It reorders the input object by reference.

library(data.table)
setorder(variable_importance, -Overall)  # reorders the data.frame in place
variable_importance
#       Overall
# x.2  88.59212
# x.1  87.30483
# x.5  50.62831
# x.6  44.76673
# x.8  43.04628
# x.4  35.72880
# x.3  34.16171
# x.9  33.01750
# x.7  31.12285
# x.10 30.72718

Check this answer for benchmarks on how setorder() is both fast and memory efficient.
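As a rough sketch of what "by reference" means here (assuming data.table 1.9.5 or later is installed): base R's order() builds a reordered copy that has to be assigned to a name, whereas setorder() changes the data.frame itself, row names included.

```r
library(data.table)

variable_importance <- data.frame(
  Overall = c(87.30483, 88.59212, 34.16171, 35.72880, 50.62831,
              44.76673, 31.12285, 43.04628, 33.01750, 30.72718),
  row.names = paste0("x.", 1:10)
)

# Base R: build a reordered copy and assign it to a new name
base_sorted <- variable_importance[order(-variable_importance$Overall), , drop = FALSE]

# data.table: reorder the original object in place; no assignment needed
setorder(variable_importance, -Overall)

# Both approaches give the same rows in the same order
identical(rownames(base_sorted), rownames(variable_importance))
```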
