
Let's say I have the following data.table in R:

  library(data.table)
  DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)

I want to order it by two columns (say columns x and v). I used this:

  DT[order(x,v)] # sorts first by x then by v (both in ascending order)

But now, I want to sort it by x (in decreasing order) and have the following code:

  DT[order(-x)] #Error in -x : invalid argument to unary operator

I think this error occurs because class(DT$x) is "character", so the unary minus cannot be applied to it. Could you suggest a way to solve this?

I know I can use DT[order(x,decreasing=TRUE)], but I want to know the syntax for sorting by several columns in mixed directions (some decreasing, some increasing) at the same time.

Note that if you use DT[order(-y,v)] the result is ok, but if you use DT[order(-x,v)] there is an error. So, my question is: how to solve this error?

Petter Friberg
nhern121
  • Interesting question, but if you are working with large data sets, you likely should be setting keys for your data.tables. Keys put your data in an order that maximizes subsequent indexing, subsetting, aggregation-by-groups, etc. That *may* not be your preferred format for printing the data, but it's often a small price to pay for the speed it'll gain you. – Josh O'Brien Sep 10 '12 at 16:06
  • However, it appears to me that `DT[order(-x)]` is not an equivalent statement to `setorder(DT, -x)` because `setorder()` actually acts on `DT` while the other does not. Equivalent statements would be DT – jeromeResearch Mar 23 '17 at 01:07
  • @jerome You are correct. Pankil did not say they were equivalent, so I guess it's fine as-is. – Frank Mar 24 '17 at 02:18
  • I agree with @smci that a title edit makes sense here, though I would change it to indicate that this question is no longer relevant, eg by adding "in data.table 1.9.4 or earlier" to the title so people don't continue landing here from google expecting something else. I did this with one of my questions https://stackoverflow.com/questions/30035939/why-is-the-diag-function-so-slow-in-r-3-2-0-or-earlier – Frank Apr 23 '18 at 15:36
  • @Frank: rolled-back and done; in future please go ahead and do it yourself (do you think the notation 1.9.4- is generally understood?). Also, SO sadly removed the tag *known-issues*, which hamstrings us. It is a constant janitorial task with dt to triage old issues from new, solved from unsolved, package known-issues vs user issues, performance issues from bugs... harder when they take the damn tags away – smci Apr 23 '18 at 21:23
  • @smci Ok, thanks, just commented in case the OP disagreed (since they rolled back your recent edits and so might have some other opinion). Anyway, to not clog up the comments here, we could take it to R Public chat if there's more to say. Fwiw, part of that triage gets handled through github (eg, someone will link to a question and say "update when fixed" and hopefully someone will). Re old vs new... SO seems deliberately designed to make that hard / make closing as dupes hard... not sure it can be addressed by users without inordinate and continuous effort. – Frank Apr 23 '18 at 21:35
  • Nestorggh, please don't rollback the new title unless you can improve it. "sort rows in data.table" said almost nothing, that basic functionality was there for yonks. The title needs to mention your actual issue (multiple keys where one is decr order). Also important that this was a known issue in 1.9.4 and earlier and is no longer an issue. – smci Apr 23 '18 at 21:55

3 Answers

152

Update

data.table v1.9.6+ now supports OP's original attempt and the following answer is no longer necessary.
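
As a minimal sketch of what this update describes, assuming data.table v1.9.6 or later is installed, the mixed-direction sort from the question can be written directly. The name `res` is only for illustration.

```r
library(data.table)

# Same table as in the question
DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# x descending (a character column), then v ascending
res <- DT[order(-x, v)]
print(res)
```

Note that this returns a sorted copy; `DT` itself keeps its original row order.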


You can use DT[order(-rank(x), y)].

   x y v
1: c 1 7
2: c 3 8
3: c 6 9
4: b 1 1
5: b 3 2
6: b 6 3
7: a 1 4
8: a 3 5
9: a 6 6
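
A short sketch of why the `rank()` trick works: `rank()` turns the character column into numeric ranks, and a numeric vector can be negated. The names `r` and `res` are illustrative only, and the ranks of strings follow the collation of the current locale.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# rank() returns numbers, so the unary minus is valid on its result
r <- rank(c("b", "a", "c"))
print(r)
print(-r)

# Negated ranks of x give x in decreasing order; y stays ascending
res <- DT[order(-rank(x), y)]
print(res)
```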
Community
Matthew Plourde
  • As pointed out by @PankilShah below this has been fixed for some time and OP's original approach now works as expected. I couldn't find the commit since it was fixed on the C level and I don't know what to search for. – MichaelChirico Feb 10 '16 at 03:43
  • Cool, thanks. It seems unlikely that anyone would end up here... but on the other hand I myself ended up here from googling something vaguely-related. – MichaelChirico Feb 10 '16 at 16:38
  • @MichaelChirico actually, I routinely get up-votes for this answer, so I'm really glad you pointed this out. I'm not really a **data.table** user and haven't been keeping up with its development. – Matthew Plourde Feb 12 '16 at 14:43
  • It's very useful to state the actual release number (1.9.6?), so we don't have to go hunt in [the archives of NEWS.md](https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md). – smci Apr 19 '18 at 22:46
24

The unary minus only works on numeric columns, so you can set decreasing=TRUE and negate the numeric columns you want in increasing order:

DT[order(x,-v,decreasing=TRUE),]
      x y v
 [1,] c 1 7
 [2,] c 3 8
 [3,] c 6 9
 [4,] b 1 1
 [5,] b 3 2
 [6,] b 6 3
 [7,] a 1 4
 [8,] a 3 5
 [9,] a 6 6
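
The same idea as a self-contained sketch, assuming a reasonably recent data.table: `decreasing = TRUE` reverses every sort key, and negating the numeric column `v` reverses it a second time, so `v` ends up ascending. The name `res` is illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# decreasing = TRUE flips all keys; -v flips v back,
# so the net effect is x descending, v ascending
res <- DT[order(x, -v, decreasing = TRUE)]
print(res)
```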
James
  • I like this way, unless you have two `character` columns and you want to sort one increasing and the other decreasing. – Matthew Plourde Sep 10 '12 at 15:04
  • @mplourde I think you can combine your solution with this one to tackle the problem you have posed. For instance, you can put: ``DT[order(x,-rank(w),decreasing=TRUE)]`` given that ``x`` and ``w`` are both character columns. Thank you! – nhern121 Sep 10 '12 at 15:44
18

DT[order(-x)] works as expected. I have data.table version 1.9.4. Maybe this was fixed in a recent version.
Also, I suggest the setorder(DT, -x) syntax, in keeping with the set* commands like setnames and setkey.
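
A small sketch contrasting the two forms, assuming a data.table version where `-x` is accepted for character columns. `DT[order(...)]` returns a sorted copy, while `setorder()` reorders `DT` in place. The names `sorted_copy` and `unchanged` are illustrative.

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# DT[order(...)] returns a new, sorted data.table and leaves DT as it was
sorted_copy <- DT[order(-x, v)]
unchanged <- identical(DT$v, 1:9)

# setorder() reorders the rows of DT itself, by reference
setorder(DT, -x, v)
print(DT)
```

After the `setorder()` call, `DT` has the same row order as `sorted_copy`.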

Pankil Shah