18

Suppose I have a data frame with 3 columns (x, y, sex), where x is a character column of names, y is numeric, and sex is a factor.

sex <- factor(c("M","M","F","M","F","M","M","M","F"))
x <- c("MARK","TOM","SUSAN","LARRY","EMMA","LEONARD","TIM","MATT","VIOLET")
y <- rnorm(9, 8, 1)
score <- data.frame(x, y, sex, stringsAsFactors = FALSE)
score
        x         y sex
1    MARK  6.767086   M
2     TOM  7.613928   M
3   SUSAN  7.447405   F
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
7     TIM 10.385221   M
8    MATT  7.497702   M
9  VIOLET 10.177969   F

If I wanted to order it by y I would use:

score[order(score$y),]
        x         y sex
1    MARK  6.767086   M
3   SUSAN  7.447405   F
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
5    EMMA  8.306875   F
6 LEONARD  8.697268   M
9  VIOLET 10.177969   F
7     TIM 10.385221   M

So far, so good: each name keeps its correct score. But how can I reorder it so that the M and F levels are not mixed? I need to order by y while keeping the factor levels separated.

Finally, I would like to go a step further and involve the character column. The example doesn't show it, but what if there were tied y values and I had to order again within each factor level (e.g. TIM and TOM both got 8.4 and I have to break the tie alphabetically)?

I was thinking about the by function, but it returns a list, which doesn't really help. I think there must be some function like it to apply on data frames and get data frames as return.

To make the point clear:

sep<-split(score,score$sex)
sep$M<-sep$M[order(sep$M[,2]),]
sep$M
        x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M

sep$F<-sep$F[order(sep$F[,2]),]
sep$F
       x         y sex
3  SUSAN  7.447405   F
5   EMMA  8.306875   F
9 VIOLET 10.177969   F

merged<-rbind(sep$M,sep$F)
merged
        x         y sex
1    MARK  6.767086   M
8    MATT  7.497702   M
2     TOM  7.613928   M
4   LARRY  8.040069   M
6 LEONARD  8.697268   M
7     TIM 10.385221   M
3   SUSAN  7.447405   F
5    EMMA  8.306875   F
9  VIOLET 10.177969   F

I know how to do that with 2 or 3 factor levels. But what if the factor had many levels, say 20? Should I write a for loop?

double-beep
Matias Andina

4 Answers

31

order takes multiple arguments, and it does just what you want:

with(score, score[order(sex, y, x),])
##         x        y sex
## 3   SUSAN 6.636370   F
## 5    EMMA 6.873445   F
## 9  VIOLET 8.539329   F
## 6 LEONARD 6.082038   M
## 2     TOM 7.812380   M
## 8    MATT 8.248374   M
## 4   LARRY 8.424665   M
## 7     TIM 8.754023   M
## 1    MARK 8.956372   M
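
To see the tie-breaking case from the question in action, here is a small sketch with fixed values instead of rnorm output. The data frame `tied` and its scores are made up for illustration (TIM and TOM are both given 8.4); only base R's `order` and `with` are used.

```r
# Hypothetical data: TIM and TOM share the same score (8.4)
tied <- data.frame(
  x   = c("TOM", "SUSAN", "TIM", "EMMA", "MARK"),
  y   = c(8.4, 7.4, 8.4, 8.3, 6.8),
  sex = factor(c("M", "F", "M", "F", "M")),
  stringsAsFactors = FALSE
)

# Sort by sex first, then by y within each sex, then by name to break ties
sorted <- with(tied, tied[order(sex, y, x), ])
print(sorted)
```

Within the M group, TIM lands before TOM because the third key is only consulted when sex and y are equal. For a descending sort on the numeric key, `order(sex, -y, x)` should work the same way.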
Matthew Lundberg
10

Here is a summary of all methods mentioned in other answers/comments (to serve future searchers). I've added a data.table way of sorting.

# Base R
do.call(rbind, by(score, score$sex, function(x) x[order(x$y), ]))
with(score, score[order(sex, y, x), ])
score[order(score$sex, score$y), ]

# Using plyr
library(plyr)
arrange(score, sex, y)
ddply(score, c('sex', 'y'))

# Using `data.table`
library("data.table")
score_dt <- setDT(score)

# setting a key sorts the data.table by the key columns
setkey(score_dt, sex, y)
print(score_dt)

Here is another question that deals with the same topic.

marbel
3

I think there must be some function like it to apply on data frames and get data frames as return

Yes there is:

library(plyr)

ddply(score, c('sex', 'y'))
John
  • The question would be, why use `plyr` for a simple order operation? – thelatemail Jan 23 '14 at 03:00
  • 3
    @thelatemail, You could if you used `plyr::arrange`. i.e. `arrange(score, sex,y)`. – mnel Jan 23 '14 at 03:07
  • I've just learnt from a mistake a great use of arrange. If you call arrange(score,sex,y) it works like you said but if you call arrange(score,y,sex) it gives you a dataframe with the minimum value of every factor. That is terrific! (sorry I'm new to R) – Matias Andina Jan 23 '14 at 03:40
  • is it "plyr" or "dplyr"? – yenats Nov 30 '19 at 16:50
2

It sounds to me like you're trying to order by score within the males and females and return a combined data frame of sorted males and sorted females.

You are right that by(score, score$sex, function(x) x[order(x$y),]) returns a list of sorted data frames, one for male and one for female. You can use do.call with the rbind function to combine these data frames into a single final data frame:

do.call(rbind, by(score, score$sex, function(x) x[order(x$y),]))
#           x         y sex
# F.5    EMMA  7.526866   F
# F.9  VIOLET  8.182407   F
# F.3   SUSAN  9.677511   F
# M.4   LARRY  6.929395   M
# M.8    MATT  7.970015   M
# M.7     TIM  8.297137   M
# M.6 LEONARD  8.845588   M
# M.2     TOM  9.035948   M
# M.1    MARK 10.082314   M
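
The same split-sort-combine idea can also be written with `split` and `lapply`, which handles any number of factor levels without a for loop. This is only a sketch on made-up data (the data frame `demo`, its column `grp`, and its values are assumptions for illustration); it also compares the result with a single `order(grp, y)` call.

```r
# Hypothetical data with three groups
demo <- data.frame(
  x   = c("A", "B", "C", "D", "E", "F"),
  y   = c(3.2, 1.5, 2.7, 0.9, 4.1, 2.0),
  grp = factor(c("g2", "g1", "g3", "g1", "g2", "g3")),
  stringsAsFactors = FALSE
)

# Split by group, sort each piece by y, then stack the pieces back together
pieces   <- lapply(split(demo, demo$grp), function(d) d[order(d$y), ])
combined <- do.call(rbind, pieces)
rownames(combined) <- NULL  # drop the "g1.4"-style row names added by rbind

# One-step equivalent using order with two keys
direct <- demo[order(demo$grp, demo$y), ]
rownames(direct) <- NULL

print(combined)
```

Both routes should give the same rows in the same order; the `order` version avoids building the intermediate list.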
josliber