Order data frame rows according to vector with specific order

Question

Is there an easier way to ensure that a data frame's rows are ordered according to a "target" vector as the one I implemented in the short example below?

df <- data.frame(name = letters[1:4], value = c(rep(TRUE, 2), rep(FALSE, 2)))

df
#   name value
# 1    a  TRUE
# 2    b  TRUE
# 3    c FALSE
# 4    d FALSE

target <- c("b", "c", "a", "d")

This somehow seems to be a bit too "complicated" to get the job done:

idx <- sapply(target, function(x) {
    which(df$name == x)
})
df <- df[idx,]
rownames(df) <- NULL

df 
#   name value
# 1    b  TRUE
# 2    c FALSE
# 3    a  TRUE
# 4    d FALSE

Edward · Accepted Answer · 2012-08-15T21:13:22.610

253

Try match:

df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
df[match(target, df$name),]

  name value
2    b  TRUE
3    c FALSE
1    a  TRUE
4    d FALSE

It will work as long as your target contains exactly the same elements as df$name, and neither contain duplicate values.

From ?match:

match returns a vector of the positions of (first) matches of its first argument 
in its second.

Therefore match finds the row numbers that matches target's elements, and then we return df in that order.

edited Aug 15 '12 at 21:13

answered Aug 15 '12 at 21:03

Edward

4,797
1
17
17

Great, that's more like it and exactly what I was looking for! Thanks a lot – Rappster Aug 15 '12 at 21:09
1

one question, what if the column that I would like to match has repeat values? like `b,c,a,d,b,c,a,d`. I tried `match` but it does not work well. – Yulong Aug 21 '13 at 15:35
@Yulong: I'd think you'd have to explicitly make sure that duplicates are removed prior to firing `match()`. What comes to mind is `duplicated()`, `unique()` or some other custom routine that "keeps" the desired elements while throwing away the others. HTH – Rappster Nov 17 '13 at 13:20
@Edward it's a nice solution. However, it also changes the indices. How can I also keep them in the ascending order(1, 2, 3, 4) ? – Hasan Iqbal Jun 08 '15 at 01:17
2

not sure it is the cleanest way, but with only "base" functions, this should work if you have duplicates in df: `df – Erica Fary Feb 14 '19 at 09:06

score 27 · Answer 2 · answered Jun 23 '16 at 03:30

27

I prefer to use ***_join in dplyr whenever I need to match data. One possible try for this

left_join(data.frame(name=target),df,by="name")

Note that the input for ***_join require tbls or data.frame

answered Jun 23 '16 at 03:30

Lerong

478
5
9

Yeah, the *_join functions in `dplyr` are really nice. End up using these a lot by now as well – Rappster Jun 24 '16 at 09:21
In this case, recommend declaring target order as a tibble, to avoid data.frame() conversion to factors. `target – Nettle Dec 14 '17 at 17:33
2

And with pipe syntax: `df %>% right_join(tibble(name = target), by = "name")` – Frank Aug 17 '18 at 18:51
2

Please, be careful. This is not anymore possible since dplyr 1.0.0. A breaking change. See [changelog](https://dplyr.tidyverse.org/news/index.html#breaking-changes) – damianooldoni Sep 11 '20 at 18:31
This works for me using dplyr 1.0.2. – Josh Jan 12 '21 at 15:54

score 18 · Answer 3 · answered May 26 '15 at 11:11

This method is a bit different, it provided me with a bit more flexibility than the previous answer. By making it into an ordered factor, you can use it nicely in arrange and such. I used reorder.factor from the gdata package.

df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")

require(gdata)
df$name <- reorder.factor(df$name, new.order=target)

Next, use the fact that it is now ordered:

require(dplyr)
df %>%
  arrange(name)
    name value
1    b  TRUE
2    c FALSE
3    a  TRUE
4    d FALSE

If you want to go back to the original (alphabetic) ordering, just use as.character() to get it back to the original state.

@Reilstein `setDT(df)[ , name := factor(name, levels = target)]`. Then see the two `data.table` answers [here](http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns/) — Henrik, Jan 07 '17 at 22:19

score 17 · Answer 4 · answered Jan 14 '20 at 09:10

17

We can adjust the factor levels based on target and use it in arrange

library(dplyr)
df %>% arrange(factor(name, levels = target))

#  name value
#1    b  TRUE
#2    c FALSE
#3    a  TRUE
#4    d FALSE

Or order it and use it in slice

df %>% slice(order(factor(name, levels = target)))

answered Jan 14 '20 at 09:10

Ronak Shah

286,338
16
97
143

4

The best solution IMO – stevec Apr 24 '20 at 11:00
1

The best and the simplest solutions for me. – Matt_B Jun 18 '20 at 08:18
Applies well, only if the "target" vector contains the same elements as the "name" vector; else, order breaks up. – Surya Dec 20 '20 at 21:17

score 2 · Answer 5 · answered Jan 17 '20 at 19:53

2

If you don't want to use any libraries and you have reoccurrences in your data, you can use which with sapply as well.

new_order <- sapply(target, function(x,df){which(df$name == x)}, df=df)
df        <- df[new_order,]

answered Jan 17 '20 at 19:53

eonurk

447
2
11

score 1 · Answer 6 · answered Jan 12 '21 at 00:41

Here's a similar system for the situation where you have a variable you want to sort by, initially, but then you want to sort by a secondary variable according to the order that this secondary variable first appears in the initial sort.

In the function below, the initial sort variable is called order_by and the secondary variable is called order_along - as in "order by this variable along its initial order".

library(dplyr, warn.conflicts = FALSE)
df <- structure(
  list(
    msoa11hclnm = c(
      "Bewbush", "Tilgate", "Felpham",
      "Selsey", "Brunswick", "Ratton", "Ore", "Polegate", "Mile Oak",
      "Upperton", "Arundel", "Kemptown"
    ),
    lad20nm = c(
      "Crawley", "Crawley",
      "Arun", "Chichester", "Brighton and Hove", "Eastbourne", "Hastings",
      "Wealden", "Brighton and Hove", "Eastbourne", "Arun", "Brighton and Hove"
    ),
    shape_area = c(
      1328821, 3089180, 3540014, 9738033, 448888, 10152663, 5517102,
      7036428, 5656430, 2653589, 72832514, 826151
    )
  ),
  row.names = c(NA, -12L), class = "data.frame"
)

this does not give me what I need:

df %>% 
  dplyr::arrange(shape_area, lad20nm)
#>    msoa11hclnm           lad20nm shape_area
#> 1    Brunswick Brighton and Hove     448888
#> 2     Kemptown Brighton and Hove     826151
#> 3      Bewbush           Crawley    1328821
#> 4     Upperton        Eastbourne    2653589
#> 5      Tilgate           Crawley    3089180
#> 6      Felpham              Arun    3540014
#> 7          Ore          Hastings    5517102
#> 8     Mile Oak Brighton and Hove    5656430
#> 9     Polegate           Wealden    7036428
#> 10      Selsey        Chichester    9738033
#> 11      Ratton        Eastbourne   10152663
#> 12     Arundel              Arun   72832514

Here’s a function:

order_along <- function(df, order_along, order_by) {
  cols <- colnames(df)
  
  df <- df %>%
    dplyr::arrange({{ order_by }})
  
  df %>% 
    dplyr::select({{ order_along }}) %>% 
    dplyr::distinct() %>% 
    dplyr::full_join(df) %>% 
    dplyr::select(dplyr::all_of(cols))
  
}

order_along(df, lad20nm, shape_area)
#> Joining, by = "lad20nm"
#>    msoa11hclnm           lad20nm shape_area
#> 1    Brunswick Brighton and Hove     448888
#> 2     Kemptown Brighton and Hove     826151
#> 3     Mile Oak Brighton and Hove    5656430
#> 4      Bewbush           Crawley    1328821
#> 5      Tilgate           Crawley    3089180
#> 6     Upperton        Eastbourne    2653589
#> 7       Ratton        Eastbourne   10152663
#> 8      Felpham              Arun    3540014
#> 9      Arundel              Arun   72832514
#> 10         Ore          Hastings    5517102
#> 11    Polegate           Wealden    7036428
#> 12      Selsey        Chichester    9738033

^{Created on 2021-01-12 by the reprex package (v0.3.0)}

how to handle if I have 27 observations (A1 to A38) repeated several times (total rows 7938) and I would like to sort based on the same. target — Mohammed Toufiq, Mar 24 '21 at 08:08

Order data frame rows according to vector with specific order

6 Answers6

Linked

Related