
I want to sort all columns of a data frame in R by a column containing alphanumeric data. Here is an example data frame:

R> dd <- data.frame(b = c("Hi", "Med", "Hi", "Low"),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c("A1", "A3", "A10", "A2"))

1   Hi  A   8   A1
2   Med D   3   A3
3   Hi  A   9   A10
4   Low C   9   A2

I would like to sort the entire data frame by column z, keeping the values in each row together. The desired output looks like this:

1   Hi  A   8   A1
2   Low C   9   A2
3   Med D   3   A3
4   Hi  A   9   A10

Here are the methods I've tried so far that have not worked:

Method 1: "A10" is incorrectly sorted

R> dd <- dd[with(dd, order(z)), ]
R> View(dd)
1   Hi  A   8   A1
4   Hi  A   9   A10
2   Low C   9   A2
3   Med D   3   A3

Method 2: No sort performed

R> library(gtools)
R> dd$z = factor(dd$z, levels = gtools::mixedsort(dd$z))
R> View(dd)
1   Hi  A   8   A1
2   Med D   3   A3
3   Hi  A   9   A10
4   Low C   9   A2

Method 3: No sort performed

R> library(stringr)
R> dd$z = factor(dd$z, levels = str_sort(dd$z, numeric=TRUE))
R> View(dd)
1   Hi  A   8   A1
2   Med D   3   A3
3   Hi  A   9   A10
4   Low C   9   A2
Alex W

2 Answers


I found a solution that works consistently for the example provided as well as for my real data. Thanks to @rawr for the insight. When I create or load the data, I have to set `stringsAsFactors = FALSE` so that `z` stays a character vector instead of being converted to a factor.

R> dd <- data.frame(b = c("Hi", "Med", "Hi", "Low"),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c("A1", "A3", "A10", "A2"), stringsAsFactors = FALSE)
R> dd

    b x y   z
1  Hi A 8  A1
2 Med D 3  A3
3  Hi A 9 A10
4 Low C 9  A2

R> library(gtools)
R> dd <- dd[gtools::mixedorder(dd$z), ]
R> dd

    b x y   z
1  Hi A 8  A1
4 Low C 9  A2
2 Med D 3  A3
3  Hi A 9 A10
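
For anyone without gtools installed, the same idea can be sketched in base R. This is only a sketch under an assumption that is not part of the original post: every key is a non-digit prefix followed by digits only. The names `z_prefix`, `z_number`, `ord` and `z_factor` are made up for illustration. It also shows why Methods 2 and 3 in the question appear to do nothing: setting factor levels changes how the values compare, but the rows only move once the factor is passed to `order()`.

```r
# Example data from the question; stringsAsFactors = FALSE keeps z as character
dd <- data.frame(b = c("Hi", "Med", "Hi", "Low"),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c("A1", "A3", "A10", "A2"),
                 stringsAsFactors = FALSE)

# Assumption: each key is a non-digit prefix followed by digits only
z_prefix <- sub("[0-9]+$", "", dd$z)               # text part of each key
z_number <- as.integer(sub("^[^0-9]*", "", dd$z))  # numeric part of each key

# order() returns row positions; indexing with them moves whole rows
ord <- order(z_prefix, z_number)
sorted <- dd[ord, ]

# Setting factor levels alone does not reorder rows (Methods 2 and 3);
# the factor still has to be passed to order()
z_factor <- factor(dd$z, levels = dd$z[ord])
sorted_via_factor <- dd[order(z_factor), ]
```

Both `sorted` and `sorted_via_factor` should list the rows in the order A1, A2, A3, A10. For keys with a more irregular shape, `gtools::mixedorder()` as used above is probably the safer choice.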
Alex W
  • I made an example where the different sorting techniques deviate: `df1 %>% arrange(gtools::mixedorder(z))`. C10 is incorrectly placed in front of C1. This was an unexpected issue, so I may not have created the best example originally. My bad! – Alex W Oct 29 '19 at 00:00
  • Maybe it is a bug – akrun Oct 29 '19 at 02:11

Here is an option with mixedorder

library(dplyr)
dd <- dd %>% 
         arrange(gtools::mixedorder(z))
dd
#   b x y   z
#1  Hi A 8  A1
#2 Low C 9  A2
#3 Med D 3  A3
#4  Hi A 9 A10
akrun
  • Thank you! This gives the output I'm looking for for the example I posted. It's odd that it isn't sorting my actual data file correctly, although it is doing some kind of sort. Are there any special circumstances that may cause this code to not sort correctly that you're aware of? – Alex W Oct 23 '19 at 22:41
  • I noticed that running the code on the already correctly sorted data frame (or running it twice on an unsorted data frame) returns an unsorted data frame. Is there any code I can use that will always identify the correct alphanumeric order? – Alex W Oct 23 '19 at 23:21
  • this is weird.. `dd[gtools::mixedorder(dd$z), ]` gives a different order than `dd %>% arrange(gtools::mixedorder(z))`, the second being "correct" but only if we convert `dd$z` to character first. If we order by factor, the first is correct. So `arrange` is implicitly ignoring that `z` is a factor and ignoring the (ostensibly) desired order? Why should arrange make that decision for me? – rawr Oct 24 '19 at 01:22
  • @AlexW. Without an example that shows the issue, it is difficult to comment on it – akrun Oct 24 '19 at 16:35
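
One possible explanation for the behaviour described in the comments above, sketched in base R so that it runs without gtools or dplyr. An order vector holds row positions, so it is meant to be used as an index. As far as I can tell, `arrange()` sorts rows by the values of whatever expression it is given, so passing it an order vector amounts to taking the order of the order, which is the inverse permutation. That only matches the intended result when the permutation happens to be its own inverse. The `ord` line below is a stand-in for `gtools::mixedorder(z)` and assumes a one-letter prefix followed by digits.

```r
z <- c("A1", "A3", "A10", "A2")

# Stand-in for gtools::mixedorder(z): positions that put z in natural order
# (illustrative only; assumes a one-letter prefix followed by digits)
ord <- order(as.integer(substring(z, 2)))

# Indexing WITH the order vector sorts as intended
by_index <- z[ord]

# Sorting BY the order vector applies the inverse permutation instead
by_value <- z[order(ord)]

by_index  # "A1" "A2" "A3" "A10"
by_value  # "A1" "A10" "A2" "A3"
```

If this is what is happening, indexing with the order vector, as in `dd[gtools::mixedorder(dd$z), ]`, or a verb that takes row positions such as `dplyr::slice()`, should avoid the problem.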