sort columns of a data frame

Question

Possible duplicate: How to sort a dataframe by column(s)?

I have a data frame in different orders (in terms of columns).

       ID_REF      VALUE       ABS_CALL           DETECTION.P.VALUE
1    10071_s_at 3473.60000        P/present       0.000219000
2       1053_at  643.20000        P/present       0.000673000
3        117_at  564.00000        M/Marginal      0.000322000
4     1255_g_at    9.40000        A/absent        0.602006000
5       1294_at  845.60000        P/present       0.000468000
6       1320_at   94.30000        A/absent        0.204022000

Now below the order of columns is changed

 VALUE      ID_REF     ABS_CALL           DETECTION P-VALUE
1  3473.6   10071_s_at  P/present          0.000219
2  643.2    1053_at     P/present          0.000673
3  564      117_at      M/marginal         0.000322
4  9.4      1255_g_at   A/absent           0.602006
5  845.6    1294_at     P/present          0.000468
6  94.3     1320_at     A/absent           0.204022

Again it is changed

   DETECTION P-VALUE    VALUE   ID_REF     ABS_CALL
1  0.000219             3473.6  10071_s_at  P
2  0.000673             643.2   1053_at     P
3  0.000322             564     117_at      M
4  0.602006             9.4     1255_g_at   A
5  0.000468             845.6   1294_at     P
6  0.204022             94.3    1320_at     A

Here I have the same data frame in different column orders. I don't know the order of the data frame, so I want to put the data in the below format:

      ID_REF      VALUE       ABS_CALL           DETECTION.P.VALUE
1    10071_s_at 3473.60000        P/present       0.000219000
2       1053_at  643.20000        P/present       0.000673000
3        117_at  564.00000        M/Marginal      0.000322000
4     1255_g_at    9.40000        A/absent        0.602006000
5       1294_at  845.60000        P/present       0.000468000
6       1320_at   94.30000        A/absent        0.204022000

Here i need to check if there is a substring of _at in any column then put that as the first column. If any column has values greater than 1 then put that column as the second column of data frame. If any column has levels P,A,M or present ,absent ,marginal then put that as the third column and finally any column with values less than 1 will be the last. Can anybody please tell me how to do this in R efficiently?

Note: Column names are not permanent, those can be anything (different names).

score 2 · Accepted Answer · answered Apr 11 '15 at 13:50

This isn't elegant by any means, but you could write a function that looks at only a single row and search for each of your criteria:

dat1 <- read.table(header = TRUE, check.names = FALSE,
text="ID_REF      VALUE       ABS_CALL           DETECTION.P.VALUE
1    10071_s_at 3473.60000        P/present       0.000219000
2       1053_at  643.20000        P/present       0.000673000
3        117_at  564.00000        M/Marginal      0.000322000
4     1255_g_at    9.40000        A/absent        0.602006000
5       1294_at  845.60000        P/present       0.000468000
6       1320_at   94.30000        A/absent        0.204022000")

dat2 <- read.table(header = TRUE, check.names = FALSE,
text="VALUE      ID_REF     ABS_CALL           'DETECTION P-VALUE'
1  3473.6   10071_s_at  P/present          0.000219
2  643.2    1053_at     P/present          0.000673
3  564      117_at      M/marginal         0.000322
4  9.4      1255_g_at   A/absent           0.602006
5  845.6    1294_at     P/present          0.000468
6  94.3     1320_at     A/absent           0.204022")

dat3 <- read.table(header = TRUE, check.names = FALSE,
text="   'DETECTION P-VALUE'    VALUE   ID_REF     ABS_CALL
1  0.000219             3473.6  10071_s_at  P
2  0.000673             643.2   1053_at     P
3  0.000322             564     117_at      M
4  0.602006             9.4     1255_g_at   A
5  0.000468             845.6   1294_at     P
6  0.204022             94.3    1320_at     A")


f <- function(data) {
  d1 <- data[1, , drop = FALSE]
  ## from row 1, separate out numerics, p-value is < 1
  ## and the other we assume is value
  nums <- d1[, nn <- sapply(d1, is.numeric)]
  p <- names(nums[, nums < 1, drop = FALSE])
  val <- setdiff(names(nums), p)

  ## take all the other columns, find `_at$` as id
  ## and assume the other column is `abs_call`
  ch <- d1[, !nn, drop = FALSE]
  id <- names(ch[, grepl('_at$', as.character(unlist(ch))), drop = FALSE])
  abs <- setdiff(names(ch), id)

  ## order by the name found
  data[, c(id, val, abs, p)]
}

lapply(list(dat1, dat2, dat2), f)

# [[1]]
#       ID_REF  VALUE   ABS_CALL DETECTION.P.VALUE
# 1 10071_s_at 3473.6  P/present          0.000219
# 2    1053_at  643.2  P/present          0.000673
# 3     117_at  564.0 M/Marginal          0.000322
# 4  1255_g_at    9.4   A/absent          0.602006
# 5    1294_at  845.6  P/present          0.000468
# 6    1320_at   94.3   A/absent          0.204022
# 
# [[2]]
#       ID_REF  VALUE   ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6  P/present          0.000219
# 2    1053_at  643.2  P/present          0.000673
# 3     117_at  564.0 M/marginal          0.000322
# 4  1255_g_at    9.4   A/absent          0.602006
# 5    1294_at  845.6  P/present          0.000468
# 6    1320_at   94.3   A/absent          0.204022
# 
# [[3]]
#       ID_REF  VALUE   ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6  P/present          0.000219
# 2    1053_at  643.2  P/present          0.000673
# 3     117_at  564.0 M/marginal          0.000322
# 4  1255_g_at    9.4   A/absent          0.602006
# 5    1294_at  845.6  P/present          0.000468
# 6    1320_at   94.3   A/absent          0.204022

If i need to search for all the rows, because i don't know how the data comes, if any entry is missing for any column. Then it need to be searched in other row as well — Agaz Wani, Apr 11 '15 at 16:22
you could use `na.omit(data)[1, ]` instead of just `data[1, ]`. na.omit will filter out any row with any NA values, so if there is at least one row with all values, this should work — rawr, Apr 11 '15 at 16:42

sort columns of a data frame

1 Answers1