Sort data in R data frame within subgroups

Question

I have a large data frame of bond data, like that:

   ISIN      CF       DATE
1   A   105.750  2016-09-30
2   B   104.875  2016-05-31
3   C   106.875  2017-02-13
4   D   103.875  2016-10-07
5   E   5.000    2016-04-21
6   E   5.000    2017-04-21
7   E   5.000    2018-04-21
8   E   5.000    2019-04-21
9   E   105.000  2020-04-21
10  F   7.800    2016-09-09
11  F   7.800    2017-09-09
12  F   7.800    2018-09-09
13  F   7.800    2019-09-09
14  F   107.800  2020-09-09

I want to group the elements by the ISIN code, then sort the Dates within the groups in increasing order (already done in the example above), then I want to sort the groups (A, B, C, D, E,F in this example) such that the group with the earliest date comes first, then the group with the second earliest date and so on.

I want it to look like this:

  ISIN     CF      DATE
16  E   5.000   2016-04-21
15  E   5.000   2017-04-21
14  E   5.000   2018-04-21
13  E   5.000   2019-04-21
12  E   105.000 2020-04-21
 7  B   104.875 2016-05-31
10  F    7.800  2016-09-09
11  F    7.800  2017-09-09
12  F    7.800  2018-09-09
13  F    7.800  2019-09-09
14  F   107.800 2020-09-09
6   A   105.750 2016-09-30
23  D   103.875 2016-10-07
22  C   106.875 2017-02-13

I tried something like this from this question:

How to sort a dataframe by column(s)?

df<-df[order(df$ISIN,df$DATE ),]

But it doesn't do what I want.

I don't want

Thanks for helping!

"it doesnt do what I want" and "it doesn't work" are _not_ helpful — rawr, Apr 01 '16 at 12:51
@rawr OP posted expected output...it's kind of clear what he wants — Matias Andina, Apr 01 '16 at 12:52
@MatiasAndina It would be helpful to know what exactly "doesn't work". The command provided by him should work. So seeing, what is happening is helpful. I am guessing, Date is in charchter format, hence ordering is not as he wants. — Jav, Apr 01 '16 at 12:53
@JavK True, I also noticed that in expected output DATE wins over CF aparently — Matias Andina, Apr 01 '16 at 12:55
@MatiasAndina what's with all these comments, I thought you said that it was "kind of clear what he wants" — rawr, Apr 01 '16 at 12:56
@MatiasAndina, i believe OP wants to arrange by isin followed by date, not cf followed by date. — vdep, Apr 01 '16 at 13:01

score 1 · Answer 1 · answered Apr 01 '16 at 12:48

1

With data.table:

DT <- data.table(yourDF, key = c("ISIN", "Date"))

answered Apr 01 '16 at 12:48

Jav

1,577
7
17

Matias Andina · Answer 2 · 2016-04-01T16:12:35.353

NEW UPDATE

Much better now with that ISIN and more ties, I used two auxiliary columns. First, I generate the order by DATE, then group by the ISIN and get the min value for each group (that gives me the group order). My data.frame is named B.

ord<-B %>% arrange(DATE) %>% mutate(ord=order(DATE))
ord2<-ord %>% group_by(ISIN) %>% summarize(min_ord=min(ord))
ord3<-merge(ord,ord2)
ord3<-ord3 %>% arrange(min_ord)

ISIN      CF       DATE ord min_ord
1     E   5.000 2016-04-21   1       1
2     E   5.000 2017-04-21   7       1
3     E   5.000 2018-04-21   9       1
4     E   5.000 2019-04-21  11       1
5     E 105.000 2020-04-21  13       1
6     B 104.875 2016-05-31   2       2
7     F   7.800 2017-09-09   8       3
8     F   7.800 2018-09-09  10       3
9     F   7.800 2019-09-09  12       3
10    F 107.800 2020-09-09  14       3
11    F   7.800 2016-09-09   3       3
12    A 105.750 2016-09-30   4       4
13    D 103.875 2016-10-07   5       5
14    C 106.875 2017-02-13   6       6

You can delete the extra columns using select(ISIN:DATE) in the pipeline. I kept them because I thought they could be handy for extra calculations.

OLD UPDATE

Ok, the thing is your ISIN value is not working for the order you want to make. Sometimes, your ISIN goes in "descending" order (e.g, 503326>255820>255817) but sometimes it doesn't and you want your DATE column to order your data.frame (e.g, 2016-05-31 before 2016-09-30 before 2016-10-07 before 2017-02-13).

Since in this case ISIN allows to use ifelse in a pseudo-convinient way:

df %>% mutate(ord=ifelse(ISIN=="XS0503326083",1,
                  ifelse(ISIN=="XS0255820804",2,
                  ifelse(ISIN=="XS0255817685",3,
                  ifelse(ISIN=="XS0438753294",4,5))))) %>%
       arrange(ord)

  row.names         ISIN      CF       DATE ord
1        16 XS0503326083   5.000 2016-04-21   1
2        15 XS0503326083   5.000 2017-04-21   1
3        14 XS0503326083   5.000 2018-04-21   1
4        13 XS0503326083   5.000 2019-04-21   1
5        12 XS0503326083 105.000 2020-04-21   1
6         7 XS0255820804 104.875 2016-05-31   2
7         6 XS0255817685 105.750 2016-09-30   3
8        23 XS0438753294 103.875 2016-10-07   4
9        22 XS0286431100 106.875 2017-02-13   5

I know that you might have many ISIN value to incorporate to this kind of condition. Also, in your example the only tied ISIN values will get arranged by DATE and CF with no problems. This might not hold for your bigger data frame.

Thanks, it works, unfortunately I have a much bigger database. — user137425, Apr 01 '16 at 14:44
How did you come with this patern of order? Maybe there's a way to use the ISIN, clearly you can ultimately deduce the correct order in your mind, it's a matter of coding it :) — Matias Andina, Apr 01 '16 at 14:59
I use mapply in a for loop to append data for individual bonds into this data frame. — user137425, Apr 01 '16 at 15:27
Thanks! But somehow the CF values in the groups are not always in increasing order. I get e.g. 5, 105, 5, 5, 5. — user137425, Apr 01 '16 at 17:36
This is the first time you mention CF values...I think you could use this example to create a third column...after all you're trying to do arragement within subgroups again...In this case, however, you can't use min because your want to sort numbers — Matias Andina, Apr 01 '16 at 17:39
Sorry, I updated the question. I want to sort by Date in increasing order within the groups, but I get: 2016-04-21, 2020-04-21, 2017-04-21.. — user137425, Apr 01 '16 at 18:11
This shouldn't be the case if date is in the correct format...(In my example I just pasted and was too lazy to set the DATE like as.date when I loaded the data into R). Also this question mutated so much that I find it counter-productive, maybe close and ask again with the bunch of new code and your new questions... — Matias Andina, Apr 01 '16 at 18:15

score 1 · Answer 3 · answered Apr 01 '16 at 12:55

1

library(dplyr)
sorted <- df %>% arrange(ISIN,DATE)

answered Apr 01 '16 at 12:55

vdep

3,193
4
24
46

Sort data in R data frame within subgroups

3 Answers3