1

I have a large data frame of bond data, like that:

   ISIN      CF       DATE
1   A   105.750  2016-09-30
2   B   104.875  2016-05-31
3   C   106.875  2017-02-13
4   D   103.875  2016-10-07
5   E   5.000    2016-04-21
6   E   5.000    2017-04-21
7   E   5.000    2018-04-21
8   E   5.000    2019-04-21
9   E   105.000  2020-04-21
10  F   7.800    2016-09-09
11  F   7.800    2017-09-09
12  F   7.800    2018-09-09
13  F   7.800    2019-09-09
14  F   107.800  2020-09-09

I want to group the elements by the ISIN code, then sort the Dates within the groups in increasing order (already done in the example above), then I want to sort the groups (A, B, C, D, E,F in this example) such that the group with the earliest date comes first, then the group with the second earliest date and so on.

I want it to look like this:

  ISIN     CF      DATE
16  E   5.000   2016-04-21
15  E   5.000   2017-04-21
14  E   5.000   2018-04-21
13  E   5.000   2019-04-21
12  E   105.000 2020-04-21
 7  B   104.875 2016-05-31
10  F    7.800  2016-09-09
11  F    7.800  2017-09-09
12  F    7.800  2018-09-09
13  F    7.800  2019-09-09
14  F   107.800 2020-09-09
6   A   105.750 2016-09-30
23  D   103.875 2016-10-07
22  C   106.875 2017-02-13

I tried something like this from this question:

How to sort a dataframe by column(s)?

df<-df[order(df$ISIN,df$DATE ),]

But it doesn't do what I want.

I don't want

Thanks for helping!

Community
  • 1
  • 1
user137425
  • 423
  • 5
  • 16

3 Answers3

1

With data.table:

DT <- data.table(yourDF, key = c("ISIN", "Date"))
Jav
  • 1,577
  • 7
  • 17
1

NEW UPDATE

Much better now with that ISIN and more ties, I used two auxiliary columns. First, I generate the order by DATE, then group by the ISIN and get the min value for each group (that gives me the group order). My data.frame is named B.

ord<-B %>% arrange(DATE) %>% mutate(ord=order(DATE))
ord2<-ord %>% group_by(ISIN) %>% summarize(min_ord=min(ord))
ord3<-merge(ord,ord2)
ord3<-ord3 %>% arrange(min_ord)

ISIN      CF       DATE ord min_ord
1     E   5.000 2016-04-21   1       1
2     E   5.000 2017-04-21   7       1
3     E   5.000 2018-04-21   9       1
4     E   5.000 2019-04-21  11       1
5     E 105.000 2020-04-21  13       1
6     B 104.875 2016-05-31   2       2
7     F   7.800 2017-09-09   8       3
8     F   7.800 2018-09-09  10       3
9     F   7.800 2019-09-09  12       3
10    F 107.800 2020-09-09  14       3
11    F   7.800 2016-09-09   3       3
12    A 105.750 2016-09-30   4       4
13    D 103.875 2016-10-07   5       5
14    C 106.875 2017-02-13   6       6

You can delete the extra columns using select(ISIN:DATE) in the pipeline. I kept them because I thought they could be handy for extra calculations.

OLD UPDATE

Ok, the thing is your ISIN value is not working for the order you want to make. Sometimes, your ISIN goes in "descending" order (e.g, 503326>255820>255817) but sometimes it doesn't and you want your DATE column to order your data.frame (e.g, 2016-05-31 before 2016-09-30 before 2016-10-07 before 2017-02-13).

Since in this case ISIN allows to use ifelse in a pseudo-convinient way:

df %>% mutate(ord=ifelse(ISIN=="XS0503326083",1,
                  ifelse(ISIN=="XS0255820804",2,
                  ifelse(ISIN=="XS0255817685",3,
                  ifelse(ISIN=="XS0438753294",4,5))))) %>%
       arrange(ord)

  row.names         ISIN      CF       DATE ord
1        16 XS0503326083   5.000 2016-04-21   1
2        15 XS0503326083   5.000 2017-04-21   1
3        14 XS0503326083   5.000 2018-04-21   1
4        13 XS0503326083   5.000 2019-04-21   1
5        12 XS0503326083 105.000 2020-04-21   1
6         7 XS0255820804 104.875 2016-05-31   2
7         6 XS0255817685 105.750 2016-09-30   3
8        23 XS0438753294 103.875 2016-10-07   4
9        22 XS0286431100 106.875 2017-02-13   5

I know that you might have many ISIN value to incorporate to this kind of condition. Also, in your example the only tied ISIN values will get arranged by DATE and CF with no problems. This might not hold for your bigger data frame.

Matias Andina
  • 3,415
  • 4
  • 22
  • 43
  • this gives incorrect output. – mtoto Apr 01 '16 at 12:55
  • Thanks, it works, unfortunately I have a much bigger database. – user137425 Apr 01 '16 at 14:44
  • How did you come with this patern of order? Maybe there's a way to use the ISIN, clearly you can ultimately deduce the correct order in your mind, it's a matter of coding it :) – Matias Andina Apr 01 '16 at 14:59
  • I use mapply in a for loop to append data for individual bonds into this data frame. – user137425 Apr 01 '16 at 15:27
  • Thanks! But somehow the CF values in the groups are not always in increasing order. I get e.g. 5, 105, 5, 5, 5. – user137425 Apr 01 '16 at 17:36
  • This is the first time you mention CF values...I think you could use this example to create a third column...after all you're trying to do arragement within subgroups again...In this case, however, you can't use min because your want to sort numbers – Matias Andina Apr 01 '16 at 17:39
  • Sorry, I updated the question. I want to sort by Date in increasing order within the groups, but I get: 2016-04-21, 2020-04-21, 2017-04-21.. – user137425 Apr 01 '16 at 18:11
  • This shouldn't be the case if date is in the correct format...(In my example I just pasted and was too lazy to set the DATE like as.date when I loaded the data into R). Also this question mutated so much that I find it counter-productive, maybe close and ask again with the bunch of new code and your new questions... – Matias Andina Apr 01 '16 at 18:15
1
library(dplyr)
sorted <- df %>% arrange(ISIN,DATE)
vdep
  • 3,193
  • 4
  • 24
  • 46