3

I have a large data.frame of bond data, like that:

   ISIN      CF       DATE
    A   105.750  2016-09-30
    B   104.875  2016-05-31
    C   106.875  2017-02-13
    D   103.875  2016-10-07
    E   5.000    2016-04-21
    E   5.000    2017-04-21
    E   5.000    2018-04-21
    E   5.000    2019-04-21
    E   105.000  2020-04-21
    F   7.800    2016-09-09
    F   7.800    2017-09-09
    F   7.800    2018-09-09
    F   7.800    2019-09-09
    F   107.800  2020-09-09

I want to group the elements by the ISIN code, then within these groups sort the CF elements by DATE in increasing order (already done in the example above). Then I want to sort the groups (A, B, C, D, E, F in this example) such that the group with the earliest date comes first, then the group with the second earliest date, and so on.

I want it to look like this:

  ISIN     CF      DATE
    E   5.000   2016-04-21
    E   5.000   2017-04-21
    E   5.000   2018-04-21
    E   5.000   2019-04-21
    E   105.000 2020-04-21
    B   104.875 2016-05-31
    F    7.800  2016-09-09
    F    7.800  2017-09-09
    F    7.800  2018-09-09
    F    7.800  2019-09-09
    F   107.800 2020-09-09
    A   105.750 2016-09-30
    D   103.875 2016-10-07
    C   106.875 2017-02-13

I tried something like from this question:

How to sort a dataframe by column(s)?

df<-df[order(df$ISIN,df$DATE),]

But it doesn't do what I want.

Community
  • 1
  • 1
user137425
  • 423
  • 5
  • 16

2 Answers2

3

This gets the job done -- basically, create a rank of each ISIN by minimum date first, then order on that rank:

library(data.table)
setDT(DF)

DF[DF[ , min(DATE), by = ISIN
       ][ , .(ISIN, rank = frank(V1))
          ], on = "ISIN"
   ][order(rank, DATE)]
#     ISIN      CF       DATE rank
#  1:    E   5.000 2016-04-21    1
#  2:    E   5.000 2017-04-21    1
#  3:    E   5.000 2018-04-21    1
#  4:    E   5.000 2019-04-21    1
#  5:    E 105.000 2020-04-21    1
#  6:    B 104.875 2016-05-31    2
#  7:    F   7.800 2016-09-09    3
#  8:    F   7.800 2017-09-09    3
#  9:    F   7.800 2018-09-09    3
# 10:    F   7.800 2019-09-09    3
# 11:    F 107.800 2020-09-09    3
# 12:    A 105.750 2016-09-30    4
# 13:    D 103.875 2016-10-07    5
# 14:    C 106.875 2017-02-13    6

If you want to avoid creating a copy, do this instead:

DF[DF[ , min(DATE), by = ISIN
       ][ , .(ISIN, rank = frank(V1))
          ], rank := rank, on = "ISIN"]

setorder(DF, rank, DATE)

If you don't want to create the rank column, use factor levels instead:

ord <- DF[ , min(DATE), by = ISIN][ , ISIN[frank(V1)]]

DF[ , ISIN := factor(ISIN, levels = ord)]
DF[order(ISIN, DATE)]
#     ISIN      CF       DATE
#  1:    E   5.000 2016-04-21
#  2:    E   5.000 2017-04-21
#  3:    E   5.000 2018-04-21
#  4:    E   5.000 2019-04-21
#  5:    E 105.000 2020-04-21
#  6:    B 104.875 2016-05-31
#  7:    F   7.800 2016-09-09
#  8:    F   7.800 2017-09-09
#  9:    F   7.800 2018-09-09
# 10:    F   7.800 2019-09-09
# 11:    F 107.800 2020-09-09
# 12:    A 105.750 2016-09-30
# 13:    D 103.875 2016-10-07
# 14:    C 106.875 2017-02-13

You can also do this in base like so, but it'll be slower:

ord <- names(sort(by(DF, DF$ISIN, function(x) min(x$DATE))))

DF$ISIN <- factor(DF$ISIN, levels = ord)

DF[with(DF, order(ISIN, DATE)),]
MichaelChirico
  • 31,197
  • 13
  • 98
  • 169
2

Using dplyr, you could do the following:

library(dplyr)
df %>% group_by(ISIN) %>% 
  mutate(minDate = paste0(min(DATE), ISIN)) %>% 
  arrange(DATE) %>% ungroup() %>% arrange(minDate) %>%
  select(-minDate)

note the the temporary minDate column also includes ISIN, so that you can resolve the case where you have two min dates of the same value. Change mutate(minDate = paste0(min(DATE), ISIN)) -> mutate(minDate = min(DATE)) to get rid of this

pluke
  • 2,710
  • 5
  • 38
  • 60