
Part of this question is already answered in special-group-number-for-each-combination-of-data. In most cases the data contains pairs mixed with other values. What I want to achieve is to number those groups: whenever a pair occurs, assign it a group number that also covers the following rows, up until the next pair.

Concretely, I would like to group each pair such as c("bad","good"), and for the triple c('Veni',"vidi","Vici") assign the unique number 666.

Here is the example data

names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")

df <- data.frame(names)

Here is the expected output for the real, general case:

     names  Group
1      bad    1
2     good    1
3        1    1
4        2    1
5     good    2
6      bad    2
7      111    2
8      bad    3
9  J.James    3
10    good    4
11 J.James    4
12     333    4
13 J.James    5
14    good    5
15     761    5
16    Veni    666
17    vidi    666
18    Vici    666
Alexander
  • Why is a new group started on row 10? Do you treat good and Good as the same term? – Frank Feb 21 '18 at 18:55
  • @Frank Nope, just a typing mistake. So sorry! – Alexander Feb 21 '18 at 18:56
  • The grouping scheme makes 0 sense to me. Please explain the groups for rows 1:15. – Vlo Feb 21 '18 at 18:59
  • @Vlo A group cannot contain the same value twice. Row 5 starts a new group because `good` appeared already in the current group; row 13 starts a new group since J.James appeared already in the current group (group 4) ... it seems like it must be done row-by-row and probably quite slowly, but maybe I'm missing something. – Frank Feb 21 '18 at 19:01
  • @Alexander is the data listed above in your final format? It's very frustrating to try to solve this problem if you're going to change what it looks like every 2 minutes. You've changed what "names" is considerably with your latest adjustment. – InfiniteFlash Feb 21 '18 at 19:06
  • @Frank I understand your point. Yeah, unfortunately the real data is like this, and I have been scratching my head over how to do it. I was trying something like `cumsum(names=='good|bad')`, but no luck, since the groups start at the pairs. – Alexander Feb 21 '18 at 19:06
  • @InfiniteFlashChess Sorry. I just changed the upper case 'Good' to lower case 'good'. It was the only change. Yes it is the final format:) – Alexander Feb 21 '18 at 19:07
  • I'll be frank: the way you're creating `Group` doesn't make sense. It looks like it's assigned at random. No one here understands how Groups 1, 2, 3, 4, or 5 are created. We only know how `666` is created. – InfiniteFlash Feb 21 '18 at 19:10
  • This looks like a duplicate of akrun's top post. – InfiniteFlash Feb 21 '18 at 19:16
  • https://stackoverflow.com/questions/28013850/change-value-of-variable-with-dplyr/28013895#28013895 – InfiniteFlash Feb 21 '18 at 19:16
  • @InfiniteFlashChess The way of creating group is as it says in previous post [Special group number for each combination of data](https://stackoverflow.com/questions/48912908/special-group-number-for-each-combination-of-data#48912908) . If those pairs exist in the rows, assign a group number to them until the next pairs. – Alexander Feb 21 '18 at 19:20
  • Hm, actually, turns out I don't understand the rule. For group 3, the values are bad and J.James, so I don't know why a new group starts with "good". – Frank Feb 21 '18 at 19:21
  • @Frank Because the data exists that way. Let's say the data starts with the pair (bad, J.James) and then (good, J.James). It is just as it is. – Alexander Feb 21 '18 at 19:26
  • Ok, I guess I get it now. You can use `z = as.numeric(names); match(z, setdiff(unique(z), c("Veni", "vidi", "Vici", NA)))` where the answerer on your last question had `sequence(nrow(df))` and it should work... if the key thing is whether the name is coercible to numeric. – Frank Feb 21 '18 at 19:33

1 Answer


Here are two approaches which reproduce the OP's expected result for the given sample dataset.

Both work the same way. First, all "disturbing" rows, i.e. rows which do not contain valid names, are skipped, and the rows with valid names are simply numbered in groups of two. Second, the rows with exempt names are given the special group number 666. Finally, the remaining NA rows are filled by carrying the last observation forward (LOCF).
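As a quick illustration of that last step, `zoo::na.locf` replaces each NA with the most recent non-NA value, which is what propagates a pair's group number down to the skipped rows (toy vector of my own, not part of the original data):

```r
library(zoo)  # provides na.locf (last observation carried forward)

# the NAs stand for skipped rows between group markers
na.locf(c(1, NA, NA, 2, NA, 666))
# [1]   1   1   1   2   2 666
```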

data.table

library(data.table)
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761, "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")
# number the rows with valid (non-numeric, non-exempt) names in groups of 2
data.table(names)[is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt,
                  grp := rep(1:.N, each = 2L, length.out = .N)][
  # exempt names get the special group number
  names %in% exempt, grp := 666L][
  # fill the remaining NA rows by carrying the last observation forward
  , grp := zoo::na.locf(grp)][]
      names grp
 1:     bad   1
 2:    good   1
 3:       1   1
 4:       2   1
 5:    good   2
 6:     bad   2
 7:     111   2
 8:     bad   3
 9: J.James   3
10:    good   4
11: J.James   4
12:     333   4
13: J.James   5
14:    good   5
15:     761   5
16:    Veni 666
17:    vidi 666
18:    Vici 666

dplyr/tidyr

Here is a dplyr/tidyr version of the same logic. Note that mutate() evaluates over all 18 rows rather than only the filtered ones, so the group counter cannot be built with rep(1:n(), ...); instead it is derived from a cumulative count of the valid rows:

library(dplyr)
tibble(names) %>% 
  mutate(valid = is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt,
         grp = case_when(
           names %in% exempt ~ 666L,
           valid             ~ (cumsum(valid) + 1L) %/% 2L,
           TRUE              ~ NA_integer_
         )) %>% 
  tidyr::fill(grp) %>% 
  select(-valid)
# A tibble: 18 x 2
   names     grp
   <chr>   <int>
 1 bad         1
 2 good        1
 3 1           1
 4 2           1
 5 good        2
 6 bad         2
 7 111         2
 8 bad         3
 9 J.James     3
10 good        4
11 J.James     4
12 333         4
13 J.James     5
14 good        5
15 761         5
16 Veni      666
17 vidi      666
18 Vici      666
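For comparison, here is a minimal base R sketch of the same three steps with no package dependencies (the `names` and `exempt` vectors from above are redefined for self-containment, and the fill step uses indexing in place of `zoo::na.locf`):

```r
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761, "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

# step 1: flag rows whose value is a non-numeric, non-exempt name
valid <- is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt

grp <- rep(NA_integer_, length(names))
# number the valid rows in groups of two
grp[valid] <- rep(seq_len(sum(valid)), each = 2L, length.out = sum(valid))
# step 2: exempt names get the special group number
grp[names %in% exempt] <- 666L
# step 3: carry the last observation forward into the NA rows
grp <- grp[!is.na(grp)][cumsum(!is.na(grp))]

data.frame(names, grp)
```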
Uwe
  • I just noticed that the Q was tagged `dplyr` - my apologies. I am much more fluent in `data.table` than in `dplyr`, so this may take a while... – Uwe Feb 21 '18 at 23:00
  • I managed to provide a `dplyr`/`tidyr` version of `data.table` approach. – Uwe Feb 21 '18 at 23:24
  • Your solution is elegant, and exactly what I was looking for. Excellent! – Alexander Feb 22 '18 at 00:20