
Part of this question is already answered in special-group-number-for-each-combination-of-data. In most cases the data contains pairs mixed with other values. What I want to achieve is to number those groups: whenever a pair occurs, assign it a group number that also covers the following rows, up until the next pair.

Concretely, I would like to group each pair such as c("bad","good"), and for the triple c('Veni',"vidi","Vici") assign the unique number 666.

Here is the example data

names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")

df <- data.frame(names)

Here is the expected output for the real, general case:

     names  Group
1      bad    1
2     good    1
3        1    1
4        2    1
5     good    2
6      bad    2
7      111    2
8      bad    3
9  J.James    3
10    good    4
11 J.James    4
12     333    4
13 J.James    5
14    good    5
15     761    5
16    Veni    666
17    vidi    666
18    Vici    666
Alexander
  • Why is a new group started on row 10? Do you treat good and Good as the same term? – Frank Feb 21 '18 at 18:55
  • @Frank Nope, just a typing mistake. So sorry! – Alexander Feb 21 '18 at 18:56
  • The grouping scheme makes 0 sense to me. Please explain the groups for rows 1:15. – Vlo Feb 21 '18 at 18:59
  • @Vlo A group cannot contain the same value twice. Row 5 starts a new group because `good` appeared already in the current group; row 13 starts a new group since J.James appeared already in the current group (group 4) ... it seems like it must be done row-by-row and probably quite slowly, but maybe I'm missing something. – Frank Feb 21 '18 at 19:01
  • @Alexander is the data listed above in your final format? It's very frustrating to try to solve this problem if you're going to change what it looks like every 2 minutes. You've changed what "names" is considerably with your latest adjustment. – InfiniteFlash Feb 21 '18 at 19:06
  • @Frank I understand your point. Yeah, unfortunately the real data is like this, and I have been scratching my head over how to do it. I was trying something like `cumsum(names=='good|bad')`, but no luck, since the groups start at the pairs. – Alexander Feb 21 '18 at 19:06
  • @InfiniteFlashChess Sorry. I just changed the upper case 'Good' to lower case 'good'. It was the only change. Yes it is the final format:) – Alexander Feb 21 '18 at 19:07
  • I'll be frank: the way you're creating `Group` doesn't make sense. It looks like it's assigned at random. No one here understands how Groups 1, 2, 3, 4, or 5 are created. We only know how `666` is created. – InfiniteFlash Feb 21 '18 at 19:10
  • This looks like a duplicate of akrun's top post. – InfiniteFlash Feb 21 '18 at 19:16
  • https://stackoverflow.com/questions/28013850/change-value-of-variable-with-dplyr/28013895#28013895 – InfiniteFlash Feb 21 '18 at 19:16
  • @InfiniteFlashChess The way of creating group is as it says in previous post [Special group number for each combination of data](https://stackoverflow.com/questions/48912908/special-group-number-for-each-combination-of-data#48912908) . If those pairs exist in the rows, assign a group number to them until the next pairs. – Alexander Feb 21 '18 at 19:20
  • Hm, actually, turns out I don't understand the rule. For group 3, the values are bad and J.James, so I don't know why a new group starts with "good". – Frank Feb 21 '18 at 19:21
  • @Frank Because the data exists that way. Let's say the data starts with the pair (bad, J.James) and then (good, J.James). It is just as it is. – Alexander Feb 21 '18 at 19:26
  • Ok, I guess I get it now. You can use `z = as.numeric(names); match(z, setdiff(unique(z), c("Veni", "vidi", "Vici", NA)))` where the answerer on your last question had `sequence(nrow(df))` and it should work... if the key thing is whether the name is coercible to numeric. – Frank Feb 21 '18 at 19:33

1 Answer


Here are two approaches which reproduce the OP's expected result for the given sample dataset.

Both work the same way. First, all "disturbing" rows, i.e. rows which do not contain valid names, are skipped, and the rows with valid names are simply numbered in groups of two. Second, the rows with exempt names are given the special group number 666. Finally, the remaining NA rows are filled by carrying the last observation forward (LOCF).
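As a quick illustration of that last step, `zoo::na.locf` replaces each NA with the most recent non-NA value, which is what propagates a pair's group number down to the skipped rows (toy vector of my own, not part of the original data):

```r
library(zoo)  # provides na.locf (last observation carried forward)

# the NAs stand for skipped rows between group markers
na.locf(c(1, NA, NA, 2, NA, 666))
# [1]   1   1   1   2   2 666
```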

data.table

library(data.table)
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761, "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")
# number the rows with valid (non-numeric, non-exempt) names in groups of 2
data.table(names)[is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt,
                  grp := rep(1:.N, each = 2L, length.out = .N)][
  # exempt names get the special group number
  names %in% exempt, grp := 666L][
  # fill the remaining NA rows by carrying the last observation forward
  , grp := zoo::na.locf(grp)][]
      names grp
 1:     bad   1
 2:    good   1
 3:       1   1
 4:       2   1
 5:    good   2
 6:     bad   2
 7:     111   2
 8:     bad   3
 9: J.James   3
10:    good   4
11: J.James   4
12:     333   4
13: J.James   5
14:    good   5
15:     761   5
16:    Veni 666
17:    vidi 666
18:    Vici 666

dplyr/tidyr

Here is a dplyr/tidyr version of the same logic. Note that mutate() evaluates over all 18 rows rather than only the filtered ones, so the group counter cannot be built with rep(1:n(), ...); instead it is derived from a cumulative count of the valid rows:

library(dplyr)
tibble(names) %>% 
  mutate(valid = is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt,
         grp = case_when(
           names %in% exempt ~ 666L,
           valid             ~ (cumsum(valid) + 1L) %/% 2L,
           TRUE              ~ NA_integer_
         )) %>% 
  tidyr::fill(grp) %>% 
  select(-valid)
# A tibble: 18 x 2
   names     grp
   <chr>   <int>
 1 bad         1
 2 good        1
 3 1           1
 4 2           1
 5 good        2
 6 bad         2
 7 111         2
 8 bad         3
 9 J.James     3
10 good        4
11 J.James     4
12 333         4
13 J.James     5
14 good        5
15 761         5
16 Veni      666
17 vidi      666
18 Vici      666
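For comparison, here is a minimal base R sketch of the same three steps with no package dependencies (the `names` and `exempt` vectors from above are redefined for self-containment, and the fill step uses indexing in place of `zoo::na.locf`):

```r
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761, "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

# step 1: flag rows whose value is a non-numeric, non-exempt name
valid <- is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt

grp <- rep(NA_integer_, length(names))
# number the valid rows in groups of two
grp[valid] <- rep(seq_len(sum(valid)), each = 2L, length.out = sum(valid))
# step 2: exempt names get the special group number
grp[names %in% exempt] <- 666L
# step 3: carry the last observation forward into the NA rows
grp <- grp[!is.na(grp)][cumsum(!is.na(grp))]

data.frame(names, grp)
```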
Uwe
  • I just noticed that the Q was tagged `dplyr` - my apologies. I am much more fluent in `data.table` than in `dplyr`, so this may take a while... – Uwe Feb 21 '18 at 23:00
  • I managed to provide a `dplyr`/`tidyr` version of `data.table` approach. – Uwe Feb 21 '18 at 23:24
  • Your solution is elegant, and exactly what I was looking for. Excellent! – Alexander Feb 22 '18 at 00:20