
I have data like this:

ID=c(rep("ID1",3), rep("ID2",2), "ID3", rep("ID4",2))
item=c("a","b","c","a","c","a","b","a")

df <- data.frame(ID, item)

ID1 a
ID1 b
ID1 c
ID2 a
ID2 c
ID3 a
ID4 b
ID4 a

and I would need it as a list of edges like this:

a;b
b;c
a;c
a;c
b;a

The first three edges come from ID1 and the fourth from ID2; ID3 has only one item, so it contributes no edges; the fifth comes from ID4. Any ideas on how to accomplish this? melt/cast?

TylerH
ElinaJ
  • Possible duplicate of [How to create an edge list from a matrix in R?](https://stackoverflow.com/questions/13204046/how-to-create-an-edge-list-from-a-matrix-in-r) – TylerH May 03 '19 at 15:09

3 Answers


I'd guess there should be a simple igraph solution for this, but here's a simple solution using the data.table package:

library(data.table)
setDT(df)[, if(.N > 1) combn(as.character(item), 2, paste, collapse = ";"), ID]

#     ID  V1
# 1: ID1 a;b
# 2: ID1 a;c
# 3: ID1 b;c
# 4: ID2 a;c
# 5: ID4 b;a
David Arenburg
  • Nice answer. :) I remember your comment related to `if(...) else(...)` this week or last week. You were wondering why `else()` was not in a data.table solution. I cannot recall which question that was. Did you find the reason why one does not need else() part? If you have information, I would like to know it. – jazzurro Feb 08 '15 at 13:19
  • @jazzurro I was wondering about `if` when you want to make an operation such as `dplyr::mutate` and you have to get values for `else` too, otherwise you won't have "enough" values. In this situation I'm doing something similar to `dplyr::summarise`, so I don't need `else` values (I actually want to get rid of them, thus the `if`). The solution for the question asked back then (I guess) is that the OP wanted `NA`s in the `else` statement, and when `if` is running within the `data.table` environment and assigned by the `:=` operator, it generates `NA`s by default (if `else` isn't provided). – David Arenburg Feb 08 '15 at 13:24
  • Thank you very much for the clear explanation. The default generation of NA is something good to know. Once again, thank you for taking your time. – jazzurro Feb 08 '15 at 13:36

Try

res <- do.call(rbind, with(df, tapply(as.character(item), ID,
         FUN = function(x) if (length(x) >= 2) t(combn(x, 2)))))
# IDs with a single item return NULL, which rbind drops
paste(res[, 1], res[, 2], sep = ";")
#[1] "a;b" "a;c" "b;c" "a;c" "b;a"
akrun
  • Thanks! I'm using your previous version: lst =2) t(combn(x,2)) else NULL) nodes=as.data.frame(do.call(rbind,lst[!sapply(lst, is.null)]) ), but could you please advise me on how to "take along" ID and some other variables (age, sex etc.) from the original df and have them as columns in "nodes"? – ElinaJ Feb 11 '15 at 07:03
  • @ElinaJ Could you please update your post with the new dataset and expected result – akrun Feb 11 '15 at 07:37
  • To be clear and match the answers, I made a new topic: http://stackoverflow.com/questions/28449118/creating-edge-list-with-additional-variables-in-r – ElinaJ Feb 11 '15 at 07:55
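
For readers following the comments above, here is a minimal base R sketch of one way to carry the ID along with each edge. It is not the version discussed in the linked follow-up question; the names `pairs_by_id`, `edges`, `from` and `to` are made up for illustration, and the data frame is rebuilt with `stringsAsFactors = FALSE` so that `combn` works on plain character vectors.

```r
ID <- c(rep("ID1", 3), rep("ID2", 2), "ID3", rep("ID4", 2))
item <- c("a", "b", "c", "a", "c", "a", "b", "a")
df <- data.frame(ID, item, stringsAsFactors = FALSE)

# Build one small data frame of item pairs per ID;
# IDs with fewer than two items have no edges and yield NULL
pairs_by_id <- lapply(split(df$item, df$ID), function(x) {
  if (length(x) < 2) return(NULL)
  m <- t(combn(x, 2))
  data.frame(from = m[, 1], to = m[, 2], stringsAsFactors = FALSE)
})

# Drop the IDs without edges, then stack the rest
pairs_by_id <- Filter(Negate(is.null), pairs_by_id)
edges <- do.call(rbind, pairs_by_id)

# Repeat each ID once per edge it produced
edges$ID <- rep(names(pairs_by_id), vapply(pairs_by_id, nrow, integer(1)))
rownames(edges) <- NULL
edges
#   from to  ID
# 1    a  b ID1
# 2    a  c ID1
# 3    b  c ID1
# 4    a  c ID2
# 5    b  a ID4
```

Other per-ID variables (age, sex and so on) could then probably be attached with `merge(edges, ...)` on `ID`, assuming they take a single value per ID.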

Here is a more scalable solution that uses the same core logic as the other solutions:

library(plyr)
library(dplyr)

ID=c(rep("ID1",3), rep("ID2",2), "ID3", rep("ID4",2))
item=c("a","b","c","a","c","a","b","a")

dfPaths = data.frame(ID, item)
dfPaths2 = dfPaths %>% 
  group_by(ID) %>% 
  mutate(numitems = n(), item = as.character(item)) %>%
  filter(numitems > 1)


ddply(dfPaths2, .(ID), function(x) t(combn(x$item, 2)))
tchakravarty
  • You could do this within `dplyr` using `do`: `dfPaths %>% group_by(ID) %>% filter(n()>1) %>% do(data.frame(V1=combn(as.character(.$item), 2, FUN=paste, collapse=";")))` – akrun Feb 08 '15 at 14:19
  • @akrun Thanks -- did not know that. – tchakravarty Feb 08 '15 at 14:25