I have a data frame that contains the pairs of elements found in a number of datasets. The order within a pair should not matter. Each pair is listed once, normally in alphabetical order, but which element comes first may differ between databases, as in the example.
data <- data.frame(i = c("b","b","b","c"), j = c("c","d","d","a"),
                   database = c(1,1,2,3))
I would like to generate a score for each pair: the number of databases that contain the pair, divided by the number of databases that contain either of its elements.
I can imagine a crude function like this:
# For each database that includes particular i or j, test whether
# they have a connection to another particular element at j or i,
# respectively. Count the number of successes.
# Divide it by:
# Count(number of databases that contain either of the members of the pair in i or j)
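The steps above can be sketched in base R roughly as follows. This is only a minimal sketch under my own assumptions: `pair_scores` is a made-up helper name (not from any package), the element columns are character rather than factor, and a pair is treated as the same pair regardless of which element comes first.

```r
# Example data from the question.
data <- data.frame(i = c("b", "b", "b", "c"),
                   j = c("c", "d", "d", "a"),
                   database = c(1, 1, 2, 3),
                   stringsAsFactors = FALSE)

# pair_scores is a hypothetical helper name used for illustration only.
pair_scores <- function(df) {
  # Put each pair in alphabetical order so that c-a and a-c count as one pair.
  lo <- pmin(df$i, df$j)
  hi <- pmax(df$i, df$j)
  pairs <- unique(data.frame(i = lo, j = hi, stringsAsFactors = FALSE))

  pairs$score <- mapply(function(a, b) {
    # Databases that list the pair itself.
    with_pair <- unique(df$database[lo == a & hi == b])
    # Databases that mention either element, in either column.
    with_either <- unique(df$database[df$i %in% c(a, b) | df$j %in% c(a, b)])
    length(with_pair) / length(with_either)
  }, pairs$i, pairs$j, USE.NAMES = FALSE)

  # Sort by pair for readable output.
  pairs[order(pairs$i, pairs$j), ]
}

scores <- pair_scores(data)
print(scores)
```

On the example data this gives 0.5 for a-c, 0.33 (one third) for b-c and 1 for b-d. It still loops over the pairs via `mapply`, so it is the crude version rather than the elegant one.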
The results I would expect from the example data set (order unimportant) are:
a c 0.5
b c 0.33
b d 1
I can see how this crude loop might work, but I'm quite sure there is a more elegant solution. Is anyone able to help? Perhaps there is a specific function for this in a graph library. Thanks!