
I'm rewriting all my code using dplyr and need help with the mutate / mutate_at functions. All I need is to apply a custom function to two columns in my table. Ideally, I would reference these columns by their indices, but right now I can't make it work even when referencing them by name.

The function is:

binom.test.p <- function(x) {
  if (is.na(x[1])|is.na(x[2])|(x[1]+x[2])<10) {
    return(NA)
  } 
  else {
    return(binom.test(x, alternative="two.sided")$p.value)
  }
} 

My data:

table <- data.frame(geneId=c("a", "b", "c", "d"), ref_SG1_E2_1_R1_Sum = c(10,20,10,15), alt_SG1_E2_1_R1_Sum = c(10,20,10,15))

So I do:

table %>%
  mutate(Ratio=binom.test.p(c(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum)))
Error: incorrect length of 'x'

If I do:

table %>% 
mutate(Ratio=binom.test.p(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum))
Error: unused argument (c(10, 20, 10, 15))

The second error is probably because my function needs one vector and gets two parameters instead.

But even setting my function aside, this works:

table %>%
  mutate(sum = ref_SG1_E2_1_R1_Sum + alt_SG1_E2_1_R1_Sum)

This doesn't:

table %>%
  mutate(.cols=c(2:3), .funs=funs(sum=sum(.)))
Error: wrong result size (2), expected 4 or 1

So it's probably my misunderstanding of how dplyr works.

amonk
kintany

2 Answers


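The problem is binom.test rather than dplyr. binom.test is not vectorized: inside mutate, each column name refers to the whole column, so c(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum) is a vector of length 8, while binom.test expects a vector of length 2 (hence "incorrect length of 'x'"). You can use mapply on the two columns inside mutate to call your function once per row: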

table %>% 
    mutate(Ratio = mapply(function(x, y) binom.test.p(c(x,y)), 
                          ref_SG1_E2_1_R1_Sum, 
                          alt_SG1_E2_1_R1_Sum))

#  geneId ref_SG1_E2_1_R1_Sum alt_SG1_E2_1_R1_Sum Ratio
#1      a                  10                  10     1
#2      b                  20                  20     1
#3      c                  10                  10     1
#4      d                  15                  15     1
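To make the difference concrete, here is a minimal base-R sketch that reproduces both behaviours outside of dplyr. The vectors `ref` and `alt` are stand-ins for the two columns, and the function is the one-argument version from the question (lightly reformatted); the variable names `whole_column` and `per_row` are only illustrative.

```r
# One-argument version of the function from the question.
binom.test.p <- function(x) {
  if (is.na(x[1]) | is.na(x[2]) | (x[1] + x[2]) < 10) {
    return(NA)
  }
  binom.test(x, alternative = "two.sided")$p.value
}

# Stand-ins for the two columns of the example table.
ref <- c(10, 20, 10, 15)
alt <- c(10, 20, 10, 15)

# Inside mutate(), a column name evaluates to the whole column,
# so c(ref, alt) has 8 elements rather than 2.
length(c(ref, alt))

# binom.test() rejects that vector; capture the error message instead of stopping.
whole_column <- tryCatch(
  binom.test.p(c(ref, alt)),
  error = function(e) conditionMessage(e)
)

# mapply() walks both vectors in parallel and passes one (x, y) pair per call.
per_row <- mapply(function(x, y) binom.test.p(c(x, y)), ref, alt)
```

Here `whole_column` ends up holding the error message, while `per_row` holds one p-value per row.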

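As for the last one, you need mutate_at instead of mutate. Note that mutate_at applies the function to each selected column separately, so this adds one new column per input column holding that column's total, not a row-wise sum: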

table %>%
      mutate_at(.vars=c(2:3), .funs=funs(sum=sum(.)))
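For readers on a newer dplyr, the following is a hedged sketch of the same idea using across(), which assumes dplyr 1.0.0 or later. It also shows one way to get a row-wise sum by column index, which may be closer to what the question was after. The names `totals` and `row_sums` are illustrative, and `.` relies on the magrittr pipe (`%>%`), not the native `|>`.

```r
library(dplyr)

table <- data.frame(
  geneId = c("a", "b", "c", "d"),
  ref_SG1_E2_1_R1_Sum = c(10, 20, 10, 15),
  alt_SG1_E2_1_R1_Sum = c(10, 20, 10, 15)
)

# Per-column totals for columns 2 and 3, added as new "<name>_sum" columns
# (each total is recycled across all rows).
totals <- table %>%
  mutate(across(2:3, sum, .names = "{.col}_sum"))

# Row-wise sum of columns 2 and 3, selected by index.
# `.` is the data frame piped in by %>%.
row_sums <- table %>%
  mutate(sum = rowSums(.[2:3]))
```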
Psidom
  • Thank you SO much! It works. Do you know by any chance how to do the same but referring to these columns by their indices? – kintany Jun 23 '17 at 22:56
  • You mean something like `mapply(function(...), 2, 3)`? – Psidom Jun 23 '17 at 22:59
  • I'm trying to make this code more reusable, so that it still works when the columns are named differently. It would be better to have something like mutate(p.val = mapply(function(x, y) binom.test.p(c(x,y)), select(.,2), select(.,3))), but working – kintany Jun 23 '17 at 23:03
  • You might try something like this, `table %>% mutate(Ratio = mapply(function(x, y) binom.test.p(c(x,y)), select(.,2)[[1]], select(.,3)[[1]]))`. Not sure how dynamic this might be though. – Psidom Jun 23 '17 at 23:09
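One possible way to package the index-based variant discussed in these comments is a small helper that takes the data frame and two column positions. This is only a sketch: `pair_p_values` is a hypothetical name, not part of dplyr, and the `.` placeholder assumes the magrittr pipe (`%>%`).

```r
library(dplyr)

# One-argument version of the function from the question.
binom.test.p <- function(x) {
  if (is.na(x[1]) | is.na(x[2]) | (x[1] + x[2]) < 10) {
    return(NA)
  }
  binom.test(x, alternative = "two.sided")$p.value
}

# Hypothetical helper: pull columns i and j by position with [[ ]],
# then test each (x, y) pair row by row.
pair_p_values <- function(df, i, j) {
  mapply(function(x, y) binom.test.p(c(x, y)), df[[i]], df[[j]])
}

table <- data.frame(
  geneId = c("a", "b", "c", "d"),
  ref_SG1_E2_1_R1_Sum = c(10, 20, 10, 15),
  alt_SG1_E2_1_R1_Sum = c(10, 20, 10, 15)
)

# `.` is the data frame piped in by %>%, so the column names never appear.
result <- table %>%
  mutate(Ratio = pair_p_values(., 2, 3))
```

Because the helper only sees positions, it keeps working if the columns are renamed, but it will silently pick the wrong data if the column order changes.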

In many cases it's sufficient to create a vectorized version of the function:

your_function_V <- Vectorize(your_function)

The vectorized function can then be used inside dplyr's mutate. See also this blog post.

The function posted in the question, however, takes a single two-element vector built from two different columns. We therefore need to change it to take the two values as separate arguments before we vectorize it.

binom.test.p <- function(x, y) {
  # input x and y
  x <- c(x, y)
  
  if (is.na(x[1])|is.na(x[2])|(x[1]+x[2])<10) {
    return(NA)
  } 
  else {
    return(binom.test(x, alternative="two.sided")$p.value)
  }
} 

# vectorized function
binom.test.p_V <- Vectorize(binom.test.p)

table %>%
  mutate(Ratio = binom.test.p_V(ref_SG1_E2_1_R1_Sum, alt_SG1_E2_1_R1_Sum))

# works!
Martin