1

I courageously entered the world of statistics...I have a table with more then 500 entries. I want to do a exact fisher test on each of the rows and plot the p-values in one table with the name of the variable:

currently, i do it one by one, but it takes a lot of time:

aa  58  76  48  44
bb  65  69  30  62
cc  35  99  23  69
dd  36  98  16  76
ee  27  107 24  68
ff  30  104 12  80
....

example: aa = earthquake

aa <- matrix(c(58,76,48,44), nrow = 2)  
fisher.exact(aa)

bb <- matrix(c(65,69,30,62), nrow = 2)  
fisher.exact(bb)

cc <- matrix(c(35,99,23,69), nrow = 2)  
fisher.exact(cc)

(....)

How can i do it with in one time and how can i extract the p-values and odd ratios per line in a table or a plot?

hello_r
  • 13
  • 4
  • Welcome to StackOverflow. I suspect that a simple `lapply` would work for you, but how to implement it depends on you data structure. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo Jan 06 '17 at 14:00
  • For example, what type of object are you working with? the creating a great R example link shows a number of ways to share samples of your data or to construct a similar data.set. Please post a reproducible example. – lmo Jan 06 '17 at 15:30
  • Please do not use images as they are not reproducible. Use `dput` for an existing data set or construct a toy example and share the code. Also, the image that you posted is a 2X2 "matrix" and does not seem to illustrate your problem. – lmo Jan 06 '17 at 16:36
  • 1
    Dear you, each row is a 2*2 the way the image shows. i have more then 500 2*2, and as you can see i struggle to make a code out of it where i can calculate each p-value rather then running one sample after the other and then pick the result manually in a table. please try to understand and teach me. i appreciate. – hello_r Jan 06 '17 at 16:38
  • OK. Since you will not provide an example dataset, I will assume that it is stored in a data.frame. – lmo Jan 06 '17 at 16:44

2 Answers2

3

Lists are good to store your matrices in. Here's a tidyverse approach. You could do this without storing everything in a list frame, but I like keeping all the pieces of a workflow together.

EDIT: If you get everything in as a csv with each item in a row, as per your first example, you could run it like:

librar(tidyverse)

analysis  <- read.csv(path_to_your_file) %>% 
  setNames(c("group", "V1","V2","V3","V4")) %>% 
  nest(-group) %>% 
  mutate(matrix = map(data, ~matrix(unlist(.x), nrow = 2))) %>% 
  mutate(fisher = map(matrix, ~fisher.test(.x))) %>% 
  mutate(stats = map(fisher, ~broom::glance(.x))

analysis %>% 
  unnest(stats) %>%
  select(group, p.value, odds = estimate)

   # A tibble: 6 × 3
  group    p.value      odds
  <chr>      <dbl>     <dbl>
1    aa 0.22239730 0.7006909
2    bb 0.01993561 1.9411244
3    cc 0.87802037 1.0603520
4    dd 0.10923094 1.7407100
5    ee 0.33248291 0.7160521
6    ff 0.08389711 1.9177455

You can read more this approach: here and here.

Jake Kaupp
  • 7,097
  • 2
  • 21
  • 34
  • Sorry, skipped the line that included the libraries. Fixed now. – Jake Kaupp Jan 07 '17 at 17:09
  • Updated my answer – Jake Kaupp Jan 07 '17 at 18:24
  • Use / instead of \ in your file path. e.g: `read.csv("C:/Users/An.Ba/Desktop/ClimQue400TEST.csv")` – Jake Kaupp Jan 07 '17 at 22:09
  • dear jake, i would send you flowers if i could. it does read the file now. but running the whole code gives me this error: Error in eval(expr, envir, enclos) : object 'analysis' not found i m sorry to bother again, but i feel i m so close to heureka...that it would be a shame to give up. thank you, i m off to bed now. ps. i kept the line library(tidyverse) in the beginning. – hello_r Jan 07 '17 at 22:35
  • Check to make sure the number of columns in your csv is 5, if there is more you'd have to add more names in `setNames`. Check to see if your data has a header, if it does you should use `header = TRUE` in `read.csv`. Check to see if you've any lines to skip in the csv, if you do, remove them or use the `skip` argument. You just need to debug the data, if it resembles the data in your first example, it should work. – Jake Kaupp Jan 07 '17 at 22:56
  • Jake!! it works!! Even with bigger data sets. the two last problem (i hope): its only shows the results from the second line, bb. any idea why? i included the headder=true, but it does not make a difference. 2. how do i get this results out now graciously? Of course, i can use a copypaste out of the console, but im sure R has better options. Thank you! – hello_r Jan 09 '17 at 14:19
  • If you want the results out, you can use `write.csv("myfile.csv)` at the end of the output chain to write it to a CSV. – Jake Kaupp Jan 09 '17 at 14:44
  • does not work yet, but i try around. any thoughts on the missing first row "aa" in my output? it says: A tibble: 5 × 3 and then continues... – hello_r Jan 09 '17 at 15:06
  • You just need to work on your debugging. If something is missing, check to see if its in the data correctly, then find out where it disappears and check to see how that would go wrong. Good luck! – Jake Kaupp Jan 09 '17 at 15:55
  • It worked. i inserted a header in the cvs file. Appreciating your support. you have helped me a lot. best a. – hello_r Jan 09 '17 at 16:39
  • Most likely that is something in your data, and the lines you've added. If it's something that you can't figure out and need help with, make a new question. – Jake Kaupp Jan 11 '17 at 14:09
  • You need to review what exactly a t-test tells you. This is question for google, or to search on stats stack exchange. – Jake Kaupp Jan 11 '17 at 17:33
1

With the data.frame below,

# convert to data matrix
myMat <- data.matrix(df[-1])
# add rownames to matrix
rownames(myMat) <- df[[1]]

# run the test, store results in a list
myTests <- lapply(seq_len(nrow(myMat)), function(i) fisher.test(matrix(myMat[i,], nrow=2)))

Now, check out some of the results.

myTests[[1]]

    Fisher's Exact Test for Count Data

data:  matrix(myMat[i, ], nrow = 2)
p-value = 0.2224
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.3964215 1.2342274
sample estimates:
odds ratio 
 0.7006909 

Take a look at what the test stores:

str(myTests[[1]])
List of 7
 $ p.value    : num 0.222
 $ conf.int   : atomic [1:2] 0.396 1.234
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num 0.701
  ..- attr(*, "names")= chr "odds ratio"
 $ null.value : Named num 1
  ..- attr(*, "names")= chr "odds ratio"
 $ alternative: chr "two.sided"
 $ method     : chr "Fisher's Exact Test for Count Data"
 $ data.name  : chr "matrix(myMat[i, ], nrow = 2)"
 - attr(*, "class")= chr "htest"

Pull out an interesting part of the test, the p-value

myTests[[1]]$p.value
[1] 0.2223973

Now, pull out the p-values from all of the tests

unlist(lapply(myTests, function(i) i$p.value))
[1] 0.22239730 0.01993561 0.87802037 0.10923094 0.33248291 0.08389711

This should get you started. I'd recommend looking up each of the unfamiliar functions in the help files and reading gregor's answer on this post on working with lists and why that is the way to go in R.

data

df <- structure(list(V1 = structure(1:6, .Label = c("aa", "bb", "cc", 
"dd", "ee", "ff"), class = "factor"), V2 = c(58L, 65L, 35L, 36L, 
27L, 30L), V3 = c(76L, 69L, 99L, 98L, 107L, 104L), V4 = c(48L, 
30L, 23L, 16L, 24L, 12L), V5 = c(44L, 62L, 69L, 76L, 68L, 80L
)), .Names = c("V1", "V2", "V3", "V4", "V5"), class = "data.frame", row.names = c(NA, 
-6L))
Community
  • 1
  • 1
lmo
  • 35,764
  • 9
  • 49
  • 57
  • I have to work through. will get back to you. thank you very much, i appreciate your support and helpfulness. – hello_r Jan 06 '17 at 17:37
  • It depends on the type of plot you want to do. For a simple scatter plot, use `plot`. For example, `plot(1:10, seq(1,20,2))`. There are plenty of options available for manipulating the plot. Just do a simple search for the type of plot and R and you will find many links. – lmo Jan 07 '17 at 17:09