I have a dataset with 74 columns, and I am trying to create a histogram for each one of these in a for loop. When run in the console, my code is fine, but when I try to knit it, R says that my data is not numeric. Anyone know why? I'm also open to other solutions to this problem of plotting multiple histograms (with the column name as an identifier on the plot). Below is a simplified representation of my problem.

library(dplyr) 

data2 <- data.frame(c(1,3,5,5,2,2,1,1,1,1),
                    c(2,4,2,3,4,5,1,2,3,3))

colnames(data2) <- c("A", "B")

for (cols in colnames(data2)) {
  data2 %>% select(cols) %>% hist()
}

Again, works fine line-by-line, but I end up receiving the following error when I try to knit it:

"Error in hist.default(.) : 'x' must be numeric Calls: ...freduce -> withVisible -> -> hist -> hist.default

Execution halted"

Interestingly, this code knits fine:

library(dplyr)

data2 <- data.frame(c(1,3,5,5,2,2,1,1,1,1),
                    c(2,4,2,3,4,5,1,2,3,3))

colnames(data2) <- c("A", "B")

hist(data2$A)
hist(data2$B)
  • Just a comment about isolating the problem: this has nothing to do with `knitr`, you get the same error if you run the R code in a script or in an Rmd document. Because the problem exists whether or not you use knitr, I've removed the `knitr` tag and the mention in the title. – Gregor Thomas Jan 06 '20 at 14:45

4 Answers

2

You can simply use lapply.

lapply(data2, hist)

Edit: Of course you can extend this as you like, e.g. titles, labels.

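op <- par(mfrow = c(1, 2))  # put the two histograms side by side
invisible(lapply(seq_along(data2), function(i) {
  # index by position so the column name is available for the labels
  hist(x = data2[[i]], xlab = names(data2)[i],
       main = paste("Histogram", names(data2)[i]))
}))
par(op)  # restore the previous graphics settings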

jay.sf
  • Thank you, but the reason I used a for loop instead of lapply is that I wanted to preserve the column name for the main title or x axis title. – Nicholas C. Dove Jan 06 '20 at 21:15
1

Selecting a single column with `select` returns a one-column data frame. `hist` expects a numeric vector, not a data frame. Use `pull` instead to extract the column as a vector:

for (cols in colnames(data2)) {
  data2 %>% pull(cols) %>% hist()
}
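
To see the difference directly, you can compare what the two verbs return. This is only a small check, assuming the same `data2` as in the question (built here with named columns for brevity); the names `sel` and `vec` are just for illustration:

```r
library(dplyr)

# Same values as the question's data2, with the column names set up front
data2 <- data.frame(A = c(1, 3, 5, 5, 2, 2, 1, 1, 1, 1),
                    B = c(2, 4, 2, 3, 4, 5, 1, 2, 3, 3))

sel <- data2 %>% select(A)  # still a data frame, with one column
vec <- data2 %>% pull(A)    # the column itself, as a numeric vector

is.data.frame(sel)  # TRUE
is.numeric(sel)     # FALSE, which is why hist() rejects it
is.numeric(vec)     # TRUE
```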

I'm also not sure what you mean by "works fine line-by-line": I get the same error when I take the code out of the for loop (but `pull` still works fine):

data2 %>% select(A) %>% hist
# Error in hist.default(.) : 'x' must be numeric
Gregor Thomas
1

Thanks, Gregor. In case people are interested, my final code to keep the column names for reference is:

for (cols in colnames(data2)) {
  data2 %>% pull(cols) %>% hist(main = cols)
}
0

This also works in base R, if you like.

data2 <- data.frame(c(1,3,5,5,2,2,1,1,1,1),
                    c(2,4,2,3,4,5,1,2,3,3))

colnames(data2) <- c("A", "B")

for (x in names(data2)) {
  # [[x]] always extracts the column as a vector
  hist(data2[[x]], main = paste(x, "distribution"))
}