I am trying to convert a data frame into a more human-readable format. The current data frame (DF1) contains strings (questions) in column1 and strings (categories) in column2. Where a question fits into multiple categories, there is another row further down in DF1 with the same question in column1 but a different category in column2.

e.g.

   column1       column2        column3
"question 1"   "category A"  "subcategory A"
"question 2"   "category A"  "subcategory B"
"question 1"   "category B"  "subcategory A"

I want to reshape the data frame so that there is one row per question and the categories from column2 become column headers, with booleans indicating whether or not the question falls under each category.

e.g.

  Question    CategoryA    CategoryB
"question 1"    TRUE         TRUE
"question 2"    TRUE         FALSE

I have extracted a list of unique questions from DF1 using the following:

question <- list()
for (x in DF1$column1){
  if (!(x %in% question)){
    question[[x]] <- x
  }
}
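
For reference, the loop above can probably be collapsed into a single call to base R's `unique()`. This is only a sketch: the small `DF1` below is a stand-in built from the example rows, and the result is a plain character vector rather than the named list the loop produces.

```r
# Toy stand-in for DF1, built from the example rows above
DF1 <- data.frame(
  column1 = c("question 1", "question 2", "question 1"),
  column2 = c("category A", "category A", "category B"),
  column3 = c("subcategory A", "subcategory B", "subcategory A"),
  stringsAsFactors = FALSE
)

# unique() keeps the first occurrence of each value, in order of appearance
question <- unique(DF1$column1)
print(question)
```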

What is the best way to do this in R? I'd prefer to avoid a series of nested loops if possible, as I believe they can become very inefficient in R.

Many thanks!!

ben

1 Answer

Does something like this do the trick?

library(dplyr)
library(tidyr)

# Rebuild the example data from the question
column1 <- c("question 1", "question 2", "question 1")
column2 <- c("category A", "category A", "category B")
column3 <- c("subcategory A", "subcategory B", "subcategory A")

df <- data.frame(column1, column2, column3, stringsAsFactors = FALSE)

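# Drop the subcategory, flag every question/category pair as TRUE,
# then spread the categories (column2) into column headers so that
# each question stays on its own row; missing pairs become FALSE
dfwide <- df %>%
  select(-column3) %>%
  mutate(col4 = TRUE) %>%
  spread(column2, col4, fill = FALSE)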
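
If you would rather not depend on dplyr/tidyr, a base R alternative (a different technique from the one above, offered only as a sketch) is to cross-tabulate questions against categories with `table()` and test which counts are non-zero. The column names are the ones from the question; `Question` and `membership` are names I made up for illustration.

```r
# Example data from the question
df <- data.frame(
  column1 = c("question 1", "question 2", "question 1"),
  column2 = c("category A", "category A", "category B"),
  column3 = c("subcategory A", "subcategory B", "subcategory A"),
  stringsAsFactors = FALSE
)

# Count how often each question/category pair occurs
counts <- table(df$column1, df$column2)

# A pair is a member if it occurs at least once; unclass() gives a plain matrix
membership <- unclass(counts) > 0

# One row per question, one logical column per category
dfwide <- data.frame(
  Question = rownames(membership),
  membership,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
rownames(dfwide) <- NULL
print(dfwide)
```

Because this works on counts, repeated question/category pairs (for example the same pair under two subcategories) simply count more than once and still come out as TRUE.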
Mike
  • Not quite - returns the error message: Error: Duplicate identifiers for rows (151, 158, 159, 665, 1592, 1599, 1600, 2106, 3033, 3040, 3041, 3547), (1325, 2766, 4207), (1322, 2763, 4204)...... etc – ben Apr 19 '18 at 20:53
  • From the data provided I did not get that error. Here is a link that shows how to resolve that issue: https://stackoverflow.com/questions/39053451/using-spread-with-duplicate-identifiers-for-rows . If you can't get it to work, you will need to share more data so I can reproduce the error and correct the answer. – Mike Apr 20 '18 at 13:13
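
A likely cause of the duplicate-identifier error, assuming the real data contains the same question/category pair under more than one subcategory: once column3 is dropped, those rows become identical and `spread()` cannot tell them apart. A minimal sketch of one way around it is to call `distinct()` before spreading. The data below is hypothetical (it is not ben's data) and was made up to contain such a repeat; the key is column2 so that the categories become the headers.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: "question 1" appears twice under "category A"
df <- data.frame(
  column1 = c("question 1", "question 1", "question 2", "question 1"),
  column2 = c("category A", "category A", "category A", "category B"),
  column3 = c("subcategory A", "subcategory B", "subcategory B", "subcategory A"),
  stringsAsFactors = FALSE
)

dfwide <- df %>%
  select(-column3) %>%   # dropping the subcategory creates duplicate rows
  distinct() %>%         # keep one row per question/category pair
  mutate(col4 = TRUE) %>%
  spread(column2, col4, fill = FALSE)

print(dfwide)
```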