
I need to collapse "+/-" prefixed names into a single name and add up their counts. Here is an example of the dataset:

data <- as.data.frame(c("+A","-A","+A","-A", "+A","+A","-B","+B","-B", "C","C"))
colnames(data) <- "class"

table(data$class)
-A -B +A +B  C 
 2  2  4  1  2 

I would like to collapse the +/- names into names without a sign (so "+A" and "-A" both become "A") and add up their counts.

The desired output should look like this:

A   B   C
6   3   2

Sometimes some of the +/- variants might be missing, or category "C" might be missing entirely. How can I add up the counts even when some categories are missing?

Any thoughts?

Ronak Shah
amisos55

2 Answers

table(gsub("\\W", "", data$class))

A B C 
6 3 2 

This returns an object of class table. If you need a data.frame instead, wrap the table x in as.data.frame(t(c(x))) to get the same layout as a one-row data.frame:

as.data.frame(t(c(table(gsub("\\W", "", data$class)))))
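A minimal sketch of a slightly stricter variant, assuming the sign only ever appears as a leading prefix. gsub("\\W", "", ...) removes every non-word character anywhere in the value (for example a space or a dot inside a category name), whereas sub("^[+-]", "", ...) strips only a leading + or -. The variable names base, counts and counts_df are illustrative, and the one-row data.frame is built with setNames() rather than t(c(...)):

```r
# Example data from the question
data <- data.frame(class = c("+A", "-A", "+A", "-A", "+A", "+A",
                             "-B", "+B", "-B", "C", "C"))

# Strip only a leading "+" or "-" (assumes the sign is always a prefix);
# other punctuation inside a name is left untouched
base <- sub("^[+-]", "", data$class)

# Count each unsigned category
counts <- table(base)

# One-row data.frame with one column per category
counts_df <- setNames(as.data.frame(matrix(as.vector(counts), nrow = 1)),
                      names(counts))
print(counts_df)
```

For the question's data this gives columns A, B and C with counts 6, 3 and 2, the same as the gsub() version.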
Lennyy

I modified your example to cover some of the cases mentioned in your post: one value ("A") has no sign, and category "C" does not occur at all. Suppose your data can only contain the 3 categories A, B and C. We can extract the letter A-C from each value, convert the result into a factor with those 3 levels and then call table. Because "C" is one of the factor's levels, table still reports it with a count of 0 even though it never appears in the data.

data <- data.frame(col = c("+A","A","+A","-A","+A","+A","-B","+B","-B"))

table(factor(sub(".*([A-C]).*", "\\1", data$col), levels = LETTERS[1:3]))
#A B C 
#6 3 0 

If there are more than 3 categories in the data, adjust the character class in the regex and the levels passed to factor accordingly.
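A minimal sketch of that generalisation, assuming each value is an optional leading "+" or "-" followed by the category name. Instead of a fixed A-C regex it strips the sign and takes the levels from a hypothetical expected_levels vector that you would list by hand. Values whose category is not in expected_levels become NA in the factor and are silently dropped by table(); pass useNA = "ifany" to table() if you want to see them.

```r
# Modified example from this answer: "C" never occurs, and one "A" has no sign
data <- data.frame(col = c("+A", "A", "+A", "-A", "+A", "+A", "-B", "+B", "-B"))

# Hypothetical full list of categories you expect; adjust to your data
expected_levels <- c("A", "B", "C")

# Remove an optional leading sign, then fix the factor levels so that
# absent categories are still reported with a count of 0
base <- sub("^[+-]", "", data$col)
counts <- table(factor(base, levels = expected_levels))
print(counts)
```

This reproduces the output above: A = 6, B = 3 and C = 0.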

Ronak Shah