R problems with function 'subset'

Question

I'm just learning R, so this may seem like a beginner question, but I'm having trouble with the function "subset". I looked for an answer in previous questions but couldn't find one.

For example, I have a data frame q with 3 variables x, y, z

q = read.csv("test.csv", encoding = "UTF-8",
             header = TRUE, sep = ",", na.strings = c("", NA))

Variable x takes 4 values: a, b, c, d.

I'm trying to make a data frame q1 that keeps only 2 values of variable x: a and c.

q1 = subset(q, q$x == 'a' | q$x == 'c')

As a result I get a new data frame containing only 2 values of variable x (I checked by opening the new data frame).

But when I tabulate variable x from the new data frame q1, I still see all 4 values, with counts of 0 for b and d.

What am I doing wrong? Why do I see b and d when I tabulate x in the new data frame?

Thanks for your help!

Just add a `droplevels()`. For example `q1 = droplevels(subset(q, x %in% c('a','c')))` — MrFlick, Dec 08 '17 at 16:54

score 1 · Answer 1 · answered Dec 08 '17 at 16:55

The column in your data frame is a factor, which is R's name for a categorical variable: a variable that can take one of a fixed set of possible values, or "levels", such as "Male" or "Female".

When you subset a factor, its levels don't change. What you are seeing is a tabulation over all the levels, so the levels with no remaining rows show up as zeroes.
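The behaviour described above can be reproduced with a small sketch. The data frame here is made up to stand in for test.csv (the real file isn't shown in the question), and x is built as a factor explicitly so the result doesn't depend on how the file was read.

```r
# Hypothetical stand-in for the data read from test.csv
q <- data.frame(
  x = factor(c("a", "b", "c", "d", "a", "c")),
  y = 1:6
)

# Same filter as in the question
q1 <- subset(q, x == "a" | x == "c")

levels(q1$x)  # all four levels are still attached to the factor
table(q1$x)   # b and d are listed, each with a count of 0
```

The rows for b and d are gone, but the factor still carries them as levels, and table() counts every level.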

If you want to avoid this, convert your factors to character values with the `as.character` function, or read them in as character with the `stringsAsFactors=FALSE` option to `read.csv`.
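A minimal sketch of the read-as-character route, assuming made-up CSV content passed through the `text` argument in place of the real test.csv. Setting `stringsAsFactors = FALSE` explicitly keeps the result the same across R versions (it became the default in R 4.0.0).

```r
# Hypothetical CSV content standing in for test.csv
csv_text <- "x,y\na,1\nb,2\nc,3\nd,4\na,5\nc,6"

# Read x as plain character instead of a factor
q <- read.csv(text = csv_text, stringsAsFactors = FALSE)
q1 <- subset(q, x %in% c("a", "c"))

class(q1$x)  # "character"
table(q1$x)  # only a and c are tabulated
```

A character column has no levels to remember, so table() only counts the values that are actually present.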

score 1 · Answer 2 · answered Dec 08 '17 at 16:55

Factor variables (R's version of categorical variables) remember all possible categories unless you tell them not to. You can "forget" the unused ones with `q1 = droplevels(q1)` or by converting the factor to an ordinary string: `q1$x = as.character(q1$x)`
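As a rough illustration of the `droplevels()` route, using a made-up data frame in place of the one read from test.csv:

```r
# Hypothetical stand-in for the data read from test.csv
q <- data.frame(
  x = factor(c("a", "b", "c", "d", "a", "c")),
  y = 1:6
)

q1 <- subset(q, x %in% c("a", "c"))

# Drop unused levels from every factor column of the data frame
q1 <- droplevels(q1)

levels(q1$x)  # "a" "c"
table(q1$x)   # no zero-count entries for b and d
```

Unlike the `as.character` conversion, this keeps x as a factor, which is useful if later code relies on it being one.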

Gregor Thomas
