I am subsetting a data frame in R to drop some factor levels, then checking the output as shown below:

Step 1:

str(bill_11)
'data.frame':   403771 obs. of  11 variables:

$ Month          : Factor w/ 4 levels "Apr-12","Feb-12",..: 2 2 2 2 2 2 2 2 2 2 ...

Apr-12 Feb-12 Mar-12 May-12 
81891 103668 118070 100142

Step 2:

feb_bill  <- bill_11[which(bill_11$Month == "Feb-12"),]
str(feb_bill)
'data.frame':   103668 obs. of  11 variables:

 $ Month      : Factor w/ 4 levels "Apr-12","Feb-12",..: 2 2 2 2 2 2 2 2 2 2 ...

Apr-12 Feb-12 Mar-12 May-12 
 0 103668      0      0 

My question: I have dropped three levels of the factor Month, but the new data frame still shows that "Month" has 4 levels. The subset operation itself looks correct, but this makes me doubt the result.

I am new to R, coming from SAS. Is this just how R's str() function works, or is something wrong? Thanks for your help.

Nazik
user2090693

1 Answer

Factor levels are retained when you subset. To drop the unused ones, use droplevels, e.g.:

feb_bill  <- droplevels(bill_11[which(bill_11$Month == "Feb-12"),])

This will drop all unused levels from all factor variables in your data.frame. To keep the levels of specific variables, use the except parameter, which takes the indices of the columns to leave untouched.
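As a rough illustration of the behaviour described above, here is a minimal sketch on a small made-up data frame. The names bills, Region and Amount are hypothetical stand-ins, not the asker's real bill_11 columns. It shows that subsetting keeps the unused levels, that droplevels removes them, and that except protects a chosen column.

```r
# Hypothetical toy data standing in for bill_11 (not the asker's real data)
bills <- data.frame(
  Month  = factor(c("Feb-12", "Mar-12", "Apr-12", "Feb-12")),
  Region = factor(c("North", "South", "North", "East")),
  Amount = c(10, 20, 30, 40)
)

# Subsetting keeps every original level, even those with zero rows
feb <- bills[bills$Month == "Feb-12", ]
nlevels(feb$Month)        # 3
table(feb$Month)          # Apr-12 and Mar-12 appear with a count of 0

# droplevels() removes unused levels from all factor columns
feb_all <- droplevels(feb)
levels(feb_all$Month)     # "Feb-12"

# except takes column indices; here the Month column keeps its levels
month_col <- which(names(feb) == "Month")
feb_keep  <- droplevels(feb, except = month_col)
nlevels(feb_keep$Month)   # 3 (untouched)
nlevels(feb_keep$Region)  # 2 (unused "South" dropped)
```

So the str() output in the question is not a sign of a bad subset: the row counts are right, and the extra levels are only labels that no row uses any more.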

James