Correct filtering with subset in R

Question

I'm trying to subset data, but it seems I'm doing something wrong.

My data is a single column with a header, such as:

platform
========
service
vps
dedic
dedic
vps
service
dedic
....
...
..
.

I've got it from a big data set by:

servertype <- mydata[c(18)]  # it was the 18th variable

Now I'm trying to filter it and keep only what I need, omitting all "service" rows:

servertype <- subset(servertype, platform=="dedicated" | platform=="vps")

I expect to get something like :

platform
========
vps
dedic
dedic
vps
dedic
....
...
..
.

When I check the data, this is exactly what I get.

But when I check the summary, I get:

> summary(servertype)
      platform   
 dedicated:8564  
 service  :   0  
 vps      :4677

When plotting, "service" shows up as well.

I tried restarting R, restarting the session, cleaning the data, etc. :)

But nothing changed. I suppose subset with conditions is not working as I expected? Is there any other way around this?

It is very hard to tell what your question is. We need a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It looks like your `subset` gives you exactly what you expect, so what's wrong? `summary(servertype)` is simply indicating that your `servertype` dataframe has one column `platform` with those counts. "when plotting" - what do you mean? Again, a reproducible example is key here — mathematical.coffee, Feb 23 '16 at 03:26
`platform` is stored as a `factor`. either convert to `character` or `drop` factor levels. google... — MichaelChirico, Feb 23 '16 at 03:29

astrosyam · Accepted Answer · 2016-02-23T03:52:19.877

2

Just factor the data again:

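# sample data (stringsAsFactors = TRUE is needed in R >= 4.0.0 to get a factor column)
mydata <- data.frame(platform = c('service','vps','dedic','dedic','vps','service','dedic'),
                     stringsAsFactors = TRUE)

# subset: drop the 'service' rows
mydata <- subset(mydata, platform != 'service')

# factor the data again to discard the now-unused 'service' level
mydata$platform <- factor(mydata$platform)

# check plot
plot(mydata)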

The initial data had 3 factor levels. Subsetting removes rows but keeps the full set of levels, so re-run factor on the column to recompute the levels from the values that remain.

The new data will have only two factor levels, as desired.

> summary(mydata)
  platform
 dedic:3  
 vps  :2
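
As a possible alternative to calling factor again, base R's droplevels() removes unused levels from every factor column of a data frame in one call. This is a minimal sketch on the same sample data; the name kept is only illustrative, and stringsAsFactors = TRUE is set on the assumption that you are on R 4.0.0 or later, where strings are no longer converted to factors by default.

```r
# Sample data with an explicit factor column.
mydata <- data.frame(
  platform = c("service", "vps", "dedic", "dedic", "vps", "service", "dedic"),
  stringsAsFactors = TRUE
)

# Subsetting removes the rows but not the level itself.
kept <- subset(mydata, platform != "service")
levels(kept$platform)   # "service" is still among the levels

# droplevels() discards levels that no longer occur in the data.
kept <- droplevels(kept)
levels(kept$platform)
summary(kept)
```

After droplevels(), summary() and plot() should no longer show an empty "service" category.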

edited Feb 23 '16 at 03:52

answered Feb 23 '16 at 03:43


Thanks, working fine, I was thinking, by selecting what I need I'm dropping what I don't need, but it seems I have to drop what I do not need in terms to pick what I need :) .. cheers – Zaza Feb 23 '16 at 04:12

score 1 · Answer 2 · edited May 23 '17 at 10:29

1

I think this is what you need. If the original column was a factor, the subsetted column retains all of the original factor levels, including those that no longer occur. Remove the unused ones by applying the factor function again.

Drop factor levels in a subsetted data frame
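
To see the retained level directly, and to illustrate the other route mentioned in the comments (converting the column to character), here is a small sketch. The data values and the names before and after are made up for illustration; they are not the OP's real data.

```r
# Hypothetical stand-in for the OP's one-column data frame.
servertype <- data.frame(
  platform = factor(c("service", "vps", "dedicated", "dedicated", "vps"))
)

# Keep only the wanted platforms.
kept <- subset(servertype, platform %in% c("dedicated", "vps"))

# The factor still remembers "service", so it is counted as 0.
before <- table(kept$platform)
before

# Converting to character forgets the level set entirely.
kept$platform <- as.character(kept$platform)
after <- table(kept$platform)
after
```

A character column carries no level information, so only values that actually occur are tabulated. Whether to keep the column as a factor or convert it depends on what the later plotting or modelling code expects.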

edited May 23 '17 at 10:29

Community


answered Feb 23 '16 at 03:30

myloginid


The OP has already mentioned that "by checking the data, this is exactly what's I'm getting". So the problem is not the subsetting. It is a little unclear, but with my mind-reading hat I think the problem is that the OP wants to know how to drop the empty levels. – mathematical.coffee Feb 23 '16 at 03:37
