I'm trying to subset data, but it seems I'm doing something wrong.

My data is a single column with a header, like this:

platform
========
service
vps
dedic
dedic
vps
service
dedic
....
...
..
.

I got it from a bigger data set with:

servertype <- mydata[c(18)]  # it was the 18th variable

Now I'm trying to filter it and keep only what I need, omitting all "service" rows:

servertype <- subset(servertype, platform=="dedicated" | platform=="vps")

I expect to get something like :

platform
========
vps
dedic
dedic
vps
dedic
....
...
..
.

When I inspect the data, this is exactly what I get.

But when I check the summary, I get:

> summary(servertype)
      platform   
 dedicated:8564  
 service  :   0  
 vps      :4677 

When plotting, "service" shows up as well.

I tried restarting R, restarting the session, cleaning the data, etc. :)

Nothing changed. I suppose subset with conditions is not working the way I expected? Is there another way to do this?

CuriousBeing
Zaza
  • It is very hard to tell what your question is. We need a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It looks like your `subset` gives you exactly what you expect, so what's wrong? `summary(servertype)` is simply indicating that your `servertype` dataframe has one column `platform` with those counts. "when plotting" - what do you mean? Again, a reproducible example is key here – mathematical.coffee Feb 23 '16 at 03:26
  • Did you try `servertype – CuriousBeing Feb 23 '16 at 03:28
  • `platform` is stored as a `factor`. Either convert to `character` or `drop` factor levels. google... – MichaelChirico Feb 23 '16 at 03:29
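To illustrate what the last comment describes, here is a minimal sketch of why `service` still appears with a count of 0, and of the "convert to `character`" option. The sample values are made up for illustration (they are not the OP's real data), and `stringsAsFactors = TRUE` is an assumption used to force a factor column, since R 4.0 and later create character columns by default.

```r
# Hypothetical sample data; stringsAsFactors = TRUE forces a factor column
servertype <- data.frame(
  platform = c("service", "vps", "dedicated", "dedicated",
               "vps", "service", "dedicated"),
  stringsAsFactors = TRUE
)

# Same filter as in the question
kept <- subset(servertype, platform == "dedicated" | platform == "vps")

# The "service" rows are gone, but the factor still remembers the level
levels_after_subset <- levels(kept$platform)
print(levels_after_subset)
print(table(kept$platform))

# Option from the comment: convert to character, which has no level set
kept$platform <- as.character(kept$platform)
counts <- table(kept$platform)
print(counts)
```

Once the column is character, `table()` only counts values that actually occur, so the empty `service` entry no longer shows up.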

2 Answers

Just call factor() on the column again:

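#sample data (stringsAsFactors = TRUE makes platform a factor; R >= 4.0 defaults to character)
mydata = data.frame( platform = c('service','vps','dedic','dedic','vps','service','dedic'), stringsAsFactors = TRUE )

#subset (column names can be used directly inside subset)
mydata = subset(mydata, platform != 'service' )

#factor the data again to drop the now-unused 'service' level
mydata$platform = factor(mydata$platform)

#check plot
plot(mydata)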

The initial data had three factor levels. Subsetting removes rows but keeps the full set of levels, so to make summaries and plots use only the levels that remain, re-run factor() on the column.

The new data will have only two factor levels, as desired.

> summary(mydata)
  platform
 dedic:3  
 vps  :2  
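As an alternative to calling `factor()` on each column, base R also provides `droplevels()`, which removes unused levels from every factor column of a data frame in one call. The sketch below reuses the sample data from this answer; `stringsAsFactors = TRUE` is an assumption added so that the column is a factor under R 4.0 and later.

```r
# Sample data from the answer, forced to a factor column
mydata <- data.frame(
  platform = c("service", "vps", "dedic", "dedic", "vps", "service", "dedic"),
  stringsAsFactors = TRUE
)

# Subset and drop the now-unused levels in one step
mydata <- droplevels(subset(mydata, platform != "service"))

remaining_levels <- levels(mydata$platform)
print(remaining_levels)
print(summary(mydata))
```

This can be convenient when a data frame has several factor columns, since there is no need to re-factor them one by one.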
astrosyam
  • Thanks, working fine. I was thinking that by selecting what I need I was dropping what I don't need, but it seems I have to drop what I don't need in order to pick what I need :) .. cheers – Zaza Feb 23 '16 at 04:12

I think this is what you need. If the original column was a factor, the subsetted column retains all of the original factor levels. Remove the unused ones by applying the factor function again.

Drop factor levels in a subsetted data frame

myloginid
  • The OP has already mentioned that "by checking the data, this is exactly what I'm getting". So the problem is not the subsetting. It is a little unclear, but with my mind-reading hat on, I think the OP wants to know how to drop the empty levels. – mathematical.coffee Feb 23 '16 at 03:37