I am trying to create some descriptive statistics and histograms out of ordered variables (range 0 to 10). I used the following commands:

class(data$var1)
describe(as.numeric(data$var1))

But R starts from 1 and counts the "refusal" values as a further numeric value.

How can I let R start from 0 and ignore the "refusal" values?

Thank you.

Edit: I was able to make R ignore the "Refusal" values using the following command:

is.na(data$var1[data$var1 == "Refusal"]) <- TRUE

But when I search for a possible solution for the 0 values, I only find suggestions on how to ignore or remove 0 values...

Edit2: This is a sample of my data,

 [1] 5       8       8       8       Refusal 10      8       Refusal 7      
  [10] 7       8       7       8       8       8       8       8       8      
  [19] 8       0       9       Refusal 6       10      7       7       9

As you can see, the range is from 0 to 10, but with the R library "psych" and the command "describe" the reported range is always 1 to 11, which invalidates all the statistics.

> class(data$var1)
[1] "factor"
> describe(as.numeric(data$var1), na.rm=TRUE)
  vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
1    1 1115 8.38 1.94      9    8.57 1.48   1  11    10 -1.06     1.42 0.06

Sorry for the ongoing editing, but I am new to stackoverflow.com

Crescenzo
  • Sorry, I missed a bracket: class(data$var1) describe(as.numeric(data$var1)) – Crescenzo Mar 25 '15 at 10:31
  • Thank you, docendo discimus. :-) – Crescenzo Mar 25 '15 at 10:36
  • It's not very reproducible, is it? – David Arenburg Mar 25 '15 at 10:40
  • It's unclear what you're asking. Please read [about how to ask a good question](http://stackoverflow.com/help/how-to-ask) and [how to provide a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Mar 25 '15 at 10:43
  • What is the output of the code that you've included? If the first line is `"factor"` you could try `describe(as.numeric(as.character(data$var1)))` so that conversion from factor to number works as you expect. – Miff Mar 25 '15 at 10:44
  • This is the output after I removed the "refusal" values: > class(dataset$MCZ_1) [1] "factor" > describe(as.numeric(dataset$MCZ_1), na.rm=TRUE) vars n mean sd median trimmed mad min max range skew kurtosis se 1 1 1115 8.38 1.94 9 8.57 1.48 1 11 10 -1.06 1.42 0.06 So the "refusal" values have been successfully removed, but I still get a range from 1 to 11, even though the original one is from 0 to 10. – Crescenzo Mar 25 '15 at 10:51
  • @Crescenzo - check my code again. it converts to character, then to numeric. Does this solve your problem? – Miff Mar 25 '15 at 10:58
  • Thank you, Miff. I tried both your code and the one I've just modified and mine is working now with regards to the character values. But I still have issue with the range of values that the output statistics wrongly considers starting from 1 rather than 0. – Crescenzo Mar 25 '15 at 11:05
  • Please provide a reproducible example and expected result. – Roman Luštrik Mar 25 '15 at 11:29

1 Answer

Have a look at how factors work, with ?factor, or looking at the example question here. In essence, each level is given an integer code starting at 1, so the codes end at 11 if you have 11 unique values. Converting a factor to numeric returns these codes rather than the numbers the labels represent. To get the original numbers back, first convert to character, then to numeric. See the difference between these code snippets:

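#create data
# Assumption: the output format below looks like Hmisc::describe, not psych::describe
library(Hmisc)
set.seed(0)
a <- factor(sample(c(0:10, "refusal"), 50, TRUE)) #Some dummy data
class(a)
# [1] "factor"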

snippet 1 - how you're doing it

describe(as.numeric(a),na.rm=TRUE)
#as.numeric(a)
#      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
#     50       0      11    6.28    2.00    2.00    4.00    6.00    8.75   10.00   11.00
#
#          1  2  3 4 5  6  7  8 9 10 11
#Frequency 2  5  5 4 2  8  6  5 3  6  4
#%         4 10 10 8 4 16 12 10 6 12  8
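To see where the 1 to 11 in snippet 1 comes from, it may help to look at the levels and the integer codes directly. This is only a sketch on a small hand-made factor (`f` is a hypothetical stand-in, not the asker's data), using base R only. The level order shown assumes the usual text sorting, where digits come before letters.

```r
# Hypothetical toy factor, not the asker's data
f <- factor(c("5", "8", "Refusal", "10", "0", "8"))

# Levels are the distinct labels, sorted as text (so "10" comes before "5")
levels(f)        # "0" "10" "5" "8" "Refusal"

# as.integer()/as.numeric() on a factor return positions in levels(f)
codes <- as.integer(f)   # 3 4 5 2 1 4

# as.character() returns the labels themselves
labels_chr <- as.character(f)
```

Note that the codes are not simply the scores shifted by one: here "10" gets code 2 and "Refusal" gets code 5, so the mean and the other statistics computed on the codes are wrong as well, not just the range.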

snippet 2 - correct way

describe(as.numeric(as.character(a)),na.rm=TRUE)
#as.numeric(as.character(a))
#      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
#     46       4      10   5.304     1.0     1.0     3.0     5.0     8.0     9.5    10.0
#
#          0  1 2 3  4  5  7 8  9 10
#Frequency 2  5 4 2  8  6  5 3  6  5
#%         4 11 9 4 17 13 11 7 13 11
#Warning message:
#  In describe(as.numeric(as.character(a)), na.rm = TRUE) :
#  NAs introduced by coercion

Note the difference in range (even if my describe function isn't the same as yours). The warning refers to the "refusal" values, which are converted to NAs because they don't represent a number.
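If the coercion warning is unwanted, one option is to mark the refusals as missing before converting. The following is a minimal base R sketch under stated assumptions: `f` is a hypothetical toy factor standing in for `data$var1`, and the refusal label is assumed to be spelled exactly "Refusal".

```r
# Hypothetical toy factor standing in for data$var1
f <- factor(c("5", "8", "Refusal", "10", "0", "8"))

# Work on the labels, and set refusals to NA explicitly
x_chr <- as.character(f)
x_chr[x_chr == "Refusal"] <- NA

# No warning here: the only non-numeric entries are already NA
x <- as.numeric(x_chr)

range(x, na.rm = TRUE)   # 0 10
mean(x, na.rm = TRUE)    # 6.2

# Counts on the full 0-10 scale, including scores nobody chose
counts <- table(factor(x, levels = 0:10))
```

The resulting `x` can then be passed to `describe()` or `hist()`, and `counts` keeps a zero entry for every score on the scale that does not occur in the data.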

Miff
  • Thank you, Miff. There was a bracket missing (again, my fault) and that's why your code didn't work properly the first time. Thank you also for your explanation, now I can see where I was wrong. I hope I can be as helpful as you in the future when R will be less "problematic". :-) – Crescenzo Mar 25 '15 at 11:34