
I have imported SAS data into R, using rio:

    library(rio)
    r <- import("S:/MyFolder/MyData.sas7bdat", catalog_file = "S:/MyFolder/formatsforr.sas7bcat")

That gives me r, which has a column r$Race (storing the atomic values 1, 2, 3, and 4) with an attribute that seems to store my Race format (0="All", 1="White", ..., 8="Unknown", ...). I want to convert r$Race to a factor using those attributes, and I want to do this for many columns.

If I had used haven, I could have done this:

    library(haven)
    h <- read_sas("S:/MyFolder/MyData.sas7bdat", "S:/MyFolder/formatsforr.sas7bcat")
    # as_factor is a haven function that converts the column to a factor,
    # using the format to label the factor values.
    h$Race <- as_factor(h$Race)

But as_factor fails with r (with the object that rio created).

I'm hoping to use only rio, which we like more than haven for other reasons. I am trying to create simple examples for other coders in my health department to use, and I would like to minimize the number of packages they need to learn and load, so we can focus our learning.

In case this helps: as_factor(r$Race) returns this:

Error in UseMethod("as_factor") : no applicable method for 'as_factor' applied to an object of class "c('double', 'numeric')"

str(r$Race) returns:

     atomic [1:7776] 1 2 3 4 1 2 3 4 1 2 ...
     - attr(*, "label")= chr "Race (1=W,2=B,3=NatAm/AKNat,4=Asian/PacIs)"
     - attr(*, "format.sas")= chr "POPEST199XRACE"
     - attr(*, "labels")= Named num [1:10] 0 1 2 3 4 0 1 2 3 4
      ..- attr(*, "names")= chr [1:10] "Total" "White" "Black" "American Indian or Alaska Native" ...

str(h$Race) returns:

    Class 'labelled'  atomic [1:7776] 1 2 3 4 1 2 3 4 1 2 ...
      ..- attr(*, "label")= chr "Race (1=W,2=B,3=NatAm/AKNat,4=Asian/PacIs)"
      ..- attr(*, "format.sas")= chr "POPEST199XRACE"
      ..- attr(*, "labels")= Named num [1:10] 0 1 2 3 4 0 1 2 3 4
      .. ..- attr(*, "names")= chr [1:10] "Total" "White" "Black" "American Indian or Alaska Native" ...

Right after running the import(...), running:

    dput(head(r$Race))  # returns this:
    c(1, 2, 3, 4, 1, 2)

Right after running the read_sas(...), running:

    dput(head(h$Race))  # returns this:
    structure(c(1, 2, 3, 4, 1, 2),
              labels = structure(c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
                                 .Names = c("Total", "White", "Black",
                                            "American Indian or Alaska Native",
                                            "Asian or Pacific Islander",
                                            "Total", "White", "Black",
                                            "American Indian or Alaska Native",
                                            "Asian or Pacific Islander")),
              class = "labelled")

I did not find an online .sas7bcat file. I could post SAS code to create it and the data file.
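
A possible explanation for the difference between the `str()` and `dput(head(...))` results, offered as a sketch rather than a confirmed diagnosis: `head()` subsets with `[`, and base R drops most attributes when an unclassed vector is subset. The vector `race` below is a hypothetical stand-in for `r$Race`, built on the assumption (suggested by the `str()` output) that the rio column is a plain numeric vector with a `labels` attribute and no class. The `labelled` class that haven attaches presumably has its own subsetting method that keeps the attributes, which would explain why `dput(head(h$Race))` still shows them.

```r
# Hypothetical stand-in for r$Race: a plain numeric vector with a
# "labels" attribute but no class (an assumption based on str(r$Race)).
race <- c(1, 2, 3, 4, 1, 2)
attr(race, "labels") <- c(Total = 0, White = 1, Black = 2)

# The full vector still carries the attribute.
has_labels_full <- !is.null(attr(race, "labels"))

# head() subsets with `[`, which keeps only names, dim and dimnames
# on an unclassed vector, so "labels" is gone from the result.
has_labels_head <- !is.null(attr(head(race), "labels"))

print(c(full = has_labels_full, head = has_labels_head))
```

If that is what is happening, `dput(r$Race)` or `attributes(r$Race)` on the whole column should still show the labels.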

IRTFM
user3799203
  • If you would post the output from `dput(head(r$Race))` a complete solution might be possible. As it is you've lost much of the information in those attributes by using `str`. It would probably be better if you substituted your file input code (and the dput output) with code that uses an easy to access `.sas7bcat` file, perhaps from a govt website or from a CRAN-hosted package? – IRTFM Mar 28 '18 at 23:44
  • Right after running the `import(...)`, running `dput(head(r$Race))` returns this: **c(1, 2, 3, 4, 1, 2)**. Right after running the `read_sas(...)`, running `dput(head(h$Race))` returns this: **structure(c(1, 2, 3, 4, 1, 2), labels = structure(c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4), .Names = c("Total", "White", "Black", "American Indian or Alaska Native", "Asian or Pacific Islander", "Total", "White", "Black", "American Indian or Alaska Native", "Asian or Pacific Islander")), class = "labelled")** I did not find an online `.sas7bcat` file. I could post SAS code to create it and the data file. – user3799203 Mar 29 '18 at 14:25
  • I moved the contents of your comment to the body of the question where it should have been put in the first place. Learn how to [edit] SO question bodies. The results of your dput effort raise questions in my mind about the accuracy of your earlier posting of results of `str(r$Race)`. It should not be possible to have `str` display all those attributes if `dput` cannot find them. Are you saying there are no examples of sas7bcat files in the documentation of haven? – IRTFM Mar 30 '18 at 16:27
  • After reviewing the documentation for rio, it appears to me that the `rio`-package actually uses `haven::read_sas` for sas7bcat files, so this question is again not making much sense to me. – IRTFM Mar 30 '18 at 16:38
  • I lack the expertise to explain why str() was more informative than dput(). I'm just showing what I get. – user3799203 Apr 01 '18 at 19:29
  • I am not saying that there are no examples of sas7bcat files in the documentation of haven; I don't know how to figure that out. I agree it is strange that rio, which uses haven, creates a different object with its `import()` function than haven does with `read_sas()`. But that is how it is. – user3799203 Apr 01 '18 at 19:35
  • Well, I cannot understand how `dput` fails to print out the attributes when `str` can obviously see them. Since it is not a situation I've ever seen before, I'm suspicious that it is user error... yours. Sorry, that's just the perspective of a longtime R user. The two str's looked pretty similar and I wasn't necessarily expecting any difference. I'm guessing that an answer that worked for the output of `haven` should also work with the output from `rio`. – IRTFM Apr 01 '18 at 20:13
  • 42-: The code I ran is above. Show me the user error. Or try rio's import and haven's read_sas yourself, and see if you get identical objects. – user3799203 Apr 02 '18 at 15:05
  • `rio::factorize(r)` will convert all variables with attributes to factors based upon the labels attributes. `rio::characterize(r)` will give you character strings instead. – Thomas Jul 10 '18 at 21:26
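
To make concrete what a labels-based conversion involves, here is a minimal base-R sketch. It is not rio's or haven's implementation. The vector `race` is a hypothetical stand-in for `r$Race` that reuses the duplicated labels shown in the question, and `labels_to_factor()` is a made-up helper name. The one non-obvious step is dropping the repeated label values, because `factor()` refuses duplicated levels.

```r
# Hypothetical stand-in for r$Race, including the duplicated labels
# that appear in the question's str() and dput() output.
race <- c(1, 2, 3, 4, 1, 2)
attr(race, "labels") <- c(
  Total = 0, White = 1, Black = 2,
  "American Indian or Alaska Native" = 3, "Asian or Pacific Islander" = 4,
  Total = 0, White = 1, Black = 2,
  "American Indian or Alaska Native" = 3, "Asian or Pacific Islander" = 4
)

# labels_to_factor() is a hypothetical helper, not part of rio or haven.
labels_to_factor <- function(x) {
  labs <- attr(x, "labels")
  if (is.null(labs)) return(x)        # no value labels: return unchanged
  labs <- labs[!duplicated(labs)]     # factor() rejects duplicated levels
  # as.vector() strips the attributes so only the codes are matched
  factor(as.vector(x), levels = unname(labs), labels = names(labs))
}

race_f <- labels_to_factor(race)
print(levels(race_f))
print(as.character(race_f))
```

Codes with no matching label become `NA` under this approach, so it is worth checking `anyNA()` on the result when the format does not cover every value.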

1 Answer


I gave up trying to use rio, and used haven instead. When import() is invoked with catalog_file =, rio can capture SAS format information in the attributes of the created R object, but I could not figure out how to make use of them. Looking into haven's as_factor function, I saw the somewhat complex logic haven uses to leverage similar attributes (created via haven's read_sas() function). Seeing its complexity, I did not try to imitate it for the attributes of the rio import object.
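
For anyone who does want to stay with one package, the many-columns case from the question can be sketched in base R without reproducing haven's logic. Everything here is an assumption for illustration: the data frame `d` is made up, the `Sex` column and its labels do not come from the original data, and `to_factor()` is a hypothetical helper that only handles the simple case of a numeric column with a `labels` attribute.

```r
# Hypothetical data frame standing in for the imported object:
# two columns with a "labels" attribute and one plain numeric column.
d <- data.frame(Race = c(1, 2, 3), Sex = c(1, 2, 1), Pop = c(100, 250, 75))
attr(d$Race, "labels") <- c(White = 1, Black = 2,
                            "American Indian or Alaska Native" = 3)
attr(d$Sex, "labels") <- c(Male = 1, Female = 2)  # invented for this sketch

# to_factor() is a hypothetical helper, not a rio or haven function.
to_factor <- function(x) {
  labs <- attr(x, "labels")
  if (is.null(labs)) return(x)        # unlabelled columns pass through
  labs <- labs[!duplicated(labs)]     # factor() rejects duplicated levels
  factor(as.vector(x), levels = unname(labs), labels = names(labs))
}

# d[] <- keeps d a data frame while replacing every column in place.
d[] <- lapply(d, to_factor)
str(d)
```

Columns without a `labels` attribute, such as `Pop`, are left as they were.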

user3799203