
The University of Cape Town makes data available through its DataFirst Portal.

All their data is made available in the following formats:

  1. SAS (sas7bdat)
  2. SPSS
  3. Stata (12)

I would like to import a dataset into R using the haven package, which supports all of the above formats (it uses the ReadStat library).

Which would be the preferred format for doing this?

More specifically:

  1. Are there differences in terms of data available in the original formats?
  2. Are some formats closer to R's format than others, and does this affect the output?
  3. Are there differences in terms of speed? (less important)
Bastiaan Quast

1 Answer


The best way to transfer data between different systems is .csv, as it can be read by all systems without much hassle.

As you only have access to the other formats, there shouldn't be too much difference (given that haven works with all of them).

As to your questions:

I am not aware of any differences in data availability or format compatibility. However, if you want to speed things up, you should probably look into data.table and its fread function (it replaces read.table for delimited text, so it does not support the formats mentioned above).

You can read the data like this:

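library(haven)
# Replace the placeholder paths with the locations of your own files.
dat <- read_sas("link_to_sas_file")     # SAS (.sas7bdat)
dat <- read_spss("link_to_spss_file")   # SPSS (.sav or .por)
dat <- read_stata("link_to_stata_file") # Stata (.dta)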
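
Whichever format you pick, haven stores value-labelled variables as labelled vectors rather than base R factors, so a conversion step is usually needed afterwards. The following is only a minimal sketch: the vector `sex` and its labels are made up for illustration and stand in for a column read from one of the files above.

```r
library(haven)

# Hypothetical labelled column, built by hand in place of a real imported one.
sex <- labelled(c(1, 2, 1), labels = c(male = 1, female = 2))

# as_factor() replaces the numeric codes with their value labels.
sex_factor <- as_factor(sex)

levels(sex_factor)
```

Calling `as_factor()` on a whole data frame converts every labelled column in one go and leaves the other columns untouched.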
David
  • Thanks. Going to CSV could be a solution, but I think that might sometimes be problematic with factors. Also, I don't typically have a copy of Stata or SAS on my computer. I could use [PSPP](https://www.gnu.org/software/pspp/) to convert the SPSS file to a CSV file, but importing directly would be easier. Regarding speed, I'm not particularly interested in it myself: it is not a repeated operation, and I would therefore rather stick with the base data structures (hence the sub-question about which format is closer to R's). I added it so that possible answers can serve as a reference for others. – Bastiaan Quast Nov 13 '15 at 07:48