
I am trying to load a .sas7bdat file into R, but I get the error:

ReadStat: Expected 849278526 rows in file, found 611874762. Failed to parse "filepath": File did not contain the expected number of rows.

What does this error message mean?

My code is:

dataset <- read_sas("datapath", NULL)
This works for one dataset but not for the other (larger) one.

The loaded packages are: fst, tidyverse, dplyr, data.table, haven.

Elke
  • How many observations did you expect that SAS dataset to contain? Are you sure that the file is not truncated? Can you read it with SAS? – Tom Jan 09 '19 at 13:50
  • I have no idea about how many observations to expect. How can I check that the file is not truncated? Also, I cannot open it with SAS because I don't have SAS. – Elke Jan 09 '19 at 14:03
  • If you cannot access it yourself with SAS, then you should ask the creator of the dataset how many observations it should have. Did they send you any documentation? Even better, a copy of the SAS log from the job that created the dataset? – Tom Jan 09 '19 at 14:16
  • They did not send any documentation. Do I need to know how many observations it has in order to load it into R? The data is in long format. – Elke Jan 09 '19 at 14:20
  • You don't need to know to load. You need to know to help figure out what might be the problem. Most likely the file was truncated somehow and that is why you are getting that message. Although there is a chance that the `read_sas` routine just doesn't understand your particular file. The format for `sas7bdat` files is not public and that program was created by someone trying to figure out the structure. – Tom Jan 09 '19 at 15:17
  • I asked the creator of the file, and the data should contain 849278526 observations. I only unzipped the file before trying to load it into R. – Elke Jan 09 '19 at 15:19
  • If by ZIP you mean a zip archive, then confirm that the uncompressed file is the same size as the directory inside the ZIP says it should be, to double-check your unzip process. Also have the sender test on their side that they sent you a working file. – Tom Jan 09 '19 at 15:40
  • Yes, I checked, and the uncompressed file is not the same size as the ZIP says it should be. Why does this happen? – Elke Jan 09 '19 at 15:46
  • So either the ZIP file was created wrong or you unzipped it wrong. Older versions of ZIP could not handle files larger than 2Gbytes. Perhaps when you unzipped it the target disk ran out of space and you didn't realize it. – Tom Jan 09 '19 at 15:49
  • The compressed file is already larger than 2GB. – Elke Jan 09 '19 at 15:51
  • SAS datasets typically compress by a factor of 20 or so. You are talking about an uncompressed SAS dataset on the order of 50 to 100 GB. – Tom Jan 09 '19 at 15:55
  • The uncompressed dataset is around 20 GB. – Elke Jan 09 '19 at 16:09
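Tom's suggestion to compare the extracted file with the size recorded in the ZIP can be done from R. This is a minimal sketch, assuming the archive was extracted into a single directory with its internal paths preserved; `compare_sizes` and `check_unzipped_sizes` are hypothetical helper names, not part of any package. `utils::unzip(list = TRUE)` only reads the archive's directory and returns `Name` and `Length` (uncompressed bytes) columns.

```r
# Hypothetical helper: compare the sizes listed in a ZIP directory with the
# sizes of the files actually present in the extraction directory.
compare_sizes <- function(listing, exdir) {
  # file.size() returns NA for a file that does not exist
  actual <- file.size(file.path(exdir, listing$Name))
  data.frame(
    name     = listing$Name,
    expected = listing$Length,
    actual   = actual,
    ok       = !is.na(actual) & actual == listing$Length,
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: unzip(list = TRUE) lists the entries without extracting.
check_unzipped_sizes <- function(zipfile, exdir) {
  listing <- utils::unzip(zipfile, list = TRUE)
  compare_sizes(listing, exdir)
}
```

For example, `check_unzipped_sizes("data.zip", "unzipped")` (both paths hypothetical) returns one row per archive entry; a row with `ok` equal to `FALSE` was extracted short or is missing. If the extraction itself was done from R, note that, as far as I know, the `?unzip` help page warns that R's internal unzip may truncate entries of 4 GB or more, so for a 20 GB file an external archive tool is the safer choice, and its own listing is worth cross-checking.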

1 Answer


Most likely the file is truncated for some reason.

There is a small possibility that `read_sas` does not know how to handle your specific file, but in that case it would most likely fail before reading any rows at all, rather than after reading 611874762 of them.
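One way to probe that reasoning is to ask `read_sas` for only the first rows. This is a minimal sketch, assuming a haven release whose `read_sas()` accepts the `n_max` argument (as far as I know, newer than the versions available when this was asked); `try_read_sample` is a hypothetical helper name. If a small sample reads cleanly while the full read fails with the row-count error, that is consistent with a short file rather than an unsupported format, although I have not verified exactly how ReadStat's row-count check behaves when a row limit is set.

```r
# Hypothetical helper: try to read only the first `n` rows of a SAS file.
# Assumes a haven version whose read_sas() accepts the n_max argument.
# Returns a data frame on success, or the error message (a string) on failure.
try_read_sample <- function(path, n = 1000) {
  tryCatch(
    haven::read_sas(path, n_max = n),
    error = function(e) conditionMessage(e)
  )
}

# "datapath" is the placeholder path from the question.
sample_rows <- try_read_sample("datapath")
if (is.character(sample_rows)) {
  message("Could not parse even a sample: ", sample_rows)
} else {
  str(sample_rows)
}
```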

Tom