
I would like to be able to write data directly to a bucket in AWS S3 from a data.frame/data.table object as a CSV file, without writing it to disk first, using the AWS CLI.

obj.to.write.s3 <- data.frame(cbind(x1=rnorm(1e6),x2=rnorm(1e6,5,10),x3=rnorm(1e6,20,1)))

At the moment I write to CSV first, then upload to an existing bucket, then remove the file using:

fn <- 'new-file-name.csv'
write.csv(obj.to.write.s3,file=fn)
system(paste0('aws s3 cp ',fn,' s3://my-bucket-name/',fn))
system(paste0('rm ',fn))

Is there a function that writes directly to S3? Is that possible?

h.l.m

3 Answers


In aws.s3 0.2.2 the s3write_using() (and s3read_using()) functions were added.

They make things much simpler:

s3write_using(iris, FUN = write.csv,
                    bucket = "bucketname",
                    object = "objectname")
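The companion `s3read_using()` works the same way in reverse; a minimal sketch, assuming the same bucket and object names as above:

```r
library(aws.s3)

# read the object straight back into a data.frame
df <- s3read_using(FUN = read.csv,
                   bucket = "bucketname",
                   object = "objectname")
```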
dfrankow
leerssej
  • This is a nice one; you can also use this function to easily save parquet files: `s3write_using(iris, FUN = arrow::write_parquet, bucket = "bucketname", object = "objectname")` – David Arenburg Sep 22 '19 at 14:07
  • what does the objectname refer to? The folder in the S3 bucket? – nak5120 Sep 28 '19 at 21:47
  • @nak5120 there isn't really such a thing as a "folder" in S3 (look it up in Google); what you think of as a folder is actually part of the object's name, and you should include it in `objectname` – RiskyMaor Sep 10 '20 at 23:01
  • It's worth noting that using `s3write_using` does make things simpler, but also writes the file to your local disk before placing it in S3. – RiskyMaor Sep 11 '20 at 02:47

The easiest solution is just to save the .csv in a tempfile(), which will be purged automatically when you close your R session.
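That approach might look like this (a sketch, assuming the aws.s3 package is installed and AWS credentials are configured):

```r
library(aws.s3)

# write the csv to a temporary file; tempfile() lives in the session's
# temp directory, which R removes when the session ends
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

# upload the temporary file to S3
put_object(file = tmp, bucket = "bucketname", object = "iris.csv")
```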

If you need to work entirely in memory, you can instead `write.csv()` to a rawConnection:

# write to an in-memory raw connection
zz <- rawConnection(raw(0), "r+")
write.csv(iris, zz)

# upload the object to S3
aws.s3::put_object(file = rawConnectionValue(zz),
    bucket = "bucketname", object = "iris.csv")

# close the connection
close(zz)

In case you're unsure, you can then check that this worked correctly by downloading the object from S3 and reading it back into R:

# check that it worked
## (option 1: save locally)
save_object(object = "iris.csv", bucket = "bucketname", file = "iris.csv")
read.csv("iris.csv")
## (option 2: keep in memory)
read.csv(text = rawToChar(get_object(object = "iris.csv", bucket = "bucketname")))
Thomas

Sure -- but 'saving to file' requires that your OS sees the desired target directory as an accessible filesystem. So in essence you "just" need to mount S3. Here is a quick Google search for that topic.

An alternative is writing to a temporary file, and then using whatever you use to transfer files. You could code up both operations as a simple helper function.
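For example, a hypothetical helper that combines the two steps, assuming the aws.s3 package handles the transfer:

```r
library(aws.s3)

# hypothetical helper: write a data.frame to a temp csv, upload, clean up
s3_write_csv <- function(df, bucket, object) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))  # remove the temp file even if the upload fails
  write.csv(df, tmp, row.names = FALSE)
  put_object(file = tmp, bucket = bucket, object = object)
}

s3_write_csv(obj.to.write.s3, "my-bucket-name", "new-file-name.csv")
```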

Dirk Eddelbuettel