
I would like to be able to write data directly to a bucket in AWS S3 from a data.frame/data.table object as a CSV file, without writing it to disk first, using the AWS CLI.

obj.to.write.s3 <- data.frame(cbind(x1=rnorm(1e6),x2=rnorm(1e6,5,10),x3=rnorm(1e6,20,1)))

At the moment I write to CSV first, then upload to an existing bucket, then remove the file using:

fn <- 'new-file-name.csv'
write.csv(obj.to.write.s3,file=fn)
system(paste0('aws s3 cp ',fn,' s3://my-bucket-name/',fn))
system(paste0('rm ',fn))

Is there a function that writes directly to S3? Is that possible?

h.l.m

3 Answers


In aws.s3 0.2.2 the s3write_using() (and s3read_using()) functions were added.

They make things much simpler:

s3write_using(iris, FUN = write.csv,
                    bucket = "bucketname",
                    object = "objectname")
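The companion `s3read_using()` works the same way in reverse; a minimal sketch, assuming the same bucket and object names as above:

```r
library(aws.s3)

# read the object straight back into a data.frame
df <- s3read_using(FUN = read.csv,
                   bucket = "bucketname",
                   object = "objectname")
```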
dfrankow
leerssej
  • This is a nice one; you can also use this function to easily save parquet files: `s3write_using(iris, FUN = arrow::write_parquet, bucket = "bucketname", object = "objectname")` – David Arenburg Sep 22 '19 at 14:07
  • what does the objectname refer to? The folder in the S3 bucket? – nak5120 Sep 28 '19 at 21:47
  • @nak5120 there isn't really such a thing as a "folder" in S3 (look it up in Google); what you think of as a folder is actually part of the object's name, and you should include it in `objectname` – RiskyMaor Sep 10 '20 at 23:01
  • It's worth noting that using `s3write_using` does make things simpler, but also writes the file to your local disk before placing it in S3. – RiskyMaor Sep 11 '20 at 02:47

The easiest solution is just to save the .csv in a tempfile(), which will be purged automatically when you close your R session.
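That approach might look like this (a sketch, assuming the aws.s3 package is installed and AWS credentials are configured):

```r
library(aws.s3)

# write the csv to a temporary file; tempfile() lives in the session's
# temp directory, which R removes when the session ends
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

# upload the temporary file to S3
put_object(file = tmp, bucket = "bucketname", object = "iris.csv")
```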

If you need to work entirely in memory, you can instead `write.csv()` to a rawConnection:

# write to an in-memory raw connection
zz <- rawConnection(raw(0), "r+")
write.csv(iris, zz)

# upload the object to S3
aws.s3::put_object(file = rawConnectionValue(zz),
    bucket = "bucketname", object = "iris.csv")

# close the connection
close(zz)

In case you're unsure, you can then check that this worked correctly by downloading the object from S3 and reading it back into R:

# check that it worked
## (option 1: save locally)
save_object(object = "iris.csv", bucket = "bucketname", file = "iris.csv")
read.csv("iris.csv")
## (option 2: keep in memory)
read.csv(text = rawToChar(get_object(object = "iris.csv", bucket = "bucketname")))
Thomas

Sure -- but 'saving to file' requires that your OS sees the desired target directory as an accessible filesystem. So in essence you "just" need to mount S3. Here is a quick Google search for that topic.

An alternative is writing to a temporary file, and then using whatever you use to transfer files. You could code up both operations as a simple helper function.
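For example, a hypothetical helper that combines the two steps, assuming the aws.s3 package handles the transfer:

```r
library(aws.s3)

# hypothetical helper: write a data.frame to a temp csv, upload, clean up
s3_write_csv <- function(df, bucket, object) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))  # remove the temp file even if the upload fails
  write.csv(df, tmp, row.names = FALSE)
  put_object(file = tmp, bucket = bucket, object = object)
}

s3_write_csv(obj.to.write.s3, "my-bucket-name", "new-file-name.csv")
```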

Dirk Eddelbuettel