
I have a big data problem that I want to distribute over say 20 EC2 instances. My data set is produced locally, and I want to slice it for distribution across all of my EC2 instances. I don't quite understand the differences between block vs file vs object storage, but to me it seems that being able to mount the EFS on all EC2 instances would be more performant than copying data from S3 to individual instances. Is this assumption correct, and if so, is there a way to upload data to EFS without using the DataSync system provided by Amazon?

Jon Martin
  • Why do you want to copy the data instead of accessing it directly from S3? EFS is low latency and region-aware, while S3 can be accessed from anywhere but with higher latency. – Kannaiyan May 10 '19 at 00:09
  • You may consider using AWS's EMR service with one of the Hadoop technologies to manage your distributed processing instead of doing this yourself with EC2 instances. With EMR you can keep your data in S3 and have a cluster of machines process it in parallel. – JD D May 10 '19 at 02:59

2 Answers


It depends on your specific use case and software, but here are some basic guidelines:

  • S3 is object storage. Data on S3 is served to your machines over HTTP(S).
  • EFS is file system storage, accessed over the NFSv4 protocol.

EFS is much more expensive than S3 if all you need is to write data to it and read it back.

A comparison has already been made on Stack Overflow: "AWS EFS vs EBS vs S3 (differences & when to use?)"

qkhanhpro

S3 is like a web server. You upload files to it and download files from it, but you can't modify a file directly on the server. You have to download it, then modify, then put it back.
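
As a rough illustration of that download, modify, upload cycle, here is a minimal Python sketch. It assumes `s3` is a boto3 S3 client (for example one created with `boto3.client("s3")`); the function name, bucket, key, and `transform` callback are hypothetical and only there for illustration.

```python
from typing import Callable


def rewrite_object(s3, bucket: str, key: str,
                   transform: Callable[[bytes], bytes]) -> int:
    """Download an object, modify it locally, and upload the result.

    `s3` is assumed to be a boto3 S3 client. S3 has no in-place edit,
    so the whole object is fetched and then replaced.
    """
    # Fetch the full object body into memory (reasonable for small objects only).
    original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    # Apply the caller-supplied change locally.
    updated = transform(original)
    # Overwrite the object under the same key with the new bytes.
    s3.put_object(Bucket=bucket, Key=key, Body=updated)
    return len(updated)
```

Even a one-byte change costs a full download and a full upload here, which is the practical difference from editing a file on a mounted file system.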

EFS, which is NFSv4, is like a disk. You can edit files directly. It's also significantly more expensive than S3. To upload files to EFS, you mount it on an EC2 instance like a normal disk.
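
Once the file system is mounted, ordinary file operations are enough to populate it. The sketch below is one possible way to do that from an instance where EFS is already mounted, splitting the files round-robin into one directory per worker. The mount point (something like `/mnt/efs`), the `worker-NN` directory naming, and the function name are assumptions made for this example, not anything EFS requires.

```python
import shutil
from pathlib import Path


def copy_slices(files, mount_point, num_workers):
    """Copy local files into per-worker directories on a mounted file system.

    `mount_point` is assumed to be a directory where EFS is already
    mounted; files are assigned to workers round-robin.
    """
    mount = Path(mount_point)
    placed = []
    for index, src in enumerate(sorted(Path(f) for f in files)):
        # Hypothetical layout: worker-00, worker-01, ... under the mount point.
        worker_dir = mount / f"worker-{index % num_workers:02d}"
        worker_dir.mkdir(parents=True, exist_ok=True)
        # A plain file copy; nothing EFS-specific is needed once it is mounted.
        placed.append(Path(shutil.copy2(src, worker_dir / src.name)))
    return placed
```

With 20 instances you would call it with `num_workers=20` and have each instance read only its own directory.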

That said, it sounds like the correct answer for what you're trying to do is to use EMR, like JD D suggested.

bgdnlp