What's the fastest way to upload multiple local CSV files to AWS RDS? I have 100 GB of data in thousands of CSV files sitting on a local machine.
What DB engine is RDS running? MySQL? Postgres? – talentedmrjones Mar 21 '17 at 17:56
It's postgres. Thanks! – notObvious Mar 21 '17 at 18:11
2 Answers
I see 2 ways of achieving that:
1) Create a local Postgres database and parse your CSV files to insert the data into it. Then pg_dump your database's content to a local .sql file. You can then upload this .sql file to S3. Once the dump file is in S3, spin up an EC2 instance that will psql the dump file into your Postgres RDS instance (a rough command sketch follows after this answer).
2) Use s3 sync to copy your CSV files to an S3 bucket. Next, spin up an EC2 instance that will parse your CSV rows one by one and insert the data into your RDS instance.
In both cases, you want to insert the data into your RDS instance from an EC2 instance, to minimize latency. If you were to insert data into your RDS instance from your local machine, it would take a very long time. Communication between an EC2 instance and an RDS instance is much faster, due to network proximity inside the AWS datacenter.
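For illustration, the commands might look roughly like the sketch below; the bucket name, database names, file names, and RDS endpoint are made-up placeholders, not values from the question:

    # Option 1, on the local machine: after loading the CSVs into a local
    # Postgres database, dump it and push the dump to S3.
    pg_dump local_db > dump.sql
    aws s3 cp dump.sql s3://my-bucket/dump.sql

    # Option 1, on the EC2 instance: pull the dump down and replay it against RDS.
    aws s3 cp s3://my-bucket/dump.sql .
    psql -h <rds-endpoint> -U <admin user> -d target_db -f dump.sql

    # Option 2, local-to-S3 step: sync the raw CSV files to a bucket,
    # then load them from the EC2 instance.
    aws s3 sync /path/to/csv-dir s3://my-bucket/csv/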
To expand on @spg's answer: He is right that EC2 to RDS will be MUCH faster than local to RDS.
- So launch an EC2 instance and install the Postgres client (psql).
- Use scp to copy the CSV files directly to the EC2 instance.
- Use the psql \copy command to import items from CSV:

      psql target-db \
          -U <admin user> \
          -p <port> \
          -h <DB instance name> \
          -c "\copy source-table from '/path/to/source-table.csv' with DELIMITER ','"
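Since the question mentions thousands of CSV files, the same \copy command can be run in a loop once the files are on the EC2 instance. The paths and EC2 host below are hypothetical placeholders:

    # From the local machine: copy the whole CSV directory to the EC2 instance
    # (directory and host names are hypothetical).
    scp -r /local/csv-dir ec2-user@<ec2-public-dns>:/home/ec2-user/csv-dir

    # On the EC2 instance: append every CSV into the target table via \copy,
    # reusing the same connection flags as the command above.
    for f in /home/ec2-user/csv-dir/*.csv; do
        psql target-db \
            -U <admin user> \
            -p <port> \
            -h <DB instance name> \
            -c "\copy source-table from '$f' with DELIMITER ','"
    done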