
What's the fastest way to upload multiple local CSV files to an AWS RDS instance? I have 100 GB of data in thousands of CSV files sitting on a local machine.

notObvious

2 Answers


I see 2 ways of achieving that:

1) Create a local Postgres database and parse your CSV files to insert the data into it. Then pg_dump your database's contents to a local .sql file. You can then upload this .sql file to S3. Once the dump file is in S3, spin up an EC2 instance that will psql the dump file into your Postgres RDS instance (see the first sketch after this list).

2) Use s3 sync to copy your CSV files to an S3 bucket. Next, spin up an EC2 instance that will parse your CSV rows one by one and insert the data into your RDS instance (see the second sketch after this list).
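Minimal sketches of both approaches, assuming bash with psql and the AWS CLI installed. The database name, table name, bucket name, and local paths are placeholders for illustration, not values from the question; the angle-bracket placeholders stand for your own connection details.

    # Option 1 sketch: bulk-load each local CSV into a local Postgres database.
    for f in /data/csv/*.csv; do
        psql mydb -c "\copy mytable from '$f' with DELIMITER ','"
    done

    # Dump the local database to a plain .sql file and upload it to S3.
    pg_dump mydb > mydb.sql
    aws s3 cp mydb.sql s3://my-bucket/mydb.sql

    # On the EC2 instance: pull the dump from S3 and replay it against RDS.
    aws s3 cp s3://my-bucket/mydb.sql .
    psql -h <DB instance name> -U <admin user> -p <port> -d mydb -f mydb.sql

For the second approach, the sketch uses a per-file \copy from the EC2 instance rather than literal row-by-row INSERTs, since \copy streams a whole file at a time and is typically much faster:

    # Option 2 sketch: sync the CSVs to S3 from the local machine.
    aws s3 sync /data/csv/ s3://my-bucket/csv/

    # On the EC2 instance: pull the files down and load each one into RDS.
    aws s3 sync s3://my-bucket/csv/ ./csv/
    for f in ./csv/*.csv; do
        psql target-db \
            -U <admin user> \
            -p <port> \
            -h <DB instance name> \
            -c "\copy source-table from '$f' with DELIMITER ','"
    done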

In both cases, you want to insert the data into your RDS instance from an EC2 instance, to minimize latency. Inserting data into your RDS instance from your local machine would take a very long time. Communication between an EC2 instance and an RDS instance is much faster, due to network proximity inside the AWS data center.

spg

To expand on @spg's answer: he is right that EC2-to-RDS will be MUCH faster than local-to-RDS.

  1. So launch an EC2 instance and install the Postgres client (psql).
  2. Use scp to copy the CSV files directly to the EC2 instance.
  3. Use the psql \copy command to import items from CSV:

    psql target-db \
        -U <admin user> \
        -p <port> \
        -h <DB instance name> \
        -c "\copy source-table from '/path/to/source-table.csv' with DELIMITER ','"
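Since the question involves thousands of CSV files, steps 2 and 3 can be wrapped in loops. A minimal sketch, assuming the files sit under /data/csv on the local machine and the EC2 instance is reachable as <EC2 host> (both are placeholders):

    # Step 2: copy all local CSV files to the EC2 instance.
    scp /data/csv/*.csv ec2-user@<EC2 host>:/data/csv/

    # Step 3, run on the EC2 instance: import each file with \copy.
    for f in /data/csv/*.csv; do
        psql target-db \
            -U <admin user> \
            -p <port> \
            -h <DB instance name> \
            -c "\copy source-table from '$f' with DELIMITER ','"
    done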

See AWS Documentation

talentedmrjones