3

I'm saving a really large data.frame (30 million rows) to a PostgreSQL database from R and it kills my PC. Since this is the result of calculations produced by dplyr, I wouldn't mind using some built-in functionality of that package, but copy_to doesn't work for such huge tables. Any suggestions?

kismsu
  • 971
  • 5
  • 20

1 Answer

0

Can you write the data frame out to a CSV or tab-delimited text file, then load that into PostgreSQL with the COPY FROM command [1]? That is a bulk-load approach, which may perform much faster.
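For example, something along these lines (a rough sketch only; big_df, my_table, and the file path are placeholders, and the target table is assumed to already exist with matching columns):

# In R: dump the result to disk as CSV
# (data.table::fwrite is much faster than write.csv for 30 million rows, if that package is available)
write.csv(big_df, "/tmp/big_df.csv", row.names = FALSE, na = "")

Then load it from psql; \copy is the client-side form of COPY, so it works without access to the server's filesystem:

\copy my_table FROM '/tmp/big_df.csv' WITH CSV HEADER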

In some cases, it may be possible to use an Rscript to emit the data as a stream and pipe it directly into psql:

<Rscript output tab delimited rows> | psql -c "COPY <tablename> (columnlist, ...) FROM STDIN WITH (FORMAT text)"
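A minimal sketch of such a script (the names here are made up; it assumes the result can be read or rebuilt inside the script, that the column order matches the target table, and that no field contains embedded tabs or newlines):

#!/usr/bin/env Rscript
# stream_rows.R -- write the data frame to stdout as tab-delimited text
big_df <- readRDS("big_df.rds")    # or re-run the dplyr pipeline here
write.table(big_df, file = stdout(), sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, na = "\\N")

This then plugs in as the first stage of the pipeline above.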

In some long-running cases, I put | pv | in the middle of the pipeline to track progress (http://www.ivarch.com/programs/pv.shtml).
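For instance, reusing the illustrative script name from above:

Rscript stream_rows.R | pv | psql -c "COPY <tablename> (columnlist, ...) FROM STDIN WITH (FORMAT text)"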

[1] http://www.postgresql.org/docs/current/interactive/sql-copy.html

kermatt
  • 1,435
  • 2
  • 16
  • 35