
I’m planning to do some data analysis with R; the datasets are stored in PostgreSQL tables, and some of them contain up to 2 million records. I thought this would not be a big problem for R and that loading the records would be rather quick, but things turned out differently.

Doing something like this may take a minute or more, which is not what I was expecting:

library(RPostgreSQL)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "mydb", user = "me", password = "my_password")

records <- dbGetQuery(con, statement = paste(
  "SELECT *",
  "FROM my_table",
  "ORDER BY id"))  # 1.5M records

An alternative approach is even slower:

res <- dbSendQuery(con, "SELECT * FROM my_table ORDER BY id")
records2 <- fetch(res, n = -1)  # fetch all rows at once

I can’t say my hardware is the most advanced in the world, but it’s a rather decent MacBook Pro with 8 GB of RAM and an SSD. When I fetch the same data with, say, QGIS, things are done significantly faster.
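To put a number on it, the whole load can be timed with base R’s `system.time()` (a sketch, reusing `con` from above):

# Time the full round trip: query execution plus transfer into R
system.time(
  records <- dbGetQuery(con, "SELECT * FROM my_table ORDER BY id")
)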

What can one do to increase performance in such a case? Alternative libraries? Tricks and hacks? Anything else?

  • Hmm, I've just tested this code on a table with 1M records. It's as fast as a typical query. What kind of data are you using? Also, please show your `sessionInfo()` output. – redmode Feb 04 '14 at 17:08
  • @redmode, what kind of R environment are you using (if that matters)? And what does ‘as fast as a typical query’ mean in your case? – Alexander Kachkaev Feb 04 '14 at 19:32
  • Do you mean the IDE? I'm using RStudio. A query for 1M records took ~2 seconds, roughly the same timing as when run from pgadmin3. – redmode Feb 04 '14 at 19:59
  • Also, it can be a good idea to work with PostgreSQL without pulling all the data into memory, for instance by using the wonderful `dplyr` package. See the `dplyr::databases` vignette for details (a sketch follows this list). – redmode Feb 04 '14 at 21:03
  • You could try using the PostgreSQL `copy` command to offload your data to CSV or TSV and process it in your R code (see the second sketch below). – Konstantin V. Salikhov May 12 '14 at 18:41
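A minimal sketch of the `dplyr` approach from the comments, assuming the dplyr and dbplyr packages are installed and reusing `con` from the question; the `id > 1000` filter is purely illustrative. The verbs are translated to SQL and executed by PostgreSQL, so rows only cross the wire on `collect()`:

library(dplyr)

my_tbl <- tbl(con, "my_table")   # lazy reference: no data transferred yet

records <- my_tbl %>%
  filter(id > 1000) %>%          # translated to SQL, runs in PostgreSQL
  arrange(id) %>%
  collect()                      # only now do rows come into R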
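And a sketch of the `copy` route, with an illustrative file path: export the table from a shell with psql, then bulk-read the CSV with `data.table::fread()`, which is typically much faster than fetching row sets through the driver:

# In a shell, not in R:
#   psql -d mydb -c "\copy my_table TO 'my_table.csv' CSV HEADER"
library(data.table)
records <- fread("my_table.csv")  # fast bulk CSV reader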

2 Answers


You should exclude the ORDER BY - it is not part of loading the data, and it may significantly slow your query.
You can order the data afterwards, once it is in R's memory. If you are looking for fast ordering, check this SO answer.
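A sketch of that idea, reusing `con` and `my_table` from the question; the `data.table` variant is an optional extra for large results:

records <- dbGetQuery(con, "SELECT * FROM my_table")  # no ORDER BY
records <- records[order(records$id), ]               # base R sort on id

# Or sort by reference with data.table, which is faster for large tables:
library(data.table)
setDT(records)          # convert to data.table in place, no copy
setorder(records, id)   # fast radix sort by reference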

– jangorecki

More of a redesign than an answer to the question, but...

You could always plug R directly into PostgreSQL, and run your query without moving the data anywhere. Can't move it any faster than not moving it at all :)

PL/R for PostgreSQL
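A minimal sketch of what that looks like, assuming the `plr` extension is available and that `my_table` has a numeric column named `value` (purely illustrative). The function body is ordinary R, executed inside the database server:

CREATE EXTENSION plr;

CREATE OR REPLACE FUNCTION value_mean() RETURNS float8 AS $$
  # pg.spi.exec() runs SQL via SPI and returns a data frame
  res <- pg.spi.exec("SELECT value FROM my_table")
  mean(res$value)   # the last expression is the return value
$$ LANGUAGE plr;

SELECT value_mean();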

– Kirk Roybal