
I have a Python script that processes a large file (~1GB) and then inserts its content into a PostgreSQL 9.3 database, all over one connection. This process takes a very long time, so I thought of distributing this query among more than one core (I have 8 cores), but from what I've read this seems impossible. Any idea if there is a workaround?

A piece of my code:

import logging
import psycopg2 as psycopg

try:
    connectStr = "dbname='postgis20' user='postgres' password='' host='localhost'"
    cx = psycopg.connect(connectStr)
    cu = cx.cursor()
    logging.info("connected to DB")
except psycopg.Error:
    logging.error("could not connect to the database")

# msg comes from parsing the large file (context elided in the question)
try:
    cu.execute("INSERT INTO taxi (userid,carNum) SELECT '" + str(msg['UserID']) + "'," + str(msg['CarNumber']))
    # ... many more single-row INSERTs like the one above ...
except Exception as err:
    print('ERROR: %s\n' % str(err))
    cx.commit()
cx.commit()
Learner
  • I'm not a Python programmer, but I do have doubts about concatenating a string and SQL injection... Your code doesn't look safe, please check (see the parameterized sketch below). – Frank Heikens Feb 01 '15 at 08:36
  • This "row at a time" method of insertion will take one complete roundtrip *per row* to the database, including parsing the query (plus at least two context switches) You will never get beyond a few records per second. – wildplasser Feb 01 '15 at 11:32

1 Answer


Parallelizing your Python program will not change the fact that you're inserting rows one after the other into the same logged table (which acquires and releases a lock every time, and prevents scaling).

Bulk inserting data into PostgreSQL is a classic subject: the best way to bulk import data into PostgreSQL is to use the COPY command (even from a Python program, you should use this PostgreSQL command).
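A minimal sketch of how that could look with psycopg2's copy_from, assuming messages is an iterable of your parsed records (a name I've made up for illustration) and the taxi table from the question, using Python 3 syntax:

import io
import psycopg2

cx = psycopg2.connect("dbname='postgis20' user='postgres' password='' host='localhost'")
cu = cx.cursor()

# Build a tab-separated buffer; for a ~1GB file you would stream it
# in chunks rather than hold everything in memory at once.
buf = io.StringIO()
for msg in messages:
    buf.write("%s\t%s\n" % (msg['UserID'], msg['CarNumber']))
buf.seek(0)

# One COPY replaces thousands of single-row INSERT round trips.
# Column names are lowercase because PostgreSQL folds unquoted identifiers.
cu.copy_from(buf, 'taxi', columns=('userid', 'carnum'))
cx.commit()

If your input file is already in a COPY-compatible format (tab-separated, one row per line), you can pass the open file object directly to copy_from and skip the in-memory buffer entirely.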

The postgresql.org site has a page on how to optimize your import. You should also read this SO answer on the same subject.

Jérôme Radix