
I want to insert billions of values (exchange rates) into a SQLite db file. I want to use threading because it takes a lot of time, but the thread pool loop executes the same nth element multiple times. I have a print statement at the beginning of my method and it prints out multiple times instead of just once.

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2500)

def gen_nums(i, cur):
    global x
    print('row number', x, ' has started')
    gen_numbers = list(mydata)
    sql_data = []
    for f in gen_numbers:
        sql_data.append((f, i, mydata[i]))
    cur.executemany('INSERT INTO numbers (rate, min, max) VALUES (?, ?, ?)', sql_data)
    print('row number', x, ' has finished')
    x += 1


with conn:
    cur = conn.cursor()
    for i in mydata:
        pool.submit(gen_nums, i, cur)

pool.shutdown(wait=True)

and the output is:

row number 1  has started
row number 1  has started
row number 1  has started
row number 1  has started
row number 1  has started
row number 1  has started
row number 1  has started
...
    There's no point in using threads; only one will be able to insert rows at a time, making it effectively serial. – Shawn Jan 28 '19 at 11:41

1 Answer


Divide your data into chunks on the fly using generator expressions, and make the inserts inside a transaction.

Here is how your code might look.
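A minimal sketch of the idea (assuming mydata is an iterable of exchange rates; the database file name, the 1000-row chunk size, and the 0/1 min/max values are placeholders, not from your data):

import sqlite3
from itertools import islice

def chunks(iterable, size):
    # Yield successive lists of `size` items from any iterable.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

conn = sqlite3.connect('rates.db')
cur = conn.cursor()

# A generator expression builds the (rate, min, max) tuples lazily,
# so the whole data set never has to sit in memory at once.
rows = ((rate, 0, 1) for rate in mydata)

for chunk in chunks(rows, 1000):
    with conn:  # one transaction per chunk, committed when the block exits
        cur.executemany(
            'INSERT INTO numbers (rate, min, max) VALUES (?, ?, ?)', chunk)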

Also, SQLite has the ability to import CSV files.
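For instance, from the sqlite3 command-line shell (rates.csv and the numbers table name are just examples):

.mode csv
.import rates.csv numbers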

SQLite can do tens of thousands of inserts per second; just make sure to do all of them in a single transaction by surrounding the inserts with BEGIN and COMMIT. (executemany() does this automatically.)
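As a sketch of that idea with Python's sqlite3 module (isolation_level=None puts the connection in autocommit mode so BEGIN/COMMIT can be issued explicitly; rows stands in for an iterable of (rate, min, max) tuples):

import sqlite3

conn = sqlite3.connect('rates.db', isolation_level=None)  # autocommit mode
cur = conn.cursor()

cur.execute('BEGIN')  # open one transaction for the whole batch
for rate, lo, hi in rows:
    cur.execute('INSERT INTO numbers (rate, min, max) VALUES (?, ?, ?)',
                (rate, lo, hi))
cur.execute('COMMIT')  # a single commit instead of one per INSERT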

As always, don't optimize before you know speed will be a problem. Test the easiest solution first, and only optimize if the speed is unacceptable.
