to_sql() is printing every INSERT statement in my Jupyter Notebook, and this makes everything run very slowly for millions of records. How can I reduce the verbosity significantly? I haven't found any verbosity setting for this function. I've tried %%capture as described in "How do you suppress output in IPython Notebook?"; the same method works for a simple test case with print(), but not for to_sql().

from sqlalchemy import create_engine, NVARCHAR
import cx_Oracle

df.to_sql('table_name', engine, if_exists='append',
          schema='schema', index=False, chunksize=10000,
          dtype={col_name: NVARCHAR(length=20) for col_name in df})
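
For reference, the %%capture attempt mentioned above looked roughly like this (a sketch; the capture variable name is mine). It silenced a plain print() test, but the output produced during to_sql() still bogged the notebook down:

%%capture insert_output
# IPython cell magic: stores this cell's stdout/stderr in `insert_output`
# instead of displaying it. Worked for print(), not for the to_sql() echo.
df.to_sql('table_name', engine, if_exists='append', index=False, chunksize=10000)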
  • Does your connection URI include `echo=True` (or `echo=` anything other than `False`)? – Gord Thompson Nov 29 '20 at 18:46
  • @GordThompson yes, that was it! Do you want to post the answer or should I? Or is it a trivial question and worth deleting? I couldn't find the answer anywhere. Now that I know what to look for, it's easy to find: https://docs.sqlalchemy.org/en/13/core/engines.html#more-on-the-echo-flag – Bobby Nov 29 '20 at 19:15
  • You go ahead and post an answer; it could help others in the future. – Gord Thompson Nov 29 '20 at 19:19
  • Thanks, @GordThompson, I just did. FYI, I can now write 23M records in 18 minutes. Before it was around an hour, but I was never really sure because the Jupyter Notebook got completely bogged down and the cell never showed as completed even after the writing had finished. – Bobby Nov 29 '20 at 19:33
  • You might consider experimenting with `method="multi"` to see if you can speed things up even more. (It helps with other databases but I've never tried it with Oracle.) – Gord Thompson Nov 29 '20 at 20:44
  • … actually, disregard my previous comment re: `method="multi"`. I just tested against an OracleXE 18.4 instance and it apparently does not support table-value constructors, which is what `method="multi"` uses to try and speed things up. Perhaps the cx-Oracle DBAPI layer does its own optimizations for `.executemany()`. – Gord Thompson Nov 29 '20 at 22:22
  • @GordThompson Yes, I had been trying with multi and wasn't able to get it working either. I'm not sure if chunksize has any effect without multi. I had a look at executemany, but that would no longer work directly with my data frame; I would need to format the data as a list of tuples, right? I suppose that pre-processing would also take a lot of time for a dataframe with 23M records. https://cx-oracle.readthedocs.io/en/latest/user_guide/batch_statement.html#batchstmnt – Bobby Nov 30 '20 at 08:18
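
For anyone curious about the raw cx_Oracle executemany() route discussed in the last comments, a minimal sketch might look like the following; the table name, column names, bind positions, and connection variables are placeholders, not the actual setup from this question:

import cx_Oracle

# Sketch only: convert the DataFrame to a list of plain tuples and batch-insert it.
rows = list(df.itertuples(index=False, name=None))

conn = cx_Oracle.connect(user, password, dsn)
cur = conn.cursor()
cur.executemany(
    "INSERT INTO schema_name.table_name (col1, col2) VALUES (:1, :2)",
    rows,
)
conn.commit()
conn.close()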

1 Answer


Inside create_engine(), set echo=False (False is the default, so simply removing echo=True also works) and the per-statement logging will be disabled. More detail here: https://docs.sqlalchemy.org/en/13/core/engines.html#more-on-the-echo-flag.

from sqlalchemy import create_engine, NVARCHAR
import cx_Oracle

host = 'hostname.net'
port = 1521
sid = 'DB01'  # instance is the same as SID
user = 'USER'
password = 'password'
dsn = cx_Oracle.makedsn(host, port, sid=sid)

cstr = 'oracle://{user}:{password}@{dsn}'.format(
    user=user,
    password=password,
    dsn=dsn
)

engine = create_engine(
    cstr,
    convert_unicode=False,
    pool_recycle=10,
    pool_size=50,
    echo=False  # suppresses the per-statement INSERT logging
)

Thanks to @GordThompson for pointing me in the right direction!
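
As a follow-up, the linked docs note that echo is just a shortcut for standard Python logging on the 'sqlalchemy.engine' logger, so with echo=False you can still switch statement logging on and off when you actually want it (a sketch, not part of the original answer):

import logging

# Turn SQL statement logging on temporarily (e.g. for a small test load) ...
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

# ... and back off again before the big write.
logging.getLogger('sqlalchemy.engine').setLevel(logging.WARNING)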

Bobby