
I'm kinda new to the SQL world, but I was following a tutorial called Optimizing pandas.read_sql for Postgres. The thing is, I'm working with a big dataset, similar to the example in the tutorial, and I need a faster way to execute my query and turn the result into a DataFrame. There, they use this function:

import tempfile

import pandas

def read_sql_tmpfile(query, db_engine):
    with tempfile.TemporaryFile() as tmpfile:
        copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
            query=query, head="HEADER"
        )
        conn = db_engine.raw_connection()
        cur = conn.cursor()
        # psycopg2 streams the query's COPY output straight into the temp file
        cur.copy_expert(copy_sql, tmpfile)  # I want to replicate this
        tmpfile.seek(0)
        df = pandas.read_csv(tmpfile)
        return df

And I tried to replicate it, like this:

def read_sql_tmpfile(query, connection):  # connection is a pymysql connection
    with tempfile.TemporaryFile() as tmpfile:
        copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
            query=query, head="HEADER"
        )

        cur = connection.cursor()
        cur.copy_expert(copy_sql, tmpfile)  # fails: pymysql cursors have no copy_expert
        tmpfile.seek(0)
        df = pandas.read_csv(tmpfile)
        return df

The thing is, cursor.copy_expert comes from the psycopg2 library and relies on PostgreSQL's COPY command, and I can't find a way to do the same thing with pymysql. Is there any way to do this? What should I do? Thanks

Juan C
  • As already mentioned, it's a Postgres-specific feature; have a look at [this](https://stackoverflow.com/questions/18107953/how-to-create-a-large-pandas-dataframe-from-an-sql-query-without-running-out-of) or [this](https://stackoverflow.com/questions/31702621/loading-5-million-rows-into-pandas-from-mysql) – waynetech Jul 24 '19 at 07:51
  • Thank you very much @waynetech, especially the second link, which was just what I needed to read (a rough sketch of that approach is below) – Juan C Jul 24 '19 at 15:08
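
For reference, the approach described in that second link is to pull the result set in chunks instead of relying on a COPY-style bulk export, which MySQL drivers such as pymysql don't provide. Below is a minimal sketch (not from the original post), assuming a SQLAlchemy engine built on pymysql; the connection URL, query, and chunk size are placeholders.

import pandas
from sqlalchemy import create_engine

def read_sql_chunked(query, db_url, chunksize=100_000):
    # db_url is a placeholder, e.g. "mysql+pymysql://user:password@host/dbname"
    engine = create_engine(db_url)
    # With chunksize set, read_sql returns an iterator of DataFrames,
    # so the whole result set is never buffered in one piece.
    chunks = pandas.read_sql(query, engine, chunksize=chunksize)
    return pandas.concat(chunks, ignore_index=True)

If even the concatenated DataFrame is too big for memory, each chunk can be processed or written to disk individually instead of being concatenated at the end.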

0 Answers