I'm kinda new to the SQL world, but I was following a tutorial called Optimizing pandas.read_sql for Postgres. The thing is, I'm working with a big dataset, similar to the example in the tutorial and I need a faster way to execute my query and turn it into a DataFrame. There, they use this function:
def read_sql_tmpfile(query, db_engine):
with tempfile.TemporaryFile() as tmpfile:
copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
query=query, head="HEADER"
)
conn = db_engine.raw_connection()
cur = conn.cursor()
cur.copy_expert(copy_sql, tmpfile) # I want to replicate this
tmpfile.seek(0)
df = pandas.read_csv(tmpfile)
return df
And I tried to replicate it, like this:
def read_sql_tmpfile(query, connection):
with tempfile.TemporaryFile() as tmpfile:
copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
query=query, head="HEADER"
)
cur = connection.cursor()
cur.copy_expert(copy_sql, tmpfile)
tmpfile.seek(0)
df = pandas.read_csv(tmpfile)
return df
The thing is, cursor.copy_expert
comes from the psycopg2
library for PostgreSQL, and I can't find a way to do the same thing with pymysql
. Is there any way to do this? What should I do? Thanks