2

I'm dynamically generating queries from 11 different tables in SQL Server and storing that into S3 CSV file.

However, when I store null integer fields in CSV it converts them to float so when I'm doing a copy command it returns an error.

I really need to avoid that. is there and option for that?

for object in table_list:
    if args.load_type == "full":
        query_load = object["query_full"]
    else:
        query_load = object["query_active"]

    df = pd.read_sql_query(query_load, sql_server_conn)
    df = df.replace(",", " ", regex=True)
    df = df.replace("\n", " ", regex=True)


    #print(df)
    #df = df * 1
    #print(df.dtypes)
    #print(df.info())


    df = df.assign(extraction_dttm=currentdate)

    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=False)
    folder_name = "{}".format(object["lake_table_name"])
    file_name = "{}_{}.csv".format(object["lake_table_name"], currentdate.strftime("%Y%m%d"))
    full_path_to_file = DATALAKE_PATH + "/" + folder_name + "/" + file_name

    # print("{} - Storing files in {} ... ".format(dt.utcnow(), datalake_bucket))
    s3_resource.Object(datalake_bucket, full_path_to_file).put(Body=csv_buffer.getvalue())
Dale K
  • 16,372
  • 12
  • 37
  • 62
RosaNegra
  • 41
  • 1
  • 4
  • It's a bit off-topic, but doesn't most SQL language have an option to export into csv? Like this in MySQL: https://stackoverflow.com/questions/356578/how-to-output-mysql-query-results-in-csv-format – AerysS Jun 02 '20 at 08:10
  • Ideally this would be a lambda. so I'm not storing csv files anywhere locally. I'm reading from SQL server and pushing directly to S3 – RosaNegra Jun 02 '20 at 08:16
  • I'm only guessing here, but your DataFrame.replace calls are probably converting all of the fields to strings before DataFrame.to_csv even gets a look-in. DataFrame.to_csv does have options to deal with null values (na_rep, for example), floats (float_format) and quoting (quoting, quotechar, etc) when given the correct values. – AlwaysLearning Jun 02 '20 at 08:21
  • Nope. It does not. I commented out the df.replace method and printed the df.info(). It still reads my that one column as a float instead of integer. – RosaNegra Jun 02 '20 at 08:27

0 Answers0