I am trying to concatenate multiple files in a directory. I am able to concatenate the files, however I am getting duplicated rows and columns. If a field doesn't exist in the input file, the output file should ideally show NaN or null.
Ideally the output would look as follows:
Merged.csv
ItemID Price Discount ItemName Desc Barcode
0 1 4 2.0 Toy For kids 667865
1 2 10 1.0 Game For adults 998364
2 3 8 NaN NaN For everyone 43831
file1.csv
ItemID Price Discount
0 1 4 2.0
1 2 10 1.0
2 3 8 NaN
file2.csv
ItemID ItemName Desc
0 1 Toy For kids
1 2 Game For adults
2 3 NaN For everyone
file3.CSV
ItemID Barcode
0 1 667865
1 2 998364
2 3 43831
The following is the code I am working on:
import glob , os
import pandas as pd
path = '/location/test/'
all_files = glob.glob(os.path.join(path, "*.csv"))
pd.concat([pd.read_csv(f, sep=';').assign(file=os.path.basename(f)) for f in all_files]).to_csv('merged.csv', index=False)