How to open excel file, write data to column from dataframe column, save as new file

Question

I have an Excel file that contains a column with header 'Original Translation'. I also have a DataFrame with column 'Original Translation - {language}', based on the language I am using and some manipulations.

My goal is to open the Excel file, write over the column with header 'Original Translation' with all the data from my DataFrame column 'Original Translation - {language}', preserve the original Excel file formatting, and save to a new output folder.

Here is the code I currently have:

def output_formatted_capstan_file(df, original_file, country, language):
    # this is where you generate the file:
    # https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas
    try:
        print(original_file)
        book = load_workbook(original_file)
        writer = pd.ExcelWriter(original_file, engine='openpyxl')
        writer.book = book
        df.to_excel(writer, ['Original Translation - {}'.format(language)])
        writer.save()
    except:
        print('Failed')

score 1 · Accepted Answer · answered Feb 11 '19 at 16:40

I would approach this using the following method.

1) Import the excel file using a function such as pandas.read_excel, thus taking the data from excel into a dataframe. I'll call this exceldf

2) Merge this dataframe with the data you already have in the Pandas DataFrame. I will call your existing translated dataframe translateddf

3) Reorder the newly merged dataframe newdf and then export out the data. Further options for how to reorder are shown here: re-ordering data frame

4) Export the data to Excel. I'll leave you to integrate it into your initial code. For the generic answer to the question, others may want to look into the integrated Pandas options here to_excel

Example Code

import pandas

# Read in the Excel file
exceldf = pandas.read_excel(open('your_xls_xlsx_filename'), sheetname='Sheet 1')

# Create a new dataframe with your merged data, merging on 'key1'.
# We then drop the column of the original translation, as it should no longer be needed
# I've included the rename argument in case you need it.
newdf = exceldf.merge(translateddf, left_on=['key1'], \
                right_on=['key1']) \
.rename(columns={'Original Translation {language}': 'Original Translation {language}'}) \
.drop(['Original Translation'], axis=1)

# Re-order your data. 
# Note that if you renamed anything above, you have to update it here too
newdf = newdf[['0', '1', '2', 'Original Translation {language}']]

# An example export, that uses the generic implementation, not your specific code
pandas.newdf.to_excel("output.xlsx")

This is great. I appreciate the detail tremendously. My only issue, I am losing the formatting of the original excel, which contains color coded stuff, etc. I would ideally like to use something that does change those things.. do you have an suggestions? — sgerbhctim, Feb 11 '19 at 18:54
Formatting is often excel proprietary settings (especially conditional formatting). Instead of trying to read this into a dataframe, could I suggest that you copy > paste special > formatting, between the old and new excel file? — HadynB, Feb 11 '19 at 20:49
Sure thing, this is what I have been doing currently. Unfortunately, when I start processing thousands of Excel files, I won't have a chance to do that.. — sgerbhctim, Feb 11 '19 at 20:50
In that case, you will likely be forced to write the rules in xlsxwriter: https://xlsxwriter.readthedocs.io/example_pandas_conditional.html — HadynB, Feb 11 '19 at 20:52
You're welcome. Sorry, but I've not been forced to use that yet. Best of luck! — HadynB, Feb 11 '19 at 20:56

How to open excel file, write data to column from dataframe column, save as new file

1 Answers1