How to concatenate multiple Excel sheets from the same file?

Question

I have a big Excel file that contains many different sheets. All the sheets have the same structure:

Name
col1  col2  col3  col4
1     1     2     4
4     3     2     1

How can I concatenate (vertically) all these sheets in pandas without having to name each of them manually? If these were files, I could use glob to obtain a list of files in a directory. But for Excel sheets, I am lost.
Is there a way to create a variable in the resulting dataframe that identifies the sheet name the data comes from?

Thanks!

MaxU · Accepted Answer · 2020-11-28T13:47:50.173

8

Try this:

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)

This returns a dictionary of DataFrames keyed by sheet name, which you can concatenate with pd.concat(dfs) or, in a single step, as @jezrael has already posted in his answer:

df = pd.concat(pd.read_excel(filename, sheet_name=None, skiprows=1))

sheet_name: None -> All sheets as a dictionary of DataFrames
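
To make the shape of the result concrete, here is a minimal sketch. It assumes an in-memory dict standing in for what read_excel returns with sheet_name=None; the sheet names d1 and d2 and the values are illustrative only. It contrasts concatenating the dict itself with concatenating only its values:

```python
import pandas as pd

# Stand-in for pd.read_excel(filename, sheet_name=None, skiprows=1):
# a dict mapping each sheet name to its DataFrame (illustrative data).
dfs = {
    "d1": pd.DataFrame({"col1": [1, 4], "col2": [1, 3], "col3": [2, 2], "col4": [4, 1]}),
    "d2": pd.DataFrame({"col1": [3, 6], "col2": [3, 5], "col3": [4, 4], "col4": [6, 3]}),
}

# Passing the dict keeps the sheet names as the outer level of a MultiIndex.
stacked = pd.concat(dfs)

# Passing only the values discards the sheet names;
# ignore_index=True renumbers the rows from 0.
flat = pd.concat(list(dfs.values()), ignore_index=True)

print(stacked)
print(flat)
```

With the dict form, stacked.loc["d1"] selects the rows that came from sheet d1, so the sheet of origin is not lost.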

UPDATE:

Is there a way to create a variable in the resulting dataframe that identifies the sheet name the data comes from?

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)

assuming we've got the following dict:

In [76]: dfs
Out[76]:
{'d1':    col1  col2  col3  col4
 0     1     1     2     4
 1     4     3     2     1, 'd2':    col1  col2  col3  col4
 0     3     3     4     6
 1     6     5     4     3}

Now we can add a new column:

In [77]: pd.concat([df.assign(name=n) for n,df in dfs.items()])
Out[77]:
   col1  col2  col3  col4 name
0     1     1     2     4   d1
1     4     3     2     1   d1
0     3     3     4     6   d2
1     6     5     4     3   d2

edited Nov 28 '20 at 13:47

answered Oct 06 '17 at 12:42


Then `pd.concat(dfs.values())` yields the result. – blacksite Oct 06 '17 at 12:44
nice but how can I get rid of the `Name` super column? I thought I could use some read_excel (skip = 1) somewhere with the dictionary? – ℕʘʘḆḽḘ Oct 06 '17 at 12:45
@blacksite, yes, thank you. I thought OP knows it already... ;-) – MaxU Oct 06 '17 at 12:45
@ℕʘʘḆḽḘ, use `skiprows=1` – MaxU Oct 06 '17 at 12:46
@MaxU just a little change if you can. Is there a way to create a variable in the resulting dataframe that identifies the sheet name from which the data comes from? – ℕʘʘḆḽḘ Oct 06 '17 at 13:03
say col4 that is sheet1 sheet1 sheet1 sheet2 sheet2 etc – ℕʘʘḆḽḘ Oct 06 '17 at 13:04
@ℕʘʘḆḽḘ, i've updated my answer - is that what you want? – MaxU Oct 06 '17 at 13:05
@ℕʘʘḆḽḘ, glad i could help :) – MaxU Oct 06 '17 at 13:10
Note that this should now read `dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)` (Note the underscore in sheet_name) – zzz Nov 26 '20 at 04:34

jezrael · Answer 2 · 2017-10-06T12:49:03.703

4

First, pass sheet_name=None to get a dict of DataFrames and skiprows=1 to omit the first row, then use concat to build a DataFrame with a MultiIndex whose first level holds the sheet names.

Finally, use reset_index to turn that first level into a column:

df = pd.concat(pd.read_excel('multiple_sheets.xlsx', sheet_name=None, skiprows=1))
df = df.reset_index(level=1, drop=True).rename_axis('filenames').reset_index()
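
The same idea can also be written by naming the index levels inside concat. The sketch below is a variant, not the code above: it assumes an in-memory dict in place of the read_excel result, and the level names "sheet" and "row" are hypothetical labels chosen for illustration.

```python
import pandas as pd

# Stand-in for pd.read_excel('multiple_sheets.xlsx', sheet_name=None, skiprows=1)
# (illustrative data; sheet names d1 and d2 are assumptions).
dfs = {
    "d1": pd.DataFrame({"col1": [1, 4], "col2": [1, 3], "col3": [2, 2], "col4": [4, 1]}),
    "d2": pd.DataFrame({"col1": [3, 6], "col2": [3, 5], "col3": [4, 4], "col4": [6, 3]}),
}

combined = (
    pd.concat(dfs, names=["sheet", "row"])  # name both MultiIndex levels
      .reset_index(level="sheet")           # move the sheet name into a column
      .reset_index(drop=True)               # replace per-sheet row numbers with 0..n-1
)

print(combined)
```

Naming the levels up front avoids the separate rename_axis step and lets reset_index refer to the level by name rather than by position.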

edited Oct 06 '17 at 12:49

answered Oct 06 '17 at 12:44


thanks jezrael. same issue, how can I get rid of the first line for every sheet here? – ℕʘʘḆḽḘ Oct 06 '17 at 12:45
`skiprows=1` should help. – jezrael Oct 06 '17 at 12:46
great. thanks guys but I had to give this one to the fastest one! :) – ℕʘʘḆḽḘ Oct 06 '17 at 12:47

blacksite · Answer 3 · 2017-10-06T13:23:02.410

3

Taking a note from this question:

import pandas as pd

file = pd.ExcelFile('file.xlsx')

names = file.sheet_names  # see all sheet names

df = pd.concat([file.parse(name) for name in names])

Results:

Then you can run df = df.reset_index(drop=True) to reset the index; drop=True discards the old per-sheet row numbers instead of keeping them as a column.

Edit: pandas.ExcelFile.parse is, according to the pandas docs:

Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters

edited Oct 06 '17 at 13:23

answered Oct 06 '17 at 12:43


thanks but why using the `file.parse` instead of some `read.excel?` – ℕʘʘḆḽḘ Oct 06 '17 at 12:44
See above, please. – blacksite Oct 06 '17 at 12:46
great. thanks guys but I had to give this one to the fastest one! :) – ℕʘʘḆḽḘ Oct 06 '17 at 12:48

score 0 · Answer 4 · answered Apr 26 '19 at 21:19

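import glob
import os

import pandas as pd

file_save_location = 'myfolder'  # folder the CSV files are written to
file_name = 'filename'           # base name for the CSV files
location = 'myfolder1'           # folder containing the Excel workbooks

excel_names = glob.glob(os.path.join(location, '*.xls*'))
# Assumes every workbook has the same sheets as the first one.
sheets = pd.ExcelFile(excel_names[0]).sheet_names

def combine_excel_to_dfs(excel_names, sheet_name):
    # Read the same sheet from every workbook and stack the results vertically.
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df

os.makedirs(file_save_location, exist_ok=True)

for sheet in sheets:
    consolidated_file = combine_excel_to_dfs(excel_names, sheet)
    # Write one CSV per sheet so that later sheets do not overwrite earlier ones.
    out_path = os.path.join(file_save_location, f'{file_name}_{sheet}.csv')
    consolidated_file.to_csv(out_path)

print("Consolidation done")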

Would be better if you could edit the post with some description followed by the code. — coderpc, Apr 26 '19 at 21:38
