I currently have code that runs through all Excel files in a directory and parses the data from one sheet in each workbook, selected by index position, into one final sheet. I am trying to have the code access that sheet by name instead: every Excel file has a sheet titled "Data Narrative". How do I get that to work instead of grabbing the sheets by index position?

Current code is below.

import pandas as pd
from os import listdir
from os.path import isfile, join

onlyfiles = [f for f in listdir('ALL EDTs') if isfile(join('ALL EDTs', f))]



# filenames
excel_names = onlyfiles

# read them in
excels = [pd.ExcelFile(join('ALL EDTs', name)) for name in excel_names]

# turn them into dataframes
frames = [x.parse(x.sheet_names[3], header=None, index_col=None) for x in excels]

# drop the first four rows (df[4:]) of every frame except the first
# i.e. remove the header rows -- assumes they sit at the top of the sheet
frames[1:] = [df[4:] for df in frames[1:]]

# concatenate them..
combined = pd.concat(frames)

# write it out
combined.to_excel("all.xlsx", header=False, index=False)
kaner32

2 Answers

I would use pd.read_excel() for this, as it has an argument to specify the sheet name. Suppose all your filenames are in a list called f_names:

combined = pd.concat(
              pd.read_excel(f, sheet_name="Data Narrative") for f in f_names
           )
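
In case it helps, here is a minimal sketch of how f_names could be built and read. The helper name read_named_sheets, the "*.xlsx" pattern, and header=None (copied from the question's code) are my assumptions, not something the question specifies. Note that pd.read_excel raises an error if a workbook has no sheet with that name.

```python
from pathlib import Path

import pandas as pd


def read_named_sheets(folder, sheet="Data Narrative"):
    """Return one DataFrame per .xlsx file in `folder`, read from `sheet`."""
    # Hypothetical helper: collect full paths so no manual string joining is needed.
    f_names = sorted(Path(folder).glob("*.xlsx"))
    # header=None mirrors the question's code: no row is used as column labels.
    return [pd.read_excel(f, sheet_name=sheet, header=None) for f in f_names]
```

The resulting list can be passed straight to pd.concat, or trimmed first if the repeated header rows need to go.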
KenHBS

Welcome to Stack Overflow, kaner32!

You can just pass sheet_name='Data Narrative' as an argument to the .parse call on your pd.ExcelFile objects (pd.read_excel accepts the same argument).

For more look at the documentation here.

I found the solution in this post.
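
Applied to the code in the question, that could look roughly like the sketch below. The function name combine_sheet, the "*.xlsx" pattern, and the choice to skip workbooks that lack the sheet are assumptions on my part; skip_rows=4 just mirrors the df[4:] slice from the question.

```python
from pathlib import Path

import pandas as pd


def combine_sheet(folder, sheet="Data Narrative", skip_rows=4):
    """Stack the named sheet from every .xlsx workbook in `folder`."""
    frames = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        with pd.ExcelFile(path) as book:
            if sheet not in book.sheet_names:
                continue  # assumption: skip workbooks that lack the sheet
            frame = book.parse(sheet_name=sheet, header=None, index_col=None)
        # Keep the top rows only for the first workbook, as in the question.
        frames.append(frame if not frames else frame.iloc[skip_rows:])
    # pd.concat raises if no workbook contained the sheet.
    return pd.concat(frames, ignore_index=True)
```

Something like combine_sheet('ALL EDTs').to_excel("all.xlsx", header=False, index=False) would then write the result the same way the original code does.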

mcsitter