1

I have the following task: 1) I have an excel file with a few spreadsheets. From these spreadsheets I need information from columns "A:CU", rows 41 - 51 2) Then I need to collect information from from columns "A:CU", rows 41 - 51 from all spreadsheets (they have the same structure) and to create a database. 3) There should be a column that indicates from which spreadsheet data was collected

I did following:

import pandas as pd
file='January2020.xlsx'
#getting info from spreadsheets C(1), C(2) and so on
days = range(1,32)
sheets = []
for day in days:
    sheets.append('C(' + str(day)+')')
#importing data
all_sales=pd.read_excel(file,header=None,skiprows=41, usecols="A:CU", sheet_name=sheets,
                skipfooter=10)

Now I have collections.OrderedDict and struggle to put it into dataFrame.

enter image description here

What I need to have is a dataframe like this: enter image description here

Marwen Jaffel
  • 668
  • 1
  • 5
  • 12
kskirpic
  • 87
  • 1
  • 5

2 Answers2

1

Try pd.concat

df = pd.concat(all_sales, ignore_index = True) 
ManojK
  • 1,224
  • 2
  • 5
  • 14
  • no, It doesn'r work. I got a column with C(1), C(2)...without any other data – kskirpic Mar 02 '20 at 10:29
  • I also tried this code : ```xls=pd.ExcelFile(file); df=xls.parse("C(1)", header=None, skiprows=40, usecols="A:CU", skipfooter=10); df.insert(0,"Date","C(1)")``` It works perfectly with 1 spreadsheet. Then I try to go through spreadsheets C(1) - C(31) with this code : ```days = range(1,32) sheets = [] for day in days: sheets.append('C(' + str(day)+')') xls=pd.ExcelFile(file) df=xls.parse(sheets, header=None, skiprows=40, usecols="A:CU", skipfooter=10) df.insert(0,"Date",sheets) ``` but again end up with 'collections.OrderedDict' object has no attribute 'insert' – kskirpic Mar 02 '20 at 10:30
0

I used this code and it worked:

file='January2020.xlsx'
days = range(1,32)
all_sales=pd.DataFrame()
df = pd.DataFrame()
all_df = []
for day in days:
    sheet_name = "C("+str(day)+")"
    all_sales=pd.read_excel(file,header=None,skiprows=41,usecols="A:CU", sheet_name=sheet_name,
                skipfooter=10)
    all_sales["Date"] = sheet_name
    all_df.append(all_sales)
df_final = pd.concat(all_df)

kskirpic
  • 87
  • 1
  • 5