How to exclude certain rows in a pandas dataframe in Python

Question

I have an Excel sheet that contains a list of folder names. I have to read the sheet and create those folders on my drive. However, if the process breaks during creation or an exception is raised, rerunning it should exclude the folders that have already been created.

Below is my current Python code:

import os

import pandas as pd

data = pd.read_excel(r'C://Users//file1//Desktop//folderlist.xls')
print(data["producttype"])  # the folder list is in the "producttype" column
print(data.head())
data.drop("Unnamed: 0", axis=1, inplace=True)
root = r'C://Users//file1//Desktop//google//'
# Folders that already exist directly under root
dirlist = pd.DataFrame([item for item in os.listdir(root) if os.path.isdir(os.path.join(root, item))])
df = pd.DataFrame([x[0] for x in os.walk(root)])
print(dirlist)
# Attempt to drop the rows whose folder already exists
for i in dirlist:
    for k, j in enumerate(data["producttype"]):
        if i == j:
            data.drop(data.producttype.index[k], axis=0, inplace=True)

The code runs, but it does not exclude the folders that have already been created.

Can someone help me to fix the issue?

can you show the format of `data` ? just a few rows should do — Umar.H, Nov 09 '20 at 19:52

score 2 · Answer 1 · answered Jun 05 '20 at 10:54

This question boils down to safely creating a (nested) directory, which is answered here: How can I safely create a nested directory?

The following code, adapted from the linked question, should do the trick:

import pandas as pd
from pathlib import Path

df_folders = pd.read_excel('file.xlsx', sheet_name='info', header=0)
for folder in df_folders['producttype']:
    # exist_ok=True: folders that already exist are skipped without raising an error
    Path(folder).mkdir(parents=True, exist_ok=True)
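
If you also need the DataFrame itself without the rows for existing folders, as the title asks, one option is to filter it with `isin`. This also explains why the loop in the question never matches: iterating over a DataFrame (`for i in dirlist`) yields its column labels (here `0`), not the folder names. Below is a minimal sketch of that filtering, not a drop-in replacement. The helper name `pending_rows` and the folder names in the demo are made up for illustration, and the demo runs in a temporary directory rather than on your real paths.

```python
import os
import tempfile

import pandas as pd


def pending_rows(data, root, column="producttype"):
    """Return the rows of `data` whose folder does not yet exist under `root`.

    `pending_rows` is a hypothetical helper, not part of pandas.
    """
    existing = [
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    ]
    # isin() builds a boolean mask; ~ inverts it, keeping rows not yet created
    return data[~data[column].isin(existing)]


# Demo in a temporary directory with made-up folder names
with tempfile.TemporaryDirectory() as root:
    data = pd.DataFrame({"producttype": ["books", "toys", "games"]})
    os.mkdir(os.path.join(root, "books"))  # simulate an interrupted earlier run

    remaining = pending_rows(data, root)["producttype"].tolist()
    print(remaining)

    # Create only the missing folders, then check that nothing is left to do
    for folder in remaining:
        os.makedirs(os.path.join(root, folder), exist_ok=True)
    left_after_rerun = pending_rows(data, root)["producttype"].tolist()
```

Filtering returns a new DataFrame instead of dropping rows in place, which avoids modifying `data` while looping over it.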
