
I'm trying to:

Import a CSV of UPC codes into a dataframe. If a UPC code is 11 characters long, prepend '0' to it. Ex: 19962123818 --> 019962123818

This is the code:

#check UPC code length. If 11 characters, add '0' before it. If < 11 or > 13, print an error and quit
for index, row in clean_data.iterrows():
    if len(row['UPC']) == 11:
        row['UPC'] = ('0' + row['UPC'])
        #clean_data.set_value(row, 'UPC',('0' + (row['UPC']))
        print ("Edited UPC:", row['UPC'], type(row['UPC']))
    if len(row['UPC']) < 11 or len(row['UPC']) > 13:
        print ('Error, UPC length < 11 or > 13:')
        print ("Error in UPC:", row['UPC'])
        quit()

However, when I print the data, the original value is not edited.

Does anyone know what is causing this issue?

I tried the set_value method as mentioned in other posts, but it didn't work.

Thanks!


Edit: Thanks for the vectorized approach, much cleaner! However, I get an error, and the value is still not updating.

Alex_L

3 Answers


Can I suggest a different method?

#identify the UPCs that are exactly 11 characters long
fix_indx = clean_data.UPC.astype(str).str.len() == 11

#prepend a '0' to those UPCs only (select the rows AND the 'UPC' column)
clean_data.loc[fix_indx, 'UPC'] = '0' + clean_data.loc[fix_indx, 'UPC'].astype(str)
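A self-contained sketch of the same masking idea on a small made-up DataFrame (the values and the extra `name` column are hypothetical). It passes both the mask and the column name to `.loc`, so only `UPC` changes and the write goes to `clean_data` itself rather than to a copy:

```python
import pandas as pd

# Hypothetical sample data standing in for the imported CSV (UPCs kept as strings)
clean_data = pd.DataFrame({
    "UPC": ["19962123818", "019962123818", "123"],
    "name": ["a", "b", "c"],
})

# Boolean mask: True where the UPC is exactly 11 characters long
fix_indx = clean_data["UPC"].astype(str).str.len() == 11

# Row mask + column label in a single .loc call: only the UPC cells of
# the selected rows are overwritten, the 'name' column is untouched
clean_data.loc[fix_indx, "UPC"] = "0" + clean_data.loc[fix_indx, "UPC"].astype(str)

print(clean_data)
```

Using `clean_data.loc[fix_indx]` without a column label would instead target every column of those rows.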

To deal with the other invalid lengths, you can similarly set them to NaN:

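import numpy as np
#flag UPCs outside the valid 11-13 character range and blank them out
bad_length_indx = ~clean_data.UPC.astype(str).str.len().between(11, 13)
clean_data.loc[bad_length_indx, 'UPC'] = np.nan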
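A minimal runnable sketch of the length check on hypothetical data, assuming (as in the question) that anything shorter than 11 or longer than 13 characters is invalid. `Series.between` is inclusive on both ends by default:

```python
import numpy as np
import pandas as pd

# Hypothetical UPCs with lengths 12, 13, 3 and 14
clean_data = pd.DataFrame({"UPC": ["019962123818", "1234567890123", "123", "12345678901234"]})

lengths = clean_data["UPC"].astype(str).str.len()

# True for lengths outside 11..13 (both bounds are valid)
bad_length_indx = ~lengths.between(11, 13)

# Only the UPC column of the flagged rows becomes NaN
clean_data.loc[bad_length_indx, "UPC"] = np.nan
print(clean_data)
```

If you would rather stop on bad data, as the loop in the question does, you could check `bad_length_indx.any()` and raise an exception instead of writing NaN.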
Gene Burinsky

According to the iterrows documentation:

  You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

row['UPC'] = ('0' + row['UPC']) silently modifies a copy of the row, so clean_data is left unchanged.

Adopt a vectorized version of your algorithm instead, as @Gene suggests.
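A small sketch on made-up data reproducing what the documentation note describes. The extra integer column `qty` is a hypothetical addition that makes the dtypes mixed, which is the case where each row yielded by iterrows is a freshly built copy. If a loop is truly unavoidable, writing through the DataFrame with `DataFrame.at` does update it, although the vectorized version is still the better choice:

```python
import pandas as pd

# Hypothetical data; mixed dtypes mean iterrows builds each row as a new copy
df = pd.DataFrame({"UPC": ["19962123818", "019962123818"], "qty": [3, 5]})

# Same pattern as the question: the assignment only changes the row copy
for index, row in df.iterrows():
    if len(row["UPC"]) == 11:
        row["UPC"] = "0" + row["UPC"]

after_loop = df["UPC"].tolist()  # still ['19962123818', '019962123818']

# Loop fallback: iterate over the index labels and write back with .at,
# so the DataFrame itself is modified
for index in df.index:
    upc = df.at[index, "UPC"]
    if len(upc) == 11:
        df.at[index, "UPC"] = "0" + upc

print(df["UPC"].tolist())  # ['019962123818', '019962123818']
```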

Boud

I finally fixed it. Thanks again for the vectorized idea. If anyone has this issue in the future, here's the code I used. Also, see this post for more info.

UPC_11_char = clean_data.UPC.astype(str).str.len() == 11
#.loc replaces the old .ix indexer, which was removed in pandas 1.0
clean_data.loc[UPC_11_char, 'UPC'] = '0' + clean_data.loc[UPC_11_char, 'UPC'].astype(str)

print(clean_data.loc[UPC_11_char, 'UPC'])
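A likely reason the 11-character codes show up in the first place (an assumption, since the CSV itself isn't shown): if the file stores UPCs with their leading zero, `read_csv` infers an integer column and the zero is dropped. Passing `dtype={'UPC': str}` keeps the codes exactly as written. A minimal sketch with hypothetical CSV content:

```python
import io

import pandas as pd

# Hypothetical CSV content; the first UPC starts with a zero
csv_text = "UPC,name\n019962123818,a\n123456789012,b\n"

# Default parsing infers int64 for the UPC column, losing the leading zero
as_int = pd.read_csv(io.StringIO(csv_text))
print(as_int["UPC"].astype(str).tolist())  # ['19962123818', '123456789012']

# Reading UPC as strings preserves the code as it appears in the file
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"UPC": str})
print(as_str["UPC"].tolist())  # ['019962123818', '123456789012']
```

With the column read as text, the 11-character padding step may not be needed at all, though the length validation is still worth keeping.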
Alex_L