
I have a list of 20,000 products with their descriptions, covering a wide variety of products.

I want to write code that searches for a particular word, say 'TAPA', and outputs every item that contains 'TAPA'.

I found the question "Find a specific word from a list in python", but it uses startswith(), which only matches at the beginning of each string, for example:

 new = [x for x in df1['A'] if x.startswith('00320')]

 ## output ['00320671-01 Guide rail 25N/1660', '00320165S02 - Miniature rolling table']

How can I find the word when it appears at the second, third, or any other position in the string?

P.S. The list contains strings, integers, and floats.

2 Answers


You can use str.find(substring) for this. In your case, this should work:

# str(x) is needed because the column also holds integers and floats, which have no find()
new = [x for x in df1['A'] if str(x).find('00320') != -1]

The find() method returns the lowest index at which the substring is found, or -1 if it is not found.

To learn more about find(), see GeeksforGeeks: "Python String | find()".
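A minimal sketch of how find() behaves, using a small hypothetical list in place of df1['A']. Non-string items are converted with str() first, since integers and floats have no find() method and would raise AttributeError.

```python
# Hypothetical sample data standing in for df1['A']; it mixes strings and numbers
items = ['00320671-01 Guide rail 25N/1660', 'TAPA 40mm', 'Cover 00320 spare', 1234, 3.5]

# find() returns the index of the first occurrence, or -1 if the substring is absent
print('Cover 00320 spare'.find('00320'))  # 6
print('TAPA 40mm'.find('00320'))          # -1

# Convert each item to str so integers and floats don't raise AttributeError
matches = [x for x in items if str(x).find('00320') != -1]
print(matches)  # ['00320671-01 Guide rail 25N/1660', 'Cover 00320 spare']
```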

Edit 1: As suggested by @Thierry in comments, a cleaner way to do this is:

new = [x for x in df1['A'] if '00320' in str(x)]
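For the 'TAPA' search from the question, the same `in` test can be wrapped in a small helper. The helper name `find_items` and the sample list are hypothetical; lowercasing both sides is an optional extra so that 'Tapa' and 'tapa' also match.

```python
def find_items(items, word, ignore_case=True):
    """Return every item whose string form contains `word`.

    Hypothetical helper: str() lets integers and floats in the list be
    searched without raising TypeError.
    """
    if ignore_case:
        word = word.lower()
        return [x for x in items if word in str(x).lower()]
    return [x for x in items if word in str(x)]


# Hypothetical product list mixing strings and numbers
products = ['TAPA 40mm black', 'Guide rail 25N/1660', 'tapa roscada M8', 4711, 2.5]
print(find_items(products, 'TAPA'))                     # ['TAPA 40mm black', 'tapa roscada M8']
print(find_items(products, 'TAPA', ignore_case=False))  # ['TAPA 40mm black']
```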
Paandittya

You can use pandas' built-in string methods to find partial string matches and generate lists:

new = df1.loc[df1['A'].astype(str).str.contains('00320'), 'A'].tolist()

An advantage of pandas' str.contains() is that it accepts regular expressions (regex=True is the default).
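A minimal sketch with a hypothetical DataFrame whose column 'A' mixes strings and numbers. astype(str) turns every value into a string so str.contains() returns a clean boolean mask; case=False makes the match case-insensitive, and regex=False treats the pattern as a literal string rather than a regex.

```python
import pandas as pd

# Hypothetical data standing in for the real 20,000-row product list
df1 = pd.DataFrame({'A': ['00320671-01 Guide rail 25N/1660',
                          'TAPA 40mm black',
                          '00320165S02 - Miniature rolling table',
                          'tapa roscada M8',
                          4711,
                          2.5]})

col = df1['A'].astype(str)  # numbers become '4711' and '2.5'

# Literal substring match anywhere in the string
subs = df1.loc[col.str.contains('00320', regex=False), 'A'].tolist()

# Case-insensitive match for a word such as 'TAPA'
tapas = df1.loc[col.str.contains('tapa', case=False, regex=False), 'A'].tolist()

print(subs)   # ['00320671-01 Guide rail 25N/1660', '00320165S02 - Miniature rolling table']
print(tapas)  # ['TAPA 40mm black', 'tapa roscada M8']
```

Using .loc with the mask and the column label selects the matching values of 'A' in one step, which avoids indexing the resulting Series a second time.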

David
  • Thanks, much obliged. Is there a way to find similar items, like every item that looks like '00320047S01' or '00201179S01'? – Adarsh Bhansali Sep 28 '18 at 08:19
  • Welcome. Similar items are a classic regex case; if you haven't used regex yet, check it out! You could find '00320047S01' or '00201179S01' by using ...str.contains(r'00[0-9]{6}S01', regex=True)... In words: find any item that has '00' followed by six digits ([0-9] matches only digits, {6} means exactly six times), followed by 'S01'. For an introduction, read: https://stackoverflow.com/questions/4736/learning-regular-expressions – David Sep 28 '18 at 08:28
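A sketch of the regex from the comment above, applied to hypothetical item codes (the DataFrame contents are made up for illustration).

```python
import pandas as pd

# Hypothetical item codes; only the first two fit the pattern from the comment
df1 = pd.DataFrame({'A': ['00320047S01 Bracket',
                          '00201179S01 Cover plate',
                          '00320165S02 - Miniature rolling table',
                          '0032004S01 short code',
                          12345]})

# '00', then exactly six digits, then 'S01' (pattern taken from the comment)
pattern = r'00[0-9]{6}S01'

mask = df1['A'].astype(str).str.contains(pattern, regex=True)
similar = df1.loc[mask, 'A'].tolist()
print(similar)  # ['00320047S01 Bracket', '00201179S01 Cover plate']
```

Because the pattern is not anchored, it also matches codes embedded in longer strings (for example '1100320047S01'); prefixing it with ^ restricts matches to codes at the start of the description.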