
I have started using Google's Dataprep to cleanse eCommerce product feeds. Because I receive data from hundreds of eCommerce stores, I want to cleanse it for consistency and standardize the various spellings of brand names. For example, I have a 'Vendor' column with millions of rows in which Adidas is spelt in several different ways:

adidas
Adidas
Adidas classic
Adidas orginals
adidas originals
adidas skateboarding
Adidas Skateboarding

For my purposes, I want to rename all of these to 'adidas'. Looking through Dataprep's transforms, the Replace function could do the job, but it isn't scalable: I would need a separate rule for every variant of every brand.

Is there a way in Dataprep to keep a master file of brand names, look up each value against it, and replace the incorrect instances? In Excel a simple VLOOKUP would do this, so I am wondering whether an equivalent exists in Dataprep.

I hope this makes sense. Thanks in advance to anyone who can help.

Craig

1 Answer


If you have a master file that maps incorrect spellings to a standardized name, the Lookup dialog (in the column menu) may be what you're looking for: https://cloud.google.com/dataprep/docs/html/Lookup-Wizard_57344860
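To illustrate the idea (this is plain Python, not Dataprep's API), a master-file lookup amounts to joining each Vendor value against a two-column mapping table and taking the standardized name where a match exists. The master file layout (columns `variant` and `brand`) and the helper names `load_master` and `standardize` are hypothetical, chosen for this sketch. Keys are lowercased and trimmed, so "Adidas" and "adidas" share a single master row; values with no match are left unchanged, much like a left join that falls back to the original column.

```python
import csv
import io

# Hypothetical master file: one row per known variant -> standardized brand.
# The column names "variant" and "brand" are assumptions for this sketch.
MASTER_CSV = """variant,brand
adidas,adidas
adidas classic,adidas
adidas orginals,adidas
adidas originals,adidas
adidas skateboarding,adidas
"""


def load_master(text):
    """Build a lookup dict keyed on the lowercased, trimmed variant."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["variant"].strip().lower(): row["brand"] for row in reader}


def standardize(vendors, master):
    """Map each vendor to its standardized brand; keep unmatched values as-is."""
    return [master.get(v.strip().lower(), v) for v in vendors]


vendors = ["adidas", "Adidas", "Adidas classic", "Adidas orginals",
           "adidas originals", "adidas skateboarding", "Adidas Skateboarding",
           "Nike"]
cleaned = standardize(vendors, load_master(MASTER_CSV))
print(cleaned)
```

Adding a new brand or misspelling then means adding a row to the master file rather than writing another Replace rule, which is what makes the lookup approach scale.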

Lars Grammel