
In Trifacta or Google Cloud Dataprep, I'm trying to flag rows that contain a non-alphanumeric character (�). What formula should I use? I tried this formula, but it doesn't work:

Replace Matches of `�` from EMPLOYEE_FIRST with NOT VALID
Dan Bracuk
x117342

1 Answer


Can you clarify what you mean by "doesn't work"?

The following step works for me in Dataprep. You can paste it directly into the New Step wizard:

replacepatterns col: EMPLOYEE_FIRST with: 'NOT VALID' on: `�` global: true

If that does not work for you, can you please post a screenshot of what happens after adding (or trying to add) this step?
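For checking the data outside Dataprep, here is a minimal Python sketch of the boolean flag the question asks for. It is not Wrangle syntax. The function name `flag_invalid` and the output key `HAS_INVALID_CHAR` are hypothetical, and the sample rows are made up for illustration; only the `EMPLOYEE_FIRST` column name comes from the question.

```python
# U+FFFD is the Unicode replacement character, displayed as "�".
REPLACEMENT_CHAR = "\ufffd"


def flag_invalid(rows, column="EMPLOYEE_FIRST"):
    """Return copies of the row dicts with a boolean HAS_INVALID_CHAR key added.

    The flag is True when the value in `column` contains U+FFFD.
    Missing or None values are treated as empty strings.
    """
    return [
        {**row, "HAS_INVALID_CHAR": REPLACEMENT_CHAR in (row.get(column) or "")}
        for row in rows
    ]


# Illustrative rows only; not taken from the asker's dataset.
rows = [
    {"EMPLOYEE_FIRST": "BEGO\ufffdA"},
    {"EMPLOYEE_FIRST": "MARIA"},
]
flagged = flag_invalid(rows)
```

Flagging on U+FFFD only finds values that were already damaged at import time, so it helps locate affected rows but cannot recover the original character.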

kuipersn
  • No changes after I tried this step. The same values with "�" appeared in the previewed column. I want to flag the rows whose EMPLOYEE_FIRST column contains "�". I'm also looking for a boolean formula that tags a row as true if "�" exists. – x117342 Apr 28 '20 at 09:11
  • Alright, well I just replicated my initial result on free Wrangler as well. I am using a very trivial CSV dataset. What is the type of your input dataset? How is it encoded? Did you modify encoding during import? – kuipersn Apr 28 '20 at 10:34
  • It's a simple .txt file from a third party, and I'm trying to transform it using Cloud Dataprep. The actual value in the .txt file is "Ñ". When the dataset is imported into Dataprep, the value becomes "�". – x117342 Apr 28 '20 at 12:35
  • I made a text file that only contained instances of Ñ. It came up just fine in Dataprep. However, when I re-imported the file and changed the encoding to US-ASCII, it came up as all � characters. That is why I asked about the encoding, and mentioned changing the encoding during import (see Edit Settings after you specify the file to import). Dataprep imports using UTF-8 by default, but something is interfering. Perhaps you can work around this by playing with the encoding during import. – kuipersn Apr 28 '20 at 12:46
  • I should add: if no headway then consider filing a support ticket with Trifacta (support@trifacta.com). Include a link to this thread for context, as well as the edition of Dataprep (click Settings => About) and if possible your sample dataset. – kuipersn Apr 28 '20 at 12:54
  • I re-uploaded the dataset and set the encoding to US-ASCII. It worked! Thanks for your help! – x117342 Apr 29 '20 at 08:15
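The encoding effect discussed in the comments can be reproduced outside Dataprep. The following is a small Python sketch of how "Ñ" turns into "�" when bytes are decoded with an encoding that does not match the one used to write them. The thread never states the file's actual encoding, so Latin-1 (ISO-8859-1) below is an assumption chosen for illustration, and Dataprep's importer may behave differently from Python's codecs.

```python
original = "Ñ"  # U+00D1

# The same character is stored as different bytes in different encodings.
utf8_bytes = original.encode("utf-8")      # two bytes: 0xC3 0x91
latin1_bytes = original.encode("latin-1")  # one byte: 0xD1 (assumed source encoding)

# Decoding UTF-8 bytes as US-ASCII: neither byte is valid ASCII,
# so each one is replaced with U+FFFD.
as_ascii = utf8_bytes.decode("ascii", errors="replace")

# Decoding Latin-1 bytes as UTF-8: 0xD1 starts a two-byte sequence
# that is never completed, so it is replaced with a single U+FFFD.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in restores the character.
roundtrip = latin1_bytes.decode("latin-1")

print(repr(as_ascii), repr(as_utf8), repr(roundtrip))
```

Once a byte has been replaced with U+FFFD the original value is gone, which is why the fix has to happen in the import settings and not in a later recipe step.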