I'm considering using Cloud DLP to anonymize my data, but I can't find any explicit mention of which languages it supports. AWS Comprehend's detect PII API only supports English, so I'm looking for an alternative.

kylejmcintyre

1 Answer

The detectors reference page lists the available detectors per country:

https://cloud.google.com/dlp/docs/infotypes-reference

For global detectors such as PHONE_NUMBER, there is no information about the supported languages, but you can test support for your language on the demo page:

https://cloud.google.com/dlp/demo/#!/

For example, if you enter the Spanish sentence "Mi teléfono es 600111222" (my phone is 600111222), it detects a PHONE_NUMBER with LIKELY likelihood, but if you enter "Me puedes llamar al 600111222" (you can call me at 600111222), it detects only a GENERIC_ID with LOW likelihood.

Also, if you add the country prefix (+34600111222) to the previous examples, the likelihood in the first one increases to VERY_LIKELY, and the second one is detected as a PHONE_NUMBER with POSSIBLE likelihood.

In summary, it works with other languages and uses the surrounding context to improve its matches, but you should try it on some samples of your own to check the accuracy for your specific use case.
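
If you would rather script these checks than paste sentences into the demo page, something along these lines might work. It is only a rough sketch: it assumes the google-cloud-dlp Python client is installed and credentials are configured, the helper names build_inspect_request and inspect_samples are made up for illustration, and the likelihoods you get back may differ from the ones I saw in the demo.

```python
from typing import Dict, List

# The Spanish sentences from the examples above, with and without the +34 prefix.
SAMPLES = [
    "Mi teléfono es 600111222",
    "Me puedes llamar al 600111222",
    "Mi teléfono es +34600111222",
    "Me puedes llamar al +34600111222",
]


def build_inspect_request(project_id: str, text: str, info_types: List[str]) -> Dict:
    """Build the request dict passed to DlpServiceClient.inspect_content (hypothetical helper)."""
    return {
        "parent": f"projects/{project_id}",
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types],
            # Ask for even low-confidence findings so likelihoods can be compared.
            "min_likelihood": "VERY_UNLIKELY",
            # Include the matched text in each finding.
            "include_quote": True,
        },
        "item": {"value": text},
    }


def inspect_samples(project_id: str, samples: List[str], info_types: List[str]) -> None:
    """Send each sample to Cloud DLP and print the findings (hypothetical helper)."""
    # Imported lazily so the request builder works without the client library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    for text in samples:
        response = client.inspect_content(
            request=build_inspect_request(project_id, text, info_types)
        )
        print(text)
        for finding in response.result.findings:
            print(f"  {finding.info_type.name}: {finding.likelihood.name} ({finding.quote!r})")
```

You would call it with your own project ID, for example inspect_samples("my-project", SAMPLES, ["PHONE_NUMBER", "GENERIC_ID"]), where "my-project" is a placeholder. Replacing SAMPLES with sentences from your real data gives a quick feel for how well detection holds up in your language.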