0

I am trying to combine a regular expression for two cases in a query to ClickHouse DataBase:

  1. Up to symbol "@" => (^[^@]+)
  2. Without domain 'company\' and 'company.com\' => (?:company\.com\\\\\\\\+|company\\\\\\\\+)(.*)

I have two ways that I found using the "regex online" :

  1. ^company\\\\(.*)|^(.*)@|^(?!company\\\\.*).*$
  2. (?<=company\\\\)(.*)|(.*)(?<=@)|^(?!company\\\\.*).*

But, constructions ?! and ?<= are not supported (error : invalid perl operator). I've used command extract, e.g. extract(User, '(?<=company\\\\)(.*)|(.*)(?<=@)|^(?!company\\\\.*).*')

Could you give me some advice to understand how to make this regex suitable way, please.

In brackets - should pass, out of - shouldn't:

  1. company\[ABC-1D2E]
  2. [ABC-DE]@company.com
  3. [123-ABCDE]
  4. [12345]
Karina
  • 1
  • 1
  • could you provide the examples that should pass and shouldn't? – vladimir Apr 01 '20 at 07:25
  • sure) In brackets - should pass, out of brackets - shouldn't 1. company\\\[ABC-1D2E] 2. [ABC-DE]@company.com 3. [123-ABCDE] 4. [12345] – Karina Apr 01 '20 at 07:59

1 Answers1

0

It looks like need to use the several serial calls of replaceRegexpOne:

SELECT str,
    replaceRegexpOne(str, '(.*company\\\\)(.*)', '\\2') preliminary_result_1,
    replaceRegexpOne(preliminary_result_1, '(.*)(@company.com)', '\\1') result
FROM (
    /* test data */
    SELECT arrayJoin([
        'company\\ABC-1D2E',
        'ABC-DE@company.com',
        '123-ABCDE',
        '12345'
        ]) str
)
/* result:
┌─str────────────────┬─preliminary_result_1─┬─result────┐
│ company\ABC-1D2E   │ ABC-1D2E             │ ABC-1D2E  │
│ ABC-DE@company.com │ ABC-DE@company.com   │ ABC-DE    │
│ 123-ABCDE          │ 123-ABCDE            │ 123-ABCDE │
│ 12345              │ 12345                │ 12345     │
└────────────────────┴──────────────────────┴───────────┘
*/
vladimir
  • 8,809
  • 2
  • 23
  • 47
  • Thanks, Vladimir! replaceRegexOne is good function, but the challenge is to use extract and make one complex regex for one field. – Karina Apr 01 '20 at 15:51