2

.* is any character zero or more number of times. I was trying to find words starting with vowel and i used

^[aeiou](*) 

And it gave me all words starting with vowel. Same result is given when i do ^[aeiou].* Now i was looking for words that end with vowels. I did

^(*)[aeiou]$ 

It gave no result but when i did ^.*[aeiou]$ it gave valid results. Please explain the difference in the meaning of both. Thanks

Vorsprung
  • 28,957
  • 4
  • 32
  • 55
  • 2
    Are you sure [`^[aeiou](*)`](https://regex101.com/r/gjZbkh/1) fetched you anything at all? – Wiktor Stribiżew Sep 13 '17 at 11:19
  • Yes it did it gave valid results –  Sep 13 '17 at 11:21
  • Technically, `(*)` is not a valid `regex` or even `regex` fragment. I suppose there are implementations that either do not detect the error or ignore/fix it somehow. – axiac Sep 13 '17 at 11:23
  • You should read some tutorial about regex, it takes just a minute and will help you doing research yourself. – yacc Sep 13 '17 at 11:26
  • I was doing a sql command using regexp in Oracle and the results were like i have mentioned @axiac –  Sep 13 '17 at 11:26
  • Yes i am reading...there is no (*) in documentation but i saw this somewhere on web but the point is it was giving results. Why so @yacc –  Sep 13 '17 at 11:28
  • *"but the point is it was giving results"* -- you cannot rely on the fact that it produces now the results you expect. Being an incorrect `regex`, it may get fixed in a different way on a future version. – axiac Sep 13 '17 at 11:31
  • 1
    @WiktorStribiżew Yes, `*` (or `+`) with no preceding character appears to match zero (or one)-or-more instances of a `NULL` (zero-width) string. `(*)` is a capturing group which matches zero-or-more `NULL` strings. So `^[aeiou](*)` is effectively the same as `^[aeiou]()` or `^[aeiou]`. – MT0 Sep 13 '17 at 13:07
  • 2
    @MT0: The oracle tag was added after my comment. – Wiktor Stribiżew Sep 13 '17 at 13:08
  • Thanks ;) i like it –  Sep 13 '17 at 13:11
  • 1
    @JotWaraich Here you go. It is really a nice one. – Wiktor Stribiżew Sep 13 '17 at 13:11
  • Most of the times i make sense but community hits me bad...@Wiktor No doubt there is great help here... But some downvote without even understanding the need of question –  Sep 13 '17 at 13:12
  • 1
    @JotWaraich you got down votes because your question wasn't originally tagged as an Oracle SQL question. It's just unfortunate that you didn't realise that "regexp" in general is not the same as REGEXP_LIKE – Vorsprung Sep 13 '17 at 15:21

2 Answers2

1

The difference is that (*) is an invalid regexp and .* is valid

* means "zero or more of the previous character or expression"

. is "any character except newline" so .* is "any number of any character"

( and ) are used to delimit a capture group and must match. So the ( cannot be the "previous character"

To use a ( in a regexp use \( to get a literal bracket and match that

In some implementations of regexp (for instance in vim) the ( is assumed to be escaped and the reverse syntax applies so \( becomes the group capture bracket

In this case (*) will match "any number of ( including none, followed by a )"

NB: the above is NOT the case for Oracle REGEXP_LIKE which seems to have it's own, non standard syntax based loosely on the usual PCRE or Unix grep, see answer by @MT0 below

Vorsprung
  • 28,957
  • 4
  • 32
  • 55
  • I agree @Vorsprung and i had the same thought about it but why is it giving results as i mentioned above. There should be some error message or indicator to invalid regex –  Sep 13 '17 at 11:30
  • Technically `(*)` is not *invalid*, it's just being treated as `[(]*[)]`. – yacc Sep 13 '17 at 11:40
  • 1
    `SELECT * FROM DUAL WHERE REGEXP_LIKE( '123', '(*)' )` returns one row - `123` does not match `[(]*[)]`. – MT0 Sep 13 '17 at 11:45
  • googling for "REGEXP_LIKE" syntax shows examples that use ``(*)`` Clearly, Oracle REGEXP_LIKE is not standard regexp – Vorsprung Sep 13 '17 at 11:57
  • Exactly that was what i was trapped in and asked the question @Vorsprung. You can upvote my question –  Sep 13 '17 at 12:40
1

It appears that, in Oracle 11, the * and + patterns (with nothing preceding them) generate zero-width matches. So (*) and (+) are capturing groups () containing the zero-width patterns * or + respectively.

All these queries return a row:

SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '*' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '(*)' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^*' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^(*)' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '*1' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '+' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '(+)' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^+' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^(+)' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '+1' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^+1' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '+1$' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^+1$' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '()' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '12', '1(*)2' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '12', '1(+)2' );

These queries do not return any rows:

SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '*2' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^*$' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', '^+$' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '12', '11(*)2' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '12', '11(+)2' );

Also, matching any pattern against NULL or a string against a NULL pattern will not return any rows:

SELECT * FROM DUAL WHERE REGEXP_LIKE( NULL, '*' );
SELECT * FROM DUAL WHERE REGEXP_LIKE( '1', NULL );

Using REGEXP_SUBSTR:

SELECT REGEXP_SUBSTR( '1', '+' ) FROM DUAL;

Outputs a single row containing NULL.

MT0
  • 86,097
  • 7
  • 42
  • 90
  • What do you mean by zero width matches @MT0 –  Sep 13 '17 at 12:50
  • 1
    The pattern will match zero characters (but will be a match). In the way that `1*` can match zero-or-more `1`s then `*` (without a preceding character) will match zero-or-more zero-width strings and `+` will match one-or-more zero-width strings. – MT0 Sep 13 '17 at 12:51
  • Does that mean anything having width i.e. even one character would be matched?? @MT0 –  Sep 13 '17 at 12:54
  • 1
    `*` and `+` on their own will match zero characters (but it will be a match) in a non-null string. So `REGEXP_LIKE( '12', '1(+)2' )` will match the character `1` then the capturing group will match one-or-more zero-width strings (so effectively a `NULL` string) and finally the `2` character. – MT0 Sep 13 '17 at 12:58