
This question already has a duplicate (Duplicate question), but I'm not able to comment there, so I'm posting a new question.

The solution provided there is very thorough, but I still don't have a clear picture of how preg_match_all() works.

I tried the following code

preg_match_all("/#+([a-zA-Z0-9_]+)/i","#test this is #php test",$matches);

var_dump($matches);

The result is

array(2) {
  [0]=>
  array(2) {
    [0]=>
    string(5) "#test"
    [1]=>
    string(4) "#php"
  }
  [1]=>
  array(2) {
    [0]=>
    string(4) "test"
    [1]=>
    string(3) "php"
  }
}

My understanding is that, as written, my regex should only select strings starting with '#'.

But the result contains each match both with and without the '#'.

Please help me figure this out. What am I missing?
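A minimal sketch that reproduces the call above and pairs each element of `$matches[0]` with the element at the same index in `$matches[1]`. It relies only on `preg_match_all()`'s default `PREG_PATTERN_ORDER` layout, where `$matches[0]` lists the full matches of the whole pattern and `$matches[1]` lists what the first parenthesised group captured; the loop and output format are just for illustration.

```php
<?php
$subject = "#test this is #php test";

// preg_match_all() returns the number of full-pattern matches (2 here).
$count = preg_match_all('/#+([a-zA-Z0-9_]+)/i', $subject, $matches);

// Default PREG_PATTERN_ORDER:
//   $matches[0][$i] is the i-th match of the whole pattern, '#' included;
//   $matches[1][$i] is what group 1, ([a-zA-Z0-9_]+), captured in that match.
for ($i = 0; $i < $count; $i++) {
    echo $matches[0][$i], " => ", $matches[1][$i], PHP_EOL;
}
// Prints:
// #test => test
// #php => php
```

The `#` shows up only under index `0` because it sits outside the parentheses: it is part of the whole match but not of the captured group.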

Braike dp
  • `My understanding is that the regex will only select the string starting with '#' as per my code` - that's because your understanding is wrong. You need to add the start-of-string anchor `^` to the front of it: `/^#+([a-zA-Z0-9_]+)/i`. The caret `^` outside a character class `[...]` means "match the start of the string" (or of each line, with the `m` modifier). Inside a class, as in `[^0-9]`, it means "not". It's confusing, but that is how it works. You can also replace `[a-zA-Z0-9_]` with `\w`, which is the same thing. And the `i` (case-insensitive) flag is pointless here, as you already list both upper- and lowercase letters. That gives us `/^#+(\w+)/` – ArtisticPhoenix Mar 14 '19 at 07:18
  • The first array holds the matches of the whole regexp. The next array holds the matches of each bracketed group (in your case `[a-zA-Z0-9_]+`). – Pavel Třupek Mar 14 '19 at 07:23
  • Conversely, `$` matches the end. I would have posted this as an answer, but I saw the duplicate link. The matches are indexed by the capture groups `(...)`, starting at `1`. Index `0`, just like in `preg_match`, is the "full match". – ArtisticPhoenix Mar 14 '19 at 07:24
  • Take a look at the section on groups in the duplicate. Your `[0]` array elements match the entire regex, i.e. `#+([a-zA-Z0-9_]+)` => `['#test', '#php']`; the `[1]` elements match just the contents of the `()`, i.e. `[a-zA-Z0-9_]+` => `['test','php']` – Nick Mar 14 '19 at 07:24
  • @ArtisticPhoenix The insight on `^` was new information. Thank you for that. – Braike dp Mar 14 '19 at 07:26
  • Sure, regex can be very confusing. I am pretty good at it now after some years of struggling, although lookarounds still get me once in a while. Regex is extremely powerful, so it's worth the work to learn it. A good testing site is https://regex101.com/, mainly because you can save your patterns, and it has built-in documentation and a great regex parser. It doesn't do everything correctly from a PHP standpoint (for example, there is no `preg_match_all`), but you can get close. – ArtisticPhoenix Mar 14 '19 at 07:30
  • @Nick OK, but in that case why did it skip the strings without `#`, such as `this is`? – Braike dp Mar 14 '19 at 07:31
  • It skipped them due to the lack of both the `#` and a `\s` for spaces. What your code does is match any part that starts with a `#` and is followed by `a-z`, `A-Z`, `0-9` or `_`. No spaces. – ArtisticPhoenix Mar 14 '19 at 07:32
  • @Braikedp Because they do not match *the whole regex*. – Nick Mar 14 '19 at 07:32
  • @ArtisticPhoenix @Nick Thank you for the help. That is very informative and cleared up my doubt. – Braike dp Mar 14 '19 at 07:34
  • On https://regex101.com/r/5Sm864/1 you can see your regex in the testing site I mentioned above. The panel on the right gives a full explanation of what it does, which is why I like this tester so much. Note that the `g` flag is more of a JavaScript regex thing than a PHP thing; this is one of those things I meant when I said it's not 100% correct for PHP. – ArtisticPhoenix Mar 14 '19 at 07:35
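The suggestions in the comments can be checked with a small sketch that runs the original pattern and the anchored, simplified `/^#+(\w+)/` side by side on the same subject. It assumes PHP's default PCRE behaviour in the C locale without the `u` modifier, where `\w` matches the same ASCII characters as `[a-zA-Z0-9_]`; the variable names are illustrative.

```php
<?php
$subject = "#test this is #php test";

// Original pattern: one or more '#' followed by word characters, anywhere.
preg_match_all('/#+([a-zA-Z0-9_]+)/i', $subject, $unanchored);

// Suggested pattern: '^' pins the match to the start of the subject,
// \w stands in for [a-zA-Z0-9_], and the redundant 'i' flag is dropped.
preg_match_all('/^#+(\w+)/', $subject, $anchored);

// Neither pattern matches "this is": those words are not preceded by '#',
// and [a-zA-Z0-9_] / \w cannot match the space between them.
print_r($unanchored[1]); // contains "test" and "php"
print_r($anchored[1]);   // contains only "test", the tag at the very start
```

With `^`, only the leading `#test` is found, so the anchor fits when you want a tag at the start of the string; to collect every tag, the original pattern's `$matches[1]` already holds them.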

0 Answers