23

I have this simple pattern that splits a text into sentences:

$text = preg_split("/[\.:!\?]+/", $text);

But I want to keep the delimiter (., :, ! or ?) at the end of each array item.

That is, for "good:news.everyone!" I currently get:

array("good", "news", "everyone", "");

But I want:

array("good:", "news.", "everyone!", "");
Peter Mortensen
skyline26

2 Answers

56

Here you go:

preg_split('/([^.:!?]+[.:!?]+)/', 'good:news.everyone!', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

How it works: The pattern matches each run of text together with the punctuation that follows it, so it effectively turns everything into a delimiter. To include these delimiters in the array, you can use the PREG_SPLIT_DELIM_CAPTURE constant. On its own, that flag returns an array like:

array (
    0 => '',
    1 => 'good:',
    2 => '',
    3 => 'news.',
    4 => '',
    5 => 'everyone!',
    6 => '',
);

To get rid of the empty values, use PREG_SPLIT_NO_EMPTY. To combine two or more of these constants, we use the bitwise | operator. The result:

array (
    0 => 'good:',
    1 => 'news.',
    2 => 'everyone!'
);
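
The effect of each flag can be checked separately. The sketch below is only an illustration: the variable names are made up, and the last input (a string with no closing punctuation) is a hypothetical case that the question does not cover.

```php
<?php
// Pattern from the answer: a run of non-delimiters followed by a run of delimiters.
$pattern = '/([^.:!?]+[.:!?]+)/';
$input   = 'good:news.everyone!';

// No flags: the matched text is discarded, leaving only the empty gaps.
$plain = preg_split($pattern, $input);
// ['', '', '', '']

// PREG_SPLIT_DELIM_CAPTURE inserts the captured group between the gaps.
$captured = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// Adding PREG_SPLIT_NO_EMPTY drops the empty gaps again.
$clean = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// ['good:', 'news.', 'everyone!']

// Hypothetical input: the last fragment has no delimiter, so it never matches
// the pattern and comes back as an ordinary (non-delimiter) piece.
$open = preg_split($pattern, 'good:news.everyone', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// ['good:', 'news.', 'everyone']

var_export($clean);
var_export($open);
```

So trailing text without punctuation is not lost with this pattern; it just arrives without a delimiter attached.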
Elias Van Ootegem
  • And what if I need to split "good:" as a whole word with the : ? And can I also add tags in? So what I need is Good: – user1551496 Mar 05 '16 at 10:18
  • 2
    @user1551496: Then you're dealing with markup. Use a parser instead of regex, because [regex can't handle markup well](http://stackoverflow.com/a/1732454/1230836) – Elias Van Ootegem Mar 05 '16 at 12:40
  • 1
    @NinoŠkopac: `[^.:!?]+` greedily matches one or more characters other than `.:!?`, and the next character class greedily matches one or more of `.:!?`. The pattern wraps both classes in `()`, so together they form one capture group, and the result is that you match everything. Each match ends after a run of `.:!?` and the next match starts right there, so nothing is left between matches; hence the array -> empty, match, empty, match... Using `PREG_SPLIT_DELIM_CAPTURE`, you ensure the matches used as delimiters are in the array, and `PREG_SPLIT_NO_EMPTY` gets rid of the empty bits – Elias Van Ootegem Aug 02 '17 at 15:41
  • 1
    Essentially, every part in the string is a delimiter, and you're splitting out empty strings. You ignore the empty strings and ask `preg_split` to give you the matched delimiters, which is what the OP was after – Elias Van Ootegem Aug 02 '17 at 15:42
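
One way to see that the pattern covers the whole input is to run it through preg_match_all, which returns the matches directly. This is a different function from the one used in the answer and is shown only as an illustration; the variable names are made up.

```php
<?php
$input = 'good:news.everyone!';

// Same pattern as in the answer, minus the capture group:
// preg_match_all already returns each full match in $matches[0].
preg_match_all('/[^.:!?]+[.:!?]+/', $input, $matches);

// Every character of this input belongs to one of the matches,
// so joining the matches reproduces the original string.
$coversEverything = implode('', $matches[0]) === $input;

var_export($matches[0]);       // the three pieces: 'good:', 'news.', 'everyone!'
var_export($coversEverything); // true
```

Because the matches leave no gaps, preg_split has only empty strings to return between them, which is why the two flags are needed.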
9

You don't need PREG_SPLIT_DELIM_CAPTURE if you use a positive lookbehind in your pattern; preg_split() will keep the delimiters.

$text = preg_split('/(?<=[.:!?])/', 'good:news.everyone!', 0, PREG_SPLIT_NO_EMPTY);

A lookbehind only checks that the preceding character is one of the delimiters; it does not consume it. The match itself is zero-width, so preg_split() has nothing to discard and the delimiter stays attached to the piece before it.

The result without the PREG_SPLIT_NO_EMPTY flag:

array (
    0 => 'good:',
    1 => 'news.',
    2 => 'everyone!',
    3 => ''
);

The result with the PREG_SPLIT_NO_EMPTY flag:

array (
    0 => 'good:',
    1 => 'news.',
    2 => 'everyone!'
);

You can test it using this PHP Online Function Tester.
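
One difference from the capture-group approach shows up when several delimiters appear in a row. The sketch below uses a made-up input, `wait...what?!`, and a variant pattern with an added negative lookahead that is not part of the answer above; treat it as one possible adjustment, not the recommended form.

```php
<?php
$input = 'wait...what?!'; // hypothetical input with repeated punctuation

// Pattern from the answer: splits after every single delimiter character.
$each = preg_split('/(?<=[.:!?])/', $input, 0, PREG_SPLIT_NO_EMPTY);
// ['wait.', '.', '.', 'what?', '!']

// Variant (an assumption, not from the answer): only split when the next
// character is not another delimiter, so runs of punctuation stay together.
$runs = preg_split('/(?<=[.:!?])(?![.:!?])/', $input, 0, PREG_SPLIT_NO_EMPTY);
// ['wait...', 'what?!']

var_export($each);
var_export($runs);
```

For the question's input, "good:news.everyone!", both patterns give the same three items, so the variant only matters if the text can contain things like "..." or "?!".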

pmrotule