
I'm using the following code to split my UTF-8 strings into characters:

$characters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

In some cases, a character might be followed by a single quote, for example: hel'lo. I want to keep that quote together with the character before it.

Using the regex above, my array is this:

Array
(
    [0] => h
    [1] => e
    [2] => l
    [3] => '
    [4] => l
    [5] => o
)

And I want the array to be:

Array
(
    [0] => h
    [1] => e
    [2] => l'
    [3] => l
    [4] => o
)

How can I do it? Thanks!

(The single quote can appear at the beginning, in the middle, or at the end of the string.)

Itay Ganor
  • I would not use `preg_split` if you deal with UTF8 strings. I'd recommend `preg_match_all("~\X'?~u", $s, $m)` to get all Unicode chars with an optional `'` after them. Your other cases with an initial `'` are not clear to me, please add details to the question. – Wiktor Stribiżew Jul 09 '17 at 16:35
  • Please show your expected results when splitting `'hello` and `hello'`. With the first sample, there is no character before the single quote -- should the `'` be by itself or bound to the `h`? – mickmackusa Jul 19 '17 at 02:43

2 Answers


Rather than splitting, you can use preg_match_all with the pattern

'?\p{L}'?

i.e. a Unicode letter with an optional ' before it and an optional ' after it:

preg_match_all("/'?\\p{L}'?/u", $str, $matches);
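
As a rough sketch of how this behaves, the call can be wrapped in a small helper. The function name matchLettersWithQuotes is made up for illustration and is not part of the answer. Note that \p{L} matches letters only, so digits, punctuation, and whitespace are left out of the result, and a leading quote ends up attached to the first letter.

```php
<?php
// Hypothetical helper (name chosen for illustration) around the
// preg_match_all call shown above.
function matchLettersWithQuotes(string $str): array
{
    // '?     optional single quote before the letter
    // \p{L}  exactly one Unicode letter
    // '?     optional single quote after the letter
    // /u     treat pattern and subject as UTF-8
    preg_match_all("/'?\\p{L}'?/u", $str, $matches);

    // $matches[0] holds the full matches in order of appearance.
    return $matches[0];
}

print_r(matchLettersWithQuotes("hel'lo"));  // h, e, l', l, o
print_r(matchLettersWithQuotes("'hello"));  // 'h, e, l, l, o
print_r(matchLettersWithQuotes("hello'"));  // h, e, l, l, o'
```

Because the quote before a letter is tried first, a quote between two letters always goes to the letter on its left, which is the behavior asked for in the question.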


anubhava
0

Use a negative lookahead, (?!'), so that the string is not split at a position immediately followed by a single quote:

$characters = preg_split("/(?!')/u", $word, -1, PREG_SPLIT_NO_EMPTY);
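
A minimal sketch of how this split treats the three quote positions from the question. The wrapper name splitKeepingQuote is hypothetical and only there to make the example easy to run.

```php
<?php
// Hypothetical wrapper (name chosen for illustration) around the
// preg_split call shown above.
function splitKeepingQuote(string $word): array
{
    // (?!') is a zero-width negative lookahead: it matches at every
    // position that is NOT immediately followed by a single quote,
    // so the split is skipped right before each quote.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at both ends.
    return preg_split("/(?!')/u", $word, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitKeepingQuote("hel'lo"));  // h, e, l', l, o
print_r(splitKeepingQuote("hello'"));  // h, e, l, l, o'
print_r(splitKeepingQuote("'hello"));  // ', h, e, l, l, o
```

With this approach a leading quote has no preceding character to attach to, so it comes back as an element of its own.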
B. Desai