
How can I split text into sentences on dots while ignoring the dots that appear inside double quotes?

An example document looks like this:

“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.

I want to get output like this:

Array
(
    [0] =>"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.
    [1] =>"On a chess board you are fighting. as we are also fighting the hardships in our daily life," he said.
 )

My code still splits on every dot:

function sample($string)
{
    $data=array();
    $break=explode(".", $string);
    array_push($data, $break);

    print_r($data);
}

I'm still confused about how to split with two delimiters in play, the double quotes and the dot, because the text inside the double quotes contains dots that must not be split on.

Rachmad

3 Answers


A perfect example for (*SKIP)(*FAIL):

“[^“”]+”(*SKIP)(*FAIL)|\.\s*
# “[^“”]+” matches a span enclosed in curly double quotes
# (*SKIP)(*FAIL) throws that match away, so dots inside it are never considered
# \.\s* matches a literal dot, optionally followed by whitespace


In PHP:
$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~u'; // u: treat pattern and subject as UTF-8
$parts = preg_split($regex, $your_string_here);

This yields

Array
(
    [0] => “Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen
    [1] => “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”
)

See a demo on regex101.com as well as a demo on ideone.com.
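
As a rough sketch, here is the same pattern applied to the sample text from the question, which ends in `” he said.` rather than at the closing quote. The `u` modifier, the `PREG_SPLIT_NO_EMPTY` flag, and the variable names are additions for illustration and are not part of the answer above. Because the split consumes the dot, each part comes back without its terminating dot.

```php
<?php
// Sample text from the question (it ends with a dot after "he said").
$text = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. '
      . '“On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

// Quoted spans are matched first and discarded via (*SKIP)(*FAIL),
// so only dots outside the curly quotes act as split points.
$regex = '~“[^“”]+”(*SKIP)(*FAIL)|\.\s*~u';

// PREG_SPLIT_NO_EMPTY drops the empty string left after the final dot.
$parts = preg_split($regex, $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

The result has two elements: the first ends in `taking my queen` and the second ends in `he said`, with the dot after `fighting` left untouched inside the quotes. If the trailing dots are needed, they would have to be appended again afterwards.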

Jan
  • Awesome, thanks for your solution, it's very helpful. @Jan – Rachmad May 20 '17 at 06:43
  • Can you tell me the meaning of the character `~` in your regex syntax? I'm trying to learn regex but I couldn't find the character `~` in any regex reference. Or can you point me to a reference for learning regex characters? Thanks. – Rachmad May 26 '17 at 10:30
  • @Rachmad: These are delimiters, such as `/` or `#`, and they are needed on both sides of the regex string. – Jan May 26 '17 at 19:49
  • Oh, so if I change `~` to `/`, it's no problem? @Jan – Rachmad May 26 '17 at 20:44
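
To illustrate the delimiter point from the comments: a PCRE pattern in PHP needs a delimiter on both sides, and `~`, `/`, and `#` are interchangeable as long as the same character opens and closes the pattern. A small sketch, using a simplified dot-splitting pattern and a made-up input string:

```php
<?php
// Made-up input, only for demonstrating delimiters.
$text = 'One. Two. Three';

// The same pattern wrapped in three different delimiters.
$withTilde = preg_split('~\.\s*~', $text);
$withSlash = preg_split('/\.\s*/', $text);
$withHash  = preg_split('#\.\s*#', $text);

// All three calls produce the same array.
var_dump($withTilde === $withSlash && $withSlash === $withHash); // bool(true)
print_r($withTilde);
```

The chosen delimiter has to be escaped wherever it occurs inside the pattern, which is one reason `~` is convenient for patterns that contain `/`.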

Here is a simpler pattern for preg_split(), followed by preg_replace() to convert the left and right curly double quotes to straight ones (Demo):

$in='“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.” he said.';

$out=preg_split('/ (?=“)/',$in,-1,PREG_SPLIT_NO_EMPTY);  // -1 = no limit (passing null here is deprecated as of PHP 8.1)

$find='/[“”]/u';  // unicode flag is essential
$replace='"';
$out=preg_replace($find,$replace,$out);  // replace curly quotes with standard double quotes

var_export($out);

Output:

array (
  0 => '"Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen.',
  1 => '"On a chess board you are fighting. as we are also fighting the hardships in our daily life." he said.',
)

preg_split() splits on each space that is followed by a “ (LEFT DOUBLE QUOTATION MARK); the lookahead keeps the quote itself in the next element.

The preg_replace() step requires a pattern with the u modifier to make sure the left and right double quotes in the character class are identified. Using '/“|”/' means you can remove the u modifier, but it doubles the steps that the regex engine has to perform (for this case, my character class uses just 189 steps versus the piped characters using 372 steps).
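
A small sketch of why the `u` modifier matters for the character class, using a made-up three-character string. It assumes the source file is saved as UTF-8, where each curly quote is a three-byte sequence:

```php
<?php
// Made-up input: one letter wrapped in curly double quotes.
$quoted = '“a”';

// Without /u the class holds the individual UTF-8 bytes of both quotes,
// so every byte of each 3-byte curly quote is replaced separately.
$withoutU = preg_replace('/[“”]/', '"', $quoted);

// With /u the class holds two code points, and each quote is replaced once.
$withU = preg_replace('/[“”]/u', '"', $quoted);

// Alternation matches whole byte sequences, so it works without /u.
$alternation = preg_replace('/“|”/', '"', $quoted);

var_dump($withoutU === '"""a"""');  // bool(true): three quotes per curly quote
var_dump($withU === '"a"');         // bool(true)
var_dump($alternation === '"a"');   // bool(true)
```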

Regarding the choice between preg_split() and preg_match_all(): preg_split() fits because the objective is merely to split the string on each space that is followed by a left double quote. preg_match_all() would be the more practical choice if the objective were to omit substrings not neighboring the delimiting space character.

Despite my logic, if you still want to use preg_match_all(), my preg_split() line can be replaced with:

$out=preg_match_all('/“.+?(?= “|$)/',$in,$out)?$out[0]:null;
mickmackusa

Alternatively:

regex101 (16 steps)

“.[^”]+”(?:.[^“]+)?

  • “.[^”]+” matches everything between “ and ”.
  • (?:.[^“]+)? optionally matches (hence the trailing ?) everything after the closing quote that is not an opening “; ?: makes the group non-capturing.

PHP (PHPfiddle, hit "Run-F9"; updated to replace “ and ” with "):

<?php
$str = '“Chess helps us overcome difficulties and sufferings,” said Unnikrishnan, taking my queen. “On a chess board you are fighting. as we are also fighting the hardships in our daily life.”';

// Match each curly-quoted span plus any trailing text up to the next opening quote.
if (preg_match_all('/“.[^”]+”(?:.[^“]+)?/u', $str, $matches)) {
    echo '<pre>';
    // Replace the curly quotes with straight double quotes.
    print_r(preg_replace('/[“”]/u', '"', $matches[0]));
    echo '</pre>';
}
?>

output:

Array
(
    [0] => "Chess helps us overcome difficulties and sufferings," said Unnikrishnan, taking my queen. 
    [1] => "On a chess board you are fighting. as we are also fighting the hardships in our daily life."
)
Mi-Creativity