
I would like to strip the cantillation marks from Hebrew strings while keeping the nikkud (vowel points). I found this JavaScript code. How do I do the same thing in PHP?

function stripCantillation(str){
    return str.replace(/[\u0591-\u05AF]/g,"").replace("׀", "").replace("׃","").replace("־","");
}

Hebrew text with Cantillation

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ

Hebrew text without Cantillation

בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָרֶץ

mickmackusa

1 Answer

This is the PHP-friendly regex pattern that covers all of the Unicode characters targeted by your JavaScript:

/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]/u

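To express these Unicode characters in PHP's PCRE syntax, each four-digit hexadecimal code point is wrapped in curly braces {} and prefixed with \x. The u modifier must follow the closing delimiter so that the pattern and the subject are treated as UTF-8. The character class (between the square brackets []) begins with a range of characters (U+0591 to U+05AF, the cantillation marks), followed by three individual characters: U+05BE (maqaf), U+05C0 (paseq), and U+05C3 (sof pasuq).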

The following snippet runs the pattern in PHP and prints output that depends on whether any replacements were actually made. If you don't need to count the replacements, you can simply assign the return value of preg_replace() back to the input variable and omit the 4th and 5th arguments ($limit and $count).

Code:

$inputs = [
    'בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָֽרֶץ',
    'בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ',
];
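foreach ($inputs as $input) {
    // -1 means "no limit"; $count receives the number of replacements made.
    // The + quantifier removes each run of adjacent marks in one replacement,
    // so $count is the number of runs removed.
    $output = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]+/u', '', $input, -1, $count);
    echo !$count ? "no change" : "Replacement Count: {$count}\nBefore: {$input}\n After: {$output}";
    echo "\n---\n";
}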

Output:

no change
---
Replacement Count: 6
Before: בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ
 After: בְּרֵאשִׁית בָּרָא אֱלֹהִים אֵת הַשָּׁמַיִם וְאֵת הָאָֽרֶץ
---
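
If the replacement count is not needed, the same pattern can be wrapped in a small function that mirrors the JavaScript in the question. This is only a minimal sketch: the function name is borrowed from the question, the null fallback is my own assumption about how a failure should be handled, and the sample word is written with \u{...} escapes, which require PHP 7 or later.

```php
<?php
// Sketch only: the name stripCantillation() comes from the question's JavaScript.
function stripCantillation(string $text): string
{
    $result = preg_replace('/[\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]+/u', '', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8 with the u modifier).
    // Returning the input unchanged in that case is an assumption, not a requirement.
    return $result ?? $text;
}

// One word from the verse above, spelled with escapes; U+0591 is the cantillation mark.
$word = "\u{05D0}\u{05B1}\u{05DC}\u{05B9}\u{05D4}\u{05B4}\u{0591}\u{05D9}\u{05DD}";
echo stripCantillation($word), "\n";
```

Like the JavaScript original, this deletes the maqaf (U+05BE) outright, so words joined by it will run together; replace it with a space in a separate step if that is not what you want.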

The characters that will be replaced can be found in the Unicode Hebrew code chart (image source: http://unicode.org/charts/PDF/U0590.pdf).
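
As a rough substitute for the chart, the code points covered by the character class can be listed directly. This is just an illustrative sketch; the variable name is made up, and it prints code points only, not character names.

```php
<?php
// Code points matched by [\x{0591}-\x{05AF}\x{05BE}\x{05C0}\x{05C3}]:
// the range U+0591..U+05AF plus the three individual characters.
$codePoints = array_merge(range(0x0591, 0x05AF), [0x05BE, 0x05C0, 0x05C3]);

foreach ($codePoints as $codePoint) {
    printf("U+%04X\n", $codePoint);
}
```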

mickmackusa
Thank you SO MUCH! This worked perfectly and you gave me all the resources I need to learn about regex and Hebrew Unicode! I'm going to study them more. Thank you! – Regina Hong Jun 11 '17 at 20:18