I get different results on Regex101 and on my server. Could anyone tell me why?

RegExp:

[0-9]+(?:\s){0,10}(?:\r?\n?)([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3}) --> ([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3})(?:\s){0,10}(?:\r\n|\n|\r){1}(.*\r?\n?.*\r?\n?.*)(?:\n|\r)(?:\n|\r)

On Regex101 I use the 'gm' modifiers.

In PHP I use:

preg_match_all($this->Pattern, $txt, $matches, PREG_SET_ORDER);

Regex101 result (look at match 4, which is correct: the pattern captures only the empty line, without any timestamp line):

MATCH 1
1.  [2-4]   `00`
2.  [5-7]   `00`
3.  [8-10]  `01`
4.  [11-14] `163`
5.  [19-21] `00`
6.  [22-24] `00`
7.  [25-27] `05`
8.  [28-31] `150`
9.  [32-39] `aaaaaaa`
MATCH 2
1.  [43-45] `00`
2.  [46-48] `00`
3.  [49-51] `05`
4.  [52-55] `556`
5.  [60-62] `00`
6.  [63-65] `00`
7.  [66-68] `05`
8.  [69-72] `921`
9.  [73-82] `bbbb
bbbb`
MATCH 3
1.  [86-88] `00`
2.  [89-91] `00`
3.  [92-94] `07`
4.  [95-98] `753`
5.  [103-105]   `00`
6.  [106-108]   `00`
7.  [109-111]   `08`
8.  [112-115]   `168`
9.  [116-130]   `cccccccccccccc`
MATCH 4
1.  [134-136]   `00`
2.  [137-139]   `00`
3.  [140-142]   `22`
4.  [143-146]   `854`
5.  [151-153]   `00`
6.  [154-156]   `00`
7.  [157-159]   `28`
8.  [160-163]   `721`
9.  [164-164]   ``
MATCH 5
1.  [168-170]   `00`
2.  [171-173]   `00`
3.  [174-176]   `23`
4.  [177-180]   `336`
5.  [185-187]   `00`
6.  [188-190]   `00`
7.  [191-193]   `31`
8.  [194-197]   `558`
9.  [198-228]   `dddddddddddddd
dddddddddddddd
`
MATCH 6
1.  [232-234]   `00`
2.  [235-237]   `00`
3.  [238-240]   `34`
4.  [241-244]   `228`
5.  [249-251]   `00`
6.  [252-254]   `00`
7.  [255-257]   `36`
8.  [258-261]   `296`
9.  [262-276]   `eeeeeeeeeeeeee`
MATCH 7
1.  [280-282]   `00`
2.  [283-285]   `00`
3.  [286-288]   `35`
4.  [289-292]   `165`
5.  [297-299]   `00`
6.  [300-302]   `00`
7.  [303-305]   `39`
8.  [306-309]   `785`
9.  [310-320]   `fffff
ffff`

My server's result (look at "[3] => Array": the text group swallows the next cue number and its timestamp line):

(
    [0] => Array
        (
            [0] => 1
00:00:01,163 --> 00:00:05,150
aaaaaaa

2

            [1] => 00
            [2] => 00
            [3] => 01
            [4] => 163
            [5] => 00
            [6] => 00
            [7] => 05
            [8] => 150
            [9] => aaaaaaa

2
        )

    [1] => Array
        (
            [0] => 00:00:05,556 --> 00:00:05,921
bbbb
bbbb


            [1] => 0
            [2] => 00
            [3] => 05
            [4] => 556
            [5] => 00
            [6] => 00
            [7] => 05
            [8] => 921
            [9] => bbbb
bbbb

        )

    [2] => Array
        (
            [0] => 3
00:00:07,753 --> 00:00:08,168
cccccccccccccc

4

            [1] => 00
            [2] => 00
            [3] => 07
            [4] => 753
            [5] => 00
            [6] => 00
            [7] => 08
            [8] => 168
            [9] => cccccccccccccc

4
        )

    [3] => Array
        (
            [0] => 00:00:22,854 --> 00:00:28,721


5
00:00:23,336 --> 00:00:31,558
dddddddddddddd

            [1] => 0
            [2] => 00
            [3] => 22
            [4] => 854
            [5] => 00
            [6] => 00
            [7] => 28
            [8] => 721
            [9] => 5
00:00:23,336 --> 00:00:31,558
dddddddddddddd
        )

    [4] => Array
        (
            [0] => 6
00:00:34,228 --> 00:00:36,296
eeeeeeeeeeeeee

7

            [1] => 00
            [2] => 00
            [3] => 34
            [4] => 228
            [5] => 00
            [6] => 00
            [7] => 36
            [8] => 296
            [9] => eeeeeeeeeeeeee

7
        )

    [5] => Array
        (
            [0] => 00:00:35,165 --> 00:00:39,785
fffff
ffff


            [1] => 0
            [2] => 00
            [3] => 35
            [4] => 165
            [5] => 00
            [6] => 00
            [7] => 39
            [8] => 785
            [9] => fffff
ffff

        )

)

Test String:

1
00:00:01,163 --> 00:00:05,150
aaaaaaa

2
00:00:05,556 --> 00:00:05,921
bbbb
bbbb

3
00:00:07,753 --> 00:00:08,168
cccccccccccccc

4
00:00:22,854 --> 00:00:28,721


5
00:00:23,336 --> 00:00:31,558
dddddddddddddd
dddddddddddddd


6
00:00:34,228 --> 00:00:36,296
eeeeeeeeeeeeee

7
00:00:35,165 --> 00:00:39,785
fffff
ffff
Bermar
  • What's the value of `$this->Pattern` ? – Thomas Ayoub Sep 15 '16 at 13:09
  • value is "/[0-9]+(?:\s){0,10}(?:\r?\n?)([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3}) --> ([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3})(?:\s){0,10}(?:\r\n|\n|\r)(.*\r?\n?.*\r?\n?.*)(?:\n|\r)(?:\n|\r)/m" – Bermar Sep 15 '16 at 13:13
  • I [cannot repro](https://ideone.com/MyM30a). The results are in line with [the regex101 demo](https://regex101.com/r/kH5iP1/1). – Wiktor Stribiżew Sep 15 '16 at 13:13
  • Your approach seems a bit ugly. I think you are trying to parse an .srt file, probably to extract information. What information do you want to extract? Why do you need to extract the timecode in separate parts? Wouldn't it be better to work line by line and collect the dialogue until the next line with a number? – Casimir et Hippolyte Sep 15 '16 at 13:21

1 Answer

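This happens because the line break styles differ: regex101 uses \n, while your input uses \r\n. With \r\n input, the trailing (?:\n|\r)(?:\n|\r) is satisfied by a single \r\n pair instead of an empty line, and . also matches \r, so the text group can run on into the next cue.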
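A minimal sketch of the cause, assuming PHP's default PCRE settings (where `.` stops only at `\n` and therefore still matches `\r`). It uses only the tail of the question's pattern; the sample strings and variable names are illustrative, not taken from the actual file:

```php
<?php
// Tail of the question's pattern: up to three text lines, then "two line breaks".
$tail = '/(.*\r?\n?.*\r?\n?.*)(?:\n|\r)(?:\n|\r)/';

// The same cue text with Unix and with Windows line breaks (hypothetical samples).
$lf   = "aaaaaaa\n\n2\n00:00:05,556";
$crlf = "aaaaaaa\r\n\r\n2\r\n00:00:05,556";

preg_match($tail, $lf, $m);
$lfText = $m[1];

preg_match($tail, $crlf, $m);
$crlfText = $m[1];

// json_encode makes the invisible \r and \n characters visible.
echo json_encode($lfText), "\n";    // "aaaaaaa"
echo json_encode($crlfText), "\n";  // "aaaaaaa\r\n\r\n2"
```

With \r\n input the final two alternations consume the `\r\n` after the next cue number, which is why that number ends up inside the text group.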

You can easily solve this by using the \R shorthand, which matches any kind of line break.

Note that I did not optimize your pattern; I am only showing how to solve the problem stated in the question:

'~[0-9]+\s{0,10}\R?([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3}) --> ([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3})\s{0,10}\R(.*\R?.*\R?.*)\R{2}~'

See the PHP demo
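As a rough illustration (not the linked demo itself), the pattern above can be run against a small \r\n sample like this. The two-cue input string and the variable names are assumptions made for the sketch. Because `.` may still match `\r` under PHP's default settings, the captured text can carry a trailing `\r`, so the sketch trims it:

```php
<?php
$pattern = '~[0-9]+\s{0,10}\R?([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3})'
         . ' --> ([0-9]{1,2}):([0-9]{1,2}):([0-9]{1,2}),([0-9]{1,3})'
         . '\s{0,10}\R(.*\R?.*\R?.*)\R{2}~';

// Hypothetical two-cue sample with Windows (\r\n) line breaks.
$txt = "1\r\n00:00:01,163 --> 00:00:05,150\r\naaaaaaa\r\n\r\n"
     . "2\r\n00:00:05,556 --> 00:00:05,921\r\nbbbb\r\nbbbb\r\n\r\n";

preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

// Collect the subtitle text of each cue, without stray line-break characters.
$texts = [];
foreach ($matches as $match) {
    $texts[] = trim($match[9]);
}

echo count($matches), "\n";          // 2
echo json_encode($texts), "\n";      // ["aaaaaaa","bbbb\r\nbbbb"]
```

Each cue now produces its own match, and the first capture group of the second cue is the hour field again rather than a leftover digit.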

Wiktor Stribiżew
  • Thanks, the problem was \r\n. My file has Windows line endings (hex: 0d0a), which is why my regex didn't work on the server. – Bermar Sep 15 '16 at 14:22
  • Yes, that is why you should be using either `(?:\r?\n|\r)` instead of all your attempted linebreaks, or - best - use a simple `\R` shorthand that PCRE regex flavor supports well. – Wiktor Stribiżew Sep 15 '16 at 14:33
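
To illustrate the last comment, here is a small sketch comparing the explicit `(?:\r?\n|\r)` alternation with the `\R` shorthand on a string that mixes all three line break styles. The sample string and variable names are made up for the example:

```php
<?php
// Hypothetical input mixing Windows (\r\n), Unix (\n) and old Mac (\r) line breaks.
$mixed = "a\r\nb\nc\rd";

// Explicit alternation: \r\n or \n first, then a lone \r.
$explicit = preg_split('/(?:\r?\n|\r)/', $mixed);

// PCRE shorthand for any line break sequence.
$shorthand = preg_split('/\R/', $mixed);

echo json_encode($explicit), "\n";   // ["a","b","c","d"]
echo json_encode($shorthand), "\n";  // ["a","b","c","d"]
```

Both forms treat `\r\n` as a single line break, which the question's `(?:\n|\r)(?:\n|\r)` does not.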