1

How to replace part of a string while keeping year numbers (e.g. 2019 or 2019-2020) before the first slash occurrence, using regex

//something is wrong here
preg_replace('/^[a-z0-9\-]+(-20[0-9]{2}(-20[0-9]{2})?)?/', '$1', $input_lines);

Needed:

abc-def/something/else/ [incl. the slash if there is no character before it]

abc-def-2019/something/else/

abc-def-2019-2020/something/else/

abc-def-125-2019/something/else/

alexso
    That is usually done like this: (?m)^(?:(?!20(?:19|20))[a-z0-9\-])+/? (https://regex101.com/r/j9qA27/1). Expanded: https://regex101.com/r/5JT2Sk/1 –  Oct 27 '19 at 22:33

2 Answers

1

My initial closure was insufficient to handle all requirements. Yes, you have a greedy quantifier problem, but there is more to handle.

Code:

$tests = [
    'abc-def/something/else/',
    'abc-def-2019/something/else/',
    'abc-def-2019-2020/something/else/',
    'abc-def-125-2019/something/else/'
];

var_export(
    preg_replace('~^(?:[a-z\d]+-?)*?(?:/|(?=20\d{2}-?){1,2})~', '', $tests)
);

Output:

array (
  0 => 'something/else/',
  1 => '2019/something/else/',
  2 => '2019-2020/something/else/',
  3 => '2019/something/else/',
)

My pattern matches alphanumeric sequences, each optionally followed by a hyphen. This subpattern may be repeated zero or more times, and it is non-greedy (lazy), so it "gives back" characters whenever possible.

That first non-capturing group must then be followed either by a slash (which is matched) or by one of your year substrings, which may also have a trailing hyphen (this is not matched, only checked via a lookahead).

If this doesn't suit your real project data, you will need to provide more samples, and more accurate ones, that reveal the fringe cases to test against.
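
To make the greedy-quantifier problem concrete, here is a small sketch that runs the pattern from the question and the pattern from this answer against one of the sample strings. The variable names are only illustrative.

```php
<?php
$input = 'abc-def-2019/something/else/';

// Pattern from the question: the greedy [a-z0-9\-]+ also consumes "-2019",
// so the optional year group matches nothing and $1 is empty.
$greedy = preg_replace('/^[a-z0-9\-]+(-20[0-9]{2}(-20[0-9]{2})?)?/', '$1', $input);

// Pattern from this answer: the lazy *? stops as soon as a slash or a year follows.
$lazy = preg_replace('~^(?:[a-z\d]+-?)*?(?:/|(?=20\d{2}-?){1,2})~', '', $input);

echo $greedy, PHP_EOL; // /something/else/
echo $lazy, PHP_EOL;   // 2019/something/else/
```

With the greedy pattern the year is lost because nothing forces the character class to stop before it; the lazy version only extends the match until the slash or the year lookahead succeeds.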

mickmackusa
0

If the forward slash has to be present and the match should stop before the first occurrence of 2019 or 2020, you might use:

^(?=[a-z\d-]*/)[a-zA-Z013-9-]+(?>2(?!0(?:19|20)(?!\d))|[a-zA-Z013-9-]+)*/?

In separate parts that would look like

  • ^ Start of string
  • (?=[a-z\d-]*/) Assert that a / is present
  • [a-zA-Z013-9-]+ Match 1+ times any of the listed (Note that the 2 is not listed)
  • (?> Atomic group
    • 2(?!0(?:19|20)(?!\d)) Match 2 and assert what is on the right is not 019 or 020
    • | Or
    • [a-zA-Z013-9-]+ Match 1+ times any of the listed
  • )* Close group and repeat 0+ times
  • /? Match optional /
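
The breakdown above can also be written directly into the pattern using PCRE's free-spacing mode (the `x` flag), where whitespace and `#` comments inside the pattern are ignored. This is only a sketch of the same pattern in a different layout; the variable names are illustrative.

```php
<?php
// Same pattern as above, laid out with the x (extended) flag.
$pattern = '~
    ^                          # start of string
    (?=[a-z\d-]*/)             # assert that a slash is present further on
    [a-zA-Z013-9-]+            # 1+ allowed characters, the 2 is not listed
    (?>                        # atomic group
        2(?!0(?:19|20)(?!\d))  # a 2 that does not begin 2019 or 2020
      |                        # or
        [a-zA-Z013-9-]+        # 1+ allowed characters
    )*                         # close group and repeat 0+ times
    /?                         # optional slash
~x';

$result = preg_replace($pattern, '', 'abc-def-125-2019/something/else/');
echo $result, PHP_EOL; // 2019/something/else/
```

Note that the `125` is consumed even though it contains a 2, because that 2 is not followed by `019` or `020`.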


Your code might look like

preg_replace('~^(?=[a-z\d-]*/)[a-zA-Z013-9-]+(?>2(?!0(?:19|20)(?!\d))|[a-zA-Z013-9-]+)*/?~', '', $input_lines);
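
As a quick check, the pattern can be run against the four sample strings from the question. The fifth input is a hypothetical one, added here only to show that a year other than 2019 or 2020 is not treated as a stopping point.

```php
<?php
$pattern = '~^(?=[a-z\d-]*/)[a-zA-Z013-9-]+(?>2(?!0(?:19|20)(?!\d))|[a-zA-Z013-9-]+)*/?~';

$tests = [
    'abc-def/something/else/',
    'abc-def-2019/something/else/',
    'abc-def-2019-2020/something/else/',
    'abc-def-125-2019/something/else/',
    'abc-def-2021/something/else/', // hypothetical input, not from the question
];

// preg_replace accepts an array subject and returns an array of results.
$results = preg_replace($pattern, '', $tests);
var_export($results);
```

The first four results keep the year (or nothing, when no year is present) in front of `something/else/`. The last one comes back as `something/else/`, because only 2019 and 2020 are hard-coded in the lookahead; a broader check such as `20\d{2}` would be needed if other years can occur.
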
The fourth bird