this works fine until there are more than 1 pdf links in one table cell
Quantifiers are greedy by default: the regex engine consumes as much as it can while attempting a match. To reverse this behaviour, use a lazy quantifier, as explained in this post: Greedy vs. Reluctant vs. Possessive Quantifiers. Adding an extra ? after a quantifier makes it match with as little as it can consume. To make your greedy construct lazy, use [^\s]+?.
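As a rough illustration of the difference, here is a minimal sketch. The input string is made up for the example (two PDF paths with no whitespace between them), so treat it as an assumption rather than your real data:

```php
<?php
// Hypothetical input: two PDF paths with no whitespace between them.
$cell = '/docs/a.pdf;/docs/b.pdf';

// Greedy: [^\s]+ takes the whole run, then backtracks only to the LAST ".pdf".
preg_match_all('@/[^\s]+\.pdf@', $cell, $greedy);

// Lazy: [^\s]+? expands one character at a time and stops at the FIRST ".pdf".
preg_match_all('@/[^\s]+?\.pdf@', $cell, $lazy);

print_r($greedy[0]); // one match spanning both paths
print_r($lazy[0]);   // two matches, one per path
```

The greedy version swallows both paths as a single match, which is the problem you hit with more than one link per cell.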
some containing string http://www.example.com/subfolder/name.pdf
or /subfolder/name.pdf
But how to replace the pattern with another pattern?
As you can see, "http://www.example.com" is optional. You can make part of your pattern optional with a non-capturing group, (?:group), followed by a ? quantifier.
Pattern with an optional group:
(?:http://www\.example\.com)?/(\S+?)\.pdf
- Don't forget to escape the dots, as they have a special meaning in regex.
- Notice I used \S (capital "S") instead of [^\s] (they are exactly equivalent).
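To check that the optional group behaves as described, a small sketch like the following can help. The two inputs are the ones from your question; the variable names are just for illustration:

```php
<?php
$pattern = '@(?:http://www\.example\.com)?/(\S+?)\.pdf@';

// Both forms from the question: with and without the host prefix.
$inputs = [
    'http://www.example.com/subfolder/name.pdf',
    '/subfolder/name.pdf',
];

$captured = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $m)) {
        // $m[0] is the whole match, $m[1] the path captured by (\S+?).
        $captured[] = $m[1];
        echo $m[0], ' => ', $m[1], "\n";
    }
}
```

Because the host sits in a non-capturing group, the path is always group 1, whether or not the host is present.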
One more thing: consider adding boundaries to your pattern. I suggest using (?<!\w) (not preceded by a word character) and \b (a word boundary) to avoid matching part of another word (as I commented on your question).
Regex:
(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b
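To see what the boundaries change, here is a hedged comparison of the pattern with and without them. The last two inputs are invented edge cases, not taken from your data:

```php
<?php
$loose  = '@(?:http://www\.example\.com)?/(\S+?)\.pdf@';
$strict = '@(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b@';

// Hypothetical inputs chosen to exercise each boundary.
$inputs = [
    '/subfolder/name.pdf',   // plain path: both patterns match
    'and/or/name.pdf',       // "/" preceded by a word character
    '/subfolder/name.pdfx',  // ".pdf" followed by a word character
];

foreach ($inputs as $input) {
    // preg_match returns 1 on a match and 0 otherwise.
    printf("%-22s loose=%d strict=%d\n", $input,
        preg_match($loose, $input), preg_match($strict, $input));
}
```

Only the first input should match the bounded pattern, while the unbounded one matches all three.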
Code:
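// "@" delimiters avoid escaping the slashes; "i" makes the match case-insensitive.
$re = "@(?<!\\w)(?:http://www\\.example\\.com)?/(\\S+?)\\.pdf\\b@i";
$str = "some containing string http://www.example.com/subfolder/name.pdf
or /subfolder/name.pdf
<a href=\"/pdf/subdir/name.pdf\">clickhere</a>
<a href=\"/pdf/subdir/name.pdf\">2nd PDF</a>";
// Single quotes keep $1 literal, so preg_replace reads it as a backreference to group 1.
$subst = '/wp-content/uploads/old/$1.pdf';
$result = preg_replace($re, $subst, $str);
echo $result;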
Test in regex101