Explode and/or regex text to HTML link in PHP

Question

I have a database of texts that contain this kind of syntax in the middle of English sentences, which I need to turn into HTML links using PHP:

"text1(text1)":http://www.example.com/mypage

Notes:

text1 is always identical to the text in parentheses.
The whole string always has the quotation marks, parentheses, and colon, so the syntax is the same for each.
Sometimes there is a space at the end of the string, but other times there is a question mark, comma, or other punctuation mark.
I need to turn these into basic links, like:

<a href="http://www.example.com/mypage">text1</a>

How do I do this? Do I need explode or regex or both?

Is there always a space or something else after the URL? Can text1 contain parentheses or escaped quotes? — Casimir et Hippolyte, Aug 31 '14 at 15:25
text1 doesn't contain any punctuation marks. Sometimes there is a space at the end of the URL, but other times there is a question mark, comma, or other punctuation mark. — break68, Aug 31 '14 at 15:45
In the middle of English sentences: where is an example sentence? A URL can't be parsed with a simple regex. Other than that, the delimiter looks like `"()":`; would this conflict with other parts of the sentence? — , Aug 31 '14 at 16:18

vks · Answer 1 · 2014-08-31T16:24:20.333

1

"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)

You can use this.

See Demo.

http://regex101.com/r/zF6xM2/2
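
For reference, a minimal sketch of how this pattern might be applied from PHP with preg_replace. The sample sentence and the variable names are assumptions for illustration, not part of the answer, and the ~ delimiter is just one choice that avoids clashing with the slashes in URLs.

```php
<?php
// Hypothetical sample sentence; only the link syntax comes from the question.
$text = 'Go to "text1(text1)":http://www.example.com/mypage, ok.';

// The pattern from this answer, wrapped in ~ delimiters.
$pattern = '~"(.*?)\(\1\)":(.*\/[a-zA-Z0-9]+)(?=\?|\,|\.|$)~';

// In the replacement, \1 is the link text and \2 is the URL.
$result = preg_replace($pattern, '<a href="\2">\1</a>', $text);

echo $result, PHP_EOL;
// Go to <a href="http://www.example.com/mypage">text1</a>, ok.
```

Because the .* in the second group is greedy, a later "/word" followed by punctuation on the same line would be pulled into the URL, so this is probably best suited to lines that contain a single link.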

edited Aug 31 '14 at 16:24

answered Aug 31 '14 at 15:34

+1 for using a simple solution (a backreference) however `but other times there is a question mark or comma or other punctuation mark` - could do with `[\?\.\,]?` or similar stuck outside the last capturing group, otherwise this trailing punctuation will be in the url. And: `text1 doesn't contain any punctuation mark`, the first group can be more restrictive. – AD7six Aug 31 '14 at 16:16

score 0 · Answer 2 · answered Aug 31 '14 at 15:32

You can use this regex:

"(.*?)\(.*?:(.*)

Working demo

[screenshot of the regex demo]
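
A minimal sketch of using this pattern from PHP with preg_match, run against the exact sample string from the question. The variable names and the ~ delimiter are assumptions for illustration.

```php
<?php
// The sample string from the question.
$str = '"text1(text1)":http://www.example.com/mypage';

$link = null;

// The pattern from this answer, wrapped in ~ delimiters.
if (preg_match('~"(.*?)\(.*?:(.*)~', $str, $m)) {
    // $m[1] is the link text; $m[2] is everything from the first colon
    // after the opening parenthesis up to the end of the line.
    $link = '<a href="' . $m[2] . '">' . $m[1] . '</a>';
}

echo $link, PHP_EOL;
// <a href="http://www.example.com/mypage">text1</a>
```

Note that (.*) runs to the end of the line, so when the link sits in the middle of a sentence the rest of that sentence would end up inside the href.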

Federico Piazza

I think you don't need to include the screenshot. – Avinash Raj Aug 31 '14 at 15:51
It's a good idea, actually. It clearly shows the regex without needing to go to the actual link. @Fede nice way to express your answer. – vks Aug 31 '14 at 16:14
Thanks for the regex AND the screenshot. :-) – break68 Sep 01 '14 at 14:48

score 0 · Answer 3 · answered Aug 31 '14 at 15:40

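An appropriate regular expression could be:

$str = '"text1(text1)":http://www.example.com/mypage';
// Group 1: link text, group 2: text in parentheses, group 3: URL.
// The ^ anchor means this only matches when the string starts with the link syntax.
if (preg_match('#^"([^\(]+)' .
               '\(([^\)]+)\)[^"]*":(.+)#', $str, $m)) {
    print '<a href="'.$m[3].'">'.$m[2].'</a>' . PHP_EOL;
}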

giusc

Casimir et Hippolyte · Accepted Answer · 2014-09-01T15:53:36.393

You can use this replacement:

$pattern = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~'; 

$replacement = '<a href="\2">\1</a>'; 

$result = preg_replace($pattern, $replacement, $text);

pattern details:

([^("]+) captures text1 in group 1. Using a negated character class (one that excludes the double quote and the opening parenthesis) has several advantages:

it allows the use of a greedy quantifier, which is faster
since the class excludes the opening parenthesis and is immediately followed by a parenthesis in the pattern, if another part of the text has content between double quotes but no parenthesis inside, the regex engine will not go back to test other possibilities; it will skip this substring without backtracking. (This is because the PCRE regex engine automatically converts [^a]+a into [^a]++a before processing the string)

\S+ matches one or more non-whitespace characters

(?=[\s\pP]|\z) is a lookahead assertion that checks that the url is followed by a whitespace, a punctuation character (\pP) or the end of the string.
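
A small sketch for checking the trailing-punctuation case mentioned in the question. The sample sentence and the second pattern ($variant) are assumptions, not part of the answer. Since \S+ is greedy and a comma is not whitespace, the comma appears to be consumed before the lookahead is tested, and the lookahead then succeeds on the space that follows. The variant instead requires the URL to end on a character that is neither whitespace nor punctuation.

```php
<?php
// Hypothetical sample sentence; only the link syntax comes from the question.
$text = 'See "text1(text1)":http://www.example.com/mypage, then continue.';

// Pattern as given in the answer.
$original = '~"([^("]+)\(\1\)":(http://\S+)(?=[\s\pP]|\z)~';

// Variant (an assumption, not from the answer): the last character of the
// URL must be neither whitespace nor punctuation.
$variant = '~"([^("]+)\(\1\)":(http://\S*[^\s\pP])~';

$replacement = '<a href="\2">\1</a>';

$withOriginal = preg_replace($original, $replacement, $text);
$withVariant  = preg_replace($variant, $replacement, $text);

echo $withOriginal, PHP_EOL;
// See <a href="http://www.example.com/mypage,">text1</a> then continue.
echo $withVariant, PHP_EOL;
// See <a href="http://www.example.com/mypage">text1</a>, then continue.
```

The variant would also drop a trailing slash, since / counts as punctuation for \pP, so whether it fits depends on the URLs involved.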

Thank you for the answer and for the explanation. I tested it and it worked. — break68, Sep 01 '14 at 14:46
@break68: I'm glad it helped. However, be careful with the question mark, since it can appear at the end of a URL *(the separator used for GET values, but with no GET values after it)*. In that case, the question mark at the end of the URL will be "transformed" into a trailing question mark. I do not know the URLs you have to deal with, but it can be a trap. — Casimir et Hippolyte, Sep 01 '14 at 19:15
