I'll leave the regex solutions to others and propose something that is hopefully more readable.
The following code uses my `Parser` class from Paladio (it's under CC-BY 3.0) and works on UTF-8.
The code is explained in the comments:
```php
<?php
// Assumes Paladio's Parser class is already loaded (include/autoload)
$string = "Hello how are ##you ? I want ##join you.";
// Goal: wrap every ##word in a link, e.g. ##you -> <a>##you</a>

// Create a parser object for the string
$parser = new Parser($string);
// Variable to hold the result
$result = '';
// While we haven't reached the end of the string
while ($parser->CanConsume())
{
    // Take all the text up to the next '##' (or the end of the string)
    // and append it to the result
    $result .= $parser->ConsumeUntil('##');
    // If the next thing is a '##', consume it
    if ($parser->Consume('##'))
    {
        // Take the text up to the next whitespace or newline
        $tag = $parser->ConsumeUntil(array(' ', "\t", "\r", "\n"));
        // Append the converted text to the result
        $result .= '<a>##'.$tag.'</a>';
    }
}
// Result: Hello how are <a>##you</a> ? I want <a>##join</a> you.
echo $result;
?>
```
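If you don't have Paladio at hand, here is a minimal sketch of the same loop using only built-in string functions (`strpos`, `strcspn`, `substr`). The function name `link_hash_words` is hypothetical. It works on bytes, which is still safe for UTF-8 input because the marker and the whitespace delimiters are ASCII, and ASCII bytes never occur inside a multi-byte UTF-8 sequence.

```php
<?php
// Hypothetical helper: wraps every "<marker>word" in <a>...</a> without Paladio.
function link_hash_words(string $text, string $marker = '##'): string
{
    $result = '';
    $pos = 0;
    $len = strlen($text);
    while ($pos < $len) {
        $hit = strpos($text, $marker, $pos);
        if ($hit === false) {
            // No more markers: copy the rest of the text verbatim.
            $result .= substr($text, $pos);
            break;
        }
        // Copy the text before the marker, then skip over the marker itself.
        $result .= substr($text, $pos, $hit - $pos);
        $pos = $hit + strlen($marker);
        // The tag runs until the next space, tab, CR or LF (or the end).
        $tagLen = strcspn($text, " \t\r\n", $pos);
        $result .= '<a>' . $marker . substr($text, $pos, $tagLen) . '</a>';
        $pos += $tagLen;
    }
    return $result;
}

echo link_hash_words("Hello how are ##you ? I want ##join you."), "\n";
// Hello how are <a>##you</a> ? I want <a>##join</a> you.
```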
Based on the comments, here is a modified version that detects words marked with any of the given strings (`'##'` and `'**'` in the example):
```php
function autolink($ptext, $detc)
{
    // whitespace declared once for readability and performance
    $whitespace = array(' ', "\t", "\r", "\n");
    $parser = new Parser($ptext);
    $result = '';
    while ($parser->CanConsume())
    {
        $result .= $parser->ConsumeUntil($detc);
        if ($parser->Consume($detc))
        {
            $newtag = $parser->ConsumeUntil($whitespace);
            $result .= '<a href='.$newtag.'>'.$newtag.'</a>';
        }
    }
    return $result;
}
```
Example usage:

```php
echo autolink("Hello how are ##you ? I want **join you.", array('##', '**'));
```

Output:

```
Hello how are <a href=you>you</a> ? I want <a href=join>join</a> you.
```
Tested on my local server.
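For comparison, a hypothetical plain-PHP counterpart of `autolink()` that needs no Paladio (`autolink_plain` is an illustrative name). At each step it looks for whichever marker occurs first. Unlike the version above, it quotes the `href` and escapes it with `htmlspecialchars`, so its output has quoted attributes. When two markers start at the same position the one listed first wins, and empty markers are skipped.

```php
<?php
// Hypothetical helper: links the word after any of the given markers.
function autolink_plain(string $text, array $markers): string
{
    $result = '';
    $pos = 0;
    $len = strlen($text);
    while ($pos < $len) {
        // Find the earliest occurrence of any marker from $pos onward.
        $hit = false;
        $hitMarker = '';
        foreach ($markers as $m) {
            if ($m === '') {
                continue; // an empty marker would never advance the cursor
            }
            $p = strpos($text, $m, $pos);
            if ($p !== false && ($hit === false || $p < $hit)) {
                $hit = $p;
                $hitMarker = $m;
            }
        }
        if ($hit === false) {
            $result .= substr($text, $pos); // no marker left: copy the rest
            break;
        }
        $result .= substr($text, $pos, $hit - $pos);
        $pos = $hit + strlen($hitMarker);
        // The word runs until the next space, tab, CR or LF (or the end).
        $tagLen = strcspn($text, " \t\r\n", $pos);
        $safe = htmlspecialchars(substr($text, $pos, $tagLen), ENT_QUOTES, 'UTF-8');
        $result .= '<a href="' . $safe . '">' . $safe . '</a>';
        $pos += $tagLen;
    }
    return $result;
}

echo autolink_plain("Hello how are ##you ? I want **join you.", ['##', '**']), "\n";
```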
Notes:
The call `$parser->Consume($detc)` returns the string it matched (or `null` when nothing matched), so you can use it to branch, for example:
```php
$input = $parser->Consume(array('a', 'b'));
if ($input === 'a')
{
    // do something
}
else if ($input === 'b')
{
    // do something else
}
else /* $input === null */
{
    // fallback case
}
```
`Consume` accepts:
- A string.
- An array of strings.
- A number (the number of characters to consume).
- `null` (consumes a single character).
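To illustrate what those four kinds of argument could look like in practice, here is a small hypothetical cursor class. It is not Paladio's `Parser`: the class name `MiniCursor`, its methods, and its exact semantics (an array consumes the first listed string that matches, numbers count UTF-8 characters, and `null` is returned when nothing matches) are assumptions made for this sketch. It requires PHP 8.0 or later.

```php
<?php
// Hypothetical cursor class (NOT Paladio's Parser), written for illustration.
// It works on a UTF-8 string and keeps a byte offset into it.
final class MiniCursor
{
    private int $pos = 0;

    public function __construct(private string $text) {}

    public function canConsume(): bool
    {
        return $this->pos < strlen($this->text);
    }

    // string -> consume it if it comes next
    // array  -> consume the first listed string that comes next
    // int    -> consume that many UTF-8 characters
    // null   -> consume exactly one UTF-8 character
    // Returns the consumed text, or null if nothing matched.
    public function consume(string|array|int|null $what = null): ?string
    {
        if (is_array($what)) {
            foreach ($what as $candidate) {
                $found = $this->consume((string) $candidate);
                if ($found !== null) {
                    return $found;
                }
            }
            return null;
        }

        if (is_string($what)) {
            if ($what === '' || substr($this->text, $this->pos, strlen($what)) !== $what) {
                return null;
            }
            $this->pos += strlen($what);
            return $what;
        }

        // Character count: null means one character.
        $count = $what ?? 1;
        // \G anchors the match at $this->pos; /u makes "." match a whole
        // UTF-8 character and /s lets it match newlines too.
        if ($count < 1 || preg_match('/\G.{' . $count . '}/su', $this->text, $m, 0, $this->pos) !== 1) {
            return null;
        }
        $this->pos += strlen($m[0]);
        return $m[0];
    }
}

$c = new MiniCursor("**héllo");
$marker = $c->consume(['##', '**']); // "**", usable for branching
$one    = $c->consume();             // "h"
$two    = $c->consume(2);            // "él"
$miss   = $c->consume('xx');         // null, the cursor does not move
$rest   = $c->consume('lo');         // "lo"
```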
`Parser` uses `mb_*` functions for some of its operations (other parts are optimized with byte-by-byte operations) and expects UTF-8. If you run into encoding problems, call `mb_internal_encoding('UTF-8');` before using `Parser`, and convert your string to UTF-8 (I recommend `iconv` for this).
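A minimal sketch of that setup, assuming for illustration that the input arrives as ISO-8859-1 (Latin-1); substitute your real source encoding. It needs the `mbstring` and `iconv` extensions.

```php
<?php
// Make mb_* functions default to UTF-8 for the rest of the script.
mb_internal_encoding('UTF-8');

// Assumption for illustration: the raw input is Latin-1 ("café ##menú").
$raw = "caf\xE9 ##men\xFA";
$utf8 = iconv('ISO-8859-1', 'UTF-8', $raw);
if ($utf8 === false) {
    throw new RuntimeException('Conversion to UTF-8 failed');
}

// Optional sanity check before handing the string to a UTF-8 parser.
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)
echo $utf8, "\n";
```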