Regex: replace multiple occurrences of a pattern within the same line, using one single regular expression

Question

I'm trying to capture the %20's in a URL and replace them with +'es, as well as strip away some other stuff, all preferably using a single regular expression.

Specifically, I'd like something like this

a%20sentence%20divided%20by%20spaces_123456.html

to be turned into something like this

a+sentence+divided+by+spaces

Edit: for clarity, it's crucial that both the %20's AND the trailing _123456.html are targeted, preferably using one single expression.

The source can be targeted with

^([\w]+%20)+.*\.html$ (multiple occurrences of [\w]+%20, followed by any character, followed by .html)

but I'm confused about how to specifically replace both the multiple occurrences of %20 and the trailing '123456'. I'd guess this would be a shot in the right direction

^(([\w]+)%20)+([\w]+)_[0-9]+\.html$

$1 being each occurrence of ([\w]+)%20, $2 being each occurrence of [\w]+ within the first match, and $3 being [\w]+, but I'm not getting the result I'm looking for (using Sublime Text for this):

string: a%20sentence%20divided%20by%20spaces_123456.html
search: ^(([\w]+)%20)+([\w]+)_[0-9]+\.html$
replace: $2+$3
expected result: a+sentence+divided+by+spaces
actual result: by+spaces

Any ideas where my line of thought goes awry?

[`decodeURIComponent()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent). `str = decodeURIComponent(str).replace(/\s+/g, '+');` — Tushar, Jan 13 '16 at 10:35
@ndn Mostly because of the trailing text (_123456.html) that also needs to be stripped using the same single expression, and therefore has to be part of the matching pattern. — Ack, Jan 13 '16 at 10:44
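
A note on why the attempt above yields `by+spaces`: when a capture group sits inside a repeated group, each iteration overwrites the previous capture, so after the match `$1` and `$2` hold only the last repetition. The sketch below reproduces this in JavaScript rather than in Sublime Text (an assumption made only to get a runnable example; the `by+spaces` result reported in the question suggests Sublime's engine behaves the same way here).

```javascript
// Same input and pattern as in the question.
const input = 'a%20sentence%20divided%20by%20spaces_123456.html';
const pattern = /^(([\w]+)%20)+([\w]+)_[0-9]+\.html$/;

const match = input.match(pattern);

// The outer group repeats four times, but only the last repetition is kept.
console.log(match[1]); // by%20
console.log(match[2]); // by
console.log(match[3]); // spaces

// So the replacement can only ever see the last word before a %20.
const replaced = input.replace(pattern, '$2+$3');
console.log(replaced); // by+spaces
```

In other words, a single anchored match over the whole line cannot hand back every word; the pattern has to match each `%20` separately instead.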

Jan · Answer 1 · 2016-01-13T10:58:21.220

You can use two regular expressions (there may be better solutions though):

var string = "a%20sentence%20divided%20by%20spaces_123456.html";
// replace every %20 with +
var re1 = new RegExp('%20', 'g');
string = string.replace(re1, '+');
// strip the trailing _123456:
//   ([^_]+)    match everything except an underscore and capture it in group 1
//   _          underscore
//   ([^.]+)    match everything except a dot and capture it in group 2
//   (\.html)$  match the file extension (html in this case) and capture it in group 3
// the backslash is doubled because the pattern is written as a string literal
var re2 = new RegExp('([^_]+)_([^.]+)(\\.html)$');
// replace the match with capture groups 1 and 3 (this keeps the .html extension)
string = string.replace(re2, '$1$3');
alert(string); // a+sentence+divided+by+spaces.html

See a JS fiddle here.
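
If the replacement runs in JavaScript (and not in an editor's search/replace box), the two steps can probably be folded into one pattern by passing a function as the replacement. This is only a sketch: `cleanSlug` is a hypothetical helper name, and it assumes the trailing part always looks like `_<digits>.html`, as in the example string.

```javascript
// Hypothetical helper, not part of the original answer.
function cleanSlug(s) {
  // One pattern with two alternatives:
  //   %20              a literal %20 anywhere in the string
  //   _[0-9]+\.html$   the trailing _<digits>.html
  // The callback decides the replacement per match.
  return s.replace(/%20|_[0-9]+\.html$/g, function (m) {
    return m === '%20' ? '+' : '';
  });
}

const cleaned = cleanSlug('a%20sentence%20divided%20by%20spaces_123456.html');
console.log(cleaned); // a+sentence+divided+by+spaces
```

This does not carry over to Sublime Text, where the replacement is a plain string and cannot branch on what was matched.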

Thanks for your reply, however this doesn't strip away the trailing `_123456.html` - that's crucial. — Ack, Jan 13 '16 at 10:49

score 0 · Answer 2 · answered May 04 '17 at 13:13

Replacing parts of a string with different strings depending on what has been captured isn't easy to do with a single regex, whereas it is very easy with two. However, if you really want to do this with only one regex, here is a solution.

Solution with one regular expression:

original_string = 'a%20sentence%20divided%20by%20spaces_123456.html'
searched_string = original_string + "+"
regex : '%20(?=[^\+]*(\+))|_[^_]*$'
replace : '$1'
result : a+sentence+divided+by+spaces

Explanation:
The regex matches either a "%20" that is followed later in the string by a "+" (which the lookahead captures as group 1), or everything from the last "_" to the end of the string (in which case nothing is captured).
Each match is then replaced by the captured text, which is "+" when "%20" was matched and empty when the trailing part was matched.
For this to work, the string must contain a "+" after the last "%20".
That is why you NEED to append one to the end of your string; the second alternative removes it again together with the trailing "_123456.html".
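
The same idea can be tried out in JavaScript. This is a minimal sketch, assuming the JavaScript regex engine in place of Sublime Text; the variable names are only illustrative.

```javascript
const original = 'a%20sentence%20divided%20by%20spaces_123456.html';

// Append the sentinel "+" so that the lookahead has something to capture.
const searched = original + '+';

// Alternative 1: %20, with the lookahead capturing the sentinel as group 1.
// Alternative 2: everything from the last "_" to the end (group 1 stays unset).
const regex = /%20(?=[^+]*(\+))|_[^_]*$/g;

// "$1" expands to "+" for alternative 1 and to the empty string for alternative 2.
const result = searched.replace(regex, '$1');
console.log(result); // a+sentence+divided+by+spaces
```

Note that the trick relies on the input containing no "+" of its own before the sentinel and no "_" other than the one that starts the trailing part.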
