I'm trying to capture the %20s in a URL and replace them with + signs, as well as strip away some other stuff, all preferably using a single regular expression.

Specifically, I'd like something like this

a%20sentence%20divided%20by%20spaces_123456.html

to be turned into something like this

a+sentence+divided+by+spaces

Edit: for clarity, it's crucial that both the %20s AND the trailing _123456.html are targeted, preferably using one single expression.

The source can be targeted with

^([\w]+%20)+.*\.html$ (multiple occurrences of [\w]+%20, followed by any characters, followed by .html)

but I'm confused about how to specifically replace both the multiple occurrences of %20 and the trailing '_123456.html'. I'd guess this would be a shot in the right direction:

^(([\w]+)%20)+([\w]+)_[0-9]+\.html$

My thinking is that $1 would be each occurrence of ([\w]+)%20, $2 each occurrence of [\w]+ within $1, and $3 the final [\w]+, but I'm not getting the result I'm looking for (using Sublime Text for this):

string: a%20sentence%20divided%20by%20spaces_123456.html
search: ^(([\w]+)%20)+([\w]+)_[0-9]+\.html$
replace: $2+$3
expected result: a+sentence+divided+by+spaces
actual result: by+spaces
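
The same thing happens outside Sublime Text. Here is a minimal JavaScript reproduction (a sketch only: I'm assuming JavaScript's regex engine treats the repeated group the same way Sublime's does, and the variable names are just for illustration):

```javascript
// Same pattern and replacement as in the Sublime Text search above.
const input = 'a%20sentence%20divided%20by%20spaces_123456.html';
const pattern = /^(([\w]+)%20)+([\w]+)_[0-9]+\.html$/;

// Inspect what each capture group holds after the match.
const match = input.match(pattern);
console.log(match[1]); // by%20
console.log(match[2]); // by
console.log(match[3]); // spaces

// Apply the same replacement string.
const actual = input.replace(pattern, '$2+$3');
console.log(actual); // by+spaces
```

So after the match, $1 and $2 only seem to hold what they matched on the last repetition, not every repetition.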

Any ideas where my line of thought goes awry?

Ack
  • What is the problem with simply replacing `%20` with `+`? – ndnenkov Jan 13 '16 at 10:33
  • [`decodeURIComponent()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent). `str = decodeURIComponent(str).replace(/\s+/g, '+');` – Tushar Jan 13 '16 at 10:35
  • @ndn Mostly because of the trailing text (_123456.html) that also needs to be stripped using the same single expression, and therefore has to be part of the matching pattern. – Ack Jan 13 '16 at 10:44

2 Answers

You can use two regular expressions (there may be better solutions though):

var string = "a%20sentence%20divided%20by%20spaces_123456.html";
// replace every %20 with +
var re1 = /%20/g;
string = string.replace(re1, '+');
// trailing _123456
// ([^_]+)   match everything except an underscore and capture it in group 1
// _         underscore
// ([^.]+)   match everything except a dot (the digits) in group 2
// (\.html)  match the file extension (html in this case) and capture it in group 3
// (a regex literal is used so that \. stays an escaped dot)
var re2 = /([^_]+)_([^.]+)(\.html)$/;
// replace the match with capture groups 1 and 3, dropping the underscore and digits
string = string.replace(re2, '$1$3');
alert(string); // a+sentence+divided+by+spaces.html

See a JS fiddle here.
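
If the extension has to go as well, as the question asks, the second step can drop the whole suffix instead of keeping group 3. A minimal sketch of that variation (this is not the code from the fiddle; `toPlusSeparated` is a made-up helper name, and it assumes the suffix is always an underscore, some digits, and `.html`):

```javascript
// Hypothetical helper: two chained replacements, still two regular expressions.
function toPlusSeparated(name) {
  return name
    .replace(/%20/g, '+')            // step 1: every %20 becomes +
    .replace(/_[0-9]+\.html$/, '');  // step 2: drop the trailing _digits.html
}

console.log(toPlusSeparated('a%20sentence%20divided%20by%20spaces_123456.html'));
// a+sentence+divided+by+spaces
```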

Jan
  • Thanks for your reply, however this doesn't strip away the trailing `_123456.html` - that's crucial. – Ack Jan 13 '16 at 10:49

Replacing different parts of a string with different replacements, depending on what was matched, isn't something a single regex does easily. It is very easy with 2 regular expressions. However, if you really want to do this with only 1 regex, here is a solution.

Solution with 1 regular expression:

original_string = 'a%20sentence%20divided%20by%20spaces_123456.html'
searched_string = original_string + "+"
regex: '%20(?=[^\+]*(\+))|_[^_]*$'
replace: '$1'
result: a+sentence+divided+by+spaces

Explanation:
The regex matches either a "%20" that is followed, later in the string, by a "+" (which the lookahead captures in group 1), OR everything from the last "_" to the end of the string (which captures nothing).
Each match is then replaced by the captured string: a "+" when a "%20" was matched, and nothing when the tail of the string was matched.
For this to work, the string must contain a "+" after the last "%20".
That is why you NEED to append one to the end of your string (the regex removes it again along with the trailing "_123456.html").
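
In case it helps, here is how the trick could look in JavaScript. This is only a sketch: the function names are made up for illustration, and it relies on JavaScript substituting an empty string for `$1` when group 1 did not take part in the match. The second function is a different technique, not the one described above: a replacement callback, which JavaScript's `String.prototype.replace` accepts and which avoids the extra "+".

```javascript
// Sentinel approach from this answer: append "+" so the lookahead can capture it.
function convertWithSentinel(original) {
  const searched = original + '+';
  // %20(?=[^+]*(\+)) : a %20 with a "+" further on, captured as group 1
  // _[^_]*$          : the last "_" through the end of the string (group 1 unset)
  const regex = /%20(?=[^+]*(\+))|_[^_]*$/g;
  return searched.replace(regex, '$1');
}

// Alternative (not part of the answer above): pick the replacement in a callback.
function convertWithCallback(original) {
  return original.replace(/%20|_[^_]*$/g, (m) => (m === '%20' ? '+' : ''));
}

const sample = 'a%20sentence%20divided%20by%20spaces_123456.html';
console.log(convertWithSentinel(sample)); // a+sentence+divided+by+spaces
console.log(convertWithCallback(sample)); // a+sentence+divided+by+spaces
```

The callback version only works where the replacement can be a function, so it is no help in an editor's search-and-replace dialog; there the sentinel version is the one to use.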

Gawil