I'm trying to match the %20 sequences in a URL and replace them with + signs, as well as strip away some other stuff, all preferably using a single regular expression.
Specifically, I'd like something like this
a%20sentence%20divided%20by%20spaces_123456.html
to be turned into something like this
a+sentence+divided+by+spaces
Edit: for clarity, it's crucial that both the %20's AND the trailing _123456.html are targeted, preferably using one single expression.
The source can be targeted with
^([\w]+%20)+.*\.html$
(multiple occurrences of [\w]+%20, followed by any run of characters, followed by .html)
but I'm confused about how to specifically replace both the multiple occurrences of %20 and the trailing '_123456.html'. I'd guess this would be a shot in the right direction:
^(([\w]+)%20)+([\w]+)_[0-9]+\.html$
$1 being each occurrence of ([\w]+)%20, $2 being each occurrence of [\w]+ within the first match, and $3 being [\w]+, but I'm not getting the result I'm looking for (using Sublime Text for this):
string: a%20sentence%20divided%20by%20spaces_123456.html
search: ^(([\w]+)%20)+([\w]+)_[0-9]+\.html$
replace: $2+$3
expected result: a+sentence+divided+by+spaces
actual result: by+spaces
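The same behaviour seems to reproduce outside Sublime Text. Here is a minimal sketch using Python's re module, which is an assumption on my part: it is not the engine Sublime Text uses, and the replacement is written \2+\3 instead of $2+$3. It prints what each group holds after the match, and the repeated groups only hold their last iteration:

```python
import re

source = "a%20sentence%20divided%20by%20spaces_123456.html"
pattern = r"^(([\w]+)%20)+([\w]+)_[0-9]+\.html$"

# Inspect what each capture group holds once the whole pattern has matched.
match = re.match(pattern, source)
groups = match.groups()
print(groups)  # ('by%20', 'by', 'spaces')

# Same replacement as in Sublime Text, using Python's \N backreference syntax.
result = re.sub(pattern, r"\2+\3", source)
print(result)  # by+spaces
```

So groups 1 and 2 end up with only 'by%20' and 'by', not every word that was matched along the way.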
Any ideas where my line of thought goes awry?