
I'm struggling to figure out the best way to strip everything in a URL from a specific keyword onwards (including the keyword), using either a regex or a substring operation. For example, given the dynamic URL http://example.com/category/subcat/filter/size/1/, I would like to strip out the /filter/size/1 part and keep the remaining URL as a separate string. Grateful for any pointers.

I should clarify that the number of arguments after the filter keyword isn't fixed and could be more than in my example, and the number of category arguments before the filter keyword isn't fixed either.

bsod99
  • `'http://example.com/category/subcat/filter/size/1/'.replace(/^.*filter\/size\/1/, '')` try [regex101.com](https://regex101.com). – RobG Oct 11 '20 at 14:14

5 Answers


Use the split() function.

const url = 'http://example.com/category/subcat/filter/size/1/';
console.log(url.split('/filter')[0]);
JMP
  • Thanks for the answer, but that won't handle a variable number of nested subcategory levels, e.g. http://example.com/category/subcat/subsubcat/filter/size/1/color/black. Sorry, I should have clarified that in the OP. – bsod99 Oct 11 '20 at 14:29
  • This splits on a word. – JMP Oct 11 '20 at 14:48
  • @bsod99 What kind of URL do you expect with subcategories? – plalx Oct 11 '20 at 14:54
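For what it's worth, splitting on the keyword does not depend on how many subcategory levels come before it (or how many filter arguments come after it). A minimal sketch using the nested URL from the comment above; `stripFilter` is a hypothetical helper name. One caveat: '/filter' would also match a segment such as '/filtered', so splitting on '/filter/' is stricter if that matters for your URLs.

```javascript
// Hypothetical helper: keep everything before the first "/filter".
// If the keyword is absent, split() returns [url], so the URL comes back unchanged.
function stripFilter(url) {
  return url.split('/filter')[0];
}

const nested = 'http://example.com/category/subcat/subsubcat/filter/size/1/color/black';
console.log(stripFilter(nested)); // http://example.com/category/subcat/subsubcat
```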

To be a little safer, you could use the URL object to handle most of the parsing (host, query string, hash) and then just sanitize the pathname.

const filteredUrl = 'http://example.com/category/subcat/filter/test?param1&param2=test';

console.log(unfilterUrl(filteredUrl));

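function unfilterUrl(urlString) {
  const url = new URL(urlString);
  // Drop a "filter" path segment (one that follows a "/") and everything after it in the path;
  // the query string and hash are not part of pathname, so they are left untouched.
  url.pathname = url.pathname.replace(/(?<=\/)filter(\/|$).*/i, '');
  return url.toString();
}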
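A minimal sketch extending the same URL-based idea, assuming you also want the removed part back rather than discarding it; `splitFilteredUrl` is a hypothetical name, not part of any API. It reuses the pathname regex from `unfilterUrl`, uses the match position to cut the pathname, and returns `null` for the filter when the keyword is absent.

```javascript
// Hypothetical helper: split a URL into the part before the "filter" path
// segment and the removed "filter/..." path (null when there is none).
function splitFilteredUrl(urlString) {
  const url = new URL(urlString);
  const filterRegex = /(?<=\/)filter(\/|$).*/i;
  const match = url.pathname.match(filterRegex);
  if (!match) {
    return { base: url.toString(), filter: null };
  }
  // match.index is where "filter" starts; the lookbehind "/" is not part of the match.
  url.pathname = url.pathname.slice(0, match.index);
  return { base: url.toString(), filter: match[0] };
}

const { base, filter } = splitFilteredUrl(
  'http://example.com/category/subcat/subsubcat/filter/size/1/color/black?sort=asc#top'
);
console.log(base);   // http://example.com/category/subcat/subsubcat/?sort=asc#top
console.log(filter); // filter/size/1/color/black
```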
plalx

You can tweak this a little based on your needs; for example, filter might not be present in the URL at all. But assuming it is present, consider the following regex:

/(.*)\/filter\/(.*)/g

The first captured group (available as $1) is the portion of the string before the filter keyword, and the second captured group (available as $2) contains all the filters after the keyword.

Have a look at the example I tried on the regextester.com regex tester.
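A minimal sketch of using those two groups from JavaScript, with the nested example URL as an assumption. It drops the `g` flag: with `g`, `String.prototype.match` returns only the whole matches and no capture groups. If the keyword is missing, `replace` leaves the string unchanged and `match` returns `null`.

```javascript
// Capture everything before "/filter/" ($1) and everything after it ($2).
const filterRegex = /(.*)\/filter\/(.*)/;

const url = 'http://example.com/category/subcat/subsubcat/filter/size/1/color/black';

// Keep only the part before the keyword.
const base = url.replace(filterRegex, '$1');

// Or read both groups from the match array.
const match = url.match(filterRegex);
const filters = match ? match[2] : null;

console.log(base);    // http://example.com/category/subcat/subsubcat
console.log(filters); // size/1/color/black
```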

Ahmed Nawaz Khan

Split

The simplest solution that occurs to me is the following:

const url = 'http://example.com/category/subcat/filter/size/1/';
const [base, filter] = url.split('/filter/');

// where:
// base == 'http://example.com/category/subcat'
// filter == 'size/1/'

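If '/filter/' can occur more than once, note that the limit parameter of String.prototype.split() won't keep the remainder together: url.split('/filter/', 2) just truncates the resulting array to its first two elements, discarding everything after a second '/filter/'. To split only at the first occurrence, use indexOf() and slice() instead.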
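A minimal sketch of splitting at the first occurrence only; `splitAtFirst` is a hypothetical helper, and the URL with a repeated keyword is an assumed example. Unlike `url.split('/filter/', 2)`, which keeps only the first two pieces of the full split, it keeps everything after the first '/filter/', including any later occurrence.

```javascript
// Hypothetical helper: split at the FIRST occurrence of the separator only,
// keeping everything after it (including any later "/filter/") intact.
function splitAtFirst(str, separator) {
  const i = str.indexOf(separator);
  if (i === -1) {
    return [str, null]; // separator not present
  }
  return [str.slice(0, i), str.slice(i + separator.length)];
}

const url = 'http://example.com/category/subcat/filter/size/1/filter/x/';
const [base, filter] = splitAtFirst(url, '/filter/');
// base   === 'http://example.com/category/subcat'
// filter === 'size/1/filter/x/'
// whereas url.split('/filter/', 2) gives ['http://example.com/category/subcat', 'size/1']
```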

RegExp

The assumption above is that everything after the filter keyword is part of the filter. If you need more granularity, you can use a regex that stops at the '?', for example. The regex below removes 'filter/anything/that/follows', provided 'filter' immediately follows a '/', up to (but not including) the first query-string separator '?'.

const filterRegex = /(?<=\/)filter(\/|$)[^?]*/i;

function parseURL(url) {
    const match = url.match(filterRegex);
    if (!match) { return [url, null, null]; } // no filter segment found

    const stripped = url.replace(filterRegex, '');
    return [url, stripped, match[0]];
}

const [full, stripped, filter] = parseURL('http://example.com/category/subcat/filter/size/1/?query=string');

// where:
// stripped == 'http://example.com/category/subcat/?query=string'
// filter == 'filter/size/1/'
Roubo

I'm sadly not able to post the full answer here, as it tells me 'it looks like spam', so I created a gist with the original answer. In it, I go into the details of String.prototype.match and of JS/ES regexes in general, including named capture groups and pitfalls, and include a link to a great regex tool: regex101. I'm not posting the link here for fear of triggering the filter again. But back to the topic:

In short, a simple regex can be used to split and format it (using filter as the keyword):

  • /^(.*)(\/filter\/.*)$/

or with named groups:

  • /^(?<main>.*)(?<stripped>\/filter\/.*)$/

(note that the forward slashes need to be escaped in a regex literal)

Using String.prototype.match with that regex returns a match array (or null if there is no match): index 1 holds the first capture group (everything before the keyword), and index 2 holds everything from the keyword onwards (including the keyword).
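For illustration, here is the named-group variant used with String.prototype.match; the nested example URL is an assumption. Note that `(?<main>.*)` is greedy, so if '/filter/' appears more than once, the split happens at the last occurrence.

```javascript
// Named capture groups (ES2018) make the match result self-describing.
const filterRegex = /^(?<main>.*)(?<stripped>\/filter\/.*)$/;

const url = 'http://example.com/category/subcat/subsubcat/filter/size/1/color/black';
const match = url.match(filterRegex);

if (match) {
  // Same values as match[1] and match[2].
  console.log(match.groups.main);     // http://example.com/category/subcat/subsubcat
  console.log(match.groups.stripped); // /filter/size/1/color/black
} else {
  console.log('no filter segment in', url);
}
```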

Again, all the details can be found in the gist.