Update
"Note that there are many more symbols that may not be visible."
null (\u{0}) has been added. If you want to detect all whitespace regardless of width, use the \s meta sequence with the + quantifier, \s+ (the + quantifier matches one or more of the character that precedes it), and place an alternation | between it and the rest of the regex (| acts as an OR gate).
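As a rough sketch of the \s+ alternation described above (the names `zeroWidth` and `withWhitespace` and the sample string are made up for illustration): in JavaScript, \s already covers U+FEFF but not U+200B, U+200C, or U+200D, so the two sets of alternatives complement each other.

```javascript
// Hypothetical pattern names; only the regex syntax comes from the answer.
const zeroWidth = /\u{0}+|\u{feff}+|\u{200b}+|\u{200c}+|\u{200d}+/gu;
const withWhitespace = /\s+|\u{0}+|\u{feff}+|\u{200b}+|\u{200c}+|\u{200d}+/gu;

// Sample input: a tab, a regular space, and a zero width space.
const sample = 'a\tb c\u{200b}d';

// Only the zero width space is removed here.
const zeroWidthOnly = sample.replace(zeroWidth, '');
// Visible whitespace is removed as well.
const everything = sample.replace(withWhitespace, '');

console.log(JSON.stringify(zeroWidthOnly)); // "a\tb cd"
console.log(everything); // abcd
```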
This may be an XY problem. If you really need to replace zero width characters with [Fill in the blank], then proceed. If you need a way to copy them and keep the url intact, then encode the url with encodeURIComponent() instead.
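A minimal sketch of the encoding route, assuming a made-up slug with a zero width space in it. Note that it encodes only the slug, since running encodeURIComponent() over a whole url would also escape the ':' and '/' characters.

```javascript
// Hypothetical slug containing a zero width space (U+200B).
const rawSlug = 'slu\u{200b}gs';

// Encode only the path segment, then rebuild the url around it.
const encoded = encodeURIComponent(rawSlug);
const safeUrl = `https://example.com/path/to/${encoded}`;

console.log(safeUrl); // https://example.com/path/to/slu%E2%80%8Bgs

// Decoding restores the original slug, invisible character included.
console.log(decodeURIComponent(encoded) === rawSlug); // true
```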
The regex below targets these four zero width characters:
- U+feff - zero width no-break space
- U+200b - zero width space
- U+200c - zero width non-joiner
- U+200d - zero width joiner
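If it helps to see which of the listed characters a string actually contains before removing them, something like the following could work. `findZeroWidth` and `ZERO_WIDTH_NAMES` are hypothetical names, and the index is counted in code points.

```javascript
// Names taken from the list above, keyed by code point.
const ZERO_WIDTH_NAMES = new Map([
  [0xfeff, 'zero width no-break space'],
  [0x200b, 'zero width space'],
  [0x200c, 'zero width non-joiner'],
  [0x200d, 'zero width joiner'],
]);

// Hypothetical helper: report each listed character found in `text`
// together with its position.
const findZeroWidth = (text) =>
  [...text].flatMap((ch, index) => {
    const name = ZERO_WIDTH_NAMES.get(ch.codePointAt(0));
    return name ? [{ index, name }] : [];
  });

const found = findZeroWidth('sl\u{200d}ug\u{feff}');
console.log(found);
```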
In order to use the \u{...} code point syntax in a RegEx, we must set the u (unicode) flag and write the character as follows:
U+feff >>> \u{feff}
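A quick check of why the flag matters (variable names are illustrative): without the u flag the braces are not read as a code point, so the pattern no longer matches the character, while the older four-digit \ufeff form works either way.

```javascript
// String literals accept \u{...} without any flag.
const bom = '\u{feff}';

// With the u flag, \u{feff} is a code point escape.
const withFlag = /\u{feff}/u.test(bom);
// Without the flag, the same pattern is not treated as U+FEFF.
const withoutFlag = /\u{feff}/.test(bom);
// The four-digit escape does not need the flag.
const fourDigit = /\ufeff/.test(bom);

console.log(withFlag);    // true
console.log(withoutFlag); // false
console.log(fourDigit);   // true
```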
The following demo extracts the slug from a url and removes the previously mentioned zero width characters. If you wish to replace them with something (I have no idea why it would be useful to do so...), pass the replacement as the third parameter:
- First Parameter: url |String| (required)
- Second Parameter: slug |Boolean| (default: true) Returns the slug by default. If false, it returns the full url. Pass it explicitly whenever you need to pass the third parameter.
- Third Parameter: rpl |String| (default: '') The replacement characters. By default, zero width characters are removed without a replacement.
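// Each url has a zero width character in the slug. It is written as an
// escape here so it stays visible; its position in the slug is arbitrary.
let u200b = `https://example.com/path/to/slu\u{200b}gs`;
let ufeff = `https://example.com/path/to/slu\u{feff}gs`;
let u200c = `https://example.com/path/to/slu\u{200c}gs`;
let u200d = `https://example.com/path/to/slu\u{200d}gs`;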
const getSlug = (url, slug = true, rpl = '') => {
  // Match runs of null or any of the four zero width characters.
  // Every alternative carries +, so a run gets a single replacement.
  let regex = /\u{0}+|\u{feff}+|\u{200b}+|\u{200c}+|\u{200d}+/gu;
  let string;
  if (slug) {
    // The slug is the last path segment of the url.
    string = url.split('/').pop().trim();
    console.log(`Old Length of slug: ${string.length}`);
  } else {
    string = url;
    console.log(`Old Length of url: ${string.length}`);
  }
  let clean = string.replace(regex, rpl);
  console.log(`New Length: ${clean.length}`);
  return clean;
};
getSlug(u200b);
getSlug(ufeff);
getSlug(u200c);
getSlug(u200d);
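To make the three call forms concrete, here is a compact, self-contained sketch. `getSlugQuiet` is a hypothetical logging-free variant of the function above, and the sample url is made up.

```javascript
// Hypothetical logging-free variant of getSlug, for illustration only.
const getSlugQuiet = (url, slug = true, rpl = '') => {
  const regex = /\u{0}+|\u{feff}+|\u{200b}+|\u{200c}+|\u{200d}+/gu;
  const string = slug ? url.split('/').pop().trim() : url;
  return string.replace(regex, rpl);
};

// Sample url with a zero width space inside the slug.
const dirty = 'https://example.com/path/to/slu\u{200b}gs';

console.log(getSlugQuiet(dirty));            // slugs
console.log(getSlugQuiet(dirty, false));     // https://example.com/path/to/slugs
console.log(getSlugQuiet(dirty, true, '-')); // slu-gs
```

The second argument has to be spelled out in the last call because the replacement is positional.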