1

I found this piece of code on the internet. It takes a sentence and turns every word into a link to that word's dictionary entry. But it has a weakness: if the sentence contains HTML, the script doesn't remove it.

For example, it replaces '<b>asserted</b>' with 'http://www.merriam-webster.com/dictionary/<b>asserted</b>'.

Could you please tell me what to change in this code so that '<b>asserted</b>' becomes 'http://www.merriam-webster.com/dictionary/asserted'?

var content = document.getElementById("sentence").innerHTML;

var punctuationless = content.replace(/[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g, "");
var mixedCase = punctuationless.replace(/\s{2,}/g);
var finalString = mixedCase.toLowerCase();

var words = (finalString).split(" ");

var punctuatedWords = (content).split(" ");

var processed = "";
for (i = 0; i < words.length; i++) {
    processed += "<a href = \"http://www.merriam-webster.com/dictionary/" + words[i] + "\">";
    processed += punctuatedWords[i];
    processed += "</a> ";
}

document.getElementById("sentence").innerHTML = processed;
Al.G.
  • Possible duplicate of [JavaScript: How to strip HTML tags from string?](http://stackoverflow.com/questions/5002111/javascript-how-to-strip-html-tags-from-string) – evolutionxbox Oct 18 '16 at 09:34
  • You might want to escape the `.` in your regexp like `\.`, like you do with `\-` for instance. – Azamantes Oct 18 '16 at 09:37
  • Possible duplicate of [Strip HTML from Text JavaScript](http://stackoverflow.com/questions/822452/strip-html-from-text-javascript) – Andreas Oct 18 '16 at 09:41

3 Answers

5

The regex /<{1}[^<>]{1,}>{1}/g matches every tag in a string: an opening <, one or more characters that are neither < nor >, and the closing >. Replacing each match with a space removes the tags, brackets included. This

  var str = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>";
  str = str.replace(/<{1}[^<>]{1,}>{1}/g, " ");
  document.writeln(str);

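will display " How are you I love cake" on the page. The string itself keeps one space per removed tag; the browser collapses those runs of spaces when rendering.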
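If the cleaned string is used somewhere other than the page, the leftover runs of spaces can be collapsed in code. The sketch below is only an illustration: it uses the shorter pattern /<[^<>]+>/g, which matches the same text as the regex above, and the variable names spaced and tidy are made up for this example.

```javascript
var str = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>";

// One space per removed tag, so adjacent tags leave runs of spaces behind
var spaced = str.replace(/<[^<>]+>/g, " ");

// Collapse each run of whitespace to a single space and trim both ends
var tidy = spaced.replace(/\s{2,}/g, " ").trim();
```

Here tidy ends up without the leading space and without doubled spaces between the words.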

If you paste this

var stripHTML = mixedCase.replace(/<{1}[^<>]{1,}>{1}/g,"")

just below this

var mixedCase = punctuationless.replace(/\s{2,}/g);

and replace mixedCase with stripHTML in the line after it (var finalString = mixedCase.toLowerCase();), it will probably work.
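Put together, the idea could look something like the sketch below. It is not the original script: linkWords is a hypothetical helper name, it takes the sentence as a string argument instead of reading it from the page, and it assumes tags contain no spaces (such as <b>), because the text is still split on single spaces. It also strips tags and punctuation word by word rather than from the whole string, so each word stays aligned with its link.

```javascript
// Hypothetical helper; the name linkWords is not part of the original script.
// Assumption: tags contain no spaces (e.g. <b>, <i>), since the text is split on " ".
function linkWords(content) {
  var tagPattern = /<{1}[^<>]{1,}>{1}/g;                   // tag regex from this answer
  var punctuation = /[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g;   // character class from the question

  var punctuatedWords = content.split(" ");
  var processed = "";
  for (var i = 0; i < punctuatedWords.length; i++) {
    // Strip tags first, then punctuation, so only the bare word reaches the URL
    var word = punctuatedWords[i]
      .replace(tagPattern, "")
      .replace(punctuation, "")
      .toLowerCase();
    processed += "<a href = \"http://www.merriam-webster.com/dictionary/" + word + "\">";
    processed += punctuatedWords[i]; // visible text keeps its markup and punctuation
    processed += "</a> ";
  }
  return processed;
}
```

With the input "He <b>asserted</b> it.", the second link points to http://www.merriam-webster.com/dictionary/asserted while the visible text is still bold. In the page, the result would be assigned to document.getElementById("sentence").innerHTML as in the question.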

milo.farrell
1
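function stripAllHtml(str) {
  if (!str || !str.length) return '';

  // Drop <script> elements and their contents; [\s\S] also matches newlines, which . does not
  str = str.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/gi, '');

  // Let the browser parse the remaining markup, then read back only the text
  let tmp = document.createElement("DIV");
  tmp.innerHTML = str;

  return tmp.textContent || tmp.innerText || "";
}

stripAllHtml('<a>test</a>'); // returns "test"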

This function strips all the HTML and returns only the text.

Hopefully, this will work for you.

Gaurav joshi
0

If you need to remove HTML tags and HTML entities, you can use:

const text = '<p>test content </p><p><strong>test bold</strong>&nbsp;</p>'
text.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, '');

The result will be "test content test bold". Only the entities listed in the regex are removed.
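If the entities are not known in advance, one option is to match anything shaped like an entity instead of listing each one. The sketch below is an illustration under that assumption: stripTagsAndEntities is a made-up name, and it deletes entities rather than decoding them, so &amp; disappears instead of becoming &.

```javascript
// Hypothetical helper; the name stripTagsAndEntities is illustrative only.
function stripTagsAndEntities(text) {
  return text
    // Tags, using the same pattern as in the answer above
    .replace(/<[^>]*(>|$)/g, "")
    // Any numeric (&#39;), hex (&#x27;) or named (&nbsp;) entity; removed, not decoded
    .replace(/&(#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/gi, "");
}

const text = '<p>test content </p><p><strong>test bold</strong>&nbsp;</p>';
const cleaned = stripTagsAndEntities(text);
```

For the sample text, cleaned is the same "test content test bold" as before.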