Remove HTML tags in script

Question

I found this piece of code on the internet. It takes a sentence and turns every word into a link to that word's dictionary entry. But it has a weakness: if the sentence contains HTML, the script doesn't remove it.

For example: it replaces 'asserted' with 'http://www.merriam-webster.com/dictionary/asserted'

Could you please tell me what to change in this code for it to change 'asserted' to 'http://www.merriam-webster.com/dictionary/asserted'?

var content = document.getElementById("sentence").innerHTML;

var punctuationless = content.replace(/[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g, "");
var mixedCase = punctuationless.replace(/\s{2,}/g);
var finalString = mixedCase.toLowerCase();

var words = (finalString).split(" ");

var punctuatedWords = (content).split(" ");

var processed = "";
for (i = 0; i < words.length; i++) {
    processed += "<a href = \"http://www.merriam-webster.com/dictionary/" + words[i] + "\">";
    processed += punctuatedWords[i];
    processed += "</a> ";
}

document.getElementById("sentence").innerHTML = processed;

Possible duplicate of [JavaScript: How to strip HTML tags from string?](http://stackoverflow.com/questions/5002111/javascript-how-to-strip-html-tags-from-string) — evolutionxbox, Oct 18 '16 at 09:34
You might want to escape the `.` in your regexp like `\.`, like you do with `\-` for instance. — Azamantes, Oct 18 '16 at 09:37
Possible duplicate of [Strip HTML from Text JavaScript](http://stackoverflow.com/questions/822452/strip-html-from-text-javascript) — Andreas, Oct 18 '16 at 09:41

milo.farrell · Accepted Answer · 2016-10-18T15:13:24.160

5

The regex /<{1}[^<>]{1,}>{1}/g matches any text between < and >, together with the brackets themselves, so it can be used to replace every tag in a string with a space. For example, this

  var str = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>"
  str = str.replace(/<{1}[^<>]{1,}>{1}/g," ")
  document.writeln(str);

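will give back " How are you   I  love cake " (each tag becomes one space), which a browser displays as "How are you I love cake" because runs of whitespace collapse when rendered.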
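As a side note, the `{1}` quantifiers in that pattern are redundant and `{1,}` means the same as `+`, so `/<[^<>]+>/g` should behave identically. A minimal sketch comparing the two on the same sample string (the variable names here are only for illustration):

```javascript
const sample = "<hi>How are you<hi><table><tr>I<tr><table>love cake<g>";

// Pattern from the answer, and a shorter equivalent form.
const original = sample.replace(/<{1}[^<>]{1,}>{1}/g, " ");
const shorter = sample.replace(/<[^<>]+>/g, " ");

// Each tag becomes one space, so adjacent tags leave runs of spaces.
// Collapse the runs and trim the ends to get a clean sentence.
const tidy = shorter.replace(/\s+/g, " ").trim();

console.log(original === shorter); // true
console.log(tidy); // How are you I love cake
```

Collapsing the whitespace matters if the result is later split on single spaces, as the script in the question does.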

If you paste this

var stripHTML = mixedCase.replace(/<{1}[^<>]{1,}>{1}/g,"")

just below this

var mixedCase = punctuationless.replace(/\s{2,}/g);

and replace mixedCase with stripHTML in the line after that (the toLowerCase() line), it will probably work.
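One possible way to put the pieces together is to strip the tags before anything else, so that both the visible words and the URLs are built from the same tag-free text. This is only a sketch: `linkifySentence` is a hypothetical helper name, and it assumes words are separated by whitespace and that the punctuation set from the question is the one you want.

```javascript
// Hypothetical helper: turns an HTML sentence into one dictionary link per word.
function linkifySentence(html) {
  const base = "http://www.merriam-webster.com/dictionary/";

  // 1. Strip tags first, then collapse the leftover whitespace.
  const text = html.replace(/<[^<>]+>/g, " ").replace(/\s+/g, " ").trim();
  if (text === "") return "";

  // 2. Build one link per visible word.
  return text
    .split(" ")
    .map(function (word) {
      // Same punctuation set as the question's script, applied per word.
      const key = word
        .replace(/[.,\/#!$%\؟^?&\*;:{}=\-_`~()”“"]/g, "")
        .toLowerCase();
      // A "word" made only of punctuation gets no link.
      if (key === "") return word;
      return '<a href="' + base + encodeURIComponent(key) + '">' + word + "</a>";
    })
    .join(" ");
}

console.log(linkifySentence("He <b>asserted</b> it."));
```

In a page this could be used as `el.innerHTML = linkifySentence(el.innerHTML)`, where `el` is the element with id "sentence". Because each word is cleaned individually, the link text and the URL cannot get out of step the way two separately split arrays can.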

edited Oct 18 '16 at 15:13

answered Oct 18 '16 at 10:23

milo.farrell


Thank you a lot for your efforts to help me. That was really valuable. – Максим Ціпан Oct 19 '16 at 12:20

score 1 · Answer 2 · answered Oct 18 '16 at 09:37


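function stripAllHtml(str) {
  if (!str || !str.length) return ''

  // Drop <script> elements first; [\s\S] also matches newlines, unlike `.`
  str = str.replace(/<script[\s\S]*?>[\s\S]*?<\/script>/gi, '')

  // Let the browser parse the rest, then read back only the text
  let tmp = document.createElement("DIV");
  tmp.innerHTML = str;

  return tmp.textContent || tmp.innerText || "";
}

stripAllHtml('<a>test</a>') // "test"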

This function strips all the HTML and returns only the text.

Hopefully, this will work for you.
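A variation on the same idea, offered only as a sketch: the browser's `DOMParser` can parse the string into a separate document instead of a `div` created in the live page, and to my knowledge scripts in a document parsed this way are not run. `htmlToText` is a hypothetical name, and the regex fallback is an assumption added so the function also works where no DOM exists (for example in Node.js).

```javascript
// Hypothetical helper: returns the text content of an HTML string.
function htmlToText(html) {
  const source = String(html == null ? "" : html);

  if (typeof DOMParser === "undefined") {
    // No DOM available: fall back to a simple tag-stripping regex.
    return source.replace(/<[^<>]+>/g, "");
  }

  // Parse into a separate document and read back only its text.
  const doc = new DOMParser().parseFromString(source, "text/html");
  return doc.body.textContent || "";
}

console.log(htmlToText("<b>asserted</b> twice")); // asserted twice
```

The result can then be passed to whatever code builds the dictionary URLs, in place of the raw `innerHTML` string.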


Gaurav joshi


Thank you a lot for your answer. Would you please tell me where to place this function in the script? – Максим Ціпан Oct 18 '16 at 09:50
@МаксимЦіпан Anywhere you want just pass your HTML string to this function and use the returned result while making your URLs – Gaurav joshi Oct 18 '16 at 10:29
@gaurav joshi Thank you a lot for your help and patience. – Максим Ціпан Oct 19 '16 at 12:19
@МаксимЦіпан If this works for you, then please mark my answer as accepted or upvote it – Gaurav joshi Oct 20 '16 at 05:52

score 0 · Answer 3 · answered Sep 17 '20 at 11:56


If you need to remove both HTML tags and HTML entities, you can use

const text = '<p>test content </p><p><strong>test bold</strong>&nbsp;</p>'
text.replace(/<[^>]*(>|$)|&nbsp;|&zwnj;|&raquo;|&laquo;|&gt;/g, '');

The result will be "test content test bold". Note that only the entities listed in the pattern are removed.
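If the input may contain entities other than the handful listed above, one option is a more general pattern for character references. The following is a sketch under that assumption; the pattern and variable names are mine, and the sample string adds `&copy;` and `&#169;` to the answer's example purely for illustration.

```javascript
const input =
  "<p>test content </p><p><strong>test bold</strong>&nbsp;&copy;&#169;</p>";

// Matches tags, then any named (&name;), decimal (&#169;) or
// hexadecimal (&#xA9;) character reference.
const tagOrEntity = /<[^>]*(>|$)|&(?:[a-z][a-z0-9]*|#\d+|#x[0-9a-f]+);/gi;

const cleaned = input.replace(tagOrEntity, "");
console.log(cleaned); // test content test bold
```

Keep in mind that this deletes the entities rather than decoding them, so a reference such as `&amp;` disappears instead of becoming `&`.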


HusseiELbadady

