I am using the following regular expression to remove HTML tags from a string:

(?:<style.+?>.+?</style>|<script.+?>.+?</script>|<(?:!|/?[a-zA-Z]+).*?/?>)

This expression works well... a little too well. I want to exclude the HTML comment tags like:

  • <!--/ nav -->
  • <!--end nav-->
  • <!-- subnavup -->
  • <!--/ subnavup -->

Not just these examples, but all HTML comment tags. What changes would I need to make to my regex to accomplish this?

admdrew
  • 2
    3..2..1 htmlagilitypack. – Uwe Keim Jun 16 '14 at 18:04
  • 3
    Rule 1: don't use RegEx to parse HTML. Rule 2: if you still want to parse HTML with RegEx, see rule 1. [RegEx can only match regular languages, and HTML is not a regular language](http://stackoverflow.com/a/590789/930393) – freefaller Jun 16 '14 at 18:05
  • 2
    @UweKeim - Does HtmlAgilityPack handle comments usefully? Or does it ignore them entirely because they're *comments*? – Bobson Jun 16 '14 at 18:08
  • @Bobson http://stackoverflow.com/questions/13441470/htmlagilitypack-remove-script-and-style – L.B Jun 16 '14 at 18:18
  • Apart from what the others have said... What you're trying to do is explained in detail in this question about [matching pattern x but excluding y](http://stackoverflow.com/questions/23589174/match-or-replace-a-pattern-except-in-situations-s1-s2-s3-etc/23589204#23589204), have a look. – zx81 Jun 16 '14 at 18:21
  • To follow up on @L.B's vague comment, the very last comment on the answer to that question shows that HtmlAgilityPack will support comments and assigns them a `.Name` of `"#comment"`. – Bobson Jun 16 '14 at 18:23
  • Maybe I am missing something, but how in the world is this a duplicate of that other post? This post is asking about general regex use and the other is asking about extracting text using C#? – Jake Wilson Dec 22 '17 at 16:26
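
For readers who want to see what the parser-based route suggested in the comments looks like, here is a minimal sketch. It uses Python's standard-library `html.parser` as a stand-in for HtmlAgilityPack (an assumption for illustration; the thread itself is about .NET), and the names `CommentKeepingStripper` and `strip_with_parser` are hypothetical. The parser reports comments through their own callback, so keeping them while dropping tags needs no special regex.

```python
from html.parser import HTMLParser


class CommentKeepingStripper(HTMLParser):
    """Drops tags and script/style bodies, keeps text and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # pieces of output, joined at the end
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text nodes are kept unless they belong to script/style.
        if not self.skip_depth:
            self.parts.append(data)

    def handle_comment(self, data):
        # Re-emit the comment with its delimiters restored.
        self.parts.append(f"<!--{data}-->")


def strip_with_parser(html: str) -> str:
    parser = CommentKeepingStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Called on `<div><!--/ nav --><p>Hi</p></div>`, this keeps the comment and the text and discards the element tags.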

1 Answer

Try:

(<!--[\S\s]+?-->)

I didn't test it, but I'm quite sure it works.
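
On its own, a pattern like this matches the comments rather than excluding them. One way to fold it into the question's expression is to put the comment pattern first, inside a capture group, and write that group back during replacement. The sketch below does this in Python rather than .NET, which is an assumption made for illustration. The names `TAG_OR_COMMENT` and `strip_tags_keep_comments` are hypothetical, and `*?` is used instead of `+?` so that an empty comment (`<!---->`) is also kept. The remaining alternatives are copied from the question unchanged.

```python
import re

# Group 1 captures a whole HTML comment. The other alternatives are the
# question's original pattern and capture nothing.
TAG_OR_COMMENT = re.compile(
    r"(<!--[\s\S]*?-->)"
    r"|<style.+?>.+?</style>"
    r"|<script.+?>.+?</script>"
    r"|<(?:!|/?[a-zA-Z]+).*?/?>"
)


def strip_tags_keep_comments(html: str) -> str:
    # If group 1 took part in the match it is a comment: put it back.
    # Otherwise the match is a tag or script/style block: drop it.
    return TAG_OR_COMMENT.sub(lambda m: m.group(1) or "", html)
```

Because the comment alternative is tried first at each position, markup inside a comment is consumed as part of the comment and is not stripped separately.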

admdrew
Hermios
  • 2
    In my opinion, this is a wrong answer. I'm quite sure. – Uwe Keim Jun 16 '14 at 18:16
  • Well, your objection lacks arguments. I know that HTML shouldn't be parsed with regex. But since this case only looks for a specific start and end delimiter, it is possible here... If not, please elaborate. – Hermios Jun 16 '14 at 18:21
  • admdrew, your code finds all of the elements I am trying to exclude. I need this incorporated into my original regex so that the comment tags are not matched. – jordanburnam1990 Jun 16 '14 at 18:24
  • Hi, actually I'm not admdrew but Hermios. Anyway, in that case I agree with the others: don't parse HTML with regex! – Hermios Jun 16 '14 at 18:26