I'm new to regex and I can't get it to do what I need.

Let's suppose we have this text:

<h1>Título</h1>
<h2>Los gatos felices</h2>
Existen una serie de gatos...
<h2 style="color:red" class="grande">los gatos: curiosidades</h2>
<p style='text-align: justify;' align='justify'>De por si 
<strong>los gatos</strong> saben saltar y además 
<strong>los perros odian a los gatos</strong>
</p>

I need to get all tags that contain the text "los gatos".

It should produce 4 matches:

- <h2>Los gatos felices</h2>

- <h2 style="color:red" class="grande">los gatos: curiosidades</h2>

- <strong>los gatos</strong>

- <strong>los perros odian a los gatos</strong>

How can I solve it with a regular expression?

Edit:

I finally found what I need! I'm sharing it in case anyone else needs it:

<(.*)([^<]*)>([^<]*)los gatos([^<]*)<\/\1>
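For anyone who wants to try it, here is a minimal sketch of running that pattern in C#. The class and member names (`TagRegexDemo`, `FindTags`, `Sample`) are made up for illustration, and the sample lines are joined with `\n` as an assumption about line endings. Note that the expected matches include "Los gatos" with a capital "L", so the sketch passes `RegexOptions.IgnoreCase`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagRegexDemo
{
    // The pattern from the edit above, unchanged.
    private const string Pattern = @"<(.*)([^<]*)>([^<]*)los gatos([^<]*)<\/\1>";

    // Sample HTML from the question, joined with "\n" (assumed line ending).
    public static readonly string Sample = string.Join("\n",
        "<h1>Título</h1>",
        "<h2>Los gatos felices</h2>",
        "Existen una serie de gatos...",
        "<h2 style=\"color:red\" class=\"grande\">los gatos: curiosidades</h2>",
        "<p style='text-align: justify;' align='justify'>De por si ",
        "<strong>los gatos</strong> saben saltar y además ",
        "<strong>los perros odian a los gatos</strong>",
        "</p>");

    // Returns the full text of every match of the pattern in the given HTML.
    public static string[] FindTags(string html, RegexOptions options)
    {
        return Regex.Matches(html, Pattern, options)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // IgnoreCase is needed so that "Los gatos" also matches.
        foreach (var tag in FindTags(Sample, RegexOptions.IgnoreCase))
            Console.WriteLine(tag);
    }
}
```

On the sample text this should print the four tags listed in the question. Without `RegexOptions.IgnoreCase` the first `<h2>` is missed. The pattern only looks at text sitting directly between an opening tag and its closing tag, so it is a fit for this sample rather than for HTML in general.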
halfer
Dalamar
  • You [do NOT solve this with a regex](http://stackoverflow.com/a/1732454/3764814) - use an HTML parser. Try AngleSharp for instance. – Lucas Trzesniewski Oct 13 '16 at 18:29
  • Hi. First of all, this question is not a duplicate. What I need is not to get tags. I need to get all tags that contain the text "los gatos". In this case that would be tags, but there may be more tags like , ... – Dalamar Oct 14 '16 at 06:25
  • @Dalamar, Lucas is right - regular expressions are not the right tool to parse HTML. – halfer Oct 31 '16 at 19:15

1 Answer

Instead of a regex, use a real HTML parser such as HtmlAgilityPack:

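using System.Linq; // needed for Select/ToList

// Parse the HTML string into a document tree
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(yourhtmlstring);

// Collect the inner text of every <h2> element
var h2s = doc.DocumentNode.SelectNodes("//h2").Select(x => x.InnerText).ToList();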
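The snippet above only collects `<h2>` elements. To get every tag whose own text contains "los gatos", as the question asks, one possible approach is to walk all elements and check their direct text children. This is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced; `TagParserDemo` and `FindTags` are hypothetical names, and the choice to compare case-insensitively is an assumption based on the expected matches in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TagParserDemo
{
    // Returns the outer HTML of every element that has a direct text child
    // containing `needle` (case-insensitive).
    public static List<string> FindTags(string html, string needle)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()
            // Keep elements only; skip text and comment nodes.
            .Where(n => n.NodeType == HtmlNodeType.Element)
            // Look at direct text children, so a parent such as <p> is not
            // reported just because a nested <strong> contains the text.
            .Where(n => n.ChildNodes.Any(c =>
                c.NodeType == HtmlNodeType.Text &&
                c.InnerText.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0))
            .Select(n => n.OuterHtml)
            .ToList();
    }
}
```

Calling `TagParserDemo.FindTags(yourhtmlstring, "los gatos")` should then return the `<h2>` and `<strong>` elements from the question while skipping the enclosing `<p>`, since the check is limited to each element's direct text children.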
L.B