
I have HTML code like this:

<tr class="discussion r0"><td class="topic starter"><a href="SITE?d=6638">Test di matematica</a></td>

I need to select only the text "Test di matematica", and I thought I could do this with a regular expression. I tried:

 string pattern= "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"" + site + "=d{1,4}\"" + ">\\s*(.+?)\\s*</a></td>";

but it doesn't work. How can I select the text that comes after one expression and before another?
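For reference, assuming site holds only the href prefix before "?d=" (here "SITE"), the attempted pattern fails for two reasons: it never matches the literal "?d=" part of the href, and "=d{1,4}" matches an equals sign followed by one to four letters d rather than digits (digits need "\d{1,4}"). The sketch below keeps the question's anchoring approach with both problems fixed; the helper name ExtractTitle is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: the question's pattern with "?d=" added and "\d" used for digits.
    public static string ExtractTitle(string html, string site)
    {
        string pattern = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\""
            + Regex.Escape(site)                 // escape any regex metacharacters in the prefix
            + @"\?d=\d{1,4}"">\s*(.+?)\s*</a></td>"; // "\?" is a literal '?', "\d{1,4}" is 1-4 digits

        Match m = Regex.Match(html, pattern);
        // Groups[1] holds the link text; return null when the pattern does not match.
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Assumption: the href prefix passed as site is "SITE".
        string html = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";
        Console.WriteLine(ExtractTitle(html, "SITE"));
    }
}
```

The anchoring still depends on the exact attribute order, quoting and whitespace of the markup, so it breaks as soon as the HTML changes slightly.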

EDIT: Can you tell me how to parse this string with HTML Agility Pack? Thanks.
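A minimal sketch of the HTML Agility Pack route, assuming the HtmlAgilityPack NuGet package is referenced: load the fragment into an HtmlDocument, select the link with an XPath query on the td's class attribute, and read its text. The helper name GetTopicTitle is hypothetical. Unlike the regex approaches, this does not depend on attribute order, quoting, or whitespace inside the tags.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

class Program
{
    // Hypothetical helper: returns the text of the link inside the "topic starter" cell, or null.
    public static string GetTopicTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // the parser is lenient, so an unclosed <tr> is tolerated

        // XPath: any <a> below a <td> whose class attribute is exactly "topic starter".
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//td[@class='topic starter']//a");
        if (link == null) return null;

        // DeEntitize decodes HTML entities (such as &amp;) that may appear in the link text.
        return HtmlEntity.DeEntitize(link.InnerText).Trim();
    }

    static void Main()
    {
        string html = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";
        Console.WriteLine(GetTopicTitle(html));
    }
}
```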

user3579313

2 Answers


Try this:

using System;
using System.Text.RegularExpressions;

string myString = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";
// Lazily match the opening <a ...> tag, then capture everything up to the closing </a>
Regex rx = new Regex(@"<a.*?>(.*?)</a>");
MatchCollection matches = rx.Matches(myString);
if (matches.Count > 0)
{
    Match match = matches[0]; // only one match in this case
    GroupCollection groupCollection = match.Groups;
    Console.WriteLine(groupCollection[1].ToString()); // group 1 is the captured link text
}

Demo: http://ideone.com/nFY6aw

Pedro Lobito

This regex ensures that the captured text is inside an <a> tag, which is inside a <td> tag, which is inside a <tr> tag.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s1 = "<tr class=\"discussion r0\"><td class=\"topic starter\"><a href=\"SITE?d=6638\">Test di matematica</a></td>";

        // Skip the attributes of <tr>, <td> and <a> (allowing whitespace between the tags),
        // then capture every character up to the next '<'. IgnoreCase also accepts <TR>, <TD>, <A>.
        var r = new Regex(@"<tr[^>]*?>\s*<td[^>]*?>\s*<a[^>]*?>([^<]*)<", RegexOptions.IgnoreCase);
        string capture = r.Match(s1).Groups[1].Value;
        Console.WriteLine(capture);
        Console.WriteLine("\nPress Any Key to Exit.");
        Console.ReadKey();
    } // END Main
} // END Program

Output: Test di matematica

zx81