
I am trying to get the text inside the HTML span shown below (in this case, 31).

Here is the original markup (from Inspect Element in Chrome):

<span id="point_total" class="tooltip" oldtitle="Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again." aria-describedby="ui-tooltip-0">31</span>

I have a RichTextBox that contains the source of the page. Here is the same element as it appears on line 51 of the RichTextBox:

<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip jQuery16207621750175125325="23" oldtitle="Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again.">17</SPAN><BR>Points </DIV><IMG style="FLOAT: right" title="Gain subscribers" border=0 alt="When people subscribe to you, you lose a point" src="http://static.subxcess.com/images/page/decoration/remove-1-point.png"> </DIV>

How would I go about doing this? I have tried several methods and none of them seem to work for me.

I am trying to retrieve the point value from this page: http://www.subxcess.com/sub4sub.php (the number changes depending on who subs you).

Jonathan Leffler
  • You can add a "runat=server" to your span and get the inner text if you need to access it in your codebehind. – Tim Jun 25 '12 at 16:15
  • Is a jquery solution okay with you? – Dexter Huinda Jun 25 '12 at 16:16
  • var yourdata = $('span').html(); – Dexter Huinda Jun 25 '12 at 16:17
  • The program I have written is in C#. As I am fairly new to it, could you please explain what you mean by a jQuery solution? I have tried a few regex methods I found online, and I have also tried to use the HTML Agility Pack library to find the string. – Connor Spencer Harries Jun 25 '12 at 16:17
  • Where are you trying to access this data? In the c# code in the codebehind or actively on the client side? – Tony Jun 25 '12 at 16:18
  • jQuery is a client-side JavaScript library. I don't know whether you require a server-side solution. – Dexter Huinda Jun 25 '12 at 16:19
  • I am trying to get my program to detect it (using code in the program rather than on the server, as I can't access the server; it's not mine). My program has a label like "Points:", and I am trying to get the label to say "Points: *VALUE*", where VALUE = 39 in my case. – Connor Spencer Harries Jun 25 '12 at 16:21
  • If you require a client-side solution, use jQuery; it's much easier. `var yourdata = $('#point_total').html();` – Dexter Huinda Jun 25 '12 at 16:26

3 Answers


You could be incredibly specific about it:

using System.Text.RegularExpressions;

// Matches only this exact attribute layout and captures the span's text in group 1.
var regex = new Regex(@"<span id=""point_total"" class=""tooltip"" oldtitle="".*?"" aria-describedby=""ui-tooltip-0"">(.*?)</span>");

var match = regex.Match(@"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>");

var result = match.Groups[1].Value; // "31" here; an empty string if the pattern did not match
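The pattern above only matches Chrome's serialization of the element. The markup in the question's RichTextBox uses an uppercase SPAN, unquoted attribute values, an extra jQuery attribute and no aria-describedby, so that exact pattern will not match it. A somewhat more tolerant sketch, assuming the id value point_total is the only stable feature of the tag (the PointTotalRegex class, its members and the sample constants are hypothetical; the samples are copied from the question):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PointTotalRegex
{
    // Sample markup copied from the question (hypothetical constant names).
    public const string ChromeSample =
        @"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>";

    public const string RichTextBoxSample =
        @"<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip jQuery16207621750175125325=""23"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."">17</SPAN><BR>Points </DIV>";

    // <span\b                  a span tag in any letter case (IgnoreCase), so <SPAN> matches too
    // [^>]*\bid\s*=\s*         any other attributes, then an id attribute
    // ["']?point_total\b["']?  the id value, quoted or unquoted
    // [^>]*>                   the rest of the opening tag
    // (?<value>.*?)</span>     lazily capture the text up to the first closing tag
    private static readonly Regex SpanPattern = new Regex(
        @"<span\b[^>]*\bid\s*=\s*[""']?point_total\b[""']?[^>]*>(?<value>.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed span text, or null if no matching span is found.
    public static string ExtractPointTotal(string html)
    {
        Match match = SpanPattern.Match(html);
        return match.Success ? match.Groups["value"].Value.Trim() : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractPointTotal(ChromeSample));      // 31
        Console.WriteLine(ExtractPointTotal(RichTextBoxSample)); // 17
    }
}
```

This is still a heuristic: it stops at the first </span>, so a value containing nested tags would be cut short, and it assumes no other span on the page carries the same id.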
Dave Bish

You'll want to use HtmlAgilityPack for this; it's pretty simple:

// Fully qualified to avoid ambiguity with System.Windows.Forms.HtmlDocument in a WinForms project
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("filepath"); // or doc.LoadHtml(html) to parse markup you already hold in a string, such as the RichTextBox text

// Select the span by its id; "//span" on its own would just return the first span on the page.
// To test more attributes, combine them with "and", e.g. "//span[@id='point_total' and @class='tooltip']"
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[@id='point_total']");

string value = node != null ? node.InnerText : null; //this string will contain the value of span, i.e. <span>***value***</span>

Regex, while a viable option, is generally something you want to avoid for parsing HTML if at all possible (see Here).

In terms of sustainability, make sure that you understand the page source: refresh it a few times, check that your target span is nested within the same parents after every refresh and that the page keeps the same general format, and then navigate to the span using the principle above.

Majid
gfppaste
  • This code works for me, except that it keeps displaying the same number no matter what the actual value is. – Connor Spencer Harries Jun 25 '12 at 16:47
  • Are you making sure to reload the page source? – gfppaste Jun 25 '12 at 16:49
  • Yes, I have a timer set to refresh the source every 5 seconds. – Connor Spencer Harries Jun 25 '12 at 16:50
  • Thanks for redirecting me back to the HTML Agility Pack, something I've used in the past. I just added the reference and used your code as a starting point in my own project. The only thing I'd like to add for any other latecomers is to make sure to specify the type to prevent ambiguity: I noticed that '.Load()' would not compile until I changed to 'HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();' – Justin May 08 '13 at 23:01

There are multiple possibilities.

  1. Regex
  2. Parse the HTML as XML and get the value via XPath (this only works if the markup is also well-formed XML)
  3. Iterate through all elements. When you reach the span tag, skip all characters until you find its closing '>'. The value you need is then everything before the next opening '<'.
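Options 2 and 3 might be sketched in C# as follows. The PointTotalParsers class, its method names and the sample constants are hypothetical. The XPath variant uses LINQ to XML (XDocument.Parse plus the XPathSelectElement extension from System.Xml.XPath), so it only handles markup that is well-formed XML, such as the Chrome snippet; XDocument.Parse would throw on the RichTextBox source because its attributes are unquoted. The hand-written scan does not care about quoting or tag case.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class PointTotalParsers
{
    // Sample markup copied from the question (hypothetical constant names).
    public const string ChromeSample =
        @"<span id=""point_total"" class=""tooltip"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."" aria-describedby=""ui-tooltip-0"">31</span>";

    public const string RichTextBoxSample =
        @"<DIV id=point_display>You have<BR><SPAN id=point_total class=tooltip jQuery16207621750175125325=""23"" oldtitle=""Note: If the number is black, your points are actually a little bit negative.  Don't worry, this just means you need to start subbing again."">17</SPAN><BR>Points </DIV>";

    // Option 2: parse as XML and query with XPath.
    // Throws System.Xml.XmlException if the input is not well-formed XML.
    public static string FromXPath(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement span = doc.XPathSelectElement("//span[@id='point_total']");
        return span?.Value.Trim();
    }

    // Option 3: walk the string by hand. Returns null if no matching span is found.
    public static string FromScan(string html)
    {
        int searchFrom = 0;
        while (true)
        {
            // Find the next opening span tag, case-insensitively so <SPAN> matches too.
            int tagStart = html.IndexOf("<span", searchFrom, StringComparison.OrdinalIgnoreCase);
            if (tagStart < 0) return null;

            // Skip all characters until the '>' that closes this tag.
            int tagEnd = html.IndexOf('>', tagStart);
            if (tagEnd < 0) return null;

            // Simplification: accept the tag if "point_total" appears anywhere inside it.
            string tag = html.Substring(tagStart, tagEnd - tagStart);
            if (tag.IndexOf("point_total", StringComparison.OrdinalIgnoreCase) >= 0)
            {
                // The value is everything before the next opening '<'.
                int textEnd = html.IndexOf('<', tagEnd + 1);
                if (textEnd < 0) return null;
                return html.Substring(tagEnd + 1, textEnd - tagEnd - 1).Trim();
            }

            searchFrom = tagEnd + 1;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FromXPath(ChromeSample));     // 31
        Console.WriteLine(FromScan(ChromeSample));      // 31
        Console.WriteLine(FromScan(RichTextBoxSample)); // 17
    }
}
```

The scan is deliberately naive: it would also accept a span whose other attributes merely mention point_total, and it stops at the first '<', so it assumes the value is plain text.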

Also look at System.Windows.Forms.HtmlDocument (exposed by the WebBrowser control's Document property).

nhahtdh