I'm trying to get the content between two tags, with a condition.

Here is an example: I want to get all the tr rows, but only the ones that contain /c.

Example string:

<tr><td> /a </td></tr>
<tr><td> /b </td></tr>
<tr><td> /c </td></tr>
<tr><td> /d </td></tr>

With what I have so far, I get all of the tr rows:

preg_match_all("/<tr.*<\/tr>/", $input_lines, $output_array);

What do I need to change so that only the rows containing /c are matched?

Thank you.

  • ***[Obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454).*** – Sam Jun 04 '14 at 21:09
  • @Sam [Obligatory link](http://stackoverflow.com/q/4231382/471272): please link only to answers, not to non-answers as you have done here. – tchrist Jun 08 '14 at 19:59
  • @tchrist mind linking me to a Meta Q/A where the reason for this is outlined? I didn't link to a specific answer to OP's question, but it is a very (in my opinion) obligatory resource for OP to read regarding HTML parsing with RegEx. Since it wasn't an answer, I commented. I link to [this article](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/) when users set out to validate name inputs. Should I not be doing this either? – Sam Jun 08 '14 at 20:23
  • @Sam [Here](http://meta.stackexchange.com/search?q=zalgo) and [here](http://meta.stackoverflow.com/a/250100/471272). Attacking small snippets of HTML using pattern matching is trivial and common, since we do it in our editors all the time. Telling people not to do that is like telling them not to use *vi* or *emacs* on an HTML file: it’s complete nonsense. Rather, it’s the full spec-compliant parsing of a full webpage of unknown provenance wherein lies the rub. – tchrist Jun 08 '14 at 20:31
  • Fair enough @tchrist, thank you for the links. They give me a better understanding of your reasoning. – Sam Jun 09 '14 at 14:23

1 Answer

preg_match_all("/<td>([^<]*\/c[^<]*)<\/td>/", $input_lines, $output_array);

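In other words: [^<]* cannot match a <, so if the regex reaches the next < before it has found /c, that cell fails to match. Note that the / in /c has to be escaped as \/ because / is also the pattern delimiter; unescaped, PHP would end the pattern there and complain about an unknown modifier 'c'.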
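A minimal runnable sketch of the pattern above, applied to the question's example string (rebuilt here as a PHP string with one row per line). The second pattern is an illustrative extension, not part of the original answer: since the question asks for whole rows, it wraps the same idea in <tr>...</tr> so the full row comes back in the full-match group.

```php
<?php
// The example rows from the question, one per line.
$input_lines = "<tr><td> /a </td></tr>\n"
    . "<tr><td> /b </td></tr>\n"
    . "<tr><td> /c </td></tr>\n"
    . "<tr><td> /d </td></tr>\n";

// The answer's pattern; "\/c" is escaped because "/" is the delimiter.
// Group 1 holds the cell content, e.g. " /c ".
preg_match_all('/<td>([^<]*\/c[^<]*)<\/td>/', $input_lines, $output_array);

// Illustrative extension: match the surrounding <tr> as well, so that
// $rows[0] holds complete rows rather than just cells.
preg_match_all('/<tr><td>([^<]*\/c[^<]*)<\/td><\/tr>/', $input_lines, $rows);

echo $rows[0][0], "\n"; // <tr><td> /c </td></tr>
```

Only the /c row matches: for every other cell, [^<]* runs into the < of </td> without having seen /c.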

If other HTML tags can appear inside the td, you need to replace [^<]* with the lazy .*?.
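A short sketch of why the swap matters, using a hypothetical input (not from the question) in which the /c cell contains a nested <b> tag. The strict [^<]* version cannot step over <b>, while the .*? version can.

```php
<?php
// Hypothetical input: the /c cell now contains a nested <b> tag.
$input_lines = "<tr><td> /a </td></tr>\n"
    . "<tr><td> <b>/c</b> </td></tr>\n";

// [^<]* stops at the "<" of "<b>", so this finds nothing.
$strict = preg_match_all('/<td>([^<]*\/c[^<]*)<\/td>/', $input_lines, $m1);

// .*? may cross "<b>" and "</b>", so the second cell matches.
$lazy = preg_match_all('/<td>(.*?\/c.*?)<\/td>/', $input_lines, $m2);

echo $strict, ' ', $lazy, "\n"; // 0 1
```

Without the s modifier, . does not match a newline, so on this input each .*? stays within its own line.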

<td>([^<]*/c[^<]*)<\/td>

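Note: although .*? is lazy, it can still produce unwanted, overly long matches. The engine starts at the leftmost <td>, and .*? keeps expanding past </td> until it finds /c. So when the rows are not separated by newlines (or the s modifier is used), the following pattern matches from the /a cell all the way through the /c cell: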

<td>(.*?/c.*?)<\/td>
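A sketch of that over-match, assuming the question's rows are joined on a single line. As a possible fix, it also shows a "tempered" token, (?:(?!<\/td>).)*?, which is a different technique from the original answer: each character may be consumed only if </td> does not start at that position, so a match can cross nested tags but never leaves its own cell. It is still a regex, so nested tables and other malformed HTML are out of scope.

```php
<?php
// The question's rows, but with no newlines between them.
$single_line = '<tr><td> /a </td></tr><tr><td> /b </td></tr>'
    . '<tr><td> /c </td></tr><tr><td> /d </td></tr>';

// Over-match: the match begins at the /a cell's <td> and runs to "/c </td>".
preg_match_all('/<td>(.*?\/c.*?)<\/td>/', $single_line, $m);
echo $m[0][0], "\n";

// Tempered token: "." is only consumed where "</td>" does not begin, so
// the match cannot cross a cell boundary (even with the s modifier on).
preg_match_all('/<td>((?:(?!<\/td>).)*?\/c(?:(?!<\/td>).)*?)<\/td>/s', $single_line, $fixed);
echo $fixed[0][0], "\n"; // <td> /c </td>
```

The first echo prints everything from the /a cell's <td> through the /c cell's </td>, which is the unwanted result described above.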

dognose