It is considered better to parse HTML with a proper DOM parser than to use regular expressions on it, so I'll give that solution first:
1. With DOMDocument
Use DOMDocument in combination with DOMXPath for this.
Here is code that only gets the content of the third column, which contains date/times:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//td[3]');
$matches = array_map(function ($td) {
    return $td->textContent;
}, iterator_to_array($elements));
This code runs an XPath query that finds every td element that is the third td child of its parent (tr), and then collects the text content of each matched td into an array.
If the $html variable has this string:
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="facturas">
<tr>
<td align="center">13.44.333-3</td>
<td align="center">asdf3</td>
<td align="center">15/01/2016 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
<tr>
<td align="center">13.333.333-3</td>
<td align="center">asdf3</td>
<td align="center">16/01/2016 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
<tr>
<td align="center">13.333.333-3</td>
<td align="center">asdf3</td>
<td align="center">11/01/2015 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
</table>
Then $matches will be the following array:
array (
'15/01/2016 00:22:16',
'16/01/2016 00:22:16',
'11/01/2015 00:22:16',
)
See the code run with output on eval.in.
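For readers who want to run this locally, here is a minimal self-contained sketch of the same approach. It uses a shortened, two-row version of the table above. The calls to libxml_use_internal_errors and libxml_clear_errors are my own addition, on the assumption that real-world HTML may trigger parser warnings you would rather collect than display.

```php
<?php
// Shortened sample input: two rows of the table shown above.
$html = <<<'HTML'
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="facturas">
<tr>
<td align="center">13.44.333-3</td>
<td align="center">asdf3</td>
<td align="center">15/01/2016 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
<tr>
<td align="center">13.333.333-3</td>
<td align="center">asdf3</td>
<td align="center">16/01/2016 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
</table>
HTML;

// Assumption: collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the third td of every row and keep its text content.
$xpath = new DOMXPath($doc);
$matches = [];
foreach ($xpath->query('//td[3]') as $td) {
    $matches[] = trim($td->textContent);
}

print_r($matches);
// $matches is now ['15/01/2016 00:22:16', '16/01/2016 00:22:16']
```

The foreach loop does the same job as array_map with iterator_to_array; which one to use is a matter of taste.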
Some alternative XPath queries:
If $html could contain other tables, limit the search to the table of interest, e.g. the one whose id equals facturas:
//*[@id="facturas"]//td[3]
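To match only td elements that have the align attribute set to "center" (append the predicate to the position test, as in //td[3][@align="center"], to stay within the third column):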
//td[@align="center"]
To find td elements whose text contains a specific string, like "/2016":
//td[contains(., "/2016")]
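To see how these queries differ, here is a small sketch that runs them against a document with two tables. The second table (id "otra"), its contents, and the cellTexts helper are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical input: an unrelated table followed by the table of interest.
$html = <<<'HTML'
<table id="otra">
<tr><td>x</td><td>y</td><td>01/01/2000 00:00:00</td></tr>
</table>
<table id="facturas">
<tr>
<td align="center">13.44.333-3</td>
<td align="center">asdf3</td>
<td align="center">15/01/2016 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
<tr>
<td align="center">13.333.333-3</td>
<td align="center">asdf3</td>
<td align="center">11/01/2015 00:22:16</td>
<td align="center">$ 1531</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // assumption: keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: run a query and return the trimmed text of each node.
function cellTexts(DOMXPath $xpath, $query) {
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Unscoped: also picks up the third cell of the unrelated table.
print_r(cellTexts($xpath, '//td[3]'));

// Scoped to the table with id "facturas": only its two date cells.
print_r(cellTexts($xpath, '//*[@id="facturas"]//td[3]'));

// Every td with align="center", regardless of column: 8 cells here.
echo count(cellTexts($xpath, '//td[@align="center"]')), "\n";

// Only td elements whose text contains "/2016".
print_r(cellTexts($xpath, '//td[contains(., "/2016")]'));
```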
2. With a Regular Expression
Although not advised, you could use a regular expression.
If you still want to go for this, then use this code:
preg_match_all("/<td[^>]*\>\s*(\d\d\/\d\d\/\d{4}\b[^<]*)<\/td\s*>/mis",
$html, $matches);
This will match td elements that contain a value that starts with text in the format "99/99/9999" (9 can be any digit).
Now $matches will be:
array (
0 =>
array (
0 => '<td align="center">15/01/2016 00:22:16</td>',
1 => '<td align="center">16/01/2016 00:22:16</td>',
2 => '<td align="center">11/01/2015 00:22:16</td>',
),
1 =>
array (
0 => '15/01/2016 00:22:16',
1 => '16/01/2016 00:22:16',
2 => '11/01/2015 00:22:16',
),
)
See the code run with output on eval.in
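If only the date/time strings are needed, they are in the first capture group, i.e. $matches[1]. A minimal sketch with a shortened, hypothetical input string:

```php
<?php
// Hypothetical shortened input: two rows with two cells each.
$html = '<tr><td align="center">asdf3</td><td align="center">15/01/2016 00:22:16</td></tr>'
      . '<tr><td align="center">asdf3</td><td align="center">11/01/2015 00:22:16</td></tr>';

preg_match_all("/<td[^>]*\>\s*(\d\d\/\d\d\/\d{4}\b[^<]*)<\/td\s*>/mis",
    $html, $matches);

// $matches[0] holds the full td matches, $matches[1] the captured text.
$dates = array_map('trim', $matches[1]);

print_r($dates);
// $dates is now ['15/01/2016 00:22:16', '11/01/2015 00:22:16']
```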
But note that, in general, text in HTML can contain entities like &gt; (which can be solved with html_entity_decode), td elements can contain <br> or other tags (which can sometimes be solved with strip_tags), and tag attributes can have values that contain HTML, which could trick the regular expression. The same goes for script tags, which may have JavaScript that contains HTML strings in variables.
These are just examples. The list of things that can make such a regular expression go wrong is long. None of this is a problem when using the DOM parser, but with regular expressions it is nearly impossible to get it right for all possible cases.
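As an illustration of one such case, here is a sketch in which the date cells contain nested tags. The input is hypothetical. The regular expression from above finds nothing, while the DOM approach still returns both values.

```php
<?php
// Hypothetical input: the date cells wrap (part of) their text in other tags.
$html = <<<'HTML'
<table id="facturas">
<tr><td>a</td><td>b</td><td><b>15/01/2016</b> 00:22:16</td></tr>
<tr><td>a</td><td>b</td><td><span>16/01/2016 00:22:16</span></td></tr>
</table>
HTML;

// Regular expression: expects the date right after the opening td tag,
// so the nested <b> and <span> tags prevent any match.
preg_match_all("/<td[^>]*\>\s*(\d\d\/\d\d\/\d{4}\b[^<]*)<\/td\s*>/mis",
    $html, $regexMatches);
echo count($regexMatches[1]), "\n"; // prints 0

// DOM parser: textContent concatenates the text of all descendants.
libxml_use_internal_errors(true); // assumption: keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$domValues = [];
foreach ($xpath->query('//td[3]') as $td) {
    $domValues[] = trim($td->textContent);
}
print_r($domValues);
// $domValues is now ['15/01/2016 00:22:16', '16/01/2016 00:22:16']
```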
Solution 1 is therefore the one to go for.