PHP regular expression to extract a date and time from a td

Question

I need to extract the date and time with a regular expression, but it doesn't work and I don't know why.

    <tr>
        <td align="center">13.44.333-3</td>
        <td align="center">asdf3</td>
        <td align="center">15/01/2016 00:22:16</td>
        <td align="center">$ 1531</td>
    </tr>
 <tr>
        <td align="center">13.333.333-3</td>
        <td align="center">asdf3</td>
        <td align="center">16/01/2016 00:22:16</td>
        <td align="center">$ 1531</td>
    </tr>
 <tr>
        <td align="center">13.333.333-3</td>
        <td align="center">asdf3</td>
        <td align="center">11/01/2015 00:22:16</td>
        <td align="center">$ 1531</td>
    </tr>

The regular expression I use:

preg_match_all("/<td align=\"center\"\>[\s]*([^\s\<\/]*)<\/td>[\s]*<td align=\"center\"\>/is",$content, $matches, null, 0);

The result is: 11/01/2016

But I need this: 11/01/2016 11:59:49

I don't know what I'm doing wrong.

The result I need is:

array (
  0 => 
  array (
    0 => '<td align="center">15/01/2016 00:22:16</td>
        <td align="center">',
    1 => '<td align="center">11/01/2015 00:22:16</td>
        <td align="center">',
  ),
  1 => 
  array (
    0 => '15/01/2016 00:22:16',
    1 => '11/01/2015 00:22:16',
  ),
)

You should *never* parse HTML with regex. Use [a PHP DOM parser](http://simplehtmldom.sourceforge.net/) instead. Please don't die. — Jay Blanchard, Feb 09 '16 at 22:06
I created a robot and have no other option; it is a client requirement. — Camilo Ortúzar, Feb 09 '16 at 22:07
Obligatory link to a famous relevant answer: http://stackoverflow.com/a/1732454/116923 — adrianbanks, Feb 09 '16 at 22:07
I do not think that here parsing with DOM parser is necessary. Just [`\b\d{2}\/\d{2}\/\d{4}\h+\d{2}:\d{2}:\d{2}\b`](https://regex101.com/r/sX1qE8/1) is enough. Unless the string in question is a part of a biii-i-g HTML file. — Wiktor Stribiżew, Feb 09 '16 at 22:08
@JayBlanchard: It is not always right, and there are better posts on SO than that one. — Wiktor Stribiżew, Feb 09 '16 at 22:10
@CamiloOrtúzar: Then you must show the whole document or at least how it can be identified in the DOM. And in that case, you will have to use a DOM parser. — Wiktor Stribiżew, Feb 09 '16 at 22:11
@JayBlanchard: [*Oh Yes You Can Use Regexes to Parse HTML!*](http://stackoverflow.com/a/4234491/3832970) — Wiktor Stribiżew, Feb 09 '16 at 22:13
I capture the code ($buffer) with PhantomJS and then parse it in a mapping file. — Camilo Ortúzar, Feb 09 '16 at 22:14
So, not better, just an argument for the other side @WiktorStribiżew? Well played. — Jay Blanchard, Feb 09 '16 at 22:15
@CamiloOrtúzar: So, you need date/datetimes from the `table` with `id="facturas"`? — Wiktor Stribiżew, Feb 09 '16 at 22:16
Let me clarify: the DOM parsing is required here to get to the pertinent document part, otherwise, you might extract dates/times from other parts of the HTML document that you are not interested in. When you identify where the data you need is, you get it with DOM, and then you will still need the regex I posted in the comment above. Unless you can identify the exact position of the node containing the value. I cannot see any class or id in the TD that could help, perhaps, it is always the third child... Please check. — Wiktor Stribiżew, Feb 09 '16 at 22:23
The problem is that I have a lot of td elements: 13.333.333-3 asdf3 15/01/2016 00:22:16 $ 1531 — Camilo Ortúzar, Feb 10 '16 at 01:40
I edited the question with more information. Please help me, @Wiktor Stribiżew. — Camilo Ortúzar, Feb 10 '16 at 01:48
Thanks, @wiktor. I need to learn regular expressions well enough to write them on my own, without help. Any books or suggestions for easy learning? — Camilo Ortúzar, Feb 10 '16 at 02:05
Easy learning means start with simple patterns, then try more difficult ones. Use regexone.com and test your regexes at regex101.com. Do not always rely on regexes, especially when you have HTML or any other ML. — Wiktor Stribiżew, Feb 10 '16 at 06:37

score 1 · Answer 1 · answered Feb 09 '16 at 22:30

Here's a parser/regex approach:

$html = '<tr>
                            <td align="center">13.333.333-3</td>
                            <td align="center">asdf3</td>
                            <td align="center">15/01/2016 00:22:16</td>
                            <td align="center">$ 1531</td>
                        </tr>';
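// Parse the markup with the DOM extension, then inspect every td.
$thedoc = new DOMDocument();
$thedoc->loadHTML($html);
$cells = $thedoc->getElementsByTagName('td');
foreach($cells as $cell){
    // Group 1 captures the date, group 2 the time; \h matches one horizontal space.
    if(preg_match('~^(\d{2}/\d{2}/\d{4})\h(\d{2}:\d{2}:\d{2})$~', $cell->nodeValue, $matches)) {
         echo 'Date:' . $matches[1] . ' Time:'. $matches[2];
    }
}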

PHP Demo: https://eval.in/515935
Regex101 Demo: https://regex101.com/r/sT2hD9/1

This would also accept invalid dates and times as long as they are formatted correctly, e.g. 22/22/2222 25:61:62. Depending on your requirements you could tighten the pattern, make some parts (such as the seconds) optional, or capture the day, month, year, hours, minutes, and seconds in separate groups.
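The variations mentioned above could look roughly like the sketch below. It is not part of the original answer: parseDateTime() is a hypothetical helper, the seconds are assumed to be the only optional part, and the range checks rely on PHP's built-in checkdate().

```php
<?php
// Sketch: one named group per part, optional seconds, plus range validation.
// parseDateTime() is a hypothetical helper name, not from the answer above.
function parseDateTime(string $text): ?array
{
    $pattern = '~^(?<day>\d{2})/(?<month>\d{2})/(?<year>\d{4})\h+'
             . '(?<hour>\d{2}):(?<minute>\d{2})(?::(?<second>\d{2}))?$~';

    if (!preg_match($pattern, trim($text), $m)) {
        return null; // not in dd/mm/yyyy hh:mm[:ss] format
    }

    $parts = [
        'day'    => (int) $m['day'],
        'month'  => (int) $m['month'],
        'year'   => (int) $m['year'],
        'hour'   => (int) $m['hour'],
        'minute' => (int) $m['minute'],
        // The group is absent from $m when the seconds are missing.
        'second' => (int) ($m['second'] ?? 0),
    ];

    // Reject well-formatted but impossible values such as 22/22/2222 25:61:62.
    $validDate = checkdate($parts['month'], $parts['day'], $parts['year']);
    $validTime = $parts['hour'] <= 23 && $parts['minute'] <= 59 && $parts['second'] <= 59;

    return ($validDate && $validTime) ? $parts : null;
}

var_dump(parseDateTime('15/01/2016 00:22:16'));
var_dump(parseDateTime('22/22/2222 25:61:62'));
```

Inside the foreach loop you would call it as parseDateTime($cell->nodeValue) and skip the cell when it returns null.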

score 1 · Answer 2 · edited May 23 '17 at 11:53

It is considered better to parse HTML with a proper DOM parser than to use regular expressions on it, so I'll give that solution first:

1. With DOMDocument

Use DOMDocument in combination with DOMXPath for this.

Here is code that only gets the content of the third column, which contains date/times:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//td[3]');
$matches = array_map(function($td) {
    return $td->textContent;
}, iterator_to_array($elements));

This code will do an XPath query, finding td elements in the given HTML, that are the third child of their respective parent (tr), and then it maps the text content of each found td into an array.

If the $html variable has this string:

<table width="100%" border="0" cellspacing="0" cellpadding="0" id="facturas">
<tr>
    <td align="center">13.44.333-3</td>
    <td align="center">asdf3</td>
    <td align="center">15/01/2016 00:22:16</td>
    <td align="center">$ 1531</td>
 </tr>
 <tr>
    <td align="center">13.333.333-3</td>
    <td align="center">asdf3</td>
    <td align="center">16/01/2016 00:22:16</td>
    <td align="center">$ 1531</td>
 </tr>
 <tr>
    <td align="center">13.333.333-3</td>
    <td align="center">asdf3</td>
    <td align="center">11/01/2015 00:22:16</td>
    <td align="center">$ 1531</td>
</tr>
</table>

Then $matches will be the following array:

array (
  '15/01/2016 00:22:16',
  '16/01/2016 00:22:16',
  '11/01/2015 00:22:16',
)

See the code run with output on eval.in.

Some alternative XPath queries:

If the $html could have other tables, you should limit the search to the table of interest, e.g. with id equal to facturas:

//*[@id="facturas"]//td[3]

To make sure each matched td has the align attribute set to "center":

//td[@align="center"]

To find elements that have a specific text, like "/2016":

//td[contains(., "/2016")]
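As a rough illustration of the first alternative query, the sketch below restricts the search to the facturas table and ignores a second table. The second table (id "other") and the call to libxml_use_internal_errors() are assumptions added for the example; they are not in the original answer.

```php
<?php
// Sketch: take the third cell of each row, but only inside the table with id="facturas".
$html = '<table id="facturas">
<tr><td>13.44.333-3</td><td>asdf3</td><td>15/01/2016 00:22:16</td><td>$ 1531</td></tr>
<tr><td>13.333.333-3</td><td>asdf3</td><td>16/01/2016 00:22:16</td><td>$ 1531</td></tr>
</table>
<table id="other">
<tr><td>a</td><td>b</td><td>01/01/2000 00:00:00</td></tr>
</table>';

// Collect parser warnings internally instead of printing them (assumption:
// markup captured from a live page may not be perfectly valid).
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$dates = [];
foreach ($xpath->query('//*[@id="facturas"]//td[3]') as $td) {
    $dates[] = trim($td->textContent);
}

print_r($dates);
```

Only the two dates from the facturas table end up in $dates; the date in the other table is never visited.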

2. With a Regular Expression

Although not advised, you could use a regular expression.

If you still want to go for this, then use this code:

preg_match_all("/<td[^>]*\>\s*(\d\d\/\d\d\/\d{4}\b[^<]*)<\/td\s*>/mis",
               $html, $matches);

This will match td elements that contain a value that starts with text in the format "99/99/9999" (9 can be any digit).

Now $matches will be:

array (
  0 => 
  array (
    0 => '<td align="center">15/01/2016 00:22:16</td>',
    1 => '<td align="center">16/01/2016 00:22:16</td>',
    2 => '<td align="center">11/01/2015 00:22:16</td>',
  ),
  1 => 
  array (
    0 => '15/01/2016 00:22:16',
    1 => '16/01/2016 00:22:16',
    2 => '11/01/2015 00:22:16',
  ),
)

See the code run with output on eval.in

But note that in general text in HTML can have entities like &gt; (can be solved with html_entity_decode), or td elements can have <br> or other tags inside them (can sometimes be solved with strip_tags), or tag attributes can have values that contain HTML, which could trick the regular expression. The same goes for script tags, which may have JavaScript that contains HTML strings in variables.

These are just examples. The list of things that can make such a regular expression go wrong is long. None of this is a problem when using the DOM parser, but with regular expressions it is nearly impossible to get it right for all possible cases.

Solution 1 is therefore the one to go for.
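If you go the regular expression route anyway, the entity and nested-tag cases mentioned above can sometimes be softened by cleaning the captured text first. This is only a sketch: cleanCellText() is a hypothetical helper, and the sample cell content is made up for illustration.

```php
<?php
// Sketch: clean up a captured cell body before looking for the date/time in it.
// cleanCellText() is a hypothetical helper name.
function cleanCellText(string $innerHtml): string
{
    // Remove tags first, then decode entities, so that a decoded "<" is
    // never mistaken for the start of a tag.
    $text = html_entity_decode(strip_tags($innerHtml), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse the runs of whitespace that removed tags may leave behind.
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo cleanCellText('<b>15/01/2016</b><br> 00:22:16 &gt; paid'), PHP_EOL;
```

Keep in mind that strip_tags removes a tag without inserting a space, so "15/01/2016<br>00:22:16" would collapse into one token. That is one more reason the DOM approach is the safer one.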

Yes, but it is not clear to me whether the OP only wants the third element's value. The `preg_match_all` statement provided in the question does not even return the date, only the first element's content. Probably an error in encoding the question. I think the intention was to get all four elements. — trincot, Feb 09 '16 at 23:00

amachree tamunoemi · Answer 3 · 2016-02-17T01:58:04.550

Have you found a solution yet? I would like to help.

<?php

$html=<<<HEREDOC
  <tr>
    <td align="center">13.44.333-3</td>
    <td align="center">asdf3</td>
    <td align="center">15/01/2016 00:22:16</td>
    <td align="center">$ 1531</td>
</tr>
<tr>
    <td align="center">13.333.333-3</td>
    <td align="center">asdf3</td>
    <td align="center">16/01/2016 00:22:16</td>
    <td align="center">$ 1531</td>
</tr>
 <tr>
    <td align="center">13.333.333-3</td>
    <td align="center">asdf3</td>
    <td align="center">11/01/2015 00:22:16</td>
    <td align="center">$ 1531</td>
</tr>
HEREDOC;

if(preg_match_all('~<td\s+[^>]*>((?:\d+(?:\/\d+){2})\s+(?:\d+(?:\:\d+){2}))<\/td>~mi',$html,$matchall)){
    print_r($matchall);
}
?>

The output will be:

Array
(
[0] => Array
    (
        [0] => <td align="center">15/01/2016 00:22:16</td>
        [1] => <td align="center">16/01/2016 00:22:16</td>
        [2] => <td align="center">11/01/2015 00:22:16</td>
    )

[1] => Array
    (
        [0] => 15/01/2016 00:22:16
        [1] => 16/01/2016 00:22:16
        [2] => 11/01/2015 00:22:16
    )

)

Could you help me with this: http://stackoverflow.com/questions/35467999/php-reg-exp-to-especific-values @amachree — Camilo Ortúzar, Feb 17 '16 at 21:35
