Get text from Web page via PHP

Question

I'm trying to get some text from a page at a certain URL. The text I want sits between two known pieces of markup. For example:

<td >Item number:</td><td >**GX12033111**</td></tr>

I need to get the GX12033111 part.

I've tried this:

<?php
$file_string = file_get_contents('LINK GOES HERE');
preg_match('/<td >Item number:</td><td >(.*)<\/td><\/tr>/i', $file_string, $title);
$title_out = $title[1];
echo $title_out ;
?>

But it doesn't work.

Yes, there are millions, on this crazy new thing called the internet. Heard of it? – Apr 09 '13 at 20:47
Sure, I should just repeat something that has already been posted on S.O. a few thousand times, and on the internet millions of times, because you are too lazy to do a simple search. – Apr 09 '13 at 20:55

score 2 · Accepted Answer · answered Apr 09 '13 at 20:48

Try using:

preg_match('@<td >Item number:</td><td >([^<]+)</td></tr>@i', $file_string, $title);
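For context, the attempt in the question most likely fails because PHP treats the first unescaped `/` inside `</td>` as the end of a `/`-delimited pattern. The sketch below wraps the pattern above in a small function; the function name and the sample markup are assumptions made for illustration, not part of the original page.

```php
<?php
// Hypothetical helper: returns the item number, or null if the row is absent.
function extract_item_number(string $html): ?string
{
    // '@' is the delimiter, so the '/' in '</td>' needs no escaping.
    // [^<]+ stops at the next tag instead of matching greedily past it.
    $pattern = '@<td >Item number:</td><td >([^<]+)</td></tr>@i';

    return preg_match($pattern, $html, $m) === 1 ? trim($m[1]) : null;
}

// Sample markup standing in for file_get_contents('LINK GOES HERE').
$file_string = '<tr><td >Item number:</td><td >GX12033111</td></tr>';
echo extract_item_number($file_string), "\n";
```

Checking the return value of `preg_match` before reading `$m[1]` avoids an undefined-index notice when the page layout changes and nothing matches.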

answered Apr 09 '13 at 20:48

Ozzy


down vote for using a regular expression to scrape html – Apr 09 '13 at 20:49
Works perfectly :) Thanks – Yosef Naser Apr 09 '13 at 20:50
@Dagon : Not everyone goes by your standards, this code works, that's what matters the most. – Yosef Naser Apr 09 '13 at 20:51
@YosefNaser Please be advised this is a BAD idea. – DavidScherer Apr 09 '13 at 20:51
@YosefNaser Hardly my standard; it's called best practice. You ignore it only once you understand it in the first place, and you clearly don't. – Apr 09 '13 at 20:53
Even though I won't up- or downvote the answer, I second @Dagon's words. I had the same need to pick some text out from between HTML tags, and a wild regex did the trick... until some badly formatted HTML messed it up and put the rest of the page into a text area. – AKS Apr 09 '13 at 20:56
@AyeshK That means you're doing it wrong ... – HamZa Apr 09 '13 at 20:59
@DavidScherer : What do you mean ? the code's output matches my needs. – Yosef Naser Apr 09 '13 at 21:01
@Dagon This is a "correct" answer, and I don't see why you need to downvote. OK, I agree with you that "the best practice" is to use DOM, but a well-written regex may be enough for a simple match. http://stackoverflow.com/questions/4231382/regular-expression-pattern-not-matching-anywhere-in-string/4234491#4234491 – HamZa Apr 09 '13 at 21:02
@Dagon : Oh, if you lose humbleness, that's the start of your way downhill. – Yosef Naser Apr 09 '13 at 21:03
@HamZaDzCyberDeV yes I **did** it wrong. – AKS Apr 09 '13 at 21:04
@YosefNaser "It works" and "Best Solution" are two different things. Will the regex still work if in a year "Item Number:" is changed to read "Item Number" (no colon)? Or what if it changes in some other way? Using the DOM gives you a more stable approach IMO. And while, yes, it's "fine" to use regex here, it's just not typically advised. I personally try to avoid regex whenever possible, mainly because it's hard to read and understand: if anyone has to take over my work at any point and they're not a regex guru, they're going to have problems. It's kinda like the ternary paradox. – DavidScherer Apr 09 '13 at 21:13
@DavidScherer : Thank you very much sir, I respect your way of explaining this. In my case, there is no chance for the prefix and suffix part to change, so it fits. I don't have any programming background, but I search for certain tasks as I need. Thank you sir. – Yosef Naser Apr 09 '13 at 21:17

score 2 · Answer 2 · answered Apr 09 '13 at 20:49

You'll want to use PHP's DOM extension: http://php.net/manual/en/book.dom.php

With it you can call DOMDocument::loadHTML(file_get_contents("URL")) on a DOMDocument instance, and then DOMDocument::getElementsByTagName("td") to get the table cells.
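A minimal sketch of that approach, assuming the value always sits in the `<td>` that follows the "Item number:" label cell in document order. The function name and sample markup are hypothetical, and a real page may need a more specific lookup.

```php
<?php
// Hypothetical helper: returns the text of the <td> after the label cell.
function find_item_number(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely perfectly formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $cells = $doc->getElementsByTagName('td');
    for ($i = 0; $i < $cells->length - 1; $i++) {
        // Match on the label text, ignoring surrounding whitespace.
        if (trim($cells->item($i)->textContent) === 'Item number:') {
            return trim($cells->item($i + 1)->textContent);
        }
    }

    return null; // label cell not found
}

// Sample markup standing in for file_get_contents("URL").
$html = '<table><tr><td >Item number:</td><td >GX12033111</td></tr></table>';
echo find_item_number($html), "\n";
```

Because the lookup keys on the cell text rather than the exact tag layout, changes in attributes or whitespace around the cells should not break it, though a change to the label itself still would.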

answered Apr 09 '13 at 20:49

DavidScherer
