
I have a text file containing a list of IDs. Using PHP, I am trying to load a URL for each of the IDs and pull something from that page (another ID).

For example, if I have the IDs 555, 888, 222, I want to load the URLs:

http://example.edu/bvl.P_Sel?facultyID=555

http://example.edu/bvl.P_Sel?facultyID=888

http://example.edu/bvl.P_Sel?facultyID=222

I tried to get the content via
file_get_contents("http://example.edu/bvl.P_Sel?facultyID=$lines[0]");

where $lines is an array of the IDs. This returns the following error:

Warning: file_get_contents(http://example.edu/bvl.P_Sel?facultyID=222)    [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 

That URL is an example, but the URL in the error does work when I visit it manually. And if I replace the variable in the file_get_contents call with an actual ID, like ?facultyID=222, it works perfectly.

I looked at the answer to the question "How to post data in PHP using file_get_contents?" and tried setting an entry of the $postdata array from a variable. I get the exact same error, only with ?facultyID=XXX removed from the URL in the error message.

My implementation of the latter is here.

Rob
  • What does print_r($lines) show? – Captain Insaneo Mar 07 '12 at 21:10
  • Just tried it. The same error message appears with the id appended on the error message's URL – Rob Mar 07 '12 at 21:13
  • Do you have `allow_url_fopen = On` in php.ini? – erm410 Mar 07 '12 at 21:13
  • `print_r($lines)` shows `Array ( [0] => 813667 [1] => 1124279 [2] => 760643 [3] => 668461 [4] => 2868 [5] => 33613 )` Which are the correct IDs – Rob Mar 07 '12 at 21:13
  • On the off chance that there's something extra on those values, can you `urlencode` the IDs and try again? – enygma Mar 07 '12 at 21:15
  • `allow_url_fopen = On` is on my php.ini, yes. I am not familiar with `urlencode`, but I will look into it now – Rob Mar 07 '12 at 21:15
  • When using `urlencode`, the URL in the error message changes from `?facultyID=813667` to `?facultyID=813667%0D%0A`. Not sure what to make of this. Does this mean that there are extra characters being appended to the ID? – Rob Mar 07 '12 at 21:22
  • Are the pages you are retrieving behind a log in page? In your browser a valid session would allow you to access the page, but the PHP script won't have the session open so might hit a 404? – cgwyllie Mar 07 '12 at 21:23
  • There is a login page, but I found that when I clear my cookies and visit the links with the ID variables appended, no login is required. If PHP visits the link when I replace the variable `$lines[0]` with actual IDs, it works fine. The problem is I need to use variables because I need to visit a ton of IDs – Rob Mar 07 '12 at 21:24
  • Ok, hope you can figure this out :-) – cgwyllie Mar 07 '12 at 21:25
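
For anyone following the comments above, here is a minimal sketch of how `urlencode` exposes the hidden characters. The ID string below is a made-up stand-in for one line read from a file with Windows line endings; it is an assumption, not the asker's actual data.

```php
<?php
// Assumed input: one ID as it might come out of a text file with "\r\n" line endings.
$id = "813667\r\n";

// urlencode() turns the invisible carriage return and line feed into %0D%0A.
$encoded = urlencode($id);
echo $encoded, PHP_EOL;          // 813667%0D%0A

// The string is 8 bytes long, not 6, because of the two trailing characters.
echo strlen($id), PHP_EOL;       // 8

// trim() strips the trailing whitespace, leaving only the digits.
$clean = trim($id);
echo urlencode($clean), PHP_EOL; // 813667
```

`print_r` does not make this obvious because the extra characters only show up as whitespace in the output.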

3 Answers


The encoded characters you get from the urlencode function (%0D%0A) are a carriage return and line feed, i.e. a newline, so each element of your array of IDs probably ends with one. Try this:

// your code to generate the lines array
file_get_contents("http://example.edu/bvl.P_Sel?facultyID=" . trim($lines[0]));
Eduardo Reveles
  • I was just working on using str_replace to do that and those newline things were still appearing. Thanks, I love stackoverflow, haha. – Rob Mar 07 '12 at 21:52
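
As a rough sketch of the same idea applied to the whole file, the IDs could be cleaned once when they are loaded rather than at each call. The helper name `loadIds` and the temporary file are hypothetical, and the actual request is commented out so the example runs offline.

```php
<?php
// Hypothetical helper: read one ID per line and drop line endings and blank lines.
function loadIds(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    // trim() also removes any stray "\r" or spaces that survive the flags above.
    $ids = array_map('trim', $lines);
    // Drop entries that are empty after trimming and reindex from 0.
    return array_values(array_filter($ids, fn($id) => $id !== ''));
}

// Demo with a throwaway file that uses Windows line endings.
$path = tempnam(sys_get_temp_dir(), 'ids');
file_put_contents($path, "555\r\n888\r\n222\r\n");

foreach (loadIds($path) as $id) {
    $url = 'http://example.edu/bvl.P_Sel?facultyID=' . urlencode($id);
    echo $url, PHP_EOL;
    // $html = file_get_contents($url); // the real request would go here
}

unlink($path);
```

Applying `urlencode` to the cleaned ID is optional for purely numeric IDs, but it keeps the URL valid if an ID ever contains other characters.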
$lines = array(813667,1124279,760643,668461,2868,33613);
print_r($lines);

output:

Array ( [0] => 813667 [1] => 1124279 [2] => 760643 [3] => 668461 [4] => 2868 [5] => 33613 )

this:

foreach ($lines as $key => $value):
    echo '<pre>';
    print_r($value);
    echo '</pre>';
endforeach;

output:

813667
1124279
760643
668461
2868
33613

and this:

$get = file_get_contents("http://example.edu/bvl.P_Sel?facultyID=$lines[0]");
print_r($get);

output:

  Example Domains
  As described in RFC 2606,
    we maintain a number of domains such as EXAMPLE.COM and EXAMPLE.ORG
    for documentation purposes. These domains may be used as illustrative
    examples in documents without prior coordination with us. They are 
    not available for registration.

What is wrong? :) Is this what you need?

Crsr
  • Thanks a lot for your help. I should have just given the direct link I was using. It seems that it works for example.com but not for the actual domain I was visiting – Rob Mar 07 '12 at 21:53

Try using cURL for scraping and for posting data; it gives you more control over the request than file_get_contents.
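
A minimal sketch of what that could look like, assuming the cURL extension is installed. The helper names `facultyUrl` and `fetchWithCurl` and the 10-second timeout are illustrative choices, not anything from the question. Note that the ID still needs trimming, since cURL would send the stray newline just as file_get_contents does.

```php
<?php
// Hypothetical helper: build the request URL from a raw line of the ID file.
function facultyUrl(string $rawId): string
{
    return 'http://example.edu/bvl.P_Sel?facultyID=' . urlencode(trim($rawId));
}

// Hypothetical helper: fetch a URL with cURL and return [HTTP status, body or null].
function fetchWithCurl(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // assumed timeout in seconds
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? null : $body];
}

echo facultyUrl("222\r\n"), PHP_EOL;

// Usage (performs a real request, so it is left commented out):
// [$status, $html] = fetchWithCurl(facultyUrl("222\r\n"));
```

Having the status code available makes it easier to tell a real 404 from a network failure.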