62

I'm using a simple PHP library to add documents to a SOLR index, via HTTP.

There are 3 servers involved, currently:

  1. The PHP box running the indexing job
  2. A database box holding the data being indexed
  3. The Solr box.

At 80 documents/sec (out of 1 million docs), I'm noticing an unusually high interrupt rate on the network interfaces of the PHP and Solr boxes (2000/sec; what's more, the graphs are nearly identical: when the interrupt rate on the PHP box spikes, it also spikes on the Solr box), but a much lower one on the database box (300/sec). I imagine this is simply because I open and reuse a single connection to the database server, whereas every single Solr request currently opens a new HTTP connection via cURL, thanks to the way the Solr client library is written.

So, my questions are:

  1. Can cURL be made to open a keepalive session?
  2. What does it take to reuse a connection? -- is it as simple as reusing the cURL handle resource?
  3. Do I need to set any special cURL options? (e.g. force HTTP 1.1?)
  4. Are there any gotchas with cURL keepalive connections? This script runs for hours at a time; will I be able to use a single connection, or will I need to periodically reconnect?
Frank Farmer
  • Well, I have used it where we were parsing a whole site with many, many pages that required authentication and had to maintain a session throughout. Using the initial handle resource, you can continue to execute commands to get pages and maintain the same session and connection with the client. Using the command line, this has lasted for approximately 20 minutes (enough for all our data requirements, so it could last longer) without needing to reconnect. But I'm not sure if this is what you're asking, thus it's a comment and not an answer :) – Shadi Almosri Jun 09 '09 at 23:28
  • Another note, often there are options that you will need to set depending on what you're doing and the server you are connecting to. All of this is well documented here: http://uk3.php.net/manual/en/function.curl-setopt.php – Shadi Almosri Jun 09 '09 at 23:30
  • This portion of the FAQ is relevant, albeit not terribly detailed: http://curl.haxx.se/docs/faq.html#Can_I_perform_multiple_requests – Frank Farmer Jun 09 '09 at 23:40
  • One gotcha I ran into: after making something on the order of 100,000 requests via a single cURL handle, my script hit 512 MB of memory usage; it never went over 60 MB before I started reusing connections. I'm now reconnecting every 1000 requests (which is probably more often than necessary, but infrequent enough that connection overhead should be very small). – Frank Farmer Jun 10 '09 at 17:42
  • There's also CURLOPT_MAXCONNECTS: the maximum number of persistent connections that are allowed. When the limit is reached, CURLOPT_CLOSEPOLICY is used to determine which connection to close. – David d C e Freitas Apr 02 '11 at 21:35
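
The periodic reconnect mentioned in the comments above could be sketched roughly as follows. This is only an illustration, assuming PHP 7.1+ with the cURL extension; `RecyclingCurlHandle` and its method names are hypothetical and not part of any Solr client library, and the 1000-request default simply mirrors the figure quoted in the comment.

```php
<?php
// Hypothetical helper: hands out one shared cURL handle and swaps in a
// fresh one after a fixed number of uses, to cap memory growth on long runs.
final class RecyclingCurlHandle
{
    private $handle = null;
    private $uses = 0;
    private $maxUses;

    public function __construct(int $maxUses = 1000)
    {
        $this->maxUses = $maxUses;
    }

    // Returns the handle to use for the next request.
    public function acquire()
    {
        if ($this->handle === null || $this->uses >= $this->maxUses) {
            $this->close();
            $this->handle = curl_init();
            $this->uses = 0;
        }
        $this->uses++;
        return $this->handle;
    }

    // Drops our reference; PHP frees the handle (and its connections)
    // once nothing else refers to it.
    public function close(): void
    {
        $this->handle = null;
    }
}
```

In the indexing loop you would call `$recycler->acquire()` before each request and then set `CURLOPT_URL` and call `curl_exec()` on the returned handle as usual; only every Nth request pays for a new connection.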

4 Answers

56

cURL PHP documentation (curl_setopt) says:

CURLOPT_FORBID_REUSE - TRUE to force the connection to explicitly close when it has finished processing, and not be pooled for reuse.

So:

  1. Yes; in fact, cURL re-uses connections by default, as long as you re-use the cURL handle.
  2. By default, cURL handles persistent connections by itself; should you need some special headers, check CURLOPT_HTTPHEADER.
  3. The server may enforce a keep-alive timeout or request limit (with a default Apache install, it is 15 seconds, or 5 seconds from Apache 2.2 on, or 100 requests, whichever comes first), but cURL will just open another connection when that happens.
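
One way to check the reuse described above is to ask the handle how many new connections the last transfer needed. The sketch below assumes PHP 7+ with the cURL extension; `fetchAll()` is a hypothetical helper name, and the URLs in the note after it are placeholders.

```php
<?php
// Hypothetical helper: fetch several URLs over ONE cURL handle and record,
// per request, how many new connections libcurl had to open.
function fetchAll(array $urls, bool $forbidReuse = false): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // TRUE closes the connection after each transfer instead of pooling it.
    curl_setopt($ch, CURLOPT_FORBID_REUSE, $forbidReuse);

    $results = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        $results[] = [
            'url'          => $url,
            'ok'           => $body !== false,
            'new_connects' => curl_getinfo($ch, CURLINFO_NUM_CONNECTS),
        ];
    }
    return $results;
}
```

Against a server with keep-alive enabled, something like `fetchAll(['http://solr.example/a', 'http://solr.example/b'])` would be expected to report one new connection for the first request and none for the second; with `$forbidReuse` set to `true`, each request should open its own.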
floww
Piskvor left the building
  • Brilliant! I was this close to posting my first Stack Overflow question. This solution worked for our middleware, provided we added the request header 'Connection: close'. – renevanderark Jul 23 '14 at 08:54
23

cURL sends the keep-alive header by default, but to actually reuse the connection you need to:

  1. Create a context using curl_init() without any parameters.
  2. Store the context in a scope where it will survive (not a local variable).
  3. Use the CURLOPT_URL option to pass the URL to the context.
  4. Execute the request using curl_exec().
  5. Not call curl_close() between requests, since that closes the connection.

A very basic example:

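// Fetch $url, reusing the shared cURL handle (and its open connection).
function get($url) {
    global $context;
    curl_setopt($context, CURLOPT_URL, $url);
    // Return the response body as a string instead of printing it.
    curl_setopt($context, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($context);
}

$context = curl_init();
// multiple calls to get() here
curl_close($context); // close only after the last request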
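
If the same handle has to serve different kinds of requests, options set for one request stay in effect for the next. A possible variation, sketched here under the assumption of PHP 7+ with the cURL extension, keeps the handle in a static variable instead of a global and calls `curl_reset()` before each request; `sharedHandle()` and `post()` are hypothetical names.

```php
<?php
// Hypothetical helpers, not part of any Solr client library.

// Returns the same cURL handle on every call within this PHP process.
function sharedHandle()
{
    static $ch = null;
    if ($ch === null) {
        $ch = curl_init();
    }
    return $ch;
}

// POSTs $body to $url on the shared handle and returns the response body,
// or false on failure.
function post(string $url, string $body)
{
    $ch = sharedHandle();
    // Clear options left over from the previous request; libcurl keeps the
    // handle's live connections across a reset.
    curl_reset($ch);
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    return curl_exec($ch);
}
```

The static variable gives the handle the same lifetime as the global in the example above without exposing it to the rest of the script.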
Dr. Rajesh Rolen
Richard Keizer
  • You also need to set the cookie before the second call, with something like `curl_setopt($context, CURLOPT_COOKIE, 'name=value');`. For example, for my request it is `curl_setopt($context, CURLOPT_COOKIE, 'PHPSESSID=bl392rgi8q664l7faat33hfta4');` – Malus Jan Oct 26 '17 at 19:02
14
  1. On the server you are accessing, keep-alive must be enabled and the maximum number of keep-alive requests should be reasonable. In the case of Apache, refer to the Apache docs.

  2. You have to be re-using the same cURL context.

  3. When configuring the cURL context, enable keep-alive with timeout in the header:

    curl_setopt($curlHandle, CURLOPT_HTTPHEADER, array(
        'Connection: Keep-Alive',
        'Keep-Alive: 300'
    ));
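
Because each `CURLOPT_HTTPHEADER` call replaces the header list set by the previous one, the keep-alive headers and any other request headers need to go into a single array. A small sketch, assuming PHP 7+; `keepAliveHeaders()` is a hypothetical helper and the `Content-Type` value is only an assumed example:

```php
<?php
// Hypothetical helper: builds one complete header list that combines the
// keep-alive headers with any extra headers the request needs.
function keepAliveHeaders(int $timeoutSeconds, array $extra = []): array
{
    return array_merge(
        ['Connection: Keep-Alive', 'Keep-Alive: ' . $timeoutSeconds],
        $extra
    );
}

$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_HTTPHEADER, keepAliveHeaders(300, [
    'Content-Type: text/xml', // assumed content type, for illustration only
]));
```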
    
Oleg Barshay
  • Frank, I just re-tested my code and it looks to be on by default. Couldn't hurt to set it explicitly though. – Oleg Barshay Nov 17 '09 at 02:27
  • @OlegBarshay do you know if we need to remove `curl_close($curlHandle);` in order to keep the connection alive? – zeflex Jun 01 '15 at 00:33
  • @zeflex Yes, you have to remove it; if you call `curl_close`, the connection will be closed. – Grain Feb 10 '17 at 11:14
1

If you don't care about the response from the requests, you can send them asynchronously, but you run the risk of overloading your Solr index. I doubt it, though; Solr is pretty damn quick.

Asynchronous PHP calls?

Community
Brent
  • That's certainly interesting, but it doesn't address connection re-use at all. In fact, it would only make my connection overhead issues worse. – Frank Farmer Jun 10 '09 at 17:43