I have the following implementation in both PHP and C++ (with Boost). It reads a file into a string and splits it on spaces (I want to be able to choose the delimiter character). I run it on a file called "spaces" that contains 20 million space-separated random numbers:

In PHP:

<?php

$a = explode(" ", file_get_contents("spaces"));
echo "Count: ".count($a)."\n";
foreach ($a as $b) {
  echo $b."\n";
}

and in C++:

#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <sstream>
#include <stdio.h>
#include <stdlib.h>

using namespace boost;
using namespace std;

int main(int argc, char* argv[])
{
//  ifstream ifs("spaces");

//  string s ((istreambuf_iterator<char>(ifs)), (istreambuf_iterator<char>()));
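  char * buffer = 0;
  long length = 0;
  string filename = "spaces";
  FILE * f = fopen (filename.c_str(), "rb");

  if (f)
  {
    fseek (f, 0, SEEK_END);
    length = ftell (f);
    fseek (f, 0, SEEK_SET);
    buffer = (char*) malloc (length);
    if (buffer)
    {
      // fread may return fewer bytes than requested; keep only what was read
      length = (long) fread (buffer, 1, length, f);
    }
    fclose (f);
  }
  if (!buffer)
  {
    cerr << "Could not read " << filename << endl;
    return 1;
  }
  // (pointer, count) constructor: the buffer is not null-terminated
  string s(buffer, length);
  free (buffer);
  vector <string> v;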

  split(v, s, is_any_of(" "));

  cout << "Count: " << v.size() << endl;

  for (size_t i = 0; i < v.size(); i++) {
    cout << v[i] << endl;
  }

}

I compiled it with `g++ split.cpp -O2 -o split`. Running it consistently takes 4.5 seconds on my system, while the PHP 7 version takes 4.2 seconds. How can PHP be 8% faster than C++?

Niels
  • 4
    That is something we cannot answer because there are many variables involved in the process - including your current environment. We would only be guessing. – Jay Blanchard Sep 29 '17 at 14:51
  • Try reading the file the C++ way using [ifstream](https://stackoverflow.com/questions/116038/what-is-the-best-way-to-read-an-entire-file-into-a-stdstring-in-c) – rustyx Sep 29 '17 at 14:54
  • I did that, like in the 2 commented lines, it gave the same result. – Niels Sep 29 '17 at 14:56
  • 2
    Because there's only 8% as much code? (jk, I have no idea.) – Don't Panic Sep 29 '17 at 14:57
  • 3
    One thing about your C++ code is you don't pre-allocate space in `v`. If you know how many elements there will be then you should `reserve` that space which will save **a lot** of copies that your current approach has. – NathanOliver Sep 29 '17 at 15:01
  • 1
    Also, I'm not sure what compiler you are using but you should specify `-std=c++11` or `-std=c++14` to make sure move semantics is turned on. This can also really improve performance. – NathanOliver Sep 29 '17 at 15:05
  • But how does PHP solve this behind the scenes? I don't think the amount of code has anything to do with the difference, as PHP is written in C as well. – Niels Sep 29 '17 at 15:05
  • 3
    .3 seconds is not a meaningful difference for such a small test. This could easily be explained by caching, I/O activity, or CPU activity. – Jim V Sep 29 '17 at 15:26
  • 3
    The `endl` inside a loop probably doesn't help either -- lots of unnecessary flushing. – Dan Mašek Sep 29 '17 at 16:18
  • 2
    @Dan Masek: The endl comment did the trick. Putting down the C++ time to 2.2 seconds. Thanks for that, so for me, despite all the downvotes, this was very helpful because I just couldn't imagine PHP is faster. – Niels Sep 29 '17 at 16:36
  • @NathanOliver : The use of std=c++14 helped a bit, what is the default, how can I check? – Niels Sep 29 '17 at 16:36
  • @Niels You would have to check your compiler's documentation. I know GCC 6 and above defaults to `-std=c++14` and anything before it defaults to `-std=c++98` – NathanOliver Sep 29 '17 at 16:38
  • What I found now is that when I redirect the output of the two implementations to /dev/null with >, they differ by a factor of 2, but when writing to a real file they differ by a factor of 60, with PHP being the slower one, which is what I expected from the beginning. – Niels Sep 29 '17 at 17:34
  • I upvoted the question and the correct answer should be changed to highlight "endl" being the culprit. That was like adding a sleep() into the loop. – John Jun 05 '18 at 01:13

1 Answer

Although your question is very specific, I will give an answer that is a touch more generic, and possibly suitable here.

The runtimes of high-level languages tend to be written by very smart and capable programmers. They are highly familiar with the tools they use and are capable of finding great solutions for common scenarios, solutions that in most cases outperform code written by an average programmer. Thus, PHP code that matches a common scenario can outperform a bad implementation, even one written in assembly.

Possible effect of JIT (Just-in-time compilation)

I will explicitly state that this is NOT the case here, since PHP 7 doesn't support JIT compilation, but it is a very interesting case that might be relevant to any programmer who thinks interpreted languages are always slower.

Some interpreted languages use a feature called JIT (Just-In-Time compilation). This allows a dynamic translation of the higher level code into machine code during runtime.

Since this process happens at runtime, the compiler knows the exact CPU it is running on, which lets it choose the most suitable machine code for the task.

Since languages such as C and C++ are often compiled for a broader selection of CPUs, their code might not be as optimized as the one created by the JIT compiler.
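
The C++ side of this point can be made visible from inside a program. GCC and Clang predefine macros such as `__SSE2__` and `__AVX2__` according to the instruction sets the build is allowed to use, so the sketch below (the function name `compiled_simd_level` is made up for illustration) reports which level a binary was compiled for. It assumes an x86 target and one of those two compilers; on other platforms it simply reports "baseline".

```cpp
#include <string>

// Reports which x86 SIMD extension this binary was compiled to assume.
// The answer is fixed at compile time, unlike the choice a JIT makes at runtime.
std::string compiled_simd_level()
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline";
#endif
}
```

A generic x86-64 build would typically report SSE2, while compiling the same file with `-march=native` on a recent CPU may report AVX2, which narrows the gap described above at the cost of portability.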

Daniel Trugman