
I've been trying to get a standardized estimate of FLOPS across all of the computers I've deployed a Python distributed processing program on. While I can calculate pystones just fine, pystones are not particularly well known, and I'm not entirely sure how accurate they really are.

Thus, I need a way to calculate FLOPS (or a module that already does it) on a variety of machines, which may have any variety of CPUs, etc. Seeing as Python is an interpreted language, simply timing a set number of operations won't perform on the level of, say, Linpack. While I don't particularly need the exact same estimates as one of the big 'names' in benchmarking, I'd like mine to be at least reasonably close.

So, is there a way, or a pre-existing module, that would let me get FLOPS? Otherwise, my only choices will be compiling with Cython, or trying to estimate the capabilities based on CPU clock speed...
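For context, the naive timing approach looks something like the sketch below: count a fixed number of multiply-adds and divide by wall-clock time. As noted above, in pure Python this mostly measures interpreter overhead rather than hardware throughput, which is exactly why the result falls far short of Linpack-class numbers.

```python
import time

def estimate_flops(n=1_000_000):
    """Very rough FLOPS estimate: time n multiply-add pairs in pure Python.

    NOTE: this mostly measures CPython interpreter overhead, not peak
    hardware throughput, so expect numbers orders of magnitude below
    what Linpack reports on the same machine.
    """
    a, b = 1.0000001, 0.9999999
    start = time.perf_counter()
    x = 1.0
    for _ in range(n):
        x = x * a + b  # one multiply + one add = 2 floating-point ops
    elapsed = time.perf_counter() - start
    return 2 * n / elapsed  # floating-point ops per second

print(f"{estimate_flops() / 1e6:.1f} MFLOPS (pure Python)")
```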

Doc Sohl

1 Answer


Linpack, or High-Performance Linpack (HPL), is generally the industry standard for measuring FLOPS. I found a Python implementation here, but it might not be of much use. The standard approach (especially if you have a cluster) would be to use HPL. Unless you want to implement your own parallel Linpack in Python, HPL is the way to go. This is what most of those monster supercomputers on the TOP500 list use to measure their performance.
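To make the idea concrete, here is a toy single-node Linpack-style measurement (emphatically NOT the official HPL): solve a random dense system Ax = b with NumPy and apply the conventional Linpack operation count of 2/3·n³ + 2·n² flops per solve. The function name and parameters are my own illustration.

```python
import time
import numpy as np

def linpack_like_mflops(n=2000, trials=3):
    """Toy single-node Linpack-style benchmark (NOT the official HPL).

    Solves a random dense system Ax = b and applies the conventional
    Linpack flop count of (2/3)*n^3 + 2*n^2 per solve, reporting the
    best rate over several trials in MFLOPS.
    """
    rng = np.random.default_rng(0)
    best = 0.0
    for _ in range(trials):
        A = rng.standard_normal((n, n))
        b = rng.standard_normal(n)
        start = time.perf_counter()
        x = np.linalg.solve(A, b)
        elapsed = time.perf_counter() - start
        # sanity check: the residual of the solve should be small
        assert np.allclose(A @ x, b, atol=1e-6)
        flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
        best = max(best, flops / elapsed / 1e6)
    return best

print(f"{linpack_like_mflops():.0f} MFLOPS")
```

Because NumPy delegates the solve to compiled LAPACK/BLAS routines, this sidesteps the interpreter-overhead problem the question mentions and gets much closer to hardware-level numbers than pure-Python timing loops.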

If you're really hell-bent on doing this, even though it might not make sense or be of much use, you might want to look into porting the original MPI version to ZeroMQ (0MQ), which has a nice Python interface.
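As a minimal sketch of that Python interface (pyzmq), here is how a worker node might report a locally measured MFLOPS figure to a controller over a REQ/REP socket pair. The endpoint name and message fields are my own invention, and the `inproc://` transport is used only so the demo runs in one process; a real deployment would use a `tcp://` address instead.

```python
import zmq  # pyzmq: the Python binding for ZeroMQ

# Hypothetical sketch: a worker benchmarks itself, then sends the result
# to a controller. inproc:// keeps the demo self-contained; swap in
# something like tcp://controller-host:5555 for a real network.
ctx = zmq.Context()

controller = ctx.socket(zmq.REP)
controller.bind("inproc://flops-report")

worker = ctx.socket(zmq.REQ)
worker.connect("inproc://flops-report")

worker.send_json({"node": "worker-1", "mflops": 120.4})
report = controller.recv_json()
controller.send(b"ack")  # REP must reply before the next request arrives
print(report["node"], report["mflops"])
assert worker.recv() == b"ack"
```

The REQ/REP pattern enforces a strict send-reply alternation, which fits the "each node uploads its result to a controller" workflow described in the comments below.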

TomCho
pyCthon
  • Thanks for linking those, but I seem to be running into difficulties when trying the UCS implementation. When I run it on 3 separate computers with an i7-960@3.2GHz, an i7-2960M@2.7GHz, and an Intel-Q8200@2.33GHz, all 3 computers return the same value of 120.4 MFlops. This is patently not true (or it is measuring something else), since the machine with the 960 core runs at 35 GFlops according to LinX. Perhaps I misstated my question a little, since I'm actually on a distributed computing network, where the processing nodes are individual computers linked only by the Internet. – Doc Sohl Sep 07 '12 at 18:51
  • So if it's connected geographically and by the internet, your result will vary every time due to network latency and will be far from the theoretical performance. And since it looks like each computer or processor you have linked up is different, that will make your performance even lower, since there is a great deal of communication – pyCthon Sep 08 '12 at 19:22
  • and the processors will have to wait for each other. Look up the ZeroMQ option above; you would probably have to implement it yourself – pyCthon Sep 08 '12 at 19:32
  • No, each separate computer runs the code separately, then uploads the results of its individual calculations to a controller. This is the same as wanting to determine the FLOPS of a variety of separate computers, individually. – Doc Sohl Sep 09 '12 at 20:07
  • Yes, that is why I suggested ZeroMQ, which is an API for distributed computing – pyCthon Sep 10 '12 at 03:23