1

benchmarks here:

On the use and abuse of alloca

I assume the same holds for other Unices. Given that google perftools exists and is 2x (a bit more) faster than a standard allocator, why do distributions still ship with sub-optimal ones? Bear in mind that tcmalloc has been available for 5+ years.

Hassan Syed
  • 19,054
  • 9
  • 76
  • 156
  • Where are the benchmarks for tcmalloc? The link compares malloc to alloca. Obviously a stack allocation will be faster than heap. tcmalloc isn't even mentioned in the link. – edA-qa mort-ora-y Apr 28 '11 at 15:46
  • You know, saying "perftools = tcmalloc" would be sufficient. The page on google for tcmalloc has only one reference to perftools as a header file. It's not obvious. – edA-qa mort-ora-y Apr 28 '11 at 16:21
  • Well, I assume you gave me the "-1"; that's a tad aggressive. If my question left parts out, it's because it's meant for people who are at least aware of the subject matter... – Hassan Syed Apr 28 '11 at 16:44
  • Thing is, I know the subject matter; I even know benchmarks for tcmalloc. Unless you actually code with it, it is totally non-obvious that it is part of perftools. Besides, I still don't consider it a good benchmark. It is single-threaded and it has a totally non-fragmented heap: a non-realistic usage scenario. – edA-qa mort-ora-y Apr 28 '11 at 16:49
  • Actually this is EXACTLY how my code will use alloca.... and anyone that is aware of perftools knows that tcmalloc is bundled with it, and knows the ins and outs of using it. – Hassan Syed Apr 28 '11 at 16:59
  • I'm glad you've found an allocator that works well for your code. However, I'm sure you can see how a non-fragmented, non-threaded benchmark may not be very representative of the general case. – edA-qa mort-ora-y Apr 28 '11 at 17:07
  • tcmalloc is designed with an extra level of indirection in allocation, using TLS (thread-local storage), so yes, it also applies to threaded code. – Hassan Syed Apr 28 '11 at 17:21

1 Answer

4

It's rare that something is quite simply "2x faster" than something else. It might be 2x faster 90% of the time, and 10x slower 10% of the time. For a general system allocator, you want something that does fairly well all the time, rather than very-good-at-specific-cases. That's probably why the default allocator isn't tcmalloc - it needs to be at least OK-ish at everything, rather than super specialised.
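To make the arithmetic behind that hypothetical concrete, here is a minimal sketch. The 90%/10% split and the 2x/10x factors are the illustrative numbers from the paragraph above, not measurements of tcmalloc or any other allocator, and `relative_mean_time` is a helper name made up for this example.

```python
def relative_mean_time(cases):
    """Weighted mean running time relative to a baseline allocator.

    `cases` is a list of (fraction_of_workload, relative_time) pairs,
    where relative_time 0.5 means "2x faster" and 10.0 means "10x slower".
    """
    total = sum(fraction for fraction, _ in cases)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("workload fractions must sum to 1")
    return sum(fraction * rel_time for fraction, rel_time in cases)


# Illustrative numbers only: 2x faster on 90% of the workload,
# 10x slower on the remaining 10%.
mixed = relative_mean_time([(0.9, 0.5), (0.1, 10.0)])
print(f"mean time relative to baseline: {mixed:.2f}x")
```

Under these made-up numbers the "2x faster" allocator comes out at roughly 1.45 times the baseline's total time, i.e. slower overall, which is why a headline speedup on one benchmark says little about suitability as a system default.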

AshleysBrain
  • 20,705
  • 15
  • 81
  • 119
  • That's my assumption as well, but if you look at the tcmalloc implementation spec, it is pretty general purpose, from single-threaded to multithreaded; the only tradeoff I can see is space overhead. – Hassan Syed Apr 28 '11 at 15:35
  • 3
    The docs I found show that it allocates 6MB of memory by default. Perhaps for a browser that isn't much, but for a program like `ls` or `cat` that is a terrible overhead! – edA-qa mort-ora-y Apr 28 '11 at 15:48
  • @AshleysBrain, I don't get where you got the numbers for "10x slower 10% of the time" from. I am currently actively searching for performance improvements in my own heavily multi-threaded code, which is still capable of running in the main thread only, and I don't see a performance drop at all, but rather a very big performance win when going threaded. However, I am not 100% of the dev-user base. I'd just like to see proof of the drops before someone claims they exist. Regarding the 6MB, I don't think this will be a problem: now, in 2013, people have 4 to 32GB of RAM installed on their personal machines. – christianparpart May 20 '13 at 12:53