281

First of all, I'm not asking this question because of the merits of garbage collection. My main reason for asking is that I know Bjarne Stroustrup has said that C++ will have a garbage collector at some point in time.

With that said, why hasn't it been added? There are already some garbage collectors for C++. Is this just one of those "easier said than done" type things? Or are there other reasons it hasn't been added (and won't be added in C++11)?

Just to clarify, I understand the reasons why C++ didn't have a garbage collector when it was first created. I'm wondering why the collector can't be added in.

Rakete1111
Jason Baker
  • This is one of the top ten myths about C++ that the haters always bring up. Garbage collection isn't "built in", but there are several easy ways to do it in C++. Posting a comment because others have already answered better than I could below :) – davr Sep 29 '08 at 03:25
  • But that's the whole point about not being built in: you have to do it yourself. Reliability from high to low: built-in, library, home-made. I use C++ myself, and I'm definitely not a hater, because it's the best language in the world. But dynamic memory management is a pain. – QBziZ Sep 29 '08 at 08:55
  • @Davr - I'm not a C++ hater, nor am I trying to argue that C++ needs a garbage collector. I'm asking because I know that Bjarne Stroustrup has said that it WILL be added and was just curious what the reasons for not implementing it were. – Jason Baker Sep 29 '08 at 23:21
  • See also http://stackoverflow.com/questions/819425/why-does-c-need-language-modifications-to-be-managed. – Daniel Daranas May 19 '09 at 07:38
  • I think a better question is: why is a garbage collection strategy not optional in C++? And why didn't the original C++0x proposal allow for partial GC in a program? – lurscher Dec 10 '10 at 16:42
  • This article [The Boehm Collector for C and C++ from Dr. Dobbs](http://www.drdobbs.com/the-boehm-collector-for-c-and-c/184401632) describes an open source garbage collector that can be used with both C and C++. It discusses some of the issues that arise with using a garbage collector with C++ destructors as well as the C Standard Library. – Richard Chambers Jan 17 '16 at 14:56
  • C++11 allows for it "if the implementation chooses it", apparently: https://stackoverflow.com/questions/15157591/c11-garbage-collector-why-and-hows – rogerdpack Nov 08 '17 at 18:03
  • @rogerdpack: But it's not that useful by now (see my answer...) so it's unlikely implementations will invest in having one. – einpoklum Dec 31 '17 at 23:38

16 Answers

168

Implicit garbage collection could have been added in, but it just didn't make the cut, probably due not just to implementation complications, but also to people not being able to reach a general consensus fast enough.

A quote from Bjarne Stroustrup himself:

I had hoped that a garbage collector which could be optionally enabled would be part of C++0x, but there were enough technical problems that I have to make do with just a detailed specification of how such a collector integrates with the rest of the language, if provided. As is the case with essentially all C++0x features, an experimental implementation exists.

There is a good discussion of the topic here.

General overview:

C++ is very powerful and allows you to do almost anything. For this reason it doesn't automatically push onto you many things that might impact performance. A form of garbage collection can be implemented with smart pointers (objects that wrap pointers with a reference count and automatically delete the pointed-to object when the count reaches 0).
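
To make the reference-counting idea concrete, here is a minimal, non-thread-safe sketch of such a smart pointer (the name counted_ptr is made up for illustration; in real code you would reach for std::shared_ptr or boost::shared_ptr instead):

#include <cstddef>

// Minimal reference-counted pointer: the last copy to be destroyed deletes the object.
template <typename T>
class counted_ptr {
    T* ptr_;
    std::size_t* count_;

    void release() {
        if (count_ && --*count_ == 0) { delete ptr_; delete count_; }
    }

public:
    explicit counted_ptr(T* p) : ptr_(p), count_(new std::size_t(1)) {}
    counted_ptr(const counted_ptr& other) : ptr_(other.ptr_), count_(other.count_) {
        ++*count_;                       // one more owner
    }
    counted_ptr& operator=(const counted_ptr& other) {
        if (this != &other) {
            release();                   // drop ownership of the old object
            ptr_ = other.ptr_;
            count_ = other.count_;
            ++*count_;
        }
        return *this;
    }
    ~counted_ptr() { release(); }        // deterministic: runs at scope exit

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
};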

C++ was built with competitors in mind that did not have garbage collection, and efficiency was the main front on which C++ had to fend off criticism in comparison to C and others.

There are 2 types of garbage collection...

Explicit garbage collection:

C++0x has a form of garbage collection (reference counting) via pointers created with shared_ptr.

If you want it you can use it; if you don't, you aren't forced into using it.

You can currently use boost::shared_ptr as well if you don't want to wait for C++0x.
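
For example (a small sketch using std::shared_ptr from C++0x/C++11; boost::shared_ptr behaves the same way):

#include <iostream>
#include <memory>

struct Session {
    ~Session() { std::cout << "Session destroyed\n"; }
};

int main() {
    std::shared_ptr<Session> a = std::make_shared<Session>();
    {
        std::shared_ptr<Session> b = a;  // reference count goes to 2
    }                                    // b is destroyed, count drops back to 1
}                                        // a is destroyed, count hits 0: Session is deleted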

Implicit garbage collection:

It does not have transparent garbage collection, though; that will be a focus point for future C++ specs.

Why doesn't TR1 have implicit garbage collection?

There are a lot of things that TR1 of C++0x should have had; Bjarne Stroustrup stated in previous interviews that TR1 didn't have as much as he would have liked.

Right leg
Brian R. Bondy
  • I would **become** a hater if C++ forced garbage collection upon me! Why can't people use `smart_ptr`s? How would you do low-level Unix-style forking with a garbage collector in the way? Other things would be affected, such as threading. Python has its **global interpreter lock** mostly because of its garbage collection (see Cython). Keep it out of C/C++, thanks. – unixman83 Apr 19 '12 at 07:47
  • How do you do transparent garbage collection of `void *` operating system API opaque structures? – unixman83 Apr 19 '12 at 07:54
  • @unixman83: The main problem with reference-counted garbage collection (i.e. `std::shared_ptr`) is cyclical references, which cause a memory leak. Therefore you must carefully use `std::weak_ptr` to break cycles, which is messy. Mark-and-sweep-style GC does not have this problem. There is no inherent incompatibility between threading/forking and garbage collection. Java and C# both have high-performance preemptive multithreading and a garbage collector. There are issues to do with realtime applications and a garbage collector, as most garbage collectors have to stop the world to run. – Andrew Tomazos Jan 08 '13 at 20:14
  • "The main problem with reference counted garbage collection (ie `std::shared_ptr`) is cyclical references" and awful performance which is ironic because better performance is usually the justification for using C++... http://flyingfrogblog.blogspot.co.uk/2011/01/boosts-sharedptr-up-to-10-slower-than.html – J D Jun 17 '13 at 11:57
  • "How would you do low level Unix style forking". The same way GC'd languages like OCaml have been doing it for ~20 years or more. – J D Jun 17 '13 at 11:57
  • "Python has its global interpreter lock mostly because of it's garbage collection". Strawman argument. Java and .NET both have GCs but neither have global locks. – J D Jun 17 '13 at 11:59
  • The trouble I have with forced and widespread use of GC is the difficulty of avoiding leaks. If we go all the way to the most manual forms of memory management, like C, the only way to leak memory is to fail to have a corresponding `free` to a `malloc/calloc`. In GC, an author of something like a texture object could do everything perfectly to allocate a texture and remove all existing references he created to it at the time it is removed -- absolutely perfect, unit tested. And yet there could still be hundreds of places leaking the texture -- all it takes in GC for the texture resource to... –  Nov 14 '15 at 05:06
  • ... have a bunch of logical leaks associated where it's not being freed until program shutdown is for any random developer involved to simply store a reference to the texture in some aggregate and fail to remove the reference at the appropriate time. Worse off, these kinds of frequent GC-associated logical leaks in very complex codebases and large teams are much, much harder to detect, debug, and correct than a lack of a matching `free` to a `malloc`. They don't show up in valgrind, e.g., since they're logical leaks rather than physical leaks. –  Nov 14 '15 at 05:07
  • To me this is a much bigger issue than the frequently-cited performance-related ones. It has to do with correctness -- and to get incorrect code that leaks without GC only requires negligence on the part of the person allocating memory, and it's often a lot easier to detect/correct even in those worst-case scenarios. With GC, it only requires negligence on the part of anyone involved (even a third party plugin) to leak some of the largest resources in the system, and those types of leaks are the hardest to detect/correct (sometimes rivaling race condition-like pains in debugging). –  Nov 14 '15 at 05:16
  • The "source" link is dead. Please update the answer. I tried, but the edit button is disabled. – th3an0maly Sep 28 '16 at 06:40
  • @DrunkCoder After *any random developer involved simply stored a reference to the texture in some aggregate and failed to remove the reference at the appropriate time*, you diligently call `free` and then another random developer uses this reference and you all enjoy debugging the *undefined behavior* killing your cats and burning your houses. While valgrind finds the reference and the accessing code, I can't see how it's easier than analyzing the memory leak you'd get with GC instead. – maaartinus Dec 30 '17 at 13:40
  • Practically speaking if such undefined behavior could easily fly under the radar and the program could appear to function normally, or somewhat, then I'd agree. But most of the time such UB tends to manifest itself in the form of a crash, a heap corruption, segfault, etc., all of which can be immediately caught during debugging. If you are working on mission-critical software where a leak isn't the end of the world, I do agree that GC is a very smart choice. But otherwise the predominant desire for me is to make bugs as easy to reproduce consistently as possible and as easy to spot as possible –  Dec 30 '17 at 13:53
  • .. in such a case where the desire is the easiest way to reproduce an issue and catch it as quickly as possible to correct in testing before the product ships, then I'd happily accept the dangling pointer crash over a GC resource leak, since I work in a domain where leaks are really undesirable while the testing process generally weeds out all the major kinks provided that they can be easily detected (which unfortunately a GC leak is not so easy to detect). –  Dec 30 '17 at 13:54
  • Just in my personal experience, bugs that are easy to reproduce consistently like segfaults tend to be reported and corrected if not caught in our unit/integration testing before the product officially ships. Meanwhile bugs that are extremely difficult to reproduce/detect can stay in the product for years -- among those I've experienced are race conditions, uninitialized variables, and GC leaks. Just consider, how do you write a test which can detect such a GC leak with a rooted resource? I don't think it's possible, but it is possible to immediately detect dangling pointer access. –  Dec 30 '17 at 14:01
  • @maaartinus Lastly, valgrind cannot detect a logical leak AFAIK. A rooted resource doesn't necessarily continue to take memory post-shutdown. It just keeps hogging up memory while the application is running. Once the application starts closing, the object (let's say shader) which stores the texture gets removed from the system and so does the texture it references likewise get freed. Since GC relies on implicit destruction rather than explicit, there's no way to tell what resource is meant to be freed when exactly. The texture simply wasn't freed early enough when the user requested it. –  Dec 30 '17 at 14:16
  • The type of leak I'm talking about is one where, let's say, these shaders store references to textures they use. So does the image library. However, the user can explicitly request to remove a texture from the image library, at which point shaders should release their references to the texture to allow GC to collect their memory. However, they don't remove/null out those references so the textures silently continue to take memory until the shaders themselves are removed. That's the kind of GC leak that flies under the radar and never gets caught by testing/valgrind in my experience. –  Dec 30 '17 at 14:24
  • Meanwhile let's take the equivalent with explicit destruction. In that case, when the user requests to remove the texture from the image library, the developer explicitly requests to destroy it. Now if we have the same bug above where the shaders don't release their pointers/refs to the texture, they might then try to access it leading to a dangling pointer UB (typically leading to an ugly crash). But that tends to be immediately detectable and reproducible by QA if not our automated tests themselves -- so I actually find that much more preferred. –  Dec 30 '17 at 14:26
  • That said, as a repeated caveat I don't work in mission-critical fields where it is desirable for the application to continue running gracefully in the presence of bugs (there I'd probably really want GC). I work in one that would prefer to detect and reproduce such bugs as quickly and as easily as possible and ideally before the product even ships -- even if the bug leads to a hard crash. So there GC has actually been a thorn in my side, at least for very large codebases with big teams -- they lead to some of the hardest-to-detect leak bugs I've found. –  Dec 30 '17 at 14:38
  • And phew, apologies for lengthy comments, but I think for many applications leaking a bit during runtime is probably no big deal. But there are some fields like games or, in my case, VFX where non-trivial resources are allocated rather frequently -- with VFX an artist might load a high-def HDR texture that spans a gigabyte. So to fail to free it at the appropriate time is a really nasty bug in our case, not something relatively benign, and we don't want such bugs to easily go unnoticed as GC can allow. –  Dec 30 '17 at 14:45
  • And one last thing, (last one I promise!) -- let's consider a scenario which GC doesn't allow -- the scenario where the original developer never freed the texture at all -- no part of the system frees textures. That's the kind of physical leak that valgrind easily detects, so that likewise tends to be very easy to detect... it's only the GC-style logical leak in this scenario that I've found incredibly difficult to detect in large codebases (millions of LOC) because you see the fact that the app hogs more and more mem the longer it runs, but who done it? Who failed to null/remove a ref? –  Dec 30 '17 at 14:54
  • ... and in my case some lengthy investigations actually lead me to discover that it wasn't even our code that was causing those GC leaks. It was the third party plugins the users were loading.. and all the plugins had to do to make massive textures or meshes leak in the system was just store a reference to them and fail to set them to null at the appropriate time... and that's a very scary prospect when anything written among anyone among millions of LOC can become a shared resource owner by just storing a reference to it in an app where logical leaks should not be ignored. –  Dec 30 '17 at 14:59
  • @DrunkCoder Valgrind really can't find logical leaks and there's no way to test them, but there are other tools for GC collected languages. You're speaking from a C++ perspective, which is something completely different from Java. Indeed, GC doesn't help with logical leaks directly, but it makes them improbable. What's your shader holding the reference? In Java, it'd probably be a lightweight object allocated when needed and it itself would get GC'd. It may be a singleton instead holding a reference to the *current* texture and then it'd leak it only until it gets replaced by a new one. – maaartinus Dec 30 '17 at 15:10
  • @maaartinus It's a resource management issue not solved by any language feature AFAIK (though there might be some nice debugging tools in Java I'm unaware of to help detect those leaks). Say you have a central scene graph in a software package. Scene graphs store things like shaders, meshes, textures, cameras, etc., and some of these things are implemented by third parties (third party shaders loaded by plugins). The types of leaks I encountered were ones where you might have something like a mesh or texture being referenced (and therefore owned) by... –  Dec 30 '17 at 15:15
  • ... many different things -- cameras might store a list of meshes to exclude from rendering. Certain shaders might reference specific meshes and textures. The renderer might store references to all sorts of scene objects it wants to render... it's the fact that all these things are effectively sharing in the ownership of these resources that became such a thorn in my side. Logically there was only one owner -- the scene graph owns meshes, the image library owns textures. If our team and third party plugin devs were aware of phantom and weak references and used them... –  Dec 30 '17 at 15:17
  • ... appropriately, I might have a much more positive view.. but too often they just stored a reference to these scene objects, and occasionally they'd forget to release them at the right time... and then I end up facing that scenario where the app starts taking more and more memory, and users do start to notice, but it's so difficult to then figure out which line of code failed to release the resource(s)... especially when not all lines of code are even under our control. –  Dec 30 '17 at 15:18
  • Now in our case the shaders were lightweight objects -- very light -- just mostly code. But they might reference something which isn't light, like a mesh with a million polygons, or a 8000x8000 pixel HDR texture. And when the user requests to remove such texture from the image library, the shader (which might be a third party plugin shader) might fail to handle that event and properly release the texture.... and that's when my nightmares begin. –  Dec 30 '17 at 15:22
  • I find it deeply uncomfortable, nightmarishly so -- this idea that anything can become a shared resource owner by simply storing a ref to it in a codebase of such scale, with third party plugins, and a strong desire to avoid logical leaks. The problem only occurred among persistent scene objects storing persistent references to resources they had no business sharing in ownership. Where I found GC immensely valuable personally was multithreading -- because many things in our system did not make sense to be shared in ownership persistently... –  Dec 30 '17 at 15:27
  • ... but there were cases where it made a whole lot of sense for a short-lived thread to extend the lifetime of a resource until it was finished processing. There I find GC enormously helpful, but most of the time I find it too scary and too error-prone when used to store persistent references around in more than one persistent place. –  Dec 30 '17 at 15:28
  • @DrunkCoder It looks like you're working with Java. There are tons of mission critical applications in Java and only very few memory leak complaints. There are classloader leaks, which are a separate problem, so let's forget them. Many problems get indeed solved by using weak references, especially anything like "list of meshes to exclude from rendering". If your shaders are that lightweight, you can allocate them and forget them after they did their job, so that they get GC'd. If that's impossible and the plugin can't be changed, then you may be able to avoid giving it the real thing... – maaartinus Dec 30 '17 at 15:28
  • That's true and I've seen my share of very competently developed applications in GC languages -- I don't want to make it sound like it's automatically going to lead to leaky software. But I do consider GC a bit of a razor blade or chainsaw in that regard if leaks are not benign given how easily one, even a third party writing a plugin, can cause the software to leak a massive resource by just storing a reference to it and not letting it go in response to the appropriate event. I wish there was some middle ground since I don't like manual memory management either, but still prefer it. –  Dec 30 '17 at 15:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/162224/discussion-between-maaartinus-and-drunkcoder). – maaartinus Dec 30 '17 at 15:31
  • What I don't like in the STL at the moment is the awkward way that iterators are used to work around the missing GC. In LINQ I could simply do a .Select(); in C++ the filtered-out items are moved to the beginning and the iterator moved to the first selected item. I guess so you can free the previous items; else, memory leak. These things could be a lot nicer with GC. At the moment they are just awkward. IMHO – HankTheTank Dec 26 '18 at 09:34
155

To add to the debate here.

There are known issues with garbage collection, and understanding them helps in understanding why there is none in C++.

1. Performance?

The first complaint is often about performance, but most people don't really realize what they are talking about. As illustrated by Martin Beckett, the problem may not be performance per se, but the predictability of performance.

There are currently 2 families of GC that are widely deployed:

  • Mark-And-Sweep kind
  • Reference-Counting kind

Mark And Sweep is faster (less impact on overall performance), but it suffers from a "freeze the world" syndrome: i.e. when the GC kicks in, everything else is stopped until the GC has finished its cleanup. If you wish to build a server that answers in a few milliseconds... some transactions will not live up to your expectations :)

The problem with Reference Counting is different: reference counting adds overhead, especially in multithreaded environments, because you need an atomic count. Furthermore, there is the problem of reference cycles, so you need a clever algorithm to detect those cycles and eliminate them (generally implemented by a "freeze the world" too, though less frequent). In general, as of today, this kind (even though normally more responsive, or rather, freezing less often) is slower than Mark And Sweep.
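
The atomic-count overhead is easy to see in isolation: every copy or destruction of a shared handle must perform a synchronized read-modify-write, which is essentially what shared-pointer implementations do internally (a minimal sketch; control_block is a made-up name):

#include <atomic>

// The per-copy cost of thread-safe reference counting, stripped to its core:
struct control_block {
    std::atomic<long> refs{1};
};

void add_ref(control_block& cb) {
    cb.refs.fetch_add(1, std::memory_order_relaxed);    // atomic RMW on every copy
}

bool release(control_block& cb) {
    // Returns true when the last reference is gone and the object may be freed.
    return cb.refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}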

I have seen a paper by Eiffel implementers who were trying to implement a Reference Counting garbage collector with a global performance similar to Mark And Sweep, but without the "Freeze The World" aspect. It required a separate thread for the GC (typical). The algorithm was a bit frightening (at the end), but the paper did a good job of introducing the concepts one at a time and showing the evolution of the algorithm from the "simple" version to the full-fledged one. Recommended reading, if only I could put my hands back on the PDF file...

2. Resource Acquisition Is Initialization (RAII)

It's a common idiom in C++ to wrap the ownership of resources within an object to ensure that they are properly released. It's mostly used for memory, since we don't have garbage collection, but it's also useful in many other situations:

  • locks (multi-thread, file handle, ...)
  • connections (to a database, another server, ...)

The idea is to properly control the lifetime of the object:

  • it should be alive as long as you need it
  • it should be killed when you're done with it

The problem with GC is that while it helps with the former and ultimately guarantees the latter... this "ultimately" may not be soon enough. If you release a lock, you'd really like it to be released now, so that it does not block any further calls!

Languages with GC have two workarounds:

  • don't use GC when stack allocation is sufficient: it's normally for performance issues, but in our case it really helps since the scope defines the lifetime
  • the `using` construct... but it's explicit (weak) RAII, while in C++ RAII is implicit, so the user CANNOT unwittingly make the error (by omitting the `using` keyword); see the sketch after this list
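
A minimal sketch of the RAII idea with a lock: the mutex is released at scope exit no matter how the scope is left (normal flow, early return, or exception):

#include <mutex>
#include <stdexcept>

std::mutex m;
int shared_value = 0;

void update(int x) {
    std::lock_guard<std::mutex> guard(m);       // lock acquired here
    shared_value += x;
    if (shared_value > 100)
        throw std::runtime_error("overflow");   // guard still releases the mutex
}                                               // released here, deterministically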

3. Smart Pointers

Smart pointers often appear as a silver bullet to handle memory in C++. Oftentimes I have heard: we don't need GC after all, since we have smart pointers.

One could not be more wrong.

Smart pointers do help: auto_ptr and unique_ptr use RAII concepts and are extremely useful indeed. They are so simple that you can write them by yourself quite easily.

When one needs to share ownership, however, it gets more difficult: you might share among multiple threads, and there are a few subtle issues with the handling of the count. Therefore, one naturally goes toward shared_ptr.

It's great, that's what Boost is for after all, but it's not a silver bullet. In fact, the main issue with shared_ptr is that it emulates a GC implemented by Reference Counting, but you need to implement the cycle detection all by yourself... Urg

Of course there is this weak_ptr thingy, but I have unfortunately already seen memory leaks despite the use of shared_ptr, because of those cycles... and when you are in a multithreaded environment, it's extremely difficult to detect!
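
The cycle problem fits in a few lines (a sketch; the Node type is made up for illustration):

#include <memory>

struct Node {
    std::shared_ptr<Node> next;    // strong reference keeps the other Node alive
    // std::weak_ptr<Node> next;   // a weak reference here would break the cycle
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;   // cycle: each count stays at least 1 forever
}                  // a and b go out of scope, but both Nodes leak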

4. What's the solution?

There is no silver bullet but, as always, it's definitely feasible. In the absence of GC, one needs to be clear on ownership:

  • prefer having a single owner at one given time, if possible
  • if not, make sure that your class diagram does not have any cycle pertaining to ownership and break them with subtle application of weak_ptr

So indeed, it would be great to have a GC... however, it's no trivial issue. And in the meantime, we just need to roll up our sleeves.

MaksymB
Matthieu M.
  • I wish I could accept two answers! This is just great. One thing to point out, in regards to performance, the GC that runs in a separate thread is actually pretty common (it's used in Java and .Net). Granted, that might not be acceptable in embedded systems. – Jason Baker Feb 25 '10 at 13:24
  • Only two types? How 'bout copying collectors? Generational collectors? Assorted concurrent collectors (including Baker's hard real-time treadmill)? Various hybrid collectors? Man, the sheer ignorance in the industry of this field astonishes me sometimes. – JUST MY correct OPINION Jun 01 '10 at 14:51
  • Did I say there were only 2 types? I said that there were 2 that were widely deployed. As far as I know Python, Java and C# all use Mark and Sweep algorithms now (Java used to have a reference counting algorithm). To be even more precise, it seems to me that C# uses Generational GC for minor cycles, Mark And Sweep for major cycles and Copying to fight off memory fragmentation; though I would argue that the heart of the algorithm is Mark And Sweep. Do you know any mainstream language that uses another technology? I'm always happy to learn. – Matthieu M. Jun 01 '10 at 17:59
  • You just named one mainstream language that used three. – JUST MY correct OPINION Jun 02 '10 at 02:15
  • @JUST MY correct OPINION, Your sheer arrogance astonishes me regularly. But arrogant people are kind of cool sometimes... ;) – d-_-b Apr 12 '11 at 05:00
  • There are also Incremental and Generational Garbage Collectors. GC's can affect RAII, so if we are to have GC for pointers we will have to have some sort of *local* object type that acts like a stack object ... also there are so many cases where Automatic Garbage Collection is just not needed. – Keldon Alleyne Apr 17 '12 at 19:12
  • @JSPerfUnkn0wn: this was already brought up by JUST MY correct OPINION. I would argue that Generational GC are based on Mark & Sweep (though they limit the algorithm to a smaller scope). As for not being needed, the problem is that either you use GC and risk having space leaks or you use manual handling and risk having memory leaks/stale references. I am pretty interested in Rust's Regions System and I wonder if it will introduce an alternative to the above issue. – Matthieu M. Apr 18 '12 at 06:48
  • Main difference is that Generational and Incremental GC do not need to stop the world to work, and you can make them work on single-threaded systems without too much overhead by occasionally performing iterations of the tree traversal when accessing the GC pointers (the factor can be determined by the number of new nodes, along with a basic prediction of the need to collect). You can take GC even further by including data about where in code the creation/modification of the node occurred, which could allow you to improve your predictions, and you get Escape Analysis for free with it. – Keldon Alleyne Apr 18 '12 at 08:32
  • @JSPerfUnkn0wn: well, to be fair, regarding Mark & Sweep you can also avoid the stop the world effect by introducing read or write barriers etc... so really I consider generational as a refinement of Mark & Sweep. Obviously I only vaguely broached the subject, first because it's a SO answer, and second because I am certainly no expert in the subject. Perhaps could you add your own answer ? – Matthieu M. Apr 18 '12 at 08:56
  • @MatthieuM. you said *'The Mark And Sweep is faster (less impact on overall performance) but it suffers from a "freeze the world" syndrom'*, that is all I was responding too. I don't care if you *can* call them Mark & Sweep variations, or if you want to call them "Egg and Bacon", all I care about is the implementation and clear communication between programmers, not what one **argues** it *should* be called :) ... now back to the topic, GC (or let's call it Automatic GC as Python calls it) does not have to stop the world, that's all I was hinting at (and the other things I mentioned) :) – Keldon Alleyne Apr 18 '12 at 11:39
  • @MatthieuM. Regarding, "Perhaps could you add your own answer", there's no need, these answers are fine, as is yours :) And as you said, those algorithms are refinements of Mark & Sweep, although I would go further to point out that they are only ever going to be traversing the nodes (which is what makes them stand out as variants of M&S). What defines them most is how their behavior differentiates, for example I don't just mark, I have 5 partitions/node states, Mark&Sweep has 2, and I've seen one on Wikipedia with 3. I've never tried to make a Generational though, but I do have a few ideas. – Keldon Alleyne Apr 18 '12 at 11:40
  • "suffers from a "freeze the world" syndrom". Dijkstra solved that problem in 1975 with his tricolor marking scheme for incremental mark-sweep. Languages like OCaml (1996) use this and, consequently, do not suffer from that problem. http://www.cs.utexas.edu/~EWD/ewd05xx/EWD595.PDF – J D Jun 17 '13 at 12:08
  • @JonHarrop: I touched on possible improvements after the "Reference Counting" scheme, however concurrent GCs are far from trivial and induce a cost of their own; you either need read barriers or write barriers in the other threads, which is the *cooperation* part Dijkstra talked about I guess. Maybe I should put a *naive* before Mark and Sweep? – Matthieu M. Jun 17 '13 at 12:17
  • Note that I was talking about incremental GCs and not concurrent GCs. Incremental GCs are trivial. Real concurrent GCs tend to be complicated but simple concurrent GCs do exist (e.g. the VCGC). You can eliminate unbounded GC pauses by using separate heaps with incremental GCs (e.g. OCaml and Erlang) instead of a global concurrent GC but you need to deep copy messages. Your statement is certainly true of naive mark-sweep but, then, nobody uses naive mark-sweep. Production GCs tend to be generational with copying collectors between young generations and mark-sweep/compact for the old generation. – J D Jun 17 '13 at 12:52
  • @JonHarrop: The issue, when applied to C and C++, is that copying collectors just do not work: you are entitled in C and C++ to store a pointer value into an integer, and even just an offset compared to your own address, so the GC cannot trace the references back to update them when moving. Similarly, addresses can be used as identifiers of objects, so you cannot copy an object behind the developer's back. And of course, memory is easily shared between threads. This seriously limits the possible implementations of GCs. – Matthieu M. Jun 17 '13 at 13:25
  • @MatthieuM.: To take things even further: suppose a program repeatedly allocated 1,000 objects with `new`, fills them in with some data, converts their addresses to numbers, displays them for 1/60 of a second, and then abandons the objects without performing a `delete`. The program later allows the user to type in a number, converts it to an address, and uses it on the assumption it represents a valid object. If the number the user types in has been displayed as an object's address, the C++ standard requires that the address still be valid. – supercat Oct 01 '13 at 21:37
  • @MatthieuM.: While it would be somewhat unlikely that a user would manage to copy down every number that gets displayed [a digital camera could certainly grab arbitrary screen-fulls], and any object associated with an address that disappeared from the screen before being transcribed could be safely deleted, it would clearly be absolutely 100% impossible for the program to know which addresses still existed somewhere outside the computer and which did not. – supercat Oct 01 '13 at 21:43
  • I realize your answer is 10 years old by now, but - I don't think you've made a good case about why GC is necessary in today's C++. The fact that even with RAII, effective memory leaks are still possible, is not such a case - because you can have memory leaks with GC as well - by holding on to references to allocated space you don't need and effectively won't use. – einpoklum May 14 '20 at 21:05
57

What type? Should it be optimised for embedded washing machine controllers, cell phones, workstations, or supercomputers?
Should it prioritise GUI responsiveness or server loading?
Should it use lots of memory or lots of CPU?

C/C++ is used in just too many different circumstances. I suspect something like Boost smart pointers will be enough for most users.

Edit - Automatic garbage collectors aren't so much a problem of performance (you can always buy more server); it's a question of predictable performance.
Not knowing when the GC is going to kick in is like employing a narcoleptic airline pilot: most of the time they are great, but when you really need responsiveness!

Martin Beckett
  • I definitely see your point, but I feel compelled to ask: isn't Java used in just about as many applications? – Jason Baker Sep 29 '08 at 01:05
  • No. Java is not suitable for high performance applications, for the simple reason that it doesn't have performance guarantees to the same extent as C++. So you'll find it in a cell phone, but you won't find it in a cell switch or supercomputer. – Zathrus Sep 29 '08 at 01:22
  • You can always buy more server, but you can't always buy more CPU for the cell phone already in the customer's pocket! – Crashworks Jan 10 '10 at 05:06
  • Java has done a lot of performance catchup in CPU efficiency. The really intractable problem is memory usage, Java is inherently less memory efficient than C++. And that inefficiency is due to the fact that it is garbage collected. Garbage collection cannot be both fast and memory efficient, a fact that becomes obvious if you look into how fast GC algorithms work. – Nate C-K Nov 03 '10 at 15:20
  • @Zathrus java can win on throughput b/c of the optimizing jit, though not latency (boo real-time), and certainly not memory footprint. – gtrak Oct 22 '11 at 21:54
  • @Zathrus - Java on Mainframe: http://www-03.ibm.com/systems/z/os/zos/tools/java/. Java switch controller - http://floodlight.openflowhub.org/. Realtime performance guarantees are only important for a relatively small subset of high performance applications, in most areas of HPC Java has been strong for a long time. – mikera Aug 08 '12 at 06:10
  • @Zathrus Do [864 cores](https://www.youtube.com/watch?v=5uljtqyBLxI) not qualify for a supercomputer? Do [6 million orders per second on a single thread](https://martinfowler.com/articles/lmax.html) in realtime trading not qualify as high-performance? Java still has some disadvantages, but it's pretty good for nearly all tasks. – maaartinus Dec 30 '17 at 13:58
36

One of the biggest reasons that C++ doesn't have built-in garbage collection is that getting garbage collection to play nice with destructors is really, really hard. As far as I know, nobody really knows how to solve it completely yet. There are a lot of issues to deal with:

  • deterministic lifetimes of objects (reference counting gives you this, but GC doesn't. Although it may not be that big of a deal).
  • what happens if a destructor throws when the object is being garbage collected? Most languages ignore this exception, since there's really no catch block to be able to transport it to, but this is probably not an acceptable solution for C++.
  • How do you enable/disable it? Naturally it'd probably be a compile-time decision, but code that is written for GC vs code that is written for NOT GC is going to be very different and probably incompatible. How do you reconcile this?

These are just a few of the problems faced.

Greg Rogers
  • GC and destructors is a solved problem, by a nice sidestep from Bjarne. Destructors don't run during GC, because that's not the point of GC. GC in C++ exists to create the notion of infinite _memory_, not infinite other resources. – MSalters Sep 29 '08 at 11:54
  • If destructors don't run that completely changes the semantics of the language. I guess at the very least you'd need a new keyword "gcnew" or something so that you explicitly allow this object to be GC'ed (and therefore you shouldn't use it to wrap resources besides memory). – Greg Rogers Sep 29 '08 at 13:43
  • This is a bogus argument. Since C++ has explicit memory management, you need to figure out when every object must be freed. With GC, it is no worse; rather, the problem is reduced to figuring out when certain objects are freed, namely those objects that require special considerations upon deletion. Experience programming in Java and C# reveals that the vast majority of objects require no special considerations and can be safely left to the GC. As it turns out, one of the main functions of destructors in C++ is to free child objects, which GC handles for you automatically. – Nate C-K Nov 03 '10 at 15:27
  • @NateC-K: One thing which is improved in GC vs non-GC (perhaps the biggest thing) is the ability of a solid GC system to guarantee that every reference will continue to point to the same object as long as the reference exists. Calling `Dispose` on an object may make it unusable, but references which pointed to the object when it was alive will continue to do so after it's dead. By contrast, in non-GC systems, objects can be deleted while references exist, and there's seldom any limit to the havoc that may be wreaked if one of those references gets used. – supercat Oct 01 '13 at 21:49
23

Though this is an old question, there's still one problem that I don't see anybody having addressed at all: garbage collection is almost impossible to specify.

In particular, the C++ standard is quite careful to specify the language in terms of externally observable behavior, rather than how the implementation achieves that behavior. In the case of garbage collection, however, there is virtually no externally observable behavior.

The general idea of garbage collection is that it should make a reasonable attempt at assuring that a memory allocation will succeed. Unfortunately, it's essentially impossible to guarantee that any memory allocation will succeed, even if you do have a garbage collector in operation. This is true to some extent in any case, but particularly so in the case of C++, because it's (probably) not possible to use a copying collector (or anything similar) that moves objects in memory during a collection cycle.

If you can't move objects, you can't create a single, contiguous memory space from which to do your allocations -- and that means your heap (or free store, or whatever you prefer to call it) can, and probably will, become fragmented over time. This, in turn, can prevent an allocation from succeeding, even when there's more memory free than the amount being requested.
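
One concrete reason objects cannot be moved in C++: a program may round-trip a pointer through an integer, so a moving collector has no way to find and rewrite every "reference". A small sketch of perfectly legal code that would break under a moving collector:

#include <cstdint>

int main() {
    int* p = new int(42);
    std::uintptr_t disguised = reinterpret_cast<std::uintptr_t>(p) ^ 0xFFFF;  // hide the pointer
    p = nullptr;                           // no visible pointer to the object remains

    // Later, the program reconstructs the pointer. A collector that had moved
    // (or reclaimed) the object in the meantime would break this program.
    int* q = reinterpret_cast<int*>(disguised ^ 0xFFFF);
    int value = *q;                        // must still read 42
    delete q;
    return value == 42 ? 0 : 1;
}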

It might be possible to come up with some guarantee that says (in essence) that if you repeat exactly the same pattern of allocation repeatedly, and it succeeded the first time, it will continue to succeed on subsequent iterations, provided that the allocated memory became inaccessible between iterations. But that's such a weak guarantee it's essentially useless, and I can't see any reasonable hope of strengthening it.

Even so, it's stronger than what has been proposed for C++. The previous proposal [warning: PDF] (that got dropped) didn't guarantee anything at all. In 28 pages of proposal, what you got in the way of externally observable behavior was a single (non-normative) note saying:

[ Note: For garbage collected programs, a high quality hosted implementation should attempt to maximize the amount of unreachable memory it reclaims. —end note ]

At least for me, this raises a serious question about return on investment. We're going to break existing code (nobody's sure exactly how much, but definitely quite a bit), place new requirements on implementations and new restrictions on code, and what we get in return is quite possibly nothing at all?

Even at best, what we get are programs that, based on testing with Java, will probably require around six times as much memory to run at the same speed they do now. Worse, garbage collection was part of Java from the beginning -- C++ places enough more restrictions on the garbage collector that it will almost certainly have an even worse cost/benefit ratio (even if we go beyond what the proposal guaranteed and assume there would be some benefit).

I'd summarize the situation mathematically: this is a complex situation. As any mathematician knows, a complex number has two parts: real and imaginary. It appears to me that what we have here are costs that are real, but benefits that are (at least mostly) imaginary.

Jerry Coffin
  • I would posit that even if one specifies that for proper operation all objects must be deleted, and only objects which had been deleted would be eligible for collection, compiler *support* for reference-tracking garbage collection could *still* be useful, since such a language could ensure that use of a deleted pointer (reference) would be guaranteed to trap, rather than causing Undefined Behavior. – supercat Jul 24 '13 at 16:33
  • Even in Java, the GC is not really specified to do anything useful AFAIK. It might call `free` for you (where I mean `free` analogous to the C language). But Java never guarantees to call finalizers or anything like that. In fact, C++ does much more than Java around committing database writes, flushing file handles, and so on. Java claims to have "GC", but Java developers have to meticulously call `close()` all the time and they have to be very aware of resource management, being careful not to call `close()` too soon or too late. C++ frees us from that. ...(continued) – Aaron McDaid Oct 16 '14 at 10:58
  • .. my comment a moment ago is not intended to criticise Java. I'm just observing that the term "garbage collection" is a very weird term - it means much less than people think it does and therefore it's difficult to discuss it without being clear what it means. – Aaron McDaid Oct 16 '14 at 11:01
  • @AaronMcDaid It's true that GC does not help with non-memory resources at all. Luckily, such resources get allocated pretty rarely when compared to memory. Moreover, more than 90% of them can be freed in the method which allocated them, so `try (Whatever w=...) {...}` solves it (and you get a warning when you forget). The remaining ones are problematic with RAII, too. Calling `close()` "all the time" means maybe once per tens of thousand lines, so that's not that bad, while memory gets allocated nearly on every Java line. – maaartinus Dec 30 '17 at 14:19
16

If you want automatic garbage collection, there are good commercial and public-domain garbage collectors for C++. For applications where garbage collection is suitable, C++ is an excellent garbage collected language with a performance that compares favorably with other garbage collected languages. See The C++ Programming Language (4th Edition) for a discussion of automatic garbage collection in C++. See also Hans-J. Boehm's site for C and C++ garbage collection (archive).
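
For a taste of what one such collector looks like in use, here is a minimal sketch with the Boehm conservative collector (assuming libgc is installed; compile and link with -lgc):

#include <gc.h>   // Boehm GC

int main() {
    GC_INIT();
    for (int i = 0; i < 1000000; ++i) {
        // Allocate from the collected heap; there is no matching free/delete.
        int* p = static_cast<int*>(GC_MALLOC(100 * sizeof(int)));
        p[0] = i;   // unreachable blocks are reclaimed by the collector as needed
    }
    return 0;
}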

Also, C++ supports programming techniques that allow memory management to be safe and implicit without a garbage collector. I consider garbage collection a last choice and an imperfect way of handling resource management. That does not mean that it is never useful, just that there are better approaches in many situations.

Source: http://www.stroustrup.com/bs_faq.html#garbage-collection

As for why it doesn't have it built in: if I remember correctly, C++ was invented before GC was mainstream, and I don't believe the language could have had GC for several reasons (e.g. backwards compatibility with C).

Hope this helps.

Andriy Makukha
Rayne
  • "with a performance that compares favorably with other garbage collected languages". Citation? – J D Jun 17 '13 at 12:13
  • My link was broken. I wrote this answer 5 years ago. – Rayne Jun 18 '13 at 00:59
  • Ok, I was hoping for some independent verification of these claims, i.e. not by Stroustrup or Boehm. :-) – J D Jun 18 '13 at 12:07
13

Stroustrup made some good comments on this at the 2013 Going Native conference.

Just skip to about 25m50s in this video. (I'd recommend watching the whole video actually, but this skips to the stuff about garbage collection.)

When you have a really great language that makes it easy (and safe, and predictable, and easy-to-read, and easy-to-teach) to deal with objects and values in a direct way, avoiding (explicit) use of the heap, then you don't even want garbage collection.

With modern C++, and the stuff we have in C++11, garbage collection is no longer desirable except in limited circumstances. In fact, even if a good garbage collector is built into one of the major C++ compilers, I think that it won't be used very often. It will be easier, not harder, to avoid the GC.

He shows this example:

void f(int n, int x) {
    Gadget *p = new Gadget{n};
    if(x<100) throw SomeException{};
    if(x<200) return;
    delete p;
}

This is unsafe in C++. But it's also unsafe in Java! In C++, if the function returns early, the delete will never be called. But if you had full garbage collection, such as in Java, you merely get a suggestion that the object will be destructed "at some point in the future" (Update: it's even worse than this. Java does not promise to ever call the finalizer - it may never be called). This isn't good enough if Gadget holds an open file handle, or a connection to a database, or data which you have buffered for writing to a database at a later point. We want the Gadget to be destroyed as soon as it's finished with, in order to free these resources as soon as possible. You don't want your database server struggling with thousands of database connections that are no longer needed - it doesn't know that your program has finished working.

So what's the solution? There are a few approaches. The obvious approach, which you'll use for the vast majority of your objects is:

void f(int n, int x) {
    Gadget p = {n};  // Just leave it on the stack (where it belongs!)
    if(x<100) throw SomeException{};
    if(x<200) return;
}

This takes fewer characters to type. It doesn't have new getting in the way. It doesn't require you to type Gadget twice. The object is destroyed at the end of the function. If this is what you want, this is very intuitive. Gadgets behave the same as int or double. Predictable, easy-to-read, easy-to-teach. Everything is a 'value'. Sometimes a big value, but values are easier to teach because you don't have this 'action at a distance' thing that you get with pointers (or references).

Most of the objects you make are for use only in the function that created them, and perhaps passed as inputs to child functions. The programmer shouldn't have to think about 'memory management' when returning objects, or otherwise sharing objects across widely separated parts of the software.

Scope and lifetime are important. Most of the time, it's easier if the lifetime is the same as the scope. It's easier to understand and easier to teach. When you want a different lifetime, it should be obvious from reading the code that you're doing this, by use of shared_ptr for example (or by returning (large) objects by value, leveraging move semantics or unique_ptr).

This might seem like an efficiency problem. What if I want to return a Gadget from foo()? C++11's move semantics make it easier to return big objects. Just write Gadget foo() { ... } and it will just work, and work quickly. You don't need to mess with && yourself, just return things by value and the language will often be able to do the necessary optimizations. (Even before C++03, compilers did a remarkably good job at avoiding unnecessary copying.)
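
For instance (a sketch; Gadget here is just a stand-in for a big movable value type):

#include <vector>

struct Gadget {
    std::vector<double> data;   // potentially large
};

Gadget foo() {
    Gadget g;
    g.data.resize(1000000);
    return g;                   // moved (or elided entirely), not deep-copied
}

int main() {
    Gadget gadget = foo();      // cheap: no million-element copy takes place
}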

As Stroustrup said elsewhere in the video (paraphrasing): "Only a computer scientist would insist on copying an object, and then destroying the original. (audience laughs). Why not just move the object directly to the new location? This is what humans (not computer scientists) expect."

When you can guarantee only one copy of an object is needed, it's much easier to understand the lifetime of the object. You can pick what lifetime policy you want, and garbage collection is there if you want. But when you understand the benefits of the other approaches, you'll find that garbage collection is at the bottom of your list of preferences.

If that doesn't work for you, you can use unique_ptr, or failing that, shared_ptr. Well-written C++11 is shorter, easier to read, and easier to teach than many other languages when it comes to memory management.
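
And when the Gadget really must live on the heap (say, it's polymorphic or too large for the stack), the unsafe first snippet becomes safe with unique_ptr (a sketch reusing the Gadget and SomeException types from the snippets above):

#include <memory>

void f(int n, int x) {
    std::unique_ptr<Gadget> p(new Gadget{n});  // sole owner of the Gadget
    if (x < 100) throw SomeException{};        // p's destructor still runs: no leak
    if (x < 200) return;                       // likewise here
}                                              // and on normal exit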

Aaron McDaid
  • GC should only be used for objects that don't acquire resources (i.e. ask other entities to do things on their behalf "until further notice"). If `Gadget` doesn't ask anything else to do anything on its behalf, the original code would be perfectly safe in Java if the meaningless (to Java) `delete` statement were removed. – supercat Apr 01 '15 at 22:13
  • @supercat, objects with boring destructors are interesting. (I haven't defined 'boring', but basically destructors that never need to be called, except for the freeing of memory). It might be possible for an individual compiler to treat `shared_ptr<T>` specially when `T` is 'boring'. It could decide to not actually manage a ref counter for that type, and instead to use GC. This would allow GC to be used without the developer needing to notice. A `shared_ptr<T>` could simply be seen as a GC pointer, for suitable `T`. But there are limitations in this, and it would make many programs slower. – Aaron McDaid Apr 02 '15 at 06:30
  • A good type system should have different types for GC and RAII-managed heap objects, since some usage patterns work very well with one and very poorly with the other. In .NET or Java, a statement `string1=string2;` will execute very quickly regardless of the length of the string (it's literally nothing more than a register load and register store), and does not require any locking to ensure that if the above statement is executed while `string2` is being written, `string1` will hold either the old value or the new value, with no Undefined Behavior). – supercat Apr 02 '15 at 15:36
  • In C++, assignment of a `shared_ptr` requires a lot of behind-the-scenes synchronization, and assignment of a `String` may behave oddly if a variable is read and written simultaneously. Cases where one would want to write and read a `String` simultaneously aren't terribly common, but can arise if e.g. some code wishes to make ongoing status reports available to other threads. In .NET and Java, such things just "work". – supercat Apr 02 '15 at 15:38
  • Could you clarify a few things, so that we are comparing like with like? In C++, you mean a `String` class which is immutable, just like the String class in Java? And therefore, you never really assign to a String. You can assign to a `String*` or a `shared_ptr` (or to a String *Java-reference* in Java). I'm confused about your discussion of 'assignment' and 'read and write', because I would prefer clarity that the underlying object is immutable. – Aaron McDaid Apr 02 '15 at 15:58
  • In Java and .NET, the most common type used to encapsulate a sequence of characters is `String`, an immutable reference type which is used because that's the only thing that can give practical value semantics in Java, and the only thing that could give value semantics in .NET without excess boxing [in .NET, a structure which encapsulated a `char[]` might be better than heap object except for the boxing issue]. For purposes of comparison, I'd say that should be compared against whatever type in C++ would most efficiently represent... – supercat Apr 02 '15 at 17:07
  • ...a sequence of characters with value semantics. For some usage patterns, C++ `String` would be best; for others, `shared_ptr` would be better. There are some usage patterns, however, for which no C++ type can achieve the efficiency and semantics of GC strings, since the only synchronization required is with the GC itself, and it has magical abilities to force synchronization with threads that have no other synchronization mechanisms built into them. – supercat Apr 02 '15 at 17:11
  • Sorry to repeat myself, but is your C++ `String` immutable? You talked earlier about reading and writing, but writing doesn't make sense with an immutable `String`, something like `using String = const std::vector<char>;`. I can't think about other issues before being clear what the type is. – Aaron McDaid Apr 02 '15 at 17:43
  • The intention is to have a variable which at different times may be made to encapsulate different sequences of characters, independently from any other string-type variable. In C++ that may be achieved by having the variable encapsulates a mutable string directly, a pointer to a shareable instance of a string that will never be mutated, a pointer to an instance of a string which will only be mutated if it has never been shared (and will otherwise be replaced with a new instance that the owner could then mutate). The key point is that the *variable* might change. – supercat Apr 02 '15 at 18:17
  • In short, for purposes of fair comparison, use whatever sort of type would let C++ achieve the best performance for each usage pattern. That .NET and Java use a mutable reference to immutable object as their string type doesn't mean C++ shouldn't be allowed to use something better if it can. – supercat Apr 02 '15 at 22:26
  • "_Java does not promise to call the finalizer ever - it may never be called_" Fun fact: early Java versions allowed the finalizer to run while the object was still being used; in fact, as early as when the ctor was finished!!! Apparently it didn't cause many complaints from the "C++ sucks because order of `f()+g()` is not deterministic, Java rules because it's deterministic" crowd. – curiousguy Jan 14 '17 at 11:55
  • @curiousguy nothing has changed; unless you take the right precautions, Java still allows the finalizer to be called as soon as the constructor has finished. Here is a real life example: "[finalize() called on strongly reachable objects in Java 8](https://stackoverflow.com/q/26642153/2711488)". The conclusion is to never use this feature, which almost everyone agrees to be a historical design mistake of the language. When we follow that advice, the language provides the determinism we love. – Holger Jun 09 '20 at 08:50
13

tl;dr: Because modern C++ doesn't need garbage collection.

Bjarne Stroustrup's FAQ answer on this matter says:

I don't like garbage. I don't like littering. My ideal is to eliminate the need for a garbage collector by not producing any garbage. That is now possible.


The situation for code written these days (C++17, following the official Core Guidelines) is as follows:

  • Most memory ownership-related code is in libraries (especially those providing containers).
  • Most use of code involving memory ownership follows the RAII pattern, so allocation is made on construction and deallocation on destruction, which happens when exiting the scope in which something was allocated.
  • You do not explicitly allocate or deallocate memory directly.
  • Raw pointers do not own memory (if you've followed the guidelines), so you can't leak by passing them around.
  • If you're wondering how you're going to pass the starting addresses of sequences of values in memory - you'll be doing that with a span; no raw pointer needed.
  • If you really need an owning "pointer", you use C++'s standard-library smart pointers - they can't leak, and are decently efficient (although the ABI can get in the way of that). Alternatively, you can pass ownership across scope boundaries with "owner pointers". These are uncommon and must be used explicitly; but when adopted, they allow for nice static checking against leaks. (See the sketch just after this list.)
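
To make this concrete, here is a minimal sketch (C++17; all names are illustrative rather than taken from any particular codebase) of guideline-conforming ownership - no explicit `new`/`delete`, deterministic cleanup, and raw pointers that never own:

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct Widget { std::string name; };

// A non-owning view of a sequence; in C++20 this parameter pair would
// collapse into a single std::span<const int>.
double average(const int* data, std::size_t n) {
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return n ? static_cast<double>(sum) / n : 0.0;
}

int main() {
    // The container owns its buffer; it is freed when `values` leaves scope.
    std::vector<int> values{1, 2, 3, 4};
    std::cout << average(values.data(), values.size()) << '\n';

    // unique_ptr owns the Widget: destruction is deterministic, and no
    // collector ever has to go looking for this object.
    auto w = std::make_unique<Widget>();
    w->name = "gadget";
    std::cout << w->name << '\n';
}   // everything is deallocated here, in reverse order of construction
```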

"Oh yeah? But what about...

... if I just write code the way we used to write C++ in the old days?"

Indeed, you could just disregard all of the guidelines and write leaky application code - and it will compile and run (and leak), same as always.

But it's not a "just don't do that" situation, where the developer is expected to be virtuous and exercise a lot of self control; it's just not simpler to write non-conforming code, nor is it faster to write, nor is it better-performing. Gradually it will also become more difficult to write, as you would face an increasing "impedance mismatch" with what conforming code provides and expects.

... if I reinterpret_cast? Or do complex pointer arithmetic? Or other such hacks?"

Indeed, if you put your mind to it, you can write code that messes things up despite playing nice with the guidelines. But:

  1. You would do this rarely (in terms of places in the code, not necessarily in terms of fraction of execution time)
  2. You would only do this intentionally, not accidentally.
  3. Doing so will stand out in a codebase conforming to the guidelines.
  4. It's the kind of code in which you would bypass the GC in another language anyway.

... library development?"

If you're a C++ library developer then you do write unsafe code involving raw pointers, and you are required to code carefully and responsibly - but these are self-contained pieces of code written by experts (and more importantly, reviewed by experts).


So, it's just like Bjarne said: There's really no motivation to collect garbage generally, as you all but make sure not to produce garbage. GC is becoming a non-problem with C++.

That is not to say GC isn't an interesting problem for certain specific applications; but for those, you would want custom allocation and deallocation strategies, not a language-level GC.

einpoklum
  • 86,754
  • 39
  • 223
  • 453
  • 1
    Well, it does (need GC) if you are grinding strings... Imagine you have large string arrays (think hundreds of megabytes) that you are building piecemeal, then processing and rebuilding into different lengths, deleting unused ones, combining others, etc. I know because I have had to switch to high-level languages to cope. (Of course you could build your own GC as well.) – www-0av-Com Jan 27 '18 at 13:42
  • 2
    @user1863152 : That's a case in which a custom allocator would be useful. It still doesn't necessitate a language-integral GC... – einpoklum Jan 27 '18 at 13:50
  • to einpoklum : true. It's just horses for courses. My requirement was to process dynamically changing gallons of Transport Passenger Information. Fascinating subject... Really comes down to software philosophy. – www-0av-Com Jan 27 '18 at 13:58
  • GC as the Java and .NET world have discovered finally has a massive problem - it does not scale. When you have billions of live objects in memory as we do these days with any non trivial software, you'll have to start writing code to hide things from the GC. It's a burden to have GC in Java and .NET. – Zach Saw Sep 25 '19 at 12:34
  • @ZachSaw: What fraction of programs would ever have even one billion live objects in memory? You're saying all programs that don't have more than that are trivial? – supercat Nov 16 '20 at 15:56
  • @supercat: Maybe he meant millions of objects? Although even then it's not true for "any non trivial" piece of software. – einpoklum Nov 16 '20 at 16:22
  • @einpoklum: GC can work fine at the "millions of objects" scale. And for jobs that are a good fit for a model based upon shareable immutable objects, it probably scales better to multi-core systems than RAII since it doesn't require interlocked reference counts. – supercat Nov 16 '20 at 16:26
  • @supercat: I didn't claim GC doesn't work - please don't mistake my answer for ZachSaw's comment. Having said that - it doesn't necessarily scale better than RAII-based ownership, because most RAII ownership uses neither reference counts nor locks. Also, in C++, you would avoid using massive numbers of `std::shared_ptr` - that sounds like a promise of huge time wastage. Still, if that's how your software works, then yes, GC may scale better than individual shared pointers. – einpoklum Nov 16 '20 at 16:41
  • @supercat by non-trivial I meant non-memory-trivial - i.e. big memory applications such as our proprietary patented high performance document search engine that is capable of handling millions of objects, sorting on the fly etc. Or one more familiar to everyone - the StackOverflow site itself. There have been lots of tech blogs written by the devs on this topic. – Zach Saw Nov 17 '20 at 06:17
  • @einpoklum: Different kinds of tasks have different requirements. I don't think any single approach can be optimal for all of them. RAII is good for applications where all objects have clear ownership; a generational tracing GC is good for applications where there are large numbers of objects that happen to be equal, but have no other meaningful relationship, and have useful lifetimes that fit well with generational expectations. – supercat Nov 17 '20 at 15:44
  • @supercat : The last case, while important, is not something that will just happen in your application without you noticing, and should thus be well enough served by an opt-in, library-based garbage-collection mechanism (if garbage really can't be avoided). – einpoklum Nov 17 '20 at 19:00
  • @einpoklum: Systems that are going to use tracing garbage collectors should integrate them into the design, since such integration will allow them to do things that a library-based one would not be able to do. For example, if `foo` holds the last reference to an object, and thread 1 performs `foo = bar;` at about the same time as another performs `boz = foo;`, a Framework-integrated collector will be able to handle all associated corner cases of thread timing without any need for memory barriers in either thread's code. The GC would need to forcibly inject some expensive memory barriers... – supercat Nov 17 '20 at 19:54
  • ...when it runs, but between GC cycles no barriers would be needed to guard against the possibility of memory-corrupting race conditions. For that to work, though, the GC would need more detailed information about register usage than would be provided by a compiler that wasn't specifically designed to cooperate with GC. – supercat Nov 17 '20 at 19:58
  • @einpoklum: An interesting aspect of many framework-integrated garbage collectors that a lot of people fail to appreciate is that the bit patterns stored in references can spontaneously change when objects move. A library-based relocating GC would require that object references hold some unchanging attribute of an object, such as the address of a structure holding the object's real address, but a framework-integrated one can store references using direct pointers, but then when objects get relocated, set the old pages to trap and then change all references so they hold the new location. – supercat Nov 17 '20 at 20:06
  • If an object is accessed between the time a new address is assigned and the time the reference used for access gets updated, the trap handler can detect that the reference holds the old address of the object, update the reference to reflect the object's new address, and retry the access. Garbage collectors that aren't integrated with a run-time framework can't use such techniques. – supercat Nov 17 '20 at 20:09
  • Perhaps this is a better approach, which has the benefits of both worlds: https://www.microsoft.com/en-us/research/publication/project-snowflake-non-blocking-safe-manual-memory-management-net/ – Zach Saw Dec 18 '20 at 01:46
10

The idea behind C++ was that you should not pay any performance cost for features that you don't use. So adding garbage collection would have meant having some programs run straight on the hardware the way C does, and some within some sort of runtime virtual machine.

Nothing prevents you from using some form of smart pointers that are bound to some third-party garbage-collection mechanism. I seem to recall Microsoft doing something like that with COM, and it didn't go over well.
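
For illustration, the standard library's reference-counted smart pointers already give you that kind of opt-in, library-level collection with no VM in sight (a minimal sketch; note that plain reference counting leaks cycles, which `std::weak_ptr` exists to break):

```cpp
#include <iostream>
#include <memory>

struct Node {
    int value = 0;
    std::shared_ptr<Node> next;   // shared, reference-counted ownership
    ~Node() { std::cout << "freed " << value << '\n'; }
};

int main() {
    auto a = std::make_shared<Node>();
    a->value = 1;
    a->next = std::make_shared<Node>();
    a->next->value = 2;
    // No collector thread, no VM: when the last shared_ptr to a node goes
    // out of scope, its count hits zero and it is destroyed on the spot.
    // Only the code that opts in pays the counting cost.
}
```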

Uri
  • 84,589
  • 46
  • 214
  • 312
  • 2
    I don't think GC requires a VM. The compiler could add code to all pointer operations to update a global state, while a separate thread runs in the background deleting objects as needed. – user83255 May 07 '09 at 11:39
  • 3
    I agree. You don't need a virtual machine, but the second you start having something manage your memory for you in the background like that, my feeling is that you've left the actual "electric wires" and have sort of a VM situation. – Uri May 07 '09 at 16:25
8

To answer most "why" questions about C++, read Design and Evolution of C++

Nemanja Trifunovic
  • 23,597
  • 3
  • 46
  • 84
4

One of the fundamental principles behind the original C language is that memory is composed of a sequence of bytes, and code need only care about what those bytes mean at the exact moment that they are being used. Modern C allows compilers to impose additional restrictions, but C includes--and C++ retains--the ability to decompose a pointer into a sequence of bytes, assemble any sequence of bytes containing the same values into a pointer, and then use that pointer to access the earlier object.

While that ability can be useful--or even indispensable--in some kinds of applications, a language that includes that ability will be very limited in its ability to support any kind of useful and reliable garbage collection. If a compiler doesn't know everything that has been done with the bits that made up a pointer, it will have no way of knowing whether information sufficient to reconstruct the pointer might exist somewhere in the universe. Since it would be possible for that information to be stored in ways that the computer wouldn't be able to access even if it knew about them (e.g. the bytes making up the pointer might have been shown on the screen long enough for someone to write them down on a piece of paper), it may be literally impossible for a computer to know whether a pointer could possibly be used in the future.
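
To make that concrete, here is a minimal sketch (variable names are illustrative) of perfectly legal C++ in which an object's only surviving "reference" is a bag of bytes:

```cpp
#include <cstdio>
#include <cstring>

int main() {
    int* p = new int(42);

    // Decompose the pointer into a sequence of bytes...
    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);
    p = nullptr;   // no pointer-typed copy of the address remains anywhere

    // ...those bytes could be written to disk, shown on screen, or copied
    // onto paper. Reassembling them yields a pointer to the same object:
    int* q;
    std::memcpy(&q, bytes, sizeof q);
    std::printf("%d\n", *q);   // prints 42

    // A collector that traces only pointer-typed storage could never prove
    // the object unreachable while byte copies like `bytes` might exist.
    delete q;
}
```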

An interesting quirk of many garbage-collected frameworks is that an object reference is not defined by the bit patterns contained therein, but by the relationship between the bits held in the object reference and other information held elsewhere. In C and C++, if the bit pattern stored in a pointer identifies an object, that bit pattern will identify that object until the object is explicitly destroyed. In a typical GC system, an object may be represented by the bit pattern 0x1234ABCD at one moment in time, but the next GC cycle might replace all references to 0x1234ABCD with references to 0x4321BABE, whereupon the object would be represented by the latter pattern. Even if one were to display the bit pattern associated with an object reference and later read it back from the keyboard, there would be no expectation that the same bit pattern would be usable to identify the same object (or any object).

supercat
  • 69,493
  • 7
  • 143
  • 184
  • That is a really good point, I just recently stole some bits from my pointers because otherwise there would be stupid amounts of cache misses. – Passer By Jun 18 '17 at 14:44
  • @PasserBy: I wonder how many applications that use 64-bit pointers would benefit more from either using scaled 32-bit pointers as object references, or else keeping almost everything in 4GiB of address space and using special objects to store/retrieve data from high-speed storage beyond? Machines have enough RAM that the RAM consumption of 64-bit pointers might not matter, *except* that they gobble twice as much cache as 32-bit pointers. – supercat Jun 19 '17 at 14:32
3

SHORT ANSWER: We don't know how to do garbage collection efficiently (with minimal time and space overhead) and correctly all the time (in all possible cases).

LONG ANSWER: Just like C, C++ is a systems language; this means it is used for writing systems code, e.g., operating systems. In other words, C++, just like C, is designed with the best possible performance as the main target. The language's standard will not add any feature that might hinder that performance objective.

This poses the question: why does garbage collection hinder performance? The main reason is that, when it comes to implementation, we [computer scientists] do not know how to do garbage collection with minimal overhead in all cases. Hence it's impossible for the C++ compiler and runtime system to perform garbage collection efficiently all the time. On the other hand, a C++ programmer should know their design/implementation, and is the best person to decide how best to do the garbage collection.
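
As one illustration of programmer-directed reclamation (a minimal sketch with illustrative names, not a production allocator): when the programmer knows that all allocations of a phase die together, a bump-pointer arena reclaims them all in O(1) - something no general-purpose collector could safely assume:

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// A bump-pointer arena: the programmer knows every object allocated here
// dies at the end of a phase, so "collection" is a single pointer reset.
class Arena {
    std::byte*  buf_;
    std::size_t cap_;
    std::size_t used_ = 0;
public:
    explicit Arena(std::size_t cap)
        : buf_(static_cast<std::byte*>(std::malloc(cap))), cap_(cap) {}
    ~Arena() { std::free(buf_); }

    void* allocate(std::size_t n,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t offset = (used_ + align - 1) & ~(align - 1);  // round up
        if (offset + n > cap_) throw std::bad_alloc{};
        used_ = offset + n;
        return buf_ + offset;
    }
    void reset() { used_ = 0; }   // frees the whole phase at once, O(1)
};

int main() {
    Arena frame(1 << 20);
    int* xs = static_cast<int*>(frame.allocate(1000 * sizeof(int)));
    xs[0] = 7;
    frame.reset();   // all per-phase allocations gone; nothing to trace
}
```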

Lastly, if control (hardware, details, etc.) and performance (time, space, power, etc.) are not the main constraints, then C++ is not the right tool. Other languages might serve better, offering more [hidden] runtime management with the necessary overhead.

Sqandr
  • 57
  • 7
3

When we compare C++ with Java, we see that C++ was not designed with implicit Garbage Collection in mind, while Java was.

Having things like arbitrary C-style pointers around is not only bad for GC implementations; restricting them to make GC possible would also destroy backward compatibility for a large amount of legacy C++ code.

In addition to that, C++ is a language that is intended to run as a standalone executable rather than inside a complex run-time environment.

All in all: yes, it might be possible to add garbage collection to C++, but for the sake of continuity it is better not to do so.

Mike76
  • 699
  • 6
  • 22
  • 1
    Freeing memory and running destructors are two completely separate issues. (Java doesn't have destructors, which is a PITA.) GC frees memory; it doesn't run dtors. – curiousguy Jan 14 '17 at 11:50
3

All the technical talk overcomplicates the concept.

If you put GC into C++ for all memory automatically, then consider something like a web browser. The web browser must load a full web document AND run web scripts, and web-script variables can be stored in the document tree. With a BIG document in a browser with lots of tabs open, every time the GC must do a full collection it must also scan all the document elements.

On most computers this means that PAGE FAULTS will occur. So the main reason, to answer the question, is that page faults will occur. You will know it is happening when your PC starts making lots of disk accesses: the GC must touch lots of memory in order to prove which pointers are no longer valid. When you have a bona fide application using lots of memory, having to scan all objects on every collection wreaks havoc because of the page faults. (A page fault is when a page of virtual memory needs to be read back into RAM from disk.)

So the correct solution is to divide an application into the parts that need GC and the parts that do not. In the web-browser example above, if the document tree were allocated with malloc but the JavaScript ran with GC, then every time the GC kicks in it only scans a small portion of memory, and the paged-out parts of the document tree do not need to get paged back in.
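
That split is roughly what an opt-in collector such as the Boehm-Demers-Weiser libgc enables. A minimal sketch, assuming libgc is installed and linked with `-lgc`, and assuming the malloc'd tree holds no pointers into the GC heap (the conservative collector would not see them):

```cpp
#include <gc.h>        // Boehm-Demers-Weiser collector (link with -lgc)
#include <cstdio>
#include <cstdlib>

struct ScriptValue {   // short-lived script data: GC-allocated
    int tag;
    ScriptValue* next;
};

struct DomNode {       // the big, long-lived document tree: malloc'd
    char text[256];
};

int main() {
    GC_INIT();

    // The collector never traces this region, so it never pages it in...
    DomNode* doc = static_cast<DomNode*>(std::calloc(100000, sizeof(DomNode)));

    // ...collections scan only the (much smaller) GC heap plus the roots.
    auto* v = static_cast<ScriptValue*>(GC_MALLOC(sizeof(ScriptValue)));
    v->tag = 42;
    v->next = nullptr;
    std::printf("%d\n", v->tag);

    std::free(doc);    // manual lifetime, as with any malloc'd block
}
```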

To further understand this problem, read up on virtual memory and how it is implemented in computers. It is all about the fact that 2GB is made available to the program even when there is not really that much RAM. On modern computers with 2GB of RAM for a 32-bit system it is not such a problem, provided only one program is running.

As an additional example, consider a full collection that must trace all objects. First, you must scan all objects reachable via the roots. Second, scan all the objects visible from those found in step 1. Then scan objects waiting on destructors. Then go over all the pages again and sweep away every object that was never marked. This means that many pages might get swapped out and back in multiple times.
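
A toy mark-and-sweep pass (a minimal sketch with illustrative names, not any production algorithm) makes that cost concrete: every reachable object is touched during marking, and every heap entry is visited again during the sweep:

```cpp
#include <vector>

struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;          // outgoing references
};

void mark(Obj* o) {
    if (o == nullptr || o->marked) return;
    o->marked = true;                // touching o may fault its page back in
    for (Obj* r : o->refs) mark(r);  // step 2: everything visible from step 1
}

void collect(std::vector<Obj*>& heap, const std::vector<Obj*>& roots) {
    for (Obj* r : roots) mark(r);    // step 1: scan from the roots
    for (auto it = heap.begin(); it != heap.end();) {   // final pass: sweep
        if (!(*it)->marked) { delete *it; it = heap.erase(it); }
        else { (*it)->marked = false; ++it; }           // reset for next cycle
    }
}

int main() {
    std::vector<Obj*> heap{new Obj, new Obj, new Obj};
    heap[0]->refs.push_back(heap[1]);   // heap[2] is unreachable
    collect(heap, {heap[0]});           // sweeps heap[2] only
    for (Obj* o : heap) delete o;       // cleanup for the demo
}
```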

So my answer, to keep it short, is that the sheer number of page faults incurred by touching all that memory makes full GC over all of a program's objects infeasible; the programmer must therefore view GC as an aid for things like scripts and database work, and do the normal things with manual memory management.

And the other very important reason, of course, is global variables. For the collector to know that a global variable points into the GC heap, specific keywords would be required - and thus existing C++ code would not work.

Bob Holmes
  • 47
  • 1
0

Mainly for two reasons:

  1. Because it doesn't need one (IMHO)
  2. Because it's pretty much incompatible with RAII, which is the cornerstone of C++

C++ already offers manual memory management, stack allocation, RAII, containers, automatic pointers, smart pointers... That should be enough. Garbage collectors are for lazy programmers who don't want to spend 5 minutes thinking about who should own which objects or when resources should be freed. That's not how we do things in C++.

Marc Coll
  • 95
  • 5
  • There are numerous (newer) algorithms which are inherently difficult to implement without garbage collection. Time has moved on. Innovation also comes from new insights which map well to (garbage-collected) high-level languages. Try to backport any of these to GC-free C++ and you will notice the bumps in the road. (I know I should give examples, but I am kind of in a hurry right now. Sorry. One I can think of right now revolves around persistent data structures, where reference counting won't work.) – BitTickler Aug 07 '18 at 16:27
0

Imposing garbage collection is really a low-level-to-high-level paradigm shift.

If you look at the way strings are handled in a language with garbage collection, you will find they ONLY allow high-level string-manipulation functions and do not allow binary access to the strings. Simply put, all string functions first check the pointer to see where the string currently is, even if you are only reading out a single byte. So a loop that processes each byte of a string in a garbage-collected language must recompute the base location plus offset on every iteration, because it cannot know when the string has been moved. Then you have to think about heaps, stacks, threads, etc.
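
By contrast, C++ guarantees the buffer never moves behind your back, so a byte-wise loop can hold one raw pointer the whole way through (a minimal sketch; mutable `std::string::data()` requires C++17):

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// The string's buffer cannot be relocated mid-loop, so one raw pointer
// suffices: no per-iteration base-plus-offset recomputation, no GC handshake.
void upcase(std::string& s) {
    for (char* p = s.data(), * const end = p + s.size(); p != end; ++p)
        *p = static_cast<char>(std::toupper(static_cast<unsigned char>(*p)));
}

int main() {
    std::string s = "garbage collection";
    upcase(s);
    std::puts(s.c_str());   // GARBAGE COLLECTION
}
```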

www-0av-Com
  • 596
  • 8
  • 13