This post is a prequel to a series of posts discussing why garbage collection is hard, especially for Rust, and brainstorming about solutions to the problems we face.
The goal of this post is to provide the background foundational material about Garbage Collection that the other posts will then build upon.
You can skip ahead to the follow-up posts (once they are published) if you feel you are already well-versed in the low-level mechanics of garbage collection.
I may add more material to this post in the future if I discover a need to provide more detail on a particular subtopic, such as “write barriers”.
Update (17 November 2015): It has come to my attention that portions of the post are not rendering properly in Google Chrome. I will try to fix this, but in the meantime, you should be able to get the proper renderings in Safari or Firefox. (I do not yet know about other browsers.) If your browser is working as original expected, then you should see a picture load immediately beneath this text.
Update (18 Novemer 2015): many thanks to
othermike on reddit who
pointed out my bug.
The content should now render properly on Chrome (and hopefully other
What is Garbage Collection
A garbage collector is a component in the runtime for a programming language that periodically attempts to reclaim memory (without requiring explicit calls to memory-freeing routines in the programning language). To do this soundly, the collector must identify blocks of memory that cannot possibly be used in the future by the program (i.e., “dead objects”).
Discussions of garbage collection often equate the notion of “dead object” with “unreachable object”: If no chain of references exists that could lead the program to an object, then that object cannot be reachedResearchers have explored methods to identify objects as dead even when reachable, such as using “parametricity”; but I am not aware of any such method being used outside of a research setting. (and therefore cannot be used in the future).
When one says “garbage collector”, one usually means a “tracing garbage collector”: a collector that works by identifying the reachable objects by computing the connected components that include the “roots” of the object graph. (The “roots” are the starting points from which any chain of references must originate in the source program.)
So, for example, we might have the following set of gc-managed objects (labelled “A” through “F” below), along with a register file labelled “RF”.
In the simple model above, the roots are the processor registers. Such a model is applicable to a language runtime where all memory blocks (including the stack frames) are managed by the garbage collector. (In other words, we are not talking about Rust yet.)
The reachable objects, as stated above, are the connected components of the graph that includes the roots, highlighted below.
A garbage collector would determine that the objects labelled “D”, “E”, and “F” are unreachable, and thus their storage can be reclaimed.
How Garbage Collection works
A garbage collector is often presented as a coroutine
Coroutines are much like subroutines,
except that instead of having a parent-child relationship (where the
child subroutine “returns” to the parent caller), a call from
coroutine A to coroutine B: saves the current context of where A
currently is, transfers control to B, and the next time B calls A,
resumes the context that was saved at the outset.
In other words, once the linkage has been established between A and B, then A’s calls to B look like returns from the viewpoint of B, (and B’s calls to A look like returns from the viewpoint of A). that is linked in with the main program. The main program itself is often referred to as a “mutator”, since it is the entity that mutates the object graph. (The collector does not modify the abstract object graph, but rather the representation of the object graph in memory.)
The mutator requests memory from some allocation service (usually deeply integrated with the garbage collector for reasons we will see). If there is a memory block immediately available to satisfy the request, then the allocator hands that over. If there is not sufficient free memory, then the mutator’s allocation attempt invokes the garbage collector coroutine.
Garbage collectors are also often divided into two categories: Copying collectors, and Mark-Sweep collectors. Both collectors accomplish the goal of identifying the reachable objects and reclaiming the remainder, but they do it in different ways.
It is worthwhile to remember at this point that even though our object graphs above are drawn as abstract circles and arrows, the objects are represented somehow in memory.
For example, here is one potential representation for the above object
- denotes some value that the GC knows is not a
(Assume for this example that every GC allocated object is made from four consecutive words in memory.)
Regarding the “(header)” words in the diagram:
Garbage collectors often require that the
GC-managed memory be formatted in a way such that the collector can
“parse” it when doing a scan over its address space.
As part of this “parsing”, the GC needs to be able to derive the size of each object it looks at.
The address-space could be partitioned into size classes. Or if objects in a block are to be variable-sized, the size could be recorded in object headers (which are often needed anyway to store things like mark bits or other data).
Clever representation techniques exist that avoid using header words for small objects (like pairs) without requiring size-class partitioning; but I digress.
In these pictures, there is no difference between an arrow pointing to
the left- verus right-side of a memory cell; so the occurrence of the
pointer to A (
r0 is no different than the occurrence
of that same value in memory cell
0x10004 (the first non-header word
D), even though the arc for the former is pointing at the left
side of the first memory cell of
A, and the arc for the latter is
pointing at the right side of that memory cell.
A Mark-Sweep collector works by first doing a traversal of the reachable memory, marking each object it finds (e.g. by setting a bit reserved in the object header, or in separate mark bitmap if there is no such bit reserved). This traversal requires some amount of extra memory in reserve to track remaining work for the trace (e.g. a “mark stack” of objects we are in the midst of traversing, and/or a queue of objects scheduled for future traversal).
The “Mark” phase
Here is a sketch of the traversals that the garbage collector makes in order to mark each reachable object.
As reflected in the diagram above, each object that the GC reaches has its mark bit set to “marked” in its header word.
The numbers on the arcs above are meant to correspond to a hypothetical traversal order as the GC marks the memory; particular tracing strategies may yield different orders. (No matter what, we will not trace object “G” until after we have seen “C” via some route.)
Also, I have left the memory for the mark-stack out of the picture; in this case the mark-stack would not grow very large, but in general one must anticipate the mark-stack growing as large as the longest path through the reachable object graph. (The longest path in this case is three objects long.)
The “Sweep” phase
As previously mentioned, the GC much be able to “parse” the memory when
scanning the address space.
In the case of the Mark-Sweep collector, in addition to having to be able to derive the size of each object, we also need the mark bit for each object to be located at predictable location. A Mark-Sweep collector does not move objects, so it must resort to metadata such as a free-list to track reclaimed memory. So, after the marking is finished, the GC then sweeps over the memory: it walks over the GC-managed address space and builds up a free-list of blocks that were not marked during the traversal.
(The arcs that make up the free-list above are dashed, to distinguish
them from the “real” references that make up the object graph. In the
above scheme, the pointer to the next element in the free list is held
in the second word of each free block.Putting the
next-pointers for the free-list into the second word of each four-word block is one way of satisfying the aforementioned requirement that the GC-managed memory be parseable. )
With that, the GC is done; the mutator (i.e. main program) is now free to take blocks off of the free-list to satisfy memory requests.
A conservative collector is a particular kind of Mark-Sweep collector where it is not provided enough information to know whether one or more words of reachable data that it encounters should be interpreted as a reference or not. Such a collector is forced to assume (conservatively) that if the encountered datum is an allocated address on the GC-managed heap, then that could be its purpose from the perspective of the mutator, and therefore that datum must be treated as a reference to memory.
In the diagrams above, I said that
- denotes some value that the
GC knows is not a memory reference. In a conservative collector
without any type information, the only kind of value that can be
interpreted that way is one that lies outside the addreses space of
So for example, if we had the same picture as above, but the register
r2 happen to hold an integer with value
0x10000, and the
conservative collector is not told “
r2 holds a non-reference at this
point in the execution”, then this diagram would result:
That is, even though in the program itself, the value
0x10000 is not
meant to be interpreted as a memory address,
all conservatively classified as live objects.
A Copying collector moves objects from one location to another as part of its tracing process, and updates the references in reachable objects as it goes.
I will first illustrate this using our low-level memory graph, but I will not draw the edges for the dead objects anymore, as they add significant clutter to the picture.
The Reserved “To-space”
First, we need to have some reserved memory to target as we copy
objects. I have put this target memory (the so-called “to-space”) on
the left-hand side of the picture; the nearby “avail” circle is a
local variable in the GC that indicates the starting address that we
can use to copy objects into; so it starts off at the first address,
Copying from the Roots
First we walk over the roots (in our simplified model, the registers),
and copy over all of the objects we see. So the below results after
we scan just the first two registers, copying the objects
into new locations, respectively labelled
B', and updating
Note that as we copy objects from the source memory
(the so-called “from-space”), we must maintain a map from the
original object to its newly allocated copy. In this model,
this is accomplished by imperatively overwriting the original object
fwd header marking it as “forwarded” as well as a “forwarding
pointer” (the dashed arcs) that points to the new location.
The copies themselves just get the original memory contents, so they
may have pointers to the old objects in the source memory (such as the
B' -> C arc in the picture). Those will get fixed up later.
We still need to scan the rest of the registers, which copies
Scan the “To-space”
Now that we have finished scanning the roots, we start the fixup process of scanning over the “to-space.” Every time we encounter a pointer into the “from-space”, there are two cases to consider: Either it is an already copied object (in which case there will be a forwarding pointer installed), or it is an object that is not yet copied.
If its a forwarded object, then we fixup our reference so that it
points to the new copy in the “to-space”. We see this when the fixup
scan is scanning over
B' and sees the
B' -> C reference, which it
then rewrites to a
B' -> C' reference, highlighted below.
The fixup scan is not yet complete; the next object it encounters,
C', illustrates the other case of a reference to an object (
here) that has not yet been copied. In this case, it just copies it,
in the same manner that we did when we were scanning the roots. (This
adds the forwarded object to the set of objects enqueued for fixup
Reclaim the “From-space”
Eventually the fixup scan will finish processing all of the “to-space” (since there are only a finite number of objects that could be enqueued). At this point, there will be no more reachable objects in any part of the from-space, and thus those memory blocks can be reclaimed in their entirety.
A Spectrum of Collectors
The mark-sweep and copying collection methods illustrated above actually form two extreme points on a spectrum of implementation techhniques.
In practice, many collectors are neither entirely mark-sweep nor copying, but rather employ a hybrid strategy, where some memory regions are reclaimed via a copying-collection, while others are managed via a mark-sweep method. For example, some generational collectors work by promoting objects from a young space into an older space via copying collection, but then the last space (with the eldest objects) can be managed via mark-sweep.
As another example of a hybrid strategy, there exist conservative collectors (such as “mostly copying” collectors) where exact type information is known for the heap-allocated objects, but the types are not known for the roots (i.e. the registers and values embedded in the stack). In such systems, it is not safe to move objects that are referenced via conservatively-scanned words. Such objects are “pinned” in place (which means that it cannot be moved by the collector) for the duration of this collection, and thus space in their memory blocks can only be reclaimed with via a mark-sweep collection. However, objects reachable solely via precisely-scanned words can be moved, and memory blocks made up solely of such objects can be reclaimed via a copying-collection strategy.
In our discussion to follow, rather than attempt to characterize a collector as “mark-sweep” or “copying”, it will be more useful to distinguish collectors in terms of whether or not they support “pinning”. In a language runtime that supports pinning, a mutator (i.e. the main program linked with the collector coroutine) can tag any live object as “pinned”.
(In the example of “mostly copying” above, such pinning is
accomplished by putting a reference into a conservatively-scanned
root. However, some language runtimes provide first class support for
pinning; for example, the Microsoft CLR once offerred a
pin_ptr smart-pointer for Managed C++ that would prevent
a referenced object from moving.)
In other words, in a runtime with pinning, the mutator dictates which objects can be managed via copying collection. If the runtime does not support pinning, then it is the collector that dictates which objects are managed via copying; the mutator cannot rely on the collector allowing arbitrary objects to be pinned.
Simplifying our diagrams
While I am sure it was fun to decode the above renderings of memory banks, now that we have seen how collectors work at a low-level, I am going to revert to the earlier high-level object notation. It should be easier for you to read, and (just as important) for me to write.
To show a copying collector’s intermediate state in a high-level picture, I will show newly copied objects with prime marks (and, if possible, in a separately delineated to-space), and a dashed-line for forwarding pointers.
Here is an example of this style of rendering, using the earlier
example at the point where a copying
collector had scanned just registers
r1 (but had not yet
C from register
r3), highlighting the copied objects and
the newly written references (including the dashed forwarding
(This rendering is arguably just as clear (or unclear) as our earlier memory diagram was, apart from some annoying edge-crossings due to the graphviz layout engine.)
Well, I don’t know if you learned anything about GC from this post. I certainly learned a lot about techniques for wrestling with graphviz. :)
There will be a followup post soon-ish that will bring Rust into the picture, discussing what GC and Rust integration even means, and the host of problems that crop up.