This is the second in a series of posts will discuss why garbage collection is hard, especially for Rust, and brainstorm about solutions to the problems we face.
The previous post wrote down some criteria for integration. Now I want to delve into why satisfying those criteria is hard, at least in Rust as it stands today.
Let us make some assumptions, so that we can focus on why this problem is still hard even after being somewhat simplified.
As previously discussed in the proloque, one can think of the main program and the GC as coroutines. We will continue with that mind set.
Let us assume (for now) that the main program will not be running concurrently with the GC;This is certainly a significant assumption; in practice, to enforce this we would probably need to employ some sort of rendezvous protocol with gc-safepoints on every thread that might hold roots for a given GC Heap. (Of course, one might also satisfy this by directly adopting the coroutine model and also disallowing sharing of references into any one GC heap across multiple threads.) or more specifically, the main program thread(s) will not read or write any GC roots nor GC Heap-allocated objects concurrently with the GC thread.
In addition, let us assume that in the context of the GC coroutine, no mutator roots are live values in CPU registers;This assumption is mostly a convenience for the text below, so that instead of saying “in live callee-save registers or on the stack”, I can just say “on the stack.” Also, in practice, I do not know if many GCs actually handle this in a more “clever” fashion than just forcing such values to live on the stack during a root scan (though certainly some do support roots in registers). in other words, all mutator register values that the GC might care about will have been saved somewhere in a mutator stack frame (and will be reloaded from those saved stack slots before subsequent use by the mutator).
Again, these assumptions are not meant to be interpreted as specific requirements of Rust’s final solution for GC; instead, they describe a simplified version of “the GC integration problem” that I claim is still hard to solve for Rust in general.
Throughout most of this post, I will be discussing various data structures to support GC activity. When providing concrete examples of the runtime state, the goal will usually be to represent something analogous to the following fragment of an object graph (or some small variant thereof).
Instances of structured data are
shown with a labelThese labels are often a presentation artifact: they do not necessarily denote a header word in the memory itself.
Y) that identifies the data and usually includes its type.
Often I will omit the type of a local variable or member (such as with
z above). If I want to specify the type, I will do so via
x: Gc<X> above), and if I want to specify the particular
value given to a variable, I will use assignment syntax (e.g.
o = Box(O) above).
(Note that the assigned value should always be redundant, since there
should also be an arrow linking to the assigned value in these diagrams.)
Values of type
Gc<T> hold references to objects allocated on the GC Heap.
Every object on the GC Heap will have a label that is derived from the
T and, if necessary, a numeric suffix to disambiguate between
multiple instances of
T on the GC Heap.
I have denoted the contents of the objects on the GC Heap by ellipses, because I am focusing in this post solely on problems related to finding the roots; the contents of the objects referenced by the roots, and the remaining transitively reachable objects beyond them, are not important to us today.
Objects allocated on the Rust Heap will tend to be boxes owned by
values of type
Box<T>; they will have an identifying label and the
type of the contents (e.g.
O: StructZ above).
I will tend to present examples of structured data with trivial
structs that have one field; e.g.
StructY has a single field
StructZ has just the field
z. (Of course in real
programs there will be structs and arrays with multiple members, but single field
structs simplifies the diagrams here.)
Rust complicates Root Identification
At some point, probably in response to a memory allocation request, the GC is going to initiate a collection.
That requires traversing the root set of the main program, since those roots will be the kernel that the GC uses to identify the reachable objects.
What are the options for traversing the root-set?
Do we need to solve this problem?
One approach is to “define away the problem”; one version of this I previously described is to hide the root-set itself inside the black-box abstraction that we are interoperating with, and expose only handles that point to the roots.
The key principle in this picture is that the GC is meant to be
completely isolated from the state of the main program; when it does a
collection, the GC just starts from the root-set hidden within the
handles in the GC-heap. It does not inspect any state in the boxes
labelled “Stack” nor “Rust Heap.”
But a big problem with this, that I failed to stress in my earlier
post, is that you now need to manage the hidden-root set stored in
In particular, in the above picture, every entry in
handles maps to
Handle value on the “Stack” or “Rust Heap.” This leads
to some troubling questions.
What happens when you clone the box referenced by the local variable
o: does that need to create a new entry in the hidden
How about if you instead dropped
o– does that clear the
handlesentry at index 2?
If not, when/how will the root set be updated appropriately?
If so, are previously cleared entries reused? If so, how do you determine whether an entry is available for reuse – do you keep a mark-bit on each handle?
This handle array maintenance sounds expensive, maybe we should instead periodically scan the stack to look for pointers to handles …
Just to be clear: the joke here is that we are basically suggesting layering our own semi-automated memory management system on top of a third-party automated memory management system. We should be striving to reduce our problems to smaller subproblems, not reproducing them. … maybe we should rethink our overall approach here.
Scanning the Mutator State
So let’s assume we are not dealing with a complete black box; instead, the main program (aka “the mutator”) and the GC are going to collaborate in some more fundamental way.
In other words, let’s assume that roots are allowed to leak outside of the GC Heap and into the mutator; no more black-box.
Once we have roots floating around under the control of the mutator, we need to talk about identifying those roots by inspecting/querying the mutator state.
Some relevant issues to consider on this topic:
Are all roots precisely identified as roots?
Where can the roots reside in the mutator? (Frames on the stack? Boxes on the Rust Heap?)
How is the GC informed about the location of the roots in the mutator?
How does the mutator itself access the roots?
What information might the mutator maintain on the behalf of the GC?
Might a given root’s value (i.e. the address of the referenced object on the GC Heap) be changed by the GC during the collection (in other words, does the GC rule out pinning)?
Let’s explore some of the problems associated with these questions, especially how it relates to Rust.
Are roots precisely identified?
The roots are somewhere in mutator-managed memory. The GC will need to know the values held in those roots, and possibly will need to update those values if the referenced objects are moved in memory.
There are two basic options for root scanning: conservative or precise.
A conservative scan is needed when some of the values might hold an actual root, but might also hold a false-positive.
This arises when, for example, there is not enough type information available“Not available” can mean that that the information is absent; but it can also mean that it is untrusted. I discuss this further below. to know that a scanned word is meant to be interpreted by the mutator as an object reference.
If there are any conservatively-scanned roots, the GC needs to validate their values (e.g. by checking if it lies within one of the ranges of addresses used for the objects allocated on the GC Heap), and trace any object potentially referenced by such values.
An earlier discussion on “pinning” established that any object referenced by a conservatively scanned root cannot be moved by the GC. Therefore, integrating with a GC that does not support object pinning (such as a fully-copying collector) will require we scan the roots precisely, not conservatively.
One problem with ensuring that a word on the stack is precisely identified
is that it requires close cooperation with the compiler backend.
E.g. if the backend (LLVM in the case of
rustc) is permitted to reuse a stack
slot for two values of different types (and disjoint extents) then
we need to take care that the GC knows whether the current value in that slot
is or is not a GC reference.
(LLVM is a very expressive backend, so it provides mechanisms to account for this scenario, but it is not automatic.)
A given word in memory can be precisely scanned
if we ensure that the root’s location in memory is
unambiguously interpreted by the mutator as an object reference.
I will say that such a root can be
“unambiguously classified” only if such assurance is established.
Often the ability to classify a root unambiguously is derived from static types, runtime type information, and global system invariants.
Where the roots might reside influences the design space for unambiguous classification quite a bit. For example, if all roots are members of heap-allocated objects, then the allocator might embed a type-tag in the header of such an object, or segregate objects into disjoint regions of memory based on that type.
Therefore, we will explore the question of where roots reside next.
Where are the roots?
There are two components to the question “where are the roots?”:
Where can roots possibly reside?
Where do the roots actually reside?
The first bullet is a question about the system as a whole; it is a question that we must answer as designers.
The second bullet is about encoding how the GC will look up the memory addresses of the roots (potentially with the mutator’s help) every time it wants to initiate a root scan.
The two parts interact with each other, so we will address them both somewhat simultaneously.
This list is leaving out some other options, such as completely unconstrained, where roots might live in memory unknown to the both the GC and Rust runtime (I do not see a way this first option could work without requiring programmers to instrument foreign code with calls to root registration and deregistration functions), or keeping the roots solely on a shadow stack with structure isomorphic to the actual stack, but not vulnerable to disruption due to compiler code transformations (I am omitting this second option since it is known to carry a significant performance penalty).
Consider these options for locations where roots can reside:
Constrained To Boxed Values: Solely found embedded in boxed values on the Rust Heap.
Constrained To Stack: Stored solely on the program stack, and
Rust-Managed But Otherwise Unconstrained: Stored either on the stack or embedded in boxed values on Rust Heap.
Roots Constrained To Boxed Values (Option 1)
If roots are solely stored in members of boxed values, then we might store runtime-type metadata in an allocator-injected header.
This option is seductive: Adding a header via the runtime system’s
#[allocator] crate could sidestep a significant amount of compiler
integration (maybe even all of it).
There are some interesting ideas to explore from that starting point, such as collecting all such boxed values together in a linked list whose head is known to the GC (and thus the answer to “how does the GC scan the roots?” is just “it walks the list”Do not be fooled; it would not be that easy. In particular, properly maintaining such a list could complicate the interface between the mutator and the values holding the roots. ).
constraining roots to solely live in members of boxed
values may not be feasible in Rust as it stands today.
For example, one is always free to move the instance of
T out of a
Box<T>, deallocating the backing storage but moving the
another location, such as a stack-allocated local variable.
Let’s look at the remaining two approaches.
Roots Constrained To Stack (Option 2)
If roots can be stored directly on the stack (i.e. options 2 or 3 above), then when the GC initiates a root scan, it will need to find those roots.
This search of the stack can be guided by “stack maps”:For details, see Compiler Support for Garbage Collection in a Statically Typed Language, Diwan Moss and Hudson (1992). compiler-generated metadata providing a mapping from a code addressThis mapping need not have an entry for every address from the program instruction stream; we can make do with just the addresses of call-sites into runtime routines that could cause a GC. to the set of stack slotsMore specifically, the offset in a stack frame of a slot, and any relevant type information needed by the GC the compiler opted to include. that hold values of interest.
However, restricting the roots to live solely on the stack may be
problematic for much the same reason that plagues the earlier idea of
restricting roots to boxed values: in Rust today, one is always free
to move instances of
T from a stack-local slot into a member of a
In some circumstances, we might be able to counteract these “freedom of movement” issues in a backwards-compatible manner with a compiler plugin (lint) that analyzes the source and trys to flag any code might move a root into an illegal location. (Servo already uses a lint like this for its integration with the Spidermonkey GC.)
Or, if we are willing to extend the language itself,
we might add marker trait
Immobile that indicates
that values of such type cannot be
moved.Proper integration of
trait Immobile would probably require a way type for type parameters to opt-out of the capability to be moved, e.g. via a
T: ?Moved anti-bound, analogous to the
Yes, I just made up the term “anti-bound.”
But either of those options are just ways of enforcing a restriction,
and it will outlaw certain kinds of program composition.An easy example of this: If you want to be able to put a root as part of the element type in a
Vec<T>, then that
T has to be able to be moved (since expanding the capacity of a vec will require moving its contents from one backing buffer to another).
In practice, we simply may be better off lifting such restrictions entirely. So, let us now consider our remaining option: allowing roots to be embedded in values on the stack or boxed on the Rust Heap.
Roots are Rust-Managed, But Otherwise Unconstrained (Option 3)
Now we come to what is probably the most realistic option for Rust/GC integration: allowing roots to reside anywhere that the Rust compiler or runtime knows about.
Arguably, I might well have started this discussion with this approach, since it is by definition the most general of the three, and thus if we do have a solution for it, then why not jump to it?
The reason I left it for last is that I suspect any design we adopt for GC integration in Rust 1.x is going to require a hybrid of the approaches described in the prior two sections (allocator-injected metadata and stack maps), and therefore I wanted to outline them in isolation, before I started mixing them together.
GC: “Where are the roots?”, Mutator: “…”
If we assume that roots can be embedded in values either on the stack or in boxes on the Rust Heap, then how will the GC find the roots when it needs to scan them?
The support for the GC’s root scanning capability can be seen as having three parts:
What work does the GC itself do, on the fly, to determine the roots when it needs them,
What work does the mutator do (if any) as it executes the program“Mutator work” here includes code hidden in library functions the mutator calls, such as
#[allocator]subroutines, or code automatically injected by the compiler, such read- or write-barriers. to support a potential root scan by the GC in a future, and,
What meta-data must be gathered and emitted by the compiler to support root-scanning?
One idea for enabling easy GC root traversal was mentioned earlier: why not collect the roots together in a linked list structure? Towards this goal, we might consider maintaining an intrusive links forming a list of all roots.
This is an intrusive list because the pointers in the list are pointing into the interiors of objects. This allows traversing the list to be completely uniform (from the viewpoint of the GC, it looks like nothing more than a linked list of pairs). In this scenario, the GC does essentially zero work on-the-fly to find the locations of the roots; maintaining the list would become the reponsibility of the mutator as it creates and moves values with embedded roots.
However, Rust today does not have good support for intrusive data structures (RFC Issue 417).
The already-mentioned capability to move values freely, as well as the
capability to swap a pre-existing
T with the value held beind a
&mut T reference, are among the reasons that intrusive structures
are difficult today, since it changes the addresses associated
with objects, and thus requires updating of the interior links.
So, what other options do we have?
Having the GC traverse the memory of the call-stack, even with the assistance of a stack map to provide precise type information, will not give us the locations of all the roots, since some are located on the Rust Heap. A stack map cannot hold the addresses of the blocks of memory dynamically allocated for a box on the heap.
However, the stack map can hold the type information for the local
variables, and that sounds promising: If we record that a local
o has type
Box<Struct>, then treat the contents of the
box on the heap as owned by the stack, so that when we encounter
during the stack scan, we can recursively scan the memory of the box,
using the type
Struct to inform the scan about how to treat each of
the embedded members.
I have slightly modified the running example to show two instances
of the local
x on the call-stack in separate frames, each corresponding
to a distinct (recursive) invocation of the function
This is just to bring home the point that the stack map info encodes static information about the frame for a function (at a particular call-site), and thus recursive invocations of the same function can potentially reuse the same entry in the stack map.
The principle is that when control shifts to the GC coroutine, it walks through the stack backtrace, consulting the stack map for each callsite.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
From the stack map, it finds the offsets of relevant local variables
within that stack frame, and the type information for those locals, so
that it knows when it needs to dereference an pointer to inspect a
block on the Rust Heap (such as the
Box(O) in our running example).
The GC will need separate meta-data describing the layout of each type, with the offset and type of each field of interest:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
The boxed objects may themselves own other root-holding objects on the Rust Heap, like so:
To find all the roots starting from the stack
in the presence of such ownership chains
(which may go through other types like
the GC will need to recursively traverse the boxes,
or otherwise enqueue them onto a worklist structure.
In principle, if we can prove that certain types never
transitively own roots, then the GC should be able to skip traversing
boxed data for such types.
Using the stack map and type map data to find all roots transitively owned by the stack is a promising approach. What is the catch, if any?
from_raw method that converts a
*mut T to
is unsafe, but
into_raw is a safe method. Safe code
can always convert a
Box<T> to a
*mut T, and clients
expect it to also be reasonable to round-trip via
What should we do about unsafe pointers
*mut T and
For example, it is not uncommon for library code to convert
Box<T> to a
*mut T or vice versa;
that is an ownership transfer.
local o = Box(O) above (where
but it is entirely possible that
o has type
Here are some options for how to handle unsafe pointers:
Skip unsafe pointers during root scanning.
This seems almost certain to cause unsound behavior; as noted above, transmuting
*mut Tis an ownership transfer, and if
Tholds a root, then later code might access it. This means that the roots owned by
Tneed to be scanned, to keep their associated objects on the GC Heap alive.
Punt the question: if a program uses GC, any use of unsafe pointers (as local variables or as members of structures) needs some sort of attribute or annotation that tells the GC how to interpret the value held in the unsafe pointer.
A Rust program that uses GC should be able to link to a crate whose source code was authored without knowledge of GC.
Requiring annotations on every use of unsafe pointers means sacrificing this goal.
Treat unsafe pointers as potential root owners: Traverse them and use their type as the basis for the scan.
This seems like the most reasonable option. But, can the types of unsafe pointers be trusted?
Is the meta-data trustworthy?
We assumed the existence of stack and type maps. But where do they come from?
These maps need to be generated by the
rustc compiler; after all,
they rely on low-level details of the generated code, such as the
offsets of fields within types, the offsets of local variables in a
stack frame, or the addresses of function call-sites.
rustc compiler, in turn, is going to generate the information
based on the source code text. So far, so good.
Here’s the rub: we assumed that the stack map will tell us the types we need for all local variables of interest for all frames on the call stack.
But in practice, a value can be cast to a different type.
In particular, in today’s Rust 1.x, it is considered safe to cast
*mut T and
*mut U for any
1 2 3 4 5 6 7 8 9 10 11 12 13
This is a real problem, in terms of the goals we have set up for ourselves. 100% modular GC requires that we be able to link with code that does things like the above with the owners of its roots, and that includes when the roots are indirectly held in boxes on the Rust Heap.
We may be able to add constraints on the
Gc<T> type to prevent such things
from occurring when the types are apparent (e.g. when one is working with a
local variable of type
Gc<T>). But in general, the types will
not be apparent to the code performing the cast; in particular,
we would still need to address type-parametric code that performs
such casts of unsafe pointers.
What can we do about these problems?
One obvious response to the untrustworthy meta-data problem would be to change the language and make casts from
unsafe.Indeed, we may make such casts unsafe anyway; nmatsakis has said during informal conversation that he is not sure why we classified such casts as safe in the first place.
This would deal with the problem at a surface level, in the sense that
we would be able to at least allow a program using GC to link to a
GC-oblivious crate if the latter crate did not use any
But it would not be terribly satisfactory; we want Rust’s
solution for GC to be able to link to as many crates as possible, and
ruling out all code that does any cast of an unsafe pointer seems quite limiting.
We could also revise the set of goals, e.g. scale back our ambitions with respect to compositionality, and return to ideas like having the roots constrained to stack, as discussed above.
An alternative solution I have been considering is to try to adopt a
hybrid approach for root scanning: use stack maps for the local
variables, but also have the allocator inject tracing meta-data onto
the objects allocated on the Rust Heap, and do a kind of conservative
scanning, but solely for finding roots embeded in objects on the
Rust Heap. This way, unsafe casts might become irrelevant: when
encountering any native pointer (e.g.
*mut u8), we would ignore
the referenced type and instead look up whether it is an object on the
Rust Heap, and if so, extract the allocator-injected tracing
I plan to dive more deeply into discussing solutions in a follow-up post. This post is already quite long, but more importantly, I want to get some concrete data first on the overheads imposed by the metadata injected during allocation in state of the art conservative GC’s like BDW.
Oh, and finally (but completely unrelated): Happy 2016 to all the hackers out there! Hope that you copied over all your live objects from 2015!