15 Memory Manager
This assignment has you implement an automated memory management system for an existing interpreter.
15.1 Language
You are provided with a mostly-complete interpreter for Vulture, which has the following concrete and abstract syntax:
exp := (<op> <exp> <exp>) |
| (let (<id> <exp>) <exp>) [single-binding let] |
| (if <exp> <exp> <exp>) |
| true |
| false |
| <number> |
|
| (lam (<id>) <exp>) [single-argument functions] |
| (<exp> <exp>) |
|
| (pair <exp> <exp>) |
| (left <exp>) |
| (right <exp>) |
| (set-left <exp> <exp>) |
| (set-right <exp> <exp>) |
|
| (do <exp> <exp> ...) |
|
| (gc <exp>) |
|
op := + | - | < | > |
id := any symbol |
data Op: | op-plus | op-minus | op-greater | op-less end data Expr: | e-num(n :: Number) | e-bool(b :: Boolean) | e-op(op :: Op, left :: Expr, right :: Expr) | e-if(cond :: Expr, consq :: Expr, els :: Expr) | e-let(x :: String, bind :: Expr, body :: Expr) | e-lam(arg :: String, body :: Expr) | e-app(f :: Expr, arg :: Expr) | e-id(x :: String) | e-pair(first :: Expr, rest :: Expr) | e-left(pair :: Expr) | e-right(pair :: Expr) | e-set-left(pair :: Expr, newl :: Expr) | e-set-right(pair :: Expr, newr :: Expr) | e-do(exprs :: List<Expr>) | e-gc(expr :: Expr) end
15.2 Memory Management
The most important difference between other interpreters you’ve seen and the Vulture interpreter is that, given enough allocations, the interpreter simply runs out of space! You will write both the functions that allocate on the heap, and a function to collect garbage on the heap when it runs out of space.
15.2.1 Runtime Representation
First, Vulture defines a data structure called Heap for keeping track of stop & copy metadata, along with an Array representing memory:
data Heap: | heap(memory :: Array, offset :: Box<Number>, front-half :: Box<Boolean>) end
Note that the offset and front-half fields are boxes, not just numbers. In addition, the memory field is a mutable array. You should use the provided set-box and get-box functions to manipulate offset and front-half, and the operations get-now and set-now on arrays to mutably update the heap when necessary. Your memory manager should only create new heaps at the beginning of the run of a program, and shouldn’t create new ones on the fly.
15.2.2 Memory Layout and Allocation
There are four kinds of values, each with its own representation. All of the values are stored as a string tag followed by a number of values in adjacent indices of memory.
num:
"num" number
The tag is "num", and the next location holds the value of the number as a Pyret number.
bool:
"bool" boolean
The tag is "bool", and the next location holds a boolean.
pair:
"pair" left-loc right-loc
The tag is "pair", and the two next locations hold the locations of the values for the left and right parts of the pair (why is it imperative that they hold the location of the value and not the value itself?). Locations are stored as Pyret numbers.
closure:
"clos" number string_1 loc_1 ... string_n loc_n string_arg Expr
The tag is "clos". The location after that holds a number representing the *number of bindings* in the environment of this closure. The following locations hold alternating strings and locations, representing the environment of the closure, with strings as identifier names and locations that point to the locations of values those identifiers map to. The penultimate location holds the string name of the function’s argument. The final location holds the body original lambda expression that defined this closure, as an Expr.
So, for example, in this heap:
[array: "pair", 3, 5, "num", 1, "clos", 1, "x", 0, "y", e-id("x")]
the first location 0 holds a pair, whose left element is the number at location 3 (the number’s value is 1), and whose right element is the closure at location 5. The closure at location 5 has a parameter "y" (which it doesn’t use), and closes over an evironment where "x" points back to the pair at location 0. Its body is the expression e-id("x").
Your first task is to write an allocator function for each of the value types in the language. The interpreter calls these functions when creating values, and it’s the job of the memory management system to fill them in:
alloc-num :: (Heap, Stack, Number -> Loc)
alloc-bool :: (Heap, Stack, Bool -> Loc)
alloc-pair :: (Heap, Stack, Loc, Loc -> Loc)
alloc-closure :: (Heap, Stack, Env, String, Expr -> Loc)
The data definition for the heap, along with a description of where to allocate, is described below. When an alloc- function is called, it puts the value’s data in the heap as described above, and returns the location of the beginning of the data (the place where the string tag, like "num" or "clos" is stored). If there isn’t enough space on the heap, an out-of-memory exception should be thrown. Note that the heap runs out of space when the current live half of the heap is full; the allocator should never fill up more than half of the heap with live data.
15.2.3 Contexts and Stack
Vulture’s interpreter is defined with Contexts as in the explicit stack interpreter. They differ in that in the positions that held values before now hold locations, which reference positions on the heap:
data Context: | c-op-left(op :: Op, right :: Expr) | c-op-right(op :: Op, left :: Box<Loc>) | c-if-cond(thn :: Expr, els :: Expr) | c-app-f(arg :: Expr) | c-app-arg(f :: Box<Loc>) | c-let-bind(x :: String, body :: Expr) | c-pair-left(right :: Expr) | c-pair-right(left :: Box<Loc>) | c-left-pair | c-right-pair | c-set-left-pair(newl :: Expr) | c-set-left-newl(pair :: Box<Loc>) | c-set-right-pair(newr :: Expr) | c-set-right-newr(pair :: Box<Loc>) | c-do(exprs :: List<Expr>) | c-binding(x :: String, v :: Box<Loc>) end
Note that these location are stored in Boxes, which gives the garbage collector a point of indirection for updating the stack after moving values around on the heap. This is discussed more in the next section.
Other than this difference, the contexts are handled much the same as in the explicit stack assignment. The provided interpreter is provided for you to explore for more detail about how they are used.
15.2.4 Collecting Garbage
For collecting garbage, you must implement a stop and copy collector. Your garbage collector will have the following signature:
collect-garbage :: (Heap, Stack -> Nothing)
This function will be called in two different situations:
By the interpreter (not by your code) in the (gc e) expression, which explicitly triggers a collection and is useful for testing.
In your code, when an allocation is attempted by calling one of the alloc- functions, but there isn’t enough space on the heap.
Since collect-garbage has a return type of Nothing, it must do all of its work by mutating the heap.
The locations on the Stack provided to collect-garbage make up the set of locations to start from (the root set) in finding the live data on the heap. When called from a gc expression, this will be the current stack. In the alloc- functions, you may need to do some extra work with the stack to keep track of some values. With that in mind, the locations on the stack define the root set, and from this root set, collect-garbage should proceed to copy all the live data from one half of the heap to the other.
The algorithm you write should:
Treat the heap as if it is divided exactly in half (assume all heaps are even-sized), so for a heap of size N, locations 0 to (N / 2) - 1 are one semispace, and locations N / 2 to N - 1 are the other semispace.
The runtime "front-half" field should be true when allocating from the front half, and false when allocating in the back half.
The "offset" field should always store the current next point of allocation in absolute terms from location 0.
Run in O(n) time, where n is the size of the live data on the heap.
Example
For a heap of (total) size 40, here’s what it might look like before and after a garbage collection. We assume as input to garbage-collect a stack that only has a single binding in the root set, which points to location 5:
[list:c-binding("f", box(5))]
So, we assume that stack, and the following heap, are passed to collect-garbage:
front is live, offset is 16
0 1 2 3 4 5 6 7 8 9
--------------------------------------------------------------------------------------------------------------
0 | "pair" 3 5 "num" 1 "clos" 1 "x" 0 "y"
10 | <expr> "pair" 0 14 "num" 72 0 0 0 0
20 | 0 0 0 0 0 0 0 0 0 0
30 | 0 0 0 0 0 0 0 0 0 0
40 |
The live data is the closure itself, and the pair and number value that are reachable from it (at locations 0 and 3), so they need to be copied over. The pair and number at locations 11 and 14 are garbage, and should be collected. Here’s what a post-collection heap might look like:
back is live, offset is 31
0 1 2 3 4 5 6 7 8 9
--------------------------------------------------------------------------------------------------------------
0 | "fwd" 26 5 "fwd" 29 "fwd" 20 "x" 0 "y"
10 | <expr> "pair" 0 14 "num" 72 0 0 0 0
20 | "clos" 1 "x" 26 "y" <expr> "pair" 29 20 "num"
30 | 1 0 0 0 0 0 0 0 0 0
40 |
You don’t have to produce this exact output; there are several possible traversal orders that will work. Those values are copied over to the back half of the heap (starting at location 20), with updated pointers that reference the copied values in the new live half of the heap. The runtime "offset" is 31, and "front-half" should now be false. Note also that some residue is left on the front half of the heap: there are forwarding pointers with tag "fwd" left there. You might find the representation of forwarding pointers that this suggests to be useful.
Finally, your collector needs to ensure that any references on the stack are updated to handle any moved data. So the initial stack in this example, which held a single environment with a single binding, needs to have the Box within it updated so the binding points to 20, the new location of the closure that was originally at location 5. Note that we use Boxes so that you can perform mutable update on the bindings in the environment to achieve this extra level of indirection. You shouldn’t construct a new stack in garbage-collect, but should use stateful operations on boxes to make sure all of the bindings point to the right locations.
15.3 Testing and Debugging
There is no provided reference implementation of the garbage collector – it’s up to you how to design the traversal and copying algorithms, and make sure you sufficiently test the collector.
A few notes to help you test:
If you test for array equality, make sure to use equal-now (the =~ operator). Since arrays are mutable, Pyret responds with false for arrays that are not identical when using ==. The testing operator is uses == by default, so you should use is=~ for tests that involve comparing arrays.
The interpreter returns a Result data structure containing the heap and a location representing the value returned by the program. One thing that may be useful for testing, and for exploring the data structure initially, is to write a function with this signature:
Heap, Loc -> Value
where Value is a data structure you define that looks like the Values we’ve had in previous assignments. This will force you to understand the heap structure, and give you an easy mechanism for testing that programs return the correct values.
This will work well for many values, but will run into issues with cycles, so be careful.
Another testing strategy is to write functions that observe the structure of the heap. Some of these are in a sample file, and we’ll go over this in lab:
https://code.pyret.org/editor#share=0B32bNEogmncOTTVuV2hsTHM0Skk&v=v0.5r852
For debugging, the support code provides a function called print-heap that you can use to pretty-print a heap in the grid style above. Feel free to copy, adapt, and extend this function if you want more or different information printed.
Similar to the last assignment, the interpreter is defined by a single-step function step, so you can easily walk through the evaluation of a program and print the heap at each point by using step.
15.4 Submission Steps and Deadlines
The provided files are:
Test stencil (for you to edit): https://code.pyret.org/editor#share=0B32bNEogmncOTUZSNHk3SGFOSTQ&v=v0.5r852
Code stencil (for you to edit): https://code.pyret.org/editor#share=0B32bNEogmncONUlDNThmWUNvNzA&v=v0.5r852
Interpreter (for reference, not to edit): https://code.pyret.org/editor#share=0B32bNEogmncOR05SQ3VhN2lwMk0&v=v0.5r852
Definitions (for reference, not to edit): https://code.pyret.org/editor#share=0B32bNEogmncOck9pRllMRDVoT3M&v=v0.5r852
By midnight Saturday, April 25, submit 10 interesting test cases in whatever style you think is most useful. Partners should collaborate on testing strategy, but still submit separate tests.
By midnight Monday, April 27, submit reviews for tests you received.
By midnight Wednesday, May 6, submit a complete bundle of tests and GC implementation.