Lecture 14 Garbage collection (GC): automatic reclaimation if heap records that will never again be accessed by the program. - GC is normall adopted for languages with closures and complex data structures that are implicitly heap-allocated. - GC should be useful for any language that supports heap allocation, be it allocated implicitely or not 1) Reference Counting 2) Mark-Sweep 3) Copying Collector 1) Reference Counting Objects in the heap are associated with a counter that counts the number of references to this object. When counter has zero means this object is not in use any more. o varied size problem. o track links from the object (garbage now) to other objects to update their counter. Pros: o simple o incremental Cons: o expensive (fairly): extra space to keep the count, extra time to change/ckeck the count o cycles e.g., double link. as a whole, garbage. But how you find out? |-----| |-----| | |<----------| | | | --------->| | |-----| |-----| Tracing: o start with root set o reachability: reachable from root set => not garbage 2) Mark-Sweep Do tracing and mark all active objects. Walk through the whole heap: unmark all marked nodes, and collect (or sweep) all nodes that are garbage into a freelist. Pros: o can handle cycles o one bit/object to mark v.s. an integer to count reference in Reference Counting o low processing Cons: o pause for garbage collection o O(L+G) time, where L is # of live objects, and G is # of garbage. Internal and external fragmentation 3) Copying and Collector Objects live in FromSpace. When FromSpace runs out, do garbage collection (GC). GC: o tracing mark alive objects o copy alive objects to ToSpace o exchange the role of FromSpace and ToSpace Pros: o No free list (so no problem with different size of objects) o O(L) time where L is # of live objects o compact v.s. fragmental Cons: o use only half space (means GC more often) o pause Details about Copying Collector 1. move all objects pointed by the root set to ToSpace 2. scan these objects (already in ToSpace) and move objects in FromSpace which are pointed by the objects already in ToSpace. Comparison: H: size of the heap R: reachable data Cost_M&S = (c1*R + c2*H) / (H-R) Cost_copy = c3*R / (H/2 - R) If H -> infinite, then Cost_copy -> 0 but Cost_M&S -> c2 Therefore, asymptotically copying collector cost less than mark and sweep. 4) Generational Collector one observation: long-live objects tend to live live longer. strategy: to divide objects into generations.