Become a fan of Slashdot on Facebook

 



Forgot your password?
typodupeerror
×
News Books Media Book Reviews

Review:Garbage Collection 105

A.M. Kuchling is back again, this time with a review of Richard Jones and Rafael Lins' oddly named Garbage Collection. This book is not actually about Waste Management Inc, but is "A highly detailed survey of garbage collection algorithms, and the research literature surrounding them. Interesting, if at times very dry.". Click below for more information.
Garbage Collection
author Richard Jones and Rafael Lins
pages
publisher Wiley & Sons
rating 6
reviewer A.M. Kuchling
ISBN
summary A highly detailed survey of garbage collection algorithms, and the research literature surrounding them. Interesting, if at times very dry.

The first time you encounter garbage collection (GC, for short) is often by using Lisp or Scheme in an introductory computer science course, where it can seem an almost magical technology. How does the interpreter determine that a variable has become garbage, and can safely be deallocated? This book explains how GC works, in great detail, and interest people implementing garbage collectors or who've wondered how it's done. There are 3 approaches to GC, all of which are covered in this book:

  1. Reference counting:
    Every object keeps a count of the references pointing to it. When new references are created, the counter must be increased; when references are destroyed, the counter is decremented, and, if the count is zero, the object has become garbage and can be reclaimed. Reference counting unfortunately exacts the most overhead from constantly incrementing and decrementing the counters, and in its simplest form it can't collect cyclic data structures, because their count will never reach zero.
  2. Mark-sweep:
    Traverse the graph of object references, starting from the roots of the computation; roots might be the program's global variables, the contents of the C stack, static class variables, or a main namespace of some sort. Each object that's traversed is marked as being reachable; when the traversal is complete, any unmarked objects are unreachable because they can't be reached by the program any longer, and can be reclaimed as garbage.
  3. Copying:
    Divide memory into two halves, called "Fromspace" and "Tospace". Memory for new objects is always allocated out of Fromspace. When Fromspace fills up, a traversal similar to the one used by mark-sweep is done, but instead of just marking objects as reachable, they're copied into Tospace. Once the traversal is done, all the live objects are now in Tospace, and Fromspace no longer contains any live data; only garbage objects can be left in Fromspace. Tospace is now declared the new Fromspace, and new objects are allocated until it fills up; the algorithm is then repeated.

This book covers the above 3 algorithms for GC in great detail; there's a chapter for each algorithm, examining its limitations, common implementation techniques, and historical notes on the its development. Later chapters cover more advanced collectors that extend the basic algorithms in various ways. Generational collectors are based on the observation that many objects are temporary and have short lifetimes, while some few objects may last for the entire lifetime of the program. A generational garbage collector concentrates on young objects, because they will occupy the most recyclable space, and spends less time examining old objects that probably won't be reclaimed. Incremental and concurrent collectors don't bring the whole program to a halt while they scavenge through memory doing a collection, making them more suited for interactive or even real-time applications.

Chapter 9 is an interesting examination of the implementation of GC in an unfriendly environment, namely C programs. The Boehm-Demer-Weiser collector implements a mark-sweep garbage collector that can replace C's malloc() function, scanning stack frames and registers for potential pointers, and managing to do it fairly efficiently. One study found the Boehm-Demer-Weiser collector added 20% more execution time overhead than malloc() did -- that is, the time spent in memory allocation was 20% longer. (That doesn't mean it made programs 20% slower, because most programs spend more time computing results than they spend allocating memory). A garbage collector for C is a tricky job, complicated by the lack of type information, data values that may mimic pointers, and compiler optimizations that may render objects invisible to the collector. Barlett's Mostly Copying Garbage Collector, also described in this chapter, is more liberal and requires some assistance from the C programmer in exchange for greater accuracy.

The remaining 3 chapters consider suggestions for adding garbage collection to C++, the interaction of GC with cache memory, and distributed garbage collection. The topics of these chapters are still subjects of active research, so the final chapters are lighter on implementation details and heavy on references to research papers, making them a lot less interesting.

The authors are computer scientists specializing in GC (Richard Jones maintains a garbage collection page), and their coverage is quite complete, which at times makes for exceedingly dry reading as the authors enumerate what seems like every minor implementation variation and every relevant paper. On the other hand, if you're actually faced with implementing a garbage collector, the pointers into the research literature will be very useful. A casual reader (like me) can simply skim, or even skip, pages until the next interesting section arrives.

Excerpt

However, the matter is more subtle than this. The collectors described in this chapter are very simple and inefficient. The behaviour of these collectors cannot be automatically ascribed to their more sophisticated variants. The cost of copying an object is likely to be more expensive than simply testing and setting a mark-bit, particularly if the object is large. Although mark-sweep must sweep the entire heap, in practice its real cost is dominated by the mark phase. Linearly scanning the heap will generally be less expensive than tracing data structures even using the simple technique shown above. ... Furthermore, more sophisticated methods can substantially reduce the cost of the sweep. Although copying collectors have predominated in the past, recent studies suggest that the choice between mark-sweep and copying garbage collection may well depend as much on the behaviour of the client program as on the inherent properties of the garbage collection algorithm.
-- From section 2.4 of the book.

To purchase this book, head over to Fatbrain.

Table of Contents

  1. Introduction
  2. The Classical Algorithms
  3. Reference Counting
  4. Mark-Sweep Garbage Collection
  5. Mark-Compact Garbage Collection
  6. Copying Garbage Collection
  7. Generational Garbage Collection
  8. Incremental and Concurrent Garbage Collection
  9. Garbage Collection for C
  10. Garbage Collection for C++
  11. Cache-Conscious Garbage Collection
  12. Distributed Garbage Collection
This discussion has been archived. No new comments can be posted.

Review:Garbage Collection

Comments Filter:
  • by Anonymous Coward
    This book should have created a new euphamism for garbage collection. I was going to buy it, but was embarrassed to be buying a book about cleaning up trash. Microsoft called directories folders, dejanews and netscape both made up other words for usenet newsgroups, and following that precedent, we need a new name for garbage collection.
  • I don't have 80 Megs to run a hello world java application.

    Almost as a rule, Java implementation don't have modern GC. You are still exaggerating.

    The memory requirements of a garbage collected program are generally about 2x that of a manually deallocated one.
  • The most hideous bugs can lurk in memory management. You get memory leaks if you are lucky or segfaults if you are not. GC frees you from a whole class of bugs and permits leaner, more legible code.

    GC was a revelation to me when I switched from C/C++ to Perl. Of course Perl's memory management goes beyond GC, but that's another story.

    Rapid Application Development is nearly impossible without GC -- you can't develop rapidly when you are hunting down memory leaks and illegal pointers.

  • Posted by !ErrorBookmarkNotDefined:

    The GC term is used in literature, code, conferences, and everyday conversation about memory management.
    What's odd about that (besides the obvious geekiness of it all)?

    -----------------------------
    Computers are useless. They can only give answers.
  • Perl only uses a reference counting GC, and last time I looked at one of the postscript formatting modules its circular data structures caused the refcount to never reach 0.

    It's still a lot easier then not having a GC at all, though.

    -Peter
  • Rubbish disposal?

    Sanitary deportation?

    Forceful of unwanted elements?

    -Peter

  • by spacey ( 741 )
    If you're doing good generalized libraries then memory management is a problem. Just doing malloc and free is difficult if you don't know the lifetime of an object because another programmer determines that. Overriding malloc() calloc() realloc() and free() in these situations with a GC is a very nice thing.

    -Peter



  • The Harlequin Memory Management Reference is a good reference for lots of this stuff, especially the glossary:

    http://www.harlequin.com/mm/reference/
    --
  • Some programs can be *faster* and *more responsive* as a result of substituting proper MM technology for malloc/free. As well as being cleaner, more reusable and more bug-free.
    --
  • Obviously you have never written a real program spanning several million lines of code with subroutines written by someone other then you.

    Combine the above with a data structure malloced in a library not written by you (possibal in OS/2, though it isn't malloc) that needs to be freeed in a latter subroutine. Maybe we should toss threads in too, so you may have a differnt thread looking at your memory after your subroutine is done, and your subroutine has to return a pointer to memory malloced. Suddenly you don't know who is done with that memory you just started with first. Can't call free() too soon lest you prevent access by someone else who needs it, but you have to call it. GC solves this problem, and if it was fast enough would be more universial.

    The creater of C++ has listed allowing himself to be convinced that GC was too slow as his greatest regret.

    Now for simple cases you are right, but the simple cases are rarely got wrong (for long) but when the memory needs to hang around for a long time the case is no longer simple.

  • One problem with GC in CORBA: when is an object garbage, and how can the server with the object tell? If I stringify an object reference, write it down on a piece of paper, and send it message-in-a-bottle to parts unknown...the object is still referenced.

    Of course, "nobody seems to be using the object recently" is useful for objects which can be shut down and restarted...
  • GC is a *big* reason why Java is so popular in enterprise environments. Forget write once, run anywhere.

    GC and simplified pointers help you to pay more attention to the problem *at hand*, then to have to worry about differentiating between (in C++):

    - stack allocated objects that don't have to be deleted but can't be safely passed around because they go out of scope
    - heap allocated objects that have to be deleted
    - pointers vs references (i.e. dot-notation vs. '->' notation)

    Granted, these are not big problems by themselves. Put together however, they wind up being an unnecessary headache, especially when dealing with application-level programming. And on the bright side of things, Java catches your null-pointers with a catchable-exception - not a segfault.

    When dealing with systems-level stuff, putting up with C++'s self-memory management is a "grin & bear it scenario", until Java's performance increases.

    Though, of course, C++ and C will always be used where bit-twiddling is needed.

  • Actually, I agree more or less with your assessment, though I'm a little more optimistic about Java's abilities. (This probably comes down to the strong vs. weak typing argument that I won't get into. In summary: I like both, but going foward there are certain domains that need strong typing, and others that should have weak typing. This is too much of a religious issue: witness the flamewars against Tcl :)

    SmallTalk did catch on in many industries, but it never quite penetrated into mainstream use. Does anyone really know why? Perhaps it primarily because it was so advanced for its time (in the business community, I'd say late 80s to early 90s). Perhaps it was because C++ offered the "path of least resistance". I don't really know.

    Java is just an attempt (imho, and from what I understand, some veteren SmallTalker's opinions) at doing SmallTalk over again for the C++ crowd (albeit with some limitations, since the programming community at large seems to be set against weakly-typed languages)

    I love SmallTalk [and am very interested in Dylan or Self], and wish it would resurge with industry support. Until it does, (go Squeak go!) however, Java is an acceptable alternative, as it also provides the "path of least resistence" to some of SmallTalk's better technologies: GC, dynamic typing (though restricted through java interfaces), and a Packages/Classes/Methods IDE (through VisualAge for Java).

    Regarding COM: I felt that Microsoft actually did some cool stuff with their VM to support COM and make it MUCH easier to use than the C++-way of using COM. COM inherently is based upon the idea of C/C++ pointers, so it's not going to naturally map to any language, Java or otherwise (witness the atrocity called "Automation" to allow COM to talk to Visual Basic).

    Furthermore, as for your "double lines of code" assertion, I think you're overstating it a bit. Java doesn't have blocks, and it requires explicit type down-casting, but I don't see this as "doubling" code requirements. Certainly, the code requirement is larger than SmallTalk, but not significantly so.
  • Tony got Richard Jones to sign his copy of the book the last time he was up here visiting. Richard signed it:

    "A Good Collection of Garbage"

    :-)
  • Data euthanasia?
    Memory cleaning?
    Data cleansing?
    Memory downsizing? (or "rightsizing")
    RIM? ("Reduction in (allocated) memory" - analogous to corporate-speak RIF)
  • Perl actually uses a reference counting algorithm for garbage collection, rather than mark and sweep -- this can cause problems with self-referential data structures (i.e. if A points to B and B points to A, there are references, but the variable may not be available.)

    In the short run, this doesn't matter so much with Perl because it's more useful as a tool than as an actual language. Try to run mod_perl, where the interpreter is persistent, and issues with reference counting become immediately apparent.

    Many people don't consider reference counting to be "true" GC at all -- Java has some interesting arguments about their GC implementation, along with the little known fact that up to 40% of CPU time can be used just in GC: they do it right but it's not cheap...

    I believe there's even a koan regarding GC in the Hacker Dictionary(which is appropriately self-referential...)
  • I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    No one ever tries to be not careful. The cautious statement you made of "if you were careful" points out that there are times that you were possibly not careful. malloc()/free() pairs gets more complicated with distributed code.

  • Although GC is pretty helpful in RAD applications, so many people misuse GC that it has gotten a
    bad name. Usually the same people who misuse GC also misuse pointers.

    The main form of GC misuse in OO-land is to avoid properly designing object destructors. As things
    get more complicated and multiple people start working on things this is often the place of the
    first design failure.

    Selecting the right time to destroy persitent objects is the biggest problem for real-time/
    multi-threaded/retentrant code. Subtle bugs (like the TCP listen-timeout bugs) usually live here.
    Defering these problems to a generic GC is probably the wrong thing to do in most cases.

    Using GC to free nodes on a recursive traversal is great, but using GC to solve buffer deadlock
    is probably a horrible thing to do.

    Remember that most GC algorithms are only faster in that they essentially defer free() calls and
    process them in bulk reducing overhead. Ref-count and mark-sweep type algorithms require that
    more work be done than simply marking unused on a call to free().

    As they say: pick the right tool for the job.
  • So many transistors going down the drain of diminishing returns: more and more branch prediction, out-of-order execution, multiple execution units..

    And just a little bit of silicon helps GC in a big way!

    I hope that the trend resumed by PicoJava will propagate to other embedded and general purpose processors.

    And dynamic type checking is mainstream too now,
    but that is another story..
  • There's too many people out there (some with good positions at important companies) who either:

    a) Don't know anything besides C and C++
    b) Are forced to use C/C++ because that's the company orientation

    so that then they later force that trash on developers in the form of libraries, debuggers that crash, broken compilers, etc.

    Every time I upgrade systems/tools, the release notes make me sick.

    No. Open Source is not doing better. You can't ignore C/C++ on Linux either.

    For simple things, malloc/free can be enough, but
    there's a certain point where the complex relationships between components/subsystems/clients crave for GC.
  • Get the alternative Lisp syntax back. That's not hard to do.
  • Ever tried searching for `garbage collection' or `memory management' in amazon.com? I'm happy enough this book exists. I'm sure others will come (The recommendations list I get consists mostly of `patterns' titles. I'm so bored I started getting into semantics, partial evaluation and old Lisp books)
  • Not to mention just plain sounding better than malloc or calloc. Malloc and Calloc sound like they should be villians in Dr. Who or something. Alloca has a nice Hawaii-five-o ring to it.

    But seriously, how bad is alloca for portability? Man pages generally claim that it is system dependent...
  • In your example, all you need is . . . maintain your own refrence counts internally, or some such.



    But reference counting is a form of garbage collection. So in this case, you suggest writing your own garbage collection instead of having the language provide it.

  • Yes, there are incremental garbage collectors which don't try to collect *all* abandoned objects, they instead just try to keep some fraction of the allocation space free. They get called more often, but do less work, so the GC time is spread out over the execution. Of course, when they fail, they have to call a more serious GC routine, which will often cause a "burp" in performance.
  • I enjoyed reading this. It is great to have a well-organized one-source presentation of the classics.

    My one complaint: as is often the case for academic texts, is it does not even mention important deployed industry technologies.

    For example, if I recall correctly, in its discussion of distributed GC, there is nary a mention of DCOM or CORBA. (The book was published in 1996, when DCOM and some CORBA ORBs were shipping.)
  • You get memory leaks if you are lucky or segfaults if you are not.

    Actually the other way around... I've worked as a C++ programmer under Windows (ugh!). A segfault is an obvious indication that something is wrong and will tell you where it occured, while a memory leak can go unnoticed for days or weeks of development and is a pain to track down. Give me segfaults over memory leaks everytime.

  • Garbage collection for those who find it to lazy or dificult to call free()...

    I am afraid you have just pressed one of my buttons. There is nothing that alerts me to danger more than one programmer acusing another of being 'lazy' because he does not want to use some rediculously complicated tool of which the first programmer is fond.

    I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    If this were usenet, at this point I would endeavour to call you out on this, and we would end up in a pissing contest over who had the most experience and whose experience was most real

    Since this is not usenet, and I am in a good mood, I will not. I will just say that I have seen many examples of supposed OO code hacked to death or made overly complicated by the need to keep track of memory allocation. The stability, memory usage and locality of a great many C programs (including XFree86) can be improved by linking them with a garbage collector. In many cases this can actually make them faster.

    I am, at this very minute, attempting to track down a bug in a C++ library linked against my Java program, which is caused by a failure of the evil reference counting scheme the library author used to keep track of his memory usage. Now I know, from looking at the code, that he could not have used malloc and free in the traditional way without breaking the clean interface of his code, and using a real garbage collector with C++ is hard, so what choice did he have ? But the need to reimplement garbage collection badly for every project rather than doing it properly once is a total waste of time.

  • I don't believe it - now we are going to have politically correct computer jargon??? Please tell me you're kidding...
  • The most hideous bugs can lurk in memory management. You get memory leaks if you are lucky or segfaults if you are not. GC frees you from a whole class of bugs and permits leaner, more legible code.

    I think I'd rather have the segfaults... sometimes memory leaks can be tricky to track down. At any rate, gc is the way to go.

    GC was a revelation to me when I switched from C/C++ to Perl. Of course Perl's memory management goes beyond GC, but that's another story.

    Rapid Application Development is nearly impossible without GC -- you can't develop rapidly when you are hunting down memory leaks and illegal pointers.



    This, IMHO, is one of the really nice things about java, and perhaps a feature that really should have made it into C++ but didn't. It seems to me that properly handling exceptions w/o gc can make life real hairy for the C++ compiler.

    I think GC is an interesting topic, and I am glad to read about it on /.

  • that would violate C++'s "no overhead" rule - you only pay for what you buy. If you don't want GC, you don't pay for its performance hit.

    FWIW, there is a variant of C++ called EC++ (embedded C++), which strips out many of the less well though out "features" of C++, like exception throwing (other than in constructors, where it is required to deal with error conditions, because you can't return a value), templates, etc.

    I have heard that it helps reduce the C++ code bloat a lot...




  • It is sad that the world does not have very many people that can handle powerful tools like manual memory management.

    I don't see what is so hard about it. If you allocate something, you should free it. If you don't allocate something, you had damn well better not try to use or free it.

    For the less clueful, try static arrays. They are friendly and fast.


    If I take that argument to the extreme, then I would arrive at the conclusion that real hackers should program in assembly. (or even machine language!)


    Yes GC makes it harder for idiots to make mistakes, but I don't think that limits its usefulness to real hackers.

    GC is of more use than simply memory management. In an OO language that supports destructors, GC can handle resources other than memory. For example, if you had an class that mixed audio streams together and wrote them to the audio device, your class's constructor could detach from the audio device freeing it for use by other programs. There would not be an easy way to do this without GC besides reference counting... and what hacker wants to waste time re-inventing the wheel!

    BTW, I can't see any good excuse for using static arrays, other than perhaps real-time embedded systems where you can't afford the overhead of GC.

  • My point is exactly that. If you have bad program design, GC is necessary, but well thought out design makes it unnecessary.

    Well, you could probably do without GC, but the question is whether you can do it better without GC.

    Without GC, there are situations where you either need to (a) implement reference counting (which is a form of GC) in order to know when to free some data, or (b) make duplicate copies of objects so each object can be freed by the code that receives a reference to the object when it is done with it. I am really not convinced that either approach is better than just using GC in the first place. (FWIW, the example I am thinking of is a multithreaded application that communicates between threads via message passing, where you would like to be able to have multicast messages. The objects passed are treated as read-only, so side effects are not an issue. I am sure that given a couple minutes I could come up with a better example.)


  • If you have bad program design, GC is necessary, but well thought out design makes it unnecessary. In your example, all you need is some sort of semaphore between the threads, or maintain your own refrence counts internally, or some such.


    This is garbage collection! It's just GC done on a per-program level, rather than a systemwide level. I think you just defeated your own argument...


    But reference counting is a poor form of garbage collection; it's slow and can't reclaim cyclical structures. Most modern garbage collectors are based on mark and sweep somewhere along the line. (I'll go so far as to claim that copying is a modification of mark and sweep, with the two operations combined into one copy.)


    it is also a crutch bad programmers use far too often to cover their own lazyness, and for really complex programs, doesn't solve the problem reliably!


    Actually, if a GC algorithm isn't reliable, it's useless. A hell of a lot of care and attention is paid to proving GC algorithms correct. It's permissible for GC to err on the side of caution (we won't reclaim that block because it might still be in use), and a fair few GC methods do this (usually incremental and conservative algorithms) but reclaiming blocks in use is a complete no-no. And whilst I appreciate your points about copying, for one thing it's usually unnecessary in modern systems - tracing works just as well, if not better, and even Baker's copying method can be done without copying - and for another thing, in conservative systems you must trace, as copying would have undesirable side effects on those things which were not pointers...


    In any case, you say `lazy' as if it were an insult. I would have thought that the good programmers were the lazy ones, the ones who aren't going to expend any unnecessary effort or have their programs do the same. Why build storage management more than once? If the OS or the language runtime has a totally reliable (and remember, it *must* be, or it's useless) GC system, why storage manage at all?


    In any case, there's no harm in having a `free' call that you can call to put something on the collectable list straightaway - but when it becomes difficult to free something in an orderly fashion, why not let the infrastructure take the strain? It's what it's there for.

    --

  • Sure, and I've never seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching label/goto pairs. But all you structured programming weenies insist on using while loops and for loops and switch statements.

    Garbage collection isn't necessary, but it makes the programmer's life much easier and simplifies your code by moving the memory management responsibilities out of the program and into the language runtime library (or VM, or whatever).

    peter

  • I work for Geodesic Systems [geodesic.com]. We sell a drop-in memory management product called "Great Circle" which includes garbage collection for C and C++ and has a web interface.
  • Garbage collection for those who find it to lazy or dificult to call free()...

    I have NEVER seen a place where, if you were careful (as you should be) in designing your code, you couldn't have nice matching malloc()/free() pairs.

    (Yes, I know in LISP and such GC is a must, but those languages don't even have the same concept of variables as a normal imperitive language such as C)

    -The Code Nazi
    (now you see why some friends of mine
    gave me that nickname...)
  • Obviously you have never written a real program spanning several million lines of code with subroutines written by someone other then you.

    That is rather assuming... and self contradictory.
    [snip - some example of program design]

    My point is exactly that. If you have bad program design, GC is necessary, but well thought out design makes it unnecessary. In your example, all you need is some sort of semaphore between the threads, or maintain your own refrence counts internally, or some such.

    If the library blindly mallocs such structures, and gives the user a pointer to them, I have a few things to say about the library as well. I guess that could be ok, if the library never touched it again, and there was an equivilent destruct funtion in the library (it becomes another malloc/free pair). If the library returns a pointer, and continues to modify it, you not only have a GC problem, you will be clobbering data!

    I guess the point is, that while GC can be a time-saver (in having to not re-design sections of badly written code), it is also a crutch bad programmers use far too often to cover their own lazyness, and for really complex programs, doesn't solve the problem reliably! (I really hate it when structures and code change behind my back (hence my hate for C++) )

    -The Code Nazi

  • Your comments suggest that you think that GC somehow imposes overheads that you otherwise don't pay in C++. But that's not true.

    Memory allocation and deallocation in C++ imposes runtime costs qualitatively no different from costs that you incur if you allocate memory in the presence of a GC. Quantitatively, GC is actually often faster.

    The main reason C++ doesn't have GC is because C++ has C pointer semantics and because many C++ programmers habitually make a lot of unportable assumptions about the runtime and memory layout. That makes adding a GC to C++ tricky and would break lots of programs, programs that are not legal C++ programs but that still work.

    The best compromise for GC in C++ is to use a conservative GC (like Boehm's GC). Boehm's GC may break some valid programs, but on the whole it lives fairly well within the framework of C and C++ and does not change C++ semantics too much. But in most languages other than C++, one can do considerably better with GC.

  • Holding on to too much memory is not a practical problem in languages with GC. If you are really concerned about it, you can use many of the standard techniques from C/C++ for dealing with it and you still retain the advantages of GC (including the runtime safety).

    Java's Hello World does not take 80M of memory. In fact, Java runs in pretty small footprints and on the whole is probably smaller than current C/C++ runtimes. The only reason you don't see the C/C++ runtimes is because they are usually loaded at boot time these days and you consider them part of the OS.

  • Comparing actual memory usage of Java and C/C++ is rather tricky because memory is handled so differently. For example, in C/C++, the GUI libraries are "shared" and not necessarily counted against memory usage of the process, but they still take up space. The Java GUI libraries, however, are stored on the Java runtime's heap.

    Because Java isolates different software components from one another, you can run different applications and services within the same runtime, and that's really the right comparison to make (at least in the long run when more and more applications run in Java).

    That may seem like a distant hypothetical future, but it actually already has a lot of practical significance. For example, when writing servlets in Java, the footprint of each additional servlet is tiny, yet servlets are isolated from one another as well as if they were running in separate processes. In contrast, CGI scripts and even shared library implementations of web services have enormous overhead.

  • Actually, last I checked, Perl's memory management was jut simple refcounting, not even smart enough to avoid leaking circular garbage.
  • So, it's all free() and malloc(), but garbage collection lets you defer the decision to do free().

    I once worked on a compiler which did garbage collection this way. When you compile a procedure, the signature is persistent, but the body isn't. That is to say, the data structure used to represent the signature will remain forever, but the data structures used to represent the bodies are not useful after you have finished compiling the body. So, you allocate space for the persistent stuff in permanent storage, and space for the body in some other, ephemeral blocks of storage. When you are finished with the procedure, you can just collect all the ephemeral stuff and throw it out, without thinking about it. This is much faster than using reference counts, since you don't pay a penalty for each pointer assignment. Since the language was Ada, what was persistent in one context was ephemeral in another. C or C++ wouldn't have the same problem. I thought it was a neat idea, and wish that it was *my* idea.
  • This is a wonderful book. I've owned it for some months and read it end to end in one sitting. I'm not kidding. This seems to be the first `proper book' on the subject, something I've been looking for for about five years, and I've had to make do with terse survey papers. Buy it. :)
  • Which is easier to find? The segfault, of course. When developing/testing your application, a memory is very likely to slip through into the shipped product. Any way to set "tripwires" (like assertions) will help you shake out bugs before shipping.

  • alloca() is like malloc(), except it allocates memory on top of your stack (not the heap). Allocation is fast, O(1), because alloca() simply pushes the stack pointer forward. You don't need to free() alloca blocks. When your function returns, the stack disappears! But you need to be careful not to use alloca() for any memory blocks that need to be retained after the function returns (like nodes in a global data structure).
  • This is a really good book. It explains garbage collection in lots of detail and covers various implementation tradeoffs.

    If you're not willing to read several hundred academic papers on this subject--and sort them all out--this book is definitely the best starting place.

  • Wow. Somebody's posted an advertisement for our project. Since high praise always makes me nervous, let me try to deflect some of that enthusiasm in various directions. :-)

    You also want the check out GNU Common Lisp, GNU Guile, Eiffel, ML, Prolog, Haskell and the various R4RS scheme interpreters.

    There's a lot of free (and GPL'd) programming languages out there, and some of them are pretty fast and impressively powerful.

  • Sure... but if you learned Dylan well enough to implement a Python compiler in Dylan, then you might find yourself losing interest in Python in favor of Dylan. :-)

    That's not true! I help maintain a Dylan compiler, and I still like Python!

  • Well, my favorite language is Python, but I just wondered, is there any documentation about Dylan-to-C compiler ? The internals manuals aren't very verbose :-).

    Oops. We're still filling those out as we find our way around the code.

  • by Mr T ( 21709 )
    You sound like you've never worked on anything other than a one man project. With large, dynamic, multithreaded applications it is very easy to lose track of when a piece of memory needs to be returned. It happens in just about every application but it's usually small enough that you don't notice it. I've got a masters in software engineering and I can tell you that the best programming shops in the world (CMM level 4 type places) end up with code that leaks, being careful is not good enough because nobody is good enough to be that careful on large projects. Good modern garbage collection (difficult to do with C or C++) offers a lot of advantages. Generational Copy Propgation collection is fast (much faster than reference counting), it compacts memory so the OS can reclaim pages that aren't in use (tell me how to do this with out garbage collection, you can free all your mallocs but you can't control which page the memory lands on. If all you have is two bytes on two different pages and you free everything else, you're still consuming a lot of memory...) it speeds up memory allocation which can have a dramatic impact on performance in some problems spaces, and it can enhance cache performance when accessing memory in some circumstances. Not to mention it fixes memory leaks.. The advantages far out weigh the disadvantages (a small performance hit, but you'll have a much worse one if you count refs) and it has nothing to do with design. If design is what you really care about, GC does you a huge favor because it allows you to actually focus on the design and solving the problem at hand. The anti-GC crowd are really old school thinkers who didn't understand the problems in the first place, GC is the future, even realtime systems are using it now.
  • It's been a while since I've used a language with built-in GC. The one thing that I never found to be acceptable was that the program/system froze for anywhere from 100s of milliseconds to several seconds when the GC decided to clean up. Has this problem been fixed in modern implementations of GC?
  • I've never heard that 40% figure. Java GC implementations have historically not been state-of-the-art, though. The fact is that a good GC is faster than manual deallocation.

    Hmm.. That sounds odd... I imagine it's dependant on the type of program that's running... One with lots of idle time should be better off with GC, one with less should be better with manual deallocation.
  • Good book but should have covered distributed GC. These days one of the "big things" for objects is using them to build distributed systems.

    A good paper on distributed GC came from one of the INRIA project OS's called COOL, but there have been several research papers on the topic.
  • No mention of distributed GC in the book?

    Then they sure picked a funny title for section 12...

I've noticed several design suggestions in your code.

Working...