
Debugging

Posted by timothy on Tue Feb 24, 2004 02:41 PM
from the unlousy dept.
dwheeler writes "It's not often you find a classic, but I think I've found a new classic for software and computer hardware developers. It's David J. Agans' Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems." Read on for the rest.
Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
Author: David J. Agans
Pages: 192
Publisher: Amacom
Rating: 9
Reviewer: David A. Wheeler
ISBN: 0814471684
Summary: A classic book on debugging principles

Debugging explains the fundamentals of finding and fixing bugs (once a bug has been detected), rather than any particular technology. It's best for developers who are novices or who are only moderately experienced, but even old pros will find helpful reminders of things they know they should do but forget in the rush of the moment. This book will help you fix those inevitable bugs, particularly if you're not a pro at debugging. It's hard to bottle experience; this book does a good job of it. This is a book I expect to find useful many, many years from now.

The entire book revolves around the "nine rules." After the typical introduction and list of the rules, there's one chapter for each rule. Each of these chapters describes the rule, explains why it's a rule, and includes several "sub-rules" that explain how to apply the rule. Most importantly, there are lots of "war stories" that are both fun to read and good illustrations of how to put the rule into practice.

Since the whole book revolves around the nine rules, it might help to understand the book by skimming the rules and their sub-rules:

  1. Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.
  2. Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen, and never throw away a debugging tool.
  3. Quit thinking and look (get data first, don't just do complicated repairs based on guessing): See the failure, see the details, build instrumentation in, add instrumentation on, don't be afraid to dive in, watch out for Heisenberg, and guess only to focus the search.
  4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
  5. Change one thing at a time: Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.
  6. Keep an audit trail: Write down what you did in what order and what happened as a result, understand that any detail could be the important one, correlate events, understand that audit trails for design are also good for testing, and write it down!
  7. Check the plug: Question your assumptions, start at the beginning, and test the tool.
  8. Get a fresh view: Ask for fresh insights, tap expertise, listen to the voice of experience, know that help is all around you, don't be proud, report symptoms (not theories), and realize that you don't have to be sure.
  9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process.
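To make rule 4's "narrow the search with successive approximation" concrete, here is a minimal sketch of a bisection over an ordered history of builds. It is not from the book: the names `first_bad`, `fails`, and the build numbers are invented for illustration, and it assumes the first build is known good, the last is known bad, and the bug stays present once introduced.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest candidate for which is_bad() is true.

    Assumes candidates[0] is known good, candidates[-1] is known bad,
    and the history looks like good...good bad...bad.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the bug was introduced at or before mid
        else:
            lo = mid  # the bug was introduced after mid
    return candidates[hi]


# Hypothetical history: builds 0..99, with the bug introduced in build 37.
builds = list(range(100))
probes: List[int] = []  # a small audit trail of which builds were tested


def fails(build: int) -> bool:
    probes.append(build)
    return build >= 37


culprit = first_bad(builds, fails)
print("first bad build:", culprit)
print("builds tested:", probes)
```

Each probe halves the remaining range, so a hundred builds need only a handful of tests. Recording the probes also happens to follow rule 6 (keep an audit trail).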

This list by itself looks dry, but the detailed explanations and war stories make the entire book come alive. Many of the war stories jump deeply into technical details; some might find the details overwhelming, but I found them excellent at showing how the principles work in practice. Many war stories were about obsolete technology, but since the principle is the point, that isn't a problem. Not all the war stories are about computing; there's a funny story involving house wiring, for example. But if you don't know anything about computer hardware and software, you won't be able to follow many of the examples.

After detailed explanations of the rules, the rest of the book has a single story showing all the rules in action, a set of "easy exercises for the reader," tips for help desks, and closing remarks.

There are lots of good points here. One that particularly stands out is "quit thinking and look." Too many try to "fix" things based on a guess instead of gathering and observing data to prove or disprove a hypothesis. Another principle that stands out is "if you didn't fix it, it ain't fixed;" there are several vendors I'd like to give that advice to. The whole "stimulate the failure, don't simulate the failure" discussion is not as clearly explained as most of the book, but it's a valid point worth understanding.

I particularly appreciated Agans' discussions on intermittent problems (particularly in "Make it Fail"). Intermittent problems are usually the hardest to deal with, and the author gives straightforward advice on how to deal with them. One odd thing is that although he mentions Heisenberg, he never mentions the term "Heisenbug," a common jargon term in software development (a Heisenbug is a bug that disappears or alters its behavior when one attempts to probe or isolate it). At least a note would've been appropriate.

The back cover includes a number of endorsements, including one from somebody named Rob Malda. But don't worry, the book's good anyway :-).

It's important to note that this is a book on fundamentals, and different from most other books related to debugging. There are many other books on debugging, such as Richard Stallman et al.'s Debugging with GDB: The GNU Source-Level Debugger. But these other texts usually concentrate primarily on a specific technology and/or on explaining tool commands. A few (like Norman Matloff's guide to faster, less-frustrating debugging) have a few more general suggestions on debugging, but are nothing like Agans' book. There are many books on testing, like Boris Beizer's Software Testing Techniques, but they tend to emphasize how to create tests to detect bugs, and less on how to fix a bug once it's been detected. Agans' book concentrates on the big picture of debugging; these other books are complementary to it.

Debugging has an accompanying website at debuggingrules.com, where you can find various little extras and links to related information. In particular, the website has an amusing poster of the nine rules you can download and print.

No book's perfect, so here are my gripes and wishes:

  1. The sub-rules are really important for understanding the rules, but there's no "master list" in the book or website that shows all the rules and sub-rules on one page. The end of the chapter about a given rule summarizes the sub-rules for that one rule, but it'd sure be easier to have them all in one place. So, print out the list of sub-rules above after you've read the book.
  2. The book left me wishing for more detailed suggestions about specific common technology. This is probably unfair, since the author is trying to give timeless advice rather than a "how to use tool X" tutorial. But it'd be very useful to give good general advice, specific suggestions, and examples of what approaches to take for common types of tools (like symbolic debuggers, digital logic probes, etc.), specific widely-used tools (like ddd on gdb), and common problems. Even after the specific tools are gone, such advice can help you use later ones. A little of this is hinted at in the "know your tools" section, but I'd like to have seen much more of it. Vendors often crow about what their tools can do, but rarely explain their weaknesses or how to apply them in a broader context.
  3. There's probably a need for another book that takes the same rules, but broadens them to solving arbitrary problems. Frankly, the rules apply to many situations beyond computing, but the war stories are far too technical for the non-computer person to understand.

But as you can tell, I think this is a great book. In some sense, what it says is "obvious," but it's only obvious as all fundamentals are obvious. Many sports teams know the fundamentals, but fail to consistently apply them - and fail because of it. Novices need to learn the fundamentals, and pros need occasional reminders of them; this book is a good way to learn or be reminded of them. Get this book.


If you like this review, feel free to see Wheeler's home page, including his book on developing secure programs and his paper on quantitative analysis of open source software / Free Software. You can purchase Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

This discussion has been archived. No new comments can be posted.
  • i hate debugging (Score:5, Funny)

    by Anonymous Coward on Tuesday February 24 2004, @02:42PM (#8376753)
    cause when i do it, it is often re-bugging
    • Effective Technique (Score:5, Funny)

      by Rick the Red (307103) <Rick.The.Red@[ ]il.com ['gma' in gap]> on Tuesday February 24 2004, @03:04PM (#8377038)
      (Last Journal: Friday June 24 2005, @05:12AM)
      I find the best way to uncover bugs is to do a demo for your boss's boss.
    • Re:i hate debugging (Score:5, Funny)

      by Frymaster (171343) on Tuesday February 24 2004, @03:33PM (#8377381)
      (http://frymaster.ca/ | Last Journal: Monday September 15 2003, @12:58AM)
      cause when i do it, it is often re-bugging

      we have a special process we call "debuggery". debuggery - maxims and arrows

      1. be hostile: your application was your friend - your baby. you gave it life. well, no longer. now your application is your enemy. do you admire the intricate house of cards you have built like hiram abif? don't. you have a glue gun now and you are going to do a little explaining about who is boss here! your app is taunting you - it's thinking "what does a chemical/analogue hack like that have that i don't?" well, i'll tell you: an index finger. suitable for hitting the "del" key. make this crystal goddamn clear!
      2. kludge everything! the debug stage of the development life cycle is all about kludges. we call it klop - kludge-oriented programming:

        kludge foo = new kludge(specialCase bar);

        you've written that. the debugging phase comes at the end of a project. ie the part closest to the deadline when clueless suits and moneymen confuse line count with product. the pressure is on. the company is on the line. are you going to walk into the glass tower and pitch to the vc's about how yr going to have to go back to the uml's and rebuild x? good luck! can i have your job when you're done? get the tape, get the staples, get the glue.

      3. blame others: teamwork is just a code word for being the shepherd to a flock of scapegoats. if you were smart, you'd have been working on cultivating a culture of accepting blame early on in the cycle. this is especially effective if yr building a client/server thingy. establish early on that most of the failures are on the client(server) side. whichever one you're not writing.

        make yourself documentation czar if possible - then abuse the position to retroactively assign blame to other team members ("the docs explicitly state that we use roman numerals" - "gee, i don't remember that" - "well tough. get coding"). if you set it up right you can build an army of debugging minions to do your kludging for you while you, uh, read slashdot...

      4. redefine feature sets. the client is a clueless little doughboy who can't tell his ass from his operating system anyway. he's been flaking you on the spec-n-req all year. turn those tables! if a feature is buggy, yank it. if there's a complaint, reference the client to some vaguely-related advisory somewhere (trust me, he won't read all the way down). if he complains say "in light of advisory x we strongly advise against implementing _______ (feature). a work around may be possible at a future point and we are more than willing to calculate the billing for that additional work now."
      all that and echo will solve all yr debuggery problems.
  • #9 is wrong (Score:5, Funny)

    by Anonymous Coward on Tuesday February 24 2004, @02:46PM (#8376808)
    What if someone else fixes it?
    • Re:#9 is wrong by Neil Blender (Score:1) Tuesday February 24 2004, @02:52PM
    • Re:#9 is wrong by kubrick (Score:1) Tuesday February 24 2004, @05:38PM
  • yuck (Score:5, Funny)

    by theMerovingian (722983) on Tuesday February 24 2004, @02:47PM (#8376810)
    (Last Journal: Tuesday October 23, @02:06AM)
    Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen

    Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.

    Does anyone else feel dirty after reading this?

    • Re:yuck by kooso (Score:2) Tuesday February 24 2004, @02:56PM
      • Re:yuck by Ed Avis (Score:2) Tuesday February 24 2004, @04:32PM
    • Re:yuck by Tony-A (Score:2) Tuesday February 24 2004, @06:48PM
  • Change one thing at a time (Score:5, Insightful)

    by tcopeland (32225) * <tomNO@SPAMinfoether.com> on Tuesday February 24 2004, @02:47PM (#8376822)
    (http://tomcopeland.blogs.com/)
    > Change one thing at a time: Isolate the
    > key factor, grab the brass bar with both
    > hands (understand what's wrong before fixing),
    > change one test at a time, compare it with a
    > good one, and determine what you changed
    > since the last time it worked.

    This is helpful with unit tests, too. If I find a bug, I want to figure out which unit test should have caught this and why it didn't. Then I can either fix the current tests, or add new ones to catch this.

    Either way, if someone reintroduces that particular bug it'll get caught by the unit tests during the next hourly build [ultralog.net].
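As a rough illustration of the workflow described above (find the bug, then add the test that should have caught it), here is a minimal sketch. The function `parse_version` and its string-comparison bug are invented for the example; they are not from the comment or from any real project.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Hypothetical function under test: turns "1.10.2" into (1, 10, 2)."""
    # The invented original bug kept the parts as strings, so "1.10"
    # sorted before "1.9". Converting each part to int is the fix.
    return tuple(int(part) for part in text.strip().split("."))


def test_existing_behaviour() -> None:
    # The kind of test that already existed and passed despite the bug.
    assert parse_version("1.2.3") == (1, 2, 3)


def test_two_digit_component_sorts_after_one_digit() -> None:
    # Regression test added when the bug was found. It fails against the
    # string-based version and passes against the fix, so a reintroduced
    # bug gets caught on the next automated build.
    assert parse_version("1.9") < parse_version("1.10")


# Run the tests directly; a test runner such as pytest would also find them.
test_existing_behaviour()
test_two_digit_component_sorts_after_one_digit()
print("regression tests passed")
```

The useful habit is to write the new test so that it fails before the fix, which also follows rule 9: check that it's really your fix that fixed it.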
    • Unit Test? (Score:4, Funny)

      by MooseByte (751829) on Tuesday February 24 2004, @03:23PM (#8377273)

      What is this "unit test" you refer to? If we consider our customer base to be a "unit", does that count?

      Yours Truly (All Belongs To Me),

      Bill
    • Re:Change one thing at a time (Score:5, Interesting)

      by wrp103 (583277) <Bill.Pringle@gmail.com> on Tuesday February 24 2004, @03:38PM (#8377446)
      (http://billpringle.com/)

      It is nice to see a book that addresses this topic. I get very frustrated with so many text books that have at most a small chapter on debugging. Let's face it, beginning programmers spend more time debugging code than they do writing code, so why isn't that activity stressed?

      I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on. I, on the other hand, would throw in some print statements so I could see what was going on. In one case, he insisted there was nothing wrong with the code, but what he didn't realize was that an early test failed, which meant the code he was looking at never got executed. I had suggested he print something out at the start of the routine, but he insisted it wasn't necessary because he knew what it was doing.
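The situation in that story can be sketched in a few lines. This is only an illustration: `process_order` and its fields are made-up names, and the point is simply that a log line at the top of the routine shows whether the code being stared at ever runs.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("debug_sketch")


def process_order(order: dict) -> str:
    # Instrumentation at the very top: proves the routine was entered
    # and records the input it actually received.
    log.debug("process_order entered: %r", order)

    if not order.get("items"):
        # The early test. When this branch is taken, nothing below runs,
        # no matter how correct the rest of the routine looks.
        log.debug("process_order: no items, returning early")
        return "skipped"

    total = sum(item["price"] * item["qty"] for item in order["items"])
    log.debug("process_order: computed total=%s", total)
    return f"charged {total}"


# An order that trips the early test: the log shows the exit immediately.
print(process_order({"items": []}))

# An order that reaches the code everyone was looking at.
print(process_order({"items": [{"price": 2, "qty": 3}]}))
```

Using the `logging` module rather than bare prints means the instrumentation can stay in the code and be switched off by raising the log level.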

      He might cover this in the book, but one rule that I stress with my students is, if you make a change and the behavior of the program is the same, back out your changes because either:

      • You are probably looking in the wrong place (which is why the behavior is the same)
      • You could easily have just inserted several new bugs that you won't see until the path you are looking at gets executed.

      I often have students insist that their changes should have fixed something, but it turns out the program was actually executing an alternative path that they weren't looking at, or that the problem was much earlier, so when it got to where they thought the problem was, the data was different than they assumed.

      • Re:Change one thing at a time (Score:4, Interesting)

        by CargoCultCoder (228910) on Tuesday February 24 2004, @05:01PM (#8378455)
        (http://www.cv6.org/)

        I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on...

        Personally, I would consider this to be the anti-Zen method. He was apparently focused so much on what he "knew" to be true, that he failed to consider clues trying to point him in another direction. That is not the Zen way of looking at things.

        Zen and the Art of Motorcycle Maintenance [amazon.com] has a lot to say about this. If you're stuck on a problem, the solution is not to beat on it harder (e.g., stare at the code some more). The solution is to back off, and (to paraphrase from memory) allow yourself to become aware of the one little fact that's out there, waving its hand, hoping that you might notice it ... and that'll point you at the real problem.

        Stupidly staring at code is not Zen. Having an open mind for interesting and helpful facts -- whatever their source -- is.

      • Re:Change one thing at a time by e-Motion (Score:3) Tuesday February 24 2004, @06:41PM
      • Because debugging should be avoided. by Chemisor (Score:2) Tuesday February 24 2004, @09:20PM
      • Re:Change one thing at a time by tcopeland (Score:1) Wednesday February 25 2004, @10:49AM
  • Heisenbugs... (Score:5, Informative)

    by Aardpig (622459) on Tuesday February 24 2004, @02:48PM (#8376829)

    ...are always the worst: bugs which disappear when you look for them. Insert a print statement? The bug disappears. Use a debugger? The bug reappears, but in a different place.

    Heisenbugs are almost always caused by buffer overflows. They can often be prevented (at least in Fortran 77/90/95/03) by enabling array-bounds checking at compile time; but before I knew about this, I had a hell of a time tracking them down.

    • Re:Heisenbugs... (Score:5, Funny)

      by AndroidCat (229562) on Tuesday February 24 2004, @02:56PM (#8376953)
      (http://home.primus.ca/~ronsharp/tororg.html)
      When I was working on arcade games, we had a sure-fire method of making bugs go away. However, shipping each coin-op game with an engineer and $40k worth of testing equipment connected to it wasn't really cost-effective.
    • Sonuvabitch! by Anonymous Coward (Score:3) Tuesday February 24 2004, @02:59PM
      • Re:Sonuvabitch! (Score:4, Informative)

        by Aardpig (622459) on Tuesday February 24 2004, @03:07PM (#8377072)

        I have hated fortran for years, having written a single program in it, based on this.

        Fortunately, things have changed a lot since then. With the introduction of modules and array arithmetic in Fortran 90/95, situations where routines are called with the wrong arguments, or arrays are subscripted incorrectly, are much less frequent. I haven't been bitten by a Heisenbug for a couple of years now; and when I am, switching on checking at compile and run time usually reveals the problem pretty quickly.

        • OT, but ... by josephgrossberg (Score:2) Wednesday February 25 2004, @12:10PM
          • Re:OT, but ... by josephgrossberg (Score:2) Wednesday February 25 2004, @12:16PM
        • Re:Sonuvabitch! by Aardpig (Score:2) Wednesday February 25 2004, @12:56AM
      • Re:Sonuvabitch! by Lil'wombat (Score:2) Wednesday February 25 2004, @12:47PM
    • Re: Heisenbugs... (Score:4, Interesting)

      by gidds (56397) <slashdot&gidds,me,uk> on Tuesday February 24 2004, @03:01PM (#8376998)
      (http://www.gidds.me.uk/)
      You're describing bugs which are reproducible, but only on the unchanged code.

      Worse even than those are bugs which aren't reproducible at all, where there's no way to determine the conditions that caused them, or be sure you've fixed them. The only way to handle them is to fill the code with assertions and defensive code, and hope that at some point it'll catch something for you...

      • Re: Heisenbugs... by pclminion (Score:2) Tuesday February 24 2004, @04:01PM
        • Re: Heisenbugs... by Detritus (Score:2) Tuesday February 24 2004, @04:10PM
        • Re: Heisenbugs... by Alzheimers (Score:2) Tuesday February 24 2004, @04:37PM
        • Re: Heisenbugs... (Score:4, Funny)

          by aled (228417) on Tuesday February 24 2004, @05:41PM (#8379002)
          Computers are deterministic

          Oh god, a computer calvinist!

          Let me tell you about the group of girls I heard telling the professor how they were debugging the timing loop to measure an analog signal, stepping through the loop code in the debugger. Free will kills determinism!
        • Re: Heisenbugs... (Score:4, Insightful)

          by gidds (56397) <slashdot&gidds,me,uk> on Tuesday February 24 2004, @07:06PM (#8380099)
          (http://www.gidds.me.uk/)
          Well, yes, but that determinism can be arbitrarily complex; causes may be very far removed from their effects. A GUI app can have a *lot* of past input to affect things, for example, especially if it runs for days or weeks. Exactly when asynchronous events happen can be extremely difficult to predict, detect, handle, or test; livelocks and race conditions are notoriously hard to track down. Even exact patterns of memory layout and allocation, file organisation or access, &c can affect subtle bugs. So while strictly true, determinism isn't a lot of help in some cases.
        • Re: Heisenbugs... by BinxBolling (Score:2) Tuesday February 24 2004, @11:31PM
        • Re: Heisenbugs... by maysonl (Score:1) Wednesday February 25 2004, @02:31AM
    • Re:Heisenbugs... (Score:5, Insightful)

      by WayneConrad (312222) * <wconrad@@@yagni...com> on Tuesday February 24 2004, @03:04PM (#8377029)
      (http://yagni.com/)

      Heisenbugs are almost always caused by buffer overflows.

      They are also almost always caused by race conditions, the most insidious of which is thread-safe code that turns out only to be safe on a uniprocessor system.

      And don't forget the phase of the moon, or for the truly unlucky, intermittently glitchy hardware.
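For readers who have not met a race condition, here is a minimal sketch of the read-modify-write pattern behind many of them. The `Counter` class and `hammer` helper are invented for illustration. Whether the unlocked version actually loses updates on a given run depends on the interpreter and on thread scheduling, which is exactly what makes such bugs intermittent.

```python
import threading
from typing import Callable


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        # Read-modify-write with no lock: two threads can read the same
        # old value, and then one of the two updates is silently lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self) -> None:
        # The lock makes the read-modify-write a single atomic step.
        with self._lock:
            self.value += 1


def hammer(increment: Callable[[], None],
           threads: int = 4, per_thread: int = 10_000) -> None:
    """Call increment() per_thread times from each of several threads."""
    def worker() -> None:
        for _ in range(per_thread):
            increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


racy = Counter()
hammer(racy.unsafe_increment)
print("without lock:", racy.value)  # may fall short of 40000; varies by run

safe = Counter()
hammer(safe.safe_increment)
print("with lock:", safe.value)  # 40000 on every run
```

Adding a print or a debugger breakpoint changes the timing between the read and the write, which is why the unlocked failure tends to vanish as soon as someone looks for it.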

      • Re:Heisenbugs... by jimsum (Score:3) Tuesday February 24 2004, @03:58PM
      • Re:Heisenbugs... by thentil (Score:2) Tuesday February 24 2004, @04:22PM
      • Phase of the Moon (Score:4, Interesting)

        by cpeterso (19082) on Tuesday February 24 2004, @04:51PM (#8378298)
        (http://www.cpeterso.com/)

        There really was a bug based on the phase of the moon. See the Jargon Dictionary for more info: phase of the moon [astrian.net]:


        phase of the moon n. Used humorously as a random parameter on which something is said to depend. Sometimes implies unreliability of whatever is dependent, or that reliability seems to be dependent on conditions nobody has been able to determine. "This feature depends on having the channel open in mumble mode, having the foo switch set, and on the phase of the moon." See also heisenbug.

        True story: Once upon a time there was a program bug that really did depend on the phase of the moon. There was a little subroutine that had traditionally been used in various programs at MIT to calculate an approximation to the moon's true phase. GLS incorporated this routine into a LISP program that, when it wrote out a file, would print a timestamp line almost 80 characters long. Very occasionally the first line of the message would be too long and would overflow onto the next line, and when the file was later read back in the program would barf. The length of the first line depended on both the precise date and time and the length of the phase specification when the timestamp was printed, and so the bug literally depended on the phase of the moon!

        The first paper edition of the Jargon File (Steele-1983) included an example of one of the timestamp lines that exhibited this bug, but the typesetter `corrected' it. This has since been described as the phase-of-the-moon-bug bug.

        However, beware of assumptions. A few years ago, engineers of CERN (European Center for Nuclear Research) were baffled by some errors in experiments conducted with the LEP particle accelerator. As the formidable amount of data generated by such devices is heavily processed by computers before being seen by humans, many people suggested the software was somehow sensitive to the phase of the moon. A few desperate engineers discovered the truth; the error turned out to be the result of a tiny change in the geometry of the 27km circumference ring, physically caused by the deformation of the Earth by the passage of the Moon! This story has entered physics folklore as a Newtonian vengeance on particle physics and as an example of the relevance of the simplest and oldest physical laws to the most modern science.
    • Re:Heisenbugs... (Score:5, Insightful)

      by kzinti (9651) on Tuesday February 24 2004, @03:09PM (#8377107)
      (http://jimthompson.org/ | Last Journal: Monday August 20 2001, @09:22AM)
      Heisenbugs are almost always caused by buffer overflows.

      In my experience, Heisenbugs are almost always caused by stack problems. That's why they go away when you put print statements in the code - because you're causing the usage of the stack to change.

      Buffer overflows (to arrays on the stack) are one good way to munge the stack. Returning the address of an input parameter or automatic variable is another way, because these are declared on the stack and cease to exist when the enclosing block exits. Anybody else using such an address is writing into the stack in an undefined manner, and chaos can result!
      • Re:Heisenbugs... (Score:5, Interesting)

        by Rufus88 (748752) on Tuesday February 24 2004, @04:13PM (#8377808)
        In my experience, Heisenbugs are often the result of race conditions between concurrent threads.

        This reminds me of a famous hardware "bug":
        > This is a weird but true story (with a moral) ...
        > A complaint was received by the Pontiac Division of General Motors:
        >
        > "This is the second time I have written you, and I don't blame you for not
        > answering me, because I kind of sounded crazy, but it is a fact that we
        > have a tradition in our family of ice cream for dessert after dinner each
        > night.
        >
        > But the kind of ice cream varies so, every night, after we've eaten, the
        > whole family votes on which kind of ice cream we should have and I drive
        > down to the store to get it. It's also a fact that I recently purchased a
        > new Pontiac and since then my trips to the store have created a problem.
        >
        > You see, every time I buy vanilla ice cream, when I start back from the
        > store my car won't start. If I get any other kind of ice cream, the car
        > starts just fine. I want you to know I'm serious about this question, no
        > matter how silly it sounds: 'What is there about a Pontiac that makes it
        > not start when I get vanilla ice cream, and easy to start whenever I get any
        > other kind?'"
        >
        > The Pontiac President was understandably skeptical about the letter, but
        > sent an engineer to check it out anyway. The latter was surprised to be
        > greeted by a successful, obviously well educated man in a fine neighborhood.
        >
        > He had arranged to meet the man just after dinner time, so the two hopped
        > into the car and drove to the ice cream store. It was vanilla ice cream
        > that night and, sure enough, after they came back to the car, it wouldn't
        > start.
        >
        > The engineer returned for three more nights. The first night, the man got
        > chocolate. The car started. The second night, he got strawberry. The car
        > started. The third night he ordered vanilla. The car failed to start.
        >
        > Now the engineer, being a logical man, refused to believe that this man's
        > car was allergic to vanilla ice cream. He arranged, therefore, to continue
        > his visits for as long as it took to solve the problem. And toward this end
        > he began to take notes: he jotted down all sorts of data, time of day, type
        > of gas used, time to drive back and forth, etc.
        >
        > In a short time, he had a clue: the man took less time to buy vanilla than
        > any other flavor. Why? The answer was in the layout of the store.
        >
        > Vanilla, being the most popular flavor, was in a separate case at the front
        > of the store for quick pickup. All the other flavors were kept in the back
        > of the store at a different counter where it took considerably longer to
        > find the flavor and get checked out.
        >
        > Now the question for the engineer was why the car wouldn't start when it
        > took less time. Once time became the problem - not the vanilla ice cream - the
        > engineer quickly came up with the answer: vapor lock. It was happening
        > every night, but the extra time taken to get the other flavors allowed the
        > engine to cool down sufficiently to start. When the man got vanilla, the
        > engine was still too hot for the vapor lock to dissipate.
        >
        > Moral of the story: even insane looking problems are sometimes real.
      • Re:Heisenbugs... by composer777 (Score:3) Tuesday February 24 2004, @04:59PM
        • Quick Correction by composer777 (Score:2) Tuesday February 24 2004, @05:10PM
      • Buffer overflows, bad pointers, stack problems... by jtheory (Score:2) Tuesday February 24 2004, @05:49PM
    • Re:Heisenbugs... by morcheeba (Score:3) Tuesday February 24 2004, @03:13PM
    • Re:Heisenbugs... by JWW (Score:3) Tuesday February 24 2004, @03:26PM
    • Re:Heisenbugs... (Score:4, Informative)

      Heisenbugs are almost always caused by buffer overflows.

      In my experience with embedded systems, a Heisenbug is almost always caused by un-initialized data. You wind up assuming a particular value whereas you originally didn't plan on doing that. What value the data actually turns out to be is highly dependent on things like where in memory the code loads, how big the executable is, and so forth. Adding debugging statements will shift all the code after it up in memory and often make the bug go away or behave differently.

      Another interesting bug, unrelated to the Heisenbug, shows up when you port (for example) ANSI C code from one platform to another and code that originally worked starts doing weird things. For example, the C compiler under a BSD would allow modulo 0 and produce a zero result, which was incidentally what was wanted. We moved the code to Linux and started getting core dumps, because modulo 0 was treated as dividing by zero. Some problems like this actually turn out to be Heisenbugs, for example due to differences in the way memory is malloc-ed on different systems. Suppose you accidentally malloc a pointer rather than its contents. On one OS you wind up allocating more memory than you need, but have no problems because addresses start fairly low in memory. On another OS memory addresses start somewhere else and you start getting weird errors due to lack of memory.
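
      A minimal sketch of one way to stop depending on what a given platform does with modulo 0: spell the zero case out. It is written in Python rather than C to keep it short. `mod_or_zero` is a hypothetical helper name, and returning 0 for a zero divisor is simply the behavior the ported code above happened to want.

```python
def mod_or_zero(a: int, b: int) -> int:
    """Return a % b, with the b == 0 case defined explicitly as 0.

    In C the result of a % 0 is undefined, so code that relies on one
    platform's answer can break on another. Handling the case by hand
    removes that dependency.
    """
    if b == 0:
        return 0
    return a % b


print(mod_or_zero(17, 5))  # 2
print(mod_or_zero(17, 0))  # 0
```

      The point is only that the special case becomes visible in the source instead of living in the compiler or the OS.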
    • Re:Heisenbugs... by badmammajamma (Score:2) Tuesday February 24 2004, @03:57PM
    • Re:Heisenbugs... by pommiekiwifruit (Score:2) Tuesday February 24 2004, @03:58PM
    • Re:Heisenbugs... by jlseagull (Score:1) Tuesday February 24 2004, @04:48PM
    • Re:Heisenbugs... by NumbThumb (Score:1) Tuesday February 24 2004, @04:54PM
    • Re:Heisenbugs... by TimeZone (Score:1) Tuesday February 24 2004, @04:56PM
    • Re:Heisenbugs... by TelevisioSledgicus (Score:1) Tuesday February 24 2004, @07:46PM
    • Re:Heisenbugs... by ginsu (Score:1) Tuesday February 24 2004, @11:20PM
    • Re:Heisenbugs... by blorf (Score:1) Wednesday February 25 2004, @01:04AM
    • Re:Heisenbugs... (Score:4, Interesting)

      by pclminion (145572) on Tuesday February 24 2004, @03:11PM (#8377128)
      In my operating system class my group's program caused an error at one of the delete[] statements and it disappeared and reappeared depending on whether we ran it in the debug environment or not.

      I'll tell you with 99% certainty that this was caused by a piece of code overrunning the end (or beginning) of a new[]'d buffer, clobbering the memory allocation meta-data. This causes delete[] to crump when it hits a bogus pointer and flies off into never never land.

      By running in the debug environment you changed the memory layout of the allocation in such a way that the problem was masked.

      These kinds of bugs only seem weird the first time you encounter them. They're actually some of the most common types of bugs. With enough experience you'll be finding them in your sleep.
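
      The masking effect described above can be sketched with a toy model. This is not how any real allocator lays out memory: the `MAGIC` header, the padding size, and the names `build_heap`, `overrun`, and `run` are all assumptions made for illustration. The same two-byte overrun either corrupts the next block's bookkeeping or lands harmlessly in padding, depending only on layout.

```python
MAGIC = b"HDR!"  # stand-in for allocator bookkeeping; purely illustrative


def build_heap(buf_size: int, padding: int):
    """Lay out one buffer, optional padding, then the next block's header.

    Returns the heap and the offset at which the header starts.
    """
    header_at = buf_size + padding
    heap = bytearray(buf_size + padding) + bytearray(MAGIC)
    return heap, header_at


def overrun(heap: bytearray, buf_size: int, extra: int) -> None:
    """Write buf_size + extra bytes into a buffer that only holds buf_size."""
    for i in range(buf_size + extra):
        heap[i] = 0x41


def header_intact(heap: bytearray, header_at: int) -> bool:
    """Check whether the simulated bookkeeping survived."""
    return bytes(heap[header_at:header_at + len(MAGIC)]) == MAGIC


def run(padding: int) -> bool:
    heap, header_at = build_heap(buf_size=8, padding=padding)
    overrun(heap, buf_size=8, extra=2)  # the same bug in both layouts
    return header_intact(heap, header_at)


print("tight layout, header intact:", run(padding=0))   # False
print("padded layout, header intact:", run(padding=4))  # True
```

      In both runs the bug is present; only the padded layout hides it, which is the sense in which a debug environment can mask the problem.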

  • I'd agree (Score:5, Informative)

    by scatterbrained (144748) on Tuesday February 24 2004, @02:49PM (#8376839)
    (Last Journal: Monday December 08 2003, @12:57PM)
    I've read it and it's a good book, but I would
    just borrow it from the library and then print
    out the poster to remember the 'rules'.

    There's not enough meat to keep it on my
    precious shelf space.
    • Re:I'd agree by caseydk (Score:2) Tuesday February 24 2004, @03:44PM
  • I don't need a book... (Score:5, Funny)

    by garethwi (118563) on Tuesday February 24 2004, @02:50PM (#8376848)
    (http://www.venditor.com/)
    ...to learn how to debug. I only need my own sloppy code.
  • He forgot regression tests (Score:5, Insightful)

    by mark99 (459508) on Tuesday February 24 2004, @02:50PM (#8376853)
    (Last Journal: Saturday April 03 2004, @09:04AM)
    Regression test suites (if possible) should be maintained so that when bugs get fixed, they stay fixed.

    Just my 2 cents.
  • Good read (Score:5, Insightful)

    by GoMMiX (748510) on Tuesday February 24 2004, @02:50PM (#8376857)
    "
    If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process."

    I can think of a WHOLE lot of techs and admins who really need to follow number 9 a lot closer.

    Especially those Windows admins/techs who think 'restart' is the ultimate fix-all. Though, sadly, I suppose in many cases that's about all you can do with proprietary software. Well, that and beg vendors to fix the problem. (We all know how productive that is....)
    • Re:Good read by Dukael_Mikakis (Score:2) Tuesday February 24 2004, @03:08PM
    • Re:Good read (Score:5, Insightful)

      by swb (14022) <mobocracy@gmail.com> on Tuesday February 24 2004, @03:11PM (#8377126)
      No, it's number *5* that EVERYONE needs to remember to follow. I see way too many people (including myself in a hurry) changing more than one thing at a time and then immediately wondering what fixed it or why it didn't get fixed.

      This is especially important when changing a second variable can actually mask the fix of the change of the first variable or cause a second failure that appears to be the same as the initial failure.

      I guess they should have added a rule 10: be patient and systematic. Obvious problems usually have non-obvious solutions, and a thorough examination of the situation is time consuming. Don't take short cuts or you might miss the problem.
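
      A small sketch of "change one thing at a time", including the masking case mentioned above. Everything here is hypothetical: the `works` function stands in for whatever check tells you the system is healthy, and the two settings are invented. Each candidate change is tried alone against a fresh copy of the baseline.

```python
from typing import Callable, Dict

Config = Dict[str, int]


def works(cfg: Config) -> bool:
    """Hypothetical system under test: it needs a large enough buffer,
    and it breaks again if the timeout is raised past 30."""
    return cfg["buffer_kb"] >= 64 and cfg["timeout_s"] <= 30


def try_each_change(baseline: Config, changes: Dict[str, Config],
                    check: Callable[[Config], bool]) -> Dict[str, bool]:
    """Apply each candidate change alone to a fresh copy of the baseline."""
    results = {}
    for name, change in changes.items():
        cfg = dict(baseline)  # never accumulate changes between trials
        cfg.update(change)
        results[name] = check(cfg)
    return results


baseline = {"buffer_kb": 32, "timeout_s": 10}
changes = {
    "bigger buffer": {"buffer_kb": 64},
    "longer timeout": {"timeout_s": 60},
}

print(works(baseline))  # False: the original failure
print(try_each_change(baseline, changes, works))
both = {**baseline, **changes["bigger buffer"], **changes["longer timeout"]}
print(works(both))      # False: the second change masks the real fix
```

      Applied together, the two changes leave the system failing, so it would be easy to conclude that neither helped. Tried separately, the buffer change is clearly the fix.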
      • Re:Good read by monique (Score:3) Tuesday February 24 2004, @05:06PM
      • Re:Good read by jafac (Score:2) Tuesday February 24 2004, @05:06PM
        • Re:Good read by HeyLaughingBoy (Score:2) Wednesday February 25 2004, @11:31AM
      • Re:Good read by ocie (Score:2) Tuesday February 24 2004, @09:54PM
    • Re:Good read by Ytsejam-03 (Score:1) Tuesday February 24 2004, @03:20PM
    • Re:Good read by globalar (Score:2) Tuesday February 24 2004, @03:35PM
    • But restart is the fix-all by girgit (Score:1) Tuesday February 24 2004, @09:30PM
  • but how do you know it's fixed? (Score:5, Insightful)

    by sohp (22984) <`snewton' `at' `io.com'> on Tuesday February 24 2004, @02:51PM (#8376858)
    (http://www.io.com/~snewton/)
    Nothing about writing code for a test case that exercises the bug, then rerunning it every time you make a change you think will fix the bug? Seems like a big oversight. Any program of reasonable size is going to require wasting a significant amount of time restarting and re-running to the point of failure, and with every manual check of the result, there's an increasing probability that fallible human will make a mistake.

    More programmers need to get Test Infected [sourceforge.net].
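
    A minimal sketch of that idea, assuming a made-up bug: `parse_percent` is a hypothetical function that once failed on input with surrounding whitespace. The test that reproduces the report is written once, rerun after every attempted fix, and then kept so the bug stays fixed.

```python
def parse_percent(text: str) -> float:
    """Hypothetical function that once crashed on input like ' 50% '."""
    cleaned = text.strip()
    if cleaned.endswith("%"):
        cleaned = cleaned[:-1]
    return float(cleaned) / 100.0


def test_surrounding_whitespace():
    # The exact input from the (hypothetical) bug report.
    assert abs(parse_percent(" 50% ") - 0.5) < 1e-9


def test_plain_number_still_works():
    # Guards against the fix breaking the case that already worked.
    assert abs(parse_percent("25") - 0.25) < 1e-9


def run_regression_suite() -> int:
    """Run every test and return the number of failures."""
    tests = [test_surrounding_whitespace, test_plain_number_still_works]
    failures = 0
    for test in tests:
        try:
            test()
        except AssertionError:
            failures += 1
            print("FAIL", test.__name__)
    return failures


print(run_regression_suite(), "failure(s)")
```

    In practice a test runner such as unittest or pytest would replace the hand-rolled loop; the loop is here only to keep the sketch self-contained.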
  • My Favorite Debugging Tale by stuffduff (Score:2) Tuesday February 24 2004, @02:52PM
  • for every cell phone provider out there by NumLk (Score:1) Tuesday February 24 2004, @02:52PM
  • The first law of debugging (Score:5, Funny)

    by ToSeek (529348) on Tuesday February 24 2004, @02:52PM (#8376892)
    "The most likely source of the current bug is the fix you made to the last one."
  • by Dr_Marvin_Monroe (550052) on Tuesday February 24 2004, @02:53PM (#8376895)
    These "rules" are great, but nothing beats the mystic power of a little goat blood and chicken bones waved over a misbehaving system.

    Without these, the average user might be tempted to try and fix it themselves.... Next thing, my job is being "offshored" to a phone bank in India.

    No, the chicken bones and a little incantation will keep my job right here, where it belongs.
  • And the final solution (Score:5, Funny)

    by aliens (90441) on Tuesday February 24 2004, @02:53PM (#8376898)
    (http://rapture-cms.com/ | Last Journal: Tuesday June 24 2003, @02:11PM)
    10) Hammer.

    if 10 fails

    11) Shotgun.

    Congrats problem solved, human destressed.
  • Time (Score:5, Insightful)

    by quarkoid (26884) on Tuesday February 24 2004, @02:53PM (#8376906)
    (http://slashdot.org/)
    One thing's clear from looking at that list - spend more time on testing your code.

    Unfortunately, speaking as an ex-programmer, time is one luxury that PHBs don't afford their minions. A project needs to be completed and knocked out of the door as soon as possible. The less time spent on unnecessary work, the better.

    It is also unfortunate that PC users have been brought up expecting to have buggy software in front of them and expecting to have to reboot/reinstall. What motivation is there to produce bug free code when the users will accept buggy code?

    Ho well, at least I run my own company now - master of my own wallet - and can concentrate on quality solutions.
    • Re:Time by Dukael_Mikakis (Score:3) Tuesday February 24 2004, @03:17PM
    • Re:Time by ocie (Score:2) Tuesday February 24 2004, @09:59PM
  • Sounds interesting (Score:5, Interesting)

    by pcraven (191172) <paul&cravenfamily,com> on Tuesday February 24 2004, @02:53PM (#8376907)
    (http://www.cravenfamily.com/)
    Teaching people how to debug isn't that easy. It requires some experience before they get the hang of it.

    I'm a stickler for labeling code often, and tracking changes released to production. Because of this, I often seem to be a stick in the mud when it comes to refactoring.

    Heavy refactoring makes your code nicer. But when you have to do a lot of debugging on something that worked before refactoring, you can start to appreciate that keeping the change set manageable is a 'good thing'. (I do financial apps, so this may not work for everyone.)

    The thing I see people fail at most is the ability to 'bracket' the problem: go back and forth between code that works and code that doesn't, filtering the problem down to something simple.

    The second thing is the inability of some people to go 'deep' in their debugging. Decompile the Java/C#/whatever code, trace through the library calls, whatever.

    It's nice to see another good book on the market that seems to cover these topics.
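
    Bracketing between a labeled release that works and one that doesn't can be sketched as a binary search. The release labels and the `is_good` check below are invented for illustration, and the sketch assumes that once a release is bad, every later release is bad too.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest version that fails.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    that once a version is bad every later one is bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Hypothetical release labels; in this made-up history r5 introduced the bug.
releases = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
print(first_bad(releases, lambda r: int(r[1:]) < 5))  # r5
```

    With eight labeled releases this takes three checks instead of up to seven, which is one practical payoff of labeling what goes to production.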
  • Rule 0 (Score:5, Funny)

    by Anonymous Coward on Tuesday February 24 2004, @02:54PM (#8376913)
    0. If you're a software guy blame it on hardware, if you're a hardware guy blame it on software.

    0.1. Blame it on the user.

    0.2. Blame it on your colleague.

    0.3. Blame it on your manager.

    0.4. Yell at the computer and tell it to work dammit!

    0.5. Put head on keyboard and sob.

    0.6. Read Slashdot.

    0.7. Post on Slashdot.

    0.8. Call it a feature not a bug.
    • Re:Rule 0 by Patrik_AKA_RedX (Score:1) Tuesday February 24 2004, @03:18PM
    • BOfH by Archangel Michael (Score:2) Tuesday February 24 2004, @03:45PM
    • Re:Rule 0 by cant_get_a_good_nick (Score:2) Tuesday February 24 2004, @07:04PM
  • Remain focused. Don't let others' WAGs get to you by PornMaster (Score:1) Tuesday February 24 2004, @02:54PM
    • by RobinH (124750) on Tuesday February 24 2004, @03:30PM (#8377358)
      (http://slashdot.org/)
      I find that when troubleshooting systems with which other people have worked longer, I have had better luck just asking them simple facts and troubleshooting myself rather than listening to their wild-ass guesses and having to shoot them down.

      Yes, but within their guesses are sometimes tidbits of information. Last week we had a complaint from a user that every time they clicked this one button on a form, it set off a certain process that wasn't supposed to happen right then, but we knew that there was no connection between that click event and the process. However, I knew he wasn't imagining it.

      After investigating, I found that when he opened the form that the button was on, it loaded a timer object that started ticking away, and 5 seconds later initiated the process. Just happens that it takes about 5 seconds from opening the form to click on the button.

      Of course, if I'd written the software... well, whatever.
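
      One way to separate the click from the timer in a case like this is to vary the click time and see which interval stays constant. The sketch below is a toy model, not the actual application: `simulate` and the timestamps are assumptions, with the 5 second delay taken from the story above.

```python
TIMER_DELAY_S = 5.0  # from the story: the timer fires about 5 s after the form opens


def simulate(click_at: float) -> dict:
    """Toy model of the form: returns event timestamps in seconds."""
    opened = 0.0
    return {
        "open": opened,
        "click": click_at,
        "process": opened + TIMER_DELAY_S,  # driven by the timer, not the click
    }


# Vary only the click time, then compare the two candidate explanations.
runs = [simulate(click_at) for click_at in (2.0, 5.0, 9.0)]
since_open = {run["process"] - run["open"] for run in runs}
since_click = {run["process"] - run["click"] for run in runs}

print(since_open)           # {5.0}
print(sorted(since_click))  # [-4.0, 0.0, 3.0]
```

      The interval from form open is the same in every run, while the interval from the click is all over the place, so the click is a coincidence and the timer is the trigger.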
  • by TheCrayfish (73892) on Tuesday February 24 2004, @02:54PM (#8376915)
    (http://leecoursey.blogspot.com/)
    You can read a sample chapter from the Debugging Rules book in PDF format by going here [debuggingrules.com]. (Requires the free Adobe reader [adobe.com].)
  • Top 10 Rules of Debugging (Score:5, Funny)

    by ackthpt (218170) * on Tuesday February 24 2004, @02:55PM (#8376930)
    (http://www.dragonswest.com/ | Last Journal: Monday November 05, @07:35PM)

    10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.

    9. The better the SDK, the more sophisticated the bugs.

    8. There's always more bugs in the other guy's (girl's) code.

    7. Declaring code bug-free is asking for it to fail at the worst possible time with the greatest visibility.

    6. A good design is as likely to have bugs as a bad one. Bugs are equal opportunity.

    5. Debugging time is inversely proportional to coding time.

    4. If it works the first time, there's a bug, but you won't find it until you roll it out.

    3. Debugging is fun. Really! It's when you run out of bugs that you should wonder if you got them all, that's not fun.

    2. The most difficult bugs to find are in the most straightforward looking code.

    1. That's not a bug, that's a feature.

  • Number one by Jooly Rodney (Score:2) Tuesday February 24 2004, @02:58PM
    • Re:Number one by burris (Score:2) Tuesday February 24 2004, @04:11PM
  • Race Conditions? (Score:5, Insightful)

    by Speare (84249) on Tuesday February 24 2004, @03:03PM (#8377015)
    (http://www.halley.cc/ed/)

    Make It Fail is pretty hard to do when it comes to race conditions. This has got to be the most frustrating kind of bug. Others are referring to the Heisenbug which comes in a variety of flavors.

    Sometimes you don't KNOW when there's multiple threads or processes, or when there are other factors involved.

    Have you noticed that a new thread is spawned on behalf of your process when you open a Win32 common file dialog? Have you noticed that MSVC++ likes to initialize your memory to values like 0xCDCDCDCD after operator new, but before the constructor is called? It also overwrites memory with 0xDDDDDDDD after the destructors are called. And that it ONLY does these things when using the DEBUG variant build process? Did you know that .obj and .lib can be incompatible if one expects DEBUG and the other expects non-DEBUG memory management?

    Someone on perlmonks.org was just asking about a Heisenbug where just the timing of the debugger threw off his network queries. Add the debugger, it works. Take away the debugger, it fails. I've got a serial-port device which comes with proprietary drivers that seem to have the same sort of race condition.

    The top 9 rules mentioned here look great. But you could write a whole book on just debugging common race conditions for the modern multi-threaded soup that passes for operating systems, these days.
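
    A deterministic sketch of why timing alone can make a race appear or vanish. It uses no real threads: `run_schedule` is a hypothetical simulator in which two "threads" each increment a shared counter in two steps (read, then write back), and the schedule string decides whose step runs next. A debugger that changes timing is, in effect, picking a different schedule.

```python
def run_schedule(schedule: str) -> int:
    """Run two simulated threads, 'A' and 'B', that each increment a shared
    counter with a separate read step and write step.

    schedule is the order in which steps get to run, e.g. "AABB".
    """
    counter = 0
    local = {}        # each thread's private copy of the value it read
    has_read = set()
    for thread in schedule:
        if thread not in has_read:
            local[thread] = counter      # step 1: read
            has_read.add(thread)
        else:
            counter = local[thread] + 1  # step 2: write back
    return counter


print(run_schedule("AABB"))  # 2: each thread finishes before the other starts
print(run_schedule("ABAB"))  # 1: both read 0, so one increment is lost
```

    The code is identical in both runs; only the interleaving differs. That is why "Make It Fail" is hard here: you have to control the schedule, not just the inputs.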

  • by mykepredko (40154) on Tuesday February 24 2004, @03:05PM (#8377044)
    (http://www.myke.com/)
    probably added a step stating that the problem symptoms and causes should be articulated clearly (probably between #3 and #4) before trying to fix anything. I've seen too many engineers/programmers/technicians list symptoms and attack them individually, only to discover that they were related.

    On the surface, this flies in the face of "divide and conquer" - but what I'm really saying here is make sure you have the problem bounded before you attack it.

    Also, with Step 9, I would have liked to see more emphasis on ensuring that nothing else is affected by the "fix". Making changes to code to fix a problem is often one step forward and two steps back when you don't completely understand the function of the code that was being changed.

    All in all, an excellent book in a little understood area.

    myke
  • Missed one: explain it to someone (Score:5, Insightful)

    by deanj (519759) on Tuesday February 24 2004, @03:06PM (#8377060)
    They missed a good one: explain the bug to someone.

    If you start explaining the bug to someone, there's a good chance in mid-explanation you'll realize a solution to the problem.

    Some school (can't remember which) had a Teddy Bear in their programming consulting office... There was a sign. "Explain it to the bear first, before you talk to a human". Silly as it sounds, people would do it, and a large portion of the time they'd never actually have to consult the staff... by explaining it to the bear, they solved the problem.

    Weird, but true.
    • Re:Missed one: explain it to someone by MythMoth (Score:2) Tuesday February 24 2004, @03:22PM
    • Re:Missed one: explain it to someone by og_sh0x (Score:2) Tuesday February 24 2004, @03:33PM
      • Re:Missed one: explain it to someone (Score:5, Interesting)

        by Speare (84249) on Tuesday February 24 2004, @04:22PM (#8377920)
        (http://www.halley.cc/ed/)

        No, that's a funny thing. I drew that bear icon over ten years ago when I was on the Win3.1 shell team. I didn't even know it still shipped in any MSFT product.

        The teddy bear is named Bear, and was the cuddly companion of one of the Windows 3.1 / Windows 95 shell team developers. He'd carry it *EVERYWHERE*. There are quite a few internal APIs called BunnyThis() or BearThat(), usually with generic numbers, because giving it a name would entice application writers to try to call it. (They're useless three-line internal helpers, but that didn't stop conspiratorial book-writers from trying to document them anyway.)

        Bear also appears in the Win3.1 credits, where I made portraits of spectacled Bill, bald Steve, and large-schnozzed Brad Silverberg.

        Now I don't have any Microsoft products at my house, anymore, except one outdated off-net machine which runs edutainment CD-ROMs for my daughter.

    • Actually, the book has that one by dwheeler (Score:2) Tuesday February 24 2004, @03:43PM
    • Re:Missed one: explain it to someone by ChefBork (Score:1) Tuesday February 24 2004, @06:02PM
    • Debugging by email by gidds (Score:2) Wednesday February 25 2004, @11:25PM
  • The Three R's of Windows Debugging by iguana (Score:1) Tuesday February 24 2004, @03:06PM
  • Missing rule (Score:3, Insightful)

    by timdaly (539918) on Tuesday February 24 2004, @03:06PM (#8377062)
    He missed a rule: Explain the bug to someone else.
    The second pair of eyes often finds the problem
    even if they don't have a clue what you are talking
    about.