Document the Why

freework · on March 6, 2013

Writing great comments is the kind of thing you really have to learn the hard way. When I first started coding, I used to comment my code in what I call the "micro-comment" format (comments that describe a single line of code). In my case, I had a project that I had to abandon for a period of a few months. When I had to go back into that project to fix up some things, I couldn't understand anything, despite my micro-comments. I actually had to completely abandon a 20,000 line code base of a personal project because I couldn't read the code. That vivid experience of personally feeling the pain of my bad commenting is what taught me how to comment properly. I kept asking myself "Why the fuck did I do that? What is this code doing? Jeez, these comments are not helping me at all".

Now when I write things, I know to comment in what I call the "macro-comment". That is to say instead of writing a comment to describe a single line of code, you write comments to describe a block of code. Since that one project's fail, I haven't had that problem again.

Another point I want to make is that I really feel like my commenting skills are extremely valuable to me as a developer. If I had to have my brain erased, but I could choose one skill to remain, I'd choose keeping my code commenting skills. It's also the one skill that I care about the most in my co-workers. I don't care if you're really good at code golf. I don't care if you know Haskell. Can you comment code properly? Whenever I interview for a job, I'm never able to demonstrate this skill. The Fizzbuzz problem doesn't allow me to show off my commenting skills. Writing a binary tree parser doesn't let me show off my documentation skills. I think it's the most important skill I have, but I never get the chance to show it off when interviewing.

sophacles · on March 6, 2013

I almost fully agree with you, however there are also very important "micro comment" cases. IME this comes when implementing algorithms from papers. For example, when looking at some formula in a paper, and looking at a python implementation of it that uses numpy, there may not be an obvious 1:1 mapping. Numpy has some very powerful operations, so commenting each line with how that statement maps back to the algorithm makes revisiting or others' visits much easier to comprehend.
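A minimal sketch of that kind of line-by-line mapping, under some loud assumptions: plain Python stands in for numpy so the example has no dependencies, the function name `running_variance` is made up for illustration, and the "paper" is simply the well-known Welford recurrence for a running mean and variance. Each micro-comment ties a statement back to a term of the formula rather than restating the syntax.

```python
def running_variance(samples):
    """Sample variance via Welford's online update (illustrative sketch)."""
    n = 0        # k, the number of samples seen so far
    mean = 0.0   # M_k, the running mean
    m2 = 0.0     # S_k, the running sum of squared deviations
    for x in samples:
        n += 1
        delta = x - mean          # x_k - M_{k-1}
        mean += delta / n         # M_k = M_{k-1} + (x_k - M_{k-1}) / k
        m2 += delta * (x - mean)  # S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k)
    if n < 2:
        raise ValueError("need at least two samples")
    return m2 / (n - 1)           # s^2 = S_n / (n - 1)
```

Without the comments, `delta * (x - mean)` looks like a bug (why mix the old and new mean?); with them, a reader can check each line against the recurrence.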

Another case is the "reminder note"... e.g. "This function also sets up the next step..." (I know, side effects and whatnot, but sometimes you're stuck with other people's libraries).

mmatants · on March 6, 2013

So basically the generalized description is "don't document what is immediately evident from language syntax/function name itself".

thebear · on March 6, 2013

Thank you for mentioning the case of implementing algorithms. I could not agree more. Without comments, it can be excruciatingly difficult for a reader to map (thanks for using that word, spot on!) the code to the high-level description of the algorithm. Comments provide that mapping. This gets more important as the algorithms become more complex, as in higher mathematics.

ZoFreX · on March 6, 2013

> Whenever I interview for a job, I'm never able to demonstrate this skill. The Fizzbuzz problem doesn't allow me to show off my commenting skills.

A good interview process does let you show off your commenting skills. FizzBuzz is useless to measure ability, but wonderful for screening out those who have literally no ability. For my money one of the most valuable components to a good interview is a code sample, either look at code they've written in the past or give them a small problem to work through.

I might be biased though. I got more job offers from companies that interviewed me that way, including my current position :)

h2s · on March 6, 2013

People who document the "how", or worse, the "what", generally do so because they're bad at reading code. I work with a few people that do this and they readily admit that that is the reason why they feel the need for comments of the form...

     // set the user data
     var userData = {

The thing that grates with me is that this is apparently an acceptable stance on the issue. These people don't see this inability to read plain code as a flaw, or something they should work on.

LinaLauneBaer · on March 6, 2013

I think another reason for "overdocumenting" code is that you are working on something that has never been done before so you make extra sure that what you did makes sense by writing it out. I think that I am good at reading code but I still comment the how or what in some edge cases where the thing at hand is new to myself and/or to most of the people on the team.

LukeShu · on March 6, 2013

I also like "over-documenting" the "what" when implementing a spec/standard. Let's just copy/paste in the relevant paragraph from the document! In those cases, you don't need to know the "why" (though it helps); the committee/WG did that, and is telling you the "what".

Ensorceled · on March 6, 2013

Actually that is also a "why" as far as I'm concerned.

    // HAVE to do it this way to meet IEEE 12345, DO NOT CHANGE

eykanal · on March 6, 2013

This applies to change tracking as well... I've essentially used this message when describing to my team how to write useful Git commit messages. The diffs will tell you the where and the what; the job of the commit message is to tell you the why.

jtbigwoo · on March 6, 2013

I've started pretending that all my commit messages will finish the statement, "I made this change because..."

muxxa · on March 6, 2013

I agree and would go further: comments in commit messages tell you the motivation of a person at a particular point in time, while comments in the code are less trustworthy as they can go out of sync with the code. I find the use cases for in-code commenting to be very rare indeed.

natefinch · on March 6, 2013

The whole point is the why. Yes, they can occasionally get out of sync with the code, but that should get caught in code reviews. In 2 years you'll go back to look at the code and wonder why in the hell you're stripping out the 3rd byte of that array... and only the why will tell you. The code can tell you WHAT it is doing, but you have to infer why, unless there's a comment.

jtheory · on March 6, 2013

My primary rule for commenting is to document anything likely to be unexpected or counter-intuitive to someone else on their first read of the code.

You don't always need to "document the why", because often it's obvious enough (with a well-named hierarchy of code, method names, etc.)... but it's essential to get a feel for times when someone else will see your code doing Y and think "ah, this would be easier if we just do X", and comment those carefully.

Related: read other code; get a sense for common programming idioms, and use the common approaches unless you have a really good reason not to.

You may be proud to have mastered a little-used feature in your language of choice that saves you a line of code here and there, but if the cost is that 80% of the people reading your code later are confused, you're not winning.

ZoFreX · on March 6, 2013

> You may be proud to have mastered a little-used feature in your language of choice that saves you a line of code here and there, but if the cost is that 80% of the people reading your code later are confused, you're not winning.

What if it's a feature little-known within your team, but considered common or even idiomatic in the wider world?

jtheory · on March 8, 2013

Then take the time to teach them about it, and why it's worth picking up.

Everyone wins in that scenario.

LinaLauneBaer · on March 6, 2013

For a long time I considered comments in code a code smell. I only commented my code if I had to say something about the "why" aspect of it. A few months ago I changed that and now I am back to commenting more than before and not only the why but also the how - of course not always but if there is a complex piece of code I will comment it. I do this even though I am using a language which produces very readable source code. Here is the why:

We are humans and we communicate in our own language. Our brains are not made to read code. We are not compilers that think in EBNF. Commenting the how is appropriate in many cases, especially if you are not working alone but in a team. (1)

I really enjoy comments of people who are smarter than myself. Those comments (especially) helped me a lot to improve my own skills, to understand their code better and my feeling is that those essential comments make our code better - not worse.

(1) "we/our" = the people working in our startup

pjungwir · on March 6, 2013

Some day I'm going to write a blog post about this, but I think comments are like commentary on a chess game. Sure, I can read Nf6, but I want to know so much more. Like chess, computer code leaves a lot unsaid. Perhaps I did it this way for performance implications, or an edge case off in some other file, or to work around a bug in a 3rd-party library, or for the sake of pattern x, or . . .

As a mediocre chess player, I can read the notation for two chess games and appreciate a lot, but I doubt I could tell the difference between a game played by players rated 1800 and one by 2400s. Programming seems the same way: you write comments so that other engineers don't have to be 2400s to understand the hidden implications.

ricardobeat · on March 6, 2013

I'm surprised no one has mentioned Literate Programming[1]. It's a concept where comments should reflect the programmer's intentions, explain why, not what is going on, intertwining text and code.

It lives on in CoffeeScript and tools like Docco. The just-released CoffeeScript 1.6 supports a mix of markdown and code that looks great: http://ashkenas.com/literate-coffeescript/

[1] http://en.wikipedia.org/wiki/Literate_programming

jes5199 · on March 6, 2013

To jump a level: you should be documenting the "why" of your business decisions as well! I've been on projects that have long TODO lists that are divorced from the reasoning behind those lists, and then the market changes but the roadmap doesn't - because we've forgotten why we're doing what we're doing.

Ensorceled · on March 6, 2013

I've been putting essentially this same advice in coding guidelines for about 20 years now:

    Comments explain why you are doing something, your code
    should be written in a style that automatically shows the
    who, what and how.

ctdonath · on March 6, 2013

"When" too. Nice to know whether the weirdness was added last week or last decade. I've left old obtuse comments in place just as proof that the function, as written, was there for the last 14 years (which sometimes explained a lot).

Ensorceled · on March 6, 2013

I don't find when useful, except in post mortem situations. The code is the code. Trying to capture history leads to things like old code being commented out to preserve the context. Use a good code repository instead.

mercurial · on March 6, 2013

git annotate?

ctdonath · on March 6, 2013

I'm talking code that has out-lived several "yeah we'll use this forever" version control systems. If the comment isn't in the code it will get lost.

ZoFreX · on March 6, 2013

FWIW you can actually migrate between most major VCS systems without losing your history.

Someone · on March 6, 2013

Yes, but it is nigh impossible to migrate from say Python to C, a copy-paste into an entirely different project, and from there via Java to C# and keep history. That may be a stretch, but even one such step will almost certainly lose history.

mercurial · on March 6, 2013

It also depends on what kind of stuff you've done with your VCS. "Creative" SVN layouts may be difficult to merge.

ctdonath · on March 7, 2013

That's assuming somebody remembers where it was.

stcredzero · on March 6, 2013

What does "who" mean in this context?

Ensorceled · on March 6, 2013

The developer.

I ended up firing a guy because he would not stop littering the code with crap like:

    // 2 lines added by I. Dee Ott Feb 12, 2005
    // increment i
    i++;
    // 1 line removed by I. Dee Ott, Feb 12, 2005
    // i = i + 1;

But also the actor or user role.

    // this function can only be run by admin

Is only true if enforced in some way.

EvilTerran · on March 6, 2013

I'd take that to mean "who wrote the code"; so it's less of a coding style guideline, more coding strategy: "use version control with per-person accounts, so you can tell down the line who made a change".

stcredzero · on March 6, 2013

> I'd take that to mean "who wrote the code"

Which would make sense, except that it's the code itself that is said to indicate this. Or, does he mean for that to be documented in the RCS?

timr · on March 6, 2013

Why is a good start. But documenting the who, how, where, and when are important, too:

* Who is meant to use this code?

* How is this code supposed to be called?

* How is it organized?

* How does it do its job?

* Where are the dependencies?

* When is it appropriate to call it?

And finally, the What is also extremely important at a macro level:

* What does this block of code do?

* What are the gotchas?

* What special requirements are necessary?

Obviously, commenting isn't typically as useful at the granularity of a single line of code. But the bigger the block of code these questions document, the more important they are. By the time you get to a class or file level, they're essential.

I think that every programmer should take a course in journalism, so that they understand the critical importance of the 5w's + h. But I'd settle for programmers who actually take the time to write comments. An extra 10% of your time saves your team exponential time in the future, because it cuts down on the communication overhead. There's simply no valid excuse for not writing comments -- only laziness.

kintamanimatt · on March 6, 2013

This applies to more than just code comments. I have a tendency to write out decisions I make so that I can come back to them later. In particular I write not just about what I've decided, but why and how I arrived at the decision. Often I've forgotten these things months later and the written reasoning either helps me change direction intelligently if necessary, or gives me reason to stick to my original resolution.

EEGuy · on March 6, 2013

Expressing "The Five Ws" (plus the "How") in and about code adapts that excellent journalistic tradition [1].

I'm constantly trying to find a balance. Definitely the "Why" of a code change is most difficult to self-document.

A style I've used for decades, a bit wordy, but useful in merging change sets, keeps a change log at the top of every module / source file naming the person making a change (who), dating the change (when), tagging it with a pseudo-html tag [wrapping an area of code changes in the body of the source file] (what and where), and at the top, a reference to the failure case or test case (why) explaining why it was necessary to make this change.

When one change spans multiple files, I dedicate one source file to contain the detail, and all the others make reference comments to the one source file containing the detail.

[1] https://en.wikipedia.org/wiki/Five_Ws

subb · on March 6, 2013

This is interesting because what you are basically doing is what your VCS should do. However, even if you write good commit messages (why) and make localized commits (what, where, who, when), the visualization that you get from your technique is not matched by the VCS. Plus I guess it's more high-level than what the VCS gives you. You probably don't add something to your change log if all you did was fix a typo.

Also, a changelog only documents the why of modifications, not the why of existence of a class, function, etc.

EEGuy · on March 7, 2013

Indeed. VCS commit messages speak first to me in distinguishing the major "synch points" of a particular version's (branch's / shelf's / trunk's / tag's) life cycle. Some of those synch points summarize an achieved goal (a concise "what" and "why"), but you have it spot on: VCS comments, at least as I write them, cannot match the finer granularity of inline comments.

In writing VCS commit comments, I'm finding it most useful to compose these major-event commit events beginning with a <what> word written in all caps:

o COPY out to SHELF <name>

o MERGE in fixes for Ticket <id> from <what node>

o RELEASE to TAG <id> as copy from <TRUNK | BRANCH <name>, MERGE back to TRUNK >

o REVERSE MERGE my bad in Commit #<Rev>, detail in Ticket <id>

Daily commits for work in progress don't get the all-caps treatment. My VCS (or perhaps my understanding of it) doesn't allow me line-level or block-level commit comments; those would reduce my inline comment density.

That said, it's worth noting that the code base has been maintained through three VCSs, and independently annotated by two disjoint Ticketing systems, one of which went through an involuntary Ticket renumbering that was self-overlapping and irreversible. That taught me to prefix my Ticket references with a few "character constants" identifying the Ticketing system "du jour". A bare Ticket ID number has an implicit context; a "fixed prefixed" number communicates context whose value takes quantum jumps in value and user gratitude when the Ticketing system gets changed out.

A cumulative top-of-the-file change log, in reverse chrono order, increases in value with age. Some of the judgement errors I made in years past become clear now, many times tripped across while looking for something else. Candor in commenting pays.

ZoFreX · on March 6, 2013

Another way to achieve this goal is to include a ticket number in any commit relating to it. That way you can easily find out why the changes had to be made at that time (assuming your tickets have good descriptions - they do, right?)

EEGuy · on March 7, 2013

Oh yes, very much so. As I use it, a Ticket can carry, as necessary:

o A discovery and progress narrative, useful to regain working context after long interruption, fit neither for inline comments nor VCS commit comments.

o A research and reference document list

o A related Ticket reference list

o A summary recovery plan and record

o A summary integration plan and record

jimbobimbo · on March 6, 2013

I can't recommend this enough. I'm working through a massive legacy codebase now and sometimes you see something really-really weird w/o any explanation around it at all. So frustrating.

Please love your fellow co-workers, document the why!

mooreds · on March 7, 2013

My favorite way to comment the why is to put in a URL to an external bug tracker or wiki. This means more work for the reader, but can really show the back and forth over why a decision was made. I've come across comments like this in code I wrote years ago and being able to quickly review the logic helps tremendously.

vineet · on March 6, 2013

The WHY is definitely important. But, it is also important to include the HOW the code works (as opposed to just WHAT the code does) and HOW to use the component.

I find it useful to think of these four questions as a good checklist when reviewing code. Naming often helps in only one of the above.

Alexandervn · on March 6, 2013

The problem is that the implementation of the 'why' can be scattered over many files, classes, functions, etc. So where to put the comments?

In my projects (usually building websites or webapps) I therefore add a 'readme.md' to the root of the project and document general choices there.

randomdata · on March 6, 2013

I think inline comments are fine for when you are doing something really weird (for, say, performance reasons) that needs explanation, but if you want to document the why of the entire codebase, I feel the test suite is a better place for that.

Not only can you explain the why and demonstrate usage in a logical manner, but you get some consistency checks from your documentation for free.
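One way to picture this, as a rough sketch only: the helper `normalize_username` and both scenarios in the comments are invented for illustration, not taken from any real project. The test names and comments carry the why, and the assertions keep that explanation honest, since a stale "why" shows up as a failing test.

```python
import unittest


def normalize_username(raw: str) -> str:
    """Hypothetical helper: canonical form used for username lookups."""
    return raw.strip().casefold()


class NormalizeUsernameTest(unittest.TestCase):
    def test_lookup_ignores_case_because_legacy_accounts_used_mixed_case(self):
        # Why (assumed scenario): accounts imported from an older system
        # were stored as typed, so "Alice" and "alice" must match.
        self.assertEqual(normalize_username("Alice"), normalize_username("alice"))

    def test_whitespace_is_dropped_because_pasted_logins_carry_it(self):
        # Why (assumed scenario): copy-pasted logins often have stray spaces.
        self.assertEqual(normalize_username(" bob "), "bob")
```

Saved as a module, this runs with `python -m unittest`; the test names then read as a list of the decisions behind the code.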

ZoFreX · on March 6, 2013

I wish more projects would document the more "macro" knowledge about their code - the infrastructure and organisation, for instance, or the metaphors used throughout the code.

timr · on March 6, 2013

> The problem is that the implementation of the 'why' can be scattered over many files, classes, functions, etc. So where to put the comments?

You put them everywhere that might be relevant, along with cross-references to other relevant code/documentation. Humans aren't robots -- redundancy is a good thing, because it helps our feeble minds more quickly reinforce key concepts.

mercurial · on March 6, 2013

The first thing to know about comments is that they're sooner or later going to get out of touch with the code, unless you have some phase where you are going to review them in depth.

Obviously, redundant comments scattered across the code are even worse: out of sight, out of mind.

timr · on March 6, 2013

That's the oldest, tiredest objection to commenting that you could possibly write. It's the first objection raised by every wet-behind-the-ears, new-grad coder. It's a cliche. It's also crap. Yes, unmaintained comments will go out of date. That's why you maintain them, just like you maintain your code. And don't argue that it's too hard -- writing code is harder, and you do that all day long.

Your first job as a professional programmer is making the rest of your team more productive by reducing the communication burden. Your second job is understanding old code. The amount of code you personally write is a distant third priority.

mercurial · on March 6, 2013

Funny you should say that, considering the number of experienced coders who have a different opinion. I'm not arguing for no comments at all (especially regarding non-obvious things), but most of your code should be self-documenting, assuming sufficient domain-level knowledge from the reader. If it's self-documenting, comments are redundant.

timr · on March 6, 2013

"most of your code should be self-documenting, assuming sufficient domain-level knowledge from the reader. If it's self-documenting, comments are redundant."

You're hitting on all the classic lame excuses. Self-documenting code is a myth, invented by lazy coders who would rather crank out code than write documentation, because it's more fun to write code. They're wrong, but they often don't know that they're wrong, because the same people who suck at writing documentation usually suck at reading code.

Clean code is a necessary, but insufficient precondition to understandability. Documentation is essential, because code can only tell you what -- it can't tell you where it should be used, why you should use it, how it should be used, who should use it or when it's appropriate. And in any case, your code is never as clean or self-documenting as you think it is.

That said, I have no doubt that you're hearing these things from "experienced" coders, because our industry is filled with "experienced" people who suck at what they do. Also, these things are cliches for a reason. They get repeated a lot.

mercurial · on March 7, 2013

Actually, that's what we use with a rather large codebase, and it works out pretty well. Different tacks for different people, I suppose.

timr · on March 8, 2013

Not really, no. It's not a matter of opinion.

You can obviously get away with not documenting code, but you don't know what it's costing you. However well things are working out now, I can guarantee that they'd be working better and faster for your team if you spent the time to write good documentation.

mercurial · on March 8, 2013

I'll just leave this here: http://www.martinfowler.com/bliki/CodeAsDocumentation.html

You may think Martin Fowler is one of these "experienced people who suck at what they do", but you're likely to find yourself part of a small minority.

geoka9 · on March 6, 2013

> going to get out of touch with the code, unless you have some phase where you are going to review them in depth

I think treating comments as something that's done in a separate phase is the wrong mindset. Comments should be an integral part of code and therefore change whenever the relevant code changes.

mercurial · on March 6, 2013

Maybe, but it's not that easy. You're generally going to see a 3-lines-of-context diff, and the reviewer won't necessarily have the idea to read the entire file (or worse, go poking elsewhere in the codebase) to see if comments deserve changing.

natefinch · on March 6, 2013

Well, that's an overarching why. But why you're doing this one specific thing on this line, which might look wrong or overly complex or just bizarre... you need to comment those.

  // foo library throws an extra two bytes at the front of the
  // array, even though that isn't up to spec, so we have to
  // strip them out here.
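Roughly what that comment might look like next to the code it explains. This is only a sketch: `read_payload` and `VENDOR_PREFIX_LEN` are made-up names, and "foo library" is the placeholder from the comment above, not a real package.

```python
# Assumption taken from the comment above: foo prepends two stray bytes.
VENDOR_PREFIX_LEN = 2


def read_payload(raw: bytes) -> bytes:
    # Why: the foo library throws an extra two bytes at the front of the
    # array, even though that isn't up to spec, so we strip them here.
    # If foo ever fixes this, the slice below would start eating real data.
    if len(raw) < VENDOR_PREFIX_LEN:
        raise ValueError("payload shorter than the vendor prefix")
    return raw[VENDOR_PREFIX_LEN:]
```

The slice alone says what happens; only the comment tells a future reader that deleting it would reintroduce the bug, and when it becomes safe to remove.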

car54whereareu · on March 6, 2013

I throw away most of my own code, and my comments.