Have you ever been tasked with doing classic “Fagan” style code reviews? They’re boring. They’re tedious. They take up lots of time and yield few, if any, results. At a previous employer, we did these kinds of code reviews. The process consisted of printing out the source code to be reviewed, one copy per reviewer. Each of us reviewed the code alone, making notes on the printout. Then we’d gather in a conference room and discuss our findings while one person combined all our notes into a list of items to rework. That person would rework the code alone and then report the results back to the group. Whew! This is what the agile development world would call a “high ceremony process”: lots of formal steps and the associated paperwork. Is there a better way? In this post, I’ll discuss what we do at my current employer that makes code reviews a fun team activity.
First, we banish the paperwork! Instead of printouts we use a projector and a laptop with a development IDE running (or a remote desktop session to a development machine with the IDE running). We have quite a bit of legacy code on our hands, so we don’t have the luxury of reviewing everything in our code base. Since we have so much code, where do we focus our attention?
The above diagram visualizes the idea that the hot spots in the code are the areas where high complexity, high bug counts, and high revision counts converge. This fits the intuition that the problem areas of your code are complex, keep getting bugs reported against them, and need to be edited again and again.
That’s great, but how do we measure this stuff so that we can find it? First, let’s start with complexity. SourceMonitor is a great tool for measuring cyclomatic complexity and other metrics. It’s free, and I highly recommend it. You can even automate the gathering of metrics in your nightly build, accumulating a running dataset of how your code complexity changes over time. If you’ve got really good code, you can even fail the build when complexity exceeds a certain threshold.
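For intuition, cyclomatic complexity is essentially one plus the number of independent decision points in a routine. This little sketch is my own illustration of that definition, not SourceMonitor’s algorithm; a real tool parses the code properly instead of pattern-matching keywords:

```javascript
// Crude cyclomatic complexity estimate: 1 + number of decision points.
// Regex matching is only good enough to illustrate the metric; a real
// analyzer like SourceMonitor builds a proper parse of the code.
function estimateComplexity(source) {
  const decisionPoints =
    source.match(/\b(if|while|for|case|catch)\b|&&|\|\||\?/g) || [];
  return 1 + decisionPoints.length;
}

const sample = `
function clamp(x, lo, hi) {
  if (x < lo) return lo;
  if (x > hi) return hi;
  return x;
}`;
console.log(estimateComplexity(sample)); // 3: one path plus two if statements
```

Every `if`, loop, `case`, or boolean operator adds a path through the routine, which is why long, branchy methods score so high.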
For bug and revision counts, you can adopt either a metrics-based approach or an informal one. On my team, we’ve adopted an informal approach: we review the highest-complexity methods and classes in our code base and leverage the collective memory of the team to identify which code we’ve changed often in response to bugs.
So what would the metrics-based approach look like? For it to work, you need some sort of association between bug reports and code. Some bug databases require that you enter the files that changed in order to correct the bug; if you have that information, you can mine it to find which files are associated with the most bugs. If your version control commit messages follow a format that includes markers indicating the revision was done in response to a bug fix, then you can mine your commit messages for information associating files with bug counts. For revision counts, you can mine your source code control system to identify the files that change most often. Even if these files are not changing often in response to bug reports, the fact that they change often might indicate that the code needs a review.
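As a sketch of the commit-message mining, suppose your messages carry a marker like “BUG-1234” and each log entry lists the files it touched. Both the marker format and the log shape here are made up; adapt the parsing to your own conventions:

```javascript
// Sketch: count bug-fix commits per file, assuming a hypothetical
// "BUG-nnnn" marker convention in commit messages.
function bugCountsPerFile(logEntries) {
  const counts = new Map();
  for (const { message, files } of logEntries) {
    if (!/\bBUG-\d+\b/.test(message)) continue; // only bug-fix commits
    for (const file of files) {
      counts.set(file, (counts.get(file) || 0) + 1);
    }
  }
  // Sort so the most bug-prone files come first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const log = [
  { message: 'BUG-101: fix crash in parser', files: ['parser.cpp'] },
  { message: 'Add feature X', files: ['feature.cpp', 'parser.cpp'] },
  { message: 'BUG-102: off-by-one in parser', files: ['parser.cpp'] },
];
console.log(bugCountsPerFile(log)); // [ [ 'parser.cpp', 2 ] ]
```

The files at the top of that list are your bug-count axis of the hot-spot diagram.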
For revision counts, some version control systems, such as CVS, track the number of revisions to each file. Subversion attaches revisions to a single snapshot of the entire repository, so it doesn’t track revisions per file; it does, however, record which files were modified in each revision, so a simple script can accumulate the revisions per file. RevisionCounts.js is a simple JScript file for Windows that processes the output of svn log -v on its standard input and prints a tabulated list of the revision counts, with the most often revised files listed first.
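The core of such a script is small. Here’s a hypothetical Node.js sketch of the same tally (not the actual RevisionCounts.js): in the verbose log, changed-path lines look like “&nbsp;&nbsp;&nbsp;M /trunk/src/foo.cpp”, so we just count how many times each path appears:

```javascript
// Tally how many revisions touched each file in "svn log -v" output.
// Changed-path lines are indented three spaces and start with an
// action letter (A, M, D, or R) followed by the repository path.
function tallyRevisions(svnLogText) {
  const counts = new Map();
  for (const line of svnLogText.split('\n')) {
    const m = line.match(/^ {3}[AMDR] (\/\S+)/);
    if (m) counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  // Most frequently revised files first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sampleLog = [
  'r2 | alice | 2009-06-01 | 1 line',
  'Changed paths:',
  '   M /trunk/foo.cpp',
  '   A /trunk/bar.cpp',
  '',
  'r1 | alice | 2009-05-30 | 1 line',
  'Changed paths:',
  '   A /trunk/foo.cpp',
].join('\n');
console.log(tallyRevisions(sampleLog));
// [ [ '/trunk/foo.cpp', 2 ], [ '/trunk/bar.cpp', 1 ] ]
```

In practice you’d pipe the real log in, e.g. `svn log -v | node revisionCounts.js`, with stdin handling added around the tally.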
Once we’re armed with this information, we can identify the hot spots. So what do we do about them? We treat the code review as a team refactoring session instead of a formal process with paperwork and log sheets. We use Visual Assist X, an automated refactoring tool, to refactor our code. Because the refactorings are applied automatically, there is a reduced likelihood that we will introduce defects as we change the code.
When we have long methods that are hard to understand, we apply the Compose Method refactoring to break them down into smaller methods.
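Here’s a toy before-and-after of Compose Method (the order routine is entirely hypothetical): the long body becomes a short sequence of same-level, intention-revealing steps.

```javascript
// Before: one method mixes validation, arithmetic, and formatting.
function orderSummaryLong(order) {
  if (order.items.length === 0) throw new Error('empty order');
  let total = 0;
  for (const item of order.items) total += item.price * item.qty;
  return `${order.id}: ${order.items.length} items, $${total}`;
}

// After Compose Method: each step gets a name, and the top-level
// method reads like an outline of the work.
function validate(order) {
  if (order.items.length === 0) throw new Error('empty order');
}
function computeTotal(order) {
  return order.items.reduce((sum, i) => sum + i.price * i.qty, 0);
}
function orderSummary(order) {
  validate(order);
  return `${order.id}: ${order.items.length} items, $${computeTotal(order)}`;
}

const order = { id: 'A1', items: [{ price: 5, qty: 2 }, { price: 3, qty: 1 }] };
console.log(orderSummary(order)); // "A1: 2 items, $13"
```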
When we have names that don’t reveal their semantic intention, we apply the Rename Method refactoring to the offending names.
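A minimal Rename Method example, with a made-up customer class; the point of the tool support is that every call site is updated automatically along with the declaration.

```javascript
// Rename Method, sketched: the new name states the question the
// method answers instead of hiding it behind an abbreviation.
class Customer {
  constructor(balance) {
    this.balance = balance;
  }
  // Before the rename this was chkBal(); after, the intent is plain.
  hasOutstandingBalance() {
    return this.balance > 0;
  }
}

console.log(new Customer(42).hasOutstandingBalance()); // true
```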
When we have lots of code duplication, we apply the Extract Method refactoring to eliminate the duplication.
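And a small Extract Method illustration (hypothetical names again): two routines had copy-pasted the same formatting logic, and pulling it into one named method removes the duplication.

```javascript
// Extract Method, sketched: the money-formatting code was formerly
// duplicated in both callers below; now it lives in one place.
function formatMoney(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}
function invoiceLine(item) {
  return `${item.name}: ${formatMoney(item.cents)}`;
}
function receiptTotal(items) {
  const total = items.reduce((sum, i) => sum + i.cents, 0);
  return `Total: ${formatMoney(total)}`;
}

console.log(invoiceLine({ name: 'Widget', cents: 1250 })); // "Widget: $12.50"
```

Once the duplication lives in one method, a future fix (say, locale-aware formatting) happens in one place instead of several.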
What makes this a team activity is that we discuss the code as a group while one person drives the code changes in the IDE. We discuss what would be a good name when we change names. We discuss how best to eliminate code duplication when we extract methods. We discuss the best way to compose a long method into a series of steps. Because we have a group of eyeballs looking at the code as it’s being changed, it’s somewhat like a super-charged pair programming experience. Little typos are caught right away, and any error errantly introduced into the logic is caught by a member of the team.
Naturally, it would all be more comforting if we had a suite of unit tests or regression tests covering the code as we change it. So far, we have restricted ourselves to small refactorings that can be performed in an automated way by tools like Visual Assist X to minimize the danger of the changes. Over time, as we take care of the worst offending long and complex methods, there will be less that we can do in an automated way without the safety of tests. When that time approaches, we’ll have to spend some time fitting tests onto the code we are changing before we can safely make the change. What’s really interesting is that these team refactoring exercises have made the team more comfortable with refactoring on real code, giving them confidence in when and how to apply the refactorings to the new code they’re writing on a daily basis. Best of all, it has turned what used to be a chore (“code cleanup”) into a fun activity that we perform together. Along the way, we all get a little more familiar with the code and with techniques for taming unruly code.