Static analysis refers to analysis of source code outside the context of its execution. For C++, static analysis can identify simple mistakes in your code, letting you catch them before you ship to a customer. Static analysis can be performed during your automated nightly builds, alerting you to problems early. In this post, I’ll discuss some tools and techniques for static analysis of your C++ code.
One of the most annoying things that can happen to your code base over time is creeping duplication. Snippets of code get reused by copy and paste instead of being extracted into a function or method that is called from two places. As the code base gets larger and larger and more and more people are working on the code, identifying the duplication can be difficult. When the code is small, we’ve passed all the source text in front of a single set of eyeballs and we can rely on the pattern matching of our brains to identify the duplicated code. You look at a chunk of code and think: “Didn’t I just see that somewhere else?” Your brain helps you identify repeated phrases that you can extract into functions or methods to eliminate the duplication.
So how do we find duplicated code blocks? We could run a text differencing tool on the code, but this will only find blocks of code that are exactly duplicated in multiple places. What if the code block is the result of a copy and paste, but has had its variables renamed, comments changed or removed, and so on? What we need is a tool that looks for similar code constructs while ignoring the parts that don’t matter: whitespace, bracing style, comments, arbitrarily renamed variables.
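To make that concrete, here is a small sketch of the kind of near-duplicate I have in mind. The function names are made up for illustration. The first two functions differ in bracing style, comments, and variable names, so a text diff sees different lines, while the third shows the extraction that removes the duplication. Whether a particular tool reports this pair depends on how it is configured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: two functions pasted from the same original.
double averageScore(const std::vector<double>& scores)
{
    double total = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i)
    {
        total += scores[i];
    }
    return scores.empty() ? 0.0 : total / static_cast<double>(scores.size());
}

double meanTemperature(const std::vector<double>& temps) {
    double sum = 0.0; // running sum of readings
    for (std::size_t n = 0; n < temps.size(); ++n) {
        sum += temps[n];
    }
    return temps.empty() ? 0.0 : sum / static_cast<double>(temps.size());
}

// After extraction: one implementation that both call sites can share.
double mean(const std::vector<double>& values)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        sum += values[i];
    }
    return values.empty() ? 0.0 : sum / static_cast<double>(values.size());
}
```

Once `mean` exists, the two original functions can either call it or disappear entirely.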
Fortunately, there is such a tool available to you: it’s called Simian, short for Similarity Analyzer. Simian is free for use on non-commercial projects and is available at a reasonable licensing fee for commercial projects. Simian can work with a variety of languages and works by comparing code structures and not just the text. You can generate XML output from Simian for processing of the results into reports or trend tracking.
Another thing that can creep up on you as you extend and modify a code base is complexity. The usual measure is cyclomatic complexity, which counts the independent paths through a function or method. Ideally, no function or method exceeds some agreed ceiling. A common rule of thumb among agile practitioners is to examine any method or function whose cyclomatic complexity exceeds 30. There have been teams that have instituted check-in policies that will reject check-ins whose complexity exceeds the agreed-upon ceiling.
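As a rough sketch of how the number comes about, cyclomatic complexity can be counted as the number of decision points in a function plus one. The function below is a made-up example with each decision point marked. Tools differ in the details, such as whether they count the conditional operator or short-circuit operators, so treat the total as illustrative rather than as what any specific tool will report.

```cpp
#include <string>

// Hypothetical function, used only to illustrate the count.
std::string classify(int value)
{
    if (value < 0)                          // decision 1
        return "negative";
    if (value == 0)                         // decision 2
        return "zero";
    for (int d = 2; d <= value / d; ++d)    // decision 3
    {
        if (value % d == 0)                 // decision 4
            return "composite";
    }
    return value == 1 ? "one" : "prime";    // decision 5
}
// Five decision points plus one gives a complexity of 6 by this counting rule.
```

A function this size is nowhere near a ceiling of 30, which gives some feel for how tangled a function has to be before it trips the rule.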
As I discussed in my post on agile code reviews, SourceMonitor is a tool for measuring cyclomatic complexity on a code base. It can be driven from the command line and produce complexity reports that point you to areas of your code base that have gotten too complex. SourceMonitor can produce XML output for easy manipulation of the results.
OK, duplication and complexity are great, but those are more in the “nice to have” category for static analysis. How do I find the errors in my code? Finding the errors is a “must have”. What about identifying memory allocated in a constructor that isn’t freed in a destructor? What about checking that the types of the arguments to printf match the format specifiers? These are all examples of coding errors that can be identified by static analysis of the source code.
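Here is a minimal sketch of both situations, written in their correct form with the defect described in the comments. The class and function names are invented for this example, and it assumes a C++11 compiler for the deleted copy operations.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical class for illustration. Removing the destructor below would
// leave the constructor's allocation unreleased: memory allocated in a
// constructor that isn't freed in a destructor.
class NameHolder
{
public:
    explicit NameHolder(const char* name)
        : m_name(new char[std::strlen(name) + 1])
    {
        std::strcpy(m_name, name);
    }
    ~NameHolder() { delete[] m_name; }

    // Copying would lead to a double delete, so it is disabled here.
    NameHolder(const NameHolder&) = delete;
    NameHolder& operator=(const NameHolder&) = delete;

    const char* name() const { return m_name; }

private:
    char* m_name;
};

// The format specifiers must match the argument types: %s for the
// const char* and %d for the int. Swapping them would be a mismatch.
int formatEntry(char* out, std::size_t size, const NameHolder& holder, int count)
{
    return std::snprintf(out, size, "%s: %d", holder.name(), count);
}
```

Neither mistake is caught by the language itself: the leaky class and the mismatched format string both compile, which is why a separate analysis pass earns its keep.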
Lint is a static analysis tool for C that’s been around almost as long as C itself. Commercial versions of lint are still available, such as PC-lint. There are two free, open-source tools available for static analysis of C++ source code: cppcheck on SourceForge and cppclean on Google Code.
Cppclean feels much more like an experiment than a useful tool. When I tried to run it on some C++ source code I had lying around for fractint, its parsing mechanism reported many internal exceptions and failed to analyze those files. For the files it could parse, it didn’t report many useful results, and the format of the results doesn’t lend itself to generating reports or subsequent processing.
Cppcheck was able to process my source code without difficulty (a good sign; the fractint source code is quite grungy). One thing I like about cppcheck is that it processes your source file for each variant of #if clauses that are present in the file. So if you have a source file that is #ifdef’ed one way for Windows and another way for Mac and another way for Linux, cppcheck will process the file three times looking for inconsistencies in each combination. If you have too many combinations to analyze, cppcheck will give up and continue with the next source file.
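A small sketch of why this matters: in the function below, your compiler only ever sees one of the two branches on any given platform, so a mistake in the other branch goes unnoticed until someone builds there. The macro name MYAPP_WINDOWS is an assumption made up for this example, not something cppcheck or any platform defines.

```cpp
#include <string>

// Hypothetical helper with one branch per configuration.
std::string joinPath(const std::string& dir, const std::string& file)
{
#if defined(MYAPP_WINDOWS)      // assumed project-specific macro
    const char separator = '\\';
#else
    const char separator = '/';
#endif
    return dir + separator + file;
}
```

A tool that walks each configuration examines both definitions of `separator`, not just the one your current build happens to select.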
Cppcheck has a goal of no false positives, meaning that if it reports a problem in your source files, then it’s found an error. Once your code checks out clean, you can set up your nightly build to fail if cppcheck reports finding any problems. When combined with a continuous integration system, this can be a rapid way to identify problems creeping into your codebase. Cppcheck can generate output in XML form for easy manipulation.
For a full list of checks performed by cppcheck, consult the cppcheck main page. It can report the following kinds of problems in your code:
- Misuse of automatic variables
- Bounds checking
- Problems with classes (virtual methods without virtual destructors, etc.)
- Use of deprecated functions
- Memory leaks in functions and classes
- Undefined behavior (unsigned division, divide by zero, etc.)
- STL usage (dereferencing an erased iterator, etc.)
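To illustrate the last item in that list, here is a sketch of the erased-iterator mistake and its usual fix. The function name is made up for this example. The defective loop appears only in the comment, while the code itself uses the iterator returned by `erase()`.

```cpp
#include <vector>

// Remove every even value from the vector.
// The defective form, shown here only as a comment, keeps using 'it'
// after erase() has invalidated it:
//     for (it = values.begin(); it != values.end(); ++it)
//         if (*it % 2 == 0) values.erase(it);
// The version below continues from the iterator that erase() returns.
void removeEvens(std::vector<int>& values)
{
    for (std::vector<int>::iterator it = values.begin(); it != values.end(); )
    {
        if (*it % 2 == 0)
            it = values.erase(it);
        else
            ++it;
    }
}
```

The broken version often appears to work in a quick test, which is exactly why it is worth having a tool look for it.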
Cranking It Up
Last but not least, don’t forget that your C++ compiler can do quite a bit of static analysis for you as well. The best way to get the most out of your compiler is to crank up the warning level. On Visual C++, the default warning level is 3, so you can get more analysis from the compiler by setting the warning level to 4. For gcc, see the section in the manual on warning options. Generally compilers don’t produce XML output of their warning messages, but many times the output of the compiler is integrated into the IDE to allow you to quickly jump to the location of the warning. If you get your code compiling cleanly with no warnings at the maximum warning level, you can change your compiler to treat warnings as errors, quickly alerting you to any new warnings creeping into your codebase.
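As one small example of what a higher warning level buys you, consider a parameter that a function never uses. With gcc this is reported by -Wunused-parameter, which is enabled by -Wextra when used together with -Wall, and Visual C++ has a comparable warning at level 4. The function below is a hypothetical callback, written so that it stays quiet at those levels.

```cpp
// Hypothetical callback for illustration. The second parameter is required
// by the (assumed) callback signature but is not needed here. Leaving it
// unnamed documents that intent and avoids an unused-parameter warning.
int onResize(int width, int /*height*/)
{
    return width * 2;
}
```

If the parameter were named and then ignored, the warning would prompt you to decide whether that is deliberate or a forgotten piece of logic.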
All of these static analysis tools allow you to perform two tasks on your code base: lock in forward progress and identify new problems as soon as they occur. You can lock in forward progress by integrating static analysis into your nightly builds and reporting trends, alerting you when the trend indicates a decrease in quality (more duplication, excessive complexity, more warnings). You can identify new problems as soon as they occur by integrating static analysis into your continuous integration builds as well as your nightly builds. Even without continuous integration builds, a nightly static analysis report can alert you today to problems introduced yesterday. The most aggressive approach is to integrate a level of static analysis into your source code control system, rejecting commits that don’t satisfy the constraints of static analysis.