C++ Static Analysis

Static analysis refers to analysis of source code outside the context of its execution. For C++, static analysis can identify simple mistakes in your code so that you can catch them before you ship to a customer. Static analysis can be performed during your automated nightly builds, alerting you to problems early. In this post, I’ll discuss some tools and techniques for static analysis of your C++ code.

Duplication

One of the most annoying things that can happen to your code base over time is creeping duplication. Snippets of code get reused by copy and paste instead of being extracted into a function or method that is called from two places. As the code base gets larger and larger and more and more people are working on the code, identifying the duplication can be difficult. When the code is small, we’ve passed all the source text in front of a single set of eyeballs and we can rely on the pattern matching of our brains to identify the duplicated code. You look at a chunk of code and think: “Didn’t I just see that somewhere else?” Your brain helps you identify repeated phrases that you can extract into functions or methods to eliminate the duplication.

So how do we find duplicated code blocks? We could run a text differencing tool on the code, but this will only find blocks of code that are exactly duplicated in multiple places. What if the code block is the result of a copy and paste, but has had its variables renamed, comments changed or removed, and so on? What we need is a tool that looks for similar code constructs while ignoring the parts that don’t matter: whitespace, bracing style, comments, arbitrarily renamed variables.
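To make the idea of "ignoring the parts that don’t matter" concrete, here is a minimal sketch that reduces each line to a crude shape before comparing. It is an illustration only and not a description of how any particular tool works: the function names are hypothetical, it compares single lines rather than blocks, it treats keywords like identifiers, and it does not understand block comments or string literals.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Reduce one line of source to a crude "shape": drop any // comment and all
// whitespace, and replace every identifier or keyword with the placeholder '$'.
std::string normalizeLine(const std::string& line)
{
    const std::string code = line.substr(0, line.find("//"));
    std::string shape;
    std::size_t i = 0;
    while (i < code.size()) {
        const unsigned char c = static_cast<unsigned char>(code[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c) || c == '_') {
            while (i < code.size()
                   && (std::isalnum(static_cast<unsigned char>(code[i])) || code[i] == '_')) {
                ++i;
            }
            shape += '$';
        } else {
            shape += code[i];
            ++i;
        }
    }
    return shape;
}

// Map each shape to the 1-based numbers of the lines that share it, keeping
// only shapes that occur more than once.
std::map<std::string, std::vector<int> > findDuplicateLines(const std::vector<std::string>& lines)
{
    std::map<std::string, std::vector<int> > byShape;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::string shape = normalizeLine(lines[i]);
        if (shape.size() > 3) {           // ignore blank lines, lone braces, and the like
            byShape[shape].push_back(static_cast<int>(i) + 1);
        }
    }
    for (auto it = byShape.begin(); it != byShape.end(); ) {
        it = (it->second.size() < 2) ? byShape.erase(it) : std::next(it);
    }
    return byShape;
}

// Usage: a renamed, reformatted copy reduces to the same shape as the original.
void demoNormalizeLine()
{
    assert(normalizeLine("total = total + price;  // add item")
           == normalizeLine("sum=sum+cost;"));
}
```

A real similarity analyzer has to compare multi-line blocks and cope with the full syntax of the language, which is why a dedicated tool is worth using instead of a script like this.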

Fortunately there is such a tool available to you: it’s called Simian, short for Similarity Analyzer. Simian is free for use on non-commercial projects and is available at a reasonable licensing fee for commercial projects. Simian can work with a variety of languages and works by comparing code structures and not just the text. You can generate XML output from Simian for processing of the results into reports or trend tracking.

Complexity

Another thing that can creep up on you as you extend and modify a code base is complexity. Ideally you’d like the complexity of functions and methods to stay below some ceiling. A common rule of thumb among agile practitioners is to examine any method or function whose cyclomatic complexity exceeds 10. Some teams have instituted checkin policies that reject checkins whose complexity exceeds the agreed-upon ceiling.
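For readers who haven’t met the metric before: cyclomatic complexity is roughly one plus the number of decision points in a function. The sketch below estimates it by scanning the text of a function body for branching keywords and operators. This is a simplifying assumption of mine, not how any measuring tool is implemented; the function name is hypothetical, and comments and string literals are not skipped.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Rough estimate of cyclomatic complexity for the text of one function body:
// 1 + the number of branching keywords and short-circuit/conditional operators.
int estimateComplexity(const std::string& body)
{
    int complexity = 1;
    std::size_t i = 0;
    while (i < body.size()) {
        const unsigned char c = static_cast<unsigned char>(body[i]);
        if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < body.size()
                   && (std::isalnum(static_cast<unsigned char>(body[i])) || body[i] == '_')) {
                ++i;
            }
            const std::string word = body.substr(start, i - start);
            if (word == "if" || word == "for" || word == "while"
                || word == "case" || word == "catch") {
                ++complexity;
            }
        } else if (body.compare(i, 2, "&&") == 0 || body.compare(i, 2, "||") == 0) {
            ++complexity;
            i += 2;
        } else {
            if (c == '?') {
                ++complexity;
            }
            ++i;
        }
    }
    return complexity;
}

// Usage: straight-line code has complexity 1; each branch adds one.
void demoEstimateComplexity()
{
    assert(estimateComplexity("return 0;") == 1);
    assert(estimateComplexity("if (a) { return 1; } return 0;") == 2);
}
```

The point of the number is the trend: a function whose count keeps climbing is a candidate for being split up.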

As I discussed in my post on agile code reviews, SourceMonitor is a tool for measuring cyclomatic complexity on a code base. It can be driven from the command line and produce complexity reports that you can use to guide you to areas of your code base that have gotten too complex. SourceMonitor can produce XML output for easy manipulation of the results.

Coding Errors

OK, duplication and complexity are great, but those are more in the “nice to have” category for static analysis. How do I find the errors in my code? Finding the errors is a “must have”. What about identifying memory allocated in a constructor that isn’t freed in a destructor? What about checking that the types of the arguments to printf match the format specifiers? These are all examples of coding errors that can be identified by static analysis of the source code.
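The two defects just mentioned look like this in practice. The class and function names are made up for illustration. The leaking class compiles cleanly, which is exactly why you want a tool to point it out; the mismatched format string is left in a comment because executing it is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Defect: memory acquired in the constructor is never released.
class LeakyBuffer
{
public:
    explicit LeakyBuffer(std::size_t size) : data_(new char[size]) {}
    ~LeakyBuffer() {}                         // missing: delete[] data_;
    LeakyBuffer(const LeakyBuffer&) = delete;
    LeakyBuffer& operator=(const LeakyBuffer&) = delete;
    char* data() { return data_; }
private:
    char* data_;
};

// Fixed: the destructor releases what the constructor acquired.
class Buffer
{
public:
    explicit Buffer(std::size_t size) : data_(new char[size]), size_(size) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    char* data() { return data_; }
    std::size_t size() const { return size_; }
private:
    char* data_;
    std::size_t size_;
};

// Writes "<count> items" into out and returns the number of characters that
// the full text requires (the snprintf return value).
int formatCount(char* out, std::size_t outSize, int count)
{
    assert(out != nullptr && outSize > 0);
    // Defect, shown only as a comment: %s expects a char*, but count is an int.
    //     return std::snprintf(out, outSize, "%s items", count);
    return std::snprintf(out, outSize, "%d items", count);
}
```

Neither mistake is exotic; both slip easily past a human reviewer, and both can be found by examining the source alone.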

Lint is a static analysis tool for C that’s been around almost as long as C itself. Commercial versions of lint are still available, such as PC-lint. There are two free, open-source tools available for static analysis of C++ source code: cppcheck on SourceForge and cppclean on Google Code.

Cppclean feels much more like an experiment than a useful tool. When I tried to run it on some C++ source code I had lying around for fractint, its parsing mechanism reported many internal exceptions and failed to analyze those files. For the files it could parse, it didn’t report many useful results, and the format of the results isn’t well suited to generating reports or subsequent processing.

Cppcheck was able to process my source code without difficulty (a good sign; the fractint source code is quite grungy). One thing I like about cppcheck is that it processes your source file for each variant of #if clauses that are present in the file. So if you have a source file that is #ifdef’ed one way for Windows and another way for Mac and another way for Linux, cppcheck will process the file three times looking for inconsistencies in each combination. If you have too many combinations to analyze, cppcheck will give up and continue with the next source file.
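Here is a small, contrived example of why checking every preprocessor configuration matters. The helper is hypothetical, and I’m not claiming any specific tool version reports this exact leak. The point is that the defect lives only in the Windows branch, so a build or review done on Linux or Mac never even compiles the offending line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: reports whether path ends in a directory separator.
// It works on a heap copy purely so that there is something to leak.
bool endsWithSeparator(const char* path)
{
    assert(path != nullptr);
    const std::size_t length = std::strlen(path);
    char* copy = static_cast<char*>(std::malloc(length + 1));
    if (copy == nullptr) {
        return false;
    }
    std::memcpy(copy, path, length + 1);

#ifdef _WIN32
    // Windows-only branch: accept both kinds of separator.
    if (length > 0 && (copy[length - 1] == '\\' || copy[length - 1] == '/')) {
        return true;                  // defect: copy is never freed on this path
    }
    std::free(copy);
    return false;
#else
    // Every other platform: this branch frees copy on all paths.
    const bool result = length > 0 && copy[length - 1] == '/';
    std::free(copy);
    return result;
#endif
}
```

An analyzer that looks at the file once with _WIN32 defined and once without sees both branches, whichever platform the analysis itself runs on.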

Cppcheck has a goal of no false positives, meaning that if it reports a problem in your source files, then it’s found an error. If your code checks out clean, then you can set up your nightly build to fail if cppcheck reports finding any problems. When combined with a continuous integration system, this can be a rapid way to identify problems creeping into your codebase. Cppcheck can generate output in XML form for easy manipulation.

For a full list of checks performed by cppcheck, consult the cppcheck main page. It can report the following kinds of problems in your code:

  • Misuse of automatic variables
  • Bounds checking
  • Problems with classes (virtual methods without virtual destructors, etc.)
  • Use of deprecated functions
  • Memory leaks in functions and classes
  • Undefined behavior (unsigned division, divide by zero, etc.)
  • STL usage (dereferencing an erased iterator, etc.)
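As an example of the STL category in the list above, here is the classic erased-iterator mistake next to the usual fix. The function name is mine; the broken loop is shown only as a comment because running it is undefined behavior.

```cpp
#include <cassert>
#include <vector>

// Defect, shown only as a comment: erase() invalidates it, and the loop then
// increments and dereferences the invalid iterator.
//
//     for (std::vector<int>::iterator it = values.begin(); it != values.end(); ++it) {
//         if (*it % 2 == 0) {
//             values.erase(it);
//         }
//     }

// Fixed: continue from the iterator that erase() returns, and only increment
// when nothing was erased.
void removeEvens(std::vector<int>& values)
{
    for (std::vector<int>::iterator it = values.begin(); it != values.end(); ) {
        if (*it % 2 == 0) {
            it = values.erase(it);
        } else {
            ++it;
        }
    }
}

// Usage: consecutive even values are handled because erase() hands back the
// element that slid into the erased position.
void demoRemoveEvens()
{
    std::vector<int> values = {1, 2, 4, 5};
    removeEvens(values);
    assert(values.size() == 2 && values[0] == 1 && values[1] == 5);
}
```

The broken version often appears to work in testing, which is what makes it a good candidate for automated detection.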

Cranking It Up

Last but not least, don’t forget that your C++ compiler can do quite a bit of static analysis for you as well. The best way to get the most out of your compiler is to crank up the warning level. On Visual C++, the default warning level is 3, so you can get more analysis from the compiler by setting the warning level to 4. For gcc, see the section in the manual on warning options. Generally compilers don’t produce XML output of their warning messages, but many times the output of the compiler is integrated into the IDE to allow you to quickly jump to the location of the warning. If you get your code compiling cleanly with no warnings at the maximum warning level, you can change your compiler to treat warnings as errors, quickly alerting you to any new warnings creeping into your codebase.
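To give a feel for what the higher warning levels buy you, here is a small function after it has been cleaned up, with the warnings it used to trigger noted in the comments. The function is a made-up example, and the flag combinations in the comments are common choices on my part, not the only reasonable ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Typical "cranked up" command lines:
//     g++ -Wall -Wextra -Werror file.cpp
//     cl /W4 /WX file.cpp
//
// Before the cleanup the signature was
//     int sumValues(const std::vector<int>& values, int scale)
// and scale was never used. Visual C++ reports that as C4100 at /W4, and gcc
// reports it through -Wunused-parameter, which -Wextra turns on.
int sumValues(const std::vector<int>& values)
{
    int total = 0;
    // Indexing with std::size_t instead of int avoids a signed/unsigned
    // comparison warning against values.size().
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Usage example.
void demoSumValues()
{
    std::vector<int> values = {1, 2, 3};
    assert(sumValues(values) == 6);
}
```

Neither warning indicates a crash waiting to happen, but an unused parameter is often the visible trace of a half-finished change, and that is the kind of thing you want surfaced.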

Summary

All of these static analysis tools allow you to perform two tasks on your code base: lock in forward progress and identify new problems as soon as they occur. You can lock in forward progress by integrating static analysis into your nightly builds and reporting trends, alerting you when the trend indicates a decrease in quality (more duplication, excessive complexity, more warnings). You can identify new problems as soon as they occur by integrating static analysis into your continuous integration builds as well as your nightly builds. Even without continuous integration builds, a nightly static analysis report can alert you today to problems introduced yesterday. The most aggressive approach is to integrate a level of static analysis into your source code control system, rejecting commits that don’t satisfy the constraints of static analysis.

13 Responses to “C++ Static Analysis”

  1. Romain Says:

    Just to add that with a compiler like Visual C++, you can use the /analyze option which will perform analysis on buffer overflows, etc. This is actually using the Microsoft Prefast tool.
    Another tool, free, is Rose and especially the Compass plugin: http://rosecompiler.org/
    which is a tool for C++ by C++ guys, one of the most advanced C++ understanding I’ve seen across static analysis tools…

  2. legalize Says:

    I wanted to focus on tools that are freely available in this article. The /analyze switch for Visual C++ is only available in the enterprise editions of Visual Studio (i.e. Team Suite Edition). Still, it’s worth a mention, so thanks for reminding me of its existence.

  3. Jörgen Grahn Says:

    Good to see the most underused analysis tool mentioned — the compiler. A bit disappointing that there was no mention of the super-expensive tools, like that French one.
    Also, what’s up with that “export to XML” feature? I’m a Unix guy — plain text output I can handle (by eyeballing and trivial Perl scripting), but XML is both hard to parse and read.

  4. legalize Says:

    If output is available in XML, then you can use a large variety of XML processing tools to manipulate it, including XSLT to easily convert the XML to HTML for reporting. There are plenty of tools for unix that manipulate XML as well. If a tool offers plaintext output only, then I have to write custom scripts to manipulate that tool’s output. With XML I can use standard tools.

  5. Emil Dotchevski Says:

    Not saying that copy&paste is a good thing but reusing code results in coupling, and coupling is bad too. Sometimes it makes sense to duplicate a piece of code from a library in order to avoid having to require that library. For example, usually I can’t justify including the appropriate header from Boost Typetraits instead of spelling out remove_const:

    template <class T> struct remove_const { typedef T type; };
    template <class T> struct remove_const<T const> { typedef T type; };

    • legalize Says:

      Like most things in code, it’s a judgment call. Simian would report that duplication, but I don’t know any shops that are so slavishly eliminating duplication that they’d justify the replacement of that one line with the one line #include to bring in the boost header for it. When I use Simian to look for duplication, it’s because I’ll have huge swaths of duplication, not just single lines :-)

  6. legalize Says:

    Recently the topic of static analysis came up on the DirectX developer’s mailing list. If you’re using Visual Studio, take a look at the documentation for the warnings that are off by default. In particular, Visual Studio’s C++ compiler will warn you about classes containing virtual methods that have no virtual destructor, but this warning is off by default.

  7. legalize Says:

    On the Visual Studio Team Blog, they have announced that static code analysis will be supported in all editions of Visual Studio 11, including express editions. Also, code analysis has broader coverage and if you use the SAL annotations on your headers, you can leverage the analysis further.

    • Alex Z Says:

      The problem is (as I understand it) that different editions will include different rule sets, and the Express edition will include only a very basic one. Please correct me if I’m wrong.

      • legalize Says:

        I believe they are correcting this problem in Visual Studio 11

        • Alex Z Says:

          I doubt it. Currently only the ‘big’ enterprise editions have the /analyze feature, but for VC11 they announced universal availability. However, as I understood it, the set of rules will be different.
          We’ll see very soon, when the final release is available.

          • legalize Says:

            The static analysis feature will be present in all editions. I’m not sure about the rule sets, I think we will have to wait and see, but I believe their intention is to not only make the feature available to all editions, but also increase the rule sets available to all editions. That may not mean that all editions have all the rules, but they will have more rules available than currently, which is good.
