Adding Static Analysis to Your C++ GitHub Repository

Static analysis can be extremely useful for monitoring the quality of your code base. These tools analyze your source code and check for certain kinds of mistakes that can be detected purely based on how the code is written. In this post, I’ll show you how you can add two free static analysis tools to a free continuous integration build for your C++ github repository.

Free Continuous Integration Builds

To get free continuous integration build on your github repository, you can use Travis. This is a free service that you enable for your repository. It uses the github API to perform a build every time you do a push to your repository on any branch, or whenever someone issues a pull request against your repository. Currently Travis supports Linux builds and MacOS builds on a case by case basis. Longer term they are working on a solution for Windows builds.

The basic mechanism that drives Travis is a YAML file named .travis.yml at the top level of your repository.

Free C++ Static Analysis Tools

The general advice for static analysis tools is to run as many tools as you can. Each tool will have its own strengths and weaknesses and will reveal different problems with your code. Since the Travis build currently only supports Linux, my choice of tools was limited to those I could run in this environment. I was already familiar with running cppcheck and the clang static analyzer in a Linux environment, so I chose those two.

cppcheck

Cppcheck is a free static analysis tool for C++ that works by matching source file tokens against a pattern for each analysis rule. It works fairly well and has the ability to explore all possible permutations of #define symbols controlling compilation of your code. Individual false positives can be suppressed by command-line argument, by a configuration file, or by annotating source code with suppression comments. I prefer the suppression comment since it will move with the code as lines are added or removed before the location of the false positive and it serves to document an analysis assertion about the associated code.

clang Static Analyzer

The clang static analyzer is a free static analysis tool that uses the clang compiler to parse C++ source files into a data structure which is then analyzed for potential problems. This has the advantage that the analysis proceeds from a properly parsed interpretation of the source file, but has the disadvantage that it can’t analyze sections of code that are disabled through use of the preprocessor.

Setting up Travis for Static Analysis

To enable static analysis for my Iterated Dynamics repository, I wanted to run both cppcheck and the clang static analyzer. I already had an existing build matrix for compiling the code with gcc and with clang using this snippet of YAML for Travis:

compiler:
  - gcc
  - clang

These tools don’t require access to compiled code in order to do their job, so I can run them in parallel with the gcc and clang builds. To do this, I added an environment variable ANALYZE to the Travis YAML file with the values true and false giving me a two by two build matrix:

env:
  - ANALYZE=false
  - ANALYZE=true

In my Travis build script, I could test this environment variable and the CC environment variable to identify the four configurations and take the appropriate action. Until this point, my build script was simply a couple commands in my YAML file, but now it was becoming a little complicated, so I moved the build script to a new file I called .travis.sh:

#!/bin/sh
#
# Build script for travis-ci.org builds to handle compiles and static
# analysis when ANALYZE=true.
#
mkdir build
cd build
if [ $ANALYZE = "true" ]; then
    if [ "$CC" = "clang" ]; then
        scan-build -h
        scan-build cmake -G "Unix Makefiles" ..
        scan-build -enable-checker security.FloatLoopCounter \
          -enable-checker security.insecureAPI.UncheckedReturn \
          --status-bugs -v \
          make -j 8
    else
        cd ..
        cppcheck --help
        cppcheck --template "{file}({line}): {severity} ({id}): {message}" \
            --enable=style --force --std=c++11 -j 8 \
            --suppress=incorrectStringBooleanError \
            --suppress=invalidscanf --inline-suppr \
            -I headers hc common headers unix win32 2> cppcheck.txt
        if [ -s cppcheck.txt ]; then
            cat cppcheck.txt
            exit 1
        fi
    fi
else
    cmake -G "Unix Makefiles" ..
    make
fi

When analysis is disabled, I use the same build commands for both clang and for gcc. When performing the analysis, I found it helpful to have the help output in the build log in case I wanted to adjust the command-line arguments, so I first run the tool to get the command-line help output. Then I run the necessary commands to perform the analysis.

The static analysis can be fairly time consuming. In my initial experiments, the cppcheck analysis took approximately 20 minutes and the clang static analyzer took about 12 minutes. Considering that my existing compile builds were only taking about a minute, this was a considerable increase in time. Therefore, I needed to figure out a way to decrease the time of the static analysis so that I could keep it running on all continuous integration builds.

Initially I focused on reducing the amount of time taken by cppcheck. Since I was telling cppcheck to analyze all possible preprocessor combinations, this revealed that the structure of my preprocessor directives was allowing some combinations that didn’t make any sense. In this code base, there are only two meaningful alternatives: _WIN32 is defined on a Windows build and XFRACT is defined on a Linux build. Some use of these preprocessor symbols resulted in analysis pursuing a combination of both symbols being defined. A little bit of refactoring of the preprocessor directives solved this problem. Some additional preprocessor legacy configurations were identified and those configurations were eliminated from the code. This reduced the time for cppcheck to about 12 minutes.

Following that I decided to see if increasing the number of analysis jobs running in parallel would help. Fortunately the virtual machines supplied by Travis allow access to multiple cores of execution in parallel, so it helped considerably. Increasing the number of jobs to 8 for each analysis build reduced the execution time to about 5 minutes for each tool.

Enabling multiple jobs for cppcheck disabled its ability to identify unused code. I may add a second invocation of cppcheck looking only for unused functions at a later time. For some reason, even though my cppcheck build was eventually clean of reported problems, it still reported a failure code and failed my build. I compensated for this by directing the error report to a file and then failing the build if the file was non-empty.

When using the clang static analyzer, it was important to run the CMake step through the scan-build script in order to have the build properly configured to use scan-build as the compiler. Having done that, the code was properly analyzed.

Suppressing False Positives

Each tool has it’s own way of suppressing a false positive. With cppcheck, this can be done by inserting a special comment just before the line reported for a false positive. Complaints about dereferencing a null pointer can be silenced in clang by simply adding an assertion on the pointer before the offending line. For other reports, a special preprocessor symbol, __clang_analyzer__, can be used to surround code that should not be analyzed by clang. In my case this was necessary to silence a false problem reported around the use of the FD_SET macro.

When suppressing false positives, it is best to suppress them in as small a scope as possible. However, sometimes it is simply useful to disable specific checks. My code base was a legacy code base containing many calls to scanf. Eventually, these calls will be replaced with C++ I/O mechanisms, but for now I didn’t want them failing my static analysis builds. My code also contained a bunch of assertions of the form:

_ASSERTE(0 && "Don't call standard I/O without a console on Windows");

The purpose of these asserts is to fail with a meaningful message, so the full expression is guaranteed to evaluate to false, but the run-time assertion will display the string used in the expression. Unfortunately this triggers a diagnostic in cppcheck. After verifying that all the diagnostics were instances of these kinds of assertions, the entire check was suppressed. These assertions are sprinkled around in this legacy code in places where one of two things will happen: either a replacement implementation will be deployed instead of the assertion, or the entire code path will be eliminated through a modernization of the code base.

Fixing the Issues

After getting the analysis tools running and working through the relatively small number of false positives, all the remaining issues had to be fixed in order to get a clean build. In many cases, these were issues resulting from low-level C style coding habits: declaring variables without initializing them, using malloc instead of std::vector, using C-style strings instead of std::string and so-on. Addressing these issues one-by-one took a number of commits but was worth the time spent in order to get a clean analysis build. When fixing these issues, I prefer to keep the commits small and focused, even if that introduces a large number of commits. Commits don’t cost anything and keeping each one focused means that if a mistake is made somewhere along the way, the individual offending commit that introduced the problem will be small and focused making it easy to identify the problem. My code base is a legacy code base with no automated testing yet, so I needed to proceed carefully and slowly making sure that each commit represented progress and not a potential bug.

One Response to “Adding Static Analysis to Your C++ GitHub Repository”

  1. Give Your Old Code Some New Love | Legalize Adulthood! Says:

    […] Adding static analysis to Iterated Dynamics was relatively straightforward*. Over time, the static analysis checks have ensured that forward progress remains locked in. All the checks performed by cppcheck and clang’s static analyzer get flagged as failed builds in the continuous integration server and that prevents problems, such as reading a variable before initializing it, from slowly creeping back into the code base. […]

    Like


Leave a comment