Refactoring: Convert C to C++

You have C source code.

Create a C++ source file containing the C source code, modified to adhere to the rules of C++.

Motivation

The C language provides limited scoping and encapsulation of data and functions. It has no objects, but legacy systems often involve a large body of procedural code that could benefit from object-oriented refactorings. In order to apply object-oriented refactorings the code must first be ported to C++. Then the C++ code can be refactored from procedural code and data into objects. The objects can be further refactored with standard object-oriented refactorings.

C++ is largely syntactically compatible with C, but it introduces reserved keywords that are not present in the C language, such as template, typename, class and so on. Because these additional keywords are reserved in C++ but not in C, you may need to apply Rename Function, Rename Field or Rename Variable refactorings to eliminate errors that arise from compiling the source with a C++ compiler instead of a C compiler.

C++ is link compatible with C when external data and functions are declared with “C” linkage via the extern "C" linkage specification in the C++ code. Data and functions made available to a C compilation unit from a C++ compilation unit must also be declared extern "C" in order to be visible to the C code.

Mechanics

  1. Rename the source file from .c to .cpp.
  2. Wrap any application header files designed to be included from C inside an extern "C" {} block.
  3. Declare C linkage, via extern "C", for any globally visible data and functions provided by the source file.
  4. Modify the build system to compile the new C++ source file and link the C and C++ object files into an executable.
  5. Compile the source file alone to identify any identifiers that conflict with C++ reserved keywords. Apply Rename Variable, Rename Field, and Rename Function to eliminate any reserved word conflicts. Correct any casting errors.
  6. Compile and link the whole application. Correct any linkage errors.
  7. Repeat these steps for all the C source files.
  8. Once all the source files have been converted to C++, remove the extern "C" {} linkage specifications from all header and source files.
  9. Modify the build system to compile all source files with the C++ compiler.

Example

This example is taken from FractInt, an open source program for rendering fractal images. FractInt originated as a 16-bit MS-DOS program with a polling I/O architecture. We’d like to “upgrade” it to a program for 32-bit Windows. This is a massive transformation of the code and virtually no lines of the source base will be left untouched. If we first transform the code from C to C++, we can start refactoring the procedural code into objects and incrementally build a framework for extension that follows the Open/Closed Principle.

We’ll take a look at the C file 3d.c and the changes we need to make to compile it as the C++ file 3d.cpp. 3d.c is intended to provide a number of utility routines for manipulating 3D vectors and 4×4 transformation matrices. Ultimately, we’d like to transform this code to operate on the primitive types for a 3D vector and transformation matrix with the normal arithmetic operators, but first we need to get this procedural C code into C++.

1. Rename the source file.

This step is simple enough: we rename the file 3d.c to 3d.cpp.

However, if you are using CVS for your version control system, you’ll find that renaming a file looks to CVS like “delete file 3d.c and create file 3d.cpp”. The version history associated with 3d.c will no longer be accessible when browsing the version history of 3d.cpp. If preserving that history is important to you, then you probably want to switch to a version control system like Subversion, which supports file renaming and directory structure manipulations better than CVS. In the case of FractInt, we converted our CVS repository to a Subversion repository before converting source code to C++ in order to preserve as much of the file version history as possible.

2. Wrap any application header files.

3d.c has this sort of include structure at the start of the file:

#include <string.h>
#include "port.h"
#include "prototyp.h"

For standard C header files like <string.h>, we usually don’t need to do anything. Standard C header files are already properly adjusted to specify “C” linkage when included from a C++ source file. This is the case with the Microsoft C++ compiler as well as the GNU C++ compiler. If for some reason your C++ compiler is using old C header files that are not guarded for inclusion from C++, you will see undefined symbol linkage errors for the standard functions declared by the included header files. In that case, the easiest remedy is to check the include path for header files and make sure it is correct for your C++ compiler, or look into getting an updated set of C header files that are guarded against being included in a C++ source file.
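For reference, the guard that makes a C header safe to include from C++ usually looks like the sketch below. The header name geometry.h and the function dot3 are hypothetical, made up for illustration; only the __cplusplus macro and the extern "C" syntax are standard. The definition would normally live in a C source file and appears here only so that the sketch is self-contained.

```cpp
// Contents of a hypothetical header, geometry.h, shown inline.
#ifndef GEOMETRY_H
#define GEOMETRY_H

#ifdef __cplusplus          // defined only when a C++ compiler reads the header
extern "C" {
#endif

// Hypothetical C function: dot product of two 3-element vectors.
double dot3(const double a[3], const double b[3]);

#ifdef __cplusplus
}
#endif

#endif // GEOMETRY_H

// Stand-in for the definition that would normally live in a C source file.
extern "C" double dot3(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

A header written this way can be included unchanged from both C and C++ source files, so it needs no wrapping at the point of inclusion.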

The header files "port.h" and "prototyp.h" are application specific include files for FractInt. Exactly what these header files declare isn’t important; what is important is that they represent the declarations for C code and because C source files are compiled with “C linkage”, we need to ensure that the declarations also use “C linkage”. The easiest way to do this is to bracket the includes with a C linkage specification:

#include <string.h>
extern "C"
{
#include "port.h"
#include "prototyp.h"
}

Now all global variables and functions declared in "port.h" and "prototyp.h" will be declared in a C linkage context. When the C++ compiler references any of the items declared by these headers, they will not undergo C++ name mangling and they will be compatible with the names created by the C compiler.

3. Declare C linkage for any globally visible data or functions provided by the source file.

You have two choices for how you apply the C linkage specification to data and functions. You can enclose the relevant data and functions inside an extern "C" {} block, or you can decorate individual pieces of data and code.

In this case, 3d.c doesn’t define any globally visible data, just functions. For example, this function that creates an identity transformation matrix looks like this in 3d.c:

void identity(MATRIX m)
{
    int i, j;
    for (i = 0; i < CMAX; i++)
    {
        for (j = 0; j < RMAX; j++)
        {
            m[j][i] = (i == j) ? 1.0 : 0.0;
        }
    }
}

We can specify C external linkage for this function by adding the extern "C" linkage specification before the function:

extern "C" void identity(MATRIX m)
{
    int i, j;
    for (i = 0; i < CMAX; i++)
    {
        for (j = 0; j < RMAX; j++)
        {
            m[j][i] = (i == j) ? 1.0 : 0.0;
        }
    }
}

Now C clients can call this C++ function without getting tripped up by the name mangling that happens for pure C++ code. This is important since we won’t be changing all of the clients for 3d.c’s data and functions just yet. We want to make this change one file at a time.
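The other choice, the block form, can be sketched as follows. The definitions of RMAX, CMAX and MATRIX are assumed stand-ins for what FractInt’s headers provide, and identity_calls is a hypothetical variable added only to show how data is treated.

```cpp
// Assumed stand-ins for declarations that FractInt's headers would provide.
enum { RMAX = 4, CMAX = 4 };
typedef double MATRIX[RMAX][CMAX];

extern "C"
{
    // Inside the braces this is a definition, exactly as it would be outside.
    // By contrast, the single-declaration form
    //     extern "C" int identity_calls;
    // only declares the variable, just like a plain extern.
    int identity_calls = 0;     // hypothetical counter, for illustration only

    void identity(MATRIX m)
    {
        int i, j;
        for (i = 0; i < CMAX; i++)
        {
            for (j = 0; j < RMAX; j++)
            {
                m[j][i] = (i == j) ? 1.0 : 0.0;
            }
        }
        identity_calls++;
    }
}
```

The block form is convenient when a file provides many functions, since one pair of braces covers them all and is easy to remove later.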

4. Modify the build system to compile the new C++ source file.

In this case, the source code compiles on two platforms: Windows and Linux. For Windows, Visual Studio .NET 2005 is used, and the choice of compiler (C or C++) is driven entirely by the filename extension. The 3d.c file is removed from the Visual Studio project and the 3d.cpp file is added to the project. Visual Studio will now use the C++ compiler for the source file.

For Linux, we need to edit the Makefile to refer to the new filename and use g++ to compile the file instead of gcc. Additionally, when mixing C and C++ object files compiled with gcc and g++, you should link with g++ so that the C++ runtime library is pulled in; linking with gcc can produce errors about undefined C++ runtime symbols. So we modify the Makefile to compile 3d.cpp as follows:

.cpp.o:
        $(CXX) -c $(CFLAGS) $<

This adds a rule for building .o files from .cpp files. The CXX macro is predefined by the GNU make program to point to g++.

In the Makefile that links the object files together to form the executable, we change the link command from:

$(CC) -o xfractint $(CFLAGS) $(OBJS) $(U_OBJS) $(LIBS)

to

$(CXX) -o xfractint $(CFLAGS) $(OBJS) $(U_OBJS) $(LIBS)

Now the .cpp files will be compiled with g++, the .c files will be compiled with gcc as before and the application will be linked with g++. Now we’re ready to compile the new C++ source file.

5. Compile the source file alone to identify keyword conflicts and casting errors.

Depending on the source code in question, you may have used identifiers that are now keywords in C++. The following words are keywords in C++, but not in C:

Reserved Words in C++, but not in C
and_eq        bitand            bitor
bool*         catch             class
compl         const_cast        delete
dynamic_cast  explicit          export
false         friend            inline
mutable       namespace         new
not           not_eq            operator
or_eq         private           protected
public        reinterpret_cast  static_cast
template      this              throw
true          try               typeid
typename      using             virtual
wchar_t       xor               xor_eq
(from The C++ Programming Language, 3rd ed., by Bjarne Stroustrup)

In this particular case, 3d.c didn’t use any of these keywords as identifiers. However, other source files in FractInt used new and delete as identifiers, and these had to be renamed. The X11 Window System support code in FractInt had to be modified to refer to the class member of a data structure as c_class. The data structure in question came from an X11 header file, which was conditionally compiled to change the name of the member when the header file was included from C++.
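A sketch of both fixes, using hypothetical names (the struct and function below are not FractInt’s or X11’s actual declarations): a local variable called new is renamed, and a member called class gets a different name only when the header is read by a C++ compiler.

```cpp
// Hypothetical struct that follows the conditional-member pattern.
struct visual_info
{
    int depth;
#ifdef __cplusplus
    int c_class;    // C++ builds see the renamed member
#else
    int class;      // C builds keep the original name
#endif
};

// Before (valid C, rejected by a C++ compiler):
//     int new = old_class + delta;
//     return info->class == new;
// After Rename Variable and switching to the renamed member:
int class_matches(const struct visual_info *info, int old_class, int delta)
{
    int new_class = old_class + delta;
    return info->c_class == new_class;
}
```

Because both members have the same type and position, the struct layout is identical in C and C++ builds; only the spelling used in the source differs.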

You may also encounter compilation errors related to casting when C code is compiled with a C++ compiler. The two most common sources of warnings and errors are functions like strlen that return size_t, and code that calls malloc.

In the case of strlen, the problem is that size_t is an unsigned quantity. Code that assigns the result of strlen to an int, or compares the result with an int, will generate a warning about signed/unsigned mixing or possible loss of data. You can cast the unsigned quantity to a signed quantity, or you can change the data type of your variable. The simplest way to deal with these warnings is to use static_cast<> to explicitly convert the result of strlen to an int. For example, change:

void some_string_function(char *s)
{
    int i;
    for (i = 0; i < strlen(s); i++)
    {
        parse_character(s[i]);
    }
}

to:

void some_string_function(char *s)
{
    int i;
    for (i = 0; i < static_cast<int>(strlen(s)); i++)
    {
        parse_character(s[i]);
    }
}

This satisfies the compiler, but you might not be happy looking at that invocation of static_cast<>. Your alternative is to declare i as type size_t:

void some_string_function(char *s)
{
    size_t i;
    for (i = 0; i < strlen(s); i++)
    {
        parse_character(s[i]);
    }
}

In this case, i is used only within this trivial function, so this might be preferable. In other cases, i could be passed through many functions as an argument, and you would have to ripple the change of declaration through all of those functions. Ultimately this may be the right thing to do, but we’re focused on doing one thing at a time, and right now we want this old C source to compile cleanly as C++ source, so the cast may be more appropriate. Note as well that we use C++ casting and not C-style casting to indicate our intent more clearly to the C++ compiler.

Another casting error arises from the void * return type of malloc. In C, a void * is implicitly converted to a pointer of the appropriate type upon assignment:

char *foo = malloc(4096);

However, such code in C++ will require a cast of the void * to a specific pointer type:

char *foo = static_cast<char *>(malloc(4096));
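The same cast applies to any pointer type, not just char *. A minimal sketch, with a hypothetical point3 struct and helper function that are not part of FractInt:

```cpp
#include <stdlib.h>

// Hypothetical type, for illustration only.
struct point3
{
    double x, y, z;
};

// C version:   struct point3 *pts = malloc(count * sizeof *pts);
// C++ version: the void * returned by malloc must be cast explicitly.
struct point3 *alloc_points(size_t count)
{
    struct point3 *pts =
        static_cast<struct point3 *>(malloc(count * sizeof(struct point3)));
    if (pts != NULL)
    {
        for (size_t i = 0; i < count; i++)
        {
            pts[i].x = pts[i].y = pts[i].z = 0.0;
        }
    }
    return pts;     // the caller releases the block with free()
}
```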

6. Compile and link the whole application.

Once you’ve eliminated all warnings and errors from the C++ compiler for this source file, you’re ready to test linking the modified source back into the whole application to check for any remaining linkage errors.

If you’ve neglected to declare a C++ function with C linkage and it is expected by other C code, you will see errors complaining about undefined symbols without name mangling:

Undefined symbol: identity

If you have done the reverse — called a C function from C++ without declaring it with C linkage — then you will see errors complaining about undefined symbols with C++ name mangling:

Undefined symbol: identity@f3f3f3v

Correct the linkage specifications in the modified file until the linker is satisfied. In the first case, you need to add the extern "C" linkage specification to a function provided by the new C++ source file. In the second case, you need to declare the function consumed by the new C++ source file as having C linkage. If you have already wrapped any includes, look for implicit declarations of functions (functions used but not declared), or look for embedded extern declarations in the source file.
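For the second case, a typical culprit is an extern declaration buried in the source file rather than in a wrapped header. A sketch with hypothetical names; the definition of check_key stands in for a function that would really be compiled from a separate C file:

```cpp
// Before: an embedded declaration that C++ treats as a C++ function,
// so the linker looks for a mangled name that the C object file lacks.
//     extern int check_key(int code);
// After: the declaration states that the function has C linkage.
extern "C" int check_key(int code);

// C++ code in the converted file can now call the C function.
int is_escape(int code)
{
    return check_key(code) == 27;
}

// Stand-in for the C implementation, included so the sketch links by itself.
extern "C" int check_key(int code)
{
    return code & 0xff;
}
```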

Test the application with your unit tests and/or functional tests to ensure that nothing was broken during the process. Once you have a successfully compiled C++ source file and the application successfully linked with no unresolved symbols, commit your changes to a version control system as a checkpoint.

7. Repeat the steps for all the source files.

Work through all the source files you wish to convert one at a time. Always get a working system after each file has been converted and commit that to the version control system to checkpoint your progress. If things get too confusing while working on a file, you can always revert your source code back to the last known good state and start over, paying closer attention to any compilation and linking problems along the way.

8. Remove all extern "C" linkage specifications.

Once all the source files have been converted to C++, there is no need for the extern "C" linkage specifiers anymore. If you have many source files that were converted, you may find it safer to remove the linkage specifiers one file at a time. However, if the declarations of data and functions are not paired with source files but contained in a single global include file, then this may involve too much error-prone editing. It may be safer to remove all the linkage specifiers at once. If you remove them all at once and miss some, then you will get a linker error complaining about unresolved symbols. Look for the linkage specifiers with a tool like Visual Studio’s Find in Files or a recursive grep on Linux.

Once all the linkage specifiers have been removed, you should have a pure C++ application.

9. Modify the build system to use C++ exclusively.

With a pure C++ application, there’s no need to have Makefile support for compiling C code anymore. Remove any commands in the Makefile that were there purely for supporting the compilation of C code. In Visual Studio, simply removing all the .c files from the project is sufficient.

Now you should have C source code that compiles and links cleanly with a C++ compiler. Of course a “true” conversion to C++ for the code involves much more than compiling and linking without error, but this is a good starting point for refactoring.

When refactoring your converted C code into “better C++”, there are a number of items you will want to consider, for example:

  1. Migrate dynamic memory allocation from using malloc and free to using new, new[], delete and delete[].
  2. Migrate from C-style casts to C++ static_cast<>, reinterpret_cast<>, and dynamic_cast<>.
  3. Migrate from hand-crafted containers to C++ Standard Template Library containers.
  4. Migrate to loop variables declared in the for statement, so that their scope is limited to the loop.
  5. Migrate to the use of const for read-only arguments passed by pointer.
  6. Migrate to std::string instead of character arrays.
  7. Migrate from pointers to true references (&).
  8. Migrate from structs to classes.

This is not an exhaustive list. You will probably encounter more opportunities as you review your source code.
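As a hedged illustration of a few of these items (standard containers, std::string, const, references and loop-scoped variables), here is a small function before and after. The function and its names are hypothetical and not taken from FractInt.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Before, in the C style:
//     int count_long_names(char **names, int count, int min_length)
//     {
//         int i, total = 0;
//         for (i = 0; i < count; i++)
//             if ((int) strlen(names[i]) >= min_length)
//                 total++;
//         return total;
//     }

// After: a standard container of std::string passed by const reference,
// with the loop variable scoped to the for statement.
std::size_t count_long_names(const std::vector<std::string> &names,
                             std::size_t min_length)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < names.size(); i++)
    {
        if (names[i].size() >= min_length)
        {
            total++;
        }
    }
    return total;
}
```

Using size_t throughout also removes the signed/unsigned cast that the C-style version needed.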


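*bool was introduced to C in 1999 (C99), where it is a macro in <stdbool.h> for the new keyword _Bool; true and false are likewise macros there. C99 also made inline a C keyword.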
