Refactoring to Learn a New Code Base

When you join a team with a large existing code base, it can be intimidating. How long does it take to become familiar with a million lines of code? I don’t have the answer to that question, but based on my past experience it easily takes several years if not more. Meanwhile, the rest of the team is adding and changing code all the time. When I apply small, focused, systemic refactorings to a code base, I visit lots of code that may be unfamiliar to me. Visiting unfamiliar code and making the same kind of systematic change throughout can help me become more familiar with the code.

I’m currently working on a C++ code base for a large 3D graphics desktop application. This code is fairly well structured and often has doxygen documentation. The code base is quite large at around 550,000 lines of code. Learning all the classes and what they do is a significant task. None of the original authors are around anymore, so there is noone to ask for help when I get stuck.

Leveraging Visual Patterns and Cognition

One of the ways that I learn a large API like Direct3D is to print out the documentation and flip past everything, running my eyes from top-to-bottom across everything they’ve written about it. I might pause to read some details here and there, but the main purpose of doing this is to get everything about that API passed under my eyeballs. Seeing something, even if it is just once briefly, allows you to make connections to those words when you later encounter something related. Getting your vision and pattern recognition centers involved gives you a mental bearing on the conceptual landscape that is addressed by the API. The same approach can be applied to the code itself. Skimming past all the code will get your pattern recognition centers helping you to get a feel for the code base. If a class represents a key relationship for the code base, it is likely used in many places. The sheer repetition of it appearing before your eyes as you scan through the code will convey that importance. You are also likely to notice repeated patterns of duplication throughout the code base.

Refactoring to Use a Smart Pointer

In my code base, many of the classes have their lifetime managed by an intrusive reference count. Calling ref() increases the reference count and calling unref() decreases the reference count. When the reference count reaches zero, the resource is released. In a modern C++ code base, we would use std::shared_ptr for such reference counting. Unfortunately, this code base is still using older compilers where this standard smart pointer is not available. Changing all the existing code to use a different shared pointer implementation would be a large undertaking and one probably best done with a batch transformation tool like clang-tidy.

So we are left with existing code that looks like this, where m_menuBar is stored inside FFrame as a raw pointer:

void FFrame::setMenuBar(FMenuBar *menuBar)
{
    if (menuBar != NULL) {
        menuBar->ref();
    }
    if (m_menuBar != NULL) {
        m_menuBar->unref();
    }
    m_menuBar = menuBar;
    if (m_menuTitleBarLayout != NULL) {
        m_menuTitleBarLayout->addContainer(menuBarBR_SOUTH);
    }
}

We can make our own smart pointer class that knows how to ref() and unref() at the appropriate times. We can change m_menuBar to be an instance of this smart pointer. Then we can simplify this code considerably to:

void FFrame::setMenuBar(FMenuBar *menuBar)
{
    m_menuBar = menuBar;
    if (m_menuTitleBarLayout != NULL) {
        m_menuTitleBarLayout->addContainer(menuBarBR_SOUTH);
    }
}

As you might imagine, this intrusive reference counting technique is pervasive throughout this code base. Using the smart pointer class makes it simpler to manage the lifetimes of the owned objects and ensures that every ref() is matched with the corresponding unref(). Applying this mechanical change, one class at a time, throughout the code base allows you to become familiar with the classes in the code base. For each class, you can reason about the ownership implied by the existing code. In this case, an FFrame owns an FMenuBar.

Increase Global Understanding by Refactoring

Skimming documentation increases your global understanding of an API. Mechanically refactoring existing classes gives you global understanding of the code base. Along the way, you are likely to discover errors in the existing code. For instance, you may encounter a class that manually calls ref() to obtain ownership on an object, but never manually calls unref() to release that resource. Congratulations, you’ve just discovered a resource leak. These are examples of situations where the smart pointer just does the right thing and relieves you of the tedium of managing the reference count.

Applying such refactorings to a code base can be a fairly large undertaking. The best approach is to apply the changes one class at a time and use a version control system to track the changes to each class separately. We’ll end up with lots of little commits reflecting the changes to each class. This is a good thing! If you squash the commits into a single huge change, it will be impossible to make sense of the change. Worse, if one of the classes is improperly modified, we won’t be able to use a bisect technique to isolate the change that introduced the problem. Commits are free, they aren’t a scarce resource. Don’t be afraid to introduce a stream of small commits into your tree. You’ll thank yourself later when you need to identify the source of a mistake later.

Since we’re making these changes manually, we’ll want to easily review each one to catch any mistakes. In this situation, we prefer to make the changes manually because it forces us to consider each class one at a time. Reviewing each class for ownership of reference counted resources increases our understanding of the code base. Sometimes you encounter a class that is just too confusing. That’s not a problem; just skip that class and move on to the next one. Our goal with refactoring is to make the code better than it was. Using a smart pointer to manage reference counts makes the code simpler and easier to understand. If we don’t understand the existing reference counting behavior of a class, then it is better to just leave it alone than possibly introduce mistakes by refactoring it.

2 Responses to “Refactoring to Learn a New Code Base”

  1. Steven Gordon, PhD Says:

    Do you test the refactored code? If so, how comprehensively?

    Do you check your changes in?

    Like

    • legalize Says:

      Yes, testing the code is assumed, but I should be more explicit about that, LOL.

      In this code base, there are a few unit tests and one simple automated integration smoke test. The unit tests run as a part of every build and I run the smoke test manually. I also do manual integration testing. There is risk that because of the low automated test coverage, that things will be broken. This is another reason why leaving a fine-grained commit trail is important in locating unintentional errors.

      Like


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: