What is the Achilles heel of your software project? Could it be fragile code? Learn how to recognize the signs of fragile code, how to deal with any fragile code you have found, and how to prevent this problem in the future.
Traditional metrics for the development of a software project watch cost, defect rates, and schedule. All are measured, of course, against the project scope document created at the start of the project, or at least against the expectation that was set when the project was started. While far too many software development projects still don't meet these metrics, there are other, less visible factors in the long-term success of the project.
We are all aware that most of the time and effort spent on a software development effort is not spent within the design and construction phases. Most of the time and effort is spent supporting, maintaining, and enhancing the system. Subtle differences in the code can make these parts of the software lifecycle painful or relatively pain free.
The problem with software is that some systems are far more difficult to maintain than others. The additional cost of maintaining such software far outstrips any benefit gained by negotiating with vendors, maintaining tight schedules, or any of the other project management techniques typically used to control project costs. The testing process, which removes bugs, does not necessarily improve the ability to find bugs, accept configuration changes, or make enhancements. The reasons for the difficulty in maintaining systems can be broken down into three key areas: documentation, architecture, and error handling and logging.
System maintainability starts with the ability to understand the existing system and part of understanding the system is a proper set of documentation. Unfortunately, too many systems are delivered with missing or inadequate documentation. Often this is because developers don't know the kind of documentation necessary to maintain a system. They develop documentation that is of little or no value.
Every system should have an architectural document that describes all of the major components, how they fit together, and how they communicate with each other. This helps to form the framework for understanding the application. When this document is missing, understanding how the entire system fits together will be difficult, and therefore systemic problems may be difficult to find. This is the document that helps you understand the forest before you go hunting for a tree.
The next piece of documentation isn't a separate document at all. It's meaningful comments in the source code. The comments that should be in the code include the header comments that describe each function, its parameters, and its return values, and comments that describe why the code is being used.
Everyone has seen comments above functions, which describe what they do and what the parameters are. These are some of the easiest comments for the developer to generate. They need only cover the standard pattern of name, usage, parameters, and return.
The more difficult, but necessary, type of comments are the ones that explain why, not what, the code does. The extreme example of this is a line that subtracts one from a variable. A bad comment is one that says 'Removes one from the variable.' Anyone who can read the syntax of the code knows that.
The good comment is the one that explains why one must be removed from the number. For instance, 'Convert from base 1 to base 0.' That comment explains not what is being done but why it's necessary.
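As a rough illustration of the two kinds of comments, here is a minimal Python sketch. The function item_at and its parameters are hypothetical, invented only for this example. The header comment follows the standard pattern of name, usage, parameters, and return, and the inline comment explains why the subtraction is needed.

```python
def item_at(items, position):
    """Return the item at a 1-based position.

    Usage: item_at(["a", "b", "c"], 1) gives "a".
    Parameters:
        items: a list of values.
        position: the position as a user would count it, starting at 1.
    Returns:
        The item stored at that position.
    """
    # A bad comment here would be "Removes one from the variable."
    # The useful comment explains why: convert from base 1 to base 0,
    # because callers count from 1 but list indexes start at 0.
    index = position - 1
    return items[index]
```

The inline comment stays useful after the code changes hands, because it records the intent that the syntax alone cannot show.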
No amount of documentation can overcome a bad architecture. The architecture of the application, no matter what the language, has the most profound impact on the ultimate maintainability of the application. Applications with well thought out, flexible, and structured architectures are the easiest to maintain because they are the easiest to understand and the easiest to extend.
Most architectural problems are caused by lack of forethought. Whether the development project was started with no design phase or with a design phase that was excessively constrained, the result is the same. An architecture that cannot be expanded to support potential new needs is difficult to maintain because the architecture itself may need to be changed during the maintenance phase. In some cases this can be likened to pouring the foundation after the building has been built. While it's technically possible, it's never the easiest way.
It's also painful from a psychological standpoint that a project that has just recently been completed must be reworked to support what are perceived to be minor changes. Corporate management tends to feel that the large expenditures on the project are finished, so making the case for re-architecting part or all of an application is a very difficult sale.
Error Handling and Logging
The final area that will determine how difficult or easy a project is to maintain is its error handling and error logging. When a problem is found, the most intensive and time-consuming process is determining the root cause. This typically takes substantially longer than fixing the problem itself.
Generally problems are simple to solve once the problem is fully understood. Identifying exactly where the error is caused is the key in understanding the problem. Applications that are designed from the start to identify the exact cause of the error are much more likely to be maintainable than software that was not built with this mandate.
One of the things most often added to code after it's been delivered is better error handling and logging, so that problems can be resolved more quickly and easily. Take Microsoft Word for instance. Earlier versions of the product came out, each one with, arguably, better error handling. Word 2002 (version 10, part of Office XP) now allows you to send an automatic bug report to Microsoft for analysis. This is a much higher level of error logging that takes advantage of the logging technologies available today.
Fragile code tends not to have good error handling or logging. The tendency is for the application to receive a general protection fault and have the operating system shut the application down. The problem is that this doesn't help identify the cause of the problem. Without modification to add or improve error handling and logging, it may be almost impossible to make progress on problem resolution.
If you've inherited some code that is fragile, you'll need a set of coping skills that will allow you to harden the code that you have even if you can't totally re-architect it. The best place to start is shoring up the error handling, improving the logging, and adding the ability to trace the application.
Error Handling, Logging and Tracing
The best way to help code become more stable is to install error handlers throughout the code. This means providing a place for the operating system to go when the program encounters an error and it means testing conditions that were never tested.
In languages like VB and C++ you have the option of actual error handlers. They allow for a place, within the function, that the operating system can go to when an error is encountered. Even if you're not working with one of these languages you can add some basic error handling. First, you can register critical error handlers with the operating system when the program starts up. This allows the program to receive critical errors rather than the user receiving a generic operating system error.
The key to the error handler is to log whatever information is available about where the error occurred and whatever other conditions can be captured. For instance, logging the call stack and global variables before exiting may provide clues to what happened to cause the error.
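A last-resort handler of this kind could look roughly like the following Python sketch. It uses the standard library's sys.excepthook as the hook that receives otherwise unhandled errors. The logger name app.errors and the function names are assumptions made for illustration, and capturing global variables is left out to keep the sketch short.

```python
import logging
import sys
import traceback

# Hypothetical logger name; a real application would configure its own.
logger = logging.getLogger("app.errors")


def log_unhandled_error(exc_type, exc_value, exc_traceback):
    """Record where the error occurred before the program exits."""
    # format_exception returns the call stack and the error message as text.
    stack = "".join(
        traceback.format_exception(exc_type, exc_value, exc_traceback)
    )
    logger.critical("Unhandled error:\n%s", stack)


def install_error_handler():
    """Register the handler once, when the program starts up."""
    sys.excepthook = log_unhandled_error
```

Calling install_error_handler() at startup means an unexpected failure leaves a logged call stack behind instead of only a generic error shown to the user.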
The next step in shoring up fragile code is to add testing for all of the parameters passed to ensure that they are valid. The process of adding code to test parameters is relatively noninvasive. Although any change in fragile code carries some risk, it is the least invasive way to get the most information. By checking every parameter before it enters the existing code you can identify problems caused between functions. In practice, many problems occur at the boundary between two functions rather than in the middle of a single function.
Finally, by adding a set of logging statements that record when execution enters and exits a function, you can determine the entire call history of an application. Obviously, you need the ability to turn off this logging so you don't impact performance when you're not debugging. Think of this logging as a general ledger for an accounting system. It identifies everything that happened. Furthermore, each function that starts should end, just like every credit has a debit in a journaling system.
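One low-risk way to add parameter testing is to put the checks in a thin wrapper and leave the existing function untouched. The Python sketch below assumes a hypothetical legacy function, legacy_average, standing in for the fragile code. The wrapper name average and the logger name app.params are likewise invented for the example.

```python
import logging

logger = logging.getLogger("app.params")  # hypothetical logger name


def legacy_average(total, count):
    # Stand-in for existing fragile code that we prefer not to modify.
    return total / count


def average(total, count):
    """Check every parameter at the boundary, then call the existing code."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        logger.error("average: total is not a number: %r", total)
        raise TypeError("total must be a number")
    if isinstance(count, bool) or not isinstance(count, int):
        logger.error("average: count is not an integer: %r", count)
        raise TypeError("count must be an integer")
    if count <= 0:
        logger.error("average: count is not positive: %r", count)
        raise ValueError("count must be greater than zero")
    return legacy_average(total, count)
```

A bad value is now logged and rejected at the point where it crosses from one function to another, rather than surfacing later as a division error deep inside the old code.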
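Entry and exit logging with an off switch might be sketched in Python as a decorator. The names traced, set_tracing, and the logger app.trace are assumptions for this example, not part of any existing system. The finally clause is what guarantees that every ENTER line gets a matching EXIT line, even when the function raises an error.

```python
import functools
import logging

logger = logging.getLogger("app.trace")  # hypothetical logger name
_trace_enabled = False  # tracing stays off unless someone is debugging


def set_tracing(enabled):
    """Turn entry/exit logging on or off at run time."""
    global _trace_enabled
    _trace_enabled = enabled


def traced(func):
    """Log when execution enters and exits the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _trace_enabled:
            # Only one flag check is added when tracing is off.
            return func(*args, **kwargs)
        logger.debug("ENTER %s", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return and on error, so entries always balance.
            logger.debug("EXIT %s", func.__name__)
    return wrapper
```

Decorating a function with @traced and calling set_tracing(True) while debugging produces the ledger of calls. An ENTER without a matching EXIT at the end of a log points at the function that was running when the program died.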
In some environments the extreme fragility of the code may lead you away from wanting to add the additional statements to support an increased level of error handling and logging. However, in the long term the number of problems caused by adding this additional testing will be far outstripped by the number of subtle errors that are detected and logged.
If one of the signs of fragile code is poor documentation, then it would stand to reason that one of the ways to help reduce the fragility of code would be to generate documentation for it. Unfortunately, documentation must be written with a certain amount of knowledge of the code itself, and that knowledge is difficult to recover once the project is done. It is even more difficult to recover if the people who built the code are no longer available.
The key with documentation is to use automated tools that can convert the code itself into meaningful documentation. For instance, a tool or set of tools can determine where a function is used, or build a call tree that shows which functions a given function calls.
Automated tools convert the code into useful information about how the solution is architected. The time that can be saved by condensing and converting the code into useful information can reduce the challenges with understanding what the code does. Documentation will continue to be a source of struggle. However, every opportunity to add meaningful documentation should be exercised.
Preventing the problem
Coping with fragile code is a good reactive stance. However, there is also the proactive approach to consider. It's one thing to cope with fragile code but quite another to prevent fragile code in the first place. This involves being aware of the things that result in fragile code during the development phase.
The most challenging thing for most professionals is to make time in the schedule of a program being developed to include the necessary "checks and balances" that prevent fragile code from occurring in the first place. With the pressure to deliver code as soon as possible with as many features as possible, it's easy to see how it might be difficult to keep time in the schedule to ensure that the code is built soundly. It's important to create an awareness of what "fragile code" is and how removing checks and balances increases the risk of creating a system composed of "fragile code."
Before the start of development agreeing on a set of standards for documentation, comments, error handling, parameter testing, trace logging, etc. will simplify the development process and help to ensure that the code has an even level of resilience to problems. It's a simple, easy step that is often overlooked in the rush to get started on a project.
Providing standards makes it clear what is expected of all of the developers and increases awareness of the core concepts of software development. This in turn helps to develop better code with minimal rework.
One of the best but frequently painful ways to ensure that code isn't fragile is to involve multiple parties at every level of the software development process. While it's typical in even the most harried environments to involve multiple people during the architecture phase, it's fairly rare for projects to maintain a formal code review process during the development phase.
The code review process is not a punitive process designed to punish those developers who do not possess the greatest skill. It's a teaching tool designed to help all of the developers remember the standards that they've agreed to meet and to learn techniques from one another.
Code reviews need not be long, but they should be done because of their power to help prevent fragile code from being written.
Fragile code is expensive. The additional maintenance costs associated with fragile code will quickly eliminate any gains created during the construction and development phases. Spotting fragile code is easy if you know what you're looking for. Perhaps more importantly, fragile code can be prevented.
Robert Bogue, MCSE (NT4/W2K), MCSA, A+, Network+, Server+, I-Net+, IT Project+, E-Biz+, CDIA+ has contributed to more than 100 book projects and numerous other publishing projects. He writes on topics from networking and certification to Microsoft applications and business needs. Robert is a strategic consultant for Crowe Chizek in Indianapolis. Some of Robert's more recent books are Mobilize Yourself!: The Microsoft Guide to Mobile Technology, Server+ Training Kit, and MCSA Training Guide (70-218): Managing a Windows 2000 Network. You can reach Robert at Robert.Bogue@CroweChizek.com.