The Mixed DLL Loading Problem
On Friday, March 14th, Microsoft announced a potentially serious bug that affects only Visual C++ developers using Visual Studio .NET 2002 and Visual Studio .NET 2003. This bug cannot affect C# and Visual Basic .NET programmers. In addition, 90 to 95% of Visual C++ .NET programmers cannot be affected. Of the 5 to 10% of Visual C++ programmers who are developing code that is vulnerable to the bug, only a handful have actually experienced the problem that the bug may cause. Nevertheless, all Visual C++ programmers should be aware of it, and know what to do to prevent problems in mixed DLLs.
What is a mixed DLL?
Visual C++ .NET, as I've mentioned before, is unique among the .NET-supported languages from Microsoft: it can generate both intermediate language (IL) and native code. When you create a "Managed C++ application", the build product is an assembly of IL with a .exe extension. When you create an MFC application, the build product is a Windows executable file of native code, also with a .exe extension. The internal layout of the two files is utterly different.
When you create a class library, the file has a .dll extension. When you create an MFC DLL, the file contains only native code. When you create a Managed C++ Class Library, the file usually contains a mixture of native code and intermediate language. This bug can only affect these mixed DLLs. It is quite simple to arrange for your Managed C++ Class Library project to have an IL-only file as the build output, and in fact that is the workaround for this bug.
You may be wondering if there's such a thing as mixed EXE files. There are, and they are unaffected by this bug. Of the six kinds of build outputs Visual C++ can create (IL, native, and mixed DLLs; IL, native, and mixed EXEs) only mixed DLLs are affected.
What's the problem?
Every DLL has a special entry-point function called DllMain(). It's called when the DLL is first loaded and when it is unloaded, and it takes care of initialization and then cleanup. Until it has run, the loader (the part of the operating system that brings DLLs into a running process) will not let any other function in the DLL run. We say that DllMain runs while holding the loader lock. This lock also prevents any other DLL from being loaded while a DllMain is in progress.
There are several actions that are not allowed inside DllMain. Neither the compiler nor the operating system will warn you if you perform these actions, and typically the end result is that your process will hang. You are not allowed to load another DLL, to access the registry, to call a function from another DLL (with the exception of Kernel32.dll, which is always available) or to touch another thread, including threads in other processes. Get in, initialize your variables, and get out.
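To make "get in, initialize your variables, and get out" concrete, here is a minimal sketch of a native DllMain that follows those rules. The variable names g_module and g_attached are hypothetical, and the stand-in declarations in the #else branch are assumptions added only so the sketch can be compiled and tried on a machine without the Windows headers.

```cpp
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#else
// Stand-in declarations (assumptions) so the sketch also compiles off Windows.
typedef void*         HINSTANCE;
typedef unsigned long DWORD;
typedef int           BOOL;
typedef void*         LPVOID;
#define WINAPI
#define TRUE 1
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH  2
#define DLL_THREAD_DETACH  3
#endif

// Hypothetical module state: plain data only, nothing that needs another DLL.
static HINSTANCE g_module   = 0;
static bool      g_attached = false;

BOOL WINAPI DllMain(HINSTANCE module, DWORD reason, LPVOID /*reserved*/)
{
    switch (reason)
    {
    case DLL_PROCESS_ATTACH:
        // Store simple values only. No DLL loading, no registry access,
        // no waiting on other threads, and no managed code.
        g_module   = module;
        g_attached = true;
        break;

    case DLL_PROCESS_DETACH:
        // Undo exactly what the attach case did, and nothing more.
        g_attached = false;
        g_module   = 0;
        break;

    default:
        // DLL_THREAD_ATTACH and DLL_THREAD_DETACH: nothing to do here.
        break;
    }
    return TRUE; // on process attach, any other value makes the load fail
}
```

Anything heavier than this, such as opening files, reading configuration, or starting threads, is better moved into a function that callers invoke after the DLL has finished loading.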
Visual C++ .NET developers need to add another item to their mental list of DllMain no-nos: you can't run any MSIL. That means, of course, that DllMain can't be written in managed code, and it can't call any function that is written in managed code, directly or indirectly. Life is even more complicated for the writers of mixed-mode DLLs because the runtime can do work you did not expect, such as running the garbage collector, or work you did not explicitly request, such as loading a DLL because you are trying to call a method in it and it hasn't been loaded yet. Normally this behavior is considered a feature, but when unexpected or unrequested work is done inside DllMain, you're heading for a deadlock and a hung process.
Actually, it's a little worse than that. If you were guaranteed a hung process, you'd discover this problem during the most elementary testing cycle, and you'd be really motivated to make it stop. But this problem is intermittent, and more likely to happen if your system is under stress — a horrible time for a bug to appear.
What should you do?
Let's start with the first thing. Your DllMain should not be written in managed code. If you're creating a Managed C++ Class Library, by default all your methods are in managed code. Even if you mark DllMain as unmanaged with a pragma, there will be a bit of managed code around it anyway. Even if you don't write a DllMain, the compiler will generate an unmanaged entry point and then call various other methods from it. You must suppress that entry point with the /noentry option in your project properties. This option is reasonably well disguised. Here's how to set it:
- In Solution Explorer, right-click the project name and choose Properties.
- Expand the Linker folder on the properties sheet.
- Select the Advanced sub-section.
- Change the Resource Only Dll property to Yes.
Using the /noentry option in Visual Studio .NET 2002 and 2003 does not completely eliminate every chance of this bug hurting you. The CLR itself needs to be changed to squash the bug completely, and version 1.1 of the Windows .NET Framework is too close to release to implement such a change. In a future version, assemblies built with /noentry will be protected from this bug entirely.
It's natural, at this point, to wonder how you will initialize static variables if you don't have a DllMain to initialize them in. And how can you call out to ATL, MFC, or C Runtime Library code? Those libraries need to have some statics initialized before you use them. There is a new Knowledge Base article to walk you through the process. Be warned: it's not quick or simple. You can find it at http://support.microsoft.com/?id=814472. The article starts by referring to a number of linker errors, including a new one that has been added to Visual Studio .NET 2003 to draw a developer's attention to the potential problem. Whenever you create a managed code DLL, you should follow the instructions in the Knowledge Base article, even if you are using Visual Studio .NET 2002 and did not receive a linker error. Be sure to read to the very end of the Knowledge Base article before you start to type and click, because there is a convenient header file provided (it's been added to Visual Studio .NET 2003) to reduce the workload a little.
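The general idea behind initializing without a DllMain can be sketched as a pair of explicit functions that clients call themselves, once the DLL is safely loaded. This is not the code from the Knowledge Base article: the namespace mixedlib and the names Initialize, Terminate, and IsReady are hypothetical, and the sketch uses standard C++11 facilities (std::mutex) so it can be tried anywhere, which the Visual Studio .NET compilers of this era do not provide. For a real project, follow the article's steps.

```cpp
#include <mutex>
#include <string>

namespace mixedlib {   // hypothetical library name

namespace detail {
    std::mutex   g_lock;             // guards the two fields below
    int          g_users = 0;        // callers that have initialized the library
    std::string* g_table = nullptr;  // stand-in for state DllMain would have built
}

// Call before using the library. Several clients may call it; only the
// first call does the real work, and it runs outside the loader lock.
bool Initialize()
{
    std::lock_guard<std::mutex> guard(detail::g_lock);
    if (detail::g_users++ == 0) {
        detail::g_table = new std::string("ready");
    }
    return true;
}

// Call once for every Initialize(). The last caller releases the state.
void Terminate()
{
    std::lock_guard<std::mutex> guard(detail::g_lock);
    if (detail::g_users > 0 && --detail::g_users == 0) {
        delete detail::g_table;
        detail::g_table = nullptr;
    }
}

// Reports whether the library state currently exists.
bool IsReady()
{
    std::lock_guard<std::mutex> guard(detail::g_lock);
    return detail::g_table != nullptr;
}

} // namespace mixedlib
```

The cost of this design is that every client must remember to call Initialize() before its first use and Terminate() when it is done. The count of users lets independent clients in the same process share the library without tearing it down underneath one another.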
Should you be worried?
Well, I'm not. I've created a handful of Managed C++ Class Library projects, mostly to run on lightly-stressed machines, and this problem has never bitten me. In fact, you can count the folks it has bitten on the fingers of one hand. It's stressful, though, to imagine a time bomb in your code, waiting to freeze a process just when the largest number of people want it, so you should understand the problem and take steps to prevent it. To summarize:
- Projects built in Visual Basic .NET, Visual C# .NET, and any other .NET language except Visual C++ are immune to this problem since they cannot emit unmanaged code.
- Projects that create exe files are immune to this problem.
- Projects that create DLLs that consist entirely of unmanaged (native) code are immune to this problem.
- Projects that create managed-code DLLs require a linker option (/noentry) to prevent the compiler from creating an unmanaged entry point, and may require a fair amount of manual work to initialize unmanaged libraries that are to be called from the DLL.
- Projects that appear to be working fine may harbor the vulnerability and hang or freeze at a very inconvenient time, so be sure to revise all your managed-code DLL projects as soon as you can.
If you're looking for even more details on this, be sure to read the Knowledge Base article referred to above, http://support.microsoft.com/?id=814472, and a technical whitepaper at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/vcconMixedDLLLoadingProblem.asp that contains the marvelous understatement, "The Visual C++ and common language runtime teams made engineering choices for mixed (managed and native) DLL loading and initialization that they have since decided to revisit." Once you understand the reason the problem occurs, you'll be ready to fix up your projects, and you won't be worried about the magnitude of this problem any more.