How does MS Windows search for a DLL?

Once upon a time, I was programming in Windows 3.1 using, I think, a Borland product called Turbo C++.  Back then, the operating system didn’t provide preemptive multitasking, so each application had to cooperatively give up control to let other programs run.  Compared to today’s programming environments, it seems like the stone age.

One thing that was very important back then was knowing how Windows goes about finding custom DLLs.  The problem was that there was no version control for DLLs, and it was common for developers of different applications to create DLLs with the same name.  If the DLL being loaded turned out to be the wrong one, the application would likely crash before the first window even showed up on the desktop.  Knowing the order in which Windows searched for a DLL was a critical part of fixing the problem.

Today, with frameworks handling almost everything for you, not much attention is paid to the way DLLs are loaded, and improvements to the OS have helped with the name collision problems of the past.  However, the search order is still important to understand when building and deploying an application.

For example, at work I have a web server with about a dozen different custom .NET websites running on it.  Some of them use the Oracle Data Access Components (ODAC) to read data from backend databases.  We’ve been having a problem with some of these applications crashing at startup because the original application developers deployed the websites incorrectly.

Instead of identifying the specific DLLs from the ODAC that were linked with the website, the developers installed the ODAC on the webserver directly.  This is a problem because not every developer uses the exact same version of the ODAC.  If I download the current ODAC, compile my website, and verify it works correctly, it will crash when it’s deployed to the webserver, most likely because my ODAC version is newer than what is installed on the server.  If I then upgrade the version of the ODAC on the webserver, older applications will fail.  I need a solution that lets me deploy new code without having to recompile and redeploy every website on the server.

The solution to this problem is to extract the necessary DLLs from the ODAC and place them in the bin directory of the deployed website.  This works because the search starts close to the application: Windows looks in the directory the application was loaded from before anything else, and the .NET runtime looks for a site’s private assemblies in its bin directory.  The search stops at the first DLL with the right name; Windows doesn’t check whether it’s the right version.  Only if nothing is found there does the system move on to the Windows system directories, the current working directory, and finally the directories listed in the %PATH% variable, in order.  If an older version of the ODAC comes first in the %PATH% variable, then the older applications that were linked with that DLL will work fine and newer applications will crash.  The only easy way to get around the potential version conflict is to put the needed DLLs into the application’s own directory.
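To make the "first match wins" behavior concrete, here is a rough sketch in Python.  It is only a simplified model, not the real Windows loader: the function name resolve_dll and its parameters are made up for illustration, and the order it walks (application directory, system directories, current directory, then the PATH entries) is the documented default for desktop applications with safe DLL search mode turned on.

```python
import os
from typing import Iterable, Optional


def resolve_dll(name: str, app_dir: str, system_dirs: Iterable[str],
                current_dir: str, path_dirs: Iterable[str]) -> Optional[str]:
    """Return the full path of the first file called `name`, or None.

    Hypothetical helper: a simplified model of the search order, not an
    actual Windows API.
    """
    # Directories are tried strictly in this order.
    candidates = [app_dir, *system_dirs, current_dir, *path_dirs]
    for directory in candidates:
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            # First match wins; the version of the file is never compared.
            return full_path
    return None
```

With a copy of the DLL in the application's own directory, the function returns that copy no matter what older or newer versions sit in the directories on the PATH, which is exactly why the bin directory trick works.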

[Note: I’m not going to show the specific details for the ODAC here; there are many online posts that show which DLLs need to be copied.  Failing that, an analysis of the program’s DLL dependencies should reveal what’s required.]

I’m writing this because these details seem to be getting lost as time goes on.  Fewer and fewer developers have this knowledge, and current programming training tends to skip over OS-dependent behavior in favor of ‘run this program on everything and you’ll see dancing unicorns’.  Of course, I’m no expert on the matter, so if you have further information that would be helpful, please let me know.