Writing a Windows Loader (Part 3)
Continuation of Writing a Windows Loader (Part 2).
Entry Point Invocation
At this point, we are finally able to invoke the entry point for our loaded binary! One important consideration is that entry points for DLLs and regular executables have slightly different signatures. Their respective C runtimes also treat them slightly differently: most executables will fall through to ExitProcess after execution is finished, while DLLs typically expect their entry point (and eventually, DllMain) to be invoked several times. A DLL's entry point is called with DLL_PROCESS_ATTACH when the DLL is loaded, with DLL_THREAD_ATTACH and DLL_THREAD_DETACH as threads start and exit while it is loaded, and with DLL_PROCESS_DETACH when it is unloaded. So how should our entry point look? Keep in mind that this isn't exactly the DllMain or main implemented by the end developer; the first entry point typically performs some special setup before eventually invoking the entry point implemented in the program you just compiled. The main difference, really, is in the number of arguments each takes, with the DLL entry point taking three parameters: an HINSTANCE (which is really just a pointer to the beginning of the DLL in memory), a "reason" indicating whether we are calling it for "process attach", "thread attach", etc., and a void pointer, which at present is just a reserved parameter. Altogether, it looks something like the following:
BOOL(WINAPI *dll_entry)(HINSTANCE hInst, DWORD dwReason, LPVOID lpReserved);
It is worth noting that some reflective loader implementations have historically used that reserved parameter to pass along additional information, such as configuration data, but in general this is a bad practice for a number of reasons. First of all, while we may have no issues here with our custom loader, DllMain is in general a very restrictive environment: the Windows loader holds locks (notably the loader lock) while it runs, and some operations performed there can lead to deadlocks or other bad behavior. Additionally, this may leave us in a situation where our DLL simply cannot be loaded by LoadLibrary in the traditional fashion, since we will have no way to provide that third parameter when we do not load the DLL ourselves, short of adding additional dwReason values and invoking DllMain after the fact. While that last approach is at least an option, we will discuss some alternatives a bit later.
The entry point for regular executables, conversely, does not take any arguments. At some point between its beginning and the invocation of the developer-supplied main (or wmain, WinMain, etc.), the CRT-supplied entry point performs the appropriate setup steps to retrieve the program's command line and so on, and passes those along to the developer's entry point. All told, it looks something like this:
void(WINAPI *exe_entry)(void);
After all of this setup, actually locating the entry point is fairly trivial; the AddressOfEntryPoint field of the optional header provides the RVA of the binary's entry point. Putting this all together, we can construct an invoke_entry function as follows:
#include <windows.h>
#include <stdint.h>

// LoaderStatus, LoaderSuccess and LoaderFailed are the loader's own status codes.
static LoaderStatus invoke_entry(unsigned char* image_base,
                                 uint32_t size,
                                 PIMAGE_NT_HEADERS nt)
{
    union {
        void* p; // Our initial value - we will have to check if this is a DLL or not!
        BOOL(WINAPI *dll_entry)(HINSTANCE, DWORD, LPVOID);
        void(WINAPI *exe_entry)(void);
    } u;

    // First, we find the location of our entry point
    // using the RVA and optional header
    u.p = image_base + nt->OptionalHeader.AddressOfEntryPoint;

    if(nt->FileHeader.Characteristics & IMAGE_FILE_DLL) {
        // The module's base address doubles as its HINSTANCE
        if(!u.dll_entry((HINSTANCE)image_base, DLL_PROCESS_ATTACH, NULL))
            return LoaderFailed;
        // Note: the Windows loader itself does not send DLL_THREAD_ATTACH to
        // the thread that handled DLL_PROCESS_ATTACH, so this call is optional.
        if(!u.dll_entry((HINSTANCE)image_base, DLL_THREAD_ATTACH, NULL))
            return LoaderFailed;
    } else {
        u.exe_entry();
    }

    return LoaderSuccess;
}
And with that, we have execution!
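The invoke_entry function above trusts AddressOfEntryPoint completely. Two cases are worth guarding against before jumping anywhere: the PE format allows a DLL to have no entry point at all, in which case the field is zero (resource-only DLLs are the usual example), and a malformed image could place the RVA past the end of the mapping. Below is a minimal, platform-neutral sketch of that validation; the helper name resolve_entry_point and the entry_check result codes are hypothetical, not part of any Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the check below. */
typedef enum {
    ENTRY_OK,      /* *out holds a callable address */
    ENTRY_NONE,    /* DLL without an entry point: nothing to call */
    ENTRY_INVALID  /* zero RVA for an EXE, or RVA outside the image */
} entry_check;

/*
 * Hypothetical helper: validate AddressOfEntryPoint before calling it.
 *   image_base - start of the mapped image
 *   image_size - number of mapped bytes (e.g. SizeOfImage)
 *   entry_rva  - OptionalHeader.AddressOfEntryPoint
 *   is_dll     - nonzero if FileHeader.Characteristics has IMAGE_FILE_DLL
 */
static entry_check resolve_entry_point(unsigned char* image_base, uint32_t image_size,
                                       uint32_t entry_rva, int is_dll, void** out)
{
    *out = NULL;

    /* A DLL may omit its entry point by setting the field to zero; an
     * executable with a zero entry point is treated as malformed here. */
    if (entry_rva == 0)
        return is_dll ? ENTRY_NONE : ENTRY_INVALID;

    /* Refuse to jump outside the mapped image. */
    if (entry_rva >= image_size)
        return ENTRY_INVALID;

    *out = image_base + entry_rva;
    return ENTRY_OK;
}
```

Wired into invoke_entry, ENTRY_NONE would map to returning LoaderSuccess without calling anything, ENTRY_INVALID to LoaderFailed, and its size parameter is the natural value to pass as image_size.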
Finding Exports
Now that we have actually built a loader and gained execution, we will talk briefly about finding exports manually. This was omitted earlier (in favor of LoadLibrary and GetProcAddress) because of potential edge cases (such as forwards to other DLLs, circular dependencies, etc.) that the built-in tooling in Windows can handle, but that custom solutions may struggle with. Really, this is most useful for creating custom APIs (as we will discuss later on), as an alternative to repurposing the arguments of a DLL's entry point.
The export directory is referenced by the DataDirectory entry IMAGE_DIRECTORY_ENTRY_EXPORT, which gives us access to the export structure, an IMAGE_EXPORT_DIRECTORY:
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD  MajorVersion;
    WORD  MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD AddressOfFunctions;
    DWORD AddressOfNames;
    DWORD AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
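Before any of its fields can be used, the structure has to be located through the data directory. The following is one way to do that defensively, under a few assumptions: it declares its own stand-ins for IMAGE_DATA_DIRECTORY and IMAGE_EXPORT_DIRECTORY (same field order and widths) so it compiles without <windows.h>, and read_export_dir is a hypothetical name. It treats a zero VirtualAddress as "no exports" and refuses a directory that would run past the end of the image.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins mirroring the winnt.h layouts (DWORD -> uint32_t, WORD -> uint16_t). */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} data_dir_t;

typedef struct {
    uint32_t Characteristics, TimeDateStamp;
    uint16_t MajorVersion, MinorVersion;
    uint32_t Name, Base, NumberOfFunctions, NumberOfNames;
    uint32_t AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals;
} export_dir_t;

/* Hypothetical helper: copy the export directory out of a mapped image.
 * `entry` is the IMAGE_DIRECTORY_ENTRY_EXPORT slot of DataDirectory.
 * Returns 1 on success, 0 if the image has no exports or the directory
 * would extend past the end of the image. */
static int read_export_dir(const unsigned char* image_base, uint32_t image_size,
                           data_dir_t entry, export_dir_t* out)
{
    /* An RVA of zero means the image exports nothing. */
    if (entry.VirtualAddress == 0)
        return 0;

    /* 64-bit arithmetic so a huge RVA cannot wrap around. */
    if ((uint64_t)entry.VirtualAddress + sizeof(*out) > image_size)
        return 0;

    /* memcpy avoids assumptions about the buffer's alignment. */
    memcpy(out, image_base + entry.VirtualAddress, sizeof(*out));
    return 1;
}
```

On Windows, the stand-ins can be swapped for the real winnt.h types, with nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT] passed as entry.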
For our purposes, we mostly care about the bottom six members of IMAGE_EXPORT_DIRECTORY. Essentially, the AddressOfXxxx members provide the RVAs of a number of tables that we will use to locate the actual function inside the binary. The ultimate goal is to find the correct index into the AddressOfFunctions table, which provides just that: the RVA of the exported function. We'll first consider the easier case - finding an export by ordinal.
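One edge case applies to both of the lookups that follow. As mentioned earlier, some exports are forwards to other DLLs, and the AddressOfFunctions table does not mark these with a flag. Per the PE format, an entry whose RVA falls inside the export directory's own range (the VirtualAddress and Size of the IMAGE_DIRECTORY_ENTRY_EXPORT data directory entry) points to a forwarder string of the form "OTHERDLL.Function" or "OTHERDLL.#27" rather than to code. A minimal check, with a hypothetical helper name, might look like this:

```c
#include <stdint.h>

/* Hypothetical helper. export_rva and export_size come from the
 * IMAGE_DIRECTORY_ENTRY_EXPORT data directory entry; fn_rva is a value
 * read from the AddressOfFunctions table. */
static int is_forwarder_rva(uint32_t fn_rva, uint32_t export_rva, uint32_t export_size)
{
    /* An RVA inside the export directory's own range points at a
     * forwarder string, not at code. The first test also keeps the
     * subtraction below from wrapping around. */
    return fn_rva >= export_rva && fn_rva - export_rva < export_size;
}
```

A loader that hits a forwarder would need to load the named DLL and resolve the export there (LoadLibrary and GetProcAddress being the simplest route) instead of calling the string as if it were code.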
Finding an Export by Ordinal
Ordinals are a simple case, as they almost provide a direct index into the AddressOfFunctions table. I say almost, because there are two things we must consider:
- In C and C++, array indexes start at 0, while ordinal values start at 1.
- There is no guarantee that the first ordinal value exported is 1 - it could be any value from 1-65535.
So how, then, do we convert this ordinal value to a suitable index? As it turns out, the Base field of the structure provides the answer: assuming the ordinal we are examining is well-behaved, we should simply be able to subtract the Base value from it, which gives us a direct index into the AddressOfFunctions table. As always, it is important to consider bad behavior as well, so well-formed export-finding code should handle edge cases such as ordinal values smaller than the base, or ordinals that otherwise produce indices beyond the boundaries of the AddressOfFunctions table. We can detect the latter by looking at the NumberOfFunctions field, which tells us definitively how many entries the table contains. To illustrate this (sans error checking), we can consider the following example:
PIMAGE_EXPORT_DIRECTORY export_dir = NULL;
uint32_t* addr_of_fns = NULL;
uint32_t offset = 0;
void(*exported_method)(void) = NULL;

// Find the export directory from its DataDirectory entry
// (image_base, nt and ordinal_value come from the surrounding code)
export_dir = (PIMAGE_EXPORT_DIRECTORY)(image_base +
    nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
addr_of_fns = (uint32_t*)(image_base + export_dir->AddressOfFunctions);

// We should check to make sure the ordinal value is at least Base here!
offset = ordinal_value - export_dir->Base;

// We should check that offset < export_dir->NumberOfFunctions here!
exported_method = (void(*)(void))(image_base + addr_of_fns[offset]);

// Invoke our found export!
exported_method();
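For comparison, here is the same lookup with the checks filled in. It is a minimal, platform-neutral sketch: export_dir_t is a stand-in mirroring IMAGE_EXPORT_DIRECTORY so the code builds without <windows.h>, the helper names are hypothetical, and it returns an RVA (0 meaning "not found") rather than calling the export, which keeps it easy to test.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in mirroring IMAGE_EXPORT_DIRECTORY (DWORD -> uint32_t, WORD -> uint16_t). */
typedef struct {
    uint32_t Characteristics, TimeDateStamp;
    uint16_t MajorVersion, MinorVersion;
    uint32_t Name, Base, NumberOfFunctions, NumberOfNames;
    uint32_t AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals;
} export_dir_t;

/* Read a 32-bit value; assumes a little-endian host, as on Windows. */
static uint32_t rd32(const unsigned char* p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: return the RVA exported under `ordinal`, or 0 if the
 * ordinal is below Base, past the end of the table, unused, or the table
 * itself does not fit inside the image. */
static uint32_t export_rva_by_ordinal(const unsigned char* image_base, uint32_t image_size,
                                      const export_dir_t* dir, uint32_t ordinal)
{
    uint32_t index;

    /* Ordinals smaller than Base have no slot in the table. */
    if (ordinal < dir->Base)
        return 0;
    index = ordinal - dir->Base;

    /* NumberOfFunctions is the table's length; anything past it is bogus. */
    if (index >= dir->NumberOfFunctions)
        return 0;

    /* Make sure the whole AddressOfFunctions table lies inside the image. */
    if ((uint64_t)dir->AddressOfFunctions + (uint64_t)dir->NumberOfFunctions * 4 > image_size)
        return 0;

    /* A zero entry typically marks an unused ordinal (a gap in the numbering). */
    return rd32(image_base + dir->AddressOfFunctions + (uint64_t)index * 4);
}
```

A nonzero result is added to image_base to get the function's address, ideally after confirming that the RVA is not a forwarder.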
Finding an Export by Name
This is slightly more involved, as we now need to use two parallel tables to find the proper index into the AddressOfFunctions table. Also, instead of only considering the NumberOfFunctions value, we must also consider NumberOfNames, which indicates the number of functions in the binary that are exported by name.
The two tables we must consider here are the AddressOfNames table, an array of 32-bit RVAs pointing to the names of all functions exported by name, and the AddressOfNameOrdinals table, an array of 16-bit values that maps each name to its slot in the AddressOfFunctions table. Despite the table's name, these values are not biased by Base the way true ordinals are: they can be used directly as indices into AddressOfFunctions (adding Base to one yields the export's actual ordinal). The two tables are parallel to each other, meaning that the string found at the RVA stored at index 5 of the AddressOfNames table has its index stored at index 5 of the AddressOfNameOrdinals table. We can illustrate this as follows:
PIMAGE_EXPORT_DIRECTORY export_dir = NULL;
uint32_t* addr_of_fns = NULL;
uint32_t* addr_of_names = NULL;
// Remember, as these are 16-bit values,
// this table's entries are smaller!
uint16_t* addr_of_name_ords = NULL;
uint16_t offset = 0;
uint32_t i = 0;
void(*exported_method)(void) = NULL;

// Find the export directory from its DataDirectory entry
// (image_base, nt and search_name come from the surrounding code)
export_dir = (PIMAGE_EXPORT_DIRECTORY)(image_base +
    nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
addr_of_names = (uint32_t*)(image_base + export_dir->AddressOfNames);
addr_of_name_ords = (uint16_t*)(image_base + export_dir->AddressOfNameOrdinals);
addr_of_fns = (uint32_t*)(image_base + export_dir->AddressOfFunctions);

for(; i < export_dir->NumberOfNames; i++) {
    // Find the name using its RVA
    const char* current_name = (const char*)(image_base + addr_of_names[i]);
    // Check to see if this is the function we are looking for!
    if(0 == strcmp(current_name, search_name)) {
        // The parallel entry is already an index into AddressOfFunctions;
        // unlike a true ordinal, Base must NOT be subtracted from it
        offset = addr_of_name_ords[i];
        if(offset >= export_dir->NumberOfFunctions)
            return LoaderFailed;
        // Find the exported method
        exported_method = (void(*)(void))(image_base + addr_of_fns[offset]);
        // Invoke!
        exported_method();
        return LoaderSuccess;
    }
}

// The name we are looking for isn't in the export table
return LoaderNotFound;
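The PE format also requires the AddressOfNames table to be sorted in ascending order, which permits a binary search instead of a linear scan. The sketch below assumes the image honors that (a malformed or hand-crafted image may not, in which case the linear scan is the safer choice), adds bounds checks on all three tables and on each name string, and uses the AddressOfNameOrdinals entry directly as an index into AddressOfFunctions. As before, export_dir_t is a stand-in for IMAGE_EXPORT_DIRECTORY, the helper names are hypothetical, and the result is an RVA with 0 meaning "not found".

```c
#include <stdint.h>
#include <string.h>

/* Stand-in mirroring IMAGE_EXPORT_DIRECTORY (DWORD -> uint32_t, WORD -> uint16_t). */
typedef struct {
    uint32_t Characteristics, TimeDateStamp;
    uint16_t MajorVersion, MinorVersion;
    uint32_t Name, Base, NumberOfFunctions, NumberOfNames;
    uint32_t AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals;
} export_dir_t;

/* Unaligned reads; assume a little-endian host, as on Windows. */
static uint32_t rd32(const unsigned char* p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint16_t rd16(const unsigned char* p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }

/* Hypothetical helper: binary-search the sorted name table for `name`. */
static uint32_t export_rva_by_name(const unsigned char* image_base, uint32_t image_size,
                                   const export_dir_t* dir, const char* name)
{
    uint32_t lo = 0, hi = dir->NumberOfNames;

    /* All three tables must lie inside the image before we index them. */
    if ((uint64_t)dir->AddressOfNames + (uint64_t)dir->NumberOfNames * 4 > image_size ||
        (uint64_t)dir->AddressOfNameOrdinals + (uint64_t)dir->NumberOfNames * 2 > image_size ||
        (uint64_t)dir->AddressOfFunctions + (uint64_t)dir->NumberOfFunctions * 4 > image_size)
        return 0;

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        uint32_t name_rva = rd32(image_base + dir->AddressOfNames + (uint64_t)mid * 4);
        int cmp;

        /* The name must start inside the image and be NUL-terminated there. */
        if (name_rva >= image_size ||
            memchr(image_base + name_rva, '\0', image_size - name_rva) == NULL)
            return 0;

        cmp = strcmp(name, (const char*)(image_base + name_rva));
        if (cmp == 0) {
            /* Parallel entry: already an index, so Base is NOT subtracted. */
            uint16_t index = rd16(image_base + dir->AddressOfNameOrdinals + (uint64_t)mid * 2);
            if (index >= dir->NumberOfFunctions)
                return 0;
            return rd32(image_base + dir->AddressOfFunctions + (uint64_t)index * 4);
        }
        if (cmp < 0)
            hi = mid;      /* name sorts before this entry */
        else
            lo = mid + 1;  /* name sorts after this entry */
    }
    return 0;
}
```

Because strcmp compares bytes as unsigned char, its ordering matches the byte-wise ascending order the name table is expected to use.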
Considerations
Missing Pieces
As mentioned at the beginning of this article series, while this will definitely get us execution, it is far from comprehensive. The built-in Windows loading facilities perform many additional steps, and many of those vary greatly between OS versions and service packs. Features such as support for Windows exceptions, registration in the loader's own internal structures (such as the module lists in the PEB), and the like have largely been left as an exercise for the reader (as that topic could certainly fill a book in its own right!). All of that said, it is a worthwhile exercise to spend some time poking around those routines to see how they operate and differ between releases, and even between architectures.
API and ABI Definition
One major thing to consider is how our binaries will be packaged, loaded, and used. This has implications not only for simple questions, such as "should my tool exist as an exe or a dll?", but also for how the tool should interface with others: do we care if it only works with our custom loader? Does it need to be usable with LoadLibrary or rundll32 (and similar tools)? Questions like these help determine how best to structure the last leg of the loader: how we launch modules, whether we want to support both full executables and DLLs or just DLLs (for example), and whether we want to manage loading third-party libraries or only the modules we have written ourselves. A few good reference projects also exist that demonstrate what all of that means a bit more concretely, though a deep analysis of the tradeoffs (and how those techniques compare to the one described in this series) is probably a good topic for another article.