Writing a Windows Loader (Part 3)

Continuation of Writing a Windows Loader (Part 2).

Entry Point Invocation

At this point, we are finally able to invoke the entry point of our loaded binary! One important consideration is that the entry points for DLLs and regular executables have slightly different signatures. Their respective C runtimes also treat them slightly differently: most executables fall through to ExitProcess once execution finishes, while DLLs expect their entry point (and eventually DllMain) to be invoked several times. A DLL expects one call with "process attach" when it is loaded, a "thread attach" call for each thread created after that, a matching "thread detach" call as each of those threads exits, and a final "process detach" call on unload.

So how should our entry point look? Keep in mind that this is not exactly DllMain or main as implemented by the end developer; the real entry point is typically supplied by the CRT and performs some special setup before eventually invoking the function written by the developer. The main difference here, really, is in the number of arguments each takes. The DLL entry point takes three parameters: an HINSTANCE (which is really just a pointer to the beginning of the DLL in memory), a "reason" indicating whether we are calling it for "process attach", "thread attach", and so on, and a void pointer that is nominally reserved (the Windows loader itself only uses it to signal details such as whether the load is static or dynamic). Altogether, it looks something like the following:

BOOL(WINAPI *dll_entry)(HINSTANCE hInst, DWORD dwReason, LPVOID lpReserved);

It is worth noting that some reflective loader implementations have historically used this third parameter to pass along additional information, such as configuration data, but in general this is bad practice for a number of reasons. First, while we may have no issues with our custom loader, DllMain is in general a very restrictive environment: the Windows loader holds locks while invoking it, and some operations may lead to deadlocks or other bad behavior. Second, it may leave us with a DLL that simply cannot be loaded by LoadLibrary in the traditional fashion, as we have no way to provide that third parameter when we do not load the DLL ourselves, short of adding additional dwReason values and invoking DllMain after the fact. While that last approach is certainly an option, we will talk a bit later about some alternatives.
The entry point for regular executables, conversely, does not take any arguments. At some point between its beginning and the invocation of the developer-supplied main (or wmain, WinMain, etc), the CRT-supplied entry point performs the appropriate setup steps to get the program command line and so on, and passes those along to the developer-supplied function. All told, it looks something like this:

void(WINAPI *exe_entry)(void);

After all of this setup, actually locating the entry point is fairly trivial: the AddressOfEntryPoint field of the optional header provides the RVA of the binary's entry point. Putting this all together, we can construct an invoke_entry function as follows:

static LoaderStatus invoke_entry(unsigned char* image_base,
                                 uint32_t size,
                                 PIMAGE_NT_HEADERS nt)
{
    union {
      void* p; // Our initial value - we will have to check if this is a DLL or not!
      BOOL(WINAPI *dll_entry)(HINSTANCE, DWORD, LPVOID);
      void(WINAPI *exe_entry)(void);
    } u;

    (void)size; // Not needed just to call the entry point

    // First, we find the location of our entry point
    // using the RVA and optional header
    u.p = image_base + nt->OptionalHeader.AddressOfEntryPoint;

    if(nt->FileHeader.Characteristics & IMAGE_FILE_DLL) {
       // The loading thread only sees DLL_PROCESS_ATTACH;
       // DLL_THREAD_ATTACH is for threads created after the load.
       if(!u.dll_entry((HINSTANCE)image_base, DLL_PROCESS_ATTACH, NULL))
           return LoaderFailed;
    } else {
       // Most executables never return from here; they end in ExitProcess
       u.exe_entry();
    }

    return LoaderSuccess;
}
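
The size parameter of invoke_entry goes unused. One thing it could be used for is a sanity check on the entry point RVA before calling through it. The following is a minimal sketch of that idea, not part of the loader above: classify_entry, EntryKind and SKETCH_IMAGE_FILE_DLL are hypothetical names, the constant is redeclared locally so the snippet builds without windows.h, and image_size is assumed to be the size of the mapped image. A DLL is allowed to have no entry point at all, in which case AddressOfEntryPoint is zero. Treating a zero RVA in an executable as invalid is a conservative choice made for this sketch.

```c
#include <stdint.h>

/* Mirrors the winnt.h value of IMAGE_FILE_DLL so the sketch
   builds without <windows.h>. */
#define SKETCH_IMAGE_FILE_DLL 0x2000u

typedef enum {
    EntryNone,    /* DLL with no entry point: nothing to call        */
    EntryDll,     /* call as BOOL(WINAPI*)(HINSTANCE, DWORD, LPVOID)  */
    EntryExe,     /* call as void(WINAPI*)(void)                      */
    EntryInvalid  /* RVA is missing or outside the mapped image       */
} EntryKind;

/* Hypothetical helper: decide how (and whether) to call the entry point.
   entry_rva       = OptionalHeader.AddressOfEntryPoint
   characteristics = FileHeader.Characteristics
   image_size      = size of the mapped image in bytes (assumed) */
static EntryKind classify_entry(uint32_t entry_rva,
                                uint16_t characteristics,
                                uint32_t image_size)
{
    int is_dll = (characteristics & SKETCH_IMAGE_FILE_DLL) != 0;

    if (entry_rva == 0)
        return is_dll ? EntryNone : EntryInvalid;
    if (entry_rva >= image_size)
        return EntryInvalid;
    return is_dll ? EntryDll : EntryExe;
}
```

A caller would skip the call entirely for EntryNone, report a failure for EntryInvalid, and otherwise pick the matching member of the union.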

And with that, we have execution!

Finding Exports

Now that we have built a loader and gained execution, we will talk briefly about finding exports manually. This was omitted earlier (in favor of LoadLibrary and GetProcAddress) because of edge cases, such as forwards to other DLLs and circular dependencies, that the built-in Windows tooling handles but custom solutions may struggle with. Really, this is most useful for creating custom APIs (as we will discuss later on), as an alternative to repurposing the arguments of a DLL's entry point.
The export directory is referenced by the DataDirectory entry IMAGE_DIRECTORY_ENTRY_EXPORT, which gives us access to the export structure, an IMAGE_EXPORT_DIRECTORY:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;
    DWORD   AddressOfNames;
    DWORD   AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

For our purposes, we mostly care about the bottom six members of the structure. Essentially, the AddressOfXxxx structure members serve to provide RVAs to a number of tables that we will use to locate the actual function inside of the binary. The ultimate goal is to find the correct offset into the AddressOfFunctions table, which provides just that - the RVA of the exported function. We'll first consider the easier case - finding an export by ordinal.

Finding an Export by Ordinal

Ordinals are a simple case, as they almost provide a direct offset into the AddressOfFunctions table. I say almost, because two things exist that we must consider:

In C and C++, array indexes start at 0, while ordinal values start at 1.
There is no guarantee that the first ordinal value exported is 1 - it could be any value from 1-65535.

So how do we convert this ordinal value to a suitable index? As it turns out, the Base field of the structure provides the answer: assuming the ordinal we are examining is well-behaved, we can simply subtract the Base value from it, and the result is a direct index into the AddressOfFunctions table. As always, it is important to consider bad behavior as well, so well-formed export-finding code should handle edge cases such as ordinal values smaller than the base, or values that otherwise result in indices beyond the boundaries of the AddressOfFunctions table. We can check the latter against the NumberOfFunctions field, which tells us definitively how many entries the table contains. To illustrate this (sans error checking), consider the following example:

PIMAGE_EXPORT_DIRECTORY export_dir = NULL;
uint32_t* addr_of_fns = NULL;
uint16_t  offset = 0;
void(*exported_method)(void) = NULL;

// Find export directory from its DataDirectory entry
// (nt is the PIMAGE_NT_HEADERS of the loaded image)
export_dir = (PIMAGE_EXPORT_DIRECTORY)(image_base +
    nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

addr_of_fns = (uint32_t*)(image_base + export_dir->AddressOfFunctions);
// We should check to make sure the ordinal value is bigger than base here!
offset = ordinal_value - export_dir->Base;

// We should check to make sure we aren't outside of the table!
// (that is, offset must be less than export_dir->NumberOfFunctions)
exported_method = (void(*)(void))(image_base + addr_of_fns[offset]);

// Invoke our found export!
exported_method();
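
The checks that the comments above ask for can be sketched as follows. This is only an illustration under stated assumptions: ExportDir is a local redeclaration with the same field layout as IMAGE_EXPORT_DIRECTORY (so the snippet builds without windows.h), range_ok and export_rva_by_ordinal are hypothetical names, and image_size is assumed to be the size of the mapped image. The function returns the export's RVA rather than calling it, and returns 0 on any failure, including an empty slot in the function table.

```c
#include <stdint.h>
#include <string.h>

/* Same field layout as IMAGE_EXPORT_DIRECTORY, redeclared so the
   sketch builds without <windows.h>. */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;
    uint32_t Base;
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;
    uint32_t AddressOfNames;
    uint32_t AddressOfNameOrdinals;
} ExportDir;

/* Returns 1 if [rva, rva + len) lies inside the mapped image. */
static int range_ok(uint32_t rva, uint64_t len, uint32_t image_size)
{
    return rva <= image_size && len <= (uint64_t)image_size - rva;
}

/* Hypothetical helper: RVA of the export with the given ordinal,
   or 0 if the ordinal or any table involved is out of bounds. */
static uint32_t export_rva_by_ordinal(const unsigned char *image_base,
                                      uint32_t image_size,
                                      uint32_t export_dir_rva,
                                      uint32_t ordinal)
{
    ExportDir dir;
    uint32_t index, fn_rva;

    if (!range_ok(export_dir_rva, sizeof dir, image_size))
        return 0;
    memcpy(&dir, image_base + export_dir_rva, sizeof dir);

    /* Ordinal must not be smaller than the base... */
    if (ordinal < dir.Base)
        return 0;
    /* ...and the resulting index must fall inside the table. */
    index = ordinal - dir.Base;
    if (index >= dir.NumberOfFunctions)
        return 0;
    if (!range_ok(dir.AddressOfFunctions,
                  (uint64_t)dir.NumberOfFunctions * 4, image_size))
        return 0;

    memcpy(&fn_rva, image_base + dir.AddressOfFunctions + (size_t)index * 4,
           sizeof fn_rva);

    /* A zero entry marks an unused slot between two ordinals. */
    if (fn_rva == 0 || fn_rva >= image_size)
        return 0;
    return fn_rva;
}
```

Reading the fields with memcpy avoids assuming anything about the alignment of the tables inside the image; a loader that trusts its input could cast as the example above does.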

Finding an Export by Name

This is slightly more involved, as we now need to use two parallel tables to find the proper index into the AddressOfFunctions table. Also, instead of considering only the NumberOfFunctions value, we must also consider NumberOfNames, which indicates the number of functions in the binary that are exported by name.
The two tables to consider first are the AddressOfNames table, a table of 32-bit values that serve as the RVAs of the strings naming each method exported by name, and the AddressOfNameOrdinals table, a table of 16-bit values that maps each name to its slot in the AddressOfFunctions table. Despite the table's name, these values are not full ordinals: they are already unbiased indexes, so unlike the by-ordinal lookup we must not subtract Base from them (adding Base to one yields the export's ordinal). The two tables are parallel to each other, meaning that the string found at the RVA at index 5 of the AddressOfNames table has its function index stored at index 5 of the AddressOfNameOrdinals table. We can illustrate this as follows:

PIMAGE_EXPORT_DIRECTORY export_dir = NULL;
uint32_t* addr_of_fns = NULL;
uint32_t* addr_of_names = NULL;
// Remember, as these are 16 bit values,
// this table is smaller!
uint16_t* addr_of_name_ords = NULL;
uint16_t  offset = 0;
uint32_t  i = 0;
void(*exported_method)(void) = NULL;

// Find export directory from its DataDirectory entry
// (nt is the PIMAGE_NT_HEADERS of the loaded image)
export_dir = (PIMAGE_EXPORT_DIRECTORY)(image_base +
    nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

addr_of_names = (uint32_t*)(image_base + export_dir->AddressOfNames);
addr_of_name_ords = (uint16_t*)(image_base + export_dir->AddressOfNameOrdinals);
addr_of_fns = (uint32_t*)(image_base + export_dir->AddressOfFunctions);

for(; i < export_dir->NumberOfNames; i++) {
    // Find the name using its RVA
    const char* current_name = (const char*)(image_base + addr_of_names[i]);

    // Check to see if this is the function we are looking for!
    if(0 == strcmp(current_name, search_name)) {
        // The value stored here is already an index into
        // AddressOfFunctions, so Base is NOT subtracted
        offset = addr_of_name_ords[i];

        // Find the exported method
        // (we should check offset against NumberOfFunctions here!)
        exported_method = (void(*)(void))(image_base + addr_of_fns[offset]);

        // Invoke!
        exported_method();
        return LoaderSuccess;
    }

}

// The name we are looking for isn't in the export table
return LoaderNotFound;
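
One of the edge cases mentioned at the start of this section, forwards to other DLLs, is easy to detect even if we choose not to resolve it: when an entry in the AddressOfFunctions table points back inside the export directory itself, it is the RVA of a string such as "OTHERDLL.Function" rather than of code. The sketch below shows the name lookup with that check added. It is a rough illustration only: ExportDir is a local redeclaration with the same layout as IMAGE_EXPORT_DIRECTORY, find_export_by_name and ExportKind are hypothetical names, dir_rva and dir_size are assumed to be the VirtualAddress and Size of the IMAGE_DIRECTORY_ENTRY_EXPORT entry, and bounds checks on the tables are left out for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Same field layout as IMAGE_EXPORT_DIRECTORY, redeclared so the
   sketch builds without <windows.h>. */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;
    uint32_t Base;
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;
    uint32_t AddressOfNames;
    uint32_t AddressOfNameOrdinals;
} ExportDir;

typedef enum { ExportNotFound, ExportCode, ExportForwarder } ExportKind;

/* Hypothetical helper: look an export up by name and report whether
   its RVA points at code or at a forwarder string. */
static ExportKind find_export_by_name(const unsigned char *image_base,
                                      uint32_t dir_rva, uint32_t dir_size,
                                      const char *search_name,
                                      uint32_t *out_rva)
{
    ExportDir dir;
    uint32_t i;

    memcpy(&dir, image_base + dir_rva, sizeof dir);

    for (i = 0; i < dir.NumberOfNames; i++) {
        uint32_t name_rva, fn_rva;
        uint16_t index;

        memcpy(&name_rva, image_base + dir.AddressOfNames + i * 4u, 4);
        if (strcmp((const char *)(image_base + name_rva), search_name) != 0)
            continue;

        /* Already an index into AddressOfFunctions: no Base subtraction. */
        memcpy(&index, image_base + dir.AddressOfNameOrdinals + i * 2u, 2);
        if (index >= dir.NumberOfFunctions)
            return ExportNotFound;

        memcpy(&fn_rva, image_base + dir.AddressOfFunctions + index * 4u, 4);
        *out_rva = fn_rva;

        /* An RVA inside the export directory is a forwarder string. */
        if (fn_rva >= dir_rva && fn_rva - dir_rva < dir_size)
            return ExportForwarder;
        return ExportCode;
    }
    return ExportNotFound;
}
```

What to do with a forwarder is a design decision: a loader that only ever resolves exports from its own modules can reasonably treat it as an error, while a more general one would have to load the named DLL and repeat the lookup there.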

Considerations

Missing Pieces

As mentioned at the beginning of this article series, while this will definitely get us execution, it is far from comprehensive. The built-in Windows loading facilities perform many additional steps, and many of those vary greatly between OS versions and service packs. Features such as support for Windows exceptions and inclusion in the many loader-specific structures have largely been left as an exercise for the reader (that topic could certainly fill a book in its own right!). All of that said, it is a worthwhile exercise to spend some time poking around those facilities to see how they operate and how they differ between releases, and even between architectures.

API and ABI Definition

One major thing to consider is how our binaries will be packaged, loaded, and utilized. This has implications for simple questions, such as "should my tool exist as an exe or a dll?", but it also matters when thinking about how the tool should interface with others: do we care if it only works with our custom loader? Does it need to be usable with LoadLibrary or rundll32 (and similar tools)? The answers help determine how best to structure the last leg of the loader: how we launch modules, whether we want to support both full executables and DLLs or only DLLs, and whether we want to manage loading third-party libraries or only modules we have written ourselves. A few good reference projects also exist that demonstrate what all of that means a bit more concretely, though a deep analysis of the tradeoffs (and how those techniques compare to the one described in this series) is probably a good topic for another article.