Beginning Kernel Development
Understanding Kernel Development
If you're here, right now, you are likely to be in one of two camps: you know something about kernels in the context of operating systems, and want to know more about working on them, or you don't really have a good understanding of why kernel development is even something you should care about.
If you've got a reasonable understanding of the what, you can safely skip over the rest of this section, and jump straight to "Before We Begin". Otherwise, however, we will briefly speak on the subject (and why it's something you should even be bothered with). Long ago, in the early days of computing (and even well into the 1980s) operating systems didn't do nearly as much as they do today (outside of big terminal server-style environments, at least). When considering commodity consumer-facing OSes, such as MS-DOS, a program ran in what is now called "real mode" (in x86 terminology), and had full access to all system resources - it could directly speak with hardware, interface with the BIOS, and read/write to any memory location it desired. Code execution generally meant that you could do anything you wanted (this is still also true for many embedded systems).
Over time, these operating systems became more robust, and there was demand for a user to be able to run multiple programs at once, and even to allow multiple users to run programs at the same time. This of course created some complications: How could you restrict access between users/processes if all of the programs are able to do whatever they want? If one program crashes, how can we make sure that it doesn't take down the entire system? Modern (for at least the last 30 years) CPUs provided some interesting features for this: a set of "layers" of privilege that could allow the operating system to apply restrictions to programs that run on the computer, and prevent those programs from taking privileged actions, like changing memory that belongs to the operating system itself (or the memory of other processes), executing some instructions, or modifying specific registers (among other things).
This separation of privileges now allows us to create a hardware-enforceable split between things that belong to the "Operating System" (Kernel Mode), and "User Applications" (User Mode). Note that we are simplifying a bit here - the x86 model specifies 4 different "rings" of privilege (as an example), and has some additional "hidden" modes (like SMM, or System Management Mode) used by firmware, but those are either mostly unused, used only for niche cases (some virtualization engines, for example), or not generally directly usable once the system finishes booting. In any case, we will mostly be focused on User and Kernel mode for this discussion, where User Mode may also be referred to as ring 3, and Kernel Mode can also be called ring 0 (this "ring" terminology is an x86 and x64-specific thing, but ARM and most other modern architectures have similar constructs).
User Mode vs. Kernel Mode
If you have written an application in any language that has run on a Windows, Linux, or macOS system, you've most likely developed a User Mode application. In User Mode, many separate programs can run, and all of them will generally have their own memory space (some elements of those spaces may be shared, but we will discuss that later!). These programs are no longer allowed to directly access most hardware resources; instead, they must communicate with the kernel through system calls, and the kernel will interface with hardware on their behalf. Consider the following code snippet:
#include <stdio.h>

int main(int argc, char** argv)
{
    /* printf is provided by the C library and declared in <stdio.h> */
    printf("Hello World\n");
    return 0;
}
This program, when compiled and executed, will be linked with a C library, which will handle the heavy lifting of actually calling down into the kernel to print to screen (eventually).
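To make that hand-off a little more visible, here is a minimal sketch that skips printf and calls the POSIX write() wrapper directly, which is roughly one step closer to the system call the C library eventually makes on Linux or macOS. The helper name say is our own invention for illustration, and this assumes a POSIX system (it will not build as-is on Windows).

```c
#include <string.h>
#include <unistd.h>   /* write(), STDOUT_FILENO (POSIX) */

/*
 * Hypothetical helper: send a string to standard output using the
 * write() system call wrapper, bypassing stdio buffering entirely.
 * Returns the number of bytes written, or -1 on error.
 */
long say(const char *msg)
{
    return (long)write(STDOUT_FILENO, msg, strlen(msg));
}
```

Calling say("Hello World\n") from main has the same visible effect as the printf version; the difference is that the buffering and formatting layers of the C library are no longer in the way.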
So what kind of programs run in the kernel? Essentially, anything that needs to operate with elevated privilege - some common types might include device drivers, antivirus software, and programs that handle filesystem operations, along with many others. One big word of warning around kernel development: it is complicated, heavy-weight, and the software that runs in the kernel must be very stable. Mistakes in the kernel will often result in system stability issues (if not outright kernel panics or bugchecks).
Before We Begin
This article serves as a precursor to a few short series on writing drivers and kernel modules. The topic (in general) is a rather advanced one, so this will not be a good "introductory" project for someone unfamiliar with software development and basic OS concepts.
While there are many architectural and stylistic differences in kernel development from platform to platform, a lot of the basic concepts will translate well between them (due in large part to the fact that they are generally built to operate on the same sorts of hardware). Before we dive in, however, it is important to go over a few quick prerequisites for the article:
- C Programming - unfortunately, there's not really a good way around this. While Windows does have some support for C++ in the kernel, and Linux is working to incorporate Rust, these languages add a great number of additional complexities and corner cases that we won't dive into at this point. Due to the way the kernel environment works, you will likely find this journey to be a slog if you don't have a solid understanding of C (including pointers!).
- Basic OS Fundamentals - Scheduling, paging, multiprocessing, etc., etc., are all exposed in a very different way when operating within the kernel.
- Some Basic Computer Architecture Knowledge - there's no expectation of knowing how to design a CPU from scratch, but understanding a bit about the general composition of modern processors will be very helpful here.
Basic Concepts
We will start by working through some basic concepts in the abstract, which we will discuss in terms of concrete implementation when we get to a specific OS/kernel implementation.
The Kernel and Processes
One important general concept to keep in mind with kernel development is that it implies a very different concept of process than what you are likely used to dealing with in user mode. Instead of being sandboxed to a single process with a range of memory addresses that you are able to uniquely access, the kernel generally exists in every process (usually in the top part of memory that is inaccessible from user mode).
This means that when you begin execution, there are not generally strong guarantees about what exists in the user-mode section of memory (with some caveats), and the kernel-mode portion of memory is shared with all other pieces of kernel mode software, including the operating system kernel itself, other device drivers, etc.
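As a rough illustration of that split, the sketch below classifies an address as belonging to the lower (user) or upper (kernel) half of the address space. It assumes x86-64 with 48-bit canonical addresses and a kernel mapped in the upper half; the constants and function names are ours, and real systems with other address widths or layouts will differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64, 48-bit canonical addresses, kernel in the upper half. */
#define USER_HALF_END     UINT64_C(0x00007FFFFFFFFFFF)
#define KERNEL_HALF_START UINT64_C(0xFFFF800000000000)

/* True if addr falls in the lower canonical half (user-mode range). */
bool is_user_half(uint64_t addr)
{
    return addr <= USER_HALF_END;
}

/* True if addr falls in the upper canonical half (kernel-mode range). */
bool is_kernel_half(uint64_t addr)
{
    return addr >= KERNEL_HALF_START;
}
```

Addresses between the two ranges are non-canonical under this assumption, so both functions return false for them.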
Preemption
When running a user mode program, the threads of that program can be preempted to do other work (more on this later) when certain events occur, or may be scheduled out to allow other threads to run. Similarly, kernel-mode threads can also be preempted - but how and when is a little more complicated.
While the system is running, the processor must service a great number of interrupts that continually occur under the hood. These might be triggered by a user pressing a key on a keyboard, clicking with a mouse, a clock tick, or even by said user trying to access memory that has been paged to disk, so that a bit of code within the operating system can find and page in the requested memory.
When these interrupts happen, the processor will invoke a callback function registered by the operating system, which will then handle the interrupt that fired before normal execution can resume. Since these interrupts are happening all the time in your system, and any significant slowdown in servicing them would cause noticeable delays, these Interrupt Service Routines (ISRs) must execute quickly and return control. In order to keep things from getting out of hand, interrupts are most often turned off while code written by a third-party developer executes under one of those ISRs.
What this means to you as a kernel developer is that your program may be interrupted at any point, and if you are operating within the context of an ISR (i.e., interrupts are disabled), there is a whole host of operations that you cannot perform: anything that would cause an interrupt to be generated (like accessing paged-out memory) will generally lead to a kernel panic or bugcheck, and you can't use anything that would result in your program "sleeping" (e.g., taking a mutex). Some platforms (like Windows) will further subdivide this into a series of levels, which complicates things even further (since each level has more restrictions placed upon it).
Paging and Memory
Another important thing to keep in mind is that while operating in the context of the kernel, you must be aware of whether the memory you are accessing is pageable or not. As discussed above, if you are operating in a context where interrupts are disabled (or on Windows, at IRQL DISPATCH_LEVEL or above), you simply cannot touch paged-out memory.
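The rule can be written down as a tiny user-mode model. The numeric values below mirror the three lowest Windows IRQLs (PASSIVE_LEVEL, APC_LEVEL, and DISPATCH_LEVEL), but the enum and the helper function are stand-ins of our own, not the WDK's KIRQL type or any real kernel API.

```c
#include <stdbool.h>

/*
 * Toy model of the lowest Windows IRQLs. The values match the documented
 * ones, but this enum is illustrative only and is not the WDK's KIRQL.
 */
enum irql {
    PASSIVE_LEVEL  = 0,  /* ordinary thread execution */
    APC_LEVEL      = 1,  /* asynchronous procedure calls */
    DISPATCH_LEVEL = 2   /* scheduler and deferred procedure calls */
};

/*
 * Hypothetical helper: pageable memory may only be touched below
 * DISPATCH_LEVEL, because resolving a page fault requires waiting.
 */
bool may_touch_pageable(enum irql current)
{
    return current < DISPATCH_LEVEL;
}
```

In a real Windows driver the same idea usually shows up as an assertion at the top of a function that touches pageable code or data, so that a mistake is caught in testing rather than as a bugcheck in the field.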
Partly as a side effect of this, and partly for security reasons, user-mode threads generally have two different stacks: a user-mode stack, which is pageable and can grow, and a kernel stack, which is usually small and fixed-size, and must always be paged in. What this means to you, as a kernel developer, is that you must be very careful of how much stack space you take up; exceeding the small amount of memory you have available on the stack (typically just a few pages) generally ends in disaster!
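One common habit that follows from this is to keep large temporary buffers off the stack. The sketch below is ordinary user-mode C, with malloc and free standing in for whatever allocator your kernel provides; the function name and the SCRATCH_SIZE value are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scratch size: far more than we would want on a kernel stack. */
#define SCRATCH_SIZE 8192

/*
 * Sums the bytes in data, staging them through a scratch buffer.
 * Returns 0 on success, or -1 if the scratch buffer cannot be allocated.
 */
int sum_bytes(const unsigned char *data, size_t len, unsigned long *out)
{
    /* Avoid: unsigned char scratch[SCRATCH_SIZE];  (8 KiB of stack) */
    unsigned char *scratch = malloc(SCRATCH_SIZE);
    unsigned long total = 0;

    if (scratch == NULL)
        return -1;

    while (len > 0) {
        size_t chunk = len < SCRATCH_SIZE ? len : SCRATCH_SIZE;

        memcpy(scratch, data, chunk);
        for (size_t i = 0; i < chunk; i++)
            total += scratch[i];
        data += chunk;
        len -= chunk;
    }

    free(scratch);
    *out = total;
    return 0;
}
```

The price is an allocation that can fail, so the caller has to handle the error path, which is a trade kernel code makes all the time.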
In summary, you need to be very cautious of what memory you touch, and how it is used. This is one of the big reasons more modern (and ergonomic) languages add some additional complexities in the kernel: when your language handles allocations for you, it may impose substantial limitations, since you might need to allocate from different pools of memory depending on a variety of things. Additionally, if the compiler has more freedom to utilize the stack, it might exceed the small amounts you are able to access in the kernel context. Finally, more modern languages often implicitly add indirect memory references (from things like virtual functions), and some compiler behaviors that are harmless in user mode, such as use of the red zone, can wreak havoc if not disabled.
Doing Work Later
One other important concept (which has slightly different implementations from OS to OS) that operating systems apply in the context of kernel operations is to defer tasks till later. This is especially important when discussing things like ISRs, due to the fact that disabling interrupts is a very disruptive operation, and we want to return to normal as quickly as possible. In order to facilitate this, and also to be able to access resources we might otherwise be unable to access while running in the context of an ISR (e.g., paged-out memory, etc.), most OS kernels support mechanisms for deferring work until later.
Some examples of this are so-called "bottom halves" and related mechanisms in Linux, and deferred procedure calls in Windows. Both mechanisms allow operations that might be longer running, or require more resource allocations, to be queued for execution at a later time.
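The general shape of these mechanisms can be sketched in plain C as a small queue of function pointers: the "interrupt" side only records what needs doing, and a later pass actually does it. Everything here (the names, the fixed queue size, the single-threaded design) is a simplification of our own; real kernels add per-CPU queues, locking, and scheduling on top.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SLOTS 8   /* hypothetical capacity, kept a power of two */

typedef void (*work_fn)(void *ctx);

struct work_item {
    work_fn fn;    /* function to run later */
    void   *ctx;   /* argument passed to fn */
};

static struct work_item queue[QUEUE_SLOTS];
static size_t head;   /* index of the next item to run */
static size_t tail;   /* index of the next free slot */

/* Called from the time-critical path: just record the work and return. */
bool defer_work(work_fn fn, void *ctx)
{
    if (tail - head == QUEUE_SLOTS)
        return false;   /* queue is full */
    queue[tail % QUEUE_SLOTS] = (struct work_item){ fn, ctx };
    tail++;
    return true;
}

/* Called later, in a less restricted context: run everything queued. */
size_t run_deferred_work(void)
{
    size_t ran = 0;

    while (head != tail) {
        struct work_item item = queue[head % QUEUE_SLOTS];

        head++;
        item.fn(item.ctx);
        ran++;
    }
    return ran;
}

/* Example deferred handler: counts how many times it was invoked. */
void count_event(void *ctx)
{
    (*(int *)ctx)++;
}
```

Note that nothing runs at the moment defer_work is called; the handler only executes once run_deferred_work drains the queue, which is the whole point of the pattern.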
Synchronization and Sleeping
Finally, the kernel is an environment where parallel execution is often implicit. In user mode, we often have some measure of control over how operations occur; if we only declare a single thread (for example), we will never (or at least, in most cases) need to worry about other threads of execution attempting to manipulate our data. In the kernel, however, the story is a bit different - there are often situations where methods you create and register may be invoked many times concurrently across the operating system.
Consider, for example, the PsSetLoadImageNotifyRoutine routine from the Windows kernel API: if you register a callback with this function, any accesses to a shared resource must be synchronized, as it will be impossible to determine a priori how many times it might be called concurrently on systems your driver may get loaded on. Even worse, consider a situation where you have an ISR, and need to be able to operate on some bit of shared data - perhaps a global variable, or a hardware register that needs to be written to. Whatever synchronization mechanism you choose cannot cause the current thread to sleep. What this ultimately means is that, just like with memory accesses, you must be very cognizant of what contexts the functions you write may be called in, and choose synchronization primitives that are appropriate: spinlocks and perhaps atomics when sleeping is not permitted, and mutexes when it is.
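To show why a spinlock never sleeps, here is a minimal sketch built on C11 atomics: a waiter simply loops until the flag becomes free. The struct and function names are our own and this is not any kernel's actual implementation; real kernel spinlocks (such as those behind spin_lock_irqsave on Linux or KeAcquireSpinLock on Windows) also deal with interrupts, preemption, and fairness.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative spinlock: a single atomic flag, set while the lock is held. */
struct spinlock {
    atomic_flag locked;
};

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; returns true if we now hold it. */
bool spin_trylock(struct spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock is acquired. The caller never sleeps. */
void spin_lock(struct spinlock *lock)
{
    while (!spin_trylock(lock)) {
        /* spin: burn CPU cycles instead of blocking */
    }
}

/* Release the lock so that another waiter's test-and-set can succeed. */
void spin_unlock(struct spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Because a waiter burns CPU time for as long as the lock is held, the code between spin_lock and spin_unlock should be as short as possible, which is the opposite trade-off from a mutex.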