Writing a Loadable Kernel Module in C
⚠️ This article may not be 100% accurate; do not hesitate to report any mistakes.
Before starting
There are a few things we need to discuss before starting.
What is an LKM?
A Loadable Kernel Module is a chunk of code that can be inserted into a running Linux kernel, thus “extending” it. The Linux kernel includes many drivers to support various hardware; however, there are also “out-of-tree” drivers (outside the official kernel source code), which are built and loaded as LKMs. Many in-tree drivers can also be built as modules instead of being compiled into the kernel image.
This article gives multiple examples of what can be done with an LKM.
Be careful, mere mortal!
Unlike the applications you usually write, LKMs run in kernel mode, which means that there is basically zero protection against mistakes and that you have to be very very VERY careful not to fuck up. For example, memory your module forgets to free is not reclaimed when it is unloaded, and a single bad pointer can easily crash your whole system while you are testing your LKM. It can also lead to file system corruption or other kinds of bad stuff, so I recommend testing it in a virtual machine to be safe.
Also, you have no access to the standard C library, but no worries: the kernel provides some cool functions to help you out.
Install the headers
Before starting, you will need to install the Linux kernel headers (packaged under the name linux-headers, though it may be different in some distributions). You can then find the headers of the running kernel in /lib/modules/$(uname -r)/build/include.
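The release field that uname(2) fills in is the same string `uname -r` prints, so the headers path can also be computed from a program. A minimal userland sketch, assuming only POSIX uname(2) and access(2); kernel_headers_path and kernel_headers_installed are hypothetical helper names:

```c
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Build "/lib/modules/<release>/build/include", the C equivalent of
 * /lib/modules/$(uname -r)/build/include.
 * Returns 0 on success, -1 if uname() fails or `len` is too small. */
static int kernel_headers_path(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    int n = snprintf(buf, len, "/lib/modules/%s/build/include", u.release);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 if the headers directory exists and is readable, 0 otherwise. */
static int kernel_headers_installed(void)
{
    char path[512];
    return kernel_headers_path(path, sizeof(path)) == 0 && access(path, R_OK) == 0;
}
```

A build script could, for instance, print a hint to install linux-headers when kernel_headers_installed() returns 0.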
Getting started
A minimal LKM
Let’s start by writing a minimal LKM.
```c
// mylkm.c
// Prepend the module name when using pr_XXX functions.
// pr_fmt must be defined before any #include so that printk.h picks it up.
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>

static int __init lkm_init(void)
{
    pr_info("Hello World!\n");
    return 0; // a negative error code here would make insmod fail
}

static void __exit lkm_exit(void)
{
    pr_info("Bye\n");
}

module_init(lkm_init);
module_exit(lkm_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Just a simple LKM");
MODULE_AUTHOR("Nhqml");
```
As you may have noticed, there is no main function; this is because a kernel module is not an application. Instead, you have two functions:
- init, which is called when the module is loaded (e.g. by insmod)
- exit, which is called when the module is unloaded (e.g. by rmmod)
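The names of these functions can be whatever you want, as long as you respect the prototypes (int (void) for init, void (void) for exit) and register them with module_init and module_exit. Also, you do not need to worry much about the __init and __exit macros: they place the functions in special sections so that __init code can be freed once initialization is done, and __exit code can be dropped when the module is built into the kernel (since it can then never be unloaded).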
This simple module prints a message to the kernel log buffer (with log level KERN_INFO, which is what pr_info uses) when it is loaded and unloaded.
Let’s try this module. Compile it with this Makefile:
```makefile
# The module source must be named mylkm.c to match obj-m below.
# Recipe lines must start with a tab character.
ifeq ($(KERNELRELEASE),)

# Assume the source tree is where the running kernel was built
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
# The current directory is passed to sub-makes as argument
PWD := $(shell pwd)

modules:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules

modules_install:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules_install

clean:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean

.PHONY: modules modules_install clean

else
# Called from the kernel build system: just declare our module
obj-m := mylkm.o
endif
```
Running make produces multiple files; our LKM is the one with the .ko extension (here mylkm.ko).
Then watch the kernel ring buffer using dmesg -kH --follow --level=info and, as root, load your module using insmod mylkm.ko. You should see “Hello World!” popping up in the kernel log. When you unload your module (rmmod mylkm), you should see “Bye”.
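lsmod gets its list from /proc/modules, where each line starts with a module name followed by a space. As a sketch (module_is_loaded is a hypothetical helper, not a standard API), a program can check that insmod worked by scanning that file. Note that dashes in module file names show up as underscores there.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` appears as a module in /proc/modules, 0 if it does not,
 * and -1 if /proc/modules cannot be opened. */
static int module_is_loaded(const char *name)
{
    FILE *f = fopen("/proc/modules", "r");
    if (!f)
        return -1;

    char line[4096];
    size_t len = strlen(name);
    int found = 0;
    /* Each line looks like "name size refcount deps state address" */
    while (!found && fgets(line, sizeof(line), f))
        found = strncmp(line, name, len) == 0 && line[len] == ' ';
    fclose(f);
    return found;
}
```

After insmod mylkm.ko, module_is_loaded("mylkm") should return 1, and 0 again after rmmod mylkm.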
When inserting your module into the kernel, you will probably see something like:
loading out-of-tree module taints kernel.
module verification failed: signature and/or required key missing - tainting kernel
There is nothing to worry about; you can learn more about tainted kernels here.
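The taint state is exposed as a bit mask in /proc/sys/kernel/tainted. The two messages above correspond to the O flag (out-of-tree module, bit 12) and the E flag (unsigned module, bit 13). The sketch below decodes only those two plus P (proprietary module, bit 0); the table is deliberately partial and the helper names are hypothetical.

```c
#include <stdio.h>

struct taint_flag {
    unsigned bit;        /* bit number in /proc/sys/kernel/tainted */
    char letter;         /* letter used for this flag in kernel reports */
    const char *meaning;
};

/* Partial table: only the flags discussed here. */
static const struct taint_flag known_taints[] = {
    { 0,  'P', "proprietary module was loaded" },
    { 12, 'O', "externally-built (out-of-tree) module was loaded" },
    { 13, 'E', "unsigned module was loaded" },
};

#define N_TAINTS (sizeof(known_taints) / sizeof(known_taints[0]))

/* Write the letters of the known flags set in `mask` into `out`
 * (at least N_TAINTS + 1 bytes). Returns the number of letters. */
static size_t decode_taint(unsigned long mask, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < N_TAINTS; i++)
        if (mask & (1UL << known_taints[i].bit))
            out[n++] = known_taints[i].letter;
    out[n] = '\0';
    return n;
}

/* Print one line per known flag set in `mask`. */
static void print_taint(unsigned long mask)
{
    for (size_t i = 0; i < N_TAINTS; i++)
        if (mask & (1UL << known_taints[i].bit))
            printf("%c: %s\n", known_taints[i].letter, known_taints[i].meaning);
}

/* Read the current mask; returns -1 if the file cannot be read. */
static long read_taint(void)
{
    FILE *f = fopen("/proc/sys/kernel/tainted", "r");
    long mask = -1;
    if (f) {
        if (fscanf(f, "%ld", &mask) != 1)
            mask = -1;
        fclose(f);
    }
    return mask;
}
```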
Adding a misc device
As is, our module is pretty useless. Why would we take the risk of messing with our system in kernel land just to print a “Hello World!” message?
We will register a misc device in our module to make it (slightly) less useless.
A misc device is a simple character device, exposed as a special file in /dev, that acts as an interface between userland and kernel mode.
```c
// mylkm.c
// Prepend the module name when using pr_XXX functions.
// pr_fmt must be defined before any #include so that printk.h picks it up.
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>

static ssize_t device_read(struct file *f, char __user *to, size_t count, loff_t *off)
{
    pr_info("Read!\n");
    return 0; // 0 means end of file, so `cat` stops immediately
}

static ssize_t device_write(struct file *f, const char __user *from, size_t count, loff_t *off)
{
    // %zu is the printk format for size_t
    pr_info("Write (received %zu bytes)\n", count);
    return count; // report that every byte was consumed
}

static const struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .read = device_read,
    .write = device_write,
};

static struct miscdevice device = {
    .minor = MISC_DYNAMIC_MINOR, // let the kernel pick a free minor number
    .name = "miscdev",           // the device shows up as /dev/miscdev
    .fops = &device_fops,
};

static int __init lkm_init(void)
{
    int r = misc_register(&device);
    if (r < 0) {
        pr_err("Could not register misc device (Error %i)\n", r);
        return r;
    }
    pr_info("Module init succeeded\n");
    return 0;
}

static void __exit lkm_exit(void)
{
    misc_deregister(&device);
    pr_info("Module exit\n");
}

module_init(lkm_init);
module_exit(lkm_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Just a simple LKM");
MODULE_AUTHOR("Nhqml");
```
Here, our device defines two basic file_operations (see this):
- read, which is called whenever read(2) is called on our file (e.g. cat /dev/miscdev)
- write, which is called whenever write(2) is called on our file (e.g. echo something > /dev/miscdev)
These functions have four arguments:
- a pointer to struct file, representing our opened device file
- a buffer (char *), which is a userland buffer (that’s why we use the __user annotation), used to “receive” data from the kernel in the read function and to “send” data to the kernel in the write function
- the size of the buffer (size_t)
- a pointer to loff_t, the current offset (position) in the file
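Their return values follow read(2) and write(2): the number of bytes transferred, with 0 meaning end of file for read. The one difference is that errors are reported by returning a negative error code (e.g. -EFAULT) instead of -1.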
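From userland, these return values are what read(2) hands back: a positive byte count, 0 at end of file, or -1 with errno set to the negated error code (so a driver returning -EFAULT shows up as errno == EFAULT). Because a single read(2) may return fewer bytes than requested, callers typically loop, as in this sketch (read_all is a hypothetical helper):

```c
#include <errno.h>
#include <unistd.h>

/* Read up to `len` bytes from `fd`, calling read(2) repeatedly because one
 * call may return fewer bytes than requested. Stops at end of file (read
 * returns 0). Returns the number of bytes read, or -1 on error with errno
 * set. Calls interrupted by a signal (EINTR) are retried. */
static ssize_t read_all(int fd, char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n == 0)
            break;              /* end of file, e.g. our device_read returning 0 */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;          /* errno holds the driver's error, e.g. EFAULT */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```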
Once registered, the device shows up in /dev (the node is created automatically by devtmpfs/udev), so the user can interact with it; type ls -l /dev/miscdev to find it.
You may notice two numbers after the user and group info: they are the major and minor device numbers. In our case, the major number is 10, which is the major number for misc devices (see major.h for a list of major numbers). The minor number identifies the device within that category. We do not need to allocate it manually, so we use MISC_DYNAMIC_MINOR (in the .minor field of our miscdevice struct), which lets the kernel pick a free minor number and prevents any conflict with already registered devices (see a list here).
Our module now provides an interface between userland and the kernel through the misc device, using the read(2) and write(2) syscalls.
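A minimal userland sketch exercising both callbacks, assuming the module above is loaded and the program may open /dev/miscdev (device nodes are typically created root-only unless you set the .mode field of struct miscdevice or add a udev rule). poke_device is a hypothetical helper: with our module, write(2) should return strlen(msg), read(2) should return 0, and the kernel log should show the matching “Write” and “Read!” messages.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write `msg` to the file at `path`, then read from it once.
 * Stores the write(2) and read(2) return values in *written and *got.
 * Returns 0 on success, -1 if the file cannot be opened. */
static int poke_device(const char *path, const char *msg,
                       ssize_t *written, ssize_t *got)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    char buf[64];
    *written = write(fd, msg, strlen(msg)); /* triggers device_write */
    *got = read(fd, buf, sizeof(buf));      /* triggers device_read */
    close(fd);
    return 0;
}
```

For example, poke_device("/dev/miscdev", "something", &w, &r) should leave w == 9 and r == 0.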
A more practical example
Here is an example of an LKM with a misc device that provides an API to read and write utsname()->version. This kernel variable can be read via /proc/sys/kernel/version and is displayed by uname -v.
When the module is removed, it restores the original version.
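Both userland interfaces read the same string, so a quick check from a program should see them agree, and they should still agree after the module below modifies the version (as long as the reader is in the same UTS namespace as the process that loaded the module). A sketch using uname(2); version_sources_agree is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Compare the version reported by uname(2) (what `uname -v` prints) with the
 * content of /proc/sys/kernel/version. Copies the uname(2) value into `out`.
 * Returns 1 if they match, 0 if they differ, -1 on error. */
static int version_sources_agree(char *out, size_t out_len)
{
    struct utsname u;
    char proc[sizeof(u.version) + 2];

    if (uname(&u) != 0)
        return -1;

    FILE *f = fopen("/proc/sys/kernel/version", "r");
    if (!f)
        return -1;
    char *ok = fgets(proc, sizeof(proc), f);
    fclose(f);
    if (!ok)
        return -1;
    proc[strcspn(proc, "\n")] = '\0'; /* /proc adds a trailing newline */

    snprintf(out, out_len, "%s", u.version);
    return strcmp(proc, u.version) == 0;
}
```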
```c
// mylkm.c
// pr_fmt must be defined before any #include so that printk.h picks it up.
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/utsname.h>

// utsname()->version is a char[__NEW_UTS_LEN + 1] array
#define BUF_LEN (__NEW_UTS_LEN + 1)

static char ORIGINAL_VERSION[BUF_LEN];
static char *version;
static size_t version_len;
// DEFINE_MUTEX initializes the mutex statically, before any callback can run
static DEFINE_MUTEX(version_mutex);

static void restore_version(void)
{
    mutex_lock(&version_mutex);
    strscpy(version, ORIGINAL_VERSION, BUF_LEN);
    version_len = strlen(version);
    mutex_unlock(&version_mutex);
    pr_info("Restored original version '%s'\n", ORIGINAL_VERSION);
}

static ssize_t device_read(struct file *f, char __user *to, size_t count, loff_t *off)
{
    char buf[BUF_LEN + 1]; // version + '\n'
    size_t len;

    // Build "version\n" in a private copy instead of patching the live string
    mutex_lock(&version_mutex);
    len = scnprintf(buf, sizeof(buf), "%s\n", version);
    mutex_unlock(&version_mutex);

    // Handles *off, short reads and end of file (returns 0) for us
    return simple_read_from_buffer(to, count, off, buf, len);
}

static ssize_t device_write(struct file *f, const char __user *from, size_t count, loff_t *off)
{
    char buf[BUF_LEN];

    // Value is too long (keep room for the terminating '\0')
    if (count >= BUF_LEN)
        return -EINVAL;
    // Copy into a local buffer first so a failed copy cannot corrupt version
    if (copy_from_user(buf, from, count))
        return -EFAULT;
    buf[count] = '\0';
    // Drop the trailing newline added by `echo`
    if (count > 0 && buf[count - 1] == '\n')
        buf[count - 1] = '\0';

    mutex_lock(&version_mutex);
    strscpy(version, buf, BUF_LEN);
    version_len = strlen(version);
    mutex_unlock(&version_mutex);
    return count;
}

static const struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .read = device_read,
    .write = device_write,
};

static struct miscdevice device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "version",
    .fops = &device_fops,
};

static int __init lkm_init(void)
{
    int r;

    // Initialize global variables before the device becomes visible to userland
    version = utsname()->version;
    version_len = strlen(version);
    // Save original version
    strscpy(ORIGINAL_VERSION, version, BUF_LEN);
    pr_info("Saved original version '%s'\n", ORIGINAL_VERSION);

    r = misc_register(&device);
    if (r < 0) {
        pr_err("Could not register misc device (Error %i)\n", r);
        return r;
    }
    pr_info("Module init succeeded\n");
    return 0;
}

static void __exit lkm_exit(void)
{
    misc_deregister(&device);
    restore_version();
}

module_init(lkm_init);
module_exit(lkm_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Edit kernel version variable");
MODULE_AUTHOR("Nhqml");
```
Here are a few points to note:
- Even though glibc is not available, the kernel implements some useful functions such as strlen and strncpy (kernel code generally prefers strscpy, which NUL-terminates the destination and reports truncation).
- We use a mutex to prevent race conditions. There is not much to understand here: we lock the mutex before accessing our shared variables and unlock it afterwards, on every return path, including error paths. Learn more about locking here.
- To copy data to/from userland, the kernel provides multiple functions such as copy_from_user, which we use here. It takes three parameters: the destination buffer (in the kernel), the source buffer (in userland), and the number of bytes to copy; it returns the number of bytes that could not be copied (0 on success). Also have a look at the simple_read_from_buffer documentation and source. Other functions can be found here.
- A negative return value indicates an error, and its value corresponds to an error code (see errno(3)), e.g. -EINVAL or -EFAULT.
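To see why device_read does not need to track the offset itself, here is a userland model of the logic of simple_read_from_buffer (model_read_from_buffer is a hypothetical name; memcpy stands in for copy_to_user, so the fault handling of the real helper is left out). Each call copies from *ppos and advances it; once the offset reaches the end it returns 0, which is what makes cat /dev/version stop.

```c
#include <string.h>
#include <sys/types.h>

/* Copy at most `count` bytes of `from` (which holds `available` bytes),
 * starting at *ppos, then advance *ppos. Returns the number of bytes
 * copied, 0 once everything has been consumed, or -1 (standing in for
 * -EINVAL) for a negative offset. */
static ssize_t model_read_from_buffer(void *to, size_t count, off_t *ppos,
                                      const void *from, size_t available)
{
    off_t pos = *ppos;
    if (pos < 0)
        return -1;
    if ((size_t)pos >= available || count == 0)
        return 0;                           /* end of file */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;    /* short read at the end */
    memcpy(to, (const char *)from + pos, count);
    *ppos = pos + (off_t)count;
    return (ssize_t)count;
}
```

Reading the 5-byte buffer "5.15\n" three bytes at a time returns 3, then 2, then 0.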
Going further
ioctl: Define device-specific operations
Ioctl stands for “input-output control”. The ioctl(2) syscall is used to send a request to the device underlying a special file (e.g. a character device file). We previously implemented read and write functions for our misc device (these are “standard”, by which I mean there is an associated syscall); ioctl provides a way to create custom commands for our device.
First, we need to define our ioctl numbers in a header file shared by the module and the userland programs. The kernel provides four macros to define ioctl numbers:
- _IO(type, nr) for an ioctl without a parameter
- _IOW(type, nr, size) for an ioctl with a Write parameter (userland is writing, kernel is reading)
- _IOR(type, nr, size) for an ioctl with a Read parameter (userland is reading, kernel is writing)
- _IOWR(type, nr, size) for an ioctl with a parameter that is both written and read (userland and kernel both write and read it)
type (also called the magic number, and unrelated to the device major number despite the IO_MAJOR name used below) is just a number used to identify our driver, nr is another number to distinguish ioctls within the same driver, and size is actually not a size but the parameter data type (e.g. int), to which the macro applies sizeof.
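To see what these macros actually produce, a userland sketch can split a number back into its fields with the _IOC_DIR, _IOC_TYPE, _IOC_NR and _IOC_SIZE macros from <linux/ioctl.h>, assuming a Linux system with the kernel's userspace headers installed. It reuses the values of the header below; decode_ioctl and struct ioctl_fields are hypothetical names. The bit layout differs between architectures, which is why the sketch relies on the macros rather than hard-coded shifts.

```c
#include <linux/ioctl.h>

/* Same values as in myioctl.h, repeated so this sketch stands alone. */
#define IO_MAJOR 0x4E
#define VERSION_RESET    _IO(IO_MAJOR, 0)
#define VERSION_MODIFIED _IOR(IO_MAJOR, 1, int)

struct ioctl_fields {
    unsigned dir;   /* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
    unsigned type;  /* the magic byte identifying the driver */
    unsigned nr;    /* command number within the driver */
    unsigned size;  /* sizeof() of the parameter type */
};

/* Unpack the fields that the _IO* macros packed into `cmd`. */
static struct ioctl_fields decode_ioctl(unsigned int cmd)
{
    struct ioctl_fields f = {
        .dir = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };
    return f;
}
```

Decoding VERSION_MODIFIED gives direction _IOC_READ, type 0x4E, nr 1 and size sizeof(int).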
You can find a list of reserved ioctl numbers here. Note that it is not really a big deal if you use a reserved ioctl number. However, avoiding conflicts is not too difficult, and it ensures that calling your ioctl on a file that does not support it fails instead of triggering some unrelated command of another driver.
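That failure can be observed from userland: when neither the VFS nor the file's driver handles a command, ioctl(2) fails with ENOTTY (“Inappropriate ioctl for device”). A sketch, assuming /dev/null as an example of a device that implements no ioctls; try_reset is a hypothetical helper that returns the errno instead of exiting.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Same values as in myioctl.h, repeated so this sketch stands alone. */
#define IO_MAJOR 0x4E
#define VERSION_RESET _IO(IO_MAJOR, 0)

/* Issue VERSION_RESET on `path`. Returns 0 if the ioctl succeeded,
 * otherwise the errno of the failing open(2) or ioctl(2) call. */
static int try_reset(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;
    int ret = ioctl(fd, VERSION_RESET);
    int saved = ret < 0 ? errno : 0;
    close(fd);
    return saved;
}
```

With our module loaded, try_reset("/dev/version") should return 0, while try_reset("/dev/null") reports ENOTTY.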
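```c
// myioctl.h, shared by the module and userland programs
#ifndef _MYIOCTL_H
#define _MYIOCTL_H

#include <linux/ioctl.h> // _IO, _IOR, _IOW (usable from kernel and userland)

#define IO_MAJOR 0x4E

#define VERSION_RESET    _IO(IO_MAJOR, 0)
// The kernel writes an int back to userland
#define VERSION_MODIFIED _IOR(IO_MAJOR, 1, int)
// Userland passes a pointer to a string that the kernel reads
#define VERSION_APPEND   _IOW(IO_MAJOR, 2, char *)

#endif /* _MYIOCTL_H */
```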
Then we implement our ioctl function and add it to the device’s file_operations as unlocked_ioctl.
```c
// mylkm.c
#include "myioctl.h"

static long device_ioctl(struct file *f, unsigned int cmd, unsigned long param)
{
    switch (cmd) {
    case VERSION_RESET:
        restore_version();
        return 0;

    case VERSION_MODIFIED: {
        int modified;

        mutex_lock(&version_mutex);
        modified = strcmp(ORIGINAL_VERSION, version) != 0;
        mutex_unlock(&version_mutex);
        // param is a userland address: cast it to a __user pointer
        return put_user(modified, (int __user *)param) ? -EFAULT : 0;
    }

    case VERSION_APPEND: {
        char str[BUF_LEN];
        long str_len, ret = 0;

        // Returns the string length (without '\0') or -EFAULT
        str_len = strncpy_from_user(str, (const char __user *)param, sizeof(str));
        if (str_len < 0)
            return str_len;
        // No '\0' within the buffer: the string is too long anyway
        if ((size_t)str_len == sizeof(str))
            return -EINVAL;

        // Check and append under one lock so version_len cannot change in between
        mutex_lock(&version_mutex);
        if (version_len + (size_t)str_len > BUF_LEN - 1) {
            ret = -EINVAL; // not enough space, do not copy anything
        } else {
            memcpy(version + version_len, str, str_len + 1); // include '\0'
            version_len += str_len;
        }
        mutex_unlock(&version_mutex);
        return ret;
    }

    default:
        // Conventional error for an ioctl the driver does not know
        return -ENOTTY;
    }
}

static const struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .read = device_read,
    .write = device_write,
    .unlocked_ioctl = device_ioctl,
};
```
The ioctl function itself is really trivial: it just dispatches on the cmd parameter, which designates the ioctl that has been called. If the ioctl takes a parameter, it is accessible at the address held in param: the unsigned long is actually a userspace address that you need to cast to a __user pointer before using it (see the casts in the VERSION_MODIFIED and VERSION_APPEND branches), and it must only be accessed through helpers such as put_user or copy_from_user.
Here is a simple program to call the ioctls:
```c
#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <err.h>
#include "myioctl.h"

static int openf(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd < 0) // 0 is a valid descriptor; only -1 signals an error
        err(1, "Could not open file \"%s\"", path);
    return fd;
}

static void reset_version(void)
{
    int fd = openf("/dev/version", O_RDONLY);
    if (ioctl(fd, VERSION_RESET) < 0)
        err(1, "VERSION_RESET");
    close(fd);
}

static int is_modified_version(void)
{
    int fd = openf("/dev/version", O_RDONLY);
    int is_modified = 0;
    if (ioctl(fd, VERSION_MODIFIED, &is_modified) < 0)
        err(1, "VERSION_MODIFIED");
    close(fd);
    return is_modified;
}

static void append_version(const char *str)
{
    int fd = openf("/dev/version", O_RDONLY);
    if (ioctl(fd, VERSION_APPEND, str) < 0)
        err(1, "VERSION_APPEND");
    close(fd);
}

int main(int argc, char **argv)
{
    if (argc == 2 && strcmp(argv[1], "reset") == 0)
        reset_version();
    else if (argc == 2 && strcmp(argv[1], "is_modified") == 0)
        printf("%s\n", is_modified_version() ? "true" : "false");
    else if (argc == 3 && strcmp(argv[1], "append") == 0)
        append_version(argv[2]);
    else {
        printf("Valid commands: reset, is_modified, append <str>\n");
        return 1;
    }
    return 0;
}
```
Infinite stream device
You probably know devices like /dev/zero
or dev/random
, they produce an infinite stream of bytes (e.g. cat /dev/zero
will stop only if you cancel it).
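From the consumer side, the defining property is that read(2) never returns 0 (end of file), so the reader must decide when to stop. A userland sketch using /dev/zero (read_from_stream is a hypothetical helper) reads a fixed number of bytes and then closes the file:

```c
#include <fcntl.h>
#include <unistd.h>

/* Read exactly `len` bytes from an endless device such as /dev/zero.
 * The loop stops because we asked for `len` bytes, not because the
 * device ran out. Returns 0 on success, -1 on error or if the file
 * unexpectedly reports end of file. */
static int read_from_stream(const char *path, unsigned char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n <= 0)
            break;
        total += (size_t)n;
    }
    close(fd);
    return total == len ? 0 : -1;
}
```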
To be continued… (maybe, if I’m not too lazy)