Writing a Loadable Kernel Module in C

⚠️ This article may not be 100% accurate; do not hesitate to report any mistakes.

Before starting

There are a few things we need to discuss before starting.

What is an LKM?

A Loadable Kernel Module (LKM) is a chunk of code that can be inserted into a running Linux kernel, thus “extending” it. The Linux kernel includes many drivers to support various hardware, but there are also “out of tree” drivers (maintained outside the official kernel source code), and these are shipped as LKMs. In-tree drivers can be built as modules too.

This article gives multiple examples of what can be done with an LKM.

Be careful, mere mortal!

Unlike the applications you usually write, LKMs run in kernel mode, which means that there is basically zero protection against mistakes and that you have to be very very VERY careful not to fuck up. For example, nothing cleans up after you if you leak memory, and you can easily crash your system while testing your LKM. A bug can also corrupt the file system or cause other kinds of bad stuff, so I recommend testing in a virtual machine to be safe.

Also, you have no access to the standard C library but no worries, the kernel provides some cool functions to help you out.

Install the headers

Before starting, you will need to install the Linux kernel headers. The package is often called linux-headers, but the name varies between distributions (for example linux-headers-$(uname -r) on Debian and Ubuntu, or kernel-devel on Fedora). You can then find the headers of the running kernel in /lib/modules/$(uname -r)/build/include.

Getting started

A minimal LKM

Let’s start by writing a minimal LKM.

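// Prepend the module name when using pr_XXX functions.
// Define this before any #include: linux/printk.h is usually pulled in
// indirectly by other headers and falls back to a default pr_fmt otherwise.
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>

static int __init lkm_init(void)
{
    // Kernel log messages should end with a newline
    pr_info("Hello World!\n");
    return 0;
}

static void __exit lkm_exit(void)
{
    pr_info("Bye\n");
}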

module_init(lkm_init);
module_exit(lkm_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Just a simple LKM");
MODULE_AUTHOR("Nhqml");

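As you may have noticed, there is no main function. This is because a kernel module is not an application. Instead, you have two functions:

- the init function (here lkm_init), registered with module_init, which is called when the module is loaded. It returns 0 on success or a negative error code, in which case the module is not loaded;
- the exit function (here lkm_exit), registered with module_exit, which is called when the module is unloaded.

The names of these functions can be whatever you want as long as you respect the prototypes. The __init and __exit macros are not mandatory either: they mark code that is only needed at load or unload time, so the kernel can discard it when it is no longer useful.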

This simple module will print a message (with log level KERN_INFO) to the kernel log buffer when it is loaded and when it is unloaded.

Let’s try this module, compile it with this Makefile:

ifeq ($(KERNELRELEASE),)
# Assume the source tree is where the running kernel was built
KERNELDIR ?= /lib/modules/$(shell uname -r)/build

# The current directory is passed to sub-makes as argument
PWD := $(shell pwd)

modules:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules

modules_install:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules_install

clean:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean

.PHONY: modules modules_install clean
else
# called from kernel build system: just declare our module
obj-m := mylkm.o
endif

Save the module source as mylkm.c (the name must match the obj-m line of the Makefile) and run make. This will produce multiple files; our LKM is the one with the .ko extension.

Then watch the kernel ring buffer using dmesg -kH --follow --level=info and load your module as root using insmod mylkm.ko. You should see “Hello World!” popping up in the kernel log. When you unload your module (rmmod mylkm) you should see “Bye”.

When inserting your module into the kernel, you will probably see something like:

loading out-of-tree module taints kernel.
module verification failed: signature and/or required key missing - tainting kernel

There is nothing to worry about, you can learn more about tainted kernel here.

Adding a misc device

As is, our module is pretty useless. Why would we take the risk of messing with our system in kernel land just to print a “Hello World!” message?

We will register a misc device in our module to make it less useless.

A misc device is a special file that acts as an interface between userland and kernel mode.

// Prepend the module name when using pr_XXX functions (define before any #include)
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>

#include <linux/fs.h>
#include <linux/miscdevice.h>

static ssize_t device_read(struct file *f, char __user *to, size_t count, loff_t *off)
{
    pr_info("Read!\n");
    return 0;
}

static ssize_t device_write(struct file *f, const char __user *from, size_t count, loff_t *off)
{
    pr_info("Write (received %zu bytes)\n", count); // %zu is the format for size_t
    return count;
}

static struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .read = device_read,
    .write = device_write,
};

static struct miscdevice device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "miscdev",
    .fops = &device_fops,
};

static __init int lkm_init(void)
{
    int r = misc_register(&device);
    if (r < 0)
    {
        pr_err("Could not register misc device (Error %i)\n", r);
        return r;
    }

    pr_info("Module init succeeded\n");
    return 0;
}

static __exit void lkm_exit(void)
{
    misc_deregister(&device);
    pr_info("Module exit\n");
}

module_init(lkm_init);
module_exit(lkm_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Just a simple LKM");
MODULE_AUTHOR("Nhqml");

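Here, our device defines two basic file_operations (see struct file_operations in linux/fs.h):

- read (device_read), called when a process reads from the device;
- write (device_write), called when a process writes to the device.

These functions have four arguments:

- f: the open file the operation is performed on;
- to / from: the userspace buffer to fill (read) or to consume (write);
- count: the number of bytes requested by the caller;
- off: a pointer to the current position in the file, which the function may update.

Their return values should follow read(2) and write(2): the number of bytes actually transferred (0 meaning end of file for a read), or a negative error code such as -EFAULT.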

Once registered, the device shows up under /dev so the user can interact with it. Type ls -l /dev/miscdev to find it.

You may notice two numbers after the user and group info: they are the major and minor device numbers. In our case, the major number is 10, which is the major number for misc devices (see major.h for a list of major numbers). The minor number identifies the device within that category. We do not need to allocate it manually, so we use MISC_DYNAMIC_MINOR (see the .minor field of the miscdevice struct), which prevents any conflict with already registered devices (see a list here).
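
The same two numbers can be read from a program instead of ls. Here is a minimal userspace sketch, assuming Linux with glibc (major() and minor() come from sys/sysmacros.h). The helper name device_numbers is made up for this example.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major(), minor() (glibc) */
#include <sys/types.h>

/*
 * Hypothetical helper: store the major and minor numbers of the device
 * node at `path`. Returns 0 on success, -1 if `path` cannot be stat'ed
 * or is not a character or block device.
 */
int device_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;

    /* st_rdev holds the device ID of a special file */
    *maj = major(st.st_rdev);
    *min = minor(st.st_rdev);
    return 0;
}
```

With the module loaded, calling device_numbers("/dev/miscdev", &maj, &min) should store 10 in maj and the dynamically allocated minor number in min.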

Our module now exposes an API to userland through the misc device, using the read(2) and write(2) syscalls.
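
To exercise the device from userland without cat or echo, something like the following sketch should do. The helpers dev_write and dev_read are hypothetical names, not part of any library, and the device path is whatever you pass in.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, write `len` bytes, return what write(2) returned. */
ssize_t dev_write(const char *path, const void *buf, size_t len)
{
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, buf, len);
    close(fd);
    return n;
}

/* Hypothetical helper: open `path`, read up to `len` bytes, return what read(2) returned. */
ssize_t dev_read(const char *path, void *buf, size_t len)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return n;
}
```

With the module loaded (and run as root), dev_write("/dev/miscdev", "hi", 2) should return 2 and log a “Write” message, while dev_read should return 0 and log “Read!”, since device_read always reports end of file.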

A more practical example

Here is an example of an LKM with a misc device that provides an API to read and write utsname()->version. This kernel variable can be read via /proc/sys/kernel/version and is displayed by uname -v.

When the module is removed it will restore the original version.

// Prepend the module name when using pr_XXX functions (define before any #include)
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>

#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/utsname.h>

#define BUF_LEN __NEW_UTS_LEN
// One extra byte guarantees a terminating '\0' even if the version fills BUF_LEN bytes
static char ORIGINAL_VERSION[BUF_LEN + 1];

static char *version;
static size_t version_len;
static struct mutex version_mutex;

static void restore_version(void)
{
    mutex_lock(&version_mutex);
    strncpy(version, ORIGINAL_VERSION, BUF_LEN);
    version_len = strlen(version);
    mutex_unlock(&version_mutex);
    pr_info("Restored original version '%s'\n", ORIGINAL_VERSION);
}

static ssize_t device_read(struct file *f, char __user *to, size_t count, loff_t *off)
{
    ssize_t ret;

    mutex_lock(&version_mutex);
    // Temporarily replace the terminating '\0' with a newline for the reader
    version[version_len] = '\n';
    // Returns 0 (end of file) once *off has reached the end of the buffer
    ret = simple_read_from_buffer(to, count, off, version, version_len + 1);
    version[version_len] = '\0';
    mutex_unlock(&version_mutex);

    return ret;
}

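static ssize_t device_write(struct file *f, const char __user *from, size_t count, loff_t *off)
{
    char tmp[BUF_LEN];

    // Value is too long
    if (count >= BUF_LEN)
        return -EINVAL;

    // Copy to a temporary buffer first: a failed copy then neither leaves a
    // half-written version nor returns with the mutex still held
    if (copy_from_user(tmp, from, count))
        return -EFAULT;

    mutex_lock(&version_mutex);
    memcpy(version, tmp, count);
    version_len = count;
    version[version_len] = '\0';
    mutex_unlock(&version_mutex);

    return count;
}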

static struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .read = device_read,
    .write = device_write,
};

static struct miscdevice device = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "version",
    .fops = &device_fops,
};

static int __init lkm_init(void)
{
    int r;

    // Set everything up BEFORE registering: the device can be opened
    // as soon as misc_register returns
    mutex_init(&version_mutex);
    version = utsname()->version;
    version_len = strlen(version);

    // Save original version
    strncpy(ORIGINAL_VERSION, version, BUF_LEN);
    pr_info("Saved original version '%s'\n", ORIGINAL_VERSION);

    r = misc_register(&device);
    if (r < 0)
    {
        pr_err("Could not register misc device (Error %i)\n", r);
        return r;
    }

    pr_info("Module init succeeded\n");

    return 0;
}

static __exit void lkm_exit(void)
{
    misc_deregister(&device);
    restore_version();
}

module_init(lkm_init);
module_exit(lkm_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Edit kernel version variable");
MODULE_AUTHOR("Nhqml");

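Here are a few points to note:

- Pointers marked __user come from userspace and must never be dereferenced directly. Use helpers such as copy_from_user or simple_read_from_buffer, which check the address and report failures.
- copy_from_user returns the number of bytes that could NOT be copied, so any non-zero value is treated as an error (-EFAULT).
- The mutex serializes accesses to version and version_len, since several processes may read and write the device at the same time. Every path that takes the mutex must release it, including error paths.
- device_read temporarily replaces the terminating '\0' with a newline so that cat /dev/version prints a complete line.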
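
To see how the offset handling in device_read behaves across successive reads, here is a userspace model of what simple_read_from_buffer does. It is only a sketch: memcpy stands in for the copy to userspace, and the name model_read_from_buffer is made up. It is not the kernel's implementation.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Userspace model (assumption: memcpy replaces the kernel's checked copy
 * to userspace). Copies at most `count` bytes of `from`, starting at
 * *ppos, into `to`, then advances *ppos by the number of bytes copied.
 */
ssize_t model_read_from_buffer(void *to, size_t count, off_t *ppos,
                               const void *from, size_t available)
{
    off_t pos = *ppos;

    if (pos < 0)
        return -EINVAL;
    if ((size_t)pos >= available || count == 0)
        return 0; /* nothing left: the caller sees end of file */
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;

    memcpy(to, (const char *)from + pos, count);
    *ppos = pos + (off_t)count;
    return (ssize_t)count;
}
```

Reading a 5-byte buffer 3 bytes at a time returns 3, then 2, then 0. That final 0 is what tells cat to stop.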

Going further

ioctl: Define device-specific operations

Ioctl stands for “input-output control”. The ioctl(2) syscall is used to make a request to the device behind a special file (e.g. a character device). We previously implemented read and write functions for our misc device (these are “standard”, in the sense that each has an associated syscall). ioctl provides a way to create custom commands for our device.

First, we need to define our ioctls inside a header file. The kernel provides four macros to define ioctl numbers:

- _IO(type, nr): an ioctl without parameter;
- _IOR(type, nr, size): an ioctl that reads data from the driver (the kernel writes to userspace);
- _IOW(type, nr, size): an ioctl that writes data to the driver;
- _IOWR(type, nr, size): an ioctl that transfers data in both directions.

type (or major) is just a number used to identify our driver, nr is another number to distinguish ioctls within the same driver, and size is actually not a size but the parameter data type (e.g. int).

You can find a list of reserved ioctl numbers here. Note that it is not really a big deal if you use a reserved ioctl number. However, avoiding conflicts is not too difficult and it will prevent the use of an ioctl on a file that does not support it.

// myioctl.h
#ifndef _MYIOCTL_H
#define _MYIOCTL_H

#define IO_MAJOR 0x4E
#define VERSION_RESET _IO(IO_MAJOR, 0)
#define VERSION_MODIFIED _IOR(IO_MAJOR, 1, int)
// Userspace sends data to the driver here, hence _IOW rather than _IOR
#define VERSION_APPEND _IOW(IO_MAJOR, 2, char)

#endif /* _MYIOCTL_H */
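
To make the encoding less magical, the sketch below splits an ioctl number back into its fields using the _IOC_* helper macros from linux/ioctl.h. The struct ioctl_fields and decode_ioctl names are invented for this example, and the two defines are repeated only to keep the snippet self-contained.

```c
#include <linux/ioctl.h> /* _IO, _IOR, _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

#define IO_MAJOR 0x4E
#define VERSION_RESET _IO(IO_MAJOR, 0)
#define VERSION_MODIFIED _IOR(IO_MAJOR, 1, int)

/* Hypothetical container for the four fields packed into an ioctl number */
struct ioctl_fields
{
    unsigned int dir;  /* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
    unsigned int type; /* driver identifier (IO_MAJOR here) */
    unsigned int nr;   /* command number within the driver */
    unsigned int size; /* sizeof the parameter data type */
};

struct ioctl_fields decode_ioctl(unsigned int cmd)
{
    struct ioctl_fields f = {
        .dir = _IOC_DIR(cmd),
        .type = _IOC_TYPE(cmd),
        .nr = _IOC_NR(cmd),
        .size = _IOC_SIZE(cmd),
    };

    return f;
}
```

Decoding VERSION_MODIFIED gives the direction _IOC_READ, type 0x4E, nr 1 and size sizeof(int), which shows that the third macro argument is a type whose size gets encoded, not a size you pass yourself.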

Then we implement our ioctl function and add it to the device’s file_operations as unlocked_ioctl.

// mylkm.c
#include "myioctl.h"

static long device_ioctl(struct file *f, unsigned int cmd, unsigned long param)
{
    long ret = 0;

    switch (cmd)
    {
    case VERSION_RESET:
        restore_version();
        break;

    case VERSION_MODIFIED:
    {
        int modified;

        mutex_lock(&version_mutex);
        modified = strcmp(ORIGINAL_VERSION, version) != 0;
        mutex_unlock(&version_mutex);

        // put_user returns -EFAULT if the userspace address is invalid
        ret = put_user(modified, (int __user *)param);
        break;
    }

    case VERSION_APPEND:
    {
        char tmp[BUF_LEN];
        // Recent kernels no longer provide strlen_user, so copy the string
        // to a temporary buffer instead. strncpy_from_user returns the
        // length EXCLUDING the '\0', BUF_LEN if no '\0' was found,
        // or -EFAULT on a bad address.
        long str_len = strncpy_from_user(tmp, (const char __user *)param, BUF_LEN);

        if (str_len < 0)
            return str_len;
        if (str_len == BUF_LEN)
            return -EINVAL;

        // Hold the mutex for both the space check and the copy
        mutex_lock(&version_mutex);
        if ((size_t)str_len > (BUF_LEN - 1) - version_len)
        {
            // Not enough space, do not copy anything
            ret = -EINVAL;
        }
        else
        {
            // Copy the whole string, including terminating '\0'
            memcpy(version + version_len, tmp, str_len + 1);
            version_len += str_len;
        }
        mutex_unlock(&version_mutex);
        break;
    }

    default:
        // -ENOTTY is the conventional error for an unknown ioctl
        ret = -ENOTTY;
    }

    return ret;
}

static struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .read = device_read,
    .write = device_write,
    .unlocked_ioctl = device_ioctl,
};

The ioctl function itself is really trivial: it is just a switch on the cmd parameter, which designates the ioctl that has been called. If the ioctl takes a parameter, param holds it. Here it is a userspace address, so it must be cast to a __user pointer and accessed through helpers such as put_user or strncpy_from_user, never dereferenced directly (see the VERSION_MODIFIED and VERSION_APPEND cases).

Here is a simple program to call the ioctls:

#include <sys/ioctl.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <err.h>

#include "myioctl.h"

int openf(const char *path, int flags)
{
    int fd = open(path, flags);
    // open(2) returns -1 on error; 0 is a valid file descriptor
    if (fd < 0)
        err(1, "Could not open file \"%s\"", path);

    return fd;
}

void reset_version(void)
{
    int fd = openf("/dev/version", O_RDONLY);
    if (ioctl(fd, VERSION_RESET) < 0)
        err(1, "VERSION_RESET failed");
    close(fd);
}

int is_modified_version(void)
{
    int fd = openf("/dev/version", O_RDONLY);
    int is_modified = 0;
    if (ioctl(fd, VERSION_MODIFIED, &is_modified) < 0)
        err(1, "VERSION_MODIFIED failed");
    close(fd);

    return is_modified;
}

void append_version(const char *str)
{
    int fd = openf("/dev/version", O_RDONLY);
    if (ioctl(fd, VERSION_APPEND, str) < 0)
        err(1, "VERSION_APPEND failed");
    close(fd);
}

int main(int argc, char **argv)
{
    if (argc == 2 && strcmp(argv[1], "reset") == 0)
        reset_version();
    else if (argc == 2 && strcmp(argv[1], "is_modified") == 0)
        printf("%s\n", is_modified_version() ? "true" : "false");
    else if (argc == 3 && strcmp(argv[1], "append") == 0)
        append_version(argv[2]);
    else
    {
        printf("Valid commands: reset, is_modified, append <str>\n");
        return 1;
    }

    return 0;
}

Infinite stream device

You probably know devices like /dev/zero or /dev/random: they produce an infinite stream of bytes (e.g. cat /dev/zero will stop only if you cancel it).
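
From the reader's side, “infinite” simply means that read(2) never returns 0. The userspace sketch below (read_exactly is a hypothetical helper) keeps reading until it has the requested number of bytes, and only stops early on end of file or on an error.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read `len` bytes from `path` into `buf`, stopping
 * early only on end of file. Returns the number of bytes read, or -1 on
 * error.
 */
ssize_t read_exactly(const char *path, unsigned char *buf, size_t len)
{
    size_t total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    while (total < len)
    {
        ssize_t n = read(fd, buf + total, len - total);

        if (n < 0)
        {
            close(fd);
            return -1;
        }
        if (n == 0)
            break; /* end of file: an infinite stream never gets here */
        total += (size_t)n;
    }

    close(fd);
    return (ssize_t)total;
}
```

On /dev/zero this always fills the whole buffer with zero bytes, whatever the requested length, whereas on /dev/null it returns 0 immediately.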

To be continued… (maybe, if I’m not too lazy)