Implementing abstraction

In general, abstraction is implemented by what is generically termed an Application Programming Interface (API). API is a somewhat nebulous term that means different things in the context of various programming endavours. Fundamentally, a programmer designs a set of functions and documents their interface and functionality with the principle that the actual implementation providing the API is opaque.

For example, many large web-applications provide an API accessible via HTTP. Accessing data via this method surely triggers many complicated series of remote-procedure calls, database queries and data transfer; all of which is opaque to the end user who simply receives the contracted data.

Those familiar with object-oriented languages such as Java, Python or C++ would be familiar with the abstraction provided by classes. Methods provide the interface to the class, but abstract the implementation.

Implementing abstraction with C

A common method used in the Linux Kernel and other large C code bases, which lacks a built-in concept of object-orientation, is function pointers. Learning to read this idom is key to navigating most large C code-bases. By understanding how to read the abstractions provided within the code an understanding of internal API designs can be built.

Example 1.1. Abstraction with function pointers
#include <stdio.h>

/* The API to implement */
struct greet_api
{
	int (*say_hello)(char *name);
	int (*say_goodbye)(void);
};

/* Our implementation of the hello function */
int say_hello_fn(char *name)
{
	printf("Hello %s\n", name);
	return 0;
}

/* Our implementation of the goodbye function */
int say_goodbye_fn(void)
{
	printf("Goodbye\n");
	return 0;
}

/* A struct implementing the API */
struct greet_api greet_api =
{
	.say_hello = say_hello_fn,
	.say_goodbye = say_goodbye_fn
};

/* main() doesn't need to know anything about how the
 * say_hello/goodbye works, it just knows that it does */
int main(int argc, char *argv[])
{
	greet_api.say_hello(argv[1]);
	greet_api.say_goodbye();

	printf("%p, %p, %p\n", greet_api.say_hello, say_hello_fn, &say_hello_fn);

	exit(0);
}

Code such as the above is the simplest example of constructs used repeatedly through the Linux Kernel and other C programs. Lets have a look at some specific elements.

We start out with a structure that defines the API (struct greet_api). The functions whose names are encased in parenthesis with a pointer marker describe a function pointer[1]. The function pointer describes the prototype of function it must point to; pointing it at a function without the correct return type or parameters will generate a compiler warning at least; if left in code will likely lead to incorrect operation or crashes.

We then have our implementation of the API. Often for more complex functionality you will see an idiom where API implementation functions will only be a wrapper around another function that is conventionally prepended with one or or two underscores[2] (i.e. say_hello_fn() would call another function _say_hello_function()). This has several uses; generally it relates to having simpler and smaller parts of the API (marshalling or checking arguments, for example) separate to more complex implemenation, which often eases the path to significant changes in the internal workings whilst ensuring the API remains constant. Our implementation is very simple however, and doesn't even need it's own support functions. In various projects, single, double or even triple underscore function prefixes will mean different things, but universally it is a visual warning that the function is not supposed to be called directly from "beyond" the API.

Second to last, we fill out the function pointers in struct greet_api greet_api. The name of the function is a pointer, therefore there is no need to take the address of the function (i.e. &say_hello_fn).

Finally we can call the API functions through the structure in main.

You will see this idiom constantly when navigating the souce code. The tiny example below is taken from include/linux/virtio.h in the Linux kernel source to illustrate:

Example 1.2. Abstraction in include/linux/virtio.h
/**
 * virtio_driver - operations for a virtio I/O driver
 * @driver: underlying device driver (populate name and owner).
 * @id_table: the ids serviced by this driver.
 * @feature_table: an array of feature numbers supported by this driver.
 * @feature_table_size: number of entries in the feature table array.
 * @probe: the function to call when a device is found.  Returns 0 or -errno.
 * @remove: the function to call when a device is removed.
 * @config_changed: optional function to call when the device configuration
 *    changes; may be called in interrupt context.
 */
struct virtio_driver {
        struct device_driver driver;
        const struct virtio_device_id *id_table;
        const unsigned int *feature_table;
        unsigned int feature_table_size;
        int (*probe)(struct virtio_device *dev);
        void (*scan)(struct virtio_device *dev);
        void (*remove)(struct virtio_device *dev);
        void (*config_changed)(struct virtio_device *dev);
#ifdef CONFIG_PM
        int (*freeze)(struct virtio_device *dev);
        int (*restore)(struct virtio_device *dev);
#endif
};

It's only necessary to vaguely understand that this structure is a description of a virtual I/O device. We can see the user of this API (the device driver author) is expected to provide a number of functions that will be called under various conditions during system operation (when probing for new hardware, when hardware is removed, etc). It also contains a range of data; structures which should be filled with relevant data.

Starting with descriptors like this is usually the easiest way into understanding the various layers of kernel code.

Libraries

Libraries have two roles which illustrate abstraction.

  • Allow programmers to reuse commonly accessed code.

  • Act as a black box implementing functionality for the programmer.

For example, a library implementing access to the raw data in JPEG files has both the advantage that the many programs who wish to access image files can all use the same library and the programmers building these programs do not need to worry about the exact details of the JPEG file format, but can concentrate their efforts on what their program wants to do with the image.

The standard library of a UNIX platform is generically referred to as libc. It provides the basic interface to the system: fundamental calls such as read(), write() and printf(). This API is described in its entirety by a specification called POSIX. It is freely available online and describes the many calls that make up the standard UNIX API.

Most UNIX platforms broadly follow the POSIX standard, though often differ small but sometimes important ways (hence the complexity of the various GNU autotools, which often tries to abstract away these differences for you). Linux has many interfaces that are not specified by POSIX; writing applications that use them exclusively will make your application less portable.

Libraries are a fundamental abstraction with many details. Later chapters will describe how libraries work in much greater detail.



[1] Often you will see that the names of the parameters are omitted, and only the type of the parameter is specified. This allows the implementer to specify their own parameter names avoiding warnings from the compiler.

[2] A double-underscore function __foo may conversationally be referred to as "dunder foo".