Towards Hard Realtime Response from the Linux Kernel
Traditionally, realtime response has been designed into an operating system from the start. Retrofitting realtime response into an existing general-purpose OS is not impossible, but it is quite difficult, because all non-preemptive code paths in the OS must have deterministic execution time. However, there is a technique that allows realtime capability to be added to the Linux kernel incrementally. It builds on the interrupt-shielding techniques used by Dipankar Sarma and Dinakar Guniguntala and by Jim Houston, in which interrupts are directed away from a particular CPU of an SMP system, thereby reducing the scheduling latency on that CPU. The advent of low-cost hardware-multithreaded and multi-core CPUs makes this approach quite attractive for some applications, as it provides excellent realtime latency for CPU-bound user-mode applications.
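As a rough illustration of interrupt shielding, one common way to steer device interrupts away from a CPU on Linux is to write a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity. The sketch below is not taken from the work cited above. It assumes at most 32 CPUs, so the mask fits in a single hex word. The helper names irq_allowed_mask, format_affinity and shield_irq are hypothetical, and writing the file requires root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Mask of CPUs that may still receive interrupts: every CPU in
 * [0, nr_cpus) except the shielded (realtime) ones.
 * Assumes 1 <= nr_cpus <= 32 so the mask fits in one 32-bit word. */
uint32_t irq_allowed_mask(unsigned int nr_cpus, uint32_t shielded_mask)
{
    assert(nr_cpus >= 1 && nr_cpus <= 32);
    uint32_t all = (nr_cpus == 32) ? UINT32_MAX
                                   : ((UINT32_C(1) << nr_cpus) - 1);
    return all & ~shielded_mask;
}

/* Format the mask as lowercase hex, the form smp_affinity accepts. */
int format_affinity(uint32_t mask, char *buf, size_t len)
{
    return snprintf(buf, len, "%x", (unsigned int)mask);
}

/* Restrict one IRQ to the allowed CPUs. Needs root privileges.
 * Returns 0 on success, -1 on failure. */
int shield_irq(unsigned int irq, uint32_t allowed)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%x\n", (unsigned int)allowed) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On a 4-CPU system that reserves CPU 0 for realtime work, irq_allowed_mask(4, 0x1) yields 0xe. Writing "e" to each IRQ's smp_affinity file then leaves CPU 0 free of device interrupts, although some per-CPU interrupts, such as the local timer, cannot be moved this way.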
However, any time a process on this realtime CPU executes a system call, it will likely acquire a lock in the kernel, and will thus incur excessive latency if a non-realtime CPU holds that lock. This might not be a problem for the process executing the system call; after all, a synchronous write to a disk drive involves considerable unavoidable latency. The problem is that other realtime processes running on the same CPU would also see degraded realtime response due to locks acquired while setting up the write.
This problem is addressed by surprisingly small changes:
1. Migrate realtime tasks that execute a system call to a non-realtime CPU, and migrate them back when the system call completes.
2. Add checks to the scheduler's load-balancing code so that realtime CPUs do not attempt to acquire other CPUs' runqueue locks, and non-realtime CPUs do not attempt to acquire realtime CPUs' runqueue locks, except under carefully controlled conditions.
With these changes, the work of bounding code-path lengths, which currently must cover almost the entire kernel, can focus on the system-call trap path (but not the system calls themselves) and portions of the scheduler, a much smaller body of code. This approach does have some shortcomings: (1) realtime processes lose their realtime characteristics within system calls, (2) any page fault destroys the faulting process's realtime characteristics, (3) unnecessary overhead is incurred migrating system calls that are already deterministic, and (4) the interactions between realtime system-call migration and other types of migration have not been thoroughly dealt with. In the short term, these shortcomings can be addressed by carefully coding the realtime application.
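The two changes can be summarized in a toy user-space model. This is not kernel code and not the actual patch. The struct layout and the functions syscall_enter_migrate, syscall_exit_migrate and may_lock_remote_rq are all hypothetical. The model assumes at most 32 CPUs, represented as a bitmask in which a set bit marks a realtime CPU. The choice of non-realtime target CPU is left to the caller, and the "carefully controlled conditions" are reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task model; bit i of rt_cpus set means CPU i is realtime. */
struct task {
    bool realtime;   /* task runs under a realtime policy */
    int cpu;         /* CPU the task currently runs on */
    int home_cpu;    /* realtime CPU to return to, or -1 if not migrated */
};

static bool cpu_is_rt(uint32_t rt_cpus, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    return (rt_cpus >> cpu) & 1u;
}

/* Change 1, entry side: a realtime task on a realtime CPU that enters a
 * system call is moved to a non-realtime CPU before the call body runs. */
void syscall_enter_migrate(struct task *t, uint32_t rt_cpus, int nonrt_target)
{
    if (t->realtime && cpu_is_rt(rt_cpus, t->cpu) &&
        !cpu_is_rt(rt_cpus, nonrt_target)) {
        t->home_cpu = t->cpu;
        t->cpu = nonrt_target;
    }
}

/* Change 1, exit side: send the task back to its realtime CPU. */
void syscall_exit_migrate(struct task *t)
{
    if (t->home_cpu >= 0) {
        t->cpu = t->home_cpu;
        t->home_cpu = -1;
    }
}

/* Change 2: load balancing on this_cpu may lock target_cpu's runqueue only
 * if it is its own runqueue or neither CPU is realtime. `controlled` stands
 * in for the carefully controlled exceptions, such as the migration above. */
bool may_lock_remote_rq(uint32_t rt_cpus, int this_cpu, int target_cpu,
                        bool controlled)
{
    if (this_cpu == target_cpu || controlled)
        return true;
    return !cpu_is_rt(rt_cpus, this_cpu) && !cpu_is_rt(rt_cpus, target_cpu);
}
```

In this model, with CPU 0 realtime (rt_cpus = 0x1), a realtime task on CPU 0 runs its system call on CPU 2 and then returns to CPU 0. Meanwhile, ordinary load balancing between CPUs 0 and 1 is refused in both directions, so neither side ever spins on the other's runqueue lock.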
Nonetheless, this approach promises to provide many of the benefits of a realtime OS with minimal modifications to the Linux kernel. In addition, extending this approach permits the Linux kernel to be moved incrementally to a deterministic realtime design, one system call at a time, as needed. It is quite possible that further improvements to this approach will permit Linux to offer hard-realtime response. This paper describes the changes required to permit realtime processes to execute normal Linux system calls without degrading the realtime response provided to other realtime processes running on the same CPU, and provides detailed realtime-response comparisons.
Paul E. McKenney is a distinguished engineer at IBM and has worked on SMP, NUMA, and RCU algorithms for longer than he cares to admit. Prior to that, he worked on packet-radio and Internet protocols (but long before the Internet became popular), system administration, realtime systems, and even business applications. His hobbies include running and the usual house-wife-and-kids habit.