A Presentation by Lee Schermerhorn
The Linux 2.6 kernels continue to evolve their support for
NUMA platforms. Mechanisms and interfaces now [~2.6.16 and
later] support "NUMA friendly" placement of kernel and user
data, both automatically and under explicit kernel subsystem
and/or application control, and kernel developers are
enhancing their subsystems to use these interfaces. As a
result, the kernel does a fairly good job of allocating memory
close to where it will be referenced--at least initially.
However, once automatic load balancing kicks in, all bets
are off. The scheduler takes no note of NUMA locality when
migrating tasks between nodes, other than the fact that internode
load balancing takes place somewhat less frequently than
intranode balancing due to scheduling domain thresholds.
This paper will present two series of patches developed by
the author: "Migrate on Fault" and "Automatic Page Migration"
that, together, permit a task to pull pages local to itself
after being migrated to a new node. These patches build on
the direct migration mechanisms in recent upstream kernels
to support "lazy migration"--i.e., pages are migrated in
the context of the referencing task in the fault path.
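As a rough illustration of the lazy migration idea only, the toy model
below marks a task's pages when the task moves to a new node and pulls
each page local the next time the task touches it. It is a user-space
sketch, not the patches themselves: the Page and Task classes, the
needs_check flag, and the moved_to() and touch() methods are all
hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    node: int                  # node whose memory currently holds the page
    needs_check: bool = False  # hypothetical flag: next touch takes a "fault"


@dataclass
class Task:
    node: int                                   # node the task runs on
    pages: list = field(default_factory=list)
    migrations: int = 0                         # pages pulled local so far

    def allocate(self, count):
        # Initial placement is local to the node the task is running on.
        self.pages = [Page(self.node) for _ in range(count)]

    def moved_to(self, new_node):
        # Stand-in for the task learning it now runs on a new node.
        # Nothing is copied here; pages are only marked so that the
        # next reference to each one goes through the check in touch().
        self.node = new_node
        for page in self.pages:
            page.needs_check = True

    def touch(self, index):
        # Stand-in for the fault path, run in the referencing task's context.
        page = self.pages[index]
        if page.needs_check:
            page.needs_check = False
            if page.node != self.node:
                page.node = self.node   # migrate on fault
                self.migrations += 1
        return page.node


task = Task(node=0)
task.allocate(4)
task.moved_to(1)   # the task is rebalanced from node 0 to node 1
task.touch(0)      # only the pages actually referenced are migrated
task.touch(2)
local = sum(page.node == task.node for page in task.pages)
print(f"{local} of {len(task.pages)} pages local, {task.migrations} migrated")
```

The point of the sketch is that the cost of migration is paid only for
pages the task actually references after the move; untouched pages stay
where they were.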
The "automatic migration" series of patches enhances the
scheduler to notify a task that it has been migrated to a
new node, so that it can arrange for lazy migration of the
pages that it references. Both "migrate on fault" and
"automatic migration" can be enabled/disabled on a per
The paper will present the results of running the
McCalpin Stream benchmark to show that enabling automatic
page migration results in overall increased bandwidth
of a NUMA platform by allowing the system to return to
a state of maximum locality after load perturbations.
This is the "hygiene" aspect of the mechanisms.
The paper will also present the results of other workloads
that show the cost/benefit of the patches when disabled
and when enabled.
I plan on presenting an overview of the current [2.6.16+]
direct migration support--depth will depend on time
available--and how these patches enhance and interact with
the current support. Design choices and rationale will
also be covered. I also plan to address the types of
workloads that might benefit more from this approach.
Status of the patches:
I submitted the migrate on fault and automatic migration
patches to linux-mm in early '06. Because we currently
do not have workloads that demonstrate significant improvement
[the Stream benchmark is not considered a valid workload], there
is no interest in including the patches in the upstream kernel
[or even in -mm, apparently]. Still, I continue to maintain
and test the patches on upstream kernels [2.6.18-rc6-mm2, most
recently], often uncovering regressions in basic migration,
radix-tree, and mm subsystems in the process.
Links to up-to-date copies of the patches will be provided.
[Current externally available patches are somewhat out of