NUMA pagecache replication

In a NUMA system, the machine is divided into nodes. Each node usually
consists of one or more CPUs and one or more memory controllers, uniformly
connected to one another. A node's memory is considered "local" to that
node's CPUs. Each node is connected to other nodes by an interconnect, so a
CPU on one node can access "remote" memory on another node. Remote memory
access tends to be slower, and as a rule it should be minimised.

Ideally, each CPU in the system would operate only on its local memory.
Linux tries to achieve this goal by allocating local memory by default
when a program running on a CPU asks for more memory. This is a
reasonable strategy, but what happens if memory needs to be accessed by
different CPUs with different local nodes?

The current strategy is to interleave such widely used allocations over
all nodes, with a bit of memory allocated to each node. This means that memory
load is placed as evenly as possible on the system. While this helps prevent
some bottlenecks, it is not optimal. If we consider an 8 node system,
then statistically, access to such interleaved memory has only a 1/8th chance
of being a local memory access. If remote memory delivers 75% of the
performance of local memory, then interleaved memory statistically delivers
only about 78% of the performance of local memory (1/8 x 100% + 7/8 x 75%).
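
The 78% figure can be reproduced with a short calculation. The sketch below
assumes the simple model implied above: each access is local with probability
1/nodes, and relative performance is averaged by access probability. The
function name and parameters are illustrative only and are not part of any
kernel interface.

```python
def interleaved_performance(nodes, remote_ratio):
    """Expected performance of interleaved memory relative to local memory.

    Assumes accesses are spread uniformly over `nodes` nodes and that a
    remote access runs at `remote_ratio` times local performance.
    """
    p_local = 1.0 / nodes
    return p_local * 1.0 + (1.0 - p_local) * remote_ratio


# The 8 node example from the text, with remote memory at 75% of local.
eight_node = interleaved_performance(nodes=8, remote_ratio=0.75)
print(f"{eight_node:.1%}")
```

Under this model the penalty grows with the node count, because the local
fraction 1/nodes shrinks and the result approaches the remote ratio.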

An alternative approach is to replicate the memory. That is, make a copy of
that memory which is local to each CPU that accesses it. This can be achieved
by replicating pages that are only being read from, and discarding the replicas
when the memory is written to. Despite the limitation to read-only pages,
replication is applicable to a number of important cases such as shared
libraries, program files, and common data files. In these situations,
replication can lift the performance of shared memory to 100% of local memory
performance across all CPUs.
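
The replicate-on-read, discard-on-write idea can be illustrated with a toy
model. This is only a sketch of the policy described above, not the kernel
patch: the class, its methods, and the notion of a "home node" are
hypothetical names chosen for the example.

```python
class ReplicatedPage:
    """Toy model of one pagecache page with per-node read-only replicas."""

    def __init__(self, home_node, data):
        self.home_node = home_node  # node holding the master copy
        self.data = data
        self.replicas = {}          # node id -> private copy of the data

    def read(self, node):
        """Return (data, was_local) for a read issued from `node`."""
        if node == self.home_node:
            return self.data, True
        if node in self.replicas:
            return self.replicas[node], True
        # First read from this node: one remote access to create the copy.
        self.replicas[node] = bytes(self.data)
        return self.replicas[node], False

    def write(self, data):
        """Discard every replica, then update the master copy."""
        self.replicas.clear()
        self.data = data


page = ReplicatedPage(home_node=0, data=b"libc text")
first = page.read(3)         # remote: creates a replica on node 3
second = page.read(3)        # local: served from the replica
page.write(b"patched")       # all replicas are discarded
after_write = page.read(3)   # remote again until re-replicated
```

In this model only the first read from each node after a write is remote,
which is why the approach suits data that is read often and written rarely.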

I have implemented an initial patch to do NUMA pagecache replication in Linux
2.6.20 (which hopefully will have had more work by next year), and I would like
to present its motivation, design, and performance results, and possibly
to explore the way it complements automatic memory migration.


Project: NUMA pagecache replication in Linux 

Nick Piggin

Nick is a Linux kernel hacker working at SUSE Labs.

© 2007 MEL8OURNE LCA2008 and Linux Australia | Linux is a registered trademark of Linus Torvalds