linux-kernelorg-stable/mm
Jack Steiner 6adef3ebe5 cpusets: new round-robin rotor for SLAB allocations
We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes & skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages & slab
pages.  Writing the file allocates both a data page & a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 >/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:44 -07:00
..
Kconfig
Kconfig.debug
Makefile
backing-dev.c
bootmem.c
bounce.c
compaction.c
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap.c
filemap_xip.c
fremap.c
highmem.c
hugetlb.c
hwpoison-inject.c
init-mm.c
internal.h
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c
maccess.c
madvise.c
memcontrol.c
memory-failure.c
memory.c
memory_hotplug.c
mempolicy.c
mempool.c
migrate.c
mincore.c
mlock.c
mm_init.c
mmap.c
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page-writeback.c
page_alloc.c
page_cgroup.c
page_io.c
page_isolation.c
pagewalk.c
percpu-km.c
percpu-vm.c
percpu.c
percpu_up.c
prio_tree.c
quicklist.c
readahead.c
rmap.c
shmem.c
slab.c
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap.c
swap_state.c
swapfile.c
thrash.c
truncate.c
util.c
vmalloc.c
vmscan.c
vmstat.c