2012-12-08

Forcing the CPU affinity can make a single-threaded process run 2-3x faster

Today, I chrooted into my system from sysresCD and discovered that eix-update (a Gentoo Portage indexing program) ran almost 3x faster under the 3.2.x Linux kernel than under the 3.6.x kernel.

After poking around, I finally discovered why. Check out this CPU usage pattern...

Basically, this is a single-threaded program: its total CPU usage tops out at 100%, i.e. one logical CPU. Because it probably does a lot of I/O on top of keeping one thread busy, it has to give up the CPU often, and the scheduler ends up spreading its execution over all the cores and hyperthreads.
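One way to watch this migration happen is to sample which CPU the process last ran on. The kernel exposes that as field 39 ("processor") of /proc/<pid>/stat, the same value ps shows in its psr column. Below is a minimal Python sketch, assuming Linux and a known PID; the function names and the sampling parameters are illustrative choices, not part of any tool.

```python
import time

# Linux-only sketch: the field layout follows proc(5) for /proc/<pid>/stat.

def last_cpu_from_stat(stat_line):
    """Return field 39 ("processor", the CPU the task last ran on).

    Field 2 is the command name in parentheses and may contain spaces,
    so only the text after the last ')' is split into fields."""
    rest = stat_line[stat_line.rfind(")") + 2:].split()
    # rest[0] is field 3 (state), hence field n is rest[n - 3].
    return int(rest[39 - 3])


def sample_cpus(pid, samples=20, interval=0.5):
    """Poll /proc/<pid>/stat `samples` times, `interval` seconds apart
    (both defaults are arbitrary), and return the CPU numbers seen."""
    seen = []
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            seen.append(last_cpu_from_stat(f.read()))
        time.sleep(interval)
    return seen
```

If the explanation above holds, sample_cpus(pid) on a running eix-update should return a mix of CPU numbers, while under taskset 1 it should only ever return 0.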

# time eix-update
[...]
eix-update  499.42s user 225.91s system 92% cpu 13:03.09 total
#

This behavior is a rather severe performance killer: the cores keep going in and out of their sleep states, paying a wake-up latency penalty each time, and the caches go cold whenever the process moves to another core. [edit] The most important latency factor turns out to be CPU frequency scaling under the ondemand governor; see below. [/edit]
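To see which sleep states a core can enter and how long each takes to wake from, the cpuidle sysfs interface lists every state with a name and an exit latency in microseconds. The sketch below, assuming the standard /sys/devices/system/cpu/cpuN/cpuidle/stateM layout, reads them for one CPU; the helper name and the configurable root (there only so it can be tried on a fake tree) are my own.

```python
import os
import re

# Assumes the standard cpuidle sysfs layout: <root>/cpuN/cpuidle/stateM/{name,latency}.
# `root` is a parameter only so the function can be exercised on a fake tree.

def idle_states(cpu=0, root="/sys/devices/system/cpu"):
    """Return [(state_name, exit_latency_us), ...] for one CPU, in state index order."""
    base = os.path.join(root, f"cpu{cpu}", "cpuidle")
    dirs = [d for d in os.listdir(base) if re.fullmatch(r"state\d+", d)]
    dirs.sort(key=lambda d: int(d[len("state"):]))  # numeric order: state10 after state9
    states = []
    for d in dirs:
        with open(os.path.join(base, d, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(base, d, "latency")) as f:
            latency_us = int(f.read().strip())
        states.append((name, latency_us))
    return states
```

On a real machine, idle_states() with the default arguments returns the states of CPU 0; deeper states typically show larger exit latencies, which is the penalty paid every time a sleeping core has to pick the process up again.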

Let's try the same operation, this time pinning the process to a single hyperthread.

Let's see the result...

# time taskset 1 eix-update
[...]
taskset 1 eix-update  198.51s user 38.26s system 73% cpu 5:21.76 total
#

The 1 is in fact a hexadecimal bitmask meaning "allowed to run on the first CPU only" (CPU 0 in Linux's numbering).
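To make the bitmask concrete, here is a small Python sketch that decodes a taskset-style hex mask (bit i set means CPU i is allowed) and applies it to the current process with os.sched_setaffinity, which wraps the same Linux sched_setaffinity system call that taskset uses. The helper names are hypothetical, and os.sched_setaffinity is only available on Linux.

```python
import os

def mask_to_cpus(mask_hex):
    """Decode a taskset-style hexadecimal mask: bit i set means CPU i is allowed."""
    mask = int(mask_hex, 16)
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}


def pin_current_process(mask_hex):
    """Restrict the calling process (pid 0 = self) to the CPUs in the mask.

    Linux only. The mask survives fork and exec, which is also what lets
    `taskset 1 eix-update` confine eix-update to CPU 0."""
    os.sched_setaffinity(0, mask_to_cpus(mask_hex))
```

For example, mask_to_cpus("1") gives {0} and mask_to_cpus("f") gives {0, 1, 2, 3}. Calling pin_current_process("1") and then starting eix-update with subprocess.run(["eix-update"]) should behave like taskset 1 eix-update; taskset -c 0 eix-update is the equivalent CPU-list form.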

So why was the 3.2.x kernel faster? I suspect its CPU idle driver was not "as good" as the one in 3.6.x, so the cores did not sleep as much as they do under 3.6.x.

[edit] In the comments I show that switching the CPU frequency governor to "performance" improves things as much as forcing the affinity does. But that basically means forgetting about energy efficiency.
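Switching the governor is done per CPU through the scaling_governor file of the cpufreq sysfs interface. A minimal Python sketch, assuming the usual /sys/devices/system/cpu layout and root privileges; the function name and the root parameter (there so it can be pointed at a test directory) are illustrative:

```python
import glob
import os

def set_governor(governor, root="/sys/devices/system/cpu"):
    """Write `governor` (e.g. "performance") to every CPU's scaling_governor.

    Needs root on a real system. `root` is only a parameter so the function
    can be tried on a fake directory tree. Returns the files written."""
    # cpu[0-9]* matches cpu0, cpu1, ... but not the cpufreq/cpuidle directories.
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    paths = sorted(glob.glob(pattern))
    for path in paths:
        with open(path, "w") as f:
            f.write(governor)
    return paths
```

Writing "ondemand" back the same way returns to the previous behavior, assuming that was the governor in use before.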

It is also possible to tune the ondemand governor so that it raises the frequency more readily:

# echo -n 24 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold  
# time eix-update
[...]
eix-update  221.58s user 56.28s system 76% cpu 6:02.94 total

This up_threshold tells the governor to raise the frequency when the load is above 24% (I tried here to put it slightly below 100%/4 = 25%: a single busy thread spread evenly over four cores keeps each of them only about 25% busy).
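A deliberately simplified model of the rule described here: a core whose load is above up_threshold gets bumped to its maximum frequency. The real ondemand governor also has a sampling period and its own rules for lowering the frequency, so treat this only as an illustration of the threshold arithmetic. The functions and the even-spreading assumption are hypothetical, and 95 is just an example of a high threshold, not a claim about the kernel default.

```python
# Hypothetical, simplified model of the ondemand up_threshold rule; the real
# governor also has a sampling period and separate rules for slowing down.

def per_core_load(total_busy_pct, cores):
    """Per-core load if total_busy_pct of work is spread evenly over `cores` (an assumption)."""
    return total_busy_pct / cores


def jumps_to_max(load_pct, up_threshold):
    """Simplified rule: a core whose load exceeds up_threshold is sent to max frequency."""
    return load_pct > up_threshold


load = per_core_load(100, 4)  # one fully busy thread smeared over 4 cores: 25.0%
for threshold in (95, 24, 17, 11):  # 95 is just an example of a high threshold
    print(threshold, jumps_to_max(load, threshold))
# prints "95 False", then "24 True", "17 True" and "11 True"
```

Under this model, any threshold below 25% lets the smeared-out load trigger the jump to full speed, which matches the direction of the measurements in this post.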

This is much better; the powertop screenshot shows the different frequencies the CPU now hits:

Pushing it a little further (17%):

# echo -n 17 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold  
# time eix-update
eix-update  212.72s user 49.50s system 76% cpu 5:42.72 total

Setting it to its minimum value (11%):

# echo -n 11 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
# time eix-update
eix-update  204.94s user 44.77s system 75% cpu 5:28.59 total
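Converting the total (elapsed) times printed above to seconds makes the gains easy to compare against the original 13:03.09 run. A small Python sketch using only the numbers from this post; the labels are mine:

```python
def to_seconds(elapsed):
    """Convert an elapsed time like '13:03.09' (or 'h:mm:ss.xx') to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# "total" column of each run reported in this post; the labels are mine.
runs = {
    "default": "13:03.09",
    "taskset 1": "5:21.76",
    "up_threshold=24": "6:02.94",
    "up_threshold=17": "5:42.72",
    "up_threshold=11": "5:28.59",
}

baseline = to_seconds(runs["default"])
for label, elapsed in runs.items():
    secs = to_seconds(elapsed)
    print(f"{label:16} {secs:7.2f} s   speedup x{baseline / secs:.2f}")
```

With these inputs the pinned run comes out at about 2.43x, and the up_threshold runs at about 2.16x (24), 2.28x (17) and 2.38x (11), consistent with the 2-3x in the title.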

CPU scheduling combined with power efficiency is clearly a complex problem, and a one-size-fits-all solution is certainly not possible. Nevertheless, it is interesting to see where we can tweak our system for specific workloads. [/edit]