
wastedcores's Introduction

The Linux Scheduler: a Decade of Wasted Cores

As a central part of resource management, the OS thread scheduler must maintain the following simple invariant: make sure that ready threads are scheduled on available cores. As simple as it may seem, we found that this invariant is often broken in Linux. Cores may stay idle for seconds while ready threads are waiting in runqueues. In our experiments, these performance bugs caused many-fold performance degradation for synchronization-heavy scientific applications, 13% higher latency for kernel make, and a 14-23% decrease in TPC-H throughput for a widely used commercial database. The main contribution of this work is the discovery and analysis of these bugs and providing the fixes. Conventional testing techniques and debugging tools are ineffective at confirming or understanding this kind of bug, because its symptoms are often evasive. To drive our investigation, we built new tools that check for violations of the invariant online and visualize scheduling activity. They are simple, easily portable across kernel versions, and run with negligible overhead. We believe that making these tools part of the kernel developers' tool belt can help keep this type of bug at bay.
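To make the invariant concrete, here is a minimal sketch of a check over a snapshot of per-core runqueue lengths. It is not the authors' tool: the function name, the input format, and the convention that a runqueue length includes the currently running thread are all assumptions made for illustration. A practical checker would presumably also need to ignore short-lived transient states, which this sketch does not attempt.

```python
from typing import Sequence


def invariant_violated(runqueue_lengths: Sequence[int]) -> bool:
    """Return True if a core is idle while a ready thread waits elsewhere.

    runqueue_lengths[i] is the number of runnable threads on core i,
    counting the thread that is currently running (assumption of this sketch).
    """
    # A core with zero runnable threads is idle.
    has_idle_core = any(n == 0 for n in runqueue_lengths)
    # A core with two or more runnable threads has at least one thread waiting.
    has_waiting_thread = any(n >= 2 for n in runqueue_lengths)
    return has_idle_core and has_waiting_thread


if __name__ == "__main__":
    print(invariant_violated([2, 1, 0, 1]))  # core 2 idle, core 0 has a waiter
    print(invariant_violated([1, 1, 1, 1]))  # every core busy, nobody waiting
```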

Important note about the patches

The main point of our paper is to raise awareness about issues in the Linux scheduler. The provided patches fix the issues encountered with our workloads, but they are not intended as generic bug fixes. They may have unwanted side effects and result in performance loss or energy waste on your machine.

Article

The Linux Scheduler: a Decade of Wasted Cores, Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Vivien Quéma, and Alexandra Fedorova. To appear in Proceedings of the Eleventh European Conference on Computer Systems (EuroSys '16), London, United Kingdom, 2016.

wastedcores's Issues

wastes cores on hyperthreaded CPUs

As of 2016-04-30, this scheduler may assign jobs to both hyperthreads on a single core before assigning jobs to other fully idle physical cores, leaving them wasted.

Reproduced on a single Intel(R) Core(TM) i7-6700K CPU @ 4.70GHz, with 4 cores, 8 threads.

First, we discover which logical CPUs are provided by each physical core:
grep "core id" /proc/cpuinfo

core id     : 0
core id     : 1
core id     : 2
core id     : 3
core id     : 0
core id     : 1
core id     : 2
core id     : 3

Logical CPUs 0 and 4 are provided by the first physical core (core id 0)
Logical CPUs 1 and 5 are provided by the second physical core (core id 1)
Logical CPUs 2 and 6 are provided by the third physical core (core id 2)
Logical CPUs 3 and 7 are provided by the fourth physical core (core id 3)

You can also get this information here:
grep '' /sys/devices/system/cpu/cpu*/topology/thread_siblings_list

/sys/devices/system/cpu/cpu0/topology/thread_siblings_list:0,4
/sys/devices/system/cpu/cpu1/topology/thread_siblings_list:1,5
/sys/devices/system/cpu/cpu2/topology/thread_siblings_list:2,6
/sys/devices/system/cpu/cpu3/topology/thread_siblings_list:3,7
...
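The sibling lists above can be turned into one group of logical CPUs per physical core. The following is a rough sketch of that parsing step, using the output shown above as sample input. The function name is hypothetical, and the range handling (for example `0-1`) assumes the kernel's usual CPU list notation.

```python
from typing import List, Tuple

# Sample input copied from the grep output above.
SAMPLE = """\
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list:0,4
/sys/devices/system/cpu/cpu1/topology/thread_siblings_list:1,5
/sys/devices/system/cpu/cpu2/topology/thread_siblings_list:2,6
/sys/devices/system/cpu/cpu3/topology/thread_siblings_list:3,7
"""


def parse_sibling_groups(grep_output: str) -> List[Tuple[int, ...]]:
    """Return one sorted tuple of logical CPU numbers per physical core."""
    groups = set()
    for line in grep_output.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and elisions such as "..."
        siblings = line.rpartition(":")[2]
        cpus = []
        for part in siblings.split(","):
            if "-" in part:  # a range such as "0-1"
                low, high = part.split("-")
                cpus.extend(range(int(low), int(high) + 1))
            else:
                cpus.append(int(part))
        # Siblings report the same list, so the set removes duplicates.
        groups.add(tuple(sorted(cpus)))
    return sorted(groups)


if __name__ == "__main__":
    print(parse_sibling_groups(SAMPLE))
```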

How to reproduce it:

Launch a CPU-bound job with as many threads (or processes) as physical cores on an otherwise idle system (install 'schedtool' if you don't have it). Batch scheduling gives longer time slices, which gives you time to inspect job placement.

schedtool -B -e openssl speed rsa4096 -multi 4

Ideally each openssl thread would run on one physical core; so any combination of logical CPUs (0 or 4) and (1 or 5) and (2 or 6) and (3 or 7) should be loaded. What actually happens?

mpstat -P ALL 1

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all   50.00    0.00    0.00    0.06    0.00    0.00    0.00    0.00    0.00   49.94
Average:       0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

The first core is over-scheduled with two jobs - CPU 0 is 100% and CPU 4 is 100%.
The second core is over-scheduled with two jobs - CPU 1 is 100% and CPU 5 is 100%.
The third core is wasted - logical CPUs 2 and 6 are both idle.
The fourth core is wasted - logical CPUs 3 and 7 are both idle.

Try another round:

10:59:48 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:59:49 AM  all   50.19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   49.81
10:59:49 AM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
10:59:49 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:59:49 AM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:59:49 AM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:59:49 AM    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
10:59:49 AM    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
10:59:49 AM    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
10:59:49 AM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

The first core is over-scheduled with two jobs - CPU 0 is 100% and CPU 4 is 100%.
The second core is ideally scheduled with one job - CPU 1 is idle and CPU 5 is 100%.
The third core is ideally scheduled with one job - CPU 2 is idle and CPU 6 is 100%.
The fourth core is wasted - logical CPUs 3 and 7 are both idle.
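The per-core reading of the mpstat output can be automated. The sketch below labels each physical core from the %usr column of the second run, using the sibling pairs found earlier. The function name and the 50% busy threshold are assumptions for illustration. Note that "wasted" only makes sense while some other core is over-scheduled; on a lightly loaded machine an idle core is simply idle.

```python
from typing import Dict, Sequence, Tuple


def classify_cores(
    groups: Sequence[Tuple[int, ...]],
    busy_pct: Dict[int, float],
    threshold: float = 50.0,  # assumed cut-off for "this logical CPU is busy"
) -> Dict[Tuple[int, ...], str]:
    """Label each physical core by how many of its logical CPUs are busy."""
    labels = {}
    for group in groups:
        busy = sum(1 for cpu in group if busy_pct[cpu] >= threshold)
        if busy == 0:
            labels[group] = "wasted"
        elif busy == 1:
            labels[group] = "ideal"
        else:
            labels[group] = "over-scheduled"
    return labels


if __name__ == "__main__":
    # Sibling pairs and %usr values taken from the report above (second run).
    groups = [(0, 4), (1, 5), (2, 6), (3, 7)]
    usr = {0: 100.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 100.0, 5: 100.0, 6: 100.0, 7: 0.0}
    for group, label in classify_cores(groups, usr).items():
        print(group, label)
```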

When cores are wasted, this is the typical result:

OpenSSL 1.0.2g-fips  1 Mar 2016
                  sign    verify    sign/s verify/s
rsa 4096 bits 0.001132s 0.000019s    883.6  51948.1

What happens if we disable one logical CPU on each hyperthreaded core? I'll disable CPUs 4-7:

echo 0 > /sys/devices/system/cpu/cpu4/online
echo 0 > /sys/devices/system/cpu/cpu5/online
echo 0 > /sys/devices/system/cpu/cpu6/online
echo 0 > /sys/devices/system/cpu/cpu7/online
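On a machine with a different topology, the set of logical CPUs to take offline can be derived from the sibling groups rather than hard-coded. This is only a sketch: the helper names are hypothetical, it keeps the lowest-numbered logical CPU of each core by assumption, and it prints the sysfs paths instead of writing to them (writing requires root).

```python
from typing import List, Sequence, Tuple


def cpus_to_offline(groups: Sequence[Tuple[int, ...]]) -> List[int]:
    """Keep the lowest-numbered logical CPU of each core; return the others."""
    extra = []
    for group in groups:
        extra.extend(sorted(group)[1:])
    return sorted(extra)


def online_path(cpu: int) -> str:
    """Path of the sysfs file that controls whether a logical CPU is online."""
    return f"/sys/devices/system/cpu/cpu{cpu}/online"


if __name__ == "__main__":
    # Sibling pairs from the machine described in this report.
    groups = [(0, 4), (1, 5), (2, 6), (3, 7)]
    for cpu in cpus_to_offline(groups):
        # Printed only; writing "0" to each path is what takes the CPU offline.
        print(online_path(cpu))
```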

Relaunching the same job shows the loading is now ideal:

11:44:37 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:44:38 AM  all   99.75    0.00    0.25    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
11:44:38 AM    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

This results in about 68% higher signing throughput (883.6 to 1485.2 sign/s):

OpenSSL 1.0.2g-fips  1 Mar 2016
                  sign    verify    sign/s verify/s
rsa 4096 bits 0.000673s 0.000011s   1485.2  93650.8
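For reference, the improvement can be recomputed from the two openssl result tables in this report. The helper name is an assumption; the numbers are copied from the output above.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative throughput improvement, in percent."""
    return (after / before - 1.0) * 100.0


# sign/s and verify/s with wasted cores, then with CPUs 4-7 offline.
sign_gain = percent_gain(883.6, 1485.2)
verify_gain = percent_gain(51948.1, 93650.8)

if __name__ == "__main__":
    print(f"sign/s:   +{sign_gain:.0f}%")
    print(f"verify/s: +{verify_gain:.0f}%")
```

The signing figure rounds to 68%, and verification throughput improves by roughly 80%.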

sched_max_numa_distance undeclared in missing_sched_domains_linux_4.patch

Problem:
The variable 'sched_max_numa_distance' used in 'missing_sched_domains_linux_4.patch' is not declared anywhere before its use, which results in the following compiler error:

arch/x86/kernel/smpboot.c: In function ‘set_cpu_sibling_map’:
arch/x86/kernel/smpboot.c:436:7: error: ‘sched_max_numa_distance’ undeclared (first use in this function)
&& sched_max_numa_distance == -1)

Thank you.

[Wiki?] Script for newbies to compile under Ubuntu

The purpose of this is to help people who have not compiled a custom kernel before, let alone applied custom patches.

It's a little daunting for beginners to clone the correct branch, apply all of the patches, fix the v4.1 compiler error you identified in the source, install the dependencies, generate a .config, open menuconfig, compile, and then install the package.

Hopefully this will be of use to someone - it takes about 5 minutes, skipping menuconfig:
https://github.com/Turbine1991/build_ubuntu_kernel_wastedcores

Please publish the tool code

The tools are potentially the most interesting aspect of this work in the long term. They would be of great help to groups who are attempting to recreate this work and to understand the impact of the proposed kernel changes.

While there's enough detail in the paper to attempt to recreate this, it would be very useful to have a working reference implementation to look at.
