
scx's Introduction

Sched_ext Schedulers and Tools

sched_ext is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. This repository contains various scheduler implementations and support utilities.

sched_ext enables safe and rapid iterations of scheduler implementations, thus radically widening the scope of scheduling strategies that can be experimented with and deployed; even in massive and complex production environments.

  • The scx_layered case study concretely demonstrates the power and benefits of sched_ext.
  • For a high-level but thorough overview of sched_ext (especially its motivation), please refer to the overview document.
  • For a description of the schedulers shipped with this tree, please refer to the schedulers document.
  • The following video shows the scx_rustland scheduler, which makes most scheduling decisions in userspace Rust code, delivering better FPS in Terraria while a kernel is being compiled. This doesn't mean that scx_rustland is a better scheduler, but it does demonstrate how safe and easy it is to implement a scheduler which is generally usable and can outperform the default scheduler in certain scenarios.
scx_rustland-terraria.mp4

While the kernel feature is not upstream yet, we believe sched_ext has a reasonable chance of landing upstream in the foreseeable future. Both Meta and Google are fully committed to sched_ext, and Meta is in the process of mass production deployment. See the Kernel Feature Status section for more details.

In all example shell commands, $SCX refers to the root of this repository.
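
For example, assuming the repository was cloned to a hypothetical ~/src/scx:

$ export SCX=~/src/scx    # hypothetical clone location
$ cd $SCX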

Getting Started

All that's necessary for running sched_ext schedulers is a kernel with sched_ext support and the scheduler binaries along with the libraries they depend on. Switching to a sched_ext scheduler is as simple as running a sched_ext binary:

root@test ~# cat /sys/kernel/sched_ext/state /sys/kernel/sched_ext/*/ops 2>/dev/null
disabled
root@test ~# scx_simple
local=1 global=0
local=74 global=15
local=78 global=32
local=82 global=42
local=86 global=54
^Zfish: Job 1, 'scx_simple' has stopped
root@test ~# cat /sys/kernel/sched_ext/state /sys/kernel/sched_ext/*/ops 2>/dev/null
enabled
simple
root@test ~# fg
Send job 1 (scx_simple) to foreground
local=635 global=179
local=696 global=192
^CEXIT: BPF scheduler unregistered

scx_simple is a very simple global vtime scheduler which can behave acceptably on CPUs with a simple topology (single socket and single L3 cache domain).

Above, we switch the whole system to use scx_simple by running the binary, suspend it with ctrl-z to confirm that it's loaded, and then switch back to the kernel default scheduler by terminating the process with ctrl-c. For scx_simple, suspending the scheduler process doesn't affect scheduling behavior because all that the userspace component does is print statistics. This doesn't hold for all schedulers.

In addition to terminating the program, there are two more ways to disable a sched_ext scheduler - sysrq-S and the watchdog timer. Ignoring kernel bugs, the worst damage a sched_ext scheduler can do to a system is starving some threads until the watchdog timer triggers.
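
For example, if the scheduler's terminal cannot be reached, the sysrq-S action can usually also be triggered through procfs, assuming the sysrq interface is available on your system:

root@test ~# echo S > /proc/sysrq-trigger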

As illustrated, once the kernel and binaries are in place, using sched_ext schedulers is straightforward and safe. While developing and building schedulers in this repository isn't complicated either, sched_ext makes use of many new BPF features, some of which require build tools which are newer than what many distros are currently shipping. This should become less of an issue in the future. For the time being, the following custom repositories are provided for select distros.

Install Instructions by Distro

Repository Structure

scx
|-- scheds               : Sched_ext scheduler implementations
|   |-- include          : Shared BPF and user C include files including vmlinux.h
|   |-- c                : Example schedulers - userspace code written in C
|   \-- rust             : Example schedulers - userspace code written in Rust
\-- rust                 : Rust support code
    \-- scx_utils        : Common utility library for rust schedulers

Build & Install

meson is the main build system, but each Rust sub-project is its own self-contained cargo project and can be built and published separately. The following are the dependencies and version requirements.

Note: Many distros only ship earlier versions of meson. In that case, just clone the meson repo and call meson.py directly, e.g. /path/to/meson/repo/meson.py compile -C build. Alternatively, install it with pip, e.g. pip install meson or pip install meson --break-system-packages (if needed).

  • meson: >=1.2, build scripts under meson-scripts/ use bash and standard utilities including awk.
  • clang: >=16 required, >=17 recommended
  • libbpf: >=1.2.2 required, >=1.3 recommended (RESIZE_ARRAY support is new in 1.3). It's preferred to link statically against the source from the libbpf git repo, which is cloned during setup.
  • Rust toolchain: >=1.72
  • libelf, libz, libzstd if linking against static libbpf.a
  • bpftool: By default this is cloned and built as part of the default build process. Alternatively, it's usually available in linux-tools-common.

The kernel has to be built with the following configuration:

  • CONFIG_BPF=y
  • CONFIG_BPF_EVENTS=y
  • CONFIG_BPF_JIT=y
  • CONFIG_BPF_SYSCALL=y
  • CONFIG_DEBUG_INFO_BTF=y
  • CONFIG_FTRACE=y
  • CONFIG_SCHED_CLASS_EXT=y
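
A quick way to sanity-check an existing kernel is to grep its config, assuming your distro exposes it via /proc/config.gz or under /boot (the exact location varies by distro). On a kernel built with sched_ext support, this prints the line shown:

$ zgrep CONFIG_SCHED_CLASS_EXT /proc/config.gz 2>/dev/null || \
    grep CONFIG_SCHED_CLASS_EXT "/boot/config-$(uname -r)"
CONFIG_SCHED_CLASS_EXT=y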

Setting Up and Building

meson always uses a separate build directory. Running the following commands in the root of the tree builds and installs all schedulers under ~/bin.

Static linking against libbpf (preferred)

$ cd $SCX
$ meson setup build --prefix ~
$ meson compile -C build
$ meson install -C build

Note: meson setup will also clone both the libbpf and bpftool repos, and meson compile will build them both.

Make sure you have dependencies installed that allow you to compile from source!

Ubuntu/Debian
apt install gcc-multilib build-essential libssl-dev llvm lld libelf-dev
Arch Linux
pacman -S base-devel

Static linking against system libbpf

Note that, depending on your system configuration, libbpf_a and libbpf_h may be in different directories. The system libbpf version needs to meet the minimum libbpf version required by scx.

$ cd $SCX
$ meson setup build --prefix ~ -D libbpf_a=/usr/lib64/libbpf.a -D libbpf_h=/usr/include/bpf/
$ meson compile -C build
$ meson install -C build

Dynamic linking against libbpf

$ cd $SCX
$ meson setup build --prefix ~ -D libbpf_a=disabled
$ meson compile -C build
$ meson install -C build

Using a different bpftool

This will check the system for an installed bpftool:

$ meson setup build --prefix ~ -D bpftool=disabled

Using a custom built bpftool

$ meson setup build --prefix ~ -D bpftool=/path/to/bpftool

Note that the meson compile step is not strictly necessary, as install implies compile. The above builds debug binaries with optimizations turned off, which is useful for development, but the resulting binaries are large and slow. For actual use you want to build release binaries. meson uses the -D argument to specify build options. The configuration options can be specified at setup time, but they can also be changed afterwards and meson will do the right thing. To switch to release builds, run the following in the build directory and then compile and install again.

$ meson configure -Dbuildtype=release

Running meson configure without any argument shows all current build options. For more information on meson arguments and built-in options, please refer to meson --help and its documentation.
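
Putting the above together, a hypothetical switch of an existing build directory to a release build could look like this:

$ cd $SCX/build
$ meson configure -Dbuildtype=release
$ meson compile
$ meson install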

Building Specific Schedulers and Binary Locations

If you just want to build a subset of schedulers, you can specify the scheduler names as arguments to meson compile. For example, if we just want to build the simple example scheduler scheds/c/scx_simple and the Rust userspace scheduler scheds/rust/scx_rusty:

$ cd $SCX
$ meson setup build -Dbuildtype=release
$ meson compile -C build scx_simple scx_rusty

⚠️ If your system has sccache installed: meson automatically uses sccache if available. However, sccache fails in one of the build steps. If you encounter this issue, disable sccache by specifying CC directly - $ CC=clang meson setup build -Dbuildtype=release.

You can also specify -v if you want to see the commands being used:

$ meson compile -C build -v scx_pair

For C userspace schedulers such as the ones under scheds/c, the built binaries are located in the same directory under the build root. For example, here, the scx_simple binary can be found at $SCX/build/scheds/c/scx_simple.

For Rust userspace schedulers such as the ones under scheds/rust, the same directory under the build root is used as the cargo build target directory. Thus, here, the scx_rusty binary can be found at $SCX/build/scheds/rust/scx_rusty/release/scx_rusty.
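
For example, assuming the release build from above, a freshly built scheduler can be launched straight out of the build tree (loading a scheduler requires root):

$ sudo $SCX/build/scheds/rust/scx_rusty/release/scx_rusty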

SCX specific build options

While the default options should work in most cases, it may be desirable to override some of the toolchains and dependencies - e.g. to directly use libbpf built from the kernel source tree. The following meson build options can be used in such cases.

  • bpf_clang: clang to use when compiling .bpf.c
  • bpftool: bpftool to use when generating .bpf.skel.h. Set this to "disabled" to check the system for an already installed bpftool
  • libbpf_a: Static libbpf.a to use. Set this to "disabled" to link libbpf dynamically
  • libbpf_h: libbpf header directories, only meaningful with libbpf_a option
  • cargo: cargo to use when building rust sub-projects
  • cargo_home: CARGO_HOME env to use when invoking cargo
  • offline: Compilation step should not access the internet
  • enable_rust: Enable the build of Rust sub-projects
  • serialize: Enable/disable the sequential build of the schedulers. Set this to false if you need to build just one scheduler.

For example, let's say you want to use the bpftool and libbpf shipped in the kernel tree located at $KERNEL. We need to build bpftool in the kernel tree first, set up the SCX build with the related options, and then build & install.

$ cd $KERNEL
$ make -C tools/bpf/bpftool
$ cd $SCX
$ BPFTOOL=$KERNEL/tools/bpf/bpftool
$ meson setup build -Dbuildtype=release -Dprefix=~/bin \
    -Dbpftool=$BPFTOOL/bpftool \
    -Dlibbpf_a=$BPFTOOL/libbpf/libbpf.a \
    -Dlibbpf_h=$BPFTOOL/libbpf/include
$ meson install -C build

Note that we use the libbpf which was produced as a part of the bpftool build process rather than building libbpf directly. This is necessary because the libbpf header files need to be installed for them to be in the expected relative locations.

Offline Compilation

Rust builds automatically download dependencies from crates.io; however, some build environments might not allow internet access, requiring all dependencies to be available offline. The fetch target and offline option are provided for such cases.

The following downloads all Rust dependencies into $HOME/cargo-deps.

$ cd $SCX
$ meson setup build -Dcargo_home=$HOME/cargo-deps
$ meson compile -C build fetch

The following builds the schedulers without accessing the internet. The build directory doesn't have to be the same one. The only requirement is that the cargo_home option points to a directory which contains the content generated from the previous step.

$ cd $SCX
$ meson setup build -Dcargo_home=$HOME/cargo-deps -Doffline=true -Dbuildtype=release
$ meson compile -C build

Working with Rust Sub-projects

Each Rust sub-project is its own self-contained cargo project. When building as a part of this repository, meson invokes cargo with the appropriate options and environment variables to sync the build environment. When building separately by running cargo build directly in a sub-project directory, it will automatically figure out the build environment. Please take a look at the scx_utils::BpfBuilder documentation for details.

For example, the following builds and runs the scx_rusty scheduler:

$ cd $SCX/scheds/rust/scx_rusty
$ cargo build --release
$ cargo run --release

Here too, the build step is not strictly necessary as it's implied by run.
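
Arguments for the scheduler itself can be passed to cargo run after a -- separator, e.g. (assuming the scheduler exposes a --help flag):

$ cargo run --release -- --help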

Note that Rust userspace schedulers are published on crates.io and can be built and installed without cloning this repository as long as the necessary toolchains are available. Simply run:

$ cargo install scx_rusty

and scx_rusty will be built and installed as ~/.cargo/bin/scx_rusty.
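
The installed binary can then be started like the ones built from this tree (root is required to load the scheduler):

$ sudo ~/.cargo/bin/scx_rusty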

systemd services

See: services

Kernel Feature Status

The kernel feature is not yet upstream and can be found in the sched_ext repository. The following are the important branches:

A list of the breaking changes in the sched_ext kernel tree and the associated commits for the schedulers in this repo.

Getting in Touch

We aim to build a friendly and approachable community around sched_ext. You can reach us through the following channels:

We also hold weekly office hours every Tuesday. Please see the #office-hours channel on slack for details. To join the slack community, you can use this link.

Additional Resources

There are blog posts and articles about sched_ext which can help you explore sched_ext in various ways. The following are some examples:

scx's People

Contributors

arighi, aruhier, byte-lab, d-e-s-o, danieljordan10, davemarchevsky, davide125, dschatzberg, frelon, gghh, hodgesds, htejun, jfernandez, jordalgo, kawanaao, kkdwivedi, leitao, multics69, nycko123, otteryc, pprighi, ptr1337, rinhizakura, sirlucjan, takase1121, thinkeryzu1, vax-r, vimproved, vnepogodin


scx's Issues

README : Incorrect description about behavior of scx_XXX scheduler

Currently, there is a statement in the README saying that if you start scx_XXX, all tasks will be moved to it.

scx/README.md

Line 67 in 274bcf7

Above, we switch the whole system to use `scx_simple` by running the binary,

However, my understanding is that this commit changed the behavior so that simply starting scx_XXX no longer moves tasks to this scheduler?

If so, I felt the README description should be changed.
I apologize if I have missed something.

ref : https://lore.kernel.org/r/all/[email protected]/T/

[QUESTION] rustland CPU usage under high load

During kernel compilation, I opened btop to see that rustland is using around 7% CPU.
Is this normal?
I suppose that for a job like compiling, the scheduler has to constantly allocate and switch jobs in the queue, which could result in high CPU utilization.

Specs:

  • AMD Ryzen 5 7600
  • CachyOS kernel 6.8.4
  • 16 Gigs of RAM

Schedulers fail to load: program of this type cannot use helper bpf_probe_read_str

Hi,
Using the CachyOS kernel (6.9.1) on Gentoo with libbpf 1.4.2, I can't load any scheduler from release 1.9 or from main (c09bc2a).

Logs of scx_simple -v
$ sudo ./scx_simple -v

libbpf: object 'scx_simple': failed (-95) to create BPF token from '/sys/fs/bpf', skipping optional step...
libbpf: loaded kernel BTF from '/sys/kernel/btf/vmlinux'
libbpf: extern (func ksym) 'scx_bpf_consume': resolved to vmlinux [52985]
libbpf: extern (func ksym) 'scx_bpf_create_dsq': resolved to vmlinux [52992]
libbpf: extern (func ksym) 'scx_bpf_dispatch': resolved to vmlinux [52995]
libbpf: extern (func ksym) 'scx_bpf_dispatch_vtime': resolved to vmlinux [52999]
libbpf: extern (func ksym) 'scx_bpf_select_cpu_dfl': resolved to vmlinux [53024]
libbpf: extern 'scx_bpf_switch_all' (weak): not resolved, defaulting to zero
libbpf: struct_ops init_kern simple_ops: type_id:419 kern_type_id:50305 kern_vtype_id:50386
libbpf: struct_ops init_kern simple_ops: func ptr select_cpu is set to prog simple_select_cpu from data(+0) to kern_data(+0)
libbpf: struct_ops init_kern simple_ops: func ptr enqueue is set to prog simple_enqueue from data(+8) to kern_data(+8)
libbpf: struct_ops init_kern simple_ops: func ptr dispatch is set to prog simple_dispatch from data(+24) to kern_data(+24)
libbpf: struct_ops init_kern simple_ops: func ptr running is set to prog simple_running from data(+48) to kern_data(+48)
libbpf: struct_ops init_kern simple_ops: func ptr stopping is set to prog simple_stopping from data(+56) to kern_data(+56)
libbpf: struct_ops init_kern simple_ops: func ptr enable is set to prog simple_enable from data(+144) to kern_data(+144)
libbpf: struct_ops init_kern simple_ops: func ptr init is set to prog simple_init from data(+248) to kern_data(+248)
libbpf: struct_ops init_kern simple_ops: func ptr exit is set to prog simple_exit from data(+256) to kern_data(+256)
libbpf: struct_ops init_kern simple_ops: copy dispatch_max_batch 4 bytes from data(+264) to kern_data(+264)
libbpf: struct_ops init_kern simple_ops: copy flags 8 bytes from data(+272) to kern_data(+272)
libbpf: struct_ops init_kern simple_ops: copy timeout_ms 4 bytes from data(+280) to kern_data(+280)
libbpf: struct_ops init_kern simple_ops: copy exit_dump_len 4 bytes from data(+284) to kern_data(+284)
libbpf: struct_ops init_kern simple_ops: copy hotplug_seq 8 bytes from data(+288) to kern_data(+288)
libbpf: struct_ops init_kern simple_ops: copy name 128 bytes from data(+296) to kern_data(+296)
libbpf: sec 'struct_ops/simple_enqueue': found 1 CO-RE relocations
libbpf: CO-RE relocating [21] struct task_struct: found target candidate [127] struct task_struct in [vmlinux]
libbpf: prog 'simple_enqueue': relo #0: <byte_off> [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_enqueue': relo #0: matching candidate #0 <byte_off> [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_enqueue': relo #0: patched insn #23 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops/simple_running': found 2 CO-RE relocations
libbpf: prog 'simple_running': relo #0: <byte_off> [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_running': relo #0: matching candidate #0 <byte_off> [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_running': relo #0: patched insn #8 (LDX/ST/STX) off 848 -> 920
libbpf: prog 'simple_running': relo #1: <byte_off> [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_running': relo #1: matching candidate #0 <byte_off> [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_running': relo #1: patched insn #11 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops/simple_stopping': found 4 CO-RE relocations
libbpf: prog 'simple_stopping': relo #0: <byte_off> [21] struct task_struct.scx.slice (0:23:14 @ offset 840)
libbpf: prog 'simple_stopping': relo #0: matching candidate #0 <byte_off> [127] struct task_struct.scx.slice (0:23:15 @ offset 912)
libbpf: prog 'simple_stopping': relo #0: patched insn #5 (LDX/ST/STX) off 840 -> 912
libbpf: prog 'simple_stopping': relo #1: <byte_off> [21] struct task_struct.scx.weight (0:23:4 @ offset 756)
libbpf: prog 'simple_stopping': relo #1: matching candidate #0 <byte_off> [127] struct task_struct.scx.weight (0:23:4 @ offset 820)
libbpf: prog 'simple_stopping': relo #1: patched insn #8 (LDX/ST/STX) off 756 -> 820
libbpf: prog 'simple_stopping': relo #2: <byte_off> [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_stopping': relo #2: matching candidate #0 <byte_off> [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_stopping': relo #2: patched insn #11 (LDX/ST/STX) off 848 -> 920
libbpf: prog 'simple_stopping': relo #3: <byte_off> [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_stopping': relo #3: matching candidate #0 <byte_off> [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_stopping': relo #3: patched insn #13 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops/simple_enable': found 1 CO-RE relocations
libbpf: prog 'simple_enable': relo #0: <byte_off> [21] struct task_struct.scx.dsq_vtime (0:23:15 @ offset 848)
libbpf: prog 'simple_enable': relo #0: matching candidate #0 <byte_off> [127] struct task_struct.scx.dsq_vtime (0:23:16 @ offset 920)
libbpf: prog 'simple_enable': relo #0: patched insn #4 (LDX/ST/STX) off 848 -> 920
libbpf: sec 'struct_ops.s/simple_init': found 1 CO-RE relocations
libbpf: CO-RE relocating [407] enum scx_ops_flags: found target candidate [50297] enum scx_ops_flags in [vmlinux]
libbpf: prog 'simple_init': relo #0: <enumval_exists> [407] enum scx_ops_flags::SCX_OPS_SWITCH_PARTIAL = 8
libbpf: prog 'simple_init': relo #0: matching candidate #0 <enumval_exists> [50297] enum scx_ops_flags::SCX_OPS_SWITCH_PARTIAL = 8
libbpf: prog 'simple_init': relo #0: patched insn #0 (LDIMM64) imm64 1 -> 1
libbpf: sec 'struct_ops/simple_exit': found 6 CO-RE relocations
libbpf: CO-RE relocating [413] struct scx_exit_info: found target candidate [50296] struct scx_exit_info in [vmlinux]
libbpf: prog 'simple_exit': relo #0: <byte_off> [413] struct scx_exit_info.reason (0:2 @ offset 16)
libbpf: prog 'simple_exit': relo #0: matching candidate #0 <byte_off> [50296] struct scx_exit_info.reason (0:2 @ offset 16)
libbpf: prog 'simple_exit': relo #0: patched insn #1 (LDX/ST/STX) off 16 -> 16
libbpf: prog 'simple_exit': relo #1: <byte_off> [413] struct scx_exit_info.msg (0:5 @ offset 40)
libbpf: prog 'simple_exit': relo #1: matching candidate #0 <byte_off> [50296] struct scx_exit_info.msg (0:5 @ offset 40)
libbpf: prog 'simple_exit': relo #1: patched insn #12 (LDX/ST/STX) off 40 -> 40
libbpf: prog 'simple_exit': relo #2: <byte_off> [413] struct scx_exit_info.dump (0:6 @ offset 48)
libbpf: prog 'simple_exit': relo #2: matching candidate #0 <byte_off> [50296] struct scx_exit_info.dump (0:6 @ offset 48)
libbpf: prog 'simple_exit': relo #2: patched insn #18 (LDX/ST/STX) off 48 -> 48
libbpf: prog 'simple_exit': relo #3: <field_exists> [413] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #3: matching candidate #0 <field_exists> [50296] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #3: patched insn #22 (ALU/ALU64) imm 1 -> 1
libbpf: prog 'simple_exit': relo #4: <byte_off> [413] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #4: matching candidate #0 <byte_off> [50296] struct scx_exit_info.exit_code (0:1 @ offset 8)
libbpf: prog 'simple_exit': relo #4: patched insn #24 (LDX/ST/STX) off 8 -> 8
libbpf: prog 'simple_exit': relo #5: <byte_off> [413] struct scx_exit_info.kind (0:0 @ offset 0)
libbpf: prog 'simple_exit': relo #5: matching candidate #0 <byte_off> [50296] struct scx_exit_info.kind (0:0 @ offset 0)
libbpf: prog 'simple_exit': relo #5: patched insn #26 (LDX/ST/STX) off 0 -> 0
libbpf: prog 'simple_init': relo #1: poisoning insn #3 that calls kfunc 'scx_bpf_switch_all'
libbpf: map 'stats': created successfully, fd=3
libbpf: map 'scx_simp.rodata': created successfully, fd=4
libbpf: map '.data.uei_dump': created successfully, fd=5
libbpf: map 'scx_simp.data': created successfully, fd=6
libbpf: map 'scx_simp.bss': created successfully, fd=7
libbpf: map 'simple_ops': created successfully, fd=8
libbpf: prog 'simple_exit': BPF program load failed: Invalid argument
libbpf: prog 'simple_exit': -- BEGIN PROG LOAD LOG --
Global function simple_exit() doesn't return scalar. Only those are supported.
0: R1=ctx() R10=fp0
; void BPF_STRUCT_OPS(simple_exit, struct scx_exit_info *ei) @ scx_simple.bpf.c:143
0: (79) r6 = *(u64 *)(r1 +0)
func 'exit' arg0 has btf_id 50296 type STRUCT 'scx_exit_info'
1: R1=ctx() R6_w=trusted_ptr_scx_exit_info()
; UEI_RECORD(uei, ei); @ scx_simple.bpf.c:145
1: (79) r3 = *(u64 *)(r6 +16)         ; R3_w=scalar() R6_w=trusted_ptr_scx_exit_info()
2: (18) r7 = 0xffffabe20501f000       ; R7_w=map_value(map=scx_simp.data,ks=4,vs=1168)
4: (18) r1 = 0xffffabe20501f000       ; R1_w=map_value(map=scx_simp.data,ks=4,vs=1168)
6: (07) r1 += 16                      ; R1_w=map_value(map=scx_simp.data,ks=4,vs=1168,off=16)
7: (b4) w2 = 128                      ; R2_w=128
8: (85) call bpf_probe_read_str#45
program of this type cannot use helper bpf_probe_read_str#45
processed 7 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'simple_exit': failed to load: -22
libbpf: failed to load object 'scx_simple'
libbpf: failed to load BPF skeleton 'scx_simple': -22
../scheds/c/scx_simple.c:88 [scx panic]: Invalid argument
Failed to load skel

I'm using BORE-sched-ext, and sched_ext seems to be working:

$ cat /sys/kernel/sched_ext/state
disabled

Do I need to enable a specific BPF feature, or is it an incompatibility between scx and the cachyos patches?

Kernel config

scx_rustland stalling tasks dispatched in the per-CPU DSQs

A user reported a stall condition with scx_rustland, where tasks dispatched to the per-CPU DSQs may trigger the sched_ext watchdog. The trace looks like the following:

Jun 13 05:27:08 SoulHarsh007 bash[1097]: DEBUG DUMP
Jun 13 05:27:08 SoulHarsh007 bash[1097]: ================================================================================
Jun 13 05:27:08 SoulHarsh007 bash[1097]: kworker/u64:0[186743] triggered exit kind 1026:
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   runnable task stall (panel-91-system[2686] failed to run for 5.761s)
Jun 13 05:27:08 SoulHarsh007 bash[1097]: Backtrace:
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   scx_watchdog_workfn+0x154/0x1e0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   process_one_work+0x19d/0x360
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   worker_thread+0x2fa/0x490
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   kthread+0xd2/0x100
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   ret_from_fork+0x34/0x50
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   ret_from_fork_asm+0x1a/0x30
Jun 13 05:27:08 SoulHarsh007 bash[1097]: CPU states
Jun 13 05:27:08 SoulHarsh007 bash[1097]: ----------
Jun 13 05:27:08 SoulHarsh007 bash[1097]: CPU 9   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=23798810 pnt_seq=90222261
Jun 13 05:27:08 SoulHarsh007 bash[1097]:           curr=swapper/9[0] class=idle_sched_class
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   R panel-91-system[2686] -5761ms
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       sticky/holding_cpu=-1/-1 dsq_id=0x9 dsq_vtime=0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       cpus=ffff
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     __x64_sys_poll+0x58/0x180
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     do_syscall_64+0x82/0x160
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 13 05:27:08 SoulHarsh007 bash[1097]: CPU 11  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=19746198 pnt_seq=63418051
Jun 13 05:27:08 SoulHarsh007 bash[1097]:           curr=swapper/11[0] class=idle_sched_class
Jun 13 05:27:08 SoulHarsh007 bash[1097]:   R scx_rustland[1097] +0ms
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=2/19746197
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       cpus=ffff
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     do_syscall_64+0x82/0x160
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 13 05:27:08 SoulHarsh007 bash[1097]: CPU 13  : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=19979467 pnt_seq=64167440
Jun 13 05:27:08 SoulHarsh007 bash[1097]:           curr=kworker/u64:0[186743] class=ext_sched_class
Jun 13 05:27:08 SoulHarsh007 bash[1097]:  *R kworker/u64:0[186743] +0ms
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:       cpus=ffff
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     scx_dump_state+0x6e9/0x8b0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     scx_ops_error_irq_workfn+0x40/0x50
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     irq_work_run_list+0x53/0x90
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     irq_work_run+0x18/0x50
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     __sysvec_irq_work+0x1c/0xb0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     sysvec_irq_work+0x6c/0x90
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     asm_sysvec_irq_work+0x1a/0x20
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     scx_watchdog_workfn+0x16d/0x1e0
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     process_one_work+0x19d/0x360
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     worker_thread+0x2fa/0x490
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     kthread+0xd2/0x100
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     ret_from_fork+0x34/0x50
Jun 13 05:27:08 SoulHarsh007 bash[1097]:     ret_from_fork_asm+0x1a/0x30
Jun 13 05:27:08 SoulHarsh007 bash[1097]: ================================================================================
Jun 13 05:27:08 SoulHarsh007 bash[1097]: 23:57:08 [INFO] Unregister RustLand scheduler
Jun 13 05:27:08 SoulHarsh007 bash[1097]: Error: EXIT: runnable task stall (panel-91-system[2686] failed to run for 5.761s)

Considering that scx_rustland doesn't manage tasks running in the SCHED_IDLE class, I'm wondering if we're missing some logic to reserve CPU bandwidth for the tasks running in the lower scheduling class.

Edit: tasks running in the SCHED_IDLE class are indeed managed by scx_rustland, so it definitely seems to be a scx_rustland issue.

scx_qmap NOHZ tick-stop error and unrelated question

System specs:
up to date arch with cachyos repos and cachyos kernel
cpu 5800X3D
gpu 3060ti nvidia prop
ram 64gb ecc

From time to time, dmesg shows this: NOHZ tick-stop error: local softirq work is pending, handler #40!!!, though I didn't notice any negative impact from it.

Unrelated, but I have a question. This scheduler's note says:

This scheduler is primarily for demonstration and testing of sched_ext features and unlikely to be useful for actual workloads.

But on the 5800X3D this scheduler gives me the best 0.1% min and 1% min FPS compared to EEVDF or CFS (which otherwise are the second best results), while the average is more or less at the same level in Proton and native games (tested with Elden Ring and Project Zomboid using the mangohud benchmark over 5 min in a controlled environment). Meanwhile LAVD causes stutters and almost half of the average framerate, which I guess is expected because it is not a single CCX CPU.

So why is scx_qmap considered "unlikely to be useful for actual workloads"? From my experience it is the best scheduler I have used with my CPU so far, and I have been daily-driving it for the past week.

scx_central causes system hang up on 6.7.1-scx1-2

Upon starting scx_central, the system hangs immediately. After the hang, keyboard input is unresponsive and the system does not switch to a crash kernel, making it difficult to gather information. Nonetheless, I have created this issue for now. Please let me know if any additional information is needed.

OS: Arch Linux (6.7.1-scx1-2)
CPU: i7-1165G7
scx: a4ff395

Compilation fail to "generate libbpf with custom command"

While trying to compile the entire project, following the steps outlined in the README.md, I get an error while generating libbpf with a custom command. The error log (in the file below) talks about various unexpected arguments.

Extra information which could be useful:

  • last week I did build it successfully, but now I don't seem to be able to build even an older version (I don't remember the exact version I built)
  • the only change in the system was the installation of rustup instead of rust (but even reinstalling rust doesn't fix it) and a package upgrade on Arch Linux.
  • trying to build with dynamic linking also fails at the step "[11/39] Generating bpftool_target with a custom command" with very similar errors.

Sorry in advance if it is a stupid problem on my part, but I cannot seem to fix it.

log.txt

Something went wrong with the 'timeout_ms' field in the 'struct sched_ext_ops' structure

I'm interested in sched_ext and tried it on Debian in QEMU, with a sched_ext kernel (sched_ext/d93dc). Some schedulers work fine, but BPF code that utilizes the timeout_ms field generates 'Failed to attach struct_ops' errors. This occurs with both the scx_qmap and scx_rustland schedulers, built from scx version 0.1.7.

$ ./scx_qmap
scx_qmap.c:88 [scx panic]: Argument list too long
Failed to attach struct_op

------

$ ./scx_rustland
Error: Failed to attach struct ops

Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL

The schedulers work fine after removing the timeout_ms field:

$ ./test/scx_qmap 
enq=0, dsp=0, delta=0, reenq=0, deq=0, core=0
enq=80, dsp=80, delta=0, reenq=0, deq=0, core=0
enq=156, dsp=156, delta=0, reenq=0, deq=0, core=0
^CEXIT: BPF scheduler unregistered

--- 

$ ./test/scx_rustland 
06:24:54 [INFO] RustLand scheduler attached
06:24:55 [INFO] vruntime=105489434
06:24:55 [INFO]   tasks=7
06:24:55 [INFO]   nr_user_dispatches=18 nr_kernel_dispatches=144
06:24:55 [INFO]   nr_cancel_dispatches=0 nr_bounce_dispatches=0
06:24:55 [INFO]   nr_waiting=1 [nr_queued=1 + nr_scheduled=0]
06:24:55 [INFO]   nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
06:24:55 [INFO] time slice = 20000 us
06:24:55 [INFO] slice boost = 200

Same results on newest sched_ext kernel

Is there any way to solve this problem?

linux-tools-6.8.0-32 is not installable

Hello,

In the last step of the 'installation' procedure, sudo apt build-dep scx produces the error: builddeps:scx : Depends: linux-tools-6.8.0-32 but it is not installable. I have run apt search linux-tools, but it seems that linux-tools-6.8.0-32 is not available. Additionally, since my current kernel version is '6.8.0-38-generic', I installed linux-tools-6.8.0-38-generic. However, the same error still occurs when running sudo apt build-dep scx. Would you please advise me on how to resolve this issue?

Thanks.

[Bug Report] `scx_lavd` Error: Failed to load BPF program

Summary

I tried to compile scx_lavd with this particular commit 04c9e7f and ended up with the following errors:

⯁ nyx git:(main) ✗ ❯❯❯ sudo ./result/bin/scx_lavd
Error: Failed to load BPF program

Caused by:
    Invalid argument (os error 22)

Expectation

scx_lavd is expected to work as usual.

Additional information

Linux Kernel

⯁ ~ ❯❯❯ uname -ra
Linux nixos-nuc-12 6.8.2-cachyos #1-NixOS SMP PREEMPT_DYNAMIC Tue Mar 26 22:23:34 UTC 2024 x86_64 GNU/Linux

With commit 5bfd90b, scx_lavd was working as expected.

Any workarounds would be appreciated. Thanks.

scx_lavd fails to load

Since kernel update 6.8.9 and scx-scheds 0.1.9.r40.g07b521b-1 scx_lavd fails to load with error Operation not supported (os error 95).

scx_lavd.log

scx_rustland_core: Trying to allocate too large size for scx_dsp_ctx from percpu allocator

I was trying to play with the scx_rustland scheduler on the sched_ext kernel, but I ran into an error.

$ sudo ./scx_rustland
Error: Failed to attach struct_ops BPF programs

Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL

dmesg also has some information about the issue:

[ 2605.866281] ------------[ cut here ]------------
[ 2605.866320] illegal size (32792) or align (8) for percpu allocation
[ 2605.866348] WARNING: CPU: 9 PID: 592 at mm/percpu.c:1754 pcpu_alloc+0x82b/0x920
[ 2605.866379] Modules linked in:
[ 2605.866396] CPU: 9 PID: 592 Comm: scx_rustland Tainted: G        W          6.9.0-rc7-g0042b59beb1b #7
[ 2605.866406] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 2605.866414] Sched_ext: rustland (prepping)
[ 2605.866419] RIP: 0010:pcpu_alloc+0x82b/0x920
[ 2605.866430] Code: c6 d3 96 5f 8e e9 19 ff ff ff f7 c3 00 20 00 00 0f 85 6d ff ff ff 90 48 c7 c7 be 65 52 8e 48 89 ee 4c 89 ea e8 76 0c dc ff 90 <0f> 0b 90 90 e9 50 ff ff ff e8 c7 f1 da 00 ff 0d 71 33 08 02 0f 85
[ 2605.866439] RSP: 0018:ffffb27d0020bc30 EFLAGS: 00000246
[ 2605.866446] RAX: 96301220be868d00 RBX: 0000000000000cc0 RCX: ffffffff8ee55000
[ 2605.866451] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: ffffffff8ee85290
[ 2605.866455] RBP: 0000000000008018 R08: 0000000000001fff R09: ffffffff8ee55290
[ 2605.866460] R10: 0000000000005ffd R11: 0000000000000004 R12: 0000000000008000
[ 2605.866464] R13: 0000000000000008 R14: 0000000000000000 R15: 000000000000801b
[ 2605.866469] FS:  00007f02b0c4b7c0(0000) GS:ffff91a2bea40000(0000) knlGS:0000000000000000
[ 2605.866475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2605.866479] CR2: 0000563f68daa080 CR3: 0000000003870000 CR4: 00000000000006f0
[ 2605.866485] Call Trace:
[ 2605.866498]  <TASK>
[ 2605.866510]  ? __warn+0xc4/0x1c0
[ 2605.866522]  ? pcpu_alloc+0x82b/0x920
[ 2605.866530]  ? report_bug+0x148/0x1e0
[ 2605.866542]  ? handle_bug+0x3e/0x70
[ 2605.866550]  ? exc_invalid_op+0x1a/0x50
[ 2605.866557]  ? asm_exc_invalid_op+0x1a/0x20
[ 2605.866569]  ? pcpu_alloc+0x82b/0x920
[ 2605.866575]  ? bpf_prog_60538423f94f9923_rustland_init+0x1ac/0x1f0
[ 2605.866586]  ? __bpf_prog_exit_sleepable+0x24/0xa0
[ 2605.866621]  bpf_scx_reg+0x3f1/0xf90
[ 2605.866648]  bpf_struct_ops_link_create+0xdd/0x140
[ 2605.866660]  __sys_bpf+0x2e9/0x550
[ 2605.866674]  __x64_sys_bpf+0x17/0x20
[ 2605.866680]  do_syscall_64+0xd2/0x1b0
[ 2605.866689]  ? irqentry_exit_to_user_mode+0x38/0x130
[ 2605.866697]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 2605.866707] RIP: 0033:0x7f02b0d6e88d
[ 2605.866717] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[ 2605.866722] RSP: 002b:00007ffc8333b418 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 2605.866730] RAX: ffffffffffffffda RBX: 0000556112c35f90 RCX: 00007f02b0d6e88d
[ 2605.866734] RDX: 0000000000000040 RSI: 00007ffc8333b500 RDI: 000000000000001c
[ 2605.866738] RBP: 00007ffc8333b430 R08: 00007ffc8333b500 R09: 00007ffc8333b500
[ 2605.866742] R10: 0000556146125bf0 R11: 0000000000000202 R12: 8000000000000000
[ 2605.866746] R13: 8000000000000000 R14: 00007ffc8333ef68 R15: 00007ffc8333ee28
[ 2605.866755]  </TASK>
[ 2605.866759] ---[ end trace 0000000000000000 ]---
[ 2605.896250] sched_ext: BPF scheduler "rustland" errored, disabling
[ 2605.896287] sched_ext: runtime error

After exploring, I found that the problem comes from the function __alloc_percpu() under scx_ops_enable(). It seems that in scx_rustland_core, the setting of ops->dispatch_max_batch leads to a too-large allocation size for the percpu allocator, which will be unhappy if we ask for a size larger than PCPU_MIN_UNIT_SIZE. As a result, if I reduce MAX_ENQUEUED_TASKS, which is related to ops->dispatch_max_batch, and recompile the scheduler, I can attach the scheduler successfully.

Would adjusting ops->dispatch_max_batch dynamically, to keep it from going over PCPU_MIN_UNIT_SIZE, be a solution?

Kernel OOPS under high memory pressure

The system was under high memory pressure when this kernel oops was triggered; I am not sure if high memory pressure was the root cause here.

Full Log: https://paste.cachyos.org/p/4af5d85

Relevant Trace:

Jun 18 20:37:40 SoulHarsh007 kernel: ------------[ cut here ]------------
Jun 18 20:37:40 SoulHarsh007 kernel: WARNING: CPU: 7 PID: 1 at kernel/sched/ext.c:3725 scx_cgroup_can_attach+0x196/0x340
Jun 18 20:37:40 SoulHarsh007 kernel: Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc overlay rfcomm snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq cmac algif_hash algif_skcipher af_alg bnep vmnet(OE) nct6683 iwlmvm mac80211 libarc4 ptp pps_core uvcvideo videobuf2_vmalloc uvc videobuf2_memops snd_usb_audio videobuf2_v4l2 snd_usbmidi_lib iwlwifi videodev snd_ump snd_rawmidi videobuf2_common snd_seq_device r8169 mc realtek mdio_devres cfg80211 libphy btusb btrtl btintel btbcm btmtk bluetooth rfkill crc16 mousedev joydev hid_generic snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi amd_atl intel_rapl_msr intel_rapl_common vfat snd_hda_intel fat snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core kvm_amd snd_hwdep razermouse(OE) usbhid snd_pcm snd_timer kvm snd gpio_amdpt soundcore wmi_bmof
Jun 18 20:37:40 SoulHarsh007 kernel:  i2c_piix4 pcspkr gpio_generic rapl mac_hid lz4 lz4_compress dm_mod loop nfnetlink zram ip_tables x_tables crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme aesni_intel gf128mul nvme_core crypto_simd cryptd ccp xhci_pci nvme_auth zenpower(OE) xhci_pci_renesas amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq winesync(OE) vmmon(OE) vmw_vmci vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) i2c_dev sg crypto_user
Jun 18 20:37:40 SoulHarsh007 kernel: CPU: 7 PID: 1 Comm: systemd Tainted: G S         OE      6.10.0-rc4-1-cachyos-rc #1 5c580bd5f712751ef3d9c6e2d14b3ccfe6c31623
Jun 18 20:37:40 SoulHarsh007 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C91/MAG B550 TOMAHAWK (MS-7C91), BIOS A.G0 03/12/2024
Jun 18 20:37:40 SoulHarsh007 kernel: Sched_ext: rustland (enabled+all), task: runnable_at=-3ms
Jun 18 20:37:40 SoulHarsh007 kernel: RIP: 0010:scx_cgroup_can_attach+0x196/0x340
Jun 18 20:37:40 SoulHarsh007 kernel: Code: ff 48 8b 04 24 48 c7 c5 80 f0 8d a8 48 85 c0 0f 85 50 ff ff ff 48 83 bb a8 03 00 00 00 48 c7 c0 80 f0 8d a8 0f 84 5a ff ff ff <0f> 0b e9 53 ff ff ff 80 3d cf 8b 27 02 00 0f 85 5d ff ff ff ba 01
Jun 18 20:37:40 SoulHarsh007 kernel: RSP: 0018:ffffad554006fa30 EFLAGS: 00010286
Jun 18 20:37:40 SoulHarsh007 kernel: RAX: ffff958640fdc000 RBX: ffff9586fb3da600 RCX: 0000000000000001
Jun 18 20:37:40 SoulHarsh007 kernel: RDX: ffffffffa88df080 RSI: ffffad554006fa30 RDI: ffffad554006fb20
Jun 18 20:37:40 SoulHarsh007 kernel: RBP: ffffffffa88df080 R08: ffff958ac5e8f400 R09: ffffad554006fb20
Jun 18 20:37:40 SoulHarsh007 kernel: R10: 0000000000000001 R11: 0000000000002000 R12: ffff958640893900
Jun 18 20:37:40 SoulHarsh007 kernel: R13: ffffad554006fb20 R14: ffffad554006fa30 R15: ffff9586fb3da600
Jun 18 20:37:40 SoulHarsh007 kernel: FS:  00007c6fca2af880(0000) GS:ffff958c41d80000(0000) knlGS:0000000000000000
Jun 18 20:37:40 SoulHarsh007 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 18 20:37:40 SoulHarsh007 kernel: CR2: 000061945cd3b000 CR3: 0000000307fbc000 CR4: 0000000000f506f0
Jun 18 20:37:40 SoulHarsh007 kernel: PKRU: 55555554
Jun 18 20:37:40 SoulHarsh007 kernel: Call Trace:
Jun 18 20:37:40 SoulHarsh007 kernel:  <TASK>
Jun 18 20:37:40 SoulHarsh007 kernel:  ? scx_cgroup_can_attach+0x196/0x340
Jun 18 20:37:40 SoulHarsh007 kernel:  ? __warn.cold+0x8e/0xf3
Jun 18 20:37:40 SoulHarsh007 kernel:  ? scx_cgroup_can_attach+0x196/0x340
Jun 18 20:37:40 SoulHarsh007 kernel:  ? report_bug+0xe7/0x200
Jun 18 20:37:40 SoulHarsh007 kernel:  ? handle_bug+0x3c/0x80
Jun 18 20:37:40 SoulHarsh007 kernel:  ? exc_invalid_op+0x19/0xc0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jun 18 20:37:40 SoulHarsh007 kernel:  ? scx_cgroup_can_attach+0x196/0x340
Jun 18 20:37:40 SoulHarsh007 kernel:  cgroup_migrate_execute+0x5b1/0x700
Jun 18 20:37:40 SoulHarsh007 kernel:  ? cgroup_migrate+0x15f/0x360
Jun 18 20:37:40 SoulHarsh007 kernel:  cgroup_attach_task+0x296/0x400
Jun 18 20:37:40 SoulHarsh007 kernel:  ? cgroup_attach_permissions+0x8b/0x230
Jun 18 20:37:40 SoulHarsh007 kernel:  __cgroup_procs_write+0x128/0x140
Jun 18 20:37:40 SoulHarsh007 kernel:  cgroup_procs_write+0x17/0x30
Jun 18 20:37:40 SoulHarsh007 kernel:  kernfs_fop_write_iter+0x141/0x1f0
Jun 18 20:37:40 SoulHarsh007 kernel:  vfs_write+0x31d/0x4a0
Jun 18 20:37:40 SoulHarsh007 kernel:  __x64_sys_write+0x72/0xf0
Jun 18 20:37:40 SoulHarsh007 kernel:  do_syscall_64+0x82/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? __x64_sys_openat+0x1f5/0x230
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? syscall_exit_to_user_mode+0x76/0x1f0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? do_syscall_64+0x8e/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? syscall_exit_to_user_mode+0x76/0x1f0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? do_syscall_64+0x8e/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? syscall_exit_to_user_mode+0x76/0x1f0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? __x64_sys_fcntl+0x98/0xd0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? syscall_exit_to_user_mode+0x76/0x1f0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? do_syscall_64+0x8e/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? do_syscall_64+0x8e/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? syscall_exit_to_user_mode+0x76/0x1f0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? do_syscall_64+0x8e/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  ? do_syscall_64+0x8e/0x160
Jun 18 20:37:40 SoulHarsh007 kernel:  ? irq_exit_rcu+0x53/0xc0
Jun 18 20:37:40 SoulHarsh007 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jun 18 20:37:40 SoulHarsh007 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 18 20:37:40 SoulHarsh007 kernel: RIP: 0033:0x7c6fc9d175a4
Jun 18 20:37:40 SoulHarsh007 kernel: Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d a5 6a 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
Jun 18 20:37:40 SoulHarsh007 kernel: RSP: 002b:00007ffdf24f8a48 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Jun 18 20:37:40 SoulHarsh007 kernel: RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007c6fc9d175a4
Jun 18 20:37:40 SoulHarsh007 kernel: RDX: 0000000000000005 RSI: 00007ffdf24f8c3a RDI: 0000000000000008
Jun 18 20:37:40 SoulHarsh007 kernel: RBP: 00007ffdf24f8c3a R08: 0000000000000005 R09: 0000000000000000
Jun 18 20:37:40 SoulHarsh007 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000005
Jun 18 20:37:40 SoulHarsh007 kernel: R13: 000061945cd36b10 R14: 00007ffdf24f8c3a R15: 0000000000000005
Jun 18 20:37:40 SoulHarsh007 kernel:  </TASK>
Jun 18 20:37:40 SoulHarsh007 kernel: ---[ end trace 0000000000000000 ]---

scx_rustland eventually times out for no explicable reason

I've tried twice running rustland with linux-cachyos 6.7.0-4, and it times out within a few minutes to an hour of running. I previously ran it in a TTY with exec, so I had no output after it died. The second time I ran it under tmux, and captured that it stopped updating status output for 3 seconds, then updated status again, immediately printed a WARN due to the 5s watchdog timing out, and was terminated.

"rusty" scheduler causing system hang at login.

I use CachyOS (Arch based) and have the scx service enabled, with the rusty scheduler specified. After logging in after the first boot, I get a black screen for several minutes that can only be skipped by switching to another tty and back. The issue is not present when using scx_simple, for example.

CachyOS Linux x86_64
Kernel: 6.9.3-4-cachyos (issue persists on 6.10.rc3-1 too)
DE: KDE Plasma 6.0.5
CPU: AMD Ryzen 9 5950X

Below is the excerpt from my most recent boot:

Jun 13 12:38:26 ra kernel: sched_ext: BPF scheduler "rusty" errored, disabling
Jun 13 12:38:26 ra kernel: sched_ext: runnable task stall (kworker/31:1[311] failed to run for 13.736s)
Jun 13 12:38:26 ra kernel:    scx_watchdog_workfn+0x154/0x1e0
Jun 13 12:38:26 ra kernel:    process_one_work+0x18e/0x350
Jun 13 12:38:26 ra kernel:    worker_thread+0x2fa/0x490
Jun 13 12:38:26 ra kernel:    kthread+0xd2/0x100
Jun 13 12:38:26 ra kernel:    ret_from_fork+0x34/0x50
Jun 13 12:38:26 ra kernel:    ret_from_fork_asm+0x1a/0x30

Below is an excerpt from a previous boot with more information:

May 09 13:14:15 ra kernel: sched_ext: BPF scheduler "rusty" errored, disabling 
May 09 13:14:15 ra kernel: sched_ext: runnable task stall (kworker/31:1[311] failed to run for 39.079s) 
May 09 13:14:15 ra kernel:   scx_watchdog_workfn+0x154/0x1e0 
May 09 13:14:15 ra kernel:   process_one_work+0x193/0x3c0 
May 09 13:14:15 ra kernel:   worker_thread+0x393/0x540 
May 09 13:14:15 ra kernel:   kthread+0xd2/0x100 
May 09 13:14:15 ra kernel:   ret_from_fork+0x34/0x50 
May 09 13:14:15 ra kernel:   ret_from_fork_asm+0x1a/0x30 
May 09 13:14:15 ra systemd[1]: scx.service: Main process exited, code=exited, status=1/FAILURE 
May 09 13:14:15 ra systemd[1]: scx.service: Failed with result 'exit-code'. 
May 09 13:14:16 ra systemd[1]: scx.service: Scheduled restart job, restart counter is at 1. 
May 09 13:14:16 ra systemd[1]: Started Start scx_scheduler. 
May 09 13:14:16 ra systemd[1246]: plasma-ksplash.service: start operation timed out. Terminating. 
May 09 13:14:26 ra kernel: rcu_tasks_wait_gp: rcu_tasks grace period number 41 (since boot) is 10096 jiffies old. 
May 09 13:14:26 ra xdg-desktop-por[1297]: Failed to create file chooser proxy: Error calling StartServiceByName for org.freedesktop.impl.portal.desktop.kde: Timeout was reached 
May 09 13:14:26 ra xdg-desktop-por[1297]: No skeleton to export 
May 09 13:14:36 ra plasma_waitforname[1284]: org.kde.knotifications: WaitForName: Service was not registered within timeout 
May 09 13:14:36 ra systemd[1246]: dbus-:[email protected]: Main process exited, code=exited, status=1/FAILURE 
May 09 13:14:36 ra systemd[1246]: dbus-:[email protected]: Failed with result 'exit-code'. 
May 09 13:14:47 ra kernel: sched_ext: BPF scheduler "rusty" errored, disabling 
May 09 13:14:47 ra kernel: sched_ext: runnable task stall (kworker/31:1H[426] failed to run for 30.658s) 
May 09 13:14:47 ra kernel:   scx_watchdog_workfn+0x154/0x1e0 
May 09 13:14:47 ra kernel:   process_one_work+0x193/0x3c0 
May 09 13:14:47 ra kernel:   worker_thread+0x393/0x540 
May 09 13:14:47 ra kernel:   kthread+0xd2/0x100 
May 09 13:14:47 ra kernel:   ret_from_fork+0x34/0x50 
May 09 13:14:47 ra kernel:   ret_from_fork_asm+0x1a/0x30 
May 09 13:14:47 ra systemd[1]: scx.service: Main process exited, code=exited, status=1/FAILURE 
May 09 13:14:47 ra systemd[1]: scx.service: Failed with result 'exit-code'. 
May 09 13:14:47 ra systemd[1]: scx.service: Scheduled restart job, restart counter is at 2. 
May 09 13:14:47 ra systemd[1]: Started Start scx_scheduler. 
May 09 13:14:51 ra xdg-desktop-por[1297]: Failed to create app chooser proxy: Error calling StartServiceByName for org.freedesktop.impl.portal.desktop.kde: Timeout was reached 
May 09 13:14:51 ra xdg-desktop-por[1297]: No skeleton to export 
May 09 13:14:56 ra kernel: rcu_tasks_wait_gp: rcu_tasks grace period number 41 (since boot) is 40208 jiffies old. 
May 09 13:14:56 ra systemd[1246]: plasma-ksplash.service: State 'stop-sigterm' timed out. Killing. 
May 09 13:14:56 ra systemd[1246]: plasma-ksplash.service: Killing process 1293 (ksplashqml) with signal SIGKILL. 
May 09 13:15:06 ra systemd[1246]: plasma-kcminit.service: start operation timed out. Terminating. 
May 09 13:15:06 ra systemd[1246]: xdg-desktop-portal.service: start operation timed out. Terminating. 
May 09 13:15:06 ra systemd[1246]: xdg-desktop-portal.service: Failed with result 'timeout'. 
May 09 13:15:06 ra systemd[1246]: Failed to start Portal service. 
May 09 13:15:17 ra systemd[1246]: Started Konsole - Terminal. 
May 09 13:15:17 ra systemd[1246]: Starting Portal service... 
May 09 13:15:18 ra systemd[1246]: plasma-ksplash.service: Main process exited, code=killed, status=15/TERM 
May 09 13:15:18 ra systemd[1246]: plasma-ksplash.service: Failed with result 'timeout'. 
May 09 13:15:18 ra systemd[1246]: Failed to start Splash screen shown during boot. 
May 09 13:15:18 ra systemd[1246]: plasma-kcminit.service: Failed with result 'timeout'. 
May 09 13:15:18 ra systemd[1246]: Failed to start KDE Config Module Initialization. 
May 09 13:15:18 ra systemd[1246]: Dependency failed for KDE Configuration Module Initialization (Phase 1). 
May 09 13:15:18 ra systemd[1246]: plasma-kcminit-phase1.service: Job plasma-kcminit-phase1.service/start failed with result 'dependency'. 
May 09 13:15:18 ra kernel: sched_ext: BPF scheduler "rusty" errored, disabling 
May 09 13:15:18 ra kernel: sched_ext: runnable task stall (Xwayland[1374] failed to run for 30.634s) 
May 09 13:15:18 ra kernel:   scx_watchdog_workfn+0x154/0x1e0 
May 09 13:15:18 ra kernel:   process_one_work+0x193/0x3c0 
May 09 13:15:18 ra kernel:   worker_thread+0x393/0x540 
May 09 13:15:18 ra kernel:   kthread+0xd2/0x100 
May 09 13:15:18 ra kernel:   ret_from_fork+0x34/0x50 
May 09 13:15:18 ra kernel:   ret_from_fork_asm+0x1a/0x30 
May 09 13:15:18 ra systemd[1]: scx.service: Main process exited, code=exited, status=1/FAILURE 
May 09 13:15:18 ra systemd[1]: scx.service: Failed with result 'exit-code'. 
May 09 13:15:18 ra kwin_wayland[1290]: kf.windowsystem: static bool KX11Extras::mapViewport() may only be used on X11 
May 09 13:15:18 ra systemd[1246]: Starting KDE Session Management Server... 
May 09 13:15:18 ra kwin_wayland_wrapper[1484]: The XKEYBOARD keymap compiler (xkbcomp) reports: 
May 09 13:15:18 ra kwin_wayland_wrapper[1484]: > Warning:        Unsupported maximum keycode 708, clipping. 
May 09 13:15:18 ra kwin_wayland_wrapper[1484]: >                 X11 cannot support keycodes above 255. 
May 09 13:15:18 ra systemd[1246]: Started Unlock kwallet from pam credentials. 
May 09 13:15:18 ra kwin_wayland_wrapper[1484]: > Warning:        Could not resolve keysym XF86KbdInputAssistPrevgrou 
May 09 13:15:18 ra kwin_wayland_wrapper[1484]: > Warning:        Could not resolve keysym XF86KbdInputAssistNextgrou 
May 09 13:15:18 ra kwin_wayland_wrapper[1484]: Errors from xkbcomp are not fatal to the X server 
May 09 13:15:18 ra systemd[1246]: Starting KDE Daemon 6... 
May 09 13:15:18 ra pam_kwallet_init[1490]: 2024/05/09 13:15:18 socat[1490] W address is opened in read-write mode but only supports read-only 
May 09 13:15:18 ra kcminit[1486]: Initializing "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_fonts.so" 
May 09 13:15:18 ra kcminit[1486]: Initializing "/usr/lib/qt6/plugins/plasma/kcms/systemsettings/kcm_style.so" 
May 09 13:15:18 ra systemd[1246]: Started KDE Daemon 6. 
May 09 13:15:18 ra kded6[1487]: org.kde.libkbolt: Failed to connect to Bolt manager DBus interface: 
May 09 13:15:18 ra kded6[1487]: org.kde.bolt.kded: Couldn't connect to Bolt DBus daemon 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.4972] agent-manager: agent[c3df4899cf53b27f,:1.38/org.kde.plasma.networkmanagement/1000]: agent registered 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.4973] policy: auto-activating connection 'VM9490937' (9401069b-a2e6-45e4-852a-c1f48f44da58) 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.4975] device (wlan0): Activation: starting connection 'VM9490937' (9401069b-a2e6-45e4-852a-c1f48f44da58) 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.4975] device (wlan0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.4977] manager: NetworkManager state is now CONNECTING 
May 09 13:15:18 ra kded6[1487]: QDBusObjectPath: invalid path "/modules/plasma-session-shortcuts" 
May 09 13:15:18 ra kded6[1487]: kf.dbusaddons: The kded module name "plasma-session-shortcuts" is invalid! 
May 09 13:15:18 ra systemd[1]: scx.service: Scheduled restart job, restart counter is at 3. 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5076] device (wlan0): set-hw-addr: reset MAC address to 7C:50:79:07:6F:DE (preserve) 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5086] device (wlan0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5087] device (wlan0): Activation: (wifi) access point 'VM9490937' has security, but secrets are required. 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5087] device (wlan0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed') 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5088] sup-iface[36a9694bd6cd1c04,0,wlan0]: wps: type pbc start... 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5090] device (wlan0): supplicant interface state: disconnected -> inactive 
May 09 13:15:18 ra NetworkManager[1054]: <info> [1715256918.5090] device (p2p-dev-wlan0): supplicant management interface state: disconnected -> inactive 
May 09 13:15:18 ra wpa_supplicant[1111]: wlan0: WPS-PBC-ACTIVE 
May 09 13:15:18 ra systemd[1246]: Started dbus-:[email protected]. 
May 09 13:15:18 ra systemd[1246]: Started KDE Session Management Server. 
May 09 13:15:18 ra systemd[1]: Started Start scx_scheduler. 
May 09 13:15:18 ra systemd[1246]: Starting KDE Plasma Workspace...

This is one of my first issue submissions, so please let me know if you need any more information or if I've done anything wrong.

[Bug Report] Not able to start scx programs with Kernel `6.8.1-cachyos-lto`

Summary

As the title suggests, scx programs no longer start. It was working 3 days ago with kernel 6.8.0-cachyos-lto, before I synced up with the upstream changes.

Additional Info

Hardware:

⯁ flake git:(master) ✗ ❯❯❯ neofetch
          ▗▄▄▄       ▗▄▄▄▄    ▄▄▄▖            kev@nixos-x1-carbon
          ▜███▙       ▜███▙  ▟███▛            -------------------
           ▜███▙       ▜███▙▟███▛             OS: NixOS 24.05.20240316.c75037b (Uakari) x86_64
            ▜███▙       ▜██████▛              Host: LENOVO 21KC0000CD
     ▟█████████████████▙ ▜████▛     ▟▙        Kernel: 6.8.1-cachyos
    ▟███████████████████▙ ▜███▙    ▟██▙       Uptime: 4 mins
           ▄▄▄▄▖           ▜███▙  ▟███▛       Packages: 1184 (nix-system), 1367 (nix-user), 6 (flatpak)
          ▟███▛             ▜██▛ ▟███▛        Shell: fish 3.7.0
         ▟███▛               ▜▛ ▟███▛         Resolution: 2880x1800
▟███████████▛                  ▟██████████▙   DE: Hyprland (Wayland)
▜██████████▛                  ▟███████████▛   WM: .Hyprland-wrapp
      ▟███▛ ▟▙               ▟███▛            Theme: Flat-Remix-GTK-Grey-Darkest [GTK2/3]
     ▟███▛ ▟██▙             ▟███▛             Icons: Papirus-Dark [GTK2/3]
    ▟███▛  ▜███▙           ▝▀▀▀▀              Terminal: alacritty
    ▜██▛    ▜███▙ ▜██████████████████▛        CPU: Intel Ultra 7 155H (22) @ 1.400GHz
     ▜▛     ▟████▙ ▜████████████████▛         GPU: Intel Arc Graphics]
           ▟██████▙       ▜███▙               Memory: 2921MiB / 31778MiB
          ▟███▛▜███▙       ▜███▙
         ▟███▛  ▜███▙       ▜███▙
         ▝▀▀▀    ▀▀▀▀▘       ▀▀▀▘

CPU: Intel(R) Core(TM) Ultra 7 155H

Platform: NixOS 24.05.20240316.c75037b (Uakari) x86_64

Kernel: 6.8.1-cachyos-lto

Logs:

 flake git:(master) ✗ ❯❯❯ sudo scx_rustland
thread 'main' panicked at src/main.rs:242:36:
Failed to build host topology: Failed to open or read file "/sys/devices/system/node/node0/cpu0/cache/index3/id"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
✖ 101 flake git:(master) ✗ ❯❯❯ sudo scx_lavd
thread 'main' panicked at src/main.rs:115:36:
Failed to build host topology: Failed to open or read file "/sys/devices/system/node/node0/cpu0/cache/index3/id"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

sched/c/scx_nest: task priority is not respected properly

Code in scx_nest obviously tries to take p->scx.weight into account, so priorities (or nice values) should work.

However, scheduling two tasks on the same rq with +5 and -5 nice values results in a roughly 50%-50% split of run time, while those values should give roughly 10% and 90% under CFS.

Some initial investigation led me to this code block in nest_enqueue():

        /*
         * Limit the amount of budget that an idling task can accumulate
         * to one slice.
         */ 
        if (vtime_before(vtime, vtime_now - slice_ns))
                vtime = vtime_now - slice_ns;

The line right below it

        scx_bpf_dispatch_vtime(p, FALLBACK_DSQ_ID, slice_ns, vtime,
                               enq_flags);

will then bring the nice(-5) task's vtime to roughly the same value as the nice(5) task's, wiping out the slower vtime accumulation it had earned and breaking how priorities work.

Removing the vtime clamp here and just using the plain scx_bpf_dispatch() without passing vtime gives the correct run time distribution of 10% and 90%.
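
For reference, the usual way to make nice values matter under vtime scheduling is to charge vtime inversely proportionally to p->scx.weight whenever a task gives up the CPU, in the style of scx_simple. This is only an illustrative fragment (the helper name and call site are made up, not the actual scx_nest code):

/*
 * Illustrative sketch: vtime advances more slowly for high-weight (low nice)
 * tasks, so they are naturally picked more often when tasks are dispatched
 * in vtime order. p->scx.weight is 100 for nice 0.
 */
static void charge_vtime(struct task_struct *p, u64 slice_used_ns)
{
        p->scx.dsq_vtime += slice_used_ns * 100 / p->scx.weight;
}

The clamp above is meant to cap the budget an idling task can bank, but as described it also fires for an always-runnable high-weight task once its vtime drifts more than one slice behind vtime_now, which is what erases the priority difference.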

scx_lavd: Failed to load BPF program

Description

After syncing with the latest repo, I tried to recompile and load the scx_lavd scheduler into the kernel, and the following error popped up.

Error: Failed to load BPF program

Caused by:
    Operation not supported (os error 95)

I also looked at dmesg for more information, but nothing was logged on the kernel side.

Expected behavior

scx_lavd should be able to load into the kernel and work as expected.

Too little runtime reported for I/O-intensive user space program

Description

I used the following code snippet to generate a mixed workload of I/O-bound and CPU-bound load, aiming to cause a CPU load imbalance that can trigger task migration.

import threading
import os
import time

def io_bound_task(file_path, size_mb):
    with open(file_path, 'wb') as f:
        f.write(os.urandom(size_mb * 1024 * 1024))

    with open(file_path, 'rb') as f:
        data = f.read()

    os.remove(file_path)

def generate_io_load(num_threads, size_mb):
    threads = []
    for i in range(num_threads):
        file_path = f'test_file_{i}.dat'
        t = threading.Thread(target=io_bound_task, args=(file_path, size_mb))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

def cpu_bound_task(duration):
    end_time = time.time() + duration
    while time.time() < end_time:
        result = 0
        for i in range(10000):
            result += i ** 2

def generate_cpu_load(num_threads, duration):
    threads = []
    for i in range(num_threads):
        t = threading.Thread(target=cpu_bound_task, args=(duration,))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

if __name__ == "__main__":
    io_threads = threading.Thread(target=generate_io_load, args=(1200, 100))
    cpu_threads = threading.Thread(target=generate_cpu_load, args=(12, 300))

    io_threads.start()
    cpu_threads.start()

    io_threads.join()
    cpu_threads.join()

After executing the code snippet, I used the following commands to monitor the scheduling behavior of the EEVDF scheduler and scx_rusty.

$ sudo perf sched record -ag sleep 30
$ sudo perf sched latency > eevdf.txt
// switch to scx_rusty
$ sudo perf sched record -ag sleep 30
$ sudo perf sched latency > scx_rusty.txt

The results are in the attached files below. What is weird is that the python program gets 58276.834 ms under the EEVDF scheduler but only 0.000 ms under scx_rusty (the code executed fine under scx_rusty), while the switch counts look reasonable for both schedulers.

The weird behavior doesn't only appear in scx_rusty but in nearly every scheduler under scx, including scx_simple, scx_rustland and so on. The python program progresses normally under each scheduler, but perf sched shows 0 ms runtime for each of them.

Is this an issue with scx, or should I add more options when using perf sched?

Expected behavior

The python program should be reported with more than 0.000 ms runtime under scx_rusty.

[build failure] fatal error: 'bits/libc-header-start.h' file not found

Hi there,

I'm trying to build SCX from today's GIT code on Debian 12, but I get this:

[2/32] Generating 'scheds/c/scx_qmap.p/scx_qmap.bpf.o'
FAILED: scheds/c/scx_qmap.p/scx_qmap.bpf.o 
/usr/bin/clang -g -O2 -Wall -Wno-compare-distinct-pointer-types -D__TARGET_ARCH_x86 -mcpu=v3 -mlittle-endian '-idirafter /usr/lib/llvm-18/lib/clang/18/include' '-idirafter /usr/local/include' '-idirafter /usr/include/x86_64-linux-gnu' '-idirafter /usr/include' -target bpf -I /root/src/cachyos/scx/b4/libbpf/src/usr/include -I /root/src/cachyos/scx/b4/libbpf/include/uapi -I /root/src/cachyos/scx/scheds/include -I /root/src/cachyos/scx/scheds/include/vmlinux -I /root/src/cachyos/scx/scheds/include/bpf-compat -c ../scheds/c/scx_qmap.bpf.c -o scheds/c/scx_qmap.p/scx_qmap.bpf.o
In file included from ../scheds/c/scx_qmap.bpf.c:26:
/usr/include/string.h:26:10: fatal error: 'bits/libc-header-start.h' file not found
   26 | #include <bits/libc-header-start.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
[3/32] Generating bpftool_target with a custom command
ninja: build stopped: subcommand failed.

However, bits/libc-header-start.h is correctly installed on my system:

root@debian:~/src/cachyos/scx/b4# find / -name "libc-header-start.h"
/usr/include/x86_64-linux-gnu/bits/libc-header-start.h

Maybe there's a problem with the scx build system?

scx_nest causes kernel crash on 6.7.1-scx1-2

I encountered a system crash when scx_nest started. (After disabling IBT, it ran as I expected)

[ 4047.176940] Missing ENDBR: bpf_cpumask_release+0x0/0x50
[ 4047.176987] ------------[ cut here ]------------
[ 4047.176988] kernel BUG at arch/x86/kernel/cet.c:102!
[ 4047.176993] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 4047.176996] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G        W  OE      6.7.1-scx1-2-sched-ext #1 0039d023e024b48c685b3094e8366c561cb76d98
[ 4047.176999] Hardware name: Dell Inc. XPS 13 9310 2-in-1/0062CR, BIOS 2.5.1 10/04/2021
[ 4047.177000] Sched_ext: nest (enabled+all)
[ 4047.177001] RIP: 0010:exc_control_protection+0x2a5/0x2b0
[ 4047.177005] Code: 44 89 e0 c6 05 f5 f6 1f 01 01 48 c7 c2 20 19 62 8f 25 ff 7f 00 00 83 f8 05 77 cb eb bd 48 c7 43 50 00 00 00 00 e9 73 fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90
[ 4047.177006] RSP: 0018:ffffa1450027cca8 EFLAGS: 00010002
[ 4047.177008] RAX: 000000000000002b RBX: ffffa1450027ccd8 RCX: 0000000000000000
[ 4047.177009] RDX: 0000000000000000 RSI: ffff94b02f721b80 RDI: ffff94b02f721b80
[ 4047.177010] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffa1450027cb48
[ 4047.177010] R10: 0000000000000003 R11: ffffffff904cb128 R12: 0000000000000003
[ 4047.177011] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 4047.177012] FS:  0000000000000000(0000) GS:ffff94b02f700000(0000) knlGS:0000000000000000
[ 4047.177013] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4047.177013] CR2: 00007e4527d9a000 CR3: 000000013f7b4001 CR4: 0000000000f70ef0
[ 4047.177014] PKRU: 55555554
[ 4047.177015] Call Trace:
[ 4047.177016]  <IRQ>
[ 4047.177017]  ? die+0x36/0x90
[ 4047.177020]  ? do_trap+0xda/0x100
[ 4047.177023]  ? exc_control_protection+0x2a5/0x2b0
[ 4047.177024]  ? do_error_trap+0x6a/0x90
[ 4047.177025]  ? exc_control_protection+0x2a5/0x2b0
[ 4047.177027]  ? exc_invalid_op+0x50/0x70
[ 4047.177029]  ? exc_control_protection+0x2a5/0x2b0
[ 4047.177030]  ? asm_exc_invalid_op+0x1a/0x20
[ 4047.177034]  ? exc_control_protection+0x2a5/0x2b0
[ 4047.177036]  ? exc_control_protection+0xcd/0x2b0
[ 4047.177037]  asm_exc_control_protection+0x26/0x30
[ 4047.177039] RIP: 0010:bpf_cpumask_release+0x0/0x50
[ 4047.177040] Code: 00 e8 34 bf 37 00 48 89 d8 5b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <66> 0f 1f 00 0f 1f 44 00 00 53 b8 ff ff ff ff 48 89 fb f0 0f c1 47
[ 4047.177041] RSP: 0018:ffffa1450027cd88 EFLAGS: 00010002
[ 4047.177042] RAX: ffffffff8e8dd180 RBX: ffff94a8c5f2c078 RCX: ffff94a92d774c48
[ 4047.177043] RDX: 0000000000000001 RSI: ffff94a945615ed0 RDI: ffff94a92d774c48
[ 4047.177043] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000008020000f
[ 4047.177044] R10: ffff94a8e55f6600 R11: 0000000000015aff R12: 0000000000000000
[ 4047.177045] R13: ffff94a8c5f2c060 R14: ffff94a945615ed0 R15: ffff94a8c5f2c078
[ 4047.177046]  ? __pfx_bpf_cpumask_release+0x10/0x10
[ 4047.177048]  bpf_obj_free_fields+0x111/0x1c0
[ 4047.177051]  bpf_selem_free+0x23/0xa0
[ 4047.177054]  bpf_selem_unlink_storage_nolock.constprop.0+0x8c/0x160
[ 4047.177056]  bpf_local_storage_destroy+0x81/0x120
[ 4047.177058]  bpf_task_storage_free+0x32/0x50
[ 4047.177060]  security_task_free+0x23/0x50
[ 4047.177063]  __put_task_struct+0x70/0x180
[ 4047.177065]  rcu_do_batch+0x1d7/0x5d0
[ 4047.177069]  ? rcu_do_batch+0x16c/0x5d0
[ 4047.177071]  rcu_core+0x1b5/0x4c0
[ 4047.177073]  __do_softirq+0xc9/0x2c8
[ 4047.177075]  __irq_exit_rcu+0xa3/0xc0
[ 4047.177078]  sysvec_apic_timer_interrupt+0x72/0x90
[ 4047.177080]  </IRQ>
[ 4047.177081]  <TASK>
[ 4047.177081]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 4047.177083] RIP: 0010:cpuidle_enter_state+0xcc/0x440
[ 4047.177085] Code: da e0 36 ff e8 d5 f3 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 e3 88 35 ff 45 84 ff 0f 85 56 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[ 4047.177086] RSP: 0018:ffffa1450018be90 EFLAGS: 00000246
[ 4047.177086] RAX: ffff94b02f7346c0 RBX: ffff94b02f73dfe8 RCX: 000000000000001f
[ 4047.177087] RDX: 0000000000000004 RSI: 000000002da97f6a RDI: 0000000000000000
[ 4047.177088] RBP: 0000000000000002 R08: 0000000000000000 R09: 000000000000004f
[ 4047.177089] R10: 0000000000000018 R11: ffff94b02f7330a4 R12: ffffffff90549780
[ 4047.177089] R13: 000003ae4e8a563b R14: 0000000000000002 R15: 0000000000000000
[ 4047.177091]  cpuidle_enter+0x2d/0x40
[ 4047.177095]  do_idle+0x1d8/0x230
[ 4047.177098]  cpu_startup_entry+0x2a/0x30
[ 4047.177099]  start_secondary+0x11e/0x140
[ 4047.177102]  secondary_startup_64_no_verify+0x18f/0x19b
[ 4047.177106]  </TASK>

SCX has problems resuming from standby

Greetings

I've been using CachyOS with a sched_ext kernel and running scx_lavd on a Steam Deck (though this might apply to other systems), and I'm hitting a problem that doesn't happen when I disable sched_ext (sudo systemctl stop scx).

Not really sure what else to mention, but a thread seems to get stuck, which makes the device extremely unresponsive even to inputs, and audio de-syncs or gets garbled.

The steps to reproduce this problem are:

  • boot into gamemode (default behavior)
  • go into a game
  • after I'm in game, press standby
  • after a few minutes, resume play
  • quit the game and try to open another game

It usually reproduces this quickly, but sometimes it needs a second try (standby -> resume -> change game). If you have onscreen performance numbers enabled you'll see a core, or multiple cores, stuck at 100%.

I've included pictures and a log
IMG20240607185822
IMG20240607193301
lavd3.log

Fix failing test (bpf_builder::tests::test_vmlinux_h_ver_sha1) in fedora mock build

Error:

running 2 tests
test bpf_builder::tests::test_vmlinux_h_ver_sha1 ... FAILED
test bpf_builder::tests::test_bpf_builder_new ... ok

failures:

---- bpf_builder::tests::test_vmlinux_h_ver_sha1 stdout ----
thread 'bpf_builder::tests::test_vmlinux_h_ver_sha1' panicked at src/bpf_builder.rs:376:18:
called `Option::unwrap()` on a `None` value

How to repro:

rust2rpm -s scx_utils
mock -r fedora-rawhide-aarch64 --sources . --spec rust-scx_utils.spec --postinstall

potential deadlock with scx_flatcg

scx_flatcg calls bpf_cgroup_from_id() from .dispatch(), which runs with rq->__lock held. bpf_cgroup_from_id() is basically a kfunc wrapper around cgroup_get_from_id(), which calls kernfs_find_and_get_node_by_id(), and that can acquire kernfs_idr_lock (still with rq->__lock held).

In kernfs we can acquire kernfs_idr_lock, be interrupted by a scheduler tick and acquire rq->__lock, leading to the following deadlock scenario:

        CPU0                    CPU1
        ----                    ----
   lock(kernfs_idr_lock);
                                lock(rq->__lock);
                                lock(kernfs_idr_lock);
   <Interrupt>
    lock(rq->__lock);

I don't know exactly how to prevent this without doing a big redesign of scx_flatcg or changing kernfs to use irqsave/irqrestore spinlocks (that seems a pretty strong change... but probably the safest if we want to make bpf_cgroup_from_id() available for general usage).
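
For illustration only, the kernfs-side change mentioned above would roughly mean taking the idr lock with interrupts disabled, so a scheduler tick can never arrive while it is held (a hedged sketch, not the actual kernfs code):

        unsigned long flags;

        /* irq-safe locking: the tick (and thus rq->__lock) can no longer be
         * taken on this CPU while kernfs_idr_lock is held, breaking the
         * inversion shown above */
        spin_lock_irqsave(&kernfs_idr_lock, flags);
        /* ... idr lookup ... */
        spin_unlock_irqrestore(&kernfs_idr_lock, flags);

The alternative of not calling bpf_cgroup_from_id() from .dispatch() at all would instead put the burden on scx_flatcg.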

Suggestions?

Crash in scheduler_tick+0xe9/0x340 during CI run (while unloading scheduler)

I hit a crash in the CI job for the infeasible weights PR: https://github.com/sched-ext/scx/actions/runs/7793978070/job/21254555511?pr=129


[   37.221386] int3: 0000 [#1] PREEMPT SMP NOPTI
[   37.221714] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.7.0-virtme #1
[   37.221830] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   37.221909] Sched_ext: nest (disabling)
[   37.221909] RIP: 0010:scheduler_tick+0xe9/0x340
[   37.221909] Code: 89 ef 4c 89 fe e8 c7 fe ff ff 48 89 ef e8 1f 61 ff ff 0f 1f 44 00 00 eb 6c e8 d3 13 19 00 41 f6 47 2c 20 0f 85 5d 01 00 00 0f <1f> 44 00 00 48 89 d8 31 d2 4a 03 04 e5 e0 39 b1 a6 48 8b 88 f0 09
[   37.221909] RSP: 0018:ffffac32800e8eb0 EFLAGS: 00000046
[   37.221909] RAX: 0000000000000000 RBX: 000000000002d900 RCX: 00000000fffbfcc8
[   37.221909] RDX: ffffa086c12c9f80 RSI: 0000000000000000 RDI: ffffa086c12c9f80
[   37.221909] RBP: ffffa086fecad900 R08: 00000008aa8a03e2 R09: 00000008a998a67f
[   37.221909] R10: 0000000000000b01 R11: ffffac32800e8ff8 R12: 0000000000000001
[   37.221909] R13: 0000000000000001 R14: ffffac32800a3e38 R15: ffffa086c12c9f80
[   37.221909] FS:  0000000000000000(0000) GS:ffffa086fec80000(0000) knlGS:0000000000000000
[   37.221909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.221909] CR2: 00005607eb08d2f8 CR3: 0000000002a46000 CR4: 00000000000006f0
[   37.221909] Call Trace:
[   37.221909]  <IRQ>
[   37.221909]  ? die+0x37/0x90
[   37.221909]  ? exc_int3+0x10f/0x120
[   37.221909]  ? asm_exc_int3+0x39/0x40
[   37.221909]  ? scheduler_tick+0xe9/0x340
[   37.221909]  ? scheduler_tick+0xe9/0x340
[   37.221909]  update_process_times+0x83/0x90
[   37.221909]  ? __pfx_tick_nohz_highres_handler+0x10/0x10
[   37.221909]  tick_sched_handle+0x33/0x50
[   37.221909]  tick_nohz_highres_handler+0x71/0x90
[   37.221909]  __hrtimer_run_queues+0x112/0x2b0
[   37.221909]  hrtimer_interrupt+0x100/0x240
[   37.221909]  __sysvec_apic_timer_interrupt+0x52/0x140
[   37.221909]  sysvec_apic_timer_interrupt+0x73/0x90
[   37.221909]  </IRQ>
[   37.221909]  <TASK>
[   37.221909]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   37.221909] RIP: 0010:default_idle+0xf/0x20
[   37.221909] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 13 d3 3e 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[   37.221909] RSP: 0018:ffffac32800a3ee8 EFLAGS: 00000216
[   37.221909] RAX: 0001ad4000000004 RBX: ffffa086c12c9f80 RCX: 4000000000000000
[   37.221909] RDX: 0000000000000001 RSI: ffffffffa6a23ed3 RDI: 000000000000d73c
[   37.221909] RBP: 0000000000000001 R08: 000000000000d73c R09: 00000000ffffffff
[   37.221909] R10: ffffa086fffdb5c0 R11: 0000000000000064 R12: 0000000000000000
[   37.221909] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   37.221909]  default_idle_call+0x2e/0xe0
[   37.221909]  do_idle+0x1d0/0x210
[   37.221909]  cpu_startup_entry+0x2a/0x30
[   37.221909]  start_secondary+0xf7/0x100
[   37.221909]  secondary_startup_64_no_verify+0x178/0x17b
[   37.221909]  </TASK>
[   37.221909] Modules linked in:
[   37.221916] int3: 0000 [#2] PREEMPT SMP NOPTI
[   37.221909] ---[ end trace 0000000000000000 ]---
[   37.221909] RIP: 0010:scheduler_tick+0xe9/0x340
[   37.221909] Code: 89 ef 4c 89 fe e8 c7 fe ff ff 48 89 ef e8 1f 61 ff ff 0f 1f 44 00 00 eb 6c e8 d3 13 19 00 41 f6 47 2c 20 0f 85 5d 01 00 00 0f <1f> 44 00 00 48 89 d8 31 d2 4a 03 04 e5 e0 39 b1 a6 48 8b 88 f0 09
[   37.221909] RSP: 0018:ffffac32800e8eb0 EFLAGS: 00000046
[   37.221916] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D            6.7.0-virtme #1
[   37.221916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   37.221916] Sched_ext: nest (disabling)
[   37.221916] RIP: 0010:scheduler_tick+0xe9/0x340
[   37.221909] 
[   37.221916] Code: 89 ef 4c 89 fe e8 c7 fe ff ff 48 89 ef e8 1f 61 ff ff 0f 1f 44 00 00 eb 6c e8 d3 13 19 00 41 f6 47 2c 20 0f 85 5d 01 00 00 0f <1f> 44 00 00 48 89 d8 31 d2 4a 03 04 e5 e0 39 b1 a6 48 8b 88 f0 09
[   37.221909] RAX: 0000000000000000 RBX: 000000000002d900 RCX: 00000000fffbfcc8
[   37.221909] RDX: ffffa086c12c9f80 RSI: 0000000000000000 RDI: ffffa086c12c9f80
[   37.221916] RSP: 0018:ffffac3280003eb0 EFLAGS: 00000046
[   37.221909] RBP: ffffa086fecad900 R08: 00000008aa8a03e2 R09: 00000008a998a67f
[   37.221916] RAX: 0000000000000000 RBX: 000000000002d900 RCX: 00000000fffbfcc8
[   37.221909] R10: 0000000000000b01 R11: ffffac32800e8ff8 R12: 0000000000000001
[   37.221916] RDX: ffffffffa720a900 RSI: 0000000000000000 RDI: ffffffffa720a900
[   37.221909] R13: 0000000000000001 R14: ffffac32800a3e38 R15: ffffa086c12c9f80
[   37.221916] RBP: ffffa086fec2d900 R08: 00000008aa8a2200 R09: 00000008a998d253
[   37.221909] FS:  0000000000000000(0000) GS:ffffa086fec80000(0000) knlGS:0000000000000000
[   37.221916] R10: 0000000000000b01 R11: ffffac3280003ff8 R12: 0000000000000000
[   37.221909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.221916] R13: 0000000000000000 R14: ffffffffa7203dd8 R15: ffffffffa720a900
[   37.221909] CR2: 00005607eb08d2f8 CR3: 0000000002a46000 CR4: 00000000000006f0
[   37.221916] FS:  0000000000000000(0000) GS:ffffa086fec00000(0000) knlGS:0000000000000000
[   37.221916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.221916] CR2: 00007f03f72fd6c0 CR3: 000000002523e000 CR4: 00000000000006f0
[   37.221916] Call Trace:
[   37.221916]  <IRQ>
[   37.221916]  ? die+0x37/0x90
[   37.221916]  ? exc_int3+0x10f/0x120
[   37.221916]  ? asm_exc_int3+0x39/0x40
[   37.221909] Kernel panic - not syncing: Fatal exception in interrupt
[   37.221916]  ? scheduler_tick+0xe9/0x340
[   37.221916]  ? scheduler_tick+0xe9/0x340
[   37.221916]  update_process_times+0x83/0x90
[   37.221916]  ? __pfx_tick_nohz_highres_handler+0x10/0x10
[   37.221916]  tick_sched_handle+0x33/0x50
[   37.221916]  tick_nohz_highres_handler+0x71/0x90
[   37.221916]  __hrtimer_run_queues+0x112/0x2b0
[   37.221916]  hrtimer_interrupt+0x100/0x240
[   37.221916]  __sysvec_apic_timer_interrupt+0x52/0x140
[   37.221916]  sysvec_apic_timer_interrupt+0x73/0x90
[   37.221916]  </IRQ>
[   37.221916]  <TASK>
[   37.221916]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   37.221916] RIP: 0010:default_idle+0xf/0x20
[   37.221916] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 13 d3 3e 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[   37.221916] RSP: 0018:ffffffffa7203e88 EFLAGS: 00000216
[   37.221916] RAX: 0001ad4000000004 RBX: ffffffffa720a900 RCX: 4000000000000000
[   37.221916] RDX: 0000000000000001 RSI: ffffffffa6a23ed3 RDI: 000000000001af7c
[   37.221916] RBP: 0000000000000000 R08: 000000000001af7c R09: 000000000000006b
[   37.221916] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[   37.221916] R13: 0000000000000000 R14: ffffffffa720a080 R15: 0000000000013d50
[   37.221916]  default_idle_call+0x2e/0xe0
[   37.221916]  do_idle+0x1d0/0x210
[   37.221916]  cpu_startup_entry+0x2a/0x30
[   37.221916]  rest_init+0xc5/0xd0
[   37.221916]  arch_call_rest_init+0xe/0x30
[   37.221916]  start_kernel+0x406/0x660
[   37.221916]  x86_64_start_reservations+0x18/0x30
[   37.221916]  x86_64_start_kernel+0xc6/0xe0
[   37.221916]  secondary_startup_64_no_verify+0x178/0x17b
[   37.221916]  </TASK>
[   37.221916] Modules linked in:
[   37.221916] ---[ end trace 0000000000000000 ]---
[   37.221916] RIP: 0010:scheduler_tick+0xe9/0x340
[   37.221916] Code: 89 ef 4c 89 fe e8 c7 fe ff ff 48 89 ef e8 1f 61 ff ff 0f 1f 44 00 00 eb 6c e8 d3 13 19 00 41 f6 47 2c 20 0f 85 5d 01 00 00 0f <1f> 44 00 00 48 89 d8 31 d2 4a 03 04 e5 e0 39 b1 a6 48 8b 88 f0 09
[   37.221916] RSP: 0018:ffffac32800e8eb0 EFLAGS: 00000046
[   37.221916] RAX: 0000000000000000 RBX: 000000000002d900 RCX: 00000000fffbfcc8
[   37.221916] RDX: ffffa086c12c9f80 RSI: 0000000000000000 RDI: ffffa086c12c9f80
[   37.221916] RBP: ffffa086fecad900 R08: 00000008aa8a03e2 R09: 00000008a998a67f
[   37.221916] R10: 0000000000000b01 R11: ffffac32800e8ff8 R12: 0000000000000001
[   37.221916] R13: 0000000000000001 R14: ffffac32800a3e38 R15: ffffa086c12c9f80
[   37.221916] FS:  0000000000000000(0000) GS:ffffa086fec00000(0000) knlGS:0000000000000000
[   37.221916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.221916] CR2: 00007f03f72fd6c0 CR3: 000000002523e000 CR4: 00000000000006f0
[   37.221909] Shutting down cpus with NMI
[   37.221909] Kernel Offset: 0x24600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
qemu-system-x86_64: terminating on signal 15 from pid 44869 (timeout)
FAIL: scx_nest
FAILED: meson-internal__test_sched 
env MESON_SOURCE_ROOT=/home/runner/work/scx/scx MESON_BUILD_ROOT=/home/runner/work/scx/scx/build MESON_SUBDIR= 'MESONINTROSPECT=/home/runner/.local/bin/meson introspect' /home/runner/work/scx/scx/meson-scripts/test_sched /home/runner/work/scx/scx/linux
ninja: build stopped: subcommand failed.
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/runner/work/scx/scx/build test_sched
Error: Process completed with exit code 1.

Is this something we've seen before? @arighi, is it possible to get the vmlinux for these VMs? It would be useful to have them as artifacts so we can more effectively debug crashes like this. Hmmmm, I suppose it should be as simple as grabbing the vmlinux and exporting it as a CI workflow artifact: https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts. I can look into that later this week.

C sample schedulers failing with "`scx_bpf_switch_all` not found"

The following is happening when trying to run any of the C sample schedulers:

$ sudo ./scx_simple
libbpf: extern (func ksym) 'scx_bpf_switch_all': not found in kernel or module BTFs
libbpf: failed to load object 'scx_simple'
libbpf: failed to load BPF skeleton 'scx_simple': -22
../scheds/c/scx_simple.c:81 [scx panic]: Invalid argument
Failed to load skel

This occurs when using scx b7c06b9 and sched-ext/sched_ext@13f2f03.

After some more testing it looks like the in-kernel examples under tools/sched_ext are working, so it doesn't seem to be a kernel config issue.

scx-scheds PKGBUILD

Hi,

Could you please share the PKGBUILD for scx-scheds?
Since you already have an Arch Linux repository, I would like to provide this in our CachyOS repository as well.

Thanks in advance

Unable to start scx_rusty after topology commit

I am unable to start scx_rusty after 8aba090 because it thinks I have 16 cores, 32 threads, when I actually have 8 cores, 16 threads:

> sudo scx_rusty
09:21:27 [INFO] CPUs: online/possible = 16/32
thread 'main' panicked at /usr/src/debug/scx-scheds-git/scx/rust/scx_utils/src/topology.rs:218:21:
Failed to open or read core_id file "/sys/devices/system/cpu/cpu16/topology/core_id"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

[scx_rustland] Runtime error (invalid DSQ ID), likely related to `nosmt` kernel cmdline parameter being set

Hi,
scx_rustland 5de8ff5 crashes when some of the SMT cores are disabled.
Tested with and without the nosmt kernel cmdline parameter on the same kernel, sched-ext/sched_ext@5a00279.
No problem with other scx_* schedulers [except scx_pair].

dmesg:

[18868.377213] sched_ext: BPF scheduler "rustland" errored, disabling
[18868.377219] sched_ext: runtime error (invalid DSQ ID 0x000000000000000a)
[18868.377221]    scx_bpf_consume+0xbd/0x110
[18868.377227]    bpf_prog_615c6e385c3ea84d_rustland_dispatch+0x184/0x1df

log:

sudo ./scx_rustland -d
21:46:34 [INFO] RustLand scheduler attached - 8 online CPUs
21:46:34 [INFO] vruntime=0
21:46:34 [INFO]   tasks=0
21:46:34 [INFO]   nr_user_dispatches=0 nr_kernel_dispatches=11
21:46:34 [INFO]   nr_cancel_dispatches=0 nr_bounce_dispatches=0
21:46:34 [INFO]   nr_waiting=1437 [nr_queued=1437 + nr_scheduled=0]
21:46:34 [INFO]   nr_failed_dispatches=0 nr_sched_congested=0 nr_page_faults=0 [OK]
21:46:34 [INFO] slice boost = 100
21:46:34 [INFO] Running tasks:
21:46:34 [INFO]   core  0 cpu  0 pid=0
21:46:34 [INFO]   core  1 cpu  2 pid=0
21:46:34 [INFO]   core  2 cpu  4 pid=0
21:46:34 [INFO]   core  3 cpu  6 pid=0
21:46:34 [INFO]   core  4 cpu  8 pid=[self]
21:46:34 [INFO]   core  5 cpu 10 pid=0
21:46:34 [INFO]   core  6 cpu 12 pid=0
21:46:34 [INFO]   core  7 cpu 14 pid=0

DEBUG DUMP
================================================================================

swapper/10[0] triggered exit kind 1024:
  runtime error (invalid DSQ ID 0x000000000000000a)

Backtrace:
  scx_bpf_consume+0xbd/0x110
  bpf_prog_615c6e385c3ea84d_rustland_dispatch+0x184/0x1df

CPU states
----------

CPU 6   : nr_run=2 flags=0x0 cpu_rel=0 ops_qseq=13924727 pnt_seq=23974268
          curr=scx_rustland[110404] class=ext_sched_class

 *R scx_rustland[110404] +0ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=0
      cpus=5555

  R systemd-udevd[538] +0ms
      scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=17237280334746
      cpus=ffff

    __x64_sys_epoll_wait+0x73/0x110
    do_syscall_64+0x85/0x1a0
    entry_SYSCALL_64_after_hwframe+0x6c/0x74

================================================================================

21:46:34 [INFO] Unregister RustLand scheduler
Error: EXIT: runtime error (invalid DSQ ID 0x000000000000000a)

[scx_lavd] Getting large pauses and stutters in-game

I read about scx_lavd on Phoronix and decided to give it a try and get some basic benchmarks.

I have observed that I am able to quite reliably reproduce stutters and pauses, sometimes several seconds long while running around my ship in Helldivers 2.

Here's the output from sudo scx_lavd while I was running the game with several stutters in the period (not sure of the exact timestamps):
https://gist.github.com/ChrisLane/1945f66b5d8a2f36a530ca9ac8abcfa1

Please let me know if I can do anything to debug or provide more useful info for your debugging purposes.

System:
Distro: Arch Linux
Kernel: 6.8.7-1-cachyos
GPU: AMD Radeon RX 6800 XT (RADV NAVI21)
CPU: AMD Ryzen 9 5900X 12-Core Processor
RAM: 36GB

Large amount of redundant task migration

Description

Task migrations are performed widely across scheduling operations, and migration is a rather costly operation. In EEVDF/CFS, softirqs are used to perform system-wide CPU load balancing, which may include task migration.

Task migration happens especially frequently when the system is heavily loaded or the load is unbalanced. Take the following python code snippet, which generates a mixture of CPU-bound and I/O-bound workloads that can cause a significant CPU load imbalance on the system.

import threading
import os
import time

def io_bound_task(file_path, size_mb):
    with open(file_path, 'wb') as f:
        f.write(os.urandom(size_mb * 1024 * 1024))

    with open(file_path, 'rb') as f:
        data = f.read()

    os.remove(file_path)

def generate_io_load(num_threads, size_mb):
    threads = []
    for i in range(num_threads):
        file_path = f'test_file_{i}.dat'
        t = threading.Thread(target=io_bound_task, args=(file_path, size_mb))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

def cpu_bound_task(duration):
    end_time = time.time() + duration
    while time.time() < end_time:
        result = 0
        for i in range(10000):
            result += i ** 2

def generate_cpu_load(num_threads, duration):
    threads = []
    for i in range(num_threads):
        t = threading.Thread(target=cpu_bound_task, args=(duration,))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

if __name__ == "__main__":
    io_threads = threading.Thread(target=generate_io_load, args=(1200, 100))
    cpu_threads = threading.Thread(target=generate_cpu_load, args=(12, 300))

    io_threads.start()
    cpu_threads.start()

    io_threads.join()
    cpu_threads.join()

I used the mpstat command to monitor per-CPU usage over 60 seconds and calculated the maximum load difference among all CPUs.

$ mpstat -P ALL 1 60

Comparing the results for EEVDF and scx_lavd:

EEVDF Maximum CPU load imbalance over the period: 51.00%
LAVD Maximum CPU load imbalance over the period: 50.51%

Then we used perf stat to record the number of task migrations performed.

The result for EEVDF is

$ sudo perf stat -e sched:sched_migrate_task -a sleep 60

 Performance counter stats for 'system wide':

            2,5733      sched:sched_migrate_task

      60.002513562 seconds time elapsed

The result for scx_lavd is

$ sudo perf stat -e sched:sched_migrate_task -a sleep 60

 Performance counter stats for 'system wide':

           12,8343      sched:sched_migrate_task

      60.004100279 seconds time elapsed

While showing almost the same maximum CPU load imbalance ratio, scx_lavd performed far more task migrations than EEVDF. The situation isn't limited to scx_lavd; it also occurs in scx_rusty and scx_central, as far as I tested.

Possible enhancement

In the Linux kernel, there's a helper function can_migrate_task() that determines whether a selected task should be migrated, based on its NUMA locality and cache affinity. We should have a similar mechanism in scx schedulers to decide whether a task migration should happen.

Since scx provides an excellent framework for implementing a scheduler or load balancer in user space, just like scx_rusty and scx_rustland, I think we can build this mechanism in user space as well.
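
As a non-ML starting point, such a user-space filter in the spirit of can_migrate_task() could simply weigh the observed load imbalance against the topological distance of the migration. The following is a purely illustrative sketch (all names and thresholds are invented, not code from any scx scheduler):

#include <stdbool.h>

struct cpu_stat {
        int numa_node;
        int llc_id;
        double load;    /* normalized utilization, 0.0 - 1.0 */
};

/* Allow a migration only if the imbalance is large enough to pay for the
 * locality it destroys: cheap within an LLC, costlier across LLCs, costliest
 * across NUMA nodes. */
static bool should_migrate(const struct cpu_stat *src, const struct cpu_stat *dst)
{
        double imbalance = src->load - dst->load;

        if (imbalance <= 0.0)
                return false;
        if (src->llc_id == dst->llc_id)
                return imbalance > 0.05;
        if (src->numa_node == dst->numa_node)
                return imbalance > 0.20;
        return imbalance > 0.40;
}

An ML model trained on can_migrate_task() decisions would effectively learn thresholds like these from data instead of hard-coding them.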

I would like to try implementing a machine learning based approach to this problem: train an ML model that imitates the behavior of can_migrate_task() in the Linux kernel and provides lightweight, fast inference to determine whether a task should be migrated.
The idea comes from the reference paper linked below.

If it works out, we should be able to avoid redundant and even harmful task migration within scx schedulers.

What do you guys think? Please let me know everything that should be considered; if it's feasible, I would like to start the experiments and do the implementation.

Reference Paper

Machine learning for load balancing in the Linux kernel

Many schedulers seem to be heavily penalized by heavy I/O, at least with bcachefs

I am using Arch, with linux-cachyos 6.7.0-4, with every upstream bcachefs patch from the 2024-01-01 tag up to 2bf0b0a9dff974cac259ce92d146e7142f472496 applied on top, with a bcachefs rootfs on a WD SN750 SSD, another bcachefs filesystem on a Samsung 980 Pro, and finally, my major storage on two 18TB WD Red Pro drives with 2 replicas enabled for both metadata and user data.

For heavy I/O, I am either running qBittorrent downloading some rather large (200GiB+) data sets at 1-2MB/s, plus some smaller ones (10-30GiB) at 20MB/s, to the two replicas hard drives. Or I am building a kernel on the rootfs with all 16 threads of my CPU.

Either of those tasks causes my compositor, Wayfire, to bog down heavily if I happen to be using scx_rustland, scx_rusty, or scx_nest. Disabling the sched_ext process makes the compositor performant again under the same conditions.

scx_central: Backward compatibility issue

Description

Like the bug fixed by PR #298, the scheduler scx_central suffers from the same issue on older kernel versions, where the scheduler only schedules tasks in the SCHED_EXT class. It generates the following output:

[SEQ 0]
total   :         0    local:         0   queued:         0  lost:         0
timer   :        18 dispatch:         0 mismatch:         0 retry:         0
overflow:         0
[SEQ 1]
total   :         0    local:         0   queued:         0  lost:         0
timer   :       998 dispatch:      1854 mismatch:         0 retry:         0
overflow:         0
[SEQ 2]
total   :         0    local:         0   queued:         0  lost:         0
timer   :      1975 dispatch:      3468 mismatch:         0 retry:         0
overflow:         0
[SEQ 3]
total   :         0    local:         0   queued:         0  lost:         0
timer   :      2953 dispatch:      5042 mismatch:         0 retry:         0
overflow:         0
^CEXIT: BPF scheduler unregistered

Expected behavior

scx_central should schedule every task in the system by default.
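
If the cause matches PR #298, the fix presumably boils down to explicitly opting every task into SCHED_EXT on older kernels, since only newer kernels do so by default. A hedged sketch of what that looks like in a scheduler's init callback (illustrative fragment, not the actual scx_central code):

s32 BPF_STRUCT_OPS_SLEEPABLE(central_init)
{
        /* On older sched_ext kernels, tasks stay in their current sched class
         * unless the scheduler switches them over; newer kernels already put
         * every task under the BPF scheduler by default. */
        scx_bpf_switch_all();
        return 0;
}

In the tree this kind of kernel-version difference is normally papered over by the compat helpers, so the same binary loads on both old and new kernels.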

Ambiguity on "Scheduling cycle" subsection (OVERVIEW.md)

I'm trying to understand the Scheduling cycle from Overview. I'm confused about step 3:

When a CPU is ready to schedule, it first looks at its local DSQ. If empty, it invokes .consume() which should make one or more scx_bpf_consume() calls to consume tasks from DSQ's. If a scx_bpf_consume() call succeeds, the CPU has the next task to run and .consume() can return. If .consume() is not defined, sched_ext will by-default consume from only the built-in SCX_DSQ_GLOBAL DSQ.

The operations referred to in the previous steps (select_cpu() and .dispatch()) are members of struct sched_ext_ops, but .consume() is not. What is it then?

Error encountered:

 error: field designator 'consume' does not refer to any field in type 'struct sched_ext_ops'
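
For what it's worth, the error suggests the overview text is simply stale: there no longer appears to be a .consume() member in struct sched_ext_ops, and the consume step is expressed inside .dispatch() instead. A minimal hedged sketch of step 3 in that form (SHARED_DSQ stands for whatever DSQ id the scheduler created, not a built-in):

void BPF_STRUCT_OPS(example_dispatch, s32 cpu, struct task_struct *prev)
{
        /* Move one task from a scheduler-created DSQ onto this CPU's local
         * DSQ; if nothing can be consumed and the local DSQ stays empty,
         * the CPU goes idle. */
        scx_bpf_consume(SHARED_DSQ);
}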

Is CONFIG_PAHOLE_HAS_BTF_TAG needed?

Hi there,

I'm trying to set up a self-built 6.8.8 kernel with the CachyOS patchset to experiment with sched-ext.

I have seen on Phoronix that these kernel config options are supposedly needed for sched-ext to work:

# essential configs to enable sched_ext
CONFIG_BPF=y
CONFIG_SCHED_CLASS_EXT=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_BPF_JIT_DEFAULT_ON=y
CONFIG_PAHOLE_HAS_BTF_TAG=y
CONFIG_DEBUG_INFO_BTF=y

Is that correct? I mean, is CONFIG_PAHOLE_HAS_BTF_TAG really needed?
It seems to depend on CONFIG_CC_IS_CLANG, which is an additional hurdle since I have never built the kernel with Clang (which is needed for sched-ext anyway, but still, I would like to be sure of what I am doing and why).

Thanks!

`bpfland`: scheduler failure during CPU hotplug operations

System Info:

# lscpu
Architecture:                         ppc64le
Byte Order:                           Little Endian
CPU(s):                               128
On-line CPU(s) list:                  0,1,109-127
Off-line CPU(s) list:                 2-108
Thread(s) per core:                   3
Core(s) per socket:                   3
Socket(s):                            2
NUMA node(s):                         8
Model:                                2.3 (pvr 004e 1203)
Model name:                           POWER9, altivec supported
Frequency boost:                      enabled
CPU max MHz:                          3800.0000
CPU min MHz:                          2300.0000
L1d cache:                            192 KiB
L1i cache:                            192 KiB
L2 cache:                             3 MiB
L3 cache:                             60 MiB
NUMA node0 CPU(s):                    0,1
NUMA node8 CPU(s):                    109-127

Kernel Version: 6.10.0-rc2+ + struct_ops patches on ppc64le.

SCX Version: Latest upstream


Steps to recreate the issue:

  • Run the bpfland scheduler.

  • Execute the following command to stress the CPUs:
    stress-ng --cpu=100

  • Offline CPUs sequentially from 2 to 127:
    for i in {2..127}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done


During the process of offlining CPUs, the system successfully unregisters some CPUs without issues. However, it occasionally encounters the following error:

Failed to build host topology: Failed to open or read file /sys/devices/system/cpu/cpu82/topology/core_id.

When a CPU is offlined, its associated topology information in sysfs is also unregistered and removed. However, the scheduler still tries to access this topology file even after the CPU has been offlined, which leads to this failure.
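
A sketch of the kind of tolerant read that would avoid the panic, treating a vanished sysfs file as "this CPU just went offline, skip it" rather than a fatal error (illustrative C; the real code lives in the Rust scx_utils::topology module):

#include <errno.h>
#include <stdio.h>

/* Returns 0 on success, -ENOENT if the CPU's topology directory disappeared
 * (e.g. the CPU was offlined between enumeration and the read), or another
 * negative errno on real failures. */
static int read_core_id(int cpu, int *core_id)
{
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/core_id", cpu);
        f = fopen(path, "r");
        if (!f)
                return -errno;
        if (fscanf(f, "%d", core_id) != 1) {
                fclose(f);
                return -EINVAL;
        }
        fclose(f);
        return 0;
}

The caller would then skip (or re-enumerate) CPUs for which -ENOENT is returned instead of panicking.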

Error Output

EXIT: Scheduler unregistered from the main kernel (cpu 53 going offline, exiting scheduler)
10:28:02 [INFO] Unregister scx_bpfland scheduler
10:28:03 [INFO] SMT scheduling on
10:28:04 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 58 going offline, exiting scheduler)
10:28:04 [INFO] Unregister scx_bpfland scheduler
10:28:04 [INFO] SMT scheduling on
10:28:06 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 63 going offline, exiting scheduler)
10:28:06 [INFO] Unregister scx_bpfland scheduler
10:28:06 [INFO] SMT scheduling on
10:28:07 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 68 going offline, exiting scheduler)
10:28:07 [INFO] Unregister scx_bpfland scheduler
10:28:07 [INFO] SMT scheduling on
10:28:09 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 73 going offline, exiting scheduler)
10:28:09 [INFO] Unregister scx_bpfland scheduler
10:28:09 [INFO] SMT scheduling on
10:28:11 [INFO] running=0/128 nr_kthread_dispatches=0 nr_direct_dispatches=0 nr_prio_dispatches=0 nr_shared_dispatches=0
EXIT: Scheduler unregistered from the main kernel (cpu 78 going offline, exiting scheduler)
10:28:11 [INFO] Unregister scx_bpfland scheduler
thread 'main' panicked at src/main.rs:145:36:
Failed to build host topology: Failed to open or read file "/sys/devices/system/cpu/cpu82/topology/core_id"

Stack backtrace:

   0: std::backtrace_rs::backtrace::libunwind::trace
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: std::backtrace_rs::backtrace::trace_unsynchronized
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: std::backtrace::Backtrace::create
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/backtrace.rs:331:13
   3: anyhow::error::<impl anyhow::Error>::msg
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.86/src/error.rs:83:36
   4: scx_utils::topology::read_file_usize
             at ./rust/scx_utils/src/topology.rs:331:13
   5: scx_utils::topology::create_insert_cpu
             at ./rust/scx_utils/src/topology.rs:381:19
   6: scx_utils::topology::create_numa_nodes
             at ./rust/scx_utils/src/topology.rs:502:13
   7: scx_utils::topology::Topology::new
             at ./rust/scx_utils/src/topology.rs:201:13
   8: scx_bpfland::Scheduler::init
             at ./scheds/rust/scx_bpfland/src/main.rs:145:20
   9: scx_bpfland::main
             at ./scheds/rust/scx_bpfland/src/main.rs:279:25
  10: core::ops::function::FnOnce::call_once
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:250:5
  11: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:154:18
  12: std::rt::lang_start::{{closure}}
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:166:18
  13: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:284:13
  14: std::panicking::try::do_call
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
  15: std::panicking::try
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
  16: std::panic::catch_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
  17: std::rt::lang_start_internal::{{closure}}
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:148:48
  18: std::panicking::try::do_call
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:504:40
  19: std::panicking::try
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:468:19
  20: std::panic::catch_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panic.rs:142:14
  21: std::rt::lang_start_internal
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:148:20
  22: std::rt::lang_start
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/rt.rs:165:17
  23: main
  24: generic_start_main
             at /build/glibc-GVyp00/glibc-2.31/csu/../csu/libc-start.c:308:16
  25: __libc_start_main
             at /build/glibc-GVyp00/glibc-2.31/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:98:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1652:5
   3: core::result::Result<T,E>::expect
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/result.rs:1034:23
   4: scx_bpfland::Scheduler::init
             at ./scheds/rust/scx_bpfland/src/main.rs:145:20
   5: scx_bpfland::main
             at ./scheds/rust/scx_bpfland/src/main.rs:279:25
   6: core::ops::function::FnOnce::call_once
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Issue with "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" Affecting 'scx_...' Execution

It seems there is an issue with "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" that hinders 'scx_...' from running smoothly. I've provided the complete code and error below:

root@my-VirtualBox:~# scx_rustland 
Error: Failed to attach struct ops

Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL
root@my-VirtualBox:~# 

Thanks in advance for your time & attention.

Cargo required when building the C schedulers

Cargo shouldn't be needed when building only the C schedulers, but this error occurs (meson build):

Program cargo found: NO
meson.build:15:8: ERROR: Program 'cargo' not found or not executable

meson setup command

meson setup build --prefix ~ -Dbpftool=./bpftool/src/bpftool -Dlibbpf_a=../bpftool/libbpf/src/libbpf.a -Dlibbpf_h=../bpftool/libbpf/src/root/usr/include -Denable_rust=false

scx_rusty errors/exits on suspend/resume

When a system is suspended the CPUs go offline, which triggers a call to the .cpu_offline callback (and .cpu_online on resume); scx_rusty treats this as an error / exit condition. That means scx_rusty currently does not support PM events, even though they shouldn't be considered error / exit conditions.

Maybe exiting in rusty_cpu_online/offline should be optional (via a command line option)? Or we could somehow detect that the "offline/online" event was caused by suspend/resume. Other ideas?
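
One possible shape for the command-line idea, sketched as a BPF-side fragment (names invented, not the actual scx_rusty code): gate the exit behind a flag that the userspace loader sets, so PM-driven hotplug can be ignored on request.

/* Set from userspace at load time, e.g. via a hypothetical
 * --no-exit-on-hotplug option. */
const volatile bool exit_on_hotplug = true;

void BPF_STRUCT_OPS(rusty_cpu_offline, s32 cpu)
{
        if (exit_on_hotplug)
                scx_bpf_error("cpu %d went offline", cpu);
        /* otherwise keep running and let the load balancer cope with the
         * smaller CPU set until the CPU comes back online */
}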

scx_lavd: ~20% lower fps in Starcraft 2 and Diablo 3

Hi,

I was testing scx_lavd lately (after #234 was fixed) against a linux-6.9-rcX kernel:

In Starcraft 2 and Diablo 3 I get roughly 20 percent lower fps compared to EEVDF.
Those games heavily use one or two cores.

Tested on:
EndeavourOS
Ryzen 5950X (with one CCD disabled, as the readme mentions scx_lavd is not optimized for CPUs with more than one CCD)
32GB RAM

Br
Pingubot

sched_ext: Elden Ring heavy stuttering

Elden Ring exhibits intense stuttering regardless of game area, resolution, or frame rate. Tested with lavd (bad), nest (bad), rusty (worse), and rustland (unplayable). Stopping scx.service and falling back to BORE in CachyOS solves the problem. Dark Souls 3, built on the same engine, works normally with sched_ext.

Ryzen 7 5800X3D
RX 7900 XT
16GB (2x8) 3800 MHz CL14 RAM - running fine grained OC
ROG Strix X570-i - PSS, NX, Global C States disabled (happens with PSS enabled too)

Happens on Nobara 39 with linux-cachyos kernel, CachyOS itself (linux-cachyos 6.9, linux-cachyos-deckify 6.9, linux-cachyos-rc 6.10, linux-cachyos-lto).

# scx_lavd -s $(nproc) 2> elden_ring.log attached.
elden_ring.log

meson compile should obey jobs flag

I tried to compile the schedulers on my system, but my machine was quickly rendered unusable. No matter what I do, it uses all cores, and it looks like it's hard-coded to make_jobs = nproc. I'm not sure what all is compiled with the make_jobs flag, but that was the only thing I saw with a quick glance through.

scx_userland errored

Feb 04 18:48:37 roman kernel: sched_ext: BPF scheduler "userland" errored, disabling
Feb 04 18:48:37 roman kernel: sched_ext: runnable task stall (watchdog failed to check in for 3.001s)
Feb 04 18:48:37 roman kernel:    scheduler_tick+0x34a/0x3f0
Feb 04 18:48:37 roman kernel:    update_process_times+0x77/0x80
Feb 04 18:48:37 roman kernel:    tick_nohz_highres_handler+0xbf/0x120
Feb 04 18:48:37 roman kernel:    __hrtimer_run_queues+0x104/0x2f0
Feb 04 18:48:37 roman kernel:    hrtimer_interrupt+0xf4/0x390
Feb 04 18:48:37 roman kernel:    __sysvec_apic_timer_interrupt+0x4b/0x160
Feb 04 18:48:37 roman kernel:    sysvec_apic_timer_interrupt+0x36/0x80
Feb 04 18:48:37 roman kernel:    asm_sysvec_apic_timer_interrupt+0x1a/0x20
Feb 04 18:48:37 roman kernel: warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211

I watched a YT video and had gnome-system-monitor open, searching/filtering for scx.
I had a foot terminal open executing scx_userland and watching the output,
one terminal with journalctl -f,
and one other terminal which I closed before opening VS Code running in Wayland.

Version: 1.86.0-insider
Commit: e244acbb172c428cb219717a07bf55d2737492ca
Date: 2024-01-22T05:35:29.903Z
Electron: 27.2.1
ElectronBuildId: 26149897
Chromium: 118.0.5993.159
Node.js: 18.17.1
V8: 11.8.172.18-electron.0
OS: Linux x64 6.8.0-rc2-2-cachyos-rc-lto

and I got the entry in the journal that userland errored.

[Bug Report] scx_lavd failed to launch

Summary

Related to chaotic-cx/nyx#684.

Since commit b9d57e8, the scx_lavd program fails to run.

Ref: https://github.com/chaotic-cx/nyx/blob/main/pkgs/scx/common.nix#L9

Steps to reproduce

⯁ flake git:(master) ✗ ❯❯❯ just b
[sudo] password for kev:
warning: Git tree '/home/kev/flake' is dirty
building the system configuration...
warning: Git tree '/home/kev/flake' is dirty
stopping the following units: scx.service
activating the configuration...
setting up /etc...
sops-install-secrets: Imported /etc/ssh/ssh_host_rsa_key as GPG key with fingerprint 4ab394fe0264b5028190ed02231ba03f2115cb3f
sops-install-secrets: Imported /etc/ssh/ssh_host_ed25519_key as age key with fingerprint age1px9v42s7k0urw8af4mt8qc8jrchc02k2qkj0ysu50a0pztfclslqzpr097
reloading user units for kev...
restarting sysinit-reactivation.target
reloading the following units: dbus-broker.service
restarting the following units: polkit.service
starting the following units: scx.service
the following new units were started: battery_charge_threshold.service, sysinit-reactivation.target, systemd-tmpfiles-resetup.service
warning: the following units failed: scx.service

× scx.service - Start scx_scheduler
     Loaded: loaded (/etc/systemd/system/scx.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Fri 2024-04-26 09:14:57 HKT; 9s ago
   Duration: 37ms
       Docs: https://github.com/sched-ext/scx
    Process: 39561 ExecStart=/nix/store/xm7spb165ls4d27n6fgxbgwzkvqqwdrx-scx (code=exited, status=1/FAILURE)
   Main PID: 39561 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 36ms

Apr 26 09:14:57 nixos-x1-carbon systemd[1]: scx.service: Scheduled restart job, restart counter is at 5.
Apr 26 09:14:57 nixos-x1-carbon systemd[1]: scx.service: Start request repeated too quickly.
Apr 26 09:14:57 nixos-x1-carbon systemd[1]: scx.service: Failed with result 'exit-code'.
Apr 26 09:14:57 nixos-x1-carbon systemd[1]: Failed to start Start scx_scheduler.
warning: error(s) occurred while switching to the new configuration
error: Recipe `rebuild` failed on line 25 with exit code 1
✖ 1 flake git:(master) ✗ ❯❯❯ sudo scx_lavd
Error: Failed to load BPF program

Caused by:
    Invalid argument (os error 22)

Additional Hardware Information

⯁ ~ ❯❯❯ uname -ar
Linux nixos-x1-carbon 6.8.6-cachyos #1-NixOS SMP PREEMPT_DYNAMIC Sat Apr 13 11:10:12 UTC 2024 x86_64 GNU/Linux

image

Musl support

Hello! Today I encountered scx compilation and received an error

error[E0063]: missing fields `sched_ss_init_budget`, `sched_ss_low_priority`, `sched_ss_max_repl` and 1 other field in initializer of `sched_param`
    --> src/bpf.rs:377:34
     |
377 |          let param: sched_param = sched_param { sched_priority: 0 };
     |                                   ^^^^^^^^^^^ missing `sched_ss_init_budget`, `sched_ss_low_priority`, `sched_ss_max_repl` and 1 other field

While investigating the problem, I discovered that the libc crate, when building against musl, has additional fields in sched_param.
