bore-scheduler's Introduction

BORE (Burst-Oriented Response Enhancer) CPU Scheduler

BORE (Burst-Oriented Response Enhancer) is an enhanced version of the CFS (Completely Fair Scheduler) and EEVDF (Earliest Eligible Virtual Deadline First) Linux schedulers. It is developed with the aim of maintaining these schedulers' high performance while delivering resilient responsiveness to user input under as wide a range of load scenarios as possible.

To achieve this, BORE introduces a dimension of flexibility known as "burstiness" for each individual task, partially departing from CFS's inherent "complete fairness" principle. Burstiness refers to the score derived from the accumulated CPU time a task consumes after explicitly relinquishing the CPU by entering sleep, waiting for I/O, or yielding. This score represents a broad range of temporal characteristics, spanning from nanoseconds to hundreds of seconds, varying across different tasks.

Leveraging this burstiness metric, BORE dynamically adjusts scheduling properties such as weights and delays for each task. Consequently, in systems experiencing diverse types of loads, BORE prioritizes tasks requiring high responsiveness, thereby improving overall system responsiveness and enhancing the user experience.

Demo

https://youtu.be/vbs4zj79tfo

How it works

[Figure: Burst time bitcount vs Burst score]

  • The scheduler tracks each task's burst time, which is the amount of CPU time the task has consumed since it last yielded, slept, or waited for I/O.
  • While a task is active, its burst score is continuously recalculated by taking the bit count of its normalized burst time and adjusting it using a pre-configured offset and coefficient.
  • The burst score functions similarly to "niceness" and takes a value from 0 to 39. Each decrease of 1 lets the task consume an approximately 1.25x longer timeslice.
  • This process acts as a radix conversion from binary logarithm to common logarithm, converting between two different magnitudes (nanoseconds-to-minutes timescale to about 0.01-100x scale) dimensionlessly.
  • As a result, less "greedy" tasks are given more timeslice and wakeup preemption aggressiveness, while greedier tasks that yield their timeslice less frequently are weighted less.
  • The burst score of newly-spawned processes is calculated in a unique way to prevent tasks like "make" from overwhelming interactive tasks by forking many CPU-hungry children.
  • The final effect is an equilibrium between opposing greedy and weak tasks (usually CPU-bound batch tasks) and modest and strong tasks (usually I/O-bound interactive tasks), providing a more responsive user experience under the coexistence of various types of workloads.
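The scoring steps in the list above can be condensed into a rough model. The following is a minimal Python sketch, not the kernel code. It assumes the score is simply the bit count of the burst time, minus the offset, multiplied by scale/1024 and clamped to 0-39. It uses whole bit counts only and skips whatever normalization the scheduler applies. The function names `burst_score` and `timeslice_ratio` are made up for illustration; the default values are the ones listed under Tunables.

```python
MAX_SCORE = 39  # burst score ranges over 0..39, like niceness


def burst_score(burst_time_ns, offset=22, scale=1280):
    """Approximate burst score for a burst time given in nanoseconds.

    Assumption: whole bit counts only; the real scheduler may keep
    fractional bits and normalize the burst time first.
    """
    bits = int(burst_time_ns).bit_length()  # position of the highest set bit
    penalty = max(0, bits - offset)         # short bursts are not penalized
    scaled = penalty * scale // 1024        # scale is in units of 1/1024
    return min(MAX_SCORE, scaled)


def timeslice_ratio(score_a, score_b):
    """Relative timeslice of a task with score_a versus one with score_b,
    assuming roughly 1.25x per one-point decrease in score."""
    return 1.25 ** (score_b - score_a)


if __name__ == "__main__":
    for ns in (10**6, 10**8, 10**9, 10**10):
        print(f"{ns:>12} ns -> score {burst_score(ns)}")
```

In this model a task that runs for about one second without sleeping lands at score 10, while a task that yields after a millisecond stays at 0.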

[Figure: Relationships between each variable and related functions]

Tunables

Example

$ sudo sysctl -w kernel.sched_bore=1

sched_bore (range: 1 - 1, default: 1)

1 Enables the BORE mechanism.
0 Disables the BORE mechanism. (Disabled in v5.0.0 and above)

sched_burst_cache_lifetime (range: 0 - 4294967295, default: 60000000)

How many nanoseconds to cache the average burst time of each task's child tasks, which is calculated on fork.
Increasing this value results in less frequent recalculation of the average burst time, at the cost of coarser-grained (lower time resolution) on-fork burst time adjustments.
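The trade-off described here is that of an ordinary time-based cache. Below is a minimal Python sketch of the idea, with a hypothetical `ChildBurstCache` class and explicit timestamps instead of a real clock. It is an illustration of the caching behaviour only and says nothing about how the kernel stores the value.

```python
class ChildBurstCache:
    """Caches an average of child burst times until lifetime_ns has elapsed."""

    def __init__(self, lifetime_ns=60_000_000):
        self.lifetime_ns = lifetime_ns
        self.value = None
        self.expires_at = 0
        self.recalculations = 0

    def get(self, now_ns, child_burst_times):
        # Recalculate only when the cache is empty or has expired.
        if self.value is None or now_ns >= self.expires_at:
            n = len(child_burst_times)
            self.value = sum(child_burst_times) // n if n else 0
            self.expires_at = now_ns + self.lifetime_ns
            self.recalculations += 1
        return self.value
```

A longer lifetime means `get` keeps returning the old average for longer, even if the children's burst times have changed in the meantime.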

sched_burst_fork_atavistic (range: 0 - 3, default: 2)

0: Disables the inheritance of the average child burst time from ancestor processes.
1-3: Enables the inheritance of the average child burst time from ancestor processes using a topological hub/stub style hierarchy tree, rather than the traditional parent-to-child style.
When this feature is enabled, nodes with only one child process are ignored when finding and calculating ancestor/descendant processes for inheritance. Any number equal to or greater than 1 also represents the number of hub nodes (with a child process count of 2 or more) that update_child_burst_cache will recursively dig down for each direct child when traversing the process tree to calculate the average of descendant processes' max_burst_time.
Enabling this feature may improve system responsiveness in situations with massive process-forking, such as kernel builds.
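To make the hub/stub idea more concrete, here is a loose Python sketch under stated assumptions. `Proc`, `skip_stubs`, `collect` and `average_child_burst` are hypothetical names; only `update_child_burst_cache` and `max_burst_time` come from the description above. The sketch assumes that a value of 1 averages over the direct children (after skipping single-child chains) and that each additional step digs one hub level deeper. The value 0 (inheritance disabled) and the cache are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    max_burst_time: int = 0
    children: list = field(default_factory=list)


def skip_stubs(proc):
    """Follow chains of single-child nodes until a hub or a leaf is reached."""
    while len(proc.children) == 1:
        proc = proc.children[0]
    return proc


def collect(proc, depth):
    """Return (count, total) of max_burst_time below proc, descending
    into hub nodes while depth allows it."""
    count = total = 0
    for child in proc.children:
        node = skip_stubs(child)
        if node.children and depth > 1:
            sub_count, sub_total = collect(node, depth - 1)
            count += sub_count
            total += sub_total
        else:
            count += 1
            total += node.max_burst_time
    return count, total


def average_child_burst(proc, atavistic=2):
    count, total = collect(proc, atavistic)
    return total // count if count else 0
```

With a build-like tree, where a shell spawns a single `make` that in turn forks many compilers, the single-child links are skipped, so the average is taken over the compilers rather than over the intermediate wrappers.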

sched_burst_penalty_offset (range: 0 - 64, default: 22)

How many bits to subtract from the burst time bit count when calculating the burst score.
Increasing this value prevents tasks of shorter burst time from being too strong.
Increasing this value also lengthens the effective burst time range.

sched_burst_penalty_scale (range: 0 - 4095, default: 1280)

How strongly tasks are discriminated according to their burst time ratio, scaled in 1/1024 of its precursor value.
Increasing this value makes the burst score grow rapidly as the burst time grows. That means tasks that run longer without sleeping/yielding/iowaiting rapidly lose their power against those that run shorter.
Decreasing it does the opposite.

sched_burst_smoothness_long (range: 0 - 1, default: 1)

sched_burst_smoothness_short (range: 0 - 1, default: 0)

A task's actual burst score is the larger of its latest calculated score and its "historical" score, which inherits past score(s). This is done to smooth the user experience in "burst spike" situations.
Every time the burst score is updated (when the task is dequeued or yields), its historical score is also updated by mixing burst_time / (2 ^ burst_smoothness) into prev_burst_time. However, this mixing occurs only when prev_burst_time increases. burst_smoothness=0 means no smoothing.

Special thanks

  • Hamad Al Marri, the developer famous for his task schedulers Cachy, CacULE, Baby and TT. BORE has been massively inspired by his great works. He also helped me a lot in the development.
  • Peter "ptr1337" Jung, the founder of the CachyOS high-performance Linux distribution and the admin of its development community. His continuous support, sharp analysis and dedicated tests and advice helped me troubleshoot many problems.
  • Ching-Chun "jserv" Huang from National Cheng Kung University of Taiwan, and Hui Chun "foxhoundsk" Feng from National Taiwan Ocean University, for detailed analysis and explanation of the scheduler in their excellent treatise.
  • Piotr Górski a.k.a. "sir_lucjan" from the CachyOS community, for hosting BORE-powered CachyOS kernels for Fedora, as well as helping me troubleshoot some bugs.
  • dim-geo, for assisting me with optimization hints.
  • Array, for helping me investigate a serious lockup bug and providing me with useful test reports.
  • Mario Roy, for providing me with good insights through various detailed test reports, hosting a BORE-powered, clearlinux-based optimized kernel, and also helping me integrate some early upstream patches for improvement by curating them for me.
  • And many whom I haven't added here yet.

bore-scheduler's People

Contributors

dim-geo, firelzrd, hokim98, sirlucjan

bore-scheduler's Issues

The Witcher 3 hangs

The Witcher 3 (v1.31) running via Wine hangs with BORE scheduler. Sometimes this happens sooner, sometimes later, but usually this takes no longer than 1 minute to happen.

When it hangs it also seems to prevent new processes from working correctly. For example, I tried running free and pgrep after the game hung, and they also hung indefinitely until I switched to another TTY and killed the game from there.

Disabling BORE via sysctl kernel.sched_bore=0 fixes the issue.

Here is a video of what it looks like: https://youtu.be/fQBmaU5yG2o

Let me know if I need to provide more info or try something.

My system:

CPU Intel Pentium G4620 3.7 GHz
Linux 6.4.7 with BORE 3.0.0 (compiled with Clang 15, Full LTO and O3)
Wine-Staging 8.13
DXVK 98f3887-git

bad multitasking with bore 2.4.1 and kernel 6.1

I gave bore scheduler a try with linux-tkg 6.1.38 and I found it not good at multitasking.

I have turned off the zenify values in linux-tkg. The kernel is compiled with bore 2.4.1.

I have a fast PC (Ryzen 9 5900X with 64 GB Ram) and I frequently have several virtual machines open with Windows11 and some other Linux guests. This is working fine with bore scheduler as long as the PC is not under stress. As soon as I put the PC under stress, e.g. compiling a linux-tkg kernel, the VM's are basically unusable. The response times for mouse and keyboard input become very, very long. This does not happen with CFS scheduler. With CFS scheduler I do not even notice that a compile job is running in the background.

doesn't work on 6.5.4

I'm using linux-tkg and I get this when I apply the latest patch:

-> ######################################################
->
-> Applying your own linux-6.5 patch /home/fsyy/build/linux-tkg/linux65-tkg-userpatches/0001-linux6.5.y-eevdf-bore3.1.4.mypatch
->
-> ######################################################
patching file include/linux/sched.h
Hunk #2 succeeded at 580 (offset 3 lines).
Hunk #3 succeeded at 1017 (offset 3 lines).
patching file init/Kconfig
Hunk #1 succeeded at 1306 (offset 29 lines).
patching file kernel/sched/core.c
Hunk #1 succeeded at 4485 (offset -6 lines).
Hunk #2 succeeded at 4650 (offset -6 lines).
Hunk #3 succeeded at 4973 (offset -7 lines).
Hunk #4 succeeded at 10125 (offset 16 lines).
patching file kernel/sched/debug.c
patching file kernel/sched/fair.c
Hunk #5 succeeded at 1268 (offset -7 lines).
Hunk #6 FAILED at 5077.
Hunk #7 succeeded at 6629 (offset -4 lines).
Hunk #8 succeeded at 8364 (offset 1 line).
Hunk #9 succeeded at 8378 (offset 1 line).
1 out of 9 hunks FAILED -- saving rejects to file kernel/sched/fair.c.rej

realtime/non-realtime kernel with BORE - performance comparison

Hi Masahito Suzuki,

I have more questions.
is there any significant difference in performance between realtime and non-realtime kernel with BORE enabled?
I would like to use BORE on the realtime kernel as well.

Is there any tool written by you to check the performance of BORE when this is activated? Maybe benchmark.

A good day.

Ionut Nechita
ionutnechita
Sunlight Developer Kernel - AMD/Intel Lowlatency Platform

Linux git tree

Hi, very short question:

is there a clean Linux git tree with the BORE scheduler on top of EEVDF?

Move generic tweaks to a separate patch

I've noticed that current patch has some tweaks which may conflict with popular custom kernel builds

  1. SCHED_HRTICK disabled
--- a/kernel/Kconfig.hz
+++ b/kernel/Kconfig.hz
@@ -56,4 +56,5 @@ config HZ
 	default 1000 if HZ_1000
 
 config SCHED_HRTICK
+	default n
 	def_bool HIGH_RES_TIMERS

This one definitely conflicts with xanmod's "ck-hrtimer" patches. Also it might drop performance and latency for custom kernels which use CONFIG_HZ < 1000.
Even "linux-cachyos-bore" keeps this option enabled

  2. sysctl_sched_migration_cost value modified
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -90,7 +124,7 @@ unsigned int sysctl_sched_child_runs_first __read_mostly;
 unsigned int sysctl_sched_wakeup_granularity			= 1000000UL;
 static unsigned int normalized_sysctl_sched_wakeup_granularity	= 1000000UL;
 
-const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
+const_debug unsigned int sysctl_sched_migration_cost	= 200000UL;
 
 int sched_thermal_decay_shift;
 static int __init setup_sched_thermal_decay_shift(char *str)

Similar CFS tweak is used in Liquorix/Zen kernel.

CPU migration cost.............: 0.5 -> 0.25 ms

burst/smoothness calculation

Right now burst score is calculated like:

return (new + old * ((1 << smoothness) - 1)) >> smoothness;

which is like:
(new + old * (2^smoothness - 1)) / 2^smoothness

If we rearrange terms, it is equal to:

(new - old) / 2^smoothness + old
which helps the user better understand the role of the smoothness factor (maybe document this in the readme?)

and, if new > old, we can write it like:
((new - old) >> smoothness) + old

for new < old, we can use this:

old - ((old - new) >> smoothness)

maybe this calculation is faster than multiplication...
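A quick way to check the rearrangement is to compare both forms over a range of integers. The Python sketch below does that; the function names are made up for illustration. Two details are worth noting when porting this back to C. First, `+` binds tighter than `>>` in both C and Python, so the shift needs its own parentheses. Second, for new < old the shifted form rounds the difference down before subtracting, so it can come out one higher than the original expression.

```python
def smooth_original(new, old, smoothness):
    """The expression quoted at the top of this issue."""
    return (new + old * ((1 << smoothness) - 1)) >> smoothness


def smooth_rearranged(new, old, smoothness):
    """Multiplication-free variant with explicit parentheses."""
    if new >= old:
        return old + ((new - old) >> smoothness)
    return old - ((old - new) >> smoothness)


def mismatches(limit=64, smoothness=2):
    """List the (new, old) pairs where the two variants disagree."""
    return [
        (new, old)
        for new in range(limit)
        for old in range(limit)
        if smooth_original(new, old, smoothness)
        != smooth_rearranged(new, old, smoothness)
    ]


if __name__ == "__main__":
    pairs = mismatches()
    print(len(pairs), "mismatching pairs, all with new < old:",
          all(new < old for new, old in pairs))
```

For example, with new=3, old=10 and smoothness=2 the original gives 8 while the shifted form gives 9, so the two are not drop-in replacements in the decreasing case.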

Update readme to explain Bore scheduler sysctl parameters

Hi, Could you please explain what each parameter does?

kernel.sched_burst_granularity = 12
kernel.sched_burst_penalty_scale = 1280
kernel.sched_burst_smoothness = 2

I assume that increasing smoothness makes burst score more dependent on old burst score values?
What about granularity & penalty score?

3.1.0-stable causes compilation warnings

In kernel 6.4.8, I noticed some warnings in the latest version of bore-stable that didn't appear in the previous 3.0.0-stable. I'm using Clang 16.0.6 for compilation, so these warnings may not appear with GCC.

kernel/sched/fair.c:163:9: warning: comparison of distinct pointer types ('typeof (((40UL << 8) - 1)) *' (aka 'unsigned long *') and 'typeof (scaled_penalty) *' (aka 'unsigned int *')) [-Wcompare-distinct-pointer-types]
        return min(MAX_BURST_PENALTY, scaled_penalty);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/minmax.h:67:19: note: expanded from macro 'min'
#define min(x, y)       __careful_cmp(x, y, <)
                        ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/minmax.h:36:24: note: expanded from macro '__careful_cmp'
        __builtin_choose_expr(__safe_cmp(x, y), \
                              ^~~~~~~~~~~~~~~~
./include/linux/minmax.h:26:4: note: expanded from macro '__safe_cmp'
                (__typecheck(x, y) && __no_side_effects(x, y))
                 ^~~~~~~~~~~~~~~~~
./include/linux/minmax.h:20:28: note: expanded from macro '__typecheck'
        (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                   ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~
kernel/sched/fair.c:1067:20: warning: comparison of distinct pointer types ('typeof (1) *' (aka 'int *') and 'typeof (__calc_delta_fair(delta_exec, curr, false)) *' (aka 'unsigned long long *')) [-Wcompare-distinct-pointer-types]
        curr->vruntime += max(1, calc_delta_fair(delta_exec, curr));
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/minmax.h:74:19: note: expanded from macro 'max'
#define max(x, y)       __careful_cmp(x, y, >)
                        ^~~~~~~~~~~~~~~~~~~~~~
./include/linux/minmax.h:36:24: note: expanded from macro '__careful_cmp'
        __builtin_choose_expr(__safe_cmp(x, y), \
                              ^~~~~~~~~~~~~~~~
./include/linux/minmax.h:26:4: note: expanded from macro '__safe_cmp'
                (__typecheck(x, y) && __no_side_effects(x, y))
                 ^~~~~~~~~~~~~~~~~
./include/linux/minmax.h:20:28: note: expanded from macro '__typecheck'
        (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                   ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~

Possible typo in 4.1.14 patch files

Hi Masahito,

I'm comparing 4.1.13 and 4.1.14 and noticed a typo on line 553 in 0001-linux6.6.y-cachyos-bore4.1.14.patch. See also 0001-linux6.7.y-bore4.1.14.patch and 0001-linux6.8.y-bore4.1.14.patch, line 549.

.proc_handler = proc_douintvec,

The line is missing &.

.proc_handler = &proc_douintvec,

how to know it really works?

Hi. I applied this patch to my Android kernel. I can see BORE appear in dmesg, and it also feels a little bit more responsive, but I suspect this may be a placebo effect.

Are there any tools I can use to check that BORE is really working? Which part of performance is BORE aimed at improving? (I mean, can I see something become significantly faster after BORE is applied?)

backport bore 5.2.x to 6.6.30+

Hi,

since 6.6 is still the latest lts, can you please keep 6.6 patches updated with the latest bore scheduler?

thank you!

Incomplete suspend with 6.6.34

Hi. I wonder if anyone else is seeing a weird problem like this. Using the latest 6.6.34 (it was also there on previous versions) with bore 5.1.0, system suspend does not complete: for instance, when running 'systemctl suspend' the system is suspended but the fans keep spinning. The problem doesn't happen with the same kernel built with bore disabled (i.e. a kernel where the only difference in the setup is CONFIG_BORE [not set] in the .config file). This is weird and even difficult to track down.

patch as of 5.0.3 and 5.1.0 no longer applies to 6.6.30

Starting from kernel 6.6.30, the bore patchset (both 5.0.3 and 5.1.0) no longer applies correctly, because 6.6.30 includes some upstream patches to the eevdf code on which bore relies.

Also, with bore-5.1.0 I get random freezes with audio looping, reminiscent of the freezes described at the beginning of issue #32. It doesn't happen very often, so it's harder to reproduce, but I haven't had the same problem using the previous 5.0.3 on 6.6.30 (after reverting the eevdf patches from 6.6.30 so that 5.0.3 applies correctly).

Last but not least, the legacy dir has patches for the LTS kernel series 4.4, 4.19 and 5.15, but the one for another LTS series, 5.10, seems to be missing.

Could you provide a brief version history?

Hi, I've been using your BORE scheduler for quite some time now and it's awesome!

I'm getting zero lag/stalls and good gaming frametimes on my main production/gaming rig (KDE Plasma, Ryzen 5900X) and I'd go as far as to say that BORE works even better than PDS/BMQ for me, while being seemingly much less complex at the same time. Thus, I thank you very much for your dedicated and ongoing work!

However, to me it's not really clear what exactly changes between releases, especially between 1.7.x and 2.0.y series. It would be very nice if you could explain this to me, and maybe put a little comment into future commits (even if it's just "tune X" or "fix bug Y"). Your answer is very much appreciated. Cheers and TYVM!

System deadlock

Hi,
My system deadlocks at random times with the kernel compiled with BORE (I've tested it with a non-BORE kernel). On my old Intel Core 2 Duo processor, launching processes like the xfce screensaver, Firefox, etc. triggers a deadlock at random times (the system keeps running in the background but the interface gets stuck and doesn't accept any keystrokes. The mouse cursor works!).

When I tested the same kernel on Intel core i5 processor, the issue doesn't appear.

unsigned int possible to wrap up

+ sysctl_sched_timeslice_factor / (cfs_rq->nr_running - 1 || 1),

There is a chance that nr_running equals zero even if there is a task. Decrementing it by 1 then results in the unsigned int maximum. To be safe, it is better to use h_nr_running, since this includes all normal, batch, and idle tasks.

unsigned int		h_nr_running;      /* SCHED_{NORMAL,BATCH,IDLE} */

To be safer, add an if check for the zero case.
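The wraparound concern can be illustrated outside the kernel. The Python sketch below emulates 32-bit unsigned subtraction with a mask (assuming a 32-bit unsigned int) and shows a guarded divisor; `u32_sub` and `safe_divisor` are hypothetical helper names, not functions from the patch.

```python
U32_MASK = 0xFFFFFFFF


def u32_sub(a, b):
    """Subtract the way a 32-bit unsigned integer would, wrapping on underflow."""
    return (a - b) & U32_MASK


def safe_divisor(nr_running):
    """Return nr_running - 1, but never less than 1, so the division
    stays well-defined even when nr_running is 0 or 1."""
    return nr_running - 1 if nr_running > 1 else 1


if __name__ == "__main__":
    print("0 - 1 as u32:", u32_sub(0, 1))
    for n in (0, 1, 2, 5):
        print(f"nr_running={n} -> divisor {safe_divisor(n)}")
```

Dividing by the wrapped value would shrink the result to almost nothing, which is why the explicit zero check matters more than the choice between nr_running and h_nr_running.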

Thank you

Fix required for kernel 6.6.44

A change in upstream code for kernel 6.6.44, as of this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.6.y&id=7ca529748b2df93c7c8c5d73e01e1650ebc226e5

requires bore code to align to newer arguments of reweight_task(), otherwise compilation ends up with error:

kernel/sched/fair.c:159:6: error: conflicting types for 'reweight_task'; have 'void(struct task_struct *, int)'
  159 | void reweight_task(struct task_struct *p, int prio);
      |      ^~~~~~~~~~~~~
In file included from kernel/sched/fair.c:59:
kernel/sched/sched.h:2443:13: note: previous declaration of 'reweight_task' with type 'void(struct task_struct *, const struct load_weight *)'
 2443 | extern void reweight_task(struct task_struct *p, const struct load_weight *lw);
      |             ^~~~~~~~~~~~~

Some code slips outside #ifdef CONFIG_SCHED_BORE ... #endif // CONFIG_SCHED_BORE

Hi.

I tried the patch 0001-linux6.6.y-bore4.1.13.patch on kernel 6.6.16 with SCHED_BORE enabled and it seems to work. However, I noticed that when the patch is applied but SCHED_BORE is not enabled, some of the code is outside a #ifdef CONFIG_SCHED_BORE ... #endif // CONFIG_SCHED_BORE pair. In other words, when the patch is applied but one chooses another scheduler as the default (e.g. EEVDF), there is some code that interferes with the stock kernel scheduler. The result of this "interference" is that when booting the resulting kernel (i.e. bore patch applied but stock EEVDF scheduler enabled), the stock scheduler no longer works correctly, and at desktop boot I get an audio loop on the greeting sound and a kernel freeze.

Would it be possible to include entirely the BORE code within a #ifdef CONFIG_SCHED_BORE ... #endif // CONFIG_SCHED_BORE pair, so that it won't interfere with stock kernel, when the config option SCHED_BORE is not enabled (well, except comments and documentation of course)? This is useful for getting multiple kernels from the same code by just changing the config option.

Also, would it be possible to achieve a similar thing for the LE9UO companion patchset, e.g. a CONFIG_LE9UO option that entirely encloses the modified code and enables it only when that option is set at config time?

Thanks.

Burst score vs niceness

Hello,

I wanted to analyze what bore scheduler does compared to nice values.
I use ananicy-cpp to give priority to gui processes.
I modified borestats to print pids and burst score.

#!/usr/bin/env ruby

def each_sched_debug_block
        sched_debug = File.read('/sys/kernel/debug/sched/debug')
        sched_debug.scan(/.*?\R{2}/m) { |block|
                yield(block.chomp)
        }
end

def get_task_table(**options)
# options[:per_cpu]
        (cpus = Hash.new)[0] = cpu = Hash.new
        each_sched_debug_block() { |block|
                case head = (lines = block.lines).shift
                when /^cpu#(\d+),/
                        if options[:per_cpu]
                                cpus[cpu_idx = $1.to_i] = cpu = Hash.new
                        end
                when /^runnable tasks:/
                        tasks = (cpu[:tasks] ||= Array.new)
                        lines[2..].each { |line|
                                state = line[1]
                                name  = line[3..17].strip
                                #puts "#{name}"
                                rest  = line[18..].strip.split(/\s+/)
                                #puts "#{rest[0]}"
                                tasks << [state, name, *rest]
                        }
                end
        }
        return cpus
end

def count_per_burst(task_table)
        bs_pos = 10
        bs_pos += 4 if File.exist?('/sys/kernel/debug/sched/base_slice_ns')

        table = Hash.new
        task_table[0][:tasks].each { |task|
                state       = task[ 0]
                #name        = task[ 1]
                burst_score = task[bs_pos].to_i

                table[burst_score] = table.fetch(burst_score, 0) + 1 #if 'RDI'.index(state)
        }
        return table
end

def print_name_per_burst(task_table)
        bs_pos = 10
        bs_pos += 4 if File.exist?('/sys/kernel/debug/sched/base_slice_ns')

        #table = Hash.new
        task_table[0][:tasks].each { |task|
                state       = task[ 0]
                name        = task[ 1]
                pid         = task[ 2]
                burst_score = task[bs_pos].to_i
                puts "#{pid} #{burst_score}"
                #table[burst_score] = table.fetch(burst_score, 0) + 1 #if 'RDI'.index(state)
        }
        #return table
end

def show_histogram(num_per_burst_score)
        printf("\033[2J")
        puts "Score   Tasks"
        max_count = [1, num_per_burst_score.values.max].max
        (0..39).each { |burst_score|
                count = num_per_burst_score[burst_score] || 0
                puts "#{burst_score}\t#{count}\t#{'|' * (63.to_f * count / max_count).ceil}"
        }
end

if !Process.uid.zero?
        STDERR.puts "Root privilege required."
        exit
end
if !File.exist?('/proc/sys/kernel/sched_bore')
        STDERR.puts "BORE scheduler not found."
        exit
end

task_table = get_task_table(per_cpu: false)
print_name_per_burst(task_table)

#loop {
#       task_table = get_task_table(per_cpu: false)
#       num_per_burst_score = count_per_burst(task_table)
#       show_histogram(num_per_burst_score)
#       sleep 0.1
#}

and did some analysis like this:

join <( ps -eL -o lwp= -o ni= -o comm= | grep -v " - "  | awk '{print $1,$2+20,$3}' | sort -k 1b,1  ) <( ./borestats.rb | sort -k 1b,1  ) | awk '{print $1,$3,$2-$4,$2,$4}' | sort -k 3 -n -r

This will show the biggest deviations between priority/nice and burst score. grep -v " - " removes the tasks that do not have nice values like FIFO, RR, etc...

...
5725 startplasma-x11 -9 11 20
30740 smbd[192.168.12 -9 11 20
249958 sort -9 16 25
249950 sort -9 16 25
249949 awk -9 16 25
5214 X -10 8 18
5674 sddm-helper -11 11 22
13688 master_me -12 6 18
5660 kworker/R-defer -13 0 13
5659 kworker/R-cifso -13 0 13
196458 kworker/u9:1 -13 0 13
5689 pipewire -14 5 19
5661 kworker/R-cifs- -14 0 14
5658 kworker/R-cifsf -14 0 14
5657 kworker/R-smb3d -14 0 14
12809 kworker/R-xfs_m -14 0 14
5692 wireplumber -15 9 24
52960 kworker/0:2H -15 0 15
234342 kworker/2:1H -15 0 15
12808 kworker/R-xfsal -15 0 15
5656 kworker/R-cifsi -16 0 16
5386 kworker/2:2H-kblockd -16 0 16
52988 kworker/1:2H-ttm -16 0 16
44 kworker/3:0H-kblockd -16 0 16
30148 kworker/R-vfio- -16 0 16
29 kworker/1:0H-kblockd -16 0 16
2768 kworker/3:1H-ttm -16 0 16
249957 ruby -17 16 33
212065 htop -17 16 33
92 kworker/u9:0-uvcvideo -18 0 18

I was surprised to see that pipewire, wireplumber and X were down-prioritized compared to ananicy values...
For example pipewire has -20 nice value, which means 5 priority, but its burst score is 19. So there is a -14 difference.
I don't experience any problems in my system, but it's interesting to check what bore boosts and what not.
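The shell pipeline above can also be expressed in a few lines of Python, which makes the columns of the listing easier to follow. This is a minimal sketch: `priority_vs_burst` is a hypothetical name, and the inputs are plain tuples standing in for the parsed `ps` and borestats output rather than the real command output.

```python
def priority_vs_burst(ps_rows, bore_rows):
    """Join (pid, nice, comm) rows with (pid, burst_score) rows.

    Returns (pid, comm, priority - score, priority, score) tuples,
    largest difference first, where priority = nice + 20.
    """
    scores = {pid: score for pid, score in bore_rows}
    joined = []
    for pid, nice, comm in ps_rows:
        if nice == "-" or pid not in scores:
            continue  # no nice value (e.g. FIFO/RR) or no burst score
        priority = int(nice) + 20
        score = scores[pid]
        joined.append((pid, comm, priority - score, priority, score))
    return sorted(joined, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Illustrative inputs only, not captured from a real system.
    ps_rows = [(100, "-5", "editor"), (200, "0", "compiler"), (300, "-", "rt-task")]
    bore_rows = [(100, 12), (200, 30), (300, 0)]
    for row in priority_vs_burst(ps_rows, bore_rows):
        print(*row)
```

A strongly negative third column means the burst score ranks the task lower than its nice value would.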

[root@gentoo:~] 3s # journalctl -b | grep -i bore
Μαρ 11 09:41:02 gentoo kernel: BORE (Burst-Oriented Response Enhancer) CPU Scheduler modification 4.5.2 by Masahito Suzuki
[root@gentoo:~] # uname -a
Linux gentoo 6.6.21-x86_64 #1 SMP PREEMPT_DYNAMIC Fri Mar  8 19:48:30 EET 2024 x86_64 Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz GenuineIntel GNU/Linux

Here are my modified bore settings.

[root@gentoo:~] # sysctl -a | grep -i sched_b
kernel.sched_bore = 1
kernel.sched_burst_cache_lifetime = 60000000
kernel.sched_burst_fork_atavistic = 0
kernel.sched_burst_penalty_offset = 0
kernel.sched_burst_penalty_scale = 1280
kernel.sched_burst_score_rounding = 0
kernel.sched_burst_smoothness_long = 1
kernel.sched_burst_smoothness_short = 0
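To see at a glance which of these values differ from the documented defaults, the output can be compared against the defaults listed in the Tunables section. The Python sketch below assumes those README defaults apply to the running BORE version, which may not hold for every release. `parse_sysctl` and `changed_from_defaults` are hypothetical helpers, and keys without a documented default (such as sched_burst_score_rounding) are skipped.

```python
README_DEFAULTS = {
    "kernel.sched_bore": 1,
    "kernel.sched_burst_cache_lifetime": 60000000,
    "kernel.sched_burst_fork_atavistic": 2,
    "kernel.sched_burst_penalty_offset": 22,
    "kernel.sched_burst_penalty_scale": 1280,
    "kernel.sched_burst_smoothness_long": 1,
    "kernel.sched_burst_smoothness_short": 0,
}


def parse_sysctl(text):
    """Parse 'key = value' lines as printed by sysctl -a into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = int(value.strip())
    return settings


def changed_from_defaults(settings, defaults=README_DEFAULTS):
    """Map each changed key to a (default, current) pair."""
    return {
        key: (defaults[key], value)
        for key, value in settings.items()
        if key in defaults and defaults[key] != value
    }
```

Applied to the listing above, this reports sched_burst_fork_atavistic (2 -> 0) and sched_burst_penalty_offset (22 -> 0) as the modified settings.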

In the next few days I will try to disable ananicy and see if bore is fine without it.
PS: You can close this issue, it's just an observation about how bore behaves... if you have any thoughts or insights or tests to suggest, feel free!

[no issue] thank you very much for all the work done

Hi Masahito Suzuki,

I thank you very much for your work.
Everything you've done so far with BORE has been great and thank you so much for that.
I use it on several systems and have noticed that the algorithm works very well.
I like it, and I would like to work with you in the future on testing and verifying the BORE scheduler.
A future collaboration would be on the low-latency audio side with the BORE scheduler.

A good day.

Ionut Nechita
ionutnechita
Sunlight Developer Kernel - AMD/Intel Lowlatency Platform

Unexpected WARN_ON in 6.7.2-rc1 in place_entity()

While testing BORE as a possible alternative to BMQ/PDS (and overall seeing very encouraging results!) I just hit the following while compiling in the background:

Jan 23 22:30:54 hho kernel: ------------[ cut here ]------------
Jan 23 22:30:54 hho kernel: WARNING: CPU: 10 PID: 11339 at kernel/sched/fair.c:5429 place_entity+0x20d/0x270
Jan 23 22:30:54 hho kernel: Modules linked in: mq_deadline rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace sunrpc sch_fq_codel bpf_preload mousedev iwlmvm amdgpu mac80211 snd_ctl_led snd_hda_codec_realtek i2c_algo_bit drm_ttm_helper snd_hda_codec_generic snd_hda_codec_hdmi libarc4 pkcs8_key_parser ttm drm_exec drm_suballoc_helper snd_hda_intel edac_mce_amd amdxcp snd_intel_dspcfg uvcvideo drm_buddy crct10dif_pclmul snd_hda_codec videobuf2_vmalloc crc32_pclmul gpu_sched lm92 iwlwifi videobuf2_memops crc32c_intel snd_hwdep uvc ghash_clmulni_intel drm_display_helper videobuf2_v4l2 sha512_ssse3 snd_hda_core r8169 sha256_ssse3 drivetemp wmi_bmof sha1_ssse3 cec videodev thinkpad_acpi snd_pcm cfg80211 realtek ledtrig_audio drm_kms_helper snd_timer snd_rn_pci_acp3x psmouse mdio_devres snd_acp_config videobuf2_common platform_profile ipmi_devintf ucsi_acpi snd snd_soc_acpi rapl mc serio_raw k10temp i2c_piix4 snd_pci_acp3x soundcore typec_ucsi rfkill libphy ipmi_msghandler roles drm typec battery ac video wmi i2c_scmi button
Jan 23 22:30:54 hho kernel: CPU: 10 PID: 11339 Comm: cc1plus Not tainted 6.7.2-rc1 #1
Jan 23 22:30:54 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
Jan 23 22:30:54 hho kernel: RIP: 0010:place_entity+0x20d/0x270
Jan 23 22:30:54 hho kernel: Code: 49 8d 34 0a eb c2 4c 89 e6 48 89 4c 24 10 48 89 7c 24 18 e8 c5 d7 ff ff 48 89 c3 48 8b 4c 24 10 48 8b 7c 24 18 e9 44 fe ff ff <0f> 0b eb ab 4c 89 e6 44 88 44 24 0f 48 89 4c 24 10 4c 89 4c 24 18
Jan 23 22:30:54 hho kernel: RSP: 0000:ffffc9000ac17dc0 EFLAGS: 00010046
Jan 23 22:30:54 hho kernel: RAX: 00000000b3ac4821 RBX: 0000000000009637 RCX: 0000000000015ab9
Jan 23 22:30:54 hho kernel: RDX: 0000000000015ab9 RSI: 0000000000015ab9 RDI: 0000000000000000
Jan 23 22:30:54 hho kernel: RBP: ffffc9000ac17e08 R08: 0000000000000000 R09: ffff888109a9be00
Jan 23 22:30:54 hho kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8881009c6980
Jan 23 22:30:54 hho kernel: R13: ffff8887ef6abfc0 R14: 00007e7d1f0dc537 R15: 0000000000000009
Jan 23 22:30:54 hho kernel: FS:  00007fa0b2f4cf00(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
Jan 23 22:30:54 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 23 22:30:54 hho kernel: CR2: 0000000000ac1c70 CR3: 00000001c5d36000 CR4: 0000000000350ef0
Jan 23 22:30:54 hho kernel: Call Trace:
Jan 23 22:30:54 hho kernel:  <TASK>
Jan 23 22:30:54 hho kernel:  ? place_entity+0x20d/0x270
Jan 23 22:30:54 hho kernel:  ? __warn+0x7d/0x120
Jan 23 22:30:54 hho kernel:  ? place_entity+0x20d/0x270
Jan 23 22:30:54 hho kernel:  ? report_bug+0x155/0x180
Jan 23 22:30:54 hho kernel:  ? handle_bug+0x36/0x70
Jan 23 22:30:54 hho kernel:  ? exc_invalid_op+0x13/0x60
Jan 23 22:30:54 hho kernel:  ? asm_exc_invalid_op+0x16/0x20
Jan 23 22:30:54 hho kernel:  ? place_entity+0x20d/0x270
Jan 23 22:30:54 hho kernel:  enqueue_task_fair+0x16d/0x570
Jan 23 22:30:54 hho kernel:  activate_task+0x54/0x90
Jan 23 22:30:54 hho kernel:  ttwu_do_activate.isra.0+0x49/0x140
Jan 23 22:30:54 hho kernel:  try_to_wake_up+0x19a/0x400
Jan 23 22:30:54 hho kernel:  __do_softirq+0x249/0x255
Jan 23 22:30:54 hho kernel:  irq_exit_rcu+0x62/0x80
Jan 23 22:30:54 hho kernel:  sysvec_apic_timer_interrupt+0x32/0x80
Jan 23 22:30:54 hho kernel:  asm_sysvec_apic_timer_interrupt+0x16/0x20
Jan 23 22:30:54 hho kernel: RIP: 0033:0xea4b2e
Jan 23 22:30:54 hho kernel: Code: ff ca 83 fa 02 76 75 48 83 c4 08 31 c0 5b 5d 41 5c 41 5d c3 0f 1f 84 00 00 00 00 00 48 c1 e0 06 45 31 ed 80 b8 c2 89 8e 02 00 <75> 40 e8 3b c5 22 00 83 fd 26 0f 94 c2 48 89 03 44 08 e2 75 05 83
Jan 23 22:30:54 hho kernel: RSP: 002b:00007ffd2a5a3950 EFLAGS: 00000246
Jan 23 22:30:54 hho kernel: RAX: 00000000000020c0 RBX: 00007fa0b1141260 RCX: 0000000000000006
Jan 23 22:30:54 hho kernel: RDX: 000000000000000a RSI: 00007ffd2a5a39c4 RDI: 00007fa0b1140b00
Jan 23 22:30:54 hho kernel: RBP: 0000000000000083 R08: 0000000000000000 R09: 0000000000000009
Jan 23 22:30:54 hho kernel: R10: 0000000000000000 R11: 00007fa0b2c277e0 R12: 0000000000000000
Jan 23 22:30:54 hho kernel: R13: 0000000000000000 R14: 00007fa0b1140b00 R15: 0000000000000020
Jan 23 22:30:54 hho kernel:  </TASK>
Jan 23 22:30:54 hho kernel: ---[ end trace 0000000000000000 ]---

This seems to correspond to this line, which would indicate that a calculation produced an unexpected load=0. At least that's my reading of the trace. Hope this helps.

I cannot be certain this was caused by BORE (it could also be a rare generic condition in EEVDF), but it never happened before I added the BORE patch.
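
To make the load=0 reading concrete, here is a rough userspace model of the kind of guard the warning appears to come from. It assumes the placement step rescales a waking task's lag by the queue's total load, and that it warns and falls back to a load of 1 when that total is zero. The function and parameter names (`scale_lag`, `queue_load`, `task_weight`) are illustrative and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative model only, not kernel code.
 * Rescale a task's lag when it joins a queue:
 *   lag' = lag * (queue_load + task_weight) / queue_load
 * A queue_load of 0 would divide by zero, so the guard warns and
 * substitutes 1. That is the "load=0" condition discussed above.
 */
static int64_t scale_lag(int64_t lag, uint64_t queue_load,
                         uint64_t task_weight, int *warned)
{
    assert(warned != NULL);
    *warned = 0;

    lag *= (int64_t)(queue_load + task_weight);
    if (queue_load == 0) {
        fprintf(stderr, "warning: unexpected queue_load == 0\n");
        *warned = 1;
        queue_load = 1;
    }
    return lag / (int64_t)queue_load;
}
```

In this model, a zero load does not crash anything, but the resulting lag is no longer scaled meaningfully, which is why a warning in that spot is worth reporting.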

How does the BORE scheduler get along with the linux-zen patches?

I am currently looking into the linux-tkg kernel. It provides a config option for the BORE scheduler as well as one for the linux-zen kernel patches that modify CFS settings.

linux-zen aims to improve the latency and responsiveness of the CFS kernel, and so does the BORE scheduler. Should both be applied? What is the recommendation?

Since upgrading BORE v2.2.3 -> v2.2.6: I am having soft lockups/rcu stalls of several processes (bash, emerge, python, …)

Mai 16 14:26:51 Gentoo kernel: rcu: INFO: rcu_sched self-detected stall on CPU
Mai 16 14:26:51 Gentoo kernel: rcu:         10-....: (21000 ticks this GP) idle=8ba4/1/0x4000000000000000 softirq=2355521/2355521 fqs=4556
Mai 16 14:26:51 Gentoo kernel: rcu:         (t=21001 jiffies g=17817745 q=6862 ncpus=24)
Mai 16 14:26:51 Gentoo kernel: CPU: 10 PID: 53603 Comm: xargs Not tainted 6.3.2-gentoo-BOREv2.2.6 #1
Mai 16 14:26:51 Gentoo kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F34 07/08/2021
Mai 16 14:26:51 Gentoo kernel: RIP: 0010:sched_fork+0x15e/0x2a0
Mai 16 14:26:51 Gentoo kernel: Code: 00 48 8d bd 38 05 00 00 48 8d 88 b8 fa ff ff 48 39 c7 0f 84 30 01 00 00 31 c0 31 f6 ba 08 00 00 00 c4 e2 eb f7 91 f0 >
Mai 16 14:26:51 Gentoo kernel: RSP: 0018:ffff9dbf9ceefc30 EFLAGS: 00000206
Mai 16 14:26:51 Gentoo kernel: RAX: 00015ea74dbf1618 RBX: ffff8e75794b8000 RCX: ffff8e71b78cbe00
Mai 16 14:26:51 Gentoo kernel: RDX: 0000000000004c19 RSI: 000000009ba209d8 RDI: ffff8e7259f00538
Mai 16 14:26:51 Gentoo kernel: RBP: ffff8e7259f00000 R08: ffff9dbf9ceefbf0 R09: ffff8e75794b82c0
Mai 16 14:26:51 Gentoo kernel: R10: 0000000001200000 R11: 0000000000000000 R12: 0000000000000000
Mai 16 14:26:51 Gentoo kernel: R13: 0000000000000000 R14: ffff9dbf9ceefe10 R15: ffff8e75794b8000
Mai 16 14:26:51 Gentoo kernel: FS:  00007f729989c740(0000) GS:ffff8e785ec80000(0000) knlGS:0000000000000000
Mai 16 14:26:51 Gentoo kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mai 16 14:26:51 Gentoo kernel: CR2: 00007f2b2bf1e72c CR3: 00000005f3b7a000 CR4: 0000000000750ee0
Mai 16 14:26:51 Gentoo kernel: PKRU: 55555554
Mai 16 14:26:51 Gentoo kernel: Call Trace:
Mai 16 14:26:51 Gentoo kernel:  <TASK>
Mai 16 14:26:51 Gentoo kernel:  copy_process+0x9b3/0x24d0
Mai 16 14:26:51 Gentoo kernel:  ? raw_spin_rq_lock_nested+0x5/0x10
Mai 16 14:26:51 Gentoo kernel:  ? newidle_balance.constprop.0+0x223/0x3a0
Mai 16 14:26:51 Gentoo kernel:  ? xa_load+0xa7/0xd0
Mai 16 14:26:51 Gentoo kernel:  kernel_clone+0xba/0x3f0
Mai 16 14:26:51 Gentoo kernel:  __x64_sys_clone+0x84/0xb0
Mai 16 14:26:51 Gentoo kernel:  do_syscall_64+0x5b/0x80
Mai 16 14:26:51 Gentoo kernel:  ? exit_to_user_mode_prepare+0x74/0x110
Mai 16 14:26:51 Gentoo kernel:  entry_SYSCALL_64_after_hwframe+0x4b/0xb5
Mai 16 14:26:51 Gentoo kernel: RIP: 0033:0x7f7299976ae3
Mai 16 14:26:51 Gentoo kernel: Code: 00 00 0f 1f 44 00 00 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 >
Mai 16 14:26:51 Gentoo kernel: RSP: 002b:00007ffdabcdff98 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Mai 16 14:26:51 Gentoo kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7299976ae3
Mai 16 14:26:51 Gentoo kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
Mai 16 14:26:51 Gentoo kernel: RBP: 00007ffdabce00d0 R08: 0000000000000000 R09: 0000000000000000
Mai 16 14:26:51 Gentoo kernel: R10: 00007f729989ca10 R11: 0000000000000246 R12: 0000000000000000
Mai 16 14:26:51 Gentoo kernel: R13: 0000000000000000 R14: 00005581be207220 R15: 000000000000000b
Mai 16 14:26:51 Gentoo kernel:  </TASK>

Kernel version 6.3.2 w/ voluntary preemption. Problems started after 12 hours of uptime.
Ryzen 5900X. This is without constgran or EEVDF.

I suspect that 21b257b and/or 29eb369 may be the cause, since the stall was detected in sched_fork (see RIP above), so it affected a child task.

full_dmesg.log
kernel_config.txt

32-bit target

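In core.c:
if (cnt) avg = (sum / cnt) << 8;

Assuming sum is a 64-bit value, this code will not work for a 32-bit target, where the kernel does not support a plain 64-bit division. Note that do_div() stores the quotient back into its first argument and returns the remainder, so its return value should not be assigned to avg. A possible correction:
if (cnt) avg = div_u64(sum, cnt) << 8;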
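
A caveat on the helper choice: the kernel's `do_div(n, base)` divides `n` in place and returns the remainder, whereas `div_u64(dividend, divisor)` returns the quotient. The userspace sketch below mimics both with stand-in helpers (`emu_do_div` and `emu_div_u64` are hypothetical names, and `sum` is assumed to be 64-bit with a 32-bit `cnt`) to show what each call style would store in `avg`.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for do_div(): divides *n in place and
 * returns the remainder, not the quotient. */
static uint32_t emu_do_div(uint64_t *n, uint32_t base)
{
    assert(base != 0);
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Userspace stand-in for div_u64(): returns the quotient. */
static uint64_t emu_div_u64(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* Average scaled by 256, computed from the quotient. */
static uint64_t avg_from_quotient(uint64_t sum, uint32_t cnt)
{
    return cnt ? emu_div_u64(sum, cnt) << 8 : 0;
}

/* What assigning the return value of do_div() would compute instead:
 * the remainder scaled by 256. */
static uint64_t avg_from_remainder(uint64_t sum, uint32_t cnt)
{
    return cnt ? (uint64_t)emu_do_div(&sum, cnt) << 8 : 0;
}
```

With `sum = 1000` and `cnt = 3`, the quotient form gives `333 << 8`, while the remainder form gives `1 << 8`, so only the former is an average.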

Missing LICENSE

The bore-scheduler repository has no LICENSE file. Since the Linux kernel is licensed under GPLv2 (GPL-2.0-only), I assume that this code is under the same license.

Possible BORE 4.3.0 regression

I tried BORE 4.3.0 with the XanMod Main 6.6.x kernel (a non-RT variant). There are continuous frame drops when playing a video in Google Chrome while a background job computes prime numbers on all CPU cores. The same QA test passes with BORE 4.2.4, even with 3x the number of threads; { kernel.sched_autogroup_enabled = 1, kernel.sched_bore = 1 }.

Maybe 4.3.0 is not compatible with XanMod. RE: https://github.com/marioroy/clearmod

Steam issues

Hi there. As a once very happy CacULE user, I'm delighted you've continued developing this scheduler. I have, however, hit some issues.
The system is a fully up-to-date EndeavourOS (Arch) install with a 5600X and a 5700XT on amdgpu. I'm running a tkg-bore kernel with the graysky Zen 3 optimisation and all other settings at default.

  1. Steam games refuse to launch and hang indefinitely. Rebooting resolves this, but it doesn't happen with any other scheduler, and the setup is otherwise identical when testing.
  2. Major hitching during play. This was especially notable in Monster Hunter World, where I had 5-10 second pauses during combat. I thought my PC had crashed and was surprised when it started again.

This is of course not very scientific, but I felt it was worth reporting anyway.

BORE 4.1.13 for Linux 6.1.x kernel not working - /proc/sys/kernel/sched_bore blank value (cannot set to 1)

Thank you for 0001-linux6.1.y-bore4.1.13.patch. I made one small edit so that it applies to the XanMod LTS 6.1.77 kernel.

259,260c259,260
<   *  Adaptive scheduling granularity, math enhancements by Peter Zijlstra
<   *  Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
---
>   *  Remove energy efficiency functions by Alexandre Frade
>   *  (C) 2021 Alexandre Frade <[email protected]>

The system boots fine, and I see the following message in dmesg.

[    0.000000] BORE (Burst-Oriented Response Enhancer) CPU Scheduler modification 4.1.13 by Masahito Suzuki

When I run various benchmarks, BORE does not seem to be working, so I queried the kernel.sched_bore knob. It returns a blank value, and I am unable to enable it.

$ sudo su
# sysctl kernel.sched_bore
# sysctl kernel.sched_bore | wc -c
0

# sysctl -w kernel.sched_bore=1
sysctl: setting key "kernel.sched_bore": Invalid argument

Three other knobs are also blank { sched_burst_score_rounding, sched_burst_smoothness_long, and sched_burst_smoothness_short }.

# sysctl kernel.sched_burst_cache_lifetime
kernel.sched_burst_cache_lifetime = 60000000

# sysctl kernel.sched_burst_fork_atavistic
kernel.sched_burst_fork_atavistic = 2

# sysctl kernel.sched_burst_penalty_offset
kernel.sched_burst_penalty_offset = 22

# sysctl kernel.sched_burst_penalty_scale
kernel.sched_burst_penalty_scale = 1280

# sysctl kernel.sched_burst_score_rounding
# sysctl kernel.sched_burst_score_rounding | wc -c
0

# sysctl kernel.sched_burst_smoothness_long
# sysctl kernel.sched_burst_smoothness_long | wc -c
0

# sysctl kernel.sched_burst_smoothness_short
# sysctl kernel.sched_burst_smoothness_short | wc -c
0

I verified that the values are set in kernel/sched/fair.c. Nothing seems out of the ordinary, and I am unsure why sched_bore is blank when running XanMod LTS 6.1.77 with the BORE patch applied.

$ cd /dev/tmp
$ tar xf ~/clearmod/rpmbuild.lts/SOURCES/6.1.77-xanmod1.tar.gz
$ cd linux-6.1.77-xanmod1/
$ patch -p1 < ~/clearmod/patches/0001-linux6.1.y-bore4.1.13.patch 
patching file include/linux/sched.h
patching file init/Kconfig
patching file kernel/sched/core.c
Hunk #1 succeeded at 4376 (offset 8 lines).
Hunk #2 succeeded at 4523 (offset 8 lines).
Hunk #3 succeeded at 4831 (offset 8 lines).
Hunk #4 succeeded at 9826 (offset 14 lines).
patching file kernel/sched/debug.c
patching file kernel/sched/fair.c
Hunk #1 succeeded at 22 (offset 3 lines).
Hunk #2 succeeded at 132 (offset 3 lines).
Hunk #3 succeeded at 250 (offset 3 lines).
Hunk #4 succeeded at 830 (offset 3 lines).
Hunk #5 succeeded at 1042 (offset 3 lines).
Hunk #6 succeeded at 6312 (offset 3 lines).
Hunk #7 succeeded at 7833 (offset -250 lines).

The knobs are defined inside kernel/sched/fair.c.

#ifdef CONFIG_SCHED_BORE
bool __read_mostly sched_bore                   = 1;
bool __read_mostly sched_burst_score_rounding   = 0;
bool __read_mostly sched_burst_smoothness_long  = 1;
bool __read_mostly sched_burst_smoothness_short = 0;
u8   __read_mostly sched_burst_fork_atavistic   = 2;
u8   __read_mostly sched_burst_penalty_offset   = 22;
uint __read_mostly sched_burst_penalty_scale    = 1280;
uint __read_mostly sched_burst_cache_lifetime   = 60000000;
...
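
One pattern stands out when comparing this listing with the sysctl output: the four knobs that read back blank are exactly the four declared as bool, while the u8 and uint knobs report values. This is an observation that may help narrow the search, not a diagnosis. A small userspace helper along the following lines (hypothetical, not part of BORE) could re-check the knobs after each kernel rebuild. It relies only on the standard mapping of a sysctl name such as kernel.sched_bore to the path /proc/sys/kernel/sched_bore.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the file at `path` yields no bytes (a "blank" knob),
 * 0 if it yields at least one byte, and -1 if it cannot be opened.
 */
static int knob_is_blank(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int c = fgetc(f);
    fclose(f);
    return c == EOF ? 1 : 0;
}

/* Print the status of each path; returns how many read back blank. */
static int report_blank_knobs(const char *const paths[], int n)
{
    int blank = 0;
    for (int i = 0; i < n; i++) {
        int r = knob_is_blank(paths[i]);
        printf("%-55s %s\n", paths[i],
               r == 1 ? "blank" : r == 0 ? "has value" : "unreadable");
        if (r == 1)
            blank++;
    }
    return blank;
}
```

Calling `report_blank_knobs` with the eight paths under /proc/sys/kernel/ that correspond to the knobs above would print one status line per knob and return the number of blank ones.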
