
benchmark's Introduction

benchmark machine

Development plan:

  1. MVP:

    Queue and runs exist on the benchmark server, which handles everything

  2. Dedicated queue

    To avoid having a web server running in the background, have a dedicated VM (or similar) maintain the queue

  3. Cancellation

    Improve UX by allowing cancellation and other niceties

Usage

Eventually, we’ll just have a project-wide webhook like this. For now, if you want to test:

  1. Add an asv config to your project (either in the project root or in a benchmarks directory)
  2. Add a webhook to your scverse project with these webhook settings, i.e.
    • Content type: application/json
    • Let me select individual events → Pull Requests
  3. Add the benchmark label to a PR authored by a trusted user.
  4. Watch scverse-benchmarks add and update a comment with the PR’s performance impact.
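The gating described in steps 3–4 could be sketched like this (the payload shape follows GitHub’s pull_request webhook event; the trusted-user set is illustrative, not the real list):

```python
TRUSTED_USERS = {"flying-sheep", "ilan-gold"}  # illustrative, not the real list

def should_run(payload: dict) -> bool:
    """Run benchmarks only for PRs labeled "benchmark" by a trusted author."""
    pr = payload.get("pull_request", {})
    labels = {label["name"] for label in pr.get("labels", [])}
    author = pr.get("user", {}).get("login")
    return "benchmark" in labels and author in TRUSTED_USERS

payload = {
    "action": "labeled",
    "pull_request": {
        "user": {"login": "flying-sheep"},
        "labels": [{"name": "benchmark"}],
    },
}
print(should_run(payload))  # → True
```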

MVP Setup

All these currently assume you have a <user> login with sudo rights on the scvbench server.

Debugging

  • Use journalctl -u benchmark -f on the server to tail the logs of the service.
  • Check GitHub’s page for Hook deliveries.

One-time server setup

  1. As the benchmarker user, install micromamba, then:

    micromamba create -n asv -c conda-forge conda mamba virtualenv asv
    micromamba run -n asv asv machine --yes

    (use micromamba activate asv to make asv available in your PATH)

  2. Update LoadCredentialEncrypted lines in benchmark.service using

    sudo systemd-creds encrypt --name=webhook_secret secret.txt -
    sudo systemd-creds encrypt --name=app_key app-key.pem -
    shred secret.txt app-key.pem
  3. Copy the benchmark.service file to the system, enable and start the service:

    $ rsync benchmark.service <user>@scvbench:
    $ ssh <user>@scvbench
    scvbench$ sudo mv benchmark.service /etc/systemd/system/
    scvbench$ sudo systemctl enable --now benchmark

Further steps:

  1. Set up chrony (/etc/chrony.conf) to use internal servers

    server 146.107.1.13 trust
    server 146.107.5.10
  2. Performance setup

Deployment

  1. Make changes in <branch> (either main or a PR branch) and wait until CI finishes.
  2. Run nu scripts/deploy.nu <branch> --user=<user>.
  3. Trigger a run, e.g. remove and re-add the benchmark label in PR 11.

Development

For local development:

  1. Start the server locally
  2. Use scripts/test.nu to send a payload (check the script for examples for both steps)

benchmark's People

Contributors: flying-sheep, ilan-gold, pre-commit-ci[bot]

benchmark's Issues

Performance

For improved reliability, the machine needs some setup. Maybe we can validate that automatically?

Guides:

Once (total, in BIOS)

Per boot

  • Global setup:

    Use the performance governor for all CPUs
    Disable processor boosting
    Reduce swappiness
    Disable ASLR

    tentative script so far:

    # performance governor
    echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # no boosting
    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
    # low swappiness
    sudo sysctl vm.swappiness=10
    # no ASLR
    sudo sysctl kernel.randomize_va_space=0
  • Assign roles to CPUs:

Per run

  • Set the benchmark program’s task affinity to a fixed CPU. (ASV: --cpu-affinity; plain subprocess: taskset -c 0 ./mybenchmark)
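A minimal sketch of the per-run pinning, assuming the taskset approach above (./mybenchmark is the placeholder binary from the text):

```python
import subprocess

def run_pinned(benchmark_cmd: list[str], cpu: int = 0) -> list[str]:
    """Build a taskset invocation pinning *benchmark_cmd* to one CPU."""
    return ["taskset", "-c", str(cpu), *benchmark_cmd]

cmd = run_pinned(["./mybenchmark"])
print(cmd)  # → ['taskset', '-c', '0', './mybenchmark']
# subprocess.run(cmd, check=True)  # Linux only; uncomment to actually run

# A Python benchmark process can also pin itself (Linux only):
# os.sched_setaffinity(0, {0})
```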

Security

Currently, the public IP has no domain name, and therefore no way to obtain a certificate and serve HTTPS.

GitHub’s webhook deliveries (including the HMAC signature header derived from the webhook secret) therefore travel in plain text and can, at least in theory, be sniffed and replayed.

So we should probably only allow running benchmarks on branches and tags, not arbitrary commits (GitHub stores a fork’s PR commits in the parent repo, so an arbitrary SHA may point at untrusted code).
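For context, GitHub signs each delivery with an HMAC of the payload body (the X-Hub-Signature-256 header); a sketch of the verification the server presumably performs:

```python
import hashlib
import hmac

# The secret itself never travels over the wire; only an HMAC of the payload
# does. Without HTTPS, a sniffed delivery could still be replayed, hence the
# branches-and-tags restriction above. The secret value here is made up.
def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

secret = b"example-secret"
body = b'{"action": "labeled"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, header))  # → True
```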

Use python for server?

I'm concerned about the use of Rust here for maintainability, w.r.t. the bus factor.

From what I can tell, it's a pretty simple web server that could also be written in Flask, and if it were, it could be maintained more easily by other scverse core team members.

Show comparison for current envs only

Currently, if you change the benchmark dependency matrix in a PR, asv compare will pick up the older runs:

| Change | Before [c68557c5] | After [2a358df6] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| | 174M | n/a | n/a | preprocessing.PreprocessingSuite.peakmem_calculate_qc_metrics('pbmc68k_reduced') [scvbench/conda-py3.12-flit-h5py-memory_profiler-natsort-numpy-pandas-pooch-pytest-pytoml-scanpy-scipy-setuptools_scm-zarr] |
| | 155M | 163M | 1.05 | preprocessing.PreprocessingSuite.peakmem_calculate_qc_metrics('pbmc68k_reduced') [scvbench/conda-py3.12-h5py-memory_profiler-natsort-numpy-pandas-pytest-scanpy-scipy-zarr] |

We can manually specify the env using asv compare -E $spec to show only benchmarks from the current env.

Python code to get autodetected envs:

import asv
import json

conf = asv.config.Config.load("asv.conf.json")
env_names = [env.name for env in asv.environment.get_environments(conf, "")]
print(json.dumps(env_names))

So we could run this, and then specify those:

let env_names: Vec<String> = serde_json::from_str(&stdout)?;
for env_name in &env_names {
    compare_cmd.args(["-E", env_name.as_str()]);
}

Maybe a better solution shows up in airspeed-velocity/asv#1394

CI to build for Rocky

It links against libgit and OpenSSH, so it needs to be built on or for the server

Run ASV and display results

compare:

  • https://github.com/HaoZeke/asv-numpy

  • https://github.com/anderspitman/autobencher

  • https://github.com/JuliaCI/Nanosoldier.jl

  • Backfill for tags

     git tag --list --sort=version:refname '*.*.*' | asv run HASHFILE:-
    
  • Interactive use with PRs:

    1. Bless a PR’s last commit and somehow send webhook
    2. Update base branch and fetch commit: git fetch origin main:main $sha
    3. [main, $sha] | str join "\n" | asv run HASHFILE:- (or just run on one commit if main exists)
    4. asv compare --only-changed main $sha will be empty if no changes, otherwise have a markdown table as result
    Example result without --only-changed

    All benchmarks:

    | Change | Before [a4786471] | After [5da3c619] <pull/1348/head> | Ratio | Benchmark (Parameter) |
    |---|---|---|---|---|
    | | 99.9M | 101M | 1.01 | readwrite.H5ADBackedWriteSuite.peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 100M | 100M | 1 | readwrite.H5ADBackedWriteSuite.peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 395±2ms | 397±3ms | 1.01 | readwrite.H5ADBackedWriteSuite.time_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 118±0.5ms | 117±1ms | 1 | readwrite.H5ADBackedWriteSuite.time_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 15.3828125 | 15.6953125 | 1.02 | readwrite.H5ADBackedWriteSuite.track_peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 15.35546875 | 15.37109375 | 1 | readwrite.H5ADBackedWriteSuite.track_peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 91214354 | 91214570 | 1 | readwrite.H5ADInMemorySizeSuite.track_actual_in_memory_size('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 23564294 | 23564294 | 1 | readwrite.H5ADInMemorySizeSuite.track_in_memory_size('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 5.26M | 5.26M | 1 | readwrite.H5ADReadSuite.mem_read_backed_object('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 23.6M | 23.6M | 1 | readwrite.H5ADReadSuite.mem_readfull_object('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 87.5M | 87.7M | 1 | readwrite.H5ADReadSuite.peakmem_read_backed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 107M | 107M | 1 | readwrite.H5ADReadSuite.peakmem_read_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 78.1±1ms | 77.2±0.3ms | 0.99 | readwrite.H5ADReadSuite.time_read_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 1.1241426611796983 | 1.129467373760664 | 1 | readwrite.H5ADReadSuite.track_read_full_memratio('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 111M | 111M | 1 | readwrite.H5ADWriteSuite.peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 111M | 110M | 1 | readwrite.H5ADWriteSuite.peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 336±3ms | 335±2ms | 1 | readwrite.H5ADWriteSuite.time_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 60.5±0.8ms | 59.1±0.7ms | 0.98 | readwrite.H5ADWriteSuite.time_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 7.5 | 7.5 | 1 | readwrite.H5ADWriteSuite.track_peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 6.75 | 7.0 | 1.04 | readwrite.H5ADWriteSuite.track_peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(0, 1000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(0, 9000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(None, 9000, -1)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(None, None, 2)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(0, 1000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(0, 9000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(None, 9000, -1)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(None, None, 2)) |
    | | 284±3ms | 286±2ms | 1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 609±4μs | 615±30μs | 1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(0, 1000, None)) |
    | | 4.05±0.08ms | 3.98±0.06ms | 0.98 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(0, 9000, None)) |
    | | 274±2ms | 277±2ms | 1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(None, 9000, -1)) |
    | | 1.35±0.01s | 1.36±0s | 1 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(None, None, 2)) |
    | | 168±2μs | 166±1μs | 0.99 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 45.0±0.7μs | 47.4±1μs | 1.05 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(0, 1000, None)) |
    | | 44.9±0.3μs | 46.6±0.4μs | 1.04 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(0, 9000, None)) |
    | | 45.2±0.8μs | 46.1±0.4μs | 1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(None, 9000, -1)) |
    | | 45.4±1μs | 46.1±0.6μs | 1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(None, None, 2)) |
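The interactive flow above can be expressed as concrete commands (built here but not executed; the SHA is a placeholder):

```python
# Sketch of the PR workflow: fetch, run asv on both commits, then compare.
sha = "5da3c619"  # placeholder PR head SHA
fetch_cmd = ["git", "fetch", "origin", "main:main", sha]        # step 2
hashfile_stdin = "\n".join(["main", sha])                       # stdin for step 3
run_cmd = ["asv", "run", "HASHFILE:-"]                          # step 3
compare_cmd = ["asv", "compare", "--only-changed", "main", sha] # step 4
print(hashfile_stdin)
```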

Dedicated queue version

Switch to Python for more maintainers.

Separate queue server and benchmark runner.

Queue server

Two parts: webhook server and persistent queue.

  1. The webhook server accepts a webhook request,
  2. transforms the relevant ones into events on the persistent queue,
  3. and from time to time checks for updates and potentially applies them¹.

In order for updates to be applied without losing requests, we may need to put a load balancer or something similar in front of the webhook server.
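A minimal sketch of steps 1–2 above, with an in-memory queue standing in for the persistent one and a simplified, hypothetical payload shape:

```python
from queue import SimpleQueue

# Relevant webhook deliveries become events on a queue; everything else is
# dropped. The action names and event fields are illustrative only.
events: SimpleQueue = SimpleQueue()

def handle_webhook(payload: dict) -> None:
    if payload.get("action") in {"labeled", "synchronize"}:
        events.put({"pr": payload["number"], "action": payload["action"]})

handle_webhook({"action": "labeled", "number": 11})
handle_webhook({"action": "closed", "number": 12})  # ignored
print(events.qsize())  # → 1
```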

Benchmark runner

The benchmark runner should be a simple loop that executes the following steps:

  1. Check for and potentially apply an update¹ for step 3.
  2. Poll the queue server. If an event is available, execute step 3; otherwise sleep, then go to 1.
  3. Handle the event using a subprocess that was potentially updated in step 1.
  4. Go to 1.

¹E.g. we could use something like TUF / TUFup to check for and download updates
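The runner loop might be sketched like this (check_for_update and handle are placeholders for the update mechanism and the benchmark subprocess; a bounded loop stands in for the endless one):

```python
import time
from queue import Empty, SimpleQueue

queue: SimpleQueue = SimpleQueue()
queue.put({"pr": 11})
handled = []

def check_for_update() -> None:
    pass  # step 1: e.g. a TUF/tufup-based self-update of the handler

def handle(event: dict) -> None:
    handled.append(event)  # step 3: would spawn the (updated) subprocess

for _ in range(2):  # the real loop runs forever
    check_for_update()
    try:
        event = queue.get_nowait()  # step 2: poll the queue server
    except Empty:
        time.sleep(0)  # placeholder for the real sleep interval
        continue
    handle(event)

print(handled)  # → [{'pr': 11}]
```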

Publish raw benchmark results

One rather intricate solution is to make a website:

Each check run’s details tab has a link that reads “View more details on scverse-benchmark”. Without overriding it, it links to the GH app’s public URL, which currently goes nowhere useful.

We could make the official bot URL point to the website and override an individual run’s link using the details_url body parameter, linking to a subpage that provides the run’s JSON for download.


A simpler option would be to use the “update a check run” endpoint’s output.annotations[].raw_details, which can contain 64 kB of data and is rendered like this:

It wouldn’t be a great fit, since an annotation is attached to a specific file, but raw_details is the only field explicitly intended for data larger than a line or so.
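A hypothetical request body for that endpoint (field names follow GitHub’s Checks API; all values here are made up):

```python
import json

# Sketch: carry raw benchmark results in output.annotations[].raw_details
# (limited to 64 kB). The annotation must point at a file; asv.conf.json is
# an arbitrary illustrative choice.
body = {
    "output": {
        "title": "Benchmark results",
        "summary": "Raw asv results attached as an annotation.",
        "annotations": [
            {
                "path": "asv.conf.json",
                "start_line": 1,
                "end_line": 1,
                "annotation_level": "notice",
                "message": "Raw benchmark results",
                "raw_details": json.dumps({"results": []}),
            }
        ],
    }
}
print(len(body["output"]["annotations"]))  # → 1
```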
