
benchmark's Introduction

benchmark machine

Development plan:

  1. MVP:

    Queue and runs exist on the benchmark server, which handles everything

  2. Dedicated queue

    To avoid having a web server running in the background, have a dedicated VM (or similar) maintain the queue

  3. Cancellation

    Improve UX by allowing cancellation and other niceties

Usage

Eventually, we’ll just have a project-wide webhook like this. For now, if you want to test:

  1. Add an asv config to your project (either in the project root or in a benchmarks directory)
  2. Add a webhook to your scverse project with these webhook settings, i.e.
    • Content type: application/json
    • Let me select individual events → Pull Requests
  3. Add the benchmark label to a PR authored by a trusted user.
  4. Watch scverse-benchmarks add and update a comment with the PR’s performance impact.
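The gating described in steps 3–4 could be sketched like this (the payload shape follows GitHub’s pull_request webhook event; the trusted-user set is illustrative, not the real list):

```python
TRUSTED_USERS = {"flying-sheep", "ilan-gold"}  # illustrative, not the real list

def should_run(payload: dict) -> bool:
    """Run benchmarks only for PRs labeled "benchmark" by a trusted author."""
    pr = payload.get("pull_request", {})
    labels = {label["name"] for label in pr.get("labels", [])}
    author = pr.get("user", {}).get("login")
    return "benchmark" in labels and author in TRUSTED_USERS

payload = {
    "action": "labeled",
    "pull_request": {
        "user": {"login": "flying-sheep"},
        "labels": [{"name": "benchmark"}],
    },
}
print(should_run(payload))  # → True
```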

MVP Setup

All these currently assume you have a <user> login with sudo rights on the scvbench server.

Debugging

  • Use journalctl -u benchmark -f on the server to tail the logs of the service.
  • Check GitHub’s page for Hook deliveries.

One-time server setup

  1. As the benchmarker user, install micromamba, then:

    micromamba create -n asv -c conda-forge conda mamba virtualenv asv
    micromamba run -n asv asv machine --yes

    (use micromamba activate asv to make asv available in your PATH)

  2. Update LoadCredentialEncrypted lines in benchmark.service using

    sudo systemd-creds encrypt --name=webhook_secret secret.txt -
    sudo systemd-creds encrypt --name=app_key app-key.pem -
    shred secret.txt app-key.pem
  3. Copy the benchmark.service file to the system, enable and start the service:

    $ rsync benchmark.service <user>@scvbench:
    $ ssh <user>@scvbench
    scvbench$ sudo mv benchmark.service /etc/systemd/system/
    scvbench$ sudo systemctl enable --now benchmark

Further steps:

  1. Set up chrony (/etc/chrony.conf) to use internal servers

    server 146.107.1.13 trust
    server 146.107.5.10
  2. Performance setup

Deployment

  1. Make changes in <branch> (either main or a PR branch) and wait until CI finishes.
  2. Run nu scripts/deploy.nu <branch> --user=<user>.
  3. Trigger a run, e.g. remove and re-add the benchmark label in PR 11.

Development

For local development:

  1. Start the server locally
  2. Use scripts/test.nu to send a payload (check the script for examples for both steps)

benchmark's People

Contributors: flying-sheep, ilan-gold, pre-commit-ci[bot]

benchmark's Issues

Performance

For improved reliability, the machine needs some setup. Maybe we can validate that automatically?

Guides:

Once (total, in BIOS)

Per boot

  • Global setup:

    Use the performance governor for all CPUs
    Disable processor boosting
    Reduce swappiness
    Disable ASLR

    tentative script so far:

    # performance governor
    echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # no boosting
    echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost
    # low swappiness
    sudo sysctl vm.swappiness=10
    # no ASLR
    sudo sysctl kernel.randomize_va_space=0
  • Assign roles to CPUs:

Per run

  • Set the benchmark program’s task affinity to a fixed CPU. (ASV: --cpu-affinity; plain subprocess: taskset -c 0 ./mybenchmark)
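A minimal sketch of the per-run pinning, assuming the taskset approach above (./mybenchmark is the placeholder binary from the text):

```python
import subprocess

def run_pinned(benchmark_cmd: list[str], cpu: int = 0) -> list[str]:
    """Build a taskset invocation pinning *benchmark_cmd* to one CPU."""
    return ["taskset", "-c", str(cpu), *benchmark_cmd]

cmd = run_pinned(["./mybenchmark"])
print(cmd)  # → ['taskset', '-c', '0', './mybenchmark']
# subprocess.run(cmd, check=True)  # Linux only; uncomment to actually run

# A Python benchmark process can also pin itself (Linux only):
# os.sched_setaffinity(0, {0})
```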

Security

Currently, the public IP has no domain name, and therefore no way to obtain a certificate and serve HTTPS.

GitHub’s webhook deliveries (including the HMAC signature header derived from the webhook secret) therefore travel in plain text and can, at least in theory, be sniffed and replayed.

So we should probably only allow running benchmarks on branches and tags, not arbitrary commits (GitHub stores a fork’s PR commits in the parent repo, so an arbitrary SHA may point at untrusted code).
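For context, GitHub signs each delivery with an HMAC of the payload body (the X-Hub-Signature-256 header); a sketch of the verification the server presumably performs:

```python
import hashlib
import hmac

# The secret itself never travels over the wire; only an HMAC of the payload
# does. Without HTTPS, a sniffed delivery could still be replayed, hence the
# branches-and-tags restriction above. The secret value here is made up.
def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

secret = b"example-secret"
body = b'{"action": "labeled"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, header))  # → True
```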

Use python for server?

I'm concerned about the use of Rust here for maintainability, w.r.t. the bus factor.

From what I can tell, it's a pretty simple web server that could also be written in Flask, and if it were, it could be maintained more easily by other scverse core team members.

Show comparison for current envs only

Currently, if you change the benchmark dependency matrix in a PR, asv compare will pick up the older runs:

| Change | Before [c68557c5] | After [2a358df6] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| | 174M | n/a | n/a | preprocessing.PreprocessingSuite.peakmem_calculate_qc_metrics('pbmc68k_reduced') [scvbench/conda-py3.12-flit-h5py-memory_profiler-natsort-numpy-pandas-pooch-pytest-pytoml-scanpy-scipy-setuptools_scm-zarr] |
| | 155M | 163M | 1.05 | preprocessing.PreprocessingSuite.peakmem_calculate_qc_metrics('pbmc68k_reduced') [scvbench/conda-py3.12-h5py-memory_profiler-natsort-numpy-pandas-pytest-scanpy-scipy-zarr] |

We can manually specify the env using asv compare -E $spec to show only benchmarks from the current env.

Python code to get autodetected envs:

import asv
import json

conf = asv.config.Config.load("asv.conf.json")
env_names = [env.name for env in asv.environment.get_environments(conf, "")]
print(json.dumps(env_names))

So we could run this, and then specify those:

let env_names: Vec<String> = serde_json::from_str(&stdout)?;
for env_name in &env_names {
    compare_cmd.args(["-E", env_name.as_str()]);
}

Maybe a better solution shows up in airspeed-velocity/asv#1394

CI to build for Rocky

It links against libgit and OpenSSH, so it needs to be built on or for the server

Run ASV and display results

compare:

  • https://github.com/HaoZeke/asv-numpy

  • https://github.com/anderspitman/autobencher

  • https://github.com/JuliaCI/Nanosoldier.jl

  • Backfill for tags

     git tag --list --sort=version:refname '*.*.*' | asv run HASHFILE:-
    
  • Interactive use with PRs:

    1. Bless a PR’s last commit and somehow send webhook
    2. Update base branch and fetch commit: git fetch origin main:main $sha
    3. [main, $sha] | str join "\n" | asv run HASHFILE:- (or just run on one commit if main exists)
    4. asv compare --only-changed main $sha will be empty if no changes, otherwise have a markdown table as result
    Example result without --only-changed

    All benchmarks:

    | Change | Before [a4786471] | After [5da3c619] <pull/1348/head> | Ratio | Benchmark (Parameter) |
    |---|---|---|---|---|
    | | 99.9M | 101M | 1.01 | readwrite.H5ADBackedWriteSuite.peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 100M | 100M | 1 | readwrite.H5ADBackedWriteSuite.peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 395±2ms | 397±3ms | 1.01 | readwrite.H5ADBackedWriteSuite.time_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 118±0.5ms | 117±1ms | 1 | readwrite.H5ADBackedWriteSuite.time_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 15.3828125 | 15.6953125 | 1.02 | readwrite.H5ADBackedWriteSuite.track_peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 15.35546875 | 15.37109375 | 1 | readwrite.H5ADBackedWriteSuite.track_peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 91214354 | 91214570 | 1 | readwrite.H5ADInMemorySizeSuite.track_actual_in_memory_size('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 23564294 | 23564294 | 1 | readwrite.H5ADInMemorySizeSuite.track_in_memory_size('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 5.26M | 5.26M | 1 | readwrite.H5ADReadSuite.mem_read_backed_object('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 23.6M | 23.6M | 1 | readwrite.H5ADReadSuite.mem_readfull_object('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 87.5M | 87.7M | 1 | readwrite.H5ADReadSuite.peakmem_read_backed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 107M | 107M | 1 | readwrite.H5ADReadSuite.peakmem_read_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 78.1±1ms | 77.2±0.3ms | 0.99 | readwrite.H5ADReadSuite.time_read_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 1.1241426611796983 | 1.129467373760664 | 1 | readwrite.H5ADReadSuite.track_read_full_memratio('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 111M | 111M | 1 | readwrite.H5ADWriteSuite.peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 111M | 110M | 1 | readwrite.H5ADWriteSuite.peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 336±3ms | 335±2ms | 1 | readwrite.H5ADWriteSuite.time_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 60.5±0.8ms | 59.1±0.7ms | 0.98 | readwrite.H5ADWriteSuite.time_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 7.5 | 7.5 | 1 | readwrite.H5ADWriteSuite.track_peakmem_write_compressed('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 6.75 | 7.0 | 1.04 | readwrite.H5ADWriteSuite.track_peakmem_write_full('http://falexwolf.de/data/pbmc3k_raw.h5ad') |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(0, 1000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(0, 9000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(None, 9000, -1)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem((10000, 10000), slice(None, None, 2)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(0, 1000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(0, 9000, None)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(None, 9000, -1)) |
    | | 119M | 119M | 1 | sparse_dataset.SparseCSRContiguousSlice.peakmem_getitem_adata((10000, 10000), slice(None, None, 2)) |
    | | 284±3ms | 286±2ms | 1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 609±4μs | 615±30μs | 1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(0, 1000, None)) |
    | | 4.05±0.08ms | 3.98±0.06ms | 0.98 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(0, 9000, None)) |
    | | 274±2ms | 277±2ms | 1.01 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(None, 9000, -1)) |
    | | 1.35±0.01s | 1.36±0s | 1 | sparse_dataset.SparseCSRContiguousSlice.time_getitem((10000, 10000), slice(None, None, 2)) |
    | | 168±2μs | 166±1μs | 0.99 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), array([False, True, True, ..., True, True, True])) |
    | | 45.0±0.7μs | 47.4±1μs | 1.05 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(0, 1000, None)) |
    | | 44.9±0.3μs | 46.6±0.4μs | 1.04 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(0, 9000, None)) |
    | | 45.2±0.8μs | 46.1±0.4μs | 1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(None, 9000, -1)) |
    | | 45.4±1μs | 46.1±0.6μs | 1.02 | sparse_dataset.SparseCSRContiguousSlice.time_getitem_adata((10000, 10000), slice(None, None, 2)) |
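The interactive flow above can be expressed as concrete commands (built here but not executed; the SHA is a placeholder):

```python
# Sketch of the PR workflow: fetch, run asv on both commits, then compare.
sha = "5da3c619"  # placeholder PR head SHA
fetch_cmd = ["git", "fetch", "origin", "main:main", sha]        # step 2
hashfile_stdin = "\n".join(["main", sha])                       # stdin for step 3
run_cmd = ["asv", "run", "HASHFILE:-"]                          # step 3
compare_cmd = ["asv", "compare", "--only-changed", "main", sha] # step 4
print(hashfile_stdin)
```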

Dedicated queue version

Switch to Python for more maintainers.

Separate queue server and benchmark runner.

Queue server

Two parts: webhook server and persistent queue.

  1. The webhook server accepts a webhook request,
  2. transforms the relevant ones into events on the persistent queue,
  3. and from time to time checks for updates and potentially applies them¹.

In order for updates to be applied without losing requests, we may need to put a load balancer or something similar in front of the webhook server.
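A minimal sketch of steps 1–2 above, with an in-memory queue standing in for the persistent one and a simplified, hypothetical payload shape:

```python
from queue import SimpleQueue

# Relevant webhook deliveries become events on a queue; everything else is
# dropped. The action names and event fields are illustrative only.
events: SimpleQueue = SimpleQueue()

def handle_webhook(payload: dict) -> None:
    if payload.get("action") in {"labeled", "synchronize"}:
        events.put({"pr": payload["number"], "action": payload["action"]})

handle_webhook({"action": "labeled", "number": 11})
handle_webhook({"action": "closed", "number": 12})  # ignored
print(events.qsize())  # → 1
```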

Benchmark runner

The benchmark runner should be a simple loop that executes the following steps:

  1. Check for and potentially apply an update¹ for step 3.
  2. Poll the queue server. If an event is available, execute step 3; otherwise sleep, then go to 1.
  3. Handle the event using a subprocess that was potentially updated in step 1.
  4. Go to 1.

¹E.g. we could use something like TUF / TUFup to check for and download updates
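The runner loop might be sketched like this (check_for_update and handle are placeholders for the update mechanism and the benchmark subprocess; a bounded loop stands in for the endless one):

```python
import time
from queue import Empty, SimpleQueue

queue: SimpleQueue = SimpleQueue()
queue.put({"pr": 11})
handled = []

def check_for_update() -> None:
    pass  # step 1: e.g. a TUF/tufup-based self-update of the handler

def handle(event: dict) -> None:
    handled.append(event)  # step 3: would spawn the (updated) subprocess

for _ in range(2):  # the real loop runs forever
    check_for_update()
    try:
        event = queue.get_nowait()  # step 2: poll the queue server
    except Empty:
        time.sleep(0)  # placeholder for the real sleep interval
        continue
    handle(event)

print(handled)  # → [{'pr': 11}]
```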

Publish raw benchmark results

One rather intricate solution is to make a website:

Each check run’s details tab has a link that reads “View more details on scverse-benchmark”. Without overriding it, it links to the GH app’s public URL, which currently goes nowhere useful.

We could make the official bot URL point to the website and override an individual run’s link using the details_url body parameter, linking to a subpage that provides the run’s JSON for download.


A simpler option would be to use the “update a check run” endpoint’s output.annotations[].raw_details, which can contain 64 kB of data and is rendered like this:

It wouldn’t be a great fit, since an annotation is attached to a specific file, but raw_details is the only field explicitly intended for data larger than a line or so.
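A hypothetical request body for that endpoint (field names follow GitHub’s Checks API; all values here are made up):

```python
import json

# Sketch: carry raw benchmark results in output.annotations[].raw_details
# (limited to 64 kB). The annotation must point at a file; asv.conf.json is
# an arbitrary illustrative choice.
body = {
    "output": {
        "title": "Benchmark results",
        "summary": "Raw asv results attached as an annotation.",
        "annotations": [
            {
                "path": "asv.conf.json",
                "start_line": 1,
                "end_line": 1,
                "annotation_level": "notice",
                "message": "Raw benchmark results",
                "raw_details": json.dumps({"results": []}),
            }
        ],
    }
}
print(len(body["output"]["annotations"]))  # → 1
```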
