softdevteam / krun
High fidelity benchmark runner
Home Page: http://soft-dev.org/src/krun/
License: Other
...
[2015-11-27 13:12:39 INFO] Done: Results dumped to warmup_results.json.bz2
[2015-11-27 13:12:39 INFO] Completed in (roughly) 192.660389 seconds.
Log file at: warmup_20151127_130926.log
$ ls warmup_20151127_130926.log
ls: cannot access warmup_20151127_130926.log: No such file or directory
$ ls | grep -e '\.log$'
warmup_20151127_112600.log
warmup_20151127_112753.log
warmup_20151127_112842.log
warmup_20151127_130825.log
warmup_20151127_130920.log
This issue came from the discussion on Issue #41 (from @vext01).
Currently it is only possible to re-run a subset of a whole benchmark by deleting entries from the json bz2 file. This is quite fiddly, as you have to uncompress the file and then locate the right lines to remove in a results file with potentially tens of thousands of lines, so I wonder if we could write a tiny tool to remove results by key.
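Such a tool could be quite short. A minimal sketch, assuming the results file is a bz2-compressed JSON document whose "data" field maps "bench:vm:variant" strings to result lists (the real layout may differ):

```python
# Hypothetical helper: drop one result key from a krun results file.
# Assumes a bz2-compressed JSON document with a "data" dict keyed by
# "bench:vm:variant" strings -- an assumption, not krun's documented format.
import bz2
import json

def remove_result_key(path, key):
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        results = json.load(fh)
    results["data"].pop(key, None)  # silently ignore a missing key
    with bz2.open(path, "wt", encoding="utf-8") as fh:
        json.dump(results, fh)
```

This avoids manual decompression and line hunting entirely.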
I was inspecting bencher3 prior to running benchmarks. I've found that if you put a breakpoint in the Linux implementation of check_cpu_throttled(), it is never hit. Consequently, benchmarking is allowed to continue in powersave mode, which is bad.
Avoids the need for an SMTP server that listens on a port. From @ltratt.
Two keys which don't yet have dump switches.
As discussed with @snim2, the iterations results are not enough to reconstruct a useful ETA estimate if a benchmark crashes (as the times in this case will be an empty list).
We store the rough ETA times in the results file.
Reboot the system after every execution. Will require a systemd hook.
There's a bug in krun where it crashes when it tries to run a VM with a bad path. Rather than just reporting this bug via an error email, let's check the VM paths up-front. Same for the benchmark entry points. This means we don't get random crashes in the middle of experimentation.
Currently we download a random version of mx (whatever is the latest in the repository). We should instead use a fixed version (i.e. a specific checkout).
[I'd prefer we don't fix the version yet, as I'm hoping to fix mx for OpenBSD.]
I think mk_graphs.py has fallen behind the changes in the rest of the repo. Running mk_graphs.py on a cut-down version of the examples config I get this:
$ ./mk_graphs.py examples/quick.krun
Traceback (most recent call last):
File "./mk_graphs.py", line 296, in <module>
data_dct = json.load(fh)
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Config file needed to reproduce:
import os
from krun.vm_defs import (PythonVMDef, JavaVMDef)
from krun import EntryPoint

# Who to mail
MAIL_TO = []

# Maximum number of error emails to send per-run
#MAX_MAILS = 2

DIR = os.getcwd()
JKRUNTIME_DIR = os.path.join(DIR, "krun", "libkruntime", "")

HEAP_LIMIT = 2097152  # KiB == 2GiB

# Variant name -> EntryPoint
VARIANTS = {
    "default-java": EntryPoint("KrunEntry", subdir="java"),
    "default-python": EntryPoint("bench.py", subdir="python"),
}

ITERATIONS_ALL_VMS = 1  # Small number for testing.

VMS = {
    'Java': {
        'vm_def': JavaVMDef('/usr/bin/java'),
        'variants': ['default-java'],
        'n_iterations': ITERATIONS_ALL_VMS,
    },
    'CPython': {
        'vm_def': PythonVMDef('/usr/bin/python2'),
        'variants': ['default-python'],
        'n_iterations': ITERATIONS_ALL_VMS,
    }
}

BENCHMARKS = {
    'dummy': 1000,
    'nbody': 1000,
}

# list of "bench:vm:variant"
SKIP = [
    #"*:CPython:*",
    #"*:Java:*",
]

N_EXECUTIONS = 1  # Number of fresh processes.
Benchmarks as in the examples/ directory.
Related to Issue #48
Explain what "developer mode" means in the CLI help.
There is a flag in krun that indicates whether an error occurred. This is currently not preserved across runs.
This flag is also used to determine whether we should print dmesg changes at the end. I would argue this is no longer needed. It should be sufficient to mail out and log the dmesg changes as they occur.
When using resume mode, Krun performs a rudimentary check to determine whether the benchmark is being resumed on the same machine that it was started on. Currently, this check only verifies that the uname of the two machines is identical.
This test should differ between platforms, and it would make sense to refactor it into the classes in platform.py. This may happen as part of #75.
For Linux, we could use a strong check such as blkid, which returns a unique ID of the root file system. I guess someone could have installed more RAM between resumes, so this doesn't catch every case, but anyhow @ltratt felt this was overkill and suggested we use hostname, which is available on most (all?) Unix-like platforms.
A more detailed fix would check every value in the audit dictionary. However, platform.audit() returns a str, but the audit in the JSON file is a unicode. On my machine I have packages with names that contain acutes and other accents, and a == between "this" platform and the previous one always returned False. The Internet contains a wide variety of solutions to this problem, and I didn't get any of them to quite work: hence the current hack.
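One possible shape for a fix (a sketch, not krun's actual code): recursively normalise byte strings to text before comparing, so a str-vs-unicode mismatch (bytes-vs-str in Python 3 terms) cannot make == spuriously False. The helper names here are illustrative.

```python
# Sketch: decode byte strings to text throughout a nested structure so
# that two audits differing only in str/unicode type compare equal.
def normalise(value, encoding="utf-8"):
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    if isinstance(value, dict):
        return {normalise(k): normalise(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalise(v) for v in value]
    return value

def audits_equal(audit0, audit1):
    return normalise(audit0) == normalise(audit1)
```

Using errors="replace" keeps the comparison total even for package names with undecodable bytes.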
As discussed with @snim2, it should not be hard to ensure that fatal() sends email.
We need a switch which makes it easier to develop on a personal laptop. This switch -- say --develop -- turns off all of the checks for system prerequisites, e.g. governor, tickless, ...
Laurie has expressed interest in running benchmarks on OpenBSD.
This is predicated on us getting a useful subset of the VMs running on OpenBSD.
The iteration runners print status messages to this effect:
iteration 1/2000
iteration 2/2000
...
However, due to stderr buffering, these messages are all printed at once when the execution ends. It would be nice if we could have these messages printed as they arrive.
ETAs in reboot mode are currently a little optimistic due to the fact that they don't account for the sleeps that allow the system to come up.
We should turn off address space layout randomisation on platforms that support it.
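On Linux, system-wide ASLR is controlled by /proc/sys/kernel/randomize_va_space (0 = off). A sketch, with the control file path parameterised so it can be exercised without root:

```python
# Sketch, Linux only: writing "0" to randomize_va_space disables
# address space layout randomisation system-wide (requires root on a
# real system; the path is a parameter for testability).
ASLR_CTL = "/proc/sys/kernel/randomize_va_space"

def disable_aslr(ctl_path=ASLR_CTL):
    with open(ctl_path, "w") as fh:
        fh.write("0\n")
```

Other platforms would need their own mechanism (or a check that ASLR is unsupported).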
Run benchmarks as a new user, i.e. '_krun'. Make sure the user has a clean home dir and shell environment.
Have krun(?) reinstall the machine, install the packages, build the VMs, and run the benchmarks.
We have a script to estimate how long a whole benchmark run would take. This would be a good piece of extra documentation to add to examples/.
In Richards one iteration is really fast, compared to other benchmarks. This leads to some weird effects and should be fixed in some way.
This is from the discussion on PR #74
Data from krun configuration files (and results files) is loaded when krun.py starts, and is used in various parts of the code. Usually parts of the configuration (such as filenames) are passed around as separate variables. It would be neater to factor these out into classes, so that a single object can be passed around (and tested).
When a benchmark session crashes, or when the system loses power, we often need to re-run only the work that was lost.
Currently this is quite a manual process. The experimenter has to manually determine which work needs to be run, run them, then manually merge the new data into the existing data.
It would be better if krun could help us do this.
I imagine something like the following:
partial results detected. Looks like we need to run: 4 executions of benchmark3 and 4 executions of benchmark 4, OK?
This need not be an interactive process; there could be a -check-results and a -resume option on the CLI.
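A sketch of what such a check might compute (purely hypothetical: it assumes the results file has a "data" dict mapping "bench:vm:variant" keys to one list of times per completed process execution, which may not match krun's actual layout):

```python
# Hypothetical sketch of a -check-results pass: compare the executions
# the config asks for against the executions present in the results,
# and report what still needs to run.
def missing_executions(config, results):
    missing = {}
    for bench in config["BENCHMARKS"]:
        for vm_name, vm in config["VMS"].items():
            for variant in vm["variants"]:
                key = "%s:%s:%s" % (bench, vm_name, variant)
                done = len(results.get("data", {}).get(key, []))
                left = config["N_EXECUTIONS"] - done
                if left > 0:
                    missing[key] = left
    return missing
```

The returned dict is exactly what the "partial results detected" prompt would need to print.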
We should make our best attempt to set the powersave governor after benchmarking is done.
Recommend using sudo to crank up/down the governor (if possible).
Discussed with @ltratt.
Currently the result filename is derived from the config filename, e.g.:
warmup.krun -> warmup_results.json
Tools that consume results use a function in krun to discover the json filename.
I think it would be better if the name also included a datestamp, e.g.:
warmup_20151008.bz2
This is really just an annoyance and isn't all that high priority.
This was originally reported by @cfbolz here (softdevteam/warmup_experiment#55):
The Ruby runner contains this code:
if /linux/ =~ RUBY_PLATFORM
  MONOTONIC_CLOCK = Process::CLOCK_MONOTONIC_RAW
else
  MONOTONIC_CLOCK = Process::CLOCK_MONOTONIC
end
However, on JRuby RUBY_PLATFORM is "java", thus the wrong timer is used. Here is a discussion of how to fix this: https://www.ruby-forum.com/topic/150942
As discussed with @snim2:
* Move util.run_shell_cmd() into the platform instance.
* Add a user argument which, if not None, uses sudo or doas to invoke the command as another user.
* Replace calls to os.system() and subprocess.Popen() with calls to the new method.

Dealing with warmup and drawing graphs are not the responsibilities of krun.
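A sketch of what the proposed method could look like (names and signature are illustrative, not krun's actual API):

```python
# Sketch: run a shell command, optionally as another user via sudo
# (or doas on OpenBSD). Returns stdout, stderr and the exit code.
import subprocess

def run_shell_cmd(cmd, user=None, su_tool="sudo"):
    if user is not None:
        # both sudo and doas accept -u to pick the target user
        cmd = [su_tool, "-u", user] + cmd
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out, err, proc.returncode
```

Centralising this in the platform instance would let each platform pick sudo vs doas once, instead of scattering that choice through the code.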
[2015-11-02 16:42:17: ERROR] I don't have support for your platform
Run benchmarks at the highest priority. Helps on platforms without CPU isolation.
The model is currently wrong: VM-specific details should be associated with VMS, not VARIANTS. Also, all VM-specific details should be hidden inside krun -- not bled out into the user config file.
Recommend renaming Variants to LangProfiles, and requiring each VM in VMS to have a 'profile' entry.
Hi,
I don't understand the current implementation of audits_same_platform:
def audits_same_platform(audit0, audit1):
    """Check whether two platform audits are from identical machines.

    A machine audit is a dictionary with the following keys:
      * cpuinfo (Linux only)
      * packages (Debian-based systems only)
      * debian_version (Debian-based systems only)
      * uname (all platforms)
      * dmesg (all platforms)

    Platform information may be Unicode.
    """
    for key in ["cpuinfo", "dmesg", "uname"]:
        if (not key in audit0) or (not key in audit1):
            return False
    if not audit0["uname"] == audit1["uname"]:
        return False
    return True
It just checks the existence of keys, and not the content of the audit. Is that intentional? On OpenBSD this always returns False, as cpuinfo is Linux specific. I think we need something platform-specific, like: if linux() ... elif OpenBSD ...
Thanks
As discussed with Chris Seaton, some VMs need checks prior to experimentation.
E.g. for Graal, we need to check java.vm.name for "JVMCI". Similarly, for JRuby we should check that Truffle.graal? is true.
From experience, it's easy to get these wrong.
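A hedged sketch of the Graal check: `java -XshowSettings:properties -version` prints the system properties (including java.vm.name) to stderr, so the parsing can be split out and tested without a JVM present. The function names here are illustrative.

```python
# Sketch: extract java.vm.name from `java -XshowSettings:properties
# -version` output and check it mentions JVMCI.
import re
import subprocess

def parse_vm_name(properties_output):
    match = re.search(r"java\.vm\.name\s*=\s*(.+)", properties_output)
    return match.group(1).strip() if match else None

def check_graal(java_bin="java"):
    proc = subprocess.run([java_bin, "-XshowSettings:properties", "-version"],
                          capture_output=True, text=True)
    name = parse_vm_name(proc.stderr)
    return name is not None and "JVMCI" in name
```

The JRuby/Truffle check would instead need to run a small Ruby snippet and inspect its output.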
As discussed with @ltratt, it should be possible to log all benchmarking output to a file and to (optionally) email errors to a configurable address.
As it stands the starting CPU temperature is not honored in reboot mode. We would need to store it somewhere and pick it up later. Probably in the results file.
CPU_GOV_FILE = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
Bencher3 has cpu1, cpu2, and cpu3 in addition. We should check these too.
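A sketch of the broader check: glob every CPU's scaling_governor file instead of hard-coding cpu0. The base path is a parameter so the check can be tested against a fake sysfs tree.

```python
# Sketch: collect the scaling governor of every online CPU from sysfs.
import glob
import os

def cpu_governors(sys_base="/sys/devices/system/cpu"):
    govs = {}
    pattern = os.path.join(sys_base, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            govs[path] = fh.read().strip()
    return govs
```

Krun could then refuse to start unless every value is "performance".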
In an email between Laurie and me:
I did find this, I wonder if we should contact this Arjan chap?:
https://plus.google.com/+TheodoreTso/posts/2vEekAsG2QT
It works but I'm unhappy with it:
Some of the issues regarding the former are due to the fact that the heap limit is not known when the VMDef is instantiated. I don't want the user to have to pass that in explicitly. I guess the proper fix is to make a proper config file format which is not evaluated by the python interpreter. Then after the file is parsed, the VMDefs could be instantiated passing in whatever we want.
To properly handle the latter we would need to provide krun with information about the environment. E.g. to properly update PATH without flattening the existing $PATH, krun would need to know that it is a colon separated collection of paths, and that to append another path, it should append ":%s" % another_path
These should be fixed one day perhaps, but we have bigger fish to fry at the moment.
Some OSs -- particularly Linux, it seems -- print continuous, voluminous, and not insightful output to their dmesgs. We need to add a blacklist feature so that known not-very-interesting things (e.g. segfaults) aren't reported to the user. I suggest that each platform has a default list of these things. For the time being, I think a crude mechanism is enough. Maybe in the future we might find that we want to allow users to add their own rules, but that's overkill for now.
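One crude shape the mechanism could take (the patterns below are illustrative, not a real default list): filter new dmesg lines through per-platform "boring" regexes before deciding whether to alert the user.

```python
# Sketch: drop dmesg lines matching a per-platform blacklist of
# known-uninteresting patterns; everything else is reported.
import re

BORING = [
    r"segfault at [0-9a-f]+",  # example pattern, assumed
]

def interesting_lines(dmesg_lines, boring=BORING):
    patterns = [re.compile(p) for p in boring]
    return [line for line in dmesg_lines
            if not any(p.search(line) for p in patterns)]
```

Letting users extend the list later would just mean concatenating their patterns onto the platform default.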
It is difficult to see what is in a results file. These CLI options would print the audit or config to stdout, saving the user the effort of loading the bzip/json.
Currently, isolcpus on Linux only blocks user processes from a CPU, not kernel threads.
There is a tool called cset which might allow us to do proper CPU isolation, kernel threads and all. For a while it has been broken on Linux. I reported a bug to Debian a while ago and they have finally responded saying it should be fixed:
Krun should check turbo mode is off. This can be checked in:
/sys/devices/system/cpu/intel_pstate/no_turbo
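A sketch of the check (intel_pstate systems only; a "1" in no_turbo means turbo boost is disabled). The path is parameterised so the check can be tested off-target:

```python
# Sketch: verify turbo boost is off via the intel_pstate sysfs knob.
# "1" in no_turbo means turbo is disabled.
NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo"

def turbo_disabled(path=NO_TURBO):
    with open(path) as fh:
        return fh.read().strip() == "1"
```

On non-Intel or non-pstate systems the file is absent, so the platform code would need a different (or no) check there.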
To prevent a benchmarking machine rebooting infinitely: ... an INFO message in the log.

As raised by @ltratt, we may need to use tickless mode on Linux:
http://lwn.net/Articles/549580/
Our Debian 8 systems currently run in idle tickless mode:
vext01@bencher3:~$ cat /boot/config-`uname -r` | grep HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set <--------------------- this would need to be on!
# CONFIG_NO_HZ is not set
CONFIG_RCU_FAST_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m
To fix this, we would need to build a custom kernel. Another impractical step unfortunately.
When a new Krun schedule is created, a krun.config.Config object is created, based on a configuration file on disk. The constructed Config object contains a number of VM definitions, based on the classes in krun.vm_defs. Each of these objects needs to know about the platform on which it is running, so each VM object has a set_platform() method which needs to be called after the VM has been constructed.
This creates an awkward dependency between all the objects that have to be created to run a schedule. Initially, the krun.py file dealt with this in main(), meaning that client code (including unit tests) had to explicitly manage object initialisation in the correct order.
PR #111 moved this code into the constructor of krun.scheduler.Scheduler, so that anyone creating a full scheduler can forget about object initialisation. Other client code (including some unit tests) will still need to call set_platform() by hand. This is not particularly clean.
At the very least, these dependencies should be clearly documented.
Krun is becoming complicated. We should document it.