softdevteam / krun
High fidelity benchmark runner
Home Page: http://soft-dev.org/src/krun/
License: Other
...
[2015-11-27 13:12:39 INFO] Done: Results dumped to warmup_results.json.bz2
[2015-11-27 13:12:39 INFO] Completed in (roughly) 192.660389 seconds.
Log file at: warmup_20151127_130926.log
$ ls warmup_20151127_130926.log
ls: cannot access warmup_20151127_130926.log: No such file or directory
$ ls | grep -e '\.log$'
warmup_20151127_112600.log
warmup_20151127_112753.log
warmup_20151127_112842.log
warmup_20151127_130825.log
warmup_20151127_130920.log
This issue came from the discussion on Issue #41 (from @vext01).
Currently it is only possible to re-run a subset of a whole benchmark by deleting entries from the json bz2 file. This is quite fiddly, as you have to uncompress the file and then locate the right lines to remove in a results file with potentially tens of thousands of lines, so I wonder if we could write a tiny tool to remove results by key.
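Such a tool could be quite short. A minimal sketch, assuming the results file is a bz2-compressed JSON document whose "data" field maps "bench:vm:variant" strings to result lists (the real layout may differ):

```python
# Hypothetical helper: drop one result key from a krun results file.
# Assumes a bz2-compressed JSON document with a "data" dict keyed by
# "bench:vm:variant" strings -- an assumption, not krun's documented format.
import bz2
import json

def remove_result_key(path, key):
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        results = json.load(fh)
    results["data"].pop(key, None)  # silently ignore a missing key
    with bz2.open(path, "wt", encoding="utf-8") as fh:
        json.dump(results, fh)
```

This avoids manual decompression and line hunting entirely.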
I was inspecting bencher3 prior to running benchmarks. I've found that if you put a breakpoint in the Linux implementation of check_cpu_throttled(), it is never hit. Consequently, benchmarking is allowed to continue in powersave mode, which is bad.
Avoids the need for an SMTP server that listens on a port. From @ltratt.
Two keys which don't yet have dump switches.
As discussed with @snim2, the iterations results are not enough to reconstruct a useful ETA estimate if a benchmark crashes (as the times in this case will be an empty list).
We store the rough ETA times in the results file.
Reboot the system after every execution. Will require a systemd hook.
There's a bug in krun where it crashes when it tries to run a VM with a bad path. Rather than just reporting this bug via an error email, let's check the VM paths up-front. Same for the benchmark entry points. This means we don't get random crashes in the middle of experimentation.
Currently we download a random version of mx (whatever is the latest in the repository). We should instead use a fixed version (i.e. a specific checkout).
[I'd prefer we don't fix the version yet, as I'm hoping to fix mx for OpenBSD.]
I think mk_graphs.py has fallen behind the changes in the rest of the repo. Running mk_graphs.py on a cut-down version of the examples config I get this:
$ ./mk_graphs.py examples/quick.krun
Traceback (most recent call last):
File "./mk_graphs.py", line 296, in <module>
data_dct = json.load(fh)
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Config file needed to reproduce:
import os
from krun.vm_defs import (PythonVMDef, JavaVMDef)
from krun import EntryPoint

# Who to mail
MAIL_TO = []

# Maximum number of error emails to send per-run
#MAX_MAILS = 2

DIR = os.getcwd()
JKRUNTIME_DIR = os.path.join(DIR, "krun", "libkruntime", "")

HEAP_LIMIT = 2097152  # KiB == 2GiB

# Variant name -> EntryPoint
VARIANTS = {
    "default-java": EntryPoint("KrunEntry", subdir="java"),
    "default-python": EntryPoint("bench.py", subdir="python"),
}

ITERATIONS_ALL_VMS = 1  # Small number for testing.

VMS = {
    'Java': {
        'vm_def': JavaVMDef('/usr/bin/java'),
        'variants': ['default-java'],
        'n_iterations': ITERATIONS_ALL_VMS,
    },
    'CPython': {
        'vm_def': PythonVMDef('/usr/bin/python2'),
        'variants': ['default-python'],
        'n_iterations': ITERATIONS_ALL_VMS,
    }
}

BENCHMARKS = {
    'dummy': 1000,
    'nbody': 1000,
}

# list of "bench:vm:variant"
SKIP = [
    #"*:CPython:*",
    #"*:Java:*",
]

N_EXECUTIONS = 1  # Number of fresh processes.
Benchmarks as in the examples/ directory.
Related to Issue #48
Explain what "developer mode" means in the CLI help.
There is a flag in krun that indicates whether an error occurred. This is currently not preserved across runs.
This flag is also used to determine whether we should print dmesg changes at the end. I would argue this is no longer needed. It should be sufficient to mail out and log the dmesg changes as they occur.
When using resume mode, Krun performs a rudimentary check to determine whether the benchmark is being resumed on the same machine that it was started on. Currently, this check only verifies that the uname of the two machines is identical.
This test should differ between platforms, and it would make sense to refactor it into the classes in platform.py. This may happen as part of #75.
For Linux, we could use a strong check such as blkid, which returns a unique ID of the root file system. I guess someone could have installed more RAM between resumes, so this doesn't catch every case, but anyhow @ltratt felt this was overkill and suggested we use hostname, which is available on most (all?) Unix-like platforms.
A more detailed fix would check every value in the audit dictionary. However, platform.audit() returns a str, but the audit in the JSON file is a unicode. On my machine I have packages with names that contain acutes and other accents, and a == between "this" platform and the previous one always returned False. The Internet contains a wide variety of solutions to this problem, and I didn't get any of them to quite work: hence the current hack.
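One possible shape for a fix (a sketch, not krun's actual code): recursively normalise byte strings to text before comparing, so a str-vs-unicode mismatch (bytes-vs-str in Python 3 terms) cannot make == spuriously False. The helper names here are illustrative.

```python
# Sketch: decode byte strings to text throughout a nested structure so
# that two audits differing only in str/unicode type compare equal.
def normalise(value, encoding="utf-8"):
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    if isinstance(value, dict):
        return {normalise(k): normalise(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalise(v) for v in value]
    return value

def audits_equal(audit0, audit1):
    return normalise(audit0) == normalise(audit1)
```

Using errors="replace" keeps the comparison total even for package names with undecodable bytes.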
As discussed with @snim2, it should not be hard to ensure that fatal() sends email.
We need a switch which makes it easier to develop on a personal laptop. This switch -- say --develop -- turns off all of the checks for system prerequisites, e.g. governor, tickless, ...
Laurie has expressed interest in running benchmarks on OpenBSD.
This is predicated on us getting a useful subset of the VMs running on OpenBSD.
The iteration runners print status messages to this effect:
iteration 1/2000
iteration 2/2000
...
However, due to stderr buffering, these messages are all printed at once when the execution ends. It would be nice if we could have these messages printed as they arrive.
ETAs in reboot mode are currently a little optimistic due to the fact that they don't account for the sleeps that allow the system to come up.
We should turn off address space layout randomisation on platforms that support it.
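On Linux, system-wide ASLR is controlled by /proc/sys/kernel/randomize_va_space (0 = off). A sketch, with the control file path parameterised so it can be exercised without root:

```python
# Sketch, Linux only: writing "0" to randomize_va_space disables
# address space layout randomisation system-wide (requires root on a
# real system; the path is a parameter for testability).
ASLR_CTL = "/proc/sys/kernel/randomize_va_space"

def disable_aslr(ctl_path=ASLR_CTL):
    with open(ctl_path, "w") as fh:
        fh.write("0\n")
```

Other platforms would need their own mechanism (or a check that ASLR is unsupported).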
Run benchmarks as a new user, i.e. '_krun'. Make sure the user has a clean home dir and shell environment.
Have krun(?) reinstall the machine, install the packages, build the VMs, and run the benchmarks.
We have a script to estimate how long a whole benchmark run would take. This would be a good piece of extra documentation to add to examples/.
In Richards one iteration is really fast, compared to other benchmarks. This leads to some weird effects and should be fixed in some way.
This is from the discussion on PR #74
Data from krun configuration files (and results files) is loaded when krun.py starts, and is used in various parts of the code. Usually parts of the configuration (such as filenames) are passed around as separate variables. It would be neater to factor these out into classes, so that a single object can be passed around (and tested).
When a benchmark session crashes, or when the system loses power, we often need to re-run only the work that was lost.
Currently this is quite a manual process. The experimenter has to manually determine which work needs to be run, run them, then manually merge the new data into the existing data.
It would be better if krun could help us do this.
I imagine something like the following:
partial results detected. Looks like we need to run: 4 executions of benchmark3 and 4 executions of benchmark 4, OK?
This need not be an interactive process; there could be a -check-results and a -resume option on the CLI.
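A sketch of what such a check might compute (purely hypothetical: it assumes the results file has a "data" dict mapping "bench:vm:variant" keys to one list of times per completed process execution, which may not match krun's actual layout):

```python
# Hypothetical sketch of a -check-results pass: compare the executions
# the config asks for against the executions present in the results,
# and report what still needs to run.
def missing_executions(config, results):
    missing = {}
    for bench in config["BENCHMARKS"]:
        for vm_name, vm in config["VMS"].items():
            for variant in vm["variants"]:
                key = "%s:%s:%s" % (bench, vm_name, variant)
                done = len(results.get("data", {}).get(key, []))
                left = config["N_EXECUTIONS"] - done
                if left > 0:
                    missing[key] = left
    return missing
```

The returned dict is exactly what the "partial results detected" prompt would need to print.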
We should make our best attempt to set the powersave governor after benchmarking is done.
Recommend using sudo to crank up/down the governor (if possible).
Discussed with @ltratt.
Currently the result filename is derived from the config filename, e.g.:
warmup.krun -> warmup_results.json
Tools that consume results use a function in krun to discover the json filename.
I think it would be better if the name also included a datestamp, e.g.:
warmup_20151008.bz2
This is really just an annoyance and isn't all that high priority.
This was originally reported by @cfbolz here (softdevteam/warmup_experiment#55):
The Ruby runner contains this code:
if /linux/ =~ RUBY_PLATFORM
  MONOTONIC_CLOCK = Process::CLOCK_MONOTONIC_RAW
else
  MONOTONIC_CLOCK = Process::CLOCK_MONOTONIC
end
However, on JRuby RUBY_PLATFORM is "java", thus the wrong timer is used. Here is a discussion of how to fix this: https://www.ruby-forum.com/topic/150942
As discussed with @snim2:
* Move util.run_shell_cmd() into the platform instance.
* Add a user argument which, if not None, uses sudo or doas to invoke the command as another user.
* Replace calls to os.system() and subprocess.Popen() with calls to the new method.

Dealing with warmup and drawing graphs are not the responsibilities of krun.
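A sketch of what the proposed method could look like (names and signature are illustrative, not krun's actual API):

```python
# Sketch: run a shell command, optionally as another user via sudo
# (or doas on OpenBSD). Returns stdout, stderr and the exit code.
import subprocess

def run_shell_cmd(cmd, user=None, su_tool="sudo"):
    if user is not None:
        # both sudo and doas accept -u to pick the target user
        cmd = [su_tool, "-u", user] + cmd
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return out, err, proc.returncode
```

Centralising this in the platform instance would let each platform pick sudo vs doas once, instead of scattering that choice through the code.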
[2015-11-02 16:42:17: ERROR] I don't have support for your platform
Run benchmarks at the highest priority. Helps on platforms without CPU isolation.
The model is currently wrong: VM-specific details should be associated with VMS, not VARIANTS. Also, all VM-specific details should be hidden inside krun -- not bled out into the user config file.
Recommend renaming Variants to LangProfiles, and requiring each VM in VMS to have a 'profile' entry.
Hi,
I don't understand the current implementation of audits_same_platform:
def audits_same_platform(audit0, audit1):
    """Check whether two platform audits are from identical machines.

    A machine audit is a dictionary with the following keys:
      * cpuinfo (Linux only)
      * packages (Debian-based systems only)
      * debian_version (Debian-based systems only)
      * uname (all platforms)
      * dmesg (all platforms)

    Platform information may be Unicode.
    """
    for key in ["cpuinfo", "dmesg", "uname"]:
        if (not key in audit0) or (not key in audit1):
            return False
    if not audit0["uname"] == audit1["uname"]:
        return False
    return True
It just checks the existence of keys, and not the content of the audit. Is that intentional? On OpenBSD this always returns False, as cpuinfo is Linux specific. I think we need something platform-specific, like: if linux() ... elif OpenBSD ...
Thanks
As discussed with Chris Seaton, some VMs need checks prior to experimentation.
E.g. for Graal, we need to check java.vm.name for "JVMCI". Similarly, for JRuby we should check that Truffle.graal? is true.
From experience, it's easy to get these wrong.
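A hedged sketch of the Graal check: `java -XshowSettings:properties -version` prints the system properties (including java.vm.name) to stderr, so the parsing can be split out and tested without a JVM present. The function names here are illustrative.

```python
# Sketch: extract java.vm.name from `java -XshowSettings:properties
# -version` output and check it mentions JVMCI.
import re
import subprocess

def parse_vm_name(properties_output):
    match = re.search(r"java\.vm\.name\s*=\s*(.+)", properties_output)
    return match.group(1).strip() if match else None

def check_graal(java_bin="java"):
    proc = subprocess.run([java_bin, "-XshowSettings:properties", "-version"],
                          capture_output=True, text=True)
    name = parse_vm_name(proc.stderr)
    return name is not None and "JVMCI" in name
```

The JRuby/Truffle check would instead need to run a small Ruby snippet and inspect its output.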
As discussed with @ltratt, it should be possible to log all benchmarking output to a file and to (optionally) email errors to a configurable address.
As it stands the starting CPU temperature is not honored in reboot mode. We would need to store it somewhere and pick it up later. Probably in the results file.
CPU_GOV_FILE = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
Bencher3 has cpu1, cpu2, and cpu3 in addition. We should check these too.
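A sketch of the broader check: glob every CPU's scaling_governor file instead of hard-coding cpu0. The base path is a parameter so the check can be tested against a fake sysfs tree.

```python
# Sketch: collect the scaling governor of every online CPU from sysfs.
import glob
import os

def cpu_governors(sys_base="/sys/devices/system/cpu"):
    govs = {}
    pattern = os.path.join(sys_base, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            govs[path] = fh.read().strip()
    return govs
```

Krun could then refuse to start unless every value is "performance".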
In an email between Laurie and me:
I did find this, I wonder if we should contact this Arjan chap?:
https://plus.google.com/+TheodoreTso/posts/2vEekAsG2QT
It works but I'm unhappy with it:
Some of the issues regarding the former are due to the fact that the heap limit is not known when the VMDef is instantiated. I don't want the user to have to pass that in explicitly. I guess the proper fix is to make a proper config file format which is not evaluated by the python interpreter. Then after the file is parsed, the VMDefs could be instantiated passing in whatever we want.
To properly handle the latter we would need to provide krun with information about the environment. E.g. to properly update PATH without flattening the existing $PATH, krun would need to know that it is a colon separated collection of paths, and that to append another path, it should append ":%s" % another_path
These should be fixed one day perhaps, but we have bigger fish to fry at the moment.
Some OSs -- particularly Linux, it seems -- print continuous, voluminous, and not insightful output to their dmesgs. We need to add a blacklist feature so that known not-very-interesting things (e.g. segfaults) aren't reported to the user. I suggest that each platform has a default list of these things. For the time being, I think a crude mechanism is enough. Maybe in the future we might find that we want to allow users to add their own rules, but that's overkill for now.
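One crude shape the mechanism could take (the patterns below are illustrative, not a real default list): filter new dmesg lines through per-platform "boring" regexes before deciding whether to alert the user.

```python
# Sketch: drop dmesg lines matching a per-platform blacklist of
# known-uninteresting patterns; everything else is reported.
import re

BORING = [
    r"segfault at [0-9a-f]+",  # example pattern, assumed
]

def interesting_lines(dmesg_lines, boring=BORING):
    patterns = [re.compile(p) for p in boring]
    return [line for line in dmesg_lines
            if not any(p.search(line) for p in patterns)]
```

Letting users extend the list later would just mean concatenating their patterns onto the platform default.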
It is difficult to see what is in a results file. These CLI options would print the audit or config to stdout, saving the user the effort of loading the bzip/json.
Currently, isolcpus on Linux only blocks user processes from a CPU, not kernel threads.
There is a tool called cset which might allow us to do proper CPU isolation, kernel threads and all. For a while it has been broken on Linux. I reported a bug to Debian a while ago and they have finally responded saying it should be fixed:
Krun should check turbo mode is off. This can be checked in:
/sys/devices/system/cpu/intel_pstate/no_turbo
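A sketch of the check (intel_pstate systems only; a "1" in no_turbo means turbo boost is disabled). The path is parameterised so the check can be tested off-target:

```python
# Sketch: verify turbo boost is off via the intel_pstate sysfs knob.
# "1" in no_turbo means turbo is disabled.
NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo"

def turbo_disabled(path=NO_TURBO):
    with open(path) as fh:
        return fh.read().strip() == "1"
```

On non-Intel or non-pstate systems the file is absent, so the platform code would need a different (or no) check there.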
To prevent a benchmarking machine rebooting infinitely: ... an INFO message in the log.

As raised by @ltratt, we may need to use tickless mode on Linux:
http://lwn.net/Articles/549580/
Our Debian 8 systems currently run in idle tickless mode:
vext01@bencher3:~$ cat /boot/config-`uname -r` | grep HZ
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set <--------------------- this would need to be on!
# CONFIG_NO_HZ is not set
CONFIG_RCU_FAST_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m
To fix this, we would need to build a custom kernel. Another impractical step unfortunately.
When a new Krun schedule is created, a krun.config.Config object is created, based on a configuration file on disk. The constructed Config object contains a number of VM definitions, based on the classes in krun.vm_defs. Each of these objects needs to know about the platform on which it is running, so each VM object has a set_platform() method which needs to be called after the VM has been constructed.
This creates an awkward dependency between all the objects that have to be created to run a schedule. Initially, the krun.py file dealt with this in main(), meaning that client code (including unit tests) had to explicitly manage object initialisation in the correct order.
PR #111 moved this code into the constructor of krun.scheduler.Scheduler, so that anyone creating a full scheduler can forget about object initialisation. Other client code (including some unit tests) will still need to call set_platform() by hand. This is not particularly clean.
At the very least, these dependencies should be clearly documented.
Krun is becoming complicated. We should document it.