iu-parfunc / hsbencher
General benchmarking framework. Especially good at parameter studies.
In the Config datatype: I can't see why these are separate currently. It should be possible to get plugIns from plugInConfs.
The Problem:
```
--------------------------------------------------------------------------------
Running Config 29 of 48: ScanBench/Scan.cabal s2 256 4096
[]
--------------------------------------------------------------------------------
run_benchmarks.exe: getCurrentDirectory: does not exist (No such file or directory)
Build step 'Execute shell' marked build as failure
Finished: FAILURE
```
How can that be, given that the previous 28 configs were OK?
This is a totally silly limitation -- because of URL length limitations we are limited in how much data we can upload to each row of our fusion table. I've disabled certain columns of our benchmarking schema to work around this.
Fixing it will require either using the SQL bulk row upload API call, or using the general-purpose mechanism I heard Google allows in all its APIs for spilling long URLs into a POST body (@craigcitro - what was that called??).
Related to #13
It's very clear that continuing to add builtin fields like "JITTIME" is a losing game. It probably is a good idea to have some "core" fields in the benchmark data schema. However, for the more obscure ones, we really need the ability to customize these. That is, to add a field, extract it from the benchmark run in a custom way, and upload it into the benchmark database.
They serve similar purposes, but to be able to use LineHarvesters well and extensibly (cc #13, #24), we need these two datatypes to be combined.
It was a historical accident that they are separate. RunResult evolved out of the code for measuring processes, whereas BenchmarkResult came from the fusion table code for uploading.
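As a concrete illustration of the kind of extensibility we want, here is a minimal sketch of a per-line harvester for a custom tag such as JITTIME. The CustomFields type and harvestTag function are hypothetical stand-ins for this sketch, not the current hsbencher API:

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Read (readMaybe)

-- Hypothetical stand-in for a bag of user-defined fields on the result.
type CustomFields = [(String, Double)]

-- Turn one line of benchmark output such as "JITTIME: 3.45" into an update
-- on the custom fields, plus a flag saying whether the line matched.
harvestTag :: String -> B.ByteString -> (CustomFields -> CustomFields, Bool)
harvestTag tag line =
  case B.words line of
    [key, val]
      | B.unpack key == tag ++ ":"
      , Just d <- readMaybe (B.unpack val)
      -> (((tag, d) :), True)
    _ -> (id, False)
```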
Right now (583ebdb), any command line options that are not recognized are simply ignored. At the very least, a warning should be printed.
Currently we manually go in for each new table and change columns from type "Text" to "Number", which unlocks a bunch of different functionality in the fusion table.
We could do this automatically when adding missing columns to a fusion table. Both for the core, builtin schema and for custom tags, we know which are numbers and which are strings.
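A minimal sketch of that idea, assuming the FusionTables type strings NUMBER/STRING and using illustrative column names rather than the exact schema:

```haskell
-- Sketch only: pick a FusionTables column type from what we already know
-- about a column. The column names here are illustrative, not the exact schema.
fusionColumnType :: String -> String
fusionColumnType col
  | col `elem` numericCols = "NUMBER"
  | otherwise              = "STRING"
  where
    numericCols = ["MINTIME", "MEDIANTIME", "MAXTIME", "THREADS", "TRIALS", "JITTIME"]
```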
and then it needs to get a MySQL one at some point.
As of Criterion 1.0, it uses a linear regression methodology to generate an estimate of the expected marginal cost of running a benchmark just one more time.
A core type in Criterion is the Benchmarkable data structure, which has an exposed constructor and is of the form:

    data Benchmarkable = Benchmarkable (Int64 -> IO ())

Usually the user doesn't construct one of these objects directly; rather, criterion constructs the Int64 -> IO () function, which simply takes a number and runs an IO action N times. However, there are useful applications of constructing this function directly:

    \n -> do init; realStuff n
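For instance, here is a self-contained sketch of that shape; setUpOnce and realStuff are hypothetical placeholders for a one-time initialization and the real per-iteration work, and the Benchmarkable stand-in simply mirrors the criterion-1.0 definition quoted above:

```haskell
import Data.Int (Int64)
import Control.Monad (replicateM_)

-- Local stand-in mirroring the criterion-1.0 type quoted above, so this
-- sketch is self-contained; real code would use criterion's own type.
data Benchmarkable = Benchmarkable (Int64 -> IO ())

-- Hypothetical placeholders for the one-time setup and the real work.
setUpOnce :: IO ()
setUpOnce = putStrLn "expensive one-time initialization"

realStuff :: IO ()
realStuff = return ()

-- Pay the setup cost once per measurement batch, then run the work N times.
amortizedBench :: Benchmarkable
amortizedBench = Benchmarkable $ \n -> do
  setUpOnce
  replicateM_ (fromIntegral n) realStuff
```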
One example is the Par monad. Doing a runPar sets up an execution environment. This is an example of a one-time cost that can be amortized over a loop that runs /inside/ the Par monad. The reason I'm mentioning this here is that HSBencher/Criterion integration would make sense, and could take two distinct forms:
In the second case, it's not clear to me whether the benchmarks should run in the same process as the benchmark harness or not. Opinions on that welcome.
This has been disabled for a while, since the big switch to the new method of benchmark specification. The code is still there, commented out, in App.hs.
This field can go away. These arguments should be subsumed by a parameter setting (RuntimeArg). The current mkBenchmark function can be tweaked to remain backwards compatible: specifically, if passed arguments, it can And RuntimeArg settings into the config space to add them.
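A rough sketch of that idea, using simplified stand-ins for the real configuration-space types (the actual hsbencher constructors and "meaning" parameters may differ):

```haskell
-- Simplified stand-ins for the real configuration-space types.
data ParamSetting = RuntimeArg String          -- one extra runtime argument
data BenchSpace m = And [BenchSpace m]
                  | Or  [BenchSpace m]
                  | Set m ParamSetting
data NoMeaning    = NoMeaning

-- If explicit command-line args are passed, And them into the config space
-- as RuntimeArg settings so old call sites keep working.
argsIntoSpace :: [String] -> BenchSpace NoMeaning -> BenchSpace NoMeaning
argsIntoSpace args space =
  And (space : [ Set NoMeaning (RuntimeArg a) | a <- args ])
```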
It currently doesn't time out at all, which is hitting us on ConcurrentCilk benchmarks. The problem is the interface to RunInPlace, which gives it a tiny peephole view of what it needs to run. Either BuildMethod{compile} or RunInPlace really needs to get the full Config object to be able to read global configuration information.
Right now it's putting the columns in alphabetical order.
It's much nicer to read if they are prioritized. Perhaps we can change this at the point where HSBencher creates the extra columns when it is given a new table to work on.
Incidentally, reordering them manually is HORRIBLE because the fusion table website has a very silly UI for it. (You have to chase each field, pressing the up or down arrow repeatedly to move it... sorting the whole thing requires O(N^2) clicks.)
With the retry functionality in place, it's important to be able to tell from the recorded data-set that a retry occurred. This can be useful, for example, in post-facto detective work -- when software is flaky, are retries quantitatively more frequent with version A than with version B?
In particular, one message like this sneaks through at the end of each benchmark RUN:
[fusiontable] Computed schema, no custom fields.
That's it. No subsequent messages. And yet the code goes on without conditionals at this point, so it should print other messages, unless it's throwing an exception. It must be throwing an exception. Also, I confirmed that we are indeed removing it from allplugs in App.hs. So how is the fusion plugin object remaining accessible?
It'd be good to set up Travis CI to automatically build HSBencher on a clean machine after every push, just to make sure all the dependencies are available and all that.
Travis seems to have support for Haskell already: http://docs.travis-ci.com/user/languages/haskell/
The user is only supposed to import HSBencher, nothing else.
It's distracting to have too many modules listed on the haddock docs.
The annoying thing is that once you've (statically) linked in the Fusion plugin, then there's no way to turn off fusion upload. This flag would enable that.
Note: when this issue is closed, also update the wiki page.
The included methods, like cabal, should not care whether we make the benchmark target ./foo/bar or ./foo/bar/. But the cabal method seems not to recognize the former as a directory and find the .cabal inside it.
What this will entail is a notion of LineHarvester that gets to modify the full BenchmarkResult, not just the RunResult.
The resulting model is a little weird, but will work. Lines of output from the benchmark essentially mutate the benchmark result. Then each ARGS_AND_SELFTIMED tag is like calling "fork": it finalizes the existing state (BenchmarkResult) and begins a fresh one. So, basically, any general metadata (like JITTIME) should be spit out before starting the multiple benchmarks within a run.
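To make that model concrete, here is a small sketch using hypothetical stand-in types; the tag formats are simplified (the last whitespace-separated word of a tagged line is read as its value):

```haskell
import Data.List (isPrefixOf)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the per-run result record.
data Result = Result { selftimed :: Maybe Double, jittime :: Maybe Double }
  deriving Show

emptyResult :: Result
emptyResult = Result Nothing Nothing

-- Each output line mutates the current result; an ARGS_AND_SELFTIMED line
-- finalizes it ("fork") and begins a fresh one, carrying over metadata
-- (here, JITTIME) that was emitted before the runs started.
splitRuns :: [String] -> [Result]
splitRuns = go emptyResult
  where
    go _ [] = []                -- results are emitted only at fork lines
    go acc (ln:rest)
      | "ARGS_AND_SELFTIMED" `isPrefixOf` ln =
          acc { selftimed = lastField ln } : go (acc { selftimed = Nothing }) rest
      | "JITTIME" `isPrefixOf` ln =
          go (acc { jittime = lastField ln }) rest
      | otherwise = go acc rest
    -- Simplification: read the last whitespace-separated word as the value.
    lastField = readMaybe . last . words
```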
It is still kinda useful as a dead simple output format that can easily be parsed by other programs. I think it should work fine as a Plugin, just like Dribble or Fusion.
This is one step towards cleaning up the (old and crufty) main application logic in App.hs.
These should be put into some separate ConfigInternal datatype.
This appears, for example, in Utils where we launch echoStream.
I've become accustomed to using this option to list which benchmarks are available (by name), and thus see what patterns to type to activate the desired benchmarks.
This is for dynaprof/microbenchmarks... we actually use the CompileParam setting for the initOverhead benchmark.
This is a field of the CommandDescr data structure, but we need a global configuration option, set by a command-line flag, to actually turn it on.
(And then we need to update the Accelerate/Cilk benchmarks to actually use this flag... because that's where we were originally seeing segfaults on process shutdown AFTER the program was complete. Those were the segfaults we wanted to tolerate for now.)
On the machine at Chalmers google API calls are frequently timing out. This might need yet another hack (I've lost count) to dump the data and then upload it to FusionTables from another machine.
This would have the advantage of needing fewer API calls, being instead able to upload many rows together.
Things like stderr vs. stdout, benchmark harness vs. subprocess output, and delimiters such as ----------------- are all good candidates for color-level distinctions.
That is, it should run through all benchmarks, but it should set the final exit code based on whether any of them failed.
This would be a Maybe BuildMethod override on the general process for configuring build methods. Setting it to Just would demand that that benchmark use that specific build method.
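A minimal sketch of what such a field might look like, with hypothetical stand-ins for the Benchmark and BuildMethod types (the real ones carry much more information):

```haskell
-- Hypothetical stand-ins; the real hsbencher types carry much more information.
data BuildMethod = BuildMethod { methodName :: String }

data Benchmark = Benchmark
  { target         :: FilePath
  , overrideMethod :: Maybe BuildMethod
      -- ^ Nothing: search the configured build methods as usual.
      --   Just m:  demand that this benchmark use that specific method.
  }
```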
This was requested by @peter-fogg.
CodeSpeed has a different hierarchy of concepts. For the code speed plugin/uploader, I propose:
One alternative would be to move THREADS into the Executable name (e.g. make it (VARIANT,THREADS)). This would bloat that category but might be a good idea anyway.
This is currently an asymmetry. Further, passing multiple compile-time variables is more user-friendly than packing them into the single COMPILE_ARGS parameter when using the Makefile method.
This is a follow-on to #28, and should be easy to do.
We have situations like this build:
http://tester-lin.soic.indiana.edu:8080/view/ParfuncBenchmarks/job/benchmark_ConcurrentCilk/219/
Where it ran and generated data, but we had forgotten to create the auth token for that particular google API clientid and the data didn't upload to the fusion table. It would be nice to be able to quickly:
Now, the really nice piece of functionality would be duplicate-suppression; in that case we would have an easy way of making sure we haven't missed data by just piping in the entire dribble file. That's a longer-term project, and it probably wouldn't work too well with fusion tables, because if we had to do per-tuple checks it would exhaust the API quota quickly.
There can be a global flag to change this, but by default, especially when there are multiple plugins, we don't want to stop the train when one has failed.
If someone wants to use hsbencher for data management, but the application is suited to Criterion, we should support that use case.
Currently it's the getTableId function (in Fusion.hs) that actually populates the extra columns.
That functionality should be factored out and put elsewhere. Then it should be used even when the table is identified by its unique ID, not by its name.
Because we use --bindir and a single cabal command to build/install a target benchmark, we have a problem if it installs upstream dependencies with executable targets themselves. The place this came up recently was our BoolVarTree benchmark, which depends on cpphs.
One example of this bug can be found in this log.
This protocol of tagged lines is growing kind of large, and it should be abstracted.
Also, as a preemptive measure we should probably add even more fields to the benchmark result schema so as to have room for people to shoehorn in weird stuff. Either that or make the schema itself extensible.
The getConfig function in Config.hs already digs around the system to find default settings. Might as well go all the way and have the option of a textual config file following usual Unix conventions.
Right now it is hard coded to .cabal/.
Any valid file path (relative or absolute) should be acceptable there.
We interpret command line args to an hsbencher executable as filtering the space of benchmarks to run. But right now it is limited to the target field.
In particular, there is important info in the variant and the progname. We should check each of these, possibly in this order:
Note that the variant is not contained in the Benchmark; it is part of the param space. This is an expanded notion of filtering -- not just filtering the list of benchmarks, but filtering subsets of their param spaces.
Along with that, we should move away from using the Benchmark constructor directly, in favor of various constructor functions (e.g. mkBenchmark).
In testing out the Codespeed plugin, I need to pass a plugin-specific config in. I'm doing so, but it's getting set back to the default setting (defaultPlugConf).
After ~50 runs, 3 trials each, eventually all subprocesses exit with an error code but NO output is produced.
Is this an exception on a child thread that isn't being caught, causing part of the infrastructure to go down? Or is it some kind of resource leak? #31 addresses the former.