iu-parfunc / hsbencher
General benchmarking framework. Especially good at parameter studies.
In the Config datatype: I can't see why these are separate currently. It should be possible to get plugIns from plugInConfs.
The Problem:
```
--------------------------------------------------------------------------------
Running Config 29 of 48: ScanBench/Scan.cabal s2 256 4096
[]
--------------------------------------------------------------------------------
run_benchmarks.exe: getCurrentDirectory: does not exist (No such file or directory)
Build step 'Execute shell' marked build as failure
Finished: FAILURE
```
How can that be, given that the previous 28 configs were OK?
This is a totally silly limitation -- because of URL length limitations we are limited in how much data we can upload to each row of our fusion table. I've disabled certain columns of our benchmarking schema to work around this.
Fixing it will require either using the SQL bulk row upload API call, or using the general-purpose mechanism I heard Google allows in all its APIs for spilling long URLs into a POST body (@craigcitro - what was that called??).
Related to #13
It's very clear that continuing to add builtin fields like "JITTIME" is a losing game. It probably is a good idea to have some "core" fields in the benchmark data schema. However, for the more obscure ones, we really need the ability to customize these. That is, to add a field, extract it from the benchmark run in a custom way, and upload it into the benchmark database.
They serve similar purposes, but to be able to use LineHarvesters well and extensibly (cc #13, #24), we need these two datatypes to be combined.
It was a historical accident that they are separate. RunResult evolved out of the code for measuring processes, whereas BenchmarkResult came from the fusion table code for uploading.
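As a concrete illustration of the kind of extensibility we want, here is a minimal sketch of a per-line harvester for a custom tag such as JITTIME. The CustomFields type and harvestTag function are hypothetical stand-ins for this sketch, not the current hsbencher API:

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Read (readMaybe)

-- Hypothetical stand-in for a bag of user-defined fields on the result.
type CustomFields = [(String, Double)]

-- Turn one line of benchmark output such as "JITTIME: 3.45" into an update
-- on the custom fields, plus a flag saying whether the line matched.
harvestTag :: String -> B.ByteString -> (CustomFields -> CustomFields, Bool)
harvestTag tag line =
  case B.words line of
    [key, val]
      | B.unpack key == tag ++ ":"
      , Just d <- readMaybe (B.unpack val)
      -> (((tag, d) :), True)
    _ -> (id, False)
```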
Right now (583ebdb), any command line options that are not recognized are simply ignored. At the very least, a warning should be printed.
Currently we manually go in for each new table and change columns from type "Text" to "Number", which unlocks a bunch of different functionality in the fusion table.
We could do this automatically when adding missing columns to a fusion table. Both for the core, builtin schema and for custom tags, we know which are numbers and which are strings.
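A minimal sketch of that idea, assuming the FusionTables type strings NUMBER/STRING and using illustrative column names rather than the exact schema:

```haskell
-- Sketch only: pick a FusionTables column type from what we already know
-- about a column. The column names here are illustrative, not the exact schema.
fusionColumnType :: String -> String
fusionColumnType col
  | col `elem` numericCols = "NUMBER"
  | otherwise              = "STRING"
  where
    numericCols = ["MINTIME", "MEDIANTIME", "MAXTIME", "THREADS", "TRIALS", "JITTIME"]
```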
and then it needs to get a MySQL one at some point.
As of Criterion 1.0, it uses a linear regression methodology to generate an estimate of the expected marginal cost of running a benchmark just one more time.
A core type in Criterion is the Benchmarkable data structure, which has an exposed constructor and is of the form:

    data Benchmarkable = Benchmarkable (Int64 -> IO ())

Usually the user doesn't construct one of these objects directly; rather, criterion constructs the Int64 -> IO () function, which simply takes a number and runs an IO action N times. However, there are useful applications of constructing this function directly:

    \n -> do init; realStuff n
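For instance, here is a self-contained sketch of that shape; setUpOnce and realStuff are hypothetical placeholders for a one-time initialization and the real per-iteration work, and the Benchmarkable stand-in simply mirrors the criterion-1.0 definition quoted above:

```haskell
import Data.Int (Int64)
import Control.Monad (replicateM_)

-- Local stand-in mirroring the criterion-1.0 type quoted above, so this
-- sketch is self-contained; real code would use criterion's own type.
data Benchmarkable = Benchmarkable (Int64 -> IO ())

-- Hypothetical placeholders for the one-time setup and the real work.
setUpOnce :: IO ()
setUpOnce = putStrLn "expensive one-time initialization"

realStuff :: IO ()
realStuff = return ()

-- Pay the setup cost once per measurement batch, then run the work N times.
amortizedBench :: Benchmarkable
amortizedBench = Benchmarkable $ \n -> do
  setUpOnce
  replicateM_ (fromIntegral n) realStuff
```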
One example is the Par monad. Doing a runPar sets up an execution environment. This is an example of a one-time cost that can be amortized over a loop that runs /inside/ the Par monad. The reason I'm mentioning this here is that HSBencher/Criterion integration would make sense, and could take two distinct forms:
In the second case, it's not clear to me whether the benchmarks should run in the same process as the benchmark harness or not. Opinions on that welcome.
This has been disabled for a while, since the big switch to the new method of benchmark specification. The code is still there, commented out, in App.hs.
This field can go away. These arguments should be subsumed by a parameter setting (RuntimeArg). The current mkBenchmark function can be tweaked to remain backwards compatible: specifically, if passed arguments, it can And RuntimeArg settings into the config space to add them.
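A rough sketch of that idea, using simplified stand-ins for the real configuration-space types (the actual hsbencher constructors and "meaning" parameters may differ):

```haskell
-- Simplified stand-ins for the real configuration-space types.
data ParamSetting = RuntimeArg String          -- one extra runtime argument
data BenchSpace m = And [BenchSpace m]
                  | Or  [BenchSpace m]
                  | Set m ParamSetting
data NoMeaning    = NoMeaning

-- If explicit command-line args are passed, And them into the config space
-- as RuntimeArg settings so old call sites keep working.
argsIntoSpace :: [String] -> BenchSpace NoMeaning -> BenchSpace NoMeaning
argsIntoSpace args space =
  And (space : [ Set NoMeaning (RuntimeArg a) | a <- args ])
```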
It currently doesn't time out at all, which is hitting us on ConcurrentCilk benchmarks. The problem is the interface to RunInPlace, which gives it a tiny peephole view of what it needs to run. Either BuildMethod{compile} or RunInPlace really needs to get the full Config object to be able to read global configuration information.
Right now it's putting the columns in alphabetical order.
It's much nicer to read if they are prioritized. Perhaps we can change this at the point where HSBencher creates the extra columns when it is given a new table to work on.
Incidentally, reordering them manually is HORRIBLE because the fusion table website has a very silly UI for it. (You have to chase each field, pressing the up or down arrow repeatedly to move it... sorting the whole thing requires O(N^2) clicks.)
With the retry functionality in place, it's important to be able to tell from the recorded data-set that a retry occurred. This can be useful, for example, in post-facto detective work -- when software is flaky, are retries quantitatively more frequent with version A than with version B?
In particular, one message like this sneaks through at the end of each benchmark RUN:
[fusiontable] Computed schema, no custom fields.
That's it. No subsequent messages. And yet the code goes on without conditionals at this point, so it should print other messages, unless it's throwing an exception. It must be throwing an exception. Also, I confirmed that we are indeed removing it from allplugs in App.hs. So how is the fusion plugin object remaining accessible?
It'd be good to set up Travis CI to automatically build HSBencher on a clean machine after every push, just to make sure all the dependencies are available and all that.
Travis seems to have support for Haskell already: http://docs.travis-ci.com/user/languages/haskell/
The user is only supposed to import HSBencher, nothing else.
It's distracting to have too many modules listed on the haddock docs.
The annoying thing is that once you've (statically) linked in the Fusion plugin, then there's no way to turn off fusion upload. This flag would enable that.
Note: when this issue is closed, also update the wiki page.
The included methods, like cabal, should not care whether we make the benchmark target ./foo/bar or ./foo/bar/. But the cabal method seems not to recognize the former as a directory and find the .cabal inside it.
What this will entail is a notion of LineHarvester that gets to modify the full BenchmarkResult, not just the RunResult.
The resulting model is a little weird, but will work. Lines of output from the benchmark essentially mutate the benchmark result. Then each ARGS_AND_SELFTIMED tag is like calling "fork": it finalizes the existing state (BenchmarkResult) and begins a fresh one. So, basically, any general metadata (like JITTIME) should be spit out before starting the multiple benchmarks within a run.
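To make that model concrete, here is a small sketch using hypothetical stand-in types; the tag formats are simplified (the last whitespace-separated word of a tagged line is read as its value):

```haskell
import Data.List (isPrefixOf)
import Text.Read (readMaybe)

-- Hypothetical stand-in for the per-run result record.
data Result = Result { selftimed :: Maybe Double, jittime :: Maybe Double }
  deriving Show

emptyResult :: Result
emptyResult = Result Nothing Nothing

-- Each output line mutates the current result; an ARGS_AND_SELFTIMED line
-- finalizes it ("fork") and begins a fresh one, carrying over metadata
-- (here, JITTIME) that was emitted before the runs started.
splitRuns :: [String] -> [Result]
splitRuns = go emptyResult
  where
    go _ [] = []                -- results are emitted only at fork lines
    go acc (ln:rest)
      | "ARGS_AND_SELFTIMED" `isPrefixOf` ln =
          acc { selftimed = lastField ln } : go (acc { selftimed = Nothing }) rest
      | "JITTIME" `isPrefixOf` ln =
          go (acc { jittime = lastField ln }) rest
      | otherwise = go acc rest
    -- Simplification: read the last whitespace-separated word as the value.
    lastField = readMaybe . last . words
```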
It is still kinda useful as a dead simple output format that can easily be parsed by other programs. I think it should work fine as a Plugin, just like Dribble or Fusion.
This is one step towards cleaning up the (old and crufty) main application logic in App.hs.
These should be put into some separate ConfigInternal datatype.
This appears, for example, in Utils where we launch echoStream.
I've become accustomed to using this option to list which benchmarks are available (by name), and thus see what patterns to type to activate the desired benchmarks.
This is for dynaprof/microbenchmarks... we actually use the CompileParam setting for the initOverhead benchmark.
This is a field of the CommandDescr data structure, but we need a global configuration option, set by a command-line flag, to actually turn it on.
(And then we need to update the Accelerate/Cilk benchmarks to actually use this flag... because that's where we were originally seeing segfaults on process shutdown AFTER the program was complete. Those were the segfaults we wanted to tolerate for now.)
On the machine at Chalmers google API calls are frequently timing out. This might need yet another hack (I've lost count) to dump the data and then upload it to FusionTables from another machine.
This would have the advantage of needing fewer API calls, being instead able to upload many rows together.
Things like stderr vs. stdout, benchmark harness vs. subprocess output, and delimiters such as ----------------- are all good candidates for color-level distinctions.
That is, it should run through all benchmarks, but it should set the final exit code based on whether any of them failed.
This would be a Maybe BuildMethod override on the general process for configuring build methods. Setting it to Just would demand that that benchmark use that specific build method.
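A minimal sketch of what such a field might look like, with hypothetical stand-ins for the Benchmark and BuildMethod types (the real ones carry much more information):

```haskell
-- Hypothetical stand-ins; the real hsbencher types carry much more information.
data BuildMethod = BuildMethod { methodName :: String }

data Benchmark = Benchmark
  { target         :: FilePath
  , overrideMethod :: Maybe BuildMethod
      -- ^ Nothing: search the configured build methods as usual.
      --   Just m:  demand that this benchmark use that specific method.
  }
```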
This was requested by @peter-fogg.
CodeSpeed has a different hierarchy of concepts. For the code speed plugin/uploader, I propose:
One alternative would be to move THREADS into the Executable name (e.g. make it (VARIANT,THREADS)). This would bloat that category but might be a good idea anyway.
This is currently an asymmetry. Further, passing multiple compile-time variables is more user-friendly than packing them into the single COMPILE_ARGS parameter when using the Makefile method.
This is a follow-on to #28, and should be easy to do.
We have situations like this build:
http://tester-lin.soic.indiana.edu:8080/view/ParfuncBenchmarks/job/benchmark_ConcurrentCilk/219/
Where it ran and generated data, but we had forgotten to create the auth token for that particular google API clientid and the data didn't upload to the fusion table. It would be nice to be able to quickly:
Now, the really nice piece of functionality would be duplicate-suppression; in that case we would have an easy way of making sure we haven't missed data by just piping in the entire dribble file. That's a longer-term project, and it probably wouldn't work too well with fusion tables, because if we had to do per-tuple checks it would exhaust the API quota quickly.
There can be a global flag to change this, but by default, especially when there are multiple plugins, we don't want to stop the train when one has failed.
If someone wants to use hsbencher for data management, but the application is suited to Criterion, we should support that use case.
Currently it's the getTableId function (in Fusion.hs) that actually populates the extra columns.
That functionality should be factored out and put elsewhere. Then it should be used even when the table is identified by its unique ID, not by its name.
Because we use --bindir and a single cabal command to build/install a target benchmark, we have a problem if it installs upstream dependencies with executable targets themselves. The place this came up recently was our BoolVarTree benchmark, which depends on cpphs.
One example of this bug can be found in this log.
This protocol of tagged lines is growing kind of large, and it should be abstracted.
Also, as a preemptive measure we should probably add even more fields to the benchmark result schema so as to have room for people to shoehorn in weird stuff. Either that or make the schema itself extensible.
The getConfig function in Config.hs already digs around the system to find default settings. Might as well go all the way and have the option of a textual config file following usual Unix conventions.
Right now it is hard coded to .cabal/.
Any valid file path (relative or absolute) should be acceptable there.
We interpret command line args to an hsbencher executable as filtering the space of benchmarks to run. But right now it is limited to the target field.
In particular, there is important info in the variant and the progname. We should check each of these, possibly in this order:
Note that the variant is not contained in the Benchmark; it is part of the param space. This is an expanded notion of filtering -- not just filtering the list of benchmarks, but filtering subsets of their param spaces.
Along with that, we should move away from using the Benchmark constructor directly, in favor of various constructor functions (e.g. mkBenchmark).
In testing out the Codespeed plugin, I need to pass a plugin-specific config in. I'm doing so, but it's getting set back to the default setting (defaultPlugConf).
After ~50 runs, 3 trials each, eventually all subprocesses exit with an error code but NO output is produced.
Is this an exception on a child thread that isn't being caught, causing part of the infrastructure to go down? Or is it some kind of resource leak? #31 addresses the former.