hsbencher's Issues

Merge plugIns and plugInConfs

In the Config datatype:

I can't see why these are currently separate; it should be possible to derive plugIns from plugInConfs.
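
A minimal sketch of deriving one from the other. The wrapper types below are simplified stand-ins, not the real definitions; the only assumption is that the plugin value is recoverable from its paired configuration.

{-# LANGUAGE ExistentialQuantification #-}
import qualified Data.Map as M

-- Simplified stand-ins for the existential wrappers (assumptions, not the
-- actual HSBencher definitions).
class Plugin p where
  plugName :: p -> String

data SomePlugin     = forall p. Plugin p => SomePlugin p
data SomePluginConf = forall p. Plugin p => SomePluginConf p String

-- If this holds, plugIns could be computed from plugInConfs on demand:
derivePlugIns :: M.Map String SomePluginConf -> [SomePlugin]
derivePlugIns confs =
  [ SomePlugin p | SomePluginConf p _ <- M.elems confs ]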

getCurrentDirectory: does not exist

The Problem:

--------------------------------------------------------------------------------
  Running Config 29 of 48: ScanBench/Scan.cabal s2 256 4096
   []
--------------------------------------------------------------------------------

run_benchmarks.exe: getCurrentDirectory: does not exist (No such file or directory)
Build step 'Execute shell' marked build as failure
Finished: FAILURE

How can that be, given that the 28 previous configs were OK?
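
One plausible explanation (an assumption, not confirmed from this log): a previous config's clean-up step deleted the directory the harness was sitting in, and getCurrentDirectory fails with exactly this error when the process's working directory no longer exists. A defensive sketch, with a hypothetical safeGetCwd helper:

import Control.Exception (IOException, try)
import System.Directory (getCurrentDirectory)

-- Hypothetical helper: fall back to a known directory if the current working
-- directory has been removed out from under the process.
safeGetCwd :: FilePath -> IO FilePath
safeGetCwd fallback = do
  r <- try getCurrentDirectory :: IO (Either IOException FilePath)
  case r of
    Left _  -> return fallback
    Right d -> return d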

Get rid of row character-count limitation

This is a totally silly limitation -- because of URL length limits, we are constrained in how much data we can upload to each row of our fusion table. I've disabled certain columns of our benchmarking schema to work around this.

Fixing it will require one of two things: using the SQL bulk row upload API call, or the general-purpose mechanism I've heard Google allows across all its APIs for spilling an over-long URL into a POST body (@craigcitro - what was that called??).

[generalize] Make upload schema extensible

Related to #13

It's very clear that continuing to add builtin fields like "JITTIME" is a losing game. It probably is a good idea to have some "core" fields in the benchmark data schema. However, for the more obscure ones, we really need the ability to customize these. That is, to add a field, extract it from the benchmark run in a custom way, and upload it into the benchmark database.
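
A minimal sketch of what an extensible entry could look like (the names here are hypothetical, not HSBencher's API): a custom field bundles its column name, how to harvest it from a line of output, and a typed value.

import Text.Read (readMaybe)

-- Hypothetical types sketching an extensible schema entry.
data FieldVal = FieldInt Int | FieldDouble Double | FieldString String
  deriving (Show)

data CustomField = CustomField
  { fieldName    :: String                    -- e.g. "JITTIME"
  , fieldHarvest :: String -> Maybe FieldVal  -- extract it from one output line
  }

-- Example: harvest lines of the form "JITTIME: 1.23" into a Double field.
jitTimeField :: CustomField
jitTimeField = CustomField
  { fieldName    = "JITTIME"
  , fieldHarvest = \ln -> case words ln of
      ["JITTIME:", v] -> FieldDouble <$> readMaybe v
      _               -> Nothing
  }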

Refactor: Combine BenchmarkResult and RunResult

They serve similar purposes, but to be able to use LineHarvesters well and extensibly (cc #13, #24), we need these two datatypes to be combined.

It was a historical accident that they are separate. RunResult evolved out of the code for measuring processes, whereas BenchmarkResult came from the fusion table code for uploading.

Automatically set Fusion table column TYPE as well as name

Currently we manually go in for each new table and change columns from type "Text" to "Number", which unlocks a bunch of different functionality in the fusion table.

We could do this automatically when adding missing columns to a fusion table. Both for the core, builtin schema and for custom tags, we know which are numbers and which are strings.
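
A sketch of the idea (the helper and the particular field list are assumptions, not the plugin's actual code): when creating a missing column, pick the Fusion Tables column type from what we already know about the field.

-- Hypothetical mapping from column name to Fusion Tables column type.
fusionColumnType :: String -> String
fusionColumnType col
  | col `elem` numericCols = "NUMBER"
  | otherwise              = "STRING"
  where
    -- Illustrative subset of fields known to be numeric:
    numericCols = ["THREADS", "SELFTIMED", "RETRIES", "JITTIME"]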

Support linear regression in a general way

As of version 1.0, Criterion uses a linear regression methodology to generate an estimate of the expected marginal cost of running a benchmark just one more time.

A core type in Criterion is the Benchmarkable data structure, which has an exposed constructor and is of the form:

data Benchmarkable = Benchmarkable (Int64 -> IO ()) 

Usually the user doesn't construct one of these values directly; rather, Criterion builds the Int64 -> IO () function itself, so that it simply takes a number N and runs an IO action N times. However, there are useful applications of constructing this function directly (a sketch follows the list below):

  • Ruling out overhead from some startup or initialization action. The regression methodology will do that automatically and it's ok to have actions of the form \n -> do init; realStuff n.
  • Running code inside a different (non-IO) monad, like the Par monad. Doing a runPar sets up an execution environment. This is an example of a one-time cost that can be amortized over a loop that runs /inside/ the Par monad.
  • Varying something other than the number of iterations. For example, it is also useful to do regressions while varying data structure size rather than number of iterations.
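
A minimal sketch of the first bullet, written against the Criterion-1.0-era definition quoted above (and assuming its constructor is importable from Criterion.Types, as the "exposed constructor" remark suggests); initOnce and realStuff are placeholders, not real benchmark code:

import Control.Monad (replicateM_)
import Criterion.Main (bench, defaultMain)
import Criterion.Types (Benchmarkable (..))

-- Placeholder actions for the sketch: a one-time setup and the operation we
-- actually want to measure.
initOnce :: IO ()
initOnce = return ()

realStuff :: IO ()
realStuff = return ()

-- The init cost does not scale with n, so the regression attributes it to the
-- intercept rather than to the per-iteration time.
myBenchmarkable :: Benchmarkable
myBenchmarkable = Benchmarkable $ \n -> do
  initOnce
  replicateM_ (fromIntegral n) realStuff

main :: IO ()
main = defaultMain [bench "custom" myBenchmarkable]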

The reason I'm mentioning this here is that HSBencher/Criterion integration would make sense, and it could take two distinct forms:

  • We should make it possible to parameterize arbitrary HSBencher benchmarks by a numeric parameter, and perform linear regression. This works even though traditional HSBencher benchmarks run in their own process.
  • Second, for Haskell code, we should make it possible to run traditional, intra-Haskell Criterion benchmarks and make it easy to dump their outputs as HSBencher-harvestable tags that go to our backend data stores.

In the second case, it's not clear to me whether the benchmarks should run in the same process as the benchmark harness or not. Opinions on that welcome.

Remove "cmdargs" that are built-in to the benchmark

This field can go away; it should be subsumed by a parameter setting (RuntimeArg).

The current mkBenchmark function can be tweaked to remain backwards compatible. Specifically, if it is passed arguments, it can And a RuntimeArg setting into the config space to add them (see the sketch below).
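
A sketch of that tweak. The constructor names (And, Set, NoMeaning, RuntimeArg), the mkBenchmark signature, and the umbrella import are assumptions about the existing API; the wrapper name is hypothetical.

import HSBencher  -- assumed: umbrella module exporting the types used below

-- Hypothetical backwards-compatible wrapper: old-style cmdargs are folded into
-- the config space as RuntimeArg settings.
mkBenchmarkCompat :: FilePath -> [String] -> BenchSpace DefaultParamMeaning
                  -> Benchmark DefaultParamMeaning
mkBenchmarkCompat target args space =
  mkBenchmark target []
    (And (space : [ Set NoMeaning (RuntimeArg a) | a <- args ]))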

Add timeouts for make-based build method. For this pass `Config` to `RunInPlace`

It currently doesn't time out at all, which is hitting us on the ConcurrentCilk benchmarks. The problem is the interface to RunInPlace, which gives it only a tiny peephole view of what it needs to run. Either BuildMethod{compile} or RunInPlace really needs to get the full Config object to be able to read global configuration information.
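
As a sketch of the mechanics (runWithTimeout and the seconds-based interface are hypothetical; the actual limit would come from the Config once compile/RunInPlace can see it), System.Timeout is enough to bound the build step:

import System.Timeout (timeout)

-- Hypothetical helper: run an action (e.g. the make-based compile step) with a
-- wall-clock limit given in seconds; Nothing means it timed out.
runWithTimeout :: Double -> IO a -> IO (Maybe a)
runWithTimeout seconds act = timeout (round (seconds * 1e6)) act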

See if we can create the Fusion table columns in a friendlier order

Right now it's putting the columns in alphabetical order.

It's much nicer to read if they are prioritized. Perhaps we can change this at the point where HSBencher creates the extra columns when it is given a new table to work on.

Incidentally, reordering them manually is HORRIBLE because the fusion table website has a very silly UI for it. (You have to chase each field, pressing the up or down arrow repeatedly, to move it; sorting the whole thing requires O(N^2) clicks.)
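
A sketch of prioritized ordering at column-creation time (the helper and the particular priority list are assumptions): known core fields come first in a fixed order, and everything else falls back to alphabetical.

import Data.List (sortOn)
import qualified Data.Map as M

-- Hypothetical column ordering: prioritized core fields first, the rest
-- alphabetical.
orderColumns :: [String] -> [String]
orderColumns = sortOn rank
  where
    priority :: M.Map String Int
    priority = M.fromList
      (zip ["PROGNAME", "VARIANT", "ARGS", "THREADS", "SELFTIMED"] [0 ..])
    rank c = (M.findWithDefault maxBound c priority, c)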

Add RETRIES field to core schema

With the retry functionality in place, it's important to record in the data set that a retry occurred. This can be useful, for example, in post-facto detective work -- when software is flaky, are retries quantitatively more frequent with version A than with version B?

"--disable" is behaving oddly: the plugin does not initialize, but still tries to upload

In particular, one message like this sneaks through at the end of each benchmark RUN:

 [fusiontable] Computed schema, no custom fields.

That's it -- no subsequent messages. And yet the code continues without conditionals at this point, so it should print other messages, unless it's throwing an exception. It must be throwing an exception.

I also confirmed that we are indeed removing it from allplugs in App.hs. So how is the fusion plugin object still accessible?

Finish the multiresult branch

What this will entail is a notion of LineHarvester that gets to modify the full BenchmarkResult, not just the RunResult.

The resulting model is a little weird, but it will work: lines of output from the benchmark essentially mutate the benchmark result, and each ARGS_AND_SELFTIMED tag is like calling "fork" -- it finalizes the existing state (the BenchmarkResult) and begins a fresh one. So, basically, any general metadata (like JITTIME) should be emitted before starting the multiple benchmarks within a run.
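
A sketch of that model, kept generic over the result type; the harvest argument stands in for running the LineHarvesters over one line, and the tag check is simplified.

import Data.List (isPrefixOf)

-- Fork-on-tag sketch: ordinary lines mutate the in-progress result; an
-- ARGS_AND_SELFTIMED line finalizes it, and the fresh in-progress result keeps
-- any general metadata (e.g. JITTIME) gathered so far.
processLine :: (String -> res -> res)  -- apply the harvesters for one line
            -> String                  -- one line of benchmark output
            -> ([res], res)            -- (finalized results, in-progress)
            -> ([res], res)
processLine harvest ln (finalized, current)
  | "ARGS_AND_SELFTIMED" `isPrefixOf` ln =
      (finalized ++ [harvest ln current], current)
  | otherwise =
      (finalized, harvest ln current)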

Remove old ".dat" logging, make it a Plugin

It is still kinda useful as a dead simple output format that can easily be parsed by other programs. I think it should work fine as a Plugin, just like Dribble or Fusion.

This is one step towards cleaning up the (old and crufty) main application logic in App.hs.

Add `-l` option, like test-framework

I've become accustomed to using this option to list which benchmarks are available (by name), and thus see what patterns to type to activate the desired benchmarks.

Added tolerateError field, add command line opt to control it

This is a field of the CommandDescr data structure, but we need a global configuration option, set by a command line flag, to actually turn it on.

(And then we need to update the Accelerate/Cilk benchmarks to actually use this flag... because that's where we were originally seeing segfaults on process shutdown AFTER the program was complete. Those were the segfaults we wanted to tolerate for now.)

Make it possible to dump serialized BenchmarkResults for later upload

On the machine at Chalmers, Google API calls are frequently timing out. This might need yet another hack (I've lost count) to dump the data and then upload it to FusionTables from another machine.

This would also have the advantage of needing fewer API calls, since many rows could be uploaded together.

Add basic colorization of text output

Things like stderr vs. stdout, benchmark harness vs. subprocess output, and delimiters such as ----------------- are all good candidates for color-level distinctions.

[codespeed backend] Notes and conventions

CC @svenssonjoel @tmcdonell

CodeSpeed has a different hierarchy of concepts. For the CodeSpeed plugin/uploader, I propose:

  • "Environment" = HOSTNAME. That is, we use the field we already have in the HSBencher Schema.
  • "Benchark" = (PROGNAME, ARGS, THREADS) -- this is a lot to pack in, but if we don't put all these into the "key", then we end up doing an apples-to-oranges comparison. I propose we just append these three fields, space separated, to generate the benchmark name.
  • "Executable" = VARIANT -- these are what the Comparison view helps us compare.

One alternative would be to move THREADS into the Executable name (e.g. make it (VARIANT,THREADS)). This would bloat that category but might be a good idea anyway.
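
A sketch of the proposed mapping, with the schema fields passed in as plain values (the helper names are hypothetical):

-- Hypothetical helpers encoding the proposal above.
codespeedEnvironment :: String -> String
codespeedEnvironment hostname = hostname

-- Append PROGNAME, ARGS, and THREADS, space-separated, as the benchmark name.
codespeedBenchmark :: String -> [String] -> Int -> String
codespeedBenchmark progname args threads =
  unwords (progname : args ++ [show threads])

codespeedExecutable :: String -> String
codespeedExecutable variant = variant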

Add `CompiletimeEnv` to parallel `RuntimeEnv`

This is currently an asymmetry. Further, passing multiple compile-time variables is more user-friendly than packing them into the single COMPILE_ARGS parameter when using the Makefile method.
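
A sketch of the proposed symmetry; this cut-down datatype is illustrative only, not HSBencher's actual parameter-setting type.

-- Illustrative sketch of the compile-time counterpart to RuntimeEnv.
data ParamSettingSketch
  = RuntimeEnv     String String   -- existing: VAR=value for the benchmark run
  | CompiletimeEnv String String   -- proposed: VAR=value for the build step
  deriving (Show, Eq)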

Add an option to suck dribble files back in and upload them

We have situations like this build:

http://tester-lin.soic.indiana.edu:8080/view/ParfuncBenchmarks/job/benchmark_ConcurrentCilk/219/

There, the benchmarks ran and generated data, but we had forgotten to create the auth token for that particular Google API client ID, so the data didn't upload to the fusion table. It would be nice to be able to quickly:

  • grep out the relevant lines of the dribble file
  • pipe them to something that simply accepts CSV data and uploads it through all backends (fusion table in this case)

The really nice piece of functionality would be duplicate suppression: then we would have an easy way of making sure we haven't missed any data, just by piping in the entire dribble file. That's a longer-term project, though, and it probably wouldn't work too well with fusion tables, because per-tuple checks would exhaust the API quota quickly.

Criterion integration

If someone wants to use hsbencher for data management, but the application is better suited to Criterion, we should support that use case.

[Fusion] Make the Column creation happen even without --name

Currently it's the getTableId function (in Fusion.hs) that actually populates the extra columns.

That functionality should be factored out and put elsewhere. Then it should be used even when the table is identified by its unique ID, not by its name.

[generalize] Make SELFTIMED style tags editable in a config file.

This protocol of tagged lines is growing kind of large, and it should be abstracted.

Also, as a preemptive measure we should probably add even more fields to the benchmark result schema so as to have room for people to shoehorn in weird stuff. Either that or make the schema itself extensible.

Allow filter arguments to match against fields other than the bench target

We interpret command-line args to an hsbencher executable as filtering the space of benchmarks to run, but right now that matching is limited to the target field.

In particular, there is important information in the variant and the progname. We should check each of these, possibly in this order:

  • progname
  • target
  • variant
  • runtime args?

Note that the variant is not contained in the Benchmark; it is part of the param space. This is an expanded notion of filtering -- not just filtering the list of benchmarks, but also filtering subsets of their param spaces.
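
A sketch of the expanded matching (matchesFilter is a hypothetical helper; real code would pull the variant out of the expanded param space): a command-line pattern selects a config if it occurs in the progname, the target, or the variant, checked in that order.

import Data.List (isInfixOf)

-- Hypothetical filter predicate over one expanded benchmark configuration.
matchesFilter :: String   -- command-line pattern
              -> String   -- progname
              -> FilePath -- target
              -> String   -- variant
              -> Bool
matchesFilter pat progname target variant =
  any (pat `isInfixOf`) [progname, target, variant]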
