
benchmark_harness's People

Contributors

0xcaff, aam, adam-singer, athomas, dependabot[bot], devoncarew, franklinyow, jakemac53, johnmccutchan, jonasfj, kevmoo, leafpetersen, mraleph, mvuksano, natebosch, parlough, pq, psygo, sethladd, slightlyoff, sortie, srawlins, whesse


benchmark_harness's Issues

add printout of "time per unit" in nanoseconds

The current printout (time to run 10 times, in microseconds) may be fine for relative measurements and regression testing, but not for evaluating performance in absolute terms.
For example, when we benchmark a "list copy" operation using 100-element lists, we can tell how List.of compares with toList, but that tells us nothing about the actual performance. If we had a printout in nanoseconds per copied list item, we could compare it against common sense. Common sense, in this case, says that a reading above 2 nanoseconds/item on a 3+ GHz processor leaves room for improvement, and a reading of 100 nanoseconds per item (instead of 2) leaves a lot of room for improvement.
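
A hedged sketch of how such a per-item number could be derived from the current printout, assuming the reported value is microseconds per 10 calls of run(); nanosecondsPerItem and itemsPerRun are hypothetical names, not part of benchmark_harness:

```dart
// Illustrative only: assumes the harness reports microseconds per 10 runs.
double nanosecondsPerItem(double reportedMicrosPer10Runs, int itemsPerRun) {
  final microsPerRun = reportedMicrosPer10Runs / 10;
  return microsPerRun * 1000 / itemsPerRun; // 1 us = 1000 ns
}

void main() {
  // e.g. a reported 250.0 us (per 10 runs) for copying a 1000-element list:
  print(nanosecondsPerItem(250.0, 1000)); // 25.0 ns per copied item
}
```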

Add AsyncBenchmarkBase

I would possibly be willing to do this, assuming people are on board with the change.
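
A minimal sketch of what such a class could look like, assuming the same setup/run/teardown/report shape as BenchmarkBase; the names and measurement strategy here are illustrative, not the package's actual API:

```dart
// Illustrative sketch only; not the actual benchmark_harness API.
class AsyncBenchmarkBase {
  AsyncBenchmarkBase(this.name);

  final String name;

  Future<void> setup() async {}
  Future<void> run() async {}
  Future<void> teardown() async {}

  // Average time of one run() in microseconds over `iterations` awaited calls.
  Future<double> _measure(int iterations) async {
    final watch = Stopwatch()..start();
    for (var i = 0; i < iterations; i++) {
      await run();
    }
    return watch.elapsedMicroseconds / iterations;
  }

  Future<void> report() async {
    await setup();
    await _measure(10); // warmup
    final score = await _measure(100);
    await teardown();
    print('$name(RunTime): $score us.');
  }
}
```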

Change the license to an open source license such as MIT

The license says

Harness:
Copyright 2012 Google Inc. All Rights Reserved.

Individual benchmarks:
Each benchmark has an author and license. See benchmark source for details.

I presume this is a mistake and that this package was meant to be licensed under a permissive license such as MIT.

Possible to run benchmarks via dart2js / Chrome?

dart test has a handy --platform flag that lets you run your tests in different environments: 'vm', 'chrome', 'node', and some others.

It would be useful if benchmarks could be run the same way.

Why is the benchmark reporting 10-times higher values?

I understand that it's beneficial (for the sake of accuracy) to run the measured function 10 times in a loop. But why is this the value that is actually reported? Why not report the value divided by 10? It is really confusing!
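
For context, a rough sketch of the behavior being asked about: as a later issue on this page explains, the harness's exercise() method calls run() 10 times, and it is the time of exercise() that ends up in the reported number. This is an illustration, not the exact source:

```dart
// Rough sketch: the reported RunTime covers 10 calls of run(), because the
// measured method is exercise(), not run() itself.
class ExampleBenchmark {
  void run() {
    // ... the operation being measured ...
  }

  void exercise() {
    for (var i = 0; i < 10; i++) {
      run();
    }
  }
}
```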

adjust guidance wrt this being the recommended benchmarking solution?

Hi - I'm very excited to see activity in this repo again, including discussion about possible improvements to the benchmarking technique - #38 (comment).

I think there's an open question about whether this is a general-purpose benchmarking package or one that's now effectively tailored to supporting internal Dart benchmarking. I think either is fine, but we should communicate which to users.

Right now, the readme says "The Dart project benchmark harness is the recommended starting point when building a benchmark for Dart.", and the pubspec says "The official Dart project benchmark harness."

I'd like to choose one, and adjust our messaging to match; @sortie - thoughts?

Logged output should be to a fixed precision

When running a benchmark, the output is logged like this:
ObjectBenchmark(RunTime): 3151.181102362205 us.

This implies a precision that isn't actually being measured. It might be clearer to show a fixed precision.
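
A small illustration of the suggestion, assuming the score is available as a double in microseconds; toStringAsFixed is standard Dart:

```dart
void main() {
  const score = 3151.181102362205;
  // Two decimal places instead of the full double:
  print('ObjectBenchmark(RunTime): ${score.toStringAsFixed(2)} us.');
  // -> ObjectBenchmark(RunTime): 3151.18 us.
}
```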

Stats for the measurements?

For the benchmark measurements to be useful when comparing two or more versions of some code, we need to know the margin of error (MoE). Otherwise, we can't know whether an optimization is actually significantly better than the baseline.

Here's what I mean:

Commit     Mean
e11fe3f0   14.91
bab88227   14.64

Without MoE, this looks good. We made the code almost 2% faster with the second commit, right? No:

Commit     Mean    MoE
e11fe3f0   14.91   0.17
bab88227   14.64   0.14

We actually have no idea if the new code is faster. But we wouldn't know this without the MoE column, and we might prematurely pick the wrong choice.

Right now benchmark_harness only gives a single number. I often resort to running the benchmark many times, in order to ascertain the variance of measurements. This is slow and wasteful, because it's basically computing a mean of means. A measurement that could last ~2 seconds takes X * ~2 seconds, where X is always >10 and sometimes ~100.

I'm not sure this is in scope of this package, seeing as this one seems to be focused on really tight loops (e.g. forEach vs addAll) and long-term tracking of the SDK itself. Maybe it should be a completely separate package?

I'm proposing something like:

  1. Create a fixed-length list for the individual measurements (e.g. List.generate(n * batchIterations, (_) => -1))
  2. Warm up
  3. Execute n batches, each with batchIterations of the actual measured code, and put each measured time into the list.
  4. Tear down
  5. Compute the mean and the margin of error. Optionally, print all the measurements or provide an object with all the statistics. (I'm personally using my own t_stats package, but there are many others on pub, including @kevmoo's stats, and this is simple enough to implement without any external dependency.) A rough sketch follows below.
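
Here is a rough sketch of that flow under the stated assumptions (n batches of batchIterations runs, a pre-allocated list, mean plus a margin of error). The class and member names are illustrative, not an existing benchmark_harness API:

```dart
import 'dart:math' as math;

// Illustrative sketch of the proposed batched measurement with statistics.
abstract class StatsBenchmark {
  StatsBenchmark({this.n = 100, this.batchIterations = 100});

  final int n;               // number of batches
  final int batchIterations; // runs per batch

  void setup() {}
  void warmup() => run();
  void run();
  void teardown() {}

  void report(String name) {
    setup();
    warmup();

    // Pre-allocated, fixed-length list: nothing grows while we measure.
    final measurements = List<double>.filled(n, 0.0);
    final watch = Stopwatch();
    for (var i = 0; i < n; i++) {
      watch
        ..reset()
        ..start();
      for (var j = 0; j < batchIterations; j++) {
        run();
      }
      watch.stop();
      // Microseconds per single run() within this batch.
      measurements[i] = watch.elapsedMicroseconds / batchIterations;
    }
    teardown();

    final mean = measurements.reduce((a, b) => a + b) / n;
    final variance = measurements
            .map((m) => (m - mean) * (m - mean))
            .reduce((a, b) => a + b) /
        (n - 1);
    // 95% margin of error, normal approximation (z ~ 1.96).
    final moe = 1.96 * math.sqrt(variance / n);
    print('$name(RunTime): ${mean.toStringAsFixed(3)} us '
        '(+/- ${moe.toStringAsFixed(3)} us)');
  }
}
```

The z ~ 1.96 constant is the usual 95% normal-approximation value; for a small number of batches, a t-distribution (as in the t_stats package mentioned above) would be more appropriate.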

PROs:

  • Developers can make better-informed decisions about the optimizations they do
  • This works out of the box instead of being an exercise in statistics for each developer or company

CONs:

  • We need many measurements in order to compute the margin of error. That doesn't necessarily mean we need to save the time for every single run (which would add a lot of overhead), but we need at least tens, ideally hundreds of batches (e.g. measure the entirety of for (int i = 0; i < batchIterations; i++) { run(); } many times).
  • This means we probably need to know the number of batches and runs in advance, so that we don't need to dynamically grow a list of measurements during benchmarking.

I know this package is in flux now. Even a simple "no, not here" response is valuable for me.

Target of URI does not exist: 'package:benchmark_harness/benchmark_harness.dart'

Problem seen on new SDK. Is this known?

Dart Editor Problem:
Target of URI does not exist: 'package:benchmark_harness/benchmark_harness.dart'

Offending line:
import 'package:benchmark_harness/benchmark_harness.dart';

pubspec.yaml:
dependencies:
  benchmark_harness: any

pubspec.lock:
packages:
  benchmark_harness:
    description: benchmark_harness
    source: hosted
    version: "1.0.3"

Environment:
Build 32242
Dart Editor version 1.2.0.dev_03_02 (DEV)
Dart SDK version 1.2.0-dev.3.2
64-bit Windows 7

Thanks,
Everton

update the readme for common questions

We should update the readme - likely in the example section - to answer common questions:

  • should the user warm up the benchmark in the setup method, or does the framework take care of this?
  • are the reported numbers averages or medians?

Add closure compiler output of the JS versions

It would be nice if there were another version of the JS which is compiled by the Closure Compiler at the highest optimization level. Many people compile their client-side JavaScript code with the Closure Compiler, which does dead code elimination. I believe it's unfair to compare optimized code from dart2js with JS code that is not optimized; that makes this benchmark misleading.

run time, run, warmup, exercise, ...

Hello

I've used benchmark_harness for the first time and I got confused. The example shows that I should override the "run" method with my benchmark code. In the end, the reported run time is not the average run time of the "run" method but 10x that. The reason is that "exercise" calls "run" 10 times, and it is in fact the run time of "exercise" that is reported.

Maybe you could make this clearer in the example, or change run/warmup/exercise completely to make it more intuitive.

Thank you,
Bernhard
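
As a workaround for the behavior described above, one could override exercise() so that the reported time corresponds to a single run() call. A minimal hedged sketch, assuming the BenchmarkBase API:

```dart
import 'package:benchmark_harness/benchmark_harness.dart';

class SingleRunBenchmark extends BenchmarkBase {
  SingleRunBenchmark() : super('SingleRun');

  @override
  void run() {
    // ... the code being measured ...
  }

  // By default exercise() calls run() 10 times; overriding it to call run()
  // once makes the reported RunTime roughly the time of a single run().
  @override
  void exercise() => run();
}

void main() => SingleRunBenchmark().report();
```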

publish 1.0.5?

@johnmccutchan the currently published version still uses elapsedMilliseconds and multiplies it by 1000, but the repo now uses elapsedMicroseconds. A new version should be published so users can get this change.
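
For illustration, the difference being described, assuming a Stopwatch-based measurement; this is not the exact source of the package:

```dart
void main() {
  final watch = Stopwatch()..start();
  // ... measured work ...
  watch.stop();
  // Published behavior (per this issue): whole milliseconds scaled up,
  // so sub-millisecond timings collapse to 0.
  final coarseMicros = watch.elapsedMilliseconds * 1000;
  // Behavior in the repo: true microsecond resolution.
  final preciseMicros = watch.elapsedMicroseconds;
  print('$coarseMicros vs $preciseMicros');
}
```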
