
benchmark_harness's People

Contributors

0xcaff, aam, adam-singer, athomas, dependabot[bot], devoncarew, franklinyow, jakemac53, johnmccutchan, jonasfj, kevmoo, leafpetersen, mraleph, mvuksano, natebosch, parlough, pq, psygo, sethladd, slightlyoff, sortie, srawlins, whesse


benchmark_harness's Issues

add printout of "time per unit" in nanoseconds

The current printout (time to run 10 times, in microseconds) may be fine for relative measurements and regression testing, but not for evaluating performance in absolute terms.
For example, when we benchmark a "list copy" operation using 100-element lists, we can tell how List.of compares with toList, but that tells us nothing about the actual performance. If we had a printout in nanoseconds per copied list item, we could compare it against common sense. Common sense, in this case, says that a reading above 2 nanoseconds/item on a 3+ GHz processor leaves room for improvement, and a reading of 100 nanoseconds per item (instead of 2) leaves a lot of room for improvement.
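
A hedged sketch of how such a per-item number could be derived from the current printout, assuming the reported value is microseconds per 10 calls of run(); nanosecondsPerItem and itemsPerRun are hypothetical names, not part of benchmark_harness:

```dart
// Illustrative only: assumes the harness reports microseconds per 10 runs.
double nanosecondsPerItem(double reportedMicrosPer10Runs, int itemsPerRun) {
  final microsPerRun = reportedMicrosPer10Runs / 10;
  return microsPerRun * 1000 / itemsPerRun; // 1 us = 1000 ns
}

void main() {
  // e.g. a reported 250.0 us (per 10 runs) for copying a 1000-element list:
  print(nanosecondsPerItem(250.0, 1000)); // 25.0 ns per copied item
}
```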

Add AsyncBenchmarkBase

I would possibly be willing to do this, assuming people are on board with the change.
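
A minimal sketch of what such a class could look like, assuming the same setup/run/teardown/report shape as BenchmarkBase; the names and measurement strategy here are illustrative, not the package's actual API:

```dart
// Illustrative sketch only; not the actual benchmark_harness API.
class AsyncBenchmarkBase {
  AsyncBenchmarkBase(this.name);

  final String name;

  Future<void> setup() async {}
  Future<void> run() async {}
  Future<void> teardown() async {}

  // Average time of one run() in microseconds over `iterations` awaited calls.
  Future<double> _measure(int iterations) async {
    final watch = Stopwatch()..start();
    for (var i = 0; i < iterations; i++) {
      await run();
    }
    return watch.elapsedMicroseconds / iterations;
  }

  Future<void> report() async {
    await setup();
    await _measure(10); // warmup
    final score = await _measure(100);
    await teardown();
    print('$name(RunTime): $score us.');
  }
}
```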

Change the license to an open source license such as MIT

The license says

Harness:
Copyright 2012 Google Inc. All Rights Reserved.

Individual benchmarks:
Each benchmark has an author and license. See benchmark source for details.

I presume this is a mistake and that this package was meant to be licensed under a permissive license such as MIT.

Possible to run benchmarks via dart2js / Chrome?

dart test has a handy --platform flag that lets you run your tests in different environments: 'vm', 'chrome', 'node', and some others.

It would be useful if benchmarks could be run the same way.

Why is the benchmark reporting 10-times higher values?

I understand that it's beneficial (for the sake of accuracy) to run the measured function 10 times in a loop. But why is this the value that is actually reported? Why not report the value divided by 10? It is really confusing!
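
For context, a rough sketch of the behavior being asked about: as a later issue on this page explains, the harness's exercise() method calls run() 10 times, and it is the time of exercise() that ends up in the reported number. This is an illustration, not the exact source:

```dart
// Rough sketch: the reported RunTime covers 10 calls of run(), because the
// measured method is exercise(), not run() itself.
class ExampleBenchmark {
  void run() {
    // ... the operation being measured ...
  }

  void exercise() {
    for (var i = 0; i < 10; i++) {
      run();
    }
  }
}
```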

adjust guidance wrt this being the recommended benchmarking solution?

Hi - I'm very excited to see activity in this repo again, including discussion about possible improvements to the benchmarking technique - #38 (comment).

I think there's an open question about whether this is a general-purpose benchmarking package or one that's now effectively tailored to supporting internal Dart benchmarking. I think either is fine, but we should communicate which to users.

Right now, the readme says "The Dart project benchmark harness is the recommended starting point when building a benchmark for Dart.", and the pubspec says "The official Dart project benchmark harness."

I'd like to choose one, and adjust our messaging to match; @sortie - thoughts?

Logged output should be to a fixed precision

When running a benchmark, the output is logged like this:
ObjectBenchmark(RunTime): 3151.181102362205 us.

This implies a precision that isn't actually being measured. It might be clearer to show a fixed precision.
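
A small illustration of the suggestion, assuming the score is available as a double in microseconds; toStringAsFixed is standard Dart:

```dart
void main() {
  const score = 3151.181102362205;
  // Two decimal places instead of the full double:
  print('ObjectBenchmark(RunTime): ${score.toStringAsFixed(2)} us.');
  // -> ObjectBenchmark(RunTime): 3151.18 us.
}
```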

Stats for the measurements?

For the benchmark measurements to be useful when comparing two or more versions of some code, we need to know the margin of error (MoE). Otherwise, we can't know whether an optimization is actually significantly better than the baseline.

Here's what I mean:

Commit     Mean
e11fe3f0   14.91
bab88227   14.64

Without MoE, this looks good. We made the code almost 2% faster with the second commit, right? No:

Commit     Mean    MoE
e11fe3f0   14.91   0.17
bab88227   14.64   0.14

We actually have no idea if the new code is faster. But we wouldn't know this without the MoE column, and we might prematurely pick the wrong choice.

Right now benchmark_harness only gives a single number. I often resort to running the benchmark many times, in order to ascertain the variance of measurements. This is slow and wasteful, because it's basically computing a mean of means. A measurement that could last ~2 seconds takes X * ~2 seconds, where X is always >10 and sometimes ~100.

I'm not sure this is in scope of this package, seeing as this one seems to be focused on really tight loops (e.g. forEach vs addAll) and long-term tracking of the SDK itself. Maybe it should be a completely separate package?

I'm proposing something like:

  1. Create a fixed-length list for the individual measurements (e.g. List.generate(n * batchIterations, (_) => -1))
  2. Warm up
  3. Execute n batches, each with batchIterations of the actual measured code, and put each measured time into the list.
  4. Tear down
  5. Compute the mean and the margin of error. Optionally, print all the measurements or provide an object with all the statistics. (I'm personally using my own t_stats package, but there are many others on pub, including @kevmoo's stats, and this is simple enough to implement without any external dependency.) A rough sketch follows below.
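
Here is a rough sketch of that flow under the stated assumptions (n batches of batchIterations runs, a pre-allocated list, mean plus a margin of error). The class and member names are illustrative, not an existing benchmark_harness API:

```dart
import 'dart:math' as math;

// Illustrative sketch of the proposed batched measurement with statistics.
abstract class StatsBenchmark {
  StatsBenchmark({this.n = 100, this.batchIterations = 100});

  final int n;               // number of batches
  final int batchIterations; // runs per batch

  void setup() {}
  void warmup() => run();
  void run();
  void teardown() {}

  void report(String name) {
    setup();
    warmup();

    // Pre-allocated, fixed-length list: nothing grows while we measure.
    final measurements = List<double>.filled(n, 0.0);
    final watch = Stopwatch();
    for (var i = 0; i < n; i++) {
      watch
        ..reset()
        ..start();
      for (var j = 0; j < batchIterations; j++) {
        run();
      }
      watch.stop();
      // Microseconds per single run() within this batch.
      measurements[i] = watch.elapsedMicroseconds / batchIterations;
    }
    teardown();

    final mean = measurements.reduce((a, b) => a + b) / n;
    final variance = measurements
            .map((m) => (m - mean) * (m - mean))
            .reduce((a, b) => a + b) /
        (n - 1);
    // 95% margin of error, normal approximation (z ~ 1.96).
    final moe = 1.96 * math.sqrt(variance / n);
    print('$name(RunTime): ${mean.toStringAsFixed(3)} us '
        '(+/- ${moe.toStringAsFixed(3)} us)');
  }
}
```

The z ~ 1.96 constant is the usual 95% normal-approximation value; for a small number of batches, a t-distribution (as in the t_stats package mentioned above) would be more appropriate.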

PROs:

  • Developers can make better-informed decisions about the optimizations they do
  • This works out of the box instead of being an exercise in statistics for each developer or company

CONs:

  • We need many measurements in order to compute the margin of error. That doesn't necessarily mean we need to save the time for every single run (which would add a lot of overhead), but we need at least tens, ideally hundreds of batches (e.g. measure the entirety of for (int i = 0; i < batchIterations; i++) { run(); } many times).
  • This means we probably need to know the number of batches and runs in advance, so that we don't need to dynamically grow a list of measurements during benchmarking.

I know this package is in flux now. Even a simple "no, not here" response is valuable for me.

Target of URI does not exist: 'package:benchmark_harness/benchmark_harness.dart'

Problem seen on new SDK. Is this known?

Dart Editor Problem:
Target of URI does not exist: 'package:benchmark_harness/benchmark_harness.dart'

Offending line:
import 'package:benchmark_harness/benchmark_harness.dart';

pubspec.yaml:
dependencies:
  benchmark_harness: any

pubspec.lock:
packages:
  benchmark_harness:
    description: benchmark_harness
    source: hosted
    version: "1.0.3"

Environment:
Build 32242
Dart Editor version 1.2.0.dev_03_02 (DEV)
Dart SDK version 1.2.0-dev.3.2
64-bit Windows 7

Thanks,
Everton

update the readme for common questions

We should update the readme - likely in the example section - to answer common questions:

  • should the user warm up the benchmark in the setup method, or does the framework take care of this?
  • are the reported numbers averages or medians?

Add closure compiler output of the JS versions

It would be nice if there were another version of the JS which is compiled by the Closure Compiler at the highest optimization level. Many people compile their client-side JavaScript code with the Closure Compiler, which does dead code elimination. I believe it's unfair to compare optimized code from dart2js with JS code that is not optimized; that makes this benchmark misleading.

run time, run, warmup, exercise, ...

Hello

I've used benchmark_harness for the first time and I got confused. The example shows that I should override the "run" method with my benchmark code. In the end, the reported run time is not the average run time of the "run" method but 10x that. The reason is that "exercise" calls "run" 10 times, and it is in fact the run time of "exercise" that is reported.

Maybe you could make this clearer in the example, or change run/warmup/exercise completely to make it more intuitive.

Thank you,
Bernhard
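
As a workaround for the behavior described above, one could override exercise() so that the reported time corresponds to a single run() call. A minimal hedged sketch, assuming the BenchmarkBase API:

```dart
import 'package:benchmark_harness/benchmark_harness.dart';

class SingleRunBenchmark extends BenchmarkBase {
  SingleRunBenchmark() : super('SingleRun');

  @override
  void run() {
    // ... the code being measured ...
  }

  // By default exercise() calls run() 10 times; overriding it to call run()
  // once makes the reported RunTime roughly the time of a single run().
  @override
  void exercise() => run();
}

void main() => SingleRunBenchmark().report();
```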

publish 1.0.5?

@johnmccutchan the currently published version still uses elapsedMilliseconds and multiplies it by 1000, but the repo now uses elapsedMicroseconds. A new version should be published so users can get this change.
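
For illustration, the difference being described, assuming a Stopwatch-based measurement; this is not the exact source of the package:

```dart
void main() {
  final watch = Stopwatch()..start();
  // ... measured work ...
  watch.stop();
  // Published behavior (per this issue): whole milliseconds scaled up,
  // so sub-millisecond timings collapse to 0.
  final coarseMicros = watch.elapsedMilliseconds * 1000;
  // Behavior in the repo: true microsecond resolution.
  final preciseMicros = watch.elapsedMicroseconds;
  print('$coarseMicros vs $preciseMicros');
}
```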
