coq / coq-bench Goto Github PK

Scripts for differential performance testing of Coq packages / versions

License: Other

OCaml 43.53% Shell 56.47%

coq-bench's Introduction

Coq

Coq is a formal proof management system. It provides a formal language to write mathematical definitions, executable algorithms and theorems together with an environment for semi-interactive development of machine-checked proofs.

Installation

Please see https://coq.inria.fr/download. Information on how to build and install from sources can be found in INSTALL.md.

Documentation

The sources of the documentation can be found in directory doc. See doc/README.md to learn more about the documentation, in particular how to build it. The documentation of the last released version is available on the Coq web site at coq.inria.fr/documentation. See also Cocorico (the Coq wiki), and the Coq FAQ, for additional user-contributed documentation.

The documentation of the master branch is continuously deployed. See:

Changes

The Recent changes chapter of the reference manual explains the differences and the incompatibilities of each new version of Coq. If you upgrade Coq, please read it carefully as it contains important advice on how to approach some problems you may encounter.

Questions and discussion

We have a number of channels to reach the user community and the development team:

Our Zulip chat, for casual and high traffic discussions.
Our Discourse forum, for more structured and easily browsable discussions and Q&A.
Our historical mailing list, the Coq-Club.

See also coq.inria.fr/community, which lists several other active platforms.

Bug reports

Please report any bug / feature request in our issue tracker.

To be effective, bug reports should mention the OCaml version used to compile and run Coq, the Coq version (coqtop -v), the configuration used, and include a complete source example leading to the bug.

Contributing to Coq

Guidelines for contributing to Coq in various ways are listed in the contributor's guide.

Information about release plans is at https://github.com/coq/coq/wiki/Release-Plan

Supporting Coq

Help the Coq community grow and prosper by becoming a sponsor! The Coq Consortium can establish sponsorship contracts or receive donations. If you want to take an active role in shaping Coq's future, you can also become a Consortium member. If you are interested, please get in touch!

coq-bench's People

Contributors

Stargazers

Watchers

Forkers

ppedrot ejgallego anton-trunov ralfjung jasongross

coq-bench's Issues

Maintainership

Can we get a maintainer for this package?

I do volunteer to co-maintain.

Save benchmarks results

We should benchmark every commit automatically and save the reports in a format that's easy to browse.

support "overlays"

reuse the official opam file for Coq

At the moment, we generate the opam file(s) for Coq.

This is not strictly necessary.
(the only thing we need is a custom url file)

It would make sense to reuse the official opam file.

[ci] Using pinned references means that CI cannot test ML-based plugins well

In our .travis.yml file, we pin the new and old Coq commits to a fixed set. This is however unfortunate as it means that our bench setup will become stale when the ML API changes. For example, bignums.dev won't compile.

Anyways things like coq-bignums and most plugins should live in the main Coq repos to fix this problem.

Packages failing on Jenkins

A bunch of packages are constantly failing on jenkins:

- coq-color
- coq-fiat-crypto
- coq-sf-lf
- coq-sf-plf
- coq-sf-vfa
- coq-unimath

(Color seems newly failing)

We should figure out what's going on.

Jenkins: add TIMED=1

This would hopefully make it easier to do per-file comparisons.

I don't know where the jenkins scripts are, sorry if this isn't the right repository.

Wrong options passed to configure

@psteckler's job here failed https://ci.inria.fr/coq/view/opam/job/benchmark-part-of-the-branch/159/console because ./configure is called with -usecamlp5 and -nodoc. These options have been removed. -nodoc should be replaced with -with-doc no. -usecamlp5 was a no-op and should simply be removed.

Results should be saved incrementally

The EConstr build failed after some hours, but I don't think the script saves the results as it gets them. It should, so that we don't waste time.

[ci] Add CI testing

We should add a test-suite and support for testing.

cache and reuse the benchmarking results (whenever possible)

Dependencies must be provided in the topological order in order to avoid needless recompilation

The current script does something silly: if there are n packages, it can perform n(n+1) / 2 builds because of dependencies.

Why do we remove packages after we benched them? If they stayed installed, they could be used as dependencies for the next packages without reinstalling them. We could then pass to the script the names of packages in topological order.

@MatejKosik can you take care of this? The EConstr bench failed because of a CI issue. It would help if we cut down the time of the bench.

timing html broken

https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/ws/625/html/ is a 404
https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/625/console prints related "/home/jenkins/git/coq-bench/two_points_on_the_same_branch.sh: line 333: cd: /home/jenkins/workspace/benchmark-part-of-the-branch/625/opam.NEW/4.07.1+flambda+no-flat-float-array/build/coq-mathcomp-character.dev//: No such file or directory"

missing periodic cleanup of generated files

Each benchmarking job generates lot of files. We should define a periodic cleanup for the generated file (similarly to what we do we pyrolyse).

intermediate results were not rendered

When this job was running, immediatelly after coq-vst was compiled, the intermediate results were not displayed.

That is not expected behavior.

NOTE: the results for coq-vst were displayed after another OPAM package was benchmarked.

Measure (also) compilation at the level of individual files.

It would be useful if the benchmarking job was able to provide deeper insight concerning changes of compilation times.

at the level of individual files
at the level of individual theorems.

Add test that testing continues when some package fails

See 874e675 https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/617/console

Update to OPAM 2

We cannot properly test OCaml master due to this.

Blocked on access to the machine.

shared/opam_install.sh is a heresy

Correct me if I'm wrong, but with shared/opam_install.sh, if the network is down for 15 minutes, we'll get results without failures, but with a 15 minutes increment in the timings. That's a terrible idea, and that's probably what happend with fiat some time ago.

make it possible to run benchmarks with custom configuration options passed to Coq's ./configure command

[cc: @maximedenes ]

It seems that we are building Coq with -native-compiler on; however, doesn't that imply that a large part of the measured time is really spent in ocamlc?

Couldn't this interfere in when differentially testing some changes?

post (final) results to gitter

Please install the log sections plugins on Jenkins.

So logs can be readable:

https://wiki.jenkins.io/display/JENKINS/Collapsing+Console+Sections+Plugin

General Roadmap of Coq Bench

This is an attempt to share a general roadmap for Coq Bench so users can comment / collaborate.

The main idea is to rewrite the tool in OCaml and incorporate some code I had for bechmarking oexp. This would help having the tool run in users machines, and provide some more flexibility as to the kind of bench recipes we support.

Versioning: Coq bench does follow upstream Coq versioning, that is to say, 8.9 versions are expected to be used to benchmark Coq 8.9, 8.10 for Coq 8.10, etc...

For now we have two milestones:

8.9+shell: This milestone marks the end of development of the bash-based tool. Mainly we want to consolidate and document the tool, deployment, setup a brief CI, etc....
8.10+ocaml: This milestone marks the first release of a functioning OCaml-based bench tool. For this milestone we will basically have similar functionality to the bash-based tool.

Bench dependencies of formal topology

formal-topology is in the default package list but not its dependencies, so we build them and don't look at how long they take. This should be changed.
That means adding mathclasses and moving corn from the end of the package list to between mathclasses and formal-topo.

indicate different failure situations with different exit codes

Currently, when the benchmarking script fails, it returns exit status 1.

It makes sense to refine this behavior and make sure that different failure situations are indicated by different exit codes.

for benchmarking, use released versions of packages instead of developement versions

fix minor problems in the table of results

Currently, the table with results is formatted like this:

┌──────────────────┬──────────────────────────┬──────────────────────────────────────┬──────────────────────────────────────┐
│                  │      user time [s]       │              CPU cycles              │           CPU instructions           │
│                  │                          │                                      │                                      │
│     package_name │     NEW     OLD  PDIFF   │           NEW           OLD  PDIFF   │           NEW           OLD  PDIFF   │
├──────────────────┼──────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│        coq-flocq │   56.08   55.94  +0.25 % │  153460711917  153417163264  +0.03 % │  197150214085  196007731328  +0.58 % │
├──────────────────┼──────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│        coq-color │  624.66  620.50  +0.67 % │ 1731945090197 1721280000975  +0.62 % │ 2194274604370 2160472585424  +1.56 % │
├──────────────────┼──────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│          coq-vst │ 1548.11 1503.36  +2.98 % │ 4301918111006 4177116552037  +2.99 % │ 6558940021210 6269614567092  +4.61 % │
├──────────────────┼──────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│  coq-fiat-crypto │ 1868.00 1766.40  +5.75 % │ 5189336515240 4904129094643  +5.82 % │ 8757139278523 8125954983600  +7.77 % │
├──────────────────┼──────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│     coq-compcert │  872.49  808.76  +7.88 % │ 2425100895873 2241773276829  +8.18 % │ 3835828887495 3373238371202 +13.71 % │
├──────────────────┼──────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────┤
│ coq-fiat-parsers │  616.80  529.89 +16.40 % │ 1690439636388 1448594674656 +16.70 % │ 2533798849205 2056787255123 +23.19 % │
└──────────────────┴──────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘

We mix a bit the units of measure.
On one hand, by [s] we say that values are in seconds.
On the other hand, the values in the third column are in per cents.
To clear this up, we should probably produce something like:

┌──────────────────┬────────────────────────┬────────────────────────────────────┬────────────────────────────────────┐
│                  │          user time     │              CPU cycles            │           CPU instructions         │
│                  │                        │                                    │                                    │
│     package_name │     NEW     OLD  PDIFF │           NEW           OLD  PDIFF │           NEW           OLD  PDIFF │
│                  │     [s]     [s]   [%]  │           [1]           [1]   [%]  │           [1]           [1]   [%]  │
├──────────────────┼────────────────────────┼────────────────────────────────────┼────────────────────────────────────┤
│        coq-flocq │   56.08   55.94  +0.25 │  153460711917  153417163264  +0.03 │  197150214085  196007731328  +0.58 │
├──────────────────┼────────────────────────┼────────────────────────────────────┼────────────────────────────────────┤
│        coq-color │  624.66  620.50  +0.67 │ 1731945090197 1721280000975  +0.62 │ 2194274604370 2160472585424  +1.56 │
├──────────────────┼────────────────────────┼────────────────────────────────────┼────────────────────────────────────┤
│          coq-vst │ 1548.11 1503.36  +2.98 │ 4301918111006 4177116552037  +2.99 │ 6558940021210 6269614567092  +4.61 │
├──────────────────┼────────────────────────┼────────────────────────────────────┼────────────────────────────────────┤
│  coq-fiat-crypto │ 1868.00 1766.40  +5.75 │ 5189336515240 4904129094643  +5.82 │ 8757139278523 8125954983600  +7.77 │
├──────────────────┼────────────────────────┼────────────────────────────────────┼────────────────────────────────────┤
│     coq-compcert │  872.49  808.76  +7.88 │ 2425100895873 2241773276829  +8.18 │ 3835828887495 3373238371202 +13.71 │
├──────────────────┼────────────────────────┼────────────────────────────────────┼────────────────────────────────────┤
│ coq-fiat-parsers │  616.80  529.89 +16.40 │ 1690439636388 1448594674656 +16.70 │ 2533798849205 2056787255123 +23.19 │
└──────────────────┴────────────────────────┴────────────────────────────────────┴────────────────────────────────────┘

Another pleasant consequence of that would also be that the table itself becomes narrower.

document why we current variance of measurements

and explain what do we gain by taking minimum rather than average or mean of measured values.

track and compare compilation times of individual files

remove code which is not future-proof

This chunk is not future-proof.

Once a new version of Coq is released, the code will break.

We need to find a better way to achieve the same goal.

fiat crypto fails to install

The following dependencies couldn't be met:
  - coq-fiat-crypto -> coqprime
      unknown package

No solution found, exiting

cc @JasonGross

coq-hott

At the moment, we maintain a private coq-hott package.

The reason why we keep it private is that it is a too ugly.

It would make sense to cumb it a little bit and then share it publicly.

benchmark-part-of-the-branch has no default packages

Could jenkins suggest default packages to build?

Get rid of the local repos hack for Coq

With OPAM 2, we should be able to use opam pin coq $coq_dir and avoid a lot of hacks in the script.

The base branch should be taken from the main repo by default

IIUC, the current script takes the base branch from the same repo as head. But typically, trunk is not up to date in personal repos. I already lost ~3h in total because of this (I discovered and then forgot about it), please fix!

don't pollute stderr with debugging messages

remove fake camlp{4,5} packages

At the moment, we generate fake camlp5{4,5} packages.

Check if that's necessary.

If not, remove them.

Jenkins cleanup.

It seems we have quite a few Jenkins todo:

update [a new version is available]
remove stable / unneeded jobs
remove pyrolise

cc: @maximedenes

Many failing packages

It seems the pinned Coq is considered to be 8.9, not dev.

https://ci.inria.fr/coq/view/benchmarking/job/benchmark-part-of-the-branch/639/console

INFO: failed to install coq-compcert
coq-corn
coq-fiat-core
coq-fiat-crypto
coq-fiat-parsers
coq-formal-topology
coq-hott
coq-math-classes
coq-mathcomp-odd-order
coq-sf-plf
coq-sf-vfa
coq-unimath
coq-verdi
coq-verdi-raft
coq-vst

A few logs:
https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/ws/639/coq-verdi.NEW.opam_install.deps_only.stdout/*view*/

The following dependencies couldn't be met:
  - coq-verdi -> coq >= dev
      no matching version

No solution found, exiting

https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/ws/639/coq-math-classes.NEW.opam_install.1.stdout/*view*/

The following actions will be performed:
  - install coq-math-classes 8.8.1

<><> Gathering sources ><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Processing  1/1: [coq-math-classes.8.8.1: http]
[coq-math-classes.8.8.1] downloaded from https://github.com/coq-community/math-classes/archive/8.8.1.zip
Processing  1/1:

...

- File "./interfaces/canonical_names.v", line 99, characters 0-27:
- Error:
- Commands which may open proofs are not allowed in a proof unless you turn option Nested Proofs Allowed on.

https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/ws/639/coq-hott.NEW.opam_install.deps_only.stdout/*view*/

The following dependencies couldn't be met:
  - coq-hott -> coq (< 8.9 | >= dev)
      not available because the package is pinned to version 8.9.0

No solution found, exiting

download operation creates unnecessary noise in benchmarking results

Unless I am missing something, using opam install is not suitable for benchmarking. Among other things, it will call constraint solvers, try to download packages, and thus could introduce significant differences on timing.

@maximedenes, what do you think?

improve parameter checking

Before doing anything else, the script should check whether the parameters provided by the users make sense.

E.g. in this case:

https://ci.inria.fr/coq/job/benchmark-part-of-the-branch/123/parameters/

The script should tell the user the a given parameter cannot be an empty string.

There should be only one Jenkins job for benchmarking

And it should ask for:

A head repo
A head commit (can be a sha1, a branch name, etc.)
A base repo
A base commit (can be a sha1, a branch name, etc.)

3 should be the github repo by default.

I would advise against trying to guess 4.

remove code which is not future-proof

This chunk is not future-proof.

Once a new version of Coq is released, the code will break.

We need to find a better way to achieve the same goal.

Default packages are not in topological order

coq-mathcomp-algebra comes before coq-mathcomp-ssreflect, the order is completely wrong...

Deployment setup

We need some minimal information about how the deployment of the script is working, so we can start documenting / improving the workflow.

I am taking the freedom to assign this to @maximedenes as discussed in the WG.

Create separate logs for individual OPAM-packages

At the moment, the script produces a huge log.

It could make sense to make sure that logs (from compilations) of different OPAM packages go to their own separate files.

In case of failures, it would then be possible to give the user a link to the particular log files (instead of letting the user to dig through everything).