
Benchpress

Tool to run one or more logic programs, on a set of files, and collect the results.

License: BSD.

Basic usage

$ benchpress run -c foo.sexp dir_a/ dir_b/ -p z3
…

This tells benchpress to run the prover z3 on directories dir_a and dir_b. The file foo.sexp contains additional configuration parameters, as described below.

System dependencies

Benchpress relies on a few utilities besides OCaml libraries:

  • sqlite3 (with development headers)
  • time, ulimit, etc.
  • gzip/zcat for compressing files
  • (optional) grep + access to /proc/cpuinfo for guessing the number of cores
  • (optional) git for tagging solvers from their repository
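The optional core-count guess mentioned above boils down to a one-liner; a minimal sketch, assuming a Linux-style /proc/cpuinfo and falling back to 1 elsewhere:

```shell
# Count "processor" entries in /proc/cpuinfo; default to 1 if the file
# is missing (e.g. on non-Linux systems).
cores=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || echo 1)
echo "guessed $cores cores"
```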

TODO use cgroups or similar for CPU affinity

Options

CLI options

Most of the commands accept -c <config file> to specify which config file to use.

  • benchpress --help to list options

  • benchpress run:

    • -t <time> timeout (in seconds) for each run
    • -m <memory> memory limit in MB
    • -F <file> read list of problems from given file
    • -p <prover1,prover2> list of provers to use
    • --task <profile> specify which task to use
    • -o <file> specify an output database file
  • benchpress slurm:

    • -t <time> timeout (in seconds) for each run
    • -m <memory> memory limit in MB
    • -F <file> read list of problems from given file
    • -p <prover1,prover2> list of provers to use
    • --task <profile> specify which task to use
    • --nodes max number of nodes to allocate for the workers (one worker per node)
    • --partition the partition to which the allocated nodes must belong
    • --ntasks number of tasks to give the workers at a time
    • --addr IP address of the server with which the workers communicate
    • --port port of the server
  • benchpress dir config shows the configuration directory

  • benchpress dir state shows the directory where the state (benchmark results) is stored

  • benchpress check-config <file> to check that the file is a valid configuration

  • benchpress prover-list to list available provers

  • benchpress prover-show <prover> to show the definition of a prover

  • benchpress list-files to list the results

  • benchpress show <result> to show the content of the result file

  • -v and -vv can be used to get more verbose output.

  • if the environment variable LOGS_FILE is set to a filename, logs will be written to that file.

ENV var options

Some internal parameters of benchpress can be set using environment variables:

  • "BENCHPRESS_BUSY_TIMEOUT" controls the busy timeout of the SQL database used by benchpress, in milliseconds. Default is 3000.

  • "XDG_CONFIG_HOME" overrides the default config location $HOME/.config.
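For instance, one could set these variables before launching benchpress; the values and log path below are illustrative, not defaults:

```shell
# Illustrative environment setup; paths and values are hypothetical.
export BENCHPRESS_BUSY_TIMEOUT=5000        # SQLite busy timeout, in milliseconds
export XDG_CONFIG_HOME="$HOME/.config"     # where benchpress looks up its config
export LOGS_FILE="$HOME/benchpress.log"    # write logs to this file
echo "busy timeout set to ${BENCHPRESS_BUSY_TIMEOUT}ms"
```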

Web interface

  • benchpress-server is a daemon listening on a local port (default 8080), which provides a basic web UI.

Config File

Benchpress ships with a builtin config, which is imported by default unless (import-prelude false) is specified. It contains, roughly:

Builtin config

; read smtlib status
(prover
  (name smtlib-read-status)
  (cmd "grep :status $file")
  (unknown ":status unknown")
  (sat ":status sat")
  (unsat ":status unsat"))

(prover
  (name minisat)
  (unsat "UNSATISFIABLE")
  (sat "^SATISFIABLE")
  (cmd "minisat -cpu-lim=$timeout $file"))

(prover
  (name z3)
  (cmd "z3 $file")
  (version "cmd:z3 --version")
  (unsat "unsat")
  (sat "^sat"))
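Note why the sat patterns are anchored with ^: minisat prints SATISFIABLE or UNSATISFIABLE, and the latter contains the former as a substring. A standalone grep check (not benchpress itself) illustrates the difference:

```shell
# An unanchored pattern would wrongly match the unsat output:
echo "UNSATISFIABLE" | grep -c 'SATISFIABLE'           # prints 1 (false match)
echo "UNSATISFIABLE" | grep -c '^SATISFIABLE' || true  # prints 0 (rejected)
echo "SATISFIABLE"   | grep -c '^SATISFIABLE'          # prints 1 (real sat)
```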

The configuration is based on stanzas that define available provers, available sets of benchmarks (based on directories that contain them), and tasks. For now the only kind of supported task is to run provers on problems, but it should get richer as we go (e.g. run proof checkers, do some basic CI, run a task daily, etc.).

In this default file we also define a pseudo-prover, "smtlib-read-status", which is used to parse SMTLIB benchmarks and find an annotation (set-info :status <…>). This is useful when running provers later because it makes it easy to find bugs (if a prover reports a wrong answer).
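Concretely, smtlib-read-status is little more than a grep over the problem file; here is a self-contained sketch using a throwaway sample problem (the file contents are an assumption):

```shell
# Create a tiny SMT-LIB problem carrying a status annotation, then
# extract the annotation the way the pseudo-prover's cmd does.
f=$(mktemp)
printf '(set-info :status unsat)\n(assert false)\n(check-sat)\n' > "$f"
grep ':status' "$f"    # the (unsat ":status unsat") regex matches this line
rm -f "$f"
```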

We also define provers minisat and z3 as common reference points, providing info on how to run them (with cmd …) and how to parse their results using regexes.

Example of config file

A more complete example, taken from mc2:

; from https://github.com/c-cube/mc2

(prover
  (name mc2)
  (cmd "ulimit -t $timeout; mc2 --time $timeout $file")
  (unsat "^Unsat")
  (sat "^Sat")
  (unknown "Unknown")
  (timeout "Timeout"))

(dir
  (path "$HOME/workspace/smtlib")
  (pattern ".*.smt2")
  (expect (run smtlib-read-status)))

(task
  (name glob-all-smtlib)
  (synopsis "run all SMT solvers on smtlib")
  (action
   (run_provers
    (dirs ("$HOME/workspace/smtlib"))
    (provers (mc2 z3))
    ;(memory 100000000)  ; TODO: parse "10G"
    (timeout 10))))

(task
  (name glob-all-smtlib-QF_UF)
  (synopsis "run all SMT solvers on QF_UF")
  (action
    (run_provers
      (dirs ("$HOME/workspace/smtlib/QF_UF"))
      (provers (mc2 z3))
      (timeout 10))))

Such a configuration file can be validated using:

$ benchpress check-config the_file.sexp

Then one can run a task, like so:

$ benchpress run -c the_file.sexp --task glob-all-smtlib-QF_UF -t 30

to run mc2 and z3 on the QF_UF problems in the SMTLIB directory. The task stanza defines a pre-packaged task that can be launched easily from the command line or the embedded web server (a bit like a makefile target).

Note that tasks are not necessary, they're just shortcuts. You can also pass directly the prover list and directory:

$ benchpress run -c the_file.sexp -p mc2 some/path/ -t 30

List of stanzas

The variable $cur_dir evaluates to the config file's directory. This allows the config file to refer to provers that are installed locally (e.g. in the same repository).

  • (prover …) defines a new prover. The name should be unique.
    • name: unique name, used to refer to this prover in results, on the command line, etc
    • cmd: how to run the prover. Variables $timeout, $file, $memory are available and will refer to parameters used to run the prover on a file. Variable $binary refers to the binary field if set, see below.
    • binary: path to the prover binary. If not provided, will be inferred as the first argument of the cmd.
    • sat, unsat, unknown, timeout, memory are (Perl-style) regexes used to recognize the result (or the reason for failure by timeout or memory exhaustion) of the prover.
    • custom tags can be used with (tag foo regex): a tag named foo will be used when regex matches the prover's output.
  • (dir …) defines a directory:
    • (name …): unique name used to refer to this directory in the config. You can refer to previously defined directories using ${dir:…}.
    • (path …) defines the path. The rules below apply to any file within this directory.
    • (pattern ".*.smt2") means only files matching the (perl) regex will be considered.
    • (expect …) defines how to find the expected result of each file (which will be compared to the actual result to detect bugs).
  • (custom-tag (name t)) makes a custom tag t available
  • (task …) defines a task that can be run from the command line.
    • name should be unique (used to refer to the task)
    • action defines what the task should do; see the Actions section below. For now there's only (run_provers …) to run provers on files locally.
  • (set-options…) defines global options:
    • j integer for number of parallel tasks in run
    • progress boolean for progress bar in run
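Put together, such a stanza might look like the sketch below; the concrete syntax is an assumption extrapolated from the field names above, so check it with benchpress check-config before relying on it:

```
(set-options
  (j 4)            ; up to 4 parallel prover runs
  (progress true)) ; show a progress bar during run
```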

Actions

  • (run_provers fields) to run some provers on some benchmarks. Fields are:
    • (provers (p1 … pn)) list of (names of) provers defined in other stanzas
    • (dirs (p1 … pn)) paths containing benchmarks. The paths must be subdirectories of already defined directories (see the dir stanza above)
    • (dir_files (p1 … pn)) paths to files containing problem paths, one problem path per line. Each line can also be a directory, in which case all the problems in the directory will be added. The same restrictions as for dirs apply. This has the same behavior as option -F.
    • (timeout n) (optional) defines a timeout in seconds
    • (memory n) (optional) defines a memory limit in MB
    • (pattern regex) (optional) an additional regex for files to consider in dirs
    • (j n) (optional) defines the number of concurrent threads to use when running the provers
  • (run_provers_slurm fields) to run some provers on some benchmarks using the computing power of a cluster managed by slurm. Most of the fields are the same as those of run_provers, except that they apply to every worker running on the allocated compute nodes. There are also additional fields:
    • (j n) (optional) the number of concurrent threads to be used by each worker deployed on an allocated compute node. (default is 4).
    • (partition s): (optional) the name of the partition to which the allocated compute nodes need to belong.
    • (nodes n): (optional) the number of nodes to allocate to the action (default is 1).
    • (addr s): (optional) the IP address on which the server deployed on the control node should listen.
    • (port n): (optional) the port on which the server should listen.
    • (ntasks n): (optional) the number of tasks that will be sent to the workers at a time.
  • (progn a1 … an) runs actions in sequence. Fails if any action fails.
  • (run_cmd "the command") runs the given command.
  • (git_checkout (dir d) (ref r) [(fetch_first fetch|pull)]) specifies a directory to enter ((dir d)), a git reference to check out ((ref r)), and optionally whether to fetch or pull the repo first.
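As a combined sketch, a task could sequence a checkout, a build, and a benchmark run; everything here (repository path, ref, build command, benchmark directory) is a placeholder, not something benchpress ships:

```
; Hypothetical sequencing task: refresh a checkout, rebuild, then benchmark.
(task
  (name rebuild-and-run)
  (action
    (progn
      (git_checkout (dir "$cur_dir/mc2") (ref "master") (fetch_first pull))
      (run_cmd "make -C $cur_dir/mc2")
      (run_provers
        (dirs ("$cur_dir/bench"))
        (provers (mc2))
        (timeout 10)))))
```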

An example of a task running with slurm

(task
  (name testrun-slurm)
  (action
    (run_provers_slurm
      (dirs ($PATHS))
      (provers (z3 cvc4))
      (timeout 2)
      (j 4)
      (nodes 2)
      (addr "xxx.xxx.xx.xxx")
      (port 8080)
      (ntasks 20)
      )))

assuming that "$PATHS" expands to paths of directories containing benchmarks which were previously defined in the config file.

Note: running a task with slurm should be done on a control node. The nodes of the cluster need at least a shared directory, and "XDG_DATA_HOME" should point to it or one of its subdirectories; it will be used to store the config file consumed by the benchpress worker instances running on the compute nodes of the cluster.

benchpress's People

Contributors

bclement-ocp, c-cube, gbury, halbaroth, hra687261


benchpress's Issues

separate notification mechanisms in subprocesses

  • have a separate binary for logitest-irc, with its own config perhaps, and launch it
    iff it's present and asked for (a "service") on stdio
  • communicate via jsonrpc2
  • the service is the one to call logitest through the full jsonrpc2 API (so it can ask for anything)

Wrong running time

I am using Benchpress to track regressions in Alt-Ergo, but sometimes I get regressions that I cannot reproduce manually. My guess is that the way Benchpress computes running times is not appropriate for this purpose, and the reported times can be wrong if the server is overloaded. I tried running Benchpress with fewer jobs to prevent this behavior, but I still get non-replicable regressions.

I could solve my problem by also storing the running time without IO operations (the CPU time?) in the database for each test.

I would be glad to add the feature by myself ;)

starexec backend

  • use ezcurl to interact with their web API
  • make that a sublibrary
  • have a series of subcommands for querying starexec (or maybe a separate binary? logitest-starexec?)

Unix.Unix_error(Unix.ECONNREFUSED, "connect", "") fails silently

This tool seems very promising! Thanks for making the work available.

I'm trying to get it configured initially. I suspect I'm doing something wrong in the configuration, and I'll keep investigating that, but this behavior seems undesirable regardless, so I figured it merited a report.

When I run benchpress in its currently configured state with default verbosity, I get

$ benchpress run -c config.sexp --task test-on-raft
/bin/sh: 1: z3: not found
/bin/sh: 1: cvc4: not found
testing with 1 provers, 0 problems…
OK
STAT:
provers
----------
sat
----------
unsat
----------
sat+unsat
----------
errors
----------
unknown
----------
timeout
----------
memory
----------
total
----------
total_time
ANALYSIS:
provers
----------
improved
----------
ok
----------
disappoint
----------
bad
----------
errors
----------
total

and a clean exit code. When I pump to max verbosity, I find

[2021-01-26T19:34:15|t0] [info:] run-main.main for paths 
[2021-01-26T19:34:15|t0] [debug:api]
  create client cl_id=0 host="127.0.0.1" port=8090
[2021-01-26T19:34:15|t0] [debug:]
  get-cmd-out "grep -c processor /proc/cpuinfo"
[2021-01-26T19:34:15|t1] [debug:] send progress api to the server
[2021-01-26T19:34:15|t1] [debug:api] make API call task_update
[2021-01-26T19:34:15|t1] [debug:]
  error when trying to connect to API: Unix.Unix_error(Unix.ECONNREFUSED, "connect", "")
[2021-01-26T19:34:15|t0] [debug:] par-map: create pool j=3
[2021-01-26T19:34:15|t0] [debug:] par-map: stop pool
testing with 1 provers, 0 problems…
[2021-01-26T19:34:15|t0] [debug:] par-map: create pool j=1
[2021-01-26T19:34:15|t0] [debug:] par-map: stop pool
[2021-01-26T19:34:15|t0] [info:]
  benchmark done in 0.072s, uuid=dc58297a-da7a-4fc2-bc4a-02f304d739d6
[2021-01-26T19:34:15|t0] [debug:] saving metadata…
[2021-01-26T19:34:15|t0] [debug:] closing db…
OK
STAT:
provers
----------
sat
----------
unsat
----------
sat+unsat
----------
errors
----------
unknown
----------
timeout
----------
memory
----------
total
----------
total_time
ANALYSIS:
provers
----------
improved
----------
ok
----------
disappoint
----------
bad
----------
errors
----------
total

So it looks like something in my misconfigured system is causing the connection to be refused. But instead of failing with an error, benchpress reports everything to be OK.

This is running on an install pinned to master. My system (although not relevant for the silent error):

Linux system76-pc 5.8.0-7630-generic #32~1609193707~20.04~781bb80~dev-Ubuntu SMP Tue Jan 5 21:22:25 U x86_64 x86_64 x86_64 GNU/Linux

Feature wish for ulimit wrappers

Once the double wrapping of #39 has been fixed, here are a few feature wishes that would be nice to have:

  • ability to disable the ulimit wrapper for some provers (e.g. dolmen already enforces the time and memory limits pretty strictly, so the ulimit wrapper is an unnecessary layer that might complexify the process tree)
  • add a few other ulimit options, in particular the ability to set stack space to unlimited

Of course, these two feature wishes interact a bit, so ultimately a way for a prover configuration to specify which limits the ulimit wrapper has to enforce (and removing the wrapper if there are none) would probably be the solution. I'll see if I have the time to work on that in the near future.

Mutex.lock: Resource deadlock avoided

Benchpress sometimes crashes while running on the following configuration:

(import-prelude false)

(prover
  (name ae-read-status)
  (cmd "zgrep :status $file")
  (unknown ":status unknown")
  (unknown "")
  (sat ":status sat")
  (unsat ":status unsat|:status valid"))

(dir
  (path "tests")
  (pattern ".*.ae|.*.smt2|.*.zip")
  (expect (run ae-read-status)))

(prover
  (name random)
  (cmd "./random_solver --timelimit=$timeout $file")
  (sat "^sat")
  (unsat "Valid|(^unsat)")
  (unknown "(I Don't Know)|(^unsat)")
  (timeout "^timeout"))

where the solver random_solver is

#!/bin/bash

choice=$((RANDOM % 5))

case $choice in
  '0')
    echo "unsat";;
  '1')
    echo "sat";;
  '2')
    echo "unknown";;
  '3')
    echo "error";;
  '4')
    echo "timeout";;
esac

The trace of the exception:

  Sys_error("Mutex.lock: Resource deadlock avoided")
  Raised by primitive operation at CCLock.with_lock in file "src/threads/CCLock.pp.ml" (inlined), line 11, characters 2-20
  Called from Benchpress__Misc.synchronized in file "src/core/Misc.ml", line 15, characters 21-45
  Called from Benchpress__Misc.Log_report.reporter.report.k.write_str in file "src/core/Misc.ml", line 67, characters 10-75
  Called from Benchpress__Run_prover_problem.pp_result in file "src/core/Run_prover_problem.ml", line 64, characters 2-208
  Called from Stdlib__Format.kfprintf.(fun) in file "format.ml", line 1356, characters 16-34
  Called from CCLock.with_lock in file "src/threads/CCLock.pp.ml" (inlined), line 13, characters 12-23
  Called from Benchpress__Misc.synchronized in file "src/core/Misc.ml", line 15, characters 21-45
  Re-raised at CCLock.with_lock in file "src/threads/CCLock.pp.ml" (inlined), line 18, characters 4-11
  Called from Benchpress__Misc.synchronized in file "src/core/Misc.ml", line 15, characters 21-45
  Called from Benchpress__Run_prover_problem.pp_result_progress in file "src/core/Run_prover_problem.ml", line 75, characters 2-109
  Called from Benchpress__Exec_action.Progress_run_provers.progress.object#on_res in file "src/core/Exec_action.ml", line 410, characters 27-84
  Called from Benchpress__Exec_action.Exec_run_provers.run.run_prover_pb.(fun) in file "src/core/Exec_action.ml", line 279, characters 6-21
  Called from Benchpress__Error.guard in file "src/core/Error.ml", line 40, characters 6-9

The output produced by the command dune exec -- benchpress run -c config.sexp -t 5 -p random ./tests -vv:

Log
[2023-03-20T23:22:13|t0] [debug:] logs are setup
[2023-03-20T23:22:13|t0] [info:] parse config files ["config.sexp"]
[2023-03-20T23:22:13|t0] [info:]
  add-stanza (enter-file /home/tiky/programming/test3/config.sexp)
[2023-03-20T23:22:13|t0] [info:]
  add-stanza (prover
              (name ae-read-status)
              (cmd "zgrep :status $file")
              (sat ":status sat")
              (unsat ":status unsat|:status valid")
              (unknown ""))
[2023-03-20T23:22:13|t0] [info:]
  add-stanza (dir
              (path tests)
              (expect (run ae-read-status))
              (pattern ".*.ae|.*.smt2|.*.zip"))
[2023-03-20T23:22:13|t0] [debug:benchpress.definition]
  cur dir: '/home/tiky/programming/test3'
[2023-03-20T23:22:13|t0] [info:]
  add-stanza (prover
              (name random)
              (cmd "./random_solver --timelimit=$timeout $file")
              (sat "^sat")
              (unsat "Valid|(^unsat)")
              (unknown "(I Don't Know)|(^unsat)")
              (timeout "^timeout"))
[2023-03-20T23:22:13|t0] [info:benchpress.run-main]
  run-main.main for paths ./tests
[2023-03-20T23:22:13|t0] [debug:]
  check prefix dir="/home/tiky/programming/test3/tests" for "/home/tiky/programming/test3/./tests"
[2023-03-20T23:22:13|t0] [debug:]
  get-cmd-out "grep -c processor /proc/cpuinfo"
[2023-03-20T23:22:13|t0] [debug:] par-map: create pool j=3
[2023-03-20T23:22:13|t3] [debug:problem]
  (find_expect `/home/tiky/programming/test3/./tests/AUFLIA/misc/arr2.smt2`
     using (run ae-read-status))…
[2023-03-20T23:22:13|t3] [debug:prover]
  (Prover.run ae-read-status (limits
                               (timeout 1s)
                               (memory 200M)))
[2023-03-20T23:22:13|t1] [debug:problem]
  (find_expect `/home/tiky/programming/test3/./tests/AUFLIA/misc/set3.smt2`
     using (run ae-read-status))…
[2023-03-20T23:22:13|t1] [debug:prover]
  (Prover.run ae-read-status (limits
                               (timeout 1s)
                               (memory 200M)))
[2023-03-20T23:22:13|t2] [debug:problem]
  (find_expect `/home/tiky/programming/test3/./tests/AUFLIA/misc/set1.smt2`
     using (run ae-read-status))…
[2023-03-20T23:22:13|t2] [debug:prover]
  (Prover.run ae-read-status (limits
                               (timeout 1s)
                               (memory 200M)))
[2023-03-20T23:22:13|t3] [debug:run-proc]
  (run.done :errcode 0
   :cmd "ulimit -t 2 -Sm 200 ; zgrep :status /home/tiky/programming/test3/./tests/AUFLIA/misc/arr2.smt2"
[2023-03-20T23:22:13|t3] [debug:run-proc]
  stdout:
(set-info :status unsat)

stderr:

[2023-03-20T23:22:13|t3] [debug:problem]
(find_expect /home/tiky/programming/test3/./tests/AUFLIA/misc/set2.smt2
using (run ae-read-status))…
[2023-03-20T23:22:13|t2] [debug:run-proc]
(run.done :errcode 0
:cmd "ulimit -t 2 -Sm 200 ; zgrep :status /home/tiky/programming/test3/./tests/AUFLIA/misc/set1.smt2"
[2023-03-20T23:22:13|t3] [debug:prover]
(Prover.run ae-read-status (limits
(timeout 1s)
(memory 200M)))
[2023-03-20T23:22:13|t2] [debug:run-proc]
stdout:
(set-info :status unsat)

stderr:

[2023-03-20T23:22:13|t1] [debug:run-proc]
(run.done :errcode 0
:cmd "ulimit -t 2 -Sm 200 ; zgrep :status /home/tiky/programming/test3/./tests/AUFLIA/misc/set3.smt2"
[2023-03-20T23:22:13|t1] [debug:run-proc]
stdout:
(set-info :status unsat)

stderr:

[2023-03-20T23:22:13|t3] [debug:run-proc]
(run.done :errcode 0
:cmd "ulimit -t 2 -Sm 200 ; zgrep :status /home/tiky/programming/test3/./tests/AUFLIA/misc/set2.smt2"
[2023-03-20T23:22:13|t3] [debug:run-proc]
stdout:
(set-info :status sat)

stderr:

[2023-03-20T23:22:13|t0] [debug:] par-map: stop pool
testing with 1 provers, 4 problems…
[2023-03-20T23:22:13|t0] [debug:benchpress.runexec-action]
output database file /home/tiky/.local/share/benchpress/res-20230320T232213-9118d387-bca8-494c-8e72-3ecd4c7f25ed.sqlite
[2023-03-20T23:22:13|t0] [debug:] par-map: create pool j=1
[2023-03-20T23:22:13|t8] [info:benchpress.run-prover-pb]
running random //home/tiky/programming/test3/./tests/AUFLIA/misc/set3.smt2 (timeout 5s)...
[2023-03-20T23:22:13|t8] [debug:benchpress.run-prover-pb] proof file: None
[2023-03-20T23:22:13|t8] [debug:prover]
(Prover.run random (limits
(timeout 5s)))
[2023-03-20T23:22:13|t8] [debug:run-proc]
(run.done :errcode 0
:cmd "ulimit -t 6 ; ./random_solver --timelimit=5 /home/tiky/programming/test3/./tests/AUFLIA/misc/set3.smt2"
[2023-03-20T23:22:13|t8] [debug:run-proc] stdout:
sat

stderr:

[2023-03-20T23:22:13|t8] [debug:]
output for ./random_solver//home/tiky/programming/test3/./tests/AUFLIA/misc/set3.smt2: sat , ``, errcode 0
[2023-03-20T23:22:13|t8] [info:benchpress.run-prover-pb]
running random //home/tiky/programming/test3/./tests/AUFLIA/misc/set1.smt2 (timeout 5s)...
[2023-03-20T23:22:13|t8] [debug:benchpress.run-prover-pb] proof file: None
[2023-03-20T23:22:13|t8] [debug:prover]
(Prover.run random (limits
(timeout 5s)))
Error:
Sys_error("Mutex.lock: Resource deadlock avoided")
Raised by primitive operation at CCLock.with_lock in file "src/threads/CCLock.pp.ml" (inlined), line 11, characters 2-20
Called from Benchpress__Misc.synchronized in file "src/core/Misc.ml", line 15, characters 21-45
Called from Benchpress__Misc.Log_report.reporter.report.k.write_str in file "src/core/Misc.ml", line 67, characters 10-75
Called from Benchpress__Run_prover_problem.pp_result in file "src/core/Run_prover_problem.ml", line 64, characters 2-208
Called from Stdlib__Format.kfprintf.(fun) in file "format.ml", line 1356, characters 16-34
Called from CCLock.with_lock in file "src/threads/CCLock.pp.ml" (inlined), line 13, characters 12-23
Called from Benchpress__Misc.synchronized in file "src/core/Misc.ml", line 15, characters 21-45
Re-raised at CCLock.with_lock in file "src/threads/CCLock.pp.ml" (inlined), line 18, characters 4-11
Called from Benchpress__Misc.synchronized in file "src/core/Misc.ml", line 15, characters 21-45
Called from Benchpress__Run_prover_problem.pp_result_progress in file "src/core/Run_prover_problem.ml", line 75, characters 2-109
Called from Benchpress__Exec_action.Progress_run_provers.progress.object#on_res in file "src/core/Exec_action.ml", line 410, characters 27-84
Called from Benchpress__Exec_action.Exec_run_provers.run.run_prover_pb.(fun) in file "src/core/Exec_action.ml", line 279, characters 6-21
Called from Benchpress__Error.guard in file "src/core/Error.ml", line 40, characters 6-9

Context:
(running :prover random :on /home/tiky/programming/test3/./tests/AUFLIA/misc/set3.smt2 (expect: unsat))

Context:
running 4 tests

Context:
run prover action
(run_provers (dirs (/home/tiky/programming/test3/./tests :in (dir (path /home/tiky/programming/test3/tests) (expect (run ae-read-status)) (pattern ".*.ae|.*.smt2|.*.zip")))) (provers random) (timeout 5s) (j 1))

The state of the local opam switch I used for this issue:

opam list
# Packages matching: installed
# Name              # Installed # Synopsis
base-bigarray       base
base-bytes          base        Bytes library distributed with the OCaml compiler
base-threads        base
base-unix           base
base64              3.5.1       Base64 encoding for OCaml
benchpress          0.1         pinned to version 0.1 at git+https://github.com/sneeuwballen/benchpress.git
benchpress-lsp      0.1         pinned to version 0.1 at git+https://github.com/sneeuwballen/benchpress.git
benchpress-server   0.1         pinned to version 0.1 at git+https://github.com/sneeuwballen/benchpress.git
cmdliner            1.1.1       Declarative definition of command line interfaces for OCaml
conf-gnuplot        0.1         Virtual package relying on gnuplot installation
conf-pkg-config     2           Check if pkg-config is installed and create an opam switch local pkgconfig folder
conf-sqlite3        1           Virtual package relying on an SQLite3 system installation
containers          3.11        A modular, clean and powerful extension of the OCaml standard library
containers-thread   3.11        An extension of containers for threading
cppo                1.6.9       Code preprocessor like cpp for OCaml
csexp               1.5.1       Parsing and printing of S-expressions in Canonical form
csv                 2.4         A pure OCaml library to read and write CSV files
dune                3.7.0       Fast, portable, and opinionated build system
dune-configurator   3.7.0       Helper library for gathering system configuration
either              1.0.0       Compatibility Either module
gnuplot             0.7         Simple interface to Gnuplot  Gnuplot-OCaml provides a simple interface to Gnuplot from OCaml. The API supports only 2D graphs and was inspir
ISO8601             0.2.6       ISO 8601 and RFC 3999 date parsing for OCaml
iter                1.6         Simple abstraction over `iter` functions, intended to iterate efficiently on collections while performing some transformations
jemalloc            0.2         Bindings to jemalloc mallctl api
jsonrpc             1.5.0       Jsonrpc protocol implemenation
linol               0.4         LSP server library
logs                0.7.0       Logging infrastructure for OCaml
lsp                 1.5.0       LSP protocol implementation in OCaml
ocaml               4.14.1      The OCaml compiler (virtual package)
ocaml-config        2           OCaml Switch Configuration
ocaml-syntax-shims  1.0.0       Backport new syntax to older OCaml versions
ocaml-system        4.14.1      The OCaml compiler (system version, from outside of opam)
ocamlbuild          0.14.2      OCamlbuild is a build system with builtin rules to easily build most OCaml projects
ocamlfind           1.9.6       A library manager for OCaml
pp_loc              2.1.0       Quote and highlight input fragments at a given source location
ppx_yojson_conv_lib v0.15.0     Runtime lib for ppx_yojson_conv
printbox            0.6.1       Allows to print nested boxes, lists, arrays, tables in several formats
printbox-text       0.6.1       Text renderer for printbox, using unicode edges
ptime               1.1.0       POSIX time for OCaml
re                  1.10.4      RE is a regular expression library for OCaml
result              1.5         Compatibility Result module
seq                 base        Compatibility package for OCaml's standard iterator type starting from 4.07.
sqlite3             5.1.0       SQLite3 bindings for OCaml
sqlite3_utils       0.4         High-level wrapper around ocaml-sqlite3
stdlib-shims        0.3.0       Backport some of the new stdlib features to older compiler
tiny_httpd          0.12        Minimal HTTP server using good old threads
topkg               1.0.7       The transitory OCaml software packager
uucp                15.0.0      Unicode character properties for OCaml
uuidm               0.9.8       Universally unique identifiers (UUIDs) for OCaml
uutf                1.0.3       Non-blocking streaming Unicode codec for OCaml
yojson              2.0.2       Yojson is an optimized parsing and printing library for the JSON format

The version of Benchpress I used:

opam show benchpress

<><> benchpress: information on all versions <><><><><><><><><><><><><><><><><>
name                   benchpress
all-installed-versions 0.1 [/home/tiky/git/benchtop /home/tiky/programming/test3] dev [ocaml.4.14.1]
all-versions           0.1

<><> Version-specific details <><><><><><><><><><><><><><><><><><><><><><><><><>
version 0.1
repository default
pin git+https://github.com/sneeuwballen/benchpress.git
source-hash 2d84e9d
url.src "git+https://github.com/sneeuwballen/benchpress.git"
homepage "https://github.com/sneeuwballen/benchpress/"
bug-reports "https://github.com/sneeuwballen/benchpress/issues"
dev-repo "git+https://github.com/sneeuwballen/benchpress.git"
authors "Simon Cruanes" "Guillaume Bury"
maintainer "[email protected]"
depends "dune" {>= "2.2"}
"base-unix"
"containers" {>= "3.3" & < "4.0"}
"containers-thread" {>= "3.0" & < "4.0"}
"re" {>= "1.8" & < "2.0"}
"csv"
"cmdliner" {>= "1.1.0"}
"iter" {>= "1.0"}
"logs"
"uuidm"
"base64"
"ptime"
"pp_loc" {>= "2.0" & < "3.0"}
"gnuplot" {>= "0.6" & < "0.8"}
"sqlite3"
"sqlite3_utils" {>= "0.4" & < "0.6"}
"printbox" {>= "0.6"}
"printbox-text" {>= "0.6"}
"ocaml" {>= "4.08"}
synopsis Tool to run one or more logic programs, on a set of files, and collect the results

command for regression testing

iterate on existing results and use olinq to find problems where the result is different (and time is not too close to timeout); summarize in csv/printbox

migrate to sqlite

  • each benchmark run produces a distinct sqlite3 file (still with a similar naming scheme)
  • use attach database $file to be able to compare 2 distinct runs
  • use it for all the queries (so nothing but comparisons/stats is held in memory)

allow fixed 'expect' field in config

Instead of expect = unknown (as a fallback for problems where an expect string isn't found), a useful feature for benchmarks would be the ability to force the expect field to be unknown (or even just empty). In this case we could display results instead of ok/bad…

[UI] Filter results by tag

Particularly for large benchmarks, it would be useful to be able to filter the list of detailed results based on the result/tag of the run.

stop/resume jobs

  • when starting a job, put its full serialized description into a dotfile, and append each individual result as we go (use cbor+decoders?)
  • command to list interrupted jobs
  • command to resume an interrupted job (subtracting already done individual tasks)

edit: in the xdg run dir, save a description of the job (a flat json file) and append results as we go; remove the file when we're done.
resume looks for such files and picks up where it stopped.

task queue for long running instance

in benchpress serve, the user should be able to click on run task, so we need a queue of tasks (also useful for cron tasks, CI, etc.).

simple thing with a level of priority and not much else. run tasks sequentially.

option to store stdout/stderr in individual files

  • triggered by CLI option, task option, or general option
  • make a subdirectory in xdg state dir for the particular benchmark
  • in db, just save external:file_path for both stdout and stderr
  • use either consistent naming scheme (prover-file.out and prover.file.err), or a sha1-of-content storage

cc @quicquid @Gbury
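The sha1-of-content option could be sketched as content-addressed storage, so identical outputs are stored once (the directory layout below is an assumption):

```python
import hashlib, os

def store_output(state_dir, content: bytes) -> str:
    # Content-addressed storage: identical outputs share one file.
    digest = hashlib.sha1(content).hexdigest()
    path = os.path.join(state_dir, digest[:2], digest[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(content)
    return path  # the db would save "external:<path>"

p1 = store_output("bench-state", b"unsat\n")
p2 = store_output("bench-state", b"unsat\n")
assert p1 == p2  # duplicate outputs are deduplicated
```

The two-level directory split (`digest[:2]/digest[2:]`) just keeps any single directory from growing huge.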

basic cactus plot

  • should be produced either for a single bench (between provers), or between 2 different benchmarks being compared
  • automatically present in web UI (if gnuplot installed)
  • CLI command to produce a png/svg explicitly for a given file
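The data behind a cactus plot is simple: sort each prover's solve times and plot cumulative counts. A sketch of that transformation:

```python
def cactus_points(times):
    # x = number of problems solved, y = time taken by the slowest of those x.
    xs = sorted(times)
    return [(i + 1, t) for i, t in enumerate(xs)]

pts = cactus_points([3.2, 0.5, 1.7])
print(pts)  # [(1, 0.5), (2, 1.7), (3, 3.2)]
```

One such series per prover (or per benchmark run) is then handed to gnuplot or whatever backend renders the png/svg.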

Use GNU time to record maximum resident set size in addition to elapsed time

It is very convenient that benchpress currently records the time for each prover run, but it would be even more useful to also have an estimate of the memory used by each process. Fortunately, the GNU time utility provides such functionality.

Something to that effect could be obtained by wrapping the prover calls like this:

/usr/bin/time -f "%e\n%M" -o some_tmp_file prover_cmd_to_run....

And then reading the temporary file for the output of time (%e is for the elapsed real time in seconds, and %M the maximum resident set size in Kbytes).
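Parsing that temporary file back could be as small as this sketch (the `-f "%e\n%M"` format from the command above produces exactly two lines):

```python
def parse_time_output(text: str):
    # /usr/bin/time -f "%e\n%M" writes the elapsed wall time in seconds,
    # then the maximum resident set size in KB, one per line.
    lines = text.strip().splitlines()
    elapsed_s = float(lines[0])
    max_rss_kb = int(lines[1])
    return elapsed_s, max_rss_kb

stats = parse_time_output("12.34\n204800\n")
print(stats)  # (12.34, 204800)
```

Note that `/usr/bin/time` must be the GNU binary: shell built-in `time` does not accept `-f`.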

add `-j` parameter back

use CCPool for thread pool, with a kind of:

val map_p : j:int -> ('a -> 'b fut) -> 'a list -> 'b list fut
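For illustration, the analogous bounded parallel map in Python (the OCaml version would use CCPool futures rather than threads directly):

```python
from concurrent.futures import ThreadPoolExecutor

def map_p(j, f, xs):
    # Apply f to every element with at most j worker threads,
    # preserving the input order in the result list.
    with ThreadPoolExecutor(max_workers=j) as pool:
        return list(pool.map(f, xs))

out = map_p(2, lambda x: x * x, [1, 2, 3])
print(out)  # [1, 4, 9]
```

The `j` bound is what the restored `-j` CLI parameter would feed through.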

handle proof checking

  • [x] add (proof_produce " --some --additional --args $proof_file") to prover stanza

  • add (proof_checker "checker $problem --whatever $proof") to prover stanza
    (or maybe a separate stanza with the (proof_checker <name>) field in prover)
  • add a way in task/cli to specify that proof production is on for some provers; also need to be able to run the prover with and without proof production
  • do the proof checking, with new possible results ("unsat but with bad proof", say)
  • have a way to say "checked n subproofs, skipped m subproofs" (to say we have n/(n+m) % of steps proved)
  • provide views on checked/unchecked proofs in the UI

more dynamic task system

  • par_map should take a pool as argument, not create one
  • dynamic scheduling of tasks such as:
    • run proof checker upon unsat
    • run program for finding expected status

use vega or vega-lite for graphs

Probably vega-lite in browser, with some json obtained from the server (we can make a tiny json library for it for the server I think) would be prettier and more convenient than the current generation of gnuplot images.

embedded http server

  • use tiny_httpd
  • show list of results (just list directory)
  • clicking on result -> render the whole analysis, with tables, possibly graphs

Missing dependency to irc-client

Opam pin installation fails if the package irc-client is not previously installed (the error message tells you to run jbuilder, which itself suggests installing the irc-client package).

Should the dependency to irc-client be optional?

add per-prover parameters

allow, say:

[prover]
cmd = "prover $a $b $file $timeout"
a = [a1, a2, a3]
b = [b1, b2]

and test on the whole matrix.
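Testing "the whole matrix" means expanding the Cartesian product of the parameter values into concrete commands. A sketch of that expansion (substitution syntax follows the example above; the real implementation may differ):

```python
from itertools import product

def expand_matrix(cmd_template, params):
    # One concrete command per combination of parameter values.
    keys = sorted(params)
    cmds = []
    for combo in product(*(params[k] for k in keys)):
        cmd = cmd_template
        for k, v in zip(keys, combo):
            cmd = cmd.replace("$" + k, v)
        cmds.append(cmd)
    return cmds

cmds = expand_matrix("prover $a $b $file $timeout",
                     {"a": ["a1", "a2", "a3"], "b": ["b1", "b2"]})
print(len(cmds))  # 6 = 3 * 2
```

`$file` and `$timeout` are left untouched here, since those are filled in per run as usual.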

Name dirs in config file

Similarly to how provers are identified by a name (rather than their command line), it would be useful to be able to name directories. This would allow a few things such as:

  • currently the PATH for directories is duplicated, and all occurrences have to be kept in sync, which might be annoying
  • we could have the same PATH but with different expect status (e.g. one for testing provers, one for testing linters/other non-provers programs)

export results as well-formatted json + gz

  • store raw results in json (without output, but store the rest of the record)
  • allows to nicely print results later on, or csv-export results
  • can be used from other tools
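The round-trip is a few lines with a standard library (record fields below are hypothetical):

```python
import gzip, json

results = [{"prover": "z3", "file": "p1.smt2", "res": "unsat", "time": 0.42}]

# Export: well-formatted JSON, gzip-compressed.
with gzip.open("results.json.gz", "wt", encoding="utf-8") as f:
    json.dump(results, f, indent=2)

# Other tools can read it back directly.
with gzip.open("results.json.gz", "rt", encoding="utf-8") as f:
    loaded = json.load(f)
assert loaded == results
```

Leaving the raw prover output out of the record (as suggested above) keeps these files small enough to commit or ship around.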

[UI] Add link to filtered list of results in summary

In the summary of a prover's results for a task, there is a line for each tag giving the number of runs that resulted in that tag; it would be nice if that number were also a link to the list of runs with that result (which is possible thanks to the new filtering capability of the list of results).

Plots fails when using custom tags

It looks like the data points given to plots only contain sat/unsat/... results (i.e. builtin tags), but ignore custom tags. For a run that only uses custom tags this results in the following error (from benchpress-server logs):

         line 1: warning: Skipping data file with no valid points

gnuplot> plot  '-' using 1:2 with linespoints t "dolmen"
                                                        ^
         line 1: x range is invalid

Double ulimit wrapper

Currently it seems that provers run by benchpress are wrapped twice in a ulimit wrapper, which is at the very least superfluous, and I suspect might be one reason why I continue to have zombie processes when running benchmarks.

Additionally, the two wrappers do not enforce the same limits, which may or may not make things even more confusing.

For the record here is the command run for a timeout of 600s and 2G memory limit, as per benchpress logs:

cmd "ulimit -t 600 -m 2000000 -Sv 16000000; ulimit -t $(( 1 + 600)) -v $(( 1200 * 2000 )); <prover_cmd>"
