dylan-lang / testworks Goto Github PK

View Code? Open in Web Editor NEW

5.0 9.0 5.0 801 KB

Test framework for Dylan

Home Page: http://dylan-lang.github.io/testworks/

Dylan 100.00%

dylan testing test-framework

testworks's Introduction

Testworks

Testworks is a Dylan unit testing library, also with simple benchmarking support.

Testworks is included as a developer dependency in every project created by dylan-tool.

Documentation is available on opendylan.org. To build the documentation locally, you'll need the furo theme.

testworks's People

Contributors

Stargazers

Watchers

Forkers

cgay waywardmonkeys baragent housel fraya

testworks's Issues

Benchmarks shouldn't require assertions.

Benchmarks have no need for assertions, but they are marked as not-implemented if they don't have any. It's probably worth reinstating "define benchmark" and handling this specially.

specs: need new generic-function-method spec type

In the System test suite the spec says:

  open generic-function \= (<date>, <date>) => (<boolean>);

The failure says:

      date-protocol-functions-test failed
        function = argument 0 type {class <object>} is a supertype of the specified
            type {class <date>} failed [expression "subtype?(actual, spec)" evaluates
            to #f, not a true value.]

(The sense of sub/super-type in the error is incorrect but that's not the issue here.)

This is checking the wrong thing. What this spec wants to check is that there is a specific method on = for (<date>, <date>) but what it's actually checking is that the = gf exists and is defined on parameters that are subtypes of <date>.

I propose a new spec:

  generic-function-method \= (<date>, <date>) => (<boolean>);

to do the needful here.

Handle errors in suite setup functions

I suspect that when a setup function signals an error the entire test run exits. Write a test for this and fix it.

No way for a testworks user to get the runner-output-stream

In some tests, I'd like to get the runner's output stream. While runner-output-stream is exported from testworks, the *runner* variable is not, nor is any utility function for getting that value to pass through to runner-output-stream.

Listing tests, suites should list their tags as well.

The command line options to list tests / suites should list their tags as well.

Don't bother to say "Running test ..." if it is skipped

If I run tests with specific tags, I don't want to see a huge list of skipped tests, I just want to see the tests I ran...

$ ~/dylan/src/opendylan/sources/_build/bin/strings-test-suite-app --tag benchmark 
Running suite strings-test-suite:
Running suite strings-module-test-suite:
Running suite strings-protocol-test-suite:
Running test strings-protocol-constants-test: SKIPPED in 0.000000s
Running test strings-protocol-variables-test: SKIPPED in 0.000000s
Running test strings-protocol-classes-test: SKIPPED in 0.000000s
Running test strings-protocol-functions-test: SKIPPED in 0.000000s
Running test strings-protocol-macros-test: SKIPPED in 0.000000s
Completed suite strings-protocol-test-suite: PASSED in 0.000000s
Completed suite strings-module-test-suite: PASSED in 0.000000s
Running test string-compare-benchmark: PASSED in 0.827351s
Running test string-compare-ic-benchmark: PASSED in 1.059075s
Running test string-equal?-benchmark: PASSED in 0.827623s
Running test string-equal-ic?-benchmark: PASSED in 1.057704s
Running test alphabetic?-benchmark: PASSED in 6.720620s
Running test alphabetic?-byte-string-benchmark: PASSED in 6.753149s
Completed suite strings-test-suite: PASSED in 14.245522s

strings-test-suite PASSED in 14.245522 seconds:
  Ran 3 suites: 3 passed (100.00000%), 0 failed, 0 skipped, 0 not implemented, 0 crashed
  Ran 6 tests: 6 passed (100.00000%), 0 failed, 5 skipped, 0 not implemented, 0 crashed
  Ran 6 checks: 6 passed (100.00000%), 0 failed, 0 skipped, 0 not implemented, 0 crashed

Provide public interface for making tests

It's a legitimate need of user code to create tests, suites, etc., but there's no public interface for this. https://github.com/dylan-lang/opendylan/blob/master/sources/lib/llvm/tests/llvm-asm-tests.dylan imports <test> and make-suite from %testworks in order to create one test per .ll input file.

Ideally it can just be make(<test>, ...) etc.

assert-instance? argument order

instance? has the signature instance? object type ⇒ boolean

However, assert-instance? and friends are type and then object.

I did assert-instance? and assert-not-instance? that way to match check-instance? but now that I'm using it, it just feels wrong.

Should we change assert-instance? and assert-not-instance??
If so, should we also change check-instance??

I think the answer is "yes" and "yes", but it'll inflict a bit of pain when updating due to tests that already use check-instance?. That said, I think it is better to just get this done now.

Adding a function to a suite fails at runtime

If you add a function to a suite as:

define suite some-suite ()
  test some-function;
end suite;

There are no compile time notices, but it fails at runtime with:

{<simple-method>: ??? ()} is not of type {<class>: <component>}
{function  0x60e1c8} is not of type {class <component> 0x113d58}

Report sent to file

Need a way to send a report to a file with no other content in the file (so redirecting stdout at the command line isn't sufficient).

This is needed for having Jenkins run tests and analyze the results.

Provide temp directory management

Testworks should have a standard way to manage temporary directories. For example:

test-temp-directory() to get a unique, already-created, empty test directory named after the test and contained in a directory common to the test run.

${DYLAN}/_test/run-<time>/<full-test-dotted-path-with-fs-unsafe-chars-removed>/?

The temp files should not be removed so test failures can be investigated. Probably just leave it to the user to rm -r _test.

testworks-report diff is useless

I noticed that running dylan-test-suite-app ran fewer tests than running testworks-run --load libdylan-test-suite.so so I tried to figure out which tests hadn't been added to the top-level suite.

I ran dylan-test-suite both ways, generating a json report for each, and then tried to diff them with testworks-report app.json run.json --report diff and got reams and reams of output about checks, making it extremely difficult to know which tests were left out of app.json.

The primary thing I want from a test report diff is to know whether any tests in the OLD run failed in the NEW run and which tests, if any, have been added or removed (and their status).

Checks are too fine-grained to care about, in my opinion. Adding a check is uninteresting as long as the test still passes. Minor tweaks to the description could cause noisy diffs.

Remove component-description slot

It is unused. We should just use regular code comments. (Maybe it was for gui-testworks originally?)

Re-run failing tests

Request from Peter Hull: an option to re-run failing tests.

Surefire output is broken

The surefire output errors on a couple of tests (I have a local fix for that), but even with that fixed, it looks incomplete.

"Expected to fail" improvements

Currently this feature is called expected-failure? and accepts a boolean or a function.

I think it should allow a string value, to document why failure is expected. This could be a separate option, expected-failure-reason, for the case where a function is provided, but whatever we do I think a reason should be required.
I would either call it expect-failure? or expected-to-fail?.

Make suites easier to use

It's too easy to forget to add tests/suites to the top-level suite. For example, as I'm working on simplifying specs I'm adapting dylan-test-suite to it and keep forgetting to transitively add them to dylan-test-suite.

If instead of this

define suite dylan-test-suite ()
  suite dylan-interface-test-suite;
  suite dylan-control-test-suite;
  suite dylan-regressions-test-suite;
  suite types-test-suite;
end;

we could write this

define suite dylan-test-suite ()
  make-root-test-suite();
end;

this wouldn't be a problem.

Error reading xml report

$ _build/bin/dylan-test-suite-app --report xml --report-file dylan-test-suite.master.xml
$ _build/bin/testworks-report dylan-test-suite.master.xml
Missing init keyword "#"seconds":"
Backtrace:
  invoke-debugger:internal:dylan##1 + 0x29
  default-handler:dylan:dylan##1 + 0x15
  default-last-handler:common-dylan-internals:common-dylan##0 + 0x2f5
  error:dylan:dylan##0 + 0x26
  error:dylan:dylan##1 + 0x76
  <test-result> constructor:%testworks:testworks##0 + 0x291
  convert-xml-node:testworks-report-lib:testworks-report-lib##0 + 0x738
  map-as-one:internal:dylan##2 + 0xe8
  convert-xml-node:testworks-report-lib:testworks-report-lib##0 + 0x5d8
  map-as-one:internal:dylan##2 + 0xe8
  convert-xml-node:testworks-report-lib:testworks-report-lib##0 + 0x5d8
  map-as-one:internal:dylan##2 + 0xe8
  convert-xml-node:testworks-report-lib:testworks-report-lib##0 + 0x5d8
  read-xml-report:testworks-report-lib:testworks-report-lib + 0xd0
  read-report:testworks-report-lib:testworks-report-lib + 0xe8
  display-run-options:testworks-report-lib:testworks-report-lib##0 + 0x4d4
  main:testworks-report-lib:testworks-report-lib##0 + 0x85

My inclination is to delete xml reports, since json reports seem to be working. No point maintaining so many formats. (Worth noting that xml report reading seems about 10x slower than json based on the time it took before blowing up as above.)

Also log reports:

// TODO(cgay): this is completely broken. Not sure when it happened.                                                                                                                                                                        
// Maybe we can just remove the "log" format and use json or xml instead?                                                                                                                                                                   
define function read-log-report-1
    (stream :: <stream>, #key ignored-tests = #[], ignored-suites = #[])
 => (result :: <result>)

Error when running tests on OS X

On OS X, I do:

./Bootstrap.3/bin/libraries-test-suite-app \
  --skip-test=common-extensions-protocol-classes-test \
  --skip-suite=semaphores-suite

And after running and while printing the results, it crashes:

libraries-test-suite FAILED in 80.406335 seconds:
  Ran 85 suites: 45 passed (52.941176%), 40 failed, 1 skipped, 0 not implemented, 0 crashed
  Ran 294 tests: 260 passed (88.435376%), 25 failed, 1 skipped, 7 not implemented, 2 crashed
  Ran 20044 checks: 19961 passed (99.585904%), 71 failed, 0 skipped, 0 not implemented, 12 crashed
#f is not of type {<class>: <buffer>}
#f is not of type {class <buffer> 0x472ce8}

No applicable method when listing components

 $ ./_build/bin/nanomsg-test-suite-app -l suites
No applicable method, applying {<sealed-generic-function>: list-component} to {<simple-object-vector>: {<unbound>}, {<test-runner>}}.
No applicable method, applying {<sealed-generic-function> 0x86998} to #[{<unbound> 0x2029f4}, {<test-runner> 0x6fad20}].
Breaking into debugger.
Trace/BPT trap: 5

Implement "expected to fail" for individual assertions

Right now, the support for expected failures is on an entire test definition.

It would be useful to be able to say that certain asserts / checks are expected to fail rather than the entire test.

This comes up in some tests that I have where a number of test cases don't work yet but I want to make sure that each of them is either failing or succeeding as expected. I don't want to create a separate test for each check that should fail.

Suites accept non-test objects

If you accidentally use the name of a normal function as a test (say, because you're reworking the testworks specification suites in the IO library) there is no error at compile time and at load time you get a cryptic error:

$ _build/bin/testworks-run --load libio-test-suite.so
Loading library libio-test-suite.so
{<simple-method>: ??? (<stream-test-info>, <buffered-stream>) => ()} is not present as a key for {<object-table>: size 1047}.
Backtrace:
  invoke-debugger:internal:dylan##1 + 0x29
  default-handler:dylan:dylan##1 + 0x15
  default-last-handler:common-dylan-internals:common-dylan##0 + 0x2f5
  error:dylan:dylan##0 + 0x26
  error:dylan:dylan##1 + 0x76
  key-missing-error:internal:dylan##0 + 0x27a
  gethash:internal:dylan + 0x8d2
  find-root-components:%testworks:testworks + 0x347
  run-test-application:testworks:testworks + 0x296

to be clear, if you have something like this:

define method test-with-input-buffer ... end;
define suite foo-suite ()
  test test-with-input-buffer;
end;

Prettify output a bit

In this output:

Running suite tracing-test-suite...
Running suite tracing-core-test-suite...
Running test test-span-annotations...
Ran check: annotations are lazily created passed
Ran check: span-annotate(span, "Test annotation.") doesn't signal an error  passed
Ran check: 1 = span.span-annotations.size passed
Ran check: "Test annotation." = annotation-description(span.span-annotations[0]) passed

It would be nice if there were ... separating the description from the result text, like this:

Running suite tracing-test-suite...
Running suite tracing-core-test-suite...
Running test test-span-annotations...
Ran check: annotations are lazily created... passed
Ran check: span-annotate(span, "Test annotation.") doesn't signal an error...  passed
Ran check: 1 = span.span-annotations.size... passed
Ran check: "Test annotation." = annotation-description(span.span-annotations[0])... passed

--report-file broken?

Current master:

batavia:testworks bruce (master) $ ./_build/bin/testworks-test-suite-app --report surefire --report-file testworks.surefire.xml
Attempted to call {<simple-method>: ??? (<result>, <stream>)} with 3 arguments
Attempted to call {function  0x5a2fe4} with 3 arguments
Breaking into debugger.
Trace/BPT trap: 5

Should be able to specify which tags to execute from the command line

"not implemented" should be considered a PASS

Currently if a test has no assertions it is marked as "not implemented", but its containing suite (or suites, because that's a thing) is/are counted as FAILED.

It seems to me that the purpose of creating a test but not adding any assertions is

to leave oneself a reminder that's more visible than just a code comment, or
because the test was auto-generated by OD's report generator

In either case, I don't think it's desirable for the test suite to fail, as long as there's a clear indication that of a TODO (which there already is) in the output.

The main point is that a failing test (suite) should be an indication of an actual problem that needs to be looked at before I commit my code. If it ain't broke, show me the green!

Fell through select when trying to select a single test

$ ./_build/bin/nanomsg-test-suite-app --test open-close-socket-nanomsg-test
Fell through select cases on {<unbound>}.
Fell through select cases on {<unbound> 0x1ba9f4}.
Breaking into debugger.
Trace/BPT trap: 5

Assertions for floating-point comparisons

It would be nice if testworks provide at least one method to compare floating-point numbers

Make data files accessible to tests

We need a way to bundle test data files with tests and access them at run-time. Needs build system support (data-files: in the LID file?) and ... accessing those files relative to the exe? @housel says thes are called "application bundles" in some circles. Bazel does it...

let path = test-data-file("path/relative/to/repo/root")

testworks-specs, test-unit, tags

We need to do something about testworks-specs using test-units.

My suggestion would be that we extend the suite-definer to allow inline tests, then where we use tests with test-units in them can become suites with inline tests.

Without addressing this, we can't ignore things in the specs-based tests at a low level (we have to ignore whole huge swathes of test units).

New report type: TeamCity

Documented here:

http://confluence.jetbrains.com/display/TCD8/Build+Script+Interaction+with+TeamCity#BuildScriptInteractionwithTeamCity-ReportingTests

I think this might make doing the IDE integration for DeftIDEA a bit easier.

Documentation: many references to perform-test

There are many REPL examples that use perform-test which doesn't exist any longer.

Documentation: Discusses old check-* rather than assert-*

Under http://opendylan.org/documentation/testworks/usage.html#defining-tests, many things are still discussing the check-* forms instead of the assert-* forms.

Let's also go ahead and say that the check-* forms are deprecated.

Documentation: broken link

Under http://opendylan.org/documentation/testworks/usage.html#defining-tests, the following aren't linked:

check-no-condition (also called check-no-errors)

Run tests/benchmarks multiple times, option.

Add an option to run tests multiple times and report the shortest, average, and longest times. Useful with --tag benchmark, but also possible to use for flaky tests.

with-test-unit is broken

This is breaking our builds on Jenkins.

Please fix ASAP!

Custom assertions

It is reasonable for users of testworks to want to be able to create their own assetions so that they can perform a custom check and, if it fails, a nice error message that is informative.

This would replace something like:

define function static-type-check?(lambda        :: <&method>,
                                   expected-type :: <type-estimate-values>)
  => (stc :: <boolean>)
  // Useful thing to put in the body of a test: infer the return values
  // of lambda, and ask if they match expected-type.
  local method final-computation-type(c :: <&method>)
          // What is the type of the final computation?  (I.e., the return.)
          let cache = make(<type-cache>);
          type-estimate-in-cache(c, cache);                         // fill cache w/types
          type-estimate-in-cache(final-computation(body(c)), cache) // just the last guy
        end;
  let found-type = final-computation-type(lambda);
  if (type-estimate=?(expected-type, found-type))
    #t
  else
    when (*static-type-check?-verbose?*)
      // Sometimes you want a diagnostic for the failure cases.
      dynamic-bind (*print-method-bodies?* = #t)
        format-out("\nFor %=:\nExpected type: %=\n  Inferred type: %=\n\n",
                   lambda, expected-type, found-type)
      end
    end;
    #f
  end
end;

define macro typist-inference-test-definer
  // Define manual compiler test & remember the name in typist inference registry
  { define typist-inference-test ?test-name:name
      ?subtests
    end }
  => {
       define test ?test-name ()
         with-testing-context (#f)
           ?subtests
         end
       end }
subtests:
  // ;-separated test specs expand into a conjunction of test results
  { }               => { }
  { ?subtest; ... } => { ?subtest ... }
subtest:
  // Wrap with try ... end and hand off to static-type-check? to match
  // against the values specification.
  { } => { }
  { ?:expression TYPE: ?val:* }
    => { assert-true(static-type-check?(compile-to-method(?expression),
                                        make(<type-estimate-values>, ?val))); }
end;

We don't want to just know if the test was returning #t or #f. We want to know if one value as type-estimate=? with the other, and if not, a useful error message that gives both values.

undefined binding perform-component

The Jenkins builds are getting this now:

/home/jenkins/workspace/opendylan-release-linux-lucid-x86_64/sources/qa/testworks/specs/variable-specs.dylan:75-84: Serious warning - Reference to undefined binding {perform-component in %testworks}.

test issue

testing gitter integration

Command-line improvements

Current Testworks command line:

$ _build/bin/testworks-run --help
Usage: testworks-run [options]

Run test suites.
      --debug WHAT        Enter the debugger on failure: NO|crashes|failures
  -p, --progress TYPE     Show output as the test run progresses: none|DEFAULT|verbose
      --report TYPE       Final report to generate: failures|full|json|log|none|summary|surefire|xml
      --report-file FILE  File in which to store the report.
      --load FILE         Load the given shared library file before searching for test suites. May be repeated.
      --suite SUITE       Run (or list) only these named suites. May be repeated.
      --test TEST         Run (or list) only these named tests. May be repeated.
      --skip-suite SUITE  Skip these named suites. May be repeated.
      --skip-test TEST    Skip these named tests. May be repeated.
  -l, --list WHAT         List components: all|suites|tests|benchmarks
  -t, --tag TAG           Only run tests matching this tag. If tag is prefixed with '-', the test will only run if it does NOT have the tag. May be repeated. Ex: --tag=-slow,-benchmark means don't run benchmarks or tests tagged as slow.
  -h, --help              Display this message.

Off the top of my head, I propose these changes:

Always output the result summary. If someone truly doesn't want to see this (e.g., automation) then >/dev/null is available.
Get rid of --debug=no. This is equivalent to not specifying the --debug flag.
--report should be only about the report format. Therefore the options are json, xml, surefire, and log. (But we should delete log, and maybe xml.)
Replace --progress with --verbose boolean option to display what you currently get from --report full. --report full goes away.)
Replace --suite and --test with --match, which accepts a regular expression and may be repeated. Suites and tests occupy the same namespace so there is no ambiguity.
Replace --skip-test and --skip-suite with --skip, which accepts a regular expression and may be repeated. Ditto.
Make --list a boolean flag (i.e., and list the tests that would be run based on --match and --skip.
Add a few more short options (see below).
Provide a default location for --report-file. Probably ./testworks-report.json etc. Tell user where the report was written.

The end result looks something like this:

Run tests.
  -d, --debug WHAT        Enter the debugger on failure: crashes|failures
  -v, --verbose           Show output as the test run progresses.
      --report TYPE       Final report to generate: json|log|surefire|xml
      --report-file FILE  File in which to store the report.
  -l, --load FILE         Load the given shared library file before searching for tests. May be repeated.
      --match REGEX       Run only tests and suites matching this regular expression. May be repeated.
      --skip REGEX        Skip tests and suites matching this regular expression. May be repeated.
  -l, --list              Only list the tests that would have been run.
  -t, --tag TAG           Only run tests matching this tag. If tag is prefixed with '-', the test will only run if it does NOT have the tag. May be repeated. Ex: --tag=-slow,-benchmark means don't run benchmarks or tests tagged as slow.
  -h, --help              Display this message.

Fell through select cases on {<benchmark>}.

~/dylan/src/opendylan/sources/system (locators)*$ ../_build/bin/system-test-suite-app --suite locators-utilities-suite --test file-locator-as-string-benchmark
Fell through select cases on {<benchmark>}.
__kernel_rt_sigreturn+0 ()
Kinvoke_debuggerVKiMM1I+53 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
Khandle_missed_dispatchVKgI+561 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
general_engine_node_n+63 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
Kdefault_handlerVKdMM1I+26 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
Khandle_missed_dispatchVKgI+561 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
general_engine_node_n+63 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
0xb761cec9 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
0xb761cf56 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
Khandle_missed_dispatchVKgI+561 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
general_engine_node_n_optionals+106 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
KerrorVKdMM0I+58 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
Khandle_missed_dispatchVKgI+561 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
general_engine_node_n_optionals+106 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libdylan.so)
0xb726c56c (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)
0xb726c5be (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)
Kfind_runnableYPtestworksVtestworksMM0I+59 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)
Kfind_componentYPtestworksVtestworksMM0I+105 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)
Kfind_componentsYPtestworksVtestworksMM0I+294 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)
Kmake_runner_from_command_lineYPtestworksVtestworksI+667 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)
Krun_test_applicationVtestworksMM0I+350 (/home/cgay/dylan/src/opendylan/sources/_build/bin/../lib/libtestworks.so)

Remove distinction between "crashed" and "failed"

I see no use in the distinction between tests that "crashed" and tests that "failed". I only see unnecessary complexity. Am I wrong?

In particular this comes up in relation to the "expected to fail" feature for tests. I notice that the documentation says "This option has no effect on tests which are not implemented or which have crashed." Why would this feature affect failures but not crashes? (For that matter why would it not affect not implemented tests? If the test is not implemented, don't mark it as expected to fail! But I digress....)

In the summary output, failures and crashes are listed separately, and not next to each other, leaving it up to the user to do the addition, a minor annoyance:

testworks-run FAILED in 0.029578 seconds:
  Ran 7 suites: 6 passed (85.714288%), 1 failed, 0 skipped, 0 not implemented, 0 crashed
  Ran 45 tests: 44 passed (97.777776%), 0 failed, 0 skipped, 0 not implemented, 1 crashed

What does it mean for a suite to fail vs crash? Unclear. There is one sentence in the doc that hints at it: "If setup-function signals an error the entire suite is skipped and marked as “crashed”." Is that the only way it's marked as crashed? Why not mark a suite crashed if one of its tests crashed too? Dunno.

In my view we should remove the distinction. Simplify simplify simplify.

(But what to do when a suite setup function fails is an open question. I suggest we simply note the suite setup function failure in the results and let the tests themselves run as normal. They will probably fail due to the setup failure, providing an unmistakable signal.)

check- vs. assert-

When I was first exposed to the xUnit frameworks (SUnit, jUnit, nUnit, etc., see http://xunitpatterns.com/xUnit.html), I noted that

The xUnit frameworks use a semantics of aborting the entire test method when an assertion fails.
This is considered best practice for test frameworks nowadays; it eliminates meaningless cascades
of assertion failures, and encourages a no-broken-windows policy more than a
note-failure-and-continue policy would.
Naming the assertion methods assertWhatever is more suggestive of that semantics,
by comparison with assert statements and assert macros in various programming languages.
(Not that this is a universal choice; CppUTest uses CHECK_WHATEVER and Google Test
uses EXPECT_WHATEVER.)
For testworks, we could adopt the new abort-test-on-failure semantics for macros named assert-,
and keep the existing semantics on the check- macros in existing checks.

Unfortunately I never communicated that observation to anyone. And so, when the current assert-* macros were added in 2013, I considered it a missed opportunity.

However, after five years I still think it would be a good idea. I suggest that we adopt abort-test-unit-on-failure semantics for the assert-* macros, and keep (but discourage) check-* semantics for existing tests.

Failure messages from testworks specs can be confusing

20:40 < brucem> housel: it always seems to me like this message is backwards / wrong:
20:40 < brucem>   function debug-message argument 0 type {class <byte-string>} is a supertype of the specified type {class <string>}: [expression 
                "subtype?(spec, specializer)" evaluates to #f, not a true value.]
20:49 < housel> The spec says "function debug-message (<string>, #"rest") => ();", i.e. that it should be able to accept any <string>
20:49 < housel> But the implementation is only defined for <byte-string>
20:50 < housel> If argument 0 were a supertype of the spec, it would be able to accept all specified values
20:53 < brucem> oh ... the message is supposed to describe the correct result ... but I read it as describing the problem.
20:53 < brucem> cgay: ^ confusing.
20:55 < housel> Yeah, the message is the name of the check; some sort of quotation might be helpful
20:57 < brucem> in that particular case ... debug-message needs to be a <byte-string> as it just calls to primitive-debug-message and is a function so that 
                it can avoid any sort of generic dispatch and be safe for use within dispatch or other low level code.

New status: "expected failure"

We have a bunch of tests that we know fail today, but we'd like to ignore that and look at keeping the ones that pass passing and look at the current failures later. The way things are now, if a test starts to fail here or there, we won't notice in the sea of existing failures.

I'd like to add a new test result status $expected-failure. This also requires a matching ``$unexpected-success`. The idea is that we run the test, but then look at it to see if we expected it to fail or not and then return 1 of the new statuses.

First, via the test defining macro:

define test foo (expected-failure: #t)
  ...
end test;

There are also cases where we want to expect to fail in some situations, but not all:

define test foo (expected-failure-if: method () $os-name == #"win32" end)
  ...
end test;

However, this doesn't work for things using testworks-specs currently. For that, we need to land my testworks-specs changes that simplify it a lot and let us then modify things so that you can pass keywords to function-test and the other macros.

This will require some changes to how we output result summaries. We will also probably have to modify the reporting output that we use for integrating with Jenkins.

JSON report reader can't restore error results

The new JSON report reader dies when trying to restore the dylan-test-suite report, apparently due to one of the check results being an error; a case that doesn't occur in the testworks-test-suite so I missed it.

Track test output with test results

Feature request! (or maybe it can already be done...)
Quite a few tests produce output which isn't directly a test pass or fail (e.g. the threads stuff). Would it be possible to provide a function to log this? (instead of using format-out ad hoc in the tests). Then the output could be captured in the report but otherwise not shown (configurable by command line options)

Notify of completion of test / suite and give the time elapsed

In this output:

Running suite tracing-test-suite...
Running suite tracing-core-test-suite...
Running test test-span-annotations...
Ran check: annotations are lazily created passed
Ran check: span-annotate(span, "Test annotation.") doesn't signal an error  passed
Ran check: 1 = span.span-annotations.size passed
Ran check: "Test annotation." = annotation-description(span.span-annotations[0]) passed
Ran check: span-annotate(span, "Second test") doesn't signal an error  passed
Ran check: 2 = span.span-annotations.size passed
Ran check: "Second test" = annotation-description(span.span-annotations[1]) passed
Running test test-span-data...
Ran check: data vector is lazily allocated passed
Ran check: span-add-data(span, "key1", "value1") doesn't signal an error  passed
Ran check: 2 = span.span-data.size passed
Ran check: "key1" = span.span-data[0] passed
Ran check: "value1" = span.span-data[1] passed
Ran check: span-add-data(span, "key2", "value2") doesn't signal an error  passed
Ran check: 4 = span.span-data.size passed
Ran check: "key2" = span.span-data[2] passed
Ran check: "value2" = span.span-data[3] passed
Running test test-span-stopped...
Ran check: spans should start out running passed
Running test test-span-accumulated-time...
Ran check: spans have no accumulated time while running passed

It'd be nice if there were lines line this:

Completed test test-span-data... passed (0.03s)

And similar for suites.

Feature request: assert-not-equal

I'd like to have assert-not-equal

Right now, I have to just do assert(... ~= ...) which doesn't give nice output.