
tool-competition-av's Introduction

Cyber-Physical Systems Testing Tool Competition [OUTDATED]

This repository refers to the 2022 edition. To check the latest version and join the current competition, please visit the current repository.

Contacts

For more information on the 2022 edition, contact:

Dr. Alessio Gambi - IMC Krems, Austria

Dr. Vincenzo Riccio - Università di Udine, Italy

Goal

The SBST Workshop offers a challenge for software testers who want to work with self-driving cars in the context of the usual tool competition.

The competitors should generate virtual roads to test a lane keeping assist system using the provided code_pipeline.

The generated roads are evaluated in the BeamNG.tech driving simulator. This simulator is ideal for researchers due to its state-of-the-art soft-body physics simulation, ease of access to sensory data, and a Python API to control the simulation.

Video by BeamNG GmbH

Note: BeamNG GmbH, the company developing the simulator, kindly offers it for free for research purposes upon registration (see Installation).

Comparing the Test Generators

Deciding which test generator is best is far from trivial and currently remains an open challenge. In this competition, we rank test generators by considering various metrics of effectiveness and efficiency that characterize both the generated tests and the process of generating them, i.e., test generation. We believe that our approach to comparing test generators is objective and fair, and that it provides a compact metric for ranking them.

Ranking Formula

The formula to rank test generators is the following weighted sum:

rank = a * OOB_Coverage + b * test_generation_efficiency + c *  test_generation_effectiveness

where:

  • OOB_Coverage captures the effectiveness of the generated tests, which must expose as many failures as possible (i.e., Out Of Bound episodes) but also as many different failures as possible. We compute this metric by extending the approach adopted in the previous edition of the competition with our recent work on Illumination Search. As an example, our novel approach has already been adopted for the generation of relevant test cases from existing maps (see SALVO). Therefore, we identify the portions of the tests relevant to the OOBs, extract their structural and behavioral features, and populate feature maps of a predefined size (i.e., 25x25 cells). Finally, we define OOB_Coverage by counting the cells in the map covered by the exposed OOBs. Larger values of OOB_Coverage identify better test generators.

  • test_generation_efficiency captures the efficiency in generating, but not executing, the tests. We measure it as the inverse of the average time it takes the generators to create the tests, normalized using the following (standard) formula:

    norm(x) = (x - min) / (max - min)

    Where min and max are the minimum and maximum average test generation times found empirically during benchmarking across all the competitors.

  • test_generation_effectiveness captures the ability of the test generator to create valid tests; therefore, we compute it as the ratio of valid tests over all the generated tests.

Setting the Weights

We set the values of the weights in the ranking formula (i.e., a, b, and c) to rank higher the test generators that trigger many and diverse failures; test generation efficiency and effectiveness are given equal but secondary importance. The motivation behind this choice is that a test generator's main goal is to trigger failures, while being efficient and effective in generating the tests is of secondary importance.

The following table summarizes the proposed weight assignment:

a b c
0.6 0.2 0.2
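
Putting the formula and the weights together, a minimal sketch of how the final rank could be computed (purely illustrative; the variable names, the normalization helper, and the exact reading of "inverse" are assumptions, not the competition's actual scripts):

    A, B, C = 0.6, 0.2, 0.2  # the weights a, b, c from the table above

    def norm(x, min_value, max_value):
        # Min-max normalization, as in the formula above.
        return (x - min_value) / (max_value - min_value)

    def rank(oob_coverage, avg_generation_time, min_time, max_time,
             valid_tests, generated_tests):
        # One possible reading of "inverse of the normalized average generation
        # time": lower average generation times yield higher efficiency scores.
        efficiency = 1.0 - norm(avg_generation_time, min_time, max_time)
        # Effectiveness: ratio of valid tests over all generated tests.
        effectiveness = valid_tests / generated_tests
        return A * oob_coverage + B * efficiency + C * effectiveness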

Implement Your Test Generator

We make available a code pipeline that integrates your test generator with the simulator by validating, executing, and evaluating your test cases. Moreover, we offer some sample test generators to show how to use our code pipeline.

Information About the Competition

More information can be found on the SBST tool competition website: https://sbst22.github.io/tools/

Repository Structure

Code pipeline: code that integrates your test generator with the simulator

Self driving car testing library: library that helps the integration of the test input generators, our code pipeline, and the BeamNG simulator

Scenario template: basic scenario used in this competition

Documentation: contains the installation guide, detailed rules of the competition, and the frequently asked questions

Sample test generators: sample test generators already integrated with the code pipeline for illustrative purposes

Requirements: contains the list of the required packages.

License

The software we developed is distributed under the GNU GPL license. See the LICENSE.md file.

tool-competition-av's People

Contributors

alessiogambi, alessiogambi-passau, dgumenyuk, fse2020submission, p1ndsvin, spanichella


tool-competition-av's Issues

Time budget

Would it be possible to know the order of magnitude of the time budget during the competition? Will it be minutes, hours, days? It would be useful to know that to determine which approach is more suitable.

Thank you very much again.

Documentation for beamng-user

Setting up the pipeline is quite easy by following your documentation. However, I struggled a bit with the beamng-user argument: it is not documented in GUIDELINES.md under the section Technical considerations.

OOB_percentage reporting

Hello,
would it be possible to integrate oob_percentage reporting into the code pipeline? After the execution of a test case, it would be useful to obtain either the oob percentage value per state or the maximum percentage value, regardless of whether the executed test case passed or failed (similar to the reporting of the distance value within the execution data).

BeamNG Executor

Set up a BeamNG executor that uses beamngpy to set up the scenario, configures BeamNG.AI to drive from the first to the last point of the road, and collects all the simulation/sensor data except images and lidars. Data must be tagged using SIMULATION time (from the Timer sensor).

Creating a road makes the simulation loop

Additionally, the simulation goes into an infinite loop.

This is the code to generate the test:

        test.append( (10, 10) )
        test.append( (10, self.map_size -10) )
        test.append( (self.map_size -10, self.map_size -10) )
        test.append( (self.map_size -10, 10) )

Validating sharp turns

At the moment we do not check the validity of the roads with respect to the curvature, but we should. For example, invalid roads have turns so sharp that it is physically impossible for a car to drive on them.

Improve stats on execution time

Possibly, we should report the duration, in both real time and simulated time, of executing the tests, as well as the time spent generating them.

Unique OBE

Test generators that expose the same problem (e.g., OBE) multiple times are not that effective. We should report, in the statistics and after each failed test execution, whether or not a test failed in the same (or a similar) way before.

Problems with copying the TIG maps

If the script is executed from a directory other than the repository root, the BeamNG executor cannot find the "TIG" levels and fails.
For the moment the workaround is to copy them manually.

Mock Executor

The mock executor can rely on TUM logic for trajectory planning, but will not be able to provide all the data that BeamNG does.

Multi-threaded execution

We understand that running multiple simulations in parallel is not possible. However, is there any other restriction on the number of threads we can use? For instance, is it possible to run a background thread while the simulator is running?
Thanks again!

Distinguish failed test per type

Tests might fail because the car drives out of the lane (OBE) or because a timeout triggers (the ego-car does not move).
The statistics should keep track of this.

Test Oracles

We need to define basic test oracles (an illustrative sketch follows the list):

  • Stand-still oracle. If the car does not move for 10 seconds, the test fails.
  • OBE (approx.). We consider the position of the car (not the bounding boxes), and if it is outside the lane, we trigger a test failure.
  • Overall timeout. The test must complete within 5 minutes (configurable).
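
A minimal, illustrative sketch of how these oracles could be checked (the names, thresholds, and arguments are assumptions, not the pipeline's actual implementation):

    STAND_STILL_TIMEOUT_S = 10   # stand-still oracle threshold
    OVERALL_TIMEOUT_S = 5 * 60   # overall timeout (configurable)

    def evaluate_oracles(elapsed_s, seconds_since_last_move, inside_lane):
        # Return a failure reason, or None if no oracle is violated.
        # All arguments are assumed to be provided by the simulation loop.
        if seconds_since_last_move >= STAND_STILL_TIMEOUT_S:
            return "FAIL: car stood still for 10 seconds"
        if not inside_lane:
            return "FAIL: out-of-bound episode (OBE)"
        if elapsed_s >= OVERALL_TIMEOUT_S:
            return "FAIL: overall timeout exceeded"
        return None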

Factorize interface for test generators

Hi, I am new to this framework. I like the code pipeline and it is easy to use. I would like to share a few thoughts about the interface for generating the test cases :-).

As a new user of this code pipeline, I want to implement only the part of a test generator that describes the actual test case such that I don't need to focus on the test execution and reporting.

Suggestions for refactoring:

  • Factorize the interface of the test generators into an abstract base class (e.g., start and __init__ methods) that the test generators have to implement (see the sketch after this list).
  • Use dependency injection for road_points in the start method:
def start(road_points):
  # add points to `road_points`
  • In case of multiple tests the data structure of road_points could look like road_points = [test1_list_of_points, test2_list_of_points]
  • Extract the code for the test execution.
  • Extract the code for reporting the test outcome.
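
To make the first suggestion concrete, a minimal sketch of such an abstract base class (the names and signatures are hypothetical, not the current pipeline interface):

    from abc import ABC, abstractmethod
    from typing import List, Tuple

    RoadPoints = List[Tuple[float, float]]

    class TestGenerator(ABC):
        # Hypothetical base class that factors out execution and reporting.

        def __init__(self, map_size: int):
            self.map_size = map_size

        @abstractmethod
        def start(self, road_points: List[RoadPoints]) -> None:
            # Append one list of road points per generated test to `road_points`.
            ...

    class SquareTestGenerator(TestGenerator):
        # Example generator producing a single square-shaped road.
        def start(self, road_points: List[RoadPoints]) -> None:
            m = self.map_size
            road_points.append([(10, 10), (10, m - 10), (m - 10, m - 10), (m - 10, 10)])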

These are just my thoughts. What do you think about this design option? Do you see some drawbacks in this?

Car driving out of the lane but output is success

We have a question regarding the output of the tool.

In the description of the competition, it says that "a test fail if the ego-car does not move, or does not reach the end of the road within a timeout (computed over the length of the road), or drives off the lane."

Given the path [(10, 100), (40, 113), (60, 115), (80, 99), (100, 80), (120, 70), (140, 58), (160, 38), (180, 30)], the ego car goes out of the lane in the very first curve. However, the test continues to the end and the output of the test is "success". Shouldn't the output of this test be "failed"? What exactly does it mean to "drive off the lane"? We assume that crossing the yellow line should also be a failure.

Also, does the test always continue to the end even if there is a failure at an early stage? Or should the test actually stop around 00:08s, when the car goes out of the lane?

out_of_lane_480p.mov

Thank you.

Documentation

Improve the documentation to include installation instructions, examples of usage, and instructions to register and set up BeamNG.research.

Test Validators

Implement the listed test validators (a rough sketch of a few such checks is shown after the list), including:

  • min road length
  • max number of points
  • min curvature / road shapes
  • self-intersection and overlapping
  • "Type" checking. The test must be a list of tuples or similar. Not sure how to handle duck typing.

Missing PyOpenGL-accelerate Dependency

I am not sure if this is needed on all systems, but I also had to install the PyOpenGL-accelerate module. Maybe it can be added to requirements-37.txt.

Collect statistics about generation

We need a component that collects some statistics about the generation (a minimal sketch follows the list), including:

  • generated tests
  • valid and invalid tests
  • passed and failed and errored tests
  • execution time
  • generation time
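
A simple sketch of such a statistics container (field names are assumptions; the real component would also need to update and report these values):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class GenerationStats:
        # Illustrative container for the statistics listed above.
        generated_tests: int = 0
        valid_tests: int = 0
        invalid_tests: int = 0
        passed_tests: int = 0
        failed_tests: int = 0
        errored_tests: int = 0
        execution_times_s: List[float] = field(default_factory=list)
        generation_times_s: List[float] = field(default_factory=list)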

"oob_distance" is NaN

In the current version, we are always obtaining NaN as the "oob_distance" value in the JSON files.

Example: "steering": 44.62020603589313,
"steering_input": -0.08308082203328948,
"brake": 0,
"brake_input": 0,
"throttle": 0,
"throttle_input": 0,
"wheelspeed": 9.729266273285269,
"vel_kmh": 35,
"is_oob": false,
"oob_counter": 0,
"max_oob_percentage": 0,
"oob_distance": NaN

After inspecting the code, we realized that in line 30 of oob_monitor.py the clauses of the inline if seem to be flipped.

Currently (lines 29-30):

last_max_oob_percentage = self.last_max_oob_percentage if oob_bb else float("nan")
oob_distance = float("nan") if oob_bb else self.oob_distance(wrt=wrt)

How we believe it should be:

last_max_oob_percentage = self.last_max_oob_percentage if oob_bb else float("nan")
oob_distance = self.oob_distance(wrt=wrt) if oob_bb else float("nan")

Could you confirm that our understanding is correct and fix the code in the repository?
Besides, can we assume that a negative value of oob_distance means that the car is out of bounds and the test failed?

Thank you very much in advance, and happy new year!

time.time_ns() method not available in Python 3.6

I just tried to follow the installation guide, which recommends Python 3.6 (at least it states that things were tested with 3.6).

The time.time_ns() method used on line 120 of competition.py was introduced in Python 3.7. It either has to be replaced, for example by int(round(time.time() * 1e9)), or the docs should not recommend Python 3.6.
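
One possible workaround, if Python 3.6 support is kept, is a small compatibility wrapper like the following (an assumption on how the fix could look, not the project's actual patch):

    import time

    def time_ns():
        # Use time.time_ns() where available (Python >= 3.7); otherwise fall
        # back to time.time(), which has lower resolution.
        if hasattr(time, "time_ns"):
            return time.time_ns()
        return int(round(time.time() * 1e9))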

Test Simulation Timeout

We do have a timeout mechanism in place that triggers if the test subject does not move. However, this is not yet tested, and we should probably also introduce a way to specify a timeout on the entire simulation.
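
A minimal sketch of what an overall simulation timeout could look like (the helper and its callables are hypothetical; the actual mechanism would live inside the executor):

    import time

    def run_with_timeout(step, is_done, timeout_s=300):
        # Run `step()` repeatedly until `is_done()` returns True or `timeout_s`
        # elapses. Returns True if the simulation finished, False on timeout.
        start = time.monotonic()
        while not is_done():
            if time.monotonic() - start > timeout_s:
                return False
            step()
        return True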

License

Add the following license header in all the source files:

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see https://www.gnu.org/licenses/.

Acknowledge and link this repository and its authors in case of usage.

BeamNG not starting, need configurable userpath

BeamNG does not start with my non-standard userpath (Windows' "My Documents" moved from standard C: drive to F:). This results in a timeout when running the competition scripts.

2021-01-11 14:23:39,040 ERROR    Uncaught exception:
Traceback (most recent call last):
  File "road_definition.py", line 50, in <module>
    main()
  File "road_definition.py", line 39, in main
    bng = beamng.open(launch=True)
  File "F:\Anaconda3\envs\sbst21\lib\site-packages\beamngpy\beamng.py", line 318, in open
    self.skt, addr = self.server.accept()
  File "F:\Anaconda3\envs\sbst21\lib\socket.py", line 205, in accept
    fd, addr = self._accept()
socket.timeout: timed out
2021-01-11 14:23:39,109 WARNING  sys:1: ResourceWarning: unclosed <socket.socket fd=696, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 64256)>

2021-01-11 14:23:39,131 WARNING  F:\Anaconda3\envs\sbst21\lib\subprocess.py:786: ResourceWarning: subprocess 11936 is still running
  ResourceWarning, source=self)

Checking Executor Precondition on Startup

The BeamNG executor requires some preconditions to be met, including environment variables like BNG_HOME. Currently, these are not checked and the execution simply fails with an error message.
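
A minimal sketch of such a startup check (the helper is hypothetical; BNG_HOME is the variable mentioned above):

    import os

    def check_beamng_preconditions():
        # Fail fast with a clear message if BNG_HOME is missing or invalid.
        bng_home = os.environ.get("BNG_HOME")
        if not bng_home:
            raise RuntimeError("BNG_HOME is not set; point it to your BeamNG installation.")
        if not os.path.isdir(bng_home):
            raise RuntimeError("BNG_HOME (%s) does not exist or is not a directory." % bng_home)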

Comments/suggestions on the whole repository structure

  1. I would refactor the current README.md. Specifically, let's separate the description of the repository from the actual information concerning the installation of the pipeline to run the competition. We should reduce the first page to its essential information and links; see the following suggestions.
    We need to have on the main page information like:
  • what the competition is about
  • a little overview of BeamNG research, making it clear we are using their simulator (with some reasoning about it)
  • if possible, a video of the simulation (there are many on YouTube) and potential scenarios
  • what we make available
  • where to find the code and the guidelines to run the competition
  2. Before going into the low-level details, we should cross-reference the SBST web page:
    • add the image and text concerning the competition on CPS
    • add the link to the original web page of the SBST tool competition: https://sbst21.github.io/tools/
  3. If possible, I would refactor the repository into folders:
  • "code-pipeline": containing the code to run the actual competition (partially described on the main page)
  • OPTIONAL folder "previous results": references to previous papers and tools that used the previous version of the pipeline
  • "datasets": previous and current datasets (if any are provided)
  4. For each folder mentioned in the previous point, add a low-level README.md file that describes its content.

  5. Maybe link to the chairs of the SBST tool competition somewhere, with links to their home pages?

  6. I would use a different structure for the README.md of the GUIDELINE, similar to the following:

"Setup Guide and Program Description"
The goal of this part is to give a brief description of how the competition code works.

Supported Operating Systems

  • Windows ...

Pre-Requisites

  • Python ...
  • Memory: XX GB
    ...
  • For usage on Windows, add ...

Setup Information
For information about setup and use, please refer to the instructions provided here.

Pipeline architecture description
...

Pipeline installation
...

TO RUN THE TOOL
....
...
"

For the architecture description: together with Alessio, we got a figure from the BeamNG team for a paper we wrote with them.

  7. Add a section reference, so that the original pipeline is linked to the repository.

  8. Let's add a separate section, "Competition Evaluation Methodology" (or something similar), to describe what the competition actually does.

  9. Move the installation considerations under the installation section of the pipeline.

  10. In general, try to have a README.md file for each sub-folder, so that whoever visits the repository does not get lost.
