dgbowl / tomato

tomato: au-tomation without the pain!

Home Page: https://dgbowl.github.io/tomato
License: GNU General Public License v3.0
In the `tomato-0.2` branch, the driver and job interfaces were basically merged into one piece of code: each job talked to each physical device separately, which was causing some race conditions (#28). With only one supported type of device, i.e. the `biologic` driver, this kind of made sense at the time.
In `tomato-1.0`, we want to pave the road for a "device dashboard", meaning the device statuses have to be accessible from outside of the jobs. As a reminder, the relationship between jobs and devices in `tomato-1.0` is shown below:
Basically, each device (a digital twin of a physical device) is managed by a driver, and there is only one driver process running that manages all devices of that type. All communication with the physical device is therefore handled by the driver; the individual physical devices (and channels within them) can be addressed when one knows the device address and channel.
The rest of tomato should be completely driver-agnostic, i.e. everything relevant for the measurement comes from the driver (available parameters, units, adjustable limits, etc.). This means the list of techniques and parameters, i.e. the driver-specific language (DSL), has to be defined and documented in the driver docs.
Some functionality, e.g. task scheduling or conditional interruption, should probably be implemented just once and made available to every driver via specific keywords. Currently I can think of the following example: `tomato` calls the driver's `get_data()` every 60 seconds, and sends a stop signal after 10 minutes.

The drivers should be split into separate packages, e.g. `tomato-biologic` or `tomato-bronkhorst`, for easier maintenance. They would share a common `Driver` class, which is inherited from and exposed in these packages. The current model for the class looks like this:

Open questions:

- What should remain on the `tomato` side?
- How to handle driver-specific `Driver` features (e.g. long acquisition time in GC requiring scheduling, or batching of requests for multi-channel devices)?

Restoring from crashes of `tomato`, as well as stopping of running jobs with `ketchup`, should be implemented.
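The class model referenced above is not reproduced in this text. As a purely hypothetical illustration of the direction described (one shared base class per driver package, devices addressed by address and channel), such a class could look like:

```python
from abc import ABC, abstractmethod
from typing import Any

# Purely hypothetical sketch -- not the actual tomato Driver model,
# which is not shown in this text. It only illustrates the idea of a
# common class that each driver package inherits from and exposes,
# with devices addressed by (address, channel).
class Driver(ABC):
    @abstractmethod
    def start_task(self, address: str, channel: int, task: dict) -> None:
        """Submit one task to one channel of one physical device."""

    @abstractmethod
    def get_data(self, address: str, channel: int) -> dict:
        """Poll the device for new data points (e.g. every 60 s)."""

    @abstractmethod
    def stop_task(self, address: str, channel: int) -> None:
        """Send a stop signal to the running task."""
```

Under this model, scheduling keywords like the 60-second `get_data()` polling would be driven by `tomato` itself, with the driver only providing the three entry points.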
A `driver_reset` function that resets every Component of every Device in the Pipeline needs to be re-introduced.
Currently, when the partial `Dataset`s generated by separate devices are concatenated into one using `xarray.concat()`, we align the datasets on the `"uts"` coordinate, but the `data_vars` are unmodified. This means that if multiple devices in a pipeline produce the same column (e.g. `"flow"` for a flow meter), it's difficult to disambiguate them.

Solution: the role of the device in the pipeline should be prepended to all columns.
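A minimal sketch of the proposed fix, with hypothetical roles "MFC" and "flowmeter" and made-up values (using `xarray.merge` for the alignment step):

```python
import xarray as xr

# Two partial datasets from two devices in the same pipeline, both
# producing a "flow" column (roles and values are made up).
mfc = xr.Dataset({"flow": ("uts", [1.0, 1.1])}, coords={"uts": [0.0, 1.0]})
fm = xr.Dataset({"flow": ("uts", [0.9, 1.0])}, coords={"uts": [0.5, 1.5]})

# Prepend the role of each device in the pipeline to its data_vars,
# so the columns remain unambiguous after alignment on "uts".
mfc = mfc.rename({k: f"MFC:{k}" for k in mfc.data_vars})
fm = fm.rename({k: f"flowmeter:{k}" for k in fm.data_vars})

merged = xr.merge([mfc, fm])  # outer join on the shared "uts" coordinate
print(sorted(merged.data_vars))  # ['MFC:flow', 'flowmeter:flow']
```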
It's time to update the payload schema. The wishlist currently includes the following bits:

- `pydantic-2.0` compatibility
- `Payload-0.2` support via an `.update()` mechanism

Tagging @edan-bainglass for comments, with respect to the AiiDAlab-Aurora schemas. It might be a good time to make the payload schema here a "subset" of the other schema.
The I and E ranges could be selected automatically based on C/D rates.
Jobs should be ordered in an internal, 3-step queue:
In the first instance, the jobs should specify a `sample`, a `payload`, and optionally a `pipeline`.
The Python `kbio` package provided by BioLogic should be extended and wrapped by tomato to implement the following functionality:
In a second stage the interface should be completed by including the following techniques:
Currently, `tomato` is Windows-only, as the only real device driver that is currently supported (`biologic`) requires the Windows DLL interface. However, the `dummy` driver should be platform-agnostic, and the code should be modified to work on both Windows and Linux.
Being able to cancel a job that is waiting in the queue would be useful functionality.
On 0.2.x, the sample name is not recorded in the snapshot or final files by default. It could also be useful to record the channel and address used in the final file metadata.
When starting a job, if the BioLogic reports that it is in state "RUN", the job never starts, and it never switches to "ce" (completed with error); it stays in a frozen running state and has to be cancelled manually. The same happens if an error is thrown by `drivers.biologic.start_job` (e.g. there is a problem with the payload or the firmware isn't loaded): the job stays frozen and running.
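One way to avoid the frozen state is to guard the driver call and mark the job as errored on any failure. A minimal sketch with hypothetical names (this is not the actual tomato job runner):

```python
# Hypothetical sketch: wrap the start_job call so that an exception
# (bad payload, firmware not loaded, device already in "RUN") marks
# the job "ce" instead of leaving it stuck in the running state.
# Names are illustrative, not tomato's actual internals.
class Job:
    def __init__(self, payload):
        self.payload = payload
        self.status = "q"
        self.error = None

def launch(job, start_job):
    try:
        start_job(job.payload)
    except Exception as exc:
        job.status = "ce"
        job.error = str(exc)
        return False
    job.status = "r"
    return True
```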
During testing with @lorisercole we found that the I range setting of `keep` does not actually keep the previous I range. This needs debugging and fixing.
`tomato` should be able to create a dataschema, call `yadg`, and place the created datagram in the output folder specified by the user. This is also related to dgbowl/yadg#54.
The job matching process has to be able to specify a `sample` which is required for the job to be started. In the first instance, two things have to be implemented:

- recording which `sample` is loaded in which `pipeline` using `tomato`
- matching a submitted job to a `pipeline+sample` combination

Fix the following tests, which are currently flaky: the output file sometimes does not get generated:
- `test_ketchup_cancel`
- `test_ketchup_snapshot`
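The sample-aware matching requirement above could be sketched roughly as follows (the `Pipeline` fields here are hypothetical, not tomato's actual data model):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of sample-aware job matching: a queued job may
# only start on a ready pipeline that has the required sample loaded.
@dataclass
class Pipeline:
    name: str
    sample: Optional[str]
    ready: bool

def find_pipeline(required_sample, pipelines):
    for pip in pipelines:
        if pip.ready and pip.sample == required_sample:
            return pip
    return None

pips = [
    Pipeline("pip-1", "counter_1", ready=False),
    Pipeline("pip-2", "counter_2", ready=True),
]
print(find_pipeline("counter_2", pips).name)  # pip-2
```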
Device settings should be stored in a persistent location. The settings file should contain the following:

This should probably be done in 3 steps:

- moving `kbio` to a separate package which will be an optional dependency
- the `biologic` driver, perhaps implementing command batching

Tagging @NukP and @edan-bainglass.
`ketchup status` could accept multiple jobids as arguments, e.g. `ketchup status 1 2 3`, which would be equivalent to `ketchup status 1 && ketchup status 2 && ketchup status 3`:
```yaml
- jobid: 1
  name: job-1
  status: q
  submitted: '2022-06-28 16:09:31.463749+00:00'
- jobid: 2
  name: job-2
  status: q
  submitted: '2022-06-28 16:09:31.463749+00:00'
- jobid: 3
  name: job-3
  status: q
  submitted: '2022-06-28 16:09:31.463749+00:00'
```
Feel free to add any other scheduler information that could be useful to retrieve for debug purposes (e.g. pipeline used, ...).
Finally, another useful feature would be the ability to see the list of all jobs (including completed ones), e.g. with a command like `ketchup status queue -a`.
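A sketch of how the multi-jobid syntax could be parsed, assuming an `argparse`-based CLI (hypothetical; not ketchup's actual parser):

```python
import argparse

# Hypothetical sketch: a "status" subcommand accepting one or more
# jobids via nargs="+", plus an -a/--all flag so the queue view can
# include completed jobs as well.
parser = argparse.ArgumentParser(prog="ketchup")
sub = parser.add_subparsers(dest="command")
status = sub.add_parser("status")
status.add_argument("jobids", nargs="+")
status.add_argument("-a", "--all", action="store_true")

args = parser.parse_args(["status", "1", "2", "3"])
print(args.jobids)  # ['1', '2', '3']
args = parser.parse_args(["status", "queue", "-a"])
print(args.jobids, args.all)  # ['queue'] True
```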
With the `dummy` driver, the results returned from a multi-step method are not split into steps but are concatenated into a single step.
Example payload:

```yaml
version: "0.1"
sample:
  name: fake_sample
  capacity: 1.0
method:
  - device: "worker"
    technique: "random"
    time: 35
    delay: 2
  - device: "worker"
    technique: "random"
    time: 20
    delay: 1
```

The output JSON file contains 38 points assigned to a single step, instead of 18+20.
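The expected per-step counts can be derived from the payload itself, assuming the dummy driver emits one point every `delay` seconds over `time` seconds:

```python
import math

# One point every `delay` seconds over `time` seconds per step
# (an assumption about the dummy driver's sampling, consistent with
# the 18 + 20 = 38 points reported above).
method = [
    {"device": "worker", "technique": "random", "time": 35, "delay": 2},
    {"device": "worker", "technique": "random", "time": 20, "delay": 1},
]
points = [math.ceil(step["time"] / step["delay"]) for step in method]
print(points, sum(points))  # [18, 20] 38
```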
The main loop checks the `queue` once per iteration, but checks the `state` of the matched pipelines multiple times per iteration. This may lead to jobs submitted later being executed before jobs submitted earlier.
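One way to restore FIFO ordering, sketched with hypothetical structures (not tomato's actual main loop): read the queue and pipeline states once per iteration, then attempt matches strictly in submission order:

```python
from dataclasses import dataclass

# Hypothetical sketch of a FIFO-safe scheduling pass: jobs are
# matched strictly in submission order within a single pass, so a
# later job cannot overtake an earlier one that could also run.
@dataclass
class Job:
    jobid: int
    submitted: float
    sample: str

@dataclass
class Pipeline:
    name: str
    sample: str
    ready: bool

def schedule_once(queue, pipelines):
    started = []
    for job in sorted(queue, key=lambda j: j.submitted):
        pip = next(
            (p for p in pipelines if p.ready and p.sample == job.sample),
            None,
        )
        if pip is not None:
            pip.ready = False  # claim the pipeline within this pass
            started.append((job.jobid, pip.name))
    return started

# Job 2 was submitted after job 1, but both match the same pipeline;
# only the earlier job gets started.
queue = [Job(2, 2.0, "s1"), Job(1, 1.0, "s1")]
pips = [Pipeline("pip-1", "s1", ready=True)]
print(schedule_once(queue, pips))  # [(1, 'pip-1')]
```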