The chefkoch from ems-tu-ilmenau

Further revise the test cases

meaningful subtest messages, possibly even with parameters
create the data once and then change it only slightly in each step
to do so, create tuples (change, subtest message) and loop over them, if possible
don't test anything twice (don't start with a JSON string every time, etc.)
not too many asserts in one test; prefer more subtests
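A minimal sketch of the tuple-and-loop pattern with `unittest`'s `subTest()`. The validator `check_recipe()` and the data layout are hypothetical stand-ins, not chefkoch code: the valid data is built once, and each tuple pairs one small change with the subtest message.

```python
import copy
import unittest


def check_recipe(data):
    """Hypothetical stand-in for a chefkoch validator."""
    if not isinstance(data.get("nodes"), list):
        raise ValueError("'nodes' must be a list")
    for node in data["nodes"]:
        if "name" not in node:
            raise ValueError("every node needs a 'name'")


# Valid example data, created exactly once.
VALID = {"nodes": [{"name": "step1", "inputs": {"a": "param_a"}}]}


class TestCheckRecipe(unittest.TestCase):
    def test_check_recipe(self):
        check_recipe(VALID)  # the unchanged data must pass

        # (change applied to a fresh copy of VALID, subtest message)
        cases = [
            (lambda d: d.update(nodes="not a list"), "nodes is not a list"),
            (lambda d: d["nodes"][0].pop("name"), "node without a name"),
        ]
        for change, message in cases:
            with self.subTest(message):
                data = copy.deepcopy(VALID)
                change(data)
                with self.assertRaises(ValueError):
                    check_recipe(data)
```

Run it with `python -m unittest`; a failing case is reported with its message instead of aborting the whole test at the first broken assert.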

[1] https://gitlab.tu-ilmenau.de/FakEI/InIT/it-ems/Projects/SigMaSense/python_talk/blob/master/2019_11_28_git_submodule_and_unittests/unittests.md

[2] https://gitlab.tu-ilmenau.de/FakEI/InIT/it-ems/Projects/SigMaSense/python_talk/blob/master/2020_01_17_unittests_advanced/unittests_advanced.md

[3] https://www.caktusgroup.com/blog/2017/05/29/subtests-are-best/

Polish the documentation

It should be

complete
correct
in sphinx-numpy format, which is:

This function does bla.

Parameters
----------
boo : type
    This parameter is used for bees.

Returns
-------
blub
    A magnificent value of type blub.

Raises
------
BlobError
    If there is a blob forming that swallows everything along the way.

Define format for the config file

Definition of the implementation skeletons for the data structures

Classes and objects for recipe, step, flavour and everything else that will be entered by the user (kitchenfile and cheffile, if they exist)
documentation of said data structures in Sphinx

Implement the CLI command "chef execute <<simulation_step>>.py"

It should be possible to run a Python file from the command line that has the following format:

The input is a dictionary containing the individual input variables and their values, if any.
The output is a dictionary, again of the form {"output_variablen_name": value}.
It contains a function def execute(**args) that takes no parameters other than the input dictionary.
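A step file in this format might look like the sketch below; the input names `a`, `b` and the output name `result` are hypothetical. Because `execute(**args)` collects keyword arguments, the caller unpacks the input dictionary with `**`. The helper `run_step()` is an assumed sketch of the chef side, not existing chefkoch code.

```python
# execute() would live in the step file, e.g. step.py (hypothetical example step).
def execute(**args):
    """Add the inputs ``a`` and ``b`` and return the sum as ``result``."""
    return {"result": args["a"] + args["b"]}


# run_step() sketches how chef could call a step with its input dictionary.
def run_step(step_function, inputs):
    """Call a step with an input dictionary and check that it returns a dict."""
    outputs = step_function(**inputs)
    if not isinstance(outputs, dict):
        raise TypeError("execute() must return a dictionary")
    return outputs


print(run_step(execute, {"a": 1, "b": 2}))  # {'result': 3}
```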

The following plan is intended for the execution:

chef creates a temporary .json file containing all parameters that have influenced this step so far
The mapping between function arguments and namespace entries is determined from the recipe
CLI: chef execute $step $mapping
e.g.: chef execute step.py a=var_a b=var_c c=bla.blub.d
The input values are hashed
chef renames the temporary parameter and result files after the generated hash value and possibly links to other results. (If necessary, read up on it in the bachelor thesis in the wiki.)
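Parsing the mapping arguments and hashing the inputs could be sketched as follows. The `argument=namespace.entry` syntax is taken from the example command; resolving dotted names in a nested dictionary and hashing canonical JSON (sorted keys) with SHA-256 are assumptions, not a decided design, and only work for JSON-serializable inputs.

```python
import hashlib
import json


def parse_mapping(pairs):
    """Turn ['a=var_a', 'c=bla.blub.d'] into {'a': 'var_a', 'c': 'bla.blub.d'}."""
    mapping = {}
    for pair in pairs:
        argument, sep, entry = pair.partition("=")
        if not sep or not argument or not entry:
            raise ValueError(f"expected argument=entry, got {pair!r}")
        mapping[argument] = entry
    return mapping


def resolve(namespace, dotted_name):
    """Look up 'bla.blub.d' as namespace['bla']['blub']['d']."""
    value = namespace
    for part in dotted_name.split("."):
        value = value[part]
    return value


def input_hash(inputs):
    """Hash the inputs independently of key order (canonical JSON + SHA-256)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


namespace = {"var_a": 1, "var_c": 2, "bla": {"blub": {"d": 3}}}
mapping = parse_mapping(["a=var_a", "b=var_c", "c=bla.blub.d"])
inputs = {arg: resolve(namespace, entry) for arg, entry in mapping.items()}
print(inputs)  # {'a': 1, 'b': 2, 'c': 3}
print(input_hash(inputs))  # 64 hex digits; equal inputs always give the same name
```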

Revise the Recipe module

rename it to something like Core, Backbone or JSON-Parsing
check and proofread logs, error messages and exceptions
- no catch-all except Exception:, instead always catch the specific type together with the error message
possibly use abstract classes to unify methods of Recipe and Flavour
comments should be meaningful
adapt the parsing so that user input is provided via YAML files instead of JSON; can pyyaml take care of the parsing and the security features?
all objects should get a print method so that print(node), print("bla " + node + " blub.") or logger.debug("bla " + node + " blub.") work. If that is not possible, it should at least be a regular method of the class, e.g. apple_pie_recipe.print()
Recipe should inherit from list so that recipe["node1"] can be used instead of recipe.nodes["node1"]
jsonToRecipe(jsonData) should be the constructor of the Recipe class and call the integrity checks automatically
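A minimal sketch of the last three points. `Node`, `Recipe` and `check_integrity()` are placeholders, not the existing chefkoch classes, and it assumes nodes are stored in a dictionary keyed by node name: a dictionary rather than a list, because a lookup like `recipe["node1"]` uses string keys. `__str__` makes `print(node)` work; `"bla " + node` additionally needs `__radd__`.

```python
class Node:
    """Hypothetical recipe node with a name and an input mapping."""

    def __init__(self, name, inputs):
        self.name = name
        self.inputs = dict(inputs)

    def __str__(self):
        return f"Node({self.name}, inputs={self.inputs})"

    def __radd__(self, other):
        # Called for "bla " + node, because str itself cannot add a Node.
        return other + str(self)


class Recipe(dict):
    """Recipe as a dict of nodes, so recipe["node1"] replaces recipe.nodes["node1"]."""

    def __init__(self, json_data):
        nodes = json_data["nodes"]
        super().__init__(
            (entry["name"], Node(entry["name"], entry.get("inputs", {})))
            for entry in nodes
        )
        if len(self) != len(nodes):
            raise ValueError("node names in a recipe must be unique")
        self.check_integrity()  # runs automatically on construction

    def check_integrity(self):
        """Placeholder check: every input must be a reference (a name string)."""
        for node in self.values():
            for argument, reference in node.inputs.items():
                if not isinstance(reference, str):
                    raise ValueError(
                        f"{node.name}.{argument}: expected a reference, got {reference!r}"
                    )

    def __str__(self):
        return f"Recipe({', '.join(self)})"


recipe = Recipe({"nodes": [{"name": "node1", "inputs": {"a": "param_a"}}]})
print(recipe["node1"])  # Node(node1, inputs={'a': 'param_a'})
print("bla " + recipe["node1"] + " blub.")  # bla Node(node1, inputs={'a': 'param_a'}) blub.
```

An f-string such as `f"bla {node} blub."` only needs `__str__` and avoids the operator overloading altogether.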

The PyCodeStyle check and the code formatter "black" should not contradict each other

So far, black cannot be used in the Makefile for the Travis continuous integration, because in a list of the form:

somelist = [
    {"smells": "like", "teen": "spirit", },
    {"has": "a", "funny": "story", },
]

make black removes the space after the last comma, whereas PyCodeStyle raises a "missing whitespace after comma" error, so the continuous integration tests then block the merge.
Task: find out which PyCodeStyle rule (or which black rule) this is and disable one of the two.

Refactor HyperItem -> Shelf

Rename the class HyperItem to FridgeShelf throughout the whole package

[ x ] Class definition
[ x ] Class References
[ x ] Mentions in Wiki
[ ] Mentions in docs

Overview of use cases

Meet with Willi and other colleagues to collect use cases that help verify the architecture.

Define format for dependency file

Full specification of the available features (what can the software do?)

Compare the requirements from the bachelor thesis with the features needed for this specific prototype to create a new requirements list. Specify inputs and desired outputs, or a measurable goal, for every feature.

Add to the README how to start chefkoch.

So far this is not intuitive.

Full specification of the available frontend functions (which CLI subcommands exist and how they behave)

Full implementation of the data structures according to the specification

sort the specified data structures and build an overview of them
implementation of specified data structures
- open new issues accordingly
full test coverage of the main data structures

Functionality of the fridge

hash objects
store intermediate results correctly
find results by function and used parameters
write test cases for those functions
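A minimal in-memory sketch of these four points. The class name `Fridge`, the key layout (step name plus a SHA-256 hash of the canonical JSON of the parameters) and the in-memory storage are assumptions; a real fridge would write results to disk. Storing the full parameters next to the result lets `find()` detect a hash collision instead of silently returning a wrong result.

```python
import copy
import hashlib
import json


class Fridge:
    """Hypothetical cache: results are found by step name and parameters."""

    def __init__(self):
        self._shelves = {}  # step name -> {parameter hash: (parameters, result)}

    @staticmethod
    def hash_parameters(parameters):
        canonical = json.dumps(parameters, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def store(self, step, parameters, result):
        key = self.hash_parameters(parameters)
        self._shelves.setdefault(step, {})[key] = (copy.deepcopy(parameters), result)
        return key

    def find(self, step, parameters):
        """Return the stored result, or None if it is not in the fridge."""
        entry = self._shelves.get(step, {}).get(self.hash_parameters(parameters))
        if entry is None:
            return None
        stored_parameters, result = entry
        # Same hash but different parameters would be a hash collision.
        if stored_parameters != parameters:
            raise RuntimeError(f"hash collision in fridge for step {step!r}")
        return result
```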

Contents of the config

This issue collects the contents that must be included in the config:

Option to specify files in the flavour file directly as a file path instead of {"type": "file", "path": "some/random/path/file.json"}; in return, plain strings are never used as values in the flavour.
Logging level (Debug, Info, Warn, Error, Crit)
Option to also save the logs produced while executing a step to a specified file.
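For the last two points, a sketch using the standard `logging` module. The config keys `log_level` and `log_file`, the logger name `chefkoch` and the mapping of the short level names are assumptions.

```python
import logging

# Map the level names listed in the config to logging's constants.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "error": logging.ERROR,
    "crit": logging.CRITICAL,
}


def configure_logging(config):
    """Configure the 'chefkoch' logger from a config dict (hypothetical keys)."""
    logger = logging.getLogger("chefkoch")
    logger.setLevel(LEVELS[config.get("log_level", "info").lower()])
    logger.handlers.clear()  # avoid duplicate handlers when called twice
    logger.addHandler(logging.StreamHandler())  # console output (stderr)
    if config.get("log_file"):
        file_handler = logging.FileHandler(config["log_file"])
        file_handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(file_handler)
    return logger
```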

Investigate best practices regarding tarballs and temporary files in Python

In chefkoch we would like to support calling shell scripts too, and these often tend to work on multiple files or directories. For a variety of reasons it is desirable to handle these situations with containers:

Thousands or millions of files on a volume (or even in a single directory) clutter systems fast. We should be very thoughtful about inode usage. Often it is better to have 1000 tarballs than 1000 directories with 13 files each.
A container is easily hashable (since it is just a file) and equally verifiable.
A container is less prone to losing a single file (e.g. through an unintended purge of certain file types at a directory level).

Tarballs are a de-facto standard in Unix. They provide no compression (unless combined with gzip to produce the infamous .tar.gz), but retain user permissions and, depending on the tar format, even ACLs. Also, the lack of compression is actually a feature when it comes to performance. Together with the wide support for tar, this makes it the natural choice of container format. However, feel free to suggest alternatives if you come across one.

This issue shall:

Investigate and document how to create, test and extract tar archives from within Python
The solution should add as few additional package dependencies as possible
The solution should avoid security issues (related to #52)

Show example functions for

Packing a folder to a tarball: pack(tarball, *files), where tarball is the name of the resulting archive and files a list of files (optionally: should support globbing)
Extracting a tarball to a given folder: unpack(tarball, destination), where tarball is the name of the archive to be unpacked and destination is where the files shall be unpacked
Testing a tarball for consistency: test(tarball), where tarball is the name of the archive to be tested. If errors are found, an exception shall be raised.
Creating (and deleting) temporary folders or files from within Python. If possible and available, these shall be created in a designated temp area. Find out whether this is readily possible with Python's own means, but don't sink too much energy into the temp-area thing.
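A sketch using only the standard library (`tarfile`, `glob`), so no additional dependencies are needed. The test function is called `check()` here instead of `test()` so that test runners such as pytest do not mistake it for a test. `pack()` stores each file under its base name, which is a simplifying assumption. `unpack()` rejects members that would land outside the destination (absolute paths, `..` components) and links before extracting anything, and additionally uses the `"data"` extraction filter where the Python version provides it; this is a conservative guess at what #52 requires, not a complete security review.

```python
import glob
import os
import tarfile


def pack(tarball, *files):
    """Pack files or folders (glob patterns allowed) into an uncompressed tarball."""
    with tarfile.open(tarball, "w") as tar:  # "w": plain tar, no compression
        for pattern in files:
            # Fall back to the literal name so that a missing file raises an error.
            for path in glob.glob(pattern) or [pattern]:
                tar.add(path, arcname=os.path.basename(path))


def unpack(tarball, destination):
    """Extract a tarball, refusing members that would land outside destination."""
    dest = os.path.realpath(destination)
    with tarfile.open(tarball, "r") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.issym() or member.islnk():
                raise ValueError(f"links are not allowed: {member.name}")
        if hasattr(tarfile, "data_filter"):  # Python 3.12+ and some backports
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def check(tarball):
    """Read every member completely; raises tarfile.TarError if the archive is damaged."""
    with tarfile.open(tarball, "r") as tar:
        for member in tar:
            if member.isfile():
                with tar.extractfile(member) as data:
                    while data.read(1 << 20):  # read in 1 MiB chunks
                        pass
```

For the temp area, `tempfile.TemporaryDirectory()` creates a directory below `tempfile.gettempdir()`, which honours the `TMPDIR`, `TEMP` and `TMP` environment variables, and deletes it again when its `with` block ends.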

Create command line tool `chef`

Tasks

Implement the skeleton of the fridge (cache) class

Implement the skeleton of the fridge (cache) class
Document said skeleton in sphinx
Have test cases to check for correct functionality of the fridge (including hash collisions)
Implement the fridge
Include the fridge into the recipe execution path

Investigate dynamic import handling in Python

Find out how to compile any Python file (given by filename) into a compiled Python bytecode container (.pyc) and how to import these .pyc files (which may also lie outside a module) into the Python namespace as an object.

There shall be functions that

compile a Python file
import Python scripts and compiled objects
check whether a file contains a Python module or a compiled bytecode object
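A sketch built on the standard library's `py_compile` and `importlib`; the function names are hypothetical. `SourcelessFileLoader` imports a `.pyc` without its source, but only if it was compiled by a compatible interpreter version: the file starts with a version-specific magic number, which is also what `is_bytecode()` compares against.

```python
import importlib.machinery
import importlib.util
import py_compile


def compile_file(source_path, pyc_path=None):
    """Compile a .py file to bytecode and return the path of the written .pyc."""
    return py_compile.compile(source_path, cfile=pyc_path, doraise=True)


def import_file(path, module_name="chef_step"):
    """Import a .py or .pyc file from any location and return the module object."""
    if path.endswith(".pyc"):
        loader = importlib.machinery.SourcelessFileLoader(module_name, path)
        spec = importlib.util.spec_from_file_location(module_name, path, loader=loader)
    else:
        spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def is_bytecode(path):
    """True if the file starts with this interpreter's .pyc magic number."""
    magic = importlib.util.MAGIC_NUMBER
    with open(path, "rb") as fh:
        return fh.read(len(magic)) == magic


def is_python_source(path):
    """True if the file compiles as Python source code."""
    try:
        with open(path, "rb") as fh:
            compile(fh.read(), path, "exec")
        return True
    except (SyntaxError, ValueError, UnicodeDecodeError):
        return False
```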

Functionality to execute a single step

execute a single step from the chefkoch backend (within the same process for now, no cache, no concurrency)
provide built-in steps like collect as soon as cache is implemented
test cases

Run the demo example

Or change it so that it runs, and adjust the definitions of the files that are entered by the user.

Clarify the formats of the user input files in the Sphinx documentation

Based on the classes implemented in the "sophie" branch, the input formats for the "Recipe" and "Flavour" files shall be specified. The format shall also be clarified for parameters in the flavour file and for entering "Steps", i.e. simulation steps.
The following changes still have to be implemented in the code and documented:
Recipes shall not allow direct values as inputs of simulation steps. Instead, they contain references to parameter names, whose values are then given in the flavour file.
References to parameters in the flavour file shall be written as parametername only, not as flavour.parametername. The Recipe class checks whether the name exists in the flavour file and interprets it accordingly.
It shall be possible to read several flavour files. Their contents are parsed into the same (then larger) flavour object.
Sub-recipes as simulation steps are not implemented yet; nevertheless, they shall be accepted for now and answered with a "Not yet implemented" warning.
Soon, a magic comment (cf. PEP 263) containing the compatible chefkoch versions shall be placed at the beginning of the user input files.
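Reading several flavour files into one object could be sketched as below. It assumes each file is a JSON object mapping parameter names to values and treats a parameter defined in two files as an error; that is an assumption, "last file wins" would be the obvious alternative.

```python
import json


def load_flavours(*paths):
    """Merge several flavour files into one dict of parameters."""
    flavour = {}
    origin = {}  # parameter name -> file it came from, for error messages
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        for name, value in data.items():
            if name in flavour:
                raise ValueError(
                    f"parameter {name!r} defined in both {origin[name]} and {path}"
                )
            flavour[name] = value
            origin[name] = path
    return flavour
```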

Use a graph library for the recipe

Recipe should inherit from list so that one does not have to write recipe.nodes every time but can use recipe[0], i.e. address nodes by index.
A depth-first search on the dependencies in the recipe is used to make sure there are no dependency cycles. This depth-first search is currently self-implemented and not particularly efficient. Instead, the content of the recipe shall be represented as a directed graph, and the graph library's own depth-first search or acyclicity check shall be used, which is probably more efficient and definitely easier to read in the code. This will also be useful later when the recipe is executed.
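As one option, Python's standard library ships `graphlib.TopologicalSorter` (Python 3.9+), which detects cycles and yields a valid execution order in one go; a full graph library such as networkx would offer more graph algorithms. The dependency dictionary is a hypothetical extraction from the recipe: each node maps to the nodes whose outputs it uses.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return the nodes in an order that respects all dependencies.

    dependencies: {node: iterable of nodes it depends on}
    Raises ValueError if the recipe contains a dependency cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of err.args lists the nodes forming the cycle.
        raise ValueError(f"dependency cycle in recipe: {err.args[1]}") from err


print(execution_order({"plot": {"simulate"}, "simulate": {"generate"}}))
# ['generate', 'simulate', 'plot']
```

The same order can later drive the recipe execution, so cycle check and scheduling come from one call.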

Make sure all components work together smoothly

...and fix the issues found along the way.

Add the finished UML diagram to the wiki

and explain it to Johannes and Willi

Implementation of all functions required to check for workflow consistency

collect all functions required
implement them
and write test cases!!!! O^O

Pass environment variables to recipes

Since I have not studied the structure of the project intensively, please forgive my incorrect use of your terminology – I am a bad cook.

Some python packages parse certain shell environment variables in order to change their behaviour, like logging levels and so on. It would be nice, if one could execute certain recipes with certain shell variables.
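For steps executed as separate processes, one way is to pass a copy of the current environment with the recipe's variables added; the helper `run_with_env()` and the idea of a per-recipe variable dictionary are assumptions. For steps run inside the chefkoch process itself, changing `os.environ` only helps if the package reads the variable after the change, and some packages read such variables only once at import time.

```python
import os
import subprocess
import sys


def run_with_env(command, extra_env):
    """Run a command (list of arguments) with additional environment variables.

    The parent process's os.environ stays unchanged.
    """
    env = {**os.environ, **{key: str(value) for key, value in extra_env.items()}}
    return subprocess.run(command, env=env, check=True, capture_output=True, text=True)


result = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    {"OMP_NUM_THREADS": 4},
)
print(result.stdout.strip())  # 4
```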

Complete concept of parameter handling

a.k.a. consistent handling of command line (frontend) and call (backend) arguments

The user interface for chefkoch is a command line tool, therefore information such as "What is the workflow of the simulation?" needs to be entered as command line arguments, for example the file path of the recipe. This configuration information might be collected in a "cheffile" that holds default file paths for the recipe, the flavour file, the cache folder etc. The "kitchenfile" might specify which hardware can be used to execute a simulation, for example "How many processes can run in parallel?", "How many cores are available?" or "Should the compute cluster be used for execution?"

Still, this first prototype does not use concurrency.

Should the "kitchen file" be planned and "emptily" included now or is it irrelevant for now?
Is the "cheffile" needed or unnecessary overhead?
Which information should be included in the "cheffile"?
How is this information given to the backend of the chefkoch? Central config file? Minimal database?

Investigate security-aware interfacing to shell commands

Arbitrary code execution opens up the very bad situation in which a "digital perpetrator" intentionally runs malicious code in places the programmer never intended for code execution. We should do our best to ensure that chefkoch offers as little opportunity for this as possible.

Some examples where this could be introduced:

system calls such as os.system or subprocess.call
Import (interpretation) of YAML files
Anything eval()-related.

While in most cases it is sufficient to avoid those functions, or to avoid calling them with arbitrary (even partially user-definable) strings, sometimes one just can't get around using them.

In this issue, the investigation shall conclude:

What safe variants exist for the three examples given?
What should a safe call look like?

Please document your findings in the discussion of this issue.
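A sketch of common safe variants from the standard library; the wrapper names are hypothetical. Passing an argument list to `subprocess.run()` without `shell=True` hands each argument to the program verbatim, so no shell ever interprets it; if a shell string is unavoidable, `shlex.join()` quotes every argument. `ast.literal_eval()` accepts only Python literals instead of arbitrary expressions. For YAML, `yaml.safe_load()` from PyYAML plays the same role (it only builds plain Python objects), but is left out of the code because PyYAML is not part of the standard library.

```python
import ast
import shlex
import subprocess
import sys


def run_program(program, *args):
    """Run a program without a shell: each argument is passed verbatim."""
    return subprocess.run([program, *args], check=True, capture_output=True, text=True)


def as_shell_line(argv):
    """Quote every argument, for the rare case where a shell string is unavoidable."""
    return shlex.join(argv)


def parse_value(text):
    """Evaluate literals such as "{'a': [1, 2]}" but never calls or names."""
    return ast.literal_eval(text)


result = run_program(sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo pwned")
print(result.stdout.strip())  # a; echo pwned   (printed literally, not executed)
```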

Define concrete issues from UML diagram

Define a magic comment (PEP 263) at the beginning of the user input files

If possible, a comment in the comment section of the user input files shall record the compatible chefkoch versions, and these shall be checked when the files are read. Standard JSON does not allow comments, so JSON files may need another solution. In the Python files containing the simulation steps, however, the comment is mandatory.
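A sketch of such a check, modelled on the PEP 263 rule that the encoding comment must appear in the first two lines. The comment syntax `# chefkoch-compatible: 0.1, 0.2` and the comparison on major.minor versions are hypothetical choices, since the format has not been fixed yet.

```python
import re

# Hypothetical magic comment, e.g. "# chefkoch-compatible: 0.1, 0.2"
MAGIC = re.compile(r"^[ \t\f]*#.*?chefkoch-compatible:[ \t]*([\d.,\s]+)$")


def compatible_versions(text):
    """Return the versions listed in a magic comment in the first two lines, or None."""
    for line in text.splitlines()[:2]:
        match = MAGIC.match(line)
        if match:
            return [v.strip() for v in match.group(1).split(",") if v.strip()]
    return None


def check_compatibility(text, chefkoch_version):
    """Raise ValueError unless the running major.minor version is listed."""
    versions = compatible_versions(text)
    if versions is None:
        raise ValueError("no chefkoch-compatible comment in the first two lines")
    major_minor = ".".join(chefkoch_version.split(".")[:2])
    if major_minor not in versions:
        raise ValueError(f"file supports chefkoch {versions}, running {chefkoch_version}")
```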

Investigate detecting names of function arguments

Find out how to detect the names of function arguments (keyword arguments) of Python functions.

def aFunction(bla=123, blub='abc'):
    pass

Using Python's inspect capabilities, a function shall be written that returns ['bla', 'blub'] for the given example, i.e. the names of the keyword arguments of an arbitrary function.

Final result: A demonstration script is created that contains the function and exhibits correct functionality. Additional capabilities and interesting notes about the inspection framework are documented in the discussion to this issue.
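A sketch with `inspect.signature()`. Which parameter kinds count as "keyword arguments" is a choice made here: positional-or-keyword and keyword-only parameters are returned, while `*args` and `**kwargs` are skipped because they have no fixed names. One consequence: for a step defined as `execute(**args)` the list is empty, so the input names cannot be read from that signature.

```python
import inspect


def keyword_argument_names(func):
    """Names of all parameters that can be passed as name=value."""
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind in (param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY)
    ]


def aFunction(bla=123, blub='abc'):
    pass


print(keyword_argument_names(aFunction))  # ['bla', 'blub']
```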

Implementation of the recipe execution planning & execution

First: someone has to implement the step execution.

Implementation of the recipe execution planning (schedule). This should use the same execution algorithm but not start the simulation steps. It writes a (log) file containing the execution order to help the user debug the simulation workflow (recipe).
Implementation of the actual recipe execution, without a cache at first, based on that ssssschedule. (We're using Parseltongue, aren't we? ;) )
HINT: Within the same process for now, no cache, no concurrency

Define format for the flavour file

Demo example

Meet with Georg and Seb to build a demo example to verify the architecture and to run the prototype with.

Function that can look up recipe values in the flavour object

In the recipe, flavour parameters are given as inputs for some simulation steps. So far, they are written as {"input1": "flavour.parametername"}. From now on, they shall be written only as {"input1": "parametername"}. When the recipe is checked in inputIntegrity(), the check shall look up whether a parameter name exists in the flavour object and, if so, consider it correct, instead of only looking for the prefix "flavour." as before.
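A sketch of the new lookup, assuming the flavour object behaves like a dictionary keyed by parameter name and that the outputs of other recipe nodes are passed in as a set of known names; the function name and both structures are illustrative only.

```python
def unresolved_inputs(inputs, flavour, known_outputs=()):
    """Return the input references that are neither flavour parameters nor known outputs.

    inputs: {"input1": "parametername", ...} for one recipe node
    flavour: dict-like flavour object with the parameter names as keys
    known_outputs: names produced by other nodes of the recipe
    """
    known_outputs = set(known_outputs)
    return sorted(
        reference
        for reference in inputs.values()
        if reference not in flavour and reference not in known_outputs
    )
```

An empty result means the node's inputs pass the check; otherwise the returned names can go straight into the error message.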

Polish the wiki

It should

have a structure
have a hierarchy within the pages and a helpful navbar
be understandable and useful
be in English

Full specification of the data formats for the definitions of recipes, steps and flavours

Recipe found in sphinx documentation, weak spot: how are steps entered?
Flavour file in sphinx documentation
Steps:
- How can steps be entered?
- Where are they saved?
- How can they be executed?

Look up the step definition in the bachelor thesis and evaluate whether it makes sense for this prototype. Also give a definition of the CLI command, the idea of execution and a specification.

Test and improve the Flavour methods

Test cases for all flavour parsing functions
Input integrity checks
If several flavour files are given, they shall be stored in the same object
Clean up logs, error messages and exceptions
- test until all possible exceptions have been exercised, then catch exceptions by type: not "except Exception as exc:", but rather not at all, so that an unexpected exception still gets noticed.
- If the key (the "password" for the file) is missing in the constructor FileParamValue(entry["file"], entry["key"]), parsing shall continue. If the file is missing, parsing of the flavour shall be aborted.
- Abort when critical items are missing from the input instead of printing an exception
change {.., "type": "mat-file"} to {.., "type": "file"} in the flavour JSON
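A sketch of the intended error handling for a single file entry, using a hypothetical `parse_file_entry()` in place of the `FileParamValue` constructor and a hypothetical `FlavourError`: a missing `"key"` only logs a warning and parsing continues, a missing `"file"` aborts with a specific exception, and no blanket `except Exception` is used.

```python
import logging

logger = logging.getLogger("chefkoch.flavour")


class FlavourError(ValueError):
    """Raised when a flavour entry lacks something parsing cannot do without."""


def parse_file_entry(name, entry):
    """Return (path, key) for a file parameter; key may be None."""
    try:
        path = entry["file"]
    except KeyError:
        # Critical: abort parsing of the flavour with a clear message.
        raise FlavourError(f"parameter {name!r}: 'file' is missing") from None
    key = entry.get("key")
    if key is None:
        # Not critical: warn and continue without a key.
        logger.warning("parameter %r: no 'key' given, continuing without it", name)
    return path, key
```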

Check the content in the fridge for correctness

Look up the thesis and decide on a way to implement the consistency check, and which other checks are important.
Specify those functions.
document
implement
use test cases to help you with development

NameSpace Class

This issue collects requirements for a namespace class.

Recipe.inputIsValid() is a candidate for the namespace class

Dask Tasks

Dear kitchen masters,

I don't have an issue (at least not with chefkoch), but have you had a look at

https://docs.dask.org/en/latest/futures.html ?

Maybe this is worth knowing for chefkoch?

Cheers
Seb

Prevent problems with MAX_PATH

In future, the intermediate results of the simulation shall be stored in a folder named after the intermediate result, in a file named after the hash of all inputs used. The name and file path may then become longer than the system-dependent MAX_PATH limit. Either a solution shall be found for this, or a warning shall be issued when the recipe is read if the names it contains are too long.
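A sketch of the warning option. It follows the `<cache folder>/<result name>/<hash>` layout described in this issue; the function names and the fallback values are assumptions. On Windows it uses the classic MAX_PATH of 260 characters (long-path support can raise it); POSIX systems report their limits through `os.pathconf()`, with `PC_PATH_MAX` for whole paths and `PC_NAME_MAX` for a single file or folder name.

```python
import os
import sys
import warnings


def _conf(directory, name, default):
    """Query a POSIX path limit, falling back to a default if unavailable."""
    try:
        value = os.pathconf(directory, name)
    except (OSError, ValueError):
        return default
    return value if value > 0 else default


def path_limits(directory):
    """Return (max path length, max name length) for the file system at directory."""
    if sys.platform == "win32":
        return 260, 255  # classic MAX_PATH; long-path support may raise it
    return _conf(directory, "PC_PATH_MAX", 4096), _conf(directory, "PC_NAME_MAX", 255)


def check_result_path(cache_dir, result_name, digest):
    """Warn (and return False) if <cache_dir>/<result_name>/<digest> may be too long."""
    max_path, max_name = path_limits(cache_dir)
    path = os.path.join(os.path.abspath(cache_dir), result_name, digest)
    too_long = len(path) >= max_path or any(
        len(part) > max_name for part in (result_name, digest)
    )
    if too_long:
        warnings.warn(f"result path for {result_name!r} may exceed the file system limits")
    return not too_long
```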

Implementation of all functions related to reading a workflow

...which means

entering files and parsing them into the fitting data structures (reading recipe, steps, flavours, ...).
When the user's input has been successfully parsed into an object, there shall be an efficient way to look it up.
use test cases while developing

Sanity check for simulation steps

The simulation steps shall be stored as Python files that take a dictionary as input (containing all inputs actually required), return a dictionary as output and define a function execute(**args). When the location of the simulation steps is entered on the command line, a sanity check shall verify that they have this format and that the key names of the dictionaries match the corresponding step/node in the recipe.
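A sketch that checks the format statically with the `ast` module, so the step file is not executed during the check. The function name is hypothetical, and it only inspects dictionary literals in `return` statements; outputs built up in a variable would need a different approach.

```python
import ast


def sanity_check(source, expected_outputs):
    """Check a step file's source for the execute(**args) format.

    expected_outputs: output names the recipe node expects.
    Raises ValueError describing the first problem found.
    """
    tree = ast.parse(source)
    functions = [
        node
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name == "execute"
    ]
    if not functions:
        raise ValueError("no top-level function execute() found")
    args = functions[0].args
    if args.kwarg is None or args.posonlyargs or args.args or args.kwonlyargs or args.vararg:
        raise ValueError("execute() must take exactly one **kwargs parameter")
    returned = set()
    for node in ast.walk(functions[0]):
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Dict):
            returned |= {
                key.value for key in node.value.keys if isinstance(key, ast.Constant)
            }
    missing = set(expected_outputs) - returned
    if missing:
        raise ValueError(f"execute() does not return {sorted(missing)}")
```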

Evaluate (and improve if necessary) the test cases

Good tests

cover nearly all existing code
do not test one and the same thing over and over again
are named test_chefkochFunctionXY() and cover chefkochFunctionXY completely, with one subTest for each test case
create one example of correct data and change only a line or two for each subTest instead of creating a huge bulk of similar data
more knowledge can be gained from Fabian Krieg

Hi Sophie,

very cool. Unfortunately, at the moment I can only point you to a few write-ups that we once prepared for our students [1,2]. Taking a quick look at your tests, I noticed at first glance that there was a hard-coded path and that you have many asserts one after another. Subtests are a good fit there [1,3].

Have a look and see whether the links are of any use to you. Otherwise, just ask if anything comes up! And as soon as I find the time, I will gladly take a closer look at what you are doing.

Best regards

Fabian

[1] https://gitlab.tu-ilmenau.de/FakEI/InIT/it-ems/Projects/SigMaSense/python_talk/blob/master/2019_11_28_git_submodule_and_unittests/unittests.md

[2] https://gitlab.tu-ilmenau.de/FakEI/InIT/it-ems/Projects/SigMaSense/python_talk/blob/master/2020_01_17_unittests_advanced/unittests_advanced.md

[3] https://www.caktusgroup.com/blog/2017/05/29/subtests-are-best/

Add all project related drawings to wiki

Full implementation of the CLI frontend that redirects to the corresponding backend functions

Goals:

There exists a user frontend that connects to the backend API of chefkoch
The backend API was implemented as a skeleton (not yet implemented) within chefkoch
The data structures are not yet implemented in chefkoch
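A skeleton of such a frontend with `argparse` subcommands, each dispatching to a backend function. Only the `execute` subcommand and its `chef execute $step $mapping` form come from this issue list; the placeholder `backend_execute()` and the `main(argv)` signature are assumptions.

```python
import argparse


def backend_execute(step, mapping):
    """Placeholder for the backend API call."""
    return {"step": step, "mapping": mapping}


def build_parser():
    parser = argparse.ArgumentParser(prog="chef", description="chefkoch command line tool")
    subcommands = parser.add_subparsers(dest="command", required=True)

    execute = subcommands.add_parser("execute", help="run a single simulation step")
    execute.add_argument("step", help="path to the step file, e.g. step.py")
    execute.add_argument("mapping", nargs="*", help="argument=namespace.entry pairs")
    execute.set_defaults(handler=lambda args: backend_execute(args.step, args.mapping))
    return parser


def main(argv=None):
    """Parse the command line and hand over to the matching backend function."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Each further subcommand would get its own `add_parser()` call and `handler`, so the frontend stays a thin layer over the backend API.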
