tobiasraabe / pipeline
Build system for the scientific publication process.
Home Page: https://pipeline-wp.readthedocs.io/
License: BSD 3-Clause "New" or "Revised" License
Currently, the complete configuration of the project (the rendered content of .pipeline.yaml) is available in a task template. The following issues occur:
A TypeError is raised because _render_task_template receives the same argument twice.
Make a dictionary update before passing arguments to _render_task_template. Task information should be able to overwrite the general configuration ...
OR ... keep the error because silently overwriting a rendered value might produce unexpected behavior.
I would prefer the latter.
Remove globals and document that the configuration is available in every task.
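The TypeError described above can be reproduced in a few lines. The following is a minimal sketch, not the project's actual code: render_task_template is a hypothetical stand-in for _render_task_template, and merge_strict is an illustrative helper for the preferred option of keeping an explicit error instead of silently overwriting a value.

```python
def render_task_template(template, **context):
    """Hypothetical stand-in for _render_task_template; returns its context."""
    return context


def merge_strict(config, task_info):
    """Merge two dictionaries, but refuse to silently overwrite a key."""
    duplicates = sorted(set(config) & set(task_info))
    if duplicates:
        raise ValueError(f"Keys defined in both config and task: {duplicates}")
    return {**config, **task_info}


config = {"source_directory": "src"}
task_info = {"source_directory": "other", "depends_on": "data.csv"}

# Unpacking both dictionaries passes 'source_directory' twice, which Python
# rejects with a TypeError before the function body runs.
duplicate_error = None
try:
    render_task_template("task.py", **config, **task_info)
except TypeError as error:
    duplicate_error = str(error)
```

With merge_strict, the collision is reported with a message that names the offending keys, which may be easier to act on than the generic TypeError.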
By default, comments in Jinja2 are written as {# ... #}, which is unusual for YAML and Python files. Switch to #.
This will become easier once there are separate environments for each set of templates, because each can have its own comment syntax and pipeline has its own.
.pipeline.yaml cannot be rendered, which is unintuitive if you want to define custom paths, for example:
data_directory: {{ source_directory }}/data
I would suggest two passes to read the configuration. This would solve issues like the following:
data_directory: {{ source_directory }}/data
The first pass adds {"source_directory": "src"} to the config. The second pass adds {"data_directory": "src/data"}.
The second step could be repeated until even more nested expressions like
data_directory: {{ source_directory }}/data
soep_directory: {{ data_directory }}/soep
are rendered, too.
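The repeated passes can be sketched as follows. This is only an illustration under simplifying assumptions: the configuration is already loaded into a flat dictionary of strings, and a small regular expression stands in for Jinja2 rendering. The name render_config is hypothetical.

```python
import re

# Matches simple placeholders such as {{ source_directory }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_config(raw):
    """Resolve {{ name }} placeholders by repeated passes over the config."""
    # First pass: values without placeholders are final.
    resolved = {k: v for k, v in raw.items() if not PLACEHOLDER.search(v)}
    pending = {k: v for k, v in raw.items() if k not in resolved}

    # Further passes: render every value that only refers to resolved keys.
    while pending:
        progressed = False
        for key, value in list(pending.items()):
            names = PLACEHOLDER.findall(value)
            if all(name in resolved for name in names):
                resolved[key] = PLACEHOLDER.sub(
                    lambda match: resolved[match.group(1)], value
                )
                del pending[key]
                progressed = True
        if not progressed:
            # Unknown or circular references would otherwise loop forever.
            raise ValueError(f"Cannot resolve: {sorted(pending)}")
    return resolved


raw_config = {
    "source_directory": "src",
    "data_directory": "{{ source_directory }}/data",
    "soep_directory": "{{ data_directory }}/soep",
}
rendered = render_config(raw_config)
```

Stopping with an error when a pass makes no progress guards against circular definitions, which a fixed number of passes would not detect.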
It is a common problem that some tasks in your project are very expensive to run and you do not want to accidentally overwrite them.
At the same time, pipeline keeps track of many changes (rendered template, dependencies, targets) which trigger a new execution. For example, formatting your project with black would trigger many re-runs.
Add a key named persists: true to a task definition, which skips the task as long as its targets exist. If they do not, the task is re-run.
A user could either clean the whole project or selectively delete the tasks' targets.
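The proposed rule could look roughly like this. It is a sketch under assumptions: a task is represented as a dictionary with a targets list and an optional persists flag, and has_changed stands for the result of the existing change detection. Neither the function name nor this representation is taken from pipeline itself.

```python
from pathlib import Path


def should_run(task, has_changed):
    """Decide whether a task needs to run.

    ``task`` is a dict with a ``targets`` list and an optional ``persists``
    flag; ``has_changed`` is the outcome of the usual change detection.
    """
    targets_exist = all(Path(target).exists() for target in task["targets"])
    if task.get("persists", False):
        # A persisting task only runs when one of its targets is missing.
        return not targets_exist
    return has_changed or not targets_exist
```

Deleting a target of a persisting task makes should_run return True again, which matches the selective clean-up described above.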
Sometimes you find yourself inside the project directory and want to build the project. For that, you need to go back to the project root. Why not search upwards for a .pipeline.yaml?
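An upward search of this kind can be written with pathlib alone. The sketch below uses a hypothetical function name, find_project_root, and returns None when no .pipeline.yaml is found in the start directory or any of its parents.

```python
import tempfile
from pathlib import Path


def find_project_root(start=None, marker=".pipeline.yaml"):
    """Return the closest directory at or above ``start`` containing ``marker``."""
    current = Path(start or Path.cwd()).resolve()
    for directory in [current, *current.parents]:
        if (directory / marker).is_file():
            return directory
    return None


# Usage: build a throwaway project and search from a nested directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".pipeline.yaml").write_text("")
    nested = root / "src" / "data"
    nested.mkdir(parents=True)
    found_matches = find_project_root(nested) == root
    not_found = find_project_root(nested, marker="no-such-marker-xyz-123")
```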
yaml. Document it.
yaml which is not a task, handle it gracefully instead of crashing.
Currently, the file in which the task is defined, some task.yaml, is considered a dependency of the task. If it is changed because other tasks are defined, it also leads to a re-run of the unchanged task.
The idea behind this implementation was that you can pass variables to tasks which will then be used inside the template. If such a variable is changed, the task should also be re-run.
One of the former PRs added the rendered template to the task dependencies which will also include the change of the variable. So, this is completely unnecessary.
Remove the task definition from the list of task dependencies.
I have a project where I work with a lot of data, which can be stored in one big file or multiple smaller files. Computing the hashes of all these files for each task which depends on them takes a lot of time.
To be absolutely precise, pipeline needs to compute the hashes of all files. It is not even possible to take a short-cut if one of the hashes is not matching, because you need to update all hashes of all dependencies for this task.
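To make the cost concrete, here is a minimal sketch of such a check. SHA-256, the chunked reading, and the names hash_file and check_dependencies are assumptions for illustration; the source does not say which hash function pipeline uses. Note that every file is hashed even if the first one already differs, because all stored hashes have to be refreshed.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_dependencies(paths, stored_hashes):
    """Hash every dependency; return (changed, updated hashes)."""
    # No early exit: all hashes are needed to update the stored state.
    current = {str(path): hash_file(path) for path in paths}
    changed = any(stored_hashes.get(p) != h for p, h in current.items())
    return changed, current


# Usage: the first check reports a change, the second one does not.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.csv"
    data.write_bytes(b"a,b\n1,2\n")
    first_changed, hashes = check_dependencies([data], {})
    second_changed, _ = check_dependencies([data], hashes)
```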
pipeline is currently a chimera of two components.
To tear these two components apart, I propose a plug-in system similar to pytest or Flask which allows adding more functionality to pipeline. I see these two components.