joocer / cronicl
cronicl: data pipelines.
License: Apache License 2.0
Build resilience by making the unit that records state disposable, with the state itself recoverable, e.g. from a database
abandon and alert on timeout
messages have a traced attribute
any messages out of a stage get their traced attribute set to match the incoming message
traced messages are written out to a trace-log
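A minimal sketch of the tracing idea above, not cronicl's actual implementation. The `Message` class, `run_stage` helper and the list used as a trace-log are all hypothetical names chosen for illustration; it assumes a stage is a plain function that returns an iterable of payloads.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Message:
    """Hypothetical message envelope; `traced` marks it for the trace-log."""
    payload: Any
    traced: bool = False


def run_stage(func, message, trace_log):
    """Run one stage; every outgoing message inherits the incoming traced flag."""
    results = []
    for payload in func(message.payload):
        out = Message(payload, traced=message.traced)
        if out.traced:
            # Assumption: the trace-log is any object with an append method.
            trace_log.append(f"{func.__name__}: {out.payload!r}")
        results.append(out)
    return results


def split_words(text):
    """Example stage: one message in, several messages out."""
    return text.split()
```

Because the flag is copied rather than recomputed, a message that fans out into several messages keeps the whole flow in the trace-log.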
Mean, Range, Unique Column
The handling for None is likely to have a mistake somewhere. Although stages shouldn't pass None values through, the system is brittle if it essentially closes down when it accidentally sees one.
Different environments may want different tracers; implement a FileTrace and a StackDriverTrace
Pipelines need a unique name/id
A pipeline manager needs to hold multiple pipelines
Reply queue, and every queue, needs to be named according to the pipeline id
Pipelines may need a trigger, such as a file watcher or a timer, to start the pipeline
Don't kill Reply; items should forward TERM so that it flows through the pipeline
Fanned out processes will cause some issues
replace with a 'virtual' pump, which kicks off any other nodes with no incoming nodes
the virtual pump, called by a rewritten 'execute'
execute tests whether the incoming param is a generator or a value: if it's a value, it calls inner_execute; if it's a generator, it iterates through the values, calling inner_execute on each
execute sets the tracing on the messages
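The generator-or-value dispatch described above could look something like this. It is a sketch only: the `Stage` class is hypothetical, tracing is left out, and `execute` is itself written as a generator so both cases yield results the same way.

```python
import types


class Stage:
    """Hypothetical stage wrapper; `func` is the user-supplied operation."""

    def __init__(self, func):
        self.func = func

    def inner_execute(self, value):
        # Process exactly one value.
        return self.func(value)

    def execute(self, incoming):
        # A generator is iterated value by value; anything else is
        # treated as a single value.
        if isinstance(incoming, types.GeneratorType):
            for value in incoming:
                yield self.inner_execute(value)
        else:
            yield self.inner_execute(incoming)
```

Note that lists and other iterables are deliberately treated as single values here; only true generators are unpacked.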
To ensure sensitive information is not published into logs
running is currently determined solely by whether any messages remain in the queues; this will cause premature termination if there is a long-running job with no waiting messages.
running should check for empty queues and the busy attribute.
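A sketch of the proposed check, assuming each stage exposes an inbox queue and a `busy` flag that it sets while working on a message. The `Stage` class and `is_running` function are illustrative names, not the existing API.

```python
import queue


class Stage:
    """Hypothetical stage with an inbox and a busy flag."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.busy = False  # set to True while a message is being processed


def is_running(stages):
    """The pipeline is running if any stage is busy or has waiting messages."""
    return any(stage.busy or not stage.inbox.empty() for stage in stages)
```

`Queue.empty()` is only a snapshot when other threads are involved, so a real implementation would probably want the stage to set `busy` before it removes the message from its queue.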
rather than using the raw NetworkX library, wrap it to make it easier to use
Audits do not reconcile when stages are run in different threads
Create a BigQuery Sink
Collectors will collate information across multiple messages (for example to calculate the maximum) and then emit the result on an EMIT message.
Collectors should reset their counters on a RESET message.
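As an illustration of the collector behaviour described above, here is a sketch of a maximum collector. The `EMIT` and `RESET` sentinels and the `MaxCollector` class are assumptions made for the example; how cronicl actually represents control messages may differ.

```python
# Hypothetical control messages, modelled as unique sentinel objects.
EMIT = object()
RESET = object()


class MaxCollector:
    """Collates values across messages and emits the maximum on EMIT."""

    def __init__(self):
        self.maximum = None

    def execute(self, message):
        if message is EMIT:
            return self.maximum  # publish the collated result
        if message is RESET:
            self.maximum = None  # start a fresh collection window
            return None
        if self.maximum is None or message > self.maximum:
            self.maximum = message
        return None  # ordinary messages produce no output
```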
I made a typo in a node name, and the error message wasn't helpful: it encouraged me to look for 'function' attributes when the typo in the node name was the problem.
call should wrap the execute in a try block, count the exceptions and add the exception count to the sensor list.
have an error bin for errors to be sent to
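One possible shape for the exception counting and error bin, as a sketch. The `Stage` class, the `sensors` dictionary and the list used as an error bin are hypothetical; the real sensor list may be structured differently.

```python
class Stage:
    """Hypothetical stage whose call wraps the operation in a try block."""

    def __init__(self, func, error_bin):
        self.func = func
        self.error_bin = error_bin  # assumption: any object with append()
        self.sensors = {"processed": 0, "errors": 0}

    def __call__(self, message):
        try:
            result = self.func(message)
        except Exception as err:
            # Count the failure and send the offending message to the
            # error bin instead of stopping the pipeline.
            self.sensors["errors"] += 1
            self.error_bin.append((message, repr(err)))
            return None
        self.sensors["processed"] += 1
        return result
```

For example, `Stage(int, error_bin=[])` would pass `"1"` through as `1` and route `"x"` to the error bin.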
The scheduler is what allows multiple jobs to run; single, fixed-life pipelines are unlikely to need APIs to manage them
If an operation such as reading a file fails (because it's still being written), the trigger should handle the failure gracefully and retry
will make grouping of trace logs for message flows easier.
either have an instantiation ID or track the ultimate parent's ID
A pump that waits for a file to be created and to stop being written to before triggering the start of a pipeline
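A rough sketch of how such a pump might decide that a file is ready. It assumes "stopped being written to" can be approximated by the file size staying unchanged over a few consecutive polls; the function name and all parameters are hypothetical.

```python
import os
import time


def wait_until_stable(path, poll_seconds=1.0, stable_checks=3, timeout=60.0):
    """Return True once `path` exists and its size is unchanged for
    `stable_checks` consecutive polls; return False on timeout."""
    deadline = time.monotonic() + timeout
    last_size = -1
    unchanged = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not created yet (or not readable)
        if size >= 0 and size == last_size:
            unchanged += 1
            if unchanged >= stable_checks:
                return True
        else:
            unchanged = 0  # size changed or file missing: start counting again
        last_size = size
        time.sleep(poll_seconds)
    return False
```

A size-based check can be fooled by a writer that pauses for longer than the polling window, so the pipeline's first stage should still tolerate a partially written file.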
to enable better tracing, tag fields and highlight when the field is changed or referenced in the code of the operation
as part of live debugging, there may be a step which requires more information
rather than have a synchronous control flow, have each step pick up from and drop off to a queue; this is a step toward enabling stages to run in different threads, processes or containers
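A sketch of the queue-per-step idea using threads. The `worker` function and `TERM` sentinel are illustrative assumptions; the sentinel is forwarded downstream so that every later step also shuts down.

```python
import queue
import threading

TERM = object()  # hypothetical termination marker


def worker(func, inbox, outbox):
    """Pick items up from `inbox`, apply `func`, drop results on `outbox`."""
    while True:
        item = inbox.get()
        if item is TERM:
            outbox.put(TERM)  # forward TERM so it flows through the pipeline
            break
        outbox.put(func(item))


def start_step(func, inbox, outbox):
    """Run one step in its own thread and return the thread."""
    thread = threading.Thread(target=worker, args=(func, inbox, outbox))
    thread.start()
    return thread
```

Chaining steps is then a matter of using one step's outbox as the next step's inbox; swapping `threading` and `queue` for process-based equivalents is the natural next move.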
On error
On completion
On start
This can then be used to catch slow downs or exceptions
Create a GCP PubSub Sink
Initially just report status via a RESTful API
Number of items in each queue
The number of items passed through each stage
Proposed New Validator Schema
{
"field_name": {
"type": "type",
"expression": "regex",
"range": (min, max)
}
}
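A sketch of how a validator might apply a schema of this shape. It assumes "type" holds a Python type rather than a type name, "expression" is matched with `re.search`, and "range" is an inclusive `(min, max)` pair; the `validate` function and the example schema are hypothetical.

```python
import re

# Hypothetical schema following the proposed layout.
SCHEMA = {
    "name": {"type": str, "expression": r"^[a-z]+$"},
    "age": {"type": int, "range": (0, 130)},
}


def validate(record, schema):
    """Return a list of error strings; an empty list means the record is valid."""
    errors = []
    for field, rules in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
            continue
        value = record[field]
        expected = rules.get("type")
        if expected is not None and not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
            continue  # further checks are meaningless on the wrong type
        pattern = rules.get("expression")
        if pattern is not None and not re.search(pattern, str(value)):
            errors.append(f"{field}: does not match {pattern}")
        bounds = rules.get("range")
        if bounds is not None:
            low, high = bounds
            if not low <= value <= high:
                errors.append(f"{field}: outside range {low}..{high}")
    return errors
```

Returning a list of errors instead of raising on the first failure lets a single pass report every problem in a record.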
Use as a basis to prioritize stages when the pipeline gets congested;
Prefer stages that emit few records per record received (they reduce the number of records, or create few new records)
When there are matching ratios, prefer stages toward the end of the flow
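The two rules above can be sketched as a sort key. This assumes the ratio is measured as records out per record in (so stages that reduce records sort first) and that each stage knows its position in the flow; `StageStats`, `output_ratio` and `prioritise` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StageStats:
    """Hypothetical per-stage counters."""
    name: str
    records_in: int
    records_out: int
    position: int  # assumption: larger means nearer the end of the flow


def output_ratio(stage):
    """Records emitted per record received; neutral (1.0) with no data yet."""
    if stage.records_in == 0:
        return 1.0
    return stage.records_out / stage.records_in


def prioritise(stages):
    """Lowest output ratio first; ties go to the stage nearest the end."""
    return sorted(stages, key=lambda s: (output_ratio(s), -s.position))
```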