
rettle's People

Contributors

slaterb1

rettle's Issues

Evaluate feasibility of running Fill operations on a schedule

Overview

Some brew jobs will be better implemented as a running schedule. Currently, when the Brewery goes out of scope, it sends the Terminate command to all workers. Need to evaluate what happens if the Fill operation loops on a schedule: will the Brewery still go out of scope?

Test ideas in htttea
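
As a starting point for that evaluation, here is a minimal sketch of the ownership side of the question. It assumes a hypothetical `Brewery` whose `Drop` only flips a flag (standing in for sending Terminate) and a hypothetical `run_scheduled_fill` that borrows the Brewery while it loops. Under Rust's ownership rules the Brewery cannot be dropped while that borrow is alive, so it stays in scope for as long as the scheduled loop runs in the same scope.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the real Brewery: dropping it records a
/// "terminated" flag instead of sending Terminate to workers.
pub struct Brewery {
    pub terminated: Arc<AtomicBool>,
}

impl Drop for Brewery {
    fn drop(&mut self) {
        self.terminated.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical scheduled Fill: runs `runs` times, sleeping `interval`
/// between runs, and records whether the Brewery was still alive each time.
pub fn run_scheduled_fill(brewery: &Brewery, runs: u32, interval: Duration) -> Vec<bool> {
    let mut alive_during_run = Vec::new();
    for _ in 0..runs {
        // The borrow of `brewery` guarantees it has not been dropped yet.
        alive_during_run.push(!brewery.terminated.load(Ordering::SeqCst));
        thread::sleep(interval);
    }
    alive_during_run
}
```

If the schedule instead runs on a separate thread that does not hold the Brewery, the owner can go out of scope while the loop is still running, which is the case that still needs testing.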

Update components to take Vec<Tea> instead of Tea

Overview

To be able to batch process events and benefit from Rust's "zero-cost abstractions", the operations should occur on arrays of Tea (Vec<Tea>). This will enable the usage of .iter().map().collect().

The onus would still be on the developer and the Fill operation would specify the batch sizes (or Brewery?).

Tasks

  • update all components to take Vec<Tea>
  • investigate batches on Fill or Brewery.
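
A minimal sketch of what batch processing could look like, assuming a simplified `Tea` struct and two hypothetical helpers (`fill_in_batches` and `steep_batch`) that are not part of the current library:

```rust
/// Hypothetical Tea type for illustration; the real Tea may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Tea {
    pub id: i32,
    pub name: String,
}

/// Splits source data into batches of at most `batch_size` items.
/// Assumes `batch_size` is greater than zero (`chunks` panics on 0).
pub fn fill_in_batches(source: Vec<Tea>, batch_size: usize) -> Vec<Vec<Tea>> {
    source
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// A Steep-style transform applied to a whole batch at once.
pub fn steep_batch(batch: &[Tea]) -> Vec<Tea> {
    batch
        .iter()
        .map(|tea| Tea {
            id: tea.id,
            name: tea.name.to_uppercase(),
        })
        .collect()
}
```

Here the batch size is a parameter of the Fill-side helper; moving it to the Brewery would only change who calls `fill_in_batches`.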

Update Steep exec function to work with optional passed parameters

Overview

Currently Steep only accepts &Tea and a parameter. Need to investigate the best way to pass arbitrary arguments (i.e. additional instructions, file paths, a field to target for update).

Research

  • generic function structure / wrapper

Results

There are a few mechanisms that could work:

*Leaning towards Argument trait method, and make the exec functions accept args as Option<>

Tasks

  • research generic function structure
  • create Argument trait (could be empty, aside from as_any?)
  • update exec function to selectively pass arguments
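
The Argument trait idea could be sketched roughly as follows. The names `Argument`, `RenameArgs`, and `rename_exec` are assumptions for illustration, and the exact `Option<...>` type the exec functions would take is a guess (`Option<&dyn Argument>` here). The trait is empty apart from `as_any`, which lets an exec function downcast to the concrete argument type it expects.

```rust
use std::any::Any;

/// Hypothetical trait for arguments passed to an exec function.
pub trait Argument {
    fn as_any(&self) -> &dyn Any;
}

/// Simplified Tea for illustration only.
pub struct Tea {
    pub id: i32,
    pub name: String,
}

/// Hypothetical concrete argument type: a suffix to append to the name.
pub struct RenameArgs {
    pub suffix: String,
}

impl Argument for RenameArgs {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Exec function that uses the arguments if present and of the expected
/// type, and falls back to leaving the name unchanged otherwise.
pub fn rename_exec(tea: &Tea, args: Option<&dyn Argument>) -> Tea {
    let suffix = args
        .and_then(|a| a.as_any().downcast_ref::<RenameArgs>())
        .map(|a| a.suffix.as_str())
        .unwrap_or("");
    Tea {
        id: tea.id,
        name: format!("{}{}", tea.name, suffix),
    }
}
```

One trade-off of this approach is that a mismatched argument type is only detected at runtime, when `downcast_ref` returns `None`.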

Setup input to pull data and push to Brewer working pool

Overview

Currently brew() is called with a fixed brewer that processes the tea with make_tea(). Going forward, a channel needs to be set up so that any brewer that is a member of the brewer pool can pick up the request.

Tasks

  • set up channel to send input data to queue
  • set up brewer pool to pull and process requests
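
A minimal sketch of such a pool using only the standard library, assuming hypothetical names (`Order`, `BrewerPool`) and a doubling of `id` as a stand-in for make_tea(). Each brewer shares one receiver behind a Mutex and picks up whichever order arrives next; dropping the pool sends Terminate to every brewer and joins the threads.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Simplified Tea for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Tea {
    pub id: i32,
}

/// Messages sent to brewers (hypothetical names).
pub enum Order {
    Brew(Tea),
    Terminate,
}

pub struct BrewerPool {
    sender: Sender<Order>,
    brewers: Vec<JoinHandle<()>>,
}

impl BrewerPool {
    /// Spawns `size` brewers; processed teas are sent on `results`.
    pub fn new(size: usize, results: Sender<Tea>) -> BrewerPool {
        let (sender, receiver) = mpsc::channel::<Order>();
        let receiver = Arc::new(Mutex::new(receiver));
        let brewers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                let results = results.clone();
                thread::spawn(move || loop {
                    // The lock is held only while taking one order.
                    let order = receiver.lock().unwrap().recv();
                    match order {
                        Ok(Order::Brew(tea)) => {
                            // Stand-in for make_tea().
                            let _ = results.send(Tea { id: tea.id * 2 });
                        }
                        Ok(Order::Terminate) | Err(_) => break,
                    }
                })
            })
            .collect();
        BrewerPool { sender, brewers }
    }

    /// Queues a tea for whichever brewer is free next.
    pub fn brew(&self, tea: Tea) {
        self.sender
            .send(Order::Brew(tea))
            .expect("brewers have shut down");
    }
}

impl Drop for BrewerPool {
    fn drop(&mut self) {
        // One Terminate per brewer, then wait for all of them to finish.
        for _ in &self.brewers {
            let _ = self.sender.send(Order::Terminate);
        }
        for brewer in self.brewers.drain(..) {
            let _ = brewer.join();
        }
    }
}
```

Because the channel is first-in, first-out, every Brew order queued before the pool is dropped is processed before the brewers see Terminate.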

Create examples/main.rs file and minor cleanup

Overview

Currently all test examples are built and run from bin/main.rs. Project on release will not have this file. The code here needs to be moved to examples/main.rs. In addition some minor cleanup needs to happen to wrap up this project.

Tasks

  • move bin/main.rs to examples/main.rs
  • add documentation to all modules in project
  • update documentation to include links to other crates (placeholder for now until everything is open source + CI)
  • add LICENSE
  • add Contributing.md
  • check on changing Fn based traits to fn function references

Add Ingredient Skim

Overview

Skim Ingredient represents a job that only removes fields on a Tea struct or removes entire Tea struct if conditions are met. Logic for implementation would be the same as Steep (other crates would have to create the logic for remove fields matching or data matching).

Tasks

  • Add Ingredient to ingredient.rs
  • Add logic to make_tea in brewery.rs
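
A rough sketch of the two Skim behaviours, assuming a simplified `Tea` with an optional field and hypothetical functions `skim_notes` and `skim_batch`. Returning `Option<Tea>` is one possible way to signal that the whole Tea should be removed; the real Ingredient may use a different mechanism.

```rust
/// Simplified Tea for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Tea {
    pub id: i32,
    pub name: String,
    pub notes: Option<String>,
}

/// Hypothetical skim function: removes the `notes` field, and removes the
/// entire Tea (returns None) when its name is empty.
pub fn skim_notes(tea: &Tea) -> Option<Tea> {
    if tea.name.is_empty() {
        return None;
    }
    Some(Tea {
        notes: None,
        ..tea.clone()
    })
}

/// Applies the skim to a batch, keeping only the teas that survive.
pub fn skim_batch(batch: &[Tea]) -> Vec<Tea> {
    batch.iter().filter_map(skim_notes).collect()
}
```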

[R&D] Investigate tokio Futures instead of channels for concurrency

Overview

The current implementation of the Brewery is based on channels, similar to the multithreaded server example in Chapter 20 of the Rust Book. It might be more efficient to use tokio's work-stealing architecture for processing the jobs, as there are fewer blocking interactions (i.e. locking the Mutex around rx to receive a message and run the passed closure for iterating over the shared recipe).

Note: This ticket has been archived because the current channel-based implementation achieves ~750ms avg on a MacBook and ~400ms avg on Ubuntu. Switching everything over to Futures may be more complex than the advantages are worth.

Research

  • complications with implementing Futures with tokio
  • benchmarking processing 1,000,000 Tea objects

Create Pour Crate that writes current data object to file as struct

Overview

Data struct management is difficult. The tea objects need to be defined every step of the way. This is not a problem after the Fill step, but if the incoming data from an http request or db table is large (multiple fields), it would be easier to evaluate what the data types are and output the struct to a file (could use serde_json Value struct to help), to be used with the actual running of the pipeline.

Example data from table

{
  "id": 1,
  "name": "test",
  "field1": 23.5,
  "field2": "2019-06-13",
  ...
  "field100": "some text"
}

struct written to file

use std::time::SystemTime;

// Generated struct name; identifiers cannot contain '-', so '_' is used.
struct Tea_aAEjifo4ji3850FS {
  id: i32,
  name: String,
  field1: f32,
  field2: SystemTime,
  // ...
  field100: String,
}

Tasks

  • experiment with serde_json
  • create Pour Ingredient to accomplish above to be included with the library
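
To illustrate the idea without pulling in a dependency, here is a small sketch that maps sampled values to Rust type names and renders a struct definition as text. The `Sample` enum is a hand-rolled stand-in for what serde_json's `Value` would provide, and `rust_type` and `struct_source` are hypothetical helpers, not part of any existing crate. Writing the returned string to a file is left out.

```rust
/// Hypothetical stand-in for a parsed JSON value (serde_json's Value
/// would play this role in a real implementation).
pub enum Sample {
    Int(i64),
    Float(f64),
    Text(String),
    Flag(bool),
}

/// Maps a sampled value to the Rust type name used in the generated struct.
fn rust_type(sample: &Sample) -> &'static str {
    match sample {
        Sample::Int(_) => "i64",
        Sample::Float(_) => "f64",
        Sample::Text(_) => "String",
        Sample::Flag(_) => "bool",
    }
}

/// Renders a struct definition from (field name, sampled value) pairs.
pub fn struct_source(name: &str, fields: &[(&str, Sample)]) -> String {
    let mut out = format!("struct {} {{\n", name);
    for (field, sample) in fields {
        out.push_str(&format!("    {}: {},\n", field, rust_type(sample)));
    }
    out.push_str("}\n");
    out
}
```

Note that this sketch treats a date string such as "2019-06-13" as a plain String; detecting date-like text would need extra logic.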

Add Transfuse Ingredient

Overview

Transfuse combines data from other sources. The original logic was that it would be placed after a series of Fill source ingredients and combine those objects, but the complexity is too high and alignment of data is not guaranteed. This Ingredient should instead use fields on the Tea struct to look up and fetch data from another system (or always pull in additional data that is stamped on all data structs) and add it to the current Tea.

Tasks

  • Add ingredient to ingredient.rs
  • Add logic to make_tea in brewery.rs
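
A minimal sketch of the lookup approach, assuming a simplified `Tea` with a `region_id` key and an in-memory HashMap standing in for the other system. `transfuse_region` and all field names are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified Tea for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Tea {
    pub id: i32,
    pub region_id: u32,
    /// Filled in by the transfuse step when a match is found.
    pub region_name: Option<String>,
}

/// Hypothetical transfuse function: uses a field on the Tea to look up
/// extra data in another source and adds it to a copy of the Tea.
pub fn transfuse_region(tea: &Tea, regions: &HashMap<u32, String>) -> Tea {
    Tea {
        region_name: regions.get(&tea.region_id).cloned(),
        ..tea.clone()
    }
}
```

A Tea whose key has no match keeps `region_name` as `None`, which sidesteps the alignment problem mentioned above.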

Update docs to explain that Tea needs to be overwritten when inherited

Overview

Tea structure needs to be simplified and only include the fields necessary. Need to consider creating Tea as a Trait which can be pulled into other libraries that manipulate data (or keep it a struct but give it a trait definition).

Successfully created Tea as trait, but need to investigate the following:

  • lifetimes of Box objects (Tea is now a Box holding teas with Trait: Tea)
  • adding new() to Tea Trait, error is that it cannot return Box<dyn Tea>

Tasks

  • decide if Trait or struct is better for Tea
  • simplify base "Tea" struct in library
  • investigate the above
  • update README
  • update Tea documentation

R&D

Box lifetimes: Did some research on Box lifetimes. Values such as 5, "some string", and heap memory have 'static lifetimes, which means they are able to exist for the entire length of the program. But as the variables that contain these values go out of scope, the memory is reclaimed via the Drop trait. Therefore, the memory concerns that I had surrounding 'static are not actually a problem.
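
Both open questions can be illustrated with a small sketch. It assumes a hypothetical `GreenTea` type and a global `DROPPED` counter added purely for demonstration. Marking `new()` with `where Self: Sized` is one standard way to keep a constructor on a trait that is also used as `Box<dyn Tea>`, and the counter shows that the boxed teas are released as soon as their owner goes out of scope.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many teas have been dropped (demonstration only).
pub static DROPPED: AtomicUsize = AtomicUsize::new(0);

pub trait Tea {
    /// `where Self: Sized` excludes this method from the trait object,
    /// so `Box<dyn Tea>` remains usable.
    fn new(name: &str) -> Self
    where
        Self: Sized;
    fn name(&self) -> String;
}

/// Hypothetical concrete tea type.
pub struct GreenTea {
    name: String,
}

impl Tea for GreenTea {
    fn new(name: &str) -> Self {
        GreenTea { name: name.to_string() }
    }
    fn name(&self) -> String {
        self.name.clone()
    }
}

impl Drop for GreenTea {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Builds two boxed teas and returns how many were built; the Vec and
/// both boxes are dropped when this function returns.
pub fn brew_and_release() -> usize {
    let teas: Vec<Box<dyn Tea>> = vec![
        Box::new(GreenTea::new("sencha")),
        Box::new(GreenTea::new("matcha")),
    ];
    teas.len()
}
```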

[Tech Debt] Update brew method to take in data from all sources

Overview

The current test implementation only uses the first source in the Pot.sources Vec. This needs to be updated before release of library.

Tasks

  • update brew to handle multiple sources
  • add metadata object to Tea to know when and where the data was collected

The metadata piece is more complicated... I need to reconsider this in the future

Add `copy` function to tea to create exact copy of tea

Overview

The Steep operation needs to manipulate a copy of the &Tea reference passed in, so that it can pass back an edited Tea to be updated by the brewer. Currently Steep has to create an entirely new Tea object and define all fields, which will not scale going forward.

Future Considerations

  • evaluate changing tea objects that gain or lose fields (not the original Tea object copy)

Using clone gets around the issue of what fields to make copies of but more thought and work needs to go into defining the field structure of future Tea trait objects being manipulated (will open a ticket).

Tasks

  • create copy trait on Tea to make exact copy of reference tea object passed

Deriving the Clone trait handled making a mutable copy of the tea object passed to be able to manipulate specific fields.
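
For reference, a minimal sketch of that pattern with a simplified `Tea` and a hypothetical `steep_discount` function: the Steep step clones the borrowed Tea, edits only the field it cares about, and returns the edited copy.

```rust
/// Simplified Tea for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Tea {
    pub id: i32,
    pub name: String,
    pub price: f32,
}

/// Hypothetical steep function: halves the price on a mutable copy and
/// leaves the original Tea untouched.
pub fn steep_discount(tea: &Tea) -> Tea {
    let mut steeped = tea.clone();
    steeped.price *= 0.5;
    steeped
}
```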

Create Ingredient crates

Overview

Before release, I would like to create the following crates to be used with the project:

Crates

  • Fill from LOG file
  • Fill from CSV file
  • Fill from ES
  • Pour to CSV file
  • Pour to ES

[R&D] Investigate impact of changing Tea trait data object

Overview

One limitation of a strongly typed language that needs to know the data structures at compile time is that every transformation of the original data object that changes a field's type or adds/removes a field on the struct needs to be defined as a separate struct. The intent of this ticket is to look into alternatives to get around this handicap or manage it sanely...

Research

  • possibility of creating a util tool that initializes all intermediary structs
  • methods for organizing location of all data struct transforms
  • look into other alternatives and data handling projects
  • investigate weakly typed Tea structs (add Trait to serde_json value struct)
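
One possible way to keep the intermediary structs organized, sketched under assumptions: each transform that changes a field type is expressed as a conversion between two structs, so the struct definitions and the transform live side by side. `RawTea` and `PricedTea` are hypothetical names, and `TryFrom` is just one option among those listed above.

```rust
use std::convert::TryFrom;
use std::num::ParseFloatError;

/// Hypothetical Tea as it arrives from the source: price is still text.
pub struct RawTea {
    pub id: i32,
    pub price: String,
}

/// Hypothetical Tea after the transform: price has a numeric type.
pub struct PricedTea {
    pub id: i32,
    pub price: f32,
}

impl TryFrom<RawTea> for PricedTea {
    type Error = ParseFloatError;

    /// The field-type change is defined once, next to the two structs.
    fn try_from(raw: RawTea) -> Result<Self, Self::Error> {
        Ok(PricedTea {
            id: raw.id,
            price: raw.price.trim().parse()?,
        })
    }
}
```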
