process_tracker_python's People

Contributors

dependabot[bot], opendataalex

process_tracker_python's Issues

Add memory and cpu status collectors

Need to further research adding collectors for memory and CPU utilization to expand the basic capabilities built for performance cluster management in #10.

Process Extracts in chronological order

Extracts should be processed in order of date registered. While the current code should already behave this way, the queries that return extracts need to enforce that ordering explicitly.
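As a minimal sketch of the ordering rule, assuming a hypothetical `ExtractRecord` with a `registered_at` timestamp (neither name is taken from the project), the sort could look like this. In a database query the same rule would be an explicit ORDER BY on the registration date.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExtractRecord:
    # Hypothetical stand-in for an extract row; not the project's actual model.
    filename: str
    registered_at: datetime


def in_registration_order(extracts):
    """Return extracts oldest-registered first, with ties broken by filename."""
    return sorted(extracts, key=lambda e: (e.registered_at, e.filename))
```

Breaking ties on filename keeps the order deterministic when two extracts are registered at the same moment.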

Process Extract Association

When ProcessTracker finds extracts, it needs a method to associate the process run with those extracts and change their status. Finders shouldn't necessarily do this by default, though that would be an obvious place for it.

Handle Extract dependencies

Need to be able to handle the situation where an extract depends on another extract being completed before its children get processed.
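One possible way to order dependent extracts is a topological sort. The sketch below uses the standard library's `graphlib` (Python 3.9+); the `processing_order` helper and the shape of the `dependencies` mapping are assumptions for illustration, not part of process_tracker.

```python
from graphlib import TopologicalSorter


def processing_order(dependencies):
    """Order extracts so every extract comes after the ones it depends on.

    `dependencies` maps each extract name to the set of extract names it
    depends on (hypothetical structure). A circular dependency raises
    graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

A parent extract always appears before its children in the returned list, so processing the list front to back respects the dependencies.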

S3 capabilities not configured correctly

Currently, process_tracker only handles AWS CLI addresses (s3://). Should also allow URL addresses that use the https method (https://.s3.amazonaws.com/). Need to modify all code working with s3 to use that.
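A rough sketch of accepting both address styles, using only `urllib.parse`. The `split_s3_address` helper is hypothetical, and it assumes the virtual-hosted form where the bucket name precedes `.s3.amazonaws.com`; regional or path-style endpoints are not covered.

```python
from urllib.parse import urlparse


def split_s3_address(address):
    """Return (bucket, key) for an s3:// or virtual-hosted https:// address."""
    parsed = urlparse(address)
    key = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        return parsed.netloc, key
    suffix = ".s3.amazonaws.com"  # assumed host suffix for the https form
    if parsed.scheme == "https" and parsed.netloc.endswith(suffix):
        return parsed.netloc[: -len(suffix)], key
    raise ValueError(f"Not a recognized S3 address: {address}")
```

Normalizing both forms to a bucket and key in one place would let the rest of the S3 code stay unaware of which style the user supplied.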

Process Concurrency

Processes that don't require locking should still have some performance management, such as a limit on max concurrent runs.
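The check itself could be as small as the sketch below. Both `can_start_run` and the idea of storing a per-process `max_concurrent` value are assumptions; how the running count is obtained is left open.

```python
def can_start_run(running_count, max_concurrent=None):
    """Allow a new run while fewer than max_concurrent runs are active.

    A max_concurrent of None is treated as "no limit" (an assumption).
    """
    if max_concurrent is None:
        return True
    return running_count < max_concurrent
```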

Switch Unit Test suite to PyTest

Have been recommended to switch over to PyTest. This is to track the testing and investigation as well as make the switch if found worthwhile.

CLI Tool Process Status Change

The CLI tool should find the latest process run by given process name and be able to change its status (provided it's not running or has failed). This would be ideal for handling processes that have gone into 'on hold' status.

Location Audit Info

As an extension to other audit information, we can also track information about locations:

  • number of files

Extract Location Names

Location Tracker needs to handle duplicate names gracefully. While names should still be unique, the code should try to catch the error. Currently the unique constraint on the database triggers the rollback, but that should be the last resort.
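A sketch of "check first, constraint as last resort", using `sqlite3` from the standard library as a stand-in data store. The `location` table, its `name` column, and `register_location` are hypothetical and do not reflect the project's actual schema or API.

```python
import sqlite3


def register_location(conn, name):
    """Insert a location name; return False instead of failing on a duplicate."""
    exists = conn.execute(
        "SELECT 1 FROM location WHERE name = ?", (name,)
    ).fetchone()
    if exists:
        return False  # caught in code; the unique constraint is never hit
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO location (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        return False  # last resort: another writer inserted the name first
    return True
```

The explicit lookup handles the common case cleanly, while the `IntegrityError` branch still covers a race between the lookup and the insert.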

Add Extract contents getter

Quality of life improvement to add a function to get the file's contents. Question is how far does this go (i.e. what file types?) and should this be part of the framework? Wanted to at least make a note to give it further thought.

Add Dataset Object Type

Extracts should be associated to a dataset type. Instead of its own object, maybe associate to Source?

Read config file from s3

Need to be able to read config file from s3 location. This is to support using tools like lambda.

Web User Interface

Need to build a user interface so that audits can be easily reviewed. Also should allow for management of Process Tracking framework.

CLI Cascade Delete?

Need to verify if the CLI tool performs a cascade delete when deleting lookup objects. Need to prevent it if it does.

Funky Calls On process_tracker Import

When importing process_tracker, lots of stuff is being kicked off due to the settings manager. Need to stop initializing certain things if they are only going to throw an error (or better yet, fix it so it does what it's supposed to without throwing the errors).

TravisCI Fails on Tagged Build

TravisCI reports a failure on tagged builds even though the build itself is successful. This is due to a race condition: whichever database finishes first deploys instead of waiting for both to finish.

Package not packaged correctly

Found with the initial beta release that the package is not packaged correctly. More than just the core classes are importable.

Enable CI/CD Stack

Once the initial version is ready to go, need to research and integrate CI/CD so that going forward new releases can be automatically tested, built, and loaded to PyPI.

Register extracts by location

Instead of registering each extract, a user may want to wait and register them in one go by location name and/or path.

Process Dependency handling

Need to add the ability for ProcessTracker to check for dependencies and if they are in a state to block the process from running.
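A minimal sketch of the blocking check. Which statuses should block is an assumption here (`running` and `failed`), as are the `blocking_dependencies` helper and the name-to-status mapping it takes.

```python
# Assumed set of statuses that should stop a dependent process from starting.
BLOCKING_STATUSES = {"running", "failed"}


def blocking_dependencies(dependency_statuses):
    """Return the names of dependencies whose latest status blocks a run.

    `dependency_statuses` maps a parent process name to its latest run
    status (hypothetical structure). An empty result means the process
    is free to run.
    """
    return sorted(
        name
        for name, status in dependency_statuses.items()
        if status in BLOCKING_STATUSES
    )
```

Returning the offending names rather than a bare boolean would let the caller report exactly what is holding the process up.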

Improve/fix logging

Need to go through entire project and add/fix/enhance logging capabilities so that logging works as expected. Need to also write out to a log file and not just console.
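For the console-plus-file part, a sketch using the standard `logging` module. The `build_logger` helper, the log format, and the choice of where the log file lives are all assumptions.

```python
import logging


def build_logger(name, log_path, level=logging.INFO):
    """Return a logger that writes to both the console and a log file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        formatter = logging.Formatter(
            "%(asctime)s %(name)s %(levelname)s %(message)s"
        )
        for handler in (logging.StreamHandler(), logging.FileHandler(log_path)):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

The handler guard matters because `logging.getLogger` returns the same object for the same name, so repeated setup would otherwise emit every message more than once.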

Extract Location Lookup Fix

Need to make find_ready_extracts_by_location easier to understand. Currently the variable location points to location_name and not the filepath. Need to provide the option for one or the other.
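One way to offer "one or the other" is a pair of keyword arguments where exactly one must be given. The sketch below works on plain dictionaries with assumed keys (`status`, `location_name`, `location_path`); it is not the current signature of `find_ready_extracts_by_location`.

```python
def find_ready_extracts(extracts, location_name=None, location_path=None):
    """Filter ready extracts by exactly one of location name or location path."""
    if (location_name is None) == (location_path is None):
        raise ValueError("Provide exactly one of location_name or location_path")
    if location_name is not None:
        field, wanted = "location_name", location_name
    else:
        field, wanted = "location_path", location_path
    return [e for e in extracts if e["status"] == "ready" and e[field] == wanted]
```

Naming the arguments after what they hold removes the ambiguity of a single `location` variable.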

Extract Lookups should return Extract objects

Just realized that the lookups should return Extract objects, not filepaths + filename. That can be generated using repr or some other function. Otherwise have to do the lookups again to modify the record.
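A sketch of the idea, assuming a hypothetical `Extract` object with `location_path`, `filename`, and `status` attributes. The path is derived on demand, so the caller keeps a handle on the record and can modify it without a second lookup.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Extract:
    # Hypothetical stand-in; field names are not taken from the project.
    location_path: str
    filename: str
    status: str = "ready"

    def full_path(self):
        """Build the filepath + filename string the lookups return today."""
        return posixpath.join(self.location_path, self.filename)
```

A lookup returning these objects still gives the path through `full_path()`, while a status change is a direct attribute update on the same object.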

Process Cluster Assignment

Process needs to have the ability to be assigned to a specific cluster. That allows for resource allocation management.

Refactor data_store

Need to move data_store into utilities where it belongs. Also need to set configuration correctly and not just in verify_and_connect.

Write command line tool capabilities

Need to have a command line tool to be able to initialize process tracking data store and allow for adding default items (Actors, Tools, etc.).

Data Store Update Thru CLI

Need to have the ability to upgrade the data store thru the CLI tool. Also need to determine policy for when to deprecate an upgrade (we can't keep every upgrade in every version).

Audit Info for Extracts

#16

Idea came initially from working on other audit fields for process. Would be nice to have options for extracts as well. This one can be big though because we're starting to approach data profiling territory.
