
sagemaker-rightline's People

Contributors

dependabot[bot], dipanjank, stiebels

Forkers

azizhamza-code

sagemaker-rightline's Issues

Add support for TuningStep

Currently, TuningStep is not supported by Validations. We want to change this.

  1. Analyse which existing Validation subclasses are relevant to TuningStep.
  2. Add compatibility for TuningStep to these Validations.
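One possible way to approach step 2 is to have each Validation declare the step types it understands, so that adding TuningStep support becomes an explicit, reviewable change. This sketch is hypothetical: `StepValidation`, `supported_step_types`, `applicable_steps` and `ExampleValidation` are illustrative names, not sagemaker-rightline's API, and the step classes are stand-ins for those in `sagemaker.workflow.steps` so the example runs without the SageMaker SDK.

```python
from typing import List, Tuple


# Stand-ins for sagemaker.workflow.steps classes (assumption: only the class
# hierarchy matters here, so the real SDK is not imported).
class Step:
    def __init__(self, name: str) -> None:
        self.name = name


class ProcessingStep(Step):
    pass


class TrainingStep(Step):
    pass


class TuningStep(Step):
    pass


class StepValidation:
    """Hypothetical base class: each Validation lists the step types it supports."""

    supported_step_types: Tuple[type, ...] = ()

    def applicable_steps(self, steps: List[Step]) -> List[Step]:
        # Skip steps this Validation does not know how to inspect.
        return [s for s in steps if isinstance(s, self.supported_step_types)]


class ExampleValidation(StepValidation):
    # Appending TuningStep is the kind of compatibility change step 2 describes.
    supported_step_types = (ProcessingStep, TrainingStep, TuningStep)
```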

Set up publishing workflow to PyPI

Let's introduce a standard GitHub workflow setup for Python projects:

  • replace Poetry with pip + pyproject.toml
  • run tests on PRs to main
  • create a release on merge to main
  • have the release workflow publish the package to PyPI

Verify unique job i/o names

Hi, interesting package!

I just got:

ClientError: Failed to invoke sagemaker:CreateProcessingJob. Error Details: Input and Output names must be globally unique: [data]

when creating my pipeline, because two inputs/outputs of the last step have the same name. -.-

Would asserting this also be possible, and a meaningful use case for your package? I wasn't sure whether StepInputsAsExpected already covers it, since I'm not familiar with the package. If needed, I can provide more context.
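A check for this could pool the names of a processing job's inputs and outputs and report any name that occurs more than once, since the error above says input and output names must be unique together. This is a minimal sketch: `find_duplicate_io_names` is a hypothetical helper, and the two dataclasses are simplified stand-ins that only mirror the `input_name`/`output_name` attributes of the SDK's `ProcessingInput` and `ProcessingOutput`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ProcessingInputStub:
    """Simplified stand-in for sagemaker.processing.ProcessingInput."""
    input_name: Optional[str] = None


@dataclass
class ProcessingOutputStub:
    """Simplified stand-in for sagemaker.processing.ProcessingOutput."""
    output_name: Optional[str] = None


def find_duplicate_io_names(
    inputs: Sequence[ProcessingInputStub],
    outputs: Sequence[ProcessingOutputStub],
) -> List[str]:
    """Return names used more than once across a job's inputs and outputs."""
    # Unnamed entries are skipped (assumption: the SDK generates unique
    # names for them when the job is created).
    names = [i.input_name for i in inputs if i.input_name]
    names += [o.output_name for o in outputs if o.output_name]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

For the reported case, `find_duplicate_io_names([ProcessingInputStub("data")], [ProcessingOutputStub("data")])` returns `["data"]`; a Validation built on it would run the check once per processing step.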

Add further validations

Placeholder issue for high-level ideas for additional Validations:

  • Check whether NetworkConfig is set and correct
  • Check whether Lambda functions referenced in LambdaSteps exist
  • Check whether the instance type exists
  • Check whether S3 paths referenced in ProcessingInput and ProcessingOutput exist
  • Check whether the role exists
  • Check whether the pipeline, using that role, can read from/write to these paths
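As an illustration of the S3 path idea, a path could be split into bucket and prefix and probed with a single `list_objects_v2` call, which is a real boto3 S3 client method. This is a sketch under assumptions: `parse_s3_uri` and `s3_prefix_exists` are hypothetical helpers, the client is passed in rather than created so the code runs without AWS credentials, and only inputs are meaningfully checkable this way, since output prefixes are usually created by the job itself.

```python
from typing import Any, Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def s3_prefix_exists(s3_client: Any, uri: str) -> bool:
    """True if at least one object lives under the URI's prefix.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    bucket, prefix = parse_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0
```

With boto3 installed, this would be called as `s3_prefix_exists(boto3.client("s3"), "s3://my-bucket/raw/")`, where the bucket name is a placeholder. A missing bucket surfaces as a botocore `ClientError` rather than `False`, so a real Validation would likely want to catch that too.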

Automate Release Creation

Currently, we have to create releases manually using the GitHub release UI.

This can be automated using python-semantic-release.

Workflow Changes Required

  • A merge to main will trigger an additional job that pip-installs python-semantic-release and runs semantic-release publish. This creates a new tag and a GitHub release. python-semantic-release infers the next version from the PR title or commit message using the Conventional Commits scheme.
  • Trigger the existing release workflow. The current release version can be fetched and checked out using:
RELEASE_VERSION="v$(semantic-release print-version --current)"
git switch --detach ${RELEASE_VERSION}
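To make the version inference concrete: under the Conventional Commits scheme, a `fix:` header implies a patch bump, `feat:` a minor bump, and a `!` after the type or a `BREAKING CHANGE:` footer a major bump. The sketch below is a simplified illustration of that rule, not python-semantic-release's implementation; `bump_level` and `next_version` are hypothetical, and configuration options the real tool offers are ignored.

```python
import re
from typing import Iterable, Optional

# "<type>(<optional scope>)<optional !>: <description>"
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: ")
_RANK = {"patch": 1, "minor": 2, "major": 3}


def bump_level(messages: Iterable[str]) -> Optional[str]:
    """Return the largest bump implied by the messages, or None."""
    level: Optional[str] = None
    for message in messages:
        match = _HEADER.match(message)
        if match is None:
            continue  # not a Conventional Commits message
        if match.group("bang") or "BREAKING CHANGE:" in message:
            new = "major"
        elif match.group("type") == "feat":
            new = "minor"
        elif match.group("type") == "fix":
            new = "patch"
        else:
            continue  # e.g. docs: or chore: do not trigger a release
        if level is None or _RANK[new] > _RANK[level]:
            level = new
    return level


def next_version(current: str, messages: Iterable[str]) -> str:
    """Compute the next 'MAJOR.MINOR.PATCH' from the current version."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return f"{major}.{minor}.{patch}"
```

For example, merging a PR titled `feat(report): add to_df` on top of `v0.3.1` would yield `0.4.0` under this simplified rule.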

Redesign the lookup of attributes of the Pipeline object

Problem Statement
The general path parsing, attribute lookup and filter functionality is convoluted and requires a redesign.

Target State
The overall objective is to make implementing new Validations as easy as possible by providing developers and users with robust, general functionality to search and filter for the desired attribute (value).

More concretely:

  • Generalized lookup functionality that can be used across Validations to find appropriate attributes of any object.
  • Should allow for filtering based on user input:
    • indices, e.g. if an attribute is a list and the user wants to fetch the first item of that attribute/list
    • value of attributes, e.g. if an attribute is a dict or other complex object
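A minimal sketch of such a lookup, assuming a small path syntax of dot-separated names with optional `[i]` indices (e.g. `steps[0].inputs[-1].input_name`): names resolve as keys on dicts and as attributes on everything else, and a separate filter keeps items whose nested attribute equals a given value. `lookup`, `filter_by` and the path syntax are hypothetical, not the package's current API.

```python
import re
from typing import Any, Iterable, List

# One path segment: a name followed by zero or more [index] parts.
_SEGMENT = re.compile(r"^(\w+)((?:\[-?\d+\])*)$")
_INDEX = re.compile(r"\[(-?\d+)\]")


def lookup(obj: Any, path: str) -> Any:
    """Resolve a path like 'steps[0].inputs[0].input_name' against obj."""
    current = obj
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"malformed path segment: {segment!r}")
        name, index_part = match.groups()
        # Dicts are searched by key, all other objects by attribute.
        current = current[name] if isinstance(current, dict) else getattr(current, name)
        # Apply list indices in order, e.g. [0][1].
        for index in _INDEX.findall(index_part):
            current = current[int(index)]
    return current


def filter_by(items: Iterable[Any], path: str, value: Any) -> List[Any]:
    """Keep the items whose value at `path` equals `value`."""
    return [item for item in items if lookup(item, path) == value]
```

Keeping path resolution and filtering as two small functions means each Validation only has to express what to look up, not how to walk the object graph.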

Make the name of the Validation producing a ValidationResult an attribute of ValidationResult

Problem Statement
At present, every Validation object's run method returns a dict of the form {str: ValidationResult}, where str is the name of the Validation producing that result. See here, for example.

This is a remnant of a previous implementation in which run returned a list of ValidationResult objects of varying length. Now that the cardinality of Validation to ValidationResult is 1:1, it may make more sense to store the value of Validation.name in an attribute of ValidationResult. Besides simplifying the data structures, this would also allow simplifying the Report.to_df method.

Target State

  • A ValidationResult object stores the name of the Validation that it was generated by.
  • The Validation.run methods are updated accordingly.
  • Report.to_df is refactored to handle the different data structure appropriately.
  • Required downstream changes are made.
  • Required adjustments to unit tests are made.
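A sketch of the target data structure, assuming simplified field names: apart from `validation_name`, the fields of `ValidationResult` shown here (`success`, `message`) are placeholders rather than the package's actual attributes, `PipelineHasSteps` is a hypothetical Validation, and `to_records` is a hypothetical stand-in for the data-gathering part of `Report.to_df`. With the name stored on the result, `run` returns the result directly and the report can emit one flat row per result.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class ValidationResult:
    validation_name: str  # new: name of the Validation that produced this result
    success: bool  # placeholder field
    message: str  # placeholder field


class Validation:
    """Hypothetical minimal base class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, pipeline: Any) -> ValidationResult:
        raise NotImplementedError


class PipelineHasSteps(Validation):
    """Hypothetical example Validation."""

    def __init__(self) -> None:
        super().__init__("PipelineHasSteps")

    def run(self, pipeline: Any) -> ValidationResult:
        ok = bool(getattr(pipeline, "steps", None))
        # Returns the result itself instead of {self.name: result}.
        return ValidationResult(self.name, ok, "ok" if ok else "pipeline has no steps")


class Report:
    def __init__(self, results: List[ValidationResult]) -> None:
        self.results = results

    def to_records(self) -> List[Dict[str, Any]]:
        # One flat row per result; no {name: result} dicts to unpack any more.
        return [asdict(result) for result in self.results]
```

Since `pandas.DataFrame` accepts a list of dicts, `pandas.DataFrame(report.to_records())` would give one row per Validation, which is roughly what a simplified `Report.to_df` could return.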

Pre-commit hook for pipeline checks

I see you are already using pre-commit hooks in this repo, but you could also expose one. I could help with that (at least conceptually): I have contributed to pre-commit (top-ten contributor) and maintain R hooks myself. The idea would be to run the validations on every commit. Following the convention of SageMaker projects, the user could supply a module containing a function get_pipeline() that returns the pipeline.
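The hook's entry point could look roughly like this: it imports the user-supplied module from a file path, calls its `get_pipeline()`, runs a list of checks and returns a non-zero exit code on any failure, which is what makes pre-commit reject the commit. Everything here is hypothetical, including `load_pipeline`, `main` and the `has_steps` placeholder check; a real hook would run sagemaker-rightline's Validations instead.

```python
import argparse
import importlib.util
import sys
from typing import Any, Callable, Optional, Sequence

Check = Callable[[Any], bool]


def load_pipeline(module_path: str, func_name: str = "get_pipeline") -> Any:
    """Import a module from a file path and call its pipeline factory."""
    spec = importlib.util.spec_from_file_location("user_pipeline", module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the user's module code
    return getattr(module, func_name)()


def has_steps(pipeline: Any) -> bool:
    """Placeholder check standing in for real Validations."""
    return bool(getattr(pipeline, "steps", None))


def main(argv: Optional[Sequence[str]] = None, checks: Sequence[Check] = (has_steps,)) -> int:
    parser = argparse.ArgumentParser(description="Validate a SageMaker pipeline.")
    parser.add_argument("module", help="path to a module that defines get_pipeline()")
    args = parser.parse_args(argv)

    pipeline = load_pipeline(args.module)
    failed = [check.__name__ for check in checks if not check(pipeline)]
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    # A non-zero exit code makes pre-commit reject the commit.
    return 1 if failed else 0
```

The script would be wired up with `sys.exit(main())` under an `if __name__ == "__main__":` guard or as a console-script entry point. A `.pre-commit-hooks.yaml` entry could then pass the module path via `args` and set `pass_filenames: false`, so pre-commit does not append the staged file names.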
