ilonatzgoldilox's Issues
Finish docs
Typos
Vaex First
Typo: "Vaex is an open-soruce..."
Fix: "Vaex is an open-source"
Typo: "...to allow the extreme flexibility for advance pipeline solutions..."
Fix: "...to allow the extreme flexibility for advanced pipeline solutions..."
Best Practices
Add columns!
Typo: "...for every value tou would want..."
Fix: "...for every value you would want..."
Typo: "...which explain the XGBoost prediction."
Fix: "...which explains the XGBoost prediction."
Typo: "..., prediciton with distance (for confidance) etc,."
Fix: "..., prediction with distance (for confidence) etc,."
DataFrames
Typo: "In production, this allow you do make sure you... and passthrough elements you..."
Fix: "In production, this allows you to make sure you... and pass through elements you..."
Big Data -> Vaex
Typo: "Vaex is excellent for big data - lazy evlaution..."
Fix: "Vaex is excellent for big data -- lazy evaluation..."
Variables and description
Typo: "... - any constant you what the backend/frontend could query."
Fix: "... - any constant you want, the backend/frontend could query." -> assuming this is the sentence you were going for
Advance -> Fix: "Advanced"
Complicated pipelines
sklearn_vs_vaex_vs_pyspark.ipynb
Typo: "..., you should give her a rise!"
Fix: "..., you should give her a raise!"
Ensembles with LightGBM, XGBoost, and CatBoost
ensemble_example.ipynb
Typo: "Crazy ensmble logic example"
Fix: "Crazy ensemble logic example"
Data science examples
Vaex Skleran Predictor -> Vaex Sklearn Prediction
Typo: "The predictor can apply any skleran..."
Fix: "The predictor can apply any sklearn..."
LightGBM
lightgbm.ipynb
Typo: "Variebels and description"
Fix: "Variables and description"
Typo: "...which want to assosiate..."
Fix: "...which want to associate..."
Typo: "A greate place..."
Fix: "A great place..."
Add a prefix signature in bytes to validate a model loaded from bytes
Add the bytes 'Goldilox' at the beginning of the model bytes after save.
This will allow validating the model based on the first few bytes.
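A minimal sketch of the idea, assuming the pipeline has already been serialized to bytes (for example with pickle). The names add_signature, has_signature and strip_signature are hypothetical and not part of Goldilox's API; only the b"Goldilox" prefix comes from the issue.

```python
import pickle

# Magic prefix from the issue; the helper functions below are hypothetical.
SIGNATURE = b"Goldilox"


def add_signature(model_bytes: bytes) -> bytes:
    """Prepend the signature to already-serialized pipeline bytes."""
    return SIGNATURE + model_bytes


def has_signature(data: bytes) -> bool:
    """Check only the first len(SIGNATURE) bytes; nothing is deserialized."""
    return data[: len(SIGNATURE)] == SIGNATURE


def strip_signature(data: bytes) -> bytes:
    """Validate the prefix and return the original payload without it."""
    if not has_signature(data):
        raise ValueError("not a Goldilox pipeline: missing signature prefix")
    return data[len(SIGNATURE):]


if __name__ == "__main__":
    saved = add_signature(pickle.dumps({"steps": [1, 2, 5]}))
    print(has_signature(saved))                  # True
    print(pickle.loads(strip_signature(saved)))  # {'steps': [1, 2, 5]}
```

The prefix only identifies the format cheaply before loading; it does not make unpickling untrusted bytes safe.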
Build SageMaker model
- Train
- Stream Inference
- Batch inference
- Serverless inference
References: - BYO containers
Add inference_steps
from_sklearn(..., inference_steps=None)
This will allow using the full pipeline for training while removing some steps at inference time.
The main use case is cleaning data during re-fit.
from goldilox import Pipeline
pipeline = Pipeline.from_sklearn(pipeline, inference_steps=[1, 2, 5])  # proposed new argument
pipeline.fit(X, y)  # uses all steps
pipeline.inference(X)  # uses only steps [1, 2, 5]
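A minimal, library-free sketch of the proposed behaviour. StepPipeline, fit_transform and the toy steps are hypothetical, not Goldilox or sklearn API. Indices are zero-based here; the issue does not say which convention [1, 2, 5] uses. Treating inference_steps=None as "use every step" is also an assumption.

```python
from typing import Callable, List, Optional, Sequence

# Toy steps are plain functions on lists; a real pipeline would fit transformers/models.
Step = Callable[[list], list]


class StepPipeline:
    """Hypothetical pipeline: all steps run when training, a subset at inference."""

    def __init__(self, steps: Sequence[Step], inference_steps: Optional[Sequence[int]] = None):
        self.steps: List[Step] = list(steps)
        # None (the proposed default) is taken to mean every step is used at inference.
        if inference_steps is None:
            inference_steps = range(len(self.steps))
        self.inference_steps: List[int] = list(inference_steps)

    def _run(self, X: list, indices) -> list:
        for i in indices:
            X = self.steps[i](X)
        return X

    def fit_transform(self, X: list) -> list:
        # Training / re-fit path: every step, including data-cleaning ones.
        return self._run(X, range(len(self.steps)))

    def inference(self, X: list) -> list:
        # Serving path: only the selected step indices, in the order listed.
        return self._run(X, self.inference_steps)


if __name__ == "__main__":
    def drop_missing(xs):
        return [x for x in xs if x is not None]

    def double(xs):
        return [x * 2 for x in xs]

    def add_one(xs):
        return [x + 1 for x in xs]

    # drop_missing (index 0) only runs during training, as in the re-fit cleaning use case.
    pipe = StepPipeline([drop_missing, double, add_one], inference_steps=[1, 2])
    print(pipe.fit_transform([1, None, 3]))  # [3, 7]
    print(pipe.inference([1, 3]))            # [3, 7]
```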
Add MLFlowPipeline
Implement a general MLFlowPipeline and from_mlflow.
- Much work.
- Good for completeness.
- Might be very complicated.
Add all params to gunicorn
In gl serve <path> param1=param2, make sure all parameters are passed through to gunicorn correctly.
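A sketch of the parsing half, assuming each extra argument has the form key=value (the issue does not specify the syntax). parse_cli_params is hypothetical. The resulting dict could then be applied through gunicorn's documented custom-application pattern (a BaseApplication subclass whose load_config calls self.cfg.set); whether unknown keys should be rejected or forwarded elsewhere is left open.

```python
from typing import Dict, Iterable


def parse_cli_params(args: Iterable[str]) -> Dict[str, str]:
    """Collect extra 'key=value' CLI arguments into a dict of option names to values."""
    options: Dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")  # split on the first '=' only
        # Accept '--worker-class=gevent' as well as 'worker_class=gevent':
        # gunicorn's own setting names use underscores.
        name = key.strip().lstrip("-").replace("-", "_")
        if not sep or not name:
            raise ValueError(f"expected key=value, got {arg!r}")
        options[name] = value.strip()
    return options


if __name__ == "__main__":
    print(parse_cli_params(["workers=4", "--worker-class=gevent", "bind=0.0.0.0:5000"]))
    # {'workers': '4', 'worker_class': 'gevent', 'bind': '0.0.0.0:5000'}
```

Values stay strings here; any type conversion is left to whatever consumes the dict.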
export skops
Implement 'export_skops'
reference
Add azure blob to save/load pipelines and cli
Add gcpfs for save/load and cli
Validate automatically on from_*
When we run from_vaex() or from_sklearn(), run self.validate() before returning.
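A minimal sketch of the control flow, assuming validate() raises on failure (the issue does not say whether it raises or returns False). SketchPipeline and from_steps are hypothetical stand-ins for the real class and its from_* constructors, and the validate=True opt-out flag is an added assumption.

```python
from typing import List, Sequence


class SketchPipeline:
    """Hypothetical stand-in for a pipeline class, showing only the control flow."""

    def __init__(self, steps: Sequence[str]):
        self.steps: List[str] = list(steps)

    def validate(self) -> bool:
        # Placeholder check; a real validation would be far more thorough.
        if not self.steps:
            raise ValueError("pipeline has no steps")
        return True

    @classmethod
    def from_steps(cls, steps: Sequence[str], validate: bool = True) -> "SketchPipeline":
        """Stand-in for from_vaex()/from_sklearn(): build, then validate before returning."""
        pipeline = cls(steps)
        if validate:
            # Fail fast at construction time instead of at serving time.
            pipeline.validate()
        return pipeline
```

The opt-out flag keeps a way to build deliberately incomplete pipelines, at the cost of one extra parameter on every constructor.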
Add Polars Pipeline
Polars lazy dataframe can be serialised.
Pseudo idea
- We need to serialize a lazy frame into a "state"
- Remove "input" and replace with some Special token.
- Load new data, take its input, and insert it into the "state".
- Unclear how to find the exact location as it can be adjusted.
- Might deal with selections - remove by default or keep.
result = pl.LazyFrame.read_json(io.BytesIO(json.dumps(state).encode())).lazy()
It could work in theory.
Add from_onnx
Implement an OnnxPipeline with from_onnx.
- Might be a lot of work for no value.
- Great for completeness
- Allow pyspark pipelines.
Update all notebooks to use glx for the CLI
Auto branch and repo added at creation
Add the branch and repo automatically.
Add lineapy integration
Add glx build
Add a glx build command which builds a Docker image for you
glx build <pipeline path> <image_name>
Use logger instead of printing for validate()
This should apply to every print call in the project.
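A sketch of the usual pattern with the standard logging module: a module-level logger in library code, with output configured only by the application. The validate(checks) signature and the message texts are hypothetical.

```python
import logging

# One logger per module; the application, not the library, decides where output goes.
logger = logging.getLogger(__name__)


def validate(checks: dict) -> bool:
    """Hypothetical validate(): report each check via the logger instead of print()."""
    ok = True
    for name, passed in checks.items():
        if passed:
            # %-style arguments are not formatted at all when the level is disabled.
            logger.info("validation check passed: %s", name)
        else:
            logger.warning("validation check failed: %s", name)
            ok = False
    return ok


if __name__ == "__main__":
    # Application side: opt in to INFO-level output on stderr.
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
    validate({"serialization": True, "inference": False})
```

Unlike print(), users can silence or redirect these messages through logging configuration without touching the library.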
Docs and examples
- Readme
- readthedocs
- Notebook examples
- LightGBM
- XGBoost
- CatBoost
- Sklearn
- Advanced LightGBM
- Advanced Sklearn
- Keras (deep learning)
- TensorFlow (deep learning)
- PyTorch (deep learning)
- MXNet (deep learning)
- nmslib (Nearest neighbors)
- hnswlib (Nearest neighbors)
- Faiss (Nearest neighbors)
- sklearn nearest neighbors (Nearest neighbors)
- implicit (recommender)
- Intercept (general)
- FLAML (automl)
- auto-sklearn (automl)
- auto-keras (automl, deep learning)
- River (online learning)
Add description to the pipeline
Add a description attribute