Optimization Methods for Tuning Data Pipelines

Code connected with the master's thesis "Optimization Methods for Tuning Data Pipelines" by Davide Pietrasanta.

Folders

Check that your project layout looks like this:

lib
├── data                        # Store data
│   ├─ dataset                  # Store datasets
│   ├─ metafeatures             # Store metafeatures
│   └─ model                    # Store trained ML models
├── images                      # Images for presentations, README, etc.
├── other                       # Scripts or notebooks related to the thesis or to the plots
├── src                         # Actual code
│   ├─ test                     # Test code
│   ├─ utils                    # General utility code
│   ├─ exceptions.py            # To handle custom exceptions
│   └─ config.py                # Common knowledge for the project
├── main.py
├── requirements.txt
├── setup.py
├── test.py                     # To test all
└── Tutorial.ipynb              # Simple notebook tutorial

Install & Setup

Go to /PATH_TO_PROJECT/Optimization-Methods-for-Tuning-Data-Pipelines/ and run:

virtualenv venv
source venv/local/bin/activate # or source venv/bin/activate

pip install -r requirements.txt
pip install -e .

If you are running on a machine that allows a high number of parallel jobs, run the following command first to avoid the error "BLAS : Program is Terminated. Because you tried to allocate too many memory regions."

export OMP_NUM_THREADS=1
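
When exporting the variable in the shell is inconvenient (for example inside a notebook), the same limit can be set from Python. This is a minimal sketch and not part of the repository: the helper name limit_blas_threads is hypothetical, and it assumes it is called before NumPy or any other BLAS-backed library is first imported, since such libraries typically read the variable when they are loaded.

```python
import os


def limit_blas_threads(n_threads: int = 1) -> None:
    """Set OMP_NUM_THREADS for the current process (hypothetical helper)."""
    os.environ["OMP_NUM_THREADS"] = str(n_threads)


# Call this first, before importing numpy, scikit-learn, etc.
limit_blas_threads(1)
```

Setting the variable this way only affects the current process and the processes it starts afterwards.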

Execution

Run with

python3 main.py

To better understand how to use the framework you can consult the Tutorial.ipynb file.

Test

Test all with

python3 test.py

Code quality

Check the code quality with Pylint:

pylint $(git ls-files '*.py') > code-quality.txt

Pipeline map

We want to give users the opportunity to test their own ideas or to let the machine do it for them.

[Figure: Pipeline map]

Delta

We want to be able to predict the delta performance, i.e. the difference between the performances obtained with and without preprocessing.

This will be the output of the meta-learner.

[Figure: Delta]
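
The delta described above can be illustrated with a small sketch. It is not the project's code: the function names, the use of accuracy as the performance metric, and the sign convention (performance with preprocessing minus performance without, so a positive value means preprocessing helped) are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def delta_performance(perf_with, perf_without):
    """Assumed convention: positive means preprocessing helped."""
    return perf_with - perf_without


# Made-up labels and predictions, only to exercise the functions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_raw = [1, 0, 0, 1, 0, 0, 0, 1]           # 5 of 8 correct
pred_preprocessed = [1, 0, 1, 1, 0, 0, 0, 0]  # 7 of 8 correct

delta = delta_performance(
    accuracy(y_true, pred_preprocessed),
    accuracy(y_true, pred_raw),
)
print(delta)  # 0.25
```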

Training Dataset Map

A simple diagram of how the dataset used to train the meta-learner is created.

[Figure: Dataset map]

Mind Map

Metafeatures are extracted from both the raw and the preprocessed data. An ML model is then run on both in order to collect the performances. The delta between performances and metafeatures is then calculated so that we can train the meta-learner.

[Figure: Mind map]
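
The flow above can be sketched end to end in plain Python. Everything here is a stand-in chosen for illustration and is not taken from the repository: the four toy metafeatures, min-max scaling as the preprocessing step, a leave-one-out 1-nearest-neighbour classifier as the ML model, and the layout of the resulting row (raw and preprocessed metafeatures plus the performance delta) are all assumptions.

```python
import statistics


def extract_metafeatures(X):
    """Toy metafeatures, chosen only for illustration."""
    values = [v for row in X for v in row]
    return {
        "n_samples": len(X),
        "n_features": len(X[0]),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def min_max_scale(X):
    """Example preprocessing step: rescale each column to [0, 1]."""
    columns = list(zip(*X))
    lows = [min(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in X]


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    hits = 0
    for i, row in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, X[j])),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def build_training_row(X, y, preprocess):
    """One hypothetical meta-learner training row for a dataset."""
    X_pre = preprocess(X)
    row = {f"raw_{k}": v for k, v in extract_metafeatures(X).items()}
    row.update({f"pre_{k}": v for k, v in extract_metafeatures(X_pre).items()})
    # Target: performance with preprocessing minus performance without.
    row["delta"] = loo_1nn_accuracy(X_pre, y) - loo_1nn_accuracy(X, y)
    return row


# Made-up data: the second column has a much larger scale than the first.
X = [[1.0, 1000.0], [2.0, 3000.0], [8.0, 1500.0], [9.0, 3500.0]]
y = [0, 0, 1, 1]
row = build_training_row(X, y, min_max_scale)
print(row)
```

On this made-up data the unscaled second column dominates the distances, so scaling changes which neighbour is nearest and the delta comes out positive.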

Dependency

File dependencies of the project.

[Figure: Dependency]
