This project demonstrates the implementation of data pipelines in an AI context. It uses DVC (Data Version Control) for data versioning and management. The pipelines are designed to process and transform data efficiently, enabling seamless integration with AI models and workflows. We then use MLflow for experiment tracking and as a model store.
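As a minimal sketch of the MLflow side, a run could be tracked like this; the experiment name, parameter, and metric below are placeholders, not values from this repository.

```python
import mlflow

# Hypothetical experiment name -- adjust to your own setup.
mlflow.set_experiment("nba-demo")

with mlflow.start_run():
    # Log a hyperparameter and a resulting metric for this run.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("rmse", 3.2)
```

Runs tracked this way show up in the MLflow UI (started with `mlflow ui`) and can later be registered as models.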
The requirements for the project are the following:

- Python 3.9+
- the `make` command

To make sure you have all the prerequisites, run `make --version` and `python --version`.
- Run `make setup`
- Activate your environment:
  - Windows: `.\venv\Scripts\activate`
  - Linux: `source venv/bin/activate`
- Start developing!

PS: To check that you are in the right environment, run `python -m mlops_nba.main`.
These commands target the `mlops_nba` folder, and the configuration is located there.
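For reference, `python -m mlops_nba.main` simply runs the package's `main` module. A hypothetical sketch of such an entry point (not the repository's actual `main.py`) could look like:

```python
# mlops_nba/main.py -- hypothetical sketch, not the project's real entry point.
import sys


def main() -> None:
    # Print the interpreter path so you can confirm which environment is active.
    print(f"Running mlops_nba with {sys.executable}")


if __name__ == "__main__":
    main()
```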
- Code Quality: You can trigger those commands with `make check`.
  - Formatting with `black` + `isort`: To format, use `make format`; to check, use `make black` and `make isort` for `black` and `isort` respectively.
  - Type-checking with `mypy`: You can use `make mypy` to check the types and detect errors (see the example after this list).
  - Linting with `flake8` + `pylint`: You can use `make flake8` and `make pylint` to lint your code using `flake8` and `pylint` respectively.
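To illustrate what the type checks catch, here is a small made-up example: `make mypy` would flag the last line because the argument disagrees with the annotation.

```python
def average_points(points: list[float]) -> float:
    # Guard against empty input so we never divide by zero.
    if not points:
        return 0.0
    return sum(points) / len(points)


# mypy error: Argument 1 to "average_points" has incompatible type "str";
# expected "list[float]"
average_points("not a list")
```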
- Tests:
  - For testing we use `pytest` and target the tests in `mlops_nba` using `make test`; a sketch of such a test follows this list.
  - You can generate a coverage report using `make coverage` and an HTML version using `make coverage-html`.
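A test picked up by `make test` could look like the following self-contained sketch; `player_efficiency` is a toy helper invented for illustration, not a function from the project.

```python
# tests/test_player_efficiency.py -- hypothetical, self-contained example.
import pytest


def player_efficiency(points: int, minutes: int) -> float:
    """Toy helper standing in for real project code."""
    return points / minutes


def test_player_efficiency_is_positive():
    assert player_efficiency(points=30, minutes=36) > 0


def test_zero_minutes_raises():
    with pytest.raises(ZeroDivisionError):
        player_efficiency(points=30, minutes=0)
```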
- Next steps:
  - Create a preprocessed stage aggregating all curated data
  - Implement data-quality checks for all data stages (see the sketch after this list)
  - Add unit and integration tests for all pipelines (GitHub Actions)
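As a starting point for the data-quality item above, a check over a curated table might look like this sketch; the column names are assumptions, not the project's actual schema.

```python
import pandas as pd


def check_curated(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality issues found in a curated table."""
    issues = []
    # Assumed schema: these column names are illustrative only.
    for column in ("player", "team", "points"):
        if column not in df.columns:
            issues.append(f"missing column: {column}")
    if "points" in df.columns and (df["points"] < 0).any():
        issues.append("negative values in 'points'")
    if df.duplicated().any():
        issues.append("duplicated rows")
    return issues
```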