Also known as GH2, this repository hosts the software part of my semester project at SDU Mechatronics, Fall 2023. The motivation behind it can be found here. In short, it is an AI-powered management system for green hydrogen production that uses only electricity from wind farms and the national grid.
The software architecture design is described here, while the mechanical, electrical & electronics design and the test bench results are not yet disclosed.
-
Clone the source code
```shell
git clone https://dagshub.com/hieudtrung/mlo.git
cd mlo
```
-
Create a virtual environment using either conda or mamba

```shell
conda env create -f conda_env.yaml
# or, with mamba (a drop-in replacement for conda):
mamba env create -f conda_env.yaml
```
-
(optional) Set up DagsHub for experiment tracking & data versioning

```shell
# MLflow with DagsHub Experiments as host
export MLFLOW_TRACKING_URI=https://dagshub.com/hieudtrung/green-hydrogen-gh2.mlflow
export MLFLOW_TRACKING_USERNAME=<your_username>
export MLFLOW_TRACKING_PASSWORD=<your_password>

# DVC with DagsHub as remote storage
dvc remote add origin https://dagshub.com/hieudtrung/green-hydrogen-gh2.dvc
dvc remote modify origin --local auth basic
dvc remote modify origin --local user hieudtrung
dvc remote modify origin --local password <your_token>

# Alternatively: DVC with MinIO as remote storage
dvc remote add origin s3://dvc
dvc remote modify origin endpointurl s3://gh2-emu-trials
dvc remote modify origin --local access_key_id <your_token>
dvc remote modify origin --local secret_access_key <your_token>
```
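Once a remote is configured, a typical data-versioning round trip might look like the sketch below (the dataset path is a hypothetical example; `origin` is the remote name added above):

```shell
# fetch the tracked datasets from the remote
dvc pull

# track a new or updated dataset and push it to the remote
dvc add data/raw_measurements.csv
git add data/raw_measurements.csv.dvc .gitignore
git commit -m "Track raw measurements with DVC"
dvc push
```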
-
(optional) CI/CD with GitHub Actions
Retraining our model on newer data is a tedious task that can be automated. First, follow this guide to sync the DagsHub repository with GitHub.
Then, create a GitHub Actions config so that any code update triggers the CI pipeline.
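A minimal workflow sketch along these lines is shown below. The file path, workflow name, Python version, requirements file, and training entry point are all assumptions for illustration, not the project's actual config:

```yaml
# .github/workflows/retrain.yml -- hypothetical example
name: retrain-on-push

on:
  push:
    branches: [main]

jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: pip install -r requirements.txt   # assumed requirements file
      - name: Retrain model
        run: python src/train.py               # assumed entry point
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
          MLFLOW_TRACKING_USERNAME: ${{ secrets.MLFLOW_TRACKING_USERNAME }}
          MLFLOW_TRACKING_PASSWORD: ${{ secrets.MLFLOW_TRACKING_PASSWORD }}
```

Storing the MLflow credentials as repository secrets keeps them out of the codebase.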
A Docker Compose file is also available so you can self-host the stack on your own PC. Note that it is not fully tested.
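For illustration only, a self-hosted stack along these lines could pair an MLflow tracking server with MinIO for storage. The image tags, ports, and credentials below are assumptions, not the repository's actual Compose file:

```yaml
# hypothetical docker-compose.yml sketch -- not the repo's actual file
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin        # change before real use
      MINIO_ROOT_PASSWORD: minioadmin    # change before real use
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio-data:/data

  mlflow:
    image: ghcr.io/mlflow/mlflow
    command: mlflow server --host 0.0.0.0 --port 5000
    ports:
      - "5000:5000"
    depends_on:
      - minio

volumes:
  minio-data:
```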
There are many use cases; their sequence diagrams will be uploaded to this OneDrive folder.
This picture shows the overall system architecture, which is Azure-native. For more details about each service, please see its corresponding README.
Regarding data management, I also have a self-hosted solution on my homelab cluster using Delta Lake, Apache Spark, and Kubeflow. I'll keep it updated once everything is tested properly.
Please keep in mind that the public source code is designed to work with Azure services. On-premise deployment would require substantial modification, so it is not recommended.
This work is published under the MIT license as a showcase of my skills. If you run into a problem or would like to request a change, please open an issue. Feel free to fork, redistribute, or use it as you see fit.