One of the trickiest and most time-consuming tasks in a Machine Learning project is continuously tweaking the hyper-parameters until the desired accuracy is reached, and it is certainly one of the reasons many Machine Learning projects fail and never make it into production. But by integrating ML with DevOps this can be mitigated to an extent, as automation tools save much of the time spent on manual operations.
In this article I am going to explain my MLOps project, which integrates Jenkins as an automation tool with a Machine Learning model trained on the MNIST dataset. The pipeline changes the architecture of the model as required and continuously monitors the environment so that it recovers from failures without user intervention.
Tools and technologies used in this project:-
1. Git
2. GitHub
3. RHEL 8 (VM)
4. Jenkins
5. Docker
An overall outline of my project:-
- Creating a container image that has Python 3 and all the libraries required for the ML model. This container will be used to launch an environment to deploy the Machine Learning model.
- Creating a chain of jobs, project1_step1 through project1_step6, using the Build Pipeline plugin in Jenkins.
- Job project1_step1: when the code is pushed to GitHub, Jenkins pulls it automatically using a build trigger and copies it to the root directory of RHEL 8.
- Job project1_step2 inspects the code and automatically starts the matching container image, providing the environment to deploy and train it (in my case it launches a container with all the prerequisite libraries for a CNN).
- Job project1_step3 trains the model and records its accuracy.
- Job project1_step4 checks whether the desired accuracy (in my case 95%+) has been reached; if not, it tweaks the model to achieve it.
- Job project1_step5 retrains the model and notifies the developer by email that the desired accuracy has been reached and the best model has been created.
- Job project1_step6 monitors the environment: if the container fails for any reason, this job automatically starts it again, resuming from the last trained model.
This picture shows the Dockerfile; to build it into a Docker image, use the following command: docker build -t image_name:version /root/foldername/
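The Dockerfile itself is only shown in the screenshot, but a minimal version of such an environment image might look like the sketch below; the base image and the exact package list are my assumptions based on the CNN requirements, not taken from the article:

```dockerfile
FROM centos:8
# Python 3 plus pip for installing the ML libraries
RUN yum install -y python3 python3-pip
# libraries needed to train the CNN (an assumed, minimal set)
RUN pip3 install numpy pandas keras tensorflow
WORKDIR /code
CMD ["/bin/bash"]
```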
Job project1_step1 pulls the code from GitHub using the Poll SCM build trigger and copies it to the root directory using an Execute Shell build step.
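The Execute Shell step for this job can be as small as a copy command; a sketch is below, where the deploy path mirrors the article's "root directory" and the environment-variable fallbacks are my assumptions:

```shell
# project1_step1 "Execute shell" sketch.
# Jenkins has already pulled the repo into the job workspace via the Poll SCM trigger.
SRC="${WORKSPACE:-.}"                  # Jenkins sets WORKSPACE to the checked-out repo
DEST="${DEPLOY_DIR:-/root/project1}"   # directory on the RHEL 8 host used by the later jobs
mkdir -p "$DEST"
cp -rf "$SRC"/. "$DEST"/
```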
Job project1_step2 inspects the code and starts the required container image, providing the environment to deploy and train it (in my case, a container with all the prerequisite libraries for a CNN).
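As a sketch, this job's shell step can detect what kind of code was pushed and launch the matching container; the keyword check, the code directory, and the image/container names below are my assumptions, not taken from the article:

```shell
# project1_step2 sketch: detect CNN code and launch the matching environment
CODE_DIR="${CODE_DIR:-/root/project1}"
# -s keeps grep quiet if no .py files exist yet
if grep -qs "Conv2D" "$CODE_DIR"/*.py; then
    # launch the image built earlier, mounting the code into the container
    docker run -dit --name cnn_env -v "$CODE_DIR":/code image_name:version 2>/dev/null || true
fi
```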
Job project1_step3 trains the model and evaluates its accuracy.
Code for reference can be found in my GitHub repo:-
https://github.com/Abhishek2019singh/project1/blob/main/CNN.py
Here the data.txt file stores the accuracy of the model; it is compared against the desired accuracy, and further action is taken accordingly.
Job project1_step4 checks whether the desired accuracy (in my case 95%+) has been reached; if not, it tweaks the model to achieve it.
Here I have used various Linux commands to compare the accuracy and tweak the model as required. This could also be done with Python code, but to keep things simple I used Linux commands instead.
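A self-contained sketch of that check is below. The file names mirror the article (data.txt, CNN.py), but the stand-in file contents and the particular sed tweak (raising the epoch count) are illustrative assumptions; in the real pipeline data.txt is written by project1_step3:

```shell
# stand-ins so the sketch runs on its own; step3 normally produces these
echo "93.41" > data.txt                      # accuracy written by the training job
echo "model.fit(X, y, epochs=5)" > CNN.py    # one line of the model code

TARGET=95
ACC=$(cat data.txt)
# shell arithmetic is integer-only, so compare on the integer part of the accuracy
if [ "${ACC%.*}" -ge "$TARGET" ]; then
    echo "desired accuracy reached: $ACC"
else
    # example tweak: train longer; a real tweak could add layers or filters instead
    sed -i 's/epochs=5/epochs=10/' CNN.py
    echo "accuracy $ACC below target, model tweaked for retraining"
fi
```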
Job project1_step5 retrains the model and notifies the developer by email that the desired accuracy has been reached and the best model has been created.
For sending the mail I have used Python code; the email received after the best model was trained is attached below for reference:-
Job project1_step6 monitors the environment: if the container fails for any reason, this job automatically starts it again, resuming from the last trained model.
I have used the Build Periodically trigger to monitor the environment, checking it every minute and automatically starting the container again in case of failure.
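A sketch of that monitoring step is below; the container name is an assumption. Because docker start resumes the existing (stopped) container rather than creating a new one, the model saved inside it is preserved and training can continue where it left off:

```shell
# project1_step6 "Execute shell" sketch, scheduled every minute (* * * * *)
CONTAINER="${CONTAINER:-cnn_env}"   # assumed name of the training container
# if the container is not in the running list, start it again
if [ -z "$(docker ps -q -f "name=$CONTAINER" 2>/dev/null)" ]; then
    docker start "$CONTAINER" 2>/dev/null || true
fi
```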
Finally, the complete Build Pipeline view of my project: