Machine Learning Operations (MLOps)
Continuous delivery and automation pipelines in machine learning
Data science and ML are becoming core capabilities for solving complex real-world problems, transforming industries, and delivering value in all domains. Currently, the ingredients for applying effective ML are:
- Large datasets
- Inexpensive on-demand compute resources
- Specialized accelerators for ML on various cloud platforms
- Rapid advances in ML research fields (such as computer vision, natural language understanding, and recommendation systems)
Why do 87% of data science projects never make it into production?
Largely because many people simply learn machine learning but never deploy it, or they try to deploy but the project stalls for lack of DevOps practices. This is where MLOps comes in. Machine learning operations (MLOps) is the application of DevOps practices to machine learning models. MLOps seeks to add discipline to the development and deployment of ML models by defining processes that make ML development more reliable and productive.
What Are Hyperparameters?
In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.
The value of the hyperparameter directly impacts the accuracy of the model.
Since hyperparameters are set by us rather than learned by the machine, it is essential to select the best values for them in order to maximize the model's overall accuracy.
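As a minimal illustration (not from the article): in the sketch below, the learning rate and step count are hyperparameters chosen before training, while the weight w is a parameter learned during training.

```python
# Hypothetical illustration: fit w in y = w * x by gradient descent.
# learning_rate and n_steps are hyperparameters (fixed before training);
# w is a parameter (learned during training).
def fit(xs, ys, learning_rate=0.1, n_steps=100):
    w = 0.0
    for _ in range(n_steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # true relationship: y = 2x
print(fit(xs, ys))    # converges near 2.0 with a well-chosen learning rate
```

A learning rate that is too large makes the same loop diverge, which is exactly why picking good hyperparameter values matters.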
Article Overview
- Create a container image with Python 3, Keras, and NumPy installed, using a Dockerfile.
- When this image is launched, it should automatically start training the model inside the container.
- Create a job chain of job1, job2, job3, job4 and job5 using the build pipeline plugin in Jenkins.
- Job1: Pull the Github repo automatically when some developers push the repo to Github.
- Job2: By inspecting the code file, Jenkins should automatically start the container whose image already has the right interpreter and ML libraries installed, deploy the code there, and start training (e.g., if the code uses a CNN, Jenkins should start a container that already has all the software required for CNN processing).
- Job3: Train your model and predict accuracy or metrics.
- Job4: If the accuracy metric is below 80%, tweak the machine learning model architecture.
- Job5: Retrain the model, or notify that the best model has been created.
- Job 6 (extra, for monitoring): If the container where the app is running fails for any reason, this job should automatically restart the container and resume from where the last trained model left off.
To accomplish these tasks, I automated the ML project as described below.
Job 1: Pull GitHub Code
When a developer pushes code to GitHub, this job copies that code into the local repository on our system. For this, I used Poll SCM to keep checking the remote repository for changes.
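The Execute Shell command for this job is shown only as an image in the original; a plausible sketch is a single copy from the Jenkins workspace (the exact flags are an assumption):

```shell
# Copy everything Jenkins pulled into its workspace to the local repo
sudo cp -rvf * /home/amit/ws/mlops1/
```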
The above code transfers the files Jenkins copied from GitHub into my local repository, /home/amit/ws/mlops1.
Job 2: See Code and Launch
It will do the following tasks :
1) Check whether the code (stored in program.py) is a CNN or not (checked using checkcode.py).
2) If the code is a CNN, launch its container from the image (convoimage:v13) created using the Dockerfile.
As you can see below, checkcode.py is an extremely simple script built on the observation that any Keras CNN model will almost certainly contain two identifiers: keras (or tensorflow) and Conv2D. If these words are present, the program prints kerasCNN.
programfile = open('/home/amit/ws/mlops1/program.py', 'r')
code = programfile.read()
programfile.close()
# Each substring must be tested separately:
# "if 'keras' or 'tensorflow' in code" would always be truthy in Python.
if 'keras' in code or 'tensorflow' in code:
    if 'Conv2D' in code or 'Convolution' in code:
        print('kerasCNN')
    else:
        print('not kerasCNN')
else:
    print('not deep learning')
As you can see in the image below, I compared the output of the above program in Jenkins and launched my container using the image I created with the Dockerfile.
Here is my Docker File Code
The line below runs "python3 /mlops/program.py" as soon as the container launches, where /mlops/ is the path to the code file program.py inside the Docker container. The directory was created using Docker's volume-mounting feature: the mlops folder is linked to the local repository stored on the base OS.
CMD [ "python3","/mlops/program.py" ]
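The full Dockerfile appears only as an image in the original article; based on the description, it likely looks roughly like this (the base image and package list are my assumptions):

```dockerfile
FROM centos
RUN yum install -y python3
RUN pip3 install numpy keras tensorflow
# Run the training script (mounted at /mlops via a volume) on launch
CMD [ "python3", "/mlops/program.py" ]
```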
The program.py code is actually LeNet for the MNIST dataset, but I modified the layers section. The settings of the convolutional and fully connected layers, which are in fact hyperparameters, are now read from a file, input.txt.
convlayers = int(input())
first_layer_nfilter = int(input())
first_layer_filter_size = int(input())
first_layer_pool_size = int(input())
model.add(Conv2D(first_layer_nfilter, (first_layer_filter_size, first_layer_filter_size),
                 padding = "same",
                 input_shape = input_shape))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size = (first_layer_pool_size, first_layer_pool_size)))

# Subsequent CRP (Convolve-ReLU-Pool) sets
for i in range(1, convlayers):
    nfilters = int(input())
    filter_size = int(input())
    pool_size = int(input())
    model.add(Conv2D(nfilters, (filter_size, filter_size), padding = "same"))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size = (pool_size, pool_size)))

# Fully connected layers (w/ ReLU)
model.add(Flatten())
fc_input = int(input())
for i in range(0, fc_input):
    no_neurons = int(input())
    model.add(Dense(no_neurons))
    model.add(Activation("relu"))
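Given the read order above, a matching input.txt (redirected into program.py) could look like this; the values are illustrative, not the article's:

```
2
32
3
2
64
3
2
1
128
```

Line by line: the number of convolve layers; then the number of filters, filter size, and pool size for layer 1; the same three values for layer 2; then the number of fully connected layers and the neurons in each.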
I did this so that in later runs, when the hyperparameters need tweaking to improve accuracy, Jenkins can simply run a program (tweaker.py) that changes the contents of the input file; the hyperparameters change without touching the main code file.
For me, tweaker.py is the soul of this setup, and it is where I began this entire project. Job 4 contains a full explanation of what it actually does.
Job 3: Predict Accuracy
This job's task is very simple: it deploys the accuracy report to the Apache web server so that users can access and view it directly at the following URL:
IPofSystem/display_matter.html
sudo cp /home/amit/ws/mlops1/display_matter.html /var/www/html
Just write the above code in the Execute Shell of Jenkins job 3.
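The article does not show how display_matter.html itself is generated; one plausible sketch is for program.py to render the final accuracy into a small HTML page after training (the function name and layout here are assumptions):

```python
# Hypothetical report writer: program.py could call this after training.
def render_report(accuracy):
    return (
        "<html><body>"
        f"<h1>Model accuracy: {accuracy:.4f}</h1>"
        "</body></html>"
    )

# Written next to program.py; Job 3 then copies it to /var/www/html.
with open("display_matter.html", "w") as f:
    f.write(render_report(0.9921))
```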
Job 4: Analyse Accuracy and move
This job performs the following tasks :
1) Check the accuracy; if it is less than required, tweak the hyperparameters using tweaker.py and trigger Job 2 (See Code and Launch) again to start the container and run the model once more.
2) If the accuracy requirement is met, trigger Job 5 (Model Create Success).
Now let's see how tweaker.py tweaks the code…
When tweaker.py is called, it compares the old accuracy (initially 0) with the new one (obtained from running the container). If the accuracy has increased, it increases the value of the first hyperparameter (here, the number of filters) of the base convolve layer.
It also replaces the old accuracy stored in data.txt with the new one, for the next build's comparison.
As soon as the hyperparameter value is changed, Job 2 is re-run to check the new accuracy.
If the accuracy increases again, the change was beneficial and the value can be pushed further, so tweaker.py increases that parameter's value again.
But if it finds that the accuracy has decreased, tweaker.py resets the parameter to its previous value and starts changing the next hyperparameter (in our case, the filter size).
On every call it repeats this process until no more hyperparameters in that layer can be increased; at that point it adds another layer and repeats the whole process in the new layer.
That is how tweaker.py actually works.
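The tweaker.py source itself appears only as an image in the original; the hill-climbing logic described above can be sketched roughly like this (the doubling step, function shape, and state handling are my assumptions, not the author's exact code):

```python
# Hypothetical sketch of tweaker.py's core step: walk an ordered set of
# hyperparameters, pushing the active one up while accuracy improves,
# reverting and moving on to the next one when it drops.
def tweak(params, active, old_acc, new_acc, last_value):
    """params: ordered dict of hyperparameter name -> value
    active: index of the hyperparameter currently being tuned
    old_acc/new_acc: accuracies before and after the last run
    last_value: the active hyperparameter's value before its last increase
    Returns the updated (params, active, last_value)."""
    names = list(params)
    name = names[active]
    if new_acc > old_acc:
        # The increase helped: push the same hyperparameter further.
        last_value = params[name]
        params[name] *= 2  # "doubling" stands in for the real increment
    else:
        # The increase hurt: revert it and start on the next hyperparameter.
        params[name] = last_value
        active += 1
        if active < len(names):
            nxt = names[active]
            last_value = params[nxt]
            params[nxt] *= 2
    return params, active, last_value
```

In the real setup this state (the current accuracy and parameter values) would persist in data.txt and input.txt between Jenkins builds.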
Now let's see the code used to implement this job in Jenkins.
The Execute Shell code is:
# The string comparison works here because both values share the "0." prefix
# and have a fixed decimal format.
if [[ "$(sudo cat /home/amit/ws/mlops1/accuracy.txt)" < "0.9999999" ]]
then
    echo "Tweaking The program"
    sudo python3 /home/amit/ws/mlops1/tweaker.py
    curl 192.168.43.250:8080/view/Integrate%20Machine%20Learning%20with%20Jenkins/job/See%20code%20and%20Launch/build?token=tweakedNowRun
else
    echo "Merge and Email"
    curl 192.168.43.250:8080/job/Model%20Create%20Success/build?token=modelCreateSuccess
fi
Here, 0.9999999 is the target accuracy the model must achieve to be accepted as successful.
The first curl command is to trigger job 2 since our hyperparameters are tweaked and ready to be tested.
The second curl command is to trigger job 5 on successful model creation.
Job 5: Model Create Success
This job is triggered when the required accuracy is met; it emails the input file to the developer so they know the best hyperparameter values found.
sudo python3 /home/amit/ws/mlops1/email.py
Just write the above command in the Execute Shell of the Jenkins job 5.
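The email.py source is not shown in the article; a plausible sketch using Python's standard library follows (the addresses, server, and credential handling are placeholders, not the author's actual values):

```python
# Hypothetical sketch of email.py: mail input.txt (the winning
# hyperparameters) to the developer.
import smtplib
from email.message import EmailMessage

def build_message(body, sender, recipient):
    # Package the hyperparameter file contents as a plain-text email.
    msg = EmailMessage()
    msg["Subject"] = "Best model created - hyperparameters attached"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

def send_hyperparams():
    with open("/home/amit/ws/mlops1/input.txt") as f:
        msg = build_message(f.read(), "jenkins@example.com",
                            "developer@example.com")
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login("jenkins@example.com", "app-password")  # use an app password
        server.send_message(msg)
```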
Job 6 : Restart Docker
This is a monitoring job, called when Job 2 fails, that is, when the container kerasCNNos fails for any reason to complete its execution.
It completely restarts the Docker engine to make sure it is working correctly, because in our setup this is the most likely cause of a Job 2 failure.
And then it triggers job 2 once again.
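The Execute Shell for this job is again shown only as an image; a plausible sketch (the restart command is an assumption, and the curl mirrors the trigger used in Job 4) is:

```shell
# Restart the Docker engine, then re-trigger Job 2
sudo systemctl restart docker
curl 192.168.43.250:8080/view/Integrate%20Machine%20Learning%20with%20Jenkins/job/See%20code%20and%20Launch/build?token=tweakedNowRun
```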
Using the above setup, running for a few hours before I manually stopped the builds, I achieved an accuracy of 99.21% with 5 epochs per training run.
No. of convolve layers : 2
Layer 1
No of filters : 128
Filter Size : 7
Pool Size : 2
Layer 2:
No of filters : 2048
Filter Size : 2
Pool Size : 2
No. of FC Layers : 1
Neurons in Layer 1 : 10
Accuracy achieved : 0.9921000003814697
My GitHub link for the above code:
Thank you guys for reading this.