Project for ML DevOps Engineer Nanodegree, unit 5.
A company with 10,000 corporate clients needs to create, deploy, and monitor a risk assessment ML model that estimates the attrition risk of each of its clients. If the model is accurate, it will enable the client managers to contact the clients with the highest risk and avoid losing clients and revenue.
Creating and deploying the model isn't the end of the work, though. The industry is dynamic and constantly changing, and a model that was created a year or a month ago might not still be accurate today. Because of this, we need to set up regular monitoring of the model to ensure that it remains accurate and up-to-date. Scripts to re-train, re-deploy, monitor, and report on the ML model will be created. In this way, the company can get risk assessments that are as accurate as possible and minimize client attrition.
- Python 3 required
- On Windows, a Linux environment may be needed; WSL can provide one
The project dependencies are listed in the requirements.txt file.
Use the package manager pip to install the dependencies from requirements.txt.
It's recommended to install them in a separate virtual environment.
pip install -r requirements.txt
or through pipenv:
sudo apt install pipenv
pipenv shell
pipenv install
- Data ingestion: Automatically check for new data that can be used for model training. Compile all training data into a single training dataset and save it to a folder.
- Training, scoring, and deploying: Write scripts that train an ML model that predicts attrition risk, and score the model. Save the model and the scoring metrics.
- Diagnostics: Determine and save summary statistics related to a dataset. Time the performance of some functions. Check for dependency changes and package updates.
- Reporting: Automatically generate plots and a PDF document that report on model metrics and diagnostics. Provide an API endpoint that can return model predictions and metrics.
- Process Automation: Create a script and cron job that automatically run all previous steps at regular intervals.
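For the diagnostics step, one way to time the pipeline is to run each script as a subprocess and average the wall-clock time over a few runs. The sketch below is only an illustration: the helper name `time_command` and the repeat count are assumptions, and the placeholder command stands in for scripts such as ingestion.py or training.py.

```python
import subprocess
import sys
import time


def time_command(command, repeats=3):
    """Run a command several times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits non-zero
        subprocess.run(command, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


if __name__ == "__main__":
    # Placeholder command; in this project it might instead be
    # [sys.executable, "ingestion.py"] or [sys.executable, "training.py"].
    mean_seconds = time_command([sys.executable, "-c", "pass"])
    print(f"mean runtime: {mean_seconds:.3f}s")
```

Using `sys.executable` keeps the timed script on the same interpreter (and virtual environment) as the diagnostics script itself.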
"input_folder_path": "practicedata",
"output_folder_path": "ingesteddata",
"test_data_path": "testdata",
"output_model_path": "practicemodels",
"prod_deployment_path": "production_deployment"python ingestion.pyArtifacts output:
ingesteddata/finaldata.csv
ingesteddata/ingestedfiles.txt
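A minimal sketch of what the ingestion step could look like is shown below. It is not the project's ingestion.py: it uses only the standard library, and it assumes that all source CSV files share one header, that exact duplicate rows should be dropped, and that ingestedfiles.txt lists one file name per line. The function name `merge_csv_files` is hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(input_folder, output_folder):
    """Merge every CSV in input_folder into finaldata.csv, dropping duplicate rows."""
    input_dir, output_dir = Path(input_folder), Path(output_folder)
    output_dir.mkdir(parents=True, exist_ok=True)

    header, rows, seen, ingested = None, [], set(), []
    for csv_path in sorted(input_dir.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{csv_path.name} has a different header")
            for row in reader:
                key = tuple(row)
                if key not in seen:  # keep only the first copy of each row
                    seen.add(key)
                    rows.append(row)
        ingested.append(csv_path.name)

    with (output_dir / "finaldata.csv").open("w", newline="") as handle:
        writer = csv.writer(handle)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)

    # Record which source files went into the merged dataset (assumed format)
    (output_dir / "ingestedfiles.txt").write_text("\n".join(ingested) + "\n")
    return ingested
```

With the first configuration, this would be called as `merge_csv_files("practicedata", "ingesteddata")`.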
python training.py
Artifacts output:
practicemodels/trainedmodel.pkl
practicemodels/encoder.pkl
python scoring.py
Artifacts output:
practicemodels/latestscore.txt
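The scoring metric is not named here, so the sketch below assumes an F1 score on binary labels (1 meaning the client left) as one plausible choice. It computes the score in plain Python and writes it to latestscore.txt. Both function names are hypothetical, and the single-number file format is an assumption.

```python
from pathlib import Path


def f1_score_binary(y_true, y_pred):
    """F1 score for 0/1 labels, with 1 as the positive (attrition) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # Return 0.0 when there are no positive labels and no positive predictions
    return 2 * tp / denominator if denominator else 0.0


def save_score(score, model_folder):
    """Write the score to latestscore.txt inside model_folder."""
    out_path = Path(model_folder) / "latestscore.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(f"{score}\n")
    return out_path
```

Storing the score as a single number keeps it easy to read back with `float(path.read_text())` when comparing a retrained model against the deployed one.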
python deployment.py
Artifacts output:
production_deployment/ingestedfiles.txt
production_deployment/trainedmodel.pkl
production_deployment/latestscore.txt
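Deployment, as the artifact list above suggests, amounts to copying three files into the production folder. A minimal sketch, assuming ingestedfiles.txt comes from the ingestion output folder and the other two files from the model folder (the function name `deploy_artifacts` is hypothetical):

```python
import shutil
from pathlib import Path


def deploy_artifacts(ingested_folder, model_folder, prod_folder):
    """Copy the ingestion record, trained model, and latest score to prod_folder."""
    prod_dir = Path(prod_folder)
    prod_dir.mkdir(parents=True, exist_ok=True)
    sources = [
        Path(ingested_folder) / "ingestedfiles.txt",
        Path(model_folder) / "trainedmodel.pkl",
        Path(model_folder) / "latestscore.txt",
    ]
    # Fail before copying anything if an artifact is missing
    missing = [str(path) for path in sources if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"missing artifacts: {missing}")
    copied = []
    for source in sources:
        # copy2 also preserves file metadata such as modification time
        copied.append(Path(shutil.copy2(source, prod_dir / source.name)))
    return copied
```

Checking for missing files first avoids leaving the production folder with a new model but a stale score.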
python diagnostics.py
python reporting.py
Artifacts output:
practicemodels/confusionmatrix.png
python app.py
python apicalls.py
Artifacts output:
practicemodels/apireturns.txt
"input_folder_path": "sourcedata",
"output_folder_path": "ingesteddata",
"test_data_path": "testdata",
"output_model_path": "models",
"prod_deployment_path": "production_deployment"
Train production model:
python training.py
python fullprocess.py
Start cron service
sudo service cron start
Edit crontab file
sudo crontab -e
- Select option 3 to edit file using vim text editor
- Press i to insert a cron job
- Write the cron job in cronjob.txt, which runs fullprocess.py every 10 mins
- Save after editing: press the Esc key, then type :wq and press Enter
View crontab file
sudo crontab -l
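On each cron run, a script like fullprocess.py would likely first need to decide whether there is anything new to ingest. The sketch below shows one way to do that check; it is an assumption about the approach, not the project's actual logic. It supposes that ingestedfiles.txt lists one file name per line, and the function name `find_new_files` is hypothetical.

```python
from pathlib import Path


def find_new_files(source_folder, ingested_record):
    """Return CSV file names in source_folder that are not listed in ingested_record."""
    record_path = Path(ingested_record)
    already_ingested = set()
    if record_path.is_file():
        already_ingested = {
            line.strip()
            for line in record_path.read_text().splitlines()
            if line.strip()
        }
    current = sorted(path.name for path in Path(source_folder).glob("*.csv"))
    return [name for name in current if name not in already_ingested]


if __name__ == "__main__":
    # Paths follow the production configuration shown earlier
    new_files = find_new_files("sourcedata", "production_deployment/ingestedfiles.txt")
    if new_files:
        print("new data found:", new_files)
    else:
        print("no new data; nothing to do")
```

If the list is empty, the run can stop early, so the 10-minute schedule costs very little when no new data has arrived.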