Getting Started with Machine Learning Monitoring

Machine learning models are powerful tools for automating processes and informing data-led decisions. But their effectiveness can degrade if they are left unmonitored and unoptimised. The lifecycle of a machine learning model should include continual tweaks and improvements to maintain accuracy and efficiency. Without a process of machine learning monitoring, this optimisation would be impossible.

There is a range of different types of machine learning, but the common theme is a system learning and improving from experience with data. Although the strength of machine learning lies in systems that effectively teach themselves, a degree of oversight and monitoring is vital. Model accuracy can drift over time once a model is deployed, for a number of reasons, and the model may even carry innate bias from its training data. For a model to be deployed effectively in a live environment, a system of machine learning monitoring is essential.

As with any software solution in an organisation, the health of a model within the wider digital systems should be closely monitored and maintained. But there are other, more specific points to consider when monitoring machine learning models in particular.

This guide explores the main considerations for successful machine learning monitoring, and the tools available to make the process as simple as possible. 

Four areas of machine learning model performance monitoring

After a machine learning model is deployed in a live environment, it needs to be consistently monitored and measured to understand ongoing accuracy. Model performance can degrade over time for a number of reasons. Actively engaging in machine learning monitoring means any drift or loss in function can be identified and resolved. 

Alongside specific machine learning considerations, models should be monitored like any other software deployed in the organisation. The health of the model within the organisation’s data and digital ecosystem should be an ongoing consideration. This means monitoring incoming and outgoing data quality, and the system resources which power the model.  

The effectiveness of the model itself can and will drift over time, so a system of model performance monitoring should cover the entire model lifecycle. Machine learning monitoring is an integral part of model optimisation: the process of renewing and updating a model to maintain and improve accuracy. Monitoring means ensuring that the model functions correctly, that it maintains accuracy and effectiveness, and that data quality remains high.

Any successful machine learning model performance monitoring should include: 

  • Model health checks 
  • Model drift detection 
  • Model bias detection 
  • Outlier detection 

Model health checks

Like any other software or hardware deployed across the organisation, a machine learning model should be monitored for ongoing functionality and health. There are unique considerations around monitoring the ongoing health of a machine learning model, but the model will still sit within an organisation’s data or service ecosystem.  

For this reason, many of the monitoring processes already in place across the organisation should still apply. This means monitoring and maintaining the GPU resources used by the model, checking and maintaining user access, and ensuring input and output data is of sufficient quality. As with any software within the organisation, individuals should be on call to troubleshoot and fix any issues identified during machine learning monitoring.

The health of the model itself should be monitored and maintained. Increasingly, machine learning models are deployed in containerised environments to streamline the monitoring of system health. Containers can be managed across different environments, from local servers to cloud systems, providing a consistent environment that can be scaled to meet demand. Container orchestration tools like Kubernetes streamline the management and monitoring of the different containers, making it straightforward to check the health of each part of a deployment and to scale models as required.
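To illustrate how an orchestrator checks model health, the sketch below exposes a simple HTTP health endpoint using only the Python standard library. The /healthz path, the port, and the model_loaded flag are illustrative assumptions rather than the conventions of any particular serving framework; a liveness probe would poll an endpoint like this and restart the container on failure.

```python
# Minimal health endpoint sketch using only the Python standard library.
# The /healthz path and model_loaded flag are illustrative assumptions;
# real serving frameworks expose their own health probes.
from http.server import BaseHTTPRequestHandler, HTTPServer

model_loaded = True  # in practice, set once the model has been initialised

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and model_loaded:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(503)  # signals the orchestrator to restart or reroute
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```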

Model drift detection

Machine learning monitoring should be a vital part of the model lifecycle, as the performance of the model can and will degrade over time. A gradual or sudden change in model accuracy is usually called model drift: the effectiveness of the model drifts or shifts over time, and this can happen for a number of reasons. As the model drifts, so does the accuracy of its predictions, making the model less effective as a result.

A common type of model drift that should be monitored for is concept drift. This is when the relationship between input and output data changes over time. Machine learning models are usually trained in offline environments on training data. Once deployed in a live environment, the real-world relationship between input and output data may shift compared to that learned from the training data. Because of this, a model may not be able to comprehend an emerging trend in the live environment, lowering the model’s accuracy.  

If the performance of a model deteriorates, concept drift could be the cause. Ongoing machine learning model performance monitoring can be used to identify it; producing an average confidence score for a predictive model's outputs is one method of doing so. Once identified, models can be retrained and updated to restore accuracy and effectiveness. It's useful to keep a static version of the model when retraining to address concept drift, so that each iterative update or improvement can be compared against that baseline.
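As a rough sketch of the average-confidence approach described above, the following compares the rolling mean confidence of live predictions against a baseline recorded at deployment. The window size and alert threshold are arbitrary assumptions that would need tuning per model.

```python
from collections import deque

# Hypothetical drift check: compare the mean confidence of recent predictions
# against a baseline measured at deployment time. The window size and
# threshold are illustrative assumptions to be tuned per model.
BASELINE_CONFIDENCE = 0.87   # mean confidence on held-out data at deployment
WINDOW_SIZE = 1000
ALERT_DROP = 0.10            # flag if the rolling mean falls 10 points below baseline

recent_confidences = deque(maxlen=WINDOW_SIZE)

def record_prediction(confidence: float) -> bool:
    """Store a prediction's confidence; return True if drift is suspected."""
    recent_confidences.append(confidence)
    if len(recent_confidences) < WINDOW_SIZE:
        return False  # not enough live data yet
    rolling_mean = sum(recent_confidences) / len(recent_confidences)
    return rolling_mean < BASELINE_CONFIDENCE - ALERT_DROP
```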

Another common type of model drift is called covariate shift. This is when the distribution of input data in a live environment differs from that of the training environment. It can be a gradual shift or a sudden change. Covariate shift is a common sign that a model is overfit to its training data, meaning it lacks the ability to generalise when encountering new, live data. Machine learning model performance monitoring is required to understand the degree to which each model is affected by covariate shift.

It’s a very common occurrence within machine learning, as labelled training data is likely to differ from live data. Serious covariate shift may be noticeable soon after deployment if the model is overfit to the training data, as the model may not function at all. Machine learning monitoring should always encompass detection of covariate shift.
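One general way to detect covariate shift, shown below as a minimal sketch rather than any specific product's method, is to compare the live distribution of each input feature against its training distribution with a two-sample statistical test. This example uses SciPy's Kolmogorov-Smirnov test; the significance level of 0.01 is an illustrative assumption.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_covariate_shift(train_col: np.ndarray, live_col: np.ndarray,
                           alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test on a single input feature.

    Returns True if the live distribution differs significantly from the
    training distribution. alpha is an illustrative threshold.
    """
    statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha

# Example: a live feature drawn from a shifted distribution
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
live = rng.normal(loc=0.5, scale=1.0, size=5000)   # mean has drifted
print(detect_covariate_shift(train, live))          # True
```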

Model bias detection

There is a risk of bias in machine learning when models are built from non-representative or incomplete training datasets. For example, training data that doesn't include certain demographics or age groups may result in a model that is innately biased. The model may be accurate on the subset of data it was trained on, but ineffective for groups that were underrepresented or missing from that data.

As machine learning models become an integral part of decision-making across different sectors, model bias is a major consideration. In a real-world setting, machine learning models in finance are already used to automate credit checks and loan applications. Bias against specific individuals in this system could cause serious issues. Machine learning monitoring should be built to identify model bias to ensure model outputs are fair. 
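As a simple illustration (one fairness metric among many, not a complete bias audit), the sketch below computes the demographic parity gap: the difference in positive-outcome rates between groups. The group labels and example decisions are hypothetical.

```python
from collections import defaultdict

def demographic_parity_gap(outcomes, groups):
    """Largest difference in positive-outcome rate between any two groups.

    outcomes: iterable of 0/1 model decisions (e.g. loan approved or not)
    groups:   iterable of group labels for each decision
    This is one simple fairness metric among many, not a full bias audit.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        positives[group] += outcome
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical audit of loan decisions across two groups
gap, rates = demographic_parity_gap([1, 0, 1, 1, 0, 0, 1, 0],
                                    ["A", "A", "A", "A", "B", "B", "B", "B"])
print(rates, gap)  # group A approved at 75% vs group B at 25%: a 0.5 gap
```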

Outlier detection

Machine learning models learn from rich training datasets and process huge arrays of data once deployed. The presence of outliers in such a data-rich environment is likely. Outliers or anomalous data can have a serious impact on model accuracy, because models are trained to understand the relationship between data points. Anomalies in the data can therefore skew data trends and predictions, and lower the effectiveness of the model as a whole. 

There are different types of outliers that should be monitored for through an outlier detection process. Generally, outliers can be identified by analysing the distribution of data points, focusing on elements like the distance and density of the data. Point outliers are among the most straightforward to observe and identify: a point outlier is an individual data point which sits beyond the range of the rest of the dataset, outside the patterns, groupings or trends which govern it. For this reason, point outliers are easy to identify visually when the data is plotted in two or three dimensions.
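A minimal distance-based sketch of point outlier detection is shown below, flagging values that sit more than three standard deviations from the mean. The z-score rule and the threshold of 3 are common but arbitrary choices; density-based methods such as Local Outlier Factor are a frequent alternative.

```python
import numpy as np

def point_outliers(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag values more than `threshold` standard deviations from the mean.

    A simple z-score rule; the threshold of 3 is a common but arbitrary choice.
    """
    z_scores = np.abs((values - values.mean()) / values.std())
    return values[z_scores > threshold]

# 500 values clustered near 10, plus one point far outside the range
rng = np.random.default_rng(1)
data = np.append(rng.normal(loc=10.0, scale=0.5, size=500), 42.0)
print(point_outliers(data))  # [42.]
```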

Contextual outliers are another type of outlier, though they can be more difficult to detect. A contextual outlier is generally only observable when the overall context of the dataset changes, perhaps because of seasonal changes or external fluctuations. Continuous machine learning monitoring is required to identify contextual outliers, as they may look like normal data points in other circumstances or contexts.
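One way to make context explicit, sketched below under the assumption that a context label such as the season is available for each observation, is to judge each value against the mean and spread of its own context rather than the dataset as a whole.

```python
import numpy as np
from collections import defaultdict

def contextual_outliers(values, contexts, threshold=3.0):
    """Flag values that are extreme relative to their own context.

    A 30-degree day is normal in summer but anomalous in winter; judging
    each value against its context's mean and spread captures that.
    The grouping key and threshold are illustrative assumptions.
    """
    grouped = defaultdict(list)
    for value, context in zip(values, contexts):
        grouped[context].append(value)
    stats = {c: (np.mean(v), np.std(v)) for c, v in grouped.items()}
    return [(v, c) for v, c in zip(values, contexts)
            if abs(v - stats[c][0]) > threshold * stats[c][1]]

# Hypothetical temperatures: 30.0 is normal in summer, anomalous in winter
rng = np.random.default_rng(0)
temps = list(rng.uniform(3, 7, size=200)) + [30.0] + list(rng.uniform(24, 32, size=200))
months = ["winter"] * 201 + ["summer"] * 200
print(contextual_outliers(temps, months))  # [(30.0, 'winter')]
```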

The final common type of outlier in machine learning is the collective outlier. This is when a collection of data points is anomalous, or a sequence has moved away from an expected or predicted behaviour pattern. The fact that the pattern or trend as a whole has shifted makes collective outliers difficult to identify.

Each data point within a collective outlier, when taken in isolation, will seem like a normal data point; it's only when the series as a whole is considered that the anomalous behaviour can be observed. Steps should be taken to ensure machine learning monitoring can identify collective outliers. Outlier detection is a vital element of machine learning monitoring, as the presence of outliers can be a symptom of a systemic issue with the model.
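A sketch of this idea: rather than scoring points individually, score the mean of a sliding window against the long-run behaviour of the series. In the example below each value of 12 is individually within the normal range, but a run of ten in a row is flagged. The window size and threshold are illustrative assumptions.

```python
import numpy as np

def collective_outlier_windows(series, window=10, threshold=3.0):
    """Flag windows whose mean deviates sharply from the overall series.

    Each individual point may look normal; it is the window as a whole
    that is anomalous. Window size and threshold are illustrative.
    """
    series = np.asarray(series, dtype=float)
    mean, std = series.mean(), series.std()
    flagged = []
    for start in range(len(series) - window + 1):
        window_mean = series[start:start + window].mean()
        # standard error of a window mean under the overall distribution
        if abs(window_mean - mean) > threshold * std / np.sqrt(window):
            flagged.append(start)
    return flagged

# A repeating pattern, then ten 12s in a row: each 12 is a normal value,
# but the sustained run is a collective anomaly starting at index 40.
series = [10, 12, 8, 11, 9] * 8 + [12] * 10
print(collective_outlier_windows(series))  # [40]
```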

Tools for machine learning monitoring and deployment

Seldon Deploy is an enterprise solution which streamlines the machine learning deployment process. It comes with inbuilt machine learning monitoring capabilities, so organisations can streamline the deployment and ongoing monitoring of models. Organisations can use the enterprise stack to drill down into input and output data to dynamically monitor deployed models. Users can identify outliers or divergences and investigate through Seldon Deploy, helping to answer questions and resolve issues.  

Alongside its core monitoring capabilities, Seldon Deploy includes automated drift detection, outlier detection, and reporting features. Combined, these features make Seldon Deploy a powerful tool for machine learning deployment and ongoing monitoring.

Machine learning deployment for every organisation

Seldon moves machine learning from POC to production to scale, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon can give you the edge you need to supercharge your performance.

With Seldon Deploy, your business can efficiently manage and monitor machine learning, minimise risk, and understand how machine learning models impact decisions and business processes. It means you know your team has done its due diligence in creating a more equitable system while boosting performance.

Deploy machine learning in your organisation effectively and efficiently. Talk to our team about machine learning solutions today.
