A Guide to Deploying Machine Learning Models on Kubernetes

How to Deploy Models on Kubernetes

Kubernetes is a container orchestration platform used to manage containerised applications. It is used to automate important parts of the container management process such as container replication, scaling, monitoring and scheduling. It’s an open-source platform written in Google’s Go programming language. Kubernetes is one of the most popular container management systems, and is used to power many different services and platforms. Examples range from running mobile applications across different environments to hosting web services on cloud servers.

Increasingly, Kubernetes is playing an important role in the evolution of machine learning. Containerised development and deployment is becoming an important element for machine learning models. Machine learning models can be easily scaled and scheduled when containerised, and the management of workload performance can be automated. Containerisation offers a consistent state across different servers or cloud environments. With a strong community backing the open source Kubernetes, it’s one of the most popular ways of managing machine learning models within containers.

Specific toolkits have also been developed to make the process of deploying machine learning models on Kubernetes straightforward. Kubeflow is a Kubernetes toolkit developed specifically for machine learning. It provides streamlined access to machine learning pipeline orchestration tools, as well as popular machine learning frameworks and components. Containerisation of the machine learning lifecycle brings many benefits to organisations looking to deploy models efficiently. Scalability and portability are two of the main benefits for machine learning deployment.

With Kubernetes, organisations can embed end-to-end machine learning workflows within containers. This guide explores machine learning models on Kubernetes, including step-by-step instructions for setting up and the benefits it may bring.

What is Kubernetes?

Kubernetes is an open source platform for managing containerised applications, originally developed by Google. It’s used as a container orchestration platform, and automates the scaling, monitoring and scheduling of containers within the cluster. Containers are virtual environments for deploying applications, with each container running in isolation from other containers and from the rest of the host system. This means a containerised application can draw resources from a range of different sources, whether local networks or the cloud, all while maintaining a consistent environment.

Although containers are a lightweight solution, the orchestration of each one can be complex. Kubernetes is used as an orchestration tool to scale and replicate containers, and maintain container health. It automates elements like load balancing and the scaling of resources too. Containers are uniquely placed to strengthen machine learning development and deployment, as scalability and resource allocation are integral parts of upscaling a machine learning model. Kubernetes is able to automate elements like scheduling workloads onto GPU-equipped nodes, helping to make machine learning experimentation and deployment more efficient. Kubeflow is a toolkit designed specifically for machine learning models on Kubernetes.
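
As a rough illustration of the replication, health checking and GPU scheduling described above, the sketch below builds a Kubernetes Deployment manifest as a plain Python dictionary and prints it as JSON. The image name, port, `/healthz` path and resource values are hypothetical placeholders, and the `nvidia.com/gpu` limit assumes the cluster has the NVIDIA device plugin installed.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 3, gpus: int = 0) -> dict:
    """Return a Kubernetes Deployment manifest as a plain dictionary."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": 8080}],  # hypothetical serving port
        # Kubernetes restarts the container if this HTTP check keeps failing.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": 8080},  # hypothetical path
            "initialDelaySeconds": 10,
            "periodSeconds": 15,
        },
        "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
    }
    if gpus > 0:
        # Assumption: the NVIDIA device plugin exposes this extended resource.
        container["resources"]["limits"]["nvidia.com/gpu"] = gpus

    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # number of identical pods to keep running
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = build_deployment(
        "model-server", "registry.example.com/model-server:1.0", replicas=3, gpus=1
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.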

What is Kubeflow?

Kubeflow is an open source collection of tools, components and frameworks developed specifically for machine learning models on Kubernetes. It aims to simplify the development, deployment and ongoing management of machine learning models in a containerised environment through Kubernetes. Its goal is to deliver an end-to-end platform for the entire machine learning lifecycle, covering training, production and deployment instead of just one element. Kubeflow was built by Google as a way to deploy machine learning models on Kubernetes, before being released as an open source platform. The focus was on providing a platform for end-to-end machine learning workflows, and on improving the process of actually deploying models.

Kubeflow provides a framework for the entire machine learning pipeline, which is made up of different modules. Each module covers one stage of the machine learning lifecycle, from data cleaning and validation to model training and deployment. Kubeflow provides the ability to run these independent modules of the wider machine learning workflow, drawing from machine learning frameworks for each step. It is a machine learning specific extension to Kubernetes. Users are able to choose from different components for each step in the machine learning pipeline.
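
The idea of a pipeline built from independent modules can be sketched in plain Python. This is a conceptual illustration only and does not use the Kubeflow API: each stage is an ordinary function, the dependencies between stages form a graph, and a small runner executes the stages in dependency order. The stage names and the toy "model" (the mean of the data) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def clean_data(inputs: dict) -> list:
    raw = [1.0, None, 2.0, 3.0]  # stand-in for a real data source
    return [x for x in raw if x is not None]


def validate_data(inputs: dict) -> list:
    data = inputs["clean_data"]
    if not data:
        raise ValueError("no rows left after cleaning")
    return data


def train_model(inputs: dict) -> float:
    data = inputs["validate_data"]
    return sum(data) / len(data)  # toy "model": predict the mean


def deploy_model(inputs: dict) -> str:
    return f"serving model with parameter {inputs['train_model']:.2f}"


# Stage name -> (function, names of the stages it depends on).
STEPS = {
    "clean_data": (clean_data, []),
    "validate_data": (validate_data, ["clean_data"]),
    "train_model": (train_model, ["validate_data"]),
    "deploy_model": (deploy_model, ["train_model"]),
}


def run_pipeline(steps: dict) -> dict:
    """Run every stage after the stages it depends on; return all outputs."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        outputs[name] = func({dep: outputs[dep] for dep in deps})
    return outputs


if __name__ == "__main__":
    results = run_pipeline(STEPS)
    print(results["deploy_model"])
```

In Kubeflow each stage would run in its own container rather than as a local function call, but the shape is the same: independent steps connected by declared inputs and outputs.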

In practice, this means the entire machine learning workflow is accessible from different devices and systems. The entire pipeline sits within a series of containers, meaning the model is portable. Containers are also significantly easier to scale, drawing on more GPU or CPU resources when required. This is in contrast to traditional deployment, which would see the machine learning pipeline deployed across many different states and environments.

Kubeflow introduces machine learning specific management of the model, framework and storage, whilst Kubernetes deals with the container management.

How do you deploy a machine learning model on Kubernetes?

The first step is to clearly outline the machine learning workflow, from the experimental phase, to production and deployment. The unique stages that should be deployed within containers can be identified by outlining each step of the machine learning workflow. As for the process of deploying a model on Kubernetes, there are toolkits and platforms available to streamline the process. Kubernetes was designed for orchestrating containerised applications, and these platforms introduce elements specific to machine learning and deep learning.

The main platform for deploying a machine learning model on Kubernetes is Kubeflow. Like Kubernetes itself, Kubeflow was originally developed by Google. It’s a dedicated toolkit designed for the end-to-end deployment of machine learning models on Kubernetes. Another platform designed for deep learning frameworks is Fabric for Deep Learning (FfDL), originally developed by IBM. Both platforms allow users to add different components to automate or manage the different phases of the machine learning workflow. For example, with Kubeflow, Jupyter notebooks can be integrated for the experimental phase of the workflow, where data scientists are interacting with and analysing data. Components like Seldon Deploy can be added to help automate the production phase of the machine learning pipeline.

Deploy a machine learning model on Kubernetes using Kubeflow

Kubeflow is a popular platform used to build machine learning pipelines for Kubernetes. It provides a way to map and deploy an end-to-end machine learning pipeline for Kubernetes, and is a popular way of powering containerised machine learning models. Users can select components for each stage of the machine learning workflow through a user interface, and amend configuration files to fine-tune and adapt the machine learning pipeline.

A key element of using Kubeflow to deploy machine learning models on Kubernetes is Kubeflow Pipelines. It’s a platform which allows end-to-end orchestration of a machine learning pipeline. It’s available as a part of Kubeflow or as a standalone platform. It has a dashboard and user interface to design the machine learning pipeline, so it is useful for running experiments with different workflows. Machine learning pipelines can then be reused or adapted for new projects to help streamline the overall process.

In practice, Kubeflow is used as the framework or scaffolding for the entire machine learning workflow. Popular components can be added for every stage. There is a range of ways to install and set up Kubeflow for machine learning. Most of the main container or Kubernetes services will have unique processes for installing Kubeflow on their platform. However, most will follow a similar broad step-by-step approach.

The general process for deploying a machine learning model on Kubernetes using Kubeflow includes:

  1. Download the Kubeflow deployment binary.
  2. Check Kubeflow is compatible with the Kubernetes service provider (if relevant).
  3. Adapt configuration files for each stage of your machine learning workflow.
  4. Deploy containers to your environment, including setting cluster sizes.
  5. Install individual components for each stage of the workflow.
  6. Integrate Jupyter notebooks created during the experimental phase of the machine learning process.
  7. Experiment with the workflow using Kubeflow Pipelines UI.
  8. Configure the training job operator to train the machine learning model.
  9. Export the trained model to Kubernetes.
  10. Use the Seldon Core integration to deploy the machine learning model.
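
Steps 8 and 10 above can be sketched as two custom resources, built here as Python dictionaries and printed as JSON. The field names follow the `TFJob` resource of the Kubeflow training operator and the `SeldonDeployment` resource of Seldon Core v1 as commonly documented, and should be checked against the versions installed on the cluster. The job name, image and model URI are hypothetical placeholders.

```python
import json


def build_tfjob(name: str, image: str, workers: int = 2) -> dict:
    """Training job for the Kubeflow training operator (TFJob custom resource)."""
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {
                        "spec": {
                            # The operator expects the training container
                            # to be named "tensorflow".
                            "containers": [{"name": "tensorflow", "image": image}]
                        }
                    },
                }
            }
        },
    }


def build_seldon_deployment(name: str, model_uri: str, replicas: int = 1) -> dict:
    """Serving resource for Seldon Core v1 using a prepackaged model server."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "predictors": [
                {
                    "name": "default",
                    "replicas": replicas,
                    "graph": {
                        "name": "classifier",
                        # Assumption: the exported model is a TensorFlow SavedModel.
                        "implementation": "TENSORFLOW_SERVER",
                        "modelUri": model_uri,  # where the trained model was exported
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    tfjob = build_tfjob("train-example", "registry.example.com/trainer:1.0")
    serving = build_seldon_deployment("example-model", "gs://example-bucket/models/example")
    print(json.dumps(tfjob, indent=2))
    print(json.dumps(serving, indent=2))
```

Both manifests can be applied with `kubectl apply -f` once the corresponding operators are installed on the cluster.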

Deploying a machine learning model on Kubernetes using Fabric for Deep Learning (FfDL)

Fabric for Deep Learning (FfDL) is a platform developed by IBM to run deep learning frameworks on Kubernetes. Deep learning algorithms are loosely modelled on the human brain through their multi-layered network architecture. Models are often used to process raw analogue data and automatically extract features from it. FfDL is mostly used as a platform for organisations that offer deep learning models as a service.

The aim is to build a stack that is easy to scale, and which can be worked with seamlessly in the cloud. This avoids the need for local high-resource systems and machines to train machine learning models. Instead, users can utilise containerised deep learning frameworks in the cloud. Popular deep learning frameworks can be selected and used within the FfDL framework too.

FfDL consists of a range of pods for individual parts of the machine learning workflow, which are managed by Kubernetes. FfDL is limited in some areas when compared to Kubeflow, most notably with more basic options for hyperparameter optimisation. For this reason, the average machine learning model may be better suited to Kubeflow.

The process for deploying a machine learning model using FfDL includes:

  • Select a supported machine learning framework such as PyTorch, TensorFlow, or Caffe.
  • Transform the model pipeline into containers, creating a container image from the model.
  • Upload the containerised model pipeline.
  • Connect training data and configure the training job.
  • Train the model using the FfDL user interface.
  • Deploy the model to a Kubernetes cluster using Seldon Core.

The strengths of deploying machine learning models on Kubernetes

Many different elements make up the machine learning lifecycle, and managing each of them separately can be time consuming and resource intensive. For example, the training phase will need a different environment and resources to the final deployment. The benefit of containerised machine learning models is that containers provide a consistent state that can be scaled, regardless of where they draw their resources. This means machine learning pipelines can be deployed and accessed across a range of devices or networks, such as local or cloud servers.

You can run all elements of the machine learning workflow in one place, accessed anywhere the user is running Kubernetes. Instead of different parts of the machine learning pipeline being hosted and run on different systems and servers, Kubernetes means it’s all in the same accessible place. This means you can access the entire machine learning workflow from different devices and on local or cloud servers. Elements of the machine learning workload can be reused and repurposed for new projects, further improving efficiency.

Another strength of Kubernetes is the ability to automate parts of the machine learning process. Kubernetes will automatically manage and scale resources, perform container health checks, and scale services. This removes the need for manual management of containers, streamlining the management of the machine learning lifecycle. The automatic deployment of the model can also be managed through Kubernetes. Machine learning pipelines can be amended and reused too, further improving the efficiency of a machine learning project.
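
As one possible illustration of automatic scaling, the sketch below builds a HorizontalPodAutoscaler manifest that asks Kubernetes to add or remove replicas of a model-serving Deployment based on average CPU utilisation. The Deployment name and the thresholds are hypothetical, and CPU-based autoscaling assumes the target containers declare CPU requests and that a metrics server is running in the cluster.

```python
import json


def build_hpa(deployment: str, min_replicas: int = 1,
              max_replicas: int = 5, cpu_percent: int = 70) -> dict:
    """Return a HorizontalPodAutoscaler manifest targeting a Deployment."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            # The workload whose replica count is adjusted.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Scale out when average CPU use exceeds this share
                        # of the requested CPU.
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_percent,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hpa("model-server", 2, 10, 60), indent=2))
```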

The main strengths of machine learning deployment on Kubernetes include:

  • Automation of the machine learning pipelines.
  • Automatic container management and health checks, freeing up resources and time.
  • Specific stages and nodes can be updated in a piecemeal way, lowering overall downtime.
  • Improved access and portability of all areas of the machine learning model.
  • Improved management of cloud-based machine learning models.
  • Automated scaling of the machine learning model, for example automatically allocating more GPU resources when required.
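
The piecemeal updates mentioned in the list above are commonly achieved with a rolling update, where Kubernetes replaces pods a few at a time. The sketch below builds a patch that switches a Deployment to a new model image while keeping every existing replica available until its replacement is ready. The container and image names are hypothetical.

```python
import json


def rolling_update_patch(container: str, new_image: str,
                         max_unavailable: int = 0, max_surge: int = 1) -> dict:
    """Return a patch that rolls a Deployment over to a new container image."""
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {
                    # Never drop below the desired replica count...
                    "maxUnavailable": max_unavailable,
                    # ...and start at most this many extra pods at once.
                    "maxSurge": max_surge,
                },
            },
            "template": {
                "spec": {
                    # Containers are matched by name when the patch is merged.
                    "containers": [{"name": container, "image": new_image}]
                }
            },
        }
    }


if __name__ == "__main__":
    patch = rolling_update_patch("model-server", "registry.example.com/model-server:1.1")
    print(json.dumps(patch, indent=2))
```

Saved to a file, a patch like this could be applied with `kubectl patch deployment <name> --patch-file <file>`, after which Kubernetes performs the rollout pod by pod.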

Machine learning deployment for every organisation

Seldon moves machine learning from POC to production to scale, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon can give you the edge you need to supercharge your performance.

With Seldon Deploy, your business can efficiently manage and monitor machine learning, minimise risk, and understand how machine learning models impact decisions and business processes. This means you know your team has done its due diligence in creating a more equitable system while boosting performance.

Deploy machine learning in your organisation effectively and efficiently. Talk to our team about machine learning solutions today.
