This post was originally published by Intel Corporation on November 19, 2018.
You can now run Intel’s machine learning model optimization tools using nGraph library and the Intel® Distribution of OpenVINO™ toolkit on Seldon Core*. In this article we discuss the challenges involved for organizations that want to put machine learning models into production and how Intel and Seldon technologies combine to solve these challenges.
As machine learning (ML) applications become more prevalent across industries, organizations are challenged to take their nascent data science projects and put them into production. In their 2015 NeurIPS paper, Google engineers explained that machine learning model code is only one small part of the overall pipeline and toolset that needs to be created. They illustrated their point with a schematic showing that the lines of code for the core ML algorithm are far outstripped by the lines of code for the surrounding requirements of serving infrastructure, monitoring, analysis tools, and process and machine resource management, among other needed components. This leads to a large technical gap if the organization itself needs to create custom code to satisfy these missing components.
Similarly, from Intel’s experience in helping customers build and deploy real production machine learning applications, we have found that actual machine learning model development is a small part of the entire machine learning development cycle. Surrounding serving infrastructure, monitoring, tools and processes are needed components that are essential for a successful machine learning solution.
Adding to these challenges, organizations may want to give their machine learning operations the flexibility to deploy both to the cloud and on-premises in a reproducible manner. In this area, Kubernetes* has become a leader in container orchestration by allowing diverse software projects to be scaled and managed robustly. Machine learning pipelines can also benefit from Kubernetes’ infrastructure abstractions, which allow control over the resources, security, rollout and monitoring of complex container-based systems.
Seldon Core is a third-party open source project that helps organizations satisfy demands including model serving infrastructure, optimization, and monitoring and management of the CI/CD pipeline to put machine learning code into production.
Intel’s nGraph library and the Intel® Distribution of OpenVINO™ toolkit provide deep learning optimizations that allow machine learning models to be pushed from concept into production-ready optimized models. The nGraph library is Intel’s open source computational graph compiler for neural networks. It transforms a deep learning model into an optimized executable function that runs efficiently on a variety of hardware and supports multiple deep learning frameworks. The Intel Distribution of OpenVINO toolkit provides computer vision and deep learning inference tools, including convolutional image-based classification models optimized for Intel® processors: CPUs, integrated GPUs/Intel® Processor Graphics, Intel® Movidius™ Vision Processing Units (VPUs), and Intel® FPGAs.
Seldon Core has successfully streamlined the application of pipelines using these Intel tools. As Seldon Core is framework-agnostic, it allows tools to be easily containerized and packaged to be run and managed in a Kubernetes cluster. It uses companion open tools such as S2I (Source-to-Image) to allow models compatible with the nGraph library or the Intel Distribution of OpenVINO toolkit to be packaged in a container for easy deployment. Seldon Core also allows inference graphs made up of various components to be constructed, deployed and managed. Figure 1 shows the various containerized components connected into a request/response microservice graph that together provides the functionality needed to satisfy a prediction request:
Figure 2 shows how the output of a separate container handling feature transformations is routed via a multi-armed bandit solver (MAB) to either an nGraph library or an Intel Distribution of OpenVINO toolkit-based model. Servers running the nGraph library can serve an ONNX exported ImageNet model, while servers running the Intel Distribution of OpenVINO toolkit can serve an optimized ResNet model. The MAB can be used to decide which model is best and push traffic to it in real time, as production prediction and feedback requests flow through the graph.
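To make the routing idea concrete, here is a minimal sketch of a multi-armed bandit router using the epsilon-greedy strategy. It is an illustration only, not Seldon Core's router implementation: the class name, the method names and the branch names `ngraph-model` and `openvino-model` are hypothetical, and epsilon-greedy is just one of several bandit strategies that could fill this role.

```python
import random


class EpsilonGreedyRouter:
    """Illustrative epsilon-greedy bandit that picks a model branch per request."""

    def __init__(self, branches, epsilon=0.1, seed=None):
        self.branches = list(branches)
        self.epsilon = epsilon            # probability of exploring a random branch
        self.rng = random.Random(seed)
        self.pulls = {b: 0 for b in self.branches}
        self.reward_sum = {b: 0.0 for b in self.branches}

    def mean_reward(self, branch):
        """Average reward reported so far for a branch (0.0 if no feedback yet)."""
        n = self.pulls[branch]
        return self.reward_sum[branch] / n if n else 0.0

    def route(self):
        """Choose the branch that should handle the next prediction request."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.branches)           # explore
        return max(self.branches, key=self.mean_reward)     # exploit best so far

    def send_feedback(self, branch, reward):
        """Record a reward (for example 1.0 for a correct prediction) for a branch."""
        self.pulls[branch] += 1
        self.reward_sum[branch] += reward


if __name__ == "__main__":
    # Hypothetical branch names standing in for the two model servers.
    router = EpsilonGreedyRouter(["ngraph-model", "openvino-model"], epsilon=0.1, seed=42)
    branch = router.route()
    router.send_feedback(branch, reward=1.0)
    print(branch, router.mean_reward(branch))
```

As feedback accumulates, most traffic goes to the branch with the higher average reward, while the small exploration rate keeps testing the other branch in case its performance changes.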
The workflow in figure 2 can be described in a Kubernetes manifest, which includes the definitions of how the individual components are wired together along with the dependencies. This allows data scientists to specify resource requirements for their running model such as:
- The CPUs, GPUs, and memory requirements for each component to run successfully.
- The persistent volumes (S3, NFS etc.) that are required for model weights and other data requirements.
- Other containers that need to be run alongside the model for successful inference such as databases or other business components.
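As a rough sketch of what such a manifest might contain, the Python below builds one as a dictionary and prints it as JSON. The layout is modeled loosely on Seldon Core's `SeldonDeployment` resource, but the API version, field names, component names and container images are all assumptions made for illustration and should be checked against the Seldon Core documentation. Only CPU and memory requests are shown; volumes and companion containers are left out.

```python
import json

# Hypothetical component names and container images, used only for illustration.
COMPONENTS = {
    "feature-transformer": "example.com/feature-transformer:0.1",
    "mab-router": "example.com/mab-router:0.1",
    "ngraph-model": "example.com/ngraph-model:0.1",
    "openvino-model": "example.com/openvino-model:0.1",
}


def node(name, node_type, children=()):
    """One node of the inference graph; children receive this node's output."""
    return {"name": name, "type": node_type, "children": list(children)}


def container(name, image, cpu, memory):
    """A container entry with standard Kubernetes resource requests."""
    return {
        "name": name,
        "image": image,
        "resources": {"requests": {"cpu": cpu, "memory": memory}},
    }


def build_manifest(cpu="1", memory="1Gi"):
    """Assemble the transformer -> router -> two models graph described above."""
    graph = node("feature-transformer", "TRANSFORMER", [
        node("mab-router", "ROUTER", [
            node("ngraph-model", "MODEL"),
            node("openvino-model", "MODEL"),
        ]),
    ])
    containers = [container(n, img, cpu, memory) for n, img in COMPONENTS.items()]
    return {
        "apiVersion": "machinelearning.seldon.io/v1alpha2",  # assumed API version
        "kind": "SeldonDeployment",
        "metadata": {"name": "intel-inference-graph"},       # hypothetical name
        "spec": {
            "name": "intel-inference-graph",
            "predictors": [{
                "name": "default",
                "replicas": 1,
                "componentSpecs": [{"spec": {"containers": containers}}],
                "graph": graph,
            }],
        },
    }


def graph_names(graph_node):
    """Collect the names of every node in the graph, depth first."""
    names = [graph_node["name"]]
    for child in graph_node["children"]:
        names.extend(graph_names(child))
    return names


if __name__ == "__main__":
    print(json.dumps(build_manifest(cpu="2", memory="4Gi"), indent=2))
```

Because each graph node name matches a container name, the wiring between components and their resource requirements lives in a single declarative document.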
These manifests can be stored in source control and tied into an organization’s CI/CD pipelines to allow automatic updating of models in production as new versions are created, resource requirements change or scaling needs to be applied. Because the manifests are declarative, “GitOps” techniques, which use source control as the source of truth and the trigger for all operations, can be applied to machine learning pipelines, enabling easy auditing and rollback.
To put an inference graph into production, a user can submit it through the standard Kubernetes API using tools such as kubectl, Helm or ksonnet. Seldon Core then automatically exposes both REST and gRPC endpoints to give external business applications access to the model. The current integration between Seldon Core, the nGraph library and the OpenVINO toolkit is illustrated in the following examples:
- An example notebook showing a Seldon Core managed deployment of an ONNX* exported ResNet model using Intel’s nGraph library.
- An example notebook showing a Seldon Core managed deployment of a Caffe* exported ResNet model running within Intel’s OpenVINO toolkit.
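For a sense of how an external application might call the REST endpoint mentioned above, the sketch below builds a prediction request with Python's standard library. The endpoint path `/api/v0.1/predictions`, the `{"data": {"ndarray": ...}}` payload shape and the base URL are assumptions modeled on Seldon Core's prediction API of that period; the function name is hypothetical, and the real contract should be taken from the Seldon Core documentation and the notebooks listed above.

```python
import json
from urllib import request


def build_prediction_request(base_url, rows):
    """Build (but do not send) a POST request carrying feature rows as JSON.

    The path and payload layout are assumptions, not a verified contract.
    """
    payload = {"data": {"ndarray": rows}}
    return request.Request(
        url=base_url.rstrip("/") + "/api/v0.1/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical gateway address; replace it with the deployment's real endpoint.
    req = build_prediction_request("http://localhost:8080", [[0.1, 0.2, 0.3]])
    print(req.get_method(), req.full_url)
    # To actually send it: request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes the payload easy to inspect before pointing it at a live cluster.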
If you want to learn more about nGraph library, the Intel Distribution of OpenVINO toolkit, and how to easily manage and orchestrate your production rollout of their optimized models with Seldon Core, please follow the links below:
- nGraph library
- Intel Distribution of OpenVINO toolkit
- Seldon Core (Join the Seldon open-source community on Slack)