Real Time ML at Scale using SpaCy, Kafka & Seldon Core

A hands-on tutorial to train and deploy a Sklearn & SpaCy model that processes real-time Kafka streaming data using Seldon Core

In this post, we will cover how to train and deploy a machine learning model leveraging a scalable stream-processing architecture for an automated text prediction use-case. We will use Sklearn and SpaCy to train an ML model on the Reddit Content Moderation dataset, and we will deploy that model using Seldon Core for real-time processing of text data from Kafka streams. This is the content for the talk presented at the NLP Summit 2020.

You can find the full code for this article in the following links:

Model Training with SpaCy & Sklearn

For this use-case we will be using the Reddit /r/science Content Moderation Dataset. This dataset consists of over 200,000 Reddit comments, labelled primarily on whether they were removed by moderators. Our task is to train an ML model that can predict which comments would be removed by Reddit moderators.


Exploratory Data Analysis Notebook (Image by Author)

When it comes to building a machine learning model — especially when it is to be deployed to a production system — it is important that principles and best practices are used to ensure its responsible design, development and operation.


Algorithm Comparison Chart (Image by Author)

You will be able to find best practices and techniques of exploratory data analysis in the model training notebooks: these cover understanding of the features, data distributions, class imbalance, data cleaning, algorithm performance comparison, tokenization approaches, partial dependence plots, etc.

These techniques enable data scientists and engineers to identify domain-specific best practices for the training, deployment and even production monitoring of machine learning models.


Base NLP Pipeline (Image by Author)

Specifically in regards to model training, we will be building the machine learning pipeline using sklearn and SpaCy. As outlined in the image above, we use the following components in the pipeline:

  • CleanTextTransformer — Cleans the input text by removing unwanted characters, symbols and repeated phrases
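The cleaning step above can be sketched as a standard sklearn transformer. A minimal sketch is shown below; the full pipeline in the notebooks uses a SpaCy tokenizer, for which a TF-IDF vectorizer and logistic regression stand in here, and the exact cleaning rules are assumptions:

```python
import re

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class CleanTextTransformer(BaseEstimator, TransformerMixin):
    """Cleans raw comment text before tokenization."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        cleaned = []
        for text in X:
            text = re.sub(r"https?://\S+", " ", text)     # drop URLs
            text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)   # drop symbols
            text = re.sub(r"\s+", " ", text).strip()      # collapse whitespace
            cleaned.append(text.lower())
        return cleaned


# Assemble the text-classification pipeline
pipeline = Pipeline([
    ("clean", CleanTextTransformer()),
    ("tfidf", TfidfVectorizer(max_features=10_000)),
    ("clf", LogisticRegression(max_iter=1000)),
])
```

Because every stage follows the sklearn transformer interface, the whole pipeline can be fit, evaluated and exported as a single object.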


Model Inference Stages (Image by Author)

For a more intuitive view of how our model will process the data, the image above shows how a single text input is transformed at each stage. More specifically, when we receive a phrase, it is passed through each of the pipeline components until we obtain a prediction.

Once we train our model, we can reuse the same pipeline code, and we will export the trained model artifacts using pickle so we can load them when deploying the model.
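Exporting and reloading the artifacts is a one-liner with pickle. A minimal sketch, where the `model.pickle` path is an illustrative name rather than the exact artifact name used in the notebooks:

```python
import pickle


def save_artifacts(pipeline, path="model.pickle"):
    """Persist the fitted pipeline so the deployed model can reload it."""
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)


def load_artifacts(path="model.pickle"):
    """Reload the pipeline exported during training."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Note that unpickling requires the pipeline's custom classes (such as the text cleaner) to be importable at load time, which is why the deployment reuses the training code.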

Model Containerisation with Seldon Core

Once we’ve trained our model, we’re able to use Seldon Core to convert it into a scalable microservice and deploy it to Kubernetes using the Seldon cloud-native Kubernetes operator. Models deployed with Seldon Core support REST and gRPC interfaces, and since version 1.3 they also support a native Kafka interface, which we’ll be using in this article.


Model Deployment with Seldon Core (Image by Author)

Seldon provides several ways to productionise machine learning models. The most common approach is to use one of the existing pre-packaged model servers. In this case, however, we will be building our own custom model server by extending the default sklearn pre-packaged server to add SpaCy and its respective English language model.

To containerize our model with Seldon, we will be following the standard Seldon Core workflow using the Python Language wrapper. As shown in the image below, the standard steps required to containerize a model are:

  • Create a Python Wrapper Class to expose the model logic


Standard Steps Required to Containerise Models (Image by Author)

In our case we will just have to define a Python wrapper that will consist of:

  • Importing the code for the ML pipeline used in the training section of this article

The code for the wrapper can be found in the full Jupyter notebook to train, containerize and test the Seldon-wrapped model.

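In outline, a Seldon Python wrapper for this model follows Seldon Core's wrapper contract: a plain class whose `predict` method is called for every request. The sketch below assumes the artifact was pickled to `model.pickle` and that the class name (here `RedditClassifier`) matches the `MODEL_NAME` used at build time:

```python
import pickle


class RedditClassifier:
    """Seldon Core Python wrapper exposing the trained NLP pipeline."""

    def __init__(self):
        # Load the pipeline artifact exported with pickle during training.
        # "model.pickle" is an assumed artifact name for this sketch.
        with open("model.pickle", "rb") as f:
            self._pipeline = pickle.load(f)

    def predict(self, X, feature_names=None, meta=None):
        # X arrives as the "ndarray" payload, i.e. a list of raw strings;
        # the pipeline handles cleaning, tokenization and scoring.
        return self._pipeline.predict_proba(X)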

Once we have the wrapper, we are able to simply run the Seldon utilities using the s2i CLI, namely:

s2i build . seldonio/seldon-core-s2i-python3:1.3.0 nlp-model:0.1
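The s2i build reads its settings from an `.s2i/environment` file in the model folder. A plausible configuration for this model, where `RedditClassifier` is an assumed wrapper class name, would look like:

```
MODEL_NAME=RedditClassifier
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0
```

`MODEL_NAME` must match the Python wrapper class so the server knows which class to instantiate.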

Now we have fully containerised our model as the image nlp-model:0.1 and we will be able to deploy it for stream processing in the next section.

Kafka Stream Processing

Now that we have trained and containerized our model, we’re able to deploy it. Seldon models support the REST, gRPC and Kafka protocols; in this example we will be using the latter to support stream processing.


Kafka ML Processing Architecture (Image by Author)

You are able to find the full Jupyter notebook for the example, including the deployment files as well as the request workflows.

We have the following components:

  • Kubernetes Cluster — The Kubernetes cluster where all our components will be deployed

For simplicity we will skip the steps required to set up the Kubernetes cluster — including setting up the Kafka brokers and installing Seldon Core — but you can find the full instructions in the notebook example.

Now we’re able to deploy our model. For this we will just need to define our deployment configuration file following the SeldonDeployment schema:


SeldonDeployment Kafka Configuration File (Image by Author)
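A sketch of what such a Kafka-enabled SeldonDeployment could look like is shown below. It follows Seldon Core's Kafka integration (`serverType: kafka` plus broker and topic environment variables on the service orchestrator); the broker address and topic names are assumptions for illustration, not the exact values from the notebook:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: reddit-kafka
spec:
  serverType: kafka
  predictors:
  - name: default
    svcOrchSpec:
      env:
      - name: KAFKA_BROKER
        value: kafka-cluster-kafka-brokers.kafka.svc.cluster.local:9092
      - name: KAFKA_INPUT_TOPIC
        value: reddit-kafka-input
      - name: KAFKA_OUTPUT_TOPIC
        value: reddit-kafka-output
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: nlp-model:0.1
    graph:
      name: classifier
      type: MODEL
```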

As you can see in the configuration YAML, the structure contains the following key points:

  • The name of the model, name: reddit-kafka

We can now create the model using the kubectl command:

kubectl apply -f sdep_reddit_kafka.yaml

Once our model is created, we can send data into the input topic. We can do so using a Kafka console-producer utility.


Similarly, we can also listen for the output data that the model produces.


Now when we send the input data to the input topic:

{"data": {"ndarray": ["This is an input"]}}

We would consequently see the prediction in the output topic stream:

{"data":{"names":["t:0","t:1"],"ndarray":[[0.6758450844706712,0.32415491552932885]]},"meta":{}}
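The request and response shown above follow Seldon's `ndarray` payload format. A small sketch of building a request and parsing a response programmatically (the helper names are illustrative):

```python
import json


def build_request(texts):
    """Wrap raw text inputs in Seldon's ndarray request payload."""
    return json.dumps({"data": {"ndarray": texts}})


def parse_response(raw):
    """Map each class label (e.g. "t:0", "t:1") to its probability."""
    body = json.loads(raw)
    names = body["data"]["names"]
    scores = body["data"]["ndarray"][0]
    return dict(zip(names, scores))
```

Here `t:0` and `t:1` correspond to the two classes, so the probability under `t:1` is the model's score for the comment being removed.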

With this we now have a model deployed in a scalable architecture for real-time machine learning processing. More specifically, this architecture allows for horizontal and vertical scaling of each of the respective components: the deployed models can run a variable number of replicas, with autoscaling via the Kubernetes HPA, which scales horizontally based on resource usage. Similarly, Kafka can also scale horizontally through the number of brokers, which enables high throughput with low latency. This can be seen more intuitively in the diagram below.


Scaling Replicas and Brokers (Image by Author)

What’s next?

Now that you have an intuition for and understanding of the core architectural components, you are able to delve into the practical details. For this you can access the code and resources in the following links:

If you are interested in further hands-on examples of scalable deployment strategies for machine learning models, you can check out: