Stream-Based Model Serving with Cloudflow and Seldon

Boris Lublinsky, Principal Architect at Lightbend, has written a fantastic post on the Lightbend blog discussing the role of model serving in building real-time streaming applications. He also shows how to implement and deploy machine learning (ML) models with the open-source frameworks Cloudflow and Seldon.

“Today’s customers expect to receive information in real time. In order to meet these expectations, many companies are now moving their data systems from batch to stream processing.” – Boris Lublinsky, Lightbend

The GitHub project can be found and cloned here.

Examples of stream processing applications gaining popularity in the industry today include, but are not limited to:

  • Fraud detection: Correlate payment information with other historical data or known patterns to detect fraud before it happens. This typically requires very fast processing, as you must decline a transaction before it is processed.
  • Cross-selling: Evaluate customer purchasing history to make context-specific, personal, customized offers or discounts before the customer leaves the store.
  • Predictive maintenance: Evaluate current operations to predict failure before it happens. This allows replacing parts before they break.

The key in all of these use cases is that you process data while it is in motion. You need to handle the event while it is occurring, not several hours after something has happened. Although such mission-critical, real-time applications have been built for years, the use of ML allows:

  • Building new innovative applications that are differentiated from competitors.
  • Applying ML to more “traditional scenarios” like fraud detection, cross selling, or predictive maintenance to enhance existing business processes and make better data-driven decisions.
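The idea of handling each event while it is in motion can be illustrated with a minimal sketch (not taken from the original post; all names here are hypothetical). Each payment is scored the moment it arrives, so a decision is available before the transaction completes, rather than in a later batch job:

```python
# Hypothetical sketch: score each event as it arrives on the stream.
from dataclasses import dataclass
from typing import Iterator

@dataclass
class Payment:
    account: str
    amount: float
    country: str

def score(payment: Payment, home_country: str = "US") -> bool:
    """Toy stand-in for an ML model: flag large foreign payments."""
    return payment.amount > 1_000 and payment.country != home_country

def process_stream(events: Iterator[Payment]) -> Iterator[tuple[Payment, str]]:
    """Decide on each event as it occurs, not hours later in a batch."""
    for payment in events:
        yield payment, ("DECLINE" if score(payment) else "APPROVE")

decisions = list(process_stream(iter([
    Payment("a1", 25.0, "US"),
    Payment("a2", 5_000.0, "FR"),
])))
```

In a real deployment the rule-based `score` function would be replaced by a served ML model, and the event iterator by a streaming source such as Kafka, but the shape of the computation is the same.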

The introduction of ML models also impacts the development cycle of streaming applications by adding model creation and management to the overall process, which includes the following steps:

  • Build: Use ML algorithms to find insights based on the historical data. This step includes data collection, cleansing, preparation, and building models.
  • Validate: Use different techniques to ensure that the created model meets production requirements.
  • Operate: Deploy the created model in production to process new incoming events in real time.
  • Monitor: Watch the outcomes of the applied model. The two most important monitoring activities here are concept drift monitoring to make sure that the model still behaves correctly and model explainability — figuring out why a certain decision was made by a model to ensure trust in the model’s behavior.
  • Continuous Loop: Improve the model by rebuilding the model based on new data. This can be done on schedule (for example, weekly) or based on the results of concept drift monitoring.
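The concept drift monitoring mentioned above can be sketched very simply (this is a hypothetical illustration, not the post's implementation): compare the model's recent outputs against a reference window and flag drift when the gap grows too large. Production systems use stronger statistical tests (e.g. Kolmogorov-Smirnov or population stability index), but the feedback loop is the same.

```python
# Hypothetical drift check: compare recent score mean to a reference mean.
from collections import deque
from statistics import mean

class DriftMonitor:
    def __init__(self, reference: list[float], window: int = 100,
                 threshold: float = 0.2):
        self.ref_mean = mean(reference)       # behavior seen at validation time
        self.recent = deque(maxlen=window)    # rolling window of live outputs
        self.threshold = threshold

    def observe(self, score: float) -> bool:
        """Record a new model output; return True if drift is detected."""
        self.recent.append(score)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough live data yet
        return abs(mean(self.recent) - self.ref_mean) > self.threshold

monitor = DriftMonitor(reference=[0.1, 0.2, 0.15, 0.1], window=3, threshold=0.2)
```

A detected drift would then trigger the continuous loop: rebuild the model on fresh data and redeploy.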

Model Serving with Seldon
Seldon Core is an ML deployment framework that allows data scientists to convert their ML code or artifacts into full-fledged microservices through flexible and extensible inference graphs, which can be scaled to thousands of production models. The core components of Seldon include:

  • Prepackaged model servers: Optimised Docker containers for popular libraries such as TensorFlow, XGBoost, and H2O which can load model artifacts/binaries and serve them as Seldon-deployed model microservices.
  • Language wrappers: Tools to enable more custom ML models to be wrapped using a set of CLIs which allow data scientists to convert a Python File or a Java Jar into a fully fledged microservice.
  • Inference graph: With Seldon, it’s possible to containerize multiple model artifacts into separate re-usable inference containers, which can be linked using the Seldon inference graph. This allows Seldon users to build inference pipelines that can consist of models, transformers, combiners, and routers.
  • Standardized API: Every model that is deployed comes with an out-of-the-box API which can be REST or gRPC.
  • Out-of-the-box observability: Each Seldon Deployment comes with monitoring metrics and auditable request logs through a standardised format that allows for consistency of monitoring during the scaling of deployed models.
  • Advanced ML insights: Seldon abstracts complex ML concepts such as Explainers, Outlier Detectors and Adversarial Attack Detectors into infrastructural components that can be extended by the Seldon users to leverage model-specific or reusable techniques.
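The standardized REST API can be exercised with a short sketch. The payload shape (`{"data": {"ndarray": ...}}`) and the `/api/v1.0/predictions` path follow Seldon's documented prediction protocol; the host, namespace, and deployment name below are placeholders, so treat this as an assumption-laden example rather than a copy-paste recipe:

```python
# Hedged sketch: build a prediction request for a Seldon-deployed model.
# Host "localhost:8003", namespace "default", and deployment "mymodel"
# are placeholders for your own ingress and SeldonDeployment.
import json
from urllib import request

def prediction_request(host: str, namespace: str, deployment: str,
                       rows: list[list[float]]) -> request.Request:
    url = f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    payload = json.dumps({"data": {"ndarray": rows}}).encode()
    return request.Request(url, data=payload,
                           headers={"Content-Type": "application/json"})

req = prediction_request("localhost:8003", "default", "mymodel", [[1.0, 2.0]])
# Once a deployment exists, send it with urllib.request.urlopen(req);
# the same deployment also exposes a gRPC endpoint.
```

Because every deployed model answers this same contract, client code does not change when you swap a prepackaged server for a custom language-wrapped model.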

Read the full blog post here.

If you want to book a demo of Seldon or hear from our team, fill out the form below:
