Seldon 1.3.3 released with Kafka Streams

We’re pleased to announce the release of Seldon 1.3.3, which includes the following changes:

  • Analytics provided by Kafka Streams instead of Spark Streaming
  • New Movielens 10 Million Demo
  • Comprehensive benchmarking guide

The 1.3.2 release of Seldon used Spark Streaming jobs to provide real-time analytics on running predictive API calls. However, requiring every Seldon cluster to host an always-on Spark cluster was a heavy burden. Spark’s default standalone scheduler made this worse, as it required at least 2 virtual cores per streaming job. With three streaming jobs running (at least 6 virtual cores in total), small Seldon installations were less likely to run successfully inside a VM with a restricted CPU and memory footprint.

Although Seldon targets highly scalable scenarios, we still want it to be deployable on laptops and other resource-limited test environments. We therefore looked at the recently released Kafka Streams as a solution. Seldon already uses Kafka, and Kafka Streams is a new library within the Kafka ecosystem that provides focused, simple stream processing functionality. A great introductory article to Kafka Streams can be found here.

Rewriting the analytics as a couple of Kafka Streams jobs produced a much more lightweight solution: because Kafka Streams is a library, each job runs as an ordinary application rather than on a dedicated processing cluster. Kafka Streams is still in early development, but it appears to offer a much simpler approach to scalable stream processing than Spark or Flink, especially when you already have a cluster manager such as Kubernetes to handle scheduling, scaling and redundancy.
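The analytics code itself is not part of this post. As a rough illustration only, the following sketch shows the kind of per-window aggregation a streaming analytics job performs: counting predictive API calls and averaging their latency per client in fixed, non-overlapping one-minute windows. The event fields (`timestamp_ms`, `client`, `latency_ms`) and the function name are hypothetical, and this is plain Python rather than Kafka Streams code; in Kafka Streams the same idea would be expressed as a windowed aggregation over a keyed stream, with the library managing state and partitioning.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class ApiCallEvent(NamedTuple):
    """Hypothetical record describing one predictive API call."""
    timestamp_ms: int   # event time in milliseconds
    client: str         # hypothetical client/consumer name
    latency_ms: float   # time taken to serve the prediction


def tumbling_window_stats(
    events: Iterable[ApiCallEvent], window_ms: int = 60_000
) -> Dict[Tuple[str, int], Tuple[int, float]]:
    """Group events into fixed, non-overlapping windows per client and
    return {(client, window_start_ms): (call_count, mean_latency_ms)}."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, latency sum]
    for ev in events:
        # Align each event to the start of the window containing it.
        window_start = ev.timestamp_ms - (ev.timestamp_ms % window_ms)
        acc = totals[(ev.client, window_start)]
        acc[0] += 1
        acc[1] += ev.latency_ms
    return {key: (count, total / count) for key, (count, total) in totals.items()}


if __name__ == "__main__":
    sample = [
        ApiCallEvent(1_000, "demo", 10.0),
        ApiCallEvent(30_000, "demo", 20.0),
        ApiCallEvent(61_000, "demo", 30.0),
        ApiCallEvent(5_000, "other", 8.0),
    ]
    for key, (count, mean) in sorted(tumbling_window_stats(sample).items()):
        print(key, count, mean)
```

Running the script prints one line per (client, window) pair with its call count and mean latency. A real streaming job would emit these results continuously as events arrive instead of processing a finished list.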

The 1.3.3 release also provides a larger content recommendation demo using the Movielens 10 Million dataset. Our documentation uses this demo to show how to benchmark how well Seldon scales to handle real-time predictions, with a step-by-step guide to creating a Seldon cluster on AWS and load testing it using the Iago load testing framework.

In the coming months, we will provide reference load test benchmarks for various predictive tasks at different scales, which Seldon users can use as a guide when taking their predictive solutions to production.

To get started with Seldon, please follow our install guide; since the 1.3 release, Seldon uses Kubernetes for deployments. If you have any questions or feedback, please post to the Seldon Users Group or create an issue on GitHub.