Dataflows for Machine Learning Operations

About this webinar

As the requirements for serving, monitoring, and explaining ML models grow more complex, dataflow architectural principles can help teams meet them.

Machine learning (ML) models are being deployed to production at an ever-increasing pace, driving a growing need for machine learning operations (MLOps): the deployment, monitoring, and explainability of ML models at scale. With the rise of the data-centric AI movement, businesses are seeking solutions that give them highly discoverable and available data for monitoring, governance, and compliance.

In this talk, we identify dataflow architectural principles that address these demands and discuss their application in an open-source ecosystem. We show how to create a decentralized dataflow engine underpinned by Kafka and the Kafka Streams client library, and how such an engine can be used to build flexible data processing pipelines on the fly.

We explore the challenges faced in creating such a dataflow engine and reflect on our journey with the Kafka ecosystem. Among other things, we consider how to manage dynamically created Kafka Streams topologies, how to multiplex hundreds or even thousands of these topologies onto individual JVM instances, and how Kafka integrates with Kotlin.

Speakers

Alex Rakowski

Solutions Engineer, Seldon

Andrei Paleyes

PhD Student, University of Cambridge

What you'll learn

  • How to create a decentralized dataflow engine underpinned by Kafka and the Kafka Streams client library
  • How such an engine can be used to build flexible data processing pipelines on the fly
  • The challenges faced in creating such a dataflow engine, and lessons from our journey with the Kafka ecosystem
  • How to manage dynamically created Kafka Streams topologies and multiplex hundreds or even thousands of them onto individual JVM instances
