Machine Learning Model Serving and Monitoring with Seldon & Kubernetes
About this webinar
Join the Seldon team for a practical guide to building a state-of-the-art MLOps deployment platform using Kubernetes, with a focus on deploying deep learning models. Attendees will gain insights into the integration of key technologies including NVIDIA Triton Inference Server, Seldon Core v2, Kafka, Prometheus, and Grafana.
The session covers an end-to-end workflow for serving complex models such as transformers and CNNs and for configuring monitoring on top of them. The knowledge shared will be valuable for those looking to enhance model performance, reduce costs, and unlock new use cases in machine learning.
MLOps deployment platforms are challenging systems to build, deploy, and operate in the real world. As more and more business value is driven by large, complex deep learning models, it is crucial to be able to serve them in a scalable way.
After learning how to set up a state-of-the-art MLOps deployment platform on top of Kubernetes, we’ll dive into how to serve deep learning models in a robust and scalable way. With inference running, you’ll then see how to configure rich monitoring to observe models in production.
Along the way, we’ll cover each of the technologies being presented, including: Kubernetes, PyTorch, NVIDIA Triton Inference Server, Seldon Core v2, Kafka, Prometheus, and Grafana. We’ll demonstrate how these tools integrate with one another to form a fully fledged MLOps deployment platform.
What you'll learn
With a platform like this in place, you can bring value to your organization by:
- Reducing inference costs through efficient serving
- Improving performance through continuous monitoring and deployment
- Enhancing the user experience with low-latency inference
- Shortening the time to bring use cases to market
- Eliminating the up-front development costs of building a deployment platform
Ultimately, engineers benefit from knowing that they are using best-of-breed tools, that they have a consistent way of deploying ML models, and that monitoring and logging are built in. They can build complex, data-centric pipelines that open up a new set of use cases, including advanced recommendation engines, explainable machine vision models, and even LLM question-and-answer applications.