An Essential Guide to ML Model Serving Strategies (Including LLMs)

About this webinar

There are many ways to serve ML models to end users today, and new ones keep appearing, yet two questions remain: how do we pick the appropriate serving approach, and how do we execute it as quickly and efficiently as possible?

In this talk, Ramon Perez, Developer Advocate at Seldon, will dive into the machine learning deployment strategies available today for both traditional ML systems and Large Language Models, and will touch on a few dos and don’ts along the way.

Seldon enables a data-centric approach to model inference with ML-focused pipelines and empowers users to run multi-model serving (MMS) strategies. MMS provides a scalable and cost-effective solution when the number of models you are deploying increases.

An MMS approach reduces both hosting costs and deployment overhead, which makes it well suited to hosting a large number of models on a shared platform, whether they use the same ML framework or different ones.

Speakers

Ramon Perez

Developer Advocate, Seldon

What you'll learn

  • The different ML deployment strategies on offer
  • The best techniques for traditional ML systems and Large Language Models
  • How to create a data-centric approach to model inference
  • How to scale whilst being cost-effective when creating ML pipelines
