Introducing MLServer 1.0: Modern and flexible model serving for machine learning at scale

Introducing MLServer

In our mission to democratise access to machine learning (ML), we are pleased to announce the full release of MLServer 1.0, our open source ML inference server. We began developing MLServer in June 2020, with the goal of giving teams an easy way to serve their ML models via REST and gRPC APIs.

Out of the box, MLServer allows DevOps teams and data scientists to serve models built with the Scikit-Learn, XGBoost, MLlib, LightGBM, Seldon Tempo, and MLflow frameworks. MLServer also enables teams to write their own custom runtimes, so they can apply their own logic to serve any Python-based model.
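
To give a flavour of what a custom runtime looks like, the sketch below subclasses MLServer’s MLModel base class and implements its load() and predict() hooks. It is a minimal, illustrative example rather than a complete runtime: the commented-out model artifact path and the echo-style prediction logic are assumptions made purely for the sake of the sketch.

```python
# A minimal sketch of a custom MLServer runtime. The model artifact path and
# the echo-style prediction logic below are illustrative assumptions.
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Load your model artifact here, e.g.:
        # self._model = joblib.load("./my-model.joblib")
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Apply your own inference logic; this sketch simply echoes the
        # first input tensor back to the caller.
        first_input = payload.inputs[0]
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="echo",
                    shape=first_input.shape,
                    datatype=first_input.datatype,
                    data=first_input.data,
                )
            ],
        )
```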

MLServer 1.0 also includes thorough support for modern serving features, such as:

  • Multi-model serving: MLServer can run multiple models on the same server, whether different versions of the same model or entirely different models, and routes incoming traffic to the relevant model on the server.
  • Adaptive batching: MLServer groups incoming requests together, runs prediction on the whole batch, and then splits the batch back into individual responses. This markedly improves resource usage, in exchange for a small trade-off in latency.
  • Parallel inference: MLServer can run multiple inference worker processes on a single server, dispatching requests across them to make full use of the cores available on the server. Both adaptive batching and parallel inference are enabled through MLServer’s configuration files, as sketched below.
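
The snippets below are a hedged sketch of that configuration: the field names follow MLServer’s model-settings.json and settings.json conventions, but the model name, runtime and the specific values are illustrative placeholders. Per-model options, including adaptive batching, live in model-settings.json:

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "max_batch_size": 32,
  "max_batch_time": 0.1,
  "parameters": {
    "uri": "./my-model.joblib"
  }
}
```

Here max_batch_size caps how many requests are grouped into a single batch, while max_batch_time bounds how long MLServer waits (in seconds) before flushing a partially full batch. Parallel inference is configured server-wide, for example in settings.json:

```json
{
  "parallel_workers": 4
}
```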

MLServer also offers the following benefits for users:

  • V2 Inference Protocol support: MLServer implements the now widely adopted V2 Inference Protocol, as part of Seldon’s commitment to making V2 the industry-standard protocol for interacting with a user’s served models (an example request is sketched after this list).
  • Horizontal scalability: MLServer is the core Python inference server used to serve ML models in Kubernetes-native frameworks such as Seldon Core and KServe, making it straightforward for users to deploy and scale models on Kubernetes.
  • MLflow inference runtime: The MLServer team has been working closely with the MLflow team on a tighter integration between the two projects, which allows users to easily serve MLflow models, either from MLServer itself or directly from the MLflow CLI.
  • Richer Python content types: Certain models require their input to arrive as a particular Python type (for example, a NumPy array or a Pandas DataFrame) before they can perform inference. MLServer lets users annotate their payload with a content type, so that the request can be converted to the right type on the fly.
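
As an illustration of both the V2 protocol and content-type annotations, the sketch below sends an inference request to a locally running MLServer instance using Python’s requests library. The model name, port, payload values and the ‘np’ content type are assumptions made for the example.

```python
# A sketch of a V2 Inference Protocol request against a local MLServer
# instance; model name, port and payload values are illustrative.
import requests

inference_request = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4],
            # Content-type annotation, asking MLServer to decode this
            # tensor into a NumPy array before it reaches the model.
            "parameters": {"content_type": "np"},
        }
    ]
}

response = requests.post(
    "http://localhost:8080/v2/models/my-model/infer",
    json=inference_request,
)
print(response.json())
```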

Try it today

To try out MLServer 1.0, install it by running ‘pip install mlserver’. Once installed, take a look at some of our examples to see MLServer in action and to help you get started.
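
Once installed, serving a model is a matter of pointing the MLServer CLI at a folder containing a model-settings.json file, along the lines of the hedged sketch below; the extra mlserver-sklearn runtime and the folder path are illustrative.

```bash
# Install MLServer plus the Scikit-Learn runtime, then serve the model
# described by ./models/model-settings.json (paths are illustrative).
pip install mlserver mlserver-sklearn
mlserver start ./models
```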
