Transform your LLMs with Seldon

Accelerate AI development, enhance performance, and streamline automation

Seldon is dedicated to advancing the field of Large Language Models (LLMs) through ongoing research and development. Our talented team of MLOps researchers and software engineers is hard at work improving auto-scaling capabilities to reduce startup times, even when GPU resources are scarce.

A Closer Look at Our Ongoing LLM Research and Development

Check out how Seldon can help your organization deploy LLMs:

Multi-GPU Serving

Maximize the value of your resources by operating multiple GPUs as a single unit


Ensure that your LLMs can retain conversation context and deliver more relevant and coherent responses

Continuous Batching

Tackle high latency and low GPU utilization by batching incoming requests dynamically (feature coming soon)

Streaming Output

Enable continuous delivery of LLM predictions, ensuring timely and seamless interactions

Deploying LLMs with Seldon

Seldon offers exciting opportunities to deploy LLMs and maximize their potential. Seldon allows you to leverage existing frameworks like DeepSpeed while scaling out LLMs on Kubernetes.

By leveraging Seldon’s modular architecture, you can serve, manage, and connect different models together. 
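As a flavour of what connecting models looks like, here is a minimal sketch of a Seldon Core v2 Pipeline that feeds one model's output into another. The resource and step names are illustrative, not a real deployment:

```yaml
# Illustrative sketch: chaining two deployed models with a Seldon Core v2
# Pipeline. Model names ("topic-extractor", "llm-generator") are hypothetical.
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
  name: llm-chain
spec:
  steps:
    - name: topic-extractor
    - name: llm-generator
      inputs:
        - topic-extractor.outputs
  output:
    steps:
      - llm-generator
```

Because pipelines are declarative, swapping a model or inserting a preprocessing step is a manifest change rather than a code change.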

MLServer LLM Runtimes

At the moment, the new LLM Runtimes are split into the following three:


OpenAI

Wrap a proprietary LLM API into a microservice that can be accessed by other applications or linked into an inference graph. Only the OpenAI API is supported as of today, and not all of its functionality is accessible through the server MLServer provides.
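Once wrapped, other applications talk to the microservice over the Open Inference Protocol (V2) that MLServer speaks. A minimal sketch of building such a request body (the input tensor name is illustrative):

```python
import json


def build_generate_request(prompt: str) -> str:
    """Build a minimal Open Inference Protocol (V2) request body
    for a text model served behind MLServer."""
    payload = {
        "inputs": [
            {
                "name": "prompt",  # input tensor name (illustrative)
                "shape": [1],
                "datatype": "BYTES",
                "data": [prompt],
            }
        ]
    }
    return json.dumps(payload)


# A client would POST this body to the model's /v2 inference endpoint.
body = build_generate_request("Summarize DeepSpeed inference in one line.")
```

The same payload shape works for any model behind the protocol, which is what lets different runtimes be linked into one graph.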

HuggingFace DeepSpeed

DeepSpeed inference targets large models that would otherwise not fit in the memory of a single GPU. Traditionally, the main bottleneck in deploying hefty language models for inference has been limited GPU memory capacity.
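For a rough idea of the underlying technique, here is plain DeepSpeed usage, not Seldon's runtime API: the model's weights are sharded across GPUs so a model too large for one device can still serve requests. This sketch needs one or more CUDA GPUs, and the model name is illustrative:

```python
# Sketch: sharding a Hugging Face model across GPUs with DeepSpeed inference.
# Generic DeepSpeed usage, not Seldon's runtime API; requires CUDA GPUs.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; real deployments target far larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Split the weights across GPUs (tensor parallelism) so a model that does
# not fit in one GPU's memory can still be served.
engine = deepspeed.init_inference(
    model,
    mp_size=2,                        # number of GPUs to shard across
    dtype=torch.half,
    replace_with_kernel_inject=True,  # use DeepSpeed's optimized kernels
)

inputs = tokenizer("Deploying LLMs with Seldon", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```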

Microsoft Guidance

Serve one or more templates that can help developers guide the output of one or many LLMs. You can think of this as giving each user a museum tour guide for every visit to an LLM hosted via Seldon’s new extensions.
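To give a flavour of what such templates look like, here is a minimal Guidance-style template together with a small helper that lists its placeholders. Both the template text and the helper are illustrative, not Seldon's or Guidance's API:

```python
import re

# A Guidance-style template: literal text plus {{...}} slots. Variables like
# {{question}} are substituted from the request, while {{gen ...}} slots are
# the only places the LLM is allowed to generate, constraining its output.
template = (
    "You are a helpful museum tour guide.\n"
    "Visitor question: {{question}}\n"
    "Answer in two sentences: {{gen 'answer' max_tokens=64}}"
)


def placeholders(template: str) -> list:
    """Naively list the placeholder expressions in a template (illustrative)."""
    return re.findall(r"{{(.*?)}}", template)
```

Serving the template alongside the model means every caller gets the same guard-rails without re-implementing prompt logic client-side.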

LLM Use Cases

Content Generation

Topic Extraction

Deploy your LLMs with Confidence

With LLMs, your organization gains access to a wide range of capabilities that transform the way you engage with data and interact with your users. Seldon makes your journey from innovation to deployment seamless.

LLMs open up a new class of enterprise applications. With Seldon, you can:


Whether it’s text classification, reasoning, or something in between, ensure your LLMs perform at their best.


Whether you're fine-tuning your LLMs or rolling out updates, integrate with your existing workflows.


Our scalable infrastructure enables your LLMs to handle high volumes of data and users without compromising performance.


Stay in control of your data by monitoring and analyzing your models’ performance.

Get Started Now

Want to see your LLMs come to life? Seldon is here to help you deploy and manage your LLM infrastructure.