Transform your LLMs with Seldon
At Seldon, we are dedicated to advancing the field of Large Language Models (LLMs) through ongoing research and development. Our talented team of MLOps researchers and software engineers is hard at work improving auto-scaling capabilities to reduce startup times, even when GPU resources are scarce.
Though challenges remain, we are encouraged by the meaningful progress being made to deliver more powerful, scalable LLMs.
A Closer Look at Our Ongoing LLM Research and Development
Check out how Seldon can help your organization deploy LLMs:
Multi-GPU Serving
Maximize the value of your resources by operating multiple GPUs as a single unit
State Management
Ensure that your LLMs can retain conversation context and deliver more relevant and coherent responses
Continuous Batching
Tackle high latency and low utilization by scheduling incoming requests into in-flight batches (feature coming soon)
Streaming Output
Enable continuous delivery of LLM predictions, ensuring timely and seamless interactions
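To make the multi-GPU and streaming ideas above concrete, here is a minimal sketch using the open-source HuggingFace transformers and accelerate libraries. The model name is an arbitrary example, and Seldon's runtimes manage these details for you; this is an illustration, not our implementation:

    from threading import Thread

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

    # device_map="auto" (via accelerate) shards the weights across all visible
    # GPUs, treating them as a single logical unit.
    model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # arbitrary example model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype=torch.float16
    )

    inputs = tokenizer("Explain MLOps in one sentence.", return_tensors="pt").to(model.device)

    # Stream tokens back as they are generated instead of waiting for the full answer.
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=64),
    )
    thread.start()
    for text_chunk in streamer:
        print(text_chunk, end="", flush=True)
    thread.join()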
Deploying LLMs with Seldon
The Seldon AI platform offers exciting opportunities to deploy LLMs and maximize their potential. Seldon allows you to leverage existing frameworks like DeepSpeed while scaling out LLMs on Kubernetes.
By leveraging Seldon’s modular architecture, you can serve, manage, and connect different models together. Even though some of the tooling is still in beta, we are rapidly innovating and optimizing our software.
MLServer LLM Runtimes
At the moment, the new LLM Runtimes fall into three categories:

OpenAI LLM-API
Wrap a proprietary LLM API into a microservice that can be accessed by other applications or linked into a graph. Only the OpenAI API is supported as of today, and not all of its functionality is exposed by the server that MLServer provides.
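To show what wrapping an LLM API as a microservice looks like in practice, here is a hypothetical custom MLServer runtime around the OpenAI chat API. The class name, model choice, and request shape are our own illustrative assumptions, not the actual Seldon runtime:

    from mlserver import MLModel
    from mlserver.codecs import StringCodec
    from mlserver.types import InferenceRequest, InferenceResponse
    from openai import AsyncOpenAI


    class OpenAIChatRuntime(MLModel):  # hypothetical runtime, for illustration only
        async def load(self) -> bool:
            self._client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
            return True

        async def predict(self, payload: InferenceRequest) -> InferenceResponse:
            # Decode the incoming prompt from the first input tensor.
            prompts = StringCodec.decode_input(payload.inputs[0])
            completion = await self._client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompts[0]}],
            )
            answer = completion.choices[0].message.content
            return InferenceResponse(
                model_name=self.name,
                outputs=[StringCodec.encode_output("completion", [answer])],
            )

Once registered in a model-settings.json file, a runtime like this is served over MLServer's standard REST and gRPC inference endpoints like any other model.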
HuggingFace DeepSpeed
DeepSpeed inference targets large models that would otherwise not fit in GPU memory. Traditionally, the main bottleneck when deploying hefty language models for inference has been limited memory capacity.
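As a rough sketch of how DeepSpeed inference shards a HuggingFace model across GPUs, consider the following. The model choice and launch command are illustrative, and newer DeepSpeed releases favor a tensor_parallel argument over mp_size:

    # Launch with: deepspeed --num_gpus 2 serve.py
    import os

    import deepspeed
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "facebook/opt-6.7b"  # arbitrary example model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    # Shard the weights across the launcher's world and inject fused inference kernels.
    engine = deepspeed.init_inference(
        model,
        mp_size=int(os.getenv("WORLD_SIZE", "1")),
        dtype=torch.float16,
        replace_with_kernel_inject=True,
    )

    device = torch.device(f"cuda:{int(os.getenv('LOCAL_RANK', '0'))}")
    inputs = tokenizer("DeepSpeed inference lets you", return_tensors="pt").to(device)
    outputs = engine.module.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))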


Microsoft Guidance
Serve one or more templates that can help developers guide the output of one or many LLMs. You can think of this as giving each user a museum tour guide for every visit to an LLM hosted via Seldon’s new extensions.
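For a flavour of what such a template does, here is a minimal sketch with the open-source guidance library. The tiny model and prompt are arbitrary examples; serving templates through Seldon's extensions works differently:

    from guidance import models, gen

    lm = models.Transformers("gpt2")  # tiny model purely for illustration

    # The template interleaves fixed text with constrained generation: the model
    # may only fill the 'answer' slot, stopping at the first newline.
    lm += "Q: What is MLOps?\nA: " + gen("answer", stop="\n", max_tokens=30)
    print(lm["answer"])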
LLM Use Cases

Classification
- Enhance data organization, accelerate decision-making, and improve the content retrieval process

Reasoning
- Extract insights from unstructured data and support informed decision-making

Summarization
- Quickly grasp the essence of content for more efficient information processing

Content Generation
- Produce drafts, copy, and documentation at scale to accelerate content workflows

Translation
- Move content between languages to reach wider audiences and streamline localization

Topic Extraction
- Surface the key themes in large collections of text so they are easier to navigate and analyze
Deploy your LLMs with Confidence
With LLMs, your organization gains access to a wide range of capabilities that transform the way you engage with data and interact with your users. Seldon makes your journey from innovation to deployment seamless.
LLMs open up a new class of enterprise applications by being:
Versatile
Whether it’s text classification, reasoning, or something in between, ensure your LLMs perform at their best.
Adaptable
Whether you're fine-tuning your LLMs or rolling out updates, integrate with your existing workflows.
Scalable
Our scalable infrastructure enables your LLMs to handle high volumes of data and users without compromising performance.
Knowledgeable
Stay in control of your data by monitoring and analyzing your models’ performance.
Get Started Now
Want to see your LLMs come to life? Seldon is here to help you deploy and manage your LLM infrastructure.