Transform your LLMs with Seldon

Accelerate AI development, enhance personalization, and streamline automation

At Seldon, we are dedicated to advancing the field of Large Language Models (LLMs) through ongoing research and development. Our talented team of MLOps researchers and software engineers is hard at work improving auto-scaling capabilities to reduce startup times, even when GPU resources are scarce.

Though challenges remain, we are encouraged by the meaningful progress being made to deliver more powerful, scalable LLMs.

A Closer Look at Our Ongoing LLM Research and Development


Check out how Seldon can help your organization deploy LLMs:

Multi-GPU Serving

Maximize the value of your resources by operating multiple GPUs as a single unit

State Management

Ensure that your LLMs can retain conversation context and deliver more relevant and coherent responses

Continuous Batching

Tackle high latency and low GPU utilization by dynamically batching incoming requests (feature coming soon)

Streaming Output

Enable continuous delivery of LLM predictions, ensuring timely and seamless interactions (a rough client-side sketch follows this list)
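As a rough illustration of what streaming looks like from the client side, the sketch below reads a streamed HTTP response line by line and prints tokens as they arrive; the endpoint URL and payload shape are hypothetical placeholders for illustration, not Seldon’s actual API.

```python
import json

import requests

# Hypothetical streaming endpoint; the URL and payload shape are
# illustrative placeholders, not Seldon's actual API.
URL = "http://localhost:8080/v2/models/llm/generate_stream"

payload = {"prompt": "Explain continuous batching in one paragraph."}

# stream=True tells requests not to buffer the whole body, so each token
# can be surfaced to the user as soon as the server emits it.
with requests.post(URL, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        print(chunk.get("token", ""), end="", flush=True)
```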

Deploying LLMs with Seldon

The Seldon AI platform offers exciting opportunities to deploy LLMs and maximize their potential. Seldon allows you to leverage existing frameworks like DeepSpeed while scaling out LLMs on Kubernetes.

By leveraging Seldon’s modular architecture, you can serve, manage, and connect different models. Even though some of the tooling is still in beta, we are rapidly innovating and optimizing our software.

MLServer LLM Runtimes

At the moment, the new LLM Runtimes are split into the following three:

OpenAI LLM-API

Wrap a proprietary LLM API in a microservice that can be accessed by other applications or linked into an inference graph. Only the OpenAI API is supported as of today, and not all of its functionality is exposed by the server MLServer provides.
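Because MLServer speaks the Open Inference (V2) Protocol, a wrapped LLM can be queried like any other MLServer model. The sketch below is a minimal client request; the model name, input tensor name, and port are assumptions for illustration, so check the runtime’s documentation for the exact request schema.

```python
import json

import requests

# MLServer exposes models over the Open Inference (V2) protocol. The
# model name ("openai-llm"), input name ("prompt"), and port here are
# assumptions for illustration; the runtime docs define the real schema.
URL = "http://localhost:8080/v2/models/openai-llm/infer"

request = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["Summarize the plot of Hamlet in two sentences."],
        }
    ]
}

response = requests.post(URL, json=request)
response.raise_for_status()
print(json.dumps(response.json(), indent=2))
```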

HuggingFace DeepSpeed

DeepSpeed inference targets large models that would otherwise not fit in GPU memory. Traditionally, the bottleneck in deploying hefty language models for inference has been limited memory capacity.
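For context, the sketch below shows roughly what DeepSpeed inference looks like when driven directly from Python with a HuggingFace model, outside of Seldon; the model choice is illustrative, and it assumes torch, transformers, and deepspeed are installed with a CUDA GPU available.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model choice is illustrative; any causal LM from the HuggingFace hub
# that fits your hardware would do.
model_name = "EleutherAI/gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# init_inference wraps the model with DeepSpeed's optimized inference
# kernels and can shard it across GPUs when more than one is present.
engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("Large language models are", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```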

Microsoft Guidance

Serve one or more templates that can help developers guide the output of one or many LLMs. You can think of this as giving each user a museum tour guide for every visit to an LLM hosted via Seldon’s new extensions.
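To give a flavour of what such a template looks like, here is a small standalone sketch using the 2023-era guidance library directly; the OpenAI backend, model name, and prompt are illustrative assumptions, and Seldon’s runtime would serve a template like this behind an endpoint rather than run it in-process.

```python
import guidance

# The OpenAI backend and model name are assumptions for illustration
# (an OPENAI_API_KEY environment variable is expected to be set).
guidance.llm = guidance.llms.OpenAI("text-davinci-003")

# The template constrains the model's output: "select" forces the answer
# to be one of the provided options, so downstream code never has to
# parse free-form text.
classify = guidance(
    """Classify the sentiment of this review as positive or negative.
Review: {{review}}
Sentiment: {{select 'sentiment' options=options}}"""
)

result = classify(
    review="The battery lasts all day and charges quickly.",
    options=["positive", "negative"],
)
print(result["sentiment"])
```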

LLM Use Cases

Classification

Reasoning

Summarization

Content Generation

Translation

Topic Extraction

Deploy your LLMs with Confidence

With LLMs, your organization gains access to a wide range of capabilities that transform the way you engage with data and interact with your users. Seldon makes your journey from innovation to deployment seamless.

LLMs open up a new class of enterprise applications by being:

Versatile

Whether it’s text classification, reasoning, or something in between, ensure your LLMs perform at their best.

Adaptable

Whether you're fine-tuning your LLMs or rolling out updates, integrate with your existing workflows.

Scalable

Our scalable infrastructure enables your LLMs to handle high volumes of data and users without compromising performance.

Knowledgeable

Stay in control of your data by monitoring and analyzing your models’ performance.

Get Started Now

Want to see your LLMs come to life? Seldon is here to help you deploy and manage your LLM infrastructure.

SEE HOW SELDON WORKS FOR YOU

Serve, monitor, explain, and manage your models today.
