DeepSeek Reasoning, the Cake Mystery

About

We put DeepSeek’s R1 reasoning model to the test in a live demo showcasing how context windows impact LLM performance in real-world scenarios. Using Seldon Core 2 and the LLM module, we orchestrate controlled experiments with DeepSeek’s 8B and 14B models, demonstrating how GPU allocation, prompt engineering, and pipeline configuration shape the outcome.

Through a logic-based mystery use case, we explore what it takes to get reliable, explainable reasoning from open-source LLMs, and how to deploy them at scale in your Kubernetes environment.

Learnings
  • Mixture-of-Experts in Action: DeepSeek uses a MoE architecture to reduce memory requirements and enable large-scale reasoning without maxing out GPU capacity.

  • Scaling with Context Windows: Compare 4K vs. 12K token contexts to see how reasoning depth and consistency change with prompt length.

  • Deployable Reasoning Pipelines: Orchestrate multi-step LLM pipelines using Core 2, persistent volumes, and GPU-aware configuration via Kubernetes.

  • Real-Time Debugging & Resilience: Gain tips for troubleshooting model loading, token allocation issues, and CRD management during LLM deployment.

  • Structured Outputs with Think Tags: Learn how to structure system prompts for reasoning traceability, by splitting outputs into ‘thinking’ and ‘answer’ segments.

  • Practical Insights for LLM Engineering: From temperature tuning (0.6 recommended) to inference formatting, walk through the real setup of repeatable, testable LLM experiments.

  • When Bigger Isn’t Always Better: Discover why larger models don’t guarantee better reasoning, and how context alignment and prompt structure matter just as much.
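The think-tag splitting mentioned above can be sketched as a small parser. This is a minimal illustration, not Seldon's implementation; it assumes the model emits its reasoning inside `<think>...</think>` tags, as DeepSeek-R1-style models do, followed by the final answer.

```python
import re

# R1-style models emit chain-of-thought inside <think>...</think>,
# followed by the final answer. Splitting the two keeps the reasoning
# traceable while leaving the user-facing answer clean.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw_output: str) -> dict:
    """Split a raw completion into 'thinking' and 'answer' segments."""
    match = THINK_RE.search(raw_output)
    if match is None:
        # Model skipped the tags: treat the whole output as the answer.
        return {"thinking": "", "answer": raw_output.strip()}
    thinking = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return {"thinking": thinking, "answer": answer}
```

In the experiments this kind of split is paired with a sampling temperature of 0.6, the setting recommended in the session for repeatable reasoning runs.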

Alex has been founding and building technology companies since 2003, specializing in mobile, data, and machine learning (ML). He experienced the challenges of scaling model deployment infrastructure whilst building a startup that served billions of personalized news article recommendations. In 2014, Alex founded Seldon with the aim of democratizing ML operations to solve the world's most pressing issues. He served as CEO until 2023, shaping the MLOps industry and delivering measurable, meaningful results for clients worldwide. As Founder and Seldon Board Member, Alex is responsible for the product and technology strategy and execution, focusing on the needs of customers today and in the future.

Complex, real-time use cases are what we do best

Talk with an expert to explore how Seldon can support more streamlined deployments for real-time, complex projects like fraud detection, personalization, and so much more.


Stay Ahead in MLOps with our
Monthly Newsletter!

Join over 25,000 MLOps professionals who read Seldon’s MLOps Monthly Newsletter, your source for industry insights, practical tips, and cutting-edge innovations to keep you informed and inspired. You can opt out anytime with just one click.
