Large language models are performing increasingly impressive tasks, but practitioners are still actively researching how best to build applications around them. Today we’ll be exploring agentic LLMs and the current LangChain paradigm. “Agentic AI” might be the next natural macro AI trend after “generative AI”.
Read on to discover where the current LangChain paradigm for defining LLM apps falls short, alternatives for the future of defining LLM apps, and the implications for how such apps may be deployed.
What is an agentic LLM?
An agentic LLM is an advanced language model that has the ability to use external tools in order to solve tasks.
This includes, for example, ReACT agents, which can choose how and when to use tools from a wide selection, as well as simple Retrieval-QA pipelines that interact with a vector database in a predefined manner to gather relevant documents before generating an answer to a question.
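The second kind of pipeline can be sketched in a few lines of Python. The toy bag-of-words `embed` function and in-memory corpus below are illustrative stand-ins for a real embedding model and vector database:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query, as a vector database would.
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Stuff the retrieved documents into the prompt before asking the question.
    context = "\n".join(f"- {d}" for d in retrieve(query, corpus))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}\nAnswer:"

corpus = [
    "LangChain is a framework for composing LLM applications.",
    "Vector databases store document embeddings for similarity search.",
    "The Eiffel Tower is in Paris.",
]
prompt = build_prompt("What do vector databases store?", corpus)
```

The key point is that the flow is entirely predefined: retrieve, then generate, with no decisions left to the model.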
How do LLMs currently work?
Currently, LLMs receive a prompt that serves as the initial input for a model. This prompt sets the scene, guiding the LLM’s behavior and setting the high-level goals.
What is LangChain?
LangChain is a user-friendly, open-source software development framework that focuses on composability. LangChain’s main goal is to simplify the creation of applications that utilize large language models (LLMs) as agents.
The prompt informs the LLM of both the resources and tools at its disposal, essentially creating boundaries within which it should operate and produce.
The prompt then provides an illustrative example of the flow the LLM is expected to adhere to. Finally, the prompt kick-starts a new flow, inviting the LLM to continue from where the example leaves off.
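Put together, such a prompt might look like the following sketch (the tool names, format, and worked example are illustrative, not LangChain's exact defaults):

```python
PROMPT = """You are an assistant that can use tools to answer questions.

Available tools:
- search: look up facts on the web. Input: a search query.
- calculator: evaluate arithmetic expressions. Input: an expression.

Use the following format:
Question: the input question
Thought: reasoning about what to do next
Action: the tool to use
Action Input: the input to the tool
Observation: the tool's result
... (Thought/Action/Action Input/Observation can repeat)
Final Answer: the answer to the question

Example:
Question: What is 17 * 24?
Thought: This is arithmetic, so I should use the calculator.
Action: calculator
Action Input: 17 * 24
Observation: 408
Final Answer: 408

Question: {question}
Thought:"""

# Kick-start a new flow: the LLM continues from where the example leaves off.
prompt = PROMPT.format(question="What year was the Eiffel Tower built?")
```

Note the three parts described above: the scene-setting and tool listing, the illustrative example flow, and the trailing "Thought:" that invites the model to continue.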
Why is LangChain popular?
LangChain is an important tool for developers for several reasons. First, it abstracts away much of the complexity involved in defining applications that use LLMs. With LangChain, developers can leverage predefined patterns that make it easy to connect LLMs to their applications.
A key attribute of LangChain is its ability to guide LLMs through a “Chain of thought”. The ability of the agent isn’t limited to that of a single LLM – LangChain offers users a wide range of components that can work together as building blocks for advanced use cases.
LangChain’s framework helps you manage your prompt by providing several functionalities to make constructing and handling prompts simpler. The prompt templates can include instructions, content, queries and more.
The framework itself doesn’t provide LLMs, but it interfaces nicely with external Language Models, Chat Models, and Text Embedding Models. LangChain also defines patterns for interfacing with long-term memory, external data and other LLMs to solve tasks a single model could not handle alone.
What are the Limitations of the LangChain paradigm?
There are many shortfalls of this current approach. First off, there are many ways in which the agent can fail:
- There’s no guarantee the underlying model will follow the specified format. If it deviates, a final answer will not be reached.
- Following the specified format but generating tool names that don’t exist or tool inputs that the tool can’t process.
- Following the specified format and generating valid tool names and inputs, but never terminating because it’s never confident it knows the final answer.
Another limitation of LangChain is that it can be fiddly and painful to define and modify.
While it may be simple to pick up one of LangChain’s default agents, even straightforward changes to the prompt or flow require the developer to write new parser functions, stopping criteria, and so on. Defining custom agents from scratch in LangChain is not straightforward.
What are some of the speed and security issues LangChain faces?
Apart from this, LangChain can be very slow. Every time the LLM is asked to stop and then asked to continue, it has to reprocess the whole prompt from scratch. This becomes particularly problematic as the number and complexity of tools increases.
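Back-of-the-envelope arithmetic shows why this hurts: if every step restarts from scratch, the total number of tokens processed grows quadratically with the number of steps, whereas a cache that keeps earlier tokens warm processes each token only once. A sketch with illustrative sizes:

```python
def total_tokens_processed(base_prompt: int, step_tokens: int, steps: int) -> int:
    """Tokens the model must (re)process if every step restarts from scratch."""
    total = 0
    for i in range(steps):
        # At step i the prompt has grown by i * step_tokens tokens,
        # and all of it is reprocessed from the beginning.
        total += base_prompt + i * step_tokens
    return total

# A 1500-token prompt (tool descriptions grow this quickly) and 10 agent steps:
without_cache = total_tokens_processed(base_prompt=1500, step_tokens=100, steps=10)
with_cache = 1500 + 9 * 100  # each token processed once: the final prompt length
```

Here `without_cache` comes to 19,500 tokens versus 2,400 with caching, and the gap widens as tool descriptions lengthen the base prompt.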
Prompt injection is a significant vulnerability for LLM applications. If a user sends a malicious request like “delete all files” or “reveal sensitive content”, an unguarded agent may comply without question, causing data loss or leaking valuable IP. Straightforward safeguards can mitigate obvious attacks, but as Simon Willison notes, even 99.9% in application security is a failing grade.
Developers might also encounter subtler, unexpected bugs. For example, one less-known issue arises from tokenisation at the continuation boundary, as described by Scott Lundberg here.
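The problem, roughly: tokenizers greedily merge characters, so stopping generation mid-string and continuing from that boundary can yield a token sequence the model never saw in training. A toy longest-match tokenizer illustrates this (the vocabulary is made up):

```python
VOCAB = ["http", "://", "ht", "tp", ":", "/", "h", "t", "p"]

def tokenize(text: str) -> list[str]:
    # Greedy longest-match tokenization, roughly how BPE-style tokenizers behave.
    tokens, i = [], 0
    while i < len(text):
        match = max((v for v in VOCAB if text.startswith(v, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens

whole = tokenize("http://")                  # the merged tokens the model expects
split = tokenize("http:") + tokenize("//")   # tokens at a continuation boundary
```

Here `whole` is `['http', '://']` but `split` is `['http', ':', '/', '/']`: if a prompt is cut at `"http:"` and the model is asked to continue, it sees a fragmented encoding of the URL prefix that differs from anything in its training data, which can degrade the continuation.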
Many of the issues described above only become more problematic as the complexity of the agent increases. And many apply not only to open-ended agents such as ReACT, but also to simple retrieval-QA pipelines.
What are the LangChain alternatives for the future of defining LLM Apps?
If we want to use LLMs to orchestrate components within larger systems that include non-LLM components, we shouldn’t just rely on luck to ensure that the text they generate can be understood and used by other components; we want to guarantee it.
To achieve this, we want to be able to constrain the output of the LLMs at the token level, ensuring that at each step the model has no choice but to generate valid outputs.
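At its core, token-level constraining is just masking: before sampling each token, set the logits of every token that would violate the constraint to minus infinity, so the model literally cannot emit them. A minimal sketch with a toy vocabulary (real libraries apply this inside the model's sampling loop):

```python
import math

VOCAB = ["search", "calculator", "delete_files", "foo"]

def constrained_argmax(logits: list[float], allowed: set[str]) -> str:
    # Mask disallowed tokens; only valid continuations survive.
    masked = [
        logit if token in allowed else -math.inf
        for token, logit in zip(VOCAB, logits)
    ]
    return VOCAB[masked.index(max(masked))]

# Even if the raw logits favour an invalid tool name, the constrained
# decoder can only pick from the valid set.
logits = [0.1, 0.3, 2.5, -1.0]  # the model "wants" delete_files
choice = constrained_argmax(logits, allowed={"search", "calculator"})
```

This is why access to the model's logits matters: the constraint is enforced before sampling, not by parsing and retrying afterwards.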
Libraries are emerging that not only facilitate this, but simultaneously address many of the problems we’ve discussed.
However, the vast majority of the benefits these libraries provide are currently only available when hosting the LLM locally.
→ So think GPT2 or Llama, rather than text-davinci-003 or GPT4.
→ That said, these libraries could still be used as a framework for leveraging API-based LLMs within custom agents in an easier and more modifiable manner.
The two main libraries as of May 2023 are LMQL, a declarative, SQL-like programming language for language model interaction, and Guidance, which defines a templating syntax for controlling large language models.
These two libraries take a similar approach to defining agentic LLMs, and have relative strengths and weaknesses when it comes to flexibility and accelerating inference.
How do LMQL and Guidance address the painpoints of LangChain?
Both LMQL and Guidance allow agents to be defined from scratch and are therefore fully customizable. They can also both enforce constraints on locally hosted models. LMQL particularly excels in both of these areas, and provides workarounds for API-hosted models.
However, for constrained decoding of locally hosted models, LMQL doesn’t yet perform the inference accelerations that can make a huge difference in practice; Guidance already does. Perhaps LMQL will in the future.
Constrained decoding helps limit the attack surface for prompt injection by ensuring tool inputs adhere to certain constraints. While it’s not a complete solution in itself, it contributes to improving security.
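For example, constraining a calculator tool’s input to a strict arithmetic character set means an injected instruction like “delete all files” simply cannot appear as that tool’s input. Libraries like Guidance and LMQL enforce such patterns at generation time; the same check can be sketched post-hoc (the regex is illustrative):

```python
import re

# Only digits, whitespace, and basic arithmetic characters are permitted
# as calculator input; any natural-language instruction is rejected.
CALC_INPUT = re.compile(r"[\d\s+\-*/().]+")

def is_valid_calc_input(text: str) -> bool:
    return bool(CALC_INPUT.fullmatch(text))
```

This shrinks the attack surface for that one tool, though as noted it is a mitigation rather than a complete defence.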
For locally hosted models, Guidance has resolved or partially resolved all the issues we’re concerned about with LangChain. There is a strong incentive to use Guidance (or future similar libraries) rather than LangChain or writing all the logic on an application-by-application basis.
What are the Implications on Deployment?
If constrained text generation becomes the standard way of leveraging LLMs, for the compelling reasons outlined above, it could lead to a shift from API-based LLMs to open, locally hosted LLMs.
This is because API-based model providers don’t currently expose the internals required to constrain and control generations. It is possible they never will, as exposing these internals would make their models more straightforward to duplicate.
An additional complication with constrained generation is batching requests from multiple users, or even batching multiple requests from the same user. This is challenging because the boundaries between LLM calls, custom logic, and interactions with other components become blurred within the scripted prompting and hole-filling paradigm.
The future of LLM Apps and the LangChain Paradigm
In conclusion, while LangChain can be useful for rapidly prototyping and exploring what is possible with agentic LLMs, it also has several limitations, including reliability, speed, security and customisability issues. Alternative approaches leveraging libraries such as LMQL and Guidance can overcome these limitations, with implications for the future of LLM app deployment.
Being open to embracing innovative solutions is critical for fully unleashing the power of LLMs.
Arnaud is the VP Machine Learning Research at Seldon. Before joining Seldon in 2018, he worked for 5 years in asset management as a quantitative researcher developing systematic trading strategies. Arnaud has published at top conferences and journals including AISTATS, ICML and JMLR, on topics such as machine learning explainability and drift detection. He has a diverse academic background including Master’s degrees in the fields of Chemical Engineering, Economics and Quantitative Finance.