All right. So in this video, I want to talk about the five problems that I keep seeing again and again when people try to get their agents good enough to put into production. I get a lot of questions about frameworks in relation to this, and while I'm trying to stay reasonably framework agnostic here, some of these things certainly apply a lot more to some frameworks than to others. So one of the things that came up
recently was someone asking me about putting CrewAI into production, and my comment was that I currently would not put CrewAI into production, because there are so many issues with it that I wouldn't trust it. Putting something like LangGraph into production is certainly more reliable, but I think you run into some of these problems with all of the agent frameworks if you're not aware of them and not thinking about how to fix them as you go. So let's dive into this. By far the number one problem
for agents out there at the moment is reliability. Talking to a lot of startups and a lot of companies that want to build agents, the thing I see consistently is that companies are very reluctant to use agents for anything really complicated, simply because the reliability of the agents is so low. Your typical company wants five nines of reliability; they'd probably even settle for two nines, meaning 99%, but most agents are at best hitting around 60 to 70 percent on the tasks they're given. Now, there are some places where maybe
that's okay, but for the majority of things, to get something into production you have to make it reliable. It has to consistently produce an output that the end user can benefit from, a result that matches what they expect and that they can actually use. There's no point in creating agents that only work some of the time and fail a large percentage of the time. The issue that creates is that humans then have to check every single thing the agent does. Now that's fine if you're starting
out and you're trying to make training data or something like that, and you've got a human in the loop anyway. But what we eventually want from agents is for them to be fully autonomous, operating by themselves and producing a consistent level of results, without a human having to be in the loop. So this brings us to some of the
things that actually go wrong. The second thing that I see happening a lot is agents going into excessively long loops, and this can happen for a variety of reasons. It's quite common to see this in CrewAI and some of the other frameworks, where you'll have it set up and the agent basically doesn't like the output of a tool; quite often the cause is a failing tool, or a tool that just isn't working in some way. The other way it happens is when the LLM gets a response from one sub-agent, passes it to the next part, and then decides that, no, it needs to do that part again, and it gets into this loop of going through it again and again and again. Now this is one of the frustrations
I've felt a lot with CrewAI and some of the others. With LangGraph, what I actually do is hard code it so we know how many steps it's taking. CrewAI has now added something similar as well, where you can limit the number of steps it goes through and the repeats and retries it does. But this is a very common pattern you see with LLM agents: they get into these kinds of loops. And a lot of what you have to think about
when you're architecting an agent is how to handle these loops. Ideally you want to reduce them to none, but if they do happen, you want to make sure your overall agent or system is aware that they're happening and puts a stop to them pretty quickly. Otherwise, you end up with an agent that just keeps going, making LLM call after LLM call, and if it's fully autonomous and you're not watching it, that can get very expensive very quickly if you're using an expensive model.
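To make that concrete, here's a minimal sketch of the hard-coded step limit idea in plain Python. The step function and state dictionary are hypothetical stand-ins for whatever advances your agent by one step, not from any particular framework; LangGraph and CrewAI each have their own settings for capping iterations, so check the current docs for those.

```python
from typing import Any, Callable, Dict

MAX_STEPS = 10  # hard cap so a stuck agent can't keep making LLM calls forever

def run_with_step_limit(step_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
                        task: str,
                        max_steps: int = MAX_STEPS) -> str:
    """Drive an agent one step at a time and bail out if it never finishes."""
    state: Dict[str, Any] = {"task": task, "done": False, "output": ""}
    for _ in range(max_steps):
        state = step_fn(state)  # one LLM call / tool call cycle
        if state.get("done"):
            return state["output"]
    # Stop instead of looping endlessly, and surface it so you can debug the run.
    raise RuntimeError(f"Agent exceeded {max_steps} steps without finishing")
```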
The third problem that can
go wrong is around tools. Now, tools are something I've been meaning to make a lot more videos about. In the previous section, I talked about failing tools, and this is something that happens a lot and that I feel people are often not aware of. While the tools in things like LangChain are pretty nice for starting out, you're going to find that you want to customize them a lot for your specific use case. You need to understand that a lot of those tools were made over a year ago. They were very simple at the time, and they weren't really made for agents; they were often made more for use in RAG than for agentic work. What you really find is that what
you want to do is make your own set of custom tools. Now, I will follow up with a video talking a bit more about custom tools, but I will say that tools are really your agent's secret sauce. You want a really good set of tools that can filter inputs, use inputs in the right way, and generate outputs that are going to be useful to the actual LLM. So really, the whole tools question is about how you get data, how you manipulate data, and how you prepare it for an LLM. And then, when a tool fails, how does it tell the LLM that it has failed in a way that's actually helpful, rather than sending the agent into an endless loop?
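As a hedged illustration of a tool failing gracefully (my own sketch, not a convention from any specific framework), the idea is to return a short, actionable message to the LLM instead of raising an exception the model never sees:

```python
import requests

def fetch_json(url: str) -> str:
    """Fetch data from an API and report failures as text the LLM can act on."""
    try:
        resp = requests.get(url, timeout=15)
        resp.raise_for_status()
        return resp.text
    except requests.Timeout:
        return "TOOL ERROR: the request timed out; try again later or use another source."
    except requests.HTTPError as err:
        return f"TOOL ERROR: the server returned status {err.response.status_code}; do not retry this exact URL."
    except requests.RequestException as err:
        return f"TOOL ERROR: request failed ({err}); consider a different approach."
```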
So you can see that for often really simple things, I will make quite complex tools. One example is a webpage diffing tool that checks the output of a web page so an agent can tell when that page has been updated. For example, one simple use of the tool was checking whether OpenAI's webpage had been updated; the agent could then assess what new links were there, go to those new links, and find out what had been announced, returning news and other kinds of information. The same kind of thing worked nicely on sites like CNN and other news sites.
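Here's a minimal sketch of what a webpage diffing tool along these lines could look like. It's my own illustration rather than the exact tool from the video, and it assumes the requests and beautifulsoup4 packages are available.

```python
import difflib

import requests
from bs4 import BeautifulSoup

# Last-seen text per URL; a real tool would persist this between runs.
_last_seen: dict[str, str] = {}

def fetch_page_text(url: str) -> str:
    """Fetch a page and reduce it to visible text for comparison."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)

def diff_page(url: str) -> str:
    """Tell the agent, in plain text, what has changed since the last check."""
    new_text = fetch_page_text(url)
    old_text = _last_seen.get(url, "")
    _last_seen[url] = new_text
    if not old_text:
        return "First check for this page; nothing to compare against yet."
    if new_text == old_text:
        return "No changes detected."
    added = [line[2:] for line in difflib.ndiff(old_text.split(". "), new_text.split(". "))
             if line.startswith("+ ")]
    return "New or changed content:\n" + "\n".join(added[:20])
```

An agent can call diff_page on something like a news or announcements page on a schedule, and only bother following new links when something has actually changed.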
The idea here, though, is that this is a very custom tool for a very specific use case, and that's how you want to think about
most of the things that you're doing. When I look at some of the best agents that companies are building, they've generally got very specific tools that can handle different kinds of input, work out what they need to do to generate data, and provide that back to the agent in a way that's useful, so the agent knows what's going on. One of the classic examples is that a lot of the simple search tools, while they'll return information about what's on a page, don't actually provide the URL. So you want to go through and customize some of those things so that you're actually getting the URL back, storing those URLs, and then caching any response for each URL. That way, if you're scraping a URL, you're caching it so your agent can use that cache again and again, without repeating itself by calling these things over and over.
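As a hedged sketch of that caching idea (my own illustration, not a specific LangChain tool), you can wrap your scraper so repeat requests for the same URL come from a local cache:

```python
import requests

# In-memory cache keyed by URL; a production agent might persist this to disk
# or a database so the cache survives between runs.
_page_cache: dict[str, str] = {}

def scrape_url(url: str) -> str:
    """Fetch a URL once, then serve any repeat requests from the cache."""
    if url in _page_cache:
        return _page_cache[url]
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    _page_cache[url] = resp.text
    return resp.text
```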
This is a whole class of what I'd call intelligent tools that you want to build. All right, this brings us to the fourth
problem that I see a lot, which is the whole idea of self-checking. You need your agent to have some way of checking its outputs and seeing whether it is generating outputs that are useful or not. The classic example of this is code. If you've got an agent that's generating code, you want to make sure that at some point that code gets checked, and that might be as simple as running a unit test on it to see that all the imports work, that the functions actually run, and that they return what you expect. You want to set up some tests for things like that, so you can actually check the output of the code the agent is generating.
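A minimal first-pass check, assuming the generated code arrives as a string (this is my own sketch, not a feature of any particular framework), is just to see whether the code compiles and runs at all before you invest in proper tests:

```python
from typing import Tuple

def quick_code_check(generated_code: str) -> Tuple[bool, str]:
    """Cheap sanity check on agent-generated code: does it compile and run at all?"""
    try:
        compiled = compile(generated_code, "<agent_output>", "exec")
    except SyntaxError as err:
        return False, f"Syntax error: {err}"
    namespace: dict = {}
    try:
        # Executing the top-level code catches broken imports and obvious runtime
        # errors. Only do this in a sandboxed environment you control.
        exec(compiled, namespace)
    except Exception as err:
        return False, f"Execution failed: {err}"
    return True, "Code compiled and ran without top-level errors"
```

For anything heading to production you'd follow this with real unit tests, for example by running the generated module's test suite in a subprocess; compiling alone is a very weak signal.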
Now, in lots of other use cases you're not going to be generating code, so you need to think about how, in those situations, your agent will know whether something is right or wrong. How can it check that an output is going to be useful, rather than totally off base from what the end user wants? That can be things like checking URLs: LLMs love to hallucinate URLs, so check whether those URLs actually exist.
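A hedged sketch of that kind of URL check, just to catch obviously dead or invented links (the function name and fallback behaviour here are my own choices):

```python
import requests

def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL responds with a non-error status code."""
    try:
        # HEAD is cheap, but some servers reject it, so fall back to GET.
        resp = requests.head(url, timeout=timeout, allow_redirects=True)
        if resp.status_code in (403, 405):
            resp = requests.get(url, timeout=timeout, allow_redirects=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False
```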
think about as you're going through, but this idea of self checking
is a really sort of key thing. The last thing, I think that you
need to think about a lot and that I see as a big problem with LLM
agents is the lack of explainability. So you really want to think about
when the user actually gets a result back at the end from an agent. Can the agent sort of
point to some explanation? Now this could be citations
is a great way of doing this. citations showing exactly where the
information that used to basically make a decision or to do something, was, That gives people a lot more confidence
in the output of the agent when they can see why the agent said
something, or why the agent gave a certain result, that kind of thing. It can also be things like, being
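A minimal sketch of what that can look like in practice (the names here are purely illustrative): have the agent hand back its answer bundled with the sources it actually used, rather than bare text.

```python
from dataclasses import dataclass, field

@dataclass
class AgentResult:
    """An agent answer bundled with the evidence behind it."""
    answer: str
    citations: list[str] = field(default_factory=list)  # URLs or document IDs actually used

    def as_text(self) -> str:
        if not self.citations:
            return self.answer
        refs = "\n".join(f"[{i + 1}] {src}" for i, src in enumerate(self.citations))
        return f"{self.answer}\n\nSources:\n{refs}"
```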
It can also be things like being able to look at a set of log files, or a set of outputs that the agent produced along the way. So this brings us to the sixth, bonus thing that you need to think about, which is debugging an agent. You need to have some kind of outputs
or some kind of logs that are reasonably intelligent, not just a raw record of every call to the LLMs and the agents. That's one way of doing it, but it can be a very tedious way to go through things. You need to be able to assess at which point the agent starts to fall apart. Remember, if you're using an LLM agent, you should be using it to make decisions, and perhaps to generate tokens out as text or code or something like that, but mostly what you're using the reasoning part of an LLM agent for is making decisions. You want to make sure those decisions get logged independently, in a way that makes it easy for you to see: ah, okay, this looks a bit suspicious, what's going on here? Can we debug this? We can look at the reasoning points in the agent as we go along.
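As a hedged sketch of that kind of independent decision log (the names and structure here are my own, not a particular framework's tracing API), each decision point gets written as a structured entry you can scan later:

```python
import json
import logging
from datetime import datetime, timezone

# Decision points go to their own log file, separate from the raw LLM call logs.
decision_logger = logging.getLogger("agent.decisions")
decision_logger.setLevel(logging.INFO)
decision_logger.addHandler(logging.FileHandler("agent_decisions.log"))

def log_decision(step: str, reasoning: str, choice: str) -> None:
    """Record one decision point so suspicious runs are easy to debug later."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "step": step,            # e.g. "choose_tool" or "accept_or_retry"
        "reasoning": reasoning,  # the model's stated rationale
        "choice": choice,        # what the agent actually did
    }
    decision_logger.info(json.dumps(entry))
```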
So these things, I think, are what
you need to be thinking about constantly when you're doing anything with LLM
agents and autonomous agents. Far too often, I see people doing things where you don't even need an LLM; you can just sequence the steps directly, with no need for any decision point in there at all. When you're building your agent, make sure it has as few decision points as possible while still getting the outcome you want to achieve. So go back and assess some of your
own agents, look at them, and think: okay, where are the decision points in here, and how am I checking that each of them is being handled properly, so that you actually get reliability out of these things? I'll be making a bunch more videos looking at building things with LangGraph, and even with things like CrewAI. Even though I don't think CrewAI is ideal for production, I think it's great for trying ideas out really quickly. I'll show you some of the things
that I've been doing with that to build some of these crews really quickly, try out ideas, and get a sense of what is probably going to work and what is not. And then I'll look more at how to convert them across to much more low-level code, with things like LangGraph, or just coding some of these things in plain Python. Often you don't need a framework to do some of this, and that's something I want to go into more in the future. Anyway, hopefully this video was
useful to get you thinking about the key things that go wrong in
getting LLM agents into production, and how you can start thinking about mitigating some of the problems you come across. As always, if you've got
comments or questions, please put them in the comments below. If you found the video useful,
please click like and subscribe. And I will talk to you in the next video. Bye for now.