Create Your Own AI: Transformer Agents Tutorial

Video Statistics and Information

Captions
Hugging Face recently released something called Transformers Agents, and you might be wondering what exactly they are. Transformers Agents take ordinary large language models and combine them with other tools that are available on Hugging Face. Since Hugging Face is a really large platform containing many different machine learning models related to vision, audio, and much more, combining those tools with large language models extends their capabilities and lets them carry out much more complex tasks which they might not be able to do by themselves. In today's video we are going to look at how exactly these Transformers Agents work, compare them with other similar agents out there, and do a short demo of exactly how you can make use of them. So let's get started.

This is a basic diagram of how these Transformers Agents work. Imagine you give the instruction "read out loud the content of the image I'm passing." An ordinary large language model is not going to be able to visualize or see what the image contains, so it would not be able to answer this question. With Transformers Agents, however, we make use of tools that exist on Hugging Face, in this case an image generator, an image captioner, and text-to-speech. A prompt is created along the lines of: "I will ask you to perform a task. Your job is to come up with a series of simple commands in Python that will perform the task. You can print intermediate results if it makes sense to do so. You can make use of the following tools," and so on. The task is to read out loud the content of the image, and when we pass it to the agent, the agent is able to reason: since I need to read out loud the content of the image, I need to use the image captioner model to caption the image, and then the text-to-speech model to read that caption out loud, because I am converting text into audio. It then passes this plan to a Python interpreter as Python code: create a caption with the image captioner, then pass that caption to the text-to-speech tool to get audio. What you end up getting when you pass the image in is a description of the image, which is then played out loud as audio. That's a basic overview of how Transformers Agents work.

Now let's take a look at some of the tools they have access to. These are the tools Hugging Face has integrated with Transformers Agents. The first is document question answering: you can give it a PDF, and the large language model is able to read the PDF, understand it, and answer questions you might have about what's inside. There is also text question answering; unconditional image captioning, which makes use of the BLIP model; image question answering (given an image, answer a question about it), which makes use of the ViLT model; and image segmentation (given an image and a prompt, output the segmentation mask for that prompt), which makes use of the CLIPSeg model. It also has speech-to-text, which makes use of the Whisper model created by OpenAI, and there are a couple more as well. On top of the tools that already exist on Hugging Face, Hugging Face has also created some custom tools and integrated them into these agents, for example a text downloader, text-to-image, image transformation, and text-to-video.

LangChain is something very similar to how these Transformers Agents work. LangChain also makes use of large language models and integrates them with a lot of different tools. However, I would have to say that LangChain has many more tools that can be integrated with its large language models, for example the Google API, Databricks, Cohere, and many more, and there is just a lot more functionality in what LangChain offers. That said, Hugging Face Transformers Agents is still an experimental API, so I think we can definitely expect to see a lot more of these similar integrations come to Transformers Agents as well. LangChain also supports chaining of large language models, which means you can use multiple large language models in a sequence to carry out much more complex tasks. I'm not sure whether the Transformers Agents created by Hugging Face have similar functionality, since they have not mentioned anything like that in their documentation.

Next up, we are going to test out these Transformers Agents inside a Google Colab. You can find this Colab on the Transformers Agents page on Hugging Face, and the link is right over here. So let's get started. The first cell essentially sets up our Hugging Face token and installs Transformers, so you can go ahead and run that. Once you run this code you should see a message asking for your Hugging Face token, which you can copy from your profile on Hugging Face, and then you can click Login. Once your Hugging Face token has been validated, we can move on to the next step. In the next step we get to choose the base large language model we want to use. Currently there are three options: first is StarCoder, second is OpenAssistant, and third is OpenAI's GPT-3. I'm going to be making use of StarCoder, but feel free to use any other model you want. If you're making use of the OpenAI model, you will get a prompt to include your API key, so make sure to do that as well.

Once you have chosen your model, you can hit run, and you should get a response like this: StarCoder is initialized. Then we can do a test run of the model, so let's go ahead and run this. With this command we are asking the model to generate an image of a boat in the water. Once we hit run, we get an explanation from the agent of how it's actually planning to do this: it says "I will be using the following tool: image generator to generate an image." Once that is done, you get a generated image containing a boat in the water, just like what we prompted. Next up, let's actually do something more on top of this and ask it to caption the image. The cool thing about this line of code, which lets the agent caption the image, is that we get to pass variables: we create a boat_image variable and we get to refer to it in the prompt we give the agent. The caption we get back for this image is "a boat floating in the ocean with a small boat in the background."

So far we have been using agent.run to run all of the prompts we give this Transformers Agent, but next up we are going to use agent.chat. This method essentially allows us to chat with the agent, and it also keeps all the previous prompts we give it in memory, so we can access them again, which is really cool. Now I run the command agent.chat("Show me an image of a capybara"), and it makes use of the image generator to generate this image of a capybara. This is pretty cool: it also tells you which type of tool it's using, and it can generate pretty accurately what we're looking for. In the next command we ask it to transform the image so it looks like it has snow, and when we do that, this is the type of image we get in return, which shows snow or fog on top of the capybara. Next, we ask: "show me a mask of the snowy capybara." Here we are asking it to create a segmentation, and it automatically understands that: it says "I'll be making use of the tool image segmenter" in order to create a segmentation mask, and this is what we get in return.

This is really, really cool, and there are so many possibilities here because of the types of tools integrated with these Transformers Agents. It's definitely worth trying out for yourself and playing around with different examples as well. Let us know your thoughts on Transformers Agents in the comment section below. Thank you guys for watching, and subscribe for more AI content!
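To make the prompt described in the video more concrete, here is a minimal sketch of how an agent prompt could be assembled from a task and the descriptions of the available tools. The wording loosely follows the prompt quoted above; the actual template Hugging Face uses is more elaborate, and the tool names and descriptions here are stand-ins, so treat this purely as an illustration of the idea.

```python
# Sketch: build the kind of prompt the agent sends to the LLM by
# combining fixed instructions, tool descriptions, and the user's task.
# The tool names/descriptions below are illustrative, not the real ones.

TOOLS = {
    "image_captioner": "generates a caption describing an image",
    "text_to_speech": "converts a piece of text into spoken audio",
}

def build_prompt(task: str, tools: dict = TOOLS) -> str:
    lines = [
        "I will ask you to perform a task. Your job is to come up with a",
        "series of simple commands in Python that will perform the task.",
        "You can print intermediate results if it makes sense to do so.",
        "You can make use of the following tools:",
    ]
    # One line per available tool, so the LLM knows what it can call.
    lines += [f"- {name}: {description}" for name, description in tools.items()]
    lines.append(f"Task: {task}")
    return "\n".join(lines)
```

The LLM then only has to emit Python that calls the listed tool names, which is why adding a new tool is as simple as adding another description line.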
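The two-step plan the agent generates for "read out loud the content of the image" can be sketched as ordinary Python. The two tool functions below are stand-ins (the real tools are the BLIP captioner and a text-to-speech model running on Hugging Face), so only the control flow matches what the video describes.

```python
# Toy version of the Python plan the agent emits: caption first,
# then speak the caption. Both tool functions are fakes for illustration.

def image_captioner(image_path: str) -> str:
    # Stand-in for the BLIP image-captioning tool.
    return f"a caption for {image_path}"

def text_to_speech(text: str) -> bytes:
    # Stand-in for the text-to-speech tool; returns fake audio bytes.
    return f"<audio of: {text}>".encode()

def read_image_aloud(image_path: str) -> tuple:
    # The agent's generated plan: chain the two tools in sequence.
    caption = image_captioner(image_path)
    audio = text_to_speech(caption)
    return caption, audio
```

The point is that the LLM never "sees" the image itself; it only writes the glue code that routes the image through the captioner and the resulting text through the speech model.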
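The agent.run versus agent.chat distinction from the demo can be illustrated with a hand-rolled stand-in: run() starts from a clean slate on every call, while chat() keeps earlier prompts in memory so later requests (like "transform the image so it has snow") can build on them. This is not the real transformers agent class, just a sketch of the behavior.

```python
# Toy agent showing why chat() can refer back to earlier turns
# while run() cannot. The real agents execute tool code; this one
# just echoes the prompts it has seen, for illustration only.

class ToyAgent:
    def __init__(self):
        self.history = []  # conversation memory used only by chat()

    def run(self, prompt: str) -> str:
        # Stateless: each run() call sees only its own prompt.
        return f"result of: {prompt}"

    def chat(self, prompt: str) -> str:
        # Stateful: every previous turn stays available to later ones.
        self.history.append(prompt)
        return f"result of: {' | '.join(self.history)}"
```

This is why the capybara example in the video works with chat(): the follow-up prompts never mention "capybara" explicitly, yet the agent still knows which image is being discussed.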
Info
Channel: AssemblyAI
Views: 11,062
Id: Q7KhrSbEnSQ
Length: 9min 48sec (588 seconds)
Published: Tue Jun 20 2023