Talk to AutoGen Agents using Whisper and Gradio

Video Statistics and Information

Captions
By using Gradio as the UI for our AutoGen project, we can use the microphone to speak to our agents: "What is the latest price of Microsoft?" The AutoGen agents, which can be two or more, will get our audio prompt and communicate internally, orchestrated by AutoGen, to solve our problem and give us back an answer. In this process the agents may even write code. We can later access the script used by the agent to solve our problem. We can even follow the communication between the agents and the trial-and-error steps used to reach the goal.

This is possible with a new framework called AutoGen from Microsoft. With this framework you can create multi-agent conversation patterns to solve complex problems where prompt engineering comes to its limit. AutoGen also introduced teachable agents, which use vector databases to remember user teachings. We will cover them, as well as functions and group chats, in the upcoming videos. AutoGen brings AI development to the next level and can be used with OpenAI or local open-source LLMs. This tutorial will showcase a basic implementation of AutoGen where we have a user proxy agent and an assistant agent using audio prompts. So let's dive in to implement our first AutoGen solution.

In our project folder, create an empty directory for this project and change to the new directory. From inside the folder, start Visual Studio Code. First things first: we create a requirements.txt file and put in a list of packages needed for this project. Before installing the packages, we want to create a virtual environment by typing python -m venv with autogen as the name of our virtual environment. Then we activate our virtual environment and make sure the name of the virtual environment appears before the prompt, in our case autogen. Next we clear the screen and install the packages listed in requirements.txt. This may take a while. When all of the packages are installed and the prompt comes back, we can clear the terminal and go to the next step, to install FFmpeg.

FFmpeg is a cross-platform solution to record, convert and stream audio and video. We download FFmpeg and extract it. After extracting, we rename the folder, move it to a location like our C drive, and copy the path. Be careful to copy the location including the bin folder, in my case C:\ffmpeg\bin. Next we search for system variables and add the location of FFmpeg to the system variable Path. To be sure, we open the command prompt and check if the path is set up correctly.

The next step is to install Whisper from OpenAI to convert our speech to text. We copy the link, go back to Visual Studio Code and install Whisper. After Whisper is installed, we can proceed to the next step, to set up the UI.

We use Gradio as the UI to be able to use the microphone of our device as input. To start, we simply copy the getting-started code and paste it in a file with the name app.py in Visual Studio Code. We do some reformatting and rename our function to process with the shortcut F2. To test the code we use the terminal. When we start Gradio, it listens on port 7860. When we navigate to localhost port 7860, we see the UI and can enter a text and submit. Sure enough, we get a greeting, which shows that our UI works in the simplest form. Back in the terminal, we terminate the process with Ctrl+C.

The next step is to create a .env file and assign our OpenAI key to OPENAI_API_KEY. Keep in mind that with AutoGen you need to have a billing account with OpenAI, or you will regularly get rate limit errors. AutoGen is a cost-intensive operation, so be careful and set hard limits to avoid billing surprises. Because of the cost and privacy, we will consider using local open-source LLMs, but for now we simply use ChatGPT to keep this tutorial simple.

We change and expand our simple code in app.py. The code should be familiar if you have watched other videos on this channel, but just to recap: we import openai to use Whisper for our speech-to-text functionality. We use os and dotenv to be able to store our API keys in .env and load them in our code. We expand the process function to get the file path where our audio is stored, read the audio binary and transcribe it using Whisper. Finally we return the text part of the transcript back to the Gradio interface. In the interface we change the input to an audio component so that we can utilize the microphone. The audio file will be stored on the disk and the path to it will be passed to our process function. Finally we launch the interface using the terminal, and the app listens on port 7860. When we navigate to port 7860, we notice the input has changed from text to audio and we can use our microphone to record: "What is the latest price of Microsoft?" Whisper does an excellent job of converting our speech to text, and this text will be used as the prompt for our AutoGen project.

Now we come to the second part of the tutorial, to implement a basic AutoGen solution. Similar to .env, we create a JSON file called OAI_CONFIG_LIST. In the simplest form we define a model and our API key, but here you can list multiple models. We will cover them in upcoming tutorials. We can close the file and start integrating the AutoGen functionality in our app.

First we import from autogen our two agents, AssistantAgent and UserProxyAgent, and additionally config_list_from_json. Then we load the configuration from OAI_CONFIG_LIST into config_list, similar to the functionality of dotenv. Up to here nothing is special, only some preparation. The next step is to create our assistant and pass the configuration, and then create the user proxy. Here we define a directory to store the generated code, and we set human_input_mode to NEVER so we do not have to confirm the steps in the middle of the agents' conversation. This takes the control from the human and passes it to the agents for more automation, and informs the AI bots that we are only interested in the final result.

Now comes the main integration. We get the audio prompt and initiate a conversation between our agents with the audio prompt. The agents will communicate back and forth and write code in the coding folder we defined to solve our problem. When it's done, we can get the chat messages, but we are only interested in the messages of our user proxy agent, so we grab only the content where the role was user, join them in one string with some separator lines, and finally return this string back to the UI.

When we start the app and navigate to localhost port 7860, we have the same UI, but this time our audio will be converted to text and used as a prompt to our AutoGen framework: "What is the latest price of Apple?" We can check the communication between the agents in our terminal, and finally the answer comes back to the UI. This part finished successfully, but the disappointment comes when we start another question: "What is the latest price of Microsoft?" This time the agents get too polite and end up in an endless loop of thanking each other. Unfortunately this kindness costs us tokens, and we must at some point cut this act of kindness. Don't worry, the technology is new and there are some configuration options to prevent this behavior. But for now we check the FAQ and see this is a known issue for GPT-3.5 Turbo, with a simple solution for it. We add a termination note to our audio prompt and test our app again: "What is the latest price of Google?" This time we get our answer back without any problems.

As mentioned, this framework is new and under active development. We will focus more on AutoGen, as it gives us a framework to build an army of bots for our business. Consider AutoGen as the next level of your automation and this tutorial as a starting point to use it. Good luck!
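The configuration file mentioned in the captions (a JSON file named OAI_CONFIG_LIST holding a model and an API key) can be pictured with a small sketch. This is not the code shown in the video: the helper names write_config_list and load_config_list are hypothetical, the key is a placeholder, and the field names "model" and "api_key" are assumed to match what AutoGen's config loader expects.

```python
import json
from pathlib import Path


def write_config_list(path, api_key, model="gpt-3.5-turbo"):
    """Write a one-entry config list as JSON (hypothetical helper).

    The file holds a list so that more models can be added later,
    as the captions mention.
    """
    entries = [{"model": model, "api_key": api_key}]
    Path(path).write_text(json.dumps(entries, indent=4), encoding="utf-8")
    return entries


def load_config_list(path):
    """Read the JSON list back, roughly what a config loader would see."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Placeholder key only; never commit a real key to version control.
    write_config_list("OAI_CONFIG_LIST", api_key="sk-your-key-here")
    print(load_config_list("OAI_CONFIG_LIST"))
```

Keeping the key in a separate file like this (and out of app.py) is the same idea as the .env file used for the Whisper part.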
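The AutoGen part of the captions (two agents, a termination note added to the prompt, and joining the messages with role user into one string) could look roughly like the sketch below. It is an assumption-laden illustration, not the video's actual app.py: the function names, the separator, and the exact wording of the termination note are made up here, and the autogen calls reflect the pyautogen releases available around the time of the video, so they may differ in other versions. The autogen import is kept inside ask_agents so the two pure helpers can be tried without installing anything.

```python
# Wording of the note is an assumption; the captions only say a
# "termination note" is appended to the prompt.
TERMINATION_NOTE = (
    "\n\nDo not show appreciation in your responses; say only what is necessary. "
    "When the task is done, reply with the single word TERMINATE."
)
SEPARATOR = "\n" + "-" * 40 + "\n"  # hypothetical separator line


def with_termination_note(prompt):
    """Append the termination note to the transcribed audio prompt."""
    return prompt.strip() + TERMINATION_NOTE


def join_user_messages(messages):
    """Keep non-empty contents whose role is 'user' and join them."""
    contents = [
        m["content"]
        for m in messages
        if m.get("role") == "user" and m.get("content")
    ]
    return SEPARATOR.join(contents)


def ask_agents(prompt):
    """Run one assistant/user-proxy conversation (requires pyautogen)."""
    from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

    config_list = config_list_from_json("OAI_CONFIG_LIST")
    assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
    user_proxy = UserProxyAgent(
        "user_proxy",
        human_input_mode="NEVER",  # no human confirmation between steps
        code_execution_config={"work_dir": "coding"},  # generated code goes here
    )
    user_proxy.initiate_chat(assistant, message=with_termination_note(prompt))
    # Messages exchanged with the assistant, as recorded by the user proxy.
    return join_user_messages(user_proxy.chat_messages[assistant])
```

In the app described in the captions, the text returned by Whisper would be passed to something like ask_agents, and the returned string would go back to the Gradio interface.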
Info
Channel: business24_ai
Views: 3,491
Keywords: autogen tutorial, autogen, autogen tutorial for beginners, Gradio, whisper, speech-to-text
Id: WysBjwJoulo
Length: 11min 58sec (718 seconds)
Published: Mon Oct 30 2023