How to Use Open Source LLMs in AutoGen Powered by vLLM

Captions
Hello and welcome to Yeyu Lab. Today I'm excited to share something useful for the AutoGen developers out there: integrating open-source language models into AutoGen, which means better privacy and far lower cost. Previously I took you step by step through setting up AutoGen with a sleek web UI, and we had a blast exploring multi-agent apps that can handle various tasks. I've heard the buzz in the tech community, and based on that I put together a new series of tutorials focused on extending AutoGen configurations. There are several similar methodologies for serving open-source models, such as Ollama and FastChat. In this tutorial I will introduce a very simple approach: adding only one function to your AutoGen project, without touching any existing code, that converts your GPT-based agents to open-source models from Hugging Face. Let's dive in.

If you are new here, or maybe you need a quick refresher on the AutoGen framework, let's get back to basics. AutoGen is an incredible tool from Microsoft that lets AI agents have smart conversations to tackle complex tasks collectively. It's a very useful framework because of its customizable conversation patterns and easy integration with human input, and it works with other prompting techniques and tools, such as the OpenAI Assistants API, to make those agents more functional. For the assistant agents and their configurations, the official demo notebooks use only OpenAI models. Fortunately, at the lowest technical level AutoGen's text generation relies on the OpenAI APIs, which suggests that with a flexible interface designed to mimic those APIs we could run text generation using open-source models locally.

Building on this premise, vLLM steps into the spotlight: it offers a compatible interface that seamlessly redirects OpenAI API calls to a server running open-source models. vLLM is an accessible and efficient library designed to provide fast and affordable large language model inference. It serves models over a port and manages memory efficiently, especially for attention operations, thanks to its PagedAttention feature. vLLM integrates easily with Hugging Face models and supports high-capacity serving with several decoding methods, such as parallel sampling and beam search. Its user experience is enriched by its ability to stream outputs and to provide an OpenAI-compatible API server, which is what we will use in this demo. You will find quite a large number of supported models on its list, and although the mainstream model architectures are already included, the project still accepts user submissions for adding more.

All right, let's get into the code. This demonstration focuses on local deployment of large language models, which requires a GPU for computing power, so for a straightforward demonstration I will use Google Colab. First, I will quickly show a simple AutoGen app with two agents that solve a math problem using GPT-4, as in normal usage, so you can see how easy it is to implement in the AutoGen framework. Make sure you have installed the pyautogen and openai packages. Now we define an LLM configuration: llm_config is the component that equips an agent with the relevant LLM capabilities. Here we use only one GPT-4 model in the OpenAI config list; you can also define the list as an environment variable. At this moment we only need to add the gpt-4-1106-preview model.

The next step is to construct the agents. Let's construct two standard agents: an AssistantAgent, which runs GPT-4 at the back end, and a MathUserProxyAgent, which acts as the user proxy that provides the math question and executes code in its environment. Now we can start the conversation and let them generate an answer to a math question: find all x that satisfy a given inequality. As you can see in the output, the assistant agent generates runnable Python code that solves the inequality, and the math proxy agent executes it directly in its environment and prints the answer. That's it; a sketch of this baseline app follows below.
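Here is a minimal sketch of that baseline app, assuming the pyautogen 0.2-era API (class names and the initiate_chat signature may differ in other versions). The inequality string is a stand-in, since the exact problem from the video is not reproduced here:

```python
import os
import autogen
from autogen.agentchat.contrib.math_user_proxy_agent import MathUserProxyAgent

# One GPT-4 entry in the config list; as mentioned in the video,
# this list can also be loaded from an environment variable.
config_list = [
    {
        "model": "gpt-4-1106-preview",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]
llm_config = {"config_list": config_list}

# Assistant agent backed by GPT-4.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)

# Math-specialized user proxy that poses the problem and runs generated code.
mathproxyagent = MathUserProxyAgent(
    name="mathproxyagent",
    human_input_mode="NEVER",
    code_execution_config={"use_docker": False},
)

# Kick off the conversation; the problem text is a placeholder,
# not the exact inequality used in the video.
mathproxyagent.initiate_chat(
    assistant,
    problem="Find all x that satisfy the inequality (x - 1)(x + 3) < 0.",
)
```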
That is the GPT-4 version. Now let's see how easily this app converts from GPT-4 to an open-source language model. Here you should make sure you have at least 16 GB of GPU memory in your runtime environment; you can also subscribe to Colab Pro to get a V100 or A100 GPU. Since we are going to use the vLLM library, don't forget to install it as well.

In addition to the original LLM configuration with the GPT-4 model, we need to add an open-source model. To fit inference into 16 GB of VRAM, we choose the newly released Phi-2 model from Microsoft, a small language model with only 2.7 billion parameters, perfectly suitable for this AutoGen test. We can copy the Hugging Face path microsoft/phi-2 directly into the configuration and set the API key to empty. Then we create the configuration object and filter it down to the phi-2 model, as in the first sketch below.

Now comes the key step of this project: we implicitly run the vLLM command from Python as a subprocess to serve local model inference with the configured parameters. The main logic in this create_llm_server function is to call the Python module vllm.entrypoints.openai.api_server, providing the local address, an available port, and the additional parameters needed to run the Hugging Face model properly. When the server starts running, we copy its address and port into the base_url field to replace the original OpenAI API endpoint. So simple, right? See the second sketch below.

The next step is to construct the agents. The difference is that the model used in this project, phi-2, has only a 2K context window, instead of the 4K minimum assumed in the OpenAI definitions. We therefore change the normal AssistantAgent to a CompressibleAgent, which derives from the same LLM-based ConversableAgent class but adds a content-compression feature to prevent token-overflow errors at the OpenAI API layer. The compress_config setting in the CompressibleAgent is flexible and can be adjusted to your model's restrictions; please check the official AutoGen documentation for more details on the compression parameters. The MathUserProxyAgent that executes the Python code remains the same as in the OpenAI version. Now we can start the conversation and let the agents generate an answer; the third sketch below shows this step.
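First sketch: the extended configuration with the local phi-2 entry. This is a hedged reconstruction; the port in base_url (8000) and the "EMPTY" api_key placeholder are my assumptions, not values confirmed by the video:

```python
import os
import autogen

# Extend the existing config list with a local open-source entry.
# The model name is the Hugging Face path; the api_key is unused by the
# local server but must be present, so it is left effectively empty.
config_list = [
    {
        "model": "gpt-4-1106-preview",
        "api_key": os.environ["OPENAI_API_KEY"],
    },
    {
        "model": "microsoft/phi-2",
        "api_key": "EMPTY",
        # Point AutoGen at the local vLLM server instead of api.openai.com.
        "base_url": "http://localhost:8000/v1",
    },
]

# Keep only the phi-2 entry for this run.
local_config_list = autogen.filter_config(
    config_list, {"model": ["microsoft/phi-2"]}
)
llm_config = {"config_list": local_config_list}
```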
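Second sketch: a rough reconstruction of the create_llm_server helper. Only the function's purpose and the vllm.entrypoints.openai.api_server module come from the video; the specific flags and the readiness poll are assumptions (for instance, phi-2 required --trust-remote-code in some vLLM versions):

```python
import socket
import subprocess
import sys
import time

def create_llm_server(model: str, host: str = "localhost",
                      port: int = 8000, timeout: int = 600):
    """Launch vLLM's OpenAI-compatible API server as a background subprocess."""
    proc = subprocess.Popen(
        [
            sys.executable, "-m", "vllm.entrypoints.openai.api_server",
            "--model", model,
            "--host", host,
            "--port", str(port),
            "--trust-remote-code",  # may be needed for phi-2's custom code
        ]
    )
    # Poll until the port accepts connections; model loading can take minutes.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return proc
        except OSError:
            time.sleep(5)
    proc.terminate()
    raise RuntimeError(f"vLLM server did not come up on {host}:{port}")

server = create_llm_server("microsoft/phi-2")
# The matching base_url for the config list is "http://localhost:8000/v1".
```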
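Third sketch: the CompressibleAgent construction, reusing llm_config and mathproxyagent from the earlier sketches. The compress_config values shown are illustrative assumptions for the pyautogen 0.2-era contrib API; consult the AutoGen documentation for the exact parameters your version supports:

```python
from autogen.agentchat.contrib.compressible_agent import CompressibleAgent

# phi-2 has only a 2K context window, so compress the running conversation
# before it overflows. The trigger_count below is an assumed value; tune it
# to your model's limit.
assistant = CompressibleAgent(
    name="assistant",
    llm_config=llm_config,  # the phi-2 config from the first sketch
    compress_config={
        "mode": "COMPRESS",
        "trigger_count": 1000,  # compress once ~1000 tokens accumulate
        "leave_last_n": 2,      # keep the most recent messages verbatim
    },
)

# The math user proxy is unchanged from the GPT-4 version.
mathproxyagent.initiate_chat(
    assistant,
    problem="Find all x that satisfy the inequality (x - 1)(x + 3) < 0.",
)
```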
Please be aware that this model is not proficient in programming and is much less capable of following instructions than commercial large language models, so I'm not assessing the quality of its text generation; my goal is just to check whether the inference feature runs properly in AutoGen. From the output, the assistant powered by phi-2 can follow the instructions to generate Python code in Markdown format, although the answer is obviously not correct. One thing worth highlighting is that this inference consumed almost 16 GB of my V100 GPU in the Colab Pro account, for your reference. It also looks like some additional output parsing to handle this model's template still needs to be done. Once a well-trained, well-designed open model is deployed the same way with enough computational resources, the performance can certainly be much better.

Okay, that's it. If you don't have enough GPU resources or a Colab Pro account, don't worry: in my next video I will show you how to use open-source models without deploying them locally. That's all for today. For the tutorial and the source code, you can find the link in the description below. Don't forget to like, subscribe, and hit the notification bell. Keep innovating, and I'll catch you in the next one.
Info
Channel: Yeyu Lab
Views: 4,415
Keywords: AutoGen, Chatbot, OpenAI, GPT-4, Python, open source, LLM, vLLM
Id: ds032PYcpgs
Length: 10min 48sec (648 seconds)
Published: Tue Dec 26 2023