How to Use Open Source LLMs in AutoGen Powered by vLLM

Captions
Hello and welcome to Yeyu Lab. Today I'm excited to share something useful for the AutoGen developers out there: integrating open-source language models into AutoGen, which means better privacy and far lower cost. Previously I took you step by step through setting up AutoGen with a sleek web UI, and we had a blast exploring multi-agent apps that can handle various tasks. I've heard the buzz in the tech community, and based on that I put together a new series of tutorials focused on extending AutoGen configurations. There are several similar methodologies for serving open-source models, such as Ollama and FastChat. In this tutorial I will introduce a very simple approach: adding only one function to your AutoGen project, without touching any existing code, that converts your GPT-based agents to open-source models from Hugging Face. Let's dive in.

If you are new here, or maybe you need a quick refresher on the AutoGen framework, let's get back to basics. AutoGen is an incredible tool from Microsoft that lets AI agents have smart conversations to tackle complex tasks collectively. It's a very useful framework because of its customizable conversation patterns and easy integration with human input, and it works with other prompting techniques and tools, such as the OpenAI Assistants API, to make those agents more functional. For the assistant agents and their configurations, the official demo notebooks use only OpenAI models. Fortunately, at the lowest technical level AutoGen's text generation relies on the OpenAI APIs, which suggests that with a flexible interface designed to mimic those APIs we could run text generation using open-source models locally.

Building on this premise, vLLM steps into the spotlight: it offers a compatible interface that seamlessly redirects OpenAI API calls to a server running open-source models. vLLM is an accessible and efficient library designed to provide fast and affordable large language model inference. It serves models over a port and manages memory efficiently, especially for attention operations, thanks to its PagedAttention feature. vLLM integrates easily with Hugging Face models and supports high-capacity serving with several decoding methods, such as parallel sampling and beam search. Its user experience is enriched by its ability to stream outputs and to provide an OpenAI-compatible API server, which is what we will use in this demo. You will find quite a large number of supported models on its list, and although the mainstream model architectures are already included, the project still accepts user submissions for adding more.

All right, let's get into the code. This demonstration focuses on local deployment of large language models, which requires a GPU for computing power, so for a straightforward demonstration I will use Google Colab. First, I will quickly show a simple AutoGen app with two agents that solve a math problem using GPT-4, as in normal usage, so you can see how easy it is to implement in the AutoGen framework. Make sure you have installed the pyautogen and openai packages. Now we define an LLM configuration: llm_config is the component that equips an agent with the relevant LLM capabilities. Here we use only one GPT-4 model in the OpenAI config list; you can also define the list as an environment variable. At this moment we only need to add the gpt-4-1106-preview model.

The next step is to construct the agents. Let's construct two standard agents: an AssistantAgent, which runs GPT-4 at the back end, and a MathUserProxyAgent, which acts as the user proxy that provides the math question and executes code in its environment. Now we can start the conversation and let them generate an answer to a math question: find all x that satisfy a given inequality. As you can see in the output, the assistant agent generates runnable Python code that solves the inequality, and the math proxy agent executes it directly in its environment and prints the answer. That's it; a sketch of this baseline app follows below.
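Here is a minimal sketch of that baseline app, assuming the pyautogen 0.2-era API (class names and the initiate_chat signature may differ in other versions). The inequality string is a stand-in, since the exact problem from the video is not reproduced here:

```python
import os
import autogen
from autogen.agentchat.contrib.math_user_proxy_agent import MathUserProxyAgent

# One GPT-4 entry in the config list; as mentioned in the video,
# this list can also be loaded from an environment variable.
config_list = [
    {
        "model": "gpt-4-1106-preview",
        "api_key": os.environ["OPENAI_API_KEY"],
    }
]
llm_config = {"config_list": config_list}

# Assistant agent backed by GPT-4.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)

# Math-specialized user proxy that poses the problem and runs generated code.
mathproxyagent = MathUserProxyAgent(
    name="mathproxyagent",
    human_input_mode="NEVER",
    code_execution_config={"use_docker": False},
)

# Kick off the conversation; the problem text is a placeholder,
# not the exact inequality used in the video.
mathproxyagent.initiate_chat(
    assistant,
    problem="Find all x that satisfy the inequality (x - 1)(x + 3) < 0.",
)
```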
That is the GPT-4 version. Now let's see how easily this app converts from GPT-4 to an open-source language model. Here you should make sure you have at least 16 GB of GPU memory in your runtime environment; you can also subscribe to Colab Pro to get a V100 or A100 GPU. Since we are going to use the vLLM library, don't forget to install it as well.

In addition to the original LLM configuration with the GPT-4 model, we need to add an open-source model. To fit inference into 16 GB of VRAM, we choose the newly released Phi-2 model from Microsoft, a small language model with only 2.7 billion parameters, perfectly suitable for this AutoGen test. We can copy the Hugging Face path microsoft/phi-2 directly into the configuration and set the API key to empty. Then we create the configuration object and filter it down to the phi-2 model, as in the first sketch below.

Now comes the key step of this project: we implicitly run the vLLM command from Python as a subprocess to serve local model inference with the configured parameters. The main logic in this create_llm_server function is to call the Python module vllm.entrypoints.openai.api_server, providing the local address, an available port, and the additional parameters needed to run the Hugging Face model properly. When the server starts running, we copy its address and port into the base_url field to replace the original OpenAI API endpoint. So simple, right? See the second sketch below.

The next step is to construct the agents. The difference is that the model used in this project, phi-2, has only a 2K context window, instead of the 4K minimum assumed in the OpenAI definitions. We therefore change the normal AssistantAgent to a CompressibleAgent, which derives from the same LLM-based ConversableAgent class but adds a content-compression feature to prevent token-overflow errors at the OpenAI API layer. The compress_config setting in the CompressibleAgent is flexible and can be adjusted to your model's restrictions; please check the official AutoGen documentation for more details on the compression parameters. The MathUserProxyAgent that executes the Python code remains the same as in the OpenAI version. Now we can start the conversation and let the agents generate an answer; the third sketch below shows this step.
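First sketch: the extended configuration with the local phi-2 entry. This is a hedged reconstruction; the port in base_url (8000) and the "EMPTY" api_key placeholder are my assumptions, not values confirmed by the video:

```python
import os
import autogen

# Extend the existing config list with a local open-source entry.
# The model name is the Hugging Face path; the api_key is unused by the
# local server but must be present, so it is left effectively empty.
config_list = [
    {
        "model": "gpt-4-1106-preview",
        "api_key": os.environ["OPENAI_API_KEY"],
    },
    {
        "model": "microsoft/phi-2",
        "api_key": "EMPTY",
        # Point AutoGen at the local vLLM server instead of api.openai.com.
        "base_url": "http://localhost:8000/v1",
    },
]

# Keep only the phi-2 entry for this run.
local_config_list = autogen.filter_config(
    config_list, {"model": ["microsoft/phi-2"]}
)
llm_config = {"config_list": local_config_list}
```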
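Second sketch: a rough reconstruction of the create_llm_server helper. Only the function's purpose and the vllm.entrypoints.openai.api_server module come from the video; the specific flags and the readiness poll are assumptions (for instance, phi-2 required --trust-remote-code in some vLLM versions):

```python
import socket
import subprocess
import sys
import time

def create_llm_server(model: str, host: str = "localhost",
                      port: int = 8000, timeout: int = 600):
    """Launch vLLM's OpenAI-compatible API server as a background subprocess."""
    proc = subprocess.Popen(
        [
            sys.executable, "-m", "vllm.entrypoints.openai.api_server",
            "--model", model,
            "--host", host,
            "--port", str(port),
            "--trust-remote-code",  # may be needed for phi-2's custom code
        ]
    )
    # Poll until the port accepts connections; model loading can take minutes.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return proc
        except OSError:
            time.sleep(5)
    proc.terminate()
    raise RuntimeError(f"vLLM server did not come up on {host}:{port}")

server = create_llm_server("microsoft/phi-2")
# The matching base_url for the config list is "http://localhost:8000/v1".
```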
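Third sketch: the CompressibleAgent construction, reusing llm_config and mathproxyagent from the earlier sketches. The compress_config values shown are illustrative assumptions for the pyautogen 0.2-era contrib API; consult the AutoGen documentation for the exact parameters your version supports:

```python
from autogen.agentchat.contrib.compressible_agent import CompressibleAgent

# phi-2 has only a 2K context window, so compress the running conversation
# before it overflows. The trigger_count below is an assumed value; tune it
# to your model's limit.
assistant = CompressibleAgent(
    name="assistant",
    llm_config=llm_config,  # the phi-2 config from the first sketch
    compress_config={
        "mode": "COMPRESS",
        "trigger_count": 1000,  # compress once ~1000 tokens accumulate
        "leave_last_n": 2,      # keep the most recent messages verbatim
    },
)

# The math user proxy is unchanged from the GPT-4 version.
mathproxyagent.initiate_chat(
    assistant,
    problem="Find all x that satisfy the inequality (x - 1)(x + 3) < 0.",
)
```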
Please be aware that this model is not proficient in programming and is much less capable of following instructions than commercial large language models, so I'm not assessing the quality of its text generation; my goal is just to check whether the inference feature runs properly in AutoGen. From the output, the assistant powered by phi-2 can follow the instructions to generate Python code in Markdown format, although the answer is obviously not correct. One thing worth highlighting is that this inference consumed almost 16 GB of my V100 GPU in the Colab Pro account, for your reference. It also looks like some additional output parsing to handle this model's template still needs to be done. Once a well-trained, well-designed open model is deployed the same way with enough computational resources, the performance can certainly be much better.

Okay, that's it. If you don't have enough GPU resources or a Colab Pro account, don't worry: in my next video I will show you how to use open-source models without deploying them locally. That's all for today. For the tutorial and the source code, you can find the link in the description below. Don't forget to like, subscribe, and hit the notification bell. Keep innovating, and I'll catch you in the next one.
Info
Channel: Yeyu Lab
Views: 4,415
Keywords: AutoGen, Chatbot, OpenAI, GPT-4, Python, open source, LLM, vLLM
Id: ds032PYcpgs
Length: 10min 48sec (648 seconds)
Published: Tue Dec 26 2023