Hi everyone, I'm Dan Taylor, a
product architect working on Azure AI Studio. In this video, I'll be showing
how you can build your own copilot in Azure AI Studio with
custom Python code by using the Azure AI SDK and CLI in our
hosted VS Code web development environment. Our goal is to build an API that
grounds your copilot and your enterprise data, and to do that
we'll be creating a chat function that will take a list
of chat messages from the user, generate A vector representation
of the user's question using an Ada embedding model from Azure
Open AI service. We'll then use this vector to
perform a vector search on an Azure AI Search index to
retrieve relevant documents, and then we'll use those documents
to generate a prompt that we send to a GPT 35 Turbo model
hosted in Azure Open AI service. We'll deploy this API to a
managed online endpoint so that it can be consumed by a web
application which is outside of the scope of this video. And then all of this will be
hosted in an Azure AI Studio project so that we have a single
place to organize all the components of our application. In the video, we're gonna be
taking the following steps to start from complete scratch
without skipping anything. To deploy this application,
first we will go to Azure AI Studio to create an Azure AI
resource and project. We'll then open VS Code Web and
clone a sample app that we'll start with. Then we'll move on to using the
Azure AI CLI to create the model deployments in the Azure AI
Search index, and then test the model deployments and search
index to make sure they're working. And finally, generate
environment variables that our code can use to connect to the
Azure AI resources. Then we'll switch over to the
Azure AI SDK, where we'll use the SDK to run and evaluate the
chat function locally to make sure it's working well. We'll then improve the prompt
used in the chat function and evaluate how well those
improvements are improving the quality of our Copilot
responses. We'll then deploy the chat
function to an API, and then we'll just invoke the API and
get a streaming Jason response as an example. All right, we're going to get
started building everything starting from scratch from
ai.azure.com. Let's dive right in. Let's go ahead and head on over
to ai.azure.com. From the homepage, we'll click
the Build your Own Copilot button to create a new project. Here we'll specify a project
name and we'll select to create a
new Azure AI resource to host our project. When creating the resource,
we'll just pick our subscription and the location that we want to
host our resources in, which will be E us. Now we'll go ahead and wait for
this project to be created. This project is connected to an
Azure AI resource which organizes the compute data and
models that we'll need for our solution. A resource can have multiple
projects connected to it, which allows you to share your cloud
resources amongst different members of your team. Now that we've created our
project, we'll go ahead and get started coding by opening the
project in VS Code Web. First, we'll need to create a
compute that VS Code Web runs on, so we'll go ahead and click
Create here. Now that our compute is created,
we'll set up an environment for VS Code to run in on this
project. Note that you can have different
environments and different projects running on the same
compute. The environment is basically a
container that is available for VS Code to use for working
within this project. Now that our environment is
ready, we'll go ahead and launch VS Code Web, which will open up
in a new browser tab. When VS Code opens, we get a
helpful read me telling us about how to use this VS Code Web
environment. In particular, I want to point
out that there is a personal code folder for us to use for
our own personal work for cloning git repos, and a shared
folder that has files that everyone that is connected to
this project can see. So if we have shared assets or
data that we want to share with other people, we can go ahead
and move it to that shared folder. And if we created any prompt
flows in our AI Studio project, they would also show up here in
the shared folder as well. First, we just need to allow VS
Code to trust the code in this repo. We'll press CTRL Shift` to open
the terminal and we'll go ahead and clone the sample repository
that we have in the read me here. So we'll CD into our code folder
and we'll go ahead and clone that sample repo and then we'll
CD into that sample repo. We can see that there is a
helpful README that comes with the sample repository which
contains instructions that we can use to install packages and
create our application. First, we'll follow the
instructions to create a virtual environment for installing
packages, and then we'll install the requirements dot TXT file
which contains the Azure AI SDK, including the generative package
for running evaluation, building indexes and using Prompt Flow. And when the packages are
finished installing, we can go ahead and follow the remaining
steps in the README file. We will run the AI init command
which will create and configure everything we need for our code
to run. The first thing that the CLI
prompts us to do is it helps us log in and authenticate with the
Azure CLI. So we'll see that this supports
2 factor authentication and then we can go ahead. Once this login is completed, we
can go back to VS Code Web and continue the steps from there. I can now pick my Azure
subscription to use, select the project that we just created,
and now we'll be able to create the models from the Azure Open
AI service that we need. So first we're going to select a
chat model. Let's go ahead and use the GPT
35 Turbo 16K model. It's created a default name for
us to use for that deployment. Now we want to select our
embeddings deployment, which will be used to vectorize the
data from the users, and again, we'll just select the default
name. And then for evaluation we'll go
ahead and also use the GPT 35 Turbo 16K model, although GPT 4
is also a good choice for better results. We're going to create a new
Azure AI Search resource here to host our vector index, and we'll
just pick the location E US. We want that in the same place
as our project, and we'll go ahead and use the same resource
group that was generated for our Azure AI Search resource. And again, we'll just pick the
name that the CLI suggested for us here for the Azure AI Search
resource. And now the CLI will wire up all
of our choices and configurations so that they can
be used by our code. First it outputs this config dot
Jason file which is used to point the sample repo at the
project that we created. And now we'll also run the AI
search command to create our search index by ingesting the
markdown files that are in the sample repository. So here in the data folder we
have all these product catalog informations for this company,
Contoso Trek, which is product information for our enterprise
that we want to use to ground our copilot in. And so now that we've gone ahead
and created that vector search index before we start running
custom code, we can use the actual AI chat feature to use
the built in chat with data capabilities of the Azure AI
Studio just to make sure that everything's working properly. And so we can ask a question
like what tent is the most waterproof of the Assistant And
we can see if the built in chat with data capabilities are
working. And we can see that that is now
retrieving information from these markdown files that we
just ingested. Awesome. So now that we can tell our
resources are all set up correctly, now we want to
generate environment variables for use with our code so we can
use the AI dev new command to generate a N file and so this N
file contains all of the configurations that we just set
up with the AI init command and wired that up to our code. If you're a developer, you may
be used to creating these N files manually, which is a lot
of tedious work. And the CLI has automated all of
that for us, which is great. And now that we've configured
our environment for our code to run, we can move on to running
the sample Copilot application. So first, we're just going to
copy this command here, which invokes the run dot PY file. This run dot PY file contains a
reference implementation that shows how to use the Azure AI
SDK to implement the Copilot. So you can see that we asked
that same question that we just ran from the CLI and we can see
that in response we got the answer to the question that
which tent has the highest waterproof rating? And we also can see the
retrieved documents that were returned from our Azure AI
Search index. Let's go ahead and take a peek
at this run dot PY file to see how this works. So we can see at the top of the
run dot PY file we load that end file that was just output by the
Azure AI CLI and if we go to the bottom of the file we can see
here what we do is we just simply take the question that
was passed on the command line and we run this chat completion
function with the question as a single message from the user. Now if we go and look at the
implementation of this chat completion function in the
sample repo, we can see it's actually pretty straightforward. First we take the list of
messages from the user. We get the last message in the
conversation and we pass that to the getdocuments function. The get documents function then
uses the Open AI SDK to 1st embed the user's question into
this vector embedding and then we use the Azure AI Search SDK
to then do a vector search using that vector query and return the
documents back to the user. Once we have those documents we
then generate the prompts. The prompt is generated using a
Jinja template, which we can take a look at here. That Jinja template contains the
instructions to the Azure Opening service model and
contains the list of documents and then we return the response
back to the user. It's that simple. Now you'll notice that this does
pretty much the same thing that the built in chat with data
capability did. However, since it's in custom
Python code, we can do whatever we want here to add calls out to
our own APIs, calls out to retrieve documents from other
data stores. We can customize this however we
want. We can change the prompts. We can change the parameters. We can really make this our own,
which is going to be especially important as we want to take
this Copilot to production. And Speaking of production, one
critical part of getting your Copilot to be production ready
is to evaluate and improve the quality of responses that you
get with the Copilot. So we're going to move on to
evaluating this Copilot next. And for that we're going to use
this evaluation data set which contains a bunch of example
questions and answers that we can use to see how well the
Copilot is performing. So to run this, we can go ahead
and use the evaluation command in the run dot PY file. We can say dash, dash, evaluate. And while that's running, let's
go ahead and look at that implementation of the
evaluation. This imports the evaluate
function from the Azure AI generative SDK package and it
loads that sample data set that we are just looking at. And then it runs the evaluation
call which takes the chat function as the target with the
data set and generates a set of GPT assisted metrics to evaluate
the quality. We can see in the output here
that for each question we get the answer as well as the
metrics in this nice table format. We also can see the evaluation
output as Jason files, so we can look at. Let's just turn on word
wrapping. Here we can look at for each
question, what was the answer that came back from the copilot
and what were the retrieved documents as well as a score. So we can use this sort of the
debug the answers, but we can also use this URL here to open
the evaluation results in Azure AI Studio where we can get a
nice visual of all of the inputs and outputs and we can use this
to debug and improve. So here we can see the
distribution of scores. So here we're calculating a
standard set of GPT assisted metrics that help us understand
how well is the copilot's response grounded in the
information from the retrieved documents. There we've got a score of four,
point O 8. We can see how relevant is the
answer to the user's question. Here we've got an average score
of 4.38 and overall how coherent and natural is the response of
the generated text. And here we get a score of four
point O 8. We can look at the individual
rows and questions that are being asked, so we can see for
each question what is the answer provided, what were the
retrieved documents in this nice visual here and also the
provided truth answer. So as we Scroll down here, we
can see that the brand for the trail Master tent the this score
is low with ones we can see that the answer, the copilot didn't
even attempt to answer the question. So that's maybe one question
that we want to be able to improve the answer on. So let's go back and see if we
can improve the prompt using our Copilot here. We've had a teammate who's
really good at prompt engineering come up with a nice,
safe and responsible and helpful prompt. And we're just going to update
that prompt and rerun evaluation from the command line. We're going to say dash
evaluate. And this time we're going to
give it evaluation name of improved prompt so that we can
easily keep track of this evaluation result when we go
back to the studio. All right, now that that
evaluation has completed, we can go back to the studio here. This time we'll just look at our
list of evaluations. We can see the history of those
evaluations. So we can always go back to
previous results. Here we're going to click the
two evaluations and then we're going to compare. When we compare, we can see that
we can see that the scores with this new prompt have improved
significantly overall. And then we can for each of
these responses see the differences in the two Co
pilots. And as we Scroll down, we can
see that in this case that we are actually able to get a
better, better answer to that question of what the brand is
for the Trail Master tent. Now let's go ahead and deploy
this Copilot to an endpoint so that it can be consumed by an
external application or website. To do that, we'll go ahead and
run the deploy command in the run dot PY and we'll actually
provide a deployment name here, Copilot SDK Deployment. So again, this just uses the
Azure AI generative SDK to deploy the code in this folder
to an endpoint in our Azure AI Studio project. So we can see that it just uses
the Deployment class here. That takes the current folder,
the conda dot yaml, as a set of packages to install, and it
points to the chat function that we just looked at. As well as setting up all of the
environment variables, including grabbing secrets from our
project so that when the code is deployed in a production
environment, it runs just the same as it does when running
locally. So that endpoint has been
created and the source code has been uploaded. Now we'll just wait for our
endpoint and deployment to be ready to use. Now that our deployment has
completed, we can run the invoke command on run dot PY and here
we can specify the deployment name. This will return the response as
a full Jason BLOB. Here we can see the answer as
well as those retrieved documents. We can also specify the stream
command which will return the response in small individual
pieces which can be used by a interactive web browser to show
the answer as it's coming back in individual characters and
those characters are visible in the content parameter of each
row of the Jason response. That was a quick tour of how you
can build your own custom copilot using code with Azure AI
Studio. Be sure to head on over to
ai.azure.com to get started today, and you can also check
out some of the resources on the screen to deep dive into more
information. Thank you for watching.