(soft music) - Today we take a look
at how you can bring OpenAI's ChatGPT model in Azure to your own enterprise-grade
app experiences, so that you can interact with your organization's
private internal data, while respecting the information protection controls that you have in place. And along the way, we'll deconstruct how it all works with a breakdown of ChatGPT prompts. And joining me again on the
show to go hands-on with ChatGPT is Microsoft Distinguished
Engineer, Pablo Castro. Welcome! - Thanks Jeremy, it's great to be back! - And it's great to have
you back on the show. Now before we go hands-on
with all the tech, it's worth mentioning that since our last show on the topic, the Azure OpenAI service is now generally available. So, this is the service that
gives you programmatic access to OpenAI large language models
to use with your own apps. The GPT model from OpenAI
in the Azure service now adds support for chat interactions. In fact, if you missed
our last show with Pablo, it's worth checking out
at aka.ms/OpenAIMechanics to learn more about the
fundamentals of building prompts that guide the output of the OpenAI models as you build app experiences. Now, ChatGPT is of course one of the fastest adopted
technologies in recent years. And at Microsoft, we're in
fact integrating OpenAI models with related experiences
across the Bing search service, GitHub Copilot for AI-generated code, and as recently highlighted, the Microsoft 365 portfolio
of apps with Copilot, just to name a few. So Pablo, what potential do you see then in terms of applying ChatGPT to enterprise-grade applications
on the Azure service? - Well, it's pretty exciting. We can now build applications that combine the ChatGPT
model with your own data. This can transform not only
the way we interact with apps but also our ability to effectively
use vast amounts of data to answer questions, generate content, or any number of new and emerging tasks. ChatGPT is a new model optimized for conversational-style interaction, though it can be used for other tasks as well. It uses a particular convention and syntax to denote turns between the user and ChatGPT as "assistant" in a conversation. And while it's trained on public data, we can construct prompts that include both instructions and additional data to generate responses.
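To make that turn convention concrete, here's a minimal sketch of what a chat-style prompt can look like in the ChatML-style markup used by the model's completions format; the system instructions and question are just illustrative:

```python
# A minimal ChatML-style prompt sketch: system instructions first, then
# alternating user/assistant turns, each delimited by special tokens.
prompt = (
    "<|im_start|>system\n"
    "You are an HR assistant. Answer only from the sources provided.\n"
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "Does my plan cover annual eye exams?\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```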
So, imagine taking ChatGPT and applying it to your own data but with precise control over the knowledge base for in-context and relevant responses. We can do that using an approach often called "Retrieval Augmented Generation". In this case, we combine
the Azure OpenAI service with Azure Cognitive Search to index and retrieve data of all kinds, knowledge that is private and external to the ChatGPT large language model. The retrieval step in Azure Cognitive Search finds the most relevant pieces of information, even if it's millions of documents or data points, and presents the top-ranked results to the language model, and this lets you have detailed, informed interactions with your data. And because the knowledge lives outside of the ChatGPT model, you're in control of it, and it's not used to train the model. And equally important from an enterprise perspective, any chat session state lives entirely within your application. Whether you keep it, and where, is fully up to you.
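As a rough sketch of that retrieval-augmented flow in Python, assuming the azure-search-documents and openai (0.x) packages, with placeholder endpoints, keys, deployment names, and index field names:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
import openai

# Placeholder service configuration for illustration only.
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="benefits-index",
    credential=AzureKeyCredential("<search-key>"),
)
openai.api_type = "azure"
openai.api_base = "https://<your-openai-resource>.openai.azure.com"
openai.api_version = "2023-03-15-preview"
openai.api_key = "<openai-key>"

question = "Does my plan cover annual eye exams?"

# 1. Retrieve: find the top-ranked fragments for the question.
results = search_client.search(search_text=question, top=3)
sources = "\n".join(f"{doc['sourcefile']}: {doc['content']}" for doc in results)

# 2. Generate: hand those fragments to the model as grounding.
response = openai.ChatCompletion.create(
    engine="<chat-deployment>",  # e.g., a gpt-35-turbo deployment
    messages=[
        {"role": "system",
         "content": "Answer ONLY from the sources below and cite them.\n"
                    "Sources:\n" + sources},
        {"role": "user", "content": question},
    ],
)
print(response["choices"][0]["message"]["content"])
```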
- Just to clarify, by "private knowledge", we mean only data that exists within your organization or your application's boundary. - That's right, or equally you could be a SaaS vendor that wants to enlighten your application to provide an in-context conversation or content generation experience for your customers by using the data you manage for them. So, you're just using the large language model's understanding and reasoning capabilities and building your own app experience around it. - That makes a lot of sense. Now that we have all the context, can you show us an example of how this would work in an app? - Sure, we'll walk through
a typical app experience that you can build around ChatGPT. I have here a sample Human
Resources web application. By the way, we've published
the code for this whole app, including the UX, the
backend, and the sample data in a public GitHub repo
at aka.ms/entGPTsearch for you to try it out or use as a starting
point for your own apps once you've watched the show. So, this app lets employees
generally chat about topics related to their employment
benefits and employee handbook. In this case, I want to ask
about healthcare coverage, which can be unique by plan,
location, individual, etc. So, I'll type "Does my plan
cover annual eye exams?" In the generated response, you will see that not only does ChatGPT use the knowledge necessary
to derive a response but as a best practice
also cites its sources. That's because a key area
we are actively exploring is how do we make responses trustworthy? These models are not perfect, so we see this as a collaboration between the user and the app where the app reads through
millions of data points and picks a few to formulate an answer. In this case it showed the
source it used for the facts, enabling the user to validate
the response generated from the "Benefit Options
PDF" source if needed. As I mentioned earlier, we achieve this by coordinating
the Azure OpenAI model, Cognitive Search, and how
we pre-process the data. Now, I'll type a follow-up question. "How about hearing?" While the question in
isolation wouldn't make sense, in the context of the chat history, it can figure out what
information it needs to answer. For each response, I can see the preview of
the file from the citation. And it also shows all the supporting content used to formulate the response. And so, in a way, what we're doing here is we're conditioning ChatGPT to produce source citations in its response, which then helps you to validate them.
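That conditioning is just prompt text. A hypothetical example of the kind of system instruction that encourages grounded, cited answers (the wording here is invented for illustration):

```python
# Illustrative system instructions: answer only from the provided sources
# and follow each fact with a bracketed citation of its source name.
system_prompt = """You are an assistant that helps employees with questions
about their benefits. Answer ONLY using the facts in the sources below.
Each source has a name followed by a colon and its content. Always include
the source name for each fact you use, in square brackets, for example
[Benefit_Options.pdf]. If you cannot answer from the sources, say you
don't know.

Sources:
{sources}"""
```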
So, can you explain how this was able to figure out the response? - Sure, we wrote this app so it would expose the details of what happens in each turn of the conversation. Let's go back to our follow-up question, "How about hearing?" which, as I mentioned, on its own doesn't present enough context
to be answered in isolation. When I click on this light bulb, I can see the process it went through to provide a response. Here we can see that GPT first takes the chat history and the last question to produce a good search query. You can also see the rest of the prompt, including some of the markup used to tell ChatGPT where turns are. By the way, there's also a new API that has more structure around this, so you don't have to construct the prompts with markup manually. Finally, you can see the "sources" part, which is where we inject the fragments of documents we recalled from the search index.
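For that query-generation step, a sketch along these lines could work, using the structured chat API Pablo mentions so no manual markup is needed (the deployment name and instruction wording are assumptions):

```python
import openai  # assumes the Azure OpenAI configuration shown earlier

# Turn the chat history plus the new question into a standalone search query.
history = [
    {"role": "user", "content": "Does my plan cover annual eye exams?"},
    {"role": "assistant",
     "content": "Yes, routine eye exams are covered... [Benefit_Options.pdf]"},
]
followup = "How about hearing?"

query_response = openai.ChatCompletion.create(
    engine="<chat-deployment>",
    messages=[
        {"role": "system",
         "content": "Generate a search query for the knowledge base from the "
                    "conversation and the new question. Return only the query."},
        *history,
        {"role": "user", "content": followup},
    ],
    temperature=0.0,
)
search_query = query_response["choices"][0]["message"]["content"]
# e.g., something like "employee health plan hearing coverage"
```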
And here's the final prompt we send to ChatGPT to generate a complete response. By the way, here we used the "Retrieve-Then-Read" approach for generating responses. This is an easy-to-understand approach that can be effective in simple cases. That said, we've explored other approaches we include in the app that you can experiment with. For example, "Read-Retrieve-Read" would present the model with a question and a list of tools it could select from, such as "search the knowledge base" or "look up employee data". So here, it would decide to
search the knowledge base to find out more about the healthcare plan, it would see that there are different plans, and so to determine which plan applies, it would then look up the employee information using another tool, to finally arrive at an answer. Another approach is what we call "Read-Decompose-Ask", which would follow this "chain-of-thought" style of prompting. This would break down the question into individual steps of the thought process and answer intermediate sub-questions to accumulate partial responses until it arrived at a complete answer that it would be ready to send back to the user.
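As a rough illustration of the "Read-Retrieve-Read" idea, a simple tool-selection loop might look like this sketch, where the tool names, prompt wording, and stubbed helpers are all invented for illustration:

```python
import openai  # assumes the Azure OpenAI configuration shown earlier

def ask_model(prompt: str) -> str:
    # Minimal wrapper around the chat completion call.
    r = openai.ChatCompletion.create(
        engine="<chat-deployment>",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return r["choices"][0]["message"]["content"].strip()

# Hypothetical tools the model can choose between at each step.
def search_knowledge_base(query: str) -> str:
    return "..."  # would query the Cognitive Search index

def lookup_employee_data(name: str) -> str:
    return "..."  # would fetch the employee's plan enrollment

TOOLS = {"SearchKnowledgeBase": search_knowledge_base,
         "LookupEmployeeData": lookup_employee_data}

def read_retrieve_read(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model to pick a tool or give the final answer.
        reply = ask_model(
            "You can use these tools: SearchKnowledgeBase, LookupEmployeeData.\n"
            "Reply with 'Tool: <name>: <input>' or 'Answer: <final answer>'.\n"
            + scratchpad
        )
        if reply.startswith("Answer:"):
            return reply[len("Answer:"):].strip()
        _, name, tool_input = reply.split(":", 2)
        scratchpad += (f"{reply}\n"
                       f"Observation: {TOOLS[name.strip()](tool_input.strip())}\n")
    return "I don't know."
```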
- What I really like about this app sample is that it really helps you deconstruct the fundamentals for building these types of app experiences. To that point though, with the logic included in this template, presumably you can tweak the prompts more, right? - You can. Beyond the user experience, the template lets you easily experiment and configure exactly how
responses are generated and tailor what a user would expect to see. As a developer, you can include additional instructions using a prompt override. For example, this can influence the tone or style of a response. So, I can change the style of the response. Just for fun, I'll make it answer like a pirate.
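An override like that can be as simple as extra instruction text appended to the base system prompt; a hypothetical sketch:

```python
# The override is just extra instruction text appended to the base prompt.
base_system_prompt = ("You are an HR assistant. Answer only from the "
                      "provided sources and cite them.")
prompt_override = "Answer in the style of a pirate."
# An override could equally shape format rather than tone, for example:
# prompt_override = "Keep answers under three sentences for small screens."

system_prompt = base_system_prompt + "\n" + prompt_override
```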
I'll copy this question and start a new session using this override and paste the question again. And once the new response is generated, you can see it responds like a pirate might, I guess. In all seriousness, this is very powerful because you can easily
adapt the style of response. For example, you might want to format the responses to be more concise for a mobile device or add structure. One of my favorite uses of this is to have it generate a response that can use formatting to better organize the information if requested. I'll paste some text to inject a few more instructions into the prompt to do that.
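The injected text for this demo could be a formatting instruction along these lines (the wording is an assumption):

```python
# Extra instructions injected into the prompt so the model may reformat
# its answers when the user asks for a summary or comparison.
formatting_instructions = (
    "If the user asks you to summarize or compare information, you may "
    "format the answer as a markdown table, with one column per plan."
)
system_prompt = base_system_prompt + "\n" + formatting_instructions
```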
Let me start another chat session and this time, I'll use this suggested question to compare two healthcare plans. You'll see it gives me this long answer at first. And now I can ask it to summarize the response in a table, and it gives me this nice table comparing the two plans. These are just a few examples. You could do any number of things, even switching responses to a person's native spoken language on the fly; it's up to you and the application
experience you want to create. - So, these stylistic and
near real-time format changes are a big part of GPT and can also help make
information a lot more accessible. Switching gears though a bit, I know a lot of people
are probably wondering how information protection
in this case works. So, can we talk about
how you might make sure that the information that's surfaced is indeed what that
user is allowed to see? - So, since the data of the knowledge base is in an Azure Cognitive Search index, there are a variety of building blocks for security and filtering you can use for access control, like implementing document-level granular access control.
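One common building block for this (an assumption here, not necessarily how this sample implements it) is security trimming: store the IDs of the groups allowed to see each document in a filterable field on the index, and filter every query by the current user's groups:

```python
# Security trimming sketch: assumes the index has a filterable collection
# field (here called "group_ids") listing which groups may see each
# document, and reuses the search_client from earlier.
user_groups = ["all-employees", "shuffle-project-team"]  # e.g., from the user's token
group_list = ",".join(user_groups)

results = search_client.search(
    search_text="office move",
    # Only match documents whose group_ids overlap the user's groups.
    filter=f"group_ids/any(g: search.in(g, '{group_list}', ','))",
    top=3,
)
```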
So, let's demonstrate this together. In this case, let's say our organization is working on an office move to optimize collaboration between teams. Information about the project is restricted to only those of us involved. I'll type "Is there a move or office space change coming?" And the generated response says, "I don't know," because there are no documents I can see that talk about this; the information available in the sources doesn't contain an answer. So Jeremy, why don't you
try this out on your laptop? - Sounds good! So, here I have actually got
the app open on my Surface. I'll go ahead and type
in the same question. "Is there a move or office
space change coming?" And I can immediately see
from the generated response that there is a move indeed happening, and it's also cited the source: a plan for the office move that's called "TheShuffleProject.pdf". - Right. So, the information is restricted to those involved for now. I'm not part of it, but you're actively part of the project team, so you have access to this information. - Okay, so in this case, you've actually written the solution to make sure people can only see what they're allowed to see. So, why don't we switch gears again and talk about the process for adding new information into search. How long does something like that take? - Well, let's try it out. This time I'll ask something
a little bit more random, like "Can I get scuba gear
covered by my benefits?" And you can see, it found documents focused
on benefits coverage but says it doesn't have information on scuba gear being covered. I have this script here. In a real application, this would be an automated process that runs as data changes, but here I need to make it happen on the spot. I'm going to manually run this script and add the information on a new benefits plan into our knowledge base. So, now with the new content added, I'll ask the same question again.
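The core of such an indexing script could look roughly like this, assuming the azure-search-documents package and invented document fields; a production pipeline would also extract text and chunk documents first:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="benefits-index",
    credential=AzureKeyCredential("<search-key>"),
)

# A document becomes searchable as soon as the index commits it, which is
# why the demo reflects new content almost immediately.
search_client.upload_documents(documents=[{
    "id": "perksplus-chunk-0",      # assumed key field
    "sourcefile": "PerksPlus.pdf",  # assumed citation field
    "content": "PerksPlus covers reimbursement for fitness and recreation "
               "activities, including scuba diving lessons...",
}])
```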
And you can see, with the new knowledge available to it, it generates an updated response. - And just to be clear, we didn't speed anything up with any movie magic. The response actually reflects the change almost instantly, once Cognitive Search has access to the new information. So to understand the
logic a little bit better, can you show us then the code that's running behind your sample app? - Sure. As part of the sample, we
included notebook versions of the interesting parts of the backend. The nice thing about
showing this in a notebook is that you can see how the
state from a chat session is passed from one prompt to the next. Here we are in the notebook
in Visual Studio Code. You can see that everything
is wired up to Azure services with the right sources, and managed identities to authenticate. Then it creates variables for parts of the prompts. So now, let's run the
whole thing to the end for a first conversation turn. And here we can see the output. Now, let's do a second turn but this time we'll do it step by step. First let's check the history and we can see the previous user question and the response from the model. Now I'll update the question in line with "How about hearing?" And it is first sent to GPT to map the history and context
and generate a search query. From there it uses cognitive search to run a query and get
candidate documents. Next, I'll pull up the content that was returned from our top search results. And this is really the magic with ChatGPT. We can see the prompt evolves
with each interaction. This pattern uses everything to construct a prompt: the user question, the chat history, and the search results, all combined into one big prompt with instructions. Then, in our final step, it calls the Azure OpenAI completion API to get a response based on the entire prompt.
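Pulling the pieces together, that final step might look something like this sketch, reusing names from the earlier snippets (system_prompt, sources, history, followup) and an assumed deployment name:

```python
# Assemble one big prompt: instructions + sources + history + new question.
messages = [
    {"role": "system", "content": system_prompt.format(sources=sources)},
    *history,  # prior turns, kept in the app's memory
    {"role": "user", "content": followup},
]

response = openai.ChatCompletion.create(
    engine="<chat-deployment>",
    messages=messages,
    temperature=0.3,
    max_tokens=500,
)
answer = response["choices"][0]["message"]["content"]

# The model is stateless: persisting or discarding the history between
# sessions is entirely the application's choice.
history += [{"role": "user", "content": followup},
            {"role": "assistant", "content": answer}]
```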
You can see that the session history is just kept in memory in this case. The model itself doesn't track it; that's up to your application. You could choose to store it or, like we do here, simply let it go when you close your session. In either case, the session history is not added to the large language model. Now, I showed you some
prompt experimentation in the notebook. But if you don't want to
experiment using the notebook or using the code, you can also use the Azure
OpenAI Studio Playground to experiment with ChatGPT
prompts interactively. The GPT-3.5 Turbo model has
been added to the playground along with a new chat interface and all the configuration parameters. - And by the way, you
can watch our entire show at aka.ms/OpenAIMechanics to learn about how to use
Azure's OpenAI Studio. Now, you also mentioned
that this is based on GPT-3.5 Turbo but with
the release of GPT-4, how does that change the approaches that we've shown today? - Indeed. We were running the
sample on GPT-3.5 today because that's the model that most people will have access to. That said, everything
we're discussing here applies to GPT-4 as well where some scenarios
will perform similarly and others will work much better, thanks to its advanced
reasoning capabilities and a much larger prompt length limit. - And apparently, it can
also even pass the bar exam. So, what else would you recommend then for anyone who's watching
who's looking to build out their own enterprise-grade
ChatGPT enabled apps? - So first, try out the sample
app I demonstrated earlier. You can find it on GitHub at aka.ms/entGPTsearch. The sample has everything
you need to get started, including creating the Azure services and even the sample data we used. In just a couple of hours,
you can have a version of what I showed you
running with your own data. Then, to learn more about
the Azure OpenAI service, you can go to aka.ms/azure-openai. And for Azure Cognitive Search,
check out aka.ms/azsearch. - Pablo, it's always a pleasure
having you on the show. And also, don't forget to subscribe to Microsoft Mechanics for the latest in tech updates. Thanks for watching,
we'll see you next time! (soft music)