[MUSIC] Speaker 1:
We're witnessing the shift of generative AI from a
niche technology in the hands of a few to an accessible
technology businesses are harnessing to
benefit the many. Now, you can tap into this opportunity to
gain an AI advantage. Get to market fast, do it right, and do so responsibly,
safely, and securely. Copilots are becoming an AI category in
and of themselves. Today, you can build custom copilots to serve the exact needs of your
customers and employees. Bringing content generation to all apps to supercharge
creativity, designing contextual
experiences that impress and captivate and freeing people to focus on their
highest value work. This is Azure AI Studio, the place to build your copilots, where you can confidently apply the latest
state-of-the-art and open-source models, easily ground responses in your data knowing privacy is protected, deliver multimodal
interactions beyond text alone that feel more natural, and build on a foundation of trust through every step
of app development. The possibilities with
AI are limitless. Build your own copilots
with Azure AI. Let's do generative
AI right together. John Montgomery:
Good morning. Welcome. I'm incredibly
excited to be here. Thank you for the warm welcome. I'm John Montgomery. I'm on the Azure
AI team at Microsoft. I'm joined by Nabila Babar. We want to spend
the next 45 minutes going a little bit deeper into some of the
news that Satya and Scott Guthrie have
talked about today. Generative AI is actually
real. It's happening. It's happening now. We have
18,000 customers who are using Azure OpenAI Service today to build amazing
generative experiences. It's everything from the
simple chat experience to the most sophisticated
kinds of applications that you see in the
Microsoft copilots, companies like
Siemens that you saw, I think, in Scott's keynote, that's building an internal
application that does translation and can issue
tickets, to ISVs like Grammarly, and on into even more sophisticated examples like the Microsoft copilots. I want to spend a moment
right now highlighting one of those customers that
we're super happy to have with us,
which is Perplexity. Denis Yarats:
Perplexity tries to revolutionize the way people access and search information
on the internet. Performance and speed are
at the core of what we do. We've been extensively using Azure AI Studio for this
type of workflow where you have an idea,
you try things out, and then once you're satisfied
we can take this model, deploy it, and the next
day it's in production. Lauren Yang:
With Azure OpenAI Service, we have the ability
to control and independently change which
model is serving prod traffic. Perplexity Ask started
out as a Slack bot. You ask a question and the
bot replies with an answer. We hooked it up to web
browsing and added the capability to search the
Internet. It was like magic. We got overwhelmingly
positive feedback and from there created the official
Ask Perplexity product. Aravind Srinivas:
The underlying models in Perplexity Ask run on Azure, and you want
to make sure that no matter what
requests people send, the service is safe, it's secure, and it runs reliably. That's what Azure
enables us to do. You send in a lot of requests to the large language model API and there are multiple times
you repeat the same thing. You don't want to keep paying again and again for
the same thing. What PTU has allowed is that you can cache repeated
requests, saving costs. Denis Yarats:
The power of large language models is that you can skip
many of those steps. You can prototype with
tools like Azure AI Studio, and in a matter
of hours or days, you can deploy this
feature into your product. We've been working
with Azure OpenAI to migrate to a faster version
of GPT-4 models. That essentially led to increasing throughput
from roughly 300,000 tokens per minute
to 600,000 tokens per minute. You're getting twice as much throughput at only
two percent of the cost. This not only allows you to release features in a matter
of days instead of months, but also it allows
you to experiment much faster and get much
better user experience. Aravind Srinivas:
We didn't start off as trying to work on a search engine, but rather we were trying
to just do cool things with large language
models like GPT. We ended up building
something that would be useful for basically
the entire world. Lauren Yang:
We're the first generative AI answer engine out there, and we plan on
continuing to deliver a seamless and new type of search experience and
to continue growing our product using
Azure OpenAI Service to push us forward. [MUSIC] John Montgomery:
Perplexity has done some amazing things. I think that video
calls out some of the reasons that a tool like
Azure AI Studio is useful: their ability to flight new experiences against
old experiences and to do testing of one
model against another. The ability to scale the
solution from a proof of concept quickly out
to incredible load. This is the same
technology that we use inside Microsoft to
build our copilots. We're not just
providers of copilots, we're not just providers of a
third party infrastructure. The same technology that Perplexity used and that these other customers have used, that's the technology
that we build our own copilots on
the exact same stuff. It's all Azure AI. That means that the
advantages that we get as we scale
out our copilots, the efficiency of the service, the accuracy of the models, the scale we're
able to operate at, we are able to pass all
of those along to you. A huge portion of that is about
making sure that there is commonality in the
infrastructure that we use across Microsoft, and increasingly
that we're seeing customers standardize
on in their accounts. We talk about this as
the Copilot Stack, and it has these common layers
that you build up from. Here we are actually in Level zero of the
conference center. I want to say that AI is in fact the foundation upon
which we're building all of Ignite because we're on the foundation of
it, maybe the Garage. I don't really think
we'd want to give this talk in the
Garage, but we could. But thinking about it
through that perspective, there are multiple layers and as we talk about
this in this show, we are going to build from
the bottom to the top. Talking a little bit about the foundation models
that we provide, the orchestration
engine we have, the AI toolchain that we provide and how we integrate
with data sources. Mostly in this talk,
we're going to be talking about how you build a
completely custom copilot. We have other talks here to
talk about how you can extend the Microsoft copilots or to build copilots from within
a product like Fabric. But this is about building your own Copilot at
scale from scratch. The tooling we used to do
that is Azure AI Studio. Azure AI Studio is in
public preview today. We announced it at our
build event last spring, and it has been an amazing
journey to get to the place today where we are able to offer it to you
as a public preview. I encourage you to go to
ai.azure.com and give it a try. We have brought together all of those practices from
that Copilot Stack. All of the knowledge
that we've gained from building and operating
our copilots at scale. All the customer
conversations together to give you one place to do AI right and to build applications quickly and offer them
out to your customers. AI Studio is a new thing for us. We are truly unifying multiple different AI
services into one experience. We see this very commonly where customers want to be able to use multiple different AI models, they want to bring
their custom models, many different data sources, and we want to give you that
unified platform to do that. That brings in the best of our data and search
technologies. That brings in not only our own foundation
models that we have, that we partner
with OpenAI to do, but the best of the open
models that are out there and the best other commercial
models so that you can build your copilots using
these technologies. To do so safely and
responsibly using the cutting-edge safety tools that we use within Microsoft, and to do all of that with a full end-to-end development life cycle because that is the next phase of what's
going to be happening here. We will see the increasing
merger of DevOps processes, MLOps processes and this
new area called LLMOps, which is about how you actually
build these systems and scale them and make sure that they version
correctly over time. With that, enough
of me talking, I think. Nabila, would you
actually introduce us to Azure AI Studio? Nabila Babar:
Awesome, thank you, John. We've all seen powerful
copilots like Bing Chat, M365 and GitHub and talked
about a few of these. All of these are built on Azure, and they're some of the world's
most complex workloads. What we're going to do today
is we're going to build our own copilot safely
using Azure AI Studio. Here you see that I'm
in Azure AI Studio. The first thing
I'm going to do is I'm going to create a project. Actually, that's not true. I've already created
the project. We're going to use this
project as you see here, and within the Settings tab, you can see my project
configuration. Here you can see I've created something called an
Azure AI Resource. This Azure AI Resource
helps me connect to, create, and manage all of the different AI-related assets
that go into my project. For example, in this project, I'm using Azure OpenAI, I'm using Azure
Machine Learning, I'm using Azure AI Search. All of these are
already connected, and they're ready for me to use. You can also see the compute and infrastructure that my project
is dependent on in here, and you can manage
that here as well. I have access to different
API endpoints and keys, and I also have access to seeing who has access
to my project, and I can assign different roles and permissions here as well. That's the project set-up. Let's take a look
at our deployments. As a part of creating
this project, all of the deployments
that I needed, in this scenario I'm
using Azure OpenAI, they're already deployed and
they're ready for me to use. I can go ahead and manage these deployments
through here as well. We're going to go into
the project playground. This is a great place
to get started. The first thing I'm going
to do in here is I'm using a GPT-3.5 Turbo model. I'm going to ask this a question that is very specific
to my company data, and it's not going to work. As you can see, it's not able to give me
an answer because this model has not
been trained on my company data. Let's
go ahead and fix that now. I'm going to add our
data using Azure AI Search. What I've already done in
here is I've taken our data, and I've added it to OneLake. OneLake is supported throughout the platform in Azure AI Studio. Here you see me
ingesting my data from it and then adding that
data to my vector index. It's supported for
fine-tuning and also within our orchestration
flow as we'll see later. But I've already grounded this model with Azure AI Search. I can see it's connected
to the playground here. Now what I'm going to do is I'm going to ask the
same exact question. This time the model
knows the answer. The hiking shoes cost $110. We also can view
exactly where in the product documentation
the information comes from. I've not only enabled this
but let's say we asked the same exact question in a different language.
I'm using Spanish here. (no audio) As we can see, it's giving
me a response in Spanish, even though my product
documentation was in English. Through this, we've enabled
multilingual responses.
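As a rough sketch of what the same grounding pattern looks like in code (this is not from the session), the Azure OpenAI Python SDK lets you attach an Azure AI Search index to a chat completion through the "on your data" extension. The endpoint variables, deployment name, and index name below are placeholders.

```python
import os
from openai import AzureOpenAI

# Assumed client setup; all names and environment variables are illustrative.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # your chat model deployment name
    messages=[{"role": "user", "content": "How much do the hiking shoes cost?"}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_AI_SEARCH_ENDPOINT"],
                    "index_name": "product-docs-index",
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_AI_SEARCH_KEY"],
                    },
                },
            }
        ]
    },
)
print(response.choices[0].message.content)  # grounded answer, with citations available
```

The later snippets below reuse this `client` object rather than re-creating it each time.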
Now that we have an idea of how this model looks, let's take a look at some of the other tools that we
have within the playground. Up until now, my system
message has been pretty basic and system messages or prompts as we call them,
they're pretty important. They tell my application
how to behave. Through Azure AI Studio, you have a wide variety of
different prompts that you can use directly within
your application. If you want to
create your own, you can type that in here as well.
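For reference, a system message is just the first entry in the `messages` list sent with each call; the wording below is only an example of the kind of instruction you might type into that box.

```python
# Illustrative system message (metaprompt) shaping how the assistant behaves.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant for an outdoor-gear retailer. "
            "Answer only from the retrieved product documentation, and say "
            "you don't know when the documents don't contain the answer."
        ),
    },
    {"role": "user", "content": "What gear do I need for Seattle weather?"},
]
```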
Also, I've been having a one-on-one conversation with this. I'm just trying to get a feel
for how this model works. But let's say now what
I want to do is I want to ask this many
different questions. Since the model is now
grounded in my data, I want to compare all of the answers from
those questions. For that, we need to do
a manual evaluation. Here what I can do is
I can manually enter several different
questions one at a time or what could be quicker is if
I import an entire dataset. I've already gone
ahead and done that and let's take
a look at that. Here you can see I had a dataset that had around
10 questions in there. I've imported this in here, as well as a list of
expected responses that I'm expecting from
the LLM to give me. I ran this, and here you can see all of the different
outputs that I can now manually view and I can also add my preferences
to these as well. Here I can see the majority
of them look good, with the exception
of this one... And all of this effort
doesn't go to waste. At the top, I see a summary of the insights
that I'm building. I'm building my dataset
as you see here. I can export this
and share this, and I can also save
these results to use as a dataset later for a more
comprehensive evaluation, and we'll see that
a little bit later.
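A dataset like the one being imported here is typically just rows that pair a question with the response you expect; the JSONL shape and field names below are an illustrative guess, not the exact format the import requires.

```python
import json

# Hypothetical question / expected-answer rows for a manual evaluation.
rows = [
    {"question": "How much do the hiking shoes cost?",
     "expected_response": "The hiking shoes cost $110."},
    {"question": "Is the hiking jacket waterproof?",
     "expected_response": "Yes, the jacket is waterproof."},
]

with open("eval_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```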
With that, the tour of our playground comes to an end, and I'm going to hand
it back to John. John Montgomery:
Awesome. Thank you. That's a tour around AI Studio. We're going to go to
each of these areas in a little more detail. But you saw the basics
here, connecting to data, working with a prompt, doing evaluation,
selecting a model. These are the cores of LLMOps, and you can do all of
these within AI Studio, both through that
user experience, through the SDKs and
through the command line. We have a full experience
for deployment. Let's start by
talking about data. I think Arun when
he was onstage, he said, "data is the
fuel that powers AI." I would probably be a
little less formal. I would say garbage
in, garbage out. If you don't feed good
data into the LLM, you're going to get bad results. When I talk with customers, one of the most common things I see is for whatever reason, the data is not well formatted, it's not clean, the
chunking isn't right, the embedding isn't right, and you just get poor
quality results. We're trying to make it
much easier for you to connect to data and to get it into the right form to do that. Within Azure AI Studio, we have connectors to all
of the Azure data sources, as you could imagine, it's very, very easy to connect to
them and bring them in. But we can also talk to data
sources from on-premises, from other Cloud
providers and so on in order to
feed these models, the information you
need in order to get the best possible
output of them, whether it's structured
or unstructured data. One of the big hearts
of all of this is this idea of vector search and
being able to use vectors. We announced a whole
bunch of stuff today about vectors and
vector availability. We can talk a little
bit more later about what goes into a vector and
why vectors are important. They're basically
just mathematical representations that it's very easy for a
system to compare, so you can get nearness
and relevance out of it.
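A quick sketch of that "nearness" idea, assuming the Azure OpenAI client from the earlier snippet and an embedding deployment name that is only a placeholder: embed two strings and compare them with cosine similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = client.embeddings.create(  # `client` as configured in the earlier snippet
    model="text-embedding-ada-002",
    input=["waterproof jacket", "What should I wear in rainy weather?"],
)
v1, v2 = emb.data[0].embedding, emb.data[1].embedding
print(cosine_similarity(v1, v2))  # closer to 1.0 means more semantically related
```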
But it turns out not all vectors are created the same. We're very proud of
Azure AI Search, which was formerly called
Azure Cognitive Search, and its support for vectors. Because it doesn't just
bring together vectors, it actually brings
together vectors with semantic search and
classical keyword search. We talk about this as a
vector hybrid search, and it actually has a
dramatically positive impact on the results that you're likely to get out
of a search index. As you think about
the vector support, which is a core for
all of this stuff, you also need to be aware of what those vectors are
doing and how they're being brought together with
other technologies to make sure that the
information being fed into the large
language model is the best possible
information you can get in. Again, I think there are some very easy ways to show that. Maybe Nabila, would you
show us a little bit about the amazingness
of Azure AI Search? Nabila Babar:
Awesome, thank you, John. Before I show you, I want to get a little nerdy. Let's talk a little bit
about how search works. John Montgomery:
That sounds good. I like nerdy. Nabila Babar:
Perfect. In earlier you saw me grounding
my model with my data. It's a very simple example of me grounding the
model of my data. I want to talk a little bit about how we're
searching that data. Under the hood, we're running a sophisticated
ingestion pipeline. That pipeline does three things. The first thing it does is it
cracks open the documents. It then runs a
chunking strategy, and then on top of that
what it does is it embeds the data from that
and runs it through an LLM. I was using an LLM earlier. Then it takes that embedded data and indexes it in Azure AI Search. When you're searching, you have three different
options in AI Studio. The first one is
called vector search. Vector search is really good at understanding user intent and relationships between words. For example, it can understand the relationship between tent and camping, or waterproof
and jackets. Where it's not that great is getting exact matches, for example, if you're looking
for a product document, an e-mail address,
or a phone number. For that, keyword search works really well.
It's fast as well. What hybrid search does, it combines both of those based on what
the user is asking. It combines vector search, which is really good at understanding
relationships and intent, with keyword search based
on what the user needs. The last search type we have in here is hybrid plus semantic, and this is the one that
builds on top of both of them. It's the one that
rules them all. What this does is it
adds a semantic ranker, and what that ranker
does is it ensures that the most relevant information is available at the
top of a search. This is the one we're
going to try right now.
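As a hedged sketch of what a hybrid-plus-semantic query can look like with the azure-search-documents package: the index name, vector field, and semantic configuration name are placeholders, and the question vector comes from the embedding call shown earlier.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint=os.environ["AZURE_AI_SEARCH_ENDPOINT"],
    index_name="product-docs-index",
    credential=AzureKeyCredential(os.environ["AZURE_AI_SEARCH_KEY"]),
)

question = "What gear do I need for Seattle weather?"
question_vector = client.embeddings.create(
    model="text-embedding-ada-002", input=question
).data[0].embedding

results = search_client.search(
    search_text=question,  # keyword half of the hybrid query
    vector_queries=[VectorizedQuery(
        vector=question_vector, k_nearest_neighbors=3, fields="contentVector")],
    query_type="semantic",  # adds the semantic reranker on top
    semantic_configuration_name="default",
    top=3,
)
for doc in results:
    print(doc["title"])
```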
I'm going to go back to the Playground and I'm going to ask
these two questions. The first question is, what gear do I need
for Seattle weather? What I did within my
product documentation is I have a lot of documents
about my products, and then I also
inserted some documents about just generally
about Seattle to see which document
this is going to rank and search depending
on what I'm asking. Let's go ahead and ask
this, what gear do I need for Seattle weather? This is great. As you can see, the first thing is
telling me that the retrieved documents do
not provide a direct answer. It's giving me
transparency that this was not in the
retrieved documents. Then it tells me that the closest information
available to this is for any weather. Actually, sorry, we're going
to try this one more time. I promise this worked before. John Montgomery:
How do you know, it's real software? Nabila Babar:
What it should do and what it has done I promise is that it is able to reference Seattle
weather and able to say that it rains in Seattle and then it brings up my
waterproof jacket. John Montgomery:
What's the irony here right now is it's bright and clear and cold. Nabila Babar:
The next thing I'm going to ask, let's go back to
earlier one, maybe. No worries. What it
does is it pulls out the right document
within here and then it's able to add to the record, it's able to pull up the
right document that has a waterproof jacket
within here that was within my product
documentation. I probably am not using
the right dataset. But it does work, I promise.
John, now back to you. John Montgomery:
Thank you. That's a hybrid
search with semantic. There is a session on this that I would
invite you to go to, given by my compatriot, Pablo Castro, if you
are interested in it. He goes into quite a bit of depth into how this
technology works, when to apply it, and strategies for doing chunking
and embedding on your data. Now if it's a garbage in, garbage out with data, one of the other core things that's going to
determine whether a Copilot is successful or not is the foundation model
you build on top of. Within Azure AI Studio, obviously we have the
models from OpenAI, GPT-4, GPT-3.5, Turbo and so on. We're now expanding
that out for a while, we've had a model catalog that can bring in
open source models, and there's quite a bit more
going on in this space. The key for us is
ensuring that customers
not fit all in this domain. Sometimes you need
a specific model, sometimes you need other models and sometimes you
need the models to be composed with each other so that you can get
the right result. That's actually most of the
copilots at Microsoft use a mixture of different models to achieve the results they have. One of the biggest announcements
we're making today is that we are now
bringing GPT-4 Turbo, the latest version of GPT-4, GPT-4 Turbo with Vision, and DALL·E 3 into the
Azure OpenAI Service. If you were following the news, you know that about a week ago, OpenAI announced that they have these models and obviously
we are bringing them into Azure OpenAI as
quickly as we possibly can so that our Azure customers
can make use of them. GPT-4 Turbo, a tremendous model. It delivers the same level of accuracy that you would
expect from GPT-4. The interesting thing is it's much more efficient,
much higher throughput, which is enabling us to be
able to lower the price on the token-based
offering significantly. I suspect this is going to enable a lot more new scenarios since historically GPT-4 has been a somewhat
expensive model to use. The quality is worth it, but we're delivering
the same quality now at a significantly lower price. DALL·E 3, similarly, is
delivering a quality of image creation that really blows the doors
off of DALL·E 2. I think you probably
saw some of that. It's an amazing model. But I want to spend a
moment here talking a little bit about
one of the trends that's happening within AI and that's called multimodality. I think a lot of us
have been talking about this for a while. Mostly today when you think
of large language models, they are language models. You type in language and
you get out language. But multimodal models go quite
a bit further than that. Microsoft has been at the
forefront of creating these multimodals for
a while and we're very pleased to be able to
say that we're bringing GPT-4 Turbo with Vision to
the Azure OpenAI Service. This is an amazing model, it can take in video, it can take in images, you can supply it with video or images and ask
questions about them, it can summarize videos. It's an incredibly powerful
model that extends language-to-language interaction to incorporate video and images as well.
There's a whole session on this that my compatriot,
go to that one. He will go very deep on this model and what
it's capable of. I'll stop there. You should definitely go to
Marco's session. Somebody who was
reviewing it from outside Microsoft said it's probably going to be the best
session at the show, so you should go
to Marco's session. Now I want to actually reiterate something
that Scott talked about. There is a session we're
giving on this in detail, but a lot of the questions
that I get from customers have to do with
what happens with my data when it goes
into your model? Do you use it to train
the foundation model? Is it secure? How
are you using it? I just want to say
super clearly, the promises that Azure makes generally apply to the
Azure OpenAI Service. Your data is your data. We don't use it to train
our foundation models, we don't do anything with it. When you fine-tune a model using the Azure OpenAI Service, the weights of the
fine-tuned model stay in your subscription,
we don't see them. We have some of the
strongest protections around data and data sovereignty
in the industry: data stays in the region it's in, and we're very proud of that. We're very proud to
announce that today we are extending some of
the announcements we've made previously
about our copilots, where we are offering
copyright protection to applications built on top
of Azure OpenAI Service. We call this the Customer
Copyright Commitment. If you follow the published
guidelines we have for how to build a
copilot responsibly, that includes how to
design your metaprompt, how to handle data, and you use the Azure
Content Safety Service with the switches
that we document, we will indemnify you if somebody makes a
copyright claim against you. That is a huge
announcement because it's one of the biggest fears that a lot of customers have. It's really about the control over what's coming
out of these models. A major announcement
that we made today. Another major announcement
has been about fine-tuning: you can now fine-tune
GPT-3.5 Turbo and GPT-4. Now, for GPT-4 Turbo,
that's a private preview, so it's a limited number of customers we're
going to bring in as we bring this service up. But I want to spend a
moment talking about fine-tuning versus
retrieval-augmented generation. Everything we've
shown so far has focused on retrieval
augmented generation. This is the idea that
instead of having to go and retrain a model
and become an expert in weights and biases
and things like that, you can grab some
data and you can feed it into the prompt along with user questions and get
answers out of it. Retrieval augmented
generation is awesome in a lot of ways. It's relatively fast to do, building an index is
relatively inexpensive, you can reindex often. Retrieval-augmented generation,
in our internal tests, is really good at adding
knowledge to the experience. Fine-tuning is a very
different thing. You bring your data in,
you fine-tune the model. It takes some compute to
create those new weights. We then assemble the new weights with the base model
when you call the model using a technology we have called
Low-rank Adaptation. But the interesting thing about fine-tuning is fine-tuning is not the best way to add
knowledge into a model. It's a really good way
of adjusting the tone of output or the format of output.
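For orientation, a fine-tuning run on Azure OpenAI can be kicked off with a couple of calls; this is a sketch assuming the same client, a JSONL file of chat-formatted examples, and placeholder names. The service handles the adapter training (for example via LoRA) on its side.

```python
# Upload training data, then start a fine-tuning job against a base model.
training_file = client.files.create(
    file=open("tone_examples.jsonl", "rb"),  # hypothetical file of example conversations
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo",  # base model to fine-tune
)
print(job.id, job.status)
```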
When you think about these two technologies, they are entirely complementary. It is not one or the other. A lot of applications won't
ever need to do fine-tuning. A lot of applications
may not need to use RAG. But we're very pleased
to be able to offer both within Azure AI Studio. Now I've talked a lot about the OpenAI models. I'm also very happy that we have an amazing assortment of other models that you can
bring into the experience. We have the OpenAI models, but you can also deploy and fine-tune any model basically
through a partnership with Hugging Face as well
as partnerships that we have with a bunch of other companies to bring their models onto your
own infrastructure. This is basically an
infrastructure-based play. You can go to our model catalog, select a model, you can
fine-tune it, you can deploy it. It's a full experience of
bringing open source models into the Azure AI
experience so you can build them into your
generative AI application. Today, we also announced a technology called
Models as a Service. Now this goes the next level. We have a lot of
customers who don't want to deal with
infrastructure, they just want to call
the API endpoint, and that is where Models as
a Service comes into play. These behave
basically the same as Azure OpenAI Service or any of the other
Azure AI Services. We operate the endpoint. You call the endpoint, it does what you ask. We have announced
partnerships with Meta around the Llama 2 model to operate Llama 2 this way, also Mistral, Jais, and Cohere. We have their models
now that we're bringing up as part of the Models
as a Service platform. This is going to make it a lot easier for customers that
don't want to deal with infrastructure just to use the model without having to
think about infrastructure, which is the direction
that we want to go.
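Conceptually, calling a Models as a Service deployment is just an HTTPS request to a hosted endpoint. The URL, header, and payload shape below are illustrative placeholders; check your deployment's details for the exact contract.

```python
import os
import requests

response = requests.post(
    os.environ["MAAS_ENDPOINT_URL"],  # scoring URL of your serverless deployment
    headers={"Authorization": f"Bearer {os.environ['MAAS_API_KEY']}"},
    json={
        "messages": [{"role": "user", "content": "Suggest a name for a waterproof hiking boot."}],
        "max_tokens": 100,
    },
    timeout=30,
)
print(response.json())
```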
But it's not enough just to have lots of models. I always say to customers, the proof is in the pudding
and you should test. Because it turns out
every model behaves differently and they behave differently with
different test data. Now, I would say GPT-4
is an amazing model. We have tested it
nine ways to Sunday, it is by far the best model we have tested on every measure of quality. Its inferencing
performance is terrific, but you know what, it may not
be perfect for your task. We're going to stand
behind what we do. We will offer a model
benchmarking system within AI Studio. We're both going to
start by publishing benchmark results that we have
run of these open models, our models along with
common benchmarks. We're starting with a subset
and gradually adding more. In the fullness of time, what we'll enable you
to do is to benchmark any model against any other
model using this system, this evaluation
system that we have. You can bring your own
data to that as well. This is a key part of answering the question about what model
do I choose for what task. We're going to give you the
tools to make that easy. Again, I would love
to actually ask Nabila to show off some
of what we have there. Nabila Babar:
I have a small surprise for you, John. John Montgomery:
What's the surprise? Nabila Babar:
I ran it again, and here it is. It does work.
I didn't give up. (applause) Here we see what gear do I need for Seattle
weather and you can see it referencing the
document that we need, which is about a
waterproofing jacket, even though I'm not mentioning
rain within that document. When I ask it a question about
where to eat in Seattle, even though I have documentation about Seattle that's
specific to food, it's referencing that and
not any product documents. Here we can see the
ranker pulling up the right information based
on which question I'm asking, and it's also able to build those relationships
between words. We were using the GPT-3.5
Turbo model the whole time and an ada
model for embeddings. But let's say, as
John was calling out, that we want to use different models within our application. Here within the model catalog, I have access to thousands
of open source models, including Meta's 70
billion-parameter Llama 2 model as well. But let's say I don't know
which model to choose. With our benchmarking tool based on the task that
we're trying to do, in this we're doing
question answering, I can compare a set of models. Here I have the GPT-3.5 Turbo
model that I was using, getting compared
to a Llama 2 model against a standard dataset
and I see metrics right here. We will be adding
additional metrics to this, additional datasets, and you can also
replace this with your own dataset and
benchmark those. Now let's say we want to
use a Llama 2 70B model. We're going to go back
to the model catalog. We're announcing two
services with this. The first one is being able to deploy this model
through Models as a Service. This is a pay-as-you-go service. The benefit of that
is I don't have to manage any infrastructure,
any quotas. I pretty much call an API and this model is
ready for me to use. But let's say you want
to customize this model. You may want to
fine-tune this model if you want to introduce bias
into your application. Those could be scenarios with certain
medical terminology or legal terms that you want to introduce
within your application; for those, you may want to
fine-tune models. For that, we're also announcing
a fine-tuning service, that's a managed service. All you need for this is your dataset and
you're calling an API, and we're going to take care of the quotas
infrastructure for you. To see this, I've already gone ahead
and fine-tuned the model. When I fine-tune
it, I can see it within my fine-tuned
models right here. Here you can deploy this model. But before you
deploy, you want to know what are the metrics like, is this model ready
for me to deploy? Within the Metrics tab, you get a list of metrics
that you can view. You can always go ahead, change your parameters,
change your dataset, continue to iterate on this, and then when you're
ready you can select the best model and
fine-tune this. We've talked a lot up until now about a text-in-text-out
scenario. The future of
generative AI is all around building on top of that
and adding multimodality. What multimodality
means is that I can add text, images, and even
videos to my applications. With AI Studio, you can do that. We're going to take a look at
a few different scenarios. In the first one, what I've
done is I've generated descriptions
that can be either for my website or
for my brands in here. I did this by going to a different mode
within the playground. I can then build on top of
that, and I can also generate different images for my
product descriptions and for my websites as well. Here we're using DALL·E 3.
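A small sketch of image generation with a DALL·E 3 deployment, reusing the earlier client; the deployment name and prompt are illustrative.

```python
image = client.images.generate(
    model="dall-e-3",  # your DALL·E 3 deployment name
    prompt="Studio product photo of a waterproof hiking jacket on a rainy Seattle street",
    n=1,
    size="1024x1024",
)
print(image.data[0].url)
```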
I've saved the best for last. Here you see with the GPT-4 Turbo model, what I've done is
I'm asking this. I've uploaded a pretty bad video actually from a
vacation that we took. I'm not telling it where it is, and I'm saying based on
this generate a half-day itinerary for me
with a packing list. What we see here is it's able to pinpoint exactly where in Yellowstone National
Park this hike was. It's also able to create
a half-day itinerary for me and a packing
list for my hike. I think that deserves a
round... (applause) And that's not all. What we've also done is, with Azure AI Speech, we've enabled speech-in
and speech-out. Lastly, you can also create your own natural-sounding
synthetic voices using Custom Neural Voice, and these can have different
speaking styles and can adapt to different
languages as well. Now back to you, John. John Montgomery:
Cool, thank you. Never give up, never surrender. The Neural Voice is very interesting
because of course, you can attach the output of
one of these language models to a Neural Voice and it
can speak on your behalf. I'm going to take the
next couple of sections together and then Nabila will
bring us home for a demo. I want to talk about
safe and responsible AI. Microsoft has had these
principles for a while about how we approach and how we think about safe
and responsible AI. Things like fairness
and reliability, safety, privacy, security,
things like that. But the thing is, it's
not just principles. The principles infuse our
engineering processes, the ways we build things. Everything from how we source data to how we train models. Also the tools that
we create that we are able in turn
to offer to you, and they're the same tools
that we use internally. One of those tools is
Azure Content Safety, which is a major addition
to our safety product line. We use it internally with our Copilots, we're
offering it to you. It is the thing that
can identify things like hate speech
or sexual content, but it's also the thing that can identify jailbreak
attempts and so on. That's our Content
Safety Service.
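For reference, the Content Safety service can also be called directly; this sketch assumes the azure-ai-contentsafety Python package, and the endpoint and key variables are placeholders.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions

safety_client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["CONTENT_SAFETY_KEY"]),
)

result = safety_client.analyze_text(AnalyzeTextOptions(text="text to screen before or after the model"))
for category in result.categories_analysis:
    print(category.category, category.severity)  # e.g. Hate, SelfHarm, Sexual, Violence
```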
The other big service that we offer for model creators is the
evaluation on our AI dashboard, which is built into
the AI Studio as well. It enables you to
see where the data may be biased or where you may be having issues with the application
actually in production. Think about these
as two parts of the equation for how you
build a safe system. Again, it's not just talk, there's real software
engineering behind actually making
these things safe. Now let's talk about the
full development life cycle. This is increasingly
a key thing. It's not just about calling
the API and walking away. There's quite a bit
of engineering that goes into shaping the prompt, that goes into the
model orchestration, the APIs that are
called, and so on. You need a complete
way to do that. At Microsoft that feature within AI Studio is
called Prompt Flow, which is generally
available today. It is our tool for
prompt orchestration, for evaluation, and for
prompt engineering. It is deeply integrated
into AI Studio. It is also in our Azure
machine learning product. It gives you complete tools
to be able to understand the behavior of the graph of models and APIs
that you're calling. To be able to
evaluate the outputs. These models are very non-deterministic,
and so evaluation is at the heart of
how you actually make sure it's doing
the right thing. A little coda on this one. Everything I've
been talking about actually has application
on the Edge as well. We are beginning the
journey to link together Azure AI Studio to be able to deploy and do things
with models on the Edge. We talk about this as
Windows AI Studio. It is a Visual
Studio Code plug-in. We'll be showing a
little bit of it here, particularly about
how you might fine-tune a model
to be able to deploy it onto the Edge using all the power of AI
Studio in the Cloud. But to show off the
rest of it, Nabila, would you please show us
some more of the product? Nabila Babar:
Awesome. Thank you, John. So as you saw, the reality is a lot
more complex than we were showing within a
playground experience, and sometimes things may
not go the way you want. How do you adapt to that? Customers may want to be able to add many different
data sources. They may want to use
many different models. They may want to use many
different meta-prompts and then they want
to compare these. These are all variables that we can use in our application. But what are the tools that you need to be able to see what's the impact of these
variables on my application? For that, we need
evaluation tools. What I'm going to
do now is I'm going to click open in Prompt Flow. What this does is it
takes me to Prompt Flow. The entire UI that you saw is now broken
up into these nodes and an orchestration
engine that we can see here that was powering that UI. All of these nodes, they contain code that
developers can check in. If I ask the same
exact question, I'm going to get the
same answer here. I can also see the files behind each of these nodes, right
here at the top. These are files that
I can download, I can check in my repo and
I can iterate on them. But let's say I want
to develop in a tool a developer tool of my choice. If I click on "Open" in VS Code, I'll be taken to VS Code Web. What's going to happen is a development container
is going to be created for me and
it's going to be my own development
workstation in the Cloud. Here I'm going to have access
to the same exact files. Now I can go ahead and
I can iterate in code; whatever changes I make here are going to be reflected
back in the UI. Let's go back there now. Developers face a
lot of challenges with LLMs because of their
non-deterministic nature. Minor changes in the
meta-prompts can have major impacts to
your application. To do that, we need
evaluation tools. What I'm going to
do here is I have a dataset here that has a
set of questions and answers. This is the same exact dataset
that I had used earlier. I also have a Prompt
Flow here as well. What I'm going to do here is I have two prompts that I created. The first one is
pretty simple and the second one has instructions
for it to be safer. What I want to do is I
want to see the impact of both of these prompts
on my application. For that, I'm going
to run an evaluation. To save us some time, I'm going to go ahead right
here and here you can see all of the different evaluation
metrics that we have. We have both data
science metrics like F1 scores and accuracy, as well as LLM-assisted metrics
like relevance, coherence, groundedness, and
GPT similarity. For example, GPT
similarity compares the ground truth from my dataset to the output of the LLM.
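To make that idea concrete, an LLM-assisted metric like this boils down to asking a grader model to compare the ground truth with the output and return a score; the prompt and 1-to-5 scale below are only an illustration, not the exact rubric the built-in metrics use.

```python
def gpt_similarity(question: str, ground_truth: str, answer: str) -> int:
    grading_prompt = (
        "On a scale of 1 to 5, how similar in meaning is the candidate answer "
        "to the ground-truth answer? Reply with a single number.\n"
        f"Question: {question}\nGround truth: {ground_truth}\nCandidate: {answer}"
    )
    result = client.chat.completions.create(  # `client` from the earlier snippet
        model="gpt-35-turbo",
        messages=[{"role": "user", "content": grading_prompt}],
    )
    return int(result.choices[0].message.content.strip())
```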
When we run this, we're going to see this evaluation within our evaluation runs. Here now we can see both
of those variations and I can see aggregate
metrics at the top as well as a breakdown for each
row within my dataset. I can see in general,
the safer prompt has slightly better
metrics in here. If I want to dig further, I can look at each
of these variants. I can look at each
question within here. Let's say actually further, I can trace every
single API call that this application runs. I can view the input and
output of that as well. If I want to know exactly how this 5 is calculated
for groundedness, for example, and where
it's coming from. What's amazing is the evaluation run itself is a Prompt Flow. When I ran that evaluation, a new Prompt Flow
is created for me. The benefit of
that is now I have transparency into exactly how these metrics are calculated. Further, I can clone this and customize and tweak it
for my own scenario. Let's finish off
with Deployments. Here you see I've already
deployed my flow. I can consume this
in my application. Lastly, I can monitor
this. When I deploy this to an endpoint that's
consumed in my application, I'm seeing the same exact metrics for my application
that I was while I was prototyping and developing. Now I'm able to have
peace of mind when this application is
deployed to my customers. Lastly, on the same endpoint, I can enable content filtering. Here you can see
I can filter out both harmful content from the user input into
this application as well as the output. With that, our tour of AI
Studio comes to an end. We're incredibly excited for a public preview and
we cannot wait to see what you build with
it. Thank you.