[MUSIC] ANNOUNCER:
Please welcome Chief Technology Officer and Executive Vice President
of AI, Kevin Scott. KEVIN SCOTT:
That was an amazing video and thank you, Satya,
for sharing it. It really is inspiring to
see this technology getting diffused so quickly and having a real positive impact
across the globe, not just in the urban innovation centers
here in the United States and the capitals of the
industrialized world. I'm so excited and happy
to be with you all here today, in-person at Build
after a four-year hiatus. I guess it goes without
saying that a lot has changed in the
world of technology over these past four years. One of the biggest changes,
and it's the theme of this conference, is what has
happened in the world of AI, just even in the past year and what that means for
you all as developers. I wrote my first program as a developer in the
early '80s, when I was 11 years old, I think. I remember what a thrilling
moment that was, being able to do something for the first time that I didn't
even realize was possible. I've been chasing that
feeling my entire career, trying to find those moments where something impossible
became possible. Then as a developer, figuring out how I could
participate in that change. What I can say is that this is the most exciting thing in the world to me, maybe the most exciting time that I've experienced in my career: what the power of AI is doing right now to help
all of us have that moment, that ability to
take something in our hands, to look at what was impossible and is becoming possible now, and then go and do something great with it. I'm going to spend my next half an hour or
so chatting with you all about some of those
technological themes that are driving all of this
great progress in AI that we're seeing. We're going to start with
maybe the obvious thing. There's an incredible
amount of attention being paid right now to what's happening with the rapid
progress with these AI models, these foundation models,
as we're calling them now. In particular, the rapid pace
of innovation that's being driven by OpenAI; in their partnership with Microsoft, we really are setting the pace of innovation in the field of AI right now. I think, even for us, it's been surprising to see
how much of the zeitgeist is being captured by
things like ChatGPT, and applications that
people are building on top of these large
foundation models. The reason that this
partnership between OpenAI and Microsoft has been so successful is that
we really do have an end-to-end platform for
building AI applications. We build the world's most
powerful supercomputers. We have the world's most
capable foundation models, whether hosted models that we built ourselves and make available to you all via API, or open-source models, which run great on Azure. We also have the world's best AI developer
infrastructure. Whether that is using these super powerful computers to train your models
from scratch or to build these applications that we're going to be talking about at Build this year on
top of that infrastructure, we have that
end-to-end platform. You're going to hear a ton
about it today and tomorrow. Scott's keynote is
right after mine. He's going to dive
into detail on a bunch of this stuff, and then
the breakout sessions are going to be amazing
and equip you all with the information
that you need to go do some pretty awesome stuff. This end-to-end platform
starts with Azure. We really believe that
Azure is the cloud for AI. It's not just the amazing, technically complicated
and brilliant work that our partners OpenAI have done on top of all of
this infrastructure. But it's things
that the teams at Microsoft are doing to build our Copilot applications and
our own advanced AI models. It's also the things
that our partners, some of you here
in this room, are building on top of Azure, making Azure this
amazing platform for doing the most ambitious
flavors of AI in the world. But it's not just Azure; Windows, we believe, is the best client
for AI development. You're going to see a
bunch of that today. Panos is going to dive into
it pretty deeply tomorrow. Satya showed the
Windows Copilot, which is going to be
an amazing part of your productivity story, like GitHub Copilot works
great on Windows. But increasingly
what you're going to see is the ability to run these powerful AI models on your Windows PC so
that you can develop these true hybrid
AI applications that span the Edge, all
the way to the Cloud. It's just a really
really exciting thing. But what I'm going to
spend most of my talk discussing with you all is
this idea of the Copilot. Satya has already referenced a whole bunch of the
Copilots we've launched. As he said, it's almost as if we woke up on January the first and decided to do a whole
bunch of press releases. But it's really been years of work, where we built a platform for building copilots that has enabled us to do these releases. Today we are sharing with you all some of the patterns that have helped us build copilots, and we're opening up our platform so that you can build copilots of your own. Just to start with: a copilot, simply put, is an application that uses modern AI, has a conversational interface, and assists you with cognitive tasks. We're going to talk a lot
about what that means later. We believe that it must
be an open ecosystem. One of the most important
things that we believe is that even though Microsoft has built a whole bunch of copilots, maybe the most interesting copilots will be built by you all, using these powerful tools that you have on Azure, on Windows, and in the open
source community. As we start talking about this, I would love to bring to the stage Greg Brockman, the President and Co-founder of OpenAI, to talk about his experience building GPT-4, this powerful model that's powering a bunch of
these copilots, and about ChatGPT, which maybe is the most interesting copilot
in the world right now. Please join me on stage, Greg. [MUSIC] GREG BROCKMAN:
Awesome. KEVIN SCOTT:
Fantastic. Thank you so much for joining us
today here, Greg. I wanted to start with
the ChatGPT experience. I believe it caught us all by surprise, just how crazy the adoption of ChatGPT has been and how much interest there is. But it's a really big engineering challenge to build something like ChatGPT, so maybe you could talk a
little bit with us about that. GREG BROCKMAN:
Yeah. ChatGPT was a really interesting process, both from an
infrastructure perspective and an ML perspective. We'd actually been
working on the idea of having a chat system
for a number of years. We even demoed at Build
an early version called "WebGPT" and it was
cool; it was a fun demo. We had a couple
hundred contractors, literally people we had to
pay to use the system. They were like,
"It's kind of useful; it can kinda help
with coding tasks." But for me, the
moment that really clicked was when we had GPT-4. We had a traditional process: with GPT-3, we just deployed the base model, which had only been pre-trained; we hadn't really tuned it in any direction, and that was what was in the API. For 3.5, we'd actually gotten to the point where we were doing instruction following, where contractors were given: here's an instruction, and here's how you're supposed to complete it. We did that training on GPT-4, and the thing that was so interesting was that, as a little experiment, I asked: well, what happens if you follow up with a second instruction after it has already generated something? The model provided a perfectly good response that incorporated everything from before. So you realized that this
model was capable enough. It had really generalized
this idea of, well, if you really want me
to follow instructions, you give me a new instruction. Maybe you really want me
to have a conversation. For me that was the moment
that it clicked that, okay, we have this infrastructure that's already in place with the earlier model, and with this new model, even just using this technology that wasn't meant for chat, it wants to chat; it's going to work. So this was a real aha moment, and from there we
just were like, we got to get this thing
out; it's going to work. KEVIN SCOTT:
Yeah. I think it was really
surprising to me. I remember when Sam called
me up and said, "Hey, we want to release
this ChatGPT thing and we think it's going to be a few weeks' worth of work to condition
one of these models." I was like, "Sure, why not?" I had no idea that it was going to work
technically as well as it did and that
it was going to be such a crazy success. Maybe related to that, I know that you are one of
the principal architects on all of the infrastructure
that was used to train GPT-4. GPT-4 powers parts of ChatGPT. It has just really been a revelation for everyone who's been working
in the field of AI. I wonder if you could
share a little bit of, what are some of the
interesting things that you found about the
development of GPT-4? GREG BROCKMAN:
GPT-4 was very much a labor of love. As a company, after GPT-3, we'd actually had multiple failed attempts to surpass the
performance of that model. It's not an easy thing. What we ended up doing was going back to the drawing board, rebuilding our entire
infrastructure. A lot of the approach we took
was, get every detail right. I'm sure there are still bugs; I'm sure there are still more details to be found. But an analogy from Yaakov, one of the leads on the project, that I really like is: it's almost like building a rocket, where you need all the engineering tolerances to be incredibly tiny. So, lots of little details.
For example, it turns out that we had a bug in our checkpointing where, if you killed the job at exactly the wrong moment, you could end up with a blend of new weights and old weights
when the job restarted. Machine learning
mostly doesn't care. It's happy to recover from that. But it's one of those
things where every time you see a weird
wiggle in your graph, you're like, I
wonder if this was that particular issue or a real, different one. So you go back and really pay attention to every single detail and just do the boring engineering work; that is the main thing that I do. KEVIN SCOTT:
Yeah. Well, the boring engineering
work you do is just at an unbelievable, phenomenal scale. But I do think that that's a good parable for
everyone in the room. It's sometimes the
boring engineering work that really leads to success. So, Satya talked a
little bit in his talk about this shared approach that we're developing
for plugins. This idea that we're
going to empower all of these folks
in the room to write software that can extend the capability of
things like ChatGPT, and like all of these copilots
that we're building. I know that that also has been an interesting
technical challenge and we still don't
yet have all of the technical issues sorted out and there's a lot
of work left to do to get it into the state that we ultimately want
it to be in, so I wonder if you have some thoughts you wanted
to share on that. GREG BROCKMAN:
I love plugins. I think it's been a really amazing
opportunity for every developer to leverage this technology in a way that just makes the system better for everyone. That's why I think it's so exciting. Part of the reason
we designed it as an open standard was
because that way, as a developer, you
build this thing once and then any AI can use it. It's such a beautiful idea; I think that with the web, part of what really drove it was that anyone could build a website and everyone gets
access to that. Then you build an API and suddenly anyone
can leverage it. I think that this core design principle, of letting any developer who wants to plug in get the power of the system and bring all of the power of any domain into ChatGPT, is really amazing. KEVIN SCOTT:
Yeah, and the thing that I really love about plugins is conceptually that
it's so simple. It reminds me a little bit about the first HTTP server
that I ever wrote. Like if you understand
the core concepts, you can stand up something very quickly that can do
something very powerful. And I think that is an
awesome thing as an engineer. In your role at OpenAI, like you are constantly
thinking about how to push the limits
of the technology. I think one of the
really amazing things about our partnership is
working with you-all, it feels like we get to
see a little bit further into the future than we
otherwise would be able to. I wonder if you could say
a few things about what's exciting to you about
what's over the horizon, like either with applications
or with the models. GREG BROCKMAN:
The thing that, to me, is interesting is we're almost on a bit
of a tick-tock cycle, like Intel of yore, where you come up
with an innovation and then you really push it. I think that with GPT-4 we're in that early stage of really pushing it: we have vision capabilities that have been announced, but that we're still productionizing. I think that it'll
just change how these systems work, and how they feel, and the kinds of applications that can be
built on top of them. I'm also really excited if you look back at the history over the past couple of years: I think we did something like a 70 percent price reduction two years ago. Then this past year we did a 90 percent cost reduction, an intense cost drop; that's crazy. I think we're going to be able to do the same thing repeatedly with new models. So, GPT-4 right now is expensive and it's not fully available, but that's one of the things that I think will change. KEVIN SCOTT:
I think that is the thing that I would want to leave everyone here in the room with, and it's what we say to all of the developers inside of Microsoft building on top of these things: what's expensive today won't be tomorrow, because the progress
there is so fantastic. So, I think we've got time
to squeeze one last thing in. So, you've already
dispensed a bunch of really great advice for
developers here in the room. But like maybe one
more thing that you would leave
the audience with. GREG BROCKMAN:
I think that in this field, the technology is
getting better and better. But the thing that I think every developer can do, and that is hard for us and even for Microsoft at Microsoft scale to do, is to really go into specific domains and figure out how to make this technology work there. I love seeing companies that are in the legal domain really getting expertise, talking to lots of lawyers, and understanding what their pain points are with this technology. So, I think that there's
a huge amount of value to be added by the
efforts of everyone. KEVIN SCOTT:
I think that's awesome. You heard it from Greg. You all are the ones who
are going to make AI great. Thank you so much, Greg, for
being with us here today. Thanks for all you're
doing. Thank you very much. One more interesting OpenAI thing that we're
going to do, so we have Andrej Karpathy here. I think I see Andrej
in the front row. Andrej is going to be
here on this stage later today doing a "State of GPT." He's going to walk
through the technology from beginning to end. It's going to be an
awesome session, probably going to be
tight on seating. Try to get your spot here. You are not going to
want to miss that. Let's talk about Copilots. Satya mentioned a bunch of the Copilots that
Microsoft has launched, that our partners have
launched, so we think of ChatGPT as fitting this
Copilot pattern, Bing chat, certainly GitHub Copilot, Microsoft Security Copilot, Microsoft 365 Copilot,
and Designer. Many, many more. The thing that we noticed as we were
building these copilots, starting with GitHub
Copilot several years ago, is that the idea of a copilot
is actually pretty general. This notion that
you're going to have a multi-turn
conversational agent-like interface on your software that helps you do
cognitively complex things applies to more
than just helping someone do software development. That's what you've seen, we have search copilots, now we're going to have security copilots, we have productivity copilots,
and we're going to have all of the copilots
that you-all build. The thing that we
noticed, for us at Microsoft, is that we
needed to look at what is common across all of these things so that we
can understand how to design great user experiences and what the technology stack is that is going to empower us to deliver these things safely, responsibly,
cost-effectively at scale. The only reason that
we have been able to do this blitz of Copilot
announcements and deliver these products
to users so quickly is because we stopped
and took the time and energy to go build a
Copilot technology stack that would allow us to move
quickly with safety. One of the things that I
want to talk with you about today is what that
technology stack looks like. But before we dive
into the details, I think of Satya's reminder to us all: why do we do what we do? One of the important
reasons that we have taken the time to think about this Copilot stack as
one coherent thing is that platforms are important. A platform gives us the opportunity to build things that are more ambitious than we otherwise would
be able to build. It gives you, the developers,
a chance to build things that wouldn't be possible if the platform didn't exist. I love this quote
from Bill Gates. It may or may not be apocryphal, but it has been attributed to Bill for many, many years. What Bill is saying here is that the true value of a platform only materializes when the value created on that platform accrues to the people who are building
on top of the platform, not the platform builder itself. If that's not true
of a platform, then it's really not a platform. The thing that makes platforms even greater than all
of that value that they can potentially produce is that they prevent folks from having to bear the burden of building very complicated
things from the ground up just to build the application that
they want to go build. It's great if you want to
build all of this stuff, if you want to be
a platform company or an infrastructure company. But if what you
want to do is build a legal copilot, like
Greg was talking about, or you want to make a
copilot for medicine, or a copilot for helping people get through
their insurance claims, you are not going to want to build all of this stuff
from the ground up. It will be economically
infeasible. The amount of compute that
we are investing in and just the scale of all of that infrastructure is
absolutely astronomical. The fact that the things that come out of the other end of the compute, these foundation models
and this entire platform, that they are reusable and generalizable is really
a fantastic thing. One of the things
that we've been betting on for five years now is that this was going to be a durable property
of these systems. One of the things that
you're going to hear a lot about at Build is this idea that the
foundation models are powerful and they're
getting more powerful. But they can't do everything. You shouldn't have to
wait around until we train a model that can do the
thing that you want to do. You should have ways to
accommodate your application, build your application on
top of this technology, even when the model itself
isn't complete or perfect. We're going to talk about a ton of ways that you can do that. Satya's already
referenced plugins, and Greg and I chatted
about plugins. Plugins are going to be one of those powerful mechanisms
that you use to augment a copilot or an AI
application so that it can do more than what the base
platform allows you to do. What a plugin does is help augment your AI system so that it can access APIs, and via APIs it can do almost anything, like change state in a digital system or retrieve information. For sure, people will use plugins to retrieve useful information. You've already seen some video demos of that happening, and you're going to hear a lot more about it. Plugins also allow you to perform arbitrary computations and to safely act on the user's behalf. Really, the way that we think about these plugins is that they're almost actuators for the digital world. Anything that you can imagine doing digitally, you can connect a copilot to via plugins.
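As a rough illustration of that idea, here is a minimal sketch of the kind of web API a plugin might wrap, written with Flask; the routes, the toy data, and the payload shapes are hypothetical and are not any official plugin specification.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

EPISODES = {
    "neil-degrasse-tyson": "Kevin interviews Neil deGrasse Tyson about astrophysics.",
}

@app.get("/episodes/<slug>")
def get_episode(slug):
    # A retrieval-style endpoint: an orchestrator could call this to ground a prompt.
    return jsonify({"slug": slug, "summary": EPISODES.get(slug, "not found")})

@app.post("/posts")
def create_post():
    # An action-style endpoint: change state in a digital system on the user's behalf.
    body = request.get_json(force=True)
    return jsonify({"status": "created", "text": body.get("text", "")}), 201

if __name__ == "__main__":
    app.run(port=5000)
```

An orchestrator could call the first endpoint to ground a prompt, and the second one to act on the user's behalf once the user has confirmed.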
But what I'm going to spend the rest of this talk focusing on is the anatomy of a copilot. What does a copilot look like? What is shared? What's common among all of these things that we've
components that we're building to help you all
build copilots of your own. This starts from the
user experience. There are some
things that are the same and there are
some things that are different about building
Copilot user experiences. There is an application
architecture and there will be some
familiar things about it, but a bunch of new
stuff to learn. Then, it is so important for all of us to think about safety and security. You'll inherit a lot of that by using the tools that
we built for you all, but it's a thing that you
need to think about from the very first steps of building your copilot
applications. I just want to start with
the thing that doesn't change when you're thinking
about building a copilot: you have to build
a great product. It is something that
we sometimes forget. But you have to understand
what that unmet user need is, what it is that you are
trying to make better, where you have a
unique understanding of that thing that
maybe no one else has. Then you need to
apply the technology. Sure, the tech is great. It's making a whole bunch
of things that were impossible, or infeasible,
or expensive, possible, easier, and cheaper. But it does not
absolve any of us of the responsibility of thinking about what good product
making looks like. One thing in particular
that you have to really bear in mind is the model is not your product, unless you are
an infrastructure company. The model itself is just infrastructure that
is enabling your product. It isn't the thing
in and of itself. One of the mistakes that
I've seen, just being in the tech industry
over 20 years, is having people fixate on infrastructure versus
fixating on product. It's a thing that we even have to remind our teams here inside of Microsoft, over and over and over again: use the infrastructure
that you have at hand that is best going to enable
you to solve your problem. Don't build infrastructure
that you don't have to build. Again, it's just up to
you all, it's up to us; we have to build
great experiences, things that delight users. We've got to get things out into the hands of users as
quickly as possible, see what works, see
what doesn't work, iterate, make them better. Let's dive into
the Copilot stack. Satya already showed this and we're going to blow
it up a little bit now. This is how our Copilots at Microsoft are structured
and these are some of the things that we're
going to be diving into greater detail in
subsequent talks for you all to have a look at, to pick up, to use, to learn about, and
to make things. Some of this may look familiar. There are three boxes. You can think of these as
roughly corresponding to the three tiers of a
normal application. You've got a front
end and you've got a mid tier, you've
got a back end. The front end, like
the things that we've already talked
about, is you start with understanding what your
amazing product idea is. The thing that's a little
bit different about the user experience design
with Copilot is we have more or less been building user
experiences the same way for 180 plus years, since Ada Lovelace wrote
the first program. We have had to understand what
the machine is capable of. And then we are
fiddling around with how we express the
connection between the human and the machine in very explicit ways. What that means for you all is like fiddling around with
user interface elements, menus, binding code to actions, trying to fully anticipate the needs of the user,
and architecting your applications in
particular and familiar ways so that people know how to get at all the functionality, that capability that you
really built into your code. The thing that's a
little bit different in a copilot is
you're going to spend less time thinking about what your user
interface widgets are and trying to second-guess the user about what it is they want, because they have a really natural mechanism to express what they want: natural language. What you have to think about in the design of these copilots is: what do you want the copilot to actually be capable of? What are the things a
model can't do that you need to augment with a bunch
of the stuff that I'm about to show you in
the orchestration layer with plugins and maybe even with fine tuning models or using portfolio
models to accomplish? But it's going to be way less of that fiddling around mapping user interface elements
to little chunks of code than you're accustomed to. You also, on the
flip side of that, have to think about what you
want the copilot not to do. This is important in how
you're thinking about safety, but also because the thing
at the bottom of the stack, these foundation models are a big bucket of
unrestrained capability. You're the one who
oftentimes has to restrain it to your
particular domain. For instance, with
GitHub Copilot, a bunch of the work
that we did is to keep the model on task, which is helping you solve
your development problems. You're not trying
to figure out what the best menu item
is at Taco Bell when you're sitting in GitHub Copilot, trying to
write a piece of code. That's the user interface, just broad brush, what
is different there. Now let's talk about
orchestration. Orchestration is the business logic of your copilot. As I mentioned, when we started building our own copilots, every team inside the company was building their own
orchestration layer. Like, all of that logic
to figure out how to get a thing to sequence
through all of the models, do all of the filtering, do all of the
prompt augmentation that you have to do to
build a really great app, and we just noticed
that there was commonality across
all of those things. One of the things
that we did that greatly affected our ability
to get these copilots out to market at scale and to do more ambitious things was to decide that inside Microsoft, we are going to have one orchestration
mechanism that we will use to help build our apps. That is called Semantic Kernel, which we've open-sourced, and there's a session on
Semantic Kernel later at Build, which I would encourage
you all to attend. But like, we also know that
we're not the only ones who see that there's all of this commonality
across orchestration, and there's some really great open-source orchestration
tools that work super-well inside of the Azure ecosystem
that we're building. Harrison from LangChain, shout out to Harrison who's here
with us in the front row. Yeah, give Harrison
a round of applause, please. LangChain is one of the most popular open-source
orchestration mechanisms, and Harrison, with
a very small team, has built a thing that is useful to an extraordinary
number of developers. And orchestration isn't
a solved problem. We're going to see a
lot of new ideas there, and the thing that I want to assure everyone
here in the room is that you'll be able to use whatever orchestration
mechanism you want. We'll give you some options that we think are great for us. We'll point you to some of
our open-source favorites. But if you want to roll your own thing, that's your choice. I'm a developer, I like rolling my own stuff sometimes too. One of the things
that you'll see in Scott Guthrie's talk that's coming up next is prompt flow, which is another orchestration
mechanism that actually unifies LangChain
and Semantic Kernel, and so I
encourage you all to go dive a little
bit deeper there. Inside of the
orchestration layer, the fundamental thing that you're going to be
manipulating is a prompt. A prompt is just a
bucket of tokens that is generated by the user
experience layer of your application. In something like Bing Chat or ChatGPT, it could be a question, or a thing that a user
is asking the model to do. Or it could be something that
your application constructs, where it's not a direct natural language
thing from the user, but a natural language
thing that you are conveying to the model
from your application. A big part of handling those prompts at the
beginning stages of orchestration is prompt
and response filtering. Basically saying, I'm
not going to allow these prompts through
because maybe they will cause the model to
respond in a way that doesn't meet the needs of
your application or do something unsafe; and you also filter responses on the way back up. After the model has produced a response to the prompt, you may decide that you want to filter some or all of the response out. A natural place where this happens is with the safety infrastructure that you're going to see Sarah Bird talk about in her talk later. But there are other reasons that you may want to do some filtering on the responses.
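As a rough illustration, here is a minimal sketch of prompt and response filtering; the blocklist is deliberately naive and purely illustrative, and a production copilot would lean on dedicated safety classifiers like the content-safety tooling mentioned here.

```python
# A deliberately naive blocklist; real copilots use dedicated safety classifiers.
BLOCKED_TERMS = {"credit card number", "social security number"}

def filter_prompt(prompt: str):
    # Refuse prompts that would push the copilot off task or into unsafe territory.
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return None
    return prompt

def filter_response(response: str) -> str:
    # Redact anything the application never wants to surface back to the user.
    for term in BLOCKED_TERMS:
        response = response.replace(term, "[redacted]")
    return response

checked = filter_prompt("Summarize the latest podcast episode for a LinkedIn post.")
if checked is None:
    print("Sorry, I can't help with that.")
else:
    print(filter_response("Here is a draft post about the episode."))
```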
You also have this unit of prompt code called the meta prompt. The meta prompt is the standing set of instructions that you give to your copilot, which gets passed down to the model on every turn of the conversation and tells it how to accommodate itself to the copilot that you're trying to build. It's where a bunch of your safety tuning is going to happen. It's where you tell the model what personality you want it to have; for instance, we use the meta prompt to do things like telling Bing Chat to be more balanced versus more precise. It is also how you teach the model new capabilities. You can even think of meta prompt design as a form of fine-tuning, and it's far easier to do things in the meta prompt than to have to go down to the lower layers of the infrastructure and start rolling your own things.
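Here is a minimal sketch of how a meta prompt might be combined with the conversation history on every turn; the meta prompt wording and the call_model() helper are hypothetical placeholders for whatever model you actually call.

```python
# The meta prompt is re-sent to the model on every turn of the conversation.
META_PROMPT = (
    "You are a podcast social-media copilot. Be concise and upbeat. "
    "Only discuss the podcast episode you are given; politely refuse anything else."
)

def build_messages(history, user_input):
    messages = [{"role": "system", "content": META_PROMPT}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_input})
    return messages

def call_model(messages):
    # Placeholder: swap in a hosted chat model or a local one here.
    return "Stubbed response to: " + messages[-1]["content"]

history = []
print(call_model(build_messages(history, "Draft a post about the new episode.")))
```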
Once you get past the meta prompt and the prompt-filtering stages, you start to think about grounding. Grounding is all about adding additional context to the prompt that may be useful for helping the model respond to the prompt that's flowing down. In the case of Bing Chat, which I think is the first place that was really doing retrieval-augmented generation before retrieval-augmented generation had a name, we basically look at the prompt, the user query, and issue a query to the search index to find relevant documents for the prompt; we add those documents to the prompt and send it to the model so that it has extra context
to provide a good answer. Increasingly, people are using vector databases to do retrieval-augmented generation. You may take the prompt, compute a set of embeddings for it, and then do a lookup in a vector database indexed by those embeddings to get relevant documents for the prompt, giving the model that extra context to give you a better answer.
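Here is a minimal retrieval-augmented-generation sketch along those lines; the embed() function is a stand-in for a real embedding model, and the in-memory document list is a stand-in for a real vector database.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a real system would call an embedding model here
    # and store the vectors in a vector database instead of an in-memory list.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

DOCS = [
    "Episode 42: Kevin interviews Neil deGrasse Tyson about astrophysics.",
    "Episode 41: A conversation about compilers and programming languages.",
]
DOC_VECTORS = np.stack([embed(d) for d in DOCS])

def ground(prompt: str, top_k: int = 1) -> str:
    # Embed the prompt, rank documents by cosine similarity, prepend the best ones.
    q = embed(prompt)
    sims = DOC_VECTORS @ q / (np.linalg.norm(DOC_VECTORS, axis=1) * np.linalg.norm(q))
    context = "\n".join(DOCS[i] for i in np.argsort(-sims)[:top_k])
    return f"Context:\n{context}\n\nUser question:\n{prompt}"

print(ground("Who was the guest on the latest episode?"))
```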
But you may also augment the prompt and do grounding with arbitrary web APIs, and you can even think about using plugins for doing grounding. The next step here is where plugin
execution happens. At this stage, as I just mentioned with grounding, you may use a plugin to add some extra context to the prompt before it
goes down to the model, or you may do a plugin execution on the way
back up from the model so that you can take
an action on a system. Once you get through all of the stuff in the
orchestration layer, and I should say, you may also do multiple turns through this whole system, calling multiple models,
making multiple passes through this whole pipeline in order to get what you need
from the system. But at the very
bottom of the stack are foundation models
and infrastructure, and we give you a bunch
of choices for how to use foundation models in
this Copilot platform on Azure and on Windows. You can choose to use one of the hosted foundation models, like the ChatGPT model or the GPT-4 model, which are now available on the Azure OpenAI service. You can fine-tune one of these hosted foundation models; the GPT-3.5 fine-tuning APIs are live now, and you'll be able to fine-tune GPT-4 soon.
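As a rough illustration, here is a minimal sketch of calling a hosted chat model through the Azure OpenAI service, assuming the pre-1.0 openai Python SDK; the endpoint, key, API version, and deployment name are placeholders you would replace with your own.

```python
import os
import openai

# Placeholders: endpoint, API version, key, and the deployment name "my-gpt-4"
# all need to be replaced with values from your own Azure OpenAI resource.
openai.api_type = "azure"
openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"
openai.api_version = "2023-05-15"
openai.api_key = os.environ["AZURE_OPENAI_KEY"]

response = openai.ChatCompletion.create(
    engine="my-gpt-4",  # the name of your model deployment
    messages=[
        {"role": "system", "content": "You are a helpful copilot."},
        {"role": "user", "content": "Draft a one-line post about our new podcast episode."},
    ],
)
print(response["choices"][0]["message"]["content"])
```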
But if neither of those options work for you, if you have exhausted all of the things that you can do in the orchestration layer to get your copilot to do what you need, and neither of these things will work for you (you can't wholly solve your problem with the hosted API for whatever reason, and you can't use the fine-tuning APIs to accomplish what you want to accomplish), you can bring your
own model and we are incredibly excited about what's happening in the open-source
community right now. There's a bunch of brilliant work happening
with open-source models, and one of the
things that you will see in the next talk is we have the Azure AI model catalog that is going to be a place
you can go inside of Azure to find the most popular models on Hugging Face and GitHub, where you'll be able to push-button provision and deploy those models to Azure to
use in your copilots. Also, you can train your
own model from scratch. As we've mentioned several times, from the most ambitious models in the world, the ones that OpenAI is training, all the way down to smaller things, the Azure AI supercomputing infrastructure and our environment give you a great way to train your model from scratch if
that's what you need to do. This is the Copilot
stack, top to bottom. What I want to do now is make this maybe a little
bit less abstract by talking about a
copilot that I wrote. I host a podcast called
"Behind the Tech," and every month, when
the podcast airs, my team comes and bugs me
to write a social media post to advertise the podcast,
and I suck at this. I forget to read my emails. They have to bug me
over and over again, and they really want a
Kevin social media copilot so they don't have to go through the irritation
of dealing with me. I had the honor recently of interviewing Neil deGrasse
Tyson on the podcast. I'm just going to walk you
through this copilot that we built that actually just ran, and it did the social
media posts for the Neil deGrasse Tyson podcast
that just went live. Here's what it looks like. Just as an end-to-end picture: the copilot runs
on a Windows PC. It uses a mixture of open-source models
and hosted models. It does retrieval
augmented generation, and it calls a plugin to
finish doing its work. Let's walk through
these step by step. The first step of this process is that we have an audio file and we need a transcript. On our Windows PC, we take the open-source OpenAI Whisper model and run the audio through the model to get a transcript. It does a really amazing job.
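As a rough sketch of what that step might look like (not the actual code from our repo), here is a local transcription with the open-source Whisper package; the audio file name is a placeholder.

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")              # smaller checkpoints run fine on a PC
result = model.transcribe("episode_audio.mp3")  # file name is a placeholder
transcript = result["text"]
print(transcript[:500])                          # peek at the first 500 characters
```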
Once we have the transcript, the next stage in the orchestration is the Databricks Dolly 2.0, 12-billion-parameter large language model running on our Windows PC, and we ask it some things about the transcript. For instance, who was the guest in this episode? Because again, we want to do this lights-out, not have Kevin answering a bunch of questions, because he's slow and annoying.
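Here is a rough sketch of that extraction step using Hugging Face transformers, not the actual code from our repo; the smaller dolly-v2-3b checkpoint, the prompt wording, and the placeholder transcript are all illustrative.

```python
import torch
from transformers import pipeline

# The 12B checkpoint needs a capable GPU; dolly-v2-3b is a lighter stand-in.
generate_text = pipeline(
    model="databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,   # Dolly ships a custom instruction-following pipeline
    device_map="auto",
)

transcript = "..."  # in the walkthrough, this comes from the Whisper step above

prompt = (
    "Below is part of a podcast transcript.\n\n"
    f"{transcript[:2000]}\n\n"
    "Who was the guest in this episode? Answer with just the name."
)
print(generate_text(prompt)[0]["generated_text"])
```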
The next thing that we do, once we have the transcript and all of this information that we've extracted from it, is send a chunk of that to the Bing API; for example, we send Neil's name to the Bing API to get a bio. Then we combine all of this stuff together into a single packet of information, a big prompt that has some stuff about the transcript and some stuff about Neil, and we get our
social media blurb. This is a pretty good blurb, so we're going to go to the next step here, which is that we need a thumbnail. We call our hosted OpenAI API to get an image from the DALL-E model; this looks pretty good, it's cosmic, it's podcasty, plenty good enough for this post.
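A rough sketch of that image step, assuming the pre-1.0 openai Python SDK; the prompt text is illustrative and the API key comes from your environment.

```python
import os
import openai  # assuming the pre-1.0 openai Python SDK

openai.api_key = os.environ["OPENAI_API_KEY"]

image = openai.Image.create(
    prompt="A cosmic, podcast-themed illustration for an astrophysics interview",
    n=1,
    size="1024x1024",
)
print(image["data"][0]["url"])  # URL of the generated thumbnail
```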
The last step is that we want to invoke a plugin for LinkedIn that will take this thumbnail, the post, and the link to the podcast and just post it to my LinkedIn feed. Before we take an action
on the user's behalf, we want to present to them
what it is that's going to happen, because if
for some reason or another, the model went
haywire and produced something that we
didn't want to post, like once I hit "Yes," this is going to 800,000
people on LinkedIn. We review, we click "Yes,"
and we post. This is the live post that's
on LinkedIn right now; you should go check out this
episode with Neil, it's awesome. This is really just
an illustration for you all, like I'm not
claiming that this is the most interesting
copilot in the world, but it was really
pretty easy to do. We posted all of the code in a GitHub repo, and I encourage all of you to check it out; it's a
good template for thinking about how to build
your first copilot. The thing that we want to talk
about last before we jump in to Scott's keynote
is AI safety. It's the first thing that we think about when we're building copilots, and we think about it at every
step of the process. You're going to hear a ton about this great AI safety work
from my colleague Sarah Bird, who runs our Responsible
AI infrastructure team inside of the AI platform group. It's really super good stuff, like we're giving you all
some amazing tools to go build really safe,
responsible AI applications. Just very quickly, I want to mention one of the things that you're hearing about here, which Satya mentioned: we're giving you a bunch of amazing
media provenance tools that will help users understand when they're seeing generated content or not. Like we're going to be
watermarking all of the content that we are
producing, and we're giving you tools so that if your AI application, your copilot, is generating synthetic content, you'll be able to call our APIs and add these cryptographic provenance watermarks to your own generated content. It's super exciting stuff. Copilots: you
have heard from us that we have this amazing new software
development pattern. You have heard about
how we think about architecting copilots,
and you have heard our enthusiasm
that not only are there going to
be a bunch of copilots from Microsoft and from our
partners, but we really think that you all are going to
be the ones who build the most interesting
copilots in the world. It's just like any
other major platform: the things that make your PC great, that make the internet great, that make a smartphone great, aren't the things that launch when those platforms launch; it's what you all will create
on top of them. I want to share one
anecdote before we go. I was an intern at
Microsoft Research in 2001; I came to MSR with my PhD advisor when he
went on sabbatical. We would go out with
our research group every Thursday to this burrito joint in Bellevue, which I think is closed now,
called Acapulco Fresh. Occasionally this
gentleman would join us, his name
is Murray Sargent. Murray (I was a 30-year-old PhD student at the time) seemed like a legend to me, because Murray was the guy who had broken the 64K limit on the
Intel microprocessors. Many of you may be too young to even remember this,
but at one point in time, the computers that we
shipped could only use 64 kilobytes of memory for doing the work that they had
to do, and Murray was the guy who, when the 286 came out, figured out protected mode and got
Microsoft software to work beyond that 64K
memory barrier. It's unbelievable to
think about what impact small things like that had on the trajectory
of the industry. I was in awe of Murray and I wondered every time we
had lunch with him, what am I ever going to do
in my career that would allow someone like me,
a younger version of myself to look
at me and think, "Wow, like this guy did
some legendary stuff." This is the moment
for all of us now; we have capabilities in our hands with these new
tools, in the early days of this new platform, to
absolutely do amazing things, where literally, the challenge
for you all is to go do some legendary shit that someone will be in awe
of you for one day. With that, I would
like to bring to the stage my colleague, Executive Vice President
of Cloud and AI, the legend himself,
Scott Guthrie.