- Hi, welcome to Generative AI Foundations on AWS. My name is Emily Webber. I'm a principal machine learning specialist solutions architect at AWS. And today you are gonna learn about generative AI. You've heard about generative AI, you've heard about all sorts of things. The purpose of this class
is to dive super deep. So we have, in fact, no less than seven topics that you're gonna get to learn about here today, and they're broken up into different classes. Each class is about 45 minutes of slides, so you're gonna get to learn about lots of concepts, dive really deep, and explore these complex topics and interests that you may have. And then we're gonna have a hands-on demo. So each session is basically 60 minutes that you'll be able to just watch on YouTube. You can fast forward, you can slow down, you can pause, take screenshots, do whatever it is you wanna do, and then you're gonna get all the resources. So you'll be able to view and watch all the content and then step through it on your own. And so with that, let's get started.

So today, the session we're gonna learn about right now is: what are foundation models? What on earth is a foundation model? Where do they come from? How do they impact generative AI and the end-to-end lifecycle for interacting with, maintaining, updating, and troubleshooting foundation models? And in particular, in
the hands-on walkthrough, we're gonna take a look at
foundation models on AWS and especially on Amazon SageMaker. You'll notice that there
is in fact a llama here on this slide; that's 'cause we're all fond of the Llama models, and you'll see them come up quite a bit. And so, hypothetically
speaking, just for fun, let's say I asked you to learn
everything on the internet, which right away, you know,
is basically impossible. Like, obviously, how could
someone learn everything on the internet? But for the sake of argument,
let's say you tried to do it, so you'd probably look at the structure of the most popular sites on the internet. Maybe you'd map out some
type of decision tree of all of the different
areas and topics and domains. And then you'd try to store your knowledge of these things, right? You might store your notes, or maybe you'd wanna store the files themselves. Obviously they're nicely stored for you already, but in any case, you'd wanna be storing your notes or something, and it's gonna take a long time, right? The largest bottleneck here is really just the human time that it would take to literally read everything online. And so just to put some
numbers here, last I checked, there were just under 6
billion pages on the internet. And actually the average time
that a person spends looking at a website is 52 seconds. So that's pretty short page viewing. But nonetheless, multiply those together (6 billion pages times 52 seconds) and you get right around 5 billion minutes that it would take just to view all of the pages online and skim them. And now realistically, if you
were actually reading this, obviously you'd spend more time on some pages and less time on others. But in any case, if you look
at how many human hours we work in a given year, assuming
I'm working about eight hours a day, maybe I'm working
five days in a week, maybe I'm working 50 weeks in a year, it's gonna take a human
about 255,000 years to read everything online,
which is insane, right? Obviously that's multiple human lifetimes. And then on top of that, the
internet is evolving, right? There's so much information online, there are so many new creators, new content, new things that are popping up constantly. And so that number of
files that you have to read just continues growing. A foundation model can
do this in a few months. So this is why foundation
models are so exciting. This is why foundation
models are so interesting. Through large-scale neural networks, through distributed training, through PyTorch and scripts and whatnot, they're able to quote unquote learn, or read, or understand, or parse massive, massive amounts of information, massive amounts of data. So this is why foundation
models are so powerful. You can do a lot with foundation models. Essentially, a foundation model
is a machine learning model that's designed to cover
many different tasks. In traditional machine learning, you would use, say, a classification model or a regression model to solve just one or two tasks. Foundation models are powerful because, first off, they're trained on so many different sets of data, these massive datasets. And then essentially they pick up naturally occurring tasks from those files online. So they'll naturally learn classification, they'll naturally learn question answering, they'll learn summarization and things like this. And so today you can build applications using foundation models
for almost anything. There's a huge amount of creativity that folks are bringing to designing and developing net new foundation models, to incorporating foundation models into existing applications, and to designing new applications, from NLP to computer vision to code generation, audio generation, video generation, search, summarization, the gamut, right? It's a really exciting space. And interestingly enough, consider the many machine learning tasks where, years ago, we would've looked at this as just a classification task. So you would take a text, for example, where a person says: I'm not that into this house, it's too expensive and it's too far from the train line, right? And we'd put this into a machine learning model, and this model ahead of time was trained to perform binary classification, so it's just labeling that text as good or bad, positive or negative. And so here the model would return a negative sentiment; it's gonna output sentiment: negative, and that's traditional classification. Today, you can recast this
task as a generative task. So you can take this same
text, the same phrase, and then give it an instruction. So in the prompt for your
large language model, you can just add a prompt that says, classify the sentence into
positive or negative sentiment. You can have a more fun way of saying this; you can word the instruction however you want to. You can word it as: is this person happy or sad? Are they likely to buy or not likely to buy? You can use really any type of prompt that you want and that you find works well for your use case. And so in any case, you'll put this whole prompt, as it's called, into the model, and then your agent will respond, classifying it.
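To make that concrete, here's a minimal sketch of sending that kind of classification prompt to a hosted LLM endpoint. The endpoint name is hypothetical, and the payload shape just follows the common "inputs"/"parameters" pattern for SageMaker-hosted text generation containers.

```python
# A sketch of prompt-based sentiment classification against a hosted LLM
# endpoint. The endpoint name is hypothetical; "inputs"/"parameters" follows
# a common payload convention for SageMaker text generation containers.
import json

import boto3

text = ("I'm not that into this house. It's too expensive "
        "and it's too far from the train line.")
prompt = f"Classify the sentence into positive or negative sentiment.\n\n{text}"

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="my-llm-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 10}}),
)
print(json.loads(response["Body"].read()))  # expecting something like "negative"
```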
And so again, whereas previously we used to use these very discriminative, sort of static models that were explicitly trained on a small number of tasks ahead of time, now most people are
using foundation models because they're trained
on these massive data sets and are very powerful. And also because they're so
flexible for different tasks. So you can take the same
model and ask it to summarize this text or ask it to translate this text or ask it to stylize this text. And so it's very computationally
and resource efficient to have everything bundled
in this one foundation model, which you can then use for n
number of downstream use cases. And so there are many ways to
customize a foundation model. Once you pick a foundation
model and start working with it, most people will want to
customize this in a certain way. There are core trade-offs in
customizing foundation models, and we're gonna understand those today. So on this X-axis here, you see
we have complexity and cost. So complexity and cost
are sort of associated with each other in the
sense that when something is a lot harder to do, generally it's gonna
take more time to do it. You're gonna need more experts to do it and you're gonna need
more compute to do that. And so that pushes up your cost. At the same time, in many cases
this also improves accuracy. So we're gonna look at a number of different techniques for customizing, updating, and maintaining foundation models that, while they can be more complex and more expensive, usually will increase accuracy. And so the first one is
called prompt engineering. So let's say you pick
your Llama or your Falcon or your Titan or any
type of foundation model you're trying to work with, you pick this and then
you send it some prompts, just like we saw on the previous slide. You send it some
instructions, some questions, you send it some prompts, and you'll quickly start to realize that when you update your prompts, you get a better response. So when you change and hack the prompt and stylize the prompt, you can eventually develop a prompt template, and that is a way to boost the accuracy and the performance of your model. For most customers and
for most developers, that isn't enough; that's the starting point. The next really common phase to move into is called retrieval augmented generation. We're gonna dive into this
throughout the course, so fear not. But retrieval augmented generation, or RAG, basically refers to a pattern where your user will ask a question, they'll type in some type of query, and then we'll use a model to generate embeddings from that query. And then we're gonna search in the embedding space, actually. So we'll look in a corpus of documents, we'll find the most similar document, retrieve that document, and then generate the response. We can actually stylize the response to the consumer and to the customer based on this document that we have. And so that's retrieval augmented generation; we're gonna have a whole class on it later on, so there are lots of ways you can learn more about this. Retrieval augmented generation is a great next step from prompt engineering.
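Here's a minimal sketch of that retrieve-then-generate flow. The toy bag-of-words embedding is only a stand-in for a real embedding model, so treat this as illustrative rather than production RAG.

```python
# A toy retrieval-augmented generation loop: embed the query, find the most
# similar document, and stuff it into the prompt. The bag-of-words "embedding"
# is a placeholder for a real embedding model.
import numpy as np

corpus = [
    "SageMaker JumpStart hosts open source foundation models like Falcon.",
    "Negative prompts tell Stable Diffusion what to avoid generating.",
    "RLHF trains a reward model from human rankings of model responses.",
]
vocab = sorted({w.lower() for doc in corpus for w in doc.split()})

def embed(text: str) -> np.ndarray:
    # Placeholder: count vocabulary words. Swap in a real embedding model here.
    words = [w.lower() for w in text.split()]
    return np.array([words.count(v) for v in vocab], dtype=float)

def retrieve(query: str) -> str:
    q = embed(query)
    sims = []
    for doc in corpus:
        d = embed(doc)
        sims.append(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-9))
    return corpus[int(np.argmax(sims))]  # most similar document wins

query = "How does RLHF use human feedback?"
context = retrieve(query)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt is what you'd send to the LLM
```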
There is another way, however, to improve the performance of your foundation model specifically for a downstream task. So let's say I'm in healthcare, financial services, or media and entertainment, and I like working with Llama, you know, I enjoy working with Vicuna and Falcon and all of these open-source or third-party LLMs. But what I really wanna do is take my datasets and customize those foundation models so the performance is exactly in line with what my organization wants to see. So again, prompt engineering
is a way to do that. Retrieval augmented generation
can help you do that. Fine tuning is another step; it's another way to take samples from your data, be those large samples in the case of unsupervised pre-training, sort of like continued pre-training or domain adaptation. So you can do that, or you can take just small records and put them in your prompt, which we'll learn about; that's called few-shot prompting. You can also actually update the learnable weights of the parameters, producing a new model that's then fine-tuned on your domain and on your downstream task. Don't worry, we'll learn about all of that throughout the class. And then my personal favorite technique is all the way out here. You'll notice this jump. So there's this sort of
exponential jump here in complexity and cost,
but likewise in accuracy. And so pre-training really refers to creating a new foundation model. It means taking the multiple terabytes of datasets that you're interested in working with, be that in language or in vision or some new niche modality that you're developing right now (which, by the way, let me know if you are, 'cause that's awesome). But in any case, you're creating this new foundation model. You've got your neural network code, your custom datasets, your distributed training environment, and you're hacking on this amazing, brilliant thing. Pre-training is unambiguously the best way to get a much more accurate model; now, that is contingent on you being able to hit performance targets on certain key steps. And we'll learn how to do
that throughout this class. So there's a whole lecture
just on pre-training, and on each of these topics actually, so you can learn how to do
this really, really well. And then we come to find out
that the best generative models are actually built on human feedback. So this is something I
am super passionate about. In a past life, before moving into computer science, I was a creative writer, actually. And so I spent so many
wonderful classes learning about literature and having these
amazing discussions where, you know, 10 people can
read a book, for example, and we all interpret the
book very differently. Or all of us watch a movie and we interpret the movie differently. Like we see different themes in it, we see different characters
that are interesting to us, we respond to it differently. And so generative AI and generative models are really powerful when they
aggregate this human feedback at scale. And so we're actually
gonna have a whole lecture that's just on this. So the technical term here
is reinforcement learning with human feedback
and it works like this. You start with a generative model. So this can be a large language model, this can be a computer vision model, this can be any modality, sort of any generative model. So you'll start with this generative model, and then you'll send in your prompts to this model. Maybe you have 10 prompts, maybe you have a couple thousand prompts. And these are like directly
from your business. These are directly from your application, directly from your domain. They really matter. So they can be about
summarizing the call transcripts in your call center. They can be about answering
questions in customer service. They can be about generating
new ads or new domains. They can be about generating blog posts. They can be literally any type of content that you're trying to create. You can do this. And so you'll catalog a
number of these prompts. You take the prompts, you send the prompts to
this generative model, and you'll find out that the model can give you many different responses, right? So maybe you'll get four or five different responses for each prompt. So you have one prompt, you get like five responses. You're actually gonna send all of those responses to your users or to data labelers, so humans that you'll hire for the task of organizing and ranking these. The humans will pick their favorites, they'll rank these from best to worst, you'll update your training datasets, and then you're actually
gonna train a reward model. So again, whole lecture
just on this topic, but we're gonna learn how
to train a reward model that aggregates this
human feedback at scale. And so this is how generative AI models are able to sort of get around
this really sticky problem of subjective human preferences, like particularly in
literature and vision, right? There are so many ways to
interpret natural language and interpret images. And so using this sort of
aggregated human feedback at scale, we're then able
to use this reward model to update and to improve the
original generative model. And so a couple of the
And so, a couple of the techniques we just looked at: we looked at instruction fine-tuning, where you take key instructions you care about and then do a sort of basic supervised fine-tuning, and then we looked at this reinforcement learning with human feedback lifecycle. Both of these are critical
ways that you can implement and improve your own foundation
models, generally speaking. And so a couple model spotlights for you. So this model spotlight is
obviously stable diffusion. So in this case, we're sending in a prompt
to Stable Diffusion. So we're sending this prompt: landscape of the beautiful city of Paris, rebuilt near the Pacific Ocean in sunny California (because why not?), with great weather, sandy beach, palm trees, architecture, et cetera, et cetera. And then we get this sort
of amazing, amazing output. I did this myself. I went into the SageMaker
console after looking online to find good prompts. So I go look up good
prompts, copy the prompt, paste it into my SageMaker
Foundation model hub, and then I generate this
amazing image, download it, and we're ready to go. You can also, in stable diffusion, you can add negative prompts. So negative prompts are handy because it's a way to
tell the model like, look, I really don't want it to be
blurry or I don't want it to be in a certain category. And so here it's funny
because we said no trees and no green, and yet
clearly we have both trees and green, but this way it sort of
minimizes those things. So if we hadn't included
these negative prompts, most likely there would
be a lot more green. And so this way we were
going for some of that, kind of that sunny motif. And then when you're interacting
with foundation models, you're sending in different hyperparameters. So you can set the dimensions of the output image that you want. Here I'm giving it a much larger width, a width of 720, because I want sort of that landscape view, and then a standard height of 512. If you want a portrait or a square, you can do 512 by 512, and if you want the other orientation, you just rotate it. Great, and then a couple of
other hyperparameters as well. The guidance scale is interesting because this is a way to
sort of tell your model how intensely you want it
to care about the prompt. Like if you want the model
to really just obsess about the prompt and do nothing else, then you set a higher guidance scale. You max out that guidance scale. If you want the model to be
a little bit more creative, to have some sort of you know, liberties in how it interprets the prompt, then you reduce the guidance scale. It's more common actually to
see a very detailed prompt like this with a higher guidance scale. So in this case, it's interesting that I use such a complex prompt with actually a pretty
liberal guidance scale because this is a lower number. So I'm giving the model more
freedom to do what it wants and it still comes back beautiful. The seed is another interesting
hyper parameter you can set, because setting the seed is sort of a way of giving the model like
a completely new modality to explore. So if you set the seed to
really any other number, it will encourage the model to like pick a completely different style
or a completely different mode, different colors, different
shapes, different backgrounds. It will still be following your prompts, particularly depending
on the guidance scale. But when you just change the seed, that's an easy way to just get a different response that you prefer.
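Putting those hyperparameters together, here's a hedged sketch of invoking a hosted Stable Diffusion endpoint with a prompt, a negative prompt, dimensions, a guidance scale, and a seed. The endpoint name is hypothetical, and the exact payload keys vary by model container, so treat this as illustrative.

```python
# A sketch of sending Stable Diffusion hyperparameters to a hosted endpoint.
# Endpoint name is hypothetical; payload keys vary by model container.
import json

import boto3

payload = {
    "prompt": ("landscape of the beautiful city of Paris, rebuilt near the "
               "Pacific Ocean in sunny California, great weather, sandy beach, "
               "palm trees, architecture"),
    "negative_prompt": "trees, green, blurry",  # steer the model away from these
    "width": 720,           # larger width for that landscape view
    "height": 512,          # 512 x 512 would give a square instead
    "guidance_scale": 7.5,  # lower = more creative freedom, higher = obey prompt
    "seed": 42,             # change this to explore a different style or mode
}

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="my-stable-diffusion-endpoint",  # hypothetical name
    ContentType="application/json",
    Body=json.dumps(payload),
)
image_bytes = response["Body"].read()  # decode per your container's output format
```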
And so, let's take a look at language foundation models, because these LLMs are certainly the most popular thing to talk about in technology today. But as we'll come to find out, they're not that new, actually. Foundation models and
language foundation models have been around for a long time. So back in 2017, right
before I joined Amazon, actually the transformer
emerged from the planet whose name I don't remember
to save the humans, but no, I'm kidding. So the transformer is a
machine learning model that is designed to operate
really well on sequences. The core transformer was actually built to handle translation, so it has two parts, an encoder and a decoder. It takes in a string of text and it outputs a string of text, originally, again, done to enhance machine translation. And transformers were interesting because they operated
really, really well at scale. They set a new state of the art for machine translation, but more than anything, they were a net new way of thinking about how to learn sequences, rather than recurrent neural networks, rather than LSTMs, rather than CNNs actually. So transformers became this really interesting way of approaching knowledge using this core self-attention mechanism, which is a lot of matrix multiplication.
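For the curious, here's a minimal sketch of that scaled dot-product self-attention in plain NumPy; the shapes and random weights are illustrative toys, not a full multi-head transformer layer.

```python
# A minimal sketch of scaled dot-product self-attention: the "lots of matrix
# multiplication" at the heart of the transformer.
import numpy as np

def self_attention(X: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how much each token attends to others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                         # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                        # 4 tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```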
And so, in any case, the year after that gave us two models that sort of supercharged NLP. One of them was of course BERT, Bidirectional Encoder Representations from Transformers. BERT models are really useful for classification. BERT models are encoder-only, which means they take a larger input and produce a smaller output. And BERT models are handy again for classification and for smaller tasks. BERT models tend to fit on single GPUs or single accelerators,
and they're quite handy. We also saw GPT-1 in 2018, which back then was sort of
interesting but not a big deal. And so in any case,
there've been many, many, many years of NLP fascination
with language models. Year over year we saw language models scale up in interesting ways, as researchers proposed scaling laws to help us just be bold and throw even more data and even more accelerators at these models, and really produce these amazing results. And so you see that now, in 2023, there are a lot of foundation
models in language, whereas previously there were just a few. And so what this timeline tells
us is that foundation models and large language models
have been around for years. There's a very active,
very interesting, robust, mature research community
that's exploring these and you too can benefit
from it. That's all we're saying here. So one other foundation
model spotlight for you, AI21 is an AWS partner. Their models are available on the SageMaker Foundation model hub. They're also a customer
of AWS, as is Stability, for the record. And so, in any case, this AI21 Jurassic-2 Jumbo model; Jumbo means it's quite large, north of 100 billion parameters. And we'll give it a prompt. And so here the prompt I'm
giving it is: tell me a story about a dog running down the street. And then I click generate text. And this is in the SageMaker
Foundation model hub, by the way, where we
have this nice playground that we'll take a look
at in the hands-on demo. And so we generate this text
and then we get this cute story about a dog named Max. He's a very happy dog, he loved to run. Once he was out for a walk, Mr. Jones, Mr. Jones was holding his leash, Max pulled him down the street, Max was excited, he just wanted to run. Mr. Jones was having a
hard time, et cetera. So it's funny because
this sounds like a story, but we don't really
have a narrative, right? There's not really a conclusion,
there's not a climax. So anyway, there are further
ways to evaluate this, but it's still pretty close
to being a good story. And then on this side, I'm sort of giving the
model a red herring. So I'm seeing how much complexity it can handle. So if you found two shoes, one for the right foot
and one for the left foot, how many shoes would you have? Now obviously we're pretty sure
this is gonna be two shoes, but we just wanna make
sure that the model has this sort of basic common sense. And so if you found two shoes, one for the right foot,
one for the left foot, you would have two shoes, great. So the model is able to sort of read this, somewhat more complex prompt and give us a reasonable answer. So that's good. All right and so the typical
foundation model lifecycle starts with picking a
base foundation model. And again, the second
lecture in our series is gonna be just about how to
pick a good foundation model. So we're gonna dive pretty deep into that. So we'll pick a base foundation model according to the domain,
the modality, the size, the performance, et cetera
of that foundation model. Then we're gonna use prompt
engineering on that model. We'll develop prompt templates, we'll hack the syntax to
get really good performance and then we'll evaluate the
performance with our users. We'll actually send the
responses of that model to our users, see if they like it, see what their responses are, store those human preferences,
fine tune the model. So actually improve those
base trainable weights to make the model even more
performant in our domain. Then we're gonna update that
original foundation model and put it back into our application. And so in the lecture
and in the whole class, we'll learn about each of these
steps in much more detail. And so with that, let's
take a look at the demo. So in this demo we are
going to explore a notebook, as it were. This is gonna be a notebook
of the Falcon model, which is running on SageMaker Jumpstart. And we're gonna interact with this to learn about text generation. So feel free to follow
along with me if you like. The short URL is right
here, bit.ly/sm-nb-1. This is already a public
notebook we're gonna step through or you are welcome to
just scan that QR code and have the notebook sent to you in your manner of choosing. So now that you have the
notebook, let's get to it. All right, so here we are. As you can see, of course, we're in AWS, sitting
here in North Virginia. And this is SageMaker, right? So in SageMaker we have
these foundation models, these are models that
you can interact with to do all sorts of things,
do all sorts of tasks. So some of them are open source models. As you can see we have Falcon
in a variety of options, some using BF16, actually
quite a few Falcon variants using BF16. But in any case, instruct
models and then generic models. By the way, if you're gonna fine tune, feel free to start with models that haven't been instruction fine tuned. If they've already been
instruction fine tuned, then you're not gonna wanna
fine tune that further. But if not, then that's
good for fine tuning. But in any case, what's
handy about the models here and especially the playground,
is that we can poke at them. So let's say we wanna work
with AI21 as an example of a proprietary model that's available in SageMaker Jumpstart. We'll click view model. And then after we've clicked view model, it's gonna take us to
the model details page in just a minute here. Great. All right, so the model details page, and we see this is indeed AI21 Jurassic, and what do you know? There is a playground available. And so the playground is really handy. It's a way that you can
prompt the model directly. And so essentially this means you're not of course hosting the model, you're just sending it the prompt and getting the response back. And so we can choose a few examples, actually we can choose,
let's see, outline creator, and then let's see if we can make this bar a little bit larger. Here we go, great. So we're gonna write sections for a great blog post with the following title: how to start a personal blog. Blog sections. Okay, so clearly we have
a few examples here. So this is your few shot prompting, and then now we're asking
the model to write sections for this new blog. So let's see what we've got. So we're gonna generate the text. All right and here we have new sections. Great. Okay, so clearly it works. We get content out that seems reasonable. And now let's say that
we've interacted with it through the playground, we're ready to move into
the notebook experience. That is over here. So in this view, you can see I'm running
on SageMaker Studio. What is SageMaker Studio, you ask? It's an IDE for machine learning. But beyond that, it actually runs lots of compute. So every notebook that you're running on SageMaker is what we call a kernel gateway application, which means it has the ability to run on a different instance. And so I can change this out. For example, just in your
notebook, you can click here, the stop, right? You can give this a click and then actually select
any of these instances. And now remember, this isn't
changing your entire IDE, like the visual that you see here that is provided by a Jupyter server, which is actually built
and managed by Amazon. And so just to see that
in the console here. So let's say we go out to the AWS console, and let's check out SageMaker studio. So right up here. And then let's say we
want to manage Studio; that's under domains. And I'm using this diffuse domain, and I'm running on this Falcon. And you'll notice that there are a couple different parts here. One part is this Jupyter Server application, again, built and managed by Amazon. And this is running your visual here. So the Jupyter Lab experience
and this whole visual browsing experience that is literally
running on this guy, on this Jupyter server. And so every time we run a notebook, what we're gonna do is
actually connect that into the Jupyter server. So let's say I wanna
create a new notebook. So let's say I create a new notebook and just for fun, maybe I need a GPU, or maybe I wanna run on
a custom accelerator. So I have all of these instances, I can choose from many
different options of M-series, C-series, accelerated
compute, memory optimized, all sorts of things. And so yeah, so what I'm
saying is you can pick from any of these instances
for each notebook, actually lemme just pick one to show you. So for each notebook, again, this is gonna be running
on a different machine. And then over here on this left hand side, as the instances come online, you'll start to see
them actually show here. And so this will give us an
indication of the instances that are available in our IDE, I digress. Let's get back to Falcon. So now that we know where we are, you'll know that I downloaded the notebook from the SageMaker examples. So for those of you who
are following with me, this is the notebook you should see. So SageMaker Jumpstart, text
generation with Falcon models. We're gonna go over here and
let's see how far we can get. So first off, you're gonna be
installing the SageMaker SDK, and then we're gonna point to a model. What's interesting about this notebook is that it actually gives
you a handy dropdown. So you can see there are
many different model IDs. You've got Falcon 40b and then instruct, and then the same for
7b and then instruct. So again, obviously the instruction ones have already been instruction fine tuned, and the base ones have not. And then we have this
cute little dropdown here that lets us pick the model
we'd like to interact with. So I'm gonna interact
with the 7b instruct, and then we can just
confirm that yes, indeed that is the right one. And let's just show you here. So the model ID that I'm
interacting with is this one. Yeah, so we've got the Falcon 7b instruct, and then we're gonna do our SageMaker one-line model.deploy. This is scarily easy because it's already in JumpStart. Because this model is already packaged nicely to be hosted on SageMaker, we can just hit this one-line model.deploy, and then the predictor comes up.
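For reference, the deploy step in the notebook boils down to something like this sketch; the model ID matches the dropdown selection, and defaults are assumed for everything else.

```python
# The one-line JumpStart deploy from the notebook, roughly. The model ID is
# the one selected in the dropdown; instance type and timing will vary.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy()  # provisions an endpoint; expect a multi-minute wait
```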
Now, I've already done this, so I will avoid the wait time here; you can see this took a good 16 minutes to come online. So do be patient if you're
running this at home. And then the notebook
authors were very helpful and indicated different instances that have been tested
with the Falcon model. And so we see 7b is on the
g5 across a couple options. And then the p4d, so
different varieties for you. And then the 40b as well,
some of the larger g5s. And then of course the p4d. Pro tip, make sure you pick
the smallest instance you can. If you're new to AWS, that means going with
the smaller number here. So a smaller T-shirt size
if you will, the small size, that means the instance
is literally smaller, it's gonna have fewer CPUs, it's gonna have fewer accelerators. And everything about
it is gonna be smaller: the amount of bandwidth it sees, and if there's any instance storage, that's gonna be smaller too. And so, as a corollary, when you pick a larger one, like that 48xlarge that's there, there's gonna be more CPU. It also means the pricing is heftier with the larger instances and the pricing is smaller
with the smaller instances. And so you always wanna pick
the smallest instance you can, generally speaking as a
way to keep costs low. And that's what we're gonna do here. And so they have a couple notes for you about changing the number of
GPUs, which is very handy. And then here, actually, I like this. So if you are using a larger instance, which sometimes you wanna do because maybe you're maxing out throughput or you're testing different hyperparameters that actually need more infrastructure, you'll need a larger instance. If you're gonna do that, just make sure you set this parameter: model.environment, and then just increase the number of GPUs right there.
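As a rough sketch, that looks something like the following; SM_NUM_GPUS is the knob used by the Hugging Face LLM (TGI) serving container, and the instance type and GPU count here are illustrative.

```python
# Sketch: when deploying to a bigger multi-GPU instance, tell the serving
# container how many GPUs to shard across. Values here are illustrative.
model.env["SM_NUM_GPUS"] = "4"  # match the GPU count of the chosen instance
predictor = model.deploy(instance_type="ml.g5.12xlarge")  # a 4-GPU instance
```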
All right, and so now, theoretically, this model should be deployed. And actually, we don't have to be theoretical about this. We can just check. So I'm gonna go up to
this little home folder and let's go down to deployments. Let's see what endpoints we have. I'm gonna close that out. And lo and behold, we do
indeed have endpoints. This is great. So this is the SageMaker
example, Hugging Face. And then, what do you know, the Hugging Face LLM Falcon 7b instruct. Handy. Let's do it. One, two, three, take a breath. Let's roll. Great. Okay, so here's our prompt. This is the prompt we send to the model: tell me about Amazon SageMaker. You'll see we're putting
that in this payload object. So our input is indeed this text string, and then the parameters describe how the model should interact with this prompt. Those of you who are more familiar with, say, playground experiences, you might not be comfortable with these parameters, and that's okay. Don't stress out about it. But you data scientists out there, obviously you wanna consider these in more detail; using the default values is always a good choice for starting.
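Here's what that payload shape looks like as a sketch; the particular parameter values are illustrative, and the notebook's defaults are a fine starting point.

```python
# The payload shape from the notebook: prompt under "inputs", generation
# settings under "parameters". These particular values are illustrative.
payload = {
    "inputs": "Tell me about Amazon SageMaker.",
    "parameters": {
        "max_new_tokens": 200,  # cap on generated length
        "temperature": 0.7,     # higher means more random sampling
        "do_sample": True,
    },
}
response = predictor.predict(payload)
print(response)
```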
So we sent in our prompt, tell me about Amazon SageMaker, and we get back a paragraph from the model. This response is coming directly from the predictor, straight
out of the Falcon 7b model. I did not write this, nor did I put this in the model myself. This came out from the model; let's just read it here: Amazon SageMaker lets developers create, train, and deploy machine learning models without worrying about the infrastructure. Hey, looks good to me. I would check the box on that. Great. Okay, and then the notebook
gives us some more information about the Falcon model built by TII, and is currently the best
open source model available via the open LLM
leaderboard, which is great. And then you'll see a
couple more functions here. Let's step through this. So we have this nice query_endpoint function, which basically is a lightweight wrapper around predictor.predict that gives us the inputs and responses nicely.
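A sketch of what such a wrapper could look like, assuming the response structure of the Hugging Face LLM container:

```python
# A sketch of the notebook's lightweight wrapper around predictor.predict,
# assuming the response structure of the Hugging Face LLM container.
def query_endpoint(text: str, **params) -> str:
    response = predictor.predict({"inputs": text, "parameters": params})
    generated = response[0]["generated_text"]  # assumed response shape
    print(f"Input: {text}\nOutput: {generated}")
    return generated

query_endpoint("Write a program to compute the factorial in Python.",
               max_new_tokens=200)
```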
So now we'll query the endpoint, and this time we're gonna ask it to write a program to compute the factorial in Python. Let's see if it does this. All right, here is a Python program to compute the factorial. I didn't remember the factorial equation offhand, but yes, it does have to do with n-1: the factorial of n is n times the factorial of n minus 1, which is why the function is recursive, calling itself. Okay, great. Onward ho.
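For reference, here's the canonical recursive factorial the model was being asked for, so you can check it at home:

```python
# The canonical recursive factorial, for checking the model's answer:
# n! = n * (n-1) * ... * 1, with the base case 0! = 1! = 1.
def factorial(n: int) -> int:
    if n <= 1:
        return 1                   # base case
    return n * factorial(n - 1)    # recursive step: n! = n * (n-1)!

print(factorial(5))  # 120
```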
Next, we're gonna ask it to build a website. Hmm, it's funny, 'cause in my mind, to me this means, hey, give me the code to do this, but okay, here obviously they're saying in 10 simple steps. So what we want are the
10 steps that you need to create a website. So: choose your domain name, register it, pick a web hosting provider, which frankly, arguably you would choose
that before the domain because you can't really
register the domain before you have the web hosting
provider, but that's fine. And then you create your
website design, add content. Yeah, looks pretty good. I mean, notably they don't
make any technical suggestions about how you would do
any of these things, but at least it's a good
list of 10 things to do. And then translation. So translate English to
French: sea otter is to... I will not butcher this for you, but I'm gonna take their word for it that this is indeed the French word. Same for peppermint; I won't say that one for you either. And then cheese, here we go. Okay, so let's run this. Ah yes, fromage, of course. And then we'll do some sentiment analysis. Great, so in this case, here
we're doing a little bit of few shot prompting actually, because we wanna tell the model
to provide this sentiment, be it negative or positive, and then we give it this last tweet. New music video is incredible
and the sentiment comes back and it's obviously positive. Couple more examples here. When was the C programming
language invented? Okay, this one we should see. So folks, it's always good
to check these things. So let's see if we can check this. When was the C programming
language invented? Okay: the most creative period occurring during 1972, at Bell Labs. All right, looks pretty accurate to me; C was indeed developed at Bell Labs in the early 1970s. And then the recipe for a
delicious lemon cheesecake: graham crackers, butter, cream cheese, and then the instructions. Okay, great. Into the springform pan. Then beat the cream
cheese and sugar together. Add these things, bake,
cool for 10 minutes and then sprinkle on top. I mean, that sounds pretty fair. We just generated a recipe for cheesecake. And then the last one here
is gonna be summarization. So the summarization example, they are providing this extensive input. So the input is this, right? It's basically three paragraphs of content describing the Falcon model. And then we get information
about all the places Falcon is available, the
different use cases it solves. And then this all comes back, right? So here is the input because of that little wrapper function. So here's the input and then
you see it has the instruction right down here. So summarize the article above, and then the output is right here. TII made state of the Art Falcon model, available on SageMaker Jumpstart, pre-trained models, et cetera. Great. Okay, so accurate summarization. And then a handy guide for
some of the parameters as well. And then a couple limits on
inputs, medium and large, which is specifically the number
of input and output tokens. And if you're new to NLP, remember, a token is a part of a word; basically, tokens are how we decompose language to feed it to machines. And then there's a little bit more: this is generating a few tokens at a time. Okay, so this is working iteratively, walking through this range and then feeding the pieces to the model sequentially. And then at the end we'll do a cleanup. Alright. Yeah, great. We've got the list. Multiple iterations, this
is fun, iteration three. Hmm. So basically what this is saying is that when the document you wanna send to a foundation model is too long to fit in the token limit, you just send it through piece by piece. So you can loop through the document, or loop through your range, send parts of the document up, and it will generate responses for the different pieces.
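Here's a minimal sketch of that chunking idea; the word-based split is a rough stand-in for real token counting:

```python
# Sketch of the chunking idea: split a too-long document into pieces that fit
# the token limit and query the endpoint once per piece. Splitting on words
# is a rough stand-in for real token counting.
def chunk_text(document: str, max_words: int = 200):
    words = document.split()
    for start in range(0, len(words), max_words):
        yield " ".join(words[start:start + max_words])

long_document = "..."  # imagine several pages of text here
for i, piece in enumerate(chunk_text(long_document)):
    response = predictor.predict({"inputs": f"Summarize this: {piece}"})
    print(f"--- piece {i} ---", response)
```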
And so here we went through 10 iterations listing a variety of services. It's actually not uncommon to see this kind of degradation; I feel like there's probably some looping going on in the model here, and not in a good way. But in any case, that was an
example of using tokenization in order to perform text generation with Falcon on SageMaker Jumpstart. And so that is the end
of this first video. I hope you enjoyed it. And in the next one we are going to learn about how to pick the
right foundation model. So I'll see you there.