Generative AI Foundations on AWS | Part 1: Introduction to foundation models

Video Statistics and Information

Captions
- Hi, welcome to Generative AI Foundations on AWS. My name is Emily Webber, and I'm a principal machine learning specialist solutions architect at AWS. Today you're going to learn about generative AI. You've heard about generative AI, and you've heard all sorts of things; the purpose of this class is to dive super deep. We have no less than seven topics you're going to learn about, broken up into different classes. Each class is about 45 minutes of slides, so you'll get to learn lots of concepts, dive really deep, and explore complex topics that interest you, followed by a hands-on demo. Each session runs roughly 60 minutes on YouTube, so you can fast forward, slow down, pause, and take screenshots, and you'll get all the resources, so you can watch the content and then step through it on your own.

With that, let's get started. The session we're covering right now asks: what are foundation models? Where do they come from? How do they impact generative AI, and what does the end-to-end lifecycle look like for interacting with, maintaining, updating, and troubleshooting foundation models? In the hands-on walkthrough, we'll look at foundation models on AWS, and especially on Amazon SageMaker. You'll notice that there is in fact a llama on this slide; that's because we're all fond of the Llama models, and you'll see them come up quite a bit.

Hypothetically speaking, just for fun, let's say I asked you to learn everything on the internet, which right away is basically impossible; how could anyone learn everything on the internet? But for the sake of argument, let's say you tried. You'd probably look at the structure of the most popular sites on the internet, map out some type of decision tree of all the different areas, topics, and domains, and then try to store your knowledge: your notes, your files. The files are nicely stored for you already, but in any case you'd be keeping notes of some kind, and it's going to take a long time. The largest bottleneck is really just the human time it would take to literally read everything online. To put some numbers on it: last I checked, there were just under 6 billion pages on the internet, and the average time a person spends looking at a website is 52 seconds, so that's pretty short page viewing. Multiply those together and you get right around 5 billion hours just to view and skim all of the pages online. Realistically, you'd spend more time on some pages and less on others, but if you look at how many hours a human works in a given year, assuming about eight hours a day, five days a week, 50 weeks a year, it's going to take a human about 255,000 years to read everything online, which is insane; that's many human lifetimes. And on top of that, the internet keeps evolving.
There's so much information online, so many new creators and so much new content popping up constantly, that the number of files you'd have to read just keeps growing. A foundation model can do this in a few months. This is why foundation models are so exciting and so interesting: through large-scale neural networks, distributed training, PyTorch scripts, and so on, they're able to, quote unquote, learn, read, understand, or parse massive amounts of information and data.

Essentially, a foundation model is a machine learning model designed to cover many different tasks. In traditional machine learning, you would use, say, a classification model or a regression model to solve just one or two tasks. Foundation models are powerful because, first, they're trained on massive and diverse data sets, and second, they pick up naturally occurring learning styles from those files online: they'll naturally learn classification, question answering, summarization, and things like this. Today you can build applications using foundation models for almost anything. There's a huge amount of creativity going into designing and developing net-new foundation models, incorporating foundation models into existing applications, and designing new applications, from NLP to computer vision, code generation, audio generation, video generation, search, summarization, the whole gamut. It's a really exciting space.

Interestingly enough, many machine learning tasks that years ago we would have treated as plain classification can be recast. Take a text, for example; this person says, "I'm not that into this house. It's too expensive and it's too far from the train line." Previously, we'd put this into a machine learning model that had been trained ahead of time to perform binary classification, labeling the text as positive or negative; here the model would return a negative sentiment. That's traditional classification. Today, you can recast this as a generative task. You take the same text, the same phrase, and give your large language model an instruction in the prompt, something like "Classify the sentence into positive or negative sentiment." You can word the instruction however you want: is this person happy or sad? Are they likely to buy or not? Use whatever prompt you find works well for your use case. You put the whole prompt, as it's called, into the model, and the model responds with the classification. So whereas previously we used very discriminative, static models explicitly trained on a small number of tasks ahead of time, now most people are using foundation models, because they're trained on massive data sets, they're very powerful, and they're so flexible across different tasks.
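As a concrete illustration of that recast, here is a minimal sketch using the open-source Hugging Face transformers library with the small instruction-tuned google/flan-t5-base model; the transcript doesn't name a specific model for this example, so both choices are assumptions:

# pip install transformers torch
from transformers import pipeline

# Any instruction-tuned text-to-text model will do; flan-t5-base is a small example.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

text = ("I'm not that into this house. It's too expensive "
        "and it's too far from the train line.")

# The instruction in the prompt recasts classification as a generative task.
prompt = f"Classify the sentence into positive or negative sentiment: {text}"

result = generator(prompt, max_new_tokens=5)
print(result[0]["generated_text"])  # expected output: "negative"

The same model can then be reused for summarization or translation simply by changing the instruction, which is exactly the flexibility described above.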
So you can take the same model and ask it to summarize a text, translate it, or restyle it. It's very computationally and resource efficient to have everything bundled in one foundation model, which you can then use for any number of downstream use cases.

There are many ways to customize a foundation model. Once you pick a foundation model and start working with it, most people will want to customize it in some way, and there are core trade-offs in doing so, which we're going to understand today. On the X-axis here we have complexity and cost. They're associated with each other in the sense that when something is harder to do, it generally takes more time, more experts, and more compute, which pushes up your cost. At the same time, in many cases it also improves accuracy. So we'll look at a number of customization techniques for updating and maintaining foundation models that, while more complex and more expensive, usually increase accuracy.

The first is called prompt engineering. Say you pick your Llama, your Falcon, your Titan, or whatever foundation model you're working with, and you send it some prompts, instructions, and questions, just like we saw on the previous slide. You'll quickly realize that when you update your prompts, you get better responses. By iterating on and refining the prompt, you can eventually develop a prompt template, which is a way to boost the accuracy and performance of your model. For most customers and developers that isn't enough; it's the starting point.

The next really common phase is retrieval augmented generation, which we'll dive into throughout the course, so fear not. Retrieval augmented generation, or RAG, refers to a pattern where your user asks a question, typing in some type of query; we use an LLM to generate embeddings from that query; and then we search in the embedding space. We look through a corpus of documents, find the most similar document, retrieve it, and then generate the response, styled for the consumer and the customer based on the document we retrieved. That's retrieval augmented generation; we'll have a whole class on it later, so there are lots of ways to learn more. RAG is a great next step after prompt engineering.

There is another way, however, to improve the performance of your foundation model for a specific downstream task. Say I'm in healthcare, financial services, or media and entertainment, and I enjoy working with Llama, Vicuna, Falcon, and all of these open-source or third-party LLMs, but what I really want is to take my data sets and customize those foundation models so the performance is exactly in line with what my organization wants to see. Again, prompt engineering is one way to do that.
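Here is a minimal sketch of that RAG retrieval step, assuming the open-source sentence-transformers library for embeddings and a tiny in-memory corpus; the course doesn't prescribe a particular embedding model or vector store, so these are illustrative choices:

# pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping to Europe typically takes 5 to 7 business days.",
    "Premium support is available 24/7 via chat and phone.",
]
doc_embeddings = embedder.encode(documents, normalize_embeddings=True)

query = "How long do I have to return an item?"
query_embedding = embedder.encode([query], normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_embeddings @ query_embedding.T
best_doc = documents[int(np.argmax(scores))]

# Stuff the retrieved document into the prompt so the LLM grounds its answer.
prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {query}"
print(prompt)

In a production system the corpus would live in a vector database and the final prompt would go to the LLM, but the embed, search, retrieve, generate shape is the same.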
Retrieval augmented generation can help you do that. Fine tuning is another step: another way to take samples from your data, whether large samples in the case of unsupervised pre-training, a sort of continued pre-training or domain adaptation, or just small records that you put directly in your prompt, which is called few-shot prompting and which we'll learn about. You can also update the learnable weights of the model, producing a new model that's fine tuned on your domain and your downstream task. Don't worry, we'll learn about all of that throughout the class.

And then my personal favorite technique is all the way out here. You'll notice this jump, a sort of exponential jump in complexity and cost, but likewise in accuracy: pre-training. Pre-training really refers to creating a new foundation model. It means taking the multiple terabytes of data sets you're interested in working with, be that in language, in vision, or in some new niche modality you're developing right now (by the way, let me know if you are, because that's awesome), and creating a new foundation model: your neural network code, your custom data sets, your distributed training environment, all coming together into this amazing, brilliant thing. Pre-training is unambiguously the best way to get a much more accurate model, though that's contingent on meeting performance on certain key steps, and we'll learn how to do that throughout this class. There's a whole lecture just on pre-training, on each of these topics actually, so you can learn how to do this really, really well.

And then we come to find out that the best generative models are actually built on human feedback. This is something I'm super passionate about. In a past life, before moving into computer science, I was a creative writer, and I spent so many wonderful classes learning about literature and having these amazing discussions where ten people can read the same book and interpret it very differently, or all watch the same movie and interpret it differently: we see different themes in it, different characters that interest us, and we respond to it differently. Generative AI and generative models are really powerful when they aggregate this human feedback at scale, and we'll have a whole lecture just on this. The technical term is reinforcement learning with human feedback, and it works like this. You start with a generative model: a large language model, a computer vision model, really any generative model. Then you send in your prompts, maybe ten of them, maybe a couple thousand, taken directly from your business, your application, your domain, so they really matter. They can be about summarizing call transcripts in your call center, answering customer service questions, generating new ads, generating blog posts, literally any type of content you're trying to create. You'll catalog a number of these prompts.
You take the prompts, send them to this generative model, and you'll find the model can give you many different responses: maybe four or five different responses for each prompt. You then send all of those responses to your users, or to data labelers, humans you hire to organize and rank them. The humans pick their favorites, ranking the responses from best to worst; you update your training data sets; and then you train a reward model. Again, there's a whole lecture just on this topic, but we'll learn how to train a reward model that aggregates this human feedback at scale. This is how generative AI models get around the really sticky problem of subjective human preferences, particularly in literature and vision, where there are so many ways to interpret natural language and images. Using this aggregated human feedback at scale, we can then use the reward model to update and improve the original generative model. So we've just looked at a couple of techniques: instruction fine tuning, where you take key instructions you care about and do a basic supervised fine tuning, and the reinforcement learning with human feedback lifecycle. Both are critical ways to implement and improve your own foundation models, generally speaking.

Now for a couple of model spotlights. The first is, obviously, Stable Diffusion. In this case we're sending Stable Diffusion a prompt: "landscape of the beautiful city of Paris, rebuilt near the Pacific Ocean in sunny California" (because why not), "with great weather, sandy beach, palm trees, architecture," et cetera, and we get this amazing output. I did this myself: I went into the SageMaker console after looking online for good prompts, copied one, pasted it into my SageMaker foundation model hub, generated this amazing image, and downloaded it.

In Stable Diffusion you can also add negative prompts. Negative prompts are handy because they're a way to tell the model, look, I really don't want the image to be blurry, or I don't want it to fall into a certain category. It's funny here because we said no trees and no green, and yet we clearly have both, but the negative prompt minimizes those things; if we hadn't included it, there would most likely be a lot more green. We were going for that sunny motif.

When you're interacting with foundation models, you're also sending in different hyperparameters. You can set the dimensions of the output image: here I'm giving it a larger width of 720 because I want that landscape view, with a standard height of 512. If you want a portrait or a square, you can do 512 by 512, and for the other direction you just rotate the dimensions. There are a couple of other hyperparameters as well. The guidance scale is interesting because it's a way to tell your model how intensely you want it to care about the prompt.
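Here is a minimal sketch of those settings, assuming the open-source diffusers library as a stand-in for the SageMaker playground used in the demo; the model ID and prompt wording are illustrative:

# pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = ("landscape of the beautiful city of Paris, rebuilt near the Pacific "
          "Ocean in sunny California, with great weather, sandy beach, palm trees")
negative_prompt = "trees, green, blurry"  # qualities to steer away from

# A fixed seed makes the output reproducible; a different seed explores a new mode.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=720,           # wider than tall, for the landscape view
    height=512,
    guidance_scale=7.5,  # higher values follow the prompt more strictly
    generator=generator,
).images[0]
image.save("paris-by-the-pacific.png")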
If you want the model to really just obsess over the prompt and do nothing else, you set a higher guidance scale; you max it out. If you want the model to be a little more creative, to take some liberties in how it interprets the prompt, you reduce the guidance scale. It's actually more common to see a very detailed prompt like this paired with a higher guidance scale, so it's interesting that I used such a complex prompt with a fairly liberal, lower guidance scale; I'm giving the model more freedom to do what it wants, and it still comes back beautiful. The seed is another interesting hyperparameter you can set, because the seed is a way of giving the model a completely new mode to explore. If you set the seed to essentially any other number, it encourages the model to pick a completely different style or mode: different colors, different shapes, different backgrounds. It will still follow your prompt, particularly depending on the guidance scale, but changing the seed is an easy way to get a different response that you may prefer.

Now let's take a look at language foundation models, because these LLMs are certainly the most popular thing to talk about in technology today. But as we'll come to find out, they're not that new; foundation models for language have been around for a long time. Back in 2017, right before I joined Amazon, the transformer emerged (from the planet whose name I don't remember, to save the humans; no, I'm kidding). The transformer is a machine learning model designed to operate really well on sequences. The core transformer was actually built to handle translation, so it has two parts, an encoder and a decoder: it takes in a string of text and outputs a string of text, originally to enhance machine translation. Transformers were interesting because they operated really well at scale. They set a new state of the art for machine translation, but more than anything, they were a net-new way of thinking about how to learn sequences: rather than recurrent neural networks, rather than LSTMs, rather than CNNs even, transformers became this really interesting way of approaching knowledge through the core self-attention mechanism, which is a lot of matrix multiplication.

The year after that gave us two models that supercharged NLP. One of them was of course BERT, Bidirectional Encoder Representations from Transformers. BERT models are really useful for classification. They're encoder only, which means they take a larger input and produce a smaller output, and they're handy for classification and smaller tasks. BERT models tend to fit on single GPUs or single accelerators, and they're quite handy. We also saw GPT-1 in 2018, which back then was interesting but not a big deal. There have since been many, many years of NLP fascination with language models. Year over year we saw language models scale, with researchers proposing scaling laws to help us be bold and throw even more data and even more accelerators at these models, producing amazing results.
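For reference, the self-attention mechanism mentioned above is, in the notation of the 2017 "Attention Is All You Need" paper:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V

Here Q, K, and V are learned query, key, and value projections of the input sequence, and d_k is the key dimension; the QK^T term is the large matrix multiplication referred to above.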
So you see that now, in 2023, there are a lot of language foundation models, whereas previously there were just a few. What this timeline tells us is that foundation models and large language models have been around for years. There's a very active, very interesting, robust, mature research community exploring them, and you too can benefit from that; that's all we're saying here.

One other foundation model spotlight for you: AI21 is an AWS partner. Their models are available in the SageMaker foundation model hub, and they're also a customer of AWS, as is Stability, for the record. This AI21 Jurassic-2 model is the Jumbo variant, which means it's quite large: north of 100 billion parameters. We give it a prompt; here the prompt is "tell me a story about a dog running down the street," and then I click generate text. This is in the SageMaker foundation model hub, by the way, which has a nice playground that we'll look at in the hands-on demo. We generate the text and get this cute story about a dog named Max: he's a very happy dog, he loved to run; once he was out for a walk and Mr. Jones was holding his leash, Max pulled him down the street; Max was excited, he just wanted to run; Mr. Jones was having a hard time, et cetera. It's funny, because this sounds like a story, but we don't really have a narrative; there's no real conclusion and no climax. There are further ways to evaluate this, but it's still pretty close to being a good story. On the other side, I'm giving the model a red herring to see how complex it can go: "If you found two shoes, one for the right foot and one for the left foot, how many shoes would you have?" Obviously we're pretty sure the answer is two shoes; we just want to make sure the model has basic common sense. And indeed: if you found two shoes, one for the right foot and one for the left foot, you would have two shoes. The model is able to read this somewhat more complex prompt and give us a reasonable answer, so that's good.

All right, so the typical foundation model lifecycle starts with picking a base foundation model, and the second lecture in our series is all about how to pick a good one, so we'll dive pretty deep into that. We pick a base foundation model according to its domain, modality, size, performance, and so on. Then we use prompt engineering on that model: we develop prompt templates and refine the syntax to get really good performance. Then we evaluate the performance with our users: we send the model's responses to our users, see if they like them, and store those human preferences. Then we fine tune the model, actually improving the base trainable weights to make it even more performant in our domain, update the original foundation model, and put it back into our application. Throughout the lecture and the whole class, we'll learn about each of these steps in much more detail.

And with that, let's take a look at the demo. In this demo we're going to explore a notebook for the Falcon model, which runs on SageMaker JumpStart, and interact with it to learn about text generation.
Feel free to follow along with me if you like. The short URL is right here, bit.ly/sm-nb-1; it's a public notebook we're going to step through, or you're welcome to scan the QR code and have the notebook sent to you in your manner of choosing. Now that you have the notebook, let's get to it.

All right, so here we are. As you can see, we're in AWS, sitting in N. Virginia, and this is SageMaker. In SageMaker we have these foundation models, models you can interact with to do all sorts of tasks. Some of them are open-source models; as you can see, we have Falcon in a variety of options, quite a few using BF16, with both instruct models and generic models. By the way, if you're going to fine tune, feel free to start with models that haven't been instruction fine tuned. If they've already been instruction fine tuned, you generally won't want to fine tune them further; if not, they're good candidates for fine tuning. What's handy about the models here, and especially the playground, is that we can poke at them.

Let's say we want to work with AI21 as an example of a proprietary model available in SageMaker JumpStart. We click "view model," and it takes us to the model details page. Great: this is indeed AI21 Jurassic, and what do you know, there's a playground available. The playground is really handy; it's a way to prompt the model directly. Essentially, you're not hosting the model yourself; you're just sending it the prompt and getting the response back. We can choose from a few examples; let's choose the outline creator and make this bar a little larger. Here we go. We're going to write sections for a great blog post with the following title: "How to start a personal blog." We clearly have a few examples here already, so this is your few-shot prompting, and now we're asking the model to write sections for this new blog. Let's see what we get. We generate the text, and here we have new sections. Great. Clearly it works; we get content out that seems reasonable.

Now that we've interacted with the model through the playground, we're ready to move into the notebook experience, over here. In this view, you can see I'm running on SageMaker Studio. What is SageMaker Studio, you ask? It's an IDE for machine learning, but beyond that, it actually runs lots of compute. Every notebook you're running on SageMaker is what we call a kernel gateway application, which means it has the ability to run on a different instance. I can change this out: right in your notebook, you can click the instance selector and then pick any of these instances. And remember, this isn't changing your entire IDE; the visual you see here is provided by a Jupyter server, which is built and managed by Amazon.

Just to see that in the console, let's go out to the AWS console and check out SageMaker Studio, right up here, and then manage Studio, which is under domains.
I'm using this "diffuse" domain and running this Falcon profile, and you'll notice there are a couple of different parts here. One part is the Jupyter server application, again built and managed by Amazon; this is running the visual here, the JupyterLab experience and the whole visual browsing experience, which is literally running on this Jupyter server. Every time we run a notebook, it connects into that Jupyter server. Let's say I want to create a new notebook, and just for fun, maybe I need a GPU, or maybe I want to run on a custom accelerator. I can choose from many different instance options: M-series, C-series, accelerated compute, memory optimized, all sorts of things. So you can pick from any of these instances for each notebook; let me just pick one to show you. Again, each notebook will be running on a different machine, and over here on the left-hand side, as the instances come online, you'll start to see them show up, which gives us an indication of the instances available in our IDE. I digress; let's get back to Falcon.

Now that we know where we are, you'll see I downloaded the notebook from the SageMaker examples. For those of you following along, this is the notebook you should see: SageMaker JumpStart, text generation with Falcon models. Let's go over here and see how far we can get. First, you're going to install the SageMaker SDK, and then we point to a model. What's interesting about this notebook is that it gives you a handy dropdown with many different model IDs: Falcon 40B and its instruct variant, and the same for 7B. Again, the instruct ones have already been instruction fine tuned and the base ones have not. This cute little dropdown lets us pick the model we'd like to interact with; I'm going to interact with the 7B instruct, and we can confirm that yes, indeed, that is the right one: the model ID I'm working with is the Falcon 7B instruct.

Then we do our SageMaker one-line model.deploy. This is scarily easy, because the model is already packaged in JumpStart to be hosted on SageMaker, so we can just call this one line and the predictor comes up. Now, I've already done this, so I'll skip the wait time; it took a good 16 minutes for the endpoint to come online, so do be patient if you're running this at home. The notebook authors were very helpful and indicated the instances that have been tested with the Falcon model: the 7B runs on the g5 across a couple of options and on the p4d, and the 40B runs on some of the larger g5s and, of course, the p4d. Pro tip: make sure you pick the smallest instance you can. If you're new to AWS, that means going with the smaller number here, a smaller T-shirt size if you will. A smaller instance literally has fewer CPUs, fewer accelerators, less bandwidth, and less instance storage, if there is any.
As a corollary, when you pick a larger one, like the 48xlarge that's there, there's more CPU, and the pricing is heftier on the larger instances and lower on the smaller ones. So, generally speaking, you always want to pick the smallest instance you can as a way to keep costs low, and that's what we're going to do here. The notebook has a couple of notes for you about changing the number of GPUs, which is very handy. And I like this part: if you're using a larger instance, which you sometimes want because you're maxing out throughput or testing hyperparameters that actually need more infrastructure, just make sure you set this parameter on the model's environment and increase the number of GPUs there.

All right, so now, theoretically, this model should be deployed; and actually, we don't have to be theoretical about it, we can just check. I'll go up to this little home folder, down to deployments, and see what endpoints we have. And lo and behold, we do indeed have endpoints: the SageMaker example Hugging Face LLM Falcon 7B instruct. Handy. Let's do it: one, two, three, take a breath, let's roll.

Okay, so here's our prompt; this is what we send to the model: "Tell me about Amazon SageMaker." You'll see we're putting that in this payload object, so our inputs are this text string, along with the parameters for how the model should handle the prompt. Those of you who are more familiar with, say, playground experiences might not be comfortable with these parameters, and that's okay; don't stress about it. You data scientists out there will obviously want to consider them in more detail, but using the default values is always a good starting choice. We send in our prompt, "Tell me about Amazon SageMaker," and we get back a paragraph from the model. This is coming directly from the predictor, straight out of the Falcon 7B model; I did not write this, nor did I put it into the model myself. Let's read it: Amazon SageMaker lets developers create, train, and deploy machine learning models without worrying about the infrastructure. Looks good to me; I would check the box on that.

The notebook then gives us some more information about the Falcon model, built by TII and currently the best open-source model available via the Open LLM Leaderboard, which is great. Then you'll see a couple more functions; let's step through them. We have this nice query_endpoint function, which is basically a lightweight wrapper around predictor.predict that formats the inputs and responses nicely. So now we'll query the endpoint, and this time we'll ask it to write a program to compute the factorial in Python. Let's see if it does. All right, here is a Python program to compute the factorial. I'm pretty sure this is the factorial; I actually don't remember the factorial equation offhand. If you're interested in looking it up at home and letting us know whether it's accurate, please do. I'm pretty sure it does have to do with n-1, and yes, it's a recursive function, because it calls itself. Okay, great. Onward ho.
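Putting the deployment and invocation steps together, here is a minimal sketch assuming the SageMaker Python SDK's JumpStartModel class; the generation parameter values are illustrative defaults rather than anything prescribed by the notebook:

# pip install sagemaker
from sagemaker.jumpstart.model import JumpStartModel

# One-line deploy: JumpStart already packages the container and weights.
# In SageMaker Studio, the execution role is picked up automatically.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy()  # expect a lengthy wait while the endpoint spins up

# The payload carries the prompt plus the generation parameters.
payload = {
    "inputs": "Tell me about Amazon SageMaker.",
    "parameters": {
        "max_new_tokens": 100,
        "temperature": 0.7,
        "do_sample": True,
    },
}
response = predictor.predict(payload)
print(response)  # this container returns the text under "generated_text"

# Clean up when finished so you stop paying for the endpoint.
predictor.delete_model()
predictor.delete_endpoint()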
Next we ask it to build a website. It's funny, because in my mind this means "give me the code to do this," but okay, here they're clearly asking for it in ten simple steps, so what we want are the ten steps you need to create a website. Choose your domain name, register with a web hosting provider (which, frankly, you'd arguably choose before the domain, since you can't really register the domain before you have the web hosting provider, but that's fine), then create your website design, add content, and so on. Looks pretty good. Notably, it doesn't make any technical suggestions about how you would do any of these things, but at least it's a good list of ten things to do.

Then translation: translate English to French. "Sea otter" is, well, I won't butcher the pronunciation for you, but I'll take the model's word that it's correct, and the same for "peppermint." And then "cheese"; let's run this. Ah yes, "fromage," of course.

Then we do some sentiment analysis. In this case we're doing a little few-shot prompting, actually, because we want to tell the model to provide the sentiment, be it negative or positive, through examples, and then we give it this last tweet: "new music video is incredible." The sentiment comes back, and it's obviously positive.

A couple more examples. "When was the C programming language invented?" This one we should verify; folks, it's always good to check these things. The answer points to the most creative period occurring during 1972, at Bell Labs. Looks pretty accurate to me. Then a recipe for a delicious lemon cheesecake: graham crackers, butter, cream cheese, and then the instructions. Press the crust into the springform pan, beat the cream cheese and sugar together, add the remaining ingredients, bake, cool for 10 minutes, then sprinkle the topping. That sounds pretty fair; we just generated a recipe for cheesecake.

The last example is summarization. Here they provide an extensive input: basically three paragraphs of content describing the Falcon model, all the places Falcon is available, and the different use cases it solves. Because of that little wrapper function, you can see the input, then the instruction right below it, "summarize the article above," and then the output: TII made the state-of-the-art Falcon model available on SageMaker JumpStart as pre-trained models, et cetera. Great; an accurate summarization.

Then there's a handy guide to some of the parameters, plus a couple of limits on inputs for the medium and large variants, specifically the number of input and output tokens. If you're new to NLP, remember that a token is a part of a word; tokens are how we decompose language to feed it to machines. And then there's a bit more here about generating a few tokens at a time: iteratively walking through a range and feeding chunks to the model sequentially, with a cleanup at the end. We've got the list going through multiple iterations; this is fun, iteration three.
Basically, what this is saying is that when the document you want to send to a foundation model is too long to fit in the token length, you send it through piece by piece: you loop through the document, or loop through your range, send parts of the document up, and the model generates responses to the different pieces (a sketch of this pattern follows below). Here we went through ten iterations listing a variety of services. It's actually not uncommon to see this kind of degradation; I suspect there's some looping going on in the model here, and not in a good way. But in any case, that was an example of using tokenization to perform text generation with Falcon on SageMaker JumpStart.

And so that's the end of this first video. I hope you enjoyed it. In the next one, we're going to learn how to pick the right foundation model. I'll see you there.
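For reference, here is a minimal sketch of that piece-by-piece pattern, reusing the predictor from the deployment sketch above; splitting on characters is an illustrative stand-in for counting tokens with the model's real tokenizer:

def summarize_long_document(predictor, document, chunk_size=2000):
    # Query the endpoint once per chunk of a document too long for one call.
    summaries = []
    for start in range(0, len(document), chunk_size):
        chunk = document[start:start + chunk_size]
        payload = {
            "inputs": f"{chunk}\nSummarize the article above.",
            "parameters": {"max_new_tokens": 100},
        }
        response = predictor.predict(payload)
        summaries.append(response[0]["generated_text"])  # format used by this container
    return summaries

Each chunk gets its own response, which is why the notebook's output arrives over ten iterations rather than all at once.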
Info
Channel: Amazon Web Services
Views: 74,571
Keywords: AWS, Amazon Web Services, Cloud, AWS Cloud, Cloud Computing, Amazon AWS
Id: oYm66fHqHUM
Length: 48min 21sec (2901 seconds)
Published: Tue Jul 25 2023