Hello World: Meet Generative AI | Amazon Web Services

Captions
So there's a lot of public interest in this recently, and it feels like hype. Is this the same, or is this something where we can see that this is a real foundation for future application development?

We are living in very exciting times with machine learning. The speed of ML model development will really increase. But you won't get to that end state that we want in the coming years unless we actually make these models more accessible to everybody.

Swami Sivasubramanian oversees database, analytics, and machine learning at AWS. For the past 15 years, he has helped lead the way on AI and ML in the industry. Swami's teams have a strong track record of taking new technologies and turning them into viable tools. Today, generative AI is dominating news feeds and conversations. Consumers are interacting with it, and brands are trying to understand how to best harness its potential for their customers. So I sat down with Swami to better understand the broad landscape of this technology.

Swami, we go back a long time. Tell me a bit: do you remember your first day at Amazon?

I still remember, because it was not very common for PhD students to join Amazon at that time; Amazon was known as a retailer, an e-commerce company. We were building things.

And that's also quite a departure for an academic. Definitely for a PhD student, to go from thinking to, actually, how do I build this?

So you brought DynamoDB to the world, and quite a few other databases since then, but under your purview now is also AI and machine learning. Tell me a bit: what does your world of AI look like?

After building a bunch of these databases and analytics services, I got fascinated by AI, because AI and machine learning literally put data to work. And if you look at machine learning technology itself, broadly, it's not necessarily new. In fact, some of the first papers on deep learning were written close to 30 years ago. But even in those papers, they explicitly called out that for it to get large-scale adoption, it required a massive amount of compute and a massive amount of data to actually succeed. And that's what the cloud got us: it let us unlock the power of deep learning technologies. That led me, early on, six or seven years ago, to start the machine learning organization, because we wanted to take machine learning, especially deep learning style technologies, out of the hands of just scientists and into those of everyday developers.

If you think about the early days of Amazon, the retailer, with similarities and recommendations and things like that, were those the same algorithms that we're seeing used today? I mean, that's a long time ago, 30 years.

Machine learning has really gone through huge growth in the complexity of the algorithms and the applicability of the use cases. Early on, the algorithms were a lot simpler, more like linear algorithms or gradient boosting. The early part of the last decade was all about deep learning, which was essentially a step up in the ability of neural nets to understand and learn from patterns, which is effectively where all the image processing algorithms come from, and then also personalization with different types of neural nets and so forth. And that's what led to inventions like Alexa, which has remarkable accuracy compared to others. So neural nets and deep learning have really been a step up.
And the next big step up is what is happening today in machine learning. So a lot of the talk these days is around generative AI, large language models, foundation models. Tell me a bit: why is that different from, let's say, the more task-based models, like vision algorithms and things like that?

If you take a step back and look at what these foundation models and large language models are all about: these are big models which are trained with hundreds of millions of parameters, if not billions. A parameter, just to give context, is like an internal variable that the ML algorithm has learned from its data set. Now, to give a sense of what this big thing is that has suddenly happened: a few things. One, transformers have been a big change. A transformer is a kind of neural net technology that is remarkably more scalable than previous architectures like RNNs and various others. So what does this mean? Why did this suddenly lead to all this transformation? Because it is actually scalable and you can train these models a lot faster; you can throw a lot of hardware and a lot of data at them. That means I can now crawl the entire World Wide Web, feed it into these kinds of algorithms, and start building models that can actually understand human knowledge.

At a high level, a generative AI text model is good at using natural language processing to analyze text and predict the next word that comes in a sequence of words. By paying attention to certain words or phrases in the input, these models can infer context, and they can use that context to find the words that have the highest probability of following the words that came before them. Structuring inputs as instructions with relevant context can prompt a model to generate answers for language understanding, knowledge, and composition. Foundation models are also capable of what is called "in-context learning," which is what happens when you include a handful of demonstration examples as part of a prompt to improve the model's output on the fly. We supply examples to further explain the instruction, and this helps the model adjust its output based on the pattern and style of the examples. When the models use billions of parameters and their training corpus is the entire internet, the results can be remarkable. The training is unsupervised and task-agnostic, and the mountains of web data used for training let these models respond to natural language instructions for many different tasks.

So the task-based models that we had before, and that we were already really good at: could you build them on top of these foundation models? Do we no longer need task-specific models, or do we still need them?

The way to think about it is that the need for task-specific models is not going away. What changes, essentially, is how we go about building them. You still need a model to translate from one language to another, or to generate code, and so forth. But how easily you can now build them is the big change, because with foundation models, which are trained on an entire corpus of knowledge, a huge amount of data, it is now simply a matter of building on top of them with fine-tuning, with specific examples. Think about it: if you're running a recruiting firm, as an example, and you want to ingest all your resumes and store them in a standard format that you can search and index on, instead of building a custom NLP model to do all that, you can now use foundation models and give a few examples: here is an input resume in this format, and here is the output resume. You can even fine-tune these models by just giving a few specific examples, and then you essentially are good to go.
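To make those two ideas concrete, here is a minimal sketch: next-word prediction as picking the highest-probability continuation, and in-context learning as a few-shot prompt in the spirit of the resume example. Every score, candidate word, and prompt string below is invented for illustration; none of it comes from an actual model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# (1) Next-word prediction: pretend a model has scored candidate
# continuations of "The capital of France is". Scores are made up.
candidates = ["Paris", "London", "banana"]
logits = [9.2, 3.1, 0.4]
probs = softmax(logits)
word, p = max(zip(candidates, probs), key=lambda pair: pair[1])
print(f"predicted next word: {word} (p={p:.3f})")

# (2) In-context learning: demonstration examples inside the prompt
# establish the output pattern, with no retraining. Sending this string
# to a foundation model would typically continue the pattern.
prompt = """Convert each resume line to the standard format.

Input: Jane Doe | AWS | 2019-2023 | Software Engineer
Output: {"name": "Jane Doe", "employer": "AWS", "years": "2019-2023", "title": "Software Engineer"}

Input: John Roe | Acme Corp | 2015-2020 | Data Scientist
Output:"""
print(prompt)
```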
So in the past, most of the work probably went into labeling the data, and that was also the hardest part, because that drives the accuracy. So in this particular case, with these foundation models, is labeling no longer needed?

Essentially, yes and no. As always with these things, there is a nuance. But the majority of what makes these large-scale models remarkable is that they can actually be trained on a lot of unlabeled data. You go through what I call a pretraining phase, which is essentially collecting data sets from, let's say, the World Wide Web, like Common Crawl data, or code data and various other data sets, Wikipedia, whatnot. And then you don't even label them; you kind of feed them in as they are. But you do, of course, have to go through a sanitization step, making sure you cleanse the data of PII and of other negative content like hate speech and whatnot. Then you start training on a large number of hardware clusters, because training these models can take tens of millions of dollars. And then finally you get a model, and you go through the next step, what is called inference.

When it comes to building these LLMs, the easy part is the training. The hardest part is the data. Training models with poor data quality will lead to poor results. You'll need to filter out bias, hate speech, and toxicity. You'll need to make sure that the data is free of PII or sensitive data. You'll need to make sure your data is deduplicated, balanced, and doesn't lead to oversampling. Because the whole process can be so expensive and requires access to large amounts of compute and storage, many companies feel lost on where to even start.
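As a rough illustration of that sanitization step, here is a toy pass over a small corpus: exact deduplication by content hash, plus a crude regex scrub for obvious PII. Real pipelines use far stronger PII detection, toxicity classifiers, and fuzzy deduplication; the patterns below are simplistic assumptions.

```python
import hashlib
import re

# Deliberately naive patterns for emails and US-style phone numbers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub_pii(text: str) -> str:
    """Replace obvious email/phone patterns with placeholders."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

def dedup(docs):
    """Drop exact duplicates by SHA-256 content hash."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "Reach me at jane@example.com or 555-123-4567.",
    "Reach me at jane@example.com or 555-123-4567.",  # exact duplicate
    "An ordinary sentence with nothing sensitive in it.",
]
for doc in dedup(corpus):
    print(scrub_pii(doc))
```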
Let's take object detection in video; that would be a smaller model than what we see now with the foundation models. What's the cost of running a model like that? Because these models, with their hundreds of billions of parameters, are probably very large pieces of data.

That's a great question, because there is so much talk happening around training these models, but very little talk about the cost of running these models to make predictions, which is inference. That's a signal that very few people are actually deploying them at runtime in actual production. Or, once they deploy in production, they realize, oh no, these models are very expensive to run, and that is where a few important techniques really come into play. Once you build these large models, to run them in production you need to do a few things to make them affordable: run at cost, run at scale, and run in an economical fashion. One is what we call quantization. The other is what I call distillation, which is that you have these large teacher models, and even though they are trained with hundreds of billions of parameters, they are distilled down to a smaller, fine-grained model. I'm speaking in super abstract terms, but that is the essence of these techniques.

Of course, there's a lot that goes into training the model, but what about inference? It turns out that the sheer size of these models can make inference expensive to run. To reduce model size, we can do "quantization," which is approximating a neural network by using smaller, 8-bit integers instead of 32- or 16-bit floating point numbers. We can also use "distillation," which is effectively a transfer of knowledge from a larger "teacher" model to a smaller and faster "student" model. These techniques have reduced model size significantly for us, while providing similar accuracy and improved latency.
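Both techniques can be sketched in a few lines. Below is a back-of-the-envelope version of 8-bit affine quantization (map float32 weights to int8 with a scale and zero point, then dequantize to measure the approximation error) and the soft-target loss at the heart of distillation (KL divergence between temperature-softened teacher and student distributions). This is purely illustrative; it is not AWS's implementation, and the logits are made up.

```python
import numpy as np

def quantize_int8(w):
    """Affine-quantize a float32 array to int8: q = round(w/scale) + zp."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0          # int8 covers 256 levels
    zp = int(round(-w_min / scale)) - 128    # zero point
    q = np.clip(np.round(w / scale) + zp, -128, 127).astype(np.int8)
    return q, scale, zp

def dequantize(q, scale, zp):
    """Recover an approximate float32 array from the int8 encoding."""
    return (q.astype(np.float32) - zp) * scale

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    def soft(x):
        e = np.exp(x / temperature - np.max(x / temperature))
        return e / e.sum()
    p_t, p_s = soft(teacher_logits), soft(student_logits)
    return float(np.sum(p_t * np.log(p_t / p_s)))

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print("max abs error:", np.abs(weights - dequantize(q, scale, zp)).max())
# int8 storage is 4x smaller than float32, and integer kernels are
# typically faster, which is where the inference savings come from.

teacher = np.array([4.0, 1.0, 0.5])   # made-up logits
student = np.array([3.5, 1.2, 0.4])
print("distillation loss:", distillation_loss(student, teacher))
```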
So we do have custom hardware to help out with this. Normally this is all GPU-based, and GPUs are expensive, energy-hungry beasts. Tell us what we can do with custom silicon that makes it so much cheaper, both in terms of cost as well as, let's say, the carbon footprint of the energy.

When it comes to custom silicon, as mentioned, cost is becoming a big issue with these foundation models, because they are very expensive to train and also very expensive to run at scale. You can build a playground and test your chatbot at low scale and it may not be that big a deal, but once you start deploying at scale as part of your core business operations, these things add up. So at AWS we did invest in our custom silicon: Trainium for training and Inferentia for inference. All of these are ways for us to understand the essence of which operators are involved in making these prediction decisions, and to optimize them at the core silicon level and at the software stack level.

I mean, if cost is also a reflection of energy used, because in essence that's what you're paying for, you can also see that, from a sustainability point of view, they are much better than running on general-purpose GPUs.

So there's a lot of public interest in this recently, and it feels like hype. Is this the same, or is this something where we can see that this is a real foundation for future application development?

First of all, we are living in very exciting times with machine learning. I have probably said this now every year, but this year is even more special, because these large language models and foundation models truly can enable so many use cases where people don't have to staff separate teams to go build task-specific models. The speed of ML model development will really increase. But you won't get to that end state that we want in the coming years unless we actually make these models more accessible to everybody. This is what we did with SageMaker early on with machine learning, and that's what we need to do with Bedrock and all its applications as well. We do think that while the hype cycle will subside, as with any technology, these are going to become a core part of every application in the coming years. And they will be done in a grounded way, and in a responsible fashion too, because there is a lot more that people need to think through in a generative AI context: what kind of data did the model learn from, what response does it generate, and how truthful is it? This is stuff we are excited to help our customers with.

So when you say that this is the most exciting time in machine learning, what are you going to say next year? Well, Swami, thank you for talking to me. You educated me quite a bit on the current state of the field, and I'm very grateful for that.

My pleasure. Thanks again for having me, sir.

I'm excited to see how builders use this technology and continue to push the possibilities forward. I want to say thanks to Swami; his insights and understanding of the space are a great way to begin this conversation. I'm looking forward to diving even deeper and exploring the architectures behind some of this, and how large models can be used by engineers and developers to create meaningful experiences.
Info
Channel: Amazon Web Services
Views: 55,476
Keywords: AWS, Amazon Web Services, Cloud, AWS Cloud, Cloud Computing, Amazon AWS, Generative AI, Machine Learning, ML, AI, AIML
Id: dBzCGcwYCJo
Length: 17min 24sec (1044 seconds)
Published: Thu Apr 13 2023