What are Generative AI models?

Captions
Over the past couple of months, large language models, or LLMs, such as ChatGPT, have taken the world by storm. Whether it's writing poetry or helping plan your upcoming vacation, we are seeing a step change in the performance of AI and its potential to drive enterprise value. My name is Kate Soule. I'm a senior manager of business strategy at IBM Research, and today I'm going to give a brief overview of this new field of AI that's emerging and how it can be used in a business setting to drive value.

Now, large language models are actually part of a broader class of models called foundation models. The term "foundation models" was first coined by a team from Stanford when they saw that the field of AI was converging on a new paradigm. Before, AI applications were built by training a library of different AI models, where each model was trained on very task-specific data to perform a very specific task. They predicted that we would start moving to a new paradigm, where a single foundational capability, or foundation model, would drive all of those same use cases and applications: the same exact applications we were envisioning before with conventional AI, and the same model could also drive any number of additional applications. The point is that this model can be transferred to any number of tasks.

What gives this model the superpower to transfer to multiple different tasks and perform multiple different functions is that it has been trained, in an unsupervised manner, on a huge amount of unstructured data. In the language domain, that basically means I feed the model a bunch of sentences, and I'm talking terabytes of data here, to train it. The start of my sentence might be "no use crying over spilled" and the end of my sentence might be "milk", and I'm trying to get the model to predict the last word of the sentence based on the words it saw before. It's this generative capability, predicting and generating the next word based on the words it has seen before, that is why foundation models are part of the field of AI called generative AI: we're generating something new, in this case the next word in a sentence.

And even though these models are trained to perform, at their core, a generative task, predicting the next word in a sentence, we can take these models and, by introducing a small amount of labeled data to the equation, tune them to perform traditional NLP tasks, things like classification or named-entity recognition, tasks you don't normally associate with a generative model or capability. This process is called tuning: you tune your foundation model by introducing a small amount of labeled data, you update the parameters of the model, and it can now perform a very specific natural language task. If you don't have data, or have only very few data points, you can still take these foundation models, and they actually work very well in low-labeled-data domains. Through a process called prompting, or prompt engineering, you can apply these models to some of those same exact tasks. For example, to prompt a model to perform a classification task, you could give the model a sentence and then ask it a question: does this sentence have a positive sentiment or a negative sentiment?
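To make that prompting idea concrete, here is a minimal sketch (not from the video) of how such a prompt could be sent to an off-the-shelf generative model using the Hugging Face transformers library. The model choice (gpt2) and the exact prompt wording are illustrative assumptions; any sufficiently capable causal language model could stand in, and larger instruction-tuned models answer prompts like this far more reliably than small base models.

```python
# Minimal sketch: using a generative language model as a zero-shot sentiment
# classifier purely through prompting. Model name and prompt wording are
# illustrative assumptions, not something prescribed in the video.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

sentence = "The service was quick and the staff were wonderful."
prompt = (
    f"Sentence: {sentence}\n"
    "Question: Does this sentence have a positive or negative sentiment?\n"
    "Answer:"
)

# The model is still just predicting the next words; the most natural
# continuation ("positive" or "negative") doubles as the classification label.
result = generator(prompt, max_new_tokens=2, do_sample=False)
print(result[0]["generated_text"])
```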
The model will try to finish generating words in that sentence, and the next natural word would be the answer to your classification problem: it will respond either "positive" or "negative", depending on how it estimates the sentiment of the sentence. These models work surprisingly well when applied to these new settings and domains.

This is where a lot of the advantages of foundation models come into play. The chief advantage is performance. These models have seen so much data, again, data with a capital D, terabytes of data, that by the time they're applied to small tasks, they can drastically outperform a model that was trained on only a few data points. The second advantage is productivity gains. As I said earlier, through prompting or tuning you need far less labeled data to get to a task-specific model than if you had to start from scratch, because your model is taking advantage of all the unlabeled data it saw in pre-training on the generative task; a rough code sketch of this tuning path appears below.

With these advantages come some disadvantages that are important to keep in mind. The first is compute cost. The penalty for having these models see so much data is that they're very expensive to train, making it difficult for smaller enterprises to train a foundation model on their own. By the time they reach a huge size, a couple billion parameters, they're also very expensive to run: you might need multiple GPUs at a time just to host these models and run inference, making them a more costly method than traditional approaches.

The second disadvantage is on the trustworthiness side. Just as data is a huge advantage for these models, having seen so much unstructured data also comes at a cost, especially in a domain like language. A lot of these models are trained on language data scraped from the Internet, and there's so much of it that even with a whole team of human annotators you wouldn't be able to vet every single data point to make sure it wasn't biased and didn't contain hate speech or other toxic information. And that's assuming you actually know what the data is; for a lot of the open-source models that have been posted, we often don't even know what exact datasets they were trained on, which leads to trustworthiness issues.

IBM recognizes the huge potential of these technologies, and my partners in IBM Research are working on multiple innovations to improve both the efficiency of these models and their trustworthiness and reliability, to make them more relevant in a business setting.

All of the examples I've talked through so far have been on the language side, but the reality is that there are a lot of other domains foundation models can be applied to. Famously, we've seen foundation models for vision, such as DALL-E 2, which takes text and uses it to generate a custom image. We've seen models for code, with products like Copilot that can help complete code as it's being authored. And IBM is innovating across all of these domains.
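As a companion to the prompting sketch above, here is a rough sketch of the tuning path described earlier: introduce a small amount of labeled data and update the model's parameters so it performs one specific task, in this case sentiment classification. The base model, the toy labeled examples, and the hyperparameters are illustrative assumptions, not details from the video.

```python
# Rough sketch of "tuning": adapting a pre-trained model to sentiment
# classification with a small labeled dataset. Names and values are illustrative.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"  # stand-in for any pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A handful of examples stands in for the "small amount of labeled data".
examples = {
    "text": ["I loved this product.", "Terrible experience, would not recommend."],
    "label": [1, 0],  # 1 = positive, 0 = negative
}
dataset = Dataset.from_dict(examples).map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tuned-sentiment", num_train_epochs=1),
    train_dataset=dataset,
)
trainer.train()  # updates the model's parameters for this one task
```

Even a modest labeled set like this is typically far less data than training a task-specific model from scratch would require, which is the productivity gain described above.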
Whether it's language models that we're building into products like Watson Assistant and Watson Discovery, vision models that we're building into products like Maximo Visual Inspection, or Ansible code models that we're building with our partners at Red Hat under Project Wisdom, we're innovating across all of these domains and more. We're working on chemistry: for example, we just published and released MoLFormer, a foundation model to promote molecule discovery for different targeted therapeutics. And we're working on models for climate change, building Earth science foundation models that use geospatial data to improve climate research.

I hope you found this video both informative and helpful. If you're interested in learning more, particularly in how IBM is working to address some of these disadvantages by making foundation models more trustworthy and more efficient, please take a look at the links below. Thank you.
Info
Channel: IBM Technology
Views: 497,936
Keywords: IBM Cloud, generative models, foundation models, ai, artificial intelligence, prompt tuning, chatgpt, bard, google bard, open ai
Id: hfIUstzHs9A
Length: 8min 47sec (527 seconds)
Published: Wed Mar 22 2023