JOHN EWALD: Hello, and
welcome to Introduction to Large Language Models. My name is John Ewald, and
I'm a training developer here at Google Cloud. In this course, you learn
to define large language models, or LLMs,
describe LLM use cases, explain prompt tuning, and
describe Google's Gen AI development tools. Large language models, or LLMs,
are a subset of deep learning. To find out more
about deep learning, see our Introduction to
Generative AI course video. LLMs and generative AI
intersect and they are both a part of deep learning. Another area of AI you
may be hearing a lot about is generative AI. This is a type of
artificial intelligence that can produce new content,
including text, images, audio, and synthetic data. So what are large
language models? Large language models refer to
large general-purpose language models that can be
pre-trained and then fine tuned for specific purposes. What do pre-trained
and fine tuned mean? Imagine training a dog. Often, you train your
dog basic commands such as sit, come,
down, and stay. These commands are normally
sufficient for everyday life and help your dog become
a good canine citizen. However, if you need a special
service dog such as a police dog, a guide dog, or a hunting
dog, you add special training. A similar idea applies
to large language models. These models are trained
for general purposes to solve common language
problems such as text classification, question
answering, document summarization, and text
generation across industries. The models can then be tailored
to solve specific problems in different fields
such as retail, finance, and entertainment using
relatively small field data sets. Let's further break
down the concept into three major features
of large language models. Large has two meanings. First is the enormous size
of the training data set, sometimes at the petabyte scale. Second, it refers to
the parameter count. In ML, these parameters are distinct
from hyperparameters, which are set before training. Parameters are basically the
memories and the knowledge that the machine learned
from model training. Parameters define
the skill of a model in solving a problem
such as predicting text. General purpose
means that the models are sufficient to
solve common problems. Two reasons lead to this idea. First is the commonality of
human language regardless of the specific task. And second is the
resource restriction. Only certain organizations
have the capability to train such large language
models with huge data sets and a tremendous
number of parameters. How about letting them create
foundation language models for others to use? This leads to the last point,
pre-trained and fine tuned, meaning to pre-train
a large language model for a general purpose
with a large data set and then fine tune it for
specific aims with a much smaller data set.
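To make the idea concrete, here is a minimal sketch of that two-step workflow. It assumes the open-source Hugging Face Transformers library and a small open checkpoint (bert-base-uncased); the course itself does not prescribe any particular library, model, or data set.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Step 1: start from a model that was already pre-trained for general
# purposes on a very large text corpus.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g. negative / positive

# Step 2: fine tune it for a specific aim with a much smaller data set
# (two made-up retail reviews, purely illustrative).
texts = ["Great product, arrived on time.", "Broke after one day."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()  # one illustrative gradient step of fine tuning
```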
The benefits of using large language models are straightforward. First, a single model can
be used for different tasks. This is a dream come true. These large language
models that are trained with petabytes of
data and have billions of parameters are smart enough
to solve different tasks, including language translation,
sentence completion, text classification, question
answering, and more. Second, large language models
require minimal field training data when you tailor them to
solve your specific problem. Large language models
obtain decent performance even with little
domain training data. In other words, they can be
used for few shot or even zero-shot scenarios. In machine learning,
few shot refers to training a model
with minimal data, and zero shot implies
that a model can recognize things that have not explicitly
been taught in the training before.
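As an illustration, here is what a zero-shot prompt and a few-shot prompt might look like; the wording is assumed for the example and is not taken from the course.

```python
# Zero shot: the task is described, but no solved examples are given.
zero_shot_prompt = """Classify the sentiment of this review as positive or negative.
Review: The battery died within a week.
Sentiment:"""

# Few shot: a handful of solved examples precede the new case.
few_shot_prompt = """Classify the sentiment of each review as positive or negative.
Review: I use it every day and love it.
Sentiment: positive
Review: Shipping took a month and the box was crushed.
Sentiment: negative
Review: The battery died within a week.
Sentiment:"""
```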
Third, the performance of large language models continues to grow as you
add more data and parameters. Let's take PaLM as an example. In April 2022,
Google released PaLM, short for Pathways Language
Model, a 540 billion-parameter model that achieves state-of-the-art
performance across multiple language tasks. PaLM is a dense decoder-only
transformer model. It has 540 billion parameters. It leverages the
new Pathways system, which has enabled
Google to efficiently train a single model across
multiple TPU v4 pods. Pathways is a new AI
architecture that will handle many tasks at
once, learn new tasks quickly, and reflect a better
understanding of the world. The system enables PaLM
to orchestrate distributed computation for accelerators. We previously mentioned that
PaLM is a transformer model. A transformer model consists
of encoder and decoder. The encoder encodes
the input sequence and passes it to
the decoder, which learns how to decode
the representations for a relevant task.
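Here is a minimal sketch of that encode-then-decode flow, assuming PyTorch's built-in nn.Transformer module with toy dimensions; PaLM itself is decoder-only and vastly larger, so this only illustrates the general encoder-decoder structure.

```python
import torch
import torch.nn as nn

# A tiny encoder-decoder transformer (toy sizes, not a real LLM).
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)

src = torch.rand(10, 1, 64)  # embeddings of a 10-token input sequence
tgt = torch.rand(7, 1, 64)   # embeddings of the sequence being decoded

# The encoder encodes src and passes its representation to the decoder,
# which learns to decode it for the task at hand.
out = model(src, tgt)
print(out.shape)  # torch.Size([7, 1, 64])
```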
We've come a long way from traditional programming to neural networks
to generative models. In traditional
programming, we used to have to hard code the rules
for distinguishing a cat: type, animal; legs, four;
ears, two; fur, yes; likes yarn, catnip. In the wave of
neural networks, we could give the network pictures
of cats and dogs and ask, is this a cat? And it would predict a cat. In the generative
wave, we as users can generate our own content,
whether it be text, images, audio, video, or other. For example, models
like PaLM, or LaMDA, or Language Model for
Dialogue Applications, ingest very, very large
data from multiple sources across the internet and build
foundation language models we can use simply by
asking a question, whether typing it into
a prompt or verbally talking into the prompt. So when you ask it
what's a cat, it can give you everything it
has learned about a cat. Let's compare LLM development
using pre-trained models with traditional ML development. First, with LLM development,
you don't need to be an expert. You don't need
training examples. And there is no need
to train a model. All you need to do is think
about prompt design, which is the process of creating a
prompt that is clear, concise, and informative. It is an important part of
natural language processing. In traditional
machine learning, you need training examples
to train a model. You also need compute
time and hardware. Let's take a look at an example
of a text generation use case. Question answering, or QA, is
a subfield of natural language processing that deals with the
task of automatically answering questions posed in
natural language. QA systems are typically
trained on a large amount of text and code. And they are able to answer
a wide range of questions, including factual, definitional,
and opinion-based questions. The key here is that you
need domain knowledge to develop these
question-answering models. For example, domain
knowledge is required to develop a question-answering
model for customer support, or health
care, or supply chain. Using generative QA, the
model generates free text directly based on the context. There is no need for
domain knowledge. Let's look at three
questions given to Bard, a large language model chat
bot developed by Google AI. Question one. "This year's sales are $100,000. Expenses are $60,000. How much is net profit?" Bard first shares how net
profit is calculated, then performs the calculation. Then Bard provides the
definition of net profit. Here's another question. Inventory on hand
is 6,000 units. A new order requires
8,000 units. How many units do I need to
fill to complete the order? Again, Bard answers the question
by performing the calculation. And our last example,
we have 1,000 sensors in 10 geographic regions. How many sensors do we have
on average in each region? Bard answers the
question with an example on how to solve the problem
and some additional context.
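For reference, the arithmetic behind those three answers works out as follows (values taken from the questions above).

```python
sales, expenses = 100_000, 60_000
net_profit = sales - expenses        # 40,000 net profit

on_hand, required = 6_000, 8_000
units_needed = required - on_hand    # 2,000 more units to fill the order

sensors, regions = 1_000, 10
avg_per_region = sensors / regions   # 100 sensors per region on average

print(net_profit, units_needed, avg_per_region)
```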
In each of our questions, a desired response was obtained. This is due to prompt design. Prompt design and
prompt engineering are two closely-related concepts
in natural language processing. Both involve the
process of creating a prompt that is clear,
concise, and informative. However, there are some key
differences between the two. Prompt design is the process
of creating a prompt that is tailored to the specific
task that this system is being asked to perform. For example, if
the system is being asked to translate a text
from English to French, the prompt should be
written in English and should specify that
the translation should be in French.
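For instance, a prompt designed for that translation task might look like the following; the exact wording is an assumption for illustration.

```python
prompt = """Translate the following text from English to French.
Text: Large language models can be fine tuned for specific tasks.
French translation:"""
```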
Prompt engineering is the process of
that is designed to improve performance. This may involve using
domain-specific knowledge, providing examples of
the desired output, or using keywords
that are known to be effective for the
specific system. Prompt design is a
more general concept, while prompt engineering is
a more specialized concept. Prompt design is essential,
while prompt engineering is only necessary
for systems that require a high degree of
accuracy or performance. There are three kinds of
large language models, generic language models,
instruction tuned, and dialogue tuned. Each needs prompting
in a different way. Generic language models
predict the next word based on the language
in the training data. This is an example of a
generic language model. The next word is a token based
on the language in the training data. In this example,
"the cat sat on," the next word should be "the." And you can see that "the"
is the most likely next word. Think of this type as an
autocomplete in search.
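Here is a minimal next-word-prediction sketch for the example above. It assumes the Hugging Face Transformers library and the small open GPT-2 checkpoint, not the models discussed in this course.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # scores for every token in the vocabulary

next_token_id = int(logits[0, -1].argmax())  # most likely next token
print(tokenizer.decode(next_token_id))       # typically " the"
```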
With instruction tuning, the model is trained to predict a
response to the instructions given in the input. For example,
summarize a text of X, generate a poem
in the style of X, give me a list of keywords based
on semantic similarity for X. And in this example,
classify the text into neutral,
negative, or positive.
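Written out, instruction-style prompts like those above might look as follows (placeholder wording, for illustration only).

```python
summarize_prompt = "Summarize the following text: <text>"
poem_prompt = "Generate a poem in the style of <author>."
classify_prompt = """Classify the following text into neutral, negative, or positive.
Text: The delivery was two days late.
Classification:"""
```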
With dialogue tuning, the model is trained to have a dialogue by predicting the next response. Dialogue-tuned models
are a special case of instruction tuned where
requests are typically framed as questions to a chat bot. Dialogue tuning
is expected to be in the context of a longer
back and forth conversation, and typically works better
with natural question-like phrasings. Chain of thought reasoning
is the observation that models are better at
getting the right answer when they first output
text that explains the reason for the answer. Let's look at the question. Roger has five tennis balls. He buys two more
cans of tennis balls. Each can has three tennis balls. How many tennis balls
does he have now? This question is posed
initially with no response. The model is less likely to get
the correct answer directly. However, when the prompt
first includes a similar question with its reasoning worked out, the output is more likely to
end with the correct answer.
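A chain-of-thought prompt for this problem might look like the sketch below, where one similar question is worked out before the real one; the worked example and wording are assumptions, not the course's exact slide.

```python
prompt = """Q: A cafeteria had 23 apples. It used 20 to make lunch and
bought 6 more. How many apples does it have?
A: The cafeteria started with 23 apples and used 20, leaving 3.
It bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A:"""

# With the worked example in place, the model is more likely to reason
# step by step: 5 + 2 * 3 = 11, so the answer is 11.
```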
A model that can do everything has practical limitations. Task-specific tuning can
make LLMs more reliable. Vertex AI provides
task-specific foundation models. Let's say you have
a use case where you need to gather
sentiments, or how your customers are feeling
about your product or service. You can use the
sentiment analysis task model for classification. Same for vision tasks. If you need to perform
occupancy analytics, there is a task-specific
model for your use case. Tuning a model enables you to
customize the model response based on examples
of the task that you want the model to perform. It is essentially the
process of adapting a model to a new domain, or set
of custom use cases, by training the
model on new data. For example, we may
collect training data and tune the model specifically
for the legal or medical domain. You can also further tune
the model by fine tuning where you bring
your own data set and retrain the model by
tuning every weight in the LLM. This requires a big
training job and hosting your own fine-tuned model. Here's an example of a medical
foundation model trained on health care data. The tasks include question
answering, image analysis, finding similar
patients, and so forth. Fine tuning is expensive and
not realistic in many cases. So are there more efficient
methods of tuning? Yes. Parameter-efficient
tuning methods, or PETM, are methods for tuning
a large language model on your own custom data
without duplicating the model. The base model itself
is not altered. Instead, a small
number of add-on layers are tuned, which can be swapped
in and out at inference time.
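Here is a minimal sketch of that idea in PyTorch: the base model's weights are frozen and only a small add-on layer is trained. It is illustrative only; real parameter-efficient methods such as adapters, LoRA, or prompt tuning are more involved, and all sizes and names here are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained base model (toy sizes).
base_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=2)

# The base model itself is not altered: freeze all of its weights.
for param in base_model.parameters():
    param.requires_grad = False

# A small add-on layer that can be swapped in and out at inference time.
adapter = nn.Sequential(nn.Linear(64, 8), nn.ReLU(), nn.Linear(8, 64))
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-3)

x = torch.rand(10, 1, 64)          # a toy batch of token embeddings
hidden = base_model(x)
output = hidden + adapter(hidden)  # residual add-on on top of the frozen model

loss = output.pow(2).mean()        # placeholder loss, purely illustrative
loss.backward()
optimizer.step()                   # only the adapter's weights are updated
```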
Generative AI Studio lets you quickly explore and customize
models that you can leverage in your
applications on Google Cloud. Generative AI Studio helps
developers create and deploy generative AI
models by providing a variety of tools
and resources that make it easy to get started. For example, there's a
library of pre-trained models, a tool for fine tuning models,
a tool for deploying models to production, and a
community forum for developers to share ideas and collaborate. Generative AI App
Builder lets you create Gen AI apps without
having to write any code. Gen AI App Builder has a
drag-and-drop interface that makes it easy
to design and build apps, a visual editor that
makes it easy to create and edit app content, a built-in search
engine that allows users to search for information
within the app, and a conversational
AI engine that allows users to interact with
the app using natural language. You can create your own chat
bots, digital assistants, custom search engines, knowledge
bases, training applications, and more. PaLM API lets you
test and experiment with Google's large language
models and Gen AI tools. To make prototyping quick
and more accessible, developers can integrate
PaLM API with Maker Suite and use it to access the
API using a graphical user interface.
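For programmatic access (as opposed to the Maker Suite interface), a call through the Vertex AI SDK looked roughly like the sketch below at the time of this course; treat the module, model name, and parameters as assumptions, since the SDK has evolved.

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Assumed project and location placeholders.
vertexai.init(project="your-project-id", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Give me three taglines for a store that sells hiking gear.",
    temperature=0.2,
    max_output_tokens=128,
)
print(response.text)
```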
The suite includes a number of different tools
tool, a model-deployment tool, and a model-monitoring tool. The model-training tool helps
developers train ML models on their data using
different algorithms. The model deployment tool helps
developers deploy ML models to production with a number of
different deployment options. And the model-monitoring
tool helps developers monitor the
performance of their ML models in production using a
dashboard and a number of different metrics. That's all for now. Thanks for watching this course,
Introduction to Large Language Models.