So there's a lot of public interest in this recently, and it feels like hype. Is this more of the same, or is this something where we can see that this is a real foundation for future application development? We are living in very exciting times with machine learning. The speed of ML model development will really increase. But you won't get to the end state that we want in the coming years unless we actually make these models more accessible to everybody. Swami Sivasubramanian oversees database, analytics
and machine learning at AWS. For the past 15 years, he has helped lead
the way on AI and ML in the industry. Swami’s teams have a strong track record
of taking new technologies and turning them into viable tools. Today, Generative AI is dominating news feeds
and conversations. Consumers are interacting with it and brands
are trying to understand how to best harness its potential for their customers. So, I sat down with Swami to better understand
the broad landscape of this technology. Swami, we go back a long time. Tell me a bit.
Do you remember your first day at Amazon? I still remember because it's not very common
for PhD students to join Amazon at that time, because you were known as a retailer, as ecommerce. We were building things. And so that's also quite a departure for an academic. Definitely, for a PhD student to go from thinking to, actually, how do I build this? So you actually brought DynamoDB to the world
and quite a few other databases since then, but under your purview now is also AI and
machine learning. So tell me a bit about what your world of AI looks like. After building a bunch of these databases and analytics services, I got fascinated by AI and machine learning, because that literally puts data to work. And if you look at machine learning technology
itself broadly, it's not necessarily new. In fact, some of the first papers on deep learning were written even like 30 years ago. But even in those papers, they explicitly called out that for it to get large-scale adoption, it would require massive amounts of compute and massive amounts of data to actually succeed. And that's what the cloud enabled: actually unlocking the power of deep learning technologies. Which is what led me, early on, this is like six, seven years ago, to start the machine learning organization, because we wanted to take machine learning, especially deep learning style technologies, out of the hands of just scientists and into the hands of
everyday developers. If you think about the early days of Amazon,
the retailer, with similarities and recommendations and things like that, were those the same algorithms that we're seeing used today? Or is that, I mean, that's a long time ago, 30 years. Machine learning has really gone through huge growth in the complexity of the algorithms and the applicability of the use cases. Early on, the algorithms were a lot simpler, a lot more linear, or based on gradient boosting. If you look at the last decade, the early part was all around deep learning, which was essentially a step up in the ability for neural nets to actually understand and learn from patterns, which is effectively where all the image-processing algorithms come from. And then also personalization with different types of neural nets and so forth. And that's what led to inventions like Alexa, which has remarkable accuracy compared to others. So the neural nets and deep learning have really
been a step up. And the next big step up is what is happening
today in machine learning. So a lot of the talk these days is around
generative AI, large language models, foundation models. Tell me a bit: why is that different from, let's say, the more task-based models, like vision algorithms and things like that? I mean, if you take a step back and look at what these foundation models, these large language models, are all about: these are big models which are trained with hundreds of millions of parameters, if not billions. A parameter, just to give context, is like an internal variable that the ML algorithm has learned from its data set.
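To give a concrete, if tiny, sense of what a parameter is, here is a minimal sketch in PyTorch; the toy model below is invented purely for illustration, and every weight and bias entry it contains counts as one learned parameter:

import torch

# A toy two-layer network; real foundation models have billions of such weights.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
)

# Each entry of every weight matrix and bias vector is one learned parameter.
num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params:,}")  # roughly 2.1 million for this toy model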
Now, to give a sense of what this big thing is that has suddenly happened: a few things. One, if you take a look at Transformers, that has been a big change. The Transformer is a kind of neural net technology that is remarkably more scalable than previous architectures like RNNs or various others. So what does this mean? Why did this suddenly lead to this transformation? Because it is actually scalable, you can train them a lot faster; now you can throw a lot of hardware and a lot of data at them. That means now I can actually crawl the entire World Wide Web, feed it into these kinds of algorithms, and start actually building models that can understand human knowledge.
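To make the Transformer idea a little more concrete, here is a minimal sketch of the scaled dot-product attention operation at the heart of these models; the tensors and sizes are invented for illustration, and the point is simply that attention boils down to a few large matrix multiplications, which is what makes it scale so well across hardware:

import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, dim). Scores say how much each token attends to each other token.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)   # attention weights sum to 1 per token
    return weights @ v                        # weighted mix of the value vectors

# Toy example: a batch of 2 sequences, 8 tokens each, 64-dimensional.
q = torch.randn(2, 8, 64)
k = torch.randn(2, 8, 64)
v = torch.randn(2, 8, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 64])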
At a high level, a generative AI text model is good at using natural language processing to analyze text and predict the next word that comes in a sequence of words. By paying attention to certain words or phrases in the input, these models can infer context. And they can use that context to find the words that have the highest probability of following the words that came before them. Structuring inputs as instructions with relevant context can prompt a model to generate answers for language understanding, knowledge, and composition.
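As a minimal sketch of that next-word prediction, here is roughly what it looks like with the open-source Hugging Face transformers library and a small public model like GPT-2; the model choice and prompt are just illustrative assumptions, not what any particular product uses:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # scores for every word in the vocabulary
probs = torch.softmax(logits[0, -1], dim=-1)  # probabilities for the *next* token only

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>10s}  {p.item():.3f}")  # most likely next words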
Foundation models are also capable of what is called “in-context learning,” which is what happens when you include a handful of demonstration examples as part of a prompt to improve the model’s output on the fly. We supply examples to further explain the instruction, and this helps the model adjust the output based on the pattern and style in the examples.
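In practice, in-context learning is just careful prompt construction. Here is a minimal sketch, with an invented ticket-classification task, of how a handful of demonstrations can be prepended to a new input so the model can infer the pattern without any retraining:

# A few demonstration examples showing the task: classify support tickets.
demonstrations = [
    ("I was charged twice for my order.", "Billing"),
    ("The app crashes when I open settings.", "Technical issue"),
    ("How do I change my delivery address?", "Account question"),
]

new_ticket = "My invoice shows the wrong amount."

# Build a few-shot prompt: instruction, demonstrations, then the new input.
prompt = "Classify each customer ticket into a category.\n\n"
for text, label in demonstrations:
    prompt += f"Ticket: {text}\nCategory: {label}\n\n"
prompt += f"Ticket: {new_ticket}\nCategory:"

print(prompt)  # this string is what gets sent to the foundation model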
When the models use billions of parameters and their training corpus is the entire internet, the results can be remarkable. The training is unsupervised and task-agnostic. And the mountains of web data used for training let the model respond to natural language instructions for many different tasks. So the task-based models that we had before
and that we were already really good at, could you build them based on these foundation models? Do you no longer need these task-specific models, or do we still need them? The way to think about it is that the need for task-specific models is not going away. What is essentially changing is how we go about building them. You still need a model to translate from one language to another, or to generate code, and so forth. But how easily you can now build them is the big change, because with foundation models, which are trained on an entire corpus of knowledge, let's say a huge amount of data, it is now simply a matter of building on top of them with fine-tuning, with specific examples. Think about if you're running a recruiting firm, as an example, and you want to ingest all your resumes and store them in a format that is standard for you to search and index on. Instead of building a custom NLP model to do all that, you now use a foundation model and give a few examples: here is an input resume in this format, and here is the output resume. You can fine-tune these models by just giving a few specific examples, and then you essentially are good to go.
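Following the recruiting-firm example, the fine-tuning data itself is often just a small set of input/output pairs. A minimal sketch of what that could look like is below; the field names and the JSON Lines format are assumptions for illustration, and the exact format depends on the fine-tuning tooling you use:

import json

# A handful of supervised examples: free-form resume text in, standardized fields out.
examples = [
    {
        "prompt": "Extract name, most recent title, and years of experience:\n"
                  "Jane Doe. Senior data engineer at Example Corp since 2018...",
        "completion": json.dumps(
            {"name": "Jane Doe", "title": "Senior Data Engineer", "years_experience": 6}
        ),
    },
    # ...a few dozen more examples in the same shape...
]

# Many fine-tuning workflows accept this kind of JSON Lines file.
with open("resume_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")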
So in the past, most of the work probably went into labeling the data, and that was also the hardest part, because that drives the accuracy. Exactly. So in this particular case, with these foundation models, labeling is no longer needed? Essentially, I mean, yes and no. As always with these things, there is a nuance. But the majority of what makes these large-scale models remarkable is that they actually can be trained on a lot of unlabeled data. You go through what I call a pretraining phase, which is essentially where you collect data sets from, let's say, the World Wide Web, like Common Crawl data, or code data and various other data sets, Wikipedia, whatnot. And then you don't even label them; you kind of feed them in as they are. But you do of course have to go through a sanitization step, in terms of making sure you cleanse the data of PII, or other stuff like negative things or hate speech and whatnot. But then you actually start training on a large number of hardware clusters, because to train these models can take tens of millions of dollars. And then finally you get a notion of a model, and then you go through the next step, which is called inference.
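The reason unlabeled data can be fed in as is, is that the labels come from the text itself: each position's target is simply the next token. Here is a minimal sketch of turning raw text into training examples, with a whitespace split standing in for a real subword tokenizer; the corpus and block size are invented for illustration:

# Raw, unlabeled text, e.g. scraped web pages or Wikipedia articles.
corpus = "the cloud made it practical to train very large models on very large data"

# Stand-in tokenizer: real pipelines use subword tokenizers, not whitespace splits.
tokens = corpus.split()

block_size = 8
for i in range(0, len(tokens) - block_size, block_size):
    block = tokens[i : i + block_size + 1]
    inputs, targets = block[:-1], block[1:]   # targets are the inputs shifted by one token
    print(inputs, "->", targets)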
When it comes to building these LLMs, the easy part is the training. The hardest part is the data. Training models with poor data quality will lead to poor results. You’ll need to filter out bias, hate speech, and toxicity. You’ll need to make sure that the data is free of PII or sensitive data. You’ll need to make sure your data is deduplicated, balanced, and doesn’t lead to oversampling.
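As a minimal sketch of that kind of cleanup, here is a toy deduplication and PII-scrubbing pass; the documents and the single email regex are illustrative only, and production pipelines are far more thorough:

import hashlib
import re

documents = [
    "Contact me at jane@example.com for the report.",
    "Contact me at jane@example.com for the report.",   # exact duplicate
    "Quarterly results improved across all regions.",
]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

seen, cleaned = set(), []
for doc in documents:
    digest = hashlib.sha256(doc.encode()).hexdigest()
    if digest in seen:                 # drop exact duplicates
        continue
    seen.add(digest)
    cleaned.append(EMAIL.sub("[EMAIL]", doc))  # redact obvious PII like email addresses

print(cleaned)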
Because the whole process can be so expensive and requires access to large amounts of compute and storage, many companies feel lost on where to even start. Let's take object detection in video. That would be a smaller model than what we see now with the foundation models. What's the cost of running a model like that? Because these models, with hundreds of billions of parameters, are probably very large. That's a great question.
There is so much talk happening around training these models, but very little talk on the cost of running these models to make predictions, which is inference. Which is a signal that very few people are actually deploying them at runtime, in actual production. Once they actually deploy in production, they will realize, oh no, these models are very expensive to run, and that is where a few important techniques really come into play. So once you build these large models, to run them in production you need to do a few things to make them affordable to run: to run at scale, and to run in an economical fashion. One is what we call quantization. The other one is what I call distillation, which is that you have these large teacher models, and even though they are trained with hundreds of billions of parameters, they are distilled down to a smaller, fine-grained model. And I'm speaking in super abstract terms, but that is the essence of these models.
Of course, there’s a lot that goes into training the model, but what about inference? It turns out that the sheer size of these models can make inference expensive to run. To reduce model size, we can do “quantization,” which is approximating a neural network by using smaller 8-bit integers instead of 32- or 16-bit floating point numbers.
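As a rough sketch of what quantization looks like in code, here is PyTorch's built-in dynamic quantization applied to a toy model; the model is invented for illustration, and the tooling for large language models varies, but the idea of swapping 32-bit float weights for 8-bit integers is the same:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
)

# Replace the Linear layers' float32 weights with 8-bit integer versions.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
print(model(x).shape, quantized(x).shape)  # same interface, smaller and faster weights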
We can also use “distillation,” which is effectively a transfer of knowledge from a larger “teacher” model to a smaller and faster “student” model. These techniques have reduced the model size significantly for us, while providing similar accuracy and improved latency.
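And here is a minimal sketch of the distillation idea, with toy stand-ins for the teacher and student; the student is trained to match the teacher's softened output distribution rather than any human labels:

import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(128, 10)   # stand-in for a large, expensive model
student = torch.nn.Linear(128, 10)   # stand-in for a small, fast model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0

for _ in range(100):
    x = torch.randn(32, 128)                       # a batch of (unlabeled) inputs
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pulls the student's predictions toward the teacher's.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()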
So we do have custom hardware to help out with this. Normally this is all GPU-based, and GPUs are expensive, energy-hungry beasts. Tell us what we can do with custom silicon that makes it so much cheaper, both in terms of cost as well as, let's say, the carbon footprint of the energy used. When it comes to custom silicon, as mentioned,
the cost is becoming a big issue in these foundation models because they are very
expensive to train and also very expensive to run at scale. You can build a playground and test your chatbot at low scale, and it may not be that big a deal. But once you start deploying at scale as part of your core business operation, then these things add up. So in AWS we did invest in our custom silicon: Trainium for training and Inferentia for inference. And all these things are ways for us to understand the essence of which operators are involved in making these prediction decisions, and to optimize them at the core silicon level and the software stack level. I mean, if cost is also a reflection of energy
used because in essence, that's what you're paying for, you can also see that they are,
from a sustainability point of view, much better than running it on general-purpose GPUs. So there's a lot of public interest in this recently, and it feels like hype. Is this more of the same, or is this something where we can see that this is a real foundation for future application development? First of all, we are living in very exciting
times with machine learning. I have probably said this now every year.
But this year is even more special, because these large language models and foundation models truly can enable so many use cases where people don't have to staff separate teams to go build task-specific models. The speed of ML model development will really increase. But you won't get to the end state that we want in the coming years unless we actually make these models more accessible to everybody. And this is what we did with SageMaker early
on with machine learning and that's what we need to do with Bedrock and all its applications
as well. But we do think that while the hype cycle will subside, like with any technology, these are going to become a core part of every application in the coming years. And they will be done in a grounded way, but in a responsible fashion too, because there is a lot more that people need to think through in a generative AI context. What kind of data did it learn from? What response does it generate? How truthful is it? These are things we are excited to help our customers with. So when you say that this is the most exciting
time in machine learning, what are you going to say next year? Well, Swami, thank you for talking to me. I mean, you educated me quite a bit on what
the current state of the field is. So I'm very grateful for that. My pleasure. Thanks again for having me, sir. I'm excited to see how builders use this technology and continue to push the possibilities forward. I want to say thanks to Swami.
His insights and understanding of the space are a great way to begin this conversation. I'm looking forward to diving even deeper
and exploring the architectures behind some of this, and how large models can be used by engineers and developers to create meaningful experiences.