AI Hardware, Explained.

Video Statistics and Information

Captions
The most commonly used chips today are AI accelerators. Who would have thought that my gaming PC and my Bitcoin miner would eventually become a good AI engineer? How do you see this industry moving forward? That's a great question. Moore's Law is actually still, as of today, alive and kicking. Power is becoming an issue, heat is becoming an issue, and we need to rely more and more on parallel processing.

In 2011 Marc Andreessen said software is eating the world, and the decade that followed only solidified this notion, with software infiltrating nearly every aspect of our lives. The last year in particular introduced a new wave of generative AI, with some apps becoming some of the most swiftly adopted software products of all time. And just like all the other software that came before it, AI software is fundamentally underpinned by the hardware that runs the underlying computation. So if software is becoming more important than ever, then hardware is following suit. Plus, the world is constantly generating more data, and unlocking the full potential of these technologies, from longer context windows to multimodality, means a constant need for faster and more resilient hardware. It's equally important for us to understand who builds and controls the supply of this resource, especially since many of even the most established AI companies are now hardware constrained, with some reputable sources indicating that demand for AI hardware outstrips supply by a factor of 10.
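The parallel processing mentioned above can be made concrete with a small sketch (ours, not from the video): every entry of a matrix product is an independent multiply-accumulate, so a chip with enough cores can compute them all at once. A plain-Python matrix multiply makes that grid of independent dot products explicit:

```python
# Each result entry C[i][j] is its own dot product, independent of the others.
# A CPU evaluates them one after another; a GPU can evaluate the whole grid
# in parallel, which is where the speedup discussed in this episode comes from.
# (Illustrative sketch only -- real kernels tile much larger matrices.)

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(A, B)
print(C)  # [[19, 22], [43, 50]]
```

A GPU's tensor cores evaluate all of those dot products simultaneously rather than looping entry by entry, which is why the same design that served graphics and mining serves AI.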
That is exactly why we've created this mini-series on AI hardware. We'll take you on a journey through understanding the hardware that has long powered our computers but is now the backbone of the AI models taking the world by storm. In this first segment we dive into the terminology and technology, from GPU to TPU, including what they are, how they work, and the key players, like Nvidia, competing for chip dominance. We also address the question: is Moore's Law dead? Make sure to look out for the rest of our series, where we dive even deeper, covering supply and demand mechanics, including why we can't just print our way out of a shortage, how founders can get access to inventory, whether they should think about owning or renting, where open source plays a role, and of course how much all of this truly costs.

Across all three videos we explore with the help of a16z special advisor Guido Appenzeller, someone who is uniquely suited for this deep dive as a storied infrastructure expert. I've spent my last couple of years mostly in software, but most recently, before joining Andreessen Horowitz, I was CTO for Intel's data center group, dealing a lot with hardware and the low-level components. So I think it's given me a good insight into how large data centers work and what the basic components are that make this AI boom possible today and that really underpin this great technological ecosystem. Guido has also spent time at Yubico, VMware, Big Switch Networks, and more. But let's get into it.

As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures. [Music]

We are
increasingly hearing terms like chips, semiconductors, servers, and compute, but are all of these the same thing, and what role do they play in our AI future? If you're running any kind of AI algorithm, this algorithm runs on a chip, and the most commonly used chips today are AI accelerators, which in terms of how they're built are actually very close to graphics chips. The cards these chips sit on, inside these servers, are often referred to as GPUs, which stands for graphics processing unit. Which is kind of funny, right? They're not doing graphics, obviously, but it's a very similar type of technology. If you look inside of them, they are basically very good at processing a very large number of math operations per cycle. Very classically, an old-fashioned CPU would run one instruction every cycle; then CPUs got multiple cores, so maybe a modern CPU can do a couple of tens of instructions, but these modern AI cards can do more than a hundred thousand instructions per cycle, so they're extremely performant. So this is a GPU. These GPUs run inside of servers, which you can think of as big boxes with a power plug and a networking plug on the outside, and these servers sit in data centers, where you have racks and racks of them that do the actual compute.

Let's quickly recap: CPU is central processing unit and GPU is graphics processing unit, and while both CPUs and GPUs today can perform parallel processing, the degree of parallelization is what sets GPUs apart for certain workloads. For example, CPUs can do tens or even thousands of floating point operations per cycle, but a GPU can now do over a hundred thousand. The basic idea of a GPU is that instead of just working with individual values, it works with vectors, or even matrices, or tensors more generally. The TPU, for example, is Google's name for this kind of chip: they call them tensor processing
units, which is actually a pretty good name for them. The cores in these modern GPUs are often called tensor cores; they operate on tensors, and basically the core of their value proposition is that they can do matrix multiplication. If you remember, a matrix is rows and columns of numbers, and these cores can, for example, multiply two matrices in a single cycle, a very, very fast operation. That's really what gives us the speed that's necessary to run the incredibly large language and image models that make up generative AI today.

Today's GPUs are far more powerful than their ancestors, whether we're comparing to the earliest graphics cards of the arcade gaming days 50 years ago or the GeForce 256, the first personal computer GPU, unveiled by Nvidia in 1999. But is it surprising that we're seeing this chip design applied so readily to this emerging space of AI, or should we expect a new architecture to evolve and be more performant in the future? In one way I think it's very surprising. Who would have thought that my gaming PC and my Bitcoin miner would eventually become a good AI engineer? At the same time, what all of these problems have in common is that you want to execute many operations in parallel. You can think of a GPU as something built for graphics, but you can also think of it as something that's very good at performing the same operation on a very large number of parallel inputs, a very large vector or very large matrices.

All right, so perhaps it's not so surprising that Nvidia's prized GPUs are aligned to this AI wave, but they're also not the only company participating. Here is Guido breaking down the hardware ecosystem. The ecosystem comes in many layers, so let's start with the chips at the bottom. Nvidia is king of the hill at the moment; their A100 is the workhorse that powers the current AI revolution, and they're coming out with a new one now called the H100, the next
generation. There are a couple of other vendors in the space. Intel has something called Gaudi, and Gaudi 2, as well as their graphics cards with Arc, and they're seeing some usage. AMD has a chip in this space. And then we have the large clouds that are starting to build, or in some cases have been building for some time, their own chips: Google with the TPU you mentioned before, which is quite popular, and Amazon with a chip called Trainium for training and Inferentia for inference. We'll probably see more of those in the future from some of these vendors, but at the moment Nvidia still has a very, very strong position, as the vast majority of training is going on on their chips.

When we think about the different chips that you mentioned, the A100s are the strongest and maybe there's the most demand for those, but how do they compare to some of these chips created by other companies? Is it double the performance, or is there some other metric or factor that may make some much more performant? That's a great question. If you look at the pure hardware statistics, how many floating point operations per second these chips can do, there are others that are very competitive with what Nvidia has. Nvidia's big advantage is that they have a very mature software ecosystem. Imagine you are an artificial intelligence developer or engineer or researcher: you're often using a model that's open source, that somebody else developed, and how fast that model runs in many cases depends on how well it's optimized for a particular chip. So the big advantage that Nvidia has today is that their software ecosystem is so much more mature. I can grab a model and it has all the necessary optimizations for Nvidia to run out of the box; I don't have to do anything. With some of these other chips I may have to do a lot more of these optimizations myself, and that's what gives them the
strategic advantage. So as we've touched on, AI software is heavily dependent on hardware, but what Guido is pointing towards here is the performance of hardware being heavily integrated with software. Nvidia's CUDA system makes it easier for engineers to plug in and make optimizations, like running with lower precision numbers. Here is Guido speaking to the kinds of optimizations that do exist. It happens at all layers of the stack. Some of it is coming from academia, some of it is done by the large companies that operate in the space, and some of it is frankly done by enthusiasts who just want to see their model run faster. To give an idea of how this works: typically a floating point number is represented in 32 bits, and some people figured out how to reduce that to 16 bits, and then somebody said, well, actually, we can do it in 8 bits. You have to be really careful how you do it; you have to normalize to make sure nothing overruns or underruns. But if you normalize everything, you can use much, much shorter integers for these calculations. There are many tricks like that that really good AI developers use to squeeze more performance out of the chips they have.

To reiterate Guido's point: floating point numbers are typically represented in 32 bits, that's 32 zeros and ones, or binary digits, with the first bit being for sign, the next eight for the exponent, and the next 23 for the fraction. This gives a fairly large range between the smallest possible value and the largest possible value, but also allows many steps in between.

Now, when many people think of semiconductors, they naturally think of Moore's Law. That's the term describing the phenomenon observed by Gordon Moore back in 1965, whereby the number of transistors in an integrated circuit doubles every two years. But despite our collective success for decades in continuing to push more computation onto smaller chips, are we now at the limits of
lithography? For example, an Apple M1 chip from 2022 has 116 billion, that's billion with a B, transistors; compare that to the ARM1 processor from 1985, which had 25,000. And by the way, the Apple M1 chip is not even the highest transistor count today; I believe that belongs to the Wafer Scale Engine 2 by Cerebras, with 2.6 trillion transistors. So looking ahead, are we at the point where we really don't see the same kind of advancement, at least in the physical architecture of chips? And if so, where do we see advancements moving forward? Is it in the software? Is it in the specialization of these chips? How do you see this industry moving forward?

Yeah, great question. There are two things to tease apart there. Moore's Law is actually still, as of today, alive and kicking. Moore's Law talks about the density of transistors on a chip, and we're still increasing that; the feature size of transistors is still going down. Whether it's at exactly the same speed, I don't know, but as of today, if you plot the curve, it seems to be intact. There's a second thing called Dennard scaling, which used to say, basically, that just as the number of transistors I can squeeze onto a chip doubles every 18 months or so, the power would at the same time decrease by the same factor. It says something about frequency too, but let's say the net outcome is power. And that, for the last 10 to 15 years or so, is no longer true. If you look at the frequency of a CPU, it hasn't moved much over the past 10, 12, 15 years. The net result is that we're getting chips that have more transistors, but each individual core doesn't actually run faster. What this means is we have to have lots and lots more parallel cores, and this is why these tensor operations are so attractive: I can't add numbers more quickly on a single core, but I can do a matrix operation instead, and especially do many
of them in parallel at the same time. The second big consequence is that our chips are getting more and more power hungry. If you look at even a graphics card for a gaming PC today, you have these 500-watt cards drawing hundreds of watts of power, which is much, much more than they used to, and that trend is going to continue. We're seeing what's happening in data centers: more and more things like liquid cooling, at least being experimented with, or in some cases getting deployed, because the energy densities for these AI chips are getting so high that we need novel cooling solutions to make them work. So Moore's Law, yes, but power is becoming an issue, heat is becoming an issue, and we need to rely more and more on parallel processing.

So it sounds like Moore's Law is indeed not quite dead, but perhaps a little more complex than it once was. Performance increases continue as we integrate parallel cores, but we're also seeing chips become a lot more power hungry. All of this will continue being dynamic as demand continues to outpace supply for high-performance chips. So as we look ahead, what does all this mean for competition and cost? You'll learn a lot more about that in the rest of our AI hardware series, tackling the questions that everybody is asking, including: we currently don't have as many AI chips or servers as we'd like to have, so how do you think about the relationship between compute, capital, and the technology that we have today? Yeah, that's the million dollar question, or maybe the trillion dollar question, I don't know. We'll see you there.

Thank you so much for listening to the a16z podcast. What we're trying to do here is provide an informed, clear-eyed, but also optimistic take on technology and its future, and we're trying to do that by featuring some of the most inspiring people and the things that they're building. So if that
is interesting to you and you'd like to join us on this journey, go ahead and click subscribe, and make sure to let us know in the comments below what you'd like to see us cover next. Thank you so much for listening, and we'll see you next time.
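As a footnote to the precision discussion in the episode, here is a small sketch (standard library only, ours rather than from the video) that unpacks a float's IEEE 754 sign/exponent/fraction fields and applies the kind of naive normalize-then-round 8-bit quantization Guido describes. The shared-scale scheme and the divisor of 127 are illustrative assumptions; production quantizers are considerably more sophisticated.

```python
import struct

def fp32_fields(x):
    """Split a float's IEEE 754 single-precision encoding into its
    1 sign bit, 8 exponent bits, and 23 fraction bits."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]

def quantize_int8(values):
    """Naively map floats onto signed 8-bit integers with one shared
    scale, normalizing by the largest magnitude so nothing overflows."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale

sign, exp, frac = fp32_fields(-1.5)
print(sign, exp, frac)  # 1 01111111 10000000000000000000000

q, scale = quantize_int8([0.1, -0.4, 0.25])
print(q)                # [32, -127, 79]
```

The quantized weights occupy a quarter of the memory of FP32 and can be processed with faster integer math, which is exactly the "shorter integers" trick mentioned above.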
Info
Channel: a16z
Views: 20,316
Id: -s_Ui5j0Guw
Length: 15min 24sec (924 seconds)
Published: Wed Aug 16 2023