Your Next CPU is Bigger Than Your HEAD 🤯 Cerebras Wafer Scale 2

Captions
In modern times, when you think of big CPUs, do you think of Xeons? Maybe Threadrippers? Maybe even big GPUs? These processors are all tiny: they max out at around 800 square millimeters. There's now a processor on the market which is 46,225 square millimeters, and it's from a company called Cerebras. What's your minimum specification?

For today's tale we're going into the world of AI processors. Modern CPUs like Threadripper, EPYC, or the Xeon stuff, and even GPUs, all do various amounts of AI compute. On one hand you have training, the heavy memory-intensive, compute-intensive workload where you train an algorithm; on the other you have inference, where you apply that algorithm to find out whether an image is a cat or a dog, or analyze video footage to determine whether something is a human, a traffic cone, a stop sign, what have you. The demand for high-performance AI compute is so great that people don't want to simply use CPUs or GPUs anymore; they build dedicated silicon for the task at hand. We have companies producing their own silicon for their own workloads, like Amazon, Alibaba, or Baidu, and we have what are called pure-play AI chip companies trying to produce silicon for others to use. Cerebras is one of these: companies that use GlobalFoundries, TSMC, or Samsung to build chips optimized for, say, analyzing so many images per second, doing video really well, or training at scale across thousands and thousands of nodes to get the most accurate models possible.

Cerebras has decided that just having one chip is not enough; even something chiplet-like, as AMD does, isn't enough. Cerebras today is announcing Wafer Scale Engine 2, its second big chip that's as big as your head. I've had a look at this chip before; here's an image of me from Supercomputing 2019 with Wafer Scale Engine 1. This is the biggest chip you can make from a single wafer. Wafer Scale Engine 1 had 400,000 cores and 18 gigabytes of onboard SRAM, peaked at about 23 kilowatts, and was built on TSMC's 16-nanometer process. The idea is that with a chip this large you don't need racks and racks of GPUs to do your AI compute: having it all on one chip means you can get away from the complex coding needed to spread a workload across multiple devices. Wafer Scale Engine 1, the first generation, was announced in August 2019 at Hot Chips.

This time around, though, Wafer Scale Engine 2 is pretty much twice of everything. We've still got the same die size, 46,225 square millimeters, bearing in mind that the next biggest GPU chip is 826 square millimeters. It has 2.6 trillion transistors, up from 1.2 trillion; 850,000 AI cores; and 40 gigabytes of onboard SRAM. The second generation very much carries the philosophy of the first: doing more without having to scale. The beauty of it is that Wafer Scale Engine 2 can also scale.

Cerebras doesn't sell the chip on its own; you can't just sell a chip the size of a dinner plate, you have to sell it in a system. The system they provide is about a third of a full rack. For that you get the chip and twelve 4-kilowatt power supplies, six active and six redundant; the chip peaks at about 23 kilowatts, so the six active supplies give 24 kilowatts of capacity, just above peak load. For off-chip I/O you have twelve 100-gigabit Ethernet ports. The whole system is self-contained: Cerebras has designed its own liquid cooling system inside to cool this massive monster. The front panels are machined from a single piece of aluminium (or aluminum, if you're from America), and it looks like a really cool design.

Now you're probably wondering how much one of these things costs. Wafer Scale Engine 1 cost about two to two and a half million dollars, and the reason we know this is that the Pittsburgh Supercomputing Center got a grant for five million dollars and was able to buy two, combined with an HPE Superdome Flex system for data transfer and control, to build its Neocortex supercomputer. Deployments of these systems have already happened in government research facilities like Lawrence Livermore and Argonne National Laboratory; in big pharma, with GlaxoSmithKline as a big customer; in oil and gas; and at a few places Cerebras doesn't want to talk about. We asked how many customers they have, and they say dozens.

With Wafer Scale Engine 2, the new system is called CS-2. It looks pretty much the same but does everything 2x as much, built on TSMC 7-nanometer this time around, and the price we've been told is "several millions". In my article on AnandTech I've written "an arm plus a leg"; maybe you have to add a firstborn in there as well.

Now, with some of these AI chips, it's difficult to see how somebody would use them: how can we take our CPU or GPU AI workload and implement it on such a different piece of silicon? Cerebras has its own graph compiler and software stack, and the beauty of it is that if you have a standardized PyTorch or TensorFlow model, you add in two or three lines of code, use the Cerebras compiler, and it will just work on the system. The main benefit is that you don't have to scale out to thousands of nodes if your simulation problem is that big, so you save time by not having to develop that element of your compute solution.
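To make the "two or three lines of code" claim concrete, here is a minimal sketch assuming a completely ordinary PyTorch model. The PyTorch portion is standard and runnable on its own; the Cerebras-specific lines are hypothetical placeholders shown as comments, because the video doesn't detail the real entry points of Cerebras's software stack.

```python
import torch
import torch.nn as nn

# A standard PyTorch model: nothing Cerebras-specific here.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # e.g. cat vs dog
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical integration point: per the video, roughly two or three
# added lines hand the unchanged model to the Cerebras graph compiler.
# These names are placeholders, NOT the real API:
#   import cerebras_stack as cb          # hypothetical
#   model = cb.compile(model)            # hypothetical

# An ordinary training step; on a CS system the same loop would run
# against the compiled graph instead of the local CPU/GPU.
x = torch.randn(8, 3, 32, 32)        # a fake batch of images
y = torch.randint(0, 2, (8,))        # fake cat/dog labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

The point of the sketch is the shape of the workflow: the model and training loop stay as they are, and only the hand-off to the compiler changes.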
Cerebras has also had its chips used in high-performance compute, specifically stencil-based finite-difference workloads: we're talking Navier-Stokes 3D fluid simulation, CFD. Compared with a lot of the AI chips out there, none of which can do CFD at all, this chip does CFD; it can do so much.
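As a minimal sketch of what stencil-based finite-difference compute means, here is a toy 2D diffusion solver in NumPy. It is not Cerebras code and not a Navier-Stokes solver; it just shows the neighbour-only access pattern (a 5-point stencil) that makes these workloads map well onto a large grid of cores with local memory.

```python
import numpy as np

def diffuse(u, alpha=0.1, steps=100):
    """Explicit finite-difference update for 2D diffusion, u_t = alpha * laplace(u).

    Each grid point is updated from its four neighbours -- a 5-point
    stencil, the same access pattern used (at vastly larger scale) in
    the CFD workloads mentioned above.
    """
    u = u.copy()
    for _ in range(steps):
        lap = (u[:-2, 1:-1] + u[2:, 1:-1] +      # up + down neighbours
               u[1:-1, :-2] + u[1:-1, 2:] -      # left + right neighbours
               4.0 * u[1:-1, 1:-1])              # centre point
        u[1:-1, 1:-1] += alpha * lap             # explicit Euler step
    return u

# Toy example: a hot spot in the middle of a cold plate.
u0 = np.zeros((64, 64))
u0[32, 32] = 100.0
u = diffuse(u0)
print(f"peak after diffusion: {u.max():.3f}")
```

Because every point needs only its immediate neighbours, the grid can be tiled across cores with mostly nearest-neighbour communication, which is exactly the kind of traffic an on-wafer mesh handles well.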
Out of all the things that have happened in the AI space, in the silicon space, chiplets from AMD were kind of cool; they helped go beyond the reticle limit. What Cerebras is doing here is just another level completely. When you're making chips at a fab, you essentially have a rectangle within which you can print, and your chip has to fit inside that rectangle, what's called the reticle limit; for most modern processes it's about 800 to 850 square millimeters. Cerebras has been able to make a chip this big because it developed, and patented, technology for cross-reticle communication.

Now you're thinking: okay, you have a chip that big, what about yield? How does yield come into it? Cerebras actually yields a hundred percent: every chip they make, they can sell. The reason they can do this is in the design. There are 400,000 AI cores on Wafer Scale Engine 1 and 850,000 cores on Wafer Scale Engine 2, and if a core has a defect on it, it can simply be bypassed: the metal layers are built such that if one core can't be used, for defect reasons or any other reason, it's routed around. Cerebras said that in the first generation they earmarked about 1.5 percent of the area, which is almost 700 square millimeters, of spare AI cores, and they told me that was way too over-provisioned. So I fully expect that in Wafer Scale Engine 2 they've been able to cut that down, hence getting slightly more than 2x on the core count.

Again on the software side: the way they do their AI training or inference is to break down your AI compute into its constituent layers, and then the compiler maps the graph, layer by layer, onto the chip such that the data can flow through it. It's a dataflow paradigm, and it can work asynchronously as well, which means you can do hyperparameter search and run multiple models on the same chip at once.
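To illustrate the dataflow idea, where each layer is pinned to its own region of the chip so activations stream through rather than time-slicing one device, here's a toy allocator. The layer names and work figures are made up, and this is a conceptual sketch only; the actual Cerebras graph compiler is far more sophisticated and is not described in detail in the video.

```python
# Toy illustration of the dataflow mapping idea: give each network
# layer a share of a fixed pool of cores, proportional to its work,
# so a batch can stream through all layers at once, pipeline-style.
# This is a conceptual sketch, NOT the Cerebras graph compiler.

TOTAL_CORES = 850_000  # WSE-2 AI core count from the video

layers = [
    ("conv1", 1_800_000),   # (name, rough work estimate) -- made-up numbers
    ("conv2", 7_400_000),
    ("fc1",   16_000_000),
    ("fc2",   2_000_000),
]

total_work = sum(work for _, work in layers)

placement = {}
next_core = 0
for name, work in layers:
    # Proportional share of the core pool (rounding slack ignored in this toy).
    share = max(1, round(TOTAL_CORES * work / total_work))
    placement[name] = range(next_core, next_core + share)
    next_core += share

for name, cores in placement.items():
    print(f"{name:>5}: cores {cores.start:,} .. {cores.stop - 1:,}")
```

Once every layer is resident on its own block of cores, consecutive batches can occupy different layers simultaneously, which is what allows asynchronous execution and multiple models sharing the chip.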
What some of the deployments are doing is using their big supercomputers for the hard, grunt compute work, and then using the Cerebras system to do the AI optimization of their search space. Take drug discovery: AI is becoming a big part of drug discovery, guiding where scientists should look to get the best drug interactions. Supercomputers can synthetically test hundreds of thousands of compounds over days and months, but you can't just test everything; you need to narrow down your search space. They do this by building AI models to find out what sort of molecules work, and the AI models spit out suggestions for the big supercomputers to actually run and see if they're relevant. That's the sort of thing we're dealing with when we attach one of these systems to a supercomputer. I have asked for one to play with, and the answer is no.

The thing about Cerebras is that all the other AI companies, because they're dealing with small single chips, are competing against NVIDIA and others, so the total cost of ownership per chip has to be along the same lines as NVIDIA, Intel, and AMD. The more GPUs you buy, the more money you save. That's right: the more GPUs you buy, the more money you save. So we're talking anything from, say, a thousand to ten thousand dollars per chip, per card. Cerebras comes out with a two-plus-million-dollar solution for the first generation, and the second generation is going to be a few million more. For a company that's had some venture capital funding, I think about 112 million dollars so far, they're already profitable; they don't need to go through another round of funding. They're expanding their personnel, building up a team so they can develop future generations, but also support customers and deploy to more customers as and when needed. One of the issues with a lot of AI companies these days, if you go through the funding (we've just had Groq and SambaNova get almost a billion dollars of funding between them), is that neither of those is profitable. But Cerebras is, and this is a fundamental change in how we think about what a chip needs to compute.

Cerebras Wafer Scale Engine 2 is currently being tested by select partners, who are remoting into Cerebras systems at the headquarters; systems ship to customers in Q3, and orders are already in. In a decade of me covering chips, CPU, GPU, x86, this is by far one of the most ambitious, if not the most ambitious, paradigms of compute I've ever seen, and it's bloody amazing. If you don't follow chips, at least follow these guys. Honestly speaking, this is the sort of system that should get the TechTechPotato award; I should make up an award and send it to them, because they really deserve it.

So, in light of Cerebras, I have to say my minimum specification is now 2.6 trillion transistors. I think AMD and Intel are going to have to really up their game if they want to make chips that big. For those of you who liked this content, please give a thumbs up and a subscribe; if you didn't, please let me know in the comments exactly what I'm doing wrong. There's also a Patreon, and many thanks to all the Patreon members who have joined recently as we passed the one-year anniversary. You really helped make this channel what it is.
Info
Channel: TechTechPotato
Views: 128,120
Keywords: cerebras, wafer scale, cerebras systems, cs-2, wse, wse 2, ai cpu, ai SoC, cerebras wafer scale, tsmc, tsmc 16nm, tsmc 7nm, biggest cpu in the world, biggest cpu ever, ai inference, ml inference, ai training, ml training, tmsc, reticle limit, fastest cpu, fastest cpu in the world, fastest cpu ever, best gaming cpu, large cpu, most expensive cpu, techtechpotato, andrew feldman, ian cutress, future technology, new technology, 2050 technology
Id: FNd94_XaVlY
Length: 13min 12sec (792 seconds)
Published: Tue Apr 20 2021