AMD Just Rocked the Data Center World with Milan-X, MI200, Genoa, and Bergamo

Video Statistics and Information


Timestamps:

  • 00:00 Introduction
  • 01:17 Milan-X
  • 10:25 Instinct MI200
  • 21:43 Genoa and Bergamo
  • 26:05 Reading Between the Lines
  • 27:57 Wrap-up
👍 8 · u/InvincibleBird · Nov 09 2021

Excellent summary. 2022 will be "Bulldozer's Revenge" for AMD...;) --should be another banner year for AMD!

👍 2 · u/waltc33 · Nov 10 2021

Yeah well, Nvidia rocked it more. :D

👍 3 · u/Star_Pilgrim · Nov 10 2021
Captions
Hey guys, this is Patrick from STH, and what a day: AMD just dropped at least three big bombshells in the server market, and they're absolutely super cool, so let's get talking about them.

First, the high level. We have Milan-X, which will let you have 1.5GB of L3 cache in a single server, and those servers are already being deployed today. AMD is also doing to NVIDIA roughly what it did to Intel: it now has a multi-die GPU with crazy performance and crazy amounts of memory, up to 128GB of HBM2e high-speed memory on a single GPU, which is more than a lot of single-socket servers, EPYC or Xeon, have today. And in terms of future CPUs, AMD is saying its next-generation Genoa parts will have 96 cores, and those are 2022 parts; then, in production in 2022 but really coming to systems in 2023, AMD Bergamo will bring a total of 128 cores to those same sockets. So AMD is saying that in about five or six quarters we're going to have CPUs with twice today's core count, plus a whole bunch of other features. Let's start unpacking the crazy things AMD announced today, beginning with Milan-X.

To explain Milan-X, instead of going into crazy detail that a lot of people just aren't going to care about, I want to give you the high level so you can talk about Milan-X with your friends. In a Milan CPU package there's a central I/O die, and eight CCDs sit around it. Each CCD can have up to eight cores, and eight cores times eight CCDs gives you the total of 64 cores. Each CCD also has a standard 32MB of L3 cache, and 32MB across eight CCDs gives you 256MB of L3 cache, which is an absolute ton. AMD said: that's not really enough. What if we had way more cache, going from 256MB on a chip up to 768MB, basically three quarters of a gigabyte of L3 cache, on a single package? And that's basically what they did.

How they did it: very, very small copper wires run from the base die, which has the compute cores and the original L3 cache, up to a new die that carries the extra cache. That's how they stack cache on top of the existing L3, and that 2.5D stacking is how they're adding capacity: tiny cache dies bonded onto the existing chips. The specifics are of course a lot more complex, but that's the easy way to understand it.

And just to be clear, because somebody out there is probably wondering why they didn't do this before: stacking chips like this is relatively new, especially on big chips like server chips. This is really on the leading edge of what we're capable of manufacturing today, so it's not an "it's so simple, why didn't they do it before" feature; it's an absolutely new capability, and it's super cool because of that.
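Since the cache math comes up a few times in this section, here's a minimal sketch of the Milan-X arithmetic as described above (the per-CCD figures are from the announcement; the script itself is just for illustration):

```python
# Milan vs. Milan-X L3 cache arithmetic, as described above.
CCDS_PER_SOCKET = 8
BASE_L3_PER_CCD_MB = 32      # standard Milan CCD
STACKED_L3_PER_CCD_MB = 64   # extra stacked cache die per CCD on Milan-X

milan_l3 = CCDS_PER_SOCKET * BASE_L3_PER_CCD_MB                               # 256 MB
milan_x_l3 = CCDS_PER_SOCKET * (BASE_L3_PER_CCD_MB + STACKED_L3_PER_CCD_MB)   # 768 MB
dual_socket_l3 = 2 * milan_x_l3                                               # 1536 MB

print(f"Milan:     {milan_l3} MB L3 per socket")
print(f"Milan-X:   {milan_x_l3} MB L3 per socket")
print(f"2P server: {dual_socket_l3} MB = {dual_socket_l3 / 1024:.2f} GB L3")
```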
So: add 64MB of stacked cache to the 32MB each CCD already has, multiply by eight CCDs, and that's how you get to 768MB of cache per chip; two chips in a system gives you about 1.5GB of cache in a dual-socket server. And while that may sound a little far-fetched, it's actually really practical, because Milan-X works in existing Socket SP3 servers. You'll most likely need a BIOS/firmware update, but other than that you'll be able to just drop them into the AMD EPYC 7003 series PCIe Gen4 systems that we review on STH all the time; you can literally look through our server series and see a ton of reviews of those systems.

During the announcement, AMD said you're going to start seeing Milan-X in systems from places like Cisco (Dr. Lisa Su is on the board of Cisco, so that's probably why they were first), but also Dell, HPE, Supermicro, Lenovo, and all those guys, most likely in the first quarter of 2022, which is only a couple of months away. That doesn't mean these chips aren't already out there, though. From what I've heard, they've been deployed in the Microsoft Azure cloud for some time. Azure has an HPC service with InfiniBand between the nodes, and their pitch is basically: why would you, as a company or research organization, run your own cluster when you can rent time on ours, where we'll have the newest, best hardware probably before you could even get through your own purchasing process and deploy it in your data center? Part of that is that Microsoft is a giant CPU customer, so they can go to AMD and say, "I heard you can do this; this is what we want," and AMD says, "Sure," and drives truckloads of chips over. Practically, that means that although Milan-X won't come to mainstream servers until early 2022, Microsoft Azure is showing its customers instances based on these chips today, as of this announcement.

Last week, before this announcement, I talked to the Microsoft Azure folks and asked how long the transition would take and what the plan was. They said they're taking their HBv3 instances, which have been plain Milan instances for a long time, and upgrading them to the Milan-X chips. So going forward, an HBv3 instance will have Milan-X with the additional cache: 768MB of L3 per socket, or 1.5GB of L3 total. They also told me they've been deploying these parts for some time.
Between normal maintenance windows and people coming off of and onto systems across the cluster, they're going to do a fleet upgrade and put these higher-cache parts into their servers, and from then on HBv3 simply is the high-cache parts. I asked why they didn't just do this in the first place, and the answer was that the parts weren't ready when they originally launched the instances. I also asked why they aren't making a new instance type, and the answer, maybe not in these exact words, was that the red tape inside Microsoft is pretty significant: from a corporate standpoint it's much easier to upgrade the capabilities of HBv3 than to publish a new instance type and push it through all of the corporate processes. We'll flash up a list of the Microsoft Azure instances so you can see what the upgrades are.

I also really liked that we got to talk a little about performance with them. One of the really interesting things they said is that some applications see crazy speedups from this additional cache. A key concept we had heard is that there's a slight penalty when you go from the standard 32MB of cache on the base die up to the new stacked cache dies: a small latency hit. The Azure folks said, yes, that's true, but you have to think about it a little differently. Your best case is still a hit in the base die's own L3, so you're not any worse off there, and when you do have to go to the extra cache, you're avoiding a trip to main memory. In the AMD EPYC architecture, that trip means going from the CCD to the I/O die, out to memory, and all the way back, and that chain is so expensive and high-latency compared to hitting L3, even stacked L3, that you get way better performance than going to memory. So the applications that can keep their data in L3 instead of memory see crazy speedups, and there's a chart showing just how big that is.

To give you an idea of the workloads: WRF is a weather simulation, and these are big jobs; others include computational fluid dynamics and things like crash-test simulation. All kinds of applications take advantage of this giant effective memory bandwidth. One of the big lessons people have learned over the years in high-performance computing is that you can have a whole bunch of really fast cores, but if you don't have the data to feed those cores, they just sit around waiting. That stall in the CPU is exactly what this whole L3 cache push solves.
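One way to see why a small stacked-cache latency penalty still wins big is a quick average-access-time estimate. Everything below, the latencies and hit rates especially, is a hypothetical illustration, not an AMD or Azure figure:

```python
# Back-of-envelope average access latency (ns). All numbers are
# illustrative assumptions, not measured Milan-X figures.
BASE_L3_NS = 12       # hit in the CCD's own L3
STACKED_L3_NS = 16    # hit in the stacked cache die (slightly slower)
DRAM_NS = 110         # CCD -> I/O die -> DRAM -> back

def avg_latency(base_hit, stacked_hit):
    """Weighted average latency given hit fractions; the rest goes to DRAM."""
    dram = 1.0 - base_hit - stacked_hit
    return base_hit * BASE_L3_NS + stacked_hit * STACKED_L3_NS + dram * DRAM_NS

# Plain Milan: say 60% of accesses hit L3 and 40% go to DRAM.
print(f"Milan:   {avg_latency(0.60, 0.00):.1f} ns")
# Milan-X: the bigger L3 converts most of those DRAM trips into
# slightly-slower stacked-L3 hits, and the average drops sharply.
print(f"Milan-X: {avg_latency(0.60, 0.30):.1f} ns")
```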
Now, some of these codes, like OpenFOAM, are open source, but others, like Fluent, carry very expensive licenses, to the point that people build crazy hardware configurations just to maximize per-core performance and therefore pay less in licensing; the license costs completely dwarf the cost of the underlying hardware. So giant speedups are absolutely awesome: not only do you get your work done faster, you're probably doing it at a lower cost too. And what Microsoft is basically saying is: we're going to give you all this extra performance over what we deployed this year; not the 15% bump you might have been used to previously, but roughly 70% more performance within the same year. That is absolutely crazy.

Looking forward to next year, we're certainly going to see more memory on chip. We did an entire piece about the transition from the megabyte era to the gigabyte era of on-package memory, and we'll call Milan-X the 0.75GB chips, three quarters of a gigabyte each, even though a pair of them gives you 1.5GB. That really shows we're going to see more of this in the next year; Intel is already talking about Sapphire Rapids having integrated HBM on package, and that'll be in the gigabyte era as well. I definitely think we're going to see more of this as we get to 2022 and beyond, but this is the first step, and it's super cool.

OK, let's not do Bergamo and Genoa just yet; let's go to the accelerator side instead, and specifically the AMD Instinct MI200. Yes, I do pause and want to say "Radeon Instinct" every time, but it's now just AMD Instinct MI200. There are a couple of key pieces of background, and I'm going to call this an accelerator for a very specific reason. The first is that AMD split its GPU line a little while ago into RDNA, the architecture you really see on workstation and gaming GPUs, and CDNA, a more compute-focused architecture. Although we call both of them GPUs, realistically the things you put in your workstation and game on are graphics processing units, whereas AMD's CDNA chips are really more like high-performance computing accelerators. I don't know if that's a popular take, but it's how I think about it, because you're never going to run video games on your MI200. They're bifurcating their market between the GPU side and the accelerator side, so we're going to call these accelerators.

The other piece of background is that this is the actual design AMD took to the Department of Energy, saying: I heard you want to build an exascale supercomputer with Frontier; we're going to put these accelerators in there, this is going to be our architecture, and we're going to get you to exascale. And the Department of Energy said: yeah, this actually looks really cool, we're going with your design. Previously, the big supercomputers in the US were based on IBM POWER CPUs paired with NVIDIA GPUs.
So this was a big win for AMD, and this is the architecture that really got them that win. Let's get into the AMD Instinct MI200 series, because what they're doing here is super cool: if you know what AMD did to Intel on the CPU side, they're basically doing the same thing to NVIDIA on the GPU side. Of course NVIDIA will respond at some point, but look at what they've built: two compute dies next to one another, with a little bridge chip between them, so you get a lot of bandwidth between the two dies, and that gives you a lot more acceleration. Each die has four packages of HBM2e memory, for up to 64GB per die, and two dies give you your 128GB of HBM2e, plus two compute dies, on one package. AMD calls this bridge its 2.5D Elevated Fanout Bridge technology, but the easy way to think about it is a tiny chip placed between the two big chips that serves as the communication bridge, giving really high-speed and relatively low-power communication between them. We're not going to go through this slide in great detail; hopefully that's enough for most folks to understand what's going on.
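As a sketch of the memory arithmetic (the stack count and per-die total are from the description above; the per-stack capacity is just the implied figure, so treat it as an assumption):

```python
# AMD Instinct MI200 HBM2e capacity, per the description above.
HBM_STACKS_PER_DIE = 4
GB_PER_STACK = 16          # implied by 64 GB per die; an assumption
COMPUTE_DIES = 2

per_die = HBM_STACKS_PER_DIE * GB_PER_STACK    # 64 GB per compute die
per_package = COMPUTE_DIES * per_die           # 128 GB per package
print(f"{per_die} GB per die, {per_package} GB of HBM2e per package")
```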
The other big thing I really wanted to point out is that AMD is adopting the OAM, or Open Accelerator Module, form factor, and OAM is a big deal. We've been covering OAM on STH since it was first announced at the 2019 Open Compute Summit. It's a big Facebook initiative, and the reason it matters is that OAM was basically Facebook and other cloud service providers saying: hey NVIDIA, your SXM systems with those custom GPU form factors are really cool; it's great that you can get more power to the GPUs, better cooling, and high-speed interconnects between them; love that, NVIDIA, but we don't like that it's NVIDIA-only, and we don't want NVIDIA-specific systems. So instead there's the OAM, or OAI, ecosystem. There are roughly two ways these get used. One is a CPU connected to four accelerators, simply using OAM as the module form factor. The other big one is the universal baseboard, or UBB: a board you can put eight OAM modules into, which carries all the connections so you can create different topologies between the accelerators and out to the CPUs, with options for external cabling so you can build out crazy topologies across whole bunches of accelerators.

AMD doing OAM here is not unique; OAM isn't an AMD form factor, it's an industry-standard one we're going to see a lot more of starting next year. Intel's Ponte Vecchio is going to be OAM; the old Nervana NNPs were OAM, although those aren't really happening anymore; Habana Labs, which is now Intel Gaudi, uses OAM; Xilinx has shown off OAM hardware before; and we've seen other AI startups use the form factor as well. Everybody's going to do it, because the cloud providers want it, and the cloud providers buy enough hardware that they get to start dictating form factors, just like we're seeing with EDSFF on the SSD side.

AMD says that moving to OAM lets them raise the TDP up to 550W. OAM can go way beyond that; there are 800W options people have been talking about for years, and I don't know if anyone has gone past that yet, but much higher-power accelerators are something we're going to see for the next few years, no question. Going to 550W also means AMD has more thermal headroom and power capacity than something like an SXM4 NVIDIA A100. We've looked at the A100 in a couple of venues: the Dell PowerEdge XE8545, a Redstone platform based on 400W NVIDIA A100 SXM4 GPUs; another Supermicro system you'll see a video on pretty soon; and eight-SXM4 systems such as the Inspur machine we looked at earlier this year. We've also compared the 400W and 500W parts; you don't really see 500W listed by NVIDIA, but you can get an 80GB 500W A100 in SXM4, usually in liquid-cooled servers, and we benchmarked the 40GB 400W and 80GB 500W versions in our liquid-cooling comparison video. Just to be clear, AMD definitely has a bit more power headroom: I think they were comparing against the 400W A100, which means about 150W more per accelerator, a significant amount of power. But an increase in an accelerator's TDP doesn't mean the whole system's power consumption goes up by that same amount, so just be very clear on that.

What AMD did was line up the FP64 performance of NVIDIA GPUs: here's basically the K80; here's the P100 (we did an eight-P100 system a long time ago); the V100 (we did an eight-V100 system); the A100 (we did an eight-A100 system); and then here's our quantum leap. They're showing an almost 5x improvement over A100 FP64 performance, which is really a lot: a normal generation might bring 50-70% more vector performance, not 5x, so that's absolutely huge. Something we did want to point out, though, is that it's not necessarily 5x in everything. On bfloat16 it's only about 1.2x ahead, which is actually not that much, and FP16 is also about 1.2x ahead, so it does depend on what kind of math you're doing. But FP64 is really popular in the high-performance computing and supercomputing world, so that tells you what they're targeting with this part.
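Putting the announced ratios next to the TDPs gives a feel for the efficiency story. The math below just combines the figures quoted above (about 5x FP64, about 1.2x bfloat16, 550W versus a 400W SXM4 A100); these are marketing numbers, and perf-per-watt from TDP alone is a crude proxy:

```python
# Rough ratios from the figures quoted above (marketing numbers, not benchmarks).
FP64_SPEEDUP = 5.0     # MI200 vs. A100, vector FP64
BF16_SPEEDUP = 1.2     # much smaller gap for AI-style math
MI200_TDP_W = 550.0
A100_TDP_W = 400.0

power_ratio = MI200_TDP_W / A100_TDP_W               # ~1.38x the power...
print(f"Power ratio:       {power_ratio:.2f}x")
print(f"FP64 perf/W ratio: {FP64_SPEEDUP / power_ratio:.2f}x")  # still ~3.6x ahead
print(f"BF16 perf/W ratio: {BF16_SPEEDUP / power_ratio:.2f}x")  # actually below 1x
```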
The other thing that's really obvious when you compare an NVIDIA presentation and an AMD presentation: Jensen gets up, says the A100 is great, and talks mostly about AI, whereas you'd go into the AMD presentation thinking, these are both data-center GPU-ish accelerators, is this going to be the same AI story? It isn't. AMD's story is not really all about AI; it's about high-performance simulation and that kind of compute. AMD will say they have ROCm and have ported a whole bunch of AI frameworks so you can use them on their accelerators, but at the end of the day you can see that the two organizations are leaning into different aspects of acceleration. They don't really go into it a lot here, but AI is a big area that NVIDIA invests in heavily, and that's maybe why NVIDIA isn't doing as much on FP64 performance.

Now, form factors, and this one is actually kind of funny, as I'll show you in a second. Again we have OAM, which will be the MI250 and MI250X, and then a PCIe card, the AMD Instinct MI210, which lets the silicon get into way more servers. These OAM servers are probably going to be five kilowatts or more as their baseline, and at some point that can't be deployed everywhere, so you need things like PCIe cards to get the chips into more architectures. The big challenge with PCIe cards, of course, is cooling: you only have so much card area to force air through, so you typically can't reach the same power levels. The reason this is a little funny is that just before the AMD event I was finishing up two videos where we have eight NVIDIA A40 GPUs, which are 300W cards, really about the maximum you get in a PCIe card. As a quick aside: if you want to see one of those eight- or ten-GPU PCIe servers, which we're about to review on the STH main site (we'll probably do videos of them too, so subscribe and turn on notifications), you can actually see one just behind me; they should be online hopefully before Supercomputing 21. Because PCIe cards are stuck at much lower power levels, people are saying: if I'm going to pay all the money to have the chips made, with HBM2e memory and all the rest, I might as well let them run at higher power levels and get more performance per chip, because power costs less than buying more chips. That's basically why OAM is going to be the higher-end model, while the PCIe MI210s come a little bit later.
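That "power costs less than buying more chips" argument is easy to sanity-check with toy numbers. Everything below, the prices and performance figures especially, is invented purely for illustration:

```python
# Toy comparison: reach 8 units of throughput with 300 W PCIe cards
# vs. fewer, higher-power OAM modules. All prices and perf numbers are invented.
ACCEL_PRICE = 12_000          # $ per accelerator (hypothetical)
KWH_PRICE = 0.10              # $ per kWh (hypothetical)
HOURS_3YR = 3 * 365 * 24      # a three-year service life

def total_cost(n_chips, watts_each, perf_each, perf_target=8.0):
    """Hardware cost plus 3-year energy cost to hit a throughput target."""
    assert n_chips * perf_each >= perf_target
    energy = n_chips * watts_each / 1000 * HOURS_3YR * KWH_PRICE
    return n_chips * ACCEL_PRICE + energy

# 8 PCIe cards at 300 W and 1.0 perf each, vs. 6 OAM modules
# pushed to 550 W for 1.4 perf each: fewer, hotter chips win.
print(f"8x 300 W PCIe: ${total_cost(8, 300, 1.0):,.0f}")
print(f"6x 550 W OAM:  ${total_cost(6, 550, 1.4):,.0f}")
```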
But now let's get to the future of AMD server processors and start talking about Genoa and Bergamo. AMD gave us a ton of really cool information about these two chips, so let's get into what they are. First off, these are AMD's 5nm-generation chips, and 5nm means better performance, more efficiency, and more stuff crammed into a given die size. AMD is definitely using that: with Genoa we're going to see a total of 96 cores per CPU. We're at 64 today, so the next generation is 50% bigger. We're also getting features like DDR5 and PCIe Gen5, and along with PCIe Gen5, AMD confirmed that Genoa's generation will have CXL 1.1, so we're going to start seeing CXL; we have a video on CXL if you want to check it out.

Genoa will be based on the upcoming Zen 4 architecture, and Zen 4 is really the evolution of Zen 3. If you look at what Zen 3 is being used for, there's a whole bunch of different things, and that makes sense: AMD had basically no server market share in early 2017, but over the last couple of years they've gotten to double-digit share, and because of that they can now say, maybe we can do some more customized chips. I know a lot of you on YouTube and the STH main site are going to say you wish they had a low-end chip to go after the Xeon E series or something like that, but it looks like they're instead targeting the bigger, higher-volume, higher-dollar markets, and that's where they're going to bifurcate the line. Specifically, they're going to have Zen 4c, which enables even more cores: AMD Bergamo, using Zen 4c, will have up to 128 cores.

AMD held a very small pre-briefing maybe an hour before the event to walk through this, and what they basically said is: yes, Genoa and Bergamo are completely different chips, but they both use the Zen 4 ISA, so if you have code that runs on Genoa, it'll also run on Bergamo. There's nothing weird like a stripped-down core or extra instructions in Bergamo; it's exactly the same ISA, they're just doing a bit of rebalancing in where they spend silicon area, frequencies, and so on. A good example they talked about is cache optimizations, which I read as: maybe they'll organize the cache differently, or have less cache per core on Bergamo, which in turn lets them fit more cores onto the package. They also said the parts will be socket compatible, so in theory you could take a system with a Genoa CPU and upgrade it to a Bergamo chip. And I know a lot of people are going to ask: why not just make everything a 128-core processor? Why would anyone want 96 cores? I think that really gets into optimization.
What we're starting to see a lot more of is companies looking at cloud workloads, and let's call it what it is: running web servers and that kind of thing just isn't that hard in terms of computational power. So on the Arm side, the vendors are saying: we don't need crazy HPC simulation capability, and all the infrastructure involved in doing crazy HPC simulations, in cloud processors headed for Oracle's cloud and the like. With parts like the Ampere Altra and Altra Max, they're basically saying that's a completely worthless way to spend their transistor budget; instead, they build things that actually let cloud providers deliver better service. Cloud providers care about having a certain amount of performance per core so they can sell vCPUs, and having lots of vCPUs in fewer systems means it costs them less to deliver a vCPU, so all of a sudden they can make even more money than they're already making. So Zen 4c looks like the same kind of cloud optimization the Arm vendors are doing, with AMD saying: hey, we can do that too; we're going to be on 5nm as well; why not do a cloud-optimized version of our chips and expand our core counts?

Something else that didn't necessarily come out in the webcast, not said explicitly but very heavily hinted, is that the 3D V-Cache in Milan-X, the idea of stacking additional cache onto chips, is not a one-off. It did not sound like "we'll use this for Milan and probably never again"; it definitely sounded like AMD sees it as technology it hopes to advance and use in more products. So I'd say there's a good chance future AMD products will have as much cache as the current generation, if not significantly more, and I think we're going to see stacked V-Cache in more chips in the future.

Also, real quick: Bergamo is slated to be in production maybe in 2022, but realistically I think early 2023 is when we'll see it hit servers, so get ready for that. It still means we go from 64 cores today, jump to 96 cores in a couple of months, and then land at 128 cores from AMD EPYC about five or six quarters later, which is a lot of cores by any measure.
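The core-count roadmap as plain arithmetic (the dates are the rough ones given above):

```python
# AMD EPYC cores per socket, per the roadmap described above.
roadmap = [
    ("Milan (today)",          64),
    ("Genoa (2022)",           96),
    ("Bergamo (~early 2023)", 128),
]

base = roadmap[0][1]
for name, cores in roadmap:
    print(f"{name:24s} {cores:4d} cores  ({cores / base:.2f}x vs. today)")
# Genoa is +50% over Milan; Bergamo is 2x in roughly five or six quarters.
```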
Overall, this was a lot of really cool information about the future of the data center from AMD, and there were a couple of other disclosures we should mention. One of the big ones, at least to me, is that AMD is winning customers: they announced on stage that Facebook is going to be buying AMD processors, and that's a big deal because Facebook has been a fiercely Intel shop for years and years. All the other cloud providers had been at least dabbling with AMD chips, but Facebook kept buying Intel, so this matters, and I think we're going to see a little more of it at Open Compute this week, which I need to make my flight for very soon.

So anyway, guys, I hope you liked this look at the future of the data center from AMD. There's a lot of really cool stuff here, and I'm super excited for 2022, 2023, and beyond, because we've gone from a super-stagnant server world to doing crazy stuff, and it's a lot of fun. Give this video a like, click subscribe, and turn on notifications so you can see whenever we come out with great new videos; we've got a ton of reviews and event coverage coming. And as always, thanks for watching. Have an awesome day.
Info
Channel: ServeTheHome
Views: 28,101
Keywords: AMD, Milan-X, Milan, EPYC 7003, EPYC 7003X, Bergamo, EPYC 7004, Genoa, AMD Genoa, AMD Bergamo, AMD MI200, AMD Instinct MI200, MI200, Azure, AMD Azure, NVIDIA A100 v AMD, AMD Zen 4, AMD Zen4, Zen4, Zen 4, Zen 4c, Zen4c, AMD Zen 4c
Id: ZEDKNtt-erk
Length: 28min 33sec (1713 seconds)
Published: Tue Nov 09 2021