5090 could be 70% FASTER than the 4090

Video Statistics and Information

Captions
"We need bigger GPUs." That phrase from Jensen encapsulates NVIDIA's sales pitch for Blackwell's first GPUs, the B200 and B100. The key innovation in these AI-focused GPUs comes not really from NVIDIA but from TSMC, which provided NVIDIA with a way to link two GPU dies together so that they work as one. This isn't the same as chiplets, mind you. But how will this translate to the Blackwell-based gaming GPUs? Is NVIDIA going to link two GPU dies together for those as well? Surely not, as that would be way too expensive, right? Well, perhaps there's actually something to that, and it was right under our noses for the last several years.

Today's video is sponsored by URcdkeys.com. If you want to activate your Windows, you can pay Microsoft $100 or more for a Windows 10 Pro key, or you can get one from a renowned seller like UR CD Keys, who have partnered with Coreteks for a discount on what is already a really low price, for a total of just $15 when you use my code. The keys work globally, and you can even use them to upgrade to Windows 11. After you've made your purchase you will find your key in your purchased orders on the UR CD Keys website. Click on "get keys" and copy the key. Then in Windows, click on Start, type "activate", and click on activation settings. Then click "change product key", paste the key you purchased, and click "next". With that done, Windows is activated. UR CD Keys is having a mid-March madness sale: in addition to Windows 10 Pro you can also get Windows 11 Pro at a discount with my code, and even Office 2021 using that same discount. A huge thanks to UR CD Keys for sponsoring today's video. Check the links in the description to get your cheap OEM Windows key today.

If you've been following the PC space for at least a few years, you might remember that NVIDIA introduced half-precision FP16 units in the Pascal generation in 2016 and followed that up with Tensor Core matrix engines in Volta in 2017: fixed-function hardware that targets machine learning workloads. For the last eight years, NVIDIA's GPUs
have gradually evolved to be compute engines first and pixel pushers second. Hopper introduced the Transformer Engine with 8-bit precision, and the new microarchitecture, code-named Blackwell, follows up on Hopper's focus on AI, particularly the large language models that power generative AI, with the added inference optimization that comes with support for fine-grained scaling of FP4.

One of the key data points from NVIDIA's Blackwell presentation was Jensen's description of the new chip: "We didn't change the architecture of Hopper, we just made it a bigger chip. We just used the latest, the greatest 10 TB per second interconnect, we connected the two chips together, and we got this giant 208 billion transistor chip." Jensen Huang said this while showing this chart. At first glance it looks like Blackwell is 30 times faster than Hopper in inference, and from that you might extrapolate that the jump in performance from the Hopper microarchitecture to Blackwell is truly gigantic. But you'll notice that the real jump in performance is actually from the blue line to the purple line. That's apples to apples: that's FP8 versus FP8. The green line is FP4, which Hopper only supported as an adaptive precision range. NVIDIA's charts at these presentations always try to present a much greater uplift than is factual. Because inference is becoming a larger slice of the AI challenge, it makes some sense to try and show how good Blackwell is at that particular workload. You don't need that much precision for the generation stage; you just need to generate convincing and satisfactory results, so FP4 is sufficient. Nevertheless, it's a chart that might mislead those not paying close attention. It should also be noted that this doesn't apply to training, where people are having trouble getting FP8 to work, let alone FP6 and FP4. But anyway, Jensen's admission that Blackwell is just over twice as fast as Hopper because it's two chips versus one chip effectively means that what NVIDIA has really done
is take advantage of TSMC's modest node optimization from 4N to 4NP and, more importantly, the much larger area available to fit more transistors in, particularly cache. So rather than challenging the paradigm and approaching AI hardware from a different angle, NVIDIA has gone beyond the physical limits and doubled down on brute-forcing GPU performance through larger transistor counts, in this case over twice the number of transistors, by combining two reticle-sized chips.

In some sense this is analogous to what Apple has been doing with their M series of chips, particularly the Ultra variants, when it comes to joining two chips together in a memory-coherent system. Apple's marketing calls this UltraFusion architecture, while NVIDIA calls it NV-HBI, or high-bandwidth interface. The principle is the same: high-speed fabric links couple two identical chips together, thus doubling the performance. In NVIDIA's case the chip-to-chip speed is 10 TB per second. Note that this isn't quite the same as chiplets, because the connected chips here function effectively as one unified chip. In both Apple's case and NVIDIA's, the credit really has to go to TSMC for their advances in packaging technology, which have effectively allowed NVIDIA to go beyond the reticle limit.

That of course doesn't come cheap. It's a lot of silicon, and it's no wonder NVIDIA isn't moving beyond the 5 nm TSMC variants here but rather staying on a more mature node, thus avoiding costly wafer defects that would result in a ton of wasted dies. Still, it's estimated that a dual-chip Blackwell B200 costs $6,000 to manufacture, and that can mean very bad news for gamers. Well, for poor gamers at least. It seems logical that NVIDIA will stick to the 4NP process node for the Blackwell gaming GPUs. In fact, as I write this, Twitter leaker kopite7kimi has shared that GB202 will indeed share the same process node as GB100, so that's 4NP. Back in October of last year I made a video discussing this, and I said back then that the
performance uplifts would be very modest: basically whatever uplift the node transition would allow for. Back then the rumors were saying that the 5090 would be 70% faster than the 4090, a claim that I disputed; my estimate was an increase of around 25 to 30%. And now kopite7kimi is claiming a 30% density increase, which is closer to my estimate. But would 70% be possible, now that we've seen Blackwell?

If we do some rough calculations on how many more transistors NVIDIA managed to cram into Blackwell compared to Ada, we see that Ampere's GA100 had a maximum of 54.2 billion transistors at reticle size, but that was on 7 nm, so a more fitting comparison for our purposes would be Hopper. Hopper's GH100 has 80 billion transistors at max reticle size, while Blackwell has 208 billion transistors across two dies, which means roughly 104 billion transistors per die. So you get a 30% increase in the number of transistors from Hopper to Blackwell, and roughly the same should be the case going from the 4090 to the 5090. Now, that would be a transition from TSMC's 4N process to 4NP, an optimized version, though it should be noted that the 4090 is on 4N. All of these nodes are part of the same 5 nm generation, but there are still density gains between each, so it's possible that there's a further optimization here compared to 4N. It's thought that the N in these names comes from NVIDIA, by the way, as in TSMC specifically optimizes these nodes for NVIDIA ASICs. Now, the 5090 would not be a full reticle-sized chip; it will likely be smaller. But still, 30% sounds about right.

However, if you look closely at the illustration diagram that NVIDIA showed, and assuming it closely matches the actual topology, you see eight blocks of compute distributed symmetrically across the two joined chips, with a one-to-one pairing with the HBM memory stacks. So this is a chip purpose-built for this configuration, and it does not represent the chip we will get with the 5090, which certainly won't have HBM memory. But note that this symmetry was
already present in H100 and GA100, where the chip is divided by an L2 cache that communicates with the other half, and that's the part that's relevant here. I always assumed this was just for redundancy, better yields, and scalability, but it seems NVIDIA was already laying the groundwork for this multi-die strategy back when they launched Ada and Hopper. If we look at AD102, annotated here by Locuza on Twitter, we see that it too was a symmetrical design with a split L2 cache connecting the two halves, and the same goes for Hopper, with a crossbar link. If that design philosophy translated from Hopper to Blackwell as two separate chips linked together, it stands to reason that the Blackwell gaming GPUs will see a similar symmetrical configuration, and that at the very top SKU the 5090 will feature two dies linked together.

So that 30% increase in transistors, plus say a 5% increase considering the fact that the 4090 was on the first generation of 5 nm, means that the 5090 would indeed be around 70% faster than the 4090 in a two-die configuration. The 30% uplift that I estimated back in October of last year would be true, yes, but for a one-die configuration; that could be the 5080. The naming scheme here doesn't really matter: it could be that the 5090 Ti is two dies and the 5090 is one die, but you get the point. Given that it's unlikely that the 5090 will be a reticle-sized chip, it remains to be seen if it can indeed reach that 70% increase. Further increases could come from architectural changes, like larger L1 caches in the SMs, for instance.

So why is this bad news for gamers? Well, can you imagine what NVIDIA will charge for a graphics card with two linked gigantic GPU dies? If the 4090 was selling for around 2 grand on the street when it came out, it's not crazy to think that the 5090 will sell for twice that, at 4 grand street prices. It should be noted that for the same price as a 4090, presumably, you will still be able to get a roughly 30 to 35% performance increase on what NVIDIA could call, let's say, the 5080, and
this would also explain why there are rumors circulating that AMD is not launching a top-tier GPU to compete with NVIDIA this generation, seeing as AMD doesn't have the possibility to do two linked dies based on their previous work on RDNA, at least not for this generation. A larger 7900 XTX on a newer node would probably only compete with the 5080, offering a 30 to 35% performance increase, but there's no way it would catch up to a 70% faster GPU like the 5090.

I'll be doing a more in-depth analysis of the NVIDIA presentation, as there are some other things that are worth discussing, but I wanted to get this very speculative thesis, let's call it, out to get your guys' feedback. Do you think NVIDIA is really launching a two-die GPU for the consumer market? Would the cost of that be worth it for NVIDIA? More importantly, and I need your honest answer here: would you spend $4,000 on a GPU that performs 70% faster than a 4090, or would you spend, let's say, $1,500 to $2,000 on a 5080 if it's 30% faster than a 4090? Let me know in the comments below. And since you're here, be sure to join my Patreon for more analysis and top-tier content. I have plenty of videos coming, but I need your support to keep the channel going, as income has been following the opposite trend of NVIDIA stock: it's going down, down, down. So join my Patreon today by following the link in the description or the end card. Thanks for watching, and until the next one.
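Since the video argues that FP4 is "good enough" for the generation stage of inference, here is a toy illustration of what that precision actually looks like. This is a minimal sketch assuming the common E2M1 FP4 format (1 sign bit, 2 exponent bits, 1 mantissa bit); it does not model Blackwell's fine-grained scaling, and the sample weights are invented for illustration.

```python
# Toy FP4 (E2M1) quantizer: every representable magnitude is one of
# {0, 0.5, 1, 1.5, 2, 3, 4, 6}, positive or negative. This grid is an
# assumption about the FP4 format; per-block scaling factors, which
# recover much of the lost range, are not modeled here.

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted([-m for m in FP4_MAGNITUDES if m > 0] + FP4_MAGNITUDES)

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    return min(FP4_GRID, key=lambda v: abs(v - x))

# Invented sample weights, just to show the coarseness of the grid.
weights = [0.07, -1.9, 2.6, 5.2, -0.4]
quantized = [quantize_fp4(w) for w in weights]
print(quantized)  # [0.0, -2.0, 3.0, 6.0, -0.5]
errors = [abs(w - q) for w, q in zip(weights, quantized)]
print(max(errors))  # worst-case error is large relative to FP16/FP8
```

The grid has only 15 distinct values, which is why FP4 tolerates the "convincing output" bar of generation but, as the video notes, remains hard to use for training, where small gradient errors accumulate.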
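The transistor arithmetic above is easy to sanity-check. The sketch below uses only figures quoted in the video (80 billion transistors for GH100, 208 billion for B200); the 5% node-maturity bump and the resulting uplift are the video's speculation, not measured numbers.

```python
# Back-of-the-envelope check of the transistor counts quoted in the video.

gh100_transistors_b = 80   # Hopper GH100, billions, reticle-limited die
b200_transistors_b = 208   # Blackwell B200, billions, across two dies

per_die_b = b200_transistors_b / 2
density_gain = per_die_b / gh100_transistors_b - 1
print(f"per die: {per_die_b:.0f}B transistors")  # per die: 104B transistors
print(f"gain over GH100: {density_gain:.0%}")    # gain over GH100: 30%

# The video's speculative 5090 math: ~30% more transistors from the node,
# plus ~5% because the 4090 was on first-generation 5 nm silicon.
one_die_uplift = 1.30 * 1.05
print(f"one-die estimate: ~{one_die_uplift:.3f}x a 4090")
# Two such dies linked NV-HBI-style is where the rumored ~70% figure
# would have to come from; a single die lands near the 30-35% estimate.
```

Note that the final step, from a ~1.365x die to a ~1.7x dual-die card, assumes far-from-perfect scaling across the link; the video itself hedges on whether a cut-down 5090 can reach it.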
Info
Channel: Coreteks
Views: 21,880
Keywords: rtx 5090, nvidia, amd, rtx 4090, ray tracing, next gen gpus, next gen graphics card, blackwell, rdna4, nvidia vs amd, nvidia vs intel, intel battlemage, nvidia graphics, nvidia best gpu, nvidia next gen gpu, blackwell gaming gpus
Id: wRdn3j8Z9cg
Length: 12min 44sec (764 seconds)
Published: Thu Mar 21 2024