RDNA3 – what went wrong?

Captions
35%. That's the average performance uplift of the RDNA 3-based RX 7900 XTX over the previous RDNA 2 flagship, the RX 6950 XT. And yes, I have to admit it's less than I expected. But this isn't supposed to be a doom-and-gloom video. Even though it's tempting, I'm not going to jump on the hate bandwagon and rage about my disappointment. Instead, I'm making this video to figure out what went wrong, and I'm inviting you to join me. So without further ado, let's take a closer look at AMD's new RX 7900 graphics cards and the Navi 31 chip they are based on.

I have to begin this video by figuring out why I was disappointed when we finally got independent performance reviews, because there's always the possibility that the product actually performs as it's supposed to and the true problem is your own unrealistic expectations. But I don't think that's the case here. Yes, I was a bit hyped for RDNA 3, but not about its performance. For me, it being the first GPU chiplet architecture was a much bigger focus, as I'm a sucker for interesting silicon design.

Basically since the very first time AMD launched its new RDNA architecture, their performance claims have been on point. AMD claimed the 5700 XT could outperform the RTX 2070, and it did; Nvidia was even forced to release a Super refresh. AMD claimed the 6900 XT could compete with and even beat the RTX 3090, and you know what, it did. In 1080p and 1440p it was on top of many benchmark charts, and even in 4K it gave the 3090 a run for its money. So when AMD gave us a performance preview of their upcoming RDNA 3 flagship cards just a few weeks ago, I did believe their claims. Right from the start of the presentation it was obvious that AMD could not compete with the RTX 4090; the lack of any kind of comparison gave it away. But the benchmarks AMD did present showed a 50% or even higher performance uplift over the 6950 XT. I knew it wouldn't be 70% or even 60%, but I was certain we would see around 50% on average. As we know now, that obviously did not happen.
AMD took the trust they had built up over the past years and threw it all away in one single evening. That's why I'm disappointed. And to make matters even worse, I can't for the life of me understand why AMD acted that way, as they had to know independent reviews would burst their bubble before launch. But I digress.

With the reason for my disappointment uncovered, let's tackle a more interesting question: is there really something wrong with the RDNA 3 architecture in general and the Navi 31 GPU specifically, or is this just the performance we can expect from a first-generation chiplet architecture with a graphics core die half the size of an RTX 4090? AMD's first RDNA generation also only matched Nvidia's second-tier GPUs, and no one was especially disappointed by that. On the contrary, first-gen RDNA got great reviews and made me buy a 5700 XT back then.

So how do we figure out whether the new RX 7900 GPUs are performing as they should or whether something actually went wrong? Since we don't know AMD's internal performance targets, we have to rely on things we do know: for example the number of transistors, the die size of Navi 31, and of course performance and power consumption data. After skimming through many different reviews, catching up with all the latest rumors and taking the hardware aspects of Navi 31 into account, it is my opinion that something actually did go wrong with Navi 31.
Let me explain why I came to this conclusion. First, there is a huge number of transistors in Navi 31, out of all proportion to the performance gains we are seeing. Now, it's important to understand that increasing the number of transistors in a GPU does not automatically equal a similar uplift in performance; doubling the transistor count won't double the performance. Nvidia's RTX 4090 is a good example. The 4090, or rather the AD102 chip powering it, has 2.7 times the number of transistors of the 3090 Ti, but the 4090 doesn't deliver 2.7 times the performance. In 4K it's about 65% faster. If we apply the same comparison to Navi 21, the chip powering the 6950 XT, and the new Navi 31, we see much worse scaling from AMD. With 57.7 billion transistors, Navi 31 has almost 2.2 times the number of transistors but only delivers 35% more performance. Suddenly Nvidia's scaling, 65% higher performance with 2.7 times the transistors, looks really good. And it gets worse: the 4090 isn't even using the full AD102 chip but a cut-down version with about 11% of the chip deactivated. An upcoming 4090 Ti could increase the lead over the 3090 Ti to 75% or more, whereas the 7900 XTX is already using a fully enabled Navi 31.
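A quick way to make this transistor-scaling comparison concrete is to compute two numbers per vendor from the rounded ratios quoted in the video: the marginal gain (percentage points of performance per percentage point of extra transistors) and a log-log scaling exponent. This is a minimal sketch; the choice of these two metrics and the function name `scaling` are my own, and it ignores that the 4090 ships with part of AD102 disabled.

```python
import math


def scaling(transistor_ratio: float, perf_ratio: float) -> dict:
    """Compare how much performance a generation gained per transistor added.

    transistor_ratio: new transistor count / old transistor count
    perf_ratio:       new performance / old performance (1.35 means +35%)
    """
    return {
        # Percentage points of performance per percentage point of extra transistors.
        "marginal": (perf_ratio - 1) / (transistor_ratio - 1),
        # Exponent k in perf_ratio = transistor_ratio ** k (1.0 would be perfectly linear).
        "exponent": math.log(perf_ratio) / math.log(transistor_ratio),
    }


# Ratios as quoted in the video:
# AD102 vs GA102 (4090 vs 3090 Ti, 4K) and Navi 31 vs Navi 21 (7900 XTX vs 6950 XT).
nvidia = scaling(transistor_ratio=2.7, perf_ratio=1.65)
amd = scaling(transistor_ratio=2.2, perf_ratio=1.35)

for name, s in (("Nvidia", nvidia), ("AMD", amd)):
    print(f"{name}: marginal={s['marginal']:.2f}, exponent={s['exponent']:.2f}")
# Nvidia: marginal=0.38, exponent=0.50
# AMD: marginal=0.29, exponent=0.38
```

Both metrics point the same way: each additional transistor bought Nvidia noticeably more performance this generation than it bought AMD.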
There is no way AMD designed a new dual-issue shader architecture and built the first chiplet GPU on a new 5 nm node only to achieve 35% more performance. It just doesn't make any sense. Long before the actual silicon is available, the performance of a new GPU is simulated and estimated. If AMD had thought they were working towards a 35% increase in performance, this card would never have seen the light of day. It's also very apparent if we just compare the hardware specs of the 7900 XTX with the 6950 XT: the increase in raw FP32 performance, render output units, texture mapping units and memory bandwidth alone points towards a much larger increase in performance than what the benchmarks show us.

I come to a similar conclusion when looking at the power consumption. With every previous RDNA iteration, AMD achieved substantial performance-per-watt gains. The 5700 XT was over 60% more efficient than Vega 64. Same with RDNA 2: the 6900 XT was 45% more efficient than the 5700 XT. And AMD did promise similar gains for RDNA 3; just a few months ago they still talked about RDNA 3 being over 50% more efficient than RDNA 2. As a publicly traded company, AMD isn't allowed to lie about its products in such a way; it might even open them up to a lawsuit from investors. I'm convinced AMD actually predicted these efficiency gains for RDNA 3, and it does make a lot of sense: the switch to TSMC's 5 nm process alone should enable a massive increase in efficiency. But independent reviews show much lower efficiency gains in reality. The 7900 XTX is only about 18% more efficient than the 6900 XT. With RDNA 2, AMD was more efficient than Nvidia's Ampere, but Lovelace has completely turned that around and now leads RDNA 3 by up to 50%. I think it's very obvious that Navi 31 does have power consumption issues. AMD even stealthily increased the TDP of the 7900 XT between the preview and launch, from 300 watts to 315 watts, meaning it's now only 40 watts below the XTX. That's not something
you do when you have your products figured out.

Another good indication that RDNA 3 performs well below AMD's expectations is the ability to increase the amount of Infinity Cache by stacking additional cache chiplets on top of the memory cache dies. The early RDNA 3 leak by Angstronomics has turned out to be 100% correct so far, and it confirmed the ability to 3D-stack, which means AMD has implemented so-called TSV (through-silicon via) connection points in the design. We do have some low-resolution die shots, and while they are not good enough to visually confirm the integration of TSVs into the chiplets, there is clearly enough space for them. The only reason for AMD to plan the addition of stacked 3D V-Cache is if their simulations showed much higher performance levels, to a point where more cache is needed to feed the bandwidth-hungry shader cores. But as we can see from the reviews, at current performance levels the 7900 XTX has more than enough bandwidth.

And then, of course, we have all the rumors about possible hardware bugs floating around: from lower-than-expected clock speeds to higher-than-expected voltages and a non-functional shader prefetcher. While these are all rumors, a lot of them are backed up by code from AMD's own driver stack or by slides AMD has shared with the press. I'm certain RDNA 3 was supposed to perform much better; there are just too many hints and clues that support this theory.

Now that we have established that Navi 31 missed its performance target by a pretty large margin, let's try to figure out what exactly went wrong. A little spoiler right at the start: I don't think it was a single flaw, but rather a combination of multiple smaller bugs and issues within hardware and software. The most discussed possible cause is probably the clock speed of Navi 31.
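To put the efficiency argument from earlier in numbers: a perf-per-watt gain is simply the performance ratio divided by the power ratio, minus one. The sketch below (function names are my own, and the first set of inputs is purely hypothetical) also asks a what-if: had RDNA 3 delivered the promised 50% efficiency gain instead of the measured ~18%, how much faster would the card be at the same board power? That second step assumes performance scales linearly with efficiency at a fixed power budget, which is a simplification.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative efficiency change between two cards.

    perf_ratio:  new performance / old performance
    power_ratio: new board power / old board power
    """
    return perf_ratio / power_ratio - 1


def perf_gain_at_equal_power(target_gain: float, actual_gain: float) -> float:
    """Extra performance at the *same* power draw if the card had reached
    target_gain instead of actual_gain in perf/W.

    Assumes performance scales linearly with perf/W at a fixed power budget.
    """
    return (1 + target_gain) / (1 + actual_gain) - 1


# Hypothetical inputs: 35% faster while drawing 10% more power.
print(f"{perf_per_watt_gain(1.35, 1.10):.1%}")       # 22.7%
# Figures from the video: ">50%" promised vs ~18% measured (7900 XTX vs 6900 XT).
print(f"{perf_gain_at_equal_power(0.50, 0.18):.1%}")  # 27.1%
```

Under that simplification, the gap between the promised and measured efficiency alone would be worth roughly a quarter more performance at the same power.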
Early leaks and rumors predicted very high clock speeds, well above 3 GHz, and AMD even specifically talked about how RDNA 3 is, and I quote, "architected to exceed 3 GHz". At the same time, the official boost clock of the 7900 XTX is only 2.5 GHz. At first glance this seems to be a pretty obvious shortcoming and would explain a majority of the missing performance. If the XTX were clocked at 3 GHz, that's a 20% increase right there, and suddenly it would beat the 4080 by 20 to 25%.

But that's not the entire truth. First of all, RDNA 3 being designed to exceed 3 GHz doesn't mean that the largest chip of this generation will hit these clock speeds. It's very common that the smaller chips of a generation clock higher, as we can see with RDNA 2, where the 6750 XT has a boost clock of 2.6 GHz while the 6950 XT tops out at 2.3 GHz. There's a good chance RDNA 3 has similar scaling, and Navi 32 could achieve higher clock speeds, hitting the advertised 3 GHz. Second, the advertised boost clocks are often below the actual in-game clock speeds. According to ComputerBase, in some games the 7900 XTX is clocking at over 2.9 GHz, coming very close to the promised 3 GHz architecture design, and TechPowerUp even measured up to 2.99 GHz in games. This is for the reference card from AMD; AIB cards clock even higher and go well beyond 3 GHz.

It's clear that Navi 31 is capable of high clock speeds. The problem is that it suffers from a pretty large clock speed deviation depending on the specific game tested; in some games the clock speeds are even below 2.5 GHz. ComputerBase measured an average of 2.55 GHz across 18 games, and TechPowerUp got 2.63 GHz as a 25-game average. We can also observe additional strange behavior when it comes to the clock speed and power consumption of Navi 31.
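The "20% right there" figure compares 3 GHz with the official 2.5 GHz boost clock. Because the measured in-game averages are already above the boost clock, the real headroom to 3 GHz is smaller. This minimal sketch uses the clock figures quoted above and assumes performance scales linearly with clock speed, which is an optimistic upper bound (memory bandwidth and other limits don't scale with core clock). The name `clock_uplift` is my own.

```python
BOOST_GHZ = 2.5   # official 7900 XTX boost clock (from the video)
TARGET_GHZ = 3.0  # "architected to exceed 3 GHz"
MEASURED_AVG_GHZ = {  # average in-game clocks quoted in the video
    "ComputerBase (18 games)": 2.55,
    "TechPowerUp (25 games)": 2.63,
}


def clock_uplift(current_ghz: float, target_ghz: float) -> float:
    """Upper-bound performance gain, assuming performance scales linearly with clock."""
    return target_ghz / current_ghz - 1


print(f"boost -> 3 GHz: +{clock_uplift(BOOST_GHZ, TARGET_GHZ):.1%}")  # +20.0%
for source, avg in MEASURED_AVG_GHZ.items():
    print(f"{source}: {avg} GHz -> 3 GHz: +{clock_uplift(avg, TARGET_GHZ):.1%}")
# ComputerBase (18 games): 2.55 GHz -> 3 GHz: +17.6%
# TechPowerUp (25 games): 2.63 GHz -> 3 GHz: +14.1%
```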
For example, using a frame limiter has very little impact on the power draw of AMD's new GPUs, while Nvidia's Lovelace cards reduce their power consumption as expected. There are also idle power problems, some of which AMD has already labeled as software bugs in their latest driver release. All of this is a very strong indication that Navi 31 needs higher-than-expected voltages to achieve the desired clock speeds. The architecture is definitely capable of hitting fast clock speeds, but it often runs into power issues, since the TDP is very close to the hard power limit that a PCI Express slot and two 8-pin connectors can provide. AIB models already show much higher clock speeds and thus quite a bit more performance, but because of the high voltage requirements the power draw is getting out of control, which also hurts efficiency.

The second possible cause for the worse-than-expected performance is still more of a rumor, but I think there's a good chance it's actually true.

Quick intermission: I am currently editing this video, and Tom's Hardware just released an article with a statement from AMD that the shader prefetcher is in fact in working order. But I've decided to keep the discussion of this topic in the video, because I think it drives home my number one point: there isn't a single flaw that is responsible for the suboptimal performance of RDNA 3. With that, let's go back to the video.

I'm talking about the claim that the shader prefetcher on most RDNA 3 designs has a hardware defect. I'm not basing this just on unconfirmed tweets; there's actual driver code from AMD disabling the shader prefetcher on Navi 31, Navi 33 and Phoenix. This rumor goes hand in hand with the assumption that Navi 31 got released as an A0 stepping, basically the very first version of the chip that got taped out. Usually, complex chip designs go through at least one, if not multiple, revisions before launch. With the knowledge that A0 silicon has a non-functional shader
prefetcher, which again comes straight from AMD's own code, and AMD's labeling of retail Navi 31 GPUs as revision A0, there's a decent chance the shader prefetcher is actually non-functional on the 7900 XTX and 7900 XT. But as always, things aren't as bad as they seem, because while it would certainly be suboptimal, the performance penalty isn't that huge. On RDNA 2 we are talking about around 5% performance regression without shader prefetching. It might be more for RDNA 3, or maybe even less, but in any case it's not a huge performance impact. If this turns out to be true, it's still another 5% down the drain. I think you can start to see where I'm going.

Then there's the obvious argument to be made about the drivers. We all know graphics drivers can have a huge impact on gaming performance; the best example is the recent Intel Arc launch. But even in past generations, and especially for AMD GPUs, improved drivers have made a big difference; it even gave rise to the term "fine wine". And the shader architecture change within RDNA 3 increases the reliance on optimized drivers, because the dual-issue shaders need so-called instruction-level parallelism and thus are harder to utilize than those of previous RDNA generations. With the switch to a new shader core architecture, AMD has to fundamentally change their driver optimizations, which does take time.

And finally, there's the unknown factor of the first-gen chiplet architecture. I personally don't think it will have a large impact on performance, but die-to-die interconnects do use a lot more energy, as moving data off and on a chiplet is much more energy-consuming than keeping it on a monolithic chip. A higher-than-expected fabric power consumption in turn means less available TDP for the graphics cores, and thus more negative effects on performance.

To recap: in my opinion, RDNA 3, and Navi 31 specifically, seems to suffer from a couple of different problems, none of which is a huge factor in and of itself. It seems like Navi
31 does need slightly higher-than-expected voltages to reach the desired clock speeds, which means the chip is limited by the tight power limit and in general just runs at lower-than-expected clock speeds. It's not a huge drop, and in some games the XTX can get really close to 3 GHz, but overall it's a performance reduction. Then Navi 31 seems to suffer from a hardware bug within the shader prefetcher; again not a huge problem, but still around a 5% performance regression. Next, RDNA 3 is using a new shader architecture, which means driver optimizations can't be copied from previous RDNA generations, and the dual-issue shaders are also harder to optimize for. And last but certainly not least, the first-gen chiplet design could take a bigger bite out of the TDP than predicted, further reducing the available power for the graphics core die.

Each of these problems alone is not that impactful, and almost any chip of this size suffers from bugs and issues, especially when using a new architecture, implementing a new chiplet design and using a new process node at the same time. But combined, they have the potential to severely decrease the performance of Navi 31.

You might ask yourself now: how can something like this happen, and why would AMD release such a seemingly broken GPU? Well, first we have to remember that up to this point all of these supposed bugs and issues are unconfirmed. There might be very strong indications that they're true, but right now we only have circumstantial evidence. I tried to make a convincing argument, but unless AMD comes out and openly confirms or denies these problems, we won't know for sure. We might get some confirmation once Navi 32 releases, which is supposed to use a fixed version of RDNA 3, but until then we have to wait and see.

As for the how and why, here's a possible explanation. Imagine AMD just taped out Navi 31. The design is finished and the first samples are being tested. They show some promise, but it quickly becomes clear that more metal
layer revisions are needed to get the voltage requirements to a more acceptable point. Nothing uncommon; most chips take multiple metal layer revisions, and sometimes even silicon layer revisions, to fix hardware bugs before they get released. But then the engineers discover that the shader prefetcher doesn't work. There's no reason to initiate a metal spin if the silicon layer needs fixing too. But it takes longer than expected to figure out why the prefetcher isn't working, and then even longer to design a possible fix. By now AMD is too close to the promised launch date. A complete re-spin would take at least another half year, and Navi 31 would launch in the second quarter of 2023. The only other option is to do a metal layer fix before launch, but it's also a waste of money and resources to fix the metal layer in order to get the voltages, power draw and clock speeds up to snuff when there's still broken silicon inside. Another problem for AMD is that they have booked a lot of 5 nm capacity at TSMC. It's a waste not to use it, and even with all its problems, Navi 31 is still faster, more efficient and probably cheaper to produce than Navi 21. As a result, AMD begins production of A0-revision Navi 31 chips to hit the promised launch date in Q4 2022.

Now, this is a nice story, but nothing more. I was just trying to explain how such a situation could arise and why it might still be the better choice to release a slightly broken product. Maybe none of that is true, or maybe I hit the mark exactly. We have had similar GPU launches in the past. If you remember Nvidia's Fermi generation and the corresponding GTX 480 GPUs, you know what I'm talking about. In September 2009, ATI released the Radeon HD 5000 series with the flagship HD 5870. It was 30 to 40% faster than Nvidia's GTX 285, consumed less power and was the first GPU to support DirectX 11. Nvidia clearly didn't expect that kind of performance; it took Nvidia six months to release its counter, the GTX 480.
It did beat the HD 5870, but it also consumed a whopping 50% more energy, and it was loud and hot. Then, a few months later, in November 2010, Nvidia released the GTX 580, which was basically a fixed version of the 480. It increased performance by about 15% and at the same time consumed less power and ran cooler. The 580, in very simplified terms, was just a re-spin of the 480. By the time the 480 launched, Nvidia must have already taped out the 580, but Nvidia couldn't have waited another eight months to counter ATI, because by the time the 580 released, everyone interested in a high-end GPU would have already bought a 5000 series card. So Nvidia decided to launch a partially broken GPU. Another example is AMD's Vega GPUs and the hype about the primitive shader feature. It was teased and rumored for a long time, but was apparently broken in hardware, and as far as I know it never got fixed.

Next, I want to talk about whether and how Navi 31 can be fixed. This discussion is highly dependent on how accurate my previous predictions turn out to be. For example, if the retail Navi 31 GPUs are actually not A0 stepping and already have a fixed shader prefetcher, there is no more performance to be gained there. Maybe the large fluctuations in clock speed are related to bugs in the power management, which might be fixed with software, so a metal layer revision isn't needed. Maybe the drivers are already as good as they will get, or AMD will release a 30% miracle driver in a few months. We simply can't know for sure. But if my predictions are correct, Navi 31 does not need a lot to go from an OK product, which really is only OK because the competition in the form of the 4080 is probably the worst price-to-performance GPU ever, to an actually good product. We don't need a miracle driver, there doesn't need to be an amazing metal layer revision that boosts Navi 31 to 4 GHz clock speeds, and there also doesn't need to be a hardware fix that doubles performance. By now AMD will know all about the bugs and issues within Navi 31
and understand how to fix them. Imagine a B2 revision Navi 31 being released in the summer of next year. The shader prefetcher works now and adds about 5% more performance on average. A couple of metal layer revisions improve the electrical properties of the GCD to a point where the average clock speeds are now 300 MHz higher: instead of 2.55 GHz, it's hitting about 2.85 GHz on average, and because of the lower voltages, power consumption does not increase. That's another 10 to 12% performance on top. And by summer, AMD's engineers have further optimized the drivers; not by much, but they are able to achieve an average of about 6% more performance. If we add all those little bits together, the 5% from the prefetcher, 12% from higher clock speeds and another 6% from driver optimizations, suddenly the 7900 XTX is 23% faster while consuming the same amount of power. That means it beats the 4080 by 20 to 25%, the power efficiency is now over 40% better than RDNA 2 and much more in line with previous generations, and even the 4090 wouldn't be that far ahead anymore. Navi 31 doesn't need that much of a performance uplift to become a good product: even just 5% more clock speed, 5% better drivers and 5% from the prefetcher would still place it 15 to 20% above the 4080 while being cheaper.

And then there is ray tracing, probably the only part of RDNA 3 where I was positively surprised. Remember the reveal, where AMD fooled us with their numbers? They only did so for pure raster performance. They talked very little about improvements in ray tracing, and I took that as a sign that RDNA 3 doesn't really uplift ray tracing performance, at least not more than traditional rasterization. But it turns out that while the 7900 XTX is only 35% faster than the 6950 XT in raster, in ray tracing the lead is almost 50%. RDNA 3 has much improved ray tracing performance; it competes well with a 3090 and even a 3090 Ti. That was a huge surprise and something I would
have put more focus on if I was in charge of AMD's presentation. This strong ray tracing performance also means that the 4080 is just 25% faster than the 7900 XTX. Yes, I said "just", and yes, 25% is a lot, but it's much closer than I expected it to be. And if AMD manages to improve the raster performance of Navi 31, it will also improve ray tracing performance, since the ray tracing performance penalty is mostly percentage-based.

As much as I would like to continue talking about possible ways RDNA 3 might be broken or could be fixed, I have to come to a conclusion. The reason I made this video is not to pretend that I know exactly what's going on with RDNA 3 and how to fix it, but to give an insight into my thoughts about the current situation and to start a discussion that goes beyond "AMD bad" or "the 7950 XTX will wipe the floor with Nvidia". While some semiconductors might be botched by one huge flaw, it doesn't take a fatal flaw to turn a promising product into an underperforming one. I don't think RDNA 3 and Navi 31 have a single flaw that will either be fixed or not; I think it's an accumulation of different small problems that combine to severely limit the performance potential of the GPU. I also don't want to defend AMD in any way for their botched performance numbers or for releasing a potentially suboptimal version of Navi 31, but to explore why certain circumstances force certain actions. I am super excited to see whether my assumptions will be confirmed in the future, or to learn where I went wrong.

I'm sure I'm not the only one with an opinion on what's going on with RDNA 3 and Navi 31, and of course I would like to know what you think. Do you agree with my assumption that there are bugs within Navi 31, or do you think everything works fine and AMD's architecture just isn't that good? And if you believe there are issues, what's your opinion on possible fixes? Do you expect a 7950 XTX, or is that all nonsense to you? Maybe I made an obvious mistake somewhere. Leave a comment down below with your thoughts. You
know what to do if you enjoyed this video, and see you in the next one.
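The back-of-the-envelope "B2 revision" scenario from the video can be checked in a few lines. The 5%, 12% and 6% gains, the 2.55 GHz average clock plus 300 MHz (so 2.85 GHz), and the measured ~18% efficiency gain all come from the video; the function names and the 100 fps / 50% ray tracing inputs are hypothetical. Adding the percentages, as the video does, slightly understates the result compared with compounding them, and a percentage-based ray tracing penalty means any raster gain carries over one-to-one. Treat this as a rough sanity check, not a prediction.

```python
# What-if gains for a hypothetical "B2 revision" Navi 31, as listed in the video.
GAINS = {
    "shader prefetcher": 0.05,
    "clock speed (+300 MHz)": 0.12,
    "driver optimizations": 0.06,
}
MEASURED_EFF_GAIN = 0.18  # 7900 XTX vs 6900 XT perf/W, as quoted in the video


def additive(gains):
    """Plain sum of percentage gains, the way the video adds them up."""
    return sum(gains)


def compounded(gains):
    """Each gain applied on top of the previous ones (multiplicative)."""
    total = 1.0
    for g in gains:
        total *= 1 + g
    return total - 1


def rt_fps(raster_fps: float, rt_penalty: float) -> float:
    """Ray-traced frame rate if enabling RT costs a fixed *fraction* of raster fps."""
    return raster_fps * (1 - rt_penalty)


clock_gain = 2.85 / 2.55 - 1  # 2.55 GHz -> 2.85 GHz, linear scaling assumed
add = additive(GAINS.values())
comp = compounded(GAINS.values())
eff = (1 + MEASURED_EFF_GAIN) * (1 + add) - 1  # same power, more performance

print(f"clock gain:           +{clock_gain:.1%}")              # +11.8%
print(f"additive total:       +{add:.1%}")                     # +23.0%
print(f"compounded total:     +{comp:.1%}")                    # +24.7%
print(f"efficiency vs RDNA 2: +{eff:.1%}")                     # +45.1%
print(f"modest 5/5/5 case:    +{compounded([0.05] * 3):.1%}")  # +15.8%

# With a percentage-based RT penalty, a raster gain carries over 1:1 to RT.
# The 100 fps and 50% penalty inputs are hypothetical.
print(f"{rt_fps(100 * (1 + add), 0.5) / rt_fps(100, 0.5):.2f}")  # 1.23
```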
Info
Channel: High Yield
Views: 52,074
Keywords: RDNA3, RDNA 3, Navi 31, Navi31, radeon gpu, amd gpu 2022, RX 7900 XTX, rx 7900 xt, rx 7950 xtx, amd vs nvidia
Id: eJ6tD7CvrJc
Length: 23min 59sec (1439 seconds)
Published: Sat Dec 17 2022