Probing a Gigabyte RTX 3090 Vision that died when trying to run New World

Reddit Comments

I wonder if the failed evga cards in question have a similar failure point

👍︎︎ 31 👤︎︎ u/darkknightxda 📅︎︎ Oct 07 2021 🗫︎ replies

Where do you even get replacements for a failed power stage? Salvage from a similar card? What's the exact manufacturer and part number?

It's also probably not trivial to solder, but it's definitely worth a shot for a $2000 GPU...

👍︎︎ 22 👤︎︎ u/NewRedditIsVeryUgly 📅︎︎ Oct 07 2021 🗫︎ replies

Smart VRM controllers can "turn off" stages depending on power draw. A 1kW VRM with, say, 10 stages can turn off 9 stages when the power draw is 100W or less, to keep things simple.

I'd heard about the asymmetrical power stages before, and something that came to mind was this sort of dynamic VRM control.

Does anybody know if VRM controllers know how to, or can be programmed to, have one or two oversized phases that they use for low-power modes?

Say, in this case, there seem to be two phases that each have two 60A stages... So from, say, ~0W-100W it uses the first big phase, and from ~100W-200W it uses the first two big phases?

👍︎︎ 5 👤︎︎ u/cp5184 📅︎︎ Oct 07 2021 🗫︎ replies

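The stage-shedding idea in the comment above can be sketched in a few lines. This is only an illustration, not how any real VRM controller is programmed: the function name `phases_for_load` is hypothetical, the largest-first policy is an assumption, and the asymmetric layout (two doubled 60 A phases plus six single ones) is a guess based on the video's "10 power stages on eight phases" remark.

```python
def phases_for_load(load, capacities):
    """Enable phases largest-first until their combined capacity covers the load.

    Returns the sorted indices of the enabled phases. At least one phase stays
    on, and an overload simply enables everything (no protection is modelled).
    """
    if load < 0:
        raise ValueError("load must be non-negative")
    # sorted() is stable, so equally sized phases keep their original order.
    order = sorted(range(len(capacities)), key=lambda i: capacities[i], reverse=True)
    enabled, total = [], 0
    for i in order:
        enabled.append(i)
        total += capacities[i]
        if total >= load:
            break
    return sorted(enabled)


# Symmetric case from the comment: 10 stages of 100 W each (1 kW total).
symmetric = [100] * 10
print(len(phases_for_load(100, symmetric)))   # stages kept on at 100 W
print(len(phases_for_load(1000, symmetric)))  # stages kept on at 1 kW

# Asymmetric guess (amps): two doubled phases (2 x 60 A) plus six single 60 A phases.
asymmetric = [120, 120, 60, 60, 60, 60, 60, 60]
print(phases_for_load(100, asymmetric))
print(phases_for_load(200, asymmetric))
```

With the asymmetric list, light loads land on the doubled phases first, which is the behavior the comment asks about. Whether a given controller can actually be configured that way is not something this sketch answers.
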
Not wanting to watch a half-hour of rambling, does he come to any conclusion?

👍︎︎ 46 👤︎︎ u/iDontSeedMyTorrents 📅︎︎ Oct 07 2021 🗫︎ replies

the last version of NW ???

👍︎︎ 6 👤︎︎ u/Jacko10101010101 📅︎︎ Oct 07 2021 🗫︎ replies

One story I am reading and seeing YouTube videos about is that a certain game from Amazon called New World is causing expensive graphics cards to suffer a hardware failure. So is it the game or the graphics card?

👍︎︎ 2 👤︎︎ u/Whoknew1992 📅︎︎ Oct 07 2021 🗫︎ replies

Could anyone smarter than me explain why this has only happened to Nvidia 3000-series cards (not previous generations or AMD cards)? Is it because not a lot of people are using AMD cards, so it doesn't attract enough attention?

👍︎︎ 3 👤︎︎ u/huy_lonewolf 📅︎︎ Oct 07 2021 🗫︎ replies

Can't wait to see more on this, particularly if you find out anything! First I'd heard conclusively that it was more than just EVGA cards.

👍︎︎ 1 👤︎︎ u/Kougar 📅︎︎ Oct 08 2021 🗫︎ replies
Captions
Hey guys, Buildzoid here, and today we're going to be taking a look at an RTX 3090 that tried and failed to run Amazon's New World MMO. So this right here is a Gigabyte RTX 3090 Vision. It was sent to me by its owner because he tried to play Amazon's New World MMO, the card promptly turned off and then never turned on again, and because he bought it second hand on eBay, Gigabyte basically rejected his RMA request. I'd made a post on Twitter asking if anybody has a card that they'd be willing to loan to me so we could figure out what it is that's actually dying on the 3090s, because it's all well and good to know that 3090s died playing this game, but I want to know why they die, like what part of the 3090 New World is killing. So that's what we're going to try to do today. Well, I've already figured out what part it is, but yeah, we're going to take a look at that.

Before we get into that, a little bit about the card. This is an RTX 3090 Vision from Gigabyte and it is actually pretty custom. At first glance I thought it was reference, but on closer inspection it became very obvious it is very not reference. The obvious things are sort of like, oh, you've got these power connectors, but those are power connectors, those don't really matter in the grand scheme of things. No, what we have as major differences here is: we've got memory power over here, we've got a memory phase over here, we've got three phases of MSVDD above that, this is normal, this is reference style, and then above that we've got one, two, three, four, five, six power stages of Vcore. This is not standard. Normally you would expect to only see, I think, four of them in this area, and here we have six of them. And I'm saying power stages, or power channels, because the thing is the voltage controller on this card is a
uP9512, and there are more power stages and inductors than there are phases on a uP9512, so Gigabyte has to be putting some of the power stages in parallel, and therefore it is incorrect to say that these are all six separate phases, because I've not checked that they run out of phase with each other. But anyway, so we've got six power stages and inductors over here. On the other side we have a memory phase over here, this is pretty normal, we've got another memory phase over here, then we've got Vcore, Vcore, Vcore, Vcore, memory, and then two more phases of MSVDD. So MSVDD is a five phase, that's standard reference RTX 3090 stuff right there, but we've got 10 power stages for the Vcore VRM, and Gigabyte also upgraded the power stages from 55 amp DrMOS components from Alpha and Omega Semiconductor to 60 amp DrMOS components from Alpha and Omega Semiconductor. So on paper this is a slightly upgraded RTX 3090 PCB.

For the capacitor configuration we've also got some upgrades. We have a few capacitors that most manufacturers don't bother populating, like this middle one is present. We've got 470 microfarad polymers instead of the Nvidia spec, which is 220s, so we've got more capacitance behind the core. We've got some optional capacitor pads that Gigabyte didn't bother with. All around, at a glance, this really just seems like a slightly upgraded RTX 3090 PCB, and it's a bit confusing that it would fail, because it's not the absolute bare minimum of the specification.

The voltage controllers are, as I mentioned earlier, a uP9512 for the Vcore, we've got a uP9512 for the MSVDD, and then there's a uP9511 for the memory rail, and that makes sense because the memory rail normally just sits at one voltage all day, so you don't really need a more advanced voltage controller for that, like the uP9512s that we see for Vcore and MSVDD. So the only thing that I find suspicious about this
card in this scenario is the fact that we do have more power stages than we have phases on the controller, and I still haven't actually had a chance to measure how the VRM handles that. But anyway, that's sort of what we have in terms of GPU setup: it's two eight-pin power connectors with these weird extension connectors, and then we have some power from the PCIe slot.

So let's get to some probing. Luckily for us Gigabyte included a bunch of board protection fuses, and I call them board protection fuses because they're not there to protect your power supply. That's not their job, your power supply is supposed to have its own protections built in. They're also not there to prevent the card from failing. What these fuses do is, when you have a catastrophic power stage failure, they prevent that failed power stage from just pulling in way too much power and destroying everything on the card, or more like destroying the PCB of the card, which is actually very common on, say, 980 Tis. It's not too uncommon to have the bottom two power stages burn out, and when they burn out they burn the surrounding area of the PCB to charcoal, and then you've got some serious problems to deal with when trying to fix the card. By including fuses, the fuse basically limits how much energy is available to the short circuit when it occurs, which means you don't get that massive amount of heat damage that you get on cards with no fuses. So yeah, good on Gigabyte for including the fuses.

Now let's check those fuses. They also make troubleshooting a lot easier, because you just have to check which fuse blew and that'll tell you what part of the card failed. So this fuse isn't blown, it's just my probes
being there, so that fuse is nice and healthy. If we go over here we've got another healthy fuse, right, very low resistance. Then we get to this fuse over here, and this fuse... oh come on, turn back on, nope, there, this mode. So that fuse has 4.7 kilo ohms of resistance, so that one is very much blown. But there's no visible damage on the card anywhere, you can't even tell that anything is wrong with the card. But you definitely do not want to just replace that fuse. If you replace that fuse you're going to have a very bad day. Actually your power supply will probably stop you from turning the system on, but if your power supply is crap you're going to have a really bad day.

So let's check why that fuse is blown. We're going to grab our ground off of the PCIe slot bracket and we check this fuse, and we can see that the resistance is going up and up and up on this side. On the other side of that fuse we find the same behavior, resistance going up and up and up, and that's normal, because on the 12 volt power planes we have a bunch of capacitance, so you're not going to get a steady resistance measurement a lot of the time. There's nothing really wrong with that, and if there was something wrong that fuse should have blown, so we don't have to worry about that too much. If we check this fuse over here we have extremely high resistance, which is completely normal, that's not a contact problem, that's what you'd expect. I mean some cards will measure like hundreds of kilo ohms, but on both sides of this fuse we get like a fraction of hundreds of kilo ohms again, so that fuse is obviously fine, which is also why it's not blown. But if we check the blown fuse, so from this side to ground we get 4.7 kilo
ohms, which is the resistance of the fuse itself, and if we measure from the other side of the fuse, the side which is actually connected to the Vcore VRM... hmm, yup, that's no good. That is 0.14, 0.15 ohms, which conveniently enough is roughly the same resistance as what you get if you measure Vcore to ground, assuming that I don't have contact issues. Now this is a very normal low resistance. The core of a GPU like this is a huge chunk of silicon with a very low resistance, like 0.112 ohms, that's completely normal, that's what you'd expect to measure on a card like this. So there's no short circuit on the Vcore, okay, that is not a Vcore short circuit. That is potentially 12 volts shorted into the Vcore rail, or 12 volts shorted to ground. Actually, if it was 12 volts shorted to ground we probably wouldn't see the resistance being almost the same as Vcore. But yeah, the core is not shorted, the VRM is. And we can see that here, because if we check, say, MSVDD, which is also a very low resistance rail... well, let's check memory first so that you can see a more normal power rail. So memory is approaching 40 ohms of resistance, and that's normal for the memory rail on a 3090.
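The reasoning with the multimeter readings can be written down as a small sketch. The numbers are the ones read out in the video (4.7 kilo ohms across the blown fuse, roughly 0.14 to 0.15 ohms from the fuse's VRM side to ground, 0.112 ohms for Vcore, about 40 ohms for memory, and about 0.3 ohms for MSVDD as quoted later in the video). The function names, the 1 ohm fuse threshold, and the log-ratio comparison are assumptions made for illustration, not a standard diagnostic procedure.

```python
import math


def fuse_is_blown(ohms_across_fuse, threshold_ohms=1.0):
    """A healthy fuse reads close to zero ohms; the threshold is an arbitrary assumption."""
    return ohms_across_fuse > threshold_ohms


def closest_rail(measured_ohms, rail_ohms):
    """Return the rail whose resistance to ground is nearest the measurement.

    Compares on a log scale, so 0.15 vs 0.11 ohms counts as 'close' while
    0.15 vs 40 ohms does not.
    """
    if measured_ohms <= 0:
        raise ValueError("resistance must be positive")
    return min(rail_ohms, key=lambda name: abs(math.log(measured_ohms / rail_ohms[name])))


# Readings taken from the video, in ohms.
rails = {"vcore": 0.112, "msvdd": 0.3, "memory": 40.0}

print(fuse_is_blown(4700.0))       # the 4.7 kilo ohm fuse
print(closest_rail(0.145, rails))  # 12 V side of the blown fuse to ground
```

A 12 V input that reads like one of the output rails suggests the input is shorted into that rail through a failed power stage, which is the conclusion the video draws for Vcore.
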
And if you check the MSVDD we have about twice as much resistance as we have on core, which is what you'd expect: it's a lower power rail, there's less silicon connecting that rail to ground, so it makes sense. So we have a short circuit on probably the core portion of the VRM, connected to this eight-pin power connector. We're going to flip the card over and try to figure out which group of power stages is blown up. The thing is, from the Vcore side all of these power stages are in parallel with each other. Inductors have basically no resistance to them, because an inductor is basically a very fancy piece of wire, so if you measure through an inductor like this... and if I could get contact on the probes to work. Oh, the multimeter decided to stop being cooperative again. It sometimes does that, which is really annoying because that's an expensive Keysight meter. Unfortunately getting measurements off of inductors just doesn't tend to go very well. There we go. So measuring through an inductor, they're very low resistance, it's basically a fancy piece of wire, and because of that all of the power stages are in parallel with each other. So if you're thinking 'oh, I'm going to check each phase', it doesn't work, it's never going to work, they're all in parallel, you can give up.

So what we're actually going to do is check the 12 volt input filtering capacitors right behind the power stages, because by identifying which of these capacitors are shorted out we get a rough idea of which power stages could be shorted out. By input filtering caps I mean this cap over here, this cap over here, that cap, this cap, this cap, this cap, this cap, this cap, this cap, this cap, actually you can see the last one, but yeah, these caps. We're going to check them directly there
because I don't know which side of those caps is ground and which side is 12 volts, and I don't really care to figure that out. Then we're going to do the same thing on the other side of the VRM, because I don't actually know what the 12 volt power distribution on one of these cards looks like, because I don't have schematics, which would certainly make this a lot easier. If you had schematics you could just go 'okay, this fuse blew', check all of the power stages that are connected to that fuse, and you'd be done. You could literally skip this part of the process that we're going to go through. But for one reason or another you can't get schematics for motherboards or GPUs or, well, anything.

So let's get to this. I've managed to entangle my probes, so I'm just going to untangle them for a bit, and then we're going to start with that cap over there. I was having real issues with getting the contact right. So that one just sees resistance rising, so that one's fine. Check this one over here, that one's fine. Check this one over here, and if I could get the probe to stay where it's supposed to be... not fine, yeah, very not fine. So we're going to mark that one, because I'm not going to remember it otherwise. I'm just going to check that it is the same one, because I wasn't paying attention to what I was looking at, so I'll take that one. Damn it. There, just put a dot. And let's check another one. This one should also be shorted, I went through this before, so yeah, that one's also shorted. Oh come on. I mean it's just the next one in line, so we're going to put a dot near it. This one, that one's fine, yep, the resistance just goes up and up and up. This one, fine. This one, fine. And fine. Come on. Also yeah, that one's probably connected to the PCIe slot. And this one, fine. Okay, so that's this side of the card checked. Now we're going to go over to the
other side. We don't need to check the memory, because if we go from the memory power rail to the blown up fuse we just see memory resistance. That measurement is telling us that we're measuring through the memory chips to ground, and then through ground, probably through Vcore, through the failed power stage and to this fuse. So we don't have to worry about any of the memory phases, because all of the memory phases are fine. We don't have a memory power failure, so you don't need to bother checking the input capacitors for the memory.

We're going to check this. This is a memory cap, isn't it? Yeah, that one's memory, right? Yeah, so that one's in line with some of the shorted out ones. Actually that would be kind of whack if maybe it is the memory that failed. Let's check this one. Okay, that one's shorted as well. Hmm. Normally when you have a power stage fail it shorts the power stage to the output. But if it somehow managed to short 12 volts just to ground without interacting with the power rail... I still think it's Vcore, but that is worth keeping in mind, so I'm just going to note that. But I really don't think I'll have to actually deal with the memory. If we had a short circuit from 12 volts to memory then I'd say yeah, we need to do something about the memory power stages, but currently this is just telling us that those memory power stages get power from the same place that all of the Vcore power stages are getting it from. Yeah, that one's shorted as well, isn't it. And we're going to check this one. In my experience power stages fail high side, the low side doesn't really tend to fail very often. That one's also shorted, because of course it is. Pretty sure most of this side of the card is
shorted out. And this one, also shorted. And if we check this one, this one's fine. Yeah, these probably run off of the PCIe slot, don't they. Fine, and fine. And this bottom memory one... actually no, this one's MSVDD, isn't it? Wait a minute, what are you? Oh right, that's MSVDD. Oh fun. So this bottom one, not fine. Man, Nvidia cards and their 12 volt power distribution is something I will never understand. Why would you just split the power connectors half and half between your phases? Why make the card so complicated?

But yeah, okay, so I still suspect Vcore more so than I suspect anything else, but MSVDD is worth noting. And the reason I suspect Vcore so much is just that the resistance is so low. If we check MSVDD to ground it's relatively high resistance, it's more resistance than the core, right, that's 0.3 ohms right now. Also, the resistance of the GPU core actually goes up and down based on how cold it is, which gives me an idea. I do still have some liquid nitrogen. We could try to cool the card a bit and see if we can bring up the core resistance, so that we get a better idea of what's shorted. Yeah, that might work. I think I'll test that off screen.

So basically what we can say right now for 3090s is just: some part of the VRM fails, and it's probably Vcore, based on the fact that if I measure from that blown fuse to ground we get what looks like Vcore resistance. If I check the Vcore phases we get a similar resistance to what we get on that shorted out fuse, assuming that the probe stops having contact issues. I put quite a bit of effort into trying to clean the card so I would not have to deal with the fact that there's junk all over the pads. Whereas if you check MSVDD, it's still low resistance but it's like twice as much as what we get on Vcore, so I don't really think, yeah,
like that's basically twice as much as what we get on Vcore, so I really think that would show up on the fuse. So I really think the Vcore VRM failed, and the repair would be to just replace the failed power stage after figuring out which one it is. So yeah, that's it for the video.

At this point the issue with having just one of these cards for this kind of failure is: I don't know if it's always the Vcore VRM that fails, I don't know if it's always the same power stage or same set of power stages on the card that fail. You'd have to have more than one card. Having one card and going 'okay, the Vcore VRM failed', which I'm 99% sure is what happened here, doesn't tell us if that's a design flaw with the Vcore VRM itself. We have options. It could be that the Vcore VRM is too small, and therefore cards like this and cards like the reference cards would die playing New World, because this should technically be more powerful than a reference card since it has more power stages on Vcore than a reference card. However, it could also be that because this has an asymmetrical VRM, the way this VRM operates is bad, and the controller isn't doing a good job of dealing with the fact that there are 10 power stages connected to eight phases. Which, if that sounds like insanity to you, join the club. No motherboard vendor voluntarily does this to their motherboard, nobody does this on motherboards. Instead, what you could do is run this as a five phase, and that shouldn't have any weirdness to it, but I have a suspicion that Nvidia is so obsessed with current balancing that they wouldn't allow the manufacturers to just build it as a five phase.

So basically the options are: one, the Vcore VRM is just completely inadequate in terms of overall power handling capability, which would
mean all 3090s are susceptible to this, or at least all 3090s based on the reference design are susceptible to this, or actually reference cards would be more susceptible to this because this should be more powerful. Two, it's a design flaw in the current balancing of the VRM due to the fact that the phases aren't symmetrical. If that's the case then that's just like: no more asymmetrical VRM designs please, thank you very much. Three, dodgy over current protection, but that I feel would still be tied to the asymmetrical VRM design, because if you have an asymmetrical VRM I'm really not sure how you're supposed to implement over current protection on this. As far as I know most controllers set the over current level for the whole VRM at the same time, so you can tell it 'all of the phases have an 80 amp current limit' or 'a 50 amp current limit'. But if you have two phases that are supposed to have twice the current limit, then you'd probably end up setting the current limit for the whole VRM to the OCP of your most powerful phases, and it's like, but what if one of the weaker phases gets overloaded? The controller is not going to stop that.

So yeah, anyway, I'm going to try to fix this card. Unfortunately this doesn't answer the question of why these cards fail. If this was a nice symmetrical VRM I'd just say okay, this is grossly inadequate in terms of power delivery requirements for New World, and Nvidia should probably set some kind of power limit specifically for New World. But this is not symmetrical. We've got 10 power stages on eight phases, or I'm assuming it's on eight phases. If it's ten power stages on five phases then it becomes a symmetrical VRM, and then it's like okay, then it's evidently not powerful enough. But yeah, I'm not sure
about that. Actually there is a way to check, with a lot of really annoying probing, how the phases are distributed between the... actually, you know what, I think maybe there's a chance I could get schematics for this card. I seem to remember seeing a leak of a whole lot of GPU schematics recently, maybe this was among them. And that would make it... because the thing is, I don't feel like checking the current sense configuration.

So unfortunately I don't have a verdict on what's failing on these cards. It is worth noting that the FTW3s, which were the first cards reported as failing, those are asymmetrical, very asymmetrical. They have an eight phase controller driving a 13 power stage VRM, if I remember correctly. When you build a 13 power stage VRM it's like, okay, we are making our lives difficult, we are intentionally making this as complicated as possible. But I also saw somebody report a Strix failing trying to play New World. That's also an asymmetrical VRM, though I think that's like 10 really high-end power stages with a really high-end controller, but it's still not a symmetrical VRM. So it could just be a fundamental problem with, hey, if you build your VRM asymmetrically, it doesn't work, and in a worst case scenario it becomes a very real problem. I'm not sure about that, because this is a one-off. If it was consistent, if it's always this fuse that's blowing, then that sort of points to there being probably an issue with the fact that the VRM is not symmetrical, and therefore an issue with the current balancing, which again is just tied to the fact that you don't have symmetrical phases. If you're trying to current balance an asymmetrical VRM I can't imagine it being very easy compared to what
controllers are normally meant to do, which is just evenly distribute the current between all of the phases to the best of their ability. With a VRM where your phases aren't all the same size you need to preferentially put more load on whichever phases are bigger, and the controller probably isn't thrilled about having to do that. So yeah, I wish I had more conclusive information here. The thing is, I also saw a Zotac card burn up, but that might have also been slightly customized. I'd love to know if anybody's had a proper reference card die, or if it's only been affecting cards which have a few too many power stages relative to how many phases they have. And it's also worth noting that GPUs generally have really high failure rates compared to everything else in a system, so there's going to be some cards that just fail because they snuck past the quality assurance checks at the factory.

But yeah, anyway, that's it for the video. So big thank you to the person who decided to loan this card to me, because I was really interested to find out what fails on these, and I was honestly expecting it to be something like the MSVDD rail failing, but apparently it's just Vcore, which is like the most normal failure for high-end Nvidia GPUs ever. GPUs that blow up Vcore are 980 Tis, 780 Tis, 780s, Titans, Titan X Maxwells, 1080s, 1070s, GTX 590s, what else, basically any high power draw Nvidia GPU ever made. If the Vcore VRM doesn't have a tendency to die, it's weird. So this is really anticlimactic for me from that perspective of 'oh, it's just Vcore', because it's always Vcore with Nvidia cards. That's it for the video, so thank you to the person who decided to send the card over. I'll see if I can get it fixed for
him. And yeah, that's it for the video, so thank you for watching. Like, share, subscribe, leave any comments, questions, suggestions down in the comment section below. If you'd like to support what I do here with Actually Hardcore Overclocking, I have a Patreon, there's a link to that down in the description below. There's also the AHOC Teespring store where you can pick up shirts, stickers, posters, you know, the usual YouTuber merch. Both Patreon and Teespring help out immensely with running the channel, so it would be much appreciated if you check them out. And that's it for the video, so thank you for watching and goodbye.
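The over-current concern raised in the video can be put in numbers with a short sketch. It assumes what the speaker assumes but has not verified: 10 power stages of 60 A spread over eight phases (two phases with two stages each), and a controller that applies one shared per-phase current limit sized for the largest phase. The function name `under_protected_phases` is hypothetical, and this does not describe the uP9512's actual behavior.

```python
def under_protected_phases(stages_per_phase, stage_limit_a):
    """List phases whose safe current is below a single shared per-phase limit.

    The shared limit is assumed to be sized for the largest phase, as the
    video speculates. Returns (phase index, safe amps, shared limit amps).
    """
    shared_limit = max(stages_per_phase) * stage_limit_a
    return [
        (index, stages * stage_limit_a, shared_limit)
        for index, stages in enumerate(stages_per_phase)
        if stages * stage_limit_a < shared_limit
    ]


# Assumed layout: 10 stages on 8 phases, 60 A per stage.
asymmetric = [2, 2, 1, 1, 1, 1, 1, 1]
for index, safe_a, limit_a in under_protected_phases(asymmetric, 60):
    print(f"phase {index}: safe up to {safe_a} A, limit trips at {limit_a} A")

# The five-phase alternative mentioned in the video: 10 stages on 5 phases.
print(under_protected_phases([2, 2, 2, 2, 2], 60))
```

Under these assumptions the six single-stage phases could carry up to twice their rated current before the limit trips, while the symmetrical five-phase layout leaves no phase exposed.
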
Info
Channel: Actually Hardcore Overclocking
Views: 14,630
Rating: 4.969543 out of 5
Keywords: Overclocking, PCbuilding, Buildzoid, AHOC, Actually, Hardcore, Hardware, OC
Id: w-zSgiMEzkU
Length: 29min 0sec (1740 seconds)
Published: Thu Oct 07 2021