VLOG 3: Industry Advice - Failures

Captions
Hello everybody, welcome back to this session of Industry Advice. Today's topic is humility and failure. I recently made a post on LinkedIn about a lesson in humility, because it was a case where I rejected a tool upon a cursory glance because it didn't fit into my workflow, my view of how I troubleshoot, and two years later I learned that I probably shouldn't have done that. So this session is about failure.

Oh, before we talk about failures: remember that the Wireshark session, the blue session, not the green session, will finally get to the actual mechanics of how to use Wireshark. We'll go into some of the UI, how to set it up, why you set it up the way you do, some tips and tricks related to that, and some gotchas that you should be aware of. We'll quickly go through those topics and then finally get into packet analysis. So it's not too far off, and for those of you that stuck with me, thanks for sticking with me, and hopefully the Wireshark sessions will be productive.

So why are we talking about failures today? Because when you see people present at any kind of industry event, or Sharkfest, you're seeing things in the best possible light. It's things that are teed up, things that are prepared. It's like a cooking show, right? Nobody ever goes on a cooking show, takes it out of the oven, and it's burnt to a crisp: "We're having Cajun tonight, boys." None of that. It's presented in the best possible way. And in fact, apologies to past marketing folks that I've worked with: it drives them crazy, because I have no notes in my presentation and I do it extemporaneously, because otherwise I turn into a bad actor reading a script on a scrolling screen. Every time, they say, "Oh, that was great, do you want to run through the practice again?" And we can, but then the script's going to change, because it's boring for me to be constantly
regurgitating the same old things, right? So I like to speak extemporaneously, and these sessions are extemporaneous as well.

Anyway, why do I want to talk about failure? Because everybody has them. This may seem a little odd, but it has the same feel as when people used to drink a lot. I know I certainly did; everybody probably did, if not in school then as young adults growing up. There's always that one drink where you know that if you drink this, it's going to take you over the edge, and yet you do it, because you're young and stupid: the old-enough-to-know-better, young-enough-not-to-care stage. Troubleshooting is a little bit like that. There's always that niggling feeling of "I should go down this rat hole," but you don't, because you're pressed for time or what have you, and almost always it turns out that was the root cause, and it would have saved you a lot of heartache, pain and time if you had listened to your gut feel about the problem.

So why don't people do that? Because it's a fine balance between going down every rat hole and finding the right one. That comes with practice, that comes with feel. I call it technical ESP; some of you may have different words for it, but you just have to develop it. There's no way someone can teach you that. But tools are there to help you as a force multiplier. You want the tool to bring front and center the things that matter the most, and that was what the LinkedIn post I made was about. "Lesson in Humility," that's what it was titled; you can Google that and read up on it.

Okay, so why do I want to talk about failures? Because it's important. I want you to know that everybody falters. In Korea there's a saying: even a monkey falls off a tree every now and then. Sometimes that happens. So I wanted to recount some of the things that are still painful for me. I'll start way, way back. There was one time where
I kind of piped up and said something that I shouldn't have, because someone else was getting a brutal verbal beat down. It was in the Army. This other lieutenant was getting a brutal, brutal beat down, and I spoke up and said, "Here, I got it, let me take us out of this situation." I didn't quite know, because I wasn't paying attention from a land-nav perspective. I said, "I got it, let's go, follow me," and I was wrong, and we ended up patrolling for hours, because I just wanted to deflect that situation. I thought I remembered where we were, and I didn't, and my nickname became "Wrong Way Bae." I still remember that.

I remember another case where I shut down the wrong server. This was with Hughes Missile Systems. I had secret clearance; I had just gotten out of the Army and had my clearance with me, so I was able to work in a classified data center. I shut down the network server, a Novell server, and walked away, because it takes a while to dismount all those volumes. I came back and the server was still running, and I was like, "Oh, I could have sworn I said down." So I typed "down" and hit Enter, and over here in my peripheral vision I see that "exit to DOS" message, and I was like, "Oh no, I downed the wrong server." So that happened. The good thing was that it was in a change control window, so no one really freaked out, but it could have been really bad.

There was another time where I saw something, I got interrupted, and I didn't pursue it when I came back. The client paid a lot of money for the investigation of slowness, and it was that one connection that I didn't follow up on that was the root cause. Eventually I found it, but it took me a couple of iterations to get there. I had postulated all these different things about what could be wrong. And here's the thing about packet analysis: if you really want to, you can pick
out 10, 15, 20 things that are not the root cause. It all helps, okay? It helps with the performance, it helps with this or that, but that's not the smoking gun, or as close to a smoking gun as you can find. Things like that have happened.

The other one was a go-live situation where they were running out of time. They couldn't get things to work and they didn't know why. There was this mysterious 45-second delay, and I said, "Oh, as long as it's repeatable, you can find it." Anything that's repeatable, if you look hard enough, you can find it. It was a go-live for a large application in a big, big country. I got all the traces, and they said, "Every time you log on, every other time, every third time, it's kind of hit or miss, there's a 45-second delay." I thought, okay, that's not a network delay, because 45 seconds is an eternity; it's an application-level delay. They didn't like hearing that, but I said there's nothing in the network that can hold things for 45 seconds, nothing. It just can't happen. And I said, "But we'll find it." The problem was that it's a large application with a reverse proxy and a forward proxy and load balancers and another load balancer; there are multiple layers of complexity, and I got packet traces from all of it.

So I was going through that, and, let me silence the phone there, where was I? The large packet trace. I started looking through it and I realized there were calls for DNS sometimes, and other times there were no DNS calls. I said, "Oh, I found it, root cause." I said, "Hey, some of your systems are using the /etc/hosts file, or your resolver configuration for DNS is incorrect and it's going to the hosts file first and not DNS. Others are using DNS," et cetera. I said, "Problem solved," and everybody was like, "Yay!" I wrote my report and said, you know, case
closed. And then they called and said, "Oh, Hansang, you're right, there were a bunch of discrepancies in the hosts file. We cleaned all that up, thank you." I said, "Awesome, you're welcome." And they said, "But we're still having an issue." I was like, "Really?" Remember, everybody was like "Yay, problem solved," and so now it's problem not solved.

What it was, was the one connection that I meant to follow up on and never did. It was a reset, one reset. Again, with SSL, applications reset the connection to close instead of closing it; that's what browsers do. My mistake was that this wasn't a browser condition. This was not front-end communication, this was back-end communication, and in back-end communication a reset is something I should have looked at. While cleaning up the hosts files did make a difference for all these niggling delays and timeouts and whatnot, the root cause of that 45 seconds was actually that reset. What happened was they took the server out of the load balancer rotation, but they never updated the load balancer configuration, and they didn't have the heartbeat health check on. So the load balancer is like, "Okay, you're up next": open a connection, open a connection, open a connection, open, reset. And that was essentially the 45-second delay. That's the thing that annoyed me, because it took too long for me to remember the context: no front-end communication, no users, therefore no browsers, therefore all resets are something you should follow up on. And I didn't. So that annoyed me.

There was another one where we were making a network change, a router change, and I was teaching someone. I was explaining what was happening and they were doing it, and then there was, "Hey, what if we did this?" It was somewhat innocuous, and I said, "We shouldn't, because we're in a change control window, but
since you asked, this is what happens," and I typed the commands to show them the effects of the change. And then there's that dreaded Enter key that doesn't respond. You network people know what I'm talking about. You hit Enter, you hit Enter, and you're like, "Oh please, please come back, please come back, please come back." And it did. After it reconverged it came back, and there was a bunch of carriage returns. That was a sinking feeling of, oh man, I deviated a little bit from the prescribed change that I wrote, and there was an unintended consequence. Again, change control window, so no users were affected, but what if it hadn't come back? Well, it would have been okay, I suppose, because I always do a "reload in 30," "reload in 50," whatever, so if something goes horribly wrong the router will reload itself and you can go back to a known state and start over. But it's a very sinking feeling.

But the biggest mistake of my career was the perfect storm. I was a consultant, then I became an employee, and even though I had a chain of command that I reported into as a consultant, it's different when you become an employee. I reported to my manager, John, and he was meeting with his new boss, because there was a merger between two companies, and this was the day that my boss John was going to go meet with his boss. She knew of me, because I had been working as a consultant, so I was kind of freelancing between both parties that were involved in the mergers and acquisitions. I was troubleshooting a problem at a Smith Barney branch. There was a constant report of slowness from this branch, and it had to do with the internet. It's a hard thing to troubleshoot, because there's a proxy involved, there's a firewall involved, and then there's the internet itself, and of course the application that people are accessing. It's a very complex thing to
troubleshoot. This particular branch was in Penn Station, actually, and it was on my commute home. So John said, "Hey, on your way home can you stop by the branch and see what's up? Open up a case with operations, get some packet captures, and see what you can see." I said, "Sure, no problem." So I called up the help desk for the Smith Barney branch. His name was Aaron, and he said, "Yep, I'll come with you." On our ride up he asked me, "Hey, the thing that you're going to do, it's not really dangerous, right? It's not disruptive or anything?" I said, "Nah, packet capture, spanning, is about as innocuous as you can get. In fact it's so safe that, as policy, we can do it in the middle of the day. It's, quote unquote, a network change, but we can do it in production time because it's so safe." Yeah, that's foreshadowing.

Now, the perfect storm that I was talking about was not just my manager meeting his manager for the very first time right as I entered the Smith Barney branch. I had also forgotten what I always railed against, which is turning off safety things like spanning tree. Spanning tree's there for a reason, right? It's there to prevent a loop from forming, et cetera. And I always said, our standard is no spanning tree, to save what, an Ethernet frame every two seconds? It's like saying you're not paying insurance on your car because you're not going to get into an accident. Anyway, that's kind of the background story.

So I get on and I connect my laptop running Wireshark, maybe Ethereal at the time, I can't remember; maybe I brought a Sniffer, I don't even remember. I checked my MAC address, I saw the target MAC address that I wanted to see, and I called up operations and said, "Hey, I've got a ticket here, open up this SPAN port from here to here." At least it was after market close; I think it was like 4:10 or 4:15.
And all of a sudden the sniffer goes to line rate, 100% utilization. I said, "Oh, Aaron, hey, I think I found the problem, man. There's something here that's completely eating up the bandwidth, to the point where it's starving everybody else out." And then that little Spidey sense said, what are the chances that it goes to line rate the minute you told operations to do a SPAN port? And then it hit me: I didn't check properly. It was what I call the U design: two routers, two switches, and an Ethernet trunk between the two. What I did was create a loop by spanning one of the EtherChannel ports onto itself, essentially, because I saw my MAC address, but I didn't see that the MAC address was coming across that trunk. I formed the loop, and lo and behold, there's no spanning tree to prevent that from happening.

And how do you get out of a spanning tree loop? I shouldn't say spanning tree loop, it's a misnomer: a bridge loop. Well, you can't. Why can't you get out of a bridge loop? If you're an IP packet you can get out of a loop, because there's a time to live. IP headers have a time to live, and after 64 hops the packet dies. But what happens if you create a loop of an IP packet, or even Ethernet frames like Cisco Discovery Protocol or spanning tree BPDUs? Ethernet frames don't have a time to live, and an IP packet's TTL doesn't decrement if it stays inside the switch. So imagine these are the two switches and the IP packets are going around and around: they never die, and they're going around as fast as they can. The only way to recover from that is to either unplug or turn off the switches and reset, and that's what we did.

Of course by this time the branch manager is freaking out: "Everybody lost their connection, what's going on? I was told this was not disruptive." And I was like, oh crap. This was before texting, I think, or maybe texting was there. Anyway, I sent my message and said, "Hey John, I know you're in a meeting with
your boss, but I just brought down the branch." He's like, "What?" So I get a call and he puts it on speaker. John was meeting with Meredith (I'll just use a first name), and I said, "Hey John, Meredith, a little awkward, I know we're meeting for the first time from a chain-of-command perspective, but I just brought down Smith Barney. 100% my fault, and this is what happened." Did I think I was going to get fired? Possibly. But I did have a ticket open, at least I did that, so I had a little bit of air cover. And I remember Meredith said, "Ah, it happens, don't worry about it, just recover as fast as you can." I was like, oh, okay. But I felt bad, because my actions were a reflection on John as he was meeting with Meredith, and I kind of brought that down, if you will, by bringing down the branch. To this day that bothers me, because if I had taken half a second more, I would have been more thorough and I would not have brought that down.

I've also talked about failure from my other hobby, which is woodworking, as you guys may know. Every time I do a cut on a table saw, I always make sure that I visualize the cut first. I have situational awareness of where my hand is and where the blade is, and sometimes I make a practice cut so I know the sequence of events. I'm very methodical like that; I plan things out. But because I got arrogant, because I knew what I was doing, because I'd been doing this forever, I circumvented the safety checklist that I always say you should follow through on, and I paid dearly. They didn't fire me, though if they had, I wouldn't have complained; it was a fireable offense. What I felt bad about was that I put John, a friend of mine and my boss, in a bad situation. I brought harm to somebody else, and to this day that bothers me more than the fact that I screwed up. Now, everybody makes a human mistake. It's okay, that's fine. But it's not okay if you short circuit
your own processes because you think you know better. And that's back to a lesson in humility. People who've been doing woodworking lose their hands all the time, because it's that moment of inattention. Same thing when you're doing changes, same thing when you're troubleshooting: it's that moment of inattention that can change the course of your events, or your hand. And I apologize for my PC blinking here; I thought I muted everything.

So again, the point of this is that not everybody is perfect. You should, however, do everything you can to minimize the damage you can do from a mistake, with a checklist. For every network change you should have a checklist, just like every pilot has a checklist when they take off and land. For every piece of military equipment it's called the dash 10, right? PMCS, preventative maintenance checks and services, dash-10 level, where someone reads out "check this, check this, check this" before you actually use it, because lives depend on this thing. So be in the habit of creating a checklist, whether you're troubleshooting or making a change, so that you don't make that "I know better" mistake.

And then the final thing I'll talk about: there was another point in my career where I had a lot of experience taking care of something, and there was that one person in the room who said, "Hansang, you don't know what you're talking about. I've been doing this 15 years, I know what I'm doing." My retort at the time, because my cup kind of runneth over, was, "Just because you've been doing it for 20 years doesn't mean you've been doing it right for 20 years." Everybody in the room went, "Ouch, burn." But I was in the right, okay? And the reason I spoke up wasn't because it was important that I was right. It was important because I knew it would bring harm to the project we were working on, and I didn't want that. I didn't want to bring harm, so I just needed
to stand up and say, "No, no, no, absolutely not, that's incorrect, and this is why." And once I explained it all, everybody came around. So when you're explaining things to people, don't be holier than thou. Nobody likes that. Be factual, and also listen, because they may have a point that you didn't consider yourself. A lot of times we get into that frame of reference of "I'm right, I know what I'm doing," and we get tunnel vision.

So, circling back to that LinkedIn article I talked about, "Lesson in Humility": that's what happened with Netdata as a tool. I discounted it, and now, two years later, I'm realizing that it really could have helped me in some of these cases. So now I'm kind of gorging on some of the videos and whatnot. Essentially what I'm saying is: no matter how good you are, no matter what expert level you're at, you can do better, you can learn more. And above all, don't make stupid mistakes, mistakes that can be prevented with a simple checklist, a simple flow chart, a simple mental walk-through in your head. And don't be Leeroy Jenkins, right? If you don't know that reference, go ahead and Google Leeroy Jenkins and I'm sure you can watch the video. It was the first viral video, I guess, before the word viral was even there. So don't be Leeroy.

With that I'll sign off, and I'll start to work on the actual Wireshark course. That'll go up in a couple of days, maybe a week. It's a pretty hectic week coming up, but it won't take that long. Okay, all right everybody, thanks, take care, have a great weekend.
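
The bridge-loop point made in the talk (an IP packet crossing routers dies when its time to live runs out, while traffic circling between two switches never has its TTL decremented) can be illustrated with a toy simulation. This is only a minimal sketch, not a model of any real switch or router: the function name, the `decrement_ttl` flag and the `max_hops` cap are hypothetical names invented for illustration, and the starting TTL of 64 is simply the figure mentioned in the talk.

```python
def hops_until_dropped(ttl, decrement_ttl, max_hops=1000):
    """Count how many hops a packet survives while circling a loop.

    decrement_ttl=True  models a routed (layer 3) loop: each hop subtracts 1 from the TTL.
    decrement_ttl=False models a bridged (layer 2) loop: the TTL is never touched.
    max_hops is an artificial cap (an assumption) so the simulation itself terminates.
    Returns the hop count at which the packet was dropped, or None if it was
    still circulating when the cap was reached.
    """
    hops = 0
    while hops < max_hops:
        hops += 1
        if decrement_ttl:
            ttl -= 1
            if ttl == 0:
                return hops  # TTL expired, the packet is discarded
    return None  # nothing ever removed the packet from the loop


routed = hops_until_dropped(ttl=64, decrement_ttl=True)
bridged = hops_until_dropped(ttl=64, decrement_ttl=False)

print("routed loop: dropped after", routed, "hops")
print("bridged loop:", "still circulating" if bridged is None else f"dropped after {bridged} hops")
```

In the routed case the loop cleans itself up; in the bridged case nothing inside the loop ever removes the traffic, which is why the recovery described in the talk was to unplug or power off the switches.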
Info
Channel: Hansang Bae
Views: 475
Keywords: Tcpip, wireshark, sharkfest, packet analysis, protocol analysis, sniffer, ethernet, technology, network engineering, SE, systems engineer, hansang bae
Id: iatDNG1o6HQ
Length: 25min 25sec (1525 seconds)
Published: Fri Oct 09 2020