TCP Fundamentals Part 2 - Wireshark Talks at Sharkfest // TCP/IP Explained

Captions
Welcome back — part two of Wednesday morning. OK, let's talk about some TCP windows. We talked a little bit about how that number is established at the beginning of the session, so keep this in mind when we're advertising a window in a packet: I'm a SYN, I send off my SYN to the server, and I'm advertising that number — that's a receive window. There are basically two windows we'll talk about: the receive window and the congestion window. The one that you see advertised in your packets is the receive window. The sender of that packet is saying, "don't send me more than this amount of data at once — that's my receive buffer."

Going back to Hansang's analogy that I really liked: you have your pipe, and I want to fill that pipe with water — that's my goal if I have a big data transfer. The server can send water over to me in a bucket, but it can only send as much as I can receive into one bucket at once. Otherwise, what happens if I've got a thimble-sized bucket and he's got a big five-gallon paint pail and he shoves that down the pipe? I'm standing there with my thimble going, "uh, yeah, right." So that server is limited by my receive window.

Now, in a perfect world, the network has infinite capacity, the receiver can receive at infinite capacity — we have this huge bucket that will never fill — and the sender can send at infinite capacity. Our sending bucket is so big it's a swimming pool, the pipe to carry that is swimming-pool sized, and it's emptying into a massive swimming pool. That's a perfect world. In reality, the network has limitations, let's just be honest. We're limited by bandwidth, by latency, by things like packet loss, congestion, errors, discards, buffers along the way that can get overwhelmed; we're limited by translating between ten gig and one gig — we have lots of limitations from a network perspective. So end to end, I don't have this massive swimming-pool carrier between these two endpoints, especially if they're any distance apart — unless they're on the same 10-gig, 40-gig, 100-gig switch and they're cabled in right next to each other. Client-to-server, most of the stuff we're going to be troubleshooting isn't like that; there's some network in between. Maybe we'll be doing that other type of troubleshooting — that's possible, and I'm actually going to show you a few examples — but the reality is that we do have limitations on the network.

Another reality: the client has limitations. I might have to tell that server, "I can only receive" — I'm just making up a number — "8K at a time, that's all I can take." Why? I've got a bunch of other resources in use, other things happening. Maybe I have this receive bucket and the application isn't scooping it out fast enough. So my receive bucket, the one catching all that water, can be a limitation that's set by the TCP stack.

Another reality: the server has limitations, or might intentionally limit itself due to network conditions. That brings up the idea of the congestion window. The congestion window can be this nebulous number that no one sees — it's basically the amount of data that the server can send at once. How big is the bucket it can use to pour data down the pipe to the other side?
We're going to talk about that for a little bit, but first I want to make sure we understand the receive window — that's the clearer one to get. So let's take a look at the receive window; I just want to walk you through an example.

OK, back here. Now, we missed our handshake, but I'm just going to tell you: these two stations weren't using window scaling. So the number you see is the number you see — what you see is what you get. This Window Size Value is true, it's real; don't worry about Calculated Window Size at this point. I'm going to remove that column.

So, to give you a little bit of a problem statement: one of my clients was backing up a database from one server to another — moving it from a primary server to a backup server. Same switch, gig attached, sitting right next to each other, literally in the same data center. They were just moving this data from one machine to the other. Originally they started this backup after everyone was done with their workday — they go home at 5:00 Pacific, they start the backup — and by the time East Coast people were coming in to work the following morning, around 5:00 a.m. Pacific, eight-ish Eastern, that backup was still running. Gig attached, and it wasn't a ton of data — I can't remember the exact details, it must have been around 300 or 400 gigs, something like that. Just do some rough numbers on that: let's say it's an 800-gig database. If I have a one-gig interface and I'm moving data at line rate, how long should that really take? Take 800 gigabytes, multiply by 8 to get the bit value, then divide that number by my link speed. Basically, it shouldn't take a whole lot of time — but this thing was taking all night, eight hours.

So what was the first thing these guys thought to do? Well, one gig — we need some 10 gig on this thing. Now, smartly — if that's a word — they first did a packet capture, like we talked about, before they went and completely forklifted out that infrastructure and snapped in a bunch of 10 gig. This is just a sample of the trace — the trace was huge — but it starts to show you the problem we have. Right here, for delta time, notice that everything you see so far is sub-millisecond: zeros here, zeros here, everything sub-millisecond. We're cranking, we're slamming data, we are just moving. And then, right about here — everybody see that? — we're slamming data and then we have this 19 milliseconds.

Now, 19 milliseconds — am I going to fuss about 19 milliseconds in the grand scale of things? Am I going to go, "oh man, there's your problem right there"? I mean, 19 milliseconds, big deal, right? 19 milliseconds doesn't hurt if we suffer it once. 19 milliseconds hurts if we suffer it thousands of times — everyone agree? Sometimes it's death by these little delays that gets you. Sometimes things are 108 full seconds of delay, like the trace file I showed you before the break: 108 full seconds — boom, there's my delay, it's clear, application delay. But sometimes it's these little guys that bite you. The question is how often I suffer them.
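To make that back-of-the-envelope math concrete — and to show how "death by little delays" adds up — here is a quick sketch. The 800 GB figure and the 1 Gb/s link are from the example above; how often the 19 ms stall actually recurs is a hypothetical knob, since the real count depends on the trace.

```python
# Rough arithmetic for the backup example above.
# Assumptions: ~800 GB to move over a 1 Gb/s link, with some number of
# 19 ms receive-window stalls sprinkled in (the stall frequency is hypothetical).

DATA_BYTES = 800e9          # ~800 GB database (from the example)
LINK_BPS   = 1e9            # 1 Gb/s interface
STALL_SEC  = 0.019          # the 19 ms pause seen in the trace

ideal_sec = DATA_BYTES * 8 / LINK_BPS
print(f"Line-rate transfer time: {ideal_sec:.0f} s (~{ideal_sec/3600:.1f} h)")

# Hypothetical: suppose we eat one 19 ms stall for every N bytes delivered.
# One stall per full 64 KB receive window would be the worst case here.
for bytes_per_stall in (64 * 1024, 10 * 64 * 1024):
    stalls = DATA_BYTES / bytes_per_stall
    extra  = stalls * STALL_SEC
    print(f"One stall per {bytes_per_stall // 1024} KB -> "
          f"{stalls:,.0f} stalls, +{extra/3600:.1f} h of pure waiting")
```

Even at the lighter assumed stall rate, the waiting dwarfs the line-rate transfer time, which is why the little 19 ms pauses are worth chasing.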
So I'm just going to take that value right there — packet 140 — and sort my delta time. What this does is put all the deltas together, with the worst ones at the bottom of the list, so I can see that 19 milliseconds is not something I suffer only once; I suffer it quite a few times. I did have this outlier, a 60-millisecond one, but that was the server coming back to the client — at one point it took 60 milliseconds. Most of these delays, though, are the ACKs coming back to the sender — the empty packets. That's what's taking 30, 25, 20, 19, 19, 19, 19... I have another outlier out here I could investigate, but most of them are these. And this is just a sample — a very small piece of the trace. If you took a larger section, 500 megs or something, you would have seen these 19-millisecond pauses go on into the sky. The grand total of this trace isn't even one second — it's only 873 milliseconds — and the backup went on for hours. So I'm concerned about those little pauses.

Come down here one more time. I found my little pause — now, what was I waiting on? Data is coming across — you see my 1514s, that's my Ethernet frame length — data is coming. So the server that wants to back itself up and send data over to the other guy is sending big packets: bam, bam, bam, bam. The receiver of that data ACKs every other packet — thanks, thanks, thanks — incrementing its acknowledgment number. Then we get down to this point where we wait: the sender has to stop for a second at packet 139, and we have to wait for 140 to come in. That was a 20-millisecond pause; after that, more data. So let's take a look at this pause. What on earth would cause it? Why did the client — the receiver of the data, I'll just call it the client — take in all this stuff, pause 20 milliseconds, and then boom, send off this acknowledgment?

All right, let's see what's going on here. First I'm going to add a column to this profile — it's already in another profile, but I don't want to jump around profiles, so I'll just build it out as we're talking. In fact, I'm going to minimize the data pane. Window size — receive window. Notice this number here, in the direction with the 1514s: 65,160. The one sending the data is saying, "I can receive 65,160 at once; that much data can be outstanding in the pipe; you can send that much to me and I'll go ahead and start my acknowledgments." The thing is, though, that machine is not the one receiving — the big packets are coming from it, not from .2 — so .1 can advertise any window size it wants, but it's not the one doing the receiving here. So that 65,160 — just watch it — doesn't change. It's not receiving data, it's sending. In the opposite direction, though, you notice that this number is changing. Yeah — you've got a question?

Right — the question was, I mentioned they're not using window scaling: would that be a consideration if they were? If they were using window scaling, for sure, absolutely — and I'd be able to tell.
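As a quick aside on what scaling does to the number you see in that column: the shift count is exchanged in the SYN/SYN-ACK options, and every later advertisement is multiplied by 2^shift. A minimal sketch — the shift value of 7 is just an example, not something from this trace:

```python
# Window scaling in a nutshell: the 16-bit value carried in each segment is
# multiplied by 2**shift, where shift comes from the SYN/SYN-ACK options.
# Miss the handshake and you don't know the shift, which is why the raw
# value alone can look suspiciously small (e.g. 64 or 120).

def effective_window(raw_window: int, scale_shift: int) -> int:
    """Real receive window implied by a scaled advertisement."""
    return raw_window << scale_shift

# Example values only -- a shift of 7 is common but is NOT from the trace above.
print(effective_window(512, 7))   # 512 * 128 = 65,536 bytes
print(effective_window(64, 7))    # a "tiny" 64 really means 8,192 bytes
```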
Because a lot of times, here's what you'll see if you miss a handshake. Let me just look at the .2s — these are the empty ACKs coming back for this data. In fact, I don't even care about the packets carrying data; I just want to see the ones coming from 192.168.1.2. OK, so these are just the ACKs coming from the machine I'm sending the data to. A lot of times I can tell whether window scaling is in play based on this number: it will be small — something like I showed you, a 64, or a 120 — a real small number that doesn't make a lot of sense. That's how I can usually tell the number was scaled; I just don't know the scale factor. Here, however, I'm getting a full-sized one. It might be scaled, might not — but in this case I know for sure it isn't.

So 65,535: the receiver of the data is saying, "hey, you can send me 65,535 at once and I'll be cool with it. That's how much you can send; that's my receive bucket for the water coming at me." It's sending these ACKs, saying, "great, keep sending, keep sending, I'm good." Keep your eye on the Window Size Value. "I'm good, man, rock and roll" — whoa, the number changes. In fact, as we go forward with our acknowledgments, this number starts to go down. That receive window is starting to fill. Think about it in these terms: I'm the receiver, data is coming at me, water is coming down the pipe, I've got a bucket catching it, and hopefully I'm dumping the water out to my little application over here as fast as it comes in — here you go, here you go. The application is supposed to come in, take the water out, and do its thing with it. TCP is sitting there, water's coming in, and we're starting to fill. 65,535 is what I could usually handle, but now we're getting a little pooling in the bottom: 64,915 — that's all you can send now. Oh, more is coming in: 61,995... and it keeps dropping. The sender of the data can only have so much outstanding; he is now limited by that number.

So he sends a bunch of stuff — and keep in mind there's essentially no latency between these two boxes; it was very, very small, down in the low microseconds. Anyway, it comes down, boom, boom, boom, all the way — look at our window, we come down to 2,299. Let me remove my filter. So now we have this window size, and it's dropping. These big packets have 1460 bytes of data in them, and if you subtract that amount each time, you'll notice the window size is dropping by exactly that number. In fact, we get down to 2,299. That is the client saying, "hey server, this bucket's full — I've got 2,299 left, that's it, that's all you can send."
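Here is a small sketch of that draining behavior: an advertised 65,535-byte window being eaten 1,460 bytes at a time while the application reads nothing. The starting window and segment size mirror the example; the intermediate values in the real trace differ because the application was draining a little in parallel.

```python
# Receive window draining as the sender keeps pushing 1460-byte segments and
# the receiving application reads nothing. Illustrative only: in the real
# trace the app drained some data in parallel, so the observed values
# (64,915 / 61,995 / ... / 2,299) don't step down this uniformly.

MSS = 1460
window = 65535          # advertised receive window from the example
segments_sent = 0

while window >= MSS:    # the sender may only send what still fits
    window -= MSS       # each unread full-size segment shrinks the window
    segments_sent += 1
    print(f"segment {segments_sent:2d} delivered -> advertised window {window}")

print(f"Window ({window} bytes) is now smaller than one MSS: "
      f"the sender must stop until the application drains the buffer.")
```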
So the server, on its side, is dealing with full-size MSS segments. You notice it sends just one more 1514 — one more packet, packet number 139 — and that's all that will fit. "I realize you have a marginal amount of space left, but I've already segmented this stuff up; I'm dealing with full packets here, trying to be efficient. There's one more; you can't receive another full-size packet, so I'll stop — your bucket's full." The client has to wait until that bucket is cleared out by the application.

Packet 140 — bam, right here, this is the ACK. Look at its window size: 65,535. Now the receiver of the data is saying, "hey, I've got an empty bucket — layer 7 came in here and took it all out — rock and roll." That took 20 milliseconds. There's our pause. Now, that could be the amount of time it takes the app to come in and clear that window — we'd start digging — but in this case the 20 milliseconds was on the client side. Why? Because its receive window filled. It didn't go all the way to zero, mind you — that would be a TCP zero window. Had it gone all the way to zero, we'd see Wireshark flagging some things; I have traces of that. A zero window means "stop sending — I cannot receive any more — my bucket's full," and TCP is going, "I can't do much else until that guy comes and gets it, so no more water."

Question? Good thinking — this is not a window update; that's actually a flag set by Wireshark. This is a return to a full window, not just a marginal window update. Usually you see that flag if, say, I advertised something like 30K and then right after that advertised 65,535 — that one would usually get tagged as a window update. Why this one isn't — it must be that once it returns to full, the rule for showing that alert isn't triggered; ask the developers, I don't know. Oh — there we go, that's true, I forgot, you're right: if it's acknowledging new data, it's not going to be a window update. It has to be an empty ACK, one right after another, not acknowledging any new data — that would be your true window update. Thank you.

All right, so what happens next? We had a full window; the receive window got emptied out; the bucket's clear; data comes in from the server — bam, bam, bam — big packets. Again, keep your eye on anything coming from .2; in fact, I'll set that display filter again. Here we are at packet 155 — there's our receive window, the source is .2. We went down to very small, not zero, and now we're wide open again, accepting new data — coming in, coming in, coming in. Watch the window size: when does our bucket start to fill? Whoa, hang on — all the way down to here. There's our window: 1,459. That's one byte less than my MSS, and it will cause the server to pause. This time it pauses 14 milliseconds and then starts sending its data again.

So we see a pattern as I scroll: large window; window drops down to a small number — there was actually one more data packet after this one that we had filtered out — a 19-millisecond wait; then we send again. We're fire-hosing data when we can, but the client is hanging us up because its receive window keeps filling. Every time the receive window drops to less than the MSS, we see that delay. So, right now: is this a network problem? Would upgrading these links from one gig to 10 gig do anything? Is this a sending problem? What's the issue here? The app on the receiving server is not coming in and cleaning out that TCP buffer fast enough.
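To tie together the zero-window and window-update discussion above, here's a rough sketch of the distinctions as described in the talk — an approximation of the idea, not Wireshark's exact dissector logic, and the packet-140 numbers used in the demo call are hypothetical:

```python
# Rough classification of a bare ACK, following the discussion above.
# This approximates the idea only; Wireshark's real heuristics differ in detail.

def classify_empty_ack(prev_ack: int, cur_ack: int,
                       prev_window: int, cur_window: int,
                       payload_len: int, mss: int = 1460) -> str:
    if payload_len > 0:
        return "data segment (not a bare ACK)"
    if cur_window == 0:
        return "zero window: stop sending entirely"
    if cur_ack == prev_ack and cur_window > prev_window:
        return "window update: no new data ACKed, just more room advertised"
    if cur_window < mss:
        return "ACK, but window below one MSS: sender will stall soon"
    return "plain ACK"

# A packet-140-style case (numbers are hypothetical): it ACKs new data,
# so it is NOT flagged as a window update even though the window reopened.
print(classify_empty_ack(prev_ack=100_000, cur_ack=163_280,
                         prev_window=2_299, cur_window=65_535,
                         payload_len=0))
```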
Now, for me, this is where we'd get in and look at a few of the nerd knobs on the receive side within the app, and see if we can tweak anything to clear that buffer out faster. In this case they actually moved to a different backup method, one that used much larger TCP windows more efficiently. But the hang-up was on that receiver: the app simply was not taking full advantage of the windows or of the pipe between them.

Question? Right — .2 is the one getting the data, and it's just saying "my bucket's empty"... and boy, it's filling, it's filling, it's filling — "hang on, I only have room for one more" — and then it says stop, because it doesn't want anything to spill out. Now, I do have a couple of options here, and we have done this in the past: what if I just gave the receiver a bigger bucket? Enable window scaling, put a multiplication factor on that window. That's one option, and it would probably make things a little faster because he'd have a bigger bucket. But the root cause: if I give him a bigger bucket, it just takes a little longer to fill; I'm still not seeing the app come in and get that water out. A twice-as-big bucket is good for a while, but then it starts to fill too. So, root cause: that app needs to scoop the data out faster. For me, that's the answer — "all right, that's our problem; we'll dig in to see if we can find any tweaks and things we can adjust" — but a lot of times it means involving the vendor; they have to get into their code and figure it out. But we found it; we absolutely isolated root cause.

I have a question — oh yeah, I was just about to show you that. I don't like to just throw a TCP trace graph out there until we understand what it's doing, and that's what we've just built to. Let me pull up a handy little graph: Statistics, TCP Stream Graphs, Time Sequence (tcptrace). My goal is to see a nice straight line. The x-axis is time — there's my 800 milliseconds of trace file. The y-axis is sequence number — each byte has an associated sequence number. What I want to see is a line going as steeply as possible, up and to the right. That means I'm moving data: no pauses, nothing hung up on either end, data moving efficiently. But as soon as you see this stepping, something is pausing one side or the other. We have a wait, and it's up to you and me to dig in and figure out what that wait is.

A nice thing about the tcptrace graph — let me zoom in here a little, grab this section right there. You might not be able to see it from down there, but each of these little ticks is a packet, going up, up, up. And there's a little brown line underneath this row of packets that represents acknowledgments — what has actually been acknowledged by the receiver. As these packets go up, what I want to see is that brown line coming up to meet them: "yep, that's been ACKed, that's been ACKed." And you notice, as I go up and to the right, ACKs are coming in — a couple more packets, ACK, a couple more packets, ACK. The ACKs are keeping up just fine.
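If you ever want to pull the same numbers out of a capture yourself rather than eyeballing the graph, something like the sketch below works. The field names are standard Wireshark/tshark fields; the capture file name and the stream index are placeholders.

```python
# Dump time / sequence / window data for one TCP stream -- roughly the raw
# material behind the tcptrace-style graph. "backup.pcapng" and stream 0 are
# placeholders: substitute your own capture and stream index.
import subprocess

fields = ["frame.time_relative", "tcp.seq", "tcp.len", "tcp.window_size"]
cmd = ["tshark", "-r", "backup.pcapng", "-Y", "tcp.stream == 0",
       "-T", "fields"] + [arg for f in fields for arg in ("-e", f)]

rows = []
for line in subprocess.run(cmd, capture_output=True, text=True).stdout.splitlines():
    parts = line.split("\t")
    if len(parts) != len(fields) or "" in parts:
        continue                      # skip anything with a missing field
    t, seq, length, win = parts
    rows.append((float(t), int(seq), int(length), int(win)))

# For example, print the points where the advertised window drops below one MSS
for t, seq, length, win in rows:
    if win < 1460:
        print(f"{t:8.3f}s  seq={seq}  win={win}  <- receiver almost full")
```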
Now, back up from here — there's another line, up above. This is the receiver's receive window. My goal is to always have space between the packets and that green line; ideally, they never meet. But this is a good trace file to show you the behavior. What happened here: not far into the packet stream, you notice the receive window coming up, keeping in step with the incoming data. If we measured it, the space between where the packets are and where that line sits is exactly 65,535. It's climbing like that... and then at one point it freezes and goes flat. Now the space between my data and that window is closing — the two are starting to meet — the window is filling. As soon as my packets meet that window: pause. That's what causes this 20-millisecond pause — let me zoom out a little. The pause is there because I ran out of green line; there was no space left between the incoming data and that line, so the sender pauses and has to wait for the green line to go up. And notice, as soon as it goes up — boom, the server's like "sweet," boom, data. Then at a certain point the receive window freezes again and the data comes up to meet it.

So whenever you have that type of pattern — and now that you've seen it in the packets, which is why I like to show you that first — this is what's happening with that window size; this is what it means. Now, when we go back to the TCP stream graph, hopefully these numbers make a bit more sense. Looking at this pattern: the green line opens, we send data, we stop; it opens, we send data, we stop; it opens, we send data, we stop. We are limited — those hang-ups, this stepping, are being caused by that receive window filling. So how do we fix it? Again, we've got to get into that receiver and figure out why the app isn't coming in and scooping out the data. I wanted to work on this one longer with them and really see what was going on in that part of the stack; however, they chose to go with a different backup algorithm and the problem went away. And you hate it when an upgrade fixes it — you're like, "no, no, what was wrong?" It's good that the problem's gone, but no, let's find out which little tweak we could have made. Anyway, a lot of times, especially when I'm consulting, my customers just say, "just tell me if it's the network or not." "OK, it's not." "Done, not us, send your report." No — I want to know what it is.

All right, so that was a receive window problem. Now let's talk about the opposite side: the congestion window — ooh, the mysterious congestion window. We've got our pipe, and we've already talked about the receive bucket — how much can be outstanding on the wire at once while this guy keeps sending acknowledgments. The congestion window is how much this server can actually put out there — how much it can actually send. And the thing about the congestion window, unlike the receive window, is that it is not advertised. I can't look in the packet headers and go, "oh, there's the congestion window, that's how much this guy can send," the way I can see how much the other guy can receive. The congestion window is not a number the server announces; it doesn't come out and say it. What does that mean? We have to figure it out from the trace. Now, the goal of this server is to firehose if it can — it wants to start sending data.
However, it doesn't just come out of the gate and slam data — that would be a bad idea for the server. Imagine if that's what it did: if this server is 10-gig attached, and it goes, "hey NIC, what are you? 10 gig? Sweet, let's kick 10 gig out on the wire," and just fills the thing. Odds are I'm not 10 gig all the way from server to client. Even if the client has a massive receive window — even if its bucket is huge — I can't take this sending bucket and just shove it down the pipe; what if the pipe is a teeny little pipe? So at the beginning of the conversation, if it's a big data transfer like we see here, one of the server's goals is to figure out how much data it can put on the wire before it starts running into problems. Ideally, the server doesn't put so much out there that it hits its head on the ceiling of the network bandwidth, as I like to call it.

So, the congestion window: it's a sender-side limit on the amount of data the sender can transmit before receiving an acknowledgment from the receiver. The minimum of the congestion window and the receive window governs data transmission. Going back to our pipe analogy with buckets on each side: if I have a ten-gallon paint pail on my side — that's my congestion window — but the receive window is just an eight-ounce glass, I can only send eight ounces. And if he's got a five-gallon bucket to receive the water but I've only got an eight-ounce glass over here, I can still only send eight ounces at a time. Whichever one is smaller governs the rate at which I transmit.
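That "whichever is smaller" rule is simple enough to write down directly. A tiny sketch, using the talk's bucket analogy converted to ounces purely for illustration:

```python
# The sender may only have min(congestion window, receive window) of
# unacknowledged data in flight. The numbers below are the bucket analogy
# from the talk (gallons/ounces), converted to ounces just for illustration.

def sendable(cwnd: int, rwnd: int) -> int:
    """Amount the sender is allowed to have outstanding right now."""
    return min(cwnd, rwnd)

ten_gallon_pail = 10 * 128       # 1,280 oz congestion window
eight_oz_glass  = 8              # 8 oz receive window
print(sendable(ten_gallon_pail, eight_oz_glass))     # -> 8: receiver limits us

five_gallon_bucket = 5 * 128     # 640 oz receive window
print(sendable(eight_oz_glass, five_gallon_bucket))  # -> 8: sender limits itself
```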
Now, how the congestion window grows depends on which TCP congestion-control algorithm is in play — which, to be honest, I don't have time to go over in this talk, but I will point you to a fantastic session right here at SharkFest: Simon, back there, is doing — what is it — "My TCP ain't your TCP"? Go to that session, because he goes into the different TCP algorithms — TCP Tahoe, TCP Reno, TCP New Reno — the methods TCP uses for ramping up the rate at which data is sent while still avoiding congestion.

To keep it within our context here: the congestion window is some multiple of the maximum segment size. I take my MSS, and some multiple of that number is my congestion window. What do I start at, at the beginning of a conversation? Most stacks do the slow-start method, and the initial size varies — it depends on the stack in play; it could be one, two, four, eight times the maximum segment size. What does that mean? It means that if I've got a bunch of data to send — let's say, for example, it starts at two — I send two full-size packets and wait for all of that to get to my receiver. The receiver says, "great, thanks" — it got two full maximum-size packets, two MSS. Sweet — I'm going to send four: one, two, three, four. Wait for all of that to get there; it comes back with a couple of ACKs. Awesome — four got through, I didn't hit any thresholds, I didn't bang my head on the ceiling of bandwidth — let's send eight. Bam, bam, bam — all eight get over, it starts to acknowledge. So the server does something called slow start: it starts to send data and doubles that number of maximum segment sizes each round until it hits an internal threshold — the slow-start threshold. At that point — and that's another number we could talk about — it just marginally increments the amount of data it sends.

Now, the catch with all of this, guys, is that the server never advertises what that number is. It's also not a fixed number; it's a number in motion. The idea is that we send as much as we can, as efficiently as we can, without hitting our head on the bandwidth, without retransmissions, without causing network congestion from our own traffic. TCP wants to find that floating point where it can send data without retransmissions while still making use of the receive window. So TCP is smart: the server is saying, "hey self, I'm going to start sending slow until I figure out how much this network can take — and I'm just not going to tell anybody how much that is." We have to figure it out from the trace itself.
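Here is a minimal sketch of that growth pattern: doubling per round trip until a slow-start threshold, then growing by one MSS per round trip, always capped by the receiver's window. The initial window of 2 MSS and the threshold value are assumptions for illustration, not values taken from the trace.

```python
# Slow start followed by congestion avoidance, as described above.
# Illustrative only: the initial window (2 MSS) and ssthresh (64 KB) are
# assumed values, and real stacks differ in the details.

MSS      = 1460
RWND     = 65535          # receiver's advertised window
SSTHRESH = 64 * 1024      # assumed slow-start threshold
cwnd     = 2 * MSS        # assumed initial congestion window

for rtt in range(1, 9):
    in_flight = min(cwnd, RWND)               # min(cwnd, rwnd) governs sending
    print(f"RTT {rtt}: cwnd={cwnd:6d} bytes -> "
          f"{in_flight // MSS} full-size segments this round trip")
    if cwnd < SSTHRESH:
        cwnd *= 2                             # slow start: double per RTT
    else:
        cwnd += MSS                           # congestion avoidance: creep up
```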
So let's go ahead and see this in play — enough PowerPoint. How much time do I have? Oh man, we just have so much fun here. OK, I'll do this one and then I'd like to show you guys some more case studies; there are some pretty interesting things.

First thing — here's just a slow start. I'm going to take a look at our network round-trip time: about 90 milliseconds. Where did I get that from? There's my round trip between the SYN and the SYN-ACK of the three-way handshake. I send a GET to that server; 92 milliseconds later the server comes back and sends me a couple of packets; I see the ACK; and then this is where you start to see slow start do its thing, when we actually begin sending data. This first one — there was a GET and just a two-packet response, which is why we didn't see anything else happen. This next GET, from the client to the server — this time the server has more to send. It's an actual OK, not a 404 Not Found; this is a real file the server is going to send across.

Notice how many full-size packets the server sends: we have one, two, three, and then it starts to see an ACK come in — we still have plenty of room in the receive window over on the receiving side. The server paused, right? It didn't want to send any more than that getting-started number of frames. So it waits that 90-millisecond — in this case 93-millisecond — round-trip time, then sends one, two, three, four, and pauses. It waits for that ACK to come in, then we see the number increase — it put four out there a couple of times — waits another round trip, and now we see the congestion window going up: one, two, three, four, five, six, seven full-size packets. It waits to get some of those acknowledgments — keep in mind those acknowledgments are in flight — and starts to receive them. We're capturing client-side this time. How many big packets do we see now? One, two, three, four, five, six, seven, eight, nine. So we're steadily increasing the number of full-size packets we put on the wire — we're slow-starting the congestion window.

You can see this if you come into TCP Stream Graphs, tcptrace — this is a nice, clean example of slow start. It starts small: a couple of segments out there, just one, then a couple more, a couple more, then go, go, go. Ideally I never hit my green line — that's my receive window — and you can see the space we maintain between the data and that line. I've got space, so if I had a hang-up here, it's not the fault of the receiver. In fact, you see our stepping — what do you think that's caused by? Every one of those steps is about the same length, isn't it? Let's check that out real quick. This wasn't a very large trace file or a long transfer, so we don't see a ton of them, but a cool thing about tcptrace is that you can click at any point on the graph and look right next door in the packet list to see where the delays are. If I click just before this delay point, I see 91 milliseconds; if I click on the next step, there's my hundred. That's just my network round-trip time, and it's doing slow start. Again, there wasn't a ton of data — the idea is I want to see this thing go boom, to the sky; I just didn't have enough data for it to climb much higher past slow start.

Now, at some point — I'm going to lose my mic, is what's going to happen — at some point here you notice we start to see about the same amount of data each round: the number of packets going out starts to look consistent. I've got my pause points, but then you see this line right here, this line, this line, this line. So it looks like I hit my internal slow-start threshold. That's the amount the sender is going to be rate-limited to — it's rate-limiting itself. Why? I'd have to dig into the TCP stack itself to figure out exactly what it's doing, but basically it's limiting itself; it doesn't want to use more than a certain amount.
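One way to see why those early round trips feel slow: with only a handful of full-size segments allowed per ~93 ms round trip, throughput is tiny regardless of how fast the underlying link is. Rough numbers based on the packet counts observed above:

```python
# Rough throughput during the early slow-start rounds observed above:
# a handful of full-size segments per ~93 ms round trip, regardless of
# the underlying link speed.

MSS = 1460          # bytes of payload per full-size segment
RTT = 0.093         # ~93 ms round trip from the SYN/SYN-ACK

for segments in (3, 4, 7, 9):
    bps = segments * MSS * 8 / RTT
    print(f"{segments} segments per RTT -> about {bps / 1e6:.2f} Mbit/s")
```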
Now, if this kept going and I was always seeing that, it would mean my congestion window isn't increasing — or it's also possible that the application on that side just isn't handing the TCP stack enough data to send, enough water to push down the pipe. It's got a big bucket on that side, but there's just no water in it. In fact, yesterday I was trying to pull up a trace file I was given by Randall — there's a gentleman in the audience over there — we were sitting and talking and doing some troubleshooting on one of his systems, and we actually ran into that exact issue; hopefully I can show that.

OK, so the points I wanted you to take away from this: two types of windows — receive window and congestion window. The congestion window is the scary, nebulous one, a number set within the server. It does slow start, then goes into congestion avoidance. The goal is not to overwhelm the network; the server has no idea what's going on between the two endpoints, so it does a slow start and eventually settles down to sending at a threshold, as long as it doesn't hit retransmissions. Now, if that server does hit retransmissions — say it slams data and we start to see retransmissions, or dup ACKs come in and it needs to resend data — what that does to the congestion window on the server side is this: the server says, "look, I hit my head against the throughput, the bandwidth available to me; I was too aggressive; I'm going to reduce my congestion window," and it cuts back the amount it puts on the wire, because it wants to avoid causing congestion if it can.

All right — takeaway points; let me bring my Prezi back up. Now let me show you a couple of interesting things I've seen out there. One is — oh, TCP delta; I've got to talk with you about that first. In my profile here — we talked about this when I first started up Wireshark — I have three timers; I like it that way with my plain TCP profile: a running total, delta time, and then this TCP delta. What's the difference? Some people bring up their trace files looking for delays, pauses, things that hang up — trace files with lots of different conversations in them — and then they come over and sort on the delta time column. Sort on delta, go down to the bottom: according to delta time, my biggest "pause" in this trace file is 423 milliseconds; that's the most time between any two packets in the whole capture. But think about it — look at the green lines here — I already told you there are lots of different conversations happening in this trace file, lots of parallel TCP connections. So that number just tells me the largest time gap between packets for the entire trace. Is that a useful number at this point? Is that where my pause is, where the delay is? It's a tough number to use, because I don't know that the packet above it had anything to do with the conversation this packet is part of — does that make sense? Follow TCP Stream — someone just said go ahead and follow the TCP stream. OK, I could do that, let's do that, and I mostly agree with you.
I want to isolate the conversation, but I don't want to extract the data — I just want to see the timers. So I'm going to go to Conversation Filter, TCP, and then re-sort my numbers. So what was 423 — or 427, something like that — now that this conversation is in context with itself, that just became 611 milliseconds. In context, I have a single thread and I can see the real delays. But this trace file was between a web front end and a back-end database: you've got tons of connections and I'm looking for the slow ones. I don't want to just sort raw delta time, and I certainly don't want to Follow TCP Stream or set a conversation filter on every single one and then re-sort the delta — everyone agree that's a lot of manual work?

What I'd rather do is sort on "Time since previous frame in this TCP stream." This number — let me jump to the bottom here — tells me not the delta between this packet and whatever packet came before it in the capture, but the amount of time between packets within that stream. Now I don't have to filter; I can go looking for my pauses and delays directly. If you want to do this, first let me show you how, because I don't recall whether it's a default setting yet. Open up any TCP packet you have and look for TCP Timestamps — not the timestamp option; these are actual timers Wireshark shows you. If you do not see Timestamps at the bottom of the TCP header, right-click the TCP header, go to Protocol Preferences, and enable Calculate conversation timestamps. If you don't have that on, turn it on — it's a really great column. Activate those timestamps in your TCP preferences. Oops — did I accidentally uncheck it? I might have. OK, there, I've got my timestamps back. So now I can pick any TCP packet, come down to Timestamps, find "Time since previous frame in this TCP stream," right-click, Apply as Column. Now I have a column that gives me context-based pauses within my TCP streams, without filtering each one.

So now I can just sort on that column. Let me jump to the bottom and show you which ones to ignore — or actually, you might be able to tell me. First of all, we've got some resets up there: 125, 130 seconds into some of these connections they were being reset; it seems like they were just idle too long, and one side or the other said, "let's reset — shut it down, hang up the phone." After that we have some keepalives, then a few acknowledgments. And see the blue lines here — in my profile I paint my FINs blue; again, my way or the highway — and I see a bunch of 60-second TCP keepalives. So you're going to see a lot of stuff like this if you take a standard right-off-the-wire trace and sort on this column. Look for numbers that sit on boundaries — 45 seconds, 60 seconds, you just saw 130 — those are likely to be timers: keepalives, FINs, timeouts, that kind of thing. What I look for — let me show you exactly where my eye goes — let me scroll down to it; I'm a scroller, sorry, Laura's not here, don't tell her.
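If you'd rather do the same hunt from the command line, the field behind that column is tcp.time_delta, and it is only populated when the conversation-timestamps preference is on. A sketch — the file name is a placeholder, and I believe the preference key is tcp.calculate_timestamps:

```python
# Find the biggest in-stream gaps ("Time since previous frame in this TCP
# stream") without filtering each conversation by hand. "trace.pcapng" is a
# placeholder; tcp.calculate_timestamps enables the conversation timers.
import subprocess

cmd = ["tshark", "-r", "trace.pcapng",
       "-o", "tcp.calculate_timestamps:TRUE",
       "-Y", "tcp",
       "-T", "fields",
       "-e", "frame.number", "-e", "tcp.stream", "-e", "tcp.time_delta"]

gaps = []
for line in subprocess.run(cmd, capture_output=True, text=True).stdout.splitlines():
    frame, stream, delta = line.split("\t")
    if delta:
        gaps.append((float(delta), int(frame), int(stream)))

# Largest context-based pauses first -- the same thing as sorting the column.
for delta, frame, stream in sorted(gaps, reverse=True)[:20]:
    print(f"stream {stream:4d}  frame {frame:7d}  gap {delta:8.3f} s")
```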
OK, the type of thing I'm interested in looking at: notice here it says GET — I'm still sorted on my time column — GET request, GET request. That 30 seconds was just the client waiting to do something: client think time. The client did something 30 seconds into this stream and then sent again. I'm not going to focus a ton of attention on that right now, unless I start doing some more sorting and filtering. The complaint in the thread I was sent was that an alert was being raised by Oracle that it was taking too long to hear back on a read — a request was sent to the server and it was taking way too long to hear back from the database. So I'm not that interested in the requests; what I want to see is where we have a slow response. When I see 24 seconds and it's a response, that's one I want to look at. So: Conversation Filter, put this guy in context, go back to my frame number column and re-sort. SYN, SYN-ACK, ACK — wow, we're in the microseconds; the SYN-ACK came back in microseconds. We see a TNS request — request, data, request, data — happening pretty fast, ACKs coming back — and then boom, 24 full seconds before I see a response. That's exactly what the client was complaining about, so we were able to go in and see exactly which request it was, set a few filters, and ask: does this happen all the time, or is it sporadic? The way I initially found it — I know that looked like a blurry column, with all the different 90-second, 130-second values — but you'll start to get a feel for what you can overlook. For me, if I ever have a huge trace file and I just sort that column, I'm always looking for responses — an HTTP 200, any type of response from a server — because that means the delay was server-side: the actual application was taking that long to respond.

I've got one more case file I'd like to show you before we wrap — we're coming up on 2:30 — and hopefully this is all helpful and you can pick up some of my tricks for analyzing TCP. The next one: I had a problem where a client was trying to access — OK, the client brings up his little File Explorer and tries to open a certain file on an LDAP server. He couldn't access it, couldn't get to it, for about 20 to 30 seconds. If he sat there and waited long enough, the file would begin to transfer and he could actually pull it down, but there was that 20-to-30-second pause, and it happened every time they went to access a new file. This is the trace file. If we look at the SYN and SYN-ACK, right away, eyes over here: 86 milliseconds of round-trip time. If I come to the SYN and go down to the IP header, the time-to-live on the SYN is 128 — that means I'm capturing this SYN before it has been routed; that's a full time-to-live on the IP header. We get a response from the server with a TTL of 122, so — unless something in the middle is messing with that, which is possible — that server is probably six hops away.
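That hop estimate comes from comparing the received TTL against the nearest common initial TTL. A small sketch of the reasoning — the 64/128/255 initial-TTL guesses are the usual OS defaults, not something taken from this trace:

```python
# Estimate hop count from a received TTL by assuming the sender started at
# one of the common initial TTLs (64, 128, 255).

def estimate_hops(received_ttl: int) -> int:
    initial = min(t for t in (64, 128, 255) if t >= received_ttl)
    return initial - received_ttl

print(estimate_hops(128))   # 0 hops: captured before it was routed (the SYN)
print(estimate_hops(122))   # 6 hops: the server's SYN-ACK in this trace
```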
The final packet of the handshake — there's my acknowledgment going back to the server — and after that the client sends its 404-byte search request to the LDAP server. Then something weird happens. I get a response from the server, and it has data in it — 563 bytes of it. In fact, I'm going to come in here and add TCP Segment Length as a column — right-click, Apply as Column — so now we have the actual amount of encapsulated data within the packet, which right now is more valuable to me than the overall packet length. I'll remove Window Size for now, and this one too — I don't need it right now.

So: I send my request, I get a packet back, and Wireshark says, "whoa, whoa, buddy — TCP previous segment not captured" — and this is the first packet I got back from the server. Let me back up to my request. Looking at the sequence numbers in that packet, the client's search request starts at sequence number 1 and its next expected sequence number will be 351, so when the server responds, the ACK value in its header should be 351. Look at the packet coming back from the server: ACK 351 — sweet, he got my request, it got there. But look at his starting sequence number. This is the first packet I'm receiving from the server, and he's beginning at 1461. Whoa, hang on — a packet ago we were sequenced at 1; the handshake did that, that's the ghost byte; data starts at 1. He's telling me he begins at 1461. What does that sound like? What did he send before the one I'm looking at, and how much space is that? Before this 563-byte packet he sent a full-MSS packet — 1460 bytes. I never got that one, but I got the residual one. That's packet five: the client sends a dup ACK, and if we look inside that dup ACK and its options, it's a good opportunity to learn a little about SACK. The client says, "I'm going to ACK 1, because that's where I'm good to — that's where our handshake left us." But if I open the options and come down to the TCP SACK (selective acknowledgment) option: left edge 1461, right edge 2024. That's acknowledging the smaller packet. So the piece I'm missing is the delta between where I'm good to and the left edge: I'm good to 1, and I'm missing everything between 1 and 1461 — one full MSS right there. "Hey server, that's the packet I want. But that little residual 563-byte thing you sent — we're good there, don't resend that."
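The arithmetic the client is doing with that SACK block is worth writing out — a small sketch using the numbers from this trace:

```python
# The client's dup ACK in this trace: ACK = 1 (the next byte it still needs),
# plus a SACK block saying bytes 1461..2023 already arrived out of order.
# The hole is everything between the ACK and the SACK left edge.

ack_number = 1                 # client is only "good to" 1 (from the handshake)
sack_blocks = [(1461, 2024)]   # left edge inclusive, right edge exclusive

for left, right in sack_blocks:
    missing = left - ack_number
    print(f"Hole: bytes {ack_number}..{left - 1} "
          f"({missing} bytes, i.e. one full 1460-byte MSS segment)")
    print(f"Already received out of order: bytes {left}..{right - 1} "
          f"({right - left} bytes) -- don't resend these")
```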
Then what happens? I don't hear anything for 21 seconds. Sound kind of like what the guy was complaining about? Our goal is always to take the symptoms we hear and match them to our packets: he said he was waiting 20 seconds to begin a file transfer — there it is. But what caused it? Well, notice the size of the packets I see coming in after that: 536. Interesting. Right away I'm thinking: for those 21 seconds, that server was doing something on its side, and from the client's perspective I don't have enough information — let's go server side.

So we captured server side; let me get that trace up for you and check out what this guy was doing. Ignore this red for now. SYN, SYN-ACK, ACK, search request — got the packet in from the client, and the server sends a 1460 and a 563. Everyone with me? We're capturing from the server side now. So I send that 1460 out there, I send the 563, and then 93 milliseconds later I get this dup ACK saying, "hey, I missed the 1460." The server says OK — you see the 2.95 seconds on packet number eight? The server goes, "that's my retransmission timer; we haven't done a whole lot of talking yet, so I'm going to wait almost three seconds to retransmit this." That's the retransmission timeout at this point in the conversation. How come it didn't do a fast retransmission? What triggers a fast retransmit? A triple dup ACK — I have to get three dup ACKs, and that triggers a fast retransmit. I didn't send enough data to get three ACKs back, so I have to wait out the full TCP retransmission timeout, which at this point is about three seconds. I haven't even started slow start, I haven't done anything else — I'm at the very beginning, stuck with that horrible three-second retransmission timer you get at the start of a TCP conversation. The worst time to have packet loss.

So the server sends packet eight — here's the big one again — and gets nothing back. It doubles the retransmission timer: packet nine, six seconds later, another big packet. Nothing. Twelve seconds: "let's try this again, man — that client is not responding — I'm sending this big packet." Finally the server says, "OK, these big ones aren't working, so you know what I'm going to do" — the stack has the option, when in doubt, of giving the minimum MSS a shot. You see that 536 on packet 10? In the handshake, if an MSS had not been exchanged — if the MSS option weren't there — the minimum MSS that gets assumed is 536. So the server says, "these big ones aren't working; last-ditch effort, let's give this a shot." 536 — bam — an ACK comes back. Whoa. So it's the same data, just sent in smaller chunks. And notice, watching the TCP segment length, we never see it recover; it never goes bigger. From here on, the MSS internally on the server side is 536, and we never try again. Those back-offs, by the way — roughly three, then six, then twelve seconds — add up to the 21 seconds of silence the user was sitting through.
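Those timer values are worth adding up, because they explain the symptom almost exactly. A rough sketch using the retransmission timings visible in the server-side trace:

```python
# The server-side retransmission timeline from this trace: no fast retransmit
# (not enough dup ACKs), so the RTO fires and doubles on each attempt.

rto = 2.95                     # initial retransmission timeout seen on packet 8
waited = 0.0
for attempt in range(1, 4):    # three full-size retransmissions, all lost
    waited += rto
    print(f"retransmission {attempt}: waited {rto:.2f} s (total {waited:.2f} s)")
    rto *= 2                   # exponential backoff

print(f"~{waited:.0f} s of silence before the 536-byte fallback finally works")
```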
Now, what happened? Check this out. Look at the MSS on the SYN as it comes in on the server side: the SYN says 1460. The server sends its SYN-ACK with 1460 — there's our MSS. Now let's see what the client actually received. Client side again: the SYN goes out with MSS 1460 — "that's the biggest I can receive, Mr. Server" — and that 1460 actually got to the server intact. The server turns around with 1460 in its SYN-ACK, but we receive the SYN-ACK with 1432. He let it go at 1460; I received it at 1432. What happened? Something along the way — his gateway — actually reached up into that SYN-ACK and said, "no, we're going to ratchet that down a little bit, because I've got other stuff going on — WAN acceleration — I need some space, and 1460 doesn't give me any room in the headers to do any other fun stuff; it's a full payload." Do the math: 1460 plus a 20-byte TCP header is 1480, plus a 20-byte IP header is 1500 — you've just hit your MTU. So the server thinks 1460 is legit, and the client goes, "oh, 1432? OK, cool, that's all you can receive, so that's all I'll send." One side thinks it can send bigger segments than the other — and the server was on the wrong side of that deal. So those 21 seconds were the server going, "OK, these large ones aren't working; let's try a small one." And because it was so early in the TCP stream, the server hadn't had a chance to bring its retransmission timer down — it hadn't even entered slow start — whereas later in a connection that timer would have dropped from three seconds to much less once data actually started to move.

So what did we do? Well, the outbound router on the client side had recently been replaced, and — whoopsie — they forgot to configure something: it's a common command on routers and firewalls, you can adjust the TCP MSS at the router level if you know there are things in the path — WAN acceleration, other boxes that muck with stuff in the middle — that need header room. That router was supposed to reach up and say, "TCP MSS? Let's kick that down to 1432 and give me some header room for this other stuff." So we went onto that outbound router and adjusted it, and now, by the time the first SYN from the client arrives — the client lets it go at 1460, the router fixes it to 1432 — the server sees 1432 and goes, "1432? Let's just do that." It's not negotiated, exactly, but each side agrees: even if one side sends 1460 the other way, it still gets clamped down to 1432, so neither side will go above that upper limit of packet size. Once that happened, 1432 was the MSS, we had enough headroom in the middle for the other stuff the path had to do, and we were good to go.

All right, guys, that was just one of the things I've run into — I've got a lot of traces and a lot of fun stuff to work through, but time has run out. I had a great time today; I hope you did too and found something useful. If you have any questions, please come up to me — I'll be around for the rest of SharkFest, and I'd really like to get to know and meet more of you. So please come chat, and thanks for coming. [Applause]
Info
Channel: Chris Greer
Views: 46,345
Keywords: Wireshark, TCP/IP, analysis, slow network, slow application, troubleshooting, TCP, Transport, Protocol, Packet, TCP Handshake, wireshark training, wireshark tutorial, packet analysis, packet capture, tcp analysis, tcp connections, wireshark tutorial 2020, tcp basics, wireshark for beginners, packet pioneer, chris greer, sharkfest, wireshark course, how tcp works, how does tcp work, free wireshark training, free wireshark tutorial, network troubleshooting, tcp fundamentals
Id: NdvWI6RH1eo
Length: 72min 3sec (4323 seconds)
Published: Mon Sep 10 2018