SF19US - 21 Troubleshooting slow networks (Chris Greer)

Captions
A little history on me: my name is Chris Greer. I work for a company called Packet Pioneer. I do Wireshark training and consulting; people send me trace files and I try to help them get an answer to their problem. When I'm not doing that, you can find me blogging and speaking at events like SharkFest, just trying to help people make sense of packet captures. Packet capture can be pretty daunting and confusing, especially when we're getting started, so the way I teach, I try to give you as much hands-on as possible, because that's how most of us learn.

One of the biggest complaints I get called in for is troubleshooting slow networks. When's the last time your users called you up and said the network is running too fast? I don't get that call much (thankfully, because it keeps food on the table). So in this session we're going to talk about troubleshooting slow networks. If you were in the pre-conference class, this is exactly where we left off. If you weren't, no worries, as long as you've attended some of the other sessions on basic TCP filtering, looking at TCP conversations from more than one capture point, and maybe using the TCP stream graphs. If you have a handle on those functions, that's exactly where we're picking this topic up.

No doubt many of you are here because you get blamed for slowness on the network. There's a performance problem, people are getting spinning wheels, the progress bar isn't moving as fast as they expect, because you just installed some amazing new 10-gig link across the ocean and they figure going from one gig to ten gig should mean ten times the performance. Then something goes wrong and of course they blame the network. So in this session we're going to talk about the approach to troubleshooting those things, specifically when the problem is TCP-related and not the network, and how we can look into the different facets of TCP, not just to improve performance but to really get to the bottom of why things are slow.

Why are we still talking about slow? You'd think that by 2019, with 2020 in sight, we'd have this problem licked. But slow is still something we battle: not just disconnects and application drops, but slowness, performance problems. And this is in a day and age where we have fiber running everywhere and links at 10, 40, 100 gig and beyond. When it comes to capacity and bandwidth we have more than we've ever had, but things are still slow. We're going to talk about why that is in a lot of cases. We can't cover everything in an hour and a half, but there are some key things we can look for, especially in file transfers, that help us get to root cause.

One thing I want to hit right off the bat: when we're installing new connections and expanding the network, especially if we're coming from the network side of the house, we really want to validate that we're getting what we're paying for. Validate your network.
The last few times I've worked with people on slow file transfers in environments like that, resolving the problem didn't come down to packets; it came down to a few iperf sessions, and we found the device that was dropping packets. If we had just started with iperf and validated the network from the beginning, we might have found that link a lot more easily than by digging through packets first. It doesn't have to be iperf, which is an open-source throughput tool; just throughput-test your network. Know the capacity and bandwidth of your connections, especially in a new environment or a new area of the network you're bringing up. Don't assume that because you plug in a link and a 10-gig light comes on, you're getting 10 gig. Stress-test it. How to do that is a whole conversation about tools; there are hardware analyzers that do it, and there's iperf, which I mentioned, but it's definitely something we want to do.

That's just best practice for us as network engineers, because approaching things at the packet level can take a long time: getting the packets, opening the trace files, understanding them, walking through TCP conversations and TCP graphs, only to find out we just had some packet loss on a link that we probably could have found another way. So let's make sure to test the speeds and feeds, the highways and byways, and validate that we're getting the kind of bandwidth and throughput we expect.

Now, what causes networks to be slow? We're going to focus on three things today: packet loss or congestion (those two go hand in hand), TCP protocol behavior (we'll definitely spend time there), and, toward the end if we have time, some examples of chatty applications. Packet loss and congestion: as network people we hopefully have a good idea what that means, literally losing packets somewhere along a path between A and B, or traffic increasing on a certain link at a certain point in the day and squeezing the competing traffic on that link. TCP protocol behavior: we're going to talk about how TCP works with windowing, how it decides how much to send or receive and how much should be out on the wire, and give you some practice with it.

So let's get into it. First, like I just said, make sure you're getting what you're paying for and test your network links. iperf is one open-source example; we may have to run several sessions of it to really generate the throughput we need to test a link. It sounds simple, but make sure we have a good roadway before we start throwing packets across it. And watch for signs of packet loss; the tool itself should tell us, "I was able to achieve 900 megabits per second out of one gig, and I had this percentage of packet loss."
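The talk doesn't show any tooling for this, but as a rough sketch of the kind of scripted check being described, here is a minimal Python wrapper around iperf3. It assumes iperf3 is installed on both ends, a server is already running with `iperf3 -s` on the target, and the address 192.0.2.10 is just a placeholder; the JSON field names come from iperf3's `-J` output as I recall them, so verify them against your version.

```python
# Minimal sketch: drive iperf3 from Python and report achieved throughput.
# Assumes iperf3 is installed locally and `iperf3 -s` is running on the target host.
import json
import subprocess

def measure_throughput(server: str, seconds: int = 10, streams: int = 4) -> float:
    """Run iperf3 against `server` and return achieved Mbit/s on the receiving side."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-P", str(streams), "-J"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    report = json.loads(result.stdout)
    # Field names per iperf3's JSON report; check them against your iperf3 version.
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    retransmits = report["end"]["sum_sent"].get("retransmits", "n/a")
    print(f"retransmitted segments during the test: {retransmits}")
    return bits_per_second / 1e6

if __name__ == "__main__":
    print(f"achieved ~{measure_throughput('192.0.2.10'):.0f} Mbit/s")
```

If the number comes back well under the provisioned rate, or retransmits pile up, that points back at the path before you ever open a trace file.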
Now, what does packet loss look like as we go across the network? We want management-level access to the infrastructure devices along the path, and the things we look for are link-level errors: FCS errors, CRC errors, discards, things like that. In our trace files we look for retransmissions, duplicate ACKs, out-of-orders; those symptoms lead us back to the network to go hunt for the link-level errors. There are definitely tools out there that will tell you, "hey, that switch in that closet on port 3 is taking FCS errors, go look there," so hopefully you have some network management and monitoring that can help you find these things. But those are the signs of loss: watch the discards, the link-level errors, and so on.

Now let's actually start having some fun. In a perfect world, when it comes to network performance and behavior, we have no network limits, no receive limits, and no send limits. The client says, "hey server, I want this file." The server says, "great, here's your file," and there are no limitations whatsoever: the server can send as fast as it wants, the network can handle all the traffic put across it, and the receiver can handle any amount of ingress traffic. That's our perfect world, and that's when you and I start losing our jobs because things are running too well. As we know, this isn't reality. The network has limits, the server has a limit on how much it can send, and the receiver has a limit on how much it can receive. The question is: which one of these is hurting you, or is it a combination? That's what I want to teach you how to find. Specifically, I want you to walk out with a good idea of how to figure out which of these three is limiting you, and this is after you've already stress-tested your network, patched the holes in the road, shaved down the speed bumps, and made sure you don't have a lot of discards and errors. Our goal here is to find out which of these three is really causing the issue.

Like we said, the network has limits. Between two endpoints we could have a very small amount of bandwidth or a large amount, and the thing is, those two endpoints don't know what's in the middle. Start with that concept. You're a client, I'm a server; you ask me for a file, I pull it up and get ready to send it to you. TCP does not know what's in the middle. When I first start sending you something, figuring that out is part of its job: how much can I send you without loss, without causing my own congestion? At the very beginning it doesn't know, so I'm not going to assume we have a 10-gig, low-latency connection and just firehose traffic across the link.

What would be some downsides to a server doing that? Say a server is 10-gig attached, you ask me for a file, and I literally blow the doors open on that 10-gig connection as I send it. What could go wrong? What if the other side of that 10 gig steps down to a 1-gig link going somewhere else? There's a choke point. Any other thoughts? Right, competition for the link. That server isn't just serving me; it's doing other things for other services and other users.
So it's not going to say, "okay, whole hog, you get the entire network connection, we're the only ones here, I'll just slam this thing and assume that 10 gig is our weakest link." It isn't. At first the server doesn't know what kind of link we have between us.

So let's illustrate this whole concept: TCP windows, receive window, congestion window, send limitations, network capacity. Here's one way I thought of to picture it. You have an empty swimming pool, I've got a full swimming pool, and you just asked me for the water. The water represents our data. What we have between us is a hundred feet of three-quarter-inch garden hose. Let's start with this number: how much can that hose hold? If you put your hand over one end and I start pouring water in the other, how much water do I have to pour in before you feel it come up against your hand? Some people can do that math in their heads; I went to Mr. Google, and he tells me a hundred feet of three-quarter-inch hose holds about two gallons of water. (That could be wrong, so don't go test it or take it as gospel, but it seemed reasonable to me.) Right there is our network capacity: not three gallons, not one; I can have two gallons of water in that hose at once. Let's picture those as two milk jugs, for those of you on the metric system like the rest of the world. That's our network capacity; that's what we have to work with. But at the beginning I don't know that we have two gallons of capacity. All I know is there's a link between us; I don't know if it's a really, really long drinking straw or a huge city pipe you could walk through. Think a T1 versus a 10-gig connection. I don't know which one it is.

We also have something called send and receive windows. You've heard of these; you've probably done some analysis on TCP connections before. Let's illustrate them quickly. To fill your pool, you're going to catch the water from the hose in a glass and pour it into the pool. You can't receive more than that glass at once. If a gallon comes out of the hose and into a pint glass, what happens? It overflows. I can only carry a pint at a time, so it's going to take a while: fill the pint glass at the hose, pour it in the pool, fill it up, pour it in. I can only receive one pint glass at a time. And on the sender's end, because there's a pint glass over on that other side, the sender can't send any more than you can receive, no matter what the network can hold. The sender cannot send more than that pint glass, period. There's your send window. So whichever of these is the smallest is what controls throughput.
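To put rough numbers on the pint-glass idea: at any instant the sender can have at most the smaller of the two windows unacknowledged, so over one round trip the best case is roughly that window divided by the RTT, capped by the link rate. A small illustration (the figures below are made up, not from the exercise traces):

```python
def throughput_ceiling_mbps(recv_window_bytes: int,
                            send_window_bytes: int,
                            rtt_seconds: float,
                            link_mbps: float) -> float:
    """Rough best-case throughput: the smaller window once per RTT, capped by the link."""
    window = min(recv_window_bytes, send_window_bytes)   # the "pint glass"
    window_limited = (window * 8) / rtt_seconds / 1e6    # Mbit/s if we only ever wait on ACKs
    return min(window_limited, link_mbps)

# Example: a 64 KB window on a 1 Gbit/s path with 20 ms round-trip time
print(throughput_ceiling_mbps(65_535, 1_000_000, 0.020, 1000.0))  # ~26 Mbit/s, nowhere near 1 Gbit/s
```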
Now let's leave this picture alone for a second. If we left it just like this and didn't change anything, are we making full use of the network? Not yet; we're only sending a pint glass through it, and it can handle two gallons. So a lot of the time when we're troubleshooting slow file transfers, this is the first symptom I want you to watch for immediately: are we using the network to its capacity, or is TCP, either with the send window or the receive window, not sending what our potential is? We're going to talk about that potential, the bandwidth-delay product, and the TCP receive window in just a minute. Everybody good? Questions at this point?

Now, not long after a TCP file transfer starts, with modern stacks, we begin with the pint glass; that's TCP slow start. But many modern applications and stacks will quickly say, "okay, I was joking about that whole pint-glass idea, I'm really going to give you something to work with, Mr. Server: let's increase this to five gallons. Forget the single pint glass, I can take five gallons from you, send away." So it's not uncommon, not long after a connection begins, to see the receiver blow its receive window up to the sky and say, "I can handle a ton right now." If that's true, if the receive window really could take that much water, would our network carry it? No, the network would be over-congested if we sent that much. Everyone agree? The network can't handle that much. But my client doesn't know yet what the network can do, so it just says, "go ahead and send me everything you've got."

The server says, "all right, that's cool, I know you can do five gallons, you're a rock star, but we still don't know what this network can do. So let's move up from the pint glass and go to one gallon." I increase what I send you. I send a gallon: can the network capacity hold that? We're okay, it's less than two gallons. That gallon pours into the bucket on the other side, you dump it into your pool, you fill your bucket again, and we're still okay. That went so well; you told me you received everything I sent, there was no loss, not even a drop of water hit the floor. So the server says, "great, let's try two gallons, let's double it, turn up the volume a little." Now we're at two gallons of send, and this is where we hit the limit of the network: our congestion point.

Many stacks these days won't stop there, though, because the sender doesn't yet know that we've hit our head on the ceiling. What's the symptom that we've hit our head on the ceiling and we're starting to lose a few drops here and there? If I'm in a trace file, what am I going to look for? Yep: packet loss. If I shove three gallons down that pipe at once and say, "I know you can only do two but you'll take three," what happens? The pipe bursts and water spills out. A lot of it still gets to the other side, but I have packet loss. That's my first symptom that I've just stretched things too far.
So, that packet loss, or congestion, depending on the TCP algorithm in use. If you went to Vladimir's session yesterday (I think he's repeating it today or tomorrow), that's all about the congestion window algorithms, how they work, and how some are loss-based and some are congestion-based. The bottom line is: once I sense that I've tipped over what the network is comfortable with, I say, "my bad, whoops, let me back off," and on the sending side I reduce my send window.

The other thing to keep in mind is that at any one moment, any millisecond, any microsecond, the network can go from being able to handle two gallons at once down to a pint glass. Thinking as network people, what can cause that? Right: competing traffic. We're not the only ones using this pipe; it isn't a dedicated line between our swimming pools. All your neighbors have swimming pools too, and they're using the same pipe. So our network capacity doesn't stay at exactly two gallons, even after TCP has figured out that two gallons is rock-star awesome. The capacity can change, and most of the time the cause is contending traffic, congestion, so we only get to use what's left over. Depending on the TCP send algorithm in place, that determines how aggressively we behave toward the competing traffic.

Now let's talk numbers for a minute. These are numbers that, as a network analyst, you want to know about your network. If you've never worked out a bandwidth-delay product before, let's do it. We've got our hose and our two gallons of capacity; let's turn that into figures. Say you have a one-gigabit-per-second link end to end. You've shot it with iperf, so you know it's a gig, not 445 megabits, exactly one gig, and you can sustain that utilization without much loss. And say we have a round-trip time of 150 milliseconds. What we want to work out is how much data I can actually have on the wire, and not just in one direction: I want you to be receiving my data and acknowledging it, and I want your acknowledgments arriving before I stop sending, so there are no gaps in my send. Take one gigabit per second and multiply it by 150 milliseconds; that gives us 150 megabits. Then change that to bytes, big B, because we're talking about an amount of data, not just a rate. For a one-gig link with 150 milliseconds of latency, I should be able to have 18.75 megabytes in flight at once. That's my golden number, the amount of traffic I want to be able to sustain. There's our bandwidth-delay product; that's our two gallons.

When the TCP connection first starts, we don't know that number. So let's say the receiver gives us a nice conservative receive window, say 65,535 bytes, and the send window starts small, say 8K, because I don't want to overrun my network.
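The bandwidth-delay product arithmetic above is simple enough to put in a one-line helper; this is just my illustration of the same calculation, using the 1 Gbit/s and 150 ms figures from the example:

```python
def bdp_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: how many bytes fit 'in the hose' at once."""
    return link_bps * rtt_seconds / 8  # bits on the wire during one RTT, converted to bytes

# 1 Gbit/s link with a 150 ms round-trip time
bdp = bdp_bytes(1e9, 0.150)
print(f"{bdp / 1e6:.2f} MB need to be in flight to keep the pipe full")  # 18.75 MB
```

If the receive window (or the sender's window) is smaller than that number, the window is the ceiling, not the link.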
Now let's go ahead and have you open up exercise number one, and let me tell you a little about these trace files. We have three sets: exercises one and two go together, three and four go together, and five and six go together. Exercise one is the client-side capture of a file transfer; exercise two is the server-side capture of that same transfer. What I want you to do is get comfortable looking at trace files from different perspectives, both the client and the server. These were all taken in a demo environment and were gifted to us by Sake, so I really appreciate that he did this for us; he had them all queued up and said, "hey, use these traces."

If you were in the weekend course, you know exactly where the instructions are to start this exercise. If you weren't, just go up to Statistics, then Capture File Properties, and you'll see some questions in the capture file comments, leading questions that guide you through the trace file. That's where you'll find the questions to answer. One quick disclaimer as well: Sake gave us these trace files, but there's a share license in the comments at the bottom. These are his property, so nobody go make a million dollars teaching with them; please just use them for your own personal use. I did get permission to add the questions and change the file names.

So here's what I'd like you to do now that we're back (the power went off right on time). I'm going to give you a little time, and afterward we'll walk through the example and I'll show you some of the tricks you can use in Wireshark to do this kind of analysis and find these issues a little faster. This is just a simple file transfer. Let me blow this up for you. You'll notice that if you go up here and hit the "Bad TCP" button, there are no retransmissions in this trace file, which is good; that just applied our tcp.analysis.flags filter, excluding window updates, and things look clean from that perspective. However, this was a slow file transfer. Our network capacity, he told me, was set to a hundred megabits per second, and I believe this was a 1 MB file he was moving across that 100-meg link, so we can do the math on how long it should have taken.

Here's my question to you: on the client side, look through it. What do you think is the root cause? Which bucket would you blame in this trace file? Once you've come to a conclusion, open up the server-side trace and validate whether you got it right, or see whether you get any further information from the server side. I'm going to give you five to ten minutes and rove around; if you have questions or get stuck, raise your hand and let me know.
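While you work through it, here is one hedged way to run the same kind of "Bad TCP" check from the command line instead of the button. The display-filter fields are standard Wireshark analysis fields; the capture file name is just a placeholder, and this is my sketch rather than anything shown in the session.

```python
# Count "Bad TCP"-style events in a capture using tshark display filters.
# Assumes tshark is on the PATH; 'exercise1.pcapng' is a placeholder name.
import subprocess

FILTERS = {
    "retransmissions": "tcp.analysis.retransmission",
    "duplicate ACKs":  "tcp.analysis.duplicate_ack",
    "out-of-order":    "tcp.analysis.out_of_order",
    "zero window":     "tcp.analysis.zero_window",
}

def count_matches(capture: str, display_filter: str) -> int:
    out = subprocess.run(
        ["tshark", "-r", capture, "-Y", display_filter,
         "-T", "fields", "-e", "frame.number"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

for label, display_filter in FILTERS.items():
    print(f"{label:15s}: {count_matches('exercise1.pcapng', display_filter)}")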
All right, let's check this one out. First, how about the throughput: what did you measure? Are we happy with it? There's a reason we got called in to troubleshoot; it was slow. There are a couple of places we can look. One quick-and-dirty one is right there under Capture File Properties: what's the run-of-the-mill average bits per second? It says about 10 megabits; everybody see that? That's a spot we can sanity-check. Where else can we measure throughput? I just heard a gentleman in the back say the throughput graph; did anybody use that? Let's give that a shot.

Now, this is a quick transfer, as you can see; the whole trace is about a second, so take this graph with a bit of a grain of salt. Statistics, TCP Stream Graphs, Throughput. One other thing: I selected a packet going in the direction I wanted to analyze. If I had selected one of the acknowledgments, this graph wouldn't look the same. If you go to this graph and it looks like nothing, it could be that you picked a packet going in the ACK direction rather than the data direction; if your graph looks weird, first try hitting Switch Direction, and make sure the server is the side that's actually sending the data.

On this graph we have segment length, the size of each data segment coming across. Sometimes you'll see this jumping all over the place, which is interesting: if the data size is consistently small, maybe something in the middle is breaking segments up, or there's an MTU issue somewhere, or the method being used to transmit the data is doing that. That's a symptom. Here, though, we have large packets. We can also see we're under a second; if this were a longer trace we'd get a nice curve that ramps up for a while and comes back down. The maximum point on this graph says just over 8 × 10^6. What does that mean? 10^6 is a megabit, so multiply by eight and we get about 8 megabits per second. Remember what I told you at the start: this is a hundred-meg circuit, and we used 8 meg of it for this file transfer. That's one thing to think about.

So we've measured the throughput; what other interesting things did you see? How about the handshake? The handshake shows our round-trip time, which is 10 milliseconds, and, did everybody catch it, the window scaling option is missing. Let's go into the SYN and look down at the options at the bottom: we have MSS, the maximum segment size; we have SACK permitted; and, like was mentioned in the back, we do not have window scaling. What is window scaling again? Instead of working with a pint-glass-sized field, we can say "give me five gallons": we take the receive window, give it a multiplier, and make it much larger.
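That multiplier is the TCP window scale option: the SYN carries a shift count, and the real (calculated) window is the advertised 16-bit value shifted left by that count. A quick illustration with made-up values, alongside the roughly 15,928-byte window seen in this exercise:

```python
from typing import Optional

def calculated_window(raw_window: int, scale_shift: Optional[int]) -> int:
    """Wireshark's 'calculated window size': the raw 16-bit field shifted by the scale option."""
    if scale_shift is None:          # no window scale option agreed in the handshake
        return raw_window            # capped at 65,535 bytes, like exercise 1
    return raw_window << scale_shift

print(calculated_window(15_928, None))  # 15,928 bytes: the pint glass
print(calculated_window(512, 8))        # 131,072 bytes with a x256 scale factor
```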
So that option is missing; the client is saying, "I'm not doing that whole window-scaling thing, forget it," which means the number you see in my receive window field is the real value. In fact, one gentleman told me he saw the handshake, went into the TCP preferences, and hard-set a window scale factor of one, and Wireshark used it; a hard-set scale factor configured in Wireshark will override what's in the handshake. To be honest I've never had to do that, but that's pretty cool.

So we're not using a window scale. Let's come down and look at the calculated window size, which here will also be the real value; I just have Calculated Window Size as a column. Does this ever go down? Does it ever hit zero? No. But right away, the fact that we're not using window scaling and this number never gets out of the teens, 15K, 16K: what was the largest you saw? Pretty low. In this direction the biggest I see is 15,928. If that's the largest window we can receive, there's your pint glass.

Our network was a hundred megabits per second and our round-trip time was 10 milliseconds, so how much can our garden hose hold in this case? Take 100 megabits per second, multiply by 10 milliseconds, and divide by 8 to get bytes; that's our bandwidth-delay product. I don't have a calculator in front of me, but it works out to roughly 125 kilobytes. So right away, just from that value, we can determine even from the client side that our receive window is way too small. 15K? Nah.

Now, it's interesting to look at this trace not just from the client side; let's pop open the server-side trace too. Actually, before we do, let me show you one other thing. Here's a data packet; I'll come up to Statistics, Stream Graphs, TCP trace, and zoom in. From the client side we can see data coming in: when this line starts to go vertical, that's data on its way to me. The green line above it is the graph of the receive window. I always want that green line to be above the data; that's the space I have to work with. Those vertical marks are actually little packets; zoom in far enough and you'll see they're little I-beams. I never want them to touch the green line, because that means I've run out of receive window: my receiver with the pint glass is already full. He tells me, "stop sending water, my pint glass is full; I know it's small, but it's still full." I never want to see those lines converge.

Now let's open up the server-side trace and see if that's what happened. Okay, talk to me. Just doing a quick scroll, what do we start to see right away on the server side that we did not see on the client side? Window full. First, let's read what that means. This is on a packet sent by the server, and Wireshark is tagging it as "TCP window full." Does that mean the server is doing something wrong?
No; the client just can't handle any more. When we first see those black lines with red letters they look scary, and I've been asked, "here's my trace file, why is my server's window all out of whack?" It has nothing to do with the server. It just means that the packet being sent right now fills the amount of capacity we have to work with, which is the receive window.

Let's see how we get there and work up to that point. I've got my handshake, I've got a GET, I've got an OK, and then the server launches three big packets; those are in flight, on their way to the client. I have a column here, Bytes in Flight, a very nice column to have; if you don't have one on a profile, I highly suggest it for this kind of analysis. Bytes in flight means: this is how much data I have outstanding that has not been acknowledged yet. It's a good way to approximate how much the server is actually able to put out there. You can find it under TCP, SEQ/ACK analysis, Bytes in flight. If you don't see it, it could be that you just have to right-click the TCP layer, go to Protocol Preferences, and make sure "Track number of bytes in flight" is selected.

So the server has put three packets into the garden hose, on their way to the other side. We get an acknowledgment from the other side after 10 milliseconds, and it continues to send. Then we come down here, and this is where we start to run into problems. The calculated window size from the other side is 11,584; that's the receive window. My outstanding bytes in flight, once this packet hits the wire, will be 11,584. This packet fills the other side; it fills up his pint glass. I can't send him anything else until his acknowledgments start coming in, letting me know he's pouring water out of that pint glass into the pool, and I can only send what he has poured out. This stops me from sending: I send that packet, and then I have to wait.

To help us visualize this, let's go back to the stream graphs on the server-side trace file. This is a simple concept, but I like using traces like this, which are easy to break down, to help us better understand the stream graphs. Everybody open TCP Stream Graphs, TCP trace, and zoom in. What do we see here that's different from how the client side looked? Nice job: the packets, my little I-beams, are right up against the green line. Remember what I said: you never want your data to touch that green line. The green line is your ceiling, "I can only receive this much." If my data goes up and touches it, there's no space left between what I'm sending and what he can receive. If you have a server-side trace and you're troubleshooting an issue like this, open up your TCP stream graph and check whether the data ever goes up and touches that green line.
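If you want the same bytes-in-flight versus window comparison as raw numbers instead of columns, one hedged option (my own sketch, not something from the session) is to have tshark dump the analysis fields. The field names are standard Wireshark ones; the capture name is a placeholder, and the bytes-in-flight field only appears when the preference mentioned above is enabled (it is by default).

```python
# Dump per-packet bytes-in-flight and advertised window values with tshark.
# Note: tcp.analysis.bytes_in_flight shows up on data-carrying segments, while the
# window you compare it against is the one advertised on the ACKs coming back.
import subprocess

cmd = [
    "tshark", "-r", "exercise2.pcapng",   # placeholder capture name
    "-Y", "tcp",
    "-T", "fields",
    "-e", "frame.number",
    "-e", "ip.src",
    "-e", "tcp.analysis.bytes_in_flight",
    "-e", "tcp.window_size",              # calculated (scaled) window of this packet's sender
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
for line in out.splitlines():
    frame, src, in_flight, window = (line.split("\t") + [""] * 4)[:4]
    print(f"frame {frame:>6}  {src:>15}  in-flight {in_flight:>8}  window {window:>8}")
```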
If it does, then my receive window is giving me a throughput ceiling. My data can't blow through that ceiling. Why? Because there's a pint glass on the other side; until he tells me, "I've got a five-gallon bucket now, go ahead and send more data," I can't put more than that pint glass out on the hose. Fundamentally this was a simpler issue, but it helps us understand these stream graphs: this one was a receive-side window limitation. It was not using window scaling, and that restricted our throughput to eight megabits per second instead of the hundred meg we'd hope to expect on a hundred-meg link.

One other graph: let's reset and come down to Window Scaling, which is a little more interesting when we actually have a scale. These blue lines are segments, and this other line is bytes out, bytes in flight; we can see it crawl up, hit the green line, and stay there. Our throughput ceiling is bound by the TCP receive window. I'm going to bring you back to this graph a few times. That's just one example of a throughput issue like this. Before I move off this example, any questions? Those first ones were a warm-up to get you used to these windows, these buckets, these stream graphs.

Now let's look at another example: go ahead and open up exercises 3 and 4. This one is a little different. Get in there and take a look at what you see. Again, what are we troubleshooting? Low throughput, low bandwidth usage in these kinds of file transfers; people are absolutely going to come up to you and say, "the network is slow, what the heck is going on?" Start with exercise 3; there are questions in the capture file comments that can help guide you, and once you finish with 3 we'll look at 4 and compare the two. I'm going to rove around if you have any questions.

Okay, everybody, let's chat. This was the client side. What I'm interested in, right out of the gate, is a quick scroll to see how data is moving, and then Statistics, Stream Graphs, Throughput. What level does the throughput climb to and stick at? Someone said 48 meg. Right: if 10^6 is one megabit, add a zero to get 10^7, that's 10 megabits; multiply by 4.8 and we're at about 48 megabits per second. Are we happy with that on a hundred-meg link? It depends. It's not what I call screen-punching slow, but it gives us an idea of how much of the capacity we're actually using.

So let's look at what limiters, if any, there are on this one. This is on the client, so on the time/sequence side of things, let me zoom in a little. A lot of the time, one of the things that slows us down, one of the things you want to look for in the time/sequence graph, the Stevens graph, any of them, is what I call stepping: stuff, then nothing, then stuff, then nothing. That nothing is where you're losing time. Think of those as air gaps in the garden hose.
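One hedged way to hunt for that stepping without the graph is to have tshark print the time since the previous displayed packet in the data direction and flag the big gaps. This is just my sketch: the capture name, server address, and 10 ms threshold are placeholders, and the fields used are standard Wireshark ones.

```python
# Flag idle gaps ("air gaps in the hose") between data segments from the sending side.
import subprocess

CAPTURE = "exercise3.pcapng"     # placeholder file name
SERVER = "192.0.2.1"             # placeholder: the address of the sending side
THRESHOLD = 0.010                # flag anything quieter than 10 ms

cmd = [
    "tshark", "-r", CAPTURE,
    "-Y", f"ip.src == {SERVER} && tcp.len > 0",
    "-T", "fields", "-e", "frame.number", "-e", "frame.time_delta_displayed",
]
out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
for line in out.splitlines():
    frame, delta = line.split("\t")
    if float(delta) > THRESHOLD:
        print(f"frame {frame}: {float(delta) * 1000:.1f} ms of dead air before this segment")
```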
There's water moving through the hose, but I can't send enough at once to fill the whole thing; I put a pint glass of water in and it moves along, but there's a big air gap behind it, a period of time when I'm not sending anything. So we're not using the throughput as efficiently as we possibly could; we still have those delays. Real quick, what we can do is come over here and look at the gaps. One nice thing about these graphs is the little circle cursor; we can click on a very specific area. If I click right here, I see 12 milliseconds; go to the next gap, 12 milliseconds; the next one, 12 milliseconds. So that space in time where I'm not transmitting is about 12 milliseconds each time.

Let's think about that for a minute, and let's actually switch to the server side, see how this graph changes, and see what we can learn from that perspective. Close this down and open the server trace. Here's the server's perspective. We have our SYN, SYN-ACK, ACK. Just curious: are we using a window scale on this one? Where do we look for that? Right, the handshake. A quick look at the options: oh, that was on the ACK, not the SYN; let me pick the SYN. MSS, SACK permitted, timestamps, a NOP just to pad out the header (some packing peanuts in there), and a window scale of ×8. That's the client side. The server side comes back with a window scale of ×16.

Just curious: have you ever seen a window scale of zero? You might. A window scale of zero just means, "you can go ahead and use window scaling on your side; I'm just not going to make a big deal of it, because you're not sending me much, I'm sending you the data, so I'm not going to dedicate a bunch of resources to receiving yours." Okay, moving on.

Oh, yep, that's exactly what it means:
"that window-scale thing you want to do, Mr. Client? I don't even know what you're talking about," so I won't send you any more than your actual advertised receive window. His question, just for the recording, was about the window scale option: yes, we need to see that option in both SYNs, both sides have to advertise that they support it, for it to be used. That's one reason a server might not use an actual scale factor but will still include the window scale option: the server is saying, "I need a big window when you're sending me data; if I'm the one sending, I don't need to tie up that resource just to keep the connection open."

Question: is SACK the same, do we both have to agree in the handshake? Well, always be careful with the word "negotiated." It's not negotiated; it's an advertisement of what you can support. If you don't support SACK, or don't know what it is, then I can't do it either. Same thing with window scaling: if you don't know what window scaling is, we're not using that option on this connection, so I can't use it either. We can have differing MSSes, though; that's just an advertisement of the payload size each side can receive. We can disagree on that; we don't negotiate it. Speaking of TCP options, a quick trivia question: if you send me a SYN with no options at all, not even MSS, what's the maximum size segment I should send you? Patrick knows: 536. That's exactly it; that's the when-in-doubt MSS. Now that I've said that and it's been recorded, you can probably find a stack out there that breaks the rule and sends bigger segments anyway. But yes, a lot of these options are advertisements of what can be used throughout the connection.

All right. In this example I'll just pick a data packet, go up to Statistics, come down to Stream Graphs, and start with TCP trace, just to have fun, and zoom in. What behavior do we see in this trace? Stepping again. And what do we never want to see, especially in the server-side capture? We're touching that green line. The green line represents the receive window: from the client's point of view, the amount of space above the current sequence number that it has in its TCP receive window. So just from the TCP trace graph, not only do I have the stepping, those pauses, but the reason for the pauses is that I send and then have to stop because I've touched that receive window. In fact, as soon as the client sends these acknowledgments, it's increasing its receive window, or rather the receive window is giving me space to transmit, even one packet at a time. Those ACKs come in and let me know I have a little bit of space, a little bit of space, a little bit of space, and the server says, "great: here's data, here's data, here's data," then we suffer a round trip, then here's some more data. We're hitting our heads on that receive window on that side.

We can see where this starts to take shape if we scroll down a little. Here are our bytes in flight and the receive window; look right around packet 331.
On the sender's side I can see I've got 123,243 bytes in flight, and the receiver is advertising a total window of 124,328; that's how much can be outstanding. Now, if those two numbers are ever equal, that means my bytes in flight have hit the ceiling of the receive window. In this case, though, I don't see those black lines with red letters saying "TCP window full." Why not? On the last trace we saw window full, window full, window full; how come we don't see it here?

What's the difference between these two numbers? Take 124,328 and subtract 123,243; someone do that real quick. About 1,100 bytes. So the client is basically saying, "I've only got about 1,100 bytes left before I hit the top of my window. I'm almost full; the pint glass has a shot glass worth of space left, and that's all I can receive." The server says, "okay, great, then all I can send you is a shot glass at a time." But the reason we don't see that black line is that we never completely fill the window. The server wants to be efficient: you can see all of these packets are full maximum-segment-size packets, 1,448 bytes. At no point does the server say, "well, that's all the space you've got, let me drop below the MSS and send something smaller just to completely fill you up." That's not unusual behavior: if a server has a lot of data to send, it would rather keep sending full-MSS segments than stop and hand you a fraction of a packet. (As soon as I say that, you'll find a server that behaves differently, but a lot of them want to use the full MSS.) That's why we're not completely filling the window; we're going as full as we can with full packets, and the math just doesn't land exactly on that line.

There's a question, and it's a great one. The question was: is this telling us the scale factor isn't big enough, or that the client isn't fast enough to use a larger window? I'd have to go see exactly what the application and operating system were, but in this case that's exactly it: the client simply wasn't using a window large enough to fill the network. It could have been an old, crummy stack that needed updating; it could have been an application overriding the OS stack and saying, "use this setting when you use this application." But this was definitely receive-side: the window size simply wasn't big enough. We've since seen features like receive window auto-tuning in Windows come out to address exactly this, to give the client the ability to go much higher with its receive window.
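The "shot glass of space" arithmetic is just the advertised window minus the bytes already in flight, and the reason Wireshark doesn't flag window-full here is that the leftover space is smaller than one full MSS, so the sender stops one segment short. A tiny illustration with round numbers of my own, not the exact values from the exercise:

```python
def remaining_space(recv_window: int, bytes_in_flight: int, mss: int) -> None:
    space = recv_window - bytes_in_flight
    full_segments = space // mss
    print(f"{space} bytes of window left: room for {full_segments} more full-MSS segment(s)")
    if full_segments == 0:
        print("sender holds off rather than send a runt; no 'window full' flag, but still stalled")

# Hypothetical values in the same ballpark as the exercise trace
remaining_space(recv_window=124_000, bytes_in_flight=122_900, mss=1_448)
```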
All right, I've got one more example for you before we see what happens: open up exercises 5 and 6. While you're working through them, I'll take you through it. This is exactly the same kind of scenario, a client and a server, but this time you have exactly 0.1% packet loss. You heard me: 0.1%. Go ahead and open the trace; you can start on the client side, take a look at where those retransmissions happen, and watch TCP try to recover. While you have that open, I just want to talk a little about the send window when we see loss. I'm going to go straight to the server side of this one, exercise 6; you can follow along, and I'll talk us through it since we're running short on time. What I'm looking at here is my bytes in flight and my window size.

Our window sizes start small, but let your eye follow that window size column: it's climbing on the client, 95K, 101K, 121K, 121K, 121K. The server, meanwhile, at first only wants to put out about 21,720 bytes at a time before it sees more ACKs come in; it still has a ton of space to work with, so it's not worried. What I'm interested in is how big that receive window gets, and how far the server pushes its bytes in flight, before we see our black lines with red letters. You can see over in the intelligent scroll bar that I'm about to run into some packet loss; this is where we hit our head on the network's throughput capability. I either hit congestion, or an FCS error, or some other link-level error, and I got a retransmission.

So let's look at how healthy the throughput looks before that loss and after it. I'm interested in the receive window coming up on that loss. Whoops, I scrolled right past it. Okay, here we are, right before it: we have some dup ACKs, a point of loss. The client is coming back and saying, "hey buddy, there's a gap in the sequence numbers," and it's doing its duplicate-ACK thing with selective acknowledgments, ACKing left edge, right edge, letting that grow, while telling the server, "back here is the gap in sequence numbers I need you to retransmit." Right before that period of loss, the receive window on the client side is 121,342, and the bytes in flight from the server's perspective hit about 40K; we had 40K outstanding. That's how much it had put out before we saw something happen.

Now let's see where this clears up. We get our retransmission; at that point we had a total of about 57K outstanding. Notice what happens after the point of loss; this is the server-side bytes in flight again. We were up in the 40s and 50s, we hit our head on some kind of ceiling, and the server says, "my bad, didn't mean to go that fast; I thought we had a 10-gig network, but maybe there's a drinking straw between us. Let me reduce the congestion window." It looks like it cut it roughly in half: it was at 40-something, it sensed loss, and it went down to about 20. Right there we pulled the brakes.

After that point of loss is where things get interesting: we can start to see how the congestion window works. I'm going to right-click and Set Time Reference here. We have 21,720 outstanding on the wire; keep an eye on when that number grows by another full-size packet, because at that point we've added another packet, another MSS, to the congestion window. The congestion window (congestion, not collision; collisions are a whole other topic) is basically the number of maximum-size segments I'm willing to have out on the wire at once. Once I hit congestion, depending on the algorithm in place, I knock it back, in some cases all the way down to half of what I had before, and then, especially on these older stacks, I do additive increase: I add one MSS for every network round-trip time. Every 20 milliseconds, all you get is one more MSS of congestion window.
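To see why recovery feels so slow, here is a toy simulation of the additive-increase / multiplicative-decrease behavior just described: halve the congestion window on loss, then win back one MSS per round trip. The 1,448-byte MSS and 20 ms RTT echo the exercise; everything else is invented for illustration and is not how any particular stack is guaranteed to behave.

```python
MSS = 1_448   # bytes per segment, as in the exercise
RTT = 0.020   # 20 ms network round-trip time

def rtts_to_recover(cwnd_segments: int) -> int:
    """After a loss, halve cwnd, then add one MSS per RTT until it is back where it was."""
    target = cwnd_segments
    cwnd = max(cwnd_segments // 2, 1)      # multiplicative decrease
    rtts = 0
    while cwnd < target:
        cwnd += 1                          # additive increase: one MSS per round trip
        rtts += 1
    return rtts

for segments in (16, 64, 256):
    rtts = rtts_to_recover(segments)
    print(f"cwnd of {segments * MSS / 1e3:.0f} kB: {rtts} RTTs (~{rtts * RTT * 1e3:.0f} ms) to climb back")
```

The bigger the window has to be (think long, fat networks), the longer that climb takes after every single loss.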
That's exactly what's happening here. Look at the stopwatch I set: about 21 milliseconds later, and the network round trip on this trace, from the handshake, is 20 milliseconds, the bytes in flight go up by exactly one MSS. One more. Let's see when it happens again: right-click, Set Time Reference, zero the counter out, and watch for the bytes in flight to go up by one more packet. Scroll... there it is: 23,168, one more MSS added to our congestion window, and the timer says 44 milliseconds, so it took two round trips just to add one more MSS. I can go through that more slowly over the break if you need me to, but this is the problem with TCP right here: some implementations, especially older congestion-avoidance algorithms, aggressively try to hit the network and push traffic out, but once we hit our head on throughput, we back off so far that it takes forever to get back up to what good throughput should be.

I have a screenshot I wanted to show you; I couldn't share the trace file, just the screen... of course I can't find it... here we go. This was an old, crummy stack running over a long fat network, a network with a lot of capacity and high latency. You have a lot of pipe to use and a lot of time, which means your windows have to be huge, and if there's any loss, this congestion algorithm takes a long time to recover. That's what happens on long fat networks: when we hit that throughput ceiling or lose a packet, recovering from that congestion can take so long that it really stretches out the time it takes to move a file.

Guys, I probably have to stop because we have a break and the next sessions. Just to wrap up: my goal was to give you a bit more hands-on time with the congestion window, instead of a session where I just throw a bunch of numbers at you, so you can actually see how this works. Keep practicing with it, keep getting comfortable with how sequence numbers work, and so on. Please fill out the feedback in the app and write in whether you liked the session or not, or come up and chat with me. I appreciate you coming today, and I'll see you around SharkFest. [Applause]
Info
Channel: SharkFest Wireshark Developer and User Conference
Views: 7,195
Keywords: Chris Greer, Wireshark, Troubleshooting slow networks, SharkFest
Id: h9stVIfug5Y
Length: 70min 57sec (4257 seconds)
Published: Sun Jun 16 2019