TCP Tips and Tricks - What Makes Applications Slow? - Wireshark TCP/IP Analysis

Captions
Welcome to this video, taken from a session presented at SharkFest 2016. If you have any questions about what you see here, feel free to comment below. The title of this session is "TCP Tips, Tricks, and Traces: Let's Chat About What Makes Applications Crawl."

Before I get into the content of this presentation, I want to introduce myself real quick. My name is Chris Greer, and I work for a company called Packet Pioneer LLC. I'm a network analyst and also a Wireshark Certified Network Analyst. For Packet Pioneer I provide training and professional services with Wireshark, helping people resolve network and application problems: those weird things that slow networks down and make us scratch our heads, wondering if it's the network, the application, or a server. Those are the types of things that I troubleshoot. You can keep in contact with me right here on my YouTube channel, or you can check out lovemytool.com, where network analysis tips and tricks are posted by several different vendors in the industry; it's a great site to check out if you haven't stopped by. You can also reach me at packetpioneer.com, or feel free to comment below.

So, in this session we're going to talk about why TCP is such an important thing to discuss. Basically, TCP is used by the important stuff on our networks: mission-critical applications use TCP to deliver sensitive data between clients and servers and back in the other direction. Now, a lot of times when I'm out there troubleshooting, or training and teaching people, one question that I like to ask IT organizations is: what's the oldest technology you are using in your network? Sometimes that'll get responses like, "2005, I have a few drivers that are pretty old," or "I have that one old printer that's been sitting in a corner for over a decade now, so clearly that's the oldest technology on my network." But you might be surprised to learn that it's TCP/IP. These two protocols, and some of the other underlying protocols, are aged; they're old. In fact, the core RFC for TCP, RFC 793, was originally written back in 1981. Think about that for a moment: the core protocol hasn't dramatically changed since then. There have been a few options added, a few tweaks that make it a bit more efficient, but it's a 35-year-old technology. Nowadays, in 2016 and beyond, we're starting to see some of the weaknesses of TCP come to the surface. That's why some problems can hide at this layer of the OSI model, the transport layer, specifically TCP. We're talking about connection-oriented, reliable transmissions, and TCP is key; it's critical for isolating the problem domain. But the problem with TCP, and with the transport layer in general, is that few people in IT organizations really take responsibility for it. Let me show you what I mean by that.

When a network problem strikes, when something happens and performance is bad, if we take a look at the OSI model, we find that network engineers, if that's the silo within the IT organization we're coming from, are responsible for the lower layers. We're responsible for the physical layer: the cables, the Wi-Fi environment. We're responsible for the data link layer: the switches, the Ethernet, the point-to-point part of the network. Also the network layer: making sure that data efficiently gets from point A to point B with low latency and low loss. That's our job as network administrators or network engineers. Now, we might say that we also own a piece of the transport layer, by way of firewalls or perhaps load balancers; those can also tweak and muck with transport layer settings, specifically TCP. But if you go to any HR department and pull my job description, that's what you can hold me responsible for as a network engineer.
So when a problem strikes, what do I do? I go to my network interfaces, I check utilization levels, I check link errors. I want to see: is the problem on the network or not? One of the reasons I'm quick to do that is because the network gets so much blame for problems today. If something is slow, if there's a delay, sure enough people are going to blame the network; they're going to say the network is slow. Something I've found interesting over the years, and I'm sure you've found this as well, is that even non-technical people blame the network. Have you ever called in to a call center, maybe to a credit card company, maybe to change a flight, and the person on the other end of the phone, someone non-technical from a network perspective, has a system that's running slow? What do they usually tell you? Typically they say, "you know, our network is slow today." I always think that's a funny thing for them to say, because under the hood it might be something completely different: it might be a server, an application, a service that's running slow. But even they are conditioned, culturally, to blame the network. That's why, as network engineers, we often go into defense mode. We want to cover ourselves, prove it's not the network, and validate those lower three layers that we are absolutely responsible for.

Now, if we come in from the other side of the IT house, we find that if we're a server, application, virtualization, or systems support person, when a problem strikes we also check the error logs. We check server resources, we look at CPU and memory, we try to see if we have a runaway process. We want to know if the issue is ours. Now, a lot of times these people won't know that they're taking responsibility for the upper layers; they don't use the OSI model the way we would as network people. But they are responsible for the session, presentation, and application layers; we'll bundle those together and call them the upper layers. They're responsible for the service running on that application; if they're application developers, they're responsible for the code, the way users interact, and how requests come into that system. So when a problem strikes, they retreat to their system and try to figure out if the problem is theirs, up there.

Now, do you notice one layer of the OSI model that typically goes overlooked? Yes: the transport layer. What we find culturally today in a lot of IT departments, and this is something I've found as both a consultant and a trainer, traveling around the world and talking to people in these environments, is that when finger-pointing goes back and forth, people are often fighting over either a problem that is in the transport layer, or a problem that can be determined by analyzing what's going on at the transport layer. Now, I'm not saying that all problems are a result of the transport layer; that's not what I'm saying. But we can use the data found there, specifically TCP, and that's what we'll talk about in this session, to hone in and figure out whether we need to go up to the application or down to the network to resolve what's going on.

So, when you're capturing traffic with Wireshark, let's talk about what you should look for in those trace files. Wireshark has some great expert events to flag TCP problems. If you're accustomed to applying display filters, one filter that you absolutely want to keep in mind is tcp.analysis.flags; I'm going to show you that when we get into a trace file in just a second. That filter will help you quickly spot the errors and issues that Wireshark has found at the transport layer, specifically with TCP.
Or, if we're using Wireshark 2.0 or above, we can take a look at the intelligent scrollbar off to the right and quickly see those events over there. So what types of events are flagged? What will Wireshark point out for us, and what will we be filtering for? That filter I just showed you will show us TCP retransmissions, out-of-orders, duplicate ACKs, zero windows, even window updates. Those are the big ones; there are some additional ones that will be displayed for you, but these are the big ones you want to try to find. Now I'm going to show you why, and show you how to set that up, by actually getting into Wireshark, so let's bring that up.

I'm just going to open a trace file. It's a basic trace showing a simple example of a user going out to a web server and getting an image. Nothing super special here, but what we want to look at, as far as TCP is concerned, is the highlighted packet. First of all, at the top we can see that a client is connecting to the server: it's sending a TCP SYN, the first packet in the three-way handshake between client and server. And what we notice over here in our time columns is that we wait over three seconds, and then the client sends a TCP retransmission.

Let me talk about these two time columns for a minute. Something you absolutely want to do with your copy of Wireshark, no questions asked, if you're using a default installation, is add a delta time column. The difference between a time column and a delta time column, like you can see here, is that the time column shows a running total of time from packet 1 all the way to the end of the trace file, while the delta time column shows the amount of time between packets, or technically from the end of one packet to the end of the packet we're clicked on. That delta time column is an important one to add.
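What those two columns compute can be reproduced with a quick sketch; the timestamps here are made up for illustration, mimicking a SYN, a three-second retransmission timeout, and the rest of the handshake:

```python
# Hypothetical capture timestamps in seconds.
timestamps = [0.000, 3.001, 3.015, 3.016]

# "Time" column: running total from the first packet in the trace.
time_col = [t - timestamps[0] for t in timestamps]

# "Delta" column: gap from the previous displayed packet.
delta_col = [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]

print(delta_col)  # the 3-second gap immediately flags the retransmitted SYN
```

The time column alone hides that gap once a trace gets long; the delta column surfaces it on every row.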
To add it, put pretty simply, all you've got to do is go up to the Edit drop-down, come down to Preferences, and then go to Columns. Once we're in Columns, down below we have the ability to add or delete a column. We want to add one; I'll name the new column by double-clicking on it, and I'm going to call it Delta. Then I come over to the column type, which shows the type of information that will be displayed in this column. If I just left this alone and hit OK, I would have a new column called Delta that showed me the frame number. That's not useful for me; the frame number is already up here under Number. So I hit the drop-down and scroll up to "Delta time displayed"; that's the one we want. When I've applied a display filter and some of my packets are filtered out, this will show me the delta time to the previous displayed packet rather than the previous captured packet. From here, if I want to, I can drag this column up next to the time column; that's something I like to do in my copy of Wireshark, just a style thing, I prefer time and delta right next to each other. I'm going to delete this one, since I already have a delta column here, and hit OK. In fact, I'm just going to enlarge the font a hair.

OK, so here we are: we've got our delta column now, and this is something you can definitely follow along with. This TCP retransmission clearly happened three seconds later. At the beginning of a TCP conversation, that's what we're going to see as far as the TCP retransmission timer goes: the SYN was sent, we wait three seconds, we retransmit. Now, right away I can tell one of two things; actually, I should say one of three things, my apologies.
The first thing I want to determine is: did the network drop this packet, this first SYN? That's one possibility: the SYN went out from the client toward the server, and somewhere along that path it got dropped. Another possibility is that the SYN made it to the server, the server responded with a SYN-ACK, but that SYN-ACK got dropped. That's also possible, although it's a less likely scenario, because I'd probably see a faster retransmission from the server side, or at least I'd eventually see that SYN-ACK come in. So either the SYN got dropped, the SYN-ACK got dropped, or, third, it's possible that the SYN made it to the server but the server was so busy doing something else that it didn't respond. Maybe it was congested; it had a lot of connections and clients doing different things, and it simply didn't have enough resources to support a new TCP connection. So what do I want to do as a network administrator? I see a TCP retransmission, which means that packets were either lost or the server could not respond. Since I'm a network person, my responsibility is to make sure that stuff gets between clients and servers reliably, so I'm going to want to walk the path between client and server and look for things like FCS errors, congestion, and discards. Is there anything between that client and server causing packet loss? Maybe I have a hardware issue, maybe a faulty cable somewhere along the path. If it is due to packet loss, when we see retransmissions, that's where we can really dig in and find what's happening on the network as far as packet drops.

Now, ideally, as a consultant coming in, something I like to see, and I always try my best to get, is dual-sided captures: I want to capture from the client end and from the server end simultaneously. The nice thing about that is this: here we see a client-side capture, and I'll show you in just a second how I know that. I see the SYN going off; it would be nice if I also had a capture on the server side to see whether this original SYN made it. If the SYN does make it and the server doesn't respond, then I know it's a congested server. Now, I said this is a client-side capture, and there are a couple of quick ways to determine that. First of all, I have a private address here on the client side, a 10.x network address, and it's sending off this SYN; that's the first clue. But also, if I come down to my frame details, expand the IP header information, and scroll down to the TTL, this TTL shows me that the packet has not been routed yet. This is what we call a full TTL. TTLs typically start at 255, 128, or 64; there are a couple of other starting values, but those are the big three. So just by clicking down here I can quickly see that the time to live is full. Either that's true, or this TTL started at 255 and has already been decremented 127 times, which is unlikely; it's far more likely that this packet simply hasn't crossed a router yet. If I captured this same exact packet on the other side of the first router, the TTL would have been decremented to 127. That's how I know how many hops away from the client or the server I am. In the other direction, if I look at the third packet down, my TCP SYN-ACK, I can see in the details that the server comes back 14 milliseconds later, and its time to live is 113. I can pretty safely assume that this packet started at 128 when the IP values were originally set; so unless there's another device in the middle terminating this connection and starting up its own, I know the server began the time to live at 128.
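That hop estimate is simple arithmetic: pick the nearest common initial TTL at or above the observed value and subtract. A quick sketch (the helper name is mine, not a Wireshark feature):

```python
def estimate_hops(observed_ttl):
    """Guess the sender's initial TTL (64, 128, or 255) and the hop count."""
    for initial in (64, 128, 255):      # common OS defaults, smallest first
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("IPv4 TTL cannot exceed 255")

print(estimate_hops(113))  # (128, 15): likely started at 128, routed 15 times
```

The guess can be fooled by a middlebox that re-originates the connection, which is exactly the caveat above.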
From that, I can make an educated guess that the server is 15 hops away. OK, so that's how I can tell where in the packet stream I've captured. Since I'm capturing client-side, this is something I want to look at: I want to scan and see how many of those hops I actually control, how much is part of my domain, and how I can get in there and really begin to troubleshoot and find that packet loss.

Now, you'll notice that in a trace file, TCP issues like retransmissions are flagged black and red. Wireshark is trying to point this out to us: "hey, mister analyst, this is bad, look at this." What can be helpful when I'm looking at a trace file is to see all of those bad things displayed for me. This is a quick trace, only 11 packets, but imagine it was half a million packets; I just want to see, do I have retransmissions or not? To do that quickly, what I recommend is adding a button up on top called Bad TCP. What Bad TCP does is apply the display filter tcp.analysis.flags, which shows me all the bad TCP events happening in the trace file: retransmissions, out-of-orders, duplicate ACKs, window updates, zero windows, stuff at the transport layer that I should know about. This is something I hit right away when someone sends me a trace file and says, "hey, I've got some issues." I'll just open the trace, come up to Bad TCP, click, and boom: I can see whether there are network issues or not. If in a TCP handshake I see low latency and low round-trip times, and the Bad TCP filter is clean, so I hit that button and nothing shows, then right away, in the back of my mind, I can pretty safely assume we don't have packet loss or retransmission issues; it looks like the network is doing its job, reliably and effectively delivering data between client and server. I'm not quite ready to let the network people go home, but we can safely assume the network isn't dropping anything.

Now, to add this button to your bar up here, all you've got to do is hit this little plus button, "add a display filter button." When you hit that, you'll see there's a label, "apply this filter"; if I don't change that, it'll be the name given to the button in the filter bar. In this case, go ahead and name it Bad TCP, and your filter is going to be tcp.analysis.flags. As soon as you hit OK, Bad TCP is available; you can click it, and without needing to type out that filter, it'll be applied to the trace. This is a nice feature to keep in mind when you're looking at trace files in Wireshark, especially for issues you continually come back to: if you're often setting a display filter for the same things, just turn it into a button. For example, one that I'm commonly adding to trace files is this Slow HTTP button over here: http.time greater than one. What that display filter does is show me all HTTP responses that took longer than one second. That shows me if I have some lagging HTTP, some slow responses from application servers, and it helps me look right at the application side rather than at the network. That's also one I've added to the top here as a button. OK, so I'm just going to clear out my filter, and here we are. That was just a brief overview of looking at a TCP retransmission. Now let's take a look at another trace file, one with some more interesting stuff in it; I'm just going to expand this out.
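A quick side note on that Slow HTTP button before we dig in: it's just a threshold test over per-response service time. The same idea, sketched over made-up (url, seconds) pairs standing in for Wireshark's http.time field:

```python
# Hypothetical response times; in Wireshark this is the http.time field.
responses = [("/index", 0.12), ("/report", 3.40), ("/login", 0.95), ("/search", 1.20)]

# Equivalent of the display filter "http.time > 1": keep only slow responses.
slow = [(url, t) for url, t in responses if t > 1.0]

print(slow)  # [('/report', 3.4), ('/search', 1.2)]
```

Anything that survives the threshold points at server or application delay rather than network delivery.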
Right away, something I love about Wireshark 2.0 is the intelligent scrollbar. You can see that the intelligent scrollbar represents the trace from beginning to end, and as I move over different parts of the trace, different colors show; as I run over these black lines, those are my tcp.analysis.flags events. That shows me where and when in a trace file I have issues. OK, I'm just going to change to my plain TCP profile. Here we go. If I come up to Bad TCP, right away it shows me all the bad TCP stuff going on in this trace. Instantly, without doing much more analysis, I can tell that I have some work to do on my network: I've got "TCP previous segment not captured," duplicate ACKs, fast retransmissions; if I scroll down I can see several more retransmissions, and I've got out-of-orders.

Now, out-of-orders: let's talk about that for a minute. Basically, an out-of-order can be one of two things. First, and this is just an illustration, imagine that two segments were sent from the server, and along the way they took a split route; as they came out of that split route they were reordered, so packet number two arrived first with packet number one right behind it. Wireshark flags that as an out-of-order: packet two beat packet one on the way to its destination. The other possibility is when there's a retransmission in the trace but we never see the original packet; that's another reason we might see an out-of-order. In fact, I'm just going to right-click on an out-of-order and come down here to Conversation Filter; I'd like to show you how this looks in Wireshark. OK, so I have a SYN, SYN-ACK, ACK, a standard TCP handshake; the client is connecting to the server. The client goes out and does a GET, saying, "hey, server, give me this stuff; this is the information I want to see from you." The server comes back and says, "all right, here you go," and I start to see the responses come in.

Well, along the way, notice what happens if I scroll down a little bit. Here on packet 173 I can see "TCP previous segment not captured": the packet right before this one, on its way back to me, was not captured. That can mean a couple of things. First, it could be that the packet really was lost, and the one I've collected is the packet right after it; I lost a packet up here. It's also possible that my copy of Wireshark simply didn't pick it up; maybe I exceeded the capabilities of my SPAN port, or I hit the maximum capture potential of the laptop I was using. Now, usually when you see duplicate ACKs right after "TCP previous segment not captured," I can rule out that second explanation. I know that my client is saying, "hey, mister server, I got this packet, I got 173, but I'm missing a packet above it; I'm missing some information. It's likely that something you sent got lost, and I'm just acknowledging what I have seen." When you don't see duplicate ACKs following "TCP previous segment not captured," when it shows up kind of randomly and just flies by in your packet trace, typically it's not something to be terribly worried about. Usually it just means that a packet didn't make it to your analyzer: the packet above the one being flagged was real, it made it between client and server and was acknowledged, but we missed it because of where we captured. In fact, at SharkFest 2016 I was talking to a guy right after this session, and he told me it finally made sense, all of those "TCP previous segment not captured" events he gets in his trace files. He mentioned that he often captures on one side of a split route, so he doesn't see the packets coming in on the other side; this is something he often sees.
And now he understood it better. OK, so let's see what happens if we scroll down further, down to the bottom of this messy little part of our trace, where I see a TCP out-of-order. What this means is that all those duplicate ACKs made it to the server, and the server went, "oh no, there's a lost packet, there's a packet missing; tell you what, here's the replacement." And that replacement arrived out of order: Wireshark gets this packet and says, "you know what, I should have seen this up here; this should have come in above 173." That's why we label it an out-of-order rather than a retransmission, even though technically that's what it is: because we never saw the original packet. Now, if this same exact trace had been taken on the server side, we would see things going along just fine, packet, packet, ACK, packet, packet, ACK, and then all of a sudden these duplicate ACKs would start coming in. We'd scratch our heads and go, "wait a second, dup ACKs, that's signaling packet loss; let me examine the sequence and acknowledgment numbers and get that second packet back out on the wire." In that capture, this packet down here would be labeled as a retransmission, not an out-of-order, because we'd see the original packet in the trace. So here's a capture tip: if you're capturing client-side and you see a lot of out-of-orders, that means you have upstream packet loss, somewhere between your capture point and the source of the data, in this case the server. If I see retransmissions, then I know the loss is somewhere between my capture point and the receiver, in this case the client. So this helps you, first of all, determine that you're getting packet loss, since you're seeing these retransmissions and out-of-orders.
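The rule behind those labels, retransmission when the analyzer saw the original segment, out-of-order when it didn't, can be sketched roughly like this (a simplification of Wireshark's actual heuristics, with names of my own choosing):

```python
def label_segment(seq, seen_seqs):
    """Classify a segment that carries a sequence number we already expected.

    seen_seqs: sequence numbers this analyzer has already captured.
    """
    if seq in seen_seqs:
        return "retransmission"   # we saw the original, so this is a resend
    return "out-of-order"         # never saw the original: it arrived late,
                                  # or the first copy was lost upstream of us

# Client-side capture: the original seq 1000 was lost before our capture
# point, so the server's resend looks out-of-order to us.
print(label_segment(1000, seen_seqs={0, 1460, 2920}))  # out-of-order
```

Same packet, different label depending on capture point, which is exactly why the label tells you which side of the analyzer to hunt for loss.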
And second, based on which one they're labeled as, it helps you determine which side of the analyzer to look at, downstream or upstream. OK, so that's just another tip for reading TCP in our trace files. Now, there were quite a few other bad TCP events in this trace file, so I'm going to bring up Bad TCP again and talk you through another one. We already covered "TCP previous segment not captured" and duplicate ACKs; let's talk about Window Full for a minute. I'm going to right-click this, come down to Conversation Filter, and choose TCP. What this does is set a filter for the conversation that the clicked packet is a part of. Some people go straight to Follow TCP Stream, where the stream data is exported out of that packet stream and shown in a separate window; for me, I don't really need all that, I just want to quickly set a TCP filter. All right, so here's another TCP conversation; let's take a deeper look at what's happening. We have our SYN, SYN-ACK, ACK; we have a GET, an acknowledgment, and then data coming in from the server. Right away I can tell this is operating pretty decently; things are looking good. I've got a 10-millisecond round-trip time between SYN and SYN-ACK, and I'm fine with that. If I look at the hop count of the packet coming back from the server, how many routing entities did we go through? It's likely this TTL started at 64, so it could be four hops, or maybe whatever is acting as a proxy, like a web proxy, is answering for the server. Here's our GET, here's our OK, and let's go take a look at that Window Full. Scroll down, come all the way to where we're at: "previous segment not captured," so it looks like we have a missing packet, and it looks like we're sending a duplicate ACK. What that does is advertise to the server which packet was lost.
What we're telling the server is that we're good up to sequence number 46721: "hey, mister server, I'm good up to the acknowledgment number I'm sending you. What I'm missing, though, is this: the next block of data that I did receive is between 59861 and 61321. So between the acknowledgment number I'm sending you and the left edge of the block I'm advertising, that's the data I'm missing; that's what I need you to retransmit." OK. Then we see another packet come in from the server, another duplicate ACK, and then we come down to this Window Full. Keep in mind which direction Wireshark is flagging this on. TCP Window Full: let's talk about that for a minute. First of all, what is a TCP window? When we see it in terms of Wireshark, we're talking about the TCP receive window. The receive window is advertised in each packet, and it means I'm telling my link partner how much data I can receive at once, unacknowledged; it's basically the size of my receive buffer. For example, if I can only receive 64K of data at once, if that's what's been allocated to my TCP connection, I'm going to advertise to you, the server: "hey, I can only receive 64K at once; don't send me 65, 66, 70K worth of data, because I don't have enough buffer space for it." As you begin to stream data to me, I'll begin to acknowledge it. But if that data gets stuck in my buffer, if the application isn't reaching in, clearing out that buffer, and processing the data up to the layer 7 application, then the data sitting in that buffer reduces the space I have to receive more. So if I have a 64K buffer and you send 10K and it gets stuck in that buffer, I'm going to reduce the window I advertise to 54K; if you send another 10K, I'm going to say 44K, and so on. The size of my TCP receive window will get smaller.
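That shrinking-buffer arithmetic can be modeled as a toy receiver; the class and sizes here are mine, for illustration only:

```python
class ToyReceiver:
    """Toy model of a TCP receive buffer and the window it advertises."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.unread = 0          # bytes received but not yet read by the app

    def receive(self, nbytes):
        self.unread += nbytes    # data lands in the buffer

    def app_reads(self, nbytes):
        self.unread -= min(nbytes, self.unread)   # app drains the buffer

    def advertised_window(self):
        return self.buffer_size - self.unread     # free space we advertise

rx = ToyReceiver(64 * 1024)
rx.receive(10 * 1024)            # 10K arrives, application doesn't read it
print(rx.advertised_window())    # 55296 bytes, i.e. 54K
rx.receive(10 * 1024)            # another stuck 10K
print(rx.advertised_window())    # 45056 bytes, i.e. 44K
```

When the application finally reads, the window opens back up, which is what Wireshark's "window update" events signal.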
Now, that TCP receive window: if we come up here to our SYN, SYN-ACK, ACK, we can determine the window size from the handshake. Hopefully, in your trace files, you can capture these handshakes, because there's a lot of really good data in there. From the client to the server, let's come down to the TCP header, down to Window Size Value: we're initially advertising 8192. And if I expand my TCP options, I see we're applying a window scaling factor. What that means is that it allows me to increase the window size above 64K; the window size value is only a two-byte field in the TCP header, so it can only express up to 64K without help. What the window scale option lets me do is multiply the number I'm advertising by a scale factor, giving an actual calculated window size, whatever that ends up being. Since we've captured this in the handshake, we can now see it in each of the ACKs that comes in from the client. I'm just going to select packet 66. This is where the server began to respond: we see our request here, the server begins to respond with this HTTP 200 OK, we see another packet, and then an ACK from the client. This ACK is acknowledging the two packets above, and if we come down to the calculated window size, applying the window size value and the multiplication factor, the calculated window size from the client end is 17520. So in addition to acknowledging the two packets the server sent, the client is also saying, "hey, mister server, you can only send me 17520 bytes; that's the size of my window." The server says OK and sends two more packets, and if we look at the next ACK from the client: calculated window size, 17520. We can see that this continues and continues.
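The calculated window is just the raw 16-bit header field shifted left by the scale factor from the handshake, and the Window Full event we're about to look at is just bytes in flight reaching that value. A sketch (the field values and the shift of 2 are my assumptions to reproduce 17520, not read from this trace's handshake):

```python
def calculated_window(window_field, shift_count):
    """Effective receive window: 16-bit header field << window-scale shift."""
    return window_field << shift_count

def window_full(next_seq_to_send, last_ack_received, receive_window):
    """Wireshark-style check: unacked bytes in flight fill the window."""
    return next_seq_to_send - last_ack_received >= receive_window

win = calculated_window(4380, 2)        # 4380 << 2
print(win)                              # 17520, as advertised in these ACKs
print(window_full(64241, 46721, win))   # True: 17520 bytes in flight fill it
```

Once that condition holds, the server must stop and wait for ACKs, no matter how much bandwidth the path has.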
Wireshark has seen so much data in flight that the server has reached the maximum amount it can send unacknowledged. The server, by the number of bytes it has in flight, has hit the window size — that's why Wireshark is saying, hey, the window is full. It's pretty common to see a TCP window full followed by some packet loss; that's a common error set in Wireshark. The server sends out so much data — that's all the client can receive at once — and that microburst can temporarily overwhelm the network, causing packet loss. So a lot of times what I'll do is look for packet drops between client and server, and check for duplex mismatches or any errors that would indicate the network dropped packets because it saw this microburst. Now I'd like to show you another example of a trace file where we saw bad TCP, but it didn't necessarily indicate a problem with the network itself. Okay, I'm going to open up my trace files and bring this one up. Right away in this trace file we see some bad TCP, right? We see the black lines, the ugly red — it's definitely indicating that we have some issues going on. Now, the symptom this trace file was captured for — what we were troubleshooting — was an application that was performing slowly. Clients would click on a certain tab in their application and sit there watching a spinny wheel for several seconds, even over a minute, before the page would populate. All right, let's dig in and see what is really happening on the wire. First of all, we can see that we have a good handshake — SYN, SYN-ACK, ACK — right away between the client and server, and I know we've got 97 milliseconds of round-trip time. Now sometimes people ask right away: is that bad? Well, if that server was one hop away, that'd be bad. If it was two hops away, I'd wonder why we have 97 milliseconds of latency between
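The "window full" condition itself is simple arithmetic: the sender's unacknowledged bytes in flight have reached the receiver's advertised window. A simplified sketch of that check (sequence numbers here are invented for illustration):

```python
def bytes_in_flight(next_seq: int, last_ack: int) -> int:
    """Data the sender has transmitted but not yet seen acknowledged."""
    return next_seq - last_ack

def window_full(next_seq: int, last_ack: int, peer_rwnd: int) -> bool:
    """Roughly the condition behind Wireshark's 'TCP Window Full':
    data in flight has reached the receiver's advertised window."""
    return bytes_in_flight(next_seq, last_ack) >= peer_rwnd

# 17520 bytes outstanding against a 17520-byte window -> full
print(window_full(next_seq=117520, last_ack=100000, peer_rwnd=17520))  # True
print(window_full(next_seq=110000, last_ack=100000, peer_rwnd=17520))  # False
```

Note the direction: Wireshark flags the last data packet from the *sender*, even though it is the *receiver's* window that has been exhausted.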
this client and server. It really depends — on where the client is, where the server is, whether I go through a service provider I don't control. There are several factors to consider. But if I come down to the SYN-ACK, I can come down to the IP info and see that the time to live is 111 — that's how many hops this packet has left. So I can guesstimate, take a pretty good swing, that this packet has been routed 17 times. Now, with that number in mind, I can come back to my 97 milliseconds and I'm okay with that — I'm not super upset about 97 milliseconds; that's certainly not why the clients are punching their screens or seeing those spinny wheels. So: SYN, SYN-ACK, ACK. Then from the client — I'm just going to expand out my length column here — a full-size packet, 1514 bytes, from the client to the server, and the client is saying, hey, here's a GET. This is a request coming from that client. Now, what exactly it's requesting, I don't really care right now — this trace file has been sliced, so the data payload hasn't been captured — but I'm not too worried about that just yet. What I first want to figure out when I'm troubleshooting something like this is: is it a network problem, or is it an application/server problem? So we see the request come in. This request was so big that it filled one packet, and we had to have a second packet from the client to the server just to carry the rest of the request — maybe there was a cookie in there, maybe something else exceeded that one-packet limit. All right, so we have a two-packet request. The server comes back with a 106-millisecond response. That tells me those two packets made it to the server, and the server is telling me: I got your request, hang on, I'm going to go to work. Then we see our ugly black lines. If I click on that first packet, notice how long we waited before this packet was sent. Now, the sender of this packet is the client. This is a packet with no payload — it's 60
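That 17-hop guesstimate works because operating systems start TTL at a handful of common initial values (32, 64, 128, 255) and each router decrements it by one. A sketch of the inference — a heuristic, not a guarantee, since the sender's true initial TTL is unknown:

```python
def estimate_hops(observed_ttl: int) -> int:
    """Guess how many times a packet was routed by assuming the
    sender started at the next common initial TTL above the value seen."""
    for initial in (32, 64, 128, 255):
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError("TTL must be 0-255")

# TTL of 111 in the SYN-ACK: likely started at 128 -> about 17 hops
print(estimate_hops(111))  # 17
```

Seventeen routed hops makes 97 ms of round-trip time entirely plausible, which is why the latency alone doesn't worry us here.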
bytes — or 64 with the FCS on the end — a small packet. It's simply a TCP keepalive. We waited 45 full seconds and didn't hear anything back from that application server, so as the client we're checking in. We're saying: hey, are you still there on the other side of the phone, or have you hung up? Did the server die? Did the server process disappear? Did something happen on the network between client and server that severed communication? So the client says, hey, are you still there? The server comes back 99 milliseconds later — and I'm happy with that; it about matches my round-trip time — and says, yep, I'm still here. Okay, shrug. We hear nothing else from that server until packet 9, another 45 seconds later: the client sends the server another TCP keepalive. Now, what this is doing is just what the name implies — it's keeping the TCP connection alive. TCP wants to time out after a certain amount of inactivity. If one of these two sides is dead — if a service has gone down, or some other network event has severed this TCP channel — we want to shut the connection down and free up that TCP resource for other things on each side. That's why a TCP keepalive will keep the connection alive, as long as we hear a TCP keepalive ACK — which in this case we do. So we send another keepalive; the server says, all right, yep, I'm still here, let's keep this open, I'm still working on it, don't hang up on me yet. Finally packet 11 comes. This is 18 seconds after the final TCP keepalive ACK — 18.45 full seconds — and this one has measurable data in it. This is an HTTP 200 OK, the first packet of the response from that server. After this, we're good. If we take a look at the delta times — I mean, these aren't blissfully fast, but in comparison with 108 seconds of delay, this is lightning fast. In fact, once we start to transmit
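Applications opt into these probes at the socket layer. A sketch of turning them on in Python — the idle/interval knobs are platform-specific (hence the `hasattr` guards), and the 45-second values mirror what this trace showed, not a known setting of this client:

```python
import socket

def enable_keepalive(sock: socket.socket,
                     idle: int = 45, interval: int = 45, count: int = 4) -> None:
    """Turn on TCP keepalives: after `idle` seconds of silence, probe
    every `interval` seconds, give up after `count` unanswered probes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-tuning options exist on Linux; guard for portability
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # nonzero when enabled
s.close()
```

The key analysis point survives the code: keepalives carry no application data, so a stream of keepalive/keepalive-ACK pairs means both TCP stacks are healthy and the wait is happening above layer 4.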
data — and let's keep our minds on that 108 seconds — if we scroll down, all the rest of this data comes in quickly; we're doing okay. Then the next GET happens down here; that's why we see that jump in time. So for the request above, we sat there waiting for the application to respond for 108 seconds. So there are times when, by looking at bad TCP — if we see things like keepalives and filter in on the TCP conversation that's generating them — we can show that, as in this case, the network itself was absolutely not to blame: the application sat there for 108 seconds before responding. Now we might be thinking, well, what if that 108 seconds was simply taken up because the server let this packet go and it swam around on the network for 108 seconds? That is tremendously unlikely — actually, I should say impossible. I heard someone say one time: there's no red-carpet lounge where packets go to hang out on the network. They don't just enter a switch and say, you know, I'll just chill here for a while. That's not what packets do. They go into a device, they get processed, they come out of that device — especially when I look up here at 97 milliseconds, my benchmark network round-trip time. So it's impossible that the network was just sitting on this response. Okay, the next thing I want to do with a trace file like this — I've captured it here from the client side; we saw the request go out and the response come back 108 seconds later — is move to the server side and run another capture. Hopefully we can reproduce the issue, or best case, we captured the server side simultaneously while capturing the client side. Then we can look and see. What I want to know is: what was that server doing for 108 full seconds? Was it looking at some back-end system? Was it looking at an Oracle server? Was it looking at a
SQL box that was taking a long period of time? Those are the kinds of questions that I have. So what Wireshark has helped us see is that these delays have nothing to do with the network, even though we do see some bad TCP. By starting at the transport layer, it lets us look up at the application and immediately exonerate the network below — and that's our goal. Okay, up next we have another example; let me open the next trace file. Now we're going to take a look at an example where TCP showed us something that wasn't necessarily a network issue, but we did see bad TCP. To tee up the details: what was happening is we were doing a backup — simply put, from a primary backup server to a destination backup server — and this backup was taking hours. In fact, the person who captured this told me he would begin the backup after people went off shift at the end of the day, after 5:00 p.m., and the thing wouldn't even finish by the next business day when people started coming in for the morning shift. So this backup was taking a long time. What he wanted to know right away was: Chris, what's the problem? Is it a network issue? Do I need to throw more bandwidth at this? Do I need to go from one gig to ten gig, from ten gig to a hundred gig? What's going on here? Of course, all the server and application people were telling him: hey, you've got network issues, and your network's too slow to handle this backup. So right away when we open up this trace file, we can see our bad TCP button — let's click that. Right away we can see window updates. Now, what a window update is: it's when the receiver of data — whoever sent this packet — is telling us its window size has just increased; it's gone from one value to a new value. Now, what I learned from this is that, first of all, I don't have any TCP retransmissions,
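Wireshark tags a packet as a window update when it carries no new data and no new acknowledgment, only a larger advertised window than the previous segment from that host. A simplified sketch of that rule (the field names and the dict representation are my own, not Wireshark internals):

```python
def is_window_update(prev_seg: dict, seg: dict) -> bool:
    """Simplified version of the analysis rule: zero payload, same
    SEQ and ACK as the previous segment, but a bigger window."""
    return (seg["len"] == 0
            and seg["seq"] == prev_seg["seq"]
            and seg["ack"] == prev_seg["ack"]
            and seg["window"] > prev_seg["window"])

prev = {"seq": 1000, "ack": 5000, "len": 0, "window": 2299}
upd  = {"seq": 1000, "ack": 5000, "len": 0, "window": 65535}
print(is_window_update(prev, upd))  # True
```

On its own a window update is good news — the receiver freed buffer space — but a steady stream of them hints that the window keeps collapsing in the first place, which is the thread this example pulls on.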
no out-of-orders, no dup ACKs. So right away I'm not concerned about the network dropping traffic. In fact, from layers 1, 2, and 3 I'm good — no issues there, and nothing pertinent that I need to go digging around in those lower layers for. What I do want to do is figure out where the delays are coming from. All right, so the next thing I'm going to do — after filtering on the conversation between client and server, which is what I've done here — I can see that the server sending the data, the 1514s, is 192.168.1.1, going to 192.168.1.2. So .2 is the receiver and .1 is the sender, and I've filtered on just this connection between the primary server and the backup server. That's important to do first, because the next step won't make sense unless you filter on that conversation. Okay, so next I come up to Delta — our delta time, the amount of time between packets. What I want to know is: where are the delays? Where are things being held up? If I click on Delta, it sorts by the delays between packets for this entire trace file. Now I want to look for patterns: which side of this conversation are we waiting on? Are most of the delays coming from .1 or .2? I can see that I do have an exceptional 160 milliseconds coming from the server, but most of these delays are coming from .2. They range anywhere from 20 milliseconds up here — 17 milliseconds, I see 25, I see 32 — and if I scroll up a little farther, I can see that .2 is responsible for even more of them. So I definitely have periods of time where I'm waiting on .2 to respond. That's something I want to be aware of. I can also see that some of those delays are coming from packets where I see
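This who-are-we-waiting-on check can also be scripted once packet times and sources are exported (for example with `tshark -T fields -e frame.time_epoch -e ip.src`). A sketch that charges each inter-packet gap to the host that finally spoke — the timestamps below are invented to mimic the trace's pattern:

```python
from collections import defaultdict

def delay_by_sender(packets):
    """packets: list of (timestamp, src_ip) in capture order.
    Sums each inter-packet gap under the host whose packet ended it,
    matching how Wireshark's delta column attributes waiting time."""
    totals = defaultdict(float)
    for (t_prev, _), (t_cur, src) in zip(packets, packets[1:]):
        totals[src] += t_cur - t_prev
    return dict(totals)

trace = [
    (0.0000, "192.168.1.1"),  # server sends data
    (0.0005, "192.168.1.1"),
    (0.0200, "192.168.1.2"),  # .2 took ~19.5 ms to respond
    (0.0205, "192.168.1.1"),
    (0.0450, "192.168.1.2"),  # another long gap charged to .2
]
print(delay_by_sender(trace))
```

When one side's total dwarfs the other's, as .2's does here, you know which end of the conversation to go investigate.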
window update. All right — this is all just a note to self. Okay, so what I'm going to do is sort on packet number, go back to the top of my trace, and look for the first delay. Right here on the Delta column I'm going to click down through, one by one, and there is my first delay: 19.8 milliseconds. Things were moving along just fine at less than one millisecond between packets — sub-millisecond, wire speed, transmitting data from one point to another — and then we come up on this screeching halt. Now, 19 milliseconds doesn't sound like a lot of time, does it? But in context it does. When everything else around this 19 milliseconds is sub-millisecond and things are just screaming fast, this 19 milliseconds looks like a complete roadblock. In fact, mentally move the decimal point over: what if I had 71 milliseconds, 52 milliseconds — I know these aren't milliseconds, but bear with me — what if this was 71 milli, 52 milli, 122 milli, and then I see this big whopping 20 full seconds? That's basically what this is telling me: everything else is screaming fast, and then you've got this delay. But what was causing it? Remember, this is coming from the client to the server. If I go up above that packet to one of the previous packets from the client — right now I'm only interested in packets coming from the client to the server; I'm not worried about the server, he's not causing these delays — I want to take a look at the client itself. In fact, I'm going to come up here to the display filter and type ip.src == — I just want to filter on only the traffic that was sourced from .2. Okay, so these are only my ACKs; these are not the packets coming from the server. Now, also in my TCP plain profile that I have here in Wireshark, another column I added was window size. So what you can do
on any packet: come down to your TCP window size — it's best to do this with the calculated window size — right-click, and choose Apply as Column. What that does is show me: do I have a window size that's falling? Is this client, which is receiving and acknowledging this traffic, becoming congested? And here I can see, sure enough, it is. This window size starts high — 65535, 65535 — and then right around here it starts to bog down; this TCP buffer starts to fill, and it falls all the way down to 2,299. After 2,299, we wait — there's our 20 milliseconds — then the client updates its window. It says: hey, I'm back up to full — or rather, my buffer is empty, I'm back up to 65535, go ahead and resume sending traffic. Now, if I pull off my filter, and we're sitting here at that 2,299 — check out what the server does. The server says, okay, great, here's one more packet; you can fit one more full-size packet inside that window, inside that buffer. And this is interesting: the window size doesn't go to zero, it goes down to 2,299. A zero window would absolutely tell me I have a congested receiver, but it doesn't go to zero — it just slows the server down. The server says, all right, here's one more packet, and then it stops transmitting. It can't send anything else until this window gets cleared out. So it waits 20 milliseconds, the client tells the server "hey, I'm back up to 65535," and the server takes off like clockwork — 1514, 1514, 1514 — it just bursts a bunch of data onto the wire, and then the client begins to ACK. Okay, so we see this go on. If I scroll down, we're going to see another delay coming up — this is a good long burst here, boom, boom, boom, boom — and what I'm looking at is my delta time column right here. So data is coming in, it's being acknowledged, data is coming in — oh, the window is falling: 6,487 bytes, then 3,567, then
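The server's stop-and-go behavior at the 2,299-byte window follows from simple arithmetic: the sender may only transmit what fits in the receiver's remaining window. A minimal sketch, using the 1,460-byte MSS seen in this trace:

```python
def segments_that_fit(rwnd: int, in_flight: int, mss: int = 1460) -> int:
    """How many full-size segments the sender may still transmit
    before the receiver's advertised window is exhausted."""
    usable = rwnd - in_flight
    return max(usable // mss, 0)

# Window down to 2299 bytes: exactly one more 1460-byte packet fits
print(segments_that_fit(rwnd=2299, in_flight=0))      # 1
# After sending it, only 839 bytes remain -> the sender must wait
print(segments_that_fit(rwnd=2299, in_flight=1460))   # 0
# Window update back to 65535 -> the server can burst again
print(segments_that_fit(rwnd=65535, in_flight=0))     # 44
```

This is why the window never needs to reach zero to throttle the transfer: any window smaller than one MSS stalls a sender that only emits full-size segments.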
1,459. This packet right here stops the server — the server can't send any more. The data being carried in these 1514-byte packets is 1,460 bytes of payload, so right now the client is saying: I've got one byte less than a full-size payload of space left. So the server stops. The server says, okay, fine, I'll wait. We have our 40-millisecond delay, and then the server takes off again. Okay, so these little pauses — while they don't seem like much, notice how long we're transmitting. If I come back up to the top and look for that 19-millisecond delay — scroll up above here, and here it is — what I want to see is how much time we spend transmitting versus waiting. So if I right-click the packet just after that delay, the next one from the server, and choose Set/Unset Time Reference — or press Ctrl+T, either one — that resets my time column back to zero. So from packet 141 I want to time how long we spend transmitting versus waiting, and go down to my next delay. If I take a look at the window size, you can see when the delay happens, because that window size number on my 60-byte ACKs is starting to fall. So let's get down to where it drops to 1,459 — there's our pause. We spent 16 milliseconds transmitting and 14 milliseconds waiting. So for this burst, roughly half the time went to waiting. It bursts, waits, bursts, waits, bursts, waits. Now, later on in the trace file — this is just near the beginning of the transfer — this burst-versus-wait pattern gets a lot worse: we're only sending really small bursts of data, and that window is filling really quickly. So here I can see that there are several things at play. First of all, the application on the client side is not clearing out the TCP window as fast as data is coming in. That 14 milliseconds is the amount of time it takes layer 7 on the client to scoop out that data and
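The transmit-versus-wait bookkeeping done with Ctrl+T can also be scripted: classify each inter-packet gap as "waiting" when it dwarfs the normal sub-millisecond spacing. A rough sketch — the 5 ms threshold is an arbitrary cutoff I chose for illustration, not a Wireshark setting:

```python
def burst_vs_wait(timestamps, threshold=0.005):
    """Split total elapsed time into 'transmitting' (small gaps)
    and 'waiting' (gaps above the threshold), in seconds."""
    transmitting = waiting = 0.0
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold:
            waiting += gap
        else:
            transmitting += gap
    return transmitting, waiting

# 16 ms of back-to-back packets (1 ms apart), then a 14 ms stall
times = [i * 0.001 for i in range(17)] + [0.016 + 0.014]
tx, wait = burst_vs_wait(times)
print(round(tx * 1000), round(wait * 1000))  # 16 14
```

Once roughly half the wall-clock time is spent waiting on window updates, the effective throughput is half of what the wire could carry, no matter how fast the network is.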
open up the TCP window again. This is where we can see the difference between layer 7 and layer 4. So that's one thing at play. The second thing: we're not using window scaling, so 65,535 bytes is all the server can put out there at once. Another thing going on is that this backup process is single-threaded — it's only using one TCP connection, on port 13782, to move all of this data. Now, when you do this type of analysis with a trace file, there are going to be some things you can change and other things you can't. For me, I'm not an application coder, so the last thing I want to do is get into the client and figure out where in the application code it scoops out the TCP data — where it processes data up from layer 4. Not my job, not what I do best; I'm a packet and network person, so I try to leave that side of the house alone. What we ended up doing to patch this, while we got hold of the vendor who writes that code, is — I asked my friend who captured this: hey, are there any options on that receiver, on that backup process, to use more than one TCP connection? Can you do what's called multi-threading, or multi-connection transfer? He looked, and sure enough, there was a checkbox he could click on the backup process. When he checked it, rather than sending all the data over one TCP connection, it opened up twenty of them — or, at the time, I can't remember the exact number, but it used a lot more connections to send the data from one machine to the next. So rather than one TCP connection bogging down and causing all of this delay just because we had only one lane of travel, he opened up the whole freeway. Now he's got an eight-lane freeway where traffic can flow, instead of just one lane. Now, while
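The single-lane versus multi-lane idea boils down to splitting one byte range across N parallel TCP streams, so one stalled window no longer stalls the whole job. A sketch of just the chunking step — names are hypothetical, and a real multi-stream backup would hand each range to its own socket and thread:

```python
def split_ranges(total_bytes: int, streams: int):
    """Divide [0, total_bytes) into `streams` contiguous chunks,
    one per TCP connection."""
    base, extra = divmod(total_bytes, streams)
    ranges, start = [], 0
    for i in range(streams):
        size = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

# A 1 GB backup over 8 connections instead of 1
for lo, hi in split_ranges(1_000_000_000, 8):
    print(lo, hi)
```

With eight streams and no window scaling, each connection still caps at 65,535 bytes in flight, but the aggregate in-flight data is eight times larger, and a 14-millisecond stall on one stream costs only an eighth of the pipeline.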
some of those connections continued to show this behavior — one would get bogged down while the others kept going. With eight connections, only one would fill while the other seven continued, then another would fill while the rest continued, and so on, while we waited for an actual fix from the vendor. Okay, so I wanted to show this just to demonstrate how we could analyze TCP — analyze that transport layer — and quickly determine whether we had network problems. We saw that we didn't; we were able to see that this was an issue rooted in the TCP window, and we were able to do some tuning from there. So that brings us to the end of this session that was presented at SharkFest 2016. Hopefully it was helpful to you. I appreciate you stopping by my YouTube channel and taking a look at this video. If you have any questions for me, please feel free to contact me directly — chris at packetpioneer dot com — or post a question down in the comments. Thanks again.
Info
Channel: Chris Greer
Views: 164,432
Keywords: wireshark, tcp, analysis, trace file, slow application, performance, tcp/ip, tcp handshake, tcp window size, tcp analysis, tcp explained, wireshark tutorial, wireshark tutorial 2020, how to use wireshark, network troubleshooting, application troubleshooting, what makes applications slow, how tcp works, chris greer, troubleshooting with wireshark, wireshark course, free wireshark class, free wireshark tutorial, packet analysis, free wireshark training, wireshark tutorial 2021
Id: 15wDU3Wx1h0
Length: 62min 22sec (3742 seconds)
Published: Wed Jun 22 2016