SF17EU - 34: TCP Analysis (Jasper Bongertz)

This is another TCP analysis talk. It's about more complicated things, so if you just learned how TCP works, this may be a bit confusing. The funny thing here is that I put this together without really looking that much at the traces myself. I saw them a couple of months, years or whatever ago, so if I'm confusing myself doing this talk, that is for you to see me trying to find my way through the traces. I did that on purpose, so it can happen that I get lost in my own traces. It's quite possible. So I try to get a little bit of that Packet Doctors thing into my own talk as well. Who has seen the Packet Doctors yesterday? Okay, so the rest will see a single packet doctor struggling with traces.

The idea here was that very often everybody's talking about very simple things when you're doing analysis, like: hey, I see a retransmission and there's some time that has gone by, so this must be a problem, and problem solved, or topic solved. But sometimes it gets more complicated than that, and I turned in this talk just to show a couple of things where we were looking at traces that weren't really that easy to solve. It all comes down in the end to the amount of experience that you have. There's one trace in here where people may spend hours and days on it and not really see what's happening, and if you've got enough experience you will look at it and say, well, this is this, like half an hour, done. And experience unfortunately is something that I can't just hand to you. You need to make it yourself. As I always say, experience is what you get right after you needed it. So those painful times when you find out that something works in a certain way after spending hours on it, that's the most valuable time, because then you're like, hey, now I know how this goes. And that is the same for a network analyst, basically. So I can show you a couple of things, and my plan is to give you some ideas of things that I've seen, and maybe when you see something similar you remember, like, how
there was this talk and there was something about this, and then you're a little bit faster. It's not going to help you solve complex problems in half an hour; that's probably not going to happen. But if you get a little idea, that is quite similar to Sake's challenge yesterday: if you know what you're looking for, or if you have a couple of ideas what to do next, that is always helpful, because if you don't, then you're like, now what? Stuck. And we don't want to have that, right?

Okay, so about myself. I'm still working for Airbus; that hasn't changed since the first talks that I gave here. I'm active in the Wireshark community in a way that I write a blog. If you haven't read it, you may want to. I will probably refer to some of the blog posts during this talk, because my idea of writing blog posts is not to have one blog post every couple of days to show you that I'm very active writing something. I'm writing blog posts sort of like a collection of knowledge and a collection of tutorials that you can look up even years later. So sometimes people are pointing to my blog posts from 2013, which is about four years ago, and they're still kind of valid, or mostly valid, and if they're not valid anymore I will update them. So it's like a resource, sort of like an online mini book, so to speak. Twitter, if you want to do the Twitter thing: I'm @packetjay. I'm quite active on the Wireshark Q&A page, depending on the questions that arise. If somebody's asking something about Wireshark code stuff, I'm like, I'm out, because I have no idea. And Sharkfest, obviously. Also, I made TraceWrangler, if you haven't heard of it by now. I'm trying to get more than five people interested in that. Sometimes it works, sometimes it doesn't. Take a look if you want.

So the main challenge that I have with packet captures coming to me, or that I have to analyze, is always the capture quality. I have had captures where somebody said, hey, no problem, we will do a capture at seven capture points
simultaneously and send you all the files, and then you can find the problem. I did two of the seven myself. The quality was acceptable, good enough, depending on what I could do at that point. I had to use a SPAN port, so that's not the optimum that you can do; a TAP is always better. But then I got the remote captures from somebody else who did them for me, and they were completely useless. It's like spending a day capturing stuff, then getting results back from somebody else who captured something on I don't know what, and you couldn't use them at all. So in that way, when doing analysis, if the capture quality is not good enough, you can forget it, basically. There's not so much that you can do then, depending on what the problem is.

So, capture locations. We have three of them: at the client, at the server, and somewhere in the middle. And please note I'm saying at the client and at the server, not on the client or on the server, because I don't like those. Why not? Does anybody know why it's not a good idea to capture on the client or on the server? Yeah, we have a lot of things going on where the trace will lie to us. One thing everybody says is that packets never lie, and I'm like, no, no, they do sometimes, depending on what you captured and how you captured it. I mean, if you do a perfect capture with a TAP, full duplex, with a great capture device, yes, packets don't lie. If you did it locally on a system that is doing so many things at the same time, being overloaded and stuff like this, packets do lie, and the problem is that you need to know in which way they are lying. So if you see a huge packet, 16k or something, you know, well, that's not in the specs anywhere. Then you need to know it's just a lie: it's a packet that hasn't been cut into smaller packets yet. And then the troubleshooting gets a bit more complex, or maybe impossible; that can happen. So please try to get a good capture quality, otherwise all of this will probably be much harder to do.

All right, so I did a couple, a
couple of slides here to show you what the difference is between the capture locations. If I'm capturing at the client and packets are going back and forth, you can see that at the client there is a long time, or a relatively long time, between the request going out and the answer coming back. The capture device sees almost the same thing, maybe a bit quicker because it's a little bit in front of it. If you move the capture device to the server, the picture is quite different, because you see that the time between the request coming in and the answer going out may be a lot shorter. So it's quite important to know where you are, because if you're talking about timings and what it took and what happened, you need to know where you were when you captured. If you were at the client, that picture is different than from the server. If you're in the middle, then it gets even more complicated, because you will see, let me check if I can get this thing to work, you can see here that at this time, at this location, the capture device realizes for the first time that there is a packet. But the whole travel distance here, from here to there, it doesn't know. It sees the packet at that point in time, and that's the first time it ever sees it. Same with the answer: the answer is first seen here. So for the capture device, this is the time it took, and the travel time to this end and that end it doesn't know. Okay, so you need to be aware of where your capture was.

So one of the effects that I'm trying to show here, and I think this is an animated slide I made just maybe 20 minutes ago, looks like this. If the packets are going out from the client to the server, you will see that basically the client is sending a lot of things. The capture device will see many packets from the client before the first packet from the server arrives. That picture is different if I move this a little bit to the right, because then you will see two packets from the client and then the packet from the server coming back. Well, in this case the
green arrows should be a little lower as well, but you can see in the capture that the order of packets will be quite different. So that is really important: never look at the order of packets in a trace file, because it depends on where the capture was taken. More here, or more there, or in the middle: it's a completely different thing.

So what can we do to really find out the order of packets that were sent and acknowledged? Yeah, look at the handshake; that is a valuable idea. Why? Because we need to know what the round-trip time is. Okay, but for packet ordering, what is the one and only thing to look at? It starts with an S and ends with an equals sign. We're looking at sequence numbers. They tell us in which order the packets should have happened. Sometimes they may trick us, which is the problematic part in TCP, because you may see a retransmission, which has an older sequence number, appearing later than it should have, and if you don't have the original, your ordering gets quite out of order. And in that regard your idea about looking at the handshake is quite important, because the handshake will tell you how much time it takes from one to the other and back, and if something arrives much later, it's probably a retransmission. Okay, so timing is the one key thing that you need to look at. We have many questions on ask.wireshark.org where people are confused by seeing perfect packet sequences in a certain order, not thinking about what the time is. Where were they captured? How long did it take to travel? Is that the real order in which they arrived at the one point or the other? That is important to know.

So, some guidelines. Watch out for the initial round-trip time, iRTT. So everybody knows what that is, initial round-trip time? Who doesn't? So you all do. Great. All right, so I don't need to explain it right now. If you need a hint about this, go to my blog; there's a post about determining the initial round-trip time. Google for it, you
will find it and it will paint you some funny lines of why this is the way to do it. Basically what we do is we look for the SYN, the SYN/ACK and the ACK, and the time from SYN to ACK is your initial round-trip time. So if any of you is now like, wait a minute, it's not from the SYN to the ACK, it's from the SYN to the SYN/ACK: read the blog post, because it is. Okay: from SYN to ACK, the full handshake time.

The next thing that many beginners don't do is isolate the TCP connection, and it's quite funny to see them try to match sequence numbers from different TCP connections. It's like, why doesn't this... Especially if you use relative sequence numbers. Relative sequence numbers are only allowed if you know what you're doing, and if you get into trouble because of relative sequence numbers, turn them off. Yeah, I know that many people don't like the real sequence numbers because they're so long, but here's a little trick for you: you don't need to remember all of them, just remember the last five digits. Why is that enough? I always do that, I only look at the last five digits of a sequence number. Because the biggest packet that you can see has how many bytes? One thousand something, yeah, or if it's a jumbo frame it's nine thousand whatever. Four numbers, four digits. So if you add a four-digit number to any other number, the only thing that can change are five digits, with the overflow into the next higher digit. So the only thing that can change are the last five digits. If you see a sequence number where more than the last five digits change, you know that you have a gap, or your capture quality is not good enough because you have huge packets, maybe. So sometimes that is a problem that you need to work around, but very often, if you're using absolute sequence numbers, just look at the last five digits. That's what I do.

Isolating the TCP connections: two different ways. Either you use the 5-tuple. 5-tuple means IP A, port A, IP B, port B, and of course the protocol being TCP. With that you
get a filter that basically says IP address equals something and port equals something and IP address equals something and port equals something, and then you have your connection. And then there's the stream index, which most people use by saying Follow TCP Stream, which I absolutely hate, because it's an extra step to close this content window again, and I don't like clicking one extra click. It's too slow for me. And some of the core developers look at me like, what are you doing, it's too slow? Really? Yeah, I even hate the find dialog because the mouse distance is too long. The find button is all the way on the right and I don't like it. And everybody's like, what's he talking about? Yeah, I'm really fast when I'm in the zone, like a programmer being in the zone. Looking at packets, I don't want to move more than I need to. I would do everything with keyboard shortcuts if I could, which I can't, but I would.

All right. If you know the capture location, great. If you don't know it, find out from the packet capture. Who knows how to do that? How do you find out where the packet capture was taken? You look at the SYN, SYN/ACK. Yeah, you look at the three-way handshake, and you can find out where it was taken by seeing where the small numbers are. Usually the numbers are really small for the one and very, very high compared to that for the other, and then you know it was taken here, was taken there, or was taken in the middle.

Timing is everything. So if you're doing advanced TCP analysis, you're looking at sequence numbers and delta times, sometimes relative times, but those two things, time and sequence numbers, that's what you do all day. And that's where we usually lose all the beginners, because it's hard to concentrate on that if you're not really used to it, and you need to practice, practice, practice, practice. I've done this for 13 years now, I think, or 14, so I've got a lot of practice. So I can sometimes even jump over big amounts of packets because I can see out of
the corner of my eye that the sequence numbers are nice and good. So I try to find the locations where it's not good. The one thing that you need to have if you're doing advanced TCP analysis, and that is one thing Hansang always says: he won't talk to you if you don't have a delta time column. If you don't have it, you're not serious about packet analysis. And yeah, it's kind of like that. So you need to have a delta time column. I hope everybody has one, because it's really, really, really important.

Okay, enough slides. Let's just try to look at some packet captures. Hello, picture please, come on. Okay. My first trace, it's called Citrix. Who knows Citrix, or "Sitrix", or whatever you want to call it? Who's afraid of it? You've heard about Citrix, of course. Who thinks it's a scary protocol? Why is it a scary protocol? It's asynchronous. Usually what we're used to is that a client says, hello, dear server, I want to talk to you, and the server says, great, let's talk. Okay, let's talk. That is the handshake. And then the client says, I need this from you, and the server says, there you go. And as an analyst we're like: where's the question, or the query, or the request? Where's the answer? How long did it take? Is it good enough? Great. If it's not good enough, bad, we found something. With Citrix it's similar sometimes, but not always, because it's asynchronous. Citrix is a remote desktop protocol, so think of it like this: the client connects to the server, the server says, hey, you want to see a screen, okay, here's the screen. So far so good. But then something on the screen happens, and the server is like, hey, you didn't ask for it, but here's something else, because you need to paint this, please. So the server can send something to the client without being queried, at any point in time. So if you're looking for large delta times, that's not going to work most of the time. Very often we're looking for where it takes a long time, but if the client just sits there for quite a while and
then at some point the server sends something because something happened on the desktop, you will see a packet coming from the server and you're like, oh, this took forever. And then it's like, yeah, but he had every right to do that, so this isn't bad. So Citrix analysis is always a big challenge. This is a really simple one. I'm just starting with this topic to let you know that Citrix analysis is always something where I get really, really careful, because you cannot guarantee any kind of timing conclusion; it may not be that simple.

Just to let you know, the most complex Citrix analysis I ever did was for a company that is doing 3D drawings for satellites with Citrix. So the designers sit in one town, and the servers they're working with are 500 kilometers to the north, and they're rendering everything that they're doing. And satellite drawings are not your typical something that you can print on a 3D printer; they're insanely complex things. The problem they had was: if they turn the model with the mouse, sometimes they click on something because they think the turning is complete, and then it goes one step further and they click on something completely different. That's totally annoying and it costs so much time. The problem with that is that all the 3D drawings are rendered on the servers and then sent as a movie to the client. We troubleshot that one, and that was the one where they told me, hey, we can get you packet captures from all along the way, but their quality was unusable. There weren't even Citrix packets in the traces that I got back. I'm like, what am I supposed to do here? So that's the one I couldn't solve, because they cancelled the whole thing and went back to non-Citrix drawings, I think.

So Citrix is a problem. You can see that it's light blue, because I colorized Citrix just so I know where my enemy is. So the Citrix protocol is something that I need to be aware of if it happens in the trace. And this was actually a relatively simple thing, but I wanted to use
it just to show you how you can look at the timings. Here, where's my... okay. So you can see a three-way handshake, and one trick that I already showed: if you're trying to look at a couple of packets and don't want to get confused by anything else, just remove everything else. What I do very often, for example, is frame.number < 4. Now I only have my handshake here and I can concentrate on that. Of course I don't do that anymore with the handshake, but sometimes with other things. So a frame range filter is quite nice to just remove everything else that may distract you.

So what we can see here is we have a delta time from the SYN to the SYN/ACK of 740, no, 714 microseconds. Is that fast? Oh yeah. But the ACK is 890, I think. Let's zoom in. So was the capture location at the client or at the server? Server? Trick question. It's so fast it's hard to tell, but it's more likely that it's close to the server, just because it's a little higher. But with microseconds I would not go and say, I'm pretty sure it's right there. It's like, it's probably there, but maybe it's in the middle, I don't know. Sometimes you have times like this where you're not really sure where it is. Actually, this was captured at the client. So if you have times that fast, you need to consider that the nodes also need to do something in their TCP stack; it's not like they're instantaneous. Here the time is too fast for you to really tell where it is. So in this case I know it is at the client, because I captured it at the client.

If you've ever done troubleshooting with a really huge problem, I can recommend doing the capture yourself, right next to the person having the problem, because then you know what is going on. That is sometimes invaluable when reading the packets, because you know that, oh, we did this and we got that, and you wrote it down. Writing things down is the most important advice I can give you, because if you're doing the analysis one week later or three weeks later you
are like: this trace, what was it again? It happened to me, and then you're in big trouble, because you're starting to sweat, like, I need to find something but I have no idea anymore what I was doing.

Yeah, the MAC addresses, on different subnets? Yeah. If you look at the MAC addresses, sometimes you can tell that there's a router in between. So let's check. I think they're anonymized, so you don't know. Yeah, it's TraceWrangler. If you see a MAC address starting with f2, that is TraceWrangler, because I'm setting the locally administered bit so that you know this is not a real MAC address. So if you see f2 in a MAC address, it's usually TraceWrangler. And the other way to find out if it has been anonymized is quite simply: go to Statistics, Capture File Properties, and it will tell you it was sanitized by TraceWrangler. That's the first thing I usually check when somebody sends me a trace: did they sanitize it? Just by looking at this. Well, if the file name ends in _anon, I know it has been through TraceWrangler, because that is the standard file name suffix that it gives it.

All right, so looking at the handshake we can't tell where it was captured, but I know where it was. So let's take a look at this. The problem was that the user complained about Citrix suddenly completely freezing his screen, not being able to do anything anymore, and not being able to work. It was in a bank; they should be able to work. He wasn't. And the interesting thing was that I was sitting next to him when his screen froze, and it froze for about, I don't know, like 60 seconds. 60 seconds is like an eternity, forever, in a network. So just to start things lightly, let's see if we can find the 60 seconds somewhere. So, no scrolling, as Laura said; ignore that I'm scrolling. And if you're experienced you will be doing this a lot, like: I don't care, I don't care, and I keep going, I don't care, stuff like this. Oh, red stuff. Oh, red stuff: window full, zero window, window full. Everyone knows those? Is that bad?
That's pretty bad? It depends. Okay, yeah. I'm leaning towards the zero window phenomenon being one of the things where the "it depends" share for not being bad is pretty small. A zero window is very often bad; like 90% or 95% of the time a zero window is bad. Sometimes there's a special case, like I presented in my first talk, where there was this old printer that had to get the gears going and everything. It was sending a zero window to stop the print job from arriving, because it had no attention for it at the time, or there was no cache in that printer, so it used the zero window to stop the transmission before it was ready to basically print the stuff. That was one of the situations where the zero window is kind of bad because it means the device is so old that it's slowing things down. So zero windows are very often a bad problem.

Okay, let's check out what's going on with this. I have one thing that I often do here when I have no idea what's going on, and that is tcp.analysis.flags, just to get some idea of what is going on. Window update, is that bad? No, good thing, it's not bad. A window update usually is not bad; it's just telling you, hey, I've got more room now, and that's good. It's a recovery process, right? All right, so let's see what we have here: window updates, zero window, zero window, zero window, more zero windows. It's not looking good, right? But it seems to recover at some point, and it's got more problems here, more problems. Okay, let's go down further. Oh, hmm, what's that? What's happening here? I can make it bigger for you because I think it's really small on the screen. We're missing some packets, yes. But the one thing where it always really gets into "found something" territory is when you see the zero window probe and the zero window probe ACK. Because a zero window can happen, and if it happens, what do you do then? What do you do if you see a zero window? Are you going to write a report like, hey, there's a zero window and I solved your problem? Wait for a window update, yeah. But the window update is only
half of the thing that you're looking for. What I'm looking for is the pair of the zero window and the window update. What is the interesting thing here? The time. How long did it take? Because I've seen zero windows where the window update arrived like five microseconds later. So it was zero window, and then there was a window update telling me, hey, 16k. So it's always a question of how much time it took, so how much the transmission was slowed down by this zero window process. And if it's really small and not really having an impact on the throughput that you're seeing, write it down as: hey, I saw this, I think the hardware is on the edge, maybe you should do something to get something faster. But if it's not costing you much time, you will have a hard time telling anybody, hey, this system, you need to replace it, because it's not happening.

But if you see a zero window probe, when does that happen? The application on the receiving end doesn't read the buffer, so it's quiet and doing nothing, and then the sender gets nervous at some point and says, hey, um, I've still got more stuff for you, are you going to send me a window update at some point, please, maybe? And if it takes ages and ages, like hundreds of milliseconds, which is ages in a network, at some point the sender will be like, poke, poke, poke, tell me something, what's going on? And that's the window probe. Then you either get a window probe ACK back telling you, hey, I'm still at zero, or you get a window update. A window update would be a resolution, so it would work again. But if you get a window probe ACK, that means: yep, still busy, don't talk to me, don't have time for you, I'm slow. Okay, and we see that here. So we have quite a few window probe, probe ACK, probe, probe ACK. So let's do a conversation filter. I think there's only one conversation in this trace, but I want to see everything now. So we see there's stuff working, and then suddenly it says window full, zero window. Window full means the sender has sent as many bytes
as the receiver said it could handle. Now this is zero window. The server is sending the probes; the client is the one that is overloaded. Yeah, window full is coming. Window full is something that Wireshark tells us for the data arriving from the server at the client. The server is the 10.10.10.whatever. It's Hansang's scheme: his clients are always 192.168.something and the servers are 10.something, and I adopted that because we're presenting a lot together and I don't want to confuse people. Because I could do a funny thing, naming my servers 192.168.something, and everyone's like, what is going on? So 192.168 is usually the client in those, and in most traces.

So what's happening here is that the server is sending stuff, and at some point Wireshark says TCP window full. By the way, we have many questions on ask.wireshark.org where people are like, hey, why does the server send a window full message? It doesn't. If you see square brackets, it's a Wireshark notification kind of thing. So the server gets a bit annoyed and sends a zero window probe: what's going on? And you can see that there will be zero window probe ACKs with TCP zero window, so the client keeps saying bad, bad, bad, bad, bad.

So back to the problem of timing. What do we do now? What we do is we set a time reference, let's say at the time where the window full is given. You can see a REF for the relative time in the front. And the question is: when does the window update arrive? 18 seconds. Yeah, that's not good, right? I mean, waiting 18 seconds for something to unfreeze will be something where the user is grabbing the phone: hello, IT? Did you turn it off and on again?

So yeah, zero window means my buffer as a receiver is completely filled to the last byte of data. The window full message in Wireshark is calculated by looking at the bytes in flight. But yeah, zero window is when there's a zero in the window size value in the TCP header. If it's less than the MSS, that was called by Sniffer the
silly window: it can still send, but not a full segment. If you see something like this, I don't think that there is an expert symptom for that in Wireshark. No, that's not right: silly window basically means it's highly inefficient. It is something you need to find out as an analyst yourself, and what I've done, I think, in my color profiles, is I set a color filter that tells me, hey, here's some window size less than the MSS. And I assume the MSS to be 1460 for practicality reasons, because a dynamic thing determined from the handshake, I don't need that kind of thing. But if the window drops to one segment or less than one segment, it's always like, uh-oh.

All right, so 18 seconds. This is a problem, but it's not the 60 seconds. At the time, basically, I sat at the client with a stopwatch when that happened, and I know it was taking longer than that. So let's see if we can find something else. There's more zero window probe, probe ACK, window update stuff. I could sort now by delta time and find it real quick. I didn't listen to you; I should listen to Betty. Yeah, let's sort by delta time again, so I get the high numbers up there. Okay, that is port number reused after 202 seconds. Can we ignore that? Is it important? Who wants to ignore port number reused after 202 seconds? Why? Because the TIME_WAIT timeout is two minutes, which is 120 seconds. Doing something after 120 seconds like a port reuse, we don't care. All right, so it doesn't matter.

All right: zero window probe at 28 seconds. That sounds really nice, because that means somewhere there is a window probe being sent after a long time, so there must be a zero window condition somewhere that we don't see right now. What we can do now is just click on the packet, so that the focus is on the packet, and then sort by number, and now I'm here in the middle of the problem. And if I do the same stuff that I did before, setting a time reference, you will see that I have a window update time of exactly 60
seconds from problem to it works again. 60 seconds, and that matches quite closely the timing that I had on my stopwatch. So I put in my report: there's a problem. What else did I put in my report? Who's causing the problem? The client. What is my proposal, or what did I propose to solve the situation? Because if you turn in a network analysis report where you say, hey, we saw a problem, it looked like this, thank you, here's the invoice, they will be like, what are we supposed to do? So what is your recommendation, what should they do? Add memory to the client; maybe it's a memory bottleneck. What else? CPU maybe not good enough. Disk space. Basically we're talking about a hardware bottleneck somewhere on the client. It could be CPU, it could be memory, it could be disk I/O, it could be anything. Here it was a thin client called IGEL, not the bird (eagle): the German word Igel means hedgehog, and that's the company that built those. And what the user was doing at the time, and that was a funny thing, because usually users don't do that while you're doing a network capture: he was watching a YouTube video on this. And if you want to kill Citrix performance, watch a video, because so many updates need to come in and it needs to paint stuff, and it's really stressing it out. I think Citrix is mostly made for word processing, where you type one character and another character and another character, with little updates on the screen. But if you're watching videos, and a lot of people apparently do this on thin clients, mostly because they don't get desktops or laptops anymore, well, this can happen. And then, if you don't want to prevent them from doing that, if you're allowing private use of the company resources, which most German companies for example do, then you probably need to upgrade your hardware. And I told them: this thin client is not fast enough to do whatever the user is doing. It's not my problem if he's doing something he shouldn't be doing. I'm only saying: upgrade the hardware, find out what the
bottleneck is, and that is something that I as a network analyst am not responsible for. Find out what the bottleneck is, do something, but it's a hardware problem. I can see it on the network, but it's not a network problem. Very big red letters: not a network problem. You see the zero window, the probe, everything is visible on the network, but it's not a network problem. They came in and hired me to tell them that the users are having real problems, and that they are real. That they're caused by the users I omitted in my report. I just told them that for whatever he's doing or she's doing, the device is not fast enough. And if the company then wants to ask the user, "What were you doing? You were supposed to do SAP," and he's like, "Oh, I watched the tutorial video on how to use SAP," all right. Yeah, probably. There was one company in the US once where they had, I think, 300 employees from Vietnam, because they were not that expensive, but they kept watching their home TV over the internet link, and that was a T1, and that didn't work. So, funny things: you had retransmissions like crazy because the pipe was completely full all the time, and you saw outgoing retransmissions leaving the company like crazy because of the clogged pipe. So it happens. Video is the killer of everything if you're not careful. All right. I think I wrote in the session description that I will put the traces somewhere, and I totally forgot to do that, so apologies. I will put them somewhere afterwards, probably at the location mentioned in the agenda, so that you can find them, and I will name them accordingly so that you can get them. Sorry. I mean, everybody after SharkFest, if you ask anybody, they will tell you, "I don't know what happened, but somebody dropped 16 tons on me," like the Monty Python sketch. We were all completely overloaded with whatever, so there was almost no preparation time for anything, especially not this kind of thing, and
then Hansang couldn't come and I had to take over his slots, so my bad, basically. OK, next one: dup ACKs. What's a dup ACK? I mean, this is an expert talk, or an advanced talk, so you should know what dup ACK means. What does it mean? Who's trying the "it depends" thing? Because it depends. Maybe packet loss. Very good, he avoided the "it depends" thing by saying it may be packet loss. Yeah, it's one of the "it depends" things. The receiver is basically saying, "I got something, but it's not what I wanted." So when does it happen the most? When there's packet loss, because they're sending packets and there's a gap, and then the receiver says, "Wait a minute, I expected something else. Dup ACK: send me something starting here, not there." OK, what's the other part of "it depends"? What else can it be if you see dup ACKs in a trace file? And that's my favorite thing, because it happens to me, I guess, 50 percent of the time now. Out-of-order packets? Yeah, maybe, but that's not the main problem. I had one colleague a couple of weeks ago, in those crazy times when I was doing five jobs at the same time, and he said, "We're having trouble with the WebEx application. I got a packet capture and it has 33 percent dup ACKs." 33 percent? And I'm like, "No, you don't." "Yeah, I do, here's my trace file." And I said, "OK, let me take a look at it." So what happened? 33 percent dup ACKs, what does that mean? Duplicate packets, yeah. People are capturing, and coming back to capture quality, it's a big issue, because of people with what in Germany we call "gesundes Halbwissen", healthy half-knowledge. They're like, "Oh, this thing is black and red in Wireshark, I got a problem. Let me see how many problems I have: 33 percent problems. Oh, we have a problem." And I was like, "OK, can I have the trace file?" So I got the trace file, I ran editcap -d, -d for deduplication, ran the file through it, gave it back to him, like, "You don't have a problem." He was looking at it like, "What did you do? This looks fine." I'm like,
yeah, it looks fine. You just captured in a way that you shouldn't have, or you can capture in that way if you know what you're doing and deduplicate it before telling me it's a big problem. Because in the end, the analysis report must be able to be handed to Sake or Christian or Hansang, and he must come to the same conclusion that I do. If not, I'm in trouble. I've done that: I've handed in reports where I wasn't sure if I was completely right. And if you have somebody on the other end receiving the report who's checking what you're doing, and he's maybe better than you or has at least the same level, you should be sure about what you're saying. So reports are very important in the sense that they need to be precise and correct. You may make an error in there, which is what I told you at the SharkBytes. I mean, the first report I ever turned in was totally correct, but getting there was a totally incorrect way. I said, "This is the problem," and it was that problem, but my way of finding it was completely bogus. It was sheer luck that it was that same thing. So it can happen, but we need to try to be precise here. So let's look at this thing. "Dup ACKs", that's what the trace is called, so let's try to find some dup ACKs, because otherwise my time is running away. tcp.
analysis starts. It's too small, OK. So this will take a while. OK, do we have dup ACKs? Yes, we do. How many of them? If you want to have a quick look: 4192 out of almost a million packets. So is that bad? Is that a bad ratio? Once again it's time for the two famous words: "it depends". Because, yeah, maybe, maybe not. If you have duplicate ACKs every couple of packets but it's not costing you any kind of time, or it's only just an out-of-order arrival, we don't care. It depends on how close together they are. So looking at those, you can see in the zoomed screen up there already how high they get. The highest number we can see right now is 62, right? What does that mean, 62? Dup ACK number 62: I've seen the same acknowledgement for the same packet 62 times. So how many packets did arrive for me to send you a dup ACK 62 times? It depends on how many ACKs you send per packet, or how many packets you ACK in one single go. Usually you ACK one per two packets, so it's at least twice that. Yeah, the ACK frequency changes with packet loss. It depends on the stack, how it's implemented, so it's quite possible that it ACKs every packet after packet loss. You need to know which stack you're looking at, but for simplicity I always say it's every other packet. And if you are looking at something where you think it's every packet, you need to go and look at the sequence and ACK numbers and check what is happening, and those are the things that take most of the time when we're doing that. Yeah, if you have SACK, then you have a good indicator that helps you see what is going on. So let's check: is 62 the highest number that I have here? I don't know. 90, 100, 200, 300, 400. Nice number: 1111 duplicate ACKs, and everybody should be like, "Oh, that is a lot." It's the second highest number I've seen, and the highest number is also in this trace: it goes up to 1129. So this happens at least two times. What
is happening here? What is the round-trip time? Very good. So if you have no idea what's going on, the first thing is to get your bearings, find out what's happening here, and you need to have the round-trip time for that. That is the very first thing that you need. There are two ways of finding that one out. The first way is to look at the handshake and check. The second way is using a recent Wireshark version, where the core developers have finally listened to me and put something in Wireshark that will let you know what the round-trip time is. Let's try that, I hope it works: go into the TCP layer and look at the iRTT field that is in there. That is really useful, because like Betty, I don't want to scroll all the way up when I'm somewhere where it's interesting, just to find out what the round-trip time is because I forgot to do that at the beginning of the trace. Now the field is in there, and this field should be a little bit golden, because it's kind of magic: you can do some cool advanced filter tricks with it. For example, try filtering on, or extracting, all three handshake packets. That is a challenge that wasn't possible before we had iRTT and the window scaling thing, because the iRTT field will only be there if you have a full handshake. If you don't have a full handshake, there will be no iRTT field, so that is a good indicator of whether there is a complete handshake. Try to get it. I wrote a blog post about how to do that, so if you want to do it, read it. So our initial round-trip time is 22 milliseconds. Good? Bad? Something not outrageous. Well, let's say this communication starts in a city in Germany and the other side was in a city 400 kilometres south, so 22 milliseconds is quite good. I don't think you could get any better. And the problem description of the customer was: "We've got a connection that starts with a 10 gig line and ends at an LTE thing that has 150 megabits. The throughput we're seeing is 12 megabits, so it's less than a tenth
of what we are expecting." What is going on? Too large? What was too large? The window size. How big is the window size? 4 megabytes. Is that a big window size? Again, it depends, right? But where did you find the window size? Calculated, OK. I mean, a 4 megabyte window size is something that you can encounter quite often, depending on how big the pipe is and what the distance is: the famous bandwidth-delay product. But let's in this case go to the handshake and see what the handshake is doing. For that I need to clear my filter again, which takes forever. So here they start with a window size of 8K, scaling of four, so this one has a calculated window size here. Where is it? OK, I need to go to the right connection, because this is the one that starts. We will see that the window size is 46, calculated 5888, so it starts low. But this one, for the receiver, I think that's the receiver, let's see who's who: this is the client, and the client gets the data, I think. So the client is saying, if you're starting from here, packets 1, 2, 3, 4, 5, 6, 7, 8, and it's an ACK packet, it's already saying window size 4 megabytes. Now, is that common, or is that kind of like, hmm? It's a static value. Normally, what happens with the window size? It's increased over time. When the receiver feels like this is working, the throughput is fine, no packet loss, it increases the window size. It increases, it grows. This one here is static, 4 megabytes all the time. And if you see something like this, a static window size being that big at the beginning of the communication, somebody's not doing something right. So either the stack on that thing is kind of weird, but I know it's a Windows system, so the stack should be fine, or something is putting a huge window size into the packets. 4 megabytes, that is quite big, and it doesn't change, it keeps it at 4 megabytes. Now, there's nothing in between, just switches and a long distance line, maybe a router, but nothing that is firewalling,
load-balancing, none of the things that I typically go to first, just normal stuff. I will let you know what it was that we found out in the end. So the problem is that the client is basically saying, "Server, show me what you got, I can do a lot," and the server is like, "Oh, I'm sitting on a 10 gig pipe, let's do that," blasting packets out. So what happens if that is the case? Well, somebody's going to go into trouble mode at some point, because the connection was not 10 gig from end to end. So the funny thing in this case is, and I'm now using the find instead of the filter. Why am I using the find? Because filtering takes forever and I just want to find that packet, not remove everything else. The find dialog is one of the most important things if you want to do things fast in big traces. If you know what you're looking for, if you just want to jump to a specific packet, use the find, not the filter, because the filter will run through all the packets every time you clear it or set it. It's just too slow. And now you know why I'm not that happy with the find dialog anymore: it used to be small and in a small area, so my mouse movements were really fast. Now I need to move my mouse all over the place. So let's go and say tcp.analysis, and the other thing is, the find dialog used to help me by showing me what the keywords are that I can use. Now it doesn't. I need to open a bug report. Go on, where is it? That's the first duplicate ACK. So the interesting thing is, I see a duplicate ACK, duplicate ACK, duplicate ACK, and a fast retransmission. Is that the retransmission for the duplicate ACK? Can you tell? Ah, good, you can't tell, because FTP data is hiding what is going on with the sequence numbers. So there's a trick that I sometimes do if this is annoying me, because I don't need to be told that this is FTP data, I know it's FTP data. So what can you do about it? Well, simple: I go to Analyze, Enabled Protocols, I look for FTP
data. I could... what? Yeah, I know I'm scrolling, because Laura says don't scroll, and I'm a rebel, so. Now the problem is that if you're enabling or disabling protocols, it needs to reread the trace file, and it's quite big, so it takes some time. Big traces are cool for presentations, because you can stand there like, "Not my fault." Come on. All right, so I disabled the FTP data dissector, which means it cannot write anything into the Info column anymore, and now we can look at interesting things like the sequence numbers. The fast retransmission has a sequence number ending in 76889, and the ACK was for something ending in 76889. If you want to compare it completely, you can see what it is. Is that the retransmission for the duplicate ACK? Really? Again? Yes. The dup ACK says, "I need sequence number 161876889," that's what the dup ACK is saying, and this is the retransmission for that. So why are there duplicate ACKs that keep telling me "please send this" after the retransmission? Who was first? Somewhere in the front, who said something first? The packet is stuck in the buffer. So you've seen that before. Good, experience. He had experience, he had seen it before. So what happens is: I'm blasting out packets like crazy, ten gigabits, full throttle, into the next switch. At some point the switch is going onto a one gigabit line, which is a factor of ten to one. What's going to happen? It will buffer. How big is a switch buffer? Not big enough, in our experience. And you can figure this out if you're looking at bytes in flight and stuff like this: it's usually 100K per port, so it has 100K of bytes that it can hold. But if there's a 10 gig link with an FTP server who's like, "Hey, I've got a window size on the other end that is 4 megabytes," and throws everything at that one, the buffer is too small. No matter what you do, you can get the biggest switch from whoever, it will be too small. So what will happen? It
will drop packets. If packets are dropped, at the other end you will not receive the packets, so the client will say, "Hey, dup ACK, I'm missing something," and the server is like, "Oh, he's missing something": one ACK, one dup ACK, second dup ACK, third dup ACK, fast retransmission. Now the packet is sent into the switch, but what's the situation at the switch? The buffer is completely crowded with packets that have queued already. So it's like a big funnel, and the retransmission is sitting here and it's trying to get through, get through, get through, and it takes forever because there's so much stuff happening. It takes a really long time to finally get to the client, and while that is happening, everything in the buffer is still being pumped to the client. So, I think I had 1111 duplicate ACKs, so 2,000 packets were in front of the retransmission, which had to queue. It could not jump to the front of the line like, "Hey, I'm a retransmission, I need to get there faster." It doesn't work that way. That's something you should try at an airport if your flight is leaving: tell them "my flight is leaving" and they will allow you to do that. A retransmission can't do that. So that is the problem here: the window size is so big that the server basically throws everything at the client, and the devices in the middle get overwhelmed by the amount of packets. The first switch in the equation held some of the packets, then there was another one and another one and another one, and they were going down from 10 gig to 1 gig and finally down to 150 megabits, so there were a couple of buffers that got completely filled up. This is why it takes forever for the retransmission to arrive at the other end. And now you can tell me where the trace was taken: at the client or at the server? At the server. Why? Because it sees a fast retransmission right after the three duplicate ACKs, or four, whatever. Yeah, we are on the server side, we are really close to where the sending is happening, which is why we see
the retransmission coming out just fine, just in time, everything's great, but it arrives on the other end ages later. We can see how long it takes to get there if we're looking at where the packet is lost, let's say at this location, and there we set a time reference, and then we go down until the dup ACKs are finished, somewhere here. This is the ACK that basically tells us, "Now I got everything, everything is fine, the retransmission arrived," because there's no more call for help. So how long did it take? 151 milliseconds, for a round-trip time of 20 milliseconds. That is not that good. It would be good if it were 20 milliseconds, because we're basically sending and the ACKs coming back are just telling us, "Hey, continue." So this is not good. Yeah, Simon? Does that happen at the beginning of the transaction? It's somewhere in the middle, because first the switches will all buffer stuff until this happens for the first time. In this case the server does not have any kind of doubt about being allowed to send four megabytes of window size. It just keeps blasting the stuff. The speed graph is quite interesting, let's look at the speed graph, because usually we use that speed graph to tell you that there's no problem, apparently. Yeah, slow start is something that happens too, but only at the beginning, apparently. Let's wait for the graph to paint, but you will see in the graph that there is no problem. Why? Because we're at the server side, and the server side is constantly blasting out packets full throttle, so it doesn't care about any lost packets, apparently. So here it is: it's going up, going up, going up, and the scale here is packets. I don't want packets, I want bits. I'm not sure, maybe I got the wrong side of things. I had a different graph. Echo reply? Yeah, the problem with ICMP echo reply is that it's not very reliable. Well, if you do a ping during the problem, you will see that the ping will not
get answered most of the time now, because it's just routed. OK, let's try that again. That's more like it. Come on, do it. All right, well, you can see here, I had a different graph before, I don't know what I did wrong. So this is basically ramping up all the things and then going down, but staying at a steady 120 megabits per second, I think, if I calculate that one correctly: 1.2 times 10 to the power of 8, I think it's 120 megabits. Anyone agree? I'm not sure, I'm not good at calculus. So the funny thing here is that you see there's a huge blast and then it keeps sending data at 120 megabits per second, which is the target speed they wanted on the other end. But it doesn't work that way, because every once in a while a packet gets lost and then you have to send the retransmission, and you can't continue on the other end, so everything gets stretched. But on the server side it's just blasting packets like crazy. So if you were seeing the graph on the other end, which I don't have, that's the problem: sometimes you just have the capture on one end but not the other, and then you need to think about what is happening down the line, what the picture may be on the other end. Sometimes you can only guess. But here the server is just sending packets, packets, and yes, it slowed down a bit, but it kept sending like crazy. The stream graph, yeah. What's the name of, wait a second, I'll just start the graph and then I try. Stevens, tcptrace, whatever you want. Now let's do Stevens. OK, you need to come closer because I'm not hearing well. Yeah, if it uses selective acknowledgements, I have to check, I don't know. This was a case from a while ago, like years ago, so I'm not sure anymore. So where's the graph? One second. I think I got the wrong side again, switch direction. This looks interesting. I think I never looked at the graph before, or not in the new Wireshark at least. Looks interesting, right? Because, well, sequence numbers are all over the place. Sometimes they
are so close to each other that they appear in two locations at once, so we have some packet loss here, I think. All right, the other thing: where's the next dup ACK? One second. There it is, dup ACK, and it says, yeah, there are SACK edges in there. All right, so I'll put the trace online-ish, and you can work through it if you want to and find things out. In the end it turned out that the window size was the problem, because the client advertised a ridiculous amount of buffer space, which overloaded the switches in the middle. The reason for that was the driver of the LTE card: it had a hard-coded window size in the driver, and it overrode whatever the system said. So the LTE card was like, "Hey, window size something? I don't care: four megabytes." So the card driver was at fault, and as soon as they changed it and made it listen to what the operating system was doing, the problem went away. So sometimes it's the driver that does that kind of thing. I think I had more traces than those two or three, but I will do a quick one in the end, this one. Let's just make the columns a bit bigger. This is an interesting case. What is strange here? There's a reset. OK, resets happen, but what else? The same guy who told us, "Hey, shut up," is out there saying, "Yeah, I want to talk to you too." Like, what? Do you have any idea of what could happen here? A problem with the communication? Sort of, maybe. Actually, I don't have a definitive solution to this thing, but that is a possibility, yes. So we are at the client, so we only know what the client sees. We don't know what arrived at the server. The only thing we know for sure is that the SYN arrived, right? For the SYN we got a SYN/ACK, so it must have been there. Everything else is unclear. Is it a retransmission coming from the server, sending us a SYN/ACK again after sending a reset? Yes, it is a possibility. But why would it send the reset then? Look at the timings. I mean, it's not like it couldn't
have known because the packet was going one way around the world and the other thing the other way around the world. This is just mental, it's like, what? So, any other ideas? Tom? Another device on the network? Well, basically his network is like himself, a DSL router, and everything out there, and he's sure, this guy, that it wasn't the router. So I'm not entirely sure what's happening, but this was a case on ask.wireshark.org, and most of us said somebody is spoofing reset packets: somebody's injecting reset packets and trying to keep you from visiting this page. And we think that the SYN/ACKs are the original packets from the server, which thinks it needs to continue something, and the resets are inserted by whatever kind of device. But since we don't have a capture on the server side, we're not sure. Sometimes you can tell a forged reset because of the TTL being different from the original server's. In this case it's identical, so if it's a forged reset, somebody doing something bad, he's quite good at hiding his location. So these are the traces where at some point you have to say, "I need more info, I need to capture this again," which is kind of problematic, because this is a thing of the past. Obviously I can't capture the server side later. I need to redo this and capture at the server, but if the server is not mine or under my control, I'm probably not going to be able to. So sometimes it's frustrating because you don't have the other side, but you need to get as much as you can from the packet capture and deduce as much as you can, and sometimes you're just like, "I think this is a forged reset, but we don't know for sure." Tom? Yeah, the TTL, I think we can take a look at it, but as far as I remember it's the same. So if you have the SYN/ACK here, the TTL is 60, and the resets also have a time to live of 60. The IP ID here is zero, which is kind of funny, and here it's not zero, and here it's zero again. So yes, in fact, that was one of the things where we said, well, it looks
like the server is an OpenBSD, which only sets a non-zero IP ID if there's a fragment. The OpenBSD people are very paranoid, so they always send you an IP ID of zero unless it's really needed, and it's only really needed if there are fragments. So this looks like the resets are not from the same system, clearly, based on the IP ID, but we don't know for sure. So it's possible, but maybe it's not. All right, I think I'm over time already, like I always manage somehow. So many interesting traces to look at. I still have two, but I don't think we have time to go through them. So thank you for being here, I hope you had fun. I will put the traces on my download page, like mentioned in the agenda. It will take a couple of, I don't know, days, weeks, months. Well, Christmas is quite close, and Christmas is usually the time when I have time to do something like this, so there could be Christmas presents in the form of pcap files. Thank you.
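The numbers quoted for the duplicate ACK trace (22 ms initial round-trip time, a static 4 megabyte window, a 150 megabit LTE link, roughly 100K of buffer per switch port, 1111 duplicate ACKs, and a graph reading of 1.2 times 10 to the power of 8) can be cross-checked with a little arithmetic. The following is only a rough sketch in Python, not something shown in the talk. It assumes a segment size of 1460 bytes (never stated by the speaker), reads "4 megabytes" as 4 MiB and "100K" as 100,000 bytes, and uses the speaker's simplification of one ACK per two packets. All function names are made up for illustration.

```python
"""Back-of-the-envelope numbers for the duplicate ACK trace (illustrative only)."""

MSS_BYTES = 1460                     # assumption: typical Ethernet MSS, not stated in the talk
SEGMENTS_PER_ACK = 2                 # the speaker's simplification: one ACK per two packets
WINDOW_BYTES = 4 * 1024 * 1024       # assumption: "4 megabytes" read as 4 MiB
PORT_BUFFER_BYTES = 100_000          # assumption: "100K per port" read as 100,000 bytes
LTE_RATE_BPS = 150_000_000           # slowest hop named in the talk
IRTT_MS = 22                         # initial round-trip time read from the iRTT field


def bdp_bytes(rate_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product: bytes that fit 'in the pipe' during one round trip."""
    return rate_bps * rtt_ms / 1000 / 8


def segments_behind_dup_acks(dup_acks: int, segments_per_ack: int = SEGMENTS_PER_ACK) -> int:
    """Lower bound on segments that arrived while the same ACK was repeated."""
    return dup_acks * segments_per_ack


def drain_time_ms(queued_bytes: int, rate_bps: int) -> float:
    """Time the slowest link needs to transmit everything queued ahead of a packet."""
    return queued_bytes * 8 * 1000 / rate_bps


def to_mbit_per_s(bits_per_s: float) -> float:
    """Convert a bits-per-second graph reading to megabits per second."""
    return bits_per_s / 1e6


if __name__ == "__main__":
    bdp = bdp_bytes(LTE_RATE_BPS, IRTT_MS)
    print(f"BDP of the LTE hop:        {bdp:,.0f} bytes")
    print(f"Advertised window / BDP:   {WINDOW_BYTES / bdp:.1f}x")
    print(f"Advertised window / buffer: {WINDOW_BYTES / PORT_BUFFER_BYTES:.0f}x")

    queued = segments_behind_dup_acks(1111)
    print(f"Segments ahead of retransmission: at least {queued}")
    print(f"Drain time at LTE rate:    {drain_time_ms(queued * MSS_BYTES, LTE_RATE_BPS):.0f} ms")

    print(f"Graph reading 1.2e8 bit/s: {to_mbit_per_s(1.2e8):.0f} Mbit/s")
```

Under these assumptions the advertised window is about ten times what the slowest hop can hold in flight and about forty times the per-port buffer figure, which fits the speaker's explanation that the switches in the middle had to drop packets. The drain time is only an order-of-magnitude estimate, since the real segment size and ACK ratio in the trace are unknown.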
Info
Channel: SharkFest Wireshark Developer and User Conference
Views: 1,912
Keywords: SharkFest, Wireshark, Network Analysis
Id: Tz6IfyfodKo
Length: 77min 16sec (4636 seconds)
Published: Wed Nov 15 2017