SF18EU - 25 Using Wireshark to Solve Real Problems for Real People (Kary Rogers)

So I have three case studies. What we'll do is this: I'll describe the problem very briefly, and then we'll just jump into the pcaps. I've got three of those, and afterwards we'll talk about some takeaways, which are just a couple of my own thoughts. Really, though, it's about you: whatever your takeaways are as we go through these case studies, write them down, tattoo them, whatever works for you.
My name is Kary Rogers. I'm the director of staff engineering in technical support at Riverbed. I came up through TAC, answering phones and helping customers solve SteelHead optimization issues: maybe it's not as fast as it should be, maybe it's broken, why is that? I found that the quickest way to solve a customer's problem was looking at the packets. Some of my colleagues were a bit slower, and I found it was because they weren't as good at looking at packets. So the better I got at it, the faster I got at the analysis, and the quicker I could solve the customer's problem.
Also, if you remember Gerald talking about programmers being lazy: I know just enough to be dangerous. When I first started working at Riverbed, we had some proprietary protocols, and they taught us to read the offsets by hand: that's the address, this is the port. I said no, I'm not doing that, and I wrote a dissector for that protocol. I convinced my boss back in 2010 or 2011 to let me come to SharkFest. I kind of walked in thinking I knew what I was talking about, I'm the packet guy, and I was blown away. Seeing the people you see present, Jasper and Hansang and Christian and all of them, I've learned so much from them. So you're going to take a little bit from me, which is partly them, and you're going to build your own process. That's what we're here to do.
If you have any questions, just shout them out, because I'll mostly be looking at the screen and not at you. No offense. So just shout out if you have any questions along the way. Anything else I'm forgetting to say? I don't think so.
Oh yeah: three case studies. One I did in the summer at SharkFest US, but the other two are brand new. No one has seen them except for one person here who gave one to me, so I haven't done those two in public. Bear with me if it's a little rough, but I think you'll find them interesting, and it'll be fun.
So, the Packet A-Team is what I call it. I have a little website called packetbomb.com. I'm now the director of staff engineering (I think I mentioned that), so I'm a manager now, which means I've lost all my technical knowledge, right? It automatically goes away the day you get the job. But I've tried really hard to hang on to packets, because I really enjoy Wireshark and packet analysis. So before it all slowly drained away, I thought I could put some of it on the Internet. People find packetbomb.com and email me, and sometimes I find them if they post about problems on Reddit or something, and they give me a pcap and let me see if I can help.
If you're familiar with the 1980s American television show The A-Team: "If you have a problem, if no one else can help, and if you can find them, maybe you can hire the A-Team." So that's kind of the thing. How do I start this? Thousands of people turn to the Internet for help because they have a burning computer and network problem that they cannot solve. If you have a problem, if no one else can help, and if you can find them, maybe you can hire the Packet A-Team. All right, we're all the Packet A-Team now.
Before we get started, let's talk about the things I repeat in almost every case I do, things I think are important. Again, take away your own process; these are just the things that matter to me.
When I walk around Riverbed support and see someone looking at packets with the default layout, I have to stop, take a deep breath, not slap them upside the head, and say, let's help you out here. If you're going to do only one thing, add a delta column. It's very important when you're looking for problems, especially performance issues.
Let's go to Wireshark real quick. Can you guys see that? By the way, if you're in the back and having trouble seeing (sometimes I get that feedback), let me know and I'll tell you to move closer, or I'll make it bigger, maybe both. Is that okay in the back? Good. I don't want to hear about it on the survey.
So, the delta column, right there. To add a column, all you have to do is find the field you want. A nifty thing about Wireshark: anything in square brackets is information that is not contained in the packet itself. It's something Wireshark calculates and presents for your benefit. In TCP, for example, you can see the calculated window size versus the window size that's in the packet. If you look at the bytes, the window size value is actually in there; the calculated window size is not. Another good one is the next sequence number: the sequence number is right there in the packet, but the next sequence number isn't, it's calculated for you. If you want to add something, just right-click on it and choose Apply as Column. So add a delta column; you really should.
What else have we got? Statistics. When I get a pcap from someone, especially strangers on the Internet, many times I don't really know what I'm getting. So what am I looking at? I go to Conversations and hope, if it's a throughput issue, that there's just one connection and not tons of them. This one has two connections between the same addresses, but one is small and one is large.
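Wireshark computes the delta column for you, but the idea behind it is simple enough to sketch. This is a minimal Python illustration, not Wireshark's code: given a list of capture timestamps (invented for the demo), it computes the time since the previous captured packet and flags gaps above a threshold, which is how a stall tends to jump out once you sort by delta. The function names and the 50 ms threshold are hypothetical choices.

```python
from typing import List, Tuple


def delta_times(timestamps: List[float]) -> List[float]:
    """Seconds since the previous captured packet; 0.0 for the first packet.

    This mirrors a delta-time column based on the previous captured frame.
    """
    if not timestamps:
        return []
    return [0.0] + [cur - prev for prev, cur in zip(timestamps, timestamps[1:])]


def find_gaps(timestamps: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (packet_number, delta) for packets that arrived more than
    `threshold` seconds after the previous one. Packet numbers are 1-based,
    like Wireshark's "No." column."""
    return [
        (number, delta)
        for number, delta in enumerate(delta_times(timestamps), start=1)
        if delta > threshold
    ]


if __name__ == "__main__":
    # Made-up capture timestamps (seconds): steady traffic, then a ~200 ms stall.
    capture = [0.000, 0.001, 0.002, 0.202, 0.203]
    for number, delta in find_gaps(capture, threshold=0.05):
        print(f"packet {number}: {delta * 1000:.1f} ms after the previous one")
```

Sorting the real delta column in Wireshark gets you the same answer without any code; the sketch only makes explicit what the column means.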
Having one small and one large connection makes it pretty easy to jump in and see what we've got.
The three-way handshake: this is going to come up all throughout, and it's very important. When you're capturing data to analyze a problem, get the three-way handshake. There is information in the three-way handshake that's only there, and if you miss it, it's gone forever into the night of the Internet; you will never see it again. So always get the three-way handshake if possible. I understand sometimes it may not be possible, but it's got good information, like the TCP options. From it you can determine the round-trip time, and you can figure out whether you captured on the client side or on the server side.
So for this one, which side do you think we captured on, client or server? I heard server side. Think about it: here's the client, and I want to talk to a server over there. Say I'm capturing on the client. The SYN goes out at time zero and travels to the server. The server gets the SYN at its own time zero and immediately sends a SYN-ACK back, so on the server side there's very little delay, if any, between the SYN and the SYN-ACK. The SYN-ACK has to travel back across to the client, which has been waiting this whole time, so on the client side there's a delay between the SYN and the SYN-ACK. The client finishes the three-way handshake with an ACK, which goes back to the server (you're earning your keep here, keeping up), so on the server side there's a delay between the SYN-ACK it sent and the ACK that comes back. So the round-trip delay shows up between the SYN and the SYN-ACK on the client side, or between the SYN-ACK and the ACK on the server side. Here we see just a few microseconds between the SYN and the SYN-ACK, and about nine milliseconds between the SYN-ACK and the ACK: that's our round-trip time, and it tells us we're on the server side.
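The client-or-server reasoning above can be written down as a rule of thumb. This is a hypothetical Python sketch, assuming you already have the capture timestamps of the SYN, SYN-ACK, and ACK for one connection; the `Handshake` class and the function names are illustrative, and the demo timestamps are invented to resemble the first case study (a few microseconds, then about 9 ms).

```python
from dataclasses import dataclass


@dataclass
class Handshake:
    """Capture timestamps, in seconds, of one TCP three-way handshake."""
    syn: float
    syn_ack: float
    ack: float


def capture_side(h: Handshake) -> str:
    """Guess which end of the connection the capture was taken on.

    Client-side capture: SYN -> SYN-ACK spans the network round trip, and the
    final ACK follows the SYN-ACK almost immediately.
    Server-side capture: the SYN-ACK follows the SYN almost immediately, and the
    round trip shows up between the SYN-ACK and the ACK.
    """
    syn_to_synack = h.syn_ack - h.syn
    synack_to_ack = h.ack - h.syn_ack
    return "client" if syn_to_synack > synack_to_ack else "server"


def handshake_rtt(h: Handshake) -> float:
    """Approximate the network round-trip time as the longer of the two gaps;
    the shorter gap is mostly the local host's turnaround time."""
    return max(h.syn_ack - h.syn, h.ack - h.syn_ack)


if __name__ == "__main__":
    hs = Handshake(syn=0.0, syn_ack=0.000004, ack=0.009)
    print(capture_side(hs), f"{handshake_rtt(hs) * 1000:.1f} ms")
```

The heuristic assumes the host turnaround is much smaller than the network round trip, which holds on most WAN paths but can get murky on a very fast LAN.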
The handshake also gives us the MSS, SACK, all the good stuff you want to know, and the window scale, which is very important: you don't know what the true window size is if you don't get the 3-way handshake.
When I was doing packet analysis every day, I'd have to look at TCP stuff, layer two stuff, pretty much every layer really, and different protocols like HTTP and SMB. So I had a profile for whatever I was troubleshooting. You can easily switch between profiles down here, and I recommend you create a profile for what you're doing, save it, and switch back and forth. Since I only do this occasionally now, when people email me, I tend to just go with my default and add and tweak as I go, but that's just me.
Okay, those are the things I generally talk about with each case study; I just wanted to get them out of the way. Let's get into the first one.
This was someone who had a sparkly brand-new one-gigabit connection from their data center to AWS, but they weren't getting the throughput they expected, and they wanted help figuring out why. As lots of people like to do, they used iperf to test. If I had a dollar for every time someone was chasing a problem they thought was wrong with their connection, when really they just weren't using iperf correctly and weren't setting window sizes... So I was thinking, you're probably just not configuring it properly, but let's have a look.
We already looked, but we'll do it again: I get my pcap, someone emailed it to me, and I want to see what we're dealing with. I look at my conversations and I have two: one is just 28 packets and a couple thousand bytes, and the other is 200 Meg. If you've looked at enough iperf, you can spot patterns in how it behaves. iperf 2 starts the connection and, I think, sends maybe 24 bytes back and forth while it's setting up the parameters of the connection, and then the data goes. iperf 3 has a sort of FTP-like behavior: one connection where it talks back and forth, and a second connection where it does the throughput.
So, the three-way handshake: we've already determined that we have around nine milliseconds of round-trip time. File that away. As you do your analysis, you're tucking information away and bringing it back when you need it. The title of this file is "receiving", so they're receiving data, but it's also the, quote, server side. We can see a 1460 MSS, timestamps, and window scale; on this side, a smaller window scale, and also a 1460 MSS. Do you notice any differences between the SYN and SYN-ACK options that are important? Yeah, the timestamp; the window scale is different. Does that matter? They're also in a different order. Does that matter? Don't know, probably not. Again, these are things you file away, things you notice that may come back and be important, or maybe not. The more you do this, the more you learn to say "huh", throw away the things that don't matter, and hold on to the things that do.
Then there's our second connection. A quick look shows it's pretty much the same, with the same round-trip time, and then the data starts. So let's take a leisurely stroll through the packets. Now, this is a lot of columns, right? This is what I use. There's a lot of data, and I find it important because I know what I'm looking at and I know where to look. If you're just starting out, maybe you don't need all of this on there. But I think it's really important to be able to do sequence number analysis, and people don't always learn how. I remember a guy posting some tcpdump output on a Reddit thread, and someone made a comment about the sequence numbers, and another person said, "I've been looking at tcpdump output for 15 years and I've never bothered with sequence numbers." What were you doing, then? I'm sorry you wasted 15 years of your life.
So I'd say keep it down to what you need, but check out the sequence number stuff. You see I have sequence number, next sequence number, ACK, window size, bytes in flight, which we'll talk about, and push bytes, which I don't know if I'll get to later, so I'll tell you now. Bytes in flight is how many bytes have been sent but not yet acknowledged at any given time. Since this is the receiving side, and it's ACKing as soon as the data comes in (packet, packet, ACK; packet, packet, ACK), the bytes in flight don't get very high. Push bytes is how many bytes have been sent since the last push flag, because the push flag can sometimes give you clues about the application behavior: how it's written, the buffer sizes in the application. I don't think that comes up in today's case studies, but in some of my previous ones it's been very important. I have a little blue coloring rule for push flags so they jump out. As you scroll through, you see a pattern: one ACK for every two packets, standard TCP behavior. So, good.
Now, when I'm looking at throughput, my favorite tool is under Statistics, TCP Stream Graphs: Stevens. Just kidding: tcptrace. Let's make it bigger. On the x-axis we have time in seconds, and on the y-axis we have sequence numbers. Sequence numbers are just the way we keep track of the bytes that have been sent: one sequence number means one byte of data, so really it's bytes. Bytes over time: that's throughput. We want to see it go up and to the right, which is what it's doing. It starts at time zero, time marches on, we send more data, and it goes up and to the right, pretty nice and smooth. Is this a good graph? I don't know. Probably not.
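Because the tcptrace graph plots sequence numbers (bytes) against time, its average slope is the average throughput. A minimal Python sketch of that arithmetic, assuming you have exported (time, sequence number, segment length) for the data packets in one direction, with relative sequence numbers that do not wrap; the function name and the demo numbers are hypothetical.

```python
from typing import List, Tuple


def average_throughput_bps(segments: List[Tuple[float, int, int]]) -> float:
    """Average throughput of one direction of a TCP stream, in bits per second.

    segments: (time_s, seq, length) for each data segment, using relative
    sequence numbers that do not wrap. The bytes covered by the sequence
    numbers, divided by the elapsed time and multiplied by 8, is the average
    slope of the tcptrace graph.
    """
    if len(segments) < 2:
        raise ValueError("need at least two data segments")
    start_time = min(t for t, _, _ in segments)
    end_time = max(t for t, _, _ in segments)
    lowest_seq = min(seq for _, seq, _ in segments)
    highest_end = max(seq + length for _, seq, length in segments)
    elapsed = end_time - start_time
    if elapsed <= 0:
        raise ValueError("segments must span a non-zero time interval")
    return (highest_end - lowest_seq) * 8 / elapsed


if __name__ == "__main__":
    # Hypothetical: 1000 segments of 1448 bytes, one every 100 microseconds.
    segs = [(i * 0.0001, 1 + i * 1448, 1448) for i in range(1000)]
    print(f"{average_throughput_bps(segs) / 1e6:.0f} Mbit/s")
```

Using the lowest and highest sequence numbers rather than summing segment lengths means retransmitted bytes are not double-counted, which keeps the number close to what the graph's slope shows.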
We're not getting the throughput we desire, though: you can look under the properties and see that we're getting 163 megabits on average out of our one-gig pipe. So let's zoom in and have a look at the beginning. Oh, and this is a stream graph, and a stream is one direction, so you have to click on a packet carrying data in the direction being sent to see anything. If you do it the other way, you don't get anything; if you bring up the graph and it looks like that, just switch the direction. Zoom in, and let's talk more about what we're seeing.
These little I-beam-shaped marks are the packets: the longer the I-beam, the more data in that packet. The green line below them is the acknowledgments. When it goes up to meet the data at the same level, that means all that data has been acknowledged. We get three more packets and the green line goes up, so those three packets are acknowledged; these two packets are acknowledged; and then nothing happens. Why is nothing happening? What is this TCP behavior at the beginning of a connection, where you see some data, then some more data, then some more? I heard someone say slow start. Slow start: we keep increasing the amount of data we're able to send in one go, the congestion window. The receive window is advertised in the packet; you can click on it and see that this guy says he can take 64K or whatever it is. But the congestion window, which is how much the sender thinks he can send based on current network conditions, is not advertised anywhere. It's not in the packet; you just have to intuit it. We can see that it's per round trip, around nine milliseconds: we send some data, we've sent all we can send, we wait for the acknowledgments to come in, and then we can send more.
The line at the top is the receive window. It shows how much space is available in the receiver's receive buffer, so the distance between the data and that line is how big the receive window is. There's plenty of space here, so we're all good, and we keep sending more and more data. Zoom back out a bit and it just keeps going like that: a little wobbly, up and to the right, real consistent data coming in, but not as fast as we want. We've got plenty of receive window room (the space between the data and the receive window line), so we're not filling the receive window. It seems like the sender isn't sending as much as it could. So why aren't we sending more data?
What we can do (let me close these from a previous thing) is have a look at the sending side, the iperf sender. First, a quick sanity check: look at our conversations. Same two connections over here, as we expect. But something does look a little different. Do you notice any differences from the previous capture file? The maximum segment size is different, right? It was 1460 in the other one. Is that a problem? I don't know, probably not, but again, you file it away. We know this is the side initiating the connection, and we can verify that because the round-trip time comes right after the SYN. So we're capturing on the initiating host, probably. He's sending 8961 as an MSS, but when it arrives it's 1460. That's something along the way adjusting the MSS, probably a router, to the correct size for the MTU. So that's probably fine, and 1460 is what we get back.
Now let's do our little scroll through. How does this look different from the pattern we saw before? Before, we saw data, data, ACK; data, data, ACK: the nice little TCP dance. Here we see several data packets going out and then the ACKs coming in later. So it's important when doing your analysis to recognize where you captured, client or server, because depending on that it's going to look different. If you're receiving data, you send an ACK as soon as it comes in: data, data, ACK; data, data, ACK. That's what it looks like from the receiver side. From the sending side, in slow start at the beginning, maybe I can send two packets, so I send those two and wait for the acknowledgment. Now I'm allowed to send four, and it's not going to fill up the receive window, so I send all four, and it takes a round-trip time for the ACKs to come back for those four packets. Now I'm going to send eight, so I shove all eight out the door at once and wait for the acknowledgments. As you send more and more data, the acknowledgments start rolling in, but from the sender side you see a bunch of packets go out at once before the acknowledgments start coming in. So again, think about the perspective of where you're capturing and what it should look like.
We scroll down, and down, until the dreaded red and black lines: TCP Window Full. We've talked about receive windows and congestion windows. I can only have as many bytes in flight as the receiver's receive window, its receive buffer, allows. In this case, here's the ACK from the receiver advertising its receive window, and look at the bytes in flight: we're sending, sending, sending, and then we hit that number. We've filled the window; we're not allowed to send any more data, so we have to stop and wait for the acknowledgment. The acknowledgment comes back, and now we can send two more.
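The sender-side bookkeeping behind the bytes-in-flight column and the TCP Window Full flag can be sketched roughly as follows. This is a simplified Python model, not Wireshark's implementation: it assumes relative sequence numbers without wraparound, ignores retransmissions and SACK, and assumes the window value carried with each ACK is already scaled. The event format and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

# ("data", seq, length) for a segment we sent,
# ("ack", ack_number, window_bytes) for an ACK we received.
Event = Tuple[str, int, int]


def analyze_sender(events: List[Event]) -> List[Tuple[int, int, bool]]:
    """Return (seq, bytes_in_flight, window_full) for every data segment sent."""
    highest_ack: Optional[int] = None  # left edge of the send window
    window: Optional[int] = None       # receive window from the newest ACK
    next_seq: Optional[int] = None     # one past the highest byte sent so far
    results = []
    for kind, number, size in events:
        if kind == "ack":
            # Ignore stale ACKs that arrive out of order.
            if highest_ack is None or number >= highest_ack:
                highest_ack = number
                window = size
        elif kind == "data":
            if highest_ack is None:
                highest_ack = number  # nothing ACKed yet: count from the first byte sent
            end = number + size
            next_seq = end if next_seq is None else max(next_seq, end)
            in_flight = next_seq - highest_ack
            full = window is not None and in_flight >= window
            results.append((number, in_flight, full))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return results


if __name__ == "__main__":
    # Hypothetical 4000-byte receive window and 1000-byte segments: fill the
    # window, get one ACK covering two segments, send two more, and fill it again.
    events = [("ack", 1, 4000)]
    events += [("data", 1 + i * 1000, 1000) for i in range(4)]
    events += [("ack", 2001, 4000), ("data", 4001, 1000), ("data", 5001, 1000)]
    for seq, in_flight, full in analyze_sender(events):
        print(seq, in_flight, "WINDOW FULL" if full else "")
```

The demo reproduces the pattern in the capture: bytes in flight climb to the receive window, an ACK for two segments frees room for exactly two more, and the window is full again.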
That's because the receiver is ACKing every two packets. So we send two more, and now we've filled the window again. Notice the bytes in flight went down, and now they're back up to the receive window.
Let's see what this looks like in our favorite graph, tcptrace, again. We have our slow start happening as we saw before, and then, pretty quickly (not after just one round trip, but pretty quickly), things change. Before, I told you that if the data is down here and there's space between it and the receive window line, that's the room you have. Well, if the data is touching the line, you've filled the window, and you can determine that just from this pretty picture. We hit that line, we have to stop and wait for some of the previous data to get acknowledged, and then we can send two more. We send the two and then we have to wait. Zoom back out: we send some more, we hit the receive window, and we wait, over and over. So from this perspective we are clearly receive-window bound, but we didn't see that on the receiver side.
You'll also notice that when we're sending the packets, especially down here, they all go down in one go, while on the other side it looked different: spread out with a butter knife, a little more even. Hmm. When you're sending, you depend on the ACKs coming back to tell you that you can send more data, so getting the ACKs back on time, like a clock, is important to keep the transfer going and the throughput up. We know the round-trip time is right at nine milliseconds, so how long is it taking to ACK the packets we send? We're looking from the sending side, and you can add a column (I already have it from before, so I'm just going to enable it): the RTT to ACK the segment. You can find it under the SEQ/ACK analysis section; right-click it and apply it as a column. I'll sort by this value and go to the bottom to see what we have. These are the ACKs coming back from port 1025, and we've got 60 milliseconds, 40; that's a lot higher than nine. There are a couple of longer ones from our side, which we can dive into if we want, but there are a lot of ACKs here that take longer than the round-trip time.
Another way to see this in a pretty picture is the round-trip time stream graph; switch the direction. We would expect these round-trip times to be clustered around nine milliseconds, and many are down there, but quite a lot are higher, some way higher, and there's a lot of variation. Clearly something is artificially increasing the round-trip time.
If we go back to the other capture and sort by the ACK RTT, remember that this is the side sending the ACKs, so those times should be really short. We have a few high ones, and some round-trip times from the other side, which is okay, but the ACKs we're sending are all in the microsecond range, which is what we'd expect. So it's not the host sending the ACKs that's delaying them. We can see what the high ones are by clicking on one: why did we take 15 milliseconds? (I think it's sorted backwards. There we go.) This is an ACK at the end of the connection; here's a FIN for that port. So I'm not too worried about a delay at the end of the connection, since it's not somewhere in the middle.
To wrap this up: there's something in the middle, because these captures were taken at the edges of the network.
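Wireshark's "RTT to ACK the segment" field (tcp.analysis.ack_rtt) measures how long a data segment waited for its acknowledgment. As a rough illustration, here is a Python sketch that matches each sent segment with the first later ACK covering it and flags the ones well above the handshake RTT. It is a simplification: Wireshark reports the value on the ACK packet and handles retransmissions and SACK, which this ignores. The data format, the function names, and the 2x threshold are assumptions.

```python
from typing import List, Tuple

Segment = Tuple[float, int, int]  # (time_s, seq, length) of a data segment we sent
Ack = Tuple[float, int]           # (time_s, ack_number) of an ACK we received


def ack_rtts(segments: List[Segment], acks: List[Ack]) -> List[Tuple[int, float]]:
    """For each data segment, the time until the first later ACK that covers it.

    An ACK covers a segment when its ack number is at least seq + length.
    Returns (seq, rtt_seconds); segments that are never acknowledged are skipped.
    """
    ordered_acks = sorted(acks)  # by arrival time
    results = []
    for sent_at, seq, length in segments:
        end = seq + length
        for acked_at, ack_no in ordered_acks:
            if acked_at >= sent_at and ack_no >= end:
                results.append((seq, acked_at - sent_at))
                break
    return results


def slow_acks(rtts: List[Tuple[int, float]], baseline_rtt: float,
              factor: float = 2.0) -> List[Tuple[int, float]]:
    """Segments whose ACK took more than `factor` times the baseline RTT."""
    return [(seq, rtt) for seq, rtt in rtts if rtt > baseline_rtt * factor]


if __name__ == "__main__":
    # Hypothetical: ~9 ms baseline RTT, but the third segment waits ~60 ms.
    segs = [(0.0, 1, 1000), (0.001, 1001, 1000), (0.002, 2001, 1000)]
    acks = [(0.009, 2001), (0.062, 3001)]
    for seq, rtt in slow_acks(ack_rtts(segs, acks), baseline_rtt=0.009):
        print(f"seq {seq}: ACKed after {rtt * 1000:.0f} ms")
```

Comparing against the handshake RTT gives a baseline taken before any queueing built up, which is what makes the inflated values stand out.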
They were taken on the two hosts running iperf, so something in the middle is delaying the packets. What kinds of things might introduce delays like this? What's that? Shaping, yeah: QoS or shaping, some big buffer. These packets are sitting in a buffer, a parking lot waiting to get on the freeway. The throughput is being spread out, as we see here: the packets aren't all together anymore, and the ACKs are being delayed, so that's slowing everything down.
That's the feedback I gave this person: you have a configuration somewhere, a buffer somewhere, that you need to find. If it's your network, you probably know the likely candidates and where to go look. If you're not sure, you may have to narrow it down, moving in or out depending on your perspective to figure out where it's happening. Or maybe you have to go to your provider and ask what they're doing; you probably know what you pay for and the service you're getting, so you can raise the issue with them. I went back to this guy, and they agreed it's definitely a buffer, a deep-buffer config, and then his group sidelined the project and left it. So, good stuff.
All right, let's go back here. Any questions on that? So: round-trip time, MSS, window size. Capturing the three-way handshake so you know what the window size is is critical. And again, think about the behavior you're seeing in the capture based on where it was captured: if you're receiving data, the ACKs go out immediately; on the other side, you're sending a bunch of data at once. Think about the perspective of what it should look like, and know a little bit about the application behavior, for example if it's iperf.
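One piece of application knowledge that matters for iperf is window sizing, which comes down to the bandwidth-delay product (BDP): the link rate times the round-trip time, expressed in bytes, is how much data must be in flight to keep the pipe full. A small Python sketch of the arithmetic, using the first case study's one-gigabit link and roughly 9 ms RTT; the 64 KiB window is a hypothetical comparison point, not a value from the capture.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most `window_bytes` can be in flight
    per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link_bps = 1e9   # the one-gigabit link from the first case study
    rtt = 0.009      # the ~9 ms round-trip time seen in the handshake
    print(f"BDP: {bdp_bytes(link_bps, rtt) / 1e6:.3f} MB")
    # Hypothetical 64 KiB window, for comparison.
    print(f"64 KiB window caps out near {window_limited_bps(65536, rtt) / 1e6:.0f} Mbit/s")
```

At 1 Gbit/s and 9 ms the BDP works out to about 1.1 MB, so a window much smaller than that limits throughput no matter how clean the path is.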
What do you expect to see from it? Two connections, one connection? And make sure you've set the window size (the -w option in iperf) appropriately for the BDP of the link, or you're never going to fill the pipe.
Okay, so we'll do another quick video that illustrates some TCP basics.
[Music]
All right, that's my now six-year-old daughter; she's a delight. Let's see how we're doing on time.
Next: an NFS hang. This one came to me from a buddy in support at a different company. He knew I did PacketBomb and said, hey, can you help me look at this? We've got an NFS hang, some kind of TCP issue. They had SUSE Linux Enterprise 12 on bare metal talking to a NetApp that's serving up NFS. They tested with other flavors of Linux and didn't see the issue, so that's a big clue, but we still need to understand what's going wrong. And they still have IP connectivity when it hangs: they tested with some pings, and the hosts are still talking; it's just this connection that has hung.
Let's have a look. Actually, I'm going to do this first. I'm in the directory with the files he gave me: the NetApp capture, taken on the NetApp I think, and the SLES12 pcap, captured on the server. You can see one is one and a half gig and one is right at a gig. Wireshark is not going to like that very much. It probably will explode, but even if it doesn't, it's going to take all day. There are a lot of different ways you can attack this problem. If you go back and look at the needle-in-the-haystack session Jasper has done in the past, in the SharkFest retrospective, that's probably the definitive session on dealing with large data sets using TraceWrangler and other tools. But quickly, what I like to do is take a big file, break it down into small pieces, have a look at one of the pieces, and figure out what else I can do from there.
Ideally you don't want a bunch of files, because you may have to go file to file to file to find the problem, and you can't see the whole picture. There are other ways to do this; this is just the way I've always done it, so don't laugh if you do it a different way and think this is dumb. I know a guy who, when he was looking through log files, would do more on the file, pipe to grep, pipe to more. I asked him why he did that, and he said, oh, that's just what I've always done. Well, it's dumb, stop. Anyway.
We're going to read this file, the client side, break it into 200 Meg pieces, -w, and just call it client.pcap. That's it. Okay, so now we have a bunch of files, 100 Meg each, and we can open one of them with Wireshark. Let's open the first one.
I also want to see if there's anything else I can do to make the data smaller. I'm looking at this and I go, okay, 8960: that's a number that jumps out at you for the TCP segment size. They're probably using jumbo frames. He told me this was in their lab, where I think they were reproducing the issue at high, multi-gigabit throughput, so jumbo frames make sense, and it's not a big deal. But look at what we've actually captured: 1500 bytes of each packet. Do I need all 1500 bytes for an NFS issue? Maybe, if I have to look at the application layer, but probably not; they said they think it's a TCP issue. So what we can do is trim these down. Look at this one: we've got Ethernet, IP, TCP. On the other side we've got Ethernet, VLAN, IP, TCP. How many bytes is each header? Ethernet is 14, VLAN is 4, IP is 20, TCP is 20, and for options let's just say 20. That's 78 bytes; sound good? To do this, you can use editcap with a snap length of 78.
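editcap's -s (snap length) option is the practical way to do this. To show what truncating to a snap length means at the file-format level, here is a stdlib-only Python sketch that adds up the header budget from the talk and cuts every packet in a classic libpcap file to that many bytes while keeping each packet's original wire length. It assumes the classic pcap format (pcapng files, which Wireshark often writes, are rejected), and the names `truncate_pcap` and `HEADER_BUDGET` are hypothetical.

```python
import struct

# Per-packet header budget from the NFS case (bytes).
HEADER_BUDGET = {"ethernet": 14, "vlan": 4, "ipv4": 20, "tcp": 20, "tcp_options": 20}
SNAPLEN = sum(HEADER_BUDGET.values())  # 78

_GLOBAL_HEADER = "IHHiIII"  # magic, major, minor, thiszone, sigfigs, snaplen, linktype
_RECORD_HEADER = "IIII"     # ts_sec, ts_usec (or ts_nsec), incl_len, orig_len


def _byte_order(magic: bytes) -> str:
    """Pick the struct byte-order prefix from a classic pcap magic number."""
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        return "<"  # little-endian file (microsecond or nanosecond timestamps)
    if magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        return ">"  # big-endian file
    raise ValueError("not a classic pcap file (pcapng is not handled here)")


def truncate_pcap(data: bytes, snaplen: int = SNAPLEN) -> bytes:
    """Return a copy of a classic pcap file with every packet cut to `snaplen` bytes.

    Each record keeps its original wire length (orig_len), so tools can still
    tell that the packets were longer than what was saved.
    """
    order = _byte_order(data[:4])
    fields = list(struct.unpack(order + _GLOBAL_HEADER, data[:24]))
    fields[5] = snaplen  # record the new snapshot length in the file header
    out = bytearray(struct.pack(order + _GLOBAL_HEADER, *fields))
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(
            order + _RECORD_HEADER, data[offset:offset + 16]
        )
        offset += 16
        packet = data[offset:offset + incl_len]
        offset += incl_len
        kept = packet[:snaplen]
        out += struct.pack(order + _RECORD_HEADER, ts_sec, ts_frac, len(kept), orig_len)
        out += kept
    return bytes(out)
```

This sketch reads the whole capture into memory, which is fine for a demo but not for a gigabyte file; that is one more reason to let editcap do the real work.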
We use the original file and call the output snap.pcap. That one is now 128 Meg, so we went from 915 Meg to 128, and it's all in one file; hopefully Wireshark won't choke on that. Here it is: a little slow, but manageable.
First things first: Conversations. We have a single TCP connection, good, on the NFS port, with about 12 gigabytes of data. Let's look at the three-way handshake. No, we don't have a handshake. That's unfortunate. I admonished my friend: how dare you give me a pcap with no three-way handshake! He did know the window scaling factor, so you can type that into Wireshark to set it manually if you want, but I'm not going to do that. So it's not great, but let's see what we've got. We know we're dealing with jumbo frames and NFS. Let's look at our graph.
Okay, that looks different: not quite up and to the right as we expect. Honestly, this was the first time I had ever seen a graph like this. If we zoom in, we can see data going up and to the right really quickly, because it's a LAN environment with gigabit throughput or more. But then it drops way down here and goes back up again, and it does that twice before we hit this dead spot, which is probably the hang. For a hang, we're looking for a big flat spot in our graph, because nothing is happening and no data is moving. We'll get to that, but what is happening in this first part? Does anyone want to take a guess? Yes: the sequence number is a 32-bit unsigned integer, so it can only count up to 2^32 bytes, and once it hits the highest value it can reach, it rolls over. Let's have a look. Give me a moment to zoom in to the bottom; we're almost there. You can click on a packet in this graph (I thought I mentioned that before) and it'll take you to it in Wireshark.
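Because the sequence number is a 32-bit counter, comparisons across a rollover have to be done modulo 2^32, and each full trip through the sequence space means 2^32 bytes were sent. A Python sketch of both ideas; the function names and the example sequence numbers are hypothetical, while 10.56 s is the wrap-to-wrap time measured later in this case study.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit unsigned integers


def seq_diff(a: int, b: int) -> int:
    """Signed distance from sequence number b to a, modulo 2**32.

    Positive means a comes after b, even if the counter wrapped in between
    (serial-number arithmetic in the style of RFC 1982).
    """
    d = (a - b) % SEQ_SPACE
    return d - SEQ_SPACE if d >= SEQ_SPACE // 2 else d


def seq_after(a: int, b: int) -> bool:
    """True if sequence number a is later in the byte stream than b."""
    return seq_diff(a, b) > 0


def throughput_from_wrap_bps(seconds_per_wrap: float) -> float:
    """Each full trip around the sequence space is 2**32 bytes of data."""
    return SEQ_SPACE * 8 / seconds_per_wrap


if __name__ == "__main__":
    before_wrap, after_wrap = 4_294_960_000, 43_000  # hypothetical values
    print(seq_after(after_wrap, before_wrap), seq_diff(after_wrap, before_wrap))
    print(f"{throughput_from_wrap_bps(10.56) / 1e9:.2f} Gbit/s")
```

A small number right after a huge one is therefore "later", not "earlier", and wrapping every 10.56 seconds works out to roughly 3.25 Gbit/s.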
we're there. So the sequence number of this one is 43,000 and change, and right before that it was a big ol' number. Well, no, that's not the right side, sorry. Here. Here, that's just an ACK. There we go, there's the last bit of data: we've reached the maximum with this bit. Also, we saw jumbo frames before, but what is this about? 49,000 bytes in one packet. Is that a problem? Is it broken? What does that mean? Yeah, offloading: segmentation offloading. This is a feature of the network card that is probably enabled by default. Instead of TCP, in the kernel, having to break the data up into MSS-sized chunks (which here is 8960 bytes) and hand them to the network card, it can say, look, here's a bunch of data, deal with it. The network card then breaks it up into MSS-sized segments and sends them off. However, when we capture on the host, we're capturing between TCP, which is the kernel, and the network card, so we can't see what actually goes out on the wire. We've hooked in between TCP and the network card, so we're seeing what TCP is giving to the network card. That's one reason why you want to capture on the wire if you can, with a tap or a SPAN or whatever, because then you actually see what's on the wire. What we have here is not really what's on the wire, and that can make analysis challenging. So we send this much data and the TCP sequence number rolls over. Is it a problem? Probably not; we're just sending a lot of data real fast. Yeah, let's do this: we're gonna try to calculate the throughput here by seeing how long it took. At the bottom, this thing is so big and it's so much data that it's slow zooming out and zooming in. Here we go. Okay, all right, you want this guy; that's close enough. So we go to this packet where it rolled over the first time, which is right here. Yes, right here it rolled over. We're gonna set a time reference for this packet, so make it time zero, right? So now everything after that is measured from zero, and we'll go to the top of the peak, way up here, almost
there. There. So to get to where it rolled over again takes ten seconds, 10.56 seconds. So we know we sent this much data, right, that's the maximum sequence number, and we divide... oh wait, no, that's bytes, so times eight for bits. That's how many bits, and we divide it by 10.56 seconds. That is bits per second; convert that to K, meg... so that's around three gigabits per second. It's moving pretty good, and then it dies a horrible death. So let's go to that, which is what we're here for. This is unwieldy. Okay, flat spot. You can see the little circles; it kind of jumps to the packets. Just take note of the distance between them: from here to here, to there, and to there. What do you notice about that pattern? Yeah, it doubles. What do you think that is? Yeah. Let's click on one. There we go. So you can see the delta column, right: 195 milliseconds, you know, 200 basically, then 400-ish, 800, 1.6 seconds. Yes, so we definitely are seeing typical back-off behavior for retransmissions. Now look at the sequence number: each one is a different segment. Is that typical TCP behavior, to back off and send a different segment each time, or would you send the same segment and keep backing off? I think it's the same segment, but honestly I've seen both behaviors, so I wouldn't call it wrong. In the next one we have the same sort of back-off, but it sends a different segment each time, so maybe that's just how it works. Is it a single timeout for the whole thing, or is it per segment? From this behavior it seems to be per segment, but I thought it was suspicious TCP behavior. So after all that, we would quickly load the server side... I don't see it; I didn't do that one. Yeah, we don't really need it. So I said, can you please get me a... we could dive in further here, but I really would like to see a three-way handshake to get a better idea of what we're working with. You know, I didn't spend a lot of time on
that, right? Like, can you please do that? And that one is this one. I think this one was taken on a SPAN, and it's atrocious. Look at the little column on the side: there's lots of stuff going on. He said they were having trouble getting clean captures from a SPAN, but this one at least had the three-way handshake, or multiple three-way handshakes. I guess it doesn't really matter which one; it's the same hosts. So sure enough, we verified that we're using jumbo frames. We have a window scale on this side, and the same thing on the other. But immediately I basically stopped here, because we're definitely dealing with packet loss. We didn't really dive into it, but we saw retransmissions. And there are certain things I want to see when we're dealing with packet loss. What's missing from our options? Selective acknowledgment. It's mostly on by default (I say mostly) in most things I look at, but it could be stripped out, or it could just not be turned on. Also, we're dealing with sequence numbers wrapping, right? What else plays into protection against wrapped sequence numbers? The timestamps, right. So I said, look, you know, I'm a busy guy, I don't have time to dig into your stuff here. Go back, turn these things back on (you shouldn't have turned them off), and try it again. And guess what: it fixed it. They had opened a case with the Linux vendor, and when asked why these were turned off, they just said, well, we don't know who did that, but that's how it was. They were recreating the customer's environment, where these were turned off, so they turned them off too. So there was a combination of things. Selective acknowledgments allow you to tell the sender, hey, I need packet X, but I have all these other ones. You'll see them in the TCP options field on the acknowledgments, and we will see it in the next case study. It's a big improvement in terms of performance when we have packet loss. So that, coupled with some sort of TCP
bug on the Linux side, which they acknowledged and filed, caused this sort of deadlock issue. But you can avoid it if you just leave the defaults on. Yes? How are we doing on time? Fifteen minutes, is that right? Crikey. Okay, let's hit these takeaways real quick. Don't disable the defaults unless you know what you're doing. I see people tweaking kernel parameters like TIME_WAIT reuse and recycle, and if you don't really know what you're doing, leave it alone. And certainly don't disable SACK unless you have a really good reason. Three-way handshake: very important; we've already covered that. And then the offloading piece. One thing I didn't show you, since I'm running out of time: on the receiving side, the packets it was receiving were showing up at like 20,000 bytes. I was like, whoa, that can't be right. But in this case we're sending data to the NetApp server, and the NetApp... actually I think I've got it, yeah. Large receive offload is the reverse of segmentation offload: data comes in to the NIC, and the NIC can assemble it into one big hulking segment and give that to TCP. So it's a performance thing, and again, capturing on the client or the server is not ideal. We've got one more; we'll try to do it. This one comes from an attendee somewhere. Are you... oh, no. Yeah, in the back. So: blank web page. We go to the website and it's blank, or it's really slow, essentially unusable. Looking into it, well, the provider did make some changes, but it's the web app's fault, it's the developer's fault, and here's why we think that's true. I mean, you guys have never been caught in a finger-pointing exercise, right? That doesn't happen. The network team says we don't see any, uh, you know, network issues, so it's this one's fault, it's their fault. Well, how about we look at the pcap and figure out whose fault it is? So we will do that, client side. When I'm looking at pcaps, ideally I get both sides, but I like to start closest to where the problem is seen. So if I'm the client, I see the problem with my
PC, you know, trying to do a thing, and I want to see what they're seeing from the packets' perspective. So this is the client side. Now, this was already narrowed down to a problematic connection, so thank you, thank you. I love it when I get a pcap and they're like, hey, we have this problem, and there's one TCP connection. They've already narrowed it down, and they just need help, you know, maybe to explain what's going on. So, three-way handshake, very good. We have MSS 1460, window scale, SACK. Okay, then we have window scale, SACK. What's missing there? MSS. What happens if you don't advertise an MSS? 536 is the default, from RFC 879; had to look that one up. Yeah, that's not ideal, right? Is it a problem? Is it gonna break it? It's just gonna suck. So we have a web application, and we have the three-way handshake. Let's get rid of this column; I don't think we need the push bytes. Let's clean it up a bit. Three-way handshake, and then we have like a 1.2-second delay before the client sends to the server. Is that a problem? Probably not. If it's a web application, we're waiting on the client to send a GET request, right, so the time before that happens is really on the user or the browser. So we're not really worried about that, even though 1.2 seconds is a really long time in network time. You scroll down and you see those 536-byte packets flying around, but it looks okay. And then I stop, because I have a coloring rule, and I go, ha! You can look at this coloring rule and see how it's defined: time delta displayed is greater than 190 milliseconds. That's some value I was using for a previous thing and I just left it, and it's fine; you can spot, you know, delayed ACKs and stuff. So let's look over here: we have 300-something milliseconds, okay, 900, 3 seconds, 12 seconds. That's not the typical back-off behavior, right? 48. But still, these values and how they are kind of sort of
related in their intervals, you know, as a network troubleshooter, that should raise a flag to you that it's probably not an accident. So we have these delays, but it doesn't look like we've lost anything, because we can do the sequence number analysis. Take this one; I'll just read the last four digits: 1284, with a next expected sequence number of 1820. The next one is 1820. Then the next one after that should be 2356, and that's what shows up. And it continues on in this nice little zigzag you can follow: the packets are showing up in order, and we're ACKing them immediately because we've waited so damn long for them to show up. Like, yes, please, give me more data. And then we do have a [TCP Previous segment not captured], and we can see that the next expected sequence number is 5572 and the next one that arrives is not 5572. If we do the math there, that's 536 bytes, so apparently we've dropped one. Okay, but then the delays kind of go away, right, and then the dup ACKs start. This is where we're talking about selective acknowledgment. It's saying, hey, I need 5572. I've already told you that, and you sent me something else. I'm telling you again, I need 5572. Okay, now you get this one, 1468: no, I need 5572. And if you go down to the options, you can see a selective acknowledgment block saying, hey, I need 5572, but I have these. As you go down, you can see that for every segment that's received out of order, we keep moving the right-hand edge of the SACK block; it keeps increasing for all the ones we've received. So all those packets we're receiving, we're acknowledging, but we're saying, hey, I still need 5572. And then down here, 5572 shows up. We have some more long delays, and then finally the server's like, I don't know what's going on with you, I give up, and we close the connection. So these, the time
here, we're talking 3 seconds, 12, 48, 60, 50, and then more at the bottom. If you click on a web page, what are you going to see during all this? Probably a blank page. So we have definitely captured what appears to be the issue. Good. Luckily for us, we can go check out the server side and see what it looks like over there. So here we have our three-way handshake. Let's just cut to the chase: this is the first one that was delayed, right? So we will copy that sequence number as a filter and go back over here. I don't want to filter and see just that one; I just want to find it real quick. So you can use Find, put it in there, and then you've found it. So there it is. Do we see a delay here? Well, there's, what, 1 millisecond. But look at the ones that come after it, right: this 2892, that was probably the next one. On the client, these had big delays across about six or seven packets, and we don't see any of that over here. So it doesn't seem to be the server that is causing these delays. We scroll down, and we see something's happening down here. So this is where we start... oh, yeah, that's the one we were looking at before, right, 2356. We have a retransmission of that one. So let's go back to the first time we sent it and set a time reference, so now that's time zero. Then go to where we retransmitted it: it's exactly 350 milliseconds. So we're just happily sending our 536-byte packets out the door over and over and over. Then we get to a certain point and go, wait a second, I haven't received an acknowledgment from way back then, I'm gonna stop. So we stop with this packet, and we wait 295 milliseconds for something to happen. No ACKs are coming back, so we give up and go, well, I've got to start retransmitting now. So I send a retransmission from back where the ACKs stopped. All this other data is already gone, it's already out the door; we can't get it back, we already sent it. But now we've
got to start retransmitting the ones that have not been ACKed. So we get an acknowledgment for it, right: the next sequence number is 2892, and we get an acknowledgment of 2892. So, okay, good, we got an ACK; I can send two more. So it sends two more in sequence, because we've been sending all these packets: we get 5052, so we pick right back up where we left off. We're allowed to send two packets because we got one ACK. And we wait, right, and there's the 800-something milliseconds. A packet from way back up there was never ACKed, so we send it, it gets through, it gets acknowledged, so we send a couple more. So there's something very interesting going on here. If we go back to the top, this is one of those times where you can actually look at IP. Each packet has a little identifier called the IP ID. So this one is 3506, right, this packet 3507, 3508; it's incrementing by one for each packet it's sending. This is across the whole IP stack, for every connection. So if they're sequential like this, then, you know, maybe the server's not too busy talking to other connections, and we can see these are being sent in order. And is that the one? Yeah, that's the one. So this 2356 is the one that got dropped, or didn't show up (all right, it showed up really late), and its IP ID is 3508. So if we go over here and we click on that one, let me close this and open this: that's not it, right? That's a different IP ID. So which packet is that? Let's go to the second one: here's the retransmission, 3744. Was that it? That was it. So the first one didn't make it, but the retransmission did. And look at the time that we sat here waiting. I'm the receiver, I'm getting data, I'm acknowledging data, and the data stops coming for 364 milliseconds. Remember, our timeout was 350 milliseconds; that's pretty close. And this retransmission gets through, and we're like, oh great, something showed up, I'll acknowledge it. And then we wait again. Meanwhile, on the sending side, it's been sending
data the whole time, right? And then it had to stop and go, wait a second, I'm not getting acknowledgments, but it's already sent a bunch of data. And this one, let's see, we can copy this one: this is the retransmission, and its IP ID is 31153. We click on it here: 31153. So again, the retransmission got through and the original did not. Now let's go back down here where things pick back up. All of a sudden we had this weird packet after these long delays, and then all the data starts coming through. This IP ID is 30606, and here it is up here, look: 30606. So here are the packets that were held up. The originals were actually lost and never showed up. Their retransmissions showed up later, and after that happened, the packets that were sent later all came in at once. So to recap the situation: packets were being sent, and they stopped showing up on the other side. The sender started timing out, and the retransmissions themselves got through. Then after a few of those (what, six or seven) the data started coming through. And you can do the IP ID analysis to see which are the original packets, which are retransmissions, and the order they were sent in. So that is very strange behavior. We know there was a change, so you lean on your provider, and they say, well, you know, we did install this inspection device that inspects layer 7. Probably the only times I've seen weird behavior like this, it was an IPS or something like that doing some kind of application-layer inspection. Things just stop flowing, then odd retransmissions get through, one will pop out, then they keep going, and then it stops again. You know, it needs all the packets to do its inspection of the HTTP information. So you have this data. You have both sides, you know what you sent and you know what you received, and it's all out of whack. You take this data, you take the analysis, get on a WebEx if you have to, and you take the
provider to school and say, go fix this. And sure enough, there was a bug in the probe that they had in their network that was causing this problem; they fixed it and the problem went away. I would probably have gone a little more in-depth on that one, but maybe next time. Actually, these two case studies I'll probably record back at home and put up on the website. I haven't put anything on the website in a long time, so that would be cool. So the takeaways here, again: know the basics. The MSS thing, it's like, what is that about? That's feedback you can give to the server team or whoever; maybe they can improve performance even more. Get captures from both sides. And the middleman: don't let him get away with that. All right, any questions? Actually, you can come talk to me afterwards. Please fill out your survey in the app. If you want to talk packets, send me an email or hit me up on Twitter or whatever. Thank you for coming, I'll be here all week. [Applause]
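In the first case study, the sequence number went from a "big ol' number" straight to 43,000 and change, because the 32-bit field wrapped. Below is a minimal sketch of wraparound-aware sequence arithmetic in the spirit of RFC 1982 serial-number comparison. The helper names and the sample values are hypothetical, picked only to reproduce a wrap like the one in the capture:

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are unsigned 32-bit values


def seq_add(seq: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping at 2**32."""
    return (seq + nbytes) % SEQ_MOD


def seq_distance(old: int, new: int) -> int:
    """Bytes from old to new, assuming new is less than 2**31 bytes ahead."""
    return (new - old) % SEQ_MOD


def seq_after(a: int, b: int) -> bool:
    """True if a is 'later' than b in modulo-2**32 sequence space."""
    return 0 < seq_distance(b, a) < 2 ** 31


# Hypothetical values: data starting 5,000 bytes below the wrap point.
last_before_wrap = SEQ_MOD - 5_000
first_after_wrap = seq_add(last_before_wrap, 48_000)

print(first_after_wrap)                                   # 43000
print(seq_distance(last_before_wrap, first_after_wrap))  # 48000
print(seq_after(first_after_wrap, last_before_wrap))     # True
```

The 2**31 bound in `seq_after` is the usual convention for deciding which of two wrapped values is "ahead." It only works when the two numbers are less than half the sequence space apart. At roughly 3 Gbit/s the sequence space wraps every ten seconds or so, which is why timestamps (PAWS) matter on fast links.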
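The ~49,000-byte frames in the first capture are what TCP handed to the NIC before segmentation offload cut them into MSS-sized segments. Below is a rough model of that split. It assumes the 8960-byte MSS from the talk and a hypothetical 49,000-byte super-segment starting at relative sequence number 1. A real NIC also rebuilds the IP and TCP headers for every segment, which this ignores:

```python
from __future__ import annotations


def tso_split(start_seq: int, payload_len: int, mss: int) -> list[tuple[int, int]]:
    """Return (seq, length) for each wire segment a TSO super-segment becomes."""
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)  # the last segment may be short
        segments.append(((start_seq + offset) % 2**32, length))
        offset += length
    return segments


segs = tso_split(start_seq=1, payload_len=49_000, mss=8960)
print(len(segs))               # 6
print([n for _, n in segs])    # [8960, 8960, 8960, 8960, 8960, 4200]
```

Large receive offload is the same idea in reverse. The receiving NIC merges several wire segments into one large one before handing it to TCP, which is why a host-side capture can show ~20,000-byte "packets" on the receive side.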
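The throughput estimate from the first case study is one full lap of sequence space (2^32 bytes), converted to bits and divided by the 10.56 seconds between two rollovers. Here is the same arithmetic, so it is easy to redo with your own numbers:

```python
def throughput_bps(nbytes: int, seconds: float) -> float:
    """Average throughput in bits per second."""
    return nbytes * 8 / seconds


bytes_per_wrap = 2 ** 32   # one full lap of the 32-bit sequence space
elapsed = 10.56            # seconds between rollovers, from the time reference
bps = throughput_bps(bytes_per_wrap, elapsed)
print(f"{bps / 1e9:.2f} Gbit/s")  # 3.25 Gbit/s
```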
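The doubling gaps at the flat spot (about 200 ms, 400, 800, 1.6 s) are the retransmission timeout backing off exponentially. The 3, 12, 48-second gaps in the third case study are just as regular but grow by a factor of four instead. Below is a hypothetical helper that tests a series of gaps, as read off Wireshark's delta column, for a constant growth factor. The 25% tolerance is an arbitrary choice, not a standard value:

```python
from __future__ import annotations


def grows_by_factor(deltas: list[float], factor: float = 2.0,
                    tolerance: float = 0.25) -> bool:
    """True if each gap is roughly `factor` times the gap before it."""
    if len(deltas) < 3:
        return False  # two gaps or fewer is not much of a pattern
    for prev, cur in zip(deltas, deltas[1:]):
        if prev <= 0:
            return False
        if abs(cur / prev - factor) > factor * tolerance:
            return False
    return True


# Gaps in seconds, approximately as read off the delta column.
case1 = [0.195, 0.400, 0.800, 1.600]  # first case study: classic RTO doubling
case3 = [3.0, 12.0, 48.0]             # third case study: regular, but not doubling

print(grows_by_factor(case1))              # True
print(grows_by_factor(case3))              # False
print(grows_by_factor(case3, factor=4.0))  # True
```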
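Both handshakes came down to which TCP options were, or were not, in the SYN. SACK-permitted and timestamps were missing in the first case study. In the third, one side sent no MSS, which leaves the peer assuming the 536-byte default. Below is a minimal parser for the raw options bytes of a SYN, using the standard option kinds: 2 (MSS), 3 (window scale), 4 (SACK-permitted) and 8 (timestamps). The function names and sample bytes are made up for illustration:

```python
from __future__ import annotations

DEFAULT_MSS = 536  # assumed when the peer sends no MSS option (RFC 879)
OPTION_NAMES = {2: "MSS", 3: "WS", 4: "SACK_PERM", 8: "TS"}


def parse_syn_options(raw: bytes) -> dict[str, int | None]:
    """Return the options present in a SYN's TCP options field."""
    found: dict[str, int | None] = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:              # End of Option List
            break
        if kind == 1:              # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(raw):
            break                  # truncated option
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break                  # malformed option
        value = raw[i + 2:i + length]
        name = OPTION_NAMES.get(kind)
        if name == "MSS":
            found[name] = int.from_bytes(value, "big")
        elif name == "WS":
            found[name] = value[0]  # shift count
        elif name:
            found[name] = None      # only presence matters here
        i += length
    return found


def effective_send_mss(peer_opts: dict) -> int:
    """MSS we may use toward a peer, given the options it advertised."""
    return peer_opts.get("MSS", DEFAULT_MSS)


# Hypothetical SYN/ACK: NOP, window scale 7, NOP, NOP, SACK-permitted; no MSS.
syn_ack = bytes([1, 3, 3, 7, 1, 1, 4, 2])
opts = parse_syn_options(syn_ack)
missing = [n for n in ("MSS", "SACK_PERM", "TS") if n not in opts]
print(opts)                      # {'WS': 7, 'SACK_PERM': None}
print(missing)                   # ['MSS', 'TS']
print(effective_send_mss(opts))  # 536
```

In Wireshark the same facts are visible in the SYN's TCP options tree. A parser like this only earns its keep when you are scripting over raw bytes or many captures.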
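The coloring rule from the third case study flags any packet that arrives more than 190 ms after the previously displayed one. In Wireshark this is presumably a rule on the `frame.time_delta_displayed` field, for example `frame.time_delta_displayed > 0.190`. The Python below is an offline equivalent over a list of timestamps; the helper name and the arrival times are made up:

```python
from __future__ import annotations


def flag_long_gaps(timestamps: list[float],
                   threshold: float = 0.190) -> list[tuple[int, float]]:
    """Return (index, gap) for each packet arriving > threshold s after the previous one."""
    flagged = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > threshold:
            flagged.append((i, round(gap, 3)))
    return flagged


# Hypothetical arrival times (seconds) for one direction of a connection.
times = [0.000, 0.010, 0.020, 0.360, 0.370, 1.270, 4.270]
print(flag_long_gaps(times))  # [(3, 0.34), (5, 0.9), (6, 3.0)]
```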
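The in-order check in the third case study is "this segment's sequence number plus its length should equal the next segment's sequence number." From the talk: 1284 + 536 = 1820, and 1820 + 536 = 2356. Below is a simplified sketch of that check over (seq, length) pairs from one direction of a connection. It flags the spots where Wireshark would report a previous segment not captured.

It ignores wraparound, SYN/FIN sequence space and retransmissions. Like the talk, it uses only the last four digits. The segments before and after 5572 in the second list are illustrative:

```python
from __future__ import annotations


def find_gaps(segments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return (expected_seq, seen_seq) wherever data arrives past the next expected byte."""
    gaps = []
    next_expected = None
    for seq, length in segments:
        if next_expected is not None and seq > next_expected:
            gaps.append((next_expected, seq))
        end = seq + length
        if next_expected is None or end > next_expected:
            next_expected = end
    return gaps


MSS = 536
in_order = [(1284, MSS), (1820, MSS), (2356, MSS)]
print(find_gaps(in_order))   # []

# 5572 never arrives; the next segment starts one MSS later.
with_loss = [(5036, MSS), (6108, MSS), (6644, MSS)]
print(find_gaps(with_loss))  # [(5572, 6108)]
gap_bytes = 6108 - 5572
print(gap_bytes)             # 536
```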
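When the sender in the third case study got one ACK back after its timeout, it was allowed to send two segments. That is consistent with standard behavior after a retransmission timeout. The congestion window restarts at one segment, and slow start adds one segment per ACK (RFC 5681). Below is a toy model of segments per round trip. It assumes every segment is ACKed individually and ignores ssthresh, delayed ACKs and further loss:

```python
from __future__ import annotations


def segments_per_round(rounds: int, cwnd: int = 1) -> list[int]:
    """Segments the sender may put out in each round trip during slow start.

    cwnd is counted in whole segments and starts at the one-segment
    loss window used after a retransmission timeout.
    """
    per_round = []
    for _ in range(rounds):
        per_round.append(cwnd)
        # Every segment in this round is ACKed, and each ACK adds one segment.
        cwnd += cwnd
    return per_round


print(segments_per_round(4))  # [1, 2, 4, 8]
```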
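The IP ID trick works because both captures see the same IP ID on a given transmission. Matching on it tells you which copy of a segment, original or retransmission, actually arrived.

The sketch below is hypothetical and makes three assumptions:
- Each capture has been reduced to (ip_id, tcp_seq) rows in capture order, for example with tshark's `-T fields -e ip.id -e tcp.seq`.
- Sequence numbers are comparable between the two captures.
- No IP ID is reused within the window.

IDs 3508 and 3744 for sequence 2356 mirror the talk; the other rows are made up:

```python
from collections import defaultdict


def classify_arrivals(sent, received):
    """Label each received (ip_id, seq) as the original send or the Nth retransmission."""
    # For each sequence number, the IP IDs the sender used, in send order.
    attempts = defaultdict(list)
    for ip_id, seq in sent:
        attempts[seq].append(ip_id)

    labels = []
    for ip_id, seq in received:
        ids = attempts.get(seq, [])
        if ip_id not in ids:
            labels.append((seq, ip_id, "unknown"))
            continue
        n = ids.index(ip_id)
        labels.append((seq, ip_id, "original" if n == 0 else f"retransmission {n}"))
    return labels


def never_arrived(sent, received):
    """IP IDs present in the sender capture but missing from the receiver capture."""
    seen = {ip_id for ip_id, _ in received}
    return [ip_id for ip_id, _ in sent if ip_id not in seen]


# Hypothetical (ip_id, tcp_seq) rows.
sender = [(3506, 1284), (3507, 1820), (3508, 2356), (3744, 2356)]
receiver = [(3506, 1284), (3507, 1820), (3744, 2356)]

for row in classify_arrivals(sender, receiver):
    print(row)
print(never_arrived(sender, receiver))  # [3508]
```

IP IDs are only 16 bits and may be reused. Some stacks also keep per-destination rather than global counters, so treat a match as strong evidence rather than proof.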
Info
Channel: SharkFest Wireshark Developer and User Conference
Views: 38,402
Keywords: Packetbomb, Packet Analysis
Id: UBfSgjUCEi0
Length: 74min 49sec (4489 seconds)
Published: Wed Nov 07 2018