SF19US - 30 Using Wireshark to solve real problems for real people (Kary Rogers)

Captions
I'm Kary Rogers, director of tech support, Americas West, at Riverbed. I started there, I don't know, ten years ago, answering phones, maybe talking to you, maybe talking to people you know, helping them solve their Steelhead optimization problems. I found that the better I was at looking at packets to understand what was happening on the wire, the quicker I could solve problems. So at every chance I could, I would use Wireshark. I would get packet captures, I got better at doing it, and I got faster at solving problems. A few years ago I went to the dark side, management, so immediately all the technical knowledge I had went out of my head overnight. But before the packet analysis completely drained out, I dumped some of it on the Internet, on a little website called packetbomb.com.

So, what we have today. Anytime you have questions, just shout them out. I'll probably be looking down at the screen and I may not see your hand, so just shout out if you have a question. We will be doing some case studies. Is this the first time seeing my case studies for anyone? New people? All right. For a good number of you, just an FYI: I did do these case studies in Europe last year. These are problems where people reach out to me over email or whatever and say, hey, I have a problem, can you help? So these are real folks; we'll get into a little bit about that. That's what I call the Packet A-Team, the packet analysis team. Sometimes I find people on Reddit, where they post a problem and I'll say, give me a pcap and I'll try to figure it out, or they just email me. That's where these case studies come from, and I think that's enough to know up top. I'll be giving you a very brief overview; there's not really much on the slides, just a quick overview of the problem. Then we'll dive into the pcap, and then I'll give you some takeaways that I think are takeaways, but really you should have
your own. Maybe some things you know, some things are new to you; whatever those are, I don't know, but make sure you have your own takeaways from this. And with that:

Thousands of people turn to the internet for help because they have a burning computer and network problem that they cannot solve. If you have a problem, if no one else can help, and if you can find them, maybe you can hire the Packet A-Team. [Music]

All right. I guess everyone knows and loves The A-Team from the '80s. If not, you're like, what in the world was that? That's just my little fun thing to do.

Before we get started: I generally have a very similar approach whenever I get sent a pcap from a stranger on the internet, some things that I look at and ways I want my Wireshark to look. So let's run through a few of those before we get into it. These are things I've learned on my own, plus tons of stuff I've learned here at Sharkfest. I think I'm wearing the T-shirt from the first Sharkfest I ever went to; I actually found it this week, buried in a bag somewhere in a closet, and thought, great, I'll put it on. I've learned from tons of people here, so my process is to take what works for me and the things I've learned and put it all together. That's what you're doing too: you come here, you listen to people, you look at packets yourself, you watch YouTube videos, you read books. All of that goes into your brain and you'll have your own process. But let me talk about mine.

If you're going to add one thing to Wireshark, I would add a delta column, to find the delay. So many times you're looking for performance issues or things that are broken, and a lot of the time that means big delays, a big delta time between packets. You can sort on it, you can highlight it, you can do whatever you want with it. I would say add that in there. Then, when I get the pcap, I like to go look at the conversations. People just tell me, hey, it's a web problem, hey, it's an NFS problem, we're doing iperf. Okay, well,
I wasn't there. I don't know how you captured, and I don't know what's in this thing. So I like to see: am I looking at one TCP connection in here, or are there a thousand? Just to get the lay of the land.

The three-way handshake is so important when you're capturing data. There is information in there that you can't get anywhere else; it's only presented in the three-way handshake. So if you don't get the three-way handshake, you're going to miss some important bits of information, and we're going to go through that in the case studies.

Then, if you're doing this day in and day out (I no longer do it for my day job; I do it occasionally for strangers on the internet), if you're looking at pcaps all the time in different scenarios with different protocols, I would suggest you start from a base profile. As you make changes, you can save those profiles: you can save one as HTTP, one as SMB, one as TCP, a minimal one if you want minimal columns, and then you can switch back and forth as needed. For myself, I usually just have one base profile, and there are only a few different columns I'm probably going to put up as I go. So that's how I start out; that's how I approach it.

Let's jump into the first one. I think I have done this one at some Sharkfest in the past; if it's not new to you, it'll be a good refresher. I think this one is really interesting. This person emailed me. They had a new data center connected to AWS at one gig, and of course: hey, we have something new and we're not getting the expected performance. You've probably run into that before. We all assume it's the network. Well, we don't, but people do. So he had a pcap. He said, hey, I was doing an iperf test. That's something people do a lot to try to determine, am I getting the bandwidth that I expect, that I'm paying for, etc. I would say most times the
people who come to me or post on the internet about iperf, saying I'm only getting two megabits when I should be getting 50, probably haven't set up the test very well. They're not giving iperf the proper config, the proper window size, etc., and you can see that in the pcap. So I expected to see that in this one. Let's jump straight into the pcap and see if we can figure out what the throughput issue is.

Is that good enough? Should it be bigger? Can you see okay? A little bit more. All right, is that better? I know there are a lot of things assaulting your eyes right now with my layout, and I'll walk you through some of this. Again, if anything needs to be bigger, smaller, or explained, just let me know. Let's make this smaller.

So, this pcap. You give me a pcap: what's in it? I go to Statistics, Conversations, and I think you can see there are two connections, one with just 28 packets and one with a lot more, so that's probably the one we care about. When you go through pcaps, things will jump out at you. Maybe it means something, maybe it doesn't. I see port 1025. That means something to me; does it mean something to you? Anything special about 1025? Well, it's one more than 1024, right? What's special about 1024 and under? You have to be a superuser, root, whatever, to bind to ports under 1024. Okay, does that matter? Probably not. But the more you look at pcaps, and the more you read RFCs and documentation, the more these little puzzle pieces float up. You file them away; maybe they come back up, maybe they don't, but you learn to notice things.

Then I want to know: do we have the three-way handshake? Yes, we do: SYN, SYN-ACK, ACK. Here's another reason why it's important to get the three-way handshake: you can look at it and usually get a good idea of where in the network you captured this
data. You probably can't read it, but the name of the file says "receiving". Okay, but in terms of whether we're on the client side or the server side, with the three-way handshake you can many times figure out at least a ballpark of where you are. Think about it: if I'm initiating a connection to a server and I'm capturing, that SYN goes out at time zero. It travels across the connection, the WAN or whatever it is, to the server. The server immediately sends a SYN-ACK back to me, and that's one round trip. Then I immediately ACK. So if I'm the client, the delay, the delta, is between the SYN and the SYN-ACK. If I'm the server, you switch it. I'm sitting there minding my own business when a SYN shows up out of nowhere at time zero. I immediately send a SYN-ACK back. The SYN-ACK has to go back to the client, which then responds with an ACK to complete the three-way handshake. The round trip is between the SYN-ACK and the ACK. So which side, in terms of client and server, would you say this was captured on? My delta column is here. Yeah, the server side: the delay is between the SYN-ACK and the ACK.

So what do we have? We'll make this a little bigger. I usually like to take a quick gander at the options to see what's there: MSS, SACK, timestamps, NOP, window scale. Okay, nothing really jumps out at me, but it's good to look. Something red shows up here when I click on the SYN-ACK, so maybe I want to know what that is. Header checksum. If you capture on the host, header checksums are sometimes not correct because of offloading to the card. The card itself might do the checksum calculation and stick it into the packet before it goes out on the wire, but if you're capturing on the host, you're capturing before the packet gets handed to the NIC. So we can probably ignore that. Maximum segment size, NOP, window scale, SACK, timestamps. Okay, anything different here
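The handshake timing argument above (the long gap sits between SYN and SYN-ACK on the client side, and between SYN-ACK and ACK on the server side) can be written down as a minimal Python sketch. The function name and the timestamps are made up for illustration; they are not taken from the capture in the talk, and real captures taken somewhere in the middle of the path will show delay in both gaps.

```python
def capture_side(t_syn, t_synack, t_ack):
    """Guess the capture point from three-way handshake timestamps (seconds).

    Returns (side, estimated_rtt_seconds). Heuristic only: whichever gap is
    larger is assumed to contain the network round trip.
    """
    syn_to_synack = t_synack - t_syn
    synack_to_ack = t_ack - t_synack
    if syn_to_synack > synack_to_ack:
        return "client", syn_to_synack
    return "server", synack_to_ack


# Hypothetical timestamps: the SYN-ACK leaves almost immediately,
# and the final ACK shows up roughly 9 ms later.
side, rtt = capture_side(0.000000, 0.000050, 0.009050)
print(side, f"{rtt * 1000:.1f} ms")
```

The same two deltas are what the delta time column shows on the handshake packets, so this is only a restatement of reading that column by eye.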
between the two? Window scale, yeah. Good to know. They're in a different order. Is that a problem? No. You'll probably hear me say "is that a problem?" a lot, because sometimes it is, sometimes it's not, and sometimes you don't know.

So we have two connections, one of them very small. If you look at iperf traces enough, you start to see some patterns. iperf2 generally has one connection; at the very beginning, after the three-way handshake, it'll exchange something like 24-byte packets before the data flow starts. With iperf3 I've seen more of an FTP-like behavior, with a control channel and then a data channel, and that appears to be what's happening here. If I had to guess, I would say it's iperf3. We go down to the second connection just to do a quick check, and it looks the same: SYN, SYN-ACK with our options, and our round-trip time of around nine milliseconds.

With this, why don't we scroll through here. Just a quick word, again, about my layout. I know it's a lot of numbers. I like to have the sequence number information in my columns because a lot of these things require you to follow the sequence numbers: what order do you expect this to be in? Is this a real retransmission or not? So I like to have that there. If you're just starting out and it's overwhelming, you can take those columns out and look at the fields as needed.

Let's move just a touch and take a scroll down here, just eyeballing it, and you can pretty quickly see a pattern. This is bytes in flight: 1448, 2896. So this is data being sent, 1448 bytes; this is the TCP length. I don't know if you can see my mouse, I guess maybe over here on the left. So it's two packets and an ACK, two packets and an ACK, and that's pretty standard TCP behavior: data, data, ACK, data, data, ACK. The bytes in flight never get above two packets because, again, we're on the receiving end. We're receiving this data, so as soon as we get the two packets, we ACK; as soon
as the two packets come in, we ACK. So that's the pattern that we see.

Once I get the lay of the land and see what we're dealing with, and I'm looking at throughput, my go-to tool is under Statistics, TCP Stream Graphs: tcptrace. Let's make this bigger, and then we'll walk through it. I like pictures. I'm a visual person; I learn and think better visually. In this graph the x-axis is time in seconds and the y-axis is sequence number. A sequence number in TCP is just a representation of bytes; that's how we keep track of where we are in the flow of data, so one sequence number is one byte. I have relative sequence numbers turned on, meaning that at the beginning of a TCP connection Wireshark starts at zero, when the real initial number is going to be a randomized 32-bit integer. What we want to see is this throughput line going up and to the right, so more bytes are being sent over time. This is sequence number over time, which is bytes over time, and that's throughput.

Let's zoom in so we can see a little better what we're dealing with, and talk a little more about what this is. These little guys are the packets. Their length along the y-axis is how big the packet is, so whether it's a really long one or a really short one, at a glance you can see how big that packet was. The line below them is the ACKs. Remember, going to the right is time moving forward, and going up is more data being sent. Right here where the mouse is, we sent two packets, and the ACK line goes up very shortly after, probably microseconds after, and now they're at the same level, meaning this data has been ACKed. We get two more packets, and then the ACK line goes up, meaning that those packets have been acknowledged. So that's data being sent and the acknowledgments for it. After that there's a flat line where nothing is happening. So at the beginning
of this connection, which is where we are, you can see there's a little bit of data, then we wait; a little bit more data, and we wait; and the amount of data being sent each time is increasing. In TCP, what is that usually? Slow start, the beginning of a connection.

The other line up here is the receive window. If we're down here, the space between the packets and the receive window line is how much space is left in the receive buffer. In every packet the receiver sends there's a window size, in the TCP header, that says: hey, I can receive this much data; this is how much buffer space I have. If it starts to fill up, it can be a problem. But we're good. Data is coming in pretty steadily as we ramp up the connection. It has sort of a wave pattern, but it's going up and to the right, and there's always space available in the receive buffer. It's just not as fast as we expect. We could look and see if there was any data loss, but when I look at this on the receiving side, I think: okay, it's going, yes, it's not going fast enough, but I don't see a bottleneck here. So why don't we go have a look on the sending side, since we have both captures.

Close that. Let's see what it looks like from the other end. Just real quickly, we'll verify we have two connections here. There they are, and we have the three-way handshake. Notice anything different about this one? Let's go here. Did you see the same MSS on the other side? The maximum segment size is only advertised in the three-way handshake, and it's telling the other end: here's the maximum amount of TCP payload that I can receive per packet, not counting the headers. So what does 8961 probably mean? If we look at the other one, it's 1460, which is pretty typical for a 1500-byte MTU. But this one is much bigger. Jumbo frames, right? But we didn't
see it on the other end. If you recall, we looked, and it was 1448, I think. So something has changed the MSS. Is that a problem? Maybe. But if this is going over some nine-millisecond internet connection, some circuit, do you think there are jumbo frames end to end? Maybe there's some overhead somewhere. Something in the middle, usually a router or a firewall, will do MSS clamping, because it knows what its MTU is going forward. If that's 1500 or whatever, and it sees an MSS of over 8,000 bytes, it's going to go in there and bring it down to the appropriate level so traffic can go across without being fragmented or dropped. So probably not a problem, but again, it's just another thing that goes into your little file bank when you're looking at this.

Here, remember, the 1448s in this column are from the sending side; the ones with 0 are just the ACKs coming back. If we scroll through this one (give it a little more space), it looks different from the other side. We don't see the same data, data, ACK, data, data, ACK behavior. So you have to think, again, about which end you're capturing from and what things look like from that perspective. If I'm receiving data and I know that I'm supposed to ACK every other packet, I'm going to ACK as soon as I get two packets. If I'm the sending side, I can send some number of packets at a time, depending on what my congestion window is. If you start out in slow start, that might be two packets, then four, then eight, but at some point you can send however many packets, say twenty-five. If I'm allowed to send them, I'm going to send them all at once, and it takes time for them to go out, be received, and for acknowledgments to be generated and come back. So from this perspective, the sending side, you see many more data packets going out before you see acknowledgment packets. Again, not a problem; it's
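The MSS numbers in this case study fit together with some simple header arithmetic, sketched below in Python. The assumptions are mine rather than the speaker's: IPv4 and TCP headers of 20 bytes each with no IP options, a 12-byte TCP timestamp option on every data segment, and a 9001-byte MTU on the sending host (inferred from the advertised MSS of 8961, not shown in the talk). The function names are hypothetical.

```python
IPV4_HEADER = 20        # bytes, assuming no IP options
TCP_HEADER = 20         # bytes, assuming no TCP options
TIMESTAMP_OPTION = 12   # 10-byte timestamp option padded with two NOPs


def mss_for_mtu(mtu):
    """MSS a host would advertise for a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def payload_per_segment(mss, timestamps=True):
    """TCP payload per full-sized segment once per-segment options are subtracted."""
    return mss - (TIMESTAMP_OPTION if timestamps else 0)


def clamp_mss(advertised_mss, path_mtu):
    """What an MSS-clamping middlebox would rewrite the SYN option to."""
    return min(advertised_mss, mss_for_mtu(path_mtu))


print(mss_for_mtu(1500))            # standard Ethernet MTU
print(mss_for_mtu(9001))            # assumed jumbo MTU on the sender
print(clamp_mss(8961, 1500))        # clamped by a 1500-byte hop
print(payload_per_segment(1460))    # data bytes per segment with timestamps
```

Under these assumptions the 1448-byte TCP length seen on the data packets is just a 1460-byte MSS minus the timestamp option, which is why 1448 and 1460 both show up for the same connection.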
just perspective. So, okay, fine, let's go have a look at our graph. At a glance it looks very similar: up and to the right, smooth. Let's zoom in. You can see slow start, and then pretty quickly the space between the packets and the receive window goes to almost nothing. Do they touch? Let's see. You can click on a packet in the graph and go to it in Wireshark. If we had kept scrolling down a little bit longer, we would have got to this point and seen this: TCP Window Full.

So I'm sending data, and the receiver is telling me with every ACK how much receive buffer space it has. I know I can't send more than that amount of data in one go without an acknowledgment, because I'll fill up the window. If we look at bytes in flight when we hit TCP Window Full, it's 212992. That means that many bytes have been sent, are in flight, and have not been acknowledged. And if you look at the ACK that comes back, its window size, sure enough, is 212992. So we get an acknowledgment. Great, now I can send two more packets, because each ACK is acknowledging two packets, so now the window is open two packets' worth. I send two more and, whoops, full again; got to wait. Now we get two ACKs back, which means four packets have been acknowledged, so I can send four packets before I fill the window again. So on the sending side I am being throttled by the receive window. Pretty straightforward.

If I had looked at this side first, I'd have said, oh, okay, this is a receive window problem on the receiver side. It may have been the one with the lower window scale, so maybe we just need to bump up the window scale. But we looked at the other side, and we did not see its window getting full. It was keeping up. It was acknowledging packets as soon as they came in: every two packets, ACK. So it's not the receiving side's fault. So what we can do is, well, let's do this: we're
going to add a column. If you want to add a column to Wireshark, the easiest way is to right-click on a field and choose Apply as Column. You can apply pretty much anything down here as a column. At some point in the past I added this one, the RTT to ACK the segment. What that means is: I sent a segment; how long did it take for the ACK to come back to me? That should be around the round-trip time in general. We know our round-trip time is around nine milliseconds, and as you can see, these ACKs are around nine milliseconds. That's what we expect to see. But let's sort everything by that column and scroll down to the bottom. Well, that's different. That's not nine. These are the ACKs coming back, and they're at 60, 50, 40, 30 milliseconds, way higher than nine.

There's another graph that can be useful to visualize round-trip time. It's under the stream graphs and it's called Round Trip Time. If you click on the wrong packet like I did, you won't get anything, so switch direction, and here we go. This is a dot for every ACK's round-trip time. You would expect to see them clustered around nine, and obviously some are in that range, but we've got plenty that are way above the round-trip time. If you zoom in, you can see it's really all over the place.

So we don't have data loss. The sending side is doing everything it can to send data. The receiving side is ACKing as fast as it can. Remember, throughput is bound by the ACK rate. I can only send data if it's being acknowledged; at some point I'll have to stop. So the ACKs need to be coming to me like clockwork, regularly and on time. If they're delayed, sending is delayed. If you remember the graph on the other end, it was spread out, like with a butter knife; it had that kind of wave. Do I still have this one open? This guy is sending them out fast, just like a packet train, boom, it's going out. When they come
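The point that throughput is bound by the ACK rate can be made concrete with the usual window-over-RTT ceiling. The Python sketch below is a simplification I am adding, not a calculation from the talk: it assumes a sender that always fills the advertised window and gets exactly one window acknowledged per round trip, so the result is an upper bound. The 212992-byte window and the 9 to 60 millisecond ACK times are the figures quoted above; the function name is hypothetical.

```python
def window_limited_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window can be in flight per RTT."""
    return window_bytes * 8 / rtt_seconds


WINDOW = 212992  # receive window seen at "TCP Window Full"

for rtt_ms in (9, 30, 60):
    mbps = window_limited_bps(WINDOW, rtt_ms / 1000) / 1e6
    print(f"{rtt_ms:>2} ms to ACK -> at most {mbps:.0f} Mbit/s")
```

Even at the nominal 9 ms this window cannot fill a 1 Gbit/s link, and as the time to ACK stretches toward 60 ms the ceiling drops by the same factor, which is consistent with the delayed ACKs being what hurts the sender here.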
out the other end, they're all spread out, and the acknowledgments are all spread out coming back. So what do you think could cause something like that? A device in the middle: a big buffer, some shaping, QoS. This is one of those cases where, whoops, it is the network, depending on what your definition of the network is. We've probably got a deep buffer here, a configuration on a device. So my recommendation: take captures at the very edge of your network, as far as you can go to the edges of your responsibility. If you still see this behavior there, you know it's in the middle, and you have to go to a provider. Or potentially it is in your network, some shaping device on the edge that you can then track down. This absolutely was that type of issue. So unfortunately this is one of those where it turned out to be the network. But instead of people finger-pointing, the packets tell you exactly what is happening. They don't pinpoint where yet; that takes additional work to narrow down. But the what is very clear here, and you can take that and have a discussion about where to go with it.

So, back to here. How are we doing on time? I have until what, 10:30, 10:45? Okay.

Again, here are some of the takeaways that I came up with for this one. The basics: the initial round-trip time. If you don't know that the round-trip time is 9 milliseconds, you don't know what to expect for the ACKs. These are things you want to file away. The MSS: we saw jumbo frames here and not over there, and it's not a problem. And the window size; that was key. Finding this issue came from seeing the window fill on the sending side. It's important to think about where you're capturing from and what it should look like. Are you going to see ACKs immediately? Are you going to see ACKs coming later? Thinking about perspective is very helpful and a good exercise. And then, in terms of the application behavior,
know what you expect to see. Here that's really about iperf2 versus iperf3, making sure you see what you expect. Okay, the next thing is a quick little tutorial on TCP. [Music]

All right. That is my delightful daughter, who's now seven and just finished first grade, so that was a long time ago. I hope you took a lot away from that. Don't try to make a video with your kid when she's sleepy and cranky.

All right, the next one is interesting: an NFS hang. I had a buddy who works in tech support for a company, and he said: hey, we've got this bare-metal Linux server talking NFS to a NetApp, and it hangs at some point in the connection. They've tested different flavors of Linux and they don't see the issue there, so they're definitely looking at this Linux server, but they're not quite sure what the issue is. When it hangs, they do a quick ping, and they still have IP connectivity, so the whole stack isn't frozen; it just seems to be this connection. So let's have a look.

I should make that bigger, right? So this is the directory. If you look at the NetApp pcap and the SLES 12 pcap, one is a gig and a half and one is almost a gig. Some people's machines can maybe handle that without completely exploding. Mine certainly won't, and even if you can get it to load, it might be very slow. So when I'm given a big pcap, there are a couple of things I like to do. For me, the definitive talk on this is Jasper's session. I don't know if he's doing it this year, but it's in the retrospective from years past, about finding the needle in the haystack. Excellent stuff. For me, I want to take this pcap and break it down, maybe into smaller chunks (actually, I've done that before). One way to do that is with tcpdump. I'm going to break it into hundred-megabyte chunks and give it a
new name ending in -client. Run that, wait, and now we have multiple files at around 100 megs each. Let's open one of those; we'll open the first one. That's going to be way easier to work with as a start, but it's not ideal, because I don't know where the problem is, and I've got ten or more files to work my way through to figure out what I need to do.

Right away, just glancing, I see the TCP length; this appears to be jumbo frames. And if I look at what we've captured, we've captured 1500 bytes per packet. Do I need 1500 bytes per packet to troubleshoot this? Probably not. I don't know. It is NFS, so if it's an application issue I'm going to need the NFS data, but I'm going to assume that maybe I don't. So what I can do is cut this down a little differently, with editcap. It comes free with every installation of Wireshark, no extra charge. We're going to cut each packet down to, say, 78 bytes. Real quick: there's Ethernet, there's IP, there's TCP, and, yeah, a VLAN tag. So we want 14 bytes for Ethernet, 4 for the VLAN, 20 for IP, 20 for TCP, and another 20 just for giggles. So take the file, this one, and set the snap length, so we'll just take 78 bytes from each packet. And that's 128; that's not too bad. Let's open that one, and then we'll have the whole thing. It's still not super swift, but it's usable.

Okay, let's have a look at our conversations. There is one, on port 2049; if you're an NFS-er, that rings a bell. We're still loading. Okay. All right: three-way handshake? Not there. I could search for tcp.flags.syn == 1, but I'm not going to find it. So after I finished smashing everything because I didn't have the three-way handshake, I sat back down to have a look. Let's take that filter off. I immediately expressed my displeasure to my friend and said, you didn't get me the three-way handshake. I know these are long-running connections and it takes a while to get to the hang. He said, I'll work on
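The two ways of shrinking the capture described above can be sketched as follows. The exact commands typed in the talk are not in the transcript, so this Python snippet only builds plausible command lines rather than running anything: it assumes `tcpdump -r/-w/-C` for splitting into roughly 100 MB files and `editcap -s` for truncating each packet, and every file name here is hypothetical. The snap length is the header budget the speaker adds up.

```python
import shlex

# Per-packet header budget from the talk: Ethernet, VLAN tag, IPv4, TCP,
# plus 20 spare bytes for TCP options.
HEADER_BYTES = {"ethernet": 14, "vlan": 4, "ipv4": 20, "tcp": 20, "spare": 20}


def snaplen(headers=HEADER_BYTES):
    """Bytes to keep per packet so the headers above survive truncation."""
    return sum(headers.values())


def split_cmd(infile, outfile, chunk_mb=100):
    # tcpdump: -r reads a capture file, -w writes one,
    # -C starts a new output file about every chunk_mb million bytes.
    return ["tcpdump", "-r", infile, "-w", outfile, "-C", str(chunk_mb)]


def truncate_cmd(infile, outfile, length):
    # editcap: -s truncates every packet to at most `length` bytes.
    return ["editcap", "-s", str(length), infile, outfile]


# Hypothetical file names, printed rather than executed.
print(shlex.join(split_cmd("sles12.pcap", "sles12-client.pcap")))
print(shlex.join(truncate_cmd("sles12.pcap", "sles12-78.pcap", snaplen())))
```

Truncating keeps the whole conversation in a single file, which is what makes the full-connection graph possible later, at the cost of losing the NFS payload if it turns out to be needed.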
it for you. Okay, fine. So it looks like we have jumbo frames. Slide it over and do just a little scroll action to see if we spot any patterns or anything. Now, while it is file transfer over NFS, the protocol is not like HTTP or FTP, where it's just "give me that," "here you go, take it." There's more going on at the protocol level with NFS, so we may not necessarily see our nice, neat little TCP pattern. But we can see it's definitely moving, and there's very low latency; this is on a local LAN. So let's take a shot at our graph.

To recap this one: I split it up initially, opened one of the pieces, and saw that each packet had 1500 bytes captured. So I went back to the original gigabyte file, sliced it at 78 bytes per packet, and now I have one file that covers the whole thing. And yeah, we have some interesting action here. Remember, what we're looking for is an up-and-to-the-right graph for throughput, and it does that multiple times, but it's very strange. It goes up and to the right very quickly, drops way down here, goes back up here, drops down here. I think this is the first time I've actually seen this in a graph. Does anyone know what's happening here? Somebody? And then it flatlines, yeah. Yes, probably.

Let's zoom in to the bottom of this little drop, and we'll click on a packet down here towards the bottom. That's probably close enough. Let's go back. Okay, so our sending side is .11, and here's the sequence number. It's a big number. The sequence number is a 32-bit integer, which has a maximum value. We see it here, here, here, here, and then it rolls over to here. This is low latency with lots of data being transferred, multiple gigabytes, and you can only have 32 bits in your sequence number field. So what happens when
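you get to the highest number you can get to? It starts over. So that's what the graph is showing you. I've certainly known that sequence numbers do this, but I don't think I've seen it in a graph before.

Just for interest's sake: this is at the bottom, where we just rolled over. I'll set a time reference on this packet, so now this packet is time zero. We're going to zoom out and go to the top, a lot more data later, and zoom in on the top. All right, that's close enough. So now we're at the peak, right before it rolls over again. That is .11, and it rolls over right there. My other time column is time since the beginning of capture, or time since the last reference you set. Because I set a reference where it rolled over the first time, and I've gone to where it rolls over the next time, I can see it took 10.56 seconds to do that. And we know exactly how much data was transferred, because it's 32 bits. So we can do a quick calculation. 2 to the 32 is how many bytes. If we care about bits, we multiply by 8; that's how many bits. If we divide that by 10.56 seconds, that is bits per second. We can make it a little more friendly: there's K, there's M, and here's G. So about three gigabits per second is what we're dealing with here.

Okay, so now that the initial shock of that has worn off, the other thing I noticed is the TCP lengths. We know these are jumbo frames; at the beginning we saw a typical jumbo-frame size in the TCP length field. This is 62K in one packet. What is that about? Is that a problem? Not really, but it does make analysis really annoying. I mentioned offloading earlier, about checksums. Well, you can also offload TCP segmentation. TCP can just say: hey, I've got 62K that I need to send; here you go, network card, go break it up into MSS-sized chunks and send it out. But because we're capturing between the kernel and the driver, we don't actually see what goes out on the wire. We see what TCP is handing to the card, which is 62K. This is one reason why, if you can possibly capture with a tap or something that's not on the box itself, it's very useful, because you can actually see what's on the wire. Here we just have to guess.

So back to our graph; we'll zoom back out. Now we know that the weird peaks and valleys shouldn't be a problem in themselves, but we very clearly see this long flat spot here at the bottom, which we assume is the hang that we're looking for. As I move my mouse around, you can see how it automatically highlights; the little circles are the packets. As I move my mouse along, you can see it jump to the next packet, and there's a certain pattern to the spacing between them that is a hint about what may be going on. If I click on one, it'll go there, and you see the retransmissions; these are the ones highlighted in black and red. They're sending 8960 bytes, and the time deltas here jump out a little bit. Let's go to the beginning. We have a time delta of around 195 milliseconds, right around 200. Then we go to 400, then we're approaching 800, then 1.6 seconds, 3.2. What does that sound like? Exponential back-off. I'm sending packets and I start a timer. I expect to receive an acknowledgment of the data that I sent within a certain amount of time. It's different on different operating systems, but here it appears to be around 200 milliseconds. When that timer goes off and I have not yet received the acknowledgment, I've got to resend that packet. That is a retransmission timeout. So that's happening, and from the graph you can see it's happening for a really long time. It appears to maybe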
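Two pieces of arithmetic from this part of the talk, the throughput implied by one full sequence-number rollover and the doubling retransmission timer, are easy to reproduce. The Python sketch below uses the 10.56 seconds and the roughly 200 millisecond starting timer quoted above; the function names are hypothetical, and the back-off list is an idealized doubling, not the measured deltas (the first one in the capture was about 195 ms).

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits, one number per byte


def wrap_throughput_bps(seconds_per_wrap):
    """Bits per second implied by one full trip through the sequence space."""
    return SEQ_SPACE * 8 / seconds_per_wrap


def rto_backoff(initial_rto, retries):
    """Idealized retransmission timer values: each timeout doubles the previous one."""
    return [initial_rto * 2 ** n for n in range(retries)]


print(f"{wrap_throughput_bps(10.56) / 1e9:.2f} Gbit/s")
print(rto_backoff(0.2, 5))
```

Seeing deltas that roughly double between copies of the same segment is the signature to look for; it points at timeouts rather than fast retransmits, whatever the exact starting value is on a given operating system.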
"Hey, I've got 62 K that I need to send. Here you go, network card, go break it up into MSS-sized chunks and send it out." But because we're capturing between the kernel and the driver, we don't actually see what goes out on the wire. We see what TCP is handing to the card, which is 62 K. This is one reason why, if you can possibly capture with a tap or something not on the box itself, it's very useful, because you can actually see what's on the wire. Here we just kind of have to guess.

So, back to our graph. We'll zoom back out. Now we know that the weird peaks and valleys shouldn't be a problem in themselves, but we very clearly see this long flat spot here at the bottom, which we assume is the hang that we're looking for. As I move my mouse around, you can see how it automatically highlights; the little circles are the packets. As I move my mouse along, you can see it jump to the next packet, and there's a certain pattern to the space between them that is a hint about what may be going on. If I click on it, it'll go there, and you see the retransmissions. These are the ones highlighted in black and red, and they're sending 8960 bytes. The time deltas here jump out a little bit. We can go to the beginning: we have a time delta of around 195 milliseconds, right around 200, then we go to 400, and then we're approaching 800, 1.6, 3.2. What does that sound like? Right, exponential backoff.

So I'm sending packets and I start a timer. I expect to receive an acknowledgment of the data that I sent within a certain amount of time. It's different on different operating systems, but here it appears to be around 200 milliseconds. When that timer goes off and I have not yet received the acknowledgment, I've got to resend that packet. That is a retransmission timeout. So that's happening, and from the graph you can see it's happening for a really long time. It appears to maybe
recover a bit later, but eventually the thing dies.

And in here, when I was looking at these, let me see if I can find a, yeah, dup ACK. So if I'm expecting you to send me packet 5 and you send me packet 6, but I never got 5, I'm going to send an acknowledgment for 5. For every single packet I get after that that's not 5, I'm going to remind you that I asked for 5, and those are duplicate acknowledgments. You can see the ACK number here is consistent on these dup ACKs, and we're up to a hundred and something dup ACKs for this one sequence number.

And this is where having the three-way handshake is really useful, because I would expect to see, in most operating systems, selective acknowledgments [inaudible]. Those are buried in the TCP options on every ACK if there are duplicate ACKs happening. Well, we're up to 173 duplicate ACKs and I don't have any TCP options. That tells me it probably wasn't enabled in the three-way handshake, but I don't know. Maybe it got stripped out, maybe it's not turned on. I don't know, because I don't have a three-way handshake. So I asked him to get that to me. He ended up doing a SPAN at one point, and he got me, I think, one that says SPAN, it's 191. I won't bring it up; it's terrible. We're talking high throughput, and SPAN on a switch is a lower priority, so there are tons of dropped packets.

Yes? These, so they're a little different, but it's microseconds. So when I say I'm expecting five and you give me six, I send an ACK saying no, I wanted five. You send me seven, I send an ACK saying no, I want five. You send me eight, I say no. Every single out-of-order packet will generate a duplicate ACK. So these ACKs were generated from data received higher up. This is data being sent; it's not the one I wanted. So for every single packet I received, and again, those ACKs are
responding to something further up. And again, it's difficult, because it's ACKing individual packets which I don't see. This is 44,000 bytes, which isn't what really went on the wire. You divide that by the MSS; that's how many packets actually went out. So you're receiving ACKs for packets you never actually see in this capture, which is why offloading can be a challenge. He gave me a SPAN capture, which is, quote, "what's on the wire", but in this environment the SPAN capture had tons and tons of dropped packets. They weren't literally dropped on the network, it's not packet loss; the capture process dropped them. So it's really hard to analyze that.

Yes. So when you're looking at the capture and it tells you "previous segment not seen" or whatever, and you don't see duplicate ACKs, or the receiver saying "hey, I missed a packet" with a dup ACK. Say you're missing packet five: you might see an ACK that says "I received packet five." You never saw five, but you see the acknowledgment say it received it. So you don't go into packet loss mode, you don't do retransmissions and duplicate ACKs; it continues on without that. You're just missing the packets.

So I did get the capture. I verified SACK and timestamps were not in the options for the three-way handshake. So, one, I said you need to turn SACK on. You do have packet loss in your environment; that's what kicked off this whole thing. SACK will help the performance if you enable it, to be able to use selective acknowledgments in TCP. Timestamps are generally on by default as well. And do you know what they're used for? One thing they're used for is called PAWS, P-A-W-S, Protection Against Wrapped Sequence numbers. It's exactly what we've got here: basically, within a certain time frame you can have the same sequence number on the wire, or at least out there, but it's really two different pieces of data, right, because we wrapped. It's not the same data.
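As a rough check on the rollover arithmetic described in the talk, the calculation can be sketched in a few lines of Python. This is an illustrative sketch, not something shown in the session: the function names are made up here, the 10.56 seconds is the measured time between two rollovers quoted by the speaker, and the 10 Gbit/s figure is only an assumed example rate.

```python
# A full pass through the 32-bit TCP sequence space is 2**32 bytes.
SEQ_SPACE_BYTES = 2**32


def throughput_from_wrap(seconds_per_wrap):
    """Average bits per second if one full sequence space is sent in the given time."""
    return SEQ_SPACE_BYTES * 8 / seconds_per_wrap


def seconds_to_wrap(bits_per_second):
    """How long the sequence space lasts at a given sustained rate."""
    return SEQ_SPACE_BYTES * 8 / bits_per_second


# 10.56 s between rollovers is the value read off the capture in the talk.
print(f"{throughput_from_wrap(10.56) / 1e9:.2f} Gbit/s")  # 3.25 Gbit/s

# Assumed example rate, not from the talk: a 10 Gbit/s flow.
print(f"{seconds_to_wrap(10e9):.2f} s per wrap at 10 Gbit/s")  # 3.44 s
```

The faster the link, the sooner the same sequence number can reappear for different data, which is the situation PAWS is meant to handle.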
So this helps TCP keep track of what's valid and what's not. I don't know if that would come into play here, and honestly it wasn't the problem. But I said, go back, find out why SACK was turned off, but turn it on and do your test. They did, and the problem went away. They opened a case with the Linux folks, who said, yeah, it should be on by default, SACK, you know, slap on the hand for turning it off. But they did acknowledge it was sort of a combination deadlock of the lack of SACK and some sort of TCP kernel issue, which they filed. I don't know who decided to mess around with the options and disable it, but once it was back on, the problem went away.

How are we doing? We're okay. Again, unless you really know what you're doing, I do not recommend changing TCP default settings. There are a lot of things I've seen on the internet, people talking about TIME_WAIT recycling and buckets and changing some of those things. Unless you really understand the implications of what you're doing, it's probably best to leave it as it is. And again, I'm having to do my best guess at a lot of this stuff. Maybe I would have seen it way sooner if I'd had the three-way handshake to see what options were missing. And the offloading can make things difficult, just like you were talking about with the acknowledgments: I'm receiving acknowledgments for tons of packets I never actually see on the wire because of offloading.

Okay, one more quick thing here before our last one. If you guys haven't heard about Wireshark, here's a little infomercial for you.

Is it a network problem? Is it an application problem? Do all your SNMP and NetFlow graphs look fine? Are you tired of playing the network blame game? Maybe you've tried finding some lost packets. You've probably even tried swapping the cables. If you've tried everything and still can't find the problem, you need Wireshark. Wireshark lets you look directly at the packets on the wire. No more blame game. You'll save the day and be the hero. The
best part: Wireshark is absolutely free. But that's not all. Included with Wireshark are editcap, mergecap, reordercap, and many more. When network problems get you down, turn to Wireshark. Wireshark is provided under the GPL, which is at no cost to you. Learning to use Wireshark will cost you time, practice, and training. Attendance at SharkFest is highly encouraged. Wireshark is not responsible if you read the packets wrong and tell the server team they have a server problem when it's really your ten-year-old firewall dropping packets.

All right, I hope you guys have met the star of that video. I think he's here this week. John Ford [inaudible], he's supposed to be here. So John Ford did not come? No, different John Ford. The guy in the video, he is also John. Yeah, different John Ford. The John Ford in this video is supposed to be at the Riverbed booth. Anyway, introduce yourself and tell him how much you enjoyed his thespian efforts for Wireshark.

All right, last one. This was interesting; it just came up at the Europe SharkFest. I talked to the guy and we threw this together real quick for Europe. This is also a pretty interesting one. So: blank web page. It's either so slow, so excruciating, that people just give up, or it just never seems to load. It just spins. It's basically unusable. The telco: yeah, we're always making changes, but, you know, nothing that will affect a web app. You know, finger-pointing. I'm sure you guys never have to deal with finger-pointing when it comes to application performance. And of course the network team's like, nah, our network is clean, no issues there. But a good way to figure out where to actually point the finger is a pcap.

So let's have a look at a pcap. I have a client and a server, which, again, if you can get a capture from both ends, is fantastic. In terms of where to start looking at the data: if someone is complaining, I want to start from their
perspective. What do the packets look like from the side of the connection of the people who are complaining? So this is the client side. If we have a look at the pcap that he gave me: it's a web app, so there are multiple TCP connections with a web app. Well, he was able to narrow it down to one connection, so I thank him very much for making the job a bit easier than digging through and trying to find the problem. He believes he's discovered the issue in the pcap and is just trying to understand it better.

So we have one TCP connection, and we have a three-way handshake. Let's come down here and have a look. We've got MSS, window scale, SACK, no timestamps, but, you know. And, oh, what's missing from this one? Have a look at those, and then we'll come down to this one. What's missing from the options? MSS. What happens if there's no MSS? Well, I believe RFC 879 says the default size is 536. Look at this. Is that the problem for a blank web page? Probably not. But yeah, that's definitely feedback to go to your, who was it, the server, yeah, the server team. You might want to fix that.

Okay, fine. So let's have a look. The round-trip time is about 44 milliseconds. Where, based on this, where is this? Well, he told you: the client side. You know it's between the SYN and SYN-ACK of the three-way handshake. And then there's about a 1.2 second delay for the next packet coming from the client. Is that a problem? Over a second is a long time on the network. Is it a problem? I don't know. At the beginning of a connection for HTTP, nothing is going to happen until the client asks for something, so we're waiting on this, which is a GET request. Maybe it's user time. Browsers will start up multiple connections to a server so they're already there, to cut down on the round-trip time, and then they get used as needed. So maybe the connection was set up and then the request didn't come down through this connection
until 1.2 seconds later. Client wait time, whatever. Client delays on the client side at the beginning of HTTP I'm not too worried about. But then there's 50 milliseconds, which is just barely over the round-trip time, from the server, so that's a pretty quick response. And it is sending data, 536. So we get a response back pretty quick.

Okay, we're off and going now. We'll do our little scroll, and you can see on the little cheater pane on the right we've got some nasty stuff coming up. We're going to scroll down, and some subtle highlighting jumps out here. This is a rule that I have, and if you want to know why something is highlighted, you can go under the frame header and see the coloring rule. So this is "delta time displayed is greater than 190 milliseconds." Sometime in the past I put this in to maybe help me spot delayed ACKs or something, and I just left it in. I find it useful.

So you can clearly see we have some delays, and look at these deltas: 300-something, 900-something, 3 seconds, 12 seconds, 48 seconds. Is that the normal backoff that we see? Not that I'm really aware of. If we're talking about the typical backoff for retransmissions, it's usually doubling. This is doing more than that. But if you're a computer person, a network person, a packet person, those numbers and the changes between them should definitely tickle the back of your brain. Those can't be an accident. Those deltas: 300, 900 milliseconds, then 3 seconds, 12 seconds, 48 seconds.

So let's back up a bit. These are weird retransmissions. This is why I like to have my sequence numbers right here, really close, ready to go. I remember years ago reading a comment on some website. A guy pasted some tcpdump output and asked a question about sequence numbers or something, and another guy responded and said, "I've been looking at tcpdump output for 15 years and I've never had to bother with sequence
numbers." I'm like, what are you doing? What are you even doing? Okay, I found that interesting, humorous.

So let's have a look. We've already talked about sequence numbers today and what they represent. I've got the sequence number of this packet, I've got the next sequence number that I expect to see in the next packet that it sends, and the current acknowledgment. So just for example, we'll start here. I'm just going to read off the last few digits. We have a current sequence number of 0748. We sent 536 bytes, so you add that to get the next sequence number, which is 1284. If we come down to the next packet, it is in fact 1284. We sent another whopping 536 bytes, so we expect the next sequence number to be 1820. We come down, and the next packet, sure enough, is 1820. Now 2356 is the next expected sequence number, and this is the first delay. That's right, 2356. Then 2892, and 2892. So it doesn't appear that these are retransmissions or packet loss. These packets are in order; they're just inexplicably delayed.

If we scroll down more, we see this number getting up to 60 seconds. If you're talking about a blank page, and you have a delay like this in the transport for HTTP, you see nothing happening on your browser screen. You're just waiting. You're going to get coffee, do whatever. You're frustrated, because something's happening in the background and we don't know what.

Eventually we do get a "previous segment not captured." So let's see if it really is. The last packet sent was 5036, and the next expected sequence number is 5572. What did we get? 6108. That's not in order, and if you do the math, it's 536 bytes that are missing. So it appears, possibly, that we dropped one packet. Could it be a capture problem? Well, let's see. The data continues on, and remember, the one we're looking for next is 5572. So this data rolls in, and then we start to ACK. And what do we ACK? 5572, even though we've gotten all these other
packets. Data continues to come in, and we continue to ACK 5572, saying, "Okay, you've sent me this data, but I'm missing 5572." For every subsequent packet you send me, I'm going to immediately respond to you with an ACK saying, "No, no, I need 5572." If we go down to the options: we did have SACK enabled here, and you can see it keeps a list, a range, of "here are the packets I do have, so you don't need to resend these to me, I just need 5572." So we have these duplicate ACKs. If we keep scrolling down, apparently a ways, there it is, we finally get 5572. So that's just recovery.

You say, well, why so many duplicate ACKs? How many did we get up to? We got up to like 56 duplicates. Again, you have to think about perspective. I'm on the receiving side. These packets that I'm receiving were sent, I forget, around 44 milliseconds, wasn't that the round-trip time? They were sent 44 milliseconds ago from the sender. He already sent them out the door. They're on their way here; no one can stop them from coming. So I'm going to receive them even though somewhere in the middle one got dropped. All those packets that were already out the door come in to me, and I say: duplicate ACK, no, you dropped one, or we dropped one, send me 5572. Let me find a good run of these dup ACKs. If you look at the SACK block, it starts at 6108, but for every new packet that comes in, it increases the right-hand side. You see the number going up. So these acknowledgments are acknowledging data that has already been received, and we're saying, "Hey, I've got them, I've got them, I've got them, but I still need this one," which we eventually get.

Okay, so we've got one bit of packet loss. Honestly, one bit of packet recovery is not necessarily a problem. It might slow down throughput; it's not going to cause a hang. Our real problem is back up here. Even though these are in order, there are huge delays. So let's take the first one that's delayed. I'm
going to copy, let's see, TCP sequence. I'm going to copy this as a filter: Copy, As Filter. Let's go to the sending side and see what it looks like. All right, I've got to fix that. Okay, so we can do, what is it, Ctrl+F, Find. I don't want to filter and only display the one packet; I just want to go right to this packet. So you can use Find with a display filter, and it will go find the packet that matches it. Here it is: 2356, that's our sequence number. It may be a little cramped; can you still read that?

All right, here's our delta column. Here's that packet, and here are the ones that were sent. Remember, we received them all in order. So here they are, in order. Do you see a delay? There is a one millisecond delay, but not 300 or 900 or whatever. All these packets left at the same time. Look, there's no delay between these packets going out.

If we start scrolling down, we can see something's coming up. We're just sending data, and then we get here, which is a retransmission of 2356. That's the one we're talking about. So, retransmission, 295 milliseconds. If we go back up, let me go back to the first one, and I will set a time reference on this guy. I'll go down to the retransmission. No, sorry, 350. So the time since I set that time reference, in my first time column over here on the left, is 350 milliseconds. That's a nice round timeout number. So I'm sending data all day long, da-da-da-da, and when I send the data, the timer starts. If that timer expires without the data being acknowledged, I have to resend it. Here the timer is clearly 350 milliseconds. I never receive the ACK, so I send it again. We come down, and here's the next one. So these packets were never ACKed; they're retransmitted.

On the other side, did we see packets show up twice? No. They were in order. They were heavily delayed, but they showed up only once. So what I want to know is: I sent this packet twice, it showed up once, which one showed up? I go to
the first one, and we're actually going to dive into the IP header. The IP ID is very useful for finding the same packet in different captures. The IP stack is shared across all TCP connections, so this increments for every packet that's sent on any connection from that host. If we look at this and scroll down: 3508, 09, 10, 11, 12, all in order. So maybe the server's not doing a lot otherwise.

So, 3508. Here's where it showed up. Was there a question? No. Yes: 3744. That's not it. If we go back to the server side and find the second one, it is, there it is. The retransmission showed up on the other side. And if we start looking at the deltas between these retransmissions, it might start to look pretty similar. So let's test the next one. Let's see, is it 28, yeah, 2892: 31 153. Come over here: 31 153. So we received the retransmissions; we did not receive the packet the first time it was sent. We can tell that with the IP ID. That's how we know.

Now, the ones that were coming in, 3506, these are the packets that were coming in before the delay started: 505, 506, 507. We would expect this to be 508. It's not. The 508 one never showed up. Poof. So these are the later retransmissions. The last one that we have here was 507. If we go down to after the delay stops and the packets start coming in again, now we're at like 607, 608.

So what we're seeing here, if we go back and look at these guys: we have packets coming in, everything's fine, then everything freezes. One pops up later, another one pops up later. They're all in order, there are huge delays, but our IP IDs are moving like this. The ones that show up have an IP ID way further down, and when the data stream picks back up, it's the original packets that were up here from before the retransmissions were sent. Does that make sense? Let me go back to this guy. So before the retransmissions started, we're sending data, and for about, you know,
whatever, five or six packets, these all disappeared. Then we got the retransmissions through, and all the packets that were sent in the meantime start popping up. They start coming in, the original ones. So this is 6644. Remember, this one was sent before the retransmissions. This is when everything is fine; we haven't retransmitted yet. So this is sequence number 6644 with 607 as the IP ID. Do I have 6644? This is it. So this is the exact packet.

So what's happening is: we're sending data, it stops, then retransmissions somehow poke their way through, and that's somehow what frees up a logjam somewhere, and all the original packets that have been held up this entire time come rushing in. That is not normal. If you see behavior like this, where packets get held up and retransmissions somehow magically get through, sometimes it'll be the third or the fourth retransmission that finally pokes through, and then these packets that have been held up somewhere just come out of nowhere, that is not normal. The times I've seen this, it's always been some kind of application packet inspection engine. It needs to get all the HTTP data to do security scans, to do whatever policies you want to do on it, and if there's any kind of problem with it, things freeze and then it just blows up, and either packets come through or the connection gives up and dies. In this case, while it seems to recover some, it continues to have problems. There's still a minute delay here, and eventually the server just gives up. It's like, you know what, we can't seem to get on the same page here, we give up.

So you take this data, and you capture from both ends. You capture at the edges of your responsibility, so you know that it's not you, in your environment, that's doing this. Again, this is not normal behavior. You take the data and you go back to the telco, who said, "Oh, we made some changes, but it's not the
network." You say, "Yes, it is. Here's the data. This is not normal behavior." Sure enough, they had a network probe doing packet inspection that had a bug. Patched the bug, problem went away. So finger-pointing can be put to an end for that.

I think we're about out of time here. Know the basics. IP ID: it's very important to be able to match packets. Be able to do some basic sequence number analysis, so you know where you are in the order of the flow. If you have it from both ends, that is fantastic; that's what really helped us here. If I'd just been looking at the client side, I'd have said, "Oh, you've got packet loss, you've got something," but I know much better what's happening when I can see it from both ends. When you have the data, you lean on whoever is responsible. I've been on many customer calls over my years doing this where it's this problem and people are upset, and I go, okay, let me walk you through the analysis with the packets. And what can they say? Maybe I'm wrong, but we can all agree that, hey, let's go in this direction next, let's go and figure out the next step. It really kind of shuts down all the finger-pointing. Hondo here is the person who sent this in. That's it.

Questions? Packets are fun, right? Hey. Yeah, so, I mean, we'd have to go back. It was either a weird timeout behavior on the sending TCP stack, these numbers match up roughly, I think they kind of do, or it was the probe itself that's causing the problem. It's hanging on to these packets, and then some of the retransmissions are getting through. So it's either the probe doing it, or we could go back and take a closer look at the sending side; it could be the TCP stack on the sending side.

I think I must have done this around Halloween, if anyone has ever seen David S. Pumpkins. But anyway, I'll be here the rest of the day if you'd like to chat about packets, or if anything I've said wasn't clear. We did try to move at a quick pace. I'm happy to talk with you if
you have packets you want to look at. Please do fill out your survey on the app. If you have case studies, please reach out to me here. I used to have Twitter on here, but I kind of gave up on Twitter, just as a personal decision. But yeah, reach out to me here; I'd be happy to talk about packets. Enjoy the rest of your day. [Applause]
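The two closing takeaways, basic sequence number analysis and matching packets by IP ID across captures, can be sketched in Python. This is a minimal illustration under stated assumptions, not the speaker's tooling: the `Segment` type and both helper functions are hypothetical, the packets are hand-entered tuples rather than parsed from a pcap, and the values are loosely based on the last-few-digits numbers read out in the talk (the IP IDs 605, 606 and 608 are made up).

```python
from typing import NamedTuple

SEQ_MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    ip_id: int   # IP Identification field (16-bit, so it repeats on busy hosts)
    seq: int     # TCP sequence number
    length: int  # TCP payload bytes


def find_gaps(segments):
    """Return (expected_seq, seen_seq) wherever a segment is not the next one in order."""
    gaps = []
    expected = None
    for seg in segments:
        if expected is not None and seg.seq != expected:
            gaps.append((expected, seg.seq))
        expected = (seg.seq + seg.length) % SEQ_MOD
    return gaps


def copies_that_arrived(sent, received):
    """For each sequence number sent more than once, list the IP IDs seen at the receiver."""
    received_ids = {seg.ip_id for seg in received}
    ids_by_seq = {}
    for seg in sent:
        ids_by_seq.setdefault(seg.seq, []).append(seg.ip_id)
    return {
        seq: [ip_id for ip_id in ids if ip_id in received_ids]
        for seq, ids in ids_by_seq.items()
        if len(ids) > 1
    }


# Receiver view: 5036 + 536 means 5572 is expected next, but 6108 shows up.
stream = [Segment(605, 4500, 536), Segment(606, 5036, 536), Segment(608, 6108, 536)]
print(find_gaps(stream))

# Sender sent 2356 twice (IP IDs 3508 and 3744); the receiver only saw 3744.
sent = [Segment(3508, 2356, 536), Segment(3744, 2356, 536)]
received = [Segment(3744, 2356, 536)]
print(copies_that_arrived(sent, received))
```

Matching on IP ID alone is only safe over a short window of one flow; combining it with the sequence number, as done here, reduces the chance of a false match.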
Info
Channel: SharkFest Wireshark Developer and User Conference
Views: 5,560
Keywords: SharkFest, Wireshark, Kary Rogers
Id: ClqlK7OEFCc
Length: 80min 1sec (4801 seconds)
Published: Thu Jun 13 2019