Raspberry Pi Network Troubleshooting

Captions
Welcome to Crosstalk Solutions. My name is Chris, and in a video that I released about a month ago called Getting Started with Raspberry Pi 4, I mentioned that one of my favorite things to do with a Raspberry Pi is network monitoring, and really more specifically network monitoring and troubleshooting. Now, I had a need come up recently where I'm going to be using a Raspberry Pi for some network troubleshooting, and I thought it was a really good opportunity to show one example of how a Raspberry Pi can be extremely useful in troubleshooting a pesky network problem that is pretty difficult to troubleshoot otherwise.

So what I'm going to do in this video: first, I'm gonna show you the script that we have running on the Raspberry Pi, what it does, and how it's going to work. Then I'm gonna dig into the actual problem itself, for those of you who are interested in maybe helping me try to troubleshoot this thing. You can give your suggestions, but that'll be on the back half of the video, because I don't want to waste time up front talking about that. Let's just talk about the script.

The script that I'm running on this Raspberry Pi will be available for you guys to download and modify as you see fit. I will make sure that there is a link to the script in the description of this video.

So what am I using here? I am using a Raspberry Pi 4 with 4 gigs of RAM. This is actually overkill for what I'm doing. You could basically do this same exact network troubleshooting script and setup on like a Raspberry Pi Zero W if you wanted. I prefer to have the hardwired network cable, but I certainly don't need the 4 gig version. I could use a Raspberry Pi 3 with one gig of RAM, no problem. But since this is what I happen to have on hand, this is what we're going to be using, and I'm only gonna have this at the client site temporarily. Eventually I'll be bringing it back here to my office. So this is the Raspberry Pi 4. I have this rainbow case from Pimoroni. I also have the Pimoroni Fan SHIM, and I have set up the Fan SHIM to come on when it hits a certain temperature and turn off when it goes below a certain temperature. It's not necessary to do that, but I like to make sure that these things are running properly. So, a really simple setup as far as the Raspberry Pi goes. Imagine you started with my Raspberry Pi getting started video, and we're kind of picking it up from there.

Here I have the SSH interface of this Raspberry Pi, and I have a folder here called nettest. If we look at nettest, we're gonna see a file called nettest.sh, so let's go ahead and edit that file. This is the script that I'm going to be running. The purpose of this script is to check for Internet connectivity, and when it detects that there is no Internet connectivity, then we want to do a tcpdump.

The reason that I'm running this script is I have a client that is having a very odd network problem. They have a dead simple network. Again, I will dig into this in more detail on the back half of this video, but the gist of why I'm having to do this is because very sporadically, and we're talking about sometimes it happens three times a day, sometimes it happens once a week, so very sporadic, this problem happens. When the problem occurs, the client loses Internet access and complete network access, but only for their wired clients, OK? Their wireless clients are still just fine. So this is just affecting the wired network, and I cannot for the life of me figure out what's going on. Again, I will dig into all the steps that I've taken to troubleshoot this so far, and we are now at the point where I want this device to be on site, so that when it detects the Internet is out, it starts doing a packet capture of what's happening on the network while the Internet is down. Once the Internet comes back up, it stops doing the packet capture. So the reason being, or the notion being, is that I need to be able to
see what's happening on their network when this problem is happening, and I have not been able to catch it in person. I don't want to sit down at their office all day, every day, until this problem just happens to occur. The Internet and network go down for anywhere from like five to 20 minutes at a time, and then it comes back up, and it might be another week before the problem happens again. But it's happening often enough that it's a real pain in the butt for this client, and I totally understand. There's a number of things that I suspect could be happening, but I need to actually be able to see that network while the Internet is down, like basically see what those clients are seeing on the network when they don't have Internet connectivity. That's what I'm trying to accomplish here.

So that being said, let's take a look at this script. It's a very simple script. We have a ping delay time, so this is the ping time in seconds: how often are we checking for Internet connectivity? I'm checking every 5 seconds. Originally this script was set to every 1 second, but five seconds should be plenty. Then we can turn the debug on and off, and essentially this means all of the output of the script can be enabled or disabled. So you see all these things here: if debug is on, then we're gonna echo out this line, right? And that's basically just for the developers, and for when you're testing out the script, to be able to see that sort of stuff.

So what are we doing here? Let me scroll down. Basically what we're saying here is we're going to do ping -c 1, OK, so that's a count of one, to this IP address, 8.8.8.8. Now, that's Google's DNS server. You can pick any IP address or FQDN, and as a matter of fact, we are redoing this script and adding this in as a variable instead, so that we'll have a ping destination variable up at the top that you can set to be 8.8.8.8 or 1.1.1.1 or any IP address or FQDN of your choosing,
and then it will populate the ping command down here. So basically, if the ping succeeds, then we just go to sleep for the amount of the ping delay, and then we try pinging again. Basically we're pinging every five seconds. However, if the ping fails, then we are first going to do some cleanup work: it kills any existing tcpdump processes that are running on the server, and then we do a brand new tcpdump. So we're doing a tcpdump with a bunch of different variables that are related to tcpdump, and we're putting it out to the root folder, or really whatever folder you're running this script from, and then we append the date and time of the packet capture. So that's really all there is to it. This is a pretty simple script. It just says: ping the Internet, and when the Internet is not pingable, start doing a tcpdump and save those files.

Now that's sort of part one of what we're doing here. Let me go ahead and run that script so you guys can see it in action. So, ./nettest.sh. It's going to wait five seconds and then it's gonna start. Boom, ping succeeded. It's gonna wait five more seconds, it's gonna do it again. Boom, ping succeeded, right? So right now it's just pinging every five seconds. I also have another version of this running in the background, so let's go ahead and hit Ctrl+C, and what I'm gonna do here is unplug the network cable from the Raspberry Pi. Keep in mind that it is running that script on the Raspberry Pi already. So we're just gonna unplug the network cable, and let's wait for about thirty seconds.

OK, thirty seconds is up. We're plugging the network cable back in. Let's see if I still have connectivity. Sometimes it saves it, sometimes it doesn't. It looks like I'll have to reconnect here. There we go. OK, so I didn't even lose connectivity, it actually came right back. So now let's go to the root, and here we can see some tcpdumps. Right now it is 1:20 p.m. on January 14th, and we can see here, January 14th, 1:19 p.m.,
we have a tcpdump now. This tcpdump is not going to be very interesting, because I had the network cable completely unplugged, so it wasn't seeing anything. But the notion is, if the Internet goes down and I have this plugged into an active switch, like they have a 24-port switch on premises, then it will be able to see broadcast traffic, it will be able to see ARP traffic, and it'll be able to see a broadcast storm, if something like that is going on that is actually flooding the network so badly that it is causing connections to be interrupted. So we'll be able to see all of that stuff on a tcpdump with this plugged into the switch at the client's site. Then of course you can take that pcap off and open it up in Wireshark and look through the captures.

So there are three improvements that I'm having made to this script. This is basically the first version of the script. It's good enough for this client, but I'd like to see the script a little bit better. So the first thing is we are making the pingable... whoa, look at that light reflecting on me. This is from my window. It's super bright and sunny outside and I've got this light on my neck. Let me fix that real quick. OK, that's better.

So we have three improvements that we're making to this script. Number one, we're making the pingable address a variable. We're also going to make the interface a variable, so for instance, if you only wanted to do the tcpdump or the pinging through a specific interface such as eth0, we're gonna make that a variable as well, so that you can actually specify whether you're running this script on the wireless interface of the Raspberry Pi or on the wired interface of the Raspberry Pi. Also, this doesn't have to be a Raspberry Pi. This is a script that can run on any Linux-based server, or at least it should. Finally, the third improvement that we're making is for when Internet connectivity comes back. So once Internet connectivity is detected once
again, I'm putting in a mechanism to automatically upload any tcpdump captures, or pcaps, that were taken while the Internet was down to an FTP server. Right now that's a manual process. I have to get into the server, I have to download those to my own FTP server, and blah blah blah, right? So it'll automatically upload the pcaps for me once this is all said and done.

The next way that this is an important troubleshooting tool is that here at Crosstalk we have a system where servers can phone home, OK? I have this system set up so that when it boots up, it checks in with what we call our RMS system. Our RMS system is basically a central interface that we can VPN into, and if I need access to this Raspberry Pi, I can edit this Raspberry Pi's details and check a box that says "connect with VPN." Then the next time that this device checks in with our central console, it's going to say, "Oh, Chris needs me to initiate a VPN connection," and it will open up a VPN connection out to the RMS server, which I am also VPN'd into. So then I have this VPN connection into my client's network without having them open up any ports in their firewall or anything like that. If we go here to ifconfig on the Raspberry Pi, we can see that we have a VPN tunnel 0 at 172.16.18.134.

When all is said and done, I will be placing this device at the client site for a week or two, depending on how long it takes me to actually catch this problem happening, and whether or not catching the problem happening gives me any useful hints as to what's going on on their network. I also will have remote access to it whenever I need it, so I can check it every so often to see if it's done any packet captures. If it has done any packet captures, I can then FTP those off onto my FTP server, open them up in Wireshark, and start doing some troubleshooting. So yeah, just a really small example of how a Raspberry Pi can be incredibly useful to troubleshoot something that is very difficult
to troubleshoot otherwise. Of course you can do all the same stuff if they just have like a Linux server on their network somewhere, but this just makes it super convenient to drop this off, plug it in, and come pick it back up a couple weeks later.

All right, for those of you who are still here and are interested in more specifics on this problem that I'm running up against, I would love your thoughts and input down in the comments below. I hope I'm not inviting too much network administration backseat driving, but what I want to do here is go over the customer's network, go over what I have done already to troubleshoot this issue, and basically what my suspicions are as to what the problem might be.

OK, so as I said, this is a very, very small office network. I think they only have about six or seven employees total, and they have no more than 30 devices, wired and wireless combined, on the network in total. So, a very small network. They have a Charter 300 by 20 Internet connection. We have already confirmed that the Internet is not the issue, because wireless clients still do have Internet access while this problem is occurring. It's only the wired clients that have issues. So that comes into a Charter Spectrum modem that they have on site, and then we're pumping everything into a UniFi USG, and then out of the UniFi USG we go to this small switch. Now, this is not an ideal switch, but again, this is a very small network with very few people. I didn't want to spend 100-plus dollars on a UniFi switch for basically just a switch that has one connection over to another switch, a printer, and an access point plugged into it. So we have this TP-Link eight-port switch. Into that switch we have plugged one UAP-AC-Pro access point, where we have some wireless clients. I think they only have one or two actual computers on the wireless network, and then they have just the regular devices, you know, cell
phones and smartphones, etc. They have this big printer here, so this is one of those standing Canon printer-copier things that is about the size of a VW Bug, sitting in the middle of their office. That is plugged into this small TP-Link switch. And then we have a cross-connect that goes from this small switch, where the Internet modem and the USG and the access point are, and they have a long cable run, it's about a hundred feet, that goes downstairs into a basement where we have a US 24-port switch. Then that fans out to all of the wired clients. From the basement closet it fans out to all of the wireless clients, or wired clients, excuse me, throughout the rest of the office. So basically all of the Cat5 connections in the wall drill through to this small little closet down below, where I've got this US-24 switch.

OK, so what have I done so far to troubleshoot this problem? First of all, what happens here is, seemingly, when the problem is occurring, this switch goes offline, so I can no longer see this switch in UniFi. In UniFi, however, I can still see the UAP-AC-Pro and I can still see the UniFi USG firewall. So it's not the Internet going out. I still have visibility to those two devices. I only lose visibility to the US-24, where all of these wired clients are down below, and they are all of course affected by the outage as well. The wired clients not only cannot see the Internet, they cannot see the rest of the network while this problem is occurring. So for instance, they have a Synology NAS that they use as a file share. Those wired clients cannot see the Synology NAS while the problem's happening. And again, the problem itself sometimes happens three times a day, sometimes it doesn't happen for a week. It's very, very sporadic, and when it occurs, the Internet and network are only down for five to twenty minutes. That's about the time frame that we are down. Wireless clients are not affected by the outage. And so what have I done so
far? When I first got there, I said, well, let me take a look. I replaced this switch. They had an even junkier switch in place before this TP-Link. I happened to have the TP-Link just as a spare, so I swapped their really junky switch with this TP-Link, which is still a junky switch, but it's not the world's most junky switch. So I swapped that first. I thought, well, this is kind of a cruddy switch, maybe that's the problem, maybe it's causing issues with the wired clients. That didn't fix it.

Then I made sure that all of this equipment here was up to date. And by the way, I don't remember exactly the order that I did all of this troubleshooting, so people might be like, "Well, you should have updated the firmware before you replaced that." You're right, maybe I should have. I don't remember the exact order in which I've done things here. So all of the equipment is up to date on the latest UniFi firmware, which is the latest as of whatever version of the UniFi controller I'm running. This problem, by the way, didn't start happening with the introduction of any hardware. All of this was working fine for a long time. I think I put the switch in close to a year ago, and it was working for probably a good eight months, no problem, before this problem started happening.

So all of the firmware was updated. I tested this cable run, because I was like, maybe there's something flaky and weird going on with this cable run. So I brought my Pockethernet out there and I tested the cable run. The cable tests out perfectly fine. Everything looks good on this cable run. Now, that being said, there could be something in the walls, or a mouse chewing on the cable, or something, who knows, that causes the problem. But I would think that if there was a physical problem with the cable, it wouldn't be so sporadic, right? It wouldn't be that the cable goes in and out, unless it's rubbing on something, and like a mouse walks by it and shorts the cable out for five minutes or something. I don't know. Like I said, I don't really know what the problem is here. So I tested the cable, though, and the cable tests out fine.

Then we have this US-24 switch, which was my next likely culprit, because all of the wired clients are affected and all of the wired clients are in this US-24 switch. So I replaced that switch, OK? I had just another spare switch, so I went down there. They're actually still temporarily running on my spare switch. I left the US-24 switch on the wall, but only plugged into the other switch with one cable, and then I moved everything else into a brand new switch, thinking that maybe there's some problem with this US-24, I could RMA it, and we'd be all good. No dice. So that switch was not the issue. The problem is still happening even though I have replaced this US 24-port switch with a temporary replacement.

I don't remember if I've done any other troubleshooting. I've probably spent 2 to 3 hours on site looking at different things, just making sure everything's up to date and trying to figure out what the heck could be going on. Looking at packet captures while the problem is not occurring, there doesn't seem to be any problem with this network whatsoever.

So, my suspicions as to what it could possibly be. Number one, it could be some sort of broadcast storm that is basically just clogging up this switch, clogging up all of these wired connections, but perhaps is not affecting wireless, right? So if it's a broadcast storm, that's one of the reasons why I have this Raspberry Pi network monitoring tool going in tomorrow, because if there's a broadcast storm when the Internet cuts out, theoretically I should be able to see that in the packet capture, when I actually get the packet capture after the Internet goes out. So that's number one. Number two, I mean, it could be some sort of virus or something. Now, all of their computers, all of their clients, test clean. I don't see any malware or viruses on their systems. So I mean, it could be a system
where something, where they're like affected by a DDoS, or they got some... I don't know. It doesn't seem like there's any virus or malware happening, but I would probably check that a little bit more thoroughly if this doesn't turn anything up.

The other thing I thought is maybe there's some sort of issue with a device, like a rogue device that's taking over the IP address of their gateway temporarily. Like every so often it ARPs itself out as the default gateway on the network. That could affect everything in this way, except there are two problems with that. Number one, you would think that that would also still affect the wireless clients, but it does not. Wireless clients are unaffected by that. I don't recall if I have the LAN to WLAN multicast data blocking enabled on the wireless stuff or not. I think I disabled that as part of the troubleshooting here. But regardless, an ARP broadcast from a different device saying "hey, I'm your gateway over here" could be the issue. It just seems like it wouldn't be so sporadic if that was the problem.

The other thing I'm gonna check is this printer, because at one point, and it could be a coincidence, they told me that they were not able to get to the printer. They were having that Internet issue where the Internet was down, and they powered off the printer, and it cleared up the Internet issue. But again, I wasn't there for that, so it could have been just total coincidence. I will take a look at that printer as well, though unless it was completely misconfigured, I don't see how that printer could be causing a problem like this.

So anyways, let me know your thoughts down below. I'll do a follow-up to this video if I actually do get to the bottom of this problem and figure it out. It's such a small, simple network that, I don't know, I just want to see where this problem is originating. Again, is it one of the client computers that has an issue that is causing it to, you know,
blast the network so much that it's causing an outage at certain points of the day, or something like that? I just don't know. So we're trying to get more visibility, and that's where this project comes in. If you guys listened to everything that I said here and you think you might know what's going on, put that down in the comments below. I'd love to hear your ideas, especially before I head out to the client's site for some additional troubleshooting.

All right, well, I hope you guys enjoyed this video. It's a little bit different than the videos that I usually do on the channel, but I thought it was an interesting application for a Raspberry Pi: network monitoring and troubleshooting. If you enjoyed this video, please give me a thumbs up, and if you'd like to see more videos like this, please click subscribe. My name is Chris with Crosstalk Solutions, and thank you so much for watching. [Music]
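For readers following along in text, the loop described in the captions (ping every few seconds, start a tcpdump when the ping fails, stop it when connectivity returns) can be sketched roughly as below. This is a minimal Python sketch, not the actual script from the video, which is a shell script linked from the video description. The names `PING_DEST`, `PING_DELAY`, `INTERFACE`, `DEBUG`, and `step`, the `eth0` interface name, the 2-second ping timeout, and the pcap filename pattern are all assumptions made for illustration. It also tracks a single capture process instead of killing every running tcpdump the way the video describes.

```python
import subprocess
import time
from datetime import datetime

PING_DEST = "8.8.8.8"   # any IP address or FQDN; 1.1.1.1 would work too
PING_DELAY = 5          # seconds between connectivity checks
INTERFACE = "eth0"      # assumed name of the wired interface
DEBUG = True            # print the result of every check


def ping_cmd(dest=PING_DEST, interface=None):
    """Build a one-packet ping command (flags assume Linux iputils ping)."""
    cmd = ["ping", "-c", "1", "-W", "2"]
    if interface:
        cmd += ["-I", interface]
    return cmd + [dest]


def capture_cmd(interface, when, out_dir="."):
    """Build a tcpdump command that writes a timestamped pcap file."""
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return ["tcpdump", "-i", interface, "-w", f"{out_dir}/tcpdump_{stamp}.pcap"]


def internet_up(dest=PING_DEST, interface=None):
    """Return True if a single ping gets a reply."""
    result = subprocess.run(
        ping_cmd(dest, interface),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def step(up, capture, start_capture):
    """Handle one check: start a capture on failure, stop it on recovery.

    `capture` is the running capture process or None; the new value is returned.
    """
    if not up and capture is None:
        return start_capture()
    if up and capture is not None:
        capture.terminate()   # SIGTERM lets tcpdump close the pcap cleanly
        capture.wait()
        return None
    return capture


def main():
    """Check connectivity forever, capturing traffic while it is down."""
    capture = None
    while True:
        up = internet_up(PING_DEST, INTERFACE)
        if DEBUG:
            print("ping succeeded" if up else "ping failed")
        capture = step(
            up,
            capture,
            lambda: subprocess.Popen(capture_cmd(INTERFACE, datetime.now())),
        )
        time.sleep(PING_DELAY)
```

Calling `main()` needs root (or equivalent capture privileges) for tcpdump, and the `-W` and `-I` ping flags assume the Linux iputils version of ping. Keeping the start/stop decision in `step` means the logic can be exercised with a fake process object, without touching the network.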
Info
Channel: Crosstalk Solutions
Views: 41,663
Id: gV4g51zsvU0
Length: 23min 8sec (1388 seconds)
Published: Thu Jan 16 2020