In my last video, I did a quick overview of how
to set up a NetGear modem with Google Fi for LTE cellular backup. And the reason I went through
all of that was ultimately to do this - which is to provide cellular backup to my home network.
Especially because as work from home recently has become so much more of an important thing, having
a reliable internet connection has suddenly also become super important. And unfortunately
my internet connection varies from day to day. So I wanted that ability to just fail over
my internet connection to something secondary, in case my internet completely went out or
was just performing so terribly that it was unusable. So the diagram you see right
now is a brief overview of what I'm trying to accomplish. I have my internal network, which
I'm just representing with a single switch and a single PC as internal clients - in reality
I have quite a handful of things. I also have a Cisco FirePower 1010 as my external firewall
for my home network. And I currently have two connections going into it - one is going to be
outbound to my existing internet connection and the second is going to the new NetGear LTE modem
that I just configured recently... and that's riding over a Google Fi cellular network. So
ultimately what I'm looking to accomplish is this: I want some way for me to be able
to monitor my existing connection, using IP SLA or some other route tracking, to
measure an external service - in this example I'm using Google DNS at 8.8.8.8. And I want to be
able to tell if that connection exceeds a certain amount of packet loss or packet latency. And if
that happens, go ahead and flip the connectivity over to the NetGear modem for now - but should my
primary internet connection come back into line with the thresholds that I've specified, fail
back over to my primary internet connection. So in order to accomplish this, I evaluated
a couple of different ways of trying to see what I could do using just the FirePower
box alone. And unfortunately there's not a good way within the current software base
to accomplish exactly what I was looking for. So instead I went the route of writing
custom automation using Python in the backend - to do both the monitoring and
the route injection and removal in case of failover. So first we're gonna have to go
through a set of changes on the firewall itself to get ready for this. So let's go ahead and
switch over to our firewall - and we'll log in. Okay, so the first thing we're going to need
to do is configure our interface for the LTE modem. Now as you can see in the diagram, it
looks like ports 1/7 and 1/8 provide power over ethernet - which will be helpful since I purchased
the Power over Ethernet variant of the NetGear modem, to reduce cabling and also benefit from my
battery backup. So first thing we're going to do, is go ahead and go down to interfaces. And we'll
go down to interface Ethernet 1/8, which is where I have the modem plugged in. And we'll edit that.
We're going to go ahead and change the name. Keep this as routed, set the status
to enabled, and we'll set our ipv4 to DHCP - so it can collect an address from the
modem. I will note that on the modem itself, I kept it in routed mode - so that
we have an upstream gateway that we can inject a route to. So we will remove the
"obtain default route via DHCP" - since we don't want a default route to push all of
our traffic over the modem. Next we'll go ahead and go over to the PoE tab - make sure
the Power over Ethernet is enabled and hit OK. Now we will need to deploy our changes for that
interface to come up. But what we also need to do, is add this interface into an existing security
zone. So we'll go ahead and go up to objects, and security zones. And in order to make
efficient use of my existing ACL policy and all of the configuration that I have already,
I'll just go ahead and add this to the existing 'outside' zone. So it'll benefit from all the
same rules that my existing internet connection has. So I'm going to add "LTE backup modem",
hit OK and OK. The benefit of doing that is that now I don't have to go through
all of my access list policies and change all of the rules - since all of the rules
are already allowing traffic from the internal network to the outside zone. But what we will
need to do, is configure a NAT policy. So go ahead and go up to policies > NAT. And we're
gonna add a new manual NAT, which we'll name "NAT_LTE". And we'll go ahead and set this to
"after auto NAT rules" and type to dynamic. Our source interface is still going to be our
internal network, which I have on a trunk port. And our translated is going to be the modem.
Source address, I'm going to set to my internal network. Destinations: any. Source for
the modem, we're gonna go ahead and set as interface. And we'll go ahead and hit OK. Next
we're gonna have to make one other change that's going to help me with my automation. We'll go
back up to the device page and go to routing. Now the way that the script I wrote works is
by injecting a static route into the FirePower appliance every time I want to
fail over to the external internet connection. Now in order to keep monitoring my primary
internet connection even when I'm failed over to the secondary connection, I'm going to add a
static route for the IP address that I intend on monitoring over my primary internet connection. So
we'll go ahead and hit create static route - and we'll name this as "failover_monitor". And our
interface is going to be our outside interface, not the modem. And for networks we're gonna
create a new network. We'll create a host for Google DNS which is gonna be 8.8.8.8 - hit OK and
add. And so we'll go ahead and type Google DNS, add that in here. And our gateway is gonna be my
upstream gateway from my current provider - so we'll add a new network object. And we'll add that
address in here, hit OK, go ahead and find it, add it in here - we'll keep the metric at 1 so that
it's always preferred and hit OK. Once we're done with all of those changes, we need to go ahead
and deploy them to the firewall. So we're going to hit up deployment, review our changes to make
sure that we have everything in here that we need, and hit deploy now. This is going to take a
minute so we'll come back in just a moment. Okay - now that our changes have deployed,
we'll go ahead and flip over to Visual Studio and look at some of the code. So the first
thing we'll see, is this options file that I created. And this is going to contain the
configuration settings that we'll use for our tests. So first we're going to see the options
relating to the hosts that we're monitoring and the thresholds that we're setting for
latency and loss. So you'll see what the ping target is - in this case I'm monitoring
8.8.8.8. We're gonna send 10 ping messages, every time we run the script. My max latency is
that I don't want to exceed 2000 milliseconds. And my max loss is no more than 20%. Next we'll have
a couple of settings that we want to configure for our FirePower itself. First is the address - in my
case I have a host name of just FDM. The username and password that we're gonna use for the API
credentials to log into the device. We're going to set what our failover route is. Now in a normal
case I would just set this to 0.0.0.0/0 - but for the purposes of this test we're just going
to inject a host route to fail over to the LTE modem. We'll also configure what our failover
gateway is - this is the IP address of the modem, that we'll be setting as our next hop. In this
case that'll be 192.168.5.1 - and the failover interface, which we just configured as Ethernet
1/8. Now the automation is made up of two different scripts - there is a path monitoring script
and the firepower script. We'll take a look at the path monitoring script first,
because that's gonna be the easiest. The first thing that we're gonna do is go ahead
and load all of our options out of the file. This script performs two primary functions - the
first one is going to be running our ping tests. So this is going to go ahead and send the 10
ICMP messages that we had configured already to the host and measure the latency and response
time. Then we'll go ahead and calculate what the loss and latency is, and make a determination
on whether or not that exceeds our thresholds or it stays within the thresholds. Depending on
the result, we're going to make a call to our FirePower module - to either add our new static
route over the LTE modem, or remove that route. So let's go ahead and take a look at the
FirePower script. Now this script is a little bit more involved than our path monitoring
script. This module contains all of the logic for creating network objects, creating gateways, route
entries, and automatically injecting & deleting the routing from the FirePower device
itself. So you have a bunch of config, including loading the options, the parameters for
host headers, and OAuth. We also have our config for authenticating to the FirePower, getting
the global routing table, adding our route, and removing a route. So this script is going
to check a couple of different things every time it's run. So for example, if our loss
and latency are within the thresholds and we're already on the primary connection, it's
gonna make a call to FirePower to make sure that the static route still does not exist - and
if it does exist remove it. If it finds it, we assume that we already failed over in the
past but the primary internet connection is good now - so we want to go ahead and remove that
route. In the event that our measurements exceed the thresholds for the primary internet
connection, this script will go ahead and add a static route to our destination - in this case
it's a single host, but again it could be just 0.0.0.0/0 for a default route. If our measurements
are outside of the bounds that we've specified, this will go to the FirePower, create any
network objects and routing objects it needs to, inject the static route over the LTE modem, and
then deploy the policy. Now if it runs again and we're still outside the thresholds - what it
will do, is it'll go back out to the FirePower and just check to make sure that route is still
there - and if so make no changes and quit. Okay, now let's go ahead and test our
script. In order to simulate a failure, we'll go ahead and change the max latency to
about 15 milliseconds. My primary internet connection usually stays between 20 to 30
milliseconds, so 15 should be well below that. Before we go ahead and run the script,
we're gonna run a quick traceroute to check what path we're taking out to the Internet
currently. All right and as we can see that is going out my current internet connection
- so now we'll go ahead and run our script. And we'll see pretty quickly, that the average
response time is 37.7 milliseconds - which is above our threshold of 15. And our script informs
us that that does violate the thresholds that we've set, and it is going to go ahead and fail
over to the secondary internet connection. And we'll see that it creates our routing object
out to 9.9.9.9 via our gateway of 192.168.5.1. After that's been done and the route has been
added, we'll go ahead and deploy the policy. Because policy deployments can sometimes take
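
That deploy step then has to wait for the firewall to finish. Here is a minimal sketch of such a wait loop in Python. This is not the code shown on screen: `get_state` is a hypothetical callable standing in for whatever request asks the FirePower for the deployment status, and the state names are placeholders rather than confirmed API values.

```python
import time
from typing import Callable

# Placeholder state names; these are assumptions, not confirmed FirePower API values.
DONE_STATES = {"DEPLOYED"}
FAILED_STATES = {"FAILED"}


def wait_for_deployment(
    get_state: Callable[[], str],
    interval: float = 5.0,
    max_checks: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until the deployment finishes.

    Returns True once a done state is seen, and False on a failed
    state or when max_checks polls pass without a final state.
    """
    for _ in range(max_checks):
        state = get_state()
        if state in DONE_STATES:
            return True
        if state in FAILED_STATES:
            return False
        sleep(interval)  # back off before checking again
    return False
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without real delays. Capping the number of checks keeps a stuck deployment from hanging the script forever.
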
a minute - the script will continuously check to see what the status of our deployment
is, back off for about five to ten seconds, reach back out, and check it again
to see if it has been completed. Okay now that our deployment has successfully
completed - our script does inform us that traffic has been failed over to the backup
connection. Let's go ahead and verify that by performing our traceroute again. All
right and as we can pretty quickly see our second hop in the list is 192.168.5.1 - so
we are going over our LTE internet connection. We can also validate this by going back to
the FirePower interface, going to our static routing config - and we see that we do have our
backup route out the LTE backup modem interface, for the /32 host route that we configured via
our gateway. Now as I was mentioning before, if we run this again while it's failed over
- we'll see that the loss and latency still violate the thresholds, but the route
already exists, so there are no changes necessary. So let's go ahead and fail back
over to our primary internet connection, by updating our max latency back to the 2000
milliseconds and running our script again. Now in this case, we see that our average response
time is only 55 milliseconds - which is far less than the 2000 milliseconds configured. So
our loss and latency are definitely within the thresholds that we have specified. So
next, our script is going to go ahead and reach out to the FirePower device, find the
existing static route that we have configured, and delete it. Then we go ahead and deploy the
policy changes - which happened pretty quickly this time around. And then we can go ahead and
check our traceroute one last time - and we'll see that the connection has failed back
over to the primary internet connection. Again we can verify this by going back
to the FirePower device - and when we look at our static routing config now, we have no
host route out the LTE cellular connection. I hope this video is helpful and if you're
interested in using the script or learning more about it, I'll go ahead and post the
code to GitHub - Check the video description for a link. Well that's all I had for
today then - Thank you for watching!!
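
As a companion to the walkthrough, the threshold check and the failover decision described in this video can be sketched roughly as follows. This is an illustration under assumptions, not the script shown on screen: the names `Thresholds`, `summarize`, and `decide` are hypothetical, a lost ping is represented as `None`, and the FirePower calls that would act on the result are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Thresholds:
    # Defaults mirror the values quoted in the video.
    max_latency_ms: float = 2000.0
    max_loss_pct: float = 20.0


def summarize(rtts_ms: List[Optional[float]]) -> Tuple[float, float]:
    """Return (loss_pct, avg_latency_ms); None marks a lost ping."""
    if not rtts_ms:
        raise ValueError("at least one ping result is required")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    # With no replies at all, treat latency as infinite so it always violates.
    avg_ms = sum(replies) / len(replies) if replies else float("inf")
    return loss_pct, avg_ms


def decide(loss_pct: float, avg_ms: float, limits: Thresholds,
           backup_route_present: bool) -> str:
    """Map measurements plus current route state to one action."""
    healthy = (loss_pct <= limits.max_loss_pct
               and avg_ms <= limits.max_latency_ms)
    if healthy:
        # Primary is fine: fail back if we failed over earlier.
        return "remove_route" if backup_route_present else "no_change"
    # Primary is bad: fail over unless we already have.
    return "no_change" if backup_route_present else "add_route"
```

With the test values from the demo, `decide(0.0, 37.7, Thresholds(max_latency_ms=15.0), False)` gives `"add_route"`, and `decide(0.0, 55.0, Thresholds(), True)` gives `"remove_route"`.
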