Understanding Check Point ClusterXL Part 1

Captions

Shalom and welcome, everyone. Once again, I want to thank you for joining me in Check Point Training Bytes. Check Point Training Bytes is where we bring you advanced training on different Check Point products, features, and blades. In this module we'll be looking at ClusterXL and ClusterXL technologies. Again, my name is Manuel Joaquin and I will be your instructor, so please sit back, relax, and let's get started.

So first of all, let's take a look at the agenda of this module. The first topic we're going to discuss is what clustering is and what ClusterXL is. Then we'll take a look at the ClusterXL methods and discuss the different modes that these methods contain. Third, we'll take a look at ClusterXL synchronization and how synchronization works with ClusterXL. Fourth, we'll look at the Cluster Control Protocol, what this protocol does and what its purpose is. And finally, we'll look at some pnotes and clustering statuses.

So let's first discuss why you would need clustering. Imagine you're sitting at your PC and you're trying to get to the Internet. Traffic is forwarded to the firewall, and from the firewall out to the Internet, to some server out there that you're accessing and viewing information on. Now imagine the firewall crashed while you were trying to view that information: you would not be able to get any data from that web server. Clustering is a solution that offers redundancy. Redundancy means that you have more than one system in case one of the systems fails. If a system fails, all traffic will be directed to a redundant firewall, or backup firewall, and the traffic will go out to the Internet and then back to you. One other condition of a cluster is that the failover should be transparent, meaning that the user will not know that there was actually a failure on the firewall. So if a firewall fails, from the user's perspective he will not see any performance degradation on the network, because the traffic will be forwarded to the
firewall that's still active. So it's transparent from the user's perspective, and it's also transparent from the administrator's perspective: the administrator did not need to change any routes to forward the traffic to the secondary firewall. It was automatically forwarded to the secondary firewall and out to the Internet.

Clustering can also refer to fault tolerance. Fault tolerance in essence means that you can have more than one system, and obviously you can have more than one user, so half of the traffic will be sent through one firewall and the other half will be sent through the other firewall. If one of the firewalls fails, then the traffic that was sent through that firewall will be redirected to the firewall that's still up. So fault tolerance in essence means that whatever traffic was being processed by one of the systems is now going to be processed by the other system.

So to recap: a cluster should, first of all, be redundant, meaning that you have more than one system for backup purposes. Two, it should be transparent from the user's perspective and also from the administrator's perspective. And three, the cluster should also support fault tolerance, so that the load that was being processed by the firewall that went down is distributed to the firewall that's still active.

So now that we've discussed what a cluster is, let's discuss what a cluster is not. A cluster is not where we have a redundant pair of firewalls where, let's say, both firewalls have the same IP address, one of the firewalls is turned on, and the other firewall is turned off. They're both stacked in the same rack in the server room, and obviously all traffic is going to be forwarded through the machine that is on. In case that machine goes down, the administrator has to go into the server room and power on the machine that was off, and now all the traffic is going to be forwarded to this new machine. This is not a cluster, because it was not transparent, meaning that
the administrator had to go into the server room and power it on. Also, from the users' perspective, we had some downtime while they waited for the administrator to power it on. And it's also not fault tolerant, because there is no traffic and load distribution. But yes, it is redundant, because you have multiple systems. Still, this is not a cluster.

So now let's take a look at another solution that is also not clustering. Assume that we have two firewalls. Both firewalls are powered on and running, and each firewall has its own IP address: this one is .1 and this one is .2. On the users' PCs, the default gateways are all pointing to .1: this machine is pointing to .1, and this one is also pointing to .1, so all traffic is being forwarded to firewall .1. In case firewall .1 goes down, the users have to go to the default gateway setting and change it to .2, so that all traffic will now be redirected to gateway .2. Now, this is redundant, because you have multiple systems, but it's not transparent from the users' perspective, because the users have to change the default gateway.

I've seen a similar solution where, instead of defining it at the gateway level on the PCs, they place routers upstream, closest to the firewalls, and the routers themselves have the default gateway, let's say pointing to .1. Now if .1 crashes and goes down, the administrator has to connect to the router and change the default gateway to .2, so all traffic will be forwarded to .2. Both of these solutions are feasible in the sense that you have redundancy built in: you have multiple gateways backing each other up. But you have limited transparency, because either the administrator has to define the routes on the routers or the users have to change the routes on their PCs.

A better solution that I've seen administrators use is to enable dynamic routing, like RIP or OSPF, on the firewalls and on the routers. If one of the
firewalls goes down, dynamic routing will learn about the next best path and, in this case, will forward the traffic to firewall 2. So this is a valid solution in the sense that you have redundancy: you have multiple gateways backing each other up. But there's limited transparency, in the sense that if a user is going out to the Internet while the firewall goes down, he can experience downtime while the routers are learning and propagating the new routes.

So now that we've defined what clustering is not, let's define what clustering is. A cluster should be a redundant solution, a cluster should be a transparent solution, and a cluster should be a fault-tolerant solution. Now, you don't have to have all three of these ingredients, but the more of them you have, the better and more stable your clustering solution is. So clustering should be a redundant solution where we have multiple gateways: a minimum of two gateways, and a maximum of about eight gateways. It's also a transparent solution, where we have a virtual IP address. All PCs will be pointing to this virtual IP address as their default gateway, so all traffic will be forwarded to this virtual default gateway, .3. And this .3 is a virtual IP address that's owned by all the machines that are part of the cluster. So all traffic will be forwarded to all the machines that are part of the cluster, and one or more machines can process the connection, depending on what kind of clustering solution you've enabled. And depending on what clustering solution you've enabled, you can also guarantee fault tolerance if needed.

So now let's talk about the different clustering solutions that Check Point offers. The first is ClusterXL. It's a proprietary Check Point clustering solution, invented by Check Point and only used on Check Point firewalls. The second is VRRP. VRRP is an industry-standard clustering solution that's used by multiple vendors, like Check Point, Nokia, Juniper, and Cisco. So Check Point offers
both solutions. In this video I'm only going to talk about ClusterXL; in the next video I will do a demonstration and presentation of VRRP.

So, ClusterXL. There are two methods of setting up ClusterXL. The first method is called high availability. The second method is called load sharing, or load balancing; you can use these terms interchangeably. So there are two methods: high availability and load sharing.

Now, within these two methods there are modes. For high availability there are two modes. The first mode is called the legacy mode. Legacy high availability mode is, like the word says, legacy: it's no longer used and no longer supported. The second mode is called the new HA mode. The new HA mode is the one that most people use; it is currently supported and used on SPLAT and also on the Gaia operating system. Legacy HA mode, as mentioned, is no longer supported; in its place we now support VRRP. Legacy HA was supported on SPLAT; VRRP is supported on Gaia.

In load sharing, or load balancing, we also have two modes. The simplified version is called the unicast mode. Unicast is an easy, quick way of setting up load sharing. The more advanced, more powerful mode is called multicast. Multicast is a more efficient, more powerful way of doing load sharing clustering.

Okay, so let's take a look at the different clustering solutions. Let's discuss the high availability solution first. In high availability you have two or more firewalls in the cluster. One of these firewalls is going to be in active status, and the rest of the firewalls are going to be in standby status. The active machine is the one that's going to be processing the connections and forwarding them out to the Internet. In case that active machine ever goes down, has issues, or crashes, the standby machines, which are monitoring the state of the cluster, recognize that the active machine went down, and so
they promote themselves from standby mode into active mode. All traffic will now be redirected to this new active machine, which is going to process the connections and forward them out to the Internet.

Next, let's discuss the other method of clustering, which is load sharing, or load balancing. In load sharing, all firewalls are in active status. You have two or more firewalls in active status, and all traffic is going to be forwarded to the active machines. In this case all the machines are going to receive the connections, they run an algorithm, and one of these machines is going to process the connection and forward it through. So you have load distribution, because you have two or more machines in the cluster processing connections at the same time.

So just to summarize, we have two ClusterXL methods: the high availability method and the load sharing, or load balancing, method. In high availability there are two modes: the legacy mode, which has been discontinued, and the new high availability mode, which is an active/standby solution. In load sharing there are the unicast and multicast modes, which are both active/active solutions. We also offer the VRRP mode, because legacy HA has been discontinued. Legacy HA was supported on the SPLAT operating system, and SPLAT is being deprecated, so on Gaia we support VRRP. We also support VRRP on the IPSO operating system.

In order to make ClusterXL redundant and transparent, ClusterXL works with a virtual IP address. There is a virtual IP address that all traffic is forwarded to, and each machine obviously also has its own physical IP address. When traffic is forwarded to the virtual IP address, it can be processed by one or more machines, depending on the clustering solution that you have. If you're using HA, only one machine is going to be active and the others are going to be standby, so all traffic will be forwarded to the active machine. If
you're using the load sharing, or load balancing, solution, you're going to have two machines in active status. Traffic will be forwarded to the virtual IP address, and it can potentially be processed by one or more machines, in either the unicast or the multicast mode.

There are some requirements when you're setting up ClusterXL, in order for ClusterXL to work properly. The first requirement is that the operating system should be the same on all the machines in the cluster. So if one machine is running, let's say, the Gaia operating system, all the machines in the cluster have to be running the same Gaia version. The second requirement is the firewall version: if one machine is running, let's say, firewall version R77, all the machines in the cluster have to be running R77. The third requirement is the hotfixes: if one machine is running hotfix XXX, all the machines in the cluster have to have hotfix XXX. The fourth requirement is that the hardware should be the same. Let's say I use 4000 series appliances; then all machines should be 4000 series appliances. Now, Check Point recommends that they be the exact same hardware appliance, but sometimes that's not feasible. So one could be, let's say, a 4200 system and another could be, let's say, a 4400 system. Even though they're not the exact same hardware, they're in the same series, so that would also work for clustering. Another requirement is that all machines that are part of the cluster have to have the same products and blades configured. And the final requirement is to have the same time configured on all the machines that are part of the cluster. It is recommended to use the same time and time zone, for example GMT, on all the machines in the cluster, and it's also best practice to use an NTP server and configure all the machines to use it. They'll be polling the NTP server, so all the machines will be synchronized with each other and have
the same time and time zone.

The next thing I want to talk about is state synchronization. To discuss state synchronization, let's go through the whole topology of a network. Let's say you're sitting at a PC and you're trying to access an FTP server somewhere out on the Internet. The user is going to open the connection through the active firewall, in this case firewall 1. The active firewall is going to look in the rule base to see if there's a rule allowing this user to get out to the Internet, and in this case let's say there is a rule. So it's going to keep track of this information in the kernel. In the kernel there are state tables keeping track of all the connections that the users are opening. In this case the user opens a connection between host X and server Y, so in the state table we have a connection X to Y. These connections are stored in the state table, and then the connection goes out to the server.

Now assume that this active firewall crashes and goes down. If the active firewall goes down, the standby firewall has to promote itself to active status, so the standby becomes the active machine. Now the connection, instead of going through firewall 1, is going to go through firewall 2. What would happen when firewall 2 becomes active? That existing FTP connection is dropped, and it's dropped because the packet is out of state: firewall 2 does not know about the state of this connection, so the packet is dropped as out of state. To prevent these out-of-state packet drops, we need to sync the connections from the original active machine to the standby machine.

So let's run the scenario again from the start, where we have a user trying to open a connection to the FTP server. The user opens the connection, and it goes to firewall 1. Firewall 1 looks in the rule base, there's a rule allowing it, and it records the connection in the state tables. Now
firewall 1 needs to update firewall 2, the standby, about this connection, so that firewall 2 will also have the same state information about connection X to Y. When there is a failover from the active machine, the standby machine becomes the active machine, and that connection is now going to be forwarded to firewall 2. Firewall 2 looks in its state tables, finds a match, and because it finds a match it allows the connection. So there was a transparent failover, because ClusterXL took care of the failover from the active machine to the standby machine. And from the user's perspective there was a seamless failover: the user was not affected or interrupted, and the FTP session he had opened is still open even though there was a failure on the network.

In order for all the connections to be synchronized across all the cluster members, we need a special cable called a sync cable. The sync connection is either a crossover cable between two firewalls or a connection through a switch. The sync network is where all the kernel connection tables are synced across all the members, and again it can be either a dedicated crossover cable or a connection to a switch. So every connection that is processed by firewall 1 is also going to be recorded on firewall 2, because it's going to be synced over that sync network. Let's say this is the active machine and this is the standby. Every connection that comes into the active machine, if it's processed and accepted by the rule base, is going to be updated in the kernel tables, synced over that sync cable, and updated in the other member's kernel tables, so that in case there is a failover the standby will know about the existing connections. So every connection that comes in will be updated in the kernel tables and synced with the other cluster members.

So the next thing
I want to mention in relation to state synchronization is that we have two kinds of state synchronization: what we call full synchronization and what we call delta synchronization.

Now, the difference between them. In full synchronization, let's say I have an existing connection table, so on my active machine I have a bunch of active connections, and I need to synchronize those connections to one of my cluster members. Let's say one of my cluster members was powered off and now I'm powering it on, or maybe one of my cluster members needs to be rebooted, or one of my cluster members is trying to join the cluster. The synchronization needs to happen across the cluster members, so on the sync network we're going to have all of our connections, A, B, C, D, and E, being synced across the cluster members, so that all machines in the cluster will have the same connections in their connection tables. That's what a full sync is. A full sync happens during a reboot, or when a machine is joining the cluster, or when the system is powering on.

Now let's assume we have all of our connections synced among the cluster members, and a new connection comes in. Let's say this client opens a new connection through the active firewall. The active firewall processes that connection and updates it in its connection table, so that connection needs to be synced. Do we need to sync the whole connection table? No, we only need to sync that individual connection. This is what we call the delta sync: a delta sync is individual connections being synced. And again, say we have another connection coming into the active machine. The active machine processes that connection and syncs it using the delta sync, and now the other member knows about that active connection. Correspondingly, let's say one of the connections is deleted, removed from the connection table. Now the active machine needs to synchronize that
connection and update the other member that the connection has been removed. So the cluster members are constantly updating each other about the connections in the connection table, either using a full sync when a machine has been powered on or rebooted, or using a delta sync when they're synchronizing individual new connections.

The next thing I want to talk about is the Cluster Control Protocol, the CCP protocol. I also call it the Check Point control protocol. This protocol runs on all the interfaces, the external interface and the internal interface, and it is sent from all the members to all the members. So firewall 1 will send Cluster Control Protocol packets to firewall 2, and firewall 2 will send Cluster Control Protocol packets to firewall 1, on the external interfaces, the internal interfaces, the DMZ interfaces: all interfaces that you've configured ClusterXL on. The Cluster Control Protocol is going to be updating the status of all the cluster members. What status information? Well, there are a few things that the Cluster Control Protocol will be updating all the members about.

Number one, the Cluster Control Protocol sends health status reports. Every second it sends three packets. The Cluster Control Protocol is like a health keepalive, notifying the members that they're still active, up, and running. Not only do they report the state that they're in, but also the state that they think the other members are in. So if one of the members fails to send these health status updates, that tells the other members that there is an issue, and the members need to set a course of action in order to resolve the situation.

Number two, the Cluster Control Protocol sends state change updates. If you have an active machine and it wants to go into standby mode, it will send these updates. Or if you have a standby machine and it's about to become active, the standby machine will update all the
members that it's now coming up as the active machine. Or you might have a cluster member that's going down, and so the Cluster Control Protocol will notify all the members that it's going down and leaving the cluster.

Third, cluster member probing. With cluster member probing, the members probe to see if the other members are still part of the cluster. For example, let's say one of the firewalls fails to send health status reports, so the other member does not receive any of these packets. It then sends probing requests to try to elicit a response, to see whether that member is still up or whether that interface went down. Depending on the outcome of the probing, it will then make a determination on what the next course of action is.

Fourth, querying the cluster membership. When a machine is first coming online, it's going to query the cluster to identify which machines are in the cluster and what state those member machines are in, to signal that it's trying to join the cluster, and to identify what state it should join in.

And fifth, the Cluster Control Protocol also runs over the sync network. This is a specific kind of Cluster Control Protocol traffic, and it's only used for syncing the kernel tables. All the firewall members are syncing the kernel tables among each other, and they use the sync network, which runs the Cluster Control Protocol, to sync those kernel tables.

Now, the next thing I want to talk about is cluster status. I did talk about cluster status a little bit when we talked about high availability and load sharing. I mentioned that in high availability you have one machine that's going to be active and all other machines are going to be in standby status, and I also mentioned that in load sharing, or load balancing, all the machines are going to be in active status. I've mentioned this, but I didn't really define what these states mean. So, active means that everything is okay and the machine is
running and passing traffic. In high availability the active machine passes traffic; in load sharing it could be one or multiple machines passing traffic.

Now, the standby status. Standby only happens in high availability mode. In high availability you're going to have one machine in active status, and every other machine is going to be in standby status. Basically, the standby is the machine that's dormant. It's passively listening for traffic or packets from the active machine; in case the active machine has an issue and goes down, the standby will promote itself to active status.

Now, the other status is the down status. The down status can happen for one of two reasons. Number one is that an administrator sees that he needs to do some maintenance work on the system, so he might bring an active machine or a standby machine into a down state, which will remove it from participating in the cluster, and then he can do some maintenance on the system. The second reason is that there is an issue on the system itself: maybe one of the processes has a problem, or maybe the policy has an issue, or maybe the sync has an issue. The system will recognize that there is a problem and automatically kick itself out of the cluster. It will go into a down state, which notifies all the other machines that it's no longer passing traffic, so the traffic load has to be distributed among all the other members.

Next, let's talk about the active attention state. The active attention state can happen in a specific scenario. Let's say you had two machines in the cluster, one of the machines went down, and you have one last machine passing traffic. Later on, this machine recognizes that it has an issue with one of its processes or services. Normally it would go into a down state, but if it did so in this situation you would have a network-down issue. So to prevent a full network outage, instead of going into a
down state, it goes into the active attention state. This in essence notifies the administrator that there is an issue with this cluster and not to rely on it, and that he needs to address whatever issues there were on the other systems. When those systems come back up into active state, the active attention machine will eventually go into a down state, and then the administrator needs to rectify and resolve the issues on that system.

Next, let's talk about the ready state. The ready state can happen under three different scenarios. In the first scenario, let's say this machine wants to join the cluster. Before it joins the cluster it's ready: it has loaded all the software, and it sends a query request to all the members, notifying them that it wants to join the cluster. It will only join the cluster when all the members have agreed that it can join and what state it should join in, and once it joins the cluster it will go into active or standby state.

Another scenario can happen if the software versions are different. Let's say the cluster members are running R77 and this new machine that wants to join the cluster is running R76. It sends a query request to join the cluster, and all the members deny its entry into the cluster because it has a different software version, so it's kind of stuck in that ready state.

The third scenario is if the hardware, in this case the CPUs, are different. Let's assume this machine has four cores and this machine has two cores. Again, it goes into the ready state and sends a query request to the members so it can join the cluster, and the members deny its entry because they have a different number of cores. To solve this, you can obviously add more cores so they have the same number of cores, or you could reduce the number of cores on the members so they have the same number of cores, and then they'll be able to participate in the cluster.

So next, let's talk about the initializing state. The
initializing state is not an issue per se. It happens when the system has been rebooted or is being powered on. While it waits for all the services to load, it stays in the initializing state until all the processes and services are fully loaded. Then it will send a query request to join the cluster and go into a ready state, and when the members acknowledge that it can join the cluster, it will go into active or standby state, depending on what was agreed on.

And now the last state I want to talk about is the 'ClusterXL inactive or machine is down' state. This state can happen in a scenario where machines are participating in the cluster and all the members know about each other, and then one of the machines either crashes or goes down. There are no health status reports being sent from that system, so all the members will send probing requests to identify whether that system is still up and active. Since that system is down, there are no responses coming back, so from the members' perspective that system is down, and they're going to mark that system as 'ClusterXL inactive or machine is down'.

Okay, now that we've come to the end of our video, let's take a few moments to summarize everything that we discussed during this presentation. The first thing we discussed is that ClusterXL is a proprietary Check Point clustering solution that offers transparency, redundancy, and fault tolerance.

The next thing we discussed is the different ClusterXL methods. In ClusterXL we have two methods. We have a high availability method, and high availability is where we have one machine in active status and all the rest of the machines in standby. Traffic is forwarded to the active machine, and only the active machine is processing the connections. In high availability we also said that we had two modes of setting up high
availability. The first mode is what's called the legacy mode. The legacy mode is being deprecated; it's no longer used and no longer supported. We also have a new mode called the new HA mode, and the new HA mode is the current, most popular way of setting up high availability.

We also have a different method, called load sharing or load balancing, for ClusterXL. In load sharing we also have two modes: a unicast mode and a multicast mode. In load sharing, all the machines are in active status, so potentially one or more machines could be processing connections at the same time.

The next thing we talked about is the ClusterXL synchronization network. The sync network is the cable between the cluster members, either a crossover cable or a cable connecting to a switch. All the cluster members have kernel tables that keep track of the connections being processed, and these kernel tables need to be synced among the cluster members.

We also talked about the full sync versus the delta sync. I mentioned that the full sync happens when the system is first rebooted or powered on and needs to join the cluster. Before it joins the cluster, it needs to request all the kernel tables from one of the members, so the members will provide all the kernel connections that they know about to this new member that's joining the cluster. In case there's a failover, this machine that's joining the cluster will know about all the existing connections, so no connections will be dropped. Then we talked about the delta sync. The difference between the delta sync and the full sync is that the delta sync comes after the full sync has been completed: new connections that are being processed by the firewall only need to be synced individually. So with the delta sync, every new connection that's being processed is synced with all
the cluster members. The next thing that we talked about is the Cluster Control Protocol, CCP, which is the proprietary protocol that's used in ClusterXL. The Cluster Control Protocol is used for many different things. It's used for health status updates: cluster members notify every machine that belongs to the cluster about their health and what state they're in. The Cluster Control Protocol is also used to probe and identify whether members are still in the cluster: if members are inactive or have failed, it will send probing requests to elicit a response from the failed members. The Cluster Control Protocol is also used to update the cluster members about any and every state change, so if there's a machine that's leaving the cluster, or a machine that's changing its state either from active to standby or from standby to active, each and every change needs to be communicated to the cluster community. The Cluster Control Protocol is also used to query for new cluster membership, so if a new machine is trying to join the cluster, it needs to query every machine in the cluster to identify what state it should join in and also when it should join. And the Cluster Control Protocol is also used on the synchronization network to sync all the kernel tables amongst the cluster members; it's a special kind of packet that notifies the cluster members that this payload contains important kernel tables. The last thing that we talked about is the cluster statuses. We talked about the active status, which means that everything is okay and packets are being forwarded, so there are no issues with the active status. We also talked about the standby state. The standby state is only available in High Availability; the machine in standby is basically dormant, listening for the active machine, and in case the active machine goes down, one of the standbys will kick in
and go into active status. Then we talked about the down state. If there is a problem with some of the services or processes, the system could remove itself from the cluster and go into a down state. In the down state it does not process any connections and does not forward any packets, so there is an issue that an administrator will need to address, resolving why the system is in the down state. Then we talked about the ready state. Now, the ready state can happen for a few reasons. The number one reason is when a machine is trying to join the cluster: it will send some query requests to identify when it should join and what state it should join in, and while it's waiting it will be in the ready state. A machine can also be stuck in a ready state in certain situations. When it tries to join a cluster and its software is different from that of the actual cluster members, the members won't allow it to join the cluster because they would be on different versions. Also, if the hardware is different, for example if they are using a different number of cores, a machine trying to join the cluster might be stuck in a ready state because the other members will not allow it to join the cluster. Okay, the next state is the initializing state. In this state there are actually no problems; this happens when the system is first booting up. While it boots up and loads the ClusterXL processes and services, it stays in the initializing state while the rest of the processes and daemons finish loading. Once they finish loading, it goes from the initializing state into a ready state, and it probes the cluster members to participate in the cluster. The last state is "ClusterXL inactive or machine is down". This can happen when an actual member goes down: the rest of the members will try to probe it to elicit a response to see if that system is still active, and if they get no replies, then those members will mark in their tables that that member
has actually gone down. So you'll see "ClusterXL inactive or machine is down" when a machine is not replying to any probe requests. With that being said, that brings us to the end of this video. I hope you found this information informative, and if so, I am working on ClusterXL part 2 and I hope to see you there. Until then, bye for now. Shalom. Check Point, we secure the future.
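
To make the recap of full sync, delta sync, and the member states a little more concrete, here is a minimal Python sketch of a toy High Availability cluster. It is only an illustration of the ideas described in the video, not Check Point's implementation: the names `State`, `Member`, `Cluster`, `join`, `new_connection`, and `fail` are all hypothetical, a Python set stands in for the kernel tables, and the real Cluster Control Protocol exchanges are not modeled at all.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Member states named in the video (simplified).
    INITIALIZING = "initializing"
    READY = "ready"
    ACTIVE = "active"
    STANDBY = "standby"
    DOWN = "down"


@dataclass
class Member:
    name: str
    state: State = State.INITIALIZING
    # Hypothetical stand-in for the kernel connection tables.
    connections: set = field(default_factory=set)


class Cluster:
    """Toy High Availability cluster: one active member, the rest standby."""

    def __init__(self):
        self.members = []

    def _healthy(self):
        return [m for m in self.members
                if m.state in (State.ACTIVE, State.STANDBY)]

    def join(self, member):
        # The joining member waits in the ready state while it learns
        # which state it should join in.
        member.state = State.READY
        peers = self._healthy()
        if peers:
            # Full sync: copy every known connection from one existing member.
            member.connections = set(peers[0].connections)
            member.state = State.STANDBY
        else:
            # First member in the cluster becomes the active one.
            member.state = State.ACTIVE
        self.members.append(member)

    def new_connection(self, conn):
        # Delta sync: only the new connection is pushed to each healthy member.
        for m in self._healthy():
            m.connections.add(conn)

    def fail(self, member):
        was_active = member.state is State.ACTIVE
        member.state = State.DOWN
        if was_active:
            # One of the standbys kicks in and becomes active.
            for m in self.members:
                if m.state is State.STANDBY:
                    m.state = State.ACTIVE
                    break


cluster = Cluster()
fw1, fw2 = Member("fw1"), Member("fw2")
cluster.join(fw1)                                   # fw1 becomes active
cluster.new_connection(("10.0.0.5", 51000, "203.0.113.7", 443))
cluster.join(fw2)                                   # full sync, fw2 is standby
cluster.new_connection(("10.0.0.6", 51001, "203.0.113.7", 443))  # delta sync
cluster.fail(fw1)                                   # fw2 takes over
```

After the simulated failure, `fw2` is active and already holds both connections, which is the point of syncing the tables before a failover ever happens.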

Info

Channel: Check Point Training Bytes

Views: 56,188

Keywords: clusterXL, Check Point, check point software, check point training bytes, High availability, load balancing, load sharing

Id: 8vMkZZZCZl4

Length: 37min 14sec (2234 seconds)

Published: Fri Feb 19 2016