Understanding Check Point CoreXL Part 1

Captions
Shalom, and welcome everyone. Once again I want to thank you for joining me in this video presentation. This is Check Point Training Bytes, where we discuss different products, features, and blades. In this module we're looking at CoreXL and CoreXL technologies, and this is part one of the video series. Once again, my name is Manuel and I will be your instructor.

First of all, let's take a look at the agenda for this module. The first topic we will discuss is multi-core systems: what they are and why they were introduced and developed. Then we'll look at the need for CoreXL and why it was introduced. Next we'll look at the different components that make up CoreXL and break them down, then put those components together for a big-picture overview of how CoreXL works. After that we'll discuss how CoreXL helps improve performance and what some of its benefits are, and finally we'll look at affinity (SND affinity, process affinity, interface affinity, and SIM affinity) and demystify those terms.

First, let's take a few moments to discuss how performance is improved. Hardware vendors are constantly searching for ways to improve performance while at the same time trying to reduce manufacturing and production costs. Until a few years ago, increasing performance was achieved mainly by increasing CPU frequency, and at the turn of the century processor manufacturers were in a constant race to make faster, smaller, and more efficient CPUs. The path they took was to raise the CPU clock speed, which increases how many calculations the CPU can process per second. But processor manufacturers eventually ran into a physical limitation, namely how small a CPU can be manufactured, and so we entered the multi-core, multiprocessor era.
To overcome this physical-limitations barrier, processor manufacturers introduced multi-core and multiprocessor systems, which use multiple processors to process data simultaneously. To take advantage of these multi-core, multiprocessor systems, Check Point had to recreate and redevelop the firewall code, and this new code is what we call the CoreXL code. CoreXL was introduced in R65, the first firewall release that supported multi-core systems. At the same time the Linux kernel was updated to 2.6, the first Linux kernel with multi-core support. So in addition to Check Point writing firewall code that supported multi-core systems and updating the Linux kernel for multi-core platforms, customers had to purchase CoreXL: at the time, CoreXL was a product and feature that needed to be installed and licensed.

Before we get into the CoreXL technologies, let's discuss how a packet was processed by the CPU before the introduction of CoreXL. Imagine a packet coming in to the CPU. The CPU needs to process that connection, so it requests the firewall code to run. The first thing that happens is fw_lock, which locks the firewall code so it can process the connection. After the firewall code is locked, fw_chain runs, which processes the packet, and once the packet has been processed the code does fw_unlock and forwards the packet on its way. This whole cycle is repeated for every packet: the next packet comes in, the CPU does fw_lock, then fw_chain, then fw_unlock, and forwards the traffic on its way. The one thing I want you to notice is that there is only one copy of the firewall code, so only one CPU can enter that firewall code to process packets at a time.
To clarify this a little better, let's introduce a second core into the picture. Now we have two CPUs, CPU 0 and CPU 1. Assume the next packet comes in to CPU 1. CPU 1 interrupts the firewall code and requests processing: the firewall does fw_lock, which locks the firewall code, and then fw_chain, which processes the connection. Now imagine that while that packet is being processed by the firewall, another packet comes in to the second CPU. The second CPU needs to interrupt the firewall code, but the code is already locked and in use by the first CPU, so the second CPU cannot do anything until the first CPU has finished processing its connection, done fw_unlock, and forwarded its traffic. Only then can the whole process happen again for the second CPU: fw_lock, fw_chain, fw_unlock, and the connection is forwarded on its way.

So now that you understand the challenge, Check Point had to develop new firewall code that would be able to run multiple transactions at the same time. How did Check Point solve this? Let's take a look. We have two CPUs, and we also have two firewall instances: firewall instance 0 and firewall instance 1. These are what we call firewall kernel instances. We've replicated the firewall code into two instances, but both run the same policy; it is still one firewall. Now, when a packet comes in to a CPU, that CPU's own firewall instance can lock its code, execute fw_chain, and unlock, independently of the other instance.
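The serialization problem described above can be sketched in Python as a toy model (not Check Point code): one shared lock stands in for the single pre-CoreXL firewall code, so two simulated CPUs can only hold it one at a time.

```python
import threading
import time

fw_lock = threading.Lock()   # stands in for the single pre-CoreXL firewall code
processed = []               # completion order of packets

def cpu(name, packets):
    for pkt in packets:
        with fw_lock:            # fw_lock: only one CPU may run the firewall code
            time.sleep(0.01)     # fw_chain: "processing" the packet
            processed.append((name, pkt))
        # leaving the `with` block is fw_unlock

cpus = [
    threading.Thread(target=cpu, args=("cpu0", [1, 2])),
    threading.Thread(target=cpu, args=("cpu1", [3, 4])),
]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
print(len(processed))   # 4: all packets get processed, but strictly one at a time
```

Both threads make progress, but never simultaneously inside the lock, which is exactly why a second CPU added no throughput before CoreXL.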
So that's one component: the first thing Check Point did was create multiple firewall kernel instances, so multiple copies of the firewall code run simultaneously, one on each core. But in addition, a second component had to be created: the dispatcher. A decision has to be made about where to assign incoming traffic, to CPU 1 or to CPU 0. The dispatcher runs on its own CPU, its own core, and the NIC traffic is bound to the same CPU that runs the dispatcher. So traffic comes in to the NIC, goes to the CPU running the dispatcher, and the dispatcher processes the packet and makes a decision about where to send it. In this case it sends it to CPU 1. CPU 1 executes the firewall code of firewall instance 1, which looks at the packet, checks the rule base to see if there is a rule allowing that traffic, and if it finds a match it allows the traffic through. Then the next packet comes in to the dispatcher, the dispatcher decides to assign it to CPU 0, and since CPU 0 is bound to a completely separate firewall instance, that instance can run its own fw_lock, fw_chain, and fw_unlock. So those are the two components: first we introduced the firewall kernel instances, and the second component is the dispatcher.

Now that we know about firewall instances and the dispatcher, let's put them together and see how they work. A packet comes in to the dispatcher, and the dispatcher looks in its dispatch table. The dispatch table is where the dispatcher stores all of the known connections. In this case it does not know about this connection; this is the first time this connection has come in.
The lookup fails, so the dispatcher makes an arbitrary decision and sends the packet to the core running firewall worker 0. That worker executes the packet, looks in the rule base to see if there is a rule allowing it, accepts it, and records it in its connection table. Now notice what happens: the firewall worker updates its own connection table, and once it has done that it also updates the dispatch table about this connection. Why only then? Because if the packet had been dropped, you would not update the dispatch table. The packet was allowed, so the firewall worker tells the dispatcher that it accepted the traffic, and the dispatcher is updated so that it knows what to do with the next packet of that connection. Look at the dispatcher now: it marks which firewall instance is handling the connection. Firewall worker 0 processed that traffic and forwarded it on its way.

Another packet now comes in to the dispatcher. Is that packet known in the dispatch table? No, it is not, so the dispatcher makes an arbitrary decision and forwards it to a firewall worker. The firewall worker processes the packet, looks in the rule base to see if it is allowed, it is allowed, updates its own local connection table, and then updates the dispatcher about the connection. Firewall worker 2 processes that connection and sends it up to the IP stack; the IP stack makes a routing decision and sends it on its way. Next, another packet comes in to the dispatcher. The dispatcher looks in the dispatch table: is the packet in the table? No, so it assigns it arbitrarily, this time to firewall worker 1. Firewall worker 1 checks the rule base, finds a rule allowing that traffic, processes it, updates its connection table, and also updates the dispatch table about the connection. Now the dispatch table knows that this connection was processed by firewall worker 1. The packet on firewall worker 1 is then sent up to the router, which makes a routing decision and forwards the packet on its way.

So let's take a look at what happens when another packet of a known connection comes in. The packet comes in to the dispatcher, the dispatcher checks whether the packet is known in the dispatch table, and this time it finds a match: the connection was assigned to firewall worker 2. The lookup succeeds, so the packet is forwarded to firewall worker 2. Firewall worker 2 executes that packet, and because it already has the connection in its own connection table, it does not have to go to the rule base to see if there is a rule allowing it; we save some cycles here. The packet is then forwarded up to the router, which routes it. The point I want to make is that we no longer need to look in the firewall rule base: for the first packet we looked in the rule base, but for subsequent packets we look in the local connection table, which already knows about this connection, and the packet is processed and forwarded on its way.

One advantage of using CoreXL, then, is that you can have parallel cores executing code simultaneously. Packets come in on different cores, the dispatcher acts like a load balancer and assigns them to the CPUs that are not being utilized, so three cores can receive three packets and process and forward them at the same time. With CoreXL you gain performance through parallel processing.
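The dispatch-table flow just described can be sketched as a toy Python model (all names are illustrative, not Check Point internals): the first packet of a connection is assigned arbitrarily to the least-loaded worker, the worker records the connection, and every later packet of that connection is looked up and sent to the same worker.

```python
class Dispatcher:
    """Toy model of the dispatcher and its global dispatch table."""

    def __init__(self, n_workers):
        self.dispatch_table = {}                              # connection 5-tuple -> worker id
        self.conn_tables = [set() for _ in range(n_workers)]  # per-worker local tables

    def handle(self, conn):
        if conn in self.dispatch_table:
            # Known connection: sticky, always the same worker,
            # so the worker can skip the rule-base lookup.
            return self.dispatch_table[conn]
        # First packet: arbitrary choice (here: least-loaded worker).
        w = min(range(len(self.conn_tables)), key=lambda i: len(self.conn_tables[i]))
        # The worker checks the rule base, accepts the packet, records the
        # connection locally, and only then updates the global dispatch table.
        self.conn_tables[w].add(conn)
        self.dispatch_table[conn] = w
        return w

d = Dispatcher(3)
conn = ("10.0.0.1", 1234, "192.0.2.7", 443, "tcp")
first = d.handle(conn)    # rule-base path, both tables updated
later = d.handle(conn)    # dispatch-table hit
print(first == later)     # True: the connection stays on one worker
```

Note that in the real flow it is the worker, not the dispatcher, that updates both tables after accepting; the sketch collapses that hand-off into one method for brevity.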
The second advantage concerns where packets are assigned. A packet comes in to CPU 1, another comes in to the dispatcher, the dispatcher makes its decisions, and both firewall instances execute their packets and forward them on their way. But what would happen if the dispatcher assigned a packet of an existing connection to an incorrect core? It would be dropped out of state, because firewall worker 2 does not know about that connection. So the other performance advantage of having the global dispatch table is that packets are assigned to the correct cores. Part of the goal of the dispatcher is to make connections sticky: to make sure a connection is always forwarded to the core that knows about it, because if it is not, you lose performance.

The third performance mechanism is what we call data locality. Data locality in essence means making sure that the data is local. What does that mean? All appliances have memory, and memory is where the connection tables are actually stored. Each of the connection tables we've been discussing is stored in memory; it is not the CPU itself that holds this information. When a packet comes in, CPU 0 looks in the global dispatch table; if it knows about the connection, great, and if it doesn't, it forwards the packet to a CPU, which processes the connection and updates its connection table, which really means updating it in memory. But the CPU also has caches: a level 1 cache and other, smaller caches below it; let's assume we have three levels of cache. To speed up processing, when the CPU needs information to execute, it first fetches it from memory, for example the firewall code, processes the packet against the rule base and so on, and fills some of its cache with the information it pulled from memory. The same happens while processing the connection: some of the data it used is kept in the cache. The next time a packet of that connection comes in, the global dispatch table knows about it and says: forward it to CPU 2.
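The benefit of data locality, repeated work served from a warm cache instead of the slow path, can be illustrated with a deliberately simple Python analogy (a memoized lookup stands in for the CPU cache; none of this is Check Point code):

```python
from functools import lru_cache

slow_lookups = 0   # how often we had to take the expensive path

@lru_cache(maxsize=None)
def rule_base_lookup(conn):
    # Stand-in for the expensive path: consulting the rule base,
    # or fetching connection state from main memory.
    global slow_lookups
    slow_lookups += 1
    return "accept"

conn = ("10.0.0.1", 1234, "192.0.2.7", 443)
for _ in range(1000):                 # 1000 packets of the same connection
    verdict = rule_base_lookup(conn)  # only the first one is slow

print(slow_lookups)   # 1
```

If the connection bounced between CPUs, each CPU's cache would start cold for it; keeping connections sticky to one CPU is what keeps the cache warm.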
CPU 2 receives the packet and needs to do some calculation: it needs the firewall code and the connection table entry to see whether the connection is there. Before it goes to memory, it checks its cache, and it finds the firewall code there, so it does not have to go to memory. Finding things in cache is a lot faster than going to memory, so you get a performance improvement by using the cache. Data locality, in essence, means keeping the same connections going to the same CPU so we can take advantage of the data that is already local in the CPU's caches.

OK, the next thing we're going to talk about is core affinity. What does core affinity mean? Core affinity, in essence, means unbinding and rebinding instances and the dispatcher to different cores. Let's take a look. Here I have a four-core system with CPUs 0, 1, 2, and 3: CPU 0 is running the dispatcher, CPU 1 is running firewall worker 2, CPU 2 is running firewall worker 1, and CPU 3 is running firewall worker 0. Now assume we have a lot of traffic coming in to the dispatcher. All traffic that arrives at the NIC is processed by the same CPU that is assigned to the dispatcher, because the dispatcher and the NIC use the same core, the same CPU. If a lot of traffic is constantly hitting this CPU while you have three firewall workers, and you run some commands to verify the performance of your CPUs, you might notice that CPU 0 is maybe five percent idle, meaning it is ninety-five percent utilized, while the other CPUs are at maybe twenty-five percent each. You then know that the dispatcher is being overused and that you have a lot more resources available on the other CPUs. With CPU affinity we can remove a firewall worker from one CPU and bind the dispatcher to two CPUs, so that the dispatcher now runs on two CPUs. Let's take a look; I have a diagram that shows this a little better.
As we know, all the traffic coming in on the NICs is assigned to the CPUs running the dispatcher, and since the dispatcher is now running on two different CPUs, the NICs will send traffic to both of them. Those CPUs process the traffic arriving from either NIC, the dispatcher examines it, and the packets are then sent on to the firewall workers. So what we've done here is CPU affinity: unbinding and rebinding instances and the dispatcher to different cores.

Here's another example. Assume you've run some top commands and identified that one of the processes, FWD, which handles logging, is using a lot of CPU. What you can do is assign that process directly to its own CPU. This is also CPU affinity; there are only a few processes you can do this with, but FWD is one of them. CPU affinity is how you bind different processes, such as a firewall process, different dispatchers, or different firewall instances, to different CPUs.

Let me give you another example in the next slide. Here I have eight cores, and among my eight cores I have one dispatcher, and the dispatcher is very busy processing a lot of connections. What I can do is bind two CPUs to the dispatcher: I remove one of the firewall instances, unbinding firewall worker 6 from its CPU, and attach the dispatcher instead, so the dispatcher can run on two CPUs. Now I have the Secure Network Dispatcher running on two separate CPUs, sharing the one global dispatch table between them, and traffic coming in to the NIC is assigned to the same CPUs that run the dispatcher. Traffic comes in to those CPUs, and they then assign the different traffic to the different firewall workers.
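As a generic analogy for binding a process to specific cores, Linux exposes per-process CPU affinity, which Python can manipulate with os.sched_getaffinity and os.sched_setaffinity (a Linux-only OS facility, shown here for illustration; it is not the Check Point affinity tooling itself):

```python
import os

# CPUs this process is currently allowed to run on (Linux-only API).
allowed = os.sched_getaffinity(0)
print("current affinity:", sorted(allowed))

# Pin this process to a single core, analogous to dedicating a CPU
# to one busy process such as a logging daemon.
one_cpu = {min(allowed)}
os.sched_setaffinity(0, one_cpu)
assert os.sched_getaffinity(0) == one_cpu

# Restore the original mask.
os.sched_setaffinity(0, allowed)
```

A process may always restrict itself to a subset of its current mask; widening the mask beyond it can require privileges, which is why the sketch only pins and then restores.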
I can even go further. Here's an example where I use four CPUs to run the dispatcher. This is for the case where a lot of traffic is coming in on multiple NICs, so I need multiple CPUs to process everything arriving at the NICs. Again, the NICs are bound to the same CPUs that are bound to the dispatcher, so now I have four CPUs to process traffic from the NICs and to handle the dispatcher and the dispatch table updates. It is still one dispatch table, shared among all those CPUs. Traffic comes in through the NICs to the CPUs, the dispatcher makes its decisions, updates the dispatch table, and forwards the packets to the firewall workers for them to process and look in the rule base. So in this example I have unbound firewall workers: I only have four firewall workers left, but four CPUs utilized for the dispatcher. This will make a little more sense when we talk about SecureXL, but right now I just want you to understand what CPU affinity means: binding different processes and kernel instances, whether firewall instances or the dispatcher, to different CPUs.

Another term I want to discuss is SIM affinity. In essence, SIM affinity allows you to bind different ports on the NIC to specific CPUs. Say I have a four-port NIC and I want two of its ports to be served by a particular CPU. By default you have automatic interface affinity: traffic arrives at the NIC cards and is automatically assigned to different CPUs depending on how busy the CPUs are. But at the same time, you can configure manually where to assign the traffic. So say I have four ports, and one of them is my external port, and a lot of traffic is coming in on my external interface.
Since heavy traffic is arriving on that port, my external interface, and I can see traffic going all over the place, I might want to give that specific port its own dedicated CPU, and have the other three ports assigned to a different CPU. My internal interface, my DMZ, and maybe my branch interface do not see as much traffic as my external interface, which takes the brunt of things like DDoS attacks, so I assign the one interface to a specific CPU. Another example: say I have a port that is trunked into multiple VLANs, VLAN 1, VLAN 2, up to VLAN 100. With a hundred VLANs coming in, this port is very busy and carries a lot of traffic, so I might want a dedicated CPU for that specific port: I bind that port to that specific CPU, and the other ports are bound to other CPUs. This is what we call SIM affinity: binding specific NIC ports to specific CPUs.

And what is Multi-Queue? Multi-Queue is for when I have, say, two ports on the NIC, both 10 gig, trunked into multiple VLANs. I only have two ports, but a lot of traffic, because these are 10 gig links, and I have four CPUs available for processing. Normally there is one receive queue (RX queue) per port, and each RX queue is assigned to one specific CPU, so with two ports I would have two additional CPUs sitting unused for this traffic. What I can do instead is give each port more than one RX queue, bound to different CPUs: port 1's queues bound to CPU 0 and CPU 1, and port 2's queues bound to CPU 2 and CPU 3. All traffic coming in on one port is then spread across two CPUs, so two CPUs can process traffic from a single port. That is Multi-Queue: one port feeding multiple receive queues on multiple CPUs.
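The Multi-Queue idea, one port feeding several receive queues that are each drained by a different CPU, can be sketched like this (queue and worker names are illustrative; real RX queues live in the NIC driver, not in Python):

```python
import queue
import threading

# Toy model: one 10-gig port spreads packets across two receive queues,
# each drained by its own CPU (a worker thread here).
rx_queues = [queue.Queue(), queue.Queue()]
handled = {0: [], 1: []}

def nic_receive(pkt):
    # pkt is (flow, seq); hash the flow so packets of one connection
    # always land in the same queue.
    flow, _ = pkt
    rx_queues[hash(flow) % 2].put(pkt)

def cpu_worker(idx):
    while True:
        pkt = rx_queues[idx].get()
        if pkt is None:          # sentinel: queue drained, stop
            return
        handled[idx].append(pkt)

workers = [threading.Thread(target=cpu_worker, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
for pkt in [("connA", 1), ("connA", 2), ("connB", 1), ("connB", 2)]:
    nic_receive(pkt)
for q in rx_queues:
    q.put(None)
for w in workers:
    w.join()

total = len(handled[0]) + len(handled[1])
print(total)   # 4: all packets from one NIC handled across two CPUs
```

Hashing on the flow rather than the individual packet keeps each connection in one queue, which preserves the stickiness the dispatcher relies on.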
Traffic comes into a queue and waits to be processed; when the CPU is ready, it processes that connection, and the same happens on the other side. That is why we call it Multi-Queue: you can have two queues bound to one port.

OK, the next topic I want to discuss is what happens if you have only one core. If you have only one core, it stands to reason that you cannot have CoreXL. But if you have two cores, where does the dispatcher go, and where does the firewall worker go? Let's take a look at the next diagram. I could, in essence, run the dispatcher on one core and a firewall worker on the other, but that would not work properly; you would not get the performance you need. Instead, when you have two cores, you can have one dispatcher bound to both CPUs and two firewall workers also bound to both CPUs, so you are sharing the two CPUs between the dispatcher and the firewall workers.

So let's recap this module. First we talked about multi-core systems: multi-core was a way to get more processing power despite the physical limitations of the hardware. Then we talked about the need for CoreXL: R65 introduced CoreXL, along with the Linux kernel needed to run it, and you needed to purchase a CoreXL license to unlock the cores. If I have eight cores but only purchase CoreXL for four cores, I can only use four; later I can purchase a license that allows me to use two more cores, and another for two more, for a total of eight. And if I have Hyper-Threading, which is supported starting with Gaia when it is running as a 64-bit operating system, then physically I have eight cores but logically the number of cores is doubled: eight times two is sixteen, so I have sixteen cores in total.
Eight are physical and eight are logical cores, and you can use both the physical and the logical cores when assigning work. Then we talked about the CoreXL components. The first is the firewall instance: the firewall is duplicated into independent instances, each running on a different core. The other component is the dispatcher, also called the Secure Network Dispatcher, which keeps track of all the connections in the global dispatch table. Then we discussed how CoreXL works: a packet comes in to the dispatcher, the dispatcher makes a decision about where to forward the traffic and sends it to one of the firewall workers, the firewall worker processes the connection, looks in the rule base, finds a rule, sends the packet on for routing purposes, and also updates the dispatch table about the connection. We talked about the performance implications of running CoreXL: data locality, which in essence means keeping the same connection on the same CPU so we can take advantage of the caches built into the CPUs, and the dispatcher making sure each connection stays assigned to the same core. Then we talked about core affinity: CPU affinity is where we assign different processes, different dispatchers, and different firewall instances to different CPUs, so I can assign firewall workers to two CPUs or to four CPUs, and dispatchers to one, two, or four CPUs, and so on. We also talked about SIM affinity, where you can assign different ports to different CPUs, and about Multi-Queue, where we assign a specific port to more than one queue running on the CPUs. And so we come to the end of this chapter. I hope this information was informative. Thank you once again for joining me in Check Point Training
Bytes. We'll see you next time. Bye for now, Shalom.
Info
Channel: Check Point Training Bytes
Views: 41,545
Keywords: ccsa, ccse, check point software technologies, check point software, check point training, check point training bytes, check point security administrator, check point security expert
Id: ryM8PHjfpU8
Length: 28min 45sec (1725 seconds)
Published: Wed Apr 20 2016