Running a Security Operations Center (SOC) – Challenges, Solutions and Key Learnings

Captions
Thank you, everyone, for joining us here this afternoon. We've got an interesting session: we're going to talk a little bit about our experiences running an InfoSec program at our company, at Gigamon. We're going to share some of the challenges we've had, how we've worked around them, what solutions we've put in place, and some of the trade-offs we've had to make. As we go through the slides, there are a couple of things I want to point out. Every company has a different set of priorities, so just because we have implemented things a certain way doesn't necessarily mean you have to do exactly the same. You have to evaluate your own InfoSec requirements, go through your own thought process, and then perhaps take a little bit of the learnings we're going to share with you today and apply them to how you think about your own InfoSec program.

There are a couple of dimensions I'll share with you on how we started thinking about this. The first is the notion of risk tolerance, and we'll talk a little bit about how we approached it and how we split up our risk domains. This is important: you have to think about what risk means to you, to your company, to your customers. For example, if you are in the pharmaceutical space, intellectual property is a big deal, securing that intellectual property is a big deal, and that becomes a risk domain. If you are in transportation and logistics, or in retail, uptime and continuity are risks for you. So modeling your domains of risk is important, and we'll talk about how we modeled our domains of risk in a little bit. But as I said, there's no one-size-fits-all, so here are some of the things to think about as you think about your InfoSec program. There's a question: I'll take a couple of quick questions now and we'll hold the rest until the end. Yes, exactly. So we're going to talk about what challenges we ran through

in our own security operations center. The security operations center is one element of an InfoSec program, and we're going to talk about that, but I wanted to preface it by saying that the trade-offs we made and the challenges we encountered may be a little bit different for you. Here are some of the thought processes we went through, that's how we ended up with our SOC program, and then we'll talk about what we did there. So think of it as a ten-minute caveat: it's a thought process, it's our journey, and it's how you might think about doing this.

The second thing to think about, as you think about your SOC or your InfoSec program (and the SOC is a part of that InfoSec program), is: are you driven by regulation and compliance, or are you truly driven by mitigating risk? Regulation and compliance is not necessarily a bad thing, but mitigating risk takes it beyond that, and I can say with a fair degree of confidence that almost all the companies that got breached were compliant. So being overly focused on compliance is not necessarily a good thing. Having the compliance checkbox is important, but if you're really focused on mitigating risk you have to think beyond that, and that becomes a key part of your InfoSec program. Unfortunately, today the majority of InfoSec programs, the SOC and everything else, exist to support compliance, and that's not good enough anymore.

The third thing to think about, and this comes directly to running a SOC, a security operations center, is: what are your capabilities? How good are you with programming? How good are you with automation? Because if you are not very good, that becomes an interesting area for you to sit and think about: how do you build out your SOC, how do you build out the people and the skills, where do you invest, do you insource, do you outsource? A key problem we are facing as an industry today, when you think about running
InfoSec programs and security operations centers, is a dearth of talent. We have a shortage of talent, and the way you work around some of that is through automation and through programming. So thinking about this from your own SOC perspective becomes important: building automation capabilities and key programming capabilities on your SOC team is going to matter, and if you don't have that today, it's a dimension to think about. We thought about this as well, we invested in some of those capabilities, and my colleague Jack Hamm is going to talk about that in a little bit. So that's another dimension to consider when you're thinking about your own SOC and your own InfoSec program. As we kick off this session talking about what we've done for our security operations center, what I would like you to take back is that your requirements, your risk profile, and your capabilities may be different, and it is important to think about that first, before you jump headlong into developing a security operations center and making it part of your bigger InfoSec program.

Okay, with that, let me quickly jump into how we started modeling the domains of risk within Gigamon. We provide a platform for network security; 80 of the Fortune 100 companies leverage Gigamon, and they plug a lot of their network security solutions in behind Gigamon. So when we started modeling the domains of risk, and when we put our own SOC and our own InfoSec program in place, we broke our domains of risk into three pieces. The first one was our customers, the people who use Gigamon. We wanted to make sure that our products are secure, so that when our customers deploy Gigamon, whatever they plug into it, whether an APT solution, an IPS solution, or a firewall solution, the Gigamon product itself doesn't lead to the customers getting

compromised. That became part of our InfoSec program, doing things like pen testing and other activities as well. That was one big domain of risk we wanted to make sure we were covering. Again, your domains of risk may be different, but this is how we thought about it. The second part was our core: as a product and technology oriented company, where is product development happening, where does the source code live, who has access to it, what routers and firewalls does it sit behind, and how do we protect it? That's a key part of our risk as well, so that became the second domain of risk we wanted to protect, and it became part of our InfoSec program and of our security operations center practice to monitor and model the risk to those domains. The third one, of course, is the rest of the corporate environment. Everybody has laptops and desktops, there are handheld devices, and that is the biggest threat vector, the initial threat vector for almost all threats coming in; the human element is the easiest element to compromise today. So that's the third domain of risk we wanted our security operations center and InfoSec program to model around. Based on that, we created a risk registry that captured the risks in every one of these domains, and we started tracking it on a pretty regular basis. So that's how we started thinking about our own InfoSec program: the biggest impact is our customers, and we want to make sure our customers are secure; the second is our core, where we are building our IP, where our source code and technology are being developed, and protecting that; and then the rest of the corporate environment as well.

Now, when we started looking at our own security landscape and our own company landscape: we're not a large company, we're about seven or eight hundred employees,
but we do have many of the characteristics of larger companies. We started as a small company in Milpitas, we grew, we bought a bigger building, we added an offshore development center, a center of excellence where we have a lot of our R&D people, in another country, at another site. We started adding sales offices globally, we added support centers in different parts of the world, and we added some cloud operations as well. With that, our threat landscape changed. The boundaries have dissolved between what is internal versus external, what is secure versus insecure; there's no such boundary anymore, because we are a globally distributed organization with on-prem solutions and cloud solutions. So even though we are not a massive company, our environment to protect and our domains to protect became pretty complex. We went through a bit of a thought exercise: how should we think about simplifying this, because we don't have the resources to protect everything? What should be the initial areas of focus, and what should be the secondary ones?

One of the things we feel strongly about, from a security operations center perspective, is leveraging the network not just as a medium of connectivity, but as a medium of telemetry and a medium of detection. So for us, the network became the primary area of focus for our security operations center: looking inside the network both as a way of detecting threats and as a way of enforcing certain policy actions. Endpoint is also important, it doesn't go away, and we still have endpoint-based solutions, but the thrust became: let's look into the network environment, because no matter how much you do

on the endpoint, no matter how many solutions you put on it, the human element is such that the endpoint becomes the easiest element to compromise. So we put a lot of focus on the network and tried to approach the problem from a network perspective. When we did that, we realized we had some challenges. Like many of you here, we deployed intrusion prevention systems, we deployed firewalls, and we have SIEMs looking at a lot of traffic data. What we realized is that the volume of network traffic data is massive, and that was leading to a significant challenge in the cost of running our SOC and all the tools we were deploying in it, as well as in how many people you need to look at all the information those tools put out. So this became a real challenge: we were going to focus on the network, but all the network security tools were significantly challenged by the sheer volume of traffic. The second problem we ran into is that a lot of our traffic is also encrypted. We are a cloud-first company, which doesn't necessarily mean putting everything in AWS and Azure; what it means is that we will leverage SaaS applications as much as we can, and what cannot be satisfied with SaaS we will then look to put in AWS or Azure, and failing that, in our data center. But with a cloud-first approach, a lot of the traffic is encrypted. So we started having this challenge where a lot of the tooling in our SOC was strained by the sheer volume of data, and by the fact that when you start doing things like decryption, or looking at encrypted traffic, tool utilization falls significantly. Now, the easy way to solve that is dollars, right? But that's an expensive solution, and we have limited budgets. Again, as I said, we're not a large company.
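The cost point above, that every tool needing plaintext pays the decryption penalty itself, can be put in back-of-envelope terms. This is an illustrative sketch, not Gigamon sizing: the function name, the 10 Gb/s per-tool figure, and the 65% penalty are assumptions (the speakers later put the utilization drop at 60 to 70 percent).

```python
# Rough sizing sketch (illustrative numbers): if every tool that needs
# plaintext decrypts TLS itself, each one pays the CPU penalty; decrypting
# once in a central platform leaves each tool at full throughput.

def effective_capacity_gbps(n_tools: int,
                            per_tool_gbps: float = 10.0,
                            decrypt_penalty: float = 0.65,
                            centralized: bool = False) -> float:
    """Total plaintext traffic the tool stack can inspect, in Gb/s."""
    if centralized:
        # Decryption happens once, upstream: tools keep their full rate.
        return n_tools * per_tool_gbps
    # Inline decryption at each tool: every tool loses `decrypt_penalty`
    # of its throughput.
    return n_tools * per_tool_gbps * (1.0 - decrypt_penalty)

# Three tools at a nominal 10 Gb/s each:
#   inline decryption everywhere -> 3 * 10 * 0.35 = 10.5 Gb/s inspected
#   decrypt once in the platform -> 3 * 10        = 30.0 Gb/s inspected
```

Under these assumed numbers, repeating decryption at each tool forfeits nearly two-thirds of the inspection capacity you paid for, which is the argument the talk makes for doing it once in the platform.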
Even larger companies are limited in the budgets they have to deal with, so this became a bit of a challenge for us. We backed off a little and asked: is there a more architecturally sound approach to dealing with this? This problem arises, by the way, because most companies, including us when we started, do this: we have a network infrastructure and we throw tools at the problem, connecting tools at different parts of the network, hoping that we will see the right traffic and catch the right kinds of malware movement: lateral movement, north-south movement, command and control. We were hoping that would be the case, but hope is not a strategy, and this kind of ad hoc approach is what led to all of those challenges. I would encourage anyone thinking about leveraging network telemetry data for InfoSec to move away from it.

What we did instead was take a platform-based, architectural approach: we deployed a security delivery platform. The platform connects into the network across physical, virtual, and cloud, so instead of your tools reaching out into specific portions of your network, the platform collects all the traffic from the different parts of the network and brings it to the tools. It's a paradigm shift. It took a little bit of time to deploy all of this, to figure out where in the network we wanted the data from, where we were going to tap, and then to take that data and feed it to the tools. Once we took this approach, several benefits came about, and we're going to walk through them; I'll have Jack come up and talk about some of this in a bit as well.

The first is that the platform gave us the capability to access data, traffic, and telemetry information across our physical, virtual, and cloud

infrastructures. So no matter where our applications are today, or where they are going to be in the future, we knew we had access to the right telemetry data for running our SOC and for our eyes on the wall. The platform provided access to physical network traffic data, because we use physical taps; it provided access to traffic data in our private cloud environment, because it integrates with our VMware environment; and it provides access to network traffic and telemetry in the AWS and Azure world as well. So we've got coverage; that was the first step, we took care of the coverage problem. Let me at this point bring Jack on to talk about some of his experiences in dealing with this. I am the CTO; Jack is the guy who actually makes things happen, he actually runs our security operations center and our InfoSec program.

[Jack] Right, which means he just told you all the philosophy and then said: go build that. Everyone having a good conference? Yeah? Did anyone else get a bottle of hangover recovery drink hanging on their doorknob last night? Yeah, okay, a few people, good, it wasn't just me. So when I was brought in to Gigamon, I was given the task of: okay, you're going to build a SOC now, go. One of the very first things I noticed, to Shehzad's point about ad hoc deployment, is that we did have tools. We had a lot of good tools, we had spent a lot of money, but when you started to look at it, the relevant traffic wasn't getting to them. Because what happens is, we buy an IDS and we say: let's put a SPAN port over here, maybe that will get us the data. But then that core switch starts getting overloaded, and the tool is not seeing the traffic. One of the things Gigamon has done well is the security delivery platform; functionally, for us, it's an appliance. So what we did is we took the appliance and we plugged it into the network and we
stuck it between our core and our firewalls, and then we did some intelligent tapping of fiber elsewhere in the network, and all of a sudden I had the ability to send all of the traffic I needed to the tools in one spot. Out of that came some other things that were really useful. For example, we were able to start deduplicating packets. This was enormous, because when we first started tapping and said, okay, now we're going to send traffic over here to this IDS, it was seeing the same traffic from multiple segments. That became very problematic: now the tool is getting the traffic, but it's getting multiple copies of it, so it's starting to gurgle blood again, and my team is blinded because it's getting duplicate alerts for the same flows in multiple locations.

Once we got into that model, we said: okay, now we can do this, let's build the infrastructure out so that when we want to add a tool, it's plug and play. When we started, standing up a new tool was maybe a three-week endeavor, and especially if we just wanted to do a POC, it was a lot of labor; we had a lot of problems with that. But once we built this fundamental framework, we got to the point where we can just plug a tool in, it's just another port next to another tool, and it's getting all the traffic. We have now moved appliance-based POCs from a few weeks of setup down to about two hours on average.

As a result, one of the things we started doing in our own SOC was to say: hey, there are a lot of tools out there, which are the good ones? In the old days you go to sales, and salespeople tell you things they think you want to hear, and you ignore them and go back to work and you're still not sure which tool to buy. So we implemented a program where, on average, we aim for three

tools per quarter. For perspective, my team has three people; we have three security people including me, and we're now able to pump through POCs at a rate of about one a month, all year long. Out of that we make decisions on which tools perform best and which tools are most relevant to our network. We're an R&D shop; there are tools that I think would probably perform wonderfully in retail, but when you plug them into a network with a bunch of engineers, alarms start going off all over the place. You know, reverse SSH is scary: why is this person making DNS requests this way, that looks like DNS tunneling. So out of that, we started to be able to refine our tooling approach.

Then we got to virtual, and we said: okay, now we've got the physical network looking good, but what about lateral movement? What kicked us into gear on this, originally, was WannaCry. We said: our entire domain controller network is virtualized, and we don't see what's happening on the vSwitch. So we deployed virtual agents, and now, in addition to the physical network, we have the ability to see the traffic contained within our vSwitch. And then the mandate came down from the top that we're going to the cloud, we're now cloud forward. Is that the way? Is that okay? Of course, that's what he told me: cloud first. So we went to the cloud, and I said, oh great, now I've got to monitor things in the cloud, and we did the same thing there. The important thing, with my team being so small, is that I'm able to see everything, and I'm able to see it in one SOC. Like I said, we've got three people, and we run nine by five. Yeah, we get paged at 3 a.m.
in the morning occasionally, because something blows up and we need to go look at it, but it allows me to operate a very trim SOC: I'm not constantly trying to scale out analysts; instead, I can smartly scale out my tooling and build alerting that way. So that's how we started to approach it, and from an infrastructure standpoint, this is what we've built and what we're running in-house today.

[Shehzad] Thank you, thank you for sharing that insight. So this was the basis of our InfoSec program and setting up our SOC. The next challenge we ran into: okay, now we have coverage, and if in the future we have applications moving between physical, virtual, and cloud infrastructure, we're not worried, because the platform takes care of all of that. Second, as I mentioned, we're a cloud-first company, a lot of our traffic is going out to the cloud, and it's encrypted. Are we blind to that, and how do we deal with it? That's the second capability the platform provides: it allows us to look at selective elements of traffic and decrypt that traffic, not everything. The platform has a very sophisticated categorization engine, so you can say: these categories of traffic, please decrypt; these categories of traffic, don't decrypt. I'll talk in a little bit about what we do and do not decrypt, but the platform gave us the ability to discern what kinds of traffic we want to inspect and decrypt, and then send it off to the tools, so the tools don't have to deal with decryption. Decryption is a computationally intensive process; typically utilization on the tools drops 60 to 70 percent, and you would have to repeat this in every tool, which means taking that hit in every tool that needs to look at plaintext traffic. So we said it is architecturally more sound to do it in one place, do it in the

platform, and everybody else gets the benefit: the tools get their full utilization, the decryption is done once, it just makes sense. Now, there are some policy considerations and some issues we ran through, and I'll turn it over to Jack to walk through some of those implementation challenges.

[Jack] Yeah, so as a security person, you can imagine I was very excited when we were told to go do this. This is a tricky one. There will be people in here from different sectors, and as we said before, you really need to think about what your tolerance is and what your goals are. I have a lot of engineers around, doing R&D, and telling them we were about to start decrypting their traffic didn't necessarily go over great. But from the security standpoint, as more and more traffic moves to encryption, and as we see more and more malware delivered natively over encrypted channels, this became a real problem. One of the ways we think about it internally is that the overall protection of our network ends up being, effectively, a function of how quickly we can detect and how quickly we can respond, and the way I think about it is that if you have infinity in there, you've got a problem. So we say internally in our SOC that infinity is the enemy of security. We had to do this; there was no choice. However, the policy side of it gets a little tricky, and I'll encourage you, if you start to move down the road of decrypting SSL, to talk to your lawyers early, because they're going to have a lot to say about this, as will your compliance people.

I think the way you have to approach it, and the way we approached it, is to look at where you are going. One of the things we wanted to do was clearly to watch things that are obviously concerning. Do we block things like pornography sites at the edge? Yes. Do we block things like dark web traffic,

Tor? Yes. But east-west movement, even internally, even still encrypted, can tell you something about a story developing on your network before it has left your network. So we started with a very narrow range of categories that generally seemed unconcerning to people, we circulated them around and got feedback on whether people would be worried about them, and they weren't, and then we moved out from there. We started asking questions like: is it okay for us to decrypt iCloud, knowing that some of our employees run their personal iCloud on their laptops? We decided no, we decided that wasn't going to be okay. However, we did decide to go with OneDrive: we run it internally, and we thought there was some value there in case anyone is putting malware out on OneDrive. So we went that route, and eventually, after a while, we iterated through and came up with a list we could do. Then the next immediate challenge: we said, okay, now we're going to roll this out to the offshore site we have, and the lawyers raised their hands and said: different compliance schemes, different regulatory rules. And we had to do the whole exercise again.

What I will say, though, is that once it was done, and once we were intelligently doing the decryption, the tool use obviously went up; the tools saw more traffic. When a tool was just seeing SSL sessions, we couldn't do anything: it does some basic threat intel matching, popping the session up against source and dest, and you hope for the best; maybe it does a little bit around SSL cert inspection, but that's it. As soon as we opened up decryption, guess what: we started finding malware, we started finding problems. So every one of you, if you want to think about doing decryption, is going to have to go back and ask: what is the tolerance you're going to take, what are the categories you're willing to decrypt? I suspect no one is going to
decrypt everything; I think everyone's going to go through that list and say, we're not going to do that one. But do it, because you will get better tool performance out of this, you will get results, and you will see what you need to find in there. I guarantee you, as soon as you start decrypting, you're going to find bad stuff in those packets.

[Shehzad] Great, thanks for that insight as well. This is a big item: if you are going down this path, as Jack mentioned, I do want to emphasize, make sure you talk to your in-house general counsel, because the policy implications are important. But the end result for us, going down this trajectory, was actually very good. Okay, so we got the coverage, and we took care of our encrypted traffic, figuring out from a policy and categorization perspective what traffic we want to inspect and what traffic we don't. The third thing was: okay, now we've got a lot of data to deal with; how do we whittle this down, how do we reduce the amount of information we have to handle? That's where the platform provided us the ability to look at metadata. Metadata is really important from a network telemetry perspective, because looking at all sessions of network traffic is just too much information to go through, and network metadata provides a really good shorthand. The way I talk about it is: a network tap is like a phone tap; network metadata is like a phone bill, and there's a lot of information you can get from a phone bill.

So that's one of the things we did. Like many of you, we have SIEMs deployed internally, and they were getting a bunch of feeds from a variety of solutions: logs from domain controllers, from firewalls, from endpoints. One of the challenges of relying on logs exclusively is that, many times, logs are

generated by the CPU on the devices themselves. You turn on logging on a domain controller or on your firewall, and performance takes a hit; the higher the logging level, the bigger the performance impact. Suddenly somebody on the networking team turns the logging off because it's impacting performance, the security team doesn't know about it, and three months later there's an incident: you go to your SIEM to figure out what actually happened, and you're missing the logs. Now you're blind to a piece of this. This is a real challenge, and we saw it early on. So we said: let's figure out a way to reliably get equivalent information, with no impact to any of the devices. The platform provided us the ability to generate network metadata; initially this took the form of NetFlow and IPFIX, but eventually the platform started generating much more enriched metadata, which we then fed into our SIEM. That provided significant relief, not just in the amount of data and logs the SIEM had to consume, but in the reliability and efficacy of that information as well. So, Jack, do you want to share a little bit about some of the metadata elements we were looking at in our SOC?

[Jack] Sure. This is perhaps my favorite one; of all the things we've managed to do internally, this is the one that has streamlined our operations and allowed us to be really effective: the network metadata. One thing I haven't mentioned earlier, but that is worth reiterating, is that we view the network as the source of truth: what happens on the network actually happened. So if we can observe it and act on it, we've got this huge leading-edge indicator. We are still collecting logs, which, depending on which platform
you're collecting them into can be expensive so we've been able to turn down some of our logging this but the reason is is that this is being popped off the wire right as the packet goes by and then it's being ingested so amongst the things we're collecting right now is we're collecting DNS we're collecting HTTP information and we're collecting SSL right now we started with DNS because of the fact that we kind of view it as the foundational thing not a lot happens without DNS and so we have a ton of elements that we're able to export and we eventually over some experimentation whittled them down to basically return codes query our codes up codes things like that and then we collect them into our sim and from there we just we had a pile of DNS metadata and then we're okay now what do we do so we started looking at what the models looked like and a lot of this was experimentation and we were fortunate I'm fortunate to have a bunch of very good engineers under me that two of them but they're they're wonderful and they're they're very smart and so they're able to think through this as kind of a holistic problem one of the things we started to realize and we sort of said this before that you know security is going to be done differently everywhere and the other thing side of that is when you're watching your network everyone's network is in a behave a little differently and it turns out as you start to build this long-term model and look at the data you start to realize that certain things about your network end up being true you see certain spikes you see certain patterns when you're looking at time series data you see certain blends of response codes we don't run ipv6 internally so we see very few quad-a records running across the network so a few few months ago we suddenly saw a lot of quad-a records and we said oh no now it's that we aren't running ipv6 well as it turns out there was a an engineer running a traffic generator and he was doing some ipv6 simulation and 
we were able to identify that but the point of that is though is that this is a leading edge indicator that allowed us to look at something and go that is different and I use metadata and the way we use it internally is to point us to things because there's lots of stuff to look at and we're all doing hunting and we're all going through the network trying to understand as this bad as this good I'm gonna make your participate now show of hands anyone have too many alerts yeah yeah right well and that's a real problem right and we all have that problem this gives us a way to start to lower that threshold because we can start saying you know we've got some alerts over here but this thing is actually behaviorally different and so by doing that we can start to pivot and say let's look in that direction for a little while rather than the normal strategy of okay we've got ten alerts now we've only got enough resources to do these too so let's drop those eight and hope the bad thing wasn't in those eight and that is unfortunately for us not a way to do security so as we started to do this we learned about our DNS and then a lot of interesting things came out of this so one is we started saying well dg8 domain generation algorithm DJ's are a problem and we said you know now we've got the query information it's low volume we know exactly what was requested why don't we just write an entropy checker so we wrote a Shannon entropy checker and so now we have a Shannon entropy checker and every time there is a query observed on our network 100% of the traffic it goes against that entropy checker and if the entropy checker exceeds a certain Shannon entropy threshold we get told about it and then over time what you are able to do is you start to look at the network and say okay well certain things look bad for example when you're checking entropy but they turn out to be okay what's a good example of this an AWS domain it has high entropy it's not human readable it's long but it's 
not necessarily bad. So we can say, oh, actually that's our AWS account, let's not watch that one. We have an internal travel agency for digital booking of trips that, for some weird reason, packages a hash in the URL, so there's a DNS query with this long string that looks like maybe an MD5 or some sort of UUID. Every time that hits the entropy checker it flags; okay, well, it's our travel agency, we'll leave that. Slowly, over time, you develop this list, because you percolate up all the things that don't match understood network behavior, and you wind up with some things to look at, and there are a lot fewer of them. So we did that with DNS. We started profiling how our network normally behaves, what servers there are. We also said, hey, we're going to start bringing more DNS in-house; we don't want people going out to the Internet to resolve things. All of a sudden we've got the ability to watch for rogue DNS servers, because we can say, these are the trusted ones, and if you're not on that list, tell us, because we want to know why you're going out there. From there we moved into SSL. SSL certs turn out to be a massively valuable piece of information. One of the things we did very early, and we're still working through this process because it turns out it's really hard: when users go to websites that have bad certs, we tell them, don't click continue; they get the red alert in Chrome and we tell them not to continue. And then our engineers go to our git server, which has a bad cert, and we say, oh, that's okay, that's ours, go ahead. If you're collecting SSL metadata, it is trivial to just look for the case where the signer and the subject are the same, and all of a sudden you've got a hit list of self-signed certs inside your network and you can start looking at them. We also looked at SSL negotiation on non-standard ports, and for this one, the very first moment we
hit run on the report, the very first thing I saw was an SSL negotiation on something like port 38001 going to a Comcast address, with the cert signed by Plex Media Server. I thought, oh god, that's data exfiltration, someone's offloading. So we go trace it down, teams running around, and we find the engineer: an engineer who happens to like watching Game of Thrones in the office, so he was streaming from home. But from the standpoint of reducing the problem set, that is now a known quantity we can ignore in the future, and again we get this percolation. We've done this over and over. When the WoSign thing happened, about a year ago now, we said, just for the time being, let's add it to a list of certs we don't trust. So we went and found every single cert that had been seen across the network over about a month, sat down and went through the list, found all the CAs that we trusted, and found some CAs where we said, maybe we don't trust those ones; we put WoSign over there. Now, any time there's a cert negotiation against one of those on our network, we can go see: is there something concerning there, or is it just an engineer reading a newspaper back in China? Oftentimes that's the case. But again, it cuts our problem set down so we have fewer places to look, and it's really fast. And then, if we need to, we can write searches that correlate against the logging, which is also enormously helpful, because you can say, this happened on the network; was this event also observed in the logs? Out of this you start to build all these different models that allow you to rapidly hunt through the network. There's a lot more we could go through, but in the interest of time we'll run through a couple of these things quickly. The key here, again stepping back, is that by taking an
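(As an editorial aside: the two cert checks just described, flagging self-signed certs where signer equals subject and flagging certs chained to a distrusted CA like WoSign, amount to a couple of comparisons once you have the metadata. A minimal Python sketch; the record fields and the issuer string are assumptions, since the real data would come from your own metadata pipeline.)

```python
# Hypothetical distrusted-issuer list; real entries would be full issuer DNs
# taken from the CAs you reviewed and decided not to trust.
DISTRUSTED_ISSUERS = {"CN=WoSign CA Free SSL Certificate G2"}

def is_self_signed(cert: dict) -> bool:
    """Signer and subject being identical is the classic self-signed tell."""
    return cert.get("issuer") == cert.get("subject")

def is_distrusted(cert: dict) -> bool:
    """Cert was issued by a CA on the do-not-trust list."""
    return cert.get("issuer") in DISTRUSTED_ISSUERS

def triage(cert_records):
    """Split observed cert metadata into buckets worth a human look."""
    return {
        "self_signed": [c for c in cert_records if is_self_signed(c)],
        "distrusted_ca": [c for c in cert_records if is_distrusted(c)],
    }
```

Each bucket then gets the same percolation treatment as the DNS data: known-benign entries (the internal git server, for instance) move to an allowlist, and what remains is a short list worth investigating.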
architectural approach and putting the platform in, all of this became possible. We could do the SSL decryption, we could do the categorization, we had visibility across physical, virtual, and cloud, and now we have the ability to do metadata extraction to quickly zone in on specific areas of interest we want to go look at. Another interesting capability the platform afforded us was this notion of targeted inspection. Certain network security solutions want to look at full packet streams, but in those packet streams some traffic is relevant and some is not. For example, with streaming media, if users are going to YouTube, or Netflix, or Hulu, many of these tools would simply discard that traffic. But when you're tapping into the network and sending them the whole pipe, the tools are busy processing all of this, looking at it and saying, okay, this is irrelevant, let me drop it. So we leveraged the platform to do that pre-filtering. Using the platform, we could make sure that applications that were not relevant, that didn't need inspection, were discarded, while applications that were important from an inspection perspective actually made it to the security tools. You can do this for a variety of target applications. For example, Windows will pump out updates every so often, and on those days you can say, I want to exclude all of the Windows updates from going to my tools, because that's simply going to hammer them and take their performance down significantly. So we could leverage the platform to do this kind of targeted inspection at an application level and ensure that the right sets of application traffic were hitting the tools, all the way
from the start, from the first packet, from the TCP SYN, all the way until the session completed: the entire session was either filtered into the tool so it could do the inspection, or taken out so the tools weren't overburdened. You get more in terms of tool performance, so your investment dollars go a lot further as well. That was another key piece of the platform we took advantage of. There's one last piece I'm going to talk about; there's a lot more, but this is another very interesting piece, and then we'll open it up to questions, and Jack will talk a little bit about this as well. Many security solutions today are deployed inline. Firewalls are inline; they'll simply block and take action on the traffic. IPSes are inline, web application firewalls are inline to the network. Those of you who come from a networking background know that networking infrastructure is constantly changing: you'll go through an upgrade cycle, and when you connect some of these solutions inline to your network, a network infrastructure refresh can potentially force a rip-and-replace. You've gone from a one-gig network link to a ten-gig network link, and if you've connected an IPS directly into the network, you've got to throw that IPS out and get a ten-gig IPS, even though you may not have ten gig of network traffic; you may actually have a smaller amount. And you can't connect them serially either, because you're still down to the lowest common denominator: you can't take three or four one-gig IPSes and chain them into the link, because you're still down to one gig; it's a serial connection, fundamentally. So that becomes a challenge for deploying inline solutions. The other problem we had is that the networking team is a little bit loath to put things inline. You want to try out a new solution, you want
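(As an editorial aside: the targeted-inspection idea, dropping traffic the tools would discard anyway before it ever reaches them, can be illustrated with a toy filter. This is not the platform's actual configuration model, just a sketch of the decision logic; the application names and the classification by TLS server name are assumptions.)

```python
# Applications we assume the inspection tools would discard anyway.
SKIP_APPS = {"youtube", "netflix", "hulu", "windows-update"}

# Toy server-name -> application table; a real platform does this with
# deep application awareness, not a suffix lookup.
SNI_MAP = {
    "googlevideo.com": "youtube",
    "nflxvideo.net": "netflix",
    "windowsupdate.com": "windows-update",
}

def classify(server_name: str) -> str:
    """Map an observed TLS server name to a coarse application label."""
    for suffix, app in SNI_MAP.items():
        if server_name == suffix or server_name.endswith("." + suffix):
            return app
    return "unknown"

def forward_to_tools(server_name: str) -> bool:
    """Only traffic that actually needs inspection reaches the tools."""
    return classify(server_name) not in SKIP_APPS
```

The filtering decision applies to the whole session, first packet to last, which is what keeps the tools from burning cycles classifying and discarding streaming traffic themselves.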
to try out an APT solution, and you say, look, put this inline, and they're going to say, that's going to break my network. Sorry guys, but my job as a networking guy is to ensure connectivity, not to break connectivity. So there's a bit of a tussle going on between the networking and security operations teams. Many times what happens is you deploy an inline solution in a monitor mode or an out-of-band mode first, and once you get comfortable with the capabilities of the tool, the policies, what kinds of traffic you want to take action on, once you've built out the policy model and the comfort zone, you flip it and bring it inline. But in order to do that you've got to schedule a maintenance window, because you've got to take the network down to connect it inline. That again leads to a lot of operational challenges. The platform actually has a really neat capability here: you connect the platform inline once, whether the link is one gig, ten gig, twenty-five gig, forty gig, or a hundred gig, it doesn't matter; you connect the platform at whatever speeds you want, and all the traffic that you want to send to your inline tools, whether an IPS or a firewall, the platform will divert. But now it can load-balance inline. So if you have a set of one-gig IPSes and you've got a ten-gig or forty-gig link, you can add more of those one-gig IPSes inline to the network and it will load-balance the traffic across them. And if any one of those devices fails, it will bypass it so your network stays up, or drop the traffic, depending on what policy you choose. This became a really nice way for us to deploy inline solutions, real-time threat prevention solutions, and also to build up that comfort zone: let me go deploy this out here and see if it works. The platform is connected inline to the network; you can connect your
security solutions out-of-band, and when you're ready to bring them inline, you flip a switch on the platform. You never touch the network, and the tool is automatically brought inline. You don't have to schedule maintenance windows, you don't have the network and security teams head-butting, and all of this works pretty smoothly. Jack, do you want to share some insights on that as well? Sure. If there's any head-butting between network and security at Gigamon, it's me head-butting myself, because I own both teams. But the inline solution is interesting from an operational standpoint for my team, because I don't want to hear from executives that the network is down; I don't want them calling me. What this has given us, not just from a POC standpoint, where we're able to bring tools in and pull them out if they're having problems, but also from a failure standpoint: if we have something inline that's important, an IPS for example, and it does happen to die, we don't go and hose the whole network. I don't know if there are networking-specific people in here, or network security people, but if you're on the hook for the other side of this, the availability in the CIA triad, then you have to worry about this. If a security tool causes us not to be able to do business for thirty minutes, I haven't done my job right. So this has allowed us to scale this out and also self-heal really quickly if we have any of these kinds of problems. Do you lose some visibility during that time? Yes, it's a balancing act. For us, are we willing to lose some visibility if it means we haven't taken the entire headquarters office down? That's something I'm willing to deal with; we can go on high alert during the period the tool is down and get it running again. And then there's the scalability that comes out of this. One of the things I'm starting to look at as we
go into the next few years is, where is our bandwidth going? I've got tools I don't necessarily want to upgrade; not all of them even have, for example, forty-gig links on their roadmap. So now, with forty gig of traffic, I have the ability to spread it out among several ten-gig tools; if I have to grow and I don't have a vendor that's ready to go the other way, I can go horizontally. We've also done this for license cost containment. We're working on a project right now where a certain tool's licensing was modeled against the backplane throughput, and I said, no, that doesn't make any sense, because I don't use the backplane throughput; you're charging me based on the size of the device. So what I can do is buy a small device, maybe even two small devices, because the licensing is tied to the device size, and put them in parallel inline. Now I'm paying for smaller devices, and that's not an HA pair: if I lose one of them, I've still got the other one running. I think on that one we're going to get something like seventy to seventy-five percent cost savings, which is huge, because this guy doesn't give me all the budget I want. So you can tell him to give me more. Well, any more budget isn't going towards manpower; it's going towards automation. That's right. And that's the last piece I want to talk about. This is a topic that's near and dear to both Jack and me. Many of us recognize that the security personnel shortage is a real issue, and you're not going to be able to scale your SOC by just hiring more people. Automation is a key piece of the puzzle. The cloud folks have got this right, by the way; DevOps has got this right, they've done a good job with many of the tools available for DevOps, and it is time for SecDevOps: it is time to bring that mindset and mentality into running a security operations center as well. So let
me hand this over to Jack, because Jack has done a good job here. Jack came from a DevOps perspective, an automation perspective, and he brought a lot of that into running our security operations center. So Jack, do you want to talk a little bit about what you've done with automation and APIs? Yeah. I'll give you one proof of concept we've done that I think is really our first strong foray into this, and it's a great example. At one point I was asked to look at full-pcap solutions, and the problem with full-pcap solutions, if you've ever done them, is that they're either incredibly expensive, or what you can afford means you have a really short retention window. We know that the dwell time of most incidents is fairly long, so you're collecting traffic, but chances are that by the time you know what you need, it's already rolled off the back end of the pcap collector. When I looked at the pricing, I said, I'm not going to do it, but there's got to be another way. Again, coming back to targeting in on the things we want to see: would it be great to collect every bit of traffic on the network? Sure. But not every bit of traffic is equal; there are certain things that are starting to behave strangely, that are looking weird. So what we did is we took all of our tools that are able to tell us, this is suspicious, that's a known threat, that's a virus, pulled all those alerts together, ran them into the SIEM, and wrote rules. What we built on the other end of this was Moloch, an open-source distributed pcap collector, and we said, okay, if we see enough of an alert threshold for a certain IP, let's route that IP's traffic into the security delivery platform, collect the packets from that IP, and send them into the pcap collector. Now, is there a chance you're going to miss something? Yeah. But if you have no pcap collection
opportunity, or you have targeted collection, I'd argue that targeted is better. In this particular case, the tool just runs, and it finds suspicious things, and my engineers go in, we review them, we do incident response, and then we take the rule back out. The thing is, it's automated: we can come in in the morning, there's a new IP in there, and we're already rolling pcaps. We're down from what would normally be something like five or seven gig of traffic to collecting hundreds of megs, and we were able to do this on a single x86 1U server with a terabyte of hard drive space or so. So a solution that probably would have cost us half a million dollars if we'd tried to do full pcap, and then likely wouldn't have worked, cost maybe twenty grand; it was basically just the labor of putting the APIs together and doing some automation. Once we got that working, we said, we're clearly going to keep going down this route, and eventually we're going to start doing inline blocking, because, hey, you can wait for the firewall to do it, or you can try to shorten that distance while the blast radius is expanding and kill it right in the middle of the network. That's something we'll be able to do, because just as we can route traffic intelligently on our security delivery platform, we can also blackhole that traffic and stop it where it is. So all of a sudden we've got the ability to start self-resolving events, and that gives our team time to catch up with them, which, as we saw with the show of hands on alerts, takes time. If we can contain it now, great. Thanks. So obviously there's a lot more that goes on in running a security operations center, but we wanted to provide some insights into some of the decision process and some of the
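(As an editorial aside: the alert-driven capture loop Jack describes, counting alerts per IP in the SIEM and, past a threshold, steering that IP's packets to the Moloch collector, reduces to a small piece of glue logic. A hedged sketch; the `add_capture_rule` callback stands in for whatever API your traffic platform exposes, and the threshold value is an assumption.)

```python
from collections import Counter

ALERT_THRESHOLD = 5  # alerts per IP before we start capturing; tune to taste

class CaptureTrigger:
    """Counts SIEM alerts per source IP and fires a capture callback once."""

    def __init__(self, add_capture_rule, threshold=ALERT_THRESHOLD):
        # add_capture_rule would call the traffic platform's API to start
        # steering this IP's packets to the pcap collector.
        self.add_capture_rule = add_capture_rule
        self.threshold = threshold
        self.counts = Counter()
        self.capturing = set()

    def on_alert(self, src_ip: str):
        """Feed one alert from the SIEM correlation rules."""
        self.counts[src_ip] += 1
        if src_ip not in self.capturing and self.counts[src_ip] >= self.threshold:
            self.capturing.add(src_ip)
            self.add_capture_rule(src_ip)  # start rolling pcaps for this IP

    def resolve(self, src_ip: str):
        """After incident response, take the rule back out."""
        self.capturing.discard(src_ip)
        self.counts[src_ip] = 0
```

The point of the design is that the capture rule fires exactly once per noisy IP and is removed by a human after review, which is what keeps the collector sized for hundreds of megabytes instead of the full pipe.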
thought process that led to the decisions we've taken. The end goal worked out very well for us. Remember the picture I showed earlier, where we landed with security tools not effectively utilized: they're running at full capacity, but they're doing things that aren't really relevant to them. We ultimately ended up in a situation where we have fewer tools but they're very well utilized, and the entire cost of running a security operations center drops significantly. We're very efficient: there are three people, including Jack, running all of this. But also, the fact that we have the automation and programming expertise in the SOC makes a big difference and makes more of this possible. So I'll end with this last slide. This is how we did some of these things in our SOC; your risk tolerance may be different, and you may be looking at different risk factors. Start with the areas in which you would categorize your domains of risk. Understand whether you're going to be focused primarily on regulation and compliance, or on mitigation of risk, because regulation and compliance are fine, but as I mentioned, almost all the companies that got breached were compliant. So start thinking beyond compliance; start thinking about your own InfoSec program and your SOC as going beyond the whole regulation piece. Focus on the technology piece: how good are you at technology, how good are you at automation and programming? We made an investment when we hired the two people on our SOC team; the focus was on bringing in people who could do automation, who could write scripts, who could program. That was a critical part of being able to streamline our security operations center and reduce the cost of all of this. It was a decision we made; you may do things differently, but this is how we thought about the whole thing. And then, what are the domains
in which we saw threats? For us: our customers, our products, where we develop our source code and our technology, and putting frameworks in place to make sure we're looking at all the traffic in and out of those particular domains of threat. That led to our InfoSec program and our Security Operations Center. I hope some of this was useful for you. Your choices may be a little different, but I'm hoping that some of the trade-offs we made, some of the choices we made, some of the architectural decisions we made, and taking a platform-based approach will be relevant to you as well. So we'll end here. Thank you all for your time and attendance. If you have any questions, we're available to answer them, or you can come by and visit our booth. By the way, if you're interested, Jack runs our security operations; if you want to visit our headquarters in Santa Clara and get a tour of our Security Operations Center, please feel free to take us up on that as well. Thank you very much. [Applause]
Info
Channel: GigamonTV
Views: 16,566
Rating: 4.8509316 out of 5
Keywords: security operations center, soc, security operations, CTO, network operations, Shehzad Merchant, Jack Hamm, Shehzad Merchant Gigamon CTO, Jack Hamm SOC Manager
Id: JRMwV5XM6Lc
Length: 47min 20sec (2840 seconds)
Published: Thu Aug 30 2018