Adventures in Using ELK to Keep the Lights On -- ElasticON 2021

Video Statistics and Information

Captions
Awesome. Hi everyone, my name is Dan Gunter. I'm the CEO and founder of Insane Forensics, and today I'm here to talk to you about adventures in using ELK to keep the lights on. Here at Insane Forensics we deal a lot with cybersecurity, mainly for public sector, industrial sector, and commercial clients, so today I'm going to be sharing some of my experiences and lessons learned using ELK.

Where our journey begins is really with an understanding that industrial control systems underpin life as we know it today. Everything from power to water to manufacturing, even making the coveted vaccine and other pharmaceuticals, is made today by industrial control systems. Within industrial control systems you have different components that, if you think of it in terms of Legos, are the Legos that make up the factory. You have blocks called programmable logic controllers, little computers that sense what's going on in the environment and take some action, all the way up to Windows endpoints that tell those controllers what to do, because the controllers are pretty basic and have to listen to something or someone telling them what to do. This is all to say that our journey begins with the fact that life as we know it depends on these systems today.

For this talk we're going to focus on lessons learned in the power grid. As an overview of what the power grid looks like, for those who might not be power experts: we're not going to turn you into a power expert in 20 minutes, but this is what you need to know to follow along. Power is generated, whether from nuclear or more renewable sources like geothermal or wind; that's block number one in this diagram. It then goes through a substation that gets it onto a transmission line; the substation is block number two. Block number three is those big transmission lines you see when you're driving down the road. When the power actually gets to your neighborhood, it goes through another substation that puts it onto the smaller lines that distribute it into your house. I bring this up so you can be aware that the challenges of cybersecurity, and the ways you respond to cyber threats in these environments, do vary. Each of these areas is a little different, so you can have different challenges, and sometimes the analysis and the tool set you need varies between them.

As we've gotten more efficient and moved more into technology, components are becoming even more computerized. In the picture you see here, wind farms are now actually run from iPads, so you can have an operator sitting in a control center potentially dozens of miles away controlling wind turbines from an iPad; that's one example. COVID has also forced a lot more of those distance boundaries. It has forced companies to think about how to run an operation with operators working at a distance while making sure the process continues, because we need power, we need gas, we need water. So computerization has given us a lot of efficiency, but it has also started to introduce a lot of challenges. On this slide are three sectors I'm going to use as examples today.
In power, there were situations in Ukraine in 2015 and 2016 where their power system was hit, allegedly by another nation state. In oil and gas, there was an event called TRISIS, or Triton, that we'll talk about a little later, which went after a safety system in a Middle Eastern oil refinery. A safety system is a computer that, like the sensors and actuators I mentioned earlier, looks at the environment and says, "this is vibrating too much," or "the radiation is too high here," and takes action. So when you're talking about safety systems in oil and gas, you're talking about computers that are literally helping keep the workers in that area safe, and attacking them can cause some pretty catastrophic conditions. In water, there has been insider threat, and there was the Oldsmar case, a water system that got hacked. All the details aren't fully out, so it's hard to say exactly what happened, but it was a water system where someone found the ability to add more lye to the water, which can be somewhat dangerous.

So while this computerization has been helpful, it has brought new challenges, and with these new challenges has also come industrial-focused malware. It's not just people doing it by hand these days; there are groups out there writing malware for this. We've seen malware going after industrial processes since at least 2010. In 2010 there was an event called Stuxnet that a lot of people in the security community know about: Iran's nuclear enrichment program had its systems attacked, and the malware went after their ability to enrich uranium. Then there was the case I mentioned about the Ukrainian power system in 2015 and 2016.
And in 2017 there was the safety system case we talked about. On the right of this slide you have the architecture, where the top of the diagram is more traditional malware while the bottom starts to get into industrial protocols. This is a shift showing that attackers are now taking traditional malware and targeting it, or continuing to target it, against industrial processes.

It's not just malware, either; there have also been people. In 2000, in Maroochy Shire, there was an insider threat case where a disgruntled job applicant manipulated a water system. All the way up to WannaCry, which shut down vehicle plants and, incidentally, also shut down radiation monitoring at Chernobyl. So you have cases where malware and attack events, even though they're not intended to shut down an industrial process, do so anyway just because of that reliance on computers. This year it was hard to miss Colonial Pipeline in the news, and it was the same kind of thing, where billing systems and other systems the company relies on to provide the service went down.

The core of the problem we're going to talk about today is that the industrial cybersecurity field has a visibility problem, and where I'm going to share a lot of my lessons learned and experience is in how Elastic can really help bridge this gap; we'll dig into that in a bit. There's a quote from Anne Neuberger, who is on the National Security Council, from just a few months ago: as she puts it, we don't have visibility into a lot of these systems. That's really dangerous, because there are significant consequences if they fail. If you're talking about radiation monitoring at Chernobyl, you really want that to work. Degraded operation also really matters, because a lot of these systems and networks are time sensitive; a system doesn't have to completely fail for the impact to be significant. From a cybersecurity perspective, we have to have the tools to respond quickly, at the depth we need to, and that's where we brought Elastic in.

There are major challenges here. When you get into visibility, a lot of these networks use proprietary protocols and technology. These control systems are made by different vendors, GE, Emerson, Honeywell, a lot of big names you've heard, plus many others, and a lot of them have their own protocols and languages. So when you get into cyber events, you might be able to use traditional security tools, but you might also need a translator, something that understands that other language, protocol, or technology. It's trickier in the industrial space because a lot of the major vendors have bought other companies and inherited their intellectual property, so sometimes the technology you run into came through acquisition. Because of these proprietary protocols you can end up with low situational awareness, since it's not just well-known protocols like HTTP or SSH. There's also a limited data source focus: a lot of tools are either just network focused or just host data focused. One of the big strengths of using Elastic, and we'll talk about how we use the Elastic Common Schema and other things, is that when you take
something like the Elastic Common Schema and push data into your ELK stack, you can begin to really overcome some of these challenges.

Your collection complexity can vary. In these environments there are sometimes only certain points you can listen to, or only certain procedures you're allowed to perform, either for safety or for the reliability of the process. Same with adaptability: when you go into these environments you have to have a toolkit that's able to adapt. One of the big things we learned, and we'll come back to it with on-prem versus cloud, is the value of being adaptable, because regulation can change things. In some environments you might have to take an on-prem Elastic Stack; in others you can use Elastic Cloud and take advantage of its scale. Adaptability can be a huge challenge, because a lot of tools might be built only for the cloud, and the nice thing we've found is that you can blend between the two. There's also a mix of older and newer technologies, so you sometimes have to deal with some really old systems, plus distance and technology barriers.

When we get to timeliness, this is a big one that a lot of people in the security community hear about: mean time to detection and mean time to response. Mean time to detection is how quick you are from the actual event to when you realize it happened, that time delta, and mean time to response is how long it takes before you can do something to counteract or neutralize what's going on. The thing to note is that there can be significant consequences to time delay, so having the proper analysis stack is incredibly important.

What I want you to leave with today is that Elastic, at least for us, and it can be for you too, is that connective tissue for analysis. When we talk about how we use Elasticsearch, we deal with extremely large data sets, and one of the differentiators we've gotten into is that data isn't always straightforward and sometimes it's encrypted. If it's going over the network encrypted, you might have to rely more on host data. Elasticsearch has been how we've been able to scale out extremely large analysis: we've had Elastic stacks with 50 to 60 terabytes or more of data in them, on which we've been able to run very quick analytic batch jobs. Elastic lets you scale your ability to comprehend that data, because, as we were saying, mean time to detection and response matter and you can't always just throw humans at the problem. The ability to bring a whole lot of data in quickly, ask it questions, and get responses back is incredibly important.

When I talk about data granularity, this is where we differ a little bit. We do work at the level of, say, a Logstash-style "here's a process log," but the other area we work in is raw evidence formats: taking a raw disk image, a raw memory image, raw PCAP. Normally you would need a cocktail of tools for that, but what we've done is work through bringing that data in a lot more efficiently. With that, your data granularity increases because you're going a lot deeper, so you need to be able to deal with it, and how we dealt with that was through Elasticsearch.
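To make the normalization idea above concrete, here is a minimal sketch, not the speaker's actual pipeline, of mapping a hypothetical parsed ICS network record onto Elastic Common Schema field names and indexing it. The record layout, index name, and cluster URL are illustrative, and it assumes the elasticsearch-py 8.x client.

```python
# Hedged sketch: normalize a parser-specific network record into ECS-style
# fields and index it. Everything concrete here (index name, field layout,
# local cluster URL) is an assumption for illustration only.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: local dev cluster


def to_ecs(raw: dict) -> dict:
    """Map a hypothetical raw record onto Elastic Common Schema field names."""
    return {
        "@timestamp": raw["ts"],
        "event": {
            "kind": "event",
            "category": ["network"],
            "action": raw.get("operation", "unknown"),  # e.g. "program_upload"
            "dataset": "ics.protocol",                   # hypothetical dataset name
        },
        "source": {"ip": raw["src_ip"], "port": raw["src_port"]},
        "destination": {"ip": raw["dst_ip"], "port": raw["dst_port"]},
        "network": {"transport": raw.get("transport", "tcp")},
    }


raw_record = {  # hypothetical parsed packet metadata from a protocol dissector
    "ts": datetime.now(timezone.utc).isoformat(),
    "src_ip": "10.0.10.5", "src_port": 49152,
    "dst_ip": "10.0.20.7", "dst_port": 502,
    "transport": "tcp", "operation": "program_upload",
}

es.index(index="ics-network-events", document=to_ecs(raw_record))
```

The point of the mapping function is that whatever dissector or log source produced the record, the fields that land in the index always carry the same names, which is what makes the cross-source queries later in the talk possible.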
On the Kibana side, this is incredibly important because, like I said, especially for mean time to detection and response, you have to be able to enable human-driven analysis and you need to be able to fuse that data together. We'll show some screenshots later of how we use Kibana, but it has been incredibly helpful and can save literally dozens to hundreds of hours even in a single investigation. For Elastic Cloud, like I was saying, sometimes your environment is regulated and you have to keep all your data on premise or handle it in certain ways, but if you don't have that constraint, the option of Elastic Cloud, where management is abstracted out into a service with SLAs on the back end, can be super helpful: you keep focusing on the analysis itself and don't have to worry about scaling complexity. The ability to move that slider over and scale up your instance without standing up new nodes and taking on a lot of overhead has incredible value.

On data sources, like I said, Elastic is that connective tissue, and I've hit some of this already. We break data down into three basic areas: process network data, process host data, and raw data, and there's a lot of data you might get out of logs. For process network data, the pro is that you can efficiently analyze and store it; the con is that if the traffic is encrypted you're not going to see the contents, only the metadata. Protocol support matters too: as I was saying, many different languages are spoken in these networks, so sometimes you're stuck with that metadata or you're out of luck otherwise. Process host data largely comes from endpoints, so EDR, XDR, and all the terms that come out of that; the pro is that it covers non-network events, which helps when you're working through correlation (and we'll talk about correlation), and the con is configuration and integrity. Then, as I was saying earlier, there's raw host and network data. It's not filter-limited, and there's a whole lot of value there, because unlike processed network and host data, where a lot of opinions are applied to what you get, you aren't dependent on someone else's filtering. Those opinions can be good, but they can also hurt you when you don't get the insight you need; raw host and network data can overcome that. The con is that you really do need a mature analysis pipeline to deal with that data.
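As an illustration of why fusing these data sources pays off once they share ECS field names, here is a hedged sketch of asking one question across both a network index and an endpoint index: pull the last 24 hours of events that reference a single asset and count them by event.category. The index names, the asset address, and which fields are populated are assumptions, not details from the talk.

```python
# Hedged sketch: one query across hypothetical network and host indices,
# relying on shared ECS field names (source.ip, destination.ip, host.ip,
# event.category). Assumes the elasticsearch-py 8.x client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

asset_ip = "10.0.20.7"  # hypothetical engineering workstation or controller

resp = es.search(
    index="ics-network-events,endpoint-events",  # hypothetical index names
    query={
        "bool": {
            "filter": [{"range": {"@timestamp": {"gte": "now-24h"}}}],
            "should": [
                {"term": {"source.ip": asset_ip}},
                {"term": {"destination.ip": asset_ip}},
                {"term": {"host.ip": asset_ip}},
            ],
            "minimum_should_match": 1,
        }
    },
    aggs={"by_category": {"terms": {"field": "event.category"}}},
    size=10,
)

# Print how much network vs. process vs. other coverage exists for this asset.
for bucket in resp["aggregations"]["by_category"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```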
The backdrop I'm going to use to talk about our specific use case is something called MITRE ATT&CK. MITRE ATT&CK is a taxonomy of the different techniques attackers can use to carry out whatever their goals are; think of it as a bingo card of the different actions an attacker takes when performing an operation. Recently there was a Triton evaluation: we talked about that Middle Eastern safety system scenario, and MITRE, which is a big think tank, took that event and actually tested a lot of the detection market against it, with both network and host logs. Some of the data I'm going to show is from this evaluation. They ran it on real equipment in a lab, but what we'll see is really the importance of what we call data diversity. Throughout that MITRE ATT&CK eval there were different things the vendors were supposed to detect.

One example is here: you have an adversary-initiated program performing one of those protocol operations against two assets. I'm not going to get too deep into the details, but the high level is that this is looking at the characteristics of different communications, some of that metadata, to find this very specific event. One of the challenges is that you need network and host data, like we were saying. With process network data, if you speak the proprietary language of that protocol you're going to see it; if you don't, you might be stuck with Windows event logs on the process host data side, which is something you can pull through Logstash, or, if you're not processing those, you might have to get it out of the packet capture. A challenge here is that network visibility might be limited due to encryption, and raw data is going to require processing if the protocol is not known. So when we look at the data we could use here, these are some examples, and I've put a few tools on the slide: that program upload action, which is just one of the protocol operations you might see, you might pull out of Wireshark from a PCAP; the process host data you might pull out of Windows event logs; and the raw host and network data you might pull out of the raw PCAP, or if there's a file on disk, pull the file off disk.

Something important to note, and something we definitely learned as we went through this, is that the entire chain of attack matters. You're probably going to struggle to find every step, and there are steps you won't find on their own, but you do still need to find enough discrete events in each phase and avoid the red herrings. This is where your analysis stack really matters, because you have to be ready to deal with a lot of different attacker approaches to your network.

Joining disparate attack phases is a good example of why this is important. In another phase of that MITRE eval there actually wasn't any network data present; it was all host data. If you're just manually going through the analysis and not using something like Elastic with all the data pulled in, and you're only looking at the network, you're going to totally miss this chain of events. Detection was only possible with endpoint monitoring in this case, so you have to be able to correlate host techniques with the other steps.
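The kind of cross-source correlation described above could be sketched roughly as follows: find assets where a suspicious ICS protocol operation seen on the network lines up in time with a process event recorded on the endpoint. This is illustrative only; the index names, the "program_upload" action label, and the ten-minute window are assumptions, not details from the MITRE evaluation data.

```python
# Hedged sketch: join a network-side indicator with host-side process events
# by host IP and time proximity. All names and thresholds are assumptions.
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
WINDOW = timedelta(minutes=10)  # assumed correlation window


def hits(index, query):
    """Return raw hits for a simple query (sketch only, no pagination)."""
    return es.search(index=index, query=query, size=1000)["hits"]["hits"]


def ts(doc):
    """Parse the ECS @timestamp field into a datetime."""
    return datetime.fromisoformat(doc["_source"]["@timestamp"].replace("Z", "+00:00"))


def host_ips(doc):
    """host.ip may be a single value or a list; normalize to a list."""
    ips = doc["_source"].get("host", {}).get("ip", [])
    return ips if isinstance(ips, list) else [ips]


net = hits("ics-network-events", {"term": {"event.action": "program_upload"}})
proc = hits("endpoint-events", {"term": {"event.category": "process"}})

# Correlate: the endpoint that sourced the upload also started a process
# shortly before the upload was observed on the network.
for n in net:
    for p in proc:
        same_host = n["_source"]["source"]["ip"] in host_ips(p)
        if same_host and timedelta(0) <= ts(n) - ts(p) <= WINDOW:
            print("possible chain:",
                  p["_source"].get("process", {}).get("name"),
                  "->", n["_source"]["destination"]["ip"])
```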
Getting toward the end of my talk, I'm going to walk through how we generally approach the problem I've been describing: collection, extraction and normalization (we do use the Elastic Common Schema, and I'll cover a little of that for those who might not be as familiar), some of the enrichment we do, automated analysis, and then human analysis. Each step along the way builds on the one before it.

For data collection, we've talked through a few different scenarios and events, but there are some big challenges: you have to anticipate your needs, you have to know what you need to collect, and you have to worry about the architecture issues we talked about. Some collection can be automated, but particularly in a lot of these environments you are sometimes just dealing with batch data. The screenshot here shows one example of batch data we pulled in. We've gotten really good at dealing with batch data, because sometimes we can't modify the environment to configure forwarding.

Once you have that data, the quote-unquote "right" data (and "right" can be arbitrary), you have to actually extract and normalize it. The Elastic Common Schema lets you close the gap between "here's communication data in host logs" and "here it is in network data." If you don't do this, it's really going to make life harder in the later steps.

When you get into enrichment, this is where life does get harder if you didn't normalize your data in the previous step, but you have to be able to take that data, pivot into it, and enrich it. In what we're showing here, we're using the map feature in Kibana. We take in things like location data through Shodan; GeoIP is another way of doing it. What's nice is that you can plug it into your visualizations and start drawing geo boxes around regions for source or destination IPs. Enrichment can make or break your analysis, and we do a lot more enrichment than IP addresses and vulnerabilities; there are a lot of other third-party enrichment services we pull in. This is where we save our analysts a whole lot of time.

What we then do is run automated analysis over it. We want to put our analysts on the problems that matter, the hard problems we can't readily automate, so we automate parts of the analysis so they don't have to do it and can stay focused. Some of this can be API-limited due to licenses, but automated analysis saves us. When we do get into human analysis, that's when we're actually digging through Kibana, pivoting, and finding the answer we need to find.

In conclusion: people matter. All of this, and all of our experience with Elastic, is about empowering our people, because people grow when technology isn't in the way. Processes matter: like I said, there's a lot you have to do with the data. And technology matters: there are often a lot of limitations on how you do this. So where our journey ends is that we shouldn't fear data volume, and that better people, processes, and technology protect life as we know it. With that, let me know if you have any questions in the chat or feel free to reach out, and thanks a lot.
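To make the enrichment step described in the talk concrete, below is a minimal sketch that uses Elasticsearch's built-in geoip ingest processor to attach location data to source IPs at index time, which is one way (not necessarily the speaker's) to feed Kibana map visualizations. The pipeline and index names are hypothetical, it assumes the bundled GeoLite2 database is available to the cluster, and Shodan-style lookups would be a separate enrichment not shown here.

```python
# Hedged sketch: a one-time ingest pipeline that adds source.geo.* fields
# whenever source.ip is present, then indexing a document through it.
# Names are hypothetical; assumes the elasticsearch-py 8.x client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One-time setup: GeoIP enrichment applied at index time.
es.ingest.put_pipeline(
    id="geoip-enrich",
    description="Attach GeoIP location data to source.ip",
    processors=[
        {"geoip": {"field": "source.ip",
                   "target_field": "source.geo",
                   "ignore_missing": True}},
    ],
)

# Index a document through the pipeline; with an ECS-style mapping,
# source.geo.location can then be plotted on a Kibana map and filtered
# with geo bounding boxes as described in the talk.
es.index(
    index="ics-network-events",
    pipeline="geoip-enrich",
    document={"@timestamp": "2021-10-19T00:00:00Z",
              "source": {"ip": "8.8.8.8"},
              "event": {"category": ["network"]}},
)
```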
Info
Channel: Insane Forensics
Views: 42
Id: ZuYBcgVmbZc
Length: 25min 11sec (1511 seconds)
Published: Tue Oct 19 2021