Automating Threat Hunting on the Dark Web and other nitty-gritty things - Apurv Singh Gautam

Hello everyone, my name is Apurv Singh Gautam. I would like to thank the DEF CON Red Team Village for having me here. I will talk about automating threat hunting on the dark web and the things that surround it. I presented a short version of this talk at GrimmCon, and this is the extended version of it. So let's get started. I will switch off my webcam so that you can focus on the presentation.

Okay, so a little about me. My name is Apurv Singh Gautam, and I go by the handle ASG Scorpion. I'm a security researcher. I started in threat intel and hunting two years back and I've been loving it since then. I'm currently pursuing my master's in cybersecurity at Georgia Tech. Recently, during the summer, I was a research intern at ICSI, UC Berkeley, doing research in threat intelligence. Some of my hobbies are gaming (I pretty much love Rainbow Six Siege, and I sometimes stream it) and hiking, and I recently started lock picking and have been enjoying it. Contributing to the security community is my passion: I'm a senior teaching assistant at Cybrary, I have contributed to StationX, and I contribute at local security meetups. These are my socials if you want to contact me or hit me up.

Okay, so what's today's agenda? We will start with an introduction to the dark web: what the dark web is, how to access it, what Tor is, and what the difference is between the dark web and the deep web. Then, why should you perform hunting on the dark web? Before that, we will discuss what threat hunting is and why it is crucial to hunt on the dark web. We will discuss the different methods to hunt on the dark web, whether dark web hunting can be automated, and what the pipeline or architecture for automating dark web hunting looks like. Then we will discuss a little about the threat intelligence lifecycle: how threat hunting on the dark web is analogous to the threat intelligence lifecycle, and which steps correspond to it.
We will discuss a little about operational security, that is OPSEC, and why it is important to secure yourself when hunting on the dark web. And yeah, that's it.

So, introduction to the dark web. I'm sure you must have seen this image a lot of times on the internet, showing the difference between the surface web, deep web, and dark web. The surface web is the sites that are indexed by search engines, that is Google, Bing, Yahoo, etc. The majority of the internet is the deep web. The deep web is any website that is not indexed by search engines, so this can include your databases, your server instances, or any other website that you cannot find from the search engines. The third one is the dark web, which we will mostly talk about today. This is the part of the internet where you need some kind of special software for access. It can include anything related to drugs or weapons being sold, or some kind of research or books sold there. The dark web includes several forums and marketplaces where people sell different kinds of things. The majority of the internet is the deep web, and people confuse the dark web with the deep web. As you can see, only six percent of the internet is the dark web, and 80 to 90 percent is the deep web.

Moving on: how do you access the dark web? Several organizations offer their own dark web systems, if you can call them that. The famous one is Tor, that is The Onion Router. The second is I2P, the Invisible Internet Project, and ZeroNet is also becoming popular nowadays. Tor has a .onion domain, meaning the domain name ends with .onion, and for I2P it is .i2p. Talking more about what Tor is and how it works: it's like a three-layer proxy system. If you can see on
the image here, there are entry nodes, a middle node, and the exit node. The entry node is where your traffic goes first; then it goes through the middle layer and the exit layer, and then to the destination. This way your identity is hidden. Also, only the entry node is publicly listed; the rest of the nodes are not publicly listed. So it hides your identity, it hides your IP address, and all these nodes are volunteer-based systems. Tor has about 6,000 relays. I don't know the number for I2P, but it is also becoming more popular nowadays. The major thing about Tor is that each node only knows the IP address of the next node or the previous node. So here, the entry node doesn't know about the exit node, or vice versa. In this way the locations of the nodes are also protected.

There are many misconceptions about Tor and the dark web. The famous one: whenever we talk about Tor, people think about the criminal things that go on there. Yes, there is a criminal side of Tor, but there is a good side of Tor and the dark web as well. It's really popular among whistleblowers and activists. There are many countries where free speech is limited, so people from those countries use Tor to express what they think. Tor also gives access to a lot of old literature and research that is not available on the open web, and it is safe even for journalists. There are many popular sites, like Facebook and the New York Times, that have their counterpart on Tor, that is, a .onion counterpart of their website. So it's useful for whistleblowers and activists.

The second thing: many people think that it's illegal to access the dark web. It's not illegal to access the dark web. It's illegal to indulge in certain kinds of activities, like purchasing drugs or
purchasing any other illegal things that are being sold on the dark web, but it's completely legal to access the dark web. The last thing I will talk about: many people think Tor is really big, but if you look at the uptime or availability of sites, there are very few reachable onion domains on the dark web compared with the clear web. So the dark web is a very small part of the internet.

Now we will talk about the dark side of the dark web, I mean the criminal side. There are many forums and marketplaces on the dark web. These are some of the sites that are relevant to security researchers, or to people who want to access the dark web for their organization's benefit. Some of them are credit card markets, where different credit cards are being dumped; or remote access forums, which include remote access trojans or some kind of remote access tools; or insider threats, a recently emerging kind of forum where insiders, the people who are selling their companies' secrets, talk amongst themselves. So these are some of the relevant sites.

Now coming to the cost: how much does it cost to buy something on the dark web? As you can see, it's really easy to buy these things on the dark web. You can buy any SSN for one dollar, or a fake FB account with 15 friends, mobile malware, bank details, or exploits and zero-days. These types of things are easy to get on the dark web, which is why security researchers and other people are focusing more on it. You might have heard recent news like 500,000 Zoom accounts sold on the dark web, or 267 million FB user profiles sold on the dark web. There are many data breaches occurring day by day, and the data is being sold on the dark web. That's why researching the dark web is really important. These are some of the product listings
from the forums of the dark web; this is how the products are listed. You can see the average cost of accounts for different online services, like the average cost for bank services or the average cost for video game services. This is the average cost of tools being sold on the dark web. As you can see, again, the bank and financial average cost is $74, so you can buy a brute-forcing tool for some bank at $74.

Now, coming to why you should hunt on the dark web. Before that, let's talk about what threat hunting is. Threat hunting is proactive searching for cyber threats. Proactive means before the attack happens. You search for cyber threats in logs, in indicators of compromise (that is IP addresses, emails, domains, etc.), or in textual data, which is what we are doing when we search the dark web. It's basically hypothesis-based, because there is nothing concrete about the process: you take one use case and work on it, then you take another use case and work on it, and it goes on iteratively in the same way. Many times machine learning, natural language processing (NLP), and advanced analytics are used in this process, because you need to scan through textual data, whether you are hunting on the dark web or on the clear web. So machine learning and advanced analytics are useful there.

So why is threat hunting on the dark web really important? What's so big about it? As I told you, there are many forums, marketplaces, and dump shops. What criminals or actors do is learn new methods and techniques on the dark web; they monetize their skills; they trade their exploits or tools, or even drugs and weapons; and they communicate with each other and share their ideas for new attacks. A security researcher, or anyone researching the dark web, can find a lot there.
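The indicators of compromise mentioned in the threat hunting definition (IP addresses, emails, domains) can be pulled out of scraped text with a few regular expressions. What follows is only a minimal sketch and not something shown in the talk: the patterns are deliberately loose, and the function name `extract_iocs` and the short TLD list are assumptions made for illustration.

```python
import re

# Deliberately simple patterns for illustration. Real IOC extraction needs
# stricter validation (defanged indicators, IPv6, a full TLD list, and so on).
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Hypothetical short TLD list; extend it to match your own threat model.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+(?:onion|com|net|org|ru)\b", re.IGNORECASE)


def extract_iocs(text):
    """Return the IPv4 addresses, emails, and domains found in a piece of text."""
    emails = set(EMAIL_RE.findall(text))
    # Blank out emails first so their domain part is not reported twice.
    without_emails = EMAIL_RE.sub(" ", text)
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "emails": sorted(emails),
        "domains": sorted({d.lower() for d in DOMAIN_RE.findall(without_emails)}),
    }


if __name__ == "__main__":
    post = "Contact seller@example.org, panel at 203.0.113.7, mirror abcdefexample.onion"
    print(extract_iocs(post))
```

Pattern matching like this only finds candidate indicators; deciding which ones matter is still the hypothesis-driven part of the hunt.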
He can learn a lot while engaging with these communities. You can learn their techniques and TTPs, that is tactics, techniques, and procedures: how they think about an attack and how they plan an attack. If you do it correctly, it can identify attacks in the earlier stages, that is the planning and recon stages, and you can reduce the impact. Suppose your organization's data is being sold on the dark web: there are different kinds of impact this can have on your organization. Some of the direct impacts are stolen personal information, stolen health care records, or even your company's trade secrets. Some of the indirect impacts are damage to your organization's reputation, revenue loss, and nowadays legal penalties, where your data is lost and you have to cover the costs for your customers. That's why it is really important to research the dark web, or to hunt threats on the dark web.

On the same lines, these are the benefits of threat hunting. If you do it correctly, you can keep up with the latest trends in attacks, you can get new TTPs, you can identify insider threats, and you can discover data breaches. The main thing is that you can prepare your SOCs and incident responders to deal with an attack, because they will know beforehand which TTPs the attackers are using, so they can reduce the damage and risk to the organization by acting quickly.

Coming to the methods to hunt on the dark web: we will discuss some tools that are used to hunt on the dark web, and then we will discuss the human element. The first tool, which is really important for this, is Scrapy. It's a web crawling framework. It's so popular and so important because it manages multi-threading automatically, so you don't have to spend too much time on the multi-threading part.
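For reference, here is roughly what that looks like in a Scrapy project's `settings.py`. The setting names are real Scrapy settings, but the values are illustrative assumptions rather than the speaker's configuration. Scrapy itself describes these as concurrency settings: it is built on asynchronous networking (Twisted) and keeps many requests in flight at once, rather than starting one thread per request.

```python
# settings.py fragment for a Scrapy project (values are illustrative, not tuned).

# Maximum number of requests Scrapy keeps in flight at once, overall.
CONCURRENT_REQUESTS = 16

# Cap per site, so a single slow onion forum is not hammered.
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 1.0

# Onion services are slow; give up on a response after this many seconds.
DOWNLOAD_TIMEOUT = 60

# Retry failed requests a couple of times before dropping them.
RETRY_ENABLED = True
RETRY_TIMES = 2
```

Lower per-domain limits and a delay are a reasonable starting point for onion sites, which tend to be slow and unstable.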
It already has capabilities for multi-threading, using one or two lines of parameters. The second thing is Tor: obviously, if you want to access the dark web, you need Tor. OnionScan is another tool that is used to search onion websites. It can tell you whether a website is up or not, and the correlation between different websites on the dark web.

Coming to Privoxy: Privoxy is a web proxy. Before getting more into this: when you access the dark web, you need some kind of proxy, because, as I told you before, the entry nodes are publicly listed. So your ISP can have a blacklist to block the entry nodes so that you can't access the dark web. Even if the ISP doesn't block it, it can see whether you're accessing the dark web or not. It cannot see what you are doing there, but it can see whether you're accessing the dark web. You might not want that, which is why you need some kind of proxy, and the majority of people use a SOCKS proxy. A basic difference between HTTP and SOCKS proxies: a SOCKS proxy is a lower-level proxy and it works on the SOCKS protocol. An HTTP proxy only works for HTTP or HTTPS websites, but a SOCKS proxy can work with other protocols also. There are different tools for using a SOCKS proxy, like tsocks, Polipo, and Privoxy. I've been using Privoxy, I don't have any problems with it, and it's good. Another thing is that Scrapy doesn't let you use a SOCKS proxy directly, because it doesn't support SOCKS proxies. That's why you have to use tools like Privoxy, tsocks, or Polipo to route the SOCKS traffic from Privoxy to Scrapy. There are other tools also: there are search engines like Kilos or Recon where you can find different onion domains. Apart from using a SOCKS proxy, you can also use a VPN with Tor for an extra layer of protection and for encrypting your data.
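The Privoxy-in-front-of-Tor arrangement can be sketched as a small Scrapy downloader middleware. This is a minimal sketch under stated assumptions, not the speaker's code: it assumes Privoxy is listening on its default address 127.0.0.1:8118 and forwarding to Tor's default SOCKS port 9050, and the class name `TorProxyMiddleware` and the `myproject` module path are hypothetical.

```python
# Assumed Privoxy configuration (one line in Privoxy's config file), which
# forwards everything to the local Tor SOCKS port:
#     forward-socks5t / 127.0.0.1:9050 .


class TorProxyMiddleware:
    """Downloader middleware sketch: send requests through a local HTTP proxy."""

    def __init__(self, proxy_url="http://127.0.0.1:8118"):
        # Privoxy speaks HTTP to Scrapy and SOCKS to Tor.
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
        # setdefault keeps any proxy that was already chosen for this request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # None tells Scrapy to keep processing the request


# settings.py: enable the middleware. The number must be lower than the
# built-in HttpProxyMiddleware (750) so the proxy is set before it runs.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.TorProxyMiddleware": 350,  # hypothetical module path
}
```

The same effect can be had by setting `meta={"proxy": ...}` on each request in the spider; a middleware just keeps that detail in one place.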
Getting more into the Scrapy part: this image might seem a little confusing, but I will go through it step by step. This is why Scrapy is really important and so useful for hunting on the dark web. I will explain it in terms of Python code. Suppose everything you see here is a different Python program: the spider is a Python program, the downloader is a Python program, the middleware is a Python program, and so on.

What does the spider do? In the spider program, you give the onion domain that you want to crawl or get data from, and the spider gives it to the engine. Suppose the engine is just the program that manages all the other Python programs. So the spider gives the onion domain to the engine, and the engine gives it to the scheduler. Here the multi-threading concept comes into the picture: the scheduler gets different domains and schedules them accordingly into multiple threads. So new domains go into the scheduler, the scheduler gives them back to the engine, and the engine gives them to the middleware.

The middleware program includes your proxy function and your login function. The proxy function is where you put down your Privoxy IP or Tor IP, so that requests and responses go and come through that proxy and you can access the dark web. The login function is where you put your user agents or cookies for accessing dark web forums. Nowadays, for all the forums, you have to have an account to access that particular forum, so you need some kind of cookies. Also, many high-level forums implement CAPTCHAs, and as Google doesn't work on the
dark web, these CAPTCHAs are image-based or text-based CAPTCHAs that are easy to bypass. You can use any machine learning CAPTCHA-bypassing service, or CAPTCHA-bypassing websites like Death By Captcha or Anti-Captcha, to bypass the CAPTCHA. You write all this code in the login function in the middleware program.

Now your request, your traffic, goes through this middleware program to the downloader. The downloader is a simple program that extracts the HTML and gives it back to the engine. The engine then gives it back to the spider. There is another function in the spider that extracts the HTML entities you want from the forum's HTML, like the forum name, document ID, the text-based data, or the name of the author who posted a particular piece of content. These are called items in Scrapy. The spider sends the items to the item pipeline. The item pipeline is where your database is configured. I use Elasticsearch; you can use any database, whether SQL or NoSQL, and the pipeline saves the items directly to the database.

The important thing to note here is that, as I told you before, multi-threading is automatically handled by Scrapy. The other thing is that you don't have to give multiple onion domains to the spider. When the downloader gets the HTML page for a particular forum, there is code, which you don't have to configure, that gets all the onion domains on that HTML page. The scheduler automatically schedules those other onion domains to go through the same process again and again. In this way you don't have to give extra onion domains to the engine, and this is why Scrapy is really useful for crawling data from the dark web. You can also specify which domains to crawl and which domains to block.
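The idea of collecting onion links from a downloaded page while honouring a crawl list and a block list can be sketched with the standard library alone. In a real Scrapy spider, the built-in `LinkExtractor` with its `allow`, `deny`, `allow_domains`, and `deny_domains` options would normally do this job; the function below is only an illustrative stand-in, and its name and parameters are assumptions.

```python
import re
from urllib.parse import urlparse

# Onion hostnames are base32: 16 characters (v2) or 56 characters (v3).
ONION_URL_RE = re.compile(
    r"""https?://(?:[a-z2-7]{16}|[a-z2-7]{56})\.onion(?:/[^\s"'<>]*)?""",
    re.IGNORECASE,
)


def extract_onion_links(html, allowed=None, blocked=frozenset()):
    """Return unique onion URLs found in html, in order of first appearance.

    allowed: optional set of hostnames; if given, only these hosts are kept.
    blocked: set of hostnames that are always skipped.
    """
    links = []
    seen = set()
    for url in ONION_URL_RE.findall(html):
        host = urlparse(url).hostname  # hostname is returned lower-cased
        if host in blocked:
            continue
        if allowed is not None and host not in allowed:
            continue
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    forum = "a" * 16 + ".onion"   # placeholder hostnames, not real sites
    market = "b" * 56 + ".onion"
    page = f'<a href="http://{forum}/index.php">forum</a> <a href="http://{market}/">market</a>'
    print(extract_onion_links(page, blocked={market}))
```

Keeping an explicit block list means a known-bad marketplace is never fetched, even when other forums link to it.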
In this way you can stay safe from collecting illegal data or illegal images.

So, moving on: now comes the human part. We discussed the tools you can use to hunt on the dark web. There's a human element also, called human intelligence, or HUMINT. It's the process of gathering intelligence through interpersonal contact rather than through some kind of tools or technical process. That's why it's the most dangerous and difficult form: you are directly talking to the actor on the dark web, which is not safe, because you don't want your identity, or your organization's identity, to be revealed to the actor. It's important too, because you can identify and respond to attacks much more quickly. You can do post-attack investigation. Suppose there's a data breach at your organization, someone is selling the data on the dark web, and you want to confirm whether they are selling the real data or lying about it. You can activate your human intelligence, that is the person who is researching on the dark web, to go and ask the actors whether the data is correct or not. That's post-attack investigation. You can also use it for new attack vector discovery, that is discovering new TTPs that the attackers are using or discussing. You can think of it as a high-tech equivalent of what an FBI agent does when he spends months or years working to infiltrate a criminal organization. That's why it's really hard to do, because you have to spend so much time on it, and that's why it's risky. For this you have to think like an actor: how they communicate within these communities and how they act within these communities. Another thing is that
the intelligence from this source is really valuable to your organization's safety.

Now moving on. We talked about tools and about human intelligence; now comes the pipeline, or the architecture, of how you can automate this threat hunting. Before that, I would suggest setting up a separate system. You don't want your personal data on the system where you are doing threat hunting. You can set up any lab or VM, whether physical or in the cloud. Just isolate the network and install the relevant tools like Scrapy and Privoxy, or, if you are using Elasticsearch or Kibana, then ELK, and the different Python libraries that are necessary for your task.

This is the automated architecture that I have been using. I will go through it one by one. I have this "automate" icon on the tasks that can be automated; the tasks without it, which I think are only the Scrapy setup and designing and training the NLP model, are hard to automate, so I will discuss them as we go through. First of all, you need to get the forum links or the market links. You can write a simple script to gather data from different search engines, like Recon and the other search engines I told you about, where you can get all the forum links. So that can be automated. Another thing is using a SOCKS proxy: as I told you, you have to use some kind of SOCKS proxy for this. To get the SOCKS proxy IPs, again, you can write a simple script, so that can be automated.

Now comes the Scrapy setup. Here you write your login functions, set up your proxy, and manage the Scrapy settings. You can't automate this, because when you get the onion links for different forums, you have to go to the forums and sign in. Yes, you can write scripts for logging in or signing in using different accounts, but I
found it difficult to do this, which is why I have been doing it manually: creating different accounts, like four to five accounts per forum, and then noting down the username, password, and cookies in Scrapy. You also need different scrapers for different forums, because the structure is different for each forum. That's why you need to do this step manually: you first log in to the forum, analyze it, and then write a different function for each forum for the HTML elements that you want to access.

Coming to the crawler part: the crawler, parser, analyzer, and the ELK part are all part of the Scrapy system that I discussed before. I've just written them separately so you can understand what each part does. The crawler, again, crawls HTML pages from the forums. The parser parses the HTML pages, getting the HTML elements like the post, post content, author, etc. For the analyzer part, you can write different functions in Scrapy. What the analyzer does is this: suppose you got the data from the dark web forum. Now you need to use some kind of technique to evaluate which content is relevant to your organization, or relevant to your threat model. We will discuss what threat modeling is in the later part of the presentation. For now, just understand that you can't focus on every threat that's out there; you have to focus on threats that are relevant to your organization. So you need to do some kind of analysis to get the relevant data from the dark web. Here comes the NLP model that I've been using. You can design and train your NLP model in such a way that it gets you just the content that is relevant to your organization.
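Before any model is trained, the relevance idea can be sketched with a plain weighted keyword match over the parsed posts. This is a simplified stand-in and not the NLP model from the talk: the term list, the weights, the threshold, the post fields, and the `examplebank` name are all assumptions chosen for a hypothetical bank's threat model.

```python
import re

# Hypothetical threat-model vocabulary for a bank, with hand-picked weights.
THREAT_MODEL_TERMS = {
    "dump": 2.0,
    "cvv": 3.0,
    "fullz": 3.0,
    "credit card": 2.5,
    "examplebank": 5.0,  # placeholder for the organization's own name
}


def relevance_score(text, weighted_terms=THREAT_MODEL_TERMS):
    """Sum the weights of every whole-word term occurrence in the text."""
    lowered = text.lower()
    score = 0.0
    for term, weight in weighted_terms.items():
        # \b keeps "dump" from matching inside longer words; note that it
        # also means the plural "dumps" is not counted by this simple sketch.
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        score += weight * hits
    return score


def filter_relevant(posts, threshold=3.0):
    """Keep parsed posts (dicts with a 'content' field) at or above the threshold."""
    return [p for p in posts if relevance_score(p["content"]) >= threshold]


if __name__ == "__main__":
    parsed_posts = [
        {"author": "user1", "content": "Fresh cvv dump for examplebank cards"},
        {"author": "user2", "content": "Selling game accounts cheap"},
    ]
    for post in filter_relevant(parsed_posts):
        print(post["author"], relevance_score(post["content"]))
```

A keyword filter like this is also a cheap way to collect a first batch of relevant posts, which can later serve as context or training data for a proper topic model.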
Suppose you are a bank: you don't want to focus on tools, or on data breaches that are not relevant to your bank. You most likely want to focus on the dump shops, where credit cards or debit cards are being dumped. In this way different organizations have different requirements, and you want to focus on those. Now, designing and training an NLP model can't be automated, because you need some content relevant to your threat model first, and then you need to train your NLP model on that. Either you get the data first or you design your model first, so it's like a chicken-and-egg problem. But nowadays there are many NLP models, like seeded LDA, where you can provide some kind of context before training the model, so it's easy to do that. Then you store the data in ELK. So all these things can be automated.

Coming to the part after getting this data: what's the process after hunting? Now we'll discuss a little about the threat intelligence lifecycle. The threat intelligence lifecycle is the set of steps that your organization takes, starting from getting the data up to the presentation of the data. How does threat hunting on the dark web correspond to this? There are five phases, as you can see: direction, collection, processing, analysis, and dissemination. What we are doing is working from human sources, as you can see here: dark web, social media, forums. In the direction phase, we identify dark web forums, register on those forums, and acquire access to those forums. In the collection phase, you use Scrapy to establish access and collect raw data. The processing phase also uses Scrapy: you parse raw HTML data and extract topics and authors. The analysis phase is where we use NLP and other machine learning models to infer relationships between these data. We get
data that is relevant to our organization, we link data sources, and we identify trends, hacks, leaks, etc. The dissemination phase is where we visualize the data in dashboards, whether we are using Kibana or some other kind of dashboard, and we give out alerts and reports for our higher managers or other people in our organization to see. So this is the crux of how threat hunting on the dark web maps to the threat intelligence lifecycle.

Now, threat modeling, which I told you I was going to talk about later in the presentation. Threat modeling is about identifying your organization's critical assets and focusing on them. It's about understanding threats and how you can mitigate them when they happen to your organization in particular. You understand what attackers want, what different critical assets you have in your organization, and what different types of actors can target you, whether they would be activists, insiders, or some kind of criminal group, and you know their capabilities. Here you choose your target on the dark web: if you are a bank, you focus on credit card markets; if you are some other organization, you focus on insider threat markets or on general markets. In this way you choose your target on the dark web. You prioritize risks (you can use the Pyramid of Pain for that) and focus on IOCs that are relevant to your organization. Another thing is that you don't use just one source. There are many forums on the dark web; you don't focus on just one target, you focus on multiple targets. Apart from the dark web, you also focus on multiple clear web sites, like Pastebin or Twitter, and nowadays Telegram, where many actors are communicating. So you focus on both the dark web and the clear web to get everything you can for protecting your organization. So again, data collection and processing: you collect data
from the clear web and the dark web. Some of the sites are Pastebin, Twitter, and Reddit; on the dark web it's forums and different marketplaces. You can do all this using the Scrapy crawler and parser that we discussed before. The analysis part of the threat intelligence model is, as I told you before, where you use NLP, machine learning, or deep learning techniques (some of them are LDA or GPT) to gather information related to your organization. You use social network analysis to analyze the different users on the dark web who post data related to your organization. There's clustering of products according to categories, and there's classification, binary or multi-class, which is how you classify the different products being sold on the dark web. All these things come under analysis.

I will touch a little on the MITRE ATT&CK framework. MITRE ATT&CK is a knowledge base of TTPs that was built using real-world observations. It contains the different tactics, techniques, and procedures that attackers have used over the years. You use the ATT&CK matrix to map the intelligence you obtained, to understand the TTPs better and to protect your organization more.

Now coming to the operational security stuff. When you are hunting on the dark web, or doing human intelligence work on the dark web, you need to follow some set of processes so that you don't reveal your data, your identity, or your organization's identity, as I said before. So what is OPSEC? OPSEC is the practice of hiding yourself online so that you don't reveal your real self and don't compromise your own operations. It comes from the US military; that's operational security. You need to hide your PII, that is, personal information: personally
identifiable information. You need to work on the dark web in such a way that you don't disclose your full name, driver's license, bank account, or even something as simple as your email. This is what you need to protect, and that's why operational security is really important. It's also a hard thing to do, because at the end of the day we are all human: we like to be seen as knowledgeable and we like to impress others. All of this leads to gossiping, bragging, and oversharing with others. That's why operational security is really hard. Most of the time people think of it as a process: okay, I have to do human intelligence work now, so I have to follow operational security. It should not be like that. It should not be seen as a burden, or as another of your job tasks to perform. It should be a mindset: you should always think about operational security before doing human intelligence or before engaging with actors.

I will discuss some of the things that you can use to maintain OPSEC in your daily routine; there are many other things as well. The main thing you want is to hide your identity. The first thing you should do is use a separate system, as I talked about before, where you don't store any personal information, whether it's a lab, a VM, or some other kind of system. The next thing is to use Tor with a proxy, or Tor over VPN. Another main thing is maintaining different personas on the dark web. As I told you, it's the equivalent of an FBI agent going undercover: he has some kind of persona, he has a backstory. You have to do the same thing. You should have different personas for the different identities that you have on different forums, and you should never mix them up. That's why you take extensive notes, so that you don't mess up the personas. You should always watch
what you say, and you should always think before posting. OPSEC, or human intelligence, is not a nine-to-five job thing. You can't just communicate with actors during your job hours, because they will know that you are doing this as your job. In this way you can be exposed: they can easily guess that you are a researcher and not a threat actor, and that would be a tip-off for them. That's why you have to do this work 24/7: you have to do this work on the weekends, and you have to do this work after your work hours. That's why it's not a nine-to-five job thing.

You also have to develop appropriate language skills, because people on the dark web, the actors, don't talk formally, so you have to develop appropriate slang skills. There are also many forums in other languages, like Russian forums or German forums, so you might need to develop those language skills, like learning English and learning German.

Another thing to note is changing time zones. Suppose you are in the US and you are engaging with a community on a Russian forum: you might want to change your time zone to Russia, because otherwise it would again be a tip-off to the actors that you are a security researcher and not a real actor. These are some of the things that should be noted before doing human intelligence stuff on the dark web.

So that was it. Concluding all this: we discussed a little bit about the dark web, what the dark web is, and how to access it. We discussed dark web forums and marketplaces, what different products are being sold there, and what the cost model around that is. We discussed threat hunting on the dark web and how you can hunt on the dark web. We discussed different tools, and the main tool was Scrapy; that's the main framework that we are working with to hunt on the dark web. We discussed
human intelligence, how it can be used, or how it should be used, to support your tool-based data collection. We discussed the pipeline, or the architecture, that can be used to automate dark web hunting. We talked a little about the threat intel life cycle and how threat hunting on the dark web maps to the life cycle steps, and we talked about operational security, why it is so important, and why it is so hard to do.

Some more points to note: obviously threat hunting on the dark web is hard, but it's worth the effort, because you don't get anywhere else the intelligence that you get from the dark web. You should always keep operational security in mind. Like I said before, you should look at more than one resource: you should look at forums, you should look at clear web sources also, with Pastebin and Telegram as examples, and you should look at different other forums on the dark web. It takes a lot of resources and a team effort; you can't do all these things alone, so you should have a team for this. And we talked a little about the usage of the MITRE ATT&CK framework and how it can be useful to map your TTPs to it.

These are some of the resources that I suggest you read if you want to know more about the dark web stuff and the hunting stuff. The major ones are Recorded Future, IntSights, CrowdStrike, and Digital Shadows. They release their blogs or white papers regularly, so read them and you will understand what all is out there on the dark web.

So yeah, that was it, I think. Thank you so much. I hope you all liked my presentation. You can contact me on Twitter or LinkedIn if you have any doubts or if you want to discuss this stuff more. I will hang out on Discord to answer all your questions. So yeah, thank you.
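
Editor's sketch: the talk describes mapping collected intelligence to the MITRE ATT&CK matrix but does not show how. The snippet below is one minimal way that step could look, assuming a simple keyword lookup over scraped post text. The technique IDs and names are real ATT&CK entries, but the keyword lists, the function name map_post_to_techniques, and the sample post are hypothetical and are not taken from the speaker's pipeline.

```python
import re
from collections import defaultdict

# Hypothetical keyword table. The technique IDs/names are real MITRE ATT&CK
# entries; the keyword lists are illustrative assumptions, not an official mapping.
KEYWORD_TO_TECHNIQUE = {
    "T1566 Phishing": ["phishing", "phish kit", "scam page"],
    "T1078 Valid Accounts": ["credentials", "combo list", "account dump", "logins"],
    "T1110 Brute Force": ["brute force", "bruteforce", "credential stuffing"],
    "T1486 Data Encrypted for Impact": ["ransomware", "locker", "encryptor"],
}


def map_post_to_techniques(text):
    """Return {technique: [matched keywords]} for one scraped post."""
    lowered = text.lower()
    hits = defaultdict(list)
    for technique, keywords in KEYWORD_TO_TECHNIQUE.items():
        for keyword in keywords:
            # Word boundaries avoid matching inside longer, unrelated words.
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
                hits[technique].append(keyword)
    return dict(hits)


if __name__ == "__main__":
    # Made-up forum post used only to exercise the function.
    post = "Selling fresh combo list and phishing kit for ExampleCorp logins"
    for technique, matched in map_post_to_techniques(post).items():
        print(technique, "<-", ", ".join(matched))
```

A keyword table like this is only a first pass; the NLP and classification techniques mentioned in the talk would replace the lookup in a real pipeline, and an analyst would still review each mapping.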
Info
Channel: Red Team Village
Views: 1,233
Rating: 4.8139534 out of 5
Keywords: cybersecurity, defcon, red team, red team village, ethical hacking, hacking, hacker, conference, exploit, vulnerability
Id: -3ZqR_sMqq0
Length: 45min 37sec (2737 seconds)
Published: Sat Aug 08 2020