Defcon 21 - Defeating Internet Censorship with Dust, the Polymorphic Protocol Engine

Hey everybody, thanks for coming out. Okay, I've got a lot of slides, so I'm going to try to just burn through them; we're just going to power through. So try to pay attention to the first five minutes of slides, so that you'll be with me when we're hurrying through this stuff. Okay, so I'm Brandon Wiley. I've done some stuff. I wrote a thing called Freenet, back around 2000. (DEF CON 2000, thank you!) Raise your hand if you have ever run a Freenet node. Yeah, my people. Thank you, national heroes, every one of you. So my first talk ever, when I was 18 years old, was at DEF CON in 2000. I presented about Freenet. The entire description of my talk was "This is about Freenet." I drew the slides with crayons, and it was a packed room of people who came to see a talk based on that information. Then at Black Hat, around 2003, I presented Curious Yellow, which was my superworm design, designed to destroy the Internet. Purely theoretical, as you can tell, because the Internet's still here. You can read more about that in Charles Stross's book Glasshouse, in which Curious Yellow is the thing that destroys humanity, so that was a great moment for me when he put that in there.

I used to work at BitTorrent. I was there when BitTorrent bought uTorrent, so I apologize for that. I did a lot of stuff at BitTorrent, and it was at BitTorrent that I first saw deep packet inspection being used to block BitTorrent. In fact, we noticed that Comcast was blocking BitTorrent before any of the press heard about it, and I was the guy they sent to Comcast to try to reason with them, and, well, you know how that worked out. So I've been working on anonymity stuff, mainly on the censorship-resistance side of things, for a long time. I know the folks from Tor from back in the day, and I've been helping them out more recently with
their new obfuscated protocols, because Tor is being blocked in a lot of places, so they need a new protocol that's not blocked. And finally, I wrote part of a book called Peer-to-Peer for O'Reilly a long time ago. So anyway, those are my credentials. Who cares, whatever; I'm just putting this up to establish some credibility with you, so that when I start showing you pictures of cats you don't just say "What is this? I'm out of here," because there are a lot of pictures of cats in my talk.

All right, let's get into it. My slides are taken from two different sources. One is my children's book on Internet freedom, called Free as in Kitties, and the other slides are from my PhD dissertation, so I kind of meshed them together; we'll see how it goes. We're going to start out with the Internet: what is it? Let's define some terms. Hopefully you've checked it out; if not, it's pretty cool, you should get on there, there are a lot of cats and stuff. Then, once we know what an Internet is, how do we Internet with this Internet? Then we get straight into binary classifiers using Bayesian statistical inference (that's from the children's book), then fooling binary classifiers with polymorphic protocols, and then Dust, the polymorphic protocol engine, which is what this talk is about. Then I've got some infographics, and then, if we have time (I have to start my timer; there we go), I want to talk a little about realistic threat models versus the threat models that everybody else uses.

So first of all, the Internet. The Internet, as we all know, is the greatest technological marvel of our time and the pinnacle of civilization. It's an unprecedented way to deliver pictures of cats. Now, I know what you're thinking: you can't take a real cat and transmit it over the Internet. Believe me, I've tried; it doesn't work. That's an analog
cat. Okay, so the first step is we have to turn it into pixels, with what they call pixelization. Now we've got pixels, and that's a digital form that we can transmit over the Internet. So if we take this exact cat and make it into pixels, we have this: it's a pixel cat. Fun fact: if you go on Google Image Search, or just Google, and look for things like "8-bit cat," "pixel cat," or "low-res cat," you'll find a lot of OkCupid profiles of girls who live in Oakland. True story.

Okay, great, so we've got this cat. Now we need to turn it into numbers, because, as we know, computers use numbers and stuff. That's pretty easy: we have all these various color spaces, so we get a number mapping for each color, run the image through it, and get a map of numbers. So now we're good; we have something computers can understand, and we can transmit it. For some reason, when they designed the Internet they didn't think it would be handling big chunks of data like cat pictures, so it can only handle very tiny chunks of data. So we split all of the data into these kind of randomly sized chunks that we call packets, and we transmit them over a possibly unreliable medium. Then they all arrive, maybe, I hope, or maybe some of them don't arrive, and we kind of cut and paste them, stitching them back together to rebuild the picture. On the other end of the pipe, after all of this magic has happened, we get a pixel-perfect replica of the cat we started with, sent through the Internet. There we go: the Internet, and it's great.

Okay, so what's the problem? The Internet is great; we can look at cat pictures, and it brings us all a lot of love and joy. Who would ever want to try to stop this? Well, robots. Since the beginning of time
there's been a war between cats and robots. No one knows why; all we know is that robots have been programmed to hate cats.

Okay, so here's how binary classifiers work. The robot looks at something, it looks at the packets, and it asks: is that a cat, yes or no? Those are all the options it has; that's why it's called a binary classifier. That's the decision it's trying to make: cat or not a cat. Now, because robots hate cats, if it is a cat they replace it with a sad panda. All cats are replaced by sad pandas. If it's not a cat, they don't care; they just pass it through exactly as it was. Bananas? The robot doesn't even know what bananas are. It only knows about cats and things that are not cats, because it's a binary classifier. So it doesn't care; it passes them on.

So the question is: how do we fool the robots, so that we can transmit pictures of cats over the Internet without having them replaced with sad pandas? I think if you've been paying attention (remember, I said pay attention for the first five minutes) you already know the answer. Right: you've got to make cats look like bananas, and then the robots don't care.

All right, so here's the secret code to my talk. Don't take a picture of this slide; it's not in the Internet version of the talk, which is just about cats and bananas. Kittens are free speech. Sad pandas are censorship of free speech. Robots are filtering hardware that's made in America and then sold to companies all over the world to make it so that people can't access the Internet and find out things like news about what's going on in their own country during elections and other critical times. Bananas are just messages that the filtering hardware doesn't care about. And banana cats are free speech that has been encoded so that it will get past the filtering hardware. So we're talking about some serious, deep stuff here, right? This is
really important stuff, because the Internet needs to be free. I just wanted to segue into this, so now I hope we're all on the same page and understand the code. Now that you know the code, I can tell you about my project: Dust makes cats into bananas so that the robots won't make any more sad pandas.

All right, so that's the intro; now let's get into some details. How do robots see cats? Robots can't see cats the way you and I do, where you look and say "it's a cat." They only see the packets, the grid of numbers, and then they have to use some kind of statistical or rule-based approach, because they're robots; they only know logic.

So here's one mechanism: you just look at the lengths of the packets. The data is all grouped into these randomly sized packets, so you count, say, 38 numbers in the first one, and you say: if things are in this kind of configuration, then it must be a cat. Now, this probably sounds really dumb. You think that's not going to work; packet length has nothing to do with whether or not it's a cat. So we're going to do an audience-participation test to see if you can classify traffic based on packet lengths. Are you ready? Here we go. This is a graph of HTTP packet lengths. That thing on the far right side is not the border; it's actually a giant spike in the graph. If you know TCP, that's because of the Nagle algorithm, which takes little packets and helpfully bundles them into big packets for you. Since that's not turned off in HTTP, you get this spike at the largest possible packet size. Now, this is HTTPS. HTTPS disables the Nagle algorithm in TCP by setting the no-delay option, and therefore it
doesn't have that spike. It has a totally different statistical profile: it still has a lot of fairly big packets, but no spike at the end, and it has this other spike around 400 or so. I don't really know why; I just look at the graph.

So I've just shown you two different graphs. Now I'm going to show you a chart and ask if you can guess which one it is. Raise your hand if you think this is a chart of HTTP. Okay. Raise your hand if you think this is a chart of HTTPS. Okay. Congratulations, you are all robots. It was neither: it was Dust, my project, pretending to be HTTPS. So it did a pretty good job. I kind of tricked you, though, because I didn't give you the option "something pretending to be HTTPS"; you might have picked that, since it's the obvious choice given what we're talking about.

So packet lengths work as a way to determine whether something is one protocol or another. The reason we care is that these days, the way they block the Internet, they don't say "you're looking at this thing we don't want you to look at, so we're going to block it." They say "You're using BitTorrent: blocked. You're using Tor: blocked. You're using SSL: blocked. You're using a VPN: blocked." They just block by protocol, regardless of what you're doing. That's crazy, because you could be doing all kinds of things, but if they can't see what you're doing to determine whether they like it, they just block it by default, based on the protocol. For instance, there are situations in which SSL has been totally blocked and you can only use unencrypted HTTP. Well, that's okay if you can make your traffic look like unencrypted HTTP even when it's not. So Dust removes packet-length information, but it doesn't just randomize it; it randomizes it according
to a target distribution of whatever you want. You pick a protocol, and Dust will make your packet lengths look like that protocol. Any protocol, it doesn't matter: just give me some sample traffic, and I'll sample it, make a profile, and make your traffic look like that.

Here's one of the tools I've made for looking at deep packet inspection hardware and figuring out how it does classification, so that we can circumvent that classification. It's called shaper. You give it a statistical model of a protocol, for instance a model of its packet lengths, and it does the trick from before: it makes traffic that looks like that, an infinite stream of traffic that looks like whatever you want. Then we pass it through the hardware, ask "is this such-and-such or not," and get the answers back, so we can tell how good the different hardware is at classifying protocols. Once we can do that, we can get better at making encodings that hide stuff from the classifiers. That's one of my open-source tools; if you have some hardware, you can throw traffic at it and see how it's doing classification.

Okay, so the second type of classification looks at statistical properties of the traffic. For instance: I see a whole bunch of sixes, so I'm going to count the number of sixes, and if there are a lot of them, it must be some particular type of traffic. Here are some examples of that. This is an English dictionary, and I looked at the probability of different bytes occurring in it. The one on the far left is just newline, because it was a list of words, so don't pay attention to that. I didn't clean the data, because real data is dirty, so I'm showing you the dirty data. The main thing is this: lowercase letters of the alphabet. You can see there's
definitely a spike. To the left is a little spike; that's uppercase letters. There are a lot of uppercase letters in the dictionary, more than you would think, but a lot fewer than lowercase letters. So clearly there's statistical structure here. If you look at a UK English dictionary, it's slightly different. This is HTTP. Oh my gosh, it's the same spike! Why is that? It's because HTTP traffic actually has a lot of ASCII letters in it as well; HTML elements are often lowercase letters. There's a slightly bigger spike in the uppercase letters, but you can see this bleeds through. We know this was English HTTP traffic, or at least HTML-over-HTTP traffic, and we know it was not images, just by looking at this distribution.

I feel like a lot of people think that if you wrap your traffic in something, it hides it, but a lot actually bleeds through. Here's HTTPS. Oh my gosh, it has the same spike! Why does HTTPS, which is encrypted, have the same spike in English letters? It's because SSL is encrypted but the header is not, and the header has a bunch of information that uses normal English letters, like the name of the website, the SSL common name as they call it. That's how they get you with encrypted traffic: they look at the unencrypted headers. It's actually super easy to tell what protocol you're using, even an encrypted one, if there's an unencrypted header. So people say "let's just encrypt everything with SSL." Well, that doesn't work, because you can tell it's SSL, and people just block SSL.

So Dust fixes that too. Dust removes the statistical content information. I use a thing called reverse Huffman encoding, where I encrypt everything so it's random, and then I reverse-Huffman encode it
to make it not random, to make it look like whatever you want. If you say the only bytes you can use are F and A, I'll give you a stream of just F's and A's that encodes your traffic. Whatever probability distribution you want, I'll make it look like that.

And finally, and I know you're going to say "that's stupid, no one does that," this is the most popular way of classifying traffic: you look for a sequence of bytes at a particular offset, and that's it. For instance, HTTP traffic starts with GET or POST, so they just look at the first four bytes, and if those match HTTP, they classify it as HTTP traffic. That's it. Something like 90% of all DPI classification that's actually deployed and used for censorship is just doing that. So we remove that too, because otherwise it's not going to work.

Along those lines, I have another tool, part of the Dust suite of tools, for figuring out what these byte sequences are, because the signatures are not public. They don't want to tell you what bytes they're looking for, because that would make it easy to obfuscate your traffic. So if you have some DPI hardware, this tool will take some sample traffic and replay it with all kinds of variations where it blanks out certain bytes. Then you can look at the results and find the exact string they're looking for, and again, you can do that for any protocol.

So, to break it down for you, here's what Dust does. If you define the set of properties that deep packet inspection hardware looks at in order to filter, and which things go in which category based on those rules, then for whatever that property is, Dust will randomize it to remove
all information, and it randomizes it according to a probability distribution to force the classification into whatever category you want. So you tell me what categories your hardware has, and I can make arbitrary traffic get put into any of those categories. The reason you want to do this is that you want to get into whichever category is not being blocked. There was a recent instance of an adversary that was blocking everything except HTTP and HTTPS, and connections could only be 60 seconds long before they were automatically closed. A lot of protocols had trouble with that. Dust says: fine, 60-second HTTP connections, let's do it, and it encodes all of your traffic over that protocol. So basically, if you let any messages through, then you have to let all messages through, because we'll just encode into the set of messages that are allowed.

The ultimate point of all this is that I have this message server: you give it arbitrary messages, it encodes them to look like bananas, they're passed through, and people are reunited with the cats they love. That's really what it's all about: letting people get to the content they want, post what they want to post, read what they want to read, and just have free speech on the Internet.

Cool, so that's the end of the linear part of my talk. Now I have several bonus slides, depending on how much time we have, and I think I burned through the main part pretty quickly, so let's go through them, and then when we do Q&A maybe some of the questions will be related to these slides.

Sometimes people ask me about various other projects and how Dust is different from them. I don't really think of them as competitors. People are going to use one kind of encoding or another to get their traffic past this filtering hardware; just use whatever works. All you want to do is get past
the filtering hardware. If something works, do it, and if it stops working, switch to something else.

I worked with Tor on obfsproxy, which is their obfuscating protocol. That's an example of a protocol that just obfuscates: it makes everything look totally random. That's pretty good; it'll get you past a lot of things. But some of the hardware now will actually flag stuff as random-looking, at which point you can make a custom rule that says: if it's random-looking, block it. If you can't classify it, that's okay; just block everything that has high entropy. If you've heard about the entropy attacks, those are really awesome attacks that work really well. They're not widely deployed, but you can custom-configure them on some of the hardware. So that's the issue with just obfuscating: you need a second layer where you shape the traffic to look like the stuff that's whitelisted.

A lot of people are doing research on mimicking specific protocols, especially HTTP, trying to make things that steganographically hide information in HTTP. The problem with that approach is that people always choose the most common protocols, the ones they think no one will ever block because they're too important. People used to say that about SSL, and now it's been totally blocked. So people are really focusing on HTTP, and the problem with that is that DPI hardware has more visibility into HTTP than into any other protocol. There are whole boxes that just do HTTP interception and semantic parsing of all the headers and so on, so you have to do a lot of work to look like HTTP. In fact, there's a recent paper called "The Parrot Is Dead," in which the authors say they're pretty sure that, given any traffic that mimics some other kind of traffic, they can construct a test that can
differentiate the two, because there are going to be differences between your HTTP implementation and a real HTTP implementation. So people are doing this crazy stuff where they try to drive an actual browser, like Firefox, making it load pages and encoding information in which pages it chooses, the timing, and so on. That's fine; it's just a very slow protocol, and you don't need to do any of that, because, like I said before, most of the time the DPI hardware is just asking "are the first four bytes HTTP?" and that's all you need to deal with. A lot of the hardware only looks at the first packet, because they're trying to scale. They're basically cheating in their design: instead of looking at all the packets, they want to push more throughput and be able to tell the people buying it, "Oh yeah, we can handle your whole country's traffic, you barely need any boxes, it'll be fine." They just look at the first packet, classify it, and stick with that classification. I was talking to a DPI vendor who said that for some protocols they have to look at 20 packets (oh no, 20 packets!) before they can classify them. So it's a lot easier than trying to be exactly like the protocol.

Then there's a really cool project called Format-Transforming Encryption. You give it a grammar for a protocol, say HTTP, FTP, or SMTP, and it will generate random messages that conform to that grammar. That's a pretty cool project; check that one out.

So the difference in what I'm doing is that I'm not writing a protocol. obfs3 is Tor's current protocol for obfuscation. FTE is kind of a protocol engine, but most people are just
thinking, "let's make one protocol that can never be blocked." I've got to tell you, that doesn't exist. There is no one protocol that can never be blocked by anybody. It just depends on the settings: your adversary is going to have some configuration on their hardware for "block this, don't block this," and it's going to be different for everybody. So instead I wrote a protocol engine where, instead of releasing a new revision every time it gets blocked, you just change the settings. You say: okay, before we were making traffic look like HTTP; now let's do some UDP-based thing. Let's get crazy, let's use UDP, let's make it look like Skype, whatever. And if they block that, just switch it up again. Switch it up every day.

In fact, don't even just mimic protocols. I have this thing that I can't really convince anyone is a good idea, but that I think is awesome, which I call chimeric protocols. You take two perfectly good protocols, I don't know, SMTP and NTP, and you smush them together, and you get a protocol that makes people say "I don't know what that is," and that keeps them busy. They've got people who have to configure this hardware. First they have to notice your anomalous traffic, then they have to figure out what you're doing, then they have to write a configuration, and then they have to make sure it cleanly separates your traffic from the legitimate traffic. So just keep it rolling. In fact, since Dust just uses a probability distribution, you could make up random distributions: in this protocol every packet is always going to be five bytes long, or 1,400 bytes long. I don't think there are any protocols like that.

Another thing is that my approach is purely statistical
because that's how the classifiers actually work: they look per packet. So my stuff is per packet. In "The Parrot Is Dead" paper, they actually referenced my work, and they say, I think, that they've determined that packet-based approaches like Dust are just never going to work. And it's like: right, it's not going to work against a bunch of CS professors and all their grad students in a lab comparing two pcap files, sure. But against the actual deployed hardware it works awesome. I know because I have the hardware, I pass it through there, and it works awesome. So that's one of the differences. (Oh, thank you, thank you.)

Another difference is that with FTE, Format-Transforming Encryption, which is a great project, you need a protocol specification so that you can follow that grammar. With Dust, you just give me some sample traffic and I'll build a model from that. In fact, the best thing is: you give me some sample traffic that was blocked and some sample traffic that wasn't blocked, and from that I can make you a protocol that is guaranteed not to be blocked. Well, not guaranteed, but it won't be blocked, and I don't even need to know what protocol it is. I just need you to give me the pcap files; I process them, and we're done.

Another thing: a lot of the people working on specific protocols, like HTTP modeling, model the protocol. They ask "what does the protocol look like?" and try to look exactly like that. What I do is model the filtering hardware. I ask "what does the filtering hardware think HTTP looks like?" and look like that, without doing any more work than necessary. So we get maximum efficiency while still definitely getting past that hardware. Give me some different hardware and I might come up with a different protocol. And I think
this all comes down to aiming for a realistic threat model. I want to base my threat model on what's deployed and what's being used to censor countries.

One more thing, which I added right before the talk: there are no shared secrets. Everything is totally public. The source code is out there; you can get it. Even the protocol doesn't have any kind of shared secret. So even if you know that people are running Dust, it doesn't help you figure out who's running Dust, because the traffic by definition looks like traffic you don't care about. Even if you download it and run your own experiments, unless you know what settings people are using, it won't help. And even if you do know the settings, the battle is that you have to make a better rule for your filter that can tell the mimic traffic from the real traffic. So it's no longer a war of technology; it's a war over who has better information, who has the better models.

Speaking of threat models: in the academic world, the hierarchy of threats is this. If someone just published a paper and won a best paper award, the adversary in that paper is the adversary you need to defend against. Otherwise, if there's a recently published attack, you should defend against that. If an attack was published before 2003, no one cares; no one in academic research is working on that at all. That's kind of my issue with academic work: they're really good at classifying traffic in the lab, but who cares? Until it makes it into hardware, until it's deployed and being used for censorship, it doesn't really matter.

I have a slide about open-source threat models, and I don't mean to offend anybody, but my experience working on Freenet, an open-source project, is that the biggest threat, the
number-one threat, is whatever you yourself come up with: "That's what I defend against, because I thought of it, so it's probably a pretty serious attack." Second, if someone on the mailing list comes up with it, then it's pretty bad. Or if it's on Reddit: if somebody attacks your system in a Reddit thread and says "your system sucks, it's totally broken, I know because I broke it with this attack," then that's what people defend against. And finally, everybody always adds plausible deniability as a feature. I know; we did it in Freenet, so I've been there. Everybody always thinks you've got to add plausible deniability, and I think that's a bad road to go down as well.

So my threat model is based on one question: is this attack actually being done in the wild, to censor traffic, a lot? An example is static byte-sequence matching; that's the number-one thing. If you don't defend against that, we don't even need to talk about it. There are actually still obfuscating protocols that begin with a magic number in the handshake, so if you just put that magic number into the filter, that protocol is gone. Next, if an attack is seen occasionally, that's good to defend against too, and we'll do that. And finally, if the capability is in hardware but just hasn't been used, that's the lowest priority, but I'll still do it. I actually met a lot of people this weekend who were telling me about some DPI hardware that sounded totally sweet. No one's using it, but if anybody ever buys it... One of the things about DPI hardware is that it's old, really old. No one ever upgrades, so a lot of the countries that are filtering are using 10-year-old hardware. So the first thing, the ten-year-old hardware, is
the first thing we need to protect against. You would be surprised how many new protocols fall instantly when they're thrown against ten-year-old hardware, because their authors are reading the papers or the mailing list rather than looking at the actual hardware.

Let me flip through and see if I have some more slides here. Let's see. Yeah, okay, that's a good question. You have to have a client and a server, and they both need to be speaking the protocol. You need the public key of the server, because I need to be able to do a handshake where we never send anything that isn't purely random bytes. I won't really get into the key exchange because of time, but the key exchange and everything else is purely random, so you need to have the public key ahead of time. When you find out the address of the server, you need to find out its IP, its port, its public key, and also the configuration for the specific protocol you're going to be speaking. All of that needs to be delivered out-of-band, in the invitation. I know that's not the way people usually do it. People like to connect and then handshake everything right there. That's the more popular way to do it, and I just feel like that way doesn't work. You need a little bit of information transmitted out-of-band beforehand in order to have all the properties we want.

Let's see. Okay, let's do questions, and if there are slides referenced by the questions, that's fine. Anybody got any questions? Oh, we've got a mic, that's good, because it's a big room. Jimmy, I don't have [unclear], and shockingly, there's no wireless here.

So how do we run a Dust server to help out? Is there a community set up, or EC2 instances, or anything like that? How can we make those endpoints that people
can connect to?

Right, that's a good point. Dust right now is not actually a service. It's a protocol, plus an implementation of that protocol that's designed for other people to use. For instance, with Tor, I worked with them on obfsproxy, which is part of their pluggable transport system, where basically anybody can make a new transport for Tor. So one of the targets is a Tor wrapper that uses Dust, and I'm also trying to make it into a library that you can use in your own protocol. There's currently no system for running open proxies based on Dust, and I don't think that's really the model I want to go with. From knowing the Tor guys from way back, I know how much work it is to run a community of volunteer nodes well. We had that issue with Freenet as well, although Freenet was actually pretty low maintenance: people just ran it, and there wasn't a lot of coordination. Also, let me go to the slide on whether or not you should put real traffic on it. The answer is no, don't put real traffic on it, because this is a purely experimental sort of thing. So I don't have a good answer for that yet, but it's a good question, and I'm going to work on it.

I guess this is more of a general question for all obfuscating protocols, but couldn't the attacker just notice that you're only communicating with one machine all the time, that it's always HTTP and you never get anything blocked, and then just block all access that way?

Right, I see what you're saying. You're talking about your connection patterns being anomalous: you're making long-lived connections to a single machine. So one of the things I'm doing in the next version is letting you split the traffic of one conversation over multiple connections to multiple machines.
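The speaker describes this kind of traffic splitting as work for the next version, so what follows is only a minimal Python sketch of the general idea under stated assumptions. It is not Dust's implementation, which is written in Haskell. The chunk format, the `split_stream`/`reassemble` names, and the use of in-memory lists instead of real TCP or UDP connections are all hypothetical. Each chunk carries a sequence number. The sender picks a channel for each chunk from a probability distribution, and the receiver merges the channels back into one stream.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical chunk format: (sequence number, slice of the original stream)
Chunk = Tuple[int, bytes]


def split_stream(data: bytes, weights: Sequence[float],
                 chunk_size: int = 4, seed: int = 0) -> List[List[Chunk]]:
    """Sender side: cut one conversation into numbered chunks and assign
    each chunk to a channel (a stand-in for a separate connection, host,
    or port), choosing the channel with the given probability weights."""
    rng = random.Random(seed)
    channels: List[List[Chunk]] = [[] for _ in weights]
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        channels[idx].append((seq, data[start:start + chunk_size]))
    return channels


def reassemble(channels: List[List[Chunk]]) -> bytes:
    """Receiver side: pool the chunks from every channel, restore the
    original order by sequence number, and rebuild the single stream."""
    pooled = [chunk for channel in channels for chunk in channel]
    pooled.sort(key=lambda chunk: chunk[0])
    return b"".join(payload for _, payload in pooled)


if __name__ == "__main__":
    message = b"one conversation over several connections"
    # Hypothetical 80/20 split, e.g. channel 0 = port 443, channel 1 = port 1194.
    parts = split_stream(message, weights=[0.8, 0.2])
    for channel in parts:
        random.shuffle(channel)  # simulate chunks arriving out of order
    print([len(channel) for channel in parts])
    print(reassemble(parts) == message)  # True
```

The sequence numbers are what make the "funneled back together into one stream" step possible. Chunks can arrive on any channel, in any order, and sorting by sequence number restores the original conversation. A real transport would also need framing and loss handling, which this sketch leaves out.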
I've already got part of that. Some protocols actually use multiple ports (if you look at OpenVPN, it uses 443 and 1194), and that's already part of the statistical model: you can say use 443 80% of the time and 1194 20% of the time. You can take the same idea to hosts, too: split your traffic among this set of hosts with this probability distribution, and use these ports with that probability distribution. So yes, we're totally going to do that. I'm also working on a feature where you can split your traffic over simultaneous TCP and UDP conversations, using different profiles, different protocols, and different hosts, and it all gets funneled back together into one stream on the other end. That's a lot of work, though, so it hasn't come together yet. It's just a lot of bookkeeping. That's the next step.

It seems like the obvious escalation for the hardware manufacturers is to move up the chain and start classifying distributions of bigrams, trigrams, or hashes of tokens in HTTP. Have you seen any evidence that they're moving that way, or are you banking on that being a theoretical, lab-CS-world attack that's not likely to be deployed in practice?

Well, to come back to the basic principle of Dust: if you define a property of connections, I will randomize over that property. A first-order probability model for content just looks at individual bytes. If you move from that to looking at bigrams or trigrams, and that gets deployed and I see it, I will simply randomize at the bigram and trigram level. And I can do that a lot faster than the hardware people, who need to build all that stuff, test it, get people to buy it, and get them to roll it out. I could do it today. The only reasons I haven't done it are that it's not deployed, and also, today specifically, I'm really busy doing some of
the DEF CON contests, you know. Right, we're not done yet, stop clapping.

So how do you specify what's allowed through? Do you have the client email you some pcap data out-of-band, covering what they were able to do and what they weren't? What are the actual details of how that gets specified?

There are two parts there: how do I make a model of a protocol, and how do we communicate that model to the client so they can connect to the server. To model the protocol, I have some tools that take pcap files and boil them down into a statistical model. They strip out all of the individual packets, keep just the statistical model, and turn it into a tiny little file that you can email to somebody. You bundle that file into what I call an invite packet, which holds the IP, the port, and the protocol configuration information all in one thing. So all you need to do is tell Dust "here's my invitation," and it will connect to the server and do everything. As for how I make those models, I have deep packet inspection hardware, and I look at what gets through and what doesn't. Obviously that depends on how the hardware is configured and what kind of traffic you're filtering against. So I look at real-world instances of filtering, find out what hardware they're using, get that hardware, configure it to reproduce the reported behavior, and that's how I try to make a realistic model.

Which brings me to something I want to say about contributing. Here's a bunch of ways you can contribute. Everything's written in Haskell, and my Haskell-to-C is really weak, so if anybody knows Haskell-to-C, I could really use some help making my Haskell C bindings not suck. Also, if anybody has any DPI hardware, that would be cool, because I have some but not all of it. In particular, I need some Huawei, so if anyone's got any Huawei gear
that they'd let me send packets through, you can help save the Internet from being censored. You know, on the DL.

You're saying Huawei is maybe a security problem there?

Am I saying Huawei is a potential security risk for my project? No, I wouldn't say that in general. They have good stuff. They're really good at filtering. So I don't know whether my stuff works against Huawei or not, because I don't have a box. Anyway, more questions.

Do you think it's possible to put [unclear] a client in the filter, so the messages can be decrypted automatically? I mean, we can use [unclear] key exchange there, but the protocol is relatively [unclear], so what if we just reverse engineer the protocol?

Reverse engineer my protocol? Oh, you don't need to reverse engineer it. You can just download the source code. It's right there.

Yeah, I was just thinking of trying to put a defense mechanism in the filter, so things can be automatically decrypted. You put a client in the filter so you can understand the meaning of what has been passed through.

I don't totally understand your question, so let's talk afterwards, and then I'll get it.

You mentioned some academic work that questioned whether your protocol can fundamentally work in the long run, because eventually they can adapt to your protocol. Can you please give more details about it?

Yeah, that was the "Parrot is Dead" paper. It says that packet-based approaches to obfuscation won't work, because the authors have already done work where they look at the whole connection and are able to classify traffic a lot better. Which makes sense, right? If you're looking at one packet, if
you're looking at all of the packets you have a lot more information to classify with. So sure, that's true. Here's the thing, though. If you're looking at the whole sequence of packets, then unless you delay the traffic, you've already passed the packets on to the server, gotten responses, recorded the whole conversation, and only then classified it. I won in that case, right? The message got through. Now maybe you had to burn that IP. Maybe that IP is blacklisted now and you've got to move to a new one, because they decided "you're doing crazy stuff, so we're going to block it." That's already a problem Tor deals with all the time: you have to keep churning through new IPs. So I consider victory to be any time I get the message through. I don't care about anything else. I don't care about people reading the messages or decrypting them afterwards, as long as they couldn't use that information to block the packets. So I think we just have different goals. The academic people ask "can we classify this traffic, yes or no?" My question is "can they block the traffic?", which they do through classification.

So this will be the last question. If anyone else wants to talk to our man here, we're going to take him over to the chill-out cafe. Okay, only one more.

So I'll make it count. Can you multiplex traffic across multiple protocols and multiple endpoints? That's the first part. The second part is: are you IPv6 ready?

Good questions. The first part is in the next version I'm working on: multiplexing over multiple protocols, multiple IPs, and multiple ports, and also between TCP and UDP, which nobody is doing, so I think that's cool. Most people just don't like UDP. I don't know why, it's rad. As for IPv6, it's funny you ask, because the first version of Dust was IPv6 only, and people had to talk me down from that. They had to be
like, "Look, Brandon, people don't have IPv6," and I'm like, "Well, they'd better get it." Thank you, yes, IPv6 is cool. So the new version actually runs on IPv4 only, but I'm going to add IPv6, obviously. One of the best ways to avoid deep packet inspection is to use IPv6, because they haven't gotten around to implementing most of their stuff for IPv6 yet. Another great thing you could do is use a thing called Teredo, which is IPv6 over IPv4 UDP with built-in hole punching and stuff, and it's really sweet. It's actually built into Windows 7, so if you're on Windows 7 you already have it, and you can just go to IPv6 addresses. That's another case where they just don't know what the traffic is, so you use it and everything's fine. There are a lot of cool little shortcuts to getting your traffic past the filters, like just using a weird protocol.

All right, thank you. I'd be happy to talk to everybody. See you guys in the Q&A room, or if you just see me around, let's hang out, let's get a beer, invite me to some parties. Well, thank you.
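The talk names static byte sequence matching as the number one threat. A protocol that opens with a magic number can be blocked with a single filter rule. A minimal Python sketch shows the asymmetry. The 4-byte `MAGIC` value and the function names are hypothetical, and a real DPI box applies rules like this in hardware rather than in Python. The random handshake stands in for a key exchange that, as described for Dust, sends nothing but random-looking bytes.

```python
import os

# Hypothetical 4-byte "magic number" that a naive obfuscating protocol
# sends at the start of every handshake (not taken from any real protocol).
MAGIC = bytes.fromhex("deadbeef")


def dpi_blocks(first_payload: bytes, signatures=(MAGIC,)) -> bool:
    """Model of the simplest DPI rule: block a connection if its first
    payload starts with any known static byte sequence."""
    return any(first_payload.startswith(sig) for sig in signatures)


def magic_handshake(body: bytes) -> bytes:
    """Handshake with a constant prefix: one filter rule catches it."""
    return MAGIC + body


def random_handshake(length: int = 32) -> bytes:
    """Handshake made only of fresh random bytes (a stand-in for a
    uniformly encoded key exchange), so there is no constant prefix."""
    return os.urandom(length)


if __name__ == "__main__":
    print(dpi_blocks(magic_handshake(b"hello")))  # True
    # Fresh random bytes match the 4-byte signature only with
    # probability 2**-32 per connection.
    blocked = sum(dpi_blocks(random_handshake()) for _ in range(10_000))
    print(blocked)
```

Adding one more entry to `signatures` is all it takes to kill the first protocol. Against the second, the filter author has no constant bytes to add. That pushes the contest toward the statistical models discussed in the Q&A.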
Info
Channel: HackersOnBoard
Views: 15,375
Rating: 4.9058824 out of 5
Keywords: pt21, defcon 21 videos, def con 21, conference, hacking, leaning, learning, california, Polymorphism (biology) (Field Of Study), Internet Censorship, The Internet (Issue), DEFCON
Id: 3z56andRyCY
Length: 44min 47sec (2687 seconds)
Published: Sat Nov 16 2013