A Walkthrough Analysis of Tor Networks in Gephi

Video Statistics and Information

Captions
Hi everyone, I'm Jen Golbeck, and I'm here to do a quick Gephi tutorial for you to analyze some networks. Gephi is an open source graph visualization tool based on Java, so it'll run on any platform. You can download it at gephi.org. I have no hand in making Gephi, but I've made a lot of YouTube tutorials about it and I teach it in my classes. It's free, and it's a great way to analyze and understand networks. So we're going to jump in and look at some data, and I'm going to walk you through the process of analyzing networks that I would do.

All right, so what I've done here is open the network with the IP addresses straight up in Gephi, exactly as it opens, so we've got this big square of nodes here. I haven't done any layout yet. What we can see up here is that we have a big and super dense network, which is often challenging to visualize, so we'll eventually do some filtering on that.

The next thing I would do is go to the Data Laboratory to see what we have in terms of data. For each node we have an ID and a label, which seem to match, and then a node type, so that will be a useful thing to look at since there are just a few types. Under edges we have a weight, and if we sort we can see that the weight is always one, and then the connections here. So the interesting thing here is that the nodes have a node type.

So we come back to the Overview. The first thing I would do with a network like this is apply a Giant Component filter. That'll just get rid of any nodes that are singletons floating on the outside. So we drag that down here and click Filter, and it didn't do much to reduce what we have; we can see we basically have the whole network there. But it's always a good first step, if you've got a huge network, to get rid of those little ones that would be floating around.

Next I'm going to use ForceAtlas 2. There's a ton of good network layout algorithms in here, but ForceAtlas 2 is by far the fastest one, and I think it gives a really nice
layout. For a network this big you definitely want something fast, so take that and run it. We can see it thinking down here, and already we've got a decent picture. I'm going to stop it right there since it's just kind of jiggling around; because this network's so big, that could run for a really long time, and I can already hear my computer fan spinning up because it's computationally pretty hard.

So now we've got this laid out, let's click to center here and we can see the whole network. Before I even color code anything, what I can see is a bunch of, let's see if we can get it here, these sort of starbursty things where we have one node that's connected to a big clump here that isn't really connected to anything else. If I pull some of these nodes out, let's just pull them up to the side here, we can see that they're just connected to this one node and nothing else. Clumps like this, or I sometimes call them starbursts, tend to be really uninformative. They're just big globs of nodes that are kind of meaningless in the network, and so the next thing that I would do is try to filter those.

So recenter the graph here. I already have my Giant Component filter, so I'm going to click this little down arrow that lets me put in a subfilter. So after we've filtered the giant component, we're going to do another filter, and I'm going to try a degree range here, so I'm only going to include nodes with a certain degree. Let's see here. Actually, since this is a directed network, let's get rid of that and we'll look for in-degrees and out-degrees. So let's pull in an Out Degree Range.

Here's another one of these starbursts; let's just look at this one more time. So we've got this big clump of nodes here, and if we kind of pull them out and separate them, if you zoom in, you see there's just an in-degree coming in; there's no out-degree from this node. So I want to get rid of all those that just have an edge coming in. Hopefully that'll get rid of our starbursts. We'll zoom a little just so we can
see. So I'm going to click on the Out Degree Range. Now you get sliders down here, and those can be nice, but they can also be really hard to get on a specific value. So if I wanted to make this one and I go to slide this, sometimes the next value I get is really high. A little trick in Gephi is that you can just double-click that number, type the one you want, hit Enter, and then you get the network that you're interested in. So if we bring that back down to zero, this is our original network. If I put it at one and filter, I get a much smaller network with fewer of those starbursts. There's still one up here, but this looks like a little cluster, so that's better.

Let's lay that back out. OK, and now you can more clearly see we have two of these sort of starbursts here that didn't get caught, so let's take a look at what's going on. If I pull these out and zoom in on those, these nodes still have what looks like one edge coming in, but that's probably counting the nodes that we filtered out. So if we try to make our out-degree setting 2, that looks a little bit better. We still have these big clumps here, which is not quite what I want, but an interesting feature that I can see is that they're all connected to this node here, and similarly we have a clump here and the center one, all connected to this node. These two nodes, I suspect, are going to be uninformative, but let's run some additional statistics to see what's going on.

So we'll center this again, and now I want to go to Statistics up here. Let's actually do Average Degree; that's going to give us the degree of every node. Then I'm going to color code and also size code by degree. I like to sometimes do the same attribute for both size and color because it makes it extra prominent. So up here in Appearance, this is for color, and for nodes I'm going to pick an attribute and choose Degree. Apply that, and already that's a little easier to look at. For size I'm going to pick an attribute and Degree. This is set
with a minimum of 10 and a maximum of 50, which is sort of the default that I use because it just makes it easy to see all the nodes. So if we apply that, we can look at our network here and we can see we have two giant nodes in here, which are the two that are connected to those little starbursts. They are way above any of the others in terms of their degree. In fact, we can now go to the Data Laboratory and sort our nodes by degree, and if I do that you can see we have these two nodes that have this huge degree up here, and then it drops down by an order of magnitude.

So if you're interested in just the role of these nodes, these two are going to be super prominent ones that you want to spend a lot of time figuring out, because they're connected really intensely in the network and they're important. At the same time, they're getting in the way of us seeing anything else. So mark those down, figure out what they are, do some side investigation, and now I'm going to select them both and delete them.

OK, so with those gone we can come back to the Overview, and now this is our network. A bunch of things have disappeared. I'm going to lay that back out. I'm going to stop that now, reapply our sizing, reapply our coloring. I might want to rerun this just to make sure. OK, and what we can see is that we get a much different looking network happening down here, so now we can look a little more in depth at what's going on.

So let's color our nodes by attribute, by the node type, which is something that was presupplied; we loaded that up with the data set. What we can see is that everything left in here is a hidden service. There's no IPs, there's no clearnet; everything left in here is a hidden service. That means we've filtered out all the clearnets and filtered out all the IPs. OK, so that's sort of interesting. This is the bulk of what we're seeing in our network. If we go back to our filter and we get rid of our degree range filter, let's just put that down at zero, now a bunch of nodes
popped back up. Let's re-lay this out. So this is with all of our clumpy nodes, but since we've eliminated those two big nodes we're going to see a little bit more. So here we go, we have this new network, a little weird-looking, so let's do a few things in here. Let's try again color coding by node type, and now we can see all those nodes that we had filtered out.

So here's an interesting thing [unclear]. That means that most of our clearnet nodes have a very low degree [unclear], whereas the hidden services have a much higher degree, so that's sort of interesting. [unclear] Network Diameter there is a good one [unclear] in- and out-degree [unclear], which is very interesting: this very prominent node here has a vastly different in- and out-degree.

If we go by color, we can color by betweenness centrality and apply that. Let's change this here. There we go, so we get a spectrum, and let's do the same thing here. So now our betweenness centrality is dictating both the size and the color, and as we can see, we have high betweenness centrality here on that node and a handful of others. The one that we were kind of curious about, that had a big difference between the in- and out-degree, turns out not to be important at all in terms of betweenness.

If we wanted to investigate one of these nodes further, we could click on this guy, see if we can pull him out of the way. All right, so usually you can do Select in Data Laboratory. I probably have something else selected, which is why that's not working. Fortunately we can just come to the Data Laboratory, sort by betweenness centrality, and see who this is. So this is another node that we probably would want to pay a lot of attention to because it has
this very high betweenness centrality. In fact, if we scroll through here, we are now sorted by betweenness centrality, and if you look at the node type you can see all the high betweenness centralities are hidden services. Of course, that goes along with the fact that when we had things filtered by their out-degree, there were only hidden services left visible. Everything else had a pretty small out-degree, and that's where all the other node types were. So we have to get down here to betweenness centrality above zero in order to actually start seeing other types of nodes, which is pretty interesting. That tells us that the hidden services are far more important in this network than the IPs or the clearnets. So there's a lot more to look into, but just an initial analysis of the data that's in this network gives us some insights as to who's important and what the different node types' roles are.

OK, so you can see here I've got a new window. I've opened up basically the same graph, but there's a little bit of a difference. If we go to the Data Laboratory, this one has keywords as another column, and you can see here this node has guns, sex, and drugs as keywords; this one just has drugs. Clearly not all the nodes have keywords, but if we scroll through we can see them scattered throughout. So what I can do is use those keywords as another way of understanding the network.

So let's go back to the Overview, and we're going to click on Filters up here. There's a bunch of options that you can choose from for filters. If we go to Attributes here, there's one called Equal, so that's if the attribute is equal to something, and you can see our attributes, keywords and node type, are in here. But I actually want to look at all the keywords and just get rid of the nodes that don't have them. So if we go down, you can see there's Non-null. Null means nothing, that there's no value, so I want to find all the nodes that are non-null for keywords, which means they have some value for keyword. So
I'm going to drag that filter down here and then apply it, and you can see now, just in the picture, that our network is a lot smaller. If we look at the statistics up here, we only have two and a half percent of the nodes and 3% of the edges. So most of our nodes didn't have any keywords, which is fine. Now we have a much more manageable network to look at.

So I'm going to do the ForceAtlas layout again, though because this is a small network we don't necessarily need a fast one. Let's see what we can see there. We have these scattered single nodes on the outside, which aren't going to be really useful for a topological analysis. If we scroll in here we can see our main network. Let's try another layout just to see if we can see anything better. Yifan Hu is one that I like a lot; I think it makes interesting looking visualizations. OK, so we can see here that we have the same kind of thing that we had before, where we've got these kind of starbursts, a node that has a bunch of singletons coming out of it. But because this is such a small network, we can kind of leave those and just see what's going on.

The next thing to do is to color code these. If I come up here I can color code the nodes by attribute, and keywords is one of those, so we can see all the keywords here. We can apply just the default colors that they have and kind of see some differences, but it's not super informative; we have to keep going back to figure out what the colors mean. But since we have three main keywords here (null we're not seeing because we filtered that out), we have sex, guns, and drugs, and then combinations of those. What I would do in this case is take primary colors, so I will make this one red, this one blue, and drugs yellow. Then for the combinations I'll actually just combine those colors. So for example, we have guns and drugs here. I've colored guns blue and drugs yellow, so I'll make the combination of those just like if you mix those two
colors, so we'll make that green, a little bit darker so it's easier to see. Now here we have all three, guns, sex, and drugs. If you mix all those together you get kind of a brown if you're doing it with paint; I'm just going to make it a gray to show that it's all of them. Sex and drugs gives us red and yellow, so we'll make this one orange, and then guns and sex, blue and red, to give us purple for this one. OK, and so if we apply that, now we kind of see a spectrum of what's going on.

OK, so there's a few things that we can see with the coloring now. First is that we have this cluster down here of nodes that are gray, so they have all the keywords, and then you can kind of see a patch here of a bunch of green ones, so that's our guns and drugs. There's a bunch of kind of scattered ones that are our primary colors, red, blue, and yellow, so they just have one keyword.

Let's now actually see what effect the keywords have on the importance in the network. So I'm going to go to Statistics. There's a bunch of statistics that we can use to determine importance and centrality. I really like Network Diameter because it gives us betweenness centrality, which is a really popular one. Again, that means that a node is kind of a gatekeeper between different parts of the network. Then we can choose the size up here based on the attribute of betweenness centrality and apply that. So now we can see that we have these yellow nodes and some green ones over here that have really high betweenness centrality, and then one red one over here. If we come back to the coloring, we can see that the yellow ones are drugs, the green ones are guns and drugs, and we only have one important one over here that has anything to do with sex. So that's sort of interesting.

We can try different kinds of centrality. Eigenvector centrality is similar to what Google uses, which we see up here is PageRank. It basically iterates on who's the most important node. So if we try to resize based on eigenvector centrality, now we see a
shift. So it's a similar cluster of nodes that's important. Our one red one over here who was important has kind of disappeared, but we see those yellow nodes that we had before were important. If we bring our key back up, we can see that this has brought up the importance of some of our gun nodes; there's some blue ones that are important over here, and we kind of have an orange one in the middle, which is sex and drugs, right there, that's also become prominent. OK, so that gives us a little bit of an idea. It's definitely this cluster, this group of nodes down here, that are really important for us.

We can try doing some other filters. So let's keep the filter that we have for non-null keywords, and then we're going to go to Topology and look at Degree Range, and then start filtering out nodes that are the singletons. So let's try having a degree of two. Oh, OK, so when that happens, that means it's got rid of our filter, so now we want to reapply it. Then let's drag this a bit and see. OK, so what you might be seeing here is that I have to drag that degree up really high to get some of these nodes to disappear, and that's because they're connected to some of the nodes that we've hidden because they're null for the keywords, and so their degree's actually much higher than what we're seeing here.

What we can see is that there's two important nodes that are showing up here, this one and this one over here, that are connected to a lot of these singletons. That's fine, I mean, those are interesting. Let's try to get a better view here. But they're really bringing in a huge number of nodes, so what we're going to do is actually delete... well, let's actually look at those first. So we're going to Select in Data Laboratory, and we can see, I think this is actually one that we saw was important in the last one. So it's a hidden service; guns, sex, and drugs, it covers everything; and we see a bunch of the statistics that we've collected. So that would be one to note down to look at later, and
similarly, if we select this one and come back here, we can see it's another hidden service that does all three, with some statistics. So those would be nodes to look more deeply into with the content, but they're kind of screwing up what we can see here, so let's delete them from this visualization.

OK, and now we have a much different looking graph, so we're going to rerun our layout. What you can see that's done is exploded out a bunch of those single nodes to the outside. Well, that looks like it's done, so now we can come back in here and see a little more closely what's going on. The size is a little distracting here, so I'm going to change the size to just degree, which is pretty similar for all of these.

OK, so now what we can see is that we have this very tightly connected cluster here. Each of these nodes, you can see as I pull it, is connected to a bunch of other ones. So this cluster here, because they're gray, we know they have all three keywords, and that's connected to lots of our other nodes that are mostly the primary colors, but some are green and some are orange. We have the big kind of cluster of green ones here, which, you remember if we look at the coloring, green is guns and drugs. There's a lot of drug nodes in here, just yellow ones, and then a few orange ones, drugs and sex. So there's yellow involved in all of these nodes kind of in the center here.

If we zoom out a little bit, the red nodes, which use the sex keyword, are really out on the outer edges of the network; they're not really central. Now, these gray ones do have sex as a keyword, but the ones that are singularly focused on that are kind of out on the outside edges. There are some yellow ones out here too; you can see a bunch of yellow ones on the outside, those are drug ones, but again there's also a cluster in the middle. But the biggest group that kind of makes up the central part of our network has guns and drugs as their keywords. So that kind of gives us an idea, keyword-wise, of
what's most important. If we come back over to the Data Laboratory, we can also take one more look. The Data Laboratory is also filtered, so you can see we only have nodes that actually are showing their keywords here, and if we scroll through and you look at the node type, these are all basically hidden services nodes. In fact, we can look at that visually if we come back here and we pick the color of nodes not as keywords but as the node type and apply it. Our entire graph is purple, which means they're all hidden services.

So that tells us that the nodes that have the keywords are hidden services. Now, is that just an artifact of how we collected data, that only the hidden services provide the keywords? Maybe. But it also could be that the ones that have these keywords, for these sort of illicit activities that are happening online, all happen to be hidden services; that clearnets and IPs could have them and they just don't. That's something that the visualization can't answer. It can just tell us that there's this connection, and that's something to look a little bit deeper into with the data.

So there you go, there's a quick Gephi tutorial on how to look at these networks and the data and do some exploratory analysis. Hope it was helpful.
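
For readers who want to try the same steps outside the Gephi interface, the sketch below walks through the filter chain and the betweenness ranking described in the video on a tiny made-up graph. It is only an illustration in plain Python, not how Gephi implements its filters or statistics. The edge list, the node names, and the function names (giant_component, out_degree_filter, betweenness) are all hypothetical. The out-degree filter here counts edges in the full graph, which is an assumption; Gephi's nested filters may count degrees differently.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Return (nodes, successors) for a directed edge list."""
    succ, nodes = defaultdict(set), set()
    for src, dst in edges:
        succ[src].add(dst)
        nodes.update((src, dst))
    return nodes, succ

def giant_component(nodes, succ):
    """Largest weakly connected component (edge direction ignored)."""
    undirected = defaultdict(set)
    for src in succ:
        for dst in succ[src]:
            undirected[src].add(dst)
            undirected[dst].add(src)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            for nbr in undirected[queue.popleft()]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def out_degree_filter(nodes, succ, minimum):
    """Keep nodes whose out-degree (counted in the full graph) is >= minimum."""
    return {n for n in nodes if len(succ.get(n, ())) >= minimum}

def betweenness(nodes, succ):
    """Unnormalized betweenness via Brandes' algorithm (directed, unweighted),
    using only edges whose endpoints are both in `nodes`."""
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        order, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ.get(v, ()):
                if w not in nodes:
                    continue
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):         # accumulate dependencies back to source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

# Made-up toy data: a "starburst" hub with in-degree-only leaves,
# a small chain feeding the hub, and a detached pair.
edges = [("hub", "leaf1"), ("hub", "leaf2"), ("hub", "leaf3"),
         ("a", "b"), ("a", "c"), ("b", "c"), ("c", "hub"),
         ("x", "y")]
nodes, succ = build_graph(edges)
giant = giant_component(nodes, succ)       # drops the detached pair
kept = out_degree_filter(giant, succ, 1)   # drops the in-degree-only leaves
scores = betweenness(kept, succ)
for name, value in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(name, value)
```

Sorting the scores at the end mirrors the step of sorting the Data Laboratory table by a statistic to find the nodes worth a closer look. On a graph the size of the one in the video, a dedicated graph library would be a more practical choice than this hand-rolled version.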
Info
Channel: jengolbeck
Views: 14,067
Rating: 4.9763312 out of 5
Keywords: gephi, tor, dark web, network analysis
Id: JfMTWr8_pw4
Length: 22min 45sec (1365 seconds)
Published: Thu Sep 15 2016