Fighting the "Dark Web" | Christopher White | TEDxOStateU

Video Statistics and Information

Captions
My name is Chris. Thank you for the introduction. I'm talking about Memex. So what is Memex? Memex is a collection of DARPA-funded software. It does two things: the first thing it does is discover content online, and the second thing is bring it back for analysis. It's a new kind of search engine. People ask a lot, why do we need a new kind of search engine? The answer is because the Internet is a lot bigger and more complex than most people think. The DA of New York calls it a 21st-century crime scene. To give you a sense of the kinds of people I work with, I partner with groups that have names like the Violent Crimes Against Children unit, the Child Exploitation and Obscenity Section, the National Security Council, the Special Victims Bureau. Those are all places that need to understand content online, and what they're using right now, in many cases, are the same things that you and I use to buy birthday presents and look for restaurants. They use existing commercial services like Google and Microsoft and Yahoo. What I'll show you is sort of another way of doing things, a way that enables them to ask different kinds of questions and perhaps then address their own issues. In general we call this domain-specific indexing and search, and we call it domain-specific in the sense of: what is your domain of interest? We're at OSU here, and maybe you're in material sciences and you want to understand the progression of the field by looking at journal articles over the last 60 years. Or if you're in law enforcement, maybe you're at the Oklahoma Bureau of Narcotics and you're asking questions about victims in human trafficking cases. Those are the kinds of questions that require analysis, and in general what we're proposing are technology solutions that enable analysis as opposed to simple search. So we'll start here with a thing I'm showing on the screen. It's a map that you can't see very well, but at the center of it is Oklahoma City, and this map
is a way in which Memex has been used to crawl the Internet to find online ads for sex and related services, to make those available for law enforcement, prosecution, and victim outreach services. Now, the bigger the bubble on the map, the more ads we've seen in the last year, and if I hover over Oklahoma City you'll see that there were 211,000 ads for sex online last year. In Stillwater here it's a small dot, but it's still 12,500 ads. If you're one of the officers at the Oklahoma Bureau of Narcotics and you're assigned a case that has an ad like this, and you have 12,000 other ads in Payne County to ask questions about, if you're using Google it's essentially hopeless, right? So instead you have these interfaces that have maps that show the collections of information. You also have interfaces that allow you to search and look at individual ads. In this case I've queried 405, the area code here, and there are 264,000 results. So if you're doing this one at a time, looking at single ads, looking through Google, it's going to take you a long time and you're going to miss a lot, and importantly, it's impossible, actually. You may want to ask analysis questions. You may want to know, given an ad here in Payne County: is it a single woman in a single incident? Is it a single location? Are there many locations? Are there many women or young boys? Is it just an entrepreneurial person, or is it actually a victim of human trafficking? These kinds of analysis questions: have you seen this person before? Has this happened recently? Are they advertised elsewhere and now they're here, or is it the first time they've ever been seen on the Internet? Those are questions that are really hard to ask and answer when you're querying things one at a time using Microsoft Bing or Google, right? So that's kind of what Memex is: it's a collection of interfaces that look like a search engine interface, that allow prosecutors, law enforcement,
victim outreach, and intelligence personnel to ask analysis questions so they can better allocate their resources, better issue subpoenas, open investigations, issue indictments, and obtain convictions. And it's being used. It's being used, for example, now at the Special Victims Bureau in New York for every anti-trafficking case. It's a set of pilot partners we have that are testing it, and so far it's proving to be quite useful, because the alternative is, well, I'll show you next. They asked me, when I came, to talk about ideas, and so what I want to show you now are the three kinds of ideas that led to the creation of Memex, so you get a sense for both why it's useful and how it works. The first thing I'll do here is play a video for you showing the way in which these investigations are conducted now. So although Memex is a technology whose objective is to have domain-specific indexing and search, we've applied it to human trafficking first, because we think it's a place where we can make an impact, we think it has many of the technology challenges, and because when I ran around the government, into different parts of the law enforcement community, what I saw was, well, I'll show you next how they're solving problems now, and it was a frustration. So the first way in which ideas come to be, sometimes, is when you observe people that are frustrated. If you're a law enforcement person, often what happens is you get a tip, like in this case an email address, and what do you do? You're looking for evidence of trafficking. You go to Google and you click on the first link. Everyone clicks on the first link. You see a map, you see some blue bars. Is that evidence of trafficking? You're not sure. You go back, maybe you click on a different link. In this case there's a bunch of character encoding issues, a bunch of location names. Is this evidence of a victim of human trafficking? How can you tell? Maybe you go back, and third try is the charm: you click on, if it's playing
still, you click on a third ad. If you look at a third ad, maybe you see something like demographic information. The point I'm trying to make here is that this process of searching and clicking and linking and cutting and pasting is the mechanism that everyone uses the Internet for right now. I mean, how many of you have done research projects this way, where you have a bunch of tabs open and you click on a bunch of links and you go back and forth and you cut and paste to a Word document or to Excel or PowerPoint? This is the state of the art for most analysis tasks, and it prevents complex questions from being asked. The way you ask questions is fundamentally enabled by the way in which you can get answers, and so if you're asking only the kind of questions that you can answer using Google searches, then you're going to be really limited. I'm sure that you all watched the video, but I doubt you noticed that on every web page that I showed there, there was one phone number that was the same, actually. For an investigator looking at these websites, they have hyperlinks. The hyperlink is what an author determines is a link between two pieces of information. But it turns out that for investigators, the content which is shared across web pages, things like phone numbers and images and addresses and email addresses, is really important for following the leads. In that video it's really hard to see, as a person, that there's one phone number that's the same on all those pages, but it's really easy for computers. So, again, the first mechanism for causing an idea is this frustration: watching people do this cut-and-paste process and knowing that modern technologies can help solve the problem. The second one is a question of scale. Here we're looking at a map of the United States. I zoomed in to Oklahoma earlier, but if I look at a different place like New York, where one of our partners is, there were 631,000 ads for sex online in the last year.
They have five analysts whose job it is to follow up investigations, to support those processes, to encourage subpoenas. If you have 600,000 ads in a year and five people, how do you look through them? How do you understand what's relevant? How do you understand what organization would have the biggest impact if it were subpoenaed? Those kinds of analysis questions you can't answer if you're googling things one at a time. You need a different paradigm for asking questions, and these kinds of search and interface tools are those kinds of mechanisms. In this case it's a map view; sometimes it's a diagram of graphs; sometimes it's a trend line. And here it's not just in the US, it's global. Here we're looking at India and Southeast Asia; here we're looking at Europe. All over the world, wherever there is the Internet, there are people advertising sex online, and connecting those ads to victims of human trafficking is an important law enforcement and defense problem. Now, in general this is a problem which is global, and so we have to understand not just how it's happening on the surface web but in other parts of the Internet. As was mentioned earlier, there's this thing called the dark web or the deep web, and what that means, really, is that there are parts of the Internet where it's easier to do things that are illegal because you can maintain your anonymity, and there are places where you can publish things more quickly, like in social media. Those are parts of the Internet which you're not going to find when you google things. So to have a comprehensive strategy to address complicated problems like human trafficking, we have to enable technology to be part of it, and that technology has to reach into those parts of the Internet which are harder to find. So, for example, looking at social media, this is a map of the world covering about a billion posts over about a six-month time period. To give you a sense of a billion: a million seconds is about eleven days, and a
billion seconds is about 31 years. It's a lot of information. It's really hard for people to understand it, it's really hard to know what's going on, and so this really reflects things like population and communications infrastructure. But if we filter these things down to terms in the sex industry, the map looks a lot sparser; it looks much more manageable. And if we zoom into the United States and look at what terms are being used where, we can see things like the difference between the use of the word "escort" and the word "prostitute". Looking at where these things are posted allows us to ask analysis questions like: where is there a place selling sex that I didn't know about? Is that place near a place I did know about? For example, are people going from parlors two miles up the road to a hotel with the same girls and posting them there? Those kinds of questions are really hard to answer if you're restricted to the same interface. So the second way in which ideas can come is from the scale of a problem, from really identifying the problem. When you're looking at 70 million ads for sex online in a year, or a billion social media posts, you have to think differently about how to solve that problem, and that problem is big enough that it requires new approaches. The third way that ideas come is from looking back into history. The program I run is called Memex. It's a term from the 1940s, from this article here called "As We May Think", written by this guy, Vannevar Bush. This guy was spectacular. He was the head of the Office of Scientific Research and Development in the Department of Defense after World War II. He oversaw the initiation of the Manhattan Project. He wrote this article at the end of the war, right before he founded a billion-dollar company called Raytheon, and in this article he challenges the science community that, as we were winding down from war, the thing that we should do is organize our knowledge. In this article, in 1945, he anticipates the web, Wikipedia,
hyperlinks, really far-out stuff in the 40s. One of the inventions he describes in this paper is called the Memex. It looks like this. He envisioned a machine that has spools of microfiche, because microfiche was what you were using to do research in the 40s. You would go to the library and sit down at this machine and have all of these spools of microfiche that covered all of these topics of research, and as you do your research you create a trail of knowledge that connects parts of the microfiche together. At the end of the day what you're left with is not just one spool of microfiche and one printout; you have this trail of information that leads to knowledge. Now, this was really ahead of his time, and we're way past the days of microfiche, but this concept of trailing into the Internet, into trillions of social media posts, and into the dark web, creating the connections across content that let you understand how to connect the dots for law enforcement investigations, for missing persons, in general for crisis response, that's what we're using Memex for now. When there's a crisis in the world there's a flood of online content. Just take Liberia last year with the Ebola outbreak, or what's happening now in Iraq and Syria. Those are events in the world in which people are going online and posting all kinds of content, and we have to be able to understand what's going on if we want to make intelligent decisions. So although we're not looking at a machine like this, this is the same kind of concept in modern terms, using the scale of the problem and addressing the frustration of the people currently working on it, to solve important defense and national security problems. And so, to close: we're here at Oklahoma State, where I went to school, and I went to the library here, and as I mentioned before, our thinking is often shaped by the way in which we can ask questions. That was the beginning of
when the Internet was taking hold, and before the Internet people were asking questions of librarians, of card catalogs, and that limited their thinking. Now we have the Internet, we have Google, we have Microsoft, and those are limiting our thinking. What I challenge you to do, as academics and as supporters of academics, is to imagine the kinds of questions you would like to ask if you had a different way to ask questions. If you could look across the Internet and say, "I'm interested in this topic, find me all of the web pages about this topic and make it understandable for me," you would ask more rapid, more comprehensive, more diverse kinds of questions, and perhaps we would all have a much stronger knowledge base. So thank you very much, and go Pokes!
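A minimal sketch of the point made in the talk about shared content: a phone number that recurs across several pages is hard for a person to spot but easy for a program. This is not Memex code. The sample pages and URLs, the `PHONE_RE` pattern, and the function names `extract_phones` and `shared_phones` are all assumptions made for illustration, and the pattern only handles simple US-style ten-digit numbers.

```python
import re
from collections import defaultdict

# Hypothetical sample pages; URLs and text are invented for illustration.
PAGES = {
    "http://example.com/ad/1": "Call 405-555-0142 today, new in town",
    "http://example.com/ad/2": "Available now (405) 555-0142 Stillwater",
    "http://example.com/ad/3": "Text 405.555.0199 for details",
}

# Assumed pattern: three digits, three digits, four digits, with optional
# parentheses around the area code and optional space, dot, or dash separators.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b")


def extract_phones(text):
    """Return the set of phone numbers in text, normalized to ten digits."""
    return {"".join(match.groups()) for match in PHONE_RE.finditer(text)}


def shared_phones(pages):
    """Map each phone number seen on more than one page to those pages' URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for phone in extract_phones(text):
            index[phone].add(url)
    return {phone: sorted(urls) for phone, urls in index.items() if len(urls) > 1}


if __name__ == "__main__":
    for phone, urls in shared_phones(PAGES).items():
        print(phone, "appears on", urls)
```

Normalizing every number to bare digits before indexing is what lets differently formatted copies of the same number line up; the same inverted-index idea would apply to other shared content such as email addresses.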
Info
Channel: TEDx Talks
Views: 92,656
Rating: 4.1876211 out of 5
Keywords: TEDxTalks, English, United States, Global Issues, Human Rights, Internet, Trafficking
Id: 9QsjkJcUznA
Length: 14min 17sec (857 seconds)
Published: Thu Apr 30 2015