Graph databases: The best kept secret for effective AI

Captions
Just because something is intelligent doesn't mean it's infallible. Our next speaker will outline how to get the best from artificial intelligence and why connectivity is the key. Next up, the co-founder and CEO of Neo4j will explain why connected data is the key to more accurate, efficient and credible learning systems, using real-world case studies on everything from spatial engineering to investigative journalism. Please welcome to the stage the co-founder and CEO of Neo4j, Emil Eifrem.

[Music]

All right, fantastic. Packed audience, fantastic to be here. So I thought we'd kick it off a little bit interactive. It's an early morning session and I want to make sure that we're all awake, so we're going to start with a little bit of morning gymnastics. No, we're not going to do that, but we'll do a little bit: how many in here consider themselves data practitioners, like a data scientist or data engineer or data analyst? Raise your hand. Probably 15, 20 percent. Awesome. How many in here consider themselves developers, who write code for a living? Maybe a third, something like that. How many in here have heard of graph databases before? More than the union of those. Yes, success. How many in here have used Neo4j? Hand raised. All right, probably 10%, something like that. Sweet.

So let's kick things off. My name is Emil Eifrem and I'm the founder of a company called Neo4j. It's also an open source project and the product, Neo4j, and it's the leading graph database in the world. Before we get into it, I really only have one ground rule, and it's the same ground rule for all my talks: I do not want your undivided attention. Please tweet about this, or Facebook message, or Snapchat; choose your social media. But if you tweet, the only thing I ask is that you use the Neo4j hashtag, in association of course with the Web Summit hashtag, because we monitor that one religiously. Let me know if I'm doing good, if I'm doing bad, and if you have any questions. Cool. So for the people in here,
the roughly 50% or so who haven't heard of graph databases, we'll kick this off with a high-level explanation of what a graph database is, and then we're going to get into the main part of the talk, which is graphs and AI.

My favorite way of explaining graph databases these days is through this. How many in here remember the Panama Papers? Hand raised. Yeah, probably 60, 70 percent. This was the biggest news story in the first half of 2016. The latter half of 2016 gave us the election of Donald Trump and Brexit, so it was dwarfed; 2016 was a fantastic year for us all. But in the first half, the Panama Papers was the biggest story, and it started out very simple. It started out with two journalists at a German newspaper called the Süddeutsche Zeitung, who were contacted by an anonymous person who said: hey, I have a lot of data from a law firm that specializes in offshore tax accounts. Accounts in very tax-friendly jurisdictions can be used, broadly speaking, for two purposes: legal tax planning, but also illegal tax evasion. So this person contacted these journalists and said: I have this massive data leak of all their internal documentation, do you want to have a look? And they of course said yes.

What they got was 2.6 terabytes worth of data. They ran that through a massive pipeline of technology and ended up with eleven and a half million documents: emails, scanned government forms and things like that. They had that data and they wanted to figure out if there were any stories in it. Of course, now we know that there were a bunch of stories. But if we take a step back from the world of data and just think about investigative journalism, investigative journalism is actually all about finding patterns, and specifically patterns in how things are connected. And not directly connected, because if it's directly
connected, people usually know about it. It's the indirectly connected stuff.

So here's an example of a pattern. We have a person with an account in a bank. That is actually the pattern of you, and you, and you, and me. This is all of us. This is not a particularly interesting pattern. No offense, you're very interesting, but generally speaking this is not a very interesting pattern. What becomes a little bit more interesting is this pattern. This person over here lives at a specific address. At that address another person lived, who is also an officer of a company. That company has an account in an offshore tax jurisdiction. All of a sudden we see that this individual down here is connected to an offshore tax account, not directly but indirectly.

In the world of data we can look at this a little bit more abstractly. These were individuals; we can think of them as persons and accounts and banks. In the world of graphs, we call the circles here nodes and we call the lines connecting them relationships. With nodes and relationships, and then key-value properties that you attach to both the nodes and, very importantly, to the relationships, you can model everything. And what's really amazing is if you have an infrastructure that can work with this type of data in this form, not just with one, two, three, four, five, six, seven nodes, but with a million nodes, or eleven and a half million nodes, or a billion nodes, or a trillion nodes. At that point you can create something really magical.

Now, this particular example actually wasn't a hypothetical example. This was the example of this gentleman over here, Sigmundur Gunnlaugsson, who is now the former prime minister of Iceland. He actually ended up having to leave his job because he hadn't disclosed the fact that he had previously had this account in an offshore tax haven, again not directly but indirectly
through this graph. This is an example of how to use graphs, in particular in the field of investigative journalism. The Panama Papers ended up having massive impact; again, it was the biggest story of 2016 until the political stories. Last year the organization behind it ended up winning the Pulitzer Prize. And exactly a year ago, when I was at Web Summit last year, the follow-on story from the Panama Papers had just broken: the Paradise Papers, which did not have any Icelandic prime minister but had Putin in there, had Queen Elizabeth in there, and so on and so forth. A few months ago, earlier this year, NBC used Neo4j to look at a number of Twitter accounts and ended up detecting that there were a lot of Russian troll accounts that swayed key moments in the U.S. election.

All of this is an example of data that you can look at in a graph-shaped form, and it's actually all available if you go to neo4j.com, our website. We're open source, so you can just download our product for free. We also have an online sandbox where you can log on and get a gallery of datasets, including the Panama Papers, the Paradise Papers and the Russian Twitter trolls. This is a screenshot from that sandbox. You just launch it; it's a tab in your web browser, and it spins up a database in the cloud that you don't have to manage, and then you can just start playing around with these datasets. If you think data is interesting, if you're interested in politics and democracy, please use this and go and save democracy for us.

So that's an example of graph usage today, and hopefully the 50% or so of the audience who don't know graph databases now have a little bit of a sense of what it is that we can do. These are very press-oriented, popular examples. So what are people using graph databases for today? I would say
there's a handful of use cases that our users cluster around today. Between these six use cases, that's probably two-thirds of our user base today.

A very popular one is real-time recommendations. Imagine that you're a big retailer and you want to be able to look at purchase patterns. You have stored all your purchase history in a relational database, row by row, and every row looks exactly the same: something like a customer ID, a dollar or euro amount, the product you've bought and maybe a date and time. That's your purchase history, and every row looks exactly the same. It's amazingly well suited for the relational database, and these big retailers have used this technology for, I don't know, thirty, forty years or something like that. But you can look at exactly the same data in a different form: let's draw it out and look at it along its connections. So you have an individual, let's say Emil, and I've bought these three products. Then we have someone else who's also bought those three products. So how about we take the fourth product that this person has bought and recommend that to Emil, because clearly we seem to have similar purchase patterns. That's called an open triangle recommendation, or a collaborative filtering recommendation, and it's a very popular use case for graph databases today.

Second one: fraud detection. If you look at the vast majority of fraud detection applications today, they analyze and look for correlations and anomalies, and that's great for detecting a lot of different things. What it won't capture is a number of transactions that are individually okay but are connected in ways that are not okay, for example fraud rings. Graph databases are really great for doing that.

The third one is network and IT operations. If you're a big telco, your entire business is managing large
network infrastructure. You want to be able to figure out what happens if that cell tower blows down, or what happens if I install a new cell tower over there. If I have a security breach in my big data center, how can that cascade across my entire cloud infrastructure? For that it's very useful to be able to build up a graph model, which of course fits very well in a graph database.

Master data management is customer 360 deployments: how do my customers connect to all my internal systems, which is very popular now in the world of GDPR, but also to external systems like social media? How do you track all that?

Knowledge graphs is an area that is becoming really, really popular for graph databases, and we'll spend a little bit more time on that later.

And then finally, identity and access management. Say I work at a big bank. How do I map from myself to all the various pieces of content and collateral that I have access to? Well, it depends on which groups I belong to, where I sit in the organization, which projects I have participated in in the past, and so on and so forth. How you manage that entire identity and how it connects to the rest of the organization is a very graphy, connected-data problem.

So those are popular use cases today in the enterprise for graph databases. It turns out that graph databases in many ways aren't as well known as, for example, Hadoop was in the early days of big data, or document databases like MongoDB. But actually, if you look at the popularity of graph databases, it's rising sharply today. This is a site called DB-Engines that maps buzz about database projects. It looks at things like LinkedIn skills, number of tweets (so again, feel free to tweet about this), number of job postings, Google searches and a number of signals like that, and it computes a score. And it turns out that graph databases have been the fastest-growing category in all of data
for several years now, which I think is pretty extraordinary. This speaks to the power of using connections in data and not just data in isolation. If you look just at our world of graph databases, and Neo4j specifically, which is the one where I have the most direct access to data of course, almost 80 of the Fortune 100 are using Neo4j today. And if you look by verticals, it's actually pretty astounding: 20 of the 25 biggest banks in the world, seven of the ten biggest retailers, four of the biggest telcos in the world. It's pretty astounding what's happened in the past couple of years.

If I take a step back, I've worked in the world of graph databases for 10, 15 years now. This is all I've done; this is my professional life's work. I really see two broad waves of graph adoption. The first wave was in the early 2000s, in the consumer web, where we saw a number of companies that entered a specific market and said: hey, I'm going to reorganize and reshape this business around connections in data. A stark example of this is Google. For the millennials in the room, you can't fathom that there was a world where Google was only one out of 10, 15 search engines, but that's actually what it was like in the late 90s. There were, I think, 15-plus search engines in the market when Google entered the space, and they all did exactly the same thing. They downloaded the entire web, which I think is an amazing accomplishment by the way, and then when you searched for something they looked inside each and every document to serve up relevant search hits. What Google did, of course, is the famous PageRank algorithm, where they said: hey, I'm going to do that, but on top of that I'm going to look at how things are connected, and I'm going to rank based on that. And you know how it
works: the more people link to a certain page, the more valuable it is, or the more relevant it's deemed to be. And then of course it's transitive, so if that page in turn links to someone else, that one gets a higher score. Eigenvector centrality is the technical term for it in the graph world, but it's much more popular under the name PageRank, as invented by Larry Page. So that was the key innovation of Google. They said: let's reshape my industry around connections in data.

LinkedIn did exactly the same thing. There were a number of different job search sites when LinkedIn came out, but they didn't say we're going to do exactly that. They said we're going to do that, plus we're going to map out how people are professionally connected: the graph. The same thing with PayPal. That was actually the key innovation that enabled PayPal: the fact that they had the transactions, how all the money flowed throughout their system. That was why they were able to do fraud detection, which was actually the key risk with making online payments work in the early 2000s. If you add up these companies that entered their markets and reshaped them around connections, you get over a trillion dollars' worth of market cap in the consumer web. That was the first wave of graph adoption.

This right here is the second wave of graph adoption, and it has been happening for the past three, four years, where graphs are being adopted inside the enterprise. These are not web-first companies, these are not technology-first companies; these are traditional industries that have started to use connected data to power their businesses. That's the second wave.

What we're going to talk about today is what I see as the third wave of graph adoption, and it's starting exactly right now. 2018, I think, we're going to look back on and say that was the key spark moment where graphs started being used in new ways. And of course that new way is graphs
in AI. So the question then is: what is the connection between graphs and AI? This is a graphic that I picked up somewhere, I actually don't know exactly where, it was somewhere in my social media feed, which described categories of AI. I thought it was a really good taxonomy of the various subfields. But of course what stood out to me, as the graph guy drinking a lot of my own Kool-Aid when it comes to graphs, was that the way they visually described all the different fields was as graphs. And this is not produced by us; someone else produced this. They just intuitively chose to describe the various subfields of AI using the graph metaphor. So there's something very intimate about graphs and AI, and I think in a nutshell the way to summarize it is that graphs provide context for AI.

Context ends up being a really important thing. I hadn't thought too much about that until a few years ago, but let's explore what context actually is. This is a picture of me. It's almost as good as my social media picture from 14 years ago that I'm still using. If you look at some data about me: my first name is Emil, my last name is Eifrem, age 40, gender male. That's some information about me, and now you're learning something about me. But what if I tell you how I relate to the world? I have two kids, and a baby actually, as of recently. I work at a company called Neo4j. I'm married to Madeline. I play the piano. I watch, this is supposed to be The Matrix, I don't know if that's visible. It's a huge coincidence, by the way, that the name of the company was initially Neo and that's also the main character from the Matrix movies; that's just purely a random thing. So I watch The Matrix, I was born in Sweden, and I actually don't drive a Tesla. This is fake news. I wish I drove a Tesla. I drive a Volvo, because Sweden, and also because they're a Neo4j customer and it's a great car. And all
of that is what gives me context. Now all of a sudden you understand me way more than from just the data that was attached purely to me. That context is what makes you understand me. As human beings, we learn unknown concepts by relating them to previously known concepts. That's how it works. That's all captured by knowledge graphs, which was a topic that I spent a lot of time on at Web Summit last year, and you can look up that talk. I'm not going to spend too much time on that today.

But then the question is: how can we apply that to machine learning? Today, basically, what your machine learning pipelines look like is that you take some kind of data records, you extract them into features, and you use those features to train your machine learning models. They look very much like this, like a row in a relational database. You train your models with that, and then those models make predictions in real time. But what we actually know about the real world is that it doesn't look like the row at the top here. The real world looks like this: it's connected. That is actually what's going on in the real world. And it turns out that if you're able to train your machine learning using connections, and not just how things look in isolation, you'll be able to predict much more.

I can say this with a high degree of confidence because there's been a lot of research about this. This is Professor James Fowler, who did a lot of research and wrote a book called Connected, where he said that if I look at not just data about you but at your graph, your friends and friends of friends, I will have a much higher degree of predictive power with that data than if I just use data in isolation. He proved that out with election data, he proved that out with non-smoking data, and that I
think is pretty mind-blowing. If you compare knowing everything about me with knowing nothing about me and just knowing my graph, in the latter case you're going to be able to predict a lot more. And today we're not using that in our machine learning. So we need to move our machine learning from this discrete feature extraction into connected feature extraction, and by and large that is what graphs allow you to do with AI: we move from feature extraction in isolation to connected feature extraction, where we can see how things are connected, which ultimately ends up producing much, much better predictions.

There are four broad areas where graphs touch AI. Last year I talked about knowledge graphs. This year we spent a little bit of time on connected feature extraction. There are two more areas that I think are really, really important. So that's broadly speaking how I look at how graphs interact with AI today.

I'll leave you with one final thing. I sometimes ask on stage what the best database in the world is, and people think that I'm looking for positive reinforcement and they say Neo4j. I actually think the best database in the world is this one: the human brain. And the human brain is ultimately structured as a graph, neurons connecting to other neurons. So use the best database in the world, use your human brain, but also use a graph database. If you think this is interesting, go to neo4j.com and download it. It's available for free. Thank you very much.
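
Editor's sketch (not part of the talk): the move from discrete to connected feature extraction described above could look roughly like the following in plain Python. Everything here is an assumption made for illustration: the names `people`, `friendships`, `build_adjacency` and `connected_features` are hypothetical, the data is invented, and no Neo4j API is used. A real pipeline would more likely compute such features inside a graph database.

```python
from collections import defaultdict

# Hypothetical toy records: one isolated "row" per person.
# The "smokes" flag stands in for the label one might want to predict.
people = {
    "ann": {"age": 34, "smokes": False},
    "bob": {"age": 41, "smokes": True},
    "cat": {"age": 29, "smokes": True},
    "dan": {"age": 52, "smokes": False},
}

# Hypothetical undirected friendship relationships between those people.
friendships = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]


def build_adjacency(edges):
    """Turn an edge list into a mapping from node to its set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def connected_features(person, records, adjacency):
    """Return the isolated feature of `person` plus two graph-derived ones."""
    friends = adjacency.get(person, set())
    smoking_friends = sum(1 for friend in friends if records[friend]["smokes"])
    return {
        # Discrete feature: taken from the person's own row only.
        "age": records[person]["age"],
        # Connected features: derived from the person's relationships.
        "friend_count": len(friends),
        "smoking_friend_ratio": smoking_friends / len(friends) if friends else 0.0,
    }


adjacency = build_adjacency(friendships)
features = {name: connected_features(name, people, adjacency) for name in people}
```

In this sketch the row for `ann` contains only her age, while the connected features add that she has two friends and that both of them smoke. That is the kind of extra signal the talk argues a model should be trained on.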
Info
Channel: Web Summit
Views: 25,119
Rating: 4.8333335 out of 5
Keywords: Web Summit, web summit lisbon, web summit conference lisbon, web summit paddy, web summit portugal, web summit portugal 2018, web summit video, web summit youtube, Web summit lisboa, Emil Eifrem, binate, binate.io, binate io, graph data, graph databases, database AI, Neo4j
Id: 2ZzGMzitNgo
Length: 23min 6sec (1386 seconds)
Published: Wed Nov 07 2018