What is a graph database? (in 10 minutes)

Captions
Hello, my name is Karin Wolok. I am the community manager at Neo4j. Over the three-plus years that I've been working at Neo, I've realized there are still a lot of software developers and data scientists who do not understand what a graph database is. They kind of think they know, or they might have heard of it, but they really don't understand it. So I just want to take a couple of minutes and explain it to you, because I think it's vitally important that anyone who works with data in any capacity should understand what it actually is, because it really is a game changer.
So I'm gonna share my screen really quick. I'm gonna try to make this as quick and easy as possible. I'm sharing my screen, working on it, I'm getting there, hold on. Okay, all right, so I'm sharing my screen.
So this is an example of the Neo4j browser right here. One thing that I think is really important to understand is that Neo4j is a database. Okay? It is not a visualization that sits on another database; it is an ACID-compliant transactional database. So that's something that's very important. Technically, it's in the NoSQL category, but it's very, very different from most normalized relational databases, in the sense that most relational databases store data in the shape of tables and joins. Neo stores the data in the shape of a graph. And when I say graph, I do not mean a chart, I mean a graph-theory graph, like a network.
So the data model inside of Neo4j looks like this. You have your nodes here. Your nodes are your nouns: person, place, thing, location. These are your nodes. And these nodes can also have properties, like the ones labeled right here, so those are your properties. And then you can have relationships between those nodes, and in Neo4j relationships are actually first-class citizens, meaning that they're just as important as the nodes themselves.
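The data model described above (nodes with properties, and relationships that carry a type and properties of their own) can be sketched in a few lines of Python. This is only an illustration of the model, not how Neo4j stores data; the class names, the sample people and company, and the `since` and `weight` properties are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A node: a 'noun' with a label and key/value properties."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship is its own record: a type, a direction, and properties."""
    start: Node
    type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def describe(rel: Relationship) -> str:
    """Render a relationship in an arrow-style notation."""
    return (
        f"({rel.start.label} {rel.start.properties['name']})"
        f"-[:{rel.type}]->"
        f"({rel.end.label} {rel.end.properties['name']})"
    )


# Hypothetical sample data, invented for illustration.
alice = Node("Person", {"name": "Alice"})
acme = Node("Company", {"name": "Acme"})
works_at = Relationship(alice, "WORKS_AT", acme, {"since": 2018, "weight": 0.9})

print(describe(works_at))
```

The point of the sketch is that the relationship is a record in its own right, with its own properties, rather than something derived from a join.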
It's about how things are connected, right? So just like you can have different types of nodes, you can also have different types of relationships. You can put properties on those relationships, you can put values on them so they can be weighted, you can have geospatial information, you can have date and time.
So basically, whenever you have data that's complexly connected, and you wanna understand how these things are connected to each other, or maybe you wanna find a shortest path, right, like the shortest path from here to here, or maybe you're looking for patterns in your data. Or maybe you're looking for a combination of patterns, right? If you're doing fraud detection, it might not be one transaction that sets off a flag, but a transaction together with other patterns in behavior that raises the flag. Or if you're doing any kind of graph-algorithm analysis, whether that's community detection, or betweenness centrality and PageRank, things like that, network-style queries. The question is: is the shape of your data a network, or is it a table? So that's kind of the big difference.
So there are a few things that I think are really important to note about Neo4j and what makes it different. For one thing, Neo4j is a native graph database. What that means is that the underlying architecture of how the data is actually stored is not built on top of tables. Okay, everything is built to support this highly connected data model. So when you're doing a query with most relational databases, you do an index lookup, and then you make another hop and you do an index lookup there, and then another join, all these joins, and you're always going through an index. And that's very computationally expensive when you're doing a lot of hops.
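As a rough sketch of the shortest-path question mentioned above, here is a breadth-first search in Python over nodes that each hold direct references to their neighbors, so every hop follows a reference instead of going through a lookup table. This is a simplification for illustration, not Neo4j's storage format or its shortest-path implementation; the `Node` class, `connect` helper, and the sample graph are hypothetical.

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to adjacent Node objects


def connect(a, b):
    """Add an undirected connection between two nodes."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def shortest_path(start, goal):
    """Breadth-first search: names along a fewest-hops path, or None."""
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current is goal:
            path = []
            while current is not None:
                path.append(current.name)
                current = previous[current]
            return path[::-1]
        for neighbor in current.neighbors:  # each hop follows a reference
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical graph: a-b-c-d is three hops, a-e-d is two hops.
a, b, c, d, e, f = (Node(n) for n in "abcdef")
for left, right in [(a, b), (b, c), (c, d), (a, e), (e, d)]:
    connect(left, right)

print(shortest_path(a, d))
```

Node `f` is left unconnected on purpose, so `shortest_path(a, f)` returns `None`.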
With Neo4j, when you do a query, you use an index to find your initial starting point, and then from there, you're just basically chasing memory pointers, which the computer happens to be pretty good at. So the benefit of not having to go through an index every time you make a hop is pretty powerful. The cost of each hop can stay pretty consistent whether you're doing one hop or 12 hops, which is a pretty powerful thing when you're hopping through a very, very highly connected network. So that's one thing that's very important to understand.
And then the other thing is the query language. You're probably used to SQL, because that's a pretty standard query language. The problem is that when you're working with graph databases and graph-type problems, SQL isn't gonna cut it, because SQL is not built for highly connected data. So Neo4j actually developed a language. It's an open language, a lot of other companies are using it, and it's called Cypher. Cypher is basically SQL for graphs. Where SQL is kind of like "give me this", with Cypher you can be a little bit more ambiguous. It's based on pattern matching, more networky kinds of queries, which is really powerful. So Cypher is a declarative language. And it's also based on ASCII art, which makes it really nice to read, because it looks like what it actually represents.
So just a high-level overview of it: your nodes here, in yellow and blue, are represented by parentheses. And then the directional relationships between those nodes are an arrow, literally an arrow, with brackets. It's almost funny to look at it, you're like, "Oh, yeah, that makes sense." So here's another node, right? We're looking for a company that develops a game. But there's another relationship here on this side, where a company also publishes the game, and where the company is Electronic Arts. Wow, right? I know, it's crazy.
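To make the ASCII-art idea concrete, here is a sketch of what a pattern like the one described might look like in Cypher, held in a Python string, followed by a tiny in-memory version of the same match. The labels (`Company`, `Game`), relationship types (`DEVELOPS`, `PUBLISHES`), the `name` property, and all the sample studios and games are assumptions for illustration; the query shown on screen in the talk may differ.

```python
# Illustrative Cypher text only; labels, relationship types, and the
# `name` property are assumed, not taken from the talk's slide.
CYPHER_PATTERN = """
MATCH (dev:Company)-[:DEVELOPS]->(g:Game)<-[:PUBLISHES]-(pub:Company {name: 'Electronic Arts'})
RETURN dev.name, g.name
"""

# Hypothetical edge list: (start node, relationship type, end node).
EDGES = [
    ("Studio A", "DEVELOPS", "Game X"),
    ("Electronic Arts", "PUBLISHES", "Game X"),
    ("Studio B", "DEVELOPS", "Game Y"),
    ("Other Publisher", "PUBLISHES", "Game Y"),
    ("Electronic Arts", "DEVELOPS", "Game Z"),
    ("Electronic Arts", "PUBLISHES", "Game Z"),
]


def developers_of_games_published_by(publisher, edges):
    """Return sorted (developer, game) pairs for games the publisher publishes."""
    published = {
        end for start, rel, end in edges
        if rel == "PUBLISHES" and start == publisher
    }
    return sorted(
        (start, end) for start, rel, end in edges
        if rel == "DEVELOPS" and end in published
    )


print(developers_of_games_published_by("Electronic Arts", EDGES))
```

In the Cypher text, parentheses stand for nodes and the bracketed arrows stand for directed relationships, which is the "looks like what it represents" point made above. The Python function does the same two-step match by hand: find the games the publisher publishes, then find who develops them.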
So I really like this blog post, 'cause I think it shows the power of the data model in general, like being able to do a lot of these hops. But I think it also shows the power of having a query language that can help you look into networks and graphs and patterns and pathways. It's just very cool.
So I'll show you another example. This one I also think is pretty powerful. Here is an example of a video game recommendation. So here is a video game in the yellow node, that's Fallout 3, and here, whoops, too far. Yeah, you have Borderlands 2, right. The blue nodes might be, I don't know, the consoles that the game is played on or whatever. In green you have different themes of the game: is it zombies and pirates and planes and war, or whatever. And remember too that these relationships, because they're first-class citizens, can also have values on them, so they can be weighted. You could have a lot of pirates and a little bit of zombie or whatever. So that's also something you can take into account, which can be pretty powerful when you're trying to make queries based on weights. And then in red, you have how the game is played. Is it multiplayer? Are you playing single player, in first person, or with a mouse or a joystick or a keyboard? Like, how is the game actually played?
Now, say you have a user that likes both of these games here, and you wanna say, okay, I wanna understand who my user is, or maybe find out what these two things have in common, so I can find another game that has the most in common with these two games, right? Just general characteristics: what do they have in common? If you were to do this in SQL, it would be a very extensive query, because of all the different types of nodes that you have and the different types of relationships.
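The recommendation idea above (find what two liked games have in common, then find the other game that shares the most with them) can be sketched in plain Python. Everything in the data below is made up for illustration: the characteristics, the weights on the relationships, and the candidate games "Game A" and "Game B" are hypothetical, and this is not the query shown in the talk.

```python
# Hypothetical data: game -> {characteristic: weight on the relationship}.
GAMES = {
    "Fallout 3": {"first-person": 1.0, "single-player": 1.0, "post-apocalyptic": 0.9},
    "Borderlands 2": {"first-person": 1.0, "multiplayer": 0.8, "post-apocalyptic": 0.6},
    "Game A": {"first-person": 1.0, "post-apocalyptic": 0.7, "zombies": 0.3},
    "Game B": {"pirates": 0.9, "multiplayer": 0.5},
}


def shared_characteristics(first, second, games):
    """Characteristics connected to both games, sorted by name."""
    return sorted(set(games[first]) & set(games[second]))


def recommend(liked, games):
    """Rank the other games by the total weight they put on the shared characteristics."""
    shared = set.intersection(*(set(games[name]) for name in liked))
    scores = {
        name: sum(weight for trait, weight in traits.items() if trait in shared)
        for name, traits in games.items()
        if name not in liked
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


liked = ["Fallout 3", "Borderlands 2"]
print(shared_characteristics(*liked, GAMES))
print(recommend(liked, GAMES))
```

Using the relationship weights in the score is one possible design choice; counting shared characteristics without weights would be an equally simple alternative.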
So with Cypher, it's actually very, very straightforward. You might actually laugh at how amazing it is. So this is an example of the Cypher query for this. No, it's crazy, it's three lines. So here is your node in parentheses, and then here is where your relationship would be. In this case, the relationship is undefined: any relationship, any direction. You could put a star six in there or something, if you wanna look six hops out or whatever. There are all kinds of different things you can do with Cypher, but here you're looking for a characteristic that is related to both of these games. Damn, right? I know, it's amazing.
So I think this is just a really powerful example of what you can do, not just with the data model, like being able to store the data and query it very quickly, but also with Cypher, to help you find the things that are highly connected or distantly connected, those roughly related kinds of problems.
So, I know you're probably already thinking, "Oh, where can I use this?", 'cause it does sound kind of interesting. I will tell you there are a lot of really cool use cases for it. The very standard ones are recommendations, that's a big one, fraud detection, and network and IT management. Those are the really big ones that are frequently used. A lot of NLP-related stuff, too. Even if you think about linguistics, like how we speak, right, that is all a graph. There's this thing that happens that we call the graph epiphany. Basically, once you start seeing in graphs, you see graphs everywhere, and you can't get rid of it, because everything is dependent on something else. It's all these intertwined connections of things. But there are a lot of really, really cool use cases. It's actually probably one of my favorite things about my job, hearing about all the interesting ways people use graphs.
In this case, this one is het.io. If you go to het.io, I'll go to their homepage here. So, het.io. This instance was created by Daniel Himmelstein, who's a postdoc researcher at the University of Pennsylvania. You can play around with this stuff: he's set it up so you can explore, there's a Neo4j browser and there are guides that walk you through and tell you what you can do with it.
We also have Neo4j Sandbox. I'd say it's probably one of the best places to start, just to get you thinking in graphs. So if you go to Neo4j.com/developer, there's the online Sandbox here. You don't have to download anything. There are pre-existing data sets, so you can just jump in, follow the guide, and start playing around. And then once you're ready, you can dig a little bit deeper. There's an intro to graph databases YouTube series. And we have GraphAcademy, where you can go learn self-paced, surreal stuff.
So yeah, hopefully you're gonna thank me for this and not hate me for getting you addicted to graphs. I will also make sure I mention this, because it does happen to people who are in the graph epiphany: they try to put graphs everywhere. Graphs are everywhere, but they don't belong everywhere; they're for highly connected data problems. But that said, once you're addicted to graphs, and you already have these amazing use cases that you wanna share with the rest of the world, then you can come to me, and we could do something with the community. So yeah, hopefully you enjoyed the session. Hopefully this was helpful. See you soon, bye.
Info
Channel: Neo4j
Views: 74,778
Rating: 4.8615918 out of 5
Keywords: neo4j, graph databases, graphs, nosql
Id: REVkXVxvMQE
Length: 10min 58sec (658 seconds)
Published: Tue Jun 09 2020