Hello, my name is Karin Wolok. I am the community manager at Neo4j. And over the last three-plus years that I've been working at Neo, I've realized there are still a
lot of software developers and data scientists that do not understand
what a graph database is. They kind of think they know
or they might have heard of it, but they really don't understand it. So I just want to take a couple minutes and explain it to you, because I think it's vitally important that anyone who works in any
kind of capacity with data should understand what it actually is, because it really is a game changer. So I'm gonna share my screen really quick. I'm gonna try to make this as
quick and easy as possible. I'm sharing my screen, working on it, I'm getting there, hold on. Okay, all right, so I'm sharing my screen. So this is an example of
the Neo4j browser right here. So one thing that I think
is really important to note, very important to understand,
is that Neo4j is a database. Okay? It is not a visualization that sits on top of another database; it is an ACID-compliant
transactional database. So that's something that's very important. Technically, it's in the NoSQL category, but it's very, very different from most normalized relational databases, in the sense that most
relational databases store data in the shape
of tables and joins. Neo4j stores the data in
the shape of a graph. And when I say graph,
I do not mean a chart, I mean, a graph theory
graph, like a network. So in the data model
inside of Neo4j, you have your nodes here. Your nodes are your
nouns: person, place, thing, location. These are your
nodes, they're your nodes here. And these nodes could
also have properties, which could be
like labeled right here, so those are your properties. And then you could have
relationships between those nodes, and in Neo4j relationships are actually first-class citizens, meaning that they're just as important as the nodes themselves. Like, how are things connected, right? So just like you could add
different types of nodes, you can also have different
types of relationships, you could put properties
in those relationships, you could put values, so
they could be weighted, you could have geospatial information, you could have date and time. So basically, whenever you have data that's complexly connected, and you wanna understand how these things are connected to each other, or maybe you wanna find
a shortest path, right, like shortest path from here to here, or maybe you're looking
for patterns in your data. Or maybe you're looking for something that's like a combination
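
To make the data model a bit more concrete, here is a minimal sketch in plain Python. It is not Neo4j and not how Neo4j stores anything; it only illustrates nodes with properties, typed relationships that carry their own properties, and a fewest-hops shortest path. All of the names and data (`alice`, `KNOWS`, `shortest_path`, and so on) are made up for illustration.

```python
from collections import deque

# Hypothetical nodes: each one has a label and some properties.
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "carol": {"label": "Person", "name": "Carol"},
    "berlin": {"label": "Place", "name": "Berlin"},
}

# Hypothetical relationships: (start, type, end, properties).
# Relationships have a direction, a type, and their own properties.
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("bob", "KNOWS", "carol", {"since": 2018}),
    ("carol", "LIVES_IN", "berlin", {"weight": 0.9}),
]


def neighbors(node_id):
    """Yield the nodes one hop away, ignoring relationship direction."""
    for start, _rel_type, end, _props in relationships:
        if start == node_id:
            yield end
        elif end == node_id:
            yield start


def shortest_path(source, target):
    """Breadth-first search: return a path with the fewest hops, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path("alice", "berlin"))
# prints ['alice', 'bob', 'carol', 'berlin']
```

This sketch scans the whole relationship list on every hop, which is fine for four nodes; a real graph database keeps each node's relationships close at hand so it does not have to.
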
of patterns, right? It might be like, if you're
doing fraud detection, it might not be one transaction
that sets off a flag, but it's a transaction
with these other patterns in behavior that might
be kind of raising the flag. Or if you're doing any
kind of graph algorithm type of analysis, whether it's community detection, or betweenness centrality and
PageRank, things like that, network-related style queries. Like, is the shape of your data
a network, or is it a table? So that's kind of
the big difference. So there are a few things that I think are really important to note about understanding Neo4j
and what makes it different. So for one thing, Neo4j is
a native graph database. And what that means is the
underlying architecture of how the data is actually stored is not built on top of tables. Okay, everything is built to support this type of data model, this highly connected data model. So when you're doing a query with most relational databases, you're doing an index lookup, and then
you make another hop and you're doing an index lookup there, and then there's another
join, and with all these joins you're always hitting an index. And that's very computationally expensive when you're doing a lot of hops. With Neo4j, when you do a query, you use an index to find your
initial starting point, and then from there, you're just basically chasing memory pointers, which the computer happens
to be pretty good at. So the benefit of that, not having to do an index lookup every
time you make a hop, is pretty powerful. The time per hop can be
pretty consistent whether you're doing one hop or 12 hops, which is a pretty powerful thing when you're hopping through a very, very highly
connected network. So that's one thing that's
very important to understand. And then the other thing
is also the query language. So you're probably used to SQL, because that's a pretty
standard query language. Problem is that when
you're working with graph databases and graph types of problems, SQL isn't gonna cut it, because SQL is not built
for highly connected data. So Neo4j actually developed a language. It's an open language, a lot of other companies are using it, and it's called Cypher. And Cypher is basically SQL for graphs. Where SQL is
kind of like, give me this, with Cypher you can be a
little bit more ambiguous. It's based off of pattern matching, like more networky kinds
of queries, which is really powerful. So Cypher is a declarative language, like SQL. And it's also based off of ASCII art, which makes it really
nice to be able to see because it looks like what
it actually represents. So just a high-level overview of it: your nodes here, in
yellow and blue, these here, they're represented by parentheses. And then these relationships, the directional
relationships between those nodes, are an arrow, literally an arrow, with brackets. It's
almost funny to look at it, you're like, "Oh, yeah, that makes sense." So here's another node, right? So it's like, we're looking for a company that develops a game. But there's also another relationship
here on this side where a company also publishes the game, where the company is Electronic Arts. Wow, right, I know, it's crazy. It's crazy. So I really like this blog post 'cause I think it kind of shows the power of the model in
general, like the data model, like being able to do a lot of these hops. But I think it also shows the power of having a query language that can help you look into networks and graphs and patterns and pathways. It's just very cool. So I'll show you another example. This one I also think is pretty powerful. So here is an example of a
video game recommendation. So you have, here is a video
game in the yellow node, that's Fallout 3, and here, whoops, too far. Yeah, you have Borderlands 2, right. And in the blue nodes,
they might be I don't know, like consoles that the game
is played on or whatever. And the green you have
different themes of the game, is it zombie and pirates and
plain and war or whatever. And remember too like these relationships, because they're first class citizens, you could also have values on them so they could be weighted. You could have a lot of pirates and a little bit of zombie or whatever. So that part is also something
you could take into account which can be pretty
powerful when you're trying to make queries based on weights. And then in red, you could have, like, how is the game played? Is it played by multiple players? Are you playing one
player, as a first person or with a mouse or a
joystick, or a keyboard? Like how is the game actually played? Now, here, if you have a user that likes both of these games here, and you wanna say, Okay, I wanna understand who my user is, or maybe like, find out
what these two things have in common, so I
could find another game that has the most in common
with these two games, right? Just general characteristics, like what do they have in common? If you were to do this in SQL, it would be a very extensive query, because of all the
different types of nodes that you have and the different
types of relationships. So with Cypher, it's actually
very, very straightforward, you might actually laugh
at how amazing it is. But so this is an example
of the Cypher query for this question. No, it's crazy, it's three lines. So what you have here is
your node in parentheses, and then here is where
your relationship would be. In this case, the
relationship is undefined. Any relationship, any direction, you could put a star six
in there or something, if you wanna look six
hops out or whatever. There are all kinds of different things you could do with Cypher, but here you're looking for the characteristics that have a relationship
with both of these two games. Damn, right? I know, it's amazing. So this is, I think, just
a really powerful example of something you could do with not just the data model, like being able to store the data and query it very quickly, but also the ability to use Cypher to help you find the things that are highly connected or distantly connected, those roughly
related kinds of problems. So, I know you're probably
already thinking about like, oh, where can I use this 'cause it does sound kind of interesting. I will tell you there's a lot of really cool use cases for it. The very standard ones
like recommendations, big one, fraud detection, like network and IT management, those are like the really big ones that kind of are frequently used. A lot of like NLP related stuff. Like even if you think about linguistics, like how we speak right,
that is all a graph. There's this thing that happens that we call the graph epiphany. Basically, when you start
seeing in graphs, you see graphs everywhere, you can't get rid of it, because everything's
dependent on something else. It's like all these like, intertwined connections of things. But there are a lot of
really, really cool use cases. It's actually probably
one of my favorite things about my job is hearing about all the interesting use cases of how people use graphs. In this case, this one is het.io. If you go to het.io, I'll go to their homepage here. This instance was
created by Daniel Himmelstein, who's a postdoc researcher at the
University of Pennsylvania. But you can play around with this stuff, he's got the ability
for you to explore. There's a Neo4j Browser thing, and there are guides that
kind of walk you through and tell you what you can do with it. We also have Neo4j Sandbox. I'd say it's probably one of the best places to start, just to kind of get
you thinking in graphs. So if you go to Neo4j.com/developer, there's the online Sandbox thing here. You don't have to download anything. There are pre-existing data sets, you can just jump in, you can follow the guide,
start playing around. And then once you're ready,
you can kind of dig in a little bit. There's, like, an intro to graph
databases YouTube series. And we have all kinds of stuff, like we have GraphAcademy,
where you can go learn self-paced, surreal stuff. So yeah, hopefully, you're
gonna thank me for this and not hate me for getting
you addicted to graphs. I will also make sure I mention this, because it does happen to people who are in the graph epiphany: they try to put graphs everywhere. Graphs don't belong everywhere,
even though they are everywhere. They belong where you have highly connected data problems. But that said, once
you're addicted to graphs, and you already have
these amazing use cases that you wanna share with
the rest of the world, then you can come to me. And then, we could do
something with the community. So yeah, hopefully you
enjoyed the session. Hopefully this was helpful. See you soon, bye.
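
For anyone who wants to poke at the game recommendation idea from earlier in the session, here is a rough Python sketch of the logic: find what the two liked games have in common, then rank other games by how many of those shared characteristics they have. This is not the Cypher query from the demo. The characteristics listed for each game, the candidate games (`Game A`, `Game B`, `Game C`), and the function names are all hypothetical and only there to show the shape of the idea.

```python
# Hypothetical data: every characteristic below is made up for illustration.
# Each entry stands for "game -[has characteristic]-> characteristic".
HAS_CHARACTERISTIC = {
    "Fallout 3": {"first person", "single player", "shooter", "open world"},
    "Borderlands 2": {"first person", "multiplayer", "shooter", "loot"},
    "Game A": {"first person", "shooter", "racing"},
    "Game B": {"puzzle", "single player"},
    "Game C": {"first person", "strategy"},
}


def shared_characteristics(game_a, game_b):
    """Characteristics connected to both games."""
    return HAS_CHARACTERISTIC[game_a] & HAS_CHARACTERISTIC[game_b]


def recommend(liked_a, liked_b):
    """Rank the other games by how many shared characteristics they have."""
    shared = shared_characteristics(liked_a, liked_b)
    scores = {
        game: len(traits & shared)
        for game, traits in HAS_CHARACTERISTIC.items()
        if game not in (liked_a, liked_b)
    }
    # Highest overlap first; ties are broken alphabetically by game name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(sorted(shared_characteristics("Fallout 3", "Borderlands 2")))
# prints ['first person', 'shooter']
print(recommend("Fallout 3", "Borderlands 2"))
# prints [('Game A', 2), ('Game C', 1), ('Game B', 0)]
```

Relationship weights, like a lot of pirates and a little bit of zombie, are left out here; one way to add them would be to store a number per characteristic and sum those instead of counting.
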