An Introduction to Neo4j (Graph Database Tutorial for Beginners)

Captions
Hey, welcome back. In this video we're going to give an introduction to graph databases, and in particular Neo4j, which is one of the most popular open-source graph databases in existence today. We're going to get pretty hands-on: we'll show you how to install Neo4j, we'll create our own graph database, and we'll talk about a couple of the use cases you would particularly use Neo4j for. By the end of this you'll be able to pick up Neo4j yourself, create your own graph databases and solve your own business problems. The first thing we're going to do is install Neo4j on our local machine. For the sake of cleanliness I'm going to use Docker for the installation, and it's pretty simple: you just type docker pull neo4j and that will do the download. Now, because my machine runs an Apple Silicon M1 chip, you're going to see "no matching manifest for linux/arm64/v8", and that's because Neo4j doesn't (at the time of recording) have an arm64 image that works on Apple Silicon M1 chips. That's easily resolved: add --platform linux/amd64, Docker pulls the amd64 version instead, Apple Silicon runs it under emulation, and everything works fine in Docker. So if we run that, Docker does the download. You can see I've already got it, so that's fine, but if you haven't pulled it before it will take a few seconds. Then if I run docker images, you'll see Neo4j sitting on my machine: I've pulled the latest one, from 11 days ago, and it's roughly a 559 MB image. Now that I've pulled Neo4j onto my local machine I just need to run the Docker image, and if you look at my screen for a second you'll see the command is pretty simple: it's docker run, and
you have to specify the ports with -p, in this case 7474 and 7687, which are the two ports you need to expose to make it work (7474 for HTTP and the browser, 7687 for Bolt). You'll see I'm running it with -d so it runs as a background process, I'm setting the username and password to neo4j/test (the NEO4J_AUTH environment variable), and of course I'm using the latest neo4j image, the one I just pulled, so the command is along the lines of docker run -p 7474:7474 -p 7687:7687 -d --env NEO4J_AUTH=neo4j/test neo4j:latest. Because I'm running Apple Silicon I also have to add --platform linux/amd64, but if you're not on Apple Silicon you don't need that; you can just do a straight docker run. Now, this runs Neo4j as an ephemeral container: when I kill the container, any data I've saved into it is lost, because it's destroyed along with the container. If you want to persist your data, you can see here that you can pass your data, logs and plugins directories as command-line volume mounts (for example -v $HOME/neo4j/data:/data, with similar mounts for /logs and /plugins) so the data is stored somewhere on your machine. That way, when you start a container a second time, all the data lives outside the container and you can restart where you left off. In our case I'm just going to run it, and whenever my container dies I lose my data, which is fine for me, but choose whichever scenario works best for you. So I'll copy that command, paste it into my terminal and hit run, and that starts Neo4j on my local machine. Now that it's up and running, I can quickly check it in Docker by typing docker ps, and you'll see neo4j:latest running on my machine. If I open my browser and go to localhost:7474, you'll see I'm presented with the Neo4j connect screen, so Neo4j is running on my machine. For the username and password, if you remember, neo4j was the username and test was
the password. If I hit Connect and give it a second, it logs into Neo4j and we're up and running. If it's your first time using Neo4j, there are lots of really cool tutorials you can play around with: there are guides to Cypher, which is the query language (I'll talk about that in a second), there are built-in datasets such as a movie dataset, and there are guided tutorials, so if you want a guided tour of Neo4j you can do that. What we're going to do, though, is build our own Neo4j database for a real-world scenario. So the first thing is to start interacting with the graph database. Neo4j uses a language called Cypher, its query language for interacting with the graph, and we'll use a couple of its commands as we go. If you want to return all of the nodes in the graph database, you run the MATCH command, which is like a query: it says find me anything that matches my criteria. So I type MATCH (n), which essentially means I'm searching for any node and binding it to the variable n, so it will match anything, and then RETURN n to return the nodes I matched. If we hit return, you'll see there's nothing in my database at this point. So the first thing I'm going to do is create an empty node on the graph and then return it. It won't have a label at this stage; it's just going to be an empty node, so we type CREATE (n), and because I haven't specified a label it's just an empty node. If we hit return, you'll see it created one node in 51 milliseconds. Then if I type the command we used last time, MATCH (n) RETURN n, it will return the empty node I created two seconds ago, so if we hit return you'll see
it returns a node with no label and no relationships, and it's in my graph. If I want to create another node I can do the same thing: CREATE (n) one more time, and you'll see it's created another node. If I run the query again, you'll see I now have two nodes on my screen, so that's our first version up and running. If I want to delete these nodes, I can run MATCH (n) DELETE n; I'm saying find me nodes of any kind and delete every one you find. If I run that, you'll see it's deleted two nodes, the two I created before, and if I run the MATCH query again it says no changes, no records. So I've deleted those nodes. Now, following my social networking example from before, what I'm going to do next is create a person, that is, a node with the label Person. Before, when we created an empty node, it was just (n); now we add a colon followed by the label of the node we want to create, in this case Person, so CREATE (n:Person). If we run that as before, you'll see it's added one label and one node, so my node has the label Person. If I run MATCH (n) RETURN n again, you'll now see a node with the label Person: at the bottom it's colored pink, it's labeled Person, and it's got id 0.
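To see what these first commands do to the data without a running server, here is a toy, in-memory sketch in Python. It is not Neo4j and not its driver: the ToyGraph class, its create, match, relate and delete methods, and the property names are hypothetical and exist only for illustration, with each method's docstring naming the Cypher statement it imitates. It also covers the labels, properties, LIMIT and relationship commands that come next, plus the rule that a plain DELETE refuses to remove a node that still has relationships unless you use DETACH DELETE.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int
    labels: set[str]
    props: dict[str, object] = field(default_factory=dict)


class ToyGraph:
    """Toy in-memory property graph for illustration only; it is not Neo4j."""

    def __init__(self) -> None:
        self._ids = count()  # plain counter; Neo4j assigns and reuses ids itself
        self.nodes: dict[int, Node] = {}
        self.rels: list[tuple[int, str, int]] = []  # (start id, type, end id)

    def create(self, *labels: str, **props: object) -> Node:
        """Like CREATE (n), CREATE (n:Person) or CREATE (n:Person {name: 'Chris'})."""
        node = Node(next(self._ids), set(labels), dict(props))
        self.nodes[node.id] = node
        return node

    def match(self, label: str | None = None, limit: int | None = None) -> list[Node]:
        """Like MATCH (n) RETURN n, or MATCH (n:School) RETURN n LIMIT 1."""
        found = [n for n in self.nodes.values() if label is None or label in n.labels]
        return found if limit is None else found[:limit]

    def relate(self, start: Node, rel_type: str, end: Node) -> None:
        """Like CREATE (p)-[:STUDIED_AT]->(s) once p and s have been matched."""
        self.rels.append((start.id, rel_type, end.id))

    def delete(self, nodes: list[Node], detach: bool = False) -> int:
        """Like MATCH (n) DELETE n, or MATCH (n) DETACH DELETE n when detach=True."""
        ids = {n.id for n in nodes}
        attached = [r for r in self.rels if r[0] in ids or r[2] in ids]
        if attached and not detach:
            # Plain DELETE refuses nodes that still have relationships.
            raise ValueError("node still has relationships; delete them first or use DETACH DELETE")
        self.rels = [r for r in self.rels if r[0] not in ids and r[2] not in ids]
        for node_id in ids:
            del self.nodes[node_id]
        return len(ids)
```

For instance, after creating a Jenny Person node and an LSU School node and calling g.relate(jenny, "STUDIED_AT", lsu), g.delete(g.match()) raises an error while g.delete(g.match(), detach=True) empties the graph. Real Neo4j does all of this inside transactions and manages node ids itself, which this sketch does not try to reproduce.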
If I want to create another node, I just say CREATE (n:Person) again, and if we run the match again you'll see I've now got two nodes. Now that I've got some nodes on my graph, maybe I want to give them some properties. Let me delete those nodes again: we type MATCH (n) DELETE n, that deletes the two nodes, and I'm back to an empty graph. What I'm going to do now is add some properties to the nodes, as in our social media example. We say CREATE (n:Person ...) and give the person the name Chris. The way that works is I use curly brackets and say what the name is, in a format very similar to JSON, so {name: 'Chris'}, and I can add any other properties I want in there. Maybe favorite color, which could be blue... actually it was orange before, wasn't it? I keep changing my favorite color. If I run that, you'll see it's created, and if I run my match and look at the graph, you'll see the Chris person with favorite color orange and name Chris. Of course I can add more people: Jenny, with favorite color blue, and then Jemima, with favorite color green. If I run my match, you'll now see Jemima, Jenny and Chris, each with a different favorite color. Now that we're comfortable creating nodes, let's add a couple of other kinds of nodes to the graph. Maybe some of these people went to the same school or to different schools, so why don't we create a School node: we do a CREATE, give n the label School, and say the name of that
school is, let's say, LSU. So we create the LSU school, run it, and it's created one node and one label. Let me add another school as well, maybe Ohio State (I don't know why I typed it in capital letters, so let's fix that), and that runs too. If I run MATCH (n) RETURN n, you'll see I've now got LSU and Ohio State as well as Jemima, Jenny and Chris. Very cool: we've got five nodes on my graph, and as I create nodes with different labels, the browser gives them different colors so they're easy to identify. We can now run different kinds of queries as well. If I only want to see one kind of node, rather than returning all of them with MATCH (n) RETURN n, I can specify that in my query: MATCH (n:Person) RETURN n. If I run that, you'll see it only comes back with the Person nodes, and if I change it to say I only want schools, it returns only the schools, Ohio State and LSU. I can also limit my results. If I take MATCH (n) RETURN n as before but only want three or four nodes, I just add LIMIT and the number of nodes I want, and you can see it only returns four. I can combine that with a label too: if I only want schools and only want one of them, it returns just one. So the LIMIT clause is quite useful. Now that we can create nodes and labels, what we're going to do next is create some relationships in
our graph as well. To create a relationship, I first need to find the two nodes I want to relate, and I do that with the MATCH command again. I'm going to relate Jenny to the school LSU, so I type MATCH, this time using s as my variable for the School, because LSU is what I want back, and p for the Person: MATCH (s:School), (p:Person). I'm going to write a multi-line statement, so I press Shift+Return to get a new line. I only want to return the LSU school and I only want Jenny; I don't want Jemima, Ohio State or Chris to appear, so I need to introduce a WHERE clause. The MATCH asks for a school and a person, and the WHERE clause restricts it so it only returns the people and schools I want. So I type WHERE s.name = 'LSU', which means it won't return Ohio State, only LSU, AND p.name = 'Jenny'. Before I create the relationship, I could just return them in the graph if I wanted: if I do RETURN s, p and hit return, you'll see it returns Jenny and LSU and nothing else. Now we're going to create a relationship between Jenny and LSU, and the relationship is going to be STUDIED_AT, because Jenny studied at LSU. Our MATCH query already returns LSU and Jenny, so we keep the MATCH statement as before, but rather than returning the nodes we create the relationship. We do that by typing CREATE and an opening bracket again. I want the relationship to go from Jenny to the school, so I need p for person (in this case that's Jenny, because the query above restricts it to Jenny), and I'm going to have a
dash, a minus sign, so I'm creating something that's going to look like an arrow, as you'll see in a second. Then I put in some square brackets with the relationship type I want. I'm going to use stu as the variable name, and the relationship type is STUDIED_AT, so I type that in. To complete the relationship I put another dash at the other end, another minus sign, followed by a greater-than symbol, so as you can see it forms an arrow. So this is p, the person, then the arrow with STUDIED_AT in the middle, and the final thing I need is s for school: (p)-[stu:STUDIED_AT]->(s), p studied at s. If I hit play, it creates that relationship. Let's copy that for a second and close it, and if we run the match again you'll now see Jenny STUDIED_AT LSU, so we've created that relationship. Of course I can add more relationships: if I want Jemima to go to Ohio State, I just change the query to Jemima and Ohio State and run it again, and now you'll see that Jenny studied at LSU and Jemima studied at Ohio State. Then I could have Chris study at LSU as well; we run that, and now we've got two people who studied at LSU, Jenny and Chris, while Jemima studied at Ohio State. We're building up these relationships quite quickly; it's really good for networks, and it becomes really easy to visualize. I can extend this even further, so let's add a FRIEND relationship. This time we'll have two people, p1 and p2, both with the Person label, and we set p1 to Jenny and p2 to Jemima as before. And this time, rather than the
relationship being STUDIED_AT, we create a FRIEND relationship. That's created, and if we run the match you now see Jenny is friends with Jemima. I think Jemima should also be friends with Jenny, because it's weird not to be, so I'm going to modify that slightly and go from p2 to p1, which gives us a relationship in both directions. If we run that, there we go, that's nicer, isn't it? Jemima and Jenny are friends with each other: Jenny went to LSU and Jemima went to Ohio State, but they're friends. What we can do now is make Chris friends with Jenny: we set p1 to Chris and p2 to Jenny, run that, and now Chris is friends with Jenny. Let's make Jenny friends with Chris too, and we've got lots of lovely friendship relationships. We can glean from this that Chris studied at LSU, Jenny studied at LSU, they're friends, and Jenny is friends with Jemima, who studied at Ohio State. So if Chris wanted to meet Jemima, for example, he could talk to his friend Jenny, who is friends with Jemima. Another example: if Chris wanted to know somebody who went to Ohio State, you can see from the map that he could speak to Jenny, Jenny is friends with Jemima, and Jemima went to Ohio State, so you can find who in Chris's wider network went to a different school, in this case Ohio State. As these networks get larger and larger this becomes more powerful, you can start to form these connections, and there are some fantastic use cases. So I think we're done with this example, and we're going to do a more complicated example in
a second. First, I want to clear my graph database and make it empty. If I type MATCH (n) DELETE n again, we're going to see a problem, so let's run it: it says it cannot delete node 0 because it still has relationships, and to delete this node you must first delete its relationships. That's true: if you want to get rid of nodes, you have to get rid of their relationships first. So what I'd need to do is find each relationship, delete it, and then delete the node. But if I want to be super lazy, which I'm going to be in this case, I can type MATCH (n) DETACH DELETE n, which deletes the relationships as well. If I run that and hit play again, you can see everything's gone. Social networking is kind of interesting, but I wanted to run a more complicated example, so I thought it would be cool to take a supply chain example, in this case a software supply chain, and try to represent something like Docker, and even the Neo4j image itself, within a graph database. I've created one a little in advance, so we'll copy a couple of these things in and build up the graph as we go along. So we're going to start building up a representation of this Docker estate in my graph database. The first thing I'm going to do is create a container registry: if I paste this in, hit play and run the match, you'll see I've created a container registry node called Docker Hub. If you remember, that's where I got my Neo4j image from at the beginning: the Docker Hub container registry. The second thing I'm going to do is create an organization, because it's an official image, and Docker has a
representation where everything underneath the container registry has to be in an organization, and the Docker Hub organization for official images is represented as an underscore (_). So if we run the match again, we've now got a container organization and a container registry. The next thing we're going to do is create some repositories. In Docker, a repository is a storage area for a particular image, and the repositories we're going to create are hello-world (there's a hello-world Docker image), neo4j, which is what we installed earlier, and node, which is the Node.js image. I run that, and you'll see I've now got my hello-world, neo4j and node repositories, Docker Hub as my registry, and the official images organization, the underscore. Now we'll create some relationships. We want the container organization related to our container registry, so the underscore organization for official images is related to Docker Hub, and then I relate all of the repositories (neo4j, hello-world and node) to the official images organization. I'll copy all of this in, and it's exactly what we did earlier: we match registries and organizations and then create relationships between them. I run that, close it, hit play, and now you'll see that the official images organization, the underscore, is related to Docker Hub by a registry relationship, and the neo4j, hello-world and node repositories all sit under the official images organization. So
we're building up this software supply chain representation of Docker. What I'm going to do now is take the actual images, create them as nodes in my graph database, and relate them to their repository, starting with the Neo4j image we installed earlier. I'll paste that in, and you'll see the node label is container image, the digest property identifies exactly which image it is, the OS/architecture is linux/amd64, and another property is the size of the image. I've done the same for another container image here, so that gives you an idea of what these images look like. If I hit play, it creates those nodes and they appear in our graph, and because I haven't created any relationships yet they're just sitting there on their own. For a bit of fun, if I google "neo4j docker hub", click through and click View Available Tags, you'll see the digest, the OS/architecture (linux/amd64) and the size for latest, and if we come back to my machine, there you go: that's the actual digest, the same OS/architecture and the same image size, so it's a real representation of the Neo4j image. Now I'm going to relate these images to the correct repository. If we come back into VS Code for a second and I select these relationship statements, you'll see it's exactly what we did before: I'm matching images to repositories, saying which repository and which image I want to relate, and creating a relationship. In Docker these relationships are represented as tags, so I say it's a tag and put a property on that relationship, in this case 4.2.5, which represents the version, and of course I've also added a latest one. If I hit play, that's created all of that, and
then, if I look at that, you can see I've got tags from the neo4j repository to two digests: this digest is the Enterprise edition of 4.2.5, and this digest here is 4.2.5, which is also tagged latest. So I'm now building up a representation of my software supply chain: the very same Neo4j image you pulled earlier on your local machine is now represented in this identity graph, and that becomes really powerful, as we'll talk about in a second. The last thing I want to do: if I come back to Docker Hub, you'll see the Neo4j image was pushed 12 days ago by the doijanky user, so we want to represent that user in the graph too. Back in Neo4j, I create a user called doijanky and relate each image digest to it with a "pushed by" relationship, so for each image you can see which user did the push. We run that, run the match again, and now you can see these two images were both pushed by this one user. Why is that useful in a real-world scenario, though? If you think about it, as I build out my own software supply chain, my graphs could get really complicated: I could have more and more images in my supply chain, I could have my CI/CD pipelines represented, and I could build up a very detailed picture of what my Docker image estate looks like and how my CI/CD relates to it. That would actually allow you to discover potential problems in your architecture. For example, maybe you're not using Docker Hub as your container registry; perhaps you've got your own private registry, maybe GitHub Packages or Quay.io, and therefore in your architecture you don't want any images coming out of Docker Hub;
you want everything to be in your Quay.io. So you might have this graph representing every single image in your estate, and that's great: you can see it all relating back to Quay.io. But if somebody came along and pulled in something from Docker Hub, a public image, then, if you'd built up this identity graph and created it automatically, you'd be able to look at it and say, "Oh my goodness, why have I got images from Docker Hub? They're not meant to be there," and you'd find that very, very quickly. So that's a good way of looking at your supply chain. Another good example is typosquatting attacks. In a typosquatting attack, something is published under a deliberately misspelled name: say, rather than neo4j, there's a nefarious image called neo4k, and one of your developers installs it by accident. That repository would then appear on your graph: you'd see neo4j and all the containers that use it, but then you'd see this neo4k thing appear and go, "Oh my goodness, why is somebody using neo4k?" You can then have rules that stop these things appearing in your software supply chain, so that could be a really good use case. The last one is: what if a user becomes compromised, or a new user appears? In this case all of these images are published by the doijanky user, which is fine, and maybe your images are normally published by certain users, say chrishayuk, Fred, Jemima or Jenny. But say somebody who's got access publishes a Docker image as a different user, maybe called banana. You'd look at the graph and see all of these images pushed by doijanky or chrishayuk as normal, but then this banana user appears out of nowhere, and that would be easy to see, query and understand. So now that I've got quite a
complicated example of my Docker estate, I can start doing some really cool queries. Maybe I want to find only the container images tagged latest for neo4j. How would I do that? First I use the MATCH command as before, and in this case I want the container repository named neo4j, so we type that in. You'll see I'm not giving the container repository a variable, because I'm not going to do anything with it. This match only returns container repositories named neo4j, so even if I had lots of other repositories in there (you can see I've got node and hello-world), none of them would be returned; it only picks up the neo4j repository, and those two are ignored. Next, once I've got neo4j, I want the container image related to it by the latest tag. To do that I use the same arrow syntax we used when creating a relationship: a dash, then square brackets (again no variable for the relationship, because I'm not going to use it, but I set its name to latest), then another minus sign and a greater-than, so it forms that nice little arrow. That means anything related by a tag named latest from the container repository to a container image, and this time I do give that end a variable, image, because I'm going to use it later, with the container image label. Then I go multi-line and return my images, so the query ends up roughly as MATCH (:ContainerRepository {name: 'neo4j'})-[:TAG {name: 'latest'}]->(image:ContainerImage) RETURN image. If I hit play, you see it returns only one image, the one tagged latest.
Of course, I can do some really cool stuff there as well. Rather than looking for latest, if I want a specific version I can do that too: say I look for the tag with the name enterprise. If you remember, in this example this tag here has the name enterprise, so if I run that and hit play, you see it now returns the image whose digest starts with 646b, which is this one. It has followed that path and ignored the latest tag and the other image. So you can run some really quite complicated queries and traverse things very quickly using the Cypher syntax. And of course there are other use cases. You can imagine how this is used in banking for fraud, for example: you represent people in your graph database and payments as relationships, so if somebody new appears, perhaps a fraudulent account or a fraudulent person, or maybe you've just mistyped a payment to an account number you've never sent money to before, it shows up as a relationship that has never existed in the graph. It would be easy for the bank to detect that there's an issue, because it's a transaction you don't normally make, and ask, "Actually, did you mean to send money to that person?" It also becomes much easier to identify things like fraudulent accounts. You can imagine really complicated chains where money is sent from one person to another to another and then all funnels into a single account, maybe six or seven relationships deep: you transfer money to somebody, they transfer it to somebody else, and on to somebody else, and maybe you have 20 different accounts all sending money to one central account. It would be easy to find this in
something like Neo4j, because you can traverse those relationships and discover where money is moving many, many levels deep, so if you wanted to find things like money mules or money laundering, this technology really helps you discover that. Again, banks are using this kind of software and these kinds of techniques today to discover fraudulent transactions. So hopefully you found this useful, and I'm hoping you're imagining your own use cases in your own business. This technology is getting hot in areas like e-commerce; anywhere machine learning or recommendations come up, it's getting super hot as a technology, and I'm sure you're buzzing with your own ideas of how you could use it. Hopefully today you've got a pretty good idea of how to get started with a technology like Neo4j and start creating your own graph databases. In future videos I'm going to show you how to do things automatically, using things like Node.js to create graph databases from your code and build them up on the fly, and then I'll probably delve into other examples of how you could use Neo4j for some of the other use cases we've talked about. Anyway, I hope this has been useful, and speak soon.
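The money-mule pattern just described, many accounts feeding one account through several hops, is a variable-length traversal. The sketch below is an illustration under stated assumptions, not how any bank or Neo4j actually implements it: the Account label, the PAID relationship type, the number property, the six-hop limit and the threshold of 20 are all hypothetical. FUNNEL_QUERY shows roughly how such a check could be phrased in Cypher, and funnel_accounts does the same counting over a plain list of (payer, payee) pairs with a breadth-first search, so the logic can be checked without a database.

```python
from collections import defaultdict, deque

# Roughly how the same check could be phrased in Cypher. The Account label,
# PAID type, number property, hop limit and threshold are all hypothetical.
FUNNEL_QUERY = """
MATCH (src:Account)-[:PAID*1..6]->(hub:Account)
WHERE src <> hub
WITH hub, count(DISTINCT src) AS sources
WHERE sources >= $min_sources
RETURN hub.number AS account, sources
ORDER BY sources DESC
"""


def upstream_sources(payments, max_hops):
    """Map each account to the other accounts that reach it in 1..max_hops payments."""
    incoming = defaultdict(set)  # payee -> set of payers
    accounts = set()
    for payer, payee in payments:
        incoming[payee].add(payer)
        accounts.update((payer, payee))

    result = {}
    for hub in accounts:
        seen = set()
        frontier = deque([(hub, 0)])
        # Breadth-first search backwards along payments; BFS reaches each
        # account at its smallest hop count first, so the depth cut-off is exact.
        while frontier:
            account, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for payer in incoming[account]:
                if payer != hub and payer not in seen:
                    seen.add(payer)
                    frontier.append((payer, depth + 1))
        result[hub] = seen
    return result


def funnel_accounts(payments, max_hops=6, min_sources=20):
    """Accounts reached by at least min_sources distinct accounts, largest first."""
    counts = {hub: len(srcs) for hub, srcs in upstream_sources(payments, max_hops).items()}
    hits = [(hub, n) for hub, n in counts.items() if n >= min_sources]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))
```

Counting distinct upstream accounts rather than paths is deliberate: an account that reaches the hub along several routes should still count once, which is what count(DISTINCT src) does in the query.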
Info
Channel: Chris Hay
Views: 5,070
Rating: 5 out of 5
Keywords: graph databases, graph database, neo4j tutorial, cypher, neo4j tutorial cypher, chrishayuk, chris hay, neo4j, neo4j tutorial for beginners
Id: IShRYPsmiR8
Length: 36min 37sec (2197 seconds)
Published: Mon May 10 2021