Neo4j Batch Insertion from CSV

Captions
Let's take a look now at what we can do if we have our data in CSV file format. You'll notice I actually have two files here: one for the airports and one for the flights. You can see in front of you that the airport file just has four airports and, for each airport, the city and the state. We could put other information in here. The idea is that each of those columns will become an attribute, or property, for each airport, so we could have things like the latitude and longitude or the size of the airport. Any kind of information you can imagine for an airport, we could make an attribute.

The other file that we have here is the flights. You can see we've got the flight number, the airline, the departure airport, the arrival airport, and the flight capacity for passengers. We're going to use this file three different times: once to create each flight, and then to create each of the two relationships for the flight.

So let's take a look at how we're going to do that. Again, I'll do this from the command line in the web admin tool, and what I'm going to do is use a Cypher query using the LOAD CSV format. I'll load the airports first. Here's the query that will load the airports, and watch out for newline characters in your queries when you're copying and pasting from a text file, as I am. It's going to be LOAD CSV, and then I tell Neo4j that my file has headers, so it's going to know how to look up those headers. Then I tell it the file location, so FROM, and here's where I've put the file on my machine. Then I can say AS airports. What that will do is read the CSV file in and create a new object for that CSV file called airports, and I can refer to that object now in my CREATE statement.

Once I get past that, I'm going to construct the basic syntax for a CREATE statement for each airport. You can see I'm going to say CREATE a1; that's the name of the node that I'm creating, so if I wanted to return it I could. Then I'm going to say give it a label of airport, and then, just like before, we're going to have three properties: the label, which is the three-letter code, and then the city and the state. Now notice what I do here. I say label, colon, and then I say take the airports object and look in the label column, so it'll say airports.label. Then for city it's going to take the airports object and look in the city column, and for state it's going to take airports.state, so it looks in the state column.

If we take a peek back at that CSV file, we'll see that what it'll do is iterate through each airport row. For the first airport, we're going to give it a label of DTW, the city is Detroit, and the state is Michigan, and it's going to fill those three values in. It looks them up dynamically and iterates through the entire file, in this case through four rows, and creates one node for each of those four airports. So when I hit Enter, what we'll see is: added four labels, created four nodes, set twelve properties (each airport had three properties), and it took 351 milliseconds. If I did MATCH n RETURN n, we should see four nodes, and sure enough, there they are. All four airports were created in one query.

Now we've got the airports. The next thing we need is the flights, so I'll create those using another LOAD CSV query. Here's the syntax for that: LOAD CSV WITH HEADERS, just as before, FROM, and now I tell it a different file, the one that has all the flights in it, and I'll say AS flights. So now when I want to refer to particular columns in the file, I refer to flights, dot, and then the column name. For the CREATE, here I'll just use n for the node. It's a node with a label of flight, so that's the type of node I'm creating. Then the properties that we'll assign: number will be flights.flight, so it'll look up the flight number; airline will be flights.airline, so it'll look up the flight's airline; and capacity will be flights.capacity, so it'll look up the flight capacity for each row. It'll iterate through all 24 rows. I'll go ahead and run this query, and we'll see it added 24 labels, created 24 nodes, and set 72 properties.

Now if we run our MATCH n RETURN n, we see a whole bunch of dots here. I can make this full screen. This is still being developed, so it's not a perfect interface, and not everything is fitting. If I made my browser window a little bit bigger I could make everything fit, but you can trust me: we can see two airports here, there are two more, and it looks like we can see about 18 of the 24 flights. If we had a larger window we would be able to see more. This browser is all still in development, but it's a nice way to view small graphs.

OK, so let me bring back my command line. I've created the airports, I've created the flights, and next up is to create the relationships. I'll do the arrivals and then the departures. Here's a query to create the arrival relationship. Again, I'm copying and pasting these in, so it's creating a new line there that I don't want. So LOAD CSV WITH HEADERS, as before, FROM, and this is the same CSV file I just used, but we're going to use it a different way now. So AS flights again, but now what I'm going to do is create relationships. I do MATCH, and then I say node a, which will be a flight type, but find the one where the number matches flights.flight. So for each row of the CSV file, it finds the node that matches the flight number. Then node b is going to be an airport node: find the one whose label matches the flights.arrive value, which is the arrival airport code in the CSV file. So it finds the node for the flight and the node for the arrival airport, and then it says create a relationship from a to b, and the relationship will have the label of arrives. It'll iterate through all 24, and we'll see that it says created 24 relationships. Again, I'm not asking it to return any rows, so it's returning zero rows.

All right, that's one. We now need to create the other set of relationships, which is to say the departure relationships. This is identical. The only changes are that instead of looking up flights.arrive I look up flights.depart, so I look up the departure airport, and the relationship that's created is one of type departs. Other than that it's identical. So when I run that query, again it says created 24 relationships.

Now if I do MATCH n RETURN n, this actually does seem to all fit on the screen, and you can see all of those relationships are in there. One thing I can do if I want is drag things around a little bit. You saw they're kind of floating around. Once I drag an object and drop it, though, it stays pinned, so now when I drag other objects you see the Pittsburgh airport isn't moving. I've told it that I want to lock those in. So I'm going to put the airports in the corners like this, and now I can very clearly see which flights go which way. The only other thing that's a little bit trickier to visualize is arrival versus departure for each flight, so again I could drag these up and down a little bit to clarify that. I can see now very clearly that flight 45 departs Boston and arrives in Pittsburgh, and flight 46 departs Pittsburgh and arrives in Boston. I could do that for all of the different flights if I wanted to. Here I'll do Detroit to Pittsburgh and vice versa, for instance, and that just lets me visualize it a little bit more clearly. So you can see we've got all 24 of our flights in now, we've got all four airports, and all of the relationships have been established. This data set is now complete in our database.
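
The queries themselves are only described aloud in the captions, so the following is a rough reconstruction of the four LOAD CSV statements rather than the presenter's exact text. The file URLs are placeholders, and the CSV header names (label, city, state, flight, airline, capacity, arrive, depart) and the casing of the labels and relationship types are assumptions inferred from how the columns are referred to in the walkthrough.

```cypher
// Sketch only. Assumed, not confirmed by the video: the file URLs,
// the exact CSV header names, and the casing of labels and relationship types.

// 1. One airport node per row of the airports file (4 rows in the video).
LOAD CSV WITH HEADERS FROM "file:///airports.csv" AS airports
CREATE (a1:Airport {label: airports.label,
                    city:  airports.city,
                    state: airports.state});

// 2. One flight node per row of the flights file (24 rows in the video).
LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS flights
CREATE (n:Flight {number:   flights.flight,
                  airline:  flights.airline,
                  capacity: flights.capacity});

// 3. Arrival relationships: same flights file, matched against existing nodes.
LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS flights
MATCH (a:Flight  {number: flights.flight}),
      (b:Airport {label:  flights.arrive})
CREATE (a)-[:ARRIVES]->(b);

// 4. Departure relationships: identical except for the column and the type.
LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS flights
MATCH (a:Flight  {number: flights.flight}),
      (b:Airport {label:  flights.depart})
CREATE (a)-[:DEPARTS]->(b);

// Inspect the result.
MATCH (n) RETURN n;
```

Each statement is meant to be run on its own, in the order shown, since the relationship queries can only match nodes that already exist. How a file URL resolves depends on the Neo4j version and configuration. LOAD CSV reads every field as a string, so a property such as capacity is stored as text unless it is converted to a number in the query.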
Info
Channel: mfschulte222
Views: 23,536
Rating: 5 out of 5
Keywords: Neo4j
Id: 1U6iUTV_Dco
Length: 8min 40sec (520 seconds)
Published: Sun Nov 09 2014