My Favorite Graph Algorithms w/ Co-host David Meza

Video Statistics and Information

Captions
Hey folks, welcome to another edition of Reading Fun in the Sun. Today we are going to be talking about my favorite algorithms to use with graph data. I am going to be using a whole book dedicated to this; the authors use Apache Spark and Neo4j, but everything is applicable to whatever kind of graph data or graph solution you are using. I may also be joined today by a special guest who will share his favorite algorithms from the book. So without further ado, let's jump in. As with all Reading Fun in the Sun videos, there is a giveaway for the book we are reviewing today, so make sure you like, subscribe, and leave a comment below if you are interested in that. Even if you are not familiar with graphs or machine learning, I absolutely love this book. But before we get too much farther, let's hear from our special guest, David.

Hi everyone, David Meza here. I wanted to give you my overview of this book, Graph Algorithms. I think it's an excellent book for all levels within the graph data science community. It provides enough information for those who are starting off to understand what graph algorithms are and how to use them, but it also gives those who already have some knowledge, and want to dig a little deeper, places to look so they can get a good understanding of how those algorithms work. Why do I like this book? It gives you a good overview of all the different algorithms, it tells you when and how to use them, it gives you examples along with the formulas, and then it gives you a little taste of things to come as graph data science expands.

Now, David was kind enough to help me with the latter part of this book, so make sure you stay tuned. The very beginning starts out by explaining what a graph is, generally, and what a good tool for graphs looks like. Of course, this is written by folks from Neo4j, but they are pretty even-handed in their explanation and are not pushing their tool on you. Neo4j is free to play around with, so honestly it's worth taking a look for that reason alone. They also talk about Apache Spark. If you're not familiar with it, or not using it, that's totally fine; you don't have to use any of these algorithms with Spark. They use it as an example because the two play really well together, especially now that Apache Spark has native Neo4j compatibility, and there are also managed solutions for this on AWS. If you don't know what Neo4j and Apache Spark have to do with each other, read the book; the authors do a great job of explaining why those two are a match made in heaven for any machine learning project.

The book has a few central themes: pathfinding and search algorithms, centrality algorithms, community detection algorithms, some practical examples, and then how graphs enhance machine learning projects. I'm going to take one example from each of the main areas of algorithms covered in the book, and we'll see what our friend has to say about his favorites as well.

I am going to start with All Pairs Shortest Path. I talked about this in my recent 10 Minutes or Less video, "What Is an Idiom and How Do You Manage Them in Machine Learning." I find these algorithms really helpful when you're trying to figure out which features two apparently separate things have in common. If you look at all the paths between two different nodes, the algorithm calculates, based on the weighted paths, which nodes they have in common, or which path has the best weight connecting the two. Let's look at some examples. In the book, the authors look at shortest paths for transporting goods in an urban setting, taking traffic loads into account. What we'll look at here is: if you have, say, three different routes to a location, which gets you to your destination fastest? You could use plain Shortest Path, but you would have to run it on each of your options to understand which shortest path is the right one for you, whereas All Pairs Shortest Path considers all three and calculates which is best for your use case. After the distances are calculated you can filter out duplicates, and your use case determines how you weight the different paths to find the optimal route. Taking this a step farther, suppose you are trying to understand what features two dispersed things have in common. Say we want to find out why Heath Ledger and James Cameron are connected, or whether they're connected at all. This algorithm identifies the relationships and nodes that connect the two people, giving you a distance, or similarity score, between them, and from that you can find the most likely, i.e. shortest, path between the two entities.

Moving on to centrality algorithms: one of my favorites, which I use a lot in my day-to-day work, is Betweenness Centrality. It builds on the shortest paths we just discussed, but now looks for the hinges that connect dispersed things together. These could be bottlenecks: the one broker who has to do everything at an organization, or a facility that has to process everything before it can reach anything else. These are the nodes with the most critical connections, and those connections are often high risk. Whether it's a digital twin representing something in the real world, like a single facility that has to process all of your product leaving the United States, that's probably risky if the facility gets shut down for some reason. Looking at these bridges, or control points as the book calls them, also helps you make things more efficient: if adding another node, or in the real world another facility, would connect two different processing plants and shorten your time to market, that's a good thing to be able to identify. Or, in the network-influencer case described in the book, if you can identify who would be most influential, talking to that person may get your product out further than talking to someone without the same network position. It helps you understand impact and where to invest your time and resources. The way this is calculated: for a pair of nodes A and B, you find the shortest path, or paths, between them, and score the intermediate nodes that sit along those paths. Doing this over all pairs, the node that sits on the most shortest paths is the bridge. When many pairs can only reach each other through one particular node, that node has very strong betweenness centrality; when there are many alternative nodes and routes, each individual node scores lower. Once you understand those bridge nodes, you understand where there is (a) a risk, (b) an influencer, or (c) something you simply need to pay attention to, so that you don't end up with holes in your graph if something happens to that node.

Now I'm going to kick off the community detection part of this video, and then David will take it from there. Stepping into the community detection algorithms: you all know, if you've watched this channel enough, that I love a good clustering analysis, so I really like Triangle Count and Clustering Coefficient. Essentially, these are used for understanding tightly coupled triangles: one thing is related to another thing, which is related to a third thing, and together they form a triangle. When you look at the clustering, you're trying to figure out how dense a cluster is and how many triangles are actually represented within it. I use this all the time in search. Recommendation engines are constantly trying to figure out which things cluster together, which things are like-minded. If you're looking at a lot of pages or content specific to, say, cybersecurity and over-the-air updates in electric vehicles, and those topics are commonly associated with one another, there may be nothing linguistically that ties them together, but you can tell they have something in common, even if you don't know what those features are yet, because they co-occur so often. You can see this in some of the linking between different wiki entities. It's a really good way of building recommendation engines and doing disambiguation; these are some of my favorite algorithms to use. This also ties back to the bridge algorithms mentioned before, because if you find a node that connects a triangle of totally dispersed things, that may be an indicator that you're looking at a bridge node. The calculation works by counting how many of a node's neighbors are also neighbors of one another; that fraction tells you how closely knit the cluster is.

To start off, each section begins with an overview of all the algorithms within it, and one of my favorites is community detection. I do a lot of work in people analytics, and people analytics is basically a graph problem: we connect people not only to their work roles but to their skills, their projects, and many other things, and we need to see how they organize into communities. These algorithms help us do that. Within each chapter, the authors start with an overview giving the algorithm type, what it does, and an example, which is very good information for those who are starting off. Within community detection you'll see algorithms such as connected components, label propagation, and Louvain Modularity, which is one of my favorites and one I use a lot. It helps me look at skills adjacency and at how communities combine, so we can understand how people might move from one community to another across our people analytics work. Louvain, basically, finds communities in vast networks. The book tells you how you should use these algorithms and gives you plenty to get started with, but to me it provides more than that. For those who want to dig deeper, who already understand how an algorithm works but really want the math behind it, the book gives you the formulas behind these algorithms. And it doesn't stop there: it also gives you examples and use cases, and often cites the seminal papers behind the original algorithms. So if you really want to understand an algorithm, or even create algorithms of your own based on these, the book provides the papers and research to help you get started on that kind of work.

What I really like about this book is that once the authors have shown you all of these algorithms and how to use them in different examples, they give those who want to dig even deeper a taste of what comes next. The last two chapters show how to actually use graph algorithms in a couple of worked examples, and the way they walk you through them gives you a really good idea of how to apply the algorithms. In chapter eight they use graph algorithms to enhance machine learning. That is really the next step in the evolution of graph data science: using graph embeddings within graph databases and then passing that information on to neural nets. Those last two sections dig a little deeper, give you a taste of what's possible, and leave you excited to keep learning. Hope you liked the review. Ashleigh, again, I appreciate the time. I think I'm going to head back to the river now and enjoy my last few moments in the sun. Thank you everybody, and have a great night.

All right, I believe I have geeked out enough over algorithms and graphs for the day. Again, I really strongly recommend checking out this book. It has step-by-step directions for Python, for Neo4j, and for using all of this in Spark. And again, if you're not using Neo4j or Spark, you can still find a lot of value; I certainly have, and I hope you do too. With that, thank you very much, and I'll catch you next time.
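To make the all-pairs shortest path idea from the delivery-route discussion concrete, here is a minimal pure-Python sketch, not taken from the book (the book drives this through Neo4j/Spark procedures): Dijkstra's algorithm run from every node of a small made-up weighted route graph, so all candidate routes can be compared at once. The node names and weights are invented for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest-path distances on a weighted graph
    given as {node: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already found a better route
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def all_pairs_shortest_path(graph):
    """Run Dijkstra from every node: comparing every route in one
    result instead of re-running single-source queries by hand."""
    return {node: dijkstra(graph, node) for node in graph}

# Three candidate routes from A to D, weights as travel times.
routes = {
    "A": {"B": 4, "C": 2, "D": 10},  # the direct route is the slowest
    "B": {"D": 5},
    "C": {"D": 3},
    "D": {},
}
dist = all_pairs_shortest_path(routes)
print(dist["A"]["D"])  # best of the three options: A -> C -> D = 5
```

Once the table of distances is computed, filtering duplicates or re-weighting paths for a particular use case, as described above, is just post-processing on `dist`.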
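The bridge-node scoring described in the betweenness centrality section is typically computed with Brandes' algorithm. Below is a compact sketch for an unweighted, undirected graph; the toy graph of two clusters joined by a single broker node is invented for illustration, and a real project would use a graph library's built-in procedure rather than this hand-rolled version.

```python
from collections import deque, defaultdict

def betweenness(graph):
    """Brandes' betweenness centrality for an unweighted, undirected
    graph given as {node: set_of_neighbors}. A node's score counts
    the shortest paths between other pairs that pass through it."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, counting the number of shortest paths (sigma).
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(graph, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}

# Two tight clusters joined only through broker node "X".
g = {
    "A": {"B", "X"}, "B": {"A", "X"},
    "X": {"A", "B", "C", "D"},
    "C": {"D", "X"}, "D": {"C", "X"},
}
scores = betweenness(g)
print(max(scores, key=scores.get))  # "X": every cross-cluster path uses it
```

Here "X" is exactly the risky bridge discussed above: all four cross-cluster pairs can only reach each other through it, so it scores 4 while every other node scores 0.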
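A sketch of the triangle count and clustering coefficient idea, matching the "how many of a node's neighbors are also neighbors of one another" description above. The toy graph is invented: a tight trio plus one dangling node.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Local clustering coefficient: of all possible pairs among a
    node's neighbors, the fraction that are themselves connected,
    i.e. that close a triangle with the node."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no triangle possible
    triangles = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return triangles / (k * (k - 1) / 2)

# A tight trio (A, B, C) plus a dangling node D attached to C.
g = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(clustering_coefficient(g, "A"))  # 1.0: A's neighbors all know each other
print(clustering_coefficient(g, "C"))  # 1/3: only one of C's three neighbor pairs connects
```

A coefficient near 1.0 marks the dense, like-minded clusters a recommendation engine looks for; a node like C, whose neighbors mostly do not know each other, is closer to the bridge pattern mentioned above.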
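Louvain itself is a multi-level greedy optimization and too long to sketch here, but the modularity score it optimizes is short. Below is an illustrative computation of Newman modularity on a made-up "skills" graph (names and edges are invented), showing that a partition matching the real clusters scores higher than an arbitrary one, which is exactly the signal Louvain climbs.

```python
def modularity(graph, communities):
    """Newman modularity Q of a partition of an undirected graph
    {node: set_of_neighbors}. Louvain greedily moves nodes between
    communities to increase exactly this score."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    community_of = {v: i for i, c in enumerate(communities) for v in c}
    q = 0.0
    for v in graph:
        for w in graph:
            if community_of[v] != community_of[w]:
                continue
            a = 1 if w in graph[v] else 0  # actual edge vs. expected edge
            q += a - degree[v] * degree[w] / (2 * m)
    return q / (2 * m)

# Two tight skill clusters joined by one edge, as in a people graph.
g = {
    "ana": {"bo", "cy"}, "bo": {"ana", "cy"}, "cy": {"ana", "bo", "di"},
    "di": {"cy", "ed", "fi"}, "ed": {"di", "fi"}, "fi": {"di", "ed"},
}
good = [{"ana", "bo", "cy"}, {"di", "ed", "fi"}]   # matches the real clusters
bad = [{"ana", "di"}, {"bo", "ed"}, {"cy", "fi"}]  # arbitrary grouping
print(modularity(g, good) > modularity(g, bad))  # True
```

For the skills-adjacency use case described above, a community assignment with higher Q means people genuinely interact more within their assigned community than a random graph with the same degrees would predict.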
Info
Channel: Ashleigh Faith
Views: 238
Rating: 5 out of 5
Keywords: what is information architecture, Ashleigh Faith, search engine optimization, knowledge graph, knowldge graph, how to make an ontology, how to make a knowledge graph, Graph Algorithms: Practical Examples in Apache Spark & Neo4j, David Meza and Neo4j, graph algoithms, knowledge graph algorithms, embedded graphs, graph algorithm for search, graph algorithm for people networks, graph algorithm for community networks, graph algorithm for supply chain
Id: z-RCS2sD6kQ
Length: 14min 24sec (864 seconds)
Published: Mon Sep 06 2021