Improving News Recommendations With Cloudflare Workers & Knowledge Graphs

[Music]

There are many types of databases today: the tried and tested relational database, the new and popular document DB, and many others. All of these have their strengths, but one place they all fall short is in handling complex connections between your data. Sure, any database may be able to return friend-of-a-friend queries of three or four degrees, but what if you need 20 degrees, and need it quickly? That's where Neo4j comes in. Unlike other databases, the Neo4j graph database doesn't need to compute the relationships between your data at query time. The connections are already there, stored right in the database. Because of this, queries of deeply connected data are orders of magnitude faster. We get used to the limitations of the tools we work with. Neo4j blows those limits wide open, enabling queries never before imagined at speeds never thought possible. That's why Neo4j has become a key technology driving business-critical applications for hundreds of Fortune 500 companies, government agencies and NGOs. Learn more at neo4j.com.

[Music]

A property graph database is a data storage system that treats relationships between data entities with equal importance to the data itself. In comparison, relational databases use table structures to store data, requiring mapping tables and join queries to connect the data entities together. In essence, a relational database does joins on read, whereas a graph database does joins on write. This paradigm shift in data storage allows us to switch from asking "is my data connected?" to "how is my data connected?", enabling us to make exciting discoveries. So what does this mean? It means that if you've got highly connected data questions, such as detecting fraud in a banking system, predicting network outages, or just finding the fastest route from A to B, a graph database answers those questions quickly, efficiently and easily. Curious to find out more about graph databases? Why not give one of our free, no-download guided sandboxes a test drive?

Your machine
learning models are working. You're seeing results, but you know there's more for you to uncover. What if you could use the network structure within your data without disrupting your machine learning pipeline? With graph data science, you can. Here's how it works: you load your data into the Neo4j graph database, which reveals the connections in your data. Then you ask questions using graph queries and uncover hidden patterns in your data using graph algorithms.

[Music]

For example, similarity and community detection algorithms examine the network structures in your data to uncover tightly knit communities, such as fraud rings. Then you can use graph feature engineering to extract predictive elements to augment your machine learning models.

[Music]

Now you have the power to make better predictions from existing data. To learn more about graph data science, visit neo4j.com.

[Music]

Neo4j graph database empowers developers by providing them with full control over how their applications interact with the database, including robust data pipelines, machine learning and AI, streaming data from sources like Kafka, and more. Neo4j's high-performance distributed cluster architecture scales with your data and your business needs in real-world situations. Neo4j 4.4, the latest release of Neo4j graph database, comes with significant feature additions. This new Cypher clause provides a massive gain in data processing and reduces memory requirements when importing very large data sets, by enabling developers to start one or more transactions from within a transaction. Connect to Neo4j using a cloud-native API and simplified routing, without middleware or use of language drivers. Scale, swap and upgrade graph applications without any downtime with remote database and database alias commands. Achieve significantly faster text-based matches and benefit from data import and runtime queries. Take advantage of cutting-edge Apple silicon using Docker images with official support for ARM. Users can
now authenticate with cloud-based identity providers. With Neo4j you can achieve a robust transactional guarantee and high performance across trillions of relationships and millions of hops per second. Run Neo4j anywhere, supporting your hybrid, private cloud, lift-and-shift or cloud-native environment needs. To learn more about the other rich features added to Neo4j 4.4, log on to neo4j.com, "what's new 4-4".

Are you interested in graph databases but don't know where to begin? Neo4j Sandbox is the best way to get started, with interesting pre-loaded data sets like movies, crime investigation and the Panama Papers. It's easy to jump right in: follow our guides and explore code examples in Python, JavaScript, Java, .NET, Go and GraphQL. There's no download required, and you can get up and running in less than a minute. Try for free today at neo4j.com/sandbox.

Neo4j is the number one database for connected data. Built from the ground up to leverage not only data but also data relationships, Neo4j is a native graph database, which makes it critically different from other data stores. But what does native even mean? Neo4j is designed around a simple yet powerful optimization: each data record, or node, contains direct pointers to all the nodes that it's connected to. These direct pointers are called relationships. All the information needed to find the next node in a sequence is available in the node itself. The native storage layer is a connected graph; that's what native means. Because of this principle, Neo4j doesn't need to compute the relationships between your data at query time. The connections are already there, stored right in the database. Because of this, queries of densely connected data are orders of magnitude faster. Other databases don't save direct pointers between records. They need to compute connections by searching through a separate data structure called an index. This lookup process, which has to be done repeatedly to find each connection, is extremely expensive and gets exponentially slower as the
data size and query complexity grows. This makes them inherently slower than Neo4j for relationship-intensive queries. Over the past decade, Neo4j has made graph databases a key technology powering modern applications for hundreds of Fortune 500 companies, government agencies and NGOs. Neo4j is used to detect fraud, enhance artificial intelligence, manage supply chains, gain a 360-degree view of data, and much more. Learn more at neo4j.com.

Graph databases have been the fastest-growing database category for the past decade. The reason for this is simple: graph databases are highly optimized to understand and work with the complex relationships between your data. Unlike other kinds of databases, a native graph database doesn't need to search for data connections at query time. Those relationships are already there, stored right in the database. Because of this, queries of large, complex connected data are orders of magnitude faster. A graph database doesn't necessarily need to replace your current database. By adding graph capabilities to your current infrastructure, you can keep the technology you already have but greatly improve it with the power to uncover rich data connections, reduce time to market, and run faster queries. Taking advantage of the connections that already exist in your data enables you to get ahead of the competition. Every business needs to leverage data relationships, and leverage them faster and more efficiently. Graph databases deliver those capabilities. The speed and efficiency advantage of graph databases has driven new real-time applications in fraud detection, customer 360, machine learning and knowledge graphs, just to name a few. Because of this, graph databases have become a key technology creating competitive advantage for hundreds of Fortune 500 companies, government agencies and NGOs. Learn more at graphdatabases.com.

[Applause] [Music]

Hello, good morning, good evening, good afternoon to everybody. Thank you for joining. Hi Will.

Good morning. It is morning
where I am, yeah.

Hey everyone, thank you for joining. Today we have a special session, so I'm looking forward to that. It's going to be an interesting topic that I admittedly don't have much idea about, but I'm excited. The topic is improving news recommendations with Cloudflare Workers and knowledge graphs. Imagine I don't know what a Cloudflare Worker is: could you sum this up in a short little teaser of what's going to happen today?

Yeah, for sure. So I work on the developer relations team, and part of my job is to talk to users and customers and see what technologies they're interested in using with Neo4j. A lot of times this is different pieces of internet infrastructure, right? How do we build and deploy our applications? Because Neo4j is really just one piece of the architecture of larger applications, and Cloudflare Workers is something that's been coming up more and more. I think of Cloudflare Workers as kind of the next evolution of serverless. We've heard of Lambda functions; Cloudflare Workers are, I think, the next generation of serverless, which allows us to really get our content to users faster, basically.

Okay, cool, very exciting, very interesting. I have a couple of people in chat already, so a shout-out. Thank you for joining, Mintaka; I didn't see where from, so at least good morning to you. Good morning to "formation" from Graphica, that's a cool name. And no, he replied himself: he's from Nigeria, and it's evening where he's watching, so good evening to you. I think we're going to dive right in. You already put the link to the slides, or the deck, in chat, so that will be in the description of the video as well for later on. Obviously this video is going to be available in the future, so if you are watching live but
have to drop out at some point, don't worry: this will be available on demand on YouTube. I hope you keep it lively. Again, we're watching chat, so if you have any questions, Will is here and I'm around as well. We're always interested in what you want to learn or what you think about it, so keep it going in chat. But with that, I think

[Music]

I'm ready to hand it over to you, Will.

Cool, great, let's do it. It always amazes me on these kinds of streams and presentations to see all the different places in the world that people are joining from. It's super cool. So as Alex said, the link to the slides is on the screen, and it's also pasted in the chat: dev.neo4j.com/news-graph. There are links to all the code and examples and things like that in the slides too, so if you're interested in getting that kind of stuff, definitely check out the slides.

So what we want to talk about today is improving news recommendations with knowledge graphs and Cloudflare Workers. We're talking about news recommendations in news apps, like when I'm reading the New York Times, the Washington Post, whatever. There's a ton of content out there. Newspapers don't really come, you know, as the A1 with the most important news, the business section and so on, on dead trees that the paper boy throws over the fence. Instead we have lots of different ways of surfacing relevant content for users, and that's what we are going to talk about today: specifically, how knowledge graphs with Neo4j can improve this process, and then how we can use Cloudflare Workers to build a news recommendation endpoint. And as Alex said, if you have any questions or comments, drop those in the chat on whatever platform you're watching on. I'm monitoring them all right here, so definitely let us know.

So my name is Will. I work on the developer relations team at Neo4j. Part of my job is to talk to
users and customers and see how they're using Neo4j, but also what technologies they're interested in using with Neo4j, and Cloudflare Workers is one of those that has been coming up more and more. So we're going to take a look at how we can use Cloudflare Workers with Neo4j. The best way to get hold of me is probably on Twitter. I've got my Twitter handle up there, lyonwj, and also a link to my blog, where I also publish a newsletter, at lyonwj.com. If you like podcasts, I also co-host the GraphStuff.FM podcast, where we talk about building things with graph technology, so if you like that format, definitely check out the GraphStuff.FM podcast.

So Neo4j is a graph database. I think some folks are familiar with Neo4j. We use a property graph data model, which we'll talk about, and we use a query language called Cypher, which we'll also talk a bit more about today. You can see an example of Cypher in the upper right there: we're drawing these ASCII-art representations of a graph pattern, and then it's up to the database to figure out where those patterns exist in the graph.

As a database, Neo4j a lot of times sits at the center of the architecture of our applications. This diagram, I think, does a good job of showing that there's really a spectrum, from graph analytics on the right, where we're interested in more data science and analytics type applications, to the left, where we're building applications and we're interested maybe more in how do we build the API layer for my application, how do I support more transactional or operational workloads. So this is a spectrum, and what we're talking about today I think kind of falls in the middle, because we are going to be talking about some analytics and graph data science, but at the same time our goal is to build this news recommendation endpoint using Cloudflare Workers. So we're really talking about how we operationalize some of the graph data science and
graph analytics that we're doing.

So let's talk a bit about personalization and recommendations in general, and specifically in the area of news applications. Here are some screenshots from some of the different news apps that I use on my phone, and specifically these are the recommendation piece: the "For You" recommendations that are based either on categories that I've explicitly said I'm interested in, or on my viewing history. These apps know what news articles I've been reading and are suggesting content that I may be interested in, based on that history and on what these apps know about me.

I think that the news area is particularly challenging for personalization. We see personalization and recommendations everywhere, right? You see this on e-commerce sites: here are products you may be interested in purchasing, this kind of thing. But I think it's a particular challenge for news applications because there's so much content out there. There's so much news going on every day, and our interests as people are really very diverse. We're all in different locations, so I care about local news to some degree. Some people are really interested in politics; some people don't want to hear anything about politics; some people are really interested in science, and so on. So it's really difficult, I think, to figure out what exactly a user is interested in reading.

This is really acute for news applications as well because of notifications. When there's some breaking news event that's important to us, we want to get that notification. But it's a fine line for the news app to balance, because if they're bombarding us with notifications, then I'm going to turn those off, I'm going to delete the app and not pay attention to it anymore. So it's a nuanced balance for these news apps to get right.

So knowledge graphs can be used for personalization and recommendations, so
let's talk a little bit about knowledge graphs. But before we get into knowledge graphs, let's back up a minute: what is a graph? Well, a graph is a data structure made up of nodes, which are the entities in the graph, and relationships, which connect the nodes. With graph databases like Neo4j we use a data model called the property graph. Every node has one or more labels. The labels are a way to group nodes; you can think of labels as kind of like a table from the relational model, or kind of like a collection from the document model, just a way to identify what the type of this thing is. Relationships have a single type and a direction. And it's called the property graph model because we can store arbitrary key-value pair properties on nodes and relationships.

So then what's a knowledge graph? Well, a knowledge graph is basically an implementation of a property graph that is, I like to say, putting things in context. And specifically "things", because we know the thing is an entity. If we're dealing with news articles, we know that it is an article, and we know that the article is about a topic or mentions a person. So we know the type of the thing, and we know some information about the thing; we have some properties. For an article we might have the title and the URL. For a person we might have a name and a description of why that person is relevant. A geo-region has a name, and it also might have a latitude and longitude. So we have some information about the things, and the knowledge graph puts those things in context: we know the relationships that are connecting these nodes. So I know that this article at the bottom is about the United States economy and US politics, I know other articles that are about these same topics, and so on.

In, I think it was 2012, Google released the Google Knowledge Graph
API and wrote this blog post, "Things, not strings", that was introducing the concept of the knowledge graph and explaining that when we're searching on Google, we're searching for a thing, and using the context, traversing the knowledge graph, to help boost search results. And that's exactly, I think, the general case of what we're trying to do with knowledge graphs.

So I've put together this data set that I call the news graph. Basically, what I've done is taken some data from the New York Times API. Specifically, what I'm looking at are the most viewed and most shared articles each day. I think we grab 25 from each of those categories each day and store them in Neo4j, and we end up with this data model here. The example from the previous slide was taken from this news graph. You can find the code for this online in this repo; I'll drop a link to that in the chat.

This GitHub repo has the Cypher script for loading this data from the New York Times API. Anyone can register for an API key, and then we're just using apoc.load.json to call out to this API. If you're familiar with Neo4j and Cypher, you have probably come across APOC at some point. APOC is the standard library for Cypher, which adds procedures and functions to extend Cypher, and one of those is this procedure, apoc.load.json, which allows us to call out to, in this case, a JSON REST API. Here we're fetching the most viewed articles of, I think, the last seven days. That JSON object can then be referenced in the rest of our Cypher query with this variable, value, so I can iterate through this array of article results and, with Cypher, describe the graph pattern that I want to create.

So we have the data for the import in this repo, and also the code for using it with Cloudflare Workers, and then a couple of examples of how we can build a GraphQL API, both with Next.js and with Apollo Server. So if you're
interested in this data set, that's the place to go.

Let's take a look at some examples. I'm just going to jump into Neo4j instead of looking at screenshots here. So here's Neo4j Browser. If you haven't seen Neo4j Browser before, this is like a query workbench. It's the primary way that we interact with Neo4j as we are developing: we can write Cypher queries and visualize the results. Here we have the basic data model for the news graph. We have articles; articles are about people, topics, geo-regions and organizations (organizations being companies, that kind of thing); and then articles have photos.

So we can write some Cypher queries here. Here we're looking for the 100 most recent articles. With Cypher we define these graph patterns, and this is one of the simplest patterns we can express, which is just a node and its label. We're saying: find nodes with the label Article and return those nodes ordered by date published, limit 100, so give me the 100 most recent articles. And let's do a style reset. Sometimes the captions get a bit off; we can configure the captions that we want this way, or oftentimes just resetting to the defaults works pretty well.

Anyway, here's an article. This is something about TikTok, and we can see the properties that we have stored here: the URL, the title, the date it was published, some description of it. We can double-click to traverse the graph and bring in more nodes, and I can see the topics of this article: data mining and database marketing. I can double-click on that to see other articles that have the same topic, and so on.

But I can also add more complex graph patterns to my Cypher query. We have some examples in the history here. Here we're looking for articles, and we've added a more complex pattern that says: find Article nodes, and then traverse out to find Topic nodes connected to those articles. So I can see here's an article about Sri Lanka and the different topics it has. Here are a couple
articles that share topics, so coronavirus, vaccination. So yeah, I've got a lot of COVID-related articles in here.

What's nice about Cypher is we can define more complex graph patterns fairly easily using this ASCII-art notation. You can see we're drawing with these arrows and circles to represent our nodes. So now we're looking for articles, their topics, and then other articles in those same topics. Here we end up with a lot of coronavirus articles, since that's a popular topic.

Cypher is really powerful. We can also do things like ask what's the shortest path between two nodes in the graph. Here we're saying: what is the shortest path from the topic Cheese to TikTok? That was kind of a fun one. It ends up being pretty short. There was an article about a cream cheese shortage in New York, apparently, that has the topic Quarantine (Life and Culture), and then there was also an article, what is this one, about a TikTok star in Mexico, which is about the organization TikTok. So Cypher is really powerful. It gives us a lot of ability to work with graph patterns, and we will see in a minute how we can leverage that to generate personalized recommendations.

I see a question in the chat from Rich: "Do you have any property indexes configured in the database? So when we match, ordered by date published, does it use the index to produce results?" I don't think so. I think I was lazy and didn't set up any indexes, but that is a great question. Let's take a look. Here we are matching on all articles, returning articles ordered by date published, and we return 100.
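The query in question can be sketched roughly as follows. The Cypher text is reconstructed from the spoken description, so the label `Article` and the property name `published` are assumptions rather than confirmed schema. The JavaScript function is only an in-memory analogy of what "order by date published, limit 100" asks for when no index helps (visit every article, sort, keep the newest few); it is not how Neo4j executes the query, and the sample rows are hypothetical.

```javascript
// Cypher reconstructed from the talk; `Article` and `published`
// are assumed names, not confirmed against the actual news graph.
const RECENT_ARTICLES_CYPHER = `
MATCH (a:Article)
RETURN a
ORDER BY a.published DESC
LIMIT 100
`;

// In-memory analogy of the same request without an index:
// look at every article, sort by date, keep the newest `limit`.
function mostRecent(articles, limit) {
  return [...articles] // copy so the caller's array is not reordered
    .sort((x, y) => y.published.localeCompare(x.published)) // ISO dates sort as strings
    .slice(0, limit);
}

// Hypothetical sample rows, for illustration only.
const sample = [
  { title: "A", published: "2021-12-01" },
  { title: "B", published: "2021-12-08" },
  { title: "C", published: "2021-11-20" },
];

console.log(mostRecent(sample, 2).map((a) => a.title)); // [ 'B', 'A' ]
```

The sort touches every row, so the cost grows with the number of articles, which is why the question of whether an index is used matters more as the data set grows.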
So we can check to see if we're using an index by adding PROFILE to the front of our query. What PROFILE will do is run the query and then give us the execution plan for the Cypher query. If we don't want to actually execute the query, we could do an EXPLAIN instead. EXPLAIN is useful if we have a really big query and we don't know how long it's going to take, but we want to see the execution plan.

So, the execution plan. We write our Cypher query, and Cypher is a declarative language, so we're saying: here's the graph pattern I want you to go find; database, please figure out how to find this. The execution engine in Neo4j then generates a query plan, which is a series of operations to figure out how to find that pattern, how to address whatever I've declaratively said in Cypher that I want to accomplish. This is great if we have a slow query, looking at this metric called db hits. Basically, what we want to do as we're trying to optimize or tune any query is get the results that we expect with the minimum number of database hits. One way we can do that is by using an index to very quickly go to the starting point for our traversal, instead of scanning through all of the nodes and checking property values, things like that.

In this case we don't have an index on published. We can create one: Article, published, something like that. So now we've added an index, and let's run our query again. I think in this case we end up not using the index either. It's probably because of the size of the data; it's not huge. I can match on Article and return the count. I think I have maybe like a thousand or so. Yeah, four thousand, so it's not a huge data set. I don't think I have any indexes in this, but indexes can be useful for finding the starting point of a traversal. Oftentimes we don't use indexes to traverse the graph, right? In a graph database we have index
free adjacency, which means that we're traversing the graph without using an index, but indexes can be important for finding the starting point of that traversal. But yeah, good question, thanks Rich.

Cool, so that's the data set we're working with. Oh, by the way: we saw that Cypher query for calling out to the New York Times API with apoc.load.json to load that into the graph. That works once, but we want this to run every day, because every day we want to grab the top articles. So I set up a GitHub Action to do this. GitHub Actions are really cool. I don't know how to describe them: like serverless, containerized pieces of work that I can define and that run in the background. Typically they run as a result of some action that I take on GitHub, so when I check code in, for example, we can trigger a CI process to run the tests in my repo or something like that. But GitHub also has an action that they publish called Flat Data, which allows us to periodically go out to an endpoint (or, I think, run a SQL query against a SQL database) and then have some post-processing step. So I set this up. I published a simple GitHub Action called Flat Graph that allows us to just define a Cypher query without setting up all the processing things. Basically we just define a Cypher query that says how we want to handle the JSON data that comes back, and that runs, this one I think, once a day, because the data is only updated once a day.

A question from Mintaka: "Cold start is one of the research problems of conventional recommender systems. I want to use Neo4j for this. What's your view on that?" Yeah, cold start is something we'll talk about later. The cold start problem is basically what happens if I don't have any information about users in the network. When you first sign up for a system or a news application, we don't know anything about you, so how could we possibly give you recommendations? So that we have a little
bit on that, actually, that we'll get to later. So yeah, it's a very relevant question.

So we've talked about the data set, and we saw how we can query it using Cypher to do things like looking at the topics for articles. We saw a fun example of using shortest path. But okay, how do we actually use Neo4j and Cypher for personalization, and specifically for generating recommendations for news articles? Well, there are basically two categories of approaches to recommendations with graphs: collaborative filtering and content-based filtering.

With collaborative filtering, we are using the preferences, ratings and actions of other users in the network to find items to recommend to a user. This is the kind of thing where, if I have rated a bunch of movies, I can look at other users that have rated movies similarly to me: what other movies are they rating positively that I haven't seen? Those are good candidates for recommending to me. Or with products: I've bought this pair of shoes, so look at all the other users that bought the same pair of shoes. What is something else they bought that I haven't? That might be a good recommendation. So it's kind of wisdom of the crowds, using data about users in the network to generate recommendations.

The other approach is content-based filtering, where I'm recommending items that may be similar to an item that I have expressed interest in. If I'm watching a movie and I like this movie, we could look at other movies that have the same actors, or the same director, or maybe the same genres, this sort of thing. But in both of these we're traversing the graph to find relevant recommendations.

If we look at collaborative filtering, the first step is basically to find similar users in the network. We can think of this as my peer group, or my cluster, my community in the graph. The next step is to identify what actions those users have taken that I haven't taken yet. So what
products have they bought that I haven't purchased? What movies have they watched that I haven't? Or, in the case of our news graph, what news articles are they reading and sharing that I haven't yet?

There are lots of ways to approach this. One way is to use similarity algorithms to find the most similar users in the network, to handle that first step. Another would be clustering or community detection algorithms. A popular one where we have ratings is cosine similarity. I'm basically taking two vectors, where the vector is my rating of items. The example on the screen comes from the recommendations Neo4j sandbox, which is looking at movie ratings. So I have a vector of all of my ratings of movies and compare that to a vector of another user's ratings. Calculating the cosine similarity between the two users based on those two vectors of ratings gives me a value between zero and one for how similar the two users are. Do that for all users in the network, and I can see who has the most similar taste to me.

What's interesting about ratings, though, and I think especially movie ratings, is that people have a different baseline. For some users a four is really, really good, and they'll maybe never give out a five, or very rarely. Whereas other users, if something was "yeah, that was good", they'll give it a five. So everyone has a different baseline. Pearson similarity takes that into account. It's basically calculating a correlation coefficient, looking at variation about each user's mean rating, so that can be a good one to look at as well. All of these similarity metrics are available in the Neo4j Graph Data Science library. You can see the Cypher examples there, where we're using calls to the GDS library, and again all the examples come from the recommendations Neo4j sandbox.

But there's a problem with collaborative filtering, and this is exactly the problem that Mintaka brought
up in the chat: this idea of a cold start problem. In one of my news apps, I wasn't signed in or something, and I opened it up to take a look at my personalized recommendations, and it basically says: hey, I don't know anything about you because you're not signed in. I can't give you any recommendations because I don't know anything about your preferences. And this is really a problem, because we want to be able to generate recommendations for users as quickly as possible to help drive engagement with our application. Once I did sign in, then okay, we know your interests based on other articles you've read and so on, so we can generate some recommendations.

So what is one way we can avoid this cold start problem? One way is this idea of content-based recommendations. Here's an example that comes from the New York Times. This is an article looking at a railroad in Bulgaria, and at the end of the article I have other recommendations, in this case things like other travel articles, other articles about Bulgaria, this sort of thing. I found that article in our news graph, and if we look it up by title, one way we could generate content-based recommendations is to look at overlapping topics: what are other articles that share topics, or other articles that are about the same geo-region, which in this case is Bulgaria. So we can see how we can traverse the graph that way to generate personalized recommendations.

Okay, so we've talked a little bit about how we can use knowledge graphs in Neo4j for generating personalized recommendations for news articles. How do we operationalize this? How do we actually build a personalized news recommendation endpoint? This is where Cloudflare Workers come in. Cloudflare Workers, I think, are really uniquely suited for this type of personalization. I think of Cloudflare Workers as kind of like the next evolution of
serverless so if you're familiar with things like lambda functions you're deploying these serverless functions to a specific region so i'm deploying my lambda to us-east-1 and it sits in a data center somewhere in virginia so anytime i'm calling this endpoint now for my lambda function even if i'm on the other side of the world it's going to a server sitting in virginia yesterday we saw us-east-1 go down along with lots of services so not only is that kind of slow because of the networking but it's not very redundant right and there are ways to deploy your serverless functions to multiple regions so aws has a service called lambda@edge which basically allows you to deploy to multiple regions but cloudflare workers is really an evolution beyond that sort of idea of a lambda function serverless function where we're actually running code on the content distribution network so cdn nodes are much more prevalent than regions they typically live in cities instead of regions of the world so what this means is that when i go to view a webpage and i'm loading static content like html or image files we don't want to incur a lot of network latency by say a user in i don't know south america having to go all the way to virginia to fetch that instead we want to go to a cdn node somewhere in south america in that user's country ideally in that user's city right so this is kind of a solved problem for static content we know how to do that by deploying to a content distribution network and edge handlers cloudflare workers these sorts of things are actually running code now on those cdn nodes on the machines deployed all throughout cities all over the world much more widely than just regions and there's a different architecture with cloudflare workers so overhead is really sort of reduced per instance and really the benefits that we see with cloudflare workers for personalization are that all these
workers are geolocation aware right so we know not only what city the user's request is coming from but oftentimes cloudflare has some services for adding things like the latitude and longitude of not just where the cdn node is but where the user's request originated from adding services like bot detection and caching and these sorts of things can be really helpful with cloudflare workers so i'm pretty excited about cloudflare workers i think it's really neat there are other similar things like edge handlers i think on netlify a similar idea and then there are services built on top of this so like vercel has now edge functions i think they call them which are built on top of cloudflare workers okay so let's look at an example so this is a simple architecture diagram of what we want to build here so i want to have a news app that can go to my endpoint served by cloudflare workers and then my worker is going to go to neo4j to basically generate the personalized recommendation for the user cloudflare workers have a cli called wrangler that is a good way of getting started there's also a web interface but i like wrangler so we can develop and test locally and deploy so we install the wrangler cli tool and then we can generate from template projects so there's an example that cloudflare has published using i think it's called the itty router it's like a very simple router that allows us to have different routes on an endpoint so i started from that template and then we also want to save our connection credentials for neo4j as secrets which we can do with wrangler which is nice and then that's available as environment variables locally and also when we go to deploy it here's the example instead of looking at screenshots let's jump into vs code you'll have to excuse my color theme i'm in the holiday mood which one is this i think this is called north pole so very very
christmasy yeah north pole there's also santa baby that one's a little more readable okay let's zoom in a bit that's too big yeah maybe that's good so this is the code for my cloudflare worker endpoints this is javascript so we can see this is the router that the template project was pulling in called itty router and then i wrote a function here that is basically just a helper for sending an http request to neo4j so one thing to be aware of with cloudflare workers because of the architecture some of the networking issues are handled a bit differently with workers so for example opening an arbitrary tcp connection from a worker right now does not always work i know cloudflare is working on this they actually just a couple weeks ago released kind of a proxy thing that proxies tcp over websocket i think for helping with these kind of database connections fortunately with neo4j we don't need to worry about tcp so if we were using bolt which is the primary way that we interact with neo4j with neo4j drivers that's a binary protocol over tcp then we would have to think about how we can proxy that to our worker but neo4j also supports http requests so we can send cypher requests to neo4j just over an http connection which is perfectly fine to do from a worker we can open http requests from inside a worker just fine so we're using the http endpoint for neo4j instead of the bolt endpoint so something to be aware of for working with databases in a worker but again i know this is evolving quickly this is something that cloudflare i think has on their radar anyway so this is just a helper function for adding the headers for that http request to the database and then structuring the request to the database given our cypher statement and any cypher parameters so then here's our first route so for the index route we get passed a request object and the first thing we want to do is figure out the location of the user so for this route what i want
to do is take the user's location into account and figure out okay where is this user and then i want to find basically what are news articles about the regions closest to the user so what we do is we get the latitude and longitude of where this request originated so where is the user located and then in our cypher statement we're going to find any geo regions that are closest to this point so it finds the 10 closest geo regions that we have in the database and then we traverse the graph to find articles that are connected to those regions and then we're returning the article information and then we also traverse out to find what are the other geos and topics and organizations connected to this article if you're familiar with cypher and this looks a little funky this is using the map projection and pattern comprehension features of cypher to basically return a json object so this is a pattern that i really like to use when i'm creating endpoints like this that are just returning some json object i can actually pretty much construct that json object with a map projection combined with pattern comprehension in cypher which i think is quite tidy for creating these kind of endpoints that are returning json we then pass that cypher request to neo4j passing in the user's location as a parameter and then because we're basically returning a json object from that cypher query we just have to grab it out of the first row and return that result so let's do wrangler dev i think to start a local server with this and it says running locally on port 8787 so let's check that out localhost 8787.
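The helper and index route described above could be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk: the node labels, relationship types and property names in the Cypher text, the `neo4j` database name, and the function names `buildCypherRequest`, `firstValue`, `userLocation` and `handleIndex` are all assumptions made for the example. It relies on Neo4j's transactional HTTP endpoint and on the `latitude` and `longitude` fields that Cloudflare attaches to `request.cf`.

```javascript
// Hypothetical Cypher: labels, relationship types and properties are assumed.
// The map projection plus pattern comprehensions build the JSON shape directly.
const LOCAL_NEWS_QUERY = `
MATCH (g:Geo)
WITH g, distance(g.location, point({latitude: $lat, longitude: $lon})) AS dist
ORDER BY dist ASC LIMIT 10
MATCH (g)<-[:ABOUT_GEO]-(a:Article)
RETURN collect(DISTINCT a {
  .title, .url,
  geos: [(a)-[:ABOUT_GEO]->(x:Geo) | x.name],
  topics: [(a)-[:HAS_TOPIC]->(t:Topic) | t.name]
}) AS articles`;

// Build the URL and fetch options for Neo4j's transactional HTTP endpoint.
// The database name "neo4j" is the default and is assumed here.
function buildCypherRequest(baseUri, authHeader, statement, parameters = {}) {
  return {
    url: `${baseUri}/db/neo4j/tx/commit`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json",
        Authorization: authHeader, // e.g. a Basic auth header stored as a secret
      },
      body: JSON.stringify({ statements: [{ statement, parameters }] }),
    },
  };
}

// Pull the first column of the first row out of the HTTP API response.
function firstValue(payload) {
  if (payload.errors && payload.errors.length > 0) {
    throw new Error(payload.errors[0].message);
  }
  const rows = payload.results[0].data;
  return rows.length > 0 ? rows[0].row[0] : null;
}

// Cloudflare exposes the request origin as strings on request.cf.
function userLocation(request) {
  const cf = request.cf || {};
  return { lat: parseFloat(cf.latitude), lon: parseFloat(cf.longitude) };
}

// Index route: closest geo regions -> articles -> JSON response.
// `config` stands in for the secrets (URI and auth header) set with wrangler.
async function handleIndex(request, config) {
  const { url, init } = buildCypherRequest(
    config.uri, config.auth, LOCAL_NEWS_QUERY, userLocation(request));
  const response = await fetch(url, init);
  const articles = firstValue(await response.json());
  return new Response(JSON.stringify(articles), {
    headers: { "Content-Type": "application/json" },
  });
}
```

In a real Worker, `handleIndex` would be registered on the router's index route. Keeping the request building and response parsing in plain functions means they can be checked without a database or the Workers runtime.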
so we should be seeing local news so i live in montana and the first article i get is an article written by the former governor of montana about i don't know something about politics but you see one of the geos is montana so yep that one looks good next article is about some murder in spokane yeah that's like a couple hour drive from where i live another article is about idaho which is just maybe an hour's drive from where i live so the next state over and other articles about montana and farm aid and these sorts of things so okay this is pretty good these are relevant articles for me based on my location so giving me kind of local news that's nice the next thing we want to do though is so i'm reading an article and i want to see other similar articles based on content so maybe i read this article about rural politics i guess is what this is getting at and how democrats can appeal more to rural areas i guess and if that's something i'm interested in i want to read more articles about that how would i use the ideas that we talked about of content-based recommendations to recommend articles so for that we created a second endpoint i see a question in the chat from chris he says what app is that are you asking about the chrome extension that's rendering sort of pretty print json so basically i'm just opening this localhost 8787 endpoint in chrome i don't know what it's called i think if you search chrome extensions it's something pretty print json there's a few of them but basically it's nice if you load json in chrome it gives you kind of a nice pretty print then you can also just kind of collapse arrays and things like that yeah i think it's called pretty print json but if you search the chrome store for json you'll see a bunch of them so that's our first endpoint we created then another endpoint so now if we go slash recommended and then slash the id of the article now
our cypher query is going to look up that article by id and then we're going to search for okay what are all of the topics geo regions organizations and people that this article is connected to and then we're going to use jaccard similarity from the gds graph data science library to find the most similar articles jaccard similarity is basically a set comparison operation so earlier we were looking at like cosine similarity and pearson similarity and in those cases we had ratings we had numeric values to compute a distance with things like jaccard it's binary either this is about this topic or it's not right there's no rating associated with that so jaccard allows us basically just to compare two sets for sort of existence like is this thing in the set and calculate a similarity score based on that so we find 10 articles that have the highest jaccard similarity and then again we use this map projection and pattern comprehension to return basically a json array of recommended articles based on the id of whatever article we've passed in in the route so let's look at that one so we'll just grab this article about politics in montana we'll grab its id and we'll go to slash recommended slash 16424 is the id for that and so now we should see articles that are similar to this article that was about rural politics in montana and what do we get we get something about the democratic party and it's dk state legislatures midterm elections all these kind of have the democratic party in common rich is back with another question is using internal id risky given ids can be reused by the database yes it is rich you are doing a great job of pointing out my laziness of not creating indexes and using the internal id yeah so this is a really good point though so this id is the internal node id of the node and so in neo4j every node and relationship has an internal id which is a value that basically points to an offset on disk and
earlier we talked about this idea of index free adjacency and how in neo4j we're able to traverse the graph without using an index basically just a constant time operation to do this traversal and that's facilitated by this idea of offsets in the file store so an id is pointing to an offset in the file store so like a physical location in the file store and when i delete nodes and then load data again those ids can be reclaimed right so if i have an id that is this block in the file store if i delete it well then i want to reclaim that space in the file store and i might load another node into this space in the file store so it's generally not a good idea to use internal node ids in external systems because they can be reclaimed so i did this again because i'm lazy in this data set we didn't have a unique id for these articles i suppose i could use the url that probably would be good but in a lot of systems you have like a uuid or something to identify a node's uniqueness so i was lazy and kind of just went with the internal id it's fine in this case because yeah as you say i'm not deleting data i'm only adding data to this data set so i think in this case these internal ids are going to be stable but yeah it is just a general best practice not to use internal node ids referenced in external systems and also it sort of only lives for the result of this query so in this case i think it's okay but yeah it would be better to use a uuid if i had it or in this case the url of the article is probably safe to use i'm assuming those article urls are not repeated but yeah great point okay so that's our second endpoint so we can now get local news based on a location and then when i'm done reading an article i can find similar articles using jaccard similarity so we've been running this locally let's give this a deploy so wrangler publish will now deploy this
all throughout cloudflare's cdn and it gives me this endpoint now so this is news.graphstuff.workers.dev i'm going to paste that in the chat and so you should be able to hit that endpoint you can just open it in your browser like i have and what you should see are articles that have something to do with locations closest to you and all this comes from the new york times so there should be pretty good coverage throughout the world i would think so i'm curious how well this works for folks if you want to try that and see what kind of articles you get recommended and if it actually does take your location into account so let us know in the chat if you see anything interesting there so now we've deployed our endpoint we're now using cloudflare's cdn to sort of execute our custom code so when you hit this endpoint that code is running on the cdn node closest to you in your city it's not going to a data center in some region rich asks where do secrets come from are you accessing it through environment variables yeah so wrangler allows us to set secrets in the cli so i use like wrangler secret put neo4j uri something like that and then it asks me what value do i want for that and i guess it's not quite an environment variable because this is sort of a custom runtime they're basically just global variables that get set in the worker environment so in my case i've set neo4j auth which is a basic auth header because we're using the http request to neo4j and then also the neo4j uri somewhere down here yeah this one so it's like an environment variable but in our code it's basically a global variable and the wrangler cli handles that both for local development and then also for deployment we can set local and production values differently i just set the same one because i only have the one neo4j instance cool so i think that is about all that i
wanted to show hopefully that does a good job of motivating why you would use knowledge graphs in neo4j for recommendations and how you can use cloudflare workers alongside that to sort of operationalize some of the ideas that we have available in neo4j so all the code for everything that i talked about is on github link there in this news graph repo i'll drop a link to that in the chat as well as i said there's also some other examples in here for working with graphql as well so i'll end with just a couple of resources that you may find useful i mentioned neo4j sandbox earlier specifically the recommendations sandbox where we were looking at calculating those user similarities using the gds library the recommendations sandbox instance has data from the open movie database and movielens is where the ratings come from so those are combined together and then we see how to generate personalized recommendations using a few different approaches sandbox is nice because i can spin that up i get an instance that's private to me along with some guided queries that kind of walk me through that in cypher it's a great way to learn some of the more advanced things we can do in cypher and then we didn't use this explicitly but this is a good tool in general that i want to make sure everyone's aware of which is arrows.app so this is a graph data diagramming tool i guess that allows us to create our graph models so anytime you see some of the graph model diagrams those are typically created with arrows and this is also useful because we can export the json to create those and check that into version control so i can check my model into version control and have others work on it as well great so that's it from me you can check out my twitter and personal blog and newsletter there as well and then as i mentioned earlier i also host the graphstuff.fm podcast so definitely check that out if you are into podcasts and that's it so thanks
very much for joining today cool thanks will thank you all for joining indeed thank you for presenting will it was very interesting and i think it's a good start so like you said if anybody wants to try this themselves then i think you provided all the material and then you know if you have any questions further on or if you're stuck with anything we have a community forum or a discord server where you can definitely go and ask your questions and we will surely have eyes on these two channels as well as obviously others but there could be a segue yeah with that i think we're done for today i thank you all for joining thank you for an interesting chat conversation today so that was nice i think next is monday actually michael and i will talk about aura free and a couple of interesting data sets and how to get going with that so do you know what data set you're going to use yet for monday no not yet no i think it's always a bit of a surprise right it's always going to be a surprise yeah but i have a gut feeling that with michael's pandora papers blog article from earlier today or actually yesterday i am pretty confident that we might look into that data set because it just makes sense and we just posted about it i think it's in the developer blog so it's not on the regular one so if you type in developer no slash developer hyphen blog i think that's what it is yeah here we go yeah this is it so i guess we do that on monday but if you are eager you can always read the article from michael already cool i'll link that in the chat in case anyone's interested exactly cool thanks will so yeah have a look at that have a look at the material from will again thank you for showing us through this this was really
insightful and interesting you know let us know if you program your own news recommendation app it's no shame to brag if you have done that so should you do this over christmas or the holiday season then post it on twitter and let us know for sure yeah all right with that thank you all for watching see you soon and yeah take care everybody bye bye
Info
Channel: Neo4j
Views: 332
Id: ayqF9ABp3bQ
Length: 74min 7sec (4447 seconds)
Published: Wed Dec 08 2021