Best Practices for GraphQL Clients

Captions
- Okay, so yeah, my name is Joe Savona. I'm a Software Engineer at Facebook, and I work on the Relay Team as well as working with other GraphQL teams for building our native clients on iOS and Android. So, I've been at Facebook for about two years. So, I haven't been there for the entire history of our GraphQL client development, so in kind of preparing for this talk, I've looked at what we've done in Relay as well as talking to a whole bunch of people around the company and kind of getting more information about the early years of GraphQL just to make sure that I'm kind of covering all the bases. So, at Facebook we've been developing GraphQL clients for over four years and in this talk, I want to share about the lessons that we've learned. So let's start by answering the question what is a GraphQL client? I think we all have an intuitive sense of what it should do, but let's kind of just nail down the specifics of that a little bit. So, a GraphQL client allows developers to specify queries, obvious. It sends those queries to the server. It parses the responses that it gets back and makes the parsed objects available to user interface code. So far, so good. It should also cache data that's already been fetched to avoid refetching data unnecessarily, and also, by caching data we can allow the application to work offline. It keeps the data and the UI consistent. So, if we fetched a particular piece of information once and that appears in multiple places in the user interface or in a cache and that data changes, that change should be reflected everywhere that piece of information is displayed or cached. A GraphQL client should make it easy to fetch paginated lists, so things like the Facebook news feed or search results lists where there may be many more items than we can download at one time. 
There may be algorithmic ranking, things like that, and we want to be able to fetch lists incrementally. And finally, a client should allow us to execute mutations, and the mutations system should integrate with our caching and consistency approaches and also with pagination as well. So, let's start at the beginning and add things incrementally as we go. Before we can even send a query to the server, we have to define the query somewhere, and we've seen some approaches in earlier talks of using code to generate queries and things like that. The pattern that we've converged on at Facebook is what we refer to as Colocation. Again, we've heard a lot about user interface components today; at Facebook we've developed React, and ComponentKit for iOS, and of course we can use Android Views as well. The pattern we've developed is to either have a .graphql file that contains fragments for each component, or to have the GraphQL embedded in the code. In our JavaScript client Relay, we actually embed the GraphQL directly into the JavaScript. In either case, this makes it a lot easier to keep the data fetching and the UI in sync. We've covered some of this already today, so I'll move through it a bit quickly, but in general this allows us, if we add a property to the user interface (say a name that wasn't there before), to jump directly to the fragment, add that field, and keep the component in sync with its data dependencies. It also means that if we reuse a component somewhere else in an application, we can simply reference its fragments and be sure that we've fetched all the data needed to render that UI.
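As a rough sketch of what Colocation looks like in practice: a fragment declared next to the component that renders it, with the component reading only the fields its own fragment asked for. The `graphql` tag, the fragment, and the component names here are hypothetical stand-ins, not the actual Relay API.

```javascript
// Placeholder tag so the sketch runs standalone; in a real Relay app the
// graphql`...` tag is processed by the build toolchain, not at runtime.
const graphql = (strings) => strings.join('');

// The fragment lives right next to the component that renders it.
const UserBadgeFragment = graphql`
  fragment UserBadge_user on User {
    name
    profilePicture { uri }
  }
`;

// The component only reads fields its own fragment declared, so adding a
// field to the UI means adding it to this colocated fragment and nothing else.
function renderUserBadge(user) {
  return `<img src="${user.profilePicture.uri}"> ${user.name}`;
}
```

A parent component would then spread `...UserBadge_user` into its own fragment rather than repeating these fields, which is the reuse the talk describes.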
It's really tempting, I think, to create shared fragments, some user fragment that we reference in multiple places in our application. But this means that if we have some field in that fragment, we now have to go and check everywhere that uses data from that fragment to see whether we can actually delete the field. Shared fragments defeat some of the benefits of GraphQL, in that we lose the clear association of fetching exactly what we need for a particular piece of code, so we avoid them. Okay, so Colocation has worked out pretty well for us. Now, let's look at Query Construction. Again, we saw some features of this before. The first pass at Query Construction would be something like what we do in the current version of Relay. At Build Time, we can look at the query string on the left (I've put quote marks around it to emphasize the fact that this is a text string), do an initial parse of it, and create a simple AST representing the query. This is how Relay works today. So, this works, and the object representation allows us to describe things like fragment references: for the ...Photo spread we can generate code to look up the actual code representation of the Photo fragment at Runtime. Now at Runtime, we can take this AST, execute it, embed fragments into their parents, get one giant representation of the entire query, print that to a string and send it to the server. This is very much like the approach that we saw earlier from Shopify. So, this works fairly well for smaller applications, but it doesn't necessarily scale. As our product grows in complexity, we're gonna add more UI components. We're gonna add more different types.
So, we might've had a story type before, but now we might have a photo story and a video story and all these different variations, and so our queries grow larger even as the amount of data we're fetching doesn't actually grow as much, because we have unions and more complexity and more fragments. So, over time, as you can see from this completely non-scientific graph, (audience laughs) our query size would grow, right? And the thing is, the query size actually has an impact. It means that we're spending more time at Runtime simply generating the query representation and converting it back into a string. Every single user of our application has to wait while their phone does all that work, uploading all those bytes to the server, and every time they run the app they're uploading the same bytes over and over again. The time spent uploading is time they're not spending downloading their response and actually using the application. So again, this isn't necessarily a problem at first, but it can become a problem over time. One option you might think of is, okay, let's just stop doing Colocation and write super minimal optimized queries. You don't have to do that. There's an alternative. That approach would throw away the benefits of Colocation, and the alternative that we arrived at is what we refer to as Persisted Queries. Going back to that Build Time step, we already have a clear structure for where to define the queries, so we know where to look for them: they're in either .graphql files or embedded inside the code. And we already have a mechanism to convert from these disparate colocated fragments back into an AST and convert that AST into query text; we're doing that exact work at Runtime today. The key insight is that nothing here actually has to happen at Runtime.
That is, we're not dependent upon Runtime to know what the query is. These are all just static pieces of text around our code base that are getting put together at Runtime and sent to the server. So, we can do that query aggregation at Build Time instead. We can take all those different fragments, put them together, and send the result as text to the server in a build step. The server saves it in a database and assigns it an ID (we can hash the query text to get a consistent ID), returns that ID back to the build step, and then we save that ID somewhere. Now at Runtime, instead of sending up this massive query string, we can send just the ID and the variables. If, for example, you want some conditional logic (in certain cases we'll fetch this fragment, in other cases that one), we can use an @include or @skip directive and then send a variable at Runtime to choose which one we fetch, and then we get the data back and render. So now, returning to our completely unscientific graph, we can see that persisted IDs don't get larger over time. The main thing that changes is that we'll probably add a bunch of variables, and so the amount of data we send in the variables will grow, but this is much more scalable. So, Persisted Queries have been working well for us. Effectively, all the queries in our iOS and Android clients are persisted IDs. I say effectively because I admit there's the possibility that there's query text being sent somewhere, but I don't think so in practice. In Relay, the current version doesn't support Persisted Queries; we're moving toward supporting them in our JavaScript client as well. Some ideas if you use Persisted Queries: it can be helpful when you're debugging a production issue. You probably have the ID that the client sent, and so it's helpful to have an internal tool that you can go to to quickly get the query text for that ID.
You may also want to be able to put in the query name and get back the list of IDs that have been assigned to that query over time, so I can see all the versions of the feed query and what the current ID in master is. The other thing you might wanna do is log performance. We saw some tools that let you log performance characteristics, and persisted IDs give you a really great way to do that. You can, for example, look at the performance of the feed query over time, see how each different ID performed, and see how many times each ID is actually being used by clients. So far, Persisted Queries give us a way to efficiently fetch a new query that we haven't already fetched before, but as we can all imagine, we're probably gonna want caching, right? We open a screen in our app, we leave that screen, a few minutes later we come back, and now we're stuck downloading all the data for that screen all over again. Ideally, this would be fast, and we could use the data that we fetched a few minutes before. Again, this is a really specific use case, and that goes to our development philosophy for our clients, which is to try to solve specific use cases first. See whether that works, and if it does, great, and if it has problems, then we solve the next problem. So just like with REST or other HTTP protocols, we can simply add a response cache. Here we'd have a read/write-through cache where we have a mapping of IDs and variables; we do a stable hashing of those to get a key. If that key is already in the cache, we just use the cached value. Otherwise, we go to the server, get the data, put it into the cache, and return it to the UI. Standard things apply here, like a TTL (some form of expiration so that we don't keep cached data around for too long), and LRU or other eviction algorithms with a maximum cache size.
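The response cache described above can be sketched as follows. The key is derived from the persisted ID plus the variables in a stable order, so the same logical request always hits the same entry; the TTL check is included, while LRU eviction is left out to keep the sketch short. All names are illustrative.

```javascript
// Stable key: sort variable names so {a, b} and {b, a} produce the same key.
function cacheKey(id, variables) {
  const sorted = Object.keys(variables)
    .sort()
    .map((k) => `${k}:${JSON.stringify(variables[k])}`)
    .join(',');
  return `${id}|${sorted}`;
}

class ResponseCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  // Returns the cached response, or null if absent or expired.
  get(id, variables, now = Date.now()) {
    const entry = this.entries.get(cacheKey(id, variables));
    if (!entry || now - entry.storedAt > this.ttlMs) return null;
    return entry.response;
  }

  put(id, variables, response, now = Date.now()) {
    this.entries.set(cacheKey(id, variables), { response, storedAt: now });
  }
}
```

A production version would also cap the entry count and evict least-recently-used keys, as the talk notes.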
One other thing you can do is persist this cache to disk, so that if the user is offline they can still get the initial screen's worth of data, for example. That's an easy and cheap way to get some offline functionality. Okay, so a response cache is really useful, but it introduces its own problems. How do we ensure the data is actually correct, and how do we ensure that it's consistent? Let's look at an example. Imagine that we fetch a query that includes the message count. If you're familiar with Flux at all, you'll recognize this example. We fetch the message count at a certain point and cache the value, and the message count is one. At some other point, a little bit later, we fetch a different view that also happens to include the message count, and at that point the message count is two. If we cache these results independently, the two caches will now have different values, and if we were to go back to that previous screen, we'd use the cached value and see a message count of one in part of our app and a message count of two in a different part, and that's not a great user experience. So, without some form of coordination, these two cached values will get out of sync, and we'll need something to coordinate them. There are a lot of different approaches to this, some more decentralized, some more normalized. The approach that we're converging on in our clients, particularly in Relay and our iOS client today, and I think moving toward on Android, is to have a more normalized store that sits at the center of the client-side cache. Every query that we fetch is published into the store, and then views subscribe to a query and pull the latest values of that query from the store.
So, in the case we saw before, when we fetch the subsequent query, not only do we see the new count on the new view, but because that data is published to the store, the previous screen gets notified, and now both show the correct message count. The format of the store is basically a flat mapping of identifiers to records, where a record is another mapping of field names to values. Here, for example, you can see that we've got a store with two entries in it. For the first, the identifier might be the type User and my actual ID put together, and the record has the type name, the ID field, and the name. And you can see that for city, where I live, we're not actually encoding the record directly; we have a reference to that object by its ID. This gives us a flat mapping where we can take a new store and an old store, walk them together, and merge the two. So, the main operations we have on the store are, first, publish: we have some new data we've gotten from the server and existing data in the store. We take those two mappings, walk them together, and add in any new entries from the server. If a record exists in both, we merge them together, and as we do that we record which IDs have actually changed. The other side of this is subscribe and notify. A UI subscribes to the store with a query or a fragment, and when new data is published, we look at what changed in the store: we know that ID a was added, ID b changed, ID c got deleted, and we can look at the subscriptions and see which ones would be affected. For any subscription that is, we re-execute the query and give the new results back to the UI. So, components only get the actual records that they need, and we avoid unnecessarily updating the UI.
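The publish operation just described can be sketched as a merge over the flat map that returns the list of changed IDs, which is what drives the subscribe/notify side. This is a simplified sketch: it compares scalar field values only, where a real store would also need structural comparison for reference values like { __ref: 'city:5' }.

```javascript
// Merge new records from the server into the existing flat store,
// recording which record IDs actually changed.
function publish(store, incoming) {
  const changedIds = [];
  for (const [id, newRecord] of Object.entries(incoming)) {
    const existing = store[id];
    if (!existing) {
      // Brand new record: add it wholesale.
      store[id] = { ...newRecord };
      changedIds.push(id);
      continue;
    }
    // Record exists in both: merge field by field.
    let changed = false;
    for (const [field, value] of Object.entries(newRecord)) {
      if (existing[field] !== value) {
        existing[field] = value;
        changed = true;
      }
    }
    if (changed) changedIds.push(id);
  }
  return changedIds; // subscriptions touching these IDs get re-executed
}
```

With the message-count example above, publishing the second response would return the user's record ID as changed, so both screens subscribed to it would be re-rendered with the count of two.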
So, if your query isn't affected by the result, you don't get notified and you don't have to re-render. There are some trade-offs to consider here; it isn't quite as simple as making everything automatically consistent. For example, we may want to delay the publish. Imagine that we're on a mobile phone and we navigate to a new screen. The user can only see the new screen anyway, so we might want to prioritize rendering that new screen before we update any other screens that the user can't see right now, accepting some amount of eventual consistency. Technically speaking, the old screen is not consistent, but we can't see it, so that's okay. Also, there may be certain specific cases where we actually want to opt out of consistency. It can happen in your product that you have a specific case, like maybe the user has interacted with some data already, and if I've already modified one field, I don't want the other fields in the form to change. Certain edge cases like this can happen. We can usually handle them by letting the store notify you and just ignoring the new notifications, but it's something to consider. And then finally, lists are fetched incrementally. We fetch the first part of a list, then the second part, and it isn't quite as straightforward to automatically make lists consistent. Moreover, if I scroll down in the comments for a story over here in the UI, I probably don't expect comments in some other part of the screen to suddenly start scrolling as well; users expect different instances of a list to be distinct. So, we'll need a solution for that as well. Let's briefly return to caching. So far we've looked at simply going to a screen, going away, and coming back to that exact same screen, but there's another case that can be pretty common. That's where we have a list view and then we click into a detail view.
It's very, very common in applications. For example, looking at my news feed, we already have a lot of information on the client about this story that I've read about the GraphQL Summit, but if I click on it, I'm gonna see a loading spinner, because our response cache doesn't have this query yet. Ideally, since we have a lot of data on the client, we could use it to render this screen more quickly and fetch just any missing information in a separate query. Now, we looked at the notify operation, where when new information is published we tell the view what changed. Implicit in that is the ability to execute a query against the cache. We're used to queries being executed on the server, where the resolve function calls your product-specific business logic, but in this case the resolve function is a very, very simple look-up in this normalized map, and as we go down fields, we just continue to look up references. So, for the case that we just saw of navigating to a details page, we can look in the store and traverse the query: do we have everything for this story? We'll start at the top of the query, look in the cache, and load the story ID. Okay, it's there, great. Now we can traverse down to the author and the comments and continue executing the query to see if we have all the records. If every single record is in the cache, great, we have all the data we need and we can render. If we're missing anything, then we have to go to the server and fetch that. So, I mentioned that there are some tricky aspects to data consistency and pagination. There are a few things to keep in mind with pagination and some use cases that we have to work around. First, lists often have special relationships.
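Before going deeper into pagination, the store traversal just described can be sketched as a recursive read. This is a sketch only: the selection format here (a nested object where `true` marks a leaf field and a nested object marks a reference to traverse) and the { __ref: ... } record shape are made up for illustration, not Relay's actual representation.

```javascript
// Walk a query shape against the normalized store, reporting whether
// any record or field was missing (in which case we must hit the server).
function readFromStore(store, rootId, selections) {
  const record = store[rootId];
  if (record === undefined) return { data: null, missing: true };
  const data = {};
  let missing = false;
  for (const [field, sub] of Object.entries(selections)) {
    const value = record[field];
    if (value === undefined) {
      missing = true; // field never fetched for this record
      continue;
    }
    if (sub === true) {
      data[field] = value; // leaf field: copy the value out
    } else {
      // Reference like { __ref: 'city:5' }: recurse into the linked record.
      const child = readFromStore(store, value.__ref, sub);
      data[field] = child.data;
      missing = missing || child.missing;
    }
  }
  return { data, missing };
}
```

If `missing` comes back false, we can render the detail view immediately from the cache; otherwise we render what we have and fetch the rest.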
So for example, my friends list is not just a list of users; it's a list of information about when I became friends with people and how I became friends with them. Those are all properties of the relationship and not of either person, so we want our schema to give some space for that. Second, lists are often ranked by an algorithm, for example the Facebook news feed, and the item ordering may change over time. A story that is relevant to me today might not be as relevant tomorrow and shouldn't appear as high on the list, so if I refetch with the same exact arguments, even in a very short time span, I might see different results. And finally, with these types of lists there are too many items to download in full to the client, so we typically fetch them in increments. Effectively, there's this virtual list that exists in the abstract on the server, and we're going to construct a client-side representation of it, but it's not clear that we'll ever have exactly a one-to-one mapping between this theoretical virtual representation on the server and what we see on the client. We're always synthesizing the view on the client. So, let's look at an example of that. The way that we represent paginated lists (in Relay, and really in general at Facebook, we refer to them as connections) is as a segment, and the segment says where it starts in this virtual list, how many items are there, and whether there's more going in that direction or not, and then of course, the edges themselves. So, we might've gotten items a, b and c. Now, let's say we want to scroll down and we want to load more items. We'll go to the server and say, okay, we want more things after c. I'm using "after" here because these are the arguments that are in the Relay connection spec, and these are the arguments that we use at Facebook for connections.
You can imagine doing a very similar thing with a limit/offset style approach. So, okay, we got our new segment: after c we've got three more items, and there's still more in this direction. Now we have these two segments that we have to join together in some way, and the question is how? You'll notice that I've thrown in c twice in this list, because items can get re-ranked: maybe c is a bit less important now and moved down in the list between the time we fetched the first set and the second set. So, this is a product-specific decision. What do we do here? Do we show c twice? Do we leave it where it was? Do we move it down? This is up to you. If you've already rendered that first view, you might not want an item in your list to suddenly jump somewhere else; the user might be looking at it right now. This is something you have to decide at the product level. The two main things you can do here are to leave c where it was originally and ignore the second one, or to move it to the new position. The way that we represent this is typically as three levels. There's the segment, a chunk like this that knows where it is and what edges are there. Then there's a sequence of segments, because we might fetch the beginning of the list and the end of the list. And finally, there's the Connection Controller, and this is a controller in the typical MVC sense of the word in that it coordinates between fetching data from the server, putting that data into the store, and allowing the UI to efficiently render the view that we've synthesized on the client. The controller can do a lot of that work automatically by having a consistent connection schema, with a consistent naming pattern for fields and their arguments, but we'll probably want some amount of product-specific logic, like we talked about, for how we actually merge new edges.
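One possible merge policy, out of the options just discussed, can be sketched like this: keep a re-ranked edge where it originally was and drop the duplicate from the new segment. This is a deliberately simplified sketch, assuming a minimal segment shape with just `edges` and `hasNextPage`; it is one product-level choice, not the only one.

```javascript
// Merge a newly fetched connection segment into an existing one.
// Policy: an edge already rendered keeps its original position; the
// duplicate in the new segment is ignored.
function mergeSegments(existing, fetched) {
  const seen = new Set(existing.edges.map((e) => e.id));
  const merged = existing.edges.concat(
    fetched.edges.filter((e) => !seen.has(e.id))
  );
  return {
    edges: merged,
    hasNextPage: fetched.hasNextPage, // the newest segment knows about the tail
  };
}

const first = { edges: [{ id: 'a' }, { id: 'b' }, { id: 'c' }], hasNextPage: true };
// 'c' was re-ranked server-side and shows up again in the second fetch.
const next = { edges: [{ id: 'd' }, { id: 'c' }, { id: 'e' }], hasNextPage: true };
const merged = mergeSegments(first, next);
```

The alternative policy, moving the edge to its new position, would instead filter `c` out of the existing edges before concatenating; the Connection Controller is where that choice gets plugged in.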
Do we leave them where they are? Do we reorder them? Et cetera. And for things like React or ComponentKit, where we have a more declarative UI architecture, we can create a connection component that takes a configuration and a simple function to render each edge. It's gonna render a list for us, calling this render-edge function once for every item, and when we fetch new items from the server it will call this function appropriately to render those into the UI. The configuration can specify things like: what is the query that we actually use to get more items? What are the default variables to use? The logic for merging new items into the list, things like that. Okay, so this brings us to the last major segment, which is mutations. Remember, mutations can be really simple; it might be as simple as toggling a Boolean. But they can also be very complex, where we're adding a new item that should appear at the top of one list, but in some other part of the UI that list is sorted in a different order and so the item should appear somewhere else. We have to integrate with caching, et cetera. The other thing with mutations that can be complicated is that a given change in the input can affect a very large number of items in the output. For example, if you and I become friends, you should show up in my friends list, and my news feed should maybe now show stories that come from you, because your stories are now eligible to show up there. Similarly, if I block a user, then I can't see you, things I've cached about you might disappear, et cetera. So there can be certain types of changes that invalidate a wide part of the graph. Also, there's complex business logic and privacy that sometimes make it hard to even figure out on a client how to emulate the results of a mutation. So, it's definitely pretty complex.
So, in general, mutations are a bit harder to work with than queries because of all these effects. But even more simply, there's the question of: when I do a mutation, what do I refetch? How much do I refetch? I might've fetched a large amount of the graph, but I have to make a choice. Do I eagerly refetch everything that could change? Here's an example of a story Like mutation, where I'm just clicking the Like button on Facebook. What do we refetch? Well, there are a bunch of fields with the word "like" in them that are obvious candidates: do I like this story? Yeah, I'm toggling the button, that's probably a good one to fetch. The like count, how many times it's been liked. The like sentence summarizing who's liked this. The list of people who've liked it and the name for each one, and possibly many, many more things. But the question is, do I actually need all of those fields? Maybe my UI is only showing the blue button and it just needs to know whether I like the story or not, and all I really need is doesViewerLike. Maybe my UI shows the names of people who've liked it, and so I am going to need the likers' names. Maybe I have a really complicated UI that shows many more details about each person who's liked it, and I need way more information than just the name. So, this really depends upon what you're actually rendering. There are two main options. One is to reuse a fragment that you've defined. You probably have a couple fragments in your app, say your high-level story components, and the simple thing to do would be to reference those fragments and fetch everything you need about a story again, just to make sure. It's pretty simple, it keeps you in sync; it's much harder for a piece of information to be out of sync.
On the other hand, there are a lot of things about a story that don't change, and we're gonna end up refetching them. So, a more optimized approach would be to explicitly list out, for example, just the one field that we know we care about and that could change. You can find a balance between these two approaches: on the one hand, erring on the side of keeping everything consistent with some over-fetching, or on the other, attempting to be more efficient and minimal. Now, this is the only slide that's in inverted colors, because this is an example of something that we tried and learned doesn't work as well. We just saw how it's hard to figure out exactly what we should refetch in a mutation, and we had this thought a couple years ago: what if we could infer the mutation query from the things we've already fetched? To make that a little more concrete, let's look at an example. This is how mutations work in Relay, which is where we tried this experiment. On the left we have the fragments that we fetched about a story; in reality, our application actually fetched doesViewerLike and the names of the people who've liked it. On the right, we have what we call in Relay the Fat Query. These are all the fields that could theoretically change about a story given this mutation: doesViewerLike, likeCount, likeSentence, likers. And I put question marks under likers because we just know that the likers could change, but we don't know what properties of them the UI would care about. So, we tried this experiment with Relay where we basically said, okay, we'll record these queries over time as the application fetches information from the server, and we'll let the user define this mutation. So, the mutation is user-defined, the tracked queries are stored by the system, and we basically intersect them. We'll see that doesViewerLike appears in both. Okay, then keep that.
We see that likeCount and likeSentence aren't used by the application, so we'll skip those fields, and finally we see that the application has queried just the likers' names, so we populate the likers field with just the name. This seems really convenient, because I can just define everything that could change and the system constructs a super minimal, efficient query for me. But we've been here before, right? We talked about this at the beginning of the talk: Query Construction time goes up. System complexity: we have to keep all these tracked queries around, which takes memory, and takes logic in our actual client. And of course, it means we can't do Persisted Queries; we're stuck sending query text up to the server. So, what we found is that this really seemed like a developer experience win, because it lets the product developer not have to think so much about the exact right fields to fetch, but it isn't really predictable. We look at our Fat Query and say, oh, these things will probably be fetched, but you don't actually know at any given point in time which fields are being fetched. It introduces client complexity; the client has to do Query Construction at Runtime, it has to keep track of those queries, and it has to construct valid mutation queries, which you'd think would be easy, but there are edge cases that we've run into. And it means we can't do Persisted Queries, which we found helpful in reducing query upload time. So, this is an example of something that we really thought would be a great developer experience win.
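The tracked-query intersection just described can be sketched as a recursive set intersection between the Fat Query and the fields the app has actually fetched. This is an illustrative sketch, not Relay's implementation; the nested-object field representation here is made up, with `true` marking a leaf field.

```javascript
// Intersect the fat query (everything that *could* change) with the
// tracked fields the application actually fetched, yielding the
// minimal refetch shape.
function intersectFatQuery(fatQuery, trackedFields) {
  const result = {};
  for (const [field, sub] of Object.entries(fatQuery)) {
    if (!(field in trackedFields)) continue; // app never fetched it; skip
    result[field] =
      sub === true ? true : intersectFatQuery(sub, trackedFields[field]);
  }
  return result;
}

// The fat query from the slide: everything a Like mutation could change.
const fatQuery = {
  doesViewerLike: true,
  likeCount: true,
  likeSentence: true,
  likers: { name: true, profilePicture: true },
};
// What the app actually fetched, per its tracked queries.
const tracked = { doesViewerLike: true, likers: { name: true } };

const refetch = intersectFatQuery(fatQuery, tracked);
// → { doesViewerLike: true, likers: { name: true } }
```

Simple as this looks, the costs the talk lists (Runtime construction, tracking memory, no persisted IDs) come from maintaining `tracked` across the whole app's lifetime, not from the intersection itself.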
It really seemed intuitively like a good direction, and in retrospect it's something that didn't work out as well as we'd hoped. Going forward, we're moving back toward static mutations, which is what we've been doing on our iOS and Android clients the entire time. So, I bring this up as an example of how intuition with GraphQL clients can be tricky. It's very easy to imagine a theoretical ideal of how a client should work, but in practice it's better to focus on a specific use case and solve them one at a time, and I think this is one of those cases where we were a bit optimistic: oh, we can make this thing great, we had a really good idea, and it just didn't quite work out as well as we'd hoped. So, to recap, we've tried a lot of ideas over four years of building GraphQL clients. Some worked; Persisted Queries is a great example of something that has been fairly successful. And we've also had a few missteps, things that didn't work out as well as we had hoped, like tracked queries. So, the takeaway for me, and I hope for all of you, is that it's really helpful to focus on specific use cases, like with caching: if we don't have too many data consistency concerns, we can start with a simple request/response cache, solve that specific instance, and then move on to solve other things as we see them. Another takeaway is to exploit the declarative nature of GraphQL. GraphQL allows us to express data dependencies in a static text form, and we don't need anything at Runtime to generate a query, so we can use that to persist queries at Build Time and to generate optimized artifacts based on the query. We can even generate optimized parsers to handle a response, for example. So, what's next? GraphQL has only been open source for about a year.
We're all still in the exploratory phase of what the right way to build a client is, and I think a lot of the community is still trying out ideas. So I hope that this talk has helped establish a vocabulary for some of the core components within a GraphQL client system, that it allows us to have more effective discussions when we talk about common components, and that it gives us a collective base for where to start exploring new things. If you have questions, check out GraphQL.org. Check the Relay website. Thank you very much. (applause)
Info
Channel: Apollo GraphQL
Views: 21,719
Rating: 4.8577776 out of 5
Keywords: graphql, relay, open source, javascript
Id: 1Fg_QtzI7SU
Length: 32min 27sec (1947 seconds)
Published: Thu Nov 03 2016