How to Structure Your Data | Get to know Cloud Firestore #5

Captions
SPEAKER: So as you recall from our last video, there are a number of rules governing how Cloud Firestore works that all play a part in how you should structure your data. And while it's fine to talk about these rules in theory, I think sometimes it's better if we can put these in the context of some real-world or close to real-world examples. So come on with me. Let's figure out the best way to structure our data on this episode of Get to Know Cloud Firestore. [MUSIC PLAYING] So for the purposes of this video, let's keep pretending that I've got this great restaurant-review app that I'm powering through Cloud Firestore. Maybe it's got a search page where you can search for the top-20 Japanese restaurants in Boston, stuff like that. And from the Search Result screen a user could click on a restaurant to see details about it. This Detail screen would have more info about the restaurant along with maybe a few snippets from its most recent reviews. We'd have an option for a user to see more reviews, and if they click on any of these review snippets we would show them the full and complete review, so pretty standard. Let's tackle a few questions around how we want to store our data. For starters, where would we put the reviews? When I've used this example in previous videos, I've generally assumed we've had this kind of database design. We've got the restaurants in one collection and all the reviews are in individual subcollections. But is this actually the right design? Let's look at a couple of alternatives. Alternative number one, let's not make these reviews separate documents. We can stick all the review data into the main restaurant document as a map or an array of maps. Once a user clicks on a restaurant, I could give them these little review snippets immediately because I've already got them loaded up. And hey, as a nice little side effect, I've also saved myself a bunch of document reads. So this is an alternative, but maybe not a very good one. 
One problem with keeping our reviews in a map field like this is that when a user decides to search for the top-20 Japanese restaurants in their city, we're not only grabbing all the information about the restaurant but we're also grabbing every single review for every restaurant on that list. And that is a bunch of reviews and, frankly, a whole lot of data that they will probably never need. I'm also going to run into limits, either the 1 meg of data limit or the 20,000-field limit if these restaurants get too many reviews. I also can't query these reviews either, right? I can't filter for just the 10 most recent reviews or anything like that. I only get back my giant restaurant document with all the reviews in there, and if I want to sort those reviews or filter them or something, I would need to do that on the client. So I think keeping reviews in their own separate collection is the right way to go, but let's look at our second alternative which is what if rather than keeping these reviews as a subcollection for each restaurant they were just in a completely separate top-level collection? Well, now this gets interesting. Your first impression might be like, oh, don't do that. It's going to be a lot harder to get reviews for a single restaurant. But that's not the case. Remember, queries in Cloud Firestore are always fast, so grabbing all reviews for Todd's Tacos from a single collection of 3 million reviews takes essentially the same amount of time as just grabbing all the reviews from a single subcollection. More importantly, by putting everything into a top-level collection like this, I can do searches for reviews that span different restaurants, like I could search for all reviews by an individual author. That's not something I could do by keeping them in separate subcollections like this, at least not until collection-group queries get supported. 
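To make the top-level collection idea concrete, here is a minimal sketch that models the reviews as a plain Python list rather than calling the Firestore SDK. The field names (`restaurant_id`, `author_id`, `score`, `created`), the helper `where_equal`, and the sample data are all illustrative assumptions, not anything shown in the video.

```python
# In-memory stand-in for a top-level "reviews" collection; no Firestore SDK is used.
# Field names and sample documents are illustrative assumptions.
reviews = [
    {"id": "r1", "restaurant_id": "todds_tacos", "author_id": "alice", "score": 5, "created": 3},
    {"id": "r2", "restaurant_id": "todds_tacos", "author_id": "bob", "score": 3, "created": 1},
    {"id": "r3", "restaurant_id": "sushi_spot", "author_id": "alice", "score": 4, "created": 2},
]


def where_equal(docs, field, value):
    """Mimic an equality filter: keep documents whose field equals value."""
    return [doc for doc in docs if doc.get(field) == value]


# Reviews for one restaurant: an equality filter on restaurant_id.
tacos_reviews = where_equal(reviews, "restaurant_id", "todds_tacos")

# Reviews by one author across every restaurant: the query that separate
# per-restaurant subcollections cannot answer without collection-group support.
alice_reviews = where_equal(reviews, "author_id", "alice")

print([r["id"] for r in tacos_reviews])
print([r["id"] for r in alice_reviews])
```

The same filter serves both questions because every review carries its own `restaurant_id`, which is the price of leaving the subcollection: the parent is no longer implied by the path.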
And so if I went with the subcollection strategy and I knew I was going to have to find all reviews by a single user, I'd need some kind of workaround that I would be doing in advance. Probably what I'd do is, over in the user collection, keep a list of all the reviews that this user has written and then use Cloud Functions to keep all this data up to date. Sure, it's not the end of the world, but none of that extra work would be needed if reviews were just in another top-level collection like this. On the other hand, if I have everything in a big top-level collection, some things are a little harder. Specifically, doing something simple like grabbing reviews for a restaurant sorted by date or sorted by score would require a composite index, whereas I wouldn't need that if these reviews were already in their own subcollection. Also, I suspect a number of security rules will be easier to write when I'm dealing with subcollections of a specific document instead of a top-level collection, mostly in cases where you're going to want to restrict access to a subcollection based on information that's in its parent document. But really, either one of these solutions is fine. I've seen them both used in the wild. But if I'm going to be totally honest with you, the main reason I went with reviews as a subcollection is that it makes it easier for me to emphasize the whole, hey, look, Firestore supports shallow queries thing. So I'm going to keep using this one for demonstration purposes, but you feel free to do what's best for you. Now as long as we're here, let's talk about one further refinement. Remember that in my Restaurant Details page I've got my restaurant info and then this section of snippets from the 10 most recent reviews at the bottom. And so with my current database structure, I'm either populating these review snippets from a review subcollection or my reviews top-level collection. 
And that's fine, but that does mean bringing up a single restaurant page is now 11 document reads instead of 1. And yes, I will get billed for those. And if you consider that a typical user might click on some restaurant details, skim the reviews, go back and repeat that process several times, that can add up to a lot of reads. And so if I really wanted to avoid all those extra reads, I might look at creating a review snippets field in the original restaurant document. This could contain the first hundred words of the 10 most recent reviews, maybe a headline and the author name-- basically enough to create a presentable review snippet but not so much that I'm downloading a ton of unnecessary data for each restaurant. And again with some Cloud Functions, which I promise I will talk about in a future video, I can keep those in sync. Now this is nice because I'm now only reading in 1 document instead of 11, and it simplifies the loading of this page because I only need that one document which, most of the time, will already be in memory. Plus if I ever do need to show more reviews or the full text of one of those reviews, then I can read those in from the individual review documents. But is this optimization worth the extra code needed to keep these values in sync along with the extra data our user is loading in every time they bring in a restaurant? Honestly, I'm not sure. Viewing restaurant details is a user-powered action, which means it's not going to happen several times a second, right? It will more often happen a few times over the course of a several-minute session. So this could be a case where you end up doing a bunch of work to save a million reads a day or $0.60, which may or may not be worth it. So I think this is one of those situations where you're going to have to measure what's actually happening in your app and decide accordingly. I know that's kind of a wishy-washy recommendation, but that's software development for you. 
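The snippet field and the cost arithmetic above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the per-read price is the one implied by the transcript (a million reads for about $0.60, so $0.06 per 100,000 reads) and may not match current pricing, and the function names, field names, and the 100,000 page views a day figure are all hypothetical.

```python
# Price implied by the transcript: one million reads costs about $0.60,
# i.e. $0.06 per 100,000 document reads. Real pricing may differ.
PRICE_PER_100K_READS = 0.06


def make_snippet(review, max_words=100):
    """Keep the headline, author, and first max_words words of a review."""
    words = review["text"].split()
    return {
        "headline": review["headline"],
        "author": review["author"],
        "excerpt": " ".join(words[:max_words]),
    }


def recent_snippets(reviews, count=10):
    """Build the snippet list for the `count` most recent reviews."""
    newest_first = sorted(reviews, key=lambda r: r["created"], reverse=True)
    return [make_snippet(r) for r in newest_first[:count]]


def daily_cost(page_views, reads_per_view):
    """Dollar cost of the document reads generated by page_views."""
    return page_views * reads_per_view * PRICE_PER_100K_READS / 100_000


# Twelve made-up reviews of 150 words each; only the newest ten are kept.
reviews = [
    {"headline": f"Review {i}", "author": f"user{i}", "text": "word " * 150, "created": i}
    for i in range(12)
]
snippets = recent_snippets(reviews)

# Hypothetical traffic: 100,000 page views a day at 11 reads versus 1 read each.
savings = daily_cost(100_000, 11) - daily_cost(100_000, 1)
print(len(snippets), round(savings, 2))
```

In a real app the snippet list would be rebuilt by backend code whenever a review is written, which is the sync work the speaker weighs against the savings.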
OK, next example-- how would we store flags for restaurants? And I'm talking about little binary values for takes reservations, is romantic, and so on. Put those away. Two months ago, I would have said the best way to store these would have been either to make these as individual fields inside the restaurant document or group them into a map field. And to be honest, these are still perfectly reasonable solutions. But now that we've added the array_contains operator, I could just as easily put these into an attributes array. Then it would be pretty easy to search for all restaurants that take reservations. But keep in mind that right now you can't run two array_contains queries on the same array. So if I really do want to look for restaurants that are both romantic and kid friendly-- which is kind of an oxymoron, I know-- I should keep using separate fields. Should have read the reviews first. OK, now here's a fairly common case you're going to come across. What about storing a list of users who can access a document? Now this is a very common piece of data you're going to want to keep track of, particularly in collaborative apps where users can invite their friends to view or edit their data, and you're going to need to keep track of who can edit what. For example, let's say a restaurant has a list of editors, and these are the user IDs of people who are allowed to edit the restaurant document. Now I know I haven't really talked about security rules yet. That will be in the next video. But one thing you should know about security rules is that they have the ability to grab a specific document and read its content. So I could, for instance, put everybody who's allowed to edit a particular document into an array, and then my security rules could say something like, hey, only let users edit this document if their user ID appears in this array. 
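Going back to the restaurant flags discussed above, the two storage options can be compared with a small in-memory sketch. Nothing here calls Firestore: `array_contains` and `where_all_true` are hypothetical helpers that imitate a single array-membership filter and a chain of equality filters, and the restaurant names and flag names are made up.

```python
# Each restaurant stores its flags twice so both approaches can be compared:
# once in an "attributes" array and once as separate boolean fields.
restaurants = [
    {"name": "Todd's Tacos", "attributes": ["takes_reservations", "kid_friendly"],
     "romantic": False, "kid_friendly": True},
    {"name": "Candlelight Bistro", "attributes": ["takes_reservations", "romantic"],
     "romantic": True, "kid_friendly": False},
    {"name": "Family Fondue", "attributes": ["romantic", "kid_friendly"],
     "romantic": True, "kid_friendly": True},
]


def array_contains(docs, field, value):
    """Imitate one array-membership filter on a single value."""
    return [d["name"] for d in docs if value in d.get(field, [])]


def where_all_true(docs, *fields):
    """Imitate chained equality filters on separate boolean fields."""
    return [d["name"] for d in docs if all(d.get(f) is True for f in fields)]


# One flag: the attributes array is enough.
print(array_contains(restaurants, "attributes", "takes_reservations"))

# Two flags at once: per the transcript, this is the case for separate fields.
print(where_all_true(restaurants, "romantic", "kid_friendly"))
```

The sketch stores both representations only for comparison; a real document would pick one, based on whether multi-flag searches are needed.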
But what could work better would be to use a map here where a user ID is the key and their value is their role, something like editor, owner, community manager, et cetera. And so we could write a security rule that says a user can edit this document if document.roles.userID equals editor or something like that. Now this is great, but it does have one small issue in that when you retrieve this document, all the contents of this document are transferred along with it. Remember there's no such thing as a partial document query on the client. So you're kind of leaking the list of users and their roles for this restaurant. Now this may be OK. These user IDs are pretty opaque. You can't really reverse engineer them to figure out who the underlying user really is, and they are unique per app. But still, the phrase this may be OK isn't super reassuring when we're talking about security. So this information might be better stored outside of this document in one of two ways. Option one, have a distinct editor subcollection where the document ID might be the user ID of the user, and maybe we store this person's role and some other little information about them inside each document. Or option number two, just have a subcollection called private data that contains only one document. And this is where you can keep your "who has access to what" map that we talked about earlier along with other information that you might not want to share with the general public-- maybe data for your internal sales team or data that Cloud Functions will need to perform its duties. Now both of these examples work fine, but definitely think about this kind of situation for your app because you will probably want to create access control lists like this and you will probably have data associated with the document that maybe you don't want everybody to see. OK, last example, but maybe the most important one. What if my user wanted to store a list of their favorite restaurants? 
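Returning to the roles map and the private data option described above, here is a minimal sketch of the shape of that data and of the check a rule would perform. It is plain Python, not the security rules language, and the user IDs, the role names, the `private_data` variable, and the choice to let both `owner` and `editor` edit are assumptions made for illustration.

```python
# Public restaurant document: what any client that reads the restaurant receives.
restaurant = {"name": "Todd's Tofu Hut", "cuisine": "Japanese"}

# Hypothetical single document in a "private data" subcollection, holding the
# "who has access to what" map so it is not sent along with the public document.
private_data = {
    "roles": {
        "uid_alice": "owner",
        "uid_bob": "editor",
        "uid_carol": "community_manager",
    }
}


def can_edit(roles, user_id, allowed_roles=("owner", "editor")):
    """Return True when the user's role is one that permits editing."""
    return roles.get(user_id) in allowed_roles


print(can_edit(private_data["roles"], "uid_bob"))
print(can_edit(private_data["roles"], "uid_carol"))
print("roles" in restaurant)
```

Keeping the map in its own document means a client that loads the restaurant never receives the list of users and roles, which is the leak the speaker is worried about.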
This is the kind of many-to-many relationship that often is problematic in NoSQL databases. So first, let's assume that our database has some kind of users collection somewhere. Well, one option is we can just store a list of restaurant IDs as an array in a favorites field for each of these user documents. Thanks to our new array union and removal methods, it should be pretty easy to maintain this list as our user adds or removes restaurants from their favorites. If I load this array into memory when our application first starts up, my client could very easily tag a restaurant as a favorite when we see it in the app. The problem, though, is that it's hard for us to do a "hey, here's all your favorite restaurants" kind of page using an array like this. I would basically have to do a separate query for every single restaurant ID that I get back from this array in order to populate this page, or maybe slightly better, get a callable Cloud Function to do this work for me, and I'll talk about that in a later video. Now this isn't a great experience, but it's not terrible either. And if I think this feature isn't going to be used very often, then this would be a perfectly fine solution. Don't let this scare you away. On the other hand, if this is going to be a frequently used feature, there may be ways of making this a little easier and more performant by using some denormalized data. For example, instead of just storing a list of restaurant IDs, I can keep a big old map of maps that would contain enough data to populate a simple My Favorite Restaurant screen. Maybe I have the restaurant name, cuisine type, and address stored in this favorites field. That might be enough to populate the My Favorite screen. And if the user ever wanted more information about a restaurant, well, then I can query the full restaurant document and populate a Restaurant Detail screen just like normal. 
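The difference between the two favorites layouts described above shows up in the number of extra document reads needed to draw the favorites page. The sketch below counts them with a tiny in-memory stand-in; `FakeCollection`, the restaurant IDs, and the sample fields are hypothetical and no Firestore SDK is involved.

```python
class FakeCollection:
    """Tiny in-memory stand-in for a collection that counts document reads."""

    def __init__(self, docs):
        self._docs = docs
        self.reads = 0

    def get(self, doc_id):
        self.reads += 1
        return self._docs[doc_id]


restaurants = FakeCollection({
    "r1": {"name": "Todd's Tofu Hut", "cuisine": "Japanese", "address": "1 Main St", "hours": "11-9"},
    "r2": {"name": "Todd's Tacos", "cuisine": "Mexican", "address": "2 Main St", "hours": "10-10"},
})

# Option 1: the user document stores only restaurant IDs.
user_with_ids = {"favorites": ["r1", "r2"]}
page_from_ids = [restaurants.get(rid)["name"] for rid in user_with_ids["favorites"]]
reads_for_ids = restaurants.reads  # one extra read per favorite

# Option 2: the user document stores a map of small denormalized snippets.
user_with_snippets = {
    "favorites": {
        "r1": {"name": "Todd's Tofu Hut", "cuisine": "Japanese", "address": "1 Main St"},
        "r2": {"name": "Todd's Tacos", "cuisine": "Mexican", "address": "2 Main St"},
    }
}
page_from_snippets = [s["name"] for s in user_with_snippets["favorites"].values()]
reads_for_snippets = restaurants.reads - reads_for_ids  # no extra reads

print(reads_for_ids, reads_for_snippets)
```

Both pages show the same names; the snippet map simply trades extra reads for a larger user document and copies that have to be kept in sync.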
But the trick with denormalized data is that we need to make sure that if Todd's Tofu Hut ever changes their cuisine from Japanese to burgers, every copy of that denormalized data has to change as well, which means we need to be able to query every user who has this restaurant listed among their favorites so we can make that change. As it turns out though, we can do that. I can basically create a query that says something like select all users where favorites dot restaurant 4215 is greater than the empty string. That would give me a list of all user documents where this restaurant is listed as a favorite in the map field somewhere, and it would be fairly easy to have a Cloud Function update all those. Now the disadvantage is that this is a big chunk of extra data that I'm loading per user, and we are eating away at my 1 meg slash 20,000 fields limit here. So I'm going to need to limit the number of restaurants my user lists as a favorite. Although honestly, that kind of makes sense. If you list 500 restaurants as your favorite, hasn't the word favorite lost all meaning? This got kind of philosophical. Now another option might be to keep my user's favorite list as an array of IDs and then keep the restaurant snippets inside a subcollection. I could then load a list of the user's favorite restaurants in memory without loading all the extra restaurant details along with them, but they're also easy to load up if I ever need to show this favorite screen by querying the subcollection. And I can still find a list of users who have favorited a particular restaurant by running an array_contains query across my users collection. Now once I have that, it's fairly easy to get at the individual snippets that I need to change, although you'll notice I can't do it all in one single query now. I basically have to fetch these documents one at a time. 
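As a rough illustration of the fan-out update described above, the following sketch finds every user whose favorites map mentions a restaurant and rewrites the stale copy. It works on plain dictionaries, treats "the key is present in the map" as the stand-in for the query in the transcript, and uses hypothetical names throughout (`users_with_favorite`, `propagate_change`, the user IDs, and the restaurant IDs).

```python
def users_with_favorite(users, restaurant_id):
    """Return IDs of users whose favorites map has an entry for restaurant_id."""
    return [uid for uid, doc in users.items() if restaurant_id in doc.get("favorites", {})]


def propagate_change(users, restaurant_id, changes):
    """Apply changes to every denormalized copy; return how many were updated."""
    affected = users_with_favorite(users, restaurant_id)
    for uid in affected:
        users[uid]["favorites"][restaurant_id].update(changes)
    return len(affected)


# In-memory stand-in for the users collection with denormalized favorites.
users = {
    "alice": {"favorites": {
        "restaurant_4215": {"name": "Todd's Tofu Hut", "cuisine": "Japanese"},
    }},
    "bob": {"favorites": {
        "restaurant_0007": {"name": "Todd's Tacos", "cuisine": "Mexican"},
    }},
    "carol": {"favorites": {
        "restaurant_4215": {"name": "Todd's Tofu Hut", "cuisine": "Japanese"},
        "restaurant_0007": {"name": "Todd's Tacos", "cuisine": "Mexican"},
    }},
}

# Todd's Tofu Hut switches cuisine, so every copy has to follow.
updated = propagate_change(users, "restaurant_4215", {"cuisine": "Burgers"})
print(updated)
```

In a deployed app this loop would presumably live in backend code such as a Cloud Function, as the speaker suggests, rather than on the client.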
Also, just a heads up, if you have an array that you're going to use with an array_contains query, those array elements will still count against your 20,000-fields limit. So I'm still going to need to limit my user's favorite restaurants to a reasonable number. But you know what option might work best here-- a completely separate top-level collection. Imagine that I have a FavoriteRestaurants collection. Every document in there would contain a user ID, a restaurant ID, and enough of a snippet about the restaurant to populate a My Favorite Restaurants page. Now grabbing all the restaurants favorited by a specific user is a very simple query, so creating that My Favorites page is very straightforward, and it's just as easy to grab all the instances of a particular restaurant if I ever need to change or update this denormalized data. So you know what? I'm going to say this is my favorite option, at least for this particular setup. Whew. All right, well that was a lot of examples to go through, but I'm hoping by now you're getting a better sense of when to use documents, when to use maps, when to use subcollections, when to use arrays, and when to put things in a separate top-level collection. Now there's always more to explore here, and as you've noticed, there's rarely one simple answer to anything. There will always be trade-offs no matter what option you choose. So hey, if you have a setup you like better than any of these situations, go ahead and share them in the comments below. And I will see you on a future episode of Get to Know Cloud Firestore. Well, I think that's a wrap. All you all hungry? Let's go to Todd's Tofu Hut. I hear they serve burgers now. No, no, meat burgers. I don't know why they didn't change the name. Ask them. No, it's a different Todd. Come on, let's just go. [MUSIC PLAYING]
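For reference, the separate top-level favorites collection that the speaker settles on can be sketched as one flat list in which each document pairs a user with a restaurant. This is an in-memory illustration only; the field names, the helper functions, and the sample data are assumptions, not the speaker's code.

```python
# In-memory stand-in for a top-level FavoriteRestaurants collection. Each
# document holds a user ID, a restaurant ID, and a small restaurant snippet.
favorite_restaurants = [
    {"user_id": "alice", "restaurant_id": "restaurant_4215",
     "name": "Todd's Tofu Hut", "cuisine": "Japanese"},
    {"user_id": "alice", "restaurant_id": "restaurant_0007",
     "name": "Todd's Tacos", "cuisine": "Mexican"},
    {"user_id": "bob", "restaurant_id": "restaurant_4215",
     "name": "Todd's Tofu Hut", "cuisine": "Japanese"},
]


def favorites_for_user(favorites, user_id):
    """One filter on user_id is enough to build the favorites page."""
    return [f["name"] for f in favorites if f["user_id"] == user_id]


def update_restaurant_snippet(favorites, restaurant_id, changes):
    """One filter on restaurant_id finds every copy that needs updating."""
    matches = [f for f in favorites if f["restaurant_id"] == restaurant_id]
    for doc in matches:
        doc.update(changes)
    return len(matches)


print(favorites_for_user(favorite_restaurants, "alice"))
changed = update_restaurant_snippet(favorite_restaurants, "restaurant_4215", {"cuisine": "Burgers"})
print(changed)
```

Because nothing is nested inside a user document, the number of favorites no longer counts against that document's size or field limits.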
Info
Channel: Firebase
Views: 192,502
Rating: 4.950366 out of 5
Keywords: Arrays, Maps, Subcollection, Data Structure, cloud firestore, firebase, firebase developers, json, store arrays, store documents cloud firestore, subcollections, maps vs arrays, maps vs subcollections, app developers, mobile app developer, building apps, firebase tutorial, firestore tutorial, cloud firestore tutorial, storing data, firestore data, fields, cloud firestore documents, firestore data model, array of maps, client sdks, GDS: Yes;
Id: haMOUb3KVSo
Length: 13min 57sec (837 seconds)
Published: Wed Oct 24 2018