Maps, Arrays and Subcollections, Oh My! | Get to know Cloud Firestore #4

Captions
SPEAKER: So if you've been following along with this video series, maybe you've noticed something here. Cloud Firestore consists of a bunch of documents, which are basically key-value pairs, as well as collections, which are collections of documents. But then, within these documents, we let you store maps, the JSON-y looking things, that also kind of work as key-value stores-- you know, like a document. Not only that, but you can also store arrays, which look suspiciously like collections. So at some point, you might start asking yourself, well, why do we even have separate documents when we could just, say, put all of our data into these maps? Why create a subcollection when I could just create an array of maps, or even a map of maps? When is it appropriate to use one versus the other, and what would work best in my app? Good questions. Let's see if we can find some answers on this episode of "Get to Know Cloud Firestore." [MUSIC PLAYING]
So when it comes to determining the best way to structure your data in Cloud Firestore, it's helpful to remember some rules about how Firestore works. Let's go over some of these now.
All right. Rule number one, documents have limits. So there are some limits when it comes to how much data you can put into a document. For starters, you're limited to one meg total of data in a single document. And I know that, in our world of, like, high-res, 16-megapixel photos, one meg seems tiny. All right. Cheese. Whew. Oh, you got to turn the flash off. But when we're talking about text and numbers here, and we generally are, that's still a lot of data you can fit into a single document. For example, the complete text of "A Tale of Two Cities," a 300-page book from an author who was paid by the word, is only about 770k. You could fit all of that into a single document, and still have room for a good "Sherlock Holmes" mystery.
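As a rough illustration of the size rule, here is a minimal sketch that checks whether some text would fit under the limit. It assumes the "one meg" limit means 1 MiB (1,048,576 bytes) and approximates a string's stored size by its UTF-8 byte length. The helper names and the stand-in strings are hypothetical, and real size accounting also includes the document name, field names, and per-type overhead.

```python
# Rough model only: real document size also counts the document name,
# field names, and some per-value overhead.
MAX_DOCUMENT_BYTES = 1024 * 1024  # the "one meg" limit, assumed to be 1 MiB


def approx_text_bytes(text: str) -> int:
    """Approximate the stored size of a string field by its UTF-8 byte length."""
    return len(text.encode("utf-8"))


def fits_in_one_document(*texts: str) -> bool:
    """True if the combined text stays under the assumed document size limit."""
    return sum(approx_text_bytes(t) for t in texts) < MAX_DOCUMENT_BYTES


# Stand-ins for real book text: 770,000 ASCII characters is roughly 770k.
tale_of_two_cities = "x" * 770_000
short_mystery = "y" * 200_000

print(fits_in_one_document(tale_of_two_cities))                      # True
print(fits_in_one_document(tale_of_two_cities, short_mystery))       # True
print(fits_in_one_document(tale_of_two_cities, tale_of_two_cities))  # False
```

Non-ASCII text takes more than one byte per character, which is why the sketch measures encoded bytes rather than `len(text)`.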
Also, another rule around documents-- and this will come into play later-- is that they can't have more than roughly 20,000 fields. And by the way, this also includes fields within the maps that you have in your documents. So this is six fields, and this counts as seven. One for the overall address field, as well as one for each individual element in that address map. And this would count as nine. And one big reason for this limit is that Firestore is creating indexes for any field you have here in your document-- even, yes, inside a map. Basically, from Firestore's point of view, my document here has a field called address.city, and it's going to index it. And even if you don't hit this 20,000 field limit, having documents with a ton of fields in them can impact your write performance. Because Cloud Firestore needs to redo every single one of these indexes when you create another document, make a change, and so on.
One other rule around documents is that you're generally limited to one write per second on the same document. So if you've got a situation where, like, a lot of clients are all trying to write to the same document all at once, you might start to see some of these writes fail. Now, your client libraries are generally smart enough to retry these writes with some incremental back-off if that starts happening, but it's still a situation you want to avoid. On the other hand, writing to lots of different documents in the same collection, that's generally fine. Go crazy.
OK. Let's move on to rule number two. You can't retrieve a partial document, at least not using the client SDKs. So just because you can put "A Tale of Two Cities" into a document doesn't mean you should. See, when Cloud Firestore retrieves a document, you're retrieving all the data in that document. You can't retrieve partial data from a document using the client SDKs.
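Going back to the field-count rule earlier in this passage, the counting described there (one for the map itself, plus one for each entry inside it) can be sketched as a small recursive walk. This is only a model of the rule as the talk states it, not Firestore's own accounting. The `field_paths` helper and the sample document are hypothetical.

```python
from typing import Any, Dict, List


def field_paths(doc: Dict[str, Any], prefix: str = "") -> List[str]:
    """List every field path in a document: each map counts once, plus each entry in it."""
    paths: List[str] = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        paths.append(path)  # the field itself counts once
        if isinstance(value, dict):
            paths.extend(field_paths(value, path))  # plus every field inside the map
    return paths


# Hypothetical document: one plain field plus an address map with five entries.
customer = {
    "name": "Sydney Carton",
    "address": {
        "street": "1 Fleet Street",
        "city": "London",
        "region": "Greater London",
        "postcode": "EC4",
        "country": "UK",
    },
}

paths = field_paths(customer)
print(len(paths))  # 7: "name", "address", and five "address.*" paths
print(paths)
```

The dotted paths such as `address.city` mirror how the talk describes a nested field looking from the index's point of view.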
So imagine I have my collection of Charles Dickens books, and my documents have a title field and a "the entire contents of the book" field. Then I decide, hey, you know, maybe I just want my app to show a list of what Charles Dickens books are in my collection. So I query this collection. Well, hey, guess what? In addition to grabbing the titles, I'm also grabbing the entire content of all the stories, which means I'm now downloading approximately 38,000 times more data than I actually need. And that's bad, right? This is something your user is going to notice. This is going to affect your app's performance, its data usage, and its battery consumption. And this tends to put your app in a state I like to call getting uninstalled. Sorry, buddy. I can't keep it around. It's killing my data plan.
This also has implications when it comes to security rules. You can't create a security rule that gives you access to some parts of a document and not others. If you want to keep part of a document secret, you'll need to place those secret parts into a separate document. And I'll talk more about that in our video on security rules.
OK. So now you're thinking, well, gosh, clearly the answer is to break my data up into as many tiny documents and as many tiny subcollections as possible. Well, hang on there. Because we have rule number three, queries are shallow. So you probably already know this, but when you grab a document, you don't grab any of the data in any of these subcollections. And this is generally a good thing, right? It means you can store your data hierarchically, in a way that feels logical, without having to worry about grabbing more data than you intended. So if I took that content from my "entire text of the books" field, and moved it down into a chapter subcollection, well, now, I can grab my list of Dickens books titles without having to grab all the text associated with them.
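The "queries are shallow" rule can be pictured with a tiny in-memory model. The sketch below is not the Firestore SDK: `DATABASE`, `get_collection`, and the document paths are hypothetical stand-ins, chosen only to show that listing a collection returns the documents directly inside it and nothing from their subcollections.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a database: full document path -> fields.
DATABASE: Dict[str, Dict[str, Any]] = {
    "books/two_cities": {"title": "A Tale of Two Cities"},
    "books/two_cities/chapters/ch1": {"text": "(chapter one text)"},
    "books/two_cities/chapters/ch2": {"text": "(chapter two text)"},
    "books/oliver_twist": {"title": "Oliver Twist"},
    "books/oliver_twist/chapters/ch1": {"text": "(chapter one text)"},
}


def get_collection(collection_path: str) -> Dict[str, Dict[str, Any]]:
    """Return only the documents that sit directly in the given collection."""
    prefix = collection_path + "/"
    results: Dict[str, Dict[str, Any]] = {}
    for path, fields in DATABASE.items():
        if not path.startswith(prefix):
            continue
        remainder = path[len(prefix):]
        if "/" not in remainder:  # skip anything nested deeper (subcollections)
            results[path] = fields
    return results


# Listing the books fetches titles only; no chapter text comes along.
titles = [doc["title"] for doc in get_collection("books").values()]
print(titles)  # ['A Tale of Two Cities', 'Oliver Twist']

# Chapter text is fetched only when the subcollection is asked for explicitly.
chapters = get_collection("books/two_cities/chapters")
print(len(chapters))  # 2
```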
However, there's a flip side to this, which is if you think you're almost always going to be showing data together, it might not make sense to put it into a subcollection. For example, maybe I decide to put my list of main characters into a character subcollection. In many applications, this might make sense. But if, in my app, it turns out that whenever I'm showing the title of a book, I'm also going to be showing a list of all the characters, well, now I can't grab all that data I need in a single query. I now have to get the book and then get a list of all the characters that are in that book by querying each book's subcollection. And this is going to make populating a list like this a lot more complicated. So if this ends up being the look and feel of my app, well, maybe I don't want to put this data into a subcollection. I can keep it as part of my main document.
There's also another reason this setup may be bad, which is rule number four. You're billed by the number of reads and writes you perform. The other disadvantage of putting these characters in a subcollection like this is that-- and again, this is only assuming I want to grab the documents in this subcollection every single time I grab this main document-- I've increased my reads here, from one read to several, depending on how many characters I want to read in. Now, how much this matters is probably a factor of how often you'll be making queries like this. I mean, if this structure means your app is going to be making 200,000 extra requests per day, well, you know, that's only roughly $0.12 a day. On the other hand, if your app is making 200 million extra requests because of this structure, well, then that's $120 a day and probably something worth optimizing. Like I said in the last video, I'm not always a fan of premature optimization for billing purposes, particularly if it comes at the expense of more complicated code, but it's definitely something to be aware of. OK.
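The read-count side of rule number four is simple arithmetic, sketched below. It assumes one billed read for the book document and one per character document returned from the subcollection, and it deliberately stops at read counts rather than dollars, since pricing varies. The function names and the traffic figures are hypothetical.

```python
def reads_per_book_view(character_count: int, in_subcollection: bool) -> int:
    """Billed document reads needed to show one book together with its characters."""
    if in_subcollection:
        # One read for the book document, plus one per character document.
        return 1 + character_count
    # Characters embedded in the book document arrive with that single read.
    return 1


def extra_reads_per_day(book_views_per_day: int, character_count: int) -> int:
    """Additional daily reads caused by moving characters into a subcollection."""
    embedded = reads_per_book_view(character_count, in_subcollection=False)
    split = reads_per_book_view(character_count, in_subcollection=True)
    return book_views_per_day * (split - embedded)


print(reads_per_book_view(5, in_subcollection=False))  # 1
print(reads_per_book_view(5, in_subcollection=True))   # 6
print(extra_reads_per_day(10_000, 5))                  # 50000
```

Multiplying the extra reads by the current per-read price gives the daily cost difference, which is the number to weigh against the added code complexity.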
Rule number five, queries can only be used to search for specific documents within one collection. And they do so by looking for specific fields that match certain criteria. Hang on, Todd from 2018. Todd from 2019 is gonna cover this rule. Lemme take it from here.
So on this side, I have my Charles Dickens characters stored as a subcollection for each book. I've included their name and an occupation as two different fields. On the other side, I have the same list of characters stored as a map field directly inside the book document, with maybe the character name as the key and their occupation as the value. Now, these two structures let me run very different types of queries. With my characters in a subcollection, I can say, hey, show me every character in "Great Expectations" with a name that starts with P. Or every character in "Oliver Twist" who's a doctor. And if I wanted to say, "Hey, show me all characters in all books named Oliver," I can now do that with a Collection Group Query. Now, this would require that I set up a Collection Group index for the name field in the Firebase Console, and do keep in mind you're limited to about 200 of these.
On the other hand, what if I put characters in a map like this? Well, I could kinda say, "Hey, show me every book that has a character whose name is Oliver." And I would do that by looking for a field called characters.Oliver with a value that's, like, greater than an empty string. But I'll be honest, this is kinda weird in that I'm relying on there being a field in a map with a key named in a particular way, and that seems all sorts of error prone to me. In addition, I wouldn't just be returning character data here. I'd be returning information about the entire book that contains this character, which may or may not be what I want. And I wouldn't be able to do queries like, show me every character whose occupation is socialite or show me all characters sorted by name. So my search options are a lot more limited here.
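The difference between the two structures shows up in what a query hands back. The sketch below models both in plain Python rather than with the Firestore SDK: the sample data, the occupations, and the helper names are illustrative assumptions. `characters_where` stands in for a query over character documents (with `book=None` playing the role of a collection group query), and `books_with_character` mimics the `characters.Oliver` greater-than-empty-string trick.

```python
from typing import Any, Dict, List, Optional

# Structure A: one document per character. A flat list stands in for the
# per-book subcollections; "book" records which parent each one sits under.
CHARACTER_DOCS: List[Dict[str, str]] = [
    {"book": "Oliver Twist", "name": "Oliver", "occupation": "apprentice"},
    {"book": "Oliver Twist", "name": "Losberne", "occupation": "doctor"},
    {"book": "Great Expectations", "name": "Pip", "occupation": "blacksmith"},
]

# Structure B: characters stored as a name -> occupation map on the book document.
BOOK_DOCS: List[Dict[str, Any]] = [
    {"title": "Oliver Twist", "characters": {"Oliver": "apprentice", "Losberne": "doctor"}},
    {"title": "Great Expectations", "characters": {"Pip": "blacksmith"}},
]


def characters_where(field: str, value: str, book: Optional[str] = None) -> List[Dict[str, str]]:
    """Structure A: filter character documents, optionally within a single book."""
    return [
        c for c in CHARACTER_DOCS
        if c[field] == value and (book is None or c["book"] == book)
    ]


def books_with_character(name: str) -> List[Dict[str, Any]]:
    """Structure B: mimic 'characters.<name> > ""'; each match is a whole book document."""
    return [b for b in BOOK_DOCS if b["characters"].get(name, "") > ""]


print(characters_where("occupation", "doctor", book="Oliver Twist"))  # one character document
print(characters_where("name", "Oliver"))  # searches across every book
print(books_with_character("Oliver"))      # returns the entire "Oliver Twist" book document
```

Structure B has no natural way to ask for "every character who is a doctor" or "all characters sorted by name" without first knowing the key, which is the limitation the talk points out.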
Yeah, I'm pretty sure socialite is an occupation. It's just that in 2019, we like to call them "social media influencers."
Now a third option is to put your characters in a top-level collection. This was a nice workaround before Collection Group Queries were supported, because it let me search for, say, characters by name, or all characters in a particular book, and also search across all characters in all books. And if you'll recall from our earlier video, queries in Cloud Firestore are very fast. So this "Show all characters in Oliver Twist" query in the top-level collection really isn't any slower than grabbing all the documents from a subcollection. I think the biggest drawback here is that if I wanted to, say, search for all characters in Oliver Twist sorted by name, well, now that requires a composite index because it is now a multi-field search and, you'll recall, I'm limited to 200 of those.
So if you think you're gonna be searching for individual records from a group of data, you should put them into a collection. And you can use either a subcollection like this, or a completely separate top-level collection. As a general guideline, I'd probably say, store data hierarchically if you think you're mostly going to be searching for items per subcollection and only occasionally want to do a collection group query. And put 'em into a top-level collection if you think you're mostly going to be searching across all documents and only occasionally want to do, say, a per-book query. And putting your data into a map like this, it's not really useful from a querying standpoint. This would more likely be useful if you're just storing data you know you're always going to want to retrieve with the parent document.
All right, back to you, Todd from 2018! Whoa... too far! (Grunts) This might make more sense when we get into some more examples in the next video, so stick with me if you're confused.
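One way to picture why "all characters in Oliver Twist sorted by name" needs a composite index is to treat the index as a list sorted by book first and name second, so the answer is a single contiguous run of entries. The sketch below is a toy model under that assumption, not a description of Firestore's internals. The sample characters and the names `COMPOSITE_INDEX` and `names_in_book_sorted` are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical top-level "characters" collection, each document naming its book.
CHARACTERS: List[Dict[str, str]] = [
    {"book": "Oliver Twist", "name": "Oliver"},
    {"book": "Great Expectations", "name": "Pip"},
    {"book": "Oliver Twist", "name": "Fagin"},
    {"book": "Great Expectations", "name": "Estella"},
    {"book": "Oliver Twist", "name": "Nancy"},
]

# A toy "composite index": entries sorted by book first, then by name.
COMPOSITE_INDEX: List[Tuple[str, str]] = sorted(
    (c["book"], c["name"]) for c in CHARACTERS
)


def names_in_book_sorted(book: str) -> List[str]:
    """Answer 'characters in this book, ordered by name' with one contiguous scan."""
    # (book,) sorts just before every (book, name) pair, so this finds the run's start.
    start = bisect.bisect_left(COMPOSITE_INDEX, (book,))
    names: List[str] = []
    for indexed_book, name in COMPOSITE_INDEX[start:]:
        if indexed_book != book:
            break  # past the end of this book's run
        names.append(name)
    return names


print(names_in_book_sorted("Oliver Twist"))        # ['Fagin', 'Nancy', 'Oliver']
print(names_in_book_sorted("Great Expectations"))  # ['Estella', 'Pip']
```

An index sorted on name alone could not answer this with one scan, because each book's characters would be scattered throughout it.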
We have one more rule to get through, and that's rule number six, arrays are kind of weird. So there's a really good blog post out there on why arrays are evil, but basically, a lot of traditional array operations don't work very well in a service like Cloud Firestore. Because when you've got data that could be altered by multiple clients, it's very easy to get confused as to what edits are happening to what field. At least in a map, if I have several clients all trying to edit different fields, or even the exact same field, I generally know what's going to happen, right? I'll end up with a map that has fields that are equal to that of the most recent client update. But in an array, if one client is trying to edit a value at index two, and another client tries to delete the value at index two, and another client tries to insert a value somewhere at index zero, I'm going to have very different results, and possibly out of bounds errors, depending on what order these instructions are received in.
So for that reason, Firestore's actions with arrays are a bit different than what you might expect. For instance, you can't insert, modify, or delete an element of an array at a specific position. And you can't run a query on the element contained at position 2 in the array. So are they useless? Well, no, not at all. By the time you see this video, the team will have added some operations that do make working with arrays a lot more useful. See, it turns out, a lot of developers don't really care about the exact order that they're storing things in an array. They just want to use arrays as a simple way to contain a bunch of flags. For instance, maybe I have a bunch of keywords that I'm storing for each of my Charles Dickens books. So what do I do if I want to find all of my dramatic books, or books that involve war or crime?
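The ordering problem described earlier in this passage (one client edits index two, one deletes index two, one inserts at index zero) can be reproduced with an ordinary Python list. This is only a local simulation of positional edits arriving in different orders. The three operation functions and `apply_in_order` are hypothetical names.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple, Union

Outcome = Union[Tuple[str, ...], str]


def edit_index_2(arr: List[str]) -> None:
    arr[2] = "edited"


def delete_index_2(arr: List[str]) -> None:
    del arr[2]


def insert_at_0(arr: List[str]) -> None:
    arr.insert(0, "new")


def apply_in_order(ops: Sequence[Callable[[List[str]], None]], start: List[str]) -> Outcome:
    """Apply positional edits in the given order; report the final array or the failure."""
    arr = list(start)
    try:
        for op in ops:
            op(arr)
    except IndexError:
        return "IndexError"
    return tuple(arr)


start = ["a", "b", "c"]
operations = [edit_index_2, delete_index_2, insert_at_0]

# Try every arrival order of the same three client operations.
outcomes = {apply_in_order(order, start) for order in permutations(operations)}
for outcome in sorted(outcomes, key=str):
    print(outcome)
print(len(outcomes))  # 4 distinct results, one of which is an IndexError
```

The same three edits produce four different end states depending only on arrival order, which is the motivation for avoiding position-based array operations.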
Well, we now have an array-contains querying feature that will search for documents in a collection where an array contains a certain value. So I can keep my keywords in an array like this and say, hey, let's find all dramatic books by looking for books where keywords, array-contains, drama. So if you're wondering how this works, you can kind of think of it like behind the scenes, Cloud Firestore converts this array into a map, where all the values of the array are fields in this map. And like a typical map, every one of these fields gets placed in an index. So asking for books where keywords, array-contains, drama is kind of like asking for books where our imaginary keywords.drama is equal to true.
Along those lines, Cloud Firestore also added some new features to add or remove specific values from arrays, as long as you don't really care about the exact position of them. So editing this array of keywords is a lot easier now, and doesn't run into some of those problems I talked about earlier. So go check out the documentation if you want some more info about that.
So that's a lot of rules. Can we turn these rules and restrictions into guidelines? Well, let's see. For starters, put data in the same document if you're always going to display it together. You generally want a happy medium in the size of your documents. Don't make them so big that you're downloading more data than you need, but don't make them so small that you're downloading 30 documents from two different levels of the database just to fill out one screen in your app. Put data in collections when you're going to want to search for individual pieces of that data, or if you want your data to have room to grow. On the other hand, leave them as a map field if you're going to want to search your "parent" object based on that data. Another good time to put data into a map field is just if you want to put related data closer together for organizational purposes. For example, take a typical address.
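The array-contains mental model from this passage, where the array is imagined as a map of value to true, can be sketched in a few lines, along with position-free add and remove helpers. All of this is a local model rather than SDK code: the function names are stand-ins (the SDK operations are, as far as I know, called arrayUnion and arrayRemove), and the keyword data is invented for illustration.

```python
from typing import Any, Dict, List


def as_membership_map(values: List[str]) -> Dict[str, bool]:
    """The imagined behind-the-scenes view: each array value becomes a field set to True."""
    return {value: True for value in values}


def array_contains(doc: Dict[str, Any], field: str, value: str) -> bool:
    """Like asking whether the imaginary '<field>.<value>' is equal to true."""
    return as_membership_map(doc.get(field, [])).get(value, False)


def array_union(existing: List[str], *new_values: str) -> List[str]:
    """Add values that are not already present; position is never specified."""
    result = list(existing)
    for value in new_values:
        if value not in result:
            result.append(value)
    return result


def array_remove(existing: List[str], *values: str) -> List[str]:
    """Remove every occurrence of the given values, wherever they sit."""
    return [value for value in existing if value not in values]


# Hypothetical keyword flags for a few books.
books = [
    {"title": "A Tale of Two Cities", "keywords": ["drama", "war", "revolution"]},
    {"title": "Oliver Twist", "keywords": ["crime", "drama"]},
    {"title": "The Pickwick Papers", "keywords": ["comedy"]},
]

dramatic = [b["title"] for b in books if array_contains(b, "keywords", "drama")]
print(dramatic)  # ['A Tale of Two Cities', 'Oliver Twist']

print(array_union(["crime", "drama"], "drama", "london"))  # ['crime', 'drama', 'london']
print(array_remove(["crime", "drama"], "crime"))           # ['drama']
```

Because neither helper refers to an index, two clients applying them in either order end up with the same set of keywords, which sidesteps the positional-edit problem.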
Sure, I can make each of these fields top-level values in my document, but I think it looks a lot nicer if they're put into a map like this. I can still query them just fine, and actually reduce the risk of key naming conflicts later on in my document. Vectors in games might also be another good candidate for maps, something like this. And if you've got items that you would generally use as flags, go ahead and stick them in arrays.
So that's an awful lot of rules and guidelines, and maybe talking about these in the abstract is enough for you to get started. But personally, I'm always happier if I can talk about these with some examples that mirror more real-world situations. And it turns out-- minor shock, I know-- Charles Dickens apps aren't exactly a hot category in the app store right now. Maybe next year. So follow me along to the next video, where we'll look at how to put these guidelines into practice. Come on. It'll be fun.
I'm sorry. You're asking for a $10 million valuation, and the Charles Dickens app market is already too crowded. I just don't see the numbers. And for that reason, I'm out. [MUSIC PLAYING]
Info
Channel: Firebase
Views: 203,249
Rating: 4.8960366 out of 5
Keywords: Arrays, Maps, Subcollection, Data Structure, cloud firestore, firebase, firebase developers, json, store arrays, store documents cloud firestore, subcollections, maps vs arrays, maps vs subcollections, app developers, mobile app developer, building apps, firebase tutorial, firestore tutorial, cloud firestore tutorial, storing data, firestore data, fields, cloud firestore documents, firestore data model, array of maps, client sdks, GDS: Yes;
Id: o7d5Zeic63s
Length: 13min 47sec (827 seconds)
Published: Tue Sep 11 2018