Cloud Firestore Data Modeling (Google I/O'19)

[MUSIC PLAYING] TODD KERPELMAN: Hi, everybody. Hi. Oh, look at that. He got a reaction. That's great. I could tell-- you're the best crowd I'm going to have at this event. Also the only crowd-- but still, the best. So my name is Todd Kerpelman. Thank you for attending this intro talk on Cloud Firestore data modeling. I am from the Firebase team as maybe you might have guessed. Hey, Firebase, in case you hadn't heard, is a set of tools and services to help you build more successful mobile and web apps. And we do that with everything like analytics, to A/B testing, to performance monitoring, to, yes, hosting your app data in the cloud with Cloud Firestore, our massively scalable cloud-hosted NoSQL real-time database. So in the interest of time, I'm going to kind of skip the product pitch, because I think if you're here at this session, you generally kind of know that Cloud Firestore is awesome. It's got yay, reliability, and a truly serverless app development environment, and magical syncing of your data to all your devices, and great offline support, and client libraries for iOS, Android, and the web, and much more. We'll breeze over that, because I know that, certainly, when I'm first approaching any new technology, something that excites me like Cloud Firestore, I'm a little bit like our developer here who is apparently working on a food delivery app. I have this mixture of excitement and fear, right? I'm super excited to use all these great new shiny features and can't wait to see what I can do with it, but I'm also kind of afraid of messing things up, right? Like, how can I make sure that six months down the road I've made the right decision so that my database can make the right kind of queries that I want it to, or that I haven't done something wrong and messed up the performance as soon as my app starts to scale, or I've done something terrible and I have driven up my database costs exponentially?
How can I make sure I'm making the right decisions now so that I am not some, you know, cautionary tale on "Medium" later. That's at least, generally, the feeling I kind of have when I approach a new technology. I'm guessing maybe many of you do as well. I certainly kind of see this sentiment, this mixture of excitement and fear, out on Stack Overflow and, say, on our own discussion lists. And to be fair, there's a lot to think about when it comes to how you're structuring your database, and it can be kind of scary stuff here. So let's see if we can shine a light on some of these topics. And I think I'm going to start by just kind of reviewing, and in case you're new to this, what a NoSQL database is, I think, because for a lot of mobile and web developers, this is kind of the first time trying to make a real production-scale app using a NoSQL database. And then we'll get into a few details around how Cloud Firestore is different. So I would assume most of you kind of know what a more typical traditional SQL database is. You've got tables. And each one of these tables represents a strictly-defined object, something like an author, or a book, or a review. And you have schemas that have very strict rules around what kind of data is allowed to appear in each one of these columns. Like, you know, that first column in the authors table probably has to be a string that represents their last name. And the second one needs to be an auto-incrementing integer. And the third one has to be a timestamp. And so on and so forth. And then later on, you might want to sort of merge bits and pieces of these different tables together to get some data that you're interested in. And you would do this by writing something in a language called SQL. And writing the SQL statements is nice in that you can get the database to do all the work for you of finding all these different pieces of data together, and merging them together, and delivering to you. 
But it does have some drawbacks in that performance, for instance, can be very variable. This could be very fast or it could be kind of slow. It depends a lot on how much data you have to go through, what kind of queries you're asking of your database, exactly how your data is structured, and so on and so forth. That is SQL in a very overly-simplified nutshell. I've only got 40 minutes. In a NoSQL world, there's a few differences. For starters, things tend to be a little more loosey goosey in terms of how your data is defined, or I guess to use a more formal term, we like to say it is schemaless, which means that by convention, a lot of your data, a lot of your records for your data, will probably look similar, but there's no hard and fast rules around it. Yeah, sure, every animal here has a name, but that's just by convention. The database isn't really enforcing that. And this is nice. And then it gives you some flexibility. If you want to add a field, you can. I can add a birthday field here for my dog and not worry about adding a birthday field for my fish, because I don't care. It's a fish. No one cares about a fish's birthday. And developers generally like this, because it does give them the freedom to start adding data as needed. I don't have to worry about how am I going to backfill this birthday field into all my other animals, because, again, I don't care about my fish's birthday, right? This also lets you store similar but not exactly alike data. For instance, I can store a plumage field for my bird, and a hair type field for my dog, and a fin count field for my fish, and not have to worry about putting these fields in other records where they might not make a whole lot of sense. Now, the flipside of all this is that you do have to code defensively. You can never really be guaranteed as to what kind of data you're going to be getting back from the database. So you should fail elegantly if things don't meet your expectations. 
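As a rough illustration of coding defensively against schemaless records, here is a minimal Python sketch. It uses plain dictionaries rather than any Firestore client library, and the field names (`birthday`, `plumage`, `hair_type`, `fin_count`) and the helper `describe_pet` are assumptions made up for this example, loosely following the pets described above.

```python
from typing import Any


def describe_pet(doc: dict[str, Any]) -> str:
    """Build a display string without assuming any field is present or well-typed."""
    name = doc.get("name")
    if not isinstance(name, str) or not name:
        # "Every animal has a name" is only a convention, so fall back gracefully.
        name = "(unnamed)"
    parts = [name]

    birthday = doc.get("birthday")  # optional: the fish record never sets it
    if isinstance(birthday, str):
        parts.append(f"born {birthday}")

    # Species-specific fields: each record carries only the ones that make sense for it.
    for field in ("plumage", "hair_type", "fin_count"):
        if field in doc:
            parts.append(f"{field}={doc[field]}")
    return ", ".join(parts)


dog = {"name": "Rex", "birthday": "2015-04-01", "hair_type": "short"}
fish = {"name": "Bubbles", "fin_count": 5}
malformed = {"name": 42}  # e.g. written by a year-old version of the app
```

The point of the sketch is only that every read goes through `.get()` or a membership check, so an unexpected record degrades to a placeholder instead of raising.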
But honestly, if you're building mobile apps where your user might have a version of your app that's a year old because they refuse to update, this should be general practice that you're following anyway. Never make assumptions about the data you're getting. I think the biggest difference with a NoSQL database is-- if you could tell, it's sort of a hidden message right there in the name, there's NoSQL, so queries tend to be a lot simpler. So again, going back to our SQL example, imagine I had some tables of books and some tables of authors, and if I wanted to get a list of books and the names of the people who wrote them, I would do that with some kind of Join statement. But in the NoSQL world, you don't have access to these Joins. So strictly relying on kind of foreign keys like this, that's kind of foreign. This is not something you would be able to get in one database call. So I think a more likely scenario would actually be to duplicate some of that data. Take the author name from these little author records and also put them in the book records, so that they're grouped together in basically ways where you're going to want to retrieve them together. Now, this practice, this duplicating of data, is known as denormalizing data, which I know is a bad and scary thing for a lot of SQL developers, because we've been told ever since we were wee tots designing our first database that denormalized data is a really bad thing. You're really only supposed to have your data in one location so that it's easy to change later. You only have to change it in that one spot. But in the NoSQL world, denormalized data is not only allowed, it's expected. And so yes, if Charles Dickens were to change his name to Chucky D to stay relevant for the kids, I would need to go back and change it everywhere in my database. Not just there in the author record, but in those book records where that duplicate data lives. 
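The trade-off just described can be sketched in a few lines of Python. This is an in-memory model with plain dictionaries, not Firestore code; the document IDs and the helpers `list_books` and `rename_author` are hypothetical names chosen for the illustration.

```python
authors = {
    "dickens": {"name": "Charles Dickens"},
    "austen": {"name": "Jane Austen"},
}

# Each book duplicates the author's name so it can be displayed without a join.
books = {
    "oliver_twist": {"title": "Oliver Twist", "author_id": "dickens", "author_name": "Charles Dickens"},
    "bleak_house": {"title": "Bleak House", "author_id": "dickens", "author_name": "Charles Dickens"},
    "emma": {"title": "Emma", "author_id": "austen", "author_name": "Jane Austen"},
}


def list_books(books):
    """Read path: one pass over the books, no lookup into the authors collection."""
    return [(book["title"], book["author_name"]) for book in books.values()]


def rename_author(authors, books, author_id, new_name):
    """Write path: update the author record and every duplicated copy.

    Returns the number of documents that had to be written.
    """
    authors[author_id]["name"] = new_name
    writes = 1
    for book in books.values():
        if book["author_id"] == author_id:
            book["author_name"] = new_name
            writes += 1
    return writes


writes = rename_author(authors, books, "dickens", "Chucky D")
```

Here the rare rename costs three writes (one author record plus two book records), while the common read stays a single pass over the book records.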
And yeah, OK, that's kind of a pain, but there are good reasons for doing this. One reason is that reads are really easy now. If I want to grab all books along with the names of the authors who wrote them, I can just do that, right? It's just there, and it's really easy for me to sweep up all that data. And think about it. Realistically, how often is your data being read versus being written? Depending on how popular my book app is, those records might be getting read in thousands or millions of times. How often is Charles Dickens changing his name? Like, once? Never? And so the NoSQL philosophy is kind of, hey, you know what? Let's actually optimize for the case that's happening thousands or millions of times in the real world instead of the case that's happening once. Another big reason NoSQL databases are set up this way, and yes, I am oversimplifying, but it makes it easier for us to scale horizontally, meaning that as your database needs to grow, as you add more and more data to your database, you can basically just throw other machines at it, and your data will automatically grow to span across these multiple machines. And it all just works. Particularly in a managed server environment, like say, Google Cloud Platform, it makes it really easy for us to just flip on and flip off servers as your database grows and shrinks. And we can accommodate your data without you ever knowing that we're doing any of this work behind the scenes to make sure we're adding room to expand. Now, by contrast, SQL databases that often have these tight interrelated joins, they tend to scale vertically, meaning that as your database grows and you need to accommodate it, you generally have to move it onto bigger and beefier machines. And at some point, you're going to run out of massive supercomputers to put that thing on. But also, generally speaking, every time you migrate your database to another machine, there's going to be downtime. And we don't like downtime.
So NoSQL databases come in a lot of different flavors. We've got just big key value stores. We've got big-old JSON objects like the old Realtime Database. But Cloud Firestore is about documents and collections, so I'm going to spend a little time looking at these. Let's look at a document first. A document is something you can think of as a dictionary or a hash. It's got a set of key value pairs, which we like to refer to as fields. And the values of these fields can be a number of different things, anything from strings, to numbers, to very small binary objects, to these JSON-looking things that we officially call maps. Now, documents are stored in collections, which are, as you might suspect, collections of documents. Now, documents cannot directly contain other documents, but they can and often do point to subcollections, which contain other documents, which then point to other subcollections, and so on and so forth. Now, one important thing to note in Cloud Firestore land is that queries are shallow, meaning that I can grab this document in the top and not worry about grabbing all that data underneath it that are in all those subcollections. And this is generally nice. Developers generally like this, because it means that you can structure your data hierarchically in a way that sort of might make sense to you intuitively without having to worry about grabbing a ton of unnecessary data-- if I just want that document on top. The next thing we should probably cover are queries and how they work. Queries in Firestore are interesting in that, as a general rule, they are quite fast. And as a general rule, they scale proportional to the size of the result set, not the size of the underlying data set. And what do I mean by that? 
I mean that if I were to run a query that's asking for, say, the top 10 pizza restaurants in San Francisco, that query is going to take the same amount of time whether I have a thousand records to look through in my database or a hundred million. No matter how large that underlying data set is, that query is going to take me the same amount of time. So how does Cloud Firestore do this? By indexing every field in every document in every collection. So thinking about our fake restaurant delivery app a little more, imagine we start having some restaurants represented as documents here. And we put them in some kind of restaurants collection. Well, you notice each one of my restaurants has a field for name, and cuisine, and city, and rating. If I do that, Cloud Firestore is going to go ahead and create an index for every name, and every cuisine, and every city, and every rating in that collection. Now, all these indexes are created automatically for me by Cloud Firestore whenever I add, change, or delete a document. And now I can search for these documents in this collection as long as I can follow this two-step rule, which is step one, find a spot in the index where some condition is true. And then basically grab a bunch of adjacent documents until that condition is no longer true. So let's go into a concrete example here. Imagine I wanted to find all restaurants in Dallas. Well, that would be easy. I would find my City Index. And using this two-step procedure, find where city equals Dallas, and then grab all the adjacent documents until city no longer equals Dallas. Similarly, finding all restaurants with a rating of 4.5 or more, I could do that. Find that spot in the index. Grab all the adjacent documents until, I guess in this case, I run out of documents. I should probably also note, by the way, that map fields, the JSON-y looking things, you can query those fields the same way as any other field. 
If my address is set up like this, Cloud Firestore essentially looks at this as saying I have a field for address.street, and address.city, and address.zip. And it will go ahead and index those map fields the same way it would any other field. Two other features that I'm not going to get into just in the interest of time-- you can query across multiple fields. So I could say, hey, find me all Mexican restaurants in San Francisco with a rating of four or more. You can also query documents that have arrays that contain certain values. So if I have a flags field that has an array that contains a bunch of elements about that restaurant, I could perform a search that says, hey, find me all restaurants that serves alcohol or takes reservations. But again, I think the biggest takeaway is remember that every field is indexed. And every query has to follow this two-step procedure. I think it partly explains why things work so fast, but it also might explain why some things that seem like it should be possible aren't really, like ors. I can't say, hey, find me all restaurants in Chicago or San Francisco. That doesn't follow this two-step procedure. Similarly, I couldn't say, find me all restaurants where the city doesn't equal Dallas. You can't get the same performance guarantees using those types of queries as you could with this two-step process. And again, because we're fetching your results in real time, we want all these queries to be fast. So with all that in mind, let's start thinking a little more about our food delivery app and how we might want to organize some of the data. We've already been talking about restaurants, and I can imagine we're going to have customers that are going to want to place orders. And then we're also going to want to talk about the items that each of these restaurants are serving on their menu. So I'm going to continue talking about restaurants and, actually, that last one as well, because they kind of go together. 
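The two-step rule from the query discussion (find a spot in the index, then read adjacent entries until the condition stops holding) can be modeled with a sorted list and a binary search. This is a conceptual sketch in plain Python, not how Cloud Firestore is implemented internally; the restaurant data and the function names `build_index`, `equals`, and `at_least` are assumptions for the example.

```python
import bisect

restaurants = {
    "r1": {"name": "Taco Town", "city": "Dallas", "rating": 4.2},
    "r2": {"name": "Pho Real", "city": "San Francisco", "rating": 4.7},
    "r3": {"name": "BBQ Barn", "city": "Dallas", "rating": 4.5},
    "r4": {"name": "Slice", "city": "Chicago", "rating": 3.9},
}


def build_index(collection, field):
    """Sorted (value, doc_id) pairs for one field, standing in for a single-field index."""
    return sorted((doc[field], doc_id) for doc_id, doc in collection.items() if field in doc)


def equals(index, value):
    """Step 1: seek to the first entry with this value.
    Step 2: read adjacent entries until the value changes."""
    start = bisect.bisect_left(index, (value,))
    ids = []
    for entry_value, doc_id in index[start:]:
        if entry_value != value:
            break
        ids.append(doc_id)
    return ids


def at_least(index, value):
    """Seek to the first entry >= value, then read to the end of the index."""
    start = bisect.bisect_left(index, (value,))
    return [doc_id for _, doc_id in index[start:]]


city_index = build_index(restaurants, "city")
rating_index = build_index(restaurants, "rating")

in_dallas = equals(city_index, "Dallas")
highly_rated = at_least(rating_index, 4.5)
```

In this model the work done is one binary search plus one step per returned entry, which is why the cost tracks the size of the result set. A "city is not Dallas" condition would need two separate ranges of the index (everything before Dallas and everything after it), so it does not fit the single contiguous scan.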
So I think a good start is imagine a restaurant as we've been thinking about them so far. They are going to be documents that are in a collection like this. And our data, it's kind of what we've been thinking about. This actually seems like a pretty good start. Obviously, this is a little more simplified than what we would see in a real production app, but I think we generally get the idea. But the one thing we haven't talked about yet is what should we do about the actual items on the menu. Well, it seems like it'd be pretty easy to convert a menu into a JSON-y looking thing and make that a map field that we put inside a restaurant document. That seems reasonable, but I could also make it a subcollection. If you think about it, each one of these individual menu items, they could easily be their own documents. And I could make that a subcollection of this restaurant. Well, that also seems reasonable. So I have two kind of reasonable-looking solutions. What's the right one here? What should I actually-- which way should I go? Well, I'm going to spend a fair amount of time talking about this, because it does bring up a few more rules that I want to get into. Hurray for more rules, right? Just when you thought a talk on database structures couldn't get any more exciting, I'm going to add in more rules. So the first one I want to talk about is that documents have limits. There are limits in Cloud Firestore that prevent you from having documents that are too big. Specifically, there are three that you should be worried about. One megabyte in total size of your data in a single document, 40,000 indexed fields, and one QPS of sustained doc writes. Meaning that you can have little bursts of writes to a document, but on average, you should only have one write per second to the same document in Cloud Firestore.
So these are some limits that we put in to make sure that your documents aren't too big, but in practice, if we take our menu and we make it part of our restaurant, does that push us into the too big area? Well, let's think about that. So one megabyte doesn't seem like a lot of space, right? We're all taking pictures of these slides, and each of those are going to be four meg or something, right? But remember, we're primarily dealing with text here, and text, or numbers, or JSON-y looking things, and those don't really take up a lot of space. All of "Pride and Prejudice" could fit into one megabyte. So unless we got George R.R. Martin writing our menu item descriptions, I think we're probably going to be OK. And then he would kill off all our favorite dishes. 40,000 indexed fields-- well, all right, this could be an issue. Remember that each one of these fields in my map is going to be indexed. So Cloud Firestore is creating an index for menu.ribs.name, and menu.ribs.price, and menu.ribs.description. And, oh boy, that seems like that could add up. But at the same time, 40,000 is kind of a big number. Even if I had 200 items on my menu, I'd have to have 200 fields for each one of those items to really worry about this limit. So again, we're probably OK. And as for one QPS of sustained doc writes, I don't think that's really going to be a problem. I can't imagine us updating the price of our menu items more than once in a second. If we do, we'd be having problems. Maybe if I had a real-time inventory of how many of these dishes that our kitchen had available to serve and I was updating that real time, then I would worry about this limit. But again, I think in this example we're probably OK. So you do want to be careful about having documents that are too big, because you are going to run into these limits. Although, in practice, right now I'm not actually sure that's an issue. So let's look at some other rules and see if they can help us make our decision.
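To make the indexed-field arithmetic concrete, the sketch below flattens a nested restaurant document into dotted paths such as `menu.ribs.price` and counts them. It is a simplification under stated assumptions: it counts only leaf paths, whereas Cloud Firestore's real accounting of index entries may differ, and the menu contents and the helper `leaf_paths` are invented for the example. The 40,000 figure is the limit quoted in the talk.

```python
FIELD_LIMIT = 40_000  # indexed-field limit as quoted in the talk


def leaf_paths(value, prefix=""):
    """Yield a dotted path for every non-map value, e.g. 'menu.ribs.price'."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix


# A restaurant document with a 200-item menu stored as a map field.
restaurant = {
    "name": "Example BBQ",
    "city": "Dallas",
    "menu": {
        f"item_{i}": {"name": f"Dish {i}", "price": 9.5, "description": "Tasty."}
        for i in range(200)
    },
}

paths = list(leaf_paths(restaurant))
field_count = len(paths)

# How many fields each of 200 items would need before reaching the limit.
fields_per_item_at_limit = FIELD_LIMIT // 200
```

With three fields per item, 200 items contribute 600 paths, plus the two top-level fields, which is far below the limit. Each item would need about 200 fields of its own before the limit came into play, matching the estimate above.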
Rule number two is that you can only fetch documents. So you know how I told you queries are shallow? When you grab that top one, you're not going to grab any of the stuff, any of the subcollections, underneath. Well, that's part of it, but the flip side is that you cannot retrieve a partial document. You either get the entire document or you get nothing. So if we start putting our entire menu in one big document, that's going to start to get kind of big. And if our client, one of our users, is saying, hey, you know what, give me the top 30 sushi restaurants in Boston, our database is going to give back the information they probably care about at that time, which is the name, and the delivery fee, and their rating, and their address, stuff like that, along with everything each one of these restaurants has in their entire menu. And that's probably more information than our user really wants at that moment. And yes, I know a few slides ago I told you that text isn't a big deal compared to stuff like photos, but still, you're going to have users that are going to be quite sensitive to how much data their app is using, particularly in certain parts of the world where data is expensive. In addition, the more data your app is going to use, the more battery life your app is going to take up, and the slower your app is going to seem, because we have to download all that data before we can show your results. And, by the way, if you had a real-time listener set up on these results, then when one of those values changes, we actually send you over the entire document again. So you can kind of see how this could start to be a bad user experience. So yeah, you really don't want to send out more data than your user is actually interested in at that time.
Now, on the other hand, if we break this up into subcollections, then when I say, hey, show me the top 30 Japanese restaurants in Boston, I'm going to get back just the restaurant information I care about at that time-- the name, the address, the rating, and so on-- but not any of the menu items. Later on, when I say, oh, Izikaya looks really good, let's see what they have on their menu, then and only then can we fetch those other documents from the database. And that's a lot better from a data usage standpoint. We're only sending our users the information they care about at that time. So does that make our subcollections solution clearly the winner? Well, hang on, because we've got more rules to go through. More rules. Rule number three is that billing is mostly based on the number of documents that you touch. So Cloud Firestore pricing involves several different factors, but I guess it's primarily driven by the number of documents that you're interacting with. Specifically, you're going to get charged $0.03 to $0.06 for every 100,000 reads you perform and similarly for writes and deletes. And so you want to think about the number of different documents you're going to be interacting with. So if I ask for the top 30 sushi restaurants in Boston and everything is in one giant document like this, I'll get billed for 30 document reads. Even if later on I say, OK, let's see what's on the menu for one of these, we've got that data loaded up locally. And assuming we've kept it around and haven't discarded it, now we can show you the menu for one of these without incurring any additional reads. So that seems nice. On the other hand, if I were to load up 30 documents of the top 30 restaurants in Boston, and then say, OK, let's see what's on the menu for one of these, now I'm getting billed for that initial batch of 30 reads and then that second batch of 25 reads or whatever to get what's on the menu. So is this bad? Well, the answer is it depends.
Specifically, if you think about most food delivery apps, you're generally looking at a list of restaurants first. And then maybe after you've found one that you're interested in, clicking through to get the full menu, then probably place an order from there, which means that each one of these sets of reads is a manual action. Meaning that, realistically, your user will be doing this maybe a few times per session, but probably not hundreds or thousands of times. Now, on the other hand, if we really wanted to load up the full menu every time we did a search of restaurants, or we thought, hey, we'll be clever and start preloading all our menu items every time our user performs a search, now with one single user action, we're grabbing not just all the restaurant documents, but all the menu items in all the subcollections of all these restaurant documents. And that is going to be bad. This is the situation you probably want to avoid. So you do need to stop and ask yourself what is your app actually doing and make the right call from there. And sometimes when I give this advice, people get mad. They're like, why don't you just tell me what the one right answer is? But I think the point is there isn't always one right answer in every situation. You kind of have to understand what the trade offs are and make the right call based on how your app is actually behaving. So in our case, even though we are going to be making more document reads with this subcollection, I'm still OK with it, because, like I said, each one of those sets of reads is a manually-driven action. Be careful about over-optimizing for price. I've seen some really strange solutions out there where people are too focused on pricing and they end up either creating a bad user experience or creating a lot more work for themselves. If you really want one right answer, one rule of thumb, I would say generally have one collection per table view controller/activity/page. 
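The read counts for the scenarios above can be worked through with a little arithmetic. This Python sketch assumes 30 restaurants per search and 25 items per menu, as in the talk's example, and takes the per-100,000-reads price as a parameter; the function names are hypothetical, and actual Cloud Firestore pricing should be checked against current documentation.

```python
def reads_big_documents(restaurants_listed):
    """Menu stored inside each restaurant document: one read per restaurant, menus included."""
    return restaurants_listed


def reads_subcollections(restaurants_listed, menus_opened, items_per_menu):
    """Menu items stored as separate documents: extra reads for every menu that gets loaded."""
    return restaurants_listed + menus_opened * items_per_menu


def cost_usd(reads, price_per_100k_reads):
    """Convert a read count into dollars at the given rate."""
    return reads / 100_000 * price_per_100k_reads


big_docs = reads_big_documents(30)                 # search only
one_menu = reads_subcollections(30, 1, 25)         # search, then open one menu
preload_all = reads_subcollections(30, 30, 25)     # search and preload every menu

# Upper end of the range quoted in the talk, passed in as an assumption.
preload_cost = cost_usd(preload_all, 0.06)
```

Opening a single menu adds 25 reads to the 30 from the search, while preloading every menu turns one user action into 780 reads, which is the pattern the talk recommends avoiding.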
In our app, if we have our list of restaurants, our restaurants search page, that's one view controller. That's going to be driven by the restaurants collection. Later when I say, OK, let's see details, what's on their menu, that's going to be then driven by another subcollection. So if you really want one right answer, one collection per view controller/activity. But hang on, because we're not done yet. There's still one more rule to talk about. [COMICALLY GASPS] And that's that queries search for indexed fields across a collection. So we kind of talked about queries earlier. So if I were to say, hey, find me the top 30 restaurants in Dallas, I could do that with either one of these setups, whether our restaurants are larger documents or smaller documents with these subcollections. Either way, this query works. What would I do if I say, hey, I'm in the mood for chicken tikka masala? Can I do that with either one of these setups? Well, let's start by taking a look at the bigger documents. As you recall, every field in a document, even the ones in these maps, are indexed. So looking at our menu here as stored as a giant JSON-y looking thing, you can see that I'm going to have an index for menu.korma_lamb.name. And so it kind of looks like, looking at something like that, looking at what I have for menu.ctm.name, I could kind of search for restaurants that serve chicken tikka masala, right? And I would do that by saying, let's find restaurants where this menu.ctm.name field exists. Honestly, I don't actually really care about the value as long as that field exists, they probably serve chicken tikka masala. But the problem here is that I'm essentially relying on every restaurant menu having the same key name for that dish. The fact that I actually don't care what the name of the dish is is kind of a red flag.
If Roger's Restaurant were to use a different key for that JSON object that it's using to represent chicken tikka masala, now that dish is not going to show up in my original search. I'm essentially relying on this weird secret hidden information that every dish has to have the same key name in my JSON object. And that's going to be kind of weird and error prone, and so I'm not a big fan of this solution. This is where it seems like subcollections would be a simpler and more natural solution. If I look at every one of these documents representing an item on my menu, I can see that each one of these has its own set of values. And I would have traditional indexes on each of them. And searching by name now seems a lot more natural. I could say, find me items in this collection where the name equals chicken tikka masala. The problem is this only searches in one collection. I can do it for Kiran's Restaurant. I cannot do it across all collections. Until now. [COMICALLY GASPS] Big gasp. Yeah. [APPLAUSE] All right. So this is where collection group queries can help. This is a feature that I know we've been talking about for a while. And I'm happy to say you can all play with it this week. So basically, the way it works is you go to the Firebase console and you would tell Cloud Firestore about the queries you might want to make across multiple collections. So in this case here, you can see I'm basically saying, OK, out of all the collections called menu items, I want you to find this field called name, and I want you to index it in this collection group scope. And what that basically means is I want you to index the name field across all menu items collections anywhere they exist in my database. Index it as if it were just one giant collection. And so that's what Cloud Firestore is going to do. 
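The difference between querying one collection and querying a collection group can be modeled with document paths. The sketch below is a plain-Python illustration, not the Firestore SDK: it does a linear scan instead of using an index, and the paths, the collection ID `menu_items`, and the helper names are assumptions for the example.

```python
# Documents keyed by their full path: collection/doc/subcollection/doc.
db = {
    "restaurants/kirans": {"name": "Kiran's Restaurant"},
    "restaurants/rogers": {"name": "Roger's Restaurant"},
    "restaurants/kirans/menu_items/ctm": {"name": "Chicken Tikka Masala", "price": 12},
    "restaurants/kirans/menu_items/korma_lamb": {"name": "Lamb Korma", "price": 14},
    "restaurants/rogers/menu_items/dish7": {"name": "Chicken Tikka Masala", "price": 11},
}


def collection_id(path):
    """ID of the collection that directly contains the document at this path."""
    return path.split("/")[-2]


def query_collection(db, collection_path, field, value):
    """Search the documents of one specific collection only."""
    return sorted(
        path for path, doc in db.items()
        if path.rsplit("/", 1)[0] == collection_path and doc.get(field) == value
    )


def query_collection_group(db, group_id, field, value):
    """Search every collection with the given ID, wherever it sits in the hierarchy."""
    return sorted(
        path for path, doc in db.items()
        if collection_id(path) == group_id and doc.get(field) == value
    )


one_restaurant = query_collection(db, "restaurants/kirans/menu_items", "name", "Chicken Tikka Masala")
everywhere = query_collection_group(db, "menu_items", "name", "Chicken Tikka Masala")

# The second path segment identifies the restaurant that owns each matching item.
restaurants_serving = sorted({path.split("/")[1] for path in everywhere})
```

Because each menu item is its own document with a `name` field, the match no longer depends on every restaurant using the same map key for the dish: Roger's item is stored under `dish7` and is still found.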
It's going to look at every collection with the same name and index that name field as if it was one giant collection, which then means I have a name index that looks at all of these documents in all these different subcollections, which then means that I can query that collection group to find all restaurants that serve chicken tikka masala, even though they are now split into different subcollections. So going back to our original dilemma of do we want to have larger documents or subcollections, I think given the advantages that we get of putting them in subcollections, specifically, we don't have to worry about hitting that theoretical larger document limit. We're much more respectful of our users' data. And we can now search for menu items by name by creating a collection group query. This is going to be my winner. Yes, the bigger documents will still give me fewer reads, but like I said before, be careful about over-optimizing for price. You want to make sure you're not going to do anything that's going to be catastrophically bad, but if you're trying to wring out every last cent of your database usage, that might actually be better spent on other things. All right, so let's switch gears for a little bit and think about how we might want to store our users, our customers who are placing orders. So this seems like a pretty obvious candidate to put everything in a top-level collection. It seems fairly straightforward, right? And we can store their name, and their delivery address, and their profile picture, maybe some of their favorite foods. And this all seems very reasonable. I like this. And then one day our product manager comes in with a fantastic idea-- hey, you know what we should do? We should make this thing social. Let's have our users find friends in their local area who like the same kind of food as them. It'd be great. They can get together. They can order out food together. And now we've become a food-delivery/dating app. Sounds good to me. 
And this is a query that we could pretty easily create in our setup. We could say, hey, let's find everyone in San Francisco where their favorites array contains Korean food, and just like that, we've found folks in our city who like Korean food. So what's the problem here? Clearly I'm leading us to some kind of problem. Otherwise it wouldn't be a very interesting presentation. It's this rule here. Remember, you can only fetch documents, not partial documents. And so when some of our random users are getting a list of people in their city who like Korean food, we're finding out all sorts of stuff about them, right? Like their name and also where they live-- oh, crap. Well, that's bad. I can't imagine anything getting worse than-- oh, crap. Right? Now we know where all these people live and how to break into their house. And this is bad. And sure, you're probably smart enough that you're not showing this data on the client. But that doesn't matter, right? The fact that this data is getting sent over means that a sufficiently motivated hacker could get at this data. And all of a sudden, you've leaked all your users' addresses and how to get into their front gate. So there are definitely other options we should be considering here. One option-- just put your addresses into a sub-collection, right, where we keep that information. That's nice, too, because now we can add multiple addresses per user. And then we can make sure that only our delivery people have access to that when they need it. If we had like payment information or anything else that we might want to keep private, we could store that in like a private info sub-collection. On the other hand, if we actually think, hey, you know what; we may be storing a lot of information about our users in these documents, we could also flip this on its head, right, and have a public profile sub-collection for each one of our users. 
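A minimal sketch of the "private by default, public by explicit copy" idea follows. It uses plain Python dictionaries, not Firestore or its security rules; the allow-list `PUBLIC_FIELDS`, the sample users, and the helpers `public_profile` and `find_food_friends` are assumptions invented for the illustration.

```python
# Only fields on this allow-list are ever copied into a public profile.
PUBLIC_FIELDS = ("name", "city", "favorites")

users = {
    "u1": {
        "name": "Ana",
        "city": "San Francisco",
        "favorites": ["Korean", "Thai"],
        "address": "123 Example St",
        "gate_code": "0000",
    },
    "u2": {
        "name": "Ben",
        "city": "San Francisco",
        "favorites": ["Pizza"],
        "address": "456 Sample Ave",
    },
}


def public_profile(user_doc):
    """Copy only explicitly allowed fields; everything else stays in the private document."""
    return {key: user_doc[key] for key in PUBLIC_FIELDS if key in user_doc}


def find_food_friends(profiles, city, cuisine):
    """Search public profiles only, so whole-document reads cannot leak private fields."""
    return [
        profile for profile in profiles
        if profile.get("city") == city and cuisine in profile.get("favorites", [])
    ]


profiles = [public_profile(doc) for doc in users.values()]
matches = find_food_friends(profiles, "San Francisco", "Korean")
```

Since a query always returns entire documents, the protection comes from what is stored in the searchable documents, not from what the client chooses to display.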
I like this because this basically can say that everything in our user document is private by default until we explicitly take a copy of that data and put it into the public profile. And that sort of prevents a lot more accidental leaking of data. And I know I haven't spent a lot of time talking about security rules yet. But when it comes to preventing unauthorized access, both this approach and the approach on the previous slide are nice because it's generally easier to sort of have different access levels at different collections. So being able to say, hey, you know what? In this users collection, users can only read the user document that belongs to them. But hey, these public profile sub-collections, those are open for any logged-in user to read. That kind of setup is generally easy to do using security rules, right, having sort of different security setups for different collections. That's kind of how security rules work. So let's move on to the last data object we're going to tackle, and that's orders. This is kind of interesting because it combines data from a lot of different places, right? We're going to have some elements unique to the order itself, like the time the order was placed and the delivery fee and probably a few other things. And then we're going to have information about our user, their name, where to send the food. We have information about the restaurant itself, like where to place the order, where a courier needs to go to pick up the dishes. And then, yes, we're going to have the menu items that our user has ordered, right, like, what was it? How much did it cost? Did they ask for it extra spicy or hold the mayo? And again, if we were looking at this from more like a SQL background, we'd probably think of it something like this. We might have a little bit of order-specific information and then kind of foreign keys to represent all these other bits of information.
And then we would do some kind of big old join before sending this information off to a restaurant to kind of process the order. But again, we don't live in a SQL world. We are a NoSQL database that's got super-fast reads and horizontal scalability and all that great stuff but no fancy joins. So this is probably sort of not the default way we want to think about storing this order. Instead we're just going to build the document with the data that we need at the time. So when our user places an order, we'll create a document, store in that order-specific data, copy over the relevant user information that we'll know from our user document, copy over whatever relevant restaurant information we need, and then also add in the food that they're ordering. And this actually is a case where I would recommend adding all the items directly to the order in a big array like this or a map instead of putting it into a sub-collection. Because if you kind of think about it, anybody who's reviewing their orders, whether it's a user looking at past orders or a restaurant looking at open orders they need to process, they're probably going to want to see this menu information alongside their orders. So again, kind of stop and think, what is my app actually doing? And make the call from there. And yeah, I know it's still a little strange to kind of see this duplicate data in our records. But if this is still kind of weirding you out, one of our engineers had a really nice analogy, which is instead of just thinking of this de-normalized data as de-normalized data, this data in your database really is your Realtime API for your app. If you were to make an API for your app that's like order get order, you'd basically sort of be generating a JSON-looking object that would look an awful lot like this. And so kind of the whole idea maybe with a lot of NoSQL databases and with Cloud Firestore is well, maybe just look at that data as this API, this Realtime API for fetching orders.
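As a minimal sketch of that "build the document with the data we need" step, here is one way the order could be assembled in plain Python. The function name `build_order`, every field name, and the sample ids, addresses, and prices are hypothetical; the talk's slides are not reproduced here, and no Firestore SDK call is involved.

```python
import copy
from typing import Any, Dict, List


def build_order(user: Dict[str, Any], restaurant: Dict[str, Any],
                items: List[Dict[str, Any]], placed_at: str,
                delivery_fee: float) -> Dict[str, Any]:
    """Assemble one self-contained order document from its source documents."""
    return {
        # Order-specific data
        "placedAt": placed_at,
        "deliveryFee": delivery_fee,
        # Copied from the user document at order time
        "user": {"id": user["id"], "name": user["name"],
                 "deliveryAddress": user["deliveryAddress"]},
        # Copied from the restaurant document at order time
        "restaurant": {"id": restaurant["id"], "name": restaurant["name"],
                       "address": restaurant["address"]},
        # Menu items embedded as an array, deep-copied so later menu edits
        # do not alter this order
        "items": copy.deepcopy(items),
    }


# Made-up sample documents; ids, addresses, and prices are placeholders.
diana = {"id": "user_diana", "name": "Diana", "deliveryAddress": "123 Example St"}
tofu_hut = {"id": "rest_troy", "name": "Troy's Tofu Hut", "address": "789 Placeholder Rd"}
bibimbap = {"name": "Bibimbap", "price": 12.0, "notes": "extra spicy"}

order = build_order(diana, tofu_hut, [bibimbap], "2019-05-08T12:00:00Z", 3.5)

# A later price change on the menu leaves the stored order untouched.
bibimbap["price"] = 14.0
```

The resulting dictionary is the "JSON-looking object" a reader of the order needs, with no join required to display it.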
And that really is the data that you're going to store in your database. So if that actually helps you sort of get your mind around NoSQL databases a little better, use that. If, on the other hand, that just confused you further, then forget I said anything. So where this orders collection actually goes in our database kind of depends. Honestly, I don't care too much. You can make this a sub-collection of a restaurant or make it a sub-collection of our users or even make it another separate top-level collection. Obviously, any one of these situations is fine with me. Now that collection group queries are working, you could basically sort of query for any of these orders by the restaurant or the courier or the user. And they would all work just fine. So I would say kind of pick any one of these architectures that sort of first popped into your mind intuitively, because that's probably the right one. I actually don't want to spend a lot of time on this decision because I do want to go back to the duplicate data, right? Because we do need to ask ourselves, what do we do if one of these values later changes? How can we make sure that that gets updated in our order? Well, in some cases, maybe the answer is, eh, we don't do anything, right? Like, imagine that a few days after Diana places her order, Troy decides to raise the price of their bibimbap. Well, in this case, it's probably accurate and correct to not actually change that value in her order, right? We want her order to reflect the price of the item at the time that she ordered it. So this is actually one situation where I think having this de-normalized data kind of works in our favor. But there are cases where updating it might make sense. Like, imagine that we've done a UX research study. And we realized that when restaurants change their name, we want to make sure that name change does get reflected in the user order because it makes it easier for the user to remember it or something like that, right?
So when Troy's Tofu Hut changes their name to Troy's Tofu Cabin, well, maybe we do want to change that in our order as well. So how do we do that? Well, one option is just have that client make that change everywhere. So we've probably got some kind of client app set up for our restaurant owners, right? And we can say, all right, when that restaurant owner decides to change the name and change that on the restaurant document, we will also go ahead and do a search through all these orders that belong to this restaurant and make the change there as well. That can work. But it's a little weird to have a client make this big transaction that changes all these orders. For starters, it's a lot of work that we're asking our client to do. And depending on the situation, it also sort of might open up some kind of strange security rule setups, right? Like, now, we've got to make sure that a restaurant can go ahead and change the restaurant field or the restaurant name field in the order document, but we probably don't want them like modifying the price of orders or adding more food onto other orders or doing anything too nefarious like that. And so this can get a little strange. And so I think in practice, this is something that would be better done with a Cloud Function. So if you don't know what Cloud Functions are, they're a way for you to run server-side code by having functions that execute in response to actions that happen in your app, like, for instance, someone changes the value in a restaurant document. And because these run server side, they're generally not subject to the same security rules that you would have to have for your clients, right? These are being run in an environment you trust, as opposed to something on some person's phone somewhere. And that means you can generally sort of lock down your security rules a lot more because your Cloud Functions get to circumvent those security rules. And so I like simple and locked down when it comes to security.
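The server-side rename can be sketched as a pure Python function that takes the restaurant document before and after the change, plus the orders to scan. This is only an illustration of the fan-out logic under assumed names: `sync_restaurant_name`, the nested `restaurant` field, and the sample ids are hypothetical. In a deployed Cloud Function the before and after values would come from the trigger event and the writes would go through a server SDK, neither of which is shown.

```python
from typing import Any, Dict, List


def sync_restaurant_name(before: Dict[str, Any], after: Dict[str, Any],
                         orders: Dict[str, Dict[str, Any]]) -> List[str]:
    """Copy a changed restaurant name into that restaurant's orders.

    Updates matching orders in place and returns their ids. Only the copied
    name is touched; items and prices in each order are left alone.
    """
    if before["name"] == after["name"]:
        return []  # some other field changed, so there is nothing to fan out
    changed = []
    for order_id, order in orders.items():
        if order["restaurant"]["id"] == after["id"]:
            order["restaurant"]["name"] = after["name"]
            changed.append(order_id)
    return changed


# Made-up sample data; ids and prices are placeholders.
orders = {
    "order_1": {"restaurant": {"id": "rest_troy", "name": "Troy's Tofu Hut"},
                "items": [{"name": "Bibimbap", "price": 12.0}]},
    "order_2": {"restaurant": {"id": "rest_other", "name": "Some Other Place"},
                "items": []},
}
before = {"id": "rest_troy", "name": "Troy's Tofu Hut"}
after = {"id": "rest_troy", "name": "Troy's Tofu Cabin"}

changed = sync_restaurant_name(before, after, orders)
```

Because the function writes only the name field, the restaurant's own client never needs write access to order documents at all.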
Locked-down rules just sort of mean fewer things to worry about. So we can create a Cloud Function that activates when a document in one of our restaurant collections is changed, right? And now our restaurant client only has one job, and that's to go ahead and change its name in the restaurant. And then we can rely on the Cloud Function to notice that change and make the corresponding change in all those orders. And this could really be used anywhere that we've got duplicate de-normalized data that we want to keep in sync. Like, remember how we had our users and our public profile? Well, if Becca changes her name to Rebecca, and we decide, you know what? The right thing to do is sort of always automatically update that value in her public profile as well, that is something we could sort of rely on a Cloud Function to do for us. So hey, wow, all this stuff we were worried about at the beginning, I guess it's not so scary after all, which is nice. But I know there was a lot of information to throw at you. But if you want to learn even more, because it turns out there is a lot more to cover, I have a series on the Firebase YouTube channel called "Get to know Cloud Firestore," where-- yes, take pictures of that. This is the most important thing to take a picture of-- where I cover all this in even more excruciating detail. But I also have cute cartoon characters. Because when you think of lectures involving databases, you think cute cartoon characters. They just go together. Don't forget to rate the session, if you liked it. If you didn't like it, my name was Reto Meier, and I was talking about Android Studio. And with that, I'm going to say if you have any further questions, I'll be hanging out in the Firebase dome for like another hour. I hope you all learned something. Thank you very much. And now go out and have a great rest of the conference. [APPLAUSE] [MUSIC PLAYING]

Info

Channel: Firebase

Views: 115,738

Rating: 4.974072 out of 5

Keywords: type: Conference Talk (Full production);, pr_pr: Google I/O, purpose: Educate

Id: lW7DWV2jST0

Length: 40min 37sec (2437 seconds)

Published: Wed May 08 2019