How do queries work in Cloud Firestore? | Get to know Cloud Firestore #2

Captions
TODD KERPELMAN: So one of the most important features of a good database isn't just about being able to store stuff in there. It's also about being able to find things when you need them. That's the difference between a database and, like, my kid's bedroom. Ugh, what a mess. Now finding things through most databases is done through a process known as querying. And as we talked about in the last video, NoSQL databases tend to be pretty different than SQL databases in terms of their querying capabilities. But there's still a surprising amount you can do with querying in Cloud Firestore, particularly if you're coming here from real-time database land, and it's all amazingly fast. So let's get started. [MUSIC PLAYING] So I'm going to start by saying that this episode of "Get to Know Cloud Firestore" has already been nominated as our most likely to be out of date by the time you watch it video. You see, one of the reasons we're all excited to be moving onto the Google Cloud Platform is that it gives us the ability to add more features onto the backend, and that includes being able to add more powerful queries in the future. So while I am describing the state of the world at the time of this recording, there's a very good chance things will have gotten more sophisticated after a few months. And in the meantime, you can always view the documentation for the most up-to-date information. But let's take a look at the state of querying on Cloud Firestore today, explain to you some of the current rules, and then maybe explain why Cloud Firestore works the way it does, so that these rules start to make a little more sense. So first off, queries that you run against Cloud Firestore can only be used to find documents within one specific collection or sub-collection. Let's take a look at my hypothetical restaurant review app over here. Now, if I want to find all restaurants in a specific zip code, for example, I can totally do that. 
That would be a matter of querying for specific documents in my restaurant collection. Similarly, if I wanted to find all -star reviews for Todd's Tacos, I could totally do that, too. I'd just be looking for documents within the Reviews sub-collection of this specific restaurant. Now, on the other hand, if I wanted to find all four-star reviews for all restaurants, I couldn't do that with the architecture the way it's set up now, because that would be trying to conduct a query that spans multiple sub-collections. Whew, did I make it? What year is it? Perfect. Hello, YouTube viewers. Todd from 2019 here, with an important update. This kind of query that spans multiple collections is something called a "collection group query." And, good news, it's something that Cloud Firestore now supports. [APPLAUSE] To enable a collection group query, you will go to the Firebase console and tell Cloud Firestore exactly which field you're going to want to search for across what collection name. For example, if I wanted to find all four-star restaurant reviews, I would go to the Firebase console and declare that, for the collection called Reviews, we should enable the field named Rating with a collection group scope. This basically tells Cloud Firestore, hey, I want you to index the rating field of every document in any collection I have called Reviews, as if it were one giant collection. Then, I can go ahead and search for those reviews by rating. Now, there's two important notes about these types of queries. First, at the time of this recording, you're limited to about 200 of these things. So only add them for queries you know you're going to want to use. And second, note that these queries look for all collections of the same name, regardless of where they appear in the database. So if I had a completely unrelated collection elsewhere in my app also called Reviews, those would be included in this index. So be careful about how you name your collections. 
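As a rough illustration of the collection group idea described above, here is a small Python sketch. It is a toy model, not the Cloud Firestore SDK: the document paths, the sample ratings, and the `collection_group_index` helper are all made up for this example. It treats every collection named `reviews` as one giant collection and builds a single rating index across them.

```python
# Toy model of a collection-group index. All paths and data are hypothetical.
documents = {
    "restaurants/todds_tacos/reviews/r1": {"rating": 4},
    "restaurants/todds_tacos/reviews/r2": {"rating": 2},
    "restaurants/sushi_spot/reviews/r1": {"rating": 4},
    "restaurants/sushi_spot": {"rating": 4.5},   # a restaurant, not a review
    "movies/jaws/reviews/r1": {"rating": 4},     # unrelated, but same collection name
}


def collection_group_index(docs, collection_name, field):
    """Return sorted (value, path) entries for every document whose
    immediate parent collection is named `collection_name`."""
    entries = []
    for path, data in docs.items():
        parts = path.split("/")
        # Paths alternate collection/document/collection/document...,
        # so the parent collection is the second-to-last segment.
        if parts[-2] == collection_name and field in data:
            entries.append((data[field], path))
    return sorted(entries)


index = collection_group_index(documents, "reviews", "rating")
four_star = [path for value, path in index if value == 4]
print(four_star)
```

Note that the made-up `movies/jaws/reviews/r1` document shows up in `four_star` alongside the restaurant reviews, which is the naming caveat mentioned above: the index is keyed on the collection name, not on where the collection lives.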
Oh and, uh, Todd from 2018, lose the beard. Your kids are going to hate it. OK, back to our original content. I also can't do any fancy SQL-like joins that query one collection in order to grab information from a second collection. So, like, you couldn't say, hey, fetch me the documents from the Users collection for users who have written a review for a specific restaurant. Or at least, you know, I couldn't do that in a single query. Another rule? Queries you run generally have to be based on equality or greater than or less than comparisons of one or more fields in the document. So, like, I can say, find me all restaurants where the city is equal to San Francisco, or find me all four-star or higher reviews from this specific sub-collection. But I couldn't, for example, query documents based on a calculation, like, hey, find me all restaurants where the rating divided by the price is greater than 1.5, unless, I guess, I create an explicit "rating divided by price" field, and keep that up to date using something like Cloud Functions. Third, and I mentioned this in the previous video, but results you get back from Cloud Firestore are shallow, meaning that if I search for all restaurants that are within a specific zip code, I will just get back the documents that represent those restaurants. I won't get back all the reviews in a sub-collection attached to that restaurant or responses people gave for those reviews or health inspection reports or anything else contained in any sub-collections. I only get back these documents. And if you're coming from real-time database land, where this was not the case, this is actually a really nice improvement. It means you can keep your data stored hierarchically in a way that probably makes a lot more sense to you logically and not have to worry about downloading too much data whenever you perform a simple query. 
On the other hand, if you really do want to get back all this data, all at once, from a query like this, you would either need to make multiple fetches or maybe not break that information up into sub-collections. It's kind of up to you. Finally, and I guess this is less of a rule and more just, "hey, something that's really neat about Cloud Firestore," the time it takes to run a query is proportional to the number of results you get back, not the number of documents that you're searching through. So if I want to find the top five highest-rated restaurants in my zip code, that's going to be an order R operation, where R is, like, the number of results I'm requesting, meaning it will take the same time to run whether I have 60 total restaurants to search through, 60,000, or 600 million. So how does it do this? Well, basically, whenever you add a document to a collection in the database, Cloud Firestore automatically creates an index for every field in that document. You've probably heard of indexes before. They're essentially a sorted list of all the values in the field that we are indexing. Every entry in the index records the value of the field and where that corresponding document exists in the database. Now, when you have an index sorted like this, it becomes incredibly fast to find any particular value using something resembling binary search, which means searching through a database of 20 million records takes about, you know, 25 steps. And then grabbing the adjacent rows in the index from here is quite easy. And I should point out, Cloud Firestore does this not just for every field, but also for every field in every map that you add to the database. Maps, as you'll recall, are what Cloud Firestore likes to call those little "JSON-y" looking objects. So even if I store my restaurant's address like this, Firestore will create an address.zip index, and then I can still search by zip code. 
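To make the "sorted list plus binary search" description above concrete, here is a minimal Python sketch. It is only a model of the idea, not how Cloud Firestore stores its indexes: the sample documents and the `build_index` and `lookup` helpers are assumptions made for illustration. Each index entry records a field value and the ID of the document it came from, and a dotted path such as `address.zip` reaches into a map.

```python
import bisect
import math

# Hypothetical documents keyed by document ID.
restaurants = {
    "doc_a": {"name": "Todd's Tacos", "address": {"zip": "94110"}},
    "doc_b": {"name": "Sushi Spot", "address": {"zip": "94103"}},
    "doc_c": {"name": "Pasta Place", "address": {"zip": "94110"}},
}


def build_index(docs, field_path):
    """Build a sorted list of (value, doc_id) entries for a dotted field path."""
    entries = []
    for doc_id, data in docs.items():
        value = data
        for key in field_path.split("."):
            value = value[key]  # walk into nested maps
        entries.append((value, doc_id))
    return sorted(entries)


def lookup(index, value):
    """Binary-search to the first entry for `value`, then walk adjacent rows."""
    start = bisect.bisect_left(index, (value, ""))
    matches = []
    for entry_value, doc_id in index[start:]:
        if entry_value != value:
            break  # left the block of matching rows
        matches.append(doc_id)
    return matches


zip_index = build_index(restaurants, "address.zip")
print(lookup(zip_index, "94110"))

# Binary search halves the remaining range at every step, so the number
# of steps for 20 million entries is about log2(20,000,000), rounded up.
steps = math.ceil(math.log2(20_000_000))
print(steps)
```

The work done by `lookup` is one binary search plus one step per matching row, which is why the cost tracks the size of the result set rather than the size of the collection. The `steps` value works out to 25, in line with the figure quoted above.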
So, yes, this does mean that inserting or modifying documents takes a bit more time, because if our document contains, like, 20 fields, well, that's 20 indexes we would need to update every time we add a new document. But that also means my queries will continue to be fast, no matter how many documents I have. And you can kind of see this fits into the whole NoSQL philosophy of, hey, let's prioritize reads over writes, because it turns out, in most situations, reads happen way more often. And so every query right now in Cloud Firestore scales like this. Essentially, Cloud Firestore makes it impossible for you to run a slow query. And that's great, because it means you probably won't find yourself in a position where you suddenly have to, like, redo your entire backend architecture when you hit a million daily users. But it also means that all your queries have to follow this, hey, find a spot in the index, and then grab the adjacent records, kind of algorithm. So with that in mind, maybe you can start to see what kind of things we can look for in Cloud Firestore. Finding the restaurant Todd's Tacos? Well, that's easy. That's a binary search to find a specific string in the restaurant name index. Finding restaurants with an average rating greater than or equal to 4.5? Eh, that's also pretty easy. I can jump to my score value of 4.5, and then just grab all adjacent rows here in the index. Finding all restaurants with a rating in-between 4.5 and 4.7? Also pretty easy, right? Like, I jump to my score of 4.5 in the index, then grab all adjacent rows until I hit the first value greater than 4.7. Now, on the other hand, finding restaurants that contain the word "Taqueria" anywhere in its name, I can't do that natively in Cloud Firestore. There's no index that will give me that kind of information. There's no native pattern searching or regex searching or anything like that. 
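The "jump to a spot in the index, then grab adjacent rows" pattern from the rating examples above can be sketched in a few lines of Python. This is an illustrative model only; the ratings, document IDs, and the `range_scan` helper are invented for the example.

```python
import bisect

# Hypothetical rating index: (rating, doc_id) pairs kept in sorted order.
rating_index = sorted([
    (4.2, "doc_e"), (4.5, "doc_a"), (4.6, "doc_c"),
    (4.7, "doc_b"), (4.9, "doc_d"), (3.8, "doc_f"),
])


def range_scan(index, low, high=float("inf")):
    """Return doc IDs whose value is in [low, high], in index order."""
    # Jump straight to the first entry whose value is >= low.
    start = bisect.bisect_left(index, (low,))
    results = []
    for value, doc_id in index[start:]:
        if value > high:
            break  # first value past the upper bound: stop scanning
        results.append(doc_id)
    return results


at_least_4_5 = range_scan(rating_index, 4.5)
between = range_scan(rating_index, 4.5, 4.7)
print(at_least_4_5)
print(between)
```

Both queries touch only the rows they return, plus one extra row to detect the end of the range. A "name contains Taqueria" search has no such starting point in a sorted list of names, since the matching names can be scattered anywhere in the ordering.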
If you do want to do this kind of full-text search, there are third party options for you to explore. And maybe we can talk about those in a future video. I also can't do OR queries, because this also breaks the "find one point in the index and then grab all the adjacent documents from there" rule. You'd kind of have to do the work yourself to run both of these queries, grab all the information, and then merge the two sets of documents yourself on the client. Alternatively, if you know in advance that there's a specific OR query that you would want to use, you could add a value in the database that represents the OR'd value. For example, I can't search for "French" or "Italian" restaurants. But I could, say, add a European Cuisine field that would be true in either of those cases, and then do a query based on that field. I also can't do "not equal to" queries for kind of the same reason, or look at documents where a value doesn't exist. If it doesn't exist, it's just not going to be in the index, so, you know, I can't search for it. Another side effect of indexes, by the way, is that you're better off if you don't mix different types, like strings and numbers in the same field. I mean, it's totally possible to do this. That's the flexibility you get from having a NoSQL database. But as soon as you start to mix strings into your numeric field, you're going to end up with two indexes, and you'll have to do two searches-- one for your numeric values and a separate one for your text values. And that's almost never what you actually want. So if at all possible, try not to mix types on a field that you might be querying on. But now let's get into one of the interesting features of Cloud Firestore, and that's being able to query multiple fields at once. Say, for instance, I want to find all Japanese restaurants in San Francisco. 
Well, if you're querying through multiple fields, and all your conditions are equality searches like this one, Cloud Firestore can cleverly join these multiple searches and do it in a way that still scales proportional to your results set. And it does this through-- and this might be my favorite new algorithm name ever-- a zig-zag merge join. Basically, within our indexes, when we sort by a particular value, we do a secondary sort by the document ID. And that means that when we're doing an equality kind of query, we're guaranteed that all documents that meet this criteria will already be sorted by their document ID. And because of that, it ends up being really easy for Cloud Firestore to, like, bounce back and forth between these two lists and find documents that are in both of those lists, and then return them. So let's say my restaurant app has a bunch of Boolean values to represent flags I might want to record about each one, like, you know, takes reservations, kid-friendly, romantic, and so on. Well, since Boolean queries are pretty much always equality ones, I can create queries for "find all Japanese restaurants in San Francisco that are kid-friendly and take reservations." And this kind of query works just fine in Cloud Firestore and stays nice in [INAUDIBLE]. The trick comes when I want to introduce inequality searches, kind of greater than or less than kinds of queries. Suppose I want to find all restaurants in my zip code that have a rating greater than 4.6. Well, huh, I have this index for zip codes and this other one for ratings, but there's no easy way for me to intersect these two. These aren't sorted in a way that I can perform a zig-zag merge join. So what do I do? Well, in the Firebase Realtime Database days, the way you would do a query like this would be to make a custom "zip code concatenated with rating" field in your document. 
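The zig-zag merge join described earlier in this passage can be sketched as follows. This is a generic "leapfrog" intersection of sorted ID lists, written to illustrate the bouncing back and forth; it is an assumption about the general technique, not Cloud Firestore's actual implementation, and the document IDs are made up. Each list stands for the block of one index that matches an equality condition, already sorted by document ID.

```python
import bisect


def zigzag_merge_join(ids_a, ids_b):
    """Intersect two sorted lists of document IDs by leapfrogging."""
    results = []
    i = j = 0
    while i < len(ids_a) and j < len(ids_b):
        if ids_a[i] == ids_b[j]:
            results.append(ids_a[i])  # present in both lists
            i += 1
            j += 1
        elif ids_a[i] < ids_b[j]:
            # Jump list A forward to the first ID >= the current ID in B.
            i = bisect.bisect_left(ids_a, ids_b[j], i)
        else:
            # Jump list B forward to the first ID >= the current ID in A.
            j = bisect.bisect_left(ids_b, ids_a[i], j)
    return results


# Hypothetical equality matches from three single-field indexes.
japanese = ["doc_02", "doc_05", "doc_09", "doc_14", "doc_20"]
san_francisco = ["doc_01", "doc_05", "doc_07", "doc_14", "doc_15", "doc_21"]
kid_friendly = ["doc_05", "doc_14", "doc_20"]

japanese_in_sf = zigzag_merge_join(japanese, san_francisco)
japanese_in_sf_kid_friendly = zigzag_merge_join(japanese_in_sf, kid_friendly)
print(japanese_in_sf)
print(japanese_in_sf_kid_friendly)
```

The join only works because both inputs share the same secondary sort order (document ID). Rows matching an inequality such as "rating greater than 4.6" are ordered by rating first, so they cannot be fed into this kind of merge directly.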
This would basically be an entry you would maintain for every restaurant in your database that would take the zip code and rating and put them together in the same field. Now, by doing that, and then indexing this field, you could create an index where you can easily search for a specific zip code, and then from there, find anything of a certain rating. Now this worked, and again, was super fast. But it was also-- and this is a bit of a technical term-- a giant pain in the butt. Yes, it's a technical term. I looked it up. You should Google it. Nobody really wants to manage these combo fields, right? They're a hassle to build and keep up to date whenever you want to change a value. So how is Cloud Firestore different? Well, Cloud Firestore says, oh, I know, these combo fields, they're, like, such a pain, right? So let's fix that by making combo fields. Well, all right, it's actually more sophisticated than that. There aren't any actual extra fields in your database. These things only exist at the index level. And Cloud Firestore does all the work of building and maintaining these for you. It also has a much better name for them-- composite indexes. Now, it should be noted we don't or can't automatically create a composite index for every single combination of fields in your document. There's just sort of too many options. A document with just 20 fields would have-- let me think about it-- carry the 4-- eh, a little over 6 quintillion different combinations. And it turns out that updating 6 quintillion indexes is pretty processor-intensive, even for Google. So instead, you're going to need to tell Cloud Firestore what kind of composite indexes you're going to want to have available for your app. Now, there are two ways to do this. One way is to create these indexes manually through the Firebase console. And I guess that works if you consider yourself a composite index aficionado. 
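The "a little over 6 quintillion" figure above is not derived in the video. One way of counting that lands in that range, offered here as an assumption rather than the speaker's stated method, is to count every ordered selection of one or more distinct fields out of 20, since field order matters in a composite index:

```python
import math

FIELDS = 20

# For each index width k (1 to 20 fields), count the ordered ways to pick
# k distinct fields out of 20, then add the counts together.
total = sum(math.perm(FIELDS, k) for k in range(1, FIELDS + 1))
print(f"{total:,}")
```

The sum comes to roughly 6.6 quintillion, and that is before accounting for ascending versus descending choices on each field.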
But honestly, the way I usually recommend creating these things is to just run a query in your app that would require one of these composite indexes. Cloud Firestore will notice that the index isn't available yet to support this query, and it will give you an error in your Xcode, Android Studio, or browser console logs. Now, I know this sounds bad. But in the text of this error is a URL that will take you right into the Firebase console to generate exactly the composite index you need to run this query. Just click on the button that says, why, yes, I would like to create this index, and you're done. And so you can basically go through life just clicking on these links and never really understand anything more about composite indexes again. But personally, I kind of like understanding how these things work, because they can help explain some of the restrictions you'll run into when it comes to running queries on multiple fields like this. So stick with me. Let's look into these things a little further. Now, a nice general rule for creating composite indexes is to take the thing you're going to do the greater than or less than query on and put it last. For instance, I might want to know what restaurants in a specific zip code have a rating of 4.5 or more. So I would create a composite index that has the zip code first, and then the rating last. Keep this value sorted, and I can find any specific zip code and rating and start doing less than or greater than searches on the rating from there. And what if I want to find Japanese restaurants in San Francisco and also limit my results to those with a rating of or more? I could totally do that too, right? I just have to make sure I have an index that includes both the city and cuisine-- and honestly, the order of these first two doesn't matter too much-- and then put the rating last. So I can now run a lot of useful queries, thanks to these composite indexes. 
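The "equality fields first, inequality field last" rule above can be illustrated with a small Python model of a composite index. The sample data and the helper names `build_composite_index` and `query_eq_then_gte` are assumptions made for this sketch; the real index is built and maintained by Cloud Firestore, not by application code.

```python
import bisect

# Hypothetical restaurant documents keyed by document ID.
restaurants = {
    "doc_a": {"zip": "94110", "rating": 4.8},
    "doc_b": {"zip": "94110", "rating": 4.2},
    "doc_c": {"zip": "94103", "rating": 4.9},
    "doc_d": {"zip": "94110", "rating": 4.5},
    "doc_e": {"zip": "94117", "rating": 4.7},
}


def build_composite_index(docs, fields):
    """Sorted entries of (field_1, ..., field_n, doc_id)."""
    return sorted(
        tuple(data[f] for f in fields) + (doc_id,)
        for doc_id, data in docs.items()
    )


def query_eq_then_gte(index, zip_code, min_rating):
    """zip == zip_code AND rating >= min_rating, via one contiguous scan."""
    # Jump to the first row for this zip code with a high enough rating.
    start = bisect.bisect_left(index, (zip_code, min_rating))
    results = []
    for entry_zip, rating, doc_id in index[start:]:
        if entry_zip != zip_code:
            break  # walked past the block for this zip code
        results.append(doc_id)
    return results


zip_rating_index = build_composite_index(restaurants, ["zip", "rating"])
good_nearby = query_eq_then_gte(zip_rating_index, "94110", 4.5)
print(good_nearby)
```

Because the index is sorted by zip code and then by rating, every matching row sits in one adjacent block, and the results come back already ordered by rating. Reversing the field order to rating first would scatter the rows for a single zip code across the index.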
Note, though, that I can't do two inequality comparisons in the same query. Like, I can't look for restaurants with a rating of 4 or more and also a noise level of 3 or more, because there's no way I can make a composite index in a way that has all these results adjacent. Now, you can kind of see that here. Whether I have my rating first or my noise level first, there's no way I can get all my results grouped in a nice, adjacent block. Now, on the other hand, this composite index could still be used for sorting my results. For example, maybe I can just say, hey, give me all restaurants with a rating of 4 or more, but then sort those results, first by rating, and then by noise level. Well, that actually is something I could do with a rating and noise level composite index, because then it could just grab all adjacent rows, and they would be properly sorted. And that's something I haven't really talked about yet, which is that these composite indexes aren't just used for querying. They can also be used to order my results. Although, there are some restrictions around this. For example, if you're doing an inequality condition in your query, that's the field that you also need to order your results by. So, like, I could search for Japanese restaurants in San Francisco with a rating of 4 or more using a composite index, but then I can't get these results sorted by name, at least not directly from the query itself. Remember, my composite index is already sorted by city, cuisine, and then rating. And that's the order they're going to get pulled in. So if you want to sort them in a different way, you're going to need to do that on the client or something. On the other hand, if I want to find all restaurants in San Francisco sorted by cuisine and then rating, well, this index here is perfect for that. And so that's why when you're building these composite indexes, you're sometimes given the option of specifying all these fields as ascending or descending. 
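The point above about two inequality conditions can be checked with a small experiment. The Python sketch below uses invented ratings and noise levels and a hypothetical `is_contiguous` helper; it sorts the same rows both ways and tests whether the rows matching "rating of 4 or more and noise level of 3 or more" ever form a single adjacent block.

```python
# Hypothetical rows: (doc_id, rating, noise_level).
rows = [
    ("doc_a", 4.0, 1), ("doc_b", 4.0, 4), ("doc_c", 4.5, 2),
    ("doc_d", 4.5, 3), ("doc_e", 3.5, 5), ("doc_f", 5.0, 5),
]


def is_contiguous(flags):
    """True if all the True flags sit next to each other in one run."""
    positions = [i for i, flag in enumerate(flags) if flag]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


# Composite index orderings: rating first, or noise level first.
by_rating_noise = sorted(rows, key=lambda r: (r[1], r[2]))
by_noise_rating = sorted(rows, key=lambda r: (r[2], r[1]))


def both(row):
    return row[1] >= 4 and row[2] >= 3


two_inequalities_rating_first = is_contiguous([both(r) for r in by_rating_noise])
two_inequalities_noise_first = is_contiguous([both(r) for r in by_noise_rating])

# A single inequality on the leading field is one block, already sorted
# by rating and then by noise level.
rating_only = is_contiguous([r[1] >= 4 for r in by_rating_noise])

print(two_inequalities_rating_first, two_inequalities_noise_first, rating_only)
```

With this sample data neither ordering puts the doubly filtered rows in one block, while the rating-only filter does form one block whose rows are already in rating-then-noise order. That matches the distinction drawn above between filtering on two inequalities and merely sorting by a second field.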
I mean, if you think about it, it doesn't really matter how those first few fields are sorted from a querying point-of-view, because you can only do that greater than or less than search on the last one. So it almost feels like I don't care what direction these other fields are indexed in. But it sometimes can be helpful if you're like, oh, yeah, I think I'm going to want a query for all restaurants in a specific city, but sorted by rating in descending order and, like, noise level in ascending order. In that kind of query, I guess it does matter what direction those last two fields are sorted in. And, you know, hey, if these last three minutes of the video were, like, way too into the weeds for you and now you're like, ah, you've totally confused me and I don't know what kind of indexes to make anymore, don't worry about it. Seriously, don't worry about it. You'll be fine. It's OK. Just calm down. Just run the queries you want to perform on the client, and then follow the link that the library gives you in your debug output to create the perfect composite index. So there you go. That's the basics of querying and sorting within Cloud Firestore. I think once you get the hang of composite indexes, you'll find that there's a lot that you can do. And it is pretty awesome to see how fast those results come in. So I know we've covered a lot here. But there's plenty of other stuff to talk about, including more fun around data structures and pagination, all of which are great topics for future videos. But for now, if you'll excuse me, I lost my keys, and I'm pretty sure they are somewhere in my kid's room. I'd better start looking. As for the rest of you, I will see you soon on another episode of "Get to Know Cloud Firestore." OK, gang, show's over. You gotta go. Beat it. You don't have to go home, but you can't stay here. See ya. Nope, you too. [MUSIC PLAYING]
Info
Channel: Firebase
Views: 267,454
Keywords: Cloud firestore, firebase firestore, firestore, NoSQL, Queries, querying, composite indexes, GCP, google cloud, app developers, mobile app developers, cloud database, NoSQL database, client side development, server side development, mobile app, web developers, Firebase Realtime database, client apps, realtime listeners, responsive apps, cloud functions, firebase cloud functions, android, flutter, firebase, apps, app, iOS, firebase developers, GDS: Yes;
Id: Ofux_4c94FI
Length: 17min 16sec (1036 seconds)
Published: Wed May 30 2018