TODD KERPELMAN: So one of
the most important features of a good database
isn't just about being able to store stuff in there. It's also about being able to
find things when you need them. That's the difference between
a database and, like, my kid's bedroom. Ugh, what a mess. Now finding things
through most databases is done through a process
known as querying. And as we talked about
in the last video, NoSQL databases tend
to be pretty different than SQL databases in terms of
their querying capabilities. But there's still
a surprising amount you can do with querying
in Cloud Firestore, particularly if
you're coming here from Realtime Database land,
and it's all amazingly fast. So let's get started. [MUSIC PLAYING] So I'm going to start by saying
that this episode of "Get to Know Cloud Firestore"
has already been nominated as our "most likely to be out of
date by the time you watch it" video. You see, one of the
reasons we're all excited to be moving onto
the Google Cloud Platform is that it gives us the
ability to add more features onto the backend,
and that includes being able to add more
powerful queries in the future. So while I am describing
the state of the world at the time of this
recording, there's a very good chance
things will have gotten more sophisticated
after a few months. And in the meantime,
you can always view the documentation for the
most up-to-date information. But let's take a look
at the state of querying on Cloud Firestore
today, explain to you some of the current rules,
and then maybe explain why Cloud Firestore
works the way it does, so that these rules start
to make a little more sense. So first off, queries that you
run against Cloud Firestore can only be used
to find documents within one specific
collection or sub-collection. Let's take a look at my
hypothetical restaurant review app over here. Now, if I want to find all
restaurants in a specific zip code, for example, I
can totally do that. That would be a matter of
querying for specific documents in my restaurant collection. Similarly, if I wanted
to find all -star reviews for Todd's Tacos, I
could totally do that, too. I'd just be looking
for documents within the Reviews
sub-collection of this specific restaurant. Now, on the other
hand, if I wanted to find all four-star
reviews for all restaurants, I couldn't do that with
the architecture the way it's set up now,
because that would be trying to
conduct a query that spans multiple sub-collections. Whew, did I make it? What year is it? Perfect. Hello, YouTube viewers. Todd from 2019 here,
with an important update. This kind of query that
spans multiple collections is something called a
"collection group query." And, good news, it's something
that Cloud Firestore now supports. [APPLAUSE] To enable a collection
group query, you will go to the
Firebase console and tell Cloud Firestore
exactly which field you're going to want to search for
across what collection name. For example, if I wanted to
find all four-star restaurant reviews, I would go to
the Firebase console and declare that, for the
collection called Reviews, we should enable the field named
Rating with a collection group scope. This basically tells
Cloud Firestore, hey, I want you to index
the rating field of every document
in any collection I have called Reviews, as if
it were one giant collection. Then, I can go ahead and search
for those reviews by rating. Now, there's two important notes
about these types of queries. First, at the time
of this recording, you're limited to about
200 of these things. So only add them
for queries you know you're going to want to use. And second, note
that these queries look for all collections
of the same name, regardless of where they
appear in the database. So if I had a completely
unrelated collection elsewhere in my app also
called Reviews, those would be included in this index. So be careful about how
you name your collections. Oh and, uh, Todd from
2018, lose the beard. Your kids are going to hate it. OK, back to our
original content. I also can't do any fancy
SQL-like joins that query one collection in order
to grab information from a second collection. So, like, you couldn't say,
hey, fetch me the documents from the Users
collection for users who have written a review
for a specific restaurant. Or at least, you know, I couldn't
do that in a single query. Another rule? Queries you run
generally have to be based on equality or
greater than or less than comparisons of one or
more fields in the document. So, like, I can say, find me all
restaurants where the city is equal to San Francisco, or
find me all four-star or higher reviews from this
specific sub-collection. But I couldn't, for example,
query documents based on a calculation,
like, hey, find me all restaurants where the
rating divided by the price is greater than 1.5, unless,
I guess, I create an explicit, rating divided by price
field, and keep that up to date using something
like Cloud Functions. Third, and I mentioned
this in the previous video, but results you get back
from Cloud Firestore are shallow, meaning that if I
search for all restaurants that are within a
specific zip code, I will just get back
the documents that represent those restaurants. I won't get back all the reviews
in a sub-collection attached to that restaurant or responses
people gave for those reviews or health inspection reports
or anything else contained in any sub-collections. I only get back these documents. And if you're coming from
Realtime Database land, where this was not the
case, this is actually a really nice improvement. It means you can keep your
data stored hierarchically in a way that probably makes a
lot more sense to you logically and not have to worry about
downloading too much data whenever you perform
a simple query. On the other hand,
if you really do want to get back all
this data, all at once, from a query like
this, you would either need to make
multiple fetches or maybe not break
that information up into sub-collections. It's kind of up to you. Finally, and I guess this is
less of a rule and more just, "hey, something that's really
neat about Cloud Firestore," the time it takes
to run a query is proportional to the number
of results you get back, not the number of documents
that you're searching through. So if I want to find the top
five highest-rated restaurants in my zip code, that's going to
be an order R operation, where R is, like, the number of
results I'm requesting, meaning it will take the same
time to run whether I have 60 total restaurants to
search through, 60,000, or 600 million. So how does it do this? Well, basically, whenever
you add a document to a collection in the database,
Cloud Firestore automatically creates an index for every
field in that document. You've probably heard
of indexes before. They're essentially
a sorted list of all the values in the
field that we are indexing. Every entry in the index
records the value of the field and where that
corresponding document exists in the database. Now, when you have an
index sorted like this, it becomes incredibly fast to
find any particular value using something resembling
binary search, which means searching through a
database of 20 million records takes about, you know, 25 steps. And then grabbing the adjacent
rows in the index from here is quite easy. And I should point
out, Cloud Firestore does this not just
for every field, but also for every
field in every map that you add to the database. Maps, as you'll recall, are what
Cloud Firestore likes to call those little "JSON-y"
looking objects. So even if I store my
restaurant's address like this, Firestore will create
an address.zip index, and then I can still
search by zip code. So, yes, this does mean
that inserting or modifying documents takes a bit more
time, because if our document contains, like, 20
fields, well, that's 20 indexes we would need
to update every time we add a new document. But that also means my queries
will continue to be fast, no matter how many
documents I have. And you can kind
of see this fits into the whole NoSQL philosophy
of, hey, let's prioritize reads over writes, because it
turns out, in most situations, reads happen way more often. And so every query right now
in Cloud Firestore scales like this. Essentially, Cloud Firestore
makes it impossible for you to run a slow query. And that's great, because
it means you probably won't find yourself
in a position where you suddenly
have to, like, redo your entire backend
architecture when you hit a million daily users. But it also means
that all your queries have to follow this, hey,
find a spot in the index, and then grab the adjacent
records, kind of algorithm. So with that in
mind, maybe you can start to see what
kind of things we can look for in Cloud Firestore. Finding the restaurant
Todd's Tacos? Well, that's easy. That's a binary search
to find a specific string in the restaurant name index. Finding restaurants
with an average rating greater than or equal to 4.5? Eh, that's also pretty easy. I can jump to my
score value of 4.5, and then just grab all adjacent
rows here in the index. Finding all restaurants with a
rating between 4.5 and 4.7? Also pretty easy, right? Like, I jump to my score
of 4.5 in the index, then grab all adjacent rows
until I hit the first value greater than 4.7. Now, on the other hand,
finding restaurants that contain the word
"Taqueria" anywhere in its name, I can't do that natively
in Cloud Firestore. There's no index that will give
me that kind of information. There's no native pattern
searching or regex searching or anything like that. If you do want to do this
kind of full-text search, there are third party
options for you to explore. And maybe we can talk about
those in a future video. I also can't do OR
queries, because this also breaks the find one
point in the index and then grab all the adjacent
documents from there rule. You'd kind of have to
do the work yourself to run both of these queries,
grab all the information, and then merge the
two sets of documents yourself on the client. Ultimately, if you
know in advance that there's a specific OR query
that you would want to use, you could add a
value in the database that represents the OR'd value. For example, I can't search
for "French" or "Italian" restaurants. But I could, say, add a
European Cuisine field that would be true in
either of those cases, and then do a query
based on that field. I also can't do "not
equal to" queries for kind of the same
reason, or look at documents where a value doesn't exist. If it doesn't exist, it's just
not going to be in the index, so, you know, I
can't search for it. Another side effect of
indexes, by the way, is that you're better
off if you don't mix different
types, like strings and numbers in the same field. I mean, it's totally
possible to do this. That's the flexibility you get
from having a NoSQL database. But as soon as you
start to mix strings into your numeric
field, you're going to end up with two
indexes, and you'll have to do two searches--
one for your numeric values and a separate one
for your text values. And that's almost never
what you actually want. So if at all possible, try
not to mix types on a field that you might be querying on. But now let's get into one
of the interesting features of Cloud Firestore,
and that's being able to query multiple
fields at once. Say, for instance, I want to
find all Japanese restaurants in San Francisco. Well, if you're querying
through multiple fields, and all your conditions
are equality searches like this one, Cloud
Firestore can cleverly join these multiple
searches and do it in a way that still scales proportionally
to your result set. And it does this through-- and
this might be my favorite new algorithm name ever-- a zig-zag merge join. Basically, within
our indexes, when we sort by a particular
value, we do a secondary sort by the document ID. And that means that when
we're doing an equality kind of query, we're guaranteed
that all documents that meet this criteria will already
be sorted by their document ID. And because of that,
it ends up being really easy for Cloud
Firestore to, like, bounce back and forth
between these two lists and find documents that
are in both of those lists, and then return them. So let's say my restaurant app
has a bunch of Boolean values to represent flags I might
want to record about each one, like, you know,
takes reservations, kid-friendly,
romantic, and so on. Well, since Boolean queries
are pretty much always equality ones, I
can create queries for "find all
Japanese restaurants in San Francisco
that are kid-friendly and take reservations." And this kind of query works
just fine in Cloud Firestore and stays nice in [INAUDIBLE]. The trick comes when
I want to introduce inequality searches, kind
of greater than or less than kinds of queries. Suppose I want to find all
restaurants in my zip code that have a rating
greater than 4.6. Well, huh, I have this
index for zip codes and this other one for ratings,
but there's no easy way for me to intersect these two. These aren't sorted
in a way that I can perform a zig-zag merge join. So what do I do? Well, in the Firebase
Realtime Database days, the way you would
do a query like this would be to make a custom "zip
code concatenated with rating" field in your document. This would basically be an
entry you would maintain for every restaurant
in your database that would take the
zip code and rating and put them together
in the same field. Now, by doing that, and
then indexing this field, you could create an index
where you can easily search for a specific zip
code, and then from there, find anything of
a certain rating. Now this worked, and
again, was super fast. But it was also-- and this is a bit of
a technical term-- a giant pain in the butt. Yes, it's a technical term. I looked it up. You should Google it. Nobody really wants to manage
these combo fields, right? They're a hassle to build
and keep up to date whenever you want to change a value. So how is Cloud
Firestore different? Well, Cloud Firestore says,
oh, I know, these combo fields, they're, like,
such a pain, right? So let's fix that by
making combo fields. Well, all right, it's actually
more sophisticated than that. There aren't any actual extra
fields in your database. These things only exist
at the index level. And Cloud Firestore
does all the work of building and
maintaining these for you. It also has a much better name
for them-- composite indexes. Now, it should be noted we
don't or can't automatically create a composite index
for every single combination of fields in your document. There's just sort
of too many options. A document with
just 20 fields would have-- let me think
about it-- carry the 4-- eh, a little
over 6 quintillion different combinations. And it turns out that
updating 6 quintillion indexes is pretty processor-intensive,
even for Google. So instead, you're going to need
to tell Cloud Firestore what kind of composite indexes
you're going to want to have available for your app. Now, there are two
ways to do this. One way is to create
these indexes manually through the Firebase console. And I guess that works
if you consider yourself a composite index aficionado. But honestly, the way I usually
recommend creating these things is to just run a
query in your app that would require one of
these composite indexes. Cloud Firestore will notice that
the index isn't available yet to support this query,
and it will give you an error in your
Xcode, Android Studio, or browser console logs. Now, I know this sounds bad. But in the text of this error
is a URL that will take you right into the Firebase
console to generate exactly the composite index
you need to run this query. Just click on the
button that says, why, yes, I would like to create
this index, and you're done. And so you can basically go
through life just clicking on these links and never
really understand anything more about composite
indexes again. But personally, I kind
of like understanding how these things
work, because they can help explain some of
the restrictions you'll run into when it comes
to running queries on multiple fields like this. So stick with me. Let's look into these
things a little further. Now, a nice general rule for
creating composite indexes is to take the
thing you're going to do the greater than or less
than query on and put it last. For instance, I
might want to know what restaurants in
a specific zip code have a rating of 4.5 or more. So I would create
a composite index that has the zip code first,
and then the rating last. Keep this value sorted, and
I can find any specific zip code and rating and start
doing less than or greater than searches on the
rating from there. And what if I want to find
Japanese restaurants in San Francisco and also limit
my results to those with a rating of or more? I could totally do
that too, right? I just have to make
sure I have an index that includes both
the city and cuisine-- and honestly, the order
of these first two doesn't matter too much-- and then put the rating last. So I can now run a
lot of useful queries, thanks to these
composite indexes. Note, though, that I can't
do two inequality comparisons in the same query. Like, I can't look
for restaurants with a rating of
4 or more and also a noise level of 3 or
more, because there's no way I can make a
composite index in a way that has all these results adjacent. Now, you can kind
of see that here. Whether I have my rating
first or my noise level first, there's no way I can get
all my results grouped in a nice, adjacent block. Now, on the other hand,
this composite index could still be used
for sorting my results. For example, maybe I can
just say, hey, give me all restaurants with
a rating of 4 or more, but then sort those results,
first by rating, and then by noise level. Well, that actually is something
I could do with a rating and noise level composite index,
because then it could just grab all adjacent rows, and
they would be properly sorted. And that's something
I haven't really talked about yet, which is that
these composite indexes aren't just used for querying. They can also be used
to order my results. Although, there are some
restrictions around this. For example, if you're doing
an inequality condition in your query, that's
the field that you also need to order your results by. So, like, I could search for
Japanese restaurants in San Francisco with a rating of 4 or
more using a composite index, but then I can't
get these results sorted by name, at least not
directly from the query itself. Remember, my composite
index is already sorted by city, cuisine,
and then rating. And that's the order they're
going to get pulled in. So if you want to sort
them in a different way, you're going to need to do that
on the client or something. On the other hand, if I
want to find all restaurants in San Francisco sorted by
cuisine and then rating, well, this index here
is perfect for that. And so that's why
when you're building these composite indexes,
you're sometimes given the option of specifying
all these fields as ascending or descending. I mean, if you think about
it, it doesn't really matter how those first
few fields are sorted from a querying point-of-view,
because you can only do that greater than or less
than search on the last one. So it almost feels
like I don't care what direction these other
fields are indexed in. But it sometimes can be helpful
if you're like, oh, yeah, I think I'm going to want a
query for all restaurants, any specific city,
but sorted by rating in descending order and, like,
noise level in ascending order. In that kind of
query, I guess it does matter what direction those
last two fields are sorted in. And, you know, hey, if these
last three minutes of the video were, like, way too
into the weeds for you and now you're like, ah,
you've totally confused me and I don't know what kind
of indexes to make anymore, don't worry about it. Seriously, don't worry about it. You'll be fine. It's OK. Just calm down. Just run the indexes you want
to perform on the client, and then follow the link
that the library gives you in your debug output to create
the perfect composite index. So there you go. That's the basics of
querying and sorting within Cloud Firestore. I think once you get the
hang of composite indexes, you'll find that there's
a lot that you can do. And it is pretty awesome to see
how fast those results come in. So I know we've
covered a lot here. But there's plenty
of other stuff to talk about, including more
fun around data structures and pagination, all of which are
great topics for future videos. But for now, if you'll
excuse me, I lost my keys, and I'm pretty sure they are
somewhere in my kid's room. I'd better start looking. As for the rest of you,
I will see you soon on another episode of "Get
to Know Cloud Firestore." OK, gang, show's over. You gotta go. Beat it. You don't have to go home,
but you can't stay here. See ya. Nope, you too. [MUSIC PLAYING]
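
For anyone reading along rather than watching, the "find a spot in the index, then grab the adjacent records" idea from this episode can be sketched in a few lines of Python. This is only a toy, in-memory model under loud assumptions: it is not Cloud Firestore's implementation or its client API, and the sample data and the helper names (`build_index`, `range_scan`, `equality_ids`, `zigzag_merge_join`) are all hypothetical. It models a single-field index, a zig-zag merge join over two equality filters, and a composite index with the inequality field last.

```python
from bisect import bisect_left

# Hypothetical sample data: document ID -> fields.
RESTAURANTS = {
    "r1": {"city": "San Francisco", "cuisine": "Japanese", "rating": 4.7},
    "r2": {"city": "San Francisco", "cuisine": "Mexican", "rating": 4.5},
    "r3": {"city": "Oakland", "cuisine": "Japanese", "rating": 4.9},
    "r4": {"city": "San Francisco", "cuisine": "Japanese", "rating": 3.9},
    "r5": {"city": "San Francisco", "cuisine": "Japanese", "rating": 4.2},
}


def build_index(docs, *fields):
    """Return a sorted list of (field values..., doc_id) tuples.

    One field models a single-field index; several fields model a composite
    index. The document ID is always the final sort key.
    """
    return sorted(
        tuple(doc[f] for f in fields) + (doc_id,)
        for doc_id, doc in docs.items()
    )


def range_scan(index, low, high):
    """Binary-search to the first entry >= low, then read adjacent entries
    until one exceeds high. low and high are tuples of field values."""
    start = bisect_left(index, low)  # a shorter prefix tuple sorts first
    ids = []
    for entry in index[start:]:
        if entry[:len(high)] > high:
            break  # first value past the upper bound: stop reading
        ids.append(entry[-1])
    return ids


def equality_ids(index, value):
    """Doc IDs whose indexed field equals value, already sorted by doc ID."""
    return range_scan(index, (value,), (value,))


def zigzag_merge_join(a, b):
    """Intersect two sorted doc-ID lists by bouncing between them, using
    binary search to skip ahead instead of visiting every entry."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i)  # jump a forward to b's candidate
        else:
            j = bisect_left(b, a[i], j)  # jump b forward to a's candidate
    return out


by_rating = build_index(RESTAURANTS, "rating")
by_city = build_index(RESTAURANTS, "city")
by_cuisine = build_index(RESTAURANTS, "cuisine")
by_city_cuisine_rating = build_index(RESTAURANTS, "city", "cuisine", "rating")

# Rating between 4.5 and 4.7: one jump, then adjacent rows.
rating_between = range_scan(by_rating, (4.5,), (4.7,))

# Two equality filters: merge the two doc-ID-sorted slices.
sf_japanese = zigzag_merge_join(
    equality_ids(by_city, "San Francisco"),
    equality_ids(by_cuisine, "Japanese"),
)

# Equality on city and cuisine plus an inequality on rating: the composite
# index keeps every matching row in one adjacent block, sorted by rating.
sf_japanese_4_plus = range_scan(
    by_city_cuisine_rating,
    ("San Francisco", "Japanese", 4.0),
    ("San Francisco", "Japanese", float("inf")),
)

print(rating_between, sf_japanese, sf_japanese_4_plus)
```

Notice that the composite-index results come back ordered by rating, the inequality field, which matches the ordering restriction described in the episode: in this model the rows are simply read in index order, so any other sort would have to happen after the fact.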