SPEAKER: So as you recall
from our last video, there are a number
of rules governing how Cloud Firestore works
that all play a part in how you should structure your data. And while it's fine to talk
about these rules in theory, I think sometimes
it's better if we can put these in the
context of some real-world or close to real-world examples. So come on with me. Let's figure out the best
way to structure our data on this episode of Get
to Know Cloud Firestore. [MUSIC PLAYING] So for the purposes
of this video, let's keep pretending
that I've got this great restaurant-review
app that I'm powering through Cloud Firestore. Maybe it's got a
search page where you can search for the
top-20 Japanese restaurants in Boston, stuff like that. And from the Search
Result screen a user could click
on a restaurant to see details about it. This Detail screen
would have more info about the restaurant along
with maybe a few snippets from its most recent reviews. We'd have an option for a
user to see more reviews, and if they click on any
of these review snippets we would show them the
full and complete review, so pretty standard. Let's tackle a few
questions around how we want to store our data. For starters, where
would we put the reviews? When I've used this
example in previous videos, I've generally assumed we've had
this kind of database design. We've got the restaurants
in one collection and all the reviews are in
individual subcollections. But is this actually
the right design? Let's look at a couple
of alternatives. Alternative number one,
let's not make these reviews separate documents. We can stick all the review
data into the main restaurant document as a map
or an array of maps. Once a user clicks
on a restaurant, I could give them these
little review snippets immediately because I've
already got them loaded up. And hey, as a nice
little side effect, I've also saved myself a
bunch of document reads. So this is an alternative,
but maybe not a very good one. One problem with
keeping our reviews in a map field like this
is that when a user decides to search for the top-20
Japanese restaurants in their city, we're not only
grabbing all the information about the restaurant
but we're also grabbing every single review for
every restaurant on that list. And that is a bunch of
reviews and, frankly, a whole lot of data that they
will probably never need. I'm also going to
run into limits, either the 1 meg of data limit
or the 20,000-field limit if these restaurants
get too many reviews. I also can't query these
reviews either, right? I can't filter for just the 10
most recent reviews or anything like that. I only get back my giant
restaurant document with all the reviews
in there, and if I want to sort those reviews
or filter them or something, I would need to do
that on the client. So I think keeping reviews in
their own separate collection is the right way to
go, but let's look at our second alternative which
is what if rather than keeping these reviews as a subcollection
for each restaurant they were just in a completely
separate top-level collection? Well, now this gets interesting. Your first impression might
be like, oh, don't do that. It's going to be a lot
harder to get reviews for a single restaurant. But that's not the case. Remember, query speed
in Cloud Firestore depends on the size of the result set, not the size of the collection, so grabbing
all reviews for Todd's Tacos from a single collection
of 3 million reviews takes essentially the
same amount of time as just grabbing all the reviews
from a single subcollection. More importantly, by
putting everything into a top-level
collection like this, I can do searches
for reviews that span different restaurants, like
I could search for all reviews by an individual author. That's not something I
could do by keeping them in separate subcollections
like this, at least not until collection-group
queries get supported. And so if I went with the
subcollection strategy and I knew I was
going to have to find all reviews by a
single user, I'd need some kind of
workaround that I would be doing in advance. Probably what I'd do is over
in the user collection I'd keep a list
of all the reviews that this user has
written and then use Cloud Functions to keep
all this data up to date. Sure, it's not the
end of the world, but none of that
extra work would be needed if reviews were just
in another top-level collection like this. On the other hand,
if I have everything in a big top-level
collection, some things are a little harder. Specifically doing
something simple like grabbing reviews for
a restaurant sorted by date or sorted by score would
require a composite index, whereas I wouldn't need that
if these reviews were already in their own subcollection. Also, I suspect a
number of security rules will be easier to write when
I'm dealing with subcollections of a specific document instead
of a top-level collection, mostly in cases where you're
going to want to restrict access to a subcollection
based on information that's in their parent document. But really, either one of
these solutions is fine. I've seen them both
used in the wild. But if I'm going to be
totally honest with you, the main reason I went with
reviews as a subcollection is that it makes it easier
for me to emphasize the whole, hey, look, Firestore
supports shallow queries thing. So I'm going to keep using this
one for demonstration purposes, but you feel free to
do what's best for you. Now as long as we're here, let's
talk about one further refinement. Remember that in my
Restaurant Details page I've got my restaurant info
and then this "snippets from 10 recent reviews"
section at the bottom. And so with my current
database structure, I'm either populating
these review snippets from a review subcollection
or my reviews top-level collection. And that's fine, but that
does mean bringing up a single restaurant page is now
11 document reads instead of 1. And yes, I will get
billed for those. And if you consider that
a typical user might click on some restaurant
details, skim the reviews, go back and repeat that
process several times, that can add up
to a lot of reads. And so if I really wanted to
avoid all those extra reads, I might look at creating
a review snippets field in the original
restaurant document. This could contain the
first hundred words of the 10 most recent reviews,
maybe a headline and the author name-- basically enough to create
a presentable review snippet but not so much
that I'm downloading a ton of unnecessary
data for each restaurant. And again with some
Cloud Functions, which I promise I will talk
about in a future video, I can keep those in sync. Now this is nice because
I'm now only reading in 1 document instead of 11,
and it simplifies the loading of this page because I only
need that one document which, most of the time, will
already be in memory. Plus if I ever do need to show
more reviews or the full text of one of those reviews,
then I can read those in from the individual
review documents. But is this optimization
worth the extra code needed to keep
these values in sync along with the extra
data our user is loading in every time they
bring in a restaurant? Honestly, I'm not sure. Viewing restaurant details
is a user-powered action, which means it's not going
to happen several times a second, right? It will more often
happen a few times over the course of a
several-minute session. So this could be a case where
you end up doing a bunch of work to save a million
reads a day, or $0.60, which may or may
not be worth it. So I think this is one of
those situations where you're going to have to measure what's
actually happening in your app and decide accordingly. I know that's kind of a
wishy-washy recommendation, but that's software
development for you. OK, next example-- how would
we store flags for restaurants? And I'm talking about
little binary values for takes reservations,
is romantic, and so on. Put those away. Two months ago, I would
have said the best way to store these would
have been either to make these as individual
fields inside the restaurant document or group
them into a map field. And to be honest,
these are still perfectly reasonable solutions. But now that we've added
the array_contains operator, I could just as easily put
these into an attributes array. Then it would be pretty easy to
search for all restaurants that take reservations. But keep in mind
that right now you can't run two array_contains
queries on the same array. So if I really do want to look
for restaurants that are both romantic and kid friendly--
which is kind of an oxymoron, I know-- I should keep using
separate fields. Should have read
the reviews first. OK, now here's a
fairly common case you're going to come across. What about storing
a list of users who can access a document? Now this is a very
common piece of data you're going to want to
keep track of, particularly in collaborative apps
where users can invite their friends to view
or edit their data, and you're going to need to
keep track of who can edit what. For example, let's
say a restaurant has a list of editors, and
these are the user IDs of people who are allowed to
edit the restaurant document. Now I know I haven't really
talked about security rules yet. That will be in the next video. But one thing you should
know about security rules is that they have the ability
to grab a specific document and read its content. So I could, for
instance, put everybody who's allowed to edit
a particular document into an array, and
then my security rules could say something
like, hey, only let users edit this document
if their user ID appears in this array. But what could work better
would be to use a map here where a user ID is the
key and their value is their role, something like
editor, owner, community manager, et cetera. And so we could
write a security rule that says a user can
edit this document if document.roles.userID equals
editor or something like that. Now this is great, but it
does have one small issue in that when you
retrieve this document, all the contents
of this document are transferred along with it. Remember there's no such
thing as a partial document query on the client. So you're kind of
leaking the list of users and their roles for
this restaurant. Now this may be OK. These user IDs
are pretty opaque. You can't really reverse
engineer them to figure out who the underlying
user really is, and they are unique per app. But still, the
phrase this may be OK isn't super reassuring when
we're talking about security. So this information
might be better stored outside of this document
in one of two ways. Option one, have a distinct
editor subcollection where the document ID might
be the user ID of the user, and maybe we store
this person's role and some other little
information about them inside each document. Or option number two,
just have a subcollection called private data that
contains only one document. And this is where
you can keep your "who has access to what" map
that we talked about earlier along with other information
that you might not want to share with
the general public-- maybe data for your
internal sales team or data that Cloud Functions will
need to perform its duties. Now both of these examples
work fine, but definitely think about this
kind of situation for your app because
you will probably want to create access
control lists like this and you will probably have data
associated with the document that maybe you don't
want everybody to see. OK, last example, but maybe
the most important one. What if my user
wanted to store a list of their favorite restaurants? This is the kind of
many-to-many relationship that often is problematic
in NoSQL databases. So first, let's assume that
our database has some kind of users collection somewhere. Well, one option is we can just
store a list of restaurant IDs as an array in a favorites
field for each of these user documents. Thanks to our new array
union and array remove methods, it should be pretty easy
to maintain this list as our user adds or
removes restaurants from their favorites. If I load this array into memory
when our application first starts up, my client
could very easily tag a restaurant as a favorite
when we see it in the app. The problem, though, is
that it's hard for us to do a "hey, here's all
your favorite restaurants" kind of page using
an array like this. I would basically have
to do a separate query for every single restaurant ID
that I get back from this array in order to populate this page,
or maybe slightly better, get a callable Cloud Function
to do this work for me, and I'll talk about
that in a later video. Now this isn't a
great experience, but it's not terrible either. And if I think
this feature isn't going to be used
very often, then this would be a perfectly
fine solution. Don't let this scare you away. On the other hand,
if this is going to be a frequently
used feature, there may be ways of making this
a little easier and more performant by using
some denormalized data. For example, instead
of just storing a list of restaurant IDs,
I can keep a big old map of maps that would contain
enough data to populate a simple My Favorite
Restaurant screen. Maybe I have the restaurant
name, cuisine type, and address stored in this favorites field. That might be enough to
populate the My Favorite screen. And if the user ever wanted more
information about a restaurant, well, then I can query the
full restaurant document and populate a Restaurant
Detail screen just like normal. But the trick with
denormalized data is that we need to make sure
that if Todd's Tofu Hut ever changes their cuisine
from Japanese to burgers, every copy of that
denormalized data has to change as
well, which means we need to be able to
query every user who has this restaurant listed
among their favorites so we can make that change. As it turns out
though, we can do that. I can basically create a
query that says something like select all
users where favorites dot restaurant 4215 is
greater than the empty string. That would give me a list
of all user documents where this restaurant is listed
as a favorite in the map field somewhere, and it would be
fairly easy to have a Cloud Function update all those. Now the disadvantage
is that this is a big chunk of extra data
that I'm loading per user, and we are eating
away at my 1 meg slash 20,000 fields limit here. So I'm going to need to limit
the number of restaurants my user lists as a favorite. Although honestly, that
kind of makes sense. If you list 500 restaurants
as your favorite, hasn't the word favorite
lost all meaning? This got kind of philosophical. Now another option might be to
keep my user's favorite list as an array of IDs and then
keep the restaurant snippets inside a subcollection. I could then load a list of
the user's favorite restaurants in memory without loading
all the extra restaurant details along with
them, but they're also easy to load up if I ever need
to show this favorite screen by querying the subcollection. And I can still
find a list of users who have favorited a particular
restaurant by running an array_contains query
across my users collection. Now once I have that,
it's fairly easy to get at the individual
snippets that I need to change, although you'll notice I can't
do it all in one single query now. I basically have to fetch
these documents one at a time. Also, just a heads up, if you
have an array that you're going to use with an
array_contains query, those array elements
will still count against your 20,000-field limit. So I'm still going
to need to limit my user's favorite restaurants
to a reasonable number. But you know what option
might work best here-- a completely separate
top-level collection. Imagine that I have a
FavoriteRestaurants collection. Every document in there would
contain a user ID, a restaurant ID, and enough of a snippet
about the restaurant to populate a My Favorite
Restaurants page. Now grabbing all the restaurants
favorited by a specific user is a very simple query, so
creating that My Favorites page is very straightforward,
and it's just as easy to grab all the instances
of a particular restaurant if I ever need to change or
update this denormalized data. So you know what? I'm going to say this is my
favorite option, at least for this particular setup. Whew. All right, well that was a
lot of examples to go through, but I'm hoping by now you're
getting a better sense of when to use documents,
when to use maps, when to use subcollections,
when to use arrays, and when to put things in a
separate top-level collection. Now there's always
more to explore here, and as you've noticed,
there's rarely one simple answer to anything. There will always be
trade-offs no matter what option you choose. So hey, if you have a
setup you like better than any of these,
go ahead and share it in the comments below. And I will see you on
a future episode of Get to Know Cloud Firestore. Well, I think that's a wrap. Are you all hungry? Let's go to Todd's Tofu Hut. I hear they serve burgers now. No, no, meat burgers. I don't know why they
didn't change the name. Ask them. No, it's a different Todd. Come on, let's just go. [MUSIC PLAYING]
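
To put numbers on the review-snippets trade-off from this episode, here is a back-of-the-envelope sketch. The only figure taken from the episode is "a million reads a day or $0.60"; the traffic number and the function name are hypothetical, and real pricing may differ by region and over time.

```python
# Assumption from the episode: one million document reads cost about $0.60.
PRICE_PER_READ_USD = 0.60 / 1_000_000


def daily_read_cost(page_views_per_day: int, reads_per_view: int) -> float:
    """Estimated daily cost in USD of the document reads behind one screen."""
    return page_views_per_day * reads_per_view * PRICE_PER_READ_USD


# Hypothetical traffic: 100,000 Restaurant Details views per day.
views = 100_000

# Without a snippets field: 1 restaurant document + 10 review documents.
without_snippets = daily_read_cost(views, 11)

# With a denormalized snippets field: just the restaurant document.
with_snippets = daily_read_cost(views, 1)

savings = without_snippets - with_snippets
print(f"saved per day: ${savings:.2f}")
```

At that hypothetical traffic level the snippets field saves a million reads a day, which is the $0.60 mentioned above. Whether that is worth the extra Cloud Functions code is something to measure in your own app.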
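
The roles map from the access-control example could look something like the sketch below. This is plain Python with dicts standing in for a document, not security rules syntax and not the Firestore SDK. The user IDs, the `EDIT_ROLES` set, and the helper names are all made up for illustration, and treating "owner" as an editing role is an assumption.

```python
from typing import Optional

# Hypothetical restaurant document, modeled as a plain dict.
# The roles map uses the user ID as the key and the role as the value.
restaurant = {
    "name": "Todd's Tacos",
    "roles": {
        "uid_alice": "owner",
        "uid_bob": "editor",
        "uid_carol": "community manager",
    },
}

# Assumption: owners and editors may edit; other roles may not.
EDIT_ROLES = {"owner", "editor"}


def role_of(doc: dict, user_id: str) -> Optional[str]:
    """Return the user's role on this document, or None if they have none."""
    return doc.get("roles", {}).get(user_id)


def can_edit(doc: dict, user_id: str) -> bool:
    """Mirror of a rule like 'allow edits if roles[userID] is an editing role'."""
    return role_of(doc, user_id) in EDIT_ROLES
```

Compared with a flat array of editor IDs, the map lets one lookup answer both "is this user listed?" and "what are they allowed to do?". The same map could live in a private data subcollection document if you would rather not send it to every client.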
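
Finally, a rough model of the separate top-level FavoriteRestaurants collection. It is an in-memory list of dicts, so the two list scans stand in for what would be two simple equality queries in Cloud Firestore. Field names such as `user_id`, `restaurant_id`, and `snippet`, along with the IDs and helper functions, are assumptions for illustration only.

```python
from typing import Dict, List

# In-memory stand-in for a top-level FavoriteRestaurants collection.
# Each document pairs one user with one restaurant, plus a small snippet.
favorite_restaurants: List[Dict] = [
    {"user_id": "uid_alice", "restaurant_id": "restaurant_4215",
     "snippet": {"name": "Todd's Tofu Hut", "cuisine": "Japanese"}},
    {"user_id": "uid_alice", "restaurant_id": "restaurant_77",
     "snippet": {"name": "Todd's Tacos", "cuisine": "Mexican"}},
    {"user_id": "uid_bob", "restaurant_id": "restaurant_4215",
     "snippet": {"name": "Todd's Tofu Hut", "cuisine": "Japanese"}},
]


def favorites_for_user(docs: List[Dict], user_id: str) -> List[Dict]:
    """Everything needed for one user's My Favorites page."""
    return [d for d in docs if d["user_id"] == user_id]


def update_snippet(docs: List[Dict], restaurant_id: str, changes: Dict) -> int:
    """Apply changes to every denormalized copy of a restaurant's snippet.

    Returns the number of favorite documents that were updated.
    """
    updated = 0
    for d in docs:
        if d["restaurant_id"] == restaurant_id:
            d["snippet"].update(changes)
            updated += 1
    return updated


# Todd's Tofu Hut switches from Japanese to burgers.
changed = update_snippet(favorite_restaurants, "restaurant_4215",
                         {"cuisine": "Burgers"})
alice_favorites = favorites_for_user(favorite_restaurants, "uid_alice")
```

Both directions, "favorites for this user" and "every copy of this restaurant", are a single filter on one field, which is what makes this layout attractive for the denormalized snippet data.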