SPEAKER: So if you've
been following along with this video series, maybe
you've noticed something here. Cloud Firestore
consists of a bunch of documents,
which are basically key-value pairs, as well
as collections, which are collections of documents. But then, within
these documents, we let you store maps,
the JSON-y looking things, that also kind of work
as key-value stores-- you know, like a document. Not only that, but
you can also store arrays, which look
suspiciously like collections. So at some point, you might
start asking yourself, well, why do we even have
separate documents when we could just, say, put
all of our data into these maps? Why create a subcollection
when I could just create an array of maps,
or even a map of maps? When is it appropriate to
use one versus the other, and what would work
best in my app? Good questions. Let's see if we can find some
answers on this episode of "Get to Know Cloud Firestore." [MUSIC PLAYING] So when it comes to
determining the best way to structure your
data in Cloud Firestore, it's helpful to
remember some rules about how Firestore works. Let's go over some of these now. All right. Rule number one,
documents have limits. So there are some limits when
it comes to how much data you can put into a document. For starters, you're
limited to one meg total of data in a single document. And I know that, in
our world of, like, high-res, 16-megapixel
photos, one meg seems tiny. All right. Cheese. Whew. Oh, you got to
turn the flash off. But when we're talking
about text and numbers here, and we generally
are, that's still a lot of data you can fit
into a single document. For example, the complete text
of "A Tale of Two Cities," a 300-page book from an author
who was paid by the word, is only about 770k. You could fit all of that
into a single document, and still have room for a good
"Sherlock Holmes" mystery. Also, another rule
around documents-- and this will come
into play later-- is that they can't have more
than roughly 20,000 fields. And by the way, this also
includes fields within the maps that you have in your documents. So this is six fields,
and this counts as seven. One for the overall
address field, as well as one for each individual
element in that address map. And this would count as nine. And one big reason
for this limit is that Firestore
is creating indexes for any field you have here
in your document-- even, yes, inside a map. Basically, from
Firestore's point of view, my document here has a
field called address.city, and it's going to index it. And even if you don't hit
this 20,000 field limit, having documents with
a ton of fields in them can impact your
write performance. Because Cloud
Firestore needs to update every single one
of these indexes when you create
another document, make a change, and so on. One other rule
around documents is that you're generally limited
to one write per second on the same document. So if you've got a situation
where, like, a lot of clients are all trying to write to
the same document all at once, you might start to see
some of these writes fail. Now, your client libraries
are generally smart enough to retry these writes with
some exponential back-off if that starts
happening, but it's still a situation you want to avoid. On the other hand,
writing to lots of different documents
in the same collection, that's generally fine. Go crazy. OK. Let's move on to
rule number two. You can't retrieve a partial
document, at least not using the client SDKs. So just because you can
put "A Tale of Two Cities" into a document doesn't
mean you should. See, when Cloud Firestore
retrieves a document, you're retrieving all the
data in that document. You can't retrieve partial
data from a document using the client SDKs. So imagine I have my collection
of Charles Dickens books, and my documents
have a title field and a "the entire contents
of the book" field. Then I decide, hey,
you know, maybe I just want my app to show a
list of what Charles Dickens books are in my collection. So I query this collection. Well, hey, guess what? In addition to
grabbing the titles, I'm also grabbing
the entire content of all the stories,
which means I'm now downloading approximately 38,000
times more data than I actually need. And that's bad, right? This is something your
user is going to notice. This is going to affect your
app's performance, its data usage, and its
battery consumption. And this tends to put
your app in a state I like to call getting uninstalled. Sorry, buddy. I can't keep it around. It's killing my data plan. This also has implications when
it comes to security rules. You can't create a
security rule that gives you access to some parts
of a document and not others. If you want to keep part
of a document secret, you'll need to place
those secret parts into a separate document. And I'll talk more about that
in our video on security rules. OK. So now you're thinking, well,
gosh, clearly the answer is to break my data up
into as many tiny documents and as many tiny
subcollections as possible. Well, hang on there. Because we have rule number
three, queries are shallow. So you probably
already know this, but when you grab a document,
you don't grab any of the data in any of these subcollections. And this is generally
a good thing, right? It means you can store
your data hierarchically, in a way that feels
logical, without having to worry about grabbing
more data than you intended. So if I took that content from
my "entire text of the books" field, and moved it down
into a chapter subcollection, well, now, I can grab my
list of Dickens books titles without having to grab all
the text associated with them. However, there's a
flip side to this, which is if you think
you're almost always going to be showing data together,
it might not make sense to put it into a subcollection. For example, maybe I decide to
put my list of main characters into a character subcollection. In many applications,
this might make sense. But if, in my app, it
turns out that whenever I'm showing the
title of a book, I'm also going to be showing a
list of all the characters, well, now I can't grab all that
data I need in a single query. I now have to get
the book and then get a list of all the characters
that are in that book by querying each
book's subcollection. And this is going to make
populating a list like this a lot more complicated. So if this ends up being
the look and feel of my app, well, maybe I don't want to put
this data into a subcollection. I can keep it as part
of my main document. There's also another reason
this setup may be bad, which is rule number four. You're billed by the number of
reads and writes you perform. The other disadvantage of
putting these characters in a subcollection
like this is that-- and again, this
is only assuming I want to grab the documents
in this subcollection every single time I grab
this main document-- I've increased my reads here,
from one read to several, depending on how many
characters I want to read in. Now, how much this
matters is probably a factor of how often you'll
be making queries like this. I mean, if this structure
means your app is going to be making 200,000
extra requests per day, well, you know, that's
only roughly $0.12 a day. On the other hand,
if your app is making 200 million extra requests
because of this structure, well, then that's $120 a
day and probably something worth optimizing. Like I said in the last
video, I'm not always a fan of premature optimization
for billing purposes, particularly if it
comes at the expense of more complicated
code, but it's definitely something to be aware of. OK. Rule number five,
queries can only be used to search for
specific documents within one collection. And they do so by looking
for specific fields that match certain criteria. Hang on, Todd from 2018. Todd from 2019 is gonna
cover this rule. Lemme take it from here.
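(A quick aside to make rule number five concrete. This is only a minimal sketch in plain Python, not the Firestore SDK. The `where` helper, the sample characters, and their occupations are all made up for illustration. It treats each book's character subcollection as a plain list of documents, and shows that a query only ever looks at the fields of the documents in one collection.)

```python
import operator

# Hypothetical comparison operators for this toy model (not SDK names).
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<": operator.lt,
}


def where(docs, field, op, value):
    """Return the documents in ONE collection whose field meets the condition."""
    return [doc for doc in docs if field in doc and OPS[op](doc[field], value)]


# Each list stands in for one book's "characters" subcollection (sample data).
great_expectations_characters = [
    {"name": "Pip", "occupation": "apprentice"},
    {"name": "Pumblechook", "occupation": "merchant"},
    {"name": "Estella", "occupation": "socialite"},
]
oliver_twist_characters = [
    {"name": "Oliver", "occupation": "orphan"},
    {"name": "Losberne", "occupation": "doctor"},
]

# "Starts with P" written as a range on the name field: "P" <= name < "Q".
starts_with_p = where(
    where(great_expectations_characters, "name", ">=", "P"), "name", "<", "Q"
)

# An equality match on the occupation field, within a single subcollection.
doctors = where(oliver_twist_characters, "occupation", "==", "doctor")

print([c["name"] for c in starts_with_p])
print([c["name"] for c in doctors])
```

(Writing "starts with P" as a range between "P" and "Q" is one common way to phrase it. Treat it as an assumption of this sketch rather than the only option.)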
So on this side, I have my Charles Dickens characters
stored as a subcollection for each book.
I've included their name and an occupation as two
different fields. On the other side, I have the same list of characters stored as a map field
directly inside the book document, with maybe the character
name as the key and their occupation
as the value. Now, these two structures let
me run very different types of queries. With my characters
in a subcollection, I can say, hey, show
me every character in "Great Expectations" with
a name that starts with P. Or every character in "Oliver
Twist" who's a doctor. And if I wanted to say, "Hey,
show me all characters in all books named Oliver," I can now do that with
a Collection Group Query. Now, this would require that I set up a
Collection Group index for the name field in the Firebase Console, and
do keep in mind you're limited to about 200 of these. On the other hand, what if I
put characters in a map like this? Well, I could kinda say, "Hey, show me
every book that has a character whose name is Oliver." And I would do that by
looking for a field called characters.Oliver with a
value that's, like, greater than an empty string. But I'll be honest, this is kinda weird
in that I'm relying on there being a field in a map
with a key named in a particular way, and that seems all sorts of error prone to me. In addition, I wouldn't just
be returning character data here. I'd be returning information
about the entire book that contains this character, which
may or may not be what I want. And I wouldn't be
able to do queries like, show me every character whose
occupation is socialite or show me all characters sorted by name. So my search options are a lot more limited
here. Yeah, I'm pretty sure socialite is an
occupation. It's just that in 2019, we like to call them
"Social media influencers." Now a third option is to put
your characters in a top-level collection. This was a nice workaround before
Collection Group Queries were supported, because it
lets me search for, say, characters by name, or all characters
in a particular book, and do that search across all characters
in all books. And if you'll recall
from our earlier video, queries in Cloud
Firestore are very fast. So this "Show all characters
in Oliver Twist" query in the top-level collection
really isn't any slower than grabbing all
the documents from a subcollection. I think the biggest
drawback here is that if I wanted to, say,
search for all characters in Oliver Twist sorted by name,
well, now that requires a composite index because it
is now a multi-field search and, you'll recall, I'm limited
to 200 of those. So if you think you're gonna
be searching for individual records from a group of data,
you should put them into a collection. And you can use
either a sub-collection like this, or a completely separate
top-level collection. As a general guideline, I'd
probably say, store data hierarchically if you
think you're mostly going to be searching for
items per subcollection and only occasionally
want to do a collection group query. And put 'em into a top-level
collection if you think you're mostly going to be searching
across all documents and only occasionally want
to do, say, a per-book query. And putting your data into a map
like this, it's not really useful from a querying standpoint. This would more likely be
useful if you're just storing data you know you're always
going to want to retrieve with the parent document. All right, back to you, Todd
from 2018! Whoa... too far! (Grunts) This might make
more sense when we get into some more
examples in the next video, so stick with me
if you're confused. We have one more
rule to get through, and that's rule number six,
arrays are kind of weird. So there's a really
good blog post out there on why arrays are evil,
but basically, a lot of traditional array
operations don't work very well in a service like
Cloud Firestore. Because when you've
got data that could be altered by
multiple clients, it's very easy to
get confused as to what edits are
happening to what field. At least in a map, if I have
several clients all trying to edit different fields, or
even the exact same field, I generally know what's
going to happen, right? I'll end up with a
map that has fields that are equal to that of the
most recent client update. But in an array,
if one client is trying to edit a
value at index two, and another client tries to
delete the value at index two, and another client tries
to insert a value somewhere at index zero, I'm going to
have very different results, and possibly out
of bounds errors, depending on what order these
instructions are received in. So for that reason,
Firestore's actions with arrays are a bit different than
what you might expect. For instance, you can't perform
actions like inserting, modifying, or deleting an element of an
array at a specific position. And you can't run a query
on the element contained at position 2 in the array. So are they useless? Well, no, not at all. By the time you see
this video, the team will have added
some operations that do make working with
arrays a lot more useful. See, it turns out,
a lot of developers don't really care
about the exact order that they're storing
things in an array. They just want to use
arrays as a simple way to contain a bunch of flags. For instance, maybe I
have a bunch of keywords that I'm storing for each
of my Charles Dickens books. So what do I do if I want to
find all of my dramatic books, or books that
involve war or crime? Well, we now have an
array-contains querying feature that will search for
documents in a collection where an array contains
a certain value. So I can keep my keywords
in an array like this and say, hey, let's
find all dramatic books by looking for books
where keywords, array-contains, drama. So if you're wondering
how this works, you can kind of think of
it like behind the scenes, Cloud Firestore
converts this array into a map, where all
the values of the array are fields in this map. And like a typical map,
every one of these fields gets placed in an index. So asking for books where
keywords, array-contains, drama is kind of like asking for books
where our imaginary keywords.drama is equal to
true. Along those lines, Cloud Firestore also added some
new features to add or remove specific values from arrays,
as long as you don't really care about the exact
position of them. So editing this
array of keywords is a lot easier now, and doesn't
run into some of those problems I talked about earlier. So go check out
the documentation if you want some
more info about that. So that's a lot of rules. Can we turn these rules and
restrictions into guidelines? Well, let's see. For starters, put data
in the same document if you're always going
to display it together. You generally want
a happy medium in the size of your documents. Don't make them so big that
you're downloading more data than you need, but
don't make them so small that you're downloading
30 documents from two different levels of
the database just to fill out one
screen in your app. Put data in
collections when you're going to want to search for
individual pieces of that data, or if you want your data
to have room to grow. On the other hand, leave
them as a map field if you're going to want
to search your "parent" object based on that data. Another good time to put
data into a map field is just if you want to put
related data closer together for organizational purposes. For example, take
a typical address. Sure, I can make each of
these fields top-level values in my document, but I think it
looks a lot nicer if they're put into a map like this. I can still query
them just fine, and actually reduce the risk of
key naming conflicts later on in my document. Vectors in games might also
be another good candidate for maps, something like this. And if you've got items that you
would generally use as flags, go ahead and stick
them in arrays. So that's an awful lot
of rules and guidelines, and maybe talking about these in
the abstract is enough for you to get started. But personally,
I'm always happier if I can talk about these
with some examples that mirror more real-world situations. And it turns out--
minor shock, I know-- Charles Dickens
apps aren't exactly a hot category in the
app store right now. Maybe next year. So follow me along
to the next video, where we'll look at how to put
these guidelines into practice. Come on. It'll be fun. I'm sorry. You're asking for a
$10 million valuation, and the Charles Dickens app
market is already too crowded. I just don't see the numbers. And for that reason, I'm out. [MUSIC PLAYING]
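(For reference, here's a minimal sketch of the rule number six picture in plain Python. It is not the Firestore SDK. The helper names `as_flag_map`, `array_contains`, `add_values`, and `remove_values`, along with the sample books and keywords, are hypothetical stand-ins. It models an array as a map of flags, so that "keywords array-contains drama" behaves like checking an imaginary keywords.drama field, and it models adding or removing values without caring about their position.)

```python
def as_flag_map(values):
    """Picture an array as a map whose keys are the array's values."""
    return {value: True for value in values}


def array_contains(docs, field, value):
    """Return the documents whose array field holds the given value."""
    return [
        doc for doc in docs
        if as_flag_map(doc.get(field, [])).get(value, False)
    ]


def add_values(values, *new_values):
    """Add values that are not already present; no position is chosen."""
    result = list(values)
    for value in new_values:
        if value not in result:
            result.append(value)
    return result


def remove_values(values, *unwanted):
    """Remove every occurrence of the given values, wherever they sit."""
    return [value for value in values if value not in unwanted]


# Sample data, made up for illustration.
books = [
    {"title": "A Tale of Two Cities", "keywords": ["drama", "war"]},
    {"title": "Oliver Twist", "keywords": ["drama", "crime"]},
    {"title": "The Pickwick Papers", "keywords": ["comedy"]},
]

dramatic = [b["title"] for b in array_contains(books, "keywords", "drama")]
updated = add_values(books[0]["keywords"], "war", "history")
trimmed = remove_values(updated, "war")

print(dramatic)
print(updated)
print(trimmed)
```

(Nothing in this sketch refers to an index like "position 2", which is the point: these operations only ask whether a value is present.)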