[MUSIC PLAYING] TODD KERPELMAN: Hi, everybody. Hi. Oh, look at that. He got a reaction. That's great. I could tell-- you're
the best crowd I'm going to have at this event. Also the only crowd--
but still, the best. So my name is Todd Kerpelman. Thank you for attending this
intro talk on Cloud Firestore data modeling. I am from the Firebase team as
maybe you might have guessed. Hey, Firebase, in
case you hadn't heard, is a set of tools and services
to help you build more successful mobile and web apps. And we do that with everything
from analytics, to A/B testing, to performance
monitoring, to, yes, hosting your app data in the
cloud with Cloud Firestore, our massively scalable
cloud-hosted NoSQL real-time database. So in the interest of time,
I'm going to kind of skip the product pitch,
because I think if you're here at this
session, you generally kind of know that Cloud
Firestore is awesome. It's got yay, reliability,
and a truly serverless app development environment,
and magical syncing of your data to
all your devices, and great offline support,
and client libraries for iOS, Android, and the
web, and much more. We'll breeze over that,
because I know that, certainly, when I'm first approaching any
new technology, something that excites me like
Cloud Firestore, I'm a little bit like
our developer here who is apparently working
on a food delivery app. I have this mixture of
excitement and fear, right? I'm super excited to use all
these great new shiny features and can't wait to see
what I can do with it, but I'm also kind of afraid
of messing things up, right? Like, how can I make sure
that six months down the road I've made the right decision
so that my database can make the right kind of queries
that I want it to, or I haven't done
something wrong and messed up the performance as soon
as my app starts to scale, or I've done something
terrible and I have driven up my database costs exponentially? How can I make sure I'm
making the right decisions now so that I am not some, you know,
cautionary tale on "Medium" later. That's at least, generally,
the feeling I kind of have when I approach
a new technology. I'm guessing maybe
many of you do as well. I certainly kind of
see this sentiment, this mixture of excitement and
fear, out on Stack Overflow and, say, on our own
discussion lists. And to be fair,
there's a lot to think about when it
comes to how you're structuring your
database, and it can be kind of scary stuff here. So let's see if we can shine a
light on some of these topics. And I think I'm going to start
by just kind of reviewing, in case you're new to
this, what a NoSQL database is, because for a lot
of mobile and web developers, this is kind of the
first time trying to make a real production-scale
app using a NoSQL database. And then we'll get into
a few details around how Cloud Firestore is different. So I would assume most
of you kind of know what a more typical
traditional SQL database is. You've got tables. And each one of these
tables represents a strictly-defined object,
something like an author, or a book, or a review. And you have schemas that have
very strict rules around what kind of data is
allowed to appear in each one of these columns. Like, you know, that first
column in the authors table probably has to be a string
that represents their last name. And the second one needs to be
an auto-incrementing integer. And the third one has
to be a timestamp. And so on and so forth. And then later
on, you might want to sort of merge bits and
pieces of these different tables together to get some data
that you're interested in. And you would do this
by writing something in a language called SQL. And writing the
SQL statements is nice in that you
can get the database to do all the work
for you of finding all these different
pieces of data together, and merging them
together, and delivering them to you. But it does have some
drawbacks in that performance, for instance,
can be very variable. This could be very fast or
it could be kind of slow. It depends a lot
on how much data you have to go through,
what kind of queries you're asking of your database,
exactly how your data is structured, and so
on and so forth. That is SQL in a very
overly-simplified nutshell. I've only got 40 minutes. In a NoSQL world, there's
a few differences. For starters, things tend to
be a little more loosey goosey in terms of how your
data is defined, or I guess to use
a more formal term, we like to say it
is schemaless, which means that by convention,
a lot of your data, a lot of your records
for your data, will probably look similar, but
there's no hard and fast rules around it. Yeah, sure, every
animal here has a name, but that's just by convention. The database isn't
really enforcing that. And this is nice in that it gives
you some flexibility. If you want to add
a field, you can. I can add a birthday
field here for my dog and not worry about adding a
birthday field for my fish, because I don't care. It's a fish. No one cares about
a fish's birthday. And developers
generally like this, because it does give
them the freedom to start adding data as needed. I don't have to
worry about how am I going to backfill
this birthday field into all my other
animals, because, again, I don't care about my
fish's birthday, right? This also lets you store similar
but not exactly alike data. For instance, I can store a
plumage field for my bird, and a hair type
field for my dog, and a fin count
field for my fish, and not have to worry
about putting these fields in other records
where they might not make a whole lot of sense. Now, the flipside of
all this is that you do have to code defensively. You can never
really be guaranteed as to what kind of data you're
going to be getting back from the database. So you should fail
elegantly if things don't meet your expectations. But honestly, if you're
building mobile apps where your user might
have a version of your app that's a year old because
they refuse to update, this should be general practice
that you're following anyway. Never make assumptions about
the data you're getting. I think the biggest difference
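
As a rough illustration of coding defensively against schemaless records, here is a minimal Python sketch that uses plain dictionaries to stand in for documents. The helper name `describe_pet` and the field names are assumptions made up for this example; nothing here is a Firestore API.

```python
from typing import Any, Dict


def describe_pet(doc: Dict[str, Any]) -> str:
    """Build a display string without assuming any field exists or has the right type."""
    name = doc.get("name")
    if not isinstance(name, str) or not name:
        name = "(unnamed)"  # fail gracefully instead of raising KeyError
    birthday = doc.get("birthday")  # optional field: only some records carry it
    if isinstance(birthday, str):
        return f"{name}, born {birthday}"
    return name


dog = {"name": "Rex", "birthday": "2015-04-01", "hair_type": "short"}
fish = {"name": "Bubbles", "fin_count": 5}
malformed = {"birthday": 20150401}  # no name, and birthday has an unexpected type

for record in (dog, fish, malformed):
    print(describe_pet(record))
```

The point of the sketch is only that every read goes through a check, so a record written by an old app version cannot crash a newer one.
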
with a NoSQL database is-- if you could tell, it's sort
of a hidden message right there in the name, there's
no SQL, so queries tend to be a lot simpler. So again, going back
to our SQL example, imagine I had some
tables of books and some tables of
authors, and if I wanted to get a list of books
and the names of the people who wrote them, I would
do that with some kind of Join statement. But in the NoSQL world,
you don't have access to these Joins. So strictly relying on kind
of foreign keys like this, that's kind of foreign. This is not something you would
be able to get in one database call. So I think a more likely
scenario would actually be to duplicate some of that data. Take the author name from
these little author records and also put them
in the book records, so that they're grouped
together in basically ways where you're going to want to
retrieve them together. Now, this practice, this
duplicating of data, is known as
denormalizing data, which I know is a bad and scary thing
for a lot of SQL developers, because we've been
told ever since we were wee tots designing our first
database that denormalized data is a really bad thing. You're really only supposed to
have your data in one location so that it's easy
to change later. You only have to change
it in that one spot. But in the NoSQL world,
denormalized data is not only allowed, it's expected. And so yes, if
Charles Dickens were to change his name to Chucky D
to stay relevant for the kids, I would need to go back
and change it everywhere in my database. Not just there in the author
record, but in those book records where that
duplicate data lives. And yeah, OK, that's
kind of a pain, but there are good
reasons for doing this. One reason is that reads
are really easy now. If I want to grab
all books along with the names of the
authors who wrote them, I can just do that, right? It's just there, and
it's really easy for me to sweep up all that data. And think about it. Realistically, how often
is your data being read versus being written? Depending on how
popular my book app is, those records might be getting
read thousands or millions of times. How often is Charles
Dickens changing his name? Like, once? Never? And so the NoSQL philosophy is
kind of, hey, you know what? Let's actually
optimize for the case that's happening thousands
or millions of times in the real world instead of
the case that's happening once. Another big reason NoSQL
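
The read-versus-write trade-off of denormalized data can be sketched in a few lines of Python. This is an in-memory toy, not Firestore code; the dictionaries, the `author_name` field, and both function names are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

authors: Dict[str, dict] = {"dickens": {"name": "Charles Dickens"}}

# Each book carries a copy of the author's name (the denormalized field).
books: Dict[str, dict] = {
    "oliver_twist": {"title": "Oliver Twist", "author_id": "dickens",
                     "author_name": "Charles Dickens"},
    "bleak_house": {"title": "Bleak House", "author_id": "dickens",
                    "author_name": "Charles Dickens"},
}


def list_books() -> List[Tuple[str, str]]:
    """The common case: one pass over books, no second lookup into authors."""
    return [(b["title"], b["author_name"]) for b in books.values()]


def rename_author(author_id: str, new_name: str) -> int:
    """The rare case: update the author record and every duplicated copy."""
    authors[author_id]["name"] = new_name
    touched = 0
    for book in books.values():
        if book["author_id"] == author_id:
            book["author_name"] = new_name
            touched += 1
    return touched


print(list_books())
copies_updated = rename_author("dickens", "Chucky D")
print(copies_updated, list_books())
```

Listing books never touches the authors table, while the rename has to visit every copy; that asymmetry is the bet being made.
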
databases are set up this way, and yes, I am oversimplifying,
is that it makes it easier for us to scale
horizontally, meaning that as your database
needs to grow, as you add more and more
data to your database, you can basically just
throw other machines at it, and your data will
automatically grow to span across these multiple machines. And it all just works. Particularly in a managed
server environment, like say, Google
Cloud Platform, it makes it really easy for us
to just flip on and flip off servers as your database
grows and shrinks. And we can accommodate
your data without you ever knowing that we're doing any
of this work behind the scenes to make sure we're
adding room to expand. Now, by contrast,
SQL databases that often have these tight
interrelated joins, they tend to scale
vertically, meaning that as your database grows
and you need to accommodate it, you generally have to move
it onto bigger and beefier machines. And at some point,
you're going to run out of massive supercomputers
to put that thing on. But also, generally
speaking, every time you migrate your database
to another machine, there's going to be downtime. And we don't like downtime. So NoSQL databases come in
a lot of different flavors. We've got just big
key value stores. We've got big-old JSON objects
like the old Realtime Database. But Cloud Firestore is about
documents and collections, so I'm going to spend a
little time looking at these. Let's look at a document first. A document is
something you can think of as a dictionary or a hash. It's got a set of key
value pairs, which we like to refer to as fields. And the values of
these fields can be a number of different
things, anything from strings, to numbers, to very
small binary objects, to these JSON-looking things
that we officially call maps. Now, documents are
stored in collections, which are, as you might suspect,
collections of documents. Now, documents cannot directly
contain other documents, but they can and often do
point to subcollections, which contain other documents,
which then point to other subcollections, and
so on and so forth. Now, one important thing to
note in Cloud Firestore land is that queries are
shallow, meaning that I can grab this
document at the top and not worry about grabbing
all that data underneath it that is in all those subcollections. And this is generally nice. Developers generally
like this, because it means that you can structure
your data hierarchically in a way that sort of might
make sense to you intuitively without having to
worry about grabbing a ton of unnecessary data-- if I
just want that document on top. The next thing we
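
One way to picture shallow reads is to treat every document as living at its own path, so that a subcollection document is just a longer path. The following Python sketch is a toy model under that assumption; the `store` dictionary, the paths, and the helper names are hypothetical and are not how the service is implemented.

```python
from typing import Dict, Optional

# Every document lives at a path; subcollection documents simply have longer paths.
store: Dict[str, dict] = {
    "restaurants/kirans": {"name": "Kiran's Restaurant", "city": "Dallas"},
    "restaurants/kirans/menu_items/ctm": {"name": "Chicken Tikka Masala"},
    "restaurants/kirans/menu_items/korma": {"name": "Lamb Korma"},
}


def get_document(path: str) -> Optional[dict]:
    """Shallow read: returns only the fields stored at this exact path."""
    return store.get(path)


def list_collection(collection_path: str) -> Dict[str, dict]:
    """Return the documents directly inside one collection, not deeper ones."""
    prefix = collection_path + "/"
    return {
        path: doc
        for path, doc in store.items()
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    }


print(get_document("restaurants/kirans"))
print(sorted(list_collection("restaurants/kirans/menu_items")))
```

Reading the restaurant returns its own fields only; the menu items are fetched separately, and only when asked for.
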
should probably cover are queries and how they work. Queries in Firestore
are interesting in that, as a general rule,
they are quite fast. And as a general
rule, they scale proportional to the
size of the result set, not the size of the
underlying data set. And what do I mean by that? I mean that if I were to
run a query that's asking for, say, the top 10
pizza restaurants in San Francisco, that query is going
to take the same amount of time whether I have a thousand
records to look through in my database or
a hundred million. No matter how large that
underlying data set is, that query is going to take
me the same amount of time. So how does Cloud
Firestore do this? By indexing every field in every
document in every collection. So thinking about our
fake restaurant delivery app a little more, imagine we
start having some restaurants represented as documents here. And we put them in some kind
of restaurants collection. Well, you notice each
one of my restaurants has a field for name, and
cuisine, and city, and rating. If I do that, Cloud
Firestore is going to go ahead and create
an index for every name, and every cuisine,
and every city, and every rating
in that collection. Now, all these indexes are
created automatically for me by Cloud Firestore
whenever I add, change, or delete a document. And now I can search for these
documents in this collection as long as I can follow
this two-step rule: step one, find a spot in the
index where some condition is true; step two, grab a
bunch of adjacent documents until that condition
is no longer true. So let's go into a
concrete example here. Imagine I wanted to find
all restaurants in Dallas. Well, that would be easy. I would find my City Index. And using this
two-step procedure, find where city equals
Dallas, and then grab all the adjacent documents
until city no longer equals Dallas. Similarly, finding
all restaurants with a rating of 4.5 or
more, I could do that. Find that spot in the index. Grab all the adjacent documents
until, I guess in this case, I run out of documents. I should probably also note,
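
The two-step procedure can be sketched with a sorted list and a binary search. This is a simplified Python model, assuming an index is just a sorted list of (field value, document ID) pairs; the sample restaurants and the function names are invented for the example and say nothing about how Cloud Firestore stores its indexes internally.

```python
import bisect
from typing import Any, Dict, List, Tuple

restaurants: Dict[str, dict] = {
    "r1": {"name": "Zesty", "city": "Dallas", "rating": 4.7},
    "r2": {"name": "Umami", "city": "Boston", "rating": 4.2},
    "r3": {"name": "Tex", "city": "Dallas", "rating": 3.9},
    "r4": {"name": "Fog", "city": "San Francisco", "rating": 4.5},
}

Index = List[Tuple[Any, str]]


def build_index(docs: Dict[str, dict], field: str) -> Index:
    """One sorted (value, doc_id) list per field."""
    return sorted((doc[field], doc_id) for doc_id, doc in docs.items() if field in doc)


def where_equals(index: Index, value: Any) -> List[str]:
    start = bisect.bisect_left(index, (value,))  # step one: find the first matching spot
    matches = []
    for key, doc_id in index[start:]:            # step two: walk adjacent entries
        if key != value:                         # stop once the condition is no longer true
            break
        matches.append(doc_id)
    return matches


def where_at_least(index: Index, value: Any) -> List[str]:
    start = bisect.bisect_left(index, (value,))  # step one
    return [doc_id for _key, doc_id in index[start:]]  # step two: runs to the end of the index


city_index = build_index(restaurants, "city")
rating_index = build_index(restaurants, "rating")
print(where_equals(city_index, "Dallas"))
print(where_at_least(rating_index, 4.5))
```

The work done is the binary search plus one step per returned document, which is why the cost tracks the size of the result set rather than the size of the collection.
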
by the way, that map fields, the JSON-y looking
things, you can query those fields the same
way as any other field. If my address is
set up like this, Cloud Firestore
essentially looks at this as saying I have a
field for address.street, and address.city,
and address.zip. And it will go ahead
and index those map fields the same way it
would any other field. Two other features
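
To make the dotted field paths concrete, here is a small Python sketch that flattens a nested map into the `address.street` style of names described above. The `flatten` helper and the sample address are assumptions for illustration; the real service does this bookkeeping for you.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested maps into dotted field paths, one entry per leaf value."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into map fields
        else:
            flat[path] = value
    return flat


restaurant = {
    "name": "Kiran's Restaurant",
    "address": {"street": "123 Main St", "city": "Dallas", "zip": "75201"},
}

for field_path in sorted(flatten(restaurant)):
    print(field_path)
```

Each printed path is a field that could be indexed and queried like any top-level field.
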
that I'm not going to get into just in
the interest of time-- you can query across
multiple fields. So I could say, hey, find
me all Mexican restaurants in San Francisco with a
rating of four or more. You can also query
documents that have arrays that contain certain values. So if I have a flags field
that has an array that contains a bunch of elements
about that restaurant, I could perform a
search that says, hey, find me all restaurants
that serve alcohol, or all restaurants that take reservations. But again, I think the
biggest takeaway is remember that every field is indexed. And every query has to follow
this two-step procedure. I think it partly explains
why things work so fast, but it also might explain
why some things that seem like they should be possible
aren't really, like ORs. I can't say, hey, find me all
restaurants in Chicago or San Francisco. That doesn't follow
this two-step procedure. Similarly, I couldn't say,
find me all restaurants where the city
doesn't equal Dallas. You can't get the same
performance guarantees using those types of
queries as you could with this two-step process. And again, because we're
fetching your results in real time, we want all
these queries to be fast. So with all that in mind, let's
start thinking a little more about our food delivery
app and how we might want to organize some of the data. We've already been
talking about restaurants, and I can imagine we're going
to have customers that are going to want to place orders. And then we're also going to
want to talk about the items that each of these restaurants
are serving on their menu. So I'm going to continue
talking about restaurants and, actually, that last one as
well, because they kind of go together. So I think a good start
is imagine a restaurant as we've been thinking
about them so far. There are going to
be documents that are in a collection like this. And our data, it's kind of
what we've been thinking about. This actually seems like
a pretty good start. Obviously, this is a
little more simplified than what we would see
in a real production app, but I think we
generally get the idea. But the one thing we
haven't talked about yet is what should we do about
the actual items on the menu. Well, it seems
like it'd be pretty easy to convert a menu
into a JSON-y looking thing and make that a map field that
we put inside a restaurant document. That seems reasonable,
but I could also make it a subcollection. If you think about it, each one
of these individual menu items, they could easily be
their own documents. And I could make
that a subcollection of this restaurant. Well, that also
seems reasonable. So I have two kind of
reasonable-looking solutions. What's the right one here? What should I actually-- which way should I go? Well, I'm going to spend a
fair amount of time talking about this, because it does
bring up a few more rules that I want to get into. Hurray for more rules, right? Just when you thought a
talk on database structures couldn't get any more exciting,
I'm going to add in more rules. So the first one I
want to talk about is that documents have limits. There are limits
in Cloud Firestore that prevent you from having
documents that are too big. Specifically, these
are three that you should be worried about. One megabyte in total
size of your data in a single document, 40,000
indexed fields, and one QPS of sustained doc writes. Meaning that you can have little
bursts of writes to a document, but on average, you should
only have one write per second to the same document
in Cloud Firestore. So these are some
limits that we put in to make sure that your
documents aren't too big, but in practice,
if we take our menu and we make it part
of our restaurant, does that push us
into the too big area? Well, let's think about that. So one megabyte doesn't seem
like a lot of space, right? We're all taking pictures of
these slides, and each of those are going to be four
meg or something, right? But remember, we're primarily
dealing with text here, and text, or numbers, or
JSON-y looking things, and those don't really
take up a lot of space. All of "Pride and Prejudice"
could fit into one megabyte. So unless we got George R.R.
Martin writing our menu item descriptions, I think we're
probably going to be OK. And then he would kill off
all our favorite dishes. 40,000 indexed fields-- well, all
right, this could be an issue. Remember that each one
of these fields in my map is going to be indexed. So Cloud Firestore is creating
an index for menu.ribs.name, and menu.ribs.price, and
menu.ribs.description. And, oh boy, that seems
like that could add up. But at the same time, 40,000
is kind of a big number. Even if I had 200
items on my menu, I'd have to have 200 fields
for each one of those items to really worry
about this limit. So again, we're probably OK. And as for one QPS of
sustained doc writes, I don't think that's really
going to be a problem. I can't imagine us updating
the price of our menu items more than once in a second. If we do, we'd be
having problems. Maybe if I had a real-time
inventory of how many of these dishes that our
kitchen had available to serve and I was updating
that real time, then I would worry
about this limit. But again, I think in this
example we're probably OK. So you do want to be careful
about having documents that are too big,
because you are going to run into these limits. Although, in practice, right
now I'm not actually sure that's an issue. So let's look at
some other rules and see if they can help
us make our decision. Rule number two is that you
can only fetch documents. So you know how I told you
queries are shallow? When you grab that
top one, you're not going to grab
any of the stuff, any of the subcollections,
underneath. Well, that's part of it, but
the flip side is that you cannot retrieve a partial document. You either get the entire
document or you get nothing. So if we start putting
our entire menu in one big document,
that's going to start to get kind of big. And if our client, one
of our users, is saying, hey, you know what,
give me the top 30 sushi restaurants in
Boston, our database is going to give back the
information they probably care about at that time, which
is the name, and the delivery fee, and their rating,
and their address, stuff like that, along with everything
each one of these restaurants has in their entire menu. And that's probably
more information than our user really
wants at that moment. And yes, I know a few
slides ago I told you that text isn't a big deal
compared to stuff like photos, but still, you're
going to have users that are going to be quite
sensitive to how much data their app is using, particularly
in certain parts of the world where data is expensive. In addition, the more data
your app is going to use, the more battery life your
app is going to take up, and the slower your
app is going to seem, because we have to download
all that data before we can show your results. And, by the way, if you
had a real-time listener set up on these results,
then when one of those values changes, we actually send
you over the entire document again. So you can kind of see
how this could start to be a bad user experience. So yeah, you really don't
want to send out more data than your user is actually
interested in at that time. Now, on the other hand,
if we break this up into subcollections,
then when I say, hey, show me the top 30
Japanese restaurants in Boston, I'm going to get back just
the restaurant information I care about at that time-- the name, the address,
the rating, and so on-- but not any of the menu items. Later on, when I
say, oh, Izikaya looks really good,
let's see what they have on their
menu, then and only then can we fetch those other
documents from the database. And that's a lot better from
a data usage standpoint. We're only sending our
users the information they care about at that time. So does that make our
subcollections solution clearly the winner? Well, hang on, because we've
got more rules to go through. More rules. Rule number three is
that billing is mostly based on the number of
documents that you touch. So Cloud Firestore
pricing involves several different
factors, but I guess it's primarily driven by
the number of documents that you're interacting with. Specifically,
you're going to get charged $0.03 to $0.06 for
every 100,000 reads you perform and similarly for
writes and deletes. And so you want to
think about the number of different documents you're
going to be interacting with. So if I ask for the top 30
sushi restaurants in Boston and everything is in one
giant document like this, I'll get billed for
30 document reads. Even if later on
I say, OK, let's see what's on the
menu for one of these, we've got that data
loaded up locally. And assuming we've
kept it around and haven't discarded
it, now we can show you the menu
for one of these without incurring
any additional reads. So that seems nice. On the other hand,
if I were to load up 30 documents of the top
30 restaurants in Boston, and then say, OK,
let's see what's on the menu for
one of these, now I'm getting billed for that
initial batch of 30 reads and then that second
batch of 25 reads or whatever to get
what's on the menu. So is this bad? Well, the answer is it depends. Specifically, if you think
about most food delivery apps, you're generally looking at
a list of restaurants first. And then maybe after
you've found one that you're interested
in, clicking through to get the full
menu, then probably place an order from
there, which means that each one of these sets
of reads is a manual action. Meaning that,
realistically, your user will be doing this maybe
a few times per session, but probably not hundreds
or thousands of times. Now, on the other
hand, if we really wanted to load up the
full menu every time we did a search of restaurants,
or we thought, hey, we'll be clever and start
preloading all our menu items every time our user
performs a search, now with one single user action,
we're grabbing not just all the restaurant
documents, but all the menu items in all the subcollections
of all these restaurant documents. And that is going to be bad. This is the situation you
probably want to avoid. So you do need to
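
The three read-count scenarios can be compared with simple arithmetic. The Python sketch below counts document reads only, using the talk's numbers (30 restaurants in the result list, about 25 items on a menu); the function names are made up for the example, and other billing factors are ignored.

```python
def reads_with_embedded_menus(restaurants_listed: int) -> int:
    """Menus live inside each restaurant document, so listing is the only cost."""
    return restaurants_listed


def reads_with_subcollections(restaurants_listed: int, menus_opened: int,
                              items_per_menu: int) -> int:
    """Menu items are separate documents, read only when a user opens a menu."""
    return restaurants_listed + menus_opened * items_per_menu


def reads_when_preloading(restaurants_listed: int, items_per_menu: int) -> int:
    """Every search also fetches every menu item of every listed restaurant."""
    return restaurants_listed * (1 + items_per_menu)


print(reads_with_embedded_menus(30))         # 30
print(reads_with_subcollections(30, 1, 25))  # 55
print(reads_when_preloading(30, 25))         # 780
```

The jump from 55 to 780 reads for a single user action is the case worth designing away; the gap between 30 and 55 is tied to a manual tap and is much less of a concern.
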
stop and ask yourself what is your app
actually doing and make the right call from there. And sometimes when I give
this advice, people get mad. They're like, why
don't you just tell me what the one right answer is? But I think the point
is there isn't always one right answer
in every situation. You kind of have to understand
what the trade-offs are and make the right call based
on how your app is actually behaving. So in our case,
even though we are going to be making more document
reads with this subcollection, I'm still OK with it,
because, like I said, each one of those sets of reads
is a manually-driven action. Be careful about
over-optimizing for price. I've seen some really
strange solutions out there where people are too
focused on pricing and they end up either
creating a bad user experience or creating a lot more
work for themselves. If you really want one right
answer, one rule of thumb, I would say generally have
one collection per table view controller/activity/page. In our app, if we have
our list of restaurants, our restaurants search page,
that's one view controller. That's going to be driven by
the restaurants collection. Later when I say, OK,
let's see details, what's on their menu, that's
going to be then driven by another subcollection. So if you really want one right
answer, one collection per view controller/activity. But hang on, because
we're not done yet. There's still one more
rule to talk about. [COMICALLY GASPS] And
that's that queries search for indexed fields
across a collection. So we kind of talked
about queries earlier. So if I were to say,
hey, find me the top 30 restaurants in Dallas,
I could do that with either one of these setups,
whether our restaurants are larger documents or
smaller documents with these subcollections. Either way, this query works. What would I do if I say, hey,
I'm in the mood for chicken tikka masala? Can I do that with either
one of these setups? Well, let's start by taking a
look at the bigger documents. As you recall, every
field in a document, even the ones in these
maps, is indexed. So looking at our menu here
as stored as a giant JSON-y looking thing, you can see
that I'm going to have an index for menu.korma_lamb.name. And so it kind of looks like
by looking at something like what I have
for menu.ctm.name, I could kind of search
for restaurants that serve chicken tikka masala, right? And I would do that by
saying, let's find restaurants where this menu.ctm.name
field exists. Honestly, I don't actually
really care about the value as long as that field
exists, they probably serve chicken tikka masala. But the problem here
is that I'm essentially relying on every restaurant
menu having the same key name for that dish. The fact that I actually don't
care what the name of the dish is is kind of a red flag. If Roger's Restaurant
were to use a different key for that
JSON object that it's using to represent
chicken tikka masala, now that dish is not going to
show up in my original search. I'm essentially relying on this
weird secret hidden information that every dish has to have
the same key name in my JSON object. And that's going to be kind
of weird and error prone, and so I'm not a big
fan of this solution. This is where it seems
like subcollections would be a simpler and
more natural solution. If I look at every
one of these documents representing an
item on my menu, I can see that each one of these
has its own set of values. And I would have traditional
indexes on each of them. And searching by name now
seems a lot more natural. I could say, find me items in
this collection where the name equals chicken tikka masala. The problem is this only
searches in one collection. I can do it for
Kiran's Restaurant. I cannot do it across
all collections. Until now. [COMICALLY GASPS] Big gasp. Yeah. [APPLAUSE] All right. So this is where collection
group queries can help. This is a feature
that I know we've been talking about for a while. And I'm happy to say you can
all play with it this week. So basically, the
way it works is you go to the Firebase
console and you would tell Cloud Firestore
about the queries you might want to make
across multiple collections. So in this case here, you
can see I'm basically saying, OK, out of all the
collections called menu items, I want you to find
this field called name, and I want you to index it in
this collection group scope. And what that basically
means is I want you to index the name field across all
menu items collections anywhere they exist
in my database. Index it as if it were
just one giant collection. And so that's what Cloud
Firestore is going to do. It's going to look at every
collection with the same name and index that name field as
if it was one giant collection, which then means I
have a name index that looks at all of these
documents in all these different subcollections,
which then means that I can query that collection group to
find all restaurants that serve chicken tikka masala, even
though they are now split into different subcollections. So going back to
our original dilemma of do we want to have larger
documents or subcollections, I think given the advantages
that we get of putting them in subcollections,
specifically, we don't have to worry about
hitting that theoretical larger document limit. We're much more respectful
of our users' data. And we can now
search for menu items by name by creating a
collection group query. This is going to be my winner. Yes, the bigger documents will
still give me fewer reads, but like I said before, be
careful about over-optimizing for price. You want to make
sure you're not going to do anything that's going
to be catastrophically bad, but if you're
trying to wring out every last cent of your database
usage, that time might actually be better spent on other things. All right, so let's switch
gears for a little bit and think about how
we might want to store our users, our customers
who are placing orders. So this seems like a
pretty obvious candidate to put everything in a
top-level collection. It seems fairly
straightforward, right? And we can store their name,
and their delivery address, and their profile picture, maybe
some of their favorite foods. And this all seems
very reasonable. I like this. And then one day
our product manager comes in with a fantastic idea-- hey, you know what we should do? We should make
this thing social. Let's have our users find
friends in their local area who like the same
kind of food as them. It'd be great. They can get together. They can order
out food together. And now we've become a
food-delivery/dating app. Sounds good to me. And this is a query that
we could pretty easily create in our setup. We could say, hey, let's find
everyone in San Francisco where their favorites array
contains Korean food, and just like that, we've
found folks in our city who like Korean food. So what's the problem here? Clearly I'm leading us
to some kind of problem. Otherwise it wouldn't be a
very interesting presentation. It's this rule here. Remember, you can only
fetch documents, not partial documents. And so when some
of our random users are getting a list of
people in their city who like Korean food,
we're finding out all sorts of stuff about them, right? Like their name and also
where they live-- oh, crap. Well, that's bad. I can't imagine anything
getting worse than-- oh, crap. Right? Now we know where
all these people live and how to break
into their house. And this is bad. And sure, you're
probably smart enough that you're not showing
this data on the client. But that doesn't matter, right? The fact that this data
is getting sent over means that a sufficiently
motivated hacker could get at this data. And all of a sudden,
you've leaked all your users' addresses and how
to get into their front gate. So there are definitely
other options we should be considering here. One option-- just
put your addresses into a sub-collection, right,
where we keep that information. That's nice, too,
because now we can add multiple addresses per user. And then we can
make sure that only our delivery people have access
to that when they need it. If we had like payment
information or anything else that we might want
to keep private, we could store that in like a
private info sub-collection. On the other hand,
if we actually think, hey, you
know what? We may be storing a lot of
information about our users in these documents,
we could also flip this on its head, right,
and have a public profile sub-collection for
each one of our users. I like this because
this basically can say that everything
in our user document is private by default
until we explicitly take a copy of that data and
put it into the public profile. And that sort of prevents
a lot more accidental leaking of data. And I know I haven't spent
a lot of time talking about security rules yet. But when it comes to
preventing unauthorized access, both this approach and the
approach on the previous slide are nice because
it's generally easier to sort of have
different access levels at different collections. So being able to say,
hey, you know what? In this users
collection, users can only read the user document
that belongs to them. But hey, these public
profile sub-collections, those are open for any
logged-in user to read. That kind of setup
is generally easy to do using security
rules, right, having sort of different
security setups for different collections. That's kind of how
security rules work. So let's move on
to the last data object we're going to
tackle, and that's orders. This is kind of interesting
because it combines data from a lot of different
places, right? We're going to
have some elements unique to the order itself, like
the time the order was placed and the delivery fee and
probably a few other things. And then we're going
to have information about our user, their name,
where to send the food. We have information about
the restaurant itself, like where to place the
order, where a courier needs to go to pick up the dishes. And then, yes, we're
going to have the menu items that our user has ordered,
right, like, what was it? How much did it cost? Did they
ask for it extra spicy or hold the mayo? And again, if we
were looking at this from more like a
SQL background, we'd probably think of it
something like this. We might have a little bit
of order-specific information and then kind of foreign keys to
represent all these other bits of information. And then we would do
some kind of big old join before sending this
information off to a restaurant to kind
of process the order. But again, we don't
live in a SQL world. We are a NoSQL database
that's got super-fast reads and horizontal scalability
and all that great stuff but no fancy joins. So this is probably
sort of not the default way we want to think
about storing this order. Instead we're just going
to build the document with the data that
we need at the time. So when our user places an
order, we'll create a document, store in that
order-specific data, copy over the relevant
user information that we'll know from
our user document, copy over whatever relevant
restaurant information we need, and then also add in the
food that they're ordering. And this actually
is a case where I would recommend adding
all the items directly to the order in a big
array like this or a map instead of putting it
into a sub-collection. Because if you kind of think
about it, anybody who's reviewing their
orders, whether it's a user looking at past
orders or a restaurant looking at open orders
they need to process, they're probably going to want
to see this menu information alongside their orders. So again, kind of
stop and think, what is my app actually doing? And make the call from there. And yeah, I know is
still a little strange to kind of see this duplicate
data in our records. But if this is still
kind of weirding you out, one of our engineers had
a really nice analogy, which is instead
of just thinking of this de-normalized data
as de-normalized data, this data in your
database really is your Realtime
API for your app. If you were to make
an API for your app that's like order get order,
you'd basically sort of be generating a JSON-looking
object that would look an awful lot like this. And so kind of the whole idea
maybe with a lot of NoSQL databases and with
Cloud Firestore is well, maybe just look at that
data as this API, this Realtime API
for fetching orders. And that really is the
data that you're going to store in your database. So if that actually
helps you sort of get your mind around NoSQL databases
a little better, use that. If, on the other hand, that
just confused you further, then forget I said anything. So where this order
is collection actually goes in our database
kind of depends. Honestly, I don't care too much. You can make this a
sub-collection of a restaurant or make it a
sub-collection of our users or even make it another
separate top-level collection. Obviously, any one of these
situations is fine with me. Now that collection group
queries are working, you could basically sort of
query for any of these orders by the restaurant or
the courier or the user. And they would all
work just fine. So I would say kind of pick
any one of these architectures that sort of first popped
into your mind intuitively, because that's
probably the right one. I actually don't want to spend
a lot of time on this decision because I do want to go back
to the duplicate data, right? Because we do need
to ask ourselves, what do we do if one of
these values later changes? How can we make sure that that
gets updated in our order? Well, in some cases,
maybe the answer is, eh, we don't
do anything, right? Like, imagine that a few days
after Diana places her order, Troy decides to raise the
price of their bibimbap. Well, in this case, it's
probably accurate and correct to not actually change that
value in her order, right? We want her order to reflect the
price of the item at the time that she ordered it. So this is actually
one situation where I think having
this de-normalized data kind of works in our favor. But there are cases where
it might make sense to update it. Like, imagine that we've
done a UX research study. And we realized that when
restaurants change their name, we want to make sure that
name change does get reflected in the user order
because it makes it easier for the
user to remember it or something like that, right? So when Troy's Tofu Hut changes
their name to Troy's Tofu Cabin, well, maybe we do want
to change that in our order as well. So how do we do that? Well, one option is
just have that client make that change everywhere. So we've probably got
some kind of client app set up for our
restaurant owners, right? And we can say, all right, when
that restaurant owner decides to change the name and
change that on the restaurant document, we will also go ahead
and do a search through all these orders that belong
to this restaurant and make the change
there as well. That can work. But it's a little
weird to have a client make this big transaction
that changes all these orders. For starters, it's a
lot of work that we're asking our client to do. And depending on the
situation, it also sort of might open up some kind of
strange security rule setups, right? Like, now, we've
got to make sure that a restaurant can go ahead
and change the restaurant field or the restaurant name
field in the order document, but we probably don't
want them like modifying the price of orders or adding
more food onto other orders or doing anything
nefarious like that. And so this can get
a little strange. And so I think in
practice, this is something that would be better
done with a Cloud Function. So if you don't know what Cloud
Functions are, they're a way for you to run server-side
code by having functions that execute in response to actions
that happen in your app, like, for instance,
someone changes the value in a restaurant document. And because these
run server side, they're generally not subject
to the same security rules that you would have to have
for your clients, right? These are being run in
an environment you trust, as opposed to something on
some person's phone somewhere. And that means you can generally
sort of lock down your security rules a lot more because
your Cloud Functions get to circumvent those
security rules. And so I like simple and locked
down when it comes to security. It just sort of means fewer
things to worry about. So we can create
a Cloud Function that activates when a document
in one of our restaurant collections is changed, right? And now our
restaurant client only has one job, and that's
to go ahead and change its name in the restaurant. And then we can rely
on the Cloud Function to notice that change and
make the corresponding change in all those orders. And this could really
be used anywhere that we've got duplicate
de-normalized data that we want to keep in sync. Like, remember how we had our
users and our public profile? Well, if Becca
changes her name to Rebecca, and we decide, you know what? The right thing to
do is sort of always automatically update that
value in her public profile as well, that is something we
could sort of rely on a Cloud Function to do for us. So hey, wow, all
this stuff we were worried about at
the beginning, I guess it's not so scary
after all, which is nice. But I know there was a lot of
information to throw at you. But if you want to learn
even more, because it turns out there is a lot
more to cover, I have a series on
the Firebase YouTube channel called "Get to know
Cloud Firestore," where-- yes, take pictures of that. This is the most important
thing to take a picture of-- where I cover all this in
even more excruciating detail. But I also have cute
cartoon characters. Because when you think of
lectures involving databases, you think cute
cartoon characters. They just go together. Don't forget to rate the
session, if you liked it. If you didn't like it,
my name was Reto Meier, and I was talking
about Android Studio. And with that, I'm
going to say if you have any further questions, I'll
be hanging out in the Firebase dome for like another hour. I hope you all
learned something. Thank you very much. And now go out and have a
great rest of the conference. [APPLAUSE] [MUSIC PLAYING]
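
The order-document idea from the talk can be sketched in a few lines. This is only an in-memory illustration using plain Python dictionaries, not the Firebase SDK or a real Cloud Function; every name here (`place_order`, `on_restaurant_updated`, `rename_restaurant`, the sample IDs, prices, and addresses) is a hypothetical stand-in. It shows an order built by copying user, restaurant, and menu data at order time, a copied price that stays as it was when the menu changes later, and a trigger-style function that pushes a restaurant rename into existing orders.

```python
from copy import deepcopy

# In-memory stand-ins for Firestore collections. Hypothetical data and names,
# not the Firebase SDK.
restaurants = {"troys": {"name": "Troy's Tofu Hut"}}
menu_items = {"bibimbap": {"name": "Bibimbap", "price": 12.0}}
users = {"diana": {"name": "Diana", "address": "456 Sample Ave"}}
orders = {}


def place_order(order_id, user_id, restaurant_id, item_ids, delivery_fee):
    """Build one self-contained order document from data known at order time."""
    orders[order_id] = {
        "delivery_fee": delivery_fee,
        "user": {"id": user_id, **deepcopy(users[user_id])},
        "restaurant": {"id": restaurant_id, "name": restaurants[restaurant_id]["name"]},
        # Items are an array on the order, so one read returns the whole order.
        "items": [deepcopy(menu_items[item_id]) for item_id in item_ids],
    }


def on_restaurant_updated(restaurant_id, before, after):
    """Stand-in for a server-side trigger: push a changed name into matching orders."""
    if before["name"] == after["name"]:
        return 0  # nothing to propagate
    updated = 0
    for order in orders.values():
        if order["restaurant"]["id"] == restaurant_id:
            order["restaurant"]["name"] = after["name"]
            updated += 1
    return updated


def rename_restaurant(restaurant_id, new_name):
    """The client only edits the restaurant document; the trigger does the rest."""
    before = deepcopy(restaurants[restaurant_id])
    restaurants[restaurant_id]["name"] = new_name
    return on_restaurant_updated(restaurant_id, before, restaurants[restaurant_id])


place_order("order1", "diana", "troys", ["bibimbap"], delivery_fee=3.0)
menu_items["bibimbap"]["price"] = 14.0  # later price change; the order keeps 12.0
updated_count = rename_restaurant("troys", "Troy's Tofu Cabin")
```

In a real project the trigger role would be played by a Cloud Function that runs when a restaurant document changes, as described in the talk, and the loop over orders would be a query for that restaurant's orders rather than a scan of a dictionary.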