SPEAKER 1: So what is Cloud Firestore? Why, it is a horizontally scaling NoSQL document database in the cloud. Oh, I guess we're done here. Boy, that was easy. What's that? Oh, we're not-- we need more explanation? All right, fine, I guess I can stay a little while longer. Let's roll the credits.
[MUSIC PLAYING]
So let's talk about
Cloud Firestore. Cloud Firestore is what's
known as a NoSQL database. Now, if you're coming here from
the Realtime Database or Mongo world and already know
all about NoSQL databases, hey, lucky you. You can go ahead and
skip forward, like, eight minutes into this video,
because this next part will probably be review for you. But for the rest
of you, if you're used to traditional
relational databases-- things like MySQL, or mobile
frameworks like SQLite or Core Data-- you're probably used
to keeping all your data in tables that look a
little something like this. Every table has its
own schema, which means that every row in that
table is very strictly defined. You have a specific
set of columns that you can add per
row, and every column has its own very strict rules
about what kind of data type goes in there. Oh, sorry, buddy. Your age has to be an integer. Those are the rules. I didn't make them up. Because of this
very strict schema, usually you end up storing
one type of object per table in your database. And if you want to
associate one object with another object hanging
out in another table, you're usually doing that by
creating another column known as a foreign key that
contains the unique ID of that other entry
in that other table. For example, let's say
I'm creating a database for a restaurant review site. I might have one table to
represent my restaurants, and another table to
represent my reviews, and maybe another for my users. Now, let's say I want
to look up reviews for a particular restaurant,
which kind of seems like the thing I'd want to do
in a restaurant review app. I would want a restaurant
foreign key in my Reviews table that shows me which restaurant
this particular review is for. And if we assume I want
to show within my review some info about the author
who reviewed that restaurant, I might include another
foreign key for the user. So later, when I need to show
my Restaurant Info screen, I would grab some info
about the restaurant from the Restaurant
table and then also grab some of the
reviews from my Review table, where the restaurant
foreign key equals the ID of this particular restaurant. And then I would also
look up the users, where the user ID
equals the user foreign key for this review. And then I could use that to
add user names and profile pictures for each review
for this restaurant. Now, this is a fair amount of
work being done on the backend, right? Grabbing all these entries
from all these different tables and joining them together, based
on these foreign keys and all that-- but this can be done with a
fairly straightforward SELECT statement in SQL. The database does the work
in grabbing all these pieces from the different tables and
joining them together, not you. And that is relational databases
in a somewhat overly simplified nutshell. Now, in the NoSQL world,
things are a little different. Generally speaking,
all your data is not going to be stored in
neat little tables like this. In fact, there's a
number of different ways you can store your data, from
a plain old key-value store, to a big nested tree like
the Realtime Database, to a collection of JSON objects. But one thing that most
of them have in common is that NoSQL
databases are usually schema-less, which
means there aren't any database-level restrictions
around what kind of data you can put at any
point in the database. So I might have my
list of restaurants here with a bunch of
restaurant objects, all containing a name, and a
rating, and an address. But that's basically
by convention. There are no explicit
database-level rules that say all these
objects all have to have the exact same fields
with the exact same types of data, or even that
these need to be objects that represent restaurants. So this loosey-goosey
approach might seem a little weird
at first, but it does have some advantages. A lot of developers like working
with a schema-less database because it means they
can really easily iterate on their database design
by adding or changing fields as needed, and
it won't necessarily break anything else. I could start adding
a noise-level value for my restaurants
and only start adding it for new restaurants. I wouldn't have to
worry about having to back-fill it for all
my existing restaurants. My NoSQL database can handle
that without freaking out. It can also come in
handy in other situations too, where I might
want to store objects that are similar to each other
but not exactly the same. For example, I could easily
expand my restaurant app to include bars,
and tattoo parlors, and skydiving lessons
for what sounds like a pretty awesome night out. And my database doesn't
care that one object has a tandem-jumps field and another
has a tattoo-style field, right? I don't have to fight
with my database to add these slightly different
types of establishments. The drawback here
is that you do need to code a little defensively. While you can set up
security rules to help enforce what kind of data you
put where, there's not really any guarantee at
the database level that any object you retrieve
will contain a certain set of fields. And that means you're probably
going to want to do some checking on the client
side to make sure the data you're getting is really what
you're expecting and fail nicely when it isn't. But honestly, coding defensively
is probably a good idea anyway. Particularly in
the mobile world, you can't always guarantee
that your users will be running the latest
version of your client app against the latest
iteration of your database. So I'm kind of hoping this
is a habit many of you are used to already. And if not, this is a
great time to get started. The other important aspect
of NoSQL databases is that-- and I'm sure this is
going to shock you-- there is no SQL, meaning that
all those fancy joins where I was able to say, hey, go
grab the review from this part of the database and the
user from this other part, and merge them all together-- I can't do that here. In general, if I
need to grab objects from three different
parts of the database, I would need to make three
entirely different database requests. And that's usually
not going to happen, or, I guess, that's
not going to be your default way of thinking. Instead, you're going to want
to put your data in places where you can grab it all
together if you need to. And that does mean you might
be putting duplicate data in multiple places. For example, let's
take a look at what a NoSQL version of our
restaurant review app might look like. You probably expect
to see restaurants listed as their own individual
objects in our database. But now, depending
on your database, reviews could be embedded
within the restaurant itself, but I think they'd most
likely be their own objects. And then you'd find
some way to indicate that these go together. The reviews themselves
might include the ID of the restaurant, for example. Now, with this setup,
I can fairly easily fetch an individual restaurant
or a group of restaurants. And through some
careful querying, I could fetch all reviews
for an individual restaurant. But I typically can't do
both of these requests in one single call. But for a restaurant review app,
this is probably fine, right? A user is first
going to want to view a summary of 20
or 30 restaurants and then drill down into one
of these to see more details. And it's really
only at this point that I'd want to request
these extra reviews-- so, so far, not too bad. But what if we want to show
the reviewer's name and profile picture in those reviews? This is where it
gets a little tricky. Let's assume our
users are represented by their own objects
elsewhere within the database. And while I could add
references to these users from within my review
objects, there's still no way for the database
to automatically grab that user's name and
profile for each review, as I'm requesting them. I would need to make
a separate database request for every
single review I get to fetch this
information, and that's bad. So if we want to
automatically include some information about who
wrote a particular review, we would most likely need to
copy some of that user profile data-- the author's name
and picture, for instance-- and place it into
our review object. Now, if you're coming from a
traditional relational database world, you're probably
freaking out right now, right? You're like, ah,
what are you doing? You're going to have duplicate
data all over your database, and that's the worst thing
ever to happen in programming since goto statements. And you're kind of right. People who have spent their
time with relational databases have been taught that data
normalization, meaning that every piece
of data should only exist in one place in your
database, is super important. And they've kind of got a point. Like in a situation
like this, it would be a lot more work if
my user decided to change their profile picture. I'd need to look up
every review where I've copied over this
profile picture information and replace it with the new
one, and there's always a risk that I don't change
it everywhere, and suddenly I've
got inconsistent data in my database. And so now maybe
you're thinking, well, this just seems terrible. Why is it that NoSQL databases
are so hot right now? Why are there so many
developers moving away from this nice world of
clean tables and data normalization and
join statements for this crazy, new, messy
world of data storage? So yes, one of the big drawbacks
of having this duplicate data is that when I change
it, I have to change it in multiple locations. But on the other hand, any
time I want to grab a review, it's really frickin' easy. All the data is right there
for me, all in one place-- no need to run joins across
multiple tables or anything like that. And while that means our writes
are going to be more work, our database reads end
up being really fast. And for many apps, if you
really think about it, your reads are going to
outnumber your writes by a lot. How many times am I going to
change my profile picture? Once a year, at most-- but on the other hand, maybe
a couple of dozen people are going to see my
restaurant review every day. So when it comes
to this data here, our reads might outnumber
our writes by 7,000 to 1. And so maybe it makes sense
to optimize the case that's going to happen 7,000
times, over that case that's going to happen once a year. But I think the biggest
advantage with a NoSQL database over traditional
databases is that it's able to distribute its data
across multiple machines pretty easily, and
this is a big deal. With most relational databases,
if my app gets super popular and I need my database to scale
up to a larger and larger data set, I generally need to put it
on bigger and beefier machines. And this is known as
scaling vertically. On the other hand, with
many NoSQL databases like Cloud Firestore,
if I need to scale up to a larger and larger
data set, my database can, behind the scenes and
pretty much invisibly to me, distribute that data
across several servers, and everything
just kind of works. And this is known as
scaling horizontally. And for those of you who are
working in managed server environments like the Google
Cloud Platform or AWS, it's pretty easy
for these systems to automatically add
or remove servers to your database as needed,
with very little to no downtime. So your database can scale
pretty much automatically without your ever
needing to lift a finger. And it's really
for these reasons that you're starting to
see a lot more databases, particularly ones
hosted in the cloud, moving to this NoSQL model. But, now, if you're coming
from a NoSQL background like the Firebase Realtime
Database, not much of this is new-- well, maybe except for
the automatically scaling part. Cloud Firestore does handle
that a whole lot better than Realtime Database. But it's more than just that. So let's talk more specifically
about Cloud Firestore's document collection model. In the Realtime Database
world, we typically describe the data that's stored
in Firebase as a big JSON tree because, well, that's
basically what it is, right? It's a tree. It's got keys and
values, and those values can sometimes be objects that
contain other keys and values. Now, Cloud Firestore, like
the Realtime Database, is a collection of objects. And all these objects are stored
in a tree-like hierarchical structure. And while databases like the
Firebase Realtime Database store everything as a
big old JSON object, Cloud Firestore is a
little more organized, in that it's made up of
documents and collections. Now, documents are similar to
JSON objects or dictionaries. They consist of key-value
pairs, which are referred to as fields
in Cloud Firestore land. And the values of these fields
can be any number of things, from strings, to
numbers, to binary data, to smaller JSON-y
looking objects, which the team likes to refer
to as maps, among other things. And that's a document. Now, collections are basically,
well, collections of documents. You can think of them like
a hash or a dictionary where the values
are always going to be some kind of document. Now, there are a few rules when
it comes to using these things. The first is that
collections can only contain documents,
nothing else-- no collections of strings or binary
blobs or anything else here. Second, documents can
only be 1 meg in size. Any larger than that, and
you'll need to break it up. Third, a document cannot
contain another document. Documents can point to
subcollections, but not other documents directly. So it's very common to see
a collection containing a bunch of documents, which
then point to subcollections that contain other documents,
and so on and so forth. The fourth rule is that the very
root of a Cloud Firestore tree can only contain collections. Now, in most real applications,
this will seem very intuitive. You'll have a Users collection
and a Tasks collection and so on. I do find the one
time this ends up being confusing is
when you're building your first little, tiny
test app where you're storing two pieces of data. It's a little weird to
store "Hello, world," inside a document that's
then inside a collection. But in most real-world use
cases, this will be fine. Trust me. So this means that
as a general rule, you're going to be drilling
down into your data by specifying a collection,
and then a document, and then a collection, and
then a document, and alternating
like that until you get to the document containing
the data you actually want. Since this code can get
kind of messy and awkward, you'll often be specifying
the document or collection you want by creating a
path to that document, kind of like this. Just remember that
in your path, you're still going to be alternating
between collection, document, collection, document, and so on. So let's go back to thinking
about our restaurant review app. Seems like a
no-brainer that we're going to have a collection
called Restaurants, and each one of
these documents will contain some information
about the restaurant as well as probably a pointer
to a Review subcollection. Now, within this
Review subcollection, you're going to have
a bunch of documents. And each document will
represent one individual review. And so within these
documents, you're going to have a pretty
large text block containing the review itself and then
probably a few other details, like the overall rating,
and the date, and so on. And already, I'm kind of digging
this hierarchical structure, because it turns
out to be pretty trivial to grab all the reviews
related to a restaurant here. But then we're also
going to want information about who wrote this review. Now, I'm pretty
sure our app will have some kind of
Users collection, but that will probably be more
of a top-level collection that would contain all sorts of
information about that user, like their name, their
user profile, last login time, default location, food
allergies, what have you. And this really does feel
like a top-level object, not something I'd want to make
as a subcollection of a review. And so I talked
about this earlier, but this probably means that if
we want to include information about the user who
wrote this review, our review documents
will probably contain a couple of fields, like
author name and author profile picture, since that's
probably the only user information I'm going to need
when I'm looking at a review. And if I wanted, I could
also make this a map field-- those are the little
JSON-y looking things-- kind of like so. And so this would
probably be duplicate data that would live both in
the top-level User object and in this individual review. And we'll talk in future videos
about the best strategies to keep these kinds
of things consistent. Incidentally, if you're coming
from the Firebase Realtime Database land, this kind
of deep, nested structure might be giving you
heart palpitations, because in the Realtime Database
world, when you retrieve some element in the
tree, you automatically retrieve everything below it. And that would mean downloading
potentially hundreds of restaurant reviews anytime
I want to grab a couple dozen restaurant documents. But in Cloud Firestore
world, queries are shallow by default, which
means when you grab documents within a collection, you
only grab those documents. You don't grab documents
in any subcollections. So I can go ahead and grab my
20 top-rated burrito restaurants and just get those
restaurant documents without all the reviews
associated with them, which makes sense, right? If I'm doing a search in my
mobile app for best burrito places, that
results page is just going to contain that
basic restaurant info. I don't need the individual
reviews at this point. Later, if I were to click on
one of those burrito places to get more info,
that's when I'd want to see the individual reviews. And that's probably the point
where it would make sense for my app to request them
from the database-- make sense? All right, so I know that was
a lot to go over, but let's summarize. Cloud Firestore is a NoSQL,
horizontally scaling document model database in the cloud. See, just what I said at the
beginning-- all kind of makes sense now, right? Now, there's plenty
more to talk about here, like how you can run queries
in Cloud Firestore, tips for optimizing your data, and
how to keep it all secure, all of which are great
topics for our future videos. And hey, lucky you, we're
making a whole series all about Cloud Firestore. So if you want to
keep watching and you want to keep learning
about Cloud Firestore, why don't you go
ahead and subscribe to our YouTube channel? And then I can see you
soon in a future episode. All right, thanks for
watching, YouTube land. I'll talk to you soon. [MUSIC PLAYING]
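
As a recap of the document and collection rules from this episode, here is a minimal Python sketch. It does not use the Cloud Firestore SDK: the `Document` and `Collection` classes, the `resolve` helper, and the sample restaurant data are all hypothetical names made up for illustration, and the real client libraries expose a different API. The sketch only tries to show two ideas from the talk: paths alternate between collection and document, and reading a collection is shallow, so subcollections are not pulled in.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document: named fields plus pointers to subcollections."""
    fields: dict = field(default_factory=dict)
    subcollections: dict = field(default_factory=dict)  # name -> Collection


@dataclass
class Collection:
    """A collection: a mapping of document IDs to documents, nothing else."""
    documents: dict = field(default_factory=dict)  # doc_id -> Document

    def get_all(self):
        """Shallow read: return each document's fields, never its subcollections."""
        return {doc_id: dict(doc.fields) for doc_id, doc in self.documents.items()}


def resolve(root, path):
    """Walk a path that alternates collection, document, collection, ...

    `root` maps top-level collection names to Collection objects, because
    the root of the tree may only contain collections.
    """
    segments = path.strip("/").split("/")
    node = root[segments[0]]            # first segment is always a collection
    for i, segment in enumerate(segments[1:], start=1):
        if i % 2 == 1:                  # odd positions name a document
            node = node.documents[segment]
        else:                           # even positions name a subcollection
            node = node.subcollections[segment]
    return node


# Hypothetical sample data for the restaurant review app.
root = {
    "restaurants": Collection({
        "tacos_el_sol": Document(
            fields={"name": "Tacos El Sol", "rating": 4.5},
            subcollections={
                "reviews": Collection({
                    "review_1": Document(fields={
                        "text": "Great burritos.",
                        "rating": 5,
                        # Copied from the user's profile so one read is enough.
                        "author": {"name": "Sam", "picture": "sam.png"},
                    }),
                }),
            },
        ),
    }),
    "users": Collection({
        "user_sam": Document(fields={"name": "Sam", "picture": "sam.png"}),
    }),
}

# Listing restaurants does not drag the reviews along.
restaurants = resolve(root, "restaurants").get_all()

# Reviews are fetched only when the subcollection is requested by path.
reviews = resolve(root, "restaurants/tacos_el_sol/reviews").get_all()
```

Here `restaurants` holds only the restaurant fields, and the reviews arrive only when the `reviews` subcollection is asked for explicitly, which mirrors the two-step flow described in the episode. The `author` map inside the review is the duplicated user data discussed earlier; keeping it in sync with the `users` collection is left out of this sketch.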