What is a NoSQL Database? How is Cloud Firestore structured? | Get to know Cloud Firestore #1

Captions
SPEAKER 1: So what is Cloud Firestore? Why, it is a horizontally scaling NoSQL document database in the cloud. Oh, I guess we're done here. Boy, that was easy. What's that? Oh, we're not-- we need more explanation? All right, fine, I guess I can stay a little while longer. Let's roll the credits. [MUSIC PLAYING] So let's talk about Cloud Firestore. Cloud Firestore is what's known as a NoSQL database. Now, if you're coming here from the Realtime Database or Mongo world and already know all about NoSQL databases, hey, lucky you. You can go ahead and skip forward, like, eight minutes into this video, because this next part will probably be review for you. But for the rest of you, if you're used to traditional relational databases-- things like MySQL or mobile frameworks like SQLite or Core Data, you're probably used to keeping all your data in tables that look a little something like this. Every table has its own schema, which means that every row in that table is very strictly defined. You have a specific set of columns that you can add per row, and every column has its own very strict rules about what kind of data type goes in there. Oh, sorry, buddy. Your age has to be an integer. Those are the rules. I didn't make them up. Because of this very strict schema, usually you end up storing one type of object per table in your database. And if you want to associate one object with another object hanging out in another table, you're usually doing that by creating another column known as a foreign key that contains the unique ID of that other entry in that other table. For example, let's say I'm creating a database for a restaurant review site. I might have one table to represent my restaurants, and another table to represent my reviews, and maybe another for my users. Now, let's say I want to look up reviews for a particular restaurant, which kind of seems like the thing I'd want to do in a restaurant review app.
I would want a restaurant foreign key in my Reviews table that shows me which restaurant this particular review is for. And if we assume I want to show within my review some info about the author who reviewed that restaurant, I might include another foreign key for the user. So later, when I need to show my Restaurant Info screen, I would grab some info about the restaurant from the Restaurant table and then also grab some of the reviews from my Review table, where the restaurant foreign key equals the ID of this particular restaurant. And then I would also look up the users, where the user ID equals the user foreign key for this review. And then I could use that to add user names and profile pictures for each review for this restaurant. Now, this is a fair amount of work being done on the backend, right? Grabbing all these entries from all these different tables and joining them together, based on these foreign keys and all that-- but this can be done with a fairly straightforward SELECT statement in SQL. The database does the work in grabbing all these pieces from the different tables and joining them together, not you. And that is relational databases in a somewhat overly simplified nutshell. Now, in the NoSQL world, things are a little different. Generally speaking, all your data is not going to be stored in neat little tables like this. In fact, there's a number of different ways you can store your data, from a plain old key value store, to a big nested tree like the Realtime Database, to a collection of JSON objects. But one thing that most of them have in common is that NoSQL databases are usually schema-less, which means there aren't any database-level restrictions around what kind of data you can put at any point in the database. So I might have my list of restaurants here with a bunch of restaurant objects, all containing a name, and a rating, and an address. But that's basically by convention. 
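Before going further into the NoSQL side, the relational lookup described a moment ago (reviews joined to their authors through foreign keys) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are made up here, since the video shows its tables on screen rather than in the captions.

```python
import sqlite3

# In-memory database with hypothetical tables for the restaurant review example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, photo_url TEXT);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        restaurant_id INTEGER NOT NULL REFERENCES restaurants(id),  -- foreign key
        user_id INTEGER NOT NULL REFERENCES users(id),              -- foreign key
        rating INTEGER NOT NULL,
        body TEXT NOT NULL
    );
    INSERT INTO restaurants VALUES (1, 'Burrito Barn'), (2, 'Noodle Nook');
    INSERT INTO users VALUES (1, 'Ana', 'ana.png'), (2, 'Ben', 'ben.png');
    INSERT INTO reviews VALUES
        (1, 1, 1, 5, 'Great salsa'),
        (2, 1, 2, 4, 'Huge portions'),
        (3, 2, 1, 3, 'Fine');
    """
)


def reviews_for_restaurant(connection, restaurant_id):
    """Return (author name, author photo, rating, body) for one restaurant.

    The database follows the user_id foreign key for us via the JOIN.
    """
    return connection.execute(
        """
        SELECT u.name, u.photo_url, r.rating, r.body
        FROM reviews AS r
        JOIN users AS u ON u.id = r.user_id
        WHERE r.restaurant_id = ?
        ORDER BY r.id
        """,
        (restaurant_id,),
    ).fetchall()


rows = reviews_for_restaurant(conn, 1)
```

One SELECT returns each review already merged with its author's name and picture, which is the work the database does for you in the relational model.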
There are no explicit database-level rules that say all these objects have to all have the exact same fields with the exact same types of data, or even that these need to be objects that represent restaurants. So this loosey-goosey approach might seem a little weird at first, but it does have some advantages. A lot of developers like working with a schema-less database because it means they can really easily iterate on their database design by adding or changing fields as needed, and it won't necessarily break anything else. I could start adding a noise-level value for my restaurants and only start adding it for new restaurants. I wouldn't have to worry about having to back-fill it for all my existing restaurants. My NoSQL database can handle that without freaking out. It can also come in handy in other situations too, where I might want to store data that's similar to each other but not exactly the same. For example, I could easily expand my restaurant app to include bars, and tattoo parlors, and skydiving lessons for what sounds like a pretty awesome night out. And my database doesn't care that one object has a tandem-jumps field and another has a tattoo-style field, right? I don't have to fight with my database to add these slightly different types of establishments. The drawback here is that you do need to code a little defensively. While you can set up security rules to help enforce what kind of data you put where, there's not really any guarantee at the database level that you're going to retrieve a certain set of data at any time. And that means you're probably going to want to do some checking on the client side to make sure the data you're getting is really what you're expecting and fail nicely when it isn't. But honestly, coding defensively is probably a good idea anyway. Particularly in the mobile world, you can't always guarantee that your users will be running the latest version of your client app against the latest iteration of your database.
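Here is a minimal sketch of what that client-side checking might look like, assuming a document arrives as a plain Python dictionary. The `Restaurant` class, the `parse_restaurant` helper, and the field names are hypothetical and do not come from any Firestore SDK.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Restaurant:
    name: str
    rating: float
    noise_level: Optional[int] = None  # newer field; older documents lack it


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_restaurant(raw: Mapping[str, Any]) -> Optional[Restaurant]:
    """Return a Restaurant if the required fields look right, else None."""
    name = raw.get("name")
    rating = raw.get("rating")
    if not isinstance(name, str) or not name:
        return None  # fail nicely instead of crashing later
    if not _is_number(rating):
        return None
    noise = raw.get("noise_level")
    if not isinstance(noise, int) or isinstance(noise, bool):
        noise = None  # missing or malformed optional field: just ignore it
    return Restaurant(name=name, rating=float(rating), noise_level=noise)


new_doc = parse_restaurant({"name": "Burrito Barn", "rating": 4.5, "noise_level": 3})
old_doc = parse_restaurant({"name": "Old Diner", "rating": 4})
skydiving = parse_restaurant({"name": "Sky High", "tandem_jumps": True})
```

Documents written before the noise-level field existed still parse, and a document that isn't shaped like a restaurant at all comes back as `None`, so the caller can skip it rather than crash.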
So I'm kind of hoping this is a habit many of you are used to already. And if not, this is a great time to get started. The other important aspect of NoSQL databases is that-- and I'm sure this is going to shock you-- there is no SQL, meaning that all those fancy joins where I was able to say, hey, go grab the review from this part of the database and the user from this other part, and merge them all together-- I can't do that here. In general, if I need to grab objects from three different parts of the database, I would need to make three entirely different database requests. And that's usually not going to happen, or, I guess, that's not going to be your default way of thinking. Instead, you're going to want to put your data in places where you can grab it all together if you need to. And that does mean you might be putting duplicate data in multiple places. For example, let's take a look at what a NoSQL version of our restaurant review app might look like. You probably expect to see restaurants listed as their own individual objects in our database. But now, depending on your database, reviews could be embedded within the restaurant itself, but I think they'd most likely be their own objects. And then you'd find some way to indicate that these go together. The reviews themselves might include the ID of the restaurant, for example. Now, with this setup, I can fairly easily fetch an individual restaurant or a group of restaurants. And through some careful querying, I could fetch all reviews for an individual restaurant. But I typically can't do both of these requests in one single call. But for a restaurant review app, this is probably fine, right? A user is first going to want to view a summary of 20 or 30 restaurants and then drill down into one of these to see more details. And it's really only at this point that I'd want to request these extra reviews-- so, so far, not too bad. 
But what if we want to show the reviewer's name and profile picture in those reviews? This is where it gets a little tricky. Let's assume our users are represented by their own objects elsewhere within the database. And while I could add references to these users from within my review objects, there's still no way for the database to automatically grab that user's name and profile for each review, as I'm requesting them. I would need to make a separate database request for every single review I get to fetch this information, and that's bad. So if we want to automatically include some information about who wrote a particular review, we would most likely need to copy some of that user profile data-- the author's name and picture, for instance-- and place it into our review object. Now, if you're coming from a traditional relational database world, you're probably freaking out right now, right? You're like, ah, what are you doing? You're going to have duplicate data all over your database, and that's the worst thing ever to happen in programming since Go-To statements. And you're kind of right. People who have spent their time with relational databases have been taught that data normalization, meaning that every piece of data should only exist in one place in your database, is super important. And they've kind of got a point. Like in a situation like this, it would be a lot more work if my user decided to change their profile picture. I'd need to look up every review where I've copied over this profile picture information and replace it with the new one, and there's always a risk that I don't change it everywhere, and suddenly I've got inconsistent data in my database. And so now maybe you're thinking, well, this just seems terrible. Why is it that NoSQL databases are so hot right now? Why are there so many developers moving away from this nice world of clean tables and data normalization and join statements for this crazy, new, messy world of data storage? 
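The duplication trade-off just described can be sketched with plain Python dictionaries standing in for the database. Everything named here (`users`, `reviews`, `review_card`, `change_profile_picture`, the field names) is a hypothetical illustration, not Firestore code.

```python
# Top-level user objects: the "source of truth" for profile data.
users = {
    "u1": {"name": "Ana", "photo_url": "ana_v1.png"},
    "u2": {"name": "Ben", "photo_url": "ben_v1.png"},
}

# Each review carries a copy of the author's name and picture.
reviews = {
    "r1": {"restaurant_id": "burrito_barn", "rating": 5, "author_id": "u1",
           "author_name": "Ana", "author_photo_url": "ana_v1.png"},
    "r2": {"restaurant_id": "noodle_nook", "rating": 3, "author_id": "u1",
           "author_name": "Ana", "author_photo_url": "ana_v1.png"},
    "r3": {"restaurant_id": "burrito_barn", "rating": 4, "author_id": "u2",
           "author_name": "Ben", "author_photo_url": "ben_v1.png"},
}


def review_card(review):
    """Build the display text from the review alone, with no user lookup."""
    return "{} ({}): {}/5".format(
        review["author_name"], review["author_photo_url"], review["rating"]
    )


def change_profile_picture(user_id, new_url):
    """Update the user and every review holding a copy; return the write count."""
    users[user_id]["photo_url"] = new_url
    writes = 1
    for review in reviews.values():
        if review["author_id"] == user_id:
            review["author_photo_url"] = new_url
            writes += 1
    return writes


card_before = review_card(reviews["r1"])
writes_needed = change_profile_picture("u1", "ana_v2.png")
card_after = review_card(reviews["r1"])
```

Reading a review takes one lookup, while changing a picture touches the user plus every copied review. Skipping any of those copies is exactly how the inconsistent data mentioned above would creep in.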
So yes, one of the big drawbacks of having this duplicate data is that when I change it, I have to change it in multiple locations. But on the other hand, any time I want to grab a review, it's really frickin' easy. All the data is right there for me, all in one place-- no need to run joins across multiple tables or anything like that. And while that means our writes are going to be more work, our database reads end up being really fast. And for many apps, if you really think about it, your reads are going to outnumber your writes by a lot. How many times am I going to change my profile picture? Once a year, at most-- but on the other hand, maybe a couple of dozen people are going to see my restaurant review every day. So when it comes to this data here, our reads might outnumber our writes by 7,000 to 1. And so maybe it makes sense to optimize the case that's going to happen 7,000 times, over that case that's going to happen once a year. But I think the biggest advantage with a NoSQL database over traditional databases is that it's able to distribute its data across multiple machines pretty easily, and this is a big deal. With most relational databases, if my app gets super popular and I need my database to scale up to a larger and larger data set, I generally need to put it on bigger and beefier machines. And this is known as scaling vertically. On the other hand, with many NoSQL databases like Cloud Firestore, if I need to scale up to a larger and larger data set, my database can, behind the scenes and pretty much invisibly to me, distribute that data across several servers, and everything just kind of works. And this is known as scaling horizontally. And for those of you who are working in managed server environments like the Google Cloud Platform or AWS, it's pretty easy for these systems to automatically add or remove servers to your database as needed, with very little to no downtime. 
So your database can scale pretty much automatically without your ever needing to lift a finger. And it's really for these reasons that you're starting to see a lot more databases, particularly ones hosted in the cloud, moving to this NoSQL model. But, now, if you're coming from a NoSQL background like the Firebase Realtime Database, not much of this is new-- well, maybe except for the automatically scaling part. Cloud Firestore does handle that a whole lot better than Realtime Database. But it's more than just that. So let's talk more specifically about Cloud Firestore's document collection model. In the Realtime Database world, we typically describe the data that's stored in Firebase as a big JSON tree because, well, that's basically what it is, right? It's a tree. It's got keys and values, and those values can sometimes be objects that contain other keys and values. Now, Cloud Firestore, like the Realtime Database, is a collection of objects. And all these objects are stored in a tree-like hierarchical structure. And while databases like the Firebase Realtime Database store everything as a big old JSON object, Cloud Firestore is a little more organized, in that it's made up of documents and collections. Now, documents are similar to JSON objects or dictionaries. They consist of key value pairs, which are referred to as fields in Cloud Firestore land. And the values of these fields can be any number of things, from strings, to numbers, to binary data, to smaller JSON-y looking objects, which the team likes to refer to as maps, among other things. And that's a document. Now, collections are basically, well, collections of documents. You can think of them like a hash or a dictionary where the values are always going to be some kind of document. Now, there are a few rules when it comes to using these things. The first is that collections can only contain documents, nothing else-- no collections of strings or binary blobs or anything else here. 
Second, documents can only be 1 meg in size. Any larger than that, and you'll need to break it up. Third, a document cannot contain another document. Documents can point to subcollections, but not other documents directly. So it's very common to see a collection containing a bunch of documents, which then point to subcollections that contain other documents, and so on and so forth. The fourth rule is that the very root of a Cloud Firestore tree can only contain collections. Now, in most real applications, this will seem very intuitive. You'll have a Users collection and a Tasks collection and so on. I do find the one time this ends up being confusing is when you're building your first little, tiny test app where you're storing two pieces of data. It's a little weird to store "Hello, world," inside a document that's then inside a collection. But in most real-world use cases, this will be fine. Trust me. So this means that as a general rule, you're going to be drilling down into your data by specifying a collection, and then a document, and then a collection, and then a document, and alternating like that until you get to the document containing the data you actually want. Since this code can get kind of messy and awkward, you'll often be specifying the document or collection you want by creating a path to that document, kind of like this. Just remember that in your path, you're still going to be alternating between collection, document, collection, document, and so on. So let's go back to thinking about our restaurant review app. Seems like a no-brainer that we're going to have a collection called Restaurants, and each one of these documents will contain some information about the restaurant as well as probably a pointer to a Review subcollection. Now, within this Review subcollection, you're going to have a bunch of documents. And each document will represent one individual review. 
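The collection, document, collection, document alternation in a path can be sketched as a small helper. This is a standalone illustration under the rules listed above, not part of any Firestore SDK; the function names and the example path are made up.

```python
from typing import List, Tuple


def split_path(path: str) -> List[str]:
    """Split a slash-separated path and reject empty segments."""
    segments = path.split("/")
    if any(segment == "" for segment in segments):
        raise ValueError("path has an empty segment: {!r}".format(path))
    return segments


def path_kind(path: str) -> str:
    """Paths start at a collection, so an odd segment count ends on a
    collection and an even segment count ends on a document."""
    return "collection" if len(split_path(path)) % 2 == 1 else "document"


def describe(path: str) -> List[Tuple[str, str]]:
    """Label each segment with the role it plays in the hierarchy."""
    return [
        ("collection" if index % 2 == 0 else "document", segment)
        for index, segment in enumerate(split_path(path))
    ]


review_path = "restaurants/burrito_barn/reviews/rev1"
kind = path_kind(review_path)
labels = describe(review_path)
```

Because the root can only hold collections, the first segment is always a collection, and the segment count alone tells you whether a path points at a document or a collection.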
And so within these documents, you're going to have a pretty large text block containing the review itself and then probably a few other details, like the overall rating, and the date, and so on. And already, I'm kind of digging this hierarchical structure, because it turns out to be pretty trivial to grab all the reviews related to a restaurant here. But then we're also going to want information about who wrote this review. Now, I'm pretty sure our app will have some kind of Users collection, but that will probably be more of a top-level collection that would contain all sorts of information about that user, like their name, their user profile, last login time, default location, food allergies, what have you. And this really does feel like a top-level object, not something I'd want to make as a subcollection of a review. And so I talked about this earlier, but this probably means that if we want to include information about the user who wrote this review, our review documents will probably contain a couple of fields, like author name and author profile picture, since that's probably the only user information I'm going to need when I'm looking at a review. And if I wanted, I could also make this a map field-- those are the little JSON-y looking things-- kind of like so. And so this would probably be duplicate data that would live both in the top-level User object and in this individual review. And we'll talk in future videos about the best strategies to keep these kinds of things consistent. Incidentally, if you're coming from the Firebase Realtime Database land, this kind of deep, nested structure might be giving you heart palpitations, because in the Realtime Database world, when you retrieve some element in the tree, you automatically retrieve everything below it. And that would mean downloading potentially hundreds of restaurant reviews anytime I want to grab a couple dozen restaurant documents. 
But in Cloud Firestore world, queries are shallow by default, which means when you grab documents within a collection, you only grab those documents. You don't grab documents in any subcollections. So I can go ahead and grab my 20 top-rated burrito restaurants and just get those restaurant documents without all the reviews associated with them, which makes sense, right? If I'm doing a search in my mobile app for best burrito places, that results page is just going to contain that basic restaurant info. I don't need the individual reviews at this point. Later, if I were to click on one of those burrito places to get more info, that's when I'd want to see the individual reviews. And that's probably the point where it would make sense for my app to request them from the database-- make sense? All right, so I know that was a lot to go over, but let's summarize. Cloud Firestore is a NoSQL, horizontally scaling document model database in the cloud. See, just what I said at the beginning-- all kind of makes sense now, right? Now, there's plenty more to talk about here, like how you can run queries in Cloud Firestore, tips for optimizing your data, and how to keep it all secure, all of which are great topics for our future videos. And hey, lucky you, we're making a whole series all about Cloud Firestore. So if you want to keep watching and you want to keep learning about Cloud Firestore, why don't you go ahead and subscribe to our YouTube channel? And then I can see you soon in a future episode. All right, thanks for watching, YouTube land. I'll talk to you soon. [MUSIC PLAYING]
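As a recap of the shallow-query idea from this episode, here is a rough in-memory model in which each document keeps its fields separate from its subcollections. The `store` layout and both helper functions are assumptions made for illustration; they are not how Cloud Firestore stores data internally, and they are not SDK calls.

```python
from typing import Any, Dict

# Hypothetical stand-in for a database: collection -> document id -> document.
store: Dict[str, Dict[str, Dict[str, Any]]] = {
    "restaurants": {
        "burrito_barn": {
            "fields": {"name": "Burrito Barn", "rating": 4.7},
            "subcollections": {
                "reviews": {
                    "rev1": {
                        "fields": {
                            "rating": 5,
                            "text": "Great salsa",
                            # Map field holding the copied author info.
                            "author": {"name": "Ana", "photo_url": "ana.png"},
                        },
                        "subcollections": {},
                    },
                },
            },
        },
        "taco_town": {
            "fields": {"name": "Taco Town", "rating": 4.2},
            "subcollections": {"reviews": {}},
        },
    },
}


def get_documents(collection: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Shallow read: return each document's fields, never its subcollections."""
    return {doc_id: dict(doc["fields"]) for doc_id, doc in collection.items()}


def get_subcollection(collection, doc_id: str, name: str):
    """Separate, explicit request for one document's subcollection."""
    return get_documents(collection[doc_id]["subcollections"].get(name, {}))


# Results page: restaurant info only.
results = get_documents(store["restaurants"])
# Detail page: reviews are requested only when the user drills down.
burrito_reviews = get_subcollection(store["restaurants"], "burrito_barn", "reviews")
```

The results page works from `results`, which carries no review data at all, and the reviews are fetched in a second request only for the restaurant the user opens.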
Info
Channel: Firebase
Views: 375,882
Rating: 4.9760222 out of 5
Keywords: Firebase, NoSQL, Cloud Firestore, Google Cloud, firestore, google apps, mobile app developers, app developers, google firebase, firebase cloud firestore, NoSql, no sql, intro to NoSQL database, firebase developers, android app developers, flutter developers, google flutter, google app building, google play, GCP, google cloud platform, document database, what is a document database, json, Sqlite, core data, schema, client app, server side, nosql, ios, flutter, GDS: Yes;
Id: v_hR4K4auoQ
Length: 15min 29sec (929 seconds)
Published: Mon Mar 26 2018