MongoDB Schema Design Best Practices

Captions
[Music] This video presentation is brought to you by Red Hat OpenShift, the Kubernetes platform for big ideas.

You're probably either new and interested, or you've been using MongoDB a little bit and you're looking to take your skills to the next level, or you're getting into it and you want to figure it out. So anyway, let's jump into this talk and see what happens.

First of all, I want to whet your palate a little bit about why this topic is important and why you should care about it. I see a lot of databases, particularly MongoDB or document-based databases, and I've seen that the number one most important thing, the most critical part of improving the performance and scalability of your database, is the schema design of your database. It's more important than indexing and caching and doing all that other crazy stuff, because what you're pulling from your database and how it's designed is paramount to performance and scalability. So it's super, super important.

But let's get started here. My name is Joe Karlsson and I work for a little company called MongoDB. I'm a software engineer and developer advocate, and I live here in Minneapolis, Minnesota. I'm assuming most people are from the Midwest or Minneapolis, and yeah, I've been living here my whole life. If you want to follow me, here's my Twitter, Twitch, TikTok and my personal website. The best way to reach me if you want to chat about anything software development related is on my Twitter. I've been Twitch streaming every week, every Friday at noon, on the MongoDB Twitch channel, which is actually just at MongoDB. And lastly, I don't think you guys know about TikTok, but I've been making really goofy videos on there recently. There are like three software engineers on TikTok right now, including myself, so if you're interested in that you should definitely come check that out.

Opinions are my own. If I say anything opinionated or whatever, just know it's me and not my company. Please don't get me fired, that would be
terrible. And lastly, all the links and resources I discuss in this talk are on that Bitly link. You can also see that little QR code in the upper right-hand corner of the page; just scan that and it should take you to the page as well. I'll have the QR code wherever you go. On the link there are the video, slides, resources, free credits for MongoDB, et cetera. So let's jump into this.

Today we're going to talk about a couple of things. First of all, I want to do a quick comparison of traditional SQL relational database design and compare that to MongoDB. I think most of us are coming from a relational background and kind of understand how that works, some of the basics of normalization, and I think it's helpful to come from that place to understand how to actually design a schema for MongoDB. Next, we're going to talk about the basic tenets of schema design, which is basically just embedding or referencing. Everything we're doing is a variation on one of these, and on trying to decide where and how to do this. And lastly, I want to talk about some typical relationships you see in schema design in general. Again, we're going to be coming at this from a relational perspective and then showing that, apples to apples, with MongoDB. I'm also going to be showing you some cool hybrid or advanced schemas and use cases at the end, to tickle your imagination a little bit and see what's in store for the future.

Okay, so let's jump in. Was there anything else that's important to note here? I don't think so. So let's start with relational versus MongoDB schema design approaches.

In my experience, when most developers come to MongoDB and start designing schemas, their instinct is to design schemas in MongoDB just like they had done in SQL. They see no difference in it, and they start seeing performance hits almost immediately with these designs. This is bad. My job here is to try to help you unlearn your
SQL relational background and learn some new ways, to open your mind up a little bit to new ways of designing a schema for your database.

With relational database schema design, when a developer starts modeling, they are modeling the data independent of the queries they're going to be making against it. Most of the time, when you're designing a schema for a traditional SQL database, the developer is asking "what data do I have?" and trying to split that up. So let's look at what that looks like with traditional SQL. There's a long history of academic study of normalizing, with very prescribed, very rigid approaches to how to structure data. There are a bunch of different normal forms, and normalizing, in its most low-level description, is basically a way to structure data so you're not duplicating data.

For example, if I was designing some sort of user profile page for a project, I might have a SQL database that looks like this. I'd have a database with three different tables in it: maybe a users table, a professions table and a cars table. All these tables would be linked together via foreign keys; you see the user ID column in both the professions and cars tables, right? We use that to link all these separate tables together, and that's called normalizing, when you're splitting this data up. The reason we're doing this is that if we wanted to keep multiple professions and cars per user, we'd have to denormalize the data set and we'd have repeating data in that users table. To avoid this, we split it up, normalizing it, so we don't duplicate data. That's what we're used to: rows, columns, and deduplicating data via normalization, and we can do that to varying degrees.

MongoDB schema design is a little bit different. With MongoDB schema design, you have way more flexibility and
freedom to design your schemas based on what you're actually building. There are no rules, there's no prescribed process, and there are no algorithms for designing your schema. The most important thing you can do when you're designing a MongoDB schema is to make sure that you're designing that schema for your application. It's whatever you need, and we're going to be talking about the specifics of what a good schema design for a given application actually looks like.

When you are designing that schema for your application, there are three things you want to consider. First of all, you want to consider how you're actually going to be storing the data: what the data structure looks like. The next thing you want to consider is your query performance. With anything there's always a pro and a con, there are always things you have to weigh, and for your application you need to make sure you understand what that looks like. And lastly, we want to make sure that we're using a reasonable amount of hardware. We don't want to spend four billion dollars on four hundred replica clusters, right? We just want to keep things under control. So we're concerned about designing that schema, the performance for querying and writing that data, and also making sure we're not using too much hardware.

So let's go back to our original example of our relational tables in a traditional or legacy SQL database, and then I want to show how we would store the same data in a MongoDB database. Let's take a peek at that users table. We have ID, first name, surname, cell, city and some geolocation data there. No problem. In MongoDB, if we're translating SQL data, we just take all those columns and add them as key-value pairs in our MongoDB document. So we just translate that over. One note there too: if you check out that location, we're saving that geolocation data as an array of
latitude and longitude, basically. Geolocation data is a native data type in a MongoDB database.

Okay, but we still have those two other tables in our traditional SQL database: the professions table and the cars table. How do we save those? Well, with MongoDB you can just save that as either additional key-value pairs or as a nested array. For example, since a user can have multiple professions, we want to be saving professions as an array of data. So we set professions to an array of values.

The last thing we have is that car data, and this car data is a little bit more complex. You'll see that with professions we just have a single column besides the user ID, the profession; there's not a lot of additional data saved with it. Cars are a little bit different. We have the model and the year, and maybe the make too; we could save additional data. For my design I might save that as an array of nested objects and just save it all together. So if you want to save that all as a single document, that's great.

So yeah, perfect. We'll be talking about the different use cases for when you might separate this out, too. But let's do a quick recap here of relational vs.
MongoDB schema design approaches. The first thing we talked about was that when you're designing a relational schema, most SQL developers are modeling their data independent of queries, and usually they're normalizing the data into third normal form: deduplicating the data by splitting it into separate tables and then joining those tables based on foreign keys. In this example we're using the user ID.

MongoDB schema design is a little bit different, though. There are no rules, no process and no prescribed algorithms. The only thing that matters is how your application uses the data, and we consider three different things: how to save that data, query performance, and making sure we're using a reasonable amount of hardware. As I said before, we're designing the schema that works for our application. That's the most important thing here. We have to consider your application and how you're using it, and design the schema around that application.

So let's get to the basic tenets of how we're actually going to be doing schema design. The bedrock of MongoDB schema design is this: are we going to be embedding, or are we going to be splitting things into separate documents or collections and referencing those separate objects? Let's talk about what those are and the pros and cons of each of them for your application.

First of all, embedding. I think most of us are pretty familiar with embedding. One of the advantages of using MongoDB is that we're saving data the way we're using data in our application. For example, if I'm building a user profile page, this might be the data structure I actually use when rendering the front end or saving on the back end. This is how we think about the data that we're actually ingesting. We're using hashmaps, dictionaries, objects, JSON,
right? This is the internet, this is the data structure of how we as engineers think. With embedding, we can just embed the data right with it. It's just a key-value pair, no problem. And embedding is the same as a join. Remember the separate tables we had? We just threw them together, like we joined the data together.

So if you're considering embedding, why should you choose that as opposed to referencing, which we'll be talking about in just a moment? One of the key advantages of embedding is that you retrieve all that data with a single query. In a legacy SQL database, if you want to get all that data you have to do a join, and joins are extremely expensive, both time-wise and memory-wise. The database has to bring all that data into memory and then execute some functions on it. It's a blocking operation, it's expensive, and it takes time. But say I'm designing a user profile page and I need to get all that user profile data every single time someone loads that page. Boom, it's one hit to the database. I get all the data I need, and I don't have to do any joins or lookups or anything. It's super, super fast, and a huge advantage over a traditional SQL database.

You can also update all of that information with a single atomic operation. There's a common misconception that MongoDB is not ACID compliant, which is actually not true at all. First of all, MongoDB by default is atomic for single-document operations, and as of 4.0 or 4.2 you can actually do multi-document ACID operations, even on sharded clusters. So MongoDB can be, and is, ACID compliant. But it's especially easy if you have all that stuff in a single document: boom, instant ACID compliance, instant atomic operations.

Okay, so embedding:
a lot of great reasons to do it, mostly related to performance. If you can keep all that data together, you're going to be saving a lot of time for your application and you're not going to be blocking any operations.

But when should you be considering not embedding? That's the interesting question. For example, large documents. It's really easy to keep all your data together, but if you don't need it all at once, maybe you shouldn't be embedding it. Say you have some user profile information and I don't use their pet data every single time, like their favorite pet or animal. Maybe I don't need to embed that in the document, because the larger the document, the more overhead, the more data that needs to be transferred over the wire to get that data to you. You want to keep your documents as small as possible by keeping only the necessary data in them. You want to avoid putting unnecessary amounts of data in, and it's up to you to decide where that line is. It can be anywhere.

The other thing to note is that with a MongoDB document there's a 16 megabyte size limit. I get asked this a lot: "Hey, I might have some documents that are over 16 megabytes." Okay, cool. That's probably a time to reconsider your schema design. There are use cases where that does happen, but for me that's a code smell or a red flag that there's something wrong with the schema design. We need to be discovering what we can be referencing instead of directly embedding in the document.

That actually leads me to my next point: referencing. Referencing is basically when you reference other documents from within a document. So rather than embedding, there's just a reference, a pointer to a separate document in either that collection or a separate collection, a separate MongoDB
collection. So an example here is my left-handed smoke shifter from Acme Corp. It might have a lot of different parts. Let's say I'm making an e-commerce store and I'm selling my left-handed smoke shifters, whatever those are. It's probably pretty rare that I need a complete parts list with all the additional metadata every single time I pull up that page. I probably just need some basic information: description, name, how many are in stock, the price. There might be another, rarer use case where you need a complete parts list, and you can reference that and go get the data if and when you need it. But we do not need it every single time on this particular page. So that would be a great example of a reference.

So why would we use referencing? First of all, as we discussed, it allows us to break data down into smaller documents. It allows us to deliver exactly what we need while still keeping connections to other data that we might need for other use cases in our application. This allows us to avoid the 16 megabyte limit imposed on MongoDB documents, and again, if you're hitting that limit, it's probably a red flag that we need to start pulling some of the data out of your document and referencing it. It also allows us to not duplicate any of our data. We can pull it out and start doing normalization like we do with a legacy SQL database. I will note, though, that duplicating data is not a problem, and denormalizing data in legacy SQL databases is actually a super common practice. But we can avoid duplication if that's a concern or if we start running into some performance hits.

Referencing also allows us to pull out infrequently accessed data. We gave the example that perhaps I'm not loading up the user's favorite pet or animal data every single time; maybe it's in an advanced section of our profile page or something. If we're not accessing that data very often, why
do we need to be transferring that data over the wire every single time we pull up that page? We probably don't. So let's avoid it, let's get rid of it, and let's reference it somewhere else, so we only pull it in if and when we need it for those rare use cases.

The bad parts of it: obviously, if you need to get all that data, this requires two queries or a lookup operation, and of course there is a performance hit for having to do additional lookups and queries on lots of documents. There's going to be a hit; it can't be avoided. It's a pro/con, and you decide: am I saving more time by shrinking the size of the document, or is it okay for us to have a slower lookup time for these infrequently used operations or use cases? And that's pretty much it for that. Embedding versus referencing: those are the total basics of it.

Let's do a quick review here. With embedding, we're just using key-value pairs and we can embed data in there. In MongoDB we have huge flexibility with how we actually want to embed or deeply nest data. And embedding is the same as a join in a traditional legacy SQL database. So what are some pros of embedding? We talked about how it allows you to retrieve all of the data in a single query. All the data is in the object to begin with, done. We don't have to do any joins or lookups, which are slow and expensive. And lastly, we can update all the information in a single atomic operation. Remember, MongoDB can be atomic and ACID compliant; it is with single documents, and it can be with multiple documents too. The things that might go wrong are that you might have more overhead going over the wire, which is slower and more expensive in network traffic; the more stuff you're sending over the wire, the longer it takes. And lastly, there's a 16 megabyte size limit per MongoDB document.

Now let's talk about referencing. Referencing is basically referencing either another document or
a document in another collection, another MongoDB collection. We usually reference by a unique ID, just like you would do with a foreign key in a legacy SQL database. So why would you consider using referencing over embedding? When you're referencing, you have smaller documents. You're less likely to hit that 16 megabyte limit. You're not duplicating data, with an asterisk that duplication of data is not a bad thing. And you don't have to access data you don't use on every single query. If you're not using it, don't pull it up, since that could be a potential performance roadblock for your application. And why should you not be considering referencing? Referencing makes it so that you have to do either a lookup or a join in order to retrieve all the data, and there's going to be a performance hit for doing that kind of operation.

Okay, cool. This is my favorite part: let's get into types of relationships. I think it's kind of cool to start digging into specific use cases and compare them to some of the types of relationships we see with legacy SQL databases.

First of all, let's just start with one-to-one. We're going to start easy here and ramp up to some more advanced schema designs. If you want to be adding one-to-one data to your document, as you might have guessed, we're just going to be adding key-value pairs to your documents. This is a user document, and I have one Twitter account, one Twitch account, one TikTok, right? One-to-one: just add a key-value pair. So it's pretty easy. One user, one attribute, no problem.

All right, what about one-to-few? I could potentially have multiple addresses: maybe a work address or home address, or a permanent address or temporary address. You can just embed that data as either an array or a nested object with additional data. Pretty simple. That's one-to-few. One important note here is
that we only have a few addresses, and we're going to be talking about unbounded arrays in just a second.

So I have some general-purpose rules here for designing a MongoDB schema. Rule number one: favor embedding unless there's a compelling reason not to. Embedding should be your default, and if you ever want to reference, you should be able to justify why you decided to reference. I think if you're coming from a SQL mindset, you're just going to start referencing because that's what you're used to, with normalizing data and pulling it into separate tables.

And needing to access data on its own is actually a compelling reason not to embed it. For example, with those parts, if we need to get that parts data separately, on its own, that is a compelling reason for not embedding it.

All right, so we've done one-to-one and one-to-few. Pretty simple so far, not too complicated. Let's get to some more fun ones.

One-to-many. Let's say I'm designing an e-commerce store, and a single product can be made up of multiple parts. We have this bicycle here, or the left-handed smoke shifter we had before. These products are made up of many parts, so how do we design an e-commerce store to keep track of both the parts and the products in our database? Well, I might design a product document that references all of these separate parts, because that bicycle or this left-handed smoke shifter may have thousands of sub-parts that we need to keep track of. That's a lot. We have one left-handed smoke shifter to, you know, 3,000 parts. Keeping all those parts embedded inside the document is a problem, especially if we have all the metadata for the parts that we have in this example; we might run
into danger of hitting that 16 megabyte limit. So we can start referencing at that point. The point is keeping track of what data we need. On the product page, are we going to need this parts data every single time it loads? Is that a common use case, or is it a rarer use case where the user might need these parts? And are we willing to separate these out to keep a tight product object while also keeping that reference to all the parts on there? I think in most e-commerce stores that is a type of design that would work well. Note too that we have those foreign keys referencing each other, so we can reference those in the future with separate lookups and joins.

All right, cool. So rule number three for designing a great MongoDB schema design. And it looks like I got a question, but I can't view it yet, so I'll get to you in just a second. Rule number three: I want you to be avoiding joins and lookups if you can, but don't be afraid to use them if they provide a better schema design. We have our example here where we're doing a reference, but for our imagined e-commerce store it works well for us. Think of Amazon, Best Buy, Target: you're very rarely needing all the parts data every single time someone loads that page. That makes sense for us in this use case, but it's up to you to decide if that's something you're willing to do. If a reference works great for you and provides a better schema design, go for it. Do not be afraid to use it. Again, prefer embedding, but if you can justify it and it provides a better schema design, use a reference. Don't be afraid of them.

All right, one-to-squillions. I don't even know if squillions is a word, but I kind of love it. What is squillions? How is one-to-squillions different from one-to-many? Well, let's imagine we're making a program for logs, and I don't
mean like timber logs. I mean a program that is able to keep track of server logs. If you've ever looked at server logs before, especially for a server farm, you've probably noticed that log files are completely massive. So imagine with me for a moment that you are designing an event logging system that collects logging messages for an entire fleet of computers or servers. You're going to have many, many different servers, each of them collecting different log data, and you need to design a system to keep track of all that data. How would you do that? That's squillions, right? "Many" is maybe ten to a couple of thousand. Squillions is an unbounded amount of data per machine. That's a lot of data, so how are we going to keep track of it?

So let's take a look at how that works. If I was designing a database schema for logging, I might have two separate MongoDB collections. I might have a hosts collection that keeps track of all the different computers, and a separate collection that keeps track of log messages. So this is what I'm thinking here. You'll notice that we are going to be doing referencing, but the referencing is different from what we had for one-to-many. Our products kept track of all the parts, but in this instance our hosts collection keeps track of none of the log data. It knows nothing about it. All it knows is what the host is and where it exists: maybe its IP address, maybe some other metadata. The log messages collection is where we actually keep all that data, and we're doing a reverse object ID lookup there. You'll notice we keep track of the host ID in the log message, so we're referencing the host from the message. That way the data can stay unbounded; we can keep adding additional messages, doing queries on those log messages, and referencing the host that those exist on. We
could be keeping all that host metadata in the log message, but depending on how much metadata there is on that host machine, it might be better to keep it in a separate collection. Maybe I have a dashboard that keeps track of just commonly referenced machine data, and I only want to be doing some specific queries on the log messages for that dashboard. This schema design would work phenomenally well.

So I just want to show this operation again. With a log message we're doing a reverse reference, as opposed to embedding the reference in the host document. The log message is keeping track of which computer it corresponds to, as opposed to the host keeping track of which log messages relate back to it. This is great for unbounded amounts of data, or one-to-squillions, because if we kept those embedded you would be guaranteed to hit that 16 megabyte limit. It's probably pretty rare that we would hit that with references, unless we have something like 40 billion parts, but you're going to hit it eventually, especially with your servers running for years. And yeah, both of those are referencing up there too. Okay, one-to-squillions, it's fun.

Rule number four of MongoDB schema design: arrays should not grow without bound. If you have data that's going to continue to grow without a stopping point, you cannot have that in an array. That's when we do the reverse referencing. If you have a couple of hundred items, and again it depends on the data you're embedding, a couple of hundred may be okay. With a couple of thousand, you probably need to be doing some reverse referencing.

All right, what about many-to-many? Imagine with me for a moment that you've been asked to build the hello world of applications, the to-do list. In this to-do list, one user can have many tasks, but one task can also have many different users. This is an example of the
many-to-many relationship, and we have to handle referencing differently than we did before. I might have a users collection to keep track of all the users of my application, no problem, and I might have a tasks collection to keep track of all the tasks for my application. We need to make sure that we're referencing those many-to-many relationships together. How would I build this? My person document has some metadata about the person, and, well, it depends, but a user may have a couple of hundred tasks or so, so we're just going to embed the references to those tasks directly inside that person object. Then on the right-hand side we have the task objects. We have some metadata about those tasks, like a due date, but you'll also note we have an owners array that can keep track of multiple owners for that task. So we have references to the tasks on the person, with multiple in there, and we also have references to the owners on each of the tasks. We're doing a two-way reference in order to keep track of that many-to-many relationship between those two separate documents, or two separate collections.

All right, so we're getting some fun stuff down. Rule number five: how you model data depends entirely on your particular application's data access patterns. I cannot say this enough; this is the most important rule up here. When you're designing your MongoDB schema, the most important thing you can do is to make sure that you're designing it for your application's needs. It's not one-size-fits-all and there are no prescribed approaches. It's how you're going to be accessing the data and what you need from it; that's the most important thing.

Okay, I want to do a quick recap of this, and then I want to go over some fun little extra hybrid approaches too. So let's do a quick recap here of the
relationships we've just covered. First, we covered one-to-one relationships. No problem: we're just using key-value pairs, easy. One-to-few: we gave an example of that with an embedded array of data. I just have a few in there, no problem, I'm going to put that straight in there. One-to-many: that's when we start doing some references. We gave the example of the product and its parts; you may have hundreds to thousands of parts, so we're just going to reference those as an embedded array of references in that object. One-to-squillions: this one I think is so much fun. We're doing a reverse reference from the log. We gave the example of building a server logger, so we're reverse referencing from those log messages to the host information, with that host metadata in a separate document. And many-to-many: that's a two-way reference. We gave the example of a to-do list, where the user object keeps track of the tasks and the task is also responsible for keeping references to which owners are responsible for it. It's a two-way reference of that data.

Let's review those rules. One: favor embedding. Embedding should be your default option when you design a schema, and you should be using embedding unless you can vocalize or explain why you are referencing. You should have a compelling reason to do it. Needing to access that data on its own is a compelling reason not to embed it. You want to be avoiding joins and lookups if they can be avoided, and that's by just embedding that data. Arrays should not grow without bound; that's what we talked about with one-to-squillions. With lots and lots of data, you're going to hit that 16 megabyte limit, and that's going to be a potential problem for your
application as it grows. And lastly, most importantly, how you model your data depends entirely on how you are actually accessing and using the data. No one-size-fits-all approach: you need to figure out how your application is using that data and structure your data depending on how you're actually accessing it. All right, so let's see here, I've got like eight minutes left. I'm not going to have time to do all these hybrid approaches, so I just want to cover two of my favorites here. Okay, so I'm going to just do this one here, it's one of my favorites. So imagine with me for a second here that you were building a Twitter-like social media site, right, just like a social feed, and we're going to be using MongoDB as our database. How would we start to split the data? So if you had a user object for that, this is it, right, we have a typical user, and I have 11,000 followers on Twitter, so maybe it's hitting the limit, but maybe you keep track of all those followers in there, inside of that object. Okay, probably not a great idea, but we can do it. But what if a Kim Kardashian or a Justin Bieber joined your site? That is going to be a problem for your MongoDB documents; you're going to hit that limit, right? Kim Kardashian has 64 million followers on Twitter right now, and having an array of 64 million is probably not a great idea. It's probably pretty rare that we actually need all 64 million of those users in a single query. So how do we handle that? This is where we would use something called the outlier pattern, and there's a ton of these; I'm just going to scratch the surface here due to time. But you'll see here on the bottom, for these big power users we have a 'has extras' flag, or like an overflow document for it, and that basically signals to our application that we need to be doing additional work on it. So, for example, we're going to start doing some overflows in our application: you say that
'is overflow' is true, and we're using incremented IDs in order to keep track of the additional followers, so we're separately pulling that data out and doing some overflows. And this is called the outlier pattern because the Kims and the Justin Biebers are outliers in a system that we need to make sure we're handling. And actually it's interesting that Twitter uses a similar outlier pattern for their users: verified users get handled differently in their system because they're so large they need to be handled differently on the back end. This is very similar to what they do for real in production. Let's see here, I only have five minutes left, so I'm going to skip ahead. I do want to just brag about a project I built recently. This is my IoT kitty litter box, a pretty fun little project. That's me assembling it. It builds a little dashboard that shows my cat's bathroom habits in real time, and I used a time-series-based schema design to put that together. I'm just going to show that really quickly: every day it creates a brand new document, and in that document I have an events field keeping track of an array of every single time a cat goes to the bathroom, basically a timestamp and then a measurement of its weight. This is actually what that schema design looks like, all right, one document for every single day. And I do this because on my dashboards I'm keeping track per day of how many times he's gone to the bathroom. I designed this schema based on how I'm digesting the data on my dashboard, so I can make simple queries to my database to get the data I need and display that easily on there. My schema design is based on how I'm accessing that data. All right, so I just wanted to touch on those because I think they're super cool, but there are a ton of other patterns out there like the ones I just showed you. This is an example of some of them too, and there's a great blog post here, which I'll link at the end, with amazing use cases of some super
flexible, interesting, innovative new schema designs you can use for your application. All right, let's do a quick recap here and then I'll answer the questions. So today we talked about relational versus MongoDB schema design. We talked about how, with relational schema design, when developers are designing a schema they're modeling their data independent of how they're accessing that data, and usually they're doing something like normalizing into third normal form: they're splitting that data into rows and columns and doing referencing based on foreign keys. MongoDB schema design is a little bit different: there are no rules, no process, and no algorithms for designing it. You're just designing based on how you're accessing that data for your given application, and some things you want to consider are how you're storing the data and query performance. Again, you're designing that schema for your given application; that's the only thing that matters. And the two basic ways we do that are variations of embedding and referencing. So embedding is just a key-value pair, embedding data directly in that MongoDB document. It's the same as doing a join on a legacy SQL database, but it allows us to retrieve all the data easily with no joins, super easy, and it's a single operation, but our documents can get pretty big if we're doing embedding all the time, which is why we reference. And referencing is when we just have a foreign key lookup based on a separate object ID: smaller documents, no duplication, and you're not accessing data you don't need every single time, but two queries or more are required to retrieve all that data. And we talked about some of the basic relationships of schema design: we talked about one-to-one, just super easy; one-to-few, awesome; one-to-many, where you want to embed the reference IDs in the 'one' object; one-to-squillions, where we gave the example of a server log application and you're doing reverse lookups on those as they grow to
unbounded size, so we need to make sure that we're not going to be hitting that limit as we design those applications. We talked about many-to-many, doing a two-way reference so the two sides can refer to each other. And then we had five rules for MongoDB schema design. So you want to favor embedding unless there's a compelling reason not to embed that data. Also, needing to access data on its own is not a good reason not to embed it. If you can just embed it, great, just throw it in there, not a problem. Avoid joins if they can be avoided. Arrays should not grow without bound: it's super important to know your data, and if you think in the future you'll be growing that array or adding way more data to it, that's a problem, and you want to make sure that you're referencing that instead of embedding it in there. And most importantly, how you model your data depends entirely on how you're accessing it. How you're building it is not how some other company builds another application, right? You just figure out how you are going to be accessing that data and make sure you're designing your schema based on your particular needs. Oh my gosh, okay, let's see here, I'm going to check out the Q&A. Got a couple of questions here. So Matthew Turrell asked, I'm going to put this over here actually, he asked: do the embedding and referencing approaches impact query complexity? And they do. So when you're doing lookups with MongoDB, if you have super complicated structures, or you're referencing something that's referencing something that's referencing something, you can do that, but that of course will affect your lookup. You also design your lookups based on the reference structure of your database. Okay, cool. Let's see here. Brian asked: SQL has column types to ensure the data is valid; does MongoDB have a type system like that? And yes, it does. It is
actually another myth that MongoDB is a schemaless database. MongoDB actually has a flexible schema design, so let me explain what that means. With a legacy SQL database you're forced to use types for each column; you have no choice about it. In MongoDB you can actually enforce types at the database level, you don't need an ORM at all; you can enforce types at the database level for key-value pairs or document shape or structure. The difference is you get to choose what is flexible and not flexible, so you have more flexibility. If you wanted to have a completely SQL-like, super inflexible database, where you design the schema and no one can do anything else, you can totally do that with MongoDB, but you can have flexibility if an application can handle it. So, for example, with my IoT litter box, I added a new sensor type and I just updated the schema with that new sensor data and it wasn't a problem. For that kind of application it's okay to have greater flexibility, but you can have the rigidity if you need it, and that's super important for a lot of massive applications, which I totally understand. Okay, cool. Let's see here. And then Brian also asked: I love MongoDB, but it's billed as non-relational and we keep showing relations. Yeah, I know, right? I think that just has to do with, we don't advertise it as a non-relational database, because you can do relations in it. I will say, if you're doing relations just because you're used to them in SQL, then it's a problem. But we call ourselves a general-purpose database, not a non-relational one. I think that's the NoSQL thing; I think it's more what the community calls us, which isn't entirely accurate to what we actually are. Okay, cool. Let's see here, I think there's one more question and then I'll wrap it up. Kevin Boyd asked: if you're just starting out with MongoDB and don't understand the queries you'll need, how do you model your queries and
modify your schema as your understanding and requirements change? Yes, great question. So things change all the time in applications. The only constant in software engineering is that your requirements will change and your software will change; it's the only thing we can be sure of. So I would recommend you design your schema based on what you need in your application right now, and things will change, and that's okay. And that happens in SQL databases too: we need to add new columns, or as your data grows you denormalize normalized data, super, super common. You can do the same thing in MongoDB: you can just start restructuring, pulling data apart, referencing, running queries, aggregations, lookups, based on whatever you need. My thing is just to build for what you need, and then you can make changes based on what happens in the future. Right, we can't predict the future, but you can adapt to changes as you see them. Okay, let's see here. So I just want to wrap up here with what's next. If you're interested in learning more, we actually have a completely free learning system at university.mongodb.com, and in particular there's a MongoDB 101 course; if you're brand new to MongoDB I totally recommend checking it out. But if you want to learn more about schema design, I would check out M320, and that's a data modeling course we have. I will note here, when I started at MongoDB, this is what we used internally for training engineers, and it's totally free and open to the public. If you're interested in learning, you need to be checking out university.mongodb.com; it's incredible. And lastly, I'm just going to go ahead and say, if you want to learn this stuff, or MongoDB in general, just do it. We're sitting at home right now, so just go and try to learn something. On the next application you're building, maybe just try MongoDB, throw it in as the database in your
back end on your next project and see what happens. And if you want to try it out, I can actually give you $100 in free Atlas credits. Atlas is our cloud-hosted MongoDB option, so you don't have to run it locally anymore; it's super amazing. I've never gone over the free tier amount, but just in case, there's $300 on there, and that's with joke a 100; you can use that code when you're signing up to get 100 dollars in free credits. Here are some additional references, rules of thumb, documents, and articles worth checking out if you're interested in this stuff too. And here is my information one more time. Thank you so much for having me. I appreciate you all so much, you're amazing, and I couldn't do this without your help and you being here today. So I just appreciate you all. So there's my info; hit me up on Twitter if you want to talk.
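The two-way reference, the outlier pattern, and the one-document-per-day bucket described in the talk can be sketched with plain Python dictionaries standing in for MongoDB collections. This is a minimal sketch under stated assumptions, not the speaker's code and not a driver example: the field names (`tasks`, `owners`, `has_extras`, `is_overflow`, `events`, `weight_kg`), the `<user>_<n>` overflow ID scheme, and the tiny `limit` are all hypothetical choices made for illustration.

```python
"""Plain-dict sketches of three document shapes described in the talk.

Dictionaries stand in for MongoDB collections (keyed by _id); no driver is
used. All field names and the tiny array limit are illustrative assumptions.
"""
from datetime import datetime


def assign_task(users, tasks, user_id, task_id):
    """Many-to-many: store the reference on both sides (two-way reference)."""
    if task_id not in users[user_id]["tasks"]:
        users[user_id]["tasks"].append(task_id)
    if user_id not in tasks[task_id]["owners"]:
        tasks[task_id]["owners"].append(user_id)


def add_follower(collection, user_id, follower_id, limit=3):
    """Outlier pattern: spill followers into numbered overflow documents.

    `limit` is an assumed per-document cap, kept tiny for demonstration.
    """
    main = collection[user_id]
    if len(main["followers"]) < limit:
        main["followers"].append(follower_id)
        return
    main["has_extras"] = True  # signals the application to read overflow docs
    n = 1
    while True:
        overflow_id = f"{user_id}_{n}"  # incremented ID per overflow document
        doc = collection.setdefault(
            overflow_id,
            {"_id": overflow_id, "user_id": user_id,
             "is_overflow": True, "followers": []},
        )
        if len(doc["followers"]) < limit:
            doc["followers"].append(follower_id)
            return
        n += 1


def all_followers(collection, user_id):
    """Read the main document, then overflow documents only if flagged."""
    main = collection[user_id]
    followers = list(main["followers"])
    n = 1
    while main.get("has_extras") and f"{user_id}_{n}" in collection:
        followers.extend(collection[f"{user_id}_{n}"]["followers"])
        n += 1
    return followers


def record_event(days, when, weight_kg):
    """Time-series bucket: one document per day holding an events array."""
    day = when.date().isoformat()
    doc = days.setdefault(day, {"_id": day, "events": []})
    doc["events"].append({"timestamp": when.isoformat(), "weight_kg": weight_kg})
    return doc
```

With `limit=3`, adding seven followers to one user leaves three in the main document and spills the rest into `<user>_1` and `<user>_2`. Ordinary users never trigger the extra reads because `has_extras` is never set on their documents, which is the point of treating the large accounts as outliers.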
Info
Channel: Joe Karlsson
Views: 77,268
Rating: 4.9440069 out of 5
Keywords: programming, software development, web development, JavaScript, Introduction, Code, Learn to Code
Id: leNCfU5SYR8
Length: 50min 39sec (3039 seconds)
Published: Thu Apr 16 2020