What Every Developer Should Know About Databases

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
just like knowing some Basics about how operating system works or how memory or CPU work knowing some Basics about database internals will especially a backend developer will make you a more complete developer or engineer and I believe it will boost your confidence as well so let's see it hello guys my name is inel I'm a developer working on kubernetes database operators as perona on this channel we talk about goang kubernetes databases Vim I don't know General programming stuff so if you like that maybe consider subscribing an application developer usually thinks about a database at its fundamental level which is that a database needs to do two basic things when you give it some data it will store it like safely and when you ask for that data it will give it back but we need to be aware at least to some extent uh that there is much more to it and choosing a database right or wrong can potentially have long-term consequences and if by chance the database you chose is not a good fit for your application performance reasons consistency issues operational challenges cost it is not a trivial task to change it and even though you might don't need to go deep into the subject you at least need to choose a database for your particular application and depending on the type of the workload you expect you should be familiar for example that their databases optimized for like transactional workloads and for analytical workloads and if you miss that for example you will probably have some issues if your application gets slightly serious and at some point when your database needs some attention and every slightly bigger application will need that at some point for sure you need to be aware of like what is going on and knowing some Basics about how a database work will definitely help you out and still if you like use a managed database for example I believe having this knowledge is still valuable right because you know the internals you know what is going on and you know if there is some like performance ISS issues you are aware of what potentially could be wrong okay let's talk about it the first thing that I want to distinguish here before moving on is the basic terminology So when you say MySQL or pogress for example those are not actually databases but rather they are database Management Systems a database is simply a collection of data of information that is stored and can be retrieved that is more or less it but the database management system is everything around the database so that is is a query language so the way you interact with your data I don't know security replication transactions and everything else but generally we use those terms interchangeably so when you say a database you probably mean a database management system and frankly it is easier to say database rather than repeating database management system all over again so when I say a database uh in the video I mean database management system so when it comes to a database architecture there is not like a blueprint for a database management system design okay uh every database is built slightly different and the boundaries uh between different components and subsystems are blured depending on the type of the database uh different consistency and uh performance considerations different architectural decisions and so on so the architecture that I will present here is like what is generally common in these kind of systems so let's go so so when a request comes to a database it will be handled by a transport subsystem the requests will usually come in form of a query and they will be expressed by some query language right and they will land in client communication component and like we see here uh the transport subsystem is also responsible for for handling uh intercluster communication that is communication between different database nodes in a database cluster okay after the request is received it will be passed to quy processor so something like this okay the query parser will pass interpret and validate the query here usually access control and permission uh checks are performed as well after that the query will go through uh query Optimizer it will first remove all of the redundant uh parts of the query and it will try to find most efficient way to execute it based on internal statistics that that it calculates the query is usually presented in a form of a execution plan or a query plan and one query can be executed with like multiple execution plans so query Optimizer will choose the best available we can also see uh the the execution plan that is that is chosen by the by the optimizer with like so for example if you have a query select I don't know star from users where ID equals 10 okay we can prepare this with explain keyword and this will give us uh a query plan that is selected by the query Optimizer okay mongod DB has also something similar it has explained command I think also usually here we can find like a a query cache something like this it will basically cach all of the select query results and if it finds it it will return it immediately and thus skipping everything else okay the execution plan is then executed bya execution engine so something like this the execution engine will basically aggregate the results from uh remote execution and local execution remote execution involves reading and writing data from a remote uh different database node inside the database cluster and this is where this cluster commun communication uh component like like fits in so we have something like this okay and local queries are executed by the storage engine okay so we have something like this cool so when it comes to storage engine we can like say that this is where the magic happens this is the part of the database that is uh responsible for managing how the data is stored and accessed both in memory and on disk this is a core component and it is basically what makes a database fit to a certain category okay like storage engine will organize the data that it stores in like uh relations or tables like uh columns and rows or it can organize it in terms of uh flexible documents and because of that we basically have like relational databases and document databases so relational databases like MySQL pogress Oracle right and document databases like mongodb Cassandra and so on the storage engine can also store the data in a row oriented fashion where all of the rows are stored uh are stored together on the disk or it can store it like in a column oriented fashion where all of The Columns of a table are stored together on the disk and from this we have like columnar databases and row oriented databases columnar databases like click housee for example and row oriented databases we can say these are like traditional databases MySQL Oracle pogress and so on columnar databases are usually used for like analytical workloads while row oriented databases are usually used for transactional workloads and besides this uh storage engine can can store can store the data in memory or you can store it on the disk so for that reason we have like in memory databases and disk based databases in memory databases like redis hazelcast and disk based databas is like basically everything else like we see here some basic components of a storage engine are transaction manager this manager schedules transactions and it will ensure that uh they cannot leave the database in a logically inconsistent State okay the next one is lock manager this manager locks on the database objects uh for a running transaction ensuring that concurrent operations do not VI violate like physical data Integrity okay access methods these manage access and organizing the data on the dis access methods include Heap files and different storage structures like B trees and so on the buffer manager even though ssds are really really fast today but still RAM memory is orders of magnitude faster so the buffer manager also known as like buffer pool or buffer pool manager is basically like in memory cache of pages that are read from the dis it is essentially like a large chunk of memory that is allocated inside of the database to store data pages that are fetched from the disk right and last but certainly not least is the recovery manager recovery manager uh maintains the Operation Log and restores the system in case of a failure this is primarily needed when there are changes that are made only in memory like here in inside the buffer pool and when a failure occur occurs before everything is flushed to the dis from the memory we need a way to recover that state right generally here uh we can find a pendon data structure that is that is always persistent on dis and it is called uh right a head log so basically when whenever operation comes in like update or delete it is first stored uh to the wrer Head log and this way in case of a failure we can reconstruct all of the changes that happened before the crash what we also can say about storage engine is that some databases can have like a pluggable storage engine system in place so you can basically swap storage engines so for for example uh MySQL and mongod DB can do that so for mongod DB you can choose two storage engines like in memory one that will store everything only uh to memory and the other one is dis based it is called uh wire tiger and it is the default one and in my SQL you can choose bunch of different storage engines and two most commonly used are Ino DB storage engine that is a default one and Myro storage engine which was first developed at at at Facebook but for example pogress comes with only one storage engine and that's it you cannot change and the last thing about storage engines that I want to uh just briefly mention so we can be complete here is that there are two families of storage engines they are page oriented storage engines and log structured storage engines and this is a huge topic on its own but uh essentially it is basically uh two different ways of how the data is physically stored on the disc and both of them have their like pros and cons uh log structur are generally preferred for like right intensive workloads while page oriented uh storage engines are like more balanced between rights and reads so okay guys this was like some primary components and subsystems that like form a database management system more or less each of these components is like a topic on its own but I hope I managed to give you like high level overview of what is going on so we can at least appreciate a bit more what it takes to store our data like durably and correctly okay guys thank you very much for watching I really hope this was uh useful for you if it was maybe give me a thumbs up and consider subscribing so you don't miss content like this and until the next video take care and bye-bye
Info
Channel: Inel Pandzic
Views: 2,478
Rating: undefined out of 5
Keywords:
Id: sK8WfBVzNs0
Channel Id: undefined
Length: 12min 48sec (768 seconds)
Published: Mon Mar 04 2024
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.