Introduction to NoSQL Databases

[Music] And we are live. Hello everyone, welcome to today's workshop, Introduction to NoSQL Databases. Hi Rags, welcome, how are you doing? Hello, good morning, good afternoon, or good evening, wherever you are. Thanks for attending the workshop and taking a few hours of your time to listen to us.

Let us do a very quick sound check. One second: can you hear us? Please give us an okay. Can you hear us well, so I can switch to the presentation? Yes? Great, very good, thank you so much.

Okay, so welcome to today's workshop, Introduction to NoSQL Databases. This is going to be a two-hour workshop with some theory and some hands-on that you will do with us, and there will also be a quiz and games, so be prepared, because you can end up winning some prizes.

So let's start. Who are we? I'm Stefano, I'm a developer advocate at DataStax, I love Cassandra, and these are my contacts, so feel free to get in touch with me. And Rags, what about you?

Hi, my name is Raghavan Srinivas, but for whatever reason people have a tough time pronouncing my name, so I just go by Rags. I used to wear this hat, and I'm wondering if I should wear it during the webinar as well, but I'll probably take it off right now. I've done a lot of evangelism in my early days and I'm looking forward to being here.

Did you go mute, Rags? No, I'm good. Okay, I think I lost your signal for a little while, but now it should be all right.

Okay, very good. Today you are seeing the two of us on the screen, but there is a whole team behind us helping us, and some of them you have probably seen in the chat as well. I've seen Cedric and Ryan at least [unclear]. But yeah, we are a whole team, so let's
move on.

Okay, before we start, a few useful pieces of information. Today the workshop is here on YouTube, where you are probably all watching us now, but there is also a Twitch channel. It is our fallback channel in case YouTube has some trouble, but we are not looking at the chat on Twitch, so please stay on YouTube if you want to talk with us.

And yes, our team is missing a Brazilian; we will try to extend it to cover Brazil as well. Hola! You know what, Stefano, next time around I will wear my Brazil t-shirt, so that I will be faking a Brazilian. I mean, I just faked an Australian, so I can think about it, right? You have to learn the accent, though. That is true.

Okay, as we were saying, we are now on YouTube and there is a chat next to the video, so we can interact there. But don't forget that the YouTube chat disappears after the video is over. So, besides liking and following us on YouTube (we are close to hitting the 20k follower mark, so please help us), you are encouraged to connect with us on Discord as well, where the discussion can go on even after the live event is over.

Another thing that we are going to use today is menti.com, which you see here below. It's a platform that we use for quizzes and games. Later you will be given instructions to connect to Menti, for example on your mobile, and that is where you will play an interactive quiz with us.

That's more or less it. There is another very important part before we start: today, besides some theory, you are going to do some hands-on exercises with us, and to do that there is nothing to install. Everything that we do today is going to be done in your browser. Easy. It will be illustrated by
us, and you will do it in your browser. There is a further part of the exercises that is left as an exercise for you, and this does require something to install on your machine, but it's all explained in the GitHub repository. This is very important, so let me drop the link in the chat: there is a very well done GitHub repository that contains the slides for this presentation and the instructions to do all the exercises by yourself, in a self-service way. If today you just want to follow along, no problem: take your time and go later to this repository that I just posted in the YouTube chat. I'm also pasting it in the Discord chat, and now that I think of it, let me also write in YouTube the link to connect with us on Discord. Sorry for the complication, but that's important: join us on Discord.

Okay, so this repository contains all the instructions for you to do the exercises by yourself, at your own rhythm and in your own time.

Another thing to say is that we are going to use a database for our exercises: a single database offered as a service in the cloud. That is Astra DB, which is built on Apache Cassandra. Thanks to the magic of an API gateway called Stargate, this database will appear as several different kinds of NoSQL databases, but more on that later.

Okay, let's move on. Another important thing is that we leave homework. Mostly you repeat the same steps that we are doing live with you in the hands-on. The instructions are in the GitHub repository. After you complete the homework, you send us the screenshots of your completed homework and you will be awarded a verifiable badge that you can use to brag on LinkedIn and whatnot. Okay, so the instructions for the homework are in the GitHub
repository.

Okay, now let's see. Rags, are there some questions about this part, just so everybody is aligned? Yes, I think one question was: are we going to go high level and then dig into Cassandra, or do we start with Cassandra? The answer really is that we will use Cassandra as an example, but what we cover is applicable across a number of different NoSQL databases.

Yeah, good point. Most of what we do today is a general introduction to the landscape of NoSQL databases. Then again, we will be using Cassandra for the hands-on exercises, but the concepts are general. We also have other workshops more focused on Cassandra, so please connect with us. You are connected to us on YouTube, but you can also follow us in other ways: let me drop a link to a page where we announce every one of our workshops, because we do several workshops per week, so that might be of interest to you.

Another point: somebody asks, is this session recorded? This session is live at this moment (hi, Gabriel Gonzalez), but it will be recorded and you will be able to watch it anytime at the same YouTube link. Very easy.

Okay, what is the Menti code? One second. These are the slides you're seeing, and Menti is the service we use for quizzes. Before we do the real quiz, let me go to Menti, which answers the question. This is the platform we will use for the quiz, but also for a few introductory questions, a bit of a survey to get to know our audience better. So please go to menti.com and enter this code: 6861-3818. Use your phone, or open it in a new tab. You will soon see how it works: we will be asking you questions. Let me paste the code in the chat as well. One second... here, Menti code. So go to menti.com and enter the code. Let me quickly check. Okay, somebody's already jumping on board.
Yeah, suffice to say that speed does not matter right now. But you may want to start building your muscles to go faster, because at the end we're going to have a quiz, and there timing absolutely matters. I've never been able to get even into the top 10, because I'm just not good at it. Yeah, because you guys are too fast for us.

So we give you a few more minutes to join. In the meantime, I see there is a question: what is Cassandra? Well, Cassandra is one of the many (or few) NoSQL databases, and it happens to be the database on which Astra DB is built. Astra DB is the Cassandra-as-a-service platform that we will use today for the exercises in the browser. So, short answer: Cassandra is a NoSQL database.

Okay, let's see, there are a few people logged in now, so let's start. We have a wide swath: Bolivia, Colombia, Costa Rica, Brazil, obviously India. Anybody from the U.S.? Keep going. Argentina, wow. So hello everyone from all over the world.

There is a question about data modeling in Cassandra and in NoSQL databases. It's a bit premature to address that question now, but we hope that today you will get some understanding of why and how this is not the same as in a relational database.

So let's start with a few questions to get to know you better. Are you using NoSQL databases today? The options include yes, a document-oriented DB; no; maybe. Let's see. Many answer "no, sorry". Well, there's no need to be sorry, because you are here to get to know NoSQL databases. And it's nice to see that somebody is also using a document DB, a graph DB, which is cool, a column DB, multiple technologies. "Maybe, I have no idea": well, we hope that by the end of today you will have a better idea. But it's nice to see that some of you are [Music] coming here to
learn more even though you are already using a NoSQL database. So let's move on to the next question.

One thing, as you get used to Menti: don't pay attention to the YouTube chat while the Menti quiz is on. Just focus on the device you are answering from, because there is a slight lag, and that might be enough to keep you off the leaderboard. Yeah, that's a very good point, and it will be important by the end of the workshop, when we will have the real quiz with prizes. This one is just to get to know each other.

So: NoSQL databases are schema-less. Let's see what your answers are. Some say it depends, some say NoSQL databases are schema-less, and some say they are not. Rags, what do you think? It's almost an even split. Yeah, interesting, right? I don't know; I would like to think they're schema-less, or maybe they're schema-flexible. Maybe you should put in an "other" option and let people write whatever they want. You know what, let's find out today: let's see how many of you learn the answer to this question by the end of today, and let's move on.

There's a question: is that graph being created live from the chat data? Yes, Ganesha, it is. There may be a slight lag, but yes. You mean on these slides? This is what menti.com does for us; this is the service Menti offers.

So, how many NoSQL databases exist? Somebody says one, somebody says less than ten, hundreds, or dozens. Well, at least one, because I mentioned Cassandra, but we are not giving the answers yet. Most people think it's in the dozens. Yeah, it's reasonable, right? Well, there is more than one, this we can already anticipate; otherwise we couldn't show you several types of NoSQL databases.

Okay, so now this small survey is over. We ask you to not close
Menti, because it will become useful again by the end of the workshop, and it will be your chance to win some prizes. So let's go back to the slides.

I already gave you the link to the GitHub repository; let me give it again in the YouTube chat and the Discord chat, because it's very important. You might want to open the link and keep it open in a second tab to follow along better.

The link above is going to be very important in a short while, and I'm going to paste it here and there as well: astra dev slash 819. It's the link you are going to use to create your own database in the cloud, in Astra DB. Now let me immediately say one thing: this is free, and it's going to stay there for you to play and experiment with. It stays free up to a very generous amount of usage per month, and you get a free credit every month to continue playing with it. And when I say playing, I mean even more than that: you can also run a small production workload and still be within the free tier.

Stefano, one important thing you forgot. I know that as a developer you get a lot of free offers, but they require you to put in a credit card, and that's painful, right? Because it auto-renews after a year or something. So nothing like that here: no credit card required. And like Stefano was saying, you can run some production loads. I have a credit of 25 dollars and it's showing zero dollars spent, which is fine by me. So you have spent zero of your 25 dollars for this month? Exactly.

So the point is that it's a free offer, and this is what we are going to use to play with NoSQL databases today. Okay, so the link is there. Let us move
forward and very quickly look at what we are going to see today. First we will see some general ideas, and the why and the how of the NoSQL philosophy, the NoSQL revolution in the world of databases. Then we will see some of the types of NoSQL databases: in particular tabular databases, document databases, key-value databases, and also graph databases. By the way, this list is by no means exhaustive. There are a few other types of NoSQL databases, but the landscape is so large and so heterogeneous that we cannot cover everything, also because we want to save some time at the end for the games and to give you a chance to win some swag.

That being said, Rags, if it's okay with you, I'll give the link again and hand over to you to guide our friends here in the creation of their own database in Astra. Let me show your screen here. Perfect, there you go.

Okay. One of the nice things about today's workshop is that you don't have to install anything; I think Stefano alluded to this. And we have not crammed too much into it, so even if you feel like you're falling behind, you're really not. You can always start from scratch if you're really stuck, and feel free to reach us on the chat.

What I will do here is walk you through some of the steps. My suggestion is to watch for a little bit and not do anything at the keyboard: that will minimize the distractions for you, and I won't need to be talking over you while you are actually doing it. Basically, you take the instructions from the GitHub repository that Stefano pointed to. I'm using the same GitHub repository, except that I'm
using an editor here called MacDown. Essentially, you just follow along.

The first thing is to create the Astra DB instance. What is Astra DB? Astra DB is really a database as a service, as was mentioned in some of the chats as well. The idea is that we will worry about where it's going to be instantiated, how to deal with the mundane operations, and so on. From your perspective, all that you need to do to use Astra DB is to go to the Astra DB link that's there. If you go to astra.datastax.com, it will bring you to this page. I'm already logged in, but you may need to sign up with a GitHub account or an email, whatever you like. Like I said before, no credit card required. So you say "start free now", you register on Astra DB, and you go from there.

Now I'm going to create a database. There are always some questions about whether names are case sensitive or not, and things like that. My suggestion is to use the same case as in the instructions we have provided; it's just easier that way. You can make a name case sensitive by putting it in double quotes, but let's not worry too much about that right now.

So I put in a database name and a keyspace name, nothing fancy. A keyspace is really a container for the logical grouping of your different tables; as an analogy, you can think of it as something like a directory. And then the nice part about this is that I can, I
can really use any of the clouds that you want, or I want. In this case I'm going to pick Azure, I'm going to go to North America, because that's where I am, and I'm going to pick South Central, because that's the closest. And then you say "create database". As simple as that.

You'll see here that this particular database is coming up as pending, as opposed to some of the other ones, which are already active. So I won't be able to connect to it right now. Maybe this is a good point to pause, see if there are any questions, and let you create yours, because this takes a little bit of time. Not a whole lot, but once you have this ready we can jump into the next part of the workshop.

Yeah, there are a few questions, if I may interject. One question is: can we create one database or multiple? Well, within Astra DB you can create up to five databases in the free tier, as far as I'm aware. So multiple, and they will be completely separate from each other. Another question was about the name of the database and the keyspace: I think you showed that, and I also posted it in the YouTube chat again. And there was another question: can we select any cloud provider? Yes, you can choose any cloud provider, and then any region. Probably you want to choose a region close to you, but that's just because of latency; go ahead and choose whatever you prefer. Google Cloud as well, yes.

The point is that behind the scenes there is a lot of cloud stuff going on, but you do not have to care, because one of the advantages of a managed database in the cloud is that you do not have to worry about maintenance, operations, setting up, configuring. You just click a few times and you have your database ready to use, which is what makes this
workshop two hours instead of one day and a half, probably.

Is there an option to create a database in our own cloud account? Of course you can, but then you'll have to worry about your own maintenance, and that's not really Astra DB per se. You can create your own Cassandra database, and there are different flavors in which you can do that.

Is it mandatory to choose the exact location where we are? No, you can choose any region. If you want to optimize latency for your own usage, you might want to keep the database close to your region, but that's not a requirement.

There is a question about MongoDB versus Cassandra: any big difference? Yes, there are big differences, and some of them will be seen later when we speak about tabular versus document databases.

Oh, hi xr, welcome. Are you celebrating yesterday's victory in the lottery? I hope so. It's nice to see you again. Let us move forward.

One of the things that Pratyush mentioned is that Azure is not available in India. I'm trying to guess why that might be; obviously there are some business agreements or something, so I'm not quite sure, but pick something that works.

There is also an important question: can we have a setup on our local Ubuntu as well? Yes. That's true for Cassandra, and for almost all databases, I would say. You can install it locally on one machine or, more likely, on several machines in your own home, for example. But again, that would require you to take care of installation, operations, configuration, and so on, and this is not what we are doing today. But yes, databases are usually something you can install on your own. That depends; I don't want to go into the deep end. I myself am a developer and I like to go very
deep. There are different flavors, literally: you can install it on a virtual machine, you can install it in a container, or you can install it in Kubernetes, which is what we call K8ssandra. So there are multiple flavors of how to install it, but even I can say that Astra DB is probably the easiest way to get started. Yeah, that's the point.

Okay, everybody, do not worry if you're still seeing "pending" as your database status. We are going to leave it there for a while, because the Astra DB engine is spinning up a whole host of new machines and databases, and it's provisioning and configuring them for you, so it takes a few minutes. Let us leave it there and move on. If you want, you can put a thumbs up whether it is active or pending; even if it's pending, just give us a thumbs up so we know we can move on. All right, waiting, waiting... okay, the first thumbs are coming. Active as well, congratulations. I think we can start moving on.

So let us jump to [Music] a generic introduction to NoSQL databases. There we go.

NoSQL databases. Let's first ask the first question: what is a database? Very simply, a database is software in which you save data and from which you later retrieve that data. Simple, and that would be the end of it. Obviously you can go deeper than that, but usually a database does exactly that: it is a store where you can put data, change data, delete data, and then retrieve data.

In more detail, a database is usually a very complex piece of software. It is not a single monolithic thing: it is made of layers, or components, that interact with each other, and usually you can think of it as made of three separate layers. So when you speak to the database from outside,
what you hit, what you speak to, is the interface layer. That is the part of the database that listens to the outside world, receiving and sending out information. The interface follows a set of definitions and protocols. In particular there is a language, in which the instructions to and from the database are expressed. For example, those of you who are familiar with relational databases know the language called SQL, Structured Query Language. It's a very well established language that has been used basically forever to query data in databases. Then there is a transport layer, which is the actual protocol used to transmit data back and forth; for example, one usually thinks of ODBC or JDBC there. So this is the interface layer of a database.

But then what happens within the database when data comes in or is requested? There is an execution layer. The execution layer is able to understand your query, for example written in SQL, and to transform it into actual operations of the engine that go and retrieve the data in a particular way. Your query might have a certain appearance, but inside the belly of the database it is probably going to be deconstructed and changed a lot, and it undergoes many transformations, often for the sake of optimization. Which table is retrieved first? How is this table going to be inspected? How are those two pieces of data going to be combined to form the final answer to send out? Things like that. So the execution layer takes care of building the actual execution of the query inside the database, and it has to be aware of the database's internal structure.

When I say internal structure, I also refer to the storage layer. What is in the database? Data, right? And how is data stored? Usually it is stored
in files. There are some files, so: how are those files structured? Where are they? How many are there? How can I know which file to look at for a particular piece of data? How is the file structured internally, what kind of internal protocol does it follow, is it binary, and so on. So there is a whole storage layer that takes care of actually storing the data of your database in physical form, as files. So far so good.

So Stefano, if you don't mind me jumping in: one of the things I like to do is try to get a simplified view of what I'm looking at, and the way I look at this is as a distributed hash table. You can have only so many values on a single node, a single system, so you have to distribute them across multiple nodes. But the moment you start distributing, you have to worry about what happens if a particular node is unreachable, and so on. Obviously there are a number of different layers that the platform has to worry about, but from a developer perspective, all I'm looking at is something like a distributed hash table, where the platform takes care of distributing the data evenly and building in resilience, durability, and all that Stefano is going to talk about in just a second. Maybe it's oversimplified, but that's how it looks to me.

Yeah, that's a good point. Okay, let's see what comes next. From a very broad perspective, you can see databases as made of these three layers. Now, there is the notion of the traditional RDBMS, that is, relational database management systems. Relational databases are the usual good old databases that have been around for fifty-something years now. They are, let's say, a general-purpose workhorse that has been serving the IT industry well and heavily for decades, and it is a general kind of
database. What do I mean? I mean that you can ask various kinds of workloads of a database, and the kinds of use you can make of a database can be placed on a particular axis, the OLTP/OLAP axis.

Let us be more clear. What is OLTP? OLTP means online transaction processing, and it is a particular way of using a database: I might have a lot of data, but every single operation I do usually touches one single piece of data. I go there and I change that number by one; I go there and I read that number. So it's about going to a very specific point in the whole of the stored data and writing or reading in a simple way. This is useful, for example, when I have to keep track of the number of items in a warehouse, in an inventory, or something like that. Usually the requirement is that this is very fast: I need my answers immediately, I need to write fast and to read very fast. The queries are simple, and they will not change very often, because they are simple enough to be written in stone, let me say.

On the other end of the spectrum there is what is called OLAP. OLAP stands for online analytical processing, and in a sense it's the opposite, because that is where I do complex analytics on the data. Usually it's a very complex analysis that involves lots of different data that have to be compared, mixed, joined, and processed in long, complex ways. A job of the OLAP kind might take hours to complete. The queries are complex, there might be a lot of joins, and they might change a bit every time. Let's say that every night my online shop runs a very complex job that analyzes all the clicks on my website to figure out how to optimize my sales. This is usually very complicated: it involves looking at data of any possible kind, combining
them, and so on. So it's very complex stuff, and usually I do not expect it to execute in real time. This is a whole spectrum, and your database workload can be anywhere in this spectrum. My point is that the usual relational databases, those that came before NoSQL, basically, are a sort of all-purpose tool: they are not particularly specialized, and they are more or less good for more or less all kinds of workloads.

Speaking of which, I can also look at the same concept from a different perspective, with this diagram. It summarizes four different needs that I can have for my application, four different requirements that I might have of a database.

At the top, I might want my database to be able to handle a lot of data, which is what is usually called big data: I want my database to store a lot of data, maybe too much for a single machine, even a very powerful one. At the bottom, I might want my database and machine to be able to do a lot of calculations, a lot of CPU, what is called CPU-bound: in particular I need the computation. On the left side, I need what is called streaming: I look at my data not just to store it and retrieve it; an important point is to be able to put data in and have it automatically sent out, fast and reliably, to many other clients who are there to receive it. Online gaming is one example, but there are a lot of others. So this is the streaming case.

Okay, check your mic, the sound is different. Can you hear me now? Let's see... we can, yeah. Okay, I'll go on; please write if you have some trouble, but I'll assume it's okay now. Great, thank you.

Where were we? The right side. The right side is throughput: I might need my database to be able to handle a great many requests per second.
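
To make the OLTP and OLAP ends of the spectrum described above a bit more concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not from the workshop: the table names, the sample rows, and the helpers `build_demo_db`, `sell_one`, and `revenue_per_click` are all hypothetical, and SQLite merely stands in for a general-purpose relational database. The first helper changes one number on one row, the warehouse-inventory case; the second scans, aggregates, and joins two tables, a tiny version of the nightly click-analysis job.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
        CREATE TABLE clicks (item TEXT NOT NULL);
        CREATE TABLE sales (item TEXT NOT NULL, amount REAL NOT NULL);
        """
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                     [("mug", 10), ("shirt", 5)])
    conn.executemany("INSERT INTO clicks VALUES (?)",
                     [("mug",), ("mug",), ("shirt",)])
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("mug", 12.0), ("mug", 12.0), ("shirt", 20.0)])
    conn.commit()
    return conn


def sell_one(conn, item):
    """OLTP-style: go to one row by its key, change the number by one, read it back."""
    with conn:  # commits the single-row update as one small transaction
        conn.execute(
            "UPDATE inventory SET quantity = quantity - 1 WHERE item = ?", (item,)
        )
    row = conn.execute(
        "SELECT quantity FROM inventory WHERE item = ?", (item,)
    ).fetchone()
    return row[0]


def revenue_per_click(conn):
    """OLAP-style: scan whole tables, aggregate them, and join the results."""
    return conn.execute(
        """
        SELECT c.item, c.n_clicks, s.revenue, s.revenue / c.n_clicks
        FROM (SELECT item, COUNT(*) AS n_clicks FROM clicks GROUP BY item) AS c
        JOIN (SELECT item, SUM(amount) AS revenue FROM sales GROUP BY item) AS s
          ON s.item = c.item
        ORDER BY c.item
        """
    ).fetchall()
```

On a real system the difference is one of scale: the first kind of query has to answer immediately and is repeated constantly, while the second may read most of the stored data and can take a long time to finish.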
there are many applications that are writing into my database like crazy, and I want my database to be able to absorb all of those writes; or I want my database to be able to be read by many, many clients simultaneously. This requirement is very high throughput. So again, the point is that relational databases are somehow sitting in the middle: they are a bit good for all of the tasks, but they are not specialized in any way.

Now, at some point something happened. Relational databases were just not good enough, so to speak, for many of the emerging workloads, especially because they were generally made to run on a single machine. Maybe a big machine, but still there is a limit to how big a machine can be. So there was the emerging need to scale out: to have a system able to live on a family of computers, a family of servers, and share the workload, to be able to reach high performance in any one of those four directions. And this is where the NoSQL paradigm emerged. In the last 15 years or so, thanks to web giants such as Google, Facebook or LinkedIn, a new way of looking at databases emerged, which is called NoSQL. Now let me say one thing: NoSQL is not "not SQL". It stands for "not only SQL", which means it is just a way to go beyond the paradigm of relational databases. NoSQL databases were aimed at solving in particular the problem of very big data and very high throughput, and this has been needed by the evolving kinds of tasks that were appearing in the IT world, in the technology world: big data and all that.

What do I mean by "all that"? One speaks of the three Vs. Volume, which means big data. Velocity, which means I want a very fast response even though the data is very big and the system is very large. And variety. Variety might not be apparent from this graph, but it is also very important, because today's data is very often
unstructured data; it is heterogeneous data. You might have to collect data from several sources, and it is not necessarily the case that this data can be shoehorned into a single format, a single schema. So sometimes those databases have to be very flexible to accommodate this data. There is the concept of a data lake, right? You just collect data of any kind from various sources and you keep it there. How do you keep it there? Well, it's heterogeneous, so you need variety as well.

Okay, so before we go on, there is a very important point to make, which is maybe a bit of a downer, but that's how life is. Usually NoSQL databases are distributed, because with big data and all that, a single machine is not enough to host all of the data I want to have there. So I have a distributed system, but when I have a distributed system there is something called the CAP theorem. It's also called Brewer's theorem, after the name of the engineer who proposed it, and it says a very sad but unavoidable thing. In an ideal world I would want three different requirements for my distributed system. One is consistency: I want to be able to read data in a consistent way, even if I read it multiple times, even if there are concurrent writes. I want to always get the proper answer back, so to speak; I want to read the result that I just wrote to the database. The second requirement is availability: my database should always be able to answer me. I query for some data, and I never want my database to tell me "oh wait, I can't right now, because I'm doing something else", because I want a system that is available. But then there is a third very important point, which is partition tolerance. You see, a distributed system relies on a network, and the network will fail sooner or later: a cable will explode, a driver will fail. So I want my system to be able to work even in the case of network failures, of
communication troubles. Now, the theorem says that if you design a distributed system, you will never be able to have all three together all the time, so you have to sacrifice one of them. You can have a CA system, you can have an AP system, and you can have a CP system, but you will not design a CAP system. And that's not just because of a technical limitation; it's a mathematical impossibility. This is a very important point, because, you see, if tomorrow you go and start inventing your own database, the first question you have to ask yourself is: which of the three do I want to sacrifice for the other two? And that depends, because there might be different use cases.

It might be worth mentioning that these are really critical when it comes to failure scenarios, right? When everything is good, everything's good. Yeah, very good point.

So, let's see. A relational database was not even built to be distributed over several machines, but now we have relational databases with replicas, with replica sets and all that. Those have a master-replica architecture, and in a sense they are CA, because they are not able to cope with network failure: parts of the system will just stop functioning if they lose connectivity. That's one example. There are systems that have a very tight relation between the nodes, and they just refuse to work unless the data is completely exchanged between all nodes; that's a CP system. ZooKeeper and MongoDB are examples of CP systems. There are also AP systems, which are always available and withstand partitions, but they might occasionally give different answers for a small amount of time, before they have a chance to exchange data properly. And there are databases like that. But the point is that no choice is better than the other. Well, maybe P is
absolutely important, but between AP and CP there is no one better than the other; it's just that there might be different use cases. Okay, that being said, let's move on.

So now to the question: how many NoSQL databases are there? Well, the answer is: many. This is a survey conducted a few years ago, and the result is this map, which looks a bit like an underground map, but every station in this chart is actually a different database, and the part that is not white is all NoSQL, just to give you an idea. So there are many. Now let's go ahead.

Okay, we were looking at the relational database, and it is summarized here on the left: an interface layer, which is usually, for example, SQL and ODBC; an execution layer, where it analyzes the query; and then the storage layer. And that's about it for relational databases. Especially for the interface, they all share the same interface to a large extent. But in the world of NoSQL databases, creativity led, in a sense, to a lot of different ways of interpreting the idea of NoSQL. Indeed, there is not one NoSQL database; there is not even a single type of NoSQL database. You can see six main types here: ledger databases, which have the property of never forgetting their history; time series databases, which are optimized for time-ordered events or records; tabular databases; document databases; graph databases; and key-value databases. You can see some examples of those here. So there are two different kinds of differences between NoSQL databases. One is in their architecture: these are the six types here below, and there are even more than that. Then there is also a difference in the layers, because in the relational world there was this very well established standard of SQL, but in the NoSQL world, again, creativity reigns, so we have interfaces based on SQL or JSON or CQL or other languages. Also the
query parser layer and the storage layer are very different from database to database. So it is really a whole universe to explore, and we will try to look at some of it, but the main idea is that they go beyond the relational paradigm in one way or another.

Okay, so let's see four of the main NoSQL database types. We have column-oriented databases, which I will call tabular from now on; they are based on tables and superficially resemble relational databases, but the resemblance more or less stops there, at the concept of a table. Then we have document databases, where you can store documents, that is, structures with a schema that differs from one document to the other. We have key-value databases, which are the simplest and fastest solutions. And we have graph databases, where relations between items are a first-class citizen.

Okay, so before we start having a look at tabular databases: at this point you have probably already seen your database becoming active. Nice. What you created is a database in Astra DB. Astra DB, as we said, is based on Cassandra, but runs it in the cloud for you, so no operations are needed. And let me tell you that Cassandra, as well as Astra DB, is fundamentally a column database, so a tabular database. But thanks to the magic of an API layer, a data layer called Stargate, this single database is presented to you in various ways, so you can access it in various ways. In particular, it also offers an interface that behaves as if you had a document database. So in a sense, after having created this single database on Astra, you will be able to play with different types of database today. And that's very handy, because it makes for much simpler application development, by the way.

Okay, so let's see what a tabular database is. A tabular database is based on tables; it's a database that, in a sort of traditional
perspective, keeps the concept of tables from relational databases. So a table is rows and columns, all clear. It has a schema: when you create a table you define the columns that are there, and for each column you also define the data type, so this one is a string, this one is an integer, and so on. The difference from a traditional relational database is that data are distributed over the various machines that make up your database. If you look here on the right, there are seven orange circles; these are seven servers, and you see these rows scattered across the servers. They form a single table, a table with three columns: country, city, population. The point is that country is defined as the partition key. Well, the name can change between databases, but let's call it partition key. It is the particular column in the table that tells the database how to group and distribute the data across the machines. Based on the value of this key, the rows are grouped and distributed across the nodes. You see, this topmost node here contains two rows with USA; on the left we have two rows with Germany; and so on. So it's a very clever way to handle amounts of data that would not fit on a single machine. Still, from outside I get the appearance of a table.

But then there is an important point: I want my database to be fast. So if I want to get data from all countries, that's not a good idea, because I would have to speak to a lot of different servers, so latency increases, and that's not good. But if my application always needs to access just a single country at a time, that's a very clever choice, because when I'm about to run my query I know beforehand which node to speak to. So I go directly to that node and I get my answer, and everything is very fast, yet I can afford huge amounts of data. That's more or less the idea. Another important point is that this kind of architecture does not play
well with joins, because in general you would have to speak to all nodes again, or the execution engine would have to do it for you. So in a tabular NoSQL database, joins usually just do not exist, or anyway they are to be avoided. You might be curious what the trick is. Well, the trick is that when you design your data structure, instead of joins you embrace the idea of denormalization. That means you duplicate the data, a lot if you want, just for the sake of having fast queries. Yeah, that's a change of mindset from relational databases, but it works.

Okay, I see a question: do we keep multiple copies per key to keep it partition tolerant? Well, partition tolerance is something you have anyway, because partition tolerance means that if the network is broken, the database still works. And usually in such a setup it still works: whatever node you query, you will get some answer, to some extent. Then again, what is not shown here in the graph is that you usually have a partial replication of the data, so it's not that every piece of data sits on a single machine only. Up to some extent partition tolerance works because you are usually hitting at least one of the nodes with the data you need. Then, okay, this is up to designing the right architecture for your cluster. But I'm afraid we have to cut this short. Rags, have you seen some other interesting questions, maybe?

Yeah, one question was, let me see, I don't know if you talked about it, I just saw it a moment ago, but basically: what kind of database does it use underneath, and so on. I think this is from Manupriya Gird: what is the underlying data schema or data structure it uses? Basically, and we are going really deep into this, it uses something called SSTables, and it uses a hashing algorithm,
consistent hashing, and what I can do is put a pointer to that if you're interested. But I think it's a fairly detailed discussion of how it's maintained. It's not something that's standard, but it's made to be more performant, because the main reason for Cassandra's existence is performance.

Yeah. So Rags properly mentioned Cassandra. Cassandra is one of the best examples of this kind of database, but it's not the only one. Yet today we are using Cassandra under the hood, so I invite anyone curious to look for our workshops, the introductions to Apache Cassandra, which go into much more detail on the particular architecture of this kind of database.

Oh, there is a quick question: could more than two countries exist on the same machine? Yes. You see here that Australia and India are together on a single node. Every country here defines a partition, because this is the name of these groups, and you can have many partitions on a single machine. The opposite is the problem, and it never happens: you never have one partition split across different nodes. So one thing is allowed and is not a problem, because usually you have many more partitions than nodes, and that's okay; the opposite will never happen.

Okay, with that, I think it's time for Rags to give you... okay, now there is still something that I have to go through a bit fast: use cases for these databases. Well, a tabular database is very scalable. It can scale out a lot; it can have hundreds, thousands of nodes sharing the same database. So it can support high throughput and very big data, with very heavy write and read performance, and this lends itself to a lot of very concrete use cases. That's one point. The other point is availability, because once the data is distributed in this way and also replicated, you can have
very high availability. Basically, you can configure such a database so that it has zero downtime, and that's good for mission-critical applications: caching, pricing, market data, every case where you need your application to never fail can be dealt with by this kind of database. Then there is the distributed point, because these databases are designed to be distributed. There's a point to be made about compliance in this case, because you can have a single database spread across the world and still say, okay, data from my European customers has to stay in Europe. You can do so, and so you can comply with privacy regulations. On the other hand, you can also design your database to have data close to any client, so you can replicate the data geographically all over the world and have low latencies everywhere. The last point is cloud native, because these databases are usually designed to live in the cloud. Rags mentioned K8ssandra, for example, which is a very nice initiative to have Cassandra run on Kubernetes very easily. So microservice applications are a good match for these databases. Not only that, but you can also have multi-cloud and hybrid cloud. By the way, you just created a database in the cloud and you didn't even have to worry about which cloud provider you were using: you just chose from a drop-down and clicked create. That's because of the very versatile, multi-cloud nature of these databases, so you can also have a single database living on multiple cloud providers and also on-premise. So that's more or less it.

Now, with that, Rags, what if we jump into a bit of hands-on with tabular databases? Yeah, sure. I'm switching to your screen. Okay, now, yeah, there you go.

Okay, so again, the idea is to keep your browser and the GitHub link handy, so all that you're going to do is, you're going
to cut and paste from here. Okay. So remember this database that we created earlier; hopefully that's all active right now. What we're going to do is look at some of the ways to connect to the database. We'll do what is called CQL, then we'll look at GraphQL, and we'll also connect to what is referred to as the Swagger UI, which is really REST-based. I think Stefano was talking about multiple ways of doing this, so we'll try to look at some of those at least.

The easiest way to at least get started is using the CQL console. If you come here, you're basically dropped into the CQL console, and you are already connected to the particular database. So all that I'm going to do is describe the keyspaces here. Again, all that I'm doing is taking stuff from one place and entering it here. So I describe the keyspaces. There are a number of system keyspaces, which we don't need to worry about, but this is the keyspace we created, remember, nosql1. What I'm going to do now is use this particular keyspace. A lot of times, if you don't put the semicolon at the end, it's going to be hanging; what you can do is just put a semicolon, and it will complete it. It gives you that flexibility, and you'll get used to it. If you have used SQL before, it's very similar as well. Again, the easiest way to do this is probably just to cut and paste.

So now what I'm going to do is create a table. It's always a good idea to say "if not exists", because you don't want to overwrite or get into odd situations; always defensive programming, as they call it. And if it does
not exist, I have a number of different attributes for the table, but video ID is the primary key. So if I want to do a quick access, the easiest way is to just specify the video ID, and you will see that in a second. Okay, so I've created the table. This doesn't mean that there is any data in there; all that I'm doing right now is essentially creating the schema, if you will. If I describe the keyspace, it has this particular table called nosql1.videos, and it has a whole bunch of properties. We just left them at the defaults, but if you want to go through them, you can do that as well.

Okay, so what I'm going to do here is insert a few entries into videos. I can either do one at a time, or I can do all of them at the same time, whichever suits your fancy, floats your boat, as they say. So I have one entry; now I have probably three entries right here. In general it is not a good idea to select everything. This is typically what you would do in a SQL kind of query, but in a key-value kind of database it's probably not a good idea to just say SELECT *. You probably want to specify the key and then ask for the value; that way it knows exactly where to go to retrieve it. Partitions, we've been talking about that; we don't need to worry about it. In this particular case it's probably okay to do a SELECT * from videos, because there are really only three entries. But ideally it is better if your query, or most of your queries, is something along this line, where I'm specifying a video ID.
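
The point about specifying the key instead of selecting everything can be sketched in a few lines of plain Python. This is only a toy model, not CQL and not how Cassandra is actually implemented: the `Cluster` class, the seven-node count, the MD5-based placement rule, the `video_id` field name and the sample titles are all assumptions made up for illustration. It shows only that a lookup by key contacts one node, while the equivalent of SELECT * has to contact every node.

```python
import hashlib


class Cluster:
    """Toy model of one table spread over several nodes (illustrative only)."""

    def __init__(self, node_count):
        # Each node keeps its own dict of rows, indexed by the key.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hypothetical placement rule: hash the key, take it modulo the node count.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def insert(self, key, row):
        self.nodes[self._node_for(key)][key] = row

    def get(self, key):
        """Lookup by key: returns (row, number_of_nodes_contacted)."""
        node = self.nodes[self._node_for(key)]
        return node.get(key), 1

    def scan(self):
        """Equivalent of SELECT *: returns (rows, number_of_nodes_contacted)."""
        rows = []
        for node in self.nodes:
            rows.extend(node.values())
        return rows, len(self.nodes)


cluster = Cluster(node_count=7)
# Made-up sample rows, standing in for the three entries inserted in the demo.
for video_id, title in [
    ("v1", "Intro to NoSQL"),
    ("v2", "Tabular databases"),
    ("v3", "Document databases"),
]:
    cluster.insert(video_id, {"video_id": video_id, "title": title})

row, nodes_for_lookup = cluster.get("v2")
all_rows, nodes_for_scan = cluster.scan()
print(row["title"], nodes_for_lookup)  # Tabular databases 1
print(len(all_rows), nodes_for_scan)   # 3 7
```

With three rows the difference does not matter, which is why a full select is fine in the demo; with very large tables spread over many nodes, the single-node lookup is the access pattern to design for.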
Right? So again, you can try with different IDs and see what happens. So it will give you a syntax mismatch, and so on. So, what am I missing there? Video ID, okay, that's fine. SELECT * from videos. You can try this out if you want; you can just do a HELP and it gives you information on this, and you can go to this particular documentation if you're interested in more of that.

But what I'm going to do now is create a new table, and you can see here that the table is users by city. As you can guess, what I'm trying to do here is use a partition key starting with city. So I'm going to create a table, and I'm doing a whole bunch of stuff in there. We'll walk through that, but you can take your time to understand it as well. Basically what I'm saying here is that my primary key is going to be on city, then last name and email, and then I cluster by last name and email. This is the thing about NoSQL databases: you organize your data based on the use cases and on how you're going to retrieve it, rather than having one format, which is called third normal form, which you may be very familiar with, and then adapting your applications to use the data. It's really about using the data in the manner you deem fit. Sometimes it may involve some planning earlier, and sometimes it might require some adaptation and so on, but generally what you're trying to do here is optimize for access.

Okay, so let's go ahead and insert some entries. I always like to change it up a little bit. I'm going to insert two entries, and for the last one, because I don't like David Gilardi, right, I'm going to just put my name in there. Let's see if I can do that correctly. Okay, and I'm just going to put Rags, and
as you can see I'm very self-obsessed, so everywhere it's Rags; you get the point. So now, again, what I can do is select from users by city. What I'm doing here is showing everybody who's in Paris, and there are two people there. And if you remember, I think ours was Orlando, so hopefully it's going to pick me up. Okay, and there is Rags. So you can try this out, and there are a bunch of other queries here. I don't think I need to go through all of these, but let's look at this one, for example. Okay, obviously I changed this. There are a bunch of qualifications that you can add, and you can turn tracing off; there are a bunch of other commands as well that you're welcome to play around with. But this gives you an idea of how to use the CQL console.

There are a number of ways of using the CQL console, actually, surprisingly. Probably the easiest way with Astra DB is the console in the browser, but you can also install it on your command line, in your CLI on your laptop, and there are a number of other ways of doing that. Essentially it's similar to other query languages that you might be used to, specifically SQL, but a little bit different, because this is really intended for Cassandra, and that's why it's called CQL. Anything else, Stefano? Any interesting questions we need to look at?

There is a question that comes up sometimes, because it's just a technical thing. It's about pasting, because on some operating systems it is difficult to paste into the CQL console: Ctrl-V does not seem to work. I think that's usually
for Windows. In this case you should right-click and choose Paste; that should work everywhere. Good point, good point. Yeah, Ctrl-V may not work, or Command-V in the case of Mac may not work, so right-click pretty much works across the board.

Then there is another interesting question: do we have to populate our own tables, or are there some pre-populated data sets for exercising? Well, the answer, which I wrote in the chat anyway and repeat here for everyone, is that for today's exercises the commands are self-contained, so they include the table population, as you have seen. But actually there is a nice way to insert bulk data from a CSV file into your tables, so if you want to start with a properly populated, full table to play with, so to speak, you might want to just import a CSV file into your table, say from any other source. Then there are also more enterprise ways to bulk load data into a table in Cassandra, and there are also stress test tools that populate tables at astonishing speed, just for testing the performance and so on. So there are various ways. And there is some sample data available that you can use. In this particular case we're going through it one at a time, just for the understanding.

Yeah, so there is one of our workshops, the one about building your Netflix clone, which is maybe a bit out of scope today, but one of the things it lets you do is download a CSV, create a table and load the data from the CSV into the table. It's data about movies, so title, year and so on, so you might want to have a look at that one. It's called Building a Netflix Clone, something like that; maybe somebody might drop the link later. But I think we are done with this part of the hands-on, so let us move on, if it's okay for you. Oh, there was a question by Anupriya: is there a way to delete the account in Astra? Yeah, you
should be able to, I guess. I haven't done it myself, but I think you have to be able to, because of regulation, GDPR and so on; it's very hard to imagine a service today that doesn't let you delete your account. So I never did that, but I'm sure you can. If you find yourself in need, just reach out and we will get in touch with the technical team behind us.

Okay, and is there an option to do bulk export? There are a number of different connectors that... yeah, do you want to talk about that, Stefano? Well, just one point: in general it's very big data, hosted on several machines, so a bulk export would probably not be just a button to download a file. You would need a destination for this bulk data; think of petabytes. Usually this is something you do with a Spark job, for example, to completely copy all of the contents of a table to somewhere else. It's a distributed task in general, if I may say so. Well, CSV might work, so in that case you can do a bulk export, I guess, but I'm not sure about that. Maybe somebody else might chime in. If you want to reach us on Discord, we can tackle this question later, because now we have to move on, I'm sorry. But that's a very good question. I'm writing that down: CSV export. I'm not sure how you can go about it. Also the Netflix clone: you can find the link to the Netflix clone repo in our GitHub anyway.

So I'll move on to document databases, if that's okay, because we are a bit tight on time at this point. Let me switch to my slides, and let's have a quick look at document databases. So far what you have seen is tables with a very precise schema: columns, and each column has a data type. But what if you have to store, say, heterogeneous documents? Document databases to the rescue. So what is a document database? Well, the main point is that documents are
structured data. Each document has a structure, with fields, subfields and so on, but there is no common schema, so every document has its own structure. Another point is that usually the JSON format is used to exchange document information, and to remember that you can think of Jason Statham, who is aptly enclosed in curly brackets and quoted. JSON, JavaScript Object Notation, now goes way beyond the scope of JavaScript itself, and it's a way to exchange structured information, that is, nested information made of fields, values, subfields and so on. In a document database you don't usually speak of tables; you speak of collections, which are like big bags where you can store documents, and each document might have a different shape. Before we move on, I have to remind you that there is another Jason, in Greek mythology, a Jason who killed some monsters and retrieved a very precious piece of fur made of gold. As you can see in the slides, maybe it's a bit small, these two Jasons correspond to different JSON documents: they have different fields and subfields, a different structure, and they are completely different, yet we are storing them in the same collection. I was about to say table, but that is wrong.

What is the point? The point is that in these collections the concept of a primary key or partition key hardly makes sense, because how do you define a column that is used for partitioning or for identifying the document, when every document has different fields? So usually when you store documents in such a database, what you get back is an ID generated by the database engine, and you can use that ID to retrieve the document later. But you can also retrieve the document by looking at the value of a particular one of its fields. The point is that document databases fit some specific use cases. For example, they are very handy when it's about building a front-end application, because front-end
speaks JSON all the time, so you can have your back end speaking the same language, which is absolutely nice. On the other hand, writing these documents is not as fast as in a fully schema-based database, so this kind of schemaless, or variable-schema, JSON-based document database is usually best when you do fewer writes than reads, or when the writes are aimed at changing a particular field of a document and you do not constantly write the whole document to the database. Okay, so this is more or less it about document databases. There are many; the most famous is certainly MongoDB, but that's not the only one. And thanks to the magic of Stargate you are about to use Cassandra, or better, Astra DB, as if it were a document database, because this Stargate layer works its magic behind the scenes and makes you see one of its tables as if it were a schemaless, let me say, collection. So let's move on. Rags, if it's okay, I'm moving to your screen again for the demo on document databases. Here you go.

All right, so let's get started with document databases. Again, all that I'm doing is having the GitHub link handy, and I'm going to start with the document databases. You can insert a JSON document into videos, for instance, something like this, and then you can retrieve it. Now if you retrieve it, you get this back in JSON. You can see here title, URL, tags and so on, so you're getting it back as JSON, as if it were really a document database. But more importantly, what we'll do here is look at the Swagger UI. To be able to use the Swagger UI, there are a couple of things that you want to keep in mind beforehand. One is that you need what is called an application token, and you may want to keep this handy. Typically what I do is create an application token beforehand and then delete it
at the end of the session, because I don't want it to be reused or misused. But I typically keep it handy so that I can use it. So what is the best way to generate the application token? The instructions walk through that, but essentially you go into what is referred to as Organization Settings, and there you have something called Token Management, and you generate a new token. You pick the role Database Administrator, and only then will you have the Generate Token button. I have to warn you that once you generate this token, this is the only time you will be able to see it. If for some reason you browse away from this page, the token is no longer available to you. It doesn't mean that you can't generate a new one, but the best thing to do is to copy it. There are a number of ways to do that: you can download the CSV file; in this particular case I'm just cutting and pasting it. And like I said, I typically put it in a text file, Notepad or whatever. Let me make sure that I got the right one... a b, yeah, looks reasonable. All right. I also keep this particular URL handy. You will see this URL the first time you land there, so it might be a good idea to keep it handy, because sometimes I've had some issues with that. Other than that, you're good to go. This one is a little bit trickier, because it's not really all commands that you're following through. First of all, you have to find where the Swagger UI URL is. There are a number of ways of doing that; again, you can just click Connect, and you can see here: launch the
Swagger UI. This Swagger UI is applicable only to your particular database. The moment you hit this, it launches something that looks like this. What is Swagger? Swagger is like an OpenAPI specification that lets you use REST APIs in a convenient fashion. If you know REST, it basically uses verbs called GET, POST, DELETE, PUT and so on, and those are the verbs that you're using here. And then you have APIs; as you can see, these are all version 2 APIs, and there is a bunch of different APIs that you can use. So remember, keep the application token handy. I have the application token right here in my text file, so I can use it anytime I want. And then I'm just going to go through this again. The first thing I'm going to do is, let's see, create an empty collection in a namespace. The endpoints are all classified into documents, schemas, data and so on, so all of them are distinct. What I want to do is create a new empty collection in the namespace, and you can see here: create new empty collection in the namespace. I'm going to click on this and you get something like this. You may be tempted to post something here right away, but you've got to click Try It Out first. So let me paste the Cassandra token. I'm going to be a little more methodical here and make sure that I put in the token every time. Let's not worry about namespace ID or whatever for now. And then if there is anything else that I need to put in, like in this particular case I need to put in this name, I do that, and then I can hit this
Execute button, which is sitting right at the bottom. You'll see here there are a number of possible responses; I haven't gotten the response yet. There's 201, which is Created; 401, which is Unauthorized; 409, Conflict; Unprocessable Entity; whatever. So I'm going to hit Execute and hopefully I should get... or did I miss something here? Yeah, oh, namespace, right, that's the one. So it's pretty self-correcting. So I hit Execute, and hopefully the response should be a 201. I got a response, and you can see here what was the call that I sent, what was the request URL, what was the server response, in this particular case 201, and so on. Just for kicks, I'm going to use the same token, but instead of a b I'm going to put a z, and let's see what happens. Hopefully I should get an error, Unauthorized, because I didn't put in the token properly. So you can play around with this. I don't know if I want to go through all of these, but at least I should try a few of them. So let me create a new document. Again, you locate, under the documents header if you will, the entry that says create a new document. Locating it is the important part. And then again you click Try It Out. Put in the Cassandra token again; hopefully I have the right one... no, I don't. It's always good to make sure that I have the right token, so I go back and pick it up again. There are ways I can use multiple cut-and-paste buffers, but I'm not going to worry too much about that. Now I'm going to specify the namespace ID, which is nosql1, and now I have to specify the collection which I just created; remember, I created the collection, collection one. And for
the body, I'm going to use this. Again, very easy: all that you do is cut and paste it. Let me not worry too much about changing it, so I'm just going to execute this. And you'll see again what the response is: it's a 201, basically indicating that everything went well, and you're in business. You can actually use this document ID to retrieve documents if you want, and there are some REST API calls that walk through that. I'll just put in my application token again so that if I want to use it, I can. But let me see how we're doing with respect to time... I don't think we're doing great. So when I clicked the Execute button I got a document ID, which is what was created. I can find all the documents in the collection. So again, what I can do is go and, I don't know, I'm not good at reading, find documents in the collection, wherever that is. But you get the idea of how to look at getting all the namespaces, getting multiple documents, deleting a path, updating, and so on and so forth. I don't know if all the documents is further up here... oh yeah, search documents in the collection, is that what it is? Search documents in the collection, okay, so I can go here. The tricky part is really locating where it is, and for somebody like me who doesn't have a lot of patience, sometimes I might paste into the wrong one and so on. Again, if I put in the wrong document ID, it's not going to help me. So I put in the application token, and then the namespace, in this particular case again it's nosql1, no big deal, and the collection, collection one. And I think that's pretty much it. You can specify page
state, page size and so on to do some kind of paging, but you get the idea. In this case we have only one entry, and you got that back. Okay, so enough of that. You can retrieve a document by its ID, you can search the documents with a where clause, and so on. I think that should be good for now; then we can come back to the key-value databases.

Okay, great. I don't see any particular question. Some of you are getting an error here and there, so I suggest you go through the instructions and make sure you put in all the tokens and type the JSON payload properly. You might want to repeat it slowly in your own time, making sure that you get the right endpoint and all that.

And by the way, we have one extra exercise there which we'll give as homework. Typically that's what we do, because we just don't have time for it and it involves a slightly more complex setup. We really want to focus on the Menti quiz and finish up what we had planned for today. I'm sure we'll get to that.

Yeah. So, a very quick question about the token: is there any way to set the token once and then use it for all the other APIs? Not within the Swagger UI, as far as I am aware, because this is actually a playground just to test your endpoints. It is a volatile kind of thing: you insert the data and you click Execute, then you move to another endpoint and you have to do it all over again. If you want, and I've seen it mentioned here and there in the chat, all of these endpoints are accessible with Postman as well, or you can write your own Python script to access them, whatever. They are REST endpoints; the Swagger UI just makes it easier to navigate and test them quickly before writing your proper application, which would query those same endpoints for real. Exactly.
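As a rough illustration of the "write your own Python script" remark, here is a minimal sketch that only builds (and does not send) the request for creating a document. The path layout `/v2/namespaces/{namespace}/collections/{collection}` and the `X-Cassandra-Token` header follow the Stargate Document API as shown in the Swagger UI; the base URL, the token value, the collection name and the document contents are placeholders assumed for the example, not values from the workshop.

```python
import json
import urllib.request


def build_create_document_request(base_url, token, namespace, collection, document):
    """Build (but do not send) a POST request that creates one document."""
    url = f"{base_url}/v2/namespaces/{namespace}/collections/{collection}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The same application token you paste into every Swagger UI call.
            "X-Cassandra-Token": token,
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace them with your own database URL and token.
req = build_create_document_request(
    base_url="https://example.invalid/api/rest",
    token="AstraCS:placeholder",
    namespace="nosql1",
    collection="collection1",
    document={"title": "Intro to NoSQL", "tags": ["nosql", "workshop"]},
)

print(req.get_method(), req.full_url)
# To actually send it, one would call urllib.request.urlopen(req);
# per the demo, a 201 response means the document was created and
# a 401 means the token was missing or wrong.
```

Because the token lives in one variable here, a script like this also answers the question from the chat: you set the token once and reuse it for every call.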
You can use tools like Postman. Okay, yeah, the beauty is that it's REST, so it's a universal language, so easy. Okay, so let me come back to my slides and let's move on to key-value databases. This is a very simple thing. Key-value databases are maybe the simplest kind of NoSQL database, and they are just what the name says: a big map between keys and values, and that's it. To every key there corresponds a value, and keys are hashed into buckets for partitioning. It's a distributed system, so behind the scenes you have many machines that each hold part of this big key-value map, but that's it. You can think of a key-value database as a very big table with only two columns: the key, which is the primary key and maybe the partition key, and the value, which is a single other column, a data column, that contains whatever you want. And that's about it. Key-value databases are designed to be fast and simple. They do not support complex operations, they do not support joins, obviously; they just support writing a key-value pair, reading it, deleting it, replacing it, and that's it. Then again, compared to a two-column table there are some optimizations at play, but still, they are designed for speed. They might support typing, so in some of these databases the value might also have a type, but usually what you store as the value for a key is just a string, and then it's up to you to serialize your data in and out of it, or it might be a binary blob, something like that. A good example of a key-value database, maybe the most famous one, is Redis. Redis is a very performant piece of software that acts as a distributed key-value store, and since the primary goal is speed, Redis mainly works in memory; then again, it is also able to persist data if needed. But the point is that it is fast and simple. So what are the use cases for a key-value store? Well, you might want to cache expensive computational results, so if you have a
pure function, say a complex factorization into prime factors, or maybe sentiment analysis on a sentence that you might re-encounter, something expensive whose result depends only on the inputs, with no impure parts in the function, you might want to use a distributed cache, which is a key-value database. Or you want to store user info in an application, so user session information: again, the user ID is the key, and in the value you serialize whatever blob you want to associate with that user. You can use it to easily deduplicate data, for example. So there are a few very important use cases for key-value databases, but the point is really speed and simplicity. And I would say that's about it, because key-value databases are very simple. My observation is that Cassandra, that is Astra for you today, can also be used as a key-value database, namely in the case of a table with two columns. And I think that's what you are about to try, Rags: a real case of a key-value database, but working on Astra DB. So I leave it to Rags again for the hands-on. Is it okay for you? Shall I jump to your screen again?

Yeah, you can, but I think I'm going to jump straight to the GraphQL part of it, is that correct? Because there's really not a whole lot of...

Yeah, the key-value part would take, well, a few minutes maybe. Do you want to jump to the GraphQL? Okay, so let's do it like this, so we don't have to rush at the end, good point. In the repository you find all the instructions to play with the key-value database, which is a fairly simple concept in itself. But the nice thing is that the repository makes you play with something called the GraphQL API, which is a different paradigm, a different protocol, a different language for speaking with an API, and it is also something supported by the
Stargate layer, so you're learning a lot of ways to speak to a lot of different NoSQL databases. But we will leave that as an exercise and move on to graph databases... Oh, okay, sorry, so I got it wrong: I thought you were jumping to graph databases.

No, no, sorry about that. Let's do the GraphQL, because I think using the GraphQL playground is still relevant.

Yes, there you go. Sorry for the misunderstanding.

That's okay. All right, so I'm going to get rid of this Swagger UI tab, because I'm already done with it, and go back again to the Connect part. You can see here there's this GraphQL API part. So I go to the GraphQL, and just like before, we get a GraphQL playground here. And again, received status code 401: the reason I'm getting a status code of 401 is that I don't have the token pasted here. So the first thing you're probably going to want to do is paste the token. Let me take the token again and paste it here. Sometimes I've had issues where I can't see what's going on there; let me know if you're running into that and I can try to help you. There are two tabs here: one is for the schema and the other is for the GraphQL data itself. A good thing to do here is to copy this URL, because sometimes it gets occluded, so it might be useful to do that. So again, all that I'm going to do is make sure... see, it says here we are creating a table, so we want to use the GraphQL schema tab. So make sure you're in the schema tab and not the GraphQL tab. Essentially I'm going to create what is referred to as a mutation, and I'm going to create a table called kv. And you'll see, again, all that I'm doing here is cutting and
pasting it, and you can hit the Execute button. Okay, did I miss anything? All right, so I'm going to execute this. You can see the output is pretty concise; basically it said it created the table. You can go back to your CQL console, and I'm not going to do that, but if you look, you can get the information on the key-value table from the CQL console as well. So now that we have created the schema, so to say, we're going to go and create some data, the same thing that we did before. But now you want to make sure that you switch to the GraphQL tab. So make sure you're switching to the GraphQL tab. Okay, this is where sometimes it gets occluded, and I'm not quite sure what's going on there, but we'll worry about that later. So that's the wrong one. All right, let me go back here and come back here and see what happens. Okay, that's not very useful. So this is where the thing was supposed to be pasted. So I'm going to keep myself smaller, please, because folks there are not able to see anything. Yeah, okay, now that's way better. Sorry, guys; thank you. So this is the application token again; make sure that I have the application token and that I put it in correctly. And then here I need to make sure that I am in the GraphQL space, so I'm just going to make sure that I have the right one. So I'm using the GraphQL tab; hopefully this is the right URL. And you can see now that the error went away. And then I'm going to paste in the way to create, or instantiate, the data. Again, all that I'm doing is really cutting and pasting. So I'm going to put it here and
I'm going to run it, and hopefully this is going to work fine. Okay, so key one is created, key two is created; you can check this out from the CQL console as well. You can insert more here, but I think you get the idea of how to use this. The thing to watch out for is to make sure you put in the application token, that the URLs are in the correct shape and so on, because sometimes that gets you. And make sure you're in the right tab: in this particular case, when I'm creating data I'm in the GraphQL tab; when I'm in the schema tab, I'm essentially creating the schema. And here there's another example. So with that, let me hand it back to Stefano. Is that good?

Yes. So let me re-inflate myself to proper size. Okay, yeah, we have to be careful, because sometimes we forget, and our ugly face just covers that one tiny bit of screen that is crucial to make everything work for you folks at home, so sorry for that. Okay, so let us quickly see some of the features of the last type of NoSQL database for today, to leave enough time to play games and to award you your deserved prizes. So, graph databases. Now, so far we have been seeing databases that are NoSQL in one particular sense: they usually disregard relations to some extent and just work with data that has been put together. We've seen there are no joins here and there, there are simple key-value stores, and so on. But a graph database is NoSQL in a very different way. Indeed, the idea of graph databases is to take relations and make them first-class citizens in the database, a very important part of the structure of your data. Graph databases are good when the data you have to model and work with is full of relations, when there are a lot of relations between items. So, very generally, what is a graph? A graph is a thing where there are points connected by segments, so
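To make the "points connected by segments" picture concrete, here is a minimal in-memory sketch in Python. It is not how a graph database stores data, and the names (`Graph`, `within_two_hops`, alice, bob and so on) are made up for the illustration; it only shows vertices, edges, and the kind of hop-by-hop walk that graph queries are built around.

```python
class Graph:
    """A tiny undirected graph: vertices are names, edges are relations."""

    def __init__(self):
        self._adjacent = {}  # vertex -> set of directly connected vertices

    def add_edge(self, a, b):
        # An undirected edge is recorded on both endpoints.
        self._adjacent.setdefault(a, set()).add(b)
        self._adjacent.setdefault(b, set()).add(a)

    def neighbours(self, vertex):
        return set(self._adjacent.get(vertex, set()))

    def within_two_hops(self, start):
        """Everyone reachable in one or two steps, excluding the start."""
        direct = self.neighbours(start)
        reached = set(direct)
        for vertex in direct:
            reached |= self.neighbours(vertex)
        reached.discard(start)
        return reached


g = Graph()
g.add_edge("alice", "bob")
g.add_edge("bob", "carol")
g.add_edge("carol", "dave")

print(sorted(g.neighbours("alice")))       # ['bob']
print(sorted(g.within_two_hops("alice")))  # ['bob', 'carol']
```

With a relational model, each extra hop would typically be another join; here it is one more loop over the adjacency sets.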
there are vertices connected by edges, so points connected by lines, and these lines usually represent relations. Look at the example here on the upper right: there are customers, addresses, orders, products, tags, and they are related by arrows. A customer resides at an address, an order is made by a customer, an order ships to an address, so there are a lot of relations between these objects. Or you can think of a social graph: I know my friend, my friend knows another friend, these other friends know seven other people, one of them is my friend, and so on. So there are a lot of examples in the real world where the best way to model data is as a set of points connected by segments, and this is where graph databases shine. They are natively built to describe situations with a lot of relations, and they usually even have their own dedicated language to interact with this data. In many cases the analysis you want to run on a graph database involves a lot of walking through your database, walking from friend to friend to find out who is a friend of whom. So the typical question you ask a graph database is something like: okay, I have this network of people who know each other; tell me what subset of vertices forms a group that is very densely connected. These are typical questions that you ask of graphs, and they have various applications. You might want to identify groups of people who know each other very well, you might identify types of customers for a market analysis, you might want to identify groups of criminals in a network of cell phone calls, if you are the police and you want to do this big data analysis. So there are very different use cases. Network approaches are good in social networks, obviously, recommendation engines, fraud detection, even just your navigator,
your car navigator: find me the best way from this address to this other address. That is also a graph kind of analysis. You might be doing that with a relational database as well, maybe with a lot of joins stacked on top of each other, but probably that would not be so performant, because graph analysis is computationally demanding, and graph databases are usually designed to be optimized around that kind of analysis, those kinds of very complex queries. And they usually involve their own language. There is a very nice language called Gremlin that, you can see an example here, is able to express a tower of joins in a simple and understandable way. So yeah, I think that's about it for graph databases. The famous ones are Neo4j and Titan; there is also a commercial product by DataStax called DSE Graph, but that's not the end of it. Now, okay, let's finish this part and then we'll spend a word or two on the exercise about graphs. We have come to the end of our survey of NoSQL databases. You can think of what we saw as lying in a plane. You see here I have a small chart of the positioning of various types of NoSQL and relational databases. On the vertical axis we have scalability, so how easy it is for the database to scale to large clusters, large distributed environments, and on the horizontal axis we have the importance the database places on relationships. So you see, a graph database is more or less pretty well scalable, but in particular it places a very high value on relations. A key-value or a tabular database can scale very well, but it basically neglects the concept of relations: it's up to you, the user, the application, to draw connections between items in your database, so to speak. Then there is the relational database, which as we already saw is a sort of all-purpose tool, not specialized; it places some importance on relations, but it's very hard to scale it
up to large sizes and high workloads. Document databases lie somewhere in the middle. Okay, so with that, I don't know, Rags, do you want to spend some words on the last exercise, graph databases, or we can just mention it quickly and then move on to the quiz?

Yeah. What Stefano was mentioning is that, again, real life is very complicated, because there are all kinds of relationships, multiple relationships, and there is an example that walks through some of those relationships. Even though there are only maybe a dozen data entries, the relationships are pretty complex. Just to visualize it, there's a graph, and essentially what you have to do is use Docker Compose to walk through that: basically gods and demigods and so on. You can walk through all of that and look at what the vertices are, what the relationships are, and so on. But for the purposes of today, it's easier to do it at home, typically on a machine with a good amount of RAM that is running Docker, and you can try it out when you have the time. But essentially what a graph database is for is to capture the relationships and to be able to analyze them, and I think Stefano gave a lot of examples of that.

Okay, thank you. Let me just reassure you on one thing: this fifth part of the hands-on, which you will do on your own, is not required for the homework. The homework is completed and graded as passed if you reach the end of the fourth part, the GraphQL part on the key-value store that you just saw; that's enough for the homework to be completed. But you find more instructions in the GitHub repo. Nevertheless, I really invite you to play with the graph database: one, because it's cool, and two, because it's a very nice
hidden gem. The exercise that you will play with is not just a bunch of commands to execute; it also teaches you a lot. So please go ahead and play with it, because it's cool.

Yeah, and just for sanity's sake, you and I played with it just before, just to make sure.

Yeah, it's nice, and the Gremlin language is cool too. Okay, so I think we are now at the last part of today's agenda, and we are almost on time, for some reason. Thank you, Rags, you've been very tight on time and you've been doing great work making up for my slowness. So I think we can play the real quiz now. So, yeah, Leandro: your screenshot is only for the last step, in part four or five, yes. Then again, the homework also requires you to do some other interactive scenarios, some interactive labs, to get to know some parts of the databases better, and you are required to do them as well. They are quick and fun, and you will learn a bit more; especially, there you will find the answers to some of the questions that we have seen today in the chat, about data modeling and stuff. So go ahead and play with them as well. Now let's go to the juicy part of today, the quiz. Okay, I should be good to go now; I'm not sure if it's presenting. Okay, so let me paste again the link to join us on Mentimeter... that is not working for some reason, I don't know why exactly. I'm having some strange problem here, it's not working; I think my screen is locked. Okay, we will find a way. In the meantime, let me paste the code here in the chat while I fight with my technical problems.

I already put in the number.

Okay, great, thank you.

You know what, just to repeat some of the tips: don't look at YouTube, you can take your attention off of it. Just focus on whatever browser or device you're using for Menti, because there is a lag
and it takes a while, so focus on that. And remember, the faster you answer, the better it is. But if you get a couple of them wrong, I think you have no chance of being in the lead. There we go. Maybe even getting one wrong might be hard. So good luck.

Okay, so I think I solved my problems. So how many of you are there? Fifteen... please give us a thumbs up when you are able to connect, because we should be back in business. Thirty, and more and more. Please jump on board and you will have a chance to win some swag. So this is quiz time. As Rags mentioned, you have to answer fast, but correctly as well. Let's start.

Okay, so are there any warm-up questions, or do you just go straight into it?

So get ready and let's jump in. Okay, I think we are good to go, and there we go. So, first question. This question is special, it's not important: please tell me, what do you see in the picture? A truck, an airplane, a ship? Let's see... a ship, very well done. This is one of our ways to ensure there are no bots at play. So now the real questions. These questions are timed, so you'd better answer fast to get more points. Let's start. Why NoSQL? Why indeed: because the relational model is inconsistent, to piss off the previous generation, because new is better no matter what, or to overcome scaling limitations? So what is the reason NoSQL was invented, discovered, built, engineered? Let's see... and the right answer is: to overcome scaling limitations. Well done.

Obviously there's one teenager who's playing: pissing off the previous generation.

Yeah, nice. And three for the first answer, but the relational model is absolutely consistent. So let's see the leaderboard after this first question. Okay, so Alex... wow, it's pretty good.

Yeah, but usually it starts like that. Let's see after a few questions; this probably gets spread out a bit more.
Question number three. Answer faster if you want to get more points. What is Astra DB? Is it a local in-memory version of Cassandra, or is it Cassandra as a service in the cloud? Is it perhaps an album by Radiohead, or is it a development tool? That's a tough question, but you've been using it today, so you might have an idea. Time's up, and it is Cassandra as a service in the cloud, absolutely not local and not in memory. Okay, so people are doing pretty well.

Yeah, you've been paying attention, congratulations.

Yes. So now the fastest is Jon Snow, who jumps into second position.

Hey, I'm number two... I'm number three.

I see you. Okay, so answer fast to get more points. Question number four: which NoSQL database uses partitions to distribute a table's data over several machines? Relational, tabular, document, key-value, graph, or Microsoft Excel? We discussed that a lot. Time's almost up, and the answer is tabular. Very good, congratulations to those who answered tabular. Nobody answered Microsoft Excel, not surprised. And let's see what happens to the leaderboard after this question. So the fastest jumps into first position, with Lucas and Vivek second and third.

Yeah, but they are all very close to each other, so defend your position, and let's go to the next question.

So we didn't set up the quiz to favor Cassandra this time.

So, question number five: the NoSQL database types discussed today are graph, relational, document, cached; or tabular, titular, tubular, terrific; or graph, tabular, document, key-value; or maybe key-value, document, circular, schema-less; or maybe document, key-value, JSON and serializable? That's a mouthful, isn't it?

Right. I really like graph, tabular, document...

I still have to see a tubular database, but that might be an interesting new invention. So let's see. The fastest is Cassandra, and Cassandra, Lucas and Vivek are still first, second and third. Well done, but it's still
anybody's game. And let's go for question number six. Answer fast if you want to get more points. A key-value database: supports complex relations and joins, or is perfect for caching expensive computations, is perfect for doing complex analytics, or always stores valuable cryptographic keys? What is it good for? We mentioned that, so if you've been paying attention... And the answer is: is perfect for caching expensive computations. I mentioned the example.

Yeah, absolutely, that's a leaderboard-shattering question. And maybe we need to do a better job of explaining this.

Yeah, well, we just mentioned a few examples. But Lucas, Vivek and Surab... Surab jumps into third position, but I think Cassandra still has a chance to climb.

Yeah, exactly.

Question number seven; there are still three of them. Answer fast to get more points. For the document database, we exchanged data using XML, JSON, HTML, binary, or carrier pigeons?

I would go for carrier pigeons: they are very fast, high bandwidth.

Yeah, you can use them anywhere, especially low power, just a few grains of wheat.

Correct, it's all green: JSON format, well done. JSON: think of Jason, the hero of Greek myth, or Jason Statham, anyway. And this is what we used.

And I think you did a good job, Stefano, of talking a little bit about JSON, so it was on everybody's mind.

Okay, so Lucas and Surab still maintain their podium positions, and let's move on. Question number eight. Answer fast to get more points. In a tabular database: each record has its own shape or schema, records in a table follow a fixed schema, there are no records, only a big bag of values, or values never have a definite type? Let's see. This is for a tabular database, the first example we saw. And the answer is: records in a table follow a fixed schema. Ah, I think many have mixed up document with tabular, because "each record has its own shape" is for documents, where the collection is shapeless. But we might try and explain that a bit better next time, anyway.
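Since this question tripped up many players, here is a small Python sketch of the distinction. It is only an illustration under made-up names (`TABLE_SCHEMA`, `validate_row`, the sample records), not how Cassandra or any document store is implemented: a tabular row must match one fixed set of typed columns, while a document collection accepts documents of different shapes.

```python
# A fixed schema for a hypothetical "videos" table: column name -> type.
TABLE_SCHEMA = {"id": int, "title": str, "views": int}


def validate_row(row, schema=TABLE_SCHEMA):
    """A row is valid only if it has exactly the schema's columns and types."""
    if set(row) != set(schema):
        return False
    return all(isinstance(row[col], typ) for col, typ in schema.items())


# Tabular: every record must follow the same shape.
good_row = {"id": 1, "title": "Intro to NoSQL", "views": 10}
missing_column = {"id": 2, "title": "Graphs"}
wrong_type = {"id": 3, "title": "Documents", "views": "many"}

# Document: each entry in the collection can have its own shape.
collection = [
    {"title": "Intro to NoSQL", "tags": ["nosql"]},
    {"title": "Graphs", "url": "https://example.invalid/graphs", "length": 42},
]
distinct_shapes = {frozenset(doc) for doc in collection}

print(validate_row(good_row), validate_row(missing_column), validate_row(wrong_type))
print(len(distinct_shapes))
```

The validation step is what a fixed schema buys you on every write; the document collection skips it and leaves the interpretation of each shape to the application.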
So let's see. Exactly, let's see. And, ah, wow, we have a leader change. Lucas first, second and third... wow, surprise: only Lucas maintains the podium.

And then, for the last question, get ready, all of you. Ready, and go. NoSQL databases are schema-less: true, false, or it depends? And there we are: as promised, we finish on the same question we started with, and the right answer is "it depends". We saw examples where there is a schema and examples that are schema-less, so it depends. And with the last question, the leaderboard: who's the leader? Ah, so reigns the winner, Lucas Beckham and Katya second and third. And then, to the first three winners: please get a screenshot of your winning Menti screen and mail it to the address that I'm about to paste to the YouTube chat. So [Music] please, I'm also pasting it here into the Discord. Okay, so please, I'm not closing these, I'm just leaving it there, so do not... oh, I still have my trouble with... okay, okay, there we go. So let's go back to the slides for closing and saying goodbye to everyone.

Okay, so, closing words. Those of you who got the first, second, and third place, please get a screenshot and mail it to this address, and you will get contacted and you will get the swag by ordinary mail sent to your address. Now, let's not forget about the homework. The homework is explained in the repo: you do it, you submit it, and then we will grade it. Now, I see a question in the YouTube chat: when will the badge be available? Well, grading the homework is a manual process; we go through all of them one by one, so give us a few days. Sometimes there are so many submissions that a few days become a few days more, but be patient, we will go through all of them. And in case we forget about you, write to us on Discord and we will promptly look at your homework. So go ahead and do it. Then, okay, there is also some optional hands-on with Docker, with the graph part. As we already mentioned, everything is explained in
the repo. Now, how do you submit the homework? That's also given in the repo. You go there and you open an issue: you click "New issue", then you fill in a template, you give us your name and email, and then you are able to submit the screenshot and click "Submit new issue". That's it.

Now, that's not the end of it, because just by attending this workshop today you are given a chance to have a free voucher to take the Cassandra certification exam. That's usually 145 dollars for each exam attempt, but if you go to this link, and I'm pasting it in the chat as well, in the next, say, few minutes, there will be an open form for you to fill in to get your own voucher, which is valid to take the exam for free. You will be Cassandra certified, which is a valuable thing when looking for a job or looking to improve your career. Obviously you will still have to study; that's unfortunately so.

Yeah, okay, so we have Gentile, Kotalam, Lucas, the Cassini; you know, they all think that we did a good job here. Obviously, please give us feedback on how we can improve, but really, thanks for showing up. Yeah, well, thank you everyone, and let me finish with a few other pieces of information.

We have a hackathon coming in September. This is the hackathon that we were discussing; I know that a couple of people have actually attended this session, so, you know, back and forth, right? Whoever has been in this, you can try your hand at the hackathon, and if you are in the hackathon already, attend some more of our sessions so that you can get up to speed. It's going to be fun, and there is going to be some actual prize in cash, you see it on screen, so don't hesitate. Significant, yeah. Then, okay, we are not eligible, right? Don't think so, but that would be
interesting. We shall find out.

So, as we already mentioned, we hold various workshops per week, so don't hesitate to connect with us and take a look at this link I gave to keep informed of the new workshops. In particular, next week there will be a nice new workshop for you that is about implementing a React Native mobile application. That is one of our first forays into the realm of mobile development, and it's going to be fun, new, fresh, and interesting. So that dovetails nicely into the question that was, you know, do we have any more sessions that are not an intro session? Yes, we definitely do. We take you through all levels. Our agenda is very simple, you used to upload, so keep an eye on our page that I just pasted, because you will see there are various kinds of workshops.

Then we invite you to join us on Discord, and maybe I'm also pasting that link here just for completeness, if I find it. Here, Discord, here: join us on Discord at this link. We are a pretty solid community and there are a lot of interesting discussions going on.

So, yeah, I think that's about it. We are saying goodbye and see you next time. I hope it's been interesting and fun, and congratulations to everyone for following along. Thank you. Yeah, see you next time. Perhaps do you want to say something before we close? No, I just want to put on my hat. Right, right, absolutely. So bye-bye everyone, and please subscribe on YouTube, because we are almost there at the 20k mark. So thank you everyone, see you next time, have fun, and good luck with the homework. Bye everyone, thank you, ciao.

And as always, don't forget to click that subscribe button and ring that bell to get notifications for all of our future upcoming workshops.

Gifted with powers from the goddess of Cassandra, who grew those powers until she could multiply at will, move with limitless speed, and unmask hidden knowledge. With those powers she was able to fully
understand the connectedness of the world. What she saw was a world in need of understanding. From that day forward, she sought to bestow her powers on all who came into contact with her, empowering them to achieve wondrous feats.
Info
Channel: DataStax Developers
Views: 1,789
Rating: 5 out of 5
Id: AofIhPshHCo
Length: 144min 24sec (8664 seconds)
Published: Thu Aug 19 2021