Introduction to NoSQL Databases

Captions
Hello everyone, and welcome to this week's workshop. We are doing an introduction to NoSQL databases. I am your host, Ryan Welford, and I'm joined by Cedric Lunven. Hello Cedric. Hello Ryan, hello everybody, happy to be here and ready to work for this two-hour session full of hands-on. So today you will work along with us.

Esteban, I see a question in here: will the live broadcast have subtitles? Unfortunately not; it adds a lot of latency, and to make sure we can interact with our live audience we don't turn them on. They will be available on the recording, but I will attempt to speak clearly and slowly for those who might have a tough time understanding. They won't have a hard time understanding you, Ryan, but me, with my nasty French accent, I will do my best for everybody to understand. Fair enough.

All right, let's get into it. As I said, I'm Ryan Welford, your host, joined by Cedric. We are developer advocates here at DataStax. Apparently Cedric's name didn't show up on this slide, I don't know what happened, but we are very excited to be presenting this material for you today. We are not the only ones in the stream: we have a whole team of advocates and moderators in the chat ready to answer your questions, so please send your questions as you have them and we will be more than willing to help you out.

A little bit of housekeeping. We are streaming on YouTube, which is our primary stream, and we also have a backup stream on Twitch, so if anything happens with YouTube, if the service goes down or the performance is bad, you can always catch us on Twitch. However, we're not really active in the Twitch chat; we monitor the chat on YouTube for questions, and we also monitor the chat in our Discord. If you are interested in joining a growing community of developers looking to learn more about Cassandra and databases in general, you can join our Discord and ask questions there during the stream as well. It's also a good place to ask questions outside of the stream, because we can't answer questions when we're not streaming on YouTube. That's a good point, Ryan: as soon as the live is done we cannot answer your questions in the live anymore, so the Discord is great. We have about 15,000 people with us on Discord now, so later, if you have an issue with a NoSQL database, you can still reach out and ask your question there.

AJ asks: I don't have any knowledge of SQL, will this course be useful? Yes, it will be very useful. We do talk about SQL and how it compares to NoSQL, but even without that familiarity you will still learn a lot.

Games: we're going to be playing a couple of games today using menti.com, a service we use to run quizzes. We'll do a few questions shortly just to get to know you and gauge your knowledge level, and at the very end we'll play a game where you can win prizes: some swag that we ship globally, wherever you are in the world. The top three winners of the game at the end win those prizes, so stick around for that.

All right, we're going to be doing some hands-on in this workshop.
All of our content for today's workshop is in GitHub, and I'll be putting the link in the chat. That is the repo we will be working out of; all of the instructions we'll go through today are included in it. So if you miss something that we said live, check the repo for the instructions first, because everything is spelled out there, and if you still need help, feel free to ask questions. Exactly. You should be able to follow along from that repo just fine.

There is one section we'll only demo rather than do as a hands-on portion: Docker is needed for one of the homeworks, or rather an optional homework (I'll get to homework in a bit). We use Docker to showcase one of the NoSQL database styles called graph databases, so if you are interested in going through that part of the repo, you will need Docker, and the instructions for that are included in the repo as well.

We are going to be using a cloud-based database called Astra DB, which is built on Cassandra, a NoSQL database. We're using Astra for a couple of reasons: first, it's very easy to use, it's free, and it allows us to showcase the different flavors of NoSQL databases without having to sign up for a dozen different databases and accounts; we just use one account. It includes an API layer called Stargate that lets us use it as if it were a document-oriented DB or a tabular DB (which it is natively), so it lets us showcase all of these concepts with a single thing. It will be super easy to use, and we will walk you through how to set that up shortly.

I mentioned homework; what is homework? In most of our workshops we include a homework section, something you can do after the workshop. A lot of the time it revisits what we do in the hands-on, and it lets you earn badges you can show on social media and your LinkedIn profile, basically to demonstrate that you are continuing your learning and have successfully completed the exercises we go through. It's a cool way to collect them all; I've seen a few people in the chat who I think have collected them all, like a Pokemon master. Yeah, I've seen returning people who already have most of our badges; that's cool to see.

What I'd like to do now is go to the repository I linked (I'll link it again) and do a quick hands-on section right away. We're just going to set up the service we'll be using, the Astra DB account, because it takes a little while for the database to spin up, and while we're waiting, Cedric will walk us through some of the content. So go to the repo, or go directly to the Astra DB link that you see (which I'll also post), and we will get our Astra account set up. If you're on the repo, go down to the "Create Astra DB instance" section.
As an overview: as I said, Astra DB is a cloud-based database built on Cassandra, a database-as-a-service. It has a free tier, so it's completely free to start and no credit card is needed to sign up; we're not going to ask for a credit card at all. You get a $25 credit every single month, which is roughly equal to 5 million writes, 30 million reads, and 40 GB of storage (I think it's actually a little more at this point), and that is enough for you to run all of these workshops and even a small-to-medium-sized business. It's pretty powerful.

When you go to the page you can sign up with GitHub, with Google, or just with an email and a password. I've already got a Google account; I just have to refresh. When you land on the first page it will ask what kind of account you want to start, and there should be an option to start free now; click Get Started and it will walk you through the initial setup.

The initial setup will ask you to create a database right away. Apparently AWS is having issues, so we're going to go ahead and create our database right away with this information: for the database name we're going to use nosqldb. Technically you can use whatever database name and keyspace name you want, but if you run into any issues down the line and we need to troubleshoot, it's nice for us to reduce the number of variables; if we can expect it to be called a certain thing, that's one less thing to worry about. So for the database name we'll use nosqldb, and for the keyspace name we're going to use nosql1.

Now, what is a keyspace? A keyspace is a collection of tables. Cassandra is a tabular database, we're going to be working with tables, and a keyspace is basically a collection of those tables. If you're coming from a relational background, it's basically your schema, or another service might call it a namespace. Yes: when you have a single database you can have multiple applications using it, and you want to isolate everything your application uses; you'd use a schema in a relational database, a keyspace in a tabular database, a namespace, or even a metastore, like Varsha told us.

For the provider and region, you can select whatever you'd like. I just got a message saying AWS was maybe having issues, so I'm going to use Google Cloud just to make sure I don't run into anything, but you can select whatever provider you want, and whatever region is closest to you is usually the best idea; I'm going to use us-east. On the right you'll see prices; you don't have to worry about those. These are the prices the provider charges us to provide this to you, and it's the rate at which usage would eat into that $25-a-month credit. This is not what you are going to be charged; it will be free. These credit values are just for information, and no credit card is asked for. It's free until you reach roughly 40 million queries a month, and the credit is renewed every month, so this is really a free-forever tier: your database will still be up after the workshop, and you will never be charged if you do not reach that limit, which, to be honest, is pretty hard to do.
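For context on what a keyspace is, here is a minimal CQL sketch of roughly what the keyspace described above would look like if you created it yourself on a self-managed Cassandra cluster; on Astra DB the keyspace is created for you through the UI, so this is purely illustrative.

```cql
-- Illustrative only: Astra DB creates the keyspace from the UI.
-- A keyspace is just a named container for tables (comparable to a
-- schema in a relational database or a namespace elsewhere), plus
-- replication settings.
CREATE KEYSPACE IF NOT EXISTS nosql1
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Scope subsequent statements to this keyspace in the CQL console.
USE nosql1;
```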
All right, so then I'm going to click Create Database. Sri asks: going directly into a demo, don't we need to know some basics of NoSQL before the demo? We're going to get into the basics; we just want to set up the account so that while it's working in the background we can cover the basics. Don't worry, we're not actually going to do anything with it quite yet. We got the question from Sri Balaji Sekar: "directly a demo, but you didn't even explain anything." I will explain NoSQL databases, the multiple flavors, what they are, why they are different, and each time we will illustrate it using the Astra DB platform. Astra will take a few minutes to load, so what we did is ask everybody to start the DB up front, then do some theory, and when we come back to the hands-on everybody will have their instance running and we can move on.

So you should see (I have a few here, but you should have the nosqldb) that on the right it says Pending; that's what you want to see, and it will take a few minutes to switch over to Active. If everyone has gotten to this point, give a thumbs-up in the chat; we want to make sure we haven't left anyone behind. I don't see any big questions indicating people are having trouble, which is great, and I see some thumbs-ups. I will try to go a little bit slower, Jigme, sorry. Yes, slower is better, and we do have time. What we would really like is for you to work with us; if we do not complete all the steps it's not a big deal, the purpose is really for you to manipulate this a little bit. I see more and more thumbs-up. A few people say it's still loading; if you're waiting on Pending, a lot of people are hitting it at one time, so it might take a little longer. MongoDB will not be discussed here; we'll mention it, and we'll show off the document-oriented style using Astra, though we're not going to be using MongoDB itself. All right, I got an email saying it's created, perfect. So while we're waiting for the Pending to switch to Active for the rest of you, I will switch over to the content and hand it over to Cedric. Woohoo.

Okay, so this is the agenda. It's a two-hour session; by the way, I've seen the question, and it is recorded, so if you feel you cannot follow along it's not a big deal, you can come back later with the same link, pause the video, and do the exercises at your own pace. We will do theory, hands-on, theory, hands-on, theory, game, and I'm confident we can do it in two hours, maybe two hours and fifteen minutes max. So let's get rolling; your database should be pending, and it will be up when we go to the first hands-on.

First point: the definition and objectives of NoSQL databases. First, what is a database? A database is software to save things and retrieve them later with queries; super obvious, something everybody should know, even beginners: this is where you save the data and retrieve it later. But what I would like you to know is that a database is not a monolithic system. A database has multiple parts, multiple components, each in charge of some functions. The first component, the first layer, can be called the interface layer.
The interface layer is in charge of the format, the language, and the transport. If we're talking about a relational database, the language is SQL, the transport is ODBC or JDBC, a binary transport, and the format is the queries you will execute. The second layer is the execution: you parse the query, analyze it, and distribute the work to executors, to workers, based on where your data is. The third layer is the storage: how the data is stored on disk, whether it's a text format or a binary format, compressed or not. For each database we discuss today, you need to understand that two databases can have the same interface (both using SQL, for instance) but under the hood not be the same at all: they can have different execution engines and different storage. When we talk about the multiple NoSQL databases in a minute, please remember that for each DB there are these layers to understand. That's the first part.

Let's keep rolling with the relational database, because this is probably the one you are most familiar with. A relational database can do a lot of things. First, it can do real-time queries, what we call OLTP, online transaction processing: you execute something and you want the answer now, super fast, at fairly low volume. Any system that is transactional and real-time, with real interaction with users, is OLTP. But using the same relational database you can also do OLAP queries, online analytical processing, also called business intelligence: at the end of the day you want to query the full data set to do aggregations and computation, and those queries are not real-time, they can take seconds or minutes, which is not a big deal because they are used to produce reports. So even a so-called traditional relational database can be used for multiple things.

So if a relational database is so good, what's the issue? Why would you want something else? I've put the relational database in the middle of this landscape: throughput, how many transactions per second you want to do; capacity, or storage, how much data volume you want to store; at the bottom CPU, computation; and I also put streaming on the side. Relational databases are very good at working on a single machine, and everybody can understand that as you add more throughput, at some point you will reach the limit of your server. My rule of thumb for how many transactions per second you can do on a relational system is about 10,000. Same for capacity: if you can store all the data on a single machine, one terabyte, five terabytes, ten terabytes, you're still okay, there are hard drives able to do that. But if I tell you one petabyte, that's not working on a single machine anymore. At some point this system, designed to do a bit of everything but not really specialized in anything, will reach some limits; this is what I show as a red circle. Relational DBs are very good at doing all kinds of things, but if you reach too much throughput or too much capacity, your system needs to scale out, to add machines. I'm not saying relational databases cannot scale at all; the big ones, Oracle, PostgreSQL, can of course scale out, you can add nodes and make the database scale, but it was not designed to do so.
Relational databases are transactional; they do what we call ACID transactions, and in the distributed world those are slow. So, to cope with this limitation, the big giants first (Google, Facebook, LinkedIn) created new database systems to meet these new requirements. You might have heard about the three Vs; in the big data world it's the same as in the NoSQL world. Too much throughput is what I labeled velocity: at some point your system cannot keep up. Volume is capacity: at some point your system cannot store that much. And variety: maybe you do not want to store your data in tables. Tables are cool, but if you want to work with hierarchical data and nested structures, joins are not always the right way to go, and this is why document-oriented DBs like MongoDB (which we will talk about later) exist: you want to save the data as it is. If my data is JSON, I want to save JSON; if my data is binary, I want to save it in binary format. So NoSQL is also a way to store different kinds of data; graph is another example.

First, you need to understand that NoSQL means "not only SQL". SQL is just the interface layer, and some NoSQL systems use the SQL language to interact with; I'm thinking about CockroachDB, for instance: it's a NoSQL database, but you interact with it using SQL. So: not only SQL. Second, these systems have been designed to scale, and to scale to very big volumes; that's the real difference between relational and NoSQL, and why NoSQL was introduced. But NoSQL is a pretty wide field; I will come back to that in a second. There are a lot of NoSQL systems and we cannot cover all of them in two hours, but we will give you the main categories of NoSQL. You have seen the four main categories in my agenda, and that's exactly what we'll do: talk about one, do a hands-on, talk about another, do a hands-on.

Scaling means you now have a distributed system: your database is installed on multiple machines. And there is a famous theorem in computer science called the CAP theorem, Eric Brewer's theorem, which states that in a distributed system you cannot have availability and consistency at the same time if something goes wrong. Say we have a distributed system, multiple nodes, multiple machines. For the system to be available at any time, meaning that if you lose one of the nodes the system is still up and can answer, the data is replicated: you have the same data stored on two nodes, just to be able to stay available. But then, how can you be sure that at any point in time both nodes storing the same data are consistent, that they have exactly the same data? So that's availability and consistency. Partition tolerance is the bad guy, it's what goes wrong: in a distributed system you can lose the network between the nodes, so partition tolerance is really about losing network between nodes. The CAP theorem says that in a distributed system you can only have two of the three at any point in time, never all three at once. So some NoSQL systems chose to be AP, focused on availability; some chose to be CP, focused on consistency. An availability-focused system says: I want my system to be as fast as possible, and if something goes wrong I keep answering, but I know that sometimes the consistency won't be exactly what I expect, which means no distributed transactions.
Consistency, on the other hand, says: I want my nodes to be in sync and consistent at any point in time, but that means if I lose one of the nodes, the query will fail, because now I cannot achieve consistency. To give you some distribution: Cassandra picked AP, MongoDB picked CP, and depending on which NoSQL database or distributed system you use, you need to be aware of which kind it is. Kafka is CP, for instance; ZooKeeper is CP. It's a choice made by design at the beginning. So NoSQL databases are distributed systems and the CAP theorem applies. I also mention cloud systems and distributed systems here just so you know: this is one of the reasons NoSQL databases were more successful in the cloud at the beginning. Of course you can now run relational databases in the cloud as well, on any cloud vendor, Docker helping, and as SaaS systems, but the NoSQL systems came first, because they were designed to work easily as distributed systems.

I will go fast on this. If I look at the left part of my screen, the relational database: the interface layer is SQL and the transport is ODBC. Then the execution layer does what we call query planning: based on how you do joins, it computes whether it needs to look for data in table A or table B, does the joins, and decides in which order the joins make the most sense to be as fast as possible. And the storage: if you are using Derby it will be text, if you are using Oracle it will be a binary format. Depending on which relational database you pick, the storage is different; there is no standard for the storage of relational databases.

Now if you look at the not-only-SQL landscape, there are many, many solutions. First, in the interface layer, some DBs use SQL, some use JSON, some use the Cassandra Query Language, N1QL, Cypher: there is more or less a language for each NoSQL database. Then the parser will be different, and of course the way to store the data will be different: if you are working with a graph database, the data will be saved as a graph, which is totally different from the way you store tables. Here I put six big flavors of databases: ledger databases like AWS QLDB (there is a typo on the slide), which are really an append-only type of database; time-series databases like InfluxDB, OpenTSDB, Prometheus; tabular databases like Cassandra, HBase, Bigtable; document databases like Elasticsearch or DocumentDB; graph databases like Neo4j or Titan; and key-value stores like Redis, DynamoDB, or any distributed cache solution. We won't cover all six, we will cover four, and you will see the differences. The differences between those databases: some pick AP (availability and partition tolerance), some pick CP; some are more performant than others, but more performance usually means less consistency; some use JSON, some use SQL. So it's really based on the use case that you pick one or another, and when it comes to one flavor of NoSQL database against another, there is no one better than the others; it's really based on your use case, and I will keep saying that.

So, what we cover today: the four main categories, column-oriented, document, key-value, and graph. That was quite a lot of theory, so let's cover column-oriented quickly so you can move to an exercise. As I explained at the beginning, instead of having you install ten databases in two hours, we asked you to create an Astra DB instance.
Astra DB is based on Cassandra, a column-oriented (tabular) database, open source, but the platform also includes a proxy, a gateway, that will help us interact with the DB as if it were a key-value DB, a document DB, or a column-oriented DB. We are using a single system for simplicity: we will explain each flavor of NoSQL and its use cases, and illustrate it with this platform.

Okay, let's go with tabular databases. If you look at the icon in the top right-hand corner, you can already see that the key will be super important: you have a key, and the rest of the row is labeled as values. In a tabular database, as the name states, you store tables. Now you might say: if you're using tables, how is it different from relational? Well, now it's a distributed system, so your data is distributed among the nodes. If you look at my little drawing, even a very small table with fewer than 20 records has its data distributed among all the nodes, and we try to have the data evenly distributed so that all nodes carry the same load. If I do SELECT * on this table, I need to get a piece of the data from each node, so it will be slow. SELECT * in a tabular-oriented database is not what we want to do: it will be slow in the best case, and since this kind of database is designed to store a lot of data, eventually you will run out of memory on your side because too much data will come your way.

So the data is distributed among the nodes, and it is distributed based on a key; this is why the key is super important. That key, in the case of Cassandra, is called the partition key; in the case of HBase it's called the row key; depending on what you use, it has a different name. The data is partitioned, or sharded, based on the value of that key. If you look here, all the rows that have the value USA in the first column, which is my key, will be stored in the same place: you store together what you want to retrieve together. The big names in this flavor are Cassandra, Bigtable, Keyspaces, and HBase. This is really what you should remember from this session: if someone asks "what about HBase?", HBase is tabular-oriented, so it stores tables and I need to be careful with the key. As for which is better, Cassandra or HBase, that's about the internals, the execution layer and the storage layer, and it will depend on what you want to do.

So what kind of queries do you want to run here? You always, always need to provide the key in your WHERE clause, because if you provide the key, the database knows which node to go to and retrieves all the data from a single node, without using the network at all, and this is where it is fastest. There are no joins: you do not want to take one row and join it with another, because the data would be spread across the cluster. Instead of joins, we do what we call denormalization. If you have users and groups, which would be a many-to-many relationship in a relational database, you will now have two tables: one table called users_by_group, where you store all the users of the same group in one partition, on one node; and another table, groups_by_user, where, if you provide the username, you get multiple rows, each row being the name of a group. And now you say: but then the same data is replicated in two tables? Yes, folks, this is how it works. This is the trade-off: to be as fast as possible with very high volumes of data and stay performant, you replicate the data so that it is stored in a way that optimizes the queries.
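As a rough illustration of the denormalization pattern just described, here is a hypothetical users/groups example in CQL (these tables are not part of the workshop repo; the column names are made up):

```cql
-- Hypothetical example: the same relationship written to two tables,
-- each partitioned for the query it has to answer.
CREATE TABLE IF NOT EXISTS users_by_group (
  group_name text,
  username   text,
  email      text,
  PRIMARY KEY ((group_name), username)
);

CREATE TABLE IF NOT EXISTS groups_by_user (
  username   text,
  group_name text,
  PRIMARY KEY ((username), group_name)
);

-- "All users of a group": one partition, served by its replicas, no join.
SELECT username, email FROM users_by_group WHERE group_name = 'admins';

-- "All groups of a user": again a single-partition read on the other table.
SELECT group_name FROM groups_by_user WHERE username = 'alice';
```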
So what are the use cases for this flavor? First, you have seen there are multiple nodes, and it's mostly focused on AP, availability. If you need more capacity, add new nodes; more throughput, add new nodes. These systems are for high throughput, high volume, heavy reads and writes, and I'll give you just a subset of use cases: event streaming, log analytics, Internet of Things, time series. Second, availability: the data is replicated, so you can lose any node and it's not a big deal; the system stays available, always on. Most use cases work here; I list caching, pricing, market data, inventory, but again that's just a subset: anything that wants availability first, at the cost of consistency.

It is distributed: in my diagram I drew a single ring, a single set of nodes, but you can easily have multiple rings, multiple small clusters, like multiple wheels inside the same big cluster. That's pretty cool, because now you can have the data close to your customers, with nodes near them, and reduce latency. You can have nodes installed in Germany if your users are in Germany, because the law may require the data to stay in the same country. You can have one ring in the cloud and another ring on premises; totally possible. By the way, all the slides are also in the GitHub repo. And it's cloud-native: it focuses on scaling out, not scaling up, adding new nodes, which fits the deployment model of Kubernetes very well.

Could you real quick go back to the slide that has the ring? There are a couple of questions I want to address for everyone; I'll be quick and won't dive too deep. One of the questions was about having multiple rows with the same key. I know Stefano answered this in text, but the key we're talking about here is not the primary key; it does not indicate uniqueness, it simply partitions the data into segments that we can spread around the cluster. So yes, we will have multiple rows with the same key, and that tells us all of those rows are related and will be stored together. Someone else, Atul, asked: do we need a unique key on each node? If you look at the bottom left, we have a node with multiple keys in it, Australia and India in the same node: each node can have multiple partitions within it. And to the question of how we keep a particular key from overflowing the capacity of a node: that is something you need to keep in mind when you design the system, to make sure you don't have any large partitions prone to doing that. All of this is a very high-level answer; we go into detail on this whole system in our Introduction to Cassandra workshop, which covers it in much greater depth, so if you're interested, we have some of those pre-recorded on our YouTube channel and we also do live workshops pretty regularly.
Keep an eye out for those if you're interested in learning more about how all of that works. Yeah, quite a lot of questions, that's cool, people are focused, I love it. Okay, keep moving: those were the use cases, and I'm finished with my rambling, so I think it's time for everybody to play a little bit with a tabular database and see the difference between a partition key and a primary key. Definitely.

All right, we will move back to our next hands-on section. If you want to go back to the GitHub repo, I'll post a link directly to this section: we're in section 2, step 2, talking about tabular databases. Real quick: Stefano, I have so many windows up that I can't keep the Discord window open at the same time, so sometimes I miss it; I'm glad Stefano is in there answering questions. Yes, thank you Stefano and Rags for being on top of all the questions. Now, Ryan, you can go and do the hands-on. Sounds good.

By now your database should be created in Astra and it should be Active; hopefully enough time has passed. The first thing we're going to do is go to the Connect tab: on your database there's a button on the far right called Connect, click on that, and at the top you'll see a few options, including a tab for the CQL console. Click on that and it will load a console, automatically log you in to your database, and let you run some commands.

The first command we're going to run is DESCRIBE KEYSPACES. We created our database and we created a keyspace, so we just want to list all the keyspaces that exist in the database. As you can see there are a lot of system-level keyspaces, ones we didn't make; that's just default stuff Astra needs to run. But we also have our nosql1 keyspace right here (I'm going to make this a little bigger). So our keyspace exists, and we can go ahead and use it, which is our next command: USE nosql1. The reason we do that is that we're telling the console we're working in this keyspace, and all the commands that follow will be commands for this keyspace; it's just so we don't keep typing nosql1.table_name everywhere.

Then we're going to create a table. I'll walk through this; if you are familiar with SQL it will look very familiar. CQL is very similar to SQL in a lot of ways, though there are some important differences we'll talk about. We're going to CREATE TABLE IF NOT EXISTS, so if it doesn't already exist we'll create it; we're going to call it videos, that's the name of the table; and then we define our columns and their data types: video_id, email, title, upload, and so on, each with its data type. Then we define our primary key. The primary key is what defines uniqueness, it identifies a unique row, and in this case we're using video_id as our primary key, which is a unique identifier, so that will be unique. But you'll notice it is also wrapped in parentheses, and that is what signifies the partition key.
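The exact CREATE TABLE statement lives in the workshop repo; the sketch below approximates the commands just described (the column list and types are inferred from the discussion, so treat them as illustrative):

```cql
-- List all keyspaces, then scope the session to ours.
DESCRIBE KEYSPACES;
USE nosql1;

-- Approximation of the videos table described above; the authoritative
-- statement is in the workshop repo and may differ in columns/types.
CREATE TABLE IF NOT EXISTS videos (
  video_id uuid,
  email    text,
  title    text,
  upload   timestamp,
  url      text,
  tags     set<text>,
  PRIMARY KEY ((video_id))   -- primary key; the inner parentheses mark the partition key
);
```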
Remember, the partition key is what sorts the data into sections to be saved on different nodes. The primary key and the partition key in this case are the same, but they're not always the same; we'll talk about instances where that's not the case, and in practice it's usually the case that they are not exactly the same. So I'm going to copy this right out of GitHub, paste it in, and create the table. Now if we run DESCRIBE KEYSPACE nosql1, we see the structure of the keyspace: the keyspace settings, including the replication (the replication factor is three) and where it's hosted, and then our table with its structure, all the column names and their data types, plus a little more metadata. So we can see that the videos table has been created.

All right, now we're going to insert some data. Let me pause here: if anyone has questions or needs me to slow down, please let me know. Someone says the text is blurred: we are streaming at 1080p, so make sure you have the quality set high. I can zoom in a little more, but at a certain point it gets too big; check your quality settings. Strengthy asks: are the data types the same as in other DBs? Some are the same, some are different; I don't have an exhaustive list here, but there are some differences for sure.

So we're going to insert some data. I'll go over this quickly; if you're coming from a relational or SQL background it will again look familiar. We INSERT INTO, we name the table we're inserting into, then in parentheses we list all the columns we are adding data to, in a specific order, because the values then map to the columns in that order: video_id gets a generated UUID from uuid(), email gets 'clu@sample.com', and so on. We're just inserting into our videos table, defining the values for each column, and we're going to do that three times. All of them are the same; the one difference I'll note is that one UUID is hard-coded, because we're going to use it later and we don't want a random UUID we would have to copy. So I'm going to copy this and paste it in.

All that looks good. If you are typing instead of copying, feel free to copy, it's totally fine; but if it hangs and seems like it's thinking, remember you are required to use the semicolon, so it's probably just waiting for one. I know a lot of people have run into that in the past.

Now we're going to read these values back with SELECT * FROM videos, which selects all the data from our videos table. I'm going to have to zoom out for this one. As you can see, all the data we entered, the three inserted rows, is there: the green is the video ID, then we have our URL, our email; all the videos, all the data, is there.
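For reference, the inserts just described look roughly like this (illustrative values only; the hard-coded UUID below is a placeholder, not the exact one from the repo):

```cql
-- Two of the three inserts use a generated UUID; one is hard-coded so it
-- can be reused later. All values here are placeholders.
INSERT INTO videos (video_id, email, title, upload, url, tags)
VALUES (uuid(), 'clu@sample.com', 'Intro to NoSQL', toTimestamp(now()),
        'https://example.com/intro-nosql', {'nosql', 'workshop'});

INSERT INTO videos (video_id, email, title, upload, url, tags)
VALUES (11111111-2222-3333-4444-555555555555,   -- placeholder hard-coded UUID
        'clu@sample.com', 'Cassandra basics', toTimestamp(now()),
        'https://example.com/cassandra-basics', {'cassandra'});
```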
Ryan, we got a couple of questions: can you show slowly how to copy and paste some CQL, like right-click copy, right-click paste, without Ctrl+C/Ctrl+V? Some people on Windows don't have that working. Yep: you can highlight the text, right-click and copy; you can also use the copy widget in GitHub, which copies it to your clipboard; and then in the console (where I have just been using Cmd+V) you should be able to right-click and paste, and it will probably auto-execute. It might also pop up a window asking whether you want to allow pasting in the browser; click Allow on that, I know it sometimes pops up. Rohit has a good suggestion: you can do Ctrl+Shift+V (or Cmd+Shift+V) to paste; sometimes plain Ctrl+V doesn't work but Ctrl+Shift+V does. Does the CQL console have any GUI? This is a GUI, what are you talking about! A web-based one is coming: we are actually working on a fork of the Netflix Data Explorer with Ryan and we will make it available in the sample app gallery before the end of this year. Yeah, that's something we've been working on. Lutaro says Ctrl+Shift+V is working, so that's your solution.

Okay, so we just executed SELECT * FROM videos. I think Cedric mentioned that you should never do this, because if you don't provide the partition key, the database does not know which node to look in for the data, so it has to look in every node, which is not very performant. In this case we don't have a lot of data and we only have three nodes, so it worked and it didn't complain, but in production, on large data sets, this is going to be a problem; you should never do it. The way to do it is to always provide the partition key in your WHERE clause: SELECT * FROM videos WHERE video_id (our partition key for this table) equals a specific value. You should always know what your partition key is and provide that value. If I copy this over, we get only the one row that matches the query, and this is much more performant: the database sees the partition key and says, I know which node that's on, goes straight to that node, gets the data, and returns it, without having to look anywhere else because it knows it's not there. That is the convention that should be followed.
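The contrast just described, side by side (the UUID is whatever value you hard-coded in your own insert; the one here is a placeholder):

```cql
-- Full scan: acceptable on three demo rows, an anti-pattern at scale,
-- because every node has to be consulted.
SELECT * FROM videos;

-- Preferred: provide the partition key so the query is routed straight
-- to the replica(s) owning that partition.
SELECT * FROM videos
 WHERE video_id = 11111111-2222-3333-4444-555555555555;  -- placeholder value
```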
All right, so now we're going to go deeper into partitioning; we didn't really do much partitioning last time, but this is going to be more of that. We're going to create a new table (I went over CREATE TABLE already), and in this case the primary key looks different: we have a few different column names. The whole outer set of parentheses is the primary key, which is what defines uniqueness: all of these columns together indicate a unique row, so no two rows should have the same data in all three columns; if they do, the row just gets overwritten. Inside this primary key we have one column with its own parentheses around it, and that is the convention to indicate the partition key: the column we're going to partition by. So the database will say: city Phoenix, that goes in one partition; Memphis, that goes in another; that's how it partitions the data. The other two columns are called clustering columns, and this is how we define uniqueness: there could be many people in Phoenix, and many people in Phoenix with the same last name, but there will only be one person in Phoenix with a given last name and a given email, so all of this together makes a unique row.

But clustering columns do something else as well: they let us order the data on disk. In this case we're using WITH CLUSTERING ORDER BY (last_name ASC, email ASC), which stores the data on disk by last name ascending, then by email ascending. So when we retrieve the data, it comes back in that order; rather than using an ORDER BY clause in the query, which is expensive to process, we write the data in the order we're going to want it, then read it back very simply and it comes back the way we expect. That's a really useful feature of clustering columns, in addition to providing uniqueness. I'm going to copy that and paste it in. I see a couple of questions: Abdul, yes, the column within the inner parentheses is the partition key, that's how we define which one it is; the rest are called clustering columns.

Now we insert again; it's the same deal, so I won't go over it, but I will mention the naming convention: the table is users_by_city, because we're saving users by whatever the partition key is. That convention lets us look at the table name and know exactly what the partition key is supposed to be. I'm going to insert all of that data, and then we'll retrieve it: SELECT * FROM users_by_city WHERE city = 'paris', providing our partition key. This retrieves the rows (two, actually) that have Paris as the city, and again this went to the node that owns that partition (we'll get to replication shortly), which retrieved the data and sent it back without looking anywhere else.

Lutaro asks: in the previous example, the UUID column was indicated as the partition key? Yes, because we needed it to be unique, and a UUID is unique; that first example was closer to how you would normally do it in a relational database, to ease those of you familiar with that into the concept. This second example is more similar to what you would see in production, because it gives you a lot more tools and power for how to store and manage your data.
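A sketch of the users_by_city exercise as described; the exact columns and sample rows come from the workshop repo, so the ones below are approximations with placeholder values:

```cql
-- city is the partition key; last_name and email are clustering columns
-- that both complete the uniqueness of a row and define on-disk ordering.
CREATE TABLE IF NOT EXISTS users_by_city (
  city       text,
  last_name  text,
  first_name text,
  email      text,
  PRIMARY KEY ((city), last_name, email)
) WITH CLUSTERING ORDER BY (last_name ASC, email ASC);

-- Placeholder row; the repo provides its own sample data.
INSERT INTO users_by_city (city, last_name, first_name, email)
VALUES ('paris', 'Doe', 'Alice', 'alice@example.com');

-- Single-partition read: only the replica(s) owning the 'paris' partition
-- are touched, and rows come back already sorted by last_name, then email.
SELECT * FROM users_by_city WHERE city = 'paris';
```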
Tim Larson asks: who is responsible for dispatching the work to the Paris node? Any time you query the database, your query goes to one of the nodes in the cluster, which then becomes the coordinator node. There is no master/slave at all; all the nodes do the same thing, but the node you communicate with is the coordinator. It knows the topology of the database, it communicates with the node that has the data, retrieves it from that node, and passes it along to you. Yes, there is no master; one of the big differences between databases is whether they are master/slave or masterless.

I'll keep looking out for more questions, but with that I will send it... no, please, no more questions, just kidding, please keep them coming. There's a question about the replication factor: Astra DB defaults to three nodes and a replication factor of three, so technically all of our data is stored on every node, simply because the replica count equals the node count. If we extended the cluster to, say, five nodes with the replication factor still at three, our data would be stored on three of the five nodes. Again, that is covered in depth in our Introduction to Cassandra workshop, so if you're interested, come to those. That's great, you are all engaged, and lots of questions are coming really fast.

If we have a hundred nodes, does every node consume CPU power to search data? I guess I don't entirely understand the question, but it brings up a good point: one of the benefits of a distributed system is that you can use a lot of cheaper machines rather than one huge machine with a really expensive CPU. I don't know if that addresses your concern. Next: "I queried using just the last name in the WHERE clause and got an error stating I should use ALLOW FILTERING." You should not use ALLOW FILTERING; you need to provide the partition key, which in this case is city, not last name. ALLOW FILTERING basically forces the database to search every node, which is not what you want; we talk about that in the Intro to Cassandra as well.

If we don't know the partition key, how do we check the partition keys for the data set? One of the paradigms of application development with this denormalized structure is that you should know how your application is going to talk to your data: you start with the application, what the users will be doing, and what queries you will run, and then you design your database around those. So you should always know what the partition key is, at least which column it is, and then you need a strategy for always having a value to provide. For example, if the table is users_by_city, maybe there's a filter based on location data: the user is in Phoenix right now, so you provide the Phoenix partition key, and they can change it later if they need to. There are definitely ways, but you need to really map it out beforehand.
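On the replication-factor question above: Astra manages replication for you, but on open-source Cassandra the replication factor is a keyspace property, set per datacenter (per ring). A minimal sketch with hypothetical datacenter names:

```cql
-- Illustration only (not part of the workshop steps): three replicas in
-- each of two datacenters/rings. Astra DB configures this for you.
ALTER KEYSPACE nosql1
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'dc-europe': 3,
                      'dc-us': 3};
```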
Okay, I was about to type the answer but it would be long, so, great question from Narendra: how do we decide which node the data goes to when we insert something? The value of your partition key is hashed into a big integer, which we call the token, and each node is in charge of a range of tokens. If I hash my value and get 12, and one node is in charge of the range 10 to 20, the request is routed to that node. You can think of it like a modulo function: you have X nodes, you hash the value, take modulo X, and you know where to go. And Cassandra does that for you: you don't have to determine which node owns which data. You just tell it which column is the partition key, and when you insert data, even a completely new city (say you've never had Orlando before), it hashes the value, figures out where to put it, and puts it there; you don't have to worry about it.

Can we define a partition key by combining multiple columns? Yes, you can, and that is why the partition key is in its own parentheses: if you put two columns inside those parentheses, separated by a comma, the two columns together become the partition key. I'm blanking on the word... there are reasons you would do that, we call it bucketing, and we go into detail on it in our Introduction to Cassandra; a lot of this is covered there. Composite key, thank you, my brain sometimes doesn't work: it's a composite key. But we do need to talk about the other types too. Okay, should we move on? We still have three databases to cover. Yes, your screen is back live; I will continue answering questions as they come up, but we do need to move on because we've got more to talk about. Go ahead.
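Two small CQL fragments tied to the answers above: the token() function exposes the hash Cassandra computes from the partition key, and a composite partition key is written as two columns inside the inner parentheses (the second table is hypothetical, not part of the workshop):

```cql
-- Rows with the same city hash to the same token, and therefore live on
-- the same replica nodes (a full scan here is fine on demo data).
SELECT token(city), city, last_name FROM users_by_city;

-- Hypothetical variant with a composite partition key: country and city
-- together decide the partition.
CREATE TABLE IF NOT EXISTS users_by_country_and_city (
  country   text,
  city      text,
  last_name text,
  email     text,
  PRIMARY KEY ((country, city), last_name, email)
);
```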
Okay, so, wrap-up on tabular databases: they store tables, and the big names are Cassandra, HBase, Bigtable, that's the big three. Now let's move to document databases. In a document database you want to store a structured object identified by a key, and the most used format is JSON. If you want to remember it, think of my silly joke: JSON, like Jason Statham; now you will remember that document-oriented DBs are for JSON. The big names are MongoDB, Couchbase, Elasticsearch; those three fall into the category of document-oriented databases. Documents are JSON blobs, JSON structures, nested structures; we are not talking about tables anymore, we are talking about collections. And there is no schema validation: you can insert into a collection one JSON with some attributes A, B, C, and right after that insert another JSON with a totally different structure and totally different names. So it's often called schema-less. That's not 100% accurate; it means there is no validation of the schema when you insert things. A tip for the quiz at the end: are all NoSQL databases schema-less? No, because Cassandra and tabular databases do have a schema, while document-oriented databases don't; so not all NoSQL databases are schema-less.

So now you know what kind of data we're working with. You retrieve data inside a collection either by the key (it's often "get document by id" or "update document by id"), or sometimes with a WHERE-style filter where you provide a path, /attribute1/attribute2, with a value; based on the attribute and the value at each level, you can filter on the root level of the JSON document or on nested levels if you have to.

Now the use cases. What's good is that you save in one place everything related to your object: you can have the client and everything related to the client inside the same JSON, so if you want to export the client context, it's a single object to export. That's pretty good for reads. For writes, you often have to write the full JSON each time; the payload is bigger than a single row, so write queries will be a bit slower. So these databases are pretty good for reads, a bit less so for writes. Those three chose CP, consistency and partition tolerance, so transactions are handled, and as a result the availability is not as good as the previous flavor. Document-oriented databases are famous largely thanks to MongoDB, which works very well with front-end development, because any object in JavaScript (JSON means JavaScript Object Notation) can be saved as JSON directly; you don't have to think about it, you just save it to the DB and retrieve it from the DB. But in the hands-on you will see that you can do the same with other flavors of databases; it's the same document-oriented pattern, and it doesn't always have to be MongoDB. So let's go to hands-on number three.

I just wanted to make a quick mention, because early on I saw a lot of people treating MongoDB as the category itself; it seems to have become the Kleenex of document databases. MongoDB is just one document-oriented database; they are not one and the same. I saw some people commenting as if the two were interchangeable, and it's not quite that. They've done a really good job of marketing, that's for sure. Exactly: document DBs are one flavor of NoSQL, and MongoDB is just one document-oriented database among a bunch. Of course MongoDB is super famous and has a huge footprint, a lot of people know it, but it's good for certain use cases and it has competitors. Definitely.

All right, so let's do our next hands-on using this document-oriented style. We are going to continue using Astra, and we'll be using the Stargate API layer, which allows us to use Cassandra, a tabular database, as if it were a document-oriented database, because the API handles all of the conversion for us; it acts just the same, which makes it very flexible. But first, it's interesting to note that Cassandra actually supports JSON right out of the box: we can INSERT INTO our videos table, specify that we're using JSON, and provide the columns and the values each column should have; if we copy this in, it's accepted just fine. And we can retrieve it: we can SELECT JSON the exact fields we want, title, url, tags, from videos, and this gives us a JSON object for each of our rows. So we can work with JSON directly in Cassandra, which is cool, but it's not quite the same model that most document-oriented DBs use.
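The native JSON support demonstrated above looks roughly like this (placeholder values; the repo has its own sample JSON):

```cql
-- Insert a row expressed as a JSON document; unspecified columns stay null.
INSERT INTO videos JSON '{
  "video_id": "249f2d5a-6db5-4c07-b60e-57b1a394bdd2",
  "email":    "clu@sample.com",
  "title":    "NoSQL flavors",
  "url":      "https://example.com/nosql-flavors",
  "tags":     ["nosql", "json"]
}';

-- Read selected columns back as JSON objects, one per row.
SELECT JSON title, url, tags FROM videos;
```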
But first we have to create an application token. This will allow us to authenticate the tool that we're going to be using so that it can communicate with the database. To do that we're going to go to our dashboard, and up in the top left you'll see the current organization with your email. Drop that down, go to Organization Settings and click on that, and it'll bring you to a new window. Then, on the left side, you'll see Token Management as one of the options — click on that. Next we're going to generate a token, so we select a role; the role we're going to use for now is Database Administrator. This is just a high-level role that allows us to do everything we need to do without worrying about which subset of permissions we'd need — it gives us everything we need. So: Database Administrator, go ahead and generate your token. It will generate your token, and the one we want is the one at the bottom where it says "token" — we can copy that one. Now, if you navigate away from this page you will never be able to see this again; it won't show it to you again. So if you are worried about losing it, download the token details, which will download a CSV, and you can save that for reference later and still be able to access your token. I'm going to open this in a new tab and keep it open so that I can copy it easily, but once you have it either downloaded or copied, go back to your main dashboard. Thanks Kapil, who says amazing sessions so far — glad you like it. Oh nice, thank you — and I can see it's live. We're going to go back to the Connect page for our database, and you'll see that we have a few different APIs we can connect through. We're going to use the Document API, which is the default selection, and scroll down to the section called Launching Swagger UI. This link is unique to you — it's unique to your database — so go ahead and click on it and it will launch Swagger UI, an API tool which will allow us to call these different endpoints and kind of test out the API, so to speak. All right, I know I just went over a few different steps: give me a thumbs up in the chat if you have followed along, you have your authentication token, and you have found the link to Swagger UI and gone there. All right, Arvik says thumbs up — awesome, got a lot of people. I can repeat the part after the token, it's pretty easy: go back to your dashboard, click on your database, go to the Connect tab at the top, click on Document API on the left side, then scroll down to Launching Swagger UI and click the link. I'm glad I pronounced the name correctly — I probably pronounce most everyone's name incorrectly, so it's good that at least I get one out of a thousand. Okay, I'm going to zoom in a little bit on this. So now we're in Swagger UI and we're going to use a few of these endpoints; we'll be jumping around a little bit, so be prepared for that. The first one that we're going to do is create a new empty collection.
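Rather than pasting the token into every Swagger field, in your own scripts you would keep it out of the code. A tiny sketch, assuming the token is exported as an environment variable named ASTRA_DB_APPLICATION_TOKEN and that the Document API reads it from an X-Cassandra-Token header — both names are assumptions here, so verify them against what your own Swagger UI page shows.

    import os

    # Assumed env var; set it with the value copied from Token Management, e.g.
    #   export ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
    token = os.environ["ASTRA_DB_APPLICATION_TOKEN"]
    assert token.startswith("AstraCS:"), "copy the full token, including the AstraCS prefix"

    # Header used in the requests below; check the header name in your Swagger UI.
    headers = {
        "X-Cassandra-Token": token,
        "Content-Type": "application/json",
    }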
Now, Cedric, did you mention collections at all? Yes — collections are really like tables in the tabular world, but there is no schema validation by default, although in MongoDB for instance you can enable schema validation. Oh, I didn't know that. Yes you can, it's possible. So we're going to create a collection because we need a place to store all of our documents, and we're going to use the one called "create a new empty collection in a namespace" — it's the second one here, you can see the description. Click on that and you'll see an example of all the things you need; at the top right you'll see a button called Try it out. For each of these endpoints we're going to open it, click Try it out, and that will allow us to fill in the fields. The first one is our token, because we need to authenticate this communication so the database knows you're good to go — I'm going to paste my token in there, and your token should start with AstraCS, make sure you include that. Then the namespace id: this is our keyspace name, and remember we used nosql1 as our keyspace — namespace is just the convention for document-oriented databases. Yes, exactly, it's the same thing. Then in the body we basically just provide the name for our collection, col1 for collection one — copy that, paste it in, and then there's a big blue Execute button. How to paste? If you're using Linux, try Ctrl+Shift+V, or try right-clicking and pasting. Then we click the big blue Execute button and we should see a response that says success — it's taking a little longer than normal — all right, we have our server response, 201, and 201 means created, so that should be good: we have now created our collection. Next we're going to create a new document, so we go to the "create a new document" section, which is the fourth option, click Try it out, and we need to enter our token. I copied something else over, so I need to re-copy my token — for each one of these you're going to have to copy your token, and that's why you should download the CSV. Where should we click Try it out? The Try it out button is at the top right — it turns into Cancel once clicked, but it should be up in the top right once you open one of these rows. All right, we fill in our token again, we give our namespace id, which is nosql1, we give the collection id, which is the one we just created, col1, and then we provide the document, which is this JSON object. And as you notice, you can have nested objects too — within a field you can have another object, that's totally fine — and I'm going to copy this whole block over. Why do we say Swagger, what is Swagger? Swagger allows us to make calls to the Stargate API just as an application would: these are the same endpoints an application would use to communicate with the database and save and retrieve documents. It's a way to test those endpoints, but we're using it to actually add stuff. So we click the Execute button, and you can see we get the 201, which is a success, and we also receive back a document id. Now, this is actually generated by Stargate, the API.
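The same two Swagger calls can be scripted with Python's requests library. A sketch under assumptions: the base URL shape (database id and region in the hostname), the header name, and the example document fields and email are placeholders, not the exact values from the workshop repo — copy the real paths from your own Swagger UI page.

    import requests

    # Assumed values -- substitute your own database id, region and token.
    BASE = "https://<db-id>-<region>.apps.astra.datastax.com/api/rest"
    HEADERS = {"X-Cassandra-Token": "AstraCS:...", "Content-Type": "application/json"}
    NAMESPACE, COLLECTION = "nosql1", "col1"

    # 1) Create the empty collection (expect HTTP 201, or 409 if it already exists).
    r = requests.post(f"{BASE}/v2/namespaces/{NAMESPACE}/collections",
                      headers=HEADERS, json={"name": COLLECTION})
    print(r.status_code)

    # 2) Create a document; nested objects are fine, and Stargate returns a
    #    generated documentId because we did not supply one ourselves.
    doc = {"firstname": "Cedric", "email": "clondon@sample.com",
           "address": {"city": "Paris", "zip": "75000"}}
    r = requests.post(f"{BASE}/v2/namespaces/{NAMESPACE}/collections/{COLLECTION}",
                      headers=HEADERS, json=doc)
    print(r.status_code, r.json())  # e.g. 201 {'documentId': '...'}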
We did not define the document id in this one — no, we didn't — so it actually generated a document id, and we can use this to retrieve that document. You got an error? Dinesh got a 409 error, conflict. Was that on the create collection step? Then try it again — a conflict really means something already exists with the same name, so maybe you are creating a collection that is already there; maybe try col2. Have you done this workshop before and done the homework? All right, so one of the expected outputs is this document id; obviously it will be a randomly generated value, it's not going to be exactly the same, and that's fine. Another thing we can do is find all documents in a collection. Let's scroll up and find — I think it's the third one — "search documents in a collection". Yep, so we're going to search documents in a collection: click Try it out, copy my token again, fill in the token, fill in the keyspace name and the collection name, col1. This is going to give us a list of all the documents in this collection — we're not specifying a particular document, just "give us everything" — and as you can see it returns the document that we saved previously. If we had more than one it would show us multiple; I think Swagger UI defaults to just one at a time, though — page-size is how you return more than one, but Swagger UI limits it for performance's sake. So the expected output is the object that we created. Now, that's not as useful — we want to get a specific document, we want to retrieve the specific data we have already saved. So we're going to provide the document id and use a different endpoint called "get a document", a little further down. Click Try it out, paste my token, give it the namespace (the keyspace), give it the collection name, and now we need the document id. We were given the document id previously — I'll have to open that up; in the "create a new document" result it gave us the document id, right here. So I copy it and provide it in this field, the document id, and when I execute I should get the exact document that I'm looking for — and there we go, it's all right here. So this is the endpoint that you would use, and you can see the endpoint right here: the whole path is /v2/namespaces, then the namespace (keyspace) id, which would be nosql1, then /collections/ and our collection id col1, then slash the document id. When you hit that endpoint you receive back this data — that's how the API works, it's kind of an overview. Can the REST APIs be used directly from a web-based application? Yes, you can — you just have to provide the authentication token to make sure it's authenticated. All right, we can also search for documents without the id by providing a where clause. In this case we're going to use "search documents in a collection" again, so let me close some of these and reopen the search documents endpoint. We don't have a document id this time; we're going to provide a where clause, and in the where field we're going to paste this little object right here. I'm going to sneeze real quick — maybe not. We're basically saying: give me the document where email is equal to — this operator means equals — clondon@sample.com, so whatever documents have the email equal to clondon@sample.com will be retrieved.
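And the read side of the same flow — get one document by its id, then search with a where clause — as another requests sketch. BASE, HEADERS, the document id and the email value are placeholders/assumptions; the {"$eq": ...} object is the equality operator shown in the Swagger example.

    import json
    import requests

    BASE = "https://<db-id>-<region>.apps.astra.datastax.com/api/rest"  # assumed shape
    HEADERS = {"X-Cassandra-Token": "AstraCS:..."}
    NS, COL = "nosql1", "col1"

    # Get one specific document by the id Stargate returned when it was created.
    doc_id = "paste-the-documentId-here"
    r = requests.get(f"{BASE}/v2/namespaces/{NS}/collections/{COL}/{doc_id}",
                     headers=HEADERS)
    print(r.json())

    # Search without an id: the where clause is a JSON object passed as a
    # query parameter, and $eq means "equals".
    where = {"email": {"$eq": "clondon@sample.com"}}  # example value
    r = requests.get(f"{BASE}/v2/namespaces/{NS}/collections/{COL}",
                     headers=HEADERS,
                     params={"where": json.dumps(where), "page-size": 5})
    print(r.json())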
In fact, who did these exercises in the first place? Pretty hard to know, right — yeah, wasn't me. So you can see we got a success and we got our document back. The homework task looks similar to the current lab? Yes it is — it's the same, Kevin: the homework is doing the same steps as today and sending a screenshot to prove that you have done it, and you get the badge. Dummy says it's going very fast — yeah, I'm going a little fast, we are running out of time, but it's fine; if there's something people want me to do over again, I totally can. Yes, and we want to keep some time for the games, so Ryan, take the time you need, and for key-value and graph DB we will skip the hands-on, leave it as homework, and just explain the notions. Yeah, these are probably the two more popular ones anyway, so I'm glad we got this one done. Omkar asks about the UUID, the document id: this document id is generated by the Stargate API because we didn't provide one for it — which we can do — so it generates that unique identifier and sends it back to us so we know what the document id is, and then we can use it to retrieve the document. However, you can provide a document id yourself if you want something a little less jumbled, and then it will save that as the document id and you can use it to retrieve the document. "Ryan, please scroll up, I need to check the values in the last call" — what's that? No, just scroll up — yes, scroll up just to see what's on screen, it seems like people are a bit lost. Okay, I'll scroll up. Well, the repo is available to everyone, you can access it yourself — oh, "I need to check the values in the last call", gotcha, I gotcha. Oops, sometimes I get lost where my mouse is. So you're going to have your token, then your keyspace, then your collection name, which should be col1, and then the where clause is this one — I just copied that out of the repo, right here in 3g, and pasted it in. Rohit: the homework is in the repo; it's basically doing all the steps that we're doing now, plus there's a key-value one and then the graph one — I think the graph is optional, though, because it's a little bit bigger. All right, Omkar says looking forward to graph and the other DB types too, please try the hands-on — okay, okay, I will do my best to show you the graph then, we might have time for that; the key-value one is really simple, and when we talk about it you'll realize it's stupid simple. Any other questions? I think we are good — and thanks for the kind words. Okay, I'm going to send it back to you; I will address any more questions that come through, but you are now back live. Okay, tell me when I'm live — and boom, you are live. All right, so now let's move to flavor of DB number three, called key-value. Again, if you look at the top right-hand corner, it sums up what the DB is, and it's really, really simple: you only work with keys and you update values based on those keys, so the only operations available in a key-value database are get, put, delete and update — there is no where clause, no high-level query language.
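Since the key-value hands-on is left as homework, here is what "only get, put, delete, update" looks like in practice. This sketch uses Redis via the redis-py client purely as one representative of the category — it is not the workshop's own hands-on — and assumes a Redis server running on localhost.

    import redis

    # Assumes a local Redis server (e.g. `docker run -p 6379:6379 redis`).
    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    # put: set a value for a key (any insert is effectively an upsert).
    r.set("session:abc123", '{"user": "cedric", "cart": []}')

    # get: the only way in is by key -- there is no where clause.
    print(r.get("session:abc123"))

    # update: set the same key again, or use atomic helpers like INCR.
    r.incr("calls:10.0.0.1")       # e.g. counting calls per client IP
    r.expire("calls:10.0.0.1", 1)  # let the counter reset after one second

    # delete: remove the key.
    r.delete("session:abc123")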
This is really basic: one key, one value. The value can be whatever you like — it could be an object, it could be a list — but what's important to understand is that you always work with the key, and you really need to have it. The big names in the key-value world are Redis first — mostly Redis — DynamoDB is also considered a key-value DB, and then any distributed cache: Infinispan, Memcached, Terracotta and so on, you get the idea. The biggest use case by far for this family of databases is the distributed cache: you provide a session id and, boom, you put the value of the session; or if you want to track how many calls someone is doing per second, the IP is the key and you keep updating the value. Because there is no query to analyze, it's the fastest among all the databases both for reads and writes, and most of the time these DBs work mostly in memory and only flush the data to disk when needed, to be as fast as possible — if you do not need to save the data on disk at all, it is the fastest of all. So: distributed cache, that's the idea, you want the data in memory to get it as fast as possible, and also maybe to help a relational database that sits underneath and can be slow — if you put a cache layer in between, you don't have to hit the relational database every time if it's not the first time you access the data. Another use case we put there is data deduplication: say you have multiple values for the same key and you want to deduplicate on the key. What you do is put everything into a single collection or table — bucket, here it is, that's the term I was looking for — because when you put something it will update: any insert is an upsert, so if the data does not exist it will be created, and if it already exists it will be updated. And this is why, as Ryan told you, this is the simplest and as such the fastest flavor, with a limited range of use cases, but probably the fastest among all. The hands-on for this one was again reusing Astra and doing put, get and delete using not the REST API but the GraphQL API — a good occasion for you to try GraphQL — but I will move right away to the graph database, where I will take a little more time, including a small demo if you don't mind, because key-value really is as simple as it sounds. Okay, graph databases. As the name states, your data is a graph: not only do you have the vertices, or nodes, which have properties, you also have another type of object called edges, and the edges also have properties. You want to use a graph database when you have a lot of joins to do and your data set is highly connected, because when you start doing multiple joins in the same relational database query it gets slower and slower as the complexity increases, whereas the graph database language and way of interacting have been designed to stay fast even as you add more and more hops — the edge is a first-class citizen. So how does it work? First you find a subset of either vertices or edges, and to do so you use a language — so-called Gremlin or Cypher, the two big graph languages — and you do something like g.V(): give me all the vertices in my graph matching these conditions.
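That g.V() step can also be run from code rather than a Studio notebook. A sketch with the gremlinpython client, under assumptions: the websocket URL, the traversal source name "g", and the "god"/"name"/"age" labels mirror the demo data but are not guaranteed to match your setup — any Gremlin-compatible endpoint would do.

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    # Assumed endpoint and traversal source name -- adjust to your own setup.
    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    # "Give me all the vertices in my graph matching these conditions."
    gods = g.V().hasLabel("god").has("age", 5000).values("name").toList()
    print(gods)

    conn.close()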
You will probably get a subset — let's say this query retrieves only the customer, okay, that's the first result. Then, using the same language, we can use out(): once you find the customer, you use the language to move to the address, and to find the country of the address it would be an out() step to the country. By navigating the graph from one step to another it's easy to retrieve the data — the demo will show more of that. So, use cases: highly connected data. A social network is the obvious one. Customer 360: you want to know all the interactions of your customer with your system — it could be contracts, it could be logs, it could be connections. Internet of Things: you can have a network of sensors, or an electrical network as well — you have the network, and what happens if you lose connectivity between two nodes, is there still a path from one vertex to another or not? The graph will compute whether a path exists and what the best path would be. Personalization and recommendation — I like this one very much: say your customers read some books and you try to build a recommendation engine for books. The first query you can do is "find all the people that have read the same books as me" — this is a user, this is a book, and the edge is "read" — easy to do, it's a one-step query. Now you can say "find me all the books that this population has read that I have not read", so you are moving from you, to the books you read, to all the people that read the same books, and back to books again — now you have three steps in your graph. Finally you can do a where clause among all these books to find the ones with the best ratings, you get a ranking, and you are presented with a list of books you might like based on the readings of the others — and there is no machine learning there, it's just a pure, basic graph query. Healthcare: you want to find side effects of some drugs. Path finding, security and fraud detection — I told you about that. The big names here are Neo4j, Neptune on AWS, TitanDB, and at DataStax we also have our solution called DataStax Graph — it's not part of Astra DB, so to do hands-on number five you need Docker, and if you just execute the single line docker-compose up -d it will download all the Docker images needed and run them. For the homework it's not mandatory, it's really just optional, but as I've been asked to show it, let's show you the demo of the graph. So if you start Docker you will have the graph database running along with this tool called DataStax Studio — it's a notebook-oriented tool, like Jupyter Notebook or Apache Zeppelin if you have already used those, same logic. If I open one notebook you will find multiple cells: some cells are just documentation in Markdown, and some cells are Gremlin, which is the language to work with the graph. Here, first we need to create the vertex labels, so we create the family of the gods: a god has a name and an age, a demigod has a name and an age, a human the same, a monster the same, a location has a name, a titan has a name. Okay, that's only the vertices; next I create the edges. A father edge goes from demigod to god — it's the arrow that we have seen before.
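To make the book-recommendation example concrete, here is roughly what those three hops look like as a single traversal in gremlinpython. The "person", "book" and "read" labels, the "rating" property, the starting user and the endpoint URL are all hypothetical; the point is only that each extra hop is one more step in the traversal, not another join.

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.traversal import P, Order

    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    recommendations = (
        g.V().has("person", "name", "cedric")     # hypothetical starting user
         .out("read").aggregate("mine")           # step 1: books I have read
         .in_("read")                             # step 2: people who read those books
         .out("read").where(P.without("mine"))    # step 3: their books I have not read
         .dedup()
         .order().by("rating", Order.desc)        # the "where clause": best rated first
         .values("title").limit(5).toList()
    )
    print(recommendations)

    conn.close()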
Okay. Edges can also have properties: "lives", for example — a god lives in a location, and maybe you have a reason why the god lives there; you know, Zeus lives on Mount Olympus because he's part of the twelve greatest gods, something like that. Same for "battled" and "pet". So here we have just designed the schema of the graph. Next we create some data: Saturn is one of them, so I add a vertex with the label titan, providing these properties, and I do that multiple times. To execute a cell, just go there and execute, and if it's okay you get a success. So let me clear that and look at the results. Here, g.V() — find me all the vertices in my graph that have one of these labels — more or less it returns all the vertices, and you say okay, easy, nothing special. But it's a graph, so you can visualize this result as a graph, and now we have our god named Jupiter, who lives here and who has some brothers and a father. And you might say, oh, that's ugly — if you think like me, it's a bit ugly — so let me change that: go here and say, for the god label you should use the name for the display, and for the demigod you should use the name as well. What about now? Now we know that Jupiter lives in this location — let's use the name for location as well, just to give you an idea — so now Jupiter lives in the sky, and this is how it works. So this is Jupiter and this is everything we know about him. Can we do better? Here I only configured the god; we have some demigods here, and I could use the same trick to show proper labels. What's cool with the graph is that navigating it is pretty cheap: I can expand the neighbours and say, oh, this one, boom, please show me more about that, and you can go to one node and say, okay, so that's that one — everything had already been expanded here, but if there are more neighbours I can expand them, and now it seems like everything is displayed. I can also find where that one lives, and now I have a new location in my graph. You can do the hands-on and it will explain how to show the neighbours, how to navigate the graph, and which cool queries you can run — paths and shortest paths — and what's nice with these notebooks is that it's pretty well explained: first you have some text explaining what you can do, and it's here for you for free, so have fun, and we will be on Discord for your questions. A few people have asked how to get to DataStax Studio: it is provided in the Docker container that you can get from the repo, so if you follow the instructions in the repo for the graph DB section — which I think is section five — it will walk you through getting the Docker container and setting it up, and it will launch DataStax Studio from that container. We were just showing it off; it's a very high-level overview of it.
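For readers who skip the Docker demo, this is roughly what those Studio notebook cells do — create a couple of vertices and edges, then "expand the neighbours" by walking out from one of them — expressed in gremlinpython instead of a notebook. The labels, property values, the "reason" edge property and the endpoint are assumptions, and it presumes a graph that allows ad-hoc (development-mode) schema rather than the exact notebook content.

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.graph_traversal import __

    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    # Create a few vertices, the way the notebook cells do one by one.
    saturn = g.addV("titan").property("name", "saturn").property("age", 10000).next()
    jupiter = g.addV("god").property("name", "jupiter").property("age", 5000).next()
    sky = g.addV("location").property("name", "sky").next()

    # Edges, including a property on the edge itself ("reason" is illustrative).
    g.V(jupiter).addE("father").to(__.V(saturn)).iterate()
    g.V(jupiter).addE("lives").to(__.V(sky)).property("reason", "likes the view").iterate()

    # Navigate: walk out over 'lives' to find where Jupiter lives, then expand
    # every neighbour, like clicking "expand" in Studio.
    print(g.V().has("god", "name", "jupiter").out("lives").values("name").toList())
    print(g.V().has("god", "name", "jupiter").both().values("name").toList())

    conn.close()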
Yeah, because we wanted to leave some time to play a game, so this will be my last slide before the games. We have shown you four big flavors of NoSQL databases, and here at the bottom we put back the relational database: relational tried to be the best at everything, OLTP and OLAP, but at some point it reaches its limits — too much volume, too much throughput — and you need to find something else. So if you need more throughput, here I put document, then tabular, and at the end key-value, which is the fastest of all; and if you need something more relational than the relational databases themselves, that's the graph — the graph is what you want in the relationship world. We put graph very high here because what I've shown you is DataStax Graph, which is a distributed graph, so DataStax Graph can do both scalability and relationships — you should have a look. But the hands-on for part five, the graph, is optional to get the badge. So, homework — we've talked a lot about this. If you complete parts one through four, plus the optional part five if you want (that's the Docker one, the graph database), all the info is on the GitHub repo. There's a homework section at the top and it will tell you what you need to do and how to submit it. There's an issue template that we use on GitHub, and we have a few screenshots here that go through how to do it: you create a new issue — one of the links in the homework section will take you straight to the template — and you just fill that out and include the screenshots. One thing to note: when you're doing the document DB part with Swagger UI, your Astra token might be exposed on a lot of those screens, so make sure you don't include the Astra token in your screenshots — sometimes it's a little bit hidden — because we don't want any of those sensitive authentication tokens to be exposed. Just keep that in mind when you submit it. We are also doing the certification voucher: if you are interested, this is for the Cassandra certification. If you want to learn more about Cassandra we have a few tracks on our academy.datastax.com website and you can go through the courses there — I know Kasara is going through that right now and has gone through a couple of those courses — and you can use these vouchers to attempt the certification. The vouchers are normally worth $145 each and you get two attempts per voucher, so take advantage of that; it should be open for the next few minutes, so get on it if you want it. You'll need a few bits of information from your Astra DB dashboard — the database id, that's what you'll need, it's on the Astra dashboard — so make sure you grab that now. Speaking of those courses, these are the courses we provide on academy.datastax.com: there's a developer track, an administrator track, and the K8ssandra track, Kubernetes and Cassandra — all available for free on academy.datastax.com. And we have a hackathon coming up: September 3rd through the 5th we are doing a "build a modern data app" hackathon; you can go to buildamoderndataapp.com to register, and there's $26,000 in prizes, so pretty good — go win some money. We do these workshops weekly — multiple a week, in fact — and you can go to datastax.com/workshops to see all of the upcoming workshops and register for them if you'd like; you can also subscribe to our YouTube channel and you'll be notified when we go live. Exactly, and next week we will run "how to build a React Native mobile application", so this one will be cool for sure — a brand new workshop next week on React Native, really excited, it'll be really fun. If you want to start mobile development: next week, same time. One last plug for our Discord.
Thank you all for asking your questions on YouTube. When the session is over, though, the YouTube chat goes away and we can't answer your questions anymore, so go to our Discord and you can continue to ask questions there and we will answer them throughout the week. It's a great place — I don't remember exactly how many people we have, somewhere around 15,000 or 16,000 — so it is a growing community and it would be great to see you there. All right, that completes our workshop for today, thank you for joining us. We are going to be sticking around for a little bit afterwards because we have a giveaway to do — what do we call it, the DataStax lottery — the three gift cards, so we'll be doing that in just a little bit. I know we are a little over our two-hour session, so for those of you who need to go, thank you so much for joining us, we hope you learned a lot, we really appreciate all the questions you asked, and hopefully we were able to answer them to your satisfaction. Yes, thank you very much everybody for attending, hope you enjoyed it. We are here at the same time every week with a new topic — we try to do a new topic each time — and consider doing the homework: you will get a badge that you can use on LinkedIn, pretty cool, so start collecting them all.
Info
Channel: DataStax Developers
Views: 4,181
Rating: 4.982533 out of 5
Id: vkSqkLPm5aM
Length: 114min 45sec (6885 seconds)
Published: Wed Aug 18 2021