Data Modeling with DBT - Step-by-Step Tutorial for Handling 440K Records using Docker and Postgres

Video Statistics and Information

Captions
Welcome back, data leaders and Python learners. This is a beginner's guide to DBT: data modeling made easy. The objective of data modeling is to understand the data and then model it so that the end-user requirements are met; that is the purpose of data modeling, and DBT is a tool that helps you do it. If you go to docs.getdbt.com you will find a lot of information about DBT. On the home page you can see that raw data is taken by DBT — the data build tool — built into a proper data set, and handed to the next step: BI, ML, and operational analytics applications. That raw data lives inside a database server or warehouse, and the result goes to another database server or warehouse where those applications run; this is the extract, transform, load process. DBT can interface effectively with all the usual platforms: Snowflake, Redshift, BigQuery, Databricks. However, we are not going to use any of them, and you might wonder why. We are data leaders: we have to understand how the data works at the entry level, and how the applications that run the data activities — modeling, transformation, loading, extraction — work underneath. When you use one of those managed platforms, much of this is automated for you. You never get to see how the database server is set up, and if something changes — say you want to move from Snowflake to Redshift, or from Redshift to BigQuery — you may not know how that gets done, or what challenges you would face; there can be an enormous number of them. If you understand the basics of how these things work at the database level, it will help you manage data in a very efficient fashion.

So, back to the presentation. To become efficient data leaders we need to, first, get our hands on a DB server; second, get the data, and then get it into the server; third, connect the database server with DBT; and finally, start using DBT. These are challenging steps to take just to learn data modeling — is it worth it? If you give me one hour of your time, I can show you not just data modeling: I can show you how to use Docker to get a Postgres database server in your local environment, work a bit with the Docker ecosystem, push 400K-plus records into the dockerized database you set up, then install and configure DBT and do data modeling. All of this can be understood in the next hour, so it's worth a try, right? Docker is going to be our hero. We are going to head to the terminal.
After this video I'll be concentrating on Python data API libraries that provide data from various other data sources. If you like this video, do share it and leave a like, and subscribe to my channel so you get a notification whenever I upload new videos.

Now let's go to the terminal. The first thing to execute is the docker version command. You'll notice my terminal is split — it's a terminal multiplexer, which makes it easier for me to show you both the code and the database: the database will be at the bottom, the code at the top. Running docker version shows that my Docker version is 23.0.1. If you don't get a reply like that, you need to install Docker: go to docker.com and install Docker for your OS. I won't go deeper into that here — it's easy to get up and running quickly on either Linux or Windows.

Next we go to hub.docker.com, the place we'll use to get the Postgres image. The Postgres DB server image is the application software, and you can get it right from the Docker Hub home page: click the link and it takes you to the image page, or search for it if it isn't shown. This is the official Docker image for Postgres. It is safe and has been checked thoroughly — over a billion downloads have happened — so you don't need to worry about the authenticity of this application.

So far I have introduced DBT, how to install Docker, and which image we are going to use. Back to the presentation for a second: the first step is pulling the Postgres image. In the terminal, type docker pull postgres:latest and press Enter; this pulls the Postgres database application image to your local drive. After that, type docker image ls and press Enter, and you will see postgres:latest downloaded to your local machine — 377 MB, along with the created date and so on. That is how you actually get your Postgres application, as shown below.
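A quick recap of those commands, exactly as run in the video:

```sh
docker version                # confirm Docker is installed (23.0.1 in the video)
docker pull postgres:latest   # pull the official Postgres image to the local drive
docker image ls               # verify it: postgres:latest, ~377 MB
```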
Note that we have not started the database server yet — we've only got the application image. To get the server, we use a lengthy docker run command. Don't get intimidated by it; I will explain everything one by one, and once I have, it won't be intimidating. Let me also add one more option: --name tutorial-db. Some background: Docker is written in the Go programming language and can interface with the hardware very effectively, which makes it a very versatile application. If I run docker run --help, it lists all the switches — the -d, -p, -e, -v flags I'm using here, with explanations such as detach, publish ports, environment list, and volume — so you can look up the meaning of each one. I will put the full command in the description too; you can just copy-paste it into your terminal or command prompt and it will work, without worrying about what happens underneath. But let's understand it anyway.

First and foremost, docker run starts a container from the image we downloaded. For the image to work properly we give it a port mapping, 5432 mapped to 5432 — I am telling the Postgres application to interface with port 5432; keep that in mind. -d means detached: the prompt I'm typing in will not be held by the application; the container detaches from this prompt. -e sets the environment list: the password is "pass" and the Postgres user is "postgres" — that is the username and password. -v means volume, and this one is important: the path on the left side of the colon is on your local drive, and the path on the right side is inside the Docker container — the application's data location. If I look at my home directory, there is a folder called postgres_target; that is my local folder, but its contents actually live inside the container's data directory. Keep this in mind and don't give the two sides in the opposite order — it will not work and will error out.

In my case I have already executed this (it told me tutorial-db already exists), so I don't need to run it again. After you execute it, run docker ps -a and you will see the container: for me it was created an hour ago and its status is Up. It has to be Up, not Exited — ensure that it is up, and that the name is tutorial-db.
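The full command, reconstructed from the flags described above. The 5432:5432 mapping matches the port used later in profiles.yml, and the container-side data path is the Postgres image's documented default — the video doesn't spell it out, so treat it as an assumption:

```sh
docker run -d \
  -p 5432:5432 \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=pass \
  -v ~/postgres_target:/var/lib/postgresql/data \
  --name tutorial-db \
  postgres:latest

docker ps -a    # status should read "Up", name "tutorial-db"
```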
Back to the presentation: we now have access to our database server — or do we? You might be thinking, where is the database server? We actually did something with the Docker image, but we have not got access to the server yet. That is what we'll do next. In the terminal we use docker exec -it — the -it stands for interactive TTY; these are Linux terminal terminologies, so just follow along for now. tutorial-db is the container instance, and psql is the command-line interface for the Postgres server — this is where we get the access I'm talking about. I pass -U postgres, the same username we gave in the environment, and -h localhost for the host; specifying the database isn't required since it defaults to postgres. Press Enter, and there you go — this is how you get connected to the database server running inside Docker.

You need a little more Docker here, so follow along. docker inspect tutorial-db prints details about the container, and at the bottom you will find the IP address of the database server. This IP address is important — take note of it. Next, docker logs tutorial-db shows the database logs. In my case it says something went wrong — the database was not shut down properly — along with the rest of the activity log, which we can check if we want. In your case it should simply say "database system is ready to accept connections", nothing else; and once you connect, it will also log that someone has connected. Those are the two commands I wanted to share: logs and inspect.

Now we have the database server and we've seen how Docker works, so back to the presentation to recap: we've got the database server; next we need to get the data into it. We have 400K-plus records ready to push, but before that we need to create the database. Back in the terminal, let me make the prompt bigger. To list the databases inside your database server, use \l. I see four databases in mine, but in yours the test database won't be there yet. To create it, use CREATE DATABASE test; — and you should always finish with a semicolon, or the Postgres server will not treat it as a complete command. For example, if you leave out the semicolon and press Enter, the prompt does not return to = — it stays at -.
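The session commands from this step, gathered in one place:

```sh
docker exec -it tutorial-db psql -U postgres -h localhost   # open psql inside the container
docker inspect tutorial-db   # note the container's IP address (near the bottom of the output)
docker logs tutorial-db      # look for "database system is ready to accept connections"
```

And inside the psql session:

```sql
\l                       -- list databases
CREATE DATABASE test;    -- the trailing semicolon is required
```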
Once I add the semicolon, it says the command is complete — and in my case it errors out, saying database "test" already exists. In your case it will execute and create the test database. And if you have followed every step properly, this database will persist. Let me demonstrate: I exit the database server, then run docker stop tutorial-db — that's the container name — and it stops. If I now try to connect to the same database, I get "error response from daemon: container is not running". After you stop it, always remember that you need to run docker rm tutorial-db, or the next docker run command will not work. After the rm, I start it again with the same docker run command, and now I can reconnect. Let's check the logs first: docker logs tutorial-db says "database system is ready to accept connections", and it also shows when the server was last shut down. Reconnecting and listing the databases, the test database is still there. So whatever you put into this Docker container persists even when the container is stopped. Why? Because of the -v option in that command: I am asking for whatever data goes into the container to be mapped onto my local hard drive. Without it, everything done inside the container would be lost after it stops. The container I'm talking about is tutorial-db — keep this firmly in mind, or you will lose your data and whatever work you've done. It's not a huge deal, though: you can rebuild it very fast, as the session below shows.
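The stop/remove/start cycle from the demo:

```sh
docker stop tutorial-db   # stop the container; the data in ~/postgres_target survives
docker rm tutorial-db     # remove it, or the next docker run will fail on the name
# ...re-run the same docker run command from above...
docker logs tutorial-db   # "database system is ready to accept connections"
```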
Back to the presentation for a moment: we have proper access to the database server, and we understand how to work with Docker a little better. Next, we go to the browser and open this particular GitHub repo, which contains the data and the Postgres data-loader script. Let me open the script — it's easier to explain this way. This script loads the data into the dockerized Postgres database. It unpacks archive.zip — which contains the CSV file — then goes through a cleaning process, prints that cleaning is complete, and then moves the data into the database; one simple command does the heavy lifting. How to do all of this with the pandas library in Python I have covered in a separate video, which I'll link alongside this Postgres/DBT video so you can understand it better — pandas is a separate, very versatile library, and worth a look.

In the script you'll see the username, and the password is 1234 — I will show you how to change the password in your database server in a second. There is the IP — the one I told you to note down from docker inspect — and the database test that we created. This connection string has to be correct for your database configuration; those four values are what matter. If any of them is wrong, the execution fails at exactly this point and errors out, saying it cannot connect.

One thing I forgot to mention: you can download this entire GitHub repo using git clone. If you're new to the GitHub workflow, I have a separate video for that — you can learn it in under an hour — and then clone this repository to your local drive. In the terminal, go to the location where you cloned it; in my case it's in my git folder, under DBT-hands-on-tutorial. This is the code area, so I'll keep the database pane at the bottom. I've shown you how the loading happens, so let's execute the file: postgres_data_loader.py is already inside the DBT-hands-on-tutorial folder. Running python postgres_data_loader.py starts the process — you can see the unpack_archive step, then it reads the CSV, creates the data frame, and loads it. I'm going to stop it here because I've already run this. If it finishes without any errors, you're done. Errors occur if the IP you provided for the Docker container is wrong, if the port is wrong, or if there's a typo in any of those values — as I explained, it will error out.
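A minimal sketch of what the loader does, based on the steps described in the video — the CSV file name, the cleaning steps, and the host IP are assumptions; the real script is in the repo:

```python
# Sketch of postgres_data_loader.py as described; names marked below are assumptions.
import shutil
import pandas as pd
from sqlalchemy import create_engine

shutil.unpack_archive("archive.zip")    # unpack the archive to get the CSV
df = pd.read_csv("hacker_data.csv")     # hypothetical CSV name from the archive

# ...cleaning steps with pandas happen here (covered in the separate video)...
print("cleaning complete")

# user / password / host IP / database must match your container's configuration;
# the IP comes from `docker inspect tutorial-db` (172.17.0.2 is just an example)
engine = create_engine("postgresql://postgres:1234@172.17.0.2:5432/test")
df.to_sql("hacker_data", engine, if_exists="replace", index=False)  # the heavy lifting
```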
If it loads successfully, come back to the terminal and connect to the test database with \c test — it says "You are now connected to database test as user postgres". Inside test you should find the hacker_data table. In my case there are three extra tables that won't be in yours, so let me drop them: DROP TABLE hacker_dfrag; drops the first, and DROP TABLE my_first_dbt_model CASCADE; — something else you need to be aware of — drops the other two as well. Listing with \d now shows one table, and SELECT COUNT(*) FROM hacker_data; — the semicolon is important — shows 448K rows inside the database. You don't need to believe me: swap the count for SELECT * FROM hacker_data LIMIT 5; and you will see the data is there. So we've got the data.

Back to the presentation: we have the database server, and we've got the data into it using Python — I didn't put that on the slide because it would have broken the suspense. Now, how do we connect the database with DBT, the data build tool I already introduced? Back in the browser: as I said, we are not going to use any of the managed warehouses; we are going to use Postgres. If you check the getting-started guide, they haven't even listed Postgres — they only talk about warehouses. Go to the documentation instead; it gives a good overview of how to work with DBT. It starts with "What is dbt", then "Getting started", and you'll see there are two options: dbt Cloud and dbt Core. I prefer dbt Core — as I told you, we are data leaders, we have to understand everything from the basic level, and what is more basic than the command prompt? Nothing, except maybe the kernel level. The dbt Core pages tell you how to get started; you can follow them too, but I'm going to walk through it in this video. Under installation, go to "Install with pip": you can install dbt-postgres along with the dbt tool, and if you scroll down there are also adapters for Redshift, Snowflake, and BigQuery. We will install two packages: dbt-core and dbt-postgres. Copy the command, come to the terminal inside the folder we downloaded, paste it, and press Enter — it installs both. Once installed, run dbt --version: it prints a lot of information about the installation — whether it is up to date, and which plugins are installed. You can see the postgres plugin is installed, version 1.4.4, and up to date. All of that from a single command, and you also know the tool installed properly. The commands are below.
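The install steps:

```sh
pip install dbt-core dbt-postgres   # dbt itself plus the Postgres adapter
dbt --version                       # verify the install; lists the postgres plugin version
```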
So we are ready — now we have to configure DBT. While we're here, let me show you something important; try to follow closely. In the database pane I list the tables: there is only one, hacker_data. At the top I'm in the code area, the folder you got from GitHub. First and foremost, create a folder in your home directory — on Windows, that's under your user profile — called .dbt, and inside it create a file called profiles.yml. That file has to contain your connection data. I have included an example in the repo, profile_yaml_file.txt, with the same contents: the host IP, the port, the user, and so on. Actually — give me a moment — I think there is a mistake in that file; the port looks wrong. Let me recheck against my original... yes, the port should be 5432, so let me edit both occurrences right away and push the fix. Ensure it is correct in your file too. This file has to exist before you can do the next step. Looking at my own profiles.yml: my profile is default, with a target and outputs; the production output has the database name and the database IP; there is another output called pub, where the only difference is the schema — I'll show you how to work with schemas in a couple of minutes; and there is one more profile called test_project, which I'll explain how it got created.

Once profiles.yml is in place, come to the project folder and run dbt debug. You should get green check marks saying everything is OK: all the checks passed, the profile file is OK, git is found — git is important — and the connection to the database is OK, port 5432, IP correct. If any of this is wrong, it will error out, so use a profile along the lines of the sketch below.
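A minimal profiles.yml sketch matching the settings described — the host IP, password, and schema are assumptions based on the repo's profile_yaml_file.txt and the values used earlier:

```yaml
# ~/.dbt/profiles.yml -- sketch; adjust host to your container's IP
default:
  target: production
  outputs:
    production:
      type: postgres
      host: 172.17.0.2   # from `docker inspect tutorial-db` (yours will differ)
      port: 5432
      user: postgres
      password: "1234"
      dbname: test
      schema: public
      threads: 1
    # a second output ("pub" in the video) differs only in its schema
```

With this in place, dbt debug should come back all green.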
Now, remember: the database still has only that one table. In the code area, inside the project folder, I execute dbt run — the command that runs the tool. What this does (back to the DBT home-page diagram for a second) is the whole process: it extracts the data from one table, runs it through the transform step, and creates a new data set. Back in the terminal it has already completed — it did something with the 448K records. Has anything changed in our database? Listing the tables: now there are four, and I didn't create any of them by hand. You saw me execute that one command; that is how DBT works. How much data is in the new hacker_dfrag table? SELECT COUNT(*) FROM hacker_dfrag; — 448K rows in this one as well. And how do the rows look? SELECT * FROM hacker_dfrag LIMIT 5; — this is different from what we saw earlier: the source data had a lot of columns in a different format, while hacker_dfrag has fewer columns plus something called count_group, and the numbers look different. We have just done an ETL activity: the DBT model took the data from the source table, did some transformation, and wrote a new table — and all of it was already coded in the repo.

Now let me go back to the browser and the GitHub repo to explain the folders — it's easier to show there, and then I'll show it in the command prompt. You'll see lots of files and folders; you don't create any of them manually. One command, dbt init plus a project name, generates all of them. Let me show that first. Back in the terminal, go out of the DBT-hands-on-tutorial folder — you should too — and in any folder run dbt init test_dbt_tool and press Enter. The moment you do, it starts the project and begins populating it. First it asks which database you want to connect with; we installed only one adapter, so I answer 1, and it immediately says it's ready. Change into test_dbt_tool and run ls: all those files are populated automatically — and that is how the files you see in the repo got created in the first place.
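The scaffold step (the project name here follows the video; pick your own):

```sh
cd ..                    # step out of the tutorial project first
dbt init test_dbt_tool   # scaffold a new project
# prompt: "Which database would you like to use?" -> 1 (postgres, the only adapter installed)
cd test_dbt_tool
ls                       # models/, dbt_project.yml, and the rest, generated for you
```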
We are going to concentrate on the models folder specifically. When you create a project with dbt init, models contains only an example folder, with my_first_dbt_model, my_second_dbt_model, and schema.yml — the basic models that ship with the scaffold. Open my_first_dbt_model and you'll see it start, rather blindly, with a CTE: with source_data as (select 1 as id union all select null as id) select * from source_data. It does some SQL activity, but to be honest, this example is extremely tough to learn from, so don't dwell on it — I only show it so you don't get confused trying to decode it.

What's better is to come to the hacker_tab folder in the tutorial repo, where there are two more files. Click schema.yml: it contains sources, with the name test, a description, the database test that we created, and the schema public. Schema is very important — in the next video I'll explain how we can use schemas for a kind of data mart and staging/unstaging areas; we can do all of that with schemas. Under it is the table name we created, hacker_data, described as "all hacking records from honeypot data". That is the schema.yml file.

Now let's step into hacker_dfrag.sql. Zooming out a little, you'll see it builds several pieces. It starts with a common table expression — the WITH clause is used for CTEs — selecting from a cryptic-looking source. I hope you know a little SQL: queries start with SELECT, name some columns, and then a table name. But here the syntax is different: {{ source('test', 'hacker_data') }}. What that actually does is go into the hacker_tab folder, find schema.yml, check the sources for one named test, and resolve it to the database test and the table hacker_data — that is what's being referred to. From that source we select five columns, and from the same source we count the number of records per country with a GROUP BY — a grouping activity. In the final step we do a join: the hacker CTE joined to the group_data CTE with a LEFT JOIN, using country as the reference column. The syntax is different here too: instead of writing group_data.country = hacker.country, as you would in a plain SQL script, we write USING (country). This is what is actually called modeling in DBT: we have modeled the table. And when you read it, it is so easy to explain — I can show the code to anybody and explain what is happening. A sketch of the model follows below.
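A sketch of that model — the video doesn't name the five columns, so the ones here are assumptions drawn from the honeypot data set:

```sql
-- models/hacker_tab/hacker_dfrag.sql (sketch; column names are assumptions)
with hacker as (
    -- source() resolves via schema.yml to the test database, hacker_data table
    select src, spt, dpt, country, datetime
    from {{ source('test', 'hacker_data') }}
),

group_data as (
    select country, count(*) as count_group   -- records per country
    from {{ source('test', 'hacker_data') }}
    group by country
)

select *
from hacker
left join group_data using (country)   -- instead of: on group_data.country = hacker.country
```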
Let me zoom back — we've completed the core activity, so back to the terminal. Change back into the DBT-hands-on-tutorial folder; ls shows the same layout I was showing in the GitHub repo a minute ago. Under models/hacker_tab you'll find hacker_dfrag.sql and schema.yml — the same files from the repo, already downloaded here. Now, when I run dbt run, what DBT does is go into this models folder, check the models, and execute them — and it writes the resulting SQL into a folder called target. Let me show you how the target folder looks. If you look for it in the GitHub repo, it isn't there: whenever I push to GitHub, target is always removed, because it can contain sensitive information — database and table names and so on — and on its own, without the models, it doesn't make much sense anyway. So let's check it locally instead. Inside target there are compiled and run folders, and under compiled you'll find both example and hacker_tab, including hacker_dfrag.sql. Open the compiled hacker_dfrag.sql: the database name is spelled out — test.public.hacker_data, the same as your database: test database, public schema, hacker_data table. Where the model had the source() reference, the compiled file selects from the real table. Under run, the same file is wrapped further: it creates the table — create table test.public.hacker_dfrag__dbt_tmp — from that query. When this SQL runs, the table gets created; that is how the entire process works. The reason for showing you all of this is that the DBT tool is basically writing SQL in such a way that it is more readable for us up front, and it can be documented very easily — there are ways to test, document, and do a lot of other data modeling activities with DBT, which will be covered in the next video. I'm planning to use the same data in the follow-up, so if you're following along you can continue the tutorial there. The commands are summarized below.
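The run-and-inspect step (paths abbreviated; the exact tree depends on your project name):

```sh
dbt run                                   # compile and execute everything under models/
ls target/compiled target/run             # two renderings of each model
# target/compiled/.../hacker_dfrag.sql -> source() resolved to test.public.hacker_data
# target/run/.../hacker_dfrag.sql      -> wrapped as:
#   create table test.public.hacker_dfrag__dbt_tmp as ( ... )
```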
Back to the presentation: what have we learned? How to pull the image from Docker Hub; how to work with Docker and get a database server; how to push 400K records into the server using Python; how to configure DBT; and how to start modeling with it — we created a table in our database using DBT. Now, what to do next, and where to go?

The major point: so far I created only one model. Think about it along with me — I took one existing table, split it into pieces, and joined them back into a new table, all in one script. Now assume I have one more model that does something else. Like this, I can have any number of models, and each will create its own table in my database server. Those tables could live in a staging area, or in a raw area — whichever area you want to keep them in. And the way to control which area a table goes to is handled in dbt_project.yml. Open it and at the bottom you'll see entries for example and hacker_tab — the model folders we are using — and per folder you can also set the schema. If I open the project's dbt_project.yml and add +schema: staging under hacker_tab, save it, and run the DBT tool again, everything inside the hacker_tab models will be materialized into the staging schema — no longer into public.

Let me show you which schemas exist: in this database the list of schemas has only public, nothing else. There is also show search_path: what it does is show the path in which tables will be searched, which right now is only the public schema, and set search_path changes it. Basically, a schema is a kind of folder. As I was explaining, a database server is itself a set of folders — not so different from your existing files. "Database server" is the name we give it, but internally it is made of data and program files which hold the data; they just aren't visible to the naked eye the way files are, and the server knows how to query them. We can change the schema with that setting, and I will be doing these things in the next video — see the sketch below.
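The change described, as a sketch — the project key (shown here as test_project) is whatever name: your dbt_project.yml declares:

```yaml
# dbt_project.yml (bottom section)
models:
  test_project:
    example:
      +materialized: view   # the scaffold default
    hacker_tab:
      +schema: staging      # materialize these models into the staging schema, not public
```

On the Postgres side, \dn lists the schemas, and show search_path; / set search_path to staging, public; control where unqualified table names resolve.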
Apart from that, we can also add more folders under models and create new models — for example, a star schema, or even a snowflake schema: we can do that kind of data modeling with multiple models here too. That is the power of DBT. And of all these things, the biggest benefit is this: it documents everything you are doing, and I can share the model file with anybody. Compare that with working directly in the database: suppose I type a SELECT at the psql prompt — picking some columns, say index, src, dpt, spt, and country from hacker_data, with a GROUP BY — and build a table that way; something like the query sketched after this paragraph. How would I share that with anybody? I can't, really: it lives inside the database, because I gave the command directly to the database. What the DBT tool gives us is that the SQL we wrote can easily be shared with others, and it can be documented — there is a way to document all of it. How to document, how to create more models, and how to build more efficient models with DBT will be tackled in the next video.
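The kind of throwaway query being described — column names assumed from the honeypot data set:

```sql
-- typed straight at the psql prompt: it works, but it lives only in this session,
-- so there is nothing to share or document
select src, dpt, spt, country, count(*) as hits
from hacker_data
group by src, dpt, spt, country;
```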
Back to the presentation one last time. In this video we solved all the problems we set out to solve: where to get the DB server, how to get the data into it, how to connect the database with the DBT tool, and how to use DBT for basic data modeling. If you have any questions or comments, leave them in the comment section and I will get back to you as soon as possible. If you liked the content, do leave a like and share it with others. And if someone says they have trouble connecting to a database — Postgres, MySQL, or any other server — the pattern is the same: every database server has a basic username/password setup plus some command-line application. Finding that command-line application takes a bit of internet searching, but the process of connecting from the command prompt is very similar across servers. Do subscribe to my channel, so that whenever I upload a new video you get the notification and stay updated.

The most important thing is that you practice what was shared here — the whole intention was to make it easy for you to get your hands dirty. You can install Docker, pull the Postgres image for the database, get the records (they're already in the GitHub repo), configure DBT after a pip install, and repeat this practice with any data set. You might have to use your pandas skills to clean a different data set, but the process of pushing the data into the database using Python will be the same as the code I shared. This gives you the overall picture of how to work effectively with a database irrespective of its size — whether it's a Snowflake warehouse or a BigQuery warehouse, at the end of the day there will be a way to connect using Python and a way to supply a connection string; all you need is a quick search and you will find it. That is the important point for data leaders: we stay on top of things, and we get things done. That's all, guys — keep practicing, and until the next video: practice, practice. See you.
Info
Channel: Kamalraj M M
Views: 3,619
Keywords: getting started with docker, data modeling, data modelling, python, data modeling concepts in software engineering, docker postgresql, postgresql docker, data pipeline in python, docker for data engineering, data driven science and engineering, knowledge and data engineering, docker, data vault modeling, docker container, ensemble data modelling, modeling, docker online course, how to unit test in sql with dbt, how to learn docker in 2022, docker for data scientists, getting dbt
Id: kUkDJEr93H8
Length: 61min 2sec (3662 seconds)
Published: Fri Mar 10 2023