Google BigQuery and dbt | Build SQL Data Pipeline [Hands-on Lab]

Video Statistics and Information

Captions
I'm Tariq from SkillCurb, and in this video you'll learn how to use the data build tool (dbt) with Google Cloud BigQuery to transform data. dbt is a CLI tool for running transformation pipelines, testing, and documentation in your data warehouse. Each stage of your transformation pipeline is called a model, and it is defined in a SQL file. dbt can help you tame massive data and simplify building complex data pipelines. It lets us build ELT pipelines, which are a bit different from classic ETL pipelines: in this approach we load data into the warehouse first and transform it later. Because the data is stored in the warehouse before it is transformed, we don't lose any information from the raw data during the transformation. BigQuery is Google's serverless data warehouse offering, and one of the easiest and most user-friendly data warehousing platforms out there.

The first step of an ELT pipeline is extract and load: you have to load the data into BigQuery before you can do the transformations using dbt. For this lab we'll use a public dataset from Google Cloud Storage to extract data and load it into BigQuery, then apply transformations on it using dbt. The transformed data can then be analyzed using several analytical tools. Here is the outline of the lab: first, log in to your dbt and Google Cloud accounts; then generate a service account in Google Cloud; then create a new dbt project and connect it to the Google Cloud project using the previously created service account; then create a new BigQuery dataset, run a SQL query on a public dataset using dbt, and finally view the transformation on the BigQuery console.

First, go to getdbt.com and sign in if you already have an account, or create a free account. After signing in, go to Account Settings and select the Developer plan from the billing options, as this plan is free. Now go to console.cloud.google.com and select your project. Go to APIs and Services from the navigation bar and select Credentials. Click on Create Credentials and select Service Account. Here we have to give it a name and select a role for it; I'll name it dbt query demo and then click on Create and Continue. Now select the Owner role for this project, click on Continue, and then Done. It will create a new service account. Click on the name of the new service account, go to Keys, click on Add Key, and create a new JSON key. Once you click on Create, a new file will be downloaded to your system; this is the key to your Google Cloud service account in JSON format. Now go to Library in APIs and Services and enable the BigQuery API (I already have it enabled).

The next step is to create a new dbt project. Go to Account Settings and create a new project, then click on Begin. I'll name it dbt cloud demo for bigquery and click on Continue. Select the BigQuery data warehouse, rename the database connection, then click on Upload a Service Account JSON File and upload the file you created; it will automatically fill out most of the fields. Then go ahead and rename your dataset (I'll name it dbt demo query) and test the connection. Once the connection is successfully tested, click on Continue.

The next step is to connect your GitHub account. Click on GitHub Login and authorize. Now you need to create a new repository; I'll create one with the default settings, name it dbt demo1, keep it public, and click on Create. Once it has been created, the next step is to connect to it: go to the dbt profile settings, then Integrations, and click on Configure Integration. Here you can allow read and write permissions, and select one or all of your repositories. I'll select the one I've just created and then click on Install. Go back to your project page and click on Start Developing. Now click on Initialize Your Project to initialize the empty project; it automatically creates all the sample files needed for this project. Click on Commit to commit the project changes to GitHub. You can verify this from your GitHub account; if we go there, we can see the first commit.

Now go to the BigQuery console in your Google Cloud account and create a new dataset in your project; I'll name it dbt demo big query. Go back to the dbt project and create a new branch so that we can modify, add, or delete files. We have two model files and a schema file in the models/example folder. A model file is basically a SQL script. The first model file contains the definition of a dbt transformation; it will materialize the transformation as a table. The second file contains an example view definition which refers to the previous model. The schema file provides additional information on how to prepare the models. Run the dbt run command: the first model is created as a table, while the second is created as a view. Let's go and find them on BigQuery.

Now go back to dbt and open the dbt_project.yml file. We'll rename the project and write the same name under the models field at the end of the file; make sure to write the exact same name in both of these places. If you also change the materialized field to table instead of view, both models will be created as tables. Let's run the dbt run command again and observe the results; we can see the two new tables on the BigQuery console.

Next, let's use a public dataset available in Google Cloud Storage. We'll be using the Medicare dataset, which contains information about inpatient and outpatient charges. We will write and run a query on this dataset. The query we're writing is a very basic example of a transformation: we extract a few columns from the table and rename them, apply SUM over some of the columns, and finally group the results. Now let's run the query; a new table is formed as a result. Let's save the result table inside our current project dataset; we'll name it new table export.

We'll now apply the same query using dbt. To do so, let's make a new SQL file in the models directory. Using dbt, we can easily extract data from any dataset present in BigQuery and then apply one or more transformations on it. We can see the new table formed upon executing the dbt run command, and we can preview the result table in the dbt console. We can see the result on the BigQuery side as well; this table is materialized as a view. We can change this view into a table very easily with dbt: we just have to write a configuration statement at the top of the SQL file, changing the value of materialized from view to table. Execute the dbt run command again and observe how it has created a table. This is how easily you can apply a transformation to any dataset in BigQuery using dbt.

That will be all for now, but here is how you can delete a project from your dbt account: go to Account Settings, select your project, click on Edit, and then click Delete. Also don't forget to delete the BigQuery dataset you created, as it is a billable resource. I hope this lab was useful for you. Please leave us a comment to share your experience of using this tool. Thanks for watching; I'll see you next time.
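The two starter models that dbt generates when you initialize a project look roughly like the following (a sketch based on dbt's standard example files; the exact contents in your project may differ slightly). The first model materializes a small CTE as a table, and the second references it with `ref()` and builds as a view by default:

```sql
-- models/example/my_first_dbt_model.sql
-- The config() block overrides the project-level default for this one model.
{{ config(materialized='table') }}

with source_data as (
    select 1 as id
    union all
    select null as id
)

select * from source_data
```

```sql
-- models/example/my_second_dbt_model.sql
-- ref() resolves to the first model's table and records the dependency,
-- so dbt always builds my_first_dbt_model before this view.
select *
from {{ ref('my_first_dbt_model') }}
where id = 1
```

This is why `dbt run` produces one table and one view on the first run: the first file carries an explicit table config, while the second falls back to the project default of view.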
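The dbt_project.yml change described above (renaming the project and switching the default materialization) can be sketched as follows. The project name here is illustrative, not the one used in the video; the key point is that the key under `models:` must exactly match the top-level `name:` field:

```yaml
# dbt_project.yml (abridged sketch; project name is a placeholder)
name: 'dbt_cloud_demo'        # renamed project
version: '1.0.0'
config-version: 2
profile: 'default'

models:
  dbt_cloud_demo:             # must be the exact same name as above
    example:
      +materialized: table    # default was view; now both example models build as tables
```

Model-level `{{ config(...) }}` blocks still take precedence over this project-level setting, so a model can opt back into being a view individually.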
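The Medicare transformation described above can be sketched as a dbt model. The video doesn't show the exact table or columns, so this assumes BigQuery's public `cms_medicare` inpatient charges table and representative column names; with the config statement at the top, `dbt run` materializes the result as a table rather than a view:

```sql
-- models/medicare_charges.sql (illustrative; table and column names are assumptions)
{{ config(materialized='table') }}

select
  provider_id,
  provider_name  as hospital,
  provider_state as state,
  -- aggregate a few numeric columns, then group by the renamed dimensions
  sum(total_discharges)          as total_discharges,
  sum(average_medicare_payments) as medicare_payments
from `bigquery-public-data.cms_medicare.inpatient_charges_2014`
group by provider_id, hospital, state
```

Dropping the config line (or setting `materialized='view'`) reproduces the view-first behavior seen in the lab before the change.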
Info
Channel: SkillCurb
Views: 8,092
Keywords: etl, data engineer projects, data engineer tutorials, how to load data in BigQuery, loading data into BigQuery, analyze data in BigQuery, create table in BigQuery, create dataset in BigQuery, data pipelines, SQL, how to run queries in dbt, bigquery and dbt, data pipeline design, data pipeline tutorial, dbt github, dbt tutorial, create dbt project, how to install dbt, data build tool dbt, intro to dbt, data transformation, SQL Tutorial, elt, dbt, what is dbt, dbt hands on lab
Id: _C_pYeuF6_s
Length: 9min 35sec (575 seconds)
Published: Sun Aug 28 2022