Data Engineering Project: Quickly Build a Complex Data Pipeline

Video Statistics and Information

Captions
Hello, this is Stephen Johnson, data community advocate here at Shipyard. I know what you're probably thinking: this video is much longer than the videos we normally put out at Shipyard. We thought we'd mix it up and do something a little different today, so in this video we're going to do an end-to-end data project. That means we're starting completely fresh: the data is not in our database and not inside Shipyard. We're going to extract the data using Python scripts, and once that's done we'll load it into BigQuery. Once it's in BigQuery, we'll do some transformations using dbt Core; we're actually going to host the dbt Core run inside Shipyard, which should be pretty cool to see. And once the data is transformed, we'll take it into Looker Studio and do some basic visualizations. This video isn't going to be heavy on visualization; it's heavy on the extraction and transformation parts. But I'm excited to go from A to B to C to D here, because our usual videos just go A to B, so it's fun to go full scope.

If you're watching this and want to know what you need in order to follow along without getting lost: we'll be using some straight Python scripts. I'm not going to spend too much time breaking the Python down; some of the scripts are pre-written, and I'll run through what they're doing as they pull the data from YouTube and Webflow. You'll need a little Python knowledge and a little SQL knowledge as well. I'm not a SQL expert by any stretch of the imagination; I know just enough to be dangerous, and you'll see that in this video. Thankfully there are all kinds of resources out there for learning SQL and Python, and I used some of them to write the code that gets this data transformed into the shape I needed. Once the data is transformed, we'll take it into Looker Studio. Again, we're not going to do any crazy visualizations; we're just going to test that the data got there the way we wanted.

So that's this video: going A to D, from extraction to Looker Studio. I'm excited to show a full process flow inside Shipyard, to demonstrate how you can orchestrate a process like this for your organization. Maybe you use Webflow and YouTube for your videos and your content management system, but even if you don't, this process applies to almost any system out there, as long as you can hit it with a Python API call.

With all that said, let's jump into extracting the data from Webflow and YouTube. Jumping over into our Fleet Builder, we can get started with the extraction phase of this project. But before we actually write any code, I want to show the data we're trying to pull out, just so we all have an idea of what we're trying to accomplish.
Looking at YouTube: if you've been on our YouTube channel (you might be watching this there right now), you know we have solution videos, vendor-to-vendor, A-to-B solutions. There are some dbt ones, a YouTube-and-Whisper one, and so on. Those videos have their views, their comments, their likes, but at Shipyard we haven't yet pulled any of that into a database where we can analyze it. So the first thing we're trying to accomplish in this video is getting this YouTube data into a database, so we can see which videos are doing better, which aren't doing as well, and properly track what we should keep doing in the future.

Then we want to match those up with our website. If you've been to our website, you know we have a Solutions page; let me show what that looks like real quick. I can click on one of these and it shows the A-to-B details, and if a solution has a video, the video shows up on the page. Not all of them have videos; that's something we'll have to take care of later, and inside this video as well. But for the ones that do, we want to pull all the solutions from Webflow. The video URLs are in there, so we can match the video ID from YouTube to the video ID inside Webflow. In the future we could pull in Google Analytics data too, so we'd know which pages are getting views off the YouTube videos, matched up with UTM parameters, but that's outside the scope of this project; we might follow this video with another one tying all of those together. All we're doing right here is pulling the data from YouTube, pulling the data from Webflow, and making sure we can match the solution CMS items to our actual YouTube videos. That's what we'll accomplish by the end of this video.

So let's jump back into Shipyard and show how to do that. We currently don't have Blueprints to pull data from YouTube or Webflow, so we're going to have to use custom Python scripts. Shipyard does have low-code Blueprints, which we'll get into a little later, but we're starting off with some basic Python. The first thing to do is pull the YouTube data, so I'll name this vessel "download data from YouTube" and name the file youtube_download.py. The file name isn't that important; name it whatever you want, as long as you can tell what it is. Again, I'm not going to write all this code fresh for you; I'll talk through it as I put it into the vessel.

Starting off, we import requests, json, and os: requests is so we can make the API calls, json is for some formatting we'll get into later, and os is so we can use environment variables to hide our API keys.
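A minimal sketch of that pattern (the variable name API_KEY matches what the script reads later; the error handling is my addition):

    import os

    # Read the key from an environment variable set on the Shipyard vessel,
    # so the credential never appears in the script itself.
    api_key = os.environ.get("API_KEY")
    if api_key is None:
        raise RuntimeError("API_KEY environment variable is not set")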
Of course you don't have to hide your keys, but it's definitely a pro tip from us at Shipyard, so nobody can log into your account and see the key sitting in your code.

The first thing we want to do is grab our YouTube videos. I'm going to paste in this function, get_channel_videos. It takes the channel ID, and it also takes an API key. You can see what it's doing: it takes the base video URL and Google's API URL, puts the channel ID in there, and grabs the basic information for all the videos, so that we can use those video IDs later on to actually get the views, the likes, the comments, the descriptions, the titles, and so on. You can see we're returning the video links, looping through all the videos to pull them out. That's what that function is doing.

Next I'll paste in this block that sits outside the function, and here's the first thing I really want you to take note of: inside Shipyard, if you're using your own custom Python scripts, you have the ability to use an environment variable to hide an API key, and this is our first chance to see it. Since the script reads API_KEY, I can come down to Environment Variables, add one named API_KEY (because that's what I called it), and paste my credential in. You can see it's hidden; I can't view it. Once I click save, it goes away forever, and you'll never be able to visibly see that API key inside the platform again, which I think is a pretty cool feature. You can see our channel ID here, and we feed the channel ID and API key into get_channel_videos to get all of our channel videos. I'm printing those too, so we'll see that information whenever we run it. Then we grab our video IDs: since we have the URLs, we just take the endpoint of each URL, which is the video ID.

Moving down the code, we have one more function, get_video_info. This is where you actually tell YouTube what information you want for each video. We have the base API URL that we're calling, and then this part with statistics, snippet, contentDetails, and topicDetails, which is how we tell YouTube which parts of the API we want to hit. YouTube has a few more; for example, if you have live streams, you can request live stream information as well. These are just the ones we need for this video. Scrolling down, we make the request call, turn the response into JSON, and return that data from the function. Then we call get_video_info with our video IDs and the API key. All right, so we're ready to go there.
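Here's a condensed sketch of what those two pieces can look like. The function names and the part list (statistics, snippet, contentDetails, topicDetails) come from the walkthrough; the exact endpoints, paging, and field handling are assumptions about a typical YouTube Data API v3 script, not a copy of the script in the video:

    import os
    import requests

    api_key = os.environ.get("API_KEY")
    channel_id = "UC..."  # hypothetical placeholder for the channel ID

    def get_channel_videos(channel_id, api_key):
        """Page through the search endpoint and collect watch URLs for a channel."""
        base_video_url = "https://www.youtube.com/watch?v="
        base_search_url = "https://www.googleapis.com/youtube/v3/search"
        url = (f"{base_search_url}?key={api_key}&channelId={channel_id}"
               "&part=snippet,id&order=date&maxResults=25")
        video_links = []
        while True:
            resp = requests.get(url).json()
            for item in resp.get("items", []):
                if item["id"]["kind"] == "youtube#video":
                    video_links.append(base_video_url + item["id"]["videoId"])
            if "nextPageToken" not in resp:
                break
            url = url.split("&pageToken=")[0] + "&pageToken=" + resp["nextPageToken"]
        return video_links

    def get_video_info(video_id, api_key):
        """Fetch the parts of the video resource mentioned in the video."""
        url = ("https://www.googleapis.com/youtube/v3/videos"
               "?part=statistics,snippet,contentDetails,topicDetails"
               f"&id={video_id}&key={api_key}")
        return requests.get(url).json()

    video_links = get_channel_videos(channel_id, api_key)
    print(video_links)
    # The video ID is just the tail end of each watch URL.
    video_ids = [link.split("=")[-1] for link in video_links]

The search endpoint caps results per page, hence the nextPageToken loop.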
So now that we have a function to get our video information, we have to actually loop through all of it and turn it into a DataFrame, so we can spit it out as a CSV and feed it into BigQuery later on. This bit of code could probably be written better; someone could argue it definitely could be written better, and that's okay. I'm not promising the best code here, just code that gets the job done. You can see what we're doing: I've manually written out all the column names I want, I have my blank list for the video info, and I loop through the videos, following the JSON structure (data, items, the first element, and so on down) to grab all the video information and append it to that list.

Last but not least, we take the data we've collected in this JSON, turn it into a DataFrame, and write it out as a CSV we can download. So we import pandas down here and turn the list of video info into a DataFrame. We also cleaned up a few things: we ran into an issue where the video descriptions have emojis in them (we threw in some emojis to make them a bit more fun to read instead of a wall of text), and unfortunately BigQuery really does not like emojis being in your database, so I decided to just go ahead and drop that column. Then you can see we write the result to youtube_data.csv, which is the file name we'll reference later on down the road to upload this file to BigQuery. That's all we need to do to get the data from YouTube.

Now we just need to install our Python packages. Scrolling back up, we used pandas, requests, json, and os. json and os are system packages, so we don't have to install those; that leaves pandas and requests to install, and that's everything for the YouTube pull.
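A sketch of that flatten-and-export step, reusing the names from the sketch above (the column list is abbreviated; the real script spells out every column by hand):

    import pandas as pd

    # Abbreviated column list; the real script names every column manually.
    columns = ["video_id", "date_published", "video_title", "video_tags",
               "video_has_captions", "video_view_count", "video_like_count",
               "video_comment_count"]

    info_for_videos = []
    for video_id in video_ids:
        data = get_video_info(video_id, api_key)
        item = data["items"][0]  # one item per single-ID request
        info_for_videos.append([
            item["id"],
            item["snippet"]["publishedAt"],
            item["snippet"]["title"],
            item["snippet"].get("tags"),
            item["contentDetails"]["caption"],
            item["statistics"].get("viewCount"),
            item["statistics"].get("likeCount"),
            item["statistics"].get("commentCount"),
        ])

    df = pd.DataFrame(info_for_videos, columns=columns)
    # The description column is left out entirely: BigQuery choked on the
    # emojis in it, so the original script dropped it before export.
    df.to_csv("youtube_data.csv")  # pandas writes its index too; dbt drops it later
    print("Data extracted to youtube_data.csv")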
Now we need to grab the data from Webflow. Webflow has great documentation on how to use their API, so thankfully this is more of a copy-and-paste job from Webflow's website. So again, another custom Python script: we'll name this vessel "download solution data from Webflow" (if I can spell "solution") and name the file webflow_download.py. There we go; naming conventions handled.

Running through this code, it's going to be a much easier read than the last one. We import requests and os, so I'll go ahead and install requests. And another note about Shipyard, kind of a pro tip, before we continue: if you have these scripts inside GitHub with a requirements.txt, we'll just install those packages for you; you don't have to do it manually.

You can see we also have the API key hidden, this time as WEBFLOW_API, so I'll add that environment variable as well, grab the API key, and copy-paste it in. Good; it'll feed that into the code for us. If you remember, we have 503 solutions in Webflow; I'm feeding in the vast majority of the fields here and kept a few out because I knew they didn't fit. So we call the API with our API key, it returns our data as a response, I turn that into a dictionary, and I loop through the dictionary to put everything into a list of solutions; you can see we made the empty solutions list for that. The next thing we do is name our keys. Again, this is a bit of manual code that could probably be done a little better, but that's okay: I went through and named these by hand because I knew how they were going to come in. Then we want to turn this thing from a list into a CSV, so I paste in the last bit and import csv; I don't think we have to install csv, I think it's a built-in module, and it is, great. Our file name is solutions.csv, we open it, write the rows into the CSV, and print that the file was written successfully.

So we have our two file names, solutions.csv and youtube_data.csv, and at this point our data has been extracted.
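Put together, the Webflow pull can look roughly like this. The WEBFLOW_API variable name and the solutions.csv output come from the walkthrough; the endpoint, headers, collection ID, and the (abbreviated) key list are my assumptions about Webflow's CMS items API, not the exact script:

    import csv
    import os
    import requests

    webflow_api = os.environ.get("WEBFLOW_API")
    collection_id = "..."  # hypothetical placeholder for the Solutions collection

    # Webflow's CMS API returns collection items as JSON. With 503 solutions
    # you would also need offset pagination; that's omitted here.
    response = requests.get(
        f"https://api.webflow.com/collections/{collection_id}/items",
        headers={"Authorization": f"Bearer {webflow_api}",
                 "accept-version": "1.0.0"},
    )
    data = response.json()

    # Keys named by hand because we know how the fields come in;
    # this list is abbreviated relative to the real script.
    keys = ["name", "description", "synopsis", "slug", "youtube-video", "_id", "_cid"]

    solutions = []
    for item in data["items"]:
        solutions.append([item.get(k) for k in keys])

    with open("solutions.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(keys)
        writer.writerows(solutions)
    print("solutions.csv written successfully")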
Now we're jumping into uploading this data into BigQuery. Those two Python scripts extract the data and save it as CSVs; thankfully for us, Shipyard has low-code Blueprints that will handle the load. I just have to search for our BigQuery Blueprints, and you can see we have "Upload File to Table" from Shipyard. I'll click that and fill in the authorization details. We're going to put this in our dbt_core_solutions dataset; for the Webflow data the table name is raw_solutions, and the file name is solutions.csv, what we named it. We'll choose to overwrite the data, because we're just doing a full pull every time, and then we can import our BigQuery credentials from Shipyard's credential management. Good to go; we just need to connect the vessels together, so that if the Webflow download finishes successfully, it then uploads the file into BigQuery. And we do exactly the same thing for the YouTube data: dbt_core_solutions dataset, table name raw_youtube_data, file youtube_data.csv, overwrite the data, and again the service account from our credential management system, connected together.

This section was much, much shorter than the last one, right? That's the joy of using Shipyard: if there's a solution for you in our low-code Blueprints, it makes getting this stuff done much more quickly. Otherwise we would have needed to write more Python to get this data uploaded into BigQuery; since the Blueprints are ready for you, you just fill in your information like I did, and the data goes into BigQuery. So at this point in the project, we've extracted the data from Webflow and YouTube using our Python scripts, and we've used Shipyard's low-code Blueprints to upload the data into BigQuery.
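For reference, the Python you'd otherwise write for that load step could look something like this with the google-cloud-bigquery client. The project, dataset, and table names are the ones used above, but this is a rough equivalent, not what the Blueprint actually runs:

    from google.cloud import bigquery

    # Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
    client = bigquery.Client(project="shipyard-demos")

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,      # header row
        autodetect=True,          # infer the schema from the CSV
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # "overwrite data"
    )

    with open("solutions.csv", "rb") as f:
        job = client.load_table_from_file(
            f, "shipyard-demos.dbt_core_solutions.raw_solutions", job_config=job_config
        )
    job.result()  # wait for the load to finish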
The next thing we're going to do is start writing our dbt Core models, so we can actually run dbt. So what we're jumping into now is a little bit of dbt Core. dbt Core runs out of Shipyard, and we have guides out there to help you get started with it, whether you use BigQuery, Redshift, Snowflake, or Databricks; we have guides for all of those. I'm skipping the setup steps that are in those guides, so you can see the starting point we have here. Of course, your dbt project, all your models and configuration, will be hosted in GitHub. The only things I've actually built so far: my dbt_project.yml, where I've changed the project name and added some model configurations down here; my profiles.yml, which is set up to connect to our BigQuery project, with the dataset dbt_steven; and, last but certainly not least, an execute-dbt setup that will actually run inside Shipyard. Looking at that real quick, you can see down at the bottom I have a Blueprint set up for dbt Core commands; that's what we'll use to actually run dbt. All of this was done prior to the video, so I won't walk through those steps; they're available in the guide, which I'll link in the description of the video below, so you can run through them for dbt Core yourself.
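For context, a BigQuery profiles.yml for dbt Core generally looks something like the sketch below. The target name (dev), thread count (4), and dataset (dbt_steven) show up in the run logs later in this video; the profile name, auth method, and keyfile handling here are placeholders, not the actual file:

    # profiles.yml -- a minimal sketch, assuming service-account auth
    shipyard_demos:            # hypothetical profile name; must match dbt_project.yml
      target: dev
      outputs:
        dev:
          type: bigquery
          method: service-account
          project: shipyard-demos
          dataset: dbt_steven
          threads: 4
          keyfile: "{{ env_var('BIGQUERY_KEYFILE') }}"   # keep credentials out of the repo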
All of the model work can be done inside Visual Studio Code, so let me expand VS Code a little so it's bigger. We want staging, intermediate, and marts, so in our models folder we create those three folders: staging, intermediate, and, last but certainly not least, marts. Of course we start with staging. I can look at our BigQuery data in dbt_core_solutions; starting with the solutions data, there's raw_solutions with the data that got loaded in, and we can preview it inside BigQuery. So inside VS Code we create a staging model, which I'll call stg_solutions.sql (STG for staging), and we're going to be super, super boring: a select star from the shipyard-demos raw solutions table; VS Code autocompletes exactly what I want, thank you VS Code, perfect. That just selects the raw data to be our staging table. We save that, and then we do exactly the same thing for stg_youtube.sql; I'm just going to be lazy and copy and paste, except instead of raw_solutions it points at raw_youtube_data. Save that. I can look at the YouTube data as well, and you can see what's coming in; there's our data, great. That sets our staging tables: just the raw data that we can work off of.

For each one of these folders we're also going to need a schema.yml file, which you can copy and paste based on the template dbt has. So I paste that in, with the table name and the description for each model. These get carried over into BigQuery to give you those table descriptions; you can go down and do column-level descriptions as well, which I'm not going to do in this video, but it's definitely an option for you. So those are our staging models: pretty simple, nothing crazy. That's what we want to do for staging.
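Since the staging layer is just pass-throughs plus descriptions, here's a sketch of the three files; the fully qualified table names are my rendering of what VS Code autocompleted:

    -- models/staging/stg_solutions.sql
    select * from `shipyard-demos.dbt_core_solutions.raw_solutions`

    -- models/staging/stg_youtube.sql
    select * from `shipyard-demos.dbt_core_solutions.raw_youtube_data`

And the staging schema.yml alongside them:

    # models/staging/schema.yml
    version: 2
    models:
      - name: stg_solutions
        description: "Webflow solutions data, staged as-is from the raw table"
      - name: stg_youtube
        description: "YouTube video data, staged as-is from the raw table"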
All right, now moving into the intermediate models, starting with int_solutions.sql. One of the things we have to clean up in our raw solutions data: if I preview it, you can see the youtube_video column is a dictionary. We want to pull the URL out of it, and then keep just the YouTube video ID, because remember, that ID is what's coming in from YouTube, so we want to pull it out of that column and make it a new column we can match on. We'll do that in two steps.

First, a CTE: with query_dictionary as, and then my select statement, which I'm going to capitalize like a good SQL user should. We rename name to solution_name, because that's a little more descriptive; description becomes solution_description; synopsis we keep the same; we have slug; we have primary_blueprint and secondary_blueprint (no, they're not plural, sorry about that); integrations_used, which lets us know whether a solution used dbt or Snowflake or something like that; categories, which lets us know which solution category it falls into; role, so we know who might actually want to use it; youtube_link_embed; _cid, which is the collection ID, renamed so it's more readable; id, which we keep as just id; youtube_video, which is the column we're about to play with in a second; and yaml_definition, which is the YAML you can use to put the solution into a fleet. Now for some of that JSON work, the dictionary work: BigQuery has a function called JSON_EXTRACT_SCALAR. Copilot is already typing it in for me, so I'll be lazy and go with it. We give it the column we want to work with, which, if you remember, is the youtube_video column (if I can copy and paste correctly), and the key we want, which is url, and alias the result as youtube_url. That pulls the URL out as its own column. And that selects from a ref of our stg_solutions table; refs are one of the cool things about dbt. So query_dictionary is good to go.

Second, a CTE I'll name query_youtube_id, because we're pulling the YouTube ID out of this. This one can be a little lazier: select star, comma, then a substring, because we want to take away the actual URL part and only keep the ending, which is the YouTube video ID. So we take SUBSTR of the youtube_url, using STRPOS to find the equals sign and starting at the position next to it; Copilot is spectacular here. We alias it youtube_video_id. Let me explain what's going on: in a YouTube URL it's always watch, question mark, the equals sign, and then the video ID, so we grab everything after that equals sign, which is exactly the video ID. That selects from query_dictionary. And the last thing is simply a select star from query_youtube_id, which queries that second CTE. So that's our int_solutions model, good to go.

Next, int_youtube.sql, which is going to be a lot less complicated than what we just did. The only thing we want to fix in our YouTube table, if you remember, is this weird unnamed number field, which is just the index coming in from pandas. We're going to get rid of that column to keep our data clean, so it's a simple select listing the columns: video_id, which again is the primary key we can match off of; date_published; video_title; video_tags; does-the-video-have-captions (oops, I needed a comma there); video_view_count; video_like_count; and video_comment_count. And that's from a ref of stg_youtube, perfect. That takes the one column out and keeps everything else the same.

For our intermediate folder we're also going to need a schema.yml, and again I'm just going to copy and paste off the dbt template. I'm just saying what I'm doing here: parsing through the JSON to extract the YouTube video ID, and cleaning up the data a little bit. Perfect, so we save that, and now our intermediate tables are ready to go.
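A sketch of those two models as described. The CTE names, the JSON_EXTRACT_SCALAR/SUBSTR/STRPOS logic, and the ref targets are from the walkthrough; the query_dictionary column list is abbreviated:

    -- models/intermediate/int_solutions.sql (column list abbreviated)
    with query_dictionary as (
        select
            name as solution_name,
            description as solution_description,
            synopsis,
            slug,
            _cid as collection_id,
            id,
            youtube_video,
            -- pull the url key out of the youtube_video dictionary column
            json_extract_scalar(youtube_video, '$.url') as youtube_url
        from {{ ref('stg_solutions') }}
    ),

    query_youtube_id as (
        select
            *,
            -- everything after the '=' in watch?v=<id> is the video ID
            substr(youtube_url, strpos(youtube_url, '=') + 1) as youtube_video_id
        from query_dictionary
    )

    select * from query_youtube_id

    -- models/intermediate/int_youtube.sql -- listing the columns we keep
    -- drops the stray pandas index column
    select
        video_id,
        date_published,
        video_title,
        video_tags,
        video_has_captions,
        video_view_count,
        video_like_count,
        video_comment_count
    from {{ ref('stg_youtube') }}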
So now the last thing is to create our mart tables. For the solutions, we're literally just going to copy the intermediate table and move on; since we did all the work in the intermediate model, we stop there. For the YouTube data, we're actually going to separate it into a dimension and a fact table. I am a product of using Power BI, so I love a good star schema, a good dimension and fact table, and we're doing it here just to show how it works.

We start with mart_solutions.sql, and like I said, it's just a simple select star from a ref of the int_solutions table. Perfect, good to go; that's all we have to do there. In a mart table we don't really want too many transformations going on. Next, the YouTube dimension table, mart_youtube_dimensions.sql: we select video_id, which is our key; date_published; the title of the video; the tags of the video; and whether the video has captions. And listen, I've gotten into many fights on many days with many people about whether or not boolean values belong in a dimension or a fact table. If you have any opinions on that, leave them in the comments below; I'd love to hear them. I don't get to do as much of that in my job at Shipyard, talking about star schemas and the data-architecture work of planning which tables data should go in, so let's get a good discussion going.

Anyway, we have our dimension values, so the last one to make here is mart_youtube_fact.sql, very similar to what we just did: select video_id, because remember, that's our primary key, so we have to keep it in there; video_view_count; video_like_count; and video_comment_count, from a ref of int_youtube. So now we have our YouTube fact table, our YouTube dimension table, and our mart solutions table; we're good to go on all of our tables. Now we need our schema file in there, schema.yml; again I'll be lazy and copy and paste that from my notes, describing the marts: the YouTube fact table, the dimension table, and mart_solutions. Save that, and that's everything we need for our dbt models inside VS Code.
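The marts as described, sketched out (file names are my snake_case rendering of what's on screen):

    -- models/marts/mart_solutions.sql -- just pass the intermediate model through
    select * from {{ ref('int_solutions') }}

    -- models/marts/mart_youtube_dimensions.sql
    select
        video_id,
        date_published,
        video_title,
        video_tags,
        video_has_captions
    from {{ ref('int_youtube') }}

    -- models/marts/mart_youtube_fact.sql -- keep video_id so the tables join
    select
        video_id,
        video_view_count,
        video_like_count,
        video_comment_count
    from {{ ref('int_youtube') }}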
Now we pull up GitHub Desktop, where you can see all the changes we made. Commit message: "added models for solutions and YouTube"; commit that to main and push to origin. Since I'm the only one working on this, I didn't do a PR. You can definitely laugh at me in the comments if you don't like GitHub Desktop and you'd rather use the terminal, but listen, I just get to click a button and push stuff to GitHub; you can't argue with the results. So that's good there. Our models are ready to go, and we can check them out in GitHub: a quick refresh, and there are all the models we just added.

All right, now let's pop back into Shipyard and actually use this. Again, I created this Blueprint before the video; it's all in our getting-started-with-dbt-Core guides for Shipyard, which will be linked in the description below. It runs dbt Core commands, and since I don't have too complicated a process here, I'll just do a simple dbt run, which will run those models for us. That's dbt Core set up to run inside Shipyard: we have our models set up, our project set up, and our profile set up to run against BigQuery and do our transformations. The last piece of the whole setup is taking our data into Looker Studio, which thankfully is just a connection between the two.

So what we'll actually do now is click Save Fleet, kick this thing off to run, and take a look at the logs to see the kind of information Shipyard gives you from this dbt Core run. Kicking off the fleet run takes us to our Fleet Log page, which gives a vessel-by-vessel breakdown of the fleet as it runs. You can see the download-data-from-YouTube vessel has just started, since it's that yellowish-orange color, which means it's currently running in our platform. When a vessel completes successfully it turns green, and if it finishes unsuccessfully it turns red. There it turns green, so I can click into it, and it shows me the Python output. If you remember, I had a few print statements: you can see all the URLs we pulled from YouTube, and the success message "data extracted to youtube_data.csv". If you print things out, they show in the console just like they would if you were coding in Python on your local machine. And for our low-code Blueprints inside Shipyard, we have error handling and success messages built out for you, so if something goes wrong (say you're just like me and don't know how to type in your password), we'll let you know: hey, your password's messed up, go fix that. We try to save you some time while you're troubleshooting inside Shipyard.

While this finishes running (it takes around two minutes or so), I want to pop over and look at the triggers. If we wanted to schedule this fleet, we can schedule it based on hours, days, weeks, or months, and so on, with as many schedules as we want, and we can also use a webhook trigger: you can send a webhook to start the fleet run, and you can send parameters to the webhook as well, so all those environment variables we set throughout the whole process can be set at runtime, along with all the inputs inside our Blueprints. We also have version control. I won't dig into that too much, but basically, every time you click save, Shipyard creates a new version of the fleet, so you can revert to old versions, create new fleets from those versions, and compare the changes between two; if something didn't work like you expected, you can go back and ask "what changed between these two, what did I do wrong?", and that helps you fix it.

You can see that at this point we've downloaded the data from YouTube and uploaded it to BigQuery, and we've downloaded the data from Webflow and uploaded that into BigQuery as well. (Those are just the raw tables, which we overwrite every time we load them in, so they're definitely there.) And last but certainly not least, our dbt Core command runs; this is what actually makes those transformations happen inside BigQuery. You can see it's actively running now, connected to BigQuery and running those models for us, and once it turns green we can look at the logs and see what dbt Core shows you, very similar to what it shows you inside dbt Cloud, just a different interface, running inside Shipyard versus anywhere else. (And that sound is how you know this is a live video: my Siri went off over there. It's funny.) So you can see the output from the dbt Core run: the four threads with the target of dev, the models that ran, the views it created; it finished running seven views, told us when it completed successfully, and gave us the passes, the warnings, and the errors. Everything built out like we wanted it to. Now I can actually go into BigQuery, look at dbt_steven, refresh the contents, and you can see all of our tables: the columns of our fact table, our solutions table, everything we want inside BigQuery. That's looking through the logs inside Shipyard and verifying this worked. So the last thing to do is pull all this data into Looker Studio and do a quick visualization, to make sure we're getting the data we want and that it's working as we expected.
Okay, last but certainly not least, we're getting into Looker Studio, and we're going to visualize this data from BigQuery just to make sure the connections work like we wanted and we can actually model this data like we want. Again, I'm not a Looker Studio expert, and I'm not going to say Looker Studio is the best BI tool on the market, but it does connect to BigQuery really quickly for us, which is all we need to make sure things work as we expected. So we connect to BigQuery, where our data is: My Projects, shipyard-demos, dbt_steven, and we want to pull in just our mart tables. mart_solutions we'll pull in first, and it loads in. I wish I could load all three tables at the same time, but then we'd lose the opportunity to click on each one individually, and I think we all build character together by clicking on each one individually. So this is loading for us, and since we know how big these tables are, this "should take a long time" (obviously being sarcastic). Once we get this data in, all we're going to do is put it into a table, making sure the names of the solutions from the website match up to the YouTube IDs we have set up. After the old refresh-and-restart (it shouldn't take that long), mart_solutions adds, and we add it to the report. Then I go ahead and add the other tables as well (I thought I was going crazy there): mart_youtube_dimensions, add that too, and last but certainly not least, the fact table. It populates and creates a little one-column table visual for us, which we definitely don't want; we'll delete that off in just a second. It looks like I have to do another refresh or two (come on, Looker Studio, we believe in you; this is definitely not how it went when I was working on it before, and I'm not giving Looker Studio a very good look right now), but hey, there we go, we did it: there are our three tables.

Now we have to make sure they join together properly. Inside Looker Studio those are called blends, so we manage our blends and add a blend: mart_youtube_fact, then mart_youtube_dim, then mart_solutions. We configure the joins: for the solutions table we join video_id with youtube_video_id, and between the two YouTube tables we join video_id to video_id. Perfect; those look good to go. Well, just kidding, we need to add the dimensions in first, forgot about that. So going back, we make sure we add in what we want to use: for our solutions we want to add the solution_name, and video_id works for the rest of them. Okay, save, and close the blend.
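That blend is essentially the following join, written out as SQL for clarity (same table and column names as above; Looker Studio builds it through the UI rather than a query):

    select
        s.solution_name,
        d.video_id,
        f.video_view_count,
        f.video_like_count,
        f.video_comment_count
    from mart_youtube_fact f
    join mart_youtube_dimensions d
        on f.video_id = d.video_id
    join mart_solutions s
        on f.video_id = s.youtube_video_id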
Now, working from our blended data, which is the one we want to use, I get rid of that little auto-generated table, and we put our solution_name into a new table with the video_id next to it (add dimension, solution_name; and believe it or not, I used to work in BI tools for a living, but once you don't use a tool for a while, it definitely goes away). All right, there we go: you can see the video IDs from YouTube are now being paired up successfully with our solution names from Webflow, which was the objective we wanted. Of course, we can take this project further by connecting it to some GA data or any of our marketing data, so that for people looking at our website or our YouTube videos we can start connecting those together and actually get view counts for the solutions and things like that as we keep going. But this was just the start of the project, and I wanted to show an A-to-D, A-to-E kind of workflow inside Shipyard.

So, running back through everything we put in this video: we extracted the data from Webflow and from YouTube using Python and saved it as CSVs; we uploaded it to BigQuery using our low-code Blueprints inside Shipyard; we then used dbt Core, which we hosted inside Shipyard as well, to do the transformations and build our staging, intermediate, and mart tables; and once we had those tables made, we were able to pull the data into Looker Studio and make sure the data lines up like we wanted it to.

Thank you for joining us for this long-form solution video. It's a lot different from what we usually publish on our YouTube channel: much longer, much more in depth. If you want to see more of this content, like the video and drop us a subscription, and you'll see more of it. If you have any ideas for other tools you'd like to see mixed into these long-form solution videos, put them in the comments below. We'd love to do more of these long videos, but we don't want them to become stale, so if there's any other tool you want us to use for extraction, transformation, or visualization, or if you'd like to see reverse ETL or anything like that thrown into one of these, drop it in the comments and we'd love to make it for a future edition. We'll see you next time.
Info
Channel: Shipyard
Views: 7,862
Keywords: BigQuery, Data Engineering Projects, Python, data build tool, data engineering, data engineers, data operations, data orchestration, data pipelines, dbt Core, dbt data build tool, end to end data engineering project, workflow orchestration, google bigquery, data management, workflow orchestration engine, data orchestration tools, data orchestration engine, data engineering tutorials, data engineering tutorial, data engineering projects, data engineering project, etl pipelines
Id: IQxhutCbN7A
Length: 43min 27sec (2607 seconds)
Published: Tue Aug 22 2023