What is Tableau Prep | Tableau Prep Hands on Training | Complete Tutorial [ Basic to Advanced]

Captions
Hello and welcome to this new tutorial series on Tableau Prep. This is a Tableau Prep introduction: we will look at how to download and install Tableau Prep and then walk through its interface at a very high level.

First, a quick glimpse of what Tableau Prep looks like. This slide shows a typical workflow you would build inside Tableau Prep to prepare your data; as the name suggests, Prep is all about data preparation. A flow like this chains together steps such as bringing in all the orders, adding the returns, and cleaning the data, and the steps are very intuitive. In the subsequent tutorials we will see how to create a flow like this and use it for data cleaning and preparation.

Some quick highlights before that. Tableau Prep was created specifically for data preparation, which, as you know, is one of the most important steps before data visualization. It was also one of the biggest pieces missing from Tableau: for years people used the Tableau visualization tool alongside other tools such as Alteryx for the preparation work. With Tableau Prep, Tableau is trying to complete the ecosystem. You could already do some things inside Tableau Desktop, such as joining tables from the same database, creating calculated fields, or pivoting data, but it was never a full-fledged data preparation tool, and even Tableau recommended that most of the heavy lifting of data preparation be done outside of Tableau. That is where Tableau Prep comes into the picture.

Using Tableau Prep is very simple. As in the previous slide, you drag and drop the various components, just like in Tableau Desktop, to build a workflow for cleaning, aggregating, joining, and so on, and then hand the result to the visualization tool, either by publishing the data directly to the server or by creating a local extract. That is the quick, 10,000-foot view of Tableau Prep in case you have never heard of or used the tool.

So what operations can you do with Tableau Prep? You can take data from multiple sources, which is somewhat restricted in Tableau Desktop. There is a comprehensive set of cleaning operations, because data cleaning is a massive task where you need many operations to bring the data into a usable structure. Then there is data transformation, for example aggregating the data. You can also create new calculations; you could do this in Tableau itself, but here it is again very intuitive and straightforward, with plenty of built-in help. Finally, you can fill in missing data, and there is much more beyond that.

Let's go ahead and see Tableau Prep. In Google I have searched for "tableau prep", and the first link takes me directly to the Tableau Prep Builder page. There you have a link to try it for free, plus some video tutorials from Tableau that show it in action, which I recommend watching. Below that is a quick summary of what I have already told you: it is self-service data preparation. There is an image of a Prep Builder flow, a bit more complex than the one I showed you, which we will get to in later tutorials. You can also find the current offering and pricing information, a section on immediate results ("work smarter, work faster"), a "connect to your data" section with a link listing all the data sources you can connect to, and a "stay in the flow" section showing how you can rename a step, preview it in Tableau Desktop, or remove it, all options we will see later. After that comes Tableau Prep Conductor, another component that lets you schedule and monitor your data flows across servers, which is there to make administrators' lives easier. Those are the pieces we will cover in the tutorials I will publish.

To get started, click "Try it for free", enter a business email, and start the free trial to download the installer. I have already downloaded it; if I go into my Downloads folder, the Tableau Prep installer is there, and installation is as straightforward as any Windows installation. So I click on Tableau Prep, a splash window appears, and after a couple of seconds the full interface opens. Initially it asks for registration information; for now I will just cancel that.

Here is the structure of the window. On the left-hand side you have the Connections pane for connecting to different data sources. In the middle are the recent flows; sample flows are provided for Superstore and World Indicators, the same samples used for the Tableau visualizations, which you can open and try to understand. On the right-hand side there is training material: getting started, a visual dictionary, video tutorials, cleaning data with Tableau Prep, information about instructor-led classes, and below that blogs, resources, the forums, and finally the update notice for 2019.2. All of that is on the first screen, so you can look at the samples, the training, and the blogs as you become more experienced. Up top there is a very standard menu: File, Edit, Flow, Server, and Help. From Help you can check which version of Tableau Prep Builder you are running; I am using 2019.3.1. If you download Tableau Desktop these days it offers to download Prep along with it, and I suggest taking them together so you do not have to fetch Prep as a separate install later, though you can always evaluate it later if you prefer.

If we open the sample Superstore flow (I had already opened it once, so I will just click it again), it shows the flow. On the left-hand side are the connections, such as Orders Central, the returns, the quota, and Orders East, which appear as the first input steps. Then more steps are added as the flow moves along, such as rolling up sales and combining quota with orders, and finally the output. To run the entire flow and refresh the data there is a run option right here; it is pretty straightforward. We will look at all of this functionality step by step in future videos and see how to create and use this relatively complex workflow in action.
So that's it for this very first introductory video. I look forward to seeing you in the future tutorials; we will go step by step. Let me know in the comments what you would like to see covered. I have my standard structure, which I will follow, but I will try to adjust it based on your feedback. With that, I will meet you in the next video with a new topic.

Hello there and welcome to the next video. In this video I am going to show you the very first step in Tableau Prep Builder, which is importing the data. Once you open Tableau Prep, come over to the Connections pane; it is generally collapsed, but clicking it expands it. Click the Add Connection icon and you will see the list of available connection types. The popular ones are highlighted at the top, such as Microsoft Access, PDF, Microsoft Excel, statistical files, Tableau extracts, and text files. Below those are the server-based connectors, for example Tableau Server, Amazon Athena, Amazon Aurora, and Amazon EMR, plus Hadoop-related options, and finally generic JDBC and ODBC connections for any custom database you need to reach. As new versions come out, more features and connection types keep being added, so if your database or back-end data set is not listed, check the Tableau site for Prep Builder to see when an out-of-the-box connector is coming.

For now I will click on Text File, because what I have are simple text files. I take the Orders South file, and as soon as I double-click it two things happen: the connection and its table appear on the left, and the very first input step is created in the flow pane, with a plus sign for adding further steps later. So the very first step is the input step.

Notice that a lot has also appeared at the bottom. There are really two areas here: the upper area, where I am dragging my mouse around, is the flow pane where you build the flow for cleaning and massaging the data; the lower area is the input step pane with information about the file you imported, or the files you will import later. The Text Settings tab shows the connection details: the file you imported, which is a text or CSV file, and an Edit button if you want to choose a different file. "First line contains header" is why all the field names are coming through correctly; make sure it is enabled if the first line of your data really is a header. Otherwise you can choose "Generate field names automatically", but that produces nothing more interesting than F1, F2, F3, F4, which you would then have to rename, so I will switch back to "First line contains header". By default it selects the field separator, text qualifier, and character set; if your data uses a special character set you can choose it here, but we are not using anything special, and the locale is English (United States), which is fine for me.

Once you have reviewed these settings (there are a couple of others I will show you in a minute), the area on the right lets you review the columns of the data set along with some sample values from each column. For example, Sales holds numerical values, Region shows South, State shows Louisiana, Ship Mode shows Second Class, and so on; you can review all of it. If you do not want a particular column, say you know you are building this dashboard only for the South region, you can uncheck it, and that column will not be carried into the processed data you later use in Tableau. Keeping only the columns you actually need also helps performance. For now I will leave everything checked.

The second column of this grid shows each field's data type: whether it is a number, a string, a date, and so on. You can click it and change it if needed, for example if a date was incorrectly detected as a number you can change it to Date or Date & Time based on the values. Then you have Field Name and Original Field Name. Field Name is what you can rename inside Tableau Prep Builder (I will show how in a later video), while Original Field Name keeps the name as it appears in the source. For example, for Order ID you might want to remove the space and use Order_ID instead; the original field name stays Order ID, while the field name becomes Order_ID.

Any changes you make are tracked, and the first change we can try is a filter. Say that in Ship Mode we only want First Class. Click Filter Values, type "ship", and it shows two fields, Ship Date and Ship Mode. I select Ship Mode and write a condition that it equals "First Class". If you remember the Tableau calculated field dialog, this editor should look very familiar, so writing filter conditions here will feel easy. I click Apply and Save, only First Class rows come through, and the change is listed; hovering over the filter shows "Ship Mode equals First Class", and I can right-click it to edit or remove it. For now I will remove it again, since this was just to show the Filter Values option in action.
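For reference, the filter editor expects a boolean calculation, just like a Tableau calculated field. This is a minimal sketch of the kind of expression described above, assuming the Superstore field name [Ship Mode] and the value "First Class"; any other field or value follows the same pattern:

    // Keep only rows shipped First Class (Tableau calculation syntax)
    [Ship Mode] = "First Class"

Rows where the expression evaluates to TRUE are kept; everything else is filtered out.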
Next is the Multiple Files option. Right now we have only a single table, but if we want multiple tables we can use the Wildcard union. It looks at the different files that are part of this folder; we already had the 2015 orders file, and there are also files for 2016, 2017, and 2018. If I enter a matching pattern such as "orders*" it matches all of them, whereas a pattern like "abc*" would match nothing. So with "orders*" it shows all the different order files, and when I click Apply, the orders files for 2015, 2016, 2017, and 2018 are all added; you can see the button changes to Applied.

Once we are done with this, we can look at the Data Sample tab. As it says here, for large data sets you can improve performance by working with a subset of your data, and these settings select the data to include in the flow. What that essentially means is this: say you have 10 million rows, or take the extreme example of 1 billion rows. If all of that has to be processed again and again while you build your flow, the data import alone will take a huge amount of time, so instead you can sample the rows. You can choose a fixed number of rows, say up to 1 million, or 100,000, or 10 million; it depends on how representative you want the sample to be. For the sampling method you have Quick Select and Random Sample. Random Sample is more thorough, but as the dialog itself notes, it may impact performance; Quick Select with a fixed number of rows, say 1 million out of 1 billion, is usually a good enough representation. If you are not sure, try Random Sample and see whether it really hurts performance. For now I will keep the default option. Finally, the Changes tab lists any changes applied to the data set, related to cleaning and transforming, which I will show in the next videos.

So that is one way to bring in all the files in a folder. Another option: I have other files, Orders Central, East, and West, apart from South, and I can simply drag them into the flow pane as well. Each becomes its own input step, and if I spread them out you can see them color-coded: blue for Orders South, orange for Orders Central, green for Orders West, and pink or magenta for Orders East. The four colors just make it easier to tell the inputs apart. So that is another way to import, and now we have four different input steps that we can later join, aggregate, transform, clean, or build calculated fields on, now that all the data is inside.
That is about everything I wanted to show you for the input step; as you can see it is quite a big topic, and hopefully it has given you the necessary ideas about how to import data, whether through a single table, a wildcard union, or by dragging files in directly, and how to apply filters and data sampling. With that, I will meet you in the next video with a new topic.

Hi there, welcome back. In this video we are going to look at some interesting things we can observe from the input steps. From the previous video we have imported four files for four regions, South, Central, West, and East, and their field information is shown here: the field name, the original field name, any changes such as filters, and sample values. This tells us a lot about what we are importing, and as mentioned before, we can uncheck any column we do not want in the final data.

If you look at Orders Central, you will see fields like Order Year, Order Month, Order Day, Ship Year, Ship Month, and Ship Day; the dates have been split into separate year, month, and day columns. In Orders South, by contrast, Order Date is a single column and Ship Date is a single column. That is exactly the kind of problem you will run into when data comes from different sources, regions, or years: one column or another is a bit haywire and needs some processing to bring the data into a consistent format.

Another thing to notice in Orders Central is that there are Country, City, and State fields but no Region field, whereas in Orders South a Region field is present; scrolling through Central confirms it is missing. That is something we can only spot by comparing the fields of the different data sets, and this comparison tells us how we need to process the data.

If we go to Orders West, we do have a Region field, plus Ship Date, Order Date, and Ship Mode, so it looks mostly in sync with the other data sets. But scrolling further I notice a set of fields prefixed with "Right_": Right_Row ID, Right_Order Date, and many more duplicated fields. There is also a State field whose values are not like the state values in the other files; it contains abbreviations such as AZ, which I believe is Arizona, whereas in Central, for example, you see the full state name like Texas rather than a two-letter code. So we will probably need some processing there as well. The "Right_" fields, though, do not look useful at all; we already have all the corresponding fields and they appear to be duplicates, so we can simply uncheck every column that carries the "Right_" prefix.
After a few minutes of unchecking, only the relevant fields remain: every "Right_" field has been removed, and the remove-field icon confirms they will not be included.

Looking at Orders East, we have Category, City, Country, Discount, Region, Profit, Ship Date, and Order Date, so the fields themselves look fine. But the sample values for Sales carry a "USD" prefix, whereas in West and Central the Sales values have no such prefix. So it is only Orders East that has the USD text, and we will need some processing so that it does not end up in our data set that way.

These are just the high-level issues I could gather by looking at the data; if you or someone on your team is a data expert, they can do this comparison and catalogue everything that needs to be done. As we move forward we will find more and more things to fix, such as creating a new field altogether, splitting a field, or combining fields, and we will see all of that in action. That is the high-level understanding I wanted to give you about the input steps and what to look at while importing data. That is all I have for this video, and I will meet you in the next one with a new topic.

Hi there, welcome back. In this video we will start cleaning up our data so that it is ready for analysis, and you will see step by step how to do that. First, go to Orders Central, click the plus icon, and click Add Step. Once we add the step, the workspace changes and shows a lot of new information, most of it about the distribution of the data. For example, Row ID shows a distribution that is not especially meaningful, and Order ID likewise, but something like Ship Mode is much more relevant: it clearly highlights that Standard Class is where most of the orders fall. Customer ID shows which customers repeat most, Customer Name shows its own distribution highlighted in grey, and in Segment you can see Consumer leading. So what is presented is essentially a profile of each field. On top of that you have options such as Filter Values and Create Calculated Field, plus some recommendations based on geographic values, for example assigning data roles like Country/Region, City, State/Province, and Postal Code. You can also search the fields, which is handy if you have, say, hundreds of columns. All of this appears as soon as you add the clean step.
It is similar to Tableau, though not identical: in Tableau you have a data grid, while here you have the distribution of the data along with familiar controls. You can click the icon to change the data type, double-click to rename a column, sort, and open the three-dot "more options" menu for further cleaning operations, which we will see in action. Also, if your flow gets big and you do not want this profile pane taking up space at a later stage of the cleaning, you can click the collapse icon to give the whole area to the flow, and click again to bring the profiles back for all the columns.

Now let's look at some cleaning steps. If you remember, the field missing from Orders Central is Region: we have Country, City, State, and Postal Code, but no Region. So we create a calculated field, call it Region so it is consistent with the other data sets, set its value to "Central" in double quotes, and click Apply and Save (see the formula sketch after this paragraph). Central is now reflected in the profile pane, and the change also appears in what you can think of as a change log, which records every change you make to this data set. If, later in development, maybe after a week or a month, you want to review the changes, or walk another developer through them because they want to reuse your data, you can simply use this change log; it is just as useful for yourself, because once development is done it is easy to forget what you actually did. So the very first logged change is the calculated field that makes our data set consistent with the others.

The next thing, if I scroll to the right to confirm, is that this data set has Order Year, Order Month, and Order Day, and likewise Ship Year, Ship Month, and Ship Day. We want to combine these into single Order Date and Ship Date fields, consistent with the other data sets. So again we create a calculated field, name it Order Date, and use the MAKEDATE function, passing Order Year, then Order Month, then Order Day; as the function signature shows, you specify the year first, then the month, then the day. Click Apply and then Save, and this change is logged as well, along with the function, and the Order Date field appears at the far left. One thing to notice is that any new field you add carries a calculated-field icon and sits at the extreme left, at the start of the data set's fields, which makes new fields easy to locate: first we created Region and it appeared there, then Order Date appeared there too. Now let's create the Ship Date field the same way, repeating the process with Ship Year, Ship Month, and Ship Day, and click Apply and Save.
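Here is a minimal sketch of the three calculations described above, in Tableau calculation syntax. The field names are assumed to match the Orders Central headers ([Order Year], [Order Month], [Order Day], and their Ship equivalents); adjust them to whatever your source actually uses:

    // Region: a constant string so Central matches the other regional files
    "Central"

    // Order Date: MAKEDATE(year, month, day) builds a date from the three parts
    MAKEDATE([Order Year], [Order Month], [Order Day])

    // Ship Date: same pattern with the ship columns
    MAKEDATE([Ship Year], [Ship Month], [Ship Day])

Each expression goes into its own calculated field (Region, Order Date, and Ship Date respectively) in the clean step's Create Calculated Field dialog.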
So now we have Order Date and we have Ship Date, and the change log shows all three changes: first the Region field, then the calculated field for Order Date, then the one for Ship Date. Since Order Date and Ship Date now exist, we no longer need the separate order fields. If I search for "order", I see the new Order Date along with Order Year, Order Month, and Order Day, so I go to each of those three and click Remove. They will no longer be part of the data set in the flow; note, however, that they are still available in the back-end data, which is why Order Date, which is built from those three fields, keeps working. We have only removed them from the front end, not from the original data set we imported. We then do the same for the ship fields: Ship Date is already created, so we remove Ship Year, Ship Month, and Ship Day. All of these removals are logged as well, which is something I really love about the tool.

One more thing worth highlighting for better understanding: the profile pane shows the data distribution, but for dates it summarizes, for example by year. In the more-options menu you can switch between Summary and Detail; clicking Detail presents a nice breakdown by individual date, showing the orders for each day. So if you are interested in the detailed picture, open more options and click Detail. For dates that can make sense, whereas for fields like Ship Mode or Segment a summary is what you want.

We are nearly there. The second-to-last thing to fix here is Discount. It holds numeric values but is typed as a string, and wherever no discount was given it says "None" when it should say 0. You can simply double-click the "None" value and change it to 0; that is how easy it is in Tableau Prep Builder, and every row that had "None" is replaced with 0. Then I change the field's data type to Number (decimal). That is another kind of change I wanted to show you: replacing a value. Finally, we can rename this step to describe what it does, something like "fixing dates and discount" at a summary level, so anyone can see that this step is mostly about fixing the dates, the discount, and the Region field.
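The video does this by editing the value directly in the profile pane, which is the simplest route. Purely as an illustrative alternative, the same cleanup could be expressed as a single calculated field; this is a sketch, assuming the string field is named [Discount]:

    // Treat the literal text "None" as zero, keep every other value as-is,
    // then convert the result to a number
    FLOAT(IF [Discount] = "None" THEN "0" ELSE [Discount] END)

The built-in value replacement plus a data-type change achieves the same thing without writing any formula, so treat this only as a reference for readers who prefer explicit calculations.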
That is pretty much it for this first cleaning video. I have a couple more videos for the rest of the issues we discussed previously, and I will show you step by step how to fix them, so that is it for now, and I will meet you in the next video with a new topic.

Hi there, welcome back. In this video we will look at the other data sets that need cleaning. One of the easiest is Orders East, where I believe the only issue was the Sales values; Ship Date, Order Date, State, and the Region column all looked fine, so cleaning this one should be relatively straightforward. First click the plus icon and click Add Step. Once the step is added you again see the data distributions, and as I showed earlier for dates, you can always open more options and click Detail to see a finer breakdown. The distributions here are quite informative: most products sell within the 2-to-3 or 3-to-4 bin, and if you click a bar, everything related to it is highlighted in blue across the other fields, which gives you quick indications such as Consumer leading the Segment field. That is just to show how powerful this tool is.

Now for the actual issue, the "USD" prefix in Sales. Go to the Sales field, open more options, and under Clean choose Remove Letters; that is the option we need to strip the USD text. It is impressively easy, and clearly the developers built these options around the common issues people face, like removing letters from an alphanumeric string or removing numbers, so the cleaning becomes really simple, all thanks to them. The last thing I do is change the data type to Number (decimal), and now the field has a sensible representation. That takes care of cleaning Orders East.
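Remove Letters is the built-in option shown in the video. As a hedged alternative for reference only, the same result could be produced with a calculated field; this sketch assumes the field is a string named [Sales] and that "USD" is the only non-numeric text in it:

    // Strip the literal "USD" prefix, trim any leftover spaces, and convert to a number
    FLOAT(TRIM(REPLACE([Sales], "USD", "")))

REPLACE, TRIM, and FLOAT are standard Tableau calculation functions; the built-in Remove Letters option remains the quicker route when the stray characters are all letters.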
Now let's take a stab at Orders West. One thing we had already done there was removing the "Right_" fields at the input step, which was itself a cleaning step. The remaining issue is that State holds abbreviations such as AZ for Arizona rather than the full names used in the other data sets, so let's fix that. I click the plus icon and Add Step, and we get the same distribution information here as well. Going to the State field, the profile gives a very handy list grouped by state value. The option we need is under more options, Group and Replace, and I will use Manual Selection. I double-click AZ and type Arizona, and that is the pattern: go in value by value. CA becomes California; CO, which I double-click, becomes Colorado (I am checking my spelling against the list, and it looks right); then Idaho, Montana, New Mexico, and the remaining ones, then Oregon and Washington, and finally the last abbreviation, whose state name I hope I am pronouncing correctly. Once all of these mappings are done, the abbreviations are mapped to their respective full values, and we click Done. The State information is now consistent with my other files.

We have achieved a lot, so let's save the flow. I will save it as something like orders_regions_flow with the .tfl extension that Tableau Prep creates for flows. The file is saved, and everything we have done is in it; we can always click each clean step to review that the values look right. Let's also give the steps proper names: for Orders West, "cleaning state names", and for Orders East, "cleaned sales values". That is pretty much all I have for this cleaning video, and I will meet you in the next one with new Tableau Prep functionality.

Hello there, welcome back. In this video we are going to look at union, that is, combining the data sets. So far we have cleaned the individual data sets; now I will show you how to union them all together. It is very easy, again drag and drop, which is the signature Tableau and Tableau Prep Builder experience. To combine the first two data sets, I drag one cleaning step over the other; as I hover over it I get two drop targets, Join and Union. Because I need to stack these data sets, I drop it on Union, which is highlighted in orange, and a union step is created. Next I pick up Orders Central's step and drag it over as well. The catch is that I must drop it on the existing union step's Add area, not on the "new Union" target it also shows, because that would create a separate union step. So depending on how you need to combine the data, make sure you drop the step on the right target; since I want it in the existing union step, I drop it on Add, and it is now combined. Finally there is Orders South; it has no cleaning step, so I simply drag the input itself onto the Add target, not onto Union or Join, and it is added too. You can see the flow re-arranges so everything lines up with the cleaning steps and the union step.

Once that is done, look at the union step's settings pane. It shows all the tables that have been combined as inputs, the resulting fields, and the mismatches: 5 mismatched fields out of 25.
The mismatched fields are Discount, Product Name, File Paths, Product, and Discounts, and unchecking the "show only mismatched fields" option toggles between showing just these and showing everything. The union also creates a new field called Table Names, which is very helpful when you want to understand which table each row comes from. With the mismatched view on, the color coding makes it clear that the mismatch is coming from Orders Central: it has Discounts where the others have Discount, and likewise Product versus Product Name, where the values look the same and only the field names differ. So we can merge these: I select Discounts and drag it onto Discount, and on "Drop to merge fields" they merge and disappear from the mismatched list. Similarly I drag Product onto Product Name. The only field left mismatched is File Paths, which we will just leave; the rest now match properly, so we can uncheck the mismatched-only view, and we have our combined, clean data set. One last thing is to rename the union step by double-clicking it; call it "All Orders" or "All Regions", whatever makes more sense from a business and readability perspective. I will use All Regions, since these are all the regions combined into the union step, and the plus icon is there for further processing. That is all I wanted to show you about unioning different data sets into one; I hope you have a good idea of it by now, and I will meet you in the next video.

Hi there, welcome back. In this video we are going to add one more file, related to returned orders, take a few more cleaning steps on it, and then join it with the rest of the data, so stay tuned. First, the option to add a new file is under Add Connection. Since it is an Excel file I click Microsoft Excel, browse to the location, and open the returns file (the "return reasons" workbook). The file opens, and its fields are Row ID, Order Date, Order ID, Product ID, Sub-Category, Manufacturer, Product Name, Return Reason, and Notes. The most important pieces for us are Order ID and Product ID, because those are the columns we will use to join this data to the main data set; to join, we need some shared column values to join on, and you will see how that works shortly. Return Reason matters too, we already have Sub-Category in our main data, and Order Date may or may not be needed. Scrolling down, even the Notes column looks like it holds useful information.
So let's decide which fields we want: we keep Order ID, Product ID, Return Reason, and Notes, and uncheck the rest, because Manufacturer is not needed, Sub-Category is already in our main data set, Order Date is already present there as well, and Row ID is not very useful for us. Those four fields are what we require, and the Notes column clearly holds information worth extracting; Return Reason is the high-level reason, such as Defective.

To clean this data, let me drag the step down a bit so the whole flow is visible, then click the plus icon and Add Step. The cleaning step is added; Order ID and Product ID look good, and the Return Reason values (defective, incorrect product ordered, incorrect product shipped, no longer needed, product listed incorrectly, and so on) look fine, so there is no obvious problem there. But widening the Notes field, the first thing I notice in the very first value is a space at the start. The other thing is that a person's name, such as Kelly Williams or Al Jenkins, is appended at the end; per the data set these are the approvers who approved the return for the customer.

So the first fix is removing those spaces: go to more options, Clean, and Trim Spaces. Once I click that, the leading space is gone, and there will be no space problem on the left or the right to cause issues later. Next we split the value to get the approver's name out. Again under more options, choose Split Values and click Automatic Split; it looks from the data as if a dash has been used as the separator, which the algorithm detects. Now we have the split values, and scanning through them to check whether anything unwanted came along with the approver name, nothing did, so we are good. However, you can see the same approver appears in several variants: capital versus lower-case letters, one dot, two dots, a slash, and so on. That is another cleaning step to take care of, and there is an interesting out-of-the-box option for it under Group and Replace called Common Characters. Clicking Common Characters, the variants of each approver collapse into a single value and the names are no longer repeated; dragging the list up a bit shows all the resulting values, and the underlying field values, which were not visible until now for some reason, are visible as well. This option is really helpful. Now click Done, and the settings are updated.
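Trim Spaces and Automatic Split are menu options, but both map onto ordinary Tableau calculation functions, so here is a rough sketch of what they are doing, assuming the field is named [Notes] and the approver name follows a dash; the token positions are an assumption based on the values shown in the video:

    // Remove leading and trailing spaces from the note text
    TRIM([Notes])

    // Text before the dash: the customer note
    SPLIT([Notes], "-", 1)

    // Text after the dash: the approver name
    SPLIT([Notes], "-", 2)

The built-in Automatic Split does the same tokenizing for you and creates the "Notes - Split 1" and "Notes - Split 2" fields directly, so these calculations are only for reference.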
Finally, with all the corrections made, we can remove the original Notes field, since we do not need it any more. Then rename Notes - Split 2 to Approver Name and Notes - Split 1 to Customer Note, or whatever is meaningful for your needs, and rename this step to Cleaning Notes. That is about what you can do here, and in the next video I will show you how to join the data and some of the properties related to joining, so stay tuned for that.
Hi there, welcome back. In this video we are going to look at joining the two data sets. So far we have cleaned the data, used a union to combine the regional data sets, and in the last video added the returns file and cleaned its Notes field into Customer Note and Approver Name. Now let's see how to join the two.

When I drag the Cleaning Notes step over the All Regions step, I get the options Join, Union, and Add. I drop it on Join, and a join step is created with its own configuration. The Applied Join Clauses section tells you which column the join has been made on, here Product ID; the join type applied by default is an inner join; and the Summary of Results shows the included and mismatched values, with roughly 13,600 values being excluded. That is because the join type is an inner join, and I will show you in a minute how to correct it. At the bottom, the Join Clause Recommendations suggest adding clauses such as Order ID and Customer ID, and Order ID certainly makes sense, so I click the plus icon to add it.
The join is now made not only on Product ID but also on Order ID, since Order ID is present in both data sets. Adding the Order ID clause also changes the exclusions: the excluded count goes up from roughly 13,600 to about 15,600. Based on our requirements, what we actually want is all of the All Regions data plus the return data only where it matches, and that is a left join. So I click Left, the shape of the join icon changes, and now the excluded values from All Regions drop to 0, only 3 are excluded from the cleaning step, and the joined result is around 16,300 rows. In other words, we have corrected the problem: from this step we get all the data for all the regions along with the cleaned return data. One other thing you may notice is the red values: these indicate the Order IDs or Product IDs that have no corresponding return value, whereas the ones that do have a match are shown in black as normal values.

With the join step complete, look at the right-hand side: just as in the union step we have Table Names and File Paths, and we have Product ID coming from both Cleaning Notes and All Regions. So let's click the plus and Add Step, a cleaning step that will give us the final output of this join. In it you can see Customer Note, Approver Name, Table Names, and File Paths, and scrolling all the way to the right there are new columns created by the join, such as Order ID-1 and Product ID-1. Row ID is, again, not something we really need, so we remove it. Rather than scrolling, you can also just search: searching "order" shows Order ID and Order ID-1, and searching "product" shows Product ID and Product ID-1. That is what I was looking into to figure out which one to remove: Product ID-1 has null values, clearly coming from the returns side, so we remove it, and the duplicated Order ID field with the nulls goes the same way.

Once that is done we have our data set with most of the values present, and there is one new value we can create from Return Reason: if Return Reason is null, the order was not returned, and if a value is present, it was. So we create a new field, "Returned?", with an IF ISNULL test on Return Reason that returns "No" when it is null and "Yes" otherwise. I initially typed the wrong kind of quotation marks, which caused an issue, and there was an error about a missing END, but after fixing those we are good: Apply and Save.
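This is a minimal sketch of that calculation in Tableau syntax, assuming the field is named [Return Reason] exactly as in the returns file:

    // Flag whether an order was returned: no Return Reason means no return
    IF ISNULL([Return Reason]) THEN "No"
    ELSE "Yes"
    END

Straight double quotes and a closing END are required, which is exactly what tripped things up in the video before the fix.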
One more thing we may do, just to create one more informational column for ourselves, is work out how many days it is taking to ship an order from the day we receive it. We create a new calculated field, give it a meaningful name like Days to Ship, and use the DATEDIFF function. We want the number of days, so we specify 'day' as the date part; as in the example shown, we could have used 'month' if we wanted months, but we want days. The start date is Order Date, the date we received the order, and the end date is Ship Date, the date on which we shipped it, so the formula is DATEDIFF('day', [Order Date], [Ship Date]) (a small code analogue appears at the end of this passage). We click Save, and again on the left-hand side we can clearly see that four and five days is where most orders land, roughly the average time being taken to ship an order from the day it is received. You can look at where that is happening, and this is one of the criteria that can make your customers happy if you can shift the distribution towards fewer days; that's the power of having the data in hand. Just by clicking on 4 you can clearly see which table it is coming from, with one table contributing the most, followed by office supplies and so on and so forth. That is something for us to look at, but for now I'll just leave it as it is.

The final thing, once we are done with pretty much all of this joining, processing and cleaning, is to take the output so that we can use it further. Before I add the output step, note that you also have the option of right-clicking a step and previewing it in Tableau Desktop: if I click that, it shows the flow is running, the flow completes in about a second, a Tableau workbook opens, and I have all of my values there, which I could save and keep using as I need. But since new data will keep coming in, every day or every month or at whatever frequency you have set, you will usually want to click Add Output instead. There you can choose to publish the output as a data source, or save it to a location as a Tableau data extract, with the .hyper or .tde extension, or as comma-separated values. Let's save it as a .tde; where I want to save it is something I can specify here, and if I drag the pane out it shows the entire location this particular file will go to. We can then run the flow, and it will create the extract for us. And yes, we have done it: we have created the entire flow, ending in a TDE. I can copy that path, paste it into the Windows Run dialog, and there is the file, named Output, present as a Tableau data extract. So that's how you can join, clean, and do the many things we have looked at in this particular tutorial, and finally take the data out as a Tableau data extract; you also have the other options, CSV or the .hyper extension, and running the flow will take the data out in whichever format you choose.
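As flagged a little earlier, here is a small pandas sketch of the Days to Ship logic with made-up dates; in Tableau Prep itself the whole calculation is just the DATEDIFF('day', [Order Date], [Ship Date]) formula.

    import pandas as pd

    df = pd.DataFrame({
        "Order Date": pd.to_datetime(["2020-01-02", "2020-01-05", "2020-01-05"]),
        "Ship Date": pd.to_datetime(["2020-01-06", "2020-01-09", "2020-01-10"]),
    })

    # Whole days between receiving and shipping an order, the same idea as
    # DATEDIFF('day', [Order Date], [Ship Date]) in the calculated field
    df["Days to Ship"] = (df["Ship Date"] - df["Order Date"]).dt.days

    print(df["Days to Ship"].value_counts())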
One last note on the extensions: .hyper is for Tableau versions 10.5 and above, anything from 10.0 to 10.4 uses .tde, and if you don't want either of those you always have the option of .csv, which you can take and use in any version as per your needs. So that's about it; I hope you found it useful, and I'll meet you in the next video, the next tutorial.

Hi there, welcome back to the next video. In this video I am going to show you the Data Interpreter functionality of Tableau Prep. Most of the time we don't get data in a format that Tableau Prep can understand and translate into proper rows and columns for further analysis, and in scenarios like this the Data Interpreter can be a real help. So let me connect to data that sits in a Microsoft Excel file; the data set is the FDI case study. If I double-click on it, the data is in two different sheets: one holds the analytics questions, the case study questions that I prepared and posted earlier, and the real data is in the other sheet. I can take this data and drag it onto the canvas, and you can see the data input step is created, with information below about the field name, original field name and sample values. But something odd is happening: we don't get the real field names, just F1, F2, F3 and so on as arbitrary field names, and that is mainly because the data is not positioned in the sheet in a way Tableau Prep can understand. Let me show you what I mean in the sheet itself. This is the data we are trying to import, in a tab named data: the first row holds the title "complete view of data", the second row is blank, the third row is "show entries", the fourth row is basically a search box, and the fifth row is where the data actually starts, continuing down to about row 71, with row 69 being the last row that has data in all the columns. So how can we import this data properly, given that right now the information is not coming through? For that you can use the Data Interpreter: simply check the box. Once I check it, Tableau Prep runs its algorithm in the background, identifies where the data is actually present, and down below you will see that it has picked up the proper field names and their respective values. This way you can bring in data that is not properly formatted, and once you have it you can add a cleaning step and analyse it further. Mainly I wanted to show you how to use the Data Interpreter in scenarios like this, so that you don't face issues importing these formatted, report-style tables, which can otherwise be a pain; thanks to the Data Interpreter you don't have to do all that manual processing in the Excel file, Tableau Prep can take care of it. So that's about it, and I'll meet you in the next video.
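The Data Interpreter works out where the real header and data rows sit on its own; a rough manual equivalent in pandas would look like the sketch below, where the file name, sheet name and the number of skipped rows are assumptions based on the layout described above.

    import pandas as pd

    # The report has a title row, a blank row, a "show entries" row and a
    # search row before the real header, so skip those four rows by hand;
    # the Data Interpreter figures this out automatically.
    fdi = pd.read_excel("fdi_case_study.xlsx",  # hypothetical file name
                        sheet_name="data",
                        skiprows=4)

    # Drop any fully empty rows left over from the formatted report
    fdi = fdi.dropna(how="all")

    print(fdi.head())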
Hi there, welcome back to the next video. In this video I am going to show you how you can create a long structure from a wide structure within Tableau Prep. What that means is this: in the previous video we imported one of those formatted tables using the Data Interpreter, which cleaned the data by looking past the title and the formatting that had been applied to make a nicely presented report, excluded all of that, and kept just the data we want to analyse. But if you look at the structure of that data, you will see it is a wide structure, a sort of pivot structure, where the first column on the left holds the sector information and then there is a separate column of values for each year. Tableau works best when the data is in a long format instead of a wide format like this: in the wide form there are a lot of restrictions on what you can do, whereas in the long format you can use more functionality and build more visualizations in an efficient and better way. The way to convert from a wide structure to a long structure is to add a Pivot step, which effectively unpivots the data (see the short code sketch after this passage for the same reshaping in code). So after the cleaning step we add the Pivot step, and it asks us which columns we want to take into the pivot, which is all of the year columns. I can either drag them in one by one, or, as you have seen earlier, use the wildcard option: uncheck the manual selection, click on the wildcard, type "20", and press Enter, and all of the matching columns come in. Once we have those values, you will see that new columns have been created, named Pivot1 Names and Pivot1 Values, and all of the values that used to sit in separate columns have been converted into a long format, so that each row now holds the respective value for its respective sector. We can double-click the pivoted names column and call it Year, and rename the values column to Values, or to whatever that particular measure actually is, whether it is a production value or something else. Once we are done, if I add another cleaning step you will see the Year information, the Values information and the Sector information. It is up to you whether you want to clean further, for example by splitting the year values, but mainly I wanted to show you how to pivot the data into a long structure on which Tableau can perform analysis and visualization efficiently, so that you can create a stunning dashboard. That's about it for this video, and I'll meet you in the next one.
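As a companion to the Pivot step just described, here is a minimal pandas sketch of the same wide-to-long reshaping; the sectors, years and values are made up for illustration.

    import pandas as pd

    # Made-up wide table: one row per sector, one column per year
    wide = pd.DataFrame({
        "Sector": ["Services", "Telecom"],
        "2000-01": [100.5, 50.2],
        "2001-02": [120.0, 65.4],
    })

    # Unpivot the year columns into a Year column and a Values column,
    # which is the long structure the Pivot step produces
    long_df = wide.melt(id_vars="Sector", var_name="Year", value_name="Values")

    print(long_df)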
Hi there, welcome back to the next video. In this video I am going to show you how you can aggregate data within Tableau Prep. Aggregating data is a common step we want to perform most of the time when analysing data. Over the past two videos we have imported the data using the Data Interpreter and then pivoted it to create a long structure, so what we have now is, for each year and each sector, some value. It could be a production value or something else, I can't exactly recall what the values refer to, but I think they were FDI values, because this was the FDI case study, that is, foreign direct investment coming into India for the respective sectors. But what if we want, irrespective of sector, to understand the average FDI that came in each year? To create a structure like that we add one more step, and this step will be the Aggregate step. As you can see, there are two sections here: Grouped Fields, which are basically your dimensions, such as Sector or Year, and Aggregated Fields, which are the numerical fields, in our case Values. So, to repeat the question: what is the average FDI recorded in each year? I double-click Year to bring it into the grouped fields and bring Values into the aggregated fields. What we want is the average, but what we have got by default is the sum, so we simply click on it and choose the measure we want; the options include average, median, count, count distinct and so on, and we pick average. As I select it, the aggregated field appears, but the values are summarised overall rather than per year, so we click on the detail view and now, for each year, you can see the average FDI that came in that year. For example, for 2001 I think it was around 680.37; it should be in millions of dollars, or maybe billions, I am not entirely sure of the unit, but the point is that you can see the respective FDI value for each year. If you look back at the pivot step, it had around a thousand rows, 1,088 to be precise, and when you move to the aggregate step you have just 17 rows. So think about this: if you aggregate the data to the right level for your analysis, it can really help you save a lot of processing time and give a faster response when you show the results to your users. So whenever possible, based on the dimensions and measures you need, try to aggregate as much as possible and then take an extract out of it to start producing the dashboard. That's all I wanted to show you in this video, and I'll meet you in the next one.
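To round things off, the Aggregate step above corresponds to a plain group-by with an average; a minimal pandas sketch, continuing the made-up long table from the previous example:

    import pandas as pd

    long_df = pd.DataFrame({
        "Year": ["2000-01", "2000-01", "2001-02", "2001-02"],
        "Sector": ["Services", "Telecom", "Services", "Telecom"],
        "Values": [100.5, 50.2, 120.0, 65.4],
    })

    # Group by Year (the grouped field) and average Values (the aggregated
    # field), ignoring Sector entirely, just like the Aggregate step
    avg_per_year = long_df.groupby("Year", as_index=False)["Values"].mean()

    print(avg_per_year)

Just as in the flow, the grouped result has one row per year instead of one row per year and sector.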
Info
Channel: Abhishek Agarrwal
Views: 21,635
Rating: 4.9298244 out of 5
Keywords: complete tableau prep tutorial, tableau prep, tableau prep builder, tableau prep hands on training, tableau prep training, tableau prep builder training, tableau prep tutorial, new tableau prep builder course, tableau prep;, tableau prep builder tutorial for everyone, tableau prep training for beginners, visual analytics, business analysis, data analytics tool, business dashboards, tableau, tableau software, data prep, Tableau data prep, tableau prep overview
Id: sVf5O6-zNEY
Length: 94min 44sec (5684 seconds)
Published: Thu Jan 30 2020