Data & Business Intelligence terms every analyst *should* know

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
are you confused by the jargon of data modeling and business intelligence in this video i'm going to try and explain some of the keywords in the data modeling and business intelligence world i'm going to try a different technique this video will be almost like you standing just right next to me watching over my shoulder while i explain the concept i'm a little nervous to try this concept of explaining this but i'm also excited so let me know in the comments if you enjoyed this style of delivery let's jump it let's start off with one of the key concepts when it comes to data modeling a table now a table is a collection of rows and columns here is a table of some transactions that are happened on our fictitious awesome chocolates company as you could see here we have a date in the transaction against which product the product id which customer bought it and how many units so this is a table now these kind of tables where you record things or as they happened are usually referred to as fact tables because they contain the facts but if i want to know for example what is the specific product that particular customer bought or who is this customer what is their gender then we are not able to answer those questions from this type of a table you need a supporting table to provide that information so here is a dimension table that tells me the information about the product so here i can see that product px23 a11 is a milk bar belonging to the bars category and it cost 5.26 so this is a dimension table because this tells you the information from the product dimension so here is my fact table here is my dimension table and here is my customer story which tells you who is the customer what is their name how old they are and what is their gender and this goes on like that you can also have a dimension on the date values itself such a dimension would be a calendar table so a table is a collection of some rows and columns you can have a couple of different kinds of tables you can have a fact table you can have a dimension table now when it comes to dimension tables there is a special type of a table called calendar table that shows and talks about time so in the data modeling or business intelligence world you keep hearing these words fact dimension and calendar fact is where usually most of your numbers are stored and the dimensions explain the story from different dimensions one easy way to think about fact and dimensions is that fact tables are usually tall and narrow let's take a simple store like awesome chocolates they will be selling hundreds and thousands of chocolates every day so you will have fact table that records all the facts as they happen and this will quickly grow to be millions and millions rows long whereas a dimension table on the other hand tends to be fairly short but it is wide if you take your product you may not have millions of products but for the hundred products that you have each product may have 100 different columns each column talking about different sides of the product so that's what a dimension would be so a fact is a tall and narrow table whereas a dimension is a short and wide table now that we understand what tables are let's understand how tables can be related to each other this is the relationships that we normally talk about so for example here is my fact table that tells you the transactions and here is my product table and you can take the product id like this p px23 a11 and then attach it to the px23 a11 here to figure out that this particular transaction refers to milk bars of bars category and the price is 5.26 in this situation because this particular product can be purchased many times by different customers the transact the product id px23 11 can appear multiple times in the data set but it will only appear once in the dimension table because of this nature this relationship is many to one many times here one time there just as you have many to one you can also have relationships that will be one to one one to many and many too many whenever there is a relationship the relationship would be based on a column and this kind of relationship columns are loosely referred to as keys not to be confused with the keys that you open your doors or the ones that you use on your pianos whenever there is a key concept the keys can be divided into foreign key and primary key primary key refers to the column in the table where the information is maintained so for example product id if i want to know the full story of the product i will go here so in this case the product id column in my products table is the primary key foreign key is the key where it is used just to link it up so here i have the same id but i don't really know the full picture of this id so this is my foreign id here foreign key and usually when you are making a relationship you will attach the foreign key to the primary key now let's talk about modeling [Music] oh wait not that kind when you have multiple fact and dimension tables you can connect them and hence you will end up creating a model so in this case we have our transaction table and this fact transaction table is connected to my products dimension my customer dimension and my calendar dimension when you end up connecting these through lines what you have created is a model now this kind of a model where there is a central fact table and then multiple dimension tables on the outer edges is referred to as star schema it is called star schema simply because it looks like a star with a central concept and multiple spokes on the edges you can also have another type of schema called snowflake schema so for example from the product you can attach this product to a vendor table that talks about which vendor supplies which products in this case what you have built is a snowflake because we start off with a star but some of these outer nodes themselves can be a star with more connections on the edges within the world of power bi and many other business intelligence applications it is fairly common to find a star schema but occasionally you will bump into snowflake as well let's talk about measures kpis and values going back to our fact table let's say for some analysis i want to see how much quantity has been purchased so we could kind of add up all the quantities to see what is total quantity but if i want to see this quantity broken down by individual categories or gender groups or by day of week we need to create a special calculation whose definition would be sum of quantity column and then we need to refer to that definition every time we calculate the total quantity against a specific category or a specific gender group this definition of what you want to calculate is referred to as a measure alternatively we also call this as aggregation or value you can relate to the word value from the pivot tables that you may have created in excel so all of these are interchangeable words but within the world of business intelligence normally this is referred to as measure when you have a measure such as total quantity you can then calculate in for all the values or you can kind of break down and look at it by individual categories or gender that kind of thing is nothing but slicing because we take the total quantity and we slice it into chunks one corresponding to each of our product categories another definition or another word that people use for slicer is filter because what we are doing is we are just filtering it down to look at bar category total or female customer totals so imagine a star schema like this with some tables and some measures that you have attached to the fact table that tell you total quantity total amount etc all of this together that the collection of tables their relationships and the measures are referred to as model again not this kind of model another word for model is cube or even very loosely a data warehouse so when you talk about building a model within the context of power bi this is what we are doing we are bringing in some tables we are making connections between them we are defining some calculations and that's what data modeling refers to let's back it up here for a minute and talk about how we obtain the data usually your data is coming into the reporting platform such as power bi by means of some connections because the data doesn't really automatically sit within the reporting systems it has to come from somewhere else so this is the process where we connect to the data sources the data source could be your sap system your sql server database it could be a customer relationship management system like a sales force or whatever so all of that process is what i'm going to talk about next so here is my transaction table and in order to get this we need to connect to our sales system and then bring the data over this process is usually referred to as querying because i'm asking the system to provide me with a snapshot of the data within the context of power bi in excel this is also referred to as power query but let's say your system that provides you with this data doesn't really give you the data in this format it gives you in certain other format then occasionally you may have to transform it change the shape or size of the data or the nature of the data such transformations can be done directly in the source system and bring the data over or they can be done as part of the process wherein you are fetching the data this is where power query offers you some solutions as well because we may need to transform data sets from multiple systems and then bring it all over for our reporting solutions normally we engage in a process called etl etl refers to extract transform load extract is where you go into different source systems collect the data transform is where you change the shape or size or nature of the data and the load refers to loading the final output data into your reporting platform such as power bi because such etl processes need to run periodically so that your reports are up to date you'll commonly hear about another word called as batch process a process that runs at a specific point of time or when the specific conditions are met and then it will run all your etl processes give you final output data so that you can update your reports you also hear about this word data pipeline not to be confused with the ones that you smoke which refers to the concept of how you are getting your data what happens to the data as it travels through the pipeline before you generate your reports another common set of words that we hear within the context of data modeling is source and refresh source refers to the place where your data is originally maintained and refresh refers to the process wherein we ask our reports to be updated based on the latest data did you enjoy that video i'm also planning to try and record a similar video explaining the power bi concepts and technical terms let me know in the comments if you would like to see such a video and if you would like to learn more about data modeling measures and the power pivot dax engine check out my live stream that i did a while ago it is linked on the screen right there thanks again for watching tada
Info
Channel: Chandoo
Views: 23,323
Rating: undefined out of 5
Keywords: chandoo, chandoo.org, Excel, data analyst, power bi, what is star schema, what is snowflake schema, define data modeling, explain table relationships, business intelligence jargon buster, data modeling explained, data analyst interview questions and answers, key terminology in data warehousing, technical terms in data science, what is data warehouse, primary key vs foreign key, define etl, what is a measure, measure vs. aggregation
Id: a906b80lCE8
Channel Id: undefined
Length: 12min 20sec (740 seconds)
Published: Wed Jul 14 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.