Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehouse Architecture | Edureka

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi guys my name is varda and on behalf of ADA Rekha I welcome you all to this webinar on data warehousing so I'll be your instructor for trade and in today's session I'm going to talk about the data warehousing concepts okay the weariest concept that are related to data warehousing like OLAP schemas dimensions and facts all these things are all the important topics I'll be covering all these things in today's session alright guys so without wasting much time let me just go to the agenda slide so guys this is the agenda slide and this is the topic that I will be covering in today's session okay and this will be the order in which I'll be covering them for those of you who are new to data warehousing I will give you a quick brief about data warehousing and I will talk about what data warehouses and talk about the data warehousing architecture in detail and then I will immediately go to the different concepts involved so the important concepts that are there are what overlap is so OLAP stands for online analytical processing and I'll talk about will app and detail and then I'll talk about the various will apps cubes that we will usually work on right for any company or anyone working on a data warehouse the different OLAP cubes that they work on so I will talk about these things and after that I will talk about dimensions facts and measures so these three topics are interrelated so I will talk about them in one segment and the final segment of today's section is going to be about schemas all right so this will be the three or four different topic that I'll be covering in today's session so guys any doubts here are you guys fine with the agenda and and I hope everyone can hear my voice and see my screen so guys if that's the case please put your acknowledgments in the chat box and if you have any doubts during the session please put them also in the chat box and I'll get back to you at the earliest all right so can I get up with yes one everyone all right great so have a couple of guesses from here all right Anita is saying yes no Phil is saying yes all right great a nutbar is saying yes all right guys so let's get started then first of all what is data warehousing okay let's understand what our data warehouses what data warehousing does and why it's seeded and what are the added benefits right so what are the data warehouse a data warehouse is like a relational database designed for analytical leads so you guys might all be aware of water databases right so you have your database where you Beast large amounts of data right you'd be soaring either in your database or even in form of flat files okay but flat files like Excel or something cannot be used for storing large amounts of data there is a limit on how much you can store in an Excel right so that is the case why you stored data and some kind of database so example of databases are those of Oracle my sequel Microsoft sequel server all these things so you have all these databases you will store large amounts of data okay but the problem is they cannot be used for analytical purposes so that is why we have to convert or you know get the data into a database get that inside a data warehouse and when it's inside the data warehouse then we can perform analysis so that's the advantage so the data that once has come into the data warehouse can be used for analysis for visualization all these things because you can look at the data and from a different angle because the data that comes into the data warehouse is more information rather than just data right so what you have your resist data which you can probably view and you can write data into these places and you can just view preneur high levels okay you can view from the level of every year transaction that happens so every time a sale happens then the entire table there or the entire database will get updated okay all that will happen button when you want to perform analysis like what happened on this day and how many products were sold during this period of time all these things cannot be gained from a database right so that's why we need to put them all into the data warehouse and once it's here we can perform these kind of analysis and you know get insights from the data warehouse and of course since the data borrows your raw or your end-users your business analysts your data analyst they can all use it for analysis and visualization right so that's the thing about a data warehouse and that's the use of a data warehouse and it functions on the basis of full app right so I told you will a person for an online analytical processing so the whole process of fluid analysis or running queries on data where I was right it's done on the basis of online analytical processing right so that's one of the activities that is done so if you are running or any kind of queries on the database then it will be called as LDP online transaction processing but any kind of activity or queering on all these things when it happens on data it's called OLAP so that's what an OLAP is guys and yeah basically the central location where a consolidated data from multiple locations are stored so you get data in here from multiple locations from multiple sources you can get it from you know one or two databases you can get from one or two flat files you can get all this data as long as you know you are putting it inside the same table or inside the same columns and dimensions then you would be fine because you can get any amounts of data from any source and you can get them all and store them all inside your data warehouse so is that all about data warehouse is data warehouse so simple can we just simply get the data in from database and get it into a data warehouse and then perform analysis is such a thing so simple is that going to be the question on your minds okay does anyone here has that kind of fit out if you guys have that out please raise your hands raise there's an option on the right side of your raw panel there's no option to raise your hand so I mean you're supposed to get that kind of a doubt and it's a very valid question right because it looks so simple from the outside but is it that simple yes okay I see a couple of hands raised from Akbar and I see a hand raised by Anita all right so yeah that's the thing okay it's not that simple now let's see what the other activities that are involved in this whole process of converting data inside the database converting it into an information and storing that into data warehouse and then performing analysis right so these are the entire set of activities that are involved okay now we can also consider this to be the data warehousing architecture now before I explain all the activities and all the different stages that are there in this data warehousing architecture let me first of all go to the definition okay so this will ask the scholar data warehousing all right and data warehousing is the act of organizing and storing data in a way so as to make its retrieval efficient and insightful okay so this is the key terms your to make it retrieval efficient and insightful so you organize the data and you so the data in a data warehouse in a way you can access the data at a later point of time and that kind of an access should be easy you should be in efficient ways to be able to retrieve the data there could be meaning out of that data it should not be the same data that you store inside your database okay so inside the database is the data that you collect so you collected on your sword inside a database okay but the data that you get from a data warehouse should not be the same so it should serve some purpose it should have a meaning and it should be for your benefit so that's the whole point for data warehouse so that's the difference between a data warehouse and a database all right and the data warehousing is also called as a process of transforming data into information okay so relevant data or more process the data that is also called as information so that's what is stored inside a data warehouse and there's a lot of difference between the data stored in a data warehouse and the data that is sold inside a database all right so that is the thing now while speaking about the architecture and means let me explain the diagram here first of all so you have your data inside your different data sources right it can be a database or a flat file and that data should first of all be transferred into the data warehouse but how was it done there is an intermediate stage here that's called the staging area okay the place where the data is stored that is called the staging database okay now this is a temporary your storage it's not going to be there for long it's very temporary so it and the act of getting the data from here to your sitting area is done through the process of ETL okay so ETL your stance for extract transform and load ok and once it comes into the temporary storage again the cross of ETL is used to prance the data inside the data warehouse alright so so once the data is in take your data warehouse it's not just your raw data that's going to be your ok it's going to be categorized and it's going to be divided so you can think of for these three things mainly so you will have your raw data you will have your radar data and you will have your aggregate data ok so first of all your raw data is the rows or columns or the actual data that's being transferred ok so this next of the junk of the data okay but metadata is the most important aspect which power the data warehouse and which differentiates between database and data warehouse why because metadata is something that's going to work give you data about your raw data so whatever rules that are there in your data warehouse or so whatever information that is there inside your data warehouse right so information about that particular data will be stored in secure metadata so we'll have data about which are the different tables that are there in your data warehouse what each table does and what kind of attributes are there inside each table what kind of information what will be the data type of the data present on each of those attributes so all these things would be controlled and governed by metadata right so without your metadata your data warehouse is basically useless and metadata is the you know so anyone who is working on our data warehouse right they will tell you that metadata is what differentiates everything that's what makes their life simpler and that's what is actually the difference between a database and a data warehouse right so this serves the whole purpose and the whole difference between the two so that and then you have aggregate data right so all these three things together they form the data warehouse and then once the data is inside your data warehouse your end-users can use this data to perform analysis so how do they do it they run queries okay so you have user groups here and they perform queries okay now the act of performing queries on a data warehouse is called online analytical processing okay that's the OLAP okay so such queries are called OLAP queries because they will be analysis based right so such queries will be analysis based so that's why it's called on an analytical processing and you have something called as data marcher okay now don't get confused because data Mart's again not something entirely different it's again a part of data warehouse zoning so the end-users your they can run that ways either directly on later where else or on separate data match now you have something separately called a data Mart over here because maybe you want to give the extra security in your data warehouse systems right so you can see here there is user group 1 there is user group 2 and there is user group 3 ok and each of these people they don't get access to everything and the whole point of even having the data Mart is are simply to the restricted access to your data warehouse so your organization's or your company's entire data right whatever is there that will all be sold in sitio data warehouse ok but what if you don't want people to get access for everything and if you want to just give them restricted access and give them access only certain tables and certain records so at that point of time you can use the data Mart so a data Mart again you know it is called small version of data barrows where it does have data about a particular domain so supposing the user group won the by talking about here is something with sale scene right then you can have a sales data Mart and the fields data Mart will only have the sales data from the data warehouse right and then supposing this is the operations team right and they want to access to the operations data only so then you can form a separate data Mart for the operations team called the purchase Mart or something and then based on that they'll get access only to the data that is present inside this Mart okay so even even though do all these things we present is at the data warehouse so you get access only to this particular table or this particular domain or this particular field so that's what the difference is so data Mart is just basically dividing your own data warehouse into smaller parts so that different sets of users or groups get access to only the relevant information that they want ok so that's what data warehousing is and these are the various equities are involved in data warehousing ok so this is the diagram of the data warehouse architecture and I hope nobody here has any doubt this again is do have an outer please or shoot them right away and I'll ask them because next up is the various data warehousing concepts ok now if you want to understand data browsing concepts in detail then you have to have complete understanding of this I mean of course it's simple so I don't expect you people to have any doubts but yeah if you do have any doubts please put them in the chat box and I'll get back to you all right so moving on the data warehousing concepts now let us understand the various concepts revolving around data warehousing like OLAP dimensions facts and schemas so first of all let me give you a brief about will up again let's go into details here as it says will up is a flexible way for you to make complicated analysis of multi-dimensional data ok so when we say multi-dimensional data then whatever data is stored in a database it has multiple views right so they'll all be sued in such a way that you can perform analysis then we store all the different tables will be linked with each other right so there'll be different views every different categories of data and all these things you know to perform analysis then you use the OLAP activities right so the hola you run on that data and whatever data stored in a data warehouse that's called multi-dimensional data and the very act of you know storing it is in the form of OLAP cubes right so you have OLTP on one side and we all roll up on the other side so as you can see this line it says OLTP systems use data stored in form of two-dimensional tables okay an example of this would be any excel so you will have your rows and you'll have your columns here okay you'll have your rows and columns and you'll have your data stored over here so this is your LDPC can run on them and you can not do all these things okay but reference with the OLAP is they'll be stored in form of OLAP cubes right so they'll be multiple dimensions there will be multiple views that you can get of the same data with respect to the year with respect to the number of sales with respect to the different products all the things you can get a single view of all these things so as you can see this diagram is more for 3d view right so that's what will a pure stands for so we call it OLAP cubes and a word word reader that is stored inside your data warehouse and with Radar you access with the help of lab activities then that data is called as multi-dimensional data so that's what the whole line here is all about right so your data basically is converted into a multi-dimensional form and that particular data to access that data you use OLAP queries and yeah it will be stored in such OLAP cubes right so these are the three different things and the next point it says data warehousing is modeled on the concept of OLAP yeah this again this is what I told you right so what our data is stored inside a data warehouse is basically stored in a multi dimensional form and it's a certain form of cubes and to run your queries on your rock you you run OLAP queries right so that's why the entire process called OLAP cubes and processing multi number dimensional data on your OLAP cubes right now that's where data warehouses but with your data based database however are remodeled on the concept of OLTP right online transaction processing so your data bases are modeled on the concept of Ltd and your data warehouse is modeled on the concept of OSAP or so that's the key difference between the two or the based on which these two are remodeled and your LTP systems use data as sorted form for two-dimensional tables with rows and columns and your OLAP will have multiple views right so your OLAP cubes and cuboid so you will a password in that infrastructure but whereas your LD be there you just have your x-axis or y-axis or your two-dimensional data right so that's the thing about OLTP and yeah that's the difference between the two and well the advantages of over LTP are first of all that when you can see data from multiple angles right from different angles from different views from different dimensions then you will get new ideas right so you get new insights you can think of different things and your whole job of making analysis and getting insights will become simpler it will become easier with the hazard book or lab activities right because it supports so activities like filtering and sorting of data so maybe filtering and sorting of data may be possible even inside OLTP okay but there is a limit on that so if when you have data stored in multiple tables then you cannot do filtering and sorting of data right now that is the advantage with a data warehouse so even if you have data stored in multiple tables you might have four different tables or five different tables and will have like thousands and millions of rows and you know transaction that will be stored in those tables okay in spite of all these things you can perform filtering and sorting operations or any mathematical operations like calculating all these things okay you can do all these things and get the output inside one single new table so that is the difference between OLAP and OLTP right such activities cannot be supported on OLTP idea so it finally says that you know data can be refined so when you can perform such kind of analysis when you know your data looks much simpler for you to read right when you can gain some information out of your raw data then we called refine it so that's the other advantage of OLAP you know that the other advantage that Olaf has over OLTP so that's the other thing about on an analytical processing right so now I've spoken about these things now let's go to the next slide so first of all in OLAP access there are three different types of cubes okay so the first one is molap roll up and roll up okay so you know it we also call it m ola mo lab or mo lab so this answered multi-dimensional online analytical processing rollup stands for relational online analytical processing and whole app stands for hybrid online analytical processing alright so guys the whole topic here is OLAP cubes right so your OLAP cubes is where your data will be stored okay now the analysis that you will do the kind of queries that you'll run they will all be OLAP queries and they will all be on the multi-dimensional data okay so the data that's going to be stored inside your cubes is going to be multi-dimensional data okay but your multi-dimensional data where are they going to get stored okay so that is be your whole topic sure so there are three different types of places where you can sort your multi-dimensional data okay so there are three different types of OLAP cubes so the first type is multi-dimensional OLAP cube okay when we say multi-dimensional OLAP cube then this is your default type of OLAP cube so your mole up is a form of olá that process and source data directly into a multi-dimensional database so you have a multi dimensional cube and then your readout be swollen site that particular database now the advantage here is that it will give you excellent performance and it can perform complex calculations okay but the problem is only limited amount of data can be handled in your Molag okay but then there's a difference between molap and OLAP ok I'm not that's where rollup scores over at molap so roll ups and for relational online analytical processing now roll up the form of OLAP that performs dynamic multi-dimensional analysis of data stored in relational database rather than in a multi-dimensional database now what this means is that in your molap you have your multi-dimensional data you will be storing it inside a multi-dimensional database right so this will be your Ola cubes okay so what that means is you have your data that we saw inside a multi-dimensional database okay thus making it a multi-dimensional data and that is one type of lab that is a multi-dimensional olá but relational olap is getting that multi-dimensional data converting it into a relational data unless you're going to get a relational database right so that's what it says here in the process of storing data in a relational database so in your roll up you will have your multi-dimensional data stored inside your relational database and then you will be running multi-dimensional analysis and multi-dimensional queries on a relational database so that is the difference right so your queries that you'll be running your data is basically going to be multi-dimensional all right even your queries that you'll be running will be the same it'll be OLAP queries okay but the difference is the place where it's stored the data in case of for OLAP is stored in a relational database okay but whereas in case for multi-dimensional OLAP it is stored inside a multi-dimensional database itself now the other difference between the two and the speaking of see advantage and why rolap can be used instead of molap is because a greater amount of data can be processed in this case all right but in case of small app what is limited amount of data can be handled at any point of time but however the problem is it requires more processing time and a lot of disk space now it needs more processing time because you are going to convert your multi-dimensional data into a relational data and once you've converted that you are then stored into a relational database now this is certainly more time consuming than your multi-dimensional OLAP right so that is one set context with your raw relational olap and also your amount of disk space that will be occupied because of all these process is going to be greater now that's the disadvantage but of course that comes with the benefit right so you're getting something out of using this so that's the difference between multi-dimensional OLAP and relational olap and then this is the two basic differences and then you have a third one that's called as a hybrid OLAP right so your raw hybrid will have this basically combination of both your molap and your roll up so the positives and your advantages of both is used in your raw hybrid Allah so the advantage with the roll-up is that kulap can drill through from the cube into the underlying relational data right so what it means is you will have your cube here and you'll have the underlying relational data right so using hole up you can drill 2 into the relational data using your review so that's the thing about whole app which makes use of the best features of both your multi-dimensional OLAP and your relational olap right so this is the three different types of OLAP cubes so this is again one important question that you'll be asked in any job interviews right so if you guys are you know operating for any job interviews then you have to know the difference between these three cubes because you your work will primarily be revolving around these three aspects so so Kate in the audience do we have anyone that is going to appear for an interview very soon awkward naw fulanita any any of you okay versus he is going to attend one very soon all right so you're still learning upward okay great so he's still learning data warehousing and probably when he's done he might go for it an interview and the thing that you need to remember Robert is the difference between these three because this is a short question in any of your interviews right so any job interview regarding data warehousing and all these things you will be asked and you should know all these things right so that's the differences and the types of OLAP cubes so moving on to the next slide so the next topic that we have for today is that up or lap operations all right so we have five different operations and they are roll up drill down slice dice and pivot okay so these are the five different OLAP operations that we can do on our draw will have some multi-dimensional data right so way to our first operation that is the roll-up now before I explain this let me tell you why I am explaining operations first of all so I am explaining the OLAP operations so that you can understand what are the kind of operations that you can do on your data warehousing and the things that you cannot do on your data base right so with the help of OLAP and since data stored in such a multidimensional of fashion these kind of operations can be performed right so first of all let's get started with our roll up operation so let's read the definition first so roll up or something that forms the aggregation or data cube by either clamming up concept hierarchy for a dimension or for dimension reduction okay so when we say climbing up the concept hierarchy for a dimension it basically means you will have a particular dimension here right so in this case we have the dimension of cities so we have four different cities we have bus Chicago New York Toronto and Vancouver okay now supposing you want to go to the next hierarchy in this dimension okay you want to add a couple of attributes here and more to the next dimension of countries then you can do that so if you look at this example here we have converted this dimension which is in the form of cities to countries so USA is a country which has Chicago and New York as cities and Canada is a country which has Toronto and Vancouver as cities so we have converted this dimension into the city dimension of that special in this cube into country dimension right so this is the kind of operation that we've done so we've done that using a rollup operation right so that's the thing and let me first spin this concept or this example so first of all we have this cube here and we have different axis right we have x axis we have y axis and the z axis so these are x axis where we have the different items right so the different types of products there are they're for sale so we have mobiles modems phone and security right so this is the four different items that are there for sale and then we have the time time at which the phase were done like different of quarters or for your quarter 1 4 2 4 3 a quarter for right and then we have the different cities Chicago New York Toronto and Vancouver so this is the ESL axis y axis and this is the x axis now when we say we are doing a roll up your roller corporation we are basically converting this dimension into a different dimension with the different set of attributes right we are changing this whole dimension so in this dimension right now we have these two cities part of from USA and these two cities from Canada so we are doing a developed such that we are doing a rollup on location and converting the cities into countries so if you look in this year we are aggregating these two attributes right so these two different attributes together and these two attributes together so when you do the roll-up and when you convert this dimension then what you get is for USA you have the total number of stages 2000 okay because it is the aggregate of New York and Chicago for 40 plus 1 5 6 0 but whereas for Canada its 395 because only Toronto adds to this and there is a knit from Vancouver all right so that's what we said when we are doing a rollup so we are doing a rollup the items based on any particular dimension right so this dimension we aggregated a part of this dimension and we converted this dimension from Mercedes to country so this is the kind of operations that you can do on the set of attributes that will be present inside your dimensions so that's about the roll-up okay now the next operation that I want to talk about is out of drill down now drill down is something that is the reverse of Road up so what we did in a roll of course we aggregated set of attributes right so let's break down the entire attribute into smaller attributes all right so we can do that by the stepping down of concept hierarchy for a dimension okay and by also introducing a new dimension so if you look at this example it's a similar cube where you have a similar DDS where we have tails for the quarter one we have the four different total years and basically the dimensions the same right so we have the items or types for Russell we have the different time and the different cities okay but when you look at this diagram what you've done is when you say drill down we have there and drill down operation on the time so the time here right the time dimension we have represented each of them by quarters so quarter one or two or three and quarter four but the thing is each quarter can be further broken down into different months so in what one you will have January February and March right so what we've done without the for drill down operation is we've broken down this quarter one into the different months respectively so if you look at this one here if you look at PE or security that has been sold 404 to one then if you look at that then that's been divided into three different months right so in January February and March and how many products have been sold and each of those are three different months so this is the kind of for example right so we've done a drill down on the time so here it was in the form of a quarter and we drill it down this whole particular attribute into three different attributes three further attributes right in two months and similarly we have three other entries here and each of those would be divided into different attributes so through for four different attributes we have twelve attributes where each of the database habit and further divided into three attributes summing it up to twelve in total so that's the other thing so that isn't between a rollup and a drill-down so let me go back to the previous life remnant so in this case what will just be aggregated these attributes and we reduce the number of attributes in that dimension right so we converted them from four number of attributes in the form of studies to two and boosting countries so two attributes in this dimension all right so in this particular dimension we have four different attributes and we've increased that to twelve different attributes by adding three attributes to each and every single attribute here right so that's the thing that we can do with dimension so that's the the next operation that we can do on dimensions right so that is the difference between roll up and roll down all right guys so all through all right great so let's move on then so next all operation I'm going to talk about is out of slice operation right so the slice operation provides a new sub cube from one particular dimension in a given cube now what this means is in any cube of us we'll have three different dimensions right so we have our access here we have our y-axis and we have our x-axis so three different dimensions here and what the slice operation means is with the help of three in the three different dimensions we can use one of the dimensions and break it down into a two dimensional cube right so if you look at this example here here we have only our y-axis and we have our x-axis right so the z-axis is missing now what we've done is we have taken the time and we've done a slice of the time so we've done a slice for only quarter one so here we have different representations for different quarters so quarter one or two or three and quarter for now the data that is displayed here is with respect to only the quarter one and since we have drill down into the quarter and since we have taken a time as a reference we've replaced the time dimension here with the location dimension right so this comes to the y-axis and the whole representation here is only with respect to q1 right so that's what you mean with respect to our slides so you slice one particular dimension and you get into the videos of that particular mention with respect to the other dimensions so that's the slice operation and then moving on to the next slide we have a dice right so the dice operation provides the new sub cube from two or more dimensions in a given cube right so in the earlier examples we saw you know we saw slice right so slice what it does is it you know gives us a new sub cube from one by using one particular dimension in a given Q okay but the ties here it gives us a new sub cube from two or more dimensions in a given Q so if you look at this example again we have the location dimension we have the time dimension and we have our items dimension here correct and what we're doing here is we're using the dice for a location okay so we have four different cities so we are run dicing only for this location of Toronto and Vancouver right so because we are dicing only for Toronto and Vancouver in our final large cube we have only Toronto and Vancouver here correct and then we're also sizing the tile here so we are doing a dice operation on the time with respect to quarter 1 or quarter 2 so we are only dwelling into the details of Q 1 and Q 2 here and again the item is the third dimension which we are dicing and we are realizing it's on mobile or modem so here we have a mobile modem phone and security right so these two are ignored and we are dicing the item only for mobile and modem so here if you see the x-axis we have only mobile and modem and after the dice the y-axis has only q1 and q2 and the racks will have net around on Vancouver so that is the three different dimensions that have been dies right so if you are going to you know get a new sub cube by your making use of two or more number of dimensions in your cube then it's called the dice operation but however if you are wrong getting a new sub cube from your existing cube by your slicing only one particular dimension then it's called slice operation so that is the difference between slice and dice okay so we have the difference between roll-up and drill down then you know the difference between the slice and dice right so that's the difference between these two and then finally we have one operation called as pivot operation so the pivot operation is also known as the rotation operation okay it basically transposes both the access whether it's the X and Y axis it transpose it them in order to provide an alternative presentation of data now if you look at this example here we have these location dimension on the y axis and we have the item dimension on the x axis right now these two are row transpose now that's what a pivot does so when you transpose them your item comes your and your location it comes over here your location dimensions your and your item dimension is here right so even your data corresponding to them are transposed so I hope you guys got Teja will learn the concept of operations right so the five different operations which can be done with the help of OLAP data warehouse so I hope you guys got the entire concept here right so the kind of operations that you can do using your OLAP activities right on your data warehouse so that's the thank you right so these are the five different operations and moving on to the next slide so the next topic that I am going to talk about is that of dimensions all right so the tables that describe the dimensions involved are called dimension tables so first of all to give you an example of what a dimension is in data base you have something called as tables right so you have different tables which will be a part of your database now similar to tables we have something called as dimensions in data warehouse so we have dimension tables which will have a set of attributes so our customer dimension will have the customer details like the customers ID the customer name and the customer address okay and your product dimension will have other set of attributes like your product ID product name and your product type and similarly you will have a date dimension which lab the order date the shipment date and deliberate it right so each of these dimensions they talk about a different aspect of your entire data warehouse so your data warehouse may be one for you know in this example it is that of e-commerce company right and as you can see here it says the e-commerce company here this is the subject and the dimensions are these and these are the attributes right so first of all why would we need dimensions why should we use dimensions right so we go to the table for having different dimensions instead of data warehouse similar to having tables in our database we have different dimensions in our data warehouse why should we have all these things you can have that kind of a question now the answer to that is dividing your entire data warehouse project in two dimensions provide structured information for analysis and reporting right so when you break down your render data warehouse into different dimensions like for your customer Delia's you will have a separate dimension called as customer dimension for your product details you can have a separate dimension called a product dimension and only for other things you can have other different and when you break it down in a structured form like this your analysis and reporting can be very easier why because when you store data it will be in this form right so you will have your customer here you'll have your customer ID the name address and the product that particular person purchased the name of the product the ID of that product the type of the product and then the date as to when it was ordered when was it shipped and when was the delivery completed so you have all these details in a structured format so that's what the benefit that dimensions and data warehouse gives us right so end-users can simply fire queries on this dimension tables this content descriptive information so you will be here as an end user and you cannot just fire or simple queries on your raw data warehouse and you will get the answer that you need so that's what is the use of for having dimensions and that's the benefit right so any doubts here guys that's about the dimensions that's why we should have different dimensions in our data warehouse project all right so moving on to the next slide so the next topic that we have here is that of fat and measures all right now in the previous slide I spoke about dimensions right so we spoke about there being different dimensions in our data warehouse and then we being able to sort them filter them up are using different queries right now how do you think you can run your queries and result because by the look of this table you might not be able to get an idea of what kind of dimensions here to filter orb it's kind of four dimensions here to sort so you have that kind of out right so that's where the whole cost of facts comes into picture facts and measures so factor something that helps you measure your row dimensions so water does let's look at the definition first a fact is a measure that can be summed averaged or manipulated okay and every fact table it contains two kinds of data a dimension key and a measure all right now what this means is or you will have all your dimensions in your data warehouse but if you want to perform any kind of query like any sorting or analysis or some kind of drill down or all these things you can only do it if you have a fact so for every dimension you will have an Associated fact and using the back you can measure your dimensions okay so that's what it says here if you want to measure the data that you have in your data warehouse if you want to measure any kind of data that is there then you use the fact to measure your dimensions and that is the reason why your fact table will have two kinds of data it will have a dimension key which will basically be the link to your dimension table and then it will have a measure so your measure is something that's going to calculate the data that is there in your dimension right so the dimension key is going to let you connect to your dimension and the whatever data that you get from there so whether it's any kind of addition or subtraction or any kind of average that you want to take or the summing or manipulation so whatever arithmetic operation are you perform that should be stored as an measure in your fact table right so that is the relation so it will have one particular dimension key which will connect to the dimension and the results from here what are the aggregates or what are the operations performed the result of that will be stored as an measure so that's the thing all right and it also said every dimension table is linked to a fact table yes now that's the thing now if you want to perform any kind of query or analysis on any dimension then you need a fact table that's what we are talking about in the Geo final line here it is only if you have a fact table can you perform you know any kind of queries or analysis right so that's the thing and it also ruled that for every dimension table you have to have a fact table so you cannot have any dimension table in your data warehouse without a fact table and that is what is the whole process that leads us to having lot of flexibility with respect to wearing and analysis and all these things right so take this example in your fact table you will have one product dimension and then that will have a dimension key and a measure right so your dimension K would be that of Product ID and your measure would be that of the number of units sold right so your product can be your path table and your dimension key might be that of Product ID so you'll have all the different products that you have in your repository and if you want to find a number of units role then you can perform any kind of arithmetic operation there and measure of that will be stored over here as the number of four units or so that is the fact table so that's the whole concept you're right of facts and measures all right so moving on to the next slide we have that of schemas okay so if you remember what we spoke about here right off for about dimensions facts and measures so these are the important topics that will help you understand the whole concept of schemas so getting back to schemas a schema is basically that which gives the logical description of your entire database okay it gives details about the constraints placed on the tables key values present and how the key values are linked between the different tables now what this means is you have different dimensions or if you take the example of a database you have different tables right so you have multiple tables you will have an employee table you will have a department table and how do you link them by using the whole primary key foreign key relationship right so that's the example that's showed over here so you have one table here that is of employee your table and the other one is that of the part mental and in your table you have a department ID which acts as the foreign key right and then you have your department table where Department ID is the primary key and you use the department ID to do any lookups correct so that's the whole concept of first key Mazur in a database but whereas in a data warehouse we follow a similar model of primary key and foreign key okay but we don't use the entity relationship model here however we use the star schema the snowflake schema and the fact constellation schema right so this is three different schemas that you use to establish relationships all right so that's what a schema is I'm pretty sure you all know what a schema is right so you guys should all have a good idea of word schemas and what it does but anyways talking about the different types of schemas that you have in your data warehouse and a couple of slides earlier I told you that every dimension table will have a fact table right so that's what this diagram also symbolizes so if I can recall then over here I told you that a fact table connects look at the data are damaged key and a measure all right and I told you that every dimension over here is supposed to have a fact table is supposed to be linked to a fact table and it would be linked with alb of schemas okay so that is the relationship between dimensions and facts ok measures are something that is going to aggregate the data that is present in your dimensions and it will store it again in the fact table okay but the fact table and the dimension table will be linked with the help of your schemas and your schemas if you talk about schemas in the concept of database then it would be an entity relationship schema right based on that entity relationship model there will be a link right so that will be the relationship but in case of a data warehouse they will be using a three different types of schemas like cost our schema our snowflake schema or fact constellation schema so let's just talk about these schemas and how the primary key foreign key relationship works over here okay in these three different schemas so the first one is the star schema and in our star schema each dimension here in a star schema is represented with a one dimensional table which contains a set of attributes okay so when we say a one dimension table it basically means that there will be a parameter key foreign key relationship between the fat and the damaged table only okay so we'll have your primary key in your dimension tables and you will have the Associated four and eat in your fact table okay so to look at the example of D log okay this is a dimension table and the dealer ID is a primary key okay and into your fact table you have the dealer ID which is the foreign key now similarly for your date dimension table then the data ID right this is the primary key and this is the foreign key here and for your product dimension table model ID is the primary key and your model ID is the foreign key okay and similarly for the branch dimension table you have your branch ID which is the primary key over here and which is a foreign key over here now when we say primary key foreign key it basically means your dimension table can now only one entry so each of your primary key so whether in this case if it's your dealer ID then each dealer can have only one ID right and there can be only one date there can be only one entry for a particular date cannot have multiple entries for the same weight and similarly you cannot have multiple products with the same ID right so you cannot have multiple products with the same model ID and if you look at this example you cannot have different brands or supposing there are three different branches then they should have different IDs right so they cannot have the same ID so that's why we say that the primary keys they cannot have duplicate values right so the primary key over here is branch ID which cannot have duplicate values the primary key here is modernity which cannot have duplicate values and your at state ID and your user ID right so these are the primary keys in the four different dimension tables which cannot have duplicate values but however in a fact table you can have duplicate values right because you'll be making use of the foreign key so using the foreign key you will be referencing to the damage in table you'll be looking up to the dimension table and the because of that you can have duplicate values there's no restriction on not having duplicate values right so we have a foreign key relating to all the four of our tables and those are the four different dimension keys that we have and along with that we have two different measures right so units sold and revenue are the two different measures so revenue we use revenue to find the total number of products that we have sold so we have a product ID will have the name already in variant ID right so each of those products how many have we sold so all those things will be part of our revenue and the same thing with units sold so each and every product that result the number of for those products will be present here and the money that we'll made out of selling these products will be present enough revenue right so that is the whole concept of measures here okay so let me just go back to this slider as you can see a fact table contains two kinds of data a dimension key and a matter correct a dimension key and a measure a dimension key is basically the foreign key here which we'll be referencing to the primary key in your dimension table so that is your dimension key and your measure is basically the calculation or the arithmetic value will be stored based on the operation that is performed on your dimensions so that's what has been explained in this slide right so you have your units sold and you have your revenue correct so these two will be your measures and these four will be your four different dimension keys for four different tables so that's about your star schema so if you understand your star schema then it will be easier to understand the other schema because they are just a slight variation okay but what you got to understand is the main thing is that every dimension table will be linked to the fact table correct so that is the one thing which you have to remember and talking about the next appoint a fact table is at the center correct so the packet is at the center it keys to every dimension table and attributes all right so it has the keys to every different dimension table and it has attributes like units sold and it has revenue as the other measure correct so that's the thing about the start kima that's how this works now we're the next slide we have something called as a snowflake schema okay now it just slightly modified version of your star schema when we say modified in technical terms we call it normalized okay and when using normalize it means split into additional tables so if you have one table which is having a lot of attributes then we can break down this table into further tables so that's what is happening over here so if you look at this table you have your stock table okay over here in your star schema you had only your dimension table and you have your country ID you add your location ID you had all these details okay but inside your raw snowflake schema you have split this delay table into two further tables one is with respect to location and the other is with respect to country right so you have your location ID foreign key and your country ID foreign key over here and they will be referenced to your location ID primary key in your location table okay and to your country ID your primary key in your country table right so that's how things work so since your table is being normalized or split into additional tables it's called the snowflake schema correct the same thing is done here right so you have something bowlers variant ID which is again a foreign key and that is reference to the primary key which is inside the variant table correct as you can see so these four where the dimension tables that were present in the star schema and if you normalize these tables or this data warehouse further then you get your raw snowflake schema right so it's further normalized into additional tables so that's what the difference between a star schema and a snowflake schema is all right so I hope there are no doubt guys if anything if there are any doubts please put that in chat box all right so the other thing about snowflake schema and then now we have one more type of pesky McCullough's galaxy schema okay it's also called as the fact constellation schema and the concept here is that the galaxy schema contains more than one fact table okay now we saw a star and snow like right so here it had a single fact table but the number of damage in tables went different right we had additional dimension tables and there were additional dimension tables which are normalized in case of our snowflake schema but in case of our galaxy schema where you have two fact tables right so you have two fat tables and you have your dimension tables such that some dimension tables can have more than one fact table okay so if you take the example of our branch dimension and your date dimension so these two tables have two fact tables they both have the fact table of revenue and the fact table of product okay but however the product dimension table your okay and this one has only one fact table that is product and the dealer dimension table here has only one fact table that is revenue correct so to help you understand this in a simpler fashion I've broken down this product into another row dimension table right so earlier we had the one product table here and here we have broken down this product table into further a product with respect to product ID correct so that's the whole concept or if you have more than one fact table then it's called a galaxy schema so this would probably help you understand correct so the revenue in product or the two fact tables and the dimensions which are shared are called confirmed dimensions so that means these two dimensions which are rochette by more than one fact table these who are called conformed dimensions correct so that's the whole concept of galaxy schema so guys any doubts okay so guys that was about the gaseous keema all right and those are the three different schemas that you can have in your data warehouse all right so if you're going to work on radar warehouse in a dam in your future then you can formulate your whole organization's data in generative arrows by using any of these schemas okay they are either Rho star snowflake or galaxy schemas so that's it about the schemas and yeah that's it so that's the end of the session and let me just summarize whatever I've covered in this session all right so I first give you an introduction to what is data warehousing I spoke about a water heater warehouses how it is a benefit and what are the activities involved in creating a data warehouse the whole concept of core data warehousing and I spoke about the architecture here and then I spoke about online analytical processing I spoke about the different types of labs right what are the advantages of that Ola has overall DB and the different operations that are there that we can do on the OLAP cubes so I spoke about all these things and then I spoke about what are dimensions fashion measures so dimensions was one concept that I start off from OLAP itself when I was talking about collapse operations so it started there and then we went on to talk about what are dimensions facts measures and then finally a finish off today's session we're talking about schemas right so we had the three different schemas with respect to star schema and snowflake schema and fact confliction schema so this is three different schemas that are used in a data warehouse right so that's it that's the end of the session and thank you guys thank you for attending the session and if you have any doubts please put that in chat box or you can email it to me and I'll get back to you immediately all right so all the best I'll probably meet all in the next session and until then see you take care guys I hope you enjoyed listening to this video please be kind enough to like it and you can comment any of your doubts and queries and we will reply to them at the earliest to look out for more videos in our playlist and subscribe to our ready Rica channel to learn more happy learning
Info
Channel: edureka!
Views: 209,844
Rating: 4.8345323 out of 5
Keywords: yt:cc=on, data warehouse concepts, data warehouse tutorial, data warehousing concepts, data warehousing tutorial, data warehousing tutorial for beginners, data warehousing tutorial videos, data warehousing architecture, data warehouse architecture, data warehouse basics, what is data warehouse, olap servers, data warehouse schema, data warehouse facts and dimensions, etl process in data warehouse, edureka, data warehousing edureka, business intelligence tutorial
Id: CHYPF7jxlik
Channel Id: undefined
Length: 52min 29sec (3149 seconds)
Published: Tue Jun 27 2017
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.