Data Warehouse Tutorial For Beginners | Data Warehouse Concepts | Data Warehousing | Edureka

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey guys my name is Gordon and on behalf of ADA Rekha I welcome you all to this webinar on data warehousing and business intelligence I'll be your instructor for today and I will be teaching you why any company needs to business intelligence and thus thereby do data warehousing so without wasting much time let me show you the agenda for today so these will be the topic that I will be covering in today's session and this will be the order in which I will be covering them ok the first topic that I will be talking about is what are the needs of business intelligence and after that I will talk about the need for data warehousing so business intelligence is one of the most important aspects for any company to grow well and do good right and data warehousing is among the most important activities of business intelligence so that's why these two things are interlinked and that's the connection these two have so you can think of data warehousing to be a kind of a subset of business intelligence so I will talk about these two things enough that I will talk about the key terminologies that are related to data warehousing architecture right and some of the key terminologies are those of OLTP and Ola the differences between the two okay the OTP somewhat very similar to the databases and OLAP is what represents data warehousing so when you understand the difference between the two you will also understand the difference between a database and a data warehouse you will also understand why a data warehouse suits this is intelligence more than data base so that's about these two topics and then we'll talk about ETL so ETL stands for extract transform and load eatle is a strategy to convert the data from your database onto your data warehouse right so moving the data from one place to another that's all done by ETL so we'll talk about ETL and detail all right and after that I'll talk about water data martyrs and then what's at metadata now these two things are two topics which I can only explain once I have given you an introduction to the other topics all right so any of you tie it out that you have during the session you can ask me at that time and I will clear them right away and once I am done teaching about all these four different topics related to data warehousing architecture I will show you the complete architecture and the complete the life cycle of data and what kind of insights your company can get and what can our advantages you can get out of data warehousing right so data browsing architecture will be the last topic in my presentation and after my presentation I will show you a demonstration of creating data warehouse where I will import data from database and sort in a data warehouse right so this will be the topics of today's session I hope at the agenda is clear everyone if you all agree with me please start knowledge that and also to acknowledge the fact if you can hear my voice and see my screen so if everything is fine I can get started with this session you can acknowledge and put any of your doubts or queries during the session inside the chat box that you see on your right side so I'm going a couple of acknowledgments from my everyone all right so Rodney says yes Roger says yes all right guys and a couple of more people are also able to get my voice and see my screen great so since the agenda is clear everyone let me get started with the first topic that is what is the need for business intelligence okay so will understand why business intelligence and data warehousing are among the fundamental and the foundation for any company's success so why do we have to go for business intelligence right business intelligence is the activity which contributes to the growth of any company and there are also so many MCS which have been established with the past few decades now how did that happen they just didn't happen by luck right so there were all small ideas there were small companies that start with a small idea and then they grew bigger so that's what any company that wants to do good that's what they do the first thing is they plan what they want to be and depending on that plan then they start gathering data okay now once they gather data they know they're in the right direction now so they know what to do and how to do it and then they do further data analysis on that they make up their plans and they come up with strategies they come to know what is the important thing that needs to be done and all these things so when they finally have a conflict and then they execute it into a business action and once those actions are taken then they're all good right that's when the business starts to grow that's when the company gets back all their investment and that's how actually any company grows so any company that has done well over the past few decades of beaut Microsoft or Google or Facebook or Amazon Facebook's all of these companies that have all grown from small ideas and they've become something big right and any startup that's also trying to do great nowadays even they have got off the same strategy and the same plan this is a very common thing and this is something that everyone knows okay but this is not whatever you know come to teach all in this section what I've come to teach all is something about data warehousing and that is one of the most important strategies or activities which is part of business intelligence right so before I talk about data warehousing let me go into details of business intelligence so what exactly is business intelligence BI is the act of transforming raw or operational data into useful information for business analysis right so BR your stance or business intelligence that's the short form and yeah it is the act of transforming any law or operational data so when we say R or operational data it's basically the data that you've collected the data that you have about your business so it can be revealed with your company starting from scratch then whatever data you've gathered so you've kind of got to take that data and convert that into useful information right so that you can plan and make strategies and if you are a company that's well established then you have to look at your passes us how your company has done over the past six months or the past the last quarter or the last year or two and then may come up with proper plan for your future so when you do this then this entire access call has business intelligence and how does it work and this working of business intelligence is with respect to the ID technology okay so VI which is baitul data warehouse technology okay this is the key term that you got to remember the VI is based on data warehouse technology it extracts information from a company's operational systems and the is extracted is first transformed and remediates transformed it's cleaned and integrated and then it's loaded into data warehouses now the thing here is there be data in many forms it can be the form of flash files you can be the form of databases such a running company and if you are trying to do good ok if you've been working for a number of years then you'll have data about your past success about your sales data your marketing data your expenditure all these things you might have served on any form maybe the form of databases may be in the form of for Excel flat file so all these things so these make up your data source right so you have this data over here and this data is first transformed ok it's of course it's first extracted then insert the data warehouse it's transformed we just cleaned and integrated ok so once it's transformed then it's ready for you to do your data visualization or data analysis it's in a form in which you can get inside and this is the data which the end users will be using so you will have your data analyst in a company right your data scientist your data analyst and all the other people your managers your Rob and the other guys you call the big shots in your company so all these people they'll be getting they'll be using this data to make your analysis and that the role of data warehouse it is between these two end points and this serves as the basis or the springboard for success and finally since the data is credible it is used for business insights yeah this is again something that I just spoke about right so now you'll have a better understanding of how business intelligence works right so you all know that this is intelligence is something important and how important and how good it is how does it work is what I've explained in this slide so you all guys all agree with me here anybody has any doubts Rodney Rajesh Jacob okay pretty good all right nice okay so I have a question from Jacob Jacob is asking is it our house the only thing that's needed okay Jacob this is just an overview of business intelligence okay the rough diagram and it's actually not just data warehouse so data we're also something that I'm concentrating on today's session okay so that's why we have data webs there's another important step called of data visualization so that visualization is done by end-users right since it's done over here on mention this diagram but what you got to remember is data warehousing is probably the springboard without data warehousing the visualization cannot be done and data from the source is right from here it cannot be directly used for any other purpose so that's the role of data warehouse so that's what data warehouses and yeah I'll clear take off now okay great that's fine so let me go to the next slide then another you know what exactly is business intelligence let's go to details of data warehousing let's analyze the challenges in achieving business intelligence so first of all why should be user data warehouse because data collected from various sources and certain various databases cannot be directly visualized ok now look at this immature in this diagram we have different databases like Oracle you have the sap base you have the - of sequel server and then you have other databases like sequel database and all these things ok you can also have flag files in this list so all these make up your data sources you as a company can show data everywhere so if you're a small company you might just deal with Microsoft Excel and you might just use small analysis tools okay but occur a big company that has a lot of data coming in so if you're a retail company then you'll ask for details about your sales your marketing and what's been your growth so for all these purposes you need big databases right so all the data will be stored in all these databases ok now the problem is some teams in a company might be using one database and the other teams may be using another database now the biggest problem that people would find while they are doing visualization or doing analysis data is they're in different databases and they'll have a tough time integrating them right now that's where data warehouse comes in and that's where data warehouse course data Pharos it will get data from all these databases and then processes that data and brings it in a form it is very easy to do visualization ok that's what the second point says the data first needs to be integrated and then processed for visualization takes place now this is the problem that you have with the regular databases ok the data from here they cannot be directly use of visualization and since data warehouse can do that since it can integrate data from multiple data warehouses and since that data can be processed easily since it brings the data in a form which can be easily visualized that's where it arose has the advantage that's where its course so that's the problem with data base and that's the advantage the data by rows are in fact little arrows it's more like an act it's a discipline which is followed by people ok these are actions which are adopted and studies which are taken ok that's what rate of warehouses and that is the role it plays in us doing very good visualization all right so now there should be a little more clear few people as to why data barrels plays the key role in the whole BI aspect right ok great let's go to the next slide then now let us understand in details what a data warehouse is now a data warehouse is a central location where consolidated data from multiple locations ok or databases that's what locations me from multiple locations are stored now this needs this is exactly what I explained earlier right so you have got a DES that's coming in from multiple data sources you have all the data you consolidate all the data into one single place and the data barrows is mated separately from the organization's operational database now TWh your Stanford rate of warehouse all right so and here it says that data warehousing is made in separately from normalizations operational database ok and yeah the DW has your Stanford data warehousing and the reason data warehouses are stored separately from the operational database is because the reader should not get affected so you will have your operational data on one end ok where all your legacy data will be stored where all your Raymond probably in your real time data will be stored so all your transactions all your sales all your marketing order operations data all these will be sold in one place and in during the alpha-beta warehousing what you're doing that when you are making analysis when using the data you don't want that to get corrupted right so it's more like a backup so for backup to our purpose your operational data is separated so you have an operational data you keep it in one area and then you create a new database okay in fact it's called a date of bed house okay so you get all the data from multiple sources or maybe from a single source get it into a data warehouse and from here you do your analytics so the process of getting the operational data into your data warehouse that's called extraction transform and loading ok now when you've done these three things you form your data warehouse and from your data warehouse you use the OLAP strategy okay so all apps and for online analytical processing so you use this OLAP or strategy or with this analytics processing for the business users to do analysis so it's there in the name right stands for analytical processing so the business users water analysis they want to do they do it because there is the the observe pull up and then along with the analysis they can also do polarization for visualization you have various the tools like tableau and click you write they are some amazing tools so you can get this data you can get it into the data warehouse and the data warehouse also it can be sort in some kind of database it can store this data back into some kind of Oracle or sequel server or maybe even in Excel and when you have stored there then you can do your lab activities there and also you can import that data into your various visualization tools like tableau or Rock click view and tasks you can you know get insights you can get insight into your data you can download presentations during your board meetings you can show your findings to your superiors or your managers into all these things so that's what a data warehouses ok and then the next point we have your end users access it whenever any information is needed yeah so this again the same thing right so once the data comes in from the operational system it's stored in the data warehouse it stays there so this data is not going to change so whatever change you want to do your operational data that can be done ok you can modify the data you can update it you can delete data here you can do all these things but once your data comes in your data warehouse it cannot be deleted ok you can maybe modify things the worst case scenario you can modify data here ok but it's highly advisable not to but of course you can so that's the thing but yeah the keep on you got to notice the angelus can use any they can probably access data that is 10 years old or 20 years old and all these things and how can they do it by using Ola so they can do the analysis and they can run it over different times right so these are a series of snapshots so based on you can find data analysis like what happened at this particular day on this particular York all these things you can see what kind of product was sold Namie customer spot which product all these details can be easily gathered and access from here so that's what we save and you know the data can be accessed at any time by the end users or the business users so the business users here are typically those managers it can be managers or people who are leading board meetings who are making a plan for the next quarter or the next half year or the next one year and all these things so guys yeah that that was a question which Jacob asked eyes so Jacob ours would be the end users would be the business users so that's what they are okay people will be using this data it can be even data analysts or data scientists and all those people right guys are you clear Jacob okay great and there is no to get the data whereas your it is not do it every time when new data is added to the database so what this means is you have data coming in to your operational data ok this operational data will get updated every minute probably every second if you have a 24/7 working sales team then they'll be making sales around the clock right so as and when any sale happens the data will be added to your database your operational data but that not necessarily needs to be added to your rate of arrows also so what you find in your data warehouse is legacy data right its historical data which you can use to perform analysis or all those find inside the operational data if you have new radar coming in here this has to be imported and this has to be moved to your data warehouse first and then once it's moved your rate of warehouse from here it can be used for analysis and all these things by our end users so that's what this diagram here means and that's what the last point also means okay data warehouse is not loaded every time you read are added to this database okay so I hope it's all here Rodney Rodney and Rajesh okay great so fine then if you guys are gonna said what data warehouses I can go to my next slide and I'll talk about the next topic so was about data warehouse now let's look at the advantages of a data warehouse when we compare it to any database or just regular flat files and all these things the first advantage is that subject questions can be answered by storing trends so this is the biggest benefit that you can get right your raw data analysts and data scientists they can answer strategic questions they can read the past data they can predict your future also by coming up with by having their strategic questions be answered because friends can be analyzed they're using the data warehouse is basically the Riddler stood in an operational data also but it is that so easier to study trends on your data warehouse rather than data base ok because Rajat sure has a question guys Rajesh nothing why not a database what is it that data where else can do that database cannot do so guys that's what I was answering correct Rajesh remember the first thing is let me go back my previous life was this ok so you have an operational data here so you have all your data here which is probably legacy which is even real time all these things will be present here but in your data warehouse you only have your legacy data you only have the historical data you won't have real-time data but that doesn't mean it's not you know it's any lesser than operational data since you have your data completely your you have the freedom to do your analysis and you also have your freedom to do your data visualization ok so that's one advantage and the other advantage with the data warehouse is that your data will be coming in from multiple sources right it will have water coming in from multiple sources your tables will not be related to each other even if it's from the same greater risk you'll have multiple tables for multiple teams right and you can't easily integrate all these tables because they'll be separated ok that is one big problem that you will face when you are doing the analysis or visualization but in a data warehouse it will be stored such that all the data will be interlinked right could be related by using schemas or all these things so you have a different schemas X star schema snowflake schema galaxies schema and all these things so you have all these dimensions and facts all these concepts using which you can relate your row tables you can relate your data so all your rows and all your columns your which are unrelated which are sort of separate databases or in separate flat files separate tables they all be integrated they cannot be stored together so there'll be a relation so every single row or every single table will be linked with each other and when you do an analysis like that then you can probably pull data across the database right so whatever it has stored across the various databases you can put all that data and link them all together by running one query and you can get all those details in this store by just running one single query using your data warehouse so that is the advantage Ratish right so did you get it Rajesh this is the second advantage okay that's why business users they prefer to use the data coming in from data warehouse so this is a more structure and this is the more related data when you compare it with the operational data and the basic data source all right fine fine fine very good very good as it I hope you've cleared your doubts I hope that the same thing with even others right okay even with your great grid Rodney is telling now I've got reading okay fine fine ready night that was a very good question good you asked me that and yeah I'm glad you stopped me because I could explain it in a better way so anyways moving on that was about tour data warehouse and talking about the advantages of data warehouse I spoke about the fact that you know you can other strategic questions by studying all the trends by asserting your past data you can you will have all the graphs you can have the pictorial representations right you can see what was whether the trend is growing or not which practice getting sold how better is getting sold all these things you can easily read by using a data warehouse because data warehouse makes your data more readable information so that is the thing so you guys must own the different wien data information right so information is something that is processed process the data is called information so information is easier to understand easier to relate to and easier to use now that's what data warehouse does it takes you one step closer to information right so that's the advantage and yeah the other thing is data warehousing is faster and it's more accurate yes this is something that's completely true because in your data base you will have loads of data you will of course you'll have a historical data and real time data but the thing is it's run going to be as fast as our data warehouse data warehouse you'll have links there you will have we'll have tables you have relations between the various tables and because of all these things you can easily gather and you can now easily access data here and the data that is gathered from your also more accurate because they only much change because there's not going to be any question of real-time data coming in and changing things around right so you may live so whatever you know processing or analysis it's done based on the past data that is stored in the data warehouse and that makes this data more accurate it makes it more stable so stability is the key word here and stability is not something that you can have all the time in database but you will have it with the data where or so that's the second big advantage and in fact there are many more advantages right so data warehousing is something that you guys will understand when you start implementing so in the demonstration that I'll show you later today that time you'll understand you know you lunch and the other advantages with data warehousing ok and one important point that you need to notice that data warehouse is not a product that a company can go and purchase ok it needs to be designed and it depends entirely on the company's requirement so like I said your data base is something if an answering right your database or your various own your data your data source is something that you have to have and then your data warehouse is something that is designed and which complete depends on your company's requirement based on your data source based on the requirements that you want to get out of your data source out of the error that you have in your data source you can come up with a way to design your data warehouse right so data warehouse is more for concept and strategy and it's not an end product it's not a tool or something that you can use you have multiple tools to implement data warehousing and the thing you go to notice data warehousing is not a product ok so it's a strategy that you adopt to make your data more readable and make your data in a better fashion ok so that's the biggest advantage with data warehouse so look at this guy here ok he'll just run one query on the data warehouse ok now what the data warehouse will do so the data is taken from the operational systems alright and in fact if there are multiple operational systems then all those multiple data from multiple operational systems will be integrated together and then that will be standardized and any inconsistencies there and that data will be removed ok now these are the three important things the data has taken from the operational systems and that data if there are multiple operational systems those will be integrated okay and then the real will be standardized and any inconsistencies will be removed and once all these three things are done then it will be stored in an easy format which can be you know which is very suitable for analysis and access and that is what the data warehouses so whenever you run the query on this kind of data warehouse which is the process on which is ready in such a fashion you can get the result quickly and this result close to be more accurate all right so this is a big advantage with the Year data warehouse so I hope but this clear it's a pretty simple concept and it's just overview of what I explained in the previous slides right right Jacob cadre Rodney all right okay so moving on there are four important properties that a data warehouse has okay and the default properties are based on what Bill in one set bill inmon is the father of data warehousing and initially you define data warehouse as a subject oriented integrated and myriad and non-volatile collection of data in support of the management decision-making process okay so when we say subject oriented it means that the data will be categorized and stored by the business subject rather than by the application now let me get back to this point after I finish these three ok now this the most complicated point okay now talking about integration right he said that it has integrated so the meaning here is data on a given subject is connected from disparate sources and sold in a single place so this is something you people known it has collected from multiple sources and we are all stored in one single place so you don't have to you know go about searching for data and different tables or different sources and all those things and then your data it is time variant it is stored as a series of snapshots each representing a period of time so when you do your analysis you can do it based on a series of snapshots of time okay you can see what was your raw company status on this month or that your or on this month this year what is the progress that has been made or if it's not a progress if it's the same if your raw code hasn't stagnated then you can find out what are the metrics what are the reasons why that has happened you can find all these things and you can look at all those idiots from a time approach right from a time variant approach so is what data where O's the advantage sure is okay that's one of the properties and the advantages are you have and then data is non-volatile the data in a data warehouse is not updated or deleted so this is what is the other property that I mentioned earlier once the data comes into a data warehouse it cannot be deleted or run either can it be changed in fact it can be updated but the process of form to update it is a little complicated okay but of course it can be updated and deleted so that's the thing but it's highly recommended not to operate okay so that's the advantage with the data warehouse and since it will not be changed there is no question of quit getting corrupted and that's why doing analysis and all these things are you know a better option now getting back to the first point we you said that it is subject oriented right there does categorize and store by business subject rather than by the application now what this means is the data here will be stored or the data that you will your that you retrieve from a reader warehouse right you will get in the form that you wanted to now if you want to go me an example of that let's say that we are dealing with a radial company and in my retail company I have a marketing team I have a sales team and I have a Operations team and my system kind of keeps keeps track of all the sales that happens over a period of time okay let's say the last one month whatever sales they've done they have showed all those details and then you have your operations team which will make sure there is a smooth running of all the process once the sale is done gradually the activities involved right like shipping the product and you know all these things shipping and coordinating via transferring activities and all these things and then your marketing team is or someone is probably that team which would take care of for your sales which would ensure that the right leads come in to ensure that the right people get the right the kind of service and it's all about acquiring more such sales right so your marketing team is on top of the funnel and they do all these things now if you want to integrate all these details if you want one single view of them and you want to find details such that in this particular month what was your dose is and what was the kind of operations that was done right what kind of service was given to those customers and from how they became our customers so when we the question of how something related to marketing so if you have a question like this where three factors are involved then at that point of time it's your data warehouse which comes to the rescue because many order questions are related to this particular time and these three different term metrics say is marketing and operations then all these things can be integrated and you can get one single view similar data warehouse this is what a database lacks correct so you know integrating all your different data sources and you know storing them together and making them ready for any time axis is the biggest advantage and the most important property of data warehouse so if you guys had any problem the previous slide then I'm pretty sure this is something you have really understood after this slide because this is probably the epitome of for data warehouse right these three properties if you understand these four properties then you are pretty much ready to understand the next part the next or you are ready to go to the next step and data warehousing right so finally you should have understood how important all this is why I know how important business intelligence is and what can improve the data warehouse place you can just think of how among us our deal a data warehouse is correct yeah so a couple of people are satisfied with that so in order to spoken about the properties let's go to the next slide and okay now we have to talk about key terminology okay so right now we've understood data warehousing from a higher level okay now let's dig deep let's go to mode basics okay let's understand the key terminology that are all related and that are involved in a data warehousing so first of all we have LTP and OLAP okay now there are four things I will talk about the differences between OLTP and Ola okay then I'll talk about UDL I'll talk about data Mart and then finally about metadata okay so let me go to the first topic that is Ltd versus OLAP okay so in this part which is l DB over DT stands for online transaction processing okay now this is something that is the representation of data fir database if you're running any kind of queries on your database then that's called online transaction processing okay and then o is last stands for online analytical processing and this is the property of a data warehouse so any kind of query or any kind of analysis that you run on your data warehouse that's called as an OLAP activity correct so let's go to the differences between the two so first of all any data that is stored in a relational database right in an which involves OLTP that contains the current data as well as past data okay current data as well as path data but with respect to data warehouse and while performing an OLAP you will be dealing with only historical data here okay it contains only historical data and the data that will be stored in your database okay when you use the OLTP then those queries will be useful and running your business okay when you have to run your business like if you want to store the data of the number of sales that has happened today like every time a field happens then your records in your database as we updated right so that's what we say so when you update your record with the latest details of your customer then that is what is the meaning of useful and running the business okay but in order for something that is usually analyzing the business so here the kind of activities that you would do is that of finding out details like at what time how many customers bought the products or at what time which customer but which all products all these kind of questions will be answered with as both hola okay and then the whole OLTP model of accessing data of accessing or querying data on a database is based on the entity relationship model okay but whereas with the data warehouse it's based on the star schema or the snowflake schema and the fact constellation schema okay so it's called also called as the Gatsby schema so all these three things will come into picture okay and then your relational database it provides a primitive and a highly detailed data so since you'll have one database if you run one kind of a query like a select star from this particular table then it would give you all the details that are stored right so you cannot filter too many details with respect to the data that is stored in a database okay of course you can but the level of you know the filtering and analysis that you can do is not that much so the reader that you will get back from your query write the result that you will get back from a query will be highly detailed okay and it will not be exactly what you want so it will not be that accurate but whereas with pull up when you do an OLAP on a data warehouse it provides summaries and the consolidated data Neverland point to you exactly what you want to look for correct so it will it's a very processed data and it points to one particular aspect which matters the most so that's what this is and then the oil TB you use this for writing data into the database okay so ever like I said the same sales example whenever a new sales happens your database has to be updated right with the new records of the product sold of the customer who bought the product and all these things so you basically use it for writing data into the database but your data barrows is primarily used for reading data from the data warehouse so right into the data warehouse or something that is done so that you can do the reading from the data warehouse ok the primary concept here is to read the data from the data warehouse and to do the analysis and all the visualization actually but with the database it's more of writing the data into the database alright and the size so speaking of the size a databases size would raise anywhere between hundred me to 1gb ok and this is also a very big number 100 MB is a very less number I would say and one tip is also very less so it would typically be much more than this but come back to this range if you look at a data warehouse a data warehouse is ranges from 100 GB to 1 TB correct so your data wells will have all the historic Durant and it will have all the relationships between the different data right such that so you can do your analysis straight away so since it makes all the data more efficient and stuff the data here the size ranges from 100 GB to almost 1 P so that is what our data warehouses and that's the part of data warehouse all right and I can actually show you the difference between the two in today's demo session ok later during this session I'll show you that the size of the source file that I have added with the different sizes of the two files that are they using as the source and then after the UH processed the data are all sorted in my data warehouse I'll show you what is side of that data so there will be a big difference between the two okay and the red arrows will be modes in the database I will show you that aspect later all right so and that's about this point and then yeah of course this will be fast ok database is fast and it provides high performance alright so your data where owes of course it's not as fast as your database but it's how you sexy it's highly flexible because it gives you different views so you have something called as the OLAP cube right so using the OLAP cube you can get the you can look at insights from different angles different perspectives and different views of data we get so that is the big advantage sure okay and the number of cycles that is accessed it is intense but whereas with the data warehouse the number of records access is in millions all right an example of this can be all the bank transactions made by a particular customer so if there is one customer and whatever transactions he's made you will get all those or you know all those videos right so supposing take the example of any email statement that you request for any email statement or a bank account statement all these things so they are an example of OLTP so whatever radials are present in the database those will be given to you but whereas the bank transactions made by your customer at a particular time okay this is a most filtered query and the answer also will be a very accurate and point Q to exactly that particular question so that's what an OLAP is so it's normal to be one overview here is one exactly point you to what you want it's not very much in detail but it's more accurate correct so that's what an app is and that's the difference between LTP and OLAP okay so there are two strategies that are used for you know accessing data well DB is used for accessing data in a database and we'll appreduce for accessing data in a data warehouse alright so I hope I mean this is cured everyone all right guys okay if you have any doubts then that will be cleared by this slide okay because the examples are there for both these strategies so if you want more examples of an OLTP then one would be that of a supermarket server which records every single product purchase at that market okay so every single perk in their history or probably in the last one month all these things can be accessed using your OLTP okay from your database so you don't have options to do much of filtering here and then another example that of a bank server which records every time a transaction is made for a particular account every time or transaction is made in a particular account the data will be updated in that table and you can create that kind of data and you'll get that result okay another example is our railway reservation server which records the transaction of a so whatever equities that passenger does so all these things will be recorded by the reservation server right so this is one example but when you look at the OLAP there will be much more detail the query juror will be much more detail and the answers will be much more accurate and very crisp okay so an example is a data from bank manager wants to know how many customers are utilizing the ATM of his branch okay because maybe based on that he may take a call whether to continue that idiom or relocate that idiom to a different place right so this guy this bank manager would want to one of us for understand if there's any use and having the ATM in that place are people using it correct if people are using it then how many people are using it at what time are they using it or would it be better to have the ATM in a different location where it's much more easier for people to access it so all these kind of questions will be answered only if you have the oil up in place okay only when you are a data warehouse in this not with the database so another example is that of an insurance company that wants to know the number of policies each agent has sold okay this will happen a better performance management of agents so you'll have multiple agents in your company and you'd want to know their performance right you know you would want to know who is the best performing agent you would want to know how they're performing why is this person performing better you can you know optimize each of their performances so all these things can be done with the help of data barrows and they can be done with help of Allah right the OLAP strategy the OLAP activity that is done on a data warehouse so that is a difference between the two right so I hope you're getting the difference as your guys right OAP and OLAP okay since it's all clear let me go to the next slide now the second important terminology that we have with data warehouse is that of ETL extract transform and load so by going by the definition usual is the process of extracting the data from various sources transforming this data to meet your requirements and then loading the data into a target data warehouse okay you are exciting the data from here right you're extracting it and then you're transforming it into the way you want in a more readable fashion in a more relatable fashion then once that is done you load the data into a data warehouse and the whole process of for getting rid of from your data source to your data warehouse this done by the EDL the activities of extraction extraction transformation and loading so we have got popular tools for for this very process so you have tools like talent informatica you have open all these things and informatica and talent are probably the most popular tools for this process for extraction transformation and loading data into a data warehouse right so this is something that usually understood by now and it out guys because I don't want to waste much time I want to go to the next slide and teach the next concept ok great so the next one is data Mart okay now if you've understood so far to LA ETL then half your job is done because data Mart is something that's very close to a data warehouse and you don't have that much of a difference when it compared it to our data warehouse but the basic difference between the two is the data Mart is just the same data warehouse itself but a smaller version ok so let's look at the line the definition here the data Mart is a smaller version of the data warehouse which deals with a single subject ok the data Mart's are focused on one area hence they draw the data from limited number of sources and the time taken to build the data match is very less compared to the time taken to build a data warehouse now to give you an example or an explanation of this in your data warehouse you will have all your details right all your raw periods that you have so this itself is more refined ok but your data mods are smaller versions of that data warehouse which is used to satisfy only a certain users supposing you have one particular user base ok that is your sales team or your sales manager who wants to use only or receives data then that can be done with the help of a data Mart and similarly you have a marketing team who wants to access all the marketing data then that can be done by using a data Mart ok and then again you have other operations team that wants to do wants to access the operations data then you can give them only access to that data by using data Mart separately ok now the need for data Mart is first of all your data warehouse will show all the data ok your sales data marketing data operations data so all the data coming from different data sources so you'll have multiple data sources right all that will be stored in one single place and from here this will again like separate data sources right so for the sales team this will act as a data source for a marketing team this lap as a data source and then for the operation seen this Mart will act as a data source now the now as in the question as to why we need them it's to you know probably more security to enable more security and integrity because since all your data will be there in your data warehouse and if you let all your your entire company access the data warehouse they will get access to all the details and all the all the work that is done by other teams okay so there might be multiple teams and every multiple strategies which you might not want to reveal other rates so at that enough time you can divide your data warehouse such that only this particular users okay your CL user or a particular user base gets access to only certain data from your data warehouse so at that kind of a time a data Mart is useful or you can have a second data Mart for only another set of user base for example your marketing users so they will get access only the marketing data from your data warehouse right and then your operations data similarly which can be accessed only by your operations force so that's the different advantages that you have since the data is all divided it's all stored separately different people have access to different different parts of your data warehouse so this probably brings more advantages right so that's the data Mart and that's why I told you that you know if you understood so far till then half your job is done but data Mart or something like that extends the functionality of your data warehouse right so that is the thing and stinking off the differences that is there in this table the data warehouse will store the enterprise-wide data right the enterprise of the enterprise wide data whereas data much will store it departments wise data there'll be multiple departments in the whole company in the enterprise and they will sort department-wide data and then the data warehouse it will have multiple subject areas okay but a data Mart will have a single subject area there'll be multiple data sources in case of a data warehouse okay but in case of data Mart's they'll be limited data source in fact they can be just one data where all right this will act as a source pure data match but however we have gossiped limited your because not always do you need a data warehouse there are also instances where your data source itself as an input data model okay now that is something that you will answer next slide okay so just don't get transfused when we see that limited resources it does not mean this is the only source they can also come from a data a property a source like a flat file or from a data base and all these things all right and then a data warehouse it occupies largely this of course because there is a lot of data on to price where data will be stored your multiple subject areas will be dead because of that there's larger data here at state and that's why there is greater memory that's occupied okay but in case of your data mod it occupies limited memory because it's very crisp and limited to only a particular department okay and then the other thing is data where else is long it takes longer time to implement but a data Mart is a takes very short time to implement it is once you have all your data warehouse or stuff in place you can easily divide them by creating different data mass okay so be a tough part is your data well so once you've got a data warehouse ordered you can easily from your data mass in fact there is even the other way there are also practices where you first bring your data mass and once you've done that after that you create one single repository and that's when you create a data warehouse so there's also two approaches here one is the top/bottom approach and the other one is bottom-up approach so those are the two things and I'll go to details about these two approaches in my next session ok of course I can't do it today because we have very limited time all right so moving on to the next slide ok now speaking of the different types of data mass ok this is what I was talking about in the previous slide so you have three different types one is a dependent data Mart the other one is independent data Mart the third one is hybrid data mod you are dependent data matters are the data's first extracted from the OLTP systems and then populated in the central data barrows and then from this data warehouse the Ritter travel to the data Mart so look at this example ok this is the standard practice or the year the default approach where you have an OLTP source then you get the data into a data warehouse and then from the data warehouse you form a data model ok we'll have multiple data mass where each different Mart will have particular data from the entire data warehouse ok this is the first regular approach and then you have the independent data Mart which is a slight variation compared to dependent so here the rears directly received from the system okay you don't have a data barrels in place over here that is what this line means and this is suitable for small organizations or smaller groups within an organization so basically an organization which is very small it might not need to go to the trouble of creating a data warehousing stuff so you can just have an OLTP source and from there they can just get the data in onto a data Mart and they can use for various purposes okay that's what we have an independent data Mart for so that's the difference it just does not involve a major data warehouse so directly the data goes to a data Mart in fact you'll have multiple data Mart over your you'll have great or not one dat amount to one stuff okay which they coming directly from the OLTP source and then you have a third type which is the hybrid data Mart so by definition you might know what this is right by your the name of by the name itself is pretty obvious it's combination of these two the data here is fed from both the OLTP systems as well as the data warehouse okay so look at this example for that tour instance you three are here it's coming from the ODB source as well as from a data warehouse so this is what the hybrid data models and depending on your company depending on the size of your company the requirements of your company or your organization you can choose one of these you can model your entire database and data barrows in any one of these models either the dependent or the independent data mass or hybrid Radames alright so that's about the different types of data model so moving on to the next slide we have something called as metadata now people here from programming background or from the technology background you might all be aware of what America does metadata basically is defined as data about data okay it contains data about where your actual data stored supposing you have your raw data right where is that data stored what is the size of that data so these are the answers to these kind of questions is what will be Plus on your metadata your metadata will contain the location of your actual data it is not n the size of your actual data it will contain details like which was the source it came from and when was it created all these details will be stored in your metadata right so that's what made our IRA's so that's how different our metadata is from regular data and a metadata is specifically in a data warehouse it defines the Soraa there is a flat file a tional database and other objects so the reason that we give so much importance to metadata in a data warehouse is because take the example of any company that's having a 24/7 business okay they have a rolling system that works throughout the clock the 24/7 they are sales coming in data will be going into your database okay now in this case you cannot always you know keep adding data into your data warehouse because you know that data warehouse is not real-time correct so you have to manually update your data into your data warehouse probably at every day at a particular time maybe at 6 o'clock every day in the morning or maybe once in the night at 10 o'clock or at basically at regular intervals you've got to so data into your data warehouse and this whole strategy becomes difficult because you have to do this process every day so every day when a guy comes in in the morning he has to look at the new data that has come into your data database correct and then from that data base which is data source he has to add that adagio data warehouse now this process becomes difficult that is where a metadata comes in handy with metadata you define the address where your data sources okay you define the address of your raw of that file from my data's coming in or your relational data from measure data is coming in and then you can also impact us or the metadata of your raw data warehouse where you want the data to go so it basically Ellen you know see a lot of your time okay and this is the most common and the most you know unspoken fact about data warehouse so any professional that's dealing with data barrows you would always be using metadata because it saves a lot of time because every time you cannot be importing data from a database you always have to get it from your metadata by defining rules and defining your source and your targets correct so mirror it has gives lot of your time and this is something I will show you a demonstration on ok and when I show demonstration you will understand how easy it is so we just read the rule says that you have your source over here and then you have your destination over your target over here and once you've written this thing you don't have to go back here so every day it might pick up the data from here and it might move your data warehouse and all these things it will update all the new details which are present in your database and it will add it into your data warehouse so that's what a metadata does it's a very big advantage and it's one of the best sergey's are being okay and then the final point here is metadata is used to define which table is source and target and which concept is used to build a business logic called transformation to the acts alone but yeah this is what I told you right so it's used to define which is your source and target and how to build your business logic called transmission so transmissions basically your rob act of converting your source data into the form that you want to and what is the logic that you use correct so all the different the filter criteria all the transformational criterias all these things can be also done using the metadata correct so even the process of forgetting data from your source your destination it can involve extra additional steps which can save your time and default process basically so every time you have data in a particular format you might want to store it in a different format and for all that purpose you can use a metadata right so since things will already be defined you're the work of metadata is just to get the data from the source define and do the set of activities that is required and which is already defined it will perform all those activities and it will sort in the case where you want to do so that is the role of a metadata correct so many data is very very important very very very highly used and it's actually the most important or let's say the epitome of data barrows okay this is probably the best thing that can happen to data warehouse right so that is the thing about the major a degrees now going to the next slide we have the architecture so now that you know everything okay these four raw terminologies are in our removin enough for you to understand the architecture okay so let's understand what the architecture is so this is the entire architecture diagram okay so this is what you know data comes in from various sources it can come from either a database or in the from flat file and then that data an action of ETL will be performed on that data and it will go to the staging area okay this called the sitting area and this is the staging database and the rails are stored over here it is temporary data before data completely moves to the data warehouse it will be present in this area okay and that is done by using the act of ETL and also between moving to the data warehouse the process of idle continues so ETL process starts over here and it ends over here okay and between the conversion it is a temporarily stored in a staging area and this is most often present inside the ETL Coolidge okay like your talent order semantics and all these things and then this data will be stored in your data warehouse so whatever is the extracted transformed unloaded it will be loaded into your data warehouse and in your data barrows you'll have metadata okay and of course you'll have your raw data and then you will have your aggregate data okay and this is the reason why a data warehouse is generally you know it's a larger in size because it has not just raw data or data base which from where the data is coming in it will only have your raw data okay but your data warehouse will have additional stuff here it will of course your metadata and your aggregate data and together all these three things together is what helping you being you you doing your analysis sooner okay this is what powers you're ready to do OLAP online analytical processing okay so that's what a data warehouse is and that's what it stores and there's this data warehouse you can either your entire company can use the data warehouse or if you want more for security based access then you can divide it into a different data mass where your sales team and your different teams like here we have a purchase table and then you have a stock table so these are like three different maths data mass for three tables like a sales purchasing stock so you'll have different tables here and different teams can access different set of tables okay your purchase can be something that's used by your operations people right and then this may be something that's used by your raw sales people and your stock may be again used by your operations or your sales group so that's what each of these your you know defines you have your user group one user group two and user group three and each group will be getting access to different parts of your raw data warehouse because your data valves will be divided into different data Mart's and your data your different groups we could access only that data which they want to or which they can get access to right so this way no group can get access to every data that is personal data warehouse and there's advanced there's a little more data security in this is okay same thing over here so this group get action into this data and this data and this table or this mod right and then this user group gets accessed only the Riddler is Belgian they smart and then this Mart so that's what the entire architecture looks like and resolve the data flow is right so if you guys have understood this much then you're ready to get a demonstration correct guys and this is also my last topic in this presentation so right now I can go to my demonstration and what I'll be doing in my demonstration is I'll be using talent okay so I'll be x importing data from my database I'll be using be getting it from my Oracle database and I'll be showing that into a data warehouse which can be ready for any kind of analysis or visualization on any other visualization tool okay so this act over here which I'm going to show you this is what powers your business intelligence right so are you guys ready for the demonstration any doubts your guys okay great Ronnie says he's already he's all pumped up already Ronnie that's very good so Jacobs also ready and so is Rajesh are you Rajesh can I get a yes from you yes very good very good olive oil Rajesh so let's go so let's go to the next part of my session and there's going to be demonstration where I will populate or detail barrows okay I'll be using talent for the data warehouse activities and let's see how we can import data from waste data sources and create or data barrows so for our hands-on session I'll be importing data into my talent bi okay like I said earlier we love using my talent bi for night data warehousing and ETL processes and I will create a data warehouse out of using talent bi okay now the data set that I'll be using is that of per 10,000 row table and 3,000 row table okay so the there be one table having 10,000 customer details okay and then there'll be another table having 50,000 rows of transactions which each of those customers make okay now based on this data we have to find those customers who have the lowest number of purchases okay so right now my dataset is present in my Oracle database okay and so I have two tables and both my tables customers table and my transactions table is present in my Oracle database I also have them in my here excel so first let me show you how they are present in my excel file okay okay so this is my freshmen table as you can see we have all these days right we have all these fields here so we have customer ID we have customer name we have the address of that customer city the country is from the contact number and his email address right and if I do this then you can see that there are almost 10,000 cutoff fields here so 10,000 rows are there here so this means that you know I have it's a pretty big data set and I can use this in competition with my other table okay so here we have customer ID as the primary key okay and if I go back to my other data set that's my transactions table you can see that I have furthermore freeze your Abdul for invoice number of a stock code I have description app quantity I have invoice date unit price customer ID and Product ID okay so there's a customer ID over here also so the customer ID is the primary key for there and this is the foreign key over here so basically the customer ID is the same and I will have to do a lookup to that table with the help of my customer ID okay since these are two different tables I can create my data warehouse with the help of this particular primary and foreign key concept I can create a link and I can create I can probably just link these two tables with the primary key for a key concept okay so any doubt guys I'm sure that you all know what a primary key and a foreign key is and I fed all anybody has a problem please let me know okay so Rodney says I don't know what's a foreign key okay so see Rodney the thing is we have something called as the primary key and a foreign key okay and we use these two columns when we want or when you want to use a combination of two different tables right so if you would consider the example of this one then you can see that there are a number of columns here and not everything is matching with the radius in that put in the other table okay so only customer ID is the only common field okay so there's a customer ID here and even in this table we have a customer ID now the thing is the customer ID is a primary key because it's it's constant and it's unique for each and every single record over here so each and every customer here will have a separate customer ID okay so no two customers will have the same customer ID okay and over here the transactions here which has the inverse numbers the description of the product data spot is the stock code the quantity all these things are sorted by the customer writing okay so if you find a particular customer ID more than once then well here you can find the person already more than once that's because a customer would have made more than one transactions he might have you know come on to their no we might have bought from the same person more than twice or thrice or four times or something like that right so supposing I go back to this person and I buy his products since its radial so I buy products say 50 50 times then elapsed for three different times my customary will appear here okay but however URL it will not appear again because it's a primary key right here so that's the difference so here the uniqueness is there that's why it's called the primary key in this case and here there is no uniqueness for the customer writing okay can appear more than once and that's why the foreign key over here and what we're going to do is for every customer ID over here right so for every customer ID over here we are going to link it to this particular table with the help of this customer ID since the customary is the common field in both the cases I can make a table which would show which customer which would probably display all these details along with the other columns okay now that's because I have customer ID which is common in both the tables and using that I can link the freezer are there in this table and which are there in the customers table right so that's what I'm going to do now the similar thing is also there in my or actual database so let me show you that so this is my your sequel developer GUI okay so I have two tables that have customers table and I have transactions table right and then I have a final table but this is not something which is relevant so this is not something we'll be using but we have the same two tables customer table and transactions table as I showed you you have these two here and you have almost the same fields over here right because if you see the customers table we have the customer ID we have the customer name we have the customer number and the email address okay and here we have originally we have the address city and country now what I did is when I imported this excel file into my database I did not I ignored these three columns right I didn't want these details because I was not going to use them in my data warehouse so if you see a problem statement here it money says you have to find out the cosmos grab the lower number of purchases it doesn't mention that you need their contrary details or city details or anything for that matter of fact so so because so for that reason in power using the country city and address fields would be your ways so that's why I'm not imported these into my data warehouse I have only the customer ID the customer name my contact number and my email address fields okay this would be sufficient to find out the customer details okay and to find the purchases right and to see who has all the lowest purchases we can use the other table so in the other table there is a transactions table we have this quantity field right we have this column this quantity table basically represents how many products this particular person purchased so this person with this customer ID process 71 products right and then you have another person whose purchases 67 projects products and so on so you also have people who purchase only seven or nine okay now our problem statement is such that we have to find those people who have made unless a very less number of purchases okay now if we define less to be ten then we can extract the data such that the data warehouse gets the details of only those people who have made purchases of less than ten items per order right so per transaction whoever has made less than ten number choices so this person has made seven and this one this person has made nine right so we can get a list of those customers right using the customer ID in go to that customer table and get his name we can get his email addresses phone number and other details okay and we can now publish all those things and we can probably export them to another CSV file like this or we can put that into our database basically into anything which would support data visualization right so that's what we want and this function is what our talent is going to do okay and what I'm going to do for that purposes I'm going to do a lookup from this table onto my customers table right so this is my transactions table and from here I'm going to use the customer ID field to look up to my customers table to obtain their name email address and phone number and all those things so first of all let me introduce you our talent interface again so this was the Oracle sequel database which I showed you earlier and this one is the talent right to talent open studio this is basically the data integration version of the project okay now what I'm going to do here is I'm going to show you how talent works I'm going to show you the layout over here first of all so on the left here you can see that you have something called a job design we have a standard you have demo and then you have various other projects here right so all these are the different projects that I have or created okay and when we say job design so job design is what you use to create your jobs on the UI here okay so in our case in talent whatever the actions are we performed it we gotta do it with with respect to a UI so we don't have a quoit this is not a coding interface okay so of course this talent runs on Java but it we don't but talent is known for its GUI okay so we can drag and drop the items on the job task so this is the job design task bar and we can first of all create a new task ok supposing I wanna create on your radar arrows by importing videos from another database then we can start by creating a new job design okay okay and then create a new project in that job design so right now we are in my your local project so I'm going to create a new job design okay I want to say create standard job let's give a name to this particular job so let's say data warehousing bi session ok and demonstration is the purpose so I'm just gonna copy plates the same videos into my description field also and I'm gonna say finish ok now what you can see this is a job designer so this is where I can get all the data from my different databases or fields files and get them to my data warehouse so I can run my I can write my projects over here and then on the right hand side you have your pilot okay so from your palette you can choose the files or the database from which are you from where you want to input your data ok supposing you want to import from a flat file then you can just go to this heading file here so under file you have input right so under input you have all these options so if you choose any of these options here and if you just drag and drop them over here then you can use them as a means input data using set the path where the exit file is stored okay and you can choose that and when you the Center for the fields you can get the data in from these excel files to your data warehouse okay to your talent okay but exa file is a very simple thing and I don't want to show you that let me show you how to import this large data from our database right so we have also other options like big data big data intelligence intelligence we have business we have cloud right so we have integrations with numerous technologies here and what we would be considering now on is that of databases right so I have my Oracle database so I'm going to go to a rattle under databases and under Oracle I'm going to say tea or rattle input okay let's see where is it articulate current put is right here so I'm going to just click on to your input and paste it here okay I'm going to drag and drop it here and now that I've done this I can use this as a means to get my data in from from my database okay so this is my Oracle database right so let me contribute my input first so first of all I need to configure all these details with respect to my host the port number on which its hosted the database name the username the password and all these things okay so let me first start off from the host it's hosted on localhost right it's not on any server it's not on any remote server so one is going to say low close and the port number is 1 Phi 2 1 so these are details that I got when I was when I installed this Oracle sequel server right so you will have to probably enter the same details when are you create one and one fight 1 is actually the default port on Metro at use what happens usually was it so you should have any problem there so database you're this exit in my case so ok and then my username is important and the password is something we have to set over here ok all right then I've done this my connection type is our atlas ID my database version is 11.6 alright so all the deals here are correct now let's check my schema the schemas basically is what is going to you know map your fields from your database to your they are talent right so this is your you so we have to define a schema here to add details okay so now that we have all these filled here now let me go back to my database right so we are in my customers table and in my customers table I have four pins customer ID customer name customer number and email address so let me add these four fields in my talent okay first column is going to be customer ID this is going to be customer name this is going to be customer phone number this will be customer email address all right so that's all fine basically the data type that is coming from is all bad care okay so if you see here the data type that is it currently is in is back at okay they have different sizes oh but email address beat email address or contact number or name or ID all are back to type so I'm gonna just say the input also to be as bad care right so it's all bad care - everything is technically work attitudes it's picking up the same data type by default all right so let's define the length over here - let's say the customer ID is going to be 5 characters in length I'm going to say the lens for the name is going to be around 30 phone number to be around 20 or technically it's 10 so let's do a buffer of another 10 and say 20 email address is going to be another 20 characters okay the customer ID let me just increase the length through 10 so these these are the lengths of the fields which are going to be stored okay so I'm going to set the length over here and I'm going to say ok now whatever do is I need to set the default query okay so I'm going to say select star from customers select star from customers okay because customers in the name of the table in my Oracle database right so I'm going to choose the table name here I'm going to say the table name here also okay there's something I forgot earlier you got to mention the table name that you want to import from and you order specify the query you want to run okay now let me run the same query in my Oracle and let's see what details I get so I'm going to feed the same query here okay so it's going to be select star from customers so when I say this I have this table option here okay now let me just run this query okay so I've got the same radius in here also right so my table name is customers so let me put my talent and set that its status customers already and this is the quays in korea be running here buddy hold all right I'm going to save this so I have set my first data and write it so I've gotten a cosmos data and here so let me say another T Oracle input and I'm going to configure another input for my transactions table right so I have another table for transactions so let me set that also and as you can notice every time you add a new input or configure a new output then you'll have to add these details right you have to add the hosting and database name username password on all these things so instead of this you have a shortcut here so you have a shortcut with respect to mater it are some metadata something that I explained earlier right so it basically contains data about your data so under metadata tab if you see you have DB connections you have file all these things right so so if you want to create a standard input for data coming in from a particular database then you can just set it once over here okay so onto your DB connections you can set one connection and it can be using any number of jobs in this particular project so instead of four so the same probably laughter enemies enter the details of this particular table okay but if I set the same thing through my metadata here then I can use the same metadata every time I want to import data to my talent warehouse so what I'm going to do is the same let me show you how to do it through metadata I'm going to say create connection once the name of the connection is daily input all right let's say database input purpose to get data from Oracle to Ireland let's have the description also and say finish so k we go to the database type over here okay now the database type that I want to import is that of Oracle right and I'm going to do it with the help of Oracle with the service name so when you choose that you have your login you have your password and all these details so in place of your login give your username here so in my case it's worse than okay password is this one the server is localhost and the port is one five to one so is name is X II okay now to verify if the connection is correct you can now click on check over here so as you can see the DB input connection successful messages come okay this means that your tunnel be able to successfully communicate with your Oracle which is hosted on a port number one five to one okay so now that I have done this I can not straight away say finish and add this metadata over here okay now I'm just going to click on the drag and drop this DB input over here when I do that I can choose this to be for either input or output or anything here because under Oracle you have all these options right so I have chosen this metadata for any of these options your so I want to use as input I can do that also if I want to use that output I can do it even at that time so I'm going to say to your our input and I'm going to say okay now as you can see my DB input here this is a meta data which has these properties already input okay so in case of this one I had to give it manually but here in my metadata I specified it once and it's already replicated so the advantage here is that even if there are ten different tables from which you want to get data in then you don't need to add a ten different database connections in your row talent right when updating your metadata if you create the database connection once and then using that one time you can establish ten different connections you can now get it and from ten different tables or replicate the same thing again and again so that is the advantage you're right so that's a thing so next let's give the table name here okay so the table is the transactions table right so that's what I'm going to enter here and the query that we have to run is select start from transactions i ruinous transactions okay now let me also verify if this query is running in mind my sequel developer okay and let me run this query and yes it able to fetch shooter from my transactions table also now going back to my talent now that I've set this let me go and edit my schema now okay so where do I have my schema option okay it's right here so I'm going to say edit schema I'm going to consider all the different of columns and fields over here because I'm not wrong there are five different columns your invoice number description quantity customer ID and product right so I'm going to rename everything here I'm going to say invoice number description quantity what else do we have we have customer ID and Product ID right all right so even the details here have been set to work l okay now this basically means that I'm getting the data from a database and the reader type of that is back here right so as you can see from here mana click on this all my data all the fields have that care to wait attack so that's why I'm setting them as like a 2 over here and I'm gonna set the default length for everything to be as well not for everything so for invoice number let me give a length of 10 for description and want to give a length of hundred for quantity let's say a 5 for customer ID it's going to be 10 and product ID again let me give a length of 10 so I want to say okay so my schema is basically set for both of these files now what I've got to do is now I have the controller two inputs now have to get these inputs into my talent now to check if my data is coming in from my data sources appropriately let me just check the T log row right so the component name is a deal or oh but since I can't find it or I can also search for it here so T log row then I just put enter I have these options right so in this captain Steele oh go and drop it here now you can create connections from here you can say main and you can establish a connection over here okay and similarly I can use another T log row you're okay drag it I can paste it here and check actually no I won't do that check of my data coming to my tea log row from here right so my component is added over here so let's say so let's just check the schema once so as you can see these are the details that are entered in my source and it's already been mapped since I gave a connection between to a pretty mighty log row and my input one these fields have been mapped here also so I'll be getting my customer ID customer name phone and my email details you're okay now let me just stop you conocchia go down and here the field separator would be this pipeline I'm going to say print header print content with log4j all right so now let me just stop save this and let me run this now you can see the part near right when I say run it would build my job basically and should be able to import my data from there as you can see all my data is being imported to this let me just minimize my screen so as you can see all my materials are here right so customer ID customer name customer phone and customer emails where the four different fields that I'd configured to enter into my t logo from my input and they have come there D headers of these and the radius R so for customer ID its thousand and the customer name is this one here and then his phone number is right here right so this is his name here this entire row and things like that so each and every feature is separated by a pipeline so that's because I had in the component editor over here because in the component editor where your ad set we feel separator to be pipeline okay so that is the pen so since my component editor has the pipeline's for every field in my data that is coming through my Teague log row it has been separated by a field separator that is pipeline now if I just go back here and you can see by just clicking on run over here or over here you can get your output okay this I've just done this to show you how trade up comes in okay so now that our data is coming in successfully I'm just going to delete the steel over okay because we don't really need it for our demonstration so you can just click on this clear here to clear your lock screen so now that we've established our connection what we got to do is we've got to merge these two tables right so we've got to create a lookup field and I've got to make sure that my tables are joined so for that we have another component you are called tea tea join and when you hit Enter you have this right so under processing you have this option of tea join so you can just drag and drop here and what this would do is it basically join these two different inputs so we have the source input of this one and this one right so it would basically merge the two of them so let's first create one link over here to my tea Jane and similarly let me create one over here from my input to 290 low now if you can see here instance this was the first connection that was established this took up the name of main okay and my second connection took up the name of lookup now that's because this table here this will look up to the customer ID field in this table and match the subsequent rows or entities with this particular columns right so what this lookup would do is it would use the customer ID field in my in this table and it would do a lookup to this particular table and match the remaining rows and columns right so let's configure the schema for that first of all since we are joining these two we are using this tea joiner okay so tea join let's go to the component and edit all the other details here so we have to check this include lookup columns and output this means that at the output that comes out of this joint we want to also include the fields which are present in my lookup fries right so that's why I have a check this car so besides that here it says column mapping right so we have to always map these two different tables during one lookup field right so it's either one lookup fluid or the primary key foreign key relationship so both are the same so that's what we were to define you gotta define which is the parameter which is the foreign key or which is a lookup or failure okay so first of all let's go - let's go edit the schema here all right so we have the two different fields here we have the input field one and the input stuff we do over here so let's add all of these I'm gonna say control a and I'm going to move it here all right so I'm going to join all these things and from the second table I'm going to move my invoice number my description well I don't want to move description so I'm going to move my quantity customer ID is already there so I would then want to move my product ID okay now the reason I'm not going descriptions because I don't need description as for my problem statement they don't need for our description okay similarly even product ID is not really needed for me so I can actually remove this also right so I have these so these will basically be my output fields my customer ID customer name customer phone customer email address invoice number and quantity right so this will be my output fields and another upset my schema even the event has been auto picked right so this is the nice thing about talent things are pretty simple and it's all straightforward so in this click on okay so guys another will set a schema here let's add the columns that we want in our output from our lookup table okay so this is a lookup table and from here what all columns will be what we want in our output so that we've got to include over here right so let me add the different entries so I really want my customer ID to be there so I'm going to say a row to got customer ID row to a row basically means this table right so and then I want my quantity to come and that I'm going to set as a row to what quantity I'm going to add another entry here and that's going to be for invoice number okay so I'm going to say this is row 2 dot invoice number all right so now that we've done our column mapping here this is pretty good enough now let's go down and set our key definition so here we set our primary and secondary key a foreign key definition okay so let me add an entry here so the primary key is going to be customer ID and the rock table is going to be row 2 dot customer ID okay so with the help of these two fields we're going to find our we're going to make our way over toward lookup and another advantage here is that I can also go back to schema and edit the data of my quantity okay since I'll be using quantity as my differentiator for my problem statement I can set the data type to be something else I can set it to be integer okay so I can do this and set the length to be either 5 or 10 right I can give okay so now this basically means that or even though all these are in string format my quantity can be an integer and the best part about talent is I don't need to change anything over here okay so I can get the data and from any data type the data that comes in can be of any data type okay but the data that I store all the other I finally export that I can change the data type of that particular data so that's the advantage so quantity comes in the form of vodka to data type and I'm converting it to into integer from inside talent right so that's the thing about advantage with talent so I'm just going to say okay now all right so now that I've done this I think I'm done everything with respect to T join okay so now let's go back to my problem statement for a minute now what they say is they wish to find out only those customers who have a low number of purchases okay so it is quantity which defines how many purchases that they have right so I'm going to go back to my talent here another of added might be log in over here I will be getting all my details and all my data into this component right so my input one would be giving me those rows from customers and this would be giving me those rows and fields and records from my transactions table so I will be getting approximately 50,000 rows from here and 10,000 rows from here now that I have all these rows over here what have to do is have to filter them based on the quantity and our problem statement here says that we have to find out those customers who have a low number of purchases and if you remember we define that when if you want to find row number of purchases it's not exactly defined so let's say we want to find those people who have less than 10 transactions or whose quantity is less than 10 right who put is less than 10 items so for that we can just go back to our data warehouse and create another filter okay which would filter out those people who have less than who opposites less than ten items per transaction right so for that let's again go back to my palette here and search for this component filter so as you can see there is pee filter row and tea filter columns okay but I want to filter rows Swami is going to drag this one drop it here so T your filter okay and I want to connect these two right so now I'm going to give a connection here and what I'm going to do here is I'm going to define the the filter criteria okay so I'm going to add an entry here but before that let me go to my or schema and as you can see I have all these entries here and same fields are here also right and even the data type has been picked up by default that's the talents functionality so this will be the input that will come to my filter and this will be the output that will go right so let's just click on OK and now coming to my criteria for filtering I want to filter my quantity right so the input column that I want to check is that of quantity okay on the function that I would want to do is empty now for now let let's just leave it empty and the operator is if it's lower than or equal to right so if it's lower than or equal to 10 that is the value that is present in this column if it's less than or equal to 10 then I want to filter only those rows and I want to send only them okay so basically my command is also set over here all right so in my schema and my function is set over here and now what have to do is have to set the output for my data warehouse okay now where do I want to extract these results and store these this data warehouse all right so let's say I want to store it in an Excel then I have to just go back to files over here that means it's typed file and hit enter and I have these options read out 50 file input XLV file output X L so I am going to choose the file output Excel and drop it here this would essentially mean that the result that are filtered over here in this component this would come and get stored in this excel file which I am going to create now okay so let me connect these two all so I'm going to say filter and put it here and supposing if I want to also get a list of all this who have more than 10 transactions then I can do that also by using this option of rejecting so whichever rows get rejected right which will get rejected from moving to this particular file those can go to another file right so we have that functionality also that as I'll show you even that but before that so let me just show you how this looks like so I've set a row filter t file output Excel and let me edit the component over here and let's say the file name is this one d wh VI demo all right I'm going to say include the head also head up basically is the part which would say customer ID phone number and all those things from one say include head up alright and I'm going to say define all columns to auto size that's just to auto size those columns right so it's all fine now everything set and my file would be saved in this particular path okay and this would be the name of my file which will be stored when I run this so I've copied this path now what I've gotta do is I just save this job and execute this okay when I executed that it's showing me an adult over here okay so the error it shows us that the error and the complex properties type mismatch cannot convert from string to integer okay not a big deal I can fix this error okay we can fix that by first of all checking our schema right over here if you see the schema here it says quantity is type integer right but over here quantity is still type spring so that is what we need to change we got to change the type over here to integer so I am going to say number and I'm going to click on okay fine okay so now let's go back to this file right and let me edit the schema and set the properties correctly okay so it's not able to convert this right so the type of string and the DB type is number so this is the problem so so I can treat this also two integer right so it goes right here so I'm gonna choose this as the integer database type I'm going to click on OK and now let me go back to the T log join and check the schema over here okay so this is my table and quantitate this is integer this also integer is coming here and now it'll get converted over here into this so now it would lightly throw me in here now let me just save this and run this job ok building job ok it's showing an error again cannot convert integer to string okay so let's see what the error is now okay so we have a problem here invoice number right so this is the problem so invoice number it should be Row 2 dot invoice number so this was the type mismatch error that we were getting right so no problem okay so again it's changing right so right I'm going to save this now and we won't have the type mismatch era now okay great so it says connected and disconnected as you can see the output over here we have got around these many rows from this table and about 50,000 rows from the table job and we've got it here and then we would have probably join these two with the help of the customer ID field and then to my T log row it would have filtered all the fields so we've got the input has basically been eight three four two number of rows eight thousand three hundred forty two rows out of which 841 rows got filtered and stored in this excel file awesome right so let's just go to this path and check if on field content are correct so this is the path where the file store right so I'm going to copy this path I'm going to go back here and paste this path so this was the file that we created right and yes as you can see we have got customer ID customer name customer phone the email address invoice number and the quantity okay and the best part is the quantity here is the sorted and it's only that which is less than ten right so that was the criteria that we attack we had filtered for those fields to be only less than ten and to show you further proof what I can do is I can now do this a lot of filter okay and if you can see here the only options you are 110 okay that means nobody whose quantity was 90 or 100 plus has been added over here right so so this is how simple our talent is and this is what data warehousing is in basics right so this is a very simple example that I've showed you where you can get data in to your data warehouse by filtering some data from the database and probably you can use this data to make your analysis or visualization on a tableau or click-through or any other data visualization tool okay but there was another thing that I promised you right I told you that you will have a filtered row and you will have another rejected row so if this was the filtered column then I can probably have another excel here I can say T file output Excel so okay so T file output Excel I'm going to paste it here and what I'm going to do is I'm going to start one when I say reject and here so the rejected the fields or rows and columns are going to go into this particular fields okay so I'm gonna name this also as rejected data right so this also won't be present in the same path and let me edit the gonna say define Auto columns sighs I'm gonna say include header and yeah I'm gonna save this and run it again not again I have the same list of fields over here right another same rows over here right so as you can see the same 8:41 rules have been added to this file okay the filtered results are here and the remaining ones which were rejected they have gone to my new file and this file is called the rejected one so if I go back to that same path this was the original one and reject data is your so when I open here you can see that the only options here are those of above n okay so if I just show you with a filter column right so you can see that there is nothing less than ten so that's the thing so that's how you filter data that's how you get data in and all these things so that's how we work with talent guys so you get a run so the same thing here you can also store it to some database right so instead of having a file to excel for this I can just delete this and I can just delete this and let me go to databases first of all all right so here when I say Oracle and T or a clout put okay I'm going to paste it here and now if I am going to say reject or yeah project right so all these rejected entries would go into this particular file of mine so let me add the components and edit the component or the hose reels here it's localhost okay port is one five to one database is X E right yes it is XE and my username is important my password is a de Rekha all right so and then you have the action on table right so the action is going to be create table okay so when we say create table they've got to give me your table name also and the table name let's say rejected output okay when I'm getting there is nothing by that name yep there is nothing here by your that name so we have customers final table and transactions right so we will have a new table that will be formed over there so which the action is going to be interesting you want to insert or update an existing file then you can choose the various options but in our case it would be create a new table and the action would be inserted okay so let me just save this okay and run this now so seems like my new table has been not created here let me just go back my sequence here or a clue table and we need to refresh this yes so as you can see there's a new table has been created and when you see here these are the different fields error message was another field that gets auto-generated and if you look at the contents of the field you have all these right so guys that's how you get your data back into your database right so earlier I showed you how to do it and sort on an excel file now I've showed you how to create a new table and how to get your data warehouse data back into your database so that's the end of my demonstration all my queries have been passed successfully now let me just quickly go back to my slides for a minute and oh yeah this hands-on is what I completed it showed you how to get our data and your data warehouse so yeah so the I think that brings us to the end of the session and let me just summarize whatever I did in a minute right so first of all I spoke about the need for business intelligence and then I told you what is business intelligence why you need it for your business and what is the role of data warehouse and then I spoke about data warehousing right I give we went into the depth of data warehousing and spoke about the key terminologies and data warehouse we have the EDL we have data Mart's we have the metadata and all these things and then finally we spoke about the architecture of the data warehouse and we finish off the session with the hands-on demonstration of how to populate your data warehouse using talent so let's that's it and thank you for being in the session so that brings us to the end of the session trip probably will have another session on new data warehousing and I'll talk about more advanced concepts like Chema's right I spoke about there are three different types of schemas right like star schema snowflake schema galaxy schema all these things so I will talk about all these things in my next session and probably also I will talk about dimensions and fact tables and all these things right so it doesn't seem like you guys have any doubts so if you do have any doubts please put that in the chat box right okay so thank you guys thank you for being in the session probably I'll see you on until next time okay so I hope you enjoyed listening to this video please be enough to like it and you can comment any of your doubts and queries and we will reply to them at the earliest to look out for more videos in our playlist and subscribe to our ready Rica channel to learn more happy learning
Info
Channel: edureka!
Views: 927,237
Rating: undefined out of 5
Keywords: yt:cc=on, data warehouse tutorial, data warehouse tutorial for beginners, data warehousing tutorial, data warehousing tutorial for beginners, data warehousing tutorial videos, data warehouse concepts, data warehousing concepts, data warehouse architecture, what is data warehouse, data warehouse basics, olap in data warehouse, olap vs oltp, data mart vs data warehouse, edureka, data warehouse and data mining, data warehouse, Introduction to Data warehouse
Id: J326LIUrZM8
Channel Id: undefined
Length: 98min 50sec (5930 seconds)
Published: Thu Jun 22 2017
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.