What is ETL | What is Data Warehouse | OLTP vs OLAP

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
I'm going to explain you what is ETL data warehouse and the difference between OLTP and OLAP system using a simple business use case let's take the example of Wireless business in u.s. Verizon is a wireless company in India it's Reliance Geo they have stores across the country when you walk into any store the store will have different software systems such as POS and insulin software the POS system tracks the sales transactions now when you buy any phone are generally people buy insurance as well and to track the insurance the claims etc you might have insurance software which will have insurance related records in it now these two could be a different systems with different databases for example your POS might have MySQL database whereas your insurance might have or I call other than that you might have unstructured data for example you are running customer satisfaction survey where you are scanning those survey papers and storing those as PDFs on your Amazon s3 cloud now in your business you have the structure and unstructured data as a business owner now you want to get insights into your business you might want to ask different questions such as which store is performing best in terms of device and insurance sales in terms of customer satisfaction survey which store an employee ranked the best holiday season is coming and which store or region is going to have maximum traffic so nowadays getting insights from the data has become very important because this kind of insights can help you make better business decision and the future business strategy let's take the first question which is which store is performing best now one obvious thing you can do is you have these two databases one is MySQL one is Oracle you can get the aggregate sales numbers from the first one and the aggregate insurance number from the second one and you can just add them up but this is not a good idea because these two are mission critical systems these systems are serving your real customers the query that your forming what if that queries complex and it slows down the database that's not gonna be good so typically what organizations do is they copy all of this data into some other database and they perform the query on that database okay because even if you slow down that database it is not serving your customers so it's gonna be okay now you don't want to blindly copy these records into the second database that we just talked about because to make your analytics easy you might want to apply some transformation for example these are the records you have now you'll notice I also have one record from my India store so let's say your business is international business you have some stores in New York and some stores in Bangalore India and your databases are on cloud so they are storing all the information into single table now to know which store is performing best in terms of insurance and device sales you might want to do aggregation first so first step you can extract the data from these two databases and you can just aggregate by store name so here the New York store the USS is 2,350 because I just added these three similarly for Bangalow store of course I have one record so it is showing the aggregate here once I have data in this format it is a little bit better because now I have aggregate numbers but still I cannot compare India store with US store because of the different currencies this one is in dollar this one is in rupees so to do apples to apple comparison I might have to perform normalization which is currency conversion so I converted rupees into dollar and now I can save this store is performing best I can add these two number 23 50 100 I will get 24 50 and then add these two numbers so of course the new store is performing better once I have the records in a format that I can perform my queries on I can then store them into another database called a data warehouse we call it data warehouse because it stores data of your entire organization I just the example of POS software and insurance software but it can have data from your ERP system for employee records your customer satisfaction survey basically all kinds of data that is being generated in your organization they all can be dumped into this data warehouse after applying some transformation so that later on you can perform analytics on top of it now I just use aggregation and normalization actually there are many more things you can do you can remove nulls or you can remove outliers you can do format conversion you can join two columns so you can apply all kind of transformation based on your analytic needs now if you realize we perform three steps here we first extracted data from this various sources so that step is called extraction then we did an aggregation and normalization so that step is called transformation after that we loaded these records into data warehouse that step is called load hence ETL extraction transformation and load and after performing ETL you store it into data warehouse and now your data analyst and data scientist will perform all the queries and analysis on this warehouse to produce the dashboards to produce the reports that can answer the questions we ask those three questions so after data is in data warehouse or your data analysts and data scientists will use their skills to answer those three questions the left hand side box is called OLTP which is online transaction processing system and these two databases are called OLTP databases the right hand side box is called OLAP which is online analytical processing system and the data warehouse here is called OLAP database so it's actually these terms are jargons but it's really very simple the OLTP is your transactional databases which are serving your main business which are serving your customers these are mission critical system and you want to make sure they don't slow they are up all the time all up on the other hand is something that you need in house they are not solving to your customers you need it for performing and analytics for getting insight from your business now the ETL can be performed using simple Python or a Java program but if your ETS are getting complex it's better to use Enterprise ETL tools such as Apache ni5 which is open source and free there is informatica and Talon these are paid softwares in terms of data warehouses we have enterprise solutions such as Tara data's no flag Amazon redshift and greenplum you can use MySQL or Oracle as a data warehouse as well but if you go for enterprise solution they give some additional features such as greenplum gives you MPP massive parallel processing same with Terrell data so these are the solutions designed for storing huge volume of data and performing analytics on top of it so I hope that clarifies if you are interested in learning Python or any of the data science texts such as pandas number machine learning or even get version control system then I have designed a tutorials for complete beginners so please go and check it out on my channel thank you
Info
Channel: codebasics
Views: 145,870
Rating: undefined out of 5
Keywords: OLAP, ETL, Data Warehouse, What is Data Warehouse, OLTP, etl, what is etl, etl process, etl process in data warehouse, data warehouse, olap, oltp vs olap, olap vs oltp, oltp, data warehouse concepts, etl tools, etl tutorial, data warehousing, what is data warehouse, etl tutorial for beginners, etl data warehouse, etl tool, olap oltp, olap and oltp, difference between olap and oltp
Id: oF_2uDb7DvQ
Channel Id: undefined
Length: 8min 6sec (486 seconds)
Published: Fri Feb 14 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.