Database vs Data Warehouse vs Data Lake | What is the Difference?

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
what's going on everybody welcome back to another video today we're gonna be taking a look at the differences between a database a data warehouse and a data lake [Music] now when i was first starting out i'd only ever heard of a database and i think that's what most people are familiar with but i had never heard of a data warehouse or a data lake and so in this video we're gonna be walking through the differences between each one of them as well as how they kind of connect with one another so let's jump onto my screen and get started all right so we're gonna be taking a look at a database a data warehouse and a data lake but let's start with a database now when someone says a database typically they're referring to a relational database now a relational database can capture and store data via an oltp process which stands for online transactional process so when company completes a transaction and sells an item it'll record that within a database and that data has the ability to be live real-time data data in a database is going to be stored in tables which has columns and rows and this will be highly detailed which means you're going to be able to go in and see every single aspect of the data and databases also have a really flexible schema which means you can go in there and kind of change things as you go to make it work for what you need now a data warehouse is also a database just like we were looking at before but it's going to be used for analytical processing or olap olap stands for online analytical processing and it's created to basically analyze huge amounts of data now if you notice on the last slide there were these three databases they were just kind of sitting there and they were storing the data in this visualization that we have on the right these three databases on the bottom are all aggregating and sending their data to this data warehouse via an etl process which is where it extracts the data it transforms it and loads it exactly how they need it in this data warehouse and that's how data is put into the data warehouse it isn't getting it directly from the source but it's being put into a database and via the etl process is being updated as it goes or whenever the etl process runs a data warehouse will always have the historical data but it won't always have the current data unless the etl process is running every single day or very frequently the data in the data warehouse is also a little bit different because we're doing this etl process to get the data in there we're not actually putting every single piece of data or every column and row in there we're typically summarizing it and then putting it in there which will allow us to process that data for our analytical purposes much faster now a data warehouse is going to have a much more rigid schema so you really need to plan ahead with how you're going to put your data into a data warehouse it's not as flexible as just a database so now let's look at some of the key differences between a database and a data warehouse a database is going to be used for recording transactions or a data warehouse is going to be used for analytics and reporting a database is going to have fresh and detailed data where a data warehouse is going to have summarized data it's only going to be as fresh as the etl process is created a database is going to be a little bit slower for querying large amounts of data and when you do query large amounts of data it can actually slow down the processing of all those transactions a data warehouse was designed for the exact opposite it was designed to be very fast at querying and not slow down any processes because it isn't part of that transaction processing at all so now that we've looked at a database and a data warehouse let's take a look at a data lake a data lake was basically designed to capture any type of data that you could possibly want it could be a video a picture an image a document a graph anything you could imagine that you'd want to put in a database or store in some way you can store it in a data lake now there are a ton of use cases for a data lake but i think people who work with machine learning and ai get to use it or benefit from it the most they can use all that structured and unstructured data and create models to really use it in its raw form where if you want to use it for analytical purposes typically you're gonna have to clean it up a little bit and do a little bit more work to actually make it usable and so a dale lake is just that it's this lake where you can basically throw any type of data in there but it's not always super usable because you're just putting it in there in its raw form if you want to use it for analytical purposes and reporting most of the time you're going to want to clean that up and put it into a database or a data warehouse so now when we're looking at all three they are all different and they're all used for different purposes so no one option is better than another for your data if you're using it just to record transactions a database is what you should do and if you have a large amount of data that's just too much for your database to handle it sounds like you might need a data warehouse and if you have all this data they have no idea what to do with or it's unstructured semi-structured data that you can't fit into a database well then i highly recommend using a data lake there really is no one size fits all all three of these can be options for different uses and in fact you can use all three within one company for just different things that your company needs so i hope that that was helpful learning the differences between a database a day warehouse and a data lake again i had really never used a data warehouse or a data lake when i first got into analytics but now that i've gotten hands-on experience with all of them they're all really interesting can be used for so many different things so thank you so much for watching this video i really appreciate it if you like this video be sure to like and subscribe below and i'll see you next video [Music]
Info
Channel: Alex The Analyst
Views: 382,652
Rating: undefined out of 5
Keywords: Data Analyst, Data Analyst job, Data Analyst Career, Data Analytics, Alex The Analyst, database vs data warehouse, database vs data lake, database vs data warehouse vs data lake, data base vs data lake vs data warehouse, database vs data lake vs data warehouse, data warehouse, data lake, what is a database, what is a data warehouse, what is a data lake, what is a data lake vs data warehouse, data warehouse vs data lake
Id: -bSkREem8dM
Channel Id: undefined
Length: 5min 22sec (322 seconds)
Published: Tue Apr 26 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.