SQL Project | SQL Case Study to SOLVE and PRACTICE SQL Queries | 20+ SQL Problems

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey everyone in this video let's work on an SQL case study as part of this case study first we will try to download the data set from kagle once we have the data set we'll then try to upload it into our database using a very simple python script once we have the data available we'll try to analyze it and then we will try to solve around 20 plus SQL queries as part of this SQL case study now you can call this like a case study or you can call it like an SQL project but basically if you want to solve basic to intermediate level of SQL queries then this is pretty perfect now of course I will not be able to solve all the 20 plus SQL queries as part of this video because then the video is going to be very long but rather I'll take a handful of the queries and I'll try to solve it during this video but for all the remaining SQL queries the data set the scripts and everything else you'll find it in my blog I'll leave the link in the video description let's start okay so first of all let's take our data set from kaggle I will leave this link in the description so you can go through this link and can download the data set yourself now the name is famous paintings it's given by Maxwell and I think he has taken this data set from data world he has mention that details here okay now if I go down you can see that this data set has eight different files and it has information about artist paintings museums Etc the first thing that we will do is click on the download so that all these eight CSV files are downloaded into our system now I have already downloaded them and I have placed them in one of my folders okay now the next thing that we need to do is I have the data downloaded from kaggle now I need to load this data into our database now there are two ways I can do this one is I can go into my database now I'm using postular SQL database and I'm using this PG admin tool of course you can use any other database of your choice okay so the first choice is that I create all of these eight tables so one table for each of the CSV file and then I do an import into each of these tables okay so I can simply do an import just right click on the table I find a table under the schemas and let's say I go under public here I go into the tables and okay currently I have not created any of the tables but if the tables are created it would show up here and then just right click and you would have an option to import the data from the CSV file okay you can do that the problem with that is you have to manually create the table scripts and you need to manually import one by one uh the data for each table okay now A Better Way or a more faster way is just by using a python script okay so using python I can upload all of this CSV data I can basically take it into a python pandas data frame and then I can move it into any database okay so I'm just going to quickly show you how to do that okay so let me go into my vs code so I'm using the visual studio code uh editor that is ID and I'll just quickly show you how you can do this okay so it's very simple even if you don't know how to use Python trust me just watch how I'm going to write the script and you will understand everything okay so the first thing is I told you whenever you want to load data from a CSV file into python so we first load data from CSV into Python and then from python into the database okay so how do you load it into a python so that is by using uh one of the modules in Python called as pandas okay so first of all I need couple of modules uh in order to do this uh operation so I need pandas so I can just tell import pandas as PD secondly I need a module that can connect my python into my post SQL database right in order to do that I'm going to use SQL Alchemy uh package okay so I'm just going to say from SQL Alchemy import because I only need one particular function from this module and that is uh create engine okay so these are the only two modules that I need in order to perform this action okay so the first thing is I want to connect my python into the postular database so I need need to Pro kind of provide the URL or the connection string okay so what I'm going to do is I'm just going to say connection string it's just a variable which is going to store the URL or uh the connection details to my post SQL database so my database name is post SQL okay then then I say colum double slash similar to any URL okay and then you provide the username okay now I'm using the default username in postgress and that is postgress and its default password is admin okay then the host is Local Host because it is on my system and then the database name that I need okay I need to provide a database where this data is going to be stored so what I'm going to do is I'll go into my PG admin here okay and you can see that I have already created one new database by the name painting currently that database is empty even though I have mentioned all of these eight tables with their account here I have not created any of these tables okay this is just so that I can quickly show you all the eight tables okay so this is a database that we'll be using by the name painting so I'll go back into my Python and here I'll just tell painting okay this is the database name okay that's it so this is my connection string this will be used by SQL Alchemy to kind of create the engine and then we will do the connection okay so the next thing is to kind of create the engine for that I'm just going to say DB equal to create engine and create engine and I'm just going to say connection string okay and finally I just need to connect into my uh database okay I need to create the connection so I'll store that connection into a variable called con connection okay and I'll say db. connect that's it okay so these are I'll just remove this so these are the only five lines of code that you need to connect your python to post SQL database okay now I'll just execute this if everything is fine then we will not get any error but if there was some issues with the connection then you would basically see some error you can see that I have not got any error that means everything looks fine and the connection was successful okay so I have built the connection from python to my post SQL database the next thing is I need to load the CSV data right so what I'm going to do is I'm just going to say data frame that is DF is equal to PD do um read CSV okay now PD is the pandas uh module or the package and pandas has one function by the name read CSV this is you basically provide uh the file okay with the path and it will read that file data and put it into this data frame okay this variable okay so I'll just provide the whole path so if I go back to my system where this file is present let's say I'll just do uh I'll just copy this whole path okay and I'll just come back to my python I'll put the whole path I'll do a slash and the file name name is artist. CSV okay so that's it so this will basically read all the data from the CSV file and put it into Data frame if you want to verify that what we can do is I'll just say print DF doino okay and let me now execute this so if I execute this you can see that it's kind of displaying some sample data okay so these are all the different column values are displayed here more importantly you can see the last line 42 one rows and nine columns so from this particular file that is artist. CSV 421 rows are loaded and nine columns are taken okay so that is what we wanted so you can see that the data is loaded into our data frame next whatever data is present in this data frame I need to load it into an SQL table a post SQL table so in order to do that what I can do is I can just tell DF do2 SQL okay so this DF is basically my data frame it has a function that is by the name tosql which kind of loads the data data from the data frame into an SQL table so I need to provide a few arguments here the first argument is a table name where I want to load it so I will call this like artist because this file is artist right so I'll create a table also with the same name as the file name okay so the First Column or the first argument is the table name the second argument is the connection so I'm going to provide this connection string because my database connection is stored into this variable so I'll just provide this variable here okay this Co n is basically the parameter name okay and then the third argument is what happens if the table already exist then I will provide a statement here saying that if exist if the table already exists then replace it okay kind of like I'm trying to overwrite the existing table okay and the last argument is you know in pandas we have by default there will be indexes so I do not want when this table is getting created I do not want the indexes to be considered okay so I will just say index equal to false okay so that's it these are the only four details that you need to provide when you're loading data data from data frame into an SQL table okay so now if I just run this okay before I run this let's see if this table artist is present in my database okay I have connected to my painting database you can see that this table does not exist now I'll just go here and I'll just run this and you can see that there are no errors if I go back into my database and if I run this now you can see that the data is present here okay the table is created so 421 rows and all the columns are loaded this is how easy it was I didn't have to create the table myself everything was done by python within like one or two lines here right now this is fine just for one file but I have eight different files so I can do two different things here either I can repeat these two lines of code eight different times one for each file or what I can do is I can put this into a loop and do everything at once okay of course we'll try to do it everything at once using a loop so first of all what I'm going to do is I'm just going to create a list here and I'm just going to provide all the eight file names okay so I'll just quickly do this okay so I have just created uh a list and stored it into this variable files it has eight values the these are basically the file names okay then what I'm going to do is I'm going to open a loop I'm just going to say for file in files okay so that I can Loop through each of these files okay and inside each of this loop I am just going to perform the same thing okay so this is my whole path but the file name will change for each file right so this I want to change it uh in each iteration so what I'll do is I'll just use a F string here okay and then instead of this artist I will just remove this and I'll just use the F string format okay and here I'll just tell file okay what is this file this is basically the iterator in each iteration it's going to get the value from the list okay so the first iteration it will be file second iteration it will be canvas size Etc okay so this is a file name do CSV okay so hopefully this should basically take different files in each iteration okay once I have taken that then the table name also should be the same as the file name so what I'll do is I'll use instead of use hardcoding the file name here I'll use this iterator variable okay so that's it so hopefully when I run this script now it should create all the eight tables okay just to verify I'll just see that we have created artist but all the other tables you can see that it's not present right now let me just run this whole python script and let's see if it actually works and you can see that it hardly took few seconds maybe two three seconds and it is done let's go back into our python script artist table it's done you can see that it has all the data canvas size is also loaded image link is the third file it is also loaded then we have Museum hours then we have Museum then we have product size then we have subject and finally we have the work table you can see that so easily just by using like what hardly 10 lines of python code we were able to load all the data from CSV file into a post SQL database so I hope this was clear of course all of the scripts will be present in my blog you can use it okay now let's proceed so we have taken the data and we have loaded it into our database okay now before we can solve some SQL problems I let's try to spend some time in understanding what exactly what kind of data do we exactly have okay so first of all let's go one table at a time we have the artist table this table as you can imagine it information about the different artist okay so their name their nationality the style of paintings that they do and some other information about them is present in the artist table then there is a canvas size table it basically provides information about the canva that is being used for each painting okay now what is a canva you can consider it to be like a painting board where the paintings are done okay and the size of that is mentioned in this table okay then there is an image link table it basically has the URL for all the different paintings unfortunately none of these URLs work okay if it was proper then probably we could have seen the paintings that we were talking about here okay but I think none of these URLs are actually working okay so the next table that we have is Museum hours this basically tells uh so there is a table called as Museum okay let's first go there and you can see that we have I think 57 different museums the name of the museum its address in which city is it is it present in which country it's present Etc is present here okay and then the the time when the museums are open on which day is is it open which time it starts and which time the museum closes is all present in the museum hours table okay then we have the product size it's basically uh so the First Column here is work ID now what basically work ID means is uh a work is basically one painting okay one painting is considered like one work okay so it's the IDE of a work or a painting then the painting was done on which canva and then the sale price and the regular price now what exactly does this sale price and regular price mean now this is not actually the price the painting was sold for because none of these paintings are sold they're all present in the museum okay but they are available for selling okay what this means is the actual price of the painting is this regular price and the price which is basically listed in the museum for sale is basically the sale price so for example let's say if I consider uh this painting okay the painting I don't know this particular painting you can see that the regular price is $275 maybe dollar okay but it sale price is 675 that means the painting's actual cost is75 but in the museum it has been displayed and it is its asking price is $675 okay so this is what basically it means it's not the sold price it's basically the asking price okay this is what I understand from my analysis none of this was explained in the kaggle OR in uh data world so just by looking at the data this is what I could understand okay if you have a different understanding let me know in the comments below then we have a subject table it basically tells for each of the painting what uh what is this painting about okay this painting is about horses or Marine art Etc okay then there is a last table that is work it basically has information about the painting so what is the name of the painting the artist who drive that painting what style this painting belongs to and in which museum is this painting displayed in okay so this is everything about the data set that we have okay now once you understand this data then you are ready to basically solve some problems using this data set okay now I have come up with around 22 different problems as you can see here so I have uh some basic problems some intermediate level problems but we have 22 different problems that I would like to answer so don't treat it like a problem it is like these are certain questions that anyone can ask about this paintings and museums data set and if you can answer these queries then you kind of are answering the different questions that anyone could have about these paintings okay so now I cannot solve all the 22 problems in this video because I I'm sure now itself the video has stretched along but I'll try to solve maybe two or three SQL problems okay so let's start with the problem number 10 okay so what I'm going to do is I'll go into my post SQL I have these tables created here right and I'll just put the problem number uh 10 here okay and just to create some space I just move this to the left and I'll just expand this okay now the problem basically States identify the museums which are open on both Sunday and Monday okay display the museum name and the city okay so we want to know which are the museums which are open both on Sunday and Monday not just on Sunday but both on Sunday and Monday now as soon as you look at this problem the first thing that I think should come to your mind is in which table we can find information about the opening of a museum okay which day it is open Etc so we know that there is a table called as Museum hours and if I if I just minimize this you can see that in museum hours table we have information for each Museum on which day it is open right so I think in order to solve this problem we probably need to use this table okay so I'm just going to say Museum hours here okay and then let's say I'm just going to say where day equal to Sunday okay so if I run this now you can see that I think totally if you see here we have 57 museums okay and in this Museum hours we have 351 rows it's because for each Museum it might be open on multiple days so there are multiple records okay but in general we have 57 museums and if I see that which are the museums which are open on Sunday okay how do I know if it is open on Sunday because here this is basically the information when the museums are open okay and the day basically indicates on which day the museum was open okay here I can see that out of 57 56 museums are open on Sunday okay and same way if I see for Monday so out of 57 museums how many museums are open on Monday I can see only 29 okay now what I need to find is among these museums which are the museum which are open on both these days Sunday and Monday okay so how do I do that so first of all I put the Sunday uh here so I get 57 okay the next thing what I can do is I put another condition here so I can just tell among these 56 museums I want to check only for these 56 museums which are also present on Monday okay so I put an exist statement here okay and I write my query select one from the same table that is Museum hours and let's say I'm just going to call it like mh2 and this will be mh1 okay and then I want to join these two tables so I'm just going to say where uh mh2 do Museum ID equal to mh1 do Museum ID okay so what I what exactly I'm trying to do is from this outer query I have got 56 museums which are open on Sunday for these 56 I want to also check if they are also present on Monday right so this inner query is basically for the other day that is Monday so here I need to put a filter condition saying that where mh2 do day is equal to Monday okay and now if I run this I'm getting 28 okay so 28 are the museums there are totally 28 museums which are open both on Sunday and Monday okay I hope you understand the concept initially I took all the museums which are open only on Sunday for only for these Museum IDs okay so I took the museum IDs of this mh1 Museum ID and for these 56 Museum IDs I checked in again in that same table but for a different day that is for Monday okay and this condition only matched with 28 rows okay 28 museums and that is why there are only 28 museums which are present both on Sunday and Monday okay now this is basically what I wanted Additionally the queries tells me that I need to display the museum name and the city so if I want to do that I can just join this Museum hours table with the museum table I'll call it like M and I'll put the join condition like m. Museum ID is equal to mh1 do mh1 do Museum ID okay and then I can just print the values here I'll say m. name as Museum name and then it's the city right so I'll just tell M do state or okay m. city right so if I just run this now you can see that these are the 28 museums the name of the museum and the city in which these are present okay so I hope this problem was interesting and you are able to understand it now let's move on to some other problem that we'll try to solve so we have solved problem number 10 maybe let's take problem number 15 okay so in order to solve problem number 15 I'll go back here and let's say I'll put it here okay now what do I have so let me just okay so the problem number 15 basically States uh maybe I'll just move this down just so that it's clear which museum is open for the longest during a day okay so they want to know which is the museum that is open for the longest during a day okay display the museum name State hours open and which day okay so first of all how do I know or in which table can I get information about the opening uh like on how how long the museum has been open in a day and that information is I can get it from the museum hours table right in the museum hours table I know what is the open time and the close time so if I just run that here you can see that for each Museum along with the day it also tells me at which time it opens and at which time it closes okay now ideally if I want to find the duration I could just minus like close time minus the open time I should get the duration right so let's see if that actually works so I'll just do one thing I'll just move this to the right and here I'll just tell close minus open and let's see if this works it actually does not work because I'm trying to subtract a text with a text okay so if you did not notice if I just show you this query again you can see that I have the open time and the close time it looks like a time but the data type of this is actually a text okay so text or you could call it something like a Vare okay so first and foremost I need to convert this data type into a Time data type okay that is like a timestamp data type so what I can do is here I'll just maybe instead of subtracting here first of all let us try to convert it so I'll just use a function two * stamp okay which is available in post SQL two times stamp and I just provide the value and then I provide the format okay of my time so my time is present in the format like uh this is hours HH and then minutes and then whether AM or PM so I'll just provide am okay and I'm just going to call it it like open time okay and I will do the same for the close Okay okay so this is open and this is close and I'll just call it like close and the close time is actually PM you can see that it is pm here right so now if I run this now you can see that I have converted my text into a time stamp okay you can ignore the date and the seconds here we don't need that the only thing that we are interested is the hours and the minutes right so what I'll do is once I have this now I can do the subtraction so in order to get the duration so I can just tell to uh the close time minus the open time so I'll just copy this whole thing okay and I'm going to give an alas for this like let's say duration okay and now if I just run this now you can see that I'm getting the duration so this basically means 7 hours so you can see that this Museum was open at 10:30 a.m. and closed at 5:30 p.m. so if you subtract 5:30 p.m. and 10:30 a.m. you should get 7 hours and that is what I'm getting here okay there are some museums with 8 and a half hours open Etc okay so I have got the duration now if I go back to my query what I'm actually looking for is which museum is open for the longest during a day okay so which is open for the maximum number of hours during a day right in order to do that what I'll do is I'll just use a rank window function I'm just going to say rank over order by and I want to order by the duration so I'll just copy this whole thing and I'll just maybe I'll just put this inside the parenthesis okay and I'll do a descending order okay and I'm going to call this like rank so if I just run this now now you can see that it's basically sorted depending on Whose duration is the most and you can see that there is one museum with the ID 40 it is open for 12 hours 45 minutes in a day okay and that is number one right right so that is basically the information that I'm interested in which museum is open for the longest during the day and that is this particular Museum which is open for 12 hours 45 minutes okay now I have got that and I basically don't need to display all the other museums now that one I can eliminate very easily just by using the filter based on the rank okay additionally I also need to display the information about the museum name State Etc so in order to do that I will join this with the museum table and I'm going to give an alas here this is MH and M I'll say M do Museum ID equal to MH do Museum ID okay and then here I'll just say m do name as Museum name and then m. State and I think we need mh. day right and this is the duration right and yeah I think that's probably all and if I run this now you can see that the museum is okay but I also need to filter only for one particular uh where the rank is one because I only I am only interested in the museum that is open for the longest time I I don't need information about everything else right in order to do that I can just put a subquery I can use a subquery so I can just tell select star uh from and I'll just move this to the right and I'll just use an alas here and I'll say where x. rnk equal to 1 and if I just run this whole query now you can see that I'm only getting information or the data for one particular Museum the museum name is mentioned here it's in Paris it's on Friday it's open for 12 hours 45 minutes okay so this is basically the solution to my problem number 15 okay I hope this is clear now let's conclude this video with solving one last query so I'll just go back here and maybe I'll try to solve the problem number 18 okay so what I'll do is I'll just uh I'll just go into my post SQL and I'll put it here so I have my problem number 18 okay now the problem number 18 basically tells display the country and a city with the most number of museums okay so I want to know which country has the most number of museums and I also want to know which city has the most number of museums output two separate column to mention the city and Country so City should be in one column and Country should be in the another column if there are multiple values separate them with a comma okay so first of all how do you solve this let's solve it like one by one right we first try to find the highest number of museums in a country then we try to find the highest number of museums in a city how do you which table does uh which table has this information it's basically the museum table right because in the museum table we have information about the museum name its address its country its City Etc right so in order to find which museum which country has the highest number of museums I can just use a group by here I can tell Group by country and I can say country comma count of one okay and here I can just tell order by count of one in descending order if I run this you can see that in USA there are 25 so there are basically out of the 57 records present in the museum table each record indicates one Museum 25 record belongs to USA that means 25 museums are present in USA okay so this is what I have got for the highest museums present in a country now let's go for a city so in of country I'll just put a city here and now if I run this now you can see that I think there are four different cities which has highest number of museums so London Washington DC I think New York York and Paris all four are having three museums each so what this means is we have information that we need to solve this problem in two different queries right the highest number of uh museums in a country is present here the highest number of museums in a city is present from this is getting from this query right we basically need to bring these two queries together now there are few different ways you could do that one of the easier way that I can think of is just by using a city so I can just tell with CTE let's say I'll create two different cities for Country I'll create uh the city country cityor country I don't need an order by inside a City so I'll remove that okay and I'll put this whole thing here but I also need to only find the country which has the most number of Museum right if I run this query I'll get every country information I only need the country with the highest number of Museum so in order to fetch that I'll use a rank window function okay so I'll say rank over order by count of one in descending order and this is going to be my rank okay now if I run this inner query you can see that the first row is going to be the one that I'm actually interested in okay now I'm going to create another city so this is my first city and my second city is going to be for the city and I'm going to use the same query that I have here I'll just move this to the right and maybe I'll just move this to the right as well just to make it neat and I don't need the order by here I'll remove that but I do need this rank okay for the city as well and I'll put it here okay everything else stays the same and then finally maybe I'll just zoom this query a bit uh okay here so I'll just keep it here so and finally I have both of these queries here I need to bring them together so I'm just going to say select from CTE country and join it with CTE City okay now the problem is how do I join this query with this query because there is nothing common here right this is about Country and this is about City they'll never join so if I don't have a join condition one way I can join it is by using the cross join okay so I'll do a cross join here and then I'll just display the data that I need so I want country and I want city right and I want to fetch only when the rank is one only for the number one country and the number one city with the highest number of museums so I'll put a filter condition here saying that where let's say I'll use the same table name City dot uh rnk equal to one and I'll copy the same thing and I'll call it like City and this is going to be and it will be one okay so if I run this and I think I need to zoom in a bit let me zoom in a bit so I have executed this query maybe I'll just go down okay here I have executed this query and you can see that I'm getting these four records okay because there were four different cities and I did a cross join for each of that record uh the data from the first table also got joined okay even though the first table here that is City Country only had one record now additionally as part of our query we were also asked if there are multiple values separate them with a comma so here we have multiple cities right with the same number of highest museums so I need to instead of having the them like row wise I need to combine them comma separated values right in order to do that what I can do is I can use an aggregate function that is string uncore aggregate okay and here I can just tell the separator to be a comma okay and I'll call it like a city okay and since I'm using aggregate function here all the other columns also needs to have that aggregation so I'll use that here and I'll say comma and I'll call it like country okay and now if I just run the whole query okay and if I just run this whole query now you can see that I'm getting the four different cities comma separated but the countries are also getting repeated so in order to like eliminate that reputation I can use a distinct inside the string aggregation and if I run the query now now I will get the final output that I wanted so the the country with the highest number of museums is USA and the city with the highest number of museums are London Washington um New York and Paris so this is basically the solution to the problem number 18 I hope you like this video if yes then give me a like on this video subscribe to the channel and let me know in the comments below if you would want me to make more such interesting videos on any other SQL case study thank you so much for watching and see you soon in the next one bye
Info
Channel: techTFQ
Views: 119,793
Rating: undefined out of 5
Keywords: sql practice, sql queries, sql, learn sql, solving sql queries, solve sql problems, sql project, sql case study, 20+ sql queries, 20 sql queries, practice sql, sql interview queries, sql problems, sql techtfq, techtfq sql, techtfq
Id: AZ29DXaJ1Ts
Channel Id: undefined
Length: 34min 53sec (2093 seconds)
Published: Tue Jan 09 2024
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.