3 SQL Queries Asked in Interview for Business Analyst - Solved

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey everyone what you see here are three intermediate level SQL problems that were asked during a recent interview for the role of business analyst now this was shared to me by pratish and in this video I will be solving all three SQL queries one by one let's start with the first one the first problem is named as election and as you can see this is the problem statement so first I'm going to read through the problem statement and then I'm going to explain you what we need to do looking at the given data set okay and then we will try to solve it it is mentioned given a database of the results of an election find the number of seats one by each party there are some rules that you need to consider there are many constituencies in a state and many candidates who are contesting the election from each constituency each candidate belongs to a party the candidate with the maximum number of words in a given constituency wins for that constituency okay the output should be in the format party seats one so that means output should only be one single column which should tell how many seats each party won okay then they are mentioning the ordering should be in the order of sheets 1 in descending order okay some order that we need to consider now let us look at the tables that are given so we have been given two tables there is a candidates Table and there is a results table okay in the candidates table it has information about the candidate like their candidate ID their gender their age and a party each candidate belongs to okay so we have six candidates and you can see three candidate belongs to democratic party and three belong to Republic Party okay then we have the results table there are three columns in this table one there is a constituency ID then there is a candidate ID and then there are the words so this basically tells me how many words each candidate won in their respective constituency okay now we need to use these two tables to arrive at the output that is mentioned here at the bottom here okay you can see that the output is just one single column it just tells how many seats did each party want so it's telling Democratic one two part two seats and Republic one one seat okay what you see on the top here is just a repeated of what was mentioned here okay so you can ignore that I hope you understand the problem statement now let's try to solve it now in order to solve it I have already created the data set and the tables in my database I am using the postcode SQL database and I'm using the PG admin tool okay now you can of course solve it in any database you wish so straight away let's look at the data in the two tables I have this candidates table as I mentioned there are six candidate information here then the results table has information about the votes each candidate has got in each constituency okay now in order to solve this the very first thing that I would do when I have this kind of requirement is I would like to see all the data together in one single table right so what I'll do is straight away I am just going to join these two tables because I can see in the results table I have the candidate ID and in the candidate table I have the ID column so using that ID column ID field the candidate ID field I'll just join these two tables and see all the data in one single place okay and then we will try to analyze what we need to do so straight away I'm just going to say select star from candidates and name it like C I'll join it with results and I'm going to call it like R and I'm going to use the join condition like r dot I think candidate ID is equal to C Dot ID okay if I just run this now you can see that I am getting all the data from these two tables shown together here okay now what I basically need is I am interested in the party information and then the constituency ID and then probably the votes I'm not really interested in all the other fields right so what I'll do is I'll just shrink this such that I only fetch the fields that I need okay just so that it becomes easier for me to understand whenever you're solving a problem first thing if you can bring all the data together that is great and then you can eliminate any unnecessary data for solving that problem you can just eliminate that so that you have the shortest possible picture of what you need to consider and then you can come up with an idea to solve it okay so I'm just trying to do that so here I'll just tell I need party of course I need the constituency constant 2 and C ID and then I'm not sure if I need candidate ID I'll ignore it for now I'll say the words okay if I just run this now you can see that I have this information okay now what I want to do is for each constituency you can see for constituency 1 there is someone from democratic who has participated and then there is okay maybe I'll also include the candidate ID so that you don't get confused okay so I'll just have the candidate ID as well and now you can see that for constituency one candidate one has basically completed same way for constituency one there is one candidate from Republic also who has completed and among these two I can see that Democratic has got more votes he has got 847 whereas from Republic they have got 283 cables okay now in order to better understand it or see it better I'll just sort the data so that I can see data for each constituency together okay I'm just going to say order by uh wants to do ncid so I'll just copy this so I want to group all the data like or bring all the data for each constituency together and then I want to sort it based on who got highest votes so I'm just going to say words in descending order okay I'm just trying to rearrange the data so that I can see it better so that I can analyze and then apply my logic okay now you can see that for constituency one I have the data in first two line for consequency 2 I have the data in next two lines and the constituency three data is present in the last two lines okay now among here since I already sorted based on votes in highest in descending order I can see that for constantly one this candidate one that is the first record here for the consistency 2 this is the person or the candidate who has won that who has got 394 compared to 293 and for the constituency 3 I can see that the candidate from democritic who got 429k votes has one compared to the Republic candidate who got 303 cables so in short if I am only interested in candidates who actually won the election then I can I just need to find a way to only fetch this record this record and this record right the winning candidate from each constituency how do I do that probably you could think of doing a group by you could do that but a more easier way I think would be to just use a window function I use a window function probably Rank and then I can partition based on each constituency and then I give a rank Whoever has got rank one that means they won the election in that Constitution and I just fetched them right so what I'll do is I'll just use the rank window function I'll say rank over I'll say Partition by uh uh constituency ID right and then order by votes in descending order okay and I'm going to name this column like rnk okay if I just run this now you can see that for rank 1 I have this record and for constancy 2 I have this record with rank 1 and for constituency 3 I have this record with rank one right so I only need to fetch where the rank is one right so what I'll do is I'll put this whole thing into a city I'm just going to say with CTE as I'll just move this to the right and I'll put this inside the parenthesis and whenever you have a query inside your CT you don't need order by so I'll just remove that and I'm just going to have select star from CTE but I'll put a condition saying that where R and K equal to 1 because I only want to fetch the candidates who won the election who got the highest votes in that constituency right now if I fetch this I will get only the three records that I'm actually interested in these are the three candidates for each constituency you can see constituency one two and three who won the election there right so this is basically what I wanted now from here if I look at my output I am not interested in displaying the complete information about who won the election I only want to know how many seats did each party won right so if that is the case I am only interested in this particular column the party column right so from here if I want to know how many seats did how many records are present for Democratic and how many for Republic I can easily do a group by right and then you do account so that's what I'm going to do so what I'll say here is I'll say Group by party and I'll put the party here and then I'm going to do a count of one I'm not interested in any other field right so if I fetch this now you can see that Democratic there is two and Republic there is one this is exactly what I wanted but the output is in one single column right here I have the output in two columns so I need to basically bring them together so if you want to bring them together you just need to concat so what I'll say is I'll just use the concat function I'll put a space in between and then I will concat it with the count and I'm going to name it like what they have asked us to name this column I think it is party seats one so I'll just name it like that party seeds one okay now if I just run this I think that's all so this is basically my output Democratic has won two seats and Republic has won one okay so that is exactly what I wanted here okay so this is basically the solution to the first problem I think probably I was a little fast but I hope you understood what I was trying to explain now let's move on to the second problem the second problem is about advertising system deviations report let's read through the problem statement understand what we need to do and then we'll try to solve it okay as part of hacker ads advertising system analytics they need a list of the customers who have the most failures and the most successes in ad campaigns there should be exactly two rows that contain type customer campaign and total type contains success in the first row and failure in the second row these relate to event status so basically what they are telling is when our output should have two rows one row which tells the customer who has the most number of successful campaigns and the second row which tells the customer who has the most number of failed campaigns okay then they are mentioning customer is the customer's first name and last name separated by s waste campaign is a Commerce operated list of campaign name that are associated with the customer ordered ascending total is a number of associated events report only two customers the two with the most successful and the most failing campaigns okay now let's look at the data first and then I'll explain you what you need to do so we have been given three tables there is a customers table there is a campaigns Table and there is a events table in the customer table you can see that we have information about three different customers there is Caroline Matthew and Melissa okay and in the campaigns table you can see there is information about the campaign ID the customers customer ID and the campaign name like overcoming challenges business rules Etc okay then there is a table for events what this mean is each of these candidates or these customers basically have done multiple campaigns okay the campaigns that they have done you can find it in this table the customers will find it here and how many events that they have done and whether that events are successful or failure you will find it in the events table okay so that is what mentioned here now what we need to do is we need to come up with an output that looks something like this so basically they are telling output should be only two rows and two rows should have these four columns it's basically telling the customer who has the most number of successful campaigns and which campaigns were there where they and we need to mention them comma separated and then we need to have the total number of successful campaigns that they have had that we need to mention here same way for the most number of failed campaigns who is that customer what campaigns were they and what is a total fail campaigns that we need to mention here okay so this is what we need to do now in order to solve this again I have already created the tables in my database I am again using the postgresql database and I'm using the PG admin tool okay now you can see that I have the three tables customers campaigns and events so let's look at them so in the customers table I have information about three customers in the campaigns table I have information about uh I think how many five different campaigns and in the events table I think there are a lot more events there are 28 okay now one thing I want to highlight is this PDF was given to me by pratish as I mentioned earlier and he did not give me any data set so just by looking at this PDF I just created a data set myself what I realized was when he gave me this PDF some of the data was actually missing because I understood that because when I compared with the output there were no 8 successful campaigns okay so it just looked at some data was missing it so I have added that data and the correct data set you will find it in my blog okay in fact all these all of the scripts data sets and everything else that I'll be using in this video I'll be putting it in my blog you can download it from there okay so going back here I have this data how do I start the very first thing that I will do here is I need to understand what I need to do I have the customers here I have the campaigns here and then I have the events right in order to simplify this and in order to help me to analyze this better first of all let me join these tables and understand how the data looks as a whole okay I can easily join these two customers and campaigns table because I think in the campaign stable I have the customer ID right so I'm just going to say select star from customers I'll name it like CSD and join it with the campaigns table I'm going to call it like CMP and I'm going to join it using CMP Dot customer ID is equal to CSD dot ID okay if I just run this you can see that I am having the information that I wanted now I think it's more easier for me to analyze I can see that I have the customer name then I have the campaigns that they have done okay now to be frank I'm not really interested in the customer ID or the campaign ID I think okay probably I'm interested in the campaign ID I think because in the events table we have the campaign ID so what I'll do is from here I'll only fetch the information that I want okay so that I can eliminate unnecessary columns okay so first of all they have mentioned in the question when you are displaying the name you need to provide the customer name first name and last name together with a space in between so I'll State away do it here so I'll say concat first name comma then a space and then I'll say last name and this whole thing I'm just going to call it like I think it's customer okay so that is my customer then I need the name that is my campaign name which will be my campaign okay so I think that's all for now let me run this just to keep the data short so you understand now I can see that the customer Caroline has done two campaigns overcoming challenges and business rule similarly Mateo has done one campaign Melissa has done two campaigns okay now this is fine I know this but the thing is each of these campaigns these customers are actually the candidates could have done it multiple times and each time they do it is called as an event so now let's go into the events table I can see that based on the campaign you can understand how many times how many events they have done and whether they were successful or not right now just I will join the events table I'm just going to say events I'll give you Alias like EV and I'm going to join using AV dot I think campaign ID equal to CMP dot ID okay and here I'll just tell uh AV Dot start okay I'll just fetch all the fields from there okay from because only two fields are there now if I fetch it now you can see that the number of Records have increased because each campaign were done multiple times there are multiple events and you can see that I have some information here right so this kind of you can get some idea from here okay now what I am interested in is I want to know the most the customer who has done the most number of successful campaigns right and then the customer has done the most number of failed campaigns right so it looks like I need to group the data based on the customer name and the status okay I need to group them and I need to count how many records are present each record is one event right problem is here you can see that record number I would say this six and seven it's the same customer it's the two different events it is successful but there are two different campaigns if I do a group by and if I have to fetch these two Fields I cannot use the normal aggregate function so probably I need to use some other function what is it I'll explain in a short time okay now let's put a group by and probably I don't need this field okay so what I'll do is I'll just fetch the status I only need this many fields okay and from here I'll do a group by so I'll say Group by the customer so I'm just going to put this Alias and then I'll say status okay and that's fine so status and customer I have grouped by but this name is basically campaign if I don't include this in the group by then this query is going to throw an error but I cannot include this campaign in the group by because then it does not make sense for me right because I want to count how many successful campaigns each customer has done right and when I'm grouping them I want if they have done multiple campaigns they should be comma separated values right as you can see here the requirement is to display that in comma separated values here right how do I do that when I group a data and if I have to if there are there is a field with multiple values I want to cover have them moved from row to column with a comma separated value one way of doing that is by using an inbuilt function called string aggregate okay I'm using postcode SQL so we have the string aggregate function if you're using some other database every database will have an alternative to this function you can probably Google it okay so I'll just use that string Aggregate and here I'll say the separator to be a comma okay and I think that's all let's see if this works and you can see that it is actually working okay so I have my customers and I have all of these campaigns but the only problem is it it can happen and that Caroline has done many successful events and these were all the campaigns right but these are all repeated because she could have done the same campaign multiple times in different events right these are all repeated here also it is repeated here also it is repeated right if you want to avoid these repeated values through your string at string aggregate function what you can do is you can just put a distinct here okay so it will fetch the unique values now if I run this now hopefully the duplicate values will be remote okay and now you can see that this makes more sense okay so now I have for each customer how many success okay how many successful events they have done I do not have y because I forgot to put the count column here so because I grouped the data based on the customer and based on the status but I also needed to count how many records they were present each record represents one event right so now I'm putting the count and I'm going to call it like let's say total okay now if I run this query now I think it's we are very close to our final output okay so you can you can see that Caroline has done successful uh events eight times she has done failed events three times okay but in the if I have to sort this so let's say I will put all this success in one in one side and I'll put the failures after that so that it becomes easier for me to analyze I'll just say order by status and then total in descending order okay so if I run this okay now I think probably you will better understand what we need to do okay now we can see that the first three are the for the failed campaigns in the failed campaigns nine is the highest so looks like Melissa has done nine failed campaigns and the most number of successful campaigns is Caroline who has done eight successful campaigns okay and if I look at my output and that is exactly what I want so Carolina has done eight successful and Melisa has done nine nine failures okay so that means from this query I only need to fetch the first record here and the first record here okay so now I think you after our first problem you already know how we can do that right and we can do that by just using the rank window function so what I'll say is I'll create another column here I'll say rank I'll need to call it like rank over partition by status and then I'll say order by count of one okay count of one in descending order and I'm going to call it like rnk so if I just run this let's see if my rank function works and you can see that my rank function does work you can see that this for failure this is rank one for successful this is rank one okay and I just need to fetch the records where the rank is one right the most failed campaigns and the most successful campaigns so what I'll do I'll put this whole thing into a city so I'm just going to save it city as and I'll move this to the right and every time whenever you are putting a sub query inside a city you do not have to have an order by okay so I'll just remove that and I'll just write my query select star from CTE where rnk equal to one okay and I think that's all if I run this now you can see that I'm only getting the two records that I actually want okay so if I just show here so that is a these are the two records I want of course probably I don't need this rank so I'll fetch the fields that I actually need and probably the first field is event type the status is called as event type so I'm just going to rename it like or give an alias like event type and I'll just copy that I'll put it here and then the next one is customers then I think it is campaign the next one is probably total okay and that's all and then probably I think the success should come at the top so I'll do an sorting so I'll say order by event type in let's say descending order okay and now let me run this and now I think this is probably my final output so now I can see that success Caroline and she has eight so if I go here success Carolyn she has eight and then failure Melissa she has nine failure Melissa she has nine okay so this is basically the solution to my second problem okay I hope you guys were able to understand this and I hope you like this problem okay now let's move on to the third problem now let's look at the last problem it States election exit poll by state report as part of hacker polls election exit poll analytics a team needs a list of candidates and their top three vote totals and the states where they occurred the result should be in the following format candidate name first place second place third place concatenate the candidates first and last name with the space between them first place second place third place are commas operated U.S state names in the format they have mentioned a format we will see that later results should be sorted ascending by candidate name okay now let's look at the table data and better understand what we need to do we have been given two different tables there is a candidate table and there's a results table in the candidate table you can see that we have two different candidates there is David cantish and thoughtsteen Bridge in the results table it basically tells me that votes each candidate has one in which state okay so you can see that if the candidate one has candidate one that is I think David canbish has one one vote so you can see that one plus one two votes in Alabama and then he has few words he has one in California and so on and so forth okay so the output that we want is something like this okay so we need to display the candidate name in The First Column and in the second column it should basically tell the number of words they have one in each state and we only need to include the top three states Okay so so each of these states if it is in top one it should be in one column the second placed State should be in the second column and third place should be in the third column so this probably looks like something where we need to use a row to column transformation right so and then there is one thing is you can see here for Torsten bridge in the first place there are two states where he has one equal number of votes three and three in Texas and California right so these two states should be put together into the single first place column okay so this is also something that we need to consider okay now let's try to solve it again I'm going to go back to my postgresql database I have already created this data set and I'm using the PG admin tool if of course you can solve it using any database you want okay all the data sets the scripts everything you will find it in my blog I'll leave the link to my blog in the description below you can download it from there and you can create these tables in your database and you can solve these problems okay now let's try to understand what we have so we have the candidates table and you have two different candidates and then I have the results table with some info information about the votes each candidate has won in the state in different states okay so what we need to do first of all again in order to better understand this problem because the second table hardly has any information right the first table has more information so in order to better understand this whole picture I'll just be joining them okay I know that I can join them using the candidate ID so what I'll do is I'll just tell select star from candidates tab I'll call it like C I'll join it with the results tab I'll call it like R I'll say r dot candidate ID equal to C dot ID okay if I run this now I think this makes more sense we have some information here okay so we have the the candidate name and and different states where they have won some words right the first thing that I just look at this and I think I know is because I'm for each candidate I need to count in California how many votes they won in Texas how many votes they won Etc right so straight away this is a very simple thing that I can do that is I just group the data based on the candidate and I group it based on the state and then I count the number of records that should give me the total number of words they have one in each state right so what I'll do is first of all I need the first name and last name to be concatenated and displayed as I think it should be displayed as candidate name so I'll do that I'm just going to use the concat function concat first name comma space last name as candidate name so this is the first column that I need secondly I need the state information uh I will just put that and what are they calling the state okay so they there is no State here it's all uh converted from road to column but for now I'll still have the state here and I think this candidate ID probably I don't need but what I do need is the count so I want to count right so I'll say count of one as I'm just going to call it like total and here I'll just do group by the candidate name and then the state okay so now if I run this now you can see that I think it makes more sense but just to better understand I'll just sort the database on each candidate so I'll say order by candidate name and then I'll say total in descending order okay so just to see for each candidate where they have won the highest uh seats okay in which state so I have this table here now you can see that for David cantish I have four records and if you see California he has won five votes Texas three votes Alabama two votes New York one vote okay so if I have to only I am only interested in finding the top three because that is what is mentioned here right the top three would be for David kandish only California Texas and Alabama and when I am displaying that I should display the state name and along with that the vote count within the parenthesis right we'll see how to do that later okay then for thoughtstin bridge I can see that in the first place there are two cities or two states Texas and California both won three votes then in New York two votes in Alabama one vote okay first and foremost even before I do anything I want to rank these records based on the number of votes they have won right in each state so how do I do that I can use the rank window function so straight away I'll do that I'll say Rank and I'll say over Partition by how do I partition okay it should be based on the candidate name right or probably I'll use this whole thing so I'll say this whole thing here and then I'll say order by uh the count of one in the descending order okay and I'm going to call it like rnk now if I just run this now you can see that I have the rank so for David cantish I can see that the first three rank one two three are the correct records that I have got but when it comes to thought strain Bridge one two three is not fetching the correct one because I have one because there are two Records who have got the same rank so both the rank is one but since I am using the rank window function the second rank is actually skipped because this is a functionality of rank window function right second script it is directly showing me three and if I put a logic such that I only fetched where the rank is one two and three I'm going to lose the last record right so this is a very good example where you should be using a dense Rank and not rank okay so I'm just going to say dense rank because dense rank will not skip a value here right so now if I use dense Rank and if I run the same thing now we can see for thoughts in Bridge I have rank 1 these two states rank two is New York and rank 3 is Alabama okay this looks fine now I can put a filter to fetch where the data is less than or equal to rank 3 right so I have all of this but the thing is I need to convert this data from row level into column level how do I do that so I have two options one I could use the pivot but in postgresql pivot does not exist we can use crosstab and I can easily do that rather since it's a very simple uh problem and it we only have three places that we need to convert from row to column I can easily do this using case statement okay so I'm going to use case for now so first of all out of all of this data I only want to fetch the data where the rank is either one two or three right so what I'll do I'll put this whole thing into a CT I'm just going to say with CTE as and I'll just move this to the right I'll put whole thing inside a parenthesis I don't need order by inside CT so I'll remove that and here I'll tell select star from CTE where rnk is less than or equal to 3 okay so if I now run this now you can see that I am getting the data that I want okay the next thing that I need to do is from here I need to convert this row level data into column level right so first of all what I'll do I know I need to fetch the candidate name so I'll fetch that the next thing is when the rank is one then I want to put whatever data the state and the total I want to put it into a separate column then when the rank is 2 I want to put the state and the total in another column and when the rank is 3 I want to take the data and the state and the total in a separate column right so three different columns so whenever you want to create separate columns in SQL one way is by using the case statement so I'll just do that I'll say case when rnk is equal to 1 then I'll say I take the state and I need to concatenate because you see that the state space then there is a parenthesis then the total how many words they want right so I need to have both so I'll put a space here I'll concatenate with the total and okay after the space there should be a parenthesis I'll put that then concatenate with total then again concatenate with the closing parenthesis right and I think that's all I'll end my case and I'm going to name it like uh it is first place the column name is first place so I'm just going to name the same so first place okay and I'm just going to copy the same for the other two columns so here I'll tell rank equal to 2 and here I'll say rank equal to 3 and the column names will be this is second Rank and this will be third rank okay so now if I just run this now you can see that I am somewhat getting the data but the problem is because I am using a case I have I did not aggregate the values right so it did not reduce the number of records for these candidates I'm still getting all the did all the records They had okay but ideally I only want one record for each candidate as you can see here right so I want one record for each candidate so probably I need if I I have two options one I could do group by each candidate and then I take the max of each of these fields then it will fetch it will ignore the null and it will take wherever the values are there the problem with doing that is for Thursday Ridge in the first place there are two records and I need to consider both of them and I need to display them in comma separated values right so when you have this requirement using any aggregate function Max or Min is not going to help rather we can use string aggregate which we used previously right so that is exactly what I'm going to do so first of all I'll say Group by the candidate name so I'll put the candidate name here and then here this whole case I'll put it inside the string aggregate so say string underscore aggregate and I'll just copy this whole thing here and here and outside the end I'll just say the it should be comma separated value okay if there are multiple values then the values should be comma separated that is what this means okay so I'll put that here and I'll put that here okay I think that's all now if I just run this now we can see that I'm almost I think I have got the data that I wanted okay so now you can see David cantish in California five Texas three Alabama to Torsten Bridge Texas and California both three New York 2 and Alabama one okay now if I show here it's almost everything is exactly what I want except for you see in the first place California should have come first and then Texas So within this field I need to do a sorting when I'm taking multiple values using string aggregate I need to sort the values right based on the state if you have to do that what you can do is after you mentioned your um the separator okay the comma you can just mention an order by here so I can say order by state okay so hopefully this Texas will go later and California will come first okay now if I just run this now you can see that it is exactly how I want it so California 3 and Texas three this is exactly how I wanted okay of course I can put this order by for the other two Fields as well but here I don't need it so I'm not putting it okay but this is the solution to the third problem I hope everyone understood this I hope you like this video if you did give me a thumbs up and make sure to leave a feedback of what you felt about this problem if you like this video and if you have this kind of problems which you have faced during your interviews definitely share it with me I'll leave my email ID in the description below thank you so much for watching and see you soon in the next one bye
Info
Channel: techTFQ
Views: 55,244
Rating: undefined out of 5
Keywords: sql, sql query, sql queries, learn sql, solving sql queries, practice sql, practice sql queries, learn sql queries, sql tutorial, sql interview queries, sql problem, sql interview, sql interview problem, business analyst, business analyst interview, data analyst, data analyst sql, sql interview questions, sql interview questions and answers, interview questions sql, techtfq sql, sql interview question
Id: W5Wvyc9Pass
Channel Id: undefined
Length: 36min 26sec (2186 seconds)
Published: Wed Sep 13 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.