SQL Server Queries Part 10 - GROUP BY and HAVING

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome to this wise our video tutorial in this session we're going to teach you all about grouping and totaling or aggregating queries in Microsoft sequel server what we'll cover in this session to begin with is a look at the basic applica functions will explain what aggregate functions are and how they differ from the normal functions or introduce you them to the five most commonly used aggregate functions we'll also take a moment to teach you about how to use base types properly with the aggregate functions when we cover that we'll move on to show you how to you to use aggregate functions and group data at the same time so we'll introduce you to the group by clause show you how you can add grand totals to a groups query we'll explain how to use criteria based on aggregated data whether choose the where clause or the having clause and finally we'll show you how you can group by more than one column at a time so let's get started the job of the aggregate functions in sequel server is to change a range of numbers such as this list of film running time in minutes into a single data point so as a simple first example instead of showing each individual film run time in minutes we can calculate the sum of the film running time in minutes which is basically the total if I give it a simple alias and then execute the query what we'll see is these 261 rows changing to a single row there are five main basic aggregate functions that you'll probably want to get a handle on in sequel server as far as you've already seen the sum function calculates the total of a range of numbers there's also the AVG function which will give us the hopefully obviously average I can spell my aliases properly this will go along okay so there we go the average of run time there's also max which gives us the highest in a range of numbers hopefully for obvious reasons then there's a min function which just the opposite of that so the minute film in ten minutes I'm gonna void putting my alias is in there we go the minimum the lowest number of that range and finally there's or count which simply tells you how many there are now ordinarily you count the the asterisk field or perhaps an index primary key which which is supposedly they're the most efficient way to use the count function count simply tells me how many items there are so those are the five main basic aggregate functions just as with many other calculations that you're performing sequel server when you're working with the aggregate functions you'll want to make sure that you're working with the correct data types so as a short example here I've got a list of the film's box-office takings in dollars but reasonably large numbers if I try to aggregate those numbers by summing them try to calculate the total I'm gonna find that end up with an error message if I execute the query I get the arithmetic overflow error the problem is that the film box office dollars feels or has the date type of int and when I add all those numbers together the the total is larger than the capacity of the end datatype so what I need to do is convert or cast the film box office dollars field into a different datatype first I can use that using I can do that using either the cast or convert functions I'll use the convert function for this particular example our convert into big int the film box office dollars takings and execute query and I can finally calculate the total that's fairly obvious example when you get an error message when you try to aggregate data but sometimes you'll end up with that with examples that aren't quite so obvious but I try to take the average of let's pick a smaller a smaller range of numbers of film in run time in minutes if I take the average of this of this field of i'ma value 126 but that's not actually the true value the true average of those numbers because the film runtime minutes is stored as an int then the answer I'm gonna get here is an int as well what I'd really like to see if I use the convert function again I convert the film runtime in minutes into a decimal first and then execute query I see a much more accurate reflection of the average so two small examples there that the suggest you should be using the correct data type when you're using the aggregate functions just like every other calculation in sequel server another useful feature of the aggregate functions is that you can calculate the the totals the aggregates for different groups of data within your result set so here I'm looking at the list of films with their country names the IIU the country they're in which they were filmed and their running time in minutes what I'd really like to see is the average running time for each individual country so at this point if I add in the AVG function around the film run time in minutes field I'm not going to bother converting the data type in this example but if I execute the query at this stage I'm going to run into a problem the problem is with the country name field it's an invalid object as there the message suggests as it tells us at this point because I haven't told the query to group by the country name that's where what the group by clause comes into play if you're using the aggregate functions in a select list any other field that isn't aggregated must be included in the group by clause so what I need to do is say group by kanji name in the group by clause and when I execute the query now I can see my data groups by country as well as showing the aggregate information for groups of data within your result set so here I've simply added in a few extra aggregate functions so for each individual conscience in the average running time total longest and shortest sometimes it's also useful to be able to see the grand total for all of the data in the result set so one straightforward way to do this is to add to the end of the group by clause the width roll-up if I execute the query now I'll see that I get an extra row in the result set calculating average running time for the entire set of films the total for the entire set of films and so on if you're unhappy with the word null in the and the country name column a relatively straightforward way to modify that is to use the is null function in the Select list so is null country name I can simply replace that with the word total if I exclude the query now I have a fairly obvious total row you can use criteria in a group in total query just in the same way you the same way you'd ordinarily add criteria using the where clause and as long as you're using the where clause to add criteria to a non aggregated field this works perfectly happily so if I add in for instance a where clause just that from which says let's see let's look for all the countries whose name begins with a letter U as a simple example so where the country name like you the percent symbol there's a wild-card if I execute the query now I'll find that I only get the countries whose names begins with a letter U perfect now sometimes you're wanted to perform or use criteria on I feel that you've already aggregated so for example perhaps I'd like to show the this set of results but only for countries whose shortage running time is greater than or equal to 100 minutes so just to show you why why you wouldn't use this or try to do this in the where clause if I tried to use the the min function and having a where clause where that's why the minimum of the film running time in minutes is greater than or equal to 100 it all looks fairly sensible doesn't appear to be anything wrong with it but if I execute the query I'll be told that I'm not allowed to use an aggregate function in the where clause so I'm it's covered fortunately there's a clue you know that I can use this the other way the aggregate function in the having clause instead that's what I'm going to do I just remove the results panel just for the moment by pressing ctrl + R and then if I add in and the having Clause this is the the order you always have to use your you write your select statements in this particular order it's like from where grouped by having order by I can now move this criteria into the having Clause instead eliminate the where clause and now on execute the query that's the set of results that I'll get one thing to be quite careful of when using the having clause is that your if you're using that the role updates are the grand totals you may will find them very chance any longer accurate if I can quickly show you what I mean by that I'm gonna I'm gonna write in another having clause it's gonna look for films having an average or so you look for countries having an average winning time in minutes of greater than 125 minutes if I execute this query now I'll see that I get this fella short list of countries for countries in my grand total however can you see the total running time in minutes is significantly longer than the sum of the remaining films or the ringing there are many countries their issue with this is that the the roll up date of the aggregate total aggregated data is calculated before the having Clause filters out the non matching records so the countries have been filtered out after the grand totals have been calculated that's just one thing to be quite careful of when using the having clause it's possible to creep by and aggregate data over more than one column at a time so in this example I've got a country name and direct name next to the running time in minutes of each film so you can see that for certain country combinations country and direct combinations there's more than one entry so this country named China territory move you move young and so on also unknown number three then in France we have luc besson having two films sort of like to do is group by country and by director and find the average running summer minutes of my films so by adding the average running time in minutes and what you can imagine the that converged function here as well so I can see the actual true value just as a quick reminder convert it into a decimal thumb runs I'm in minutes then if I want to see that the data groups by country and director that's all I need to add in to the group by clause could you name and then just as you come a separate a list of fields in the Select list you can comma separates a list of column names in the group by clause as well there is a name if I execute the query now that's exactly what I'm going to see so for luc besson who had two films in list he has now just one single row for his french films things might become a little bit more obvious further down the list so Steven Spielberg for instance has there the largest number of films in this database and that's his single row of data if I added in the count of films perhaps that'll make things even more obvious so there we go the final interesting thing that we can do is just revisit adding in the grand totals for this group by query and we can do that if you remember by adding the with roll-up commands at the end of the group by clause now because I've grouped by more than one column in this query if I execute this now I'll see not just the grand total for the entire dataset in this case at the top of the list but also at the bottom of each country group seeing a row that calculates the average average running time and then total number of films for each individual country which is kind of neat I suppose is a final flourish what we should really do is use the is null function to change the directors name where it becomes null in that column we should change this to the word total and perhaps as well the country name where that's becomes null exchange it for the word grand hopefully when I finally axia this query you'll see what the point is very top of the list is the grand total for the entire list and then at the bottom of each country there's a total row for each country so that's how to use gripping the group by clause and the aggregate functions in sequel server if you've enjoyed this training video you can find many more online training resources at www.weiu.net
Info
Channel: WiseOwlTutorials
Views: 99,699
Rating: 4.9756837 out of 5
Keywords: sql server, group by, aggregate, sum, avg, average, wise owl
Id: oWkvHodS9cA
Channel Id: undefined
Length: 13min 50sec (830 seconds)
Published: Wed Jun 27 2012
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.