Database Lesson #3 of 8 - The Structured Query Language (SQL)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Good day, everyone. This is Dr. Soper here and today I have the great privilege of giving you a crash course in the Structured Query Language. We have many different topics for this lecture, but generally speaking all of the topics that we will address are concerned with providing you with a basic set of skills that will allow you to begin working with databases using the Structured Query Language. So along the way today we will learn basic structures for performing our four basic database operations. That is creating new data, reading existing data from the database, updating data, and deleting data. And we'll learn about some additional interesting topics, including how to extract data from multiple tables and how to use the Structured Query Language to create, modify, and delete various database objects and constraints. As I mentioned, the primary focus of today's lecture is on the Structured Query Language. The acronym for this is SQL and it's pronounced as Sequel or S-Q-L depending upon your preference. I will use both of these throughout the rest of this lecture. The Structured Query Language was originally developed by IBM in the 1970s. So this has been around for a long time. It was originally called SEQUEL and it was designed to support the relational database model which was first proposed by Edgar Codd in his 1970 paper in Communications of the ACM. As we move through our course together and we learn more about the relational model and relational databases, I hope that you will come to appreciate just how amazing it is that no one's solved this relational data problem until 1970. For all of human history, until just a few decades ago, people were not able to think about data and relationships in this formal way that we are now all so familiar with today. A few additional things that you should know about the Structured Query Language before we proceed is that the ideas and capabilities of the Structured Query Language are grounded in relational algebra. And the Structured Query Language, itself, is a standard. So this is an ANSI standard and an ISO standard. One of the great things about the Structured Query Language is that if you can learn to use SQL well, then you will be able to sit down and use almost any database in the world with very little learning curve, since virtually all major database vendors support the Structured Query Language in their products. If you can learn this language well then you'll be able to work with databases regardless of the vendor, regardless of the operating system, or regardless of the device. So you could work with a SQL Server database, or an Oracle database, MySQL, DB2. You can work on Linux, Unix, Mac, PC. You can even work on mobile devices. So learning the Structured Query Language is an extremely valuable skill for anyone who wants to work with databases. Another important characteristic of the Structured Query Language to remember is that it is not in and of itself a programming language, but rather it is a data sub language. You will come to appreciate, I hope, that the Structured Query Language is simple to use and is fairly easy to understand. Even if you've had no experience using SQL in the past, by the end of the lecture today, you should be able to understand SQL statements and what they are intended to do and even write some simple SQL statements of your own. The Structured Query Language itself can be subdivided into three major components. These are the data definition language which we can refer to as DDL, the data manipulation language which we can refer to as DML, and the data control language which we can refer to as DCL. The data definition language provides us with a series of commands that we can use to create database objects and constraints. The data manipulation language, by contrast, provides us with a series of statements that we can use for creating new data, reading existing data from the database, updating existing data in the database, or deleting data within the database. Finally, the data control language is used for establishing things like permissions, user accounts, security, and so forth. Our focus here today is going to be on the first two of these three components. So we will be learning about some of the elements of the data definition language and the data manipulation language. Beginning with the data definition language, major statements within the DDL include CREATE, ALTER, and DROP. CREATE is used for the purpose of creating new databases or database objects. So if I want to create a new table, for example, I can write a CREATE statement in SQL. ALTER is used for the purpose of modifying existing database objects. So if we need to modify the structure of a table or if we need to add a new constraint we can use the ALTER statement within SQL. Finally, we have the DROP statement. And this component of the DDL allows us to delete existing database objects. So for example, if we need to delete a table or if we need to delete a view we can use the DROP statement to accomplish that task. Let's begin by taking a closer look at the CREATE statement. In this case, we are looking at a CREATE TABLE statement, the purpose of which is to establish a new table in the database named Employee. Let's examine some of the characteristics of the CREATE TABLE statement. First, note that we begin just by using the statement itself, CREATE TABLE. And then whatever comes after that will be the name of the table. Remember that each table within the database must have a unique name. In this case, we're creating an Employee table. The next component is an open parenthesis. Everything that will be a part of the CREATE TABLE statement must fall between the open and close parentheses. Also note that the SQL statement ends with a semicolon. It is always good practice to end all of your SQL statements with a semicolon. What we see here is a very simple CREATE TABLE statement in which all we are doing is establishing a table named Employee which will contain two columns. The first of these columns is called Employee ID, which I've abbreviated here as empId, and the second column is named Employee Name. Let's begin by looking at the Employee ID column. You will notice that the first component is the name of the column and then we have a number of spaces after which the word Integer appears. This is referring to the data type of the column. So we are saying here that every employee ID that is added into this new Employee table must be an integer. Next we see a number of spaces and then the words NOT NULL. This indicates to the database that every row which is added into the Employee table must have a value for the employee ID. That is we do not allow null values to exist for the employee ID. And then we see a comma. A comma tells the database that we have finished defining this attribute and we are now ready to define the next attribute. A few notes here. I have made the CREATE TABLE statement extend over four lines and this is just for convenience. It is for us, as human beings, to be able to read the statement more easily. Similarly, I have added spaces between the names of the attributes and their data types and whether or not the attribute allows null values. Again, this is just for convenience and ease of interpretation by human eyes. The database engine itself does not care about all of these extra spaces or line feeds. It will treat this as one full and complete statement regardless of whether we have split it across multiple lines or have added extra spaces in between the various elements of each statement. Next, let's see how we can expand upon our CREATE TABLE statements in order to create tables that are a little more interesting and a little more useful. We'll begin by learning how to define a primary key using the Structured Query Language. And for this purpose we will use the CONSTRAINT keyword. So within the Structured Query Language keys, such as primary keys and foreign keys, are considered to be constraints. There are other types of constraints, as well, and we will talk about those as we progress through this lecture. Note that the CONSTRAINT keyword is used inside the parentheses because it is a part of the CREATE TABLE statement. And note that we give the constraint a name. In this case, we're defining a primary key for our Employee table and we are a naming that constraint Employee PK, empPk. An important point to remember is that within a database every constraint must have a unique name. After the name of the constraint, we see the words PRIMARY KEY and then an open and close parenthesis. This tells the database that we want to use our empId attribute, which we defined a little earlier in the statement, as the primary key. And as we know, every value of a primary key must be unique so we are placing this primary key constraint onto the database. We are telling the database that each value within the Employee ID column must be unique for the Employee table. And then note that we, just as before, close the CREATE TABLE statement with a close parenthesis and a semicolon. Here we see a variation of creating a primary key where, instead of creating a simple primary key, we want to create a composite primary key. In this case, the primary key consists of two attributes named empId and skillId. And to create the composite primary key, we simply need to separate these two attributes by a comma and put them both inside the PRIMARY KEY statement. Now that we know how to establish both simple and composite primary keys, let's take a look at how to create foreign keys. Now, just as a primary key is considered a constraint, a foreign key is also considered a constraint. So we implement a foreign key relationship using the CONSTRAINT keyword. Here we are establishing two foreign keys, one of which is named empFk and the other of which is named skillFk. And we are going to use these foreign keys to link our EmployeeSkill table to an Employee table and to a Skill table. So conceptually speaking, we can imagine that we have two parent tables, Employee and Skill, and then we have this third table named EmployeeSkill which sits between the other tables and allows us to connect them together. In this way, each employee will be allowed to have multiple skills and each skill will be able to apply to multiple employees. To establish these relationships, we need to add foreign keys to our EmployeeSkill table. And as we can see, we have these two foreign key constraints, empFk and skillFk. And to establish the foreign keys, we use the keywords FOREIGN KEY and then an open parenthesis and inside those parentheses we specify the name of the attribute within this table that we want to be a part of the primary key/foreign key relationship. So we're saying FOREIGN KEY empId in the first example. That's telling the database that we want to use the empId attribute in this table as a foreign key. Following this we see the keyword REFERENCES and then Employee open parenthesis empId close parenthesis. This tells the database where the other end of the relationship should be. So we want this foreign key to point to the empId attribute within the Employee table. We then see a comma and after that, we provide our next constraint which, in this case, is the foreign key which links our EmployeeSkill table to the Skill table. And we are linking these together according to the skillID attribute in each table. So creating these primary keys and foreign keys is actually quite easy in the Structured Query Language. Now, let's extend our knowledge of these constraints even further by considering cascading deletes and cascading updates. First, let's consider the employee foreign key where here, after defining the foreign key, we see the keywords ON DELETE CASCADE. This tells the database that we want to establish a cascading delete relationship between this table and the Employee table. A cascading delete is used to help us maintain the integrity and the quality of the data in the database. And here's how it works. In this case, we have an Employee table and EmployeeSkill table. So let's imagine that we have an employee number one and this employee has skills 101 and 102. Now, if our employee number one leaves the organization-- maybe they find a better job or we decide to get rid of them-- then we may want to delete the employee from the database. What a cascading delete will do for us is when we delete our employee from the database, the database will follow the relationship line into the EmployeeSkill table and will look for any records that were associated with the employee that we want to delete. If it finds any of those records, it will delete them from the EmployeeSkill table automatically for us. In this way, the database maintains the quality of the data by eliminating any records in the EmployeeSkill table which would no longer be associated with an existing record in the Employee table. Next we will consider a cascading update, which is very similar to a cascading delete in that the purpose of a cascading update is to maintain the quality or the integrity of the data in the database. In this case, we are establishing a cascading update between the Skill table and the EmployeeSkill table. And what this means is that if we were to update one of the primary key values in the Skill table, let's say that we change skill 101 to 105, then the database will follow the relationship from the Skill table to the EmployeeSkill table and will look in the EmployeeSkill table for any rows that contained skill number 101. If it finds any of those rows, it will automatically update 101 to the new value 105 so as to maintain the link between records in the Skill table and records in the EmployeeSkill table. I hope that you can appreciate just how useful these cascading deletes and cascading updates can be. Next, let's take a look at the SQL ALTER statement. As I mentioned earlier, the ALTER statement is used for the purpose of changing the structure of a database object. For example, let's imagine that we created our Employee table, but after creating the table we realize that we forgot to establish a primary key. Well, we could easily use the ALTER TABLE statement in order to add a primary key to the table. The basic syntax for the ALTER statement is shown here. In this case, we want to alter the Employee table and we use the keywords ALTER TABLE and then the name of the table that we want to change. And then we specify the change that we want to make. In this case, we want to add a new constraint, specifically a primary key constraint named empPk, and we're telling the database that we want the empId attribute within this Employee table to serve as our primary key. And by running that SQL statement our Employee table would then have empId marked as the primary key. Just as with our CREATE TABLE example, we see here that we use a comma separated list of attributes inside parentheses to tell the database that we want to create a composite primary key, in this case consisting of the Employee ID and Skill ID columns. Adding a foreign key constraint to an existing table is just as easy as adding a primary key. We use the ALTER TABLE statement, specify the name of the table, and then we add a new constraint using the same syntax that we learned for the CREATE TABLE statement. In this case, we are telling the database that we want to link the deptId attribute in this Employee table to the deptId attribute in the Department table. With respect to changing the data within the tables in our database, there are really just three things that we can do. We can either add new data into the table, we can change data that already exists in the table, or we can delete existing data. And we accomplish these three tasks by using the INSERT INTO, UPDATE, and DELETE FROM commands respectively. Let's begin by considering the situation in which we want to add new data into the database table. For this purpose, we can use the SQL INSERT INTO statement. Note here that if we want to specify non numeric data, if we want to add non numeric data such as text into one of the columns in the table, then we must enclose that value in single quotes. Let's look at the syntax for the INSERT INTO statement. Note that it begins with the keywords INSERT INTO followed by the name of the table in which we want to add this new row of data. And then we have an open parenthesis and a comma separated list of attribute names followed by a close parenthesis. So what I'm telling the database here is I want to add a new record into the Employee table and I'm going to be specifying values for the employee ID, the salary code, and the last name of the employee in that order. Next, we see the keyword VALUES followed by an open parenthesis, a comma separated list of values, and a close parenthesis. The values which appear inside this second set of parentheses are the actual data values which will be added into the table. In this case, the values are 62, 11, and Halpert. Note that the order matters here. We have specified that we will be providing an employee ID, a salary code, and the last name in that order. The values that I provide inside the second set of parentheses then must match the same order of the values as they appeared in the first set of parentheses. In this case, we're saying that the empId should be 62, the value of salaryCode should be 11, and the value of lastName should be Halpert. Now let's look at the UPDATE statement. UPDATE allows us to alter or modify the values of an existing row or set of rows within a table. Let's consider the syntax for the UPDATE statement. To begin, we use the keyword UPDATE followed by the name of the table that we wish to update. Next, we use the keyword SET and then the name of the attribute that we want to change followed by the new value for that attribute. After which, we see a new keyword called WHERE which specifies which row or rows we want to change. So in this first example, we're telling the database I want to change a value in the Employee table. I want to change the phone number to 657-278-1234. And I want to make that change only for the row in the table which has empId equal to 29. So in plain language we're saying change the phone number for employee ID number 29 to the value shown here. In our next example, we are setting the Employee ID to four every employee in the table who's employee name begins with the letters D-A and we will learn a little bit more about these wild card searches later. But the point here is to illustrate that an UPDATE statement can change values for more than one row at a time. And this point is further shown with our final example, which just says UPDATE Employee SET deptID equals 3. And what this command will do is it will change the department ID for every employee in the table to the value 3. One of the things that you will learn as you become more skilled with the Structured Query Language is that often the shortest statements can cause the most change in the database. So here, if we had a table, an Employee table, which contained one million rows, this statements UPDATE Employees SET deptId equals 3 would make one million changes to that database. It would set the department ID to 3 for every employee. And what if we wanted to delete one or more rows from an existing table? In that case, we would use the DELETE FROM statement. And the syntax is simple. We begin with the keywords DELETE FROM and then the name of the table from which we want to delete data. In this case, we're telling the database that we want to delete some data from the Employee table. Next we see WHERE clause and that specifies which employee we want to delete. So in this case, we're telling the database delete the record in the Employee table who's Employee ID has a value of 29. Just as with the UPDATE statement, we can use the DELETE FROM statement to delete more than one row at a time within the table. And this is what we see in our second example. Here we're telling the database to delete all of the rows in the Employee table where the name of the employee begins with the letters D-A. So any employees named Dan or Daniel or David or Danica or Danielle or Dana they would all be deleted from the employee table if we were to run this query. And our last example is the simplest, DELETE FROM Employee. Again, this is a short command which has the potential to do the most damage or cause the most change within the table. In this case, if we were to run that query it would delete every employee in the Employee table. Now we can turn our attention to SQL SELECT queries. A SELECT query is used to extract information from the database, or perhaps a more interesting way of thinking about this is we can use the SELECT query to get answers from the database. So for example, if I want to know which employees work in the Marketing Department, that's a question that I can ask the database using a SQL SELECT statement and it will give me the answer. Similarly, I might ask questions like what is the salary of employee number two or how many employees work in the Sales Department or what was the total amount of sales for last month? These are all questions that we can ask the database using a SQL SELECT query and it can provide us with the answer. To begin, we specify the SELECT keyword followed by the names of one or more columns that we want to appear in the results. So values for the specified columns which appear after the keyword SELECT will become a part of our results. In this first example, you can see that we are using SELECT empName. This means that the only column which will appear in our results will be the empName column. Next, we see the FROM keyword and after the FROM keyword we specify one or more tables from which we want our data to be extracted. In this example, we're just saying we want data from the Employee table. And just as with the UPDATE and DELETE FROM statements, we can, if we want to, use a WHERE clause which will filter the results. So in this first example, in simple language we're telling the database give me the name of the employee in the Employee table whose ID is equal to 33. In our second example, you will notice that we do not have a WHERE clause. And in simple language, this query is telling the database to give us a list of all of the employees in the Employee table. One of the most important concepts for you to understand about these SQL SELECT queries is that the results of the query are a relation. That is, as long as there are some data to be retrieved, the result will be a two dimensional table of data. It may contain just a single row or a single column or it may contain many rows or many columns but the results can be thought of as a two dimensional table of data. Because the result of a query is a two dimensional table of data, we can then use that result, if we want, to perform other tasks. Such as, we can use the result to create a new relation or we can use the result of one query as input into another query. We'll take a look at that situation a little later. Just remember that our result set, that is the results that we get back, the table of data that we get back in response to our query, may not in third normal form. So it is a relation, but it may not be in third normal form. And this is especially common when we perform something called a JOIN operation which is where we are simultaneously pulling data from more than one table. In our previous examples of using the SELECT statement, we we're extracting just a single column of data from the database. But, what if we want to get values for more than one column from the table. Well in order to accomplish that task, we simply need to use a comma separated list of column names. In this example, we're telling the database to give us a list of employee IDs and employee names from the Employee table. So it will return a result set which contains two columns, empId and empName, and then all of the employee IDs and employee names which appear in the Employee table. We also have the option of retrieving all of the columns within a specific table or set of tables. And we can do this by using the asterisk character. Just as a brief note, I will commonly refer to this as a star instead of an asterisk. So in this example here, we see SELECT star FROM Employee. This tells the database to retrieve all of the data from the Employee table, every column and every row. Remember the star represents the concept of all columns. Not all rows, but all columns. So this example says select all columns from the Employee table. And because we are not specifying any filtering criteria using the WHERE clause, it will also return all rows. What if we have a situation where the result of our query contains many duplicate rows? Well we can eliminate that problem by using the SQL DISTINCT keyword. And if we place the DISTINCT keyword in front of one of our column names then the database will remove any duplicate values of that attribute when generating the results. In this example, we're telling the database to give us a list of department IDs from the Employee table, but we don't want any duplicates in the list. That is, we just want all of the unique department IDs. In a few of our previous examples, we used the WHERE clause and so at this point I think it would be a good idea to explore the WHERE clause in a little more detail. Basically, what the WHERE clause does is it allows us to tell the database to constrain its actions according to some kind of filtering criteria. In this example, we're using a WHERE clause with just a simple equality check. So we're telling the database to give us the names of all of the employees in the Employee table that have a value for deptId of 15. In simple language, we're saying give us a list of all of the employees who work in department 15. The WHERE clause is being used here to restrict the full set of employee names to just to those that work in department ID number 15. So we have an equality check here, a simple equals sign, but there are many other types of comparisons that can be done in the WHERE clause. So aside from equals, which is just a simple equal sign as we saw, we can do not equals using open and close pointy brackets or in some databases you can use an exclamation point and an equal sign for not equal to. We could also represent the concepts of greater than or less than, greater than or equal to, and less than or equal to. And using these various comparisons gives us a great deal of flexibility in the types of filtering that we can do in the WHERE clause. What's more is that we can create compound conditions. We can filter items based upon more than one criterion. One of the ways that we can do this is by using the AND keyword. So if we want to check if two conditions are true, we can use the AND keyword. And in terms of relational algebra, this represents an intersection of the data sets. We also have access to an OR keyword. And in terms of relational algebra, this represents a union of the data sets. One way to think about AND and OR is in terms of Venn diagrams. Imagining that we have two sets, A and B. The AND operation is the intersection of those two sets. It is the area of overlap between A and B. The OR operation, however, is saying if either A is true or B is true or both A and B are true, then the entire function returns as true. Now, let's look at a few quick examples of these AND and OR operators being used. In this first example, we are asking the database to give us a list of employee names from the Employee table where the employees' department ID is less than seven or the employees' department ID is greater than 12. So any employee record in that table that has a value of the deptId of less than 7 or a value of the deptId of greater than 12 will become a part of the results. In our next example, we are asking the database to give us a list of employee names from the Employee table where the employee works in department ID number 9 and where the employee also has a salary code of less than or equal to 3. So in this example both of these conditions must be true in order for a row to be included in the results set. That is, the employee's department ID must be 9 and the employee's salary code must be less than or equal to 3. Whenever I'm teaching the Structured Query Language, I always like to provide people with shortcuts or easy ways of doing things. One example of that is the use of the IN keyword. Now the IN keyword can be used to provide a list of values for a WHERE clause. And perhaps the best way to learn this is by looking at an example. Here we see a SELECT query which is asking the database to give us a list of employee names from the Employee table where the department ID for the employee is either 4, 8, or 9. So as the database looks through this table and is attempting to assemble the set of results, if the deptId is equal to the values 4, 8, or 9, that row will be included in the results. Otherwise, they will not become a part of the result set. Now we can compare this method to a method which uses a series of OR operators. So instead of saying WHERE deptId IN 4, 8, 9, we could instead say WHERE deptId equals 4 OR deptId equals 8 OR deptId equals 9. Hopefully you can see that using the IN keyword can save you a few key strokes, especially if you have a long list of acceptable values that you would like to check for in the WHERE clause. We also have access to the NOT operator which allows us to invert the filtering criteria specified in the WHERE clause. Consider this example. Here we are asking the database to give us a list of employee names from the Employee table where the department ID is not 4, 8, or 9. So the results here will be a list of employee names from the Employee table for every employee except for those employees who work in departments 4, 8, or 9. Another useful shortcut is to use the BETWEEN key word. BETWEEN allows us to specify a range of values in our WHERE clause such that any value between a minimum and a maximum value is considered acceptable. Let's see an example. Here we are asking the database to give us a list of employee names from the Employee table where the salary code falls between 10 and 45. So as the database is scanning through the Employee table trying to assemble our results, it will check to see if the value of the salary code for each row falls somewhere between 10 and 45. And that is an inclusive range, meaning that 10 and 45 fall within the range. If a row has a salary code that falls between 10 and 45, it will be included in the result set. If not, that row will not be included in the result set. This is equivalent using a compound WHERE statement with an AND operator. So instead of saying WHERE salaryCode BETWEEN 10 AND 45, we could instead say WHERE salaryCode is greater than or equal to 10 AND salaryCode is less than or equal to 45. The results will be the same. Next, I would like to talk about the SQL LIKE keyword. And the LIKE keyword is very useful because it allows us to do partial searches. So for example, if I would like to generate a list of employees whose names start with the letter D or if I would like to retrieve a list of customer email addresses that end in gmail.com I can use the LIKE keyword in order to achieve that result. SQL provides us with a few different types of wild card characters that we can use for these types of partial searches. First, is the percent sign. Now a percent sign means match any number of characters. So this is any number of wild card characters. To whereas the second type of wild card character is a single character wild card. And for this purpose we use the underscore symbol. This allows us to match values against a single, unknown character. Let's see some examples of how these wild card searches work. In this first example, we are asking the database to give us a list of employees from the empId table whose names begin with D- A. So we see our percent sign wild card character shown here. We're saying we want the employee name to look like D- A and then any set of characters thereafter. So, zero or more characters thereafter. This means that values of Dan or Daniel or Danica or Danielle or David or any other employee name that starts with the letters D- A would be a part of that result set. In our second example, we see the use of the single wild card character. And in this case, we are asking the database to give us a list of employee IDs for employees whose phone numbers begin with 6-5-7 dash 2-7-8 dash and then contain exactly four unknown characters. So here we're using the single character wildcard in order to tell the database that we want to match a very specific pattern of telephone numbers. This wild card search capability that is built into the Structured Query Language can be easily used to build something like a simple search engine. Thus far, with all of our SQL SELECT statements we've retrieved a list of results but we've said nothing about the order in which the rows contained within those results would appear. I, for example, have asked many times for a list of employees, but they can be in whatever order the database chooses to put them in. If I want to instruct the database to return the results in a specific order, then I can use something called an ORDER BY statement. In this example, we are asking the database to give us all of the columns out of the Employee table and we want the results to be ordered by the employee name. In this case, we can assume that empName is an alphanumeric column that contains text values, so they will be ordered alphabetically. Note that SQL supports two different types of sorting. You can do and an ascending sort or a descending sort. By default, if we do not specify a sort type, SQL will assume that we want an ascending sort. If we like, we can tell SQL specifically what type of sort to perform and we can do this by using the keyword ASC which stands for ascending or the keyword DESC which stands for descending. And all we need to do is add these characters after the name of the attribute by which we want to sort and the database engine will sort the results according to our desires. A few interesting things that I need to point out about sorting. Most people are aware that an ascending sort on numeric values means that the resulting sorted values will go from small numbers to large numbers. Similarly, an ascending sort on text would be an alphabetic sort. So in the English language we would expect the results to be sorted A through Z. And of course, we can do a descending sort which inverts the sort operations. So a descending numeric sort will have numbers that go from high to low, or large to small if you prefer, whereas a descending sort on a text column will be a reverse alphabetic sort. Here's a question for you. If I perform a sort on a series of dates, what do the results look like? Well, if we're sorting on a date or time column and we're using an ascending sort, the results will be sorted from oldest at the top to the newest at the bottom. Similarly, if we do a descending sort on a date or time field then the newest records will appear at the top and the oldest records will appear at the bottom of the list. SQL provides us with several additional built in functions that can be very, very useful for our querying purposes. For example, we can use the COUNT function to get a count of the number of rows that match some sort of specified criteria. Or if we want to find the minimum or maximum value for a column within a set of rows, we can use the MIN and MAX functions. We also have access to a SUM function if we would like to compute a total for a specific column. So for example, I may be interested in knowing what my total sales were yesterday. Or we might want to determine an average. As an example, I might say what were the average daily sales last week? Or a standard deviation. So taken together we have access to some very useful built in functions within the Structured Query Language. Let's see some examples. In this first case, we're saying simply SELECT COUNT star FROM Employee. This is just asking the database to give us the number of records in the Employee table. Or in simpler language, how many employees are in the Employee table. In our second example, let's imagine that we are interested in learning something about the total number of hours that are required for the various projects within our company. In this case, we're using the MIN, MAX, and AVG functions in order to determine the minimum number of hours involved in a project, the maximum number of hours involved, and the average number of hours involved. So when processing this query, the database will first scan through the entire project table and extract the set of rows whose projID values are greater than 7 and then it will look within that set of rows at values of the hours column and it will use those values to determine the minimum number of hours worked, the maximum number of hours worked, and then it will also compute the average number of hours for all of those projects. And then return the results to us. Very convenient. Another extremely useful statement within the Structured Query Language is the GROUP BY clause. And what the GROUP BY clause does is it allows us to combine results into some sort of categorical or categorized output. Let's look at an example. In this case, we want the department ID and the number of employees that work in each department. And we can do this by using the GROUP BY clause. So we tell the database to give us deptId and the count or the number of rows within the Employee table, but we want the results to be grouped together by deptId. So in this case, the result of this query would be a table which contains two columns, the deptId and another column which will be named numOfEmployees, and each row then will contain the department ID and its associated number of employees. Note that I'm using something here called an alias to refer to the result of the count operation. And this is implemented in this case using the AS keyword. So I'm telling the database that I want the result of the COUNT function to be called numberOfEmployees. The use of aliases can be very convenient and we will see some more examples later in our lecture. If we're doing a GROUP BY in order to categorize our results in some way, we may also want to use the HAVING statement. And what the HAVING statement does is it allows us to filter categorized results. Let's look at this example. In this case, I'm asking the database to give me a list of salesperson IDs, salesperson last names, and the total sales, which in this case is computed as the sum of the sale amount, out of the Sales table. And I'm grouping by the salespersonId and the salespersonLastName which means I want the results to be combined by the sales person. And I'm also using the HAVING clause to constrain the set of results only to those salespeople who have a total amount of sales that is greater than or equal to 10,000 in whatever units our sale amount column is measured. So perhaps units sold or total dollars sold, et cetera. So HAVING can be very, very useful for filtering a set of categorized results that has been generated using a GROUP BY clause. Another way of thinking about the HAVING clause is that it serves the same purpose as a WHERE clause but for a GROUP BY statement. In all of the examples of using the SQL SELECT statement that we've seen thus far, we have been extracting data from just one table. But oftentimes the kinds of questions that we will want the database to answer for us will require that we extract data from more than one table. And the Structured Query Language provides us with several different ways of doing this. The first approach that we will consider is the use of a subquery. Now remember as I said earlier that the result of a SQL query is a relation in to two dimensional table of data. And because of that, it's possible to use the results of one query as input into another query. And this is what we mean by the term subquery. Note there are two different types of subqueries. They are non correlated and correlated subqueries. I know these are very technical terms but I will try to explain them using examples that are easy to understand. To begin, let's consider a non correlated subquery, which I think is the easier of the two to understand. So here we see an example of a non correlated subquery. What characterizes a non correlated subquery is that the inner query, that is the query that is contained within parentheses, only needs to be run once in order for the database engine to answer the question. Let's look at this a little more carefully. Our outer query begins with SELECT empName FROM the Employee table WHERE the department ID is equal to some set of values. That is our outer query. The set of values that will be used for answering that outer query will be the result of the inner query. And in this case, the inner query says give me a list of department IDs from the Department table where the department name begins with the letters A-C-C-O-U-N-T. So any department that begins with the word Account, which might include Accounting or Accounts payable or Accounts receivable, would have its department ID included in the results. That list of department IDs then becomes input into the outer query, which would use those department IDs in order to produce the list of employee names. So in plain language, if our company has an Accounts Receivable and Accounts Payable department, then what this query would say is give me a list of employee names for any employee that works in the Accounts Receivable or Accounts Payable department. Note here that the inner query, that is the query that is contained within the parentheses, only needs to be run once in order for the database to solve the problem. That is we only need to get the list of department IDs a single time in order for the outer query to be solved. This is a non correlated subquery. Now let's look at a correlated subquery. In contrast to a non correlated subquery, in a correlated subquery the inner query, that is the query inside the parentheses, will need to be run many times in order for the database engine to solve the problem. The reason for this is that the inner query is going to need a value from the outer query in order to do its job. Let's see an example of this. Here we're asking the database for a list of employee names from the Employee table and know that I'm using a lowercase letter e here as an alias for this Employee table. So this allows me to refer to the Employee table in the outer query just by using the letter e. So I want a list of employee names from the Employee table where the employee salary is greater than some value and that value is determined by the inner query. In this case, the inner query is computing the average employee salary for the department in which the employee in the outer query works. So imagine that the database is trying to answer this question. It's going to begin scanning through the Employee table. And let's say that we arrive first at employee number one. And employee number one works in department five. So the database is going to say, should I include this employee in the results. And in order to answer that question it needs to compare employee number one's salary to the average salary of the employees who work in the same department as employee number one. So that means that we need the department ID of the employee that we are currently considering in order to answer the inner query. In this case if our department ID is five, the inner query will say give me the average employee salary from the employee table for department number five. And that five is associated with the employee that we are currently considering in the outer query. This is called a correlated subquery. As I mentioned earlier, the SQL provides us with several different ways of extracting data simultaneously from more than one table. Subqueries that we just examined were one of those ways. The second way is through the use of something called a join. And a joint allows us to merge data from more than one table together into a single result set. There are several different types of joins within the Structured Query Language and I've organized the most important types here into what I call Dan's Typology of Database Joins. So all database joins can be divided into inner joins or outer joins. There are three different types of outer joins that we will consider. They are a left outer join, a right outer join, and a full outer join. And we will also take a look at inner joins. Now technically speaking, there are several different types of inner joins, such as an equi join or a natural join, but for our purposes we will just need to focus on an inner join as a more generic concept. To begin, let's see how we can implicitly join two tables together using an inner join in the WHERE clause. In this case, I'm asking the database to give me a list of employee names and their associated department names. And note that we are now including more than one table in our FROM clause. So we're saying give me data from the Employee table, which we will refer to as E, and the Department table, which we will refer to as D, where the department ID in the Employee table matches the department ID in the Department table. What this means then is a database will take the department ID in the Employee table and will look for a matching department ID in the department table. When a match is found it will extract the department name from the Department table and the employee name from the Employee table and will then include those value as those in the row in the results set. An important thing to note here is that a result will only be included in the result set if a match occurs. Let's see another example of how we can implement an inner join. In this case, we're using the INNER JOIN key words in order to move the join operation out of the WHERE clause and into the FROM clause in our SQL SELECT query. This allows us to remove some of the clutter out of the WHERE clause so that it can be used exclusively for other filtering purposes. Let's take a look at the syntax. In this case, we're asking the database to give us a list of employee names and department names out of the Employee table, which we are referring to here as e. And then we see these new keywords INNER JOIN the Department table, which we will refer to as d ON e.deptId equals d.deptId. So we're telling the database look in the Employee and Department tables, link those tables together by their department IDs, and then constrain the results only to those departments that do not begin with the word Account. So in plain language, we are asking the database here to give us a list of employee names and department names for every employee except for those that work in any department which begins with Account, like Accounts Receivable or Accounts Payable. Any employee that does not work in those departments will be a part of the result along with the name of the department in which they work. Earlier I said that in order for a record to be included in the results of an inner query, it must have matching values from both of the tables that are involved in the join. The major distinguishing characteristic between an inner join and an outer join is just that point. In an inner join, records can only be included in the results set if there are matched pairs of values in the two tables that are involved in the join. In an outer join, the matched pairs of values will be included in the results, but so, too, might be additional records that have no matching value in the other table involved in the join. Let's see an example of how this works. In this case, we are asking the database for a list of employee names and department names from the Employee table, which we will refer to as e, and the department table, which we will refer to as d. We are joining these tables together using a LEFT OUTER JOIN operation. Where the department ID in the Employee table matches the department ID in the Department table. Now what the results of this query will be is a list of employee names and department names. The results will contain every employee name in the Employee table along with the department names of those employees which have a department ID assigned to them. If an employee in the Employee table does not have a department ID, then they will still be included in the results it's just that there will be no department name associated with that employee. It will be a null value in the result set. Now the term LEFT OUTER JOIN here is being used to tell the database from which table all of the results should appear regardless of whether there is a matching value in the other table. If you will notice in the FROM statement we have two tables listed, Employee and Department. The Employee table is named on the left side of the join statement, so to the database that is the left table in this query. To whereas the Department is the right table. It appears on the right side of the join statement. So when we do a LEFT OUTER JOIN then, in this example, we're saying give me a list of all of the employees regardless of whether they have a matching department. But if they have a matching department, include the department, as well. A RIGHT OUTER JOIN, by contrast, tells the database to include all of the records in the right table regardless of whether there is a matching record in the left table. In this case, we are instructing the database to include all of the departments in the results regardless of whether there is an employee assigned to those departments. If an employee is assigned to a department, then that information will be included in the results. But if not, if there is a department that currently has no employees in it, it will appear in the table, as well. It will just have an empty or null value in the results where the employee name would otherwise be. Finally, we can consider a FULL OUTER JOIN which instructs the database to include all of the records from both the left table and the right table regardless of whether a match exists. But if a match exists then those records will be matched up in the results. In this case, we're instructing the database to include all of the employees and all of the departments in the results regardless of whether they have any matching values in the other table that is involved in the join operation. At this point, we have a basic level of understanding on how to use the Structured Query Language to create, read, update, and delete data within tables. So let's return briefly to the data definition language sub component of the broader Structured Query Language and learn how to perform a few additional tasks. First among these is dropping an object from the database. For example, if we wanted to entirely get rid of a table, not the data within the table but the entire table itself, we could use the DROP statement. Specifically, if we were trying to get rid of the Employee table, we could say DROP TABLE Employee. Very short set of commands and that will not only delete all of the data within the table, but it will remove the table itself from the database. Remember, there's an important distinction between deleting the data in a table and deleting the table itself. Removing the table itself means there's no longer, in this case, any Employee table in the database at all. Recalling the use of our ALTER statements and CONSTRAINT statements, here we can see how to get rid of a constraint if we need to. So in this example, we want to remove the employee foreign key from the Employee table. And we accomplish this by using an ALTER TABLE statement. So we say ALTER TABLE we specify the name of the table that we want to alter in this case, the Employee table, and then we tell the database to DROP the constraint that is named employee foreign key. Remember that in the context of the ALTER TABLE statement, we used ADD CONSTRAINT to add a new constraint to a table and here we used DROP CONSTRAINT to remove that constraint. An additional type of constraint that we can create is called a CHECK constraint. CHECK constraint can be used to specify an acceptable set of values that are allowed to appear within a column in a table. Let's look at this example. Here we are altering the project table to include a CHECK constraint, which we are naming projectCheckDates, and we are instructing the database to check whether the start date is less than the end date. If that condition is true, everything is acceptable. If that condition is not true, that is if the proposed end date for the project is less than or equal to the proposed start date, then the database will not allow this row of data to be added to the Project table. This is another mechanism that we have available in our toolbox for helping to protect the quality and integrity of the data in the database. Finally, I just want to briefly mention the concept of a SQL View. Now a SQL View is a virtual table which is constructed by using a SELECT statement that is stored in the database. Views can be useful in many different situations, especially when we have different types of users who use our database. To create a view, we simply use a CREATE VIEW statement and then we specify the SELECT statement which forms the foundation of that view. Let's look at this example. Here we are creating a view named SalesDepartment which will include all of the columns from the Employee table, but will include only those employees that work in the Sales Department. So we are using this SELECT statement to construct a virtual table, which includes just a subset of all of our employees. We only want to see the employees who work in the Sales Department. And then once this view is established, I can write queries against that view just as if it were a table. So for example, I could say SELECT employee name from the sales department view. And in this case, the database would look in the View for employee names and the results, by extension, would be a list all of the names of the employees who work in the Sales Department. Well, my friends, thus ends our brief introduction to the Structured Query Language. There is certainly much more to be learned about the Structured Query Language than I presented here in this lecture. However, you should now have a basic set of skills in SQL that will allow you to start working with databases using the Structured Query Language immediately. I hope you learned something interesting. Until next time. Have a great day.
Info
Channel: Dr. Daniel Soper
Views: 297,516
Rating: undefined out of 5
Keywords: database, databases, SQL, structured query language, lecture, database joins, data
Id: kqUIoOM3WEs
Channel Id: undefined
Length: 78min 24sec (4704 seconds)
Published: Thu May 30 2013
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.