Good day, everyone. This is Dr. Soper
here and today I have the great
privilege of giving you a crash course in the
Structured Query Language. We have many different
topics for this lecture, but generally speaking all of
the topics that we will address are concerned with providing you
with a basic set of skills that will allow you to begin
working with databases using the Structured Query Language. So along the way today we
will learn basic structures for performing our four
basic database operations. That is creating new data,
reading existing data from the database, updating
data, and deleting data. And we'll learn about some
additional interesting topics, including how to extract
data from multiple tables and how to use the Structured
Query Language to create, modify, and delete
various database objects and constraints. As I mentioned, the primary
focus of today's lecture is on the Structured
Query Language. The acronym for
this is SQL and it's pronounced as Sequel or S-Q-L
depending upon your preference. I will use both of these
throughout the rest of this lecture. The Structured Query
Language was originally developed by IBM in the 1970s. So this has been
around for a long time. It was originally
called SEQUEL and it was designed to support the
relational database model which was first proposed by Edgar
Codd in his 1970 paper in Communications of the ACM. As we move through
our course together and we learn more about
the relational model and relational databases,
I hope that you will come to appreciate
just how amazing it is that no one's solved
this relational data problem until 1970. For all of human history,
until just a few decades ago, people were not able to think
about data and relationships in this formal way that we are
now all so familiar with today. A few additional
things that you should know about the Structured Query
Language before we proceed is that the ideas and
capabilities of the Structured Query Language are grounded
in relational algebra. And the Structured
Query Language, itself, is a standard. So this is an ANSI standard
and an ISO standard. One of the great things about
the Structured Query Language is that if you can
learn to use SQL well, then you will be able
to sit down and use almost any database in the
world with very little learning curve, since virtually
all major database vendors support the
Structured Query Language in their products. If you can learn this
language well then you'll be able to work
with databases regardless of the vendor, regardless
of the operating system, or regardless of the device. So you could work with a SQL
Server database, or an Oracle database, MySQL, DB2. You can work on
Linux, Unix, Mac, PC. You can even work
on mobile devices. So learning the
Structured Query Language is an extremely valuable
skill for anyone who wants to work
with databases. Another important characteristic
of the Structured Query Language to remember
is that it is not in and of itself a
programming language, but rather it is a
data sub language. You will come to
appreciate, I hope, that the Structured Query
Language is simple to use and is fairly easy
to understand. Even if you've had no experience
using SQL in the past, by the end of the
lecture today, you should be able to
understand SQL statements and what they are intended to do
and even write some simple SQL statements of your own. The Structured Query
Language itself can be subdivided into
three major components. These are the data
definition language which we can refer to as DDL,
the data manipulation language which we can refer to as DML,
and the data control language which we can refer to as DCL. The data definition
language provides us with a series of
commands that we can use to create database
objects and constraints. The data manipulation
language, by contrast, provides us with a
series of statements that we can use for
creating new data, reading existing data
from the database, updating existing
data in the database, or deleting data
within the database. Finally, the data
control language is used for establishing
things like permissions, user accounts, security,
and so forth. Our focus here
today is going to be on the first two of
these three components. So we will be
learning about some of the elements of
the data definition language and the data
manipulation language. Beginning with the data
definition language, major statements within the
DDL include CREATE, ALTER, and DROP. CREATE is used for the purpose
of creating new databases or database objects. So if I want to create a
new table, for example, I can write a CREATE
statement in SQL. ALTER is used for the purpose
of modifying existing database objects. So if we need to modify
the structure of a table or if we need to
add a new constraint we can use the ALTER
statement within SQL. Finally, we have
the DROP statement. And this component
of the DDL allows us to delete existing
database objects. So for example, if we
need to delete a table or if we need to
delete a view we can use the DROP statement
to accomplish that task. Let's begin by taking a closer
look at the CREATE statement. In this case, we are looking
at a CREATE TABLE statement, the purpose of which is
to establish a new table in the database named Employee. Let's examine some of the
characteristics of the CREATE TABLE statement. First, note that we begin
just by using the statement itself, CREATE TABLE. And then whatever
comes after that will be the name of the table. Remember that each table
within the database must have a unique name. In this case, we're
creating an Employee table. The next component is
an open parenthesis. Everything that will be a part
of the CREATE TABLE statement must fall between the open
and close parentheses. Also note that the SQL
statement ends with a semicolon. It is always good practice to
end all of your SQL statements with a semicolon. What we see here is a
very simple CREATE TABLE statement in which
all we are doing is establishing a
table named Employee which will contain two columns. The first of these columns
is called Employee ID, which I've abbreviated here as
empId, and the second column is named Employee Name. Let's begin by looking at
the Employee ID column. You will notice that
the first component is the name of the
column and then we have a number of spaces after
which the word Integer appears. This is referring to the
data type of the column. So we are saying here
that every employee ID that is added into
this new Employee table must be an integer. Next we see a number of spaces
and then the words NOT NULL. This indicates to the
database that every row which is added into the
Employee table must have a value for the employee ID. That is we do not
allow null values to exist for the employee ID. And then we see a comma. A comma tells the
database that we have finished defining
this attribute and we are now ready to
define the next attribute. A few notes here. I have made the CREATE
TABLE statement extend over four lines and this is
just for convenience. It is for us, as
human beings, to be able to read the
statement more easily. Similarly, I have added
spaces between the names of the attributes
and their data types and whether or not the
attribute allows null values. Again, this is just for
convenience and ease of interpretation by human eyes. The database engine itself
does not care about all of these extra
spaces or line feeds. It will treat this as one
full and complete statement regardless of whether we have
split it across multiple lines or have added extra spaces in
between the various elements of each statement. Next, let's see how we can
expand upon our CREATE TABLE statements in order
to create tables that are a little more
interesting and a little more useful. We'll begin by learning how
to define a primary key using the Structured Query Language. And for this purpose we will
use the CONSTRAINT keyword. So within the Structured
Query Language keys, such as primary
keys and foreign keys, are considered to
be constraints. There are other types
of constraints, as well, and we will talk
about those as we progress through this lecture. Note that the CONSTRAINT keyword
is used inside the parentheses because it is a part of
the CREATE TABLE statement. And note that we give
the constraint a name. In this case, we're defining
a primary key for our Employee table and we are a naming that
constraint Employee PK, empPk. An important point to remember
is that within a database every constraint must
have a unique name. After the name of
the constraint, we see the words PRIMARY KEY
and then an open and close parenthesis. This tells the
database that we want to use our empId
attribute, which we defined a little
earlier in the statement, as the primary key. And as we know, every
value of a primary key must be unique so we are placing
this primary key constraint onto the database. We are telling the database that
each value within the Employee ID column must be unique
for the Employee table. And then note that
we, just as before, close the CREATE TABLE statement
with a close parenthesis and a semicolon. Here we see a variation of
creating a primary key where, instead of creating
a simple primary key, we want to create a
composite primary key. In this case, the primary key
consists of two attributes named empId and skillId. And to create the
composite primary key, we simply need to separate
these two attributes by a comma and put them both inside
the PRIMARY KEY statement. Now that we know
how to establish both simple and
composite primary keys, let's take a look at how
to create foreign keys. Now, just as a primary key
is considered a constraint, a foreign key is also
considered a constraint. So we implement a
foreign key relationship using the CONSTRAINT keyword. Here we are establishing two
foreign keys, one of which is named empFk and the other
of which is named skillFk. And we are going to use
these foreign keys to link our EmployeeSkill table
to an Employee table and to a Skill table. So conceptually
speaking, we can imagine that we have two parent
tables, Employee and Skill, and then we have this third
table named EmployeeSkill which sits between the other
tables and allows us to connect them together. In this way, each
employee will be allowed to have multiple
skills and each skill will be able to apply
to multiple employees. To establish these
relationships, we need to add foreign keys
to our EmployeeSkill table. And as we can see, we have these
two foreign key constraints, empFk and skillFk. And to establish
the foreign keys, we use the keywords FOREIGN KEY
and then an open parenthesis and inside those
parentheses we specify the name of the attribute
within this table that we want to be a part
of the primary key/foreign key relationship. So we're saying FOREIGN KEY
empId in the first example. That's telling the
database that we want to use the empId attribute
in this table as a foreign key. Following this we see
the keyword REFERENCES and then Employee
open parenthesis empId close parenthesis. This tells the database
where the other end of the relationship should be. So we want this foreign key to
point to the empId attribute within the Employee table. We then see a comma
and after that, we provide our next constraint
which, in this case, is the foreign key which
links our EmployeeSkill table to the Skill table. And we are linking
these together according to the skillID
attribute in each table. So creating these primary
keys and foreign keys is actually quite easy in the
Structured Query Language. Now, let's extend our
knowledge of these constraints even further by considering
cascading deletes and cascading updates. First, let's consider
the employee foreign key where here, after
defining the foreign key, we see the keywords
ON DELETE CASCADE. This tells the
database that we want to establish a cascading delete
relationship between this table and the Employee table. A cascading delete
is used to help us maintain the
integrity and the quality of the data in the database. And here's how it works. In this case, we have
an Employee table and EmployeeSkill table. So let's imagine that we
have an employee number one and this employee has
skills 101 and 102. Now, if our employee number
one leaves the organization-- maybe they find a better job or
we decide to get rid of them-- then we may want to delete the
employee from the database. What a cascading
delete will do for us is when we delete our
employee from the database, the database will follow
the relationship line into the EmployeeSkill
table and will look for any records that were
associated with the employee that we want to delete. If it finds any
of those records, it will delete them from
the EmployeeSkill table automatically for us. In this way, the database
maintains the quality of the data by eliminating any
records in the EmployeeSkill table which would no longer
be associated with an existing record in the Employee table. Next we will consider
a cascading update, which is very similar
to a cascading delete in that the purpose
of a cascading update is to maintain the
quality or the integrity of the data in the database. In this case, we are
establishing a cascading update between the Skill table and
the EmployeeSkill table. And what this
means is that if we were to update one of
the primary key values in the Skill table,
let's say that we change skill 101 to 105,
then the database will follow the relationship
from the Skill table to the EmployeeSkill table and
will look in the EmployeeSkill table for any rows that
contained skill number 101. If it finds any of those
rows, it will automatically update 101 to the
new value 105 so as to maintain the link between
records in the Skill table and records in the
EmployeeSkill table. I hope that you
can appreciate just how useful these cascading
deletes and cascading updates can be. Next, let's take a look at
the SQL ALTER statement. As I mentioned earlier,
the ALTER statement is used for the purpose
of changing the structure of a database object. For example, let's imagine that
we created our Employee table, but after creating
the table we realize that we forgot to
establish a primary key. Well, we could easily use
the ALTER TABLE statement in order to add a
primary key to the table. The basic syntax for the
ALTER statement is shown here. In this case, we want to
alter the Employee table and we use the keywords
ALTER TABLE and then the name of the table
that we want to change. And then we specify the
change that we want to make. In this case, we want
to add a new constraint, specifically a primary key
constraint named empPk, and we're telling
the database that we want the empId attribute
within this Employee table to serve as our primary key. And by running
that SQL statement our Employee table
would then have empId marked as the primary key. Just as with our
CREATE TABLE example, we see here that we use a comma
separated list of attributes inside parentheses to tell
the database that we want to create a composite
primary key, in this case consisting of
the Employee ID and Skill ID columns. Adding a foreign key
constraint to an existing table is just as easy as
adding a primary key. We use the ALTER
TABLE statement, specify the name of
the table, and then we add a new constraint using the
same syntax that we learned for the CREATE TABLE statement. In this case, we are
telling the database that we want to link the deptId
attribute in this Employee table to the deptId attribute
in the Department table. With respect to
changing the data within the tables
in our database, there are really just three
things that we can do. We can either add new
data into the table, we can change data that
already exists in the table, or we can delete existing data. And we accomplish these three
tasks by using the INSERT INTO, UPDATE, and DELETE FROM
commands respectively. Let's begin by
considering the situation in which we want to add new
data into the database table. For this purpose, we can use
the SQL INSERT INTO statement. Note here that if we want
to specify non numeric data, if we want to add non
numeric data such as text into one of the columns
in the table, then we must enclose that
value in single quotes. Let's look at the syntax for
the INSERT INTO statement. Note that it begins with the
keywords INSERT INTO followed by the name of the
table in which we want to add this new row of data. And then we have
an open parenthesis and a comma separated list
of attribute names followed by a close parenthesis. So what I'm telling
the database here is I want to add a new record
into the Employee table and I'm going to be
specifying values for the employee ID, the
salary code, and the last name of the employee in that order. Next, we see the
keyword VALUES followed by an open parenthesis, a
comma separated list of values, and a close parenthesis. The values which appear inside
this second set of parentheses are the actual data values which
will be added into the table. In this case, the values
are 62, 11, and Halpert. Note that the
order matters here. We have specified that we will
be providing an employee ID, a salary code, and the
last name in that order. The values that I provide inside
the second set of parentheses then must match the
same order of the values as they appeared in the
first set of parentheses. In this case, we're saying
that the empId should be 62, the value of salaryCode
should be 11, and the value of lastName
should be Halpert. Now let's look at
the UPDATE statement. UPDATE allows us
to alter or modify the values of an existing row
or set of rows within a table. Let's consider the syntax
for the UPDATE statement. To begin, we use
the keyword UPDATE followed by the name of the
table that we wish to update. Next, we use the keyword SET and
then the name of the attribute that we want to change
followed by the new value for that attribute. After which, we
see a new keyword called WHERE which specifies
which row or rows we want to change. So in this first
example, we're telling the database I want to change
a value in the Employee table. I want to change the phone
number to 657-278-1234. And I want to make
that change only for the row in the table
which has empId equal to 29. So in plain language
we're saying change the phone
number for employee ID number 29 to the
value shown here. In our next example, we are
setting the Employee ID to four every employee in the
table who's employee name begins with the letters D-A and
we will learn a little bit more about these wild
card searches later. But the point here
is to illustrate that an UPDATE statement
can change values for more than one row at a time. And this point is further
shown with our final example, which just says UPDATE
Employee SET deptID equals 3. And what this
command will do is it will change the department ID
for every employee in the table to the value 3. One of the things
that you will learn as you become more skilled with
the Structured Query Language is that often the shortest
statements can cause the most change in the database. So here, if we had a
table, an Employee table, which contained
one million rows, this statements
UPDATE Employees SET deptId equals 3 would
make one million changes to that database. It would set the department
ID to 3 for every employee. And what if we wanted to
delete one or more rows from an existing table? In that case, we would use
the DELETE FROM statement. And the syntax is simple. We begin with the keywords
DELETE FROM and then the name of the table from
which we want to delete data. In this case, we're
telling the database that we want to delete some
data from the Employee table. Next we see WHERE clause
and that specifies which employee we want to delete. So in this case, we're
telling the database delete the record in the
Employee table who's Employee ID has a value of 29. Just as with the
UPDATE statement, we can use the
DELETE FROM statement to delete more than one row
at a time within the table. And this is what we see
in our second example. Here we're telling
the database to delete all of the rows in
the Employee table where the name of the employee
begins with the letters D-A. So any employees named Dan
or Daniel or David or Danica or Danielle or
Dana they would all be deleted from
the employee table if we were to run this query. And our last example is the
simplest, DELETE FROM Employee. Again, this is a short command
which has the potential to do the most damage
or cause the most change within the table. In this case, if we
were to run that query it would delete every employee
in the Employee table. Now we can turn our attention
to SQL SELECT queries. A SELECT query is used
to extract information from the database, or perhaps
a more interesting way of thinking about this is we
can use the SELECT query to get answers from the database. So for example, if I want
to know which employees work in the Marketing Department,
that's a question that I can ask the database
using a SQL SELECT statement and it will give me the answer. Similarly, I might ask
questions like what is the salary of
employee number two or how many employees work
in the Sales Department or what was the total amount
of sales for last month? These are all
questions that we can ask the database using
a SQL SELECT query and it can provide
us with the answer. To begin, we specify the
SELECT keyword followed by the names of
one or more columns that we want to
appear in the results. So values for the
specified columns which appear after
the keyword SELECT will become a part
of our results. In this first example,
you can see that we are using SELECT empName. This means that the
only column which will appear in our results
will be the empName column. Next, we see the FROM keyword
and after the FROM keyword we specify one or more
tables from which we want our data to be extracted. In this example, we're
just saying we want data from the Employee table. And just as with the UPDATE
and DELETE FROM statements, we can, if we want
to, use a WHERE clause which will filter the results. So in this first example,
in simple language we're telling the
database give me the name of the employee
in the Employee table whose ID is equal to 33. In our second example, you
will notice that we do not have a WHERE clause. And in simple
language, this query is telling the
database to give us a list of all of the employees
in the Employee table. One of the most important
concepts for you to understand about
these SQL SELECT queries is that the results of
the query are a relation. That is, as long as there are
some data to be retrieved, the result will be a two
dimensional table of data. It may contain just a single
row or a single column or it may contain many
rows or many columns but the results can be thought
of as a two dimensional table of data. Because the result of a
query is a two dimensional table of data, we can then
use that result, if we want, to perform other tasks. Such as, we can use the result
to create a new relation or we can use the
result of one query as input into another query. We'll take a look at that
situation a little later. Just remember that
our result set, that is the results that we get
back, the table of data that we get back in
response to our query, may not in third normal form. So it is a relation, but it may
not be in third normal form. And this is
especially common when we perform something called
a JOIN operation which is where we are
simultaneously pulling data from more than one table. In our previous examples of
using the SELECT statement, we we're extracting
just a single column of data from the database. But, what if we want to get
values for more than one column from the table. Well in order to
accomplish that task, we simply need to use a comma
separated list of column names. In this example, we're
telling the database to give us a list of employee
IDs and employee names from the Employee table. So it will return
a result set which contains two columns,
empId and empName, and then all of the employee
IDs and employee names which appear in
the Employee table. We also have the option of
retrieving all of the columns within a specific
table or set of tables. And we can do this by using
the asterisk character. Just as a brief
note, I will commonly refer to this as a star
instead of an asterisk. So in this example here, we
see SELECT star FROM Employee. This tells the database to
retrieve all of the data from the Employee table,
every column and every row. Remember the star represents
the concept of all columns. Not all rows, but all columns. So this example says
select all columns from the Employee table. And because we are not
specifying any filtering criteria using the WHERE clause,
it will also return all rows. What if we have a situation
where the result of our query contains many duplicate rows? Well we can eliminate
that problem by using the SQL
DISTINCT keyword. And if we place the
DISTINCT keyword in front of one of
our column names then the database will
remove any duplicate values of that attribute when
generating the results. In this example, we're
telling the database to give us a list of department
IDs from the Employee table, but we don't want any
duplicates in the list. That is, we just want all of
the unique department IDs. In a few of our
previous examples, we used the WHERE clause
and so at this point I think it would be a good idea
to explore the WHERE clause in a little more detail. Basically, what the
WHERE clause does is it allows us to
tell the database to constrain its
actions according to some kind of
filtering criteria. In this example, we're using a
WHERE clause with just a simple equality check. So we're telling the
database to give us the names of all of the
employees in the Employee table that have a value
for deptId of 15. In simple language,
we're saying give us a list of all of the employees
who work in department 15. The WHERE clause
is being used here to restrict the full
set of employee names to just to those that work
in department ID number 15. So we have an equality check
here, a simple equals sign, but there are many other
types of comparisons that can be done in the WHERE clause. So aside from equals, which
is just a simple equal sign as we saw, we can
do not equals using open and close pointy
brackets or in some databases you can use an exclamation
point and an equal sign for not equal to. We could also represent
the concepts of greater than or less than, greater
than or equal to, and less than or equal to. And using these
various comparisons gives us a great
deal of flexibility in the types of filtering that
we can do in the WHERE clause. What's more is that we can
create compound conditions. We can filter items based
upon more than one criterion. One of the ways
that we can do this is by using the AND keyword. So if we want to check if
two conditions are true, we can use the AND keyword. And in terms of
relational algebra, this represents an
intersection of the data sets. We also have access
to an OR keyword. And in terms of
relational algebra, this represents a
union of the data sets. One way to think
about AND and OR is in terms of Venn diagrams. Imagining that we have
two sets, A and B. The AND operation is the
intersection of those two sets. It is the area of
overlap between A and B. The OR operation, however,
is saying if either A is true or B is true or both
A and B are true, then the entire function
returns as true. Now, let's look at a few quick
examples of these AND and OR operators being used. In this first example, we
are asking the database to give us a list of employee
names from the Employee table where the employees' department
ID is less than seven or the employees' department
ID is greater than 12. So any employee
record in that table that has a value of the
deptId of less than 7 or a value of the deptId
of greater than 12 will become a part
of the results. In our next example, we
are asking the database to give us a list of employee
names from the Employee table where the employee works
in department ID number 9 and where the employee
also has a salary code of less than or equal to 3. So in this example both
of these conditions must be true in
order for a row to be included in the results set. That is, the employee's
department ID must be 9 and the employee's salary
code must be less than or equal to 3. Whenever I'm teaching the
Structured Query Language, I always like to provide people
with shortcuts or easy ways of doing things. One example of that is
the use of the IN keyword. Now the IN keyword can be used
to provide a list of values for a WHERE clause. And perhaps the best
way to learn this is by looking at an example. Here we see a SELECT query
which is asking the database to give us a list of employee
names from the Employee table where the department ID for the
employee is either 4, 8, or 9. So as the database
looks through this table and is attempting to
assemble the set of results, if the deptId is equal
to the values 4, 8, or 9, that row will be
included in the results. Otherwise, they will not become
a part of the result set. Now we can compare this
method to a method which uses a series of OR operators. So instead of saying
WHERE deptId IN 4, 8, 9, we could instead
say WHERE deptId equals 4 OR deptId equals
8 OR deptId equals 9. Hopefully you can see
that using the IN keyword can save you a few key
strokes, especially if you have a long list
of acceptable values that you would like to check
for in the WHERE clause. We also have access
to the NOT operator which allows us to invert the
filtering criteria specified in the WHERE clause. Consider this example. Here we are asking the
database to give us a list of employee names
from the Employee table where the department
ID is not 4, 8, or 9. So the results
here will be a list of employee names
from the Employee table for every employee
except for those employees who work in departments 4, 8, or 9. Another useful shortcut is
to use the BETWEEN key word. BETWEEN allows us to
specify a range of values in our WHERE clause
such that any value between a minimum and a maximum
value is considered acceptable. Let's see an example. Here we are asking the
database to give us a list of employee names
from the Employee table where the salary code
falls between 10 and 45. So as the database is
scanning through the Employee table trying to
assemble our results, it will check to see if
the value of the salary code for each row falls
somewhere between 10 and 45. And that is an inclusive range,
meaning that 10 and 45 fall within the range. If a row has a salary code
that falls between 10 and 45, it will be included
in the result set. If not, that row will not be
included in the result set. This is equivalent using
a compound WHERE statement with an AND operator. So instead of saying WHERE
salaryCode BETWEEN 10 AND 45, we could instead
say WHERE salaryCode is greater than or equal to
10 AND salaryCode is less than or equal to 45. The results will be the same. Next, I would like to talk
about the SQL LIKE keyword. And the LIKE keyword
is very useful because it allows us
to do partial searches. So for example, if I would like
to generate a list of employees whose names start
with the letter D or if I would like to retrieve
a list of customer email addresses that
end in gmail.com I can use the LIKE keyword in
order to achieve that result. SQL provides us with a few
different types of wild card characters that we can
use for these types of partial searches. First, is the percent sign. Now a percent sign means match
any number of characters. So this is any number
of wild card characters. To whereas the second type
of wild card character is a single character wild card. And for this purpose we
use the underscore symbol. This allows us to
match values against a single, unknown character. Let's see some examples of how
these wild card searches work. In this first example, we
are asking the database to give us a list of employees
from the empId table whose names begin with D- A. So
we see our percent sign wild card character shown here. We're saying we
want the employee name to look like D- A and
then any set of characters thereafter. So, zero or more
characters thereafter. This means that values
of Dan or Daniel or Danica or Danielle or David
or any other employee name that starts with the letters D- A
would be a part of that result set. In our second example, we see
the use of the single wild card character. And in this case, we
are asking the database to give us a list of
employee IDs for employees whose phone numbers begin
with 6-5-7 dash 2-7-8 dash and then contain exactly
four unknown characters. So here we're using the single
character wildcard in order to tell the database
that we want to match a very specific
pattern of telephone numbers. This wild card search
capability that is built into the
Structured Query Language can be easily used to build
something like a simple search engine. Thus far, with all of
our SQL SELECT statements we've retrieved
a list of results but we've said nothing
about the order in which the rows contained
within those results would appear. I, for example, have asked many
times for a list of employees, but they can be
in whatever order the database chooses
to put them in. If I want to
instruct the database to return the results
in a specific order, then I can use something
called an ORDER BY statement. In this example, we
are asking the database to give us all of the columns
out of the Employee table and we want the results to be
ordered by the employee name. In this case, we can
assume that empName is an alphanumeric column
that contains text values, so they will be
ordered alphabetically. Note that SQL supports two
different types of sorting. You can do and an ascending
sort or a descending sort. By default, if we do
not specify a sort type, SQL will assume that we
want an ascending sort. If we like, we can tell SQL
specifically what type of sort to perform and we can do this
by using the keyword ASC which stands for ascending or
the keyword DESC which stands for descending. And all we need to do
is add these characters after the name of the attribute
by which we want to sort and the database engine will
sort the results according to our desires. A few interesting things
that I need to point out about sorting. Most people are aware
that an ascending sort on numeric values means
that the resulting sorted values will go from small
numbers to large numbers. Similarly, an
ascending sort on text would be an alphabetic sort. So in the English
language we would expect the results to
be sorted A through Z. And of course, we can do
a descending sort which inverts the sort operations. So a descending
numeric sort will have numbers that go from
high to low, or large to small if you prefer, whereas a
descending sort on a text column will be a
reverse alphabetic sort. Here's a question for you. If I perform a sort
on a series of dates, what do the results look like? Well, if we're sorting
on a date or time column and we're using
an ascending sort, the results will be sorted
from oldest at the top to the newest at the bottom. Similarly, if we do a descending
sort on a date or time field then the newest records
will appear at the top and the oldest
records will appear at the bottom of the list. SQL provides us with
several additional built in functions that
can be very, very useful for our querying purposes. For example, we can
use the COUNT function to get a count of
the number of rows that match some sort
of specified criteria. Or if we want to find the
minimum or maximum value for a column within
a set of rows, we can use the MIN
and MAX functions. We also have access
to a SUM function if we would like to compute a
total for a specific column. So for example, I may be
interested in knowing what my total sales were yesterday. Or we might want to
determine an average. As an example, I might say what
were the average daily sales last week? Or a standard deviation. So taken together we have
access to some very useful built in functions within the
Structured Query Language. Let's see some examples. In this first case, we're saying
simply SELECT COUNT star FROM Employee. This is just asking
the database to give us the number of records
in the Employee table. Or in simpler language,
how many employees are in the Employee table. In our second
example, let's imagine that we are interested
in learning something about the total
number of hours that are required for the various
projects within our company. In this case, we're using the
MIN, MAX, and AVG functions in order to determine the
minimum number of hours involved in a project, the
maximum number of hours involved, and the average
number of hours involved. So when processing this
query, the database will first scan through the
entire project table and extract the set of rows
whose projID values are greater than 7 and then it will
look within that set of rows at values of the
hours column and it will use those
values to determine the minimum number
of hours worked, the maximum number
of hours worked, and then it will also compute
the average number of hours for all of those projects. And then return
the results to us. Very convenient. Another extremely
useful statement within the Structured
Query Language is the GROUP BY clause. And what the GROUP
BY clause does is it allows us
to combine results into some sort of categorical
or categorized output. Let's look at an example. In this case, we want
the department ID and the number of employees
that work in each department. And we can do this by
using the GROUP BY clause. So we tell the database to
give us deptId and the count or the number of rows
within the Employee table, but we want the results to be
grouped together by deptId. So in this case, the
result of this query would be a table which contains
two columns, the deptId and another column which
will be named numOfEmployees, and each row then will
contain the department ID and its associated
number of employees. Note that I'm using
something here called an alias to refer to the
result of the count operation. And this is implemented in
this case using the AS keyword. So I'm telling the
database that I want the result of
the COUNT function to be called numberOfEmployees. The use of aliases
can be very convenient and we will see some more
examples later in our lecture. If we're doing a
GROUP BY in order to categorize our
results in some way, we may also want to use
the HAVING statement. And what the HAVING
statement does is it allows us to filter
categorized results. Let's look at this example. In this case, I'm
asking the database to give me a list of salesperson
IDs, salesperson last names, and the total sales,
which in this case is computed as the sum
of the sale amount, out of the Sales table. And I'm grouping by
the salespersonId and the
salespersonLastName which means I want the results to be
combined by the sales person. And I'm also using
the HAVING clause to constrain the set of results
only to those salespeople who have a total amount
of sales that is greater than
or equal to 10,000 in whatever units our sale
amount column is measured. So perhaps units sold or
total dollars sold, et cetera. So HAVING can be
very, very useful for filtering a set
of categorized results that has been generated
using a GROUP BY clause. Another way of thinking
about the HAVING clause is that it serves the same
purpose as a WHERE clause but for a GROUP BY statement. In all of the examples
of using the SQL SELECT statement that
we've seen thus far, we have been extracting
data from just one table. But oftentimes the
kinds of questions that we will want the
database to answer for us will require that we extract
data from more than one table. And the Structured
Query Language provides us with several
different ways of doing this. The first approach
that we will consider is the use of a subquery. Now remember as I said earlier
that the result of a SQL query is a relation in to two
dimensional table of data. And because of that,
it's possible to use the results of one query as
input into another query. And this is what we mean
by the term subquery. Note there are two different
types of subqueries. They are non correlated
and correlated subqueries. I know these are
very technical terms but I will try to explain
them using examples that are easy to understand. To begin, let's consider
a non correlated subquery, which I think is the easier
of the two to understand. So here we see an example of
a non correlated subquery. What characterizes a
non correlated subquery is that the inner query,
that is the query that is contained within
parentheses, only needs to be run once in
order for the database engine to answer the question. Let's look at this a
little more carefully. Our outer query begins
with SELECT empName FROM the Employee table WHERE
the department ID is equal to some set of values. That is our outer query. The set of values
that will be used for answering that
outer query will be the result of the inner query. And in this case, the inner
query says give me a list of department IDs from
the Department table where the department name begins
with the letters A-C-C-O-U-N-T. So any department that
begins with the word Account, which might include Accounting
or Accounts payable or Accounts receivable, would have
its department ID included in the results. That list of department
IDs then becomes input into the outer query, which
would use those department IDs in order to produce
the list of employee names. So in plain language,
if our company has an Accounts Receivable and
Accounts Payable department, then what this query
would say is give me a list of employee names
for any employee that works in the Accounts Receivable
or Accounts Payable department. Note here that the
inner query, that is the query that is contained
within the parentheses, only needs to be run once
in order for the database to solve the problem. That is we only need to
get the list of department IDs a single time in order for
the outer query to be solved. This is a non
correlated subquery. Now let's look at a
correlated subquery. In contrast to a non
correlated subquery, in a correlated subquery
the inner query, that is the query
inside the parentheses, will need to be run many times
in order for the database engine to solve the problem. The reason for this is
that the inner query is going to need a value
from the outer query in order to do its job. Let's see an example of this. Here we're asking the database
for a list of employee names from the Employee
table and know that I'm using a lowercase letter e here
as an alias for this Employee table. So this allows me to
refer to the Employee table in the outer query
just by using the letter e. So I want a list
of employee names from the Employee table where
the employee salary is greater than some value
and that value is determined by the inner query. In this case, the inner query is
computing the average employee salary for the department
in which the employee in the outer query works. So imagine that the database is
trying to answer this question. It's going to begin scanning
through the Employee table. And let's say that we arrive
first at employee number one. And employee number one
works in department five. So the database is going
to say, should I include this employee in the results. And in order to
answer that question it needs to compare
employee number one's salary to the average salary
of the employees who work in the same department
as employee number one. So that means that we need the
department ID of the employee that we are currently
considering in order to answer the inner query. In this case if our
department ID is five, the inner query will say give
me the average employee salary from the employee table
for department number five. And that five is associated
with the employee that we are currently
considering in the outer query. This is called a
correlated subquery. As I mentioned earlier,
the SQL provides us with several different ways of
extracting data simultaneously from more than one table. Subqueries that we just
examined were one of those ways. The second way is through the
use of something called a join. And a joint allows
us to merge data from more than
one table together into a single result set. There are several
different types of joins within the
Structured Query Language and I've organized the
most important types here into what I call Dan's
Typology of Database Joins. So all database joins can
be divided into inner joins or outer joins. There are three
different types of outer joins that we will consider. They are a left outer
join, a right outer join, and a full outer join. And we will also take
a look at inner joins. Now technically speaking, there
are several different types of inner joins, such as an
equi join or a natural join, but for our purposes
we will just need to focus on an inner join
as a more generic concept. To begin, let's see
how we can implicitly join two tables together using
an inner join in the WHERE clause. In this case, I'm
asking the database to give me a list of employee
names and their associated department names. And note that we are now
including more than one table in our FROM clause. So we're saying give me data
from the Employee table, which we will refer to as E, and
the Department table, which we will refer to as D, where the
department ID in the Employee table matches the department
ID in the Department table. What this means
then is a database will take the department
ID in the Employee table and will look for a matching
department ID in the department table. When a match is found it will
extract the department name from the Department table
and the employee name from the Employee
table and will then include those value as those
in the row in the results set. An important thing
to note here is that a result will only be
included in the result set if a match occurs. Let's see another example of how
we can implement an inner join. In this case, we're using the
INNER JOIN key words in order to move the join operation
out of the WHERE clause and into the FROM clause
in our SQL SELECT query. This allows us to remove some
of the clutter out of the WHERE clause so that it can be used
exclusively for other filtering purposes. Let's take a look at the syntax. In this case, we're
asking the database to give us a list of
employee names and department names out of the
Employee table, which we are referring to here as e. And then we see these
new keywords INNER JOIN the Department
table, which we will refer to as d ON
e.deptId equals d.deptId. So we're telling the
database look in the Employee and Department tables,
link those tables together by their department IDs, and
then constrain the results only to those departments that do
not begin with the word Account. So in plain language, we
are asking the database here to give us a list of employee
names and department names for every employee
except for those that work in any
department which begins with Account, like
Accounts Receivable or Accounts Payable. Any employee that does not
work in those departments will be a part of the
result along with the name of the department
in which they work. Earlier I said that
in order for a record to be included in the
results of an inner query, it must have matching values
from both of the tables that are involved in the join. The major distinguishing
characteristic between an inner join and an
outer join is just that point. In an inner join,
records can only be included in the
results set if there are matched pairs of
values in the two tables that are involved in the join. In an outer join, the
matched pairs of values will be included in the
results, but so, too, might be additional
records that have no matching value in the other
table involved in the join. Let's see an example
of how this works. In this case, we are
asking the database for a list of employee
names and department names from the Employee table,
which we will refer to as e, and the department table,
which we will refer to as d. We are joining these tables
together using a LEFT OUTER JOIN operation. Where the department ID
in the Employee table matches the department ID
in the Department table. Now what the results
of this query will be is a list of employee
names and department names. The results will contain every
employee name in the Employee table along with
the department names of those employees which have a
department ID assigned to them. If an employee in
the Employee table does not have a
department ID, then they will still be included
in the results it's just that there will be
no department name associated with that employee. It will be a null value
in the result set. Now the term LEFT
OUTER JOIN here is being used to tell
the database from which table all of the results
should appear regardless of whether there is a matching
value in the other table. If you will notice
in the FROM statement we have two tables listed,
Employee and Department. The Employee table is
named on the left side of the join statement, so
to the database that is the left table in this query. To whereas the Department
is the right table. It appears on the right
side of the join statement. So when we do a LEFT OUTER
JOIN then, in this example, we're saying give me a list
of all of the employees regardless of whether they
have a matching department. But if they have a
matching department, include the department, as well. A RIGHT OUTER JOIN, by
contrast, tells the database to include all of the
records in the right table regardless of whether
there is a matching record in the left table. In this case, we are
instructing the database to include all of the
departments in the results regardless of whether there
is an employee assigned to those departments. If an employee is
assigned to a department, then that information will
be included in the results. But if not, if there is a
department that currently has no employees in it, it will
appear in the table, as well. It will just have an empty
or null value in the results where the employee name
would otherwise be. Finally, we can consider
a FULL OUTER JOIN which instructs the
database to include all of the records from both the
left table and the right table regardless of whether
a match exists. But if a match exists
then those records will be matched
up in the results. In this case, we're
instructing the database to include all of
the employees and all of the departments
in the results regardless of whether
they have any matching values in the
other table that is involved in the join operation. At this point, we
have a basic level of understanding on how to
use the Structured Query Language to create, read,
update, and delete data within tables. So let's return briefly to the
data definition language sub component of the broader
Structured Query Language and learn how to perform
a few additional tasks. First among these is dropping
an object from the database. For example, if we wanted to
entirely get rid of a table, not the data within the
table but the entire table itself, we could use
the DROP statement. Specifically, if we
were trying to get rid of the Employee table, we
could say DROP TABLE Employee. Very short set of
commands and that will not only delete all of
the data within the table, but it will remove the table
itself from the database. Remember, there's an
important distinction between deleting
the data in a table and deleting the table itself. Removing the table
itself means there's no longer, in this
case, any Employee table in the database at all. Recalling the use of our ALTER
statements and CONSTRAINT statements, here we can see
how to get rid of a constraint if we need to. So in this example, we want to
remove the employee foreign key from the Employee table. And we accomplish this by
using an ALTER TABLE statement. So we say ALTER TABLE we
specify the name of the table that we want to alter in this
case, the Employee table, and then we tell the database
to DROP the constraint that is named employee foreign key. Remember that in the context
of the ALTER TABLE statement, we used ADD CONSTRAINT to add
a new constraint to a table and here we used DROP CONSTRAINT
to remove that constraint. An additional type of
constraint that we can create is called a CHECK constraint. CHECK constraint can be used
to specify an acceptable set of values that are
allowed to appear within a column in a table. Let's look at this example. Here we are altering
the project table to include a CHECK
constraint, which we are naming
projectCheckDates, and we are instructing the database
to check whether the start date is less than the end date. If that condition is true,
everything is acceptable. If that condition
is not true, that is if the proposed end
date for the project is less than or equal to
the proposed start date, then the database will
not allow this row of data to be added to
the Project table. This is another
mechanism that we have available in our
toolbox for helping to protect the quality
and integrity of the data in the database. Finally, I just want
to briefly mention the concept of a SQL View. Now a SQL View is
a virtual table which is constructed by
using a SELECT statement that is stored in the database. Views can be useful in
many different situations, especially when we have
different types of users who use our database. To create a view, we simply
use a CREATE VIEW statement and then we specify the
SELECT statement which forms the foundation of that view. Let's look at this example. Here we are creating a
view named SalesDepartment which will include
all of the columns from the Employee
table, but will include only those
employees that work in the Sales Department. So we are using this
SELECT statement to construct a
virtual table, which includes just a subset
of all of our employees. We only want to see
the employees who work in the Sales Department. And then once this
view is established, I can write queries
against that view just as if it were a table. So for example, I could
say SELECT employee name from the sales department view. And in this case,
the database would look in the View
for employee names and the results,
by extension, would be a list all of the
names of the employees who work in the
Sales Department. Well, my friends, thus
ends our brief introduction to the Structured
Query Language. There is certainly much more to
be learned about the Structured Query Language than I
presented here in this lecture. However, you should now have
a basic set of skills in SQL that will allow you
to start working with databases using
the Structured Query Language immediately. I hope you learned
something interesting. Until next time. Have a great day.