Good day, everyone. This is Dr. Soper here. And today we will be
exploring our sixth topic in our series of
database lectures. With today's lecture focusing
on database administration. Although we have many different
objectives for today's lecture, there are a few
primary areas on which I will be focusing the
majority of my comments. These are concurrent
access, which is a situation in which
more than one user wants to access a given
database at the same time, database security,
and the model that has been put in place
for backup and recovery of these relational databases. We will of course be addressing
several additional topics along the way, but the primary
focus of our lecture today will be those three areas. In addition, toward
the end of the lecture we will perform a
quick examination of the administrative
and managerial tasks that a database
administrator must complete. And we'll also be examining
distributed database processing, which is a situation
in which our database might be replicated or
partitioned such that it can exist on
more than one machine. And we will conclude
our lecture today with a brief overview of
object oriented databases and object relational databases. To begin, consider this
conceptual representation of the database
processing environment. This diagram is intended to
show the complex environment in which modern database
technologies are used. We can see here that
the database management system exists at the
center of this environment, and it is exclusively
the database management system that has access
to the data themselves. However, many different
types of entities can make requests of the
database management system. Aside from internal database
triggers and stored procedures, we have application programs
that can utilize our DBMS. Web requests, perhaps in the
form of active server pages or Java server pages that
similarly can utilize our DBMS. And we have any number of
additional sources of requests that the DBMS may
need to process, such as ad hoc queries,
forms, reports, et cetera. So as we can see
here, the possibility exists for a single
database to have to serve many different types
of requests simultaneously. Because of this
complexity we need to view a database
processing environment as a dynamic environment. And in order to ensure that
the database can meet the needs and demands of all of
the different parties and software applications that
may rely upon the database, a database administrator
must configure and manage the database such
that it provides for concurrency
control, security, and backup and recovery in the
case that something goes wrong. To begin, let's consider
concurrency control. So broadly speaking,
the idea of concurrency is that more than one
user, in this case a user might be a person,
a human being, or a software application, such as a
standalone application, a web service, or a data
driven website. More than one of
these types of users may be trying to access the
same data in the database at the same time. And this need to provide
for concurrent access leads to a problem known as
interdependency, wherein modifications to the
database that are made by one user may impact other users
who are simultaneously trying to use the same information. So the objective of
concurrency control mechanisms in database administration
is to ensure that the actions of
the various users, who may be trying to access
the same data at the same time, are not going to adversely
impact one another. Another way of
saying that is, we want to put a
mechanism in place that will ensure that the quality,
reliability, and accuracy of the data remain
intact, despite the fact that many users might be
trying to work with or modify those data at the same time. The primary challenge
with concurrency control is balancing this need to
maintain the quality of data in the database with the
need to provide accessibility to the data, so that many
users can efficiently use the database at the same time. Thus, we have a spectrum. On one end of the
spectrum we can design our concurrency
control framework such that when one user
accesses or touches a particular set of data
within the database, we block all other
users from being able to use those data until
the first user has finished whichever tasks he or she
was trying to perform. The other end of this
concurrency control spectrum would be that we don't
impose any constraints and a user is allowed to read
and work with data regardless of whether any other users are
simultaneously using or working with those data. In a high-volume, multi-user
database processing environment we must often try
to find some balance between
the two extremes of this concurrency
control spectrum. Next I would like to introduce
the concept of a database transaction. Simply put, a
database transaction is a series of operations
that need to be performed against the database. One way to think about
this is that a transaction is a series of SQL commands
or a series of instructions. Despite the fact that
a transaction typically involves more than
one operation, it is critical in the database
processing environment that all of the operations or
all the instructions involved in the transaction be treated
as a single atomic unit, that is a logical unit of work. And using this approach
every step in the transaction must be completed successfully,
or the entire transaction will fail. That is, every step
must succeed in order for the entire
transaction to succeed. If any of the steps fails,
then the whole transaction will fail, and we will
need to undo, or roll back, any changes that were
made to the database during the transaction
prior to the point when the transaction failed. Here we see an example
of a transaction which is going to involve three
separate tables in a database. A Customer table, a Salesperson
table, and an Orders table. On the left we see the state
or status of these three tables before the transaction begins. While on the right, we
see the state or status of these three tables after
the transaction has completed. And in this example,
we're illustrating how a transaction can introduce
anomalies into the database if one of the steps in
the transaction fails. So let's move through this
transaction one step at a time. Prior to running
the transaction we see that in our
Customer table we have one row for
customer number 123. The first step in
our transaction is to add a new record
into the Customer table. And here we're adding a new
record for order number 8,000, which is an order for 250
basketballs valued at $6,500. So the first
step in our transaction is to record that new
information in the Customer table. The next step in
our transaction is to update the Salesperson
table to reflect the fact that our salesperson Jones
has made this sale of 250 basketballs valued at $6,500. So prior to the
transaction running, the record for salesperson
Jones in our Salesperson table showed that he or
she had accumulated $3,200 worth of sales
during this period. And after the transaction we
have updated our salesperson's record, such that he or she
now has $9,700 worth of sales. So the update reflects
the additional sale of $6,500 worth of basketballs. Finally, the third
step in our transaction is to insert the new order
into the Orders table. However, at this point in the
transaction an error occurs. And our error in this case is
that the Orders table is full. So perhaps we've run
out of storage space and the database
is therefore not able to add any additional
information to the Orders table. Now, if we stop
at this point, we have introduced several
anomalies into our database. Namely, we have a new
record in the Customer table which shows that
a customer wanted to purchase 250 basketballs. And we've updated
our Salesperson table to reflect the fact
that our salesperson Jones has made that sale. However, the order
information itself does not exist in
the Orders table. If we were to stop at
this point without undoing the changes that were
made during the earlier stages in the transaction,
then these anomalies that have been introduced
into the database would almost certainly cause
problems for this organization. By contrast, when all of
the steps in a transaction are treated as atomic, as a
single, logical unit of work, any changes that have been
made to the database tables will be undone if an error
occurs during the execution of the transaction. So in this case, we can
follow the execution of the transaction. And we see again,
the first step is to change the Customer
data to reflect the fact that 250 basketballs were sold. We would then change
the Salesperson data. We would then try to
insert the Order data. However, in this
case, an error exists, so we must roll back
the transaction in order to restore the database
tables to the state or status that they had prior
to the transaction. And following the rollback it
is as if the transaction never occurred.
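As a concrete illustration of this all-or-nothing behavior (this sketch is not from the original slides), here is roughly how the basketball transaction could be written in T-SQL, assuming hypothetical CUSTOMER, SALESPERSON, and ORDERS tables and column names:

    BEGIN TRY
        BEGIN TRANSACTION;  -- treat all three steps as a single logical unit of work

        -- Step 1: record the new order information for customer 123
        UPDATE CUSTOMER
        SET LastOrderNumber = 8000
        WHERE CustomerNumber = 123;

        -- Step 2: credit salesperson Jones with the $6,500 sale
        UPDATE SALESPERSON
        SET PeriodSales = PeriodSales + 6500
        WHERE SalespersonName = 'Jones';

        -- Step 3: insert the order itself; if this step fails, steps 1 and 2 must be undone
        INSERT INTO ORDERS (OrderNumber, CustomerNumber, Item, Quantity, Amount)
        VALUES (8000, 123, 'Basketballs', 250, 6500);

        COMMIT TRANSACTION;      -- every step succeeded, so make the changes permanent
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;    -- a step failed, so undo everything done so far
    END CATCH;

Because the three statements sit inside one transaction, a failure in the third step leaves the database exactly as it was before the transaction began.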
Next, let's see an example of how a database server might
handle a concurrent processing situation. So imagine that we have two
Users, A and B, both of whom are trying to access
information simultaneously in our Item table. So User A makes a request
to read information related to item 100,
while at the same time User B makes a request
to read information related to item 200. User A then makes a request
to change a value associated with item 100, while
User B makes a request at the same point in time
to change a value associated with item 200. The resulting change for A is
then written to the Item table, while at the same time
User B is requesting that his or her changes be
written to the Item table. Despite the fact that
many of these requests are arriving simultaneously,
the database server must process them in
a sequential fashion. So as an example of how the
database server might actually process these
simultaneous requests, we can see that the
DBMS has arbitrarily chosen to handle the
requests from User A first. So although the requests
from User A and User B to read items 100 and
200, respectively, arrive at the same time,
the database first will read the value associated
with item 100 for User A, and then will read
the value associated with item 200 for User B. The proposed change
for item 100 for User A is then handled, after
which that change is written to the database for User A.
After User A's modifications to the Item table
have been completed, then the DBMS will handle
the requests from User B to make changes to the
Item table for item 200. Without an intelligent
concurrent processing strategy in place, the possibility exists
for us to encounter something called a lost update problem. And the lost update problem
occurs when two or more users are simultaneously
attempting to access the same piece of
information in the database, and one or more of
those user's changes are overwritten by the
changes made to the database by another user. Let's see an example
of how this can work. In this example
imagine, once again, that we have User A and
User B, and both A and B are attempting to use
information in the Item table. In this case, both
User A and User B are interested in
item number 100. Let's assume for this example
that the current count value, say the quantity on
hand for item number 100, is 10 units. So we have 10 units of
item 100 in our inventory. Now, both User A and
User B simultaneously request from the database to
read how many units of item 100 are in inventory. So again, we are assuming that
the answer to that question is 10. User A after receiving
that information wants to instruct the database
to reduce the number of item 100 that is available
by five units, and then write the result
back to the database. Simultaneously User B wants to
reduce the number of item 100 that are available in
inventory by three units, and then write the result
back to the database. Now we as intelligent human
beings can look at this problem and we can say, OK we
begin with 10 units. User A, perhaps
is a salesperson, and is attempting to sell
five units of item 100, while User B,
another salesperson, is attempting to sell
three units of item 100. So in total eight
units are being sold. And the result after
these two operations should be that two units of
item 100 remain in inventory. So we began with 10
units; User A sells five units, User B sells three units. The result should be two units. Unfortunately, because
these two users' requests are arriving simultaneously
at the database, and in this example, we do not
have an intelligent strategy in place for handling this
type of concurrent access, we'll see that we are going
to encounter some problems. Let's see how the database
server might process these two requests. So first it might read
the quantity of item 100 available for User A. And it
would return a value of 10 to User A. It would then perform
the same operation for User B, returning a value
of 10 for User B. User A, then, would attempt to
set the number of units of item 100 in inventory to five. That is, our salesperson
has sold five units, so we are subtracting 5 from 10. 10 minus 5 is 5. Therefore User A is
attempting to set the new value of the
quantity of item 100 available to five units. That value is then
written to the database. So by the time that we reached
step four, inside the database the number of units
available is now five. The database then proceeds
to handle the requests from User B. Recall
that the database has told User B that there
are 10 units in inventory. User B then attempts
to reduce the number of units in inventory by three. The result of which would be 7. 10 minus 3 is 7. And then that value is
written to the database. So by the time we have
completed step six, the number of units of item
100 that our database shows we have in inventory is seven,
when in fact, it should be 2. Another way of saying that
is, because we did not have an intelligent
strategy in place for handling this
concurrent access, the changes that were made
to the database by User A have been overwritten
by the changes made by User B. This is an example
of a lost update problem. Speaking more broadly, we
have three concurrency issues which might emerge during the
execution of a transaction in a database processing
environment that supports multiple simultaneous users. The first among these
is called a dirty read. In a dirty read a transaction
will read a record that has been modified
by another user, but whose changes have not yet
been committed to the database. And by committed to
the database here, I mean that the database
has been informed that a change should be made,
but that change has not yet been made permanent. It has been proposed to be made. The database just has
not yet made that change. So, when a transaction reads
a record whose value has been modified by another
user, but has not yet been made permanent
within the database, that is called a dirty read. Our second concurrency issue
is called an inconsistent read. And in an inconsistent
read, the transaction is going to re-read a data
set that it has previously examined and will discover
that the values of the data have changed since the last time
that it examined those data. So imagine a transaction
where one of the early steps is to examine the
values in the data set, and then it performs some
additional operations on some other data. When it comes back
and re-examines those original values,
they've changed. Meaning that in the
interim, some other user or some other process
has altered the data upon which this
transaction is relying. This is called an
inconsistent read. And finally, we have
a concurrency issue known as a phantom read. In a phantom read,
when the transaction re-reads the data
set, it may discover that a new record has been
added since the last time that the transaction
examined the data set. So again, we can
imagine a transaction where somewhere in
the early steps, the transaction reads
a data set and then performs some operations
against some different data values in the database. And when it comes back, it finds
that a new record or a new row has been added to the original
data set that it read, which was not there
during the initial read. This is called a phantom read. So in order to address
these three different types of possible concurrency
control issues, the general strategy
that is employed by databases that
support concurrent access is known as resource locking. And the primary thing to
understand about this strategy is that when resource
locking is in place, data that are being
used by one user are temporarily locked, such
that other transactions which want to use those
data are not allowed to use the data until
the transaction which has locked the data has
finished all of its tasks. Database locks come in
two different varieties, implicit and explicit locks. Implicit locks are
issued automatically by the database
management system, whereas explicit
locks are issued by users who are
requesting exclusive access to certain data
within the database. So user initiating a transaction
may lock certain database objects such that all of
the steps in the transaction can be completed
without having to worry about any ill effects
being introduced into the database
while the transaction is under way by
other users who may be trying to use the
same data simultaneously. Any other users
who are attempting to use those data
when they are locked, will simply be put
into a wait state. That is they're
added into a queue and the database will
then service their request after the resources
have been unlocked. Note that explicit
locks can be issued against several different
types of database objects at different levels
of granularity. A transaction,
for example, might request a table lock, in
which an entire table of data is locked. At a finer level of
granularity, the database might issue a row
lock or a column lock, wherein a certain set of rows
within a table are locked, or a certain set of
columns within a table are locked while the
transaction is unfolding. And at the finest
level of granularity, we may issue a cell lock. That is, we may lock a
single cell within the table temporarily while our
transaction is using or manipulating its value.
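In SQL Server, a transaction can ask for these different levels of granularity by combining an explicit transaction with table hints. The sketch below is illustrative only and assumes a hypothetical ITEM table; note that SQL Server itself locks at the row, page, and table level rather than on individual columns or cells.

    BEGIN TRANSACTION;

    -- Table-level lock: exclusively lock the entire ITEM table
    SELECT * FROM ITEM WITH (TABLOCKX);

    -- Row-level lock: exclusively lock only the row for item 100,
    -- and hold that lock until the transaction ends
    SELECT Quantity
    FROM ITEM WITH (ROWLOCK, XLOCK, HOLDLOCK)
    WHERE ItemNumber = 100;

    -- ... the transaction's work happens here ...

    COMMIT TRANSACTION;   -- committing (or rolling back) releases the locks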
Let's see an example of a concurrent processing situation when an
explicit locking strategy is put in place. So once again we
have Users A and B. And we recall our
previous example where both users
are trying to access item 100 at the same time. We currently have 10 units
of item 100 in inventory. User A wishes to reduce
the count of item 100 by five units. User B wishes to reduce
the count of item 100 by three units. So the result should be, after
both of these transactions have been processed, that
we have two units remaining in inventory. Let's see what happens when an
explicit lock is put in place. So the database server begins
processing these two requests. It first locks
item 100 for User A then reads the value of the
inventory count for item 100. That is, it's going to
return a value of 10 units back to User A. At this point,
the database server may receive the request from the
User B to lock item 100. However, since item 100 is
already locked by User A, the database cannot service that
request at this point in time, so it places B in a queue. It says, OK User B, I
will get you eventually, but right now you have to wait. The database then
continues processing User A's transaction. So it sets the item
count to five for User A, because recall
that User A wanted to reduce the original
item count by five units. 10 minus 5 is 5. And then it writes
the results back to the table, after which
the lock is released. So by the time we have
completed step six, the inventory count for
item 100 in the database is now five units. And now that User A has
released their lock on the data, the database can then honor User
B's request to lock the data. It then reads the value
for B. So at this point it will return a
value of 5 units back to User B. User B wishes
to debit that value by three units. 5 minus 3 is 2 units. The new inventory
count is set to 2. That value is stored
to the database and the database
releases the lock that had been established for
User B, such that other users now have access to item
100 should they need it. So by imposing this
locking strategy, we no longer have the
lost update problem that we saw in our
previous example. Using the locking strategy
User B had to wait temporarily, but the result is that our
inventory count is correct. An important concept
to understand at this point in our
discussion is the notion of a serializable transaction. And this is a
philosophical orientation which says that if two or more
transactions are processed concurrently, then after
all of the transactions have been completed the
results in the database should be logically consistent
with the results that would be in the database if the
transactions had been processed serially, one at a time, in some arbitrary order. Let's revisit our
previous example to see if we can
understand this concept. When we went through this
example the first time, User A's requests were
serviced by the database before User B's requests were
serviced by the database. Let's now reverse
that process, so that we handle User
B's requests first, after which User
A's requests will be handled by the database. Again, recall that we are
beginning with 10 units of item 100 in inventory. If we handle user B's requests
first we lock item 100. We read the number
of units of item 100 in inventory for User
B. That value is 10. We then reduce the
count by three. That is, we set the value to 7. 10 minus 3 is 7. We then write the
result to the database, after which the
lock is released. Next, a lock is issued for
User A. The number of units in inventory is read
for User A. The database returns a value of
7 for that result. User A then makes a request
to reduce the number of units in inventory by 5. 7 minus 5 is 2. So the resulting
value of 2 units is then written back
to the database. Thus the final number
of units in inventory is 2, regardless of whether we
process User A's transactions first, as we did
during our first trip through this example,
or whether we process User B's transactions first. In either case the result, using
our explicit locking strategy, is an inventory count of
two, and that is logically consistent with what
we would expect. So a locking strategy
seems very logical and can have great
benefits, but using a locking strategy can also
introduce certain new problems into the database
processing environment. And one of those problems
is known as deadlock. Deadlock occurs when two
transactions have each locked resources in the database
that the other transaction needs in order to
complete its tasks. In this situation
without some mechanism for avoiding deadlock, the
two transactions involved will have to wait on
each other indefinitely. Let's see an example that
will illustrate this deadlock phenomenon. So here we have a conceptual
example for User A and User B. We're not going to worry
about the technical details of the database. We'll just say that we're
working with simple objects, paper and pencils. So imagine that User A
initiates the following sequence of events. Lock paper then take some paper
or modify the value of paper, and then try to
lock the pencils. Simultaneously User
B locks the pencils, tries to take or modify
the value of the pencils, and then asks the
database to lock paper. So let's see what happens
at the database server when these requests arrive. The database first
locks the paper resource for User A. It then locks
the pencils resource for User B. The database next
processes User A's requests to modify the value of paper and
then write those results back to the database. It then handles B's requests
to modify the value of pencils and then write those results
back to the database. And next, this is where
the deadlock occurs. A has requested to lock pencils,
but pencils have already been locked for User
B. At the same time, B has requested to lock
paper, but paper has already been locked for User A. Thus the database
would put User A in a wait state for the
pencils to be released, while user B would be put in
a wait state for the paper to be released. And these two transactions
would then wait on each other indefinitely and
neither would be able to complete its tasks
because the resources that it needs to complete its tasks
are locked by another user. Database management
systems must therefore incorporate a mechanism
for detecting or avoiding this deadlock situation.
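To make the paper-and-pencils scenario concrete, the following rough T-SQL sketch shows the two sessions' statements in the interleaved order described above; it assumes a hypothetical SUPPLIES table with one row for 'Paper' and one for 'Pencils'. In SQL Server, the deadlock detector would notice this cycle and roll back one of the two sessions as the deadlock victim so that the other can finish.

    -- Session A
    BEGIN TRANSACTION;                                     -- step 1
    UPDATE SUPPLIES SET Quantity = Quantity - 1
    WHERE Item = 'Paper';       -- step 2: A locks the Paper row

    -- Session B
    BEGIN TRANSACTION;                                     -- step 3
    UPDATE SUPPLIES SET Quantity = Quantity - 1
    WHERE Item = 'Pencils';     -- step 4: B locks the Pencils row

    -- Session A
    UPDATE SUPPLIES SET Quantity = Quantity - 1
    WHERE Item = 'Pencils';     -- step 5: A now waits for B's lock

    -- Session B
    UPDATE SUPPLIES SET Quantity = Quantity - 1
    WHERE Item = 'Paper';       -- step 6: B now waits for A's lock: deadlock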
Next, let's talk about two major strategies toward locking. One of which we can
call optimistic locking, and the second of which we
can call pessimistic locking. Conceptually speaking,
in optimistic locking, we are optimistic that
the changes that we make to the database during
our transaction will succeed and no concurrency
control problems will arise during the execution
of the transaction. Conversely, during
pessimistic locking, we assume that a concurrency
control problem might arise during the execution
of our transaction, and so we proactively
seek to avoid that. Optimistic locking
strategies are useful in database
environments in which the majority of the transactions
involve just reading data. That is, we're not making a
lot of updates to the data. In this case, we
can be optimistic that no concurrency control
problems will occur, because it is unlikely that
more than one user will be trying to update the
same data at the same time. We simply don't have a lot
of updates in our database, so the likelihood of having
two simultaneous updates against the same data is small. And in that case, we might
implement an optimistic locking strategy. By contrast, if we are
in an environment where data are frequently updated
and the chances that two users might want to
simultaneously update the same data are reasonably
high, then in that case, we would want to implement a
pessimistic locking strategy. So let's consider
the differences between these two
locking strategies at a slightly lower level. To begin we'll examine
optimistic locking. And the basic process
which is employed under optimistic
locking is, we first read the data that is
requested by the transaction. We process the
transaction, that is, we initiate any sort of
modifications or updates to the data that we need. That update is recorded. We then check the database to
see if any sort of conflicts occurred, or any sort
of errors occurred. Is there anything unexpected? If no conflict has occurred,
we then commit the transaction and make our requested
changes permanent. They then become a permanent
part of the database. Otherwise, if some sort
of conflict did occur, then we initiate a roll back. We undo all of the
steps of our transaction and we try to repeat
the transaction again, until it ultimately
succeeds without a conflict. Under the pessimistic
locking strategy, we begin first by
locking the resources, reading the data that we need,
we then process the transaction by making any updates or
modifications to the database that are necessary. We commit those changes
and we release the locks. Note that in a pessimistic
locking strategy we do not need to check to
see if a conflict occurred. We simply commit the
changes directly. The reason that we
do not need to check to see if a conflict occurred
is because all the required resources for the
transaction were locked prior to any updates
or modifications being made to the data that are
being used by the transaction. Let's see some
intuitive examples of these optimistic and
pessimistic locking strategies. Here we see an example of an
optimistic locking strategy that has been written
in pseudo-code. To begin, we request some
values from the product table. Specifically here we are
interested in the quantity of pencils that are
available in our database. So we request the
number of pencils available and we
record that value into a variable which here
we have named old quantity. So let's say that our
number of pencils is 10. So we request 10 pencils. We store the value 10 into
our old quantity variable. We then set the
new quantity value equal to product
quantity minus 5. That is, we set the value equal
to what should be 10 minus 5, or five units. Thus, the value of new
quantity should be equal to 5. We then continue
processing our transaction, looking for any sort
of errors, et cetera. And at this point,
we then issue a lock. We then attempt to
carry out our update. So we instruct the database to
update the value of pencils. We want to set the
value of pencils to the value stored in our
new quantity variable, where the quantity of
pencils in the database is equal to the old quantity. That is, it's equal to 10. If the quantity of pencils
has changed since we first retrieved the quantity at the
beginning of our transaction, that is, if the
quantity of pencils is no longer equal to 10 when
we attempt to issue our update, then we know that some other
user or some other process has modified the quantity
of pencils during the time while we were busy processing
other parts of the transaction. So when we check
and we determine that our update
was not successful, we would need to roll
back this transaction. That is we need to
undo any changes that were made along the
way and try to repeat the transaction until
ultimately it is successful. Next, let's look at
Next, let's look at a similar example, except this time we will use a
pessimistic locking strategy. Remember in a pessimistic
locking strategy we assume from the outset that
one of our concurrency control problems might occur, and so to
defend against that we simply begin by first locking
all of the resources that we will need to
complete our transaction. So we lock all of the
resources that we need. In this case, we extract
the product quantity for pencils, which is 10 units. We set the new quantity equal to
the previous quantity minus 5. So the new quantity
is set equal to 5. We process any additional
steps in our transaction. We issue our update and we
then unlock the resources that were previously locked. In this case there's no need
to check whether the update was successful. We know it was
successful, because we had locked all of the
required resources that we needed prior to
executing the transaction.
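For comparison, the same pencil update under a pessimistic strategy might look like the following T-SQL sketch, assuming a hypothetical PRODUCT table with Name and Quantity columns. The SELECT takes and holds an update lock up front using SQL Server's UPDLOCK and HOLDLOCK hints, so no other transaction can change the pencils row between our read and our write.

    BEGIN TRANSACTION;

    DECLARE @NewQuantity INT;

    -- Lock the pencils row for the duration of the transaction, then read it
    SELECT @NewQuantity = Quantity - 5
    FROM PRODUCT WITH (UPDLOCK, HOLDLOCK)
    WHERE Name = 'Pencils';

    -- ... other processing for the transaction happens here ...

    -- Because we already hold the lock, no conflict check is needed here
    UPDATE PRODUCT
    SET Quantity = @NewQuantity
    WHERE Name = 'Pencils';

    COMMIT TRANSACTION;   -- committing releases the lock so waiting users can proceed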
At this point you may be asking yourself, well, why would anyone ever want
to use an optimistic locking strategy if a pessimistic
locking strategy can always guarantee that the
update is successful? Well, the answer
to that question is, when a pessimistic
locking strategies is used, it is much more likely
that other users will be put into a wait state. That is, they'll have to
wait in line for access to the resources in the
database that are required to complete their transactions. And that can slow the overall
performance of the database. So ultimately, the decision
about whether to use pessimistic locking
or optimistic locking must come down to the
nature of the queries that are being run against
this particular database. If the database is one in which
many, many updates are done, then a pessimistic
locking strategy would probably be preferred. If the database is one in
which most of the requests that are made of the
database are reads, that is we're just reading
data, selecting data out of the database, then an
optimistic locking strategy would probably be preferable. Next, let's talk
about this notion of a consistent transaction. And a consistent
transaction is commonly referred to in the industry
as an ACID transaction. That is, it is one which is
atomic, consistent, isolated, and durable. Atomic transactions are
treated by the database as a single logical
unit of work. As we said earlier,
an atomic transaction is one in which all of the
steps of the transaction must be completed successfully
in order for the transaction to be considered successful. That is, if any of the steps
in the transaction fails, then the entire
transaction fails. Treating a transaction as a
single logical unit of work helps us to avoid data
anomalies, as we saw earlier in this lecture. Consistent
transactions are those in which other transactions
are disallowed while the transaction is under way. And this sort of
consistency can be achieved through the artful
use of database locks. Next, let's consider
the extent to which a transaction is isolated. We now know that in
multi-user databases, it's possible for
different transactions to be using or working with
the same data at the same time. And because of this, we know
that the data in the database can be continuously changing. We therefore need to
make some decisions as database
administrators regarding the extent to which
our transactions will be isolated in our concurrent
database processing environment. Now, the ANSI SQL
standard defines four different isolation levels. And these four
isolation levels specify which of the three
concurrency control problems are allowed to occur
in the database. If you remember our
three concurrency control problems were dirty
reads, inconsistent reads, and phantom reads. So the least restrictive
isolation level is Read Uncommitted. And here, many concurrency
control problems are considered acceptable. That is, transactions are
not particularly isolated. As we move from Read
Uncommitted toward Serializable, our transactions become
increasingly isolated as they are processed by the database. A Serializable isolation
level than prevents concurrency control problems,
but comes at the expense of accessibility. That is, if we choose to
implement a Serializable transaction isolation level,
then although our transactions will be very well protected,
and the likelihood of concurrency control
problems occurring is essentially zero, the cost
of that is accessibility. It is much, much more
likely that users will have to wait for
longer periods of time in order to have their requests
serviced by the database.
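In SQL Server, the isolation level is chosen per session with the SET TRANSACTION ISOLATION LEVEL statement. Here is a small sketch, using a hypothetical ITEM table, of the two ends of the spectrum just described; the intermediate ANSI levels are READ COMMITTED and REPEATABLE READ.

    -- Least restrictive: dirty, inconsistent, and phantom reads are all possible
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    BEGIN TRANSACTION;
    SELECT Quantity FROM ITEM WHERE ItemNumber = 100;
    COMMIT TRANSACTION;

    -- Most restrictive: prevents all three problems, but other users may wait longer
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
    SELECT Quantity FROM ITEM WHERE ItemNumber = 100;
    -- ... the data this transaction reads remain stable until it commits ...
    COMMIT TRANSACTION;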
Finally, we have Durability. And a Durable transaction is simply one in which all of
the changes which are committed during the
course of the transaction are considered permanent
within the database. Another topic that can be
very important for database administrators to understand
is that of cursors. Recall that when we use
a SQL SELECT statement, the result of that
statement is a set of rows that is returned by
the database management system. And we call that set
of rows a result set. A cursor is simply a pointer
into that set of rows. One of the reasons that
we need to use cursors in application programs
or data driven websites, is that these results sets
can often be very large. So it doesn't make
sense, or it might even be beyond the capability of
these applications or websites, to work with the entire
results set as a single unit. Using a cursor allows an
application or a website to process a result
set one row at a time. And these cursors
are commonly defined using SQL SELECT statements,
such as the example here which relies upon the
DECLARE CURSOR syntax.
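For readers who cannot see the slide, here is a representative T-SQL sketch of that pattern, assuming a hypothetical CUSTOMER table; the DECLARE, OPEN, FETCH, CLOSE, and DEALLOCATE statements are the standard SQL Server cursor syntax.

    DECLARE @Name VARCHAR(50);

    -- Define a cursor over the rows returned by a SELECT statement
    DECLARE CustomerCursor CURSOR FOR
        SELECT CustomerName
        FROM CUSTOMER
        WHERE CustomerState = 'CA';

    OPEN CustomerCursor;

    -- Process the result set one row at a time
    FETCH NEXT FROM CustomerCursor INTO @Name;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @Name;                               -- work with the current row here
        FETCH NEXT FROM CustomerCursor INTO @Name;
    END

    CLOSE CustomerCursor;
    DEALLOCATE CustomerCursor;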
Conceptually speaking, we have two major types of cursors. There are forward only
cursors, also called non-scrollable cursors,
and scrollable cursors. As the name suggests, in
a forward only cursor, our rows are fetched one
at a time from the start to the end of the cursor. We iterate through the
cursor one row at a time, and we can only move
in one direction. That is, we cannot stop
halfway through and look at the previous row in
the cursor, for example. We start at the first
row in the result set. We then move to the second
row, to the third row, to the fourth row, and
so forth until we reach the end of the results set. And at no time are we
allowed to move backward through the results set. Because we cannot scroll
the cursor backwards, any changes that are made by
other users to the database to rows in our results set that
we have already examined will not be visible
through the cursor. So if I am reading through a
results set using a forward only cursor, and I read row one
and then row two and then row three, and at that point in time
some other user makes a change to row one, I would not be
able to see that change through the cursor, because I
cannot look backwards. If however, I am scrolling
through the results set, and through some sort of insert,
update, or delete statement, values in the results
set are changed for rows that I have not yet
examined, then I will see those changes
at the point in time when I examine
the affected rows. So for example, if I am
scrolling through a results set using a forward only cursor,
and I examine row one and then row two
and then row three, and at that point
in time someone makes a change which affects the
values of row five, by the time I reach row five I will be able
to see that changed result, because I am fetching rows
from the database serially, one at a time as I move
forward through the cursor. By contrast to these
forward only cursors, we also have scrollable cursors. And a scrollable cursor
allows us to move both forward and backward through a
results set as needed for the application program
or the data driven website to examine the rows. And there are three major
types of scrollable cursors, each of which has
different properties. These are static cursors, keyset
cursors, and dynamic cursors. So static, keyset,
and dynamic cursors are the three different
types of scrollable cursors. Let's see how they're
different from one another. Of the three different
types of scrollable cursors, static cursors are probably
the simplest to understand. The way that a
static cursor works, is that when the
cursor is requested, a snapshot of the
results is taken and the complete result set
is fetched from the database and stored in a temporary table. So when the cursor is
opened all of the results are fetched from the database
at that point in time and stored in a temporary table. The user of the cursor can
then scroll forward or backward through the results set. And the rows which are
visible to the cursor are those which were copied
into this temporary table at the moment in time when
the cursor was opened. Because of this, because of the
use of this temporary table, any changes that are made to
the underlying data, that is, to the real tables from
which the temporary table was constructed,
are not visible
through the cursor. So if I open a cursor
using a select statement and that statement
fetches a number of rows into this temporary table
for use by my static cursor, and at that point
in time other users are changing the
underlying values from which my temporary table
was constructed, then I will not be able to see any of those
changes through the cursor, because I am not
actually looking at the data in the
real table anymore. My cursor, or my static
cursor, is simply using the data from the
temporary table to move forward or backward through
the rows as necessary. So the disadvantage of
these static cursors is that changes which are
made to the underlying data are not visible. But the major advantage
is that they're simple and they consume comparatively
few resources on the database server. The next scrollable cursor
type is called a keyset cursor. And the way that a
keyset cursor works is that when a cursor is
opened, rather than fetching the entire results
set, the database will fetch all of the key values
for the rows in the results set and store those key values
in a temporary table. So for our purposes
let's consider this to be primary key values. So if I open a
cursor on a table, and let's say that the results
set from that select statement would retrieve 1,000
rows, the database will fetch the primary
key values of the 1,000 rows of data that
comprise the results set and it will store those
primary key values in a temporary table. Then as I am scrolling
through the rows using the cursor moving
either forward or backward, every time I scroll
forward to the next row or scroll backward
to the previous row, the database will use
that primary key value to fetch the values
in the underlying table at that moment in time. Because of this, updates
that are made to a row after the cursor
was opened are going to be visible
through the cursor. So let's say that my
keyset cursor fetches a number of key values
when the cursor is opened. These are primary key values,
let's say 1, 2, 3, 4, and 5. I start moving
through the cursor. I examine row one. When I scroll to
that row the database fetches the values for primary
key ID one out of the database, returns them to me. I then scroll to row
2, and the database fetches the values for
row two from the database. Let's say at that
moment in time, some other user changes
the values for row one. Well, if I scroll backwards
using a keyset cursor, I will use the primary key
value, in this case one, to fetch the current
values of the data in the underlying table
and I will therefore be able to see the
changes to row one that were made by the other user. Therefore, updates that
are made to the row after the cursor was
opened are visible. It is also possible to
identify deleted rows using a keyset cursor, because
as I scroll forward or backward through the results
set using the cursor the database will use
the primary key values that it had stored in
the temporary table to attempt to fetch the
true values of the data from the underlying table. And if at that moment in
time a row no longer exists, then the database will
fetch a null result for me through the cursor
and I can then infer that the associated
row has been deleted. Although updates and
deletes are visible through a keyset
cursor, new rows, that is rows that are added
using an insert statement, are not visible
through the cursor. And the reason for this
is that the key values for all of the rows
in the results set are fetched and stored
in a temporary table at the point in time when
the cursor is opened. Therefore, any new
rows which might be added after
that point in time, their key values, their
primary key values, are not going to be included
in my temporary table. And as I scroll forward or
backward through the results set I will therefore be unable
to identify any new rows that have been added after
the point in time when the cursor was opened. Finally we have dynamic cursors. And a dynamic cursor is
essentially the opposite of a static cursor. Dynamic cursors use a
comparatively large quantity of system resources, because
they allow all changes that are made to the rows of a
result set to be visible while the cursor is scrolling. So as I scroll through
my dynamic cursor, forward or backward,
any changes that have been made to
the results set are going to be visible
to me through the cursor. That is, all changes
made through updates, all new rows added
through insert statements, or any rows deleted
through delete statements. I will be able to
identify those changes as I scroll forward or backward
using a dynamic cursor. Although this capability
certainly is convenient, I hope you can understand
that it consumes quite a large quantity
of system resources to enable this
level of visibility through the dynamic cursor.
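In SQL Server, the cursor behaviors we have just discussed are selected in the DECLARE CURSOR statement itself. Here is a brief sketch using a hypothetical CUSTOMER table; FORWARD_ONLY, SCROLL, STATIC, KEYSET, and DYNAMIC are the actual keywords.

    -- Forward-only (non-scrollable): rows can only be fetched front to back
    DECLARE FwdCursor CURSOR FORWARD_ONLY FOR
        SELECT CustomerName FROM CUSTOMER;

    -- Static: works from a snapshot copied into a temporary table when opened
    DECLARE StaticCursor CURSOR SCROLL STATIC FOR
        SELECT CustomerName FROM CUSTOMER;

    -- Keyset: saves only the key values; updates and deletes are visible, inserts are not
    DECLARE KeysetCursor CURSOR SCROLL KEYSET FOR
        SELECT CustomerName FROM CUSTOMER;

    -- Dynamic: all changes are visible while scrolling, at the highest resource cost
    DECLARE DynamicCursor CURSOR SCROLL DYNAMIC FOR
        SELECT CustomerName FROM CUSTOMER;

    -- A scrollable cursor can move in both directions once opened, for example:
    --   OPEN StaticCursor;
    --   FETCH LAST  FROM StaticCursor;
    --   FETCH PRIOR FROM StaticCursor;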
Toward the beginning of our lecture we talked about
three major functions that to a database
administrator must perform. The first was understanding
concurrency control, and we talked about
that, as well as the effects of concurrent access
on different types of cursors. And the second of
the major functions that a database
administrator must perform is to manage database security. In a multi-user
database environment, database security is
critically important. The objective of creating
a database security model is to ensure that only
authenticated users are allowed to perform only
authorized activities. So potential users, we want
them to authenticate themselves to the database. Those users can then attempt
to carry out activities, but only the activities for
which they have been authorized will be allowed to be
performed against the database. To enable database
security, modern databases implement a model, which is
based on processing rights. And database processing
rights define two separate things: first, who (by
which I mean which users) is permitted to
perform which actions; and second, when are those actions
allowed to be performed? After defining and
assigning processing rights to groups or individual
users, then the individuals who are performing activities
against the database will have full responsibility
for the implications of their actions. That is to say, if the database
administrator has granted you the power to, say, delete
customers from the database, and you choose to
exercise that power, then it's not the database
administrator's fault if you decide to delete
all of the customers. It's your fault. Typically in modern
databases, individuals authenticate themselves to
the database management system through the use of a
username and a password. And the database administrator
can establish password rules, timeout periods, and
so forth, just as if we were managing
user accounts on a different type of server. In the realm of
database permissions, one critical concept
to understand is that a user can be
known to the database, both as an individual and as
a member of one or more roles. And because of this, it is
possible for our database privileges to be assigned
to both individuals on a user by user
basis and to the roles to which different
individuals may belong. Another critical concept
to understand with respect to database permissions
is that permissions are associated with database
objects, such as tables. And permissions can
therefore be granted, denied, or revoked with respect
to a user's ability to interact with one of
those database objects. Because users can be
assigned privileges as individuals and as members of
one or more roles, then a given user will possess
the compilation of all of the permissions
that have been granted to him or herself, and to all of
the roles to which they belong. More formally, a user
receives the union of all of the permissions that
have been granted, denied, or revoked by their
individual permissions and their role permissions. As an example, you
may be assigned to a role which has permissions
to read, modify, or delete rows in the customer table. However, as an individual,
the database administrator may deny your rights to
delete rows within that table. In that case, your
role privileges would ordinarily
allow you to perform operations such as select,
insert, update, and delete. However, if the database
administrator has additionally denied you as an
individual rights to delete rows in that
table, then the database would not allow you
to delete any rows. So what happens
then if a conflict exists between individual
permissions and role permissions? Well the way that databases
handle this is first, a deny permission will
always take precedence. That is, if the database
administrator has denied you as an individual access to
one or more database objects or if your role denies you
access to one or more objects, then that deny will always
be honored by the database, even if your other individual
rights or role rights would otherwise grant you access
to the database object. The deny will always
take precedence. Aside from those
denies, role permissions are always given precedence
over individual permissions. And this is a critical point
for a database administrator to understand. In this diagram we're
using the set of symbols that we learned about in
entity relationship modeling to illustrate the
database security model. Beginning on the left and
working toward the right, we can see that a user can
have zero to many roles, while a role can have
zero to many users. Both users and roles are
assigned permissions, which are associated
with different objects in the database. So a user can have zero
to many permissions, a role can have zero
to many permissions, and an object might have zero
to many permissions associated with it. In SQL Server, database
roles can either be fixed or they can be flexible. Flexible Roles are
custom roles that are established and defined
by the database administrator to support the specific
needs of his or her organization. By contrast, fixed roles are
predefined roles to which database users can be assigned. Here we see a list
of the fixed roles that come with SQL Server. A user who was assigned to the
DB Owner role, for example, is allowed to
essentially do anything they want with the
given database, including dropping the database. A user who has been assigned
to the DB Data Writer role is allowed to insert rows,
delete rows, or update rows in all of the user
tables within a database. By contrast, if a user has been
assigned to the DB Data Reader role, they're allowed
to read all of the data from the user tables. SQL Server also comes with
two fixed deny database roles. So a user assigned to the
DB Deny Data Writer role is not allowed to add,
modify, or delete data within the database. Whereas a user who is assigned
to the DB Deny Data Reader role is not allowed to read any
data within the database. And there are, of course,
several other fixed roles which are listed here
that we won't talk about.
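Assigning a user to one of these fixed roles is a single statement in SQL Server. A brief sketch, assuming a hypothetical database user named Alice; db_datareader, db_datawriter, and db_denydatawriter are the actual fixed role names behind the labels used above.

    -- Let Alice read from all user tables in the database
    ALTER ROLE db_datareader ADD MEMBER Alice;

    -- Let Alice insert, update, and delete rows in all user tables
    ALTER ROLE db_datawriter ADD MEMBER Alice;

    -- Explicitly prevent Alice from modifying any data, regardless of other roles
    ALTER ROLE db_denydatawriter ADD MEMBER Alice;

    -- Membership can be removed the same way
    ALTER ROLE db_datawriter DROP MEMBER Alice;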
Just a few final points to finish up our overview of database security. Data are absolutely critical
to modern organizations, and as such, the
DBMS should always be run behind the firewall. This may seem obvious,
but nevertheless, I hope the point is well taken. It's important for the
database administrator to ensure that the
latest operating system and DBMS service packs
and patches are always applied. This will help to ensure the
security of the database. Philosophically speaking,
the functionality of the DBMS should be intentionally
constrained only to those functions
and features that are needed by the organization. Finally, the database
administrator should ensure the physical
protection of the database servers by placing those
servers in a secure location, and using physical
barriers like locked doors. And the user accounts
and passwords which allow access
into the DBMS should be managed very carefully
following established organizational policy. The third of our three major
database administration functions was
backup and recovery. A backup and
recovery strategy is necessary to hedge against
the possibility of database failures. Databases can fail for
many different reasons, including failures of
hardware, bugs in software, human error, or
malicious actions, like efforts by hackers,
denial of service, et cetera. Because it is not possible
to completely avoid all of these issues, it
is absolutely essential that a database administrator
establish recovery procedures for the database. And the foundation the basis
of these recovery procedures is making regular
backups of the database. So a database administrator
should back up the database regularly, but not so often
as to negatively impact the accessibility
of the database. With backup procedures in
place, then the simplest method of recovering a
database that has failed is to go through a process
called reprocessing. And in reprocessing,
the strategy is to restore the database using
the most recent backup copy. And then have all of
the database activities which were executed since
the last backup copy be performed again by
human beings who originally performed those actions. So if our organization,
for example, has a database failure and
we restore a backup copy that was made three days ago, and we need to recover
all of the sales that were added into the database since then. Using the reprocessing strategy
we would ask our sales force to go back through their
records for the past three days and add each sale that
they made into the database again manually. This approach, of course,
is costly in terms of the effort involved
by the human beings who must reenter all the data. And it is also a risky, insofar
as human error is likely. This strategy is best used only
in very, very small database implementations, such as
personal databases or very, very small corporate databases. For a database that contains
any reasonable amount of data or has more than just a very
few users a better strategy is required. The strategy that is used
by most enterprise level relational database
systems is called rollback and rollforward,
and again this is based on the notion of
having a database backup. The second mechanism that is
required to enable rollback and rollforward is
something called a log file. And a log file, as
the name suggests, is simply a file which records
a log of all of the activities carried out against
the database. With a backup copy of the
database and the log file it is possible for a
database administrator to recover the database
using either rollback or rollforward as necessary.
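In SQL Server terms, the backup copy and the log file correspond to database backups and transaction log backups, and the rollback and rollforward ideas underpin point-in-time recovery. The following is only a rough T-SQL sketch with hypothetical database and file names; BACKUP DATABASE, BACKUP LOG, RESTORE DATABASE, and RESTORE LOG are the actual statements.

    -- Taken regularly: a full backup of the database, plus periodic log backups
    BACKUP DATABASE SalesDB TO DISK = 'D:\Backups\SalesDB_full.bak';
    BACKUP LOG SalesDB TO DISK = 'D:\Backups\SalesDB_log.trn';

    -- After a failure: restore the full backup, leaving the database able
    -- to accept log restores (WITH NORECOVERY)
    RESTORE DATABASE SalesDB
        FROM DISK = 'D:\Backups\SalesDB_full.bak'
        WITH NORECOVERY;

    -- Roll forward through the logged transactions, optionally stopping
    -- just before the failure or the unwanted activity occurred
    RESTORE LOG SalesDB
        FROM DISK = 'D:\Backups\SalesDB_log.trn'
        WITH STOPAT = '2024-01-15 14:30:00', RECOVERY;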
The rollback strategy is used to undo transactions. So for example, perhaps a hacker
was able to access our database and add a large number
of malicious transactions to the database,
and we would want to undo those transactions. We can undo those transactions
using the rollback strategy. And the process is
quite straightforward. Essentially, rollback requires
that the log file contains copy of the values of
any changed records before they were
changed, as well as the order in which changes
were made to the database. So these types of records
are called before images. They save the values
in a given cell, for example, before the
change is made to the cell. To undo a transaction
then, all we need to do is work backwards
through the log file and apply the before images of
all of the changes in the log file one by one. If by contrast, we want
to redo a transaction, then the log file
needs to contain a record of the values
of all of the changes after the change was made. And these records are
called after images. To redo a transaction
then, we simply move forward through the log
file one record at a time and apply the after images of
the changes to the database. Here we see an example of
a transaction log file. Each of the three
transactions in this log file is color coded. So one of the transactions is
in green, the other in red, the third in a light yellow. My apologies if you happen
to be red green colorblind. Note first that
records in the log file are recorded in
chronological order. That is, they are recorded
in the order in which they were executed by the database. The next critical thing to note
about the transaction log file is that it contains pointers
to the next and previous rows involved in the transaction. So taking row two
as an example, we can see that the
previous row is row one and the next row
in the transaction is row four within the log file. So these pointers
tell us where to look for the next part
of the transaction or where to look for
the previous part of the transaction. Now let's just work through
a few of these transactions so that we have a solid
understanding of how these transaction
log files work. Beginning with
transaction OT1 you can see the first record
of the transaction is the transaction starting. The next thing that we do is
we modify customer number 100. And you can see we record both
a before image and after image. That is, the previous value
prior to the modification for customer 100
and the new value after the modification
for customer 100. The next step in
this transaction is modifying salesperson AA. And we, again, record
the previous value for this record and the
new value for this record. Next part of the transaction
is to insert a new record into the order table and in this
case there is no before image, because it is a new record so
we only record the new value. We then finish the transaction
by issuing a commit. And thus ends our
first transaction. Let's look at another example, transaction CT1. So we see the
transaction begins. We then skip down to row
nine for the next step of the transaction, where
we modified salesperson BB. We record the previous
value and the new value, and then we move
to row 10 where we commit the changes made by our
transaction to the database. Now let's examine this
roll back procedure in a little more detail. Remember that the log file saves
its activities in the order in which they were carried out. So they are saved in
sequence within the log file. We can therefore
undo any changes that were made to the
database by using the before images in the log file
and moving through the log file in reverse order. Let's see an example of how
this works using our log file. Consider transaction CT1. In this transaction we
modified salesperson BB and we recorded the
before image, which was the previous
value, and the after image, which was the new value. If we want to undo this
transaction all we need to do is take the old value,
that is the before image, and apply it to the database. So if the old value was, say
five and the new value was 10, by setting the
value equal to 5 it is as if this transaction
never occurred. That is, it is as if
we never modified salesperson BB to begin with. Extending this concept,
continuing to move backward through the log
file, you can see how we can undo one
transaction after the other after the other, until
ultimately our database will be rolled back to a
specific point in the past.
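As a sketch of this backward pass, here is how rollback might look in Python, reusing the simplified log record and database stand-in from the earlier sketch. A real DBMS performs this internally, so treat it purely as an illustration of the idea.

```python
def rollback(database: dict, log: list) -> None:
    """Undo unwanted changes by applying before images in reverse log order."""
    for record in reversed(log):
        if record.before_image is None:
            # The change was an insert, so undoing it removes the new row.
            database[record.table].pop(record.row_id, None)
        else:
            database[record.table][record.row_id] = record.before_image

# For the salesperson BB example above, this is the step that puts the
# old value of 5 back in place of the new value of 10.
```

Graphically this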
rollback process can be understood using
this sort of a diagram. We begin with a database that
has undesirable changes that have been made to it. We apply our before
images to that database in an undo operation,
and the result is the database without
the undesirable changes. Next, let's consider
roll forward. Remember, the activities
in the log files are recorded in sequence. That is, in the
order in which they were applied to the database. This means that if we begin
with the most recent backup copy of our database and we start
at the top of the log file and move downward, we can
essentially redo or replay all of the transactions that
occurred since the last backup. By the time we reach
the end of this process, our database will be restored
to the point at which it was when the failure occurred. This process is
called rollforward. And again, it relies
upon the backed up version of the database
and the log file. Let's see an example
using our sample log file. So let's imagine that this
transaction log file contains all of the
transactions that were applied to the database
since the last backup. Our database has failed, we
have restored the database using the last backup,
and now we want to recover all of
the changes that were made in the time that has
passed since the last backup. So what we do is, we take
this transaction file and we simply replay
it beginning at the top and working our way
toward the bottom. As we go through,
we are simply restoring all of the changes
that were previously made. By the time we arrive at
the bottom of the file our database will be
fully recovered up until
the point when it failed.
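And here is the matching rollforward sketch in Python, again using the simplified structures from the earlier sketches: start from the restored backup copy, then replay the after images from oldest to newest.

```python
import copy

def rollforward(backup: dict, log: list) -> dict:
    """Rebuild the database from the last backup by replaying after images in order."""
    database = copy.deepcopy(backup)        # start from the restored backup copy
    for record in log:                      # oldest change first
        if record.after_image is None:
            database[record.table].pop(record.row_id, None)   # the row was deleted
        else:
            database[record.table][record.row_id] = record.after_image
    return database
```

Graphically this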
rollforward process can be understood using
a diagram such as the one shown here. We begin with a backup copy
of the database, that is, the database without
the desired changes. We take the after
images from our log file, apply those
to the backup copy of the database
in a redo process, and the result is that the
database has been restored and all of the desired
changes have been reapplied. In addition to these
three principal tasks of handling concurrent access,
handling database security, and establishing a framework
for backup and recovery, a database administrator
has several additional responsibilities as well. First, the DBA needs
to ensure that there is a mechanism in place
through which user reported errors
and other problems can be gathered and stored. This information can be
used to tune the database, modify the database, et cetera. Additionally, the DBA
also needs to create and manage a process for
controlling the database configuration. Organizations are living
entities and we cannot expect our initial database
configuration to continue to be satisfactory for the
organization as time passes. As the needs of the
organization change, it's very likely
that we will also need to alter the
configuration of the database. And for this reason, we
need to establish a process for controlling the
database configuration. Finally, the DBA is also
responsible for documentation. So the DBA needs to
continually document the structure of the database,
the concurrency control strategy that has been employed,
the security model, the backup and recovery strategy, and
the various applications and websites that rely
upon the database. When taken together all
of this documentation
helps to provide a
holistic understanding of the role of the database in
the broader database processing environment within
the organization. There are just a
few remaining topics that I would like to
touch on in this lecture. The first of these
is distributed database processing. Although conceptually
it's convenient to think of a database as a
single entity located in a single physical position,
perhaps on a single server, it is now very common
for real world databases to be partitioned,
replicated, or both. And a given conceptual
database might be split across many
different servers, perhaps in different
geographic locations. We may also have replications
of the database, which are continuously mirroring
all of the changes that are occurring, as part of a
backup and recovery strategy. And so we need to start thinking
of these databases in broader terms. Let's look at a few examples
as to how databases can be partitioned or replicated. In this first example,
we see a version of a database which aligns with
our simple conceptual model. That is, the database
is nonpartitioned and is nonreplicated. All of the tables in this
case, labeled W, X, Y and Z, exist on a single
database server. In our next example we
see a partitioned database. And in this case,
the database has been subdivided such that it
runs on two separate database servers. In this case, tables W and X are
operating on server number one, whereas tables Y and Z are
operating on server number two. And the two servers
are interconnected via a communications
channel of some sort. In our third example, we
see a replicated database. In this case, we have
two database servers and each server is maintaining
a full and complete copy of the entire database. So they're interconnected
with a communications line, and we keep track of tables
W, X, Y, and Z on both servers such that we have a full real
time backup strategy in place. And in our final example,
we see the most complicated of these four designs, and that
is a partitioned and replicated database, where we have
two different database servers interconnected with
the communication line. On the first server
we store tables W, X, and Y. To whereas on the
second server we store tables Y and Z. So in this case,
we are replicating
table Y between servers one and two. And we are then
storing tables W and X additionally on
server one and table Z additionally on server two. So there are many different
ways in which databases can be subdivided according
to our partitioning
and replication needs.
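Just to make that last design a little more tangible, here is a purely conceptual Python sketch of which server holds a copy of each table in the partitioned and replicated example. The server names are invented, and in practice the distributed DBMS handles this routing for us.

```python
# Which (hypothetical) servers hold a copy of each table in the fourth design.
table_locations = {
    "W": ["server1"],
    "X": ["server1"],
    "Y": ["server1", "server2"],   # table Y is replicated on both servers
    "Z": ["server2"],
}

def servers_for(table: str) -> list:
    """Servers that must be consulted for reads or updated for writes."""
    return table_locations[table]

# A read of table Y can go to either server; a change to table Y must be
# propagated to both copies to keep the replicas consistent.
```

Our final topic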
for today's lecture will be the object oriented
paradigm and its role in the database world. Over the past several decades
object oriented programming has come to be the most
common paradigm for software development. And object oriented
programming is the basis of most of the world's most
popular and widely used computer programming languages. Just like entities
in a database, objects in the object
oriented programming world have identifiers and properties. Objects, however, can
also have methods. And this is what
differentiates them from entity classes,
such as those that we've studied in the
entity relationship model. A method can be thought
of as an algorithm which allows the object to
perform certain tasks. Let's say, for example, that
we have an employee object, and one of its properties is
the employee's date of birth. Well we might then
have a method which allows us to calculate the
employee's age in years. So we could call
that age method, and it would return
a value which would reflect how
old the employee is in years based upon the
current time and the
employee's date of birth.
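If we were to sketch that in Python, it might look like the following; the class and attribute names are my own illustration rather than anything from a particular schema.

```python
from datetime import date

class Employee:
    def __init__(self, employee_id: int, name: str, date_of_birth: date):
        self.employee_id = employee_id        # identifier
        self.name = name                      # property
        self.date_of_birth = date_of_birth    # property

    def age(self) -> int:
        """Method: the employee's age in whole years as of today."""
        today = date.today()
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years

# Employee(100, "Pat", date(1990, 4, 15)).age() returns the age in years.
```

Since most of the world's widely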
used programming languages are based on the
object oriented model, software developers
commonly have the need to save objects on a
storage device such that they can be
retrieved for future use. And this is called
object persistence. Now, object persistence
can be accomplished using several different approaches. One approach, for example, would
be to serialize the objects and store them as a binary
file on the file system.
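Here is a minimal sketch of that serialization approach using Python's standard pickle module, together with the illustrative Employee class from a moment ago; the file name is arbitrary.

```python
import pickle
from datetime import date

emp = Employee(100, "Pat", date(1990, 4, 15))

# Persist the object by serializing it to a binary file...
with open("employee_100.bin", "wb") as f:
    pickle.dump(emp, f)

# ...and later bring the object back by deserializing it.
with open("employee_100.bin", "rb") as f:
    restored = pickle.load(f)

print(restored.age())
```

But another common technique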
is to use database technology for storing objects. For this purpose, normal
relational databases can be used, but
doing so requires substantial additional
programming.
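To give a feel for that extra programming, here is a hedged sketch using Python's built-in sqlite3 module: the developer has to shred the object into columns by hand when saving, and reassemble it by hand when loading. The table layout and column names are assumptions of mine.

```python
import sqlite3

conn = sqlite3.connect("company.db")
conn.execute("CREATE TABLE IF NOT EXISTS employee "
             "(id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)")

def save_employee(emp) -> None:
    # Map the object's attributes to column values by hand.
    conn.execute("INSERT OR REPLACE INTO employee (id, name, date_of_birth) "
                 "VALUES (?, ?, ?)",
                 (emp.employee_id, emp.name, emp.date_of_birth.isoformat()))
    conn.commit()

def load_employee(emp_id: int) -> dict:
    # Read the row back; turning it into a full object again is also up to us.
    row = conn.execute("SELECT id, name, date_of_birth FROM employee "
                       "WHERE id = ?", (emp_id,)).fetchone()
    return {"id": row[0], "name": row[1], "date_of_birth": row[2]}
```

To address this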
problem, several vendors have over the past
few decades developed a type of database
management system known as an object oriented
database management system. And the purpose of these object
oriented database management systems is to
provide the ability to easily store and
retrieve objects for use by software programs
that are built using the object oriented paradigm. Unfortunately these
object oriented databases have had very
little commercial success, largely due to the fact
that transferring existing data from relational and other
organizational legacy databases would just be too cumbersome,
too time consuming. That is, most
managers have decided
that the costs of transitioning
to an object oriented database are simply not
worth the benefits. The good news is
that the current SQL standard includes several
object oriented features. The two most
prominent being user defined types and inheritance. So if you are familiar with
object oriented programming, you will immediately
see the connection between these features
of the SQL standard and the object
oriented paradigm. Because these features
are a part of the current
SQL standard, any database
that fully supports
the current standard
can be
considered an object
relational database. That is, they not
only incorporate all of the features
and capabilities of the relational model,
but they additionally incorporate certain
object oriented features which enhance their
viability and value for use in conjunction with
object oriented software programs. Most major database vendors
currently support at least some of the object oriented
features of the SQL standard. These include Oracle, DB2,
and SQL Server, among others. Well, my friends, thus ends
our further exploration of database administration. I hope you learned something
interesting in this lecture. And until next time,
have a great day.