Database Lesson #6 of 8 - Database Administration

Captions
Good day, everyone. This is Dr. Soper here. And today we will be exploring our sixth topic in our series of database lectures, with today's lecture focusing on database administration. Although we have many different objectives for today's lecture, there are a few primary areas on which I will be focusing the majority of my comments. These are concurrent access, which is a situation in which more than one user wants to access a given database at the same time, database security, and the model that has been put in place for backup and recovery of these relational databases. We will of course be addressing several additional topics along the way, but the primary focus of our lecture today will be those three areas. In addition, toward the end of the lecture we will perform a quick examination of the administrative and managerial tasks that a database administrator must complete. And we'll also be examining distributed database processing, which is a situation in which our database might be replicated or partitioned such that it can exist on more than one machine. And we will conclude our lecture today with a brief overview of object oriented databases and object relational databases. To begin, consider this conceptual representation of the database processing environment. This diagram is intended to show the complex environment in which modern database technologies are used. We can see here that the database management system exists at the center of this environment, and it is exclusively the database management system that has access to the data themselves. However, many different types of entities can make requests of the database management system. Aside from internal database triggers and stored procedures, we have application programs that can utilize our DBMS; web requests, perhaps in the form of active server pages or Java server pages, that similarly can utilize our DBMS; and any number of additional sources of requests that the DBMS may need to process, such as ad hoc queries, forms, reports, et cetera. So as we can see here, the possibility exists for a single database to have to serve many different types of requests simultaneously. Because of this complexity we need to view a database processing environment as a dynamic environment. And in order to ensure that the database can meet the needs and demands of all of the different parties and software applications that may rely upon the database, a database administrator must configure and manage the database such that it provides for concurrency control, security, and backup and recovery in the case that something goes wrong. To begin, let's consider concurrency control. So broadly speaking, the idea of concurrency is that more than one user, where in this case a user might be a person, a human being, or a software application, such as a standalone application, a web service, or a data driven website, may be trying to access the same data in the database at the same time. And this need to provide for concurrent access leads to a problem known as interdependency, wherein modifications to the database that are made by one user may impact other users who are simultaneously trying to use the same information. So the objective of concurrency control mechanisms in database administration is to ensure that the actions of the various users, who may be trying to access the same data at the same time, are not going to adversely impact one another. 
Another way of saying that is, we want to put a mechanism in place that will ensure that the quality, reliability, and accuracy of the data remain intact, despite the fact that many users might be trying to work with or modify those data at the same time. The primary challenge with concurrency control is balancing this need to maintain the quality of data in the database with the need to provide accessibility to the data, so that many users can efficiently use the database at the same time. Thus, we have a spectrum. On one end of the spectrum we can design our concurrency control framework such that when one user accesses or touches a particular set of data within the database, we block all other users from being able to use those data until the first user has finished whichever tasks he or she was trying to perform. The other end of this concurrency control spectrum would be that we don't impose any constraints and a user is allowed to read and work with data regardless of whether any other users are simultaneously using or working with those data. In a high volume, high user database processing environment we must often try to find some balance between the two extremes of this concurrency control spectrum. Next I would like to introduce the concept of a database transaction. Simply put, a database transaction is a series of operations that need to be performed against the database. One way to think about this is that a transaction is a series of SQL commands or a series of instructions. Despite the fact that a transaction typically involves more than one operation, it is critical in the database processing environment that all of the operations or all the instructions involved in the transaction be treated as a single atomic unit, that is, a logical unit of work. And using this approach every step in the transaction must be completed successfully, or the entire transaction will fail. That is, every step must succeed in order for the entire transaction to succeed. If any of the steps fails, then the whole transaction will fail, and we will need to undo, or roll back, any changes that were made to the database during the transaction prior to the point when the transaction failed. Here we see an example of a transaction which is going to involve three separate tables in a database: a Customer table, a Salesperson table, and an Orders table. On the left we see the state or status of these three tables before the transaction begins, while on the right we see the state or status of these three tables after the transaction has completed. And in this example, we're illustrating how a transaction can introduce anomalies into the database if one of the steps in the transaction fails. So let's move through this transaction one step at a time. Prior to running the transaction we see that in our Customer table we have one row for customer number 123. The first step in our transaction is to add a new record into the Customer table. And here we're adding a new record for order number 8,000, which is an order for 250 basketballs valued at $6,500. So that's the first step in our transaction: to record that new information in the Customer table. The next step in our transaction is to update the Salesperson table to reflect the fact that our salesperson Jones has made this sale of 250 basketballs valued at $6,500. So prior to the transaction running, the record for salesperson Jones in our Salesperson table showed that he or she had accumulated $3,200 worth of sales during this period. 
And after the transaction we have updated our salesperson's record, such that he or she now has $9,700 worth of sales. So the update reflects the additional sale of $6,500 worth of basketballs. Finally, the third step in our transaction is to insert the new order into the Orders table. However, at this point in the transaction an error occurs. And our error in this case is that the Orders table is full. So perhaps we've run out of storage space and the database is therefore not able to add any additional information to the Orders table. Now, if we stop at this point, we have introduced several anomalies into our database. Namely, we have a new record in the Customer table which shows that a customer wanted to purchase 250 basketballs. And we've updated our Salesperson table to reflect the fact that our salesperson Jones has made that sale. However, the order information itself does not exist in the Orders table. If we were to stop at this point without undoing the changes that were made during the earlier stages in the transaction, then these anomalies that have been introduced into the database would almost certainly cause problems for this organization. By contrast, when all of the steps in a transaction are treated as atomic, as a single, logical unit of work, any changes that have been made to the database tables will be undone if an error occurs during the execution of the transaction. So in this case, we can follow the execution of the transaction. And we see again, the first step is to change the Customer data to reflect the fact that 250 basketballs were sold. We would then change the Salesperson data. We would then try to insert the Order data. However, in this case, an error exists, so we must roll back the transaction in order to restore the database tables to the state or status that they had prior to the transaction. And following the rollback it is as if the transaction never occurred. Next, let's see an example of how a database server might handle a concurrent processing situation. So imagine that we have two Users, A and B, both of whom are trying to access information simultaneously in our Item table. So User A makes a request to read information related to item 100, while at the same time User B makes a request to read information related to item 200. User A then makes a request to change a value associated with item 100, while User B makes a request at the same point in time to change a value associated with item 200. The resulting change for A is then written to the Item table, while at the same time User B is requesting that his or her changes be written to the Item table. Despite the fact that many of these requests are arriving simultaneously, the database server must process them in a sequential fashion. So as an example of how the database server might actually process these simultaneous requests, we can see that the DBMS has arbitrarily chosen to handle the requests from User A first. So although the requests from User A and User B to read items 100 and 200, respectively, arrive at the same time, the database first will read the value associated with item 100 for User A, and then will read the value associated with item 200 for User B. The proposed change for item 100 for User A is then handled, after which that change is written to the database for User A. After User A's modifications to the Item table have been completed, then the DBMS will handle the requests from User B to make changes to the Item table for item 200. 
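To make the idea of a transaction as a single logical unit of work a bit more concrete, here is a minimal T-SQL sketch of how the basketball transaction above might be wrapped so that all three steps succeed or none of them do. The table and column names are assumptions made purely for illustration.

    -- Minimal sketch: either all three changes commit, or none of them do.
    -- Table and column names below are assumed for illustration only.
    BEGIN TRY
        BEGIN TRANSACTION;

        -- Step 1: record the new order information for customer 123
        UPDATE CUSTOMER
        SET    LastOrderAmount = 6500
        WHERE  CustomerNumber = 123;

        -- Step 2: credit salesperson Jones with the $6,500 sale
        UPDATE SALESPERSON
        SET    TotalSales = TotalSales + 6500
        WHERE  SalespersonName = 'Jones';

        -- Step 3: insert the order itself; if this fails (for example,
        -- because the Orders table is full), control jumps to CATCH
        INSERT INTO ORDERS (OrderNumber, CustomerNumber, Quantity, Amount)
        VALUES (8000, 123, 250, 6500);

        COMMIT TRANSACTION;       -- every step succeeded
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION; -- undo steps 1 and 2 as if the
                                  -- transaction never occurred
    END CATCH;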
Without an intelligent concurrent processing strategy in place, the possibility exists for us to encounter something called a lost update problem. And the lost update problem occurs when two or more users are simultaneously attempting to access the same piece of information in the database, and one or more of those users' changes are overwritten by the changes made to the database by another user. Let's see an example of how this can work. In this example imagine, once again, that we have User A and User B, and both A and B are attempting to use information in the Item table. In this case, both User A and User B are interested in item number 100. Let's assume for this example that the current count value, say the quantity on hand for item number 100, is 10 units. So we have 10 units of item 100 in our inventory. Now, both User A and User B simultaneously request from the database to read how many units of item 100 are in inventory. So again, we are assuming that the answer to that question is 10. User A after receiving that information wants to instruct the database to reduce the number of item 100 that is available by five units, and then write the result back to the database. Simultaneously User B wants to reduce the number of item 100 that are available in inventory by three units, and then write the result back to the database. Now we as intelligent human beings can look at this problem and we can say, OK, we begin with 10 units. User A, perhaps, is a salesperson, and is attempting to sell five units of item 100, while User B, another salesperson, is attempting to sell three units of item 100. So in total eight units are being sold. And the result after these two operations should be that two units of item 100 remain in inventory. So we began with 10 units, User A sells five units, User B sells three units. The result should be two units. Unfortunately, because these two users' requests are arriving simultaneously at the database, and in this example, we do not have an intelligent strategy in place for handling this type of concurrent access, we'll see that we are going to encounter some problems. Let's see how the database server might process these two requests. So first it might read the quantity of item 100 available for User A. And it would return a value of 10 to User A. It would then perform the same operation for User B, returning a value of 10 for User B. User A, then, would attempt to set the number of units of item 100 in inventory to five. That is, our salesperson has sold five units, so we are subtracting 5 from 10. 10 minus 5 is 5. Therefore User A is attempting to set the new value of the quantity of item 100 available to five units. That value is then written to the database. So by the time that we reach step four, inside the database the number of units available is now five. The database then proceeds to handle the requests from User B. Recall that the database has told User B that there are 10 units in inventory. User B then attempts to reduce the number of units in inventory by three. The result of which would be 7. 10 minus 3 is 7. And then that value is written to the database. So by the time we have completed step six, the number of units of item 100 that our database shows we have in inventory is seven, when in fact it should be 2. Another way of saying that is, because we did not have an intelligent strategy in place for handling this concurrent access, the changes that were made to the database by User A have been overwritten by the changes made by User B. 
This is an example of a lost update problem. Speaking more broadly, we have three concurrency issues which might emerge during the execution of a transaction in a database processing environment that supports multiple simultaneous users. The first among these is called a dirty read. In a dirty read a transaction will read a record that has been modified by another user, but whose changes have not yet been committed to the database. And by committed to the database here, I mean that the database has been informed that a change should be made, but that change has not yet been made permanent. It has been proposed to be made. The database just has not yet made that change. So, when a transaction reads a record whose value has been modified by another user, but has not yet been made permanent within the database, that is called a dirty read. Our second concurrency issue is called an inconsistent read. And in an inconsistent read, the transaction is going to re-read a data set that it has previously examined and will discover that the values of the data have changed since the last time that it examined those data. So imagine a transaction where one of the early steps is to examine the values in the data set, and then it performs some additional operations on some other data. When it comes back and re-examines those original values, they've changed. Meaning that in the interim, some other user or some other process has altered the data upon which this transaction is relying. This is called an inconsistent read. And finally, we have a concurrency issue known as a phantom read. In a phantom read, when the transaction re-reads the data set, it may discover that a new record has been added since the last time that the transaction examined the data set. So again, we can imagine a transaction where somewhere in the early steps, the transaction reads a data set and then performs some operations against some different data values in the database. And when it comes back, it finds that a new record or a new row has been added to the original data set that it read, which was not there during the initial read. This is called a phantom read. So in order to address these three different types of possible concurrency control issues, the general strategy that is employed by databases that support concurrent access is known as resource locking. And the primary thing to understand about this strategy is that when resource locking is in place, data that are being used by one user are temporarily locked, such that other transactions which want to use those data are not allowed to use the data until the transaction which has locked the data has finished all of its tasks. Database locks come in two different varieties, implicit and explicit locks. Implicit locks are issued automatically by the database management system, whereas explicit locks are issued by users who are requesting exclusive access to certain data within the database. So a user initiating a transaction may lock certain database objects such that all of the steps in the transaction can be completed without having to worry about any ill effects being introduced into the database while the transaction is under way by other users who may be trying to use the same data simultaneously. Any other users who are attempting to use those data when they are locked will simply be put into a wait state. That is, they're added to a queue and the database will then service their request after the resources have been unlocked. 
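As a rough illustration of how locking prevents the lost update we just walked through, here is a minimal T-SQL sketch in which each user's read and write are wrapped in a transaction that locks the row for item 100. The ITEM table and its column names are assumptions for illustration only.

    -- Sketch of one user's transaction; a second user running the same
    -- code at the same time is made to wait until the lock is released.
    -- The ITEM table and its columns are assumed for illustration.
    BEGIN TRANSACTION;

    DECLARE @qty INT;

    -- Read the current count and take an update lock on the row, so no
    -- one else can read-for-update or change it until we commit.
    SELECT @qty = QuantityOnHand
    FROM   ITEM WITH (UPDLOCK, HOLDLOCK)
    WHERE  ItemNumber = 100;

    -- User A sells five units (User B's copy of this code would subtract 3).
    UPDATE ITEM
    SET    QuantityOnHand = @qty - 5
    WHERE  ItemNumber = 100;

    COMMIT TRANSACTION;  -- lock released; the waiting user now reads 5,
                         -- and the final count correctly ends up at 2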
Note that explicit locks can be issued against several different types of database objects at different levels of granularity. A transaction, for example, might request a table lock, in which an entire table of data is locked. At a finer level of granularity, the database might issue a row lock or a column lock, wherein a certain set of rows within a table are locked, or a certain set of columns within a table are locked while the transaction is unfolding. And at the finest level of granularity, we may issue a cell lock. That is, we may lock a single cell within the table temporarily while our transaction is using or manipulating its value. Let's see an example of a concurrent processing situation when an explicit locking strategy is put in place. So once again we have Users A and B. And we recall our previous example where both users are trying to access item 100 at the same time. We currently have 10 units of item 100 in inventory. User A wishes to reduce the count of item 100 by five units. User B wishes to reduce the count of item 100 by three units. So the result should be, after both of these transactions have been processed, that we have two units remaining in inventory. Let's see what happens when an explicit lock is put in place. So the database server begins processing these two requests. It first locks item 100 for User A, then reads the value of the inventory count for item 100. That is, it's going to return a value of 10 units back to User A. At this point, the database server may receive the request from User B to lock item 100. However, since item 100 is already locked by User A, the database cannot service that request at this point in time, so it places B in a queue. It says, OK User B, I will get you eventually, but right now you have to wait. The database then continues processing User A's transaction. So it sets the item count to five for User A, because recall that User A wanted to reduce the original item count by five units. 10 minus 5 is 5. And then it writes the results back to the table, after which the lock is released. So by the time we have completed step six, the inventory count for item 100 in the database is now five units. And now that User A has released their lock on the data, the database can then honor User B's request to lock the data. It then reads the value for B. So at this point it will return a value of 5 units back to User B. User B wishes to debit that value by three units. 5 minus 3 is 2 units. The new inventory count is set to 2. That value is stored to the database and the database releases the lock that had been established for User B, such that other users now have access to item 100 should they need it. So by imposing this locking strategy, we no longer have the lost update problem that we saw in our previous example. Using the locking strategy, User B had to wait temporarily, but the result is that our inventory count is correct. An important concept to understand at this point in our discussion is the notion of a serializable transaction. And this is a philosophical orientation which says that if two or more transactions are processed concurrently, then after all of the transactions have been completed the results in the database should be logically consistent with the results that would be in the database if the transactions had been processed one after another in some arbitrary order. Let's revisit our previous example to see if we can understand this concept. 
When we went through this example the first time, User A's requests were serviced by the database before User B's requests were serviced by the database. Let's now reverse that process, so that we handle User B's requests first, after which User A's requests will be handled by the database. Again, recall that we are beginning with 10 units of item 100 in inventory. If we handle User B's requests first, we lock item 100. We read the number of units of item 100 in inventory for User B. That value is 10. We then reduce the count by three. That is, we set the value to 7. 10 minus 3 is 7. We then write the result to the database, after which the lock is released. Next, a lock is issued for User A. The number of units in inventory is read for User A. The database returns a value of 7 for that result. User A then makes a request to reduce the number of units in inventory by 5. 7 minus 5 is 2. So the resulting value of 2 units is then written back to the database. Thus the final number of units in inventory is 2, regardless of whether we process User A's transactions first, as we did during our first trip through this example, or whether we process User B's transactions first. In either case the result, using our explicit locking strategy, is an inventory count of two, and that is logically consistent with what we would expect. So a locking strategy seems very logical and can have great benefits, but using a locking strategy can also introduce certain new problems into the database processing environment. And one of those problems is known as deadlock. Deadlock occurs when two transactions have each locked resources in the database that the other transaction needs in order to complete its tasks. In this situation, without some mechanism for avoiding deadlock, the two transactions involved will have to wait on each other indefinitely. Let's see an example that will illustrate this deadlock phenomenon. So here we have a conceptual example for User A and User B. We're not going to worry about the technical details of the database. We'll just say that we're working with simple objects, paper and pencils. So imagine that User A initiates the following sequence of events: lock paper, then take some paper or modify the value of paper, and then try to lock the pencils. Simultaneously User B locks the pencils, tries to take or modify the value of the pencils, and then asks the database to lock paper. So let's see what happens at the database server when these requests arrive. The database first locks the paper resource for User A. It then locks the pencils resource for User B. The database next processes User A's requests to modify the value of paper and then write those results back to the database. It then handles B's requests to modify the value of pencils and then write those results back to the database. And next, this is where the deadlock occurs. A has requested to lock pencils, but pencils have already been locked for User B. At the same time, B has requested to lock paper, but paper has already been locked for User A. Thus the database would put User A in a wait state for the pencils to be released, while User B would be put in a wait state for the paper to be released. And these two transactions would then wait on each other indefinitely and neither would be able to complete its tasks because the resources that it needs to complete its tasks are locked by another user. Database management systems must therefore incorporate a mechanism for detecting or avoiding this deadlock situation. 
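To make the paper-and-pencils scenario a little more concrete, here is a rough T-SQL sketch of the same deadlock, written as two sessions that would be run side by side. The SUPPLY table and its columns are assumptions for illustration only.

    -- Session A (User A)
    BEGIN TRANSACTION;
    UPDATE SUPPLY SET Quantity = Quantity - 1 WHERE Item = 'Paper';    -- locks Paper
    -- ... meanwhile, Session B locks Pencils ...
    UPDATE SUPPLY SET Quantity = Quantity - 1 WHERE Item = 'Pencils';  -- must wait for B
    COMMIT TRANSACTION;

    -- Session B (User B)
    BEGIN TRANSACTION;
    UPDATE SUPPLY SET Quantity = Quantity - 1 WHERE Item = 'Pencils';  -- locks Pencils
    UPDATE SUPPLY SET Quantity = Quantity - 1 WHERE Item = 'Paper';    -- must wait for A
    COMMIT TRANSACTION;

    -- Neither session can proceed. SQL Server detects the cycle, chooses
    -- one session as the deadlock victim, rolls it back (error 1205), and
    -- allows the other session to complete.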
Next let's talk about two major strategies toward locking, one of which we can call optimistic locking, and the second of which we can call pessimistic locking. Conceptually speaking, in optimistic locking, we are optimistic that the changes that we make to the database during our transaction will succeed and no concurrency control problems will arise during the execution of the transaction. Conversely, under pessimistic locking, we assume that a concurrency control problem might arise during the execution of our transaction, and so we proactively seek to avoid that. Optimistic locking strategies are useful in database environments in which the majority of the transactions involve just reading data. That is, we're not making a lot of updates to the data. In this case, we can be optimistic that no concurrency control problems will occur, because it is unlikely that more than one user will be trying to update the same data at the same time. We simply don't have a lot of updates in our database, so the likelihood of having two simultaneous updates against the same data is small. And in that case, we might implement an optimistic locking strategy. By contrast, if we are in an environment where data are frequently updated and the chances that two users might want to simultaneously update the same data are reasonably high, then in that case, we would want to implement a pessimistic locking strategy. So let's consider the differences between these two locking strategies at a slightly lower level. To begin we'll examine optimistic locking. And the basic process which is employed under optimistic locking is, we first read the data that are requested by the transaction. We process the transaction, that is, we initiate any sort of modifications or updates to the data that we need. That update is recorded. We then check the database to see if any sort of conflicts occurred, or any sort of errors occurred. Is there anything unexpected? If no conflict has occurred, we then commit the transaction and make our requested changes permanent. They then become a permanent part of the database. Otherwise, if some sort of conflict did occur, then we initiate a rollback. We undo all of the steps of our transaction and we try to repeat the transaction again, until it ultimately succeeds without a conflict. Under the pessimistic locking strategy, we begin first by locking the resources and reading the data that we need. We then process the transaction by making any updates or modifications to the database that are necessary, commit those changes, and release the locks. Note that in a pessimistic locking strategy we do not need to check to see if a conflict occurred. We simply commit the changes directly. The reason that we do not need to check to see if a conflict occurred is because all the required resources for the transaction were locked prior to any updates or modifications being made to the data that are being used by the transaction. Let's see some intuitive examples of these optimistic and pessimistic locking strategies. Here we see an example of an optimistic locking strategy that has been written in pseudo-code. To begin, we request some values from the product table. Specifically here we are interested in the quantity of pencils that are available in our database. So we request the number of pencils available and we record that value into a variable which here we have named old quantity. So let's say that our number of pencils is 10. So we request 10 pencils. 
We store the value 10 into our old quantity variable. We then set the new quantity value equal to product quantity minus 5. That is, we set the value equal to what should be 10 minus 5, or five units. Thus, the value of new quantity should be equal to 5. We then continue processing our transaction, looking for any sort of errors, et cetera. And at this point, we then issue a lock. We then attempt to carry out our update. So we instruct the database to update the value of pencils. We want to set the value of pencils to the value stored in our new quantity variable, where the quantity of pencils in the database is equal to the old quantity. That is, it's equal to 10. If the quantity of pencils has changed since we first retrieved the quantity at the beginning of our transaction, that is, if the quantity of pencils is no longer equal to 10 when we attempt to issue our update, then we know that some other user or some other process has modified the quantity of pencils during the time while we were busy processing other parts of the transaction. So when we check and we determine that our update was not successful, we would need to roll back this transaction. That is, we need to undo any changes that were made along the way and try to repeat the transaction until ultimately it is successful. Next, let's look at a similar example, except this time we will use a pessimistic locking strategy. Remember in a pessimistic locking strategy we assume from the outset that one of our concurrency control problems might occur, and so to defend against that we simply begin by first locking all of the resources that we will need to complete our transaction. So we lock all of the resources that we need. In this case, we extract the product quantity for pencils, which is 10 units. We set the new quantity equal to the previous quantity minus 5. So the new quantity is set equal to 5. We process any additional steps in our transaction. We issue our update and we then unlock the resources that were previously locked. In this case there's no need to check whether the update was successful. We know it was successful, because we had locked all of the required resources that we needed prior to executing the transaction. At this point you may be asking yourself, well, why would anyone ever want to use an optimistic locking strategy if a pessimistic locking strategy can always guarantee that the update is successful? Well, the answer to that question is, when a pessimistic locking strategy is used, it is much more likely that other users will be put into a wait state. That is, they'll have to wait in line for access to the resources in the database that are required to complete their transactions. And that can slow the overall performance of the database. So ultimately, the decision about whether to use pessimistic locking or optimistic locking must come down to the nature of the queries that are being run against this particular database. If the database is one in which many, many updates are done, then a pessimistic locking strategy would probably be preferred. If the database is one in which most of the requests that are made of the database are reads, that is, we're just reading data, selecting data out of the database, then an optimistic locking strategy would probably be preferable. Next, let's talk about this notion of a consistent transaction. And a consistent transaction is commonly referred to in the industry as an ACID transaction. That is, it is one which is atomic, consistent, isolated, and durable. 
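Here is a minimal T-SQL sketch of the optimistic pattern just described, with the pessimistic alternative noted in a comment. The PRODUCT table and its column names are assumptions for illustration only.

    -- Optimistic locking sketch (PRODUCT table and columns are assumed).
    DECLARE @OldQuantity INT, @NewQuantity INT;

    -- Read without locking; we are optimistic that no one else will
    -- change the row while we work.
    SELECT @OldQuantity = Quantity
    FROM   PRODUCT
    WHERE  ProductName = 'Pencils';          -- say this returns 10

    SET @NewQuantity = @OldQuantity - 5;     -- 10 - 5 = 5

    -- ... other processing for the transaction happens here ...

    -- The WHERE clause re-checks the old value; if another user changed
    -- the quantity in the meantime, no row is updated.
    UPDATE PRODUCT
    SET    Quantity = @NewQuantity
    WHERE  ProductName = 'Pencils'
      AND  Quantity    = @OldQuantity;

    IF @@ROWCOUNT = 0
        PRINT 'Conflict detected: roll back and repeat the transaction.';

    -- The pessimistic version would instead lock the row up front, for
    -- example with SELECT ... FROM PRODUCT WITH (UPDLOCK, HOLDLOCK)
    -- inside a transaction, so the final UPDATE cannot conflict and no
    -- conflict check is needed.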
Atomic transactions are treated by the database as a single logical unit of work. As we said earlier, an atomic transaction is one in which all of the steps of the transaction must be completed successfully in order for the transaction to be considered successful. That is, if any of the steps in the transaction fails, then the entire transaction fails. Treating a transaction as a single logical unit of work helps us to avoid data anomalies, as we saw earlier in this lecture. Consistent transactions are those in which other transactions are disallowed while the transaction is under way. And this sort of consistency can be achieved through the artful use of database locks. Next, let's consider the extent to which a transaction is isolated. We now know that in multi-user databases, it's possible for different transactions to be using or working with the same data at the same time. And because of this, we know that the data in the database can be continuously changing. We therefore need to make some decisions as database administrators regarding the extent to which our transactions will be isolated in our concurrent database processing environment. Now, the ANSI SQL standard defines four different isolation levels. And these four isolation levels specify which of the three concurrency control problems are allowed to occur in the database. If you remember, our three concurrency control problems were dirty reads, inconsistent reads, and phantom reads. So the least restrictive isolation level is Read Uncommitted. And here, many concurrency control problems are considered acceptable. That is, transactions are not particularly isolated. As we move from Read Uncommitted toward Serializable, our transactions become increasingly isolated as they are processed by the database. A Serializable isolation level then prevents concurrency control problems, but comes at the expense of accessibility. That is, if we choose to implement a Serializable transaction isolation level, then although our transactions will be very well protected, and the likelihood of concurrency control problems occurring is essentially zero, the cost of that is accessibility. It is much, much more likely that users will have to wait for longer periods of time in order to have their requests serviced by the database. Finally, we have Durability. And a Durable transaction is simply one in which all of the changes which are committed during the course of the transaction are considered permanent within the database. Another topic that can be very important for database administrators to understand is that of cursors. Recall that when we use a SQL SELECT statement, the result of that statement is a set of rows that is returned by the database management system. And we call that set of rows a result set. A cursor is simply a pointer into that set of rows. One of the reasons that we need to use cursors in application programs or data driven websites is that these result sets can often be very large. So it doesn't make sense, or it might even be beyond the capability of these applications or websites, to work with the entire result set as a single unit. Using a cursor allows an application or a website to process a result set one row at a time. And these cursors are commonly defined using SQL SELECT statements, such as the example here which relies upon the DECLARE CURSOR syntax. Conceptually speaking, we have two major types of cursors. There are forward only cursors, also called non-scrollable cursors, and scrollable cursors. 
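As a small illustration of the DECLARE CURSOR syntax just mentioned, here is a T-SQL sketch that declares a forward only cursor and steps through its result set one row at a time. The CUSTOMER table and CustomerName column are assumptions for illustration only.

    -- Forward only cursor sketch (table and column names are assumed).
    DECLARE @Name VARCHAR(100);

    DECLARE CustomerCursor CURSOR FORWARD_ONLY FOR
        SELECT CustomerName
        FROM   CUSTOMER;

    OPEN CustomerCursor;
    FETCH NEXT FROM CustomerCursor INTO @Name;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        PRINT @Name;   -- process one row of the result set at a time
        FETCH NEXT FROM CustomerCursor INTO @Name;
    END;

    CLOSE CustomerCursor;
    DEALLOCATE CustomerCursor;

    -- A scrollable cursor would instead be declared with SCROLL (and, in
    -- SQL Server, as STATIC, KEYSET, or DYNAMIC), which also allows
    -- FETCH PRIOR, FETCH FIRST, and FETCH LAST.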
As the name suggests, in a forward only cursor, our rows are fetched one at a time from the start to the end of the cursor. We iterate through the cursor one row at a time, and we can only move in one direction. That is, we cannot stop halfway through and look at the previous row in the cursor, for example. We start at the first row in the result set. We then move to the second row, to the third row, to the fourth row, and so forth until we reach the end of the result set. And at no time are we allowed to move backward through the result set. Because we cannot scroll the cursor backwards, any changes that are made by other users to the database to rows in our result set that we have already examined will not be visible through the cursor. So if I am reading through a result set using a forward only cursor, and I read row one and then row two and then row three, and at that point in time some other user makes a change to row one, I would not be able to see that change through the cursor, because I cannot look backwards. If, however, I am scrolling through the result set, and through some sort of insert, update, or delete statement, values in the result set are changed for rows that I have not yet examined, then I will see those changes at the point in time when I examine the affected rows. So for example, if I am scrolling through a result set using a forward only cursor, and I examine row one and then row two and then row three, and at that point in time someone makes a change which affects the values of row five, by the time I reach row five I will be able to see that changed result, because I am fetching rows from the database serially, one at a time as I move forward through the cursor. By contrast to these forward only cursors, we also have scrollable cursors. And a scrollable cursor allows us to move both forward and backward through a result set as needed for the application program or the data driven website to examine the rows. And there are three major types of scrollable cursors, each of which has different properties. These are static cursors, keyset cursors, and dynamic cursors. So static, keyset, and dynamic cursors are the three different types of scrollable cursors. Let's see how they're different from one another. Of the three different types of scrollable cursors, static cursors are probably the simplest to understand. The way that a static cursor works is that when the cursor is requested, a snapshot of the results is taken and the complete result set is fetched from the database and stored in a temporary table. So when the cursor is opened all of the results are fetched from the database at that point in time and stored in a temporary table. The user of the cursor can then scroll forward or backward through the result set. And the rows which are visible to the cursor are those which were copied into this temporary table at the moment in time when the cursor was opened. Because of this, because of the use of this temporary table, any changes that are made to the underlying data, that is, in the real tables from which the temporary table was constructed, are not visible through the cursor. 
So if I open a cursor using a select statement, and that statement fetches a number of rows into this temporary table for use by my static cursor, and at that point in time other users are changing the underlying values from which my temporary table was constructed, I will not be able to see any of those changes through the cursor, because I am not actually looking at the data in the real table anymore. My cursor, or my static cursor, is simply using the data from the temporary table to move forward or backward through the rows as necessary. So the disadvantage of these static cursors is that changes which are made to the underlying data are not visible. But the major advantage is that they're simple and they consume comparatively few resources on the database server. The next scrollable cursor type is called a keyset cursor. And the way that a keyset cursor works is that when a cursor is opened, rather than fetching the entire result set, the database will fetch all of the key values for the rows in the result set and store those key values in a temporary table. So for our purposes let's consider this to be primary key values. So if I open a cursor on a table, and let's say that the result set from that select statement would retrieve 1,000 rows, the database will fetch the primary key values of the 1,000 rows of data that comprise the result set and it will store those primary key values in a temporary table. Then as I am scrolling through the rows using the cursor, moving either forward or backward, every time I scroll forward to the next row or scroll backward to the previous row, the database will use that primary key value to fetch the values in the underlying table at that moment in time. Because of this, updates that are made to a row after the cursor was opened are going to be visible through the cursor. So let's say that my keyset cursor fetches a number of key values when the cursor is opened. These are primary key values, let's say 1, 2, 3, 4, and 5. I start moving through the cursor. I examine row one. When I scroll to that row the database fetches the values for primary key ID one out of the database, returns them to me. I then scroll to row 2, and the database fetches the values for row two from the database. Let's say at that moment in time, some other user changes the values for row one. Well, if I scroll backwards using a keyset cursor, I will use the primary key value, in this case one, to fetch the current values of the data in the underlying table and I will therefore be able to see the changes to row one that were made by the other user. Therefore, updates that are made to the row after the cursor was opened are visible. It is also possible to identify deleted rows using a keyset cursor, because as I scroll forward or backward through the result set using the cursor, the database will use the primary key values that it had stored in the temporary table to attempt to fetch the true values of the data from the underlying table. And if at that moment in time a row no longer exists, then the database will fetch a null result for me through the cursor and I can then infer that the associated row has been deleted. Although updates and deletes are visible through a keyset cursor, new rows, that is, rows that are added using an insert statement, are not visible through the cursor. And the reason for this is that the key values for all of the rows in the result set are fetched and stored in a temporary table at the point in time when the cursor is opened. Therefore, any new rows which might be added after that point in time, their key values, their primary key values, are not going to be included in my temporary table. 
And as I scroll forward or backward through the result set, I will therefore be unable to identify any new rows that have been added after the point in time when the cursor was opened. Finally we have dynamic cursors. And a dynamic cursor is essentially the opposite of a static cursor. Dynamic cursors use a comparatively large quantity of system resources, because they allow all changes that are made to the rows of a result set to be visible while the cursor is scrolling. So as I scroll through my dynamic cursor, forward or backward, any changes that have been made to the result set are going to be visible to me through the cursor. That is, all changes made through updates, all new rows added through insert statements, or any rows deleted through delete statements. I will be able to identify those changes as I scroll forward or backward using a dynamic cursor. Although this capability certainly is convenient, I hope you can understand that it consumes quite a large quantity of system resources to enable this level of visibility through the dynamic cursor. Toward the beginning of our lecture we talked about three major functions that a database administrator must perform. The first was understanding concurrency control, and we talked about that, as well as the effects of concurrent access on different types of cursors. And the second of the major functions that a database administrator must perform is to manage database security. In a multi-user database environment, database security is critically important. The objective of creating a database security model is to ensure that only authenticated users are allowed to perform only authorized activities. So potential users, we want them to authenticate themselves to the database. Those users can then attempt to carry out activities, but only the activities for which they have been authorized will be allowed to be performed against the database. To enable database security, modern databases implement a model which is based on processing rights. And database processing rights define two separate things: first, who, by which I mean which users, is permitted to perform which actions; and second, when are those actions allowed to be performed. After defining and assigning processing rights to groups or individual users, then the individuals who are performing activities against the database will have full responsibility for the implications of their actions. That is to say, if the database administrator has granted you the power to, say, delete customers from the database, and you choose to exercise that power, then it's not the database administrator's fault if you decide to delete all of the customers. It's your fault. Typically in modern databases, individuals authenticate themselves to the database management system through the use of a username and a password. And the database administrator can establish password rules, timeout periods, and so forth, just as if we were managing user accounts on a different type of server. In the realm of database permissions, one critical concept to understand is that a user can be known to the database both as an individual and as a member of one or more roles. And because of this, it is possible for our database privileges to be assigned to both individuals on a user by user basis and to the roles to which different individuals may belong. Another critical concept to understand with respect to database permissions is that permissions are associated with database objects, such as tables. 
And permissions can therefore be granted, denied, or revoked with respect to a user's ability to interact with one of those database objects. Because users can be assigned privileges as individuals and as members of one or more roles, a given user will possess the compilation of all of the permissions that have been granted to him or herself, and to all of the roles to which they belong. More formally, a user receives the union of all of the permissions that have been granted, denied, or revoked by their individual permissions and their role permissions. As an example, you may be assigned to a role which has permissions to read, modify, or delete rows in the customer table. However, as an individual, the database administrator may deny your rights to delete rows within that table. In that case, your role privileges would ordinarily allow you to perform operations such as select, insert, update, and delete. However, if the database administrator has additionally denied you as an individual rights to delete rows in that table, then the database would not allow you to delete any rows. So what happens then if a conflict exists between individual permissions and role permissions? Well, the way that databases handle this is, first, a deny permission will always take precedence. That is, if the database administrator has denied you as an individual access to one or more database objects or if your role denies you access to one or more objects, then that deny will always be honored by the database, even if your other individual rights or role rights would otherwise grant you access to the database object. The deny will always take precedence. Aside from those denies, role permissions are always given precedence over individual permissions. And this is a critical point for a database administrator to understand. In this diagram we're using the set of symbols that we learned about in entity relationship modeling to illustrate the database security model. Beginning on the left and working toward the right, we can see that a user can have zero to many roles, while a role can have zero to many users. Both users and roles are assigned permissions, which are associated with different objects in the database. So a user can have zero to many permissions, a role can have zero to many permissions, and an object might have zero to many permissions associated with it. In SQL Server, database roles can either be fixed or they can be flexible. Flexible roles are custom roles that are established and defined by the database administrator to support the specific needs of his or her organization. By contrast, fixed roles are predefined roles to which database users can be assigned. Here we see a list of the fixed roles that come with SQL Server. A user who is assigned to the DB Owner role, for example, is allowed to essentially do anything they want with the given database, including dropping the database. A user who has been assigned to the DB Data Writer role is allowed to insert rows, delete rows, or update rows in all of the user tables within a database. By contrast, if a user has been assigned to the DB Data Reader role, they're allowed to read all of the data from the user tables. SQL Server also comes with two fixed deny database roles. So a user assigned to the DB Deny Data Writer role is not allowed to add, modify, or delete data within the database. Whereas a user who is assigned to the DB Deny Data Reader role is not allowed to read any data within the database. 
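As a small illustration of how grants, denies, and role membership work together, here is a T-SQL sketch along the lines of the customer table example above. The role name, user name, and table are assumptions for illustration, and Jones is assumed to already exist as a database user.

    -- Create a flexible (custom) role and grant it permissions on CUSTOMER.
    CREATE ROLE SalesStaff;
    GRANT SELECT, INSERT, UPDATE, DELETE ON dbo.CUSTOMER TO SalesStaff;

    -- Jones receives those permissions by being made a member of the role.
    ALTER ROLE SalesStaff ADD MEMBER Jones;

    -- But Jones as an individual is denied DELETE; a DENY always takes
    -- precedence over anything granted through a role.
    DENY DELETE ON dbo.CUSTOMER TO Jones;

    -- Assigning one of SQL Server's fixed database roles works the same
    -- way, for example: ALTER ROLE db_datareader ADD MEMBER Jones;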
And there are, of course, several other fixed roles which are listed here that we won't talk about. Just a few final points to finish up our overview of database security. Data are absolutely critical to modern organizations, and as such, the DBMS should always be run behind the firewall. This may seem obvious, but nevertheless, I hope the point is well taken. It's important for the database administrator to ensure that the latest operating system and DBMS service packs and patches are always applied. This will help to ensure the security of the database. Philosophically speaking, the functionality of the DBMS should be intentionally constrained only to those functions and features that are needed by the organization. Finally, the database administrator should ensure the physical protection of the database servers by placing those servers in a secure location, and using physical barriers like locked doors. And the user accounts and passwords which allow access into the DBMS should be managed very carefully following established organizational policy. The third of our three major database administration functions was backup and recovery. A backup and recovery strategy is necessary to hedge against the possibility of database failures. Databases can fail for many different reasons, including failures of hardware, bugs in software, human error, or malicious actions, like efforts by hackers, denial of service, et cetera. Because it is not possible to completely avoid all of these issues, it is absolutely essential that a database administrator establish recovery procedures for the database. And the foundation, the basis, of these recovery procedures is making regular backups of the database. So a database administrator should back up the database regularly, but not so often as to negatively impact the accessibility of the database. With backup procedures in place, the simplest method of recovering a database that has failed is to go through a process called reprocessing. And in reprocessing, the strategy is to restore the database using the most recent backup copy, and then have all of the database activities which were executed since the last backup copy be performed again by the human beings who originally performed those actions. So if our organization, for example, has a database failure and we restore a backup copy that was made three days ago, we need to recover all of the sales that were added into the database since then. Using the reprocessing strategy, we would ask our sales force to go back through their records for the past three days and add each sale that they made into the database again manually. This approach, of course, is costly in terms of the effort involved by the human beings who must reenter all the data. And it is also risky, insofar as human error is likely. This strategy is best used only in very, very small database implementations, such as personal databases or very, very small corporate databases. For a database that contains any reasonable amount of data or has more than just a very few users, a better strategy is required. The strategy that is used by most enterprise level relational database systems is called rollback and rollforward, and again this is based on the notion of having a database backup. The second mechanism that is required to enable rollback and rollforward is something called a log file. And a log file, as the name suggests, is simply a file which records a log of all of the activities carried out against the database. 
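As a rough illustration of the pieces this strategy relies on, here is a T-SQL sketch showing a full backup, a log backup, and a restore that rolls the database forward from the log. The database name, file paths, and the point-in-time value are assumptions for illustration only.

    -- Regular full backup of the database (names and paths are assumed).
    BACKUP DATABASE SalesDB
        TO DISK = 'D:\Backups\SalesDB_full.bak';

    -- Frequent backups of the transaction log:
    BACKUP LOG SalesDB
        TO DISK = 'D:\Backups\SalesDB_log.trn';

    -- After a failure: restore the last full backup, then roll forward by
    -- replaying the logged changes up to a chosen point in time.
    RESTORE DATABASE SalesDB
        FROM DISK = 'D:\Backups\SalesDB_full.bak'
        WITH NORECOVERY;

    RESTORE LOG SalesDB
        FROM DISK = 'D:\Backups\SalesDB_log.trn'
        WITH STOPAT = '2024-01-15 14:30:00', RECOVERY;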
With a backup copy of the database and the log file it is possible for a database administrator to recover the database using either rollback or rollforward as necessary. The rollback strategy is used to undo transactions. So for example, perhaps a hacker was able to access our database and add a large number of malicious transactions to the database, and we would want to undo those transactions. We can undo those transactions using the rollback strategy. And the process is quite straightforward. Essentially, rollback requires that the log file contain a copy of the values of any changed records before they were changed, as well as the order in which changes were made to the database. So these types of records are called before images. They save the values in a given cell, for example, before the change is made to the cell. To undo a transaction, then, all we need to do is work backwards through the log file and apply the before images of all of the changes in the log file one by one. If, by contrast, we want to redo a transaction, then the log file needs to contain a record of the values of all of the changes after the change was made. And these records are called after images. To redo a transaction, then, we simply move forward through the log file one record at a time and apply the after images of the changes to the database. Here we see an example of a transaction log file. Each of the three transactions in this log file is color coded. So one of the transactions is in green, the other in red, the third in a light yellow. My apologies if you happen to be red green colorblind. Note first that records in the log file are recorded in chronological order. That is, they are recorded in the order in which they were executed by the database. The next critical thing to note about the transaction log file is that it contains pointers to the next and previous rows involved in the transaction. So taking row two as an example, we can see that the previous row is row one and the next row in the transaction is row four within the log file. So these pointers tell us where to look for the next part of the transaction or where to look for the previous part of the transaction. Now let's just work through a few of these transactions so that we have a solid understanding of how these transaction log files work. Beginning with transaction OT1, you can see the first record of the transaction is the transaction starting. The next thing that we do is we modify customer number 100. And you can see we record both a before image and an after image. That is, the previous value prior to the modification for customer 100 and the new value after the modification for customer 100. The next step in this transaction is modifying salesperson AA. And we, again, record the previous value for this record and the new value for this record. The next part of the transaction is to insert a new record into the order table, and in this case there is no before image, because it is a new record, so we only record the new value. We then finish the transaction by issuing a commit. And thus ends our first transaction. Let's look at another example, perhaps transaction CT1. So we see the transaction begins. We then skip down to row nine for the next step of the transaction, where we modified salesperson BB. We record the previous value and the new value, and then we move to row 10 where we commit the changes made by our transaction to the database. Now let's examine this rollback procedure in a little more detail. 
Remember that the log file saves its activities in the order in which they were carried out. So they are saved in sequence within the log file. We can therefore undo any changes that were made to the database by using the before images in the log file and moving through the log file in reverse order. Let's see an example of how this works using our log file. Consider transaction CT1. In this transaction we modified salesperson BB and we recorded the before image, which was the previous value, and the after image, which was the new value. If we want to undo this transaction, all we need to do is take the old value, that is the before image, and apply it to the database. So if the old value was, say, five and the new value was 10, by setting the value equal to 5 it is as if this transaction never occurred. That is, it is as if we never modified salesperson BB to begin with. Extending this concept, continuing to move backward through the log file, you can see how we can undo one transaction after the other after the other, until ultimately our database will be rolled back to a specific point in the past. Graphically this rollback process can be understood using this sort of a diagram. We begin with a database that has undesirable changes that have been made to it. We apply our before images to that database in an undo operation, and the result is the database without the undesirable changes. Next, let's consider rollforward. Remember, the activities in the log files are recorded in sequence. That is, in the order in which they were applied to the database. This means that if we begin with the most recent backup copy of our database and we start at the top of the log file and move downward, we can essentially redo or replay all of the transactions that occurred since the last backup. By the time we reach the end of this process, our database will be restored to the point at which it was when the failure occurred. This process is called rollforward. And again, it relies upon the backed up version of the database and the log file. Let's see an example using our sample log file. So let's imagine that this transaction log file contains all of the transactions that were applied to the database since the last backup. Our database has failed, we have restored the database using the last backup, and now we want to recover all of the changes that were made in the time that has passed since the last backup. So what we do is, we take this transaction file and we simply replay it, beginning at the top and working our way toward the bottom. As we go through, we are simply restoring all of the changes that were previously made. By the time we arrive at the bottom of the file our database will be fully recovered up until the point when it failed. Graphically this rollforward process can be understood using a diagram such as that shown here. We begin with a backup copy of the database, that is, the database without the desired changes. We take the after images from our log file, apply those to the backup copy of the database in a redo process, and the result is that the database has been restored and all of the desired changes have been reapplied. In addition to these three principal tasks of handling concurrent access, handling database security, and establishing a framework for backup and recovery, a database administrator has several additional responsibilities as well. 
First, the DBA needs to ensure that there is a mechanism in place through which user-reported errors and other problems can be gathered and stored. This information can be used to tune the database, modify the database, et cetera. Additionally, the DBA also needs to create and manage a process for controlling the database configuration. Organizations are living entities, and we cannot expect our initial database configuration to continue to be satisfactory for the organization as time passes. As the needs of the organization change, it's very likely that we will also need to alter the configuration of the database. And for this reason, we need to establish a process for controlling the database configuration. Finally, the DBA is also responsible for documentation. So the DBA needs to continually document the structure of the database, the concurrency control strategy that has been employed, the security model, the backup and recovery strategy, and the various applications and websites that rely upon the database. When taken together, all of this documentation helps to provide a holistic understanding of the role of the database in the broader database processing environment within the organization. There are just a few remaining topics that I would like to touch on in this lecture. The first of these is distributed database processing. Although conceptually it's convenient to think of a database as a single entity located in a single physical position, perhaps on a single server, it is now very common for real-world databases to be partitioned, replicated, or both. And a given conceptual database might be split across many different servers, perhaps in different geographic locations. We may also have replications of the database, which are continuously cloning all of the actions that are occurring as part of a backup and recovery strategy. And so we need to start thinking of these databases in broader terms. Let's look at a few examples as to how databases can be partitioned or replicated. In this first example, we see a version of a database which aligns with our simple conceptual model. That is, the database is nonpartitioned and is nonreplicated. All of the tables in this case, labeled W, X, Y and Z, exist on a single database server. In our next example, we see a partitioned database. In this case, the database has been subdivided such that it runs on two separate database servers. Here, tables W and X are operating on server number one, whereas tables Y and Z are operating on server number two. And the two servers are interconnected via a communications channel of some sort. In our third example, we see a replicated database. In this case, we have two database servers and each server is maintaining a full and complete copy of the entire database. So they're interconnected with a communications line, and we keep track of tables W, X, Y, and Z on both servers such that we have a full real-time backup strategy in place. And in our final example, we see the most complicated of these four designs, and that is a partitioned and replicated database, where we have two different database servers interconnected with a communications line. On the first server we store tables W, X, and Y, whereas on the second server we store tables Y and Z. So in this case, we are replicating table Y between servers one and two. And we are then storing tables W and X additionally on server one and table Z additionally on server two.
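One way to make the partitioned-and-replicated design concrete is to imagine the routing information a DBMS or middleware layer would need: a map recording which servers hold a copy of which tables. The short Python sketch below mirrors the final example above; the server names are invented purely for illustration.

```python
# Which servers hold a copy of each table in the partitioned-and-replicated design.
# Server names are hypothetical; the placement mirrors the example described above.
placement = {
    "W": ["server1"],
    "X": ["server1"],
    "Y": ["server1", "server2"],   # table Y is replicated on both servers
    "Z": ["server2"],
}

def servers_for(table):
    """Return the servers to which a request against the given table could be routed."""
    return placement[table]

print(servers_for("Y"))   # ['server1', 'server2']
print(servers_for("Z"))   # ['server2']
```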
So there are many different ways in which databases can be subdivided according to our partitioning and replication needs. Our final topic for today's lecture will be the object oriented paradigm and its role in the database world. Over the past several decades, object oriented programming has come to be the most common paradigm for software development. And object oriented programming is the basis of most of the world's most popular and widely used computer programming languages. Just like entities in a database, objects in the object oriented programming world have identifiers and properties. Objects, however, can also have methods. And this is what differentiates them from entity classes, such as those that we've studied in the entity relationship model. A method can be thought of as an algorithm which allows the object to perform certain tasks. Let's say, for example, that we have an employee object, and one of its properties is the employee's date of birth. Well, we might then have a method which allows us to calculate the employee's age in years. So we could call that age method, and it would return a value which would reflect how old the employee is in years based upon the current time and the employee's date of birth. Since most of the world's widely used programming languages are based on the object oriented model, software developers commonly have the need to save objects on a storage device such that they can be retrieved for future use. And this is called object persistence. Now, object persistence can be accomplished using several different approaches. One approach, for example, would be to serialize the objects and store them as a binary file on the file system. But another common technique is to use database technology for storing objects. For this purpose, normal relational databases can be used, but doing so requires substantial additional programming. To address this problem, several vendors have over the past few decades developed a type of database management system known as an object oriented database management system. And the purpose of these object oriented database management systems is to provide the ability to easily store and retrieve objects for use by software programs that are built using the object oriented paradigm. Unfortunately, these object oriented databases have had very little commercial success, largely due to the fact that transferring existing data from relational and other organizational legacy databases would simply be too cumbersome and too time consuming. That is, most managers have decided that the costs of transitioning to an object oriented database are simply not worth the benefits. The good news is that the current SQL standard includes several object oriented features, the two most prominent being user defined types and inheritance. So if you are familiar with object oriented programming, you will immediately see the connection between these features of the SQL standard and the object oriented paradigm. Because these features are a part of the current SQL standard, any database that fully supports the current standard can therefore be considered an object relational database. That is, such databases not only incorporate all of the features and capabilities of the relational model, but they additionally incorporate certain object oriented features which enhance their viability and value for use in conjunction with object oriented software programs.
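To make the ideas of methods and object persistence concrete, here is a small Python sketch, again illustrative rather than definitive. The Employee class, its date_of_birth property, its age method, and the table and column names are all invented for the example; the point is simply that an object with a computed method can be stored in an ordinary relational table, but only by writing explicit mapping code between the object's properties and the table's columns, which is exactly the extra programming effort that object oriented and object relational databases aim to reduce.

```python
import sqlite3
from datetime import date

class Employee:
    """A simple object with properties and a method, as in the lecture's example."""
    def __init__(self, emp_id, name, date_of_birth):
        self.emp_id = emp_id
        self.name = name
        self.date_of_birth = date_of_birth

    def age(self):
        """Return the employee's age in years, based on the current date."""
        today = date.today()
        years = today.year - self.date_of_birth.year
        # Subtract one year if this year's birthday has not yet occurred.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years

# Persisting the object in a relational table requires mapping properties to columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dob TEXT)")

emp = Employee(1, "Pat Jones", date(1990, 5, 17))
conn.execute("INSERT INTO employee VALUES (?, ?, ?)",
             (emp.emp_id, emp.name, emp.date_of_birth.isoformat()))
conn.commit()
print(emp.age())   # the method computes a value that is not itself stored in the table
```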
Most major database vendors currently support at least some of the object oriented features of the SQL standard. These include Oracle, DB2, and SQL Server, among others. Well, my friends, thus ends our exploration of database administration. I hope you learned something interesting in this lecture. And until next time, have a great day.
Info
Channel: Dr. Daniel Soper
Views: 130,489
Keywords: Database (File Format Genre), database administration, database security, database backup, database recovery, database locking, optimistic locking, pessimistic lo, database transactions, Database Administrator (Job Title), databases
Id: dMkwFzRgxZY
Length: 97min 2sec (5822 seconds)
Published: Mon Jul 29 2013