Database Lesson #1 of 8 - Introduction to Databases

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome, everyone. This is Dr. Soper And today we will be going through our first lecture for our database class. I need to operate under the assumption that not everyone in the class has had experience with databases in the past. And so this first lecture is intended to bring everyone up to the same basic level of understanding with respect to several important database concepts. We have several different objectives, which we will seek to achieve in this lecture. These are listed on the screen right now. First, we want to try to understand why storing data in a list is not necessarily a very good idea. It can cause many different types of problems. Second, we want to see if we can gain some insight into why an organization might want to use a database. Third, we will see how the notion of related tables of data, which forms a core part of the relational database model, provides a basis for eliminating many of the problems which are associated with storing data in a list. We'll also explore the various components and elements that comprise a database or a database system. And we will learn about the purpose of something that we call a DBMS, a database management system. And along the way, we're going to explore some of the functions that a robust database application can provide to us. What, then, is the purpose of a database? Well, it's important to remember that a database does not have just a single purpose. Instead, there are several key advantages that databases provide. First, a database provides a repository for storing data. That's kind of implicit in the name. Database implies that we have a place to store data. However, what might not be so obvious is that databases provide an organizational structure for data. That is, we don't just have a place to store data, but the database also provides an organized structure into which those data can be placed. Finally, a database provides us with a mechanism for interacting with our data. Now, interacting with data can generally be described in four different operations. Here they're listed as querying, creating, modifying, and deleting data. But there's another more interesting acronym, which may help you to remember this. And that acronym is CRUD, C-R-U-D. This stands for create, read, update, and delete. These are the four basic operations that we can use when interacting with data. A key point to remember here is that, in business, there are many natural, hierarchical relationships among data. For example, a customer can place many orders. Another way of saying that is many different orders can be associated with the same customer. Or another example is a department can have many different employees, but a given employee might work in one, and only one, department. So these are hierarchical relationships among the data. And a relational database allows us to model and represent these relationships. Let's take a look at a list of data. What I mean by a list here is a simple, two-dimensional table of data. And in this table, we store information that is important to us for some reason. An example might be, say that we have many different projects in our company, and we want to keep track of who the project manager is for each project. Therefore, for each project, we may track the project manager's name, their ID number, and maybe their phone extension within our company. Now, ordinarily this wouldn't be such a big problem. But imagine if the same person, the same project manager, is simultaneously managing multiple projects. In that case, using a simple list, the project manager's information would necessarily appear on the list multiple times. Now, what is the problem with this? Well, there isn't really any major problem, aside from the fact that it's redundant. Another way of saying that is we are using more space than is necessary to record which project manager is associated with which projects. Another problem with storing data in a list is that the list of data may contain more than one concept or more than one theme. I want you to remember this idea of a business concept, because we will see it again and again throughout our course. A business theme or concept refers to a number of different attributes or properties that are all naturally related to one type of business entity. An example might be an employee. An employee is a business concept. Employees have different attributes that we may want to track for each employee in our organization. For example, we may want to track the employee's ID number, the employee's name, the employee's salary. Other examples of business concepts or business themes include things like departments, products, customers, orders, locations, and so forth. Returning to lists, a major problem is with these multiple themes. That is, for each row of the list, we might be recording information about more than one of these business concepts. As an example, consider our list of project managers. If we also wanted to include project information-- that is, perhaps, a project name, a project ID, any additional information about the project-- we might store that in the same row along with the project manager. Aside from redundancy, as I mentioned earlier, the problem here is not necessarily how we are storing this information, but what might happen to the information if it is stored in this way. When we store information in a list, we introduce the possibility of something called data anomalies. And specifically, there are three types of these anomalies. Here they're listed as a deletion problem, an update problem, and an insertion problem. Later in our course, we will refer to these as deletion anomalies, update anomalies, and insertion anomalies. So the structure of a list and its associated problems of redundancy and multiple themes can produce a situation in which these modification problems potentially could occur. So let's take a look at some of these modification issues. Here we have a list of data, which contains information on students. So we have a student's last name, first name, their email address. And then we see who that student's advisor is. So the last name of the student's advisor, the email of the student's advisor, and the department in which they work, along with some additional information. Let's take a look at how some of these modification anomalies might emerge. In the first case, imagine that we want to change a student's advisor. And for this example, we're going to change Advisor Baker to Advisor [? Tiang. ?] When we do this, not only are we going to have to update the last name of the advisor in the list, but to maintain the overall quality of the data, a concept that we call data integrity, we are going to have to also update the advisor's email address. Note, in this case, that we do not need to update the department or the administrator's last name, because those are the same for advisors [? Tiang ?] and Baker. However, if, for some reason, we wanted to change the student's advisor from, say, Baker to Valdez, well, now not only do we need to update the last name and the email address, but we also need to update the department and the admin last name attributes as well. Just to do something as simple as changing the student's advisor with this list approach requires that four separate pieces of information be updated. So this is what we might call an update anomaly or an update problem. Next, let's look at a deletion problem. Imagine that our student, listed here as Chip Marino, decides to drop out of the university. So we need to remove Chip from our list of students. Now, look what happens if we delete this row of data. When the row of data is deleted, not only do we lose the student's information, but we've also lost information about the advisor and the department in which that advisor works. That is, you will notice that Advisor [? Tran ?] is not currently assigned to advise any other students. So when we delete this row of data, we may entirely lose the knowledge that Advisor [? Tran ?] even exists. And that can be a problem. In this case, this is called a deletion problem or a deletion anomaly. Finally, let's look at an insertion anomaly. Let's say that at our university, we decide that we want to add a new department. So we create a biology department. Well, this is fine. We add information to our list. We now have a biology department and an administrator's last name. However, we now have all of this missing data in our table. There are empty cells. There's no information for a student. There's no information for an advisor. All we have here is information for the department. This means that data are missing, and we're not efficiently utilizing our available storage space. These are some of the problems with these lists. Now, remember, earlier I mentioned that in the business world there are complex relationships among different types of data. For example, a department may have many employees who work in that department. Or a product may be assembled from many different components. Or a customer might place many different orders with our company. So there are these natural complexities that arise in business. And relational databases, as we will see, not only solve the problems that are associated with lists, but also allow us to model these natural relationships among business data. Now, I've been using this term relational database. So it would probably be a good idea for us to consider what a relational database actually is. On a very basic level, relational databases store data in a table. Now, a table is simply a two-dimensional grid of data that contains columns, and it contains rows. The convention in the relational database world is that the columns represent different attributes of an entity. And each row in the table represents an instance of the entity. So for example, if I have an employee table with an employee ID, an employee name, and an employee telephone number, we would list those three attributes-- ID, name, and telephone number-- as columns in the table. And then each row in the table would represent an individual employee. Again, we said that there are these natural relationships among different business concepts out there in the business world. In a relational database, data for each of these concepts is stored in its own table. So I may have an employee table. I may have a department table. And then I can create a relationship between those tables, which will allow me to figure out which employees work in which departments. So a good way of thinking about this is instead of storing all of the information in one big list, we break the list apart into separate pieces according to which information is associated with which business theme or business concept. Therefore, all of the information associated with an employee might go into an employee table. And all of the information associated with a department might go into a department table. Although, inside a relational database, information about different business concepts or business themes is stored in separate tables, it may be necessary-- say, for reporting purposes-- for us to reassemble all of the data from these different tables back into a list. Now, we can accomplish this in the relational database world by performing something called a join operation. That is, we join the tables together. Now, a very, very important concept for you to understand is that in the relational database world, we link the records in different tables together using matched pairs of values. These matched pairs of values allow us to associate a row in one table with one or more rows in another table. For example, if we have a project table and we have a customer table, and we use an attribute called a customer ID to uniquely identify each customer, then what we can do is we can place a customer ID attribute in the project table. And we can then use the ID of the customer in each table to link related rows together. Now, of course, you may be listening to this, and you may think, why do we want to do all of this extra work? And it is true that relational databases are more complicated than storing data in a list. However, as we will see, relational databases have many, many advantages over a list. Among these, first, a relational database minimizes data redundancy. In the examples we saw earlier, we saw that there exists with lists the possibility to have all sorts of redundant data. Relational databases eliminate this. Relational databases also allow us to model the natural relationships that exist among business concepts in the real world. And there are several other advantages as well, including providing a solid basis from which to do things like generate reports or build user interface forms. I mentioned a few minutes ago that there are four basic ways of interacting with data-- create, read, update, and delete. In the relational database world, we have access to something called the Structured Query Language, often called SQL, or SEQUEL, if you like. And this is the primary tool that we can use for performing these four basic operations on our data. In a few weeks, we will begin learning the Structured Query Language. This is an extremely valuable skill for you to develop. If you are able to learn the Structured Query Language well, then you will be able to sit down and work with virtually any modern database with a very, very short learning curve. It doesn't matter if you need to work with a SQL Server database, or an Access database, or an Oracle database, or a DB2, or a MySQL database, or even a database for, say, a mobile device. Most databases, including all of those that I mentioned, support the Structured Query Language. So if you can learn the Structured Query Language, you will be very well positioned to work with almost any database. Even though we're still early in our course, I want to give you an example of the Structured Query Language now so that you can see that it's really not that difficult to understand. This is certainly not a full scale computer programming language. The Structured Query Language was designed from the ground up to be simple to use and to be simple to understand. So what we have here is an example where we have three tables. There is a customer table, and a course table, and an enrollment table. So let's say that these are art courses. We have customers who have signed up to take different art courses to see, perhaps, if they can learn to paint or they can learn to sculpt. Now, although we start with three tables, if we need to combine all of the information together in order to restore the original list structure of the data, we can do that by using something called a SQL SELECT statement. Here we see an example of such a statement. I know this looks complicated. But hopefully, you are able to read this even right now at the beginning of our course and get a good idea for what this statement is supposed to do. So let's take a look. We are selecting the customer's last name, the customer's first name, the customer's phone number, and the date of the art course, the amount paid out of the enrollment table, the course itself, and the total course fee from our three different tables-- customer, enrollment, and course. We are then linking the related records in each table together using these matched pairs of values that I mentioned earlier. In the first case, we are linking the customer table to the enrollment table using matched pairs of customer numbers. And in the second case, we're linking the course table to the enrollment table using matched pairs of course numbers. What this will do for us is restore the original list of data. So you can see our list. It contains the customer's last name and first name, customer's phone number, the course date, the amount that the customer has paid so far, the course they are taking, and the total course fee. Now, the SQL statement that we saw was a SELECT statement. And that is a statement which just pulls data from the database. There are many other things that we can do with SQL, like deleting data, inserting new data, updating data in the database, and so forth. And we will learn how to do all of those various tasks when we study the Structured Query Language in more depth in a few weeks. Now I would like to turn our attention to some conceptual information. Let's focus on a database system. It's important to remember that a database system is not just the database itself. Rather, it exists in a broader hierarchy, which includes users, application software programs which rely on the database, something called a database management system, a DBMS, and then, of course, the database itself. The broader picture of a database system includes these four components. Here we see the relationships between the four components of a database system. On the far left, we see users. Now, users are people, just like you or me, who might need to use the data in a database. But a user doesn't necessarily have to be a human being. It is possible, for example, for other software programs to be users of our database as well. Now, you will see that users interact with database applications. These are the programs that rely on the database. So if we are a human user, we might, for example, use a website, which interacts with a database. That website could be considered a database application. You will notice that the database application talks to the DBMS. The DBMS, the database management system, acts as a gatekeeper. All of the information flowing in or out of the database must travel through the DBMS. It is a critical mechanism for maintaining the quality of the data in the database. We do not allow our users or our database applications to directly access the data in the database. To work with the data, to access those data, database applications must go through the database management system. And then, finally, far over on the right side of our screen, we see the database itself. And this is where we actually store the raw data in their separate tables. So what do the users of a database system do? Well, they do many different things. You and I are users of database systems on almost a daily basis, even if we don't realize it. So we can look up our checking account balances online. We can produce reports. We can seek information on Wikipedia. All of these are examples of us as users interacting with database applications. A few interesting things to note about the database itself is that it is a self-describing collection of related records. What this means is that the database does not just contain the data, but it also contains a definition of its own structure. Databases also contain metadata. The simplest way to understand metadata is that metadata are data which describe other data. And in the context of the database world, metadata are data which describe the structure of the data in the database. So for example, I may say that an employee table contains an attribute called Employee ID, and that ID value is an integer. That knowledge that the Employee ID value is an integer is metadata. And of course, databases allow us to establish relationships between tables. So that means the tables within a database are related to each other according to the relationships that we establish. So it's important to remember that a database does not just contain the data that we have put into it. A database also contains metadata, which describe our data. It contains, or potentially can contain, what we might refer to as overhead data. These are tables of data that are used to improve performance or track the status of the database or how users are interacting with the database. And one of the major types of these overhead data are called indexes. Later in our course, we will examine database indexes. And of course, databases can also contain application metadata. So these are data values that are used by the applications which rely upon the database. An application, for example, might store user preferences in the database. Now, to reiterate, a DBMS, or a database management system, serves as a gatekeeper or an intermediary between the database and the applications which rely upon that database. The purpose of the DBMS is to control and manage the operations of the database. The DBMS is a special kind of software program, which is used to not only create databases, but also to process and administer those databases. So a DBMS is a type of application program. But it is not the database. The DBMS interacts with the database, but it itself is not the database. What, then, can a database management system do for us? Well, the answer is it can do many different things, as we see here on your screen. We can use the DBMS to create a new database. And then, within that database, we can create tables and establish relationships between tables. We can read, write, edit, and delete data using the DBMS. We can establish rules which regulate and control the data in the database. The DBMS provides an additional layer of security for us. And it also provides mechanisms through which we can backup our data and recover our data if anything ever goes wrong. So the DBMS is a very critical piece of software. Now, businesses often operate with many different types of business rules or constraints. And one of the great things about working with databases is that the DBMS can establish and enforce many different types of constraints. One of the most useful types of constraint that a DBMS can enforce is called a referential integrity constraint. The purpose of a referential integrity constraint is to ensure that values that we are entering in one table have matching values in another table. Now, that's a very abstract way of trying to convey a simple concept. As an example, imagine that we have a customer table and a project table. And we are trying to enter information about a new project in the project table. And we specify that the customer ID that is associated with this project is customer number 5. Referential integrity means that the database will actually check in the customer table to see if customer number 5 exists. If customer number 5 does not exist, then the database will not allow us to add that customer ID to our project table. And in this way, I hope you can see it helps to maintain the quality of the data in the database, what we call data integrity. Now, the fourth component of our database system were database applications. And database applications are simply those software programs that need to use or work with data in the database. So these might be data driven websites. They might be mobile apps. They might be desktop applications or productivity software. Just remember that these applications are not allowed to talk to or use the database directly. But rather, they must go through the DBMS to work with or use those data. Finally, for this lecture, I want to just talk about some of the different types of database systems that are available. First, we can consider something called a personal database system. These are very small databases. They are typically used by a single user at a time. They are not complicated. They are simple in design. And they're not really suitable for supporting medium or large businesses. They might be used by very small businesses just out of convenience. But as those companies grow, they will, at some point, certainly need to choose a more robust database solution. An example of a personal database system might be Microsoft Access. Convenient, easy to use, but not very powerful. One of the interesting characteristics of personal database systems is that they often contain the capability to build a database application, or at least interfaces to the database within the DBMS. So in Microsoft Access, for example, I can create data entry forms or generate reports for my Microsoft Access database. That is, it kind of combines the database application and the database management system into a single entity, conceptually speaking. On the other hand, we have a class of database systems that are called Enterprise-Level database systems. These are databases that are used by medium and large size companies to support their operations. And it is Enterprise-Level databases that we will learn to use and interact with in this class. Compared to personal database systems, Enterprise-Level databases are designed to support many different users at the same time. And this could be thousands or tens of thousands of users all at the same time. Enterprise-Level databases also serve data to more than one application. For example, if we are a retailer like Target or Walmart, we may have a database that provides data to our website and also provides data to the dashboards that are used by our managers. Enterprise-Level databases are commonly spread across multiple physical computing devices. So many different computers. So a single database might be running on many physical servers. Enterprise-Level databases can also be geographically distributed. So I might have part of my database in my corporate offices in Los Angeles, another part in Beijing, another part in Berlin in Germany. And Enterprise-Level database management systems support more than one database. So within the same database management system, we might, for example, have our operational database, which allows us to run our business and keep track of transactions in real time. And we might also create a data warehouse or data marts. And they can all be managed by the same Enterprise-Level database management system. These are large databases. As an example, consider an ERP system like SAP. A typical ERP implementation will have thousands of tables, all related to each other in some way. An Enterprise-Level database management system can handle this level of complexity with relative ease. So here we see a graphical example of these Enterprise-Level database systems. The DBMS, again, serves as an intermediary or a gatekeeper between the databases and all of the various database applications that want to rely on the data in those databases. And they can be mobile applications, applications that are written in Java, or C#, or even web applications. It might be ASP.NET or PHP apps. Just as examples of some commercial DBMS products, again, I mentioned that personal or desktop DBMS products might include Microsoft Access. Whereas, Enterprise-Level products include SQL Server, which is what we will be using in this class, Oracle, MySQL, and DB2. MySQL, if you're interested, is a completely free and open source database. Well, my friends, thus ends our first lecture for our database class. I hope you learned something. And come back soon, and we'll begin our exploration of our next topic. Until then, have a great day.
Info
Channel: Dr. Daniel Soper
Views: 957,050
Rating: 4.9028645 out of 5
Keywords: database, databases, DBMS
Id: 4Z9KEBexzcM
Channel Id: undefined
Length: 38min 42sec (2322 seconds)
Published: Thu May 30 2013
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.