MySQL in 25 Minutes | Primary and Foreign Keys | Database Terminology for Beginners

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
What is the first thing that comes to your mind when you hear the word “database”? For many people, this question is more challenging than it might seem at first. An answer like “a big file where much information is stored” is not satisfactory and would not please potential employers. You should remember there are two main types of databases – relational and non-relational. The former will be the focus of this course, while the latter regards more complex systems. Although understanding non-relational databases requires a serious mathematical and programming background, some of the logic applied in its coding is the same as SQL. Likewise, relational databases have a few advantages on their own. A small bit of theory will explain why they are still the preferred choice in many companies and institutions. Databases’ main goal is to organize huge amounts of data that can be quickly retrieved upon users’ request. Therefore, they must be compact, well-structured, and efficient in terms of speed of data extraction. Today, people need such extra efficiency because data occupies memory space and… the bigger its size, the more sluggish the database is and the slower the retrieval process becomes. If we have a database containing one multi-million-row table, with many columns, then every time a request has been received, the server must load all the records, with all fields, and it would take too much time for a task to be completed. Don’t forget every symbol is a container of information and requires bytes of storage space. Hence, loading that much data will not be an easy job for the computer. So, what allows us to contain so much data on the server, but lets us efficiently use only the portions we need for our analysis? The secret lies behind the use of mathematical logic originating from relational algebra. Please, don’t worry - we will not bother you with math. Imagine each table with data is represented by a transparent circle that contains all the data values of the table, categorized by columns or, as we will often call them, fields. Now, if our database consisted of only one table, a giant circle would represent the entire database, something like this huge table from our fictional example with the “Sales” database. And when we need a piece of information from the database, for example if we wish to see who has bought something on a certain date, we will have to lift this whole big circle and then search for what we need. This challenge seems vague and the process of data extraction will not be efficient. See what can happen if we split the circle into 3 smaller circles, just as we did with the “Sales” database. One circle will stand for the “Sales” table, the other for “Customers”, and the last one for “Items”. There are various theoretical combinations between 3 or more circles, but in our database, we have the following model. “Sales” and “Customers” have the same “customer ID” column, and “Sales” and “Items” have the same “item code” column. This way, we can see the circles overlap as they have common fields. So, if we’d like to extract the same information, the names of the customers who have purchased something on a given date, we will need only the “date of purchase” column from the “Sales” table and the “first” and “last name” from the “Customers” table. So, to satisfy this request, we will not need to lift the third circle from our database, “Items”. This way, we can save energy or, more technically, increase efficiency. Less data, represented as only two of the three circles, will be involved in this operation. The mathematical trick lies in relating the tables to one another. Relationships were formed namely through these common fields. Anyway, I am sure that now you understand why we use the term relational databases. Some professionals may refer to the tables, or the circles in our plot, as relations because, theoretically, they are the smallest units in the entire system that can carry integral logical meaning. Likewise, the three circles are all part of our “Sales” database. When we combine the database and its existing relations, we obtain the famous term relational database management system, frequently abbreviated as RDBMS. 😊 We hope this theoretical illustration makes things clearer. SQL is designed for managing relational database management systems and can do that by creating relations between the different tables in a database. To complete our theoretical preparation, we will need to learn more database terminology. This is a necessary step that will help us when we start coding in SQL later. Let’s go through the entire process of creating a database. Assume our database containing customer sales data has not been set up yet, ok? So, imagine you are the shop owner and you realize you have been selling goods quite well recently, and you have more than a million rows of data. What do you need, then? A database! But you know nothing about databases. Who do you call, an SQL specialist? No. You need a database designer. She will be in charge of deciding how to organize the data into tables and how to set up the relations between these tables. This step is crucial. If the database design is not perfect from the beginning, your system will be difficult to work with and wouldn’t facilitate your business needs; you will have to start over again. Considering the time and data (…and money!) involved in the process, you want to avoid going back to point 0. What do database designers actually do? They plot the entire database system on a canvas using a visualization tool. There are two main ways to do that. One is drawing an Entity-Relationship diagram, an ER diagram for brief. It looks like this… and, as its name suggests, the different figures represent different data entities and the specific relationships we have between entities. The connections between tables are indicated with lines. This way of representing databases is powerful and professional, but it is complicated. We will not focus in-depth on ER diagrams in this course, but you should know they exist and refer to the process of database design. Another form of representation of a database is the relational schema. This is an existing idea of how the database must be organized. It is useful when you are certain of the structure and organization of the database you would like to create. More precisely, a relational schema would look like this. It represents a table in the shape of a rectangle. The name of the table is at the top of the rectangle. The column names are listed below. All relational schemas in a database form the database schema. You can also see lines indicating how tables are related in the database. To this moment, it has been ideas, planning, abstract thinking, and design. At this stage, it would be correct to say SQL can be used to set up the database physically, as opposed to contriving it abstractly. Then, you can enjoy the advantages of data manipulation. It will allow you to use your dataset to extract business insights that aim to improve the performance and efficiency of the business you are working for. This process is interesting, and, practically, the main part of the course will be related to that. Remember – well thought-out databases that are carefully designed and created are crucial prerequisites for data manipulation. If we have done good work with these steps, we could write effective queries and navigate in a database rather quickly. You will often hear the term database management. It comprises all these steps a business undertakes to design, create, and manipulate its databases successfully. Finally, database administration is the most frequently encountered job amongst all. A database administrator is the person providing daily care and maintenance of a database. Her scope of responsibilities is narrower regarding the ones carried out by a database manager, but she is still indispensable for the database department of a company. All right! So… In this lesson, we will use the “Sales” database to illustrate the concept of relational schemas a little better. The data will be stored in 4 tables – “Sales”, “Customers”, “Items”, and “Companies”. You see how these tables have a tabular shape. Let’s see how a relational schema can be applied to represent them. Let’s build the landscape piece by piece. We care about the sales per customer of our company, and that’s why we have a “Sales” table. It tells us the number with which each purchase was registered, the date of the purchase, the respective customer ID, and the item code. So, in this table, the dates of a few purchases may coincide, because it is normal to sell some of the same goods in the same day. Analogically, in the third column, the ID of a customer may appear a few times, given that some customers make more than one purchase. People may have bought many units of the same product; hence, there is a possibility to see the same item code a few times in the last column. This reasoning does not apply to the first field, though. Each purchase is unique! In databases, this means all the numbers in this column will be different, because each purchase will be assigned a specific number that is unique. There is a term for such type of field. A column (or a set of columns) whose value exists and is unique for every record in a table is called a primary key. Each table can have one and only one primary key. In one table, you cannot have 3 or 4 primary keys. For instance, in our “Sales” table, “number of purchase” can act as a single-column primary key, and there will be no other primary keys. If the database creators decide, the primary key of a table may be composed of a set of columns, not of just one column. In the “Sales” table, both “number of purchase” and “date of purchase” can be thought of as a unique-identifier pair for the data in this table. For instance, purchase number 1 and purchase date, which is the 3rd of September, will form a unique pair and so will purchase number 2 and the same date, 3rd of September. This means these 2 rows are different. Please, don’t overlook the fact you cannot have a combination of the same purchase number, 1, and the same date, 3rd of September, more than once in your table. Either the number or the date must be different. This is because, as we already said, a primary key must be unique. Good! Both ways of representation make sense. Since having a one-column primary key here would mean all purchases in our database will be recorded under a different number, this means they can go from 1 to, hopefully, a million, or two… This logic will preserve the uniqueness of the purchases. Ok. So, we will proceed with “purchase number” as a single-column primary key. Awesome! Therefore, you can think of a primary key as the field or group of fields that identifies the content of a table in a unique way. For this reason, the primary keys are also called the unique identifiers of a table. Another crucial feature of primary keys is they cannot contain null values. This means, in the example with a single-column primary key, there must always be a value inserted in the rows under this column. You cannot leave it blank. Please, be careful and don’t forget this characteristic of the primary key! Technically, to create a relational schema, we draw a table in the form of a rectangle, and we place its name on top. Then, we enlist the fields vertically (as opposed to their horizontal organization in a spreadsheet). The field that is the primary key of the table is usually quoted on top of the other fields. What is more important - it is always underlined. Keeping these simple rules in mind, when you spot this table, you will immediately know: It is called “Sales”; its primary key is “purchase number”; and there are three other fields – “date of purchase”, “customer ID”, “item code”. This image corresponds to tabular data in the following form… All right? Good. One last remark about primary keys. Please, remember that not all tables you work with will have a primary key, although almost all tables in any database will have a single-column or a multi-column primary key. To conclude, the term for representing a table with the help of such a rectangle and the names of the fields is called a relational schema. When you combine the relational schemas of the tables we have in a database, this gives us a database schema. Informally, it is ok to call the representation of the entire database in plural - relational schemas. All right! Excellent! The “Customer ID” column does not give away much information about the purchaser, does it? We will store her first and last names, email address, and number of complaints submitted in another table, called “Customers”. Here is a table containing all these fields, as well as a “Customer ID” column, which is unique; there is no need to keep the information about the same customer more than once, right? Therefore, in this table, the “Customer ID” field is the primary key. The other columns represent the rest of the information about a customer – first name, last name, email address, and number of complaints. Interesting! We already have two tables – “Sales” and “Customers”! We know they are related through the “Customer ID” field, and as you can see, a “Customer ID” column exists in the “Sales” table. The coincidence of the two column names is chosen purely for convenience. The same relationship could be based on columns with different names – “Column ID” and just “ID”, for example. What is important is that the information in the “Sales” table can be obtained through the “Customer ID” column in the “Customers” table. Great! Let’s stick to using the same name for both columns. All right. Here is the big trick. “Customer ID” is a primary key for the “Customers” table and a foreign key in the “Sales” table. In the relational schemas form of representation, relations between tables are expressed in the following way – the column name that designates the logical match is a foreign key in one table, and it is connected with a corresponding column from another table. Often, the relationship goes from a foreign key to a primary key, but in more advanced circumstances, this will not be the case. To catch the relations on which a database is built, always look for the foreign keys, as they show us where the relations are. How is this shown on the canvas? First, you should indicate that “Customer ID” is a foreign key in the “Sales” table with F.K. in brackets next to the column name. Second, you must draw an arrow that begins from the same level as the field name of the foreign key and ends pointing at the column name from the table it is related to. Don’t worry; the arrow won’t be straight – what matters are its start and its end. Observe the following example from our “Sales” database to avoid confusion between primary and foreign keys. In the “Customers” table, “Customer ID” contains different values; that is, different numbers are stored on every row, and there are no missing values. This complies with the requirement for primary keys to contain unique values only. In the “Sales” table, though, we see under the “Customer ID” column we can have repeating and missing values. This is ok. For instance, it is normal to expect customer number 2 to have executed three purchases in our shop, correct? Also, it is possible to have not registered who executed purchase number 9. The values under this field in the “Sales” table will be mentioned in the “Customers” table, as well. If we have registered information for four customers with IDs 1,2,3, and 4, we cannot have a value of 5, 10, or 100 here, right? This also means, if you try to insert in the “Sales” table a record with a “Customer ID” that differs from the ones we have already seen in the “Customers” table, MySQL will raise an error and the record won’t be stored. Fantastic! By the same token, we can create the “Items” table. The field “Item code” appears in this table as a primary key and in “Sales” as a foreign key. Analogically to the previous example with the “Customer ID” column from the “Customers” table, “Item Code” contains a letter and a number forming a unique combination in the “Items” table, while in “Sales”, the values could repeat, as is the case with item B2, for instance. Therefore, we can use “F.K.” to indicate an “Item Code” is a foreign key in the “Sales” table and by drawing an arrow starting from the “Item Code” field in the “Sales” table pointing at the column with the same name in the “Items” table. Great! Remember that primary keys were called unique identifiers of a table? Well, foreign keys can be called identifiers, too, but they identify the relationships between tables, not the tables themselves. Amazing! Ok, great. Now we have a good idea of what the database will look like. However, there is one missing bit – we have to create a connection between the “Sales”, “Customers”, and “Items” table and the “Companies” table you created earlier. And why is the “Companies” table so important to us? Well, I hope you can already see that “Company ID” is a primary key in it. Spotting this must be a piece of cake for you already. On the other hand, you can rarely have two or more companies with the same name, but it’s not impossible, especially if they are registered in a different state or if they operate in different industries. How about the “Headquarters Phone Number” field? Can you have two US numbers that are completely identical? No! When you dial a phone number, there is just one phone that will ring somewhere in the US, right? Therefore, we can conclude that the values in this field have to be unique. But wait! This also means that “Headquarters Phone Number” is a primary key! Would that be possible? We just said that there is one and only one primary key in a table, and here the “Company ID” column is a primary key! Don’t worry, there is a name for this type of field, and it is unique keys. They are used whenever you would like to specify that you don’t want to see duplicate data in a given field, exactly as in our example with the headquarters phone number. There is one substantial difference between unique keys and primary keys, which is that they can contain null values. Meaning, if we don’t know the headquarters phone number of Company D, we can still have this record and our table will be functioning as part of the relational database. However, it won’t give us information about the phone number of Company D. If, instead, we remove number two as company ID and leave the name “Company B” and the relevant phone number, SQL will display an error message because “Company ID” is a primary key and we are obliged to provide data in every row of this column. In addition, a table can have 0, 1, 2 or more unique keys. This means that depending on the requirements of the database, a table may or may not contain one or more unique key columns. This is different with respect to the characteristics of the primary key constraint: remember, you can have only one primary key per table. Furthermore, what is similar between the two types of constraints is that both can be applied to multiple columns, not just to a single column. In other words, the unique key constraint can be defined by a single field from the table it is applied to, or it could comprise a few of the columns in that table. Both scenarios are possible. Wonderful! This completes the creation of the entire “Sales” database scheme! All right! Excellent! This is a lecture in which we will try to illustrate that relationships between tables can be categorized. We will not explore all types of relationships in detail given that this topic is rather theoretical and time consuming in its entirety. Instead, we will study the main types of relationships you will likely need in your workplace. Relationships tell you how much of the data from a foreign key field can be seen in the primary key column of the table the data is related to and vice versa. What do we mean by that? As we already discussed, the “Customer ID” column is a primary key of the “Customers” table. This means it contains only unique values – 1, 2, 3, and 4. The same information about customer ID can be found in the “Sales” table as a foreign key, but you will likely have a lot more than 4 rows there. Hence, the values from 1 to 4 can be repeated many times because the same customer could execute more than one purchase. This is an example of a ‘one-to-many’ type of relationship: one value from the customer ID column under the “Customers” table can be found many times in the customer ID column in the “Sales” table. In the relational schema, this is shown by assigning the correct symbols at the end of the arrow. You should always read the symbols according to the direction of the relationship you are exploring. For instance, think of it this way: a single customer could have made one purchase, but she could have made more than one purchase too! Therefore, the second symbol, which is next to the rectangle, shows the minimum number of instances of the “Customers” entity that can be associated with the “Sales” entity. When this symbol is a tiny line, it means “one”. The symbol located next to the rectangle indicates the maximum number of instances that can be associated with the “Sales” entity. The angle-like symbol stands for “many”. Let’s check the relationship in the opposite direction – for a single purchase registered in the “Sales” table a single customer can be indicated as the buyer. So, the minimum number of customers you will find in the “Customers” table for whom we have such information as name, email, and number of complaints, and that corresponds to a single purchase in the “Sales” table, is one. At the same time, we just said that for a given purchase we cannot have more than one buyer, meaning that the maximum number of instances from “Sales” associated with “Customers” is also one. Therefore, for every purchase, we can obtain the name, email, and number of complaints data for one customer, and we represent this logic by drawing а line for the minimum, and а line for the maximum. The two symbols in closer proximity to the rectangles form the relationship between the “Customers” and the “Sales” tables, according to the specified direction. In our case, it is correct to say that the “Customers” to “Sales” relationship is one-to-many, while “Sales” to “Customers” is many-to-one. Great! The symbols showing us relationship limitations are called cardinality constraints. There are other symbols that can be used too – M or N for infinite associations, or a circle for optional instances, which would have been the case if it wasn’t necessary for a registered person to have purchased an item. There are some other types of relationships as well – one-to-one, many-to-many, and others. This is information that we share with you for general knowledge, as this is a specialized topic which is of interest mainly to advanced users. To sum it up, relational (or database) schemas do not just represent the concept database administrators must implement. They depict how a database is organized. They can be thought of as blueprints, or a plan for a database, because they are usually prepared at the stage of database design. Drawing a relational schema isn’t an easy job, but relational schemas will help you immensely while writing your queries – a neat and complete visualization of the structure of the entire database will always be useful when retrieving information.
Info
Channel: 365 Data Science
Views: 27,686
Rating: 4.9277778 out of 5
Keywords: data science, 365 datascience, 365datascience, 365 data science, sql foreign key, foreign key in database, sql add foreign key, mysql add foreign key, primary key and foreign key, foreign key, primary key sql, primary key foreign key, database design, database terminology, primary key in database, how to create a database, null values, unique key, database concepts, sql basics, sql tutorial, mysql create database, mysql database, mysql create table, mysql setup, sql, MySQL
Id: 8kDs8QkFI2Y
Channel Id: undefined
Length: 25min 43sec (1543 seconds)
Published: Thu Jan 31 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.