SQL Index |¦| Indexes in SQL |¦| Database Index

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Today we are going to run SQL queries against a table containing ten THOUSAND records. {{ Maniacal laughter }} {{ Phone call }} What is it, I’m in the middle of a video… You don’t say? ALL in RAM?… Well, alrighty then... Today we are going to run SQL queries against a table containing one .. Hundred .. MILLION records. {{ Maniacal laughter }} But don’t worry. By using indexes, we can rapidly speed up queries so you do not have to experience the phenomenon known as boredom. We will work with a single table called ‘person’ containing 100 MILLION randomly generated people. The first row is an auto-generated primary key called “person_id” The other columns are first_name… last_name… and birthday. To create this table, we randomly generated names using the 1000 most popular female names, male names, and last names in the United States. We did not weight the names by frequency when generating our random sample. The datasets and the Python code used to generate the random names are available for download from Github. Today, we are going to pay close attention to how long each query takes to execute. So on the right, we will maintain a list of queries and how many milliseconds each query took to complete. These values will vary depending on the specifications of your workstation. We will begin by counting the number of rows in the “person” table… There are indeed 100 MILLION records. Notice how long it took to perform this simple count. 3 seconds and 562 milliseconds. Or 3,562 milliseconds. Let us put this query and the time required in our table. As another speed test, count the number of people with the last name “Smith.” Execute Over 100,000 Smiths... And the query took 4,261 milliseconds. For a second test, count how many people are named “Emma.” Execute… Nearly 50,000. And note the time: 4,066 milliseconds. What if we count the number of people born in May, 1980? For this we will use the BETWEEN operator. Execute… 166,000... And the query took 4,480 milliseconds. A pattern is starting to emerge. Let’s quickly gather a few more speed samples. How many Michaels are in our database? 4,127 milliseconds How many people have the last name ‘Hawkins’ or ‘Snow’? 5,315 milliseconds And how many people were born on April 1, 1995? 4,110 milliseconds. We ran 7 queries, and the execution times were all very similar. This is because in order to answer each query, the database had to scan all 100 MILLION rows to check each record for a match. Our 7 scans of the entire table took a combined total of 29,921 milliseconds. This means on average my workstation was able to scan 23,394,940 rows per second. While this may seem fast, we can do better… MUCH better… And we will do this by building an INDEX. The name ‘index’ was chosen because of how an index works in a book. For example, if you want to learn about “normal distributions”, you would not flip through your entire math book page by page, scanning for the term “normal distribution.” Instead, you would go to the index, find the term “normal distribution”, and you would see the list of pages. This technique is used by databases. When you create an index, the database will generate a method to rapidly find data based on one or more columns. As our first example, let us create an index on the “person” table. To do this, write “CREATE INDEX” and then give your index a name. There are many different naming conventions that you can use. The important thing is to be consistent. We will use the table name, followed by the field name, then add “idx” to indicate this is an index. Next, write “ON” followed by the table name. Inside parentheses, write the columns to include in the index. We will start simply and create an index for the “first_name” column. Execute. That certainly took a while. Remember, the database had to scan 100 MILLION rows and build a “first name” index from scratch. Let’s test our new index by repeating an earlier query where we counted the number of people named “Emma.” Execute. This time, it took only 508 milliseconds. If we repeat the count of people with last name “Smith”, it takes 4,415 milliseconds. What’s going on here? This is about as long as it took last time. It did not speed up because our index was built for the first name column. The last name was not indexed. And as you might suspect, if we search for people born on April 1, 1995, the query should not run any faster than last time. Execute. 4,011 milliseconds. No real change. Our index improved the speed of searches by first name, but had no effect on queries by last name or birthday. However, there are other queries where the “first name” index will improve performance. For example, count the number of people named “Julie Andrews.” Execute. This ran in just 514 milliseconds. By contrast, if you count people with last name ‘Andrews’ and born on June 12, 1992... It takes 4,274 milliseconds. The full name query benefited from the first name index, because the database was able to use it to quickly find all the “Julies.” From this smaller set of records, it then completed the job by finding people with last name ‘Andrews’. The second query, however, had no index to help. It had to perform a full scan of the table to find all people with last name “Andrews”, and then find those with a matching birthday. Fortunately databases are clever about how to use indexes. For example, if you count the number of people born on October 31, 1985 with a first name of “Mia”... It executes quickly: 512 milliseconds... But why? Wouldn’t the database first scan all rows to find people with a matching birthday, and then filter by first name? No! Just because we wrote it in this order does not mean the database will do its work in that order. The database will consider all possible ways to execute your query, then chooses the optimal path. We call each possibility a “query plan”, and the “query optimizer” picks what it thinks is best. You’ll be happy to know a database can have more than one index. Better still, each index can be created using more than one column. Let us create a multi-column index. Write “CREATE INDEX”, then an overly long but descriptive name. Next, write “ON” then the table name. In parentheses, list the columns you want to go into this index. Important Note: The order matters. Think of this as sorting the data first by last_name, then by first_name Execute… It is now time to take this new index for a test run. How many people named “John Williams” are in the database? Execute… There are 36 people. And it only took 27 milliseconds to run this query. What about Julie Andrews? 26 milliseconds. Well done, index.. Well done... A word of caution. Indexes are not free. Just as indexes in a book require paper, indexes in a database require storage. Furthermore, when you add data to a database, it needs to create new records AND update all relevant indexes. So go forth and use indexes, but use them wisely. And before you go, please help us achieve our goal of being the first channel with One Hundred BILLION subscribers! {maniacal laughter hahhahaa please subscribe, really}
Info
Channel: Socratica
Views: 512,316
Rating: undefined out of 5
Keywords: Socratica, SocraticaCS, SQL, structured query language, sql tutorial, learn sql, database, databases, mysql, postgresql, mariadb, oracle, sql server, sqlite, index, indexes, indexes in sql, sql indexes, create index, how to use index in sql, how to use indexes in sql, learn indexes in sql, how to index in sql, what is index in sql, what are indexes in sql, beginner sql, easy sql lesson, beginner sql tutorial, sql for beginners, sql indexes explained, sql indexes examples
Id: fsG1XaZEa78
Channel Id: undefined
Length: 9min 57sec (597 seconds)
Published: Sun Mar 31 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.