[Backend #9] Understand isolation levels & read phenomena in MySQL & PostgreSQL via examples

Captions
When working with database transactions, one crucial thing we must do is choose the appropriate isolation level for our application. Although there's a well-defined standard, each database engine may implement it differently, and thus may behave differently at each isolation level. So today we will explore in depth how each isolation level works in MySQL and Postgres by running some concrete SQL queries. But first, let's talk a bit about the theory.

As we learned in the previous lecture, a database transaction must satisfy the ACID properties: Atomicity, Consistency, Isolation, and Durability. Isolation is one of the four properties of a database transaction; at its highest level, perfect isolation ensures that concurrent transactions do not affect each other. There are several ways a transaction can be interfered with by other transactions running simultaneously with it. This interference causes something we call a "read phenomenon". Here are some read phenomena that might occur if a database is running at a low level of transaction isolation.

First, the dirty read phenomenon. It happens when a transaction reads data written by another concurrent transaction that has not been committed yet. This is terribly bad, because we don't know whether that other transaction will eventually be committed or rolled back, so we might end up using incorrect data if a rollback occurs.

The second phenomenon we might encounter is non-repeatable read: a transaction reads the same record twice and sees different values, because the row has been modified by another transaction that committed after the first read.

Phantom read is a similar phenomenon, but it affects queries that search for multiple rows instead of one. In this case, the same query is re-executed but a different set of rows is returned, due to changes made by other recently committed transactions, such as inserting new rows or deleting existing rows that happen to satisfy the search condition of the current transaction's query.

Another phenomenon that involves the separation of a group of transactions is serialization anomaly. It occurs when the result of a group of concurrent committed transactions could not be achieved by running them sequentially in any order, without overlapping each other. Don't worry if you don't fully understand these phenomena right now; we're going to replicate each and every one of them in MySQL and Postgres in a few minutes.

Now, in order to deal with these phenomena, 4 standard isolation levels were defined by the American National Standards Institute (ANSI). The lowest isolation level is read uncommitted. Transactions at this level can see data written by other uncommitted transactions, thus allowing the dirty read phenomenon to happen. The next isolation level is read committed, where transactions can only see data that has been committed by other transactions. Because of this, dirty reads are no longer possible. A bit stricter is the repeatable read isolation level. It ensures that the same select query always returns the same result, no matter how many times it is executed, even if some other concurrent transactions have committed new changes that satisfy the query. Finally, the highest isolation level is serializable. Concurrent transactions running at this level are guaranteed to yield the same result as if they were executed sequentially in some order, one after another, without overlapping. So basically it means that there exists at least one way to order these concurrent transactions so that, if we run them one by one, the final result will be the same.
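As a quick illustration, here is a minimal sketch of a dirty read, using the accounts table from the simple bank schema that the demo below relies on (in Postgres, as we will see, this read never actually returns the uncommitted value):

    -- session 1:
    BEGIN;
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;   -- not committed yet

    -- session 2, running at READ UNCOMMITTED (MySQL):
    BEGIN;
    SELECT balance FROM accounts WHERE id = 1;   -- already sees the uncommitted change: dirty read

    -- session 1:
    ROLLBACK;   -- session 2 has now read a value that was never committed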
Alright, now it's time to find the connection between isolation levels and read phenomena. We're gonna run some transactions with different isolation levels in MySQL and Postgres to figure out which phenomena are possible at each level. Then we will fill that information into this summary chart.

Here I have 2 running docker containers: one is Postgres version 12 and the other is MySQL version 8. Inside those containers, I have also prepared the simple bank database schema with some initial data, like what we've been working on in previous lectures. So let's connect to the MySQL console and access that database. Here we go!

Now, to get the transaction isolation level of the current session, we can run: select @@transaction_isolation. By default, it is repeatable read, as we can see here. This level is only applied to this specific MySQL console session. There's also a global isolation level, which is applied to all sessions when they first start. We can get its value by adding "global" to the previous select query. By default, it is also repeatable read.

Now, to change the isolation level of the current session, we can use this query: set session transaction isolation level, followed by the name of the level you want to set, for example read uncommitted. After this, if we run the select transaction isolation query again, we will see that it has been changed to read uncommitted. Note that this change will only affect future transactions of this current session, not transactions that run in another MySQL console session.

Alright, now in order to demonstrate the interference between 2 concurrent transactions, I'm gonna open another terminal window, put it side by side with this one, and start a new MySQL console inside it. Then let's copy this command to set the isolation level of this session to read uncommitted as well, and run the select transaction isolation query to make sure it's changed successfully. OK, now both sessions are running at the read uncommitted isolation level.

We can now start a new transaction. In MySQL, we can either use the start transaction statement, or simply use the begin statement as an alternative. OK, 2 transactions have started. Let's run a simple select from accounts query in transaction 1. At the moment, there are 3 accounts with the same balance of 100 dollars. Then in transaction 2, let's select the first account with id 1. OK, we've got that account with a balance of 100 dollars. Now let's go back to transaction 1 and run this update statement to subtract 10 dollars from account 1. Query OK. So if we select account 1 in transaction 1, we will see that the balance has been changed to 90 dollars. But what if we run the same select statement in transaction 2? It also sees the modified balance: 90 dollars. Note that transaction 1 is not committed yet, but transaction 2 still sees the changes made by transaction 1. So this is a dirty read, and it happens because we're using the read uncommitted isolation level.
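For reference, these are the MySQL 8 statements used in this demo to inspect the isolation level, change it for the current session, and start a transaction:

    SELECT @@transaction_isolation;           -- isolation level of the current session
    SELECT @@global.transaction_isolation;    -- global level applied to newly started sessions
    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    START TRANSACTION;                        -- or simply: BEGIN;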
Alright, now let's commit these 2 transactions and try a higher isolation level. This time, we will set the isolation level to read committed instead. OK, the isolation level of this session has been changed. Let's start a new transaction with begin. Then we do the same thing in the other session: set the session isolation level to read committed, check that the level has been updated successfully, and begin a new transaction.

OK, now let's select all records to see the current state of the accounts table. Account 1's balance is now 90 dollars, and the other 2 accounts both have 100 dollars. Just like what we did before, in transaction 2, let's select the account with ID 1. Then in transaction 1, we update that account's balance by subtracting 10 dollars from it. OK, the balance has successfully been changed to 80 dollars in this transaction. Let's see if this change is visible to transaction 2 or not. Now you can see that when we select account 1 in transaction 2, its balance is still the same 90 dollars as before. This is because we're using the read committed isolation level, and since transaction 1 hasn't been committed yet, its written data cannot be seen by other transactions. So the read committed isolation level prevents the dirty read phenomenon.

How about non-repeatable read and phantom read? In transaction 2, let's run another select from accounts where the balance is greater than or equal to 90 dollars. Then go back to transaction 1 and commit it. OK, now if we read account 1 again in transaction 2, we can see that the balance has been changed to 80 dollars. So the same query that gets account 1 returns a different value. This is the non-repeatable read phenomenon. Also, if we rerun the query to get all accounts with a balance of at least 90 dollars, this time we only get 2 records instead of 3 as before, because the balance of account 1 decreased to 80 after transaction 1 was committed. The same query was executed, but a different set of rows is returned: one row has disappeared because of another committed transaction. This is called the phantom read phenomenon. So now we know that the read committed isolation level can only prevent dirty reads, but still allows the non-repeatable read and phantom read phenomena. Let's commit this transaction and move to a higher level to see what will happen.

Now I'm gonna set this session's transaction isolation level to repeatable read, check it again just to be sure, and begin a new transaction. Then let's copy this command and do the same thing for the other session. Alright, now 2 transactions have been started. Let's first select all accounts in transaction 1, then select the account with ID 1 in transaction 2. Also select all accounts with a balance of at least 80 dollars; this will be used to verify whether phantom read still occurs or not. OK, now back to transaction 1: subtract 10 from account 1's balance, then get all accounts to see their current state in transaction 1. We can see the balance of account 1 has decreased to 70 dollars. We know that dirty read has already been stopped at the lower isolation level, read committed, so we don't need to check it at this level. So let's commit this transaction 1, then move to transaction 2 to see if it can read the new changes made by transaction 1 or not.

Now this select query returns the old version of account 1, with 80 dollars in balance, although transaction 1 has changed it to 70 and committed successfully. That's because the repeatable read isolation level ensures that all read queries are repeatable, which means they always return the same result, even if there are changes made by other committed transactions. Having said that, let's rerun the query that selects accounts with at least 80 dollars. As you can see, it still returns the same 3 records as before. So the phantom read phenomenon is also prevented at this repeatable read isolation level. That's great!
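Before testing concurrent updates, here is a condensed recap of these two experiments. In both cases, transaction 2 re-runs the same pair of queries after transaction 1 commits its balance update:

    -- At READ COMMITTED, transaction 2 sees the committed change:
    SELECT * FROM accounts WHERE id = 1;          -- balance 90 before, 80 after: non-repeatable read
    SELECT * FROM accounts WHERE balance >= 90;   -- 3 rows before, only 2 after: phantom read

    -- At REPEATABLE READ, the same queries keep returning the original snapshot:
    SELECT * FROM accounts WHERE id = 1;          -- still 80, even though the committed value is 70
    SELECT * FROM accounts WHERE balance >= 80;   -- still the same 3 rows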
However, I wonder what will happen if we run this update query to change account 1's balance in this transaction. Will it change the balance to 70, to 60, or throw an error? Let's try it! There's no error. So let's run a select query to get the updated account 1. OK, so it is 60 dollars, which is the correct value, because transaction 1 has already committed the change that modified the balance to 70 dollars. However, from transaction 2's point of view, it doesn't make sense: in the last select query it saw a balance of 80 dollars, but after subtracting 10 dollars from the account, it now gets 60 dollars. The math doesn't work here, because this transaction is still being interfered with by concurrent updates from other transactions. I don't know why MySQL chooses to implement the repeatable read isolation level this way, but it would make more sense to refuse the change by raising an error in this case, to ensure the consistency of the transaction data. Later we will see that this is exactly how Postgres handles this type of concurrent update at this isolation level. For now, let's just roll back this transaction and move on to the highest isolation level to see whether this issue can be prevented or not.

Alright, let's set this session's isolation level to serializable, confirm that it has really changed, and begin the first transaction. Then let's do the same for the other session: set the isolation level to serializable, check it one more time, and begin the second transaction. OK, both transactions are started. Now let's select all accounts in transaction 1, and select just account 1 in transaction 2. Next, go back to transaction 1 and subtract 10 more dollars from its balance. This time, interestingly, the update query is blocked. So basically the select query in transaction 2 is blocking this update query in transaction 1. The reason is that, at the serializable isolation level, MySQL implicitly converts all plain SELECT queries to SELECT FOR SHARE. And a transaction that holds a SELECT FOR SHARE lock only allows other transactions to READ the rows, not UPDATE or DELETE them. So with this locking mechanism, the inconsistent data scenario that we've seen before is no longer possible.

However, this lock has a timeout duration. So if the second transaction doesn't commit or roll back to release the lock within that duration, we will see a "lock wait timeout exceeded" error like this. Therefore, in this case, we must retry the first transaction. So when you use the serializable isolation level, make sure that you have implemented a transaction retry strategy.

OK, now I'm gonna restart this transaction 1, run the select query, then update account 1's balance. But this time, I'm not gonna let the lock wait timeout occur. Let's see what happens if transaction 2 also tries to update the same account 1's balance. Here we go: a deadlock has occurred, because now transaction 2 also needs to wait for the lock held by transaction 1's select query. So be aware that, besides the lock wait timeout, you also need to take care of possible deadlock situations.

Now let's try restarting both transactions. Select account 1 in transaction 1, select account 1 in transaction 2, update account 1's balance in transaction 1, then commit transaction 2. As you can see, the lock is released right away, and account 1's balance has been updated successfully.
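Here is a rough sketch of the locking behaviour described above. The FOR SHARE conversion is what the video demonstrates; the timeout setting named in the comment, innodb_lock_wait_timeout, is mentioned here only as a pointer and is not shown in the video:

    -- transaction 2 (SERIALIZABLE): a plain SELECT is implicitly run as a locking read
    SELECT * FROM accounts WHERE id = 1;   -- effectively: SELECT ... FOR SHARE

    -- transaction 1: this UPDATE blocks until transaction 2 commits or rolls back;
    -- if that takes longer than innodb_lock_wait_timeout, it fails with an error like
    -- "Lock wait timeout exceeded; try restarting transaction"
    UPDATE accounts SET balance = balance - 10 WHERE id = 1;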
Alright, so far we have experienced all 4 isolation levels in MySQL and how they help prevent some read phenomena. Now let's see how they work in Postgres. The effects will be quite similar, but there will also be some differences.

First, let's start 2 Postgres consoles in these 2 terminal windows. In Postgres, to get the current isolation level, we run the show transaction isolation level command. By default, it is read committed, so 1 level lower than the default in MySQL. The way we change the isolation level is also different. In MySQL, we set the isolation level of the whole session before starting the transactions, but in Postgres, we can only set the isolation level within a transaction, and it will only have effect on that 1 specific transaction.

So let's begin the first transaction and run set transaction isolation level read uncommitted. Now if we show the transaction isolation level, we can see that it has been changed to read uncommitted. Let's do the same thing in the other console: begin the second transaction, set the isolation level to read uncommitted, then show the updated isolation level.

OK, now in transaction 1, let's select all accounts. At the moment, there are 3 accounts with the same balance of 100 dollars. In transaction 2, let's select only the account with ID 1. Then go back to transaction 1 and update its balance. The balance of account 1 has been changed to 90 dollars here. Now we select that account again in transaction 2. It's still 100 dollars! This is unexpected, because we're using the read uncommitted level, so transaction 2 should be able to see the uncommitted data of transaction 1, right? Well, in fact, if we look at the Postgres documentation, we can see that read uncommitted in Postgres behaves exactly the same as read committed. So basically, we can say that Postgres only has 3 isolation levels, and the lowest level is read committed. It makes sense, because normally we would never want to use read uncommitted under any circumstances. OK, so let's go ahead and commit transaction 1, and select account 1 in transaction 2 one more time. Now it sees the committed balance of 90 dollars, as expected. Alright, let's commit this transaction and move to the next isolation level.

I'm gonna start the first transaction, set its isolation level to read committed, and show the transaction isolation level to make sure it has been updated. Then do the same thing for the second transaction. Now, just like before, let's select all accounts in transaction 1, then select just account 1 in transaction 2. Besides the dirty read phenomenon, we also want to see how this level handles phantom read, so let's find all accounts where the balance is greater than or equal to 90 dollars. At the moment, all 3 records satisfy this search condition. Now let's go back to transaction 1 and subtract 10 dollars from account 1's balance. If we select account 1 in transaction 2, it will still be 90 dollars, because transaction 1 is not committed yet. So dirty read is not possible at the read committed isolation level. Let's see what happens if we commit transaction 1. This time transaction 2 can see the updated balance of 80 dollars. Now if we run the query to search for accounts with at least 90 dollars again, we will only see 2 records instead of 3 as before. The updated balance of account 1 no longer satisfies the search condition, so it has disappeared from the result set. So phantom read has occurred at this read committed isolation level, which is the same behaviour as in MySQL. Let's commit this transaction and move up 1 level.
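For reference, the Postgres counterparts of the earlier MySQL statements; here the isolation level is set inside the transaction itself:

    SHOW TRANSACTION ISOLATION LEVEL;    -- read committed by default
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SHOW TRANSACTION ISOLATION LEVEL;    -- confirm the change
    COMMIT;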
I'm gonna begin the first transaction, then set its transaction isolation level to repeatable read. OK, the level has been changed correctly. Let's do the same thing for the second transaction: begin the transaction, set the isolation level to repeatable read, and check it again.

Alright, now let's select all accounts in transaction 1, then select just the account with ID 1 in transaction 2. Also search for the accounts with a balance of at least 80 dollars. Now go back to transaction 1 and subtract 10 more dollars from its balance. The balance has been updated to 70 dollars in this transaction. Let's commit it and see what will happen in transaction 2.

Now if we select account 1 in transaction 2, it's still 80 dollars as before, although transaction 1 has committed its change. That's because we're using the repeatable read isolation level, so the same select query should always return the same result. The non-repeatable read phenomenon cannot happen in this case. Also, if we rerun the query to search for accounts with at least 80 dollars, we still get the same 3 records as before. So phantom read is also prevented at this isolation level.

Now I'm gonna try to run this update account balance query to see how it behaves. At MySQL's repeatable read isolation level, we saw that it allows the balance to be updated to 60 dollars. But here, in Postgres, we get an error: could not serialize access due to concurrent update. I think throwing an error like this is much better than allowing the balance to be modified, because it avoids the confusing state where the transaction saw subtracting 10 from 80 produce 60. So kudos to Postgres!

Until now we have encountered 3 types of phenomena: dirty read, non-repeatable read, and phantom read. But we haven't run into a serialization anomaly yet. So this time, let's see what it looks like. Let's start a new transaction and set its isolation level to repeatable read. Then do the same for the second transaction. OK, here we have 2 running transactions with the repeatable read isolation level. In transaction 1, let's select all account records. Now imagine that we have a use case where we have to compute the sum of all accounts' balances, then create a new account with that total balance. So let's run select sum(balance) from accounts. It's 270 dollars. Then we insert a new record into the accounts table, where the owner is 'sum', the balance is 270, and the currency is USD. OK, now if we select all accounts, we can see the new record in this transaction.

However, what if transaction 2 also wants to perform this operation? Since we're using the repeatable read isolation level, the select query in transaction 2 will only see the original list of accounts, without the new record that transaction 1 has just inserted. Therefore, it will get the same value for the sum of the account balances, 270 dollars, and thus end up inserting the same record into the accounts table. OK, now let's commit both transactions to see what will happen. They were both committed successfully, and there are 2 duplicate sum records with the same balance of 270 dollars. This is a serialization anomaly. Why? Because if these 2 transactions were run serially, one after another, there's no way we could end up with 2 records with the same sum of 270. It doesn't matter whether transaction 1 or transaction 2 runs first: we should get 1 record of 270 dollars and another record of 540 dollars. OK, so that's how a serialization anomaly occurs at the repeatable read isolation level.
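Here is a condensed sketch of the anomaly just produced; the column list (owner, balance, currency) is assumed from the simple bank schema, and both repeatable-read transactions run the exact same statements concurrently:

    -- executed in BOTH concurrent repeatable-read transactions:
    SELECT SUM(balance) FROM accounts;               -- each snapshot still shows 270
    INSERT INTO accounts (owner, balance, currency)
    VALUES ('sum', 270, 'USD');
    COMMIT;
    -- both commits succeed, leaving two identical 'sum' rows,
    -- a result that no serial ordering of the two transactions could produce.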
Now let's try the highest level, serializable, to see whether this anomaly can be stopped or not. I'm gonna start a new transaction, then set its isolation level to serializable, and check if it has really changed. Then copy this statement and do the same thing for the second transaction. Now let's select all accounts in transaction 1, calculate the sum of all balances, and insert a new account with a balance equal to this sum. As you can see, a new sum record of 810 dollars has been inserted in transaction 1. Let's go to transaction 2 and run the same series of queries. After this select query, we can see that the lists of accounts in both transactions are identical. Let's try to commit both of them. The first transaction is committed successfully. However, the second transaction throws an error: could not serialize access due to read/write dependencies among transactions. And Postgres gives us a hint that the transaction might succeed if we retry it. So this is good! The serialization anomaly is completely prevented. The 2 concurrent transactions no longer create duplicate records like they did before. We can conclude that Postgres uses a dependency-checking mechanism to detect potential read phenomena and stops them by throwing an error.

MySQL, on the other hand, chooses to use a locking mechanism to achieve a similar result. Let's see how it handles the serialization anomaly! Here are the two MySQL consoles that we were working on. Let's set this session's transaction isolation level to serializable, select it to make sure it has been changed, and start the first transaction. Then let's do the same thing in the other session to start the second transaction. Now back to transaction 1: select all account records, compute the sum of all account balances, and insert a new record with that sum into the accounts table. Select all accounts again to confirm that the new record is really inserted. Then switch to transaction 2 and run the query to select all accounts. As you can see, this query is blocked, and it needs to wait for transaction 1 to release the lock before continuing. As soon as we commit transaction 1, the lock is released, and transaction 2 gets the result of its query immediately. Now we can continue running the sum and insert queries in this transaction, and finally commit it. There are no duplicate sum records. So MySQL has also successfully prevented the serialization anomaly with its locking mechanism.

Now I want to try a different order of the queries in these 2 transactions. First begin transaction 1, select all account records, and select the sum of all account balances. Then begin transaction 2, also select all accounts, and calculate the sum of all account balances. This time, both transactions get the same sum of 1040 dollars. Let's insert a new account with this sum in transaction 1. It's blocked, because transaction 2 is holding a shared lock that prevents other transactions from updating. Now if we try to insert a new sum account in transaction 2, we will get a deadlock, because in this case both transactions have to wait for each other. And since transaction 2 failed due to the deadlock, the lock is immediately released, which allows transaction 1 to complete its insert query. So after committing transaction 1, we can see that a new sum account was successfully inserted. The database stays consistent, with no serialization anomaly.
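A condensed sketch of this last MySQL experiment under SERIALIZABLE, with the statements interleaved as described above (column list assumed from the simple bank schema):

    -- both serializable transactions first run the reads, taking shared locks:
    SELECT * FROM accounts;
    SELECT SUM(balance) FROM accounts;   -- both compute 1040

    -- transaction 1 tries to insert and blocks on transaction 2's shared lock:
    INSERT INTO accounts (owner, balance, currency) VALUES ('sum', 1040, 'USD');

    -- transaction 2 attempts the same insert; MySQL detects the cycle and aborts it with
    -- an error like "Deadlock found when trying to get lock; try restarting transaction",
    -- which releases the lock and lets transaction 1's insert go through.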
Now, before we finish, let's do a quick summary of the relationship between isolation levels and read phenomena in MySQL and Postgres. In MySQL, the lowest isolation level, read uncommitted, allows all 4 phenomena to occur, while the next level, read committed, only prevents dirty reads; the other 3 phenomena are still possible. The repeatable read level in MySQL stops the first 3 phenomena: dirty read, non-repeatable read, and phantom read. But it still allows serialization anomalies, and even some inconsistent concurrent updates. The highest isolation level, serializable, is the most strict: it prevents all 4 phenomena, thanks to the locking mechanism.

The isolation levels in Postgres produce quite similar results. However, there are still some major differences. First, the read uncommitted isolation level behaves the same as read committed. Second, Postgres doesn't use a locking mechanism like MySQL does; it uses a better dependency-detection technique to stop non-repeatable reads, inconsistent concurrent updates, and serialization anomalies. Also, the default isolation level in Postgres is only read committed, while it is repeatable read in MySQL.

The most important thing to keep in mind when using a high isolation level is that there might be errors, timeouts, or even deadlocks, so we should carefully implement a retry mechanism for our transactions. Also, each database engine might implement isolation levels differently, so make sure that you read its documentation carefully and try it out on your own before jumping into coding. I will put links to the official documentation on isolation levels in MySQL and Postgres in the description, in case you want a reference. And that's all for today's video. I really hope it's helpful for you. Thanks a lot for watching, and see you in the next lecture!
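For reference, the summary chart mentioned at the start of the demo, filled in with the results observed in this lecture:

    MySQL 8             Dirty read   Non-repeatable read   Phantom read   Serialization anomaly
    READ UNCOMMITTED    possible     possible              possible       possible
    READ COMMITTED      prevented    possible              possible       possible
    REPEATABLE READ     prevented    prevented             prevented      possible
    SERIALIZABLE        prevented    prevented             prevented      prevented

    Postgres 12         Dirty read   Non-repeatable read   Phantom read   Serialization anomaly
    READ UNCOMMITTED    prevented    possible              possible       possible   (behaves as READ COMMITTED)
    READ COMMITTED      prevented    possible              possible       possible
    REPEATABLE READ     prevented    prevented             prevented      possible
    SERIALIZABLE        prevented    prevented             prevented      prevented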
Info
Channel: TECH SCHOOL
Views: 5,883
Rating: 4.9461884 out of 5
Keywords: transaction isolation levels, transaction isolation, mysql isolation levels, mysql isolation level, postgres isolation levels, postgres isolation level, postgresql isolation levels, postgresql isolation level, isolation levels, read phenomena, dirty read, phantom read, non repeatable read, serialization anomaly, mysql, postgres, postgresql, backend course, backend development, backend tutorial, coding tutorial, programming tutorial, tech school, database tutorial
Id: 4EajrPgJAk0
Length: 29min 37sec (1777 seconds)
Published: Mon Sep 14 2020