When working with database transactions, one crucial thing we must do is choosing the appropriate isolation level for
our application. Although there’s a well-defined standard, each database engine might choose to implement
it in a different way, and thus may behave differently in each isolation
level. So today we will explore deeply how each level of isolation work in MySQL
and Postgres by running some concrete SQL queries. But first, let’s talk a bit about its theory. As we’ve already learned in the previous
lecture, a database transaction must satisfy the ACID
property, which stands for Atomicity, Consistency, Isolation,
and Durability. Isolation is one of the four property of a
database transaction, where at its highest level, a perfect isolation
ensures that all concurrent transactions will not affect
each other. There are several ways that a transaction
can be interfered by other transactions that runs simultaneously
with it. This interference will cause something we
called “read phenomenon”. Here are some read phenomena that might occurs if a database is running at a low level of
transaction isolation. First, dirty read phenomenon It happens when a transaction reads data written
by other concurrent transaction that has not been committed yet. This is terribly bad, because we don’t know
if that other transaction will eventually be committed or rolled back. So we might end up using incorrect data in
case rollback occurs. The second phenomenon we might encounter is
non-repeatable read. When a transaction reads the same record twice
and see different values, because the row has been modified by other
transaction that was committed after the first read. Phantom read is a similar phenomenon, but affects queries that search for multiple
rows instead of one. In this case, the same query is re-executed, but a different set of rows is returned, due to some changes made by other recently-committed
transactions, such as inserting new rows or deleting existing
rows which happen to satisfy the search condition
of current transaction’s query. Another phenomenon that involves the separation
of a group of transactions is serialization anomaly. It’s when the result of a group of concurrent
committed transactions could not be achieved if we try to run them sequentially in any
order without overlapping each other. Don’t worry if you don’t fully understand
these phenomena right now We’re going to replicate each and every
one of them in MySQL and Postgres in a few minutes. Now in order to deal with these phenomena, 4 standard isolation levels were defined by the American National Standard Institute or
ANSI. The lowest isolation level is read uncommitted. Transactions in this level can see data written
by other uncommitted transactions, Thus allowing dirty read phenomenon to happen. The next isolation level is read committed, where transactions can only see data that
has been committed by other transactions. Because of this, dirty read is no longer possible. A bit more strict is the repeatable-read isolation
level. It ensures that the same select query will
always return the same result, no matter how many times it is executed, even if some other concurrent transactions
have committed new changes that satisfy the query. Finally the highest isolation level is serializable. Concurrent transactions running in this level
are guaranteed to be able to yield the same result as if they’re executed sequentially in some
order, one after another without overlapping. So basically it means that there exists at
least 1 way to order these concurrent transactions so that if we run them one by one, the final
result will be the same. Alright, now it’s time to find the connection
between isolation levels and read phenomena. We’re gonna run some transactions with different
levels of isolation in MySQL and Postgres to figure out which phenomena are possible
in each level. Then we will fill that information into this
summary chart. Here I have 2 running docker containers, one is Postgres version 12 and the other is
MySQL version 8. Inside those containers, I have also prepared
the simple bank database schema with some initial data like what we’re working
on in previous lectures. So let’s connect to MySQL console and access
that database. Here we go! Now to get the transaction isolation level
of the current session, we can run Select @@transaction_isolation By default, it is repeatable read as we can
see here. This level is only applied to this specific
MySQL console session. There’s also a global isolation level, which
is applied to all sessions when they first started. We can get its value by adding global to the
previous select query. By default, it is also repeatable read. Now to change the isolation level of current
session, we can use this query: Set session transaction isolation level then the name of the level you want to set. For example, read uncommitted. After this, if we run select transaction isolation
again, We will see that it has been changed to read
uncommitted. Note that this is change will only have effects
on all future transactions of this current session, but not on transactions that runs on another
session of MySQL console. Alright, now in order to demonstrate the interference
between 2 concurrent transactions, I’m gonna open another terminal window, Put it side by side with this one. And start a new MySQL console inside it. Then let’s copy this command to set the
isolation level of this session to read uncommitted as well. Then run the select transaction isolation
to make sure it’s changed successfully. OK, now both sessions are running at read
uncommitted isolation level. We can now start a new transaction. In MySQL, we can either use start transaction
statement, or simply use begin statement as an alternative. OK 2 transactions have started. Let’s run a simple select from accounts
query in this transaction 1 At the moment, there are 3 accounts with the
same balance of 100 dollars. Then in transaction 2, let’s select the
first account with id 1. OK we’ve got that account with 100 dollars
balance. Now let’s go back to transaction 1 and run this update statement to subtract
10 dollars from account 1. Query OK So if we select account 1 in this transaction
1, we will see that the balance has been changed
to 90 dollars. But what if we run this same select statement
in transaction 2? It also sees the modified value of the balance:
90 dollars. Note that the transaction 1 is not committed
yet, but transaction 2 still sees the changes made
by transaction 1. So this is a dirty-read, and it happens because
we’re using read-uncommitted isolation level. Alright, now let’s commit these 2 transactions
and try a higher isolation level. This time, we will set the isolation level
to read committed instead. OK the isolation of this session has been
changed. Let’s start a new transaction with begin Then we do the same thing on the other session. Set the session isolation level to read committed. Check if the level has been updated successfully. And begin a new transaction. OK now let’s select all records to see the
current state of accounts table. Account 1’s balance is now 90 dollars, And the other 2 accounts both have 100 dollars. Just like what we did before, in transaction 2, let’s select account with ID 1. Then in transaction 1, we update that account’s
balance by subtracting 10 dollars from it. OK, the balance has successfully been changed
to 80 dollars in this transaction. Let’s see if this change is visible to transaction 2 or not. Now you can see that when we select account 1 in transaction 2, Its balance is still the same: 90 dollars
as before. This is because we’re using read-committed
isolation level, And since transaction 1 hasn’t been committed
yet, its written data could not be seen by other
transactions. So read-committed isolation level prevents
dirty read phenomenon. How about non-repeatable and phantom read? In transaction 2, let’s run another select
from accounts where balance is greater than or equal to
90 dollars Then go back to transaction 1 and commit it. OK now if we read account 1 again in transaction 2, we can see that the balance has been changed
to 80 dollars. So the same query that get account 1 returns
different value. This is non-repeatable read phenomenon. Also, if we rerun the query to get all accounts
with balance of at least 90 dollars, This time we only get 2 records instead of
3 as before, because the balance of account 1 has decreased
to 80 after transaction 1 was committed. The same query was executed, but a different
set of rows is returned, one row has disappeared due to other committed
transaction. This is called phantom-read phenomenon. So now we know that read-committed isolation
level can only prevent dirty read, but still allows non-repeatable read and phantom-read
phenomena. Let’s commit this transaction and move to
a higher level to see what will happen. Now I’m gonna set this session’s transaction
isolation level to repeatable read. Check it again just to be sure. And begin a new transaction. Then let’s copy this command and do the
same thing for the other session. Alright, now 2 transactions have been started. Let’s first select all accounts in transaction 1. Then select account with ID 1 in transaction 2. Also select all accounts with balance of at
least 80 dollars. This will be used to verify if the phantom
read still occurs or not. OK now back to transaction 1 and subtract
10 from its balance. Get all accounts to see their current state
in transaction 1. We can see the balance of account 1 has decreased
to 70 dollars. We know that dirty read has already been stopped
at the lower isolation level: read committed, so we don’t need to check it in this level So let’s commit this transaction 1. Then move to transaction 2 to see if it can read the new changes made by transaction
1 or not. Now this select query returns the old version
of account 1, with 80 dollars in balance, although transaction 1 has changed it to 70
and was committed successfully. That’s because the repeatable-read isolation
level ensures that all read queries are repeatable, Which means, it always returns the same result, even if there are changes made by other committed
transactions. Having said that, let’s rerun the query
that select accounts with at least 80 dollars As you can see, it still returns the same
3 records as before. So phantom read phenomenon is also prevented
in this repeatable-read isolation level. That’s great! However, I wonder what will happen if we run
this update query to change account 1’s balance in this transaction? Will it change the balance to 70, 60, or throw
an error? Let’s try it! There’s no error. So let’s run select query to get the updated
account 1. OK, so it is 60 dollars, which is the correct value because transaction
1 has already committed the change that modified the balance to 70 dollars before. However, from this transaction 2’s point
of view, it doesn’t make sense because in the last select query, it saw a
balance of 80 dollars, but after subtracting 10 dollars from the
account, now it get 60 dollars. The math doesn’t work here because this
transaction is still being interfered by concurrent updates from other transactions. I don’t know why MySQL chooses to implement
repeatable-read isolation level this way, But it would make more sense to just refuse
the change by raising an error in this case to ensure the consistency of the transaction
data. Later we will see that’s exactly the way
Postgres handles this type of concurrent updates in this isolation level. For now, let’s just rollback this transaction, And try to move on the highest isolation level
to see if this issue can be prevented or not. Alright, let’s set this session’s isolation
level to serializable. Confirm that it has really changed. And begin the first transaction. Then let’s do the same for the other session. Set the isolation level to serializable, Check it one more time, And begin the second transaction. OK, both transactions are started. Now let’s select all accounts in transaction
1. And select just account 1 in transaction 2. Next, go back to transaction 1 and subtract
10 more dollars from its balance. This time, interestingly, the update query
is blocked. So basically the select query in transaction
2 is blocking this update query in transaction 1. The reason is, in serializable isolation level, MySQL implicitly converts all plain SELECT
query to SELECT FOR SHARE. And a transaction that holds SELECT FOR SHARE only allows other transactions to READ the
rows, but not UPDATE or DELETE them. So with this locking mechanism, the inconsistent data scenario that we’ve
seen before is no longer possible. However, this lock has a timeout duration. So if the second transaction doesn’t commit or rollback to release the lock within that duration, we will see a lock wait timeout exceeded error
like this. Therefore, in this case, we must retry the
first transaction. So when you use serializable isolation level, make sure that you have implemented a transaction
retry strategy. OK now I’m gonna restart this transaction 1 Run the select query, Then update account 1’s balance. But this time, I’m not gonna let the lock
wait timeout occurs Let’s see what happen if transaction 2 also
try to update the same account 1’s balance. Here we go, a deadlock has occurred, because now transaction 2 also needs to wait
for a lock from transaction 1’s select query. So be aware that, beside lock wait timeout, you also need to take care of possible deadlock
situation. Now let’s try restarting both transaction. Select account 1 in transaction 1, Select account 1 in transaction 2, Update account 1’s balance in transaction 1. Then commit transaction 2. As you can see, the lock is released right
away, and the account 1’s balance has been updated
successfully. Alright, so far we have experienced all 4
isolation levels in MySQL and how they help preventing some read phenomena. Now let’s see how they work in Postgres. The effects will be quite similar, but there
will also be some differences. First, let’s start 2 postgres consoles on
these 2 terminal windows. In postgres, to get the current isolation
level, we run show transaction isolation level command By default, it is read committed. So 1 level lower than the default value in
MySQL. The way we change the isolation level is also
different. In MySQL, we set the whole session isolation
level before starting the transactions, But in Postgres, we can only set the isolation
level within the transaction, and it will only have effects on that 1 specific
transaction. So let’s begin the first transaction, And run set transaction isolation level read
uncommitted. Now if we show transaction isolation level, we can see that it has been changed to read
uncommitted. Let’s do the same thing on the other console. Begin the second transaction, Set the isolation level to read uncommitted, Then show the updated isolation level. OK, now in transaction 1, let’s select all
accounts. At the moment, there are 3 accounts with the
same balance of 100 dollars. In transaction 2, let’s select only account
with ID 1. Then go back to transaction 1 and update its
balance. The balance of account 1 has been changed
to 90 dollar here. Now we select that account again in transaction 2, It’s still 100 dollars! This is unexpected, because we’re using
read-uncommitted level, So transaction 2 should be able to see uncommitted
data of transaction 1, right? Well, in fact, if we look at the documentation
of Postgres, we can see that read-uncommitted in Postgres
behaves exactly the same as read-committed. So basically, we can say that Postgres only
have 3 isolation levels, and the lowest level is read-committed. It makes sense because normally we would never
want to use read-uncommitted in any circumstances. OK, so let’s go ahead and commit transaction
1. And select account 1 in transaction 2 one
more time. Now it sees the committed balance: 90 dollars,
as expected. Alright, let’s commit this transaction and
move to the next isolation level. I’m gonna start the first transaction, Set its isolation level to read committed, Show transaction isolation level to make sure
it’s been updated. Then do the same thing for the second transaction. Now just like before, let’s select all accounts
in transaction 1. Then select just account 1 in transaction 2, Beside dirty read phenomenon, we also want
to see how it handle phantom read, so let’s find all accounts where balance
is greater than or equal to 90 dollars. At the moment, all 3 records satisfy this
search condition. Now let’s go back to transaction 1 and subtract
10 dollars from account 1’s balance If we select account 1 in transaction 2, it will still be 90 dollars because transaction
1 is not committed yet. So dirty read is not possible in read-committed
isolation level. Let’s see what happen if we commit transaction 1. This time transaction 2 can see the updated
balance of 80 dollars. Now if we run the query to search for accounts
with at least 90 dollars again, We will only see 2 records instead of 3 as
before. The updated account 1’s balance no longer
satisfies the search condition, So it has disappeared from the result set. So phantom read has occured in this read-committed
isolation level, That’s the same behaviour as in MySQL. Let’s commit this transaction and move up
1 level. I’m gonna begin the first transaction, Then set its transaction isolation level to
repeatable read. OK, the level has been changed correctly. Let’s do the same thing for the second transaction. Begin the transaction, Set isolation level to repeatable read, Check it again, Alright, now let’s select all accounts in
transaction 1. Then select just account with ID 1 in transaction 2. Also search for the accounts with balance
of at least 80 dollars. Now go back to transaction 1 and subtract
10 more dollars from its balance. The balance has been updated to 70 dollars
in this transaction. Let’s commit it and see what will happen
in transaction 2. Now if we select account 1 in transaction 2, It’s still 80 dollars as before, although
transaction 1 has committed its change. That’s because we’re using repeatable
read isolation level, so the same select query should always return
the same result. Non-repeatable read phenomenon cannot happen
in this case. Also, if we rerun the query to search for
accounts with at least 80 dollars, we still get the same 3 records as before. So phantom read is also prevented in this
isolation level. Now I’m gonna try to run this update account
balance query to see how it behaves. In MySQL’s repeatable read isolation level, we have seen that it allows the balance to
be updated to 60 dollars, But here, in Postgres, we’ve got an error: could not serialize access due to concurrent update. I think throwing out an error in like this is much better than allowing the balance to be modified, because it avoid a confusing state, where the transaction saw
subtracting 10 from 80 produces 60. So kudos to Postgres! Until now we have encountered 3 types of phenomena: dirty read, non-repeatable read, and phantom
read. But we haven’t run into serialization anomaly
yet. So this time, let’s see how it’s gonna
look like. Let’s start a new transaction, And set its isolation level to repeatable-read. Then do the same for the second transaction. OK, here we have 2 running transactions with
repeatable-read isolation level. In transaction 1, let’s select all accounts
record. Now imagine that we have a use-case, where we have to compute the sum of all accounts’
balance then create a new account with that total
balance. So let’s run select sum(balance) from accounts. It's 270 dollars. Then we insert into accounts table a new record, where owner is ‘sum’, balance is 270,
and currency is USD. OK now if we select all accounts, we can see
the new record in this transaction. However, what if transaction 2 also wants
to perform this operation? Since we’re using repeatable-read isolation
level, The select query in transaction 2 will only
see the original list of accounts, without the new record that transaction 1
has just inserted. Therefore, it will get the same value for
the sum of accounts balance, 270 dollars And thus, end up inserting the same record
to the accounts table. OK now let’s commit both transactions to
see what will happen. They were both committed successfully. And there are 2 duplicate sum records with
the same balance of 270 dollars. This is a serialization anomaly. Why? Because if these 2 transactions are run serially,
one after another, Then there’s no way we can have 2 records
with the same sum of 270 like that. It doesn’t matter if transaction 1 or transaction
2 runs first, we should have 1 record of 270 dollars, and
another record of 540 dollars. OK so that’s how serialization anomaly occurs
in repeatable-read isolation level. Now let’s try the highest level: serializable
to see if this anomaly can be stopped or not. I’m gonna start a new transaction, Then set its isolation level to serializable. Check if it’s really changed. Then copy this statement, and do the same
thing for the second transaction. Now let’s select all accounts in transaction 1. Calculate the sum of all balances. And insert a new account with balance equals
to this sum. Now as you can see, a new sum record of 810
dollars has been inserted in transaction 1. Let’s go to transaction 2 and run the same
series of query. After this select query, we can see that the list of accounts in both transactions are identical. Let’s try to commit both of them. The first transaction is committed successfully. However, the 2nd transaction throws an error: could not serialize access due to read/write
dependencies among transactions. And Postgres gives us a hint that the transaction
might succeed if we retry it. So this is good! The serializable anomaly is completely prevented. The 2 concurrent transactions no longer create
duplicate records like they did before. We can conclude that Postgres uses a dependencies
checking mechanism to detect potential read phenomena and stop
them by throwing out an error. MySQL, on the other hand, chooses to use locking
mechanism to achieve similar result. Let’s see how it handle serialization anomaly! Here are two MySQL consoles that we were working
on, Let’s set this session’s transaction isolation
level to serializable. Select it to make sure it has been changed. And start the first transaction. Then let’s do the same thing in the other
session to start the second transaction. Now back to transaction 1, Select all accounts record, Compute the sum of all accounts’ balance. And insert a new record with that sum into
the accounts table. Select all accounts again to confirm that
the new record is really inserted. Then switch to transaction 2 and run the query
to select all accounts. As you can see, this query is blocked and it needs to wait for transaction 1 to
release the lock before continue. As soon as we commit transaction 1, The lock is released, and transaction 2 get
the result of its query immediately. Now we can continue running the sum and insert
query in this transaction. And finally commit it. There’s no duplicate sum records. So MySQL has also successfully prevented the
serialization anomaly with its locking mechanism. Now I want to try different order of the queries
in these 2 transactions. First begin transaction 1, Select all accounts record, Select sum of all accounts’ balance. Then bein transaction 2, Also select all accounts, And calculate the sum of all accounts’ balance. This time, both transactions are having the
same sum of 1040 dollars. Let’s insert a new account with this sum
in transaction 1. It’s blocked because transaction 2 is
holding a share lock that prevent other transactions from updating. Now if we try to insert a new sum account
in transaction 2, We will get a deadlock, because in this case,
both transactions have to wait for each other. And since transaction 2 failed due to the
deadlock, The lock is immediately released, which allow
transaction 1 to complete its insert query. So after commiting transaction 1, We can see that a new sum account is successfully
inserted. The database stays consistent
with no serialization anomaly. Now before we finish, let’s do a quick summary of the relationship between isolation levels and read phenomena
in MySQL and Postgres. In MySQL, the lowest isolation level read
uncommitted allows all 4 phenomena to occur. While the next level: read committed only
prevents dirty read. The rest 3 phenomena are still possible. The repeatable read level in MySQL stops the
first 3 phenomena: dirty read, non-repeatable read, and phantom
read, But it still spares serialization anomaly,
and even some inconsistent concurrent updates. The highest isolation level: serializable
is the most strict. It prevents all 4 phenomena. Thanks to the locking mechanism. The isolation levels in Postgres produces
quite similar result, However, there are still some major differences: First, the read-uncommitted isolation level
behaves the same as read-committed. And second, Postgres doesn’t use locking
mechanism as MySQL, but it uses a better dependencies detection
technique to stop non-repeatable read, inconsistent
concurrent updates, and serialization anomaly. Also the default isolation level in Postgres
is only read-committed, while it is repeatable-read in MySQL. The most important thing you should keep in
mind when using high isolation level is that there might be some errors, timeout, or even
deadlock, thus, we should carefully implement a retry
mechanism for our transactions. Also, each database engine might implement
isolation level differently, So make sure that you have read its documentation
carefully, and tried it on your own first before jumping
into coding. I will put the links to the official documentation
about isolation level of MySQL and Postgres in the description,
in case you want to have refererence. And that’s all for today’s video. I really hope it’s helpful for you. Thanks a lot for watching and see you in the
next lecture!