How do Transactions Work? | Get to know Cloud Firestore #8

Captions
TODD KERPELMAN: So Cloud Firestore doesn't exist in a nice, pristine world where your database requests march in and out in an orderly fashion, like British people properly queuing at their favorite chip shop. Instead, it's chaos. You've got requests coming in from all sides, all trying to access or change the same document all at once, sort of like Americans on Black Friday when there are TVs on sale. And you have to make sense of that chaos. Luckily for you, you have a very powerful tool at your disposal to bring order to this chaos. And that's transactions and, to a lesser extent, their cousin, batched writes. So if you want to avoid weird race conditions in your code, these are the tools you're going to want to master. Let's find out how on today's episode. Hey, wait a minute. That was mine. I was going to own-- [MUSIC PLAYING]

So let's talk about transactions. Generally speaking, you're going to use transactions in cases where you want to perform a bunch of operations on the database, but you need them to be performed all at once, meaning that I don't want the database to change out from under me while I'm in the middle of making my changes. And if I'm making a bunch of changes, I don't want users to be able to read my set of documents in that partially changed state. And I certainly don't want a set of operations failing halfway through, leaving me in a weird and inconsistent state forever. It's got to be all or nothing.

For example, let's say I'm changing a value that exists in a couple of places in the database. This is something you're going to be doing pretty often in our de-normalized database world, where you might have redundant data in a couple of places. Like, let's say I've got our literary app, where I've got a collection of books and a collection of authors. And it just so happens that I have the author's name recorded in a few places, both here in the author document, but also here in each of the books. Now, let's say we hear from Charles Dickens' literary agent that he wants to re-brand himself a little bit-- you know, to stay relevant. So I could try to perform these changes as just three or four different write requests in a row. But if my app crashes or my flaky network connection disconnects midway through making these requests, I'll be in a state where some of my database has the old value and some of my database has the new value. And that could mess up my app in all sorts of insidious ways.

And heck, even if my client doesn't die, if another client happens to read in these documents while I'm in the middle of making these writes, we could end up in the same situation. And what would happen if another client or Cloud Function were also trying to change my author's name at the same time? If all these writes were happening one after the other, I could end up in a situation where half the documents contain my changes and the other half contain theirs.

Or let's take another example. Say I've got a role-playing game, and I'm trying to buy an item from another player. Well, we'll need to remove gold from my inventory and give it to the other player. But then we'll also need to remove her plus-3 pen of accounting or whatever and add it to my inventory. But what would happen if this series of events got interrupted for some reason? Depending on when or how it fails, I could end up with no gold and no pen, or all the gold and a pen, or even weirder cases where both of us have a copy of the pen.
And I'm sorry, but I can't respect any MMO that doesn't obey the laws of conservation of mass. Hey, buddy, where'd you get all those pens? Are those really yours? Whoa, whoa, whoa, don't write on me. That's a plus-3 pen. My stain remover's only plus-2.

So all of these cases can be handled by transactions to ensure there's no funny business going on. Now, there are two types of transactions you can perform in the database. The first one is a much simpler version, known as a batched write. Basically, a batched write lets you perform a bunch of writes-- and these can be document creations, deletions, alterations, what-have-you-- all at once in a batch. You tell the SDK what changes you want to make, it tells the database, and the database will apply your changes to all the documents that you've specified. If any part of your batched write fails along the way, none of it will go through. The entire batch will be rolled back to its original state. This is known as being atomic in the database world, which is way less exciting than being atomic in the superhero world, but still useful.

The point here is that you don't have to worry about your write failing halfway through, and you don't have to worry about somebody else's writes getting mixed up with yours. Sure, if there's another client trying to make these changes at the same time with their own batched write, they could totally blow away your changes after your write is done. But this weird hybrid state where you end up with a mix of my changes and their changes all at once-- that won't happen.

The really nice part of all this, by the way, is that batched writes are generally easy for the database to perform. In fact, in many ways, a batched write is actually more efficient than trying to make a bunch of individual write calls. And the implementation is pretty straightforward, too. It's not much different from performing any other database write.

So when would you want to use these? Well, basically anytime you want to change a bunch of related documents and you don't really care what the old values were. One case might be when you have de-normalized data living in multiple places in your database and you want to change it all at once. So looking back at our author and book example, this could work. If we know we need to change our author's name in just these places, and our new value isn't really reliant on the old value, well, then we could send up these four writes in one nice, little batch-- there's a code sketch of exactly this coming up in a moment.

But now, what about our role-playing game, where our two players want to trade items? It seems like what we could do here is create a batched write where I get a plus-3 pen of accounting and 100 less gold, and my trading partner gets 100 gold and no pen. But here's the problem-- this batched write would only work if I know my copy of the data is up to date. And it might not be. Maybe I don't have real-time listeners set up, so I'm actually working off of data that's, like, 10 minutes old. Even if I do have real-time listeners set up, there are race conditions that could happen where the data that's on my client isn't quite up to date with what's on the server. For example, let's say I decide to send down this write to perform this trade. But in the meantime, some other event in the game has awarded me 1,000 gold coins. Awesome, right? Well, it would be, until my trade comes along and sets my gold value equal to 200 coins. And just like that, I'm out 1,000 gold pieces.
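To make that author-and-book batch concrete, here's a minimal sketch using the JavaScript web SDK. The collection names, document IDs, and field names are all made up for illustration, and `db` is assumed to be a `firebase.firestore()` instance.

```js
// Queue up all four renames in a single atomic batch.
const batch = db.batch();

const newName = 'Chuck D.';
batch.update(db.collection('authors').doc('charles_dickens'), { name: newName });
batch.update(db.collection('books').doc('oliver_twist'), { authorName: newName });
batch.update(db.collection('books').doc('great_expectations'), { authorName: newName });
batch.update(db.collection('books').doc('a_tale_of_two_cities'), { authorName: newName });

// Commit atomically: either all four documents change, or none do.
batch.commit()
  .then(() => console.log('Author renamed everywhere'))
  .catch((err) => console.error('Batch failed and was rolled back:', err));
```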
And stale data can bite in the other direction, too. Maybe my partner has already traded away her plus-3 pen to somebody else, but my trade function doesn't know that yet. So I still get the plus-3 pen. And hey, guess what? My players have just discovered an exploit where they can keep creating an infinite number of plus-3 pens in the world. And don't think that there aren't clever players out there who are looking for logic issues just like this and chomping at the bit to exploit them.

So in this situation, we're going to want to turn to a transaction. A transaction is like a batched write, but we tell the database that we want to read in some documents first, to make sure we're working with the most up-to-date data. And we're going to assume that these documents aren't going to change until the transaction is complete. Now, the code for creating a transaction looks a little strange at first, and I encourage you to check out the documentation for a sample or two. But once you've written a couple of these things, you'll get more comfortable with it, and it'll be second nature.

So the way transactions work is something like this. First, in your transaction, your client is going to read in a document or two from the database. Next, your client is probably going to perform a small amount of logic based on the data you get back from that read. Third, you will then tell the database exactly what changes you want to make based on this data. Then the database will check and see: have any of the relevant documents-- either the ones I've read or the ones I'm about to write to-- changed since I started my transaction? If nothing's changed, it'll go ahead and commit all these changes at once, which is an almost instantaneous step. But if something has changed, we're going to go back to step 1. The client is going to re-fetch those documents, and it's going to try the entire transaction all over again. And we repeat this process either until the transaction finally succeeds, or we've retried it too many times and the transaction fails.

Now, this process might seem strange to you if you're coming from other databases, where a transaction typically means that a portion of the database is locked and nobody can make any changes until that transaction is complete. That strategy certainly can work, too. And in fact, our server libraries do work that way, and I'll talk about that more in a little bit. But when you're dealing with mobile clients that might suddenly lose battery or drop network connections or go into airplane mode, locking your database is often a bad idea. You don't want to be in a situation where your database is locked and waiting for some final bit of code from the client that it's never going to get. Instead, we don't lock anything and just retry the entire transaction if the database sees that something has changed.

Now, this is a strategy known as optimistic concurrency control. It's based on the idea that most of the time, our transaction will probably work just fine, so let's optimize for that case. And in the rare case where it doesn't work out, we can go through the hassle of redoing the entire thing. And if you think about it, this kind of makes sense. The whole NoSQL model is based on the theory that reads happen way more often than writes. And in fact, given some of the limits on how often you can write to the same document, that is probably going to be true. So maybe my Black Friday analogy wasn't so good. Maybe it's more like a Black Friday sale where copies of cassette tape players are on sale.
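Here's a minimal sketch of that read-then-write loop in code, using the player trade as the example. Again, this uses the JavaScript web SDK, and every collection name, document ID, and field name here is an assumption for illustration.

```js
const myRef = db.collection('players').doc('my_player_id');
const herRef = db.collection('players').doc('her_player_id');

db.runTransaction(async (transaction) => {
  // Step 1: read the freshest copies of both inventories.
  const mySnap = await transaction.get(myRef);
  const herSnap = await transaction.get(herRef);
  const me = mySnap.data();
  const her = herSnap.data();

  // Step 2: a little logic against that up-to-date data.
  if (me.gold < 100) throw new Error('Not enough gold');
  if (!her.items.includes('pen_plus_3')) throw new Error('Pen already gone');

  // Step 3: describe the writes. Nothing is committed yet-- if either
  // document changes before the commit, this whole function reruns.
  transaction.update(myRef, {
    gold: me.gold - 100,
    items: [...me.items, 'pen_plus_3'],
  });
  transaction.update(herRef, {
    gold: her.gold + 100,
    items: her.items.filter((item) => item !== 'pen_plus_3'),
  });
}).then(() => {
  // Side effects like pop-up dialogs belong here, in the completion
  // handler, not inside the (possibly retried) transaction function.
  console.log('Trade complete!');
}).catch((err) => console.error('Trade failed:', err));
```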
As for those cassette players-- I'm sure you might get a few small bursts of interest from hipsters buying them ironically, but it's not quite the madhouse you were expecting. Yeah. Yeah, you should probably diversify, dude.

All right. So let's walk through how that trade transaction plays out. First off, our transaction requests the documents representing the inventory of both of our players. Next, it performs a little logic to make sure this trade is still possible. Do I have enough gold to make the trade? Does my partner still have the pen? Then we perform a couple of writes on our transaction, based on the idea that the data we have on our two players is completely up to date. That gets sent down to the database, which does a sanity check to make sure that nothing has changed since our first read, and then it goes ahead and commits the changes all at once. So those weird edge cases are avoided. Like I said, if some Cloud Function comes in and gives me 1,000 more gold pieces before this transaction is complete, well, hey, guess what? Our database is going to notice that, retry the entire process, and make sure that extra gold doesn't get lost.

Now, there are a few additional rules and guidelines about transactions that you should know about. First, you have to perform all of your reads before you can do any writes. And on the mobile and web clients, these have to be individual document reads, not queries. Second, because your client might retry your transaction multiple times, make sure there aren't any side effects in your transaction code. Don't increment any variables in the middle of your transaction, or show pop-up dialogs, or anything like that. Do all of that work in your completion handler. Third, because your transaction will basically have to rerun itself anytime any one of these documents changes, don't go overboard with the number of documents you're trying to fit into a single transaction. Basically, you want to involve as many documents in a transaction as makes sense logically. But don't try to add other, unrelated documents into the same transaction thinking that you're being super efficient. You're not. Fourth, transactions will fail if you're offline, which kind of makes sense if you think about it. By writing a transaction, you're saying, hey, this write is totally dependent on having up-to-date information from the database. And that's probably not going to be the case when you're offline. Finally, you can only write to 500 documents at a time, and this limit applies both to transactions and to batched writes.

So what are good examples of transactions? Well, it's funny-- a few weeks ago, I would have said that anytime you want to increment or decrement a value, you'd need to do that in a transaction, because by definition, incrementing a value requires that you have the most up-to-date data. But the Cloud Firestore team just recently released the ability to do exactly that-- numerically increment or decrement a value in the database-- without having to resort to a transaction. That'll make your life a lot easier, and I recommend checking out the documentation for more info on these fun new functions. So looking at our MMORPG, if I only wanted to award a player some gold coins for completing a quest, well, that's something I could now do using this fancy, new numeric add function.
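That new operator is worth a quick sketch of its own. Assuming the same made-up `players` collection from before, awarding quest gold no longer needs a read at all:

```js
// FieldValue.increment() applies the addition atomically on the server,
// so there's no need to read the old value first.
db.collection('players').doc('my_player_id').update({
  gold: firebase.firestore.FieldValue.increment(100),
});
```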
On the other hand, if you need to perform any other logic around these changes, that still might need to be done in a transaction. For example, if my user wants to spend some of that gold, well, I'd need to put that into a transaction-- partly because spending gold while simultaneously adding the item to my inventory is probably something you'd need a transaction for, but also because I need some logic to make sure my user isn't spending more money than they have. And that's not something I can do with the numeric add function. It will happily let me have negative gold. And frankly, in any other situation where you want to transfer data between documents, you'll probably want to put that into a transaction. The example everyone always uses for a transaction is when you've got a banking database and a customer looking to transfer money from one account to another. You can kind of see why this would be important to handle atomically.

OK. So up until now, I have generally been focused on transactions from the perspective of the client libraries-- mobile devices, web pages, and the like. But you can also write transactions from the server. And yes, this also includes Cloud Functions. Now, in general, these server-side transactions work very similarly to how the client libraries work, but with two big differences. First, because your typical server doesn't go through a tunnel or go into airplane mode or run out of battery, we use the more traditional document-locking technique instead of the optimistic concurrency model. Second, we're able to run full-blown queries during the read part of a transaction, not just read individual documents. And that's often useful for cases where you need to handle de-normalized data.

So let's see if we can apply what we've learned to my restaurant review app. Are there any cases where I'd want to use either a transaction or a batched write? Well, one good example might be updating our average score for a restaurant. Let's say that I'm keeping track of my average score for each restaurant by storing a running average and a total number of reviews for that restaurant. When somebody adds a new review, I have a Cloud Function that grabs this average and the total number of reviews, and then calculates the new values based on our incoming review. Well, with both of these operations, it's important that, A, I have the latest and greatest data, and, B, I don't have multiple processes updating these values at the same time. So this seems like a pretty clear opportunity to use a transaction.

A second, maybe slightly contrived, example: let's say San Francisco, in a bold new marketing move, officially decides to change its name to San Funcisco. Come on, guys. This is my greatest branding idea since Funyuns and Funfetti. What do you mean, "I'm a fun trick pony?" How would I change that city name for every restaurant listed in the database? Well, this one's interesting. If I were using the server SDKs, I could create a transaction where I would perform a query for all restaurants where city equals San Francisco. Then I could iterate through the entire results of the query and build out a bunch of write statements to change the city name in each of these documents. And then I could get everything done in one nice transaction.

But there are two big concerns here. First, remember that transactions are limited to 500 document writes total, and San Francisco probably has a lot more than 500 restaurants. So I would need to run this transaction multiple times.
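Here's a rough sketch of what those repeated passes might look like with the Admin SDK in a Cloud Function or other Node environment. The collection name, field name, and batch size are assumptions; the key point is that a server-side transaction can read from a query.

```js
const admin = require('firebase-admin');
const db = admin.firestore();

async function renameCity(oldName, newName) {
  let migrated;
  do {
    migrated = await db.runTransaction(async (t) => {
      // Server-side transactions may read whole queries, not just
      // individual documents. The limit keeps each pass well under the
      // 500-write cap and locks fewer documents at a time.
      const snapshot = await t.get(
        db.collection('restaurants').where('city', '==', oldName).limit(50)
      );
      snapshot.docs.forEach((doc) => t.update(doc.ref, { city: newName }));
      return snapshot.size;
    });
  } while (migrated > 0); // Keep going until a pass finds nothing left.
}

renameCity('San Francisco', 'San Funcisco');
```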
And I could totally do that-- just like in the sketch above, I'd keep track of how many documents I get back from each query, and I'd repeat the transaction until that number is zero. Now, yes, I do lose some of the benefits of a single transaction by breaking this up into several operations. Specifically, there will be a small window of time where a client could see half of my restaurants in San Francisco and the other half in San Funcisco. But you know what? I think that's OK in my specific case. And I'll be honest-- anytime you're migrating a huge number of documents like this, you're probably going to need to do it in smaller chunks. You can't migrate your entire database all at once.

The second issue is that this transaction is going to lock down a large number of documents at once. Remember, server-side transactions go with the more traditional locking strategy, and locking so many documents at once might slow down writes to my database if I'm not careful. Now, I can help address this by adding a limit to my query, which is what the sketch above does. If I limit my query results to, say, 50 items, then I'm locking down fewer documents at once. But I'll be running this transaction many more times, so it might take a bit longer to completely migrate all of my documents. It's a bit of a trade-off, and you can decide what makes the most sense for you.

So there you go. Hopefully, this gives you a better understanding of transactions and batched writes and when you'd want to use them-- which is probably more often than you originally thought. So give your app another look and see if you find any cases where you might want to convert a normal update into a transaction. Once you get the hang of writing a couple of these, it certainly gets easier.

So thanks for watching. I will see you soon on another episode of "Get to Know Cloud Firestore." But if you'll excuse me, that guy still has my TV, and I'm going to get it back. Hey. Hey, you there. I need that TV. For watching TV on. Well, well, yeah, I have one in my living room, but I don't have one in my bathroom. [MUSIC PLAYING]
Info
Channel: Firebase
Views: 68,425
Rating: 4.9605265 out of 5
Keywords: How do transactions work, Cloud Firestore, transactions, avoid race conditions, avoid strange inconsistencies, database application, database, what is a transaction, how do transactions works, client side transaction, server side transaction, Cloud Firestore transactions, batch writes, Get to Know Cloud Firestore, Firebase, Firebase developers, Firebase devs
Id: dOVSr0OsAoU
Length: 16min 9sec (969 seconds)
Published: Wed Apr 17 2019