TODD KERPELMAN:
So Cloud Firestore doesn't exist in a nice,
pristine world where your database requests march in
and out in an orderly fashion, like British people properly
queuing at their favorite chip shop. Instead, it's chaos. You've got requests
coming in from all sides, all trying to access or change
the same document all at once, sort of like Americans
on Black Friday when there are TVs on sale. And you have to make
sense of that chaos. Luckily for you, you have a very
powerful tool at your disposal to bring order to this chaos. And that's transactions and,
to a lesser extent, their cousin, the batched write. So if you want to avoid weird
race conditions in your code, these are the tools you're
going to want to master. Let's find out how
on today's episode. Hey, wait a minute. That was mine. I was going to own-- [MUSIC PLAYING] So let's talk
about transactions. Generally speaking, you're
going to use transactions in cases where you want to
perform a bunch of operations on the database,
but you kind of need them to be performed
all at once, meaning that I don't want the database
to change out from under me while I'm in the middle
of making my changes. And if I'm making
a bunch of changes, I don't want users
being able to read my set of documents in that
partially changed state. And I certainly don't want a set
of operations failing halfway through, leaving me in a weird
and inconsistent state forever. It's got to be all or nothing. For example, let's
say I'm changing a value that exists in a couple
of places in the database. This is something you're
going to be doing pretty often in our de-normalized
database world where you might have redundant
data in a couple of places. Like, let's say I've
got our literary app, where I've got a
collection of books and a collection of authors. And it just so happens that I
have the author's name recorded in a few places, both here
in the author document, but also here in
each of the books. Now, let's say we hear from
Charles Dickens' literary agent that he wants to re-brand
himself a little bit-- you know, to stay relevant. So I could try to
perform these changes as just three
or four different write requests in a row. But if my app crashes or
my flaky network connection disconnects midway through
making these requests, I'll be in a state where some
of my database has the old value and some of my database
has the new value. And that could mess up my app
in all sorts of insidious ways. And heck, even if my
client doesn't die, if another client happens
to read in these documents while I'm in the middle
of making these writes, we could end up in
the same situation. And what would happen if
another client or cloud function were also trying
to change my author's name at the same time? If all these writes were
happening one after the other, I could end up in a situation
where half the documents contain my changes and the
other half contain theirs. Or let's take another example. Say I've got a
role-playing game. And I'm trying to buy an
item from another player. Well, we'll need to remove
gold from my inventory and give it to the other player. But then we'll
also need to remove her plus-3 pen of
accounting or whatever and add it to my inventory. But what would happen
if this series of events got interrupted,
for some reason? Depending on when
or how it fails, I could end up with
no gold and no pen, or all the gold and a
pen, or even weirder cases where both of us have
a copy of the pen. And I'm sorry, but
I can't respect any MMO that doesn't obey the
laws of conservation of mass. Hey, buddy, where'd
you get all those pens? Are those really yours? Whoa, whoa, whoa,
don't write on me. That's a plus-3 pen. My stain remover's only plus-2. So all these cases can be
handled by transactions to ensure there's no funny
business going on here. Now, there are two types
of transactions you can perform in the database. The first one is a
much simpler version, known as a batched write. Basically, a batched write lets
you perform a bunch of writes. And these can be document
creations, deletions, alterations, what-have-you,
all at once in a batch. You tell the SDK what
changes you want to make. It tells the database. And the database will apply your
changes to all the documents that you've specified. If any part of your batched
write fails along the way, none of it will go through. The entire batch will be rolled
back to its original state. This is known as being atomic
in the database world, which is way less exciting than being
atomic in the superhero world, but still useful. The point here is
that you don't have to worry about your write
failing halfway through. And you don't have to worry
about somebody else's writes getting mixed up with yours. Sure, if there's
another client trying to make these changes
at the same time with their own
batched write, they could totally blow
away your changes when your write is done. But this weird hybrid
state where you end up with a mix of my changes and
their changes all at once-- that won't happen. The really nice part of
all this, by the way, is that batched writes are
generally easy for the database to perform. In fact, in many
ways, a batched write is actually more
efficient than trying to make a bunch of
individual write calls. And the implementation is
pretty straightforward, too. It's not much
different from performing any other database write. So when would you
want to use these? Well, basically anytime you want
to change a bunch of related documents and you
don't really care about what the old value was. One case might be when you have
your de-normalized data living in multiple places
among your database and you want to change
them all at once. So looking back at our
author and book example, this could work. If we know we need to change
our author's name in just these places and our
new value isn't really reliant on the old value,
well, then we could send up these four writes in
one nice, little batch. But now, what about
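If it helps to see those mechanics spelled out, here's a toy sketch in Python -- not the Firestore SDK, and every name in it is invented for illustration -- of a batch that applies all of its staged writes or none of them:

```python
# Toy model of a batched write: stage changes, then commit atomically.
# (Illustration only -- not the real Firestore SDK or its API.)

class ToyBatch:
    def __init__(self, db):
        self.db = db          # db is just a dict of document-path -> fields
        self.staged = []      # (path, field, value) writes queued up

    def update(self, path, field, value):
        self.staged.append((path, field, value))

    def commit(self):
        # Validate everything first; if any write is bad, nothing is applied.
        for path, _, _ in self.staged:
            if path not in self.db:
                raise KeyError(f"no such document: {path}")
        for path, field, value in self.staged:
            self.db[path][field] = value

# Denormalized author name in four places, updated in one batch.
db = {
    "authors/dickens": {"name": "Charles Dickens"},
    "books/twocities": {"author": "Charles Dickens"},
    "books/expectations": {"author": "Charles Dickens"},
    "books/carol": {"author": "Charles Dickens"},
}
batch = ToyBatch(db)
for path in db:
    field = "name" if path.startswith("authors/") else "author"
    batch.update(path, field, "Chuck D.")
batch.commit()
```

The real SDKs have the same shape: create a batch object, queue up your writes, then commit once.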
our role-playing game where our two players
want to trade items? It seems like what
we could do here is create a batched
write where I get a plus-3 pen of
accounting and 100 less gold and my trading partner
gets 100 gold and no pen. But here's the problem-- this batched write
would only work if I know the copy of
my data is up to date. And it might not be. Maybe I don't have
real-time listeners set up. So I'm actually
working off of data that's, like, 10 minutes old. Even if I do have
real-time listeners set up, there are race
conditions that could happen where the data
that's on my client isn't quite up to date
with what's on the server. For example, let's say I
decide to send down this write to perform this trade. But in the meantime, some
other event in the game has awarded me 1,000 gold coins. Awesome, right? Well, it would be until my
trade comes along and sets my gold value
equal to 200 coins. And just like that, I'm
out 1,000 gold pieces. Or maybe my partner has already
traded away her plus-3 pen to somebody else. My trade function
doesn't know that yet. So I still get the plus-3 pen. And hey, guess what? My players have just
discovered an exploit where they can keep creating an
infinite number of plus-3 pens in the world. And don't think that there
aren't clever players out there who are looking for logic
issues just like this and champing at the
bit to exploit them. So in this situation,
we're going to want to turn
to a transaction. A transaction is
like a batched write, but we tell the database that we
want to read in some documents first to make sure we're working
with the most up-to-date data. And we're going to assume
that these documents aren't going to change until the
transaction is complete. Now, the code for
creating a transaction, it looks a little
strange at first. And I encourage you to
check out the documentation for a sample or two. But once you've written
a couple of these things, you'll get more comfortable with
it, and it'll be second nature. So the way transactions
work is something like this. First, in your
transaction, your client is going to read in a document
or two from the database. Next, your client
is probably going to perform a small
amount of logic, based on the data you
get back from that read. Third, you will then
tell the database exactly what changes you want
to make, based on this data. Then the database
will check and see, have any of these
relevant documents, either the ones I've read
or are about to write to, changed since I
started my transaction? Now, if nothing's
changed, it'll go ahead and commit all these
changes at once, which is an almost instantaneous step. But if something
has changed, we're going to go back to step 1. The client is going to
re-fetch those documents. And it's going to try the entire
transaction all over again. And we repeat this
process either until the transaction
finally succeeds or we've retried
it too many times and the transaction fails. Now, this process might
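That read-check-commit-or-retry loop can be sketched in a few lines of toy Python. The client SDKs run this loop for you; the store layout and names here are invented for illustration:

```python
# Toy optimistic-concurrency loop: read a versioned document, run some
# client-side logic, and commit only if the version is unchanged;
# otherwise retry from the read. (Not the Firestore SDK.)

MAX_ATTEMPTS = 5

def run_transaction(store, path, update_fn):
    for _ in range(MAX_ATTEMPTS):
        version, data = store[path]                # 1. read the document
        new_data = update_fn(dict(data))           # 2. client-side logic
        if store[path][0] == version:              # 3. anything changed?
            store[path] = (version + 1, new_data)  # 4. commit all at once
            return new_data
        # Something changed under us -- loop back and retry from the read.
    raise RuntimeError("transaction failed after too many retries")

store = {"players/todd": (1, {"gold": 100})}
result = run_transaction(store, "players/todd",
                         lambda d: {**d, "gold": d["gold"] + 50})
```

If another writer bumps the document between step 1 and step 4, the version check fails and the whole function runs again against fresh data -- which is exactly why your transaction code shouldn't have side effects.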
seem strange to you if you're coming
from other databases where a transaction
typically means that a portion of the
database is locked, and nobody can make any changes
until that transaction is complete. That strategy certainly
can work, too. And in fact, our server
libraries do work that way. And I'll talk about that
more in a little bit. But when you're dealing
with mobile clients that might suddenly lose battery
or drop network connections or go into airplane mode,
locking your database is often a bad idea. You don't want to
be in a situation where your database
is locked and waiting for some final bit of
code from the client that it's never going to get. Instead, we don't lock
anything and just retry the entire transaction
if the database sees that something has changed. Now, this is a strategy known as
optimistic concurrency control. It's based on the idea that most
of the time, our transaction will probably work just fine. So let's optimize for that case. And in the rare case
where it doesn't work out, we can go through the hassle
of redoing the entire thing. And if you think about it,
this kind of makes sense. The whole NoSQL model is based
on the theory that reads happen way more often than writes. And in fact, given
some of the limits on how often you can write
to the same document, that is probably
going to be true. So maybe my Black Friday
analogy wasn't so good. Maybe it's more like
a Black Friday sale where copies of cassette
tape players are on sale. I'm sure you might get a
few small bursts of interest from hipsters buying
them ironically, but it's not quite the
madhouse you were expecting. Yeah. Yeah, you should
probably diversify, dude. All right. So let's go back to
our trade example and think about how
we'd want to accomplish this with a transaction. I think, first off,
our transaction would request the documents
representing the inventory from both of our players. Next, it would
perform a little logic to make sure this trade
is still possible. Do I have enough gold
to make the trade? Does my partner
still have the pen? Then we'd perform a couple
of writes on our transaction, based on the idea that the data
we have here on our two players is completely up to date. That gets sent down to the
database, which would then do a sanity check to make
sure that nothing has changed since our first read. And then it would go ahead and
commit the changes all at once. So those weird edge
cases are avoided. Like I said, if some Cloud
Function comes in and gives me 1,000 more gold pieces before
this transaction is complete, well, hey, guess what? Our database is
going to notice that. And it's going to retry
the entire process and make sure that extra
gold doesn't get lost. Now, there are a few
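Here's roughly what the body of that trade looks like as toy Python (invented names; a real Firestore transaction would wrap this same read-validate-write logic and retry it if either inventory document changed mid-flight):

```python
# Toy version of the trade: read both inventories, validate against the
# freshest data, then write both halves together -- never just one.
# (Illustration only, not the Firestore SDK.)

PRICE = 100

def trade(buyer, seller, item):
    # Validate using current data, not a stale local copy.
    if buyer["gold"] < PRICE:
        raise ValueError("buyer can't afford it")
    if item not in seller["items"]:
        raise ValueError("seller no longer has the item")
    # Apply both sides of the trade as one unit.
    buyer["gold"] -= PRICE
    seller["gold"] += PRICE
    seller["items"].remove(item)
    buyer["items"].append(item)

me = {"gold": 250, "items": []}
partner = {"gold": 40, "items": ["plus-3 pen of accounting"]}
trade(me, partner, "plus-3 pen of accounting")
```

Note that a failed validation raises before anything is touched, so a rejected trade leaves both inventories exactly as they were -- no duplicated pens, no vanished gold.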
additional rules and guidelines about transactions that
you should know about. First, you have to
perform all of your reads before you can do any writes. And on the mobile
and web clients, these have to be individual
document reads, not queries. Second, because your client
might retry your transaction multiple times, make sure
there aren't any side effects in your transaction code. Don't increment any variables in
the middle of your transaction or show pop-up dialogs
or anything like that. Do all of that work in
your completion handler. Third, because your
transaction will basically have to rerun itself anytime any
one of these documents changes, don't go overboard
with the number of documents you're trying to
fit into a single transaction. Basically, you want to
involve as many documents into a transaction as
makes sense logically. But don't try to add in
other unrelated documents into the same transaction,
thinking that you're being super efficient. You're not. Fourth, transactions will
fail if you're offline, which kind of makes sense,
if you think about it. By writing a transaction,
you're saying, hey, this write is totally
dependent on having up-to-date information
from the database. And that's probably
not going to be the case when you're offline. Finally, you can only
write to 500 documents at a time with a transaction. And this applies
both to transactions and to batched writes. So what are good
examples of transactions? Well, it's funny,
a few weeks ago, I would have said that anytime you
want to increment or decrement a value, you'd need to
do that in a transaction, because by definition,
incrementing a value basically requires that you have
the most up-to-date data. But the Cloud
Firestore team just recently released the
ability to do exactly that-- numerically increment
or decrement a value in the
database without having to resort to a transaction. That'll make your
life a lot easier. And I recommend checking out
the documentation for more info on these fun new functions. So looking at our MMORPG, if I
only wanted to award a player some gold coins for
completing a quest, well, that's something
I could now possibly do, using this fancy,
new numeric add function. On the other hand, if you need
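In the web SDK, that numeric add is `FieldValue.increment()`. To see why a server-applied delta dodges the race that a plain read-modify-write hits, here's a toy comparison (plain Python, not the SDK):

```python
# Why a server-side increment avoids the race: two clients that each do
# read-modify-write can overwrite each other, while clients that just
# send a delta for the server to apply cannot. (Toy model only.)

def lost_update(store):
    a = store["gold"]          # client A reads 100
    b = store["gold"]          # client B reads 100 at the same time
    store["gold"] = a + 50     # A writes back 150
    store["gold"] = b + 50     # B writes back 150 -- A's reward is lost

def with_increment(store, deltas):
    # Each client just says "add 50"; the server applies every delta.
    for d in deltas:
        store["gold"] += d

doc = {"gold": 100}
lost_update(doc)               # ends at 150, not 200

doc2 = {"gold": 100}
with_increment(doc2, [50, 50]) # ends at 200
```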
to perform any other logic around these changes,
that still might need to be done in a transaction. For example, if my user wants
to spend some of that gold, well, I'd need to put
that into a transaction, partly because spending gold
while simultaneously adding the item to my inventory
is probably something you'd need a transaction
for, but also because I need
some logic to make sure my user isn't spending
more money than they have. And that's not
something I can do within the numeric add function. It will happily let
me have negative gold. And frankly, any
other situation where you want to transfer
data between documents, you'll probably want to put
that into your transaction. The example everyone always
uses for a transaction is like when you've got a
banking database and you've got a customer looking
to transfer money from one account to another. You can kind of see
why this would be important to handle atomically. OK. So up until now,
I have generally been focused on transactions
from the perspective of client libraries-- mobile devices,
web pages, and the like. But you can also write
transactions from the server. And yes, this also
includes Cloud Functions. Now, in general, these
server-side transactions work very similarly to how
the client libraries work, but with two big differences. First, because
your typical server doesn't go through a tunnel
or go into airplane mode or run out of batteries, we do
the more traditional document locking technique instead of
this optimistic concurrency model. Second, we are able to
run full-blown queries during the read part
of a transaction and not just read
individual documents. And that's often
useful for cases where you need to handle
de-normalized data. So let's see if we can
apply what we've learned to my restaurant review app. Are there any cases
where I'd want to use either a transaction
or a batch write? Well, one good example might
be updating our average score for a restaurant. Let's say that I'm keeping
track of my average score for each restaurant
by storing a running average and a total number of
reviews for that restaurant. When somebody adds
a new review, I have a Cloud Function
that would grab this average, the total
number of reviews, and then calculate
the new values based on our incoming review. Well, with both
these operations, it's important that, A, I have
the latest and greatest data, and, B, I don't have multiple
processes updating these values at the same time. So this seems like a
pretty clear opportunity to use a transaction. A second, maybe slightly
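The arithmetic inside that Cloud Function is simple but worth spelling out, since it's exactly the read-compute-write step the transaction protects (toy Python, with invented field names):

```python
# Update a stored running average when a new review arrives: new average
# equals the old total plus the incoming rating, divided by the new count.
# Two reviews arriving at once must not both read the same old values,
# which is why you'd wrap this in a transaction.

def add_review(restaurant, rating):
    total = restaurant["avg_rating"] * restaurant["num_reviews"]
    restaurant["num_reviews"] += 1
    restaurant["avg_rating"] = (total + rating) / restaurant["num_reviews"]

place = {"avg_rating": 4.0, "num_reviews": 3}
add_review(place, 5.0)   # (12 + 5) / 4 = 4.25
```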
contrived, example is, let's say San Francisco, in
a bold new marketing move, officially decides to change
its name to San Funcisco. Come on, guys. This is my greatest
branding idea since Funyuns and Funfetti. What do you mean, "I'm
a fun trick pony"? How would I change
that city name for every restaurant
listed in the database? Well, this one's interesting. If I were using the server SDKs,
I could create a transaction where I would perform a query
for all restaurants where city equals San Francisco. Then I could iterate through
the entire results of the query and build out a bunch
of write statements to change the city name in
each of these documents. And then I could get everything
done in one nice transaction. But there are two
big concerns here. First, remember that
transactions are limited to 500 document writes total. And San Francisco probably has
a lot more than 500 restaurants. So I would need to run this
transaction multiple times. And I could totally do that. I'd just keep track
of how many documents I get back in my initial
query, and I'd just kind of repeat the transaction
until this number is zero. Now, yes, I do lose
some of the benefits of a single transaction
by breaking this up into several operations. Specifically, there will
be a small amount of time where a client could see
half of my restaurants in San Francisco and the
other half in San Funcisco. But you know what? I think that's OK
in my specific case. And I'll be honest,
anytime you're migrating a huge number
of documents like this, you're probably going to need
to do it in smaller chunks. You can't migrate your
entire database all at once. The second issue is
that this transaction is going to lock down a large
number of documents at once. Remember, server
side transactions go with the more traditional
locking strategy. And locking so many documents
down at once might slow down writes to my database
if I'm not careful. Now, I can help address this
by adding a limit to my query. If I were to limit my query
results to, say, 50 items, then I'd be locking down
fewer documents at once. But I would be running this
transaction many more times. And actually, therefore,
it might take a few more milliseconds to
completely migrate over all of my documents. It's a bit of a trade-off there. You can decide what
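That repeat-until-zero migration loop looks something like this toy Python sketch (in the real version, each pass would be one server-side transaction built around a limited query; all names here are invented):

```python
# Migrate documents in chunks: query up to `limit` matching docs, rewrite
# them, and repeat until a pass finds nothing left. Smaller limits lock
# fewer documents per pass but take more passes. (Toy model only.)

def migrate_city(db, old, new, limit=50):
    passes = 0
    while True:
        # The "query": up to `limit` restaurants still in the old city.
        matches = [d for d in db if d["city"] == old][:limit]
        if not matches:
            return passes
        for doc in matches:       # one transaction's worth of writes
            doc["city"] = new
        passes += 1

restaurants = [{"city": "San Francisco"} for _ in range(120)]
n = migrate_city(restaurants, "San Francisco", "San Funcisco", limit=50)
# 120 docs at 50 per pass means 3 passes
```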
makes most sense for you. So there you go. Hopefully, this gives you
a better understanding of transactions and
batch writes and when you'd want to use them,
which is probably more often than you originally thought. So give your app
another look and see if you find any
cases where you might want to convert a normal
update into a transaction. Once you get the hang of
writing a couple of these, it certainly gets easier. So thanks for watching. I will see you soon on another
episode of "Get to Know Cloud Firestore." But if you'll excuse me,
that guy still has my TV, and I'm going to get it back. Hey. Hey, you there. I need that TV. For watching TV on. Well, well, yeah, I have
one in my living room, but I don't have
one in my bathroom. [MUSIC PLAYING]