TODD KERPELMAN:
I'm Todd Kerpelman, and this is a five
pound gummy bear. Now, I'm going to eat
this entire confectionery to make a point. You see, this giant gummy
bear is a lot like your data. You might have a lot
of it, but chances are, your users only need
a small portion of it to satisfy their needs. And by giving them more data
than they actually need, you could create problems,
either for yourself or for them. But I can tell you're
not convinced yet, so let's keep going. [MUSIC PLAYING] This was a terrible idea. [MUSIC PLAYING] So let's dive into
pagination-- or, if you're not into technical jargon, how to
split your database results into lots of little
chunks so that you don't fetch all of your data at once. Now, if you've been
following along with our previous
episodes, you probably have a good idea of why
it's important to paginate your data. I mean, my restaurant
review app might have 50,000 restaurants
all located in Tokyo, but if my user is doing a search
for the best tempura in town, I probably don't want to send
over all 50,000 documents. For one thing, that's going
to kill their data plan faster than you can say overage
charges may apply, and it's going to be costly to
you, the database developer. Remember that you get
charged per document read, so if you send over
50,000 restaurants and it turns out your user never
scans past the first 20, well, hey, guess what, you're still getting charged for those other
49,980 database reads. So a much better
solution here would be to send your user the
first, say, 20 restaurants, then send over another 20 when
they indicate they need more, either when they explicitly
click a next page button or when you've noticed
that they've scrolled down to the bottom of
their current results from one of those, like,
infinitely scrolling table dealies. So let's look at how we might
do this in Cloud Firestore. Now, I think the one
important conceptual thing to understand when it
comes to pagination is that you're not saying
to Cloud Firestore, hey, remember that query
I asked for earlier? Well, now send me
the next 20 results. Instead, it's really
more like you're creating a bunch of different
and essentially unrelated queries, but each
one of them just so happens to pick up exactly where
the previous query left off. Now, luckily, the
Firestore libraries have several
methods in them that make it easy to create
these unrelated but basically sequential queries, so let's
take a look at that now. Suppose my user is searching for the best tempura restaurants in Tokyo. Well, I'll probably start with a query that looks a little like this in the client.
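In rough TypeScript with the Firebase web SDK, a sketch of that query might look like this. The collection and field names here (restaurants, city, category, avgRating) are placeholders made up for illustration, not the actual names from the app in the video:

```typescript
import { getFirestore, collection, query, where, orderBy } from "firebase/firestore";

// Assumes initializeApp() has already been called with your project config.
const db = getFirestore();

// All tempura restaurants in Tokyo, best-rated first.
const tempuraInTokyo = query(
  collection(db, "restaurants"),
  where("city", "==", "Tokyo"),
  where("category", "==", "tempura"),
  orderBy("avgRating", "desc")
);
```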
We get our tempura restaurants in Tokyo and sort by rating. Now, this query will probably
fetch over 1,000 documents. So let's tackle the easy part first of limiting our initial results here. This is essentially done by adding on a limit to your query-- something like this.
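Still in the same made-up TypeScript sketch, that might be:

```typescript
import { getDocs, limit, query } from "firebase/firestore";

// Only fetch the first 20 results of the query we built above.
const firstPage = query(tempuraInTokyo, limit(20));
const firstSnapshot = await getDocs(firstPage);
console.log(`Loaded ${firstSnapshot.size} restaurants`);
```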
And there you go. Just like that, I am already saving tons of data by only loading up 20
documents for that first query. But now, what
happens when my user decides they want to see
the next 20 restaurants? Now, as I said earlier, I can't
tell Cloud Firestore, OK, now give me the next 20 records
from this exact query-- at least not using
the client libraries. So what we're going
to need to do instead is basically build a new query. It's going to look
almost exactly like the previous query,
where we're fetching tempura restaurants in Tokyo
sorted by rating, but we're going to ask Cloud
Firestore to start at exactly the right place so that we
happen to fetch the next 20 records in our query. So we can start by generating a new query from scratch, like so, but then add a start after method. Now, with a start after method, I can pass in an array of values where, if they match up with the fields I'm ordering by, in the right order, Cloud Firestore will grab the next document from the appropriate spot in the index.
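As a sketch, building the next page from scratch with a value-based cursor could look like this (note that the cursor values line up with the orderBy clauses, and 4.9 here is just an assumed rating from the end of the previous page):

```typescript
import { collection, query, where, orderBy, startAfter, limit } from "firebase/firestore";

// Rebuild the query from scratch (db and the field names come from the
// earlier sketch), but start right after the last rating we saw on the
// previous page -- say it was 4.9.
const nextPage = query(
  collection(db, "restaurants"),
  where("city", "==", "Tokyo"),
  where("category", "==", "tempura"),
  orderBy("avgRating", "desc"),
  startAfter(4.9), // cursor values line up with the orderBy clauses
  limit(20)
);
```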
Now, this call would work for fetching the next 20 documents in my
query, but there are two ways I can make this a
whole lot easier to implement. For starters, regenerating a
query from scratch like this? It can be kind of a
pain, particularly if it's a user-generated
query and you're relying on some UI elements
that are no longer on screen, and having to remember
all that information to regenerate a
query from scratch every time is kind of a hassle. So instead, you can just
take your existing query and simply add on a
start after method. This will generate a copy
of your original query with the new start after
values, and then you can assign that
to a new variable or simply overwrite
your existing query. And yes, this works even
if your old query already had an existing start
after parameter. So now all I have to do is
keep track of my current query in a property or field, then call start after to fetch new batches, and I am on my way to easier pagination.
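A minimal sketch of that pattern, assuming the modular web SDK and that, as described above, a freshly added cursor supersedes any earlier one on the query:

```typescript
import { getDocs, query, startAfter } from "firebase/firestore";

// Keep the current query around in a property or field...
let currentQuery = firstPage; // from the earlier sketch

// ...and when the user wants more, derive the next batch from it by
// tacking a fresh cursor onto whatever the current query is.
async function fetchNextBatch(lastRating: number) {
  currentQuery = query(currentQuery, startAfter(lastRating));
  return getDocs(currentQuery);
}
```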
OK, now, as for the second improvement, see this array of values? It's a little tedious
to have to track this for every different
query I'm making. It's also a little error-prone. Like, if I were to get the
elements in this array wrong, the SDK would throw
a fatal error on me or just give me some
very weird results. But there's a bigger
problem, which is that if I've got 1,000
tempura restaurants in Tokyo, it's very likely that
a whole bunch of them are going to have an
average rating of 4.9. So asking to search for the
document right after Tokyo, tempura, 4.9, might skip over a
bunch of 4.9-rated restaurants that just weren't included
in that first batch. So instead, what the
Firestore library lets you do is pass in an
existing document as the start after parameter instead. Now, Cloud Firestore will
analyze that document and figure out
exactly what values it should be searching for. And it's smart enough that
if it sees multiple documents with the exact same values,
it will start with the one with the document ID that's
right after the document you specified. So you don't need to
worry about accidentally skipping documents. And as you can
see, the nice thing here is that I no longer
need to know anything about the exact
query I'm running. If my user is searching for the
top tempura restaurant in Tokyo or the cheapest Italian
restaurants in Boston, I can use this exact same
line of code and everything just works. So the idea here is I can run my query, get back the next batch of data that I need, then use the last document from that previous batch of data to fetch the next batch, and so on and so forth.
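Here's one way that loop might be sketched out, with the last document snapshot of each batch acting as the cursor for the next one (the page size and helper name are made up):

```typescript
import { getDocs, limit, query, startAfter } from "firebase/firestore";
import type { Query, QueryDocumentSnapshot, DocumentData } from "firebase/firestore";

const PAGE_SIZE = 20;

// Fetch one batch of any query, starting right after the last document
// of the previous batch (or from the top if there isn't one yet).
async function fetchBatch(
  baseQuery: Query<DocumentData>,
  lastDoc?: QueryDocumentSnapshot<DocumentData>
) {
  const page = lastDoc
    ? query(baseQuery, startAfter(lastDoc), limit(PAGE_SIZE))
    : query(baseQuery, limit(PAGE_SIZE));

  const snapshot = await getDocs(page);
  // Hang on to the last document so the next call knows where to start.
  return { docs: snapshot.docs, lastDoc: snapshot.docs[snapshot.docs.length - 1] };
}

// First page, then the next one when the user scrolls.
let { docs, lastDoc } = await fetchBatch(tempuraInTokyo);
({ docs, lastDoc } = await fetchBatch(tempuraInTokyo, lastDoc));
```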
And so I can use that technique to populate my next screen of values,
or add on more rows to my table view and my
infinite scrolling table, or what have you. So pagination is great, right? But it's not perfect. When you run separate
fetches like this, there is some opportunity
for weirdness. Specifically, if you
have a collection where documents are constantly
going to be moving around in your index or you've got
data being inserted and deleted willy-nilly, you could end
up with some odd edge cases. Some documents might
not be retrieved at all if they were to move
from one potential batch to a previous one in
between your fetches, and other documents
might be retrieved twice if they happen to
move the other way. In practice, most
of this weirdness tends not to be
too big of a deal. But of course, it
depends on your app and just how much you expect
your data to be moving around. Also, pagination can
be a little tricky if you're trying to use
it while also updating your data in real time. Let's go back to our
infinitely scrolling table here and think about how we could use
it alongside our real time data listeners. If we're trying to be responsible app developers, we might want to deactivate our old listeners and only keep one listener active for the most recent batch of data.
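Sketched out, that one-live-listener-on-the-newest-batch idea might look like this, reusing the made-up tempuraInTokyo query from the earlier sketches:

```typescript
import { onSnapshot, query, startAfter, limit } from "firebase/firestore";
import type { QueryDocumentSnapshot, DocumentData, Unsubscribe } from "firebase/firestore";

let detachLatestBatch: Unsubscribe | null = null;

function listenToLatestBatch(lastDoc: QueryDocumentSnapshot<DocumentData>) {
  // Shut down the listener on the previous batch...
  detachLatestBatch?.();

  // ...and only keep a realtime listener on the newest 20 documents.
  const latestBatch = query(tempuraInTokyo, startAfter(lastDoc), limit(20));
  detachLatestBatch = onSnapshot(latestBatch, (snapshot) => {
    // Update just the newest rows of the table from snapshot.docs here.
    console.log(`Latest batch changed: ${snapshot.size} documents`);
  });
}
```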
That's awfully nice and responsible of us, but what's going to happen is
that these latest results will update in real time and none
of the earlier ones will. And that could be
a weird experience if your user starts scrolling
back up to earlier data and sees data that's sort
of stale or no longer updating in real time. Although in something
like a chat app, some variation of this
might work just fine, because old chat messages,
they tend not to change. So fetching only the most
recent batch in real time might be the exact
right thing to do. Another option? We could keep adding new listeners for every new batch instead of replacing the old one. This means that if our table has 120 rows, we would basically have six listeners set up, each one responsible for a batch of 20 documents.
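A rough sketch of that bookkeeping, with one listener kept per batch:

```typescript
import { onSnapshot } from "firebase/firestore";
import type { Query, DocumentData, Unsubscribe } from "firebase/firestore";

// One active listener per batch of 20 -- six of them for a 120-row table.
const batchListeners: Unsubscribe[] = [];

function listenToBatch(batchIndex: number, batchQuery: Query<DocumentData>) {
  const unsubscribe = onSnapshot(batchQuery, (snapshot) => {
    // Update only the rows belonging to this batch of the table here.
    console.log(`Batch ${batchIndex} changed: ${snapshot.size} documents`);
  });
  batchListeners.push(unsubscribe);
}

// When tearing down the screen, detach everything.
function detachAllBatches() {
  batchListeners.forEach((unsubscribe) => unsubscribe());
  batchListeners.length = 0;
}
```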
That's nice in that it'll generally keep all your
track of all these listeners and make sure you're updating
the right set of data when they fire. And unfortunately,
inserts and deletes are going to be
painful here, right? If a new document is
inserted at, like, slot five, well, that means not only is
this first batch of results different, but the start
value for the second query has changed, so
you're going to need to change that, which
changes the start value for the next query,
and so on and so forth all the way down. And you know, those
extra listeners probably mean a little bit
more overhead for your app. Now, a third option would be to actually not paginate at all, but just rerun the original query to start from the beginning and just keep increasing the limit so we get more and more results every time.
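A sketch of that third option, keeping a single live listener and just widening its limit each time the user asks for more (tempuraInTokyo is still the made-up base query from earlier):

```typescript
import { onSnapshot, limit, query } from "firebase/firestore";
import type { Unsubscribe } from "firebase/firestore";

const PAGE_SIZE = 20;
let resultCount = 0;
let detach: Unsubscribe | null = null;

// "Load more" just re-runs the original query with a bigger limit. A single
// listener covers every visible row, so inserts, deletes, and moves all show
// up correctly -- but the earlier documents get re-read (and re-billed).
function loadMore() {
  resultCount += PAGE_SIZE;
  detach?.();
  detach = onSnapshot(query(tempuraInTokyo, limit(resultCount)), (snapshot) => {
    // Re-render the entire table from snapshot.docs here.
    console.log(`Now showing ${snapshot.size} restaurants`);
  });
}
```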
This will work, and it's probably the easiest solution that accurately handles inserts,
deletes, and documents moving around in your
collection, but it also means that every time you
request a new batch of data, you're basically requesting
all the old data you retrieved previously, and you get charged
for all those reads, which kind of defeats the whole
purpose of adding pagination in the first place. Still, this isn't
a terrible option if you want to keep
the real-timey-ness and don't think
you're going to go past, say, three or four pages. Now, all these options
have pros and cons, and they do tend to make
your code more complicated. So for that reason,
I often prefer going with the
plain old one time get call when you
start paginating your data in an infinite
table like this, but obviously, that
depends on how real-timey your users expect
your app to be. Like, in my restaurant
review app, for instance, I actually think people
would be somewhat weirded out if they saw restaurant results
flipping around in real time. So I think I'm better
off using some simple one time fetch calls and not
using real time listeners. On the other hand, if I had,
like, a stock market trading app, then it might
be worth the trouble adding in some
real time listeners while still paginating my data. But again, think about
your user experience and what they would
realistically expect. OK. So now that this is all
clear, let me reveal one tiny little white lie I
might have told you earlier. Turns out that with the server libraries, you can kind of say, hey, give me rows 40 to 60 of a specific query. It's done by adding this offset method to whatever query you're building.
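On the server side, that might look like the Node.js sketch below, using the Firebase Admin SDK (collection and field names are still placeholders):

```typescript
import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";

initializeApp();
const db = getFirestore();

// Rows 40 to 60 of the query: skip the first 40 results, then read 20.
const snapshot = await db
  .collection("restaurants")
  .where("city", "==", "Tokyo")
  .where("category", "==", "tempura")
  .orderBy("avgRating", "desc")
  .offset(40)
  .limit(20)
  .get();
// Careful: this still bills for 60 document reads, not 20.
console.log(`Got ${snapshot.size} documents back`);
```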
And while this does work in theory, it's generally not
a great option, because it turns out that when
you call this offset method in the server libraries, you
still get billed for all those reads that you're offsetting. So this call here will
charge me for 60 reads. So you're generally better
off using the start at or start after methods
if at all possible. Those will only get billed for
the actual reads you perform and none of the
ones you skipped. So there you go, folks. I'm hoping this was enough
of a head start for you to start breaking
up your data when you expect to get a lot of it. Not only will it save
you database charges, but your users will
thank you, too. Whew. You know, this episode
made me a little hungry. I think it's time
for a snack break. Don't you? Yeah. [MUSIC PLAYING] I've learned nothing! Ooh, there's another one. [MUSIC PLAYING]