[MUSIC PLAYING] FRED WULFF: My
name is Fred Wulff. This is Arpita. And we are from Streak, which
is a startup of about 40 people based here in San Francisco. And yeah, this
talk, we're really talking about two things. One is this host of managed
databases on Google Cloud. And the second is how
we're building our business on top of them. And so it's probably
helpful to start with what our business,
what our product is. So basically, Streak is
an organizational tool on top of Gmail. So the way this works
behind the scenes is we ship a Chrome
extension that just modifies the Gmail UI
to add in organization on top of it. We have what's called
a pipeline, which, if you're in sales, might
be a list of your sales deals and the stage they're
in, whether they're leads, whether they're closed,
that kind of thing. If you're in support, it
might be support tickets. People use this for hiring,
business development. Basically, the key thing
here is it's organization on top of email. And why is that important? So there's a bunch of different
tools within your organization for making organization better. There's things like
Hangouts Chat or Slack. There's things like
project management tools. But once you get outside
of an organization, basically everybody uses email,
just because everybody has it. And so the key things
that Streak adds-- so we have what's
called a pipeline. And you add boxes,
which might be sales deals or support tickets. And in addition to just
organizing your email-- so when you look at your
emails in your inbox, you can see which deals
or support tickets they're associated with. We can also add
structured metadata, so things like what stage the
deal's in, who is involved, who's the point of
contact, how long it's been since the last follow-up,
and all of that kind of stuff. So we're mostly talking
about technology here rather than a product
demo, but I just wanted to give a motivating
example of what we're storing in these databases. Cool. And so, here, we're going to
be talking about our database journey. And just to give you a
little bit of a preview and to situate everything, we started
off on the Cloud Datastore. We're going to be talking
about Cloud Firestore, which is Datastore's
spiritual successor. And then we're going to focus
in on one particular problem that we call the
email metadata problem and how we're building some
new product features that caused us to reach for Cloud
Spanner and Cloud Bigtable, some of the newer hosted
of database technologies. Cool. So now, I want to
give it over to Arpita to chat about our use of Cloud
Datastore and Cloud Firestore. ARPITA SHRIVASTAVA: All
right, so Cloud Datastore-- the way we're going to be
talking about the databases is just give you a little
bit of an introduction why we chose that database
service, our takeaways from using that service, and
how our data needs evolved to want a newer solution. So Datastore-- it's a
NoSQL document store. Your data is organized into
documents called entities, which are basically
objects that are accessed via a key or an index value. And similar entities
are grouped as kinds, which is like a table
in a relational model. Entities have fields called
properties, which are basically the values-- you get two inbuilt indexes
on each of the properties for an entity. There are no joins or aggregate
queries for Datastore. There is the GQL, which is a
query language for Datastore, but complex joins and aggregate
operations are not supported-- but simple filtering
is possible.
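To make that concrete, here is a minimal sketch with the Datastore Java client; the kind and property names are illustrative, not Streak's actual schema.

```java
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.KeyFactory;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

public class DatastoreBasicsSketch {
  public static void main(String[] args) {
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // An entity of the kind "Box" -- schemaless, so properties can vary per entity.
    KeyFactory keys = datastore.newKeyFactory().setKind("Box");
    Entity box = Entity.newBuilder(datastore.allocateId(keys.newKey()))
        .set("stage", "Lead")
        .set("pointOfContact", "alice@example.com")
        .build();
    datastore.put(box);

    // No joins or aggregates, but a simple property filter works fine.
    Query<Entity> leads = Query.newEntityQueryBuilder()
        .setKind("Box")
        .setFilter(PropertyFilter.eq("stage", "Lead"))
        .build();
    QueryResults<Entity> results = datastore.run(leads);
    results.forEachRemaining(b -> System.out.println(b.getKey()));
  }
}
```

Why we chose it? So back when Streak was started,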
it was started on App Engine. So the only database
offering at the time was-- on Google Cloud-- was
Datastore, so it made sense to collocate our database
with our server stack on Google Cloud. But even so, we think
it's an excellent choice for if you're starting out
with a new product or service, and that is because it gives
you a lot of flexibility to evolve your schema. And what that means
with Datastore is, with entities, as I said,
they have different properties. You can have-- the same
properties across entities can be completely different. The number of properties,
even the data types for the same properties,
can be different, so it gives you a lot of freedom
to just build your features and evolve your
schema as you go. Of course, the other reason
we love Datastore is it's a managed service. NoOps. It autoscales, autoreplicates. Google's SREs
handle availability. You don't have to worry about
provisioning capacity up front, and you really just
throw data at it, and it would just grow
with your service. And so back then, there
was one candidate, and over 10 years later,
we have about 15 terabytes of data on Datastore. All our application
data is on it, and we found it to perform
predictably and have predictable kind of
challenges and solutions. So working with Datastore. So some concepts
that we found are key to work most
effectively with Datastore. One is entity groups. So as I was saying, your data
is organized into entities. And multiple entities, if they
belong to one entity group-- you can have a transaction
across multiple entities. So you, basically, want to have
your most highly correlated data in one entity group,
so it can be a user profile, and so on. At Streak, the smallest unit of work is a box. So a box can have
different threads of emails, notes, comments, and
a lot of other custom fields. And so boxes can be
put into pipelines, which are shared across
teams, and the teams belong to an organization. So the way you design
your entity groups will have consequences on
throughput and consistency. So for our case, if you
want to update boxes, we have our entity group set
to organization level, which means we have a lower
number of entity groups. What that does is-- so entity groups are also
units of consistency, which means a query against an entity group will always give you strongly consistent results. Also, with a transaction, your ACID transaction properties will be applied when you're using entity groups. All entities within an entity group are also stored physically close together by Google, so you have minimal I/O. Reads are really fast. So in our example with
boxes, when the entity group is set to an organization
level, with one query against an
entity group, you get a high volume of entities. But there are limits on how many
entity groups, how many writes you can have per entity group
per second, which is set to you can only have one write,
currently, in Datastore per second per entity group. And I'll talk about
the consequences of that in a second.
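As a rough illustration of how that plays out in code, here is a sketch with the low-level Datastore Java client; the kind, key, and property names are made up for illustration.

```java
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;
import com.google.cloud.datastore.Transaction;

public class EntityGroupSketch {
  // Hypothetical: every Box entity has its organization's key as its ancestor,
  // so all of an org's boxes live in one entity group.
  static void advanceStages(Datastore datastore, Key orgKey) {
    Transaction txn = datastore.newTransaction();
    try {
      // Ancestor query: strongly consistent, because it stays inside one entity group.
      Query<Entity> q = Query.newEntityQueryBuilder()
          .setKind("Box")
          .setFilter(PropertyFilter.hasAncestor(orgKey))
          .build();
      QueryResults<Entity> boxes = txn.run(q);
      while (boxes.hasNext()) {
        txn.put(Entity.newBuilder(boxes.next()).set("stage", "Contacted").build());
      }
      // The commit counts against the roughly one-write-per-second budget for
      // this entity group, which is where contention shows up on big pipelines.
      txn.commit();
    } finally {
      if (txn.isActive()) {
        txn.rollback();
      }
    }
  }
}
```

The other things that we found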
useful working with Datastore is we use Objectify, which is a
Java data access API that makes using Datastore really easy.
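The slide's example isn't reproduced in this transcript, but an annotated Objectify entity along those lines might look roughly like this; the class and field names are hypothetical, and Organization is assumed to be another registered entity.

```java
import static com.googlecode.objectify.ObjectifyService.ofy;

import com.googlecode.objectify.Key;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;
import com.googlecode.objectify.annotation.Parent;
import java.util.List;

// Hypothetical, simplified Box entity -- not Streak's actual schema.
@Entity
public class Box {
  // @Parent puts every Box into its organization's entity group.
  @Parent Key<Organization> organization;
  @Id Long id;
  @Index String stage;        // indexed, so it can be filtered on
  String pointOfContact;      // plain, unindexed property

  // Typical Objectify usage: load all leads for one organization.
  static List<Box> leads(Key<Organization> org) {
    return ofy().load().type(Box.class)
        .ancestor(org)
        .filter("stage", "Lead")
        .list();
  }
}
```

The example at the bottom,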
where you have annotations to work with entities,
indexes, and so on, has been really useful. We use JSON and
Jackson annotations for serialization
and deserialization. Then some of the challenges. As I was saying, the
write throughput rate for entity groups is, in
Datastore, set to 1 per second. So ideally, you'd want to have
really small entity groups so that you are not writing
more than once per second. In our case,
organizations-- pipelines can have hundreds or thousands of boxes. So for example, if
you're trying to update multiple boxes, in our
case, in a pipeline, then it can be tens of thousands
of writes against one entity group, so it can cause
a lot of contention. And again, something to
keep in mind, entity group relationships are permanent. You cannot change them later; you'd have to delete all the entities and recreate them in a new entity group. So that contention is a challenge. As we're growing, we have
larger and larger customers. That has been a big challenge. Again, boxes--
the way our schema has evolved, there's
a lot of child objects that we have with
boxes now, and they all belong to the same entity
group, but transactions are limited to 500
entities per transaction. So when you have
cascading writes, that also becomes kind of
a performance bottleneck. Also, another limit
to keep in mind-- transactions are limited to only 25 entity groups. So again, you want to
figure out your key design, so that you're finding
the right balance between your consistency
and throughput. And then client-side views. So as you saw in
the first graphic, we present this spreadsheet
layout to our users, and we offer complex filtering,
grouping, and sorting. And not being able to do that-- Datastore not providing
that out of the box-- we have to shift
this responsibility to the application code,
which, of course, increases the complexity and debugging
complexity and so on. So not having joins-- as we grow larger, we really
miss having all those features supported out of the
box, which would be part of a relational database. Cloud Firestore. Firestore is the latest
version of Datastore. There's two modes here-- Realtime and Datastore mode. Datastore mode-- Firestore
in Datastore mode is backwards compatible
with Cloud Datastore, but it doesn't have
some of the newer features that Realtime Firestore
has, like real-time updates. There's rich mobile and web client libraries. So Firestore is supposed to be a drop-in replacement for Datastore, and the
migration to Firestore is supposed to be automatic
without any downtime, so we're really
excited about that. Basically, Firestore
has a Spanner backend, with the Datastore
features in the front end. So you get a strongly
consistent layer in the backend, and this eliminates some
of our biggest pain points with Datastore. One, we get strongly consistent results throughout. And the entity group restrictions that I talked about-- 25 entity groups per transaction and the write throughput limit of 1 per second-- we don't have those with Firestore anymore. We're hoping to
upgrade soon and-- yeah. All right. So we talked about how we felt
the need for a fully relational model for our application
data, and there's a new feature that
we worked on, and we call it the Metadata Problem. So I'm sure all of you had this
problem where somebody forgot to hit Reply All in
an email, and then you got a forward of that email. And then now you have a lot
of redundant emails and emails just in reverse
chronological order, and it's just not very elegant. And also, there can be problems
with permissions and sensitive email that you were not
supposed to receive ends up in your inbox. So Streak's solution to that
is having a unified view of emails across a team. So in this example, you
have the top three emails were not addressed to
me but, as you can see, they're shared from my
co-worker's inboxes. So you get kind of
this elegant view of the email, which is
controlled by a permission model. So this looks really
good in the UI. But on the backend, it's kind
of a complex graph traversal problem. So we want to answer questions
like, which co-workers of mine have interacted with this
particular prospect over email, or, for this email, what
are the other emails on this thread that
are not in my inbox but in my co-workers' inboxes? So the solution to that-- we came up with
a graph solution, where your nodes are either
a person or an email address, an organization or a
domain, or an email message. And the email
messages are connected if they're on the same thread. So if I emailed you, and
then you forwarded the email to somebody else, we
can trace those emails and give you a unified view. And then two emails
are also connected if they are the same
email, but they're the part of different people's inboxes. They're connected
by something that is called an RFC message ID with Gmail-- so same message but
different inboxes. So this, obviously, helps
us to answer the questions and create a unified view that
can be shared across a team.
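A rough sketch of those two edge rules, just to pin the idea down; the class and field names are hypothetical, not how Streak actually stores this.

```java
import java.util.Objects;

// Hypothetical node: one copy of one email message, as it sits in one person's mailbox.
public class MessageNode {
  String mailboxEmail;   // whose inbox this copy lives in
  String threadId;       // the thread it belongs to
  String rfcMessageId;   // the RFC 822 Message-ID header, shared across inboxes

  // The two edge rules from the talk, stated as a predicate.
  static boolean connected(MessageNode a, MessageNode b) {
    // Rule 1: messages on the same thread are connected.
    boolean sameThread = Objects.equals(a.threadId, b.threadId);
    // Rule 2: the same message sitting in two different inboxes is connected.
    boolean sameMessageDifferentInbox = Objects.equals(a.rfcMessageId, b.rfcMessageId)
        && !Objects.equals(a.mailboxEmail, b.mailboxEmail);
    return sameThread || sameMessageDifferentInbox;
  }
}
```

To talk about how we approached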
solving this problem and our relational model needs-- Fred? FRED WULFF: Yeah, OK. Real quick before I go on to our
solution, just want to-- yeah, building on what
Arpita was saying, so this is like a pretty
challenging graph database problem. Like you have to
really understand the course of an email thread, or a set of connected email threads, across different people's inboxes in your organization. You might have to hit
hundreds of different inboxes. I'm sure you've all been on
these like horrible Reply All threads that have
not just hundreds of people in the company but,
like, thousands of emails when you count all the email chains. And so, there's kind of
this graph database problem, but it's also pretty challenging
for a couple of reasons, technically. So one is you have to be able to
do all of these edge traversals quickly. There may be a lot
of emails or sort of nodes from the graph problem. And as Arpita
mentioned, permissions are a huge deal, right? Email can be really sensitive. You can have a
password reset emails. You can have a manager
giving targeted feedback to a report that might not
be generally consumable. There are just
like all these ways they can go horribly
wrong if you don't have a strong permissions model. And kind of what this means is
we have to traverse this graph, and there's pretty
big challenges if we try and precompute
a lot of this stuff. Normally, the
distributed-systems way out is you just, like, denormalize
a bunch of stuff in advance. And in this case, we
both really needed to compute it at read
time to make sure that we're getting permissions
right, even if somebody just changed some permissions. And this is all happening
within the context of viewing emails in Gmail. And you know, the
whole premise of Gmail is that it's really fast. People expect their emails
to just like load quickly, even if they're searching
their whole inbox. And we're expanding
that to letting you search parts of not just
your inbox, but everybody in your company's
inbox possibly. So challenging
technical problem. Looking at the backend,
this is actually across our entire user base. We have about 30 terabytes
of just email metadata. So that's like we're not looking
at the contents of email-- we don't really want to-- but just looking at like
who the email's from, who it's to, the
sort of information we need to compute these edges. And so we evaluated a bunch of
different database products. There's a lot of
graph-specific database products on the market. But we really-- all
those being equal, we wanted something that
had this property where it was managed by Google. We have a very small
operational team. We basically have, roughly,
three infrastructure engineers, depending on how you slice
different people's time. And being able to offload
some of that like availability and sharding work to
Google SRE really seemed like a good trade-off to us. So we looked at a bunch
of different options, and we ended up-- this was our first use
case for Cloud Spanner. So Cloud Spanner-- I think it's about a year old at
this point, maybe a little bit more. But the key things
about it, looking at just the product
in general, it is a globally consistent
relational database. And so what does that mean? It supports the SQL language
that you know and possibly love or hate, depending on
what you've done with it. And it has what I affectionately
call the magic clock thing. And the magic clock is basically
for years and years and years, there was always this
introductory paragraph in every distributed systems
paper, which basically said something like, we know
clocks are unreliable and go back in time and have jumps,
and the network's unreliable and may have horrible latency. And everybody just sort
of took it as a premise that you couldn't
rely on clocks, and they were trying
to lead you astray. And so the Spanner whitepaper came out a while back, and now the Cloud
Spanner product is built on this
idea of, hey, what if instead of accepting
this as a premise, we just made the clocks
roughly reliable? And there's a bunch
of effort, in terms of installing atomic clocks
in all the data centers. There's a bunch of effort,
in terms of like setting up the networking. And there's a bunch
of effort, in terms of the statistics of like
setting error bars on timing. But the outcome of it is that
Cloud Spanner says on the tin, and it's actually
true, like we're using it in anger.
It is globally consistent. So basically, this means
that going back to the stuff that Arpita chatted
about with Datastore, you don't have to preselect
particular entity groups. You don't have to say,
these are the things that I'd like to do
transactions on beforehand. You just-- when you go to
actually update entities, it looks at what
entities you've queried. And you only have
transactional conflicts if you are actually modifying
things that were also queried by other queries. So it's like just
a set of stuff that are actual real
transaction conflicts.
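For flavor, here is a minimal read-write transaction with the Cloud Spanner Java client; the instance, database, table, and column names are placeholders, not our real schema.

```java
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.Key;
import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import com.google.cloud.spanner.Struct;
import java.util.Collections;

public class SpannerTxnSketch {
  public static void main(String[] args) {
    Spanner spanner = SpannerOptions.newBuilder().build().getService();
    DatabaseClient db = spanner.getDatabaseClient(
        DatabaseId.of("my-project", "my-instance", "email-metadata"));

    // Reads and writes inside run() only conflict with transactions that
    // actually touched the same rows -- no entity groups to pre-declare.
    db.readWriteTransaction().run(txn -> {
      // Assume this row exists for the purposes of the sketch.
      Struct row = txn.readRow("Messages",
          Key.of("thread-1", "msg-1"), Collections.singletonList("Subject"));
      txn.buffer(Mutation.newUpdateBuilder("Messages")
          .set("ThreadId").to("thread-1")
          .set("MessageId").to("msg-1")
          .set("Subject").to("Re: " + row.getString("Subject"))
          .build());
      return null;
    });
    spanner.close();
  }
}
```

So, you know, this is great. Globally consistent SQL, it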
has really great transactions. And, you know,
some of you who are sort of thinking this
through might say, but wait-- email itself isn't consistent. Why do you care
about consistency? I might send you an email,
and, all the time, we see emails taking 10 minutes,
30 minutes to get delivered. Why do we care about the
database being consistent? And the answer is it
goes back to that sort of operational ease. It means that when we
have our syncing logic, we can sync the data. We can build these indexes
to power these graph queries. And we don't have to
worry that the sync gets partially committed. We can just say,
hey, all of this should be committed
in the transaction. And it's either all
committed or not committed. And this just means that
our indexing pipeline that's processing a lot of
email was basically written by-- it basically took one engineer
about half-time working on it for a month or so, which is really quick for that scale of distributed system. When we're planning out
the work, we're like, hey, we probably need to
have at least two people on this for a quarter. And just by relying on the
Spanner transactionality to deal with a lot
of the edge cases, we were able to just get it out
and get it in customers' hands much more quickly. And so building on
that operational ease, there's a lot of other sort
of really nice features in that area around Spanner. So in a previous life, I managed
a couple of on-premise SQL databases and, sort of, all of
those operational things that just sort of filled
DBAs with terror-- things like adding and
removing columns, adding or removing indexes. Both are pretty easy to do. And also, there's
a pretty good sort of Task Planner that makes
sure that, even if you add and remove a
bunch of columns, it doesn't interfere with your
actual transactional workload. So that's all pretty
nice things to have, even if our underlying
data isn't necessarily super dependent on the
strong consistency. So let's-- you know, I've been
singing its praises generally. Let's talk about how it
actually works in practice. So this is in the Cloud
Console, the schema for one of our tables in production. And this table is what
we call MessageMailboxes. And so that's-- if you remember
the sort of graph nodes, there were some things that
are dependent on the message itself, which Gmail thread it's
in, which other messages have the same sort of message ID
header to figure it out between inboxes. But there's also stuff that's
dependent on who the message is addressed to or who it's
from or who's on the CC list. And MessageMailboxes has one
row per person on a message. So if I send a message
to you, the version of the message in my inbox
would have two MessageMailboxes rows-- one for me and one for you. And key things here-- so you see at the top,
it says that this table is interleaved in messages. Interleaved tables
is this feature in Cloud Spanner that
lets you sort of have really great performance
for stuff that should be co-located together. And so here, most of
the time that we're querying who's on a
particular message, we also care about what
message they're on. And so here, we've interleaved
MessageMailboxes in messages. And that means that if I want
to do an index filter on all the messages that correspond
to a particular email address-- that a particular
email address was part of-- so let's say
in the Streak product, we want to be able to show
all of your communication with a particular sales lead
across the organization, I can query on mailbox
email and then do a very quick join,
very performant join, with the Messages table
to get things like, what was the subject
of that email or like these other properties
that, if you were doing a more traditional
distributed system, you might have to denormalize
and waste a lot of space to sort of get performance. And so, we use
interleaved rows a bunch.
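Roughly, the shape of that interleaved schema and the query-then-join pattern looks like this; the DDL and column names are an approximation for illustration, not the exact production schema.

```java
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Statement;

public class InterleavedSketch {
  // Approximate DDL (an assumption, for illustration):
  //   CREATE TABLE Messages (
  //     ThreadId  STRING(MAX) NOT NULL,
  //     MessageId STRING(MAX) NOT NULL,
  //     Subject   STRING(MAX)
  //   ) PRIMARY KEY (ThreadId, MessageId);
  //
  //   CREATE TABLE MessageMailboxes (
  //     ThreadId     STRING(MAX) NOT NULL,
  //     MessageId    STRING(MAX) NOT NULL,
  //     MailboxEmail STRING(MAX) NOT NULL,
  //     IsCc  BOOL,
  //     IsBcc BOOL
  //   ) PRIMARY KEY (ThreadId, MessageId, MailboxEmail),
  //     INTERLEAVE IN PARENT Messages ON DELETE CASCADE;

  // Because MessageMailboxes rows are stored under their parent Messages row,
  // this join stays local and cheap.
  static void messagesFor(DatabaseClient db, String email) {
    Statement stmt = Statement.newBuilder(
            "SELECT m.ThreadId, m.MessageId, m.Subject "
                + "FROM MessageMailboxes mm "
                + "JOIN Messages m ON mm.ThreadId = m.ThreadId AND mm.MessageId = m.MessageId "
                + "WHERE mm.MailboxEmail = @email")
        .bind("email").to(email)
        .build();
    try (ResultSet rs = db.singleUse().executeQuery(stmt)) {
      while (rs.next()) {
        System.out.println(rs.getString("Subject"));
      }
    }
  }
}
```

One other interesting thing-- you'll see these sort of weird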
Is BCC or Is CC Boolean fields at the bottom. And so that's
basically saying, hey, is this particular email
address in the BCC line? Is it in the CC line? Like how are they
connected to this message? And the interesting
thing there-- those rows actually
have two values, which you might
expect from a Boolean. But those values are
actually true or null. And so like, why
are we doing this? So Spanner actually
has this ability to have what are called
Null Filtered indexes. So if you're coming
from like PostgreSQL, or I'm sure Microsoft SQL
Server has a similar thing, you might be familiar with this
concept of partial indexes. And that means that we can
create an index that's just-- is this person on the From line
or are they on the To line? And it doesn't actually use
up any space for the Message Mailbox rows that don't have
the corresponding Boolean set to true rather than null. It filters out all the
null entries, so yeah.
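In DDL terms, a null-filtered index along those lines might look like this; the index, instance, and database names here are illustrative, applied through the Java admin client.

```java
import com.google.cloud.spanner.DatabaseAdminClient;
import java.util.Collections;
import java.util.concurrent.ExecutionException;

public class NullFilteredIndexSketch {
  static void createCcIndex(DatabaseAdminClient admin)
      throws ExecutionException, InterruptedException {
    // Rows where IsCc is NULL simply don't appear in this index, so it only
    // costs space for the rows where the person actually is on the CC line.
    String ddl = "CREATE NULL_FILTERED INDEX MessageMailboxesByCc "
        + "ON MessageMailboxes (MailboxEmail, IsCc)";
    admin.updateDatabaseDdl("my-instance", "email-metadata",
        Collections.singletonList(ddl), null).get();
  }
}
```

So lessons learned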
with Cloud Spanner. Setting up your schema
is super important, particularly if you're
putting a lot of data into it. You know, it's pretty
easy to build and remove secondary indexes,
but you are stuck with your primary
key index, like you have to rewrite rows if you're
changing your primary key. It's not magic like that. And it's pretty important
to just sort of play around with the schema
to figure it out. So what challenges
have we run into? So far I've been
like singing praises. And yeah, you know, no
plan survives contact with the enemy or the
database query planner. So one challenge we
ran into is actually Spanner is too smart
for its own good-- for our use case. I guess your use case may vary. So Spanner has a lot
of smarts in the query planner
to balance running a lot of small
transactional queries. So this is like the
kind of thing that is probably the 90% use case. It does things like if you
run a really big query, it assumes that this must
be like an analytical query or maybe like a
backend bulk task. And it will
deprioritize that query and make sure that that query
takes a long time to run, so it doesn't get in the way
of your transactional queries. But going back to our use
case, as we mentioned, you've all been on these
horrible, huge Reply All threads. And a lot of times, you know,
fortunately or unfortunately, those threads are the most
important, because they're the ones where the collaboration
story really gets tricky, and you need to be able to share
some stuff but not other stuff. And we really need
those to be answerable in a short enough amount
of time that it doesn't degrade the Gmail experience. And so what we were
running into was when we did the
queries to span out-- to fan out on those
really big threads, it was not overwhelming
anything else, but it was taking a long time. And there wasn't really a great
way to tell the query planner, hey, this query is really
important, go ahead and do it. And we tried some experiments
with fanning out and sending a bunch of queries. We tried some
experiments with doing like a little bit of caching
in like Redis or something like that, using
Cloud Memorystore, but we just didn't find a thing
that really delivered the user experience that we wanted. So that led us to
our next database that we're going to chat about-- Cloud Bigtable. So Cloud Bigtable
is actually based on Bigtable, which is
now, at this point, a very well-established
Google technology. It sort of has a lot
less functionality than Cloud Spanner,
Cloud Datastore. It's really just
a key value store or a sorted key-value
store, let's say. And basically, that
means that there's really two queries you can do to it. You can query for this
particular primary key-- give me the value. Or you can do lexicographical--
however you pronounce that word-- range queries and
basically say, hey, give me all values for keys
between A and B, or AA and AB. And so there, it's like a
pretty limited query interface, but the trade-off you get here
is that it's really, really simple and really, really fast. And so there,
basically, each node you add to Cloud Bigtable-- and it's another managed database, so you can just add and remove
nodes to your heart's desire, and you pay for more nodes,
get more performance. When it's nighttime,
you can just like decrease the
number of nodes and not pay for what
you're not using. But basically, each
node you add gives you 10,000 reads or writes a
second, which is a lot of reads or writes a second. And basically, the
queries just look like either put some
data in for this key, or read some data for
this key or set of keys.
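In code, those two query shapes look roughly like this with the Cloud Bigtable Java client; the table name and row-key layout are assumptions, not our real encoding.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;
import java.io.IOException;

public class BigtableQuerySketch {
  public static void main(String[] args) throws IOException {
    try (BigtableDataClient client =
             BigtableDataClient.create("my-project", "my-instance")) {

      // Query shape #1: point lookup by exact row key.
      Row one = client.readRow("messages", "mailbox#alice@example.com#msg-123");

      // Query shape #2: lexicographic range scan -- here, every row whose key
      // starts with a given prefix.
      Query scan = Query.create("messages").prefix("mailbox#alice@example.com#");
      for (Row row : client.readRows(scan)) {
        System.out.println(row.getKey().toStringUtf8());
      }
    }
  }
}
```

There's currently two modes. The mode that we use for this--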
use case, it's not great if the feature is
unavailable, but it's not like business ending. People can still get
at their Streak box data, which is stored
safely in Cloud Datastore. But there's one mode that is
single-zone availability-- has the concept of a transaction
within a particular row. And then there's another mode
that is multi-zone replication. You forgo transactions. But even if one zone goes
down-- which is infrequent, but happens-- you can just access the
data in another zone, and then it'll
asynchronously replicate after the zone comes back up. So yeah. So Cloud Bigtable-- not a
great tool for every job but the best tool for some jobs. And we're actually
using this in production for querying this
email metadata index. And let me show you, sort
of, what that looks like. So here, we've got
the output of cbt, the Cloud Bigtable command-line tool. And this is our Messages table. And you can see,
there's this concept of a column family, which
is sort of a set of data that's stored together. And we have one column family
for each of those indexes that we need to
traverse the graph. And so there, for instance,
I was showing you the Message Mailboxes table, and so we've
got one column family that's sort of this index
for looking up Message Mailboxes by
the domain or by email. And then also,
messages are indexed by the message ID, the RFC
message ID, or the thread ID, which lets us do that
sort of graph traversal. And the cool thing here
is the Bigtable API is completely asynchronous. So we basically, when we're
doing this graph query, we can just have a queue that, as soon as information comes back, sends out the next set of fetches. And that's all pipelined and can
be very much like sub-second. So each lookup is
about 6 milliseconds, so you can do a lot of
lookups before somebody notices the delay.
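A sketch of that pipelined fan-out: issue the next round of asynchronous lookups as soon as keys come back, rather than waiting on each one. The table name and key format are again illustrative.

```java
import com.google.api.core.ApiFuture;
import com.google.api.core.ApiFutures;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Row;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;

public class GraphFanOutSketch {
  // One hop of the graph traversal: fetch all neighbor rows concurrently.
  static List<Row> fetchHop(BigtableDataClient client, List<String> rowKeys)
      throws ExecutionException, InterruptedException {
    List<ApiFuture<Row>> futures = new ArrayList<>();
    for (String key : rowKeys) {
      // Each async read is single-digit milliseconds; issuing them together
      // keeps a multi-hop traversal well under a second.
      futures.add(client.readRowAsync("messages", key));
    }
    return ApiFutures.allAsList(futures).get();
  }
}
```

Other things here. This is very much sort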
of build-it-yourself. So the model in Bigtable
is just a string of bytes to a string of bytes. And so, we had to figure
out our own encoding that does lexicographic encoding
the way that we want. There's a few
libraries out there. There's one called Orderly. There's one that's sort
of partially factored out of the HBase library. So there's some tooling
there, but you still need to very much figure
out your own schema and your own encoding. This was a bit more
developer-intensive, so definitely don't
recommend reaching for this from the get-go. But it is very, very
fast, and it really does what it says on the tin. So, you know, we're using Cloud
Bigtable for those lookups in production. 6 milliseconds. You can do a whole bunch
of reads all at once, and it's like
providing what we need. And so then, at
the end of the day, just wanted to summarize sort
of what we already talked about in a convenient flowchart. And so kind of
the first question I'd ask if I were reaching
for a new database is how developed is your
project, schema, product-- however you want
to think about it. If you're still in that sort
of finding product market fit or experimenting
with the project, then I think either Cloud
Datastore or Cloud Firestore is probably where you want to go. Basically, there
are key things here. It's got very flexible schemas. You just like set
up the documents, and then you can index a
bunch of different ways after the fact. It's very easy to work with. If you're using Cloud
Firestore for a new project, you can actually use
that in Realtime mode, get most of the same benefits that we're talking about, plus being able to send stuff
directly to your mobile or web clients. So I think that if you're
still figuring that out, that's a great choice. Oh, also, Firestore
and Cloud Datastore just have a per-operation
billing model, so if you're not
using it, you're only paying for storage,
which is like pretty cheap if you're just getting started. Whereas, Cloud Bigtable or Cloud
Spanner are a per-node pricing model, so you're paying
hourly for each node that you have set up. So if you're just
sort of experimenting, it's real cheap to get
started with like Cloud Firestore or Cloud Datastore. And then, yeah, on the
difference between those two, I sort of jokingly put,
is it 2019 or later? But yeah really, Cloud
Firestore Datastore mode is just Cloud
Datastore but better. They've announced that they're
going to move everybody to Cloud Firestore. So if you're starting
a new project, just start with Cloud Firestore. I have trouble envisioning
a case in which you actively want to go to Datastore if
you're starting a new project. So if you're more certain of
the needs for your project, if this is something
where either it's a new technology for an
established company or product, or if you're really, really
certain of your performance needs and you've tested it out
and have a really good idea-- then that's when you should
probably reach for a Cloud Bigtable or Cloud Spanner. Cloud Spanner is a lot
nicer to work with. It has data types,
it has SQL, it has these globally
consistent transactions. So generally, unless you know
you're going to need to really, really optimize those
key value lookups and build things, like this
horrible Reply All thread case, definitely go with
Cloud Spanner. We actually built out this
index entirely on Cloud-- or this metadata indexing
system entirely on Cloud Spanner to start with. I'm super happy that we
did, even though we ended up moving some of the
workload to Cloud Bigtable because it informed a
lot of our decisions, sort of gave us more
information on the use case. And then if you
just like really, really need that one specific
tool, then Cloud Bigtable is probably what you
should reach for. [MUSIC PLAYING]