Spanner, Firestore, and Bigtable: Connecting the Dots (Cloud Next '19)

Captions
[MUSIC PLAYING] FRED WULFF: My name is Fred Wulff. This is Arpita. And we are from Streak, which is a startup of about 40 people based here in San Francisco. And yeah, this talk, we're really talking about two things. One is this host of managed databases on Google Cloud. And the second is how we're building our business on top of them. And so it's probably helpful to start with what our business, what our product is. So basically, Streak is an organizational tool on top of Gmail. So the way this works behind the scenes is we ship a Chrome extension that just modifies the Gmail UI to add in organization on top of it. We have what's called a pipeline, which, if you're in sales, might be a list of your sales deals and the stage they're in, whether they're leads, whether they're closed, that kind of thing. If you're in support, it might be support tickets. People use this for hiring, business development. Basically, the key thing here is it's organization on top of email. And why is that important? So there's a bunch of different tools within your organization for making organization better. There's things like Hangouts Chat or Slack. There's things like project management tools. But once you get outside of an organization, basically everybody uses email, just because everybody has it. And so the key things that Streak adds-- so we have what's called a pipeline. And you add boxes, which might be sales deals or support tickets. And in addition to just organizing your email-- so when you look at your emails in your inbox, you can see which deals or support tickets they're associated with. We can also add structured metadata, so things like what stage the deal's in, who is involved, who's the point of contact, how long it's been since the last follow-up, and all of that kind of stuff. So we're mostly talking about technology here rather than a product demo, but I just wanted to give a motivating example of what we're storing in these databases. Cool.

And so, here, we're going to be talking about our database journey. And just to give you a little bit of a preview, to situate everything, we started off on Cloud Datastore. We're going to be talking about Cloud Firestore, which is Datastore's spiritual successor. And then we're going to focus in on one particular problem that we call the email metadata problem and how we're building some new product features that caused us to reach for Cloud Spanner and Cloud Bigtable, some of the newer database technologies. Cool. So now, I want to give it over to Arpita to chat about our use of Cloud Datastore and Cloud Firestore.

ARPITA SHRIVASTAVA: All right, so Cloud Datastore-- the way we're going to be talking about the databases is just to give you a little bit of an introduction to why we chose that database service, our takeaways from using that service, and how our data needs evolved to want a newer solution. So Datastore-- it's a NoSQL document store. Your data is organized into documents called entities, which are basically objects that are accessed via a key or an index value. And similar entities are grouped as kinds, which is like a table in a relational model. Entities have fields called properties, which are basically the values-- you get two built-in indexes on each of the properties for an entity. There are no joins or aggregate queries in Datastore. There is GQL, which is a query language for Datastore, but complex joins and aggregate operations are not supported-- simple filtering is possible.
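As a rough illustration of that query model (the "Box" kind and "stage" property here are hypothetical stand-ins, not Streak's actual schema), a simple property filter against a kind with the Java Datastore client might look like this:

```java
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

public class ListLeads {
    public static void main(String[] args) {
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

        // "Box" is a kind; "stage" is an indexed property on each entity.
        // Simple filtering like this is supported; joins and aggregates are not.
        Query<Entity> query = Query.newEntityQueryBuilder()
            .setKind("Box")
            .setFilter(PropertyFilter.eq("stage", "Lead"))
            .build();

        QueryResults<Entity> results = datastore.run(query);
        while (results.hasNext()) {
            Entity box = results.next();
            System.out.println(box.getKey() + " -> " + box.getString("name"));
        }
    }
}
```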
Why did we choose it? So back when Streak was started, it was started on App Engine. So the only database offering at the time on Google Cloud was Datastore, so it made sense to colocate our database with our server stack on Google Cloud. But even so, we think it's an excellent choice if you're starting out with a new product or service, and that is because it gives you a lot of flexibility to evolve your schema. And what that means with Datastore is, with entities, as I said, they have different properties. The same properties across entities can be completely different. The number of properties, even the data types for the same properties, can be different, so it gives you a lot of freedom to just build your features and evolve your schema as you go. Of course, the other reason we love Datastore is it's a managed service. NoOps. It autoscales, autoreplicates. Google's SREs handle availability. You don't have to worry about provisioning capacity up front, and you can really just throw data at it, and it will grow with your service. And so back then, there was one candidate, and over 10 years later, we have about 15 terabytes of data on Datastore. All our application data is on it, and we found it to perform predictably and have predictable kinds of challenges and solutions.

So, working with Datastore. Some concepts that we found are key to working most effectively with Datastore. One is entity groups. So as I was saying, your data is organized into entities. And with multiple entities, if they belong to one entity group, you can have a transaction across those entities. So you basically want to have your most highly correlated data in one entity group-- it can be a user profile, and so on. At Streak, the smallest unit of work is a box. So a box can have different threads of emails, notes, comments, and a lot of other custom fields. And boxes can be put into pipelines, which are shared across teams, and the teams belong to an organization. So the way you design your entity groups will have consequences on throughput and consistency. In our case, if you want to update boxes, we have our entity group set at the organization level, which means we have a lower number of entity groups. What that does is-- entity groups are also units of consistency, which means a query against an entity group will always give you strongly consistent results. Also, with a transaction, your ACID transaction properties will always be applied when you're using entity groups. All entities within an entity group are also stored physically close together by Google, so you have minimal I/O. Reads are really fast. So in our example with boxes, when the entity group is set at the organization level, with one query against an entity group, you get a high volume of entities. But there are limits on how many writes you can have per entity group per second-- currently, in Datastore, you can only have one write per second per entity group. And I'll talk about the consequences of that in a second. The other thing that we found useful working with Datastore is Objectify, which is a Java data access API that makes using Datastore really easy. The example at the bottom, where you have annotations to work with entities, indexes, and so on, has been really useful. We use JSON and Jackson annotations for serialization and deserialization.
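As a minimal sketch of the kind of Objectify-annotated entity being described (the class and field names here are hypothetical, not Streak's real code), an entity parented under an organization-level entity group might look roughly like this:

```java
import com.googlecode.objectify.Key;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import com.googlecode.objectify.annotation.Index;
import com.googlecode.objectify.annotation.Parent;

// Hypothetical "Box" entity. The @Parent key puts every box for an
// organization into the same entity group, which is what makes strongly
// consistent ancestor queries and cross-entity transactions possible.
// (Organization would be another @Entity class, not shown here.)
@Entity
public class Box {
    @Parent public Key<Organization> organization; // entity group parent
    @Id public Long id;

    @Index public String stage;        // indexed property, e.g. "Lead" or "Closed"
    public String pointOfContact;      // unindexed; properties and types can vary per entity
    public long lastFollowUpMillis;
}
```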
Then, some of the challenges. As I was saying, the write throughput rate for entity groups in Datastore is set to 1 per second. So ideally, you'd want to have really small entity groups so that you are not writing more than once per second. In our case, pipelines can have hundreds or thousands of boxes. So for example, if you're trying to update multiple boxes in a pipeline, it can be tens of thousands of writes against one entity group, so it can cause a lot of contention. And again, something to keep in mind, entity group relationships are permanent. You cannot change them unless you delete the entities and recreate them in a new entity group. So contention is a challenge. As we're growing, we have larger and larger customers, and that has been a big challenge. Again, with boxes-- the way our schema has evolved, there are a lot of child objects on boxes now, and they all belong to the same entity group, but transactions are limited to 500 entities per transaction. So when you have cascading writes, that also becomes kind of a performance bottleneck. Also, another limit to keep in mind-- transactions are limited to only 25 entity groups. So again, you want to figure out your key design so that you're finding the right balance between consistency and throughput. And then, client-side views. As you saw in the first graphic, we present this spreadsheet layout to our users, and we offer complex filtering, grouping, and sorting. Datastore not providing that out of the box means we have to shift that responsibility to the application code, which, of course, increases the complexity, the debugging complexity, and so on. So not having joins-- as we grow larger, we really miss having all those features supported out of the box, which would be part of a relational database.

Cloud Firestore. Firestore is the latest version of Datastore. There are two modes here-- real-time (native) mode and Datastore mode. Firestore in Datastore mode is backwards compatible with Cloud Datastore, but it doesn't have some of the newer features that real-time Firestore has, like real-time updates and rich mobile and web client libraries. So Firestore is supposed to be a drop-in replacement for Datastore, and the migration to Firestore is supposed to be automatic without any downtime, so we're really excited about that. Basically, Firestore has a Spanner backend with the Datastore features on the front end. So you get a strongly consistent layer in the backend, and this eliminates some of our biggest pain points with Datastore. One, we get strongly consistent results throughout. And the entity group restrictions that I talked about-- 25 entity groups per transaction and the write throughput limit of 1 per second-- we don't have those with Firestore anymore. We're hoping to upgrade soon and-- yeah. All right.
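To make that concrete, here is a rough sketch, using Objectify and the hypothetical Box entity from the earlier example, of the kind of multi-entity-group update that the classic Datastore limits made painful. Under Firestore in Datastore mode, a transaction like this is no longer capped at 25 entity groups or one write per second per group:

```java
import com.googlecode.objectify.Key;
import java.util.List;

import static com.googlecode.objectify.ObjectifyService.ofy;

public class BulkStageUpdate {
    // Move a batch of boxes -- possibly spread across many entity groups --
    // to a new stage in a single transaction. Under classic Datastore this
    // would run into the 25-entity-group and 1-write/sec/group limits;
    // Firestore in Datastore mode lifts those restrictions.
    public static void moveToStage(List<Key<Box>> boxKeys, String newStage) {
        ofy().transact(() -> {
            for (Box box : ofy().load().keys(boxKeys).values()) {
                box.stage = newStage;           // stage field from the earlier sketch
                ofy().save().entity(box);       // applied atomically at commit
            }
        });
    }
}
```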
So we talked about how we felt the need for a fully relational model for our application data, and there's a new feature that we worked on that we call the metadata problem. So I'm sure all of you have had this problem where somebody forgot to hit Reply All in an email, and then you got a forward of that email. And now you have a lot of redundant emails, and emails just in reverse chronological order, and it's just not very elegant. And also, there can be problems with permissions, and a sensitive email that you were not supposed to receive ends up in your inbox. So Streak's solution to that is having a unified view of emails across a team.

So in this example, the top three emails were not addressed to me but, as you can see, they're shared from my co-workers' inboxes. So you get this elegant view of the email thread, which is controlled by a permission model. So this looks really good in the UI. But on the backend, it's kind of a complex graph traversal problem. We want to answer questions like, which co-workers of mine have interacted with this particular prospect over email, or, for this email, what are the other emails on this thread that are not in my inbox but in my co-workers' inboxes? So the solution we came up with is a graph, where your nodes are either a person, an organization or a domain, or an email message. And email messages are connected if they're on the same thread. So if I emailed you, and then you forwarded the email to somebody else, we can trace those emails and give you a unified view. And two emails are also connected if they are the same email but are part of different people's inboxes. They're connected by something that is called an RFC message ID in Gmail-- so same message but different inboxes. So this, obviously, helps us answer those questions and create a unified view that can be shared across a team. To talk about how we approached solving this problem and our relational model needs-- Fred?

FRED WULFF: Yeah, OK. Real quick before I go on to our solution-- building on what Arpita was saying, this is a pretty challenging graph database problem. You have to really understand the course of an email thread, or the course of connected email threads, across the different people's inboxes in your organization. You might have to hit hundreds of different inboxes. I'm sure you've all been on these horrible Reply All threads that have not just hundreds of people in the company but thousands of emails when you count all the email chains. So there's this graph database problem, but it's also pretty challenging for a couple of reasons, technically. One is you have to be able to do all of these edge traversals quickly. There may be a lot of emails, or nodes in the graph problem. And as Arpita mentioned, permissions are a huge deal, right? Email can be really sensitive. You can have password reset emails. You can have a manager giving targeted feedback to a report that might not be generally consumable. There are all these ways this can go horribly wrong if you don't have a strong permissions model. And what this means is we have to traverse this graph, and there are pretty big challenges if we try to precompute a lot of this stuff. Normally, the distributed systems way out is you just denormalize a bunch of stuff in advance. In this case, we really needed to compute it at read time to make sure that we're getting permissions right, even if somebody just changed some permissions. And this is all happening within the context of viewing emails in Gmail. And you know, the whole premise of Gmail is that it's really fast. People expect their emails to just load quickly, even if they're searching their whole inbox. And we're expanding that to letting you search parts of not just your inbox, but possibly everybody in your company's inboxes. So, challenging technical problem.
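As a loose sketch of the graph model being described (the type and field names are hypothetical, not Streak's actual code), the nodes and edges might be modeled roughly like this:

```java
import java.util.List;

// Hypothetical model of the email metadata graph described above.
public class EmailGraph {

    // Node types: a person, an organization/domain, or an email message.
    enum NodeType { PERSON, ORGANIZATION_OR_DOMAIN, EMAIL_MESSAGE }

    // Two messages are connected if they are on the same Gmail thread, or if
    // they are the same message (same RFC message ID) in different inboxes.
    enum EdgeType { SAME_THREAD, SAME_RFC_MESSAGE_ID, SENT_OR_RECEIVED_BY }

    static class Node {
        NodeType type;
        String id; // e.g. an email address, a domain, or a message key
    }

    // Answering "which co-workers have interacted with this prospect" or
    // "which copies of this thread live in other inboxes" is a traversal over
    // these edges, with permissions checked at read time rather than precomputed.
    List<Node> neighbors(Node node, EdgeType edgeType) {
        throw new UnsupportedOperationException("backed by the indexes described later in the talk");
    }
}
```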
Looking at the backend, this is actually across our entire user base-- we have about 30 terabytes of just email metadata. So that's-- we're not looking at the contents of email-- we don't really want to-- but just looking at who the email's from, who it's to, the sort of information we need to compute these edges. And so we evaluated a bunch of different database products. There are a lot of graph-specific database products on the market. But all else being equal, we wanted something that was managed by Google. We have a very small operational team. We basically have roughly three infrastructure engineers, depending on how you slice different people's time. And being able to offload some of that availability and sharding work to Google SRE really seemed like a good trade-off to us. So we looked at a bunch of different options, and we ended up-- this was our first use case for Cloud Spanner.

So Cloud Spanner-- I think it's about a year old at this point, maybe a little bit more. But the key thing about it, looking at just the product in general, is that it is a globally consistent relational database. And so what does that mean? It supports the SQL language that you know and possibly love or hate, depending on what you've done with it. And it has what I affectionately call the magic clock thing. And the magic clock is basically-- for years and years and years, there was always this introductory paragraph in every distributed systems paper, which basically said something like, we know clocks are unreliable and go back in time and have jumps, and the network's unreliable and may have horrible latency. And everybody just took it as a premise that you couldn't rely on clocks, and they were trying to lead you astray. And so the Spanner whitepaper came out a while back, and now the Cloud Spanner product is built on this idea of, hey, what if instead of accepting this as a premise, we just made the clocks roughly reliable? And there's a bunch of effort in terms of installing atomic clocks in all the data centers. There's a bunch of effort in terms of setting up the networking. And there's a bunch of effort in terms of the statistics of setting error bars on timing. But the outcome of it is that Cloud Spanner does what it says on the tin, and it's actually true-- we're using it in anger. It is globally consistent. So basically, this means that, going back to the stuff that Arpita chatted about with Datastore, you don't have to preselect particular entity groups. You don't have to say beforehand, these are the things that I'd like to do transactions on. When you go to actually update entities, it looks at what entities you've queried. And you only have transactional conflicts if you are actually modifying things that were also queried by other queries. So it's just the set of things that are actual, real transaction conflicts.

So, you know, this is great. Globally consistent SQL, and it has really great transactions. And some of you who are thinking this through might say, but wait-- email itself isn't consistent. Why do you care about consistency? I might send you an email, and, all the time, we see emails taking 10 minutes, 30 minutes to get delivered. Why do we care about the database being consistent? And the answer is, it goes back to that operational ease. It means that when we have our syncing logic, we can sync the data, we can build these indexes to power these graph queries, and we don't have to worry that the sync gets partially committed. We can just say, hey, all of this should be committed in the transaction. And it's either all committed or not committed.
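A minimal sketch of what that kind of all-or-nothing sync write looks like with the Spanner Java client (the Messages and MessageMailboxes table names come from the talk, but the columns, values, and project/instance IDs here are hypothetical):

```java
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;

public class SyncMessageBatch {
    public static void main(String[] args) {
        Spanner spanner = SpannerOptions.newBuilder().build().getService();
        DatabaseClient db = spanner.getDatabaseClient(
            DatabaseId.of("my-project", "my-instance", "email-metadata"));

        // Buffer all the rows for one synced message inside a single
        // read-write transaction: either every row commits, or none do.
        db.readWriteTransaction().run(txn -> {
            txn.buffer(Mutation.newInsertOrUpdateBuilder("Messages")
                .set("MessageId").to("msg-123")
                .set("ThreadId").to("thread-456")
                .set("Subject").to("Q2 pricing")
                .build());
            txn.buffer(Mutation.newInsertOrUpdateBuilder("MessageMailboxes")
                .set("MessageId").to("msg-123")
                .set("MailboxEmail").to("fred@example.com")
                .build());
            return null;
        });

        spanner.close();
    }
}
```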
And this just means that our indexing pipeline that's processing a lot of email basically took one engineer working on it about half-time for a month or so. That's really quick for a distributed system of this scale-- when we were planning out the work, we were like, hey, we probably need to have at least two people on this for a quarter. And just by relying on the Spanner transactionality to deal with a lot of the edge cases, we were able to get it out and get it in customers' hands much more quickly. And building on that operational ease, there are a lot of other really nice features in that area around Spanner. So in a previous life, I managed a couple of on-premise SQL databases, and all of those operational things that just fill DBAs with terror-- things like adding and removing columns, adding or removing indexes-- both are pretty easy to do. And also, there's a pretty good task planner that makes sure that, even if you add and remove a bunch of columns, it doesn't interfere with your actual transactional workload. So those are all pretty nice things to have, even if our underlying data isn't necessarily super dependent on the strong consistency.

So let's-- you know, I've been singing its praises generally. Let's talk about how it actually works in practice. So this is, in the Cloud Console, the schema for one of our tables in production. And this table is what we call MessageMailboxes. So if you remember the graph nodes, there were some things that are dependent on the message itself-- which Gmail thread it's in, which other messages have the same message ID header so we can connect it between inboxes. But there's also stuff that's dependent on who the message is addressed to, or who it's from, or who's on the CC list. And MessageMailboxes has one row per person on a message. So if I send a message to you, the version of the message in my inbox would have two MessageMailboxes rows-- one for me and one for you. And the key thing here-- you see at the top, it says that this table is interleaved in Messages. Interleaved tables are a feature in Cloud Spanner that gives you really great performance for data that should be co-located together. Most of the time that we're querying who's on a particular message, we also care about the message they're on. And so here, we've interleaved MessageMailboxes in Messages. And that means that if I want to do an index filter on all the messages that a particular email address was part of-- say, in the Streak product, we want to show all of your communication with a particular sales lead across the organization-- I can query on mailbox email and then do a very quick, very performant join with the Messages table to get things like the subject of that email, or these other properties that, in a more traditional distributed system, you might have to denormalize and waste a lot of space on to get performance. And so, we use interleaved rows a bunch.

One other interesting thing-- you'll see these sort of weird Is BCC and Is CC Boolean fields at the bottom. And that's basically saying, hey, is this particular email address on the BCC line? Is it on the CC line? How are they connected to this message? And the interesting thing there-- those fields have two values, as you might expect from a Boolean, but the values are actually true or null. So why are we doing this? Spanner has this ability to have what are called null-filtered indexes. So if you're coming from PostgreSQL-- or I'm sure Microsoft SQL Server has a similar thing-- you might be familiar with the concept of partial indexes. And that means we can create an index that's just, is this person on the From line, or are they on the To line? And it doesn't use up any space for the MessageMailboxes rows that don't have the corresponding Boolean set to true rather than null. It filters out all the null entries, so yeah.
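Assuming a schema along the lines of what's on the slide (the exact columns here are guesses, not Streak's real schema, and a Messages table keyed by MessageId is assumed to exist already), the interleaving and a null-filtered index might be declared with DDL like this, issued through the Java admin client:

```java
import com.google.cloud.spanner.DatabaseAdminClient;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import java.util.Arrays;

public class CreateEmailMetadataSchema {
    public static void main(String[] args) throws Exception {
        Spanner spanner = SpannerOptions.newBuilder().build().getService();
        DatabaseAdminClient admin = spanner.getDatabaseAdminClient();

        // Child table interleaved in its parent, so rows for the same message
        // are stored physically together.
        String messageMailboxes =
            "CREATE TABLE MessageMailboxes ("
            + "  MessageId STRING(64) NOT NULL,"
            + "  MailboxEmail STRING(MAX) NOT NULL,"
            + "  IsCc BOOL,"          // true or NULL, never false
            + "  IsBcc BOOL"
            + ") PRIMARY KEY (MessageId, MailboxEmail),"
            + "  INTERLEAVE IN PARENT Messages ON DELETE CASCADE";

        // NULL_FILTERED means rows where IsCc is NULL take up no index space,
        // similar to a partial index in PostgreSQL.
        String ccIndex =
            "CREATE NULL_FILTERED INDEX MessageMailboxesByCc"
            + " ON MessageMailboxes (MailboxEmail, IsCc)";

        admin.updateDatabaseDdl("my-instance", "email-metadata",
            Arrays.asList(messageMailboxes, ccIndex), null).get();

        spanner.close();
    }
}
```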
So, lessons learned with Cloud Spanner. Setting up your schema is super important, particularly if you're putting a lot of data into it. It's pretty easy to add and remove secondary indexes, but you are stuck with your primary key index-- you have to rewrite rows if you're changing your primary key. It's not magic like that. And it's pretty important to play around with the schema to figure it out. So what challenges have we run into? So far I've been singing its praises. And yeah, you know, no plan survives contact with the enemy, or the database query planner. So one challenge we ran into is that Spanner is too smart for its own good-- for our use case. Your use case may vary. Spanner has a lot of smarts in the query planner to balance running a lot of small transactional queries, which is probably the 90% use case. It does things like, if you run a really big query, it assumes this must be an analytical query or maybe a backend bulk task, and it will deprioritize that query-- it may take a long time to run-- so it doesn't get in the way of your transactional queries. But going back to our use case, as we mentioned, you've all been on these horrible, huge Reply All threads. And a lot of times, fortunately or unfortunately, those threads are the most important, because they're the ones where the collaboration story really gets tricky, and you need to be able to share some stuff but not other stuff. And we really need those to be answerable in a short enough amount of time that it doesn't degrade the Gmail experience. And so what we were running into was, when we did the queries to fan out on those really big threads, it was not overwhelming anything else, but it was taking a long time. And there wasn't really a great way to tell the query planner, hey, this query is really important, go ahead and do it. We tried some experiments with fanning out and sending a bunch of queries. We tried some experiments with doing a little bit of caching in something like Redis, using Cloud Memorystore, but we just didn't find a thing that really delivered the user experience that we wanted. So that led us to our next database that we're going to chat about-- Cloud Bigtable. So Cloud Bigtable is actually based on Bigtable, which is, at this point, a very well-established Google technology. It has a lot less functionality than Cloud Spanner or Cloud Datastore. It's really just a key-value store-- a sorted key-value store, let's say. And basically, that means there are really two queries you can do to it. You can query for a particular primary key-- give me the value.
Or you can do lexicographical-- however you pronounce that word-- range queries, and basically say, hey, give me all values for keys between A and B, or AA and B. So it's a pretty limited query interface, but the trade-off you get is that it's really, really simple and really, really fast. It's another managed database, so you can just add and remove nodes to your heart's desire-- you pay for more nodes, you get more performance. When it's nighttime, you can just decrease the number of nodes and not pay for what you're not using. But basically, each node you add gives you 10,000 reads or writes a second, which is a lot of reads or writes a second. And the queries just look like either put some data in for this key, or read some data for this key or set of keys. There are currently two modes. The mode that we use for this-- just because for our use case, it's not great if the feature is unavailable, but it's not business ending; people can still get at their Streak box data, which is stored safely in Cloud Datastore-- is the one mode that is single-zone availability and has the concept of a transaction within a particular row. And then there's another mode that is multi-zone replication. You forgo transactions. But even if one zone goes down-- which is infrequent, but happens-- you can just access the data in another zone, and then it'll asynchronously replicate after the zone comes back up. So yeah. So Cloud Bigtable-- not a great tool for every job, but the best tool for some jobs. And we're actually using this in production for querying this email metadata index. And let me show you what that looks like. So here, we've got the output of cbt, the Cloud Bigtable command-line tool. And this is our Messages table. And you can see, there's this concept of a column family, which is a set of data that's stored together. And we have one column family for each of those indexes that we need to traverse the graph. So, for instance, I was showing you the MessageMailboxes table, and we've got one column family that's this index for looking up MessageMailboxes by the domain or by email. And then also, messages are indexed by the message ID-- the RFC message ID-- or the thread ID, which lets us do that graph traversal. And the cool thing here is the Bigtable API is completely asynchronous. So when we're doing this graph query, we can just have a queue that, as soon as information comes back, is sending out the next set of fetches. And that's all pipelined and can be very much sub-second. Each lookup is about 6 milliseconds, so you can do a lot of lookups before somebody notices the delay. Other things here. This is very much build-it-yourself. The model in Bigtable is just a string of bytes to a string of bytes. And so we had to figure out our own encoding that sorts lexicographically the way that we want. There are a few libraries out there. There's one called Orderly. There's one that's partially factored out of the HBase library. So there's some tooling there, but you still need to very much figure out your own schema and your own encoding. This was a bit more developer-intensive, so I definitely don't recommend reaching for this from the get-go. But it is very, very fast, and it really does what it says on the tin.
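To make the lookup pattern concrete, here's a small sketch using the Cloud Bigtable Java client (the table ID, instance ID, and row key layout are hypothetical stand-ins for the indexes described above, not Streak's actual encoding):

```java
import com.google.api.core.ApiFuture;
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.Query;
import com.google.cloud.bigtable.data.v2.models.Row;

public class MessageMetadataLookup {
    public static void main(String[] args) throws Exception {
        try (BigtableDataClient client = BigtableDataClient.create("my-project", "email-metadata")) {

            // Point lookup: one row keyed (hypothetically) by the RFC message ID.
            ApiFuture<Row> byRfcId =
                client.readRowAsync("messages", "rfcId#<abc123@mail.example.com>");

            // Range scan: every row whose key starts with a thread ID prefix,
            // relying on Bigtable's lexicographic key ordering.
            Query threadScan = Query.create("messages").prefix("threadId#thread-456#");
            for (Row row : client.readRows(threadScan)) {
                System.out.println(row.getKey().toStringUtf8());
            }

            // The async API lets a traversal queue up the next set of fetches
            // as soon as each result comes back, keeping the whole walk pipelined.
            Row row = byRfcId.get();
            System.out.println(row == null ? "not found" : row.getKey().toStringUtf8());
        }
    }
}
```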
So, you know, we're using Cloud Bigtable for those lookups in production-- 6 milliseconds. You can do a whole bunch of reads all at once, and it's providing what we need. And so then, at the end of the day, I just wanted to summarize what we already talked about in a convenient flowchart. The first question I'd ask if I were reaching for a new database is, how developed is your project, schema, product-- however you want to think about it. If you're still in that finding-product-market-fit or experimenting-with-the-project stage, then I think either Cloud Datastore or Cloud Firestore is probably where you want to go. Basically, the key things here: it's got very flexible schemas. You just set up the documents, and then you can index them a bunch of different ways after the fact. It's very easy to work with. If you're using Cloud Firestore for a new project, you can actually use that in Firestore mode, get most of the same thing-- oh, sorry, in real-time mode, get most of the same benefits that we're talking about, plus being able to send stuff directly to your mobile or web clients. So I think that if you're still figuring that out, that's a great choice. Oh, also, Firestore and Cloud Datastore have a per-operation billing model, so if you're not using it, you're only paying for storage, which is pretty cheap if you're just getting started. Whereas Cloud Bigtable and Cloud Spanner have a per-node pricing model, so you're paying hourly for each node that you have set up. So if you're just experimenting, it's real cheap to get started with Cloud Firestore or Cloud Datastore. And then, on the difference between those two, I sort of jokingly put, is it 2019 or later? But really, Cloud Firestore in Datastore mode is just Cloud Datastore but better. They've announced that they're going to move everybody to Cloud Firestore. So if you're starting a new project, just start with Cloud Firestore. I have trouble envisioning a case in which you'd actively want to go to Datastore for a new project. So if you're more certain of the needs for your project-- if this is something where either it's a new technology for an established company or product, or you're really, really certain of your performance needs and you've tested it out and have a really good idea-- then that's when you should probably reach for Cloud Bigtable or Cloud Spanner. Cloud Spanner is a lot nicer to work with. It has data types, it has SQL, it has these globally consistent transactions. So generally, unless you know you're going to need to really, really optimize those key-value lookups and build things like this horrible Reply All thread case, definitely go with Cloud Spanner. We actually built out this metadata indexing system entirely on Cloud Spanner to start with. I'm super happy that we did, even though we ended up moving some of the workload to Cloud Bigtable, because it informed a lot of our decisions and gave us more information on the use case. And then, if you really, really need that one specific tool, Cloud Bigtable is probably what you should reach for. [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 11,938
Rating: 4.6428571 out of 5
Keywords: type: Conference Talk (Full production); purpose: Educate; pr_pr: Google Cloud Next
Id: 3aHBkfBRFEU
Length: 37min 34sec (2254 seconds)
Published: Wed Apr 10 2019