[MUSIC PLAYING] ADAM LEVIN: Welcome everyone. Thanks for joining us at
5:30 on a Thursday evening to talk about in-depth topics
on distributed databases. So if that doesn't interest you,
then you're in the wrong place. But welcome, and
thanks for joining us. My name is Adam Levin
this is Sharon Dashet. And we're going to do a
little deep dive into Cloud Spanner and Cloud Bigtable. So before we get started,
just a friendly reminder to please fill out surveys. We love feedback. And so those will open 20
minutes into the session. As I mentioned, my name is Adam. I'm a product marketer
based out of San Francisco. Sharon is a big data specialist
based out of Tel Aviv. So we figured, London was
the most convenient place to get together,
have a cup of coffee, and talk about databases. So what we're going to do
today is talk a little bit about why managed databases. I promise to make that brief. And then, we'll get into
the depths of Spanner, how it works,
Bigtable, how it works. We'll do a little demo. And then, we will
talk a little bit about some of the migration,
the modernization options, particularly to Bigtable. And then, if we have
time at the end, we can do some Q&A.
So let's get going. And so a discussion on
Spanner and Bigtable wouldn't be complete without
a little bit of a history discussion to begin with. So Google, as a
company, has been building tools and services
that help people in their lives for as long as Google has been around. One of the things that enables that is data. And data, and the infrastructure to manage that data, underpins all of what Google does. And so Google has been tackling
big data problems for 15 years. Some of those
innovations have resulted in famous open-source projects
like MapReduce, HBase, things like that. And then, they've also
resulted in commercial products that you have access to go
build on within Google Cloud. But it all comes down to
managing data at scale. And along the way, we've
learned a few important lessons. So before we talk a little bit
about those important lessons, a quick story. So when Bigtable was first created and first put into production, it was handed over to
developers within Google to manage themselves
and run themselves. And so a developer
had to spend time maintaining and managing
Bigtable and then building the application
that ran on top of that. And what the Bigtable
team soon discovered was that that wasn't efficient. All of these developers
were spending their time operating this thing instead
of adding business value and building code on Bigtable. And so they decided to
build a managed platform that developers could then
access and build on top of. But the management
would be centralized. And this had two big effects. One was that developers
could just build and focus on adding business value. But also, for the team that was
managing this thing centrally, they were able to discover
bugs and edge cases and solve those at
a much higher rate before they became
bigger issues. So a challenge discovered
by one person over here was solved for
someone over here. And so that was a
really big advantage. And so the same lesson
holds true for all companies building and running
applications in the cloud. So by relying on
managed services, you're able to reduce your
operational overhead and toil. And instead of worrying about
availability, upgrades, security, and hardware patching, you can let a managed service take care of that for you. And then, you can focus your
energy on higher-priority work, like adding business value. Now, that's not the only
lesson we've learned in managing data at scale. There's a few other things. But the overarching
point is for you to let specialized systems
that operate at scale manage that data. So you can focus on
adding business value. So let's take a quick step back. There are many
databases out there. DBengines.com, if you're
familiar with them, tracks 343 as of a couple of months ago. Each line on this slide represents one of
those databases. And I actually got tired of
clicking to load them up. So I don't even think
this is all 343. And so you have lots of
choices to choose from. In addition, Gartner
is saying that by 2022, 75% of all databases will
be deployed or migrated to a cloud platform. And so managing infrastructure
and the databases that run on it isn't
a core differentiator for most companies. And so whenever
possible, you want to take advantage of
a managed platform. And so then, your
database choice comes down to what
your application needs, what your industry
is, and really, what your cloud provider offers. And on GCP, we have a
wide range of databases, both systems that are built
and managed by Google-- we're going to talk about
Spanner and Bigtable in two seconds-- but then, I also
wanted to point out that we do have a range of
options that are provided by partners, as well
as you can really run anything you want on
GCE, which you see over on the far side. And so today, we're
going to focus on Cloud Spanner and Cloud Bigtable. So few minutes in, let's
dive into Cloud Spanner. And we have to start with
the discussion on what's tricky with databases. And so Cloud Spanner
addresses these challenges with a combination of
scalability, manageability, high availability, transactions,
and complex queries. That's a lot of ands. I say combination
because there are other systems out there that
address this in an or fashion. So it's scale, or
transactions, or replication, or high availability. But with Cloud Spanner,
what we're doing is, we've created an and statement. We're taking the best aspects
of relational databases and distributed databases
and combining them together. It's not to say that the
30 years of features that are in existing
relational databases all exist in Cloud Spanner. But we've combined
relational semantics with horizontal scale. And as we go deeper
and deeper, we have to start understanding
what the differences are between Cloud Spanner and
traditional relational databases. And one is that the application
is in control of its data when you use Cloud Spanner. And so traditional DBs
have a lot of complexity. They have stored procedures
and other business logic inside the database engine. With Cloud Spanner, that's all
pushed to the application. And that allows Cloud
Spanner to scale really well. And so Sharon is going to talk
through how Cloud Spanner works and how it's different. And it's really helpful to
understand how it's built and how it works to understand
how to approach building on it. SHARON DASHET: Thank you, Adam. Sorry. So to reiterate
what Adam mentioned about having the
best of both worlds, so Spanner is both relational
and also highly scalable, like NoSQL databases. So I have a personal story I
would like to share with you. I started in the late '90s as
an Oracle and SQL Server DBA. And back in the days, even
as a developer who was not part of the production
team, we had to do a lot of administration
tasks and maintenance, and much of it was part of your source control, like exchanging partitions, vacuuming, rebuilding indexes. And some of that stuff is really like house chores. You would like to concentrate on your business logic. So we did like 60%
of administration and 40% of business logic. But I think the worst
was the inability to scale whenever we
wanted to introduce a new workload, whenever
there was a new customer that joined the production system. So whenever we needed
to scale, there was very expensive hardware involved, long-term upgrade plans. And this was a hassle and hampered both innovation and the business. And who can ever afford to buy a very expensive appliance, or use a shared storage system,
like some of you know-- I will not mention names? So moving forward to around
2010, when the big data disruption happened,
and Hadoop came, and a lot of NoSQL and
columnar databases based on commodity servers were
coming to the market. I remember me and my data
team sitting for coffee. And we were thinking,
what if we could have both? A relational ANSI SQL system that can scale like NoSQL. And it looked like
a dream really. But when we heard about Spanner
a few years later in a Google conference like this,
we were very pleased. So this is like a
full circle for me. And what are the building
blocks of Spanner that make it both scalable and highly available, and also relational? The first is the Google network. For those of you who don't know, we have a global backbone private network, very reliable and fast. Then there is TrueTime. And TrueTime is our globally synchronized clock. And we are going to speak about it later to understand how TrueTime makes Spanner both relational and scalable. And we made some optimizations to the famous Paxos algorithm for two-phase commit. And on top of that, we have automatic rebalancing of the shards of the table. So these all can explain why Spanner is highly available, performant,
scalable, and relational. So every big data talk
has its CAP theorem minute. So we are no different here. So in a distributed system,
we cannot have all three guarantees of partition
tolerance, availability, and consistency. So the traditional
relational system would sacrifice availability
for the sake of consistency. And the NoSQL system will
sacrifice consistency for availability. But what about Spanner? Do we break the CAP theorem? The answer is no, no, no. We don't break it. But we minimize the
chance of a network partition, because we have this highly reliable,
redundant network. So we can have very
high SLA, but if we have to sacrifice something,
we would sacrifice availability because this is a
relational system. And a fun fact, Eric Brewer,
who created the CAP theorem, works at Google. And he has written an article
about TrueTime, Spanner, and the CAP theorem, if you
would like to check it out. OK, so what we see here
is the regional instance. In Spanner, we have
two configurations-- one regional and
one multi-regional. The regional is
under four nines SLA. And recently, we
announced that even with one node or two nodes,
we can have four nines SLA. And this is a big improvement. So you can start small
with Spanner and do a POC. And then, you can scale out.
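To make "start small, then scale out" concrete, here is a minimal, hypothetical sketch using the google-cloud-spanner Python client; the project, instance, and regional config names are placeholders, not anything from the talk.

```python
# A minimal sketch of provisioning a small regional Spanner instance for a
# POC and scaling it out later. All names here are hypothetical.
from google.cloud import spanner

client = spanner.Client(project="my-project")

instance = client.instance(
    "demo-instance",
    configuration_name="projects/my-project/instanceConfigs/regional-europe-west2",
    node_count=1,                      # start small for the POC
    display_name="Spanner POC",
)
instance.create().result(300)          # wait for the long-running operation

# Later, scale out online -- no downtime involved.
instance.node_count = 3
instance.update().result(300)
```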
And what you see here is something that is very interesting. We have a separation between
the storage and compute. So a node in Spanner is a unit
of throughput and compute. It does not have any storage. The storage is elsewhere. It's distributed. And every one of the
Spanner instances can have more than one database. We can have up to
100 databases sharing the same configuration
and the same instance, so they enjoy the same resources. But in Spanner, as
far as multi-tenancy, we usually do not design
database per tenant. We use the primary key for that. So databases are for
sharing the same resources. In a multi-region
configuration, we have three types of replicas. In the main region-- in this example, you see
Spanner across three continents. So the US is the main
region in this example. We have what we call
Read-Write Replicas. And these are replicas
we can write into. And we have something that
is called a Witness Replica. So this will ensure
us a quorum, even if the write replica is gone. In addition to that,
in the other regions, we have Read Replicas. So this is very performant. The readers can read
close to their zone. And we have this global
high availability. And often when we
speak about Spanner, we speak about
external consistency. And external consistency
means that the system behaves as if all transactions
were executed sequentially, even though Cloud Spanner
actually runs them across multiple servers. So it acts like a one-node database. But it is actually a distributed system. So this is twofold. This is due to the very fast network, and our synchronous replication, and the customizations we made to the Paxos protocol. And the other
factor is TrueTime. So often, TrueTime is mentioned
as one of the building blocks and what makes Spanner tick-- pun intended. Because TrueTime is a
globally synchronized clock. And in TrueTime, in each
one of the zones of Spanner, we have a combination of
both GPS and atomic clocks. Each one of these clock types compensates for the other's failures. And in addition to that, we
bring the uncertainty of the network into the timestamp we attach to every write and read. So even if we synchronize the local time with the reference time every 30 seconds, we can have drift in the local clock. And we have an uncertainty that can
be as much as two milliseconds. And this is used as an epsilon as part of the formula that Spanner applies when it has to attach a timestamp to a transaction. So why am I telling you all this? It sounds complicated. So this is how Spanner makes sure that one transaction's timestamp does not overlap with another's. And this is what makes it sequential. So there are no collisions.
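To make the epsilon idea a little more concrete, here is a rough sketch of the TrueTime interval and the commit-wait rule as described in the published Spanner paper; the exact internals of Cloud Spanner are not exposed, so treat this as the idea, not the implementation.

```latex
% TrueTime reports an interval guaranteed to contain real time,
% with bounded uncertainty \varepsilon:
\mathrm{TT.now}() = [\,t_{\text{earliest}},\; t_{\text{latest}}\,], \qquad
t_{\text{latest}} - t_{\text{earliest}} \le 2\varepsilon

% A leader picks a commit timestamp at or above the top of the interval:
s \;\ge\; \mathrm{TT.now}().t_{\text{latest}}

% Commit wait: the write is made visible only once
% \mathrm{TT.now}().t_{\text{earliest}} > s, so s is guaranteed to be in the
% past everywhere, and any later transaction gets a strictly larger timestamp.
```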
read multi versions. So it is also a
multi-version database. And the readers can read a very
consistent version of the data without holding locks. And this is rather revolutionary
in distributed systems. And on top of the
TrueTime the replication, we have automatic table split. So the keys in Spanner
are ordered and split amongst the nodes of the cluster
in what we call key ranges, or splits. So each node can have one
or more split off the table. In our example of
three nodes, each node will have three splits. And we will get
back into splits. And every split
has one leader that is allowed to write into
the split and two replicas. So this explains why we are
so scalable, performant, and highly available
because we have three replicas of each data unit. So we talked about the
wonders of Spanner. But at the end of the day,
this is a distributed system with network in between. So we do have some best
practices around primary keys, around child tables, and around indexes that you need to keep in mind when you start your adventure on Spanner. First, how do you keep
parent-child relationships? So you heard from Adam, we
don't have many knobs to turn. It is not like the
classic relational system with the triggers and
the integrity constraints. But if you do want to co-locate the child and the parent key in the same physical node, you should use the interleave
keyword, like in this example. So in this example, if we interleave the child table, the albums, with the parent table, the singers, they will be co-located together on disk. So this is what it looks like when we don't use the interleave keyword. We have two separate tables-- the singers and the albums. But if we do use the interleave keyword, it will look like this. So we have co-location of the child rows, the albums, with their singer.
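Here is a minimal DDL sketch of that interleaving, run through the google-cloud-spanner Python client. The instance and database IDs and the MarketingBudget column are hypothetical additions for illustration; the INTERLEAVE IN PARENT clause is the mechanism being described.

```python
# A minimal sketch: Albums is interleaved in Singers, so album rows are
# physically co-located with their parent singer row on disk.
from google.cloud import spanner

client = spanner.Client(project="my-project")
instance = client.instance("demo-instance")

database = instance.database(
    "demo-db",
    ddl_statements=[
        """CREATE TABLE Singers (
               SingerId  INT64 NOT NULL,
               FirstName STRING(1024),
               LastName  STRING(1024)
           ) PRIMARY KEY (SingerId)""",
        """CREATE TABLE Albums (
               SingerId        INT64 NOT NULL,
               AlbumId         INT64 NOT NULL,
               AlbumTitle      STRING(MAX),
               MarketingBudget INT64
           ) PRIMARY KEY (SingerId, AlbumId),
             INTERLEAVE IN PARENT Singers ON DELETE CASCADE""",
    ],
)
database.create().result(120)   # wait for the schema to be applied
```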
And we have some types of indexes in Spanner. Automatically, every primary key will have a unique index. We also have the ability to create independent interleaved or non-interleaved indexes. We have other types of indexes,
like null-filtered indexes. These are indexes without
nulls, because by default nulls get indexed. And we also have covering indexes. So whoever has worked with MS SQL knows the term. A covering index helps us prevent lookups to the base table when this is applicable.
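And as a hedged sketch of those two index types, continuing with the hypothetical Albums table and `database` handle from the earlier snippet:

```python
# Add a null-filtered index and a covering (STORING) index with update_ddl().
operation = database.update_ddl([
    # Rows whose MarketingBudget is NULL are simply left out of this index.
    """CREATE NULL_FILTERED INDEX AlbumsByBudget
           ON Albums (MarketingBudget)""",
    # STORING keeps MarketingBudget inside the index entries, so a query on
    # AlbumTitle that also selects MarketingBudget avoids a base-table lookup.
    """CREATE INDEX AlbumsByTitle
           ON Albums (AlbumTitle) STORING (MarketingBudget)""",
])
operation.result(120)
```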
And remember the splits we talked about in the scalability chapter? So we used an example of a monotonically increasing primary key. And in fact, this is an anti-pattern, because we don't encourage this. With a monotonically increasing primary key, the last records will all be appended to only one node. In this example, that is split number 8. So we don't want one leader to accept all the hot new records. So this is why it is recommended to distribute the keys by using a unique ID, or by using some field promotion or salting, or some type of mechanism that gives an even distribution across the splits.
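Here is a quick, hypothetical sketch of two of those tricks in plain Python: a random UUID key, and a salted (hash-prefixed) key that spreads otherwise sequential IDs across splits.

```python
# Hypothetical helpers for write-friendly Spanner keys.
import hashlib
import uuid

# Option 1: a random UUIDv4 primary key instead of a sequence.
def make_uuid_key() -> str:
    return str(uuid.uuid4())

# Option 2: "salting" / hash prefix -- prepend a small shard number derived
# from the sequential ID, so consecutive inserts land on different splits.
NUM_SHARDS = 16

def make_salted_key(sequential_id: int) -> str:
    shard = int(hashlib.md5(str(sequential_id).encode()).hexdigest(), 16) % NUM_SHARDS
    return f"{shard:02d}_{sequential_id}"

# e.g. make_salted_key(1000123) -> "07_1000123" (the shard value will vary)
```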
And finally, we have some new features in Spanner. We talked about the SLA for one- and two-node instances. We have some new graphs
in our monitoring system, in the console. We introduced a JDBC driver. We introduced support for Hibernate and some more security controls. And with that, let's get back
to Adam to speak about Bigtable. ADAM LEVIN: So I told you,
we had a lot to cover. And it's a whirlwind deep dive. So moving right on from
Spanner to Cloud Bigtable. And so Cloud Bigtable is our
scalable, high-throughput, low-latency data store. And so if you're familiar with
the types of NoSQL databases, it's a wide-column or
key-value data store. It's really good for low
latency random data access. It's often partnered with BigQuery for a lot of workloads,
real-time analytics, doing machine learning and AI on
top of lots of, let's just say, log data. And the really
nice thing about it is that performance scales
linearly as you add nodes. And that's a completely
online operation. So the same thing is
true with Spanner, where any sort of scaling procedure-- and, in Spanner's case, schema changes-- is all online. So there's no such thing
as planned downtime for these databases. One other thing to
note about Bigtable is that it's fully compatible
with the HBase client. So if you have HBase that
you're running yourself, it's relatively straightforward
to move to Bigtable and use that. One use case that we'll talk
about a little bit later is also moving from
Cassandra to Cloud Bigtable because those can be similar
data models and similar use cases. One thing to note
about Cloud Bigtable is that it's fairly
well integrated with the rest of
the GCP ecosystem. So you can actually query
directly from BigQuery into Bigtable. It's integrated with
Dataproc and DataFlow. And then, most importantly,
it's integrated with TensorFlow. And so there are a lot of
people building ML models on top of Bigtable and
using that to process and then serve
personalization data-- which takes us to the most
common use case for Bigtable: things that fall under a very broad umbrella of personalization. And so this is really
high-throughput reads, low-latency writes, and that integration, where you are doing predictions on clickstream data, or you're wanting to create a
unique user experience based on actions. And you can use all that
Bigtable offers to do this. And we'd see lots of
customers doing this today. The linear scalability
of Bigtable makes it very sensible for
EdTech, FinTech, gaming, IoT, and other use cases
where you just have tons of data
coming in, and you need a place to put it,
and then access it later. And so as I mentioned, BigTable
is often used with BigQuery. And this is the very high-level
marchitecture diagram that we see for this wide
umbrella of personalization workloads. And so that's just a quick
run-through of Bigtable. Now, Sharon's going
to talk a little bit about what's going on under
the hood with Bigtable and how it's able to
achieve such throughput and performance. SHARON DASHET: Thank you, Adam. So no personal
story here, but we have a very nice demo coming. And a little bit about the
terminology around Bigtable-- we have an instance. And an instance is a
container of clusters. We can have up to four clusters. And we can have them in
various zones and regions. So each one of these clusters
is attached to a zone. And we have nodes that are
also called tablet servers. We are going to speak
about what a tablet is. And you can attach storage
that is either SSD or HDD. And of course, for production
use cases, we prefer SSD. It has a few
milliseconds of latency. But there are some
use cases when you don't care about
latency as much as the cost. So you can use HDD as well. And in Bigtable, as
in Spanner, the nodes are a unit of compute
and throughput. They also do not have
storage of their own because the storage is
separated from the compute. And we will speak
about it later. And what is very
nice about Bigtable is that we can scale easily
by adding more nodes. So each node we
add to the cluster is roughly about 10,000 QPS. So you can scale up
and down according to your requirements, your
throughput, your performance, or your cost planning. And you can see here,
it scales very easily. So we talked about how the nodes
are separated from the storage. But we have another
wonderful thing happening. This is automatic rebalances. So every one of the nodes
is a throughput unit that is responsible
for writing and reading to the storage of
the Bigtable system. But once we see that
one of these nodes is more loaded than the other,
the routing layer of Bigtable can automatically
place the shard on another node
that is less busy. Each one of these
shards is called a tablet. And this is why the nodes are also called tablet servers. And you can think of it
as like a logical unit handled by only one node. So those of you who
handle Cassandra or HBase, it's very similar to
regions or partitions. And a little bit about data
modeling-- so in Spanner, we spoke about some best practices that everyone wants to know and needs to know. In Bigtable, this is mostly
around modeling the key. Why modeling the key? Because, as you heard from Adam, this is a key-value system. So the atomicity of a transaction is scoped to the context of one row. It does not cross rows. So this is why it's very important to model the key properly. So the only index in
Bigtable is the key index. So if you need
additional indexes, you would probably
create additional tables or use some of the server-side filtering that will happen after you retrieve
the blocks from the storage. And what you see here
is the column families. Column families are a way
to group together physically columns that have
common characteristics. So you can co-locate
them together physically. And with each column family,
we have one or more columns. And the system is very sparse. What do I mean? That in one row, you
can have 100 cells. And in the other, you
can have only 50 cells. So you don't pay for
what you don't write. Each one of these cells
has multiple dimensions. So a cell is addressed per column family and per column, but it's also versioned. So you can do upserts and write versions into the same cell. And there will be a
garbage collection that will collect the older versions. And you can play with
the configuration and decide that you want
to keep all versions. Or you can keep only
the last version. And in addition to
that, you can also configure time to live, TTL, at the column family level to control historical data aging out. So we see many systems like that, in monitoring for example. So they use both the garbage collection configuration and the TTL.
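As a hedged sketch of both knobs with the google-cloud-bigtable Python client (the table and column family names are hypothetical), a version-based rule and an age-based (TTL) rule can be combined so a cell is garbage-collected once either condition applies:

```python
# Configure GC on a column family: keep at most 3 versions, and drop any
# cell older than 30 days (whichever rule matches first wins).
import datetime
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("demo-instance").table("metrics")

gc_rule = column_family.GCRuleUnion(rules=[
    column_family.MaxVersionsGCRule(3),                       # version-based GC
    column_family.MaxAgeGCRule(datetime.timedelta(days=30)),  # TTL / max age
])
table.column_family("device_stats", gc_rule=gc_rule).create()
```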
And this is one of the most important design tasks you will have to do per table in Bigtable: deciding about the key. So we would try to avoid
keys that create hotspots. For example, we have an
IoT system in this example. And we would like to monitor
our metrics of a device like memory, CPU, storage. In this example,
if we model the key to be around only memory
usage across all the devices, we will create a hotspot.
alleviate some of this problem. But it will also
introduce another problem of sequential writes that
can also create hotspots. So what we propose
here is doing field promotion by adding the user-- the user in this example is the device-- to the key. And I learned from many customers that when they chose the wrong key, the performance was not as expected. And when they rethought and redesigned the keys the way this practice says, they got the few milliseconds of latency they expected.
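A small, hypothetical sketch of that field promotion: the device ID leads the row key, so writes from different devices spread across tablets, while all readings for one device stay contiguous and cheap to range-scan.

```python
import datetime

def make_row_key(device_id: str, metric: str,
                 event_time: datetime.datetime) -> bytes:
    # Device first (spreads write load), then metric, then a sortable timestamp.
    return f"{device_id}#{metric}#{event_time:%Y%m%d%H%M%S}".encode()

# Example:
# make_row_key("device42", "memory", datetime.datetime(2019, 11, 21, 10, 30))
#   == b"device42#memory#20191121103000"
```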
And this just summarizes what we said so far. But what we haven't spoken
about is some best practices around the size of a single cell
and the size of a single row. So we recommend a
cell to be not more than 10 megabytes
and the row not to be more than 100 megabytes. So if you have a
very large row, you will start to see some warnings
in our monitoring system, as we will show you later. And these are the common
API operations in Bigtable. And the most used are Put and Get
to write and read a single key. Also very popular
is the range scan. So if you would like to create a time-based monitoring system, or anything else that is time-series oriented, we can use a range scan to read more than one key in the same API call in a very performant manner.
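Here is a minimal sketch of those three operations with the google-cloud-bigtable Python client; the instance, table, column family, and row keys are hypothetical, reusing the device-style keys sketched earlier.

```python
import datetime
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("demo-instance").table("metrics")

# Put: write one cell for a single row key.
row = table.direct_row(b"device42#memory#20191121103000")
row.set_cell("device_stats", b"used_mb", b"512",
             timestamp=datetime.datetime.utcnow())
row.commit()

# Get: read a single row back by key.
single_row = table.read_row(b"device42#memory#20191121103000")

# Range scan: all memory readings for device42 on one day, in one call.
row_set = RowSet()
row_set.add_row_range_from_keys(start_key=b"device42#memory#20191121",
                                end_key=b"device42#memory#20191122")
for partial_row in table.read_rows(row_set=row_set):
    print(partial_row.row_key)
```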
And this is about writes. So reads are very
fast in Bigtable. But writes are even faster. And this is because every one
of the mutations, the updates to the rows in Bigtable, is written first into memory. And only then is it flushed to disk in the form of what we call an SSTable. So the SSTable is the
most optimized way to keep the keys ordered and
the values in the same place. And we also have a commit log. So every row we
write in Bigtable is being written
first to a commit log to assure us that even
if the node crashes, we still can recover
from that transaction. And finally, just before
the demo-- bear with us because the demo
will be interesting-- this is about monitoring. So monitoring in Bigtable can
be done by using the console. And in the console, we have
a very useful graph for CPU, for storage, for throughput. We can also use Stackdriver. And Stackdriver is
the uber monitoring, tracing, and auditing suite
we use for all GCP products. And one of our customers--
you may know them, Spotify-- created an open-source project
for auto-scaling-- auto-scaling Bigtable programmatically based
on metrics from Stackdriver. So whenever they can
detect a node that is too large as far
as storage or CPU, they will auto-scale the
cluster in an automatic manner. So this is the
project by Spotify. And we also have some client-side monitoring by using OpenCensus,
for example. And finally, we have
a Key Visualizer. And this is a very interesting
piece of engineering. This is a heat map that will
show us in a visual manner a lot of dimensions of
performance in the Bigtable cluster. So horizontally, we
can see the timeline. So we can see
summaries over time of CPU, of reads and writes,
and rows distribution. And vertically, we can see the
entire key schema by prefix. So if the tool can
make sense of the key, you can see it as the
hierarchy of prefixes in the vertical axis. And a heat map is a heat map. So all the cold values are dark. And all the hot values
will be very bright. These are some of
the common patterns we can see in Key Visualizer. For example, you can
see a periodical usage. So the entire key schema
is affected at once by something that is happening-- normally a batch,
MapReduce, or Apache Beam. And the diagonal pattern you
see here on the bottom right is also interesting. It's indicative of a
sequential scan or write-- usually, again, by
MapReduce or one of the processing frameworks. So we can move to the demo. But first, I have to tell
you something about the demo. In the demo scenario,
we are going to walk you through how
to monitor, troubleshoot, or pinpoint performance
issues in the cluster. And we are going
to also demonstrate how to scale the cluster and the
effect of scaling the cluster. So we have an event table. And the event table has
time-based trading events from a trading platform. It has a four-kilobyte average row size. And we begin with
a six-node cluster. In this scenario,
the trading company decided to start with
a historical backfill. So they would fill the cluster
with some historical data before it becomes production. And when it becomes
production-ready, they will add replication for
high availability and load balancing. They will add
application profiles. They will have an
Angular real-time UI. And they will have streaming. But in our scenario,
we are focused at the historical backfill. And we have some readers in
the system, data scientists and data engineers,
that are starting to train on the data to
do some prediction and time-based analysis. So they're complaining about
slowdowns and timeouts. And we wanted to
know what happened. So we took a recording
of the system after we asked the readers to stop reading, so we can see what is
happening in the cluster. So without further ado,
let's look at the recording. What we see is that the
cluster is starting to fill up with the historical data. And we can see, we have six
nodes and relatively high CPU. So we go straight to
the monitoring pane to see what is happening. We can see, we have very,
very high peaks of CPU above the recommended threshold. And we can see that these spikes
are coming from writes, not from reads. So we know something
is related to writes. And we can also see
this high throughput that is in correlation
to the peaks we saw. And now we will use Key Visualizer to understand more. So usually, you will start
with the monitoring console. And when we would like
to have more insight-- like what is happening
in the key space, whether we have warning
about large rows, what is hot and what is not, we go to Key Visualizer. So we can go to Key Visualizer now. We can open Key Visualizer
directly from Stackdriver from the Resources menu. And we have both daily scan
or hourly scan of the system. We go to the last
hourly scan. What we see here is the ops metric. It's the aggregation of reads and writes per key. And we see a periodical pattern that is coming strictly from the writes. And we can also see very high
latency of a few seconds, which is not expected. So this is what we
will try to tackle-- to lower this latency. That can explain the
peaks we saw in the CPU. And we can also look at
the time-based summary on the horizontal axis and at the key space on the vertical axis. We have other metrics showing
the distribution of rows amongst the buckets of the keys. So Key Visualizer will try
to divide in an even manner the rows of the table between
what they call key buckets. And these are mainly
for visualization. In this example, we don't
have an even distribution yet because this is a
relatively new cluster. So we don't have the
entire key space yet. So this is rather normal. We happen to know that we
have DataFlow, our managed version of Apache
Beam, ingesting the historical backfill. So this is most
certainly the problem because we saw this
is coming from the writes. And we also know that we have
an Airflow, or most precisely, our managed version
of Airflow, Composer that is orchestrating
the ingestion. So let's go to composer
or to the AirFlow UI. And we can pick the
last ingestion task from the Airflow UI to see what
is the name of the Dataflow job. And the purpose is
to take the name and to go to the Dataflow UI
to correlate what we see there with what we have seen so far. So this is the name
of the Dataflow job. This is the Dataflow UI. We go to the Dataflow job. And we see a throughput
that is matching what we saw in the UI of the Bigtable. And if we go to the last
transformation stage, we can see also the
number of rows we write and the storage we save in every
one of these ingestion jobs. So this is pretty heavy. And it can explain why
the clients are imposing too much stress on the server. So our conclusion at
the end of all of that is that we need to
scale the cluster. We need to scale Bigtable,
to be able to absorb all this pressure from the
clients coming from Dataflow. And this is what we have done. We scaled the cluster
from six to 12. And after about 20 minutes, we
started to see an improvement. And we started to see that
the CPU went lower and the throughput went higher. And the readers started to work with the system without any complaints. And this was done very easily
because scaling in Bigtable can be done programmatically,
can be done by the UI, or by the command line.
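For the programmatic path, a hedged sketch with the Python admin client (the instance and cluster IDs are hypothetical):

```python
# Scale a Bigtable cluster from 6 to 12 nodes programmatically.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
cluster = client.instance("demo-instance").cluster("demo-cluster")

cluster.reload()           # fetch the current cluster state
cluster.serve_nodes = 12   # the new node count
cluster.update()           # apply the change; the resize happens online
```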
So let's look at the recording of the system after we fixed it. And we can see in the graphs
that the CPU went down. And in correlation to the
CPU being back to normal below the recommended threshold,
we go to the throughput graph. And we see throughput went high. So we probably can have
more frequent writes now and more workloads
in the cluster. And if we look in the
perspective of four days, this will become
even more apparent, how the scale made
such an effect. And we have now room to
breathe in the cluster. And what is happening
in Key Visualizer now? So again, we go to
the last hourly scan. We don't expect this
periodical pattern to go away because we
are still ingesting. But we do expect
latency to go down. So let's look at the latency. So the max latency should
be around a few milliseconds. And it is around a few milliseconds, so we are not joking. And this is what we
expect of the system. And this was done with only a
few minutes of troubleshooting and a few clicks. And this concludes the demo. And we have a few
more things to say about integration in
Bigtable and migration paths. So Bigtable inspired all
the systems you see here-- and most noticeably,
HBase, which is one of the major parts
of the Hadoop ecosystem. And also Cassandra,
which was open-sourced by Facebook in 2010. And Cassandra was inspired
by more than one system, but mainly by Bigtable. So you must be
wondering, why would we want to move from a system like Cassandra, for example, to Bigtable. So the main reason has to do with the operational burden, because I, myself, managed Cassandra. And it has a lot to do with tuning consistency levels, tuning [INAUDIBLE] tables, tuning Bloom filters. So we don't have all
this burden in Bigtable. And we have a very
low-touch replication. And we don't have to deal with a lot of node discovery and network topology. And you can scale up
and down very easily and save cost, according to
your throughput and performance. And finally, we have interesting
platform integration in GCP, like Adam showed you. For example, in most
of the recommendation and personalization products,
we will use a combination of BigQuery and Bigtable. And all this integration
makes this even more powerful. So I think we
concluded before time. Yeah, so you can go home. [APPLAUSE] [LAUGHTER] SHARON DASHET: Thank you. ADAM LEVIN: Thank you.