[MUSIC PLAYING] GEIR ENGDAHL: My
name is Geir Engdahl. I am CTO and
co-founder of Cognite. With me I have Carter Page. And we're going to
talk about how Cognite is using Google Cloud
technologies to enable machine learning on industrial data. And in particular, we'll
talk about time series data and Bigtable as two of the
key technologies and problems that we are solving. So a little bit
about Cognite. We're a young company,
less than two years old, just crossed 100 employees. We're working with
asset-intensive industries. So that basically means
large industrial companies that have big machinery
that costs lots of money-- a lot in the oil and gas
vertical, also in shipping. And our mission is to
liberate industrial data from silos and piece
that data together to form a model of
industrial reality so that humans and machines
can make better decisions and take better actions. So it's a model that's
real time and historic to have both the present
state of what's going on. And it has all the
previous data too, which is important if you want
to try to predict the future. So what exactly is an
industrial reality model? Well, I'll try to
show a little bit. It depends on how you view it. So there are many angles to
kind of look at this model. This is a typical operator view. So if you're in the control
room in an industrial plant, this is very close to what
you would typically see. This is data now streaming
in live from the North Sea. It has about 1 to
2 seconds delay. It is data that is concerning
a single tank outside on an oil platform-- or inside actually. The tank is called 20VA0002. And the typical
oil platform will contain anywhere from 10,000 to
100,000 sensors, time series, like this. So here you have a handful. But it's just a tiny
piece of a huge machinery. So this is the kind of real
time what's happening right now. You also want to
see, as an analyst, what has happened in the past. Each one of these squiggly lines
represents about 1 gigabyte of data. This is one year of data. I really like this chart. It's my new hobby
to play with it. It's kind of like Google Earth. I can zoom in and view the
data at any resolution. And you know, given that this
is about 10 gigabytes of data, you probably noticed by now
that the Next Wi-Fi doesn't support downloading all
that data this fast. So you need a back end that
can quickly crunch the data and give you the data at the
resolution that you want. And you can go all
the way down here to the raw data
points, which will pop up when you zoom in enough. And of course, you can view
this in different ways. So this is a view that
humans tend to like. It's a three-dimensional view. We imported the entire CAD
model of the oil platform. And we connected it
with all the other data. So for instance, if we want
to see the tank that we just viewed data from, which is
so aptly named 20VA0002, you can see exactly where that
is, and what it looks like, and what it's connected to. So you kind of browse the
data in the three dimensions. Just to give you a
little impression of what we mean by industrial
reality model, I want this model to be
up-to-date and contain the data of today and
what happened in the past. By the way, the charting
library that I just showed you, we couldn't find that. So we had to make it. And we open-sourced it. So if you're interested,
you can use that. It's not tightly coupled
to the Cognite back end. So if you have any
provider that can give you data at different resolutions,
you can use that. It uses React and D3. So the scale of data
to be ingested is huge. And it's growing very fast. If you look, Cognite handles
lots of different data types to build the model of
industrial reality. So that can be ERP data. It can be maintenance logs. It can be 3D models,
like you saw. But if you look at
the data by volume, 99.7% of the data that we
have is time series data. So that's really where
the huge data is. Even though the 3D
model was large-- it was about 10 gigabytes-- the time series
data dwarfs that. And it's exploding. And all of that time series
data need to go somewhere. It needs to be stored. It needs to be processed. It needs to be queryable. So how do we do that? What is under the hood? So when we started
out building Cognite, we started with
a few principles. And one of them is impact. And it kind of seems
strange to say this. But you have to kind of write
it down to for it to matter. It's easy as a
technologist to build technology for technology's
sake because its cool. And I've been guilty
of that in the past. We've been lucky to
have large customers, demanding customers,
very early on to guide us in finding out what the real use
cases are to create real value. And then there's speed. So you want to show
something as fast as possible so you can iterate,
you can get feedback. And those two put together,
there's a consequence to that. And that is that you want to
use managed services wherever you can, especially for
anything that is stateful. Because handling stateful
storage service that has to scale up
and down, that has to have backups,
have redundancy, have all the logs for who
accesses the data, et cetera-- all of that stuff is just
painful to implement. And it's going to slow you down. And it's not something that-- yeah, it's being done. So we recognized very early
that we needed a time series database. And our hypothesis was
that we could get this as a managed service
too, that there would be something out of the
box, an API supporting this. And our requirements were that
it would be robust and durable. So it means that
we don't drop data. No data points
should be dropped. It would have to support a huge
volume of reads and writes-- writes in particular. You always get new data in. Low latency, so we
can show the real time version of what's going on-- you want to see data
at any time scale. So you want the zoomed out
view and the zoomed in view that I showed you. You want to be able to
efficiently backfill. So if you're onboarding
a new customer, and that customer has a million
data points per second being generated, and you
can handle 2 million, then backfilling is going
to take a long time. If they have a year of
data for you to backfill, it's going to take
another year before you're done with that, because you're
going to spend 1 million data points per second of your capacity on the new data
and 1 million per second on the old.
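Just to make that capacity math concrete, here's a back-of-the-envelope sketch in Python, using the illustrative numbers from the example (they're not real customer figures):

    live_rate = 1_000_000                    # new data points generated per second
    write_capacity = 2_000_000               # data points per second the backend can ingest
    history = live_rate * 365 * 24 * 3600    # one year of historical points to backfill

    spare = write_capacity - live_rate       # capacity left after keeping up with live data
    backfill_days = history / spare / (24 * 3600)
    print(backfill_days)                     # ~365 days, i.e. another year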
And you want to be able to efficiently map over data in order-- sequential reads for
training models, for instance. So we experimented with
OpenTSDB at the beginning. It's a great piece of software. The cool thing is you
can use Bigtable, which is a managed storage back end. So you can use OpenTSDB with
Bigtable as the back end. Which is very nice, but
it had a few shortcomings. For instance, it's not durable. That means if you send it a data
point, it will acknowledge that it
got the data point before it's written, which means
you can potentially lose data if you're scaling
it up and down. And it essentially
uses the front-fill path for batch backfills, which made
backfilling very inefficient. And there were a few
other things as well. So we chose to build
our own time series logic on top of Bigtable. So Bigtable is a fully managed
service, which, as you know, we really wanted. It supports a huge number of
reads and writes per node. It's been tried and tested
on very large user-facing distributed systems at Google. And it has this property which
most time series databases don't-- most key value
stores don't have-- which is that you can
scan forward efficiently. The keys are stored in order. A lot of key value
stores will hash the keys so that you get the
load distributed evenly. But Bigtable doesn't do that. So that means you don't
have to jump around when you're reading sequential data.
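As a small illustration of what ordered keys buy you, here is a sketch of a forward scan with the Cloud Bigtable Python client; the project, instance, table, and key layout are made-up examples, not Cognite's actual schema:

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    table = client.instance("my-instance").table("timeseries")

    # Keys are stored in order, so one sensor's data for one day is a single
    # contiguous range -- a sequential scan, with no jumping around.
    rows = table.read_rows(start_key=b"sensor42#2018-07-25",
                           end_key=b"sensor42#2018-07-26")
    for row in rows:
        print(row.row_key)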
It also means that-- the flip side of that-- is that
you can run into situations where you get hotspotting. So you need to write
your code around that. But for us, it's a price that-- it's been worth it for us. So I'll hand it over
to Carter Page here, who is senior engineering
manager for Cloud Bigtable. CARTER PAGE: Thanks, Geir. So I'll talk a
little bit about-- [APPLAUSE] I'll talk a little bit
about Cloud Bigtable-- how it's a good fit
for IoT and why we see more customers coming to it. I do want to say I'm
particularly excited to be presenting with Cognite. I think the stuff that
they're doing is really neat. I think his point about
doing impactful things, doing a comprehensive story
of IoT is very exciting. The idea of not just
connecting those devices and getting the data, but
once you've got literally tens of thousands devices-- way more than a single human
could actually monitor-- thinking about how do you
extract data, react to that, and manage very high
risk asset situations. And he's going to get into some
really cool stuff after this. But let me talk a little
bit about Cloud Bigtable and how that is a good fit
for these types of use cases. A quick show of
hands, just to get a sense of the audience, who
is familiar with distributed databases like Cassandra,
HBase, things like that? OK. All right, so this
is not going to be rocket science to most people. The main thing, particularly
for large IoT use cases, where people are looking at collecting
massive amounts of metrics, is being able to handle this
really large scale traffic. So a couple of years
ago, for example, we did a load test
with a company where we basically simulated
the entire US trade markets all being processed
together, which we processed 25 billion
records in about an hour. And that's possible just due
to the scale of Cloud Bigtable and how it works. We were peaking at about
34 gigabytes per second and about 34 million operations
per second on Bigtable on a single instance. And the reason this works is
because Bigtable was built for very high scalability. And you essentially get linear
characteristics way out on the curve. So Bigtable was initially
designed by Google as a backing store
for our crawler. And so it was used to store
a copy of the World Wide Web. It's expanded internally. It's being used for a lot
of other products as well. And so we have put about
14 years of engineering into finding new upper
limits and breaking those. So every distributed
system, eventually you have this straight
line that goes out, and eventually it flattens out. Everything eventually
hits a bottleneck. Or you might hit
something where it just-- if you've got an HBase
cluster with 1,000 machines, you're going to hit
probabilistic machine failures, and there's an overhead for
your operations and things like that. So we will eventually flatten
out, but pretty far out, and it would take a lot of work
for you to get up to the scale where you would notice. The reason this is important--
the linear scaling, from a business perspective--
is this gives you predictable cost of revenue. So when you're thinking
about building a system and you're like, I've
got a terabyte of data, and now this has to go to
10 terabytes, a petabyte, 10 petabytes, usually if
you're building this on top of your own home-managed
Cassandra system, you are going to have to rethink
each time you hit one of these new tiers-- all right, now, how am I
going to deal with this? I've got a lot more machines. I'm going to need a
new on-call rotation. I'm going to need new strategies
to be able to deal with this. Here it's just a matter
of cranking up your nodes. And the number of
nodes you need is proportional to the
throughput that you need. So give a quick overview for
how Cloud Bigtable works, you have clients that basically
talk to a single front endpoint that load bounces to the nodes. So you don't need to think
about address mapping or talking to individual nodes themselves. I'll take that layer away and
talk a little bit about what's going on underneath the covers. So the data itself is being
stored durably in an underlying file system called Colossus. And the Bigtable servers
themselves are actually not storing any data. They are taking
the responsibility for serving the data. And every row is assigned
to only one node. So your entire
keyspace is basically balanced across these
different nodes. And this allows
a single entity-- by being responsible
for an individual row-- allows for atomicity of
operations on it and allows for read your own writes. The advantage here of having
the serving layer and the storage dissociated is that it allows
us to do some clever things in terms of being able
to rebalance workloads very aggressively. So you may have a
customer that has changed their underlying workload,
which then impacts how you are using Bigtable. Or you may have
diurnal patterns. You may have
different things that are coming on and going off
during the middle of the day, which changes which tablet
servers or nodes are getting more or less activity. And what we'll do is we'll
actually identify these changes in patterns. And we will just reassign areas
of data to different nodes. And so this allows
a couple of things-- one is it allows you to
not have to worry too much about per node hotspotting. You can hotspot individual
rows, which is a problem. But in terms of getting
unlucky and having one server that's
hotter than the rest, we'll balance that out. It also means
higher utilization. And by having higher
utilization in these nodes, by keeping things
balanced, you don't have to provision
for the hottest node. We're trying to keep all the
nodes fairly well balanced. And that actually
means cheaper service relative to running this on
an HBase cluster or Cassandra cluster. In addition to rebalancing, you
can resize up and down fairly trivially. We have some customers who might
have ingestion workloads which only need a few nodes. And they might run a batch
at the end of the month or the end of the week. And they might want to scale
that up to say 300 nodes. And you can do that
fairly instantaneously. If you've got a
really large data set, it may take 10 to 20
minutes for the data to rebalance over
the nodes just added. But it could be a good way to
make those batch jobs you run once a week run really fast. When you're done, you
scale it back down again. The basic data
model, it's key value but has more dimensions to it. So you have a single index,
which is your row key. And then your data
is stored in columns. And the columns are a
tuple of basically a column qualifier and a column family. The column family is
defined in your schema. And the column qualifier is
defined at insertion time. The table is sparse. So any column family,
column qualifier tuples that you don't fill
in for a given row don't cost against
your actual space. The database is also
three-dimensional. So under each of those
cells is an arbitrary number of versions. So you can keep them there
pretty much indefinitely, as long as your row doesn't
get beyond a couple of hundred megs. Or you can set up
garbage collection, say, where you wipe out any
data that's over a week old, or just keep the last
five versions or something like that.
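Here's a minimal sketch of that data model with the Cloud Bigtable Python client, just to make the family/qualifier/version ideas concrete; the project, instance, table, and family names are hypothetical:

    import datetime
    from google.cloud import bigtable
    from google.cloud.bigtable import column_family

    client = bigtable.Client(project="my-project", admin=True)
    table = client.instance("my-instance").table("metrics")

    # One column family whose garbage-collection rule keeps only the last five versions.
    table.create(column_families={"data": column_family.MaxVersionsGCRule(5)})

    # Column qualifiers are chosen at insertion time; unused ones cost nothing.
    row = table.direct_row(b"sensor42#20180725")
    row.set_cell("data", b"values", b"packed-datapoints",
                 timestamp=datetime.datetime.utcnow())
    row.commit()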
Wednesday, yesterday, we announced that we have replication now. So between two
Bigtables in a region, we will replicate the
data between them. This has a few advantages. The first is it expands
your failure domain. So you're no longer locked
into the failure domain of a single zone. You've got two zones
with the failure domains. So that gives you
higher availability. And another advantage that
people use, particularly because the replication
is asynchronous, is workload isolation. So some customers may have their
critical low latency serving loads on one cluster and then
maybe run batch workloads on another. And by doing it
on the other one, they're not interfering
with each other essentially. The effective result of this,
on the high-availability side, is we get an extra
9 onto our SLA. We have a high availability
application policy. I won't totally get
into those right now. But you can go read up on these. You have application
policies that can define how you want
your traffic to be routed. And if you use the
high availability one, you'll get automatic
zero-touch failover. So if there's any problems
in the middle of the night, you don't have to get paged. It'll just failover for you. Being a large data
tool and a database that's designed for terabytes-- petabytes, actually-- we
need really powerful tools to be able to make the most
of your database. And so we've got deep
integrations with BigQuery, with Dataproc, with Dataflow. With BigQuery, the queries
are not as fast as if you're going against native
BigQuery storage, because BigQuery is an offline store, while Bigtable makes certain
optimizations to be online, to be able to
get single-digit-millisecond latency, which makes
the BigQuery queries run a little slower. But it's nice because you can
do ad hoc queries on your data without having to
write a MapReduce job. If you do want to
write a MapReduce job, you can use Cloud
Dataproc, or you can use our internal
replacement for MapReduce, which is housed inside of Dataflow. And then also this
week, we're announcing that we have a deep first order
integration with TensorFlow, which just got put on to GitHub. And so people can start
playing with that. So I'll hand it
back to Geir, here. GEIR ENGDAHL: Thank you. So with that background, I'll
go into a little bit of detail on how Cognite is using Cloud
Bigtable to store its data. And then we'll move on
into machine learning. And eventually you'll see what
the windmill can do. So Carter talked about the
data model of Bigtable. This is our basic data schema. So the first thing
that is very important is how you choose your row keys,
because that's the only thing that you can look up fast. So our row keys consist of a
sensor ID plus a timestamp, or a time bucket. That means that you
can look up, very fast, the values for a particular
sensor at that particular point in time. And you can also then scan
through all the values of the sensor in order. Which is nice if you
want to train a model. And that's a very
inexpensive operation. Inside of each row, we
store more than one value. It's not just one value per row. We stuff a lot of data points
inside each row, typically around 1,000 data
points per row. So you have the timestamps,
which are unique timestamps. And then you have values,
which are floating-point numbers. And of course it's binary stuff. It's just drawn out in
readable numbers here. But it's all binary
to save space.
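Here is a rough sketch of what such a row key and a packed row value could look like; the exact byte layout is my own illustration, not Cognite's actual encoding:

    import struct

    def row_key(sensor_id, timestamp_ms, bucket_ms=3_600_000):
        # Row key = sensor ID plus the time bucket the point falls into,
        # so one sensor's data sorts together and in time order.
        bucket = timestamp_ms - timestamp_ms % bucket_ms
        return b"%d#%d" % (sensor_id, bucket)

    def pack_points(points):
        # Pack on the order of 1,000 (timestamp, value) pairs into one binary cell.
        return b"".join(struct.pack(">qd", ts, value) for ts, value in points)

    key = row_key(42, 1_532_563_200_000)
    cell = pack_points([(1_532_563_200_000, 20.5), (1_532_563_201_000, 20.7)])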
And actually, Bigtable does compression for you too. So what you'll see
if you stop ingesting new data into your
Bigtable instance, you'll see that the total
size of it goes down. Which can be scary at first if
you don't know what's going on. Where's my data going? But it's actually a good thing. Here's how we architected
our data ingestion pipeline. So every step along this
path is auto scaling. Carter talked about how easy
it is to scale Bigtable. And we have a service
which looks at the load and then scales it up and down. It's pretty simple logic.
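That scaling logic could look something like the following sketch with the Bigtable admin client; the thresholds, names, and load metric are assumptions for illustration, not Cognite's actual service:

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project", admin=True)
    cluster = client.instance("my-instance").cluster("my-cluster")

    def autoscale(cpu_load):
        # Poll a load metric elsewhere (e.g. Stackdriver) and resize the cluster.
        cluster.reload()
        if cpu_load > 0.8:
            cluster.serve_nodes += 2
        elif cpu_load < 0.3 and cluster.serve_nodes > 3:
            cluster.serve_nodes -= 1
        cluster.update()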
So it starts with Cloud load balancing. And then an API node, which
is a Kubernetes service, which handles authentication
and authorization. Then it will put the data
point onto a Pub/Sub queue. And then it will
say to the client, we got your data-- we're
not going to lose it. So Pub/Sub is another
component that we use a lot. It's a very nice
component in the way that it scales to
whatever you ask for. If you look at the
documentation for Pub/Sub, it will say like the limit
on the number of publish operations: unlimited, and
subscriber operations: unlimited. So that's kind of
a bold statement. We haven't run
into the limit there. Once it's on the queue, it
gets picked up by a subscriber to that queue, which
will package the data and write it to Bigtable. So that's where our time
series writing logic lives.
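A stripped-down sketch of the publish and subscribe sides of that path, using the Cloud Pub/Sub client library; the topic, subscription, and project names are placeholders:

    from google.cloud import pubsub_v1

    # API node side: acknowledge to the client only once Pub/Sub has the message.
    publisher = pubsub_v1.PublisherClient()
    topic = publisher.topic_path("my-project", "datapoints")
    publisher.publish(topic, data=b"packed-datapoints", sensor_id="42").result()

    # Writer side: pull messages off the queue and write them to Bigtable.
    subscriber = pubsub_v1.SubscriberClient()
    subscription = subscriber.subscription_path("my-project", "datapoints-writer")

    def callback(message):
        # ... package the data and write it to Bigtable here ...
        message.ack()

    subscriber.subscribe(subscription, callback=callback)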
And once it's been written, there is a new job put on the queue to compute aggregates. So those are
roll-ups that we use to be able to answer queries
about any arbitrary time scale efficiently. So that's what we need in
order to do the dynamic zooming that you saw at the beginning.
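As a simplified sketch of what those roll-ups might compute (the real aggregate set and bucket granularities are internal details I'm guessing at):

    from collections import defaultdict

    def rollup(points, bucket_ms=60_000):
        # Aggregate raw (timestamp_ms, value) points into per-bucket summaries,
        # so a zoomed-out chart reads a few buckets instead of millions of raw points.
        buckets = defaultdict(lambda: {"min": float("inf"), "max": float("-inf"),
                                       "sum": 0.0, "count": 0})
        for ts, value in points:
            b = buckets[ts - ts % bucket_ms]
            b["min"] = min(b["min"], value)
            b["max"] = max(b["max"], value)
            b["sum"] += value
            b["count"] += 1
        return buckets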
So your KPIs are throughput and latency for this kind of pipeline. It's typically
what you'd look at. So the data that
comes in is queryable after 200 milliseconds,
in the 99th percentile, and we regularly handle 1
million data points per second. And I'm pretty sure it could
do much more than that too. And querying is much simpler-- this is a
synchronous operation, so it goes to the API node
and then straight to Bigtable. One of the
optimizations that you want to do if you want
to transfer a lot of data here is that typically
API developers like to have JSON data. And for most applications like
a dashboard, that makes sense. If you're writing a Spark
connector to your API, and you want to run
machine learning on that and transfer lots of data,
then the JSON serialization-- the serialization
becomes an issue. And so you want to use
something like Protocol Buffer or another binary
protocol for that. And it's not really
the size that matters. Because if you gzip the JSON,
it's very small anyway because it's very repetitive. But it's the memory overhead
of doing that serialization. So that's a nice optimization.
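A toy comparison of the two approaches; raw struct packing stands in for Protocol Buffers here, just to show where the JSON overhead comes from:

    import json
    import struct

    points = [(1_532_563_200_000 + i * 1000, 20.0 + i * 0.1) for i in range(100_000)]

    # JSON compresses well on the wire, but building and parsing a large object
    # tree for every request is where the memory and CPU overhead comes from.
    as_json = json.dumps([{"timestamp": ts, "value": v} for ts, v in points])

    # A flat binary encoding sidesteps most of that serialization cost.
    as_binary = b"".join(struct.pack(">qd", ts, v) for ts, v in points)

    print(len(as_json), len(as_binary))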
I want to talk a little bit about cleaning of industrial data
because it's something that's often overlooked. If you want to make
a useful application in the industrial
IoT space, it's not enough to have
time series and AI. A lot of people are
running around saying time series plus AI is
going to solve everything. A very simple question is if you
have 100,000 time series coming from an oil rig, and you want
to make your predictive model for this one tank that
we saw, which time series will you pick? And how will you pick those? Are you going to manually
go over all the diagrams? It's going to take
you a lot of time. So typically, we see these
data science projects, and they are really about
finding the right data. So that's what 80% of
the time is spent on. And then at the
end of the project, you have a wrapup where
you try to model something. If you want to truly
understand what's going on in the
industrial world, you need to be able to get
data from a lot of sources-- data like the metadata of the
time series, the equipment information-- who made
it, when was it replaced, failure logs from previously,
work orders, 3D models, and the process
information-- how things are physically connected
and logically connected. Which component is
upstream of this? And it's not enough to have
all this data in one place. It needs to be connected. The hard part is connecting it. And the glue that
holds this together is the object in
the physical world. And the unfortunate thing is
that the same physical object has a different name depending
on what system you ask. So we spent a lot of time
on this contextualization, figuring out how we'd
map the IDs from one system to a unique
ID for each asset, for each physical object. So if you look at the
cleaning pipeline there, there's this thing called
an asset matcher, which we spent a lot of
time developing and which will
assist experts and do automatic mapping, in many
cases, of IDs from one system to another to be able to make
this connected, contextualized model. So you're probably wondering
now what this windmill does, and why it's here. So I'm just going to
say a little bit more, and then I'll get to it. Predictive maintenance-- you've
probably all heard about this. There is a great business case
for predictive maintenance. We have seen cases
where a single equipment failure on a piece of subsea
equipment costs $100 million to fix. So obviously you
want to prevent that. But this is also why
it is so hard to do. Because the failures
are fairly rare. There is not a
lot of labeled data. Imagine what it would
cost to get enough labeled data to validate your
model, let alone train it. So you're typically stuck with
these unsupervised approaches. And for anomaly
detection, what we've seen is two classes of approaches
for how to do this. One is forecasting based. So it means that you will
take a set of sensors. You will hide one of the values. And you'll try to predict
it using the others. And if your prediction is far
from what is the actual value, then you flag that
as an anomaly. Or the other approach is
you take your sensor data, your set of sensors. You plot them in an
n-dimensional space, and you see what points
are close to each other. They form clusters, and those
clusters typically represent different operational states. So you'll have a cluster
around the running state. You'll have a cluster
around the idle state. You'll have a cluster
around powering up, powering down, maintenance. And if you have a new
point, and it doesn't fit-- it's far from any of
these cluster centers-- then that is an anomaly.
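A minimal sketch of that clustering approach with scikit-learn, assuming a matrix of healthy sensor readings; this is not the model used in the demo:

    import numpy as np
    from sklearn.cluster import KMeans

    normal = np.random.rand(5000, 8)        # stand-in for readings from 8 sensors

    # Clusters roughly correspond to operational states (running, idle, ...).
    kmeans = KMeans(n_clusters=4).fit(normal)

    # Use the distance to the nearest cluster center as an anomaly score.
    threshold = np.percentile(kmeans.transform(normal).min(axis=1), 99)

    def is_anomaly(reading):
        return kmeans.transform(reading.reshape(1, -1)).min() > threshold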
Let's look at this live. So this demo is as live as it gets. It has a lot of moving
parts literally. Everything that you
see here is live. There is no pre-trained model. There is no pre-generated data. The data is going to be
created right here, right now. And we'll train the model,
and we'll see if it works. So are you excited? AUDIENCE: Yeah. GEIR ENGDAHL: OK. [APPLAUSE] Me too. So let's see if we get data
from this wind turbine now. And it looks like we do. So this is a different
view of the wind turbine. It's a 3D view into
what's going on there. It has the sensor values. I can turn this knob. I can increase the speed. You'll see it will start
to produce more energy. It's producing a lot of energy
for a wind turbine this small. So let's go into Jupyter. I'm sure those of you
who are working with data are kind of familiar with this. So we're going to interact with
our Cognite API via our Python SDK, which is also open source. So first we're going
to just log in here. We're going to select
the right data. And then we're going to plot it. This live plotter will show
the analyst's view of this. And you see if I
adjust the speeds, you see it will go down. If I take it up, it's
going to go up a bit. And this, what you
see on the screen, is going to be
our training data. So I'm going to give it
a little bit more time. So it's seeing a little bit of
normal operation of this wind turbine. We painfully brought this
wind turbine for you here. It's 3D printed. It looks very homemade. We got it through
airport security. I was taken aside by
security here at Google Next. They were wondering
what this base is. Because this wind turbine
thing isn't mounted on it. It has wires coming
out of it, and it has this red scary light on it. So I've done a lot of
explaining to bring this here. Now I need to stop this
plotting to move on. I will create an anomaly
detector for this, and I will select a
time range for it. Now this is pushing the training
operation up into the Cloud. So it doesn't actually
happen on my computer. The SDK will just do an
API call to train a model. And then we get the job
ID that we can query for the status of the job. It is not a lot
of data right now, so it took very
little time to train. And now we can create
another plotter. And it's going to plot again
live data from this wind turbine. But this time, it's
also going to plot the output of the predictor,
the anomaly detector. So how can we introduce
an anomaly here? Well, I'm going to
use brute force. I'm going to hold it back here. It takes a little bit
of time for the data to appear in this Python thing. And now you'll see it's
detected an anomaly. You see the red
background there. [APPLAUSE] And it should go back to normal
again once I've let it go. So there you have live
anomaly detection. Now it's back to normal. Operators can use
this to monitor-- if you want to monitor
100,000 time series, you can't do that manually. You can't put it
up on the screen and have people watch it. Well now, operators can be
alerted to strange conditions, and they can look
into what's going on, and hopefully prevent
the next $100 million failure before it happens. So another very useful
thing that we do-- once you've detected
this anomaly, your first question is
going to be something like has this happened before? Or has something
similar happened? So we implemented something
called similarity search in the API, which is not
really machine learning, but it is a very useful thing. And it is very computationally
expensive to do. So you can take a time
period of a set of sensors and look for that pattern in
a different set of sensors, or in the same, find
similar portions. So this talk also has
"AutoML" in the title. So AutoML-- on oil rigs
you have hydrocarbons. And if there is a leak
and you have a spark, then it's potentially
very dangerous. So they're always looking to
see if there are faulty wires. Faulty wires can
be very dangerous. So typically what
they do is they have these regular inspections
every six months or every year, where they go over everything. But these damaged wires,
they don't appear over time. They're typically the result of
someone stepping on something, or something mechanical happened
at a particular point in time. So what we have
tried to do is build a model which can
detect faulty wires so that you can wear a
camera on the helmet or use CCTV or other
ways of gathering images and in the background always
look for this kind of failure. I'm not trying to replace
those inspections. We are trying to augment
them and make it even better. So we trained the
model, using TensorFlow, to detect faulty wires and
also tell us where in the image the failure is. This is a project that
we spent three months on. And we were able to get
to a particular accuracy. That was about 95.5% on
precision and recall. And then in June, we
got access to AutoML. So we figured we would
upload the data set there and see how well that would do. So I'll show what
that looks like. So training a model
with AutoML-- if you can upload a bunch of
images onto a web page and edit a CSV
file, then you can do it.
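Roughly, that CSV just maps training images in Cloud Storage to labels, something like the sketch below; the bucket, file names, and labels are made up:

    import csv

    rows = [
        ("gs://my-bucket/wires/img_0001.jpg", "damaged"),
        ("gs://my-bucket/wires/img_0002.jpg", "ok"),
    ]
    with open("automl_training.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)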
So we did that. And in 5 minutes, I was able to get a model with 96.1% and 96.2% precision and recall. So that was half a
percentage point up. But then we didn't give up. We tweaked our model
a little bit more. So we were able to get
that one even higher. So we were able to get to
95% precision and 99% recall on our own model, which
is better on one metric but not the other. But the big
difference, I think, is that we spent three
months doing it, and it took five minutes
to train the AutoML model. So let's see what that can do. So on my way here, on
the first day at Moscone, I found this wire outside,
and it looked dangerous. So now I ran the AutoML on it. And it was not part
of the training set. And this is kind of
a new hobby for me, to go around and look
for faulty wires. But yeah, it really
caught that quite well. And on my way to the rehearsal,
which I did back in June, I also found this faulty
traffic light wire. This is the knob that you push
to make the pedestrian light go green. And it also catches
that really well. So I think it is
possible to do this. And I think there is a class
of problems that you can solve in the industry--
like rust and leaks-- and using camera
feeds as a sensor, that can be very powerful and
instrumental in getting people out of harm's way. So I'll hand it over to
Carter to wrap it up. [APPLAUSE] CARTER PAGE: Thanks, Geir. If we have any time-- yeah, we'll have a little
time for questions after this. I'm sure Geir will be
able to join us there. So quick review again. Cognite basically built
the system on top of GCP, focusing on the scalability
and the throughput of Cloud Bigtable. And focusing on the
capabilities of AutoML, he was able to take a
training exercise which took three months
with TensorFlow, was able to do it in
five minutes with AutoML. Which is really
exciting for companies that want to tap into
this type of technology without having to hire an
army of data scientists. Not everyone can hire
Geir in their company. Also, one of the
things-- actually, I don't know if it was clear
when things were switching around when he
did the windmill-- you saw some code. But he didn't actually program
the training model for that. He just highlighted the
part that he had done before and said this is bad. And so they were able to
put this out in the field. They don't need to
have programmers. They can have engineers who
know what looks bad on the graph highlight that, and then expand
that out to tens of thousands of metrics and be able
to quickly identify things that are going wrong. And that's really exciting to be
able to scale these things out to industry. We have replication now
in Cloud Bigtable, which is going to provide
higher availability and some great features
around workload isolation. And it provides a great basis
for large scale data analysis and machine learning. If you want to learn more about
the technology you saw up here, here is a handful of links. There is the Cloud
Bigtable main product page at cloud.google.com/bigtable. And there's a bunch of
material underneath that. And then we also have the
various machine learning pages up there. So you can learn about
machine learning in general under the products page. And then there is a
master page for AutoML. And then at the bottom there
is the TensorFlow integration with Bigtable that we
launched this week. And so that's all we have. [MUSIC PLAYING]