ALFRED FULLER: Hi. Yeah. They just told me
to get started. So I'm presenting-- well. I'm co-presenting NoSQL
versus SQL. My name is Alfred Fuller and my
co-presenter, Ken Ashcraft, he was just here. But I guess I'll get
started anyway. So I'll do the overview
at least. That's weird. Excellent. This is supposed to be a debate. But Ken Ashcraft's not here
right now, so I guess it's going to be pretty easy. And I'm always telling myself
to talk slower in these things, and I guess I'm going
to have a lot of extra time. So this is going
to be awesome. So the first thing I'm going to
do is I'm going to give you a quick overview of
data in the cloud. So why in the cloud? Well first and foremost, the
cloud is excellent for fault tolerance, because when you use
Google's cloud, you know, we man the pagers for you. We always have people looking
at these systems to try and keep them working without any
user-visible interruptions. And we automate fault recovery
as much as possible. So most of the time when
something happens, no one ever needs to know about it. It just automatically switches
over to another machine or something else, and it
just keeps working. And that means low
maintenance. And in addition to the fault
tolerance, low maintenance also means that we manage the
updates for you in the cloud. From bare metal to software
patches, you never have to worry about how a patch will
affect your systems, because you're using the same cloud
that we use at Google. And we vet these patches, we
make sure everything works, and then we push it. We go through that
whole process. And you don't have to
worry about that. So when you use the cloud, you
can focus on what you do best and not worry about the cloud. All of our cloud products also
have built-in durability in terms of replication. There's nothing to configure. There's nothing to
think about. It's built in from
the ground up. These systems are designed
to work this way. And they're also geographically
distributed. So nothing is sensitive to a
single power outage or a single geographical location. And finally, accessibility. The cloud is always on. It's always available, at
least when you have an internet connection. And when you don't, when
you develop against Google cloud products-- well, at least the Datastore
and Cloud SQL, which I'm talking about today-- we have local development
environments so that you can test against those environments,
you can build against those environments, and
then you can deploy to the cloud and to production
without worrying. So I work on App Engine. How many of you use
App Engine? Show of hands. Oh, that's a lot. But for the people who don't
use App Engine, App Engine lets you build apps on Google's
infrastructure. It's Platform as a Service, and
its goal is to make your app-- whether it's a web app or a
cloud-enabled Android app-- easy to build, easy to scale,
easy to maintain. So again, you can focus on what
makes your app great, what makes your app special. App Engine connects into some
storage APIs, primarily Cloud SQL, the Datastore, which again
is what I work on-- Cloud SQL is what Ken was
supposed to talk about-- and Cloud Storage, which has
its own talk on Friday. So if you're interested in Cloud
Storage, which is BLOB storage, I recommend
going to that talk. So the Datastore. Datastore is literally Google
storage infrastructure. So it's the same technology we
use for our own applications. So Gmail, Google Web Search,
you're using the same infrastructure pieces that we
use to keep those things up and running. And it's distilled into
well-documented APIs that are included in the App
Engine SDK. And it's built for scale, both
in terms of size and traffic. Right now we perform over 2
trillion operations per month in the Datastore alone. And it's a fully managed NoSQL
database solution. So you don't have to worry about
provisioning or scaling. It just kind of works. Cloud SQL, on the other hand,
it's fully managed, but it's pure MySQL. So it's kind of like that
computer you built at home to run your SQL instance, except
it's not in your basement. It's not on some island in some
VM somewhere with no one looking after it. It's in the cloud, it's fully
managed, and it's happy. KEN ASHCRAFT: Hey, I'm here. Hang on. Hang on. I'm here guys. Sorry. ALFRED FULLER: Oh, Ken. Great. KEN ASHCRAFT: Hey,
sorry I'm late. Sorry. We can get started now
that I'm here. My name is Ken Ashcraft and
I work on Cloud SQL. This is Alan Fuller and he
worked on the Datastore. ALFRED FULLER: Alfred. KEN ASHCRAFT: Oh, yeah. Anyway, so let me give you a
high-level overview of what running in Google Cloud is like
and App Engine and Cloud SQL and all of that. ALFRED FULLER: No, no. I just did that. You're a little bit late. KEN ASHCRAFT: Oh, Oh. Sorry about that. ALFRED FULLER: Yeah. KEN ASHCRAFT: You know,
it's been hard. I got distracted. I was talking with developers
outside. They just keep on mobbing me. Everybody's so excited about
using Cloud SQL. They're really loving
to have the expressiveness of a MySQL database. It's super easy to get started,
super easy to manage, and the best part is they don't
have to use any of that NoSQL silly Datastore stuff. They get to use a real
database, MySQL. ALFRED FULLER: Wow. Starting with the name
calling already. You know, I knew this was
supposed to be a debate. But I didn't think it'd get
so ugly so quickly. KEN ASHCRAFT: It's
not name calling. It's fact. Anything that you can do,
I can do it better. I can do anything
better than you. ALFRED FULLER: No you can't. KEN ASHCRAFT: Yes I can. ALFRED FULLER: No you can't. KEN ASHCRAFT: Yes I can. Let me show you. Let's talk about queries. Queries are important because
they're the way that you access your data. If you don't have a powerful
query language, you can't get to the data that you want, and
you can't get it quickly. Cloud SQL supports the
international standard for database manipulation,
structured query language, or SQL. It sounds like NoSQL databases
are proud not to support this standard. ALFRED FULLER: Actually, NoSQL
is kind of a misnomer for the Datastore, as we support an
ever-growing subset of a structured query language
similar to SQL. Specifically, we support a
wide range of filters. We just added support for OR
in both Java and Python. So you can combine these
filters with OR, including in nested sub-expressions. It supports arbitrary sorting. And we also added, recently,
projections, or index-only queries, where you retrieve only
a few properties from your entities, and it's much
faster and cheaper than retrieving the whole entity. And we actually go beyond SQL
in that we support repeated properties. So you can do set operators
like "contains all" or "contains any." And that's
incredibly useful when you're building tools like labels in
Gmail or tags for photos. And the best part about this is
this subset scales with the size of the result set, not the size of your data. So you never have to worry, as
your database grows, whether the performance of your queries
is going to degrade over time. KEN ASHCRAFT: I don't know. In Cloud SQL, we support all
those SQL queries that you just talked about
and then some. We can do things more powerful,
like aggregations. So let's say that you want to
compute the average age of people living in each city. In Cloud SQL, it's as
simple as this. All you have to do is select
the average age and group by city. ALFRED FULLER: Well, the
Datastore supports something like that too, except it has to
scale to enormous sizes, so we have a powerful framework
called MapReduce. And here's an example. You can see on the left
there's some data. I have each person, and it
has a city ID and an age. And we can use MapReduce
to compute the average age in each city. By simply mapping, we map this
to a key value pair. In this case, the key is
the city ID and the value is the age. Then we shuffle to
group by city ID. And then we reduce to calculate
the numbers that we need to compute the average,
namely the total number of people and the sum
of all the ages. In this case, the sample
set I chose is actually, apparently, quite young. But also, we can go beyond this
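The three MapReduce phases described here (map each person to a (city ID, age) pair, shuffle by city ID, reduce to a count and a sum) can be sketched in a single Python process. This is a minimal illustration under stated assumptions: the function names and sample ages are made up, and the real App Engine MapReduce library runs each phase in parallel across many shards rather than in one loop.

```python
from collections import defaultdict


def map_person(person):
    """Map: emit one (key, value) pair per person, keyed by city ID."""
    yield person["city_id"], person["age"]


def shuffle(pairs):
    """Shuffle: group every value that shares a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_city(city_id, ages):
    """Reduce: compute the count and the sum, then the average from them."""
    count, total = len(ages), sum(ages)
    return city_id, count, total, total / count


def average_age_mapreduce(people):
    pairs = (pair for person in people for pair in map_person(person))
    return sorted(reduce_city(city_id, ages) for city_id, ages in shuffle(pairs).items())


people = [  # made-up sample data, not the data from the slide
    {"city_id": 1, "age": 6},
    {"city_id": 1, "age": 10},
    {"city_id": 2, "age": 4},
    {"city_id": 3, "age": 9},
]
results = average_age_mapreduce(people)
print(results)  # [(1, 2, 16, 8.0), (2, 1, 4, 4.0), (3, 1, 9, 9.0)]
```

Reducing to a (count, sum) pair rather than straight to an average is the useful choice here: averages from different shards cannot simply be combined, but counts and sums can be added together.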
in that this required mapping over all your data,
and MapReduce is a very powerful framework to do that
because it computes this in parallel, as a basic scatter-gather
algorithm. But if you want to keep this
view that you've created of the result set up to date as
your entities change, you can do that using something called
a Materialized View. So what you do is you basically
track changes in your system as they happen. And you store them in
a separate entity. And then asynchronously, you fan
in those changes and apply them to your result set. And look. The results are now
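The materialized-view idea can be sketched by keeping per-city (count, sum) totals and folding in deltas recorded in a change log. This is a hypothetical in-memory model: the class and method names are made up, and in the Datastore the log entries and the totals would themselves be entities, with the fan-in step run asynchronously (for example from a background task) rather than called inline as it is here.

```python
from collections import defaultdict


class AverageAgeView:
    """Hypothetical view: per-city [count, age_sum] kept current from a change log."""

    def __init__(self):
        self.totals = defaultdict(lambda: [0, 0])  # city_id -> [count, age_sum]
        self.pending = []  # change log of (city_id, delta_count, delta_sum)

    def record_change(self, old, new):
        """Log how one person changed; old or new is None for a create or delete."""
        if old is not None:
            self.pending.append((old["city_id"], -1, -old["age"]))
        if new is not None:
            self.pending.append((new["city_id"], 1, new["age"]))

    def apply_pending(self):
        """Fan in the logged deltas; a real system would run this asynchronously."""
        for city_id, delta_count, delta_sum in self.pending:
            entry = self.totals[city_id]
            entry[0] += delta_count
            entry[1] += delta_sum
        self.pending.clear()

    def averages(self):
        """Average age for every city that still has residents."""
        return {city: age_sum / count
                for city, (count, age_sum) in sorted(self.totals.items())
                if count > 0}


view = AverageAgeView()
for person in ({"city_id": 1, "age": 6}, {"city_id": 1, "age": 10}, {"city_id": 2, "age": 2}):
    view.record_change(None, person)
view.apply_pending()
print(view.averages())  # {1: 8.0, 2: 2.0}

# The only resident of city 2 moves to city 1; only the deltas are applied.
view.record_change({"city_id": 2, "age": 2}, {"city_id": 1, "age": 2})
view.apply_pending()
print(view.averages())  # {1: 6.0}
```

Because the view stores counts and sums, each change costs a constant amount of work instead of another pass over all the data.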
up to date. Apparently, cities two and three
have been evacuated. And-- whoa. KEN ASHCRAFT: Time to update. I don't know, that all sounds
pretty complicated. In Cloud SQL, it's
much easier. And beyond that, you can do
more complicated things. Like let's say that you wanted
to put this average city age information on a map. Well, you need to be able to
have joins to do that. So you'd probably have a table
for your people and a table for your cities, and the cities
table will contain the latitude and longitude. And it's just as easy as this
SQL query that we've got on the screen. ALFRED FULLER: Yeah, it's not
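Both of Ken's queries, the average age per city and the version joined with a cities table for mapping, can be tried with Python's built-in sqlite3 module standing in for MySQL; GROUP BY, AVG, and JOIN behave the same way for this example. The table and column names (people, cities, city_id, lat, lng) and the rows are assumptions for illustration, not the schema shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for Cloud SQL's MySQL
conn.executescript(
    """
    CREATE TABLE cities (id INTEGER PRIMARY KEY, lat REAL, lng REAL);
    CREATE TABLE people (id INTEGER PRIMARY KEY,
                         city_id INTEGER REFERENCES cities(id),
                         age INTEGER);
    INSERT INTO cities (id, lat, lng) VALUES (1, 37.77, -122.42), (2, 40.71, -74.01);
    INSERT INTO people (city_id, age) VALUES (1, 20), (1, 30), (2, 41);
    """
)

# Aggregation: average age per city.
by_city = conn.execute(
    "SELECT city_id, AVG(age) FROM people GROUP BY city_id ORDER BY city_id"
).fetchall()
print(by_city)  # [(1, 25.0), (2, 41.0)]

# Join: attach each city's coordinates so the averages can be placed on a map.
on_map = conn.execute(
    """
    SELECT c.id, c.lat, c.lng, AVG(p.age) AS avg_age
    FROM people AS p
    JOIN cities AS c ON c.id = p.city_id
    GROUP BY c.id, c.lat, c.lng
    ORDER BY c.id
    """
).fetchall()
print(on_map)  # [(1, 37.77, -122.42, 25.0), (2, 40.71, -74.01, 41.0)]
```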
as easy in the Datastore. But let's see what [? Appy ?] has to say about that
on the scoreboard. And I guess you're right. The Datastore actually has a
wide variety of queries that support most use cases. But if you really want to query
anything and everything, or in fact all of your data
when you do these aggregations, you really
have to use Cloud SQL. KEN ASHCRAFT: That's right. Let me tell you something else
that I can do better than you. Transactions. Transactions are important
because they ensure that changes are made atomically
to your database. You don't want your machine
to crash in the middle and partially apply some changes. Lots of NoSQL databases,
they don't even support transactions. ALFRED FULLER: Actually,
the Datastore does. KEN ASHCRAFT: Well, OK. So you can do a transaction
on a single row. That's not a real transaction. ALFRED FULLER: Datastore
actually supports transactions across rows using something
called entity groups. These are groupings of entities
under a single transaction log. And the thing they do incredibly
well is they provide ACID semantics
at scale. So all of these entity groups
can have transactions occurring simultaneously, and
you can have any number of these entity groups in
your application. For example, if you have a game,
and you have a player entity, and then you have
entities for items in that player's inventory, as long as
you structure it in such a way that the items in the player's
inventory are in the same entity group as the player,
you can act upon these transactionally. And this is very important,
because you never want a player to use an item and have
the item still be in their inventory afterwards or try to
use an item and have the effect not work. So for example, if a player
wanted to drink a potion-- we have the player as the root
entity and the potion as a child entity. So they're in the same
entity group. And so we can easily act upon
these transactionally. So here's an example of how to
do this using the Python API for App Engine. There are APIs in other
languages as well, Go and Java. And it's as simple as decorating
the function, use_potion, with db.transactional,
and it makes everything in that function
happen atomically. So you get the player. You get the potion from
its inventory. You transfer the health from
the potion to the player. You remove the potion from
the player's inventory. And then you put
the player in. It all happens atomically. KEN ASHCRAFT: I don't know. That sounds pretty limited. What happens when you want to
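The App Engine SDK can't run outside its own environment, so the following is a hypothetical in-memory stand-in that mimics the all-or-nothing behavior described for @db.transactional on a single entity group: snapshot the group, run the function, and restore the snapshot if anything raises. EntityGroup, transactional, and the key strings are made up for this sketch; this is not the App Engine API.

```python
import copy
import functools


class EntityGroup:
    """Hypothetical stand-in for one entity group: key -> property dict."""

    def __init__(self, entities):
        self.entities = entities


def transactional(func):
    """Hypothetical decorator: if func raises, restore the group's pre-call snapshot."""
    @functools.wraps(func)
    def wrapper(group, *args, **kwargs):
        snapshot = copy.deepcopy(group.entities)
        try:
            return func(group, *args, **kwargs)
        except Exception:
            group.entities = snapshot  # roll back every change made so far
            raise
    return wrapper


@transactional
def use_potion(group, player_key, potion_key):
    player = group.entities[player_key]      # the root entity
    potion = group.entities[potion_key]      # a child in the same entity group
    player["health"] += potion["health"]     # transfer the potion's health
    player["inventory"].remove(potion_key)   # raises ValueError if not in inventory


group = EntityGroup({
    "player:1": {"health": 50, "inventory": ["potion:1"]},
    "potion:1": {"health": 25},
})
use_potion(group, "player:1", "potion:1")
print(group.entities["player:1"])  # {'health': 75, 'inventory': []}

try:
    use_potion(group, "player:1", "potion:1")  # already drunk, so remove() fails
except ValueError:
    pass
print(group.entities["player:1"]["health"])  # 75
```

The second call adds health before discovering the potion is gone; the rollback is what keeps the player from gaining health without spending the potion.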
atomically move a potion from one player to another? You're stuck. I told you. Anything you can do,
I can do better. ALFRED FULLER: Wait,
wait, wait. No, no. We also support cross-entity
group transactions. So if you have two players, and
one player wants to sell a potion to that other player,
you can do so simply by setting the XG flag to true. And now in this scenario, you
can load the buyer, you load the seller, you load
the potion from the seller's inventory. You transfer money from the
buyer to the seller. You store the potion in
the buyer's inventory. You remove the potion from
the seller's inventory. And then you save both the
buyer and the seller, and it just happens. It works atomically. KEN ASHCRAFT: Well, in Cloud
SQL, you can do the same thing, but you don't have
to define those relationships in advance. All you need is START
TRANSACTION, you run your queries, and then commit. It's as simple as that. Here's the exact same example
from the previous slide, except how you do
it in Cloud SQL. Now, these cross-entity
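The START TRANSACTION, queries, COMMIT pattern for the potion sale can be sketched with Python's sqlite3 module in autocommit mode (isolation_level=None), where SQLite's BEGIN plays the role of MySQL's START TRANSACTION. The players and items tables, the gold column, and the sell_potion function are assumptions for this sketch, not the code from the slide.

```python
import sqlite3


def sell_potion(conn, potion_id, seller_id, buyer_id, price):
    """Hypothetical helper: move a potion and its price between two players atomically."""
    conn.execute("BEGIN")  # MySQL spelling: START TRANSACTION
    try:
        moved = conn.execute(
            "UPDATE items SET owner_id = ? WHERE id = ? AND owner_id = ?",
            (buyer_id, potion_id, seller_id),
        ).rowcount
        if moved != 1:
            raise ValueError("seller does not own that potion")
        conn.execute("UPDATE players SET gold = gold - ? WHERE id = ?", (price, buyer_id))
        conn.execute("UPDATE players SET gold = gold + ? WHERE id = ?", (price, seller_id))
        (buyer_gold,) = conn.execute(
            "SELECT gold FROM players WHERE id = ?", (buyer_id,)
        ).fetchone()
        if buyer_gold < 0:
            raise ValueError("buyer cannot afford that potion")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the ownership and gold changes together
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, gold INTEGER NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, owner_id INTEGER, kind TEXT);
    INSERT INTO players (id, gold) VALUES (1, 10), (2, 30);
    INSERT INTO items (id, owner_id, kind) VALUES (100, 1, 'potion');
    """
)
sell_potion(conn, 100, seller_id=1, buyer_id=2, price=20)

try:
    sell_potion(conn, 100, seller_id=2, buyer_id=1, price=50)  # buyer has only 30
except ValueError:
    pass
print(conn.execute("SELECT id, gold FROM players ORDER BY id").fetchall())  # [(1, 30), (2, 10)]
print(conn.execute("SELECT owner_id FROM items WHERE id = 100").fetchone())  # (2,)
```

Nothing about the two players had to be declared in advance; any rows the statements touch are covered by the same transaction.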
group transactions, are there any limitations to them? ALFRED FULLER: Well, actually we
had to do something called two-phase commit to make sure
that we commit to all the transaction logs atomically. And this doesn't actually
scale very well with the number of transaction
logs involved. So currently, we have a limit of
five entity groups that you can use in these cross-entity
group transactions, which is more than enough
for most cases. KEN ASHCRAFT: Well, there are
those other use cases where you want to transact over the
entire world, and in Cloud SQL you can do that. So let's say that you
wanted to give gold away to your friends. And it's amazing how your
friends just pop up out of nowhere when you're
giving stuff away. Again, all you need is START
TRANSACTION, you run your queries, and then you commit. There are no limits, then, on
the number of entity groups or rows that can be
involved in a transaction. ALFRED FULLER: Well, let's see
what [? Appy ?] has to say about that. And that's [? Appy, ?]
by the way. App Engine logo. You know, I guess the Datastore
does support a wide range of transactions, and they
do meet most use cases. But if you really want to
transact over the entire world or lock your whole table, you can
use Cloud SQL for that. KEN ASHCRAFT: Yes, you can. Now, these transactions that
you have over here in the Datastore, what good are they
if they're broken by your cross-data center replication? We all know that the Datastore
is built on top of BigTable. And BigTable has this weird,
out-of-order, eventually consistent replication that
nobody really understands. ALFRED FULLER: Well, actually
the Datastore uses Megastore Replication. And Megastore Replication uses
those entity groups that I talked about earlier. And remember, they had parallel
transaction logs. Well, they also replicate
in parallel. So we replicate at the
transaction level. The system does have
no master-- and that means that there's no replica in the
system at any given time that necessarily has all the most
up-to-date information. But if you use operations that
provide the entity group in their request, like a "get by
keys" or an ancestor query, we can make sure that you're
reading from a replica that has all the most up-to-date
information for that entity group. We do also provide really
powerful global queries. So you can query against all of
your data in the Datastore, no matter how much data you have. But these don't have
an entity group, and it's impossible to determine ahead
of time what entity groups you're going to see
in that query. So we can't make sure that
you're reading from a replica that has all the most up-to-date
information. But if you recall from
this same slide, we do parallel replication. And that means that we can scale
the replication based on the amount of resources
we have available. And that means the replication
actually happens very quickly. So these global queries are
usually only a few hundred milliseconds out of date. And speaking of replication, I
know that MySQL uses a single master to guarantee strong
consistency but then asynchronously replicates
changes to a slave. And if there's a lot of changes
queued up on a master and the master crashes,
you lose that data. I bet it's a lot of fun to
tell your developers that you've lost their data
every time there's a data center outage. KEN ASHCRAFT: No, no, no. It isn't a whole lot of fun to
have those conversations. And that's why we don't have
them with Cloud SQL. In Cloud SQL, we use synchronous
replication. Let me show you how
this works. So we have our MySQL server
running here in data center A. A client sends some data
to the MySQL Server. Before responding to the client,
we synchronously replicate the data to the other
data centers, and then we respond to the client. What this means is that if we
lose the machine that's running MySQL server, or even
if we lose the entirety of data center A, we can quickly
restart the MySQL server in a different data center without
any data loss. ALFRED FULLER: Well,
I don't know. Let's see what [? Appy ?] has
to say about this one. KEN ASHCRAFT: Oh, man. I knew I was going to win this
debate, but I didn't think it would be this easy. ALFRED FULLER: It's
not over yet. Let's talk about scalability. In any dimension you can scale,
I can scale better. KEN ASHCRAFT: No you can't. ALFRED FULLER: Yes I can. KEN ASHCRAFT: No you can't. Let me show you. I'll give you some examples from
within Google about how we use Cloud SQL. The first one is the Google
Time Keeper application. It's used by an organization
within Google, the AdWords sales and support team. And they use it to track how
much time they're spending on chat support, email support,
or campaign optimization. And then they use this
information to optimize their own workflow. So this is a large organization
within Google that's using Cloud SQL for their
day-to-day jobs, and it works really well for them. Let me give you another
example. The Google company org chart
runs on Cloud SQL. So this is 30,000 employees,
their relationships to each other, and what they're
working on. To give you an idea of the
kind of load that we can handle, picture this. We've got these company
all-hands meetings. So all 30,000 employees are
listening to our upper management. And the upper management reminds
everyone, all right, I want you to go onto the org
chart application and update what you're working on. So this is a tech company
that we work at. Of course, everybody's
there with their laptops on their lap. So everybody simultaneously
opens their laptop and goes to this website. This is tens of thousands
of employees hammering on this website. All of a sudden, we get
tens to hundreds of QPS on the back end. And Cloud SQL handles
it just fine. So Cloud SQL works very well
for these sorts of large corporate environments. ALFRED FULLER: That's
not scalability. Let me show you scalability. Say you're building a hugely
popular mobile application. We're talking about thousands
and thousands of QPS and millions and millions
of users and billions of ruffled feathers. Well with the Datastore
there's no headaches. There's no provisioning. It just scales to your use
case, and it just works. Let me show you how. So the Datastore, as I said
earlier, is built on top of Google infrastructure. And each one of these layers
adds a key component to the Datastore scalability. For example, the lowest layer
Google File System, or GFS, provides huge capacity and
extremely good durability. And this allows your application
to get as large as it needs to get. And on top of that
we have BigTable. And BigTable automatically
splits your data based on load and balances it
on the machines that we have available. And so say your traffic
changes. All of a sudden you have a
spike of writes in one part of your data. What BigTable will do is it will
take down that one shard, or tablet, and split it into two
pieces and then load those on different machines. And I'd like to thank
[? Ekie ?] for this very excellent comic
demonstration of this. And then on top of that
is Megastore. And Megastore works at scale. It is a truly distributed
database system, because it spans multiple data centers and multiple geographic regions. And that's the level
at which it operates. And if you want super in-depth
detail about this, you can see my talk from last
year, "More 9s, Please." And at scale, the reliability
of the Datastore is hugely important, because even small
local issues can cause outages for many, many, many users. And Megastore just handles
this by automatically failing over to different data
centers and reading the data from there. And it's guaranteed, if you're
using the entity groups, to always have that strong
consistency, because it makes sure that whatever replica
you're reading from is up to date. It also handles catastrophic
failures. So if one or more data centers
all of a sudden go offline-- they fall into the ocean
or the power outage happens nearby-- well, those types of
failures are still hidden from your users. So let's see what the
score is on this one. Oh yeah. KEN ASHCRAFT: All right. I'll let you have this one just
because I'm so far ahead. ALFRED FULLER: Good. Let's talk about management,
then. Remember at the beginning of
this presentation, I talked about the benefits
of the cloud. No software patches
to worry about. No hard drives to replace. No systems to purchase. KEN ASHCRAFT: And all of that
applies equally to Cloud SQL. Let me show you just how
easy it is to get started with Cloud SQL. But the very first thing we need
to do is create an App Engine application. And rather than doing a live
demo and worrying about WiFi and all that stuff, I'm just
going to show you some screenshots. So we go first to the App Engine
website where we have the form for creating
an application. We need to pick an app ID. So we're going to go with
SQL vs. NoSQL and an application title. And then I go down here to
create an application. All right, that worked
just fine. And so I can click on
the dashboard to see what we would see. But we haven't uploaded
any code yet. We don't have any traffic, so
the dashboard isn't very interesting. The next step that I'm going to
do, now that I've created this application-- and let's keep in mind the SQL
vs. NoSQL ID, because we're going to use that in
just a second. The next step is I'm going to go
over to the API's console. And if you've used the Maps API
or the Translate API, you probably have this
already set up. I've just created
a new project. And so it's telling me that I
need to set up my billing. So I'm going to go here to the
Billing tab, and I click on the Checkout button and go
through the billing flow. I enter in my credit
card information. Once I'm done with that, I come
back here to the main page, and I can set up my
Cloud SQL instance. So I go to the Cloud SQL tab. And I don't have any instances
yet, so I click on Create a New Instance. And it pops up this dialogue for
me, and I need to pick an instance name. I think I'll come up with
"sql is better." And now I need to authorize
that application. Oh, I also can pick a size. The size basically controls how
much CPU and RAM you're going to allocate to
the MySQL process. So remembering that application
that we just created of SQL vs. NoSQL,
I type that in. And I click on Create
The Instance. Oh, it wants me to
do a Project ID. I type that one in. Again, "sql is better," of
course, and I choose this ID. And it starts to create
my instance. After a few seconds, the MySQL
instance will be provisioned, and we'll see a dashboard
like this. You can see down here we have a
little bit of storage usage already, and that's because
MySQL needs to format some of its data files. Now we want to get started using
our Cloud SQL instance. We have a SQL prompt built into
the web UI that I can easily use for simple queries. So first thing we need to
do is create a database. So I type in CREATE DATABASE,
and I can click on Execute. That works just fine. Now I need to create a table. So I can type in that
SQL statement and execute that as well. You can imagine how I can
continue to use this to populate the data or query
the data as well. And if I need to create
development or staging instances, I just go through
those last few steps, and everything is already
provisioned for me. So let's see you make that any
easier for the Datastore. ALFRED FULLER: Oh,
it's easier. KEN ASHCRAFT: Then show us. ALFRED FULLER: Oh,
I don't need to. You already showed us
about 20 slides ago. KEN ASHCRAFT: Oh. ALFRED FULLER: When you created
that app originally, the Datastore was ready right
then to accept writes from your application. There's nothing to provision,
nothing to configure. You just start writing data. And if you want to use different
tables-- or in the Datastore, they're called
"kinds"-- you just define those kinds in your code. You don't have to tell
the Datastore about them ahead of time. And you start putting data. If you want isolation, you
can use Namespaces for multi-tenancy or to isolate
a development instance. Or you can even use an entirely
different app to completely isolate your
staging instance from everything else. So let's see what [? Appy ?]
has to say about that. Oh, yeah. KEN ASHCRAFT: All right. I'll let you have another one. Let's see what's up
next, though. Ah, schema. I got this one. I got this one. All right. So the schema's important
because it defines what your data looks like. What are the data types? What are the relationships
between the data? And you saw in my recent example
how I created a table. Well, in Cloud SQL, this schema
is strictly enforced. And that means that you have to
create the table before you can start working
with your data. And some people think of this
as a benefit of having this strictly enforced schema. It means that you don't have
typos in your code where you write to some non-existent
column, and then when you try to read from the column that
you're supposed to read from, there's no data there. Let me give you an example
of how to do a schema change, then. Let's go back to our previous
example of a player with a name and some integer
amount of health. We're going to want to add
magic to this game. So we need to add
a mana column. All we need to do in
Cloud SQL is alter table and add the column. Just like that. ALFRED FULLER: You know, that
sounds a little too magical. KEN ASHCRAFT: You're right. We do have to be careful with
these ALTER TABLE statements, because they can lock
up the table for the duration of the change. And the reason why that happens
is that MySQL has tightly packed the row data so
that one row is right adjacent to the next. And when we add that extra
column, there's not room for that new field in that
tightly packed space. So it needs to copy everything
to a new location. So for the duration of the time
that it takes to copy everything, you're going
to lock the table up. Now there are some tricks that
we can play to minimize this lock time or even hide
it entirely. And it's called an Online
Schema Change. And what we do is we have
our old table, and we have a new table. We do a background copy of
the data from the old table to the new table. And while that background copy
is going on, we don't want to miss any changes that are
happening to the old table. So we set up a trigger on the
old table so that if any of those changes come through,
they'll get propagated to the new table. Once everything is copied, we
just do an atomic rename and it just works. So if you want to see how that
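The copy, trigger, and rename steps of an online schema change can be sketched against SQLite through Python's sqlite3 module. This is a simplified illustration of the same idea that pt-online-schema-change automates for MySQL, not that tool's implementation: the table and trigger names, the batch size, and the after_first_batch hook used to simulate live traffic are all made up, and in MySQL the cutover would be a single atomic RENAME TABLE statement rather than two renames inside a transaction.

```python
import sqlite3


def online_add_mana(conn, batch_size=2, after_first_batch=None):
    """Hypothetical helper: add a mana column by copying while triggers mirror writes."""
    conn.executescript(
        """
        CREATE TABLE players_new (id INTEGER PRIMARY KEY, name TEXT, health INTEGER,
                                  mana INTEGER NOT NULL DEFAULT 0);
        -- Triggers forward live writes so the background copy never misses one.
        CREATE TRIGGER osc_ins AFTER INSERT ON players BEGIN
            INSERT OR REPLACE INTO players_new (id, name, health)
            VALUES (NEW.id, NEW.name, NEW.health);
        END;
        CREATE TRIGGER osc_upd AFTER UPDATE ON players BEGIN
            INSERT OR REPLACE INTO players_new (id, name, health)
            VALUES (NEW.id, NEW.name, NEW.health);
        END;
        CREATE TRIGGER osc_del AFTER DELETE ON players BEGIN
            DELETE FROM players_new WHERE id = OLD.id;
        END;
        """
    )
    last_id = 0  # assumes positive integer ids
    while True:
        batch = conn.execute(
            "SELECT id, name, health FROM players WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            break
        # OR IGNORE: a row the triggers already wrote is kept rather than overwritten.
        conn.executemany(
            "INSERT OR IGNORE INTO players_new (id, name, health) VALUES (?, ?, ?)", batch
        )
        last_id = batch[-1][0]
        if after_first_batch is not None:
            after_first_batch()  # simulate one burst of writes during the copy
            after_first_batch = None
    # Cutover: swap the tables inside one transaction.
    conn.executescript(
        """
        BEGIN;
        DROP TRIGGER osc_ins;
        DROP TRIGGER osc_upd;
        DROP TRIGGER osc_del;
        ALTER TABLE players RENAME TO players_old;
        ALTER TABLE players_new RENAME TO players;
        COMMIT;
        DROP TABLE players_old;
        """
    )


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, health INTEGER);
    INSERT INTO players VALUES (1, 'ann', 50), (2, 'bob', 40), (3, 'cy', 30);
    """
)


def live_traffic():
    # Hypothetical writes arriving while the copy is still running.
    conn.execute("UPDATE players SET health = 99 WHERE id = 3")
    conn.execute("INSERT INTO players VALUES (4, 'dee', 20)")


online_add_mana(conn, after_first_batch=live_traffic)
print(conn.execute("SELECT * FROM players ORDER BY id").fetchall())
# [(1, 'ann', 50, 0), (2, 'bob', 40, 0), (3, 'cy', 99, 0), (4, 'dee', 20, 0)]
```

The update and insert made mid-copy both survive because the triggers forwarded them, and the table is only held for the short cutover at the end.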
works, there's a company called Percona. And they have a tool called
pt-online-schema-change that works with MySQL to make
that very, very easy. ALFRED FULLER: Well, in the
Datastore, schema changes are actually magical. Well, not really. They're not actually magical. You have to do something. But the schema enforcement
actually happens-- or rather, you can enable schema
enforcement-- in your code. The Datastore doesn't actually
enforce this schema for you. What this means is if you want
to add that mana field, all you do is change your
code and it's there. You can set a default value, and
you can just start using this stuff. If you need to backfill any
of the previously stored entities to, say, do some sort
of complicated calculation to figure out what initial mana
every character should have, you can do that using the
powerful MapReduce framework that I described earlier. And let's see how this
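On the Datastore side, the schema change lives in application code. A minimal sketch, assuming plain dicts in place of Datastore entities: a code-side default fills in mana when an older entity is read, and a map-style pass backfills stored entities with a computed value. PLAYER_DEFAULTS, initial_mana, and the half-of-health rule are hypothetical; in App Engine the default would be declared on the model's property instead.

```python
PLAYER_DEFAULTS = {"mana": 0}  # hypothetical: the new property and its default


def load_player(raw):
    """Older entities lack mana; the code-side default fills it in on read."""
    return {**PLAYER_DEFAULTS, **raw}


def initial_mana(player):
    """Hypothetical backfill rule: start each character with half its health."""
    return player["health"] // 2


def backfill_mana(entities):
    """Map-style pass over stored entities that writes a computed mana value."""
    for key, raw in list(entities.items()):
        if "mana" not in raw:
            entities[key] = {**raw, "mana": initial_mana(raw)}


stored = {  # made-up stored entities; player:2 was written after the change
    "player:1": {"name": "ann", "health": 50},
    "player:2": {"name": "bob", "health": 40, "mana": 7},
}
print(load_player(stored["player:1"]))  # {'mana': 0, 'name': 'ann', 'health': 50}
backfill_mana(stored)
print(stored["player:1"]["mana"], stored["player:2"]["mana"])  # 25 7
```

Entities that already have a value are left alone, so the backfill can be rerun safely if it is interrupted.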
one turns out. And I win. KEN ASHCRAFT: No, it's a tie. ALFRED FULLER: Oh, that didn't
turn out how I thought. KEN ASHCRAFT: No. ALFRED FULLER: Who could
have predicted a tie? KEN ASHCRAFT: You know, maybe
there is room for both of our products in the world. Actually, let me give you an
example of where the Datastore probably is a better
fit than Cloud SQL. These file sharing applications
are really popular nowadays. If we wanted to build one, well,
first we need to come up with a good name. I think the DropRectangle.net
would be a good one. If you were to use Cloud SQL
to store this data, this is probably how you would structure
your schema. You'd have a table
for your users. Of course, they'd have
an ID and a name. You'd have a table
for your files. The owner_id would reference
back to the users. And you'd also have a table
for your access control specifying who is allowed
to access which files. So with this schema, you can
imagine how you could run queries like, show me all of the
files that I have access to, or atomically transfer
ownership of this file from one owner to another. And this works great until your
site gets popular and you have lots and lots of users and
lots and lots of files. And the data no longer fits
on a single machine. At that point you can
shard your data. And the natural way to shard
the data would be by user. Unfortunately, we have this
operation of transferring ownership between users. And if you shard your data by
user, you don't know whether the two users are on
the same shard. And if they're not on the same
shard, it gets really hard to atomically move that file
ownership between shards. And this is where the Datastore
actually would probably do better
than Cloud SQL. You structure the user
as the root entity. You'd have files beneath
that and access control underneath that. So with the global queries, you
could easily find all the files that you have access to. And if you want to atomically
transfer files between users, you can use the cross-entity
group transactions that Alfred described earlier. ALFRED FULLER: And you know,
when I was working on this presentation with you, it really
became kind of clear to me that there are also some
use cases for Cloud SQL, especially if you
want to support off-the-shelf solutions. So there's this entire
ecosystem built up of frameworks that are available
that were built to work with relational databases. And it doesn't always make
sense to modify these solutions or roll your
own solutions. So if you just want to use these
off the shelf, Cloud SQL is obviously a better
choice there. KEN ASHCRAFT: So do you think
there are ways that our two products could work together? ALFRED FULLER: You know, we
have this PM, or product manager on our team, Greg. He's always sending us these
emails, like selling stuff from his garage. And I don't know what those-- KEN ASHCRAFT: He does send
a lot of emails. It'd be really great if he had
some sort of web application where he could post things for
sale or list things for sale, and people could search for
what they want to buy. ALFRED FULLER: Yeah, yeah. And he could call
it Greg's List. KEN ASHCRAFT: That's
a good idea. ALFRED FULLER: And if he did
this, he could use Cloud SQL to store all of his active listings, so that he gets all the speed of in-memory operations and the in-memory performance of a single machine. Then, when a listing expires or is sold, he can use the Datastore to archive it. Archived listings are always available, and you can still query against them and still use them.
KEN ASHCRAFT: One of the big
benefits of putting the active listings in Cloud SQL would be that you get to take advantage of the powerful query language, all of those aggregations, and lots of flexibility, so you could run queries like "show me the average price of a sofa in San Francisco."
ALFRED FULLER: Yeah, and Cloud SQL works best when your entire data set fits into memory, so it doesn't have to page to disk or do any sort of heavy lifting there. The set of active listings is relatively small compared to all the listings throughout time, so it really makes a lot of sense to keep them in Cloud SQL.
KEN ASHCRAFT: And storing the
archived listings in the Datastore makes sense, because when you have schema changes, you of course want to apply them to the data that you're actually going to be working with, the stuff that's in Cloud SQL. But for all of those archived listings, you don't really want to apply the schema changes and do the backfill and everything. With the flexible schema of the Datastore, you can make that work as well. Wait, hang on. The guys in the back are trying to say something to me. I guess there's a talk on BLOB storage after this one, and they're worried that we're running over time. They're kind of rushing us along.
ALFRED FULLER: Isn't
that on Friday?
KEN ASHCRAFT: I know. I guess they're worried that if we run long, then the next one's going to run long, and they're just going to get bumped off the schedule entirely.
ALFRED FULLER: Oh, that's rude. Can't they just hold up a sign or something?
KEN ASHCRAFT: I know, right?
ALFRED FULLER: And for BLOB storage? More like boring storage.
KEN ASHCRAFT: Like that's so hard. I can store pictures of cats. Yippee.
ALFRED FULLER: Well, I guess
we'd better finish up. You know, going back to this scoreboard, it's really clear to me that the Datastore does provide a lot of query capability, really good transactions, and a great consistency model. But if you really want to query anything and everything, or you want to transact on the world, or you need strong consistency for all of your operations, or you rely on a solution that assumes these things, you really need to use Cloud SQL.
KEN ASHCRAFT: And on the flip side, Cloud SQL does quite well in terms of scalability, ease of management, and schema changes. But the Datastore really shines in these areas: it's super scalable, has really flexible schema management, and is really easy to get started with.
ALFRED FULLER: And the best part is that you can use these solutions together.
KEN ASHCRAFT: That's right.
ALFRED FULLER: And
closing remarks.
KEN ASHCRAFT: All right. So thanks for taking the time to come to our talk.
[APPLAUSE]
ALFRED FULLER: Yeah, we'd like to open it up to questions.
KEN ASHCRAFT: There are microphones in each of the aisles. And if you liked the talk, there are some +1 cards that you can drop in the box at the end.
ALFRED FULLER: Oh, I didn't know we were doing that. That's high tech.
KEN ASHCRAFT: Can you use the microphone, please? Sure, go ahead.
AUDIENCE: In the scheme where
you described that MySQL will synchronously replicate to the slaves before responding to the client, what range of latency should we expect, and how does that compare to the Datastore?
KEN ASHCRAFT: They're pretty comparable. The latencies would be somewhere between 50 and 100 milliseconds.
ALFRED FULLER: Although, since Cloud SQL uses a single master, it can commit a whole bunch of inserts at one time, so the overall throughput is much higher.
AUDIENCE: Is there any thought
being given to readjusting the starting prices for SQL storage? Because right now, you can go out and get third-party storage, which may not be Google, but run on [INAUDIBLE], for $9 a month. And you can't get started, running 24 hours, for less than $38 on this.
KEN ASHCRAFT: So one of the big benefits of Cloud SQL is the data durability and the fact that we have synchronous replication. The other cloud providers don't have that, and they don't have that assurance that the data is secure.
ALFRED FULLER: And I've set up a SQL instance before, and if you really want replication and that kind of durability, even if you only want asynchronous replication, it's a huge pain to set up. Cloud SQL kind of simplifies all that. Robert?
AUDIENCE: So, I guess my
question is kind of related to the first one. With the Datastore, replication across data centers happens on a per-entity-group basis. With Cloud SQL, it sounds like the whole database is replicated. Are there any nuances to how that limits your concurrency or anything?
KEN ASHCRAFT: As mentioned earlier, your write latencies will go up a bit. And if you need to do long-running transactions, that can affect you, because long-running transactions that are doing writes will hold locks for a longer period of time.
AUDIENCE: So my question is
about the fixed schema in Cloud SQL. When we do some operation, like adding some schema in the Datastore using the NoSQL strategy, how can we map that schema back? Something I could already change at the other level is still fixed at the top, in Cloud SQL. So how can we handle this problem?
KEN ASHCRAFT: So I think the question was, how can we map from a Datastore schema back to the Cloud SQL schema?
AUDIENCE: Yes.
ALFRED FULLER: So we do provide metadata queries to query the live schema in the Datastore, the one you've effectively chosen by putting data in the Datastore. You can use that to automatically generate a strict schema in a MySQL database. Or you can decide what the schema should be yourself. And when you're doing the MapReduce, you can run arbitrary code in the MapReduce, so you can fix up your data and convert it to whatever schema you need it to be in, or just drop the data that doesn't fit your schema.
AUDIENCE: If you have an
application which runs with the Datastore and is [INAUDIBLE] in the Cloud, can you make a transaction across both storage systems?
KEN ASHCRAFT: No, you cannot. They're independent.
ALFRED FULLER: Yeah, but there are algorithms you can use to make sure that one is eventually updated in a transactional fashion. It's guaranteed to be updated eventually. You just have to store some versioning, like what version does Cloud SQL have, and what version does the Datastore have? And you can use that as a basis to make sure it gets updated.
AUDIENCE: We built an
application on App Engine using the NoSQL database, and we wanted to back it up. We weren't worried about Google losing it. We were worried about making a programming error and destroying our own database. It seemed like the only way to do that was to write a custom app that would download our data and then upload it again. And there were some limitations on transferring large amounts of data. It seemed very difficult.
ALFRED FULLER: Yes, and that's a problem we're addressing. We actually have an experimental backup feature that you can enable from the admin console. You can use cron jobs to schedule these backups, and you can back them up to the Blobstore or Google Cloud Storage. Then you can download those or do whatever you want with them. And we are working to make that much better. Right now, it runs a MapReduce that doesn't guarantee any sort of consistency, and we're working on solving that problem.
AUDIENCE: I started using
Cloud SQL very recently, and the tools fall a bit short. So I was wondering if there are any plans to let us use something like phpMyAdmin, or something else that's already established, that we could map to Cloud SQL and start using that way?
KEN ASHCRAFT: So the problem is that we have a proprietary connection that uses OAuth to get into the Google cloud. I recognize that this is a shortcoming, and it's something that we would like to fix. But there's nothing that I can announce at this point.
AUDIENCE: What kind of
availability numbers are you offering for each of these?
KEN ASHCRAFT: What sort of availability numbers are we offering--
AUDIENCE: How many nines?
ALFRED FULLER: Availability numbers for HRD?
KEN ASHCRAFT: Yeah.
ALFRED FULLER: It's echoey. So right now we have an SLA on App Engine, which includes the High Replication Datastore, of four and a half nines. So 99.994. The Datastore itself actually does better than that, but the SLA is on the entire stack, so any issue will affect the numbers as you read them. So that's like eight minutes per year of unexpected downtime in our timeouts.
KEN ASHCRAFT: And on the Cloud SQL side, we do not have an SLA at this point.
ALFRED FULLER: But if you want to know how that works, "More 9s, Please" is definitely a good talk on-- well, sorry. It's my talk. I shouldn't say that. But it goes into a very low level of detail on how that works.
AUDIENCE: In one of the previous
diagrams, you showed that when a write hits the data center, it's getting written to multiple data centers. Do you also keep local copies?
KEN ASHCRAFT: So the way that we do replication in Cloud SQL is at the file system level. We write to a distributed, reliable file system, and then we replicate that as well. So by writing to the file system at all, it is writing to the local copy.
ALFRED FULLER: Yeah, and it's also updating the in-memory running instance so that your reads are incredibly fast. It doesn't have to touch the disk or wait for replication or anything to do reads.
AUDIENCE: I was wondering if you're keeping multiple local copies other than relying on multiple data centers to [? archive ?] the copies.
ALFRED FULLER: So Google File System, which is also a foundation of Cloud SQL, manages that type of stuff for us. I don't know how much detail I can go into there.
KEN ASHCRAFT: That's fine.
AUDIENCE: Hi, my question
actually ties in a little bit with you. I work on developing one of the [INAUDIBLE] for-- sorry, I shouldn't mention the name-- a development tool for MySQL, and I would love to support Cloud SQL. Is there a .NET driver for it?
KEN ASHCRAFT: There is a documented one. It's a JDBC driver. You can also get one in Python, though our documentation isn't so great.
AUDIENCE: So no .NET driver?
ALFRED FULLER: .NET, no.
KEN ASHCRAFT: Oh, sorry. .NET. I thought you said "documented." No, no .NET driver.
AUDIENCE: Given that the driver is open source for MySQL, once you've authenticated, is it the same transport layer as MySQL?
KEN ASHCRAFT: No, it's not the same.
AUDIENCE: OK. So it's a completely different driver. It's nothing that-- any plans on developing a .NET driver?
KEN ASHCRAFT: We would like to support the MySQL protocol and make it much easier, so that it doesn't matter what language you're actually running in. You can just connect to something that looks like MySQL, and it just works.
AUDIENCE: OK, thank you.
ALFRED FULLER: Any more questions? Wow. We're well ahead of time.
KEN ASHCRAFT: All right. Thank you so much for coming.
This is actually just an excellent talk overall.
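The Greg's List split discussed in this talk (active listings in Cloud SQL, expired or sold listings archived to the Datastore) can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: the standard-library `sqlite3` module stands in for the Cloud SQL (MySQL) instance, and a plain dictionary keyed by listing ID stands in for the Datastore. The table name `listings`, its columns, and the helpers `add_listing`, `archive_expired`, and `average_price` are hypothetical and are not part of either product's API.

```python
import sqlite3
from datetime import date

# Assumption: sqlite3 stands in for the Cloud SQL (MySQL) instance that holds
# active listings, and a dict keyed by listing id stands in for the Datastore archive.
sql = sqlite3.connect(":memory:")
sql.execute(
    "CREATE TABLE listings ("
    "id INTEGER PRIMARY KEY, item TEXT, city TEXT, price REAL, expires TEXT)"
)
archive = {}


def add_listing(listing_id, item, city, price, expires):
    """Insert one active listing; `expires` is a datetime.date."""
    with sql:  # commit on success, roll back on error
        sql.execute(
            "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
            (listing_id, item, city, price, expires.isoformat()),
        )


def archive_expired(today):
    """Move listings that expired before `today` from SQL into the archive."""
    rows = sql.execute(
        "SELECT id, item, city, price, expires FROM listings WHERE expires < ?",
        (today.isoformat(),),  # ISO dates compare correctly as strings
    ).fetchall()
    for listing_id, item, city, price, expires in rows:
        # Archived listings are plain dicts, so later schema changes to the
        # SQL table never need to be backfilled into them.
        archive[listing_id] = {"item": item, "city": city, "price": price, "expires": expires}
    # The copy above and the delete below are NOT one transaction across both
    # stores; rerunning is harmless because the archive is keyed by listing id.
    with sql:
        sql.executemany("DELETE FROM listings WHERE id = ?", [(row[0],) for row in rows])
    return len(rows)


def average_price(item, city):
    """SQL aggregation over active listings only; returns None if none match."""
    (avg,) = sql.execute(
        "SELECT AVG(price) FROM listings WHERE item = ? AND city = ?", (item, city)
    ).fetchone()
    return avg
```

For example, after adding two San Francisco sofas that expire on 2012-07-01 and 2012-08-01, `archive_expired(date(2012, 7, 15))` moves the first one into `archive`, and `average_price("sofa", "San Francisco")` then averages only the listing that is still active. Keeping the aggregation in SQL and the long tail in a schemaless store mirrors the division of labor the speakers describe.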
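Alfred's answer about mapping the Datastore's live schema onto a strict MySQL schema, then fixing up or dropping nonconforming data in a MapReduce, can be illustrated with a small pure-Python sketch. It does not call the Datastore metadata-query API; it assumes the entities have already been read as plain dictionaries, infers one column type per property, emits a MySQL-style `CREATE TABLE` statement, and applies a map step that converts each entity to a schema or drops it. The names `SQL_TYPES`, `infer_schema`, `create_table_sql`, and `to_row`, and the type mapping itself, are hypothetical.

```python
# Hypothetical mapping from Python value types to MySQL column types.
SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "TEXT"}


def infer_schema(entities):
    """Derive {property: python_type} from the values actually stored."""
    seen = {}
    for entity in entities:
        for name, value in entity.items():
            seen.setdefault(name, set()).add(type(value))
    schema = {}
    for name, types in seen.items():
        if types == {int, float}:
            schema[name] = float  # widen mixed numeric properties
        elif len(types) == 1 and next(iter(types)) in SQL_TYPES:
            schema[name] = next(iter(types))
        # Properties with conflicting or unsupported types are left out.
    return schema


def create_table_sql(table, schema):
    """Assumes property names are already valid, unreserved SQL identifiers."""
    cols = ", ".join(f"{name} {SQL_TYPES[t]}" for name, t in sorted(schema.items()))
    return f"CREATE TABLE {table} ({cols})"


def to_row(entity, schema):
    """Map step: convert one entity to `schema`, or return None to drop it."""
    row = {}
    for name, t in schema.items():
        value = entity.get(name)
        if value is None:
            row[name] = None  # missing property becomes NULL
        elif isinstance(value, t) and not (t is int and isinstance(value, bool)):
            row[name] = value
        elif t is float and isinstance(value, int) and not isinstance(value, bool):
            row[name] = float(value)
        else:
            return None  # value does not fit the schema
    return row
```

With an inferred schema every entity fits by construction; dropping only happens when you decide the schema yourself, for example by passing `{"title": str, "price": int}` so that an entity whose `price` is the string `"cheap"` maps to `None`.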
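Alfred's suggestion for keeping Cloud SQL and the Datastore in step without a cross-store transaction (store a version number with the data in each store and use it to drive updates until both agree) might look roughly like the following. It is a hedged illustration, not a Google API: two in-memory dicts stand in for the two stores, the Datastore copy is assumed to be the source of truth, and `Record`, `write`, and `sync` are hypothetical names. In a real system, each store's data-plus-version update would itself be done in that store's own transaction.

```python
from dataclasses import dataclass


@dataclass
class Record:
    version: int
    data: dict


# Hypothetical stand-ins for the two independent stores.
datastore = {}  # assumed source of truth
cloud_sql = {}  # replica that must eventually catch up


def write(key, data):
    """Commit to the source of truth first, bumping the version number."""
    current = datastore.get(key)
    version = current.version + 1 if current else 1
    datastore[key] = Record(version, dict(data))
    return version


def sync(key):
    """Bring the replica up to date; idempotent, so it can be retried after failures."""
    source = datastore.get(key)
    if source is None:
        return False
    replica = cloud_sql.get(key)
    if replica is not None and replica.version >= source.version:
        return False  # already current, nothing to do
    cloud_sql[key] = Record(source.version, dict(source.data))
    return True
```

Because `sync` only ever moves the lagging copy forward to a newer version, a background job can keep retrying it after crashes or timeouts, which is what gives the "guaranteed to be updated eventually" property without a transaction spanning both stores.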