Google I/O 2012 - SQL vs NoSQL: Battle of the Backends

Captions
ALFRED FULLER: Hi. Yeah. They just told me to get started. So I'm presenting-- well. I'm co-presenting No-SQL versus SQL. My name is Alfred Fuller and my co-presenter, Ken Ashcraft, he was just here. But I guess I'll get started anyway. So I'll do the overview at least. That's weird. Excellent. This is supposed to be a debate. But Ken Ashcraft's not here right now, so I guess it's going to be pretty easy. And I'm always telling myself to talk slower in these things, and I guess I'm going to have a lot of extra time. So this is going to be awesome. So the first thing I'm going to do is I'm going to give you a quick overview of data in the cloud. So why in the cloud? Well first and foremost, the cloud is excellent for fault tolerance, because when you use Google's cloud, you know, we man the pagers for you. We always have people looking at these systems to try and keep them working without any user-visible interruptions. And we automate fault recovery as much as possible. So most of the time when something happens, no one ever need know about it. It just automatically switches over to another machine or something else, and it just keeps working. And that means low maintenance. And in addition to the fault tolerance, low maintenance also means that we manage the updates for you in the cloud. From bare metal to software patches, you never have to worry, how will this patch affect my systems, because you're using the same cloud that we use at Google. And we vet these patches, we make sure everything works, and then we push it. We go through that whole process. And you don't have to worry about that. So when you use the cloud, you can focus on what you do best and not worry about the cloud. All of our cloud products also have built in durability in terms of replication. There's nothing to configure. There's nothing to think about. It's built in from the ground up. These systems are designed to work this way. And they're also geographically distributed. 
So nothing is sensitive to a single power outage or a single geographical location. And finally, accessibility. The cloud is always on. It's always available, at least when you have an internet connection. And when you don't, when you develop against Google cloud products-- well, at least the Datastore and Cloud SQL, which I'm talking about today-- we have local development environments so that you can test against those environments, you can build against those environments, and then you can deploy to the cloud and to production without worrying. So I work on App Engine. How many of you use App Engine? Show of hands. Oh, that's a lot. But for the people who don't use App Engine, App Engine lets you build apps on Google's infrastructure. It's Platform as a Service, and its goal is to make your app-- whether it's a web app or a cloud-enabled Android app-- easy to build, easy to scale, easy to maintain. So again, you can focus on what makes your app great, what makes your app special. App Engine connects into some storage APIs, primarily Cloud SQL, the Datastore, which again is what I work on-- Cloud SQL is what Ken was supposed to talk about-- and Cloud Storage, which has a talk on Friday. So if you're interested in Cloud Storage, which is BLOB storage, I recommend going to that talk. So the Datastore. Datastore is literally Google's storage infrastructure. So it's the same technology we use for our own applications. So Gmail, Google Web Search, you're using the same infrastructure pieces that we use to keep those things up and running. And it's distilled into well-documented APIs that are included in the App Engine SDK. And it's built for scale, both in terms of size and traffic. Right now we perform over 2 trillion operations per month in the Datastore alone. And it's a fully managed NoSQL database solution. So you don't have to worry about provisioning or scaling. It just kind of works. Cloud SQL, on the other hand, it's fully managed, but it's pure MySQL. 
So it's kind of like that computer you built at home to run your SQL instance, except it's not in your basement. It's not on some island in some VM somewhere with no one looking after it. It's in the cloud, it's fully managed, and it's happy. KEN ASHCRAFT: Hey, I'm here. Hang on. Hang on. I'm here guys. Sorry. ALFRED FULLER: Oh, Ken. Great. KEN ASHCRAFT: Hey, sorry I'm late. Sorry. We can get started now that I'm here. My name is Ken Ashcraft and I work on Cloud SQL. This is Alan Fuller and he worked on the Datastore. ALFRED FULLER: Alfred. KEN ASHCRAFT: Oh, yeah. Anyway, so let me give you a high-level overview of what running in Google Cloud is like and App Engine and Cloud SQL and all of that. ALFRED FULLER: No, no. I just did that. You're a little bit late. KEN ASHCRAFT: Oh, Oh. Sorry about that. ALFRED FULLER: Yeah. KEN ASHCRAFT: You know, it's been hard. I got distracted. I was talking with developers outside. They just keep on mobbing me. Everybody's so excited about using Cloud SQL. They're really loving to have the expressiveness of a MySQL database. It's super easy to get started, super easy to manage, and the best part is they don't have to use any of that NoSQL silly Datastore stuff. They get to use a real database, MySQL. ALFRED FULLER: Wow. Starting with the name calling already. You know, I knew this was supposed to be a debate. But I didn't think it'd get so ugly so quickly. KEN ASHCRAFT: It's not name calling. It's fact. Anything that you can do, I can do it better. I can do anything better than you. ALFRED FULLER: No you can't. KEN ASHCRAFT: Yes I can. ALFRED FULLER: No you can't. KEN ASHCRAFT: Yes I can. Let me show you. Let's talk about queries. Queries are important because they're the way that you access your data. If you don't have a powerful query language, you can't get to the data that you want, and you can't get it quickly. Cloud SQL supports the international standard for database manipulation, structured query language, or SQL. 
It sounds like NoSQL databases are proud not to support this standard. ALFRED FULLER: Actually, NoSQL is kind of a misnomer for the Datastore as we support an ever-growing subset of a structured query language similar to SQL. Specifically, we support a wide range of filters. We just added support for OR in both Java and Python. So you can combine these filters with OR and in sub-expressions. It supports arbitrary sorting. And we also added, recently, projections or index-only queries, where you can only retrieve a few properties from your entities, and it's much faster and cheaper than retrieving the whole entity. And we actually go beyond SQL in that we support repeated properties. So you can do set operators like "contains all" or "contains any." And that's incredibly useful when you're building tools like labels in Gmail or tags for photos. And the best part about this is this subset scales in the size of the result set. So you never have to worry, as your database grows, if the performance of your queries is actually going to degrade over time. KEN ASHCRAFT: I don't know. In Cloud SQL, we support all those SQL queries that you just talked about and then some. We can do things more powerful, like aggregations. So let's say that you want to compute the average age of people living in each city. In Cloud SQL, it's as simple as this. All you have to do is select the average age and group by city. ALFRED FULLER: Well, the Datastore supports something like that too, except it has to scale to enormous sizes, so we have a powerful framework called MapReduce. And here's an example. You can see on the left there's some data. I have each person, and it has a city ID and an age. And we can use MapReduce to compute the average age in each city. By simply mapping, we map this to a key value pair. In this case, the key is the city ID and the value is the age. Then we shuffle to group by city ID. 
And then we reduce to calculate the numbers that we need to compute the average, namely the total number of people and the sum of all the ages. And in this case, the sample set I chose is actually, apparently, quite young. But also, we can go beyond this in that this required mapping over all your data, and MapReduce is a very powerful framework to do that because it computes this in parallel, so it's a basic scatter-gather algorithm. But if you want to keep this view that you've created of the result set up to date as your entities change, you can do that using something called a Materialized View. So what you do is you basically track changes in your system as they happen. And you store them in a separate entity. And then asynchronously, you fan in those changes and apply them to your result set. And look. The results are now up to date. Apparently, cities two and three have been evacuated. And-- whoa. KEN ASHCRAFT: Time to update. I don't know. That all sounds pretty complicated. In Cloud SQL, it's much easier. And beyond that, you can do more complicated things. Like let's say that you wanted to put this average city age information on a map. Well, you need to be able to have joins to do that. So you'd probably have a table for your people and a table for your cities, and the cities table will contain the latitude and longitude. And it's just as easy as this SQL query that we've got on the screen. ALFRED FULLER: Yeah, it's not as easy in the Datastore. But let's see what [? Appy ?] has to say about that on the scoreboard. And I guess you're right. The Datastore actually has a wide variety of queries that support most use cases. But if you really want to query anything and everything, or in fact all of your data when you do these aggregations, you really have to use Cloud SQL. KEN ASHCRAFT: That's right. Let me tell you something else that I can do better than you. Transactions. 
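[Editor's sketch, not from the slides: the queries round above contrasts a SQL GROUP BY (plus a join against a cities table) with a map, shuffle, and reduce pass that computes the same average age per city. The following is a minimal illustration under stated assumptions. It uses Python's built-in sqlite3 module as a stand-in for MySQL and plain in-process Python as a stand-in for the App Engine MapReduce framework. The table names, column names, and sample rows are hypothetical.]

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data, not the numbers shown on the slide.
CITIES = [(1, "Springfield", 39.8, -89.6), (2, "Shelbyville", 39.4, -88.8)]
PEOPLE = [("ann", 1, 30), ("bob", 1, 40), ("cat", 2, 20)]


def open_db(people, cities):
    """Load the sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, lat REAL, lng REAL)"
    )
    conn.execute("CREATE TABLE people (name TEXT, city_id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO cities VALUES (?, ?, ?, ?)", cities)
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", people)
    return conn


def average_age_sql(conn):
    """The aggregation from the talk: select the average age, group by city."""
    rows = conn.execute("SELECT city_id, AVG(age) FROM people GROUP BY city_id")
    return dict(rows.fetchall())


def average_age_on_map(conn):
    """Join against the cities table to attach latitude and longitude."""
    return conn.execute(
        """
        SELECT c.name, c.lat, c.lng, AVG(p.age)
        FROM people AS p JOIN cities AS c ON c.id = p.city_id
        GROUP BY c.id, c.name, c.lat, c.lng
        ORDER BY c.id
        """
    ).fetchall()


def average_age_mapreduce(people):
    """The same aggregation as explicit map, shuffle, and reduce steps."""
    # Map: emit a (city_id, age) key-value pair for every person.
    pairs = [(city_id, age) for _name, city_id, age in people]
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for city_id, age in pairs:
        grouped[city_id].append(age)
    # Reduce: per city, keep the head count and the sum of ages.
    totals = {city: (len(ages), sum(ages)) for city, ages in grouped.items()}
    return {city: total / count for city, (count, total) in totals.items()}
```

[Calling average_age_sql(open_db(PEOPLE, CITIES)) and average_age_mapreduce(PEOPLE) should give the same per-city averages. The real MapReduce framework runs the map and reduce steps in parallel across machines, which this single-process sketch does not model.]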
Transactions are important because they ensure that you have atomically made changes to your database. You don't want your machine to crash in the middle and partially apply some changes. Lots of NoSQL databases, they don't even support transactions. ALFRED FULLER: Actually, the Datastore does. KEN ASHCRAFT: Well, OK. So you can do a transaction on a single row. That's not a real transaction. ALFRED FULLER: Datastore actually supports transactions across rows using something called entity groups. These are groupings of entities under a single transaction log. And the thing they do incredibly well is they provide ACID semantics at scale. So all of these entity groups can have transactions occurring simultaneously, and you can have any number of these entity groups in your application. For example, if you have a game, and you have a player entity, and then you have entities for items in that player's inventory, as long as you structure it in such a way that the items in the player's inventory are in the same entity group as the player, you can act upon these transactionally. And this is very important, because you never want a player to use an item and have the item still be in their inventory afterwards or try to use an item and have the effect not work. So for example, if a player wanted to drink a potion-- we have the player as the root entity and the potion as a child entity. So they're in the same entity group. And so we can easily act upon these transactionally. So here's an example of how to do this using the Python API for App Engine. There's APIs in many other languages, well, Go and Java. And it's as simple as decorating the function, use_potion, with db.transactional, and it makes everything in that function happen atomically. So you get the player. You get the potion from its inventory. You transfer the health in the potion to the player. You remove the potion from the player's inventory. And then you put the player. It all happens atomically. 
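[Editor's sketch, not the slide's code: the use_potion example above relies on the App Engine SDK, which cannot run outside App Engine. The following toy version only imitates the all-or-nothing behaviour being described. The `transactional` decorator here is a hypothetical stand-in written for this note, not `db.transactional` itself. `EntityGroup` and the `crash_midway` flag are likewise invented for illustration. The real Datastore commits through a per-entity-group transaction log rather than an in-memory snapshot.]

```python
import copy
import functools


class EntityGroup:
    """Toy entity group: a root player entity plus its child inventory items."""

    def __init__(self, player, inventory):
        self.player = player        # e.g. {"health": 10}
        self.inventory = inventory  # e.g. {"potion": {"health": 5}}


def transactional(func):
    """Hypothetical stand-in for db.transactional: apply all changes or none."""

    @functools.wraps(func)
    def wrapper(group, *args, **kwargs):
        snapshot = copy.deepcopy(group.__dict__)  # state before the transaction
        try:
            return func(group, *args, **kwargs)
        except Exception:
            # Roll back: restore the player and inventory exactly as they were.
            group.__dict__.clear()
            group.__dict__.update(snapshot)
            raise

    return wrapper


@transactional
def use_potion(group, item_name, crash_midway=False):
    potion = group.inventory[item_name]          # get the potion from the inventory
    group.player["health"] += potion["health"]   # transfer its health to the player
    if crash_midway:
        # Simulated failure between the two writes (illustration only).
        raise RuntimeError("simulated crash")
    del group.inventory[item_name]               # remove the potion from the inventory
```

[With crash_midway=True the health increase is undone along with everything else, so the player never ends up healed while still holding the potion.]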
KEN ASHCRAFT: I don't know. That sounds pretty limited. What happens when you want to atomically move a potion from one player to another? You're stuck. I told you. Anything you can do, I can do better. ALFRED FULLER: Wait, wait, wait. No, no. We also support cross-entity group transactions. So if you have two players, and one player wants to sell a potion to that other player, you can do so simply by setting the XG flag to true. And now in this scenario, you can load the buyer, you load the seller, you load the potion from the seller's inventory. You transfer money from the buyer to the seller. You store the potion in the buyer's inventory. You remove the potion from the seller's inventory. And then you save both the buyer and the seller, and it just happens. It works atomically. KEN ASHCRAFT: Well, in Cloud SQL, you can do the same thing, but you don't have to define those relationships in advance. All you need is START TRANSACTION, you run your queries, and then commit. It's as simple as that. Here's the exact same example from the previous slide, except it's how you do it in Cloud SQL. Now, these cross-entity group transactions, are there any limitations to them? ALFRED FULLER: Well, actually we had to do something called two-phase commit to make sure that we commit to all the transaction logs atomically. And this doesn't actually scale very well with the number of transaction logs involved. So currently, we have a limit of five entity groups that you can use in these cross-entity group transactions, which is more than enough for most cases. KEN ASHCRAFT: Well, there are those other use cases where you want to transact over the entire world, and in Cloud SQL you can do that. So let's say that you wanted to give gold away to your friends. And it's amazing how your friends just pop up out of nowhere when you're giving stuff away. Again, all you need is START TRANSACTION, you run your queries, and then you commit. 
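[Editor's sketch, not the slide's SQL: the "start a transaction, run your queries, commit" pattern for selling a potion can be illustrated with Python's built-in sqlite3 module as a stand-in for Cloud SQL. Assumptions: the `players` and `items` tables, their columns, and the starting balances are hypothetical. SQLite spells the opening statement BEGIN where MySQL accepts START TRANSACTION.]

```python
import sqlite3


def make_game_db():
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT / ROLLBACK statements below are the only
    # transaction control in play.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE players ("
        " id INTEGER PRIMARY KEY,"
        " gold INTEGER NOT NULL CHECK (gold >= 0))"
    )
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " owner_id INTEGER NOT NULL REFERENCES players(id),"
        " kind TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO players VALUES (1, 100), (2, 5)")
    conn.execute("INSERT INTO items VALUES (1, 1, 'potion')")
    return conn


def sell_item(conn, item_id, seller_id, buyer_id, price):
    """Move an item and its price between two players atomically."""
    conn.execute("BEGIN")  # MySQL: START TRANSACTION
    try:
        conn.execute(
            "UPDATE players SET gold = gold + ? WHERE id = ?", (price, seller_id)
        )
        conn.execute(
            "UPDATE items SET owner_id = ? WHERE id = ? AND owner_id = ?",
            (buyer_id, item_id, seller_id),
        )
        # The CHECK constraint rejects this update if the buyer cannot pay.
        conn.execute(
            "UPDATE players SET gold = gold - ? WHERE id = ?", (price, buyer_id)
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undo the seller credit and the item move
        raise
```

[If the buyer cannot afford the price, the last UPDATE fails and the ROLLBACK undoes the earlier two statements, so neither the gold nor the potion changes hands. Nothing about the two players had to be declared in advance, which is the contrast being drawn with entity groups.]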
There are no limitations, then, to the number of entity groups or rows that can be involved in a transaction. ALFRED FULLER: Well let's see what [? Appy ?] has to say about that. And that's [? Appy, ?] by the way. App Engine logo. You know, I guess the Datastore does support a wide range of transactions, and they do meet most use cases. But if you really want to transact on the world or lock your whole table, you can use Cloud SQL for that. KEN ASHCRAFT: Yes, you can. Now, these transactions that you have over here in the Datastore, what good are they if they're broken by your cross-data center replication? We all know that the Datastore is built on top of BigTable. And BigTable has this weird, out-of-order, eventually consistent replication that nobody really understands. ALFRED FULLER: Well, actually the Datastore uses Megastore Replication. And Megastore Replication uses those entity groups that I talked about earlier. And remember, they had parallel transaction logs. Well they also replicate in parallel as well. So we replicate on the transaction level. Although the system does have no master-- and that means that there's no replica in the system at any given time that necessarily has all the most up-to-date information. But if you use operations that provide the entity group in their request, like a "get by keys" or an ancestor query, we can make sure that you're reading from a replica that has all the most up-to-date information for that entity group. We do also provide really powerful global queries. So you can query against all of your data in the Datastore no matter how much data you have. But these, they don't have an entity group, and it's impossible to determine ahead of time what entity groups you're going to see in that query. So we can't make sure that you're reading from a replica that has all the most up-to-date information. But if you recall from this same slide, we do parallel replication. 
And that means that we can scale the replication based on the amount of resources we have available. And that means the replication actually happens very quickly. So these global queries are only usually a few hundred milliseconds out of date. And speaking of replication, I know that MySQL uses a single master to guarantee strong consistency but then asynchronously replicates changes to a slave. And if there's a lot of changes queued up on a master and the master crashes, you lose that data. I bet it's a lot of fun to tell your developers that you've lost their data every time there's a Datacenter outage. KEN ASHCRAFT: No, no, no. It isn't a whole lot of fun to have those conversations. And that's why we don't have them with Cloud SQL. In Cloud SQL, we use synchronous replication. Let me show you how this works. So we have our MySQL server running here in data center A. A client sends some data to the MySQL Server. Before responding to the client, we synchronously replicate the data to the other data centers, and then we respond to the client. What this means is that if we lose the machine that's running MySQL server, or even if we lose the entirety of data center A, we can quickly restart the MySQL server in a different data center without any data loss. ALFRED FULLER: Well, I don't know. Let's see what [? Appy ?] has to say about this one. KEN ASHCRAFT: Oh, man. I knew I was going to win this debate, but I didn't think it would be this easy. ALFRED FULLER: It's not over yet. Let's talk about scalability. In any dimension you can scale, I can scale better. KEN ASHCRAFT: No you can't. ALFRED FULLER: Yes I can. KEN ASHCRAFT: No you can't. Let me show you. I'll give you some examples from within Google about how we use Cloud SQL. The first one is the Google Time Keeper application. It's used by an organization within Google, the AdWords sales and support team. 
And they use it to track how much time they're spending on chat support, email support, or campaign optimization. And then they use this information to optimize their own workflow. So this is a large organization within Google that's using Cloud SQL for their day-to-day jobs, and it works really well for them. Let me give you another example. The Google company org chart runs on Cloud SQL. So this is 30,000 employees, their relationships to each other, and what they're working on. To give you an idea of the kind of load that we can handle, picture this. We've got these company all-hands meetings. So all 30,000 employees are listening to our upper management. And the upper management reminds everyone, all right, I want you to go onto the org chart application and update what you're working on. So this is a tech company that we work at. Of course, everybody's there with their laptops on their lap. So everybody simultaneously opens their laptop and goes to this website. This is tens of thousands of employees hammering on this website. All of a sudden, we get tens to hundreds of QPS on the back end. And Cloud SQL handles it just fine. So Cloud SQL works very well for these sorts of large corporate environments. ALFRED FULLER: That's not scalability. Let me show you scalability. Say you're building a hugely popular mobile application. We're talking about thousands and thousands of QPS and millions and millions of users and billions of ruffled feathers. Well with the Datastore there's no headaches. There's no provisioning. It just scales to your use case, and it just works. Let me show you how. So the Datastore, as I said earlier, is built on top of Google infrastructure. And each one of these layers adds a key component to the Datastore scalability. For example, the lowest layer Google File System, or GFS, provides huge capacity and extremely good durability. And this allows your application to get as large as it needs to get. And on top of that we have BigTable. 
And BigTable automatically splits your data based on loads and balances them on the machines that we have available. And so say your traffic changes. All of a sudden you have a spike of writes in one part of your data. What BigTable will do is it will take down that one shard, or tablet, and split it into two pieces and then load those on different machines. And I'd like to thank [? Ekie ?] for this very excellent comic demonstration of this. And then on top of that is Megastore. And Megastore works at scale. It is a truly distributed database system, because it spans multiple data centers and multiple geographic regions. And that's the level it operates. And if you want super in-depth detail about this, you can see my talk from last year, "More 9s, Please." And at scale, the reliability of the Datastore is hugely important, because even small local issues can cause outages for many, many, many users. And the Megastore just handles this by automatically failing over to different data centers and reading the data from there. And it's guaranteed, if you're using the entity groups, to always have that strong consistency, because it makes sure that whatever replica you're reading from is up to date. It also handles catastrophic failures. So if one or more data centers all of a sudden goes offline-- they fall into the ocean or a power outage happens nearby-- well, those types of failures are still hidden from your users. So let's see what the score is on this one. Oh yeah. KEN ASHCRAFT: All right. I'll let you have this one just because I'm so far ahead. ALFRED FULLER: Good. Let's talk about management, then. Remember at the beginning of this presentation, I talked about the benefits of the cloud. No software patches to worry about. No hard drives to replace. No systems to purchase. KEN ASHCRAFT: And all of that applies equally to Cloud SQL. Let me show you just how easy it is to get started with Cloud SQL. 
But the very first thing we need to do is create an App Engine application. And rather than doing a live demo and worrying about WiFi and all that stuff, I'm just going to show you some screenshots. So we go first to the App Engine website where we have the form for creating an application. We need to pick an app ID. So we're going to go with SQL vs. NoSQL and an application title. And then I go down here to create an application. All right, that worked just fine. And so I can click on the dashboard to see what we would see. But we haven't uploaded any code yet. We don't have any traffic, so the dashboard isn't very interesting. The next step that I'm going to do now, that I've created this application-- and let's keep in mind the SQL vs. NoSQL ID, because we're going to use that in just a second. The next step is I'm going to go over to the API's console. And if you've used the Maps API or the Translate API, you probably have this already set up. I've just created a new project. And so it's telling me that I need to set up my billing. So I'm going to go here to the Billing tab, and I click on the Checkout button and go through the billing flow. I enter in my credit card information. Once I'm done with that, I come back here to the main page, and I can set up my Cloud SQL instance. So I go to the Cloud SQL tab. And I don't have any instances yet, so I click on Create a New Instance. And it pops up this dialogue for me, and I need to pick an instance name. I think I'll come up with "sql is better." And now I need to authorize that application. Oh, I also can pick a size. The size basically controls how much CPU and RAM you're going to allocate to the MySQL process. So remembering that application that we just created of SQL vs. NoSQL, I type that in. And I click on Create The Instance. Oh, it wants me to do a Project ID. I type that one in. Again, "sql is better," of course, and I choose this ID. And it starts to create my instance. 
After a few seconds, the MySQL instance will be provisioned, and we'll see a dashboard like this. You can see down here we have a little bit of storage usage already, and that's because MySQL needs to format some of its data files. Now we want to get started using our Cloud SQL instance. We have a SQL prompt built into the web UI that I can easily use for simple queries. So first thing we need to do is create a database. So I type in CREATE DATABASE, and I can click on Execute. That works just fine. Now I need to create a table. So I can type in that SQL statement and execute that as well. You can imagine how I can continue to use this to populate the data or query the data as well. And if I need to create development or staging instances, I just go through those last few steps, and everything is already provisioned for me. So let's see you make that any easier for the Datastore. ALFRED FULLER: Oh, it's easier. KEN ASHCRAFT: Then show us. ALFRED FULLER: Oh, I don't need to. You already showed us about 20 slides ago. KEN ASHCRAFT: Oh. ALFRED FULLER: When you created that app originally, the Datastore was ready right then to accept writes from your application. There's nothing to provision, nothing to configure. You just start writing data. And if you want to use different tables-- or in the Datastore, they're called "kinds"-- you just define those kinds in your code. You don't have to tell the Datastore about them ahead of time. And you start putting data. If you want isolation, you can use Namespaces for multi-tenancy or to isolate a development instance. Or you can even use an entirely different app to completely isolate your staging instance from everything else. So let's see what [? Appy ?] has to say about that. Oh, yeah. KEN ASHCRAFT: All right. I'll let you have another one. Let's see what's up next, though. Ah, schema. I got this one. I got this one. All right. So the schema's important because it defines what your data looks like. What are the data types? 
What are the relationships between the data? And you saw in my recent example how I created a table. Well, in Cloud SQL, this schema is strictly enforced. And that means that you have to create the table before you can start working with your data. And some people think of this as a benefit of having this strictly enforced schema. It means that you don't have typos in your code where you write to some non-existent column, and then when you try to read from the column that you're supposed to read from, there's no data there. Let me give you an example of how to do a schema change, then. Let's go back to our previous example of a player with a name and some integer amount of health. We're going to want to add magic to this game. So we need to add a mana column. All we need to do in Cloud SQL is alter table and add the column. Just like that. ALFRED FULLER: You know, that sounds a little too magical. KEN ASHCRAFT: You're right. We do have to be careful with these ALTER TABLE statements, because they can lock up the table for the duration of the change. And the reason why that happens is that MySQL has tightly packed the row data so that one row is right adjacent to the next. And when we add that extra column, there's not room for that new field in that tightly packed space. So it needs to copy everything to a new location. So for the duration of the time that it takes to copy everything, you're going to lock the table up. Now there are some tricks that we can play to minimize this lock time or even hide it entirely. And it's called an Online Schema Change. And what we do is we have our old table, and we have a new table. We do a background copy of the data from the old table to the new table. And while that background copy is going on, we don't want to miss any changes that are happening to the old table. So we set up a trigger on the old table so that if any of those changes come through, they'll get propagated to the new table. 
Once everything is copied, we just do an atomic rename and it just works. So if you want to see how that works, there's a company called Percona. And they have a tool called pt-online-schema-change that works with MySQL to make that very, very easy. ALFRED FULLER: Well, in the Datastore, schema changes are actually magical. Well, not really. They're not actually magical. You have to do something. But the schema enforcement actually happens--- or you can enable a schema enforcement in your code. The Datastore doesn't actually enforce this schema for you. What this means is if you want to add that mana field, all you do is change your code and it's there. You can set a default value, and you can just start using this stuff. If you need to back fill any of the previously stored entities to, say, do some sort of complicated calculation to figure out what initial mana every character should have, you can do that using the powerful MapReduce framework that I described earlier. And let's see how this one turns out. And I win. KEN ASHCRAFT: No, it's a tie. ALFRED FULLER: Oh, that didn't turn out how I thought. KEN ASHCRAFT: No. ALFRED FULLER: Who could have predicted a tie? KEN ASHCRAFT: You know, maybe there is room for both of our products in the world. Actually, let me give you an example of where the Datastore probably is a better fit than Cloud SQL. These file sharing applications are really popular nowadays. If we wanted to build one, well, first we need to come up with a good name. I think the DropRectangle.net would be a good one. If you were to use Cloud SQL to store this data, this is probably how you would structure your schema. You'd have a table for your users. Of course, they'd have an ID and a name. You'd have a table for your files. The owner_id would reference back to the users. And you'd also have a table for your access control specifying who is allowed to access which files. 
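[Editor's sketch, not from the slides: the DropRectangle.net schema just described (users, files with an owner_id, and an access control table) might look roughly like the following. Python's built-in sqlite3 module stands in for Cloud SQL. Only owner_id is named in the talk. The other table names, column names, sample rows, and helper functions are assumptions made for illustration.]

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE files (
    id       INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id),
    name     TEXT NOT NULL
);
CREATE TABLE access_control (
    file_id INTEGER NOT NULL REFERENCES files(id),
    user_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (file_id, user_id)
);
"""


def make_db():
    """Create the three tables and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ken"), (2, "alfred")])
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [(10, 1, "slides.pdf"), (11, 2, "notes.txt")],
    )
    # Ken shares slides.pdf with Alfred.
    conn.executemany("INSERT INTO access_control VALUES (?, ?)", [(10, 2)])
    conn.commit()
    return conn


def files_visible_to(conn, user_id):
    """Names of files the user owns or has been granted access to."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.name
        FROM files AS f
        LEFT JOIN access_control AS a ON a.file_id = f.id
        WHERE f.owner_id = ? OR a.user_id = ?
        ORDER BY f.name
        """,
        (user_id, user_id),
    ).fetchall()
    return [name for (name,) in rows]


def transfer_ownership(conn, file_id, new_owner_id):
    """Reassign a file in one transaction (commit on success, else rollback)."""
    with conn:
        conn.execute(
            "UPDATE files SET owner_id = ? WHERE id = ?", (new_owner_id, file_id)
        )
```

[On a single database both operations are one statement each. The difficulty raised next in the talk only appears once users are spread across shards and the two owners may live on different machines.]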
So with this schema, you can imagine how you could run queries like, show me all of the files that I have access to, or atomically transfer ownership of this file from one owner to another. And this works great until your site gets popular and you have lots and lots of users and lots and lots of files. And the data no longer fits on a single machine. At that point you can shard your data. And the natural way to shard the data would be by user. Unfortunately, we have this operation of transferring ownership between users. And if you shard your data by user, you don't know whether the two users are on the same shard. And if they're not on the same shard, it gets really hard to atomically move that file ownership between shards. And this is where the Datastore actually would probably do better than Cloud SQL. You structure the user as the root entity. You'd have files beneath that and access control underneath that. So with the global queries, you could easily find all the files that you have access to. And if you want to atomically transfer files between users, you can use the cross-entity group transactions that Alfred described earlier. ALFRED FULLER: And you know, when I was working on this presentation with you, it really became kind of clear to me that there are also some use cases for Cloud SQL, especially if you want to support off-the-shelf solutions. So there's this entire ecosystem built up of frameworks that are available that were built to work with relational databases. And it doesn't always make sense to modify these solutions or roll your own solutions. So if you just want to use these off the shelf, Cloud SQL is obviously a better choice there. KEN ASHCRAFT: So do you think there are ways that our two products could work together? ALFRED FULLER: You know, we have this PM, or product manager on our team, Greg. He's always sending us these emails, like selling stuff from his garage. 
And I don't know what those-- KEN ASHCRAFT: He does send a lot of emails. It'd be really great if he had some sort of web application where he could post things for sale or list things for sale, and people could search for what they want to buy. ALFRED FULLER: Yeah, yeah. And he could call it Greg's List. KEN ASHCRAFT: That's a good idea. ALFRED FULLER: And if he did this, what he could do is he could use Cloud SQL to store all of his active listings so that you have all the speed of the in-memory operations and in-memory performance of a single machine. And then when a listing expires or is sold, you can use the Datastore to archive all those listings. And they're always available, and you can still query against them, and you could still use them. KEN ASHCRAFT: One of the big benefits of putting the active listings in Cloud SQL would be that you get to take advantage of the powerful query language and all of those aggregations and lots of flexibility so that you could run queries like, show me the average price of a sofa in San Francisco. ALFRED FULLER: Yeah, and Cloud SQL works best when your entire data set fits into memory so it doesn't have to page to disk or do any sort of heavy lifting there. And the active set of listings is relatively small compared to all the listings throughout time. So it really makes a lot of sense to keep them in Cloud SQL. KEN ASHCRAFT: And storing the archive listings in the Datastore makes sense, because when you have schema changes, you, of course, want to apply it to the data that you're actually going to be working with, the stuff that's in Cloud SQL. But all of those archive listings, you don't really want to apply the schema changes and do the back fill and everything. And so with the flexible schema of the Datastore, you can get that to work as well. Wait, hang on. The guys in the back are trying to say something to me. So I guess there's a talk on BLOB storage after this one. And they're worried that we're running over time. 
And they're kind of rushing us along. ALFRED FULLER: Isn't that on Friday? KEN ASHCRAFT: I know. I guess they're worried that if we run long, then the next one's going to run long, and they're just going to get bumped off the schedule entirely. ALFRED FULLER: Oh, that's rude. Can't they just hold up a sign or something? KEN ASHCRAFT: I know, right? ALFRED FULLER: And for BLOB storage? More like boring storage. KEN ASHCRAFT: Like that's so hard. I can store pictures of cats. Yippie. ALFRED FULLER: Well, I guess we better finish up. You know, going back to this scoreboard, it's really clear to me that the Datastore does provide a lot of query capability, really good transactions, a great consistency model. But if you really want to query anything and everything, or you want to transact on the world, or you need strong consistency for all of your operations, or you rely on a solution that assumes these things, you really need to use Cloud SQL. KEN ASHCRAFT: And on the flip side, Cloud SQL does quite well in terms of scalability, ease of management, and schema changes. But the Datastore really shines in these areas-- super scalable, really flexible schema management, and really easy to get started. ALFRED FULLER: And the best part is that you can use these solutions together. KEN ASHCRAFT: That's right. ALFRED FULLER: And closing remarks. KEN ASHCRAFT: All right. So thanks for taking the time to come to our talk. [APPLAUSE] ALFRED FULLER: Yeah, we'd like to open it up to questions. KEN ASHCRAFT: There are microphones in each of the aisles. And if you liked the talks, there are some +1 cards that you can drop in the box at the end. ALFRED FULLER: Oh, I didn't know we were doing that. That's high tech. KEN ASHCRAFT: Can you use the microphone, please? Sure, go ahead. 
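The Greg's List split described in the talk (active listings in a relational store, expired or sold listings in a schemaless archive) can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Cloud SQL, a plain dict stands in for the Datastore, and the table layout and the function names `average_price` and `archive_listing` are hypothetical, not anything shown by the speakers.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Cloud SQL (which is MySQL).
active = sqlite3.connect(":memory:")
active.execute(
    "CREATE TABLE listings (id INTEGER PRIMARY KEY, item TEXT, city TEXT, price REAL)"
)
active.executemany(
    "INSERT INTO listings (id, item, city, price) VALUES (?, ?, ?, ?)",
    [
        (1, "sofa", "San Francisco", 200.0),
        (2, "sofa", "San Francisco", 300.0),
        (3, "sofa", "Oakland", 100.0),
        (4, "lamp", "San Francisco", 20.0),
    ],
)
active.commit()

# Assumption: a plain dict stands in for the schemaless archive (the Datastore).
archive = {}


def average_price(conn, item, city):
    """Aggregate over the active listings only; returns None if nothing matches."""
    row = conn.execute(
        "SELECT AVG(price) FROM listings WHERE item = ? AND city = ?",
        (item, city),
    ).fetchone()
    return row[0]


def archive_listing(conn, store, listing_id):
    """Move one listing from the active table to the archive.

    The archive copy is written before the active row is deleted, because the
    two stores are independent and cannot share a transaction. A failure in
    between leaves a duplicate rather than a lost listing.
    """
    cur = conn.execute(
        "SELECT id, item, city, price FROM listings WHERE id = ?", (listing_id,)
    )
    row = cur.fetchone()
    if row is None:
        return False
    columns = [d[0] for d in cur.description]
    # Stored as a free-form record, so later schema changes to the active
    # table do not require backfilling archived listings.
    store[listing_id] = dict(zip(columns, row))
    conn.execute("DELETE FROM listings WHERE id = ?", (listing_id,))
    conn.commit()
    return True
```

Calling `average_price(active, "sofa", "San Francisco")` answers the kind of question Ken mentions, and `archive_listing(active, archive, 2)` retires a listing while keeping it available in the archive.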
AUDIENCE: In the scheme where you described that MySQL will synchronously replicate to the slaves before responding to the client, what is the range of latency that we should expect, and how does that compare to the Datastore? KEN ASHCRAFT: They're pretty comparable. The latencies would be somewhere between 50 and 100 milliseconds. ALFRED FULLER: Although, since Cloud SQL uses a single master, it can commit a whole bunch of inserts at one time, so the bandwidth is much larger. AUDIENCE: Is there any thought being given to readjusting the beginning prices on SQL storage? Because it's kind of right now-- you can go out and get third-party storage, which may not be Google, but run on [INAUDIBLE] the $9 a month. And you can't get started for 24 hours for less than $38 on this. KEN ASHCRAFT: So one of the big benefits of Cloud SQL is the data durability and the fact that we have synchronous replication. And the other cloud providers don't have that. And they don't have that assurance that the data is secure. ALFRED FULLER: And I've set up a SQL instance before, and if you really want replication and that kind of durability-- even if you want to do asynchronous replication-- it's a huge pain to set up. And Cloud SQL kind of simplifies all that. Robert? AUDIENCE: So, I guess my question is kind of related to the first one. With the Datastore, the replication across data centers is on an entity group basis. With Cloud SQL, it sounds like the whole database is replicated across. Are there any nuances to how that limits your concurrency or anything? KEN ASHCRAFT: As mentioned earlier, your write latencies will go up a bit. And if you need to do long-running transactions, that can affect you, because those long-running transactions that are doing writes will hold locks for a longer period of time. AUDIENCE: So my question is about the fix the schema from the Cloud SQL. 
When we do some operation, like we add some schema in the data storage using the NoSQL strategy, so how can we map the schema back? Something I could already change in the other level. But on the top, it's still fixed to Cloud SQL. So how can we handle this problem? KEN ASHCRAFT: So I think the question was, how can we map from a Datastore schema back to the Cloud SQL schema? AUDIENCE: Yes. ALFRED FULLER: So we do provide metadata queries to query the live schema that you have in the Datastore that you've decided to have by putting data in the Datastore. And you can use that to actually automatically generate a strict schema in a MySQL database. Or you can decide what the schema should be. And when you're doing the MapReduce, you can do an arbitrary code in the MapReduce, and you can fix up your data and convert it to whatever schema you need it to be in or just drop the data that doesn't fit your schema. AUDIENCE: If you have an application which runs with Datastore and is [INAUDIBLE] in the Cloud, can you make a transaction across both storages? KEN ASHCRAFT: No you cannot. They're independent. ALFRED FULLER: Yeah, but there are algorithms you can use to make sure that one is eventually updated in a transactional fashion. It's guaranteed to be updated eventually. You just have to store some versioning for, like, what version does the Cloud SQL have, and what version does the Datastore have? And you can use that as a basis to make sure it gets updated. AUDIENCE: We built an application on App Engine using the NoSQL database, and we wanted to back it up. And we weren't worried about Google losing it. We were worried about making a programming error and destroying our own database. And it seemed like the only way to do that was to write a custom app that would take our data down and then bring them back up. And there were some limitations on transferring large amounts of data. It seemed very difficult. ALFRED FULLER: Yes, and that's a problem we're addressing. 
We actually have an experimental-- a backup that you can just enable from the admin console. And you can actually use the cron jobs to schedule these backups. And you can back them up to Blobstore or Google Cloud Storage. And then you can download those or do whatever you want with them. And we are working to make that much better. So right now, it runs a MapReduce that doesn't guarantee any sort of consistency. And we're working on solving that problem. AUDIENCE: I started using Cloud SQL very recently. And the tools, they fall a bit short. So I was wondering if there's any plans to maybe let something like phpMyAdmin or something that's already established that we can maybe map to the Cloud SQL and start using that way? KEN ASHCRAFT: So the problem is that we have a proprietary connection that uses OAuth to get into the Google cloud. And I recognize that this is a shortcoming, and it's something that we would like to fix. But nothing that I can announce at this point. AUDIENCE: What kind of availability numbers are you offering for each of these? KEN ASHCRAFT: What sort of availability numbers are we offering-- AUDIENCE: How many nines? ALFRED FULLER: Availability numbers for HRD? KEN ASHCRAFT: Yeah. ALFRED FULLER: It's echoey. So right now we have an SLA on App Engine which includes the high replication Datastore of four and a half nines. So 99.994. The Datastore itself, actually it's better than that. But the SLA is on the entire stack. So any issue will affect the numbers as you read them. So that's like eight minutes of downtime per year of unexpected downtime in our timeouts. KEN ASHCRAFT: And on the Cloud SQL side, we do not have an SLA at this point. ALFRED FULLER: But if you want to know how that works, definitely "More 9s, Please" is a good talk on-- well, sorry. It's my talk. I shouldn't say that. But it goes into a very low level of the details of how that works. 
AUDIENCE: In one of the previous diagrams, you showed that when a write hits the data center it's getting written to multiple data centers. Do you also keep local copies? KEN ASHCRAFT: So the way that we do replication in Cloud SQL is at the file system level. So we write to a distributed reliable file system, and then we replicate that as well. So by writing to the file system at all, it is writing to the local copy. ALFRED FULLER: Yeah, and it's also updating the in-memory running instance so that your reads are incredibly fast. It's not like it has to touch the disk or wait for replication or anything to do reads. AUDIENCE: I was wondering if you're keeping multiple local copies other than relying on multiple data centers to [? archive ?] the copies. ALFRED FULLER: So Google File System, which is also a foundation of Cloud SQL, manages that type of stuff for us. I don't know how much detail I can go into there. KEN ASHCRAFT: That's fine. AUDIENCE: Hi, my question actually ties in a little bit with you. I work with developing one of the [INAUDIBLE] for-- sorry, I shouldn't mention the name-- a development tool for MySQL and I would love to support Cloud SQL. Is there a .NET driver for it? KEN ASHCRAFT: There is a document. It's a JDBC driver. You can also get one in Python, though our documentation isn't so great. AUDIENCE: So no .NET driver? ALFRED FULLER: .NET, no. KEN ASHCRAFT: Oh, sorry. .NET. I thought you said "documented." No, no .NET driver. AUDIENCE: Given that the driver is open source for MySQL, once you've authenticated, is it the same transport layer as MySQL? KEN ASHCRAFT: No, it's not the same. AUDIENCE: OK. So it's a completely different driver. It's nothing that-- any plans on developing a .NET driver? KEN ASHCRAFT: We would like to support the MySQL protocol and make it much easier so that it doesn't matter what language you're actually running in. You just can connect to something that looks like MySQL and it just works. AUDIENCE: OK thank you. 
ALFRED FULLER: Any more questions? Wow. We're well ahead of time. KEN ASHCRAFT: All right. Thank you so much for coming.
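In the Q&A, Alfred notes that there is no transaction spanning the Datastore and Cloud SQL, but that storing a version for each side lets one store be brought up to date eventually. The speakers do not spell out an algorithm, so the following is just one possible sketch of that idea. Two dicts stand in for the two stores, the Datastore copy is assumed to be the source of truth, and the names `write`, `sync`, and `sync_all` are hypothetical.

```python
# Assumption: plain dicts stand in for the two independent stores.
# Each record has the shape {"version": int, "data": dict}.
datastore = {}   # treated here as the source of truth
cloud_sql = {}   # the copy that is allowed to lag behind


def write(key, data):
    """Write to the source of truth and bump its version number."""
    current = datastore.get(key, {"version": 0})
    datastore[key] = {"version": current["version"] + 1, "data": dict(data)}


def sync(key):
    """Copy one record across only if the replica is behind.

    The version comparison makes the step idempotent, so it can be retried
    after a failure without applying the same update twice.
    """
    source = datastore.get(key)
    if source is None:
        return False
    replica = cloud_sql.get(key)
    if replica is not None and replica["version"] >= source["version"]:
        return False
    cloud_sql[key] = {"version": source["version"], "data": dict(source["data"])}
    return True


def sync_all():
    """Run from a periodic job; returns how many records were brought up to date."""
    return sum(1 for key in list(datastore) if sync(key))
```

Between a `write` and the next `sync`, the two stores disagree. The version numbers only guarantee that repeated sync passes converge on the latest write.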
Info
Channel: Google Developers
Views: 392,895
Keywords: gdl, cloud
Id: rRoy6I4gKWU
Length: 43min 8sec (2588 seconds)
Published: Fri Jun 29 2012