Aurora Database Architecture by Amazon Web Services

Captions
Well, thanks for coming. My name's Nate Slater, I'm an AWS solutions architect, and I work here in San Francisco covering some of the accounts we have in the Bay Area. Tonight we're going to talk about Amazon Aurora. It's a new database product, a relational database. We announced it for the first time this past year at our re:Invent conference, our annual user conference, which was in November of 2014 in Las Vegas. The product today is in beta and will hopefully go GA soon. For those of you who are AWS customers already, we can see about possibly getting you early access if you're interested, and I'll have some information at the end about the expected launch date.

So Aurora is a relational database. Before we jump into what Aurora does and what it is, I thought it would be useful to take a quick stroll back through history and talk about the history of the database. In the 1970s we saw the introduction of the mainframe: big, huge systems that sat in data centers at corporate headquarters, basically ran all of a company's operations, and performed essentially all of the database-related activity. In the 1980s we started to see the rise of the UNIX relational databases; a lot of companies I'm sure we're all familiar with came of age during that period and still offer some of the most widely used relational databases today. In the mid-to-late 90s, x86 architecture made a move into relational database land with products like Microsoft SQL Server and others that didn't require a big UNIX server to run, and shortly after that we started seeing relational databases from the open source community, things like MySQL and Postgres. By the mid-2000s those had gone from being databases used for very specific purposes to actually running large production workloads. And then in this decade NoSQL has really been ascendant; NoSQL has existed for a while, but it's been in the last five to seven years that we've really seen it become a widely used technology. So that's a quick history of where we are today with databases.

The relational database has been a cornerstone of IT infrastructure for nearly four decades. Anybody who's done any software engineering in the last 30 years has touched a relational database at some point, and it offers a lot of really compelling features. The first is atomicity, consistency, isolation, and durability, or ACID compliance. You have atomic transactions; you have a consistent schema, so that when each transaction commits the database is in a consistent state; you have isolation, so you can run a concurrent system where each transaction executes as if it were in complete isolation from every other; and finally you have durability, so if the database system crashes you can actually restore your data after the crash. Those are powerful concepts. NoSQL databases usually support some of them but not all of them, and there are trade-offs as a result. So these are things that most folks find valuable in a relational database system.
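To make the ACID properties a little more concrete, here is a minimal sketch (mine, not from the talk) using Python's built-in sqlite3 module; the table and account names are made up for the example. The two UPDATEs either both commit or, if anything fails, both roll back, leaving the database in a consistent state.

```python
import sqlite3

# Minimal illustration of an atomic, consistent transaction.
# The "accounts" table and the account names are made up for this example.
conn = sqlite3.connect("bank.db", isolation_level=None)  # autocommit; we manage BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT OR IGNORE INTO accounts VALUES ('alice', 100), ('bob', 0)")

try:
    conn.execute("BEGIN")  # start an atomic unit of work
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 40 WHERE name = 'bob'")
    conn.execute("COMMIT")  # both updates become durable together
except sqlite3.Error:
    conn.execute("ROLLBACK")  # on failure, neither update is applied
```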
SQL itself is also a very powerful, very flexible tool. It's been around for a long time, there's a lot of good tooling around it, IDEs and other things, and it's very easy to develop with. There's a rich ecosystem of tools for doing almost anything with a relational database: tools that let you build a logical schema, promote it to a physical schema, and then actually generate the DDL for your tables. A lot of really handy tools that make productivity very high when you're working with a relational database.

Relational databases have also been designed to run on the most powerful hardware. Unlike some software packages, where you can choose to run on a system with 16 cores but may not see any noticeable benefit because the software is single-threaded, databases have always been designed to make extensive use of the hardware. If you invest in an expensive, powerful server and run a relational database on it, most of the time it's going to take full advantage of the resources available to it.

The reality, though, is that usage has changed over the years. When relational databases were first introduced you really only accessed them through the corporate network, usually through terminals: a couple of people had terminal access to the database and that was it. In the 90s we started to see PCs accessing databases either directly or indirectly over the internet. As websites and other highly scaled applications came of age, all of a sudden you had not just tens or hundreds of clients accessing the database but potentially thousands, tens of thousands, maybe even millions. Some of the work that resulted in products like DynamoDB, the AWS NoSQL database, came out of work done at amazon.com to scale their own database. So the late 90s into 2003 was a period where databases started having to serve up a lot more load than they traditionally had. And today, of course, you have a multitude of devices: mobile devices, laptops, smart TVs, tablets. Sometimes people are running all of these at the same time; I've got my laptop here and my smartphone in my pocket, and all of these have apps that ultimately access some sort of database.

This has created problems in scaling. Relational databases typically scale in two different ways: you can scale up or you can scale out. If you scale up, you absorb quite a bit of cost, because scaling up involves adding more hardware, oftentimes proprietary hardware that's very expensive. If you scale out, meaning you add more nodes or instances using commodity hardware, you can keep your cost lower, but that comes at the price of complexity: you end up with data sharded across instances, fancy master-to-master replication, all things that create operational challenges. So the point of this slide is that when you scale vertically you pay a lot more to do it, and when you scale out horizontally you may be able to do it for less money, but somebody somewhere is going to have a potential operational nightmare on their hands if things aren't done properly. Those are two of the main reasons that scaling a relational database has typically been hard.
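As a rough illustration of the complexity scale-out introduces, here is a minimal sketch (my own, not from the talk) of hash-based sharding. The shard endpoints and the sharding key are hypothetical; the point is that the application now owns routing logic, and changing the number of shards moves keys between instances, which is exactly the operational burden being described.

```python
import hashlib

# Hypothetical shard endpoints; in a real deployment each would be a
# separate database instance the application has to connect to.
SHARDS = [
    "db-shard-0.example.com",
    "db-shard-1.example.com",
    "db-shard-2.example.com",
]

def shard_for(customer_id: str) -> str:
    """Pick a shard by hashing the sharding key."""
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query now needs routing like this, and resharding
# (changing len(SHARDS)) relocates keys between instances.
print(shard_for("customer-42"))
```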
So let's introduce Amazon Aurora, which aims to solve this problem of scaling a database. Before we dig in, it's useful to know where Aurora fits into the AWS data storage ecosystem. We offer a number of services, and all of them do different things and are best suited for certain workloads. On the bottom left, for very hot data, data that's very recent and, in the case of ElastiCache, resident in memory, we have ElastiCache, which supports the Redis and Memcached caching systems. It's a very loose schema: these are essentially object stores, you store data as key-value pairs or JSON-type notation, and there's not a lot of schema enforcement; you can put pretty much whatever you want in there. We also have DynamoDB, also very good for hot data because it can scale to meet demanding read and write I/O workloads. It has maybe a little more schema than ElastiCache, but not much; at the end of the day DynamoDB is a key-value store, and we recently added support for JSON. If you're looking for strict relationships between entities, that's going to be more challenging with a loosely schema'd system than with a strong-schema system, so obviously we have to have something in that quadrant, and today that's RDS. RDS runs relational databases such as MySQL, Postgres, SQL Server, and Oracle, and you get all the benefits of a relational database: a strict schema, the relational model, and ACID compliance.

We also have Redshift, which is better for cooler data: data that's a day old, two days old, a month old, data that's not as live. Generally there's some kind of ETL process to move it from hotter data stores like DynamoDB or RDS into Redshift. It uses a columnar storage technique, but to the client application it looks and acts like a relational database: you can execute SQL queries against it, and it has many of the same properties as a relational database even though under the hood it stores data very differently. But it's not great for transactional workloads; you wouldn't want Redshift ingesting user profile settings from mobile devices, it's just not designed for that. S3 is great for cold storage: you can move archive data out of systems like DynamoDB, RDS, or even Redshift, where the cost per gigabyte stored is a little higher, into S3. And for the coldest data of all we have Glacier, which is really offline data that you keep only for regulatory or compliance purposes and don't expect to access frequently. S3 and Glacier are object stores, so again there's no real schema; whether you're storing binary data or JSON, they treat it the same way.

Finally, as we'll talk about today, we have Aurora, which is a strongly schema'd relational database designed for very low latency. It fits a very useful spot in the ecosystem: if you want a relational database with extremely high performance, and performance that scales horizontally, Aurora is a really good choice.

So what did we do with Aurora? We reinvented the relational database using a service-oriented approach, and we'll go into some detail about what that actually means. The scale-out component of the database is really a function of some very interesting things we did with storage; as we'll see in the next several slides, it's at the storage layer that a lot of the magic happens with Aurora.
We also retained drop-in compatibility with MySQL 5.6. We wanted to give people a database that their existing applications would work with right out of the box; we didn't want to make them rewrite queries or make even subtle changes to the SQL engine that would break existing applications.

So we applied a service-oriented architecture to the database, and that's really what you're seeing at the storage layer. We've built a storage service, similar to how EBS, our block-storage-as-a-service used by EC2, is a service where the storage itself lives outside of EC2. We did a very similar thing with Aurora: we created a service-oriented architecture for accessing the storage. We've also made extensive use of some of our other services, such as S3, Route 53, SWF, and DynamoDB, for the control plane. The control plane is the part of the service that controls the database itself: provisioning, any metadata we have to keep about the system, and the DNS; with RDS, when you launch an instance you get an endpoint, and that's all controlled by DNS. We use S3 for backup and durable storage.

For high availability, we've built on concepts that exist today with RDS in multi-availability-zone mode. When you run RDS instances in multi-AZ mode today, we make copies of your data to a passive standby instance in a second availability zone, so that if there's a service disruption in the primary availability zone we can fail you over to the second; you generally don't suffer more than 30 to 60 seconds of downtime while the DNS switch happens. With Aurora we've taken that even further. In regions that have three availability zones, we scale across all three and keep six copies of your data, using a 4-of-6 write quorum: every time a change is made, we consider that change complete once four of the six copies have completed successfully. On the read side, we consider a read consistent once we can get three of the six copies of that data. And there's some nice logic in there so that if an availability zone goes down, the system automatically drops to a 3-of-4 quorum model, so that your writes and reads don't completely fail simply because a single AZ is down; that really wouldn't be very smart if the goal is high availability.
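To illustrate the idea (this is my own toy sketch, not Aurora's implementation), quorum logic of this shape can be expressed in a few lines: a write succeeds once enough copies acknowledge it, and a read consults enough copies that its set is guaranteed to overlap the last successful write.

```python
import random

# Toy quorum model: N copies of a data segment, a write needs W acks,
# a read consults R copies. The 4-of-6 / 3-of-6 numbers match the talk;
# the per-copy failure simulation is purely illustrative.
N, W, R = 6, 4, 3   # W + R > N, so any read quorum overlaps the latest write quorum

def write(copies, version, value):
    """Attempt to write (version, value) to all copies; succeed on >= W acks."""
    acked = [i for i in range(N) if random.random() > 0.05]  # simulate node failures
    if len(acked) < W:
        return False                     # quorum not reached; write is not acknowledged
    for i in acked:
        copies[i] = (version, value)
    return True

def read(copies):
    """Read any R copies and return the newest version among them."""
    sample = random.sample(copies, R)
    return max(sample, key=lambda pair: pair[0])

copies = [(0, None)] * N
if write(copies, 1, "row-42 new value"):
    print(read(copies))   # guaranteed to see version 1, because R + W > N
```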
We're using SSDs on the back end, which keeps read latencies very low; there's not a lot of seeking to pull data off disk. The database size also scales with your data, so you don't have to tell Aurora up front how much storage to provision. Today with RDS, if you're expecting to have three terabytes of data six months from now, you more or less have to provision that when you launch the instance. With Aurora you won't have to: you start with whatever storage you're using when you launch, and as you add more data we scale the storage for you.

And then the final point, which to me is one of the most interesting pieces: we've taken the approach of storing data in a log-structured way. Folks who've worked with NoSQL products like HBase or Cassandra are probably familiar with log-structured merge trees, which is the way those databases actually store their data. We've done something similar with Aurora. Log-structured storage is a well-researched concept in computer science; it's not something we invented, but the way we've implemented it has some unique characteristics.

So what is log-structured storage? It basically means you treat storage as a log file: it's append-only, and you never update data in place. Any new piece of data that needs to be written out just gets appended to the end. You don't seek on disk to find where the original value lives, look it up, and write it back to that page; you just append. You use B-tree indexes to hold pointers to the latest version of each record: when you write a new block or segment of storage, you update the pointers in the index to reference the most recent version of that data. What this means is that you end up with stale data that's simply orphaned, with nothing pointing to it, and that needs to be removed, so a garbage collection process runs periodically to reclaim it.

Conceptually, that's what goes on with log-structured storage. This isn't literally how you'd implement the on-disk data structures, but it captures what's happening with the indexes and the pointers. On the left of the slide there's a B-tree index containing primary key values; as in a typical B-tree, each node points to another node, which eventually points to a leaf. In the leaf nodes you can see key 52 pointing to the value for key 52 and key 55 pointing to the value for key 55. Now assume a new value arrives for key 52. First it just gets appended to the end of the block of storage; then the index is updated, so the new value is referenced by the index node and the old value becomes stale, no longer needed, and the garbage collection process cleans it up and reclaims that space. For those familiar with Cassandra or HBase, this is a lot like compaction: when those files are flushed to disk and merged together, the same kind of thing happens; old versions of records are discarded and the files are merged down so you end up with just the latest version of each record in a small number of files.

Some interesting consequences follow. The database file itself is the write-ahead log, the redo log: as you append these blocks to storage, that's exactly what a write-ahead log does. So first, you reduce I/O, because you're not doing two writes, one to the write-ahead log and then another to do what you just said you were going to do in the write-ahead log, which is how a database works today. It also gives you really fast, almost instantaneous, recovery from failure, because the database file is the write-ahead log: if there's a failure event, you restart the database, start reading from where you left off, update the pointers, and you're recovered. You're essentially replaying the log just by reading from the database.
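Here is a minimal, self-contained sketch of the append-only idea described above (my own illustration, nothing like Aurora's actual on-disk format): writes are appended to a log, an index maps each key to the offset of its latest version, and "garbage collection" is just rewriting the log with only the live entries.

```python
# Toy log-structured store: append-only writes, an index of key -> offset,
# and a compaction step that drops orphaned (stale) entries.
class LogStore:
    def __init__(self):
        self.log = []     # the "database file": a sequence of (key, value) records
        self.index = {}   # key -> position of its most recent record

    def put(self, key, value):
        self.log.append((key, value))        # never update in place, just append
        self.index[key] = len(self.log) - 1  # repoint the index at the new version

    def get(self, key):
        return self.log[self.index[key]][1]  # follow the pointer to the latest version

    def compact(self):
        """Garbage-collect stale records by rewriting the log with live entries only."""
        live = [(k, self.log[pos][1]) for k, pos in self.index.items()]
        self.log = live
        self.index = {k: i for i, (k, _) in enumerate(live)}

store = LogStore()
store.put(52, "old value")
store.put(55, "some value")
store.put(52, "new value")   # appended; the index now points here, the old record is orphaned
print(store.get(52))         # -> "new value"
store.compact()              # reclaims the space held by the stale record
```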
You can also cache the writes in memory: you buffer writes, and when the buffer fills up you flush them out to disk, and that can happen asynchronously. So you end up with very consistent write performance, because you're writing to memory, not to storage. It also means backups are continuous and incremental: if every time you append a new record to storage you also write that segment out to, say, S3 or some other backup media, lo and behold you have continuous incremental backups. Every time you write something new, you're essentially checkpointing and writing it off.

Then there's multi-version concurrency control. Because data is never updated in place, only appended, when a client requests a read, the read looks up the data through the pointers in the index and gets the most current copy as of the time the read was requested. Someone could change the data while the client is reading it, but that's okay: that client got the most up-to-date version at read time. If the client then wants to write a value back out, all you need to do on the write is check whether the pointer has changed since the value was read. If it has, you have a concurrency conflict, and you can handle it optimistically instead of pessimistically, without locking that data down for the whole transaction. It doesn't mean you're going to overwrite what somebody else has written; it means you can choose how to resolve the conflict. That addresses a lot of the scaling issues in relational databases that come from heavy use of locks, because once you start locking things, concurrency becomes a real issue: data is locked for some length of time, lots of clients are trying to access it, and the locks become burdensome.
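As a sketch of that optimistic check (again my own illustration, not Aurora internals), a writer remembers the version it read and only commits if that version is unchanged; otherwise it detects the conflict and can retry instead of holding a lock for the whole transaction.

```python
# Toy optimistic concurrency control: a write succeeds only if the record's
# version is still the one the writer originally read.
class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write_if_unchanged(self, key, expected_version, new_value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False                        # somebody wrote in the meantime: conflict
        self.data[key] = (current_version + 1, new_value)
        return True

store = VersionedStore()
store.data["row-42"] = (1, "original")

version, _ = store.read("row-42")               # reader sees version 1
store.data["row-42"] = (2, "concurrent write")  # another client updates the row meanwhile

# The optimistic write detects the conflict and can retry instead of blocking.
print(store.write_if_unchanged("row-42", version, "my update"))  # -> False
```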
This slide describes the recovery process in a bit more detail. In a typical database, on the left-hand side, you have a checkpoint of data plus redo logs, and in the event of a failure you have to replay the log. If the failure occurs at T0, you have to replay all of the uncommitted transactions in the log since the last checkpoint, and that's generally a single-threaded operation that can take some time. In Aurora, because the database file is the log, and because we split it into chunks that live on different data partitions, we can do essentially instantaneous replays across those blocks, and they happen in parallel, so recovery from failure is very fast.

We've also done some things with the cache. The query cache and the buffer pool cache live in a separate process from the database engine itself, so if you restart the database we keep the cache warm. When the database comes back up, all of the previously cached material is still available, and you don't have to go through the process of rewarming the cache with the queries you normally run.

Some other interesting properties come from the way Aurora storage works. With the typical read replica model, you have your master here and your replica over here. Say the master is doing 70 percent writes and 30 percent reads, and you decide you want to get some of those reads off the primary, so you spin up a read replica using, in the case of MySQL, binary log replication. Sure, you can send new reads to the replica, but remember that reading the binary log and applying it on the replica uses resources on that instance, so you're not getting all of the replica's compute devoted to new reads; you're really only getting about thirty percent beyond whatever you had on the primary. With Aurora, because the storage is shared, all we need to do is some cache invalidation to keep the caches in sync across the instances, and you can literally have a master that is one hundred percent writes and a replica that is one hundred percent reads, because they're reading from the same storage. You get some really powerful scale-out from this: we give you up to fifteen replicas, there's almost no load on the master, and the lag between the databases is very low.

This next one is kind of interesting. One of the things we've heard from customers who run RDS in multi-availability-zone mode is: multi-AZ is great, and sure, we trust that when the primary suffers a disruption you'll cut us over to the secondary, but I've got SLAs with my customers, I don't know how long that cutover will take, I don't know how my app will react, so how do I simulate it? There are some tricks to do it, but there's no button that says "fail over from the primary to the secondary." So with Aurora we've created some SQL extensions, DML or DDL, essentially ALTER statements, that let you simulate failures using SQL, so you can test your failover mechanism and see how your applications behave when these events actually occur.
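A hedged sketch of how a failure drill like that might be driven from a client, assuming the PyMySQL driver and an Aurora cluster endpoint: the host, credentials, and the exact fault-injection statement are assumptions on my part (the statement reflects my understanding of Aurora's documented crash-simulation queries and should be checked against the current documentation before use).

```python
# Sketch of a failover drill: inject a simulated crash on the primary,
# then measure how long it takes until the application can connect again.
# Endpoint, credentials, and the fault-injection SQL are placeholders/assumptions.
import time
import pymysql

ENDPOINT = "my-aurora-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com"  # placeholder
CREDS = dict(user="admin", password="secret", database="mydb")

conn = pymysql.connect(host=ENDPOINT, **CREDS)
try:
    with conn.cursor() as cur:
        # Aurora-specific fault injection (verify exact syntax in the Aurora docs):
        cur.execute("ALTER SYSTEM CRASH INSTANCE")
except pymysql.MySQLError:
    pass  # the connection is expected to drop when the crash is injected

start = time.time()
while True:
    try:
        pymysql.connect(host=ENDPOINT, connect_timeout=2, **CREDS).close()
        break                      # the endpoint answers again; failover/recovery is complete
    except pymysql.MySQLError:
        time.sleep(1)              # keep polling until the cluster recovers
print(f"recovered after {time.time() - start:.1f}s")
```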
Okay, so let's talk a little about performance, because that's one of the chief reasons for Aurora. This is performance data captured using the MySQL SysBench benchmark. We were running an r3.8xlarge instance, which has 32 cores and 244 gigabytes of RAM; that's the largest size in the r3 family, our memory-optimized instances. We ran four client machines, each a separate EC2 instance, each with a thousand threads, and we got phenomenal DML throughput: about 105,000 DML statements per second, presumably INSERTs. That's significant. I've done some benchmarking with Cassandra and HBase, systems designed to handle tens of thousands of writes per second, and even on large EC2 instances I never saw six-figure transactions per second. I'm sure there are ways to get there with those products, but it's pretty impressive, and note that we were running 4,000 clients, so that's quite a few concurrent connections as well.

On the read side, same setup: MySQL SysBench, same instance type, r3.8xlarge, this time a single client with one thousand threads, and the SELECT throughput is fantastic, half a million plus per second. You can also see where we're hitting the cache; caching helps, and reads tend to be the harder workload on a database, especially random-access reads, because they force you to jump around on the disk. Solid state helps with that, but random access still tends to be more taxing on a database than simply appending new data on writes.

Replica lag is an interesting one too. We were doing 13,800 inserts per second and seeing 7.2 milliseconds of replica lag, which is almost instantaneous. The same hardware running MySQL 5.6 with two thousand updates per second was seeing replica lag of around two seconds. So your read replicas are essentially in sync with your master.

This next one is a little more esoteric, but for people running databases with large numbers of tables, you can see that scaling up the number of tables doesn't cause a significant decline in performance with Aurora the way it does with the other three configurations we benchmarked. The first column next to Aurora is an i2.8xlarge, a storage-optimized instance with really fast local SSD, and when we got to about a thousand tables its performance really began to drop off. The same MySQL instance running on an i2 with everything in RAM also suffered a drop-off as you get to 10,000 tables. And RDS MySQL using a 30,000 IOPS volume dropped off pretty quickly once you reached that thousand-table range.

Concurrency: as the number of connections goes up, Aurora just scales with them, whereas on the RDS MySQL side about 500 connections is the sweet spot and it begins to tail off after that. So concurrency is really strong on the Aurora side.

Caching is an interesting one; I need to talk to the folks who generated these numbers. The first row makes sense: with 100 percent reads and zero percent writes, Aurora without caching delivers 160,000 operations per second, and with caching it's 375,000. That makes sense, since caching only helps you on reads. With plain MySQL the query cache isn't the recommended configuration, so performance actually goes down there. At a 50 percent read/write ratio we still see good performance on Aurora, but it drops, and I'm guessing that's because it's a blended workload and you're just not getting as much benefit from caching. And of course in the final row, with zero percent reads and 100 percent writes, caching isn't going to help in any case; you should see the same performance with or without it. What I'd like to find out from our product team is what explains the slight drop-off there; I would think caching would leave that number unchanged, but it doesn't, and it sounds like the writes are causing cache invalidation.

Read replicas again: significantly less lag than what you see with traditional MySQL. On the MySQL side, as you scale up the number of transactions occurring on the primary, lag starts to go way up.
You don't even see the blue bars for Aurora on this chart, because we're in the sub-10-millisecond range. If you're doing 10,000 updates per second with RDS MySQL on a 30K IOPS volume in a single AZ, so we're not even copying blocks to a second AZ, you're looking at 300 seconds of lag; that's five minutes. That's probably not ideal unless the clients talking to the read replicas understand there's going to be eventual consistency there.

Ease of use: one of the things we like to do with all of our services is eliminate operational overhead, and we've done the same with Aurora. It's going to be part of the RDS suite of databases, so just like you can go into the RDS console and spin up a database in minutes, you'll be able to do the same thing with Aurora (see the sketch after this section). You get instant, off-volume snapshots: you can take a snapshot of your data right through the console or the API. And thanks to that rather unique property of log-structured storage, we can do continuous, incremental, off-volume snapshots to S3, which gives you great durability; S3 is designed for eleven nines of durability, so you're not going to lose data there, and it's a perfect place to put your backups. On the storage side, we'll scale you up to 64 terabytes, and that shouldn't have any performance or availability impact; we just do it as the data size grows. I think we do it in 100-gigabyte chunks or something like that; I'd have to check the details. And under the hood, at the storage layer, there's automatic restriping, mirror repair, and hot-spot management, all the things we do to manage the actual arrays.
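As an illustration of the "spin it up through the API" point, here is a hedged sketch using boto3 as the RDS API looks today; the identifiers, credentials, engine string, and instance class are placeholders, and the exact parameters may differ from what was available at the time of this talk. The shape of the call reflects how Aurora is modeled: a cluster that owns the shared storage, plus instances that attach to it.

```python
# Sketch: provisioning an Aurora cluster and a primary instance via the RDS API.
# All identifiers, credentials, and the engine string are placeholders/assumptions.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",
    Engine="aurora-mysql",          # MySQL-compatible Aurora engine name today
    MasterUsername="admin",
    MasterUserPassword="change-me",
)

# Instances attach to the cluster; the cluster manages the storage itself,
# which is why no allocated-storage size is specified here.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-primary",
    DBInstanceClass="db.r5.large",  # placeholder instance class
    Engine="aurora-mysql",
    DBClusterIdentifier="my-aurora-cluster",
)
```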
So what we're saying is: enterprise-grade features and performance at open source prices. Enterprise features like data protection: we replicate your data across all the availability zones in the region, and regions generally have three or more. We do continuous, incremental backups to S3 to protect your data long term. We'll have capabilities to secure data at rest; those are coming sometime after launch, with AES-256 based encryption, and all of the blocks we write out to S3 will be encrypted using S3 encryption, presumably through the Key Management Service, which is our new way of doing at-rest encryption with customer-managed keys. SSL will be available for data in transit, so if you have workloads involving sensitive data, or compliance regulations that require securing data in transit and at rest, you'll have SSL endpoints you can use. It's going to be in VPC by default, so you get all the extra security that comes with VPC: network ACLs, private subnets, and both ingress and egress rules on security groups. And there's no direct access to the nodes: we're not letting customers SSH or log into the nodes that run the database; those are all under the hood. You make client connections with the MySQL client, but you can't get in and twiddle things on the nodes themselves, and that's an important point because it eliminates the likelihood that someone comes in and does something bad by accident.

It's engineered for mission-critical apps. We want this to be an always-on database: you have up to 15 read replicas that you can fail over to across availability zones, and if there's a service disruption in an availability zone, it's designed to handle it completely transparently, so you shouldn't even see a brief disruption while the failover occurs. Immediate crash recovery, again one of those interesting properties of log-structured storage: there's no lengthy period of replaying redo logs or write-ahead logs. And the page cache is kept warm in case the database engine itself needs to be restarted, so when you come back up you still have all the benefits of the cache.

And finally the pricing, which is really what makes it as compelling as it can be. We start with the r3.large instance type at 29 cents an hour, and that goes up to $4.64 an hour for the r3.8xlarge. That's pretty inexpensive; I don't remember offhand what the EC2 hourly price for an r3.8xlarge is, but it's probably not much more than $4.64 an hour. These are the prices we expect for Virginia, so there may be some variability across regions; our costs do vary across the different geographic regions we operate in around the world, so there will be some differentiation. There's also a cost for storage: storage will be 10 cents per gigabyte per month, and you'll pay 20 cents per million I/Os. The target launch is the second quarter of this year, no specific date yet, but that's what we're targeting, and there is a preview now; I don't know offhand if they're still accepting new customers into it, but that's something we can look into.

Just a shout-out to the dev team: we want everybody to know this was a huge effort on our part. We launch a lot of services and features throughout the year, and a lot of people work really hard on them, but this is a big one; a significant amount of engineering went into it, and it's been in the works for a while. So nice work by the dev team. That's it.
Info
Channel: Cisco Umbrella
Views: 18,270
Rating: 4.4059405 out of 5
Keywords: Amazon Web Services (Website), Database (Software Genre), Aurora Database Analytics Engine, Software (Industry), OpenLate
Id: -TbRxwcux3c
Length: 36min 15sec (2175 seconds)
Published: Mon Apr 06 2015