AWS re:Invent 2018: [REPEAT 1] Databases on AWS: The Right Tool for the Right Job (DAT205-R1)

Video Statistics and Information

Captions
All right, well, welcome, and I hope everybody's having a great re:Invent. Thanks for being here with us. One of the most common questions I get from customers centers on this idea of how to think about database investments. It used to be a platform choice: I would consider two or three vendors, pick one as my platform, make that my primary investment, and start building all my applications on that platform. But today there are hundreds of databases to choose from when you really look, and that's pretty hard to reason about. Customers ask, "How should I think about picking the right tool for the right job?" That's the topic we're going to explore together today. If you leave this session with a better frame for how to think about these different databases, their purpose, and the use cases they support, then we've done our job. I'm going to try to cover the whole family of databases in the small amount of time we have, we're going to slip a demo right into the middle, and at the very end we'll have folks come up front to answer questions.

So let's get into it. If I were sitting down with you and we were talking about databases, this is one of the questions I'd ask you straight away: what is your database strategy? Some customers I talk to have a point of view on what their database strategy is; others are in the middle of working out what their strategy going forward should be; and there's everything in between. But two areas of focus come up in almost every conversation I have. The two fundamental areas that emerge are, first, lift and shift: how do I think about moving from on-premises into the cloud? The second is how to think about new database investments and new apps, because that's a very different thing than it was in the past, in how people built their applications.

Let's take a closer look at lift and shift. When I get into a discussion around lift and shift, typically somebody has a set of applications they've already built, they don't have budget to rewrite those applications, and they're trying to find ways to free up budget. One of the things they start looking at is: "If I can move some of my existing applications into the cloud, can I get some sort of return that frees up budget I can invest in other places, so I can start innovating?" That's basically the discussion in its shortest form. What I often hear from customers in that discussion is this notion of, "I actually really want to move off of these old-guard commercial databases." I typically don't even have to ask why. Just yesterday I was having lunch with somebody I didn't even know; I sat down next to him, asked him what he did for a living, and he started telling me, "We build a financial payroll product for small and medium-sized businesses, and we're trying to move off of this commercial database. I'm trying to free up some dollars so I can take that money and reinvest it into building new parts of my application." That just happened yesterday. What comes next is, "I don't want to be stuck with punitive licensing terms; these licensing terms change on me to change my behavior, and that doesn't work for me."
"I'm really thinking about moving over to open source; that's a thing we want to do." In this case, this particular gentleman told me, "We want to move to Aurora right away." So when you think about that comment, going from commercial, like Oracle or SQL Server, to something open source, the next thing I typically hear is, "We've been experimenting with open source on premises, and it's actually hard to get it to perform the way we need it to." This is basically why we built Amazon Aurora. Aurora gives you the performance and availability of commercial databases with the cost-effectiveness of open source; that's really the simplest way to think about it. With Aurora you'll often see five times the performance of standard MySQL and three times the performance of PostgreSQL, all with commercial-grade security, availability, and reliability, at about a tenth of the cost. This is really why Andy is always up on stage at re:Invent talking about Aurora being one of the fastest-growing services in AWS.

Now, I also talk to a lot of customers whose story basically starts with, "I've got legacy apps; I want to improve their performance and scale, and I want to free up resources to innovate." A lot of folks running commercial databases will first move into RDS, because RDS automates a lot of the time-consuming tasks many of you probably do on premises. I don't know that I need to go through everything that means, but the net is that when you're not provisioning and managing servers and setting up HA, you get that time back to invest in other things. The other thing a lot of customers benefit from in this lift-and-shift category is tooling like DMS, the Database Migration Service. DMS is an excellent tool, and I've seen some big enterprises take full advantage of it. It's not that you just point the tool at a source, point it at a destination, and everything gets sorted out, but these tools are getting better and smarter every day. I often think of these tools as having an army of consultants by your side, with that knowledge delivered in the form of software. So if you're looking to move from SQL Server or Oracle to something like MySQL in AWS, I'd encourage you to check out DMS; it saves folks a lot of time and money.

All right, so let's shift gears and look at how customers think about database investments for new apps. This is very different from how things used to be, and it's really the crux of why we're here. Modern apps create all-new requirements compared with what we might have been used to 10, 15, 20 years ago. For example, if you think about some of the largest cloud applications today, such as ride-hailing, media streaming, social media, and dating, you'll notice some common characteristics: millions of users located all over the world, everybody expecting a near-instant experience, which translates to predictable sub-millisecond performance, and systems that need to scale on the fly. So this whole idea of a one-size-fits-all database doesn't work anymore. Instead, developers are doing what they do best: they take these giant applications, break them into smaller parts, and then pick the right tool for the right job. Why do they do that? Most developers say the same thing to me: "I do not want to trade off functionality, performance, or scale." So they take a big app, break it into smaller parts, and pick the right tool for the right job.
If we went and looked behind the scenes at the architecture of any of these large modern applications, you're not going to see a single platform or one database supporting it. Most of the customers going down this path are using the right tool for the right job: a variety of purpose-built engines. Why? Because they don't want to trade off on functionality, performance, or scale.

Okay, so let's think about common data categories and use cases. This one slide is the one almost everybody I show it to takes a picture of, because it's a different way of thinking about things. I'm personally not a big fan of the "SQL versus NoSQL" or "relational versus non-relational" framing; I don't think it helps the mental model of anything. What I see from customers and developers is that they think about a family of databases, and these databases aren't competing with each other, they're complementing each other. So all we've done here is list out some common categories, along with the purpose of the tool in each category and some common use cases. The use-case list isn't exhaustive; it's just to give you an idea. Instead of looking at a list of two to three hundred databases, I've found that when people turn it on its side and think in these categories, picking the right tool for the right job really starts with the use case and the access pattern, and then you pick the technology. The way it used to be was: pick the technology, then go figure out how to do the use case. That's not the world we live in anymore.

All right, let's take a closer look at three of these, and I'll get a little more detailed as we go: relational, key-value, and graph. The reason I'm picking these three isn't random; I think it really illustrates how databases have gone from platform to more specialized over time. Relational emerged in the 70s, and most of us are quite familiar with it because it's been around for multiple decades. Key-value is an example of a newer thing that starts to emerge in the 2000s, and graph really starts to emerge in the last 12 to 18 months. I don't think it's a coincidence that these more specialized databases are emerging at the same time as these modern cloud apps. I think that's the reality of what these new apps need: purpose-built engines.

So if we look at relational, I'm going to assume most of us are familiar, but just to recap: the purpose of relational and its access pattern is really about modeling data across tables so I don't store the same data over and over. When relational first came along, storage was at a premium. If you kept a hospital's address a thousand times over and had to change the address, that's a thousand updates; whereas if I just put it in a hospital table and give it a key, it's one update, and I'm using less storage. But the reality with these systems, when I talk to developers, usually sounds like this: "I don't know all of the questions that are going to be asked of this data, but I do know that when somebody wants to perform some ad hoc query, the data that comes back must be high-integrity and very consistent, and I need the system to make sure referential integrity is preserved." So in this particular case, I get data accuracy and consistency.

If you look at how you might query that: in this case we're using a very simple schema, where we've modeled patients, doctors, hospitals, visits, and medical treatments, pretty simple, with some keys that connect these things together, so somebody can't just go delete a table; the system won't let that happen. If I asked a question like "which doctors are affiliated with a particular hospital," that's pretty straightforward for a developer: a SELECT ... FROM ... WHERE statement. I get back the result set that meets that condition, and I can trust that the data I'm looking at is consistent. I might ask something a little more involved. Imagine we were an insurance company and wanted to know the number of patient visits each doctor completed last week. The developer implementing the query to answer that question might write a SELECT ... FROM ... WHERE with a GROUP BY; that's the access pattern (see the sketch below), and I get this awesome integrity on the data I'm looking at. Now, as we all know, and I've seen this many times, you can overburden any database; you just ask it to do more than it was supposed to do, and that's where things start to get into trouble. But when you use them for their purpose, they do awesome. This is what relational does well.
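As a rough illustration of those two access patterns, here is a minimal, self-contained sketch using SQLite; the table names and columns are simplified stand-ins for the schema described in the talk, not the actual demo code.

```python
import sqlite3

# In-memory database with a simplified version of the hospital schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE hospitals (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctors (
        id INTEGER PRIMARY KEY,
        name TEXT,
        hospital_id INTEGER REFERENCES hospitals(id)
    );
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        doctor_id INTEGER REFERENCES doctors(id),
        visit_date TEXT
    );
""")
db.execute("INSERT INTO hospitals VALUES (1, 'General Hospital')")
db.executemany("INSERT INTO doctors VALUES (?, ?, ?)",
               [(1, 'Dr. Adams', 1), (2, 'Dr. Baker', 1)])
db.executemany("INSERT INTO visits VALUES (?, ?, ?)",
               [(1, 1, '2018-11-20'), (2, 1, '2018-11-21'), (3, 2, '2018-11-21')])

# Doctors affiliated with a particular hospital: a simple SELECT ... WHERE.
doctors = db.execute("""
    SELECT d.name
    FROM doctors d JOIN hospitals h ON d.hospital_id = h.id
    WHERE h.name = 'General Hospital'
""").fetchall()

# Patient visits each doctor completed last week: SELECT ... GROUP BY.
counts = db.execute("""
    SELECT d.name, COUNT(*) AS visit_count
    FROM visits v JOIN doctors d ON v.doctor_id = d.id
    WHERE v.visit_date >= '2018-11-19'
    GROUP BY d.name
    ORDER BY visit_count DESC
""").fetchall()
print(doctors, counts)
```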
Okay, so now let's look at key-value. Key-value is all about simple key-value pairs, horizontal partitioning, and consistent performance at scale. If you and I were building a video game app and asked ourselves how many users we're going to have, it could be a hundred thousand, it could be a hundred million, but no matter what it is, we need consistent performance at scale. Imagine if we built a video game and the thing stopped scaling: these players today, boom, they're gone; they're one click away from the next game. We can't afford that. That's why a lot of folks think about key-value for these use cases where you need very consistent performance at scale and a very flexible model. If you look in the middle, the way a developer interacts with the system is pretty straightforward, with puts and gets, and on the right is just a very simple gamers table with a primary key and a set of attributes.

Let's look at how you might access that data. In this access pattern we have a gamers table; we tried to keep it really simple just so we get through this quickly. You'll see the primary key is a gamertag; look at hammer57 there. Under type there's rank, status, and weapon, and under status, health and progress. Imagine if we built a video game and part of our application logic needed to quickly understand the current health of a player in real time as the game is being played. That's a really simple get: go give me that data. The query might be as easy as a get from gamers for this key with type "status," and that's what the system is going to pull out, just that. It's a very simple get, and it works extremely fast. Or if it's one of those situations where we need all of the data associated with a particular gamertag, that's what the query would look like, and in that case we pull back all of that data. But the real magic in this is how you can partition this data very easily for this very simple put/get access pattern. Regardless of whether we had ten users, ten million, or a hundred million, the system is going to perform the same. (A sketch of both calls follows.)
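Here is a minimal sketch of those two reads using boto3 against DynamoDB; the table name (gamers) and key names (gamertag, type) follow the example on the slide, and the table itself is assumed to already exist.

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("gamers")

# Single-item get: current health/progress for one player.
# The composite key is the partition key (gamertag) plus the sort key (type).
status = table.get_item(
    Key={"gamertag": "hammer57", "type": "status"}
).get("Item")

# All items for a gamertag: query on the partition key alone.
everything = table.query(
    KeyConditionExpression=Key("gamertag").eq("hammer57")
)["Items"]

print(status, everything)
```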
Now, if we look at graph: graph is really about highly connected data, where relationships are first-class objects; they have attributes, and they can be queried and indexed. In this very simple drawing there are vertices in a graph; others call them nodes, but vertex and node mean the same thing. In this case we have some customers and some categories, like product and sport, and then there are edges. Edges are the connections between these nodes; they can have attributes on them, and this is effectively what you're querying.

What do I mean by this? Let's say we were working on an app and wanted to do something like a product recommendation. A graph's purpose is highly connected data, which is exactly this sort of use case. Say we've got Bill, Amit, and Kevin as customers, and product and sport as categories; those are our vertices, or nodes. Bill has purchased a product, Amit has purchased a product, and Kevin follows a sport; those connections are our edges. Then Sara shows up in the system. Sara follows sports, and Sara goes to make a purchase, and we want to show her that customers who also follow sports purchased these items. That's what the Gremlin query to do that would look like, instead of writing hundreds upon hundreds of lines of code in a database whose purpose isn't traversing these types of relationships. That's what I mean by using something in a way it really wasn't designed for. A lot of folks can try these workarounds and try to figure things out, but remember what I always hear from our customers: "I do not want to trade off functionality, performance, and scale; I do not want to spend all my time on workarounds; I just need the thing to work." In this particular case, it's a very simple query to do a product recommendation. The other common use case here is a friend recommendation. If you look over on the right, Mary shows up as a customer; Amit knows Mary, and Amit knows Kevin, so now the system can make a friend recommendation in this context, and that's what the query looks like for that. It's relatively simple once you get versed in it, and off you go. But the reality is I'm just showing really simple things right now. Imagine a graph with millions of nodes and all the associated edges and attributes; when you write that query, you want the system to run extremely fast. (A hedged sketch of both traversals follows.)
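As a rough sketch, here is what those two traversals might look like with the Gremlin Python client; the endpoint, vertex IDs, and edge labels (follows, purchased, knows) are assumptions made up for illustration, not the talk's actual data.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a Gremlin-compatible graph such as Amazon Neptune
# (the endpoint below is a placeholder).
conn = DriverRemoteConnection("wss://your-neptune-endpoint:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Product recommendation: items purchased by other customers
# who follow the same sport Sara follows.
recommended = (
    g.V("sara")
     .out("follows")      # the sport Sara follows
     .in_("follows")      # other customers following it
     .out("purchased")    # what those customers bought
     .dedup()
     .values("name")
     .toList()
)

# Friend recommendation: people known by the people Amit knows.
friends_of_friends = (
    g.V("amit")
     .out("knows")
     .out("knows")
     .dedup()
     .values("name")
     .toList()
)

print(recommended, friends_of_friends)
conn.close()
```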
Now, if you take one step back and look at a couple of customer examples: most people are familiar with Airbnb, and Airbnb has an awesome engineering team. To us as users, Airbnb is one experience, but they break it down into all these smaller parts, and they absolutely pick the right tool for the right job. They'll use DynamoDB, key-value, for search history, because they need super-fast lookups; they'll use ElastiCache for session state, which gives them sub-millisecond page rendering; and they'll use RDS for part of their transactional data. When you hear that, you can think of the moment you're ready to hand over a credit card, which could be modeled in a certain way. That's Airbnb. Another really fun one is Duolingo. I just met a reporter yesterday; I was doing an interview with her, she was from Japan, and she uses Duolingo for language learning, and we got onto this topic. Duolingo is a language-learning platform; I think they offer around 80 different languages across 300 million total users doing seven billion exercises per month. They break that app into smaller parts: they're using DynamoDB for item tracking, to see which language exercises were completed; they're using Aurora as their primary transactional database for user data; and they're using ElastiCache as a caching layer to speed up lookups around common words such as "the," "and," "in," and "it." So it's one thing to talk about how developers take big things, break them into smaller parts, and pick the right tool for the right job, but there's nothing better than seeing an actual demo. Joe?

Thank you very much, Shawn. What we're going to do is take what Shawn has been talking about and put it into practice with some live running code. What we built for you today is a web application, an e-commerce site that sells books, perhaps. You've used one of these before or are familiar with this type of scenario. We're going to put ourselves in the shoes of the developers building this site, and we're going to look at four different use cases. For each one we'll rationalize what the use case is, what the data model is, and what the right tool for that particular job is, and then we'll summarize. The four use cases we're going to look at today are a products table, along with a shopping cart and an orders table, which is the first experience; the second is product search; the third is a leaderboard; and the fourth is a recommendation engine.

So let's get into it. The first use case we're going to look at is our products table, and that's really the metadata that describes the books you see on the screen right now. Let's go look at how this data is actually modeled; I'm going to pick on the book "Carbs" today, and there it is. If we look at this data model, the book has a unique identifier, a GUID, which I think is a pretty standard practice, and it has a number of attributes: the author, the category, the name, the price, the rating, the S3 bucket where the image resides. It's a self-contained document. This particular data model lends itself really well to a key-value store, and the reason I chose DynamoDB for this use case is that I only have 62 books on my website right now, but if I have ten thousand, a hundred thousand, ten million products and customers, I want the access pattern to this particular document to be consistent and have the same performance whether I have thousands or millions. That's what key-value is really good for. So let's modify this: we'll make "Carbs, Vegas-Style" and add a new book to our products table. Of course, anything in Vegas is expensive but delicious, so we'll give it a five rating, give it a unique identifier, and save that to our table. Then we'll go back to our demo app, look at cookbooks, and go find "Carbs, Vegas-Style." There it is, right there. (A sketch of that write is below.)
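A minimal sketch of that write, assuming a DynamoDB table named products keyed on id; the attribute names mirror the document shown in the demo, but the values are made up.

```python
import uuid
from decimal import Decimal

import boto3

products = boto3.resource("dynamodb").Table("products")

# Each book is a self-contained document with a generated unique id.
products.put_item(Item={
    "id": str(uuid.uuid4()),
    "name": "Carbs, Vegas-Style",
    "author": "A. Chef",        # placeholder value
    "category": "Cookbooks",
    "price": Decimal("29.98"),  # DynamoDB numbers go in as Decimal
    "rating": 5,
})
```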
But what I just did is not how I expect my customers to search for books, right? I think we've all become accustomed to a really great product search experience. Instead, I expect my customers to go up here to search, and there's the book we just added, Vegas Carbs, at $29.98. Now, when I thought about choosing the data store to power the search experience, this is the one area where I didn't want to compromise on functionality. As a developer, I'll tell you, the last thing I want to do is build full-text search, faceting, ranking, and autocomplete into a database that doesn't have it. That's just a bunch of reinventing the wheel and not a good use of my time; I'd rather be building other experiences for my customers. So I chose Amazon Elasticsearch Service, because that's its purpose: it does full-text search really well.

Now you might be asking yourself: "But Joe, you just wrote to a table in DynamoDB, and you just told me you searched for this new book in your Elasticsearch index. How did you keep those two databases in sync? What did you do behind the scenes to make that happen?" Let me show you. DynamoDB has a really great feature called streams, and it's available on all tables. I'll create a new products table, and I'll enable the stream for this particular table. What that allows me to do is, every time I insert, modify, or delete an item in my DynamoDB table, it writes that change to the stream; you can think of it as a queue. With that queue I can then associate a trigger, which is a Lambda function, with a batch size of one, so that every time I write to the table, it triggers that Lambda function, which writes the change into my Elasticsearch index for me. That's me pushing this functionality down into the native capabilities of the service so that I don't have to do it in my application tier. (A sketch of such a trigger follows.)
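Here is a hedged sketch of what a stream-triggered Lambda like that could look like; the index name, endpoint, and use of the requests library (with request signing omitted) are assumptions for illustration, not the demo's actual code.

```python
import json

import requests  # assumes this library is bundled with the Lambda
from boto3.dynamodb.types import TypeDeserializer

ES_ENDPOINT = "https://your-es-domain.us-east-1.es.amazonaws.com"  # placeholder
deserializer = TypeDeserializer()

def handler(event, context):
    # Each record describes one INSERT / MODIFY / REMOVE on the table.
    for record in event["Records"]:
        key = record["dynamodb"]["Keys"]["id"]["S"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # Convert the DynamoDB-typed image into a plain dict.
            image = record["dynamodb"]["NewImage"]
            doc = {k: deserializer.deserialize(v) for k, v in image.items()}
            requests.put(
                f"{ES_ENDPOINT}/products/_doc/{key}",
                data=json.dumps(doc, default=str),  # Decimals become strings
                headers={"Content-Type": "application/json"},
            )
        else:  # REMOVE: drop the document from the index
            requests.delete(f"{ES_ENDPOINT}/products/_doc/{key}")
```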
All right, so we have the basics: a products table and a search experience. The third use case I want to look at is a leaderboard. Why a leaderboard? Because I want my customers to be able to access the most relevant content on my site, and relevancy is sometimes measured by the items that have been purchased the most, similar to the Billboard Hot 100 or the New York Times bestseller list. So here's my bestseller list; I have three items in it. When I think about picking the database for this particular use case, I'll tell you what I don't want to do: I don't want to write a query that does a full table scan of my orders table, with a GROUP BY, a summation, and an ORDER BY, every time a customer comes to this website. Why? Because I expect to get a lot of orders on my website, and that query's performance is going to get slower as the number of orders in the table increases. Basically, the more successful the site becomes, the slower this page gets; that's not a good scenario. So when we thought about picking the tool for this job, we used Amazon ElastiCache for Redis. Why Redis? Redis has an extremely useful in-memory data structure called a sorted set that makes it really easy to build use cases and scenarios like this.

Let me show you what that looks like. I have a terminal right now that's connected to the Redis cluster powering this demo application, and a sorted set is exactly what it sounds like. This is the query for it: against books-all-time, the sorted set, I'm going to query from 0 to 10 (I only have three items in there) and show you the scores; that's the data structure I'm pulling back. Now, when I add an item to this in-memory data structure, it just updates: the GUID is the book, and it updates the quantity. So let's go ahead and do that. Let's see where we're at: if we want to pop the last book up to the very top, we have 34 for "Scream Ice Cream," so if we add 15 books, or 12, we'll be good. So let's buy that and pop it up to the top. What we should expect to happen is similar to what I did with Elasticsearch: every time I write to my orders table, I have a similar Lambda function that writes to my sorted set. I still have all my orders in my orders table, but I have this really simplified data structure that keeps track of the ranking for me. So we go back to our bestsellers, ice cream is on top, and we can query the data structure again: we went from last to first, and we have 46 items. Every time a customer comes to my website, it's that simple a query, and it's incredibly useful. (A sketch with the Redis client is below.)
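A minimal sketch of that sorted-set pattern with the redis-py client; the key name books-all-time and the member ID are stand-ins for the demo's GUIDs.

```python
import redis

r = redis.Redis(host="your-elasticache-endpoint", port=6379)  # placeholder host

# Record purchases: ZINCRBY bumps a book's score by the quantity ordered.
r.zincrby("books-all-time", 12, "scream-ice-cream")

# Leaderboard read: top 10 books by score, highest first, with scores.
top_books = r.zrevrange("books-all-time", 0, 9, withscores=True)
print(top_books)
```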
All right, so the fourth experience is a recommendation engine, the one we're showing right here. Why do we want a recommendation engine? We know that if a colleague recommends a book to us, or we see it sitting on a friend's table, that has meaning; it increases our likelihood of buying that book, because there's some social validation there. What this recommendation engine is showing is: these are the other friends who have bought this book, and that's a really great tool for our website. For those of us who are very visual learners, let me show you what this graph looks like. This is the social graph powering this demo application. As Shawn talked about, these circles are the vertices: the dark blue ones are people, the light blue ones are books, and the orange ones are categories. We can see here that this particular person purchased this book, which was also purchased by this other person, and they know other people; this is what a social graph looks like. The reason I like to visualize it is that, as a developer, it actually helps me write queries a lot more efficiently, because I can match the query with what the actual structure looks like.

So let's do that. The next experience I want to add (I haven't built this yet, but this is the next one I'll do when I get time) is: when I click on this book, I also want to present the other books purchased by people who also bought this book. I think we're all familiar with this experience: if you bought this one, you might like these five too. Let me show you how I write that query in Gremlin. This is our graph, and we're going to start with vertex 34, which is just a simplification of the book we clicked on. From there, the first line of code is basically saying: given this vertex, go out and find all the other people that have purchased it; that's the first traversal. The second line is saying: okay, now that we're at those people vertices, what other items did they purchase? Then we remove the item we're referencing, and we order by the ratings of those books, descending, so we get that top list for a social recommendation engine. I already have a console set up with Neptune; I'll run this query, and I get that performant result set back with just a little bit of code. This is a great example, again, of just using the native functionality of the database; it's not that complicated to write these queries. Again, I don't want to try to write this query in SQL; it's a disaster. (A hedged sketch is below.)
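Here is a rough Python rendering of the traversal Joe describes; vertex ID 34 comes from the demo, while the edge label purchased, the rating property, and the limit of five are assumptions for illustration.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import P, Order
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection(
    "wss://your-neptune-endpoint:8182/gremlin", "g")  # placeholder endpoint
g = traversal().withRemote(conn)

# "People who bought this book also bought": start at the clicked book,
# hop to its purchasers, then to everything else they purchased.
also_bought = (
    g.V("34").as_("book")
     .in_("purchased")          # people who purchased this book
     .out("purchased")          # other items those people purchased
     .where(P.neq("book"))      # drop the book we started from
     .dedup()
     .order().by("rating", Order.desc)
     .limit(5)
     .values("name")
     .toList()
)
print(also_bought)
conn.close()
```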
So with that, let me switch back over and summarize really quickly. What we did is decompose an application and pick the right tool for each job: we chose a key-value store, DynamoDB, for our products table; a graph database for our product recommendation engine; an in-memory store, ElastiCache for Redis, for our leaderboard; and Amazon Elasticsearch Service for our product search. But wait, there's more: that demo application you saw me run today is available now. It's up on GitHub, and we created a one-click CloudFormation template so you can get it up and running in your own account with a single click, then explore, have fun, extend it, and look at these different databases. With that, I really appreciate your time. Thank you.

All right, pretty fun stuff. A lot of people worked on that demo. In fact, when we did this talk last year, at the very end we got a bunch of feedback: "Hey, that demo, can you build it so we can download it, play around with it, and so on?" And the team pulled it all together, so thank you for that.

Okay, so let's take a look at the ledger database, a whole new category; then we'll look at time series, then we'll summarize and be done. All right, just by a raise of hands, how many people are familiar with the ledger database? Not a lot of us; some of us. Okay, I'm going to try to cover some key concepts here, show you how it works, talk about use cases, and let's see where we get. As it relates to use cases, what I can tell you is this: I did not have a customer come up to me when we started this project and say, "Hey, I need a ledger database." Nobody said that. Instead, customers were saying things like: "Boy, it would be great if the data was immutable, if it just couldn't be changed. It would be great if that data was immutable and could be cryptographically verified. I have supply chain scenarios where I need to be able to trace the source of something, like a recalled product, for example; I need to be able to trust the lineage of that data, that it hasn't been changed." That's what the conversation would sound like. Or in healthcare: oftentimes when you sell a medical device, you have to keep a record of who you sold it to, when it was serviced, and whether you sold it to somebody else. In that context you'd hear, "Gosh, it would be great if the data was immutable and cryptographically verifiable, so that if we needed to look at the lineage of that data, we could follow it and know it hasn't been changed." Another example would be a DMV scenario: car registrations, tracking titles and registered owners. If you've ever looked at a Carfax and it said five owners, have you ever wondered to yourself, "I wonder if that car really has had five owners; who validated that?" That's what we heard. And it turns out that within Amazon, several years ago, we used to say this to ourselves: think of the control planes that sit behind EC2 and S3, and just think of how much event activity is happening there. "Gosh, wouldn't it be great if we had the data for all the control-plane events and knew that it hasn't been changed? That would really help us in a variety of scenarios." So we actually started building ledger technology several years ago, but it wasn't until the last year and a half that customers started talking to us just like I shared with you, and the union of those two things led to: gosh, I think people are really asking for what we call today a ledger database.

The challenges we heard from customers center on a couple of dimensions. First, if somebody had made a platform choice and was trying to audit changes using a relational database, we would hear: "It's kind of hard to build an audit table, for a variety of reasons. It's not that creating the table is hard; it's that we have to write stored procedures, we might have to write triggers, and what happens if something changes in the audit table itself? How do we keep track of that? And if we're auditing too much and the application slows down, maybe we should audit less." The second aspect is this notion of, "Even if I am auditing, it's very hard to verify. I don't have a clear way to prove that a superuser didn't log in and just change the data; that's really hard for me to do." The other thing we heard from customers is around blockchain. Some customers need distributed consensus. Imagine all of us in a room observing things and recording what we saw happen; that's a very simple way of articulating distributed consensus among people who don't know each other, and you can imagine the algorithm you could write to prove that a transaction did or didn't happen. But a lot of customers say, "You know what, I don't need 500 people I don't know in a room observing what I'm doing; distributed consensus is not my use case. But I do need that complete, cryptographically verifiable way to watch and track data changes. I don't need to set up an entire blockchain environment just for that, but I do need this ledger thing."

Now let's look at some fundamental key concepts of a ledger database. I know this is a new category, so I'm just going to use drawings to try to articulate this. As a developer, these are the key concepts to think about. One: you create a ledger. When you create a ledger, in the context of QLDB it's serverless, so there are no servers to manage. A ledger has a key component called the journal. When you record a transaction, say, registering a car to a particular owner, you write it to the journal, and the transaction is stored in a block on that journal. Once the transaction is written to the journal, the data cannot be changed; that's what we mean by immutable. You can't go back to a block and change or update its data. If you execute a transaction and accidentally did something wrong, like registering a car to the wrong owner, the only way to correct it is with a new transaction that updates the owner. So once written to the journal, the data is immutable; it can't be changed. For each of those little blocks, think of the transaction as the input, and the output is a little hash that goes along with it; I'm deliberately oversimplifying.

Okay, so the journal determines what we call current state and history. What do I mean by that? Think of a bank scenario: debit, credit, debit, credit on the journal, and then as a developer you want to query your current account balance; that's what we mean by current state. H, for history, is this really cool concept: I can wrap my head around the debit/credit activity on the journal and the current state of my account balance, but what if I want to see the past 30 days of account activity? In our system there's a table created by default that lets you quickly query that account history; just think of it like that. So the ledger comprises C, H, and J, and J determines C and H: you could blow away C and H and get them back from J. I hope that makes sense; I'm trying to really simplify this. (A toy sketch of the hash-chain idea appears below.)
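To make the "transaction in, hash out" idea concrete, here is a toy sketch of an append-only, hash-chained journal in Python. This is a conceptual illustration of how any hash chain works, not QLDB's actual implementation.

```python
import hashlib
import json

journal = []  # append-only list of blocks

def append_transaction(txn: dict) -> None:
    """Write a transaction into a new block, chained to the previous hash."""
    prev_hash = journal[-1]["hash"] if journal else ""
    payload = prev_hash + json.dumps(txn, sort_keys=True)
    block_hash = hashlib.sha256(payload.encode()).hexdigest()
    journal.append({"txn": txn, "hash": block_hash})

def verify() -> bool:
    """Recompute every hash; any edit to any block breaks the chain."""
    prev_hash = ""
    for block in journal:
        payload = prev_hash + json.dumps(block["txn"], sort_keys=True)
        if hashlib.sha256(payload.encode()).hexdigest() != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True

append_transaction({"vin": "123456789", "owner": "Tracy Russell"})
append_transaction({"vin": "123456789", "owner": "Ronnie Nash"})
print(verify())                                # True: data unchanged
journal[0]["txn"]["owner"] = "tracy Russell"   # capital T to lowercase t
print(verify())                                # False: completely different hash
```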
Now let me show you an example of how a ledger database works. The scenario is this: you and I are working together at the DMV, and our assignment is to build an application that records registrations, which we've all been doing already, but what's different is we're going to record the transactions in a ledger database, specifically on that journal. Why? Because we want to make sure we have a complete, cryptographically verifiable way to follow the data's lineage. Okay, so here we go. We create a ledger, and when we create that ledger, basically think of the empty journal you see on the bottom. We create a current.cars table, and there's an associated history.cars table to go along with it; just think about it like that. Now we want to register our first car. If you look in the upper right, that's our super-complicated insert script: we're inserting into cars a manufacturer of Tesla, model Model S, year 2012, a VIN, and the owner, Tracy Russell. As a developer, that's what I write. When I execute it, the transaction is written into a block on the journal, and if you see the data inside that block, think of that as the input: we run it through a hashing algorithm, and a hash is associated with that data. From a developer's point of view, if I query current.cars, I see Tracy Russell as the current owner, and in the associated history table there's just one version of the document.

Now let's show a sale of the car. In this case we're going to update cars where the VIN equals the number you see there, setting the owner to Ronnie Nash. When I execute this, what's going to happen? We write that transaction into a block on the journal. Remember, the journal is append-only; we're not going back and updating the first block. It's a new transaction; that's the input, it gets a hash, and there's a pointer that connects the two. Now when I query current.cars, I see Ronnie Nash as the owner, and I have two versions of the document in the system. So if I need to query the history, "Hey, show me the previous owners of this vehicle," that's how you do it. Just to complete this, let's say we sell the car one last time, and again it's just an update to the owner. When I execute that update, I write a transaction into a block, it gets a hash, and there's a pointer that connects. Now when a developer queries current.cars, Elmer Hubbard is the current registered owner, and if I go look at the history of registered owners for that vehicle, I see three versions of the document. That, in its essence, is how a ledger database works. (A hedged sketch of those statements follows.)
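Here is roughly what those statements could look like using PartiQL through the QLDB Python driver; the ledger and table names follow the example, but treat the driver calls as a sketch rather than the talk's exact code (the driver itself postdates this talk).

```python
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="vehicle-registration")  # assumed ledger name

def register_car(executor):
    # First registration: insert a new document into the cars table.
    executor.execute_statement(
        "INSERT INTO cars ?",
        {"manufacturer": "Tesla", "model": "Model S",
         "year": 2012, "vin": "123456789", "owner": "Tracy Russell"})

def sell_car(executor):
    # A sale is a new transaction that updates the owner; the journal
    # keeps the prior version as history rather than overwriting it.
    executor.execute_statement(
        "UPDATE cars SET owner = ? WHERE vin = ?",
        "Ronnie Nash", "123456789")

driver.execute_lambda(register_car)
driver.execute_lambda(sell_car)

# Every prior revision remains queryable through the history function.
owners = driver.execute_lambda(lambda ex: list(ex.execute_statement(
    "SELECT h.data.owner FROM history(cars) AS h")))
print(owners)
```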
Now, one question I get from a lot of customers hearing about this cryptographic verification and immutability for the first time sounds like this: "Could you tell me one more time how that data is immutable?" What I point them to is that first block on the journal, and I say: look at the owner, Tracy Russell, the VIN, and so on. Once you write that to the block, you don't go back and change the data there in this database; you just write a new transaction, which is recorded as a new block. This is when people go, "Okay, I think I now understand what you mean by immutable. Then could you tell me one more time what you mean by cryptographic verification? What does that mean?" It's not that the data on the journal is all encrypted; it's that the transaction goes through a secure hashing algorithm and gets hashed, and now you have a digest. You could publish that digest, and if somebody ever came up to you and said, "Hmm, I think you might have changed that data; I don't trust you," then, with a published digest, you could say, "Okay, let's look at the transaction, let's look at the published digest, and you can run the hash against the same data I'm showing you." If the hashes match, you know that data has not been changed. That's the real power of a ledger database. The last thing I'll say, if somebody's still giving me that look: see where it says owner Tracy Russell in the first block? If you just changed that capital T to a lowercase t and resubmitted it, you'd get a completely different hash. This is when people go, "Okay, now I get it." All right, so we're excited about introducing QLDB; we've had a lot of fun working on this project. The things to remember here: the data you store is immutable and cryptographically verifiable, the system is very easy to use, and we're excited about what we can get done here.

All right, now let's look at time series. One of the things we get asked on time series is: "What is time series data? Could you remind me what you mean by that?" Time series data is basically a sequence of data points recorded over an interval of time, such as: what's the temperature over time, or what's a stock price over time? Some people call that regular time series data. Time series data is also, if you think of an Amazon fulfillment center: a machine turned on or off, and I care about how that changes over time; a door opened or closed; or, and I care about this a lot, item picked, item packaged, item shipped. I definitely care about how that changes over time, because if I have billions of events like that, I need to tune this environment in near real time, so I want to capture that time series data. The next thing a lot of people ask is: what's so special about a time series database? In other words, what they really mean is: if I can record a timestamp, isn't that just time series data? No. What makes a time series database special is that time is the single primary axis of the data model; x can be one thing and one thing only, time. When you have that assumption in the system, it basically allows you to specialize and optimize the whole stack, from ingest to storage to query. In query, for example, you're always querying over time, so there's a lot you can do to optimize the system.
From a use-case perspective, there is a wide range of time series data use cases. Almost all the customers I'm talking to, in one way, shape, or form, are asking how to analyze data as it changes over time: "I have aspects of this all over my business." And it's not just IoT sensors; it's also application events, heavily instrumented applications, DevOps data, and so on and so forth. The challenges we heard from customers trying to build time series workloads really come down to this: "I tried to do this in a relational database, but it's kind of unnatural. I don't need a rigid schema; in fact, I might have a set of sensors on a robot and want to collect a certain set of attributes, but I might want to change that on the fly. I'm not trying to model my entire IoT environment out of the gate; I definitely need that kind of flexibility, and if I need to start collecting new attributes on a sensor, I need that to happen now." Another common thing I hear when meeting with time series customers: "The one thing I have a problem with is interpolating missing data points. For whatever reason, a connection might fail, what have you, and I'm missing data points I need to interpolate." If you go to Stack Overflow, when you get a chance, and search "how do I interpolate a missing data point using SQL," you'll find folks sharing sometimes eight or nine hundred lines of code as an example. But in a time series database, that's exposed as a function; think of it like a function key on a calculator. You should just be able to say, "Using this series, interpolate the missing data points." It should be a single line of code, and off you go. Those are the kinds of things you should be able to do in a time series database. (A toy illustration follows below.)

The other thing we hear about existing fully managed time series solutions out there is consistent scale constraints. For example, a customer told me that one of these fully managed time series solutions, when it fills up with data, will literally start purging data by default; the other option is to just turn off ingest. The point I want to make is that the volume of data with these types of workloads is off the charts, and when you think about a huge volume of data, you don't want to keep all that high-resolution data in memory all the time. You might for thirty minutes while you're diagnosing something, but with a policy you want to be able to very simply move that data from memory to a warm tier, maybe downsample it, and let it gracefully end up in cold storage that's really cheap. Why? Because if I have a dashboard and I'm troubleshooting something on the spot, I see vibrations on a particular machine, I might also want to ask, "What was the average vibration on this machine for the past twelve months?" That should just be a simple query for a developer, and managing that whole lifecycle of data should not be a full-time job; it should just be a policy. This is why we built Timestream, and we're really excited about it. It's designed architecturally to really have no scale limits, and we're excited about the performance: we believe you're going to be able to collect data at the rate of millions of inserts per second and process that data very quickly. It will have built-in functions to help you with interpolation, extrapolation, smoothing, and approximation, and of course it's serverless. That's the one big thing we heard from a lot of customers: when we first started on this journey, they would show us pictures of all these pieces strung together just trying to create a time series database. Here, you just create an endpoint and start writing.
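To ground that "single function call" point from above: in a time series engine, interpolation is a built-in, much like the one-liner below (shown here with NumPy purely as a stand-in, since Timestream's query syntax wasn't public at the time of this talk).

```python
import numpy as np

# Sensor readings with a gap: timestamps in seconds and observed values.
t_known = np.array([0, 10, 20, 40, 50])      # the reading at t=30 is missing
v_known = np.array([1.0, 1.2, 1.1, 1.4, 1.5])

# One call linearly interpolates the missing point from its neighbors,
# the kind of operation a time series database exposes as a function.
v_30 = np.interp(30, t_known, v_known)
print(v_30)  # 1.25
```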
All right, so let's summarize. When you think about the choices, our family of databases is represented across the page. I think some would argue we have a really good relationship with our customers and a good understanding of how to scale these systems; our roadmap is 90 percent driven by our customers, and what you see on this page is a reflection of that relationship. A common question I get from customers is, "What's coming next? When we show up to re:Invent next year, what's the new category?" A question I almost never get is, "What's not going to change on this picture?" And we want to invest in those things too. For example, I don't think a customer is ever going to come up to us and say, "I really like those databases, but I wish they were a little less reliable; I wish they scaled less." So those are the types of investments we make every year; we really want that operational aspect to be indistinguishable from perfect. All right, as far as other breakouts, there are sessions where you can do a deep dive on a number of the systems you saw, be it Neptune, the ledger database, ElastiCache, and so on; I'm just listing some of them here. Know that purpose-built is really about taking a big app, breaking it into smaller parts, and picking the right tool for the right job. We appreciate your time and energy; we know you have a lot of choices of sessions to go to, and please help us improve: if you can take the time to fill out the survey, we would appreciate it. We'll meet folks up front to take questions. Thank you. [Applause]
Info
Channel: Amazon Web Services
Views: 17,179
Rating: 4.9662447 out of 5
Keywords: re:Invent 2018, Amazon, AWS re:Invent, Databases, DAT205-R1
Id: -pb-DkD6cWg
Length: 55min 39sec (3339 seconds)
Published: Fri Nov 30 2018