AWS re:Invent 2019: Data modeling with Amazon DynamoDB (CMY304)

Captions

All right, all right, here we go. Hey, welcome to re:Invent. This talk is about DynamoDB, so I hope that's why you're here. Quick show of hands: who's using DynamoDB? Who has used it? Awesome, great to see.

So first off, just a little bit of my backstory with DynamoDB. I've been using Dynamo for probably about four-ish years now. Three years ago, if you'd asked me, I probably would've said, yeah, I'm pretty good at DynamoDB. About two years ago this time is when I found out that I wasn't very good at all with DynamoDB. I watched Rick Houlihan's talk on YouTube from re:Invent. I don't know if a lot of people have seen Rick's talk before; it's super useful, really great for learning how to use DynamoDB. So this is the talk that I wish I'd had four years ago as I was getting started with Dynamo.

So yeah, this talk is data modeling with DynamoDB. Three big things we're going to cover today. First off, what is DynamoDB, when would you want to use DynamoDB, things like that. Second, we'll look at some key concepts, some terminology on DynamoDB, just to get us on the same page, especially for folks that are coming from a relational background. And then once we have that background, we'll move into really the meat of the talk, which is some data modeling examples and some strategies for how you model your data in DynamoDB.

So who am I? I'm Alex DeBrie. I'm an engineering manager at Serverless, Inc. I'm also an AWS Data Hero focused on DynamoDB. Two years ago, when I found Rick's talk and watched it, you know, 15 times over Christmas break, I made DynamoDBGuide.com, which is a free resource for learning about DynamoDB in a user-friendly way. I'm also working on a book, DynamoDBBook.com, so check that out. If you like this talk, I'd say you'd like that; it's got some more data modeling examples there. I tweet at @alexbdebrie if you're a Twitter fan. I also blog at alexdebrie.com on a lot of different AWS stuff. Before we start,
just a few related breakouts I want to mention. First of all, DAT301: these are builder sessions. I'm doing a few of these, and Matthew Bonig's doing a few of these as well, and you can just get together with us and talk about how to model DynamoDB in kind of a personal way, so that's great. DAT325: this is an awesome one. I love all these under-the-hood talks, which show you how AWS runs their services, and if you just think about the scale of DynamoDB and the pieces involved, it's pretty cool to see what they're doing there, so go check that one out. And then DAT403: that's Rick Houlihan's talk, advanced design patterns with DynamoDB. I went to that this morning and it was awesome. There's a repeat session on Wednesday at 7:00, so definitely check that out, or on YouTube.

If you want to compare and contrast Rick's talk and this talk, I would say Rick does a great job of really going wide on a bunch of use cases and showing you all the different things you can do with DynamoDB. He packs a ton into like 50 minutes, but your head's gonna be swimming and you'll probably have to watch it a couple of times. In this talk I'm gonna try and get some depth, so really just go deep on one use case. If you're not really understanding some of Rick's things, come here, learn some of those basic principles, see what that's like, and then go back and watch Rick's talk, and hopefully you'll get more out of it.

All right, so let's hop in. What is DynamoDB? DynamoDB is a NoSQL database. We saw a lot of NoSQL databases pop up in the 2000s, when all these internet-scale applications came online, and with the access patterns they had and the amount of traffic, relational databases couldn't handle it. So you saw things like MongoDB, Cassandra, and Dynamo popping up in this time. The interesting thing about DynamoDB, that second point, is that it's fully managed by AWS, so you don't have to worry about adding servers,
patching servers, upgrading servers, failovers, any of that stuff. I think you're seeing a really big movement to fully managed databases like Amazon RDS, things like that. I think that's even more critical when you're talking about NoSQL, because what you have are clusters of machines with your data split across them, and the more machines you have, the more you have to take care of. You have to worry about cluster membership and failovers and increasing cluster size. So I think in NoSQL you really want to have a fully managed database.

The third thing that's interesting about Dynamo is that most databases use a persistent TCP connection and reuse that connection for a while. That's not the case with Dynamo. It uses an HTTP connection, so a more stateless connection model. It uses AWS IAM for authentication, so if you're using compute on AWS like EC2 instances, Lambda functions, stuff like that, you've probably got a role associated with that compute, and it works really well with that. You don't really have to worry about auth as much, which is great, or at least about rotating auth tokens and things like that.

And then finally, the fourth thing there, and the reason a lot of people use DynamoDB, is you get fast, consistent performance as it scales. You're gonna get single-digit millisecond latency, and that's gonna be true at one gigabyte when you're starting your application, and it's gonna be true at 100 gigabytes or a terabyte or ten terabytes. You're still gonna get that consistent performance, which is pretty great.

So when would you use DynamoDB? There are two big areas that I like to talk about. The first one, and really what DynamoDB was made for, is these hyper-scale use cases, where you just have so much data you're throwing at it and accessing it so quickly that maybe a relational database can't keep up. This is things like the amazon.com shopping cart,
right? Really the foundations for Dynamo were developed at amazon.com as they were going through Black Friday and Cyber Monday and Prime Day and all that stuff and realizing that their Oracle databases weren't gonna be able to keep up very much longer. So you see that now amazon.com is pretty much fully on DynamoDB, as well as, you know, tier-one services in AWS using DynamoDB. Another example there is Lyft. Lyft keeps all the geolocations for their rides in DynamoDB. So if you think about all the rides that are happening, not just here in Vegas taking people to and from the airport, but across the United States, across the world, and all those locations are updating every couple of seconds, it's a pretty high-volume service, and Dynamo's able to handle that.

In addition to this hyper-scale use case, there's what I call the hyper-ephemeral compute use case. This is how I got into DynamoDB, the more serverless stuff. If you're using compute like AWS Lambda or AWS AppSync, anything like that, you have this really ephemeral compute, and you might have huge numbers of it scaling up really quickly and then scaling back down. That doesn't really work as well with relational databases; they weren't built for hyper-ephemeral compute like that, coming from all over. DynamoDB, because of its HTTP connection model and because of this global request router layer they have, is able to handle this scale pretty easily in a way that a relational database can't. So I think that's another place you're seeing it be really popular.

All right, cool, so let's dive into some key concepts with Dynamo. The first four key concepts you need to think about are table, item, primary key, and attributes. We're gonna walk through this with an example. Imagine you have an application that needs some authorization or authentication, so you have a session store, right? As users log in, you authenticate them and create a
session, store that, and give that session back to them. On subsequent requests they'll include that session in a cookie or a header or something like that. So you might have some data like this. Here I've got a couple of records with some Marvel characters in them. These four different characters are in my session store, and all the data together is called the table. It's similar to a table in a relational database, or maybe a collection in MongoDB. If you look at a single record, that's called an item. That's similar to a row in a relational database or a document in MongoDB.

Now when you create a table, you have to specify a primary key, and that primary key needs to be included on every item that goes into your table. On this one, the session ID is our primary key. That's how we're gonna access our data. It's that unique UUID that we'll generate, store in the table, and give back to the user for them to return to us. That primary key needs to uniquely identify every item in your table, so you're not going to have two items that have the same primary key; they would overwrite each other if that happened.

And then finally, in addition to the primary key, you can have other attributes. This is stuff like Username, CreatedAt, ExpiresAt. This is freeform and flexible. They're somewhat similar to columns in a relational database, except that you have that flexibility: you don't need to define them upfront, and they can differ across items in your table.

So let's dive a little deeper into primary keys, as they're gonna be pretty critical. All your access patterns are gonna be driven off your primary keys, so you really need to do a lot of thinking about how you're gonna access your data, how you model that data, and then how you model your primary keys. There are two kinds of primary keys.
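To make the session store concrete, here is a minimal sketch in Python of the table definition and one item. It only builds the parameter dictionaries in the shape that the AWS SDK for Python (boto3) accepts for `create_table` and `put_item`, so it runs without AWS credentials. The table name `SessionStore`, the exact attribute casing (`SessionId`, `Username`, `CreatedAt`, `ExpiresAt`), the on-demand billing mode, the seven-day lifetime, and the username `tonystark` are assumptions for illustration, not details from the talk.

```python
import uuid
from datetime import datetime, timedelta, timezone

TABLE_NAME = "SessionStore"  # hypothetical table name


def session_table_definition() -> dict:
    """Parameters in the shape boto3's client.create_table(**params) accepts."""
    return {
        "TableName": TABLE_NAME,
        # Only key attributes are declared; all other attributes stay schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "SessionId", "AttributeType": "S"},
        ],
        # A simple primary key: one partition (HASH) key and no sort key.
        "KeySchema": [
            {"AttributeName": "SessionId", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def new_session_item(username: str, ttl_days: int = 7) -> dict:
    """Build one item in DynamoDB's typed JSON, e.g. for client.put_item(Item=...)."""
    now = datetime.now(timezone.utc)
    return {
        "SessionId": {"S": str(uuid.uuid4())},  # unique per session
        "Username": {"S": username},
        "CreatedAt": {"S": now.isoformat()},
        "ExpiresAt": {"S": (now + timedelta(days=ttl_days)).isoformat()},
    }


if __name__ == "__main__":
    print(session_table_definition())
    print(new_session_item("tonystark"))
```

Only `SessionId` appears in the table definition; the remaining attributes exist only on the item, which is the "you don't need to define them upfront" point in practice.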
The first one is a simple primary key, which has just a partition key. This is what we saw with that session store a second ago. You can also have a composite primary key, which has a partition key and a sort key. So again, look at that session store: it's a simple primary key, and it has just that session ID, which is the partition key. For another example, say we're storing actors and actresses and the movies that they've played in. Maybe we use a composite primary key there, with the actor or actress name as the partition key and the movie name as the sort key. As you can see there, I've got two records with Tom Hanks, but each is still considered unique, because it's the combination of that partition key and the sort key that makes it unique. So even though I have two records with Tom Hanks, they're different movies, so they can both be in that table.

All right, let's get into some API actions. The way you interact with DynamoDB is usually with an AWS SDK, so it's very API driven rather than query driven like it would be in a relational database, you know, SELECT * FROM whatever. Instead of that, you're gonna have more API calls that you're writing in your programming language. I split the API up into three main buckets. The first one is item-based actions. This is any time you're writing, updating, or deleting an item. Any time you're acting on a single item, you're going to be using these item-based actions. The important part about an item-based action is you must provide the entire primary key when you're doing that. So if you wanted to delete Tom Hanks and Cast Away from your database, you need to say, hey, delete where partition key is Tom Hanks and sort key is Cast Away, to delete that. And note that you can only do this one item at a time, or you can do it in some batch requests, but you can't say, you know, delete all
items where actor equals Tom Hanks. You need to specify that whole primary key when you're acting on these items.

The second bucket of API actions is the query action. That's a read-only action, but it allows you to fetch multiple items in a single request, which is really powerful. Going back to that movie example, if your application was showing the movies for these different actors and someone clicks on Tom Hanks's page, you'd say, hey, give me all the movies for Tom Hanks, and it can give you Tom Hanks in Cast Away. The important thing with the query is that you must provide the partition key when you're doing that, so I must specify the actor, Tom Hanks. I can optionally provide sort key conditions as well. So I could come here and just say, hey, give me all Tom Hanks movies, no conditions at all, or I could say, hey, give me all Tom Hanks's movies that are between A and M in the alphabet, and that's gonna give me Cast Away but not Toy Story. So that's how query works.

The last big bucket of API actions is the scan operation. It's similar to a scan in a relational database: it's a full table scan, and it's gonna look at every item in your table. You mostly want to avoid this when you can, unless you really know what you're doing or you're doing something like an export or ETL. It's gonna be expensive at scale, both in terms of how long it takes to respond to a request and in terms of how much capacity you need to service it.

All right, the last terminology I want to get into before we get into the example is secondary indexes. We talked about how primary keys are gonna be really important for modeling your data, and we're gonna see a lot of that. But what happens if you get into the situation where maybe you have five access patterns, you design your primary key, and that works for two of them but it doesn't work
for your other three access patterns? How do you add these additional access patterns? That's where secondary indexes come in. You declare these on your table, and basically you give it a new primary key, or a key schema is what it's called, and now when an item is written into your table, it's gonna be replicated into that secondary index in that new shape. So you don't need to worry about dual writing to two tables in two different formats or something like that. It's gonna handle that replication for you and give you those additional query patterns on those secondary indexes, and you can use the query and scan operations on the secondary indexes just like you could with your primary key.

So let's take a look at why that might be useful. Again, let's look at our actors and movies database. We saw how we can query for actors and actresses, but what if we want to flip it? What if we want to query by the movie name and say, give me everyone that's in Toy Story? What we'll do is declare the secondary index, and right here we're just gonna flip the partition key and the sort key, so movie becomes our partition key and actor becomes our sort key. This is what our GSI looks like. It's the exact same data; the big thing that's happened here is just that the partition key and sort key have flipped. Now that movie's my partition key, I can query that index directly by movie, and I could say, hey, give me all the actors and actresses that are in Toy Story. It'll give me Tom Hanks, it'll give me Tim Allen, and I'm good to go.

Hold on, I'm gonna take a drink. All right, now it's the fun part, I hope. We're gonna do a data modeling example here and see what that looks like in practice. We know the concepts; let's see how we do it. The big thing here is you really want to work with a process, and I think these are the three big steps you need to go through. First of all, start with an ERD. This is an entity relationship diagram. You should be making it in a
relational world as well. Basically, with an entity relationship diagram, you think about your application and say, what entities do I have in my application and how do they relate to each other? Do I have a one-to-many relationship, one-to-one, many-to-many relationships? So get that all modeled out. Now if you're working with a relational database, usually you take your ERD and you ship it directly to the database: your entities become tables, you set up your foreign keys to map these relationships, and you're good to go. That's not how it happens with DynamoDB.

With Dynamo, step two, you need to define your access patterns. So go talk to your PM, your biz analyst, whoever it is, and say, hey, how are we gonna use this application? How are we gonna fetch these entities, manipulate these entities? What are our access patterns? And really write those down. Don't just think about it; really write those down and make sure you can handle each of those. Then, when you've written down all your access patterns, that's when you get into it: you design your primary key and your secondary indexes to handle these access patterns specifically.

For people coming from a relational world, I think it's tough. You've got to forget your relational experience in a lot of places. A couple of places I see that happening most often: number one is normalization. We've learned normalization, third normal form, don't duplicate your data, all that stuff, for a long time, and that just doesn't work with Dynamo. The reason it doesn't work is that second point: there are no joins in Dynamo, so you can't join your data together. In a relational database, you normalize your data when you're writing it, and then when you query it, that's when you aggregate it back together, join it, and return it to the user. The problem is those joins get expensive at scale. So rather than having this flexible query language, these flexible
joins, what you do is pre-aggregate your data. You denormalize it so it's in the shape that you want to retrieve it in when read time comes around.

The third big thing I see people struggle with is they're used to having one entity type per table. You go back to your ERD, and if you've got a couple of entities on there, you make a table for each one, and that's where they go. With DynamoDB you're probably gonna have a couple of different entity types in a single table. Whereas in a relational database you might have a users table and an orders table, now they're gonna be in the same table. That's gonna be kind of weird, and we're gonna look at how you do that and how you manage that as well.

All right, cool, so let's get into the example. Our example's gonna be an e-commerce store, so we're gonna compete with Amazon. In our store, users make orders, and an order can have multiple items, because maybe when I go to order I can order a t-shirt and a basketball, different things like that.

So first step, we're gonna create our ERD. Nothing different here from a relational database: map out your ERD, know your entities, know your relationships and how they relate to each other. Here's the ERD I came up with. I've got four different entities and three relationships between them. Let's start in the top left there. I've got the users entity: a user's gonna create an account and give me their name, their address, date of birth, things like that. Then going down to the bottom left, there's the user address entity. A user might have multiple mailing addresses, because maybe I want to mail sometimes to my house, sometimes to my business, sometimes to my parents' house for gifts, things like that. So there's a one-to-many relationship between users and addresses. Then going across on the top, a user can have multiple orders, because hopefully they're gonna come back to your store, they're gonna order a bunch
of times, you're gonna make a big IPO, all that stuff. So multiple orders for a user. And then going down the right, an order can have multiple order items, because you can be ordering a book, a basketball, a t-shirt, all that stuff. So each item in that order is going to get put into the order item entity.

All right, so next step: we've got to define our access patterns. That's our second step in this process. I went and talked to our PM, and these are the access patterns I came up with. Number one, get user profile. If someone's searching around in your site and they click on themselves, they want to see their name, the addresses they've got set up, their credit cards, their email, all that stuff. So fetch that user profile. The second access pattern we have is fetch all the orders for a user. If I'm looking at my profile, I want to look at my order history and see how many times I've bought something from our site. The third one falls out of that, and that's when you want to fetch a single order and the order items. So I'm looking through my order history, I see one that's kind of interesting, why did I spend $120 last July? I click on that, and it shows me the order and all the order items, so I can see what I bought. The fourth one is similar to the second, where I get all the orders for a user by status. If I have a lot of orders, maybe I just want to look at my cancelled orders, or my returned orders, or the orders that are being shipped right now, or the orders that I've placed and haven't shipped, all that stuff. So as a user I can filter down and see what I want to get there. And then finally, this last use case is kind of interesting. This is not a user-facing use case, but for the warehouse crew. I have some warehouse people that go and pick the orders, put them in a box, and get them shipped. They just want a way to know, across our entire store, what are some orders that are open, so I can go pick it, get it
boxed up, and get it ready to go. So get open orders across our entire store.

All right, so we've got our five access patterns. Time to dive in and start modeling our primary keys and secondary indexes. This can be tricky to start, especially when you're new with Dynamo. It can be tricky to figure out how to get started here, and you might start going down the road, work yourself into a corner, and realize you need to back up, erase some stuff, and remodel it. So don't be afraid to iterate on this. It's much easier to iterate at this stage than once that code's live in production and your data is there. So make sure you take the time to do this right.

When getting started, I usually like to start with a core entity in our application. We have four entities, and I'd say the core entities would be either users or orders. Those are kind of core to the application, and then user addresses and order items are more peripheral. So I'm gonna start with the users and put a user into our table. Let's load two items into our table, two users. This is me, Alex, and we've also got Ned Stark in our table.

Two interesting things I want to point out here. Number one, the primary key. We're using a composite primary key; it's got both the partition key and the sort key. But notice that we've given the partition key and the sort key very generic names. It's just PK and SK. It's not something like UserId or OrderId or anything like that. The reason is we're gonna have multiple entity types in this table, so you need something that works for all of them, and UserId works for users but it doesn't work for orders or order items, things like that. So you give them very generic names. These are just used for data access, and the real data is probably gonna be in your attributes. That's the first thing to note. The second thing to note here is that for each of those user objects, the PK starts with, you know, capital USER# and then
has the username. That might look kind of weird. There are a couple of reasons you do this. It helps you determine what type of entity you're working with. This is a user entity, and that's useful if you're trying to debug, maybe in the console, or figure something out: you can say, okay, I know this is a user entity, that's what I'm working with. It can also help prevent overlaps. Imagine you had users and orders and they each had IDs. If they could have the same ID, then they'd overwrite each other, whereas if they have this prefix, then different types end up with different keys and they'll work. Also, these values can be useful for querying and sorting, and we'll see a little bit of that later on as well.

So it can be a little confusing to understand what your entity patterns are. One thing I like to do is make, I don't know, I call it an entity chart. What I do is just write the entity types I have down the left side, and then for each of those entities I say, hey, what's the PK pattern and what's the SK pattern for that entity? In this case, for my user, that PK pattern is gonna be USER#<username> and that SK pattern is gonna be #PROFILE#<username>. That's what every user that I put into my table is gonna look like in their PK and SK.

All right, so we've modeled one entity in our table. We're gonna look at one-to-many relationships here. One thing that's interesting about DynamoDB is, with a relational database, there's generally one way to do something. If you're adding a one-to-many relationship, you add a foreign key and you're done. With Dynamo, it sort of depends. It depends on your data, it depends on how you're gonna access that data, different things like that, and there are many different ways to model your relationships. So you really have to know a couple of different strategies and know when to use each one of those. So we're
gonna walk through three different one-to-many relationships here. We have our four entities with our three relationships. Let's start with this relationship on the left side: a user has multiple user addresses, because I might want to ship to my house, my business, a relative's house, something like that. As you start thinking about user addresses and how to model this relationship, the first thing I think is, okay, do I have any access patterns that operate on the address directly? And I don't have a fetch by address or a fetch user by address or anything like that, so that's kind of interesting. Number two, the number of user addresses that a user can have is gonna be pretty bounded. I can easily say, hey, you only get 10 addresses that you can save; you can't save 400 or something like that. Whereas with a different relationship, like users and orders, I don't want to bound the number of orders that they can have; I want them to keep coming back over time. So anyway, those are two interesting things, and given that I'm not gonna access that address directly and also that it's a bounded thing, I'm just gonna denormalize this data.

So I have my user profiles. These are the same two profiles I showed earlier, and all I've done is add this Addresses attribute where I store these addresses on the user. You can use complex data types like maps and lists in your attributes, and I'm just storing these addresses there. You can see Alex has a home address, and Ned's got a home and a business address there. So that's the first way you can handle these one-to-many relationships: just denormalizing your data and putting it in a document. If we go back to our entity chart, we don't need to add an entry for user address, because you're not fetching that directly. There's no separate entity exactly; it's just part of that user.

All right, now let's look at a
second one-to-many relationship. This is users to orders: a user can have multiple orders. The way we're going to model this is probably the most common way you're gonna want to model one-to-many relationships, and that's using a composite primary key and using that sort key. I've added a few orders to this table. I'm gonna walk through a little bit about the order first, and then we'll see how to do that one-to-many stuff. As you can see here, I've added five orders, and one of them is highlighted there. The SK pattern there is ORDER# and then it's got an order ID. A couple of interesting things to note: just look at the attributes on that compared to the user profile. My order has an order ID and a status, whereas my user has a full name and email. There are attributes on each one of them that don't make sense for the other one, but it's okay. I can have different attributes on different entity types, and Dynamo is not gonna get mad about that. I can be kind of freeform with my attributes there.

If we go back to our entity chart, how are we modeling that order in our table? For the order, the PK is USER#<username> and the sort key is ORDER#<order ID>. The interesting thing to note here is that users and orders have the same PKs, so they're gonna be within the same partition, which is crucial. So now if we want to say, hey, go get all the orders for a particular user, I could use a query like this. This is kind of DynamoDB pseudocode, I would say, but fairly close. Run this query where the PK is USER#alexdebrie, and then I'm gonna use the begins_with operator and say, hey, make sure my sort key begins with ORDER#. That's gonna go to my table, go to the right user partition, and select just the orders. It's not going to fetch me the user profile; it's going to select just the orders and return those back to me. So that's that second one-to-many
relationship, where you use a composite primary key and use that sort key value to find what you want.

Let's look at our third one-to-many relationship here on the right side. An order can have multiple order items, so I can order a basketball, a book, a t-shirt, all in one order. You might think, let's just take that last strategy and run it back. Why can't we just do that again? The problem here is that we're trying to model a one-to-many relationship on something that's already the object of a one-to-many relationship. Our users have orders and our orders have order items, and it's not gonna work to use that composite primary key to go down another level.

So let's walk through how we might model this. I've added some items here at the bottom of the table that are highlighted in red. You can see the partition key is ITEM#<item ID> and the sort key is ORDER#<order ID>. Going back to our entity chart, we've got those all in our table now, and to complete our entity chart, for the order item the PK is ITEM#<item ID> and the sort key is ORDER#<order ID>. The important thing to note there is that the sort key is the same for orders and order items. If we go look at our table, we can see that those sort keys are the same. I've highlighted the order; that's the one up top there in the Alex DeBrie user partition. But then we've got the order items down below, and they're in a different partition right now, so I can't access them with a single query.

This is where I'm gonna add my first secondary index. I'm gonna use a strategy called an inverted index: if you have a composite primary key and you flip the partition key and the sort key values, that's an inverted index. That's what we'll do here. So let's take a look at what that looks like. Here's our index. Again, this is all the same data, it's all been replicated over. The only thing that's different is I
have a different primary key on this index. In this case, SK is now my partition key and PK is now my sort key. Now if I want to get an order and the order items, I can do that: I can go straight to that order partition and pull it back. The interesting thing to note here is that I'm pulling back two different types of entities in a single request. I'm getting the order, which is that first item there that has the PK of USER#alexdebrie, but I'm also getting the two order items. So I'm getting two different types of data in one request. This is doing that join for us. We're pre-aggregating our data in the ways that we want to access it, and this is how Dynamo can be fast at any scale.

So just to recap what we looked at there: we looked at three different one-to-many relationship patterns, and we accomplished them in three different ways. The first one we used was an attribute: we denormalized those user addresses and just put them into a map on our user profile. The second one, we used the primary key and used that Query; that's gonna be the most common way you do that. The final way is similar, but if your primary key is already used for something else, then you can add a secondary index to give you those same kind of semantics and use that Query on that secondary index.

All right, let's hop into filtering. This is a section sort of like the one-to-many: we're gonna walk through a couple of different examples. If you're talking about databases and data design and data access, it's really a big exercise in filtering, because what you have is a lot of data of different types. You want this data but not that data, or that data but not this data, and you need to figure out how to do that efficiently. How do you do it quickly and cheaply? If you're coming from a relational world, you've been spoiled, because you have this WHERE clause.
It's super flexible. You can do anything you want with it: you can look only for items or rows with a particular column name, you can use these built-in functions, you can join and filter on a joined value. There's nothing really comparable in DynamoDB. You need to build your filtering into your primary key.

But before we get too far into that, I do want to talk about filter expressions, because you might go out and Google, or you might be looking through the SDKs, and you see FilterExpression and you think it's going to save you. It won't save you. So let's look at this example here. Again, these are our users and our orders. I've highlighted two items there, or, excuse me, two orders. They both have the status of shipped. Let's say across my entire database I want to look for just the orders that are in that shipped status. If you're looking through the documentation, you might see that filter expression. So here's some pseudocode using Python: you import boto3, which is the AWS SDK for Python, you create this client, and you say, hey, I'm gonna query my table and I'm gonna give it this filter expression that says status equals shipped, and it's just gonna give me exactly what I want back.

The problem here, and I'm gonna take this back to a slide from the beginning when we were talking about the Query API, is that when you're doing a Query you have to provide the partition key in that query. You can only do a Query within a particular partition. Our orders are in two different partitions: one is Ned Stark's, one is mine. You can't query across partitions, so that's not going to work. So this one's out the window.

But you keep looking through the documentation and you notice that Scan also allows the filter expression, and you probably think, hey, I'm smarter than Alex, he told me not to do Scan, but I'm gonna do it anyway because I'm using this filter, I'm
not going to be looking through stuff. And you probably are smarter than me, but I'll tell you, it's not going to save you here.

So let's look at how a filter expression works when you push off that Scan to the back end. The first thing that DynamoDB is gonna do is read items from your table. Once it has those items in memory, then it's going to look and see if you have a filter expression defined, and if you do, it's going to filter out any items that don't match that filter expression. Then it's going to return those items to you, so you only get the items that match your filter expression. The problem is, if you go back to step one, there's a one-megabyte limit when reading items from the table. So imagine you have a gigabyte of data in a table, which is not a big table. If you're doing a Scan with a filter expression, that's gonna take a thousand requests to your table, there and back, to handle this. So that filter expression isn't gonna save you. It's gonna save a little bit of bandwidth on the wire, and maybe you don't have to filter as much in your application, but other than that it's not going to save you and make your queries more efficient. So don't rely on that filter expression when you're filtering. You have to build it into your primary key, you have to build it into your secondary indexes, into those designs up front.

So let's look at a few access patterns we have that are filtering-based, and we'll see how to implement those like we did with the one-to-many relationships. First one here: get orders for a user. If you're working in SQL, this would be SELECT * FROM orders WHERE username = 'alex'. We already sort of looked at this in the one-to-many relationships, so I'm gonna go quickly here, but that query would be: give me items where the PK is USER#alexdebrie and where that SK begins with ORDER#.
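As a rough illustration of the two points above, here is a minimal Python sketch. It only builds the arguments for a DynamoDB Query and does the back-of-the-envelope Scan arithmetic; it does not call AWS. The table name `EcommerceTable`, the attribute names `PK` and `SK`, and the helper names are assumptions for illustration, and the one-megabyte limit is treated as 1,000,000 bytes to match the round numbers in the talk.

```python
import math


def orders_for_user_params(table_name: str, username: str) -> dict:
    """Build Query arguments: one partition, sort keys starting with ORDER#.

    Assumes the table's key attributes are literally named PK and SK.
    """
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :sk)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"USER#{username}"},
            ":sk": {"S": "ORDER#"},
        },
    }


def scan_pages_needed(table_bytes: int, page_limit_bytes: int = 1_000_000) -> int:
    """Requests a full Scan needs if each request reads at most one page.

    A filter expression does not change this number, because filtering
    happens after the items have been read.
    """
    return math.ceil(table_bytes / page_limit_bytes)


# Hypothetical table name; the username follows the USER#<username> pattern.
params = orders_for_user_params("EcommerceTable", "alexdebrie")

# With credentials configured, the arguments could be sent with boto3:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
```

The Query touches a single partition, while the Scan estimate grows with the size of the whole table, which is the reason for building the filter into the key rather than into a filter expression.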
What we're doing is using that partition key to filter down to just the items we want: just go to that user's partition and find the orders there. When we do that, our table goes directly to that partition, finds the orders, fetches them, and returns them back to you. So that's our first filtering access pattern. We're not getting all the orders, we're getting the orders for a specific user. That's how we filter down using that PK.

The second filtering access pattern we have is get orders by status for a user. As the user's looking through their order history, they don't want to look at all their orders. Maybe they want to look at the ones that were cancelled or returned or that are being delivered, things like that. If you were writing this in SQL, you might have something like SELECT * FROM orders WHERE username = 'alexdebrie' AND status = 'shipped'. If we look at the orders in our table, we can look at Alex DeBrie's orders, and we see that there's this status attribute over on the far right. It's in the attributes, though; it's not in a primary key, so it's not something we can query.

The pattern we're going to use here is called a composite sort key. First, we're gonna add an attribute that combines the status and the created-at date. We're making this additional attribute, we're gonna call it OrderStatusDate, and all it does is jam those two existing attributes together and separate them by a pound sign. So that first one, it was placed, it was created on April 21st, 2019, so I've got that order status date of PLACED# and then the date. So now I have this order status date in my table, and what I can do is add a secondary index using that value. I'm gonna make a secondary index where the partition key is the PK from our table and the sort key is gonna be that order
status date. If we look at that, this is what it's going to look like. I've got my orders sort of rearranged, again separated into those user partitions, but it has that order status date as the sort key. Then if I want to get just the ones that are shipped, I can write a query like this: go to that secondary index, use the PK where it equals USER#alexdebrie, and where the order status date begins with SHIPPED. It's going to go to that index and find exactly the items I want. It's gonna go to my partition and find the orders with that status.

The interesting thing about using that composite sort key here is you can use it to filter on both of those attributes, so I could filter down much more narrowly if I wanted to. I could say, hey, give me all the orders for Alex DeBrie that were shipped between April and June of this year, and you could do that, or that were shipped before Christmas of this year, things like that. So you can use both of those properties, both the status and the order date, in your queries. So that one was the composite sort key.

The last filtering pattern we want to get into is this get open orders pattern. This isn't a user-facing pattern; this is for your warehouse crew. They want to query across your whole database and just find an open order that's ready to be picked, so they can go pick it, say that it's ready, and move on to the next one. This is a pretty hard query for DynamoDB generally, because mostly with DynamoDB you want to narrow down to a partition first and then query within that partition, but this is like a global query. How do I query across my entire table? That's pretty tricky. In the SQL world, this is SELECT * FROM orders WHERE status = 'placed'. If we look at our table here, we have three orders that are in that placed status, but again, they're in different partitions. How do we enable this? How do we provide
kind of a global filter over our table? We're gonna use what's called a sparse index pattern. First of all, we're gonna add another attribute, just like we were doing with that order status date. On any order that's in the placed status, we're just gonna add an attribute called PlacedId. This attribute can be anything: it can be the order ID, it can be whatever. The whole point is just that it exists on that item when it's in that placed state. Then we're gonna create a secondary index on that PlacedId.

The way secondary indexes work is that you declare the key elements, the key schema, of that secondary index, and it's only gonna replicate data from your main table that has all the elements of your key schema. If you look at that, our user profile objects don't have a PlacedId. Any order that's in a different status, that's not in the placed status, is not going to have that PlacedId. Our order items won't have the PlacedId. So all we get in this GSI are our orders that are in the placed status.

Now the application that our warehouse crew is using can go to this, and they can actually use that Scan operation on that index and say, hey, scan that index, give me one item back. It doesn't matter which one it is; it's just gonna be sort of a random one. They'll go pick that order, and when they're done picking it, they'll come back and say, hey, this order's picked. Its status gets updated from placed to ready, or whatever status that would be, and then that PlacedId gets ripped off of it. Now it's gonna be removed from that sparse index, because it doesn't have that PlacedId anymore, so they can go retrieve a different item from that index.

All right, so just like we did with the one-to-many relationships, we looked at some different filtering patterns here. We looked at three different problems we had and three different ways to solve them. The first one, the most popular one, is going to be using that primary key,
where you're gonna be filtering down within a partition, or maybe using that sort key. The second one, we used that composite sort key, where we combine two different properties together and index on that, which allows us to query on just the first one or on both of those properties if we want to. The last pattern we looked at was the sparse index, which provides us a global filter on our table and allows us to weed out a bunch of items that we don't care about and get us down to just what we want.

So that's all I have. Thanks for coming today. I'll be around for questions if anyone wants to ask questions. Thank you. [Applause]
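The composite sort key and sparse index patterns recapped above can be tried out without a real table. What follows is a small in-memory Python sketch, not DynamoDB itself: the list of dicts stands in for the table, the order IDs and dates are made up, the status values are assumed to be uppercase strings, and dates are assumed to be ISO formatted (YYYY-MM-DD) so that string order matches date order. The helper names are hypothetical.

```python
from typing import Dict, List, Optional

Item = Dict[str, str]


def make_order(username: str, order_id: str, status: str, created_at: str) -> Item:
    """Build an order item with a composite OrderStatusDate attribute."""
    item = {
        "PK": f"USER#{username}",
        "SK": f"ORDER#{order_id}",
        "Status": status,
        "CreatedAt": created_at,
        "OrderStatusDate": f"{status}#{created_at}",
    }
    if status == "PLACED":
        # Only placed orders carry PlacedId, which keeps the index sparse.
        item["PlacedId"] = order_id
    return item


def query_orders_by_status(items: List[Item], username: str, status: str,
                           start: Optional[str] = None,
                           end: Optional[str] = None) -> List[Item]:
    """Mimic a query on an index keyed by (PK, OrderStatusDate).

    With no dates this behaves like begins_with on the status prefix;
    with dates it behaves like a BETWEEN on the composite sort key.
    """
    pk = f"USER#{username}"
    low = f"{status}#{start or ''}"
    high = f"{status}#{end}" if end else f"{status}#\uffff"
    matches = [i for i in items
               if i["PK"] == pk
               and "OrderStatusDate" in i
               and low <= i["OrderStatusDate"] <= high]
    return sorted(matches, key=lambda i: i["OrderStatusDate"])


def sparse_index(items: List[Item], key_attr: str = "PlacedId") -> List[Item]:
    """Mimic a sparse index: only items that have the key attribute appear."""
    return [i for i in items if key_attr in i]


def mark_ready(item: Item) -> None:
    """Move an order out of the placed state and drop it from the sparse index."""
    item["Status"] = "READY"
    item["OrderStatusDate"] = f"READY#{item['CreatedAt']}"
    item.pop("PlacedId", None)


items = [
    {"PK": "USER#alexdebrie", "SK": "#PROFILE#alexdebrie"},  # no order attributes
    make_order("alexdebrie", "1001", "PLACED", "2019-04-21"),
    make_order("alexdebrie", "1002", "SHIPPED", "2019-05-12"),
    make_order("alexdebrie", "1003", "SHIPPED", "2019-07-03"),
    make_order("nedstark", "1004", "SHIPPED", "2019-06-01"),
]

shipped = query_orders_by_status(items, "alexdebrie", "SHIPPED")
shipped_spring = query_orders_by_status(items, "alexdebrie", "SHIPPED",
                                        start="2019-04-01", end="2019-06-30")
open_orders = sparse_index(items)
```

Putting the status first in the composite value is what lets a single sort key answer both questions, status alone or status plus a date range. Calling `mark_ready` on a placed order removes its `PlacedId`, so the next `sparse_index` call no longer returns it, which mirrors how the item would fall out of the real secondary index.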

Info

Channel: AWS Events

Views: 91,775

Rating: 4.9639449 out of 5

Keywords: re:Invent 2019, Amazon, AWS re:Invent, CMY304, Serverless, Inc., Amazon DynamoDB, Not Applicable

Id: DIQVJqiSUkE

Length: 39min 46sec (2386 seconds)

Published: Tue Dec 03 2019