Azure Cosmos DB Partition Key Advisor

Video Statistics and Information

Captions
Hi, my name is Stephanie Arroyo, and I'm on the PM team for Azure Cosmos DB. Today I will be discussing the new Azure Cosmos DB Partition Key Advisor to help you better understand partitioning and thus increase your performance and lower costs. Cosmos DB is Azure's globally distributed, multi-model database service with automatic and elastic horizontal scaling of throughput, or number of requests per second, and storage.

Before we dive deep into explaining the functionality of our new tool, let's go over how partitioning works in Cosmos DB. Partitioning is the way in which we horizontally scale the data in our database. As the user, it is important to keep in mind how your data is partitioned for scalability, throughput, and performance. Partitioning is made of two main concepts: logical partitions and physical partitions. These partitions allow you to group a set of items, or data, in your collection by a similar property, also known as the partition key. For example, if you have a car collection where you have properties such as car color, VIN number, make, model, etc., these are all perfect examples of possible partition keys.

The way you control the mapping of your data is through setting a good partition key. This maps to logical partitions. As you can see, if we partition this data by car color, we will have all the red cars in one partition, all the blue, and finally all the black. This is how groups are formed. Logical partitions consist of a set of items that have the same partition key. The second part is physical partitions. Physical partitions are the physical storage of your data; one or more logical partitions map to each physical partition based on an internal hash function. It is important not to limit the number of logical partitions, because they will be scaled down to a smaller number through the physical partition mapping.

However, choosing the right partition key is not always easy. Let's go over some best practices on how to choose a partition key for a write-heavy use case. The first thing to consider is having good storage distribution uniformity, then overall high cardinality, or uniqueness of values, and lastly good distinctness at any given time, or per second. Now let's illustrate this with some examples.

Before your data is stored, Cosmos DB uses an internal hash function on the partition key to decide in which physical partition your data will be stored. All items that have the same partition key will be stored together in the same physical partition. In this example we are looking at parking at an international airport. The gate serves as a hash function used to determine where a car will be parked based on a partition key. For instance, if we are partitioning this data set on car fuel type, then as the car data is being hashed you will see that most cars are diesel fueled, as compared to electric or other types, and therefore will be mapped to the corresponding partition, the first parking lot. This is also known as a hot partition. Hot partitions occur when the partition key design does not distribute the throughput requests evenly, limiting the maximum utilization rate and leading to an inefficient use of the provisioned throughput and higher costs. To avoid this, you can choose a partition key like the VIN number, where you have a good storage distribution of your cars because of the high number of unique values that can be evenly distributed, versus the previous example, where car fuel type will send a high number of requests to one single partition.

The next thing to consider is having a high cardinality. Having a high cardinality means having a high number of unique values. The more unique values you have, the higher the number of logical partitions and the better the uniformity.

Finally, it is important to have good distinctness of values at a given time, also known as distinctness per second. In this scenario we will add the additional concept of time. Imagine looking for parking at an airport at a very busy time of day, like 9:00 a.m. on Saturday. We might know that most people with SUVs travel at this time. So what happens is that on Saturday morning you may have a hot partition of people coming into the first parking lot, because there will be fewer requests coming in from trucks or buses as opposed to SUVs. To avoid this, we typically use a value with a higher cardinality, like VIN number.

This is where the new addition to Cosmos DB, the Partition Key Advisor, comes into play. The Partition Key Advisor is an open-source web application that helps you choose the right partition key by providing a comprehensive analysis of your existing data in Cosmos DB, and it recommends the best of the tested keys based on high cardinality of values, throughput, and storage distribution uniformity. There are currently two versions of this project: one is a live application deployed on Azure, and the other is a GitHub project where you can download the source code and add additional features. The Partition Key Advisor has potential for various use cases. Today we'll be addressing the IoT device streaming scenario for write-heavy workloads.

Let's go over a real example. Say you're a developer at Contoso Device Corp, an IoT device company that ingests data in bursts from device sensors in various locations. You want to know whether your current partition key is good or if there's a better partition key. To do this, let's test out our application. Once you navigate to the Partition Key Advisor website, you have to enter some account setting information found in the Azure portal. Once you've filled out this information, you can then navigate to the candidate partition key section. This is where we will be entering several keys that we'd like to test on our current collection. Here we have our collection and several properties of the items in it, including device ID, location ID, device location, and submit date and hour. These are some of the keys that we are able to test with our application. In this case we'll be testing out device ID, location ID, and submit second.

So let's navigate back to our advisor. Here we see that the recommended partition key, based on the three we just entered, is device ID, based on the workload score of 74%. This is calculated from the uniformity score plus the total uniqueness score plus the distinctness score. Here we are able to select a candidate key to show its distribution. Here you see how your data set is being distributed for device ID; it looks fairly uniform. Same thing with location ID: this also looks uniform; however, in one partition you see that there may be about a thousand documents, and this may not be the best in terms of storage distribution. Let's look at submit second. Submit second looks very well distributed and has only about 45 documents within one partition. In the analysis of this graph, you see that submit second actually has the best storage uniformity percentage and location ID has the worst.

Moving on to total uniqueness per key, we find the cardinality of each of these keys. Temperature Celsius seems to have the greatest number of unique values; submit second has 661 values, device ID has 200 values, and location ID has 30. If we scroll down to the analysis of this cardinality chart, we see that submit second actually has the highest total uniqueness per key and location ID has the lowest.

As we continue on to the distinctness per second distribution, we see that device ID has the highest number of unique values at any given second in time compared to location ID and submit second. In the analysis of this graph, we see that device ID has the highest number of distinct values per second. So this is how we were able to recommend device ID as the partition key.

And that was it for the Partition Key Advisor. This project is currently open source. If you're interested in adding any additional features or functionality, please visit our GitHub page. Contributions are welcome. Thank you, and I hope you enjoyed this video.
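
The three measures named in the walkthrough (storage distribution, total uniqueness per key, and distinctness per second) can be sketched roughly in Python as follows. This is only an illustration under stated assumptions, not the Partition Key Advisor's actual implementation: MD5 stands in for the service's internal hash function, the function names, the field names (`deviceId`, `locationId`, `submitSecond`), and the sample data are all hypothetical, and the way the tool weights these measures into its workload score is not shown in the video.

```python
import hashlib
from collections import Counter, defaultdict


def physical_partition(key_value, partition_count):
    """Map a partition key value to a partition index with a stable hash.

    Assumption: MD5 is a stand-in for the service's internal hash function.
    """
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def cardinality(items, key):
    """Number of unique values the key takes across all items."""
    return len({item[key] for item in items})


def storage_distribution(items, key, partition_count):
    """Item count per simulated physical partition, as a list."""
    counts = Counter(
        physical_partition(item[key], partition_count) for item in items
    )
    return [counts.get(p, 0) for p in range(partition_count)]


def distinctness_per_second(items, key, time_field):
    """Average number of distinct key values seen within a single second."""
    seen = defaultdict(set)
    for item in items:
        seen[item[time_field]].add(item[key])
    return sum(len(values) for values in seen.values()) / len(seen)


# Hypothetical sample: 200 devices spread over 30 locations,
# each device writing once per second for 5 seconds.
items = [
    {"deviceId": d, "locationId": d % 30, "submitSecond": s}
    for s in range(5)
    for d in range(200)
]

for key in ("deviceId", "locationId", "submitSecond"):
    print(
        key,
        cardinality(items, key),
        storage_distribution(items, key, partition_count=4),
        distinctness_per_second(items, key, "submitSecond"),
    )
```

In this made-up data set, a time-based key such as `submitSecond` takes only one distinct value within any given second, so every write in that second targets the same logical partition, while `deviceId` spreads the same writes across 200 values. That is the intuition behind preferring a key with high distinctness per second for bursty, write-heavy workloads.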
Info
Channel: Azure Cosmos DB
Views: 7,233
Keywords: cosmosdb, partitioning, partition key, azure, partition key advisor, nosql, azurecosmosdb, nonrelational, clouddatabase
Id: 9v6WbCOzPiM
Length: 7min 53sec (473 seconds)
Published: Fri Aug 02 2019