Deep Dive on Amazon Cloud Directory - April 2017 AWS Online Tech Talks

Captions
Welcome to today's webinar, Deep Dive on Amazon Cloud Directory. Our presenter today is Quint Van Deman. Today with us we also have a team of moderators. They'll be answering any questions you have throughout the webinar, so keep them coming; they're eager to practice their typing skills. With us today we have [inaudible], Jacob, Mahendra, Sergio, and Sri. So thank you to everyone who is participating in this webinar. We'll go ahead and get started. Quint, welcome, the floor is yours.

All right, good morning, good afternoon, or good evening, everybody. As mentioned, my name is Quint Van Deman, and I'm the Business Development Manager here at AWS for identity and directory services. Now, before you let that title scare you too much, please know that I spent my first two and a half years here at Amazon as part of our AWS Professional Services organization. I'm certainly a technologist at heart and in practice, and I continue to be so in this new role of mine, and that role of the technologist is really the perspective that I'm here to talk to you through today. As mentioned, we've got a number of great moderators from across AWS on the line with me who are going to be helping out today, and they're going to be able to answer your questions in real time, so be sure to keep them coming, and then we'll aggregate the ones that we think are most applicable for everybody at the end. So with that, let's get into it. Let's talk a little bit about Amazon Cloud Directory.

Now, the story of Amazon Cloud Directory, as I like to talk about it, is really a story about applications. As each of us goes through our daily lives today, we're interacting with an increasing set of really innovative new-age applications, everything from Zillow to Airbnb to Slack or Twilio. Whether it's at home, at work, or at play, our lives are certainly changing around these applications. And while I would never discount or underestimate the amount of technical innovation that goes on at all levels of the technology
stack necessary to bring those applications to you, I think it's a reasonable premise to say that without great data behind those applications, they wouldn't have the same level of impact and engagement as they do. And that's what Cloud Directory is all about: it provides a great new data store for those next applications that are coming down the pipe.

Now, that said, data is coming in increasingly different shapes, fashions, and forms today. Because it's going to be really fundamental to our conversation today, and to your understanding of Amazon Cloud Directory overall, I just want to take a second up front to baseline a common point of understanding for some of these different common data models. I'm actually going to start here on the right-hand side with relational data, just because that's the one that most people are most familiar with. In relational data, we've got data that's stored in rows and columns in tables. Those tables have relationships between each other, one-to-one, one-to-many, etc., that provide the linkage between the data. This data is often transactional, and the easy mental model here is systems like online banking.

Next we move towards the middle, and we start talking about graph data. In graph data, we're going to trade those rows, columns, and relationships for objects and links that form a loose network of data. Those objects, or nodes, represent the data themselves, and the links provide the connections between the various nodes in the network. The easy mental model here is to think about your social networks: your friends, the friends of your friends, etc.

And then on the left we've got hierarchical data, and hierarchical data is really what Amazon Cloud Directory is all about. Hierarchical data is a very defined tree structure of data. It too has nodes and links, but here the nodes and links take on a
much different structure. We've got the notion of parents and children, where nodes inherit and aggregate as they flow down the tree. The mental model to think about here is constructs like organizational charts, where you have reporting structures or office locations, and we're going to drill in quite a bit as we go so that's really clear for you.

Let's try to make that real. Hierarchical data sometimes isn't as tangible or concrete as some of these other structures that we've been operating with for longer, so let's do this by example. Here we've got a simple organization chart, the organization chart for the Acme company. You can see that at the top of this hierarchy we've got the organization itself, and at the bottom of the hierarchy we've got a couple of employees: Tim, Sally, and Jim. On the left-hand side of this tree, you can see that we've got one set of relationships, and in this case we're representing something like the department or the group that these folks are a member of. We've got a department for engineering, and then sub-departments for development and test underneath that. Obviously the Acme company hasn't really bought into the DevOps model quite yet, but we'll forgive them for that. And you can see through the relationships that Tim and Sally are members of the development group and Jim is a member of the test group. So that's one set of relationships.

Then on the right we've got a corresponding set of relationships, but this time, instead of orienting around a business unit or a department, we're orienting around a global location or office building. Here we can see that the Acme company has two worldwide offices, one in London and one in San Francisco, and our same employees that had one set of relationships that defines how they fit into the reporting structure of the organization align much
differently when it comes to their global office space. We'll see more examples of how Cloud Directory enables this type of multi-dimensional hierarchical data as we get into the presentation.

But before we get there, I want to take a step back and talk a little bit about some of the challenges that customers have historically faced when working with hierarchical data. Probably the first place to start is that many of the traditional solutions that have aligned well with hierarchical data, predominantly LDAP-based stores like OpenLDAP or Microsoft Active Directory, do pretty well for storing hierarchies of data that have a single dimension, but they really break down when you have to support hierarchical data with multiple dimensions, like the one I've just shown. They just weren't ever really engineered for those types of use cases, and they struggle to support them.

Next, and more commonly, folks would turn to other data stores, things like relational databases such as MySQL, or even graph databases, to try to store these complex multiple-hierarchy data sets. And while those provided some relief around the multi-hierarchy problem, they're really inefficient for storing, and more particularly querying, that type of data. Particularly with things like relational databases, you have to use self-describing or self-referencing tables with very complex or recursive joins. It's not efficient, and it's certainly not an optimized store for this type of data; it's kind of a round peg, square hole type of equation.

In either case, that's also a lot of complex infrastructure. We believe a lot here at AWS in the notion of undifferentiated heavy lifting, where we want you to be able to focus on your unique applications, or the unique things you're building, and not have to deal with patching and backups or scaling mechanisms. And with all those other
tools, most of that is still traditionally left to you.

And finally, in most of those other types of data stores, we have this inflexible schema problem, and that comes in one of two forms. First, in most of those other stores, at least the ones that do support schema-based data, the schemas are stored inside the instance of the data store itself, and so they can't easily be shared across applications or across different iterations of the same data set. Similarly, because they're stored within that particular instance of the data store, that makes them rather inflexible over time. If we need to have new applications leverage our data store, adding to or extending that schema in a way that's flexible for the new applications can be quite challenging.

To directly address those challenges and more, that's why we've come together and built Amazon Cloud Directory, a fully managed, cloud-native directory. We're going to talk a lot about the different features of Cloud Directory as we get into this deep dive, but certainly, as I mentioned, the bedrock of Cloud Directory is all about organizing hierarchical data, and particularly hierarchical data that spans multiple dimensions.

Cloud Directory is fully managed infrastructure. There are no servers, there are no knobs, there are no dials, there's nothing to patch. We handle everything in a fully managed way for you. You simply issue your API calls against it, and we handle all the maintenance operations, scaling, etc.

Next, Cloud Directory is highly optimized for directory-based workloads. These workloads generally have very specific query and usage patterns, things like being able to query for all the children of a given object, or perhaps for all the parents, or the paths through the parents back up to the root of the tree. We'll get much deeper and we'll explore some of these
concepts in detail, but certainly providing a set of very directory-specific capabilities for searching the objects and the relationships between those objects is a core part of Cloud Directory.

And then finally, we really wanted to address that notion of schema inflexibility. We've given you a way, which we'll see in much greater detail as we go on, to both externalize the notion of a schema from the data itself, so that you can reuse schemas more easily, and to expand and reuse your schemas over time as your data changes, as you add new dimensions to your hierarchy, or as you add new applications that your data store needs to support.

So with that said, we like to be really prescriptive at AWS, or at least certainly as prescriptive as we can. Because Cloud Directory and hierarchical data are sometimes still a relatively new construct, I'd like to offer you these four self-directed questions as a way to orient yourself and figure out if Cloud Directory is a good fit for your project or your application.

As I've mentioned several times, that starts right off the bat with: is my data hierarchical? When I draw it on the whiteboard, does it resemble a tree with parents and children?

Next, do you want to enforce a data structure? Sometimes we don't necessarily know the attributes or the qualities of the data beforehand, which means that schema enforcement can be something of a challenge. But here we're certainly talking about a scenario where we want to be able to enforce that type of schema. That has great benefit to us as application developers, because we can offload the validation of data onto our data store using that structure, but it does require that we want to have that sort of rigidity up front.

Next, does my application have a bias towards reads? With Amazon Cloud Directory, we're looking for use cases where the
read-to-write ratio is somewhere approaching ten to one, or even a hundred to one or more. That's pretty symptomatic of most hierarchical workloads, as we'll see as we get into it, but it's important to understand that Cloud Directory is optimized in that way.

And lastly, do you want a data store that's designed to scale horizontally without any complex effort on your part? Traditionally, things like relational databases are supremely problematic to scale, particularly horizontally. Nasty words like sharding get into folks' vocabulary when they start to look at those types of problems, and it can be a real challenge. But with Cloud Directory, that horizontal scaling is all handled for you.

So again, if you find yourself answering yes to all of those questions, it's a good indication that Cloud Directory is going to be a good fit for your project or your application. If not, that's okay. There are lots of other AWS data stores out there, from NoSQL options like DynamoDB to relational options like RDS, and I'd highly encourage you to check those out if you think one of them is a better fit for your application.

All right, I got a little ahead of myself there. To cement the concept a little further, let's talk about some of the use cases that we see customers exploring with Cloud Directory. The first one I've already gone into a little bit, and that's the notion of organizational charts. There's a lot of natural hierarchical data in organization charts: things like reporting structures (who's my boss, who's my boss's boss); things like office locations, like I showed, where we might segment our business based on geography and then office location, etc. (apologies, the slides kind of bounced on me here); or we might even have third dimensions, things like cost centers or other financial elements within the organization.

The next use case that we see a lot of customers
showing great interest in is that of fleet management systems. Fleet management systems might come in forms as large as industrial fleets, things like heavy equipment, cars, or trucks. Imagine cases where we've got fleets of vehicles and we need to assign drivers to vehicles, payloads to trucks, and trucks to locations and jobs. There's a lot of natural hierarchical data in those types of workloads. At the other end of the fleet management use case, we've got cases like IoT, where a given IoT device might have lots of sensors that relate to it, and then those IoT devices themselves might aggregate across wind turbine fields or solar fields, etc. There's a lot of natural hierarchy to that data as well.

And then finally, we've got things like learning management systems or course catalogs. That could range to everything from the course catalog at a higher educational institution, where we need to keep track of where our courses fit into majors, and those courses might fit into multiple majors. Something like an introductory statistics class might fit in biology just as well as it fits in engineering. We also have different dimensions of that type of learning data, where we need to keep track of professors that teach classes, or enrollments of students, etc. Again, we see where that natural hierarchy comes into play.

Now, in addition to those externally facing use cases, AWS has already used Cloud Directory ourselves to build a number of exciting applications that you've already seen roll out. AWS Organizations is a new service, released earlier this year, that's all about creating structures or groups of multiple AWS accounts and applying policy to them. And then Amazon Cognito, which we'll talk even further about in a second, has a great feature called Your User Pools that allows you as an
application developer to create pools of users that are interested in your application or that use your application. It's in no way important for you to understand what these AWS services do or what they offer. The real takeaway here is that at AWS we're certainly drinking our own champagne by using a tier 0 building block service like Cloud Directory for our own purposes. It really speaks volumes to the durability and the reliability that we've engineered in from the ground up. And you're seeing the pattern that you see in many places across AWS, where we're using our own services as building blocks for higher-level services to get those services out faster. Organizations and Cognito in this case didn't need to go about building a specialized hierarchical data store to provide the functionality that they wanted to provide, because Cloud Directory offers that for them.

Okay, so let's talk a little bit about how Cloud Directory fits in with the broader portfolio of AWS directory services. The name Cloud Directory certainly sparks connotations in everybody's technical mind, and I just want to see if we can help disambiguate a little bit of that. Starting there at the left, we've got AWS Managed AD. AWS Managed AD is an actual, traditional Microsoft Active Directory, so think of this as a service that's designed around infrastructure use cases, things like OS authentication, group policies, etc., or use cases that involve traditional Microsoft workloads, things like SharePoint. Then in the middle there we've got Amazon Cognito. Cognito is all about sign-up and sign-in for web and mobile applications, doing a lot of authentication and authorization within those applications, and federated access back to AWS services. And while these can be used in conjunction with Amazon Cloud Directory, I want to really draw your attention to the fact that Amazon
Cloud Directory is very distinct and different from either of those two other AWS services. Amazon Cloud Directory is really all about being a specialized data store for hierarchical data and a building block service for developers. If you have use cases that fit into either that infrastructure tool or that identity service for custom applications, you need to look at some of these other AWS services.

All right, so now let's pivot and get into a little bit of a technical deep dive. Let's start by going over in a little more detail some of the key features of Cloud Directory. I think I've described this a couple of times already, but just to really reinforce it: with Cloud Directory, the pillar feature is that it's all about efficiently organizing hierarchies of data across multiple dimensions. I think we've talked about that in pretty good detail, and we'll drill in even further as we go.

We've talked already about how Cloud Directory is going to scale automatically on managed infrastructure. That means, again, there are no servers for you to worry about, there's no patching for you to worry about, and there are no provisioning thresholds that you need to set to predetermine how much you're going to use. You simply issue the API calls you want, and we're going to take care of the scaling automatically on your behalf. We're going to take care of the provisioning, the patching, and the management on your behalf. It's a fully managed service.

Next, we've got the notion that I described earlier, where directory-based operations for searching for your objects and relationships are fundamentally different from many of the other searching operations you might do on, say, relational data. We're going to go into this in even further detail in a later slide, but we've produced with Cloud Directory a set of APIs that are
specifically targeted at being able to search that directory in a highly efficient and super responsive way.

Next, and we're going to go into schemas in a second here, we've talked about this notion of externalized schemas, or being able to easily adapt your schemas as your data requirements change.

Next is one we haven't quite talked about yet, and it's this notion of policies. With Amazon Cloud Directory, you're able to embed policies into the hierarchy. These come in the form of specialized policy objects that link to your other nodes using specialized links, which we'll zoom in on in a little further detail as we get into things. This provides you some really unique capabilities to build business logic around how you want the objects in your directory to be accessed.

And then we've got the notion of out-of-the-gate support for AWS CloudTrail and tagging. This is designed to take care of all of your traceability, your auditability, and some of the general operational aspects of making sure you've got good visibility and awareness into what's going on with your data.

Now, before we move on, there's probably even a seventh feature that I would call your attention to that's not well represented in the slides here, and that's data protection. All of the data within Cloud Directory is encrypted both at rest and in transit. At rest, it's encrypted using KMS, or AWS Key Management Service, keys on your behalf, so it's fully encrypted with AES-256 encryption on disk. In transit, it is encrypted just by the nature of the fact that you're operating with TLS-secured AWS API endpoints. So there's a great story there, in that we have taken care of the fundamentals of data protection for you so you don't have to.

All right, so now that we've talked a little bit about some of the fundamentals, how do you get started? We like to think of this as a very simple four-step process that we're
going to drill into in coming slides. You create a schema, and you publish that schema. This is where we talk about that externalized nature of a schema: a schema does not exist within a directory. It is of itself what I would call a first-order primitive, or a top-level construct, within Amazon Cloud Directory. Once you've developed and published that schema the way you like it, we're going to use that schema to create a directory, and then ultimately get to what we're truly interested in, where we're reading and writing data into it.

All right, so to continue on, let's cover some basic definitions so we have a common vernacular to go through. The first basic definition we've got is that of a facet, and you might equate this to something like an object class if you're familiar with LDAP. A facet is a collection of attributes, and those facets are aggregated or collected into a schema. A schema is a collection of facets; a facet is a collection of attributes. We use that schema, as I described, to create a directory, which is going to be our container for all of our data and all of our links. Within that directory, we're then going to start creating objects. Those objects are themselves a collection of the attributes defined by the various facets, and they might either be a node, which is a node that can have other child nodes, or a leaf node, a node that by definition can have no further children. In either case, those nodes are connected to each other through links.

And then on either the nodes or the leaf nodes, we can attach policies. Now, I've drawn this as a little bit of a simplistic representation, but the policies are themselves objects in the directory. They're just a very specialized and unique type of object, and they're connected to the other objects through policy links, again a specific, purpose-built type of link
and we'll see why that specialization is important as we get into it.

So now, schemas. Schemas, again, are going to define the set of attributes and the set of data validation for the data in our directory. To get you started, AWS provides three sample schemas: person, organization, and device. We provide these for you as a place to get started, but certainly, as you develop your own applications with Cloud Directory, you're going to want to take those and make them your own. You can adapt one of those samples, either directly or just by using it as a reference, or, once you get more comfortable with Cloud Directory, you can build your schemas from scratch.

So let's take a look and try to make this a little bit more concrete. This is an example of one of the sample schemas; it happens to be the device schema. You can see in the leftmost column that we've got one facet for our device (remember, a facet is a collection of attributes), and what's being represented here is something like an IoT device, if that's not perfectly clear. That device has different attributes: it's got an ID, a description, perhaps some certificates, and some versioning-type numbers. Each of those attributes has its type, its length, and whether or not the attribute is mandatory. We talked a little while ago about offloading your data validation to the directory, and that's the purpose of these qualifications on the attributes. It allows you to not have to worry about that data validation within your application tiers, particularly where you've got multiple applications that might share the same data store, and lets you simply offload that heavy lifting to the directory itself.

All right, so I appreciate that at this point there are probably still some questions in your mind about how schemas work and exactly how they tie into Cloud Directory, and so let's talk a little bit
about the lifecycle of a schema, which I think will hopefully solidify it a bit. The first phase the schema goes through is what we call the development phase. This is where you initially create the schema. You can create it either by uploading a JSON document or by manipulating it programmatically through API calls. In this phase your schema is under active development. You haven't even started worrying about creating a directory; you're just trying to get your data structures sorted out amongst your team.

Once you've done that, you're going to publish that schema to a published state. The key reason we have this published state is that the schema (and we'll see in a second how it relates to an actual instance of a directory) is itself a unique entity within Cloud Directory. When you publish that schema, it does become immutable. Now, that's not to say the schemas of your directories are immutable, and we'll talk about that in a second, but that exact instance of that schema does become immutable, and that's an important point to remember.

And then the final stage of the schema lifecycle is where we take that published schema and use it to create a new directory or apply it to an existing directory. This is where the schema extensibility really all comes together. While you certainly use a schema to create a new directory, which puts it into the applied state, there are a couple of other things to keep in mind here. A, you can apply additional schemas to your directory over time in a way that doesn't impact or influence the data that's already in your directory. B, once a schema is applied to a directory, think of it as a derivative of that published schema. It becomes an element or an aspect of the directory that lives its own lifespan from there, and you can actually add additional facets to the schemas that have been applied to
your directory. So once a schema is applied to a directory, it can drift away from, and is separately versioned from, the published schema.

All right, so now let's talk about and really clarify some of the unique Cloud Directory aspects around multi-dimensional directories. Again, what this allows you to do is represent multiple hierarchies within the same directory. Those can be things that you know about ahead of time (we saw earlier in some of the examples hierarchies that involve differences in reporting structure versus financial structure versus office location), or they can be hierarchies that get added into your application over time as your needs expand and grow.

Now, the real power here is that Cloud Directory comes with a number of very purpose-built APIs that, in computer science terms, are really fast and efficient at tree traversal, a unique challenge within the computer science realm that's much different from how you would go about finding data in, say, a relational database. There's no need for recursion, and there's no need for those really nasty joining types of statements. Instead, you've just got very simple API calls that are able to retrieve, in the case of ListObjectChildren, all the child objects for a given node in the directory, or things like ListObjectParentPaths, a relatively new API that we released after the initial launch of Cloud Directory, which lets you invert that logic and start at any node in the tree and find all the possible paths back up to the root of the tree. There are other APIs that cover different use cases around tree traversal, but the key in all of them is that they're really optimized for this use case. They're very highly sensitive to latency and highly optimized for these sorts of hierarchical data traversals. And then, as I mentioned, it certainly supports adding dimensions over time.

All right, so now
let's talk a little bit about Cloud Directory I/O. If you've been around AWS for a while, hopefully these are terms that you've heard before, but Cloud Directory supports two different types of read calls: strongly consistent and eventually consistent read API calls.

Strongly consistent read API calls are designed for when timing accuracy matters. To get really precise about what we mean here by strongly consistent: if you've got multiple readers and writers against your directory, then when a reader queries the directory, they are guaranteed to receive the last value written by any writer. This can be super important in cases like IoT, where we're looking for sensor data, or other cases where we want to be absolutely certain that the data we're reading out of the directory is the latest copy that was written into the directory.

On the other hand, there are lots of scenarios that involve hierarchical data where that timing isn't terribly important. If I think about how often my manager changes within my organization, that's not something that changes a lot, nor is it something where the sky is going to fall if I get data that's a couple of milliseconds or a couple of seconds old. For that reason, we also offer eventually consistent read API calls, and the big deal to you is that these offer almost a 10x cost savings. For a lot of these cases where that type of timing accuracy isn't important, that's obviously a big cost lever for you to take advantage of. And again, these concepts of strongly consistent and eventually consistent follow the same patterns and definitions that you see in other places around AWS, in services like DynamoDB, S3, etc.

And then the last thing to keep in mind about Cloud Directory, and I said this before, is that it's really optimized for heavy-read use cases. So you want to look at Cloud
Directory when your use case is going to involve at least a 10x ratio of reads over writes, and getting up into the hundreds or thousands of reads per write only improves that optimization on your behalf.

All right, so next: I mentioned a little bit about policies. Policies are a specialized type of object in your directory that are attached to other nodes via policy links. The reason we have both a specialized type of object and a specialized type of link is that there are policy-specific APIs, like the LookupPolicy API, which walks up the tree and returns all of the policy objects between the node that you're querying and the root of the tree. You can probably imagine how this would let you create scenarios where you can do inherited policies down the tree, you can aggregate policies down the tree, lots of great functionality that's available to you. And again, the reason those are specialized types is so that these very unique API sets can very quickly and efficiently return those policies to you. Now an important thing to understand, though, is that these are not policies that Cloud Directory itself uses to make authorization decisions about the objects in the directory. They are not interpreted by Cloud Directory in any specific way; they are merely returned to you so that your applications can use these policies within their own business logic. They are stored simply as a blob, returned to you as a blob, and you can parse and use those to implement any business logic, any authorization system, any type of use that you've got for those policies.

All right, pricing. As I mentioned before, we try to keep the pricing with Cloud Directory really simple. You only pay for what you use and there's no minimum fee. There's a great free tier that's designed to let you kick the tires and really try things out in a
cost-free way. But once you get to the point where you're looking at using Cloud Directory for a substantial workload, you're looking at a simple pricing model that has two basic dimensions: storage and access. Using Northern Virginia as an example (the price does vary by region), on the storage side you're paying for basic capacity, and you're talking about a price increment of 25 cents per gigabyte. The second dimension comes in on access, or your API calls. Here we come back again to the notion that we've got two different types of reads, the eventually consistent and the strongly consistent, and you'll see there that 10x lever that I mentioned before, where the eventually consistent reads are far more price-advantageous to you as a customer. And then the write API calls are priced the same way as the strongly consistent reads.

Now Cloud Directory is currently available in six regions worldwide. Within the US we've got Northern Virginia, Ohio, and Oregon; in Europe it's available in Ireland; and in Asia Pacific we're currently available in Sydney and Singapore. Now this is always changing, and we're continuing to roll out Amazon Cloud Directory in new regions all the time, so I'd highly encourage you to keep tabs either on some of the blog announcements or by just using the global service availability page to see where Cloud Directory is available regionally as we move further into 2017 and beyond.

All right, so now we've gone through some basics here today, but certainly as a developer-oriented tool, we know that what you're probably really going to want to do next, if you think that this is a good fit for your project, is start to kick the tires, start to get deep with this yourself, and get going. We are very actively working on some sample code, a sample application, that we're hoping to release through awslabs on GitHub very soon, very
quickly. It's going to give you some great sample code that's executable out of the box; it'll give you a sample to build from and to learn from, so stay tuned to the AWS blog channels for announcements there. In the meantime, there's already a great blog post out there with some great sample code that, again, is oriented around the use case we've talked about several times: how to query up and down your tree using multiple hierarchies. There are other resources available to you as well, to round out your understanding of Cloud Directory: everything from Jeff Barr's initial launch blog, to a blog that really goes deep on the mechanics of that parent paths API that I mentioned, and then of course the traditional service documentation.
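
Editor's illustration (not part of the talk): the traversal and policy ideas described above can be sketched with a small in-memory model. This is only a toy under stated assumptions. It does not call Cloud Directory, and `MiniDirectory`, its method names, the node names, and the policy blobs are all hypothetical. It loosely mirrors three ideas from the talk: listing a node's children, finding every path from a node back up to the root when the node sits in more than one hierarchy, and collecting opaque policy blobs from a node up to the root.

```python
from collections import defaultdict

ROOT = "root"  # hypothetical name for the single root node


class MiniDirectory:
    """Toy multi-hierarchy directory. Illustrative only, NOT the AWS API."""

    def __init__(self):
        self.children = defaultdict(dict)  # parent -> {link name: child}
        self.parents = defaultdict(list)   # child -> [(parent, link name)]
        self.policies = defaultdict(list)  # node -> [opaque policy blobs]

    def attach(self, parent, link_name, child):
        """Link child under parent; a child may have several parents."""
        self.children[parent][link_name] = child
        self.parents[child].append((parent, link_name))

    def attach_policy(self, node, blob):
        """Store a policy as an uninterpreted blob on a node."""
        self.policies[node].append(blob)

    def list_children(self, node):
        """Return {link name: child} for the direct children of node."""
        return dict(self.children.get(node, {}))

    def list_parent_paths(self, node):
        """Return every root-to-node path, one per hierarchy the node is in."""
        paths = []
        stack = [(node, [])]  # (current node, link names collected leaf-first)
        while stack:
            current, links = stack.pop()
            if current == ROOT:
                paths.append("/" + "/".join(reversed(links)))
                continue
            for parent, link_name in self.parents.get(current, []):
                stack.append((parent, links + [link_name]))
        return sorted(paths)

    def lookup_policies(self, node):
        """Collect (node, blob) pairs for node and all of its ancestors."""
        found, seen, stack = [], set(), [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            found.extend((current, b) for b in self.policies.get(current, []))
            stack.extend(p for p, _ in self.parents.get(current, []))
        return sorted(found)


# Two hierarchies over the same employee: reporting line and office location.
d = MiniDirectory()
d.attach(ROOT, "reporting", "reporting")
d.attach(ROOT, "offices", "offices")
d.attach("reporting", "engineering", "engineering")
d.attach("offices", "seattle", "seattle")
d.attach("engineering", "alice", "alice")
d.attach("seattle", "alice", "alice")
d.attach_policy(ROOT, b'{"mfa_required": true}')
d.attach_policy("engineering", b'{"can_deploy": true}')

print(d.list_children("reporting"))
print(d.list_parent_paths("alice"))
print(d.lookup_policies("alice"))
```

As in the talk, the policy blobs are only stored and returned; any interpretation (inheritance, aggregation, authorization) would be up to the calling application.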
Info
Channel: AWS Online Tech Talks
Views: 4,643
Rating: 4.5384617 out of 5
Id: UANm3DC_IxE
Length: 45min 28sec (2728 seconds)
Published: Tue May 02 2017