AWS re:Invent 2016: Architecting Next Generation SaaS Applications on AWS (ARC301)

Well, hello everybody. Thank you so much for attending my session today. I'm very excited to see so many people turning out to hear about SaaS, which is clearly my passion, and I'm glad to see so many of you share it as well. My name is Tod Golding, and I am a partner solutions architect with AWS. That basically means I spend my days working with customers and partners who are either migrating solutions to SaaS or building brand-new SaaS solutions, helping them dig in and figure out the best practices and how to achieve the SaaS delivery model on top of AWS.

When I talk to customers about this, they typically come to me with a common question: "Okay, I'm ready for SaaS, I'm ready to go all-in with SaaS, but what are the tools? What are the frameworks? What's the prepackaged answer to this problem? How can I find my way there, do it efficiently, and leverage all the best practices everybody else is using?" For me that ends up being a very challenging question to answer, because when you look at the domain of architectural and design problems for SaaS, you find that SaaS spans and intersects with so many of the core architectural principles that are common to every solution. We have identity, we have scale and availability, we have all these notions we talk about whenever we talk about applications. The truth is, there's no separate silo or box that SaaS lives in. Instead, what I find is that SaaS ends up being an extra dimension, an extra layer that intersects and overlays all of these core architectural principles. So when I work with customers and talk to them about their needs, it's more about how SaaS influences each one of these architectural concepts.

If we look at identity, for example, we've all dealt with identity and potentially built identity into our solutions, and we're familiar with
authorization and authentication and roles and those concepts. But when we move into a SaaS multi-tenant environment, what we find is that SaaS brings all these layers of additional considerations: How am I going to get tenants into the system? How am I going to link tenants to users? How does all of that get provisioned? What's the life cycle of all of it? There's a whole new realm of questions that bolts onto identity from a SaaS perspective, a whole new set of considerations I have to be thinking about. So identity really is the core values of identity plus all of these additional SaaS values.

Agility is a great example. Everybody wants agility; everybody would like to have a great DevOps experience, great CI/CD, and be rolling out features and functions all the time. But in the world of SaaS, the lifeblood is how rapidly, efficiently, and effectively I can deploy new features so that I'm constantly responding to market changes and customer needs. In the universe of SaaS that's just essential; in fact, that's a lot of the reason people go to SaaS. So that tends to set the agility bar a lot higher for SaaS environments. We can't have downtime in multi-tenant environments. We can't have a maintenance window where we're down and rolling out updates, because that would cascade across all of our tenants. So as we look at agility, we have to ask what the SaaS principles and tenets are that we need to be thinking about.

That's the theme spanning all of these areas, and it's the theme I want to talk about today: as I work with these customers, what are the patterns, the architectural design considerations, and the themes I see in how people are addressing them? We won't be able to touch on all of these in the time we have today, but I'll focus in
on a few of these and give you some idea of the higher-priority areas, the common patterns and considerations. Another aspect of this, by the way, is how AWS and its services influence my approach to each one of these problems.

But before we dig into the specifics of architecture, I want to step back for a second, because there's a macro question I get, which is: what's a good way to implement multi-tenancy in general? What are the themes you see, Tod, as you talk to customers and work with their solutions? What are the patterns? While there are any number of ways people could build a multi-tenant solution, I have found three very specific buckets that most solutions and most customer patterns fall into: the silo, bridge, and pool models I have here. This theme of silo, bridge, and pool cascades through a lot of the discussion of the patterns we'll cover.

On the left you see the silo. The silo represents an environment where I'm going to put each of my tenants into fully isolated infrastructure. Each tenant has its own underlying infrastructure and its own footprint, entirely isolated from one tenant to the next. Sure, I'll bolt on top of that some sort of shared onboarding and billing experience, so the tenant's onboarding looks and feels seamless, but under the hood I'm provisioning separate infrastructure. You can imagine why this would be appealing to some tenants: either they're coming from a legacy environment and this is a natural transition for them, or they're in some high-compliance environment where my customers won't let me run in anything but a siloed model. So it's a totally valid model.

The bridge model, as its name implies, is a bit of a hybrid. The bridge says,
well, for some layers of my architecture I'm going to have a multi-tenant shared model where the tenants share the infrastructure, and for other layers, or groups of services, or however you want to think about your architecture, things might remain single-tenant. I'll have some mix-and-match of that based on the profile of my application and where multi-tenancy fits in my domain. And then the last model, the pool model, is the all-in model. In that model, all the tenants go into a shared environment and share all of the infrastructure.

Now, the interesting question I get (in fact, I got this question just this week while meeting with a customer here at the conference) is: okay, which one's the best? Which one should I be using? I hate to do this, but I always end up answering that question with another question, because the reality is that all three of these models are valid, and there are all these other factors you have to think about to decide what aligns best with where you want to go with SaaS. So I have a natural discussion about which of these models, or which hybrids of them, might fit for you. Usually, if I put silo and pool side by side and show the natural tension between them, it's a very easy way to tease out the tension in this problem and help people understand which factors might lean them one way or the other.

On the silo side, where I've got isolated infrastructure, you can imagine what the pros are. In the isolated world, I can go into highly compliant areas where there are big regulatory needs and say to my customers, "Hey, you're running in this isolated model, and we can meet your compliance needs." Or, in general, because I'm coming
from a legacy environment and I've already got a single-tenant model, the partitioned nature of this aligns with the way I've built my infrastructure. But the bigger benefits people see in the silo model often come down to its support, deployment, and risk profile. Because I'm in a siloed environment, I'm not going to get cross-tenant impacts: one tenant can't do something that imposes a load that adversely affects another tenant. In a pool model, that absolutely can happen; depending on your implementation and how effective you've been, one tenant could adversely affect another. In the silo, that's not possible. The other benefit is that I can do very tenant-specific tuning in the silo model. I can tweak and tune it and say this tenant is going to get these kinds of customizations, and they'll help that tenant out. And the last one, maybe the most important one here, is that you get tenant-level availability: if a tenant goes down, only that tenant goes down. It doesn't ripple across your entire system.

Of course, the siloed model comes with some downsides as well. Obviously, if I'm going to distribute all of this and have separate stacks, cost and agility are going to be undermined to some degree. I can only do so much: I can try to size my resources effectively, and I can do things in the siloed model to try to be cost-efficient, but it's never going to be as cost-efficient as a pooled model. I'm also going to have agility problems. Even if I surround all of this with automation, do a great job with DevOps, and have great scripting, I'm still going to have a more complex model. You can imagine what it means to provision an entire stack in a siloed model as part of onboarding: there are more moving parts, and it's harder for me to live up to my agility goals. And then the last bits here are more around deployment
and management. If all these pieces are distributed, how do I create a centralized experience for management, monitoring, deployment, and the analytics data I have to gather? When it's distributed across all these different stacks, it's much harder for me to aggregate it all into a common view. It definitely can be done, but it's more challenging.

On the pool side, we see the absolute inverse. Because we're in an all-shared model, we get an agility profile that aligns better with what we probably want for a SaaS model. Because everything is in a shared environment, it tends to be easier to have an efficient CI/CD story and align with the mechanisms used for continuous integration and continuous deployment. I also have a lot more levers, knobs, and dials for tuning cost. Because my tenants are all in one environment, there are all kinds of new things I can do to align tenant consumption with my infrastructure consumption and return a lot of value back to the business. And because all of this is under one hood, centralized management and the aggregation of analytics are easier.

The challenge of the pool model is that if I'm going all-in with it, I'd better have a great DevOps footprint in my organization, really good automation, and really robust deployment models. If something is broken in the way I'm deploying in a shared environment, it will cascade across all of my tenants, and my entire system will be down for all of my customers. So for me, the DevOps bar goes much higher here. The cons on this side sometimes also apply to compliance: if I'm running in a siloed model, I can offer better compliance, whereas in a pooled model some tenants may not be
comfortable living in that sort of universe, and being exposed to cross-tenant impacts can be a downside for them. Those are the factors we consider as we trade off the models. The reality is that you're not going to be all-in on one side or the other; I just put those up as extremes to drive the conversation. Then we talk about where your business is really trying to get to. Are you driven by agility, so you should lean a little more toward pool? Do you really just need to get your current legacy system moved over, which might move you more toward silo? Or are you in a compliance-heavy environment where you need the silo model?

The last conceptual slide I have is about the landscape for SaaS. A lot of people ask, "What's the reference architecture for SaaS? Show me a reference architecture." There's no easy way to say "here it is, here's the blueprint," because there are so many variations in how SaaS solutions are built. But I built this conceptual architecture model so we'd have a foundation for the concepts we're going to dig into, and it's broken into three dimensions. On your right side you'll see the application view. Its moving parts are the big building blocks of architecture used to construct your application. For SaaS, identity is a huge area we have to dig into to understand the patterns, practices, and moving parts. For tenant isolation, AWS offers a wide range of ways of separating one tenant from another, at different levels of granularity, so we have to figure out what those different
mechanisms are, how we enforce them, which patterns really work for people, and what the trade-offs are. Then there's data partitioning, which is the standard discussion, probably the most talked-about area of multi-tenancy: how can I separate and partition my data? Data partitioning gets more interesting on AWS because there's such a large set of storage options: RDS, Redshift, DynamoDB. Those add new dimensions to what my options are and what multi-tenancy looks like, so we have to hit on those.

The application view focuses on building an application, but equally important to me is the operational view you see on the left-hand side. If I'm going all-in with SaaS, I need a really robust operations story, so I have to think about the multi-tenant impacts on things like management, monitoring, profiling, and analytics, and how I'm going to do billing and metering. A lot of your agility will come out of how multi-tenancy lands in those areas. For our discussion today, we're going to focus mostly on the application view; I'll touch a little on the operational view within the time we have.

The third dimension, which I think is super important because it's a theme that carries through all of this, is agility, which spans over the top of everything. Every one of these areas, identity, tenant isolation, management, monitoring, is going to be influenced somehow by how you approach agility. We've always talked in development about the natural tension between the business side of the house and the technical side of the house, and we've always tried to balance features and functions versus
stability and architectural trade-offs. But in SaaS organizations I see an entirely different dynamic, where the architecture you're building and the approaches you take to it have a very big influence on how the tiering of your solution is offered, the cost model you can offer, and how successful the business will ultimately be. That includes how much agility the business has in offering different flavors to the different kinds of customers it might encounter, or how quickly it can pivot: "We thought our SaaS vision was this, but now that we have a thousand customers in our system, it's actually this. How quickly can my architecture move from this model to that model?" And the business is always going to be pushing here, saying, "We're in this SaaS model. We need to turn out features faster and respond to competitive forces faster; that's the whole reason we went to SaaS. What are you doing technically to make that possible for me?" So I see this accentuated in the SaaS world in ways I haven't seen in other environments.

So let's dig into identity as the first of those application views. Like I said earlier, most of you have already dealt with identity and probably have some notion of the fundamentals of authentication and authorization. But when I talk about identity with customers, I find there are other topics they're interested in as well. Here are a handful of them; we can't hit them all today, but these are the ones that come up most frequently. They want to understand what the provisioning life cycle looks like: how do I automate the creation of a tenant, how do I bind my user identity to my tenant identity, and how does all of that flow in a way that controls and manages access to resources inside the AWS environment? That dovetails into the notion of security and
isolation. All multi-tenant environments, no matter where you build them, have to deal with this: if I'm going to put people side by side in an environment, how am I answering the customer who asks, "How isolated am I from that other tenant? What is your identity model doing to assure that nobody else can see my data or the resources in my environment?" This very much connects to the identity story, because identity is the path in, and we have to attach some mechanism to that path that hardens the boundary and controls access to those resources.

The last topic is probably influenced more by the evolution and growth of microservice design and the decomposition of systems into smaller services. As people build much more decoupled systems with much smaller services, tenant context in a multi-tenant environment has to flow through all the interactions across the web of services in your environment. That creates the need for another flavor of identity: flowing tenant context along with user context. We'll look at how that works as well.

Let's start with the provisioning life cycle: what does it mean to get a new tenant into your environment? By no means is this meant to be the ultimate diagram that says this is how it's always done, but these are the common moving parts I see. Essentially, a tenant shows up; many of us have signed up for SaaS solutions and landed on some sign-up page. You provide the fundamentals of who you are as a user, but you also provide information about who you are as a tenant: your tenant name, and which plan you're signing up for, whether that's the free tier, bronze, or platinum. Once I've filled out that entire form, I submit it.
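The sign-up form described above collects two kinds of data: user fundamentals and tenant information (tenant name and plan). A minimal Python sketch of validating that submission, assuming hypothetical field names (`email`, `full_name`, `tenant_name`, `plan`) and using the free/bronze/platinum tiers mentioned in the talk only as example plan values:

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    # Example tiers from the talk; real plan names are product-specific.
    FREE = "free"
    BRONZE = "bronze"
    PLATINUM = "platinum"


@dataclass(frozen=True)
class SignupRequest:
    # Who you are as a user
    email: str
    full_name: str
    # Who you are as a tenant
    tenant_name: str
    plan: Plan


REQUIRED_FIELDS = ("email", "full_name", "tenant_name", "plan")


def parse_signup(form: dict) -> SignupRequest:
    """Validate a submitted sign-up form (field names are hypothetical)."""
    missing = [name for name in REQUIRED_FIELDS if not str(form.get(name) or "").strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    email = form["email"].strip().lower()
    if "@" not in email:
        raise ValueError("invalid email address")
    try:
        plan = Plan(form["plan"].strip().lower())
    except ValueError:
        raise ValueError(f"unknown plan: {form['plan']!r}") from None
    return SignupRequest(
        email=email,
        full_name=form["full_name"].strip(),
        tenant_name=form["tenant_name"].strip(),
        plan=plan,
    )
```

Keeping user fields and tenant fields in one validated request makes it explicit that both identities are captured at the same moment, before any provisioning starts.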
Once that form is submitted, I go through a multi-phase process of actually provisioning and creating my footprint as a SaaS tenant. The first step of that process runs along the bottom here, where you'll see an identity broker and an identity provider. This is a very traditional identity relationship: I've relied, hopefully, on some third-party solution, Auth0, Okta, all kinds of great partners who implement these solutions, to provide the fundamentals of getting my user created. They also give me a pluggable construct: by using an identity broker, I isolate myself from individual providers, so who I use as the provider, who I use for MFA, and who I use for each dimension of identity is hidden from me behind the broker. So through this process I sign up, I create my user, and I go through some validation process to prove I am who I say I am.

The second phase, across the middle of this diagram, is how I actually provision the tenant. I have the user, but I still don't have a tenant. In the tenant flow, I go through a tenant management service, which provides some degree of isolation for the tenant's data and constructs. The part people often overlook is that I also have to integrate with any third-party systems that are part of my tenant experience. In this case I've shown billing; it's very common for people to rely on a third-party billing system as part of their multi-tenant solution, and as part of creating a tenant I have to tell the billing system which type of tenant you are and which plan you signed up for. That becomes the account management view into this universe. The last bit of this is the creation of the IAM policy, which we'll dig into more shortly.
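The provisioning phases above can be sketched as a single orchestration function. This is a minimal sketch, not the talk's implementation: `InMemoryIdentityBroker`, `InMemoryTenantService`, and `InMemoryBilling` are hypothetical in-memory stand-ins for the identity broker, the tenant management service, and the third-party billing system, and the generated policy assumes a pooled DynamoDB table whose partition key is the tenant ID, scoped with IAM's `dynamodb:LeadingKeys` condition key.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryIdentityBroker:
    """Hypothetical stand-in for an identity broker fronting a provider."""
    users: dict = field(default_factory=dict)

    def create_user(self, email: str, tenant_id: str) -> str:
        user_id = uuid.uuid4().hex
        # Storing the tenant ID as a user attribute is the user/tenant binding.
        self.users[user_id] = {"email": email, "tenant_id": tenant_id}
        return user_id


@dataclass
class InMemoryTenantService:
    """Hypothetical tenant management service."""
    tenants: dict = field(default_factory=dict)

    def create_tenant(self, tenant_id: str, name: str, plan: str) -> None:
        self.tenants[tenant_id] = {"name": name, "plan": plan, "status": "active"}


@dataclass
class InMemoryBilling:
    """Hypothetical third-party billing integration."""
    accounts: dict = field(default_factory=dict)

    def register_account(self, tenant_id: str, plan: str) -> None:
        self.accounts[tenant_id] = plan


def tenant_policy(tenant_id: str, table_arn: str) -> dict:
    """IAM policy limiting DynamoDB access to items whose partition key is this tenant."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [tenant_id]}
                },
            }
        ],
    }


def onboard_tenant(
    name: str,
    plan: str,
    admin_email: str,
    broker: InMemoryIdentityBroker,
    tenants: InMemoryTenantService,
    billing: InMemoryBilling,
    table_arn: str,
) -> dict:
    tenant_id = uuid.uuid4().hex
    # Phase 1: create the user through the identity broker, bound to the tenant ID.
    user_id = broker.create_user(admin_email, tenant_id)
    # Phase 2: tenant record, billing registration, and a tenant-scoped policy.
    tenants.create_tenant(tenant_id, name, plan)
    billing.register_account(tenant_id, plan)
    # In AWS, this document would be attached to a per-tenant IAM role.
    policy_document = json.dumps(tenant_policy(tenant_id, table_arn))
    return {"tenant_id": tenant_id, "user_id": user_id, "policy_document": policy_document}
```

Failure handling (for example, rolling back the user if billing registration fails) is deliberately omitted; a real onboarding flow would need it, since a half-provisioned tenant is hard to clean up later.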
As part of creating this tenant, I need to create policies that will scope access to resources, and this is the natural moment for that to happen. I've also shown one optional piece here: in some multi-tenant environments, provisioning a domain is part of onboarding, so each tenant gets its own domain, and a certificate is created as part of that. When this is all done, I've got a tenant created, a user created, a binding between the two, and I'm ready to go.

So what does it actually look like when I sign in? I've said I've created this user, but how do the moving parts of the solution work once that user authenticates? The authentication part is pretty straightforward; it's what you're used to. I hit the web app, it redirects to the identity broker, the broker goes to the provider, authenticates me, and returns an identity token. But at the moment it returns the identity token, I haven't really solved the SaaS identity part of the problem, because getting in and getting an identity token isn't enough. It hasn't constrained my view of the universe in any significant way. You can rely on the app to do some of that work, but what I like to see here are the IAM policies we provisioned in the prior step. I've got roles and a set of policies provisioned specifically for this tenant. Now I use the AWS Security Token Service (STS) to call AssumeRoleWithWebIdentity: I pass in your ID token and the role you want, and that role has bound to it the set of policies I created for this tenant. Those policies control and scope access to resources, all managed through the temporary credentials STS returns. Those temporary credentials and my ID token flow back, and my interaction with all the AWS resources is now scoped to that tenant.
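The token exchange described above maps onto the STS `AssumeRoleWithWebIdentity` API. A minimal Python sketch, assuming a boto3 STS client (`boto3.client("sts")`), whose `assume_role_with_web_identity` method accepts `RoleArn`, `RoleSessionName`, `WebIdentityToken`, and `DurationSeconds`, and assuming the tenant role's trust policy trusts your identity provider. The role ARN and session-name scheme are hypothetical, and `FakeSts` is an illustration-only stand-in so the example runs without AWS.

```python
from typing import Any, Protocol


class StsClient(Protocol):
    def assume_role_with_web_identity(self, **kwargs: Any) -> dict: ...


def tenant_session_credentials(
    sts: StsClient,
    id_token: str,
    tenant_role_arn: str,
    user_id: str,
    duration_seconds: int = 900,
) -> dict:
    """Exchange an identity token for temporary credentials scoped by the tenant role."""
    response = sts.assume_role_with_web_identity(
        RoleArn=tenant_role_arn,                  # role carrying this tenant's policies
        RoleSessionName=f"user-{user_id}"[:64],   # STS caps session names at 64 chars
        WebIdentityToken=id_token,                # ID token from the identity provider
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    # Key names match the keyword arguments accepted by boto3.Session(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


class FakeSts:
    """Stand-in for boto3's STS client, for illustration only."""

    def __init__(self) -> None:
        self.last_request: dict = {}

    def assume_role_with_web_identity(self, **kwargs: Any) -> dict:
        self.last_request = kwargs
        return {
            "Credentials": {
                "AccessKeyId": "AKIAEXAMPLE",
                "SecretAccessKey": "example-secret",
                "SessionToken": "example-token",
            }
        }
```

Against real AWS, `boto3.Session(**tenant_session_credentials(boto3.client("sts"), ...))` would yield a session whose calls are limited by whatever policies the tenant role carries.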
Everything is constrained to a view that is only appropriate for that tenant. For me, this is an excellent way to add an extra layer of isolation. Even if nobody else is demanding it, it's something you should want as a way of assuring yourself that you've created these boundaries and eliminated the potential for cross-tenant access. IAM is really handy in this context. If you've looked at IAM, it lets you control the kinds of operations available on a resource and the scope and visibility of those resources, and you can apply it at different granularities to different resources. You can imagine that in a universe where I'm provisioning tenant resources, whether instances, tables, or S3 buckets, each of those could belong to a different tenant, and I'm using IAM as the tool to scope and control access to them.

The last bit is tenant context. I mentioned the problem of how tenant context flows through the system, and I'll show you one of the mistakes I made when I built one of my first SaaS applications. I had my normal pages: you hit your home page, your catalog service, your cart service. Every single time I hit the catalog, for example, I needed to know who the current tenant was. How do I resolve that? I'd go over to the tenant service, find out who the current tenant is, and resolve the context for that tenant. Then, when the cart service needed to resolve it, the cart would go and do the same thing. Quickly this became a huge problem for me, because the tenant management service became a bottleneck in my system. Imagine this in a universe where I have 200 or a thousand microservices, all of which are going back to some tenant service to continually resolve their context. Well, now you start
throwing cache and compute at the tenant service to try to solve a problem you never should have had to begin with. This is where the auth standards and the tools that are out there can solve the problem for you and let you inject the context you need and flow it through the system. So instead of using the tenant service, I introduce an auth service, which is one of the identity providers out there acting as my identity broker. When I first come in and authenticate, I get back a JWT (JSON Web Token) that has claims in it, and I configure that JWT with my true SaaS identity: not just my user identity, but my tenant identity, my role, and whatever other data I think is relevant and needs to follow me through all of these services. That token then flows through to the individual services and provides me with context that follows me everywhere I go, with no need to make a round trip to resolve it. This just relies on OAuth and OpenID Connect, very open standards for passing these tokens around, so it's really just about leveraging the tools that are already there.

So what are the big takeaways from the identity part of this problem? I hope you see that identity, when you're thinking about SaaS, is a much bigger problem than just how to get in the front door of your app. The bigger point here is to lean on third-party tools, the Oktas, the Auth0s, the Pings, and the Duos of the world, and on the partner ecosystem to give you the innovation you need to build the identity parts of your solution. You don't want to home-build all of this and bind to it, and then, when new capabilities show up in the identity space, have to spend all this effort and energy trying to unwind it. That's a very difficult path.
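To illustrate the token-based approach to tenant context described above, here is a minimal Python sketch of issuing and reading a JWT that carries tenant claims. It signs with HS256 and a shared secret only to stay within the standard library; identity providers commonly sign with RS256, in which case services verify against the provider's published keys instead. The claim names `tenant_id` and `role` are hypothetical, since each provider has its own conventions for custom claims.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, user_id: str, tenant_id: str, role: str, ttl_seconds: int = 3600) -> str:
    """Auth-service side: bundle user identity and tenant identity into one signed token."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "tenant_id": tenant_id, "role": role, "exp": int(time.time()) + ttl_seconds}
    signing_input = f"{_b64url(json.dumps(header).encode())}.{_b64url(json.dumps(claims).encode())}"
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def tenant_context(secret: bytes, token: str) -> dict:
    """Service side: verify the token and read tenant context locally,
    with no round trip to a tenant management service."""
    header_b64, claims_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    expected = _sign(secret, f"{header_b64}.{claims_b64}")
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= time.time():
        raise ValueError("token expired")
    return {"user_id": claims["sub"], "tenant_id": claims["tenant_id"], "role": claims["role"]}
```

In the catalog-and-cart example, each service would call `tenant_context` on the incoming token, so resolving the tenant costs a signature check rather than a call to a shared tenant service.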
Instead, if you can, bind to one of these solutions and use the identity broker pattern. Put the identity broker there, let it treat these providers as pluggable resources, and use them as you need. Who you use for MFA today could be one provider; tomorrow you could decide to swap it out and use somebody else. The broker gives me the ability to do that and keeps me from being bound directly to any one provider's identity solution.

The other point is that I feel automation is super important here. Sometimes people build all these robust mechanisms, with policies, isolation management, and identity mechanisms, but they don't put full automation and regression testing around them. If I'm relying on these policies to enforce isolation, what am I doing to validate that it's actually working? What happens if something goes wrong and a hole opens up in my security? How do I identify that, and how do I catch it? Invest in automation every way you can. Hopefully you've also seen that tenant context can be resolved using some of the existing mechanisms, which lets you introduce this notion of SaaS identity.

The last bullet point is probably the most important one to me: you have to think about how identity actually lands in your developer experience. If, as a developer, I'm constantly aware of what we've done with identity and it's part of my everyday development work, my productivity isn't going to be especially good. So even though there are a lot of moving parts to getting identity working, my hope is that it's still a very seamless experience, where those tokens just flow through and become something you bind to and use as a developer, rather than continually baking security policies directly into every service you write. If you are doing that, you should at least ask yourself whether that's a good implementation and whether your architecture is really doing for you what you want.
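The identity broker pattern described above can be sketched as an interface the application codes against, with providers plugged in behind it. A minimal Python sketch; `MfaProvider`, `IdentityBroker`, and the `FixedCodeMfa` test double are hypothetical names for illustration, not any vendor's API.

```python
from typing import Protocol


class MfaProvider(Protocol):
    name: str

    def send_challenge(self, user_id: str) -> None: ...

    def verify(self, user_id: str, code: str) -> bool: ...


class IdentityBroker:
    """Application code depends only on the broker, never on a specific provider."""

    def __init__(self, mfa: MfaProvider) -> None:
        self._mfa = mfa

    def use_mfa_provider(self, mfa: MfaProvider) -> None:
        # Swapping providers is a configuration change, not an application rewrite.
        self._mfa = mfa

    @property
    def mfa_provider_name(self) -> str:
        return self._mfa.name

    def start_mfa(self, user_id: str) -> None:
        self._mfa.send_challenge(user_id)

    def complete_mfa(self, user_id: str, code: str) -> bool:
        return self._mfa.verify(user_id, code)


class FixedCodeMfa:
    """Test double that accepts one fixed code, handy for automated regression tests."""

    def __init__(self, name: str, code: str) -> None:
        self.name = name
        self._code = code
        self.challenged: list[str] = []

    def send_challenge(self, user_id: str) -> None:
        self.challenged.append(user_id)

    def verify(self, user_id: str, code: str) -> bool:
        return user_id in self.challenged and code == self._code
```

Because the broker is the only seam, the same test doubles can drive the automated checks the talk recommends, without depending on any live provider.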
The next area I want to look at is isolation, and to me isolation is just a given. Almost every customer I deal with has some variation of isolation in their multi-tenant solution. Yes, people like the pool environment, but invariably the forces of business and the needs of customers lead them down some flavor of isolation. So the question is: what are the different ways you can implement isolation with AWS, and what are the common SaaS techniques for achieving it? There are probably multiple patterns, but these are the ones I see most clearly. Full-stack isolation, which we've talked about a little, where we give each customer their own stack; the question there is which AWS constructs we use to realize it. Network isolation: which networking constructs, VPCs and subnets, can we use to isolate customers? And layered isolation: are there ways at the application level to have tiers, layers, or clusters of services with their own isolation schemes?

The simplest one to talk about is full stack, and we've already hit on it: separate stacks, separate resources. But there's one nuance: even with full-stack isolation, I want to project the illusion of a fully shared multi-tenant experience to the tenant who's consuming this environment. I want them to have no awareness that they're running in siloed stacks. So onboarding, billing setup, and the other pieces that are horizontal to all your tenants sit on top of this, directing and routing traffic to land in the appropriate stacks. The other point is that I don't want you to think of this purely as an EC2-based model. I can get full-stack isolation with any of the compute models.
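The routing layer that sits on top of siloed stacks can be sketched as a lookup from tenant ID to that tenant's stack. A minimal Python sketch, assuming the tenant ID has already been read from the caller's token and that onboarding registers each stack's base URL once it is provisioned; `TenantStack`, `StackRouter`, and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantStack:
    tenant_id: str
    base_url: str  # entry point of this tenant's dedicated stack (hypothetical)


class StackRouter:
    """Shared front door that hides siloed stacks behind one tenant experience."""

    def __init__(self) -> None:
        self._stacks: dict[str, TenantStack] = {}

    def register(self, stack: TenantStack) -> None:
        # Called by the onboarding flow after a tenant's stack is provisioned.
        self._stacks[stack.tenant_id] = stack

    def route(self, tenant_id: str, path: str) -> str:
        stack = self._stacks.get(tenant_id)
        if stack is None:
            raise LookupError(f"no stack provisioned for tenant {tenant_id!r}")
        return f"{stack.base_url.rstrip('/')}/{path.lstrip('/')}"
```

In a real deployment this mapping would more likely live in the routing tier itself (a proxy, load balancer rules, or DNS) than in application memory, but the shape of the decision is the same.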
use containers, put those containers in clusters, and have clusters be a way of achieving isolation, just like I could with EC2. The big bit of this model is obviously that I'm biting off a huge provisioning problem: setting up these environments and automating their provisioning is a big undertaking. One way I've seen people achieve full-stack isolation is with AWS linked accounts. With linked accounts I have a payer account and child accounts associated with it, and the nice part is that all the resources I allocate inside a linked account are visible in that one context. The better part is the billing side: if you look at your AWS bill and how it's aggregated and viewed, each linked account shows up as a separate portion of the bill. That gives me a very natural way to ask what a particular tenant is consuming and to correlate their infrastructure consumption with the actual bill from AWS. It's a very natural fit. It's also sometimes a good console experience, because people can go into the console and see what's going on with a tenant in that very specific context. The challenge is that it's not a great fit for every SaaS solution. If you tell me you're only going to have 10 or 100 customers and you want to use the account as your silo model, I'd say that's probably OK; it will scale. But if you say you have 100 now and expect to scale to 10,000, or only want the possibility of scaling to 10,000 or 50,000, the account-based silo model is not going to scale very effectively with you. You have to think about the fact that linked accounts themselves have limits, but also how you're going to set account limits and how you're going to deal with the default limits for each
one of these environments. The orchestration of all that doesn't scale very effectively, and it becomes a much more complex problem. It's totally viable, though, if you have a more limited set of tenants. Now, the model I like to advocate is what I call hybrid isolation. This is the purist in me, and I know it isn't practical for everybody, but if you could get here, it's the ideal way to get the best of isolation while keeping the agility you really want in a SaaS solution. I'd like to start with everything multi-tenant and shared, if I can. So I create my shared environment, I build all the great DevOps and all the agility around it, and I get all the goodness I want out of that experience, including the cost optimization. Then, if somebody says they want an isolated environment, or they're willing to write a big enough check (which is what typically happens) and the business says this customer will only run in an isolated environment, I carve out of my multi-tenant environment a single copy, or clone, of that environment for that one tenant. But I am not going to allow one-off variation for that tenant. I'm going to resist that temptation and keep the same DevOps tooling, so that new features and functions, as they come out, are pushed to every tenant in a universal way. To me, the minute I carve one of these out and let it have its own path and its own customizations, I've lost all the promise of SaaS and agility. This is a constant struggle, and a real one for many people inside their businesses, because someone will say, well, it's already carved out, it's already separate, without really understanding the cost: as we roll out new features and new functions and change them in the base, we can't
roll them to everybody else, and suddenly what started out as this great vision for SaaS is back to the same thing you had before you moved to SaaS. It's a constant struggle; I've been through those struggles myself, and they're never easy. But if the purist in me can get you there, that's where I'd like you to end up. Now, network isolation is a much cleaner story. We could, for example, create a VPC for each tenant and then use VPC peering, where a peered VPC gives me a management view across all the tenant VPCs. That's how I get a cross-tenant view of all the activity going on inside them. But as I move to VPCs I lose the niceness I had with the account construct: billing and the attribution of resources to a given tenant become more my responsibility. So now I have to introduce tagging, or some other scheme, to say which resources belong to which tenants. It's not particularly complex, just something more you have to think about. A variation I don't see very often, though I've seen it a few times, is the subnet: we get even more granular, create a separate subnet for each tenant, and then use the subnet constructs to control the flow in and out for each tenant. Again, tagging is essential here. I would say the VPC model is probably used heavily by a lot of customers; again, you have to think about how it's going to scale long term and how many tenants you're going to have, to know whether it's a long-term fit for you. The last one is layered isolation, and layered isolation often looks like an evolution to me. I start out with full stack, on the left-hand side, where all I've got as a shared experience is my onboarding and my administration, and
then over time I'll say the web tier of my solution can be shared now. It's only got static assets, so we can refactor a little and make the web tier a shared, multi-tenant construct. That pushes the boundary a little further: onboarding and the web tier are both multi-tenant, but the app tier, either because of how it's built or because of your customers' requirements, remains single-tenant, and each tenant has its own storage as well. Then over time you might say, we think we can now extend that reach all the way down to the app tier; we've found a way to make the app tier multi-tenant. You see this gradual evolution of deciding where the boundary of multi-tenancy sits in your environment. It isn't always this clean; sometimes it's a pocket of these services and a pocket of those, and you see people introducing multi-tenancy in layers where it didn't exist before. The last one, which I'm very passionate about, is serverless SaaS. If we're going to talk about isolation, I feel like serverless has a very compelling story here. In fact, if you're not attending a serverless session and you're interested in SaaS, I recommend you find one and get very familiar with the moving parts of AWS serverless, because I think it's a very natural fit for SaaS environments. Here's a very typical stack: we've got S3 and some buckets, we're using CloudFront to serve from the edge, and CloudFront may be giving us our DDoS protection; those are the same things we would normally do. But then we rely on API Gateway as the entry point to all the services that make up our SaaS solution, so I get throttling, metering, and all the goodness of versioning and managing my REST APIs, all pushed out at scale to the managed API Gateway
service. All the functions of my application are implemented as a series of Lambda functions, and storage sits on the other side of that. The reason this one is so compelling to me, besides its optimal use of resources and the fact that I only pay for functions when they're actually running, is that it gives me a compelling way to address the isolation story without introducing an entire stack for each tenant. I can say: I've got a set of functions; do you want to run them in isolation for a tenant? OK, run them in an IAM context where you're only operating in the context of that tenant. Whichever functions that tenant happens to run, yes, you'll pay for their execution, but there's no notion of idle servers, or parts of your stack just waiting around for activity. Even with elasticity, when things scale down there's still some footprint. With Lambda, if I'm not calling any of those functions because a tenant is relatively quiet and doing nothing for certain parts of the day, then even though that tenant is isolated, it consumes no compute resources, because it isn't invoking anything on Lambda. So for me, serverless represents the best mix: an isolation story that is compelling to your customers, if they're willing to accept that flavor of isolation, and still a great match between infrastructure consumption and tenant consumption. I should also mention that it's a great fault-tolerance story, because with the lifecycle and the granularity of these functions you tend to get a more fault-tolerant experience, and in a zero-downtime SaaS universe, anything that improves fault tolerance is a plus. So, on compute partitioning and this idea of tenant isolation: one of the challenges I see is that people assume tenant isolation is for all their tenants
because some of their tenants are already in that model. I was out at a customer who said: we have five key members of our client base who all need siloed environments, so everything we do must be siloed. But the business side of the house was saying: we're trying to address new markets, we have new kinds of customers who may not demand a silo, and we might actually have a better business model if we could offer the solution to them in a shared model. So don't project from the fact that some tenants are already siloed to the conclusion that everybody has to be siloed; challenge that. The other bit here is that if you can start with pooled and work your way to siloed environments while keeping that discipline of no one-off customization, that's a really good path to go down. Whichever isolation scheme you use, create one single, aggregated view of system health and activity. The ops side of your universe has to have something that aggregates all this information and presents it as one view, because we still want an agile experience for our ops people; we want to hide the fact that everything is fully distributed from them, or at least give them better tools for managing it. You also have to think about limits. AWS has default limits for services, so if you're automatically provisioning services, and entire environments, on the fly, how do those limits get set, and how are you going to control the adjustment of those limits as you create and introduce new tenants? Using tags is important. And in general, don't let partitioning be the enemy of agility: keep your eye on agility and make it a goal. The last area we're going to touch on is data partitioning schemes. In fact, I've just released a white paper on storage
schemes for data partitioning on AWS. This is probably the most talked-about area, with the most literature out there, and the patterns for how people partition their data are very common. What's different here is that the AWS storage services each have their own nuances in how they implement data partitioning. But if we generalize the three flavors of data partitioning, you'll find they mirror the silo, bridge, and pool models we've been talking about. On the left, in the silo model, conceptually I have a separate instance or a separate database for every tenant. In the bridge model, all the tenants go into one database, but I allow schema-level variation, so each tenant can have its own schema. Finally, in the pool model, all the tenants are in one database, sharing one schema, and I use some kind of indexing to partition, or shard, those tenants. One thing people don't think about with this problem is this: they'll say, we're going all in, we're going shared, we'll use a foreign key inside RDS or something to partition our data, now it's sharded and we're all set. What they don't account for is that the distribution of data across tenants in a multi-tenant environment is rarely equal. You almost always have some set of tenants imposing a disproportionate load on your system, and they will have a disproportionate effect on the keys in your environment and on how those keys are distributed. So if three tenants are huge and a hundred are small, those three could be imposing a load that both raises the cost of your solution and impacts the other
hundred. So whatever you come up with to address hot keys, you have to consider from the beginning how the distribution, the size, and the nature of the data will affect your sharding scheme. A common approach I'm seeing people apply right now is a shard of shards: they introduce a level of indirection where they can insert themselves into the scheme and control how and when data is sharded. So I can say tenant one has ten shards, because it has a really huge data footprint, but tenant two has only two shards, because its data footprint is much smaller. Now I can intelligently decide which shards are assigned to which tenants; I get a way of controlling distribution, and I avoid throwing resources at my storage constructs to try to overcome these data distribution problems. On RDS, multi-tenancy is probably the most straightforward scheme you'll find. In the silo model I use an instance per tenant; yes, we might think about how we shard and distribute those instances, but it's a pretty straightforward scheme. If I want the bridge model, I have a common instance but separate tables, so I name and scope the tables on a tenant-by-tenant basis. And finally, in the pooled model on the right-hand side, I have everything in one instance with common tables, and I use some sort of foreign key. Pretty straightforward. DynamoDB, by contrast, is a very different beast, with a very different set of considerations. With DynamoDB I don't have the notion or construct of an instance or a database; it's just one big managed service in a region, and for an account I have one global namespace for all the tables in that region. So
how do I implement a silo model inside DynamoDB? I have to rely on IAM, and on tables, as the way of achieving that silo. I end up creating a group of tables for each individual tenant, naming them based on the tenant, and surrounding them with an IAM policy that says these tables are owned by this tenant. That's about as close as I'm going to get to a silo without something more exotic. The more exotic model, which is not a scalable one, would be to create a separate linked account for every single tenant; then I can actually have shared table names, but now I run into the question of whether linked accounts will scale with me effectively. So I tend to steer people toward the table-per-tenant approach rather than the linked-account model. Next I want to show DynamoDB in the pool model, factoring in the reality that we want to avoid hot keys. For me, that means introducing another table in DynamoDB that becomes my tenant lookup table. On this slide, the top table is my tenant lookup table, and that's how I resolve the sharding scheme for each individual tenant. I've only shown one item here, but obviously all the tenants would be listed in this lookup table. Then, for each table used by that tenant, I have sharding data that describes how that particular table is sharded, because even from table to table a tenant's load profile might be different: the sharding scheme needed for this tenant's customer table might differ from the one needed for its account table, and I want to take that into consideration. If you look at the customer table entry, you'll see I've got three shards, some notion of how big those shards are, and finally a collection of shard IDs. So when I look down at the actual customer table,
my partition key isn't a tenant ID; my partition key is now the shard ID. Going through the tenant lookup table to figure out which collection of shard IDs belongs to a given tenant is a level of indirection you have to bite off if you're going to avoid the hot-key problem. There are many approaches to this: people will round-robin across shards, and they'll do all kinds of things around collecting metrics to adjust these shards on the fly. I also see people adjusting them manually sometimes. Just don't overlook this. The other bit is to think about optimizing in real time. Imagine you're running DynamoDB multi-tenant, with all the fluctuations in provisioned throughput (IOPS) that come with it, and you have to decide where to set that throughput in a multi-tenant environment to get the best cost and efficiency profile. You can't set it to some static level, because the load right now will look different in an hour, and different again an hour after that. So what I want you to do is look at real-time tools. This example uses an open source tool called Dynamic DynamoDB, which lets me set policies and mechanisms that track the actual activity going on across tenants and adjust provisioned throughput in real time based on the load, so my throughput stays just good enough without over-provisioning. I'd like to see this applied generally as a storage strategy across multi-tenant environments if we're trying to optimize for cost. The last bit comes from my storage section, but it's really a more global concept: for all the mechanisms we've talked about (security, data partitioning, tenant context), however they flow through my system, I absolutely want to hide the awareness of multi-tenancy from the everyday developer of my system. So
if I show up on day one at your organization and you say, hop in here and start writing a brand new service for our system, what do I have to know about multi-tenancy? What do I have to know about how to log, how to get to the storage, how to resolve security? I want all of those bits hidden away from me, isolated from me, and abstracted away with frameworks and tools. Data partitioning is the natural place to have this discussion, because whichever partitioning scheme I choose, I don't want the author of a service to know anything about it. The last bit: I said I wouldn't cover the operational view, but I feel like I really have to emphasize the importance of at least one dimension of it, and that's management and monitoring. When I talk to people about SaaS management and monitoring, their first reaction is: I have good tools, I'm using Splunk or Sumo Logic or Kibana. There are awesome tools out there that let us aggregate logs and see trends and activity inside our solutions, and we tend to assume that makes us all set for a SaaS environment. But my experience, from actually being out building these SaaS solutions, is that even though these tools are great, I still have to go through another layer of customization and configuration to get the multi-tenant context I'm really looking for. If I'm an ops person, I want a cross-tenant view of performance and activity that shows me what's performing well and where the hotspots are at the multi-tenant level. But I also want to be able to say: a tenant just called, and even though overall multi-tenant health looks good, this tenant is having problems, so how can I drill in, in this experience and in
these tools, and see on a tenant-by-tenant basis where a tenant may be experiencing problems and where they may need some help. So for me it's about taking these tools and asking yourself what the ops experience is, and what the dev experience is. I've been on the dev side of this one: we say, hey, we've got these great tools, we're digging in, I can see all the services running, I know which services are healthy, I've got all these logs I can comb through with great analytics, but I can't assemble a multi-tenant view of the system's health that really helps me figure out what's going on and how to fix it. Now imagine an environment where your whole business is all in on multi-tenant and shared, so if something goes down, all your customers go down. You're going to want to be ahead of that curve, proactively seeing problems before they show up, and you're going to want to see them in a tenant context. My point is that instrumenting these environments and setting up your system to capture all these bits isn't just about capturing the traditional metrics. It's not just the log data, and it's not just CloudWatch metrics; it often means instrumenting the actual services of your solution with more knowledge about what's going on in your system, so you can build an effective management and monitoring view. What am I actually logging? What are the metrics? I find people come up with their own metrics, inventing new ones to say: these are the health metrics of our environment, based on what we know about our domain, the patterns of our services, and the patterns our users follow. They instrument these custom metrics into their services, aggregate them, and create views and dashboards that go beyond just what's the latency, am I throwing errors, what's CPU, what's
memory; I get these domain-specific concepts bubbling up to the surface. So what are the overall takeaways from this session? No matter which approach you take with SaaS, generally the goal of SaaS is to achieve agility, so don't let the technology get in the way of your agility goals. I would say try to bake agility into every decision you make, or, if agility is not your goal and SaaS is just a good delivery model for you, acknowledge that as well, because that might push you somewhere else on the spectrum. Leverage third-party solutions wherever you can. Third-party solutions are essential here: metering, billing, identity, all these different tools can have a huge impact on the productivity and innovation you get in your solutions. I'd also really like to see the business and technical sides of the house pushing one another when you're developing a SaaS architecture. As an architect, ask yourself in what ways you need to enable the business. Typically you launch a new SaaS solution with three tiers without having given a lot of thought to pricing or to the different ways people are going to want isolation; push the business side to find the points of inflection that add value for your SaaS customers. I'd also like to see more than identity used to achieve isolation; I'd like to see IAM baked into it, enforcing all flavors of isolation and preventing cross-tenant access wherever you can get it. From a storage perspective, hopefully you saw that the distribution of tenants' data, and the irregularities in that distribution, can have a huge impact on the storage and sharding schemes you choose, so think about the profiles of your tenants' data and how they might affect the sharding scheme that
you select. And the last bit is all of this monitoring, metering, and metrics. SaaS environments live and breathe on metrics: you're tuning, tweaking, fixing, and solving based on whatever good metrics you have flowing into your system, so invest heavily in those metrics, rely on them, and use them to find the path for your system as you go forward. Anyway, hopefully that gives you a good, high-level view of the options you have when you're thinking about SaaS architecture. Clearly there are a lot more moving parts to this discussion; there's a session on SaaS optimization later today if you'd like a deeper dive on optimization strategies. I hope this was helpful, I really appreciate you attending, have a great day, and enjoy the conference.
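For full-stack isolation, the talk describes a shared layer (onboarding, billing, routing) sitting on top of per-tenant stacks and steering each tenant's traffic into the right one, so the tenant never sees the silo. A minimal Python sketch of that routing step, assuming the tenant ID has already been resolved from the request (for example from an authentication token) and that onboarding registers each stack's endpoint; `TenantRouter`, `StackRecord`, and the example URLs are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StackRecord:
    tenant_id: str
    endpoint: str  # base URL of this tenant's siloed stack (placeholder)


class TenantRouter:
    """Shared routing layer that sits above every siloed stack.

    Tenants call one front door; the router forwards each request to the
    stack that onboarding provisioned for that tenant.
    """

    def __init__(self) -> None:
        self._stacks: dict[str, StackRecord] = {}

    def register(self, tenant_id: str, endpoint: str) -> None:
        # Called by onboarding once the tenant's stack is up.
        self._stacks[tenant_id] = StackRecord(tenant_id, endpoint)

    def route(self, tenant_id: str, path: str) -> str:
        # Unknown tenants are rejected instead of falling through to a default stack.
        record = self._stacks.get(tenant_id)
        if record is None:
            raise KeyError(f"no stack registered for tenant {tenant_id!r}")
        return f"{record.endpoint.rstrip('/')}/{path.lstrip('/')}"


router = TenantRouter()
router.register("tenant-a", "https://tenant-a.stacks.example.com/")
router.register("tenant-b", "https://tenant-b.stacks.example.com")
print(router.route("tenant-a", "/orders/42"))
# https://tenant-a.stacks.example.com/orders/42
```

Keeping the registry in the shared layer is what lets the stacks stay invisible: moving a tenant to a different stack only changes its registered endpoint.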
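The hybrid isolation rule from the talk (clone the multi-tenant environment for a tenant who needs isolation, but allow no one-off variation) is ultimately a deployment discipline. A minimal sketch, assuming every environment, pooled or carved out, is tracked together with the version it runs; `Fleet`, `Environment`, and the version strings are hypothetical, and a real pipeline would deploy builds rather than just record versions:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Environment:
    name: str  # "pool" or a carved-out clone such as "silo-tenant-x"
    version: str = "unreleased"


@dataclass
class Fleet:
    environments: list[Environment] = field(default_factory=list)

    def release(self, version: str) -> None:
        # One release goes to every environment: the shared pool and
        # every carved-out silo receive exactly the same build.
        for env in self.environments:
            env.version = version

    def drifted(self, reference: str = "pool") -> list[str]:
        # The shared pool is the reference; any silo running something
        # else has picked up a one-off variation that needs reconciling.
        expected = next(e.version for e in self.environments if e.name == reference)
        return [e.name for e in self.environments if e.version != expected]


fleet = Fleet([Environment("pool"), Environment("silo-tenant-x"), Environment("silo-tenant-y")])
fleet.release("2.4.0")
print(fleet.drifted())  # []
fleet.environments[1].version = "2.4.0-custom"  # a one-off patch for one tenant
print(fleet.drifted())  # ['silo-tenant-x']
```

Treating drift as something to detect and reconcile, rather than a per-tenant setting, is one way to keep carved-out tenants on the same release path as everyone else.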
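Once tenants are separated by VPC or subnet rather than by account, the talk notes that attributing resources to tenants becomes your job, typically through tagging. A small sketch of that bookkeeping, assuming resource descriptions carry a list of `{"Key": ..., "Value": ...}` tags (modeled loosely on the tag lists EC2 describe calls return, with a simplified `Id` field) and a hypothetical `TenantId` tag key; fetching the resources from AWS is not shown:

```python
from __future__ import annotations

from collections import defaultdict

TENANT_TAG = "TenantId"  # hypothetical tag key; choose one and enforce it everywhere


def tenant_of(resource: dict) -> str | None:
    # Tags are modeled as a list of {"Key": ..., "Value": ...} pairs.
    for tag in resource.get("Tags", []):
        if tag["Key"] == TENANT_TAG:
            return tag["Value"]
    return None


def group_by_tenant(resources: list[dict]) -> tuple[dict[str, list[str]], list[str]]:
    """Return ({tenant: [resource IDs]}, [IDs of resources with no tenant tag])."""
    grouped: dict[str, list[str]] = defaultdict(list)
    untagged: list[str] = []
    for resource in resources:
        tenant = tenant_of(resource)
        if tenant is None:
            untagged.append(resource["Id"])  # cannot be attributed to any tenant
        else:
            grouped[tenant].append(resource["Id"])
    return dict(grouped), untagged


resources = [
    {"Id": "vpc-0a", "Tags": [{"Key": "TenantId", "Value": "tenant-a"}]},
    {"Id": "subnet-1a", "Tags": [{"Key": "TenantId", "Value": "tenant-a"}]},
    {"Id": "vpc-0b", "Tags": [{"Key": "TenantId", "Value": "tenant-b"}]},
    {"Id": "nat-9", "Tags": []},
]
grouped, untagged = group_by_tenant(resources)
print(grouped)   # {'tenant-a': ['vpc-0a', 'subnet-1a'], 'tenant-b': ['vpc-0b']}
print(untagged)  # ['nat-9']
```

Surfacing the untagged list matters as much as the grouping: anything in it is cost you cannot attribute to a tenant.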
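For the serverless model, the talk suggests running a tenant's functions in an IAM context scoped to that tenant. One way to express such a scope for a pooled DynamoDB table is the `dynamodb:LeadingKeys` condition key, which restricts access to items whose partition key matches the listed values. The sketch below only builds the policy document, assuming the table's partition key is the tenant ID; how a function picks the policy up (for example, as a session policy when assuming a role) is not shown, and the table ARN is a placeholder:

```python
import json


def tenant_scoped_policy(tenant_id: str, table_arn: str) -> str:
    """Build an IAM policy that only allows access to the items whose
    partition key equals this tenant's ID."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
                "Resource": table_arn,
                "Condition": {
                    # Every partition key the request touches must be in this list.
                    "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [tenant_id]}
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(tenant_scoped_policy("tenant-a", "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"))
```

Because the restriction lives in IAM rather than in application code, a bug in a function cannot read another tenant's items; the request is denied before it reaches the data.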
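On limits: when onboarding provisions whole environments automatically, it helps to check the remaining headroom against your account limits before creating the next tenant. A sketch of that check, with an entirely hypothetical per-stack footprint and hypothetical limit values (look up the real defaults for your services and account):

```python
from __future__ import annotations

# Hypothetical footprint of one tenant's stack, per limited resource type.
# Real numbers come from your own templates and your account's limits.
STACK_FOOTPRINT = {"vpcs": 1, "nat_gateways": 2, "instances": 6}


def headroom(limits: dict[str, int], in_use: dict[str, int]) -> int:
    """Number of additional full tenant stacks that fit before any limit is hit."""
    return min(
        (limits[resource] - in_use.get(resource, 0)) // need
        for resource, need in STACK_FOOTPRINT.items()
    )


def can_onboard(limits: dict[str, int], in_use: dict[str, int], new_tenants: int = 1) -> bool:
    # Checking before provisioning leaves time to request a limit increase,
    # instead of failing halfway through creating a tenant's environment.
    return headroom(limits, in_use) >= new_tenants


limits = {"vpcs": 5, "nat_gateways": 10, "instances": 40}
in_use = {"vpcs": 3, "nat_gateways": 6, "instances": 18}
print(headroom(limits, in_use))        # 2 (VPCs and NAT gateways are the bottleneck)
print(can_onboard(limits, in_use, 3))  # False
```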
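The pool model and its skew problem are easy to see in miniature. The sketch below uses SQLite from the Python standard library as a stand-in for a relational engine on RDS: all tenants share one table keyed by a `tenant_id` column, every query must carry the tenant predicate, and the data is deliberately skewed (one hypothetical large tenant, two small ones) to show how unevenly rows land:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pool model: one shared table; the tenant_id column partitions the rows.
conn.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, order_id INTEGER NOT NULL, total REAL)")
conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")


def orders_for(tenant_id):
    # Every query must include the tenant predicate; leaving it out leaks other tenants' data.
    cur = conn.execute(
        "SELECT order_id FROM orders WHERE tenant_id = ? ORDER BY order_id", (tenant_id,)
    )
    return [row[0] for row in cur]


# Hypothetical, deliberately skewed data: one large tenant and two small ones.
rows = [("big", i, 10.0) for i in range(1000)]
rows += [("small-1", 1, 5.0), ("small-2", 1, 7.0)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

distribution = conn.execute(
    "SELECT tenant_id, COUNT(*) FROM orders GROUP BY tenant_id ORDER BY COUNT(*) DESC, tenant_id"
).fetchall()
print(distribution)           # [('big', 1000), ('small-1', 1), ('small-2', 1)]
print(orders_for("small-1"))  # [1]
```

The shared index partitions the data logically, but it does nothing about the fact that one tenant owns almost all the rows, which is exactly the distribution problem the talk warns about.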
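The shard-of-shards idea (tenant one gets ten shards, tenant two gets two, based on data footprint) can be sketched as an allocation step plus a routing step. This is an assumed implementation, not one from the talk: footprints are in arbitrary units, `units_per_shard` is a made-up capacity, shard IDs are tenant-qualified so no shard is shared between tenants, and items are placed by a stable hash:

```python
import hashlib
import math


def allocate_shards(footprints, units_per_shard):
    """Give each tenant enough shards for its data footprint, never fewer than one.

    footprints: {tenant_id: data size in arbitrary units}
    returns:    {tenant_id: [shard IDs]}
    """
    return {
        tenant: [f"{tenant}#{n}" for n in range(max(1, math.ceil(size / units_per_shard)))]
        for tenant, size in footprints.items()
    }


def shard_for(shards, tenant, item_key):
    # Stable hash (Python's built-in hash() is salted per process), so the same
    # item always maps to the same one of the tenant's shards.
    digest = hashlib.sha256(item_key.encode("utf-8")).digest()
    tenant_shards = shards[tenant]
    return tenant_shards[int.from_bytes(digest[:8], "big") % len(tenant_shards)]


shards = allocate_shards({"tenant-1": 1000, "tenant-2": 150, "tenant-3": 0}, units_per_shard=100)
print({t: len(s) for t, s in shards.items()})  # {'tenant-1': 10, 'tenant-2': 2, 'tenant-3': 1}
print(shard_for(shards, "tenant-2", "order-42"))
```

Growing a tenant's shard count changes where existing keys hash with this modulo scheme, so in practice resharding needs a migration or a technique such as consistent hashing; that is outside this sketch.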
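For DynamoDB in the pool model, the talk resolves shards through a tenant lookup table that holds, per tenant and per table, a collection of shard IDs, and then uses the shard ID (not the tenant ID) as the partition key. Below is an in-memory stand-in for that lookup, with writes spread round-robin (one of the approaches mentioned) and whole-table reads fanning out over every shard ID. The item layout, shard IDs, and `ShardResolver` are hypothetical, and a real system would read the lookup items from DynamoDB and cache them:

```python
import itertools

# Hypothetical contents of the tenant lookup table: for each tenant and each
# logical table, the shard IDs to use as DynamoDB partition keys.
TENANT_LOOKUP = {
    "tenant-1": {
        "customer": ["cust-17", "cust-18", "cust-19"],  # heavier table: three shards
        "account": ["acct-04"],                          # lighter table: one shard
    },
}


class ShardResolver:
    def __init__(self, lookup):
        self._lookup = lookup
        self._cursors = {}  # (tenant, table) -> round-robin iterator over shard IDs

    def shard_ids(self, tenant, table):
        # A query over a tenant's whole table has to fan out over every shard ID.
        return list(self._lookup[tenant][table])

    def next_partition_key(self, tenant, table):
        # Writes rotate across the tenant's shards for that table, so no single
        # partition key absorbs all of a busy tenant's traffic. The cursor is
        # built once; a real resolver would rebuild it when the lookup changes.
        key = (tenant, table)
        if key not in self._cursors:
            self._cursors[key] = itertools.cycle(self._lookup[tenant][table])
        return next(self._cursors[key])


resolver = ShardResolver(TENANT_LOOKUP)
print([resolver.next_partition_key("tenant-1", "customer") for _ in range(4)])
# ['cust-17', 'cust-18', 'cust-19', 'cust-17']
print(resolver.shard_ids("tenant-1", "account"))  # ['acct-04']
```

Round-robin placement spreads write load evenly but means a single item can live on any shard, so point lookups need the shard recorded somewhere (or a fan-out); hashing the item key instead trades that for less even spreading.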
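Dynamic DynamoDB's approach, as described in the talk, is to watch actual consumption and adjust provisioned throughput in real time rather than picking a static level. The decision at the core of that kind of tool can be sketched as a threshold rule; this is not Dynamic DynamoDB's code or configuration, and the thresholds, step size, and bounds below are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    upper: float = 0.80   # scale up when utilization exceeds 80%
    lower: float = 0.30   # scale down when utilization drops below 30%
    step: float = 0.50    # grow or shrink capacity by 50% per adjustment
    minimum: int = 5      # never provision below this
    maximum: int = 1000   # never provision above this


def next_capacity(provisioned: int, consumed: float, policy: ScalingPolicy = ScalingPolicy()) -> int:
    """Return the capacity to provision next, given the current provisioned
    capacity and the capacity actually consumed over the last interval."""
    utilization = consumed / provisioned
    if utilization > policy.upper:
        target = math.ceil(provisioned * (1 + policy.step))
    elif utilization < policy.lower:
        target = math.floor(provisioned * (1 - policy.step))
    else:
        target = provisioned
    # Clamp so a quiet hour cannot scale to zero and a spike cannot run away.
    return max(policy.minimum, min(policy.maximum, target))


print(next_capacity(100, 90))  # 150: busy, scale up
print(next_capacity(100, 10))  # 50: quiet, scale down
print(next_capacity(100, 50))  # 100: within band, leave alone
```

A real controller would also have to respect how often DynamoDB allows provisioned throughput to be decreased; those rules are not modeled here.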
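Hiding multi-tenancy from the everyday developer usually means the framework, not the service author, carries tenant context and applies the partitioning scheme. A minimal sketch using Python's `contextvars`: the framework binds the tenant for a request, and a repository (here a hypothetical in-memory, pooled store) scopes every read and write to it, so service code never mentions a tenant ID. `TenantContext`, `Repository`, and `save_order` are illustrative names:

```python
import contextvars

# Bound once per request by the framework (for example from the caller's token).
_current_tenant = contextvars.ContextVar("current_tenant")


class TenantContext:
    """Framework side: bind the tenant for the duration of a request."""

    def __init__(self, tenant_id):
        self.tenant_id = tenant_id

    def __enter__(self):
        self._token = _current_tenant.set(self.tenant_id)
        return self

    def __exit__(self, *exc_info):
        _current_tenant.reset(self._token)
        return False


class Repository:
    """Framework side: the only storage API service code sees. It applies the
    partitioning scheme (here, a pooled store keyed by tenant) on its own."""

    def __init__(self):
        self._rows = {}  # (tenant_id, key) -> value

    def put(self, key, value):
        # ContextVar.get() raises LookupError when no tenant is bound, so calls
        # made outside a request context fail instead of writing unscoped data.
        self._rows[(_current_tenant.get(), key)] = value

    def get(self, key):
        return self._rows.get((_current_tenant.get(), key))


# Service code: no tenant IDs and no partitioning scheme in sight.
def save_order(repo, order_id, order):
    repo.put(order_id, order)


repo = Repository()
with TenantContext("tenant-a"):
    save_order(repo, "o-1", {"total": 10})
with TenantContext("tenant-b"):
    print(repo.get("o-1"))  # None: tenant-b cannot see tenant-a's order
```

Swapping the pooled store for a silo or bridge scheme would change only `Repository`; `save_order` stays the same, which is the point the talk makes about partitioning being invisible to service authors.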
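The talk's call for custom, domain-specific metrics with both a cross-tenant view and a per-tenant drill-down comes down to recording every metric with a tenant dimension. A self-contained sketch of that idea, with a hypothetical `checkout_latency_ms` metric and a simple "twice the fleet-wide average" outlier rule; in practice the samples would flow to whatever monitoring or log analytics tool you already use:

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


class TenantMetrics:
    """Domain metrics recorded with a tenant dimension, so one data set serves
    both the cross-tenant view and a per-tenant drill-down."""

    def __init__(self) -> None:
        self._samples: dict[tuple[str, str], list[float]] = defaultdict(list)

    def record(self, metric: str, tenant: str, value: float) -> None:
        self._samples[(metric, tenant)].append(value)

    def overall(self, metric: str) -> float:
        # Cross-tenant view: every tenant's samples pooled together.
        return mean(v for (m, _), vs in self._samples.items() if m == metric for v in vs)

    def by_tenant(self, metric: str) -> dict[str, float]:
        # Drill-down: the same metric, per tenant.
        return {t: mean(vs) for (m, t), vs in self._samples.items() if m == metric}

    def outliers(self, metric: str, factor: float = 2.0) -> list[str]:
        # Tenants well above the fleet-wide average, even when that average looks fine.
        baseline = self.overall(metric)
        return sorted(t for t, v in self.by_tenant(metric).items() if v > factor * baseline)


metrics = TenantMetrics()
for n in range(9):
    metrics.record("checkout_latency_ms", f"tenant-{n}", 50.0)
metrics.record("checkout_latency_ms", "tenant-x", 400.0)
print(metrics.overall("checkout_latency_ms"))   # 85.0
print(metrics.outliers("checkout_latency_ms"))  # ['tenant-x']
```

The overall figure alone would look acceptable here; only the per-tenant view shows that one tenant is having a much worse experience, which is the scenario the talk describes.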

Info

Channel: Amazon Web Services

Views: 19,132

Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud, AWS re:Invent 2016, aws reinvent, reinvent2016, aws, cloud, amazon web services, aws cloud, re:Invent, ARC301, Tod Golding, Advanced (300 Level), Architecture

Id: rqXDSh0YZdk

Length: 56min 4sec (3364 seconds)

Published: Thu Dec 01 2016