Building SaaS on AWS - Improve Security & Cost with Multi-Tenant PostgreSQL

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
[Music] [Applause] hi everyone and welcome to building sas on aws my name is guna girosh i am a developer advocate here at aws and if you are building planning to build a sas product this is the show for you in previous episodes i have been in the previous episode rather i was joined by sudhir and ananda for a technical look at how to deliver sas using aws private link and before that we've looked at reference architectures for multi-tenant sas using amazon eks and using servles and all of those episodes are of course available to you on demand and that's really what we do here we deep dive into sas specific topics so that you our viewers can learn more about building sas on aws and we want this to be interactive so keep your questions and comments coming in in the chat we're here to answer it we also have a few people joining us in the chat aws experts and this week i always have experts on and this week i'm joined by michael beardsley welcome to the show michael hey thanks for having me gunnar happy to be here i know it is super early for you so thanks for joining us at this hour yeah it's a little early i've got my coffee so i'll try not to uh interrupt my myself too much by trying to stay awake here yeah well we'll make sure to keep you awake with lots of great questions from the chat as well so tell us a bit about yourself michael sure so uh welcome everybody out there in the world wherever you are it's morning for me on the west coast of uh north america um i believe it's afternoon for you gunar um again my name is michael beardsley at the bottom of the screen you'll see the weird couple of letters after my name that's my github handle so as we look at github code if you're if you're out there uh and and you see that weird series of letters that's me um i've been with aws uh three and a half years or so uh working uh on a team that specializes uh helping our customers who are building operating or figuring out how to run a sas business or a subscription-based business using aws um and uh and and it's really exciting my background is is mostly in software um so i didn't really come from the uh infrastructure background like a lot of people who who come to cloud have i'm more of a software head i've been i've been working with stuff since uh since the 90s so um and i've been playing with postgres since since the the late 90s which is basically when it kind of became public out of the ingress project from berkeley so um so i've got a soft spot for postgres i've been playing with it for a long time and i'm excited to uh to talk to talk about how you can better use uh better use that relational engine for your multi-tenant workloads yeah and that really is the topic of this session so we're gonna talk about how you can improve both security and cost with multi multi-tenant postgresql right right and and and we're talking about cost because um because it's a big it's a big part of why companies are moving to a sas delivery model right so you've got all these companies out there they're delivering their software uh they've got multiple customers they're being really successful but they're often they're struggling to grow uh and to uh and to onboard new customers quickly or reach into new markets because it's complicated and and expensive and so sas really uh sas promises a sas delivery model for your business promises more agility your ability as the vendor to move faster uh iterate more delight your customers and as you sort of uh you know simplify and and and standardize your delivery of your product and whether you do that with a with a single set of resources for all of your customers to share which is sort of the common um definition of multi-tenancy or how we like to think of multi-tenancy which is just that you've got lots of tenants whether you gave them different infrastructure or the same infrastructure you're treating that solution as a single solution that you're managing on behalf of those customers and as you get closer and closer to that you're able to speed up your delivery time to your customers and iterate faster and your costs go down as you can share more of that infrastructure and more of that maintenance and and and you can automate more you can you can lower your costs um but it's hard to share infrastructure right it's it's it's difficult to make sure that everything is secure so that all of your customers are in their own space and they're not impacting each other's workloads so that's what we want to talk about today to take that topic and talk about it in the context of databases right because there are some challenges with with the sas delivery model and even even though it might be kind of like the the traditional delivery model it is a bit different and it brings some challenges right that's right you'll often see some kind of the the the same bullet points that you would concern yourself with in a traditional delivery model right you you want a product that's secure you want a product that is performant you want a product that's easy for your team to maintain you wanna you want a product that doesn't cost you too much to operate so that you can make more revenue you want a product that you can automate you you you want all of these sort of same bullet points but all of those issues are kind of amplified if you will they're a little bit harder when you talk about them in the context of of a multi-tenant system because now your your blast radius of your choices that you make and if things go wrong you're affecting more of your customers than you would in a traditional model and it's and it's hard to get security right um and and it's hard to figure out how to share resources and lower those costs and gain some of those efficiencies and that's that's what we help a lot of our customers with is how can you share some of these resources and how can you do it securely um because a lot of people you know security is so important that they'll just default back to more coarse-grained you know models and then they never really get the uh they never really get the ability to move quickly and they never really get the cost savings and so we're really there to try and help everybody succeed help our customers succeed help them you know uh improve their market um where they can go and reach out into to new uh deliveries to different they can either go down market or up market as they reduce their costs it opens up the opportunities for those businesses and so we want to focus on those kinds of topics today yeah and i think you touched on on that topic there with data partitioning i know we'll go more into depth into that in a bit but that's really one of the key things to consider how to partition your data in in whatever multi-tenant solution you're building right that's right so so everybody knows right data data not every workload has uh stored data you know but every workload deals with data but most of the workloads out there in the world deal with persisted data that you've got to figure out how to store and then retrieve and do it in the right context um and and and databases they they're really the sort of the gravity right they're the foundation for so many of these workloads out there and nowadays you've got so many choices right i mean uh where it's an embarrassing level of uh of number of purpose-built databases that you can uh utilize out there and and it's not just nosql right uh you know nosql schema-less databases became popular here uh a few years back but and now we've got you know time series databases and graph databases and ledger databases we've got all these specialized offerings but the reality is is most companies are still using the good old-fashioned relational database model why you know why are they still using it there's all kinds of reasons why um a lot of it is comfort a lot of it is skill set um and you know for a lot of companies it's still the right tool for the for the job right uh and and so um so we want to look at how can you um share those relational databases uh when you have multiple customers coming into your solution and i think that's a really good call out that a lot of companies are using using relational databases a lot of the reference architectures the samples and so on that we're building they are built using dynamodb nosql databases right and there's many reasons for that it's it's easy to to set up start working with usually no cost with just setting that up and having it uh without any usage and so yeah yeah yeah a lot of the samples look that way yeah dynamo dynamo is awesome but um but it's not a fit for everybody um and a little bit of a plug there it it there is a free tier for postgres uh on aws so um so if you want to get in and play with what does the relational database service look like how does it help you you know part i think you know it's interesting the cloud has has delivered all these purpose-built solutions for us right and now we can truly choose the right tool for the right job we don't have to shove everything into a relational model uh if that's not the right fit but at the same time the you know the pay as you consume pay as you go the elasticity and these managed services uh that that the public cloud providers have come up with like the relational database service from amazon and the aurora service have really um sort of if you will breathe you know life into the relational model it's not going to die anytime soon it's still a valid model for everybody to work on and because we've got these uh managed services that are taking some of that uh that management overhead away from you in terms of sort of those traditional dba sort of duties backup restore security patching um you know scale out stuff like that and and so you've got these really modern uh architectures that are being built still on relational models now you can have my goodness you can have you know uh global tables that replicate uh across global regions um and uh you get the durability and the security uh read replicas that just light up magically with failover and aws has sort of figured all that out for you and and done all the hard work of wiring it together so i think customers are able to come and use uh relational database engines like postgres with um with more confidence now uh when they use it for through through a managed service they know that they know that they're gonna have that reliability and that performance and that up time that's great all right so hi everyone joining us we're 12 13 minutes into the show it's just two talking heads so far but if you've seen any of my shows before you know that we are going to show demos there are we are going to see code and i guess sequel commands as well in this episode so don't worry we'll get there and just want to do a call out from the chat surely dev 150 dollars worth of aws credits last night so congrats after this show perhaps you might want to use those credits on testing some postgres on aws as well that's right well let's let's let's jump into uh let's jump into our big topic here right so so we want to talk about how are we going to deal with multi-tenant workloads uh in in a relational database or any database and then we'll sort of go into the specifics of a relational model so um so really there's there's kind of three main uh partitioning patterns and i think we've got let me make sure i'm on the right tab here yeah so we've got we've got uh we've got sort of a couple of pictures here um uh to to sort of demonstrate this so uh the first way to partition data and again what is partitioning right let's start from the beginning what is partitioning well partitioning is separating one customer or tenant we call them tenants one customer's data from another so you have to somehow discriminate where is customer a's information from customer bees and that can be a physical separation like we see here in this picture right so the first example of partitioning the easiest to understand uh and sort of the default rollback for anybody to to to uh to implement is to give each of your tenants their own infrastructure so here in this picture you see i've got i've got three tenants up there and each of them is running in their own database instance or their own database cluster and and um and this model has a lot of benefits um but it has a lot of a lot of cons right so so um so the benefits are of course you've got really strict isolation right there's no way that one customer's data could be where someone you know the next customer's data could be because they're physically in different database installations um but uh that high isolation level you know it can make customers feel really good right so it's an easy story if you ever get and we always try and tell our our sas companies that we're working with try not you shouldn't need to have those conversations with your customers you should be talking about your service and the benefits that your service provides and the customers shouldn't be worried about you know how are you implementing your service behind the scenes they should just be paying you for the value of your service but it is uh you know it is realistic that some customers are going to you know kind of get behind the scenes a little bit and this model makes them feel very comfortable um but uh but it's but it's really expensive right because you've got to light up all these different databases you've got to pay for all these different databases it's really hard to maintain this it's really hard to onboard new tenants into your system right and again sap is a sas is a growth model and so you're trying to grow you're trying to reduce the friction and uh and make it easier to bring new customers on board into your solution and quickly get them to a point of value so that you gain their loyalty right so as cool as sas is uh it's also easy to switch to some other provider and so we you know you need to get that to that sticky point of providing value to them and this can be really difficult to uh this can be really difficult to um to onboard to so so going going back to what you talked to initially we're not really reaping the benefits of the sas model by by only using xylo yeah we're you're definitely missing out you're you're missing out on um on some of your ability to reduce your costs you're missing out on some of your ability to view this this is a really hard model to report on and to get a single sort of operational view on um it is it is a good model for backup and restore so it's a good model when you have a lot of data sovereignty issues um because it's really easy to to to with the database engines to get the data out for one customer versus another um so there's different and and and you're probably not going to run into noisy neighbor problems and so when we say noisy neighbor that's when you've got multiple customers running on your solution at the same time and and their use one customer's usage of your system would uh impact the performance of one of your other customers and and that's less likely to happen in this model here because they're all running in their own infrastructure now i will say that this model is more straightforward when you when your compute so the thing that's talking to the database layer is also per tenant if you have a situation like you see on this diagram here where your if your if your compute layer your application is shared for all of your tenants but then each of your tenants has their own copy of the database then things get even more complicated right so you'll see here that in that in that model you have to figure out which database do i connect to uh and and so you've got this other overhead that you have to maintain and you've got to maintain the credentials and figure out how to route to the proper database that you don't have in some of these and some of these other models that we're going to talk about so let's let's move on to the next one so we don't run out of time here um so so the next model let me zoom in on this to make sure everybody can see so the next popular model is to use is to use the um native namespace mechanics inside of a relational database engine usually these are called schema and and pretty much every database engine out there has some concept of schema postgres literally calls it schema and the default one is called public sql server also has a default schema it's called dbo for database owner and and all of the major databases have a concept of schema my sequel doesn't really have a concept of schema what my sequel calls a database is kind of a schema so you can install multiple databases on a single mysql daemon and those databases can talk to each other by default which is a little weird right in in postgres or sql server or oracle or db2 or some of the other engines that that wouldn't be true you'd have to go through quite a bit of wiring and effort and maybe install some third-party ability to have one database talk to another so however the idea here is that every all of your customers maybe i made this too big all of your customers are in a single database but now you now you partition them you split them up by namespace by schema uh and and this model here um i you know i don't really love this model it it kind of brings a lot of the difficulties of the per database model in terms of it's hard to maintain onboarding can be really difficult especially when you think now you've got even more of a noisy neighbor problem because now you've you've got say you've got you know 100 customers running you've got 100 schema in your database engine your 101st customer on boards now you've got to issue all of those ddl statements all those create object statements in your database and you've got to run those against a running engine you know and you may end up locking tables and and you're probably going to have to grow your database instance bigger it's a it's a real it's a real struggle and you're likely going to have to shard your database instances anyway because you're going to run out of performance if you've got too many customers on a single database but i think there's a reason why companies go to this model and i think it's i think it's for two reasons i think one it's for cost right so the benefit here is now i'm running in one database instead of as many databases as i have tenants and if you think about the licensing costs of some of those sort of old guard you know commercial database engines you're paying by instance so if you can reduce the number of instances you're reducing the number of licenses that you're paying for and that's and that's a powerful enough uh you know reason for companies to move to this model even though it's kind of a little clunky from an operational standpoint so let's name them which well i mean what are you talking about sure think about sql server think about oracle uh right and and so in sql server you can install many many many database many databases and many schema in a single database in a single sql server installation but if you have a separate sql server install for each of your tenants you're paying that sql server licensing fee over and over again now i am no licensing expert i'm not a lawyer talk to your enterprise uh you know uh uh account rep from microsoft in terms of licensing but it's one of the reasons i think why companies default to this and there's a second reason so one is one is cost licensing issues the second reason is it and we're going to talk about this a lot today if we get a chance to really dive into the code it it moves the security away from your software developers right because you're using a feature of the database engine itself to differentiate where your customers data is and that's important because the last model so if the first model is everybody gets their own database the next model is everybody gets their own name space within the same database whether that's by using a schema or whether that's that you've renamed the tables say with like a prefix of your tenant name we've seen that before or maybe you use views and you really go hard hardcore old school relational database modeling and you've got tables but nobody looks at the tables everybody looks at views and so you make a view for every one of your tenants so there are different ways to approach the same concept where you're you're trying to give some sort of name spacing to everybody but the but the last model of course then as we go along is is the shared model and this is where everybody's in the same database everybody's in the same name space and now you're using some piece of data in your database tables to discriminate one customer's data from another and the challenge with this is that now you've kind of put security in the hands of your developers if you will because the only way to enforce whether you're going to see someone else's data or not is through the sql queries that you that you up you know that you that you run against this database essentially your where clause right you got to make sure that whatever's in that where clause is filtering down the record set um so that you're only seeing the rows that you should see and i don't know about you but i mean like i've never written a bug in any software that i've written you know 25 years so you know so it's good so so you know so companies are worried about that so um but there's a ton of pros to this i mean my goodness right you you you get you know the maximum agility for your sas workload you get the minimum cost it's super easy to onboard new customers here because you don't have to do anything right the database is already up it's running the table structures are defined um you don't have to you don't have to create any new resources um and and uh the and your and your costs are super low right but the challenge here is the security concern right how do you how do you secure this model um and how do you take that overhead away from your software engineers so that they're not spending all their time worrying about this and therefore uh the potential to to have a mistake so so pooled looks pretty good and and we need to think about like how can we tackle some of these problems with it i think that last part is very important as well it's it's one that they of course the developers they don't want to make mistakes they don't want to cause security issues in in whatever thing they they write but just removing that from them should be a goal so that they don't have to think about it right right there was something recently that i read some some little article or something about the um the super mario effect or something like that somebody did some study where they had a bunch of software engineers and some of them got penalized for making mistakes or bugs and some of them didn't get penalized they were asked to solve the same problem and they were put into these two sort of psychological scenarios uh and it turns out that the that the people who weren't penalized for making mistakes built better software faster than the people who were told that they were going to be penalized so why is that well we don't want to make mistakes right and so we're so worried about screwing up that that we move slower we don't experiment and we don't come up with better solutions and the and they called it the super mario effect because they were reflecting on uh when video games became super popular in the 1980s and and uh and the early 90s and for the first time you could just keep retrying right you you got to the point where your character you know ended your character died but you could just start again and so people were encouraged to repeat over and over and over again until they got success and so they didn't focus on the failure uh they focused on getting to the end of the level in the video game uh and so same same idea here so so you're right so so we want to figure out how to support our software engineers and how to take some of that burden away from them and so that's how we're going to improve security here and the way that we can do that is with the super cool feature in postgresql called row level security so let me pull up here i've got a uh i've got a slide on on you know what is row level security so row level security it came out i think in version 9.5 somewhere around in there so it's been around for a long time definitely production ready very stable very performant system but it's a user-defined security policy so you write these as as the as the user as the person who's defining your your database structure you define these security policies and you apply them to a table and just like an index or constraint they've got a name and the way that you define them is with what postgres calls a using clause and all the using clause is it's just a boolean statement so true false and if the result of that statement is true though that row is going to be included in your either your read for select or it's going to be included in the rows that will be affected by your write statements whether that's insert update or delete and what's neat about rls is that you can you can combine policies so you can have more than one policy on a table if you have to you can split out and you can have one policy for reads so one policy for select and a different policy if you want for uh update insert and delete um and and uh also important for everybody to understand that this this is a postgres feature but it is available in the postgres compatible version of aurora so this works in both aurora and in um and in rds and like i said it's been around since 9.5 there's nothing special that you have to install you don't have to install an extension you don't have to set up anything weird to get this it's just available it's just part of the engine so let's take a look at um at what it actually looks like i know everybody on these streams loves to look at code and so do i let me zoom out here just a little bit okay so so here we go so this is this is sql for anybody uh who hasn't seen it before uh and up at the top all we're doing we're creating a table we're just going to create a simple table for our tenants uh it's got a couple of fields in it it's got a unique identifier and then we're gonna enable row level security for that table so we use the alter statement in postgres to do that and then down at the bottom here's our security policy here's our row level security policy uh we give it a name i'm going to call it tenant isolation policy and then here's the here's the important part the using clause so what is this using clause saying well this using clause is saying that return true when the value in the tenant id column from my table which is defined up here is my primary key i happen to define it as a uniquely universally unique identifier when that value equals the username in postgres of the currently logged in user so this means that a user can only see their own data and i've only got one policy i don't have a different policy for reads and writes so i've got a single policy so any action that's taken on this table has to be taken by the same user who owns that row of data that's how this that's how this uh reads out so if you take a look at what this uh would look like if we ran these statements and we'll get to running statements and we'll run some code here in a minute but this is easier than typing out goods here so so here we'll we'll log in um we'll log in as the owner of this database as the as the root level owner of this database and we'll and we'll run a select statement and by default the owner of the database installation they're not beholden to rls policies right which makes sense right because if if the owner of the database the the if the role that manages the entire database daemon and the master tables and all the security and everything can't see rows in the table that wouldn't be good so it makes sense that by default in postgresql the user is called postgres unless you change that and in rds and aurora you give us a username that you want to act as the postgres username and that user that user by default won't be restricted by these security policies which again is is normal so we see that we've got two tenants in our table now let's log in as tenant one so now i've actually logged in as this username happens to match the id of my tenant so if i say hey who am i currently logged in as this is kind of like the linux command who am i in postgres it shows me that value and now when i run a select statement you'll notice that i only get one row i only get the row that i own because my using clause has said only show me rows where the value in the tenant id column match the value of the current user so so basically you've stripped out the where clause in that statement that's right that's right notice that that there is no where tenant id equals right it's just select everything from tenant and as far as this session as far as this user is concerned that is everything from the tenant table there is nothing else and and then so let's say okay that's cool but somebody like hacks into your system or knows your other tenant names your other ids or can do sql injection into your solution or something what happens well you'll notice here this is now this is tenant one specifically trying to select tenant two's data and it doesn't work you don't get any rows back and you don't get any rows back because postgres runs the rls policy engine super early on in the process before it does all of its query planning and it does that so that it doesn't leak data around one of the challenges with sql statements is these are simplistic statements right these are contrived statements but in a real application you can have super complicated sql statements any of my sequel fans out there and comments right you guys know there's correlated sub queries you know there's uh rolling window table there's all kinds of stuff and if you have to think through a security model where that wear cloth that outer where clause you actually have to get it right on some of the inner stuff too and that can be really challenging and rls solves that problem for you now when we look at right statements reads and writes the same thing happens we get no affected rows because the record set of which rows to operate the the command on returns zero right because that select statement returns zero essentially there's nothing there's nothing to there's nothing to act on now inserts are interesting inserts do actually return an error in postgres with row level security um uh unlike select selects just don't turn any return any records and updates and deletes just don't affect any rows and it's silent and nobody knows what's going on but but inserts do actually return an error telling you that that you can't do what you're trying to do okay so let's see here i think all right let's go over a couple of these and then let's dive into a demo so we can actually see some cool stuff going on okay so what are some of the gotchas like this is this is too cool right this is great like why wouldn't everybody use this well uh so there's a couple of things to keep in mind when you're using row level security uh the first is like i said that the super user of the database installation or the owner of the table the owner of the table in postgres by default will be the user who issued the create table statement now you can also change the owner of a table if you want to by issuing an alter table statement but by default the table owner is not constrained by rls policies um you can you can turn it on if you want with force row level security but by default it's not so you want to keep that in mind in terms of how is your application creating your database schema in the first place and then how is your application actually using that set of table structures because if you if you use the same user for both and you don't use force row level security um you're not going to have any rls policies uh involved so and just general best practices you should not be logging in as the super user as your application code your application code should be running as a as a least privileged database user not as the super user of your database so you should always have different credentials they shouldn't be the credentials that you gave to rds or aurora when you lit up the instance in aws you should have a different set of credentials and we've got all kinds of cool tools secrets manager all kinds of ways to deal with secrets for you rotate things hide your credentials from your workloads pull them in at runtime you can put them on your containers we could have a whole show on secrets maybe we should uh but but don't don't use the super user as your application user for your database um and let's be honest right this is overhead there's more thinking going on in the database engine right there are more cpu cycles that are evaluating this using clause and applying it across the table tuples before it returns whatever it needs to do for the query engine so clearly there has to be some sort of performance impact i will say that um if you keep your using clause really simple and straightforward and if you use indexed columns as the as the things that you're looking at and comparing in your using claws you shouldn't see a huge impact but obviously everybody you know needs to be testing and and and doing load and performance testing on their solutions before they just jump into into production with it so so what the overhead is is back to the standard it depends answer totally um but again sort of best practices right uh try not to use views in your using claws especially views that are built on top of other views try not to do too many table like try not to do table joins if you can get away with not doing table joints remember a using clause is simply anything that can execute inside of postgres and return a boolean so any crazy select statement you can imagine you can run as you're using clause for your security policy so if you need to join to another table to really understand whether you should be allowed access to that or not you can do that if you need to execute a function if you need to look at a view again anything that you can run as a valid statement inside a postgres you can put inside the parens of that using clause so but keep in mind what is it that you're running in there because it's gonna it's gonna happen on every single query that you hit so there's a comment in or a couple of comments in the chat from bisoldi about uh well this the same issues that we talked about before with noisy neighbor and so on and that's still applies even though you're using rls yep the abs is to take care of the the security concern around using the shared model yep absolutely so so the shared model comes with it all kinds of challenges around noisy neighbor comes with it so it's an interesting conversation right because you're saving a bunch of cost because you're only running one piece of infrastructure but you're sharing that infrastructure so often what ends up happening is you often end up having to buy a larger instance you have to buy more power right so you'll often find companies having to up their database instance type to a more powerful instance type uh so that instance type is more expensive than what they would do if everybody were running in their own instance but you're still saving money because instead of running you know 50 whatever you know m4 larges you're only running one you know m5 you know 4xl or something like that so the math still works out but yeah you definitely still are are burdened with the noisy neighbor issues but rls think of rls this is a great way to think of row level security think of row level security as a as a where clause that is magically applied to all your sql statements and managed for you by the database engine so you now don't have to remember or get right with all those fancy complicated queries to get the right where clause in every time you still need to issue your where clause that makes sense for whatever your business logic is that you're selecting out of the table but you don't have to then add to that where clause your your your isolation constraint for your tenant you should be able to define that as a policy and then uh let's say that we're working with building a sas that has different tiers we've talked about this in in previous episodes so we have this pre premium tier and we have the standard here some are using the shared or the pool model some are then using the silo model yep would you use still use the same business logic so you're still using rls on the i the silo databases as well yep i i would i would and i and and i would because when you end up splitting out your infrastructure because you're of the packaging decisions of your product um for tiering like you say maybe maybe somebody gets a larger instance type or maybe somebody gets their own instance or whatever your or or whatever your rationale is there you still as a as a sas builder as the vendor of the solution you still want to be managing one set of artifacts one chunk of software that you're testing and iterating on and you just happen to be deploying that into an environment where only one tenant is going to attach to it it doesn't mean that it wouldn't support more tenants it's just that only one is there and and i think that's even more important here when you look at how you have to uh turn on rls right so so you remember those are alter statements so those are part of your table definition so you have to think through what is your sort of cicd pipeline if you will of your table structures and that's a whole other topic that we can get into how do you light up your table structures to match your domain model in your software so that they can talk to each other and stay synchronized it's a whole nother whole another topic to dive into which we don't have time to for today but but you have to think about those types of things and so yes very much i would i would still run the policies even for the siloed uh resources makes sense and i suppose customers sas the users the tenants they might change between different tiers as well and handling that is probably easier if if you're using the same business logic absolutely okay well we are uh man the hour is just flying by as it always does right so let's try and jump into uh a little bit here so um i'm not i'm not sure we're going to have enough time to do everything that i want to show today so i'm going to skip around a little bit um we have an example out on github uh of how you would actually implement some of this code and i wanted to walk you guys through some of that today show you an example and actually look at some look at some software here so um i have been madly working at updating this repository over the last few days because when i originally wrote it it didn't really have a user interface or a good way to demo anything and um good news bad news right good news is it was really hard to write a demo to try and break this security policy and i actually couldn't quite figure it out so so there's some complexity to this uh demo code that wouldn't exist in real life because i'm trying so hard to figure out a way to show rls working bad news is it took me so long i haven't yet merged this fork back into the main line so i will do that today uh but so right now i'm i'm in my fork of this uh here and uh and i and i lit up a cloud formation stack for it um and when you light up this when you light up this cloud formation stack here if you guys go and play along at home later on today you'll see that this uh that this example it it it has a few parameters right um so give the stack a name um it asks for a key pair this is if you want the cloud formation stack to instantiate a small ec2 instance that you can ssh into that will be able to connect to that database so that you can issue raw sql statements with the psql command line client go ahead and give it a key pair name that you already have in your account if you don't give it a key pair name it won't it won't launch that little jump box for you so you um and then but the database is in a private subnet so you're not going to be able to connect to this database using something like pg admin or db beaver or anything like that um so you can give it a you can give it a key pair if you want um if you leave stuff alone if you leave the defaults alone you'll stay in the free tier um and then just you know give stuff names um give it a username and a password and then hit go and notice that i set up two different usernames and passwords there right back to our earlier conversation you should not be connecting your software as the master user if you can help it right that's something to refactor out of your system is get users that only have the permissions that they need so this stack is obviously going to take too long to run for the time that we have today so last night i um i already installed one and this is complete and when this stack is done you can go to outputs and you'll notice here that i've got a load balancer endpoint and if i open this load balancer endpoint it should launch my little demo app so here's my demo app now this demo app is a monolith to make it simple and make it easy and get like this little demo to light up as fast as possible and have those few moving parts but it also then means that it's doing more than it should so it's essentially it's doing it's it's doing both the tenant onboarding as the sas administrator but it's also acting as sort of an application that your tenants might log into and utilize so it's cheating a little bit in that regard so um the first thing i have to do i i've got an empty database here right now but if i go into the to the admin section um as i would if i were a uh the sas builder the vendor um i've got i've got no tenants so let's add a tenant on tenant one great let's add another tenant let's call them tenant two now i've got a now i've got a couple of tenants so now i can do something so now i can go in and i can supposedly manage the users of these tenants so if i go in here now this is where again this is a demo that's a little contrived so now i i'm going to say okay login like which tenant are you accessing my system as let's say that i'm going to access it as tenant one so here's my little authentication mechanism so now i'm logged in as tenant one now this is where i'm giving you the ability to basically break the system um but let's not break the system right now let's yep um i'm logged in as tenant one and i want to operate on users for tenant one so cool so i choose tenant one and i click the go button and i have no users so let's add a user for tenant one as tenant one so this should all work let's call it uh user one for tenant one okay uh cool so now again if i load all the user i'm logged in as tent one if i hit the go button to load the users i'll see it let's add another user i'm user two okay great so now tenant one's got two users let's log out now let's go back in and let's let's login as tenant two manage users for tenant two tenant two doesn't have any users empty database let's go ahead and add a user for tattoo oops can okay cool so now um i'm logged in as tenant two tenant two has users tenant one also has users all right let's start seeing something interesting so let's now say i want to i'm logged in as tenant two but i want to try and manage i want to try and impact the users for tenant one let's try to load the list of users for tenant one well that doesn't work right it doesn't work because of the sql statements that we just showed you which is to say um you can't access somebody's records that you don't own okay um cool but let's uh let's see so we can't access data for this tenant id but let's go ahead and try to break the system anyway so let's say let's create a new user and we will explicitly give it the tenant id of tenant one but we're logged in as tenant two right mad skills okay and that doesn't work either so why doesn't that work well that doesn't work because um of the same reasons i can't i can't um i can't write i can't issue an insert or an update statement when i'm logged in as the wrong as the wrong thing cool so let's look at one more thing here before we completely run out of time let's log back in as tenant one let's grab user two's id from tenant one and i'm just gonna um i'm gonna copy that over here on my desktop into some other place okay cool so let's get out of here so now this is where it gets a little hard to actually break the system partly because of the framework that i'm using this is a spring framework app it's got cross-site forgery tokens going on and all kinds of crazy stuff um so so this would be a little bit easier to hack on if it were just an exposed like rest endpoint but it's a little bit harder so let me um oops i don't want to view source sorry i want to let's look at developer tools okay so let's login as tenet2 we'll go ahead and initially manage tenant two's users now this is interesting this is the easiest way i could figure out how to show you guys this let's go ahead and delete this user from tenet2 as tenant two so this will work right let's go ahead and delete that user cool so that worked now i'm going to go over here into dev tools and i'm going to select that delete user success right and then i'm going to and then i'm going to go in dev tools and i'm going to right click and i'm going to say copy and i'm going to say copy is fetch and i'm going to paste it i know this is too tiny for you guys to see but you can play along at home and what this is is this is the post this is the post that happens and way down here in the post are the key value pairs of the of the parameters from that form that got sent to the um to the back end engine let me jump over here real quick let's get some logs open so we can see this because we're not going to see it in the ui because now we're just being sneaky make sure we're in the right region and what i'm going to do is i'm going to reissue that post command with the other tenants user id okay so here's our here's our logs okay okay i'm gonna take the user id for user two from tenant one that we copied i'm gonna i'm gonna paste it into this little fetch thing here it's even hard for me to see and then i'm just gonna hit enter here let's jump back over to our cloudwatch logs and let's see what happens okay so in here what you'll see is so here's our successful delete as tenant two deleting tenant two's user and now i just got more logs to show up and now what i'm gonna see is i'm gonna say hey i wanna i wanna delete this user i've got two different connection pools one for each of my tenants that's being managed by my software my currently authenticated tenant is this e55 tenant i'm gonna try and delete the user and i'm gonna get zero rows affected i'm gonna get zero rows affected because my rls security policies returned protected me from deleting that record even though i was explicitly breaking the software and trying to jump in and the neat thing is again these sql statements don't have a where clause in them right so if you look at like delete user it's just delete from tenant user it's not where tenant id equals and user id equals right which is what you would have to do if you didn't have the level security policies in here pretty cool and unfortunately i know we're we're super short on time um when you pull down this example code um there's really just one place in this entire chunk of software that really has to do with row level security and it's this it's when you get your connection from whatever it is that's providing your connection to you this happens to be written in java so this is a jdbc uh database connection um what we're doing is we're setting the session variable and we're setting it to the current tenant context so determine current lookup key like what the heck does that mean well that's this function up here and what does it do well it goes who's currently logged in right so this could be your jot token from your identity provider that's got your tenant id in it i know folks have talked about that on your show before um this could be old school session based login like this app here what however you're determining current tenant contacts for this http request right we are using that current tenant as the key right there it is the key equals the current tenant and that is what we're setting the session variable to and that's what our row level security policies are looking at they're saying hey if the if the app.current tenant session variable equals the value in the tenant id column then you're good to go otherwise no very cool so yeah i think that's going to be what we're ending with michael it's been a quick hour as it usually is uh i've pasted a bunch of links in the chat now and link to the repo as well if people want to try this out and you will i guess merge your new commit later today yes i will get that merged in uh as as soon as possible i could i could merge it now the readme file won't quite match so now it's simpler now all you have to do you don't have to do all the like prerequisite steps that it says you have to do in the readme all you need to do is take the cloud formation template and launch it in your account and as long as your account has uh the ability to light up a new vpc uh you're good to go all right and with that i want to thank you all for joining us today thank you michael for joining us as well being an expert and this has been building sas on aws bye bye [Music] [Applause] [Music] [Applause] [Music] you
Info
Channel: Gunnar Grosch
Views: 150
Rating: undefined out of 5
Keywords:
Id: jnD3K0590tM
Channel Id: undefined
Length: 61min 29sec (3689 seconds)
Published: Wed Sep 22 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.