AWS re:Invent 2017: From Mainframe to Microservices: Vanguard’s Move to the Cloud (ENT331)

Captions
Hello and welcome, and thank you for attending today's session, "From Mainframe to Microservices: Vanguard's Move to the Cloud." My name is Ilya Epstein; I'm an Enterprise Solutions Architect with AWS, and I'm joined by Barry, the Chief Enterprise Architect from Vanguard. I've had the pleasure of working with Vanguard's team for the last two years on their cloud journey. Many of the customers I work with have mainframe systems, and those mainframes have centralized data stores and monolithic applications. A lot of those customers want to move the mainframes to the cloud, so we're going to talk about some of the modernization approaches for moving mainframes to the cloud and some of the key considerations. Then Barry is going to do a deep dive into Vanguard's cloud data architecture, talk about the strangler pattern they're leveraging, and share benefits and lessons learned that hopefully you can take back to your organizations.

For most customers, a journey to the cloud is also a journey to microservices. There may not be a precise definition of what a microservice is, but there are some very common characteristics. Microservices typically have reusable modules that are built and deployed independently, and they typically communicate with each other through an API interface. They're usually organized around business capabilities rather than a specific technology stack, so instead of teams that are UI-based, middleware-based, or database-based, microservices teams are organized around a specific service; at Amazon we call those two-pizza teams. Microservices own their domain logic and have decentralized governance and data stores: each microservice typically uses the data store that makes sense for it, whether that's NoSQL or a relational database, and you start to see decentralized data governance. Barry is going to talk about concepts like bounded contexts and eventual consistency, which are things you see when customers move to microservices. Microservices are almost always deployed in an automated way, with continuous integration and hopefully continuous delivery, and they have to be designed for failure: you need to put a lot more effort into testing your clients, because microservices can fail and your applications and clients need to respond to those failures.

The benefits of moving to microservices are fairly obvious. They pull the business logic and data logic out of the monolithic application, they help customers eliminate technical debt as they move to the cloud, and they eliminate the monolithic bottlenecks, because each microservices team can operate independently with its own deployment schedule and there aren't a lot of dependencies across teams. What customers are really trying to achieve is improved developer velocity.
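To make the service boundary Ilya describes concrete, here is a minimal sketch (not Vanguard's code) of one microservice that owns its own data store and exposes it only through an API; another service reads that data by calling the API rather than the database. The service name, port, and the Flask/requests choices are illustrative assumptions.

```python
# Minimal sketch of a microservice that owns its data and exposes it via an API.
from flask import Flask, jsonify
import requests

app = Flask("account-service")

# This service's private data store -- here just a dict; in practice whichever
# relational or NoSQL store this bounded context's team chose.
_accounts = {"A-100": {"owner": "jdoe", "funds": ["VTSAX"]}}

@app.route("/accounts/<account_id>")
def get_account(account_id):
    acct = _accounts.get(account_id)
    return (jsonify(acct), 200) if acct else ("not found", 404)

def lookup_account(account_id):
    """How another microservice reads this data: through the API,
    never by querying account-service's database directly."""
    resp = requests.get(f"http://account-service:5001/accounts/{account_id}", timeout=2)
    resp.raise_for_status()  # design for failure: surface errors to the caller
    return resp.json()

if __name__ == "__main__":
    app.run(port=5001)
```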
This all sounds great, but how do you get there if you're running mainframes, with monolithic applications, centralized data stores, a single CI/CD pipeline, and maybe two or three releases a year? How do you get to the cloud? Unfortunately, there's still no announcement about just lifting and shifting a mainframe to the cloud, so what are the approaches? The good news is that there are successful patterns out there. Over the last couple of years we've seen some really great results, a lot of it with our partners, and some of our customers have had great success with these approaches.

One approach is rehosting. A rehost for the most part keeps your COBOL code; nothing changes from that perspective, but you recompile that code on a mainframe emulator that you run on EC2. You are tied to an emulator, and you'll also need some third-party tools to control things like printing and access to storage, but there is a way to rehost the mainframe, and there are third-party solutions that do it. One of the challenges with that approach, which we'll talk about, is that you may not be gaining the full value of the cloud.

There are also multiple re-engineering options, with a few subcategories. One is refactoring: we have partners in this space providing tools that do automatic code conversion. This is not a lift and shift; they analyze the application model of your mainframe and then create a similar application model using AWS-native services. That can give you access to microservices and higher-level AWS managed services, and in some cases the code conversion can be done automatically. There are also options to manually rewrite your functionality into microservices over time. And then there is a whole set of data-driven approaches: for example, migrating batch jobs, where you stream data from your centralized data store, or move files, say your VSAM files, into S3 and then implement certain aspects like batch reporting there; or analytics, where you move the data to the cloud and perform your analytics in the cloud. Those are all aspects of re-engineering.

Replatforming can be an option, specifically if you're running Linux on your mainframe; I think about 75 percent of mainframes are running some Linux. If you have Linux and Java applications on your mainframe, you can potentially replatform and run them on EC2. And then of course there's repurchase and retire, where some functionality is simply sunset over time, either by SaaS solutions or by new functionality you develop. For most customers it's a combination of all of these approaches: some functionality will naturally be retired, some of it might be rehosted, and some could be replatformed.

There are a lot of considerations to think about, things like risk and the impact to your operations, but two of the factors I also want to call out are cloud business value and the time it takes to do the migration or the modernization.
If we look at a rehost option, for example: from an operational perspective it's probably the least amount of risk, and it doesn't require your teams to retool as much. But when you do a rehost you're really not taking the full value of the cloud: you're running in an emulator environment, you're not using microservices, and you're not using the AWS managed services. You're not getting a lot of value, but you can get through the migration pretty fast, and sometimes when customers have things like data center closures and simply need to get out and move to the cloud, they'll take that approach. With refactoring, say automatic code conversion, you'll be running on native AWS services, which lets you get more business value from the cloud, but the timing could be a bit longer than a rehost. And rewriting is where you get the most value, because you're doing things in the most cloud-native way possible and making new decisions as you rewrite the code, but that's typically a multi-year journey. The good news is that we have professional services teams and partner teams with expertise in each of these areas, so we can definitely help, and we have customer successes in each one of these approaches.

For today's session we're primarily going to talk about re-engineering, and the focus is going to be on data-driven re-engineering: the approach the Vanguard team is taking focuses on the data first, and then potentially rewriting some of the code into microservices.

So, Martin Fowler — I think every tech conference has a Martin Fowler slide. He made a great observation, and I think he was the first to coin the term "strangler pattern." He was in an Australian rainforest and observed strangler fig trees. A bird or a bat drops a seed on the upper branches of a tree, and the fig starts putting vines all the way down to the soil. Over time there are hundreds and thousands of vines going down to the soil, and eventually they cover the host tree entirely; at the soil level they compete for nourishment, the host tree eventually dies, and you're left with a new fig tree with a hollow inside. His observation was to take the same approach for monolith-to-microservices migrations: you develop a new system around the edges, over time more and more functionality moves to that new system, and the original old system is eventually strangled.

At a high level, what does that mean? You have your monolith, say your mainframe with a centralized data store and a monolithic application, and your users access the application through a browser. Over time you develop a microservice, which potentially has its own data store, you put some type of proxy tier or API tier in front of it, and you redirect users to the new functionality. Then over time you build more and more of these services, to the point where most of the functionality uses the new capabilities and the old system becomes strangled.
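A minimal sketch of the proxy/API tier in the strangler pattern: request paths whose functionality has been rebuilt as microservices go to the new services, and everything else still falls through to the monolith. The route table and hostnames are illustrative assumptions, not Vanguard's configuration.

```python
# Strangler-pattern routing sketch: a small route table in front of the monolith.
MIGRATED_PREFIXES = {
    "/balances": "https://balances.microservices.example.com",
    "/transfers": "https://transfers.microservices.example.com",
}
MONOLITH = "https://legacy-app.onprem.example.com"

def route(path: str) -> str:
    """Return the backend URL that should serve this request path."""
    for prefix, backend in MIGRATED_PREFIXES.items():
        if path.startswith(prefix):
            return backend + path   # strangled: handled by a new microservice
    return MONOLITH + path          # not yet migrated: fall through to the monolith

# As more bounded contexts move to the cloud, entries are added to
# MIGRATED_PREFIXES until the monolith serves nothing and can be retired.
```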
The challenge we sometimes see is that, depending on how it's implemented, this doesn't always lead to actual strangulation, because depending on how the new microservices are developed, you can still have a lot of dependencies on either the monolithic code or simply the monolithic centralized data store. In practice, we sometimes see customers struggle to achieve it. The approach Barry is going to talk about focuses on the data: you figure out how to get the data to the cloud, your microservices use native data stores in the cloud, your writes don't go directly to the mainframe, and you leverage things like eventual consistency and bounded contexts so that you don't have tight dependencies on the on-prem data store or the monolithic code. With that, I want to introduce Barry, who's going to do a deeper dive into the Vanguard cloud data architecture. [Applause]

Thank you, Ilya. So I'm Barry, Chief Enterprise Architect at Vanguard. I work in a team called the cloud construction team, which is part of the CTO office. We're going to start off talking a little bit about our legacy infrastructure, what we have at the moment and what we're working to move from; then we'll talk about some of the various architectures we put together to help us move from mainframe to microservices; and finally we'll talk a little about lessons learned and things to bear in mind as you make the journey.

First, just enough about Vanguard to give you some context on the organization. I like to joke that we're a small investments company; we actually have over four trillion dollars of assets under management. That should tell you two basic things about the organization. Firstly, we're very conscious of security, particularly cyber security, making sure we're well protected. Secondly, we're very highly regulated, so organizations such as the SEC and FINRA monitor us and make sure that our computation is both timely and accurate. If we don't get things timely and accurate, best case we get irate customer phone calls that we need to spend money to process; worst case we get fined by the regulators, and we can sometimes end up on the front page of The Wall Street Journal, which is never a good thing if you're not there for the right reasons.

We were founded 40 years ago, so we've seen every generation of IT: mainframes, minicomputers, client-server, service-oriented architecture, and microservices, now moving towards cloud. Don't let my accent fool you, we are US-based, just outside of Philadelphia in Pennsylvania, and it occurred to me a couple of days ago that 40 years ago our founder had the forethought to locate us almost exactly halfway between Wall Street in New York and the future home of Amazon's us-east-1 data centers. We have a number of different lines of business. Two of our largest are the retail line of business, which allows individual investors to open accounts and trade mutual funds with us, and an institutional line of business, which works with companies to provide 401(k)s; for those of you not from the US, that's a pension-like, tax-deferred savings plan for use in retirement. Our institutional line of business is very focused on non-regulatory compliance such as SOC 2, something we can use to prove to our customers that we do things right in the IT space. And finally, we have no outside owners, so the returns on the investments we make provide better returns for our customers,
and since our staff have 401(k)s with the company, we're actually incentivized to be as efficient as we possibly can and provide value to our investors.

Our current IT environment: we have a whole bunch of data centers, with huge reserves to handle spikes in capacity and availability. Within those data centers we effectively have three types of operational system. We have the systems our business uses to decide which mutual funds to buy, analytics data, those kinds of things; a lot of that is COTS software. We then have our system of record, which is our really old system that keeps track of who owns what. And then we have our systems of engagement, which are the tools we use to interact with our customers. Interestingly, 20 years ago the system of record and the system of engagement were the same system: customers would call up and the phone operators would type into mainframe consoles to make transactions happen. Nowadays it's all web interfaces: custom web applications, three-tier, large monolithic Java applications, very complex. Many of them have internal, non-distributed caches, which means the sessions are stateful, and that impacts scalability. Our largest lines of business use DB2 on the mainframe for their data storage, and some of our other lines of business use other databases as well.

Our mainframe environment really is the crown jewels of Vanguard. We're not running zLinux; it's MVS-based. We have a build system for COBOL code and a version control system for it as well, and then DB2, which is the system that supports our engagement with our customers. There are at least three thousand relational tables and six thousand COBOL stored procedures in that system, which makes a trivial migration very difficult to achieve. Then we have our record-keeping systems, part of our system of record, which are all based on VSAM files, CICS interfaces, and COBOL batch processes, typically run every night to perform all the calculations necessary to provide the data from our system of record into our system of engagement. There's a huge number of integration processes, point-to-point integrations with multiple systems, that move data from the record-keeping system into the database and vice versa, typically through MQ and COBOL batch processes. And finally, people interact with our mainframe a small amount through the CICS interfaces, slightly more through MQ interfaces, but the largest interactions are through DB2.

So why do we want to migrate? We're not doing one or even two migrations here; we're actually doing three separate migrations. The first is the migration to the public cloud. We started to look at private cloud, we came to re:Invent, and we realized we can't possibly match Amazon, so let's look at using them. It gives us the ability to use infrastructure as code, so we can spin up as many environments as we need. We also get a lot of managed services: why run our own message queuing systems and our own databases when Amazon can run them for us? We're a financial company, not an IT company. And finally, if you're using on-premise systems, no matter how many you buy there are still limits to the elasticity; if you buy too many they sit around idle, and if you don't buy enough you can't cope with the kinds of spikes you might experience. Those are the things that go away with public cloud.
From the moving-off-the-mainframe perspective: firstly, with any proprietary system, if you can't get the same capability from the cloud provider of your choice, you're limited to at best a hybrid model. Then there's the cost of the mainframe: you're basically running an expensive, vertically scaled system, as opposed to being able to take advantage of cheaper, horizontally scaled systems. We also have our development staff effectively divided into two parts: the COBOL developers who work on the mainframe side, and the Java developers who work on the system of engagement. We wanted a common programming model, so we didn't have to worry about skills that are becoming harder to find in the mainframe space, and so we'd have the same process from the web tier all the way down to the data access tier.

And finally, why do we want to migrate from legacy web applications to microservices? No one gets out of bed in the morning and thinks, "I'm going to write an application today and it's going to be an unmaintainable ball of mud." What tends to happen is that, over time, the tiers you put in place get blurred; people take jumps through them when they shouldn't. What microservices give us is a network boundary that enforces a strict bounded context, and all access has to go through that network boundary. That means we end up with smaller pieces of code, it's easier to test them, it's easier to deploy them, and over time, once we start taking advantage of those network boundaries, it allows our development teams to adopt other mechanisms for implementation, in terms of different languages and different database storage mechanisms, rather than just the pure relational model they tend to work with now.

Before we started moving to public cloud, we started to implement microservices for use on-prem: basically a seven-step process, where each step builds on the capabilities of the last, with the monolithic applications on the far left-hand side and next-gen apps, or microservices, on the far right. The first step was to start instrumenting our code builds so that we could produce clean, modular code. That enabled us to isolate the data layer from the business logic. Once we had that data layer isolated, we could start to define bounded contexts and split things up into services, and once we defined those bounded contexts came the step to cloud. To enable the step to cloud we really needed to make sure we were completely stateless within our microservices, which would give us the kind of elasticity we were looking for. The steps aren't purely technical; there are some cultural ones here too, so the move towards being completely agile and taking advantage of experimentation was something our business was very interested in.
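One common way to achieve the "completely stateless" step Barry mentions is to keep no session state in the service process itself, so any instance behind the load balancer can serve any request. A minimal sketch follows; Redis is an illustrative choice of external store, not necessarily what Vanguard used.

```python
# Stateless-service sketch: session state lives in an external store, not in
# process memory, so instances can be added or removed freely.
import json
import redis

session_store = redis.Redis(host="session-store.example.com", port=6379)

def save_session(session_id: str, data: dict, ttl_seconds: int = 1800) -> None:
    # Written externally, so a different instance can pick the session up.
    session_store.setex(session_id, ttl_seconds, json.dumps(data))

def load_session(session_id: str) -> dict:
    raw = session_store.get(session_id)
    return json.loads(raw) if raw else {}
```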
Moving from a framework that was originally developed for an on-prem private cloud environment to the public cloud required some additional thinking. You see "secure" up there; I'm not suggesting we never thought about security until we went to the public cloud, but we had to think about it differently. With public cloud we started to see members of staff reading things in the newspapers: some company left their S3 bucket exposed, all their data was leaked, and they ended up in the news. Instead of going through a process where we would validate the security model and then implement the tool, we had to be able to continually answer the question: what do we have in place that would prevent that from happening here? So, a slightly different way of looking at security.

As part of the move to public cloud, we wanted to be able to take advantage of multiple geographic regions, rather than a single one with all of our customers from all over the world coming to one place; we wanted to move the compute and data out to them and make their experience as good as possible. We also wanted to be autonomous within a single region, so if there was an availability problem, a disaster, or a networking problem, customers would as much as possible still get a really good user experience by accessing resources as close to them as possible, and we could cope with losing connectivity to other regions. We wanted low latency from two different perspectives: the first is the real-time movement of data from the customer to us, and the second is the user's perception of latency, so rather than clicking a button and watching the little whirly hourglass for many seconds before something happens, let them continue with what they wanted to do as quickly as possible, even if we're still transferring the data. We needed to think about compliance; we always think about compliance, but now the regulators are going to walk in and say "we know you're using public cloud," and that's potentially a different level of compliance and transparency we need to be able to show. And finally, we need to be cost-optimized. When you're working with an on-prem environment, the costs are often completely opaque to the development teams; they have no idea. It's a lot more obvious when you're looking at cloud. In terms of cost optimization I heard an interesting analogy: anybody can build an expensive bridge that doesn't fall down; the trick is to build the cheapest possible bridge that doesn't quite fall down. That's kind of what we're trying to do with cloud: what's the bare minimum we can get away with while still providing the customer experience that we want?

With all of these in mind, we spoke to a number of teams within the organization, got a set of requirements from them, came up with designs, iterated through the designs, spoke to a number of partners and consultants and said "this is what we're thinking of doing," and we came up with a design that met all the requirements. Before you start adjusting your glasses, the slide is purposely blurred, but it did meet the requirements. It used a hub-and-spoke model; we adopted that because all the updates would happen at a central location, there was no possibility of conflicts, and it meant we could use our existing business logic to validate what was going on. We used the cloud for the spokes, so where we wanted to move the compute and data as close to the customers as possible, that would be at the end of the spokes. There were a number of decisions made in the past with respect to the container management system we incorporated, and also an object-store caching layer. The object-store caching layer was originally conceived to reduce the number of transactions we were running on the mainframe and to deal with the fact that we had the relational model on the mainframe in DB2 and the object model out in the cloud.
There was a fairly sophisticated object-relational mapping that worked in two different layers: the first if there was a cache miss in the object store, and the second when changes were actually happening to DB2 outside of this system, via batch or via our existing monolithic applications, taking those changes and replicating them out to the object store at the spokes. Because this design required making two writes, one to DB2 so that our existing systems kept working and one to the object store at the hub (the hub is the bottom blue box), we came up with the concept of business events: something coming in that is agnostic to the actual data platform being used, and that can be translated into SQL for talking to DB2 and into API calls for updating the object store. Because we wanted to keep the object store and DB2 in sync, we were using a two-phase commit to achieve that. The solution was asynchronous, so definitely eventual consistency: if a microservice made a change and immediately read it back from the object store, it would not see that change. We still had a reliance on proprietary hardware, and in fact we were actually running more code on the mainframe as part of the process to get away from it, and we were leveraging no managed platforms, so we were still doing an awful lot of heavy lifting ourselves.

As we became more familiar with public cloud, we started to look at how we could split that hub and move more of the workloads out. We came up with the concept of an extended hub, where the hub would be extended between us and our closest region, us-east-1, and the spokes would still be located around the world. Complexity increased, so we decided to take another look at how we could do things differently. We took the list of requirements we originally had and, rather than try to build a solution to meet all of them, we asked: what is the simplest thing we can possibly do, and what subset of the requirements do we actually meet with that solution? We came up with nine options.

This is option 1 (there was actually an option 0 that was just so awful in terms of functionality I'm not even showing it). Basically, the microservices and batch processes write to DB2 directly, DB2 being the very bottom box. A change data capture process — if you're not familiar with CDC, it's basically something that reads the database transaction logs and then applies those changes to another database or another system — picks those changes up and moves them to the cloud-based part of the extended hub, where they're written into the two databases. Those databases are then read by the microservices, and since they're local to those microservices, the user experience is good for read operations, which are 90-plus percent of our actual traffic. The system is considerably simpler because there's no need for a translation between a relational and an object model; with this design we're moving from relational to relational, the CDC handles it, and it makes life nice and simple. It's also fairly easy, when you're working with one RDS database, to get improvements in scalability by adding read replicas and in availability by going Multi-AZ.
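The CDC tool Vanguard uses is a third-party product, so the following is only a generic sketch of the idea Barry describes: read committed changes captured from the source's transaction log and re-apply them to the cloud copy. The change-record shape, table and column names, and the Postgres-flavoured upsert are assumptions for illustration.

```python
# Generic change-applier sketch: apply one captured DB2 change to an RDS copy.
import psycopg2

conn = psycopg2.connect("host=replica.rds.example.com dbname=engagement user=cdc_apply")

def apply_change(change: dict) -> None:
    """change = {"table": ..., "op": "INSERT"/"UPDATE"/"DELETE", "key": ..., "row": {...}}
    Table/column names are assumed to come from a trusted CDC feed."""
    table, op, key, row = change["table"], change["op"], change["key"], change.get("row", {})
    with conn, conn.cursor() as cur:
        if op == "DELETE":
            cur.execute(f"DELETE FROM {table} WHERE id = %s", (key,))
        else:  # INSERT or UPDATE captured from the source transaction log
            cols = ", ".join(row)
            vals = ", ".join(["%s"] * len(row))
            sets = ", ".join(f"{c} = EXCLUDED.{c}" for c in row)
            cur.execute(
                f"INSERT INTO {table} ({cols}) VALUES ({vals}) "
                f"ON CONFLICT (id) DO UPDATE SET {sets}",
                list(row.values()),
            )
```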
Option 3 implemented more of the requirements we originally had. One of the problems with option 1 is that if there's a network problem between the microservices and the DB2 database on-prem, the microservice effectively waits until the timeout occurs, and users trying to perform an operation involving a write are left waiting for the little hourglass to finish. With the buffered-writes option, the microservice writes to a streaming data platform located in the same geographic region, using a business event as the message it writes, and the business event is propagated down to the extended hub, which makes the write to the DB2 database. The user's perceived latency as a result is very low; the availability and disaster impact is considerably lower, because our users can still make writes and they're stored before they're forwarded; and if we get an incredible increase in traffic, the part of this solution that can't scale is the database at the very bottom on the mainframe, and we can absorb the spikes using the streaming data platform. A couple of our requirements are worth calling out, though. One was a single integration point: what we wanted to give our SI organizations was a single place to go to see what was happening with the system, instead of having to write integrations across multiple point-to-point solutions. And asynchronous writes were still a fact of life; eventual consistency was still a fact of life. The idea was that it would be solved in the application layer talking to the microservice.

I'll go into a little more detail about the buffered writes and how we achieved them. Because we had this asynchronous system, we wanted applications to be aware of where they were while asynchronous writes were happening. The microservice writes to the streaming data platform; a piece of code labeled the replicator takes that and moves it down to on-prem, where it's picked up by the dispatcher. The dispatcher then looks at the business event, determines which microservice can actually make this update in our DB2 database, and makes the appropriate call. The reason for the double hop is that the microservice teams that own the bounded context can implement the Lambda functions and deploy them independently, so they have control over their own data and make the updates in the database in the way that makes sense to them. If everything works correctly, the dispatcher gets back a good return result, appends some information to the business event that we can use for analytics, for example the processing time, and then pushes the business event into the "done" stream, and that gets propagated around the network. Using the IDs in the to-do and the done streams, we can determine that a particular operation happened, and in the event of an error the message ends up in the error stream, so everyone is aware, by looking at the error stream, that a problem occurred. We developed a microservice that reads from those various streams; its bounded context is the logged-in user, so an application that doesn't necessarily need to know about every microservice it deals with can still determine if there are things in flight that might impact its operation. We call that microservice Kittyhawk, because it deals with in-flight messages during write operations.
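A minimal sketch of what the dispatcher in this buffered-writes flow might look like: pull business events from the "to-do" stream, hand each one to the Lambda function owned by that bounded context, then record the outcome on the "done" or "error" stream. Here the events are read straight from a Kinesis stream, whereas in the talk a replicator first moves them down to the on-prem side; the stream names, the event fields ("boundedContext", "eventId"), and the function-naming convention are illustrative assumptions, not Vanguard's.

```python
# Dispatcher sketch for buffered writes: to-do stream -> bounded-context Lambda
# -> done/error stream, with processing time appended for analytics.
import json
import time
import boto3

kinesis = boto3.client("kinesis")
lam = boto3.client("lambda")

TODO_STREAM, DONE_STREAM, ERROR_STREAM = "write-todo", "write-done", "write-error"

def dispatch(event: dict) -> None:
    """Route one business event to its bounded context's DB-writer Lambda."""
    fn = f"dbwriter-{event['boundedContext']}"        # assumed naming convention
    started = time.time()
    resp = lam.invoke(FunctionName=fn, Payload=json.dumps(event).encode())
    event["processingMillis"] = int((time.time() - started) * 1000)  # kept for analytics
    failed = resp.get("FunctionError") or resp["StatusCode"] >= 300
    kinesis.put_record(
        StreamName=ERROR_STREAM if failed else DONE_STREAM,
        Data=json.dumps(event).encode(),
        PartitionKey=event["eventId"],  # same id lets Kittyhawk correlate to-do vs. done
    )

def run(shard_id: str = "shardId-000000000000") -> None:
    it = kinesis.get_shard_iterator(
        StreamName=TODO_STREAM, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
    )["ShardIterator"]
    while True:
        out = kinesis.get_records(ShardIterator=it, Limit=100)
        for rec in out["Records"]:
            dispatch(json.loads(rec["Data"]))
        it = out["NextShardIterator"]
        time.sleep(1)
```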
Continuing along the various options we offered the lines of business: we were aware that some of them were very interested in not sticking with a relational model but instead looking at a NoSQL option, so we came up with this solution using a NoSQL data store. I'll walk you through it. When a write happens to DB2, the CDC tool again picks up the transaction and moves it into the hub, where the CDC tool takes that transaction data and, instead of writing it directly to a relational database, puts it into a streaming data platform. The message is then read from the streaming data platform and, based on the contents — the tables and the rows actually impacted by that change — it calls what we call an event writer, and the event writer pushes the message through to the same streaming data platform on the left-hand side that was used by the write process. The dispatcher then picks it up and passes it to the DB writer, and because of identifiers that the event writers put into the message, we can determine that the message came through the CDC process, so rather than update DB2 we update the NoSQL platform. We now have our single point of integration, because all changes to the system flow through the streaming data platform on the left-hand side, so anyone can write code at that point that looks at the streaming data platform and is able to determine all the changes happening to the system. Our relational-to-NoSQL mapping happens in the event writers and the DB writers, spread across those two components, and we're able to leverage a NoSQL data store. We have a slightly higher latency, though it's somewhat hidden from the user, and we have introduced significant complexity, but we've given it to our users as an option: from option 1, simple, through option 6, more complex, the users were able to look at the requirements implemented by each approach and make the decision for themselves, rather than take a one-size-fits-all.

So what did they actually pick? The lines of business couldn't decide: one line of business still wanted to stick with relational, the other wanted to move towards DynamoDB. It's been really interesting being here this week; we've seen significant announcements about both — Aurora for PostgreSQL, for example, now available serverless, and global tables for DynamoDB now available. The solution gives us both. In addition, because we're taking the data and replicating it from on-prem out into the cloud, the analytics part of the organization also said: if you're pushing the data out through the CDC tool, we can siphon it off and use it for big data analytics as well, so that's been added too. What we have now is a system where, if there's an availability event, a microservice based out on one of the spokes can still read everything it needs from the databases, and because we replicate the data out, we'll have an idea of exactly how stale that data is. When the user wants to make an update, the update goes into the streaming data platform; even if we can't replicate that data down to the hub, at least it's there, we can let the user off the hook, we understand their intent, and all the work still goes on in the hub. And we're reducing our on-prem footprint. Just to explain the diagram: the green lines represent the relational process and the purple lines represent the NoSQL process you saw on the last slide.
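A minimal sketch of the DB-writer idea on this slide: the same writer receives business events from both the microservice write path and the CDC path, and an origin marker added by the event writer tells it whether to apply the change to DB2 on-prem or to the NoSQL copy. The field names, the table name, and the "origin" marker are illustrative assumptions.

```python
# DB-writer Lambda sketch: route a business event to DB2 or DynamoDB based on
# where the change originated.
import boto3

dynamodb = boto3.resource("dynamodb")
positions = dynamodb.Table("client-positions")   # hypothetical DynamoDB table

def handler(event, context):
    """Lambda entry point, invoked by the dispatcher with one business event."""
    if event.get("origin") == "cdc":
        # Change originated in DB2 and was captured by the CDC tool:
        # replicate it into the NoSQL copy instead of writing DB2 again.
        positions.put_item(Item=event["payload"])
    else:
        # Change originated from a microservice: apply it to DB2 on-prem.
        write_to_db2(event["payload"])
    return {"status": "ok", "eventId": event["eventId"]}

def write_to_db2(payload):
    # Placeholder for the on-prem DB2 update the real DB writer performs.
    pass
```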
In terms of a mainframe strangulation strategy, what we've seen here is the first step: we take the data and move it out into the cloud, and we've provided ourselves with an abstraction layer where the microservices can be built to interact with the data format that makes the most sense for them, either relational or NoSQL. They can be deployed without needing to know about all the rest of the process that keeps those data stores up to date.

We then start to take our monolithic applications. At the moment, those monolithic applications make calls to the database, get the data, build the HTML, and send what is effectively the page, all ready to be rendered, back to the browser. We need to remove some of that rendering, so that what we actually send back to the browser is instructions that say: once you get this page, go to the microservice, grab the data that you need, and render the page yourself. Basically, refactor the monolithic apps to make AJAX calls to the microservices, grab the data, and display it as necessary. In parallel with that, we migrate batch processes to the cloud, so that the data we've replicated out there is used for the batch processing instead of DB2 on-prem. We're looking first at the kinds of batch processes that operate within a single bounded context; once we start to move batch processes out to the cloud, some of them will require orchestration-type services that work across multiple microservices to perform those operations. Once we're able to move those batch processes out to the cloud, and we're at the stage where a particular bounded context is no longer accessed in our on-prem database, then we can switch the replication: we keep the gold copy of the data in the cloud and push it back down to DB2, should it be necessary to actually read it there, but everything actually happens in the cloud. For the short term we're keeping the mainframe record-keeping systems — the CICS, the VSAM, and the COBOL systems — and treating those as a bounded context. The integration logic that used to take the data from DB2 and move it into VSAM at the beginning of the batch process, and then take the data from VSAM and move it back out to DB2 at the end, would instead move the data from the cloud into VSAM at the beginning of the batch process and from the VSAM files back out into the cloud at the end. And obviously, over time, we chip away at this, move as much as we possibly can out into the cloud, and basically strangle what's left.

There are a few different Amazon services we use to accomplish this. First, the Relational Database Service. What we look for when we're working with any Amazon service is four things. First of all, SOC 2 compliance: many of our applications need to be SOC 2 compliant, and our customers look for that in the institutional space. The next thing is data-at-rest encryption: a mandate came down from security that every piece of data we store in Amazon needs to be encrypted at rest. Then user access management, which means being absolutely sure about which users exist within our various data stores and the rights those users have, and being able to guarantee that if someone leaves the company, their rights are removed from those databases within minutes. And finally, data activity monitoring, which is the recording of every SELECT, INSERT, UPDATE, and DELETE that happens on those databases; that's absolutely necessary for consistency, integrity, and also data loss prevention: we can tell exactly when data was read from our database.
With RDS we were really lucky: SOC 2 compliance came straight out of the box, and data-at-rest encryption has been available for years. User access management was more difficult. The on-premise systems we had for managing database access did not work with the database we were looking to target in Amazon, so we ended up writing a REST interface that integrates with our on-prem identity management system and creates and deletes the necessary user IDs and roles in the RDS databases. It also offers the capability to request, on demand, a list of all the users, all their roles, and all their accesses in a database, which we can give to auditors in order to prove we meet that particular requirement. For data activity monitoring we use a third-party product on-prem that tracks all SELECTs, INSERTs, UPDATEs, and DELETEs against DB2; it could not be used with RDS because it requires installing an agent on the actual box. We were able to get around that issue by pulling the RDS logs every so often, moving them into a log consolidation system, and monitoring for particular access, so we can actually see what's going on.

With DynamoDB, again we were lucky with SOC 2 compliance. There's no data-at-rest encryption for DynamoDB, so we're taking advantage of client-side encryption, basically encrypting the data before we move it to DynamoDB. That certainly imposes some restrictions on what we can use DynamoDB for: you lose the ability to filter or query on the data you've encrypted. User access management for DynamoDB was fairly easy to achieve because it's tightly integrated with AWS IAM, the Identity and Access Management service. And data activity monitoring was also something that was missing: DynamoDB Streams gives us the ability to monitor updates, deletes, and inserts to the data, but we can't monitor the selects, so potentially there's a data loss issue there. We solved this by again using client-side encryption, and then taking advantage of the fact that we do get a CloudTrail log of access to KMS: every time someone actually grabs the key in order to decrypt the data, we have a record of that.
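A minimal sketch of the client-side encryption approach Barry describes for DynamoDB: generate a data key from KMS, encrypt the sensitive attribute before put_item, and store the encrypted data key alongside it. Because the attribute is ciphertext, it can no longer be filtered or queried on, as noted in the talk, and every KMS decrypt call shows up in CloudTrail, which supplies the read-auditing signal DynamoDB Streams cannot. The key alias, table, and attribute names are illustrative assumptions; Vanguard may well have used a packaged encryption client rather than hand-rolled code like this.

```python
# Client-side encryption sketch: KMS data key + Fernet, with the wrapped key
# stored next to the ciphertext in the item.
import base64
import boto3
from cryptography.fernet import Fernet

kms = boto3.client("kms")
table = boto3.resource("dynamodb").Table("client-accounts")

def put_encrypted(account_id: str, ssn: str) -> None:
    dk = kms.generate_data_key(KeyId="alias/app-data", KeySpec="AES_256")
    f = Fernet(base64.urlsafe_b64encode(dk["Plaintext"]))
    table.put_item(Item={
        "accountId": account_id,              # plaintext key attribute, still queryable
        "ssn_enc": f.encrypt(ssn.encode()),   # ciphertext: cannot be filtered or queried on
        "dek": dk["CiphertextBlob"],          # encrypted data key, unwrapped via KMS on read
    })

def get_decrypted(account_id: str) -> str:
    item = table.get_item(Key={"accountId": account_id})["Item"]
    # kms.decrypt is logged by CloudTrail: a record exists every time the key is used.
    plaintext_key = kms.decrypt(CiphertextBlob=item["dek"].value)["Plaintext"]
    f = Fernet(base64.urlsafe_b64encode(plaintext_key))
    return f.decrypt(item["ssn_enc"].value).decode()
```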
For AWS Lambda, SOC 2 compliance was granted two or three weeks ago; up to that point we were very careful about our app selection, and in our AWS engagement we were constantly talking to them and to Ilya, asking when Lambda would be SOC 2 compliant. Data-at-rest encryption is not really an issue for Lambda, user access management is again integrated with AWS IAM, and data activity monitoring is not an issue for Lambda. Amazon Kinesis is also part of the solution. SOC 2 compliance was again granted a couple of weeks back, and in exactly the same way, up until that point we were very careful about the applications we selected and constantly asking AWS whether it was going to be available, because it's really important to us. Data-at-rest encryption was announced back in July, so we were really happy about that; it stops us jumping through hoops from the encryption perspective. User access management is again provided by AWS IAM, and data activity monitoring is not available, so for certain types of data that need it we can use client-side encryption, and again we get the KMS and CloudTrail logs when we decrypt that data.

So what are the impacts of the migration? Firstly, the unmaintainable big ball of mud is gone. With the microservices we can use CI/CD pipelines, and we use the pull-request model to force a peer review: whenever our users make changes, someone else has to approve those changes before the code moves from their feature branch into the main branch. That gives us some quality controls, and as a result we're doing continuous builds, improving quality and the code coverage of our tests. Microservices bring the strictly enforced bounded contexts, and the move of the next-gen stack to cloud means we have stateless and therefore much more highly scalable services. The negative of the microservices approach, as opposed to on-prem, is the eventual consistency, which we're solving at the application layer with the Kittyhawk microservice. From a cloud perspective, we have our infrastructure-as-code pipeline: we make extensive use of CloudFormation to set up the environments we're using, and in the same way that our code goes through the peer review and pull-request process, so does our infrastructure, so we have the same kind of quality gates and rapid feedback when we're standing up infrastructure. We're now making extensive use of managed services in the public cloud — DynamoDB, Amazon RDS, Kinesis, Lambda — so there's a lot less heavy lifting for us to do, but of course by moving to the cloud model we're now dealing with eventual consistency and latency, again solved at the application layer. And finally, with this approach we've achieved the requirement for a single development model: developers no longer need to write COBOL code to run on the mainframe; they're now writing Lambda functions that can reach through and pull the data out of the mainframe. The direction we're giving them is to try to use frameworks that make even that agnostic, so that we can continue to take advantage of new technologies: if you're talking to an RDS database today, think about what it would take to move to NoSQL in the future, or whatever comes next. We've achieved the goal of being polyglot in terms of data stores and languages: by having the network interface between the various components of the system, we can implement a microservice in almost any language as long as the interface remains the same; the client of that interface doesn't even need to know the language it's developed in, which is a huge change from the monolithic codebase. The same goes for data stores: we can now tell our developers, if your particular microservice makes extensive use of relationships, for example, maybe a graph database is the right thing for you rather than relational or NoSQL; it gives us that kind of flexibility. And I've got compliance on there as a bullet point: in some respects compliance is easier, because we have the compliance information from Amazon, so we're effectively leveraging their compliance rather than having to do everything ourselves, but there's certainly an increase in scrutiny.
The lessons learned basically fall into three separate categories. From a regulatory perspective: when you're working with a large monolithic database, it's very easy to treat everything within that database as sensitive, PII-type data. When you take that database, split it up into multiple different bounded contexts, and move it around to different microservices, it's possible to think of it differently, so understanding the different data classifications is really critical; being able to treat those different data classifications in different ways for different microservices can save you a lot of time and effort in terms of the possible solutions you can use. We've often found that it's useful to have a backup plan, because the particular technology you most want to use perhaps isn't released when you'd like it to be, or isn't compliant as you'd want it to be; having another option matters, and often we've found that self-managed solutions are the answer there: although a particular managed service may not have the capability we need, we can stand up EC2 instances and install our own software to use for the short term.

In terms of acceptance: as I said earlier, we're not just doing one migration, we're doing three, and people don't like change, so working with large groups of people, letting them know what's going on, and making them part of the process is definitely very valuable. What we found works is to use the large teams to publicize, and to work with smaller teams to gain consensus and understanding and move forward.

In terms of cloud-specific lessons: we found a number of times that AWS released new capabilities, sometimes just after we'd implemented something that provided the same thing, and obviously we want to take advantage of the managed capabilities as much as possible, which has meant some rework of the architecture. We've actually coined, within the cloud construction team, the idea of continuous architecture: we've got continuous integration and continuous deployment, and now there's continuous architecture — expect it to change, and then you won't be disappointed. And build a good relationship with the AWS team; that's been invaluable for us. We've found that if you go to AWS and ask "what's your roadmap for the next three years?" you'll be met with stony silence; they won't give anything away. But if you go to them and say "I have this specific problem, I need this, can you help me?" we've heard much better responses, everything from "you're not the only person to ask for this, we'll put you on the list and see what we can do to prioritize it" all the way to "hey, we've just rolled this out as a beta test, we'll add you to the beta and you can start playing with it right away."

So that's it from me. Thank you very much for attending. Ilya and I will be down here to answer any questions, and I know all the swag is probably gone, but if you could fill out your comment cards, that would be greatly appreciated. Thank you very much. [Applause]
Info
Channel: Amazon Web Services
Views: 6,852
Keywords: AWS re:Invent 2017, Amazon, Enterprise, ENT331, AI, Security
Id: XYwYiQBCcaM
Length: 57min 30sec (3450 seconds)
Published: Fri Dec 01 2017