AWS re:Invent 2019: Backup-and-restore and disaster-recovery solutions with AWS (STG208)

Captions
All right, welcome everybody. My name is Jody Kirk, and this is STG208: backup-and-restore and disaster-recovery solutions with AWS. Thank you all for joining us today, and welcome to re:Invent 2019. I'm here with two of our esteemed customers: Creighton Swank, who is a cloud architect for Caterpillar, and Juan Mejia, who is the data center manager for BankUnited. Juan and Creighton are going to share their stories about how they've been able to leverage the AWS cloud to support the backup and disaster recovery initiatives for their respective organizations.

So let's jump into the overview for today. First we're going to talk a little bit about why we protect data in the first place, just to lay some groundwork. Then we're going to talk about modernizing your backup using the cloud and some approaches you can take to do that. We're going to cover some common backup patterns, both how customers are using native AWS services to perform backups and how they're deploying solutions from our ISV backup partners, along with the architectures those ISVs support. And then we're going to talk more specifically about disaster recovery approaches.

So let's start by laying a little groundwork: why do we back up data in the first place? We back up data to ensure the safety and security of the information we store, but in reality we do it because we don't want to lose it. Lost data can have an impact on our business: it can affect our brand, it can affect trust and our customer relationships, and it can affect current and future revenue streams. When you look across your business applications, you realize that some applications are simply more important to your business than others. So to ensure those applications are protected effectively, we try to understand how much data we can afford to lose for each of them. That decision, the recovery point objective (RPO) for those applications, dictates the frequency at which we take backups, and it has cascading effects on the decisions we make about the systems and services we use to support those applications.

The second thing we look at is how long it takes to recover: the recovery time objective (RTO). This is a function of how much downtime your business can tolerate for an application and its associated information. For some businesses, an hour of downtime can cost millions of dollars, so the recovery time objective is the second critical aspect of defining your backup and disaster recovery strategies.

So the question is: why don't we just back up everything forever, using techniques like continuous data protection and point-in-time recovery? If we backed up every application forever, so we could roll back to any point in the past and restore to that specific point in time, that would be kind of nirvana, right? The reason we don't is that we don't need to. Different applications have different levels of importance to your organization, and backing up everything forever would be overkill. So what we do is try to create a balance between cost and the liability, or risk, associated with not backing up applications. How do we create that balance? The first step is to go through the process of determining how important your various applications are.
We do this through a business impact analysis. The business impact analysis is designed to help an organization understand the cost associated with downtime. That cost helps you set the RPO and RTO requirements for your business, and once you've developed those requirements, you can have the discussion with your finance teams about their ability to fund them. That's where the balance comes in: funding ability against the requirements of your applications. Now that you've got your requirements established, there may also be compliance considerations. For example, if you are required, for compliance purposes, to keep a copy of your data more than 150 miles from your primary data set, that's a consideration that affects your overall strategy. These requirements are the foundation for determining your effective backup and disaster recovery strategies.

All right, so now you've got options for where to store your backups. Why should you consider the cloud in the first place? First, let's take a look back at the traditional backup environment, where backups were done on premises. You've got application servers and media servers that make copies of the application data and back them up to tape or, if there's a lower RPO, probably to disk. Then some copies of those backups are typically stored off-site for longer-term compliance, using a third-party data bunker service. What we heard from customers is that there's a whole lot of undifferentiated heavy lifting in that process: managing the tape arrays, and managing the third-party vendors to get the data off-site. The cloud provides some compelling options for streamlining that overall process.

So why is moving to the cloud so common for backups? One reason is that the infrastructure you've already put in place to manage on-premises backups can be used to support your cloud backups. You can keep your existing backup applications and your existing backup workflows with little modification: instead of pointing to a disk-based target on premises, you're now pointing to a cloud-based target, Amazon S3 for example. Additionally, you get cost-effective off-site storage: separation of your backup data from your primary data with no initial upfront investment and pay-as-you-go pricing, so it's very cost-effective to get started. Another reason is that backup is a discrete workload. It's not an active workload, meaning no application depends on it at that point in time, so it's easier to do POCs and trials to get those workloads into the cloud. If you're running tape, there's also the elimination of the tape infrastructure, which is prone to failure and typically carries high support costs, so customers are more than happy in most instances to find a better way to manage that function. And finally, if you're doing any kind of analytics or you want to gain better insights from your data, having the data in the cloud gives you access to other AWS services for analyzing and understanding that data, or for provisioning a copy of it so that, for example, a test-and-dev team can experiment with that backup data.
So the first question customers ask is: will my data be safe? Amazon S3 and S3 Glacier are designed for eleven nines of durability, and those are the two primary services customers use to back up their data to the AWS cloud. What does eleven nines of durability really mean? It means that if you store 10 million objects in Amazon S3, on average you can expect to lose a single object once every ten thousand years. That is highly durable; your data is safe in the AWS cloud.

When you think about the AWS cloud, we've got 22 regions online today around the world, with three more announced. Our regions are built of multiple Availability Zones, and each Availability Zone has multiple data centers. Some of our cloud competitors talk about having in excess of 50 regions, when in actuality they count a single data center as a region: not multiple data centers within an Availability Zone, or multiple Availability Zones within a region, just a single data center. So it's really an apples-and-oranges comparison. In addition, each region has two transit centers that are fully independent and redundant, which allow traffic to cross the AWS network and enable AWS regions to connect to the global network. So there's full redundancy in access to your data as well, based on that design.

And if your business requires compliance, we provide flexibility and capabilities to help you meet those objectives. For example, with S3 you can replicate your data cross-region to any S3 storage class, including S3 Glacier. If you have your primary bucket in S3, you can make a replica copy in a secondary region in S3 Glacier and save money on that secondary copy. You can replicate from any region to any region; you can replicate at the bucket level, the prefix level, or the object level; and you can even replicate within a region, which we added support for just a quarter ago, if your requirements dictate that. So there's a lot of flexibility in terms of being able to achieve compliance with the AWS architecture.
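As a concrete illustration of the cross-region replication setup described above, here is a minimal sketch using boto3. The bucket names, replication role ARN, and prefix are hypothetical, and both buckets would need versioning enabled; treat this as a sketch of the concept rather than a complete configuration.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical names; replace with your own buckets and replication role.
SOURCE_BUCKET = "my-primary-backups"            # e.g. in us-east-1
DEST_BUCKET_ARN = "arn:aws:s3:::my-dr-backups"  # e.g. in us-west-2
REPLICATION_ROLE_ARN = "arn:aws:iam::123456789012:role/s3-replication-role"

# Replication requires versioning on both buckets; shown here for the source only.
s3.put_bucket_versioning(
    Bucket=SOURCE_BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Replicate everything under the "backups/" prefix to the secondary region,
# storing the replica in the Glacier storage class to reduce cost.
s3.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE_ARN,
        "Rules": [
            {
                "ID": "replicate-backups-to-dr-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "backups/"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": DEST_BUCKET_ARN,
                    "StorageClass": "GLACIER",
                },
            }
        ],
    },
)
```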
All right, so now that we've covered why we back up and why backing up to the cloud makes sense, let's discuss some common backup patterns we see our customers deploy. First I want to cover some of the AWS services that are commonly used and the ways customers are utilizing them for their backup activities.

The first one is backups using the AWS Storage Gateway. The Storage Gateway comes in three variants: a file gateway, a volume gateway, and a tape gateway. Each of these can be deployed as a virtual machine or as a hardware appliance within an on-premises data center, or as an Amazon Machine Image within the AWS cloud. With the file gateway, you access the gateway through standard storage protocols such as SMB or NFS, and when you write to it, the data gets translated into objects and stored durably in the S3 bucket of your choosing. So what does this mean for backups? How are people backing up using the file gateway? The first obvious example is backing up file data: they write copies of their file data through the gateway, and it's backed up into their S3 bucket in the AWS cloud. A very common use case as well is database backups, things like Oracle backups or SQL dumps. Just as on premises you might use a NAS array as the target for storing your backup files, you can use the file gateway in a similar way: presenting either SMB or NFS, writing those backup files to the file gateway, and having them durably stored within the AWS cloud. The file gateway also provides a local cache of up to 32 terabytes, so you can provision that cache to keep those backup files local and get fast on-premises restores of the data you've backed up through the gateway.

The tape gateway is essentially used to replace on-premises tape libraries. It's a VTL product; if you're familiar with virtual tape libraries, it looks, smells, and feels like a tape array, but it's all based on software emulation. You're writing to tapes using the same backup software you were using with your physical tape array, except this time you're writing to a virtual appliance instead of physical tapes, and those writes go to Amazon S3, where they're durably stored. In terms of the backup workflow, the major backup packages all support the tape gateway, so you can keep the exact same workflow: you can label tapes, the tapes are represented in the software as a virtual tape library, you can manage those tapes through the software, and you can eject tapes into S3 Glacier when you want to store them for longer-term compliance.

The volume gateway presents iSCSI block storage volumes to your on-premises applications. It provides either a local cache or full volumes on premises, while also storing copies of your volumes in the AWS cloud. A common way customers use the volume gateway is to leverage its ability to create EBS snapshots: when you write to the volume gateway, the data gets stored in S3, where it can be snapshotted, restored as an EBS volume, and then mounted by an EC2 instance. This is a very practical use case for migrating databases to the cloud or for disaster recovery into the cloud: you copy a volume to the volume gateway, it gets stored within S3, and from there you can do a whole bunch of things with it through the EBS snapshot functionality.

Then there are EBS snapshots themselves. If you're running databases on EC2 and you need point-in-time recovery for those volumes, you can create volume snapshots with EBS snapshots. The snapshots are incremental, meaning only changed blocks in the volume are stored, which saves you money in terms of cost and space efficiency for storing those backups. Earlier this year we announced support for multi-volume, crash-consistent snapshots on EBS: if you have multiple volumes sitting behind a single EC2 instance, you can back them up and restore them as a single operation. We also just announced a capability last month, about two weeks ago, called Fast Snapshot Restore. With Fast Snapshot Restore, you identify which EBS snapshots you want to keep in a pre-initialized state, which means that when you go to restore those snapshots there's no initialization time; they're already pre-initialized, so you get full provisioned performance for those snapshots at the time of the restore operation. It's a pretty cool capability that was just announced.
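To make those last two capabilities concrete, here is a minimal boto3 sketch that takes a crash-consistent, multi-volume snapshot of a single instance and then enables Fast Snapshot Restore on the resulting snapshots. The instance ID, Availability Zone, and tags are hypothetical placeholders, and real automation would add error handling.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical instance
RESTORE_AZ = "us-east-1a"             # AZ where fast restores are needed

# Multi-volume, crash-consistent snapshot of every EBS volume on the instance.
response = ec2.create_snapshots(
    InstanceSpecification={"InstanceId": INSTANCE_ID, "ExcludeBootVolume": False},
    Description="Nightly crash-consistent backup",
    TagSpecifications=[
        {
            "ResourceType": "snapshot",
            "Tags": [{"Key": "backup", "Value": "nightly"}],
        }
    ],
)
snapshot_ids = [snap["SnapshotId"] for snap in response["Snapshots"]]

# Pre-initialize the snapshots so restored volumes deliver full provisioned
# performance immediately (Fast Snapshot Restore).
ec2.enable_fast_snapshot_restores(
    AvailabilityZones=[RESTORE_AZ],
    SourceSnapshotIds=snapshot_ids,
)
print("Created and pre-initialized:", snapshot_ids)
```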
The last one I want to cover, for people doing database backups in the cloud, is EFS, which is commonly used for this. Just as we talked about with the file gateway for backing up databases on premises, if you're backing up resources within the AWS cloud you're typically using backup utilities from the common database vendors; people use this for SAP HANA, Oracle, and Db2, and it's very common. What's nice about EFS for this use case is that it presents itself as NFS, a POSIX-compliant file system, so it's easy to write those database backups to EFS. And because every write to EFS gets written across three Availability Zones, each with its own independent mount target, you can access those backup files from any one of the three Availability Zones. For a database administrator using a database backup utility, this is the easy button: with EFS you don't have to provision any storage, you don't have to configure replication, and you don't have to set up HA. It's all there for you. You set up the file system, start backing up your databases, and it's just that easy.

All right, now we're going to switch gears a little bit and talk about some of our partner solutions. AWS has a broad ecosystem of backup and recovery partners, everything from some of the most innovative new startups to ISVs who have been in the market for 20-plus years and support Fortune 100 companies. We've partnered with all of them and do a lot of great business with all of them, but it's important to understand that there are different architectural models for utilizing our partners, which model is appropriate for you, and how to architect it on the AWS cloud. So I'm going to go through a few different architectural models and compare and contrast the differences and the value of each.

The first one is simply backup from on premises to AWS. In this model you've got your traditional backup setup, where you're backing up resources within your own data center, and you want to leverage AWS for your backups. Every major backup vendor has built a native connector for Amazon S3, which means they can write directly to S3 as a storage target without any modification. What's also nice is that most of these backup vendors license their software based on the amount of capacity managed, so whether you're writing to local disk or to S3, it has no impact on the cost from the ISV for managing those backups. Those backups can be written over a number of different transports: AWS Direct Connect, VPN, or the internet. And in addition to the native connectors, you can use the Storage Gateway or even third-party products like VTLs. In this example the master server runs on premises and AWS is used to store the backups, and as I mentioned, there are typically no additional incremental licensing fees for that model.
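As a small illustration of S3 as a direct backup target (not how any particular ISV product implements its connector), here is a hedged boto3 sketch that uploads a nightly database dump to a backup bucket with server-side encryption and an infrequent-access storage class. The bucket name, file path, and key prefix are hypothetical.

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

BACKUP_BUCKET = "my-offsite-backups"            # hypothetical bucket
DUMP_FILE = "/backups/orders-db-nightly.dmp"    # hypothetical local dump file

# Key includes a date prefix so nightly dumps don't overwrite each other.
key = "database/orders/{}.dmp".format(
    datetime.now(timezone.utc).strftime("%Y-%m-%d")
)

# upload_file handles multipart upload automatically for large files.
s3.upload_file(
    Filename=DUMP_FILE,
    Bucket=BACKUP_BUCKET,
    Key=key,
    ExtraArgs={
        "StorageClass": "STANDARD_IA",       # cheaper for rarely-read backups
        "ServerSideEncryption": "aws:kms",   # encrypt at rest with KMS
    },
)
print("Uploaded backup to s3://{}/{}".format(BACKUP_BUCKET, key))
```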
All right, the next model is backing up on-premises remote and branch offices to AWS. Organizations with distributed footprints, lots of branch offices, don't want backup infrastructure in every branch office; the management complexity would be unwieldy. So they employ a model like this, where you've got a secondary backup server in the AWS cloud with federated management from the primary master server in their data center. Now that backup server in the AWS cloud manages the backups for all of the remote offices, and you can take all the backup infrastructure out of those offices. You can have endpoint protection within the offices, all the data is backed up to the cloud, and if there's a problem with a remote office you can restore that data in the cloud and still have access to it. As I go through these, you'll see a box at the bottom of each slide listing the partners that support each architecture. If a partner is in that box, it doesn't mean that's the only architecture they support, just that it's one of the architectures they support, and these are the partners in our APN (AWS Partner Network) storage competency for backup and recovery.

The next model is SaaS backup from on premises to AWS. In this model you take away the complexity of running the backup infrastructure in your on-premises data center. The SaaS vendor creates the S3 bucket, does all the configuration of the back-end resources, and provides all the connectivity, typically in the form of endpoint software used to transport the data to the cloud. As long as the basic transport, whether Direct Connect, VPN, or internet, is available, they manage the entire process and present it to you in a single GUI, and you manage it that way. What's also nice about this model is that, because it's SaaS, it's all-encompassing and offered on a cloud-based model: there's no upfront investment, you pay as you go, and you pay for what you use.

We're also seeing a lot of activity from our backup partners investing in the protection of AWS resources within the AWS cloud. One model is backing up EC2 instances but controlling that from on-premises infrastructure. If you're an organization that has traditionally backed up on premises and you want to extend those capabilities into the AWS cloud while maintaining a common control plane, a single pane of glass, for managing that infrastructure; if you want consolidated reporting across both your AWS and on-premises environments; and if you want to create common policies and have them enforced across both environments, this may be a great solution for you. In this model, the backup server manages the remote resources within the AWS cloud via APIs.

Then there's backing up Amazon EC2 instances with AWS-hosted control. In this model everything is encapsulated within the AWS cloud. We're not talking about protecting any on-premises resources; this is management and protection of resources within the AWS cloud only. It's a very simple model: everything happens within the cloud, typically starting with EC2 and EBS resources, with partners scaling out their functionality from there. And you can see there are a number of partners that have already begun supporting this model.
Earlier this year, in January, we launched a service called AWS Backup. It basically applies to that same sort of configuration, where you're backing up AWS resources, and only AWS resources, within the cloud. It's a fully managed service that provides centralization: you create policies that apply to backup operations across multiple AWS services. You can see the services we currently support: EBS, EFS, RDS, DynamoDB, and Storage Gateway. What this gives you is consistency. You can create a common policy so that backups are taken across these services in a consistent way, roll it out across your organization, and manage it centrally, so you can ensure the backup activities are meeting your compliance objectives as an organization. We're seeing great adoption: customers who had built their own scripting to initiate snapshots across various AWS services, and who don't want to keep building and maintaining those scripts, have flocked to the service in droves, so we're seeing a lot of good reactions and outcomes from customers.

Some people say, "Wait a minute, you just talked about all your backup partners. Isn't AWS Backup competitive with them?" We don't view it as competitive at all; we view it as complementary to our APN backup partners. The reason is that we're making it open and available. With AWS Backup we've created a set of APIs that our partners can write to, which gives them a single integration point with access to all the underlying services that AWS Backup supports. A great example of that is EFS: AWS Backup supports EFS, and EFS did not have its own native data protection capabilities before AWS Backup came online. Now we have partners who are building support for EFS through the AWS Backup APIs.
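Here is a minimal sketch of what a centralized, tag-driven policy in AWS Backup can look like through boto3. The vault name, schedule, retention, IAM role, and tag values are hypothetical; treat it as an illustration of the backup-plan and resource-selection concepts described above, not a complete setup.

```python
import boto3

backup = boto3.client("backup")

# A backup plan: one rule that runs daily at 05:00 UTC and keeps recovery
# points for 35 days. The vault is assumed to exist already.
plan = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "daily-35-day-retention",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "Default",
                "ScheduleExpression": "cron(0 5 * * ? *)",
                "StartWindowMinutes": 60,
                "Lifecycle": {"DeleteAfterDays": 35},
            }
        ],
    }
)

# Select resources by tag, so any supported resource tagged backup=daily
# (EBS volumes, RDS instances, EFS file systems, DynamoDB tables, ...)
# is picked up by the plan automatically.
backup.create_backup_selection(
    BackupPlanId=plan["BackupPlanId"],
    BackupSelection={
        "SelectionName": "tagged-daily",
        "IamRoleArn": "arn:aws:iam::123456789012:role/AWSBackupDefaultServiceRole",
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "backup",
                "ConditionValue": "daily",
            }
        ],
    },
)
```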
All right, now I want to switch gears a little bit and talk from a customer perspective. I've got Creighton Swank here, a cloud architect at Caterpillar, and Creighton is going to talk about Caterpillar's journey in modernizing their backup activities. Creighton, thank you.

Thank you, Jody. Good afternoon, re:Invent. My name is Creighton Swank, and I'm a cloud architect for Caterpillar. We're a little over three years into our AWS journey. My team focuses primarily on AWS, soup to nuts, so we do operations and consulting internally as well as architecture, and we work with a lot of independent teams inside Caterpillar to enable connectivity and expose our services to internal customers.

A little bit about Cat before I dive into the meat: we are the world's largest manufacturer of construction and mining equipment. We make diesel engines, natural gas turbines, and diesel-electric locomotives, and we do business under a number of brands. We're a Fortune 65 company, we have four million machines in the field around the world today, and we have plans to connect one million of them by the end of this year, which I hope we're on task for.

Let's talk a little bit about who my team's customers are, who we enable inside of Caterpillar. We have internal customer groups within Caterpillar corporate, groups like engineering and marketing, so we do a lot of design work as well as public web service hosting, things of that nature. We also have external brands, a number of other brands under the Caterpillar umbrella that do business throughout the world, and we work with those teams as well. Some of those brands are Solar Turbines, Cat Financial, and Perkins Engines. I encourage you to visit our website if you want to see the complete list of brands; you'll probably see some you didn't know were associated with Caterpillar.

So, starting a backup presentation with a bold quote: nobody cares about backups. They're expensive, they affect performance, why do we have to replicate the backup somewhere else, they're already expensive, my database doesn't support this. The bottom line is that backups are an insurance policy; Jody alluded to this earlier. We pay our insurance bills every month and hope we never have to use the policy. But everyone cares about restores. When the data's gone, the questions shift: how long until my restore is done? I don't care what it costs, I just need the data. Why can't I go back to last night's backup? These drive the concepts Jody was talking about earlier. Defining RPOs and RTOs is part of what my team does as we work with customers: we try to balance cost, not only the cost of losing data, but also the cost of implementing the backup solution. If an outage costs us $10,000, but the backup solution that protects us from that loss costs a million dollars, it doesn't make sense.

So what's the problem with backups? Talking primarily about traditional on-premises backups here: they're expensive, with high initial costs and a high barrier to entry. You've got to go buy the licenses, or the appliances, or the media, whether tapes or drives, and you're not just backing up the current copy of the data, you also have to keep changes. If you're retaining data changes over, say, 30 days, it's not just doubling your storage cost; in some cases it's much, much larger than that. The other problem is shared workloads: when you don't separate workloads, even though each has its own unique RPOs and RTOs, you have to come up with a common denominator and treat everything the same. Categorizing backup and recovery requirements is hard. Nobody wants to lose any data, so if you go to a customer and ask how much data they can lose and how quickly they need to be up, they're going to say "zero" and "right now." What we have to do is track that back to the cost of enabling it, so we can make financial decisions. Loosely defined RPOs and RTOs are a challenge because nobody really understands what they want and everybody wants everything, so you have to work with them to define how much they really need: what business process does this support, are there manual workarounds. You really have to get to know the workload. Application consistency is the other challenge: if you take a VM-based or host-based backup of a server running a database, when you go to restore it, it's probably not going to work. What you end up with is a one-size-fits-all solution, and that's really not what we're about in AWS.

So when we back up resources in AWS, how do we do it? First of all, we realize there's no one-size-fits-all solution; that's table stakes. Each of the services AWS offers has an independent backup strategy, and as my team works to enable a specific service within AWS, we also work to define the backup capabilities and what options we can present to our customers in terms of frequency of backups as well as consistency.
So each service has to be handled differently, and we leverage the native capabilities when they make sense: cross-region replication with S3, RDS multi-region replication, RDS snapshots, point-in-time recovery. These are all things we have to take into account. I'm not going to get access to the VM to install a backup agent on an RDS instance, so we have to get flexible.

On automating backups: Jody made a really good point about AWS Backup, and that's something my team needs to look into in the next quarter, because we're one of those customers that did it ourselves. We have Lambda functions that initiate EBS snapshots based on tags, and we have Lambda functions that delete the backups after a predefined retention window. We control it all with tags, and we allow our customers to select those tags, so when they deploy those resources they have options. Third-party tools are also important, and that's what's going to give us the ability to back up on-premises workloads to the cloud. Partner solutions such as Druva are a really good opportunity for us to break out of simply our AWS environment and move into the on-premises data center, and the focus there will be specifically the remote offices Jody mentioned earlier.

So we control backups with tags. We integrate those tags into the template, and the templates accept parameters. What you see here is a really simple example in a CloudFormation template: do you need a backup of your instance, yes or no? That's not what our customer sees; that's the code behind it. Our customer sees the box on the right: backup, yes or no. Each of those parameters can be presented through Service Catalog or through CloudFormation deployments just as easily. Think about also including things like retention time, backup frequency, or your backup window. We're really about empowering our customers to make the choices, to find their best times, and to find the solution that meets their needs. We present these things through Service Catalog, and we have a tagging structure: backup yes or no, frequency, window, retention period. These are pretty much the standard tags, and they're nice because we can add tags if we come up with a new capability, or, if we want to leverage AWS Backup instead of this, the tag could simply map to your backup plan.

Some of our lessons learned, things we learned the hard way: if you're using a custom tool to take EBS snapshots (and even Lambda automation follows this), make sure you have one that removes them too. That's important, well, only if you don't like money, I guess, but those things do add up. If you're using S3 versioning or replication, make sure you have lifecycle policies that purge the old versions after your desired retention time, and that applies to both the source and the destination buckets. From our perspective it's really about empowering our customers running in AWS to take control of their own backups. If you have ephemeral workloads that are auto scaling, you don't need to back those up, but you do need to back up the persistent store; if that's RDS or S3, go back it up there.
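A minimal sketch of the kind of tag-driven snapshot automation Creighton describes, written as a Lambda-style handler with boto3. The tag names, retention window, and schedule are hypothetical, and real automation would need pagination, error handling, and the right IAM permissions; this illustrates the approach, not Caterpillar's actual code.

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2")

def handler(event, context):
    """Create tagged EBS snapshots for opted-in volumes, then prune old ones."""
    # 1. Snapshot every volume tagged backup=yes (hypothetical tag scheme).
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:backup", "Values": ["yes"]}]
    )["Volumes"]
    for vol in volumes:
        ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description="automated backup of " + vol["VolumeId"],
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "created-by", "Value": "backup-automation"}],
            }],
        )

    # 2. The lesson learned: also delete snapshots past the retention window.
    retention_days = 30
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    snapshots = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:created-by", "Values": ["backup-automation"]}],
    )["Snapshots"]
    for snap in snapshots:
        if snap["StartTime"] < cutoff:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
```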
I encourage you to look into AWS Backup as well, because we've done a lot of undifferentiated heavy lifting to build the automation to back up our EC2 instances, and now the service takes care of that for you. So with that, I'm going to turn it back over to Jody. Thank you.

Yeah, good stuff. I always like hearing about the lessons learned, the stuff you can't readily get anywhere else; a customer talking about it is always so insightful and interesting, so thank you for that.

All right, now we're going to change focus to talk specifically about DR. When we think about DR, we think about business continuity: operationalizing a process for getting a business back online after an event. At AWS we define disaster recovery as operationalizing the systems needed to get the business back online. Every business has some event it needs to plan for, some disruption that is potentially going to take place, and a disaster, as we define it, can be any event that has a negative impact on an organization's ability to conduct business or that impacts its reputation. These disruptions can come from a number of different sources. Natural disasters are one of them, of course. I happen to live in California, where there's also a not-so-natural kind of disaster: if you've been watching the news over the past couple of years, there have been lots and lots of wildfires in California, and as a result PG&E, the public utility in California, has instituted brownouts where they shut down high-voltage transmission lines when conditions are conducive to fires breaking out, because the lines swing, they create arcs, and they can start fires. These are non-natural disasters, they affect thousands and thousands of businesses across California, and there's not a whole lot of warning about when the brownouts will be instituted. Those are the kinds of events that take place, and then of course there are rogue actors, both external and internal threats, that could potentially harm your business.

Now, these aren't problems you can solve, but you absolutely have to plan for them; you're the ambassadors for your business in managing that risk. And the reality is that many of us have not made appropriate DR plans. IDC estimates half of businesses would not survive a disaster, in particular due to the fact that over half of their applications are not protected. That's crazy, right? But it's reality. I'm frequently on calls with customers that are large, well-established organizations that don't have DR plans in place and are now mandated to get DR planning in place, both for applications running within the AWS cloud and for applications running in their on-premises data centers. So this is a real fact; hopefully none of you are part of that 48 percent.

So if you're thinking about doing DR in the cloud, what are some of the benefits? The first and most obvious is cost, and the cost is significant. If you're running a secondary site for DR purposes, there's the cost of the facility itself (the power, the cooling, keeping the lights on), a substantial cost that you don't have if you move to the cloud. Then there's all the infrastructure required, which, by the way, you have to provision for peak capacity in order to support failing your applications over at the capacity required to keep your business running. That's a substantial investment.
Then there's all the management and overhead of making sure those systems are up to date, online, and available for that failover event to occur. If you contrast that with building in the cloud, there's no upfront initial investment; you only pay for right-sized compute and storage when you actually need it. That means you can scale down your disaster recovery environment and scale it up when a disaster actually occurs, so you get cost optimization. There's also lower IT management overhead, there are more opportunities for automation, it's easier to do repeatable testing of that failover, and you can get systems up in minutes. So the cost benefits, but also the agility benefits, of having a DR strategy that includes the cloud are significant.

At AWS we have a framework for the different models you can employ to support DR in the cloud. The first one we've already largely talked about: backup and restore. With backup and restore, you have a backup copy of your data that you would use for a restore, but since none of your systems are on standby, this method, while cheap, can be time-consuming, because it requires you to stand up all the ancillary services and capabilities needed to run that application live for your customers. The advantage is that it's the least expensive option. As we move across the spectrum from backup and restore, we go to pilot light, where cost and complexity increase slightly but you get more aggressive RPO and RTO; then to warm standby, which is a scaled-down version of your existing environment; and then to multi-site, or hot standby, which is a replica of your existing environment where you're actually load balancing across the two environments. As you go across that spectrum you get lower and lower RPOs, and higher and higher cost and complexity.

So let's talk in a little more detail about these models, starting with pilot light. The pilot light option consists of your environment's most critical elements running in the AWS cloud. The systems are either in a standby or powered-off state, and data is replicated from your on-premises data center (or it could be from another instance within the cloud, but in this example from your on-premises data center) to the AWS cloud. I've got my database server replicating the data, and I've got all the application servers and web servers configured; I've used CloudFormation to pre-provision the configuration of those instances, so that when there is an event and I decide to invoke them, I can quickly stand up those instances and minimize my recovery time objective. When that failure event occurs, there are a few operations I need to run: I need to start the application servers and the web servers, provision any load balancers I need, mount the data volume, and reconfigure DNS to direct traffic to the AWS cloud as the failover site. I can't stress this enough: if you're running the pilot light model, pre-configuration is your friend. Make sure your pilot light systems are updated, and make sure all the systems you need to support that recovery are pre-configured as CloudFormation templates you can scale up quickly and easily in the event of a failure.
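One of the failover steps above, repointing DNS at the AWS site, can be scripted. Here is a minimal boto3 sketch that upserts a Route 53 record to send traffic to a load balancer in the recovery region; the hosted zone ID, record name, and DNS target are hypothetical, and a production setup might instead rely on Route 53 health checks and failover routing policies.

```python
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0123456789EXAMPLE"                    # hypothetical zone
RECORD_NAME = "app.example.com."                         # record to repoint
DR_TARGET = "dr-alb-123456.us-west-2.elb.amazonaws.com"  # AWS failover endpoint

# Upsert the record with a short TTL so clients pick up the change quickly.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Fail over application traffic to the AWS DR site",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD_NAME,
                    "Type": "CNAME",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": DR_TARGET}],
                },
            }
        ],
    },
)
```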
The next model is warm standby. A warm standby solution maintains a scaled-down version of your fully functional environment always running in the cloud. This is more robust than pilot light because you've got systems that are always active, and it's typical in this type of architecture to use scaled-down versions of the resources you need. For example, if you need certain types of EC2 instances to support your transaction workload, you can use scaled-down versions of those instances to economize on cost. In a recovery event, you add additional capacity under the load balancer: you can increase the quantity of EC2 instances, and you can change the instance types, in order to support the transaction volume you have. In this example I've also reconfigured the DNS record to reroute traffic to the AWS site. Since you're already running the active resources within AWS, you're going to have a shorter recovery time, because those resources are already running, but it is going to be slightly higher in cost.
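The scale-up step for warm standby can also be automated. Here is a hedged boto3 sketch that bumps an Auto Scaling group from its scaled-down standby size to full production capacity during a recovery event; the group name and capacity numbers are hypothetical.

```python
import boto3

autoscaling = boto3.client("autoscaling")

DR_ASG_NAME = "app-web-tier-dr"   # hypothetical warm-standby Auto Scaling group

# Normally the DR group idles at a minimal size (e.g. two small instances).
# On failover, raise it to full production capacity so it can absorb the load.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName=DR_ASG_NAME,
    MinSize=6,
    MaxSize=12,
    DesiredCapacity=6,
)
print("Scaled up warm-standby group", DR_ASG_NAME)
```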
Then there's the hot site. A hot site is an active-active configuration across both your AWS and on-premises infrastructure, and this is the architecture you want to employ if you have very low RPO and RTO objectives. In this architecture I've got a mirrored database server with bidirectional replication, so the data is constantly in sync, and the data replication method you use will be predicated on the RPO requirements for the configuration. Since this is a fully operational model, customers with seasonal businesses, for example, that have large fluctuations in demand, can employ this model to absorb those fluctuations. And because the load is split between both sites, if I have a failure event, traffic is automatically redirected to the secondary site, and you can use auto scaling and pre-provisioned resources to scale that environment to meet your capacity requirements based on triggers you set in your AWS configuration. In this model you're fully prepared for a disaster recovery event because you architected your application to support that type of event.

CloudEndure, a leading provider of data migration and disaster recovery solutions, was acquired by AWS earlier this year. CloudEndure simplifies and reduces the cost of cloud-based disaster recovery by offering a lot of automation, and a few of the core things it does are really interesting. With CloudEndure you deploy an agent on your primary application servers, and that agent provides block-level replication to AWS. That block-level replication can cover both the applications themselves and all the associated configuration data and files that make up the state of the application. It then uses machine conversion technology to look at the primary resources the application is running on and build an equivalent destination resource within the AWS cloud. And it takes it one step further: not only does it map the destination resource, it also builds a scaled-down version of that resource and creates a staging environment, so you can replicate the data to the staging environment and have it live within the AWS cloud, while keeping a mapping to the full-size resource you'll expand to in a failover event. It's pretty cool technology. In a failover event, CloudEndure's orchestration engine automatically launches fully operational workloads in the target AWS region. That process includes cloning disks from the staging area to the target area, and the entire orchestration typically takes minutes, predicated mostly on the time it takes those resources to boot. While the data is staged, you're incurring costs at a much lower rate, which provides great economics for the overall solution.

All right, now I'd like to bring Juan Mejia from BankUnited up to the stage. BankUnited has been working over the last year on a re-architecture of their disaster recovery strategy, and Juan is going to take us through that process and share some insights. Juan?

Thank you very much. Hello, everyone. I work with BankUnited, a financial institution located in South Florida with thirty-three billion dollars in assets. My name is Juan Mejia, I'm the data center manager, I've been at BankUnited for 18 years, with 24 years in IT, and I manage storage, hypervisors, and IT business continuity for the bank. I'm going to speak about our pre-DR-modernization architecture, challenges, and objectives.

Prior to the modernization, our DR architecture had two dated physical data centers, one in Miami Lakes, Florida and one in Long Island, New York, with multiple methods of replication: vSphere, storage array replication, and database replication. We had high-touch failover with documented manual procedures, and high complexity that required heightened levels of coordination between the business and IT. The primary challenges to the bank's data center topology were as follows. Environmental: prone to hurricanes and flooding. Geographical: physical locations situated in direct flight paths and close to major highways. Facilities: facilities requiring additional investments, such as power and HVAC, to support ongoing DR operations.

When BankUnited decided to move the DR facilities to the cloud, the bank worked to understand its objectives. Scalability: the ability to increase resources accordingly in order to meet customer and business demands. Elasticity: enabling resources to be made available when needed, and only as long as needed, in order to meet demand. Resiliency: increasing the availability and survivability of the bank's technology estate. Efficiency: enabling the bank to realize its objectives without incurring significant idle costs. And enablement: helping enable the bank's digital transformation.

Now for the pre-DR-modernization questions and solutions. Securely transport data to AWS: being a bank, we have heavy compliance requirements, so we had to make sure the solution provided encryption in transit and at rest. Keep the data in sync once the initial seeding is complete: we looked for a solution that could give us enterprise-level replication with minimal dependencies. Avoid paying for unused compute on top of the required storage: we wanted to keep the same capacity as our current DR without continuing to pay for expensive storage and infrastructure, while also getting the benefit of pay-as-you-go. Address multiple replication use cases: we were using multiple replication methods, such as storage replication, SQL replication, and Oracle Data Guard, and each of these was managed by different personnel. Provision resources once a DR event has been enacted: our DR design was expensive, since we purchased additional equipment identical to what we have in production. And recover on a hypervisor different from our own: moving to AWS, we needed a tool to replicate and orchestrate our recovery on a different hypervisor, since we're currently using VMware and we had to move into an EC2 framework.
So here is how we addressed each of those. Securely transport data to AWS: we use Direct Connect and CloudEndure encryption, which gives us AES-256 encryption in transit and also provides encryption at rest. Keep the data in sync once the initial seeding is complete: we use block-level replication. Avoid paying for unused compute on top of the required storage: we're replicating to low-cost storage on AWS, and when we're ready to deploy our workloads we provision the compute and storage as needed, which removed the purchase of equipment identical to production and reduced our DR costs. Address multiple replication use cases: we're replicating at the OS level with a software agent, rather than at the hypervisor or SAN level, enabling support for any type of source infrastructure. Provision resources once a DR event has been enacted: with the orchestration engine we automated the deployment of workloads in the target environment by scripting how machines will be provisioned, all under one console, and we removed the storage-recovery and hypervisor requirements. Recover on a hypervisor different than EC2: we use CloudEndure's machine conversion technology for the hypervisor and OS configuration, boot process changes, and guest agent installation.

Post-DR-modernization architecture benefits. Automation of disaster recovery: we automated the deployment of servers on AWS by assigning security groups, subnets, instance types, reserved IPs, and tags for DR requirements. Single pane of glass for recovery: we removed the storage, fiber, and hypervisor layers, which removes complexity. Snapshots with 48 hours of protection: we protected our environment, with the ability to lower our RPOs and RTOs, by leveraging snapshots during replication. We snapshot every 10 minutes for 24 hours, after which we snapshot every hour for another 24 hours, which allows us to roll back in 10-minute increments for up to a day if needed, reducing or removing the requirement for backup restores. What changed in DR testing for BankUnited: we removed the need to constantly update scripts to recover storage and VMs, and we removed the dependency on physical hardware.

Next steps: migration to the cloud. We successfully completed our DR testing this year, which included over 200 servers. We are now migrating off our on-prem data center with a lift-and-shift approach, which will eventually evolve into a more cloud-native architecture to take advantage of AWS. In the process of the migration, we're also starting region-to-region replication as workloads become production. Thank you very much.

Thank you, Juan, good stuff. All right, I wanted to point out some important information about our ecosystem specifically around DR. I mentioned our APN partner network and our storage competency for backup; we've also got a DR competency, and you can see the ISV partners we have within it. Some of these companies have been delivering DR solutions for customers for decades, and some are more enterprising startups. On the go-to-market and consulting side, we've got a whole stable of consulting partners who provide DR consulting services to our customers, all listed on our website. So there are great capabilities across the board to help you operationalize your DR strategy if you need assistance.

All right, wrapping up with just a few final thoughts for you. Using the cloud for backup and disaster recovery provides some compelling benefits.
We've talked about those, and the cloud is an integral piece of a modern backup and DR architecture. AWS provides services and tools for building reliable backup and DR solutions to support all of your application requirements. And last, AWS has partnered with industry leaders in backup and DR to help mutual customers modernize and operationalize their backup and DR practices. I'd like to thank you all for coming. Please take some time to complete the survey in the application, enjoy re:Invent, and have a great day. [Applause]
Info
Channel: AWS Events
Views: 9,619
Rating: 5 out of 5
Keywords: re:Invent 2019, Amazon, AWS re:Invent, STG208, Storage, BankUnited, Caterpillar
Id: 7gNXfo5HZN8
Length: 58min 29sec (3509 seconds)
Published: Tue Dec 03 2019