AWS Database Migration Service (DMS)

Captions
All right, thanks guys. Hopefully everybody is wrapped up with the workshop, so we're going to go ahead and do our next session, which is on the Database Migration Service, or DMS. We're also going to talk a little bit about our Schema Conversion Tool, also called SCT. We mentioned both of these in the first session. I fully appreciate that I've got the challenge of speaking to you right after you've had a nice big meal and finished your workshop, so I'll try to keep you as engaged as I possibly can as we go through this.

So you've thought about moving to the cloud. That's great — we welcome you. Some of you may have the fantastic opportunity to build an application from ground zero, so you can start using cloud-native capabilities immediately and you don't have any legacy baggage to bring along. For a lot of customers looking at moving to the cloud, though, they're faced with the prospect of data sitting in existing systems, and they have to figure out how they're going to move it. The questions they ask us are: how am I going to get this stuff up there? How do I make sure that while I'm doing this transfer of data from one location to another, my end users aren't faced with long periods of downtime because I had to take the entire system offline to avoid corrupting anything, or with nasty error messages because something isn't up and running while I'm doing the transfer? Also, once I've got it up in the cloud, I may still have systems running inside my own network that need access to that data. I could have them go over the internet to reach the data I've moved to the cloud, but sometimes it makes a lot more sense to keep that data right next to the tooling — but I just said I'm moving my stuff to the cloud, so I can't have it in two places. Or can I? Then the next set of questions: if I've got multiple kinds of databases, how can I homogenize my platform? I really don't want to run a whole bunch of different engines, because I run into training difficulties and my people have to replatform themselves across different applications. We've also got customers asking: we're running a commercial database platform, our license is coming due, and we really don't want to re-up it — we found it's a very expensive process, the terms are punitive, they audit us at the drop of a hat. Can you help us move to another database platform and make it as easy as possible? And the last one is customers hearing about artificial intelligence and machine learning. These are concepts that really thrive on top of data lake platforms, and the classic data lake platform in AWS is S3. But hang on a minute — I've got a database. How can I take a database and turn it into something that goes into our Simple Storage Service, S3? So what we're going to do is talk about a couple of services that help with all of that.

Once upon a time, moving from one location to another, or from one platform to another, was a really long and fraught process.
There were lots of third-party tools out there to manage this for you, but they were really expensive, and it was very anxiety-producing: I've got a production system, I really can't afford to screw this up, I really can't have downtime, and it takes me a long time to transform assets where I've taken advantage of proprietary features of the database platform I'm using — now I'm faced with the prospect of rewriting my triggers, my stored procedures, my indexes, and so on. So what we did was come up with two services: the Database Migration Service and the Schema Conversion Tool, and they really are what they advertise to be. The Database Migration Service handles the job of moving your data from point A to point B. We'll dig into a lot more of its capabilities a little later, but the big winning point for DMS is that it not only moves your data from one place to another, it also gives you the ability to move from one database engine to another, which is a really cool feature. Next up, if we're talking about moving from one database engine to another, is the Schema Conversion Tool. SCT's purpose in life is to take your existing data definition language — your tables, indexes, triggers, stored procedures — and show you how you can move them to another database platform, and in many cases it can actually take care of doing that translation for you.

So when would you use these two services? We break it down into three categories: modernization, migration, and replication. Let's dig into these a bit. For the Schema Conversion Tool, when we talk about modernization we're referring to old existing database platforms like DB2. DB2 is a great platform and it's been around a long time, but if you go looking on Indeed or CareerBuilder for people with DB2 talent, there aren't a lot of folks out there who can do that these days — they're now using open-source technologies like MySQL and Postgres. So I want to use the Schema Conversion Tool to take the existing database I've got, get it off of DB2 or Sybase, and move it onto platforms like MySQL and Postgres that the current generation of talent is much more likely to know. The next one is data warehouses: say I've got a Netezza or SQL Server data warehouse and I want to move it to something more cost-effective that doesn't have licenses associated with it. I can move it to Amazon Redshift, our data warehousing platform. It's compatible with PostgreSQL, so if you're familiar with Postgres you can use Redshift as a columnar store, which makes it ideal for data warehousing and analytics solutions.

SCT itself is a desktop tool. It has a lot of UI features for navigating all the tables in your source database, it lets you pick and choose the things you're actually going to convert, and it will convert things like your indexes, the DDL for your tables, your triggers, and your user-defined data types. It can even go all the way down to converting user-defined functions and stored procedures, which is crazy, black-belt-ninja-level magic.
What we actually do is build an abstract syntax tree from your source database — we know what those object structures look like — and then, depending on which target system you've chosen for your migration, we convert it into the language of that platform. When you run an SCT report against your source database, it analyzes the database and determines the degree to which you are leveraging proprietary features of that platform that may not migrate terribly well to your target. (Incidentally, we're going to provide PDFs of all the decks, so you don't have to take pictures of the slides — but you can if you want to.) It also gives you a rendering of the estimated level of complexity. When you're really lucky and you're just using the basic tabular language of your platform, you'll generally find it's close to a hundred percent conversion; if you start leveraging stored procedures or user-defined functions really heavily, that's when the report starts going into the orange and possibly even the red, showing that there's some additional work you'll have to do — and we'll talk about what we can do about that in just a moment.

We also have another tool called the Workload Qualification Framework, and it's easy to install. In fact, let me show you this for a second. Let's go into my browser and pull up the EC2 console — you can actually launch an EC2 instance that has the Workload Qualification Framework and SCT installed. So if we search for "workload qualification"... well, it's always a gamble when I try to spell in front of an audience — oh, pardon me, I have to re-sign in; credentials always expire right before you give a presentation. Let's go back to EC2, make this a bit larger, zoom in, launch a new instance, and look for "workload qualification" in the Marketplace. There it is: the Workload Qualification Framework. This is an EC2 AMI, an Amazon Machine Image, with the Schema Conversion Tool and the Workload Qualification Framework pre-installed. You point it at a source database and it renders a report telling you the level of complexity of the migration, helps you estimate how much time it's going to take, and even helps you derive action items so you have a roadmap for what the conversion is going to look like. We designed it to help you plan out what your migration is actually going to look like.

Now, all of this is well and good — I've pointed you at a lot of reference documentation and the tooling itself — but sometimes it's really helpful to see what those who have gone before have done and get an actual, formulaic playbook to follow. To that end we've published a whole set of migration playbooks that you can follow for your own database migrations. There's a link at the bottom of the page, and like I said earlier, you'll get PDFs of these slide decks, so don't worry about transcribing it. You'll see we've got guides for going from SQL Server to Aurora, Oracle to Aurora, or, in this case, Oracle to MySQL, and the whole idea is to show you what those who have gone before have done and help you plan your own migrations.
The playbooks make things a little easier and give you nice checklist-style steps to follow, so you can do a bit of divide-and-conquer with your own resources to perform the migration.

The next thing SCT provides is help with data warehouse migrations. Data warehouses are a special beast — they tend to be a lot larger than your standard relational database. We're talking about going from gigabytes to terabytes and possibly even petabytes in size. It just so happens that Redshift can handily handle petabyte-scale databases, but the issue is: how am I going to get all that data into the Redshift data warehouse in the first place? Well, we've got something called data extractors. You run them on premises, right next to your data warehouse, and they pull the data out of your source system — whether it's Netezza, Oracle, SQL Server, whatever — and upload it to S3 in a format that Redshift can ingest very quickly. Redshift has phenomenally fast ingestion by virtue of being a node-based service: you have a bunch of parallel Redshift nodes in your cluster, and they pull that data in from S3 in parallel, which means they can do it blazingly fast.

So when would you use the Database Migration Service? It's right in the name: it's all about migration from point A to point B. What you see on the right-hand side isn't the canonical list of all the sources we support, but it's most of them. You can go from source databases like Sybase, MySQL, DB2, Oracle, and so on, and we support sending that to services such as our own native Aurora — Aurora MySQL or Aurora PostgreSQL — or to Redshift. You'll also notice there's a little icon for S3. If you've got a relational database and you want that data going into S3 — so you can do some AI/ML, or Elastic MapReduce or Hadoop-style processing on it — you can have the data extracted and sent into S3, then use something like Hive to define it as a set of SQL tables, so your Elastic MapReduce job can crunch those numbers without saturating all the connections on your database.

The other thing the Database Migration Service can do, besides migrating your applications from on premises into the cloud, is handle version upgrades. Say I've got SQL Server 2008 — SQL Server 2008 is coming to end of life, so you need to get off it pretty soon. If you want to migrate to a newer version of SQL Server, the Database Migration Service will handily take care of that for you, and as you'll see in just a moment, it has a way to make sure your users are none the wiser that the migration ever occurred. Of course, if you want to replatform — say you've got Oracle and you want to move to Postgres — you can use the Database Migration Service to copy all the data out of that Oracle system and load it into the Postgres database. And the last bullet point: on top of the ability to go to file-based destinations like S3, you can also go back and forth between a relational data store and a non-relational data store.
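As a rough sketch of what this looks like in code — not shown in the talk — here is how a source and an S3 target endpoint might be defined with boto3. The hostnames, credentials, bucket, and role ARN are hypothetical placeholders.

```python
# Rough sketch of defining DMS endpoints with boto3. Identifiers, hosts,
# credentials, and ARNs below are hypothetical placeholders.
import boto3

dms = boto3.client("dms")

# Source endpoint: an existing Oracle database (on premises or on EC2).
source = dms.create_endpoint(
    EndpointIdentifier="oracle-source",
    EndpointType="source",
    EngineName="oracle",
    ServerName="oracle.example.internal",
    Port=1521,
    DatabaseName="ORCL",
    Username="dms_user",
    Password="example-password",  # in practice, pull this from a secrets store
)

# Target endpoint: an S3 bucket, so the extracted data can feed a data lake
# (EMR, Hive tables, ML tooling, and so on).
target = dms.create_endpoint(
    EndpointIdentifier="s3-target",
    EndpointType="target",
    EngineName="s3",
    S3Settings={
        "BucketName": "my-data-lake-bucket",      # hypothetical bucket
        "BucketFolder": "oracle-extract",
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-s3-role",
    },
)

print(source["Endpoint"]["EndpointArn"], target["Endpoint"]["EndpointArn"])
```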
Say you've got a SQL Server database and it turns out the workload is really key-value oriented — I'm always just retrieving a record for a given customer — so why am I paying for a relational database when this would be perfectly fine in a non-relational system such as DynamoDB? You can use DMS to handle extracting your data from that relational system and putting it into the non-relational system, getting yourself not only lower cost but also much higher performance, because you've put it on a platform that's better suited to the task you have at hand. At the end of the day, what we're trying to do is give you choice: we want to make sure you're able to use the right engine for the right workload, make it very easy to discover what that is, and make it as painless as possible to try it out.

Lastly, once you've performed your migration — whether from one database platform to another or from one version to another — DMS also supports validation. First, it can validate whether the migration looks like it will go well in the first place: it will check whether you've got a proprietary construct in your source database that's not supported in your target database, and it will check data types. Say my source database has a geolocation type that isn't available in the target system — it will let me know ahead of time that this migration probably isn't going to go so well, because there's a column the target database doesn't support. The other thing it does, after the actual copy of the data, is a post-migration assessment: it goes in and spot-checks the data it copied to make sure it's actually intact in the target database, which gives you a measure of confidence that the migration really did what it said it was going to do.

Now, there's another use case we should talk about. Let's say I've got a really large database — terabytes in size. I could use the Database Migration Service to send it over the wire, over the internet, to the target database running in the AWS cloud, but depending on how fat my internet pipe is, and how saturated it already is because it's a busy pipe, that migration could take a very long time — and maybe I need the workload migrated in days rather than weeks. What we support is using the Database Migration Service in conjunction with Snowball. You take a full dump of your existing database — just make a native backup of it — and load it onto a Snowball device (we'll look at one of these in just a second). You then send the Snowball device back to us, we upload the contents to S3, you do a restore of your database from that native backup, and then you use DMS to do what I call a trickle charge — we'll look at the replication capabilities in just a second. But I mentioned Snowball, the magic word. For those of you not yet familiar with it, this is a Snowball device: basically a ruggedized chassis stuffed chock-a-block full of storage, with a little Kindle e-ink display serving as the shipping label.
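The console workflow is described next; as a rough sketch of the same thing done programmatically, here is what starting a Snowball import job might look like with boto3. The bucket, IAM role, and address ID are hypothetical placeholders you would create in your own account first.

```python
# Rough sketch of starting a Snowball import job with boto3. The bucket,
# IAM role, and address ID are hypothetical placeholders.
import boto3

snowball = boto3.client("snowball")

job = snowball.create_job(
    JobType="IMPORT",
    Resources={
        "S3Resources": [
            {"BucketArn": "arn:aws:s3:::my-migration-landing-bucket"}
        ]
    },
    Description="Native database backup for DMS migration",
    AddressId="ADID-example-address",   # shipping address registered in the console
    RoleARN="arn:aws:iam::123456789012:role/snowball-import-role",
    SnowballCapacityPreference="T80",
    ShippingOption="SECOND_DAY",
)

print(job["JobId"])
```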
What you do is go into the console and tell us you'd like to start a Snowball job. You indicate how many Snowball devices you want sent to your offices, and we ship them to you. Upon receipt, that little Kindle label switches over to our return address. You plug the Snowball device into your network — it has up to 10-gig Ethernet, fiber or copper, so you can plug it into whatever modality you've got — and it comes with a client tool that looks a lot like the S3 console; if you've used the S3 console in the AWS console, that's what the Snowball transfer tool looks like. Then you copy your data onto the Snowball device. While you're copying, the data is being encrypted onto the device, and this is key for us — security is always job zero. Even if some bad guy were to get hold of the Snowball device while it's in transit from your data center to the AWS cloud, they wouldn't be able to get access to any of your data, because it's encrypted with AES keys. Once we receive it, we start moving the data off the Snowball device and dump it into the S3 bucket you designated when you started the job, and then you can do whatever you're going to do with it — if it's a database restore, you can start a native database restore on RDS; if it's staying in S3, run some queries, whatever.

Yes — so the question was that S3 has a limit of 5 terabytes, and if I've got a Snowball device that holds 80, what am I going to do with that? S3 doesn't actually have a 5-terabyte limit overall; that's the limit on how big an individual object can be. With Snowball you just make sure your individual objects are less than 5 terabytes in size, and there are a bunch of ways to accomplish that — zip them up, break them into individual pieces, that sort of thing. So no, it's not a problem.

OK, so this last thing, which I've teased a couple of times, is replication. One of the really cool things about the Database Migration Service is that not only can it do that one-time copy from point A to point B, from your source to your target database, it also has the ability to do what I call trickle charging. Say I've just restored my database in the AWS cloud from my Snowball import of that native backup — but in the meantime, while the Snowball was in transit, even if it only took 24 hours to get to AWS, data was probably still going into my source database. Now I want to cut over to my new database. Am I going to abandon all the data that landed on the source in the meantime? Of course not — I want to make sure I get caught up. But how? Taking another backup isn't going to work; I'll never catch up. What the Database Migration Service can do is leverage something called change data capture. Depending on which database platform you're using, it works through a slightly different mechanism, but at the end of the day it's doing a delta: it sees what's on your source system, it sees what's in the target system, it knows what's different between the two, and it pulls just those changes over to your target system and keeps it up to date.
So why do I care about something like this? First, obviously, it lets me make my cutovers very fast. Now it's just a CNAME change: I tell everybody, hey, you're no longer pointing at this database, you're now pointing at this one, and I know it's up to date, so my downtime is kept to a bare minimum — ideally nothing. The other really cool thing is the scenario I was talking about earlier: say I've got analytics tools running on premises and I want them to have access to a copy of the data. I could again go through the process of doing a backup and restore, but then I'm always behind the times. With the Database Migration Service, once I do that cutover my production database is running in the cloud, but I still have a copy for local analytics — maybe I'm running MapReduce on premises and I don't necessarily want to go out to the internet every time I need access to the data. Instead, I can use DMS to keep the on-premises system up to date with the production system running in the cloud, and I'll still be able to do my analytics on it. Another really cool use case is dev and test. Say the only reliable way for me to test my code is against a really large database — one that's close to, if not nearly identical to, what I'm running in production. In practice that's pretty hard, but with something like the Database Migration Service I can clone the system I've got and keep it up to date, so my developers are testing code against something that looks a lot like the production environment, and I know I'm actually going to meet the performance bar even when I'm running at the scale I need.

As far as security is concerned: like I mentioned earlier, at AWS security is always, always, always job zero. If we ever find for any reason that security is not where we feel it needs to be, we stop everything — we will not produce any new features, we will not produce any new services, we will fix whatever that security problem is. With DMS, we baked security in from the ground up. You have complete control over permissions: you can define who has the ability to create DMS jobs, run them, or look at their status. You have the ability to turn on SSL encryption, so the data going from point A to point B is encrypted in transit. Incidentally, the Database Migration Service is HIPAA eligible now, so just make sure you enable SSL during transmission of the data; as long as you're also choosing a target database engine that's HIPAA eligible, you're going to be covered under HIPAA, and you can get a BAA from AWS. And the last thing: if you want, you can also leverage the Key Management Service to make sure that once the data is on the target system it's encrypted at rest as well, using KMS.
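As a rough sketch of that permission control — not part of the slides — here is what an IAM policy might look like for a team allowed to run and monitor existing replication tasks but not create new ones. The account ID and the task ARN pattern are hypothetical placeholders.

```python
# Rough sketch, assuming a team that may start, stop, and monitor DMS tasks
# but not create them. The account ID and ARN pattern are hypothetical.
import json

dms_operator_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RunAndMonitorTasks",
            "Effect": "Allow",
            "Action": [
                "dms:StartReplicationTask",
                "dms:StopReplicationTask",
                "dms:DescribeReplicationTasks",
                "dms:DescribeTableStatistics",
            ],
            "Resource": "arn:aws:dms:us-east-1:123456789012:task:*",
        }
    ],
}

print(json.dumps(dms_operator_policy, indent=2))
```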
Oh yes, who had the question? So he's asking about data masking. Data masking is a new feature we're going to be adding to the Database Migration Service. At the end of the day, data masking means: say I've got a source database and one of its fields is data I want to guard — I want to make sure people don't get access to it unless they're allowed to. The idea is to leverage KMS so that while you're doing the migration you can say, hey, this particular column in the database, I'd like to make sure this data is encrypted. DMS does not currently support it, but the roadmap is to have that feature live within the next twelve months. I'm sorry, what was the question? So the question was whether we also have column-level encryption while doing DMS — that's related to the data masking feature that's coming online within the next twelve months. And you're asking about data lakes — converting formats like ORC to Parquet, that sort of thing. DMS does not natively support that; if you want to do a conversion like ORC to Parquet, or non-Parquet to Parquet, you'd use a service like Glue — Glue is optimally set up to do that kind of conversion for you.

All right, so let's talk about what customers are saying about using the Database Migration Service. This is not the complete list of all the customers that have used DMS, but what I can tell you is that to date we've migrated over a hundred and ten thousand databases using DMS — and those are just the ones we've actually heard about from customers, so I'm sure the real number is a little higher. These are just a few examples of use cases where people have used the Database Migration Service to migrate their own production systems. We're going to dig into a couple of these, but you can see we've got some pretty big names in there, like Verizon and the Department of Veterans Affairs.

So let's look at a few cases where people used it successfully. This is a company called Trimble — they're a logistics company. They had an existing Oracle system, they were starting to stretch the bounds of what they could do with their on-premises Oracle, and they really didn't want to re-up their licenses with Oracle. They came to us and asked: how could we take what we're currently running in our environment and get it running inside AWS? What we did was enable them to take that Oracle system and use Postgres instead of Oracle. It just so happens that Postgres has a very similar DDL and a very similar procedural language to Oracle, so in their particular case it was pretty straightforward. On top of that, they also got a neat capability of RDS, which is the ability to do primary-and-standby configurations. We're going to talk more about Multi-Availability-Zone RDS tomorrow, but I'll give you the Cliff Notes version. RDS offers the ability to have two copies of your database up and running: one is considered the primary and the other the secondary. Any transaction that goes into the primary is synchronously replicated to the secondary, and if for whatever reason the primary becomes unavailable, we detect that and automatically cut all database operations over to the secondary, which becomes the new primary. This happens on the order of seconds, so your users may see a blip while the switchover occurs, but by and large it happens almost transparently to your end users while they're using your application, and it gives you a much better recovery time for keeping your applications up and running.
And one last thing about the folks at Trimble: I didn't mention it earlier, but they operate in both the United States and the United Kingdom, and one of the issues they were having was that they were geographically distributed. Well, it just so happens that when you light up with AWS, you get global scale at the press of a button, so they were able to run inside an EU region as well as inside the United States. What they found was that while it wasn't completely free — there were some changes they needed to make in their system — the return on the investment, the time and money they had to spend to perform the migration and change the things the Schema Conversion Tool just couldn't do automatically, had more than paid for itself within six months, simply because they didn't have to re-up that Oracle license. So all in all it was very cost-effective for them. And that last little quote from Todd Hobart is about envisioning where you want to be, and we will work with you to help you get where you're trying to go. We have a thing at AWS we call working backwards: we want to know your desired end state, what you want to enable, the use cases you're aiming for, and we'll work with you to identify the services — or build whole new services — to help you get where you're trying to be.

Here's another case: this is SysAid, a help desk system. They had a couple of different database engines and they just wanted to consolidate onto a single engine. They used DMS to move everything onto MySQL — they're using the MySQL RDS instance rather than Aurora MySQL — and for them this was a slam dunk. They got better uptime, they were able to handle version upgrades a lot more seamlessly, and they expect that over time they'll be able to do six times more migrations in the same amount of time.

GumGum is a case of wanting to do analytics in the cloud. They had a fairly complicated system running on premises, and they wanted to leverage all the analytics and machine learning capabilities we have in AWS, but it just wasn't feasible for them to take every single thing they've got and migrate it to AWS — at least not yet. So they were faced with this issue: I want to leverage the capabilities of AWS, but I'm not quite ready to move my production systems up there — how are you going to help? In this particular case we used the replication feature of the Database Migration Service to copy the data from their on-premises systems up into the AWS cloud. They then get complete access to all the analytics capabilities we have inside AWS, and just by virtue of how change data capture works in DMS, the data stored up there is replicated in near real time.

This is another Oracle comparison: Smile Brands, a dental services company. They had an existing Oracle RAC system — RAC is Oracle's system for high availability on premises — and what they wanted was to move to a multi-Availability-Zone deployment running inside AWS. Again, in their case they found that Postgres was going to be the logical replacement engine for their Oracle system.
So they used the Schema Conversion Tool to analyze their existing system, it spat out the report with its suggestions, and then they moved everything up into AWS using the Database Migration Service.

This next one is a scale-up. They had an existing system and were starting to stretch the limits of what it could do; they wanted to move to something larger, and it just so happens Aurora is ideally suited to handling very large databases. So they used the Database Migration Service to do basically an engine-to-engine copy with the same engine — Aurora supports both the MySQL and Postgres engines — and had DMS handle the data migration from their MySQL RDS instances into an Aurora system.

With Flipboard, what they wanted to do was move an RDS system from EC2-Classic into a VPC. For those of you who haven't been around super long: EC2-Classic is what the original AWS used to be. It was before there were VPCs — everything was publicly addressable and we didn't have the concept of a virtual private cloud, a VPC boundary around your machines. These days everything lives inside a VPC: if you spin up a new box it goes into what's called the default VPC, which behaves much like EC2-Classic did, but you also have the ability to define your own custom VPCs with their own internet connectivity, their own security gateways and security group features, that sort of thing. Flipboard had a system that had been up and running since before VPCs were around, so they were faced with this prospect: do we keep running on EC2-Classic, and can we afford to take downtime to rehydrate our machines? With DMS they didn't have to take that kind of downtime hit. They spun up a new RDS instance, used the exact same schemas they'd been using on the EC2-Classic machines, and just copied all the data over to the new machines running inside the VPC.

This last one is a split migration. They had a set of systems, and it just so happened that a few things simply could not run inside RDS, so they wanted to keep running those on VM-based images, but they did want to move some of their data into Aurora. So they used the Database Migration Service to copy their data from the on-premises systems into Aurora and into MySQL on EC2. At the end of the day this allowed them to save about 55 percent by moving into Aurora — Aurora has a much lower cost profile than some of the commercial database engines out there, and it's designed to run at scale when you're an internet-facing property.

So how does all this fit together? I've just given you a whole bunch of customer references; let's actually march through what a migration looks like in practice. The first thing you do is point the Schema Conversion Tool at your source database, and it takes a look at what your database currently looks like.
It then generates the report, giving you an idea of the scope of the changes, and it will actually create the database schema on the target database for you. Then you set up DMS to start the actual database copy and hydrate the new database system. One thing that's really cool — and I've mentioned it a couple of times — is that with the Database Migration Service you can start replication as part of this: it will start putting everything into the new database, you pick whichever schemas or tables you want off the source database, and it starts the transfer over the wire, hydrating your new database. Once you're confident all your data is up and running inside your target database system, you basically do a CNAME swap: you tell your database-dependent applications, hey, this was the connection string for my database, now it's this one. Ideally, if you can deploy that change to all your application servers quickly, your end users will never know you just made a big platform switch from one to the other — and if they do notice, it's maybe a single transient error message while they're waiting for their DNS queries to time out, or something like that.

Now, if you're looking to run across multiple Availability Zones, you can set up multiple DMS agents inside your VPC, with one configured to take over for the other in the event of something like a failure. You start your transfer just like before, and if for some reason the primary agent performing the transfer goes down, the secondary agent picks up and keeps the replication, or the transfer, going for you. The whole idea is that — especially for ongoing replication, where you need this thing to keep happening no matter what — deploying multiple agents across multiple Availability Zones is really your key to success.

When it comes to moving data from your source database to your target database, it's table by table. You can say "everything inside this schema," but you also have the ability to say "just this table, this one, and this one," and DMS will copy them on a per-table basis into your target database system. Once the data is in the system, the change data capture I mentioned earlier kicks in: the DMS instance watches your source database, and as transactions flow in, it knows the last synchronization it did with your target database and replicates any new transactions over to the target instance to keep it synced up. I should point out that this replication is not a two-phase commit — it's not guaranteed to include your most recent transaction the instant it commits. What it's doing is watching things like the transaction log: once a transaction has gone into the source system, it replicates it over to the target. So it's eventually consistent — the change isn't guaranteed to land the moment you commit on the source database, but it arrives very soon after the transaction is committed.
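As a rough sketch of wiring this up with boto3 — assuming the source and target endpoints from earlier already exist — here is what a Multi-AZ replication instance plus a full-load-and-CDC task might look like. The identifiers and endpoint ARNs are hypothetical placeholders.

```python
# Rough sketch: Multi-AZ replication instance plus a full-load-and-CDC task.
# Endpoint ARNs and identifiers are hypothetical placeholders.
import json
import boto3

dms = boto3.client("dms")

instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="migration-instance",
    ReplicationInstanceClass="dms.c5.large",
    AllocatedStorage=100,
    MultiAZ=True,   # standby agent in a second Availability Zone
)
instance_arn = instance["ReplicationInstance"]["ReplicationInstanceArn"]

# Wait until the instance is available before attaching a task to it.
dms.get_waiter("replication_instance_available").wait(
    Filters=[{"Name": "replication-instance-arn", "Values": [instance_arn]}]
)

task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-aurora",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SRC",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TGT",
    ReplicationInstanceArn=instance_arn,
    MigrationType="full-load-and-cdc",   # bulk copy, then ongoing change data capture
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection", "rule-id": "1", "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
task_arn = task["ReplicationTask"]["ReplicationTaskArn"]

dms.get_waiter("replication_task_ready").wait(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
)
dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType="start-replication",
)
```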
So what else can you do? This is great — I've been talking about going from database to database — but what if I've got a whole bunch of databases and I just want to consolidate them, so I'm not running them on a whole bunch of individual machines anymore? DMS has your back. You can take a single database server and move three, four, however many databases your limits allow onto a single machine or a single RDS instance to host them. It's a great way to realize cost savings, especially if you have a whole bunch of little tiny databases out there. I had a customer who had about 50 WordPress databases, each running on its own individual machine, and they were all very low-activity marketing websites. What they did was stand up a single RDS instance and move all those databases onto it, and DMS took care of all of that for them. In their WordPress configurations the database names didn't change — they just changed the server address in the connection string settings.

There will also be times when I've got a monster database and I need to split it out. Maybe I've got a really overworked box hosting Oracle or SQL Server — people classically pile tons and tons of databases onto one box — and now I have to keep scaling that box up to handle the load. Maybe it's more cost-effective if the really high-traffic database gets a large RDS instance of its own, and the other two, which are fairly low-traffic, get split out into their own individual homes — either as, say, NoSQL databases or on small RDS instances.

Now, I mentioned earlier that replication is table by table. You have the ability to say: these are dead tables, they've been in this schema forever, none of our current code uses them, so why replicate them? Once upon a time, with commercial tooling, it was an all-or-nothing thing — you said "I want to migrate this database" and it copied everything over; a database backup copies the entire database. With DMS you can say there are only certain tables I want: you can set up wildcard pattern rules that say "only tables inside this schema" or "only tables that start with this string," or you can give it explicit names — this table, this table, and this table. DMS gives you the ability to pick and choose which tables you're actually going to copy over.
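As a rough sketch of what those selection rules look like as a DMS table-mapping document — the schema and table names here are hypothetical — one include rule can sweep up a whole schema by wildcard while another excludes a dead table explicitly.

```python
# Rough sketch of DMS table mappings: include everything in one schema by
# wildcard, but exclude a dead audit table. Names are hypothetical.
import json

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "SALES", "table-name": "%"},
            "rule-action": "include",
        },
        {
            "rule-type": "selection",
            "rule-id": "2",
            "rule-name": "skip-legacy-audit",
            "object-locator": {"schema-name": "SALES", "table-name": "AUDIT_LOG_OLD"},
            "rule-action": "exclude",
        },
    ]
}

# Passed as a JSON string when creating or modifying a replication task.
print(json.dumps(table_mappings, indent=2))
```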
And lastly, let's talk about like-to-like versus like-to-different migrations. You might want to move from Oracle to Oracle because you're just really happy with Oracle for some reason, or from SQL Server to MySQL because you'd like to get out from under the SQL Server licensing — or the Oracle licensing, for that matter. Or, like I said earlier, you're starting to investigate building a data lake, so you want to take the system that's been running inside an Oracle data warehouse and put it in S3, because you can save a lot more money storing things in S3 than in a monster eight-terabyte database — S3 is on the order of cents per gigabyte, versus running a giant EC2 or RDS instance for Oracle.

And that is pretty much it for the session. What I wanted to do is open the floor for questions, and then we can cut over to the actual workshop so you can start working on it. OK, we've got some time, so go ahead.

OK, so let me make sure I've got this: you had a multi-terabyte database and they weren't ready to use Snowball — the chassis — right? So the question was: I've got a fairly large database, and for whatever reason I'm not able to use a device such as Snowball to get my data from point A to point B. What are my options? First and foremost, the simplest option is to have a nice big fat pipe. If you've got hundred-megabit or gigabit connectivity from your WAN provider, ten terabytes is not the end of the world — that's a couple of days' worth of transfer, assuming it's about the only thing running over the wire; on a gigabit link it'll happen on the order of 50 to 60 hours of transfer time. But that's not your only option. You can take that same database — say you did a native backup and you're just trying to get that file up to S3 — and leverage S3's multipart upload. That's parallelization of the upload streams, and it will go as fast as your internet connectivity allows. So you don't necessarily have to use DMS for this; you could just do a native backup and restore. Once you've done that, I'd still strongly recommend using something like DMS replication to catch the database up after the transfer of the backup has occurred — but yes, you can certainly make it happen that way. Once upon a time there was a service called AWS Import/Export — that's no longer an option. It was basically Snowball v1: we let you buy a Seagate or Western Digital drive, plug it into the laptop or desktop running your database, and send that USB drive to us, and we'd copy the data off. These days we don't support that anymore; we tell you to use Snowball, and Snowball is by and large a very compatible option. I'd actually be very interested to hear what it was specifically about Snowball that made it unusable for this particular use case. Security? So the concern was security, even though they were plugging it into their own network? OK — so in this particular case there was a concern about the security of transferring the data onto the Snowball device, maybe because it was on a secure system. I get it. But yes, if you're not going to use a device such as Snowball, then really your remaining option is to do it over the network.
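As a rough sketch of the multipart-upload approach mentioned a moment ago — the backup filename and bucket are hypothetical placeholders — boto3's transfer configuration splits a large file into parallel parts automatically.

```python
# Rough sketch: push a large native database backup to S3 using multipart
# upload with parallel streams. Filename and bucket are hypothetical.
# Individual S3 objects still have to stay under the 5 TB object limit.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=256 * 1024 * 1024,  # 256 MB parts
    max_concurrency=16,                     # parallel upload streams
)

s3.upload_file(
    Filename="/backups/prod-full-backup.bak",
    Bucket="my-migration-landing-bucket",
    Key="backups/prod-full-backup.bak",
    Config=config,
)
```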
Incidentally, I also left out another option. You can of course send your data over the internet — you can expose your RDS instance to the internet and make it possible to transfer it that way — but you also have the ability to set up what are called Direct Connect links. You've got your internet connectivity, the IP WAN connectivity from your provider — AT&T, Level 3, whoever — but you can also order a link that goes directly into the AWS cloud. You go into the AWS console and order a Direct Connect link; it's actually a work order that you place, and it generates a letter of authorization that you hand to your WAN provider. They drop a link into a colocation facility that hosts Direct Connect ports — you also give that letter of authorization to the people in the colo facility, because they need the authority to touch our cages — and then they take the drop in the meet-me room in the colo facility and establish a cross-connect to the port AWS is offering there. And bang, you're now connected into your VPC: you set up something called a virtual interface, and now you've got 1-gig or 10-gig connectivity into the AWS cloud. It's your dedicated link — there's no other traffic going over it — so you get very low latency, simply because there's a much lower number of hops, and you get much more reliable throughput, because it's your link. If you're concerned about security, you can even set up a VPN connection over your Direct Connect link, so the data is encrypted while it goes over the link, above and beyond the fact that it's your private link. So you've got a couple of different options if Snowball is not an option.

Where does it terminate? It goes into a port that we expose in a colo facility — and yes, correct, it's a NIC. If you go to the AWS console you'll see a list of colo facilities where we have Direct Connect port availability; you order a link from your WAN provider to drop into the same colo, do a cross-connect to that port, and then you're up and running with a virtual interface into your VPC, so it looks just like an internet connection. In general, when you're setting up Direct Connect connectivity, we recommend you set up two links, because you want high availability. You can do that a couple of different ways: you can have multiple drops into the same colo and request that they terminate on different racks, so a power outage on one rack doesn't affect you, or you can go further — we've had customers with one Direct Connect to the East Coast and one to the West Coast, just to make sure that if, say, a meteor strikes Washington DC, they still have connectivity to the AWS cloud.

All right, next question. Oh yes — the question is, if I choose a target engine that is not Multi-AZ, can I make it Multi-AZ after the fact? The answer is yes. Any RDS instance you spin up in AWS, if you started out as a single Availability Zone and you decide you want the high-availability characteristics, you can go and modify the instance; we'll spin up a new secondary, copy your data over, and then you're up and running.
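As a rough sketch of flipping that Multi-AZ bit on an existing RDS instance with boto3 — the instance identifier is a hypothetical placeholder:

```python
# Rough sketch: convert an existing single-AZ RDS instance to Multi-AZ.
# The identifier is hypothetical; RDS builds and synchronizes the standby
# in the background while the instance stays available.
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="migrated-postgres",
    MultiAZ=True,
    ApplyImmediately=True,   # start now rather than at the next maintenance window
)
```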
OK, so the question was: what's the difference between starting single-AZ and converting to Multi-AZ after I do the migration, versus spinning up a Multi-AZ RDS instance up front and then starting the migration? At the end of the day, while you're doing the transfer, with Multi-AZ you're paying for two instances — you've got a primary and a secondary, so you're paying for that secondary while it's up and running, which is a small cost factor. On the other hand, if you do the transfer to your target instance first and then convert it to Multi-AZ, there's a period of time where we have to copy the data from the primary onto a secondary. So it's really a mix-and-match: you have to pick whichever scenario suits your use case better. If you want, you can do the full transfer to your primary as single-AZ, then flip the Multi-AZ bit afterwards and just wait before doing the CNAME swap. A lot of this also depends on the size of your database: if you've got a really large database, it may make more sense to just make it Multi-AZ from the start, so it's up and running and you don't have to worry about how long it takes to turn on Multi-AZ later. You really just have to try it out and figure out which one works best for you.

All right, yes? OK, so the question was: if I've got a database running in GCP, how does that compare to doing the transfer from an on-premises database? Again, this largely depends on how big the database is. When it's running on premises, you have the option of taking a native backup and using something like the Snowball transfer scenario. How many people in the room have heard the comparison between a station wagon full of tape drives and a 10-gigabit connection? One person. So the thought experiment is: which has higher throughput, a 10-gigabit fiber link or a station wagon crammed full of tape drives? It is totally not the fiber — it's the station wagon. I can drive that station wagon from coast to coast and it will still have higher throughput than that 10-gigabit link, simply because of the storage density of tape drives; I could make it hard drives instead of tape drives and the analogy would still work. The whole concept with Snowball is that I can move a whole bunch of data very quickly, leveraging very fast on-premises connectivity, and that's going to be much faster — even with a 10-gigabit Direct Connect link, copying locally is still faster, because there are a lot fewer hops and I can saturate that pipe, whereas any time I have to traverse someone else's infrastructure, I'm never going to get exactly what I've been asking for; there's always a little bit of overhead for traffic management and whatnot. So, getting back to your original question — GCP versus on-premises, which is going to be better — it does depend on how large the database is, but in general it's probably going to be faster to go from on premises to AWS, just because you'll also have to do the extraction of the database out of GCP to get it into AWS.
However — and this isn't your question, but I want to make sure I say it, because it's really important to all of you — I've been talking a whole bunch about how you can take stuff from on premises and move it into AWS. You can use these exact same tools to take it right back out of AWS. If for any reason you feel like you're just not where you want to be — maybe you're not getting the performance you wanted, maybe you feel like it's more than you wanted to spend — we're all about your freedom. We want to make sure you have choice, we want to earn your business, and you will find there are very few one-way doors in AWS. The exact same tooling we're talking about here — the tooling I can use to move to Aurora, or to Postgres or MySQL on AWS — you can use to go in the exact other direction. We want you to feel confident and secure in your choice, and if for any reason you find it's not going to work for you, we'll make it entirely easy for you to move back out. That's why we're using open-source engines like MySQL and Postgres and their protocols: we want to make sure you don't feel locked in by vendors. We fully appreciate there are a lot of vendors out there with some really punitive licensing terms, where you can really feel like you're being boxed in the minute you sign on the dotted line — that's not the way we want you to deal with us. I don't know if that completely answered your question, but I hope I got you there.

Yes — OK, so when you're doing a heterogeneous migration from one database engine to another, the Schema Conversion Tool will actually generate a report. In fact, let me do this real quick: I'm going to pull up a copy of the Schema Conversion Tool that I've got running. When you get to the workshop you'll be working with the same tool, but I'll give you a sneak peek at what it looks like. So I've got the Schema Conversion Tool here, and — while it's not scaling terribly well on screen — hopefully you can see the little red blotches. Whenever you see one of those little red splats on the schema tree, it's indicating that there's something in this schema using a feature that's proprietary and may not necessarily be convertible from the source database to whatever target you've chosen. And if we look at the report view, when it does the analysis of the database, you'll see toward the bottom that it gives you an estimate of what it thinks can transfer successfully to the target database. When you start seeing things like these orange bars, the orange is telling you there may be some manual labor involved in doing that conversion. While the Schema Conversion Tool is black-belt ninja magic, there are some things even ninjas can't necessarily solve, and human beings have to come in and help solve them. Now, all that being said, if you feel like you don't have the expertise or the bandwidth to do these kinds of migrations, we've got partners that will help you out, and we even have our own Professional Services division who can come in and do a scoped engagement to help you with that last mile of the conversion.
So if there's something you find just doesn't convert over, we're here to help you out in whatever way we possibly can. All right, did that answer your question?

So she's asking if there's a way to validate the conversion of things like stored procedures. What the Database Migration Service will validate is the copy of the data; it does not do logic validation or anything like that. In that kind of case, what I'd recommend is that you build your own test suites — I don't know if you've built unit tests for your stored procedure logic, but that would be your safety net: is this thing still behaving the way we expected it to from the get-go?

All right, we've got time for two more questions. OK, great — really good question: what's the difference between running something on EC2 versus running on Aurora? Spoiler alert: we're actually going to talk a little about this in the first session tomorrow morning, but I'll give you another set of Cliff Notes. RDS is a fully managed database service. We install the operating system, we install the database engine, and then we own everything from security patches to point-version upgrades of the database engine. The only thing you need to do on RDS is give us your DDL — the definitions of your tables, indexes, and stored procedures — and, of course, the data that runs on top of it. Everything else is completely managed by AWS for you, especially if you're using features like Multi-Availability-Zone, because we take care of doing the failover, we take care of making backups, all that kind of stuff. We make it trivially easy to restore to a point in time — in the RDS console you just say, hey, I want to restore as of yesterday at noon, or something like that. If you're on EC2, you can still do every single one of the things I just described; it's just that you'll be doing much more of it yourself, as opposed to AWS — I'm pointing at myself, but I mean AWS, my corporate master — doing it for you. You'll be responsible for operating system maintenance — patching, making sure that when Patch Tuesday comes around you install the applicable patches on your box — you'll be responsible for making sure your database is fully optimized, and you'll make sure you've got some kind of agent job running to do backups of your database, all those things. So really it's a question of where you want the slider to sit: where do my responsibilities end and where do AWS's responsibilities begin?

That's also going to depend largely on the database engine you're using. In the case of Oracle or SQL Server, there are times when you can do what's called bring-your-own-license, where you take the license you're running on premises and migrate it to the cloud; in those cases, running on EC2 can actually be very cost-effective, because you're leveraging the ability to move that license from point A to point B. But if you're trying to do a dollar-for-dollar comparison, there's a whole TCO — total cost of ownership — discussion to be had.
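On the point-in-time restore capability mentioned a moment ago, a rough sketch of what that looks like with boto3 — the instance identifiers and timestamp are hypothetical placeholders:

```python
# Rough sketch: restore a managed RDS instance to a point in time
# ("as of yesterday at noon"). Identifiers and timestamp are hypothetical.
from datetime import datetime, timezone

import boto3

rds = boto3.client("rds")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="prod-postgres",
    TargetDBInstanceIdentifier="prod-postgres-restored",
    RestoreTime=datetime(2019, 2, 17, 12, 0, tzinfo=timezone.utc),
)
```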
It's not just the bottom line for what it costs to run the service; there's also how much you have to spend on the overhead of maintenance and on the people who are actually running the database engine. What RDS allows you to do is have your database engineers focus on optimizing your databases instead of operating them. So at the end of the day, it's pretty hard to give you a hard number on what's going to be less expensive, because there are a lot of factors that go into it. Sorry — I hope that answered your question.

Yes, you had a question. So you're asking about a zero-downtime migration: you're preparing to do a cutover and you're worried about — yeah, if you've got something like an auto-increment or identity column on your database, that's probably a case where you may want to go ahead and schedule five or ten minutes of downtime, just to guarantee that all the connection strings get properly propagated across your application servers. That is one of those edge cases where you have to be a little more cautious. If you're using something like GUIDs it's not as big a deal, but if you're using an auto-increment column then you do need to be a little more careful.

OK, so this brings us to the end of the DMS talk, and what I'm going to do now is pull up the deck for the workshop.
Info
Channel: Amazon Web Services
Views: 28,681
Keywords: AWS, Amazon Web Services, Cloud, cloud computing, AWS Cloud
Id: zb4GcjEdl8U
Length: 60min 49sec (3649 seconds)
Published: Mon Feb 18 2019