AWS re:Invent 2017: Migrating Microsoft SQL Server Databases to AWS – Best Practices (WIN301)

Captions
All right, good afternoon everyone, hope you're having a good re:Invent. Welcome to this session, Migrating SQL Server Databases to AWS: Best Practices and Patterns. My name is Prahlad Rao, I'm a Solutions Architect at AWS, and with me today I have Jeremy, who is a principal systems engineer at Expedia. I know this session is between you and the beer, so I'll try to make it as useful as possible so you have some key takeaways. This session focuses on the what and the how rather than the why — for example, what are the things to consider when you're looking at migrating a SQL Server database to AWS, and what are the migration tools, options, methods, and technologies. The why is why you need to move SQL Server to AWS, and I'm assuming some of you have already made the decision to migrate, or are at least looking to migrate, databases to AWS. So we'll focus on migration tools, practices, and methods, so that when you actually begin the migration you have some key points from this session to apply. We're going to begin by looking at SQL Server on AWS — essentially, what your options are if you decide to run SQL Server on AWS: running on RDS versus running on EC2 instances, or maybe a hybrid architecture connecting on premises to AWS. We'll also compare and contrast running SQL Server on RDS versus EC2 — what you get and what you probably lose — so you can see which platform might be best suited for your needs. As with any Windows and SQL Server environment, authentication is critical, so we'll cover, at least at a high level, the SQL Server authentication options when you're running on AWS. We'll take a look at that,
and then outline the different options when it comes to migrating data — and not only migration: you might also want to replicate data from your on-premises SQL Server database to AWS, maybe as a backup, maybe for disaster recovery, or maybe to make the database available to your data analytics team to get insights from it. It doesn't have to be just pure migration; we'll outline the replication options as well. AWS Database Migration Service is a key component for migrating databases — not only homogeneous migration, meaning like-for-like, but also heterogeneous migration if you're looking to move away from commercial database engines to open source, maybe PostgreSQL, MySQL, or even Amazon Aurora. We'll look at some of the key features and capabilities of Database Migration Service, which is turning out to be a key component for large customers moving databases to AWS. DMS gives you the capability not only to migrate and replicate from on premises to AWS, but also to migrate backwards: if you have a database running on AWS and for any reason you want to replicate or migrate it back on premises, DMS allows you to do that too. So it's a powerful capability for migrating databases and replicating to AWS. Then we'll outline the distributed availability group, which is the hybrid architecture combining on-premises SQL Server with AWS, and what that architecture looks like. Then we'll have Jeremy from Expedia come up and talk about their case study of setting up the distributed hybrid architecture across AWS and on premises, how they're actually running it, what
are the challenges they faced, implementation practices, testing methodologies, and some lessons learned as well. Finally, Jeremy will show you a demo simulating their existing environment of on-premises SQL Server with a distributed availability group connected to AWS. So that's the agenda. OK, when we start thinking about the different options on AWS: typically, when you first start a migration project, you want the same look and feel, the same comfort, the same tools you've always used to manage your database. Running SQL Server on EC2 is exactly the same as running SQL Server on premises on a virtual machine — EC2 is a virtual machine — which means you get all the flexibility and control over how you set up the database, complete control of the operating system running the database engine, and all of the access, security, and control mechanisms. But as customers move databases to AWS, they start to think: OK, now we have SQL Server on AWS, that's fine — I want to take advantage of AWS scale and capabilities so I can move away from managing the infrastructure required to run a SQL Server database: underlying operating system patching and upgrades, database patching, automated backups. How do you manage backups with SQL Server? How do you restore? How do you set up a multiple-Availability-Zone, active-passive SQL Server database on AWS? I'm sure many of you are database engineers and architects, and you can imagine some of the pains and challenges of replicating a SQL Server database — transaction log shipping, mirroring, availability group setup, and things like that — how do you fail over and apply the logs? There's definitely a lot of work
involved in setting up and managing those components. That's where RDS comes into the picture: AWS manages all the underlying aspects of the database, including replication and mirroring, so that you can focus on the most important part, which is optimizing your database for your application and improving database performance. That should be the goal for users looking to migrate to the RDS platform, rather than managing operating system patching, backups, and so on. Then we take it one step further with transformation. We have a lot of customers at different stages of migration, from lift and shift, which is EC2 based, all the way to optimization and transformation. Transformation lets you take advantage of cloud-native database architectures: for example Amazon Aurora MySQL and Amazon Aurora PostgreSQL are built natively for the cloud. Although Aurora is compatible with the MySQL and PostgreSQL engines, it was built for the cloud from the ground up, which means it can take advantage of the scale and distributed architecture of the AWS platform. That lets you scale at a lower cost and move away from some of the licensing involved with commercial database engines. A lot of our customers have also migrated relational databases to NoSQL databases — for example DynamoDB, which provides the scale for running a NoSQL database environment. So you can think about the stages from lift and shift, to optimize, to transform; customers are at various stages in their migration architectures. I also wanted to highlight the pace of feature release at AWS. Obviously Microsoft SQL Server 2017 is a big deal — it can run on Linux. Imagine SQL Server users being able to run SQL Server on Red Hat Linux.
That's pretty awesome, right? To make sure our customers get the best experience and can take advantage of the AWS platform, we made the SQL Server 2017 database engine available on EC2 and RDS within two days of Microsoft's announcement. That's the pace of feature release we aim for so our customers can take advantage of the full capabilities of the AWS platform, and we've made it available in all AWS regions worldwide. Support for larger instance types — the X1 type — is pretty good for running large enterprise SQL Server databases. And with the flexible licensing options you can bring your own license, or you can go with license mobility through Software Assurance; all those options are available to you today. Let me quickly spend a minute on this: the Relational Database Service provides a managed database experience that lets you focus on your application, schema design, and database rather than on managing the underlying infrastructure, operating system, snapshots, and backups. It also gives you instant provisioning and scale up / scale down. That means you start with one instance size, and maybe a week later you see the database isn't performing or meeting your needs and you need more CPUs or more RAM — you can easily scale up, and in the same way scale down, within a matter of minutes, to adapt to your changing workloads. In addition, as I mentioned, the licensing provides both bring-your-own-license and license-included options, which allows you to run SQL Server environments cost-effectively on the RDS platform. So let's compare some of the capabilities of RDS and EC2. Obviously we don't want to force our customers into RDS all the time; we understand
that you have certain capabilities and features that are required to successfully run the database in alignment with your application, so we want to provide the best experience whether you choose EC2 or RDS. At a high level, RDS provides a broad range of version support — I know 2017 isn't listed, but 2017 is available on RDS as well — and different editions are supported. When it comes to HA on RDS, we use Multi-AZ deployment with Microsoft SQL Server, which means you get the advantage of multiple Availability Zones, and underneath, AWS manages all the replication and mirroring. When the primary instance goes down for whatever reason, the switchover is automatic and seamless — you don't have to get up at whatever time in the night to do a failover and apply logs; everything is taken care of by AWS. You can do the same on EC2, but you have to automate it yourself using Always On, mirroring, or log shipping — there are various options. Encryption is available for both RDS and EC2, so you get the highest level of security whether you want Transparent Data Encryption or block-level encryption, and both Windows and SQL Server authentication are supported. For backups, as I mentioned, our goal is for you to focus on the most important things: AWS will manage the backups, whether you want automated backups or manual snapshots, and you'll be able to restore to within the last five minutes, so you can go back in time and restore to any point in time.
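The Multi-AZ, backup, and licensing choices above are all parameters you set when the instance is created. A hedged sketch of what that looks like with boto3 — every identifier, size, and version string here is an illustrative assumption, not something from the talk:

```python
# Sketch: provisioning a Multi-AZ RDS SQL Server instance with boto3.
# All names, sizes, and versions below are made-up examples.
db_params = {
    "DBInstanceIdentifier": "sqlserver-prod",   # hypothetical name
    "Engine": "sqlserver-se",                   # Standard Edition
    "DBInstanceClass": "db.r4.2xlarge",
    "AllocatedStorage": 500,                    # GiB
    "StorageType": "gp2",                       # SSD-backed EBS
    "MultiAZ": True,                            # AWS-managed mirroring/failover
    "BackupRetentionPeriod": 7,                 # enables point-in-time restore
    "LicenseModel": "license-included",         # or "bring-your-own-license"
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",          # placeholder
    "StorageEncrypted": True,
}

# The real call would be:
#   import boto3
#   rds = boto3.client("rds")
#   rds.create_db_instance(**db_params)
print(sorted(db_params))
```

The dictionary is kept separate from the call so you can see the knobs the talk is describing — Multi-AZ, backup retention, storage type, licensing — in one place.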
Then there are a few other options around replicas: you can create read replicas, maybe for read-intensive workloads, on RDS using DMS or even third-party tools like CloudBasic; on the EC2 side you can self-manage that as part of your database environment. All right — now that we've looked at the features of RDS and EC2, how do you choose which one is right for you? The answer usually lies in the kind of environment you're coming from. What we recommend is that you look at running SQL Server on RDS first — at least evaluate it and see if it meets your needs — because it provides improved manageability with much lower administration, letting you focus on tasks that bring value to your business, your application, and your database performance, rather than managing infrastructure. Typically we see that customers in heavily developer-oriented environments, and startups, probably don't have the staff to maintain and manage different database environments — they may not even have database administrators — in which case something like RDS is a perfect fit, because you're letting AWS manage that for you while you focus on your code, your application, and the other important things. Obviously you might want to consider running SQL Server on EC2 for some of the reasons mentioned here: if you want full control of not only the database but also the underlying operating system, EC2 is there. When you're running SQL Server on RDS you do lose some control, in that you're not able to RDP into the underlying instance, so you won't have operating system access. But you really need to ask yourself: do I need access to the underlying operating system
when I'm allowing AWS to manage it — why should I worry about that? We've often seen customers say that's actually a best practice from a security perspective, because no one can log into the underlying operating system and change something. So it's in fact a good security practice to have a managed database experience in that respect. But still, you might just want access to the underlying operating system, in which case you choose EC2; or you want control over clustering, replication, and so on, because the way you've designed your current environment is the way you're comfortable with. Either way, we make sure we provide the appropriate options to our customers. As for features: the lack of certain features on RDS is a moving target — something might not be available today but may be in a couple of months, because AWS constantly iterates on product capabilities. So I'd recommend you don't base your decision on specific features and capabilities unless your application depends heavily on them; rather, look at it holistically in terms of what it takes to manage and run everything. And maybe the RDS instance types don't meet your needs — you want the biggest instance type, or the smallest — in which case you can look at running SQL Server on EC2. For Windows authentication, there are a couple of options when you're running SQL Server, or Windows generally, on AWS. You can take advantage of AWS Managed Microsoft AD — an enterprise Active Directory provided as a managed experience for customers — in which case you don't have to worry about setting up all
the Active Directory configuration on the underlying operating system and managing a server. Just like RDS, it's a managed experience: you simply launch the service, set up the roles and domain authentication, and you're good to go — you can be up and running in minutes. However, we also support managing it from on premises: if you have domain controllers running on premises and you want a domain controller on AWS as well, you can do that, so there are capabilities for both. All right — now that we've compared RDS versus EC2, when you're actually migrating the data it's important to look at what you can import, migrate, and replicate with RDS, and what you probably cannot. With RDS you get pretty much all of the migration options you have with EC2, except log shipping, replication, and the hybrid architecture. I'd imagine something like a hybrid architecture is coming, so it's just a matter of time before we provide those capabilities in RDS; but today you have all the other options to migrate data successfully to the RDS platform from on premises, and back as well — both from RDS to on-prem and from on-prem to RDS. This is a busy slide, but I'll try to summarize the key aspects of each migration method. Backup and restore gives you a complete backup dump of your database: you have your database running on premises, you take a backup, put that backup file onto Amazon S3, spin up RDS, import the backup file into RDS, and boom, there you go. It brings over all the objects — the backup file contains the schemas, procedures, triggers, and so on. However, you'll
not be able to do a cross-engine migration that way: if you're trying to migrate from SQL Server to Aurora PostgreSQL, you can't take a backup file and import it — that won't work. Import/export is similar: if you want to do a bulk copy or a selective copy from your source database, you can grab that and use it on RDS or EC2. Pay attention to the downtime as well — it depends on how much downtime you have, or how flexible your downtime is. Keep in mind that in most cases you have an application dependency: if you're migrating from on premises to AWS, the assumption is the application is also migrated, and you'll probably have to take steps to switch the application over to AWS as well. Those are definitely additional considerations. Depending on the size of the database, you're talking at least hours for backup/restore as an offline method; similarly with import/export, it can be many hours depending on how you want to switch over. SQL Server log shipping can be faster: you set it up, do the initial seeding — maybe from an initial backup, for large databases — then ship the deltas over the wire, so when you're ready to switch over you just fail over. The hybrid architecture gives you that capability as well. With DMS you get minimal downtime — you can achieve near-zero downtime — and it allows you to use both RDS and EC2 as a target, and it also lets you migrate cross-engine: if you're trying to move to PostgreSQL or MySQL, DMS in combination with the Schema Conversion Tool gives you that capability. The other pattern, although it's not very common, is application multicast — having the application itself write to two databases at once.
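A minimal sketch of this dual-write idea, with in-memory stand-ins for the two database connections — the class and names here are illustrative, not anything AWS or the speakers ship:

```python
# Dual-write ("application multicast") sketch: every write goes to the
# existing primary database, and best-effort to the new one. Failures on
# the secondary are recorded for later reconciliation, not swallowed.
class DualWriter:
    def __init__(self, primary, secondary):
        self.primary = primary        # existing source-of-truth database
        self.secondary = secondary    # new database being populated
        self.errors = []              # failed secondary writes, to replay later

    def execute(self, statement):
        self.primary.execute(statement)           # must succeed
        try:
            self.secondary.execute(statement)     # best-effort parallel write
        except Exception as exc:
            self.errors.append((statement, exc))  # reconcile out of band

class MemoryDB:
    """Stand-in for a real database connection."""
    def __init__(self):
        self.log = []
    def execute(self, statement):
        self.log.append(statement)

old_db, new_db = MemoryDB(), MemoryDB()
writer = DualWriter(old_db, new_db)
writer.execute("INSERT INTO orders VALUES (1)")
```

The error list is the crux of the consistency problem the talk raises: the two copies can drift, and something has to notice and repair the drift.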
You have your existing database running — it might be SQL Server or any database, it doesn't matter — and you apply the same writes and updates in parallel to another database, maybe running on AWS. So you're doing parallel writes to two different databases simultaneously. This is more application-intensive: you have to build the application so it can write to both locations. There are also challenges around consistency: which database do you expose to the user at what point in time, and how do you ensure the data written here is exactly the data written there? So there are some challenges, but it's not uncommon for customers to use this mechanism when they want to keep their existing database running while also populating a different database engine directly from the application, maybe with selective writes. So those are the options, comparing and contrasting the different migration methods. You can see the RDS targets there — I think the only method where RDS is currently not supported as a target is SQL Server log shipping; again, I'd imagine it's just a matter of time. With all these options you'll easily be able to take advantage of the RDS platform as a target for migrating your existing SQL Server database from on-prem, or even from EC2. A few best practices I want to outline when thinking about migration: most of this applies to both EC2 and RDS, but I'll highlight some of the more specific things. When you have a large database migration — meaning the database size runs into terabytes — and you want a faster
migration, try to provision a larger instance size. The C-type and R-type instances are really well suited for large database workloads, and the new X1 instance type is pretty awesome too: a large memory footprint and higher network throughput let you move bits and bytes to AWS faster. This applies to both RDS and EC2, so look at provisioning a larger instance at least for the migration period. Once you've migrated, guess what — you can always scale down to the desired instance size for your production environment. And we all know disk performance is key for any database workload, so look at provisioning SSD-based EBS volumes, and size for anticipated growth as well. We just added the feature that allows you to scale storage up and down on SQL Server, which wasn't available until a few weeks ago. That means you can not only scale the instance type up and down, you can also scale your storage capacity, and you can move from, say, magnetic disk to gp2 to a Provisioned IOPS volume type. So you have more options — you don't have to worry too much about right-sizing your database up front. One pattern, now that storage scaling is supported, is to write a Lambda function and set up a CloudWatch alarm on your database: when disk usage gets close to, say, 70 or 80 percent, the alarm triggers the Lambda function, which goes ahead and increases the storage, or moves from gp2 to Provisioned IOPS, or increases the IOPS on the EBS volume. So you get a lot of room to play with as your workload changes.
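The alarm-triggers-Lambda pattern just described can be sketched as follows. The growth factor, identifiers, and the injected client are all assumptions for illustration; in a real Lambda the client would come from boto3.client("rds"), and the CloudWatch alarm on disk usage would be the event source:

```python
# Sketch of a Lambda handler for the CloudWatch-alarm storage-scaling
# pattern: read current allocated storage, grow it by 25%, apply it.
# `rds_client` is injected so the logic is testable without AWS.
def scale_storage(db_id, rds_client, growth=1.25):
    desc = rds_client.describe_db_instances(DBInstanceIdentifier=db_id)
    current = desc["DBInstances"][0]["AllocatedStorage"]   # GiB
    target = int(current * growth)
    rds_client.modify_db_instance(
        DBInstanceIdentifier=db_id,
        AllocatedStorage=target,
        ApplyImmediately=True,
    )
    return target

class FakeRDS:
    """Test double standing in for boto3's RDS client."""
    def __init__(self, storage):
        self.storage = storage
    def describe_db_instances(self, DBInstanceIdentifier):
        return {"DBInstances": [{"AllocatedStorage": self.storage}]}
    def modify_db_instance(self, **kwargs):
        self.storage = kwargs["AllocatedStorage"]

fake = FakeRDS(storage=400)
scale_storage("sqlserver-prod", fake)   # grows 400 GiB to 500 GiB
```

The same handler could instead switch StorageType from gp2 to io1 and set a higher Iops value, which is the other reaction the talk mentions.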
Also, for large data imports and heavy write activity: given that Multi-AZ uses mirroring, all the writes applied to the primary database are also applied to the standby in a different AZ, which means there's an additional performance impact on the target — even though you might not be using the standby for reads, writes can become slower, because every write also has to be copied to the secondary. So when you're doing the migration, at least during the migration period, set up a Single-AZ deployment; you can always change it to Multi-AZ before you switch to production. There are some backup file import limitations to be aware of — this is only on RDS, not EC2: the maximum size is 4 terabytes for a single file. I'm sure there may be more coming there. And if migrating directly to RDS is a limitation for whatever reason — maybe compliance, maybe you're just not comfortable with it — you can always do a staggered migration: stage one, migrate to EC2; get comfortable and make sure you're taking advantage of AWS services; and when you're ready, it's even easier to migrate from EC2 to RDS. So you can think of that as a phased migration approach. The capabilities around DMS and the Schema Conversion Tool are essentially to modernize and migrate, whether you're moving between heterogeneous database engines or migrating across different data warehouse architectures — you can use DMS there as well. I'll accelerate my talk a little, given that we want to make sure Jeremy has time for his slides and demo.
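Picking up the native backup import mentioned a moment ago: on RDS for SQL Server the restore from S3 is kicked off with a stored procedure rather than a plain RESTORE DATABASE, since you have no OS access. A minimal sketch that just renders that T-SQL — the database name and S3 ARN are made-up placeholders:

```python
def rds_restore_tsql(db_name: str, s3_arn: str) -> str:
    """Render the T-SQL that starts an RDS SQL Server native restore
    from a .bak file staged in S3. You would run the rendered command
    through any SQL client connected to the RDS endpoint."""
    return (
        "exec msdb.dbo.rds_restore_database "
        f"@restore_db_name='{db_name}', "
        f"@s3_arn_to_restore_from='{s3_arn}';"
    )

# Hypothetical database and bucket names:
print(rds_restore_tsql("SalesDB", "arn:aws:s3:::my-bucket/SalesDB.bak"))
```

The matching export direction is msdb.dbo.rds_backup_database, which writes a .bak file from the RDS instance back out to S3 — that is the "migrate back to on premises" path mentioned earlier.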
You can essentially keep your apps up and running: set up replication using DMS, and at any point when you're comfortable, switch your application over to the target database and you're good to go. One of the capabilities of DMS is that you can load selectively — that's another big advantage. Let's say you have a big database with years' worth of data: you don't want to migrate all of it to AWS. You can do a selective migration based on a time range or date range — selectively query the data and extract just that data set to AWS during the migration. That's a powerful capability: all that historical data residing in the database that you never had time to clean up — now, when you're migrating, is the opportunity, because you can do a selective migration with DMS. You can also use change data capture, which means replication keeps running: as data is written to your primary database it gets applied to the secondary copy, so everything stays up to date and it's easy for you to fail over to your new database. What else can you do? You can fan in, or fan out. Say you have a number of smaller databases as sources and you want to combine them into one big database — for easier management, a lower database footprint, lower licensing cost — you can use DMS to fan in from multiple source databases to a single target. Fan out is the opposite: you have a large database and you want to split it into smaller databases for manageability and reduced complexity; again, you can do that with a DMS replication instance. Take it all, or take some: with granular, selective, filter-based migration you can run SQL queries and extract data based on time ranges, transformation rules, and filter rules.
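As a concrete illustration of that selective, filter-based migration, here is what a DMS table-mapping document with a date-range filter can look like. The schema, table, and column names are hypothetical; the JSON structure follows DMS's table-mapping rules:

```python
import json

# DMS table mapping: include only rows of dbo.Orders with an order_date
# on or after 2016-01-01 — the "skip the historical data" case above.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "orders-since-2016",
            "object-locator": {"schema-name": "dbo", "table-name": "Orders"},
            "rule-action": "include",
            "filters": [
                {
                    "filter-type": "source",
                    "column-name": "order_date",
                    "filter-conditions": [
                        {"filter-operator": "gte", "value": "2016-01-01"}
                    ],
                }
            ],
        }
    ]
}

# This JSON string is what you would pass as TableMappings when
# creating the replication task (console, CLI, or boto3).
print(json.dumps(table_mappings, indent=2))
```

Fan-in is the same mechanism pointed the other way: several tasks with their own selection rules, all aimed at one target endpoint.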
There are a whole lot of options that let you granularly and selectively migrate your data to AWS. Some best practices around DMS: since DMS involves a replication instance that connects to your source and target databases, it's important to size the replication instance appropriately to handle your migration needs. The C-type instances are very good for large data sets — the higher the CPU and memory, the more tables can be loaded in parallel from source to target. By default DMS loads eight tables in parallel per task; you can always increase that, but try to ensure you're not overloading your source database. If you create a DMS replication instance with a high CPU core count and start the replication, it will query your source database in parallel, extract the data, and start replicating to the target — which means the source database can become overloaded. So pay attention to balancing faster migration against performance on the source database, especially when it's running a critical production workload, because DMS will consume all the capacity and bandwidth it can to extract the data. It's also important to remove bottlenecks on the target database. One of the important things is to disable foreign key checks: when DMS does a full load from source to target, it doesn't validate primary key and foreign key constraints — it just pulls the data and loads it during the full load phase. Once it switches to change data capture, you can go back and re-enable the foreign key checks, triggers, and other validation so that change data capture applies changes correctly.
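The eight-tables-per-task default mentioned above lives in the DMS task settings. A sketch of the fragment you would tune — 16 is just an example value; raising it speeds the full load but adds read pressure on the source, which is exactly the trade-off described:

```python
import json

# DMS task-settings fragment controlling full-load parallelism.
# MaxFullLoadSubTasks defaults to 8 tables loaded in parallel per task.
task_settings = {
    "FullLoadSettings": {
        "TargetTablePrepMode": "DO_NOTHING",  # tables pre-created on target
        "MaxFullLoadSubTasks": 16,            # example: double the default
        "CommitRate": 10000,                  # rows per batch on the target
    }
}

# Passed as ReplicationTaskSettings when creating/modifying the task.
print(json.dumps(task_settings))
```

Watching source CPU and read IOPS while you raise this number is the practical way to find the balance point the talk warns about.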
So it's important to disable those unnecessary things on the target to reduce errors when you're migrating with DMS. This is a high-level outline of the hybrid architecture using SQL Server distributed availability groups: you're essentially running availability group 1 on-prem, and the second AG runs on AWS on EC2. Currently, if you want to run a distributed availability group as a hybrid architecture between on premises and AWS, you have to run it on EC2 — RDS is not possible today — so this is maybe one of the reasons you might want to run SQL Server on EC2. You have your Direct Connect; make sure the network bandwidth is appropriate when you're setting up your distributed availability group. Try not to do it over the internet — make sure you have the necessary bandwidth on a private Direct Connect connection between AWS and your on-prem environment. Jeremy is going to talk more about how they set up their distributed availability architecture, hybrid across AWS and on-prem, so I'll now invite Jeremy to come on stage and talk about his lessons and best practices. Thank you. My name is Jeremy Neos, I'm a principal engineer for Expedia. I've been doing database engineering for a while — with Expedia since 2006 — and lately I've been helping out with our cloud journey, and it's exactly that: a journey. This is a background slide. What I'd like to do in this presentation is leave you with some things you can take away, some of the things we've encountered, and maybe some understanding of why we went in certain directions. We are nowhere near done, but we have a plan moving forward, and we're slowly implementing our phased approach using a hybrid architecture. OK, motivation for the hybrid architecture: what this does for us is give us a lot of flexibility — we don't want to go all in at first. Maybe I'll give you a little background on what our environment looks like. We do run a lot of SQL
Server. We've historically had large monolithic applications, and over the 10 or 11 years I've been around, they've slowly been breaking down into smaller microservices. There's no single migration approach we're taking to get our data into the cloud; some applications we're not going to migrate at all. For this particular talk, we are migrating one of the legacy ones as a start, and we are beginning with a lift and shift to make it easy. In that image where we showed two particular clusters, we've connected the two of them; that's an asynchronous connection, and at some point we would swap roles once we have enough of our stack in AWS. You'll see in the coming slides the presentation tier and application tier being moved in bits and pieces. With that distributed availability group, the second cluster in AWS is serving read-only requests, so where it's appropriate, parts of the application can read from the cloud nodes. The idea is we want AWS applications to be reading from AWS databases, not traversing back to the source on-premises data center, and we're trying to avoid data transfer costs. It roughly breaks into three phases. First, we have our on-premises availability group, which is our master and accepts writes; like I said, in phase one the EC2 side would just be in read-only mode. The next phase would be to swap those roles, send our writes to AWS, and replicate back to on-premises. In a later phase we would go multi-region and have no on-premises data center at all; that's definitely something we're looking at to try to control costs as we go. So this is an image showing phase one, real simple. Now, with availability groups you have the option to configure synchronous secondaries and asynchronous secondaries within a zone, or rather within a region; for AWS we would have a mix.
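For reference, a distributed availability group like this is defined in T-SQL; here is a hedged sketch that assembles the DDL as a string. The CREATE AVAILABILITY GROUP ... WITH (DISTRIBUTED) syntax is real SQL Server 2016 syntax, but every name and listener URL below is a placeholder, not Expedia's configuration.

```python
# Hypothetical sketch: T-SQL that joins an on-prem availability group and
# an EC2-hosted one into a distributed AG. All AG names and listener
# endpoints are placeholders. The cross-cluster link is asynchronous,
# matching the hybrid phase-one design described in the talk.
def distributed_ag_ddl(dag, ag1, url1, ag2, url2):
    return f"""
CREATE AVAILABILITY GROUP [{dag}]
  WITH (DISTRIBUTED)
  AVAILABILITY GROUP ON
    '{ag1}' WITH (LISTENER_URL = '{url1}',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC),
    '{ag2}' WITH (LISTENER_URL = '{url2}',
                  AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                  FAILOVER_MODE = MANUAL,
                  SEEDING_MODE = AUTOMATIC);
""".strip()

ddl = distributed_ag_ddl("HybridDAG",
                         "OnPremAG", "tcp://onprem-listener.example.com:5022",
                         "AwsAG", "tcp://ec2-listener.example.com:5022")
print(ddl)
```

Note the single ASYNCHRONOUS_COMMIT link between the two AGs; synchronous versus asynchronous secondaries within each cluster are configured separately on each AG's own replicas.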
On the right-hand side we're showing two different nodes: one of them would be synchronous for high availability, and the other would be asynchronously updated for scaling out reads, if that's what we need to do. Another key thing with this architecture and this hybrid configuration for availability groups is that we're only sending data once. Regardless of the number of secondaries on your second cluster, there aren't many parallel streams of data traversing the wire; the data goes over once, and then the second availability group's replicas read from that particular node, so it's efficient as far as transferring data. The next slide is basically the same thing, except the direction of the data is reversed. In our phase 2 mode we're writing to AWS, but we're still maintaining our on-premises cluster; should we run into a problem, we could step back, and hopefully everything goes smoothly, knock on wood. The third phase looks just like the second phase, except that it's two regions instead of an on-premises data center and an AWS region. So let's get into some of the risks and challenges that we had; we'll talk risks and mitigations. Risk one: your secondary availability group needs to keep up. If we have a network bottleneck and we can't apply what happens in the on-premises data center, we can cause problems: not only stale data in EC2, but you could have issues where the primary's transaction log starts filling up. What we did to mitigate that risk was extensive stress testing, and we did that over months. Through the stress testing we found a number of things. One was that there were opportunities to tune the network and increase our baseline network throughput, and I've got some slides on what we changed. The other thing it helped us do was find issues with the version of SQL Server, so
availability groups is a newer feature; we were running 2008 R2, where you can't do that, so we started by migrating from 2008 to 2016 on-premises so that we could leverage availability groups. Another thing we found: we tried to do this with SQL Server 2014, and that's not quite as optimized as SQL Server 2016 with regard to the availability group feature, things like parallel redo. We had issues with secondaries falling behind, and by using 2016 and testing it thoroughly, we were able to demonstrate that we could run about 5x our expected production traffic and not overload the secondaries, and we watched that quite a bit. Test, test, test: if there's one takeaway, it's set it up and test it. Also, through that stress testing we found some bugs in the new version of SQL Server and worked with Microsoft; we found access violations and a couple of other things, but we came up with a recipe, and we know what patch level we need to be at to make it all work. Here are some performance counters we watched in our testing. Log bytes flushed per second is essentially how much the database is changing. With availability groups there's a concept of queues: there's a send queue, which is data on the primary that has not yet been sent to the secondary copies, and a redo queue, or recovery queue, the same concept there: logs that have been sent to the secondary but not yet applied, so if you ran a query there it would be behind. We mitigated these risks by, again, stress testing: churn it until it breaks, then ask whether that's adequate or whether we need to scale more or come up with a better solution. I can say we also found opportunities to tune the database and rethink what it's doing; for instance, there's sometimes logging that your application does to the database that could be going elsewhere.
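The send and redo queues just described are exposed through a SQL Server DMV, so they can be watched from a script as well as from perfmon. A sketch follows; the DMV and column names (sys.dm_hadr_database_replica_states, with sizes reported in KB) are real, but the thresholds and the helper function are made-up illustrations, not Expedia's actual tooling.

```python
# Sketch of watching the availability group queues from a script. The
# query would run against each replica; here we only model the check.
QUEUE_QUERY = """
SELECT database_id, log_send_queue_size, redo_queue_size
FROM sys.dm_hadr_database_replica_states;
"""

def queue_alert(log_send_kb, redo_kb,
                send_limit_kb=512_000, redo_limit_kb=512_000):
    """Flag a replica whose send or redo queue has grown past a limit.

    Limits are arbitrary examples (~500 MB); tune from stress testing.
    """
    alerts = []
    if log_send_kb > send_limit_kb:
        alerts.append("send queue backlog: primary transaction log may fill")
    if redo_kb > redo_limit_kb:
        alerts.append("redo queue backlog: read replicas serving stale data")
    return alerts

print(queue_alert(log_send_kb=1_000_000, redo_kb=0))
```

Ideally both queues sit at or near zero; a queue that keeps growing is the cue to look for network or instance-size bottlenecks, as described below.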
It sort of depends, application by application, but we reviewed things in case we want to be on a different platform someday; we wanted to simplify what the database does. Things like the change data capture feature within SQL Server we are using, but we don't necessarily need to be. So in addition to the cloud migration, it's thinking about what we can change so that when we're in AWS it'll play nice, or we'll hopefully have fewer issues. Here are some perfmon charts. We regularly collect the perfmon counters and run them through the PAL tool, and that's what's generating these charts; the PAL tool is still available from CodePlex, I think under a different name now. Here we're looking at a redo queue, for example. Ideally that's going to be zero or near zero; if it's not, or if it starts ever increasing, then you want to look for resource bottlenecks and figure out why it's taking so long: do I need more network bandwidth or a bigger instance size, those types of things. Risk two: the hybrid strategy will come to a halt if there is an AWS Direct Connect failure. Yes, and as Mr.
Vogels says, everything fails all the time, so we had to look at where we needed redundancy in our infrastructure to protect against that. The worst thing that could happen would be that the link breaks, transaction logs fill up on the primary, and then we've got a mess. Third risk: again, everything fails all the time. We expect EC2 nodes to fail from time to time, and we have to be prepared if and when that happens. What we're working on now is maturing our scripts and automation in order to detect when nodes fail and to be able to rebuild a node quickly. It's a little more complicated with SQL Server than with some platforms, because you've got server-level objects and things you need to manage; if you're using SQL Agent jobs, for example, and you build out another node, you need to bring those along with it. So we've done projects like developing clone processes to store all that stuff, almost like metadata for an instance, and then be able to rehydrate it on a new build. If everything goes well, we could have self-healing clusters: maybe develop a service that watches the health of the nodes, an Auto Scaling group automatically generates a new EC2 instance, automation joins it to the cluster, and it starts the seeding process until it's ready. That way we're prepared if there's a failure, and we already have redundancy in that we're in multiple Availability Zones. It's not that we expect one machine to stay up all the time, but having to get a DBA to take action in the middle of the night is something we obviously want good solid automation for, and those are some of the projects going on right now. We are trying to prepare for failure, essentially.
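The self-healing idea above can be sketched as a small decision function: given each replica's health, decide what the automation should do. Everything here (the state names, the action names) is a hypothetical simplification of the automation Jeremy describes, not Expedia's code.

```python
# Hypothetical sketch of the "watch node health, replace failed nodes"
# service: map each replica's observed state to an automation action.
def plan_actions(replicas):
    """replicas: dict of node name -> state ('healthy'|'lagging'|'failed')."""
    actions = []
    for node, state in sorted(replicas.items()):
        if state == "failed":
            # e.g. terminate the instance so the Auto Scaling group
            # launches a replacement; automation then joins it to the
            # cluster and starts automatic seeding until it is ready.
            actions.append((node, "replace-and-reseed"))
        elif state == "lagging":
            # a growing redo queue warrants investigation, not replacement
            actions.append((node, "investigate-redo-queue"))
    return actions

print(plan_actions({"ec2-a": "healthy", "ec2-b": "failed"}))
```

Keeping the decision logic a pure function like this makes it easy to test the automation without touching real instances.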
OK, implementation guidelines and best practices. Based on our stress-testing results we can dial in what instance type we should be going with. We're trying to pay a lot of attention to cost; we don't want to over-provision, although there are options to scale instance types up and down. This particular set of applications we're starting with is the production booking path: super critical, super latency-sensitive. That's why we're taking a very cautious approach; it may be that if we take a snapshot of an EBS volume, we're upsetting the application or causing some end user to have a frozen screen while they're trying to buy something, and we obviously don't want bad customer experiences, but we also don't want to over-provision if we don't need to. I imagine there will be other automation projects to automatically right-size, assuming we've got enough additional capacity within our availability groups. We need EC2 instance types with better network performance. We did evaluate RDS; we are currently leveraging some features of SQL Server that are not currently available in RDS, and we can revisit that later once they become available. Another thing we ran into is the use of Dedicated Hosts, and this is mostly due to cost. SQL Server licensing is not cheap, and it's gotten more expensive, so until we're on a different platform, and if and when that happens I don't know, we're trying to bring our own license; we've got an agreement with Microsoft. When you deploy a Dedicated Host, meaning you're the only tenant on the hardware making up the EC2 instances, we expect to avoid noisy-neighbor situations, but it also allows us to license just the physical cores on those Dedicated Hosts, and that adds up to quite a cost savings. So if you're concerned about managing costs, you may want to look into that. As mentioned, this is a database, so you're pretty much always going to be running
into storage, so you want to ensure that your storage can keep up. Provisioned IOPS would be the standard, except for cases where everything fits in memory, or there's low write activity, that type of thing. Next, Windows settings: the TCP receive window auto-tuning level should be set to normal. That slide is a bit hard to read, but we reviewed receive-side scaling state, chimney offload state, NetDMA state, and direct cache access; these were pretty much all normal. The next slide shows the other NIC settings that we changed. This is interesting: we initially found in our baseline testing that we could push 2.4 gigabits per second over our Direct Connect; once we were done making changes to the EC2 network settings, we were able to get that up to 9.5. So making sure we don't fall behind and the secondaries keep up, approaching it through network tuning as well as reducing the amount of work the database has to do by getting rid of certain features, both together helped us out quite a bit. Here we looked at interrupt moderation disabled, jumbo packet at 9014, packet priority and VLAN disabled, and receive and transmit buffers at 2048. OK, current state: we're still in phase one. We have a couple of production nodes that are serving read queries. This application spans a number of machines; it's actually a sharded database platform, so we've moved a couple of shards over to get started. We will continue with that and work closely with Microsoft in the event that we uncover any more bugs, although that last happened several months ago during initial stress testing. We will be refining automation and building more resiliency as we go. I think of it as: we've got a build process, it takes X amount of time; how do we keep chipping away at it to get it down to seconds, or well, minutes instead of hours?
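As a footnote to the TCP tuning described above: on Windows these globals are set with netsh. The sketch below just assembles the command lines; the option names are real `netsh int tcp` globals, but the values shown are illustrative defaults, since the talk only pins auto-tuning to normal.

```python
# Sketch: Windows TCP global settings reviewed in the talk, expressed as
# the netsh commands that would set them. Values here are illustrative;
# run the commands in an elevated prompt on the actual node.
TCP_GLOBALS = {
    "autotuninglevel": "normal",  # TCP receive window auto-tuning
    "rss": "enabled",             # receive-side scaling
    "chimney": "disabled",        # TCP chimney offload
    "netdma": "disabled",         # NetDMA
    "dca": "enabled",             # direct cache access
}

commands = [f"netsh int tcp set global {opt}={val}"
            for opt, val in TCP_GLOBALS.items()]
for cmd in commands:
    print(cmd)
```

The per-NIC advanced properties (interrupt moderation, jumbo packet, buffers) are set in the adapter driver rather than through netsh, and their names vary by driver.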
I'm excited about the opportunity to run SQL Server on Linux, and I think that's going to play nicely with the future automation we're going to do; you can certainly install SQL Server a whole lot faster. Then, once we're fully built out and provisioned, and we trust that we can do all of our functions, monitoring is working, and the backups are in place... well, this is a very complicated application, and we won't be able to go to phase two until several pieces of work happen in our application layer. There are maybe 30 dependent services just on this one set of shards, so we're at the mercy of our application teams to get those changes implemented before we can consider sending the writes to AWS, but that's the roadmap. OK, I have a demo. It only takes about a minute; it's really simple. Let's see if this works. At the start, let me pause real quick. All I'm doing in this demo is using Management Studio as a client; I have a distributed availability group configured in our lab environment. The top section here represents the objects locally in our on-premises data center, and the bottom one is an EC2 machine that's running and connected the way the architecture diagram showed with the availability groups: the top one is going to asynchronously update the bottom one. I've done a quick little script here so everything goes in one shot. When I hit go, it's going to first query a value in a table locally and then remotely, then it's going to update some data, then it's going to repeat that query locally and remotely, and what I'm looking for is that bottom value to reflect that it's in sync with the on-premises data center. It's pre-recorded, so I know it works. I'm just tracking what the names are, in this case using a linked server connection from on-premises to the remote; it was just easier to do
it in one window. I have a table called system_value, which has a column called system_value_value, and the first part of this just updates it to 1 so I know I've got a consistent starting point. Then I'm going to set system_value_value to 2. Now, if I hit go, I expect the second value to change, but you can see that the local one changed to 2 while the remote one did not. So how come? I wasn't exactly expecting that at first, but it makes total sense: the way this script runs, it fires those queries right after each other, so it basically sent the update and then ran the query on the remote host before the remote host had a chance to receive it. So there is some latency in this path, and that's a good demonstration of it. In this case I just reran the query and the new value showed up; it actually did the same update twice, and because it's an update, it didn't change anything. So that's it. There's some work involved in getting the... let's go back here. Oops, yeah, we're back. OK, we'll do questions at the end, because I just want Prahlad to complete this slide, and then we can take the questions. Just one slide. Thanks, Jeremy. I just want to highlight some of the related sessions that you might be interested in; maybe some of them have already passed, or you might have attended some, but these related sessions will be hugely useful to you. There's a workshop on migrating Microsoft SQL Server databases to AWS, and then we have a breakout session on design: we talked about migration strategies in this session, and WIN306 talks about design, now that you've already migrated, how do you design and optimize running SQL Server on AWS. That's the next phase; I think that would be useful to you. And then we also have a chalk talk on multi-region SQL Server clusters across different AWS Regions using HA and DR, and what the key things to consider are there.
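Stepping back to the demo for a moment: the read-after-write gap it exposed is exactly the expected behavior of an asynchronous secondary. Here is a toy Python model of that behavior, with a polling read that waits for the replica to catch up; all of it is invented for illustration (the real demo used T-SQL over a linked server).

```python
import time

# Toy model of the demo: a secondary that applies the primary's update
# only after a delay, and a reader that polls until the value catches up.
class FakeAsyncReplica:
    def __init__(self, delay_s):
        self.value = 1            # consistent starting point, as in the demo
        self._pending = None
        self._apply_at = None
        self.delay_s = delay_s

    def write_on_primary(self, value):
        self._pending = value
        self._apply_at = time.monotonic() + self.delay_s

    def read_on_secondary(self):
        if self._apply_at is not None and time.monotonic() >= self._apply_at:
            self.value, self._pending = self._pending, None
            self._apply_at = None
        return self.value

def read_until(replica, expected, timeout_s=1.0, poll_s=0.01):
    """Poll the secondary until it reflects the write, or give up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if replica.read_on_secondary() == expected:
            return expected
        time.sleep(poll_s)
    return replica.read_on_secondary()

replica = FakeAsyncReplica(delay_s=0.05)
replica.write_on_primary(2)
print(replica.read_on_secondary())  # immediately after the write: still 1
print(read_until(replica, 2))       # after polling: 2
```

Querying immediately returns the stale value, just as the demo showed; rerunning the query after the replication delay returns the updated one.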
That's again a chalk talk, where we'll do a whiteboard and have a Q&A; a more interactive kind of session, so I highly recommend that you attend it. I think there's a repeat session on Thursday evening, and I will also be part of that chalk talk. And finally, if you are interested in a deeper dive on directory services and how to integrate directory services with other AWS, SQL Server, and Windows environments, WIN403 is very strong on deep-dive patterns. So take a look at your schedule and see if some of those sessions interest you. Also, I'd request that you complete the session evaluations; I hope it was useful for you to take away some key learnings that you can apply to your migration projects, or even to considering a hybrid distributed architecture. With that, thank you very much, and we can take the questions. Thank you. [Applause] Good question. I'll try to repeat the question for everyone: the first question was, what's the recommended instance type for SQL Server, and then there was a second question; I'll try to answer the first one. The answer is, it depends: on your database size and the kind of read/write activity you expect, whether you have read-heavy workloads versus write-heavy, and things like that. Typically, what we have seen with enterprise customers that get the best out of SQL Server is running instance types that are more compute- and memory-intensive and have optimized network bandwidth. The higher the network bandwidth, the better the SQL Server performance, because when you have higher network bandwidth, you also have dedicated network throughput to your EBS storage. Given that EBS storage is network-attached storage connected to your instance, having an EBS-optimized, higher
network-throughput instance type always performs better, because it has two network paths: one network path for your user read/write activity, and one back-end network path from your EC2 instance to your EBS volumes. The newer instance type, the X1e, is awesome; we have had extremely good feedback from customers on the X1 instance types, so I would highly recommend that you start with X1 first and maybe look at C and R, but look at network throughput, and with any of these instance types I don't think you can go wrong. Your second question was about EBS volumes, in terms of how you lay them out. Traditionally, with any database, Oracle or SQL Server, you always have the DB files, index files, and log files, and the question of whether you want different drives and different storage for each. Where you place them does not really matter significantly, because you still have one network channel to EBS: just because you have three EBS volumes doesn't mean you have three network channels to EBS. It is still one network channel to the EBS volumes, which means they share the read/write bandwidth to the back-end EBS volumes, so it's not essential to separate out the index files, MDF files, and so on. Yes, instance store for tempdb is, I think, definitely a recommended option: if you have tempdb and the instance type supports ephemeral storage, go with it. It always gives you better performance, because instance store is locally attached to the instance, as opposed to EBS volumes, which are network-connected. You get zero cost, because you are not paying for the instance store; it comes with the instance type. And those are all SSD volumes, which means you get higher performance, and they are directly attached, so all three make it a win-win situation, especially for tempdb. Sorry, I just missed the last part. Well, certain instance types do support it and certain ones do not, so I guess you just
have to carefully see whether you really have a strong requirement to use instance store for tempdb for your database performance, and in that case look for an instance type that supports instance store. What kind of jobs? SQL Agent, yeah. I think that is one of the requests from a large number of customers, to get that feature; we don't have that capability today, but depending on customer prioritization and need, we might just add it. I think there is some support; you can run some jobs if you script them, if you generate the T-SQL, but it's not as friendly as what you would be used to with SSMS and other client-side tools. If you're wanting to run jobs and things like that, as Jeremy mentioned, it's not as flexible as what you can do with SQL Server on EC2, because you don't have all the necessary permissions on RDS to achieve that functionality, and certain jobs would require things like access to the OS or the command shell. Yeah, let's see if it works. Do you need to be in a VPC to support AD authentication? The answer is yes; RDS essentially runs in a VPC. Oh, the old EC2-Classic? OK, I think you'd probably need to run that in a VPC, because, are you trying to use the managed Microsoft AD, or your own domain controllers? The managed AD, yeah. With Classic, given that it's an older platform, we are trying to ensure that customers can take advantage of all the good features. Yes, correct. All right, I think we are probably out of time, so thank you again very much, and hope you enjoy the rest of your day. Thank you very much.
Info
Channel: Amazon Web Services
Views: 3,666
Keywords: AWS re:Invent 2017, Amazon, Windows, WIN301, AI, EC2, RDS, Database Migration Service, Database, Hybrid, Migration
Id: 9hUVcH48eLg
Length: 65min 6sec (3906 seconds)
Published: Wed Nov 29 2017