Hello, everyone, welcome to the CloudEndure
Migration Factory Solution. My name is Wally Lu,
Principal Consultant from AWS. In the next 30 minutes, I will introduce the CloudEndure Migration Factory Solution and talk about typical large migration challenges. At the end, I'll share some best practices and lessons learned from our customers. Let's get started. So first,
large migration challenges. Have you thought about migrating 1,000 servers in about six months, or maybe 3,000 servers in about 12 months? Migrating one server may be very easy, but scale really changes everything. A simple five-minute task, such as restarting a server, repeated 1,000 times, is 5,000 minutes. A simple 10-step process to migrate a server, against 1,000 servers, could be 10,000 steps. We want to design a solution to help you simplify the migration and reduce the number of steps for large migrations. So, this is the desired state. Ideally, we would design a solution
that's simple enough that you push one button today and can migrate all your servers from your data center to AWS tomorrow. However, every customer is different; there's no one-size-fits-all. In reality, there are a couple of things we need to consider. First, for large migrations we use different tools to support the migration, and different customers use different tools. You may have discovery tools, migration tools, a CMDB, and data in Excel spreadsheets as well; you may also want to use project management tools to manage a large migration. Second, there's the people side of things:
you may have lots of people and different teams supporting a large migration: infrastructure teams, cloud teams, maybe application teams, testing teams, and so on; many people are involved. And the third thing here is that a large migration includes many small tasks. For example, you may want to check C drive free space, install the agent on a source machine, or select a target instance type for your servers. Repeating each of those 1,000 times is really a big deal. So, we want to design a solution that is simple enough, but also flexible, to help customers solve all these problems. So what do we do? Let's revise the design a little bit; here's a revised desired state. What if we split the big button
into 3, 6, or 9 smaller buttons? So in theory, if we push
the right button at the right time, in the right order, we can achieve
the same result, right? Or even better, you can add a new button here or replace existing buttons to integrate with your existing systems. That's even better. So that's how we want to solve this problem. Now, let me introduce the CloudEndure
Migration Factory Solution. But before we talk about the solution, I want to spend one minute quickly explaining what CloudEndure Migration is. CloudEndure Migration
is a rehost migration tool; it helps you migrate from anywhere to AWS. It was designed for rapid, large-scale migration. It is a block-level replication tool that replicates every single block from the source to AWS, and it is an agent-based migration tool as well. So that's CloudEndure. And what is
the Migration Factory Solution and why do we need it? Just like any other services
and solutions we develop at AWS, we always work backwards from customers, thinking about how we can design a solution to solve customer problems. This specific solution is designed to solve the large migration challenges. So, the CloudEndure Migration
Factory Solution, or CEMF, is an automation engine built to accelerate your CloudEndure migration using APIs. It is also a metadata store that helps you keep all the data in one place: you may have your server data, application data, and wave data together in a single source of truth. And the third thing we want to share with you is the perfect use case for the solution: if you have more than 100 servers to lift and shift to AWS, using the solution will help you accelerate your migration. So from the solution
design perspective, we try to solve two problems here. One is integration: as we talked about before, there are so many things involved and so many people involved. We want to build a metadata store that is able to integrate with everything as a single source of truth. As in this diagram, you can import data from your CSV files if you want to, or you can import data from your CMDB using the same standard API. You can also leverage the metadata in the store to automate migration activities, such as "install software on all my servers in wave one"; since we know which servers are in which wave from the metadata store, automating that becomes really easy. You may also want to automate the cutover process as well: instead of cutting over servers one by one, we integrate with the CloudEndure API, which enables us to cut over a large number of servers, such as 20 or 30 servers, together. So that's integration.
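To make that concrete, here is a minimal sketch of wave-level automation against the Factory's metadata store over REST, assuming a deployed API endpoint; the URL, path, field names, and auth header are illustrative placeholders, not the solution's exact contract.

```python
# A minimal sketch: pull one wave's servers from the Factory metadata
# store. The endpoint URL, path, and field names are hypothetical.
import requests

FACTORY_API = "https://example.execute-api.us-east-1.amazonaws.com/prod"  # placeholder

def servers_in_wave(wave_id, token):
    """Return server records tagged with the given wave ID."""
    resp = requests.get(
        f"{FACTORY_API}/user/servers",
        headers={"Authorization": token},
    )
    resp.raise_for_status()
    return [s for s in resp.json() if s.get("wave_id") == wave_id]
```

With a list like that in hand, any downstream task, a software install, an agent push, or a cutover, can operate on the whole wave instead of one server at a time.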
Let's talk about the automation piece. Since we have everything integrated, it's easier for us to automate across different tools as well. Let's use this as an example of CloudEndure automation activities. The first column is the build phase; in the build phase we have three tasks. The first one is checking the prerequisites. Why is this important? Because you don't want to spend hours, or days, troubleshooting. What if something doesn't work? If your application doesn't work in a cutover window, you may have to spend a couple of hours figuring out the root cause, but the root cause can be as simple as not enough free space on the C drive. You can spend five minutes checking free space, and for the cutover that could save you hours, right? That's really worth doing. However, five minutes times 1,000 servers is going to take you 5,000 minutes. So what we want to do here is write one automation script that is able to check the prerequisites for all Windows and Linux machines together in the same wave. The example here is for Windows: we check C drive free space, the .NET Framework version, TCP 443 to the CloudEndure console, and TCP 1500 to the CloudEndure replication server. So, you run that once for all your servers in the same wave.
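As a rough illustration of what those checks can look like, here is a minimal sketch, assuming it runs locally on each source machine; the hostnames and the free-space threshold are placeholders.

```python
# A minimal prerequisite-check sketch, assumed to run locally on each
# source machine. Hostnames and thresholds are placeholders.
import shutil
import socket

def check_free_space(path="C:\\", min_gb=2.0):
    """True if the drive has at least min_gb of free space."""
    return shutil.disk_usage(path).free / 1024**3 >= min_gb

def check_tcp(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print({
        "c_drive_free_space": check_free_space(),
        "tcp_443_console": check_tcp("console.cloudendure.com", 443),
        "tcp_1500_replication": check_tcp("10.0.0.10", 1500),  # replication server IP
    })
```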
Now, when your servers are ready, you want to push the agent to the source machines, right? Installing one agent is super easy, maybe only three to five minutes per server. However, if we're talking about 100 servers, things become a little more complicated, because you have Windows, you have Linux, and you may also have 10 different target accounts. Ten different target AWS accounts means ten different installation tokens, one for each CloudEndure project. So, you may end up with 20 different ways to install an agent on 100 servers. That's not easy. Even with tools like Ansible or SCCM, you still need to figure out, for any given server, whether to use method number 1 or method number 20 to install the agent on the source machine. Using the automation script here, we are able to push the agent to any source, any Windows machine and any Linux machine, and to any target account as well, using one automation script.
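The core of such a script is a simple per-OS dispatch. Here is a minimal sketch; the metadata field names and installer commands are assumptions based on the CloudEndure installer's documented usage, not the solution's verbatim code.

```python
# A sketch of one script handling any source OS. Field names and
# installer flags are assumptions, not the solution's verbatim code.
import subprocess

def install_agent(server, token):
    """Install the CloudEndure agent on one source machine."""
    host = server["server_fqdn"]
    if server["server_os_family"].lower() == "windows":
        # Remote PowerShell (WinRM) runs the Windows installer
        cmd = ["powershell.exe", "-Command",
               f"Invoke-Command -ComputerName {host} -ScriptBlock "
               f"{{ C:\\Temp\\installer_win.exe -t {token} --no-prompt }}"]
    else:
        # Standard SSH runs the Linux installer
        cmd = ["ssh", host,
               f"sudo python ./installer_linux.py -t {token} --no-prompt"]
    return subprocess.call(cmd)
```

Because the per-project installation token can live in the metadata store too, the same loop covers any number of target accounts.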
This is the automation we provide as part of the solution, but as I mentioned before, one size does not fit all: you can add additional automation or customize the automation to integrate with existing systems, such as your password management system or maybe your CMDB, to build fully end-to-end automation to support a large-scale migration. So that's automation. From an architecture perspective, we can deploy the solution to your AWS account using one CloudFormation template.
the cloud formation template, it will deploy the front end
and the back end. The front end
is JavaScript application. The back end is lambda functions
and DynamoDB. We use Cognito
to authenticate with the solution. Even you have
multiple accounts to migrate, you only need to deploy the solution
once to your account and use that to support a migration
to multiple target accounts. On the left hand side, this is
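Scripts authenticate to the Factory API through Cognito before anything else. Here is a minimal login sketch, assuming the user pool's app client allows the USER_PASSWORD_AUTH flow; the client ID and credentials are placeholders.

```python
# A minimal Cognito login sketch; assumes the app client permits
# USER_PASSWORD_AUTH. The client ID and credentials are placeholders.
import boto3

def factory_login(username, password, client_id):
    """Return a Cognito ID token to pass to the Factory API."""
    idp = boto3.client("cognito-idp")
    resp = idp.initiate_auth(
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password},
        ClientId=client_id,
    )
    return resp["AuthenticationResult"]["IdToken"]
```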
On the left-hand side is the migration execution server: a Windows server in your data center, joined to your AD domain. We use this server to connect to your source Windows servers using the remote PowerShell (WinRM) protocol, or to your Linux machines using the standard SSH protocol. That's the architecture for the solution. Let's do a quick demo;
I want to show you how we can use this solution to accelerate your migration. I want to show you three things in the demo. The first thing is importing server data into the CEMF solution: instead of updating the server metadata one by one on the console, how to import data from a CSV file. The second thing I want to show you is how we check the prerequisites and push the agent to multiple source machines, both Windows and Linux, at the same time for the entire wave. And the third thing I want to show you is how we do the cutover: how do we launch many servers together instead of launching servers one by one from the CloudEndure console? So, let's get started with the demo. I'm on a demo server right now, which I'm using to run the automation scripts. The demo server mimics the source data center environment: it is in the source AD domain, and I can use it to connect to all my source Windows servers using the remote PowerShell (WinRM) protocol, or via SSH to all the Linux servers. So, let's start
with automation number one: import server data into the CloudEndure Migration Factory. But before I do that, I want to show you how we normally do that manually, so we can compare the two, right? This is the CloudEndure console. We have two servers here. Normally, you select one server and update its blueprint. The blueprint is the target instance information: you have to select the instance type, subnets, and security groups, and save the blueprint one by one for all your servers. Think about that: what if you have to repeat that 100 times for your 100 servers? We will change that by putting the data in a CSV or JSON file and importing all of it into the Migration Factory. Now, let's take a look. This is the CloudEndure
Migration Factory web console, and we are on the resources page here. In the resource list, we have the wave list, application list, and server list. In wave 1 you may have three applications and 10 servers; wave 2 maybe 20 servers. Right now we will import data from CSV into the Factory. This is my CSV: we have four servers here, two Windows servers and two Linux servers. I have full server information including operating system and FQDN, and target server information as well, such as subnet, security groups, and instance type; we can use the information here to update the CloudEndure blueprint. Let's import the data into the Factory by selecting the CSV. Within just a few seconds, we have four servers in the Factory, two Windows servers and two Linux servers, in wave one.
Now, another option to get data into the Factory: if you have a large dataset with servers, applications, and waves together in one big CSV, you can always run a Python script to ingest the data into the Factory.
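A bulk ingest script along these lines could read the CSV and post each row to the Factory API; this is a sketch under the same hypothetical endpoint and column names as before, not the solution's shipped script.

```python
# A sketch of bulk CSV ingest into the Factory. Endpoint and column
# names are hypothetical, matching the earlier examples.
import csv
import requests

FACTORY_API = "https://example.execute-api.us-east-1.amazonaws.com/prod"  # placeholder

def ingest_csv(path, token):
    """Post every row of a wave-planning CSV as a server record."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            requests.post(
                f"{FACTORY_API}/user/servers",
                headers={"Authorization": token},
                json=row,
            ).raise_for_status()
```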
Now that we have the data, what if you want to change something? You can always switch to the pipeline page and change the information: for example, you may move an application from wave three to wave four if there's a delay, and save the application. You can do the same thing for a server as well, changing it from one subnet to another and saving the server information. Now we have the data.
Let's do automation number two: validate the prerequisites on the source machines and push the CloudEndure agent to them. Let's run our first automation script here, which is the 0-Prerequisites-checks script. We provide a wave ID as a filter, and the CloudEndure replication server IP, because we want to validate connectivity from the source machine to the CloudEndure replication server via TCP 1500. So let's test that. The first step is to log in to the Migration Factory with my username and password. Now we have the server list for wave one: two Windows servers and two Linux servers. It looks like everything is good for Windows, and let me type a username and password for Linux this time; of course, the script checks different settings for Linux. Only a few seconds later, we have a final report that tells you which servers passed the checks and which servers failed. It looks like everything is good. Now, if we switch
to the resource list page here and filter by migration status, we will see something change in the Factory. Let's filter this: for the four servers in wave one, the status changed to "Prerequisites check passed", right? This is because every time you run an automation script, it sends feedback to the CloudEndure Migration Factory API to update the status for you automatically. So you always have visibility into the entire lifecycle of your servers, and your migration engineers don't need to spend their valuable time on status updates, because this is an automated process.
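The write-back is just another API call at the end of each script. A minimal sketch, again with a hypothetical endpoint and attribute name:

```python
# A sketch of a script reporting status back to the Factory so the
# console updates itself. Endpoint and attribute names are hypothetical.
import requests

FACTORY_API = "https://example.execute-api.us-east-1.amazonaws.com/prod"  # placeholder

def update_status(server_id, status, token):
    requests.put(
        f"{FACTORY_API}/user/servers/{server_id}",
        headers={"Authorization": token},
        json={"migration_status": status},
    ).raise_for_status()

# e.g. update_status("server-42", "Prerequisites check passed", token)
```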
Now, the next step is to push the CloudEndure agent to the source machines. Let's do that. As I mentioned before, this should work for any source and any target; let's see. First, let me log in to the Factory with my username and password; now we have to provide a CloudEndure API token. This is my CloudEndure API token, and we paste it here. We get a server list: two Windows servers and two Linux servers, right? This means it works for any source, any Windows and any Linux. We also have servers in the Demo2 and Demo3 projects, which means it works for any target. So let's type a username and password for Linux. The process starts with the first server in the first project. Since the first server is Windows, the script uses remote PowerShell to connect to it. If the next server is Linux, the script will automatically switch to SSH to connect to the Linux server. This will take a few minutes. In the meantime, I want to show you
automation number three: how do we cut over a large number of servers in a cutover window? Before we do that, again, I want to show how we do things manually so we can compare the differences. Normally, on the CloudEndure console, you have to update the blueprints one by one; that's the first step before cutting over any servers: select the right instance type and the right subnet, then switch back to machines, select the machines, and click a button to launch the servers in test mode or cutover mode, right? If you have 100 servers, you have to repeat that step 100 times. Think about that: select 100 servers out of 500 on the console and update the blueprints one by one; that's a big task. We want to change that, because how we do it in the Migration Factory is completely different. We never touch individual servers; we always operate at the wave level. Let's grab our API token, select a project name, and do a dry run first. The launch type will be test, with wave ID 3. A dry run does not launch
any real servers; a dry run basically validates your data. We imported the data from CSV; now we want to validate it and make sure there are no typos or invalid values in the CSV, right? You don't want to spend your valuable time in a cutover window troubleshooting issues like a typo, so we should do a dry run a couple of days or weeks before the cutover. Let's do a dry run. As soon as I click the Launch Servers button, this sends the data to the CloudEndure API to validate it. You either get a response like "dry run was successful" or "dry run failed". It looks like the dry run was successful for all the machines. Now we can change dry run from yes to no, to launch real servers. Let's do that and launch the servers. Similarly, this sends the data to the CloudEndure API, updates the blueprints, checks the replication settings, and creates a job, not for one server but for the entire wave. Now we have a test job created
for machines two and three, right? Let's compare with the CloudEndure console: there is a job for my entire wave, wave three. You may notice the difference here: I did not select any servers. I simply chose a project name and wave ID and clicked the Launch Servers button; whether it's one server, 10 servers, or 50 servers doesn't matter, because we launch the entire wave together. This will help you accelerate the migration by focusing on waves, and it eliminates some of the potential issues of the manual process.
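Under the hood, a wave-level launch is one authenticated call per project rather than one per server. Here is a minimal sketch against the CloudEndure REST API; the login/XSRF handling and the launchMachines request shape reflect my understanding of that API and should be treated as assumptions.

```python
# A sketch of launching an entire wave through the CloudEndure API.
# The endpoints and request shape are assumptions about that API.
import requests

CE = "https://console.cloudendure.com/api/latest"

def launch_wave(api_token, project_id, machine_ids, dry_run=True):
    s = requests.Session()
    s.post(f"{CE}/login", json={"userApiToken": api_token}).raise_for_status()
    # CloudEndure expects the XSRF cookie echoed back as a header
    s.headers["X-XSRF-TOKEN"] = s.cookies.get("XSRF-TOKEN", "")
    if dry_run:
        return  # validate metadata only; do not create a job
    s.post(
        f"{CE}/projects/{project_id}/launchMachines",
        json={"launchType": "TEST",
              "items": [{"machineId": m} for m in machine_ids]},
    ).raise_for_status()
```

The machine IDs would come from the Factory's metadata for the wave, so one call covers however many servers the wave contains.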
Okay, let's go back and check the agent installation. It looks like everything is good here; we have agents successfully installed on four servers. Now, if we switch to the resources page, we can see some status changes as well. The migration status for these four servers changed to "CE Agent Install Success", and these two are "Test Instance Launched", right? As with the previous script, every time you run automation or do anything from the Factory console, we update the status automatically for you, so you always have visibility into the entire lifecycle. Let's validate again using the CloudEndure console: we have two Windows servers in the Demo2 project and two Linux servers in the Demo3 project, right? This means the script works for any source and any target. So that is the end of the demo. I just wanted to show you how automation can help you accelerate your migration to AWS. Okay, let's talk
about best practices and lessons learned from our CEMF customers. Customer A did a large cutover in just a few hours. What do I mean by large? Have you thought about cutting over hundreds of servers in just a few hours? That's exactly what this customer did: they were able to cut over 600 servers in just a few hours. So, how did they do it? What did we learn from this customer? Number one, minimizing change is the key to a large cutover. Change is a good thing, but sometimes it is also a risk. It's fair to say that if you're going to change 20 things, the risk is generally bigger than changing just one thing. You may want to change the computer name, the AD domain, or the IP address of the server, right? However, what if some legacy application has a hard-coded IP address somewhere, but nobody knows? That's a risk. So this customer mitigated the risk by not changing IP addresses at all, cutting over entire subnets together; that's one thing. Another, similar thing
is the networking, right? Ideally, we want to build an application-specific security group for every single application before migration. However, there are some challenges: maybe due to the tight schedule, or maybe due to lack of knowledge of the application, we may not be able to do that in a large migration. So, this customer developed a generic migration security group to support the large migration and pushed the application security group design to a later stage. That's how they did a large cutover in just a few hours. The next thing we learned
from this customer is that automation with the Migration Factory solution is also key, because you do not want to cut over your servers one by one; you want to bundle the servers together and launch them together to save time. The last thing from this customer, also very important, is to make sure your app teams are ready. Your application teams are really critical, because a large migration is never just an infrastructure project. We need to make sure the application teams and business units are involved as part of the migration. They are on the same team; they are not just playing a supporting role. They have to help us do the application testing and change validation, and make go/no-go decisions as part of the large migration. So it's really important to make sure your app teams are fully aware of and support the large migration. That's customer A. Now, customer B:
this customer scaled from 10 servers to 90 servers a week, actually scaling from one cutover to three migration cutovers in a week. That's about 90 servers. So, how did they do it, and what did we learn from this customer specifically? One is to plan the migration waves ahead of time. Believe it or not, large migrations sometimes stall not because you don't have the right team and skill set, and not because you don't have the right tools for the migration. You may have a perfect team, skills, and tools, but you may not have enough servers ready to support the pace. What I mean by that: say you want to migrate 50 servers a week continuously for a few months. What if you don't have that many servers ready to migrate? That's a challenge we sometimes find in large migrations. This customer finished wave planning for 900 servers ahead of time, so they were able to import all 900 servers into CEMF. Those 900 servers were ready for the large migration; even at 90 servers a week, that's enough for 10 weeks. So that's one thing we learned from this customer. The second thing is to automate the server data
intake process as well. You may have server information in an Excel spreadsheet, in the CMDB, and in the discovery tools. Try to avoid manually copy-pasting from A to B to merge the data together: one, that's not efficient, and two, there can be a lot of errors during manual copy-pasting. This customer had a big wave-planner Excel spreadsheet in SharePoint. What they do is log in to SharePoint to do the wave planning, basically deciding which server is in wave one and which server is in wave two. That triggers a Terraform process, which updates the on-premises firewalls and creates security groups for the specific servers. That triggers a Lambda function to validate all the data, just to make sure there are no typos and all the data is ready for migration. And that Lambda triggers a second Lambda function to import the data into CEMF, ready for migration. As you can see, there's only one manual step in the middle, which is someone logging in to SharePoint to update the wave planner; that triggers all the other processes. Everything else is fully automated from end to end. That saves you a lot of time and avoids a lot of errors during a large migration. So from this customer, we learned that more automation
actually means less troubleshooting, less troubleshooting
means faster migration. Customer C had a 1 Gbps Direct Connect (DX) link, but they were able to migrate 500 servers in just three months. And remember, they shared that 1 Gbps link with production traffic as well. So, what did we learn from this customer? How did they do it? Number one, they developed the end-to-end process at an early stage of the migration. Don't wait until the last minute, two days before cutover, still trying to figure out who is going to install the agent on the source machine, who is going to shut down the server, and who is going to change the DNS. Make sure you develop the process and RACI model at an early stage of the migration, and make sure everyone is fully aware of their roles and responsibilities, so no one ever has to ask who does what in the migration process. The next thing is that they developed
a centralized tracking dashboard, just like this one, for everyone to track the entire migration status. You can save maybe 15 minutes every day because you don't need status update meetings anymore; everyone simply logs in to the dashboard to see the status of the entire migration. Fifteen minutes may sound small, but if you have 20 people on the team, that's 300 minutes every day, which is quite a big number. This really helped them: it gave leadership visibility and helped them manage the entire migration using a centralized dashboard. And the next thing is,
since, as I mentioned, they only had a 1 Gbps Direct Connect link, they actually developed additional automation. As I mentioned before, we designed the CEMF solution to be flexible, so any customer can develop additional automation for their use cases, and that's what they did. They developed additional automation that integrates with CEMF through the metadata store to disable CloudEndure replication during business hours for the entire wave, avoiding any impact on production traffic, and to re-enable replication after business hours. That's really useful. That's how they could use only a 1 Gbps link, share it with production traffic, and still migrate 500 servers in just three months. That's what we learned from Customer C.
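A sketch of what that throttling automation could look like, run on a schedule from the execution server: the pauseReplication/startReplication endpoints and the payload shape are unverified assumptions about the CloudEndure API, and the wave membership would come from the Factory metadata as in the earlier examples.

```python
# A sketch of a business-hours replication toggle for a whole wave.
# The pause/start endpoints and payload shape are unverified
# assumptions about the CloudEndure API.
from datetime import datetime
import requests

CE = "https://console.cloudendure.com/api/latest"

def in_business_hours(now=None):
    now = now or datetime.now()
    return now.weekday() < 5 and 8 <= now.hour < 18

def set_wave_replication(session, project_id, machine_ids, enable):
    action = "startReplication" if enable else "pauseReplication"
    session.post(
        f"{CE}/projects/{project_id}/{action}",
        json={"machineIDs": machine_ids},
    ).raise_for_status()

# Run from a scheduler (e.g. cron every 30 minutes):
#   set_wave_replication(session, project, wave_machines,
#                        enable=not in_business_hours())
```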
a large migration best practices. Number one, plan the migration wave
ahead of time, do not wait to the last minute
and still trying to figure out which server goes to which wave, make sure you have enough buffer
to support a migration, at least a couple weeks ahead
of the migration schedule. Number two is develop
end-to-end process and automation in early stage. Again, do not wait to last minutes
do trying to figure out who is going to log into a server,
to install agent, to shut down a server defined process
developed automation in early stage can help you with a large migration. And, next thing is automation
with CEMF is also the key. As we talked about one example here, more automation
actually means faster migration. Number four
is minimize unnecessary change. Change is a good thing,
but this is another wish list, right? Since we're doing
the reverse migration. Typically, the goal is to exit
data center with a hard deadline. Some app owners may want
to modernize their application from monoliths to microservices
or change to serverless. That's understandable,
but as part of the reverse migration, we should push that
to a later stage instead of part
of the migration schedule. Now, last thing,
again, very important: prepare your application teams. Make sure they're aware they're part of the migration team, not just supporting the migration, because without the app owners we will not have a successful migration. So, that's all for today. Here are the takeaways for today's session and some useful resources for you. The first one is the CloudEndure Migration page, if you're interested. The second one is the CloudEndure Migration Factory implementation guide, for deploying the solution in your environment. The third is best practices for using the solution for large migrations. We also have a few other links here, like Migration Immersion Day and AWS Workshops. Feel free to take a look. Thank you for watching the session today, and see you next time. Thank you!
Please complete the session survey