[MUSIC PLAYING] VENK SUBRAMANIAN: So you
know any migration is always complicated because migrations
aren't a thing that we do as a business, right? When you start a
technology business, your goal isn't
to do a migration. So it's a rare occurrence
in anyone's lives, and it always brings with
it anxiety and a lack of understanding of what
we need to do to really be able to push this ball forward. So we're not here to tell
you about what technologies to pick, how to avoid that weird
GKE load balancer issue that happens every now and then. But rather, if you're
an engineer here, if you're a manager here,
if you're a leader here, each one of you today owns a
piece of your business, right? You're no longer
just doing work. You actually own a
piece of your business. So our goal here
is to teach you, as an owner of that piece of
the business, what can you do to plan and execute
effectively on the piece that you own. And also how you are
able to collaborate better so that, as
an organization, you can manage something that
can be pretty large and complex and a fair amount of unknowns. So the way we're
going to do it is give a little bit of information
about who the key players were here, what Google brought to
the table, and who Unity is. And we're going to talk
about the why, the vision, what was our partnership, and
what was the goals of what we were trying to achieve here. And then, finally,
we're going to give you details about the approach-- what we did, how we
did it, what worked for us, what didn't work
for us, and hopefully, takeaways that you can
walk away with that will help you make your
next migration a lot easier. SOM ROY: So as we
start, I just wanted to quickly talk about
Google Professional Services Organization. There might be some other
folks in the room who are working with your TAMs or
your PSO team on the ground. If not, you know,
PSO's mission is to help customers get the
most out of Google Cloud. And basically, we go to market
in three separate pillars. We have our consulting and migration services. We have our Technical
Account Managers, the TAMs, which some of you are
working with already. And then we have our
training and certification. So these three together form
our Google Cloud Professional Services, and our
aim is to work almost as an extension of your team and
get all the feedback back over to our product and engineering. So as we start, we'll talk
about the Unity and Google Partnership first. Venk, do you want to? VENK SUBRAMANIAN: Yeah. So who is Unity? I'm hoping a fair
amount of you know us, but I'll cover it anyway. We are the world's most popular
3D content creation system. And we started with games. We expanded into the 3D
content creation space. But really, what we are is
a complete game creation platform. Our pillars of create,
operate, and monetize allow game developers
to build a business. Because today, game
development is not just about building games. We have to build a viable
business around it. And game developers, what
they do best, is build games. So Unity's goal is to
take away the rest of it and make it really easy for
them to build a viable business around their games. Now, we've been in
Play for a while. And we are wildly
popular in many spaces. We've had over 29 billion installs of apps with some kind of Unity experience in them within just the last 12 months. 60% of all AR and VR content today is powered by Unity. And 50% of all new mobile games today are built on Unity. And then the Unity
Google Cloud Partnership? SOM ROY: Yes. So as you can see
here, I'm already sporting it on my jacket. So Unity and Google Cloud
also partner together on this thing called
the Connected Games. The whole point
of Connected Games is to connect players to each
other and players to developers as well and basically provide
a more enriching experience to game players. So Google Cloud and
Unity specifically are partnering on that. Again, as you can
see, we have a lot of initiatives with Unity
going on across the company. But this is something we
are very, very specific on Google Cloud and Unity. And if you need
more information, you can read up on Connected
Games on the Unity blog. All right, so as we
start, Venk, do you want to give an overview of
how your engineering side is organized? VENK SUBRAMANIAN: Yeah. So one of the unique
challenges we had was that our team
is extremely global. Unity today is in over 30
offices, in over 22 countries worldwide. We have close to 10 key
locations across the globe. So a company that is built
like this cannot operate in an extremely
top-down structure. So we focus very heavily on
collaboration and empowerment. And I'm bringing this up
because, as you were starting out on your migration,
it's very important for you to know how your
organization is structured. This actually plays
a very important part in how you're going to
plan and how you're going to execute on the migration. So for us, this was
a unique challenge that I wanted to highlight. SOM ROY: While it's really
cool on Unity's side to be globally distributed,
it becomes really a big challenge when
you're a services arm and you're trying to
achieve the migration. So we started thinking about,
from the Google PSO side, how do we map a team to address Unity's global and distributed nature of business? So we started with teams in San Francisco, Seattle, and Austin in the Americas. We quickly realized that we had to scale up our team in EMEA, across Stockholm and Helsinki, where one of their key business units is. We also had a team out there in London. And as you can see in the previous slide, Unity does have an office in Shanghai, China. Google doesn't have a PSO team in China, so to cover that time zone, we had a team in Singapore. Now, this is really important
because local time zone interaction during a tight schedule is really important. And having that local support where the migration is happening-- that's why we went with this
global nature of the team. Also, sometimes,
when issues come up, the teams cannot wait for
eight hours for US to wake up and actually get their
questions answered. So removing blockers in local time is a really, really important thing. The third thing is that the TAMs across the globe were always keeping a tab on how the migration was going. So, again, we had TAMs in all the regions that we saw. All right, so when we try to
summarize the whole migration journey in one slide, it's
really, really difficult. Because there are, as Venk
said, so many business units across so many cities. Again, there was a
very tight timeline. The migration had to be
completed within 2018. So we tried to represent
the whole journey in at least one slide. The important thing that I want to draw your attention to is the red box there, which is the pre-migration, and then the green box, which is the migration. This is really important. As Venk can attest, we spent a lot of time building out the foundation in the first three months. Now, there were concerns. Are we moving fast enough? Are applications coming onto the cloud? And basically, as a joint team, we had to push back and say that it's really, really important to set up your foundation first. You need to set up your network, your IAM, your security controls. And what we saw is, because we
spent three dedicated months in the first phase of
the migration, the phase 0 and the phase 1,
the next six months were highly, highly accelerated. So if you are embarking
on a migration or you are just about to kind
of start a migration phase 1, please do build your
foundations properly. Because, I think,
if you do that, I think it's just
easier to bring the applications
and the workloads later in the remaining
half of the year. VENK SUBRAMANIAN: Yeah. This plan at the beginning
looks counterintuitive, right? In fact, when we went and pitched this, what it essentially looked like was that we were training for half the time on an eight-month migration of an extremely high-scale set of services. But there was a plan behind it. Because when the teams
were ready to go, they were ready to
really run with it. We had made sure to account
for all of the major blockers. We're going to cover this
in more detail about what we did within that pre-migration
period that really set us up. But this point is so
important that we're going to highlight
it a few times through this presentation. There is a lot of value in
taking the time to prepare. SOM ROY: And one
more thing, what kind of caught us off guard
a little bit last year was that GDPR was the hot thing. A lot of you-- if you're a
customer, you've been hit with the GDPR requirements. So again, it goes back
to the foundation. If you set the
foundation right, I think getting compliant
on many of these standards is pretty much easy later on. So we also did spend a bunch
of time on the GDPR part because that was literally
the time when everybody had to be GDPR compliant. So given such-- and I see
this across all the customers more so for a very, very tech
savvy digital native customer like Unity-- is they are pushing
our products and our platform to the boundary. Like Venk said, 29
billion downloads-- there are millions of
transaction, even billions, in a minute, per second. So it's a very intense
application stack that they have. So as part of that, there are
many, many things that came up. The TAM team and the PSO team
overall worked very closely with our engineering
and product teams to actually unlock
a bunch of features. GKE, shared VPC was kind of
[INAUDIBLE] at that time. Cloud Composer, GKE
private cluster-- these were major blockers to the migration. And kudos to our product teams who were able to get those products [INAUDIBLE] during the migration, and we were able to move all the workloads over. But there are always newer
things coming down the pipe, as you saw. I think I just saw
an announcement around Traffic Director. So it was really
important, and I think all thanks to the Unity team. We said that when we have
ten feature requests, it's really important to
prioritize those feature requests. Like, what are the things
that will block the migration? What are the things which
are kind of nice to have, that in the next six
months, if they go [INAUDIBLE], that will be fine? And what are the
things that you would need one year down the line? And I think that collaboration
with the Unity team worked really well. They said the private GKE
cluster and the shared VPC were really, really
critical items that needed to be in the migration. VENK SUBRAMANIAN: What this
slide also highlights, really, is the fact that
a lot of you are going to think of your migration
as your burden to carry. But it's not. It's a partnership. There is a team of
people out there that don't just
sell this product, but they're proud
of this product. They want to learn
more about it. They want to know what you
need and what you're missing. They want to come to the table. And what it does for
you is it actually makes it a lot easier
for you to understand the insides of the product too. But for us, we really focused
on understanding not just how the product
worked but also what didn't work for the product. Because the question for us wasn't just whether it worked perfectly, but rather, just like for any business today, what's down the roadmap, what can we use today, and how do we set it up so that we can work around any potential issues. And that doesn't
happen in a silo. So any partnership
for any migration requires that you
actually invest the time, as part of your
pre-migration strategy, to really learn with the
team about what they have and how it works. SOM ROY: So if you look across
the different Unity business units across the stack,
they're using a lot of products right now. Even after the
migration was complete, newer workstreams are getting kicked off. I think 2018-- correct me, Venk, if I'm wrong-- the focus was really, really on lift and shift, moving the workloads over to GCP. 2019 is more around enhancing the data, looking at ML use cases. So I would say last year, when we did the migration, was very, very focused around compute, storage, networking, all of the core foundations of the stack. And 2019, again, as I said, the story only starts after the migration, when the workloads actually come over. So this year, we are actually working with Unity to enhance that and spend our time in the data and ML space. VENK SUBRAMANIAN:
And once again, this was not a generic
decision that we took by looking at the technologies. We looked deeply within
our architecture and evaluated per situation
what makes sense to take over. It became very
quickly clear to us that compute was a central
piece of our technology. So it wasn't enough that
we just lift and shift it. We had to look at
Google-managed technologies and figure out what we could use
there in order for us to scale. On the other hand,
Unity has always been very, very
passionate about data and about machine learning. But those were things that were
very customized within Unity to date. So in that case, it didn't
actually make sense for us to try to bring it over
within the Google setup because a
re-architecture would be too much for us to handle at
the same time as a migration. So we looked at each
piece individually to make a decision about
how we had to move it over. SOM ROY: So why was
this important, Venk? Can you give us an example? VENK SUBRAMANIAN: So I
don't know if any of you know this game "Apex
Legends," but once we finish the migration over, we had
our multiplayer platform now able to support
workloads through GCP. And when "Apex" launched,
it was a huge launch. For those of you
that remember it, they had over one
million players within the first couple of days. Today, they have over
50 million players in and out of the game,
2.3 million simultaneously. And this is around the world. So at peak times, we're actually
using 230,000 compute cores. I mean, I'm pretty sure we've
maxed out the Google quota multiple times
through this process. But it was possible to really
quickly launch this and scale it because the backing and the
foundation of GCP and the way that we architected
it was solid. SOM ROY: Yeah. This is really important. Like Google Cloud with scale
up, and it will scale up to handle such launches. But again, because
we kind of set up the networking and
IM, and the security, collaboration with the Unity
team, that just adding cores was relatively the easier
part of the problem once we set up the foundation. Then again, we keep
harping on that point because, I think
both for me and Venk, that is really
close to our heart that we spent that time
in the first quarter. VENK SUBRAMANIAN:
So let's dig really deep into how we actually
planned this migration. So here's the background. Now, we talked about
the partnership. The partnership was
not just a PR spiel that Som and I came up with
because our companies told us to. The partnership
is important here to mention because
we actually had to take this vision not
just to the executives about the importance
of this partnership, but you actually have to do
that for each one of your teams. You cannot expect your teams to
believe and move independently on something as key as a
migration unless you explain to them the vision, the
goals, and the business value. And if you're not doing
that with your teams, then you're not empowering them
to be part of the business. So when we set up
this partnership, we took that to the teams. And we empowered them by
explaining the problem. And after that, we
got out of the way. We empowered them
with the problem and asked them to come up
with the right solution. We told them all of
the business needs, the deadlines, what happens
if we go over x versus y, how is spend affected
through this year. We laid it out for the teams. And when I say teams, I don't
just mean the directors. I mean every
engineer, every lead, had access to this
information and was encouraged to know about it. When you do that,
now you have teams that are truly bought
in because they're aligned with that vision. And then it becomes a
self-fulfilling prophecy. The second thing we did was
we used this as an opportunity for evaluation and refresh. One of the things that you
hear a lot in the industry is lift and shift or
transformation, right? And it becomes this binary
thing that you're solving for. You're either taking
all of your crap that you have in your
house and moving it over, or you're just scrapping
it entirely and just buying everything new. But that's not
really true, is it? Every situation is unique. You have to look
at every scenario and evaluate using the
guidelines and the goals that you've outlined as
part of the empowerment to understand what makes sense. So for example, in
our case, we had been using our own Kubernetes stack. But we took the time
to evaluate GKE, and we realized
the value that it was going to bring because what
we had was essentially many, many sets of isolated
Kubernetes clusters. But what GKE allowed
us to do was really centralize that
system, and it actually fundamentally changed how we
operate as an organization. Today, instead of having every
team figure out their own GKE, we've centralized
these things back to teams that are
dedicated to it. As in, they are experts in GKE. They centralized the modules. They roll out the clusters. We use different models
to help other teams scale. But when we made that
decision, it actually helped us move faster. We did the same thing
with logging too. We had an in-house
stack that we had always been a little
troubled by because it wasn't super effective. When we looked at Stackdriver,
we evaluated the advantages we get in terms of gained
people back into the workforce because they don't have
to worry about managing an in-house stack versus what
it would cost us to run it. These are real things. These are practical things
that each team should be looking at. It shouldn't just be happening
at the executive level. On the flip side, we have
a highly customized data pipeline. And that was not something that
we wanted to take to chance. We evaluated Pub/Sub
and our evaluation was that we weren't going to move over to it yet-- we were going to evaluate how to move over next year. So for that one, we actually knew, because we'd done the math behind it, that it made more sense for us to take it as is.
that was to our advantage was we were a
distributed architecture. We've kind of been
very microservice mesh from the beginning. Everything that we do is
API and message based. So essentially, we already had
a bunch of independent layers that we could figure out
how to migrate them over. Now, I call this out because
this worked to our advantage. But there are technologies that
you're working on that may be a monolith, that may be
and a distributed architecture. These are important
because you have to do your own
independent evaluation. It's not just enough to look
outside and pick the standard that everybody else is using. And then, global teams, again,
we've talked about this before. This was a particularly
unique problem for us, which is we're trying to
do a migration in a very short timeline with teams
that are across the world and how do we coordinate that. So what we did, and I'm going
to cover this in the next slide in planning, what
we did essentially was really kind of
focus on the results rather than the process. We focused on
empowering these teams so that they could move on what
they needed to independently. But we had to come up with ways
to be able to track the system. And finally-- and this was
the toughest part-- business continuity during the migration. There was no stopping
for the business. We continued to work on
new products and features and rolled them out. We had no downtime while we
migrated a high-scale system. I mean, we have 2
to 3 billion users in and out of this
system every month. We processed 20,000,
30,000 events every second. And all of this
is at low latency. In some cases, the
latency requirements were so strong that we had to
pick data centers on both sides so that we could route
in under 10 milliseconds. And we did all of that without
experiencing any downtime. SOM ROY: And that,
I think, Venk, is another key point
that always comes up. Again, from the PSO side, we
always see the customer asking, are our engineers going to build the next product, or are they going to focus on the migration, because both are equally time consuming. And while, on one side, you
have to make the business go forward, come up with
the new features, you also have to
finish this migration. So I think just the
balance that the Unity team, with the help
on the PSO side, we were able to create
was really, really a-- that's one of the key things
why these things went well and we were able to complete
this within a short time frame. VENK SUBRAMANIAN:
And again, for us, it was a practical
decision because we looked at the logistics of it. If we're building a
feature, what value do we get with
differing sections of the migration
versus what do we get from speeding up the
migration in certain aspects? We always went
back to the basics. We always looked
at the goals when we were trying to solve these
day-to-day tough problems. So migration planning--
we already covered this, but plan twice, execute once. I cannot stress this enough. Now, what did it actually
mean when we did planning? We started with the
very, very basics. We put the people into
Qwiklabs and Coursera. We gave them access
to online training that they used to pick
up at least the basics of a new system
because most of us were actually unfamiliar
with GCP to begin with. Then, we went into the
next stage, which is, now that we understand
the basics, we started to work closely
with the Google TAMs to actually understand certain
technologies very, very deeply. We'd already identified
things like GKE. But it wasn't just enough that
we looked at the documentation. So we would sit
down with the TAMs and walk them through our
use cases across the board and explain how we were planning
to use it and what made sense. Then, we went even
one more level deeper. At this point, we had
each of the teams that owned pieces of the architecture
sit down for days on end with the Google TAMs. And the TAMs would bring
in their own experts based on specific technologies
that we wanted to use. If we needed a Postgres
equivalent, so Cloud SQL, they would bring
those experts over. If we were evaluating Pub/Sub,
then we would make sure we had a Pub/Sub expert there. And we would spend
all day in a room, walking through the stack,
breaking it down over and over, digging into how
they're supposed to talk to each other, how
the firewall rules work, how does access flow-- everything from
where do we set up the projects to which
services run on which cluster. We tried to lay as
much out as possible. The advantage of doing this in
a very collaborative fashion as opposed to having people go out
and write documents and review it over and over is that
it really sped up the time to get to the ideal result. So we actually focused a lot on
just bringing people together. We actually had people
fly to different offices. We had local TAMs, and this
was another big advantage of local TAMs. Then, this actually gave us time
for learning and exploration. So as I mentioned, we covered
a lot of detailed content with the TAMs. Once we had gotten to the
point where we understood what each team had to do, we
went to the next level, which was we actually started
working through cross-team dependencies. So we would have teams
sit down together where known dependencies
existed, once again, in collaboration with Google,
to talk through the details, like, OK, how do we
manage latency here? How do we manage
the data workloads that are going to flow
through at peak periods? This forces you to
do a lot of research. This forces you to gather the
right metrics, to understand your system better. It's not just enough for
you to have the basics. Everybody knows, well, how many
requests their service deals with, but you don't necessarily
know what your error rate is, what your peak loads look like,
what patterns your traffic flows through, how does it
deal with specific types of exceptions. Because when you move from
one stack to the other, you're actually going to see a
change in where the errors are generated because different
clouds handle resiliency differently. So we really focused
on those aspects. Now, while the teams
are off doing that, we were focusing on phase 0. What is phase 0? Compliance, security, SRE,
network, and developer tools. So while the engineering
teams were off figuring out how they could migrate, we had
actually set off the phase 0 teams, which, really,
if you think about it, this is your foundation. This is what creates the right
GCP foundation for you to use. We had these teams off
and running in parallel, setting up the actual stage
and production infrastructure. So the engineering teams
were off in sandboxes having these detailed
conversations and learning more
about the tech. And in the meantime,
security was figuring out how you set up the Google Groups to grant the right access. How do you do editor versus read-only? How do you do billing access? Where do you give super admin? How does SRE get access to systems across different organizational units at your company versus what you do for your own teams within your own organizational unit?
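As a rough illustration of that group-based access model (the group names, project IDs, and role choices below are hypothetical, not Unity's actual setup), the mapping can live in code and generate the corresponding IAM bindings for review:

```python
# Hypothetical sketch: map Google Groups to IAM roles per project and emit
# the equivalent gcloud commands for review. All names here are illustrative,
# not Unity's actual projects, groups, or role choices.
GROUP_ROLE_BINDINGS = {
    "example-prod-project": {
        "group:sre@example.com": ["roles/editor", "roles/monitoring.viewer"],
        "group:engineering@example.com": ["roles/viewer"],
    },
    "example-stage-project": {
        "group:engineering@example.com": ["roles/editor"],
    },
}

def render_bindings(bindings):
    """Turn the group-to-role mapping into gcloud commands."""
    commands = []
    for project, groups in bindings.items():
        for member, roles in groups.items():
            for role in roles:
                commands.append(
                    f"gcloud projects add-iam-policy-binding {project} "
                    f"--member={member} --role={role}"
                )
    return commands

if __name__ == "__main__":
    for cmd in render_bindings(GROUP_ROLE_BINDINGS):
        print(cmd)
```

Keeping the mapping in code like this means an access review is a diff rather than a spreadsheet exercise.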
We also covered compliance. GDPR was a big one that
we were dealing with. There's a lot of
little nitty gritty that you figure out as
you're digging into it. For example, GCP,
by default, does geoblocking of certain
countries that are embargoed. And these are pieces
that you're only going to figure out when your
compliance team gets in there and asks the right questions
because that's what they do. At the same time, network
was heavy, heavy focus. We pushed bandwidth like crazy. So we definitely had the
network team in there, not just looking at how they
were going to use the existing setup, but how we can
improve it and actually build a much stronger
foundation for what we knew we were going to
scale in the next three to four years. Now, workstreams and
dependencies-- so this is a tricky one to
talk through, but I'm going to hand-wave
my way through it and hopefully you
guys are going to see. Because this was actually a
critical part of our success. What we focused on in this
whole pre-migration area was building out workstreams. Think of your workstreams
as layers within your stack. At the very basic, you're
probably going to have network. Then on top of that,
you're probably going to have your
foundational pieces of how do you start up the clusters. Then you're going to have
the actual services that run between them. But the services are
also layered, right? You will have your
backend services that are talking to the data
pipelines, that are actually talking to a middleware, that
are talking to a pure frontend service. So there's all these
layers that you have. Now, if you're able
to neatly draw them out so that they create
these parallel paths, what you've effectively done is, one,
you've identified workstreams. Each of these workstreams
is a cohesive unit that can move independently. Your data pipeline is
different from your services, is different from your frontend. The other thing that you're going to end up doing when you draw this out is you're going to have lines that cross over between workstreams. Those are your dependencies. So now you know exactly who talks to whom, because you've laid this out in charts.
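To make that concrete, here's a minimal sketch (the workstream names and dependencies are hypothetical, not Unity's actual chart) of modeling workstreams as a dependency graph and computing which of them can migrate in parallel in each wave:

```python
# Hypothetical sketch of workstreams as a dependency graph. Each workstream
# lists the workstreams it depends on; a topological pass groups them into
# "waves" that can migrate in parallel. Names are illustrative only.
from graphlib import TopologicalSorter

DEPENDENCIES = {
    "network": [],
    "cluster-foundation": ["network"],
    "data-pipeline": ["cluster-foundation"],
    "backend-services": ["cluster-foundation", "data-pipeline"],
    "middleware": ["backend-services"],
    "frontend": ["middleware"],
}

def migration_waves(deps):
    """Group workstreams into waves; everything in a wave can move in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = list(sorter.get_ready())  # workstreams with no unmet dependencies
        waves.append(sorted(ready))
        sorter.done(*ready)
    return waves

if __name__ == "__main__":
    for i, wave in enumerate(migration_waves(DEPENDENCIES), start=1):
        print(f"Wave {i}: {', '.join(wave)}")
```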
Towards the end, we had giant charts that had a ton of different
workstreams in them. And so we actually ended
up creating workstreams within workstreams
in some cases. But when you're doing a
migration as a company, you have the bandwidth. You have the manpower to be
able to go and break it down. So what we did was
each workstream that we started with
was a business unit. And then that
business unit would go in and break down their
workstream into further layers. But these layers
were very important. One, we always had visibility
into what the layers were and where the dependencies lay. Two, we never ran into
the issue of people not understanding how the
architecture was flowing because this was always a great
reference point for us to be able to go back to. Finally, tracking
the milestones-- so after we have these
workstreams and dependencies laid out, it actually let
us track to milestones. Now, I'm going to say this
to you a little differently because a lot of us, when
we're doing this migration, we're going to focus
on the process, and we're going to
focus on the deadlines. That's natural. That's what we do. But when we have a
global set of teams and our goal is to
empower them, there was no way for us to
be able to pull it off by saying, hey, this is the
date you got to hit it by, these are all the processes
that you have to follow. So we kind of flipped
it on its head. We focused on
milestones, not status. And we focused on
results, not process. So what that means
is every team was able to own a piece of the
workstream independently. All we did was create
standard milestones that we could track
across these workstreams. This is something as
simple as your stage and prod deployments being milestones, or a security
review being a milestone. Like, pick your
milestones, right? But they become standard
across the teams. The second thing
we did was, when we would collaborate, when
we would review updates, we never focused on the status. We focused on the results. We focused on the milestones. So what have you
achieved so far and what are you going after next? How can we help you get there? Where do the dependencies
exist, and how can we help bring the
collaboration together? What it did was it kind
of changed the framing of this whole migration. It became less of a process and
it became more of a shared goal that we were all going after. The other thing that was unique,
at least in our situation, was our teams used
different supporting stacks. So not everybody is on
the same ticketing system, not everybody is using
the same developer tools. So once again, in
that case, how are we going to actually track to this? How are we going to know when
an engineering team has finished x milestone? Because that's how the traditional process goes, right? You put everything into Jira. You create this giant chart. And then everybody's looking
at it every week going, oh, we slipped by two days here. But that was not
what we wanted to do. Because, again, the focus
was not the deadline. The focus was the shared
vision, this goal. So we actually took that
work on ourselves. Believe it or not, we spent a week creating this tracker that essentially took all these disparate pieces of data and rolled it up into a simple workstream-based progress view.
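A very simplified sketch of that kind of roll-up might look like the following; the milestone ladder, team names, and statuses are made up for illustration and are not the actual tracker:

```python
# Hypothetical milestone roll-up: every team reports the last standard
# milestone it has completed, and progress is summarized per workstream.
# Milestone, workstream, and team names are illustrative only.
MILESTONES = ["planned", "stage-deployed", "security-reviewed", "prod-deployed"]

TEAM_STATUS = {
    "data-pipeline": {"ingest-team": "stage-deployed", "warehouse-team": "planned"},
    "backend-services": {"identity-team": "prod-deployed", "matchmaking-team": "security-reviewed"},
}

def workstream_progress(status):
    """Average each team's milestone index into a per-workstream percentage."""
    progress = {}
    for workstream, teams in status.items():
        scores = [(MILESTONES.index(m) + 1) / len(MILESTONES) for m in teams.values()]
        progress[workstream] = round(100 * sum(scores) / len(scores))
    return progress

if __name__ == "__main__":
    for ws, pct in workstream_progress(TEAM_STATUS).items():
        print(f"{ws}: {pct}% of milestones complete")
```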
And we shared that every week across the board, the executives,
they all knew what was going on because it was
a very simple way to view it. In fact, I think-- and
I'll say that out loud-- I believe Google
liked our tracker so much that it may
be showing up now as a template for
other migrations. So if you see one
that looks kind of like a set of
workstreams, that was us. And then, finally, the
cadence for the global teams-- now, as I said, it's very hard
to track these global teams, so we focused on the
goals and the results. We tried really hard to
stay away from the process. So at the end of
the day, this is kind of a very, very
high level of what the workstreams look like. These cohorts are
essentially big milestones that each one of these
people owned delivering. And if you notice, it
was possible for them to be extremely parallel. Big ones we even broke down
into further workstreams. And we had dependencies
tracked between them. But the reason why we could
move on all of this in parallel was because we focused on
making these workstreams as independent as possible. SOM ROY: And I think
the cohort approach was also really about defining what goes in
cohort one and two and three. That was really
also very important because it kind of gave
us a staggered approach. And when we went live,
it wasn't all or nothing. We went live at the first
set, everything worked well, we went live with
the second one. So I think that was
really, really useful. VENK SUBRAMANIAN: Yeah. And we're actually going to
talk about some of our learnings now because we also
stumbled along the way. There were things
that we learned. And they're kind of
the nitty gritty. They're the practicalities of
when you try to do a migration. So we'll start with the don'ts
because we always want to know the don'ts first. OK. First of all, don't migrate
all of your baggage. But also, don't migrate
none of your baggage. You've got to pick. Think about it like moving
into a new apartment. When you move into
a new apartment, you don't just take
all those boxes that have been sitting in your
garage for months or years. But you also don't just
leave them in the house. You want to go through
and clean up as you go. So do that. Be practical about it. And more importantly, don't
try to put a ton of process around it. Your engineering teams
know these things. They know where the
baggage practically can be moved versus not. Just let the engineering
teams be empowered to do something like this. Two, don't build a snowflake. If you have a snowflake,
don't migrate the snowflake. We all have stacks, especially
if the technology we're working on is a few
years old, we all have these snowflakes
sitting around. But the world has changed. So much is out there now. Google aggressively
looks across and tries to find these common patterns. So there are known
best cloud practices. There are known application
patterns to follow. Use them. This is actually a
great time for you to be able to reinforce them. You know that there is
a VM out there somewhere that is open to the internet
and nobody knows about. You know there is
that one guy whose laptop has access to the
production infrastructure. Take the time to clean it up. Take the time to put
better practices in place. Don't migrate the monolith. Now, this is a harder problem
than just putting it up as a point on a slide. So when you're
looking at a monolith, do your best to apply
the strangler pattern. Pull out what you can that's
safe, what's feasible, and try to move that over. It's actually going to
have a twofold result. One, it's going to make your
monolith smaller and easier to migrate. But two, it's also going to
provide a proof of concept for how to be able to
scale to bigger migrations because you're going to pull
out really, really small pieces. You can start with
simple things, like your config service,
or your identity, or just a connector to the database. Separate out your
database behind an API and just move that over.
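To sketch what that strangler-style carve-out can look like in practice (the paths and hostnames below are hypothetical), a thin routing facade sends only the extracted endpoints to the new service while everything else still hits the monolith:

```python
# Hypothetical strangler-facade sketch: requests for paths that have been
# carved out of the monolith are routed to the new service; everything else
# still goes to the monolith. Paths and hosts are illustrative only.
from urllib.parse import urljoin

MIGRATED_PREFIXES = {
    "/config": "https://config.newstack.example.com",
    "/identity": "https://identity.newstack.example.com",
}
MONOLITH = "https://monolith.oldstack.example.com"

def route(path):
    """Return the upstream URL that should serve this request path."""
    for prefix, new_service in MIGRATED_PREFIXES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return urljoin(new_service, path)
    return urljoin(MONOLITH, path)

if __name__ == "__main__":
    print(route("/config/flags"))     # served by the new config service
    print(route("/identity/login"))   # served by the new identity service
    print(route("/inventory/items"))  # still served by the monolith
```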
Now, don't retain single points of failure. This one was especially
important for us. For example, I told you we used
network bandwidth like crazy. So when we were
moving over to Google, we really buttoned that up. We created multiple points
of failures and redundancies in there that has actually
allowed us, in recent times, to deal with network outages
that we've seen that were on the third party side. But we've been able
to deal with that because we actually took the
time to evaluate our bandwidth, understand it, and then put
the right redundancy in place. We did the same
thing for Kubernetes. Multi-zone's always
a great idea. Multi-region is even better. But multi-zones are
always a great idea, so we took the time
to do that too. And finally, don't
re-architect everything. Especially not your plumbing. Don't try to move over to that
brand new monitoring system while you're trying
to do a migration. Move parts over that need
to be migrated and deal with everything as a
separate initiative. Now in some cases, they may be
tied together, in which case, use your workstreams
and dependencies to be able to separate them out. If you treat your
developer tools as a workstream that later workstreams have a dependency on, you're actually going to create a phased approach to how you do things. SOM ROY: So, like
Venk talked about the very specific things on the technology side, I'm going to talk about the people and process side as well because, until all three come together, you won't have a
successful migration. So the first key don't on
the people process side is please don't
migrate in a vacuum. Think about the
downstream dependencies. Think about which
other teams downstream are going to be impacted
by this migration. And please involve all these
cross-functional stakeholders when you do so. If you don't do that, it's like,
even though your component is successfully migrated, the
shared vision will not be met. So no migration in a vacuum. I think, don't
focus on deadlines. I think Venk talked about it. I'm not going to go
into the details. I think it should
be shared goals. It should be talking
about shared focus areas. And everybody should feel
as part of the migration and just saying, I have
to do it by October 30, that's not going to
cut it because you have to take everybody along. Another thing which is
important is the lift and shift versus transformation. I think the point we are trying to make is there is no one correct answer. Some of your components will be lift and shift. It makes sense to just take what you're running and just move it to GCP. Versus for some, you should take the opportunity-- you are doing a migration anyway, why don't you transform? So each workstream should be treated very differently. Each component should be treated and evaluated differently. Across all the
workstreams, I think, there is no mandate
that everything needs to be either lift
and shift or transform. I think Unity's migration was a very good example of a mix of both these approaches. We had a lift and shift where it made sense, and we actually looked at a transformation approach where that made sense. And then, just don't assume
that planning leads to success. While the planning
is really important, following up
and executing on it is really, really important. And iterate on the plan. In every migration
that we are seeing, there are unknown blockers
that will crop up. There will be churn in terms
of folks joining the team. So you have to keep
iterating on the plan because if you don't
iterate on the plan and you just stick
to something that you planned six months back, that's
not setting yourself up for success. So that is a really,
really key don't in terms of from a process perspective. VENK SUBRAMANIAN: So now, what
to do from the technology side. We've covered phase 0 in detail. It is very important to
establish a solid GCP foundation. Because here's the thing
to remember-- the migration is not the end of the journey. It's the beginning of it. You are tying yourself to a
technology for a long period of time, hopefully. Hopefully migration is not
what you do as a company. And when you're doing that,
you want the foundation to be extremely solid. The more debt you
accumulate, the harder it will be going forward. Now, some kinds of
debt are isolated, and you're going to
actually make calls on them. But foundational
debt is different. If your access controls are not
in place when five people are in the system, it is going
to be a lot harder to do it when 300 or 3,000 people are
going to be in the system. If your network
firewalls aren't in place to segregate the
environments when there is no services in
there, it's not going to happen when there's
thousands of services in there. And what also changes
is the manpower needed to do it at a later
time exponentially goes up too. So it is very, very
important that you establish this foundation. Automate everything--
this is the key part. So we actually took this
time to automate everything that we found that was manual. And we actually had it in
three different styles. So everything was
infrastructure as code. That was kind of the
principle that we aligned on that we were going to do. But also, when we were
looking at infrastructure and we realized all
infrastructure was being written as code, we
centralized a lot of the pieces that we knew were in use across
multiple parts of the company and we offered them
as infrastructure as a service or
infrastructure as a framework. And what infrastructure
as a framework-- I'm assuming all of
infrastructure as a service. But what infrastructure
as a framework does is it actually take some of
the burden of maintaining the service and passes that on
to the user of that service. And this works really
well for internal teams. So as a good example, we had
many, many uses of Mongo. And they were all isolated, some
of them were manually set up, some of them were set up in
different kinds of automation, and they all had different
configuration, different use cases. But we spent a time to
have a single team actually pick up all of that, understand
from each of the users what they were trying
to do, take it all back, and rewrite it as a
centralized module. Now, this module is a lot better
maintained because we actually use automation to not
just deploy and run it but we also use
automation to test it. We have a team that's dedicated
to continue to improve it, which means that you don't
fall behind on versions. You don't fall behind
on new functionality. You don't have bad bugs
sitting around in your system. But also, it didn't make
sense for a team like this to try to run every
database in the company. You're just creating just an
unhealthy central dependency. So we offered it
as a framework out. We maintain the module, we
improve it, you go use it. You own the cluster
that it runs on.
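As an extremely simplified sketch of that infrastructure-as-a-framework idea (the module name, parameters, and defaults here are hypothetical, not the actual Unity module), a central team maintains a module with validated defaults, and each product team instantiates it in its own project and owns the resulting cluster:

```python
# Hypothetical infrastructure-as-a-framework sketch: a central team maintains
# this module (defaults, validation, tests); product teams instantiate it in
# their own projects and own the deployed cluster. All values are illustrative.
from dataclasses import dataclass, field

@dataclass
class MongoClusterModule:
    """Centrally maintained module; the consuming team owns what it deploys."""
    project: str                 # team-owned GCP project
    region: str = "europe-north1"
    node_count: int = 3          # replica-set size enforced by the module
    disk_gb: int = 100
    labels: dict = field(default_factory=dict)

    def render(self):
        """Render a deployable spec (e.g. fed into your IaC pipeline)."""
        if self.node_count < 3:
            raise ValueError("module enforces at least a 3-node replica set")
        return {
            "project": self.project,
            "region": self.region,
            "nodes": self.node_count,
            "disk_gb": self.disk_gb,
            "labels": {"managed-by": "infra-framework", **self.labels},
        }

if __name__ == "__main__":
    # A product team instantiates the shared module with its own parameters.
    spec = MongoClusterModule(project="example-ads-prod", labels={"team": "ads"})
    print(spec.render())
```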
It also passes the ball a little bit, making sure everybody
the concept of, you build it, you run it, you own it. In fact, at Unity, we have this
joke that says, you build it, you run it, you pay for it. Because our engineering
teams actually know what it takes to
actually-- in terms of spend for their services to run. They use that to continuously
optimize their services. And this is the
kind of empowerment that you want to make sure
that you offer to your teams. Minimize technical
debt-- now, we've talked about technical
debt quite a bit. Specifically, what
you want to look at is just old debt that's
sitting around that's going to severely hamper you. In one of our cases, we had
an old, highly customized load balancer that had
been written in house that was sitting on a system. And when we looked
at GKE, we knew that we could re-architect
the system a little bit and actually get rid of
this custom load balancer. So we took the time to do it. So minimize technical
debt where feasible, especially fundamental
technical debt. Don't take it with you. Iterate and learn-- so the way
the workstreams were set up, they were actually set up in
increasing levels of complexity and collaboration. So the first systems to go
out were extremely simple. Most of them didn't even
have a backend database. It was just something like GCS. But after a while, we started to get into bigger services, but ones that were
completely independent. Then you start introducing
a dependency layer. So now you have
multiple systems trying to go live at the same time. So we did it slowly with the
goal of accelerating closely towards the end
because you start to gain momentum once confidence
builds from the deployment. So our migration actually
looked like almost no work done within the first three months
because we were focused on the learning and the preparation, then very, very minimal services
migrated over the next two months because we were
getting our confidence and understanding how
everything worked in stage. And then, all of a sudden,
you start seeing your high-scale services just
ramping up really quickly, and all of that happened
within the last two, maybe three months. And finally, train,
plan, and prepare-- the reason why I'm
calling this out is, separate the
artificial pressure of learning from execution. As an engineer, you know this. When you are told to
practically learn on the go or you're being asked to deliver
something that's brand new, it creates this
artificial pressure of, I need to do two things
at the same time. So we just set it up in a way
that it took that pressure off, the teams were able to focus
on learning and training first, and then when they were ready to
go, they were able to execute. SOM ROY: And finally, from
the people and process side, what are the things
you should do? I think this mirrors what we covered in the don'ts: don't start in a vacuum. So you should align on the overall vision and goals for the GCP migration. To avoid gaps in understanding, it has to be top-down as well as bottom-up. Like, if the engineer who is actually working on the migration doesn't align with what the vision of the company is, why we picked GCP, and where the migration is going, that leads to issues and conflicts unnecessarily, and it will delay your migration. So alignment across top-down and bottom-up is really,
really important. Establish a migration PMO. This may sound like a really,
really old school term, PMO. But project
management office does help when you are
doing a large migration with multiple workstreams with
so many different stakeholders involved. And absolutely identify
the right stakeholders, both from the Google side as
well as from the Unity side. That's really, really important. Identify clear owners
because responsibility and accountability matter-- again, a very old school concept. Who's responsible? Who's accountable for the
success of that workstream? That is really
important to identify. The fourth one is, I
think, the one that is really close to my heart, which is
set realistic migration goals. Aspiration versus reality,
like we keep talking about: yes, let's use this migration to do something dramatic, your entire stack will change, something will change drastically-- versus, let's be realistic about what can be achieved in the six months, seven months, or one year of migration that you're going to do. So this alignment is really, really important. And you as a company need to take a call on what the realistic goals are. Establish and track milestones-- I think Venk talked about this-- and
I'm not going into the details again because we don't have much time. But have proper
tracking and make sure that you're looking at it on
a weekly, bi-weekly, monthly cadence. Prioritize for risk
issues and features, and this goes back
to the slide where we talked about all the feature
requests that Unity had. Like, please identify what
is P0 and P1s for you. What is P2 and P3, and can it
wait for six months, one year? Please do try to
actively identify. It always works better both
for Google and the customer if we have a prioritized list. And on the last point,
Venk, do you want to-- I know this is-- VENK SUBRAMANIAN: Yeah. This is the one I really
want to hammer home because a lot of you
are going to treat a migration as a project
that needs to be done. But it's a unique event,
and it's an important event. It actually sets the
stage for your company. And it's not going to
happen without your people. It's not going to happen
without your engineers. So celebrate success. We talk about people, process,
technology quite a bit. But if you think of it in
terms of people, planning, and empowerment, you're going
to get the right technology as a byproduct of that. If you focus on letting your
smart engineers be empowered and then get out of the
way and focus instead on how you can unblock them
and how you can help them get to the right kind of
tracking and milestones, they're going to get the
right results for you. Deadlines are never the focus. The goals and the vision are. And that's it for us. [MUSIC PLAYING]