A Year in GCP Networking (Cloud Next '18)

Captions
[MUSIC PLAYING] SRINATH PADMANABHAN: To get started, I just want to tell you something that I truly feel from the heart. I really love these kinds of events, and I want to thank each and every one of you for being here, because they are a really effective way for us to talk to customers. Every time we come to a Google Next-- like when we came here last year-- we have this opportunity to listen to your feedback about what's working well, what's not, what can be changed, what we can do better, and what we should continue to focus on. We take all of that feedback back to our day jobs-- after a little bit of a break in between-- and the lists of things we heard from you become a roadmap for the next year. We focus on the areas where you say we're doing well and the areas where you say we need to improve, and we try to build things that really solve those problems. That's why I'm excited to come back here today and talk about what we learned last year and what we've been able to deliver to simplify your lives and build on our partnership.

So with that, what were some of the key things we heard from you last year? We heard many things, but if I were to distill them down onto one slide, these are some of the key pain points. One thing we heard a lot was that you wanted us to meet you where you were-- to make it easier for you to get onto the cloud or to reach the cloud. As you might have noticed in today's keynote, a lot of what we've done over the past year has been in that very area: bringing the cloud to you rather than having you move things over. The second thing we heard was, help simplify my transition-- whether you're transitioning a whole bunch of workloads or just small pieces of what you're running somewhere into Google Cloud. That has been one of our key focuses as well, and I'll go into some interesting ways we've been able to do that.

We also got two really nice pieces of positive feedback. You said you really like how we price things and how flexible and cost-effective our offerings are. And we heard that we do really well on security, providing a footprint and an ecosystem that lets you build your security story on top of it. So we've doubled down on both of these areas to make them even more useful.

With that, I'm going to start with something that we at Google Cloud feel is a big differentiator for us, and that is our global infrastructure. Google has been building this infrastructure for many, many years, from even before we were Google Cloud.
We've been building submarine cables and data centers, and we've gotten to the point where, this year, we're proud to announce that we have 17 regions up and 125 points of presence around the globe. That makes the transition to the cloud much easier, because each of those points of presence is one additional location that's closer to your customers-- the customers consuming your applications that run in Google Cloud. Tying all of this together is an extensive network of submarine and terrestrial cables that connects the points of presence, the data centers, and everything else we provide into one huge infrastructure investment that delivers value to you.

Over the years, these investments in the cable world have gotten us to the point where we have an extensive team completely focused on submarine cables. And I'm really proud to talk about the next slide, which is something very unique we did this year. In January, we became the first non-telco company to announce a completely private sub-sea cable connecting two continents: the Curie cable, named after Marie Curie, which connects Los Angeles to Chile. And last week we announced the Dunant cable, named after Henri Dunant, the founder of the Red Cross, which connects the east coast of the US to Europe. This is the first trans-Atlantic cable owned completely privately by a non-telco entity. So this just goes to show some of the things we're doing to bring more value from our infrastructure to you.

Looking at the numbers, we've spent over $30 billion over the past three years building out Google's infrastructure. We've announced twenty cloud regions. We have the 125 points of presence I talked about, and we'll get to the Dedicated Interconnect locations shortly. But the really interesting statistic here is the 100,000 miles of fiber optic cable we've deployed across the globe-- to put that in context, roughly 40 percent of the distance from here to the moon. It's a lot of investment, and what I hope to do is tell you a little bit about why it adds value to you and your deployments running on Google Cloud.

All of this infrastructure is the underlay that lets us build our SDN ecosystem on top of it, using things you've probably heard about, like Andromeda and Espresso. If you want to learn more, I recommend attending one of two sessions later in the conference that dive much deeper into this area: how it improves the performance of different workloads running in the cloud, and how it makes the networking so efficient.
If I were to point out three key things here: first, this lets us provide a global network and the construct of the global VPC, which I'll talk about in a minute, and which takes away a lot of the sprawl and toil of managing a vast cloud deployment. It gives you global reach by getting you to a point very close to where your customers are, and it gives you benefits like live migration. With live migration, we're able to keep your VMs up when we need to do something like a firmware update on the underlying infrastructure. These things make it very easy to deploy on Google Cloud and keep your workloads running. And finally, because of the way this infrastructure is laid out with an SDN architecture built on top of it, everything scales seamlessly: every piece of our networking offering is built in a distributed manner, with no chokepoints. We've continued to innovate here with Andromeda 2.1, announced late last year, which reduced our intra-zone latency by 40% compared to Andromeda 2.0-- and overall, latency is now nearly 8x lower than when we originally introduced Andromeda.

So we've talked about the infrastructure. Now let me go a little deeper into how it helps you build your cloud deployment, looking at some interesting things we've done over the past year and how they solve the problems we talked about at the beginning. To begin with, I want to talk about two pieces: our Virtual Private Cloud and our connectivity options.

Google Cloud's global VPC is very unique in the sense that it provides connectivity by default across all of the workloads you have running in Google Cloud, no matter which region they're in. If I were to describe it in one sentence: you may have VMs running in each and every one of our 17 regions, but you have one VPC, which means one set of policies to manage and one security configuration to do once. It makes it much easier to manage a truly global deployment. And the only way we're able to make something like this work is the infrastructure I just described, which gives you the performance you need to pull something like this off.
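To make the "one VPC, one set of policies" idea concrete, here is a minimal sketch-- not anything shown in the talk, and with made-up project, network, and subnet names-- of creating a single custom-mode VPC with subnets in two regions via the Compute Engine API's Python client:

```python
import time
from googleapiclient import discovery

PROJECT = "my-example-project"  # hypothetical project ID
compute = discovery.build("compute", "v1")  # auth via Application Default Credentials

def wait_global(op):
    """Poll a global operation until it finishes."""
    while op["status"] != "DONE":
        time.sleep(2)
        op = compute.globalOperations().get(project=PROJECT,
                                            operation=op["name"]).execute()
    return op

# One custom-mode network; every policy and firewall rule hangs off this VPC.
wait_global(compute.networks().insert(project=PROJECT, body={
    "name": "global-vpc",
    "autoCreateSubnetworks": False,  # custom mode: we define the subnets
}).execute())

# Regional subnets in the same network: cross-region traffic rides Google's
# backbone with no peering or VPNs to manage.
for region, cidr in {"us-west2": "10.10.0.0/20",
                     "europe-west1": "10.20.0.0/20"}.items():
    compute.subnetworks().insert(project=PROJECT, region=region, body={
        "name": "app-" + region,
        "network": "projects/%s/global/networks/global-vpc" % PROJECT,
        "ipCidrRange": cidr,
    }).execute()
```

Firewall rules and routes then apply once, at the network level, rather than per region.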
Now let's look at the different ways you can connect to the global VPC from your own network. We've had various offerings, and this year we talked about two key ones, the Google Cloud Interconnect offerings. Google Cloud Interconnect gives you a convenient way to connect from wherever you are-- whether that's your own data center or a co-location facility-- to Google Cloud. The way it works is you connect to one of our many Dedicated Interconnect points of presence, and once you do that, you're on the Google Cloud network for the rest of the way.

Before diving deeper into Interconnect, I want to quickly touch on the fact that we've also been simplifying all of your connectivity options, including VPN. We've invested a lot of effort with our partners to deliver a whole slew of integration guides, making it extremely easy to spin up any kind of VPN connectivity to Google Cloud.

So let me tell you what we think makes Dedicated Interconnect unique. Imagine you have a data center here in San Francisco and workloads running in different regions-- say one in Los Angeles, one in Singapore, and one somewhere in Europe. If you were trying to set up dedicated connectivity from your data center to each of those regions, you would usually have to deploy one set of connections to each and every region. With Dedicated Interconnect, you connect from your data center to the nearest region-- in this case, Los Angeles-- and from there you ride Google's backbone to whichever other region you need. This greatly simplifies the sprawl and toil that come with building that kind of deployment out of multiple connections to multiple regions.

This map shows all of our Dedicated Interconnect locations. No matter where you are, there's a location you can get to, and we bring you onto the Google Cloud network right from that point. We also work with various partners, and each of them helps provide a more simplified way of connecting to Google Cloud: if you connect through one of these partners, you get the flexibility to choose the bandwidth you need, and they'll help you connect from wherever you are to Google Cloud.
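As an illustration of that partner model, here is a hedged sketch-- hypothetical project, region, and router names throughout-- of what requesting a Partner Interconnect attachment looks like through the Compute Engine API. The pairing key returned is what you hand to the partner so they can complete their side at the bandwidth you chose:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# A Cloud Router named "edge-router" is assumed to already exist in the region.
attachment = compute.interconnectAttachments().insert(
    project="my-example-project",
    region="us-west2",
    body={
        "name": "dc-to-gcp-partner",
        "type": "PARTNER",  # provisioned through a supported service provider
        "router": "projects/my-example-project/regions/us-west2/routers/edge-router",
        "edgeAvailabilityDomain": "AVAILABILITY_DOMAIN_1",
        "adminEnabled": False,  # flip on once the partner finishes provisioning
    }).execute()

# The created attachment carries a pairingKey field for the partner portal.
```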
So we've talked about simplifying your connectivity options from on-premises to the cloud. Now let's look at what we've been doing to help you deliver applications better. When we talk about application delivery, these are the offerings we usually discuss, and I'm going to spend a few minutes on what we've done in each of them over the past year.

First, Google Cloud's Global Load Balancing. We built it from all the lessons we learned over the years building and delivering applications like Gmail, Google Search, and YouTube-- and you'll notice that's true for everything in the application delivery portfolio. We got a lot of great feedback on how the Google Cloud Load Balancer makes it extremely simple to manage a scalable load balancing deployment, and over the last year we've tried to take that to the next step.

For those of you who aren't familiar with it, our load balancer supports global anycast IP load balancing. Rather than using your own DNS service to balance across multiple IP addresses depending on where traffic is coming from, you can put your application behind a single global anycast IP. A user reaching your application from San Francisco uses the same IP address as a customer connecting from Singapore, but each is served by the data center closest to them. First and foremost, this greatly simplifies what you need to configure and manage on the DNS front.

The really cool thing we've done in this area in the past year is add support for a number of protocols, and one I specifically want to call out is QUIC. QUIC is a protocol that helps a lot with connection setup time and the latency your users see getting to your application. For instance, we see about an 8% decrease in page load times globally, and in high-latency environments the gain goes up much higher-- about 13%. We have a great session about load balancing later in the conference that goes much deeper into this and many other parts of the offering.

Google Cloud DNS is built to work with our load balancer, so it integrates seamlessly with the global anycast offering. It has very simple, scalable record management, and it supports DNSSEC, so your clients can verify the integrity of your records when they connect to your setup.

Now let's look at Google Cloud's Content Delivery Network. Like I mentioned, our content delivery system is also built on what we've learned over the years from YouTube, Google Search, and our other offerings, bringing you a very scalable, global approach to delivering your content. One thing we've done over the past year is announce support for very large objects, which makes it great for media delivery and gaming. We have a session about CDN as well if you're interested in learning more about this offering.
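On the QUIC support mentioned above: newer versions of the Compute Engine API expose a per-proxy QUIC setting on the global HTTPS load balancer. A minimal sketch, with a hypothetical project and proxy name:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# quicOverride accepts "ENABLE", "DISABLE", or "NONE" (let Google decide).
compute.targetHttpsProxies().setQuicOverride(
    project="my-example-project",
    targetHttpsProxy="web-frontend-proxy",
    body={"quicOverride": "ENABLE"},
).execute()
```

Clients that speak QUIC then negotiate it automatically; everyone else falls back to HTTPS over TCP.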
We announced Network Service Tiers at Next last year, and they're now in beta. The Premium Tier is what the global deployments we've been discussing ride on: your traffic travels Google's network all the way from the source to a point of presence very close to your end user, which gives you much better latency and performance than you'd get over the public internet. But last year we also heard that for certain workloads this extra performance isn't what you're looking for, and you'd prefer a more cost-effective option. That's why we introduced the Standard Tier, which is currently in beta. When customers choose between the two, they really like to use the premium network for live applications-- anything their end users access-- whereas in a lot of cases they use the standard network for things like test deployments, where you're building something and making changes in a development environment. Those environments usually use the standard network, as opposed to the premium network for production.

So we've talked about connecting to the cloud and delivering your applications to your customers. Next, let's look at what we've been doing to secure your deployments. We got a lot of great feedback about making more security offerings available, and in this area we focused on a few specific things to make them very user-friendly.

The first is VPC Service Controls. VPC Service Controls greatly mitigates exfiltration risk by giving you context-aware controls: you can enforce policy for accessing data based not just on the identity of the user but also on the context. This gives you much more visibility into your data-- the lifeblood of your business-- and really helps you control where it flows.

Network monitoring is one area where we got a lot of customer feedback. When customers looked at the monitoring offerings available in the cloud, one big concern was that the level of visibility didn't map one to one to what they were used to on-prem. So we went back and built VPC Flow Logs, which is now generally available. With VPC Flow Logs you get updates at five-second intervals, which is almost as responsive as what you'd see in your own data center. It gives you the same level of visibility you'd have on-prem, and it integrates with your existing deployments because we partner with many security vendors to provide visibility into these logs: you can export them to the partner ecosystem you're already using and have a single pane of glass for both your data center traffic and your cloud traffic.

What makes this even more flexible is the very rich set of parameters we provide for filtering the flow logs. We annotate entries with things like the geolocation of the client accessing the service, the region where your VM is running, the subnet, and various other fields, so you can pull out exactly the logs you're looking for and keep a close eye on the security posture of your deployment.
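To make that concrete, here is a small, self-contained sketch of working with exported flow-log entries. The record layout is abbreviated from the VPC Flow Logs schema (a connection 5-tuple plus byte counts inside jsonPayload), and the sample entry is illustrative rather than captured data:

```python
from collections import Counter

def top_talkers(entries, n=5):
    """Return the n (src_ip, dest_ip) pairs moving the most bytes."""
    totals = Counter()
    for entry in entries:
        conn = entry["jsonPayload"]["connection"]
        totals[(conn["src_ip"], conn["dest_ip"])] += int(
            entry["jsonPayload"].get("bytes_sent", 0))
    return totals.most_common(n)

sample = [{
    "jsonPayload": {
        "connection": {"src_ip": "10.10.0.7", "src_port": 44321,
                       "dest_ip": "10.20.0.9", "dest_port": 443,
                       "protocol": 6},  # 6 = TCP
        "bytes_sent": "8923",
    },
}]
print(top_talkers(sample))
```

In practice the entries would come from a log export (Pub/Sub, GCS, or BigQuery) rather than an inline list.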
The next thing we're going to talk about is Cloud Armor. Cloud Armor is the denial-of-service defense system we announced in March this year, and in fact we have a session later today where Nick will also be presenting on how Target has been using it. There are a few key things I want to point out. Cloud Armor is built to integrate very closely with our load balancer, so we're able to absorb the brunt of a layer-3 DDoS attack at the load balancer. Beyond that, it has defenses built to protect you against application-layer attacks such as SQL injection and cross-site scripting. Cloud Armor also provides a rules language that lets you go beyond those built-in protections and tailor your security policy to exactly what your own deployment does. And the nice thing about how Cloud Armor is built is that this defense happens at the perimeter, which makes it much easier to manage the scale of your setup.
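For a sense of what that looks like in practice, here is a hedged sketch-- hypothetical names throughout-- of creating a Cloud Armor security policy with one deny rule and attaching it to a backend service through the Compute Engine API:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
PROJECT = "my-example-project"

compute.securityPolicies().insert(project=PROJECT, body={
    "name": "edge-policy",
    "rules": [
        {   # block a known-bad source range at the edge
            "priority": 1000,
            "action": "deny(403)",
            "match": {"versionedExpr": "SRC_IPS_V1",
                      "config": {"srcIpRanges": ["198.51.100.0/24"]}},
        },
        {   # lowest-priority default rule: allow everything else
            "priority": 2147483647,
            "action": "allow",
            "match": {"versionedExpr": "SRC_IPS_V1",
                      "config": {"srcIpRanges": ["*"]}},
        },
    ],
}).execute()

# Enforcement happens at the load balancer, so backends never see
# traffic the policy rejects.
compute.backendServices().setSecurityPolicy(
    project=PROJECT, backendService="web-backend",
    body={"securityPolicy":
          "projects/%s/global/securityPolicies/edge-policy" % PROJECT}).execute()
```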
So we've covered our connectivity offerings, the security we provide around your networking deployments, and how you get your applications delivered to your customers. In this next section, I'm going to talk about running the workload of your choice.

We've had more and more customers adopt Kubernetes and GKE, really making container technologies their own, so we've been focusing on making Google Cloud the cloud of choice for running your microservices. We're working to provide native support in Kubernetes Engine for every networking service and offering we have. For instance, this year Kubernetes Engine supports Global Load Balancing; it supports Cloud Armor, our denial-of-service product; and it supports IAP and our CDN. We're committed to making every one of our networking offerings available with Kubernetes Engine as well. We've also taken the next step in securing Kubernetes workloads, so you can look at security the same way whether you're running containers or plain compute: we announced support for shared VPC, private clusters, network policies, distributed firewalls, and VPC flow logging-- all supported natively with Kubernetes now-- which makes it extremely easy to manage your security policies consistently across Compute Engine and Kubernetes Engine deployments.

If you're looking to connect your different services, I'm sure you already heard some of this in this morning's keynote, and over the next few days you're going to hear a lot more about Istio and Kubernetes and how they make services easy to use, manage, and secure. We support these things in our networking offerings too. For instance, our networking team also has gRPC in its portfolio, which lets you connect your services with RPC calls without putting the brunt of the effort of building and deploying the networking pieces on your developers. You can give your developers ecosystems like Istio and gRPC so they can focus on building applications without worrying much about the networking, while securing the communication between services is taken care of by your security admins using those same tools.
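As a flavor of what gRPC takes off a developer's plate, here is a tiny runnable sketch of the pattern. It uses gRPC's generic byte-level handlers so it runs without generated protobuf stubs (a real service would use .proto-generated classes), and the service and method names are made up:

```python
from concurrent import futures
import grpc

def say_hello(request: bytes, context) -> bytes:
    return b"hello " + request

identity = lambda b: b  # stand-in for protobuf (de)serializers

handler = grpc.method_handlers_generic_handler("demo.Greeter", {
    "SayHello": grpc.unary_unary_rpc_method_handler(
        say_hello, request_deserializer=identity, response_serializer=identity),
})

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
server.add_generic_rpc_handlers((handler,))
server.add_insecure_port("127.0.0.1:50051")
server.start()

# Client side: one call over a channel. In a mesh, Istio (or TLS on the
# channel) would secure this traffic without the application code changing.
channel = grpc.insecure_channel("127.0.0.1:50051")
say = channel.unary_unary("/demo.Greeter/SayHello",
                          request_serializer=identity,
                          response_deserializer=identity)
print(say(b"world"))  # -> b"hello world"
server.stop(0)
```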
We also have a lot of partners in the security space, and we're adding more all the time-- we keep talking to different partners to see how we can integrate and make a better offering available to you. This slide is a quick snapshot of our security partners, and I'm sure you recognize a lot of the names; we work with them very closely to make Google Cloud a very partner-friendly offering when it comes to network security.

With that, I want to recap by saying thank you so much for all the feedback you gave us last year. It has really helped us empower your success: meeting you where you are and providing secure, flexible, yet simple and reliable offerings that work for any workload. I'm really looking forward to talking with all of you and hearing what you have in mind for the next year, and what you'd like to hear about when we come back for Next 2019. Now I'm going to call Nick on stage to talk about what a lot of these offerings have meant from a customer perspective.

NICK JACQUES: Thank you. All right. Hi, everyone. My name is Nick Jacques. I'm a lead engineer at Target on our cloud platform team, and I'm really excited to be here today and chat about some of the things we're doing in GCP. Just a little about who I am and what Target is: Target is a rather popular retailer here in the US. We have 1,800 stores, several distribution centers, a few headquarters locations, and one very big virtual store, Target.com. I've been on our cloud platform team for about two and a half to three years, and before that I was on a couple of our infrastructure teams dealing with enterprise storage and virtualization, private cloud build-out, all kinds of good stuff like that. The reason we thought it would be appropriate to share this with you is that Target started its journey in GCP in early 2017.

So it's been a little over a year since we really migrated to GCP in earnest, which lines up nicely with a discussion of what's new over the past year in GCP. To give you a little timetable: in very early 2017 we started migrating our workloads from another cloud provider into GCP, and we completed that around late July or mid-August-- right in time for peak season. We had a very successful peak season with no issues, and everyone was really happy about that. Early this year, we had discussions about where we were going to migrate our commerce workloads. We split our workloads into two categories. Non-commerce, for us, is anything on Target.com or in our apps that doesn't involve purchasing products: searching for products, swiping through product images, things like that. Commerce is when you actually add items to your cart, check out, and pay for those items. Commerce workloads tend to be more difficult to move because there's quite a bit of regulatory compliance involved. We started migrating them in earnest around March or April of this year, and we're almost 100% complete-- so almost everything that powers Target.com is in GCP right now.

Just to give you the gamut of what's in there: a lot of the services I list under Target.com are shared, and our mobile apps use them to display content as well. We have everything from our adaptive frontend-- responsive design for whatever you're using: desktop, laptop, tablet, mobile-- to our backends. The backends run the gamut from fairly monolithic, standalone apps to very complicated applications with upwards of three or four dozen microservices. We also operate a data persistence layer, provide replication services to and from our data center, and run a platform for logging and metrics. Those last three items-- data persistence, replication, and logging and metrics-- are part of what we call platform services, which my team offers to our application teams, our tenants in GCP. It's a suite of services that any application team can use, providing common functionality across any cloud we deploy to. It abstracts a lot of the difficulty away from application teams-- figuring out service discovery, where to store data, how to replicate it-- and lets them focus on building and deploying their apps rather than worrying about the underlying infrastructure. And, as I mentioned, we also operate some of our commerce systems in GCP. Those are PCI DSS compliant, which obviously means a great deal of regulatory compliance is attached to them, and those environments are highly segmented.
For mobile apps-- in addition to everything we just talked about for Target.com-- we have some unique offerings. We have Cartwheel, the part of our mobile app that lets you discover coupons and discounts available in the store. There's Wallet, which is actually really great: it lets you take all the coupons and discounts you found with Cartwheel and pay with your REDcard in a single scan of a barcode. It's really handy-- I use it all the time. We also provide other interesting services like in-store mapping: we can tell you where a product you're searching for is in a store, show you where you are, and help you find your way to it. There are a few other mobile apps as well. We also host a variety of API endpoints-- our developer documentation at developer.target.com runs on GCP, as do API backends for everything from item availability to creating shipping labels and barcodes. And we host a variety of static content-- images, JavaScript files, all kinds of stuff like that-- in GCS buckets, and we use Cloud DNS.

So I sat for a while trying to think about how best to communicate all the networking and infrastructure features we use in GCP. Ultimately I decided to turn to a tool a lot of people in large companies use when making decisions: a particular geometric shape that tends to have magical properties. So I looked at this-- and I definitely didn't make this on a plane, this is very official-- and what I saw was that word cloud, for the eighth year, is dominating the information density representation. So with that in mind, here are some of the features we use for networking and infrastructure at Target. There are quite a few up here; I won't go through them all now, because we'll naturally hit them as we work through these slides.

Briefly, I want to talk about the environments we have. As I mentioned, our non-commerce environment largely handles browsing and discovering products and services on Target.com. We have several environments: a dev environment, a non-prod environment for staging and performance testing, and a production environment. You can think of each as a single monolithic project. Inside each project there is a single VPC containing many regional subnets, and we've done something kind of interesting there. One of the challenges we've run into with prior cloud providers, and just due to the nature of Target itself, is IP address exhaustion. In a cloud in particular, you want to be able to scale without exhausting a subnet and then wondering what to do. To get around that-- because these tenants are quite large and scale to many thousands of instances-- we actually divorced the application subnets from our data center routability.
What that means is that all the applications in our non-commerce environment are basically air-gapped from the data center, which lets us create several very large subnets; we don't have to worry about partitioning off specific applications into specific subnets. It's just a very large, general-purpose hosting environment. The way applications talk to services in our data center, or replicate data across, is through that platform services suite of components I mentioned earlier. Those services are data-center routable, so they replicate the data across, and the application teams consume it in GCP.

Our commerce environment is actually quite different. Yes, it's still hosted in GCP, and yes, our platform services are still there, but many things beyond that change. For compliance purposes, we wanted a highly segmented environment, so we created many projects and used shared VPC and VPC peering to partition off each individual application that runs there. In this model, all of our instances are data-center routable. We made that choice because this environment is at least an order of magnitude smaller than our non-commerce environment, and many of the applications have direct requirements for synchronous or asynchronous calls to our data center to complete things like payment transactions, so we needed to give them a way to hook directly into the data center. As I mentioned, these tenants are highly segmented, and basically everything in this environment is subject to regulatory compliance, so we obviously wanted to get it right.

Going through some patterns common to both environments: we currently use IPsec VPNs to connect our data centers to these environments. One thing we're looking at, which was mentioned just a bit ago, is Partner Interconnect; we'll be taking a deeper look at that later this year or in early 2019. We use Cloud DNS across basically all of our GCP environments, hosting the authoritative DNS for both our internal and external resources. Our developers are really big fans of this, because it means they can use a common, consolidated Terraform repository to create DNS entries for their applications, rather than cutting tickets to another team and waiting for that team to complete them. We really empower developers by letting them create DNS entries themselves. For any situation where instances need to reach out to the internet-- which is fairly common, given that api.target.com is surfaced on the internet-- we provide either a NAT or a proxy service. Those services are instrumented with packet-capturing software, which lets us police traffic egressing our cloud environment, see what's going on, and react should anything happen. As I've mentioned several times already, our platform services are consistent across all of these environments. We also use the HTTPS load balancers almost universally.
I would say there are probably only six load balancers we run that are not the L7 load balancers. We love them. They're great. They give us an anycast IP address; we operate services in multiple regions, and all we do is wire those services up to the same load balancer and away we go. We don't have to worry about which region an IP address is in, or how to geolocate users and send them to the region closest to them-- it's all handled by the load balancer.

The way we let application teams manage their deployments and the instances they run on is an open-source tool called Spinnaker. It's a really great tool, and I'd highly encourage you to check it out if you can use it in your environment. It takes away a lot of the pain of creating a deployment pattern: no one has to log in to an instance and run scripts. You create some sort of package install deliverable, Spinnaker bakes it into an instance template for you, and then you can deploy that as many times as you like, across multiple instance groups. It makes things really easy, so I highly recommend it.

One thing we use fairly heavily is private Google access. The reason is that almost no instance we operate in GCP has a public or external IP address-- our NATs and proxies are there for that purpose-- so we want to keep traffic destined for Google APIs, for example to GCS, internal to our VPCs. We use private Google access for that. We collect and analyze logs across all of these environments. And finally, we use Cloud Armor and SSL policies to make sure our load balancers at the edge match the security posture we want them to have-- for instance, only accepting TLS 1.2 connections, things like that.
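On the private Google access point above: it's a per-subnet flag, and a minimal sketch of turning it on through the Compute Engine API-- project, region, and subnet names hypothetical-- looks like this:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# Lets instances with only internal IPs in this subnet reach Google APIs
# (GCS, etc.) without an external IP, NAT, or proxy in the path.
compute.subnetworks().setPrivateIpGoogleAccess(
    project="my-example-project",
    region="us-central1",
    subnetwork="app-us-central1",
    body={"privateIpGoogleAccess": True},
).execute()
```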
We've talked through a lot of lists, so let me give you a visual. At a very high level, this is what our non-commerce environment looks like. In the application hosting area-- not really pictured here-- are several subnets that span multiple regions. Roughly what happens is we have a Layer 7 load balancer fronting our various endpoints; that could be an application with multiple URL routes, or a single-purpose app or microservice. Then we have that partition for platform services again: service discovery, data persistence, data replication, things like that. As pictured, a VPN connects this environment to our data center, and you'll notice that we've attached only the platform services subnets to the data center-- the application hosting subnets really only talk to platform services, and that's about it. Finally, the part we haven't touched on yet: how do we actually service inbound requests, whether from our guests or from API calls? Our clients-- guests, or third parties using our APIs-- first hit our CDN provider. The CDN provider is effectively our edge.

From there, through a variety of caching and web application firewall layers, requests eventually traverse back into GCP and hit one of the load balancers representing a particular service we host. So that's our non-commerce environment in a nutshell.

If we go over to our commerce environment, pay really close attention, because the picture looks fairly similar-- but you'll notice a lot more boxes down on the left-hand side. What we've done is use shared VPC and partition each application into its own service project. That means our application teams, through Spinnaker, have full rein in their service project to do the deployments they need to do, and they can look at all the logs collected via Stackdriver. This partitioning puts us in a really good place where application teams can't look at other app teams' logs-- you can only see the logs relevant to the workloads operating in your project. That's been working out very nicely for us. We've done the exact same thing with our platform services, so no one is immune to the segmentation and partitioning: each platform services team operating one of these services has its own service project as well.

And the great part about shared VPC is that we consolidated all of our network policy into the host project. All of our custom routes are there; our VPN connections are there but apply to all of the service projects; and our firewall rules are there as well. So we have a centralized place for firewall rules-- we don't need to go hunting around to see whether a particular application team has changed a rule, and in fact application teams can't change a firewall rule, because the service projects won't allow it. All of those rules are consolidated in the host project, and we again provide our developers a Terraform repository through which they can propose changes; in this case, changes are heavily reviewed and we ensure they're safe before putting them into effect. The rest is very similar: we have VPNs connecting us to the data center-- on the GCP side, anything can traverse the VPN back to the Target data center, and we have a firewall on our side to make sure only the appropriate traffic comes back or goes to GCP-- and our internet path in the upper right is basically the same as non-commerce; we still use our CDN to direct inbound traffic.

OK. That was a really high level, so what I want to do now is zoom in on just the GCP area and show you what's going on in a little more detail. The first thing we'll look at-- we sometimes jokingly refer to it as a load balancer sandwich-- is that for our application teams, we obviously have the L7 load balancer at the edge for public ingress, and then we have internal load balancers that we use for connectivity either inside the VPC or from our data center up into GCP. Below, in the Platform Services section, things got a lot bigger; we've talked about a lot of these items already, so I'll gloss over them quickly, but a lot of these services are common to each environment.
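On the centralized-firewall point: in a shared VPC, firewall rules live on the host project's network, so a rule like the following sketch-- all names and ranges hypothetical-- is something only the host project's owners can create or change, while instances in every service project still inherit it:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
HOST_PROJECT = "commerce-host-project"  # hypothetical shared VPC host

compute.firewalls().insert(project=HOST_PROJECT, body={
    "name": "allow-dc-to-payment",
    "network": "projects/%s/global/networks/commerce-vpc" % HOST_PROJECT,
    "direction": "INGRESS",
    "sourceRanges": ["192.0.2.0/24"],      # illustrative data center range
    "targetTags": ["payment-service"],      # only tagged instances match
    "allowed": [{"IPProtocol": "tcp", "ports": ["443"]}],
}).execute()
```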
As I mentioned, we have proxy and NAT services available for workloads that need to reach out to the internet, where we do police that egress. If we look at the Network section in the lower right, there are some things we use that are inherent across all of these environments. We use Cloud VPN and Cloud Router together, and we use BGP to announce the routes across. This is really great: it saves us a lot of time over figuring out how to announce static routes and putting in all those change requests, and it makes failover really easy as well. We have Cloud firewall rules, as I mentioned, which are inherent across all of these environments-- and there are many of them-- and we use custom routes to do things like pull internet-destined traffic through our NATs. There really isn't a good way for applications to egress any other way than through our NAT.

In the upper right-- I'll gloss over these a little-- we use many other GCP offerings; you can see a sampling up there. We use IAM very heavily. We use Logging and Cloud Pub/Sub, which has actually been very helpful to us: we have log sinks on a lot of our Stackdriver logs that publish into Pub/Sub topics, and at our data center we subscribe to those topics and ingest the entries. So instead of filling up our VPN tunnels to and from GCP, we basically transmit the logs out of band and pull them into our data center, where they're stored and analyzed. And finally, the last piece I want to touch on here are the Cloud Armor and SSL policies we apply to all of our load balancers at the edge, to keep the way those endpoints are accessed consistent.

Just quickly going through some highlights-- things that have been really great for us over the past year. The global L7 load balancers, as I said earlier, have been really great; we're big fans, and I don't think we have any complaints about them whatsoever. Inter-region networking has been great for us. It's something we were able to adopt when we moved to GCP because it's basically built in-- it's quick and easy, and we have absolutely no complaints. For some of our services-- for example, a lot of our data persistence is in Cassandra-- we can just replicate across regions without having to set up regional VPCs and peer them together, or go to the level of running VPNs in those VPCs and connecting the VPNs to get traffic across. So that's been great. Our IPsec VPNs have been absolutely stable and we've had really great success with them; the only reason we're looking at moving to Interconnect is that we need a little more bandwidth than the VPNs can currently provide. Cloud Armor has been fantastic-- it's been in place in our commerce environment since day one, and we're currently finishing a rollout in our non-commerce environment as well.

A little tidbit that isn't talked about very often is the NTP server built into the metadata server. This is great: it gives you access to Google's NTP servers, and you get the time smearing Google provides with its public time.google.com service as well. The great thing is we don't have to configure anything-- it's all baked in and good to go, and we don't have to egress out to the internet for NTP sync. So that's been really handy.
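For the out-of-band log path described above, a minimal consumer might look like the following sketch, using the google-cloud-pubsub Python client. The project and subscription names, and the store_locally helper, are hypothetical:

```python
import json
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("my-example-project", "gcp-logs-dc")

def store_locally(entry):
    # Hypothetical stand-in for writing into an on-prem log platform.
    print(entry.get("logName"))

def on_message(message):
    # A log sink delivers one LogEntry as JSON per Pub/Sub message.
    entry = json.loads(message.data.decode("utf-8"))
    store_locally(entry)
    message.ack()

# subscribe() returns a streaming-pull future; block on it to keep pulling.
streaming_pull = subscriber.subscribe(sub_path, callback=on_message)
try:
    streaming_pull.result()
except KeyboardInterrupt:
    streaming_pull.cancel()
```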
Private Google access-- I touched on that already; it's been really great for us. Another really great feature, which thankfully we've only run into one time, is the ability to expand your subnets live. I think a lot of folks know that no matter how you plan-- however many whiteboards you draw on and spreadsheets you fill out trying to lay out your IP space for a cloud deployment-- something unexpected always happens. In those cases, what's actually really easy to do in Google is, if you haven't consumed the space you'll be expanding into, you can flip that /24 to a /23 or beyond, as long as the space is available, and you can do it live. So if the unexpected happens, it's a really good way to get yourself a little headroom.
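A sketch of that live expansion-- names hypothetical-- through the Compute Engine API; note that a subnet range can only be grown this way, never shrunk:

```python
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# Widen an existing /24 to a /23; the new range must contain the old one.
compute.subnetworks().expandIpCidrRange(
    project="my-example-project",
    region="us-central1",
    subnetwork="app-us-central1",
    body={"ipCidrRange": "10.128.0.0/23"},
).execute()
```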
And then finally, Cloud DNS. Cloud DNS has been great for us, and as I mentioned earlier, our developers love being in control of DNS and doing it in a self-service fashion.

I want to wrap up with our lessons learned. One of the biggest is that everyone should have, and exercise, DR plans. We have an internal tool that lets us go through all of our L7 load balancers and disable traffic to a given region. We use it in emergency situations or when we have issues with a stack in a particular region, and for the most part it's been working really well: typically, within about 15 or 20 minutes after an incident is called, we can drain all the ingress traffic away from a region and send it to other regions, and we do that with a simple config change to the load balancers. It's absolutely great that we're able to respond in that way.

If you're operating a hybrid cloud, there's lots of planning involved and lots of communicating with teams across the organization. We've been very lucky to have really great partner teams at Target, from the infrastructure teams that help us set up the VPNs, through the platform services teams we've partnered with for data replication and storage, all the way up to our app teams. One thing that's been really critical is sitting down with our application teams to make sure that, as we're planning things out, they're included and we take their feedback into account as we build out new features for them.

One interesting item: be careful about attaching the same backend service to multiple load balancers. We encountered this last year with teams that, operating in that load balancer sandwich model, created a single backend service and attached it to both. In that scenario, the signals from the load balancers got a little cross-wired, and they were not able to scale the service based on RPS from the Layer 7 load balancer. So if scaling on RPS is important to you, avoid attaching your backend service to multiple load balancers.

Another piece that took a little adaptation for our application teams and security teams is that, with the exception of Cloud Armor, firewall rules are applied to the instance itself-- and yes, that applies to the network load balancer too. For ILBs, for instance, there is no firewall rule you attach to the ILB; it's the firewall rule you attach to the instance that actually allows traffic through the ILB. So it took a little work to get that communicated across all of our teams.

And finally, whether your announcements are static or dynamic, be wary of what routes you announce back on-premises. We're lucky to have BGP route filters in place on the Target side of things, but for a period of time we did wind up announcing, from GCP, space that is actually our internal DMZ space in our data centers-- part of that separation we operate our non-commerce applications in. Luckily those filters were in place and nothing bad happened, but be wary of that.

So I think that about wraps it up for me and I'll turn it back over. Oh, sorry-- tying in with a couple of other things, I'm sure you'll be hearing plenty of announcements through the rest of this conference and the rest of this year. Some things Target is looking forward to are the variety of enhancements up here on the screen. We use ILBs quite heavily and Cloud Armor quite heavily, so we're looking forward to some enhancements there. We're also looking forward to additional Interconnect locations, and one of the big items for us is general enhancements around VPC performance and some of the managed services that might be announced later this year. So with that, I'll send it back on over. Thanks very much. [APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 3,348
Rating: 5 out of 5
Keywords: type: Conference Talk (Full production); pr_pr: Google Cloud Next; purpose: Educate
Id: 36pe3OtvQP4
Length: 50min 2sec (3002 seconds)
Published: Wed Jul 25 2018