How Airbnb Secured Access to Their Cloud With Context-Aware Access (Cloud Next '19)

[MUSIC PLAYING] AMEET JANI: I am Ameet. Good afternoon. And I'm very excited today. We have a special guest with us. Just a quick question, how many of you have heard of BeyondCorp or Context Aware? Awesome. Today, we are going to talk about BeyondCorp or Context Aware with an actual customer who's actually deployed it. This is Samuel. Samuel is, in some ways, my dream customer. He wants to try everything we do. And in some ways, he's my nightmare customer because he breaks everything that we do. But that's a good thing. So Samuel had two conditions he had for talking. One is this is not a marketing presentation. So he's going to tell you what worked and didn't work. And he really does do that. And the second is that he's going to talk about being on another cloud. And I said I think some people here might be on that same cloud, so it's OK. Just out of curiosity, how many people are running a second cloud, another cloud? There's probably more. You guys are just not saying. So let me jump ahead. Because there are three speakers today, they asked us to introduce ourselves. First is Sam. Sam is the star of the show. He's a security engineer at Airbnb. Next is Gagan. Gagan works on cloud identity. You're going to talk about G Suite. And then lastly is me. Yes, that is really me. I am on the Cloud Security team, working on Identity-Aware Proxy and [INAUDIBLE] products. And so we'll talk about all those as we go forward. I'll give it to Sam. I just want to highlight. He is the star of the show. We're going to let him do most of the talking. He's going to set up the challenges. Gagan and I will quickly come on stage and talk about how we have tools to solve all those problems. And then Samuel will come back on stage and tell you how he broke those things to make them work. So I'll leave it to you. SAMUEL KEELEY: Cool. Hey, everybody. I'm Sam. So, first, I just want to [INAUDIBLE] disclaimer that's at the bottom of this slide. 
I've been required to include it because I am going to talk about other clouds, about AWS. So we've had a lot of challenges. But I just want to go over briefly what our corporate architecture looks like. It's probably very similar to a lot of companies. So we have users. They have browsers. They talk to a lot of hosted SaaS apps, Slack, G Suite, GitHub, tons of these apps. But we also have a large corporate infrastructure of what we run ourselves. So this is composed of applications primarily running on AWS, along with some applications that run in some local compute. All of Airbnb.com runs on AWS. But we have some supporting services that run with some local compute, primarily for some latency reasons with SIP routing and things like that. But this is really what our infrastructure looks like, scaled down, simplified, very basic. But we've invested heavily in endpoint instrumentation and reporting. We use tools like Osquery from Facebook, along with a tool that we developed internally called StreamAlert, which runs in AWS Lambda and analyzes the results coming from Osquery. So we have a ton of data that we look at coming from our endpoints. And then our users get to our internal services through VPN or 802.1X when they're in an Airbnb office, whether that's on a wired connection or a wireless connection. If you're not in office, then you're going to be using VPN. So this is basically what our infrastructure looks like. There's a very strange transition there. So Airbnb was founded just a few blocks away from here in SoMa about 10 years ago. And over the past decade, it has grown out of just San Francisco. And we now have offices all around the world, with thousands of users supporting our community of millions of hosts and guests. Our largest office is still here in San Francisco, but with all of these offices, we ended up building a network for the users to get to our services. But this growth has presented us with a constant challenge. 
When you're just one office, it's pretty easy to get into your resources that might be running in AWS by whitelisting the IP of your network or something like that. But as we've expanded, that hasn't really worked. So what we've realized with that is that the worst experiences that our users have are inconsistent ones. So while you might be in San Francisco and be able to get everything, if you go to Portland, you haven't gone very far. That's still a large Airbnb office. But you might not be able to access everything that we have. And the problem really just gets worse the farther away that you get from San Francisco. Or it's gotten worse. So a couple of years ago, we started a project called Work Anywhere. And this comes out of three kind of realizations. The first is that securing user networks is pretty hard, at least doing it in a way that's user friendly and scalable. So for us, 802.1X and VPN have had a lot of edge cases where they don't really work. So in the case of 802.1X, when we're expanding into a new city, we might start off in a managed service office or something like that, where we don't control the whole network. So it'll look like an Airbnb office. It'll feel like one. But the network that they're coming through isn't run by us. It's really just an internet connection. So there are generally ways that we could end up deploying 802.1X there through remote access points or something like that. But generally, getting those offices, which are Airbnb offices, on the network is not so easy. So we end up having those users use VPN. And then VPN has so many edge cases, too, where VPN falls over-- from users that are really far away from their VPN gateways, or for users who are flying in an airplane, and they can't approve MFA, or it's really hard for them to get MFA working when they're on VPN. There's just so many edge cases, where just securing the network is hard. Then what we've realized is that latency makes things even worse. 
We haven't been able to defy physics quite yet. So when we have these offices all over the world, and most of our applications are running in US East in AWS, as most companies end up running things in a single region or a single country, the latency to these apps can just be really bad. And we can't get you any closer to that application. But it's really easy to make things worse. So we've known about this issue for a long time, particularly for users who are in Australia. And if you are in Australia, and say, you're in Sydney, and you want to connect to a VPN, you're probably going to connect to a VPN endpoint that we have running in Singapore because that's the one that's closest to you. But if you're in Sydney and you're connected to a VPN gateway in Singapore, that traffic is actually going to route way out of the way to go to Singapore and then come all the way back to the east coast of the US. It's over 50% farther to go from Sydney to Singapore all the way to the US than it is just to go from Sydney to the US. It might not be evident without a globe, but that's the reality. And so latency becomes a problem for even more users. It becomes really obvious for me when I'm on an airplane, with the explosion of use of satellite-based Wi-Fi on planes that's going to outer space. The best that you're ever going to do there is like 600 milliseconds. But if you are on a flight from Tokyo to Los Angeles, your internet traffic might actually be bouncing off a few satellites and coming out in Europe. Now if you're on that flight and you need to connect to VPN, which gateway do you pick? I mean, I would probably pick the US one, but that's actually the wrong one for me to pick. I should be connected to the EU one. And there's a lot of VPN software that can attempt to make the right decision, but we've just realized that latency gets bad for a lot of users all the time. And it's hard for users to pick the right thing. And then lastly is that users expect to work anywhere. 
If a user has a good internet connection, they know that they have a good internet connection. They can get to Gmail. They can get to all these apps. But their connectivity to our internal apps may not be so great. If they're far away from a gateway or they're choosing other ones, it's just not working great for them. So while a user can theoretically work from anywhere today, because they're able to connect to anything if they're over VPN, they can't really work anywhere if it's a bad experience. So we just believe that if a user has a good internet connection and a secure device, they should be able to work. So that's where our Work Anywhere project came from. And I think that's it from me. Oh, one more slide. So IT security at Airbnb, so where does security come into this? So I'm just a security engineer. I've been mostly talking about networking and user-facing stuff. And this slide could have a lot of things on it, talking about what IT security does at Airbnb. But I think it's just really exemplified in this one photo of a security key. So what IT security tries to do at Airbnb is find the best mix of user experience and security to propel both forward. And I think that the security key is really-- it's the holy grail for what that should be. So I'm proud to say that we've deployed over 10,000 of these security keys at Airbnb. And the majority of our multi-factor auths are happening with these for our internal users. We no longer have any phone call or SMS-based MFA, which are obviously insecure. But they were really just a bad user experience. So we've been able to take MFA from being a burden to something that users have actually embraced and enjoy using. And that's what we've been trying to do with Work Anywhere, is propel both user experience and security forward from what we've had in the past. Because there's a lot of ways that you can have a great user experience. 
If we just wanted the best user experience, we just put all these apps straight on the internet. But that would obviously be horrible for security. So we've tried to avoid to do that. So with that, I'm going to hand it off to Ameet. AMEET JANI: Whoops, sorry. Thank you. He's coming back in a minute. He'll talk about what they actually did to solve their problems. But I did want to just transition here for a moment. All the things that Samuel talked about with security problems and all those different things, Google themselves, we ourselves faced. About eight to 10 years ago, there was a well publicized thing of state-sponsored hackers trying to get into our network. And so out of that, we kind of changed our model from a network-centric VPN model to what we call the BeyondCorp or zero trust model. And so really, there's three tenets of this model. The first one is that you no longer care about the network that somebody's coming on. You really, really care about the context of that request. And I'm not talking about five or six things. I'm talking about 120 different things. And you're really thinking about that on every request to an application. And that has a lot of ramifications about how you deploy these applications. And we'll talk about that. And the key is that you should be able to create granular level access controls based on if somebody's coming from a corporate device or maybe a known or trusted network versus a non-corporate device. Those people shouldn't be shut out of being productive. It just means that what they get access to should be limited. And we'll talk about how we did that. But this is exactly the model that Google started to follow. And so when we were putting this model together, we kind of hit on a few key tenets. I don't think there's anything that's rocket science here, but it's really important to just walk through and kind of understand all the components. We can talk about all the solutions for everybody. 
The first one that we discovered is that it's very, very important to understand the users. I think that's obvious, but there's a secondary portion of that thing, which is that you want to understand user behavior. If you log in from your mother's house every Tuesday, we shouldn't care that it's your mother's house, but we do want to know that you have these patterns and behaviors. And if something starts to fall out of the norm, we can start to at least create speed bumps for users along the way. The second part of this is second factor. And we'll touch on that in a minute. The other part of that whole side of things that we found is even if you trust a user, even if you can attest very strongly that the user is who they say they are, the device posture really, really matters. And I'll tell you personally at Google, I think we published this in a white paper. We have the team here. You can ask them. Something like 125 characteristics about a request from the device itself. That's the two major components. But then there's a whole other context portion of this that we really cared about. And so we run a global front end at Google, which captures which IP address you're coming from. We happen to know, we happen to be constantly scanning the internet to understand what are good IP addresses, what are VPN IP addresses, what are state-sponsored IP addresses. And we can start to block access from those. We also understand location, and time of day, and session age. All this stuff needs to be collected at the request point every single time. But then the real smarts for us came in when we started coming down the chain. The rules engine portion, we needed to be able to create human readable and very easy to define rules that took all of that context into account. So you can apply them at, in fact, an endpoint, right? And for us, we wanted to protect every piece of data that's on the web. That's not just HTTPS. 
It is every type of piece of data that's out there. And so that was sort of how we came up with our BeyondCorp model. And we put tooling in place that revolves around all of this stuff. And then what happened about two or three years ago is we decided to start productizing these piece by piece. And so what you're seeing here is that really, the story is more of a suite of solutions than it is one singular solution. And I'll touch on all of these things. And the good news is you don't need to adopt all or nothing. There is a gradual crawl, walk, run model that we find ourselves at Google using, we find customers using. And so we can talk about how that would work. But just touching on each for a moment here, let's focus-- just turn to these very quickly. The first one is Cloud Identity. Each one of you has an identity system. You don't need to use ours. But we have one that's called Cloud Identity, in fact, where we manage things like users, groups, role information. That's all very obvious. They all sort of do that. Now, imagine a world where that identity system is locked into your calendar, right? So you understand somebody's on vacation. Somebody is in one office and not another. And a request is coming in from somewhere strange. You can start to kind of create these sort of vectors. The other thing is you need a massive system that does login abuse detection. For us, the same system we use for the enterprise side is what we use on the consumer side. So we are getting good at detecting fraudulent signals because we do 1.2 billion unique logins a month. And so, we've gotten very good at detecting what those signals are. Those are pretty obvious things. The two other things I'll just touch on briefly that we find that a lot of customers don't do is session control. I don't mean infinite versus one hour. I mean that every application has a different use case or a different footprint. 
And you should be able to create unique session length controls for each of your types of applications that are out there. The other one is MFA. I'll say this. MFA is better than no MFA, but there is a better story than OTP. If you don't believe me, just Bing OTP phishing. Or you can Google it if you want. See, I'm vendor agnostic. But no, in seriousness, OTP codes have been phished. You can do a simple search for it and see. What we use internally is this idea of a FIDO U2F key. Use any vendor. It doesn't have to be us. It can be anybody. We find that this makes it-- if I say un-phish-able, you know that's always a dangerous thing to say. So I'll say very, very phishing resistant. And no credentials at Google have been phished in the last 18 months, because we've moved to this model. The second portion of this-- so we talked about Cloud Identity. The second portion is actually the device itself. Here we offer a few solutions. The one I'm going to talk about today, because I think we have a broader story that's in other sessions, is what we call Endpoint Verification. Endpoint Verification is really a simple Chrome browser-based extension (and soon other browsers, preempting Samuel's complaints), where we understand the security posture of a device, where we understand the disk encryption, the screen lock, the version of software running on the device, secure boot, all these different kinds of things. Those are interesting. I think what you'll find-- my friend Sanjay, Sanjay Poonen, today on the stage kind of announced. He kind of pre-announced this for us. We are working with vendors. So if you already have a mobile device management or endpoint management system, we are working together with this whole ecosystem to feed their signals into this. So you can say Vendor X or Partner Co is feeding those signals in. I want to use that to make access decisions for my Google resources or resources on AWS. You can certainly do that. 
And the other thing is you want to be able to distinguish between corp-issued devices and non-corp-issued devices. And so I think Gagan will give a little bit of a preview of some of what we're doing there. But the idea should be that not every device should be cut off just because it's not corp-issued. It just means maybe they can't access really secure systems. And then, what you get for free when you run on us is this idea of a Google Front End. And this is a globally-distributed, near zero latency frontend that does a few things for you. One is it collects all the context of the request, the IP, the location, the time of day. But it also gives you lots of things for free, if you will-- things like DDoS protection, enforcing TLS, the latest versions of TLS. These are the kinds of things that you need any system to be able to handle, anywhere in the world. And then, the next part of this is what we call the Access Context Manager. If you remember, we talked about rules that you should be able to apply. The Access Context Manager allows you to create these levels, where you can kind of pair together these various context things. So you can say a device that's running only the latest version of the OS and coming in from this IP address, for example, should be able to access my most secure systems. Anything that's n minus 2 in terms of OS version and maybe doesn't need second factor so often, we'll allow to touch second tier systems. You should be able to create rules like that and make them mass appliable, if you will, to all of your systems. Many of you have-- you work in enterprises where you're running thousands, literally thousands, of IT applications. And you don't want to take the time to go through one by one and tag these things. The Access Context Manager is really our path to get there. And then, you took all this context. You created all these rules. And now the key is you have to enforce it against all of your endpoints. 
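[Editor's illustration, not part of the talk.] The tiering idea described above (latest OS plus a known IP range for the most secure systems, up to "n minus 2" for second-tier systems) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Access Context Manager API: the class, the function names, the tier labels, the app names, and the 203.0.113.0/24 range (a reserved documentation range) are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class RequestContext:
    """Context captured for one request (hypothetical field names)."""
    source_ip: str
    os_versions_behind: int  # 0 = latest OS release, 2 = "n minus 2"


# Hypothetical trusted range; 203.0.113.0/24 is reserved for documentation.
TRUSTED_NETWORKS = [ip_network("203.0.113.0/24")]


def from_trusted_network(ctx: RequestContext) -> bool:
    addr = ip_address(ctx.source_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def access_tier(ctx: RequestContext) -> str:
    """Map request context to the highest tier it qualifies for."""
    if ctx.os_versions_behind == 0 and from_trusted_network(ctx):
        return "tier-1"  # most secure systems
    if ctx.os_versions_behind <= 2:
        return "tier-2"  # second-tier systems
    return "denied"


# One rule set applied to many apps, instead of per-app logic.
TIER_RANK = {"denied": 0, "tier-2": 1, "tier-1": 2}
APP_REQUIRED_TIER = {"payroll": "tier-1", "wiki": "tier-2"}  # hypothetical apps


def can_access(app: str, ctx: RequestContext) -> bool:
    return TIER_RANK[access_tier(ctx)] >= TIER_RANK[APP_REQUIRED_TIER[app]]
```

The point of the lookup table at the end is the "mass appliable" property: each application is tagged with a required tier once, and the rules themselves live in one place.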
I think what you'll find is we have several solutions here, depending on what your use case is. And Samuel talked about some of these, but I'll just briefly touch on them. The first one is something called Cloud IAP, Identity-Aware Proxy. So if you're writing a web application on GCP, or in another cloud, or on-prem, you should be able to protect that through these rules that you've now defined. You should also be able to do that for, by the way, SSH and TCP connections as well. And the Identity-Aware Proxy does exactly that. And it does what its clever name tells you it does. It really looks at the identity. It looks at the context. And it makes the access decisions. The other thing we would allow you to do is the same sort of thing with our Cloud IAM system for, think of, resources. Let's just say you have a storage bucket that's holding a lot of personal information. You can create context awareness around access to those requests as well. That's just an example. You can do all of this with our G Suite products through our Cloud Identity product line. So you can say, I want to be able to access Gmail at this time of day, or only from a corporate device, or Drive, or whatever it happens to be. And then one of the areas I think we really stand out from the other vendors is what we call VPC Service Controls. This is the idea that every public cloud is going to have their APIs on the web. That's just the nature of a cloud. That's, in fact, what clouds are supposed to do. But it's terrifying, this idea that an employee, who suddenly goes rogue, or even unintentionally, takes their device to a coffee shop and just decides to start shutting down production systems. Of all the other types of things we're talking about, this should be the most terrifying to most people. And today we now have a product that's in GA that's called VPC Service Controls. 
That allows-- today it has the beginnings of this, but it will start to take all of the same context and allow you to apply it to public APIs. Think of like storage buckets and VM productions and things like that. You'll be able to apply these same controls there. And more will be coming as we go along. So I think it's one thing to talk about it and more interesting to attempt to demo it. So I'll have Gagan come up and go through some demos. [APPLAUSE] GAGAN ARORA: Thank you, Ameet. Hello, everyone. My name is Gagan Arora, and I'm a product manager in Google Cloud. As Ameet just touched on, context-aware access is the key pillar for the BeyondCorp security model. And you all can imagine a lot of use cases when you think of context. So imagine that you want to control access to your resources based on where the user is coming from or what you know about their device, how secure that is. There are so many use cases that we can talk about, but over the next few minutes, I'm going to go through two specific use cases, one on the GCP side and one on the G Suite side, to highlight how you can enable context-aware access in your organization. After that, Sam is going to come in again. And he's going to talk about how they are deploying Google tools to address their use cases. So let's look at use case one. So this is something that we heard over and over from our customers-- that I want to allow access to all my applications to users from wherever they are. But there is an HR application that I may have, where I want to restrict access to contractors only if they are coming from a corporate location. So let's see how that works in our scenario. So since this is an HR application on GCP, an end user can access this application from wherever they are. Now the admin wants to enforce the use case that we just described. And essentially, I want to restrict the access based on the location of the contractor's network. 
So I go into Access Context Manager, create a rule called access level. And I define the contractor IP address first. The next thing I want to do is, since this application is behind the load balancer, I want to enable IAP for this. And this allows me to configure who can get access to this application and under what conditions. So in this case, I am giving access to Bob, who is my contractor but only if he is coming from his corporate network. So once I set this rule, I can go back and check and make sure that Bob has access to this application. And now we look at it from Bob's scenario when he is trying to access this HR application. So in the first scenario, Bob is in office. And he is logged into his device. And he is able to access this application, because he is on the corporate network. Now let's say, Bob goes to a coffee shop to get a cup of coffee. And he tries to access that application again. And in that case, because he is not from the corporate network, he's still logged in, but he is denied access. So as you can see, it's fairly easy for you to set up these kind of contextual rules, that allow you to restrict access to the applications, that may reside on GCP, on-prem, or any other cloud. Now let's touch on the case for G Suite side. This is, again, a fairly common scenario that we see among our customers, where I am a customer with lot of sensitive data in my Drive and Docs. So think of Docs, Sheets, and anything else. I want to make sure that users who are coming to access this Drive and Docs are coming in only if they have a secure device. And I'll walk you through exactly how you can define your version of secure device over the next few steps. The first thing you need to do, or your end users need to have, is something called Endpoint Verification. It's a Chrome plugin, that either they can install themselves, or you, as an admin, can push it to their devices. 
Once the end users have Endpoint Verification installed, as an admin, you go to your familiar admin console. In your security settings, you'll see a tab now called Context-Aware Access. Once you click on it, you'll be able to set different access levels to define the conditions. And then you can apply them to different resources. So in this case, as you can see, the corporate office IP that we defined shows up here. But we're going to define a new rule called Secure Device. Once you've created it, you are able to choose different attributes that you can use to define what a secure device means for your organization. In this case, I pick three specific attributes. One, I want to make sure that any device accessing my Drive and Docs has a password on. Number two, I want to make sure that the device is encrypted. And number three, I want to make sure that the device is fairly up-to-date. And in this case, I define that the macOS version should be 10.0.0, which is a fairly high version that I want to define. Once you create this access level, it's going to show up in the list of all access levels that you have defined. And you can choose other attributes like, is their device corporate owned or not, is it a Chrome OS device which I trust and has verified access on, and so on and so forth. So once you have defined the access level, the next step is to actually go and assign those access levels to resources you want to protect. And you can do it on the entire domain that you have, or you can pick a subset of users, what we call an organizational unit, and apply the same rules to only a subset of users. So in this case, I apply these rules to Drive and Docs for contractors. And I pick that any contractor who is accessing Drive and Docs should be coming with a secure device. So once I have defined this, any contractor who does not meet the minimum criteria that I have set is not going to be able to access the Drive and Docs that belong to my organization. 
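[Editor's illustration, not part of the talk.] The "Secure Device" level just described combines three checks (password set, disk encrypted, minimum OS version) and is then assigned to one organizational unit for one service. The sketch below models that in plain Python. It is an assumption-laden illustration rather than the admin console's actual data model: `Device`, `SecureDeviceLevel`, `ASSIGNMENTS`, and `may_access` are hypothetical names, and the sample device values are made up.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple:
    """Turn '10.13.6' into (10, 13, 6) so versions compare numerically.

    Comparing the raw strings would be wrong: '10.9.0' > '10.13.6' as text.
    """
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Device:
    """Posture reported by the endpoint (hypothetical field names)."""
    has_password: bool
    is_encrypted: bool
    os_version: str


@dataclass(frozen=True)
class SecureDeviceLevel:
    min_os_version: str

    def allows(self, device: Device) -> bool:
        # All three attributes from the demo must hold at once.
        return (
            device.has_password
            and device.is_encrypted
            and parse_version(device.os_version) >= parse_version(self.min_os_version)
        )


# (organizational unit, service) -> access level required there.
ASSIGNMENTS = {
    ("contractors", "drive_and_docs"): SecureDeviceLevel(min_os_version="10.0.0"),
}


def may_access(org_unit: str, service: str, device: Device) -> bool:
    level = ASSIGNMENTS.get((org_unit, service))
    # No level assigned means this sketch applies no device restriction.
    return True if level is None else level.allows(device)
```

Keeping the level separate from its assignment is what lets an admin change one number (the minimum version) and have it take effect everywhere the level is applied.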
So let's see, again, we go back to our contractor, Bob, who is trying to access this application. He does have Endpoint Verification installed. And this is something that sends the device information back to us, where we can check for contextual information. The next thing we check is the OS version of the device that Bob has. He has a version of 10.13.6, so this is higher than 10.0.0 that I've set. So in this case, Bob should be able to access Drive and Docs when he tries to access his documents. Now imagine a security vulnerability has come out. And Apple has released a new patch. And you want all your users to be coming from this latest patch when they're accessing this sensitive documentation. All you have to do in this case is to go back to your access level that you defined, update the OS version that you defined, 10.0.0, to, let's say, 10.14.2. And all this happens behind the scenes where it gets trickled down as a policy to all the users and all the applications that it was applied to. And in this case, when Bob tries to access the same application again, he will be denied access, because his OS version, 10.13.6, is lower than 10.14.2 that you, as an admin, require. So as you can see, it's fairly simple to apply these rules across your organization on GCP, G Suite. You pick your attributes and signals that you want to use that in context. And then apply these rules. We also have a live demo running on your right side, so feel free to go there. And we'd love to hear more about your use cases and how context-aware access can solve it for you guys. And with that, I'll invite Sam back. And he's going to talk more about IAP and context-aware access and how they have deployed this to solve their use case. [APPLAUSE] SAMUEL KEELEY: Thank you. So when IAP was first announced a couple of years ago, I was pretty excited by it because we already used G Suite, and we already used Cloud Identity for our users. 
So having users that would already integrate with this was pretty cool and pretty useful. But the question was could it help us solve our problems with our user experience, like making the user experience better? We had a pretty good grasp on user authentication. But just authenticating was more than we needed. What we really cared about was the device side as well, especially since we've invested so much in our endpoints over the last three years. So we tried other authenticated proxies. And I personally used some in the past, usually labeled as clientless VPN. But they just kind of ended up causing more problems than they purported to solve. So we needed something different. And that would enhance our security and our user experience. With IAP working as part of a load balancer, it seemed to kind of fit how modern apps work. So setting up the context, there were three things that we were trying to achieve here. One was consistent access. So regardless of where somebody was, they should be able to access the applications that they need to do their job, through one easy method. So where in the past, we've had 802.1X, VPN, and other methods to get to an application, it ends up confusing users, where sometimes they need VPN for something, sometimes they need 802.1X and VPN for something. And it just-- it presents a large confusion to our users. So we wanted consistent access per application. We knew that we were still going to need VPN for some things. But for each application, the access should be consistent. We wanted to enhance application performance, especially for users that are far away from these apps. So we have thousands of users that are thousands of miles away from the applications that they're using. And while it would be wonderful if every application that we use supported multi-region deployments, that's just not the reality, especially when we're buying third-party apps. 
They might be able to work in an HA configuration, but they're going to want to be deployed all next to each other. So we're just going to have to pick a region for an app to run in and deal with that. So we want to enhance application performance, have the best networking that we could. And lastly, because security is important, but maybe shouldn't be the first thing on your mind for everything out there, is that we wanted to enforce the endpoint security state that we've implemented over the past two years. So we're really happy with the way that our Mac, Windows, and Chrome OS endpoints are managed. But if we're not actually enforcing that those devices are the ones connecting to our applications, I don't know if there is truly a point to us having done that. So the first thing that you're going to run into if you want to use context-aware access is the Endpoint Verification extension. And the data that this extension gives you is incredibly powerful and incredibly valuable. So we actually deployed this extension almost two years ago. And it's been collecting data for us ever since. So one of the problems that companies run into over their growth is that their inventory, their physical inventory, isn't as good as they think it is. So when you hand a laptop to somebody, the first thing you're doing when you're growing is you're probably going to have a spreadsheet. And you're going to say who you assigned it to. And then you'll grow into some physical inventory management system. But that data in there is only as good as the data that you put in there. And most of that's going to be manual. And that doesn't scale to having thousands and thousands of endpoints. So before we went and used IAP for anything, we rolled this extension out. And it will tell you who's actually using your devices. 
So we force install this extension on all of our managed devices and found that definitely less than 100% of our devices were assigned to the people who were actually using them. So here's a list of my devices that I'm using. And some of them are company owned. Some of them are user owned. And here's an example of my MacBook Pro. And it shows this basic security state. It shows company owned. And we can use that data to inform our physical inventory. And we can also use that to inform context-aware access. So here's the thing. We don't use GCP to run our applications. Like I said before, Airbnb.com runs all on AWS. And we have some applications that run on-prem as well. But IAP with context-aware was still very appealing to us, because our users exist on Google, our internal users. So how could we achieve this when IAP runs on the load balancer, and the load balancer can't target apps which run outside of GCP? So this is what the diagram of this ended up looking like. So we have our users, who talk to the load balancer, which goes to and talks to IAP. And behind the load balancer, we have just Nginx. And we've-- yeah, so we've set up Nginx, and that routes the traffic all the way back to the third-party apps running in EC2. So thankfully, when we built-- when we rebuilt our network over the last few years, we kind of anticipated that we'd be connecting to more than just AWS in the future. Now we weren't really thinking about connecting to other clouds. But we knew that we'd probably have some like IPSec VPNs going to vendors or something like that. So getting a Partner Interconnect setup to land right next to the AWS Direct Connect that we're using was probably one of the easiest parts of this project. And the only thing that we've had to actually deploy in GCP is this Nginx instance, or Nginx instances really. But you could very well use a different kind of proxy. You could use Envoy if you wanted to. That's the cool new thing. 
You could even deploy that in Kubernetes. But we have a lot of familiarity with Nginx, so we're using that. And then, it goes over the Partner Interconnect and ends up back at the application. So that's pretty easy. It's fairly straightforward. Nginx just proxies the traffic. Users can get to the apps. We can use Endpoint Verification. And that informs BeyondCorp, which Access Context Manager references. And then we also have Osquery and other endpoint instrumentation, along with our physical inventory, talking to StreamAlert and then talking to Lambdas, that we've written, that talk back to GCP, that talk back to BeyondCorp and inform other access decisions. So that was for third-party apps. But we have a lot of applications that we've written internally. And they're primarily behind a tool called Internal Auth, which is an identity-aware proxy. Long before Zero Trust and BeyondCorp and all these words, Airbnb built its own authenticated proxy for this, because we were building all these internal applications, but we didn't want to rebuild authentication every single time. So this is in front of tools that are Hadoop front ends and other critical business applications, like the lunch menu. So this is what Internal Auth has looked like for a very long time. Users come in through VPN or 802.1X. They come through the Direct Connect. And then they talk to an Nginx proxy, which has some middleware on it, which authenticates to OpenLDAP and also does MFA. So when it authenticates to OpenLDAP, it gets that user's information, which includes their LDAP UID and the groups that they're a member of. And it adds those to headers, which get passed on to other applications, which are running in EC2, which know what to do with those headers. So we wanted to rebuild this and get the authentication to not be LDAP-based. We're moving towards this new buzzword of a password-less future. 
But this was still using traditional LDAP Auth so we're thinking like, could we add this to SAML, or add SAML auth to this. But that was kind of complicated. So as we deployed IAP in front of third party apps, we kept running into the HTTP request headers, which come through in every single request. And in them, you'll see this x-goog-iap-jwt-assertion, which is this big long header. There must be a lot of data in there. And what it is is a signed header, which is telling you more about what's going on. So if we verify that and decode that, we can see that this is the information contained within the JWT. We get the user's email. We get their access levels, which is really nice. Those can be reused in the future. So if you want to have your end applications maybe make access decisions to different types of data instead of just access to the application. If you want to have different regions be able to access different things, you can reuse this data. But what we were really looking at here was the email. So this has my email in it, but that's not really what we needed. What we needed was the LDAP UID and the groups. We wanted to deploy this without making any changes to the applications. So this is what we ended up doing. This is what we've been working on over the last few weeks actually. So this is a little bit different from last time. It's a little bit simplified. I'm not including the full Access Context Manager path. But we can see here that IAP adds this HTTP header to the requests. And now, since the user is already logged in through IAP, our middleware has been much simplified. So instead of the middleware authenticating the user, it's just going to OpenLDAP to take that email and resolve it to an LDAP UID and get the same user and group information that we always had to pass it along to those applications. So we still have a very similar deployment as we've had. But users get a much simplified experience, because they're already logged in. 
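The simplified middleware step described here can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not Airbnb's Internal Auth code. The IAP assertion (documented by Google as the `x-goog-iap-jwt-assertion` request header) is a standard three-part JWT with an `email` claim. The output header names `X-Internal-Uid` and `X-Internal-Groups` and the in-memory `directory` that stands in for an OpenLDAP lookup are hypothetical. The decoder below deliberately skips signature verification to stay self-contained, so a real deployment would need to verify the signature and audience against Google's published IAP keys before trusting any claim.

```python
import base64
import json
from typing import Dict


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims segment of a JWT.

    WARNING: this does NOT verify the signature. It only shows what the
    payload looks like once decoded. Never trust unverified claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding, so restore the padding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def identity_headers(claims: dict, directory: Dict[str, dict]) -> Dict[str, str]:
    """Resolve the email claim to a UID and groups, returned as headers.

    `directory` is a hypothetical stand-in for the OpenLDAP lookup:
    it maps a lowercase email to {"uid": str, "groups": [str, ...]}."""
    email = claims["email"].lower()
    entry = directory.get(email)
    if entry is None:
        raise LookupError(f"no directory entry for {email}")
    return {
        "X-Internal-Uid": entry["uid"],
        "X-Internal-Groups": ",".join(sorted(entry["groups"])),
    }
```

Because the backends only ever see the same UID and group headers they saw before, this shape keeps the applications unchanged, which matches the goal stated in the talk.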
And they're not having to log in a second time. Our MFA is simplified. And we're able to bypass VPN and 802.1X for these applications. So to wrap up my part, I have to say that not everything is perfect with IAP and context-aware right now. So this is a pleading face. So this is everything that I plead to Ameet about that I've been promised is going to be fixed in the future. So non-GCP backend support, the way we're doing with Nginx, is kind of a hack. I mean, Nginx is a known beast, but I don't really want to run those anymore. So in the future, GCP load balancers are going to support network endpoint groups. So they support network endpoint groups-- I think it was in beta as of recently. It might be GA now. But what we really need is network endpoint groups to be hybrid so we can target things outside of GCP. And that would allow us to get rid of Nginx. It would make deploying this way easier. This is a big sticking point. Chrome is the only supported browser for-- Chrome is the only supported desktop browser for Endpoint Verification. So if you have users using Safari, or Firefox, or Edge, and you want to verify any of their endpoint states, it's just not going to work. They're not going to have access today. iOS and Android support, not yet generally available. But I've been told it will be beta soon. For third party apps, because the session time of IAP sessions is only one hour, if you're using AJAX apps, the callbacks will start to fail after an hour. If you write your own apps, you can do session refreshing yourself. But if you're using third party apps, you're probably going to need to inject an iframe into those applications. Nginx is really good at that. That's what we're doing, but that's a headache that you'll probably run into. And then applications that might have a mix of thick clients and web clients that might hit the same endpoints are going to be really hard to deal with. But not everything is terrible. 
I want to leave you with some things that I do like. So oh, they're all appearing at once. But if a web application can live behind a web-- oh, no. There is a missing smiley face there. If a web app can live behind a load balancer, it's probably going to be really easy to use IAP with it. And the Google Front End really helps us with having performant access to these apps. The session initiation is really fast. And the premium network is definitely better than the general internet. So when we have users coming from very far, it helps us a lot. And then the information that we get in the JWTs is very useful. And I'm very excited about the future state of context-aware for IAP, G Suite, and SAML so we have consistent access to everything. AMEET JANI: Thank you. [APPLAUSE] [MUSIC PLAYING]
Info
Channel: Google Cloud Tech
Views: 11,604
Rating: 4.7701149 out of 5
Keywords: type: Conference Talk (Full production);, purpose: Educate, pr_pr: Google Cloud Next
Id: Sq9gp8KBsY0
Length: 46min 24sec (2784 seconds)
Published: Wed Apr 10 2019