Challenges & Opportunities of Multi-Cloud Adoption

Captions
Hi, my name is Armon, and what we're going to talk about today is, as we make that journey from the private data center to operating in a multi-cloud world, where we have our on-premises environment plus multiple cloud providers, what changes in terms of how we deliver our infrastructure for all the different people involved, whether that's IT operators, networking teams, security teams, or developers.

So when we talk about multi-cloud adoption, what we often see is that the starting point is still the private data center. Most organizations have existing infrastructure running within their own four walls, and as they transition to multi-cloud, in the short term that's not going anywhere. We still have our private data center, but in addition we're going to start layering in some of our preferred cloud partners: we might add AWS and Azure and GCP, and in China we might have Alibaba Cloud. So we start to add these additional targets on top of what we already have.

One challenge as we go through this journey is the technology replatforming. In our traditional data center we were largely homogeneous: likely heavily open source with OpenStack, or more of a proprietary platform with VMware, but largely we see a lot of standardization around one platform within the private data center. Versus, as we now talk about being in this multi-cloud reality, each of these platforms is API-driven, but with a different set of APIs. We can't really dictate to our cloud vendors, and there's really no standardization. So one challenge as we do this replatforming is just the diversity of the platforms themselves and the fact that all of them have different APIs.

The other piece, which in some sense is more challenging, is the process shift. The private data center is traditionally a very ITIL-driven model: our teams are organized around specific technologies. We might have our VMware team and our firewall team and our load balancer team, and the experience of interacting with this is that we file tickets across each of them. As an example, let's say we only had three key teams: our VMware team, our F5 team for load balancing, and our Palo Alto Networks team for firewalls. We'd first file a ticket against the VMware team and say "I'd like a VM," wait some number of weeks until that's provisioned, then create a new ticket against our load balancer team and wait a few weeks until the load balancers are updated, and then finally file a ticket against our firewall team and wait a few weeks until that gets provisioned. So the experience as a consumer is that we wait weeks to months until we get all the way across this process, and that's really our time to value, because until we get through the entire pipeline it's not useful. It doesn't help that I have a VM if no traffic is reaching it from the load balancer and none of the firewall rules are open; I have to progress through the whole pipeline before this is actually useful.

So as we talk about transitioning to a multi-cloud world and embracing these other platforms, very few organizations are saying "I want to keep the existing process, I just want to support multiple platforms." In practice, what we see is that most organizations want something much closer to a self-service experience.
You might call this a number of different things: self-service, DevOps, CI/CD. But really it's this notion that in the old model our groups are not actually empowered (you file a ticket, you wait, and you have to orchestrate across many different groups), versus the new model, where we ask how we empower the end development teams to deliver their applications without waiting weeks to months. In our view, the combination of these two shifts in some sense breaks everything about how we do traditional IT. So for us it's really about thinking through the key personas involved in this transition and what changes for each of them.

At the base of this are our IT operations teams and their challenge as they provision infrastructure. When we talk about provisioning infrastructure, it's easy to only think about it in the day-one context: I don't have anything running, I want to provision my initial set of VMs or servers or containers. But when we talk about provisioning we really mean the full lifecycle. It's not just day-one provisioning; it's day-two patching and upgrades, scaling up and down, deploying new versions, and finally, as you get to day N, decommissioning. So it's the full lifecycle when we talk about provisioning here.

The next challenge is for our security teams, and increasingly, as they work with our ops teams, it's how we secure all of our infrastructure. Securing infrastructure really has a few different layers. Yes, it's the underlying infrastructure: access to our VMs, access to our databases. But it's also the higher-level application: how do we provide credentials like database usernames and passwords, API tokens, and certificates to the apps themselves? And it goes all the way up to data protection: if our application needs to encrypt data, or needs certificates, or needs any way of managing data at rest and data in transit, there's a data security aspect here as well. So how do our security teams plug into that, especially as we span multiple environments?

The next challenge is for our networking teams. Historically, when everything was on-prem, we owned our own networking appliances. We bought the Cisco gear and the F5 gear and the Palo Alto gear, and we had strong control over all of the physical infrastructure. Increasingly, in these cloud environments, we don't: we can't buy our Cisco device and ship it to AWS; we get whatever network the cloud provides. So how do we solve a lot of the same networking challenges without having the same level of control over the actual hardware? Again, this used to be only our networking teams' concern, but it increasingly overlaps with operations, and it's really about the connectivity challenge across our different applications and infrastructure.

The final layer impacts our developers. If I'm a developer, what I really want to know about is the runtime I use to deploy and manage my application. Within each of these groups there's a transition that needs to take place as we go from the traditional private data center, which was largely ITIL, to this more self-service multi-cloud platform. Let me highlight each of these briefly.
When we look at the IT ops layer, the key shift is that they used to get a ticket and then point and click in some console, let's say VMware in the case of the compute team. You could replace this with the same thing for the F5 team or the Palo Alto team; you still file a ticket, it's just a different system that it goes to. The transition really needs to be that we no longer take a ticket-oriented approach, because it won't scale. If my goal is self-service, I don't want my application team to file a ticket; they need to be enabled to do the provisioning themselves. And similarly, our platform is no longer homogeneously VMware or OpenStack; we need to embrace a wider variety of platforms.

So the approach we like to take is to capture everything as code, an infrastructure-as-code approach. What this lets you do is take the specialized knowledge your administrator had, in terms of which buttons to point and click to bring up a VM, capture that knowledge as code, and put it in an automated pipeline. Now anyone who wants a VM can hit that pipeline and provision an identical copy that follows all the best practices, without filing a ticket and having that expert do it manually. It's about getting that knowledge out of people's heads, documenting it in a way that's versioned, and putting it into something that can be automated, like a CI/CD pipeline.

The other side of this is that we want a platform that's extensible. As we embrace other technologies and platforms, we don't want a different automation tool for every single technology; otherwise we end up with five different sets of tools for five different platforms, and that becomes a challenge in terms of maintenance, learning, and upkeep. Instead, we'd like a single platform that's extensible to different types of technologies. The approach Terraform takes is that there's a consistent way, with HCL, the HashiCorp Configuration Language, to specify the configuration as code. You provide that to Terraform, which has a common core, and then on the back end Terraform has what we call providers. Providers might be something like AWS or Azure, or if we're on-premises they might be VMware and F5. These providers act as the glue between Terraform and the outside world, and they're the key extension point as we embrace new technologies. If tomorrow we say, great, now we're going to embrace Alibaba Cloud, we can just start using the Alibaba Cloud provider for Terraform, and it doesn't change anything about our actual workflow. We specify it in the same config and use the same workflow, day one creating infrastructure, day two managing it, day N decommissioning it, but now we can extend it to support other technologies.
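As a minimal sketch of what that looks like, here is a hypothetical Terraform configuration that provisions a single AWS instance; the region, AMI, and tags are illustrative placeholders, and the same workflow would apply with an Azure, VMware, or Alibaba Cloud provider block instead.

```hcl
# Illustrative infrastructure-as-code example: the knowledge of "how to
# bring up a VM" is captured as versioned configuration instead of
# point-and-click steps. Values below are placeholders.
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}
```

Anyone who needs this VM runs the same configuration through the pipeline (terraform plan, then terraform apply) and gets an identical copy; swapping the provider block is what lets the same workflow target a different platform.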
Now, as we talk about the security layer, this is actually the hardest part, and there's a challenge around the mental model we're using for this infrastructure. The traditional approach is very much what we like to call castle-and-moat: we wrap the four walls of our data center in a supposedly impenetrable perimeter, and over the front door, where we bring all of our traffic in, we deploy all of our middleware. So we have our firewalls and our WAFs and our SIEMs and all of our fancy middleware, and what we're asserting is: outside bad, inside good. If we have a large enough network, we might segment it into a series of VLANs and split that traffic out. That was the historical approach.

The challenge is that now we're going to take these clouds and connect them all together into one super network. We might use different technologies for this, a dedicated connection, a VPN overlay, or some other form of networking technology, but effectively we're merging all of these into one large network. Before, we had a perimeter that we trusted to be one hundred percent effective; here, we no longer do. On a good day this perimeter might be 80% effective. So we move from a model where we assume the attacker is outside our network and is thwarted by our front door, to a model where we assume the attacker is on the network, because our perimeter is not perfectly effective. As we go through this shift, it's a huge change to how we think about network security, and it brings a few key things into focus.

One of these is secrets management: how do we move away from having credentials sprawled throughout our estate? Previously, if we had a web app that wanted to talk to the database, the credentials would likely just be hard-coded in plaintext in the source, or sit in a config file, or something like that. Instead, we need to move to a model where we have a system like Vault: the application explicitly authenticates itself, it must be authorized, there's an audit trail of who did what, and if it matches all of those constraints, then it can get a credential that lets it go talk to the database. It's really about applying some rigor and cleaning up these secret credentials from being strewn about the environment.

The next big challenge is data protection, because historically we relied on the four walls. The customer provided us, let's say, some credit card data, we wrote it to our database, and we said we were safe because the database is within the four walls. In practice this was never a good idea, but our security model assumed the attacker was stopped at the perimeter, so it was allowable by that model. Once we say the attacker is inside the four walls, this is a really bad place to be. Classically, we might have done something like transparent disk encryption, so that the database encrypts the data on the way out to disk, but in practice this doesn't protect us against attackers on the network: if I'm on the network and can say "select * from customers," the database will transparently decrypt the data on the way back from disk. Instead, you need to think about data protection as something that's not invisible to the application. When the application gets the credit card data or social security data, it interacts with Vault as a piece of middleware and says "please encrypt this data," and that encrypted data is what gets stored back in the database. In this sense the application is aware of Vault and uses it as part of the request flow to encrypt, decrypt, and manage data at rest. There's a lot more here, but it's really about the primitives we now have to provide our developers to protect data at rest.
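As a rough sketch of what that authorization looks like, here is a hypothetical Vault policy (Vault policies are written in HCL) attached to the web application's identity; the mount paths and key name are illustrative and depend on how the secrets engines are actually mounted.

```hcl
# Illustrative Vault policy for the web application. It may read its
# database credential and use the transit secrets engine to encrypt
# and decrypt customer data, but nothing else. Paths are placeholders.
path "database/creds/webapp" {
  capabilities = ["read"]
}

path "transit/encrypt/customer-data" {
  capabilities = ["update"]
}

path "transit/decrypt/customer-data" {
  capabilities = ["update"]
}
```

The application authenticates to Vault, is granted this policy, and every encrypt or decrypt call is authorized and audit-logged; the database itself only ever sees ciphertext.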
The other side of this is data in transit, and when we talk about data in transit, the gold standard is TLS. We want to use TLS to secure the traffic between our different applications and services on the inside. The challenge with getting good at TLS, though, is that we have to get good at managing an internal PKI infrastructure. When we talk about TLS, the requirement is that we get good at PKI, and what we see with most organizations is that they're not very good at PKI. You might generate a certificate that's valid for five or ten years, check it into a system like Vault, and treat it like a secret that should be managed for many years at a time. And what always ends up happening is that the cert runs in production for some period of time, eventually it expires, and the service goes down in production. You get a sev-one outage because the certificate expired. Our view is that it's really, really hard to get good at a thing you do once every ten years: if you rotate certificates on a five-year or ten-year basis, it's almost inevitable that you'll have these kinds of outages, because it's just not something you exercise frequently.

The way we tend to solve this is to think about it more like logrotate. You don't take sev-one outages because of logrotate; it just happens every night. So how do you make that possible? The key is to stop thinking about Vault as merely holding on to a set of certificates that live for a really long time, and instead think about Vault as a programmatic certificate authority. What we're really saying is that the web server is allowed to come in at any time and request a certificate for, say, www.hashicorp.com. What we're not specifying is the exact certificate the web server is going to get, the particular set of random bits; we don't care what the value is. What we're saying is that the web server is authorized: we care about the identity of the service and what it's authorized to do, not the specifics of the value. That enables Vault to programmatically generate the certificate. The web server comes in and says "give me www.hashicorp.com," Vault generates and signs a brand new random certificate, and that certificate is only valid for 24 hours, or 7 days, or maybe 30 days. This flips the model: instead of certificates that you create and manage for years at a time, what you manage is the authorization, the fact that the web server is allowed to request a certificate, and then we treat it like logrotate. Every 24 hours it fetches a new certificate and rotates to it, rather than checking out a very long-lived cert and using it until it expires. You flip the paradigm and use automation, rather than a manual remediation process when a certificate expires.
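As a sketch of the authorization side, here is a hypothetical Vault policy for the web server; the mount path and role name are illustrative. The PKI role behind that path, configured separately by the security team, would pin the allowed domain names and a short TTL such as 24 hours, so each issued certificate is ephemeral.

```hcl
# Illustrative policy: the web server may request certificates from the
# "web-server" PKI role, but cannot read or change the CA itself. Which
# certificate (and key) it gets back is Vault's concern; the role behind
# this path constrains the allowed names and keeps the TTL short.
path "pki/issue/web-server" {
  capabilities = ["update"]
}
```

Issuing a certificate is then just a write to that path (for example, something like `vault write pki/issue/web-server common_name=www.hashicorp.com`), which the application or a helper can repeat every 24 hours, exactly like a nightly logrotate.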
This flow, taking the identity of an application and mapping it to a set of authorizations while ignoring the actual value, is what Vault refers to as a dynamic secret, and the pattern applies to more than just certificates. Another analogy: the web server is allowed at any time to ask for a database credential, or to ask for, let's say, an AWS S3 token that lets it read and write from S3. Once we start thinking about it this way, we don't know what the database credential is and we don't know what the AWS IAM token for S3 is; instead, we manage the fact that the web server is authorized to request these credentials, and on demand, when that request is made, Vault generates a new dynamic credential that's short-lived, valid for, say, 24 hours for S3 or 30 days for the database. So we start to think about credentials not as things we manually manage, but as things Vault creates dynamically and that are ephemeral in the environment. A credential exists for some period of time, and when it no longer needs to exist, Vault shreds it and moves on to a new set of dynamic credentials; we don't necessarily ever know what those credentials are.

You can then extend this pattern to the notion of what we call identity brokering. The challenge we're trying to solve is that as we go to a multi-cloud world, we have different notions of identity in each environment. In the private data center we might use Active Directory to provide a sense of identity; in AWS we have IAM; in Azure we have Azure AD credentials; and so on. Each environment has a different notion of what identity means. So if we have an application that needs to work across them, for example an application that runs in the private data center but needs to read and write data from S3, or an application in AWS that needs to read and write from Azure's blob store, how do we broker those identities? I want to trade in my AD identity and get an IAM identity, and that's where this mechanism comes into play. It might be that we use Active Directory to authenticate an application as being a web server, and then authorize it to follow this path to request an Amazon S3 credential. In this way Vault acts as an identity broker: it accepts a client that authenticated against Active Directory, and that client is then authorized to request an S3 credential, allowing us to broker between different platforms.
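A minimal sketch of that brokering, assuming an Active Directory / LDAP auth method is mounted and an AWS secrets engine role named something like "s3-reader" has been configured: the hypothetical policy below would be attached to the web server's directory group, letting it trade its AD identity for a short-lived AWS credential (all paths and names are illustrative).

```hcl
# Illustrative policy attached to clients that authenticate through the
# LDAP / Active Directory auth method. Reading this path asks Vault to
# generate a fresh, short-lived AWS credential scoped for S3 access;
# Vault revokes it automatically when its TTL expires.
path "aws/creds/s3-reader" {
  capabilities = ["read"]
}
```

The client never holds a long-lived AWS key: each read returns a new credential whose lifetime is set on the role, so the thing being managed is the authorization, not the secret value.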
So these become some of the challenges security teams have to think about. Before, it was: how do we lock down the four walls and deploy a bunch of network controls centrally, so the traffic coming in is filtered and vetted and trusted? We asserted that the perimeter was the point at which we stopped our threats, and the inside was soft. Now, once we accept that the perimeter is only 80% effective and the attacker is on the inside as part of our assumption, these pieces come into scope. We don't trust credentials sitting in plaintext everywhere, so how do we apply secrets management? We don't trust that the database being behind the four walls is sufficient, so how do we encrypt data at rest and in transit? And how do we move toward ephemeral credentials? Because if our application logs a credential to disk, or it leaks through an environment variable, an exception traceback, a monitoring system, and so on, these should not be credentials that are valid for days, weeks, months, or years. Instead, they become dynamic things that are constantly being shredded and rotated, ephemeral in the environment.

As we move up a layer and talk about the connectivity challenge, the challenge for our networking teams is, first, that they don't control the network anymore: in these environments the network is defined by the cloud providers, and it is what it is. Second, they need to work more closely with operations teams, because we're trying to go a lot faster. It used to be acceptable that it took weeks or months to update the load balancers and firewalls; that's not acceptable if we're trying to deploy five times a day.

If we look at the classic network, I have a service A that wants to talk to a service B, but to do so it transits past the firewall and through a hard-coded load balancer, and the load balancer brings us back to B. So if we're deploying a new instance of B, we file a ticket against the load balancer team and a ticket against the firewall team and ask that the network path be updated so traffic flows correctly. This tends to be manual and it tends to take time. The approach we need to move toward is, first, how do we automate these updates, and second, how do we deliver these functions, authorization in the case of the firewall (what we're really saying is that A is authorized to talk to B) and routing in the case of the load balancer, without depending on hardware?

The first-level answer is to stand up a central registry, and this is the goal with Consul: when an application boots, when a new instance of B comes up, it gets populated in the registry, and we have a bird's-eye view of everything running in our infrastructure. That lets us drive downstream automation. We can use the registry to run updates against our firewalls and load balancers, and even to inform our clients, so we don't have to do manual updates. When a new instance of B comes up, we're not manually updating the load balancer; the load balancer is simply notified that there's a new instance of B, adds it to the back end, and starts routing traffic to it. Same with the firewall, same with our downstream services.

The next piece is asking whether we can take this whole middle layer and shift it out of the network entirely, and this becomes the classic service mesh approach. To do that, you really are shifting the networking challenge out of the network: if application A wants to talk to application B, we deploy a series of intermediate proxies, which might be something like Envoy running on the machine. A talks out through its proxy, that proxy talks to another proxy, and that brings us back to B. The key is that the outgoing proxy is now responsible for figuring out which instance of B to route traffic to, so our routing and load balancing decisions shift to the outgoing proxy. Again, we're moving this out of the network and to the edge; it now runs on the node with a proxy, but serves that same routing function. Similarly, on the other side, we're filtering who's allowed to talk to us: instead of depending on a firewall in the middle of the network, when traffic comes in we make a yes-or-no decision, are you allowed to communicate with me? In effect, we've moved the authorization decision out of the network and onto the edge. This is done by asserting that in the middle we're going to speak mutual TLS, which gives us two nice properties: one, we have a strong notion of who's on both sides, we know this is A and this is B; and two, we're encrypting all the data over the wire, so we get that for free with TLS.
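As a rough sketch of how the registry and the mesh fit together in Consul, here is a hypothetical agent service definition that registers one instance of service B in the catalog, with a health check and a sidecar proxy; the name, port, and check details are illustrative.

```hcl
# Illustrative Consul service definition for one instance of service "b".
# Registering it makes it visible in the catalog (which can drive load
# balancer and firewall automation) and requests a sidecar proxy so it
# can participate in the mesh.
service {
  name = "b"
  port = 8080

  check {
    http     = "http://localhost:8080/health"
    interval = "10s"
  }

  connect {
    sidecar_service {}
  }
}
```

The authorization itself is captured as an intention (for example, something like `consul intention create a b`), which the sidecar proxies enforce at the edge over mutual TLS instead of a firewall rule sitting in the middle of the network.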
So I think the big shift at the connectivity layer is that we move up from thinking about layer 3 and layer 4, about IPs, to thinking about services: service A wants to talk to service B, or wants to talk to the database. But we also need to be much more reactive to the application; it can't take days or weeks or months for network automation to kick in. The way we solve that is by treating something like a registry as a central automation point: when applications get deployed they publish to it, and we consume from that registry to do things like network automation.

The final layer is the developer experience at the runtime, and here there's a huge amount of diversity depending on the problem we're solving. We might use Spark for some of our big data, or a Hadoop data platform; we might use Kubernetes for long-running microservices; we might use our Nomad scheduler. There's a variety of tools here, and what you find at the runtime layer is that you pick the right tool for the right job: if I have a Spark-shaped problem, that's what I should use. In some sense, the challenge for developers is how to learn these new platforms, the state of the art, whether that's Spark or Kubernetes or Nomad, but ultimately their focus on writing and delivering applications is the same. What we're trying to get to, for all of these groups, is exposing these functions in a more self-service way to the developers.

So I think this becomes the core challenge of the multi-cloud journey: how do we move from an ITIL-driven world, largely on-prem, to a self-service world that operates across these environments? There are key changes for IT operators, security teams, networking teams, and development teams. I hope you found this video useful. What I'd recommend is to check out HashiCorp.com, and particularly our cloud operating model white paper, which covers these four layers and why they're in transition as we go through the multi-cloud adoption journey. If any of these subproblems sounded relevant or interesting, please feel free to reach out and engage with how we might be able to help. Thanks.
Info
Channel: HashiCorp
Views: 5,712
Rating: 4.9 out of 5
Keywords: Multi-cloud, Multicloud, Cloud, Cloud Computing, Microservices, Terraform, HashiCorp, HashiCorp Nomad, Nomad, HashiCorp Terraform, HashiCorp Vault, Vault, HashiCorp Consul, Consul
Id: jp7sOvo1a6Y
Length: 27min 20sec (1640 seconds)
Published: Mon Jun 24 2019