Kubernetes Networking at Scale - Laurent Bernaille, Datadog & Bowei Du, Google

Captions
All right, so this session is Kubernetes Networking at Scale. I am Bowei, I work at Google, and I'm Laurent, I work at Datadog in the team responsible for operating Kubernetes clusters; we run pretty big clusters with a whole range of networking issues, and this is some of the feedback we can give.

OK, so to give a concrete data point for what we're going to talk about in terms of scaling challenges: we're talking about thousands of nodes, dozens of clusters, each cluster having tens of thousands of pods and thousands of services. In terms of applications, we're looking at applications that process, say, trillions of requests and gigabytes of data per second. In terms of topology, we're talking not just about a single cluster or communication within a single cluster, but communication across clusters and between VMs and clusters, mixing them. And finally, in terms of latency, we would like the lowest latency possible for these workloads.

Now, at a high level, looking at scaling challenges we need to consider a couple of things. First, in the data plane you really want to remove as many inefficiencies as possible: Kubernetes offers a whole bunch of abstractions, but we really want to reduce the cost of those abstractions. In the control plane, when you're scaling up your cluster, you will hit these large-N issues, where N is the number of nodes or the number of resources. In addition to the large-N issues, you will also want to simplify your architecture; simplifying your architecture really means reducing dependencies and making it easier to debug. In general, we have found that the strategy to do this is to enable more native integration with the infrastructure that you're running on. Now let's see what this means in practice: we'll talk about pod networking, we'll talk about service load balancing, we'll talk about L7 load balancing, which is ingress, and then finally we'll discuss DNS.

So first, let's talk about pod networking. Pod networking is the way to give interfaces and IP addresses to pods, and there are many different ways to do it. The simplest one, which many people have tried when they get started, is to use static routes. The way it works is you give every node in your cluster a CIDR, then you use the cloud provider routing table and add a static route for each CIDR pointing to its node. This is very simple and works fine; however, there are limits on the number of static routes. For instance, by default on AWS you can't have more than 50 static routes, which means you can't have more than 50 nodes. Also, and we'll dive into this later, it's not very efficient in terms of address space, because if you give a /24 to every node and you only have 10 to 20 pods per node, it's not going to be very efficient: you're going to consume a lot of address space and it's going to be tricky to manage.
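To make the per-node CIDR idea concrete, here is a minimal sketch of what the allocation looks like on a Node object when the default node IPAM assigns pod ranges; the addresses are made up, and the static-route approach simply mirrors this allocation into the cloud route table:

```yaml
# Hypothetical node: the controller manager assigns each node a pod CIDR,
# and pod IPs on that node are taken from this range.
apiVersion: v1
kind: Node
metadata:
  name: node-1
spec:
  podCIDR: 10.244.1.0/24   # example range; a /24 allows ~250 pods
# With static routes, the cloud route table would then contain an entry
# like "10.244.1.0/24 -> node-1" so other nodes can reach these pods.
```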
Another option is to use overlays; this is, I think, where most people are at today. By overlays I mean, for instance, Calico or Flannel. The way it works in that case, you also allocate a CIDR to each node, but instead of relying on the cloud provider routing to get traffic from one node to another, you create an overlay, which means all these CIDRs allocated to nodes are virtual CIDRs: the underlying network doesn't know anything about them. The way it works is you do tunneling, using either VXLAN or IP-in-IP. This works fine and is very convenient, because pod routing is going to work regardless of the underlying infrastructure, which makes it very easy to set up, and that's why it's very popular to do it this way. However, there is some overhead, because you need to encapsulate packets and add the additional header to them, and distributing the routes is sometimes a bit tricky.

Here is a typical example of how that would work: your pod gets an interface, a veth interface connected to a bridge, and then you have a VXLAN interface connected to this bridge that actually gets traffic out of the node. In this example, traffic goes from one pod to a pod on another node, and on the wire between the two instances you see traffic between the node IPs: the original layer-2 frame sent by the pod is encapsulated inside the VXLAN header. As you can see, there is a cost, because you carry all this additional information in the packet, and creating the VXLAN frame also consumes kernel CPU.

Another issue is the control plane. Having overlays is fine when your number of nodes is not that big, but at a certain scale it starts to be tricky. As you can see in this example of a typical overlay deployment, you have an agent on the node that configures routes and tunnels (VXLAN in that case), and this agent needs to get information on the cluster, so it connects to the API server, and quite often it also needs a data store to keep all the information it needs, quite often etcd. As your number of nodes grows, the load all these agents put on the data store and the API server is non-negligible and can become an issue.

So in terms of best practices, we think everybody is going to move to native pod routing, which means avoiding overlays. In this case we want the pods to have routable IPs on the network, so no overlay: it's much easier to debug, it's simpler, and it also allows for a flat network, which means communication between pods in different clusters, and even from VMs outside of your cluster to your pods, is going to work, because everything is routable. Another optimization that we're seeing more and more, and that we rely on heavily at Datadog, is removing bridging: the typical installation is the one I showed you before, where all pods are connected to a bridge and then routed outside the host, but when you think about it you don't actually need this bridge; quite often you can just rely on the host itself, and this is what we do most of the time. Another optimization that we see quite often is IPVLAN, which is faster than doing routing. If you remember the slide from before, you can see that this one is much simpler, because in that case all pods have routable IPs, they are in a different space than the IPs of the nodes, but everything just works and it's much more efficient.

So this is the principle, and now let's talk about how this is done in practice. Very quickly, on premises: sometimes you're very lucky because you can actually control the network, and you can establish BGP sessions between your nodes and the network, which means the network is going to know about the CIDR allocated to each node; it's going to be transparent, it's just going to be another CIDR.
In cloud providers it's slightly different, so let's first talk about how it's done at Google. In Google Kubernetes Engine, or if you run Kubernetes directly on GCP, the old way used to be that we create these static routes. Now, as we discussed before, the static route functionality is actually fairly complex, and you can easily run into quota issues, because it's somewhat resource-intensive to implement all the different features there. What we have added is a VPC-native mode. In VPC-native mode you have a pre-allocated range for the pods, and from this range the cloud allocates the IPs as alias ranges to the VMs, and the cloud manages this IP allocation for the cluster. This is much more efficient for the underlying SDN, and in fact you are telling the SDN about the pods, so the SDN knows that there is this range allocated for pods, and it also knows about the allocation to the pods themselves.

So let's see how this works. In classic Kubernetes you have a range that's devoted to the VM IPs, you have the pod IP allocation, and then you have the service IPs. With the VPC-native mode, the cloud will allocate these /24s to the various nodes that come up in your cluster, and in fact, because we have delegated IP allocation to the cloud, we can do interesting things like have the cloud control the size of the allocation to various nodes. This actually becomes a scalability issue as well: I'm sure everyone has noticed that when you create a cluster with the default settings, Kubernetes is quite greedy in terms of eating up your IP addresses. For instance, with the default settings, 1k nodes with a /24 each is a thousand times 256, which is 256k IPs on your network, and we find that a lot of people, when they try to stitch their networks together, actually run out of space. With cloud-mediated IP allocation we can say: hey, I don't need that many IPs, this node is going to run a very specialized workload, give it fewer IPs. So you can right-size it for your node footprint, and in general we hope, as future work, to get away from these contiguous ranges and go to discontiguous ranges that can be merged together, so clusters can fit into your network footprint.

Another interesting thing about VPC-native, cloud-SDN-integrated pod routing is that there's an opportunity to give the SDN full visibility of your traffic. When you create a VPC-native cluster, when a pod talks to another pod from node to node over the SDN, the VPC is able to see the flow, but when a pod talks to a pod on the same node, that flow is not visible to the SDN. Well, you can play some tricks to basically hairpin that traffic out to the SDN, and what this means is that all of your pod traffic, whether it's between nodes or within a node, will be visible to the SDN, and you can use the same infrastructure; for example, you can use the same kind of tools on your SDN to observe all the flows in your cluster.

So this was for GCP. On AWS it's slightly different, because there is no equivalent feature in terms of IP allocation: you cannot associate an additional range with your nodes. But there are still some CNI plugins that allow you to give routable IPs to pods. The first one, which is pretty well known because it's the one provided by AWS and used by EKS, is the EKS CNI plugin.
The way this plugin works is you attach additional ENIs to the nodes and additional IPs to these ENIs, and these IPs become the pod IPs. In terms of workflow, when the kubelet and the runtime create the interface and the network for a pod, the CNI plugin first calls a daemon that is responsible for allocating ENIs and providing IPs, and then the runtime creates the interface, associates this IP with it, and does some clever routing on the host to make sure that traffic from IP 1 on pod 1 actually goes out through the right interface, eth1 in this example.

An alternative plugin is the one developed by Lyft, which is very similar in terms of design, because it also attaches additional ENIs and additional IPs. It's a bit more complicated because they did some optimizations. The first difference is that it's daemonless: you have an IPAM binary, and this binary is responsible for manipulating ENIs and allocating IPs. The main traffic for the pod uses an IPVLAN interface; the goal is to be much more efficient than plain routing, because with IPVLAN you don't have to cross the same parts of the kernel, so it's going to be faster. This works pretty well, but sometimes pods also need to reach IP addresses on the node itself, such as the kubelet; in that case you need an additional interface, and that's why there is this plugin called PTP, which creates an interface to the host. So this works pretty well, we've been using it for more than a year now and we've been very happy with it, but honestly it's a bit complicated to debug sometimes.
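As a concrete illustration of the ENI/secondary-IP model, here is a rough sketch of how the AWS VPC CNI daemon is typically tuned to keep a pool of pre-allocated addresses (this also relates to the EC2 API rate-limiting issue that comes up in the Q&A at the end). The environment variable names and image tag are from memory, so treat them as assumptions and verify against the plugin's documentation:

```yaml
# Hypothetical excerpt from the aws-node DaemonSet pod template
# (AWS VPC CNI plugin). The daemon attaches extra ENIs to the node and
# assigns secondary IPs to them; those secondary IPs become pod IPs.
spec:
  containers:
    - name: aws-node
      image: amazon-k8s-cni:latest      # placeholder tag
      env:
        - name: WARM_ENI_TARGET         # keep one spare ENI attached and ready
          value: "1"
        - name: WARM_IP_TARGET          # keep this many secondary IPs pre-allocated
          value: "5"
```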
As a conclusion on this part: native pod routing is very efficient, and we're pretty sure that everybody is going to go this way in the future; you don't pay the cost of the overlay. The CNI plugins are still a bit young, but they're getting there; we've seen a lot of progress in the last year. As I was saying before, it allows traffic between clusters and traffic with VMs, so it's a lot easier, and finally it allows ingress traffic to be managed much more efficiently, which we'll discuss later.

OK, so let's talk about service routing. When we talk about service routing, we instantly get into the land of kube-proxy. To give a sketch of how kube-proxy works: on every node there is a kube-proxy instance, and it watches the API server. Associated with this whole service mechanism is a set of resources controlled by the endpoints controller and the service controller; these controllers are responsible for looking at the service resources, assembling all the relevant endpoints, and writing them into etcd, where they can then be consumed by kube-proxy. Kube-proxy implements a client-side load balancer on every node: here you have the client sending traffic, and there's this proxy box, which in the most typical implementation, with iptables, is actually just the Linux kernel, and it load-balances traffic to the various pods. Now, the original implementation of kube-proxy was in user space; this is old history. Most people use the iptables-based kube-proxy: it's been the default since 1.2 and it's way faster than the user-space one, but we'll see that it has some problems. And then there's an IPVS implementation that went GA in 1.11; it uses a different kernel mechanism, but it still relies on iptables for some things. It's faster than iptables and scales better, and Laurent will talk about their experience with IPVS.

So in terms of iptables, what is it doing? It has to make a load-balancing decision on the client side, and to keep the traffic flowing it has to handle two cases: it has to take the outgoing traffic, find a backend, rewrite the service IP to the backend IP, and also handle getting the traffic back on the reverse path. The way this works, I won't go into detail, but it's a lot of iptables chains, and generally speaking the iptables rules are basically a bunch of if statements chained together in a linear list: you first have to match a service, then select a backend, and you have to traverse all these tables. Generally speaking it's a long chain, and especially when you have a lot of services and a lot of backends, these chains grow linearly with that.

So what are the challenges? Well, a couple, and I hinted at them in the previous slide. The iptables kernel interface requires a sync of all the rules in one shot; this is just how that kernel interface works, so you have a long sync time. This used to be a big problem, but some kernel fixes went in and it has gone down to seconds; still, you have to pay that cost, and there's significant memory usage, because even if you're changing one thing you have to generate the whole list of rules, which in a large cluster may be over 100 megabytes, sync it, and then continue. The second thing is that iptables uses conntrack, which leads to a lot of right-sizing issues: how much memory do you have to reserve for conntrack? We have seen this be especially challenging when you have DNS traffic, which consumes a lot of conntrack entries. Finally, as a future-proofing concern, whatever you see in Kubernetes today in terms of load balancing is basically everything you can do with iptables; it's very hard to conceive of adding additional functionality.

When we started our journey with Kubernetes, we knew we were going to be big in terms of number of nodes and number of endpoints, and we knew we were going to have issues with iptables, so from the get-go we decided to try IPVS, because we assumed it was going to be a lot faster and a lot more efficient. In terms of design, the logical design is the same one as before, and in terms of mapping of the components, a Kubernetes service is mapped to a virtual server in IPVS, and a pod is a real server, which is a backend of a virtual server. As we were saying before, the main advantage here is that updates are atomic: if you delete a pod, you only update the backend in a virtual server, so it's much more efficient. In terms of how it looks, this is the output of the ipvsadm command, which shows you what's configured in IPVS; you can see a service IP for an HTTP service and two backends using port 5000 in this example.
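For reference, switching kube-proxy to IPVS is a matter of its configuration; here is a minimal sketch of the component config, with only the fields relevant to the mode switch (double-check field names against your kube-proxy version):

```yaml
# Minimal kube-proxy configuration sketch enabling IPVS mode.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"          # default is "iptables"
ipvs:
  scheduler: "rr"     # round-robin; other IPVS schedulers can be used
```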
IPVS has been working pretty great for us; we've been using it for more than a year, as I was saying, but there was one thing that was very important and made things a bit tricky over the last few months: IPVS used to not support graceful termination at all, which meant that as soon as you delete a pod, or a pod becomes not ready, it's removed from the backends, and no traffic flows to this pod anymore. When you have an existing connection open, you would want some time, for example with HTTP, for the communication to finish, which is a lot better. Graceful termination was introduced in 1.12 and backported to 1.11, and the first version of it had a few bugs, but we're getting there, it's a lot better now. So if you have any issues with IPVS, please let us know, open issues on kube-proxy and we will go fix it; we're really committed to fixing it.

One thing that's important with IPVS is that we have double connection tracking: IPVS has its own connection tracking system, and you also rely on the conntrack module. It's a bit tricky because, as Bowei mentioned with the conntrack sizing and timeout issues before, now we have to do it twice, in IPVS and in the normal conntrack. And the default timeouts for IPVS are not great: the figures there are for TCP, for TCP FIN, which is when a connection is terminating, and 300 seconds, so five minutes, for UDP. As you can imagine, keeping a connection in the tracking table for five minutes when it's a DNS query is a very bad idea. We also had a few issues with CNI, because most CNI plugins are tested with iptables, and routing with IPVS is slightly different, so we noticed some weird interactions; if you want to try IPVS, make sure that you test everything with your CNI plugin.

So kube-proxy is one implementation of client-side load balancing; there are several others, and kube-router is one of them. Kube-router does a lot of things, if you know it: it also allows you to control routing between your pods, and it comes with service load balancing using IPVS, so it's a different implementation. Another option would be to use Cilium, which uses eBPF to do client-side load balancing, and has many other features in addition to that.

OK, just to wrap up on service load balancing, there are a couple of other exciting things going on. The first is an implementation of topology-aware service routing; what this means is you can express things like: load-balance first to my node, and if that's not available, then spread it out across the cluster. That's the link to the KEP, please take a look at it; it's pretty much implementable, but we really would like feedback. The second is, I'm sure if you run a big cluster with big services, you'll notice that eventually your Endpoints object becomes super massive and no longer fits in etcd. EndpointSlice is a project to take that Endpoints object and shard it among several endpoint slices to get around this scalability issue, so please take a look at that KEP as well; that is an ongoing design.

The third topic we're going to talk about is ingress traffic. The first way to get traffic inside a cluster is to use a LoadBalancer-type service. This is how it works: you create a service of type LoadBalancer, the service controller in the masters provisions a load balancer on your cloud provider, and this load balancer sends traffic to NodePorts. The typical flow is: traffic goes to the load balancer, hits the NodePort on a node, and then kube-proxy takes care of forwarding the traffic to the actual pods running the application. It seems fine when you see it this way, but first, it's not very efficient, because it's very unlikely in a large cluster that you're going to hit a node where the service pod is actually located, so you're going to first hit a node and then be routed again to a node where the pod actually is. Another issue is that if you have a large cluster with very different workloads, for instance a Kafka or Cassandra cluster, you wouldn't want those nodes to get a part of the web traffic, because it's inefficient and you wouldn't want this traffic to impact a very critical data store.

An option is to use externalTrafficPolicy: Local, in which case kube-proxy only sends traffic to local pods, and it fails the health checks if there's no pod of the service on the node. This is much better than before because you don't have the additional hop; however, you still have a few issues. The first one is that if you have hundreds or thousands of nodes, all these nodes need to be registered with the load balancer, and there are limits on the number of nodes you can register with a load balancer: to give you an idea, on AWS this limit is 1,000, there's no way you can attach more than 1,000 instances to an ELB. Another issue is that health checks are still going to be sent to all the nodes, which means a lot of traffic.
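To make this concrete, here is a minimal sketch of a LoadBalancer service using the local traffic policy; the names and ports are illustrative:

```yaml
# Hypothetical web service: the cloud load balancer health-checks each node,
# only nodes hosting a ready pod of this service pass the check, and traffic
# is delivered to those pods without the extra kube-proxy hop.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # only route to pods on the receiving node
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```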
So far we were talking about LoadBalancer services, that is, L4 ingress. Another way to get traffic inside a cluster is to use L7 ingresses, the Ingress object in Kubernetes, and it's actually exactly the same implementation when you look at it: the only difference in this slide is that these objects are not controlled by the service controller on the master but by an ingress controller, and it's exactly the same flow and exactly the same issues. Another option would be to use nginx or HAProxy as a proxy, and in that case the right part of the slide is much more efficient, because these proxy pods are configured to route traffic directly to the relevant pods, so the traffic between the proxy and the pod is very efficient; but as you can see on the left part of the slide, you still need to get traffic to these proxies, so you usually need a LoadBalancer service in front, with all the issues we mentioned before.

For this reason, we really believe that the future is container-native load balancing, where the load balancer sends traffic directly to pod IPs. This is of course only possible if your pods have native IPs in the VPC, which is what we were showing before. There are several implementations of this design. On AWS, for instance, you have a controller called the ALB ingress controller, which can be configured to route traffic directly to pods instead of instances, so it's much more efficient: it creates an ALB and target groups directly targeting pod IPs. GCP load balancers can be configured to use network endpoint groups (NEGs), which do exactly the same and route traffic directly to the pods. Proxies also do it, as I was saying before, but of course you still need to put something in front of them, and it's probably going to be based on either network endpoint groups or an ALB in that example.

OK, so you'll notice that these cases cover some of the workloads around L7 ingress, but what about L4 and ELBs? Currently there doesn't seem to be a solution there, and it's also limited to HTTP traffic; we want to consider UDP and TCP traffic as well, so this is a work in progress to get container-native load balancing for all the different types of load balancing that we need.
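As an illustration of container-native L7 load balancing, here is a rough sketch of how the AWS ALB ingress controller is typically told to target pod IPs rather than instances; the exact annotation keys can differ by controller version, so treat this as an assumption to verify:

```yaml
# Hypothetical ingress: with target-type "ip", the controller registers
# pod IPs (routable in the VPC) in the ALB target group, so traffic goes
# straight to the pods instead of bouncing through a NodePort.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: web
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: web
              servicePort: 80
```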
Finally, let's talk about DNS challenges; a lot of people run into these, so this is kind of a summary of the DNS challenges people have seen. You have an application, let's say in Node.js or Ruby, and it does a gethostbyname (well, hopefully the reentrant variant). It has a search path, which means that the number of queries actually gets amplified quite a bit, and we hit the conntrack table, because kube-dns is going to be a service, so we may run out of conntrack entries. Also, DNS is a UDP protocol, and who knows where that UDP packet went, so we might have lost the query. And finally, because of this search path amplification, you may not have enough instances to soak up all this load. So there's a whole set of problems associated with the volume of DNS queries that we see in Kubernetes.

One of the solutions to this problem is to drop a node-local cache into your cluster. The node-local cache runs on every single node, and first of all it acts as a cache, so it can absorb some of the queries right away. The second thing it does is it avoids conntrack: it inserts special rules into your iptables chains to skip conntrack and go directly to the node-local cache. It also upgrades your connections: instead of sending UDP it sends TCP upstream, which is a lot more reliable. And finally, it hopefully reduces the load anyway, so you don't have these scalability problems with the number of instances. Another interesting thing, and this is an ongoing KEP, is: why don't we take care of that search path issue? The node-local cache is a smart agent, we can put some logic in there, and it can coordinate with the upstream kube-dns infrastructure. What if we have the node-local cache insert that search path somehow into the DNS query, perhaps as an EDNS field, and then the whole search path mechanism could be resolved upstream inside the server, and you wouldn't have to send all those packets over the wire?

Now I'm going to describe the way we do DNS at Datadog, and it's a lot inspired by the node-local cache that Bowei mentioned just before. This is the typical way we do DNS at Datadog, a very standard one, the default configuration: the kubelet injects the search domains and the ndots option, which governs when there's going to be an expansion of the searches. This works fine, but when a pod wants to do a DNS query, it contacts the DNS service, it goes through IPVS, and it reaches one of the nodes running your CoreDNS pods; the answer comes from there, and if CoreDNS doesn't know about this name, it forwards it upstream, as we were saying before. This leads to a few issues, and we had quite a few outages with this design: for applications that do a lot of queries per second, the design is pretty inefficient.

So we introduced an alternative that's opt-in: people can just add an annotation on their pod, and we have a mutating webhook that modifies the DNS configuration of the pod. You can see the resulting pod configuration here; we do two important things. The first is that we have a single search domain, which is svc.cluster.local, which means that even within the same namespace, applications need to use service.namespace, they can't use just the service name; that's the reason it's opt-in, because it breaks some of the Kubernetes contracts, but it's much more efficient because most of the time it's going to be a single query. In addition, because we are very opinionated in the way we do search domains, we can go back to ndots equal to 2, which means that if you do a standard query like www.google.com, which has two dots, there's not going to be any expansion: it just queries www.google.com directly, and not www.google.com.svc.cluster.local, which is the typical expansion. We also modify the resolver to point to a node-local DNS cache, a very small CoreDNS instance running locally. This instance does two things: it does caching, so if you hit the same names a lot it's going to be a lot more efficient, and in addition it routes traffic that is not natively resolvable by CoreDNS in the cluster directly to the upstream, locally, so if the query doesn't end with cluster.local it's routed directly to the VPC resolver, in our case. So it's much more efficient, we've had quite good results with it, and we will probably switch from opt-in behavior to this behavior by default, with opt-out for applications that can't work with it.
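A minimal sketch of what such a pod DNS configuration could look like; the nameserver address, search list, and image are illustrative, not Datadog's actual values:

```yaml
# Hypothetical pod spec: a single search domain, ndots lowered to 2,
# and the resolver pointed at a node-local DNS cache address.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  dnsPolicy: None                 # take full control of resolv.conf
  dnsConfig:
    nameservers:
      - 169.254.20.10             # example link-local address of the node-local cache
    searches:
      - svc.cluster.local         # single search domain: use service.namespace names
    options:
      - name: ndots
        value: "2"                # external names with >= 2 dots skip search expansion
  containers:
    - name: app
      image: app:latest           # placeholder
```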
OK, so in conclusion, a common theme is removing the cost of all the abstractions: native integration with the infrastructure is key. A lot of the things we described are about how to get things to route directly, how to remove overlays, and so forth. One thing to observe is that there are many KEPs in flight to improve scalability. And the third thing is that there are many interesting future technologies that haven't been explored yet in terms of the implementation, for example eBPF, or maybe some integration with the service mesh. In general, I don't know if you attended any of the scalability talks as well, but where the community wants to go is to make sure everything scales by default, out of the box. I think one of the scalability talks talked a lot about SLOs: how do we get the implementation to such a state that we can give you an SLO that you can rely on, even when your cluster is 5,000 or 10,000 nodes. With that, we end our talk, and by the way, thanks for staying, because this is the last talk on the last day. I think we have a few minutes for questions, and we're going to stay around anyway for a bit longer.

Question: with the native approach, do you find that very rapid, large-scale scaling up and down of the cluster, and a lot of movement of pods, causes additional problems? This is a bit AWS-specific. OK, so yes; we don't have the issue that much because we don't increase the cluster size very fast, but yes, the way provisioning works, each node locally tries to get addresses, so you can get rate-limited by the AWS APIs. I think this can be improved in both the AWS and the Lyft plugins; it hasn't been a problem for us, but we have talked to people who have had the issue, and I think one of the ways to solve it is to pre-allocate IPs, or a bunch of IPs, to avoid all the instances doing the same thing at the same time. I can answer this question for the network endpoint groups: we actually had an interesting technical challenge in trying to batch things together so they can go fast; the code is open source if you want to check it out. Any other questions?
Question: as mentioned, part of the iptables problem is the ever-present N-squared networking problem, the linear search in iptables. How is this helped by moving that into the load balancer? Does the NEG implementation just use a different data structure that doesn't require the linear search, or do you just scale it out a different way? Well, you don't need that linear search, because it knows the direct backends, so it can use a smarter data structure than just a linear list; that's one answer. I think that's actually a big answer, because with iptables you can use IP sets, but that's not currently the implementation. If you had IP sets, certain aspects of iptables wouldn't be linear anymore, but that just doesn't exist right now. It depends, although we'd have to look at which parts that would be relevant for: the service lookup for the IP could clearly be in a hash table, but with iptables today you have to put it in a list, and there have been a few discussions on how to shard the iptables chains, but I don't think there's any implementation ready.

Question: regarding the same iptables issue, during the presentation it was mentioned that for iptables you need to write a lot of long rules to route traffic through the network between two pods; have you thought about using a separate service for storing all the rules, and have you tried to use Open vSwitch for that? Oh, I see, that's an alternative implementation that I guess we haven't covered; it could completely replace iptables, so in some sense you're talking about a separate implementation. What was your first question? I see, the question is whether we have thought about using a separate service for storing the rules. The issue that was brought up was more that the kernel interface to change iptables is all one shot: you have to generate the whole thing and put it in, even though your change might be very small, and that turned out to be very inefficient, because if you have a lot of endpoints and a lot of services you have to regenerate that thing over and over again. If you cached it, you would still have the problem that you have to push it through to the kernel repeatedly, even though you already have the thing generated. Any other questions?

Question: in the on-prem scenario it's not always an option to use routable IP addresses, and overlays seem to be the only solution, unless you go for another address space with IPv6. So given what you know about Kubernetes at scale, is IPv6 an option? I cannot answer your direct question, but luckily the dual-stack KEP is in pretty good shape, so hopefully things will just work on IPv6 soon. And for the on-prem question, most of the people I know, though I'm really not a specialist, do native routing: they have BGP, so either kube-router or Calico running on the nodes, all of them peering with their BGP routers and announcing routes to the network, in which case you don't need an overlay. If you can't do that, because in many companies you can't actually do that, then you have to use an overlay; but if you can, try to negotiate with the network team to make it happen, it's going to be a lot more efficient. OK. Thank you for a great talk; I have a question: just in case, did you try playing with nftables, this new fancy firewall subsystem?
Yes, so someone mentioned nftables, and it definitely solves some of the iptables problems. I would love to see an implementation, so if anyone is interested, I'm pretty sure people would be interested in it. One of the biggest issues is that the conntrack handling in kube-proxy today is pretty complicated; I've worked a little on the IPVS implementation, and the conntrack behavior established by the de facto default, which is iptables, is very complicated, and getting this right in a new implementation may be tricky, it's going to be a lot of work. So if someone starts it, I'm sure the community would love to try it and use it. But to be honest, nftables is not a load-balancing solution either: it's going to be better than iptables, but it's not going to be a client-side load balancer; the future is probably eBPF-based for this entire thing.

Question: so you've been using IPVS for a year, right? Yes. Is there any reason not to use IPVS now? I mean, it's not the default solution, so what are the drawbacks? Graceful termination was a major blocker, I would say, but now it's apparently solved. I'd say today, for most use cases, IPVS is stable; there are a few issues open upstream, but nothing major in terms of features. There are a few cases where the behavior is slightly different between IPVS and iptables, for instance localhost NodePorts don't work, but most of them do. The only remaining issues are related to graceful termination, and we decided two weeks ago to remove graceful termination for UDP, which is going to solve most of the ongoing issues, and since the iptables implementation does not support graceful termination for UDP either, it's going to be the same behavior. We had a few people who wanted graceful termination for UDP, for QUIC for instance, but it's very hard to get right, so we decided to do exactly as iptables does and remove graceful termination for UDP, which is going to solve most of the remaining issues. And once again, if you try it and have an issue, create an issue on the repo and we can look into it. Yeah, exactly that; with IPVS I know some people had issues because it puts the IP address on an interface, and that ended up capturing traffic given how their CNI was set up, so really it hasn't been battle-tested everywhere yet, it just needs wider usage and then you can get the feedback. Well, it's battle-tested for a vertical slice. Yes, I mean most of the new issues are related to edge cases that we don't know about; for instance, we had a lot of discussions with people from MetalLB, which does pretty clever things that no one had tried with IPVS before, so we had to solve that. It's specific use cases, specific CNI implementations; I'm sure we're going to get it fixed, but we need more users to be sure that we're covering everything.

Question: one more question, you mentioned two CNI plugins earlier, the official AWS one, which is also used by EKS, and you are using the Lyft one. Do you have an elaborate comparison, what are the pros and cons of the two? So we started with the AWS CNI plugin because it seemed to be the most natural thing to do; the first implementation was very young when we started, it had quite a few bugs and we had to fix them, and the interaction with the team developing the plugin was a bit difficult at the time, so we decided to try the Lyft implementation, and it turned out this implementation was pretty solid.
It was also very easy to get things patched upstream, so that's why we went this way. In addition, in terms of design it's daemonless, which we find a lot better because it's simpler to just chain CNI binaries on the host. And honestly, we've been running the Lyft CNI plugin for more than a year now without any problems; well, we fixed a few things along the way. OK, thank you all for coming. [Applause] [Music]
Info
Channel: CNCF [Cloud Native Computing Foundation]
Views: 3,300
Id: MvoImel5qfc
Length: 42min 15sec (2535 seconds)
Published: Fri May 24 2019