Accelerating NFV with VMware's Enhanced Network Stack (ENS) and Intel's Poll Mode Drivers (PMD)

Captions
Hi everyone, my name is Rahul. I'm a software engineer at Intel in the Network Platforms Group, and joining me is Jin from VMware. We are here to introduce vSphere's new ENS stack, the Enhanced Network Stack, aimed mainly at NFV applications.

As we all know, network function virtualization deployments are happening at a rapid pace, and fixed functions that used to run on dedicated hardware are moving into software, as virtual components on general-purpose servers. The key thing to consider is that performance should not suffer because of the consolidation of compute, storage, and network workloads. How do we do that? We try to build an optimized top-to-bottom stack, so that throughput stays high enough to support the NFV workload while latency stays as low as possible. To do that, VMware is launching the Enhanced Network Stack with a new vSwitch, and Intel is proud to be the first partner to release poll mode drivers for this stack.

Going back a little: Intel has been very active in the development of DPDK since its beginning. Back in 2014, if I remember correctly, VMware and Intel jointly presented at the DPDK Summit the benefits of the VMXNET3 driver and model, which overcame the shortcomings of the emulated e1000 model by saving a lot of VM exits, plus a couple of other optimizations. That mainly optimized the guest stack, and we got very good performance in the guest, but that's not enough: the host stack still needs to be optimized to meet the stringent NFV requirements. The current vSphere native drivers operate in interrupt mode with a lot of async calls. By leveraging the same DPDK concepts learned over the years, VMware and Intel collaborated to develop this ENS stack with the new vSwitch, which now polls continuously for packets and gives you the higher performance that NFV applications require.

Moving on to the next slide: these are the essential DPDK components that inspired this stack. We now have a vSwitch that continuously polls for packets, and we provide poll mode drivers for the necessary RX and TX functions, along with the other uplink functions. Intel is introducing the ixgben and i40en ENS drivers for the first time for this stack; these are poll mode drivers, in addition to the existing vSwitch native drivers. Apart from polling, ENS also uses an mbuf structure quite similar to DPDK's, and it uses 2 MB large pages, similar to Linux huge pages. There are other DPDK-like features, such as core pinning and batch processing, used in this stack; Jin will talk about them in detail when he covers the ENS architecture.

From the perspective of a user, a VNF, or a VM, you now have the option of a faster datapath through which to receive and transmit your network traffic, while keeping the same VMXNET3 interface. With that, I'll hand it over to Jin to go over the details of the ENS architecture.
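(As a concrete picture of the polling model Rahul describes, here is a minimal sketch of a DPDK-style poll-mode receive loop of the kind that inspired the ENS datapath. This is illustrative DPDK application code, not ENS or vSphere source: the port and queue ids and the burst size are arbitrary, and EAL and port setup are omitted.)

    /* Minimal DPDK-style poll-mode loop: a dedicated core busy-polls a
     * NIC queue in bursts instead of waiting for interrupts. */
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32               /* batch processing, as in ENS */

    static void poll_loop(uint16_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            /* Poll continuously: no interrupts, no context switches
             * on the hot path. */
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
            if (nb_rx == 0)
                continue;               /* keep spinning; latency stays low */

            /* Echo the burst back out; a real vSwitch would instead look
             * up each packet's destination (e.g. in a flow cache). */
            uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, bufs, nb_rx);

            /* Free whatever the NIC could not queue for transmit. */
            while (nb_tx < nb_rx)
                rte_pktmbuf_free(bufs[nb_tx++]);
        }
    }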
Hi, my name is Jin Heo, from the VMware engineering team. As Rahul explained, NFV applications usually drive much higher packet rates and require very low packet loss. To meet those much more stringent performance requirements, VMware has been working with Intel to implement a completely new networking stack to better support NFV applications. This new networking stack is called the Enhanced Networking Stack, ENS.

To implement this new datapath we employed various DPDK techniques. I say "employed" because we are not just copying the DPDK source code into ESX; we already have existing components in the ESX kernel that do similar things. For example, on the right side of the diagram you will see the two-megabyte heap and the slab. The slab is very similar to the mempool in DPDK: it is a packet-allocation interface like mempool, but it is more flexible, because it can dynamically change the allocation size. We also already had the two-megabyte large-page heap interface, so we reuse those components. Mainly we take the mbuf and hash-table components from DPDK, and for almost everything else we use our existing vSphere components. Another difference is that we employ these DPDK techniques inside the kernel, which is a little different from Linux, where DPDK applications run in user space.

With these new and existing components, we re-implemented the VMXNET3 virtual device using the new mbuf interface, and we also re-implemented the virtual switch, which now uses the well-known flow-cache technique. And of course, Intel is working on poll mode drivers for the ENS networking stack. This new datapath coexists with the existing vSphere networking datapath, so users can choose ENS if they want to run NFV and can still use the existing datapath otherwise. ENS yields much better performance than the existing networking stack, while still providing the well-known features: DRS, high availability, and vMotion live migration. ENS will be part of the NSX product; NSX is VMware's network virtualization and security product, built to better support software-defined networking and software-defined data centers. Users can use OpenStack, for example through the VIO interface, to configure and install the ENS virtual switch using VMware's Neutron OpenStack plugin.

Let me explain some of the components. If you look at the purple parts, those are dedicated, pinned cores, the same idea as DPDK's lcores. We give dedicated cores to the lcores, and the benefit is not just reducing the overhead of interrupt handling; the major benefit we see is that doing so reduces CPU context-switching latency, which leads to much lower packet loss. I think that's one of the major benefits we observed when we ran the performance tests. The flow cache I already explained, and we also use SSE techniques to implement memory copy, hash-key calculation, and hash-key comparison.
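(The flow cache and the SSE-based hash computation Jin mentions can be pictured with a small DPDK-style sketch. ENS's in-kernel flow cache is VMware's own implementation; the 5-tuple key, the table size, and the helper names below are assumptions for illustration only. rte_hash_crc uses the SSE4.2 CRC32 instruction where available.)

    /* Illustrative flow-cache lookup built on DPDK's rte_hash; not the
     * actual ENS implementation. */
    #include <rte_common.h>
    #include <rte_hash.h>
    #include <rte_hash_crc.h>

    struct flow_key {                   /* classic 5-tuple */
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    } __rte_packed;

    static struct rte_hash *flow_cache_create(int socket_id)
    {
        struct rte_hash_parameters params = {
            .name      = "flow_cache",
            .entries   = 1 << 16,       /* illustrative table size */
            .key_len   = sizeof(struct flow_key),
            .hash_func = rte_hash_crc,  /* CRC32 via SSE4.2 where available */
            .socket_id = socket_id,
        };
        return rte_hash_create(&params);
    }

    /* Fast path: return the cached forwarding decision for a flow, or NULL
     * on a miss (the slow path would compute the action and insert it with
     * rte_hash_add_key_data()). */
    static void *flow_cache_lookup(struct rte_hash *h,
                                   const struct flow_key *key)
    {
        void *action = NULL;
        return (rte_hash_lookup_data(h, key, &action) >= 0) ? action : NULL;
    }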
So I briefly covered the dedicated CPU allocation to the lcores and the use of polling. ENS also has its own load-balancing techniques to make sure that a VM and its related lcore system threads run on the same NUMA node. If related threads run on different NUMA nodes, our experience tells us that performance drops almost by half, which is pretty bad, so we make sure the VM and the related lcores stay on the same NUMA node. Memory allocation is of course NUMA-aware as well. We also use large pages ("large pages" is the vSphere terminology); for now we only support two-megabyte huge pages.

We use mbufs, and the mbuf packet representation is quite a bit simpler than our internal packet representation. This saves CPU cycles in packet initialization and also reduces cache misses, which is one good thing about the simpler packet representation. The flow cache likewise improves the efficiency of the virtual switching. We also try to avoid using locks as much as possible, because even without contention, just taking a spinlock or issuing atomic operations takes away valuable cycles.

Then there are the VMXNET3 optimizations. On the previous slide I told you that we implemented a new VMXNET3 virtual device using mbufs; in addition to that, we heavily optimized other things as well. It's too much detail to go into here, but basically we re-implemented the VMXNET3 device itself, applied the same kinds of techniques, and improved performance on the guest driver side too. We have already upstreamed the new guest driver to both the DPDK vmxnet3 poll mode driver and the in-kernel Linux vmxnet3 device driver. The SSE instructions I already discussed. All of these things help us improve performance quite a bit.

Let me briefly go over the Intel poll mode driver side. Initially, Intel is working on two types of device drivers, one for ixgbe and the other for i40e devices, with features like checksum offload, multi-queue filtering (NetQueue in vSphere terms), MAC and VLAN filtering, Geneve offload, and so on.

I guess the most important thing is performance. Compared to the performance of the existing default vSwitch datapath, the new ENS networking stack is about three to five times better in terms of packet rate. Another good thing is that if you add more lcores, the packet-rate performance scales linearly. Packet loss is also much lower: because we use dedicated CPUs and polling, CPU-scheduling latency no longer causes packet loss, so packet loss is very small, and for similar reasons jitter and latency are low as well.

Okay, I think this is it. Thank you.

[Applause]
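(As a final sketch of the NUMA-aware, hugepage-backed allocation Jin describes: the DPDK snippet below creates a packet-buffer pool on the NUMA node of the calling core. It is illustrative application code, not ENS internals; the pool name and sizes are arbitrary. DPDK's EAL backs this memory with huge pages, such as the 2 MB pages Jin mentioned, when they are configured.)

    /* NUMA-local, hugepage-backed mbuf pool: allocate packet buffers on
     * the same NUMA node as the thread that will poll them. */
    #include <rte_eal.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    int main(int argc, char **argv)
    {
        /* EAL init reserves hugepage memory and launches lcores. */
        if (rte_eal_init(argc, argv) < 0)
            return -1;

        /* rte_socket_id() is the NUMA node of the calling lcore, so the
         * pool lands on the node where the polling thread runs; crossing
         * NUMA nodes roughly halved performance in VMware's tests. */
        struct rte_mempool *pool = rte_pktmbuf_pool_create(
            "rx_pool",
            8191,                       /* mbuf count (2^n - 1 is advised) */
            256,                        /* per-lcore cache size */
            0,                          /* private area per mbuf */
            RTE_MBUF_DEFAULT_BUF_SIZE,  /* data room per mbuf */
            rte_socket_id());           /* NUMA-local allocation */

        return pool != NULL ? 0 : -1;
    }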
Info
Channel: DPDK Project
Views: 1,215
Rating: 5 out of 5
Id: OfeAXrqHfi0
Length: 15min 2sec (902 seconds)
Published: Wed Dec 06 2017