High Performance Cloud Native Networking: K8s unleashing FD.io. - Maciek Konstantynowicz

Captions
Okay, so I've got a bit of an update on where we are with one of the sub-projects within FD.io: CSIT, the Continuous System Integration and Testing and performance benchmarking project. Some of the material here, especially the high-level marketing blurb, is the same as before, but there is a lot of new data, hopefully of use. My name is Maciek Konstantynowicz, I'm the project lead for FD.io CSIT, and I really represent here the work done by the whole community.

There is quite a lot of data here, again hopefully of use. This data is generated by the labs that we operate within the Linux Foundation, but your mileage may vary depending on what hardware and software stack you use. We do our best to document everything that we do and how things are set up, and we claim good reproducibility, but again your mileage may vary when you try to replicate the tests on a slightly different hardware or software stack. And there are some trademarks and branding; I have been asked to put this disclaimer in.

So, from the use-case testing and benchmarking perspective, how do we see VPP fitting into this new world of cloud-native applications? We have NFV and SDN, which are trends that started in telco quite a while back, and they are now being married with the cloud, underpinned by cloud-native designs. But the core of the technology that we strongly believe in is the open source platform, underpinning the internet trends across those dimensions.

I put this slide together last year about Moore's law. In the literature and everywhere we hear that Moore's law is pretty much over, that Intel and others are finding it hard to keep following it, and that it is getting more and more expensive. But how does this apply, if it applies at all, to our space, the software networking space? I obviously have lots of opinions, as does everybody else, but the data shows us that it is actually still rumbling on: with the new generations of processors we are getting higher and higher speeds, mainly because the vendors are paying more attention to I/O and networking, which is all good news. And if you're wondering what this thing is, I had the pleasure to touch it: it is a hole puncher from around 1900. It's seven-bit, and whoever built it implemented binary and Boolean logic.

When we benchmark, we clearly need to pay attention to how computers process packets, and we try to share this knowledge as much as we can with the development community and the end-user community. You need to be aware of the physics and the hardware to be able to make efficient use of it, whether that means shared resources, processors, CPU cores, memory bandwidth, I/O bandwidth, or inter-socket links. The most important piece we pay attention to is packet processing on the CPU cores, but there is a lot to do with memory and I/O handling as well.

The "Moore's Law rumbling on" piece is a slide that I presented at KubeCon in Seattle: moving from one Intel generation to another, from Broadwell to Skylake, we actually noticed significant performance gains, and we keep seeing that gain even for new code and new test cases. This is due to the various microarchitecture improvements that result in the performance gain.
If you have any more detailed questions about the technologies listed, I have a number of references, and I'm also available after the talk to discuss them in detail.

These are the five dimensions that we believe any new, modern software data plane should aspire to; they should be treated as the requirements. Software comes first: any retrofitted technology will always have to compromise on the efficient use of the software-hardware interface. It needs to be fast, clearly, covering both east-west and north-south traffic, and thanks to clouds and containers there is a massive explosion of east-west traffic, so focusing on that is clearly paramount. Then, being truly extensible and having predictable performance, even in "box full", high service density scenarios, is clearly table stakes. And in order to achieve all of those, any modern software data plane should count and provide telemetry of not only the traffic that flows through it but also its resource usage. The claim we are putting here is that VPP does meet all of those challenges, and a number of them we are proving in CSIT, in what we refer to as an open source benchmarking lab. These are our principles: we like to push to the limits and discover where things break; we want to treat the systems as black boxes, but also understand what happens when they break and whether there are ways to improve them as a white box, and share that with the community.

That was it for the overview; I am now going to go into a bit more detail, some small print, and quite a lot of data, hopefully without overload. So what do we do in CSIT? We do continuous testing and we generate reports per release; the latest one is 19.04, which went out about two and a half weeks ago. We test various processor platforms: Intel Xeon, Intel Atom, and Arm. Intel Xeons have been in the lab since its inception in 2016; the Atoms are up and running, although we still have issues generating the full report data from the devices in the labs due to various logistical issues, but we are pretty much 95% there, so in the next release all the data will be produced from those testbeds, and a similar thing applies to Arm. We test various packet paths, including container and VM paths. We run a lot of test cases and a lot of measurements, but we also focus on repeatability, on comparing apples to apples, and on release-to-release comparisons. These are the emails that we send on a per-report basis, just with highlights on what the reports contain, and on the next slide I will focus on some highlights on the performance front.

So how do things look in 19.04? To get the full picture you need to read the report. We have been asked to do our best and provide some sort of technical marketing summary of what happened in the release, and this slide deck is the first attempt to do so; unfortunately we don't have much resource or time to write glossy collateral, but we welcome any help in that area. The two charts are copy-pasted from the report pages, showing the performance of IPv4 forwarding and L2 forwarding on bare metal, NIC-to-NIC. You can see the baseline forwarding and forwarding at various IPv4 FIB scales: 20k, 200k, and 2 million /32 prefixes.
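As a rough aid for interpreting the per-core numbers that follow, here is a back-of-envelope conversion between throughput per core and CPU cycles per packet. The 2.5 GHz base clock is only an assumed example, not necessarily the base frequency of the CSIT testbed CPUs, and the 15.68 Mpps figure is the median quoted below.

```python
# Back-of-envelope conversion between per-core throughput and CPU cycles per
# packet. The 2.5 GHz base clock below is purely illustrative -- substitute the
# base frequency of the CPU actually used in the testbed.

BASE_CLOCK_HZ = 2.5e9  # assumed base frequency, no turbo boost


def cycles_per_packet(mpps_per_core: float, clock_hz: float = BASE_CLOCK_HZ) -> float:
    """How many CPU clock cycles one core spends per packet at a given rate."""
    return clock_hz / (mpps_per_core * 1e6)


def mpps_per_core(cycles: float, clock_hz: float = BASE_CLOCK_HZ) -> float:
    """Throughput per core (in Mpps) for a given cycles-per-packet budget."""
    return clock_hz / cycles / 1e6


if __name__ == "__main__":
    # e.g. a path forwarding 15.68 Mpps on one 2.5 GHz core spends ~159 cycles/packet
    print(f"{cycles_per_packet(15.68):.0f} cycles/packet at 15.68 Mpps")
    # and shaving 20 cycles/packet off that budget buys roughly 2.2 Mpps more
    print(f"{mpps_per_core(cycles_per_packet(15.68) - 20):.2f} Mpps")
```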
The data is as shown: it scales from 15 to 18 Mpps per physical core, and this is on the Skylake testbeds. What is interesting to see is that these are non-drop rate (NDR) throughput numbers, which means not a single packet is lost. We run those tests ten times, at different times of day and different times of the week, and the statistical data shows the spread of the results: for the median here at 15.68 Mpps, the min is 15.61 and the max is 15.76, so it is only a few tens of kpps difference between runs for zero packet loss, and that basically shows great repeatability.

Looking at L2, speeds go from 12 to 27 Mpps; 27 Mpps is really the limit of the NICs here (we have some issues with DPDK compatibility), but those are the shortest, fastest paths, L2 patch and L2 cross-connect, and they are just indicative of what the software I/O path can do. For L2 bridging, the baseline is at 18 Mpps and, with 1 million MAC addresses, 11 Mpps; that is still state of the art and, we believe, industry best. The spread goes a bit wider for certain cases, but it is still within about 1 Mpps in this case.

So how do the releases compare? There is a question from Jim: sorry, 20.5 gigs, and that is with the RX queue configuration used in the report; that's the right question to ask, I will add that to the slide. Clock frequency clearly matters: we run at the base frequency for all tests, as you can imagine, no turbo boost, so you can calculate the clock-cycles-per-packet improvements from release to release. This is the NDR data, against zero packet loss; certain paths, VM paths specifically, vhost and AVF, improved quite a bit, up to 30%. The list is generated for all the tests that ran, in descending order, so you can see which paths improved most, and the dedicated links are here. As for the comparison between Skylake and Haswell and Broadwell, we still see big numbers, as per the "Moore's Law rumbling on" story.

That was it for the data; now the new stuff. In this release we have been working to address some of the challenges we face with executing the tests on limited physical resources, so we started to innovate and get smarter with the way we test. The result is three new methodologies that we have developed. The first one is MLRsearch, an improved and more efficient binary search of sorts, where we can search for multiple rates in one go and be smart with the timing; we are looking to standardize this in the IETF to augment RFC 2544. The second one is Maximum Receive Rate (MRR), which we use for very fast spot tests and evaluations, and here we actually don't care about the packet loss. In CSIT we normally test NDR, against zero packet loss, and PDR, with some packet loss not greater than a non-zero threshold; with MRR we don't care about the loss, and what we use it for is trending. We run tests daily for the designated packet paths, we basically watch the MRR results, and we use anomaly-detection algorithms to spot progressions and regressions and notify developers. In this case you see an AVF improvement that happened around the 3rd of March, and then again around here; when you click on those dots you get a lot of info to correlate the patch, and so on.
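To make the NDR/PDR search idea concrete, here is a toy bisection search for the highest offered rate whose measured loss ratio stays within a target. This is only a simplified illustration, not the actual MLRsearch algorithm used in CSIT (which searches multiple rates and optimizes trial durations), and run_trial is a hypothetical hook into whatever traffic generator drives the test.

```python
from typing import Callable


def find_throughput(
    run_trial: Callable[[float, float], float],  # (offered_rate_pps, duration_s) -> loss ratio
    max_rate_pps: float,          # line rate / NIC limit for the tested frame size
    target_loss_ratio: float,     # 0.0 for NDR, a small non-zero value for a PDR
    trial_duration_s: float = 30.0,
    precision_pps: float = 10_000.0,
) -> float:
    """Toy bisection: highest offered rate whose trial loss ratio stays <= target."""
    lo, hi = 0.0, max_rate_pps
    # Check the upper bound first: if even line rate passes, the NIC is the limit.
    if run_trial(hi, trial_duration_s) <= target_loss_ratio:
        return hi
    while hi - lo > precision_pps:
        mid = (lo + hi) / 2.0
        if run_trial(mid, trial_duration_s) <= target_loss_ratio:
            lo = mid   # trial passed: the sought rate is at or above mid
        else:
            hi = mid   # trial failed: the sought rate is below mid
    return lo
```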
The thing that we are starting to be most proud of, from the innovation perspective, is something we refer to as automated soak testing. We all know what soak testing is: we want to know how the system behaves long term in throughput, latency, and any resource leakage. But to do it properly, ideally you need a human to sit and watch it, or check it after a while and then work out what happened before, and human labour is not cheap, so throwing people at the problem doesn't really work. So what we did is develop an algorithm we refer to as PLRsearch, probabilistic loss ratio search, which replaces the human. It uses some fairly hard-core probabilistic maths, and the result is really a heuristic that emulates human behavior: basically an operator sitting at the box with a knob, trying to find the rate that is compliant with the criteria (a toy sketch of this "operator with a knob" idea appears a few paragraphs below). In this case the criterion we gave it was a 10^-7 packet loss ratio, which is the ITU number for long-term loss. As you can see here, applied to the VPP L2 patch path, the algorithm adjusts over the two hours of the test and ends up with the critical region, as we call it, at a precision of 25 packets per second. You specify the time: you can run it for a few hours, a day, a week, a month, a year, and the algorithm will keep monitoring the system behavior and adjusting towards the 10^-7 target. Applying this to another case, vhost with a VM and its virtio interrupts, and memif, we get lower precision, still good, around 20 kpps. This release is the first one where we have high confidence that the algorithm works, and as you will see in the next table, the data matches what we expect.

Maybe this looks a bit futuristic, and I don't know who is looking for 25 pps precision on software; software vSwitches don't tend to have a good reputation when it comes to deterministic behavior, but as we can see, certain paths can operate in that sort of precision range. The thing here is that we do not prescribe the precision; the only thing we prescribe is the packet loss ratio, 10^-7. So we ran this through a set of VPP paths, including some container memif and VM vhost paths, and we compared the NDR throughput discovered with the MLRsearch algorithm against the soak throughput, and the results are matching. There are some discrepancies on the IPv6 paths, which we expect is to do with LLC effects, but overall we are very happy with the results and I think we can now call it production ready, for experimental use. I think I had a note on a previous slide: we are also looking to standardize this in the IETF, although to be honest the BMWG working group is hesitant because the maths involved is quite heavy, but I'm sure we will get there together.

Now, another thing, which we gave a taste of at KubeCon Seattle, is the result of a collaboration with the CNCF CI team that sparked the CNF Testbed. It attacks another problem: verifying the behavior of a system where you have a lot of NFs in it, and how you benchmark a system like that. VPP standalone is, as we have proven, deterministic; how do you approach it when there are more of them, or when there are other applications?
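Referring back to the PLRsearch soak testing described above: the sketch below is only a crude caricature of the "operator with a knob" idea, nudging the offered load up or down to hold a 1e-7 loss ratio over a long run. The real PLRsearch uses probabilistic fitting of the loss-ratio curve rather than this simple feedback loop, and send_for is a hypothetical traffic-generator hook.

```python
import time

TARGET_LOSS_RATIO = 1e-7   # long-term loss ratio target used in the talk


def soak(send_for, start_rate_pps: float, duration_s: float, step_fraction: float = 0.01):
    """Crude soak loop: nudge the offered load to hold the target loss ratio.

    send_for(rate_pps, seconds) -> (packets_sent, packets_lost) is assumed to
    drive the traffic generator for one measurement interval.
    """
    rate = start_rate_pps
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        sent, lost = send_for(rate, 30.0)          # one 30 s measurement interval
        loss_ratio = lost / sent if sent else 0.0
        if loss_ratio > TARGET_LOSS_RATIO:
            rate *= 1.0 - step_fraction            # too lossy: back the knob off
        else:
            rate *= 1.0 + step_fraction / 2.0      # clean: creep back up, more gently
    return rate                                    # rough estimate of the critical load
```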
For that problem we also took a sort of standardization approach and put a draft together, where we are looking to come up with a methodology that allows one to benchmark, in a universal manner, a server full of NF services. We have identified some direct factors and some indirect factors. Direct factors are the NF code, how it is configured, what the virtual topologies are, and what the packet traffic profiles are. Indirect factors are the technologies, such as virtualization and host networking, that are used to underpin the services. I am going to skip the detail due to time, but we basically identify the NF service, separate out the virtualization technology, and separate out the host networking. In our testing we are actually comparing virtualization technologies, vhost VM versus memif container, with the vSwitch, as two layers; we manage the core resources, and we are looking to do some tricks with LLC going forward.

So how do we abstract the service? We define how the virtual devices are interconnected, how they are configured, and how the packets flow through that topology and configuration. With that in mind we defined three scenarios: VNF service chains with snake forwarding, as you see on the slides, where the packets come into the virtual switch and then go snake-forwarding over the VNFs; the same but with containers, CNF service chains; and a third one, because with containers we have this horizontal memif interface: CNF service pipelines, pipeline forwarding that loads the vSwitch less. These are the three topologies that we benchmarked fully in 19.04.

We created this service density matrix, with rows listing the number of services and columns listing the number of NFs per service. We applied some core-mapping magic, which we are also looking to standardize, from an approach and naming perspective, in the IETF: we take the vSwitch with one core, two cores, four cores, and NFs with half a core, one core, two cores, and we apply different core resources depending on whether hyper-threading (SMT) is enabled or disabled. Of course the total number of services depends on the core usage, so in CSIT we test the grey ones; we cannot test the red ones because we run out of core resources (a small worked example of this core budgeting is sketched further below).

Those were the boring things; now, what we tested and the results. We took the virtual service chains and tested them with up to 10 VNFs. The configurations were VPP 19.04 doing L2 switching as the vSwitch and IPv4 routing as the VNF, with different resource associations: vSwitch with one core, two cores, four cores, and NFs with one core or half a core. We did not really know what to expect, so the results were actually interesting. The second chart is the same for CNF service chains, and the last one is for pipelines. Does anybody know, if your brains are still working and following, which one of those is expected to be the fastest? The bottom one. And the second fastest? Yes, you know too much. So these are the results. The top-level message: the CSPs, the pipelines, are up to five times faster than the CSCs. We are measuring mainly the forwarding performance of the vSwitch, so the vSwitch is the constraining factor, and the pipelines use the vSwitch the least, so this is clearly expected. CSCs are up to 30% faster than VSCs: containers are a bit lighter weight, but here it is mainly to do with the I/O packet interface optimization, so 30%, and the VSCs are the slowest.
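To make the core-mapping arithmetic from the service density matrix concrete, here is a small helper that estimates how many service chains fit on one socket for a given vSwitch and per-NF core allocation. The 28-core socket in the example is only an assumption for illustration, not the exact CSIT testbed topology, and the model ignores LLC and memory effects.

```python
import math


def max_services(
    cores_per_socket: int,      # physical cores available for the workload
    vswitch_cores: float,       # physical cores dedicated to the vSwitch
    nf_cores: float,            # physical cores per NF (0.5 = one hyper-thread)
    nfs_per_service: int,       # NFs chained in one service
    smt_enabled: bool = True,
) -> int:
    """How many service chains fit in the remaining core budget."""
    if not smt_enabled and nf_cores < 1.0:
        nf_cores = 1.0          # without SMT an NF cannot share a physical core
    budget = cores_per_socket - vswitch_cores
    per_service = nf_cores * nfs_per_service
    return max(0, math.floor(budget / per_service))


# e.g. on an assumed 28-core socket: 2-core vSwitch, 10-NF chains, half a core per NF
print(max_services(28, 2, 0.5, 10, smt_enabled=True))    # -> 5 chains
print(max_services(28, 2, 0.5, 10, smt_enabled=False))   # -> 2 chains
```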
Back to the results, the data is as shown here. The blue bars are for a single NF per chain, so that is clearly the fastest for all cases: about 4.5 Mpps for the VSCs, over 5 Mpps for the CSCs, and over 5 Mpps for the CSPs, and so on. Now, what is interesting, yes, a question from the audience: we have VPP as the CNF here, so it is VPP IPv4 routing running in a container. Are they talking directly to each other? No, not yet. Okay. Now, this is with the NFs running on one core. Funnily enough, if we put the data plane worker threads on half of the core, so only on a single hyper-thread, the results stay pretty much the same for the CNFs and go slightly lower for the VNFs, which means we can pack more services onto the socket.

So what will we do next? In terms of NF service density we have done the work with homogeneous NFs, adopting the VPP L3 forwarder and those configurations; we are looking to do different configurations and to look at heterogeneous NFs. Now, this is a bit out of scope for FD.io CSIT per se, but we are collaborating with the CNF Testbed, and Lucina is here representing the team; we are basically looking for inputs on high-performance NF service use cases and their requirements. We believe this work will underpin any such use cases as they come along: IPv4, IPsec, NAT, firewall, stateful services, and so on. And this is just to call out an old slide of mine, literally put together in 2017, showcasing the Intel Xeon Skylakes and one terabit per second per server.

A quick recap on memif: a friend of mine asked me to remind the community that memif is still there and it is ready to be used. In case you are not familiar with it, memif is a shared-memory packet interface that can be used to connect processes or containers together in user mode. It is optimized, as I will show on the next slide, for the container space; it lends itself very well to connecting containers together, as I presented earlier. It is safe and secure, with zero memory copy on the slave side, and it is optimized for performance, not only packets per second but really how many clocks are used per packet, and also instructions per cycle, on Xeons, Atoms and Arms. There is a memif library available, libmemif, and there is a link to the docs on that. Just a reminder that we are driving some standardization work with the IETF and ETSI, and I think this concludes my slides, with the usual wrap-up that VPP is the one. Okay, thank you. [Applause]
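As an aside to the memif recap above, here is a minimal configuration sketch of wiring two VPP instances together over memif, assuming both can reach the same default memif socket path (for example via a shared volume between containers). Exact CLI syntax can vary between VPP releases, so check the FD.io memif docs for the version in use.

```
# --- instance A (master side creates the socket) ---
vpp# create interface memif id 0 master
vpp# set interface state memif0/0 up
vpp# set interface ip address memif0/0 10.10.10.1/24

# --- instance B (slave side connects to it) ---
vpp# create interface memif id 0 slave
vpp# set interface state memif0/0 up
vpp# set interface ip address memif0/0 10.10.10.2/24

# --- verify from instance B ---
vpp# show memif
vpp# ping 10.10.10.1
```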
Info
Channel: FD.io
Views: 190
Rating: 0 out of 5
Id: A8nBDOILLps
Length: 29min 5sec (1745 seconds)
Published: Fri May 31 2019