Considerations for SR-IOV and NFV Solutions

Captions
Hi, Patrick Kutch here. Today I'm going to be talking about using SR-IOV for your NFV applications and the implications that may have on your decision to use it. Specifically, we're going to be talking about using SR-IOV and OVS-DPDK for traffic in two different traffic patterns.

In general, when you have an NFV solution, your traffic has two different patterns. Either it's what we call north-south traffic, where traffic comes into the platform, into your server, gets routed to a VNF, whether it's a VM or a container, gets processed within that VNF, and then is sent back out to the physical wire again; so it comes into the platform, gets processed, and is sent right back out. The other is east-west traffic. In this case the traffic comes off the wire, off the switch, into the VNF, and then instead of going back out of the platform it's sent to another VM, and it can be chained on to more and more VMs. We call that service chaining. That would be like if you have your router and your firewall and your deep packet inspection, those kinds of things; you could run them through a service chain all on the same platform.

The reason we're doing this is that I recently started working in the area of NFV, and I did some research on the topic. I found a number of articles online that talked about NFV and using SR-IOV and PCI pass-through to improve your performance, which makes sense, because SR-IOV is always going to give you your fastest performance and lowest latency, since you don't have that software vSwitch in the middle. However, when I started digging in and looking at the numbers, while the numbers were impressive in a lot of these papers (some of them talking about offloads in a specific Ethernet device to be able to provide vSwitch-type offloads), they all talked about received traffic, and only one or two VMs or VNFs were shown in the numbers. They didn't talk about large scale, like six or eight VMs, and they only talked about north-south traffic; they didn't talk about the east-west traffic flow that I described, which is a valid use case. That was never discussed; they only discussed the north-south flow.

So what we decided to do, my partner in crime Brian Johnson and I, was put together this system here. It's an Ubuntu box running KVM, and it has Open vSwitch on it and it has OVS-DPDK, which is the DPDK extensions to Open vSwitch. It's got some nice processors and lots of RAM, it's got nine virtual machines on it, all identical, and it has three independent Intel 40 gig Ethernet controllers. One of them goes straight into OVS; another one goes into OVS-DPDK, which means there's a DPDK poll mode driver associated with it (if you want more information about DPDK, there are materials on the web you can find, and a link at the end of this presentation); and the last Ethernet controller is for SR-IOV. We made these all separate for the purpose of this test.

Each VM is an Ubuntu virtual machine running on top of the system under test, and each of them has three different network interfaces: the SR-IOV virtual function that's associated with the Intel 40 gig device; a DPDK-backed paravirtualized interface, which we call the vhost interface for the purposes of this video and the paper the video is based upon, and which is OVS-DPDK backed; and a third, more traditional virtio interface, which I'll describe in a moment.
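For concreteness, here is a minimal sketch of what the host-side plumbing for a setup like this could look like: carving virtual functions out of the SR-IOV port, and attaching the OVS-DPDK port plus per-VM vhost-user ports to a userspace bridge. The interface name, PCI address, VF count, and port names are placeholder assumptions, not the values from the actual test system, and details such as hugepage setup and the QEMU wiring are omitted.

```python
import subprocess

# Hypothetical device names/addresses -- substitute your own.
SRIOV_PF = "enp24s0f0"          # kernel name of the port reserved for SR-IOV
DPDK_PCI = "0000:3b:00.0"       # PCI address of the port handed to OVS-DPDK
NUM_VFS = 9                     # one VF per VM in this example

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Carve out SR-IOV virtual functions on the dedicated port.
with open(f"/sys/class/net/{SRIOV_PF}/device/sriov_numvfs", "w") as f:
    f.write(str(NUM_VFS))

# 2. Build a userspace (netdev) bridge for OVS-DPDK.
run(["ovs-vsctl", "add-br", "br-dpdk", "--",
     "set", "bridge", "br-dpdk", "datapath_type=netdev"])

# 3. Attach the physical port to the bridge via the DPDK poll mode driver.
run(["ovs-vsctl", "add-port", "br-dpdk", "dpdk0", "--",
     "set", "Interface", "dpdk0", "type=dpdk",
     f"options:dpdk-devargs={DPDK_PCI}"])

# 4. Add one vhost-user port per VM; QEMU connects to the socket each creates.
for i in range(1, NUM_VFS + 1):
    run(["ovs-vsctl", "add-port", "br-dpdk", f"vhost-user-{i}", "--",
         "set", "Interface", f"vhost-user-{i}", "type=dpdkvhostuser"])
```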
Being OVS-DPDK backed means that there's a DPDK poll mode driver here reading data off of the Ethernet port and putting it into OVS, and then it goes up to the vhost-user device within the VM. And lastly we have the more traditional paravirtualized virtio driver, which is backed within OVS by a vhost device; the traffic for that comes in off of the other NIC just using a standard i40e driver. So in this case it's running with regular interrupts: traffic comes in, an interrupt is generated, and then it's processed and sent up to the VM.

We're going to run two kinds of tests. The first test I'll show you is running what we call a traditional network stack in the VM over each of these three devices, so you can see what that performs like over SR-IOV, over OVS-DPDK, and over virtio. We're only going to show transmitted numbers for the whole demo, just because it's almost identical on the receive side, and to me it was more important to show the transmit, because that's the output of the VNF. So we're going to show, first of all, a traditional network stack and see how these three different interfaces compare to each other in that kind of environment. Then we're also going to run tests for scaling using a more typical VNF approach, which is where you use a DPDK application to do the fast packet processing, because that's how you get your best performance, especially with small packets.

For that we're going to use testpmd, which comes with DPDK. testpmd is simply going to read data as it comes in off of the interface. Let's do an example with SR-IOV: testpmd is going to read data off the virtual function using the poll mode driver, and then it's going to write it right back out that same device. So the traffic comes in on SR-IOV, testpmd turns it around and writes it right back out, and all it does is change the destination MAC address: either to the remote link partner for north-south type traffic, or to the MAC address of the next virtual machine if you're doing a chain.

So that's the basic test, and the system looks like this. On the left here we have the system under test with the nine VMs: eight VMs we're going to use for testpmd, and one VM where, instead of testpmd, we're going to run the netperf test, so you can see the traditional stack. These are all identical VMs; the only difference is that this one runs netperf with a traditional stack and the others run a DPDK application. And then we're getting data sent to these boxes over a 40 gig device from another system running Ubuntu and DPDK pktgen, so pktgen is going to be generating the traffic and sending it over to here.
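To make the forwarding test concrete, here is roughly the shape of the testpmd invocation each VNF could run: "mac" forwarding mode reads packets with the poll mode driver, rewrites the destination MAC to the --eth-peer address, and transmits back out the same port. The core list, memory-channel count, and peer MAC below are placeholder assumptions, not the exact values used in the video.

```python
import subprocess

# Hypothetical values -- adjust for your guest.
CORES = "0-2"                        # cores inside the VM for EAL + forwarding
NEXT_HOP_MAC = "52:54:00:00:00:02"   # link partner (north-south) or next VNF (east-west)

# testpmd in "mac" forwarding mode rewrites the destination MAC of every
# packet to the --eth-peer address and sends it back out the same port.
cmd = [
    "testpmd",
    "-l", CORES,                     # EAL: CPU cores to use
    "-n", "4",                       # EAL: memory channels
    "--",                            # end of EAL options, start of testpmd options
    "--forward-mode=mac",
    f"--eth-peer=0,{NEXT_HOP_MAC}",  # port 0 forwards to this MAC
    "--auto-start",
]
subprocess.run(cmd, check=True)
```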
I want to talk quickly about optimizations. SR-IOV, OVS-DPDK, and OVS itself all have lots of knobs and buttons you can twiddle to modify and improve performance under certain conditions. What we found, though, was that for the large number of VMs we're using, which is nine, if we made an enhancement to, say, the configuration for OVS-DPDK, a lot of times it would result in a degradation of performance for SR-IOV, or vice versa. So we didn't do a whole lot of optimizations, except that we did CPU pinning for each of the virtual machines and for the DPDK poll mode drivers in OVS-DPDK. That's the baseline optimization; we didn't do a lot else. What this means is that the numbers you're going to see here, and we have lots of numbers and charts and gauges and dials, are not optimal. I can't stress that enough: this is not the best performance you can get. What we want is a side-by-side comparison of these three interfaces, so you can make a judgment as to which one suits you or your NFV solution best, and see the possible problems each of those interfaces could have under different situations.

So now let's take a look at the traditional network stack on top of these three devices. We're going to have netperf running on a remote system and netperf running in the virtual machine. The netperf running in the virtual machine transmits data over one of these interfaces to the remote device, and we capture the packets per second and the gigabits per second being transmitted from each of those interfaces.

Let's start off with 128 bytes on the traditional virtio interface. We'll hit start here, it'll go start running that test, and you can see we're around nine gigs of data being transmitted. So with that number of threads we're transmitting nine gigs at 128-byte packets. Now let's go to the DPDK-backed OVS-DPDK interface and run that: it's about 7.8 gigs, so a little bit less. Now let's take the same packet size, 128 bytes, over SR-IOV: you'll see it jumps up to almost twelve gigabits, so it's a much more performant interface.

As we increase the packet size, let's go back to virtio. Of course, the larger the packet size, the easier the traffic is to process, so the overhead from using OVS shrinks; all this data has to be processed in OVS, so when you're not using SR-IOV it all has to be touched by a CPU in the host for sorting, whereas SR-IOV doesn't have to do that. But it's still pretty good: you're getting 20 gigs here over OVS on your virtio. If we go to OVS-DPDK we're getting around 13 gigs, which is pretty decent, and on SR-IOV we're transmitting basically at line rate: at 512 bytes we're doing 39 gigs. Again, these are not optimized interfaces; it's just an apples-to-apples kind of comparison.

If you want to look at the whole chart, we can look at, say, 64 bytes; this is about what it's going to be. You have your virtio, your OVS-DPDK, and SR-IOV, and if we bring them all up here you can see the data. OVS-DPDK is pretty close in performance to virtio in this kind of situation, again with a traditional Ethernet stack, but SR-IOV is always going to be more performant. At 512 bytes SR-IOV jumps up to pretty much line rate, and when you get up to large packet sizes like 64K they're all pretty much at line rate, with a little bit of a performance boost for SR-IOV. So this, again, is a traditional network stack running on top of virtio, OVS-DPDK, and SR-IOV.
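Before moving on to the DPDK application tests, here is a sketch of what the CPU-pinning baseline described above could look like: pinning the OVS-DPDK poll mode driver threads and each VM's vCPUs to dedicated cores. The CPU mask, core numbers, and VM names are illustrative assumptions, not the layout actually used for these measurements.

```python
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Pin the OVS-DPDK PMD threads to dedicated cores (here cores 2 and 3,
# mask 0xC). These cores then spin at 100%, polling the NICs and vhost ports.
run(["ovs-vsctl", "set", "Open_vSwitch", ".",
     "other_config:pmd-cpu-mask=0xC"])

# Pin each VM's vCPUs to its own host cores so VMs don't bounce between
# sockets or steal cycles from the PMD cores. Core numbers are made up.
vm_cpu_map = {"vm1": [4, 5], "vm2": [6, 7]}   # ...and so on for all nine VMs
for vm, cores in vm_cpu_map.items():
    for vcpu, core in enumerate(cores):
        run(["virsh", "vcpupin", vm, str(vcpu), str(core)])
```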
Now let's take a look at what happens when you're running a DPDK application in there. First, let's look at virtio, and we'll do 64 bytes and eight VMs. So let's run this: eight VMs are there, and what's happening now is that the remote system is transmitting data to each of the MAC addresses of the virtio interfaces, and you'll see the performance is so low that it doesn't even register within the precision of these gauges. We're only doing 0.04 million packets spread over eight VMs; it just doesn't register. However, if I jump over to the OVS-DPDK interface and do the same thing, you'll see we're processing around five gigs of data and 10.6 million packets per second.

The reason the virtio interface does not work so well here is what we're running: testpmd on top of a virtio device, a DPDK-backed vhost device, or an SR-IOV VF. testpmd is a poll mode application, so it's always polling for data. That's fine if you have a DPDK-backed device, because it's also polling for data and they talk to each other over the paravirtualized interface, and for SR-IOV testpmd owns the virtual function, so it can poll that data quite efficiently. However, the traditional interface into OVS uses a traditional driver which is interrupt driven: it gets data, fires an interrupt, and puts it into OVS for processing. That interrupt model doesn't work well with testpmd's poll mode driver; one side polling for data and the other side being interrupt driven simply don't align. So if you're going to use a DPDK-based application, this is not a workable combination; the performance is just too poor, it's unusable.

So let's go back to our demo and back it down to, say, five VMs. At eight VMs we were doing 5 gigs and 10.6 million packets; at five VMs the difference isn't much, 6 gigs and 12.5 million packets. That's pretty decent, right? Each of these VMs is getting data and processing it; again, this is north-south, so the traffic comes in, gets processed, and is sent back out the wire.

Now let's contrast that with SR-IOV at five VMs. Again, OVS-DPDK was 6 gigs and 12.5 million packets at 64 bytes. We run the same test over the SR-IOV interface, and now we're at 15, almost 16 gigs of throughput, with almost 33 million packets a second being processed, again north-south traffic only. In the per-VM details, each VM is processing the same amount, about 6.6 million packets per second. If we jump up to eight VMs, you'll see these numbers stay very consistent, almost identical: still about 15 gigs and almost 32 million packets. That's because with SR-IOV the sorting is done in the hardware; it just grabs it and sorts it, and with Intel Ethernet it's a round-robin queue, so all VFs are treated equally unless you configure them otherwise. In this case each VM is processing almost two gigs of data and a little over four million packets per second. So SR-IOV scales quite well for this north-south traffic pattern: the more VMs you add, it just equally divides up the available bandwidth and packet processing. OVS-DPDK also scales, but not quite as well in our configuration: if we did some tweaks and modifications we think it would scale better, because the difference between, say, two VMs and four or six is that the more VMs you add, the more the throughput drops a little bit. Again, that's the way our configuration is, and you could optimize it so that wouldn't happen, but the end result is still that north-south traffic over SR-IOV is significantly faster, even into one VM.
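A quick note on how the two gauges relate: the gigabit figures quoted here appear to count L2 frame bytes only (no preamble or inter-frame gap), which is my reading of the dashboard rather than something stated in the video. Under that assumption, packets per second and gigabits per second convert as below, and the conversion matches the numbers quoted above.

```python
def gbps_from_pps(pps: float, frame_bytes: int) -> float:
    """Throughput in Gbit/s counting only the Ethernet frame bytes."""
    return pps * frame_bytes * 8 / 1e9

# SR-IOV, north-south, 8 VMs, 64-byte packets: ~32 Mpps was shown,
# which matches the "almost 16 gigs" reading on the throughput gauge.
print(gbps_from_pps(32e6, 64))    # ~16.4

# OVS-DPDK under the same conditions: ~10.6 Mpps, shown as ~5 gigs.
print(gbps_from_pps(10.6e6, 64))  # ~5.4
```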
Now let's change the packet size. We'll stay with eight VMs and do, say, 512-byte packets, again north-south traffic. As you would expect, the larger the packet size, the more gigabits we pump through and the fewer packets per second get processed. For NFV most people care about the small packet sizes, so we'll concentrate on those, but briefly, at 512 bytes you can see we're doing 31 gigs here, which is pretty good for this 40 gig device; we're pumping 31 gigs and all these VMs are getting the data and processing it. If we jump over to SR-IOV, still eight VMs, you'll see we're almost at line rate, 38 gigs.

OK, now let's take a look at east-west traffic. Let's jump down to four VMs and do east-west at 64 bytes again, because that's the favorite for NFV numbers. Here you can see the service chaining: the traffic comes in, gets processed by the VNF, and then is sent to the next VNF over the vSwitch, again OVS-DPDK. I should note that all the numbers we're seeing here are the transmit side, so this VM here transmitting 2.53 gigs means that's what it has processed and is sending on, and it's transmitting around 5.2 to 5.3 million packets per second. This number here is the aggregate, what the whole system is processing right now: almost 10 gigs of data and 20 million packets just on the transmit. If you really want to, you could double that by counting both the RX and the TX, but we just do it this way. So each VM is getting a little over 2.5 gigs, and as we add more VMs, let's jump up to six, it pretty much stays at that 2.5 gigs or so through the chain, and adding more VMs doesn't change it much. It scales very nicely in the east-west configuration; OVS-DPDK scales very, very nicely.

So here we have eight VMs at 64 bytes going through this chain. The chain, again, is: the traffic comes in off the physical wire, gets routed to the first VNF through OVS-DPDK, and then each subsequent VNF writes the data back out using testpmd to the next VNF in the chain, until the last one writes it out with the destination MAC address of our transmit system, our link partner over there. In this situation we have eight VNFs running and we're doing over 40 million packets per second. And if we change the packet size, for example if we jump clear up to 1518 bytes, you'll see that we're doing 220 gigs of processing. Again, this is just the transmit side, and some people like to double that number for the RX, but we're going to say 220 gigs is being processed by this system, and 18 million packets.
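To make the chaining setup concrete: each VNF in this east-west test forwards to the MAC address of the next VNF, and the last one forwards back to the link partner. Below is a sketch of how the per-VNF --eth-peer arguments could be laid out; the MAC addresses are placeholder assumptions, not the ones from the test system.

```python
# Hypothetical MAC addresses: one per VNF in the chain, plus the
# traffic generator / link partner that receives the final hop.
VNF_MACS = [f"52:54:00:00:00:{i:02x}" for i in range(1, 9)]   # VNFs 1..8
GENERATOR_MAC = "3c:fd:fe:00:00:01"

def eth_peer_for(vnf_index: int) -> str:
    """--eth-peer argument for the testpmd instance in VNF vnf_index (0-based).

    Every VNF except the last forwards to the next VNF in the chain;
    the last one sends the traffic back out to the link partner.
    """
    if vnf_index < len(VNF_MACS) - 1:
        peer = VNF_MACS[vnf_index + 1]
    else:
        peer = GENERATOR_MAC
    return f"--eth-peer=0,{peer}"

for i in range(len(VNF_MACS)):
    print(f"VNF {i + 1}: testpmd ... --forward-mode=mac {eth_peer_for(i)}")
```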
Now let's contrast that with SR-IOV east-west. Let's go down to four VMs like we did before. With SR-IOV at 64 bytes each VNF is doing only 2 gigs, a bit less than we had with OVS-DPDK, and each one is doing about 4 million packets, so we're doing around 8 gigs of throughput in total. As we increase this, let's just jump right up to eight to make the point, you'll see that all the numbers stay pretty much the same: each VNF is doing a little over 1.1 gigs and 2.4 million packets per second. They're all equal, but it's nowhere near as performant as OVS-DPDK. Backing it out, the SR-IOV total here is 9 gigs and 18 or 19 million packets, and compared to the vhost interface you can see that OVS-DPDK was much more performant. And if we go to large packet sizes, remember that we were processing 220 gigs east-west at 1518-byte packets with OVS-DPDK; if we do the same thing with SR-IOV we're doing 44 gigs and 3.6 million packets. So in an east-west situation SR-IOV is nowhere near as performant as OVS-DPDK. Keep that number, 44 gigs, in mind, because we're going to come back to it.

OK, now let's do a side-by-side comparison of the two. Let's go back to 64-byte north-south traffic and eight VMs. On the top here we have the SR-IOV data, and on the bottom the same thing over the OVS-DPDK interface. Again, for north-south traffic with eight VMs we're pumping through 15 gigs and almost 32 million packets, but with OVS-DPDK, recall, it was only 5 gigs and under 11 million packets. As you increase the packet size, we'll jump up to 512 bytes, the throughput changes: the gigabits increase but the number of packets per second goes down. Over here on the right is what each individual VNF is doing. Going back to 64 bytes, each VM with SR-IOV is transmitting about 2 gigs of throughput and 4.02 million packets, whereas with OVS-DPDK it's about half a gig.

Let's look at this on a chart. You can see that in both cases there is a jump in performance from one VM to two; that has to do with our configuration. Again, we could have twiddled some knobs to make one VM very performant, but we didn't do that, so let's just consider two VMs onward. What this shows is that from two VMs all the way to eight, at 64 bytes, which is what this is, we're doing 15 gigs or so all the way across. You add more VMs and it doesn't matter: we're still processing 15 gigs and over 30 million packets per second. As we add more VMs there's a slight drop here, and again I think that's due to our configuration. Now contrast this with OVS-DPDK in the same situation: you add more VMs and it doesn't get any better, it doesn't really scale, it actually drops, and I think that again is a configuration issue. But as you can see, the maximum performance just isn't there: it's 5 gigs and a little over 10 million packets.

So this is an example, again, of north-south traffic, and for a north-south type of traffic pattern SR-IOV is your most performant option. That is because it doesn't cross the hypervisor, which can have some downsides: you don't have any of the vSwitch controls that you may need. But if you just want raw traffic power, then for north-south traffic SR-IOV is going to be the most performant, because it doesn't touch that vSwitch at all; it's just the poll mode driver grabbing the data and going as fast as it can.

Now let's contrast this with east-west traffic. We're still going to show these numbers with 64-byte packets.
That's the size most people care about, and there's more data in the paper this video is based upon if you want to look at other sizes. So here we're doing 64-byte east-west traffic, chaining through these VMs as shown in this little mini video here. As you recall, with SR-IOV we were doing an aggregate of about 10 gigs, so all eight VMs together were processing 9 to 10 gigs and under 19 million packets per second, and if we go back to the per-VM view you can see what each of the individual VMs was doing. With OVS-DPDK we had a total throughput of 20 gigs and 40 million packets per second, again at 64 bytes, and each VM was processing more than twice what the SR-IOV VNF was doing. So the opposite seems to be true, especially at small packet sizes: for north-south traffic SR-IOV was more than twice as performant as OVS-DPDK, whereas for east-west traffic the opposite holds and OVS-DPDK is more performant. Again, these numbers can be improved, but not by a lot, because especially for east-west traffic over SR-IOV you're going to run into a limitation, which is what we're getting at; that's the purpose of this paper.

So let's leave the demo and take a look at our charts again. At 64 bytes we don't have a lot of gigabits, right? It's pretty consistent, though: about nine gigs no matter how many VMs you add, two to eight (you need at least two for a service chain), and the packets per second is also very consistent, sticking around 18 million packets per second no matter how many VNFs you add to the chain. It doesn't change; the performance doesn't go up or down much, and the little flicker here is probably in our test. Now if you look at OVS-DPDK, the gigabits stay at around 20, we're doing about 20 gigs of east-west traffic, but as you add more VNFs the packets per second improves a lot, and the gigabits go up too; it's just in the part of the chart you can't see yet, you will in a second. So as you add more VMs it scales very nicely: the more VMs you add in this east-west traffic flow, the more packets you can process.

Let's jump up to a higher packet size so you can see it in a different view. Again, the packets per second goes down, because the larger the Ethernet packet size, the fewer packets are processed per second. Let's just do the biggest one, 1518 bytes. With SR-IOV we're doing 44 gigs all the way across: whether we have two VMs, four VMs, or eight VMs, it's still 44 gigs and about 3.6 million packets. However, with OVS-DPDK each individual VM is doing almost 28 gigs and 2.3 million packets, and every VM in the chain does that, so in aggregate we're doing 220 gigs of processing and 18 million packets per second at a huge packet size.

So let's talk about why SR-IOV does not scale as well as OVS-DPDK. The reason is that SR-IOV is a PCIe technology. This means that when you transmit data from one VM to another over SR-IOV, it has to go across the PCIe interface. The Intel 40 gig devices we're using are PCIe Gen3 by eight, which means the theoretical maximum speed they could do over PCIe is about 50 gigs. So before we transmit any data, we start off with a pool of 50 gigs of bandwidth available.
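The arithmetic behind that 50-gig figure, and how quickly chained hops consume it, can be written out explicitly. PCIe Gen3 runs at 8 GT/s per lane with 128b/130b encoding, so a x8 link carries roughly 63 Gb/s of raw data per direction; the roughly 50 Gb/s usable figure quoted in the talk corresponds to subtracting protocol overhead, and the ~80% factor below is my approximation of that overhead, not a number from the video.

```python
# PCIe Gen3: 8 GT/s per lane, 128b/130b encoding.
LANES = 8
RAW_GBPS_PER_LANE = 8 * (128 / 130)        # ~7.88 Gb/s of data per lane
raw_link_gbps = LANES * RAW_GBPS_PER_LANE  # ~63 Gb/s each direction

# TLP headers, flow control, etc. eat into that; ~80% usable is a rough
# assumption that lands near the ~50 Gb/s figure used in the video.
usable_gbps = raw_link_gbps * 0.8          # ~50 Gb/s

offered_load_gbps = 10                     # traffic pushed through the chain

# Every chained VNF has to pull the traffic across the PCIe link (RX) and
# push it back out across the same link (TX), so each hop costs the full
# offered load in each direction.
max_chained_vnfs = int(usable_gbps // offered_load_gbps)

print(f"raw x{LANES} link:      {raw_link_gbps:.1f} Gb/s per direction")
print(f"usable (approx):     {usable_gbps:.1f} Gb/s")
print(f"10 Gb/s hops that fit: {max_chained_vnfs}")   # ~5 VNFs, as in the video
```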
Now let's send 10 gigs through this chain and take a look at what happens. We transmit 10 gigs and it gets routed through the hardware up to the VM; testpmd running there reads the data, so we have now used 10 gigs of our total 50 gigs and only have 40 gigs available, on the receive side. PCIe has a receive side and a transmit side, so it has received 10 gigs, and now we transmit it back out: on one hop we have used 10 gigs of our 50 gigs on both the transmit and the receive side. Every VM this gets chained through uses another 10 gigs, so by the time we get to five VNFs processing this data, we are out of bandwidth; there is no more bandwidth left on this maximum-potential PCIe interface. In fact almost no device can actually support that much bandwidth: for example, the Intel 40 gig devices we're using for these tests have a 40 gig PHY on them, the physical interface, but internally over the PCIe interface they can support almost 45 gigs. Since the traffic is restricted by the 40 gig link you can't ever see that, except in this kind of east-west traffic. And if you recall from our demo earlier, I said remember these numbers: when we were doing east-west traffic at the largest packet size, 1518 bytes, with eight VMs, we were doing about five and a half gigs per VNF, which was a total of 44 gigs. That adds up nicely, doesn't it? We're already reaching the cap of the PCIe interface of that device. You can't go over it; you can only divide it up over however many VMs you want. If I drop this down to four VMs, it would still add up to forty-four-ish gigs of data.

That limitation does not exist if you use OVS-DPDK, because OVS-DPDK is a software construct: once the data is read by the poll mode driver and gets into the system, moving it from VM to VM is all done in software. I don't know all the internals, but it's done with things like page swapping or memory copies; they have all kinds of accelerated methods for this. There is no PCIe involved, because the data is moved within the system itself; it doesn't have to go over the PCIe interface at all. So it scales very nicely for east-west traffic, but not as well for north-south traffic, where SR-IOV gives you the best interface because there's less software overhead.

So what's our summary? There's a nice chart here that you can take a closer look at for more information, but basically the point of this video, and the paper it was written upon, is that there is no one-size-fits-all solution. SR-IOV is not your end-all solution for everything. If you're doing a north-south type of traffic situation it is ideal, but if you need some vSwitch technology in there then you can't use SR-IOV, so you have to do something else. If you don't want to use DPDK, then using regular OVS is a good way to go; but if you want your VNF to use DPDK to get the most packets per second, then you really need to use OVS-DPDK or some other DPDK poll mode driver to pump that data into your VNF. Each of these technologies has its challenges and limitations, so the advice is to go investigate their strengths and weaknesses. You can have combinations of them on the same system, but there is no one solution that's best; there are trade-offs for all of them.
Some of the learnings I have had from this are that SR-IOV gives you the lowest latency, because the traffic doesn't have to go across any of those software layers; your poll mode driver, if you're running DPDK, can simply poll that data and read it as fast as it comes in. It also has lower CPU utilization, because the data is only touched by the CPU in the VNF: there's no poll mode driver in the host pulling the data, and it's not being touched by OVS, so it's a little more efficient that way. However, as you probably know if you know anything about SR-IOV, it is hardware dependent, so your VMs have to know about the device they're talking to, and that makes live migration a challenge. It has been a challenge since SR-IOV came out; there are methods to do it, but they're not easy, it's a pretty heavy lift.

Observations and learnings about DPDK: DPDK is super powerful, and it performs and scales with CPU and software enhancements. During the six months or so it took us to do all these experiments, set everything up in our spare time, and write this GUI, we upgraded processors and we upgraded the version of DPDK, and that gave us a pretty significant performance boost. What this means is that DPDK will become even more performant with every generation of processors. Right now, with the system we had, between five and seven million packets per second is pretty much the maximum that OVS-DPDK can read off of a physical port and process, and as those CPUs get more performant and the software gets more refined, that number will do nothing but go up. So it's going to continue to be a really good solution, though I don't think it'll ever be as performant as SR-IOV, because it's still being touched by software layers. DPDK also has a steep learning curve; it's not easy to use, you have to noodle on it quite a bit, and it's not like other Intel Ethernet technologies where the slogan is "it just works." There are lots of knobs, lots of things you can modify: pinning the cores, setting buffer sizes, lots and lots of things to set up. And as with SR-IOV, it has its challenges with live migration, because you have to have an OVS-DPDK-backed device in your VM, and if you migrate that VM to another system then there has to be something there for it to hook up to and communicate with, some resources.

The paper that we wrote has all this information: the hundreds of different data points that were collected and used to make the charts and graphs in the document are all documented there, as are our test setup and our findings, and why PCIe bandwidth is the limiting factor for your SR-IOV solution. The instrumentation framework that I wrote the GUI upon is up on GitHub; I happened to write it, so you can go up there, grab it, and start making your own if you like. And if you want more information on SR-IOV and the other things I've written about and made videos about, you can go look at the YouTube channel listed here.

So in summary I'd like to say that SR-IOV, again, is great for north-south traffic, but when you start doing east-west it just doesn't scale. It doesn't matter if it's a smart NIC or it has all these offloads on it: the fact is that the traffic has to go over that PCIe interface, so if your traffic is routed over PCIe to provide your solution for east-west traffic, you're going
to run into the same problem. It doesn't matter if you upgrade to a 100 gig NIC; you're still not going to be able to push a hundred gig through all those VNFs with that mechanism. So you may want a hybrid approach, where you use SR-IOV to get data into a VNF at a hundred gig, then use OVS-DPDK to do the service chaining, and then the last VNF sends it back out over SR-IOV, for example. Bottom line: look at your options, do some experiments, and realize that there is a menu of options available to you; you need to pick and choose and develop your solution that way. I hope this has been helpful. Thank you.
Info
Channel: Patrick Kutch
Views: 8,221
Rating: 4.9642859 out of 5
Keywords: SR-IOV, NFV, Network Functions Virtualization, DPDK, OVS, Kutch, Intel, Performance, NFVi, Considerations
Id: 6UUFWZs-Sck
Length: 37min 54sec (2274 seconds)
Published: Wed Mar 22 2017