Experience with DPDK Troubleshooting

Captions
In the meantime I'll just go ahead and introduce myself, and you can do the same — maybe you want to go first. Hello everybody, my name is Juha Kosonen, I work at Nokia Mobile Networks as a software architect. Hi, I am Ajay Simha, NFV architect at Red Hat. I work in the group that produces all the reference architectures, and I focus on the NFV reference architectures.

Today we are going to share with you our experience with DPDK troubleshooting. Although it says troubleshooting, I know some of you may not have the architectural background of how DPDK works, so we'll start off with that. These are the things we expect you to walk away with: an overview of DPDK, what can go wrong in a DPDK installation and how to troubleshoot it, and lastly the tuning of DPDK, which is critical for getting high performance — what you can and should do to get there, and how to look at stats. Because once your interfaces are in DPDK they are no longer in the kernel, you cannot do things like "ip link show"; your links won't show up, and so on. So we're going to show you important and cool commands that you can execute on Red Hat Linux.

This is the agenda. We're going to walk you through how we got here and what the telco requirements are, because DPDK — the Data Plane Development Kit — is a method for getting high throughput, and most of these requirements came from network function virtualization; that is the main driver and motivator for developing things like SR-IOV and DPDK. So we'll cover the telco requirements at a high level, then the journey to DPDK, how we got there all the way from virtio. Juha will talk about DPDK in the data plane: the hardware tuning that needs to be done, what kind of throughput you can expect with DPDK, and CPU core allocation. We'll cover high availability only at a high level, but I want to talk about it because it's really important. And lastly, even though the title says DPDK troubleshooting, the last section focuses on troubleshooting and its various aspects: what can go wrong at install time and what you should do, what it looks like in an HA environment, and finally the performance-related pieces.

Getting to the telco requirements, there are basically two major pillars. Telcos — service providers, operators, whatever you call them; I consider myself a telco person because I've been working with service providers since 1998 — will always come back with more. They'll say these are the two major things, high availability and performance, and then add that they also need service assurance, billing, other things too; everything is equally important, but these two are the most important. We have heard of situations where telcos say they do not want to set aside a node to do control functions only, because they want maximum throughput — they want the number of CPU cores they allocate to equate to the number of subscribers who are paying for them.
That's how detailed they get in trying to get ROI out of this. So first of all: maximum subscribers per core — how do you get that? And how do you get high availability as close to five nines as possible? Remember we have a tiered architecture here: the data center hardware at the bottom, on top of that the network function virtualization infrastructure, which in our case is typically OpenStack, and on top of that the VNFs. So high availability is itself tiered — you cannot get HA unless you have a cohesive approach from the top to the bottom of the stack. And even though we are talking about troubleshooting and performance, without HA — if your node goes down — failure means zero packets per second. Let's remember that; it's why it is always important to think about high availability.

Telco applications and services typically serve millions of subscribers. We may not realize it, but even the small mobile operators, the tier-two and tier-three ones, are talking about eight or ten million subscribers, and it can be higher. What used to happen is that hardware-based solutions provided these capabilities — multiple refrigerator-sized boxes with dedicated ASICs doing exactly what was needed to get the acceleration and the high availability. Now we need to get that in a virtualized environment: as much performance as we can, the maximum packets per second, ideally without drops. I will show you a chart with some testing done by our larger performance team, and you can see that when you allow drops — even minuscule ones — you can get much higher throughput, and with zero packet loss the PPS obviously goes down.

Network equipment providers — the people who supply the VNFs — as well as the telcos have tried to move away from solutions where the VNFs have any dependency on the hardware. The reason I bring this up is that SR-IOV works great, you get fantastic throughput; the only problem is that the VNFs and the NIC drivers are tightly coupled. So if the VNF vendor moves forward, or more typically the operator changes the hardware, there has to be a dialogue between these parties — "hey, I changed this and it's not working anymore" — and that kind of thing is not tolerated in a telco environment. If you have outages beyond a certain number of seconds or minutes per year, you have to report them to the FCC, at least in North America, so it is a pretty serious affair to keep all of this working all the time. Ideally, many of the operators and service providers I have talked to want to get as close to the cloud-ready model as possible; they are willing to go for the flexibility while sacrificing a little bit of the performance.
Again, this is kind of a gotcha, because they may say they're willing to sacrifice some performance, but then you will realize they want more performance later on. It's always a catching-up game, where technology is trying to catch up and give you more and more packets per second.

Now let's take a quick overview of the playground where the VNFs live. This is a picture of the Nokia AirFrame solution. On the right-hand side there is a VIM layer — in Nokia's case VMware is also supported there — then there is a hypervisor layer, and all of this operates on top of the data center hardware. On top of that are the actual VNFs, and those VNFs have certain requirements which are not exactly the same for every VNF, but in many cases they have this in common: high throughput in terms of networking, as well as low latency, is required — for example in radio and core cases. So there we see terms like real-time capability, and throughput needs to be optimized; DPDK is one approach to meet these requirements.

Thank you. Now let's talk about the journey from virtio all the way to DPDK — how we got here. When people started thinking about NFV, initially it was all virtio, and performance was considerably low: you could easily get maybe 600 to 800 Mbps out of a 10 gig link, which is simply not acceptable from an NFV/telco point of view. In this whole scenario I'm going to paint for you, you have two 10 gig NICs. First of all, you have all this hypervisor layering — there was actually an older piece of OpenStack documentation showing seven-plus layers a packet had to cross, if you think about egress, all the way from the VM until it made it out of the physical NIC, the pNIC. That was ridiculous. Things improved — OpenStack made modifications so it was not layered in so many places — but the performance of virtio in its native form is still not good enough.

The next stage was PCI passthrough. People — the network equipment providers — started saying we need to get as much throughput out of every 10 gig as possible, because if you are setting aside a whole server to do certain functions, you need to squeeze throughput out of every single 10 gig so you can get 40, 60, 80 gig out of the whole box. The issue with PCI passthrough is this: you have VNF 1 and VNF 2, each with its own Ethernet 0, but the 10 gig NIC is completely dedicated to VNF 1, because that's how PCI passthrough works — the hypervisor takes the characteristics of the 10 gig NIC and presents it to the Ethernet interface of that one virtual machine, and that's the end of it. VNF 2's Ethernet 0 has no place to connect; that is the problem with PCI passthrough. You could of course add multiple NICs, but think in terms of monetization: you want your cores and your NICs used to the maximum so that you are getting paying subscribers on them. Then came SR-IOV, which solved that problem elegantly.
There was another issue with the previous scenario: VNF 1 may not really need all of the 10 gig — it may only want, say, 1.6 gig — and there is no way to share; the remaining bandwidth of that 10 gig is just wasted. SR-IOV solves that elegantly by taking the physical NIC and creating a physical function and virtual functions out of it. You typically have one physical function (PF), which has all the characteristics of the hardware, all the registers, whereas the virtual functions (VFs) — VF 1 through n in the picture — typically have only the receive and transmit queues. You can think of it like a switch built into the NIC card itself; it is not really part of the OS, which is why it is drawn in its own box. With this you can have VNF 1's Ethernet 0 connecting to virtual function 1 and VNF 2's Ethernet 0 connecting to virtual function 2, both sharing the 10 gig bandwidth and getting better utilization out of it. The one issue, which I already alluded to, is that VNF 1 and VNF 2 have to ship driver support for the exact family of NICs — if it's an X540, whatever it is, the VNF needs that driver. So if the private cloud rolls out a new generation of NICs, the VNFs have to be reworked so that they carry that support, which is a problem and exactly what we want to get away from.

This is what DPDK typically looks like. There are a couple of fundamental concepts to understand. First, DPDK uses something known as a poll mode driver, or PMD: it is a polling mechanism rather than interrupts. Another equally important concept is that OVS-DPDK actually resides in user space; sitting in user space, and thanks to other optimizations such as vhost-user, it is able to communicate directly with the VM and bypass the kernel. That is where you get the optimization and the speed. The way the poll mode driver works, you need to dedicate cores to it — in my lab I have 48 CPUs, 0 through 47, so I dedicate one of them and say "you shall only be serving this polling function." It's very important to understand how this works, because in DPDK the flexibility comes with some complexity: if you tune it and go through the extra effort, you get a pretty good solution that is more cloud-ready than SR-IOV.
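To make that core dedication concrete, here is a minimal sketch of how the PMD cores can be pinned by hand on an OVS-DPDK node. On a director-based deployment these values come from the Heat templates rather than being set manually, and the mask 0x50 (cores 4 and 6) is just an illustrative choice:

    # Tell OVS-DPDK which cores its poll mode driver threads may run on
    # (bit mask over CPU IDs; 0x50 = cores 4 and 6 -- example value only).
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x50
    # Read back what is currently configured.
    ovs-vsctl get Open_vSwitch . other_config:pmd-cpu-mask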
In the DPDK data plane, these are the things to consider: hardware tuning, what kind of throughput you can get and should expect, CPU core allocation, and high availability. We'll go through those now.

From a hardware tuning point of view, first of all, we talked about zero packet loss — that is fairly important: high throughput with no packet loss. For that you need later hardware generations; an older one is not supported. There are a lot of tweaks involved, and if you use Red Hat OpenStack Platform director to install OpenStack, a lot of it is done for you — and as we go from OSP 10 to 11, 11 to 12, and so on, more and more of these things will be done for you and become completely transparent. But you do need hardware that supports these features: the right CPU generation and the right NICs — for example, the NFV lab I work in uses X540s, and some of the other labs have ordered X710s. Secondly, it is also important to go into the BIOS and disable things like the C3 power state, and take all the other extra steps that are required. The reason is that if you don't, you get a lot of local timer interrupts. If you look at /proc/interrupts, it shows, per CPU, all the interrupts associated with that CPU, and those local interrupts are pretty bad. I will show you some charts of what it looks like with a tuned profile versus a non-tuned profile. This is also something you will find on the OVS web page — these are the things they recommend you do to get high performance.

Now, I know there is a nice picture on the right side of the slide, but I would like you to focus on the left side, which lists the important details of the test that produced these results: two virtio-net interfaces were used in the VM, two 10 gig interfaces were used, and testpmd was the test tool — although our group has now developed a new tool called pbench, which is also available for public consumption; I will show you that and share the link. The traffic was bidirectional, with various profiles. In the left column you have the packet sizes. If you talk to any service provider, in an RFI or RFQ people will always bring up 64 bytes, the smallest packet size, but in a true IMIX — whether it's a mobile network (everybody uses mobile now, Netflix, whatever) or what you'd call an internet mix — it's a mix of all these packet sizes: 64, 256, 1024 and 1500. In fact there is a very cool command I came across, whose output I have included, which shows the counters per packet-size range, so you can actually look at that in DPDK.

If you look at column two, with a small tolerated loss rate — and I'll take the 64-byte packet size, since that's the one mobile operators and telcos discuss all the time — you could get almost 12 million packets per second, if you tolerated that kind of loss. But remember, not all traffic is TCP; not everything retransmits. A lot of applications are built on UDP, and retransmission is a lot of overhead anyway — you don't want to throw away a packet if you can avoid it. Compare that with zero percent packet loss: remember this is for two ports, two cores, not one, and column five shows 7.34 million packets per second.
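As a back-of-the-envelope cross-check (the 20-byte per-frame preamble and inter-frame gap is standard Ethernet wire overhead, not a number from the slide), this is what 10 GbE line rate looks like at a given frame size:

    frame=64                                                     # frame size in bytes
    pps=$(( 10 * 1000 * 1000 * 1000 / ((frame + 20) * 8) ))
    echo "theoretical 10 GbE line rate at ${frame}B: ${pps} pps"  # ~14.88 Mpps for 64B

So 7.34 Mpps across two cores at 64 bytes is roughly half of the 64-byte line rate of a single 10 gig port.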
That works out to approximately 3.7 million packets per second per core, which is what you can realistically expect — a huge improvement over virtio, and getting closer to the cloud-ready model. With SR-IOV you can get close to line rate, something like 9.6 or 9.3 Gbps on a 10 gig interface. That said, we don't like to talk in terms of Gbps, because packet size — frame size — really comes into play: the smaller the frame size, the lower the Gbps. It's simple math: with small packet sizes you multiply a large number of millions of packets per second to get the Gbps, and with large packet sizes you get more Gbps but fewer packets per second.

This diagram shows, at the bottom, the physical cores 0 through 3. Core 0 is allocated to the host: we do CPU isolation and say these are the cores the host can use for housekeeping and other functions that have nothing to do with NFV or DPDK. You also need to set aside a CPU for the poll mode driver, which runs on behalf of the host, and then physical CPUs 1 through 3 are given to the VM to use. I will show you the configuration parameters in director that you would set to achieve this. No questions so far? It should be pretty clear. (We'll hold that question until later.)

Now, DPDK HA — the way we have designed this for the lab and the upcoming reference architecture document. What I don't show here is that NIC 3 and NIC 4 are also bonded; that is my network isolation for all the other traffic — internal API, storage, storage management — and if you are using VXLAN, the tenant traffic between the controller and the compute nodes rides on that bond on NIC 3 and NIC 4. I don't show it because it is out of scope here, but I wanted to mention it. NIC 5 and NIC 6 are dedicated to the data plane in the NFV lab I work in, and we bond those two using an OVS-DPDK bond — the DPDK bond. In this example VM 1 is also running DPDK, with its own poll mode driver that polls. That's another thing to consider: if OVS-DPDK in user space, outside the VM, delivers a packet but the poll mode driver inside the VM is not actively fetching it, you can get very poor throughput; both sides have to be in sync for this to work well. The way we do it, Ethernet 0 connects to the DPDK bond, NIC 5 connects to switch 3, and NIC 6 connects to switch 4, so if either switch 3 or switch 4 fails you still have at least half the throughput. That is the whole point of creating these bonds.
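Director creates this for you underneath, but a hand-rolled sketch of an equivalent user-space bridge and DPDK bond looks roughly like the following. The names (br-link, dpdkbond0, dpdk0/dpdk1) follow this lab's convention; note that OVS releases after 2.5 also expect options:dpdk-devargs=<PCI address> on each DPDK interface:

    # User-space (netdev) bridge for the data plane.
    ovs-vsctl add-br br-link -- set bridge br-link datapath_type=netdev
    # Two-port DPDK bond hanging off NIC 5 and NIC 6.
    ovs-vsctl add-bond br-link dpdkbond0 dpdk0 dpdk1 \
        -- set Interface dpdk0 type=dpdk \
        -- set Interface dpdk1 type=dpdk \
        -- set Port dpdkbond0 bond_mode=balance-tcp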
Finally, we start looking at the troubleshooting part. There are three aspects to this: first, installation — what to set for the install and what can go wrong during it; second, since we said HA is important and we want to use DPDK bonds, we'll show you the commands to inspect those; and lastly, performance-related matters — how to get at them on the node with show commands, how to look at counters, and what the important things to look for are.

[Audience question] Can you use the poll mode driver and DPDK to enable SR-IOV as well? You sort of made them sound mutually exclusive. — Thanks, that's an honest question. There is only one scenario where you would use a poll mode driver in the VM with SR-IOV on the host — is that the one you're talking about? Possibly. That's the only scenario; otherwise you don't really need a poll mode driver for SR-IOV. So there is a scenario, but I don't know of any NEPs or telcos wanting to use it. We should talk later — please pull me in at the end, I'll be happy to discuss it. I know it will work, but going back to my earlier point, it doesn't take away the problem of using SR-IOV: the coupling is still there, it just increases the performance from the VM's point of view. Do pull me in — I'd like to learn what you have done, and we'll take it offline. Thank you.

From an installation point of view: in the lab we recently used Red Hat OpenStack Platform director with Red Hat OpenStack Platform 10, and these are the parameters set in the network-environment.yaml file that relate to DPDK. Some of them are used for other things as well — you will find similar configuration for SR-IOV — but these are the ones you have to configure and touch for DPDK to work. Looking at the ones I have highlighted: HostCpusList can be specified as an enumeration, as we've done here, or as a range — I could have just said 1 through 47 and used everything, or I can select specific CPUs, which means the others are not set aside for this purpose. You may find changes in how these are interpreted as things evolve — I'm already hearing from the developers at Red Hat that the way we set and interpret these to establish the tuning will change in upcoming releases — but for right now it works like this: you have an entire list specified under HostCpusList; out of that list, NeutronDpdkCoreList says the PMD, the poll mode driver, can use cores 4, 6, 20 and 22; and NovaVcpuPinSet holds the CPUs set aside for the VMs to use — 8, 10, 12 and so on. So if you think about it, HostCpusList is a superset of NeutronDpdkCoreList and NovaVcpuPinSet. A complete sample network-environment.yaml file, with each of these configuration parameters explained, is available on the Red Hat customer portal; the link is provided here. If somebody wants a PDF copy of this presentation, I can take names and send it to you later.
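As a rough illustration only — the values below are made up, the parameter names are the OSP 10 ones (they have changed in later releases), and the "superset" relationship follows the talk's description, so check your own release's documentation — the highlighted section of network-environment.yaml looks something like this:

    parameter_defaults:
      HostCpusList: "4,6,8,10,12,20,22"   # the overall list the other two are carved from
      NeutronDpdkCoreList: "4,6,20,22"    # cores the OVS-DPDK poll mode driver threads run on
      NovaVcpuPinSet: "8,10,12"           # cores Nova may hand to guest vCPUs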
During installation, if your install went well, this is what the OVS-DPDK setup should look like. In our case we have a bond, so you will see the port listed as dpdkbond0 with two interfaces, dpdk1 and dpdk0. This is what a healthy installation looks like. Mistakes in the YAML file can lead to some interesting situations — for example an extra space between commas, which I thought would be ignored, gets fed into the command line and fails silently, and you'll never know about it. Those kinds of things can happen, but if everything went well, this is what it looks like. The type "dpdk" is important; by the way, this bridge is referred to as an OVS user bridge, versus an OVS bridge, which is the one you would normally use outside of DPDK in our environment.

If you want to check which OVS-DPDK options are being set, after the install is done you can cat /etc/sysconfig/openvswitch and you will see all the DPDK options there. Whether that file actually gets used depends on which OVS version you're running: 2.5 uses it and restarts OVS, while 2.6 reads out of some sort of database, is what I was told. Either way, you can cat the file and find that information. The next slide is roughly what it looks like when DPDK was not set up and you hit some sort of error. There is another scenario we ran into but have not captured here: there was a kind of race condition, and, like I said, I fat-fingered it and put a space between two parameters that get fed to OVS on the command line — you obviously can't have spaces there — so it silently failed to set up DPDK, and that's what the output looked like. This dpdk-devbind tool is pretty handy for binding and unbinding devices and checking their status, which means that if something goes wrong at the very beginning, when you are first taking DPDK into use, it's worth checking whether you even have the interface created or not.

Thank you. We talked about how we use bonding for DPDK, and the way to look at the bonds is the command "ovs-appctl bond/show" plus the actual bond name — in this case dpdkbond0. From that you can see all the details, the most important being that dpdk1 is the active slave. When we did failover tests we actually failed the switch side of the active link and checked whether it fails over to the standby. The other thing to notice is that the bond mode right now is balance-tcp; LACP is going to be supported shortly, I think in OpenStack Platform 11 — I forget the details, but I can easily find out if someone emails me later. Right now LACP is not supported with this, so you have to use balance-tcp as the mode; and when I say LACP will be supported, it is 802.3ad that will be supported.
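Pulling those checks together, a quick post-install health check might look like this (the bond name dpdkbond0 matches this deployment; the devbind script's exact name and path vary with the DPDK version packaged on the node):

    ovs-vsctl show                   # ports on the user bridge should show type "dpdk"
    cat /etc/sysconfig/openvswitch   # DPDK options handed to ovs-vswitchd (OVS 2.5 behaviour)
    dpdk-devbind.py --status         # which NICs are bound to a DPDK-compatible driver
    ovs-appctl bond/show dpdkbond0   # bond mode, slaves, and which slave is active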
Then comes the final part of this presentation: the troubleshooting things to look for around performance. There are two aspects to it — things you can look at on the node, and measurements that are interesting to track. It's not just a troubleshooting exercise: making sure your RX and TX counters are going up and drops are not is obviously useful, but if there is a way to graph something or use it for stats, that is also important.

Here are some of the commands I got to use in the lab. You can grep the tuned boot command line under /etc/tuned — it shows the host CPU list that was used; because of the tuning, this is what is provided so that at boot time these CPUs are set aside for the host. Second, the OpenStack director installation creates a file called cpu-partitioning-variables.conf, which also shows which CPUs are being set aside. And lastly you can grep for vcpu in /etc/nova/nova.conf and it will show you the vCPU pin set. These are the useful things to look at.

This next example is taken from the output of a different lab — they do use it for performance testing, so it's not from my lab, but it is very interesting data and I thought I would share it. It graphs the local interrupts I told you about. You can actually cat or vi /proc/interrupts; it's a huge, very long file — a single row runs into many lines — and it's hard to view visually. What they've done is build that pbench tool I mentioned, which is available on GitHub — I've provided the link at the bottom, so you can grab it and read the README — and they use it for a lot of testing as well as tuning. Here, notice I've highlighted CPU 17 in that radio box, or whatever you want to call it, and it shows that the average local timer interrupts are approximately a thousand per second, which is very high — that means the core is doing something. You should only see a couple of interrupts — one for the poll mode driver, one for the VM — if you have tuned your CPUs right; if you don't tune the CPUs, this is what it will look like. You may have run an installation thinking everything is great, and then you hook up your traffic generator and you're not getting the performance; these are the kinds of things to look at. And here is the example of a CPU that has been tuned: I selected CPU 1 in the radio box, and you can see the average number of local timer interrupts is close to 2. This is not the only thing; you can also look at the log under /var/log — I think it's the tuned log — and you will see that it has actually attempted to do the partitioning and so on.
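A sketch of those node-level checks, assuming an OSP 10 compute node with the tuned cpu-partitioning profile (the file paths are assumptions for that setup and may differ on other releases):

    cat /etc/tuned/bootcmdline                       # kernel args tuned adds for the isolated host CPUs
    cat /etc/tuned/cpu-partitioning-variables.conf   # isolated_cores handed to the cpu-partitioning profile
    grep vcpu_pin_set /etc/nova/nova.conf            # cores Nova is allowed to pin guest vCPUs to
    grep 'LOC:' /proc/interrupts                     # local timer interrupts per CPU; should stay nearly
                                                     # flat on a properly isolated PMD or guest core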
A bunch of useful commands are captured here for reference. The first one displays the packet counters — receive, transmit, and drops are highlighted. These are things you can check quickly: make sure your RX and TX are constantly incrementing and drops are not. Around transitions you can expect some drops — even in hardware-based solutions you see certain transient drops, and we tend to ignore those. The more important thing to pay attention to is that drops should not continuously go up, and the numbers should not be large; that is very important. The next one is here for the sake of reference — I have not used it a lot — but it gives you the correlation between the port numbers and the port names, so you can make the connection. This next one is a very interesting command I alluded to earlier: it gives you counters per packet-size range — packets of 128 through 255 bytes in that rx queue, 1 through 64 bytes, the small packet sizes, in that one, and so on — which is very cool because you can extrapolate and figure out the traffic mix as well. And lastly, this one displays the PMD port allocation, showing for example that NUMA node 1 is where cores 14 and 15 are being used.
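Collected in one place, a hedged version of those statistics commands might look like this (interface and bridge names are this lab's; pmd-rxq-show needs a reasonably recent OVS):

    ovs-vsctl get Interface dpdk0 statistics    # rx/tx/drop counters, including per-size buckets
    ovs-ofctl dump-ports br-link                # per-port rx/tx/drop counters on the bridge
    ovs-appctl dpif-netdev/pmd-rxq-show         # which PMD core on which NUMA node polls which rx queue
    ovs-appctl dpif-netdev/pmd-stats-show       # per-PMD packet and cycle statistics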
I think that's all I had; if you have any questions we can take them now. Go ahead.

[Question] TCP dump is not available with DPDK because it works in kernel mode, but there is a utility that replaces tcpdump, and that's all I know — can you explain a bit about how you do tcpdump-style troubleshooting in a DPDK environment? — First of all, for the sake of transparency, we ran into this exact same problem. You want to see packets from the source all the way to the destination, every hop of the way — I love to see that — and I quickly found out that tcpdump doesn't work; in fact if you do an "ip link show" the links don't even show up in the output. There is actually an ovs-tcpdump — I'm told by Flavio at Red Hat, who is a guru in this area, that you can use it and it will show you the output — but if you do a normal deploy using Red Hat OpenStack director you don't get it. What I understand is that you may have to build your own DPDK, and there is a tools directory in there from which you can start using it; I think that is where ovs-tcpdump will be available. I have not had time in the last month to dig into this, but if you leave me your email address I can share it with you when I find out. To add to that: you might also consider trying a port mirror. The key is that any interface assigned to the DPDK poll mode driver software no longer uses a normal Linux kernel-mode driver, and tcpdump of course deals with kernel interfaces.

[Audience comment] Excellent slides — I recommend everybody in the room get these guys' email addresses and ask for these slides, this is golden information. Can you go back to the install slide with all the vCPU variables? There are some bits here I want to point out that are absolutely critical to understanding high-performance VNF compute nodes. This is a hyper-threaded system — you can tell by the numbering. The other thing: there was another session earlier today, find the slides and recording — NUMA node alignment, CPU pinning, one-gig huge pages — all of these things work alongside DPDK, not only if you're deploying DPDK at the host level, which is what we've been covering in this session, but for any other VNF use case as well. Having one-gig huge pages — nice, neat blocks of RAM — and having NUMA node alignment is critical, and this is a NUMA system. The short version is that if you have a multi-socket system, you effectively have two motherboards connected via a QPI interconnect, like a bus, so every time you map a port you want to map your CPUs and your RAM to the same side of the motherboard — the same NUMA node — as your transport mechanism. Excellent stuff here; this slide alone is worth everybody reaching out for the slides.

The other thing on performance that is always alarming the first time people run into it is the apparent CPU utilization of the poll mode driver. If you go and monitor it, your typical NMS folks will say, "you installed this thing and all of a sudden my utilization is 100% on these cores." No — it's a poll mode driver, which is doing what it says: polling constantly for packets to be sent or received on a port. So that's a false positive — maybe not the right way to describe it, but tell your NOC people to settle down. And if you're still running classic OVS, you are failing your company, in my opinion.

[Question] Just a follow-up: does it make sense to track or measure the processing cycles and the polling cycles for DPDK separately, to get a sense of how much of the CPU is being used for processing packets? — Great question: no, do not do that. The CPUs you allocate to anything running DPDK — whether it's the many vendor-provided VNFs that employ DPDK within their own data planes, or DPDK in Open vSwitch — those cores are dead to you. They are gone, allocated for forwarding; you will not be using them for any other purpose, and you should not want to, because they are giving you much greater functionality in your environment. So don't even bother tracking those cores. — I understand; what I meant was, is there any way to find out that the CPU cores allocated to a VNF are pegging at their limit and that is why performance is limited, that it is the bottleneck? — OK, say we were comparing classic OVS to OVS-DPDK — that's a good context for your question. Total packets per second of throughput is what you should look for, because if you're running classic OVS, on a good day you're getting 1.1 million packets per second and you're done for the host — the rest of your host is sitting idle, cores doing nothing. So packets per second is the threshold you're looking for, and as his charts already showed, even lightly tuned, DPDK-enabled OVS is going to decimate that.

Regarding coexistence, the shortest possible version: using OpenStack's scheduler filters, you have groups of compute nodes with ports allocated for SR-IOV and groups of compute nodes where — well, anything running OVS should have DPDK enabled — and those hosts coexist; the scheduler filters do a very nice job of picking where your workloads go. So if you're a telco and part of your job is to schedule various Cisco, Juniper, Brocade, whatever, vRouters, firewalls, and load balancers that are compiled with DPDK, they will benefit from running on top of that switch, and you retain the flexibility our speakers have mentioned. They benefit even more if you can afford to give NUMA node alignment, CPU pinning, and one-gig huge pages to those SR-IOV enabled interfaces. And the rest of your commodity, oversubscribed compute load should run on top of these nodes without any question whatsoever. That is how coexistence works — it works great, and you choose what your business needs are to determine where the workloads go.
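One hedged way to express that coexistence in practice is host aggregates plus Nova's AggregateInstanceExtraSpecsFilter, so that DPDK-compiled VNF flavors land only on the OVS-DPDK compute nodes. The names below (dpdk, overcloud-compute-0, m1.dpdk) are made up for illustration, and the filter has to be enabled in the Nova scheduler for this to take effect:

    # Group the OVS-DPDK compute nodes into an aggregate and tag it.
    openstack aggregate create --property dpdk=true dpdk
    openstack aggregate add host dpdk overcloud-compute-0
    # Flavors carrying the matching extra spec are scheduled onto that aggregate.
    openstack flavor set --property aggregate_instance_extra_specs:dpdk=true m1.dpdk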
I'm being signaled that we have run out of time, so can we take your question offline — or maybe one last question, make it quick. OK. Thank you so much for attending. There are also reference architectures that we have put out — two of them, one called something like "Deploying mobile networks using NFV," and the second one will be based on OpenStack Platform 10 — so look for those on access.redhat.com under reference architectures.
Info
Channel: Open Infrastructure Foundation
Views: 2,618
Rating: 4.6666665 out of 5
Keywords: OpenStack Summit Boston, Architect, Telecom, Juha Kosonen, Ajay Simha
Id: BEXwaf5IPhk
Length: 48min 35sec (2915 seconds)
Published: Mon May 08 2017