Multi-Path vs. Multi-Chassis

Captions
okay so welcome back everybody this is the third session for the brigade virtual symposium thanks for joining us this session is multipath versus multi chassis as many of you know we are looking at the new era of land networking where once upon a time everything was driven into spanning tree and it was a rigidly defined rigidly session so in the first session if you've just come to us here we had an overview of how the spanning tree forces us into an invariant design position where the hierarchy comes like this and then the use of fabrics especially brocades VCS fabric allows us to create a cloud of networking so this session then drills into how the multipathing VCS fabric works and the differences that we're going to bring in it's going to be a more open format open structured kind of discussion and we've got the the virtual whiteboard is made real we've instantiated here in the boardroom to have a much more dynamic discussion and joining us in a session first of all is mr. even people hey he's coming back again for those of you and also introducing Josh O'Brien he blogs at static NAT calm and I'm on Twitter at Joshua Bryan 77 and also joining us is I'm Chris market and I'm Chris market on the Twitter cool and thanks for joining us good to see you by the right person not to offer we actually get to get MCS and wearing a stunning mullet today is tony burk yeah do Center overlords and i'm on twitter @ @ t burke maybe you a Yorkie okay and returning is a chip copper chip at brocade calm yeah and of course I'm Greg Fowler you can find me at ethereal mind calm where I blog and write and on the Twitter's as ethereal mind and of course also at packet pushes so without further ado let's just hammer off into some topics around multipathing versus who who wants to head off with a bit of a discussion around how trill fabrics do multipath ivan or chip I'm sure I'll talk about it the first idea here is that we have multiple ways the different components inside the network can 
be connected to each other and so the idea of shutting off a communication path just doesn't work so what we have to do is have to look at solving a couple different problems interestingly enough before we get into it trail solves half of the problem that we need to discuss trill solves the a multi cloth multi-hop issue and we depend on data center bridging to solve the other piece of the issue which is the lossless behavior so for a time being let's ignore the lossless behavior piece and let's just concentrate how things are going to go through the network as is pointed out in the previous session Browning's the way to go the ability to go multi-hop multipathing the ability to dynamically and someone intelligently go through and figure out the best path through the network is absolutely an optimal network design the difference is that we don't want the host to know about it so as we start to talk about true most important thing about it is that as far as the the edges are concerned nothing has changed the frames that they're going to send out are going to continue to be the source ID the destination ID of where they want to go with the payload and we're done in the meantime what's happened is the different nodes that are going to be participating inside of this troll fabric now notice our bridges for routing bridges are going to have exchanged routing information going back and forth there's a bit of a religious war going on about what the routing protocol should be because the in the very early versions of the standards they said that in order for these bridges to communicate with each other they need to be able to exchange routing information how do we do that and the early versions of the standard said well they should use a link state routing protocol like say Isis is is of a fond protocol for those of us who remember the early days of Digital Equipment Corporation yeah that going to taste fine is great oh I know it wasn't around the technically spot but OSI 
oh yeah still there we shall talk about we just don't use them so so what we're going to do is we're going to have each one of the nodes figure out who their neighbors are and then they're going to exchange according to a link state table now the reason I bring this up is a little bit of a religious issue right now is there are a lot of people who are saying what link state routing protocol do you use first thing I'd offer is that if you care about that you're still down in the weeds then I think that there are different ways that we can we need to be running a link-state routing protocol and it needs to be one that we can agree upon so that different nodes can communicate with each other especially in a multi vendor environment but the important thing here is that they can interoperate and so although Isis happens to be the one of choice for trill that doesn't mean that necessarily is is put in stone the other thing is significant about this is it's not really Isis it's actually a variant of Isis where we've had to change some primitives and so forth depending on what the network's going to do so we certainly believe that that's a standard that's coming forward there are other complications about bringing in third-party switches which we probably won't have a chance to talk about here but the important thing is here there are standards that are out there in old guys story does anyone remember when tcp/ip first came out it took about seven years for interoperability to take place that when we got tcp/ip it was remarkable because vendor a's didn't work with vendor b's and in one case vendor m made the stack for vendor i but vendor M's and vendor I stack didn't interoperate I think one of the things that's them we're seeing right now is that we sort of expect instant interoperability the moment we see things coming out onto the on the market well we hope the vendors have learned a lesson or two in the last thirty years yes to me is actually a really interesting topic 
because networking is driven by standards yet service applications don't so when Microsoft writes a C compiler it's actually not C compatible it's their version of safe that's okay now we've seen different companies in the market space create or adopt the tril standard but then enhance it with their own proprietary extensions so they claim that they are they would say you know we are compatible with the standard but we have these whole bunch of extra extensions that allow us to be fancy but their proprietary that's right right and then we have in the net you know that social Cisco's fabric path for example is tril plus some set of extensions which is their fabric part strategy the brigade is chosen to go down the path with their VCS virtual cluster switching which is the name for your fabric strategy that's right it's correct and there your fabric strategy says we use ultimate absolutely trill standards but the routing protocol that we use to transport to to spread the Mack routing information the MAC address routing information uses F SPF that's right today today but it is is it your stated direction to move into ISAs in the future absolutely as a matter of fact for us to be able to go to that routing protocol it's just a matter of reflashing the firmware we go to customers and say do you want us to go through the effort now the reason we have it that way by the way is because as I mentioned earlier we only had to link have a link state routing protocol the reason we picked F SPF is actually for the efficiency of the switch because if we're going to already be running Fibre Channel over Ethernet traffic Fibre Channel over Ethernet also requires F SPF and the idea of going into existing fibre channel sans said let's use the same routing protocol well since we're already running one protocol why not use it for both TRO routing and fibre channel routing but as we do really see interoperability taking place and we can now have multiple vendors putting together trail 
networks for us implementing Isis is going to be very straightforward yeah so it also is worth pointing out that the interoperability is usually a red herring it's someone trying to stop the sale so it I have nothing against interpretability I'm all for interoperability but VCS fabric today is how many switches at the most um very turn pro but check it out today we'll say 24 but complete but please check that because it's a slice absolutely so we are talking about interoperability of something that has at the most at the moment 24 switches yes are we stupid yeah I mean I mean right now in terms to get the benefits of this the being able to have this multipathing Ethernet technology I really don't care about any off the bill right now I would like it in the future but right now I'm not gonna build a huge if I'm going to build an art bridge Network I'm going to use one vendor I mean it's just like if I build a data center if I use multiple vendors I'm not going to intersperse them would katene each other I'm gonna have like a cluster of switches here and a group of switches here I might and there are three switches on the top from whichever vendors yes oh I love the you know my core will be one vendor and my aggregation and access layer might be different vendors but I'm not gonna I'm not gonna like have my a switch my B switch one vendor in another bidder so in terms of interoperability right now I just don't care that's right now the good news on all this is from a frame format as was pointed out 100% complying to the standards so far as the standards have been defined and I bring this up one of the problems with with tcp/ip is that everybody's with standard compliant no one's was interoperable because you can't write standards that are exact enough and that's why I think that when they be you can takes 20 years yeah I think one of the things that was brilliant when the loss of safety net was coming out was it at the same time data center bridging was being 
developed there was a reference architecture called Cee converts enhanced Ethernet that also came out as a freely available standard and the idea being that wherever the standard was ambiguous go look at this reference implementation and whatever they thought that counterman that's what the counter means and that's how it's going to be used unfortunately trill followed a different path and so what that means is today because there are all these different variants that are out there a lot of people can say where standards compliant and yet there's no interoperability in the long term now we would expect some level of interoperability between vendors I think it's that's absolutely going to be driven by the market and I agree with ya so you know if you're watching make sure you demand the vendors to be interoperability over time to make it into you know I think that's important for the industry as a whole because it may be one day that you need to interoperate you will have a multi vendor network and if you haven't got that functionality then your design choices are limited now maybe you may not get certain enhanced functionality so for example the brocade VCS strategy has an order figuration capability or what I'm not sure if that's your term for it but if you plug in a bunch of brocade switches they will detect the neighbor that the neighbor switches of VCs capable chassis and then just bond up and automatically drill enable itself self configured almost that's right right completely and that's I must say we'd actually came to brocade six months ago we actually got to play with that in a live demo and now I just it took me a little while to tweak exactly what had happened and literally just light them up plug them together you turn them on and they go ah that's a brocade that's a VCS enabled and trill turns on and it's really that simple to run trip now as Ivan will leap in and say there's an awful lot of magic going on well you see Greg actually thanks for 
mentioning this particular example I'm really disappointed at all the vendors who are not doing that because after all we have lldp that has been standardized for I don't know how long and it's pretty simple to detect that there is a switch at the other end of the link if you are using LTP now why can't you auto provision that link with some sensible set of defaults that I could configure as a template and I mean if we look historically back some of the greatest successes in Ethernet networking have been Auto configuration so not wishing to you know bring another vendor in but you know we look at ISL and dynamic trunking protocols you know in the Cisco Network just plug them in and then they started trunking and then they created VLANs and VTP signal-d you know and today it's a purse worth it to automatically propagate VLANs around the network but in its time and I remember deploying VLANs in you know 1999 and it was an extraordinary amount of work to convince to for people to understand how to configure VLANs people didn't grok the idea of a VLAN right so no those automation technologies have a powerful effect for adoption of the technology yes they do yeah and they drive the market in that direction that's right and and and in people who can start and it just works and then they go well maybe I should start to understand it more and then the trillion so tech anyway now that you've mentioned VTP you know it was a first attempt at doing something and they obviously went way too far with the ability to a random of a random switch to you know erase the whole fabric if you wish to call it that way we've learned in the meantime how far we can go with other configurations oh yes that's right yeah I think people liked VTP but there is that danger there so that's why people stopped using it it was brilliantly can convenient but also dangerous well just to defend that back in 1998 when I started to deploy it they weren't that many big networks it was five or six switches 
and now you and people have five or six hundred them it didn't scale but you know you didn't have the defenses that it might need it to have so the good part about VCS technologies is that or trill is that that's part of the calculation of a standard that's right right so it allows for up to it has mechanisms in it that it makes it safe so the people who put design trill and took it to the standards bodies they built in failsafe so that things don't overrun it their intention that's exactly right and there's always going to be an element of proprietary and for example although we speak trill and although we understand wise and V likes and so forth we still have a an admittedly proprietary differentiated protocol so that if I have a link that's running between a pair of switches and I plug a second link in it's automatically going to figure out not that that's alike but rather than now I can combine those two links and on a frame-by-frame basis I can do load balancing across them without having any manual intervention oh you should really explain that in more details please did you let's hit the one point yeah this is one of the amazing features of VCS fabric right let's frame by frame we're not exactly afraid that's exactly right so Wow but how do we how do we make sure that we get in order frame delivery which is important especially for things like fabric now I'm going to give him I forgive me close to me so I should be careful here don't strike me when I give you this answer okay the answer is we've been doing that for 15 years in in the fibre channel world the type of trunking we're going to be discussing here has been out now for a decade and a half so we know it works and we know how to put it together it's just a matter of introducing this technology into the ethernet world where this whole idea of how are we going to do it is is dangerous it sounds dangerous so the good news here is this is not an experiment this is not something new we've come up with I 
for 15 years now I've been doing exactly what you described and we address that issue so that's the good news it's just a matter of getting ethernet people used to that but I appreciate your comment and thanks for graciously letting me give you a snotty answer so the reason this is different let me know if you can't see this suppose that I have a switch here and a switch here and um on this can we see this just look like Oh excellent and so on this I'm going to have an ASIC because the the ASIC is essential here because I don't have the opportunity to go in and do this manually if I have to go to a control plane to do what I'm going to show you everything is going to slow down that's part of the piece here that says it's going to be differentiated on these Asics I have these things here that are called port groups and what these port groups are is inside of the ASIC because one of the ways that you scale the number of ports as you actually have multiple collections of modules inside the ASIC where there's a very tight relationship between some of these ports and so what happens is I'm going to start with the case you'd expect first if I have a link here and then I go in here and I connect the link here I've now got two links going between these pair of switches what will i form here well I'll form a line right and I'll form that because I have two separate Asics and it looks exactly like it would in any other device for anybody doesn't know a lag is a link aggregation group and that's a board show it's a port channel or an ether channel but it's basically two physical connections between switches bonded into one logical autumn or which was a trick that we use to get around so we tricked it into thinking it's one link instead of two it sold several problems it gives us double there's double the bandwidth at this speed and it's useful for spanning tree to avoid some spanning tree loops and it actually doesn't give it about the bandwidth which was the original 
question because if you have one huge flow it's an one source in order to use let me explain a variety of source and destination addresses so in Ethernet networking today you have two devices that's right an Ethernet networking today you have two devices and when you put them together and you bomb the two until port channel the conversation is to steer this way or this way and a given IP destination source will either go left or right in a two point channel and so your maximum speed is always determined by this links up there two one gigs then your maximum speed or network speed is at one gig but your maximum bandwidth is two gigs that's as low me sure that's relation right there's also a we need usually and we want to keep it two powers of two so we'll get into that but so for those of you perhaps coming from a different vendors approach that's the way we know and that's the way we understand it just to sit a background on which to go that's exactly right and because of the way that we decide to go left to right if I have an unfortunate hashing scheme then what can happen is that this link over here gets terribly congested well I've got almost no traffic that's going over here and because I don't want to allow the possible out of order delivery of those frames I mandate especially for TCP flows that everything go over exactly the same path so what happens here is this we're talking about interoperability in the case where I'm hooking up to either different ports in different AC groups or another vendor switch this is exactly what it looks like and this is the behavior that you get so it's very expected NACP everything standards they figure out they're in a four channel or lag and they work it's exactly right everything looks exactly the same way but suppose that now what I do is I have two different lines like this and I connect them to the same what's called a port group on these two separate switches as soon as I do this they recognize through some low-level 
signaling that they are on exactly the same ASIC and now they use a diff form of load-balancing we're on a frame-by-frame basis they actually put I get into a little bit of details they actually put a little bit of a shim as they transmit the frame should point the point so that now this does look like a single link of the combined capacity and I can use an algorithm that basically says as a new frame comes in here the shim goes on and then I'm going to look at these two lines in whichever line doesn't have any traffic going across it that's the one I'm going to use and so then the next frame comes in does it go to this one or does it go to this one it can go to either one depending on which one is currently busy and so what that means is I see a very even load balancing taking place between here and there so that now for example this looks like in this case of Subang 10 gig links this looks like a single 20 gig link this is interesting because now we start getting into the costing of these things the question is well if I can do that and it looks like a 20 gig link and with arbitrary loads arbitrary hashes I can get this thing all the way up to 20 gigabits per second then now do I really need to have faster individual port speeds it's an interesting comment for example suppose I have four of these things that are moving back and forth and doing a 40 gig know do we need 40 gig connectors that's exactly right because it would need four fiber cables anyway could we run three instead absolutely yeah and the thing that's nice about this is you can just scale on demand all the way up to eight so back to my earlier can so if you have a if you have a you know an answer to this I'm fine with it but initially hearing this my concern is that alright well what if frame gets held up in a queue on one port versus another fires at the other end and it that would be my concern if you the shim helps us resolve that okay and remember there's other communication that we can have 
between these Asics and the other thing that hopefully makes you feel good is again the fact that this technology came directly from what we are doing in the fiber channel side so we have 15 years of experience of knowing how to arrange that so the situation that you just described doesn't occur so you can guarantee in order absolutely Oni what they're actually doing is if they if you want to transmit the frame on one link and the frame with the same hash it's already train being transmitted on the second link they delay this frame okay they will start transmitting only when they're sure that the other one will be received prior to this one based on their length and the difference between cable lengths they even measure the cable length okay that's right as a matter of fact we have is they're called eschewing register yes that's it up here so that we can actually watch everything and we periodically send pings we send like race car drives here and we can have one line that goes this way and one line that goes direct and we can still guarantee the reliable in order to deliver it coming off the other end okay so now what I want to do is scale that out to a multi chassis lag or an M lag so we have a new debt no what we talked at the point of the discussion is to compare multipathing versus multi chassis right so this is a feature which case so we want to expand the conversation inside of the time available so what we now have is a second switch here which now acts to multi chassis leg how does that look so the first thing that we have to do is we have to make sure that whatever we do is something that's going to look like a reasonable trill network because the minute I go off of this port group I'm back to what trill will allow me to do so in that case let's come down here we have another port group maybe I'm going to take two that go down here in this case this now appears to be two completely separate legs and I'm going to use ecmp across these two because I don't 
have this special sauce that lives in here so as we started to talk about trill we talked about how a trill is going to have this link state routing protocol the piece we didn't get to forgive me I digress was that there is once again going to be a shim header that's going to be put on at every single hop going through the network and that's going to dictate for example for this particular ASIC where do I send the next piece of traffic and in this case this next piece of traffic can go either up here or down here depending on the hash though I'm using for the slag here's where it gets really interesting fabrics are going to have boundaries so for example suppose now that this is all one fabric and I've got this traffic that's moving through this fabric Oh an eraser yeah yeah yeah okay thank you and let's suppose that I now have a network that's already in place and I want to have these two devices communicate with another let's say a logical host that's up here let me let me draw that bigger because we might want to do something that's a little different let's say that I have y'all can still see that that's remarkable suppose now that we have a single entity that I want to talk to and now I'm going to have one link that goes up like this and one link that goes up like that the question is as far as this device up here goes in this device up here it's important notice that it's outside the boundary of my fabric this could be a brocade device it could be a Juniper device and Arista device a Cisco device it can be anyone else's device that needs a host or arm or host the question is what does this now look like to this node and the answer is as far as this node is concerned it's all the same thing and so now that means that just as I could have these light groups going between notes down here inside of a fabric as far as this thing is concerned up here this is in fact one node yeah so troves going to handle the howl notes are how traffic is going to go through here 
but to the outside world with technologies like multi chassis trunking and so forth this looks like a single entity here this looks like a single they up there isn't it interesting isn't it interesting that if I do actually have two protocols going back and forth like this these two let's say that this is running multi chassis trunking or VPC or the SS or some other protocol in here some IRF some proprietary protocol and isn't it interesting that down in here along with trill we're running some sort of proprietary protocol in this case the net effect is the same I need to have a tree because it's going into an existing network therefore this needs to look like one node this needs to look like one note and so here I can actually have multiple nodes yet appear to this node to be the same so at this point this is what this could be you know some other vendor that's and this is LACP that's right pure standards-compliant LACP becomes OBCs boundary and then in here is a trill it's really cool is what you call your multi chassis called be lag be like okay and on how many different devices can you could terminate one lakh check the website because I don't know when you're watching check the website today it's fine so that means isn't that interesting that up here if we pay a lot to the ability to have two of these chassis x' work together to look like a single node and yet down here in this environment I could go one two three four I can have a fifth one that comes up here these all look and behave like a single node and what you'll find out again is we didn't get into pricing a whole lot but what you're going to find out is that the pricing down here is about what you'd expect to pay for I'm going to say current layer 2 devices running without any fabric infrastructure that these things but please check with your local sales team to get the latest pricing all those and you can find them on brocade though I don't ask Claire sure and as we come out of your guys's stack your 
fabrics bottom then we go up into this this host or this other vendors environment that standard LACP that goes back to standard flow based so flow goes laughter flow goes right it now is not per frame based that's exactly make that that consideration people don't look at it and still think what's important about this what's important about this approach is there's a transition from your current network so if you've got a pair of which I wrote to thank you oh it is good but that's okay we can draw it out I wanted to ask a few questions about that picture so thank you imagine a few words so let's say you have a network today and you're running a vendor's em like solution so M lag is multi chassis link aggregation and that's technologies which take two physical stresses and make them act as one so between the two physical chassis is you've got some sort of protocol between here so there are many different names for this VSS is one from Cisco IRF is one from HP and it's a generically sort of an interest free approach what it does this takes to physical chef sees and turns them into one chassis so that means you can have two cables going down here to a an access switch and then your server connects to that and this looks like even though it's to physical chef sees it looks like it's actually one just to back off a little bit but let's say you want to trill enable your core using brocades VCS you now at a situation where you have two switches over here and you can then say we have the six point we don't want the 19 I mean and non non Pentangle are type of thing and so we then we hyper connect these I call it hyper Connect knees when we flood the trill connection between we'll talk a little bit more about that perhaps in a future thing so when we connect here this then becomes your probably should have gone with a different color this becomes your VCS edge so at this point we're running VCS in here which is concomitant with the term of trill right if you wish to talk 
about it in that Center this is an LACP industry-standard bundle so you've got an existing vendors m lag at the edge and it looks just like that that's exactly right right now point two in a network is you might have existing single switch right and you'll connect it to the trill core and this is a spanning tree that's right right so this is spanning tree and so this would be a way of running a lag but it's it still here we were in the lag and consider we wouldn't even either as manager that's right okay so we could still run this as a lag or you could run it as a spanning tree where tonight no this is exactly the picture I needed that Greg's not so dumb so now you might have this situation tell me we're not go wrong here because I'm starting to get the creative and now I might have a server down here where it actually has the active standby so this is the active port and this is the standby court at that point you've now got a transition from this and to encompass your existing things and creative this tool enabled core and then you have you know the fan-out continues off to the other side that's exactly right okay so if you can replace all this with a single box you can do it with the cloud now the advantage of something like that is suppose we still want to have the edge core aggregation and suppose out here I have a series of racks and let's say I roll the first rack in here's my top of racks which I roll the second in my second my third and so forth and all of a sudden I start thinking you know I want to grow my Angra Gatien layer from where does it live well a piece of my aggregation switch may live here another piece of my aggregations which may live here and so what that means is as we scale out the environment I don't have to figure out how to dig out a big piece to put this large chassis in instead of that I just have to have enough room as I add more chassis to add the next incremental layer so you can almost think of this as being a way to have a large 
virtual chassis, with all the management features of a large physical chassis. When you say the next incremental layer, do you mean the seventh node in that mesh? In this case it would be the next rack that I'm going to roll in; just the next rack, and then the next rack after that, and the next one after that. So you're almost talking about a deconstructed core? Yes, that's exactly right. So you get rid of the three layers we've talked about having over the years, but you end up with this deconstructed core: one layer that exists at every top of rack and communicates over TRILL. Now, I don't want to go too fast and scare anybody, so today I'm only going to talk about aggregation and edge; but for the concept of having similar things happen with a core, stay tuned. Well, it depends on how big your data center is. If you only have a few hundred servers, you don't need more than what you have here on the whiteboard. That's exactly right. And if you're Facebook, that's a different story. So suppose we leave the green part. You had the beautiful picture before; let me almost redraw it. Here's the thing to say about this. Suppose I already have, up here, my core; in here I'm going to call this distribution, just so it's not ambiguous with access; down here I have access, and out here I've got access; and maybe I've got some MCT or some VSS or something like that running in here to provide this virtual chassis at the distribution layer. So now the question is, how do I organically grow this so that I can incorporate fabrics? The answer is that down here you bring in your first fabric switch. Now what have I done to the total switch count here from a management viewpoint? I've added one. So this goes into this rack, and now I bring in the second rack, and what have I done to
the total switch count from a management viewpoint? Nothing, because since this is a fabric, these are going to look like a single switch. I may be adding some additional links for bandwidth or for high availability, but the count remains the same. Now I add another rack out here, and the same thing keeps happening. At some point it's going to be time to add the next piece of my distribution or aggregation layer. Where does that go? Maybe it doesn't have to exist as a separate switch anymore; maybe it appears down here, as just another component that fits into this fabric. So the thing that's actually happening, from a network design viewpoint, is that our jobs just got harder; not because we made it more difficult, but because before there was only one way to do it: this layer can only talk to that layer, and we're done. Easy choices are a tough thing to give up. So now we get to the situation of asking, do I really need to add another switch up here? Can I add the switch down here instead? Do I have a lot of east-west traffic taking place in here, so that this east-west traffic doesn't have to go up anymore? This is the place where my next-generation application is deployed, where I have federated virtual machines doing the same thing without needing that infrastructure. And the problem you have here is that now, all of a sudden, you have to understand how much east-west traffic you have and how much bandwidth you need on those links. That's right, and the key is to be able to measure it, because it doesn't have predictable behavior. So while you're collapsing the traditional structure everyone's used to, I think you're kind of glossing over the fact that there is inherent structure within that row of switches you have the line drawn through. What is the inherent structure in here, between those four switches? If you're going to build out a network of switches, what is that
going to look like? It's not going to be four switches in a row. What is the cabling, and what is the structure of that network? Outstanding question, and I have to contrive time for a story. Absolutely; we've all got tons of stories. It turns out, and I hope I won't get in trouble for telling this, that we were actually going to show a commercial a while ago, and it got such a reaction from networking people that we pulled it. The commercial showed a bunch of switches sitting there and a bunch of little kids, and we said, okay kids, go build the network. They all ran in and just plugged things in willy-nilly, and it all worked. Then we said, okay kids, go change it, and it still all worked. I'm telling you that because, in this area, the thing that's going to dictate how I put those switches together isn't going to be architectural, that A needs to go to B and B needs to go to C. Instead, the way these four things get put together is the way the application and the workload need them to be put together. Let me give you a few examples, and this is where your point about measuring the workloads is so important; you have to be able to characterize this to do it effectively, because you have a choice now. Suppose I have four nodes. I can hook them together in a ring, if that's the right topology, where for example neighbors only talk to neighbors; maybe I'm doing some sort of FFT calculation. Is this good for VDI? Well, for VDI it may be that, because any node can talk to any node and I want low latency, I'm going to hook them together in a complete mesh. Or it may be, and forgive me for not exhaustively going through these, that what I really want is a hypercube. So the question is, who are the people who can best determine exactly how these things should be laid out?
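As an illustration of the trade-off between those layouts (a generic sketch, nothing Brocade-specific), you can compare how many inter-switch links each topology needs as the fabric grows:

```python
# Illustrative only: counting the inter-switch links needed by the three
# topologies mentioned above (ring, complete mesh, hypercube).
from itertools import combinations

def ring_links(n):
    """Ring: each switch is cabled only to its two neighbors."""
    return [(i, (i + 1) % n) for i in range(n)]

def mesh_links(n):
    """Complete mesh: every switch cabled directly to every other (lowest latency)."""
    return list(combinations(range(n), 2))

def hypercube_links(n):
    """Hypercube: switches whose binary IDs differ in exactly one bit are cabled.
    n must be a power of two."""
    return [(a, b) for a, b in combinations(range(n), 2)
            if bin(a ^ b).count("1") == 1]

for name, fn in (("ring", ring_links), ("mesh", mesh_links), ("hypercube", hypercube_links)):
    print(f"8-switch {name}: {len(fn(8))} links")
# The ring grows linearly, the mesh quadratically; the hypercube sits in
# between (n*log2(n)/2 links) while keeping worst-case hops at log2(n).
```

For eight switches that works out to 8, 28, and 12 links respectively, which is exactly why characterizing the workload matters before you cable anything.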
The good news, and here comes a shameless plug (I think you may need a napkin), is that for 15 years, people in fabrics have been solving exactly this problem. That's why what we're seeing today, and it's a really uncomfortable dynamic, is that when it's time to design these racks using fabrics, more and more people, especially at the hypervisor level, are saying: let's get the SAN people in here. Not because they're SAN experts, but because they're fabric experts, and they know that just because you can lay switches out any way you want doesn't mean you should. So let's bring them in, figure out what the right traffic flows are, and engineer the network. The good news, and this is a little scary (like I said, we need thunder outside when we talk about these things), is that you're allowed to be wrong. The reason you're allowed to be wrong is that if I happen to notice I've cabled this incorrectly, let me erase this, and I've got an inordinate amount of traffic going between these two, or there's something about the deployment of my applications that requires more bandwidth in here, I just go in and add more cables. So then it comes back to this question: are they all used? You might say, I've got a set of applications on the other side of this, and I've got some sort of cluster over here, and they need to exchange data. You might plumb this and say that's enough; or you might only have some arbitrary architecture like this, and that might work for all the cases you planned for. Then you suddenly realize that over here somebody's added some more elements of a Hadoop cluster, and they need to talk to this cluster over here. So now, in a TRILL core, you can start to do things like
just scale that out to give more bandwidth. And this is where my set of questions starts. Okay, so the first question: let's assume I have equal-cost multipath, and then I figure out that the links are overloaded, so I just add a new parallel link alongside one of the links in the equal-cost structure. So I guess you do take the bandwidth of the links into account? We take the bandwidth into account for load balancing, not for routing. So that means, please: say you have a topology like this, but here you have high-bandwidth links; you've got LACP here and LACP here. Yes sir. So all of a sudden, if it's equal-cost multipath going from here to anyone down here, then effectively, because this link is thicker, so to speak, it shouldn't be equal cost; but if you treat it as equal-cost multipath, then this link will be underutilized. I think what you'll see is that if you look at the individual links in each one of these trunks, they all have approximately the same workload, assuming a perfect hash scheme. Within the LAG? No, actually across all of them, because if they all have the same routing distance from here to there, then all of them are going to be considered; and if all of them are going to be considered, then the hash algorithm works like this: say I have five links from there to there. I'm going to draw a number from one to five; if it's one through three, the frame goes here, and if it's four or five, it goes here. So if I have a good hash algorithm, I'm actually using all of those links from there to there. So you're also at the mercy of other vendors' decisions in their hash mechanisms as traffic comes into the LACP: if you have an incredibly high-bandwidth flow coming into the top part of that fabric, it never takes into account that the bottom part of the fabric has the higher-bandwidth path.
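The draw-a-number-from-one-to-five scheme described above can be sketched like this. This is a hypothetical illustration, not VCS's actual hash; real switches hash on header fields such as MAC/IP/port tuples:

```python
# Hypothetical per-flow hash sketch: five equal-cost member links split into a
# 3-link trunk and a 2-link trunk. Hashing each flow onto one of the five
# links spreads traffic across trunks in proportion to their widths (3:2).
import hashlib
from collections import Counter

TRUNKS = {"trunk_a": 3, "trunk_b": 2}  # member links per trunk

def pick_trunk(flow_key: str) -> str:
    """Hash the flow identifier onto one of the five links, then map that
    link back to the trunk that owns it."""
    total = sum(TRUNKS.values())
    link = int(hashlib.md5(flow_key.encode()).hexdigest(), 16) % total  # 0..4
    for trunk, width in TRUNKS.items():
        if link < width:
            return trunk
        link -= width
    raise AssertionError("unreachable")

# With many distinct flows the split approaches 3:2, but a single elephant
# flow still lands on exactly one member link, no matter how wide the trunk.
counts = Counter(pick_trunk(f"flow-{i}") for i in range(10_000))
print(counts)
```

That last point is the catch: per-flow hashing balances aggregates, not individual heavy flows.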
This is where SDN and those solutions come into play, as overlays. Yep, and they have an incredible role even in this model. I don't want to harp on it, but otherwise you have to go back to having true equal-cost LAGs everywhere, which means lots of infrastructure and lots of cables; full-mesh architectures. I mean, this is fun stuff, but it's not inherently without challenges from the outside world. How do I fix it? Oh, you go strictly to Brocade; or, to be more gentle, this goes for TRILL generally. So for RBridge routing, the ECMP is still at the mercy of a hashing algorithm? It is, and that's fine; I just wanted to confirm it. It is, but that doesn't necessarily mean it's a single hash algorithm. For example, one of the things we figured out is that if we end up with a truly unfortunate set of hashes, we can actually change the hash algorithm; we can change some of the salt that's in there. So if I have one host just blasting out one TV session from one side to the other... That's right, and that case is pretty well understood. But you're also going to have that same congestion issue on one of these links, and also coming into these links. Okay, now the other point to note here, just to come back to the drawing: this is a multi-chassis solution. When you build a multi-chassis system using an MLAG like this, you have to have coherence between the control planes. You have two switches here, each with a control plane and a management plane, which have to be synced into one, so there's a synchronization protocol between here and here. This requires cleverness, extreme cleverness at some points. Whereas with a TRILL-based solution, each of these is an autonomous element of a distributed system, which is something most networking engineers are familiar with: there's a control plane here and a control plane here, and they're using TRILL with IS-IS SPF to exchange routing information. So when a MAC address in a TRILL core actually comes up
and enters the fabric, the TRILL-enabled edge switch actually encapsulates the frame and routes it across the core; it's not transparently bridged across the core. That's part of the secret of TRILL, of course. And so there's a significant difference from multi-chassis, where you're taking two standalone switches and trying to weld them together into one. That's right, but it's not a tightly integrated union; it's loosely coupled. You're either disabling the forwarding engine and the control plane in one box, or the routing engine over here, and saying everything lives over here; then you're distributing forwarding information, and you have these active detection mechanisms to try to make the two look like one. In a realistic sense, this is the more robust design, because the failure of this device does not create a cascading failure in the other; it's autonomous, unless it's a routing problem. Which brings me to another interesting question. Let's say you run LACP with outside devices. If you want to do that, then obviously both of your fabric nodes have to pretend to be the same node from the LACP perspective, which is fine. Now let's say the fabric falls apart, so you have two fabrics all of a sudden. Will those two devices still pretend to be the same device to the outside world, so that you have a split-brain scenario? You've hit on something which I think is the crux of interoperability, because that's so vital, and it's a piece that people forget: a fabric is not just a collection of TRILL switches. In fact, there's distributed configuration management and so forth going through here. The answer is that a decade and a half ago we figured out that there needs to be a sense of fabric coherency.
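The encapsulate-and-route behaviour described above can be sketched as follows. This is an illustrative model only, not the RFC 6325 wire format; the field names and default hop count are simplifications:

```python
# Illustrative sketch of TRILL's encapsulate-and-route model (not the real
# RFC 6325 wire encoding): the ingress RBridge wraps the original Ethernet
# frame in a header carrying RBridge nicknames and a hop count, and transit
# RBridges route on the egress nickname, decrementing the hop count like an
# IP TTL so that transient loops age out instead of melting the core.
from dataclasses import dataclass
from typing import Optional

@dataclass
class EthernetFrame:
    src_mac: str
    dst_mac: str
    payload: bytes

@dataclass
class TrillFrame:
    ingress_nickname: int   # RBridge where the frame entered the fabric
    egress_nickname: int    # RBridge that will decapsulate it
    hop_count: int          # decremented at each transit RBridge
    inner: EthernetFrame    # the original frame, carried untouched

def encapsulate(frame: EthernetFrame, ingress: int, egress: int,
                hop_count: int = 16) -> TrillFrame:
    return TrillFrame(ingress, egress, hop_count, frame)

def transit_hop(t: TrillFrame) -> Optional[TrillFrame]:
    """One transit RBridge: route toward the egress nickname, age out loops."""
    if t.hop_count <= 1:
        return None  # loop protection: frame discarded
    return TrillFrame(t.ingress_nickname, t.egress_nickname,
                      t.hop_count - 1, t.inner)

frame = EthernetFrame("00:11:22:33:44:55", "66:77:88:99:aa:bb", b"data")
t = transit_hop(encapsulate(frame, ingress=101, egress=202))
print(t.hop_count, t.inner.payload)
```

The key design point is that the core forwards on small, routed nickname tables rather than on flat MAC tables, which is what makes equal-cost multipath possible at layer 2.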
For example, the minute I see a light blink out, or the minute I see something going on here, I start to register fabric events. That is a huge piece of keeping something like this working in the coherent fashion you talked about; if this were just a TRILL network, distributed multi-chassis LAGs wouldn't work. Exactly. There's a piece of, I guess, special sauce that goes between these two that allows that to take place. So a TRILL network is not the same as a fabric; there are a lot of other protocols in play. In our case, the minute the fabric is separated, both fabrics are going to say: something just happened here; what did I lose, and how do I adjust myself appropriately? It's based on something different, and that's the special sauce. It's basically built on the higher levels of those scary Fibre Channel-ish protocols that deal with fabrics as wholes, as opposed to per-node forwarding services. Okay, now we're in the Fibre Channel world. So if the fabric splits, do I get a fabric reset on both ends? If you're running FCoE, yes. Then what happens to my trunks? Will a fabric reset cause an LACP reset? If I have separate links going like this, then the answer is they'd better, because otherwise it wouldn't be compliant. That's right. Okay, so if I have a split brain in the fabric, all my port channels will also go down, and they will be re-established in whatever way is still consistent with the state of the fabric. That's exactly right. Okay, I think we might have covered this topic. All right, I think so. So that was a pretty rapid, pretty detailed discussion of multipathing versus MLAG. If you'd like more information, I suggest you go to the Brocade website; there's a great white paper there, the VCS technical brief, which I've read myself and got a lot of good information from. I'd like to thank our panelists, especially Chip, for joining
us for this. Remember that there are four sessions in this series: session 1, the overview; session 2, where we talk about converged storage; this session, multi-chassis versus multipathing, with TRILL in Brocade's VCS being the multipathing technology; and session 4, where we'll be talking about virtualized networking, hard cores and soft edges. I look forward to seeing you in those presentations, and thanks for joining us.
Info
Channel: Stephen Foskett
Views: 10,524
Rating: 5 out of 5
Keywords: Tech Field Day, Virtual Symposium, Packet Pushers, Brocade, Chip Copper, Greg Ferro, Josh O'Brien, Tony Bourke, Ivan Pepelnjak, TRILL, link aggregation, Ethernet
Id: 46aSB9rHpuQ
Length: 49min 15sec (2955 seconds)
Published: Tue May 08 2012