Arista Merchant Silicon for Service Providers

Captions
Thank you very much. I'm excited to be here to talk to you all about a topic that is near and dear to me, and has been near and dear to Arista since the company was started. My name is Hugh Holbrook. I'm the vice president of software engineering, and I've been at Arista since the company was in Ken's living room. Merchant silicon and software have been very deep to the company, and deep to me, for a long time, so I'm excited to talk to you all about this.

We have not too many topics. I'm going to talk about our portfolio, a little bit about merchant silicon, and then how that intersects with service provider networks.

This picture is a little graphic of the different families of products that Arista has, for people who don't know Arista's portfolio all that well. We have small products and large products, built on different families of silicon. There are chassis and fixed-config devices, all the way from 21 RU down to 1 RU, and from 1 gig up to 400 gig. Some of these products are high-function with deep buffers; some of them are lean and mean and fast. We'll talk more about this. All of these actually end up being applicable to our service provider customers. They have lots of use cases and they use everything, as we'll talk about.

I want to throw this picture up to set some context. This is a scatter plot of the merchant silicon that we've used at Arista over the last 10 years: all the merchant silicon flavors that we have used in our products. As you can see, in the last 10 years we've developed new products based on at least five different silicon families from Broadcom, Intel, and Marvell. Each color corresponds to a single chip family, and if you plot out the colors you can see different trajectories: they end up with different curves.
This is bandwidth plotted against time, looking at just one dimension. One thing you can see here is that different chip families make different design decisions and go faster or slower, and those decisions can be fairly significant. Between one family and another at the same point in time, you can have a 2x difference in what the silicon is delivering. Similarly, at the same performance point, one family can be 12 to 24 months behind another. They all eventually catch up, and every family is going to get to 3.2 and 12.8 terabits, but at different times, because of the different trade-offs they're making. I'll talk more about that and how it's relevant to our service provider customers.

This is our routing portfolio. It's based on one of the families I talked about, the gray dots (I didn't draw the line in there): the Jericho family, the DNX silicon. We've been using this silicon since 2008, I think, or really 2006, when we started working with this team. It's really great silicon, with deep buffers, large tables, and advanced traffic management. We've invested a lot in protocols. I'm not going to go through the alphabet soup of acronyms here, but it's a significant portfolio of features. The classic service provider routing use cases are based on these systems, and we've got chassis and fixed configs in the 7800, 7500, and 7280 lines. All of this is one EOS image that ships, not just on this family but across all the families of Arista products. As a result, we get consistent features within the family and across EOS. That holds from the lowest speed up to the highest speed, and from the smallest config up to the largest config, and that's really helpful for our service provider customers that
have a lot of use cases.

Okay, so that sets the stage on Arista and our products. Now I'm going to talk a little bit about merchant silicon. You probably know this, but I'll set out what I mean by merchant silicon, and it's pretty simple. It's chips, in all cases I think ASICs, that are sold on the open market. They're not tied to the specific vendor that's selling the product, and they're not proprietary to that vendor, so with merchant silicon you aren't locked in to a specific vendor. So this is Broadcom, Intel, and Marvell. There are also lots of startups, many of which have since been acquired by Broadcom, Intel, and Marvell, but there are more coming all the time. There's a tremendous amount of competition and innovation in this space.

One point I hope is clear, but that I want to belabor a little bit, is that all chip makers have access to the same technology. Proprietary silicon, custom silicon, and merchant silicon all use the same fabs and the same processes: TSMC, Intel, and Samsung. They use the same memories, TCAM, and SerDes. As a result, you can achieve roughly the same clock rates in any given process technology. So where do the differences come from, given that everybody's playing in the same sandbox? They arise primarily from design trade-offs, made to focus on or optimize for different use cases, and from process shifts. Process shifts are a huge thing. If you read the trade press, you know about TSMC advancing, that the M1 chips were built on TSMC's five-nanometer process, and that they're talking about three nanometer and two nanometer. But the process shifts going from 28 to 16 to 7 to 5, which is kind of where the networking industry
is at, bring huge savings. This is the feature size of the transistors, so you can pack many more transistors into a fixed die size. Going from 28 nanometers to 16 nanometers, the same thing uses only a quarter of the space. Seven nanometers uses half of that again, and five nanometers uses about half of that again. So there are tremendous savings as you go from one process technology to the next.

The other thing, which is not to be understated, is the importance of execution. Executing is really important, and it's how different merchant silicon vendors succeed or fail and differentiate themselves: how well they execute, how fast they can get their products out, and with what success. So there are execution differences. But the point is that there's really no fundamental advantage; there's no special magic that happens with custom, proprietary silicon.

The fundamental limit, as you know, is that silicon size is bounded. The bound is the reticle size, plus the feasible die size given defect rates and yields in silicon manufacturing, and that prevents chips from getting too big. So there's a practical, physical limit that, again, everybody plays with. Given that physical limit, and given that everybody's got the same tools, choices are really driven by trade-offs, as I said. When you're making merchant silicon, you have choices about how to spend the finite die area you've got. You can allocate more of that space to ports (do I add more ports?) or to lookup tables (do I get larger lookup tables, more counters, more rewrite capabilities?). Or you can spend it on packet buffer on chip (do I need 32 megabytes, 64 megabytes, or 128 megabytes in my chip?). And then there's parallelism: how much parallelism I have in the chip also determines the speed, like my
packet rate. How much logic I have factors into how programmable the chip is, since programmability requires extra logic. And then I have two other levers that I can pull on. One is die size: I can make the chip bigger, but of course that adds cost. The other is power: more pipelines can go faster, but they require more power. Different devices make different trade-offs in this multi-dimensional space. The industry is highly competitive here, with a tremendous amount of innovation, which makes it really cool to work in this space and talk to the startups and the companies about what they're doing. This is true even within one vendor: the different families within one vendor compete with each other. At Arista, we evaluate all of the merchant silicon that's out there, at all times. We're constantly talking to people about what they've got and what they've got coming, in order to build the best products across this range of choices.

Okay, so that's merchant silicon. Now I want to talk a little bit about service provider networks, and how merchant silicon intersects with some trends in them. This is what I think is happening, from talking to service providers.

A first fact that I think is undeniable is that service provider routing has evolved. There is, and was, a very important legacy of service provider routing: 30 years of protocols and services, and hundreds or thousands of Internet RFCs describing functionality. But a new world is evolving, where simple, fast, reliable service is increasingly important. Here are a couple of examples, which I think Sean talked about. In the commercial space, service providers used to deliver a lot of VPNs, high-touch services, tunnels, and overlay networks. Much of that is getting replaced by SD-WAN and activity happening at
the edge. In commercial services, some fraction of that business is moving to customers who just say: give me a fast internet connection, that's what I want, and I'll build my own reliability on top of it. A similar thing, I think, is happening in residential service, which again is very competitive. The demand for over-the-top video from Netflix, Hulu, and YouTube is just insatiable. So to compete in residential service, what can I do? Well, I can drive up the bandwidth, and I can make it really reliable and cheap to operate, and that's simple and fast. We see those same trends at the protocol level, not just in the data plane. Sean talked about this, and Alex will talk more about it, but EVPN and segment routing have become tools that let certain service providers simplify their networks. It's not about slicing and dicing the bandwidth anymore; it's about providing reliable, high-speed bandwidth. We still need to support the legacy where it's necessary, and it is important. I don't want to dismiss the fact that there are lots of protocols in the brownfield and service providers need them. But they also need to deliver the high-bandwidth, simpler services.

So a first trend that comes out of this is, I think, clear: going simple and fast when you can is cheaper, more reliable, and better. Again, when you can. The strategy is to simplify, improve your reliability, reduce your opex, and enable high bandwidth. And where you can't simplify the underlying technology, you can simplify by automating. That makes the unavoidable complexity you're stuck with simpler and easier to manage, along with the diversity of products and connectivity technologies you've got. Okay, so that's my first
kind of trend. The second trend, or fact, that I think is again undeniable is that service providers have a lot of use cases. I was at a conference talking to one of the people who runs one of the largest U.S. service providers, and it was very interesting. He impressed upon everyone that they have everything: every use case, and every router that anyone makes, somewhere in their network. That's because they've got all these different use cases:

- metro aggregation and business services
- core routers that need high speed and traffic engineering
- peering routers that need lots of table scale and lots of peers
- residential aggregation, which really just needs high bandwidth
- data centers with data center technology, possibly with service chaining layered on top, and VXLAN and MPLS running as part of that solution
- data center interconnect

They're also interested in innovating in the optics space with ZR, VXLAN, and EVPN. There's a ton of different use cases. Service providers might have all of the use cases that Arista sees, which is kind of amazing, and they're interesting to talk to. So a second trend, I think, again, if you map that onto merchant silicon, I think...

Audience: Excuse me, regarding your last slide: do you only have solutions for backhaul, or do you also have something for fronthaul?

Holbrook: I think we have solutions. What you need for fronthaul and backhaul is very specific to a particular solution, and I think we have solutions that are applicable to all of that. So I don't know if there is one solution for fronthaul or one solution for backhaul. You have to look at a specific carrier, a specific service provider, but I think we have solutions in those spaces.

Audience: Okay, okay. Thank you.

Holbrook: Sure. So I think that, mapping this onto merchant silicon, the different
use cases benefit from different silicon. That's not surprising, but it's true. The forward-thinking service provider customers in this space really want to use the best silicon for the role. You bring your Leatherman when you don't care about the weight and the extra expense, and your Opinel when you're going backpacking. You've got the machete when you just need to hack through bandwidth, when you need a lot of traffic sent fast. Different silicon has different advantages. At the same time, and maybe this is obvious, there really is no ideal silicon. You can build extremely flexible silicon: flexible packet processing with arbitrary rewrites and unlimited lookups, trading off buffering against tables, packets, and rewrites. But it's not free. I don't bring this thing when I'm going backpacking, because it weighs a ton. That's not to say flexible silicon is bad; flexibility is great when and where you need it. But you don't want to be tied into something. This is another trend: service providers, and customers in general, are wary of depending on something only one silicon can do. Flexibility of that kind locks them into a single silicon or a single vendor. It's not that there are no use cases where you need that, but there are trade-offs, and this extreme flexibility is not free.

So, stating what's perhaps obvious from what I've said: merchant silicon lets our customers choose trade-offs depending on the use case, whether that's your core, your edge, your data center, your DCI routing, and so on. Different families have different attributes. I'm not going to
attempt to walk you through this graph of silicon families from different vendors. It's my subjective assessment of their strengths, with the grades I would give them, Consumer Reports style. The point I want to impress is that there are different trade-offs, and no one best silicon for every use case. But there are real advantages to using the best silicon for a particular use case.

Audience: Can I interrupt you for a question real quick? I think your chart outlines a question that I was going to ask you, and you were getting into it, so I didn't want to interrupt. I work a lot in the disaggregated space, with disaggregated operating systems. The ODMs use all of these chips, in different roles in the service providers, as you've outlined. You've already stated it very clearly, so I may be overstating the obvious. But we get into pretty epic debates about commodity or merchant silicon versus proprietary silicon, especially in the service provider space. Historically, we haven't had deep buffers if you want to do 100 gig or 400 gig long haul. The silicon was designed for the data center, and we haven't had that until maybe the last two, three, four years. So how do you see that changing? Obviously Arista is leveraging merchant silicon and doubling down on it, and we use it a lot as well. I see a very bright future for service providers on merchant silicon, and custom silicon maybe not going away, but certainly shrinking. What's your take on the future of that relationship?

Holbrook: I think your insight is right on; it's a very insightful comment, and I think it's all true. I agree
with what you said. I think there was a shift in service providers' view of silicon, really with the Jericho generation. There was a little of it in 2013 with Arad, but it was really 2016-ish, with the Jericho silicon. Jericho had deep buffers, which service providers had historically needed for some of their use cases. It could also hold the full internet routing table for v4 and v6, with lots of room to spare. It was the full internet routing table plus the deep buffers that opened a lot of those customers' eyes. Oh, I should be considering this. And wow, the pricing on this is fantastic; I've really got to be looking at this. I think it opened up their thinking along the lines of: well, maybe I don't need... And in 2013 or 2016 we didn't have every feature under the sun from the list I threw up before. That breadth of portfolio was really limited to the old-school router companies. But still, the service providers, content providers, and others buying routers saw how compelling the economics were. I've got to be using this. I can make some compromises, change my network, simplify, and then use merchant silicon. Simplifying also allows them to go faster, which allows further simplification. Actually, speed simplifies. So yeah, no, I would agree with that.

So that's kind of the trend. And of course what's happened to us is that we've gotten pulled, little by little, into more and more service provider use cases. Now we have a pretty full portfolio of that brownfield, legacy functionality in Arista products, which is now part of EOS. It's one operating system, so it actually applies not just to that one family
of merchant silicon, although there's some component that does. So we've really focused on the Jericho silicon for those use cases.

Audience: Yeah, that makes sense. I'm excited to see Marvell in your portfolio as well. I think Marvell's got a great play in building some of the inexpensive SP applications you need to push out; they've got a good angle with their chipset. I don't see it in as many vendors, so I'm glad to see Marvell in there.

Holbrook: Well, to be honest, we're not actually shipping products based on the Innovium silicon. We have used XPliant, and we ship products based on XPliant. This chart is exploring the space. But we're evaluating everybody all the time and trying to make sure that we're considering everything. And we talk to our customers. The service provider customers and the cloud customers are very aware of what's going on in the merchant silicon space, and they tell us what they want. We're kind of coin-operated at that level: okay, you want that, we'll build it.

Another thing merchant silicon enables is scale. The merchant silicon vendors' global reach, investment, and savings across families let them build right-sized products even within a family. Right now we're shipping five flavors of Jericho and four generations of Tomahawk, just to talk about two families. The variants are optimized for port speeds, amount of memory, the number of SerDes, and whether encryption is on chip. This gives us consistent products within a family, right-sized and optimized for cost and power for our customers. That's another important advantage of merchant silicon.

One thing that I want to talk about is our operating system. As you probably know, we have one binary that runs across all of our products, and it's really key to
our strategy. It results in a better outcome for our customers and for Arista. Having one operating system image really enhances our testing efficiency, because all the testing that we do applies to all of the products. It's not that there are no drivers; we have drivers, just as Linux does. In Linux you might have an Intel NIC or a Mellanox NIC or whatever, but all the testing of Linux outside the drivers applies across all of the products. That's true for us too. The single operating system gives us faster time to market and more stable software, which is super important. So I want to talk about...

Audience: A question before you move on. Speaking of software QA, I know that EOS has a container-based image, but I believe it's behind a registration wall. Are there any thoughts of opening that up so it's more freely accessible? Other vendors have started to open up their demo images and things like that, so they're a lot easier to get to. I'd be curious if you had any thoughts on that.

Holbrook: It is useful; I have no doubt about that. I don't know if we've talked about licensing it at lower scale, or licensing it in some other way, but that could be useful. It is an important part of testing, and our customers do use it for testing.

Let me go on and talk about the software that comes with merchant silicon. This is the way the merchant silicon vendors envisioned things, though I think we've changed their view a little. You buy their chip and you get a prefab SDK, reference hardware designs if you want them, and their microcode: basically everything in a package. Then you deal with some interfaces, like please load this route, please
send this route to this next hop, please create this tunnel. It's kind of a black box, with some basic debugging inside the SDK shell. It isn't easy to optimize, because it's a bunch of code you didn't write, and the result is kind of vanilla products. Not to throw shade at our partners, but we found that it didn't scale that well. The control plane performance wasn't that great, it was hard to debug, and honestly it wasn't that reliable. We learned this by shipping a couple of years of merchant silicon products early in the company's life. To get to internet scale, we needed to rethink our software stack. So starting around 2010, with the Arad generation, which was going to have way more routes in it, we decided to do this differently. We went to our partners and got all the information we could about the chips. We leveraged our expertise to build an optimized pipeline and a software stack that did as much of the SDK's work as we could. We lifted that work out of the SDK into shared libraries used across all of our products, which is super valuable to us. We get better testing and better quality, and we move faster. Take the library that programs the routing table on Tomahawk, on Trident, and on XPliant. The registers and the table details differ, but the optimizations, the EOS interaction, the routing table data movement, and restart handling can all be common code. We can put way more resources on that code if we lift it out of the chip-specific teams, which are much smaller, into shared EOS infrastructure. That way we get better testing, better optimization, more people we're able
to dedicate to working on it, and a better result. It also helps us ship the silicon on the first day it's ready. Because of that up-and-to-the-right curve, getting to new silicon quickly is super important.

This is my final slide: merchant silicon and Arista, our strategy in a nutshell. The underlying fact is that Moore's law is charging along. TSMC has announced its two-nanometer process for 2025. Merchant silicon is about three years behind TSMC's cutting edge, so we have a roadmap out to 2028 or so. Moving to new chips brings huge benefits in power, functionality, cost, and density. We want broad silicon choices, with simple and fast options that benefit service providers. Our strategy is to optimize the software for time to market. It has to be ready as soon as the silicon is, with the functionality service providers need. And always number one at Arista, as I'm sure you know, is quality. We have to deliver that functionality as quickly as possible, with the quality our customers have come to expect. They depend on it to operate their networks and their businesses.
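The two numbers quoted early in the talk describe the same gap from two directions. At the same point in time, one family can deliver 2x the bandwidth of another. At the same performance point, it can be 12 to 24 months ahead. A minimal worked relation links them. It assumes, for illustration only, that both families grow bandwidth exponentially with the same doubling time D; the speaker does not state this. Let r be the ratio of family A's bandwidth to family B's at time t. Let Delta be how long family B takes to reach A's current bandwidth.

```latex
B_B(t) = B_{B,0}\, 2^{t/D}, \qquad
B_B(t+\Delta) = 2^{\Delta/D}\, B_B(t) = r\, B_B(t) = B_A(t)
\;\Longrightarrow\; \Delta = D \log_2 r
```

With r = 2 the lag equals one doubling time, Delta = D. So a 2x gap paired with a 12-to-24-month lag fits doubling times of roughly 12 to 24 months, under this assumption.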
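The talk gives rounded, per-step area factors for process shrinks. At 16 nm a design takes a quarter of its 28 nm area, 7 nm takes half of that again, and 5 nm about half again. The short sketch below chains those factors to show the cumulative effect. It assumes the steps compound multiplicatively for a fixed design. Modern node names are largely marketing labels, and real density gains vary by foundry and by circuit type. Treat these figures as the talk's approximations, not foundry data. `SHRINKS` and `relative_area` are illustrative names.

```python
from fractions import Fraction

# Per-step area factors as quoted in the talk (rounded approximations, not foundry data).
SHRINKS = [
    ("16nm", Fraction(1, 4)),  # 28 -> 16 nm: a quarter of the space
    ("7nm", Fraction(1, 2)),   # 16 -> 7 nm: half of that again
    ("5nm", Fraction(1, 2)),   # 7 -> 5 nm: about half of that again
]


def relative_area(shrinks):
    """Return {node: area of a fixed design relative to its 28 nm area}."""
    area = Fraction(1)
    result = {}
    for node, factor in shrinks:
        area *= factor  # assumption: per-step factors compound multiplicatively
        result[node] = area
    return result


for node, area in relative_area(SHRINKS).items():
    # A fixed die area then holds 1/area times as many transistors as at 28 nm.
    print(f"{node}: {area} of the 28nm area, {1 / area}x the transistors per die")
```

Chained together, the quoted factors put a 5 nm version of a design at about 1/16 of its 28 nm area. That leaves room within a reticle-bounded die for the extra ports, tables, and buffers discussed in the talk.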
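The talk describes lifting route programming out of each vendor SDK into a shared library, leaving only register-level details chip-specific. The sketch below illustrates that layering under stated assumptions. `RouteBackend`, `DictBackend`, and `RouteProgrammer` are hypothetical names, and the backends are in-memory stand-ins rather than real chip drivers. This is not Arista's code or any vendor SDK API. The shared layer owns prefix normalization and restart reconciliation; a backend only writes, erases, and reads entries.

```python
from abc import ABC, abstractmethod
from ipaddress import ip_network


class RouteBackend(ABC):
    """Chip-specific layer (hypothetical): knows only how to touch hardware entries."""

    @abstractmethod
    def write(self, prefix: str, nexthop: str) -> None: ...

    @abstractmethod
    def erase(self, prefix: str) -> None: ...

    @abstractmethod
    def read_all(self) -> dict[str, str]:
        """Return the {prefix: nexthop} entries currently programmed."""


class DictBackend(RouteBackend):
    """In-memory stand-in for one chip family; a real driver would encode registers."""

    def __init__(self, name: str):
        self.name = name
        self.table: dict[str, str] = {}
        self.writes = 0  # hardware operations issued, to show how much work sync does

    def write(self, prefix, nexthop):
        self.table[prefix] = nexthop
        self.writes += 1

    def erase(self, prefix):
        del self.table[prefix]
        self.writes += 1

    def read_all(self):
        return dict(self.table)


class RouteProgrammer:
    """Chip-independent logic shared by every backend."""

    def __init__(self, backend: RouteBackend):
        self.backend = backend
        self.desired: dict[str, str] = {}

    def set_routes(self, routes: dict[str, str]) -> None:
        # Normalize prefixes once, in shared code, so every backend sees the same keys.
        self.desired = {str(ip_network(p)): nh for p, nh in routes.items()}
        self.sync()

    def sync(self) -> None:
        # Reconcile hardware with the desired table. After an agent restart this
        # touches only entries that differ instead of flushing and rewriting all.
        current = self.backend.read_all()
        for prefix in current.keys() - self.desired.keys():
            self.backend.erase(prefix)
        for prefix, nexthop in self.desired.items():
            if current.get(prefix) != nexthop:
                self.backend.write(prefix, nexthop)


if __name__ == "__main__":
    routes = {"10.0.0.0/8": "192.0.2.1", "2001:db8::/32": "2001:db8::1"}
    backend = DictBackend("family-a")
    RouteProgrammer(backend).set_routes(routes)  # initial programming: 2 writes
    # Simulated restart: a fresh programmer, same hardware state, one changed route.
    RouteProgrammer(backend).set_routes({**routes, "10.0.0.0/8": "192.0.2.2"})
    print(backend.writes)  # 3
```

`sync()` diffs the desired table against what the hardware already holds. A restarted agent therefore rewrites only the routes that changed, and every chip backend inherits that behavior without chip-specific code.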
Info
Channel: Tech Field Day
Views: 121
Keywords: Tech Field Day, Gestalt IT
Id: DQShDHex4Xw
Length: 26min 59sec (1619 seconds)
Published: Sun Dec 12 2021