Hitting the Nail on the Head: Interdisciplinary Research in Computer Networking

Captions
I'm super excited to introduce Jennifer Rexford. Jenn, [LAUGH] as she allowed me to call her. So she's the winner of the ACM Athena Lecturer Award, which is kind of the Turing Award for women. So this is a really, really big deal. But of course, this is not the only award that she has won. She won the ACM Grace Murray Hopper Award in 2005. She's an ACM fellow, she is a fellow of the American Academy of Arts and Sciences and a member of the National Academy of Engineering. So she got her PhD from the University of Michigan, as it turns out in the same year that I got my PhD, so I'm not gonna comment on that. [LAUGH] >> [LAUGH] >> To protect my own privacy. >> [LAUGH] >> And then she went to industry to work at AT&T and Bell Labs. And Albert is here, and so they go back. But of course, she has a number of other Microsoft relationships, so her students are always welcome, and they do come, and we're very grateful for that. This is not her first talk. She gave a summer school on networking in India. And as I just heard, she visited Victor's group nine years ago, right? >> Yes. >> Now, after AT&T, she's on the faculty at Princeton, and in her work she's kind of a pioneer of software-defined networks. And again, she's a co-inventor of OpenFlow, and I have a personal history with that, because when I was at ETH, we bought a huge pile of Arista switches. And the big defining factor was that they were OpenFlow and that we could program them. Before that, she also worked on BGP, which I also at some point taught at ETH. [LAUGH] >> [LAUGH] >> And one of the most interesting things, as I was googling and searching around, she has a page with all the advice that she gives to PhD students and so on. And there was also one of her PowerPoint decks which was interesting for me, which was industry versus academia. And this is my last thing. You went from industry to academia and the PowerPoint explains why. I went from academia to industry. So we'll cross our stories later today. >> We'll call it even. >> Yeah, okay [LAUGH]. So thanks for coming and yeah, and please, we are all- >> Thanks so much for the introduction, and it's great to be among so many old friends. So what I wanna do is give a talk that I gave at SIGCOMM, our main networking conference, a few months ago. And it's about interdisciplinary research in computer networking. And what I wanna do is give you a little bit of background of how I came into this style of research, formed a lot by my experiences at AT&T, walk through a few example projects that I've worked on since going to Princeton, and then end with some sort of advice for people, particularly young people that are considering doing interdisciplinary work, both about the joys and some of the risks of doing that style of research, particularly from an academic perspective. So just to give a little context, I think all of us are living in a really exciting time in computer networking. Really, the Internet is an invention that escaped from the lab during our lifetimes to become a really global communications infrastructure that's increasingly critical for a huge number of applications that affect our daily lives. And it's not only a critical infrastructure, it's one with near constant innovation.
Innovation in the applications and services that run on top, in the kinds of machines that connect to it at the edge, and the kind of physical infrastructure that the network itself runs over. So, society critically depends on it, and it's in a constant state of churn. And so a really simple grand challenge that we really all, I think, need to think about, not just as networking people but as computer scientists probably, is that we need to build computer networks that are worthy of the trust that society increasingly places in them. And that, to just unpack it a little bit, means they need to be performant. They need to be reliable, secure, protect user privacy, be fair, efficient, enable the different autonomous entities on the Internet to work somewhat independently, and be cost effective. And we even need to reason about the inherent tensions between all of these goals. And so, I would argue, to do that, we really need to build much stronger intellectual foundations for how we design and operate networks, so that we can build better real world networks. And also, in particular, build ones that are robust to the constant innovation taking place above, next to, and below them. And the main point I wanna make in this talk is that networking researchers really can't do this alone, because the intellectual challenges are so significant and the societal implications so broad. And that's why, really, an interdisciplinary approach to solving these problems is so critical. And so what I wanna talk about is bringing together these sort of big, hairy, audacious research problems that arise in computer networking with effective solution techniques that borrow from other parts of computer science and other areas of engineering and mathematics. And this sort of style of research was something that was born out of my experiences in the nine years I spent in Albert's group at AT&T, where the group we were in really connected the people within the research organization doing work in algorithms, optimization, and statistics with the people that were really holding AT&T's commercial backbone network together, to be able to take the problems that they were facing in their day-to-day lives running the network, and to lift those problems out of the sort of gory domain details that networking people would be comfortable with, into problems that can really be attacked in a rigorous way by other disciplines. And so in the work in Albert's lab, the work started before I joined, but I think the premise of a lot of the work was that the network wasn't designed to be managed, right? Internet protocols are really not designed with a network operator in mind. If anything, network management and commercial interests were anathema to the early designers of the Internet. And so the idea that Albert and his group really worked on was the idea that we could bolt on network management on top by having a sort of control loop for measuring the network to know the offered traffic, the topology, anomalies taking place in the network, and so on. And feed that into systems that model, analyze, optimize, and so on. And use those results to go back and actually effect change in the network, to block attacks, to re-route traffic, to plan for the future, by following this control loop. So there was a lot of work I was involved in in the nine years I was at AT&T.
And a lot of it was able to be both academically interesting and deployed in practice, because it was sort of close to those domain details, and yet abstracted a bit from them, enough that we could solve the problems in more rigorous ways. So at the time that I left AT&T, the one frustration I had with this style of work was that, for good reason, we were really focusing on taking the existing network equipment as a given. This was pragmatic. We didn't really have a choice. Even AT&T, a major purchaser of network equipment, didn't get to control the code that ran inside the box, let alone the hardware inside. And so pragmatically, it was really difficult to do much of anything if you wanted to do something that required changing the equipment. But when I went to Princeton, I was interested in taking a step back and revisiting this control loop and the problems associated with it, assuming I could change the network. And there were already at the time starting to be signs of an interest in making networks more programmable, including work that Albert, Dave Maltz, myself, and others worked on at AT&T that was a part of this sort of great movement towards software-defined networking over the last decade. But more generally, I also just wanted to engage with a broader set of colleagues in different disciplines on problems that I hadn't been able to tackle the way I wanted, because of having an arm held behind my back by the legacy equipment. So what I wanted to talk about are three example projects that I've worked on since coming to Princeton. They're kind of part of a stack of protocols, if you will, for running the network. The first looking at higher-level policies, making it easier for network administrators to describe what they want the network to do, and to synthesize the configuration of the network from it. One level lower, to allow the distributed collection of routers in the network to figure out what is the right way to send traffic by solving together a single central optimization problem. And then finally, getting the data out of the underlying devices that we need to make sound decisions about the network. And I'll describe these in chronological order. So actually the middle project took place first; I'm going to talk about that one first. But in each of these, what I'm going to do is talk about the problem we looked at, kind of a brief synopsis of the research results, and a little bit of the human side of the lesson, how the collaboration worked. So in each of these projects there was a set of other people involved in programming languages, in optimization theory, or in theoretical computer science in the area of compact data structures. And I want to say a little bit about how those collaborations happened, and what made them possible. And at the end I'll talk a little bit more in anonymized terms about less successful interdisciplinary projects I've been involved in, and what went wrong in those projects. But just to start at the middle, a lot of the work that we did at AT&T focused on optimization. Right, we're trying to optimize the resources inside the network to the prevailing traffic. And I became very interested in whether, if we were to start with optimization in mind from the beginning, if we weren't forced to solve the optimization problem the network itself induced, could we actually design the network to be inherently easier to optimize? And so I engaged with a colleague of mine then at Princeton, Mung Chiang, who is an expert in network optimization.
And he had been working, along with a number of other people, on the idea that protocols are distributed optimizers. That protocols are themselves a distributed algorithm for solving an optimization problem, and that we can turn the crank the other way: we can start with an optimization problem and generate the protocol that corresponds to it. So the jumping-off point was the projects that I had been working on at AT&T just before joining, where we were really thinking, from the network operator's perspective, how to optimize the flow of traffic inside the network. There we couldn't change the end hosts, we couldn't change the routers. The end hosts are sending traffic in whatever way they're gonna send. The routers are computing shortest paths, and the network operator can only observe the offered traffic and tweak the routing protocols so that the routers, in doing their own computation of shortest paths, might compute sensible paths through the network that balance load. But the more general question is, how much traffic should be sent and over what path? And what is the right division of labor between the senders, who are adapting their sending rates, the routers that are computing the paths, and the network operator that's trying to minimize congestion? The way that this was being done at the time was essentially without any cooperation between these three parties. End hosts, using TCP, the Transmission Control Protocol, would adapt their sending rates based on observations of network congestion. When your packets get through quickly without getting lost, send more. When they get lost or delayed, send less in order to avoid overloading the network. And the network in turn, with the operator's help, would be monitoring the overall network-wide traffic, and reoptimizing the routing to direct traffic over less congested paths. And so there's a bit of a control loop happening here if these two systems operate independently. Congestion control is changing the traffic in response to observed congestion. Traffic engineering is changing the routes in response to observed congestion. And they're each optimizing a slightly different thing, and so several things go wrong here. There's often a time scale separation, where traffic engineering was taking place on the time scale of hours or days. So there was often slow adaptation that decoupled these two control loops, and also sort of a one-size-fits-all view, where the network administrator would be trying to, in general, just minimize congestion, rather than realizing that different applications over the network might have different notions of utility for what they want the network to do. So we took inspiration from a body of work that came out of work from Frank Kelly and Steven Low and others, looking at protocols as distributed optimizers. And so I'm gonna walk you really briefly through the basic idea. The work focused a lot on TCP itself. So the idea is that the Transmission Control Protocol is solving an optimization problem, where traffic is being sent over a route at a particular rate from a particular sender. And the idea is that the sender wants to send traffic at a higher rate to get higher utility, but with diminishing returns. So as the sending rate is higher and is delivered over the network, the user's happier and happier. But they're the most happy when they go from getting nothing to getting a little bit more, and gradually there's diminishing returns as they get higher and higher bandwidth. This is sort of a concave curve of utility.
And the beautiful work that Steven and Frank Kelly and others did was to recognize that what TCP is doing, even though the early designers of TCP didn't realize it, was solving a utility maximization problem, where you're essentially trying to maximize the total utility over all senders, subject to constraints on the capacities of the links. In other words, if we have two flows, one and two, they're gonna have to share a link l that has some capacity c_l. And they should do so in a way where they maximize the total utility across the two users, giving the bandwidth to whoever needs it most, relative to the utility. >> What year was this work? >> This would have been back, let's see, around the early 2000s. >> Yeah, cuz in 99, 2000, I chaired an ISAT study looking at future models of the Internet, and Scott and the whole gang were saying, don't touch and don't think about TCP/IP. We don't know how it works; if you try to do coordinated control, you're going to mess things up. So let's just go with the lightweight approach, and that was kind of the theme back then: stay away, hands off, observe at a distance, don't optimize, just go away. And I guess there's been a change in there. >> Yeah, yeah. I think in the beginning it changed by way of reverse engineering. It was like, hey, we have this sort of cottage industry of various versions of TCP, all named after geographic locations. TCP Madison, etc. All of these were eventually reverse-engineered to fall in this framework. It's actually a really beautiful body of work, and they all differ in exactly what shape that utility function has. And there's a really beautiful connection to the notion of alpha-fair utility functions in network economics. And all the different variants of TCP that are prominent have been translated into what number alpha best describes the utility function. It's actually quite lovely. And I don't think it's a coincidence. The hackers, if you will, that got TCP working without the benefit of this theory were doing something based on intuition that kinda mimics it. But it's really lovely that there's sort of a collapsing of the literature on TCP into this nice framework. So at the time I joined Princeton, a number of people were starting to turn the crank the other way, right? Can we start with this definition of utility functions and generate new versions of TCP? And Steven Low generated one called FAST that gave a TCP for really long-distance networks where you have very slow feedback coming back from the network and so on. So the jumping-off point for me and Mung was to think, well, what if we wanted to control not only the sending rate but also the path? So we wanted to solve the entire traffic engineering problem in a unified way. So what we started with was very much the same idea. The only thing is we're given an extra degree of freedom: instead of controlling the sending rate on a given path, we have the ability to control sending rates on multiple paths on behalf of a single sender. So we still want to maximize the same notion of utility, but the routing itself is a variable. So there might be, let's say, two paths through the network for this sender, and I have the option to set the rates on both of these paths so that the total utility is defined by the sum of them. And I wanna, again, be able to arbitrate who gets access to which paths, and at what rate, collectively.
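To make the problem being described here concrete, below is a sketch of the utility-maximization formulation in the spirit of the Kelly/Low line of work; the notation (rates x, routes r(i), path sets P_i, capacities c_l) and the specific alpha-fair form are my own shorthand, not the slides from the talk.

```latex
% Single-path network utility maximization (NUM): each sender i has rate x_i,
% a fixed route r(i), and a concave (diminishing-returns) utility U_i;
% each link l has capacity c_l.
\max_{x \ge 0} \; \sum_i U_i(x_i)
\quad \text{s.t.} \quad \sum_{i :\, l \in r(i)} x_i \;\le\; c_l \quad \forall\, l

% The alpha-fair utility family mentioned above (log x as alpha -> 1):
U_\alpha(x) \;=\; \frac{x^{1-\alpha}}{1-\alpha} \quad (\alpha \neq 1),
\qquad U_1(x) \;=\; \log x

% Multipath version, where the split of sender i's traffic over its
% candidate path set P_i becomes a decision variable as well:
\max_{x \ge 0} \; \sum_i U_i\!\Big(\sum_{p \in P_i} x_{i,p}\Big)
\quad \text{s.t.} \quad \sum_i \sum_{p \in P_i :\, l \in p} x_{i,p} \;\le\; c_l \quad \forall\, l
```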
So, think of this as trying, now, to solve the traffic engineering, routing, and congestion control problem in a single, unified way. And so, what did we do? Well, we used the same kind of optimization decomposition techniques that the previous work on TCP had used, to translate that single central optimization problem into a distributed solution. And the form that it takes is that every link computes a price based on its own local observed load. Let's say an indication of the amount of loss, for example, taking place on this link because the link is over capacity. Imagine sending that feedback up to the sender, and the sender will solve a local optimization problem to update the path rates. Basic idea: think of it as, I wanna send more because I get more utility when I do, but the more I send, the higher the price I pay, because the price I pay is the load I send times the price of the path. And so if my path is expensive, I'm gonna send a little bit less, because I'm only getting a little bit of marginal increase in utility as I send more and I'm paying a higher price. So every link is locally computing prices. Every sender is locally solving an optimization problem to find the sweet spot between sending more to get more utility, and sending less to pay a lower price. Yeah. >> How sensitive are these solutions and parameters to assumptions about the utility curves? Do you measure utility directly, do you measure the curves directly and use them, or do you just assume the curves? >> We're designing with a particular utility function in mind. >> So how sensitive is what you do to the parameters of that curve? >> It is relatively sensitive. You wanna pick something that has the shape you want, because as long as the optimization problem's convex- >> No, I mentioned convex at the beginning- >> Yeah. >> But when it comes to the parameters of the curve, is it very sensitive to that, and do you have to check that, or do you actually just go with the assumption? >> So there is a sensitivity. I'm not sure I'm getting quite at your question, but there- >> Sensitivity to the- >> There is a sensitivity in convergence time. That was one, and I'll come to that in a few slides. The sort of brute force kind of approach of just doing the decomposition and running it ends up being quite sensitive; it often converges very slowly. Even though it is sort of provably optimal and will converge, it might converge very slowly. And so we found this sort of straightforward application of this technique didn't produce effective results by itself. So the basic idea, then, is now we've got a single optimization problem that's being solved by each of the links and each of the senders doing its own local computation. So the next question we cared about was, well, what if we have multiple classes of traffic, each with different performance goals? So you might imagine some traffic cares about throughput, some cares about latency, and you might have different utility functions for those different traffic classes. And you might imagine that the links have a share of their capacity allocated to the one and a share allocated to the other. Think of this as a green virtual network for the latency-sensitive traffic, and a red one for the throughput-sensitive traffic. And you might imagine a straightforward extension to the utility function, where you wanna maximize the aggregate utility across the two classes of traffic. So again here, we can turn that crank of decomposing this optimization problem, now in two steps.
One that separates the two utilities from one another, subject to the local link capacity shares. And a second, on a longer time scale, that decides what fraction of each link's capacity should be devoted to each one. Think of this as sort of a form of adaptive network virtualization. The topology is virtualized, but the bandwidth allocation between the two classes is being dynamically adapted based on who needs it more to be able to achieve their utility goals. So, again, same basic idea, that now we can essentially design the same protocol we would have run had this delay-sensitive, or throughput-sensitive, traffic been the only traffic class in the network. But we can arbitrate between them by dynamically deciding what shares of the links they would each get. So, that was sort of the second step. Now, I think for me, the lessons here. One was the idea that protocols should be solving a well-stated problem. This sort of sounds obvious in hindsight, right, but a lot of networking protocols were designed without a really clear specification of what problem they were solving. And in particular, what global problem they were solving rather than the local problem a particular sender might be interested in. And then to use techniques for decomposing the problem. This gives us, I think, both a better understanding of how the protocols work and when they work, and also guarantees that come from that on optimality, and convergence, and so on. But in practice, as Erik's question already hinted at, that isn't quite enough; the process itself is inherently iterative. We didn't really come to the solutions we did in one step, and so there's sort of a human lesson here and then a few technical takeaways. One I think of as sort of research as decomposition, which is this is sort of the way the work between Mung and I would go. I would say, I need the network to solve this problem; he would say, the math is too hard. And this is great for Mung, because instead of Mung going and working on a hard math problem, he would turn back to me and say, change the problem so the math is easy. He liked to put it as, if you're the professor and he's the student, you write an easy exam, right? So it was my job to figure out if I could change the constraints that I felt the network would have to impose and relax them in some way to make the problem easier. And then Mung would eventually say thumbs up, it's convex, I'm happy, we can apply standard optimization techniques to solve the problem now. So the art in this work was not the math. Quite the opposite, the art was in not making the math hard. So just to give two examples of where that concretely came up. One was that we found when we did the sort of standard decomposition approach, we did get something that was optimal and converged, but it converged really, really slowly. And so we actually changed the utility function we were using accordingly to make it a little bit quicker in adapting to congestion. In particular, instead of looking just at utility, we also looked at a penalty function that penalized links that were getting close to their capacity. This problem on the left, maximizing utility, was what congestion control was doing. Minimizing this penalty was actually what the network operators were doing in doing traffic engineering. So there's sort of an interesting coming full circle here, that in some ways both sets of hackers were right. Maximizing utility and keeping a damper on congestion both end up being important.
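As a rough illustration of the "links compute prices, senders solve a local problem" decomposition described above, here is a minimal sketch in Python. The three-link topology, the log utility, the step size, and the simple gradient updates are illustrative assumptions of mine, and the convergence-speeding penalty function just discussed is only hinted at in a comment; this is not the actual algorithm or code from the paper.

```python
# Toy primal-dual sketch of price-based multipath rate control (assumptions mine).
import numpy as np

capacity = np.array([10.0, 10.0, 8.0])   # three hypothetical links

# routing[s][p][l] = 1 if path p of sender s crosses link l.
routing = {
    "A": np.array([[1, 1, 0],            # path 1 uses links 0 and 1
                   [0, 0, 1]]),          # path 2 uses link 2
    "B": np.array([[1, 0, 1]]),          # single path using links 0 and 2
}

rates = {s: np.full(len(paths), 1.0) for s, paths in routing.items()}
prices = np.zeros(len(capacity))
step = 0.01

def marginal_utility(total_rate):
    # Log utility -> diminishing returns; its gradient is 1/x.
    # (The work described above additionally penalizes links nearing capacity
    #  to speed up convergence; that term is omitted here.)
    return 1.0 / max(total_rate, 1e-6)

for _ in range(5000):
    # Each link observes its load and nudges its price up or down (dual update).
    load = sum(paths.T @ rates[s] for s, paths in routing.items())
    prices = np.maximum(prices + step * (load - capacity), 0.0)

    # Each sender locally adjusts per-path rates: send more where marginal
    # utility exceeds the path's price, less where it doesn't (primal update).
    for s, paths in routing.items():
        path_price = paths @ prices
        rates[s] = np.maximum(
            rates[s] + step * (marginal_utility(rates[s].sum()) - path_price), 0.0)

print({s: r.round(2) for s, r in rates.items()}, prices.round(2))
```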
And once we decomposed this problem, we ended up with something that was not only provably optimal and stable, but converged much, much more quickly, because it gave an early warning when the network was getting into an unhappy place, rather than waiting for all hell to break loose beforehand. And a second example: when we were looking at multiple classes of traffic, I described it as if there are sort of two separate virtual networks, each with its own share of resources. But when we started, we envisioned there being one queue, one capacity for the link, and the traffic mixing together. But in practice that made the optimization problem impossible to decompose, because the utility of one traffic class was really intertwined with the utility of the other. To put that in kinda concrete terms, delay-sensitive traffic would be suffering delays in queuing because the throughput-sensitive traffic would drive the links to very high levels of utilization. And so, we found that to get the optimization problem to decompose, we had to have separate queues, on a longer time scale, with the scheduling of them changing to accommodate who needed more utility. And that came out of just observing that the math was really hard if we didn't make that separation. So it's the perfect example where Mung said, can't do it, but if we actually could separate the two traffic classes and make their utility functions less dependent on each other on some time scale, we can. A sort of more human lesson here is also, anybody who's ever been a graduate student in this room knows that research never works this way, with faculty actually talking to each other on a regular basis. So you should already be quite suspicious that I haven't told you the whole story. Which is, of course, the student in the room was critical, because she was able to connect with both of us and to really wrangle us, and get enough expertise with both topics, of network hardware and software constraints as well as the optimization techniques involved, to be able to actually make this work happen. Now, for me, a great frustration, since this work happened about 10 years ago, is it was really hard to get it deployed, because again we were still stuck with the problem I mentioned at the beginning, that we can't change the equipment inside the network. So, we could do experiments, we could do simulations, but we couldn't deploy. And now we're actually just starting to have the kind of programmable switch hardware, and I'll talk more about these kinds of switching capabilities a little later in the talk, where now it actually is possible to deploy these techniques. And I'm starting to work with folks at AT&T, sort of coming full circle, who are interested in doing a deployment of some of these ideas to see if they might be suitable for their backbone network. So that's sort of the end of the first part of the talk. If we wanna, maybe stop here, if people have comments or questions before I go on? Yeah? >> So you mentioned optimization as being the talent you're looking for to mix with networking. But it seems there's a big third bubble coming in, mechanism design in economics, which is kind of a different way of thinking about distribution of effort, and credit assignment, and payment, and so on. I'm curious if that's still an area that's waiting, pregnant with results in this space, or? >> Yeah, that's a great question.
So, yes, I think so, and one piece of work I don't have in here that had that flavor was looking at how the different domains that make up the Internet interact with one another, because their economics obviously play a direct role. Entire autonomous systems, entire networks, are sort of rational actors. >> Yeah. >> Making best responses based on what their neighbors do. And there the language of game theory is a beautiful fit, and it is. >> Is it an area that is wide open? In the fifteen years while I was asleep at the switch, per se, did that also happen in the field I haven't been tracking? >> I think some, particularly in the area of inter-domain routing I think so, also in areas around spectrum management or in dealing with scarcity of IP addresses, there have sort of been little pockets of networking where we've seen game theory come into play. I think there's a great opportunity to do more there, frankly, because I think a lot of the research on game theory has, understandably, focused, particularly the computer science work on game theory, on some of the theoretical machinery. And there hasn't been enough of a meeting of the minds with practical problems. >> But even in the problems you're putting up here, the first reaction is, yes, we have to do optimizations of various kinds. >> Yeah. >> Classical models, looking at the whole system. But then there's kind of this whole distributed world of rational actors, which is like a different approach to thinking- >> Yeah. >> about large-scale populations of actors, as nodes and messages for example, and in that case that's wide open right now. >> I think it's midway open. I think the reason it doesn't come up here as much is because we are focusing on single administrative domains, where social welfare maximization is a reasonable model. But certainly spectrum auctions, any sort of distributed scarce resource, I think it's very cool. And I think another thing that's interesting here is a lot of the work on game theory doesn't think as much about dynamics as, say, work in distributed computing does. And yet, a lot of the distributed computing work doesn't have the same sort of take on rational actors. So there's, I think, a ships-in-the-night problem there, where those two fields could be closer together than they are. Yeah, great question. Other thoughts? So I'll talk about the second piece of work. So one thing that was comfortable for me in that first piece of work was that, while I'm no expert in network optimization, I'm at least familiar with sort of the language of optimization. The next project was a little bit out of my comfort zone, which is working with folks in programming languages. And the jumping-off point for this work was that there was a lot of interest, around ten years ago, in moving towards a more central model of network control and management, software-defined networking, as was mentioned in the introduction. It was something we started working on a bit when I was at AT&T, and it became more prominent when OpenFlow came around. And so we became interested in the question of how network administrators should describe, in a central way, how the network should actually behave. OpenFlow itself, as I'll talk about in a moment, is not a linguistic formalism, it's like assembly language. It's better than microcode, but it's still pretty low level.
And so, I'll talk a little bit about this project, the Frenetic project, that was done with Dave Walker, my colleague at Princeton in programming languages, and Nate Foster, who is a professor at Cornell, who was a post-doc at the time in our groups. So, in software-defined networking, now, you sort of pop up a little from what I described before, and now the underlying network devices are largely dumb. You have a logically centralized controller that is running some sort of application that is gonna tell the underlying switches what to do. And so there are two basic ideas here: one is the idea that decisions should be made based on network-wide visibility, exercising network-wide control, and the other is that you should do so using some direct, open interface to the underlying forwarding capabilities of the switch. This is what was lacking when I was at AT&T, when we were not allowed to change the equipment, not even the software of the equipment. This changed when there started to be an open interface between the software on the device and the low-level hardware that forwards packets at high speeds. And so OpenFlow, which came out around 2008, was essentially the beginning of thinking about what these interfaces to the switch hardware might be. So at a high level, what OpenFlow 1.0, the original version of OpenFlow, did was to say, well, every switch has a rule table. Packets come in, they get matched against the highest-priority rule based on patterns in the packet header, with the patterns expressed in the rules, and the simple action associated with the matching rule is the one that gets performed. So a good example is a packet comes in, you might look at this prioritized list, and if the source IP address starts with the first two octets 1 and 2 and the destination starts with the first three octets 3, 4, and 5, then those packets would be dropped. If the packet doesn't match this pattern, it will fall through to the second rule and maybe be forwarded out of port two, or maybe it falls through to the third rule and gets sent off to some sort of controller. So the basic idea here, and this is actually a very cute idea, is that all of these marketing terms that networking is frankly rife with, router, switch, firewall, load balancer, NAT, all of these things fall into this basic framework. If you match on the MAC address, you're a switch. If you match on the IP address, you're a router. If you're doing access control based on several of these fields, you're a firewall. If you're modifying some of the fields, you might be a network address translator. But the underlying hardware mechanism is largely the same: a rule table with patterns for matching and very simple actions. And they did this by taking an interface that already matched what the underlying hardware was actually doing. So this is not designing new hardware, it's just opening up the API to the underlying hardware that existed at the time. And in particular, a particular kind of memory called ternary content-addressable memory that allows these lookups to take place at very high speed by doing them in parallel across all the rules in the switch. So that is sort of the jumping-off point: you now have this open interface to the switch, but it's still extremely low level. As I mentioned a moment ago, OpenFlow is a mechanism, not a linguistic formalism. It's a way to install rules in switches and reason about what those rules would do to your packets, but you're still thinking about priorities of rules and wildcards and
bit masks and overlapping rules and so on, making it very difficult to actually write applications on top. And so what we really wanted to do is come up with an effective programming abstraction that will let people write their apps without having to be experts in the lingua franca of OpenFlow. And in particular, we started with the question of modularity, one of the core problems. So, you wanna be able to write an application for your network. But actually, you really wanna write multiple. You wanna route, and monitor, and do firewall, and load balancing, and various other things. But most importantly, you wanna do all of them on the same traffic. It's not that you wanna route some traffic and monitor other traffic. You actually want the illusion that you're doing this on all of the traffic, and yet still be able to avoid building a monolithic application that would be naturally hard to test, debug, and so on. That's sort of motherhood and apple pie, right? You want modularity, the ability to develop modular applications, so you can write each of these apps separately. But because each of them partially specifies how the same traffic should be handled, you can't just run all four of them with isolation mechanisms in place. You need to be able to reason about their composition. So you can write and reason about them separately, and yet still have them act as a single application on the single set of traffic, affecting a single table of rules in the underlying switch. So that's the problem we were trying to solve. So my colleagues in programming languages like to think about everything as a function, and everything as composition of functions, and so functional reactive programming was the area we started looking at to give us ideas. So the first idea is, how can you model OpenFlow itself? You can think of OpenFlow as a function. A function from a packet to a set of packets. So a packet and its location, this is all the header fields including something logical about where the packet is at this moment, what switch and what port of the switch it's at. And we're gonna apply a function to this packet that generates some set of packets. The reason that it's a set is that if you drop the packet, it might become an empty set. If you forward the packet, it might become a new packet at a new location. If you multicast the packet, it might be multiple packets at multiple locations. So the basic idea is you have a packet and its location, and then you're going to spit out a set of packets and their locations. And that's gonna mimic what operation the switch is actually performing on the packets. So that's sort of the first basic idea. And just to give a really simple example, you might imagine having Boolean predicates that match on certain aspects of the packet's location and header fields, and then do modifications to the packet's location and header fields accordingly, to do packet forwarding, or in this case also NATing, by changing the packet's destination IP. So that's sort of the first idea. The second is, now you've got this functional way of thinking about policy, you can do sort of standard kinds of composition. And in particular, you could imagine two basic ways of doing composition: in parallel and in series. So just to give a concrete example, let's suppose I'm gonna monitor traffic by source IP address. I wanna give you the histogram of the traffic by sender. And I wanna route the traffic based on the address block it's destined to.
So I wanna route on destination prefix. Then I might have one function that tells me to count traffic depending on the value of the source IP. And I might have a second function that wants to forward traffic out of a particular output port based on the value of the destination IP. But of course, I actually wanna do both. For every packet, it's as if I have two copies of this packet. One I wanna monitor and the other I wanna forward. And in practice, I really wanna do both and synthesize a single table of rules in the switch that does both. And so parallel composition does exactly that. If you look under the hood, it's ugly. You can think of this as sort of an ugly cross product of all combinations of sources and destinations that does both the counting and the forwarding. The key point here is that the person writing the code doesn't have to think about that gobbledygook. They just think about monitoring and routing as independent entities. They may buy their monitoring application from one party and write their own for the other. Doesn't matter, because they can really reason about them independently and snap them together in this way. Okay, that's sort of the first idea. But many things besides monitoring wanna do more to the packet than just look at it. They actually wanna change somehow the way the packet is handled, and that gets a bit more complicated. So just to give a simple, concrete example, imagine you wanna build a load balancer. And I know there have been a number of projects here, etc., that have done exactly that. The switches can play a role in some primitive parts of load balancing. So a common thing when you're building a server load balancer: you have multiple backend servers, let's say with three different addresses, but a single public-facing address that the clients send requests to. That box may be a load balancer, and what it's gonna do is select different subsets of the requests to go to different backend servers, by rewriting the destination IP address to send it to the chosen backend server. So here you can see this is a sort of sequential operation. One is picking the backend server to achieve some load balancing goal among whatever servers are up at this time, and a second to actually route the traffic to the chosen backend server based on its choice. And so here you could imagine a form of, think of this as the Unix pipe operator, right? You're essentially load balancing first, and the output of load balancing should affect the inputs to routing. So a concrete example: you might imagine all packets whose source IP address starts with 0 that are going to a particular public service will go to one of the two servers, and all whose source starts with 1 might go to the other. This is sort of roughly dividing the traffic in half, so that half the requests go to one server and half go to the other. Routing might be completely unaware of that process, and just be deciding what the shortest path is to get to each of these backend servers and forwarding the packets. But again, of course, we wanna do both, which underneath the hood, if there's only one rule table in this switch, is kind of a yucky mess of matching on packets, modifying the destination, and forwarding the packet as if the destination had already been modified, right? Because it's essentially smushing together two different functions and synthesizing one that has the effect of both done in series. So those are just the two basic composition operators.
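Here is a toy Python sketch of the "policy is a function from a packet to a set of packets" model and the two composition operators just described. It is my own illustration, not the actual Frenetic API; the packet fields, the prefixes, and the backend addresses are made up.

```python
# Policies as functions from a packet (a dict of header fields plus location)
# to a set of packets; parallel and sequential composition glue them together.

def freeze(pkt):
    """Make a packet hashable so results can live in sets."""
    return tuple(sorted(pkt.items()))

def monitor(counts):
    """Count traffic by source IP, then pass the packet through unchanged."""
    def policy(pkt):
        counts[pkt["src_ip"]] = counts.get(pkt["src_ip"], 0) + 1
        return {freeze(pkt)}
    return policy

def route(pkt):
    """Forward based on destination prefix (toy two-entry routing table)."""
    port = 1 if pkt["dst_ip"].startswith("10.0.1.") else 2
    return {freeze({**pkt, "port": port})}

def load_balance(pkt):
    """Rewrite the public-facing address to one of two backends by a source bit."""
    backend = "10.0.1.7" if int(pkt["src_ip"].split(".")[-1]) % 2 == 0 else "10.0.2.7"
    return {freeze({**pkt, "dst_ip": backend})}

def parallel(p1, p2):
    """Apply both policies to (copies of) the same packet; union the outputs."""
    return lambda pkt: p1(dict(pkt)) | p2(dict(pkt))

def sequential(p1, p2):
    """Pipe: every packet produced by p1 is fed into p2."""
    return lambda pkt: {out for mid in p1(dict(pkt)) for out in p2(dict(mid))}

# Monitor AND (load-balance THEN route), snapped together into one policy.
counts = {}
app = parallel(monitor(counts), sequential(load_balance, route))
print(app({"src_ip": "192.168.0.4", "dst_ip": "1.2.3.4", "port": 0}))
print(counts)
```

Under the hood, a compiler would expand a composed policy like this into the cross-product rule table described above; the point of the abstraction is that whoever writes monitor never has to see route.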
And then we did a bunch of other things around abstracting the topology, abstractions for monitoring the network, abstractions for updating the network, that I'm not gonna talk about here. But they're all kind of in the same spirit of trying to find simple, reusable, and composable abstractions for querying, computing, and updating policies on the network. So a couple lessons here. OpenFlow was an important first step. It generalized the capabilities of a lot of commodity networking devices, and frankly, for me, it was hugely helpful because I could explain it to people in programming languages without them running away screaming. One thing I find challenging about doing interdisciplinary work in networking is our field is riddled with acronyms and minutiae, and it often is just very off-putting to people that actually want to think creatively in their own field. And so having a simple abstraction to explain to a colleague, without them having to know all the alphabet soup of networking, was just tremendously helpful. I don't think we would've been able to do this work without that. And then, from the PL community, we really got this sort of taste for thinking in a compositional way, and for painstakingly thinking really hard about what were the right, simple and reusable abstractions, and continuing to revisit them as the work progressed. And in fact, my colleagues did a really nice piece of followup work called NetKAT, which recognized that this policy language that I was just describing earlier, and the composition operators, are actually a lovely example of something called a Kleene algebra with tests, which has a long and storied history. And so you can actually map everything I've just been talking about, and more, into this nice mathematical construct. And it's very useful for synthesizing networks, for doing verification on networks, for virtualizing networks, and so on. So if you have any interest in this space, the NetKAT paper from Nate Foster and Dave Walker and Dexter Kozen and others is a really lovely paper. So that essentially built on top of recognizing that the abstractions we'd already built actually were not entirely new. They really had a very nice mapping into some existing ideas. And as I mentioned, we also kind of came back to that whole control loop that I've been obsessed with ever since I started working with Albert. That we want abstractions for measurement, for making decisions, and finally for doing distributed updates to whole collections of switches as if it happens in one fell swoop. And so there's a body of work focused on that also. The human lesson here, one was really Nate. So Nate did a postdoc with me and Dave at Princeton for a year before he went to Cornell. And he basically was embedded in my group, sitting with networking people, absorbing networking lore, writing lots of painful code on the early OpenFlow controllers at the time, experiencing pain and reflecting on that pain. So this was very much a learn-by-doing kind of project. There was an open source network controller called NOX that was an enabler there, because we could get started trying to build real applications, and then reflect on what was painful, what kind of repeated code we had in multiple places. And a lot of that allowed the abstractions to follow. And finally, there's been a lot of work by Nate, largely by Nate, but also to some extent by me and Dave, on creating software that other people could use.
Tutorials, summer schools, keynote talks in each others' conferences, publishing in each others' conferences, that I think helped a lot in creating a community of people that were able to build on this work. And now we see in both PL conferences and networking conferences a significant number of papers at the intersection. Just as an anecdote there, for the second PL paper that was sent to POPL in this space, the reviewers came back saying there's already been one paper on programmable networks, aren't we done? And I think we don't get that anymore, and in networking, now I hear people complaining, hey, a third of the conference is on PL now. So I think there's definitely been some misunderstanding of how rich the problem space might be. But I do think we're sort of over a hump now, where people appreciate that there's some really rich and interesting work to do at the intersection, yeah? >> Yeah, so I saw Nate talking about- >> I hope you weren't that reviewer. >> What's that? >> [LAUGH] >> No, I saw Nate talk about P4 and verifying P4. I'm curious, it seems like my understanding of sort of where this language went was it got to sorta like the C programming level of specifying, which is a much higher level than you started with. But it's not at the level of, sort of, there's contracts and you can verify aspects of the program statically, essentially. So I'm curious, do you think that things went far enough, or is there more room here? Are we kind of at a point where the network community says, yeah, this is good enough for us? It's sort of high enough level, or do you think there's still room for even raising the- >> I completely agree that there's room to go further. Even programming in something like Frenetic, the language we created, is quite low level. And in particular, when I talk to companies who are not software companies in their own right, not Google, Microsoft, Facebook, and so on, they find this doesn't solve their problem. It might be under the hood in something that solves their problem, but it's much lower level. They wanna talk about service level agreements and high-level specifications, exactly what you just said. So I do think there's an exciting opportunity there to come up with languages that talk about that, and maybe synthesize, under the hood, things that can be compiled to lower levels. A second thing we really haven't done that's really important is that many of these abstractions give a programmer a lot of power but not a cost model. I gave a talk on Frenetic at Google. And the first thing I heard from the people at Google is that if you give people a programming interface to the switch, they will bring down your network. >> [LAUGH] >> Okay, so clearly there's a need to give people capabilities for reasoning about the overhead of the operations they do. If you create something that creates a cross product and you have an explosion of state in the switch, or create a policy that forces a lot of the packets to go through the controller, that's not okay. And I think we're still not really at a good point of reasoning about the overheads that are introduced when people write these applications. And then newer switch hardware has capabilities, I'll talk about that next, that are not captured by this programming model. And so I think there's a lot to do to make the underlying data plane richer and be able to put more of the operations down there. You had something else? >> Yeah, so I just want to reflect a little bit on the lessons learned, right?
So, and also about your [INAUDIBLE] research. So I think one thing, so in the 1990s, right, in terms of separate research, I think this notion of separating control planes from data planes was quite well established in the wireless industry. For example, this notion of having dumb access points and smart switches was there, and in fact I personally proposed an architecture based on that. But what then happened is that there was a lot of push from companies like Microsoft itself to put a lot of these smarts in the switches themselves, so 802.1X happened because they wanted security at the switch. Now the reason for that was because there was a sort of separation of labor, like who controls what. And from a Microsoft perspective, it was sort of important to say, well, if you take that responsibility, I don't have to worry about it, and so I can do the other things. >> Right. >> Now what happened in the wireline networking space was, it wasn't getting any traction either. I mean, you guys were doing it, and in the enterprise it wasn't getting any traction. It was only because cloud became a thing, where everybody owned their own equipment. And once you own the equipment, you want to have the flexibility to program your own equipment, because you were being sold features that you didn't really care about, and you were spending a lot of money for that stuff. So even though this is happening in academia, what is happening in industry, in the business, is actually impacting how effective any of this is becoming. And so I think as you reflect on all these things and as we move forward, I think there has to be a pretty serious consideration about how businesses are moving forward and what is happening there, while the research is being done. Otherwise, you do decades of research and nobody ever pays attention to it. And I think when that happens, then these things start to light up. Even now, for example, if you look at the Internet in the wide area, these things are not taking off out there. So the question that Erik was asking about the economics of it, I think that is more spot on, because the economics drives the design as opposed to enjoyability, like [INAUDIBLE]. >> Yeah, I think, to what you said, particularly just to put it in the context of OpenFlow, there's a whole body of work in networking, in active networking, that didn't get off the ground, sort of in the mid or late 90s. And I think what happened with OpenFlow that made it more successful was partially timing, right. There were starting to be third-party merchant silicon companies that were selling chipsets for building switches, so you could build a switch without being a large vendor with its own fabrication facility, and there were cloud operators who had unique requirements and the ability to write their own code, and that was sort of a perfect storm. And I think what Nick McKeown did in particular, because he was really the one who spearheaded the work on OpenFlow, was to recognize this was a special moment. >> Yeah. >> And that if some effort was put in place at the right time to get the APIs in place, a lot of good stuff would follow. And so I do think that the definition of OpenFlow was strategic; it's not a beautiful spec. But it was at the right time, and hit the right sweet spot of being flexible enough to do something interesting, without being ahead of what the hardware could do at the time.
>> Yeah, I just wanted to have an acknowledgement of the fact that a lot of things that are external to the community happened, which caused something to- >> Completely agree. >> Otherwise, you can just be doing great work and it doesn't really do anything. >> Yeah, and I guess I sort of hinted a bit at that in the first project, which was not successful at the time in terms of having successful tech transfer. Because it was, in that sense, ahead, in that it assumed capabilities of the hardware that didn't exist at the time, so there was nowhere for it to go when the work was actually done. Yeah. >> I have a more technical question, but I think you've partially answered it when you commented about cost models. >> Mm-hm. >> Because in these SDN situations, it's not just all about composition and modularity; even for the examples you gave on load balancing and NAT, you have to do at least some arithmetic, minimally. So what are the limits on, what, can you write [INAUDIBLE] complete computations, for example? >> So one way to think about it is that the underlying switch has this very limited match-action paradigm that's not remotely close to [INAUDIBLE] complete, and then the controller is arbitrary software that can do everything else. Of course that's a glib statement, because it comes at great cost. Right, so it's sort of similar to the wireless setting that Victor was talking about: you essentially have these dumb switch devices that are fast but only have a relatively simple computational model, and then we've got the slow central controller that can do just about anything. And what's happened in the last few years is a revisiting of what the switch could do. If you're willing to now say, programmable networks are what I want, and I'm willing to design new switches with that paradigm in mind, then there is an opportunity to think about what can go in the switches to make it possible to do more of the computation directly in the switch. >> So is it reasonable to guess, on this specific work that you're talking about, especially with David Walker and Nate Foster involved, that these are fairly powerful functional languages, do you have the whole lambda calculus available? >> So you could. In fact, you can essentially view it as, really what you're doing is installing a set of functions, or a composed set of functions, on the network over time. >> I see. >> And the software that decides what the functions should be at any given time can be arbitrary code, right. And Nate had a version of Frenetic that was implemented in OCaml; we had a version implemented in Python. But essentially you could imagine taking whatever other programming constructs you might want for constructing those functions, and then across apps you would compose them and synthesize them. >> Then I guess Victor's earlier comments tie into this, because partly what the equipment manufacturers are willing to actually put into the underlying network infrastructure would also put a limit on what's actually practical then. >> Yeah, exactly, and OpenFlow is embarrassingly cognizant of that, right? I mean, OpenFlow basically defined only things that could be done in existing switch hardware. And it tried to be as general as it could subject to that constraint, but that's a fairly significant constraint. And to the question that you asked a few minutes ago, the software running above the switch is now kind of arbitrary code. So if you wanna think about verification, that's not good, right?
So you definitely want languages further up the stack if you wanna be able to reason collectively about the entire system, not just about the part that's running in the network. >> Jen, just about OpenFlow. The original spec was pretty general because you guys tried to be very general, right? So it was very bloated, in a way, so that the switch manufacturers didn't actually implement the entire spec, right? So when we talk to a lot of these guys, they implement some subset of these things, as Peter was saying. So I guess the question is, would you reflect back and say that maybe you were too general in some sense, or trying to solve too many problems for too many people? >> That's a really good question. So the OpenFlow spec evolved. There's sorta OpenFlow 1.0 that was pretty simple, and then it quickly became 1.1, 1.2, and so on. And I think what happened is people wanted greater expressiveness in the switch, and yet, the more that happened, the more complex the spec got; it became the kitchen sink. The original OpenFlow spec could match on like 12 header fields with a simple set of actions; 1.5 could match on 42 header fields and do a whole bunch of other actions with multiple stages of tables, and your brain starts to explode. And some vendors implement some parts, and some implement the other. >> So what's your takeaway on that, cuz we saw that in a very real way. >> Yeah, no, I think the industry got ahead of itself. It's understandable, because people wanted to write applications that couldn't run efficiently using OpenFlow 1.0. So that's the sort of other side of the coin of being pragmatic to the existing switch hardware, and I think that's why you're seeing now new switch hardware, right, that's trying to address the limitations of functionality that- >> All of these questions matter a lot to the networking firms, and the clouds also. >> Of course. Yeah. Exactly. And I think many people found OpenFlow was too much and not enough, right. And so a lot of people started doing things in the hypervisor or in the NIC, because the OpenFlow switch wasn't capable enough to do more. And some of that's just because they were defined that way because of the legacy chips. And some of it is, it's really hard at line rate in the switch to do really complicated things, yeah. So I think the jury's still out, actually, right? In fact, to Ben's question about programming, I think many people don't wanna program the network. They wanna program the system. And a lot of this work is still focused on the network, right? And you'd like to think you're gonna talk about the machines as a pool of resources. You know, the servers, the NICs, the switches, and so on. And right now it's still very siloed. So I think there's a lot of scope to not only go up the stack, but to go broader as well. So that's a great segue to thinking about the ability to program new hardware. So the last thing I wanted to talk about was some really recent work. So this is gonna be a little bit more narrow and a little more focused on a very specific technical problem, just to illustrate some of the work we've been doing on network monitoring. So the big frustration in a lot of the work that I was involved in at AT&T is that in many networking problems, knowing what's going on is harder than knowing what to do about it. And that's often because we just don't have sufficient visibility into the traffic, the performance, and so on. We're often working with measurement techniques that are quite coarse-grained and slow.
So I became very interested in how to leverage emerging switch hardware to be able to do better measurements, and that's sort of the big focus of my current research. So the work on OpenFlow and Frenetic and so on also led to a bunch of programming abstractions that led us to rethink what we would want the hardware of the future to look like if we could go beyond OpenFlow. And so a number of people became interested, us and other groups as well, in more programmable packet processing directly in the switch. And so I was gonna tell you a little bit about how current switch hardware that's coming out of a number of companies, Barefoot Networks, NIC cards from Netronome and Xilinx, and so on, how they're thinking about what the switch hardware of the future should do. The first is that instead of having a fixed set of header fields, first 12, then a bunch more, then 41, then 42 in later versions of OpenFlow, there should just be a parser. And that parser should be programmable as sort of a little state machine that can decide how to read the packet and which fields to extract from it. This is not a particularly deep idea, and in fact a lot of switch hardware has this; it's just that it's not exposed to the programmer. So, a programmable parser. And then a series of match-action tables, just like OpenFlow had, but able to match on the fields that are coming out of the parser, and also on metadata that might get passed along from one stage to the next. Now, often the computation you wanna do might go across multiple packets. And so it'd be nice to have some storage, some registers if you will, that can keep state about the history of past packets and computations on them in the switch, and then the ability to pass state from one stage to the next. So the two big things here are, first, being able to parse, match, and act on a programmable set of header fields, and second, the ability to create and maintain state: state passed between stages in the switch, and state in registers that might be read when the next packet comes along. So this computational model is still pretty primitive, but it's much richer than OpenFlow, and it's still possible to run at line rate. So again, the goal here is to find the sweet spot that's amenable to efficient hardware implementation to run at line rate, and yet still as programmable as we can make it, subject to that constraint. And so the problem space I'm particularly interested in here is measurement. So what we have here in this picture is a streaming platform, right? We've got a modest amount of state, modest computational capabilities, and a pipelined architecture where one can do processing in multiple stages. It's a perfect fit for streaming algorithms, where you wanna do some simple computation on packets as they fly by and keep a limited amount of state. And there's actually a very rich literature in theoretical computer science that goes under the name of compact data structures, or streaming algorithms, or sketches, that creates a great opportunity to actually realize those algorithms now, directly in the data plane. But there's a catch. The theory literature is not a perfect fit, because it often assumes a richer computational model than the one I just presented. And so a lot of this work is taking ideas from compact data structures, and figuring out where they don't work because they violate some assumption of the computational model I just presented.
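Here is a minimal sketch, assuming a simplified software model of the hardware described above: a pipeline of stages, each with its own small register array, where every packet gets one bounded action per stage and can carry a little metadata along the pipeline. The class and field names are invented for illustration; this is not a vendor API.

```python
class Stage:
    def __init__(self, num_registers, action):
        self.registers = [0] * num_registers   # per-stage state kept across packets
        self.action = action                   # simple ALU-style update, one per packet

    def process(self, headers, metadata):
        # One bounded-time action per packet: may read/write a register
        # and update the metadata that travels with the packet.
        self.action(self.registers, headers, metadata)

class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def process_packet(self, headers):
        metadata = {}                          # passed from stage to stage
        for stage in self.stages:
            stage.process(headers, metadata)
        return metadata

# Toy usage: one stage that counts packets per source IP by hashing into a
# fixed-size register array (hash collisions are simply tolerated here).
def count_by_src(registers, headers, metadata):
    idx = hash(headers["src_ip"]) % len(registers)
    registers[idx] += 1
    metadata["count_estimate"] = registers[idx]

pipeline = Pipeline([Stage(num_registers=1024, action=count_by_src)])
print(pipeline.process_packet({"src_ip": "10.0.0.1"}))
```

The point of the model is the constraint, not the code: each packet touches each stage once, at line rate, with only simple arithmetic and a small register array available per stage.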
And trying to find a meeting of the minds, where hopefully we can approximate those algorithms in some reasonable way, while still working within the switch. In particular, we've got a pipelined model of computation. It's very difficult, after you've read or written a register, to come back to it later with the same packet. You sort of have to leave behind something lying in wait for the next packet, but it's very hard to do repeat operations on the same data. The amount of memory involved is quite modest, and there are simple ALUs associated with the action processing. So you can do very simple if-then-else processing, addition, subtraction, but not division, etc. So again, a very, very primitive computational model. So the question we looked at was a very natural one to think about, which is heavy-hitter detection. In my network there might be tens of thousands of flows going through it, all conversations taking place independently. I wanna know the k largest, or the ones that are exceeding a particular sending rate. And the challenge is I don't have the memory to store state for every flow. So in the theory literature there's a simple, very natural algorithm for doing this, called the Space-Saving algorithm. Essentially what you do is you store a key-value table. The key is, let's say, an identifier for the flow. This could be the source IP address, or the source-destination pair, or whatever you define a flow as. And the value is a count of the number of packets or the number of bytes that have been seen for this flow so far. And the goal is to identify only those above a threshold, or only those that are the biggest. And so the basic idea of the Space-Saving algorithm is: if a key comes in that's already in the table, you just increment its count. If the table's not full and a new key comes in, you put it in. And if the table is full and you have to evict something, you evict, what would you expect? You would evict the key with the minimum value, right? Because that's the one least likely to be significant down the road. And so you would find key 5 with a value of 1, and you would evict it to make room for key 7. That's it, yeah, that's sort of the basic algorithm. This does not work in the computational model I mentioned, for a bunch of reasons. First of all, it requires a scan of the table to find the current minimum, which isn't gonna be feasible. We really wanna do one read, or a small number of reads and writes, in each stage of the table processing. So we can't do this. So I'm just gonna walk you through a succession of simple ideas. They're gonna each be natural, but at the end they turn into a solution that can run in one of these switches. So the first idea, a very simple one, is that you might not need to look at the entire table to find the true minimum. If you got kinda close to the minimum, that would probably be good enough. So let's look at d entries in the table, rather than all n of them. Take a simple case, let's suppose d is 2. Then when the key comes in, you might apply two hash functions to the key, look at two different locations in the memory, and recognize, well, one of these is the smaller of the two; I'll evict that. It's not perfect. I would have loved to have evicted key 5, but I didn't know about it cuz I happened not to pick it. But the hope is that you would do a reasonably good job of not evicting one of the true heavy hitters if you apply a technique like that, okay?
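A minimal sketch of the two ideas just described: the textbook Space-Saving update (which scans the whole table for the minimum, exactly what the switch cannot do at line rate) and the relaxation that only examines d hashed slots. Illustrative only; the table sizes, hash choices, and the decision to start an inserted key at count 1 in the relaxed version are simplifications, not the exact published algorithms.

```python
import random

class SpaceSaving:
    """Textbook Space-Saving: increment on a hit, insert if there is room,
    otherwise evict the key with the minimum count."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}                        # key -> count

    def update(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Full scan for the current minimum: fine in software,
            # infeasible in the one-pass switch pipeline.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1

class DChoiceTable:
    """Relaxation: store (key, count) slots in a flat array and, on a miss,
    evict the smaller of d hashed candidates rather than the true minimum."""
    def __init__(self, size, d=2):
        self.slots = [None] * size              # each slot is (key, count) or None
        self.seeds = [random.randrange(1 << 30) for _ in range(d)]

    def update(self, key):
        candidates = [hash((s, key)) % len(self.slots) for s in self.seeds]
        for i in candidates:                    # hit: increment in place
            if self.slots[i] and self.slots[i][0] == key:
                self.slots[i] = (key, self.slots[i][1] + 1)
                return
        for i in candidates:                    # empty slot: insert
            if self.slots[i] is None:
                self.slots[i] = (key, 1)
                return
        # Evict the smaller of the d candidates (approximate minimum).
        i = min(candidates, key=lambda j: self.slots[j][1])
        self.slots[i] = (key, 1)                # simplification: start at 1
```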
Now even that is not ideal, because it does multiple memory accesses at a time. So a simple extension of this idea is to make this table actually multiple tables, one per stage, in the processing pipeline. So a packet would now have a single read and write of a single entry in each stage. So you might, let's say, hash the key here, and recognize that key 2 should be evicted. We have a problem, though. Cuz if I were to go through multiple stages of processing and look at key 2 and key 4, and then compute the minimum, I don't have the luxury of circulating back to update the table entry that has the true minimum in it. I only know the minimum of these d things when I get to the end of the processing. And I don't have the luxury, without great inefficiency, of routing the packet back through a second time to actually do the updates to evict the true minimum. I don't know at the beginning that this is actually the minimum, because I haven't looked at the rest of the tables. So the final solution we came up with essentially does the computation of the minimum in a rolling fashion along the pipeline, rather than doing any recirculation. And the basic idea is this: I come in with a key. I look, I evict the first entry. Gee, that was a huge mistake. That's actually the biggest flow in the system, so I really don't wanna evict it. But that's okay. I'm gonna carry it along to the next stage, where it has a chance to compete with whatever entry it hashes to. And it'll compete now and recognize, well, I'm bigger than key 4 with a value of 2. And so that's the one that gets evicted. So essentially, by processing each packet once in each stage, reading and writing one entry at each stage, and passing a little metadata from one step to the next, you can evict something close to the true minimum, even though it's not exactly right. So just to think about how this matches the computational model I talked about earlier: we're essentially storing these tables in the register arrays that I showed you earlier. We're using simple arithmetic logic to compute hash functions on the packet header. We're using simple logic to do conditional updates to compute the minimum. And we're using a small amount of metadata to pass along with the real packet: information about its key, or about a key that's been evicted in a previous stage of processing for this packet. And the nice thing is that even though this is not exactly the Space-Saving algorithm from the theory literature, in practice we come really close to the overhead and performance that one would get from using that algorithm. So that's sort of a little deep dive into one example problem. What we'd really like to do is come up with a more general way of thinking about this computational model and what kinds of questions can be answered efficiently using it. But in terms of the lessons here, one is just being super concrete at the beginning: defining a very simple computational model inspired by the capabilities of new hardware, a simple example problem that has a straw-man solution that's been heavily studied in the theory literature, and then iteratively relaxing that design based on the constraints that the hardware imposes. But of course, we have a long way to go. We'd like to have provable bounds on accuracy; that's something we lack for this solution right now. We have some simple analysis that helps us get insight, but it's not a very tight bound on performance.
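Below is a minimal sketch of the rolling-minimum idea just described, in the same simplified pipeline spirit: the first stage inserts the packet's key unconditionally, and the evicted entry is carried as metadata down the pipeline, competing at each later stage with the entry it hashes to; the larger of the two stays, the smaller keeps moving and may fall off the end. This illustrates the idea from the talk, not the exact algorithm or its published parameters.

```python
import random

class PipelinedHeavyHitters:
    def __init__(self, num_stages, stage_size):
        # One (key, count) table per stage, each with its own hash seed.
        self.tables = [[None] * stage_size for _ in range(num_stages)]
        self.seeds = [random.randrange(1 << 30) for _ in range(num_stages)]

    def update(self, key):
        # Stage 0: increment on a hit; otherwise insert unconditionally and
        # carry the evicted (key, count) pair onward as packet metadata.
        table, seed = self.tables[0], self.seeds[0]
        i = hash((seed, key)) % len(table)
        if table[i] and table[i][0] == key:
            table[i] = (key, table[i][1] + 1)
            return
        carried = table[i]                      # may be None if the slot was empty
        table[i] = (key, 1)

        # Later stages: one read-modify-write each. Keep the larger entry in
        # the table; keep carrying the smaller one.
        for table, seed in zip(self.tables[1:], self.seeds[1:]):
            if carried is None:
                return
            ckey, ccount = carried
            i = hash((seed, ckey)) % len(table)
            if table[i] is None:
                table[i], carried = carried, None
            elif table[i][0] == ckey:
                table[i] = (ckey, table[i][1] + ccount)
                carried = None
            elif table[i][1] < ccount:
                table[i], carried = carried, table[i]
            # else: the resident entry is bigger; keep carrying ours.
        # Anything still carried here falls off the pipeline: an approximate
        # eviction of something close to the true minimum.

    def heavy_hitters(self, threshold):
        totals = {}
        for table in self.tables:
            for entry in table:
                if entry:
                    totals[entry[0]] = totals.get(entry[0], 0) + entry[1]
        return {k: c for k, c in totals.items() if c >= threshold}

# Toy usage: one flow sends far more than the rest.
hh = PipelinedHeavyHitters(num_stages=3, stage_size=32)
flows = ["big"] * 500 + [f"f{i}" for i in range(200)]
random.shuffle(flows)
for f in flows:
    hh.update(f)
print(hh.heavy_hitters(threshold=100))
```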
And we would like to have a more general way to approach a broader set of analysis questions using techniques of this type. So that's sort of the third piece of work. And I think I'm just gonna end with a few minutes of thoughts about interdisciplinary work. When I first gave this talk, for the Athena Lecture, it was to a largely academic community, where I think I was trying to sell people on the merits of interdisciplinary work. I'm at a place here where you guys do that all the time. So indulge me for a moment on a few thoughts about lessons, even though I know I'm sort of preaching to the choir. So I personally find interdisciplinary work intellectually really exciting. You learn about other areas. And perhaps the more surprising thing is you learn about your own area, when you have to explain it to someone who is a natural skeptic and maybe not naturally passionate about the minutiae of your field. And so it becomes a nice way to learn how to articulate your own field to other people, and often real problems are inherently interdisciplinary. So as part of a broader hope for impact on the field of IT in general, it's a nice style of work to do. And frankly, it's actually just a lot of fun. I've gotten to know each of the people I mentioned in these three collaborations really well. And sometimes the fun doesn't involve work; sometimes they're connoisseurs of fine wine and food as well. But a few thoughts, particularly for younger people in the field. One lesson you might get from this talk is that I've done a little bit of work in network optimization, a little bit of work in programming languages, a little bit of work in compact data structures. And in fact, I've really done very little work in any of those, because I've worked a lot with collaborators who had expertise in those and other areas. So I've been pretty opportunistic. If someone will play with me, and they have a hammer that's relevant to the problems I work on, I say yes. That's not always the right way to go. In particular, there can be merit in honing one's own single hammer to use on a range of problems, particularly for grad students or people early in their career. It gives you a way to control your fate, and to do work without being dependent on a collaborator at all times. It requires a bit of care, because you end up having to pick that hammer very wisely, since becoming an expert takes a lot of time, and you wanna make sure that hammer is useful for a wide range of problems in your field, not just one. Or you can do what I did, which is to work by collaboration. It's riskier for junior folks, but it can be okay if you can define a niche for yourself that involves digging really deep into crystallizing models of the computational capabilities of the devices in your own field. And that's sort of where I've tried to walk the line: between deeply understanding the capabilities below me and the network administration tasks above me, and translating those into terms that people in other fields can think about. But it's a bit more of a nebulous skill set than being an expert in a particular disciplinary domain, so it's certainly something to do with caution. There can also be a pretty steep learning curve; many of the projects I just mentioned were multi-year efforts before anything interesting happened, because often there's a lot of time taken to learn the language of the two different fields, and the two different cultures. And in some cases, it can be easier in one direction than the other.
Just in working with programming languages colleagues, we've certainly found it's been easier to get programming languages students to do great work in networking than the other way around, because PL requires many years of training and has a lot of aesthetics involved in it that can be very hard to instill in a short period of time. Different subsets of networking can themselves be challenging, but they're often things you learn more incrementally as you go, and you can often do interesting work without knowing everything about networking. So I've certainly found that it's sometimes easier to go from the hammer towards the nail than the other way around. And one way to walk that line is to join an existing community of people; the networking-PL intersection is an example of that, where there's already some technical foundation, software stacks, models to think with. Of course, it always comes with some risk, because if you join the party too late, it's not necessarily as interesting. But sometimes there can be the beginnings of a new community that affords an opportunity to do new work without doing all the heavy lifting yourself. One thing I hear a lot from people is about credit. How does this work when you're publishing in two different venues and there are many authors on papers? And I would just say a couple of lessons there. I've worked on these projects with a number of junior collaborators, and clearly one way to walk that line is to publish where it is appropriate for the junior collaborator, so that they're not taking on a risk that you no longer have to worry about taking. And also, understand the evaluation processes of one's own institution, so you know whether work done with multiple senior or multiple faculty co-authors will be valued in the tenure evaluation process. And finally, I'll just note that sometimes interdisciplinary work can make this problem easier. If I write a paper with Nate, even when he was a relatively junior colleague, it was really, really obvious which work was his and which work was mine, cuz there's just stuff in the paper that I could never have done myself, that I barely understand. Now, if he is working with a PL colleague, it is a little less clear, and likewise, if I'm working with a junior networking person, it can be a little less clear who is bringing what to the table. But with interdisciplinary work it's often pretty easy to tell whose fingerprints are on which parts of the work. >> James Bond was [INAUDIBLE] >> [LAUGH] >> [INAUDIBLE] >> Exactly, bingo, bingo. Yeah, I remember when my colleague Jen Fagamom once said to me, she said, networking papers are weird, they have a lot of text, yeah. [LAUGH] So there's truth in that; I did the text. So I think another piece is that keeping interdisciplinary collaborations going can be challenging. I only talked about three projects that were at least to some degree successful. There were tons of projects I chose not to talk about because they weren't. And the reason they weren't successful was often that collaborators don't stick together. It's a relationship that requires nurturing, and also people that are willing to dig in for a long period of time before something interesting happens. And the bottom line is, sometimes the work isn't actually interesting in both fields; there can be expertise I need to get my job done.
But if that work is not interesting intellectually to the other person, because it's sort of routine, run-of-the-mill, that's not gonna be a lasting collaboration, unless they really have a lot of patience for tech transfer, yeah? >> I think even when it is an exciting field, sometimes it's hard to convince the people in your field. At the intersection, if you're trying to publish in POPL, the people who review the papers look at the POPL results, and they don't know networking, so they don't fully understand what the challenges are, how hard it is. So I think there's a certain amount of just assuming that, if it's this kind of collaboration, it's a weaker form of the same kind of thing. >> Yeah, particularly in the work with Mung, that ended up being an issue for the optimization work I talked about in the beginning. Cuz networking conferences would read it and be like, there's scary math in it. And then the optimization venues would be like, this is standard optimization theory. Because we were trying to pick problem formulations that were amenable to simple math. Simple math for the optimization folks, right, cuz we wanted everything to be convex so we could do optimization decomposition. But in my own community there's Greek in it, so they're like, I don't know what you guys are up to, it seems a little weird. So that definitely had some issues because of differences in value structure. And in fact, another piece of that is some communities value proofs and some value code, and not many value both. And so if you want the paper to have both, if you want to sort of make the more applied people and the more theoretical people happy, that can certainly be a challenge. And at a minimum, the collaborators need a shared appreciation of each other's values, if you will, to be able to make that work. And of course, engaging students and postdocs. I found this is actually generally pretty good, in the sense that students often are excited about interdisciplinary work, because it is exciting, it is a little risky, but it is unique, it gives them a niche to be in. But it does require more training and more patience, and wrangling two or more advisors, which is always a challenge of a non-technical nature as well. Finally, I know this is going to sound kind of weird for an Internet researcher to say, but I'm a big fan of physical proximity. This kind of work all happened in person, all of it. The unsuccessful projects were a mix of in-person and not, but it's just super hard when two faculty members or two researchers really need to have air time with one another. It's just very, very difficult to have those early stages of collaboration take place not in person. This has worked for me remotely when a student does an internship, and I've had many successful projects at Microsoft of this type, where students spent a summer or longer at Microsoft and made the bridges stronger to make that possible. But for projects where it's, like, two faculty members at different institutions working together, it's just kind of a bloodbath, because, out of sight, out of mind, and it's just hard to talk on Skype or Hangouts all the time. So I'll just conclude by saying, computer networking, I've always found it a very exciting area of research, because it's a rich space of important practical problems that affect real people's lives, and an intellectually rich space of really hard problems.
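Since "optimization decomposition" of a convex formulation comes up above, here is a generic, hedged illustration of that style, not the specific formulation from the work being discussed: a toy network utility maximization with log utilities and link capacity constraints, solved by dual decomposition, where links adjust prices from their load and sources pick rates from the prices on their path. The topology, step size, rate cap, and iteration count are arbitrary choices for this sketch.

```python
# Toy network utility maximization solved by dual decomposition (illustrative only).
ROUTES = {"A": ["l1"], "B": ["l1", "l2"], "C": ["l2"]}   # flow -> links it traverses
CAPACITY = {"l1": 1.0, "l2": 2.0}                         # link capacities

prices = {l: 1.0 for l in CAPACITY}   # dual variables: one congestion price per link
step = 0.05
for _ in range(2000):
    # Source subproblem: maximize log(x) - x * (path price)  =>  x = 1 / (path price),
    # capped here just to keep the toy iteration numerically tame.
    rates = {f: min(10.0, 1.0 / max(sum(prices[l] for l in links), 1e-3))
             for f, links in ROUTES.items()}
    # Link subproblem: raise the price if the link is overloaded, lower it if idle.
    for l in CAPACITY:
        load = sum(r for f, r in rates.items() if l in ROUTES[f])
        prices[l] = max(0.0, prices[l] + step * (load - CAPACITY[l]))

# Rates should settle near the optimum of: max sum(log x) s.t. link loads <= capacity.
print({f: round(r, 3) for f, r in rates.items()})
```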
And the sort of general grand challenge of building networks worthy of the trust society increasingly puts in them is intellectually enriching and has, I think, potential for a lot of practical value. But it definitely, I think, requires the ability to bridge across the divides with some of these neighboring disciplines within computer science and elsewhere to make it happen. So thanks for your attention, and I'm also happy to take questions. >> [APPLAUSE] >> Yeah? >> I like the hammer metaphor; you only know what hammer you need after you decide what to do, right? I'm more curious about how you build up your vision, and how you decide which direction is better than another, so you would jump. >> Yeah, I think in particular for the first two projects, I've always been interested in resource optimization questions. A lot of the work that I did at AT&T was always about optimizing resources and about the pain network administrators felt in getting their networks to do what they wanted. So optimization and PL just sort of jump out: hey, there's pain around resource optimization and around expression. And it became sort of natural to think of those two fields. And then, I think recently, with these new emerging hardware devices, it's really clear that it's running a streaming algorithm with limited storage and computation. And so then you're like, that's compact data structures. So I think for those three particular projects it was sort of natural. I think it could be harder in other areas. But I think it's not a coincidence that those particular problems were born out of networking; people are always preoccupied with problems of scarcity, right? And problems of scarcity really benefit from algorithms and optimization. So that's just like our bread and butter. And the PL stuff is a little more of a stretch, because I've never taken a PL course. >> [LAUGH] >> I've actually never taken a networking course either. >> [LAUGH] >> Either, for that matter, actually. But in fact, some of my colleagues who are more disparaging of networking think that might be why I stayed in the field, that if I'd actually taken a networking course, I might think differently about the field. But yeah, so I think some of it is that those are sort of the most natural collaborations. I think about Eric's point about game theory and economics. I think that's an area that's natural too, but that's tougher, cuz I think it's a bigger reach for networking people who don't already have some of that background. Whereas some of these stay within CS, and I do think there's starting to be a more vibrant mechanism design and algorithmic game theory community within CS. But before that built up, I think it was actually really hard, cuz there's an even bigger gap to talk to an economist. Like, what is that? I don't even know how to do that, yeah? >> The one thing that I think is interesting about this relationship between PL and networking is there's the history of PL and hardware design, and that evolved over decades. And then the networking community sort of saw that there was a potential opportunity there, a similar relationship, but they had all that history to sort of quickly move up and get on par with. And so I think it's exciting to see the kind of rapid development of these ideas, and how they get applied in practice.
>> Yeah, and I think part of that, going back to points that Victor was making, was because there was an inflection point in the networking community with cloud, with merchant silicon, that changed the nature of the question. So it was possible to build on top of something that wasn't there before. >> Yeah, I just want to add to that, I think that it's actually really correct. For example, if you think about the cloud, I think you saw the blog I wrote recently. >> Yeah. >> I think things are getting so complicated, I mean, so complex. Here, we actually live in a very pristine world in some sense, where everybody is thinking clearly about things. I just had lunch with a friend on the other side, and he sort of explained it to me. >> The other side of what? >> Side of what? >> [INAUDIBLE] >> [LAUGH] >> Okay [CROSSTALK] >> I was thinking of someone in prison or [LAUGH] >> Anyway, the point was about dealing with that complexity. I mean, it's always a firefighting drill, it's non-stop. How much capacity do you have? How do you move VMs around? And networks, things go down all the time. The problem is that if, as you know, one router goes down, one switch goes down, then entire massive sections of the cloud come down, which means all these properties, everything goes out. So, as we progress, it's becoming ever clearer that we should have done this right in the first place, which is that we should not have hacked it all together the way we've hacked it all together. We should have had some sort of discipline in the way we did it, the way we approached it conceptually, and so this is what's going on with network verification now, for example, right, all this work that's going on. >> Yep, and I think the other important point about cloud companies, Microsoft included, is that the network is just one piece of the infrastructure, and in every other part of the infrastructure you're allowed to write your own code and touch the code, and yet somehow the network is sacrosanct. The network's in the way and you can't fix it, and so people just defend it. Whereas, I think, if you work at a networking company, that's just so normal to you, but if you're in a company where networking is just one of many pieces, that's just such an anathema. >> Besides, what I wanted to say was that this will keep us busy for a while. There's no question about it. >> Yeah. But still, there are other aspects to things too. Like, for example, latency. Things are getting faster and faster, but we don't really know why we can't get to the speed of light. Like, why can't we move packets around fast enough to actually be able to get to it? So there are all these other things that require attention, but bandwidth is where our attention is right now; the whole community seems to have that focus. When you get to things like that, I don't quite have clarity on what sort of interdisciplinary stuff can help there, and things of that nature. >> One thing I've not worked on that seems very natural also is control theory. I mean, I talked a lot about control loops in this talk, but I've never actually collaborated with someone in control theory, and that seems like a missing link as well. Although in the networking community there are a handful of people who do, like Muhammad at MIT, who has certainly done a lot of work. >> Yeah. >> Including work he did here, that has that flavor.
So there are certainly people who do it, but I think that's an underexplored area in networking for sure. Because everything we do is control loops, and complex interacting ones, and when they interact in ways you don't understand, all hell breaks loose. Yeah. Any other comments, questions? Good. Thanks so much. >> Sure. Thanks. >> [APPLAUSE]
Info
Channel: Microsoft Research
Views: 1,060
Rating: 5 out of 5
Keywords: microsoft research, interdisciplinary research, computer networking, innovation, networks, internet
Id: VEEPJjg2pkI
Length: 75min 7sec (4507 seconds)
Published: Fri Dec 22 2017