The Language of the System - Rich Hickey

Captions
Thanks. This is the third Conj and the fifth year of Clojure being a public thing, and I couldn't be happier to see everybody here, a lot of good old friends and new friends. I'm so excited about the vibrancy in the community and obviously the creativity of everybody involved, so congratulations on what you're accomplishing.

Now, what I've been accomplishing is something I call TBD, and I'm a little bit frustrated because my thing leaked, you know, like one of those Apple keynotes. So TBD, what's it mean? To Better Do, and that should have a little trademark. To Better Do is a new, massively parallel, concurrent, AI-driven to-do list application, and our trademark is putting the personal back in P Mac. That's all I have. There'll be a GitHub repo tomorrow with nothing in it, and that will probably be all it will ever be.

So today I'd like to talk about the language of the system, which is a title that may not convey anything in particular, but hopefully it will make some sense by the end.

One of the things I think happens to us all, especially as enthusiasts of languages: some people use their language like it's just a tool or whatever, and then you find something that you really like and you become enthusiastic about it. You look forward to enhancing it, or making libraries for it, or making things to interconnect with other things, and you sort of define your world synonymously with the world that's implied by your programming language. It's impossible to avoid this, right? Because the semantics of a language eventually pervade your brain. We say things in these conferences that people from outside the Clojure community would be like, "How come you can say that?" Nobody says, "Aw yeah, you know, it's all data, it's all data." "Oh yeah, I know it is, I hear you." So a programming language defines the world. I'm going to say language here, and I really mostly
mean sort of the language and the corresponding runtime, because with a lot of languages, at the bottom the primitives are kind of the same: there's control flow and things like that, and the runtime sort of enhances that with a bunch of other things. But we get involved in this programming language as the world, and then of course if it's a functional language like Clojure, we get even more involved: wow, this functional part, this is the good world, this is the world I really want to live in, and everything else is sort of like the ick. So I have the good world and we want to minimize the ick, and we call it I/O or something like that, and by painting it as I/O we would almost like to make it somebody else's problem. Haskell is really good at this: there's a monad, it's like "stay out," and it stays over there. We don't really force that, but by convention and discipline we try to do that. But it's important to note that it's never been Clojure's approach to imagine that that part of your application was not important. The whole existence of the state model is there because actual programs need to do interactions with the world, need to affect the world. If you're not affecting the world, I don't know why you're writing software. So it really is important.

So if we look at what constitutes a language, and again, language plus runtime, we get all of these facilities. This is in no particular order, but some of the things that really matter when we start talking about the bigger picture, as being either present or missing, or where the analogies either hold or don't, are things like a memory model. We have this presumption in Java. Maybe in Clojure you're isolated from this, but as the author of Clojure, as the author of the primitives that guard state and memory transitions, the existence of a memory model in Java is super critical. It's a big, big promise, and the fact that
it's present, and it's true for all libraries, written in Clojure or not, that are running in the same runtime, and that that's based upon a resource management structure like a garbage collector that's shared, is a gigantic suite of facilities that's common both to your language, other things written in the same language, and things written in other languages.

Calling conventions: who even knows what a calling convention is anymore? C programmers remember calling conventions, because you had all these choices. And maybe even in the absence of who's pushing what at the stack level, we still have conventions around deciding whether we pass values or references; even in Java, though, that's sort of disappearing. But that would be one aspect of it. Resource management, like I said, is mostly in the memory space; we know eventually the runtimes and the languages stop helping us with resources outside of memory. There's all kinds of coordination: we have monitors, we have volatile and things like that to interact with the memory model, that help us coordinate things. And again, that's sort of embodied in the primitives in Clojure: swap! and things like that are coordination primitives that rely on coordination primitives down underneath. And of course, probably the biggest things that we derive from languages as we touch them, the ones that are more fun (again, there are the primitives for control flow and whatnot), are the tools for abstraction and/or type stuff. Some languages emphasize this more than others, and Clojure probably does not emphasize it nearly as much as some others.

So that's what we talk about when we talk about a programming language, typically. When we talk about a system, we're talking about something bigger than a program; in particular, I'm talking about something bigger than a program. The roots of the word system are in "stands together," and by that I
think the interpretation I would take is that one leg of a stool is not a particularly useful thing, and a stool with two legs is dangerous, but when you compose enough of the pieces you end up with something that performs a useful function. And it's actually these systems that most of us deliver. How many people have a main product of their effort that is a single program that doesn't interact with any other programs? How many people think mostly what they do is build systems or parts of systems? Right. So we do that, but the programming languages pretty much stop before the system. In other words, a system is a composition of things whose language doesn't know anything about systems; it doesn't say anything about systems, this ensemble of programs.

Of course there are lots of ways to build systems, and I'm going to try to narrow the scope of that, because in the old days any two programs could talk to each other in any particular way, and that's a system, and it is still a system. I think over time we've gotten more disciplined about how we build systems, and now we tend to think of systems as compositions of programs that offer services to other programs. That's an analogy we can draw out of what we do inside programming languages: you get libraries that give you services as you consume the library, and then in the process space you have services that you can call, and they have certain APIs, and you call them, and that's what happens. But there are many things about systems that are very different. In particular, there's no global supervision anymore. A lot of what we get inside the language is not there: there's no global resource manager, there's nothing watching everything, there's nothing that knows everything that's going on. It could be more than one process in the same box; it could be more boxes. There's no, like, person in charge of the internet making sure everything is okay. And
the question is: how do we connect these pieces? The premise of this talk is that there's a way to talk about how we connect these pieces that draws analogies to the way we talk about how we connect pieces inside programming languages. It both informs the design of systems and, I think, goes the other way: systems should help inform the design of languages, or the use of languages.

So when we say language, what do we mean? The root, again, is "tongue"; it's obviously about communication. Everybody knows the old saw about programming: you think it's about talking to the machine, and in a certain sense it is, but it's certainly also about talking to other programmers. You write a program, and the other programmer could be you, later; ten years later you look at your code and you're like, "Wow, who said that?" But I think it does split out a little bit. In all cases, all programming language, and all the use of language I'm going to talk about, is somehow about programs talking to programs or programmers talking to programmers. But inside a programming language there's also the other aspect, which is the programmer talking to the machine: do, machine, make this happen, do this stuff. A very interesting, different characteristic of the communication that occurs between programs in a system is that the language used there is a language for programs to talk to programs, almost definitely. It's extremely rare to see the interface on a service be one that's oriented towards people, or at least oriented towards people and human interaction. It's fundamentally oriented towards a program talking to a program, and that's going to become really important as we move forward.

So one way to think about these two things is as stacks: stacks of specificity and hierarchy and encapsulation. At the bottom of a programming language is a bunch of primitives: language primitives for
control flow, for memory acquisition, and things like that. On top of that we have core runtime facilities and core libraries and/or libraries from third parties, and then finally we build our application libraries and our applications on top of that. That's all inside the program view.

If we look at systems, I think it's a little bit harder to tease out what the primitives of systems are, but certainly if you start with the communication side you end up with two very evident pieces to the language of systems. One is the protocols: UDP, TCP, HTTP, WebSockets, all these things, the negotiated transfer primitives that we have. The other is the formats: what do we say over these protocols? I think that's pretty evident and straightforward, although I will talk more about formats, but not at all anymore about protocols. The analogy to the next level up, though, I think is an area where we're particularly weak in having good language for it, and that's where the focus of this talk is going to be. And finally, somehow, at the top we end up with either portions of applications or entire applications acting as services and/or consuming each other as services, and that's a system. Of course there's a joining here, because those things that are the applications on the right were written using the stack on the left, but the stack on the left usually doesn't have a lot to say about the stack on the right.

So the first thing we have to talk about: again, we talked about protocols and formats, but formats are huge. How many different ways do we have to talk over these wires? What are we sending? XML; JSON is probably the big winner right now; protocol buffers; and then of course quite common in this room would be edn and Clojure data. But there's also Avro and Hessian and BERT. How many know what all of these things are? Not too many. How many of those people could make
a matrix as to why one is better or different than another? And yet this is actually pretty important; this is what we're going to be saying from one process to another. It's a huge thing, and it's full of decision points. I think one of the things that's really cool about it is that all of these things are representations of data. What's not up here? What key Java technology for things talking to other things is not here? RMI, right. RMI, yeah, big winner. How about DCOM? CORBA, anybody? Okay, these are not even on this list. They all lost, and they all lost for really good reasons, so we're not going to talk about that. We've already reached a point where every single one of these choices is a data format. So already we've got this great premise: the way services are going to talk to each other is by conveying data, not through some hyper-linguistic or extended linguistic thing where there are all these extended verbs and there's a notion of a program object being on a different machine and things like that. We're just going to talk with data.

So we have to split out what about the data is good or bad. What are the decision points? One is extensibility: given this format, if I have a new thing to say to you tomorrow, is there a way for me to encode that? If there's not, it's not extensible. Which of these things on the list is not extensible? JSON, there you go. That's really not good, and it's at least a couple of problems we'll get to later. There are two notions of extensibility: one is to new types, the other is to new versions. There's a sense in which, for instance, protocol buffers are really mostly about being extensible to new versions. You can make things go to new types, but an existing consumer can't really be aware of those; they can be tolerant of new versions, though.

Self-describing: which of these things is self-describing? XML, kinda sorta. What else? Not protocol buffers. Avro, edn,
Hessian and BERT, and Erlang terms, which is what BERT is a flavor of. What does it mean to be self-describing? It means that if I have a decoder that understands the rules of the format, I can read anything that you said, and I don't need to know anything else out of band; I don't have to get a description any other way. That's not true of protocol buffers. If I start streaming you protocol buffer stuff, it's like gobbledygook if you've never seen the schema. And where is the schema in the protocol buffer stream? It's not in the stream; it must be transmitted out of band.

So we get to this other part, which is schemas in or out of band. Of the ones that are self-describing, one of them has schemas. Which is that? Well, that's optional, though, but one has one that's required for reading them. Not protocol buffers: Avro. Avro has a trailered schema thing. So you have this question of schemas in or out of band. Avro has schemas, protocol buffers have schemas; Avro's are in band, protocol buffers' are out of band. But both of those have more requirements on the schema interpretation than something like edn or XML. Of course, XML you can definitely read; you may not understand it, but you can read it without anything, and if you have schemas, they're sort of optional.

Why does it matter whether schemas are in or out of band? I mean, it's on the slide. If you have out-of-band schemas, what can't you have? You can't have these things: generic processors and intermediaries. It's really interesting that Google came up with protocol buffers. Imagine if the internet was built with protocol buffers: how good would Google search be? It would be bad, because they're in the intermediary business. They are taking advantage of the fact that any HTTP/HTML processor can read any HTML. If everything was a negotiated contract, it just simply wouldn't work. So you really have to understand, it's not to say that protocol buffers are
bad; I'm not saying that. What I'm saying is that there's a spectrum of choices and trade-offs that's really important here. It's as important as choosing a language when you pick your programming language, but picking any programming language still leaves you with this decision when you move up to the system level. Of course, a lot of times this is not your choice: you're consuming a service where somebody else has made the choice. And that highlights the next problem within this space, which is that there's nobody in charge. When you use a programming language, the programming language kind of sort of says, "We're all going to pass arguments like this, and we're going to define our types like that," and everything else. With no one in charge, systems struggle against this set of independent decisions, which may or may not compose, and the formats problem is the first place this comes up.

So this schemas-out-of-band thing is really tricky, and that's one of the things where people are like, "Oh, JSON, I can put dates in JSON." How do you put dates in JSON? As strings. And how do you know they're there? Out of band: you go back to the napkin. It's like, if the key has the word "date" in it, then the string is a date. There we go. And there's another aspect of that, which is that it's not merely out of band. If you get a protocol buffer schema out of band, it's not a napkin; it's very straightforward. People's use of JSON is extremely context dependent, and a lot of times that context is not captured anywhere except on a napkin. It's like, okay, we've all agreed to send this, and you know this is coming, and therefore you're going to go to the "last edited" field, and you happen to know that "last edited" is a string that has a date in it. So that context sensitivity is really bad.

So obviously in this room we don't have to talk about the value of values; we like values. And I think the only thing
to do here is to again think about the differences between programming languages and systems with values. We definitely have values in systems, at least at one level, on the wire. We just looked at all the popular formats for transmitting stuff; they're all data formats, they're all values. We're not really passing a reference to a guy that you're then going to call back on, as with an RMI interface, to go get more stuff and have this big chattery communication with objects. We just convey the data that we care about. So that's fine. Those are ephemeral and they're usually nameless, and in programming languages values are often nameless too. We have the same notion: we can pass values, we get a value as a return from a function, we just have it, and we start processing it. Java is not a particularly strong language for values, because almost everything is a reference type, but in languages that really have them as distinct things, a lot of times values are completely anonymous. You have an array of structs; none of the structs have names.

If, however, you want to have a value in a system that is not ephemeral, that means either it's large, so large that I don't want to put it on the wire and send it to a hundred people, I want to put it somewhere and let people know where it is, or I want to have memory in a system: I'm going to remember a value. In both those cases you end up incurring a new thing, which is that your values need to have names. That's a definite change versus your programming language, and it's one that really matters, because until we become more cognizant of when we're manipulating values, and that this is a name that names a value, we're going to keep making these icky, messed-up systems that don't distinguish references from values. For instance, how do you know when a link is a permalink? You only know the link is a permalink because on the web page where you got it, it said "this is a permalink." And so when
designing a system you need to be more considerate of this and call it out.

So that brings us back to names, and again here we have this difference. Inside a program we have all these great scopes. I'm in a local scope, I have a let; nobody knows about this. Now I'm in a function; that's also sort of cool. And this function is in a namespace; that's also sort of great. And then the namespace is on GitHub, and then what happens? Then we're all fighting for cool names on GitHub; we've used up all the characters and all the stars and robots and names of food. So it's really critical once you lift up to a system, where nobody's in charge anymore. What's true of most system names? They're global, or at least potentially global, and you really need to think about that. You need to be considerate of the fact that as you start building systems, as your names start escaping out of your processes, they are global names. And the really tedious things, like Java's com.whatever.whatever, that stuff matters. Because what's com.whatever? Where did that come from? Somebody who's in charge; there is somebody in charge there. In the absence of that it's a free-for-all. So those DNS names and whatnot become critical, and using fully qualified namespace names that are truly global names is an important discipline for doing systems.

But it's also interesting to think about how different the names are. What are most of your names in a program? Say in a Clojure program, 99% of your names are one of two things: they're either locals or the names of functions. We have a huge number of names dedicated to functions in our programs; that's where most of our names go. They're mostly verbs. What happens in systems? Who likes to work with a system that has a ton of verbs? That's really interesting, right? Why is that?
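
Editorial sketch, not part of the talk: the point above about names escaping a process and becoming global can be illustrated with a few lines of Python. The `qualify` helper, the `com.example.*` namespaces, and the two records are all hypothetical, chosen only to show why a DNS-style prefix keeps short names from colliding once data from independent services is combined.

```python
def qualify(namespace: str, record: dict) -> dict:
    """Prefix every key with a namespace so it stays unambiguous outside the process."""
    return {f"{namespace}/{key}": value for key, value in record.items()}


# Two hypothetical services that both use the short local name "id".
billing = qualify("com.example.billing", {"id": 17, "total": 99.5})
shipping = qualify("com.example.shipping", {"id": "pkg-17", "carrier": "acme"})

# Merging the unqualified records would silently overwrite "id";
# with qualified names, both survive side by side.
merged = {**billing, **shipping}
```

Inside each service the short name is fine, just like a local in a let; the prefix only matters at the point where the data leaves the process.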
There are all of these inversions as we get to systems, aren't there? We had lots of names for verbs; now hardly any. We had this global control; now we don't have global control. And we are going to have a lot of names in systems that are going to be used for other things, probably not verbs: machines and things like that, storage locations, and then these values are going to need names. Another critical thing.

So systems look like this. Every process has a number. No, obviously they don't; this is me being lazy on Google Images, looking for something that has circles and lines. It's faster than me trying to learn how to do that in Keynote. Does anyone know how to make a line connect to a thing and stick? Like, I move the thing and the line is just sitting there. Can you make them connect? No? I can do it there, but then it's like two things. And then there's the internet; it has this picture. So ignore the numbers; the numbers are not important. But by and large, systems have this shape. It's fundamentally hierarchical. It's not like everybody's calling everyone in this big nightmare. It's generally some things call other things, which call other things, and come back, and there's some sharing across, maybe a couple of lines across at a level, and there may be one guy at the top. From your perspective, that's you, who gets to consume all this stuff; maybe you don't serve anybody else, it depends on how you're situated.

But the critical thing here is that while each of these things in its bubble might make a ton of sense, maybe they're written in Haskell and it's proven that they're correct or something awesome, as soon as you start drawing lines between them, what happens? All sorts of new implications about what things mean have arisen, have emerged, from the connections of these things. And it's different in some way from consuming libraries. You might look at this and say, well, it's not different from libraries. Oh, I have
libraries; it's the same thing: they wrote the library and they did whatever, and then I'm consuming it. But what do the library and you share? A ton of stuff, all that runtime stuff. You share all kinds of presumptions about memory, coordination, locking, threads, garbage collection, the whole nine yards. What do you share between these things? Some wires, routers, and things like that. So the question is, where do the semantics of a system come from? What does this mean? How can we define the pieces such that we can get a grip on what this is? Usually it's hierarchical, but that's not enough to really understand it.

And this is where I think we really run into trouble; this is where the problem is. What does that look like? It looks like object-oriented programming. All these objects are connected, and they send stuff to each other, and whatever. And it's possible that this system built out of all these processes is exactly like objects at scale: every process is like an object, and it's stateful, and it sends things over to other guys, and then they change, and the whole thing is really exciting. Because service is an arbitrary notion. What does it mean to be a service? You send me stuff and I do stuff. One thing that's sort of telling is that there aren't a lot of verbs, which is kind of good, but all the services are still nouns. The fact that they don't have a lot of operations is helpful for saying, well, maybe they're not like objects, but there's nothing stopping them from being objects. So that is crossed out, right? Yeah.

So in what way is this not object orientation? How do we keep it from being object orientation in the large? Because if we spent all this time doing functional programming in the small only to build object-oriented programming in the large, then our system in the large is still going to have the negative attributes of object orientation. So I think
one way to think about this is to think about machines and production lines and things like that. What we're trying to do in the next few slides is to think about a way to organize this. Obviously we're saying change happens. We know that this is a dynamic system: it's producing stuff, it's affecting the world; that's the point of it. So we're not going to try to deny that. But what's a way to organize it such that we don't end up with object mess? One way is to think about it like this production line thing. What does a machine do? A machine applies forces to accomplish work. Now think about a car factory. What happens in a car factory? Well, people go in there every day, and they work real hard, and they mutate the state of the car factory, and then they go home. That's like objects; that's like an object-oriented program. Maybe, you know... no, it's not like that, right? There's one end of the factory and something comes in there. What? Raw materials, parts, iron and tires and stuff. And then something comes out the other end. What? Hopefully cars. And so this notion of a flow, I think, is the key to keeping a system sorted.

So there's a bunch of characteristics that you can combine that will, even though technically a certain percentage of them are not functional, accomplish something in a way that is not place oriented. You may have heard me talk negatively about place orientation: we all went into the factory and had a good time and went home, and the factory is now better. That's place orientation, and this kind of flow orientation cures that. So what are the things that we have in a flow? We have transformation: one of the things we're going to be doing is transforming values. I'm going to take the lugs and whatever things go in a tire, and I'm going to screw them together, and now I'll have a wheel instead of the parts of a wheel. We're going to move
things from one place to another. We're going to route them: maybe it needs to go here or there, and we're going to have decisions about that. We may remember things, and again, the word "remember" is a term that is not incompatible with functional programming in the way that "update" is. And I think the critical thing to making systems out of these parts is that, as much as possible, you keep them separate. In other words, when you make a transforming-moving-routing-remembering thing, it's really going to be hard to keep that from being something you can't take apart and reason about or combine with other things. So even though each of these steps, I think, has a sound use, if you were to put them all together in one thing, it would not be sound anymore. So you want to keep transforming separate from moving, and moving separate from routing, and routing separate from remembering. This is the difference between flow and places. Move and route and remember are not strictly functional, but that's okay; we know we need to affect the world.

So, transformation. This is the thing that's easiest. We know transformations are just functions; it's basically straightforward. The only thing here is that generally there may be some input to the function which is now not just a local input from a call in a programming language, but is coming over a wire, and there's output over the wire. The thing that gets a little bit trickier with functions at the system level is that sometimes you need to convey information off the wire. I need to put it in a database so that you can see it later, and I'm not going to actually put some huge thing over the wire to you in every message. In that case you now have this stranger view, where I need to run this function and what I have is not the value but the name of the value, and I'm going to try to distinguish the name of the
value from a reference, because they're actually different. So sometimes you work to and from storage. Otherwise, though, it's still functions; this is straightforward, this is not hard.

Now we get to moving things around. I think one of the things in Clojure that maybe I didn't make clear enough, because I didn't need to wrap them, is that the queues in java.util.concurrent are awesome. If you're not using them as part of your system designs internally, you're missing out. And in the large, queues also rule, because they have this really great characteristic: they're completely decoupling. What happens with a message? A says something to B. When A says something to B, what does A need to know? B. That's a problem. If A puts something on a queue, who gets it? Don't know. So that decoupling is really good, both in the identity of the consumer and also in the availability. If I put something on a queue and the supposed consumer is not running, do I care? Not usually. There may be backflow or some other kinds of considerations, but the availability of the consumer is also something that you don't care about. Again, with a directly connected message, A said something to B; if B is not around, that's now a problem for A. If A puts something on the queue, presumably, if you can make the queue more available than B, you get this independence both in the identity of the consumer and the availability of the consumer, which is extremely strong.

The other great thing about conveyor belts and queues is: what do they do? What's their job? Move stuff. What's their other job? There's no other job; that's all they do. So it has that characteristic we had from before. When you get to pub/sub, you end up with routing and moving, and they're both on this slide, but that's really strong. Queues are extremely important. Queues are decidedly different from messages for those reasons: messages need an available consumer, and you need to
know who you're talking to. It's architecturally completely different. All right, now memory. This is the part that's really tricky, right, because you do not have a ton of great options for memory that are not place-oriented. There's a new thing that's kind of good for this, but you don't even need to use that. The key point I want to make here is that the epochal time model, the one that's behind Clojure, works in systems. It works at the system level. I'm going to show you the picture again later, but the basic idea is that we have reference types and we have values. The reference types only ever contain values; they only ever just point to values, and they have semantics about how they transition from one value to the other. There's nothing about what I just said that is about Clojure, that is about memory, that is about locking. There's a little bit that's probably about CAS, but not CAS on the chip. It's a very, very general notion, and Datomic implements that notion in the large, but you can also implement it yourself, right? You're going to need to combine a couple of things: you need to combine naming values with some sort of reference and some sort of à la carte coordination. So this is my old slide of the epochal time model. Clojure implements this, right? We know atoms are this, refs are this, agents are this. And we can do this ourselves. What we're going to say is: we have a reference, it takes on different states over time, and each of the states is a value. You're able to obtain the value out of the reference as an independent thing. And we just said before, about values in systems, that the ones you're going to need to get a hold on are going to need to have what? Names. They're going to need to have names. That's what's different. And then we can transition from values to values. We can see this in action in the way Datomic uses ZooKeeper and things like Riak or S3. Riak and S3 don't have the semantics required to do the state succession, right? They don't have
what you need to do that. You need something along the lines of either CAS or versioned updates or something like that. But ZooKeeper has that; it has versioned updates. So you can combine them: you can implement something like refs in ZooKeeper that point to values that you store in something like Riak or S3, or some store that doesn't otherwise have the consistency or the ordered transitional semantics. You can pull tools like that out right now and do this for yourselves. So the important thing to note is that the Clojure state model is available at the systems level. You do it this way, and the only thing you have to do is put names on your values. What's a good name for a value? "UUIDs." No, that's good. Stu's always my spoiler. Yeah, UUID. What's not a good name? "Fred", "I got this from wherever", or anything else, because what starts to happen when you have those kinds of names is that people start to care about them. What should you care about in a value name? Nothing at all. Also, in a lot of the places where you're going to be putting values, you really want to be conflict-free. You don't want to have to coordinate on a value name, whether this is Fred 27 or Fred 217 or whatever. You just don't want to be there. So UUIDs are a good thing to use to name values. You don't care, because that's not the identity, right? What's the identity? The one over here, which you're going to have very few of. So for instance, in Datomic you can have hundreds of millions of items. You know how many refs you're going to have in ZooKeeper for a database? Three. You know this now, right? You build systems in Clojure; how many refs do you end up having? How many atoms? A tiny, tiny amount. Probably the best thing about Clojure is showing people how little of that you actually need. It's the same thing here. But the strong names, the globally qualified, namespaced names, will be the identity names. It's really important that they be like
that. The value names you want to be conflict-free, tear-off names that anyone can create without coordination, and that's what a UUID is about. All right, of course, my favorite topic: errors and error messages and whatever. So there's a really important paper at the bottom here, and if you read this paper over and over again, which I recommend, you're going to see a couple of facts about systems. It's another way in which systems are really different from programs. In a program, are you really afraid that some object you're going to call is not going to be there? No. The whole program tends to be around or not, all together; it succeeds or fails all together. We get all confused because we live in this bubble. It's like, "Well, errors are when I made a mistake." That's not right. That's just programmer-convenience thinking. In the real world, failures are there all the time. The things that you depend on are possibly not there, all the time. A large system is in a state of partial failure almost continuously. The math is against you for having all of your 10,000 machines always work all the time. So parts of your system, when you look at the whole thing, will not be working. It also means that those things that are not working will not be available. Those failures are going to be uncorrelated; they're going to be completely independent. You are still fine, but somehow the thing you're talking to has become unresponsive or unreachable or whatever. And it starts to give you a whole new way of thinking about dealing with failure. Because the things you're talking to are unreliable, you have to use timeouts. You have to retry. If you're going to retry, well, you have this open question: I might not have heard back from you, but you might have heard my original request and done it. So I need to know that my future requests are idempotent. Who is worried about that when
you're working on stuff in memory inside your program? You don't worry about these things at all. But the thing is, as soon as your program becomes part of a system, these error modes are going to go right through your program. You're not going to be able to deny them, and you're not going to be able to convert them into something else. You can't fix them, right? They go right through you. As soon as they go right through you, you realize that distributed error modes are the only error modes. Everything else is just programmer-convenience error-handling stuff, but it's not really what the system's error modes are about. So I definitely recommend that you read the paper, because you can't think about it often enough, and it really is difficult to internalize. You'll still write systems where you presume the best, and then you're like, "Ah, the best thing is not going to happen sometimes." The other thing about systems is that they're dynamic, and they're dynamic in a whole bunch of different ways. They're dynamic in membership: we just said some machines come and go. Sometimes they'll come and go on purpose, not because they failed but because somebody started some more machines. They'll come and go for capacity, as people try to scale. They'll also come and go for capability: the system will be running, and all of a sudden somebody wants to do something new and they'll start up new stuff. Systems that can become dynamically capable of doing new things are really strong systems. It's the kind of system that you want to pursue. And so all new kinds of terminology are going to come to bear at the system level that you don't have inside, right? You can't scale one box, but you can scale a system. There usually aren't the same notions of discovery inside, except somewhat, maybe, if you're talking about injection and things like that, but the true notion of discovery is a distributed thing. Elasticity is the same kind of thing. So we know that systems are dynamic. That has
implications for the programming languages. So there's a holistic approach to this, and there's a great example of the holistic approach, which is Erlang. Erlang is a language of the system. It takes the approach of saying: I am only going to be building systems, I know that up front, and I want these semantics inside the processes. I don't want different sets of semantics. I don't want my bubble semantics and my system semantics; I don't want my bubble interfaces and my system interfaces. Just so you're not worried, this is not where I say we should all switch. I just saw everybody go, "Oh my god, did he change his mind already? It's only been a couple of years." No. So there's nothing wrong with the holistic approach. In Erlang, the fundamental units of programs are services. They call them processes, but they're little services. They have communications capabilities, and they follow all the things that we talked about before. In particular, it's not like RMI. Those little services are not like objects. They send what? Messages, which are data, right? They're data. It is, though, custom communication that they use, and there's a very specific model baked into the language. They basically said: we are going to do actors, we are going to do asynchronous send only, receive asynchronously, no synchronous communication; RPC you have to build out of pieces, and things like that. So it's a very, very specific model here, which I think is extremely well suited to making communications programs. But what's the trade-off with the holistic approach? Is Erlang a great number-crunching language? No. Is it really expressive in certain kinds of domains? No. It's good at some things and less good at other things. It doesn't have a rich type system; it doesn't have a rich abstraction model, or other things. So the trade-off of a holistic approach is that you put all your eggs in one basket. I think the fact of it is you're never going to be able to dictate to everybody to use
Erlang, or to use any one thing. You can't say we're all going to do our programming in this one language. That's the whole "there's a king of the world" thing. Inside Ericsson, maybe they can do that, because everybody's going to do Erlang. But in the world on the whole, I don't think you can sell holistic approaches. You can't convince everybody to use the same language, even if it's better. So that leaves us with the heterogeneous approach. We have to have some sort of cross-language notion of how to talk about things, how to express the semantics of systems, and a language of systems that crosses languages and runtimes and platforms. As I said at the beginning, we know parts of that language are protocols and formats, and I think the third part, the thing that fills in this box, are things I'll call simple services. A simple service is a service; it's its own process. It does communication using data. It should have a very small surface area in terms of the API: if the API is mostly data, it should have an extremely small number of verbs associated with it. And it should do mostly one thing. You'll see that a lot of the facilities of programming languages and runtimes are now available as services. So we have queues, right? We have the java.util.concurrent queues, and then how many message queues are out there? Tons, all with different characteristics, and you'll make different choices, but there are plenty of message queues that are dedicated to that. Now, unfortunately, this says "simple", and if I knew how to use Keynote that would be blinking and, like, on fire. I saw that fire was good. That's super important, and I think one of the challenges for this approach is that invariably people would like their service to do some more, and making it do a little more all of a sudden breaks the simple part. So for instance, queues usually have very, very icky durability things. Once they start to get into that space and
all of a sudden, wow, this is not simple anymore. Coordination: things like ZooKeeper are extremely interesting. If you've not used it or something like it, it's very cool to think about: all I have over here is just coordination. And you have to constrain yourself to that, because, again, ZooKeeper is durable, and you can try to treat it like a database, and now you're trying to make it do more stuff and not using it simply, because it does do more. If you treat it simply, it's a fantastic little utility just to do that part of the Clojure state model, or the epochal state model. Control flow: you have things like Amazon Simple Workflow and Storm. We just saw an example of Storm before. Look at Storm. What is it? It is what I'm talking about: it's flow. Although, again, it sort of says, "This is the recipe that crosses all the pieces," as opposed to saying we're going to compose queues plus arbitrary consumers of queues and other queues. It sort of says, "I want to wrap around your whole thing and I want you to play this coordinated game." So again, it's less simple than it could be, but as an architectural strategy it's an example of what I'm talking about. It's flow-oriented. We're used to memory services, right? Memcached is a beautiful thing. People say, "Memcached, blah blah blah." Most of the problems with memcached are that people are using it to solve the horrible problems of using place-oriented databases. That's a sucky problem; that's not a suckiness of memcached. Memcached is brilliantly simple. It does exactly one thing. Of course, they keep trying to make it do a little bit more, but the one thing it does, it does really well. So that's shared memory. Redis is another popular example. Again, hopefully they keep it simple, and to the extent they do, it's the kind of thing you can compose together. And of course storage has exploded. S3 is global shared memory. It's an awesome thing, except what? Shared memory is dangerous, right? But we know how to make shared memory safe. Clojure
has shared memory and uses it. In fact, it's quite fundamental to Clojure that you have shared memory, and shared memory is important. You just have to be careful in using it. If you combine references with immutable objects, you can use S3 just as safely; you can use a key-value store just as safely, in exactly the same way. The only trick there is that the transitions of the refs need help from things like ZooKeeper. But moving up the stack, DynamoDB has those semantics built into it, and a lot of the memory caches, like Infinispan, have it built in, so you can get both together, like we have in memory, in systems. So I think we want more of these, and we want them to be smaller still and to do as little as possible. I think one of the problems we have here is that there is something we really like inside our programming languages, an important tool, which is the interface or the protocol. It's the thing that abstracts away from us the details of what we're talking to. Where is the interface for S3? In a different audience there'd be people gripping the arms of their chairs, like, "No, we've solved this! We use WSDL, and then I use a BPEL thing and I draw these pictures," and we're just naive in here because we like to build things out of smaller parts, when with this we should be up there by now. I mean, there are things like that, but they don't get used. Amazon did not use WSDL. Maybe they tried; did they try early on? In the same way, I can't remember whether there are any schemas anymore; there used to be. Right now it's just, "Go read the docs, try it, and when you get it right you won't get a 404." So you just don't see it anymore. What you see now instead is that S3 is so dominant that when OpenStack wants to have the same kind of service, they don't have any abstraction to tap into to say, "We also implement that abstraction." What do they have to do? They have to directly imitate
the protocol of S3. This is not a great place to be. The same thing has happened with memcached. People are like, "Oh, memcached is cool," and other people are like, "Well, I have this other cool distributed redundant memory cache." "Well, I use memcached." "But this is more better, in this way." What do they have to do? Mimic memcached on the wire. This is really a bad thing, and I don't know what the answer is, because I don't think WSDL and things like it are the answer either, but it leaves us in a difficult place. This is an area that we can repair inside the programming language. There are all kinds of variants of "put stuff in a place" like S3. Some of them mimic S3 and some of them don't, but something like jclouds can go and isolate you from that. So it's superimposing abstraction. Now, there are two ways to think about doing this. That superimposition of abstraction happens where? Is it a service? Who knows what jclouds is? All right, a fair amount. So jclouds is a library. It's a Java and Clojure library that has an encapsulation both over the EC2-like elements of cloud services and over storage. Let's just think about the storage right now. There's this thing called BlobStore, and it abstracts away the details of connecting to S3, or connecting to OpenStack's stack, or to whatever VMware sells or whatever another vendor has. So they've given you abstraction inside the language. If we don't want to do this inside, what do we end up with? What's the system version of this? A proxy. And that you tend not to see. Why? It's a hop, right? It adds a hop. But it's still tricky. We don't have interfaces, and I think we're suffering. So what can programs tell systems? What can our systems learn from our programming? One is that we need more values. Values need to be first-class. We need to name them. We need to start using that epochal time model in our systems designs. You can do it yourself today; I just
showed you three ways to do it. You just have to choose to do it. You have to take this flow orientation. This is something you may or may not be using. People talk to me a lot about Clojure, like, "I love Clojure. I have the functional part, I think I'm getting a grip on it, but every time I try to get to the state, even if I use the state stuff from Clojure, I still end up sort of struggling with a model for the whole thing." The model is this flow model. Just flow values around. Use queues inside your application. It's not like this is everything you need to do, but you can do a lot by just emulating this inside. And of course, if that's your best practice inside, it's nice to convey it out. This is the way you're going to get more reusable things, and things that are easier to compose. I do think we're struggling with any kind of abstraction. We know it's good, but we don't know how to do it at the system level. I think the biggest thing we suffer from here is, well, the A side is: how does somebody else provide a service like S3 and let you use it? But the B side of it is: what if you're trying to be a service, and you're trying not to build durability into yourself? You'd like to be playing this game well and saying, "I'm componentized." Well, in a programming language we totally know how to do this. You say, "I'll work with anything that implements this interface, or anything that implements this protocol." We have a way to say that, and the person who wants to compose you with something else has this recipe for doing it. Now, what's the system's way to do that? What's the system's way of saying, "I'm parameterizable in my storage"? It's really difficult. A URI is not enough; you need to know what method to talk over. So what ends up happening right now is that your service needs to embed something like jclouds, or an implementation of an abstract thing, and you need to individually support what your users are going to need, or provide an
extensible mechanism. But you're doing it inside yourself, as opposed to saying at the system level, "I have a way to say this is an interface that I use, so that you can plug in the kind of storage you want with me." So we're suffering there. There are great papers, great old papers, that say do not try to make a distributed system like your programming language, and they're totally right, especially at the time they wrote them, when objects were hot and people were trying to do CORBA and things like that. Terrible, terrible idea. But we should still be able to pull some things over. Functional programming is important, and I think it's not done a lot in systems. What can systems tell programs? Well, one thing is this machine-like thing. Maybe it's easier to see when you have wires. It's quite obvious: the only thing I can send over the wire is a value. It's in XML, say; I've chosen to use that. Well, in this audience I don't need to say this, but in Java people have a real question. They don't tend to send data structures around in their interfaces the way we do, and they have this real choice: "I can send a data structure, or an object that has all these verbs and knows how to do stuff and changes and dances. I might as well send that; it's only one argument, it's a lot easier, and I don't have to type. In fact, IntelliJ will just type it for me." So I think in Clojure we're kind of spoiled, because we do this all the time. But if you're trying to talk to somebody else who is building a system about why maybe they should bring this architecture inside their program, you have to make the rationale from the systems level: "This makes sense in systems. You explain to me why it doesn't inside the program, because I don't understand why it wouldn't." The other thing is that programmatic, program-to-program
interfaces rule. Where do we suffer when we don't do that, when we only define a human interface, or we define a human interface first? Where do we suffer? Every single time we do it. Every single time. Has anybody ever tried to write a program that manipulates any Unix program? Yeah. Is it fun? Yeah, yeah. There are parsers, you have to figure out how the command lines work, and all this other stuff. Try to manipulate Git from a program; it's terrible. I just did it. It's not fun. What else is an example of that? SQL, right? In both these cases they wanted to support the case where some person is going to be sitting at the computer and is going to want to do stuff, and they're going to go blah and go, and it's got to work. There's nothing wrong with that. That use case is important; you want to make that happen. But when the only interface you define is the one for that, you end up with no programmatic interface. So what do we have in SQL? "Well, this is simple: people will say WHERE and blah," and that's really great. And what do we have for programs? String building. We've got nothing. We have nothing to work with. So build your human interface on top of a programmatic interface, because programmatic interfaces are all you've got at the systems level. Nobody is typing into Amazon AWS services saying, "Oh, I'm going to use S3." They don't do that. So you want to have the programmatic interface underneath. The systems failure model is the only failure model. You have to look at all of your error handling from that perspective. As soon as you do, you realize there are not going to be a lot of places for the "I made a mistake" flow. It's got to be dominated by the "the system is partially unavailable" flow. Systems are dynamic and data-driven. It might be a nice idea to use a language that is also dynamic and data-driven. Again, in this room I don't need to say that. So I think people are building some great libraries. I'd love to see more
people build some services, some simple services. I think this is a tremendous opportunity area for Clojure. Clojure is really, really well suited to building these things, and if you build these things, it's going to give you the inroads into your organizations. "Oh, can I build this new thing in Clojure?" "I don't know." "Well, I built a service. You want to use it?" "Oh, well, yeah. What does it do?" "It does this." "Oh, it's nice, it's simple, it does this one thing." And we're seeing some of that, like the Riemann thing. Who even knows it's Clojure? Well, it's this cool logging thing, but it does one job and it does it really well. It's a service-like thing. There are tons of opportunities. We just saw a bunch of things that were done, and Storm is really great and things like that, but there's lots more. And when you build something like that, you're going to end up with something that's much more reusable than a library. Now, some things will have to be libraries, and libraries are great, but I'd encourage you to build systems. I'd encourage you, when you do it, to avoid custom formats. Of course, again, in this room I don't really need to say that. There's a good format, we all tend to like it, and we'll try that first. Even though you don't necessarily have a means of expressing at the system level the abstraction of your service, design it anyway. There's always all this stuff about premature abstraction, which is definitely a danger, but by the time you're writing a service, there's nothing premature about abstraction. The thing has got a surface area this big. You're going to spend time on that, and there's no problem spending time on that. It's never not worth it. It's never going to be, "Well, it's overkill; you wrapped this thing with that thing." Down in the small, in a program, you can over-abstract. Up here you can't, I mean, unless you start making a lot of new layers in front of your service. You want to have some abstraction. Consider a second implementation
over your interface. Maybe you've decided that for speed you're going to use Avro or something like that, but if you also design an HTTP interface, you'll sort out your abstraction just by that exercise. It still doesn't give somebody the ability to say, "I'm going to make something like it with the same shape," but it will make your service better. The other thing is to design your service to be composed, and again, I think this is a challenging area. Don't keep adding stuff inside yourself. You're going to make a little monolith. You're going to become a stack yourself. You don't want to become a stack; you want to allow people to plug in. If you need to store stuff, consider using something like jclouds. You don't need to do storage yourself. Disks are terrible. Who wants to write programs against disks? It's a solved problem. So as soon as you get to "Oh, I need to put something somewhere," plug in something like jclouds, or anything; you can roll your own, whatever. It has to make sense for your thing. But make it so that somebody doesn't have to say, "Oh, I'm taking you on, and I'm also taking on the fact that you store stuff over here." Don't do that. Let them say, "This is how I want you to store." Let them make things composable. Let them say, "This is the kind of queue I want you to use; this is the kind of storage I want you to use." To the extent you can do that, you'll build system components that can become parts of systems that are built of services that are simple. And that's it.
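As a rough illustration of the system-level state model described in the talk (values stored under UUID names, plus a small number of references that move between those names through versioned updates), here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: `ValueStore` stands in for a store such as S3 or Riak, `Ref` stands in for a coordination service with versioned updates such as ZooKeeper, and `swap` is a hypothetical helper. None of these names come from the talk or from any real client library.

```python
import threading
import uuid


class ValueStore:
    """Hypothetical stand-in for a store such as S3 or Riak.

    Values are written once under a fresh UUID and never updated in place.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, value):
        # A UUID is a conflict-free name: any writer can mint one
        # without coordinating with anyone else.
        name = str(uuid.uuid4())
        self._blobs[name] = value
        return name

    def get(self, name):
        return self._blobs[name]


class Ref:
    """Hypothetical stand-in for a coordination service with versioned updates.

    The ref holds only the *name* of the current value, plus a version.
    """

    def __init__(self, initial_name):
        self._lock = threading.Lock()
        self._name = initial_name
        self._version = 0

    def read(self):
        with self._lock:
            return self._name, self._version

    def compare_and_set(self, expected_version, new_name):
        # The update succeeds only if nobody else has moved the ref
        # since the caller read it.
        with self._lock:
            if self._version != expected_version:
                return False
            self._name = new_name
            self._version += 1
            return True


def swap(ref, store, fn):
    """Move the ref from its current value to fn(current value), retrying on conflict."""
    while True:
        name, version = ref.read()
        new_name = store.put(fn(store.get(name)))
        if ref.compare_and_set(version, new_name):
            return new_name


# Usage: one identity (the ref), many immutable values behind it.
store = ValueStore()
ref = Ref(store.put({"count": 0}))
old_name, old_version = ref.read()
swap(ref, store, lambda v: {**v, "count": v["count"] + 1})
new_name, new_version = ref.read()
```

In this sketch the old value stays readable under its old name after the swap, which is the point of keeping values and references separate. A real deployment would also need to deal with timeouts, retries, and cleanup of values written by losing attempts, none of which is shown here.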
Info
Channel: ClojureTV
Views: 141,510
Rating: 4.9513593 out of 5
Keywords: programming, clojure
Id: ROor6_NGIWU
Length: 62min 49sec (3769 seconds)
Published: Thu Feb 28 2013