"We Really Don't Know How to Compute!" - Gerald Sussman (2011)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
👍︎︎ 1 👤︎︎ u/lambda_abstraction 📅︎︎ Mar 20 2021 🗫︎ replies
Captions
hi what i'm going to tell you about today is that i actually think we're in real trouble that is i think we actually have the foggiest idea how to compute very well and i think that there are some small you know glimmer of hope on the horizon in a number of different directions which i may point out okay but i think most of the things we've been talking about even here are obsolete i thought what i said to uh somebody after the haskell session i went to there was a workshop yesterday is gee this is the most advanced of the obsolete languages okay and what i mean is this okay see that you now see the triangle that isn't there right everybody see the triangle that isn't there good okay what did you do it took only 100 milliseconds or maybe 200 milliseconds okay that means between your eyes and whatever it is that knew it was 100 milliseconds certainly if you had to put on brakes if you saw something like that right and it was crossing the street okay you know you know that's only if you had to build that's only a few tens of neuron times i don't care how parallel the machine is i have the foggiest idea how to write a program that organizes a complex process like that in a latency of about 30 or 40 steps now that's that's an example of something we don't understand how to do okay and it's more serious than that okay a human genome is about a gigabyte that's the instructions for making you from a cell okay it's you know your pre-complicated machine gigabytes about the same size as the source code appropriate for say making microsoft windows right it's not much bigger or smaller it's something like that okay so yeah we have all the possible things that it could possibly do and all the drivers that might be relevant and everything it's probably that size it's not can't be it's not much bigger let's put it down it's not three orders and i do bigger and a gigabyte is a a gigabyte is not that big but what's interesting is that with a small change to it you make a cow instead of a person so it's flexible too so we've got this enormously flexible language there that we haven't the foggiest idea of what it's like i'm not talking about dna dna's you know this that's that's something like gates or something okay what i'm interested in is the high level language that's necessary to do things like that and we certainly don't know how to do that so now what can we say about that well i'll tell you one little thing but maybe there's some hints you know these biologists do nasty experiments on animals i suppose you took a uh salamander you know a salamander can regrow its its limbs if you cut them off so all these nasty biologists do things like they'll cuddle they'll cut a limb of the salamander hand the arm off over here at the show just between the the shoulder and the and the elbow and the between and at the same time cut the same one between the uh elbow and the hand flip around that segment and resew it back on nasty thing to do to the salamander but what do you think you get does anybody know is anybody here no no you get the three elbows it grows two new elbows to make up for the fact that the that the wrong thing is connected to the wrong thing okay it's pretty weird okay but but there's a clue that's a clue to something that we should be thinking about about how do you talk about how to make a bazillion dollars we're talking humans like 10 to 12 cells okay a salamander is probably 10 to the 11th or 10th to 10th i don't know okay but the fact is that how do you make a a how do you how do you make that something big like that grow itself the little neighbors talk to each other and they say oh boy you know this i'm supposed to connect to a thing like this but ain't there so we'll make some of that okay and that's the sort of amazing kind of process that's going to be necessary if we want to approach the future in our computing because by golly it's very it's almost impossible to set up a situation where we understand everything that's going to happen in the future let me give you a get let go back in past a bit okay when i started computing in 1961 this was a this was the computer this wasn't this particular one but it was an ibm 650 at columbia university that i was competing at okay that machine cost a half a million bucks it was it had 2 000 decimal 2000 words of 10 decibel digits each of memory on a drum that rotated so the access time is 12.5 milliseconds okay so they take a 10 milliseconds cycle time and that the total amount of memory is probably about 10 kilobytes now for the price of that machine i can now buy 50 000 pcs okay literally okay each with a gigabyte of ram that's what's it that's 10 to the fourth times as much and with uh uh and which were about a million times faster so one of the things has been happening over the years i'm not trying to worry about mr moore's law anything like that what i'm worried about is that all of our all of our sort of intuitions from being programmers have come from a time of assuming a kind of scarcity a world in which computation is expensive where memory is somehow expensive where we have to optimize it okay well there are some applications like that but the answer is most things are not like that at all right now memory is free computing is free it's our problem now to figure out how to use computing like that and you know if i had 50 000 pcs hooked up with an appropriate network which would probably cost a million dollars at that point modern dollars okay well then you buy them in bulk uh by golly you know you could make a pretty serious computer out of that that by this machine took 30 kilowatts for the if you're interested i have the power plug from the 1794 which was somewhat more advanced but that's a different problem okay so it is no longer necessary to minimize operations it is no longer necessary to worry about to worry about how much memory we take up it is very important to minimize latency okay that's critical we have to understand that but the real cost of computation is not that at all it's the cost of programmers okay here's a uh a quote from mr evans from glasgow made a while ago it turns out that almost all the cluster computing is you and me okay that's a different world that's the real problem okay so what are we going to do well there's lots of things people worry about over the years we hear lots of pundits worrying about things i know people worry about correctness gee that's a problem is it the real problem maybe probably not most things don't have to work look at google if it makes a mistake it doesn't matter okay so long as the next time he gets the right answer right or maybe something nice or close to the right answer something called reasonable it needs a reasonable answer okay uh what other kinds of things are there well people worry about about security well they've got no idea how to handle security uh we'll find out for example that uh humans managed to make it for about 70 years would be continuously attacked by parasites gee there must be some clever things we're doing okay it's all in that gigabyte of code hey it's we've got to think about that we don't we certainly haven't got any any good ideas about how to make how to make things that last for a very long time when continuously attacked by mutating parasites okay that's okay right okay the other thing that seems really important to me is that we spend all our time modifying existing code okay the problem is our code is not adequately evolvable right it's not adequately modifiable to fit the future in fact what we want is to make systems that have the property that they are good for things that the designer didn't even think of or intend that's a big problem okay that's the real problem i want to go after today a little bit okay so what's the what is this problem the problem is that when we build systems whatever they are we program ourselves into holes i've done it a number of times i've been programming as i said since 1961 i found more ways to screw up than anybody i know okay and i've learned a lot of hope but not an infinite amount okay and let's say so here's the here's this problem okay how do we keep ourselves from from programming ourselves into holes what is the what does that mean it means that we make decisions early in some process that spread all over our system those the consequences of those decisions such that the things that we want to change later are very difficult to change because we have to change a very large amount of stuff that's the problem okay and i want to figure out ways that we can organize systems so that the consequences of decisions we make are not expensive to change we can't avoid making decisions but we can try to figure out ways to minimize the cost of the of of changing decisions we've made okay so let me start with a simple a simple example one that is so familiar that we all do it these days certainly in in good in good dynamically uh typed languages like lisp or python or whatever or in in good statically typed languages like haskell or whatever okay we always have something like generic operators they call it operator overloading and things like that okay let's talk about that for a second because i'm going to be very extreme about this okay here's a here's the traditional arithmetic you would find in scheme okay i can add integers together great i can add introduce to rationals i can make new rationals i can add introduce to quote reels okay they're of course floating point thingies that are very dangerous nothing brings fear to my part more than a floating point number i mean numerical analysis the blackest art i know and you know they just without worrying about this it's terrible but then again but we can deal with things like complex numbers and and exact complex numbers which are integer parts and things like that and that's okay okay but you know those things are built in okay how about dynamically extensible generic operations now we can do that of course if you do a separate compile time and interpretive time or run time something weird is happening here and i want to be able to dynamically extend things while my program is running i want a program to compute up what it's going to do okay remember in the old days when you used to program an assembly language or something at least i used to do that yeah it's a great language because you could write the next instruction you're going to execute you could put it into the place where it was you were going to go to it just was and you know that many interesting programs were written that way and it was very flexible there's too many fearful people you know i can see a lot of people cringing but the fact is that you know you could do that lisp also okay you can eval something you call it concept okay that's boy is that useful but in any case okay so here i have i can do things like expand this arithmetic on functions you know here's an example i can add together sine and square right over there okay and i've got a new function because that's a classical thing people can do in algebra or in in mathematics you say if you've got two functions with the same kind of input then they can they can the sum of those is in fact assumed to be the sum of the outputs okay it's a as it's a generic extension okay that is for things that are of type procedure if you want of the right same the same number arguments i could talk about the sum of the square oops of of uh sine and cosine applied to 2.5 and i got one that's right that's what i should expect okay good okay that's an example of a simple extension there's gonna be seven and by golly you know you have all these nice things i'm looking at watching uh somebody talking about scala z today and of course you can do things like that okay but you then you get to things with let's start things better there's symbolic algebra okay now this is a symbol this is a again a generic extension i'm going to explain how it works except that i can tell addition and subtraction multiplication division all those things to deal with symbolic entities as well literal numbers okay and what i get is a symbolic expression as a result of such computation okay and i can of course the nice thing is that's a fragment of a program that came out so i could write a program that writes a program and i can use it i'll tell you a good example of the use of that okay it's a numerical analysis example build a gear integrator does everybody know what a gear integrator is anybody know what a gear integrator is it's for something called stiff equations differential equations okay turns out there's some polynomials that you have to compute up and if you compute about ahead of time you only can do you know having a step or doubling the step size but if you want to be able to make the step size any size you want you want to be able to compute up the appropriate polynomial at the moment so you write a thing that symbolically does that then you pass that through the compiler as you need it and then you you run it and use it in the same program that's sitting wrong integrating along turns out it's a great great thing to do and great fun okay but the point is that you can get essentially machine level machine quality code you know you're getting fully fully uh fully native code efficiency except at the moment where you're changing step size to a financial step size okay that's a useful thing that's why i like list because it has uniform representation of programs as data so that symbolic manipulations produce stuff that i can execute okay now but you know you could do something even more wild you could did you know that automatic differentiation does that everybody know what differentiation functions is and automatic differentiation can be done as a as a generic extension what you do is you create hyper real numbers numbers that look like x plus delta x the same way the complex number is real part plus uh j times the imaginary part of i if you're a mathematician j if you're an electrical engineer that's because of a french invented use j for user i for intensity which is current but in any case in any case uh there's a you could do this sort of thing okay and you could say the derivative of the cosine and the sum of the cosine the square function applied to x is now i'm using of course my symbolic parts as well so these things all mixed together okay this is the uh sum of two x and minus sine x so two x minus sine x okay and i can you know i can look at something more complicated a derivative of a function of two variables which is in fact a structure of the two partial derivatives which is what it should be because if i multiply that by an increment in the incr the arguments i should get an increment in the answer and that's a matrix multiplication this is you get a row vector coming out okay that's the sort of thing that you can do very easily of course i'll tell you the trick of how to do this okay just for fun okay for those of you who are interested in things like this kind of trick this is the kind of thing you're interested in so if you have if you take every possible that was it that was pulled out of abraham lincoln actually used that that uh comment uh if you have a if you if you extend things x plus dx and you have all your primitive functions including things like addition subtraction multiplication division okay and things like that now are tanned with two arguments and things like that produce an output which is f of x plus d of f x times dx where dx is the thing that corresponds in the that's uh the imaginary unit in imaginary numbers okay then you can then then what you can do is you can just push things through that and extract out the df of the df which is the derivative of the function okay now that gets you the chain rule instantly and automatically right that's it so you do a generic extension of all your primitive operations and you get the automatic chain rule which is right and that's hard to do symbolically without any other way but this works also numerically at full speed whatever you want okay it's producing it's producing a composition in the same way you have combinators okay so that's a useful kind of thing and i actually use this all the time because i teach a class in fact i'm teaching one this term in advance advanced classical mechanics starts with lebron's equations and goes through canonical perturbation theory and i use computer programming in my class and to for these graduate students of physics and things like that use computer programming in this class to actually clarify the things we put right on you know when we write down these equations because mathematical notation is both awful and it turns out ambiguous and to be perfectly honest it's into it's impressionistic right it's very simple it's look what i'm talking to you in english it's the same phenomenon mathematicians talk to each other because they're almost all alike that's the way reason why you can understand me is almost all identical okay and so there's only a few bits have to be transferred in order to get an idea across maybe a mathematician's painting an idea in the platonic world of another mathematician's mind okay by talking to them they write down basically uh impressionistic scribbles the same way my speech is not grammatical beautiful english okay but in fact uh when we when we write programs we have to be precise because computers are dumb at this point and as a consequence that's a great way to show someone the real story okay and so i can write things like this which is the definition of of the method of computing lebron equations right here from a lagrangian given a configuration space path now for those of you who don't know this sort of thing doesn't matter i'm not going to dwell on it but what this basically is is you make a state path out of a configuration space path which is a tuple of three things the time the uh the path the path value at that time and the derivative of the path out of the velocities okay and you have to take the the derivative with respect to the time which is the capital d here because that's derivative one argument of the result of getting the partial derivative respect to the the the uh velocity argument of the lagrangian uh and you'll compose that with the state path et cetera okay and you get the right things here and by golly if you take a harmonic oscillator or whatever you like you stuff in lagrange's equations and out comes out comes the actual residual which is equal to zero which is lagrange's equations okay nice okay now if you want that doesn't matter for what did i say but you know that's yeah that's that's that's easy stuff what if i had to do something with them no i need it what if i deal with what if i'm dealing with something with the with a couple of rigid bodies that are that are wobbly and have wall sensors then i can't write the equations down on this on this board okay in fact it's something if you do celestial mechanics like i've done sometimes the equations are 20 pages long okay and by golly it better be done by a computer you the old days people did it by hand they got it right most of the time it was amazing it's amazing i can't do what what people did in 1911 or whatever but it's quite amazing that they could do that sort of thing and now we can do this easily and it turns out that this is very simple because of the simple generic extension okay now i'm not trying to claim that solves any important problem okay very careful about that okay that this is this is a little tiny feeling of what is it needed to make things more flexible consider the the value here the value is that i'm dynamically allowed to change the meaning of all my operators to add new things that they can do dynamically not a compile time at run time okay i can programs can produce things which are the new way to add okay that sort of stuff and they can attach that to edition and they can take it off and all that and it can look like it can look like binding i can bind that idea okay that's wonderful okay what does it do what it's doing it's giving me tremendous flexibility flexibility because the program i wrote to run on on a bunch of numbers with only a little bit of of tweaking suddenly runs on suddenly runs on matrices so long as i didn't make the mistake of commuting something wrong okay if i did i'm in trouble right so this makes proving the theorems about things very hard in fact maybe impossible and that's the sort of cost the cost is and the benefit are are very extreme okay i can pay i could pay correctness okay or proofs of correctness or belief in correctness i can pay that for tremendous flexibility right there now again we worry about ideas like is correctness the essential thing i want look i'm not opposed to proofs i love prince i'm an old mathematician okay the problem is that proofs putting us into the situation that mr dijkstra got us into about 30 years ago 40 years ago where you know you're supposed to prove everything to be right before you write the program okay getting yourself into that situation puts us into a world where we have to make our specifications for the parts as tight as possible because it's hard to prove general things except occasionally sometimes easier but usually it's harder to prove a general theorem about a about something so you make a very specialized thing that works for a particular case you build this very big tower of stuff okay and boy is that brittle change the problem a little and it all falls over we didn't learn something from the digital from the fact the electrical engineering i did the digital abstraction whereby the inputs to something except much bigger range than the outputs are allowed to produce and therefore you get rid of noise at every stage that's that's the kind of flexibility that i i'm expecting to get and i hope to get now remember the old stuff you by the way when you do this kind of generic thing extensions it doesn't mean that the old stuff breaks it means the new stuff is not is not proved got to be careful about that okay so with all many of the techniques i'm advocating make proof much more difficult and perhaps impossible now i'm going to now be a little more extreme i haven't been extreme yet this is this is the beginning of stuff okay i think one of the problems of the is the kind of architecture we're assuming everything in the world is looks like at least looks like my 650. it's you know it's very occasionally people have made wonderfully things a gpu is different right but maybe i suppose the connection machine was an experiment which was different okay but you know in the future it's going to be the case that that computers are so cheap and so easy to make that you can make them the size of a grain of sand complete with a megabyte of ram you're going to buy them by the but by the bushel i suppose you can pour them into your concrete okay you buy your concrete by a megaflop and then you have a wall that's smart okay so long as you can just get the power to them and they could do something you know that's going to happen okay it's sort of the way i said remember your cells are about are pretty smart they have about a gigabyte of rom for a few a few a few bytes of ram probably and or maybe a few hundred we don't know how many but and they but they seem to talk to each other and do useful things so we have to begin to worry about that kind of a world okay well i'm going to tell you a story that started um around 1975 okay and was recently had a some advances uh it's called propagators it started when i was teaching my first classes in electrical engineering at mit uh in circuit theory actually actually the first class i thought was the field theory but then i taught a circuit theory class with a real wizard of circuit three affect theory uh paul penfield and i observed that what we taught the students wasn't at all what the students actually were expected to learn that is what an expert person did when presented with a circuit that looked like that was quite different from what uh what the we tell them to write down the node equations and there are certain equations for the the resistor and the capacitors and the and the inductors which are none of them here and the transistor which has a complicated non-linear equation and then you're supposed to grind these equations together somehow and solve them to find out what's going on but you know that's not what a per what a really a really good engineer does what a good engineer does he looks at this thing says hmm um uh well let's uh consider the bias analysis here that means these capacitors are open circuits okay and that means that um therefore and i'm going to consider this transistor to be uh beta infinite and be a follower right now meaning its base its base emitter uh voltage is going to be 0.7 that's an assumption based on the assumption of how we're using it it might actually be wrong which case i might have to back out of this okay given that there's no there's no current into the base here there's no current going into this capacitor so there's a voltage divider since i know the voltage here and i know the voltage here i know the volts here therefore i know the voltage here this point seven volts below that therefore i know the voltage across this resistor so i know the current through it that current must be coming through the from the collector here therefore it must be going through this resistor therefore it lowers this voltage by amount which is rc times the uh current which is the voltage from here to here divided by the resistance which is points et cetera okay and so therefore i can tell you therefore i can tell you the the uh the the all the bias conditions of this transistor and i need to use that to check whether or not my assumptions were right and they are okay now given that incrementally i can tweak this thing that means this capacitor is now short circuit uh that means i tweak this in voltage which means this changes by the right amount which means this current changes by an amount which is the amount this voltage change divided by that resistance which means this voltage goes down by an amount which this which is the current uh increment divided by multiplied by that resistance so the gain of this stage as an amplifier is rc over rtre it's within five percent that's right okay and every real engineer does that okay and that's not the sort of thing that we were teaching the students so i wanted to figure out how to teach students this sort of thing right of course the right vision of things is you write a program and then give it to the students to read okay that's what i did is i actually hired richard stallman in 1975 to work for me and he and i wrote this program together and it was quite a success and i'll just i'll show you a modern version of it because the old code doesn't seem to exist anymore i went looking for it and it was that was on the old its operating system and it's gone and probably probably on some backup tape somewhere which somebody has it can be found but it's not around now so here's a new version of the same program uh here's an example that here's the representation of that amplifier i'm not going to tell you what i'm not going to read it to you okay what it is it's a wiring diagram guess what the same lambda calculus combination structure that you use for functions works for other kinds of things too all it is is a consistent naming scheme it says for example let rb1 be a resistor let um let the rail node be a particular node combining a bunch of terminals from the various parts that were declared previously and the outer scope uh and so i have a bunch of nodes then i can say uh i'm going to produce a few more things and and that's my the output of that is this rest of the stuff okay wiring diagram description description of how these things are put together okay that's a really nice thing remember underneath this is lambda calculus nothing more there's no magic in there okay it's just it's just naming as a very smart mystic once said if you have the name of the spirit you have power over it so then once you do that then you can then you can for example make a test a test jig for this which looks sort of like that okay and i could tell it what kinds of uh what kinds of parameters i want notice by the way i'm allowed to use uh i i get all my values are also allowed to have have units that's another generic extension it doesn't matter it's just simple once you have generic operators you can make units and it works just great okay i can assume i can make various other assumptions about here's the transistor assumptions that are pretty simple i can say what what my assumptions are for getting the thing on the on the chalkboard here this is to make the arguments short enough to put on one on one slide okay and then i can ask what's the value of of the potential of the the base of the amplifier in the bias region and it does some deduction and it comes out with an answer in volts okay and then you say well it's the value of the potential on the emitter and you get the same kind of thing uh not very interesting but the really interesting thing is here i can ask why why do you believe mr computer why do you believe that the potential on the emitter of the amplifier in the bias region is what it is and out comes a out comes the thing with a remarkable qed that has nothing to do with proving okay it's just to be arrogant okay but what what it's telling you is the because remember almost all these things are lies anyway the models i told you for the transistor a lie all of all the physics and and and and uh electrical engineering things like that is based on useful lies which are appropriate approximations for particular jobs okay and the reason why when galileo invented the modern physics for all practical purposes what he just discovered was the value of a lie aristotle understood that when you roll the ball along the floor it always stopped galileo said forget about the friction let's figure it out and then put the friction back in that was a required lie to understand what was really going on that changed the world okay that was the big breakthrough of galileo okay so this qed doesn't mean anything but what really matters is that i'm chasing down the provenance of every of every value as the values move around in this thing and what's really going on here what's really going on is that i'm doing what the what the expert did i'm seeing something very obvious what i'm seeing is that if i know the value here then i know the value here because of a local phenomenon okay i'm propagating information around and if i know the value here and i know the value here and i know this value then i know that value okay and that that propagates through here into here and so on okay so this is information is propagating around we call this propagators and that was a that was a big idea in 1975. okay however it does have some mind some power okay the power is this let's go back here that we have imagine that the propagators are independent little stateful stateful machines that observe that are connected to a number of little things i'll call cells not biological okay so that a propagator holds a bunch of fingers on a bunch of cells given that it it knows information from this one and this one and this one it may be able to deduce the value there and go clobber okay and given that that value suddenly got changed then this guy said oh i see something interesting and it may go change that guy okay so there's a that information moves around it's allowed to go in any direction right but that's so there are these stateful cells that carry half state okay and there are these interstates propagating machines and guess what this is very nice for a very parallel machine but even better alexi radul my graduate student two years ago i suppose he got a doctor's degree uh made a great breakthrough here the great breakthrough was that uh we don't actually put values in these cells we put information about a value in a cell and i'll get to that in later a little later what it means is that what a cell contains is a monotonically increasing value of information about a value gets better and better information as a consequence things like the synchronization problems go away for a parallel machine this becomes a very powerful mechanism for building a very a very a very large complex machine in a simple way but it's electrical engineers point of view it's not a computer science point of view it's not data flow okay it's closely related but it's not okay it's a very different kind of thing because we're worried about information that's being accumulate in these cells but let me just go a little further one of the other problems i hate computer languages now almost all of them including the ones i invent okay and part of the reason is because they're full of expressions they could be even worse they could be statements okay but they're expressions and what i mean by that is an expression expression is a tree okay and what's going you got a bunch of values and they only come go up the tree and they collected about one output that's what you want to think about it and you don't have any names for the intermediate parts there's no name for this place or this place as in the circuit diagrams you have an electrical circuit okay and boy those are useful places to put names on now sometimes they're a pain in the butt you've got to be careful not to not take overwhelmed with names but indeed yeah one big difference in this case is i'm going to change it so propagators have all all connections are explicit they have names so you see this one this is a this is a step for the computation of the square root right you could by huron's method which is sometimes called newton's method that was invented two thousand years earlier by hearing of alexandria newton did a generalization which was tremendously important which is any function but for square root it was understood by by huron what you do is you take the average of the thing you're trying to take the square root of and the guess you have for the square root and you add that to the you take the ratio of those add that to the out of the guess you have and and divide it by two you make the average of the guess into the into the the query uh plus the plus the guess okay and that converges to the square root quadratically that's that's hiron's method and so that's what this is and so here here's a propagator description which is a wiring diagram and indeed i have to i'm looking now at something that looks very much like again assembly language i've got good languages for this yet okay but i'm using lambda glue to put it together lambda glue is so excellent for getting getting going before we understand what the real language is okay um so here we have here we have cells that we're making up which are all the intermediate ones this takes three cells as inputs and it says if i want to uh divide x by g i put the result in the x over g cell and divide i add g to x over g and put the result into the g plus x over g cell and so on and that's what that is okay so you're seeing a little wiring diagram for little machines imagine each of these things is our autonomous machine running continuously if i can burn processor i can do that right supposing i have as much processor as i had memory for proposing i had for every for every little memory module i had i had a big processor associated with it don't worry about that that'll happen it's going to happen because there's no other way to make the future happen okay so of course this works as you expect you get a better guess but now you can also make an iterative wiring diagram view of the world okay what are we seeing here what we're seeing is of course a machine a wiring time for a machine the machine is potentially infinite because it's got a copy of itself right there inside of itself right it's got the huron step in it the machine inside of it of course okay but this is a this is a way to think about things which is quite different okay turns out things may have multiple outputs okay this none of the ones here do but things may have multiple outputs for example the way you divide integers you get the quotient and the remainder okay that's it why why have to do one or the other and why not even think about that as being the normal way to think most processes have that property or what if you have things which have many things that go in and many go out but in any case okay we have wiring diagrams like this and of course the way in which this guy squared energy knows whether or not to expand himself that's a implementation question is a question like the difference between applicative water and normal order in calculus lambda calculus are you choosing to expand something when you all the inputs are there or only one input or you want the output or anything like that that's a different question okay i'm not going to worry about that right now okay we can make machines do it any way we like and we can have policies that can be can be can be allow you to do both remember a real engineer doesn't want just a religion about how to solve a problem like object oriented or functional or imperative or logic programming this piece of the problem wants to be a functional program this piece of the program wants to be imperative this piece wants to be object oriented and guess what this piece wants to be logically based and they all want to work together youthfully and because of the way that the problem is structured whatever it is and i want i don't want to think that there's any correct answer to any of those it would be awful bad writing writing writing a device driver in a functional language it would be awful bad writing anything like a symbolic manipulator in a thing with complicated syntax it's awful bad to write a to write a uh what's good a new numerical pro a numerical program in in anything but fortran okay that's a joke i actually fortunate fortune is a wonderful language in some ways it's the ancestor of the wall besides lisp fortunately the two oldest languages that still can have anybody using them and i think that they are the keys to everything else the current version of languages okay but let me go further hey what do we what can you do with these propagators and i'm only pushing this idea not because i think it's the right answer i'm trying to twist us so we say this is a different way to think okay some other we have to think 52 different ways to fix this problem we haven't i don't know the answer to how to get out of this i don't know how to make a machine that builds a person out of a cell okay or but i but i think the problem is that we've been stuck for too long thinking about the fact that we've been diddling with our details we've been sitting here worrying about our type system but we ought to be worrying about how to get flexible machines and flexible programming so here for example um yes okay one thing i could do with propagating is i can make i can make constraint propagators out of directional ones if i say that look i can make a thing that will that has an adder that knows that this is the sum of these two which means if it knows these it'll compute that if it knows these it computes this if it knows these it computes this okay it doesn't matter which the other which way the the information flows and in fact i could do that by just building two subpropagators which are directional three of them okay um multiplication here is a lie it's actually a little more complicated because multiplication by zero has special properties okay multiplication by zero it doesn't depend upon the other argument so this is just for show that you're seeing here so it fits on the slide and there are other things like that okay but once i can do that of course i can back make my electricity stuff because indeed a resist a resistor really is a two-terminal device with a constraint that is i times r equals v and indeed some other constraints which is for example the voltage is the difference of potentials from the two nodes it's connected to the current some of the currents into those terminals is zero that the product of the current and the voltage is the power okay and that that sort of thing okay i can build that sort of machine that's you know so i can build up from what i just showed you okay so that's nice and it all works very nicely but the crucial thing here is what i said before what mr radul invented uh two or three years ago for a doctoral thesis the idea that the idea that that the cells merge information monotonically so i'm going to tell you the story about you know the story about if you've got a barometer a stopwatch and a ruler how to measure the height of a building there are lots of ways of doing that so here's one thing i could have similar triangles okay that is i could have measured i could put i could say sun is coming in from the from from some direction except around here at this time of the day this day um and so there's a there's a shadow cast by the building and a shadow cast by the uh by by the uh the barometer if i stand it on end so what i can do is if i know that if i'm a ruler i can measure the barometer and i can measure its uh and measure its shadow i can measure the shadow of the sea of the of the building if i mark it fast enough so the sun doesn't move in time and i can measure it out with the ruler and then i can compute similar triangles and get a get a value which is in this case if i started out with a building that was i didn't put units in here uh the building shadow was 50 between 54 feet 0.9 and 55.1 feet long i'm assuming that that's an interval which is as error okay and the height of the barometer is between 0.3 and 0.32 feet and the shadow of the barometer would be point three point three six and point three seven feet then the building height is is it drilled between forty four point five four one one four and forty eight point nine seven eight feet okay that's done by building a similar triangles propagator which is simple enough okay of course i could have done the other thing i could go to the top of the building drop the barometer off and see how long it on the stopwatch it takes to know you have to click right that works okay and if i do that then i can have a four uh i use s equal one half a t square whoops okay that's equal one half eighty square type propagator and i if i measure the fall time i can say i can add that to the system and i get a building height which is 41. but slightly different okay because of course measurements differ they're not the same but i could do something better okay i can start out with i can combine these measurements okay that is supposing i have i know that the building shadow is this that's what i measured with my ruler and i i measured the barometer's height and shadow then i dropped the barometer off the building and figured out the times okay and so i got a value there this is this is a better value uh if i go down here if i look but guess what this value let me go back here a second the value here is between 41.163 and 47.24 here it is 44. so it's a tighter bounce but the best thing is that that information propagates back to the measurements hey you know the guy made the measurement made a slight mistake and we could find it okay because each of these measurements reinforces the others okay and that's always true in real science okay by the way how much time have i taken up so far huh 45 so i've got 15 more good okay thank you but that's i really don't want to talk more than i'm designed to talk for 50 minutes okay i can talk for 60 minutes on any topic whether or not i know anything about it okay so each of these each of these uh values is improved from the values we had before from the ones we measured okay but you know now we can for example bribe the superintendent by giving him the spot watch and we expect to get the the honest answer out of this okay and so we get a value which is not an interval which is 45 he claims the building's height was 45 okay okay so we get the content of the barometer's height now is better still and all of these are improved measurements ah but you know i could do better than that because in the same way that you invented these hairy meta monads to be allowed to calculus plumbing which is what they are they're wonderful stuff by the way but they are very hairy uh for people who haven't who haven't understood that very carefully and thought about it for a long time so so in the same way uh you know you want to be able to carry extra information with something what a monad allows you to do is to pipe around the value you wish to carry also state or something like that that's extra information that gets fed into the right place so it's hidden plumbing it can be done with it's a lambda calculus trick that most list type programmers knew about long ago but didn't have a name for it and it's very useful to to realize that it's connected to category theory and things like that okay so what do we do here okay well tracking of provenance okay well i'm going to attach other information besides things like like units and dimensions and stuff like that i want to carry with it where information came from so i can make arguments as to what where does it come from you saw that in the in the electrical circuit case that's very helpful when you're debugging a complicated problem okay so let's see what that is all right well we're going to start again okay i'm now going to add to i'm going to make my values be what i call contingent values okay so this value which i'm putting in the building shadow is a contingent value which is based on a premise called shadow it could be several premises which why there's a list there and i'm assuming it's a symbol right now but the premises are they have no no interesting structure they can tell whether or not two of them are the same okay so these are all shadow shadow premises and i get answers which are contingent on in on the shadow oops okay i can do the same thing with the fall time okay but this one i'm going to make dependent upon the uh upon a premise i call time the building height now is continued on both the time and the shadow okay i if the authority the super gives me an authoritative result okay then i can look at the building height okay it's dependent upon the say he gave me that information but the barometer site is now dependent on three things the superintendence estimate the time estimate and the shadow estimate but the building shadow estimate is solely on the shadow estimate and the fall time however has been improved by the superintendent's estimate but not the other one because the other one wasn't very interesting it was superseded the information is monotonically increasing in each of these in each of these cells okay the monotonic increase is crucial that one way to do that kind of thing of course is with things like intervals but imagine you had unification algorithm for example that's a way of combining information and there are lots of things like that right suppose i have a free a transformative signal approximate and i have the a time domain for an approximate version of the signal both with noise can combine let's get better estimates of what that is so now here's a this gives us even more power because what i'm going to do is assume that of course information is usually inconsistent almost all the information in my head is inconsistent i believe lots of things that are false right we all do right but the important thing is that i can i don't fall apart every time i think is the famous robot in some in some science fiction story okay why it's because if i run into one of these inconsistencies i giggle okay that's the answer it's a it's because i can keep apart locally consistent subsystems of my thinking and each of those i can make the deductions that i know where they come from i hope and they're dependent only on the things that i that i specify so they look at this for a sec for example again i'm now going to make these things called tms's truth maintenance systems they were invented again originally partly by stallman and me but they were given a name by john doyle who was one of my graduate students who wrote a master's thesis about them his phd this was about something else and david mcallister did some work on them and made another version of it and then there was a whole book by two of my former students uh johan de clear and and ken forbus which is called building problem solvers which is pretty much about truth maintenance systems okay so there's a whole is a whole nice thing what it is is it's a collection of facts okay with appropriate dependencies such that it's possible to say what is the best estimate you can give me of the consequences of these facts the logical or anything else and so in every cell i'm going to put a truth maintenance system instead which is a thing that's going to maintain the best estimates of what's going on with the ability to back out okay so here's what you're seeing here is that sort of thing and indeed uh i can i can do all the standard things you saw here okay and i'm not going to go through this but if you look at what's inside the building height at the end of this little sequence there's a three possible contingencies depending upon different dependencies okay and uh this can grow quite large but it doesn't grow very large because anything which is subsumed by some other deduction goes away and eventually we can do things like now change our point of view i can i can change my world view i can bring in like let's see go back here where's the cat i can decide that i could i want to kick out the time yes supposedly i want to kick out the the time measurement i want to decide that wasn't so good well i can get a value for the building height which is dependent only on the shadow map whoops shadow measurement i could bring back the time measurement and kick out the shadow measurement and i've got the something that depends on the time measurement okay if i if i if i add the superintendent stuff in and i bring in the shadow measurement i get something else i can decide i don't really like the time measurement and i i get the dependence i get the result it depends only on those two okay and i can walk this thing around a lot okay that's a lot of fun but better than that i can get contradictions maybe somebody made a mistake maybe there's a lie maybe the superintendent's a liar okay or maybe the the the the the ruler where was it was not held quite right quite right when i measured the height of the whatever okay look in real scientific stuff supposing i have a corpus of of all the medical material in the world all the journals you know darn well that most of the most of the uh articles are wrong okay and there's a good reason for that the reason is that that people do for experiments with p less than .05 okay that's the um they that their measure of whether or not something is good is a good experiment is whether or not the chance of being wrong is one in 20. okay now since they do far more than 20 experiments and they only publish the ones that come out positive okay you can you make your deduction from that there's great papers about this so by ionitis there's a great paper about about why almost all medical research is wrong um and that's but you know we still live better than we did 50 years ago so it can't be completely true but in any case you know if a physicist walks in with b less than 0.05 they're thrown out of the room it's gotta be p less than 10 to my seventh or something for anybody to be interested okay um anyway so here we go uh here we hear what i'm gonna look at a contradiction the contradiction depended upon the superintendent of the pressure measurement i was measuring this one ah i used the barometer a second time maybe i had two barometers okay so i'm getting back here and i measured the measured that by a pressure measurement which i suppose i didn't put on the on the screen which of course you know the difference in the pressure from the knife as you move right over up and down the building is going to be the weight of air between those two okay and so so i got a contradiction between those two and i can decide which of those i like or don't like as you can see here okay or here okay i get two different different values so i can maintain an inconsistent world view a major incident world view by having locally consistent world views ah but better okay yes even better i can also have the machine automatically find for me the consistent sub consistent sub the sub world views that are consistent but world view is consistent that's called dependency directed backtracking and unlike chronological backtracking which you would get say by using the sequence monad uh and a failure mode right for those of you who are pascalites okay that would give you a chronological backtrack unlike that i can do much much more efficient backtracking because i know the provenance is very fact so i could say if this if i got a contradiction here okay and i've got another contradiction and the intersection is this guy it's highly probably probably he's the bad but bad guy but even so if i just find something to rule out okay i can rule out a whole sub tree without having to undo anything and redo it redo uh computations when that were already correct okay because they but they were chronologically later okay remember this because a parallel machine in principle it's right now simulated serially of course by a little time sharing system but parallel machine with with uh with where all of these things are independently looking at information and carrying them from place to the next okay so here's an example of the famous uh famous story uh there's a five-story building baker cooper fletcher miller and smith live on one of the five floors okay uh i require that uh baker cooper fletcher miller and smith live on distinct floors that baker is not on the fifth floor cooper is not on the first fletcher is not on the fifth fletcher is not in the first miller's on a higher floor than cooper and uh the and the the smith and fletcher and fletcher cooper are are not on adjacent floors okay and i want to know what the answer is and it's pretty easy to get the number of quotes and it's very very efficient if i did this by by a chronological process this would be 3 000 or something backtracks to investigate this space not that search is a good thing search is always exponential and will always eat you if you let it so this is not to say that we should be doing search it's just showing that i can automatically find consistent subsystems of a complex system because this is this is working by backtracking on consistency inconsistencies okay now let me tell you go back to the original thing i was talking about at the beginning okay when i was an undergraduate at mit i used to walk by jerry letterman's land everybody who jerry levin was who does okay everybody remember here with the lebanon leary debates debate or anything like he was a famous neurophysiologist who was the first person to measure uh potentials from neurons and he did he made great amplifiers he was professor of electrical engineering uh and biology and humanities at mit okay and uh i used to go by his lab which said laboratory of experimental histology yes i was i was employed as minsky one of minsky's computer hackers okay so i would go over there and i passed by the leftist lab and i dropped in and listened and talked about wonderful things about neurophysiology and things like that and one thing that ledman he died by the way last year uh one thing that uh he would explain or explain quite often is that almost all the cells in in the central nervous system are very tiny much tinier than the ones that you hear that have long axons and things like that and you can't it's really hard to see them even with a good microscope and it's pretty hard to it's pretty hard to uh measure potentials from them action potentials and things like that so he he he had this idea that they actually are unbelievable as being undercounted that they may be hypothesis many of these neurons do not explicitly are not explicitly directional that they are actually things that are like these constraints that feel a bunch of other things and adjust adjust the sensitivity of other neurons they touch okay so if they say gee i feel these three things be doing something can i say let me make this guy do something too okay but if it's only these two then i better make this guy do something and leave this one alone or inhibit this one okay that was the kind of idea he had which might be false i'm not trying to suggest it's true or false i don't know i don't know anything about neurophysiology okay but except for what i've learned by walking into london's lab but oh yes he could talk about any so i could talk about it for an hour too sure but getting back to the kinesis triangle illusion is it possible is it possible that almost all the computation that we see that's really fast is filling in details what is going on in that electrical circuit example i showed you even with a big circuit it's filling in details that the diagram is a memory and i'm writing in specific well-known defined places the answers that i can deduce from local information okay sometimes i have to invent a variable and propagate it around get an equation but i hope not very often okay but most of the time i'm doing by picking my models correctly i'm choosing to make the problem simple enough so i can do this kind of propagation which fills in details what's happening here what's happening here is this guy is giving me evidence that there's something that there's something including this black circle by golly there's good evidence here that this black triangle is being included by something too okay maybe there's good evidence that's a good enough evidence to think that maybe there's a white thing that connects these okay and things like that and maybe that's done by some magic process that looks something like that for all we know okay i don't know this is true i don't claim it's true okay it's different way to think maybe it's worth investigating it might be completely nonsense on the other hand we have to throw away our current ways of thinking if we ever expect to solve these problems so in summary okay i'm claiming the problem uh facing us as computer engineers is not correctness it's evolvability a trivial example of making things more evolvable is that an extensible generic operations may help because they allow us to extend functions without modification but they make proofs very difficult a more radical proposal is maybe there are freedoms that we can unlock unlock by throwing away our idea of architecture okay how machines have to work here's one particular example which is concurrent and parallel in an essential way it's redundant and degenerate meaning it's possible to compute the same thing many times and put them into the same place and it doesn't cause any trouble except if one of the things dies you get to say the correct answer another way it could be degenerate means i can compute a different way the same way i can compute in my physics class by newton's newton's uh vector vector uh methods i can compute uh the motion of something like the planets okay i can compute it by the lagrangian method i can compute it by the hamiltonian method they get the same answer but they get they reveal different phenomena i'm not going to find out about how the symmetries in the system correspond to the correspond to the um conserved quantities if i look at newton's method okay that's why i want to use lagranges if i want to do a canonical transformation it better be hamiltonian okay so there are all these that's what degeneracy means many ways of doing something if i can't eat the sugar because i'm screwed up in some way okay then i can get my calories out of out of proteins every biological system has many ways of doing things that's called degeneracy okay i can maintain providence as i can know where data comes from and follow it around and combine the data combines i can combine the provenances appropriately sometimes you have to subtract as in a conditional proof if you say assume a get conclude b therefore you can conclude a implies b without the assumption of a so there are certain kinds of rules that have to delete the provenance when they do it but you have to know that so the means of combining provenance they have to be taken care of okay and i can tell you about dependency directed backtracking which is a useful way of of controlling controlling non-deterministic search sometimes okay that's an example of a of things that i'm just pushing forward as being a possible way to break out but i want to hear every way that can and that's all i have to say today okay
Info
Channel: Strange Loop Conference
Views: 7,373
Rating: 4.9375 out of 5
Keywords: programming, propagator, scheme, sussman, program design, Scheme, MIT, electronics
Id: HB5TrK7A4pI
Channel Id: undefined
Length: 64min 18sec (3858 seconds)
Published: Tue Mar 16 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.