Jan Křetínský: Learning in Verification II

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
so so far we have been speaking about how to get the strategies what I would like to do now is to tell you a bit more about how we can represent them so that they actually become useful and for that I will use a completely different type of learning namely learning of decision trees well why decision trees I will comment on this in a second okay what is a strategy so we have seen a strategy is a function telling us what kind of decision what decision to make in each state so it's a function from States to actions ok simplified ok for those of you who are used to work with historic dependent strategies then of course you can enrich your state space so that this information about the history is to some extent in there and then you build memory strategies over a blown up arrow state space ok so it's a function from States to actions what is that mathematically when when you output that well it's a long list of pairs state and action to be performed well how can make it more readable well first of all why is this not readable well this long list I mean it's a list so it's not really telling you what is exactly happening it just tells you what to do in each particular situation so it cannot make much out of that and it's a long ways to it's long so it's too long to be read too long to be understood so both of it is it's critical so what you want to do is rather choose an encoding that is more sensible in the sense that it reflects the structure of the state space somehow reflects the ideas that the designer of the system had in mind and now you tell him how to steer the system so that things work then the designer wants to realize these decisions to the concepts that he invested into the into the designing the system so there are things such as variables in the system and then the decisions depend on maybe whether we are in state which is dangerous or standard or or whatever and he doesn't want to get a list of all possibilities what to do in each possible state and it's also it's it's also desirable to make it smaller so I mean small compact readable description that's our that's our task so we have this we have this simple example where we had strategies where the list of all of all choices billion things long and then we want to make it smaller so there may be the just three are fine and secondly I would like to encode it somehow so first of all what you can definitely do is you can consider decisions happening here in the clouds and well maybe these are unreachable using your strategy so I mean they are obviously useless these are reachable but they are useless because you don't reach the goal okay so they have very low intuitive importance okay I haven't defined any importance yet I will do that later but this will be just one possibility how to define importance out of infinitely many it's actually pretty difficult to define something like important something that I care about this and that's why that's why I want to use learning to help us with that we don't even have means to actually say I would like to care more about this you could easily say I don't care about states that are not reachable and I don't care about states that cannot reach the goal and I drop the drop them off immediately okay we have seen that this this can be done but what about these states where actually the decisions could matter a bit then you could in principle compute what is exactly the effect of each particular decision in here or in there and you could somehow quantify this and then the small in those states where the important this is very low then you could drop them and you could invest a lot more computational effort into that to kind of make it a bit smaller but it would be a lot of work it wouldn't be efficient and in the end you wouldn't probably end up with exactly what you want anyway so we will see how to actually make use of rent in use of learning also not only to do that more efficiently but also to reflect this quantity that I haven't defined yet this importance in a better way than just putting up a threshold whatever is more important than this I want to capture that whatever is less important than that I don't want to capture that this is actually it will turn out to be in the case and he's actually not what you intuitively want although at first glance it looks reasonable very good I know I don't think it was that much of an achievement but thanks anyway so I mean this is this is definitely an achievement so you you have probably heard about decision trees maybe maybe already during the during the during the school but most importantly you hear about them all the time in real life decision trees have one huge advantage it's something that almost everyone can understand if you get your car repaired what the what the mechanics people there do is they have a big decision tree telling them what exactly to do okay is the I mean connect the computer to the cars that right try to start a starter the engine does it work yes do this does it not work do something else and then you have this kind of branching where in each possible situation you know what the situation is so I mean the car engine doesn't start but actually the I mean the the gas is in there and the oil is in there the electricity is working so I mean then you have more and more information deeper and deeper in the tree and that's something that everyone can kind of understand as opposed to many other representations that are maybe more efficient in the context of learning but I'm not so much understandable okay say your networks so there has been some recent work on on interpretability of particular layers so I mean if you are analyzing the pictures and in particular higher/lower actually layers of the network you can identify things that you intuitively understand under contrast and things like that so there has been some progress on trying to understand what is happening there but I mean the understanding of the whole thing is far worse than the understanding that you get when you look at a decision tree and the human mind is more available the human mind basically can understand decision trees and not not the other structures right through to put in very very a very simply it's not entirely true but it's maybe the best that we have at the moment okay there's definitely one of the simplest things so let's go for that okay so what what we do here is we want to learn and we will see why and how to apply it to the strategies you want to learn a function based on input-output pairs so here is an example of a decision tree so here I I want to say I want to learn this set okay this is my domain and this is how I read the tree so is the number I mean we only have this one variable X for capturing the state space here is very simple it's just one variable having values so if it's smaller than three then it's in the set if it's not smaller than three then I am here X smaller than seven well if it's not smaller than seven then it's actually seven and then I'm in the set otherwise I am unhappy so I'm not in the set okay so this is instead of yes no so in principle this is at recapturing this the set how do you learn it well there is there is there are algorithms for that and I will not discuss them in too much detail the point is that you give positive and negative examples so I mean one is in the set 3 is in the set 7 is in the set 6 it's not in the set and then maybe end up with in the with this tree okay so basically you can also view it this way that up up here you have 1 3 6 7 these are the examples so the the inputs that you have seen and now you are actually trying to split them so that you have as uniform uniform children as possible so the good ones are in 1 children when one child and the the bad ones are in the in the other child so here when she would actually you can do by this is to put 1 2 3 in here and they're all good so you're done whereas here you are left with 6 and 7 and then you still have to split to make it entirely entirely correct mm-hmm and there is an algorithm based on entropy which actually produces trees basically it tries to look at the current node and think of Oh what would be a good predicate to split these two so that I have asked your children as possible so that means one of them is really kind of almost all the good guys and is kind of almost all the bad guys okay and there are various techniques for that and then you build up this tree and at some point you either stop if you're happy with a bit of imprecision or you build the whole thing and then maybe you prune it to make it smaller while introducing some imprecision and and you can do all sorts various various things one thing to be noticed here is that actually two isn't appearing in the examples so maybe if I were actually learning this set through these examples my result would be incorrect because two never happened to be in the in the data which can be seen as a disadvantage that I didn't really learn the whole concept it's actually going to be a huge advantage for us because there will be a lot of things that we don't want to learn that we don't care about that using normal data structures we would have to say something about it we would have to declare it's two in the set or not and we have to do something about that if two is not too important then we just don't include it into the examples that we learn from and that's actually an advantage because we don't care about two and we don't have to spend effort on it and make the tree larger or whatever okay so this advant disadvantage will be actually our advantage for for this particular purpose right so how do I use decision trees to represent strategies so first of all I assume as already discussed in the in the first part that my state space has roughly the following structure it's a bunch of variables and for simplicity I will only consider integer variables and boolean variables here and I mean they can be split into different modules or not I mean this is not so important okay let's now just let's know just consider we have a state is given by a valuation of integer variables or boolean variables okay now what I want to do is I want to take the strategy which is a list of all the decisions and I won't want to encode it as a decision tree so if my strategy is of this form for every state I'm getting action that I should play a possible example of a decision tree telling me the information what I should do is this one so is the variable X smaller than 550 s then play action a if no then play action B you will probably agree that this is certainly better than the list basically telling you 1 a 2 a 3 a 4 a 5 B 6 B etcetera okay this is capturing the decision not in terms of the state but in terms of the variable that means something I mean the programmer use the variable because the variable actually means something to to the programmer so that's why we speak about I mean that's why we have the variable here rather than rather than say a particular bit in a bit representation of the state space right I mean I could play the same game using the BDDs and I will discuss the differences binary decision diagrams we that are pretty well known I guess almost everyone here and one of their disadvantages is that they're working on some big crazy bit representation of state space I mean this is good for machines it's not good for us okay we want to speak in terms of the objects that the programmer used or programming the system okay so this was an example the tree for the case where my strategy is of the shape okay for every state I know what to do it may also be the case that I have a type of strategies where I could do this or that I mean I could go with my robotic guru or left or right both is fine I shouldn't definitely go backwards because I would fall off the cliff or something okay so then this is something that is a strategy of the form that given a state and action it tells me whether it's a it's okay to play it or not so these strategies are dependent on the context sometimes called permissive where any of the choices is fine or liberal where it means that you can in uniform eyes over these decisions but I mean you should not omit any I mean if you return then you should maybe try also some others there are some small small differences of differences that I'm going to talk about the point is the type of the strategy is slightly different and this is actually the type of this this is a strategy that gives you more information than the strategy that finally from this I mean from you compute this and then finally you can process it to get the upper one but then you lose information you lose information about what is also a good option okay you determine eyes it to get something that is easily implementable but you lose a lot of information so that's why I'm going to work more with these but I mean you can of course adapt it to to the simpler case okay so in that case the decision trees may look a bit different so in the leaves there are not actions I'm not classifying States according to what actually should be played I am actually classifying state action pairs whether they're good or not happy smiley face or of a set smiley face okay how does this work so let's assume for simplicity that my action space has no structure I mean it could have structure I would do it with as with the state space let's assume that this is just discrete so I have just names of actions appearing in my I in my morning right so in the prison prison language model how well how does that one work you have synchronizing actions such as I mean this one is this is for some Mars rover arbitral for mutual exclusion protocol that doesn't really matter too much there are some actions that just have a name so you synchronize two models synchronize on one action there are also actions that are actually in the prison language they are I didn't they don't have any identifiers it's just a line of code so if L is greater than 0 and B is 1 etc then what you can do is these are the values of the successors okay so the prime values I mean B is set 0 0 0 and 1 is set to minimum of blah blah blah okay so this is actually a pointer to a line number in your program okay so it's pretty complicated expanded here but in principle it's an identifier of what what should be done and so that's something that the programmer can look at when we speak about that and this tree then is to be interpreted as follows is the action that you want to play rec if yes that's a very bad idea to do if no then is it the this funny action if no then it's fine then yeah then you can play it if yes then whether to play it or not is really dependent on whether Z is positive or not okay if it's if it's if it's positive then then it's good idea to play it but if it's it's not positive and it's a bad idea to play it actually that's the Z is capturing this mutual mutual exclusion arbiter so the Z is capturing somehow how many guys are there trying to get a resource so if there is nobody then the behavior is different than if someone got that resource ok and actually I mean this is a tree that we automatically get from this protocol when one injects an error into this so I mean this is a pretty long protocol we inject it or someone actually inject it error into into one line particular line which made the protocol not work and when you looked at the optimizing strategy to figure out what can go wrong then you get this tree and this tells you oh when there are actually people trying to access the shared thingy then this action is messing up this is how you have to play so that you mess up so this is the line number where you have to fix something and this was where the error was injected okay and it's it's it's nice that you get this automatically and you get an explanation sort of of the problem it's not like oh now it doesn't work and so what okay and trying to prune the system so that I mean you have like a minimal system where it's still the error occurs I mean here get the directory an explanation of what goes wrong okay so how do we get something like that from a strategy and yeah maybe before I tell you how we do that maybe the those of you who are more inclined to to learning might already ask oh do I really want to process the strategy that I compute in this way or should I really directly compute these decision trees and would be even cleverer right this is harder this is not solved yet I mean there are a couple of groups working on these things like how to get reliably the strategies in this form so that you get the advantage of learning only small things but already guaranteed okay so this is harder so what I can show you at the moment it's complete form is how you process the computed strategy into something that is already readable so how do we do that so the high-level algorithm works as follows so first of all so you have this I mean in principle we have this long list okay of state and actions state action pairs and if Sigma is returning means the predicate is telling me whether sa is true or false or whether it's good or bad and what I do I take good examples from those I mean these are the actions that actually I want to play so these ones these ones okay and then maybe I also have the also have the bad ones okay where this doesn't hold so I mean these are I don't want to play here and won't play there and so on and this is important this here is an inclusion I do not necessarily need to put all the examples there so for instance I don't really care about what is happening in this cloud so I don't want to put it into it every I don't want to put it into the data that I process and secondly the inclusion buys me one more thing and that is I don't even have to know the strategy completely so I don't know for some states I don't know what to do there I just don't put it in the data that I learn ok so I can process incomplete strategies as well it's also fun ok so once I have identified these and we will see how to do that we can call learning algorithm for decision trees to learn from these samples good and bad examples and this may or may not be the of the shelf tools and I will show you an example for Markov decision processes where it was basically enough to call what is in Weka or whatever whatever library you want to use and then when we wanted to do the same for games I mean no stochasticity at all then actually we had to reimplemented different stuff because it's what's different than what learning to suffer once you learn a tree it may be a wrong one I mean it's product of learning okay so you look at whether it's good or not and then if it's good enough then you say ok this is a result and if it's not good enough so that means it's not capturing enough information so that for instance if you pray according to that strategy than you're then you're losing or you're losing with two higher probability then you have to do something better about what you learn from okay provide more information more targeted information we'll see that so the crucial point is this how do you generate what you learn in the tree from and yeah so this is this is the question moreover in this setup I consider as usual with the decision trees and learning of any way then we're not having a set of inputs when we're having a multiset and this is important if basically it's like teaching a child something right if you say it once and does not necessarily have to work immediately and if you say it 50 times then there's a higher chance that the learner where reflected in his knowledge then if it's there just once okay so the question is what should be put there and in in more detail how many times we should put it there should it be there once 50 times or not at all now yeah so the key the key idea is of course well the more and the more important the decision is the more frequently I should put it into the data set because maybe I really want to capture it well what is important here we said that we definitely want to skip all the data points where we cannot reach the state or we cannot reach the goal from the state and then it doesn't really matter what we do because we anyway don't do anything useful so we define here one possible definition of mirrors of possibilities of an importance of a decision in a particular state so I define an importance of a state with respect to a go will and a strategy or controller here so I mean I could start with this probability of reaching this state under the strategy so if I go there under that strategy often then probably it's it's kind of an important state and I should really know what to do there because I'm going to face this problem again and again and this is not reflecting at all yet the possibility that I go frequently to somewhere I mean I could come frequently over here well it's half a person in this picture but it could be a lot more but no matter what I do here I'm anyway never going to get to the target state ok so I only take those runs where I reach the goal sorry those those events where I reach as if I can also reach the goal so I take this conditional probability which is one way to view this you could take for instance just the first part or you could take the expected number of times you visit s on the way to go low so the more you are there I mean maybe you just don't pass through maybe you have to come there five times and then you really should know what you're doing and so on so we have like many different possibilities and this is one way to try out things I mean it's like in most of the learning approaches you try out and see and then you tweak the parameters and maybe it works better and then you tweak a bit more and then a better and maybe you have to come up with a completely different approach the important thing is for correctness this is not important at all you can come up with anything it may be more or less efficient but it will always work in the sense that you will produce a result that is correct and that is reliable it just may not be that nice or that understand about that readable for instance so how do you how do you how do you deal with that so this is actually even a more complicated question than the one that we set off at the beginning so we asked what is the probability to get from in it to the gold state from the initial state to the goal State and here we are asking for each state what is the probability and even conditioned on something else I mean it's even more complicated so I definitely see so how do I go about that I don't use the exactly this information anyway I took this formula just because I had a feeling like it could be useful I mean maybe it's not right so I can take something that is just an approximation of this that's what I can do is I can approximate it from experience so I can take I can run simulations and I am going to record the number of those that visit the state and the goal and if I divide it by the number of submission visiting the goal then I get an approximation experimental approximation of this formula okay so what I'm going to do is when I'm generating good and bad and I run a simulation I collect all the states that appear there if it's a successful one they reach the goal I collect all the states that are there and I owe them into into the into the into the data set and for each of these states I learn which actions are okay which are good which are bad so I put them into bad and good sets or multi-sets and this way I generate generate the data and it's done through the simulations very efficiently not very precisely but anyway we are approximating something that is maybe a complete nonsense so it doesn't matter but it should be somehow intuitively capturing why this could be important okay you can come up with something else if you like but this doesn't really matter too much okay once you have the once you have the data here then and the strategy the strategy in the end is good enough and that's fine if it's not then you have to refine so what you do is well you can do a lot of things it may mean that you didn't have enough data so maybe you run a few more simulations this may give you information about other states that you haven't visited before that or it may just tell you oh these are states that you have already visited you have it in your data set but let's put it there once more because we really want to learn these these are really important we are visiting them all the time or you could be saying well I detect places where I'm screwing up using other techniques and I just put this data in there you can just do whatever you wish it's just that you should you should allow for in the limit for more data and if you also will then for more precise trees capturing the data so that means large tree has more precision of course this is what you want more precision but of course the larger the tree the less readable it is so this is a trade-off to give you an example of what you can get on the benchmarks that we have seen previously so these are I mean these are with some of the benchmarks and now that the numbers are slightly different because the the the SIA the parameters are are different of these models because it's actually when you want when you want to compared with traditional verification techniques then for things that were as large as they're you cannot even reasonably work with the output so the tools cannot even output the strategy because it's so large so they give you a number but and I can internally maybe work with a strategy but you cannot even output it before for various technical reasons this is just too large of an objective to work with you see the probabilities are different you know different orders now what you can do would you normally do is you first drop all the states that are not reachable under that strategy so you see that actually saves you already quite quite a portion of state so instead of 90,000 states here you only have 60,000 reachable so it's a lot smaller thing but still it's too long list to make any sense out of it you could say okay I take this and I process it into a BDD so there's a binary decision diagram so how many of you know what a binary decision diagram is they are at their own wrong question how many don't okay so the binary decision diagram looks a bit like a decision tree but it's actually working over not so the state here would not be a collection of values for variables it would just be a bit string a long bit string and then you and ask is the 7th bit in the representation of this string 0 or 1 and if yes then you go left in if no then you go right and so on and so forth moreover there is a fixed order of these bits that you can ask for so maybe your order is first I ask for the seventh bit then in the second level I always ask for the third bit and then for the seventeenth bit and you can play with the ordering but it's actually quite expensive to figure out what is a good ordering and well anyway you can you can apply this structure that is very very usable for many verification purposes for compactly representing huge sets because I mean it's a classifier so it's representing a subset of the domain and then what you get is well I mean considerably smaller considerably smaller object I mean it has its graph with four thousand nodes instead of a list of half a million items if you actually play the same with the decision trees you get something that actually as a picture can fit on a piece of paper right although it's describing what the system millions of states or at least hundreds tens of thousands of states can do so I mean you can get like say 27 state 27 note tree with a relative error of 1% okay so these are these are those that are the numbers are for those trees that I already have precision at most one imprecision was 1% what do I mean by that that if the original probability to reach the goal was 1% then now this is one you get a tree with 1% plus minus 10 to minus 4 okay so it's it's a relative error of the resulting strategy now okay what is this funny one well actually in that setup you can do whatever and you'll almost surely reach the goal well well it's good to know that right that you actually don't have to implement a complicated strategy and no matter what you do you just get that okay this is what with the verification tools don't even realize okay but anyway you're getting you're getting small things now what is this funny memo here well if you try to get a strategy out of this system with this that many states then you don't succeed I mean you cannot even get the strategy for such a small system from the standard tool so what you actually have to do is you have to run the technique that we talked about an hour ago where you produce your visit only a small fraction of the state space you produce a controller that is defined only for a small fraction of the state space I mean like what is that a thousandth of the state space there it's defined it has precision 10 to minus 6a and then you can encode it as a bdd or as a decision tree so actually you can draw a small picture representing what is happening in this 1 million state system with a relative error that is close to negligible and actually you can scale it up to even even so I mean you can scale up this model to I mean trillions and it will still work so well it's something that the model doesn't even fit in the memory you know look at the whole model I can still produce actually a small picture representing what to do in that model with with very high precision and you know the impress the degree of the imprecision all right I have been talking about BDD and decision trees why as why our decision trees here so much better than than BD DS what are the reasons why do we have to actually go for something that is in the learning community and we don't mean verification we don't have that we use BD DS and we're we're happy about them so there are a couple of couple of differences so one some of the differences are directly on the level of the data structure so what are actually the difference is between the data structures so a disadvantage of decision trees versus binary decision diagrams is that trees are trees so we are not having direct acyclic graphs we were just having trees we're not merging identical sub graphs okay in BDD is you merge or whatever is isomorphic you merge that and you get something a lot smaller this is essential it makes it very compact here not only it's not Dharma you could in principle consider doing that you wouldn't save much one of the reasons is that the son of the sub trees are always a bit different why well this is actually an advantage of the decision trees that the sub trees can be different in the sense that you can have a tree where you ask here is X greater than 7 and if yes then you ask is why smaller than 3 and if no then you ask is Z greater than 5 you ask different questions depending on what the result of the previous information search was I mean the engine cannot store it cannot start then it's probably not really important to ask if the engine is running is there any strange noise going on well I mean it doesn't even start right so why would you ask that in binary decision diagrams you have the fixed ordering of what to ask when sometimes you can skip some of these some of these questions but you cannot reverse the order with the skipping I will comment on the later and reversing the I mean changing the order and choosing different predicates on the same level which is not allowed in BD DS it's very helpful to get smaller trees of course then if you're changing if you have different if the trees are different I'm using asking different questions then you cannot really merge the identical sub graphs because they're almost one that was a question yeah yeah good comment yeah so we have can related to that we have differences on the level not of the data structure itself but on the way we use the data structure so actually that is the difference between learning and sort of just storing in a data structure so what are the advantages of actually learning here so first of all for the BDDs we're using the bit representation which is something that is not understandable so we don't want to use that and it's actually not what is going to be efficient or the most efficient thing to ask because that typically the program works on the level of of variables with integral values so if X is greater than 5 than something or if X is even than something if you ask for bit values then you have to ask quite a few question before you figure out that the boundary is 5 and that it's even or odd right you have to I mean whether even though it's easy in binary but it is divisible by 3 then I mean it's it's it starts to get complicated so the possibility to have a wider class of predicates that you choose from and then you pick by some heuristic it's very expensive to choose the best one be the least choose the I mean the one that is prescribed by the ordering and then just choose the ordering here you it's also all about the choice of the predicate so may try to synthesize the predicate learn if there is a good predicate kind of splitting your children into a splitting your data into pure children as much as possible a good predicate is also a readable one so you can try to synthesize among those that you declare nice readable simple or whatever ok so there there you have more flexibility secondly you have this more efficient way of producing this data structure instead of figuring out which is the right variable ordering where I mean this problem is np-complete of course then you have some heuristics which try to swap the ordering but I mean if you ever try that it's really hard and it doesn't help too much well I mean too much means I mean it can help on orders of magnitude buddy I mean it's like you would expect that you could do even a lot better it's really hard to figure that out here you have this well fancy entropy based heuristics based on what I currently think is the most efficient way not really caring about the global optimality but I mean trying really to go for what what looks looks reasonably now and even more importantly you have the don't care inputs so as I had in the example before the example with a set 1 2 3 6 I was ready 1 2 3 7 was that depending on whether to is there or not then the tree would look like this or like something else if I don't care about - then I will produce the simplest tree or kind of the simplest tree that captures one three and seven and only care about two maybe it will say that two is in there maybe it was saying that two is not in there but I don't care about two so let me just use the smaller of the two possibilities and this will result a small into a small tree if I'm using a BDD I actually have to say what I put into the data structure I mean this is this rigid view of data structures I have data I put it into the data structure and I inspect the data is there I don't mean this is it may be the definition of a data structure we are actually taking the data putting them into the data array or taking part of the data we are putting in them into the data structure so we definitely don't have the complete data there and even the stuff that we put there is wrong okay so this is giving us more flexibility so this is also the last last advantage it's actually imprecise the fact that if I say you should do this and you should not do that and then the learner gets it wrong just because it's much simple to learn it in a slightly different way which is screwing up that particular decision allows me to have much smaller decision tree that maybe is not capturing this very corner case but maybe if that corner case is very negligible then I can forget about that and if it's important then I will figure that out and I will put in more times there and I will make the learner learn it right so it's like when you're again when you're teaching a child something then you're not explicitly listing all the corner cases right I mean you you first start with like like don't touch the socket and only later on you you say like okay well sometimes we have to I mean touch the socket if you want to repair it but then you have to switch of the switch of the electricity and I mean you you add more refined stuff later okay but you start with with with the core stuff and this gives you a very simply learnable set and therefore so the result is more readable and more understandable so if this is also these sort of disadvantages or something that one could view as a disadvantage its imprecise and it's not really capturing I mean it's it's it's it's it's only a partial model for your data these are actually advantages in this context right so this is we have been speaking about the setup of Markov decision processes now let's go for something let's go for something different namely for games so no probabilities now so let's play a game of chess or go or or whatever then in that case I mean these techniques I mean learning techniques generally can offer a lot more than precise techniques okay so everyone knows that okay we can beat the world champion and go bye-bye computer now okay but it's going to beat him maybe maybe even now like thousand times out of thousand and one or whatever but isn't going to win always this is something that in the setup of a game we are not happy about right so if you have if you want to so games arise for instance in the case of synthesis you want to synthesize a system that works always your interests in the worst case if there is a game that I will lose then I don't have a good enough player I mean the alphago is not good enough because it doesn't win always it's good enough for practical purposes of playing with other humans but I mean if you want to if you want to make sure that no matter what kind of behavior happens on the industry that you not doing something wrong then I mean you have to take care take care of all the possibilities well then you have to take a bit different philosophy than actually learning does because learning is not made not tailored to capture all corner cases and be precise so you actually are interested in overfitting so you are constructing maybe decision tree for the strategy we really want to over fit your data you don't want to generalize in any way maybe you have the strategy already I mean right you don't mean to induce what would you do in states that do not even exist you only want to capture what you already have so you really want to overfit something that you definitely don't want to do in an in the learning area so we are kind of misusing or even abusing decision trees for our purposes in a way that is never meant was never meant to be in the way they were never meant to be used so so we really have to overfit that that means first of all when you get the tree and you use the standard libraries the standard libraries at some point refuse to unfold the tree anymore because they tell you oh this is precise enough come on you never you never want to have something that is more precise and because in reality this is what they are used for okay so even if you just want to want to unfold you unfold a non pure child into a pure children then you don't have the option because the truth don't allow you so you have to do it yourself moreover there are also more fundamental issues at some point it may be the case that the entropy heuristic tells you well no matter what you do you're not going to improve I mean you can split on that predicate on that predicate you're not going to get more information why well actually maybe you need to get two predicates to actually figure out more information for instance if it's about equivalence if X is equal Y I mean it's if you can only ask about X&Y and a lot about X or Y or X equivalent Y then you actually have to have two queries and if you ask just about X is X 0 1 or is y 0 or 1 you're not getting more information you have to ask two things at that moment again the tools will say well we're not getting any information gained when when we were unfolding about using any of the predicates so just stop here okay so you have to actually take the tools and and change them because you are using them in a way they were never meant to be used so then you have to kind of maybe make some sort of look ahead like okay if I use more more predict as what happens and so on but you can do that and you can still get something that is better than than be Dedes so here is a this is some for some collection of benchmarks from the synthesis competition collection so this is this is the this is the diagonal here so this is the these are sizes for trees and for BD DS so you see they are better it's not like by orders of magnitude so this is like 10 times what is that this is this should be something like 10 times smaller or something okay so it's it's not so great an achievement compared to the MVP so why is that well because we have to actually fit every possible corner case in here and that is costly well it's still it's better than the BDD but it's of course more costly and yeah then what can you do well you can do a lot of things you can say so I mean this this is actually also why it's not so good is that it's for this synthesis competition where the the tasks are very lowly I will define this is basically like hardware circuits okay so you basically have input and output signals boolean input and output signals and then you have a then you have a a requirement on what the output signals should do corresponding the input signals so your structure is very poor it's a lot of boolean sand you have no idea what they mean right if you work with actually integers you could do a lot more so I mean you start with a poor representation so you get poor results and secondly then if you have only these boolean then of course you're also on the level of BDD is because you only ask is this 0 or 1 so you don't work with the integer so you don't have this fancy possibility of using the integral predicates okay so this is again in that respect then you're losing so you should I mean transform your benchmark into something that has the structure and not really have it in them in the boolean boolean way written down and so you can in even in this case you can think of okay maybe I shouldn't be using clever predicates that are still sort of just on boolean's so very often what happens is that you get this pattern simple pattern if this goes wrong when this goes around this goes on this goes wrong then do this otherwise do something else so it's like a big disjunction of off of the boolean and then if you add this as a predicate that you can also use in your decision tree they'll allow it to use these more complicated predicates assuming that they are still readable enough this a disjunction of boolean may be still considered reasonably readable so already then you get something that is like half the size so this is this is this this is this table you get trees of half the size just because you eliminated this simple pattern that appears and it's hopefully still readable and it's much smaller so then of course if you have wrongly or not wrongly but somehow a problem posed in a in a very naive way without the structure so you kind of transform it into something that is just like bit string representation of these things and still work with that but of course it's harder so that's there's the message of this and of course it's harder to work with the setup of games where we have to cover every corner case compared to probe list existence I still can do that and I mean this is some example of washing machines that run in parallel and it's not so important a couple of input and output signals the size of the state space here you train on some samples and then you build the BDDs and they are of some size the decision trees are considerably smaller and if you allow if you in this particular case if you allow us or the disjunction then you get like trivial trees where just I mean it's basically one decision like any of these guys is doing something wrong do this otherwise just do the default behavior right and only to capture such a simple instruction you actually don't see that in the BDD you don't get that information okay so we're not doing anything actually fancy it's just that the verification techniques where that line before right so another useful possible use of this of this strategy representation is that maybe you want to get a parametric solution for a problem of I mean maybe you have communication protocol or maybe these washing machines many of them in parallel and then you have to have someone who kind of centrally decides what to do and then the solution can be fur depending on whether you have five machines six machines seven machines and you really would like to synthesize a control that works always okay for whatever number of machines and then this is in general undecidable and depends on model but even if you can decide something then typically typically it's very hard what you can do here is you can take model with four of these machines small one you build a decision tree and you get this let's look at this so this is always a question on the on the boolean variable this where there are more of them this is the big disjunction if any of them is I mean if any of them is as one then you go in one direction if the other way in the other case you couldn't the other direction now if you look at carefully industry it's a complicated thing for and I can tell you this is for the set up where you have seven what was the machines or clients or whatever and you see that there are a couple of things such as empty zero here and pretty one here and put a 1 here and put a 1 here a fill one here and then there is some requirements 6 requirements 6 requirements 7 requirements 7 6 6 ok 6 here ok so actually there is nothing interesting happening with the clients 2 3 4 5 it's all about like communication of two clients so what you can actually do is try out this in the general setup where you have more of those clients or if you think that it's important that it's six and seven if you have some clients then maybe if you have ten clients then the first thing to be tried out is what if I replace 7 with 10 and 6 with 9 ok will it work ok and so you get at least suggestions and you can actually take this and you can apply it to the parametrized system and then check whether that one is actually working and then the the task is much simpler if you already have a good suggestion how it could work so it could help in unexpected ways not just represent strategies but actually solve parametric verification problems yeah this one is to actually show you that it really depends as often it is the case and machine learning on the way you represent the data so we have complex things such as maybe you have a synthesis problem for an LTL formula so you have a linear time temporal property that you want to satisfy and you create a system game out of that you want to solve the game and represent the result and all this these are complex subjects these are not boolean and these are formulae of logics and very complicated objects and then depending on how you represent them as vectors I mean you have to represent it as as vectors of integers or boolean or something that the learning algorithms can work with and of course it's also up to you to represent it somehow carefully and this is just to show that if you choose extremely naive representation for Sumerian for presentation then you're still a bit better than BDDs okay but if you actually do it just a tiny tiny bit more work just look at what in this current this actually should be you don't represent it as ID of like some formula but actually look at all this formula is a big disjunction of something then already with with with this you can I mean again save another I mean it could be again twice as small and then you can I mean you can always work more on trying to represent your data so that it fits the representation fits more the actual purpose so this is also important part that one knows from learning and one doesn't know from verification how you represent it is very very critical okay so let me summarize this strategy representation so there is again this interleaving of I get the precise decisions from from the verification part and then I suggest the decision tree so I mean there is this what is important and then how do I create the tree and then I get the tree and then I see oh it's wrong one so where it's wrong what kind of further information I need and then we kind of exchange this until you get something that you're happy with and yeah the point here was that you don't want to just say in this traditional verification philosophy the usual that something is relevant or not I can drop it or not but you have this quantity of like how important things are this something this thing that is so hard to express and that's why we that's why we need the learner to make make some sense out of this intuition and yeah okay so there is some literature on this in the slides on the topics that that I have covered so far for different models and different objectives so you had it in the slides that should be now accessible or yeah so let me just skip this and let me let me go through a few more examples that illustrate this concept of how to combine learning and verification many of them are following the similar path some of them are following a different path so let me let me comment more on this so I will look at reinforcement learning decision training learning automata learning a bit all of those you have already seen and for different purposes and let's start with reinforcement learning and verification so this is let me focus on some other approaches the problem that we discussed in the in the first part of the lecture so for instance this this paper is one of the older ones or oldest ones on this topic is considering MDPs where on top of our NV piece you also have some sort of time and some sort of prices okay so rewards you can think of rewards and time is just well let's let's just imagine discreet or let's imagine let's imagine clocks from from time to automatic if you know what that is and if you don't then let's just assume that the time is continuous but interesting things happen in the sort of discrete way now what they want to do here as they want to combine things and I forgot that question during the the break and I will comment on this more maybe we want to combine some sort of save behavior so my robot is doing something that I definitely I mean it's not hurting anyone it's not getting damaged and all that and on the top of that I want this to behave energyaware or whatever okay so I want to consider the worst case so what they do is time bombs I mean but there is some worst-case consideration and and there is some sort of optimization so in the sense that okay I want to minimize the expected cost but if I actually don't succeed entirely maybe I don't mind okay so this is somehow bordering with verification on the border of verification because I'm suggesting controller that is then maybe not the cheapest one okay and I don't even have any guarantee that it's close to a cheapest one again and then it's up to you whether you like it or not there are also techniques that guarantee you this but then there are of course slightly less efficient than those that where the guarantee is not there and you have to work more to get the guarantees so this was one of the early works where they do the following they take a sort of strategy iteration approach I would say so you take a strategy arbitrary strategy say a uniform at the beginning you simulate the strategy on your system and now this gives you different runs for different decisions because it was uniform meaning that it was randomly deciding between all the actions now some of the runs were really good like that means quick you quickly got on time where you want it to be and it was cheap and some of those were bad okay either incorrect or very expensive and then what you do is you look at the best runs so you've got the data now we take the day time you try to get another strategy improve your original strategy based on this data and get another strategy other than that is generating more of these good runs and how do you do that well then I mean there is a bunch of ways how to do that I mean you can use something they they use some logistic regression covariance approaches I mean they they also build trees for the for the strategy to kind of generalize it and try to generalize the runs into a strategy and see how it's good how well it's what's doing and then what you do is you repeat this again and again and again and when you run out of your time budget then you just say okay this is what I came up with and here you have a guarantee that I mean it's safe because you always chose only a strategy that does the safe stuff maybe so the after several iterations you get that but maybe you're not you're not cheap in the end in the limit of course you're converging you don't know how far you are so maybe this is a bit unsatisfactory so maybe what kind of one can also do other things so there are other words such as this one where the safety is really guaranteed so what is what are they aiming for they have it's called safety constraint reinforcement learning meaning that you have a hard constraint for the safety of your system you never want to do this or that and so what the naive way how to approach this is well take your system analyze what are the bad actions that you never really want to take because you would commit to something that results in a disaster later so we just pre compute this and on the rest you run reinforcement learning okay so this is the cheapest way how to combine verification and learning your run verification part of it and you learning on the rest and yeah well you again have guarantee that it's safe and you have I mean in the limit maybe you're getting optimal opportunity words but maybe you don't even know how how close you are to the optimum so it's still not so good then there is a very similar work to the one that that we seen in the in the first part of the of the lecture where basically this is what they're doing is they're not doing they're not analyzing reach ability in more precision processes but more complicated properties that you can encode with certain types of automata and I'm not going to go into that but so they are doing it in a slightly more on-the-fly fashion than what one would have to pre-compute with the older approach I mean making things more on the fly is of course something that pays off so this is something that that you try to that you try to do there is there also recent works on how to combine again rewards with but not only safety but maybe or generally Omega regular objectives there we can reap recompute and then run reforms meant learning on the rest because there is no simple representation of what is the pre computed state space with the safe actions it's more complicated than that and so you don't have for those of you who know that then you don't have any most permissive strategy and then you have to do then you have to combine interleaf the learning and the playing safe sort of and so but you can also do that and get the guarantees you can also apply monte carlo techniques these are I mean fancy because it's also buys you something then there is a lot of a lot of work on entirely different problems than those I have discussed sorry namely the application or what people are doing with with programs okay normal programs that we program or something nothing fancy no nonprobability is no synthesis of controllers just verification of programs then you want to say that okay there is I mean the program terminates or there's never division by zero and and so on and for that oftenly you need for the analysis you need to generate invariance invariants of cycles okay so this is known to be hard to come up with with with invariance and how automatically and then this is a very good place where we can apply learning what we can do is we can generate any nonsense and then check whether it's an invariant or not checking is easy getting good suggestions is hard and this is what learning is good at so again we wrap it into this safety envelope of well you get me or you give me your suggestions I process them I see if there is any sense in it and then if it is then I will output that and if not then I will give you some more feedback and you improve your suggestion so here for instance what you can do is you can run runs of the cycles and see what are the relationships between the variables how they are changing how they are not changing and come up with suggestion for an invariant once you have this suggestion for an invariant you check whether it's an invariant or not this is easy or it depends on how complicated the invariant is but I mean reasonably easy and if not then you actually you want to refine so maybe you want to you just don't want to give get more runs and then try once again you want to give some information back so what was the problem so this variable was was wrong so maybe then I really want to focus on that one and then try to run a few runs with different values for this particular variable to see how it behaves and so on [Music] right so you also know about automata learning so let me let me show you to two examples of usage of automata learning so one of them is for representing strategies so you can represent the strategy not only by a decision tree but of course by well whatever you wish some people think it might be a good idea to represent them using an automaton it's a traditional way of doing that in verification because it's very convenient computationally probably isn't very useful for understandability or to some extent but it's definitely it's worth representing strategies this way because you can then actually learn maybe directly the ultimate tasks will learn small representation of the strategies compact ones and actually that will help you also to generalize your learned knowledge about part of the system and maybe you will actually capture the whole strategy very soon because you know already know something about parts of the system and you say oh well if X is positive maybe in these cases I was always I should always pick a and then you haven't seen many of other states where X is also positive and you try playing a there and it actually works oh very good you came up with a winning strategy much much sooner than if you had to actually examine the rest of the state space in detail so you may try to learn automata froma from the runs and then and see if they generalize well you may also learn automata of systems that you may even know maybe you even have a white book system that you even know the code but you don't really have a good model of the system you have just the code so you run the system you run a lot of runs you have some intuition that I mean these things are important so maybe these are some high-level communication things and some other things are like low-level computation within modules so you ignore those so you learn a model on a particular level of abstraction that you you're interested in and then you can take this model and you can verify that model against your property that may be some communication protocol never gets stuck or whatever people have been doing that very successfully this doesn't give you the guarantee that the system is working actually very often you can this you can find box based on the learn small model and then you can check the model and you can find the box easily and you find them in an abstract way so it's easy to kind of correct them because you know that was the problem it's not like that here you get one obscure trace and you don't know what is happening but you get a trace in this abstract model and you see oh there is a bug in here and show us it's real box and then I mean there have been quite a few quite a few experiments with this and usually it's pretty successful when it's quite surprised how many in for instance I TCP implementations that are on the market they're all buggy I mean there is like manual that thick with all the requirements what must shoot and can happen in the protocol and what should not and must not and and then all these kind of requirements and then it's really hard to actually figure out what the requirements are and that's also maybe one of the reasons why the protocols are often wrong and also I mean it's a hard task right so it's so surprising that they are wrong in commercial things and this way you can learn the behavior from observing that check the model and see oh it's wrong here and there and it's good for bug finding okay you don't give getting guarantees that the actual model is correct because you just learn the model from the for what you've seen but still still useful and I also want to stress one application one or two applications of learning approaches where it really doesn't matter if it actually produces any reasonable result it can be complete nonsense most of the time because you don't really care about correctness here at all and that is for instance four theorem provers well four theorem provisions you definitely care about correctness and this is they are taking care of that and when they prove something you have the proof the trouble is which way to go so that the proof is constructed should I try this right try this rule that way should I try these hypotheses I mean this lemma that lemma this is hard and then for guidance learning is perfect you get some sort of experience from past examples you get some suggestions you have some sort of feelings you get whatever vague thing wrong thing from from from learning and this gives you some guidance try this first and maybe this will lead to a useful lemma maybe it will not lead to useful lemma but it will lead to a lemma you will always produce a correct result because the it's wrapped in the theorem prover okay it's just that the search the search in the the space of all possibilities of the proofs is guided somehow by something that has a chance to succeed better than then doing in randomly or I mean there is also a meta usage of all this so I mean maybe you have your model checker that already uses learning and then I have plenty of those and you have different benchmarks and then you can see it in the competition some tools are kind of better for some benchmarks and some are better for other for other benchmarks sometimes it's easy to see that right someone some tools cannot cope well with concurrency okay so you know that sometimes it's just like in what the hell is the difference between these two programs why does it work here what does it not work here and I have no idea and then you'll learn it so you say you learn control flow graphs or whatever you classify it maybe you don't even have an explanation why it is the case but you have a suggestion while you should use that tool and then of usually I mean this is not so surprising once you learn it then we don't say half of the half of the the samples and then you get this meta model checker that just first analyzes the task picks the right model checker to be used and and runs that one and this one I mean if you apply to this to the other half of the input data then this wins the competition okay although you're not already doing anything just like trying to figure out what is matching which benchmark okay to summary slides so what we have seen is using learning for verification as a sort of heuristic to help us enhance the verification so we have our verification objectives and we have troubles and we used learning to to get rid of the trouble so to improve scalability to improve explained ability of the results which is something that is like a huge thing for verification but no one really knew how to address that and people we're largely ignoring it but I mean that's one of the hurdles that prevents verification from being accepted by more practically oriented people because they are not making sense out of the results and and of course scalability for for practically large systems is also important so one of the examples was how to speed up things using reinforcement learning and the general idea was try to identify what is the important parts of the state space and there do the computation in the other one when we try to create small and readable strategies we use this decision tree learning but for the same meta goal try to identify what are the important parts that you should represent and then we use the the decision trees so it's always about this communication between the two and I mean in the first case when we have this value iteration over here and the reinforcement over here value iteration is slow I mean it's maybe it's not it's not giving you entirely correct results in the default forms we have to have these upper bounds and lower bounds then it's correct but then it's it's slow reinforcement learning is not guaranteed to give you like the the current error usually it converges but you don't know how fast you really know about the results and actually oh and that's what I'm not actually a reinforcement learning person but what the learning people told me is well it's actually also quite slow but wait if you put them together in the new setup where I mean you have the model this is something that this guy didn't have and to this guy you get the information from this one and you return the information back then you actually get something that is guaranteed to work and is much faster so it can give you more than more than just a pure sum of the two and lastly to to actually thinking about what is what are we doing here I mean I said at some point of time this is the paper where they return a safe controller but it may not be the cheapest one then it's always the question of your particular application do I need hard guarantees for this or not do I want to pay for this how much do we want to compromise between scalability and correctness how much of it is a verification one question how much of it is useful for your particular application is more important and Weston is maybe do we have to compromise at all well I mean for in some cases we didn't have to I mean if the invariant generation for instance it's it's I mean the result is correct right I mean with with a strategy of presentation the BRT DP we knew how far off we are okay so is Absalom tomoow controller good enough for your problem if knowing this episode is it is it I mean how far you are from the optimism portent for you or not are you fine with arbitrary controllers I mean is there is the alphago good enough well it depends on your application right I mean if you you wouldn't bet for your life on forego winning a match but I mean it's good for fun right watching it so it really depends on the application and so I mean from the verification perspective I try to take the approach to present the pars that are mostly focusing on not sacrificing guarantees so it's still verification but you are still improving your scalability and usability and this kind of usage of learning as an Oracle to give you hints suggestions advices that you take and then you process them your own old-fashioned way this seems to be safe and enough maybe in some cases probably approximately correct results are totally fine because anyway and that was the question where do the probabilities come from well it's just experience anyway so we're not sure it's just again probably approximately correct all the way from the start maybe or problem so then one should always bear in mind and what would your what your application area is and yeah so I mean I definitely don't want to don't want to use learning to create strategies that drive our cars although I mean it's it's very hot topic on this machine learning end of spectrum but hopefully I advocated that I mean the use of learning and verification is still pretty cool topic as well so you can you can use verification with the help of learning in a safe way for those of you who are interested in this topic a bit more there have been already two two issues of this event learning and verification to the workshop where you're all cordially invited takes place together with etabs which most of you probably know so many people are coming and from from broad verification area and and yeah this year we had also quite a few nice invited talks and next next year is going to happen in in Prague so if you like old fashioned cities like this one then I mean Prague is probably the one to be chosen in Europe or at least in Central Europe and with that I want to thank you for attention and I'm ready to take questions thank you repeat the question this question is asking our models are often compositional so it's composed of MVPs are composed of many components how does this help me to understand the interaction of the components so for instance one of the things that you can do is when you look at the tree then the behavior may depend on some third component but if you want to kind of ignore this for a moment you can kind of ignore this node and get a subtree that is actually not capturing all the information but it's capturing the relevant information concerning the components that you are interested in so you keep the variables that are appearing in those relevant modules and the actions that are relevant to their communication and this gives you a subtree for that particular sub composition totally so I mean if you have if you have spaghetti a thing in the beginning that no one understands your chances of recovering the deep information down there are lower so then still maybe there is a chance that you actually discover something that no one sees in the messy thing because I mean you have the computational power but of course the more structured problem and the more structured system you have in the beginning the more stress you can expect on the output so in that perspective of course it's not Silver Bullet that from now on we don't have to do any annotations in the code because we can all synthesize the meaning out of just the pure code rather the trend is to actually take annotations as part of the code and learn also from that because I mean the naming of the variables for instance this is something that is from the verification perspective it's totally irrelevant it's x and y no other namings of the variables I mean it's like long strings this is describing that something already happened and maybe you were still I mean someone else is doing the same thing and then it's obvious that these two variables should have some sort of communication there will be some relationship between the two so you could you could be interested more in predicates relating these two rather than this one and and something in a completely different part of the system right so then this is partially happening I mean the there is there is work microsoft say who are looking a lot at what the names of the variables are how does this affect their usage and and using things from from that so this is again once again very important that you have this safety and envelope that even if you think like oh this one and that one could have something in common even if they don't it's not you're not in hazard right or you're not risking anything so yes this is this would be a desirable enhancement for for these systems well ok so speaking about space shuttles ok so if you want so you cannot really verify space shuttle well what we can definitely do is for instance the following has been happening so you have these satellites running around the orbit and then basically what they have to do is they have to change the position of of the what is the stuff that catches the the the race of the Sun to get the energy okay so the the solar panel is basically it's different yeah it's the basic result and now they have to rotate in their in some way and then depending on how much energy you get then you can perform this task of that task then you are on the other side of the planet you don't get any energy you want to perform some tasks but you don't want to run off out of the energy too much so you have to make sure that you're not getting into the dangerous region but still you want to do as much as possible so this kind of stuff like very fine that if your battery model is like this and I mean you have done the measurements that I mean the parameter is kind of fit and you have the model of the battery and then you have the model of how much this kind of censoring and how much sending this message is it's costly and then you have all this information you have big model then you build a safe system that doesn't run out of battery and or tries to optimize how much you perform per minute on average no matter how many tasks or prioritize the task and all that so this is something you can easily do and then this just is the scheduling of the activity there and checking the verifying basically the battery and then I mean maybe you can ask a different question like well something else doesn't break there so you can design a fault tree for the components that are in there and then why is that one so I mean it's it's never like that okay this is my space shuttle and the question is is it safe and you push a button and the answer is yes it's definitely like you specify the particular problems typically try to look at certain domain so I mean just the scheduling and you kind of ignore there could be actually a meteorite flying next to your satellite and ripping away one of the one of the solar panels and now we have less less energy or whatever you ignore all that right and you analyze it somewhere else so yeah you have to decompose your problem you have to decompose the property and then you try to verify as much as you can but certainly it's it's it's hard and we need help in terms of scalability as well as usability that is the whole point
Info
Channel: Federated Logic Conference FLoC 2018
Views: 147
Rating: 0 out of 5
Keywords:
Id: PAFj45yD3S4
Channel Id: undefined
Length: 91min 42sec (5502 seconds)
Published: Mon Jul 30 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.