Nina Gierasimczuk: Learning and Epistemic Modal Logic I

Captions
So indeed, I would like to connect learning theory with logic in this lecture. The connection will at first be quite standard: I will look at classical first-order logic and connections to the arithmetical hierarchy, but later I will dwell on something a little more controversial, namely connecting formal learning theory with epistemic logic. Yesterday many people asked me what modal epistemic logic is, so it seems this is a keyword that is not very well known in this community — which is great, because it means I can give you a very basic introduction, which makes things easier for me, and there will be a lot of funny little toy examples. So this is the plan for the first part. In the second part I will try to make the connection between those two fields explicit, mostly by talking about my own work on connecting learning theory with topology in the context of modal logic; modal logic is also open to topological interpretations, and this is where, in this research programme, we think some interesting abstract connections happen. This material is very abstract and quite symbolic in nature, so I won't talk about probabilities at all — but you should not think it has nothing to do with probabilities, because there are extensions of this work that go in that direction. Another thing: this talk will not be very applied either; I will not show any particular implementations, and I will not talk about concrete real-world problems. But you should know that the topic connects to robotics, and to how a robot can interpret certain states, epistemic states, and the knowledge of other agents.

I would like to start with an example — a long one. Then we will talk about inductive inference paradigms; you probably heard a lot about these already during the school. I couldn't come earlier, so I have no clue what they told you — I hope it will be consistent with what I am saying — but I will give you a short overview, and in the end I will introduce a little bit of epistemic logic.

How many of you are familiar with the game Eleusis? One person, and the organizer — okay. The game goes as follows. We have normal playing cards, like the ones you use in poker or bridge, except that we make the funny mathematical assumption that we have an unlimited number of them: not one deck, but as many decks as you want. The question I am asking you is: what is the rule behind this sequence of cards? I will show you the sequence step by step, and you, as a learning algorithm or a learning process, are supposed to come up with a hypothesis at each step — and of course you are allowed to change your mind. So let's start. The first discovery of our scientist: you saw the Ace of Spades. What is the rule behind the sequence? "Only Aces of Spades" — a very conservative choice, so here we have a quiet, timid, conservative learner. Why would "only Aces of Spades" be a good hypothesis so far? Is there any operational reason why it would be a good choice? Yes: it is the shortest, maybe the simplest. There is an idea behind this: if you examine your intuitions, there is implicitly some sort of simplicity order on hypotheses, and you would think that such hypotheses are easier in some way, or more obvious.
This is what lies behind the idea of Occam's razor: very often we want to pick the simplest hypothesis over more complex ones, and revise our beliefs later if needed — and actually the whole of this lecture will be about that. Now what do we do? You stick to the previous one — so you are again a conservative learner, consistent and conservative: you only change your mind if there is evidence that contradicts your previous hypothesis. That's a bit boring, isn't it, but it seems reliable. And now? Exactly: now we are looking at the sequence, and perhaps we want to see some sort of temporal order in it; that would be another hypothesis. Natural language is expressive, so you could actually say something like: Ace, Queen, Three, Ace, Queen, Three. Exactly, yes. We have all sorts of cognitive biases as learners, which in the context of logic doesn't make much difference, because we don't like cognitive biases in logic; but if we think about artificial intelligence — for instance, implementing this sort of thing in a robot that is supposed to mimic human behaviour — then perhaps we would like to say something about this attraction to certain types of concepts: we see patterns everywhere, and we try to extract them. Okay, but let's see what happens next. Yes — and now we are even more confirmed, and perhaps some of us would just drop the game at this point and say "yeah, I know what it is" — confirmation bias again, perhaps. And now — any ideas? I won't even dignify that one with an answer. We go on and on, and if you are a good learner, or you would like to win this game, the idea is that after some finite time you stabilize on a correct answer. Obviously, in such a setting you can never know: perhaps there will be some devilish situation that screws up your hypothesis at any later point in time. The game is called Eleusis, and it is actually quite playable; I have played it at summer schools with students, and we can also play it here if we have cards. It is an extremely frustrating game, but for academics it's great.

So now let us try to analyze this game a little more. Assume, as I said before, that we indeed have at our disposal an unlimited amount of playing cards — not just one deck. Questions for you. How many different kinds of playing cards do we have? That you should know from background knowledge: how many cards are there in a deck? Fifty-two — not 54, 52. How many different beginnings of length one do these sequences have? Fifty-two. How many different beginnings of length two? Fifty-two squared — quite a lot already. How many different infinite sequences? Infinitely many. Any ideas about the kind of infinity we are talking about here? Uncountably many. And is there a problem with this? Obviously there is, because if you would like to treat learning purely symbolically, what you would perhaps require is that you can identify some rule for any infinite sequence; but since there are uncountably many infinite sequences, you cannot even list those hypotheses effectively, so you cannot have an algorithm that would retrieve enough information for this. For those of you who don't know what uncountability is, let me quickly show the idea of the proof by diagonalization. Assume that we actually can list all infinite sequences, so that the learner can, with super-hyper precision, know which sequence we are talking about: they are numbered, and there are infinitely many of them — as many as natural numbers — so the algorithm could still search through them step by step, as a recursively enumerable process. Now what we want to do is construct a sequence that does not appear in this list. The assumption is that this is a full list, indexed by the natural numbers, of all possible sequences; we want one that does not appear. Any ideas how to go about doing that? What we do is simply build a new sequence that differs from all of them: from the first one it will differ at the first place — instead of that card we put any other card — from the second one it will differ at the second place, and so on. Of course, in the limit we obtain a sequence that is different from all of those; it exists, and it is a valid sequence.
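As a toy rendering of this diagonal construction (a sketch of my own, not from the lecture; the miniature card alphabet and the sample enumeration are invented for illustration):

    # Diagonalization against a claimed complete listing of all infinite
    # card sequences. enumeration(i, j) is assumed to return the j-th card
    # of the i-th listed sequence.
    CARDS = ["AS", "QS", "3H", "KD"]          # tiny stand-in alphabet

    def enumeration(i, j):
        # a made-up "complete" list of sequences, for demonstration only
        return CARDS[(i + j) % len(CARDS)]

    def diagonal(j):
        # the j-th card of a new sequence: at position j, avoid whatever
        # the j-th listed sequence has there
        forbidden = enumeration(j, j)
        return next(c for c in CARDS if c != forbidden)

    # the constructed sequence differs from the i-th listed one at place i
    assert all(diagonal(i) != enumeration(i, i) for i in range(100))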
This is roughly why many of my approaches, and also many of the approaches on the market to this abstract, symbolic representation of learning, restrict their attention to countable cases. What we would like to have is a learning framework which deals with potentially uncountably many cases, but where the hypothesis space is countable; so by necessity we will have to merge some of those sequences together under one hypothesis. So let us talk about these hypothesis spaces. In principle, if we wanted to be super precise, we would want to describe each of these sequences separately with some sort of language, so in principle we would have uncountably many rules. But if you put any condition on them — for instance, in this game of Eleusis the condition in the rules is that the rule has to be written on a piece of paper, so it has to be actually hidden somewhere, so that the person who comes up with the sequence cannot cheat; another condition would be that the rule has to be expressed by a natural-language sentence, or by a three-hundred-page book, as is the case with some scientific theories that are supposed to describe certain phenomena — in those cases we already restrict the hypothesis space to countably many cases, because hypotheses expressed in natural language are finite, and you can order them lexicographically; so the hypothesis space is countable. Finally, what is mostly done in computational learning theory is that the rule behind those cards is encoded by a Turing machine program, which is in a way the same as having a three-hundred-page book: it is just a matter of having a finite program that generates the infinite sequence, or tests whether a sequence conforms to the rule. If the descriptions are finite, there are only countably many of them. Are there any questions so far?

Now let us compare the evidence — the sequences of information, the environments that we get from cards — with these hypothesis spaces. How many sequences comply with the first rule we had, "the sequence has solely Ace of Spades cards"? One, just one. It is a great hypothesis: very easy to falsify, since there is just one possible scenario. "The sequence has solely spades cards"? Uncountably many, by the same argument. So you see, the variety among these hypotheses is quite large. "The sequence has heart cards on even places"? Uncountably many again. Is the sequence definable in first-order logic? That is another question we might ask.
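To put rough numbers on this comparison (my own aside, using the standard facts that a deck has 13 spades and that infinite sequences over a finite alphabet form a continuum):

\[
\left|\{T : T \text{ is solely Aces of Spades}\}\right| = 1,
\qquad
\left|\{T : T \text{ is solely spades}\}\right| = \left|13^{\mathbb{N}}\right| = 2^{\aleph_0},
\]

so one hypothesis can pin down a single stream while its neighbour covers uncountably many.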
I am not going to answer that question here, but the point is this: the language that you use for the hypothesis space will cluster your evidence in certain ways, and will make the process of learning easier or harder, depending on what the hypothesis space is. It is really crucial to know what the hypothesis space is. For those of you familiar with version-space learning, this has very much to do with that way of thinking, in which hypotheses are eliminated by certain pieces of evidence.

Okay, so we can have different hypothesis spaces; some of them will be friendlier, some of them really awful. For instance, let the first hypothesis space consist of two rules: "all cards are spades" and "all cards are diamonds" — those are the only two possibilities. Is it a nice hypothesis space? Very incomplete, but perhaps this is exactly the research question we are interested in: it is either this or that, and it just divides our space in two. So let us think in terms of the inductive process: assuming we get a truthful sequence complying with one of those hypotheses, how soon will we know which of them it is? Straight away: the first piece of information will allow you to know where you are, and to know it with certainty; after receiving this information you know for sure, and you are done. The second hypothesis space: "spades at the fourth position" or "no spades at the fourth position". This is a decision problem — a decidability problem, the truth or falsity of a certain hypothesis. How soon do we know? By the fourth step.

Okay, now a bit more complex: an infinite hypothesis space that consists of the statements "exactly n cards are hearts", with such a hypothesis for every natural number n. We could perhaps try to relax the condition of decidability a little — can you think of a way to handle this problem a bit more practically? Say we are playing this game, this is the hypothesis space, and I am showing you a sequence of cards. The n is not fixed for you — I have it fixed in the background, so the sequence will comply with one fixed hypothesis; you know that exactly n cards are hearts, but you don't know which n. The idea here is that we basically look at the sequence of cards and count the hearts, and at each step you are allowed to give a hypothesis. Say you see the first heart: you say, "okay, my hypothesis is that exactly one card is hearts", and then you sit back, relax, and wait for the next heart to happen. The next heart happens, and you say, "oh, I want to change my mind", and you make another conjecture, "exactly two cards are hearts", and so on. Since you know that I had to fix an n in the background, you are sure that after a finite number of steps you will converge to the right hypothesis. You don't know when this happens, but the method of learning is reliable. In the practical version of the game, the one you can actually play in a pub, there is of course some limit — say, after a hundred iterations you win — but in general, mathematically speaking, there is no such requirement. So I would say it is semi-decidable, in this limiting sense.
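A minimal sketch of this counting learner (my own illustration; the card encoding, with hearts written as strings ending in "H", is invented):

    def hearts_learner(cards_so_far):
        # conjecture, for the space {"exactly n cards are hearts": n in N},
        # that n is the number of hearts seen so far; once the last heart
        # of the hidden true n has appeared, the guess never changes again
        n = sum(1 for card in cards_so_far if card.endswith("H"))
        return f"exactly {n} cards are hearts"

    stream = ["AS", "7H", "QD", "2H", "KS"]   # a made-up truthful stream
    for t in range(1, len(stream) + 1):
        print(hearts_learner(stream[:t]))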
So the idea of identifiability in the limit is exactly that: you have a reliable learning method that you can really fall back on, given the hypothesis space, but you can never be 100% sure. And those are different types of knowledge. One is the knowledge that gives you certainty about the state of affairs, like the first two examples: after some sample you know for sure which hypothesis is true. In the third case we have what we would call limiting knowledge, or perhaps safe belief: something that makes the conjecture safe after a while — after a finite number of steps it will never change again.

Finally, there is a border here. You can imagine that this would be something more difficult: we take the previous hypothesis space plus one more hypothesis, which says "infinitely many cards are hearts". How about this one — can we handle it? No, we can't, because for any finite initial segment of the data stream, one of the finite hypotheses will be consistent with it, but this new one will also be consistent — you never know what will happen later on. Later I will show you how to prove such things about infinite sequences, but the idea is that between those two examples lies the border between any learnability you can think of, in terms of symbolic learning, and unlearnable structures. Is it clear why this hypothesis space is not learnable? Good. Well, you could still try to redefine the language: you could, for instance, use disjunction in your logical language to merge statements into one hypothesis — any finite number of hypotheses would do — but in this particular framework we want the learner to pick exactly one hypothesis at each step; that is the condition on the learner. Abstractly speaking this is just a restriction; it's just an example.

So what is this learning game? It is a game between a learner and nature. We have a class of possible worlds, possible realities: possible enumerations of cards, possible Turing machines, possible kinds of physical phenomena. The idea is that nature chooses one of those to be the actual one — among the range of possibilities, one of them gets this special status — and nature generates data about the world; in our framework it generates truthful information, so no noise, no errors, nothing (except for that one outlier suggestion, which makes things a bit trickier). The idea is that nature gives us evidence about what it chose, and from this inductively given data the learner draws her conjectures. So, step by step, a growing amount of evidence — more and more information — becomes available, and it is clustered into the hypothesis space. You can think of a hypothesis from the perspective of the actual sequence of evidence, as constructively given from there; but you can also think on the level of the hypothesis space, and ask yourself: is it decidable which hypothesis among the available ones is consistent with the evidence? The assumption here is that with each input the learner can answer with a different hypothesis.
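The learner–nature game can be phrased as a tiny simulation loop; this is entirely my own sketch, with invented interfaces, just to fix the shape of the protocol:

    from itertools import islice

    def nature(world):
        # nature streams truthful evidence about the chosen world (a set)
        for x in sorted(world):
            yield x

    def play(world, learner, steps=10):
        seen = []
        for x in islice(nature(world), steps):
            seen.append(x)
            print(f"evidence {seen} -> conjecture {learner(seen)}")

    # a placeholder learner: conjecture "exactly what I have seen so far"
    play({3, 1, 4, 5}, learner=lambda seen: set(seen))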
Now, this is the condition: when is the learner successful? This is something about which, after this school, you probably have a lot of doubts: what does it mean to learn something? We have so many different paradigms for learning — reinforcement learning, neural networks nowadays, all sorts of programming paradigms — and what is very often obscured is what the satisfaction condition would be for saying that learning actually happened. Especially in approximate learning, what you often have is some sort of empirical criterion that you try to satisfy: that the agent, or a simulated agent, behaves in a good enough way in the environment. Here, in the abstract context, we have the two most interesting conditions. One is that after finitely many steps you know for sure which hypothesis is the one of interest; and in the limiting case, after finitely many steps you converge to the right hypothesis and then never change your mind — and learning is still successful, because the learning method is reliable. So I would say that the learner succeeds if she stabilizes to a correct hypothesis over time; the precise notion of stabilizing is something you can choose — there are many different options. Obviously, as we saw from our Eleusis game, the success of the learner depends very much on her skills — how good the learner is, what kind of procedure she has in the background — but also on the problem definition, on the hypothesis space; there will be those two things. If you want to treat it in more mathematical detail, you would say that the step at which the answer becomes correct — the moment of convergence — is not computable. However, the learner is often assumed to be a Turing machine, which at least actually performs a computation, tries to come up with a reasonable hypothesis, and keeps on going. This is a notion of learning that has been known since the 1950s, and I will give you a little bit of the intuitions behind it. Any other questions at this point?

[On how probabilistic learning came to be in the first place:] Some of the teachers would probably disagree with me here, but it came from this type of discovery: that there is a certain wiggle room between knowing for sure after a finite amount of time and the limiting version of learning. There you basically add certain parameters that you are willing to agree on in advance — as in probably approximately correct learning, for instance: you have a certain threshold at which you are willing to accept the hypothesis. So it is a way to cope with the complexity of limiting inference — but I am not going to talk about probability; I am very sorry, I know that some people are disappointed.

Okay: various inference paradigms, and here just a short run through different possibilities. This is how I usually think of a learning paradigm: you have several criteria that you have to fill in on this list. For instance, you can think that your algorithm, or you as a researcher, are interested in learning functions: there is a certain phenomenon, you have some assumption that it behaves like a function, and you want to know what kind of function it is — perhaps you want to infer a Turing machine for it. So we assume the possible realities to be functions; the hypotheses are then names of functions — perhaps indeed indices of Turing machines, or perhaps we want some fragment of mathematics in which we can express these functions. The information accessible to the learner is then a sequence of past argument–value pairs, because this is what a function does: it takes an argument and maps it to some value; so you infer, from the graph of the function, which function is being learned. The learner is a function that takes a finite but progressively growing sequence and outputs a hypothesis from the hypothesis space, and the success criterion is defined in the following way: after a finite number of outputs, we stabilize to a correct answer. That is function learning.
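One classical recipe meeting this success criterion for a countable class is identification by enumeration: conjecture the first hypothesis in a fixed listing that is consistent with all argument–value pairs seen so far. A sketch under my own toy assumption that the class is the functions f(x) = a·x + b with natural coefficients:

    def hypotheses():
        # enumerate all pairs (a, b) of natural numbers
        k = 0
        while True:
            for a in range(k + 1):
                yield (a, k - a)          # encodes f(x) = a*x + (k - a)
            k += 1

    def learner(graph_so_far):
        # first listed hypothesis consistent with the observed graph
        for a, b in hypotheses():
            if all(a * x + b == y for x, y in graph_so_far):
                return (a, b)

    data = [(0, 3), (1, 5), (2, 7)]       # graph of f(x) = 2x + 3
    for t in range(1, len(data) + 1):
        print(learner(data[:t]))          # stabilizes on (2, 3)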
We can also think of model-theoretic realities. We don't have to think about functions; we can think that there is some sort of model — we want to learn what the reality is like, with objects standing in relations to each other. Say a robot in a room tries to figure out the actual description of the situation in question. This is mostly the work of Osherson and colleagues — see the book on the elements of scientific inquiry, if you would like to have a look at it. Model-theoretic learning assumes that the possible realities are logical models of a given signature. How many of you know what a signature of a model is? Only on this side of the room — that's interesting; left brain, right brain. The signature basically specifies, for a certain type of logical language, what we are allowed to talk about within a model: what kinds of objects you have — you have certain entities, but you also have relations and functions — and then the language is there to talk about those structures. The hypothesis spaces in such cases usually consist of first-order sentences, a little like the ones I actually gave — "exactly n cards are hearts" would be first-order definable — and the information accessible to the learner is not just pairs of numbers, as in the function case, but sequences of atomic formulas. We have grounded atoms: we talk about certain objects in the reality and say that relations or functions hold of them, on the atomic level; this is the information given about the world. The learner is a function that takes sequences of such data and outputs a hypothesis, and after a finite number of outputs we stabilize to a correct one. This is something that is alarmingly close to model checking, because we are just trying to figure out whether a certain first-order formula describes the model.

And finally, something that will be of interest to us: the most basic way of thinking about inductive inference, via sets — set theory. We now assume that there is no more structure to the actual realities than just a bunch of stuff that is in the world: a bunch of facts, a bunch of sequences of natural numbers, and so on. So the possible realities are sets of integers; the hypotheses are names of sets; the information accessible to the learner consists of sequences — infinite streams — of numbers; and the learner is a function that takes a finite sequence and outputs a hypothesis. Again: after finitely many steps of our inquiry, we stabilize to a correct answer. I am repeating this last condition many times because I would like it to be tattooed in your memory.

So let us consider some examples of learning sets. Consider the following class. The hypothesis space tells you that the sequence you will see conforms to exactly one of the following hypotheses, and each of them says "all natural numbers except one particular number": the first hypothesis says all natural numbers except 1, the second all natural numbers except 2, the third all natural numbers except 3, and so on — the natural numbers without one singleton, and all hypotheses of that shape. Is that clear? So let's try to play it; I am giving you the sequence now. [An answer from the audience.] Thank you very much — usually this answer comes much later in the talk — but let's see what happens.
So you saw 1. If you think in terms of eliminative learning, how many hypotheses does this piece of evidence allow you to eliminate from the space? Just one: you eliminate "all natural numbers except 1", and you are still nowhere — you have infinitely many hypotheses left at your disposal. You could claim that there is some way to describe all this complexity at once, but the goal is to actually figure out the right hypothesis. The answer that was given to this question is basically to come up with a method that will perhaps make you very slow in learning, but will be a conservative way of changing your mind from one hypothesis to the other, following some sort of simplicity order on hypotheses — and this simplicity order is very problem-specific. Just to illustrate: what would you say now, after seeing 3? "Everything except 2" — yes. So after each step we are looking for the smallest number — the minimum — that does not appear in the sequence, and we conjecture the set of all natural numbers except that one. And you can prove — I will even show you a proof of this — that you can rely on this method, in the long run, to identify the right set. This again rests on the assumption that the sequences are sound and complete with respect to the hypothesis in question; those notions, soundness and completeness, you probably know from logic.
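A minimal sketch of this guessing rule (my own; the output format is invented):

    def cosingleton_learner(seen):
        # for the class { N \ {n} : n >= 1 }: conjecture the complement of
        # the least number not observed yet; once the truly missing number
        # is the least unseen one, the conjecture never changes again
        n = 1
        while n in seen:
            n += 1
        return f"all natural numbers except {n}"

    stream = [1, 3, 2, 5, 6, 7]           # a made-up stream for N \ {4}
    observed = set()
    for x in stream:
        observed.add(x)
        print(cosingleton_learner(observed))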
[A question about recovery:] It would be a little bit messy, because you would have to define, for your algorithmic procedure, a way to recover: if you now say "everything except 115", then what do you do if at some point you see 115? And please, stop with the probabilities — we are talking about symbolic methods here; there are probabilistic ways to cope with this problem, but that is not our topic. Listing all hypotheses that are still possible is a way to do it, but it would take the learner some time to produce that list after the first step, and if you wait until the listing procedure finishes, the problem resurfaces at the level of more complex hypotheses. This is an abstract problem about having a hypothesis space and evidence that do not match, or match only in certain ways; what I will try to do in the second part is talk about how to construct those simplicity orders logically. But indeed, your doubts correspond to actual steps forward that have been taken since. You mean that the learner can be proactive in terms of asking questions? Yes — and actually the version of the Eleusis game with the cards that I showed you is the first version, which is extremely hard to play. There is a newer version, called New Eleusis, in which the idea is that the players actively make experiments: a player thinks, "I know the hypothesis, and now I will play this card", either to make themselves even more convinced — which many people do: they keep playing cards that conform to their hypothesis and get nowhere — or to try to falsify the rule. And here you can think a little in terms of positive and negative data, of learning from full information.

Okay, a couple of questions about this previous game. The level of confidence in this kind of learning: each time you actually make a hypothesis, the level of confidence is kind of tricky, because it is a temporary hypothesis, something you are willing to change — but it is still essential to make it, because learning is defined in such a way that what matters is the sequence of hypothesis switches, not the actual time points. What would make us change our guess? We have answered that. What is the guessing rule? The guessing rule in this case was driven by the order on hypotheses that underlies the learning problem. Now, how do you like a winning condition for this game that says "at least one of your guesses is correct"? Exactly: if this were our learning condition, the most trivial strategy would be to just go through the sequence of rules, without caring — of course one of them will be correct at some point. That is why we want to consider a more demanding condition. Now, if you succeed in making a right guess and never change your mind after that, how many wrong guesses could you have made under this condition, in the previous scenario? This is what is called mind-change complexity in formal learning theory: what we measure as the complexity of learning is how many times the learner has to change their mind until they converge to the right hypothesis. It is a measure that is much nicer than measuring the size of the sample, because in such frameworks you never know when the crucial evidence will pop up — simple evidence can come very late.

Okay, the last part of this game. Assume that I give you all and only truthful clues, as so far: what would be the guessing rule to win according to the last winning condition? You're too fast for this list. Now, for the second point, we want to add the set of all natural numbers to our hypothesis space. Is the guessing rule that was proposed still good for learning in the limit? If we go back to the example: we have an inductively given, growing set of data points, and we keep on switching, always looking for the minimal number missing from the sequence — but there will always be a minimal missing number, so according to this guessing rule we will never get out of the co-singleton hypotheses in favour of the additional hypothesis of all natural numbers. Now keep this hypothesis space, including the hypothesis of all natural numbers, but assume I guarantee that my sequences come in increasing order. Would you win this game now? Yes: what changed is the postulated simplicity order underlying the hypotheses, because now the learner should start with the full set of natural numbers — which, first of all, is the biggest set among them (well, depending on how you think of it), and second of all, this order is very much learning-driven, driven by the learning procedure. If a gap ever appears in the increasing enumeration, you know exactly which number is missing and can switch, with certainty, to the corresponding co-singleton hypothesis; if no gap ever appears, you stay with the full set forever and are right in the limit.
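A sketch of that rule (mine; it leans on the promised properties that the stream is increasing, starts from 1 unless 1 itself is missing, and skips at most the single missing number):

    def increasing_order_learner(seen_in_order):
        # class { N \ {n} : n >= 1 } plus N itself, evidence increasing:
        # conjecture N until a gap shows up; a gap identifies the missing
        # number with certainty, so that guess is never revised
        expected = 1
        for x in seen_in_order:
            if x == expected + 1:         # `expected` was skipped
                return f"all natural numbers except {expected}"
            expected = x + 1
        return "all natural numbers"

    print(increasing_order_learner([1, 2, 3]))      # all natural numbers
    print(increasing_order_learner([1, 2, 4, 5]))   # ... except 3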
So what you can see is that the simplicity order on hypotheses depends very much on the learner, on the hypothesis space, and on the structure of the evidence — those are the things that matter — and the second part of the lecture will try to make this specific, and to come up with objective criteria for simplicity in logic. If you would like to read more — and I know that some of you are interested in logic; at least I hope my students sitting there are — then you should perhaps do some digging in the library and look at the papers from the 1960s. The name of Hilary Putnam will pop up: trial-and-error predicates and the solution to a problem of Mostowski. This relates exactly to the decidability/semi-decidability controversy that we had here: what is the right notion of decidability corresponding to this concept of identification?

Let me give you a bit more insight at a higher level. Did you hear about Mark Gold at this school? No? Yes, one person — okay, I will show those results now. It is actually quite interesting that Gold does not often appear in such contexts, because his was among the first attempts; his work remained rather hidden — he worked for the RAND Corporation. The first paper he published, on limiting recursion, appeared in 1965, at the same time as Hilary Putnam's paper, and only in 1967 did he postulate his language identification in the limit. This was strictly a response to Chomsky's ideas about formalizing natural language with grammars, and what Gold proposed is that we can think of the reversed Chomsky paradigm: instead of thinking that grammars generate the proper sequences of a certain language, we look at the sentences that appear in natural language and come up with the grammars that generated them in the first place — which is basically what inductive inference is like. And of course there is Ray Solomonoff; his work is probably known to many of you — how many of you know it? Three, again — one of the first papers on inductive inference. All of those people knew logic, and all of them drew their ideas from the differences between decidability and semi-decidability.

How does it work? [On the class used earlier:] Our assumption is, first of all, that one of the hypotheses has to be correct, because the sequence is consistent with one of them; if our hypothesis space contained all co-finite sets — sets whose complement is finite — it would be a different algorithm, but we could still enumerate them. Okey-doke, so now we go to decidability. What are those trial-and-error predicates in logic? This is the most classical connection to logic that you can make, the original way of thinking about these learning paradigms. A predicate — a set — P is decidable if there is an effective procedure (and here "effective" means computable) φ such that x belongs to P, that is, satisfies the predicate, if and only if this computable function gives you 1, and x does not belong to P if and only if it gives you 0. That is something that all of you, as adepts of model checking and related things, know very well. Now, the question those people asked themselves is: what happens if we modify this condition by allowing φ to change her mind any finite number of times? So first she says 0, and then she says, "oh, maybe 1", and so on. [A question:] Yes — it would be decidable if we knew the actual number of allowed mind changes, because then we would know when to take the Turing machine's answer seriously; you mean that we would have an oracle, yes. And yes, it is semi-decidable in a limiting sense. This relaxation might seem a crazy condition, but it is not as crazy as it could be — we could imagine crazier — and it seems that many cognitive phenomena, at least, work this way: there is some computable program, and it just keeps adjusting over time, and we are fine with that.
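In symbols, my restatement of the condition just quoted:

\[
P \text{ is decidable} \iff \exists\, \text{computable } \varphi\ \forall x:\ \big(x \in P \Leftrightarrow \varphi(x) = 1\big) \text{ and } \big(x \notin P \Leftrightarrow \varphi(x) = 0\big);
\]

the relaxation then lets \(\varphi\) revise its verdict finitely often, so that only the eventual, stable answer counts.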
We don't want a definite answer straight away; we just want some sort of reliability conditions to be satisfied. So this is how you formally spell it out: P is a trial-and-error predicate if there is a Turing machine φ such that x is in P if and only if there exists a point — a natural number n — such that for all points after it, the machine at those steps answers 1; and otherwise it eventually answers 0. The idea is that from some finite point on, we just stabilize to the correct answer. Here you can see that we have just the two outputs, zeros and ones, but this is not what we have done so far: before, we did not only answer 0 or 1, we answered with many possible outputs — our outputs were natural numbers; we could answer with any index of a Turing machine. So identification extends this. And here you can freely interchange "effective procedure", "recursive function", and "Turing machine": all of those fall under the same notion for these questions.

So now Kleene and Mostowski enter — the arithmetical hierarchy — and for those of you who know a little topology, this also goes all the way to Borel. How many of you know the Borel hierarchy? Some. I am not going to go into that in this talk, but the idea is that you can talk about degrees of arithmetical or topological complexity by thinking of the logical formulas that are essential to express them. At the bottom we have the condition that the status of a predicate is computable; then we have predicates for which there exists a point such that the answer is given — an existential condition; then we have what is expressible both by an existential and by a universal condition; and there we have actually the trial-and-error predicates, and also Σ⁰₂, which is often associated with learning processes. The quantifier prefix in the definition of a trial-and-error predicate indicates its place in the arithmetical hierarchy: "there exists a point such that for all later points" — the prefix ∃∀, which is Σ⁰₂. In general, in the second part of this lecture, I will talk both about the binary case, in which you have decidability-type questions, and about more refined cases, in which you as a learner are asked to say exactly which reality you think you are in — not only to answer some general question, but to output a hypothesis. Why this Σ⁰₂ twist? Because if you think of the dual condition: the positive condition is "there exists a point after which I know which hypothesis holds", but it does not follow that if a hypothesis fails, you have to come to know that it fails; you might not be able to exclude a hypothesis, and yet still be able to learn the true one — as we had in the case where each information point eliminated just one hypothesis. There is this discrepancy, and I will talk about it a little more.
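Spelled out with the quantifier prefix that fixes its place in the hierarchy (my restatement of the slide):

\[
x \in P \iff \exists n\, \forall m \ge n:\ \varphi(x, m) = 1,
\qquad
x \notin P \iff \exists n\, \forall m \ge n:\ \varphi(x, m) = 0.
\]

The trial-and-error predicates are exactly the \(\Delta^0_2\) sets, while the bare \(\exists\forall\) prefix is \(\Sigma^0_2\), the level usually associated with identification in the limit; \(\Delta^0_1\) is decidability and \(\Sigma^0_1\) semi-decidability.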
Let's see where we are. Some more intuitions; this slide is for your convenience — it basically introduces the basic notation used in this literature. We talked about streams of information; they are often called texts, because if you think of language learning, it is the text that you are learning from: sequences of data that come from the language. Most of the time a text enumerates all and only the elements of the set — no noise, no errors; one can relax these conditions. For me it starts at zero, most of the time — I have many conventions across my papers, and sometimes I paste a definition from one paper, sometimes from another; some of them start at zero, some don't. Normally I would include zero, but the problem is that then you have to cope with minus ones, and sometimes the parameters become a little obscured when you read the paper; so my slides might occasionally be inconsistent in this respect, sorry about that — but here, as you see, I am starting at zero. So: S is a family of sets that we are learning; we have a stream of observations from one of those sets — any of those sets, and any stream, so any order of presentation. We use the following notation: T_n stands for the n-th element of T; the restriction of T to n is the initial segment of T up to position n−1; and content(T) is what you get when you remove the order from the elements — you just think about the content of the sequence, without the enumeration. The learner maps finite sequences of numbers into a number; this is what I call a learning function. The real point is that those two kinds of numbers have a different ontological status: the numbers in the sequence talk about the content of the set, while the output number is something like an index of a Turing machine. What is nice about these number-theoretic paradigms is that we can actually do everything with natural numbers; it is just a matter of keeping a certain encoding in mind. For those of you who don't know about coding, I am sorry, we won't talk about it, but you should trust me that it is possible to assign numbers to those functions and just treat them as natural numbers.

So now, the formal condition of identifiability in the limit comes as the following sequence of conditions. We have a learning function as before, and we say that the learning function identifies a set from this space in the limit on a certain stream if for cofinitely many n — for cofinitely many steps — it outputs the index of the set: it points to the right one. It identifies the set, independently of the stream, if it identifies the set on every enumeration of that set; this is the condition that ensures there is no cheating in the learning process — if we just required the learning function to stabilize on one particular sequence of evidence, that would be a kind of cheating; the idea here is that, independently of the order of presentation, you actually converge to the right hypothesis. It identifies the whole class in the limit if it identifies in the limit every element of this class — so now we can also talk about identifiability of the whole hypothesis space, instead of identifiability of a single set. And S is identifiable in the limit if some learning function identifies S in the limit — an existential condition again: there has to exist a function that learns it. This is what I mean by learnability, or identifiability in the limit: a sequence of conditions that is really bulletproof — any enumeration, any set from the class in question. Any questions about this? Yes — cofinite means that only finitely many guesses are wrong, which is the same as saying that after a finite number of steps we will always have the right answer; if you think in terms of magnitudes, it is just that finitely many are wrong.
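The same chain of definitions in symbols, writing \(i_S\) for an index of \(S\) (my restatement):

\[
L \text{ identifies } S \text{ on a text } T \iff \exists n_0\, \forall n \ge n_0:\ L(T{\upharpoonright}n) = i_S;
\]

\(L\) identifies \(S\) if this holds on every text for \(S\); \(L\) identifies the class \(\mathcal{S}\) if it identifies every \(S \in \mathcal{S}\); and \(\mathcal{S}\) is identifiable in the limit if some \(L\) identifies \(\mathcal{S}\).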
Okay, so some examples of learnable classes. Take the class that contains all finite initial segments of the natural numbers — finite sets: {1}, {1, 2}, {1, 2, 3}, and so on. Now we can claim that this class is identifiable in the limit, simply by giving the following learning function, which is defined the way we would expect it to be: it takes a finite sequence of natural numbers and outputs the initial segment ending at the maximal element of the content it has seen so far. As a result it will point to the right encoding in the limit: it will of course change its mind — perhaps many times — but at some point, since this is the right learning problem for it, it will stop and stabilize. Questions on this — how many of you can parse this definition? [On the indices:] It depends on the way you encode the hypotheses: you can think of it as an ordering, but you can also think of it as a coding of the sets by natural numbers. In this case it is actually even easier: since those are finite sets, you could just answer with the whole set, computably — you could just output the set in question. So S₁ is identifiable in the limit by this function, and the fact that this function identifies S₁ in the limit means that from some point on its output is the right index; that is included in the definition.

Now — and those of you who don't want coffee in the break can go through this argument over the break — consider the problem in which we have the same finite class as before, but on top of it the set of all natural numbers. I will go through it. To show that this class is not identifiable in the limit, let us assume that there is a function — a learner — that identifies this new class; we will construct an infinite sequence of observations on which this function, whatever it is, will fail. It is one of those mathematical arguments that sometimes make your head spin, so let's see how it goes. Our sequence starts by enumerating the natural numbers in order: 0, 1, 2, and so on. If at some number k our learner — who is supposed to be a good, reliable identifier who knows what they are doing — decides that the set is all of ℕ, then we start repeating k indefinitely, which means that T is becoming a text for the finite segment S_k. As soon as the learner decides it is S_k, we continue with k+1, k+2, and so on, so that it becomes a text for ℕ again. What we are basically doing is using the assumed existence of the good learner to construct the very text on which this learner will always fail: as soon as it thinks the set is finite, we keep feeding it more and more; and as soon as it says, "okay, I give up, this is the infinite set", we stop giving it anything new. This is what is known as Gold's theorem. [On the nature of the proof:] No — mathematically speaking, this is a non-constructive proof: we are not using a particular learning method; you don't have to point to the right learner. And the learner itself does not have to be computable — it is not as if we needed a Turing machine to make this proof work. If the learner never commits to the infinite set, then it fails to identify the hypothesis that is consistent with the infinite sequence; and if it starts with any other hypothesis, what I can do is simply manipulate it again: if it says S₁, I give it 2; if it says S₂, I give it 3; and so on. I keep it changing its hypothesis forever — an infinite sequence of mind changes — in which case it fails to identify, because it never stabilizes.
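Both halves of this story can be animated in a few lines (my own sketch; the answer format ('fin', k) versus ('inf',) is invented). The max-rule learner identifies the class of initial segments, and the adversary is the construction from the proof, here run against that very learner:

    def initial_segment_learner(seq):
        # for the class { {0,...,k} : k in N }: conjecture the segment up
        # to the largest element seen so far
        return ("fin", max(seq)) if seq else ("fin", 0)

    def gold_adversary(learner, rounds=20):
        # Gold's construction: once N joins the class, build a text on
        # which `learner` never stabilizes on a correct conjecture
        text, k = [0], 0
        for _ in range(rounds):
            if learner(text) == ("inf",):
                text.append(k)            # repeat: steer towards {0,...,k}
            else:
                k += 1
                text.append(k)            # extend: steer towards N
        return text

    # the max-rule learner is driven upward forever, answering 'fin' on
    # what is in fact becoming a text for N
    print(gold_adversary(initial_segment_learner))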
And there are also other examples. Here we have the class of sets that we considered in the beginning — the sets with just one integer missing — and now we have two theorems: the class of all finite languages is identifiable; a class containing all finite languages and at least one infinite language is not identifiable. Those are two of Gold's theorems, and of course this is a bit more general than what I showed you on the previous slide, because previously I talked only about initial segments of natural numbers, and this is about all finite sets. What this did, historically, is challenge the Chomskyan way of thinking about language, because it just said: okay, reverse it, think about learnability — and we are failing at the most basic level. If you think of the levels of the Chomsky hierarchy: apart from the finite languages, which are at the very core, nothing else would be identifiable, so you could not learn even the regular languages. And we are talking about the time before all these things with statistical methods happened, mostly; so people thought: okay, this cannot be the right model for learning, because we actually do learn — supposedly — regular languages as humans, or we want our machines to learn context-free grammars. What happens, how do we cope with this problem? Either the Chomsky hierarchy or Gold learning must be off — or both — cognitively speaking, of course; as a mathematical concept it just exists (I am a realist about this: it exists, and it's fine; we shouldn't throw it away). But boom, there was a problem. What happened afterwards were many attempts to think about Gold learning in different ways — PAC learning and computational learning theory came out of it, and, you know, the controversy sparked huge research interest. Another response to this problem was to question the Chomsky hierarchy itself, as something we should perhaps not be so obsessed with, and this is where pattern languages come in — something that emerged orthogonally to the Chomsky hierarchy: the idea that languages actually follow simple patterns that are not consistent with the automata-theoretic perspective. And here the name of Dana Angluin should pop up — how many of you know the name? She proposed a lot of interesting algorithms for coping with this problem.
Okay, so as a summary before the second part: there is a very interesting paper by Kugel with the title "Thinking May Be More Than Computing". It might sound a little vague, like wishful thinking, especially in the light of the Church–Turing hypothesis that our minds are Turing machines, but the idea of this paper is that we are actually perhaps Turing machines whose success condition is not a computable one. For instance, if you think about language learning, you don't expect that people really converge in finite time to the right hypothesis about the grammar in question; there is some sort of open-endedness. In this paper you can see a lot of examples, varying from computer science to artificial intelligence to cognitive science, that substantiate the claim that limiting knowledge is actually an interesting concept — perhaps even more interesting than certainty — and something you should aspire to as researchers, or scientists, or learners. True, there are good reasons for preferring the computable way of deriving knowledge: we know the results of computations, whereas we only think we know the results of trial-and-error procedures. There are many reasons for preferring knowing to thinking, as Popper observes, but that does not change the fact that sometimes thinking may be more appropriate.
Info
Channel: Federated Logic Conference FLoC 2018
Views: 898
Rating: 5 out of 5
Id: NcFUibIQCWc
Length: 69min 4sec (4144 seconds)
Published: Mon Jul 30 2018