Riemannian manifolds, kernels and learning

Captions
Each year Microsoft Research hosts hundreds of influential speakers from around the world, including leading scientists, renowned experts in technology, book authors and leading academics, and makes videos of these lectures freely available.

Good afternoon everybody, thanks for coming. It's a pleasure for me to welcome back Richard Hartley to MSR; he's visited a few times before. Richard needs very little introduction: the very famous professor at the Australian National University. Everyone knows that he worked on projective geometry for a long time, and he has a very famous book that a lot of vision folks have, co-authored with Andrew Zisserman. Some very little-known facts about Richard: he likes opera, he likes to hike in dangerous places, and he actually won a prize for drinking. So at least two of those things are true.

Okay, thanks. This is not entirely my own work; my name is first simply because I wrote the slides. In particular, the work at the end was done largely by my student Sadeep, if we get that far.

So what is a manifold anyway? I'm going to start fairly slowly, giving some definitions, so that those who are not totally familiar with this sort of thing are not lost from slide one. While doing research on this I came across this in the National Library of Australia: James Chester Manifold, 1867 to 1918, from a family of Manifolds in Australia — there were actually quite a lot of them — who have, naturally enough, nothing whatever to do with this talk.

Here are some examples of manifolds which one would be familiar with, mostly two-dimensional ones; we'll come back to that slide. A manifold — I didn't actually give the definition, did I — is something which looks like Euclidean space if you don't look too far. The surface of the earth, you may say, is a two-dimensional manifold: every region of it looks like a region of the plane. We're not talking about geometric measurements yet, just rubber-sheet topology, for the time being.

Examples of manifolds — there are a lot of them, and some have not much to do with computer vision, but others do. R^n is naturally an n-dimensional manifold, because around every point there is a little piece which looks like a little piece of R^n, naturally enough. The sphere S^n is a common one; I like the surface of the earth, S^2. Rotation space is one I've given a bunch of thought to recently and looked at in some detail: SO(3), the space of three-dimensional rotations, a very simple manifold — a three-dimensional manifold — but there is a lot to learn about manifolds from looking at SO(3), and you can think about it in lots of different ways. Positive definite matrices form a manifold; I'll have a bunch to say about positive definite matrices. In fact they form a cone, as is well known to those who do convex optimization: an n-by-n matrix sits in R^{n x n}, that is R^{n^2}, and a neighbourhood of such matrices is like a neighbourhood in a Euclidean space. Grassmann manifolds are an interesting one; I'll say something about them later. The essential manifold harks back to what I spent a lot of time thinking about in previous years: it is related to structure and motion, and it represents the rotation and translation of one camera with respect to another — the set of all such possible rotations and translations is the essential manifold. And one which I've looked at a little bit recently: shape manifolds, which are an interesting thing.
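To make a couple of these examples concrete, here is a small numerical sketch of my own (not from the talk): a rotation matrix obtained by exponentiating a skew-symmetric matrix lands on SO(3), and positive combinations of positive definite matrices stay positive definite, which is the cone property mentioned above.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# A random rotation: exponentiate a random skew-symmetric matrix.
w = rng.normal(size=3)
W = np.array([[0, -w[2], w[1]],
              [w[2], 0, -w[0]],
              [-w[1], w[0], 0]])          # skew-symmetric "angular velocity"
R = expm(W)                               # an element of SO(3)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))

# Positive definite matrices form a cone: a*P + b*Q stays SPD for a, b > 0.
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
P, Q = A @ A.T + 1e-3 * np.eye(4), B @ B.T + 1e-3 * np.eye(4)
C = 2.0 * P + 0.5 * Q
print(np.all(np.linalg.eigvalsh(C) > 0))  # all eigenvalues positive
```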
Okay, that's manifolds; I'll say a little bit more about Riemannian manifolds in a bit. The other main topic of this talk is inner product spaces — Hilbert spaces — and of course they're tied up with kernels, which is the other thing in the title of the talk. So what is a Hilbert space? A Hilbert space is simply a vector space with an inner product — for me, that's it: a vector space with an inner product, a dot product. Well, it does have one other property to really make it a Hilbert space: it has to be complete under convergence of sequences, but that's not very important to me at this point — interesting, but not important. Usually when you talk of a Hilbert space you think of it as being infinite dimensional, but that's not necessarily the case; the infinite-dimensional ones are things like spaces of functions on the real line, or on an interval of the real line. One of the things that goes along with a Hilbert space is a norm. A norm is basically a length: the sum of the x_i squared gives you the square of the length of a vector, and you can get it from the inner product as <x, x>. So if you've got an inner product, you've always got a norm. The opposite does not hold: there are norms which don't give you inner products, such as the l1 norm; they do not arise from an inner product. Norms allow you to measure distances; inner products allow you to measure angles (and distances): the angle comes from the well-known formula cos(theta) = <x, y> / (||x|| ||y||) — the inner product gives you the cosine of the angle, effectively. We'll talk a little more about Hilbert spaces in a bit.

Riemannian manifolds. I said what a manifold is; what is a Riemannian manifold? Really the only thing important to me here about a Riemannian manifold is that you can actually measure lengths of curves on the manifold: here is a curve gamma(t) on the manifold, and you can in fact measure its length. To be precise, it has to do with inner products on the tangent spaces. The way I think of manifolds is very intuitive: for me a manifold is embedded in some R^n, like a surface in R^n. For a mathematician that's not entirely satisfactory, but think of it like a surface in R^n, and the tangent plane is, in the natural way, a plane that meets the surface tangentially at one point. One of the things you can do on a manifold is take a direction in the tangent plane and head along the surface in that direction a certain distance. The way I like to think of that is in terms of riding a motorbike on a road which is very slippery: if you want to ride the motorbike and not come off, you should arrange it so that your acceleration is always perpendicular to the surface, otherwise you'll slide sideways and fall. A geodesic is the curve I'm describing here: a curve whose acceleration is always perpendicular — normal — to the surface.
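As a small illustration of the remark that a Riemannian structure lets you measure the length of a curve gamma(t): here is a minimal sketch of my own, assuming the unit sphere S^2 in R^3. It discretises two curves between the same endpoints and sums the small chord lengths; the great-circle arc comes out shorter than a wiggly path, previewing the shortest-path property discussed next.

```python
import numpy as np

def curve_length(gamma):
    """Approximate length of a discretised curve given as an (N, 3) array of points."""
    return np.linalg.norm(np.diff(gamma, axis=0), axis=1).sum()

t = np.linspace(0.0, np.pi / 2, 2000)

# Great-circle arc from (1,0,0) to (0,1,0) on the unit sphere: a geodesic.
arc = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)

# A wiggly path on the sphere between the same endpoints (wanders in latitude).
phi = 0.3 * np.sin(4 * t) * np.sin(2 * t)      # vanishes at both endpoints
wiggle = np.stack([np.cos(t) * np.cos(phi),
                   np.sin(t) * np.cos(phi),
                   np.sin(phi)], axis=1)

print(curve_length(arc))     # ~ pi/2, the geodesic distance
print(curve_length(wiggle))  # strictly longer
```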
Geodesics can also be described as the shortest path between two points: if you take two points on a manifold and stretch an elastic band between them, it will settle down on a geodesic — the shortest path, or at least a locally shortest path. So that's what a geodesic is: a taut piece of elastic band; a curve in the surface whose acceleration is always normal to the surface; and also a locally distance-minimizing curve — you can break the curve into little sections, and each is the shortest path between its endpoints.

Q: Does this require some smoothness condition? A: Yes, I should have said: I'm talking about smooth curves and smooth surfaces. There is a hierarchy: what I first described was a topological manifold; the next level is what you call a smooth, differentiable manifold. On a topological manifold you can define continuous functions; on a differentiable manifold you can define differentiable functions; and on a Riemannian manifold you can define lengths and angles and things like that. So it is in fact differentiable and smooth, in my way of thinking.

Here's an example of a geodesic: this red line is a geodesic. It's obviously not the shortest path from here to there, but locally it's the shortest path — it's made up of a whole bunch of little shortest paths — and that is a geodesic, a great circle. Here are examples of geodesics on various manifolds. On a simple thing like a torus the geodesics are remarkably complicated, and on a pretzel even more so; but on things like cones they are actually quite easy: draw a painted line on the ground and roll the cone over it, and the paint will transfer to the cone along a geodesic line on the cone.

Getting back to this picture: there's a thing I'm going to talk about called the exponential map. If you start here with velocity v, heading in some direction, and you go along the geodesic for time one at velocity v, you'll arrive at a point on the manifold; that point is known as the exponential of v. Don't ask me why it's called the exponential — it has something to do with the way this works for matrix spaces — but it is a mapping from the tangent space at a point to the manifold. The mapping which goes the other way is called, naturally enough, the logarithm map. A lot of algorithms use these; we'll talk about that when we get to it. So: geodesics and the exponential map.
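A minimal sketch of the exponential and logarithm maps just described, written out for the unit sphere S^2 where they have closed forms (my own illustration; the function names are mine, not from the talk): exp_p(v) follows the geodesic from p with initial tangent velocity v for time one, and log_p(q) inverts it.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: follow the geodesic from p
    with tangent velocity v (v perpendicular to p) for unit time."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * v / theta

def log_map(p, q):
    """Logarithm map: the tangent vector at p pointing along the geodesic to q,
    with length equal to the geodesic (arc) distance."""
    c = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(c)            # geodesic distance
    if theta < 1e-12:
        return np.zeros_like(p)
    u = q - c * p                   # component of q perpendicular to p
    return theta * u / np.linalg.norm(u)

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = log_map(p, q)
print(np.allclose(exp_map(p, v), q))   # exp and log are inverse to each other
print(np.linalg.norm(v))               # pi/2: geodesic distance from p to q
```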
These tools allow you to do convex optimization on manifolds, which I've looked at a bit recently, particularly on SO(3), because with the structure of a manifold you can start doing things which mimic convex optimization and optimization theory. Convex sets are defined in terms of geodesics: a geodesic is your replacement for a straight line in manifold theory — the "straightest" path between two points — and if the shortest geodesic between any two points of a set lies inside the set, that set is called convex. Functions are convex if they are convex when restricted to geodesics. So you can do a whole bunch of convex optimization. You can also define a Hessian, like this: if we have a function from the manifold to the real numbers, I can define its Hessian by starting from the tangent space, mapping by the exponential map and then by the function; that composite is now a map from R^n, a Euclidean space, to R, and it has a Hessian; if that Hessian is positive definite, the function is convex. This allows you to do iterative algorithms on a manifold. The gradient is defined in the same way — the gradient of the function is the gradient of that composite map — so you can define gradient-descent algorithms on manifolds, and Newton algorithms on manifolds using the Hessian. But that's not really what I want to talk about.

Before I leave optimization on manifolds, though: I think I've talked about this here before — the Weiszfeld algorithm on a manifold. Given a bunch of points on a manifold, the Weiszfeld algorithm tries to find the point which is their median, in the sense that it minimizes the sum of distances to the points. The standard sort of algorithm for this — or for optimization on manifolds in general, given that you know how to do optimization on the tangent space, because the tangent space is Euclidean, it's flat — is: start at a point, map the data up to the tangent space, do a step of optimization there, and map back down again by the exponential map; then take another tangent space and keep going. An interesting area of study is when these algorithms converge, what the radius of convergence is, and so on, and we've been looking at that sort of problem on SO(3). Q: Are those two pictures showing the same thing? A: Yes, just two different diagrams of the same idea.

An example of this is rotation averaging on SO(3), the set of rotations. This comes back to structure from motion: you've got a camera moving around, and you can work out the relative motion between pairs of cameras by methods based on the fundamental matrix or the essential matrix, so you get the pairwise relative rotations R_ij; but for consistency, because rotations form a group, you need the relation R_ij = R_j R_i^{-1} to hold, and the question is to find the absolute rotations R_i of each camera. It's an example of averaging on a manifold, in this case the manifold of rotations. I think I've shown this before, if you were here at my earlier talk: we did this with the Notre Dame data set, where these are all different positions of cameras — 569 images, around 280 thousand points, and forty-two thousand pairs of images that overlap by more than 30 points. If a pair has that much overlap you can work out the relative rotation between the two cameras, but some pairs you can't — they don't see anything in common — so you get a graph whose nodes are cameras, where you can work out the relative rotation along each edge, and the task is to work out all the absolute rotations. I won't go into the details of the algorithm, but it's an averaging process: iteratively we average the rotation of one camera based on the now-supposedly-known rotations of all its neighbours and the relative rotations.
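The tangent-space iteration just described — log the data to the tangent space at the current estimate, take a step there, map back with the exponential map — can be sketched for an L1, Weiszfeld-style rotation mean. This is my own minimal version, not the implementation from the talk; the use of scipy's matrix exp/log and the standard 1/distance Weiszfeld weights are my choices.

```python
import numpy as np
from scipy.linalg import expm, logm

def log_so3(R):
    """Skew-symmetric matrix log of a rotation (real part, guarding numerics)."""
    return np.real(logm(R))

def weiszfeld_so3(rotations, n_iter=50):
    """L1 geodesic median of rotation matrices via Weiszfeld-style updates."""
    R = rotations[0].copy()                      # initial estimate
    for _ in range(n_iter):
        num = np.zeros((3, 3))
        den = 0.0
        for Ri in rotations:
            V = log_so3(R.T @ Ri)                # tangent vector at the current estimate
            d = np.linalg.norm(V) / np.sqrt(2)   # geodesic (angular) distance
            if d < 1e-9:
                continue                          # skip points coinciding with the estimate
            num += V / d
            den += 1.0 / d
        if den == 0.0:
            break
        R = R @ expm(num / den)                  # step in the tangent space, exp back down
    return R

# Toy usage: noisy copies of one rotation.
rng = np.random.default_rng(1)
def rand_rot(scale):
    w = scale * rng.normal(size=3)
    W = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    return expm(W)

R_true = rand_rot(1.0)
data = [R_true @ rand_rot(0.05) for _ in range(20)]
R_est = weiszfeld_so3(data)
print(np.linalg.norm(log_so3(R_true.T @ R_est)) / np.sqrt(2))  # small angular error
```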
Q: You could also weight the estimates by the confidence or reliability of each relative rotation? A: Yes, that's true — it depends on reliability, and some estimates may not be right — but in what we were doing we didn't actually do that. Instead, the algorithm I mentioned, the Weiszfeld algorithm, does an L1 optimization, and L1 is basically robust — L1 rather than L2 — so bad measurements are not so important. And once you've actually solved it, you can work out whether some of the measurements were bad, because you can look around a loop: roughly speaking, if composing the relative rotations around the loop doesn't add up to one complete turn back to where you started, then one of the estimates around that loop is wrong, and there are processes you could think of for weeding out the bad ones. That's the cycle condition: go around the loop, composing the relative rotation from one camera to the next, and the whole cycle has to come back to the identity.

Q: What do you mean by the average here? A: Going back to this diagram: think of this as SO(3), and you've got a bunch of rotations represented by points on it, all supposed to be estimates of one rotation. What is the best estimate of that one? In R^n you would add up all the estimates and divide by n; here you want the point such that the sum of geodesic distances from it to all the others is minimized. Q: Can that be degenerate for particular configurations of points? A: [A short exchange about particular point configurations followed.] The point is that we minimize the sum of distances, not the sum of squares; that is why I call it a median. The Weiszfeld algorithm is iterative: you start at a point, do an update in the tangent space, map back down, and repeat. There is an explicit formula for the update — the Weiszfeld formula — which is essentially a gradient-descent-style update, and we can prove that it is convergent in R^n, prove it convergent on SO(3), and in fact prove it convergent on a Riemannian manifold of positive curvature. Q: Does this only work locally, in some sense? On the sphere there are antipodal points. A: Yes, you have to be careful: the points must not be spread everywhere; you have to work within a suitable region.
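The loop test mentioned above — composing the estimated relative rotations around a cycle should bring you back to the identity — can be checked directly. Here is a small sketch of my own with hypothetical relative rotations, flagging a loop whose composition deviates by more than a threshold angle.

```python
import numpy as np
from scipy.linalg import expm, logm

def rot(axis_angle):
    """Rotation matrix from an axis-angle vector."""
    w = np.asarray(axis_angle, dtype=float)
    W = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    return expm(W)

def loop_error(relative_rotations):
    """Angle (radians) by which the composition of a loop of relative
    rotations R_12, R_23, ..., R_n1 fails to be the identity."""
    R = np.eye(3)
    for Rij in relative_rotations:
        R = R @ Rij
    return np.linalg.norm(np.real(logm(R))) / np.sqrt(2)

# A consistent triangle: the third edge is chosen so the loop closes exactly.
R12, R23 = rot([0.1, 0.2, -0.1]), rot([0.0, -0.3, 0.2])
R31 = (R12 @ R23).T
print(loop_error([R12, R23, R31]))                  # ~0: consistent loop

# Corrupt one measurement: the loop no longer closes.
bad = R31 @ rot([0.4, 0.0, 0.0])
print(loop_error([R12, R23, bad]) > np.deg2rad(5))  # True: flag this loop
```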
Okay, I want to finish with that and talk about what I advertised: kernels on manifolds. So what is a kernel anyway? If you do machine learning you talk about kernels: kernels for kernel SVM, kernel Fisher discriminant analysis, kernel PCA, things like that — I'll show these in a little bit. What is a kernel, exactly, from a mathematical point of view? It's like a similarity measure: it's defined for pairs of points in some set S — any set — and k(x, y) is large when the points are similar and small when they are dissimilar. Think of it as being like an inner product: the inner product of two vectors is large when they are close together and near zero when they are far apart. So it's a similarity measure, analogous to the inner product; in fact, if a symmetric kernel is positive definite then it essentially is an inner product, and I'll say what that means. The point is that you like symmetric positive definite kernels, because then the kernel looks like an inner product, and the kernel trick lets you run a whole bunch of algorithms that use kernels — kernel SVM et cetera — things you use for discriminating between classes. I'll give examples. The point of the talk, then, is: when can you find such kernels — similarity measures between points — on a manifold?

I mentioned positive definite; here is the definition. Form the matrix K with entries K_ij = k(x_i, x_j). The kernel is positive definite if, for every choice of points x_1, ..., x_n in the set S and every vector c, we have c^T K c = sum_{i,j} c_i c_j k(x_i, x_j) >= 0 — that is, the matrix indexed by i and j is positive (semi-)definite. The importance is that if a symmetric kernel is positive definite, it is just like an inner product, in the sense that there exists a mapping phi from the set X (I was calling it S before) into a Hilbert space H such that k(x, y) = <phi(x), phi(y)> in H. If you're familiar with the kernel trick, you look for algorithms that only rely on inner products, so you can do all these things. Q: Is it constructive — can you actually get the phi and the Hilbert space? A: Yes, you can define H and define exactly what phi is; there is a way of doing it. But normally you don't have to worry about that, and in the cases we're looking at you seldom have to construct it, because all you're really interested in is the inner product, and that is given by k. As long as you have k, you're okay.

One kernel which one commonly uses is the radial basis function (RBF) kernel: k(x, y) = exp(-||x - y||^2 / sigma^2). I'm thinking here of the case where you're in Euclidean space, R^n, so ||x - y|| is the usual distance, and e to the minus distance squared over sigma squared is the radial basis function kernel. That kernel is positive definite for all sigma — in fact whenever the distance is a norm distance in a Hilbert space, and in particular in Euclidean space. So that got me thinking: why is this positive definite? Why are we able to use this to do algorithms with kernels?
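The claim that the Gaussian RBF kernel on Euclidean data is positive definite for every sigma can be spot-checked numerically (a sketch of my own; a numerical check on random data, not a proof): the Gram matrix K_ij = exp(-||x_i - x_j||^2 / sigma^2) should have no negative eigenvalues.

```python
import numpy as np

def rbf_gram(X, sigma):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / sigma^2) for the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                   # 50 random points in R^5

for sigma in (0.1, 1.0, 10.0):
    K = rbf_gram(X, sigma)
    print(sigma, np.linalg.eigvalsh(K).min())  # smallest eigenvalue: >= 0 up to round-off
```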
So I started looking around a bit. The first thing you think of is: maybe it's just because d is a distance — a metric, satisfying the triangle inequality. It turns out that's not the case: if you have a metric d(x, y) defined on a set S, that alone is not enough (I won't give away the answer just yet). So then: is having a norm good enough — if you are in a vector space with a norm, will that work? The answer, once again, is that it won't; you actually need an inner product. The theorem says this: the radial basis function kernel k(x, y) = exp(-d(x, y)^2 / sigma^2), where d is a given distance on S, is positive definite for all sigma if and only if S can be isometrically embedded in a Hilbert space. This says something rather different from the previous statement: here we are using a norm (a distance) in a Hilbert space, whereas before we were taking inner products in the Hilbert space; it's not the same theorem. So it is not enough to embed in a Banach space; it has to be embedded in a Hilbert space, even though we are only using the norm. If there is a Hilbert space and an isometric embedding of S into it, then you have a kernel.

Let's look at some examples. Take the sphere — this is supposed to represent a sphere, or maybe a circle, whatever it is. There are different ways of defining the distance between two points x and y: you can measure the distance around the curve (the geodesic arc), or the distance straight across — the chordal distance. The chordal distance on the sphere is clearly a distance realized in a Hilbert space: the sphere itself sits inside a Euclidean space, which is by definition a Hilbert space — a vector space with an inner product — and the chordal distance is exactly the distance in that Euclidean space. Whereas there is no way of embedding the sphere so that the Euclidean distance equals the distance around the curve. So the chordal distance, d(x, y) = 2 sin(theta/2) where theta is the angle between the points, leads to a kernel in all cases, but the geodesic arc distance does not give you a kernel.

We are not so interested in spheres; we are interested in positive definite matrices, because they come up in computer vision in a number of ways. One of the most common ways, perhaps, in recognition is through so-called covariance features: you look at a window, take little sub-windows, and compute things like intensity across that window, gradient across that window, and various other quantities, and you work out how they correlate — how does gradient correlate with intensity across this window, how does gradient correlate with curvature, and so on. These features were invented by Tuzel and colleagues, I believe. They can be used because they are covariance matrices — how these things correlate — so they are positive definite matrices, and they use those features to describe objects.
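A minimal sketch of the covariance-feature idea just described (my own choice of per-pixel features — coordinates, intensity and first derivatives — since the talk doesn't fix the exact list): each image window is summarised by the covariance matrix of its per-pixel feature vectors, which is a symmetric positive definite matrix.

```python
import numpy as np

def covariance_descriptor(window):
    """Region covariance of a grayscale window: covariance of the per-pixel
    feature vectors (x, y, intensity, |dI/dx|, |dI/dy|) -> a 5x5 SPD matrix."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(window.astype(float))
    F = np.stack([xs.ravel(), ys.ravel(), window.ravel(),
                  np.abs(gx).ravel(), np.abs(gy).ravel()], axis=0)
    C = np.cov(F)                          # 5x5 covariance of the features
    return C + 1e-6 * np.eye(C.shape[0])   # small ridge keeps it strictly positive definite

rng = np.random.default_rng(0)
patch = rng.random((32, 32))               # stand-in for an image window
C = covariance_descriptor(patch)
print(C.shape, np.all(np.linalg.eigvalsh(C) > 0))   # (5, 5) True
```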
Let me push ahead. This is the INRIA data set; you're trying to detect pedestrians. What you do is train a classifier: you put a little box around pedestrians, take little windows, compute these covariance features, train the classifier on them, and then you run a window over the image and see which windows contain a pedestrian. Each window is described by a little covariance matrix. So the question is: I've got a bunch of windows with these descriptors which are pedestrians, and a bunch which aren't, and I want to distinguish between them with a support vector machine, for instance. But covariance matrices sit naturally on this Riemannian manifold, so what we have to do is find a kernel on the Riemannian manifold of positive definite matrices, so that we can apply a support vector machine and distinguish the classes.

Now, for positive definite matrices there are lots of ways to define distances. One useful property you might like is affine invariance, which means d(X, Y) = d(A^T X A, A^T Y A) — if X is positive definite, so is A^T X A — and if this holds, it means every point of the space of positive definite matrices looks like every other point in terms of its neighbourhood; it gives a homogeneity condition on positive definite matrices, which turns out to be good, and you can define an affine-invariant Riemannian metric with this property. There are other metrics: the log-Euclidean distance, d(X, Y) = ||log X - log Y||_F in the Frobenius norm, which is a useful one; and the Stein metric, given by a rather more complicated formula. These are various metrics for comparing positive definite matrices — each is a distance — and the question is: under what circumstances does such a distance allow you to define a radial basis function kernel? You want a distance function which lets you define a kernel, so that you can then discriminate classes. It turns out that of the various ones we looked at (and there are a few others as well), there is just one, the log-Euclidean, which is both a geodesic distance — meaning it really measures distance along a curve in the manifold; you're not going outside the space, because if your space curves around, it's no use measuring the shortcut across, you have to measure around, through the space you're actually interested in — and also gives a positive definite kernel. You establish that it is positive definite by applying the theorem we had before: it's a kernel because the space can be isometrically embedded in a Hilbert space. This theorem lets you look at various distances and quickly rule some out — this one is not going to give you a kernel, because it is not embeddable, or at least not obviously embeddable, in a Hilbert space — whereas for others it is obvious you can embed, and you can apply the theorem; sometimes it's not quite so obvious, but you can nevertheless find ways to define kernels. So the log-Euclidean one looks good.
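A minimal sketch of the log-Euclidean distance and the resulting Gaussian RBF kernel on positive definite matrices, as discussed above (my own implementation via eigendecomposition; the constants and the small test are illustrative).

```python
import numpy as np

def logm_spd(P):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(P, Q):
    """d(P, Q) = ||log P - log Q||_F, a geodesic distance for the log-Euclidean metric."""
    return np.linalg.norm(logm_spd(P) - logm_spd(Q), ord='fro')

def spd_rbf_kernel(mats, sigma=1.0):
    """Gram matrix of exp(-d_LE^2 / sigma^2) over a list of SPD matrices."""
    n = len(mats)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-log_euclidean_dist(mats[i], mats[j])**2 / sigma**2)
    return K

rng = np.random.default_rng(0)
def rand_spd(d=4):
    A = rng.normal(size=(d, d))
    return A @ A.T + 1e-2 * np.eye(d)

mats = [rand_spd() for _ in range(20)]
K = spd_rbf_kernel(mats, sigma=2.0)
print(np.linalg.eigvalsh(K).min())   # no significantly negative eigenvalues: a PD kernel
```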
We applied it to various things — I'll skip over most of that — but one of them is the INRIA data set, and the lowest curve here is ours; this plot is false positives versus miss rate. This uses a technique called multiple kernel learning, but the kernel is still discriminating between covariance matrices, which are descriptors of windows. It is multiple-kernel because you actually take several sub-windows — you can find which sub-windows are good by training — and add their kernels, taking positive linear combinations. This is the best result we could get, and this is the kernel method on positive definite matrices. There is a thing called the Euclidean kernel, which means you don't take the logarithm — it's just the Frobenius distance between the two matrices — and it doesn't work nearly so well; the others work well. So the technique works well with support vector machines. You can use the same kernel for object classification, where you do kernel k-means to group things which are the same within a data set. Q: Which kernels are being learned or combined? A: We are always using the kernel based on the log-Euclidean distance; the different kernels that you're adding are determined by little sub-windows — it's a combination over sub-windows, same kernel.

Other places we used it: this is kernel SVM for texture recognition. Another one here is diffusion tensor imaging, an MRI imaging technique — I'm not too familiar with it myself — where instead of just having a colour at each point you actually have a 3-by-3 matrix, a diffusion tensor, which describes the diffusion directions of fluid in the brain; think of each point as carrying a little ellipsoid, a little cigar, and you do the classification based on that 3-by-3 matrix. There's no ground truth here, so I haven't got a lot of basis for saying this is the best result, but it looks reasonable. And motion segmentation, once again.

Okay — kernel dictionary learning on manifolds. I'm going to skip over this fairly quickly, but it's another example, where we're now talking about a different type of manifold: not positive definite matrices but Grassmann manifolds. Here's the idea behind dictionary learning (some of this work was done earlier by others; we're applying it to Grassmann manifolds here). You have some object x that you want to describe in terms of a dictionary D — as close as possible to a linear combination of some set of dictionary elements — but you want to do it in a sparse way, so you add an l1 penalty on the coefficients: minimize ||x - D a||^2 + lambda ||a||_1. And you want to choose your dictionary — your set of features, or whatever they are — so that you minimize this over all possible dictionaries and coefficients: which dictionary best describes my elements x?
Now, how do you do this if everything lies on a manifold? It doesn't make sense on the manifold: you can't add points on a manifold, and it doesn't make sense to multiply a point on a manifold by a scalar — if you multiply a rotation matrix by a constant you get something which is not a rotation matrix; you simply don't have multiplication. So how do you make sense of it? One way is to kernelize the whole thing: you find a kernel, with a mapping phi that maps my manifold into a Hilbert space, and you can define kernels on that. Let me say that again, because I probably got it wrong: phi is a mapping from my manifold into a Hilbert space, and in a Hilbert space you can add, you can take linear combinations, and the whole thing makes sense — once you've mapped in, all the theory of kernels works as well. So we applied that to the whole dictionary-learning process. There is an easy way of mapping Grassmann manifolds into a Hilbert space; I'm not going to say anything more about Grassmann manifolds here.
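A minimal sketch of the kernelized sparse-coding step just described, under assumptions of my own: once phi maps everything into a Hilbert space, the reconstruction error of phi(x) by a combination of mapped dictionary atoms can be written purely with kernel evaluations, and the l1-penalized coefficients found by a simple proximal-gradient (ISTA) loop. The Gaussian kernel on Euclidean data and the ISTA solver are illustrative stand-ins, not the authors' algorithm or their manifold kernel.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def kernel_sparse_code(x, D, lam=0.1, sigma=1.0, n_iter=200):
    """Coefficients a minimising ||phi(x) - sum_j a_j phi(d_j)||^2 + lam*||a||_1,
    expressed entirely through kernel evaluations (ISTA iterations)."""
    K_DD = rbf_kernel(D, D, sigma)              # m x m Gram matrix of the atoms
    k_xD = rbf_kernel(x[None, :], D, sigma)[0]  # kernel between x and each atom
    a = np.zeros(len(D))
    step = 1.0 / (2 * np.linalg.eigvalsh(K_DD).max() + 1e-12)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2 * (K_DD @ a - k_xD)            # gradient of the smooth (quadratic) part
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # soft threshold (l1 prox)
    return a

# Toy usage: 5 dictionary atoms in R^3, one query point near atom 1.
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 3))
x = D[1] + 0.05 * rng.normal(size=3)
print(np.round(kernel_sparse_code(x, D), 3))    # expect most weight on atom 1
```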
I want to move on to the last topic, which has to do with shape spaces — the thing I find rather more interesting, maybe because it's what I've been looking at most recently. Shape manifolds — maybe you should call them shape spaces, since they aren't always manifolds, really. What a shape captures is an invariant of a set of points in R^n. Let's look at something like this: you have a set of points found in a person's face — maybe equally spaced points around this contour, and equally spaced points around that contour — and you want to capture what there is about the shape of those points, so that you can say these are a similar shape, and those are another similar shape. First question: what is shape? Shape is clearly something which doesn't change if you rotate — I don't change the shape of my head by rotating it — and I don't change it by translation either; and in my interpretation here I also don't change it by shrinking. So this is what you'd probably agree shape is: not changed by rotation, not changed by translation, not changed by shrinking or scaling. This is an idea which goes back to Kendall, in a paper from 1984, and it turns out that if you define a shape as an equivalence class of point sets under transformations of those three types, this forms a manifold — which is rather surprising. Let's work out how.

You represent each point as a complex number: supposing we've got points in the plane, a point is a complex number, with the real and imaginary parts as the two axes. So n points form a vector of complex numbers. A vector of complex numbers is a first approximation to what you might call the shape: the pre-shape space starts out as the set of all complex n-vectors. Now you centre them: you take the centroid — the mean — of all these complex numbers and subtract it out, so you've got a centred vector; you've centred at the origin, so translation is taken away. And you scale it so it has length one, getting rid of the scale. Now what about rotation? That's the important one. The interesting, important observation here is that if you multiply this whole thing, which is a vector of complex numbers, by a unit complex number e^{i theta}, it rotates each point, so it preserves the shape. So you can think of your shape manifold as the set of centred points on the complex unit sphere (they are centred and of length one, because I've scaled them), quotiented by the action of multiplication by a unit complex number z. This gives you what's known as complex projective space. To understand this, think of real projective space if you're more familiar with that: real projective space can be thought of as points on the sphere with plus and minus identified — the real numbers of length one are just plus one and minus one. You do the same thing with complex numbers: take points on the complex sphere and identify them when they are equal up to a complex scale. So this complex projective space, which is a manifold, is basically the shape space.

Q: The scaling is fine, but where did the rotation go — isn't that just scale? A: It's scale in terms of complex scale; but if you think of these as points in the plane, multiplying by a unit complex number is actually a rotation rather than a shrinking of scale. It turns out this has to do with a thing called the Hopf fibration, which I studied a little in graduate school topology — a very complicated-looking thing. These circles are what happens if you take a point and multiply it by all unit complex numbers: it traces out a circle. So the complex sphere is made up of a bunch of circles, and that is the Hopf fibration, for what it's worth.

I'm getting towards the end now, which is good. The point of this research — the work by my student — is that he can define a kernel on this two-dimensional shape manifold, which is shown here. Think of a shape, as I said, as a unit vector — forget for the moment that we are identifying z and e^{i theta} z; just for intuition, a shape is a vector, or really an equivalence class of vectors on the unit sphere. Take two of them: I want to work out the distance between them, as shapes. It turns out the thing to use is called the full Procrustes distance between two shapes: the perpendicular from the tip of one onto the line spanned by the other. Let me convince you that this is a good thing to think of as the distance between the two shapes. The distance between the arrowheads may be fine as well, but maybe not quite as good as this one, because the perpendicular is the smallest distance between the left-hand arrowhead and any scaled version of the right-hand one — between one shape and any scaled version of the other; that is the minimum distance.
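A minimal sketch of the construction just described (my own code; the closed form d_FP(x, y) = sqrt(1 - |<x, y>|^2) for centred, unit-norm complex vectors is the standard expression for that minimum distance, which the talk does not write out): centre the landmarks, scale to unit norm, and compute the full Procrustes distance between two shapes.

```python
import numpy as np

def preshape(points):
    """Centre a set of 2-D landmarks (as complex numbers) and scale to unit norm."""
    z = points[:, 0] + 1j * points[:, 1]
    z = z - z.mean()                     # remove translation
    return z / np.linalg.norm(z)         # remove scale

def full_procrustes_dist(x, y):
    """Full Procrustes distance between two pre-shapes: the minimum over rotation
    e^{i theta} and scale of ||x - s e^{i theta} y||, which equals
    sqrt(1 - |<x, y>|^2) for unit-norm centred complex vectors."""
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(x, y))**2))

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
# The same square rotated, scaled and translated: shape distance should be ~0.
c, s = np.cos(0.7), np.sin(0.7)
moved = 3.0 * square @ np.array([[c, -s], [s, c]]).T + np.array([5.0, -2.0])
tall = np.array([[0, 0], [1, 0], [1, 3], [0, 3]], dtype=float)   # a different shape

x, y, z = preshape(square), preshape(moved), preshape(tall)
print(full_procrustes_dist(x, y))   # ~0: same shape
print(full_procrustes_dist(x, z))   # clearly positive: different shape
```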
It turns out, by a bit of calculation, that this distance in fact satisfies the theorem about kernels: the radial basis function kernel based on that distance measure on shape space is a positive definite kernel, and hence you can do all the sorts of things you like to do with kernels. Q: So x and y are unit vectors in complex n-space? A: Yes, unit vectors — ignoring the identification. Q: But how do you define the inner product between two complex n-vectors? A: It's the sum over i of the complex conjugate of a_i times b_i (or the other way around; I can never remember which one gets the conjugate). The point is that an inner product of complex vectors gives you a complex number, so you cannot simply say that cos(theta) equals the inner product — that doesn't make sense when the value is imaginary — but it turns out you can define cos(theta) to be the absolute value of the inner product: you take the magnitude. The full Procrustes distance yields a positive definite kernel; various other ones do not — for instance the distance between the arrowheads, the chordal one, does not — which is curious, because it does in a real space but not in the complex one.

Okay, so that gives you a kernel that you can use to try to classify shapes. There are a few weaknesses to this in a practical sense, which I'll point out in a moment. Two-dimensional shapes are of somewhat limited interest in computer vision, really, but one hopes to be able to go to three-dimensional shapes. In fact I've been talking with Katsushi Ikeuchi, who is here: can we try to use this to classify the different faces on the temples of Angkor Wat? He says they fall into classes, so can we use this? That's work I'm interested in doing; but for the present we can do it for two-dimensional things — a leaf database and things like that. So what are the weaknesses? For one thing, you need the same number of points, and furthermore you need the correspondence between them to be known and to make sense. One can think of situations where that's possible: say your two eyes, the corners of your mouth, your chin and your ears in an image; or, maybe less obviously, you can find the stem of a leaf and then take a fixed number of equally spaced points around the contour of the leaf — that gives you the same number of points and a correspondence you can use to classify. There are obviously other situations where this is going to be a problem. There is also some hope of generalizing this work to the case where you don't have a finite number of points but a curve, and you try to work out shapes of curves — I haven't followed this up completely, but people do talk of shape manifolds in that sort of sense too — but at the present time I'm not sure of the details.
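Combining the distance above with the RBF construction gives the shape kernel just discussed. Here is a small numerical spot-check of my own, on random shapes, that the resulting Gram matrix has no negative eigenvalues; it assumes the `preshape` and `full_procrustes_dist` helpers from the previous sketch.

```python
import numpy as np
# assumes preshape() and full_procrustes_dist() from the previous sketch

def shape_rbf_kernel(shapes, sigma=1.0):
    """Gram matrix of k(x, y) = exp(-d_FP(x, y)^2 / sigma^2) over a list of landmark sets."""
    pre = [preshape(s) for s in shapes]
    n = len(pre)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-full_procrustes_dist(pre[i], pre[j])**2 / sigma**2)
    return K

rng = np.random.default_rng(0)
shapes = [rng.normal(size=(10, 2)) for _ in range(30)]   # 30 random 10-landmark shapes
K = shape_rbf_kernel(shapes, sigma=0.5)
print(np.linalg.eigvalsh(K).min())   # >= 0 up to round-off: a positive definite kernel
```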
To get this into 3D there are problems too, which is why we haven't just immediately gone and done it. In 2D this space, as I said, is complex projective space, and it is in fact a manifold; in three dimensions it is not a manifold — it's normally called shape space rather than a shape manifold — it's something like a manifold with cusps and other singularities, and I don't quite know how to handle that at this point. But one does research and hopes to find answers to some of these problems. Okay, that's it — with the various people I worked with: a different list of people, not exactly the same as before, but the union of the two lists is the full set of collaborators on this work.

Q: When you say something like "the shape space" or "the shape manifold", what exactly do you mean? There's always some space in which the manifold is embedded, and you can always think of that as your shape space — like taking a set and relaxing it into a larger set. It seems that a lot of computer vision people take their features, turn them into vectors, do all the work with vectors, and then maybe at the end project back onto the manifold. What's wrong with working in that vector space?
A: There are a number of things wrong with working in the embedding vector space; it depends on the particular application. Kernels in particular I'll sidestep for the moment, but if you're talking about averaging, for instance, or clustering: if you're doing clustering on a manifold, there is no point finding the centre of a cluster in the embedding space, because it does not represent an element of the space you're looking at. Think of very specific manifolds: in rotation space, a sum of rotations is not a rotation. Or Grassmann manifolds, which I skipped over: you apply Grassmann manifolds to represent sets of images, and if you do clustering with Grassmann manifolds and try to work out where the centre — the most "average" element — is, it is not going to be a member of the Grassmann manifold; it won't correspond to a plane; it just won't be right. Distances too: if the manifold curls around like this, you really want to find distances around, through the space of possible configurations, not across the gap, which is really just an artefact of the particular way the manifold happens to be embedded in R^n — with a different embedding it might be completely different, whereas the intrinsic thing is the manifold itself. That's why you want to deal with the manifold, and the experiments also support this: it is best to do these things with intrinsic metrics, intrinsic distances, on the manifold.
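The point about averaging in the embedding space can be checked directly: the arithmetic mean of several rotation matrices is generally not a rotation, whereas averaging in the tangent space (here a single log/exp step at one of the data points, a simplified stand-in for a proper intrinsic mean) stays exactly on SO(3). A sketch of my own.

```python
import numpy as np
from scipy.linalg import expm, logm

def rand_rot(rng, scale=0.3):
    w = scale * rng.normal(size=3)
    W = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    return expm(W)

rng = np.random.default_rng(0)
Rs = [rand_rot(rng) for _ in range(10)]

# Extrinsic average: mean of the matrices in R^{3x3} -- not a rotation.
M = sum(Rs) / len(Rs)
print(np.allclose(M.T @ M, np.eye(3)), np.linalg.det(M))   # False, det != 1

# One tangent-space averaging step at Rs[0]: stays exactly on SO(3).
base = Rs[0]
V = sum(np.real(logm(base.T @ R)) for R in Rs) / len(Rs)
R_mean = base @ expm(V)
print(np.allclose(R_mean.T @ R_mean, np.eye(3)), np.isclose(np.linalg.det(R_mean), 1.0))
```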
Okay, any more questions? Just one. Q: We talked about comparing shapes; does it allow you to compare within different categories of shapes — say, can I distinguish a maple leaf from others, even when the maple leaves are not all exactly the same? A: Well, it depends on the granularity of your clustering, I guess. In these experiments you are in fact taking maple leaves and comparing them with beech leaves and the like; but if you bothered to train your classifier on, say, maple leaves when they're young, in spring, versus what they look like later, I'm sure you'd find you are able to pick up those differences in shape as well. And as Katsushi was saying to me yesterday — they weren't using this technique — they found, which was very interesting, that the faces at Angkor Wat fall into clusters: they all look like people's faces, but the clustering was perhaps based on different sculptors having made them, or on some being supposed to represent humans and some gods, and you can actually distinguish between them. So sure, you can make the classification as fine as you like. Okay — unless there are more questions — let's thank Richard once more.
Info
Channel: Microsoft Research
Views: 54,850
Rating: 4.8513799 out of 5
Keywords: microsoft research
Id: MtZV82LCNHc
Length: 56min 33sec (3393 seconds)
Published: Tue Jul 26 2016