New Directions for Tensor Networks: Machine Learning and Quantum Computing II

Captions
All right, okay, welcome back for the afternoon part. As the title says, in this talk I'm going to tell you more about applying tensor networks, and that includes doing calculations with them — specific pieces of calculations, and how tensor networks make those pieces more efficient. Then I'll touch on the important topic of optimizing and obtaining tensor networks, which we unfortunately don't have time to cover in depth since I'm only giving a few lectures, but I'll say some things to get you started down that road and point you to resources with more details. In the last part of today's lecture I'm actually going to step outside of tensor networks — even outside of physics — for a little while and give a pretty broad introduction to machine learning, so that we have that background for tomorrow. Hopefully that will be interesting to all of you anyway, even if you're planning to use machine learning without tensor networks, for something in your own projects in physics. Okay, so let's get started. The first part of the talk is about computations with matrix product states: what are some of the technical manipulations we can do with them, and what are their costs? Hopefully this helps you see the benefits of the idea. So far all I've really told you is that they compress a wave function — a full wave function would have 2^N or 4^N parameters, and you can get that number down. That's good. However, a key point is not just parameter counting, but how easy it is to access those parameters and do manipulations with them, because there are other forms of wave functions that are very good at compressing: there are restricted Boltzmann machines and other neural-network wave functions, there are correlator product states, there's the Slater-Jastrow
wave function — all these other forms also compress the wave function quite a bit. So what are the specific advantages of matrix product states, and more generally tensor networks? Let's get into that. Here you'll have to bear with me: imagine we already got a tensor network from somewhere — we already did a calculation to obtain one. (This is a little out of order; I'll tell you in a bit how you can get them.) Let's say we have one, and let's see why we might want a wave function given to us in matrix product state form. The reason is that there are many important steps of calculations that become efficient that otherwise wouldn't be. Most operations on a full wave function are inefficient — even storing it is inefficient. So let's be concrete. Consider a full many-body wave function with N sites, and this graphical notation for the norm: the overlap of a wave function with itself (or it could be two different wave functions). What is that? It's a sum over all 2^N different indices at once — in this example with ten indices, you'd have to directly sum 2^10 terms, and 2^N in general. That's a no-go. By the way, in machine learning they have problems of this type too: you might have a neural network and ask for its norm, because if the network represents a probability you sometimes want to normalize that probability. You run into the same issue there — but there you don't have such elegant ways to deal with it. For matrix product states and tensor networks, you do have really elegant ways. So let's see what that
is. If you have the wave function in matrix product state form, is there something more efficient you can do than just summing over all the physical legs — here I've drawn five of them — one term after another, all 2^5 of them? By the way, you could do it that way, and seeing how that would go may give you some appreciation for the even more efficient thing we're going to do in a minute. You could say: let me set all of these spin indices that run through the middle — analogous to these lines here — to each combination of values they take, and for each one do something fairly efficient. Say I fix the values s1, s2, s3, s4; imagine those are now fixed numbers. Then I can recover that one amplitude efficiently by just multiplying matrices together. To be concrete, say the pattern was up, down, up, up: then I'm done with those lines — maybe I'll shade in the up ones and leave the down ones as white circles. Fixing those lines had the effect of taking a slice of each tensor, so what's left is just a chain of matrix multiplications: a vector times a matrix times another matrix times a vector. That I can do efficiently. The problem is I'd still have to do it 2^N times. So even the efficiency of retrieving one amplitude from the matrix product state doesn't totally save me here; I have to do something even more clever. Okay, so here's the more clever thing you can do: you can just
number the tensors — say one to ten in this example, to be concrete — and then contract them, thinking of this as just some network you have to contract. You can even forget that it represents a norm, and contract it efficiently in steps. Contract tensor number one with tensor number two, summing over the line between them; the result is a new tensor with two lines sticking out. Then stick tensor number three onto that blue thing and contract over this leg, and now three indices stick out. Keep going: put on tensor number four, and you're back to a thing with two indices. Then just repeat those two steps over and over. Here's a little movie of what that looks like: combine those two, merge in the next one, and the next, and the next — just chomp the thing down from the side like a little pac-man. At the end, that equals a scalar. Why a scalar? No legs sticking out. We'll talk about the efficiency of that in a second. You can do similar things for other kinds of calculations — that was the norm; another thing you can do similarly is an expectation value. Hopefully you see why this is an expectation value: I'm taking two copies of the wave function, one on top and one on the bottom, written in matrix product state form, summing over all the sites, but on two of the sites I stuck an operator in there.
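As an aside, the left-to-right "pac-man" norm contraction just described takes only a few lines of code. Here's a minimal sketch in NumPy — my own illustration, not code from any particular package; the conventions (each MPS tensor shaped as left-bond × physical × right-bond, and the names `random_mps` and `mps_norm_squared`) are assumptions for this example:

```python
import numpy as np

def random_mps(n_sites, d=2, m=10):
    """Build a random open-boundary MPS; boundary bond dimensions are 1."""
    dims = [1] + [m] * (n_sites - 1) + [1]
    return [np.random.randn(dims[i], d, dims[i + 1]) for i in range(n_sites)]

def mps_norm_squared(mps):
    """Contract <psi|psi> by absorbing one site at a time, left to right."""
    # E is the growing "environment": E[bra_bond, ket_bond], starting 1x1.
    E = np.ones((1, 1))
    for A in mps:
        # Absorb the ket tensor: contract E's ket bond with A's left bond.
        T = np.tensordot(E, A, axes=(1, 0))              # (m_bra, d, m_ket)
        # Absorb the bra tensor: contract over its left bond and physical leg.
        E = np.tensordot(A.conj(), T, axes=([0, 1], [0, 1]))  # (m_bra, m_ket)
    return E.item()  # no legs left sticking out: a scalar
```

Each step keeps only an m × m environment matrix, so the cost per site is the m³d discussed below rather than anything exponential in N.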
So this A is some arbitrary two-site operator — pick any operator you want; it could be σᶻ ⊗ σᶻ, or something more complicated, some term from a Hamiltonian. Then I can do similar things: bring the left end in and bring the right end in by combining tensors from the left and from the right, and all I have to do is contract this diagram, which I can do efficiently — there's some finite cost to doing it. So what does this cost? Let's start putting numbers on these things. Say all the legs going through the middle — in MPS parlance that's called the bond dimension, because these are like the bonds of the lattice, though you could also call it the internal or virtual dimension — have size m; that's the typical matrix size of the MPS matrices. And say the physical legs — the spins or electrons or whatever they are — run over d values: d would be 2 for spin-1/2, 3 for spin-1, 4 for the Hubbard model, things like that. Okay, so m and d are our numbers. Now consider the two operations we do over and over in this norm calculation. One of them was: take the partial tensor from before and stick one more MPS tensor onto it, getting this funny thing with three indices. That involves doing a sum over m values while holding the other three indices fixed. We can hold this index to m different values, this one to d values, and that one to m values — so it's like doing a sum over m things, m × d × m times: I pick (1,1,1) for those three, sum over m values; pick (2,1,1), sum over m values; pick (1,2,1), sum over m values; and so on through all combinations of the open lines, each time summing over
this internal one. When you work that out, you basically take all the letters you see and multiply them together. That's the general rule for estimating the cost of a tensor contraction: every line that's involved, take its size, and multiply all those numbers together. The point is you don't double count the ones that are summed over — you count each line once. So the scaling here is m³d: this is an m³d-cost diagram to contract. Any questions about that? It's an important concept, knowing how to estimate these costs. The other operation is: once I've made that funny thing with three indices, I put one more MPS tensor on — say I've already combined those three together and now I want to stick that one on, or I've combined these four and want to stick the next one on. That one is also m³d: I look at every line that takes part in the diagram, write down its range, and multiply them together — m³d. [Question] Yes, you can definitely use Monte Carlo for this kind of thing. I mentioned to a few people at the break that one interesting future research direction for these algorithms is to hybridize them with Monte Carlo ideas. If I had more time I would describe an algorithm I really like that does this for finite-temperature systems, called METTS — a really nice one. That's an area that definitely deserves further investigation; doing these steps with Monte Carlo is certainly something that can be pursued. The tricky part is how to get the best of both and not the worst of both — if you don't make a good combination, you might just get an expensive thing with big error bars and a terrible sign problem. But if you can harness it well,
it can be a good strategy. The tricky part is this: say I sample this part somehow — I think of it as a sum and do it by sampling. That leaves some error bars; I can never take all the samples, otherwise there'd be no advantage in sampling. Then the question is the fate of those error bars: will the next step I do control them, or will it multiply them by some number bigger than one, so things grow and get out of control? You have to be doing it in a context where the errors propagate well — that's one of the key things to think about. The other thing is you've got to keep the sign problem from roaring back in; the sign problem can creep up on you really easily once you put sampling into things. Okay. [Question] Yes, good observation — that's right, and it's worth pausing and talking about in a bit more detail. I had some slides on that, but maybe it's good to just do it on the board. Let's think about some general tensor contraction. Say we have a funny one like this: some tensor A with four lines, and some other tensor B with three lines, contracted like that. Let's call the indices i, j, k, l, m, and say i goes from 1 up to d_i, j goes from 1 to d_j, and so on — d just means dimension: dimension of i, dimension of j. Then what is this diagram? It's just a sum over the connected indices, k and l, holding i, j, and m open: Σ_{kl} A_{ijkl} B_{klm}. And then the
way this really works in the computer is that I have to address i and j — I have to fix those; I'll just draw arrows to indicate it. (Did I make something up? Oh yes, B also has a k index — thank you.) So the point is I hold i, j, and m fixed, and think of that as a slice out of A and a slice out of B. With those fixed temporarily, what's left is a kind of generalized dot product over k and l, and that dot product costs d_k × d_l — each of these inner computations costs d_k times d_l. But then, how many different ways can I hold the other indices fixed? It's d_i × d_j × d_m — that's how many ways I can plant those and hold them fixed, and for each way I have to do the inner sum. So you just multiply out all the dimensions — that's the idea. The important thing is you might be tempted to count the contracted ones twice or something; you don't. You look at the whole diagram once it's connected up and count everything that participates once. That's important to know, and don't feel dumb if it's new information, because there was a very funny time — I'm maybe just old enough to remember when this topic was taking off — when some of the people coming from quantum information and working on it were boldly writing down these algorithms for 2D without really paying attention to the costs of the different steps. They'd say do this, then do that, and do that — and the people who actually had to program it would try it and get an algorithm that scaled like some bond dimension to the 20th power or something.
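To make that counting rule concrete, here's a small NumPy check of the blackboard example — contracting A_{ijkl} with B_{klm} over the shared legs k and l. The specific dimensions are made up for illustration:

```python
import numpy as np

# Hypothetical leg dimensions for the blackboard example.
di, dj, dk, dl, dm = 3, 4, 5, 6, 7
A = np.random.randn(di, dj, dk, dl)
B = np.random.randn(dk, dl, dm)

# The contraction itself: sum over k and l, leaving i, j, m open.
C = np.einsum('ijkl,klm->ijm', A, B)

# Cost rule: multiply the dimension of every distinct line in the diagram,
# counting each contracted line once (never squared).
cost = di * dj * dk * dl * dm   # open legs i, j, m times summed legs k, l
print(cost)  # 2520 multiply-adds for these dimensions
```

Note that the contracted legs k and l enter the product once each — the same rule, applied to the norm diagram, is what gives the m³d scaling discussed next.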
But then the people proposing this would say, "yeah, but it's not exponential" — because if you read the complexity theory papers, they'll say any time you go from exponential to polynomial, it's "efficient." But something to the 20th power? That's not really going to be practical. Now people are wiser, and they know you have to think really carefully about the scaling of each step. So let's recap those last two slides: the two key operations — this one m³d, this one m³d — so the whole thing is m³d. And that's great, because if you think about it, underlying a matrix product state are matrices — at every site there are two matrices in the spin-1/2 case — and matrix operations, as you may or may not know, generally scale as the cube of the linear size of the matrix. So what this means is that we're doing things with high-dimensional tensors — wave functions — that involve storing these matrices, and we're doing everything at the most efficient scaling you can get for matrices. That's a good sign when you see it; this is really efficient. It would be better if it were m² or just m, but m³ you can push very hard — you can take m to be many thousands and afford that. A very favorable scaling. So that's the good news, and the rule of thumb with MPSs is that you always want to see if you can get m³, where m is the typical bond dimension of the MPS. If you can do everything at m³, you're in business — m² is even better, and there are a few steps you can do at m². If you're doing something at m⁴, well, maybe, but you should worry about it. And if it's higher than that, you're basically doing something wrong, or you're not
really getting the benefit of matrix product states that you should be. That's just a word to the wise if you're going to get into this business seriously: m³ for matrix product states. Now for PEPS and these fancier tensor networks, you have to settle for less — there you'll have bond dimension to the sixth or seventh power sometimes, and that's just life right now with those tensor networks — so people are trying other strategies to make them more practical to work with; Monte Carlo is one that people are actually thinking about. Okay, so that was a few slides about technical aspects: if I have a matrix product state, what are some things I can do that I couldn't do otherwise? I can extract local properties — expectation values of operators — efficiently, and normalize the state efficiently. These are non-trivial things; you can't necessarily do that so easily with other forms of the wave function. (You can with Slater determinants and such.) And this is really important, because that capability has given us lots of nice ways to optimize matrix product states and other tensor networks. So I'll talk a little bit about that — let's talk about applying them to physics. This is a kind of lightning introduction to DMRG: how do you actually obtain matrix product states, say for ground states of many-body systems, or even excited states? This may be a little too fast, but hopefully it'll give you the flavor, and I'm going to point you to some resources that have more in-depth explanations. Also — I forgot to mention — later this week Uli Schollwöck, who's a real expert on DMRG, is going to be giving you more detailed lectures about that and other topics, so look out for that. But let's start with it at this level: let's think of the many-body problem as a kind of tensor optimization problem,
and you can certainly think of it this way. Say we have a many-body Hamiltonian, and for now I'm just writing it as this giant object with two sets of indices: N bra indices and N ket indices, on the bottom and on the top — in my head I'm thinking of those as bra and ket indices. It's an operator, so it has to have two copies of the physical indices, while the wave function has one copy of the physical indices: there's the wave function, the ket, and there's the bra version of the wave function. That's what the energy looks like if the wave function is a matrix product state. Any questions about that diagram? It's just putting the wave function on both sides of the Hamiltonian — and I haven't said anything about the Hamiltonian yet, whether it's local or not. What we want to do is update the numbers that live in each of these blue wave function tensors — the same on top and bottom; it's the same wave function twice — until we get this energy to go down, for a normalized wave function. One thing we could do is compute a gradient and do gradient updates, but there's something smarter you can do. One thing that makes life a little easier is this: instead of trying to update all the numbers in all these wave function tensors at once — which can be a good idea in some contexts — what tends to work better here is to freeze most of them for a while and vary just a few. This is one of the first ingredients of the so-called DMRG algorithm. So you say: on this step, let's vary the third one; on a later step we'll vary the fourth one, and the fifth, and we'll go back and forth. But for now, let's
vary that one and hold the other ones, in blue, fixed. In that form, you can think of sticking all the frozen tensors onto the Hamiltonian; they transform the Hamiltonian into a reduced, or effective, Hamiltonian just for this red one — a little mini-Hamiltonian for the red tensor. Let me draw more of that on the board to be concrete. If you look at that thing in the middle, it has three indices sticking out at the bottom — one, two, three — and three sticking out the top, so you can write it as some kind of three-legs-in, three-legs-out object. The tensor that we're varying comes in like that, and the result is another tensor with three legs. You see how it all fits together: that's like H times this little tensor, which equals another tensor similar to it — that's how you'd multiply by H. Or you could take this, stick it on top, and you would get the energy, but involving just that tensor in red. That's what I mean when I say this is an effective Hamiltonian for it. Now, I'm leaving out an important technicality: for this to really be properly thought of as an effective Hamiltonian, it's best if these transformations in blue are unitary transformations. If I just put random numbers into this MPS, they won't be — but it turns out you can arrange for them to be unitary without loss of generality. That's a whole lecture in itself, but there's always a way to make these blue things unitary, individually and collectively, and then this effective Hamiltonian really has properties that reflect the original one. That's an important detail, and it also makes this a regular eigenvalue problem rather than a generalized
eigenvalue problem. So now we've reduced it to just an eigenvalue problem for this red guy. Then we also have to use the fact that H itself has some structure — otherwise it wouldn't be efficient to do this. Remember, generally we're talking about Hamiltonians that are sums of local operators. There are more complicated ones with power-law interactions and the like, and we can deal with those as well, but the simplest case is 1D with a nearest-neighbor lattice model. Then the Hamiltonian is just a sum of these local terms, and these lines are identity operators — someone asked earlier how to notate those; they're just tensor products of identity operators. That's one interesting thing about doing numerics and working with these kinds of algorithms: when you do the theory on paper, you often gloss over these details. You write S₁·S₂ and you know what you mean, but really that's shorthand for S₁·S₂ ⊗ 1₃ ⊗ 1₄ ⊗ ⋯ ⊗ 1_N. You forget those identities when you're doing pencil-and-paper theory, but on the computer you actually have to tell the computer there's an identity on every other site — computers don't know these kinds of things implicitly. Okay, now this is the most complicated slide, but just to give you a flavor: take that form, imagine sticking it inside here — wrapping it up in all the parts of the MPS that we're not going to optimize on this step — and see what happens when the dust settles. This is mostly a motivating slide. What I've done here is explode the Hamiltonian out into the first term, the second term, the third, fourth, and fifth terms, on a small finite lattice, and for every term I've wrapped the MPS around it on top and bottom,
leaving out the third site. Okay, any questions about what I'm showing here? These are all the terms that are inside this little mini-Hamiltonian. Then you start to see that some of these terms have things in common: the right-hand side of this one is the same as the right-hand side of that one, and the left-hand side of these three is all the same. Even some little pieces coincide — if I contract those two tensors together, that's the same as contracting these three together. So I don't have to calculate all of these separately; I can do smart things to make this more efficient. What that looks like is: I calculate that thing on the right once and call it this tensor with two lines coming out of it, I bring all those things on the left in, and then I do further left/right reductions. This thing in blue indicates that there are no operators on the right — that's just the identity stuck into the right basis — and over here, the identity stuck into the left basis. This is some right piece that involves a term living completely on the right; that one, terms completely on the left; and finally these complicated pieces where a term is actually touching the site I'm optimizing. So this is all the terms of the Hamiltonian on the right summed together, all the terms on the left summed together, and then these middle terms for the terms that cross that site. Don't worry if you're not catching all these details — this is just to show you where the technical parts of one of these calculations live. But then I can actually do the optimization in this form: I do that work once, I have this little mini-Hamiltonian, and now I can use efficient algorithms like Lanczos or Davidson
— these are basically state-of-the-art exact diagonalization (ED) algorithms — to optimize that tensor. That's the upshot, that's the payoff: instead of having to resort to something slow like gradient descent, I can use all the speed and power of Lanczos or Davidson. The key step of those algorithms is that you have to multiply whatever wave function you're optimizing by the Hamiltonian: you multiply by the Hamiltonian, then you do some transformations to make an orthonormal basis, then you hit it again with the Hamiltonian, over and over — maybe three or four times if you're doing a partial improvement. So the key thing you need to be able to do is stick the wave function in — putting this red tensor into these four diagrams — and the result is a three-index thing from each of them; you add them back up, loop this a few times, and you get really fast convergence to the ground state from that step. [Question] There are lots of details — do you mean this thing of zooming in on one site at a time? Oh no: this decomposition into these gray two-site terms was an assumption. I'm just saying, if the Hamiltonian has that form — if it's the Heisenberg chain or something. If it's not, you have to do more complicated things, and yes, an MPO is one of those things. You can use an MPO — you don't have to, but for some things MPOs are really necessary, or the best way, like if you have long-range interactions. There's a whole more sophisticated set of things you can do with the Hamiltonian here. But let me just say these slides were to give you a flavor — that's why I'm going quickly through this part; it's really a big lecture in itself, how to optimize matrix product states, and there are many things that could be said. But basically,
hopefully that gives you the flavor: you take the capability I mentioned in the first few slides — being able to efficiently compute local quantities — and you operationalize it into getting these reduced forms of your whole many-body Hamiltonian, acting on just one or two tensors of your MPS. Then you have all the pieces you need to use, at the core, Lanczos or Davidson — these very fast algorithms that can give you big decreases in energy; they can really improve your wave function a lot. Then when you're done, you switch to the next tensor and optimize that the same way, and so on and so on, and you can get really good results this way. Again, it's much more efficient than, say, treating all the parameters in your MPS as just some collection of adjustable parameters and doing gradient descent on them — that would be very slow; this can be much, much faster. That's one of the key points. [Question: which one?] This right-hand side here? Well, that has two, and the bottom part is one of the three at the bottom — there are one, two, three on the bottom and one, two, three on the top. There are a lot of details missing here, of course, but this gray one is meant to indicate all the terms in the Hamiltonian on the left, all wrapped up in that part of the MPS; the middle one is just an identity operator on site three, in the example I was doing; and the right one is some kind of rotated form of the identity on the right — just to be specific about what I meant here. So basically you have the terms where they're all far away on the left, then they start crossing the bond, then they're over on the right — that's what's left over here. The point is that it's always some small number of terms you can reduce it down to, and you can be smart about
Now, as I move over to the next site, I can save a lot of these pieces from before, and I don't have to recalculate everything again; I just advance them a little bit. I had these things that were every term up to site three, and I just include the term on site four, and now I have every term up to site four; so I can reuse that piece and keep growing it as I go back and forth, being smart about it. To explain this in full would take a whole lecture, but I just wanted to touch on it; hopefully you see how this is a valid form of that effective Hamiltonian, because it has three indices on the bottom, so I'm able to stick that red tensor in, and what's left over is three indices on the top, so I get a new red tensor like that, and I can iterate this. Okay, anyway, I was debating whether I should even get into these details, because it's sometimes worse to show something halfway than the whole thing, but it's mostly to give you a flavor that there are some non-trivial steps in doing DMRG, and some flavor of how that algorithm works. Okay, so we have this algorithm that lets us update a matrix product state one tensor at a time, or two tensors at a time; and if you do it two at a time it can adapt, so the bond dimension can grow or shrink automatically as needed, and you can get really accurate results. So I wanted to spend a few slides now saying: okay, there's DMRG, and there are nice packages you can use to do it; I'll flash one of them, the one I work on, in a little bit. Let's say we have this algorithm in hand in some form, maybe from one of the packages I mention later. What kind of results can you get, and what kind of systems can you apply it to? As far back as 1993, which is around when DMRG was invented, you could already get some very impressive results. This was a study done
by Steve White, who invented DMRG, with David Huse, of the spin-1 Heisenberg chain. This was at a time when, I think (maybe I'm wrong about this), there was still some debate about whether there was a finite gap, or at least what exact size it had; this is the famous Haldane gap, which is predicted to be there from some kind of topological effects. So while this debate was happening, in comes this method, gets the ground state energy to however many digits that is, like nine digits or something, and gets the gap to like five digits. So this method was pretty impressive already from day one, and this is back in 1993; you can imagine you can do a lot more now. One more recent technology is that you can run DMRG in parallel; this is just showing an animation of that. This is some work I did to parallelize DMRG where, basically, you have in this case four different walkers, each assigned a patch of the system, and they each walk over their patch, locally improving the matrix product state. The blue dots here are the local energy, which is just the expectation value of S dot S on each bond. You see there are some boundary effects (it's an open system), but in the middle it gets very flat once you're past the correlation length from the edge of the system. A key point, and one reason I wanted to mention this on this slide, is that the convergence for a gapped system can be very, very fast: toward the end of a calculation you can get basically exponentially fast convergence. In this case I only did four sweeps, and you can see the energy changes a lot at the beginning, but toward the end you're just putting extra digits onto the energy, so it gets very fast at the end; it's hard to expect that from most algorithms, so it's kind of a special algorithm in terms of how fast it converges in the best cases. Yes, you can see it's just cleaning up these smaller digits by the end. Yes, so this algorithm needs to be studied more along the lines you're saying, actually. One problem is that it was an interesting idea but it's been missing a key use case, because the serial version works too well, kind of: all the 1D problems had already been pretty much cleaned up, and 2D has problems this can't fix, problems to do with scaling that are fundamental, that you can't beat just by parallelizing very well. But now there's a good case where I want to try this more, which is quantum chemistry, which I won't say a whole lot about (I think I have a slide coming up about it briefly). There you have a huge number of sites to chew through, in some of the ways of doing it: if you want very good resolution of real space, you have to use a lot of basis functions, so it would be good to try this algorithm again there. And indeed, it's a good point you're raising: it could be very interesting to run this algorithm in the limit where every worker gets just one or two of the tensors and doesn't even sweep; it just stays there and cranks away on those one or two tensors, with as many workers as you have sites, all cranking away and sharing information with each other. I don't know, actually, how well it scales up; definitely it works better for two than for four, but after that it shows pretty good scaling. I don't know if that goes all the way up; it's an interesting question. Maybe it doesn't, but maybe it can be made to. So that was some discussion about applying it to 1D systems. Briefly, it's also been used quite successfully, I would say, in the last ten years for 2D systems. Even though DMRG and matrix product states are
really best suited for 1D, you can just kind of shoehorn them into 2D by these snaking path constructions. The best path to use is actually not so much a snake as a zig-zag: you go up the first column, then jump to the bottom of the second column and go up again, and so on. This is from a really nice paper about actually transforming the basis in one direction; someone asked earlier about momentum space in 2D, and it can be a good idea to use momentum space in one direction and stay in real space in the other. In 2D this is actually one of the state-of-the-art methods: even though it's not really that well suited for 2D in principle, in practice everything has such a hard time with 2D that this can be competitive even with quantum Monte Carlo, even without a sign problem, in some cases, if you're clever about how you use it; and when you do have a sign problem, it's one of the only methods you can even try. So it can be very powerful, and this is just showing how powerful: a nice study from 2007 using this kind of snaking path construction on the square lattice Heisenberg model, and also the triangular lattice Heisenberg model and some other models in the same paper. There had been prior quantum Monte Carlo (QMC) work trying to estimate the magnetization very precisely, and the bounds had been brought down to these two dashed lines; but by a very clever combination of 2D DMRG calculations, combined with clever thinking about finite-size scaling and aspect ratios of different rectangular clusters, these two authors were able to extrapolate very precisely and get much better error bars than the prior QMC work. The QMC has since been improved, but it wasn't obvious this would even work this well, so it's an interesting paper. What they actually found was that by doing different rectangles of different aspect ratios, you could kind of dial in the finite-size effects and make them go away: you could have a very flat extrapolation, like this red one at the optimal aspect ratio, basically straight into the thermodynamic limit. That was pretty interesting. And you can study all kinds of systems with DMRG; you don't have to just study spin models. You can study things like electron systems, including quantum Hall systems. This actually isn't from a DMRG calculation, but it's more or less illustrative of what you can get, and it's by a group that also later used DMRG to study these same systems; it's actually an exact construction, which is pretty interesting. This paper tells you how to take all kinds of interesting exact quantum Hall states, like Laughlin states and other states, and basically write down, more or less by hand, matrix product state forms of them in 2D; then you can actually make quasihole excitations and map them out. These are some densities measured from these exactly constructed quasihole states, but underneath there's a matrix product state snaking through. What you do is work in a basis of continuum orbitals that go around a cylinder (you can see this is periodic; that edge connects to that one), and then you can do very interesting constructions of quantum Hall states on cylinders. So that's just another area of application where there's probably more to do. And one other area of application, I think the last one I have, is quantum chemistry, and this is an ongoing story; it's been very successful for quantum chemistry, and there's still a lot more that could be done and needs to be done. Just to show you the kind of results you can get applying DMRG and matrix product states in this
context: this is showing the local density of states (I should have an x-axis here) as you scan over energy. The idea is that you basically imagine shooting in a photon of different energies and asking whether you could eject an electron; you can resolve different kinds of core states this way. And this is a table of values for a particular core state, where you're basically asking: what's the energy of a certain tightly bound electron of the oxygen atom of a water molecule, how much energy would I have to supply to remove that electron? You can see that doing DMRG in different basis sets, you can converge to really high quality. Basically, the more letters a basis set name has, the more functions it has, and the better you're resolving the continuum; that's what these crazy combinations of letters mean. You're just stacking more and more Gaussian functions on top of the atoms to resolve the continuum. You can see that many of these compare very favorably to the experimental value, and some might actually be more accurate than the experiment, in fact; so you can really compare to experiment with this method. Okay. In terms of things I won't have time for today or tomorrow: I could say a lot more about optimizing tensor networks, and a lot more about other algorithms and types of tensor networks, so let me just point you to some resources, at least one new resource that I'm working on. This is a review article I'm working on, but it's interesting because it's not like a paper; I'm not planning to publish it. It's just a website, basically; but it's not even really a website, it's really a GitHub repo of a bunch of text files that I'm working on, which you can also help me work on, and which you can compile into a website or a PDF or whatever. So let me just give you a
quick tour of this. I'm calling this website the Tensor Network, and it's just meant to be a community resource where I'm going to write down every known tensor network algorithm, every known fact there is about this. Let me just show you how that looks. Here's one of the pages, on TRG, which is an algorithm I'm not going to go into in depth; it's an algorithm for contracting the partition function of a 2D classical model. You take something like the classical Ising model on a 2D lattice at finite temperature, write that as a tensor network, and then use this very interesting set of steps where you basically break the tensors into two pieces, merge them back together in some complicated way, and coarse-grain to a new lattice; and you can iterate this over and over again. This page actually goes through every step of that algorithm; later I'm planning to add some code and some other things, and it has references for you. Here's a page all about matrix product states. I'm trying to give a nod to the applied math community, where people have started calling these "tensor trains", just a different name for matrix product states, so I'm trying to mention that as well. It talks about what one of these things is, what the key concepts are, the number of parameters; then it goes through different algorithms: how do I get a single component of an MPS; how do I do an efficient inner product of two MPSs (that's the thing I showed in the earlier slides, contracting in from the side). Here's an interesting one: how do I take an MPS with bond dimension capital M and compress it into a matrix product state of a smaller bond dimension, making a controlled error? What if I somehow did some optimization that blew up the size of my MPS, and I want to shrink it back down, but in an optimal way? There's a really interesting algorithm for doing this: you actually compute reduced density matrices, then diagonalize them one at a time going backwards, from right to left, sticking the results on as you go; and in the end, once you've done all of them, the result is your new MPS. So there's this really interesting algorithm to do that; I'm just flashing it up there. There are lots of references, to review articles and to the original articles proposing all these ideas in the first place. And one other page I'll show: the entire DMRG algorithm is actually written out in steps for you. It talks about some background, then basically shows you preparing your MPS, building up the Hamiltonian and the reduced Hamiltonian, then optimizing a first bond, breaking it apart, going to the next bond, advancing the Hamiltonian; it has all the steps and everything. I'm planning to flesh this out a lot more, maybe even with some sample code.
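As a taste of the kind of sample code I have in mind, here is a rough numpy sketch (my own simplification, not taken from that page) of the compression task just described. It first puts the MPS in right-canonical form, then sweeps left to right truncating each bond with an SVD; keeping the largest singular values here plays the same role as keeping the largest eigenvalues of the reduced density matrices in the description above.

```python
import numpy as np

def compress_mps(tensors, max_bond):
    """Compress an MPS (list of (Dleft, d, Dright) arrays) down to bond
    dimension <= max_bond. Sketch: right-canonicalize with QR, then
    sweep left to right truncating each bond via SVD."""
    A = [t.copy() for t in tensors]
    # Right-canonicalize so the later SVD truncations are optimal
    for i in range(len(A) - 1, 0, -1):
        Dl, d, Dr = A[i].shape
        m = A[i].reshape(Dl, d * Dr)
        q, r = np.linalg.qr(m.conj().T)        # LQ of m via QR of m^dagger
        A[i] = q.conj().T.reshape(-1, d, Dr)    # right-orthonormal tensor
        A[i - 1] = np.tensordot(A[i - 1], r.conj().T, axes=(2, 0))
    # Sweep left to right, truncating each bond to at most max_bond
    for i in range(len(A) - 1):
        Dl, d, Dr = A[i].shape
        u, s, vh = np.linalg.svd(A[i].reshape(Dl * d, Dr), full_matrices=False)
        k = min(max_bond, len(s))               # controlled truncation
        A[i] = u[:, :k].reshape(Dl, d, k)
        sv = s[:k, None] * vh[:k, :]            # push weight to the right
        A[i + 1] = np.tensordot(sv, A[i + 1], axes=(1, 0))
    return A
```

If max_bond is at least as large as the true Schmidt ranks, nothing is discarded and the state is reproduced exactly; otherwise the discarded singular values tell you the error you made.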
One other site I wanted to mention: at more or less the same time, Glenn Evenbly had more or less the same idea, but his is more pedagogical. He came up with a site called Tensors.net, and this one is really nice. It's aimed more at people in physics who are starting down this path of learning about these topics and want to get started with a focus on physics applications. I asked him if he wanted to include everything, and he said no, he just wants to keep it aimed at physics grad students, basically. So it has some very high-level information about tensor networks, but then it has things like DMRG completely written out, every step, in Julia, for example, or MATLAB, and you can just copy the code from the website. Here are some similar diagrams, but then there's code you can copy yourself and run with, and it's not really using any other libraries, more or less, other than some standard Julia libraries. So that's really nice: you could write your own DMRG code starting with Glenn's Julia one, and it's probably already very efficient; you could probably already use it to do research. So that's another resource I'd recommend. Then let me quickly go back to the Tensor Network page for one second. One thing I wanted to mention: like I said, this is an open-source review article. What I mean is, if you click here, it takes you to a GitHub page where you can actually go into the source folder and see every page. This one, "diagrams": if you look at the raw text, that's this page. You can see it says "Tensor diagram notation is a simple yet powerful graphical notation", etc.; that's coming from this file. All I did to make that page was write this text file, and then I have a little static site generator that just generates the website from that, so it's really easy to add content. If you or someone you know knows anything about this topic, or if you see an error, or there's anything you want to add, even a reference to one of your papers or something, you can just edit these simple text files (they're Markdown files) and send me a pull request. I'll add it, and then you'll be credited, because GitHub tracks all the contributions from everybody; the idea is I'm trying to make it so everyone has some kind of academic incentive to actually help out. So it's like this ongoing review article that I'm going to try to write with the community. One other resource I want to mention is ITensor, which is this whole software package that I maintain and develop. ITensor does a lot of things: it has all these facilities for doing low-level tensor operations, but it also includes an entire DMRG front end. This is an entire code to do DMRG, right here on the front page, basically. You can input all kinds of complicated lattice models almost in a form that's similar to how you write it on paper: you just say S+ on site j times S- on site j+1, Sz on j times Sz on j+1, etc., and there's a little system that will compile this into what's called an MPO, which is the type of Hamiltonian DMRG understands; it's like a tensor network form of a Hamiltonian. Then you just set up a wave function, say how many passes or sweeps you want it to do (which is how many times it goes left and right back across the system), and then you just run DMRG on it. When it's done, psi, this object here, will be an optimized matrix product state, which you can then use, with the steps I mentioned at the beginning of the talk, to extract local properties efficiently: you just stick tensors together, and you can measure the magnetization, or other correlation functions, things like that. There's a whole system for doing that; a lot of it is already made for you. Okay, so those were just some resources I wanted to quickly mention; check them out. There's the Tensor Network, ITensor, Tensors.net; lots of stuff out there you can read about.
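To give a feel for what that "write the Hamiltonian almost like on paper" input means, here is a toy numpy stand-in (this is not ITensor code) that builds exactly those spin-1/2 Heisenberg terms, Sz_j Sz_{j+1} + (1/2)(S+_j S-_{j+1} + S-_j S+_{j+1}), as a dense matrix for a small chain and diagonalizes it exactly; DMRG targets the same ground state variationally for system sizes where this dense approach is hopeless.

```python
import numpy as np

# Spin-1/2 operators
Sz = np.array([[0.5, 0.0], [0.0, -0.5]])
Sp = np.array([[0.0, 1.0], [0.0, 0.0]])   # S+ (raising)
Sm = Sp.T                                  # S- (lowering)
I2 = np.eye(2)

def op_at(op, j, N):
    """Embed a single-site operator at site j of an N-site chain."""
    out = np.array([[1.0]])
    for k in range(N):
        out = np.kron(out, op if k == j else I2)
    return out

def heisenberg(N):
    """H = sum_j Sz_j Sz_{j+1} + (1/2)(S+_j S-_{j+1} + S-_j S+_{j+1}),
    the same list of terms you would type into the ITensor front end."""
    H = np.zeros((2**N, 2**N))
    for j in range(N - 1):
        H += op_at(Sz, j, N) @ op_at(Sz, j + 1, N)
        H += 0.5 * op_at(Sp, j, N) @ op_at(Sm, j + 1, N)
        H += 0.5 * op_at(Sm, j, N) @ op_at(Sp, j + 1, N)
    return H

E0 = np.linalg.eigvalsh(heisenberg(2))[0]  # two-site ground state: the singlet at -0.75
```

The dense matrix is 2^N by 2^N, so this toy dies around N of 14 or so; the whole point of the MPO plus DMRG machinery is to work with the same sum of local terms without ever forming that matrix.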
Okay, so: any more questions about tensor networks? Because I'm going to shift gears now. Let's see; I'll probably have to skip some stuff in this part, actually, because I didn't quite time this right. But okay: just to get ready for the lectures tomorrow (and maybe I can spill some of this into tomorrow if we don't get that far), I just wanted to give a really broad introduction to machine learning, so we'll all be ready to talk about that tomorrow; plus I thought it would be generally useful for all of you. Let me just get an idea: who's heard of supervised learning, who knows that term? Okay. Who's heard of unsupervised learning? Kind of the same people, right. Who's actually programmed, like, neural nets, or a linear classifier, or something like that? Okay, not so many, so I think this will be helpful to you. All right, so as you all know, machine learning has really kind of taken off in recent years in terms of what it can do; probably taken off too much, there's probably some hype as well, but we've seen the results, right? Progress in natural language, actual demos of self-driving cars. The real impact will probably come in quieter ways in our lives; we'll probably see the quality of medical diagnosis just start to improve a lot, and there are companies working very hard on that. One of my friends, actually, this guy Chuck Yee, was a postdoc at Rutgers and Santa Barbara doing DMFT; I remember meeting him here at one of these schools. Now he works for a company in New York City that does machine learning on medical data. So if you're looking for a career using all of your physics and numerical optimization skills, this is a good career to get into, actually; he's making six figures living in New York doing that. And then for us, directly in science, machine learning could have a big impact on studies of materials, chemistry, even strongly correlated electrons. The hype is getting so big that Google has basically rebranded itself as a machine learning company now, instead of a search company, and they have these articles coming out saying stuff like: the people who now sit closest to the CEO are the Google Brain AI team, not the search team. They replaced their Google Translate efforts with a neural network system, and they have a team in LA, mostly consisting of physicists, working on a quantum computer; they're very interested in quantum machine learning, which I'll talk about a bit tomorrow. So that's just to show you how much is going on. And machine learning is already being shown to be very useful for the kind of physics most of us in this room are interested in. Things like identifying phases of matter: this is a paper from 2016 by Juan Carrasquilla and Roger Melko, where they trained a multi-layer neural net to figure out from data what the order parameter should be, and then report what phase you're in. Also, there's an interesting set of papers by the group of Pankaj Mehta and Anatoli Polkovnikov, mostly led by this guy Marin Bukov, who's now a postdoc at Berkeley I think, together with Alexandre Day and Dries Sels. They have this nice work where they were basically asking: can we approach quantum control theory from a machine learning perspective, and think of it as what's called a reinforcement learning problem? I encourage you to take a look at their work. It's getting used in other areas too, things like materials discovery, and also high energy physics: this guy Kyle Cranmer at NYU is doing a lot of interesting work, actually trying to constrain theories, or maybe even discover new particles, by sifting through huge amounts of data using machine learning. So that's some interesting work. Let me fly through this part, because some of you have probably seen this before, but let me mention
a few of these. So what are some examples of why people are saying machine learning is a big success right now? A key example was this one, called ImageNet. ImageNet was a very challenging benchmark: it consists of 1.2 million training images and 150,000 test images. You're given this huge dump of images and a thousand categories, and your job is to sort the images into the correct categories, just by being given the labeled training images. This had been going on for some years, and then this paper came out in 2012, and there was a competition that year; that year this paper got only a 15% error on the test set. They trained on the training set, then tried the same model on the test set, and got only 15% of the labels wrong, which is a lot, but it's pretty impressive given how complicated the task is. The next best entry that year was about 26% wrong, and in the years prior things had been creeping along: 28%, 27%, 26%; then boom, this thing jumps all the way down to 15%, and now I think they've gotten it down to only a few percent, if that. And you can see how hard this task is: you're given pictures like this one, a Dalmatian sitting behind a bowl of cherries, and the correct label is "cherry"; that's really hard. Then there are ones that are less tricky, like this one, but still you have to have a system that can recognize that's a container ship, and it says yes, container ship. So this is a really tough task. Let me skip this one; it's cool, but I'm running out of time. There are other things you can do: you can actually use machine learning to generate things. The ImageNet thing was what's called discriminative learning, where you look at things and discriminate, you say what label this should have; you can also generate. This is using the technology called GANs, or more generally just deep convolutional neural networks. What you can do is create neural networks that map from some high-dimensional space of images to a lower-dimensional latent space, a hidden space, and then you can do arithmetic in this latent space; you can kind of think of it as like the space after you've done RG or something. You can find that there's a vector that's like the typical vector of all images of men wearing glasses, of all images of men not wearing glasses, and of all images of women not wearing glasses; then you just do this arithmetic. You say: okay, if I take the man-with-glasses vector and subtract the man vector, then in my head I'm thinking, maybe I found the glasses vector; and if I add the glasses vector to the woman vector, maybe I'll get a woman with glasses. And that's actually exactly what you get. It even handles cases like the woman turning her face a little bit to the side or not: the glasses stay in the correct position on her face. It doesn't always work perfectly; sometimes the glasses are a bit see-through or something, and in that one it kind of looks like she has heavy makeup, but the others work. So the fact that this all works tells you something very interesting might be going on: there's some kind of hidden structure in some high-dimensional space, and you can do arithmetic on it; that's very interesting. And basically you can now succeed at tasks that people thought computers might never do, like being able to beat humans at Go; but now it's just destroying us at Go, right? So all this says is that somehow people are getting a handle on some kinds of high-dimensional spaces, and we're able to think about how to program computers in some different way. So what is machine learning? Can we try to summarize what's going on? Here's my
best attempt to summarize the core idea behind all this: it's basically data-driven problem-solving. The idea is that we want a system that, if you just feed more data into it, gets better and better at some task; and we want to automate that process, kind of getting the human out of the loop. The human helps to feed the data in, but then the system takes the data and knows what to do with it to solve the task, and the human doesn't really have to understand what the system did in detail. So it's really just some general framework or philosophy; there are lots of different methods, and I'll touch on what some of them are. Another bit of philosophy I like is this article by Andrej Karpathy, who's now the director of AI at Tesla. He says it's almost like a new evolution in software, and I think this is actually pretty accurate. Think about what software originally was; go one step even further back: what was software 0.1, the earliest version of software? You had computers that could, like, add numbers together, and you would ask, what numbers should I feed in to make the computer do a certain thing I want? But then people said: wait, we can turn this into concepts, like if and then and else; we can have arrays, and we can have languages like C that take these manipulations of binary numbers and turn them into things humans can think about, like logic, right? And this is kind of like one step past that. We went from just manipulating numbers on chips to logical operations, which is what's up here; now we can even say: hang on, we don't even need to be telling the computer each step it should do. We just describe a generalized collection of steps: multiply things by numbers, take those numbers and plug them into some functions, and so on and so
on. Then what we do is tell the computer how to program itself: we say, okay, here is a way to take a program and replace it by a slightly better version of that same program; then the computer does all the rest. Basically, we just tell it how to compute a gradient, and then the computer iterates that over and over and it programs itself. So the idea is you actually get the humans out of the loop even in programming: the computer programs itself, and all we do is describe families of programs and their gradients, or something like that. It's kind of like this higher order of programming. The good thing for you is that this kind of programming uses a lot of math skills, calculus, complicated optimization; it's a lot like varying wave functions. So this is a possible job opportunity for all of you. Okay, so let's just go through some basic concepts of machine learning; I'll fly through this part. One basic concept is a data set. Very simple: you're given a bunch of data, you're given categories or labels, and very often the data set is divided for you (or you can do it yourself) into a training set and a test set. The idea is that you can do whatever you want with the training set, but in principle you're only supposed to look at the test set once, at the very end of your study. If it were a double-blind study, you would never see it; some other group would be holding onto it for you, or maybe the referees would have it and would apply it to your model. In the real world, people sometimes snoop on the test set when they're not supposed to. So what you do to prevent that snooping, where you're secretly bringing the test set into the loop, is make these things called validation sets: you say, that's going to be my mini test set within my training set. You train on the blue part, then test on the validation part, and you can do that as many times as you want.
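Before going further, that "compute a gradient and iterate" loop from a moment ago is worth seeing once in its most minimal form. This toy (all names and numbers are mine) describes a one-parameter family of "programs" f(x) = w*x and lets the computer improve w itself by gradient descent on a cost.

```python
import numpy as np

# Toy training data generated by the "true" program w = 2
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0                                     # initial (bad) program
for step in range(200):
    # gradient dC/dw of the cost C = sum_j (w*x_j - y_j)^2
    grad = 2.0 * np.sum((w * x - y) * x)
    w -= 0.01 * grad                        # replace the program with a slightly better one
```

After the loop, w has converged to the least-squares answer; nobody told the computer the steps of the final program, only the family f(x) = w*x and how to compute the gradient.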
And that way you don't have to have the test set around. Then what you can do is what's called cross-validation, where you say: train on these folds and test on this one, then train on those and test on that one, and so on. The idea is that by having different validation sets, you can get an idea of how well this is likely to work in the real world, where you don't get to see all the data; so it's a good idea to have multiple validation sets. Okay. So now, at a very high level, you can think of machine learning as dividing into different kinds of tasks, and these broad classes of tasks are supervised learning, unsupervised learning, and reinforcement learning. You can think of these as corresponding to how much knowledge you have about your data a priori; by a priori I just mean up front. In supervised learning you have a lot of information: you have your data in hand, and you have labels telling you this is a cat, this is a dog, this is a container ship, or whatever those things were. In unsupervised learning, you maybe just have some data, but you don't even know what it's about: here's a lot of configurations of some physics model; I don't even know if it's an ordered phase or what it is, I don't even know if the model is 1D or 2D; you just have a lot of data. Reinforcement learning is the most extreme case, where you don't even have data at the beginning; you just have a world and a way of collecting data. You're just starting out: you go into some dark room with a flashlight, and you scan it around, trying to get information about the room, and then make decisions about how to proceed through it. So it's like one step back even from unsupervised learning. I'm going to have to end pretty soon, so I'll wrap the rest of this up tomorrow, but I think I have time to go through these tasks, and then we'll kick the rest to tomorrow. Okay, so let's go through these tasks.
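Here is a hypothetical helper (the names are mine, not from any library) sketching that rotation of validation folds in numpy: shuffle the training indices, split them into k folds, and let each fold take a turn as the mini test set.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Split n training examples into k (train, validation) index pairs,
    rotating which fold plays the role of the mini test set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)            # shuffle so folds aren't ordered
    folds = np.array_split(perm, k)
    splits = []
    for i in range(k):
        val = folds[i]                   # this fold validates...
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, val))      # ...the rest train
    return splits
```

You would then train once per split and average the k validation scores, all without ever touching the real held-out test set.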
these tasks and describe what they're all about. The one that's going to be the most important, because it's the one where all the technology is making the biggest progress right now, and the one I'll talk about the most, is supervised learning. In supervised learning your task is to look for a function f, called the decision function, whose job is to discriminate between different inputs. Let's say you have two kinds of inputs in the little mini universe of this problem, type A and type B: type A could be pictures of alligators and type B could be pictures of bears. So maybe the function is telling you what you should do if you encounter this versus that: should you run away, or should you play dead; what's your strategy for surviving this encounter? There are different ways of signaling the output. The simplest is that if the output is positive, the function thinks the input is an A, and if it's negative, it thinks it's a B. It could also be some kind of vector-valued output if you're going to have more than two categories. The way you often find this function is: pick a function that you can parameterize, so it has some adjustable parameters, then stick it into some cost function, some kind of objective. That just means you test your function on every training input that you have and measure the distance from the output of f to the ideal output, which you call y. So y is the label, either plus one for A or minus one for B in this case. However good the function is determines this number C, and then you just optimize f: it has some adjustable parameters, and you keep adjusting them to try to make C go down, however you do that. So the cost measures the distance of the trial function from the idealized function y_j, which gives the perfect outputs for the inputs. Think of it this way: a function is a very high-dimensional object, y is the perfect function, f is the one you have, and you're just trying to bring them closer together. So from that point of view it's sort of just high-dimensional fitting, though there's a part of it that goes beyond fitting, which has to do with generalization. Unsupervised learning could be many things; it's sort of anything you can do with unlabeled data. One thing you could say is that the data maybe has repetitions, or some things in the data are similar to other things, so there's some kind of probability distribution behind it, and you want your function to match whatever hypothesized probability distribution there is (or it could be that its square matches the probability distribution). You could also find clusters in the data. This third one is rather different from the first two: you say the data has some kind of structure, and maybe you want to find that structure, maybe you want to lump different things together, and that's your task. Or you could even just be doing pre-processing, saying let me take high-dimensional data and find some reduced or compressed form of it that I could then use to do other things, like some of these tasks, or supervised learning from the previous slide, and so on. The approach, say for this first task, the probability one, is that you call the function you're trying to optimize P. So there's the P you're optimizing, and then there's the true distribution of the data, which you can think of as taking your data and putting a delta function of probability on everything you've observed. That's like a perfect probability distribution, in a sense, but it's an overfitted one. So you have the one you can actually optimize, P, and you want to measure some distance between the two, and one way to do that is to use this thing called the Kullback-Leibler divergence, which equals this
formula up here, which is called the log-likelihood. The idea is that you take every input that you have and plug it into this P you're trying to improve, take the log of it so that things are more manageable, and then you want to maximize the sum of these logs, which is equivalent to maximizing the product of all the probabilities. So you want to maximize the probability of jointly observing all the data you have; you want that probability to be high. And that's equivalent to minimizing the distance between your distribution and the true one. So that's what you do there. The last one I'll mention, and then I'll stop for today, is reinforcement learning. There are a lot of different ways of defining it, and different forms of it, but here you have some kind of environment. The environment is like a maze, or some kind of world that you're exploring. It could even be the actual world: maybe you have a glider that you're throwing up into the air, and you want to explore the wind currents in the air. So you have some environment and you have an agent: the agent could be that glider, it could be a little robot in this maze, it could be a game player in some game state. Then you have states s_n, and the states are things like where you are in the maze, or what the current angle of the glider is and what wind it's feeling. You have actions, like tilt the wings of the glider, or go up, or go left in the maze, something like that. And then you have a reward. The reward could be distributed throughout the environment, like maybe throughout the maze there are pieces of cheese or there are traps, so you can have positive rewards and negative rewards; or it could be that the reward comes only at the end of the maze, and you get no information at all until you actually get out of the maze. Those problems are very challenging, but you can actually make progress even on those kinds of problems. Your goal here is to determine a policy, and policies are usually thought of as probabilistic, saying: if I'm in state s_n, then I want some little miniature probability distribution over actions; basically I want some probability of going north versus west versus south versus east, so that if I follow this policy I'll maximize the reward in the fewest number of steps. So you can actually do things like play Pong while observing only the pixel states, knowing only that you want to get the score to go up, but otherwise knowing nothing else about the game: not even which paddle is yours, or what the meaning of the ball is, or what happens when it hits the boundaries, only watching the score. You can train a little neural net to take in all these pixel configurations and output a probability of moving up, and you can train this neural net to get better and better at outputting that probability so the score goes up, and it actually works. There's a nice blog post I could point you to that tells you how to do that in Python, so you can really do this. So that's just quickly going through three different types of tasks. Tomorrow I'll say a bit more about that, and then we'll get into using tensor networks to do the first one, supervised learning, and I'll also have some slides on doing unsupervised learning with tensor networks. Okay, thanks. [Applause]
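The cross-validation idea from the start of the lecture (rotate which chunk of the data is held out, so you get several different validation sets) can be sketched in a few lines of plain Python; the split sizes and sample count here are just toy choices for illustration:

```python
import random

def k_fold_splits(n_samples, k):
    """Split indices 0..n_samples-1 into k folds; each fold serves once
    as the validation set while the remaining indices are the training set."""
    indices = list(range(n_samples))
    random.shuffle(indices)
    fold_size = n_samples // k
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = [j for j in indices if j not in val]
        yield train, val

# Toy usage: 12 samples, 3 folds -> 3 different train/validation splits,
# so you can average performance over them instead of trusting one split
random.seed(0)
for train, val in k_fold_splits(12, 3):
    assert len(val) == 4 and len(train) == 8
    assert set(train).isdisjoint(val)
```

Averaging a model's score over the k validation sets gives a better estimate of real-world performance than any single held-out set.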
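The supervised-learning recipe described above (pick a parameterized decision function f, form a cost C summing the distance from each label y, then adjust the parameters to drive C down) can be written out concretely. The 1d linear model, the toy data, and the plain gradient-descent loop below are illustrative assumptions, not anything specific from the lecture:

```python
# Toy labeled data: inputs x with labels y = +1 ("type A") or -1 ("type B")
data = [(-2.0, -1), (-1.0, -1), (1.0, +1), (2.0, +1)]

def f(x, w, b):
    """Parameterized decision function: the sign of the output gives the class."""
    return w * x + b

def cost(w, b):
    """C = sum of squared distances between f's output and the ideal label y."""
    return sum((f(x, w, b) - y) ** 2 for x, y in data)

# Gradient descent: keep nudging the adjustable parameters to make C go down
w, b, lr = 0.0, 0.0, 0.01
for _ in range(500):
    dw = sum(2 * (f(x, w, b) - y) * x for x, y in data)
    db = sum(2 * (f(x, w, b) - y) for x, y in data)
    w, b = w - lr * dw, b - lr * db

# After training, the sign of f classifies every training point correctly
assert all((f(x, w, b) > 0) == (y > 0) for x, y in data)
```

The same structure carries over when f is a neural net or a tensor network: only the parameterization and the gradient computation change.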
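The log-likelihood objective mentioned above (maximize the sum of log P(x) over the observed data, which is equivalent to minimizing the KL divergence to the empirical distribution) is easy to see in a minimal case; the one-parameter Bernoulli model and the coin-flip data here are a made-up example:

```python
import math

# Observed unlabeled data: 7 ones and 3 zeros from some unknown source
data = [1] * 7 + [0] * 3

def log_likelihood(p):
    """Sum of log P(x) over the data, for a model with P(1) = p, P(0) = 1 - p.
    Maximizing this is the same as maximizing the joint probability of the data."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x in data)

# Scan the single adjustable parameter and keep the value maximizing the sum of logs
best_p = max((k / 100 for k in range(1, 100)), key=log_likelihood)

# The maximum-likelihood estimate matches the observed frequency of ones
assert abs(best_p - 0.7) < 1e-9
```

With a richer model P (a neural net, or a tensor network whose square gives the probabilities) the scan is replaced by gradient ascent on the same objective.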
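The notion of a policy from the reinforcement-learning discussion above (for each state, a little probability distribution over the available actions) can also be written down directly. The four-action maze setup and the softmax parameterization below are toy assumptions, just one common way of turning adjustable numbers into action probabilities:

```python
import math

ACTIONS = ["north", "south", "east", "west"]

def policy(preferences):
    """Softmax: turn one adjustable preference number per action into a
    normalized probability distribution over actions for the current state."""
    exps = [math.exp(z) for z in preferences]
    total = sum(exps)
    return [e / total for e in exps]

# A state where the agent has learned to strongly prefer "north"
probs = policy([2.0, 0.0, 0.0, 0.0])
assert abs(sum(probs) - 1.0) < 1e-12   # it really is a probability distribution
assert probs[0] == max(probs)          # "north" is the most likely action
```

Training then means adjusting the preference numbers (e.g. the outputs of a small neural net) so that following the policy tends to increase the collected reward.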
Info
Channel: ICTP Condensed Matter and Statistical Physics
Views: 2,490
Rating: 4.8688526 out of 5
Keywords: ICTP, Abdus Salam International Centre for theoretical Physics, Condensed Matter, Statistical physics, Statistical Mechanics, Numerical methods, Collective Behaviour, Coherent dynamics, Quantum Matter, Topological quantum matter, entanglement, phase transitions
Id: Zcr85jmeY94
Length: 64min 18sec (3858 seconds)
Published: Tue Sep 11 2018