QAOA Introduction

Captions
- [Yuri] Hello everybody. This is the Argonne Quantum Tutorial. Today, our speaker Ruslan Shaydulin is going to talk about how to solve combinatorial optimization problems on quantum computers. Ruslan is an Argonne scholar; he won the prestigious Director's Fellowship at Argonne. Go ahead, Ruslan. - [Ruslan] Thank you, Yuri, for the introduction. Okay, my name is Ruslan. I'm going to talk today mostly about the quantum approximate optimization algorithm, but some more general points about solving optimization problems on quantum computers will also come up. In the first part of my talk, I'm going to give you some theoretical background, and that will be about an hour, and in the second part, we're going to do a hands-on using Qiskit. I will show you how to implement QAOA and show you some of the practical pitfalls associated with running QAOA. At any point during the presentation, feel free to put your questions in the chat. I think Casey is monitoring the chat, or Yuri, or maybe both, and I will try to keep an eye on it myself too. So if there's a question in the chat, I will try to address it as soon as I see it. With that, let me start by saying a few words about the big-picture considerations that are relevant here. So when we're thinking about solving problems, optimization problems in particular, what we're usually interested in is the asymptotic complexity of solving the problem. Specifically, this means we're interested in how the resource requirements, the time requirements and memory requirements, grow with the problem size. And this is what those famous classes of problems are referring to: P, which is the class of problems solvable in polynomial time; NP, the class of problems where a proof can be verified in polynomial time; and PSPACE, where the problem can be solved in polynomial space but unlimited time. These are the famous classical complexity classes.
In the quantum space, the complexity class we're most interested in is BQP, bounded-error quantum polynomial time, which roughly corresponds to polynomial-depth quantum circuits. This class contains problems that are solvable on a quantum computer in polynomial time with an error probability of at most 1/3 for all instances. It's roughly the quantum analog of BPP, bounded-error probabilistic polynomial time. Now, what is the relationship between these classes? Well, the big one is PSPACE. If you can solve something in polynomial time, you're necessarily solving it in polynomial space, so everything is contained in PSPACE. And of course, if you can solve a problem in polynomial time, then you can verify a proof in polynomial time as well. So P is contained in NP, and P is contained in BQP as well. Now the question is, what is the relationship between NP and BQP? Let me give you examples of problems that lie in each of the regions of this Venn diagram. If you're thinking about something like finding the shortest path between two nodes in a graph, that's a polynomial-time problem, and it's in P. Then there's the famous example of integer factorization, which is in BQP because there exists a polynomial-time quantum algorithm for it, Shor's algorithm, yet it's not known to be in P; that is, there is no known classical polynomial-time algorithm. And one of the problems we're going to talk about today, the maximum cut problem, is in general NP-complete, that is, it's as hard as any problem in NP. However, it is NP-complete in the worst case. On specific instances, like if you're solving the maximum cut problem on a simple graph, like a cycle graph, the problem is actually trivial, in P.
The reason I bring up this taxonomy is that when talking about how quantum computers can speed up solving optimization problems, we have to be very careful about what we mean by that, because worst-case instances are not always trivial to construct: saying that maximum cut is NP-complete does not mean it's trivial to saturate that bound. And maybe no practically interesting instances are like that. And proving asymptotic performance, especially of quantum algorithms, is notoriously difficult. So this is something to keep in mind here. And really the main takeaway is that we are not going to solve NP-complete problems in polynomial time today, or probably ever. It is a sort of common folk assumption people hold that NP is not contained in BQP, so there are some problems here that are not solvable on quantum computers in polynomial time. Okay, so we're not going to solve NP-complete problems in polynomial time. So what are we doing here? Well, we're going to look at things slightly differently today. The way we're going to think about solving optimization problems is in terms of two kinds of algorithms. On the left are algorithms with proven performance. For example, Dijkstra's algorithm for shortest path is a polynomial-time algorithm that is always guaranteed to give you a solution that is the shortest path between two nodes, in something like O(V + E) time. In the quantum case, there are also algorithms like that, algorithms that are guaranteed to give you a solution of a given quality, often the exactly correct solution, within a certain running time. Shor's algorithm, for example, is guaranteed to give you the factors of your integer in some polynomial time. However, the most powerful algorithms, including the most powerful optimization algorithms, especially in the classical case, are often heuristic methods.
And what I mean by heuristic here is that there is no guarantee of getting the optimal solution, or a solution of a given quality, in the worst case. In the worst case, you'll either have no promise about the quality of the solution, or no promise about the running time of the algorithm. The famous example is using gradient descent on problems that you cannot prove to be convex. That's a heuristic because gradient descent will only provably work for convex problems; if you have a weird non-convex landscape, or you cannot show that it's convex, then it's a heuristic because you cannot guarantee its success. And what I'm going to argue today is that quantum heuristic algorithms, such as the quantum approximate optimization algorithm that I'll be talking about today, are the most promising quantum algorithms for optimization. It's especially promising because the quantum hardware, such as the noisy intermediate-scale quantum hardware that is becoming available today, provides a unique opportunity to benchmark these heuristic methods and actually answer the questions about their performance. I see there's a question in the chat from Ish. I'm going to read it out and then answer it. Ish is asking about VQE, which is an algorithm that was discussed yesterday and involves a lot of optimization loops, that is, outer-loop parameter optimization. How do we then classify it in terms of BQP when we don't know how many iterations are required? So this is exactly what I'm getting at. VQE here is a heuristic algorithm, at least the way it is discussed today; that is, we don't know how many outer-loop iterations it would require, right? So you cannot say that it will be, for example, at most polynomial time, that is, a polynomial number of iterations in the problem size. And because you don't have this guarantee, it can be exponential.
In fact, there is a result showing that parameter optimization, even for QAOA, is NP-hard, even for problems that are classically easy. So the parameter optimization problem can be more difficult than the underlying problem. So we do not classify these algorithms in this sense. I mean, they're all, I guess, in PSPACE, right? But they are heuristics, so they're in the bucket on the right. I hope this answers Ish's question. Any other questions? This would be a good time for questions because we're at the end of the first part. In the next part of my talk, I'm going to talk about how to take a classical optimization problem and map it onto a quantum computer. I'm going to talk about the maximum cut problem specifically, because it's a commonly used, kind of paradigmatic example for quantum optimization, and you'll see a lot of papers that discuss maximum cut. But I will also talk later about the general rules for building quantum Hamiltonians representing arbitrary Boolean functions, so that you can map any optimization problem you would like onto a quantum computer, as long as it's defined on the Boolean cube. In the maximum cut problem, the goal is to split the set of vertices of a graph into two disjoint sets such that you cut as many edges as possible. For example, if we're looking at this graph on the right, there are four nodes in this graph and five edges. And if we let the color denote the partition, splitting the set of vertices into green ones and red ones, then we cut four of the edges between these vertices. We can write it out as an optimization problem in the following way: we're summing over all edges (1 - s_i s_j) / 2, where s_i and s_j are minus one or plus one and encode the partition assignment of the vertices.
So here we're saying that all green vertices are assigned the label minus one, and all red vertices are assigned the label plus one. It's easy to verify that this is correct if you look at this term, the contribution of each edge to the cut. If both variables have the same sign, then s_i s_j is one and the term becomes zero, so there's no contribution to the cut. And if they have different signs, then s_i s_j becomes minus one, so the whole term becomes one, and the cut has increased by one. So this is a correct formulation of the maximum cut problem. Okay, so we're starting with this classical optimization problem on the left. We're trying to find some binary configuration s that maximizes our objective function, and all of the problems we're going to consider today are going to have a form like that. What we want to do is solve them somehow on a quantum computer. On a high level, what we do (and this is not the only way to do it, but it's the way I'm talking about today) is we take this classical optimization problem and convert it into a problem of characterizing some Hamiltonian. The spectrum of that Hamiltonian will encode the objective function, and then the highest energy eigenstate of this Hamiltonian will encode the optimal configuration, the configuration that maximizes the objective function. Let me quickly unpack this. We're going to take our objective function and construct some Hamiltonian, which is a Hermitian operator, a Hermitian matrix, that encodes this objective function in its spectrum, in its eigenvalues. Our solution, which in the classical case is some configuration, a string of minus ones and plus ones of length n, where n is the number of variables.
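As a quick check of the objective just described, here is a short Python sketch (my own illustration, not code from the talk; the specific graph is a hypothetical stand-in for the four-node, five-edge graph on the slide, which isn't reproduced in the captions):

```python
# Classical MAXCUT objective: sum over edges of (1 - s_i * s_j) / 2
def maxcut_objective(s, edges):
    return sum((1 - s[i] * s[j]) / 2 for i, j in edges)

# Hypothetical 4-node, 5-edge graph standing in for the one on the slide
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
s = [-1, -1, +1, +1]                # "green" vertices -1, "red" vertices +1
print(maxcut_objective(s, edges))   # -> 4.0 edges cut
```

Edges whose endpoints share a sign contribute zero; edges across the partition contribute one each, as in the slide's example.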
In the quantum case, this will become encoded in the highest energy eigenstate, or in more linear-algebra terms, the eigenvector with the largest eigenvalue of this operator. And the observable corresponding to the value of the objective function in a given configuration is the energy of the Hamiltonian in the given quantum state. Now, the nice thing about this is that because our Hamiltonian is classical, that is, it's a diagonal operator, this highest energy eigenstate will be a computational basis state. So if we measure it, we get the solution to our problem with probability one. In an ideal setting, where we solve the problem exactly, there is no randomness to it. Okay, so what does this Hamiltonian actually look like? What we want is a Hamiltonian that acts on a computational basis state |x⟩ as f(x). So what it looks like is a matrix of size 2^n by 2^n that has on its diagonal all the values of the objective function on all possible binary strings. And it is easy to check: if it acts on, for example, |00...0⟩, which is represented by a vector that has one in the first entry and zeros elsewhere, then the action of this operator on this vector will be exactly f(00...0). And the same is true in the general case: if we look at the action of this operator on an arbitrary computational basis state, which is encoded by a vector that has one in the entry corresponding to that binary string and zeros elsewhere, the action of our Hamiltonian will be exactly f(x); that is, it multiplies the state by f(x). This matrix that I wrote is too large, it's 2^n by 2^n, so it's too large to construct explicitly. And in fact, if we were to construct it explicitly, we would have already solved our optimization problem, because you could just look at the diagonal and pick the optimal value.
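The diagonal-matrix picture just described can be checked directly in numpy (a toy illustration of mine, not the talk's code; the function name and the toy objective are made up):

```python
import numpy as np
from itertools import product

def diagonal_hamiltonian(f, n):
    """2^n x 2^n diagonal matrix with f evaluated on every bitstring,
    ordered so that index 0 corresponds to |00...0>."""
    return np.diag([f(bits) for bits in product([0, 1], repeat=n)])

# Toy objective: number of ones in the bitstring
f = lambda bits: sum(bits)
H = diagonal_hamiltonian(f, 2)

e00 = np.zeros(4)
e00[0] = 1.0                                   # the vector for |00>
assert np.allclose(H @ e00, f((0, 0)) * e00)   # H|x> = f(x)|x>
```

As the talk notes, building this matrix explicitly already solves the problem (you can read the optimum off the diagonal), which is why an efficient construction from Pauli Zs is needed instead.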
So that would be a little silly. The question is, how do you construct this Hamiltonian efficiently? And the way we're going to do this is by building it from Pauli Z matrices. But first, let's quickly recap the notation we're going to be using. I think it's the same notation that was used by the speakers the last two days, but it's always good to make sure we're all on the same page. When I write the ket |x⟩, this is the same as writing bold x, or x with an arrow over it, in the classical case; it's just notation for a column vector. And when I write the bra ⟨y|, this is the conjugate transpose of |y⟩: a row vector whose elements are the conjugates of the entries of the y column vector. ⟨y|x⟩ simply denotes the inner product between y and x, and |x⟩⟨y| denotes the outer product between x and y. And this, I think, is all the notation we're going to use today. Let's go back to our MAXCUT objective. What I'm going to show you is that we can take these variables s that are plus or minus one, and construct the Hamiltonian by simply mapping these binary variables s_i onto the eigenvalues of the Pauli Z matrix. What this means is that all we do is mechanically take the formula and replace each s_i with a Pauli Z operator acting on the corresponding qubit, Pauli Z acting on the i-th qubit. And next, I'm going to show you that this is indeed correct, and walk you through what's going on here to give you a sense of why. What we want to show is that this operator acts on the computational basis state |x⟩ as f(x), the value of the objective function on that binary string. So let's show that. The Pauli Z operator is a diagonal matrix with one and minus one on its diagonal.
And it's easy to see that its eigenvalues are plus one and minus one, with the eigenvectors being the computational basis states. You can just apply it to |0⟩, which is the vector (1, 0), and it stays |0⟩. And if you have the computational basis state |1⟩, which is the vector (0, 1), then the action of the Pauli Z matrix on this vector gives you a minus one phase. So in short, the action of the Pauli Z operator on the computational basis state |x⟩ is adding a phase of (-1)^x, for x in {0, 1}. Now, when we have Pauli Z acting on the i-th qubit, what we mean by that is an operator that acts on all n qubits but is identity everywhere else: it's a tensor product of identities on both sides, with Z acting on the i-th qubit. Here I'm making the assumption that Z is somewhere in the middle; in general, if Z acts on the first qubit, there are no identities on the left, for example, but this is for simplicity of notation. The action of this operator is easy to see: it adds a phase of (-1)^{x_i}, where x_i is the i-th entry of the binary string corresponding to the computational basis state. Furthermore, if we have two Pauli Zs, Z_i Z_j, then the action of this operator, which is a product of two Pauli Zs, is adding a phase of (-1)^{x_i} times (-1)^{x_j}. And here, once again, I'm just assuming for the sake of notation that we reorder things to keep them adjacent. Okay, so this is the action of the ZZ operator. Now let me show you that this is the right action. What we're going to do is take our MAXCUT objective and change variables from s_i in {-1, +1} to x_i in {0, 1}, which is trivial: we can just say s_i = (-1)^{x_i}.
And this should already start looking familiar. Our reformulated objective now is: the contribution of each edge is (1 - (-1)^{x_i} (-1)^{x_j}) / 2. Remember, what we want to show is that our MAXCUT Hamiltonian acts on the computational basis state |x⟩ as f(x), the value of the objective function on the corresponding binary string. From what you've seen before, it should be clear that this is exactly what it does; let's just walk through it. So this is our Hamiltonian acting on some computational basis state. By linearity, we can open the brackets. The first term, the identity, just gives one, and the second term gives us (-1)^{x_i} (-1)^{x_j}, which together is exactly the value of the objective function. This shows that the Hamiltonian does correctly implement our objective. So what we've shown is that the Hamiltonian encoding our MAXCUT objective function can be constructed by simply mapping the binary variables s_i in {-1, +1} onto the eigenvalues of Pauli Z. And note that even though I only walked you through the example of MAXCUT, the very same procedure will work for any objective that is built from products of binary variables. It need not be quadratic: you can have a product of five s's, and that would become a product of five Pauli Zs. Any questions on the construction of the Hamiltonian representing the MAXCUT objective function? The size of this operator, the size of this Hamiltonian, is 2^n by 2^n, because when we write Z_i Z_j, it's an operator acting on all n qubits. So, as I said earlier, we cannot write this out explicitly. How do we build this Hamiltonian in general? Well, maybe a better question: why can we always build it?
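The s_i → Z_i substitution can be verified numerically for a small graph; the sketch below is my own check (a triangle graph chosen for illustration, not the slide's example), confirming that the resulting matrix has the cut values on its diagonal:

```python
import numpy as np
from itertools import product
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def zz_term(i, j, n):
    """Tensor product with Z on qubits i and j, identity elsewhere."""
    ops = [Z if k in (i, j) else I2 for k in range(n)]
    return reduce(np.kron, ops)

# MAXCUT Hamiltonian for a triangle graph: sum over edges of (I - Z_i Z_j) / 2
edges = [(0, 1), (1, 2), (0, 2)]
n = 3
H = sum((np.eye(2**n) - zz_term(i, j, n)) / 2 for i, j in edges)

# Its diagonal equals the cut value on every bitstring
for idx, bits in enumerate(product([0, 1], repeat=n)):
    s = [(-1) ** b for b in bits]
    cut = sum((1 - s[i] * s[j]) / 2 for i, j in edges)
    assert np.isclose(H[idx, idx], cut)
```

The qubit ordering here makes qubit 0 the most significant bit, matching the enumeration order of `itertools.product`.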
The method that I'm going to show you is due to this excellent paper by Stuart Hadfield, which I highly recommend. What we want to do is construct a Hamiltonian that acts on the computational basis state |x⟩ by adding a phase of f(x). First, let me give you a quick primer on analysis of Boolean functions. Consider some arbitrary Boolean function. Here, just to show the generality of this, I'm considering the maximum function, which is not even a typical objective function for an optimization problem. And I'm going to go back and forth between x being in {-1, +1} and x being in {0, 1}; as I've shown before, we can always do this by a change of variables. This maximum function can be expressed as a multilinear polynomial of this form. Now, how can we construct this multilinear polynomial? Note that any Boolean objective function can be written as a sum, over configurations a, of f(a) times an indicator polynomial for the corresponding configuration, which is zero if x is not equal to a and one if x is equal to a. And this indicator polynomial can be constructed explicitly in this way. For example, for our maximum function, by building these indicator polynomials explicitly and plugging in the corresponding values of the function, we can build the multilinear polynomial representation of it. Now, let me quickly add some notation. I'm going to write x^S for the monomial corresponding to a set S, namely the product of x_i over indices i in the set S. This multilinear polynomial is actually the Fourier expansion of the Boolean function, and these coefficients are the Fourier coefficients corresponding to it. And there is a result showing that this multilinear polynomial always exists and is unique; that is, for a given objective function, there is a unique multilinear polynomial representing it.
Okay, now back to our Hamiltonians. Note that you can get this action by building the Hamiltonian as a sum of f(x) times the projector onto the computational basis state |x⟩. This holds because the projector gives you one if you project |x⟩ onto itself, and zero if you project some computational basis state that is not equal to |x⟩ onto it, because the computational basis states form an orthogonal basis. And just comparing these two expressions, you can already see how it's done: you go from the indicator polynomial in the classical case to projector operators in the quantum case. And finally, because products of Pauli Zs implement parity functions (a product of Pauli Zs acts on the computational basis state |x⟩ by multiplying it by the monomial x^S), taking the Fourier expansion of your classical Boolean objective function and plugging those Fourier coefficients into this formula will give you a unique n-qubit Hamiltonian representing f. So, in short, the recipe for constructing a Hamiltonian for an arbitrary objective function is: you start by performing the Fourier expansion of the function, and then you replace the monomials x^S by products of Pauli Zs. Now, this is a general recipe, and in a sense a proof that this representation always exists, but it is in general hard to compute the Fourier expansion. In practice, you can build this Hamiltonian from simple building blocks, and I would recommend Stuart's paper, which walks you through how this can be built for common Boolean functions. Finally, in practice, your objective function usually already comes as a polynomial. For example, in MAXCUT, we already started out with a polynomial.
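The Fourier expansion step can be sketched by brute force for small n; this is my own illustration of the standard definition (the function name is made up), using the talk's maximum-function example:

```python
import numpy as np
from itertools import product, combinations

def fourier_coefficients(f, n):
    """Fourier coefficient hat{f}(S) = E_x[ f(x) * prod_{i in S} x_i ]
    over x in {-1, +1}^n, computed by brute force."""
    xs = list(product([-1, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in combinations(range(n), r):
            coeffs[S] = np.mean([f(x) * np.prod([x[i] for i in S]) for x in xs])
    return coeffs

f = lambda x: max(x)        # the talk's example function
c = fourier_coefficients(f, 2)

# Reconstruction: f(x) = sum_S hat{f}(S) * x^S, exactly (multilinearity)
for x in product([-1, 1], repeat=2):
    val = sum(c[S] * np.prod([x[i] for i in S]) for S in c)
    assert np.isclose(val, f(x))
```

Replacing each monomial x^S in this expansion with the corresponding product of Pauli Zs then yields the n-qubit Hamiltonian, as described above. The brute-force loop is exponential in n, which matches the talk's caveat that computing the Fourier expansion is hard in general.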
So if it's a polynomial, it's easy to convert to a Hamiltonian by replacing the products of x with products of Pauli Zs, as I've shown you for MAXCUT. Now I'm going to talk about solving this problem. We've converted our classical optimization problem into a form that is convenient to solve on a quantum computer. So how do we actually solve it? How do we find the highest energy eigenstate of this Hamiltonian? This we're going to do using the quantum approximate optimization algorithm, which is a heuristic algorithm for solving combinatorial optimization problems. The way it works is it prepares the following parameterized trial state, commonly referred to as an ansatz. We start in the vacuum state, all zeros, we apply a Hadamard to each qubit, and then we apply a sequence of alternating operators, where one operator is the matrix exponential of minus i times some free parameter gamma times the problem Hamiltonian, and the other is the exponential of minus i times some free parameter beta times B, the mixer Hamiltonian. The problem Hamiltonian is the combination of Pauli Z terms that we just learned to construct, and B is the mixer Hamiltonian; the commonly used one is the sum of Pauli X on all qubits. What is typically done is we use some method to find parameters beta, gamma that maximize the expected value of the objective function. Now, why do we think this is a good ansatz? Let P be the number of these pairs of operators that we apply in sequence. We know that with an infinite number of these operators, this ansatz can exactly simulate the adiabatic quantum algorithm, which is guaranteed to find the exact optimal solution in the limit of potentially exponentially large running time. For small P the picture is somewhat mixed, but there are some results indicating a potential for quantum advantage.
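The ansatz just described can be written as a minimal statevector simulation in plain numpy; this is my own sketch for intuition, not the Qiskit implementation used in the hands-on (function names are made up, and the cost Hamiltonian is passed as its diagonal, as justified earlier in the talk):

```python
import numpy as np
from functools import reduce

def qaoa_state(diag_C, betas, gammas):
    """QAOA trial state for a diagonal cost Hamiltonian C (given as its diagonal)
    and the standard mixer B = sum_k X_k, starting from the uniform superposition."""
    n = int(np.log2(len(diag_C)))
    state = np.full(2**n, 2**(-n / 2), dtype=complex)   # Hadamards on |0...0>
    for beta, gamma in zip(betas, gammas):
        # e^{-i gamma C}: diagonal, so just elementwise phases
        state = np.exp(-1j * gamma * np.asarray(diag_C)) * state
        # e^{-i beta sum_k X_k} factorizes into one e^{-i beta X} per qubit
        rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                       [-1j * np.sin(beta), np.cos(beta)]])
        state = reduce(np.kron, [rx] * n) @ state
    return state

def expected_value(diag_C, state):
    """<psi| C |psi> for a diagonal C."""
    return float(np.real(np.vdot(state, np.asarray(diag_C) * state)))
```

A classical optimizer would then vary `betas` and `gammas` to maximize `expected_value`, which is the outer loop discussed in the VQE question earlier.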
For example, there are demonstrations that, due to a different mechanism, QAOA can outperform, for example, simulated annealing and quantum annealing on certain classes of problems. Now, one thing you might wonder about is how we implement this in gates. The math is easy, but this may not be obvious to you. Maybe this is something that was discussed yesterday, but I'm going to reiterate it anyway because it's very similar: when you're simulating the Hamiltonian, exp(-i gamma C) is simply a Hamiltonian evolution circuit. I'm going to assume a pretty standard basis gate set: you have your CNOTs, your Hadamard, and your RZ gate; we actually don't need the phase gate. Okay, so let's start with a simple operator and work our way up to evolving the cost Hamiltonian. The simple operator we're going to start with is just Pauli Z on one qubit. What you can see is that exp(-iZt), for t some free parameter, is equal to a rotation around Z with angle 2t. You can see it by just examining the formula for the RZ gate: RZ(theta) is exp(-i theta Z / 2), which is exactly the operation that we want to implement. So the circuit for implementing this simple operator is a rotation around Z with angle 2t. I see there is a question in the chat. The question is: can the mixer be a sum of Zs, so that the start state is all zeros, which is easier to prepare than a superposition state? The question, if I understand correctly, is whether the mixer B can also be, for example, a sum of single-qubit Pauli Zs over all qubits. Well, the issue with that would be that both the mixer and the phase separation Hamiltonian would be diagonal in the computational basis. What this means is that no matter what state you start with, if you only have diagonal operators, they do not change the measurement probabilities of your outcomes.
Therefore, no matter how you vary the parameters, you would get the same samples out. So you need something that is non-diagonal. More specifically, you need something that doesn't commute: C and B should not commute, otherwise you will not have interference. But theoretically you can plug in, say, a sum of Pauli Ys here, and it would also be non-trivial. So in general, any non-commuting operator works here; some are better than others. Thank you for the question. Okay, so we figured out how to implement this simple exp(-iZt), a single-qubit Pauli Z. Now, what happens if we have a two-body term, a product of two Pauli Zs? To see the action, recall that Z has eigenvectors |0⟩ and |1⟩ with eigenvalues plus one and minus one, and recall from linear algebra that the exponential of a matrix A acting on an eigenvector v equals the exponential of the corresponding eigenvalue times v. From this, you can simply write out the action of this operator on all computational basis states, and we can see that it adds a phase with the sign depending on the parity: a phase factor that is something like (-1)^(a XOR b). The circuit implementing that is CNOT, RZ, CNOT. You can see this simply by checking what it does on some input |ab⟩. The first CNOT sets the second register to a XOR b. Then RZ, by definition, adds a phase depending on the value of this register. And then we add a second CNOT to undo this, to reset the state of the second register. So this circuit adds a phase that is equal to the desired phase, and therefore it is the correct circuit to implement exp(-i ZZ t). That gives us the two-body terms. And the last thing we want is to handle many ZZ terms, as in our MAXCUT Hamiltonian.
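The CNOT-RZ-CNOT construction can be verified by direct matrix multiplication; this is my own numerical check of the identity, not code from the talk:

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
# CNOT with the first qubit as control, second as target
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def rz(theta):
    """RZ(theta) = exp(-i theta Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

t = 0.7
# CNOT, then RZ(2t) on the target qubit, then CNOT again
circuit = CNOT @ np.kron(I2, rz(2 * t)) @ CNOT

# Direct exponential: e^{-i t Z(x)Z} is diagonal, so exponentiate the diagonal
zz = np.kron(Z, Z)
direct = np.diag(np.exp(-1j * t * np.diag(zz)))
assert np.allclose(circuit, direct)
```

The first CNOT computes the parity a XOR b into the second qubit, RZ(2t) phases it, and the second CNOT uncomputes, exactly as described above.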
Because they commute, we can represent the exponential of their sum as the product of the corresponding exponentials, meaning we can just concatenate the corresponding circuits. And this is sufficient to build the circuit to simulate our MAXCUT Hamiltonian, that is, our problem Hamiltonian. So this is half of our circuit, all of the cost terms. Now, the second part of our circuit is the mixer Hamiltonian, which is the exponential of minus i beta times the sum of Pauli X over all qubits. Each of these terms, exp(-iXt), can be simulated by noticing that if you sandwich exp(-iZt) between two Hadamards, it is equal to exp(-iXt). This is simply because HZH = X, and by just looking at the power series expansion, you can see that this is indeed the case. So to simulate exp(-iXt), all we do is apply Hadamard, RZ, Hadamard. And this gives us a circuit to simulate the entire QAOA ansatz. Okay, so we can compile the QAOA ansatz into gates. Now, finally, let me say a few words about an intuition you may have for this method. You can think of the initial state, the uniform superposition, as some vector in the 2^n-dimensional Hilbert space. And we have our target state, which for a maximization problem is the highest energy eigenstate, as some other vector. And what we are doing with our trial state is, at each point, we plug in some parameters and get some different vector.
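The Hadamard-sandwich identity for the mixer terms is easy to confirm numerically; again, a small check of mine rather than the talk's code:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
X = np.array([[0.0, 1.0], [1.0, 0.0]])

t = 0.3
exp_z = np.diag([np.exp(-1j * t), np.exp(1j * t)])     # e^{-iZt}
exp_x = np.cos(t) * np.eye(2) - 1j * np.sin(t) * X     # e^{-iXt} = cos(t) I - i sin(t) X

# H e^{-iZt} H = e^{-iXt}, because H Z H = X
assert np.allclose(H @ exp_z @ H, exp_x)
```

Conjugating by Hadamards maps the Z-axis rotation to an X-axis rotation, which is exactly the Hadamard, RZ, Hadamard sequence described above.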
We're trying to vary the parameters beta, gamma to bring this trial state as close as possible to the target state. This may look familiar to some of you, resembling Grover's algorithm, in the sense that it's a similar idea: you add a phase factor, and then you do something to cancel out the phases, where the phase factor corresponds to the exponential of the cost Hamiltonian and the canceling out of amplitudes is the mixer Hamiltonian. Now, from this picture, hopefully it's clear that the quality of the QAOA solution depends heavily on our ability to find good QAOA parameters. And this is something I'm going to touch on in the hands-on part of the talk. Now, let me say a few words about the connection to the adiabatic quantum algorithm, which is something that commonly comes up as a question. The adiabatic quantum algorithm is a way to find some desired eigenstate of the problem Hamiltonian, for example the ground state or the highest energy eigenstate. It's inspired by the adiabatic theorem, which states roughly that if you start in some eigenstate, such as the ground state, of a time-dependent Hamiltonian, and you evolve this Hamiltonian slowly enough, and this potentially means exponentially slowly, then you will remain in the instantaneous eigenstate of the system. So if you start in the ground state, you will always remain in the ground state, meaning that if you evolve your system into the problem Hamiltonian, then your final system will be in the ground state of the problem Hamiltonian. And you can read out the solution by simply measuring the system at the end of the evolution. Your Hamiltonian throughout the evolution is given by this expression, where s(t) is some smooth function that starts at zero, so you begin in some simple Hamiltonian, and ends at one, so in the end you're fully in the problem Hamiltonian. And in our MAXCUT case, we can write it out.
We're going to take our problem Hamiltonian to be the MAXCUT Hamiltonian, and our initial Hamiltonian, the simple Hamiltonian, is just going to be a sum of Pauli X over all qubits, which has the uniform superposition as its ground state. So how can we simulate this on a quantum computer? I'm going to show you the simulation circuit for this, and I'm going to show you how it can be considered a special case of QAOA. Okay, so let's go to the basics. We have some quantum state evolution. We assume that our initial state starts at time t_0 equals zero, and we're also going to absorb Planck's constant into H. We start in state psi(0) and evolve with some unitary, and so we get our time-dependent final state psi(t). If we're evolving with some Hamiltonian H(t), then the evolution is described by the Schrödinger equation. The solution to this simple differential equation, if our Hamiltonian is time-dependent but commutes with itself at all points in time, is simply the exponent of the integral of the Hamiltonian over the time period. For a time-independent Hamiltonian, the solution is even simpler: it's just the exponent of minus iHt. Now let's get back to the time-dependent Hamiltonian. First, if we have two non-commuting Hamiltonians A and B, then by the Lie-Trotter formula, if you take the product of e to the minus iAt over n and e to the minus iBt over n, raise it to the power n, and send n to infinity, this equals the exponent of the time evolution under the sum of the two Hamiltonians. And as a corollary, you can truncate at finite n, with an error that shrinks as n grows. Now let's consider our time-dependent Hamiltonian and break the evolution down into small chunks such that the Hamiltonian is approximately constant over each chunk.
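The Lie-Trotter formula can also be checked numerically. Here is a small NumPy sketch (my own illustration, not the talk's code) with A = X and B = Z, which do not commute; the matrix exponential is computed by eigendecomposition since both matrices are Hermitian:

```python
import numpy as np

def expm_herm(Hm, t):
    # exp(-i t H) for a Hermitian matrix H, via its eigendecomposition
    w, V = np.linalg.eigh(Hm)
    return V @ np.diag(np.exp(-1j * t * w)) @ V.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

t = 1.0
exact = expm_herm(X + Z, t)  # evolution under the sum of the two Hamiltonians

errs = []
for n in (1, 10, 100):
    # one first-order Trotter step, repeated n times
    step = expm_herm(X, t / n) @ expm_herm(Z, t / n)
    errs.append(np.linalg.norm(np.linalg.matrix_power(step, n) - exact))
print(errs)  # the error shrinks roughly as 1/n
```

The shrinking error illustrates the corollary: for finite n the product already approximates the joint evolution, with first-order error proportional to the commutator of A and B.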
We're gonna set some small delta t and break up our time-dependent evolution such that the Hamiltonian is approximately constant over each interval of length delta t. So our evolution is now a product of these chunks, where in each chunk the Hamiltonian is approximately constant. The terms don't necessarily commute, so we apply the Lie-Trotter-Suzuki decomposition, and we get this product of exponents of minus i (1 - s_j) delta t H_B and minus i s_j delta t H_C, because our Hamiltonian is (1 - s(t)) H_B + s(t) H_C. And this hopefully resembles the quantum approximate optimization algorithm ansatz. If you put them side by side: on the left, we have the QAOA ansatz, this product of exponents of minus i beta B and minus i gamma C; and on the right, we have the same product, but instead of beta we have (1 - s_j) delta t, and instead of gamma we have s_j delta t. So by setting the parameters appropriately and making the number of steps sufficiently high, we can simulate the adiabatic quantum algorithm using QAOA circuits. This is what I meant when I said that in the limit of p approaching infinity you can exactly simulate adiabatic quantum computation. Why the need to discretize the time evolution of AQC? Fundamentally, the reason is that in general we cannot just exponentiate this Hamiltonian, because it has non-commuting terms, so we cannot directly simulate it. To do that, we have to break it down using the Trotter formula into chunks simulated with small time steps. If the two terms commuted — for example, if the mixer were just single-qubit Pauli Zs — then we could exponentiate the Hamiltonian directly, but not in this case. I hope this answers the question, okay?
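To make the correspondence concrete, here is a small pure-Python sketch (a hypothetical helper of my own, not from the talk) that produces QAOA angles mimicking a linear adiabatic schedule s_j = j/p, so that beta_j = (1 - s_j) delta t and gamma_j = s_j delta t:

```python
def linear_schedule(p, total_time):
    """Hypothetical helper: QAOA parameters that mimic a linear adiabatic
    schedule s_j = j / p, discretized into p steps of size dt = T / p."""
    dt = total_time / p
    betas, gammas = [], []
    for j in range(1, p + 1):
        s = j / p                   # s(t) runs up to 1 at the end
        betas.append((1 - s) * dt)  # time evolved under the mixer H_B
        gammas.append(s * dt)       # time evolved under the cost H_C
    return betas, gammas

betas, gammas = linear_schedule(p=4, total_time=2.0)
print(betas)   # decreasing mixer angles, ending at 0
print(gammas)  # increasing cost angles, ending at dt
```

Such schedules are sometimes used as initial points for the classical optimizer rather than as final answers, since finite p with these fixed angles only approximates the adiabatic limit.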
Does the choice of delta t need to be roughly proportional to the spectral gap of H, and is there some way of estimating this quantity without running several trial experiments? This is a good question. I think the overall running time scales roughly as one over the spectral gap squared, so yes to the first question. To the second question, I'm not sure; I think for some problems you can, but this is not something I'm familiar with. In any case, in QAOA we don't have to deal with this, because in QAOA we simply let these things be free parameters: we fix the number of steps, let the angles be free parameters, and then optimize them using an outer-loop optimizer. Now let me switch gears; I think this is the last part of my theory presentation. - [Yuri] There is a question from Kadir. I don't know, did you answer it already? - [Ruslan] Yeah. - [Yuri] Okay. - [Ruslan] The question about the choice of delta t. - [Yuri] All right, excellent. - [Ruslan] In the final part of my talk, I'm going to talk briefly about the challenge of optimizing the QAOA parameters. As I said before, the parameters beta and gamma, which set how long you evolve the system under each Hamiltonian, are chosen differently in QAOA than in adiabatic methods, where you would do this by leveraging your knowledge about the spectral gap. Here, we just use some classical outer-loop method to optimize them as a black box. Now, one nice property of QAOA is that the energy, that is the expected value of the cost Hamiltonian, can in some cases be evaluated without using a quantum computer at all. You can do it purely classically, use that to find optimal QAOA parameters purely classically, and then only run QAOA on a quantum computer to sample the optimal solutions.
So, to show this, let's look at our QAOA ansatz and denote by U_B the exponent of the mixer and by U_C the exponent of the phase separating operator. Then write out the QAOA energy for p equals one for MAXCUT. The identity commutes out, and what we're left with is the sum over all edges of ZZ terms, with these U_B and U_C operators on each side, so the contribution of each edge is equal to this. If you go through it from the middle out, you can notice that U_B, which is a product of exponents of i beta times single-qubit X_m, will commute through the ZZ term and cancel with itself on the right whenever the qubit the X is acting on is not one of these two qubits. So whenever the qubit m is not j or k, we can commute the term through and cancel it out. The only thing left is the mixer terms acting on the qubits corresponding to this edge, the same qubits on which those Pauli Zs act. For the phase separation operator, it's easy to see in the same way that everything will commute through except the terms that touch one of these two qubits. So in this case, for p equals one, the only terms that come into play are terms in the one-neighborhood of Z_j Z_k, and the number of qubits you need to simulate is only the size of the one-neighborhood of Z_j Z_k, which can be much smaller than the total number of qubits in the problem. For example, if you have a 3-regular graph, you will have no more than six nodes in the one-neighborhood of this term. So even if you have a million-node graph, you can still evaluate the energy of QAOA, and you can still train the parameters purely classically in simulation by leveraging this observation. And for higher p, all you do is keep growing your neighborhood to include everything in what is sometimes called the reverse causal cone of this operator. Okay, to sum up what I've talked about today briefly.
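The one-neighborhood argument is easy to sketch in pure Python (my own illustration; the 20-node ring graph is just an example, not the talk's). Only the nodes within distance one of the edge's endpoints enter the p = 1 energy term:

```python
def edge_neighborhood(edges, edge):
    """Nodes within distance 1 of either endpoint of `edge` -- the only
    qubits that matter for the p=1 QAOA energy term <Z_j Z_k>."""
    j, k = edge
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {j, k} | adj.get(j, set()) | adj.get(k, set())

# a ring on 20 nodes: the neighborhood of any edge has only 4 of the 20 nodes
ring = [(i, (i + 1) % 20) for i in range(20)]
print(sorted(edge_neighborhood(ring, (0, 1))))  # [0, 1, 2, 19]
```

Summing such small subproblems over all edges is what makes the classical evaluation feasible even for very large sparse graphs.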
First, QAOA will not solve NP-complete problems in polynomial time. However, it is a promising heuristic for the noisy intermediate-scale quantum era, that is, near-term quantum hardware. For any Boolean function on n bits, we can construct the corresponding Hamiltonian, and therefore we can solve it using QAOA. QAOA has a deep connection to the adiabatic quantum algorithm, which can sometimes be helpful and can sometimes be misleading. And finally, we can often train QAOA purely classically in simulation. To the question of Alvin: the presentation slides are on GitHub, the same GitHub that was used in the last two days. I think it's slide.PDF in the Wednesday folder on GitHub. Okay, and now I'm going to move on to the hands-on part. Thank you, Yuri, for posting the link. So what we're going to do is solve the MAXCUT problem on a simple graph. Recall that our objective is this sum over edges of one minus s_i s_j. We're going to drop some constants, because the only thing that matters to our solution is the s_i s_j part of the objective, and we don't actually care about the identity. So we reformulate our Hamiltonian as the sum over edges of minus s_i s_j. This gives us a maximization problem which, as I've shown you earlier, maps to a Hamiltonian that is the sum over edges of minus Z_i Z_j. You simply take the binary variables and map them onto the eigenvalues of the Pauli Z matrix. Given this Hamiltonian, the QAOA circuit is: we start in the all-zeros state, apply a layer of Hadamards, and then apply a sequence of alternating operators. So what we need to do to run QAOA is implement these two operators. Let's start by importing everything. And we're going to consider this graph: a bipartite graph on five nodes. We're going to be solving this. So let's start by building the circuit for the cost operator.
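The mapping from binary variables to Z eigenvalues can be sketched in pure Python (my own illustration; the five-node bipartite graph here is my guess at the talk's example, with partitions {0, 1, 2} and {3, 4}). A bit x_i maps to the spin s_i = (-1)^{x_i} = 1 - 2 x_i, which is exactly the Z eigenvalue of the corresponding basis state:

```python
def cut_size(edges, bits):
    # number of edges whose endpoints land in different partitions
    return sum(1 for i, j in edges if bits[i] != bits[j])

def ising_energy(edges, bits):
    # spin s_i = 1 - 2*x_i is the Z eigenvalue for bit x_i; the objective
    # sum over edges of (1 - s_i s_j) / 2 reproduces the cut size
    s = [1 - 2 * b for b in bits]
    return sum((1 - s[i] * s[j]) / 2 for i, j in edges)

# assumed example: complete bipartite graph on {0,1,2} and {3,4}, 6 edges
edges = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
bits = [0, 0, 0, 1, 1]  # put {0,1,2} on one side, {3,4} on the other
print(cut_size(edges, bits), ising_energy(edges, bits))  # both 6: every edge cut
```

Since the two functions agree on every bitstring, the diagonal Hamiltonian built from the ZZ terms has exactly the cut sizes as its eigenvalues.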
Because this is a sum over edges of commuting terms, it can be unrolled into simply a product of exponents of minus i gamma Z_i Z_j. Therefore, what we need to do is implement the ZZ term by itself, and then, because they commute, we can concatenate the corresponding circuits. The code to implement the ZZ term is simply CNOT, RZ of two times gamma, CNOT. Then to implement the entire cost operator circuit, for each edge we append the circuit implementing that ZZ term. This gives us the entire cost operator circuit, and we can check that it indeed produces the correct output: we get CNOT, RZ, CNOT for each ZZ term corresponding to each edge. Here I just plugged in some random parameter to get some nice output. Okay, secondly, we need to build the mixer operator. The mixer operator is just the exponent of minus i beta X on each qubit. To implement the exponent of minus i beta X, this is simply RX, and the entire mixer operator circuit is: for each node, we append this RX term. This gives us the mixer circuit. We can plot it, and it indeed looks correct: it's RX on each qubit, okay? And with this, we can build the entire circuit. We start with an empty circuit, then we apply a layer of Hadamards, a Hadamard on each qubit. Then p times we apply first the cost operator circuit and then the mixer operator circuit, and in the end we measure. This gives us the QAOA circuit. We can draw it; it doesn't quite fit on the screen, but it looks correct: we first have the layer of Hadamards, then the phase separation circuit, and in the end the mixer operator circuit. Okay, and now that we have it, we can run it. I'm only going to be talking about running in simulators today, but it's very easy to switch to actual quantum hardware by just changing the backend in Qiskit.
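The CNOT-RZ-CNOT decomposition of the ZZ term can be verified directly in NumPy (my own check, not the talk's code). With RZ(theta) = diag(e^{-i theta/2}, e^{i theta/2}) on the target qubit, the sandwich reproduces exp(-i gamma ZZ):

```python
import numpy as np

g = 0.7  # an arbitrary angle gamma
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)   # control qubit 0, target qubit 1
RZ = np.diag([np.exp(-1j * g), np.exp(1j * g)])  # single-qubit RZ(2 * gamma)
circuit = CNOT @ np.kron(np.eye(2), RZ) @ CNOT   # CNOT, RZ on the target, CNOT

zz_diag = np.array([1, -1, -1, 1])               # eigenvalues of Z tensor Z
target = np.diag(np.exp(-1j * g * zz_diag))      # exp(-i * gamma * Z tensor Z)
print(np.allclose(circuit, target))
```

The first CNOT computes the parity of the two bits into the target, the RZ phases by that parity, and the second CNOT uncomputes it, which is why the phase depends only on whether the two bits agree.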
But know that even this small example may not run very nicely on a real quantum computer. One thing to note is that Qiskit uses an ordering where the zeroth qubit is the rightmost, that is, the least significant bit in the bitstring, which is different from the traditional textbook notation where the zeroth element is the leftmost in the binary string. So for each sample we receive from Qiskit, we have to invert it, and I have a little helper function to do that. So we can take the circuit we've built earlier and run it in the simulator, and we get some results out. It doesn't look like much, because all the binary strings seem to have roughly the same counts; these are binary strings and the number of times we've sampled each one. So how good are our results? We can kind of tell here that they are not that great, but to answer exactly how good they are, we need to be able to evaluate the value of the objective function on the samples. Here's a simple MAXCUT objective that takes the graph G, and all it does is go through all the edges of the graph and check whether the corresponding bits are equal; if they're not equal, that means the edge is cut. For example, the MAXCUT objective of 00011 is six, or rather minus six, because by convention we are minimizing. Now we can compute the energy of our samples, this expected value of C. What we do is go through our counts, and for each measurement outcome we compute its energy, multiply by the number of times we've obtained that outcome, and divide by the total number of counts. This gives us the expected value of the objective function. And we get something like three point something. Remember that our optimum is six, so this is not too great. Now, to get better solutions, we need to find good parameters beta and gamma for QAOA.
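The objective and the energy estimate from counts can be sketched in pure Python (my own version of the pattern described here; the edge list assumes the five-node bipartite example, and the counts dictionary is made up for illustration):

```python
def maxcut_obj(bitstring, edges):
    # negative cut size: by convention we minimize
    return -sum(1 for i, j in edges if bitstring[i] != bitstring[j])

def energy_from_counts(counts, edges):
    # expectation of the objective over the measured bitstrings
    shots = sum(counts.values())
    return sum(maxcut_obj(b, edges) * n for b, n in counts.items()) / shots

# assumed example graph: complete bipartite on {0,1,2} and {3,4}
edges = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
counts = {"00011": 60, "11100": 30, "00000": 10}  # made-up measurement counts
e = energy_from_counts(counts, edges)
print(e)  # a weighted average between the optimum -6 and the worst case 0
```

With these made-up counts, 90% of the shots hit an optimal cut, so the estimate sits close to, but not at, the optimum of minus six.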
And so what we're going to do is simply use the SciPy implementation of constrained optimization by linear approximation (COBYLA), which is a simple black-box optimizer that builds a linear model within a trust region and uses it to go downhill. To use this SciPy minimizer method, we need to wrap our QAOA into some black-box objective function that the optimizer can minimize, and here is a simple function that does that. It returns a function that takes parameters theta, interprets the first half of theta as the betas and the second half as the gammas, builds the QAOA circuit, runs it, and returns the energy. COBYLA can then take this function and find parameters theta that minimize the energy. Now we can put it all together. I'm going to set p equals five, grab the objective, and from some initial point run SciPy minimize with COBYLA, with the number of iterations set to two and a half thousand. What we get is an energy of 5.29, which is not too far from the optimum of six. This is much better than what we had previously, which was something like three. Okay, this only gave us the optimal parameters; now we need to read the solution out. For that, we take the circuit, run it with the optimal parameters, and get the counts. What we can see here is that the most frequent counts correspond to the solution. Our optimal solution is either 00011 or 11100, and those are the two most frequent samples. So that's good: we frequently get the optimal solution, which is what we want.
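The wrapper pattern described here can be sketched as follows (a sketch only: `qaoa_energy` is a hypothetical stand-in for building and running the QAOA circuit, with a placeholder landscape whose minimum I chose at 0.5 for every angle):

```python
import numpy as np
from scipy.optimize import minimize

def get_black_box_objective(p):
    """theta packs [betas | gammas]; the returned closure is what COBYLA sees."""
    def qaoa_energy(betas, gammas):
        # placeholder stand-in for the circuit energy, minimized at all 0.5
        return sum((b - 0.5) ** 2 for b in betas) + sum((g - 0.5) ** 2 for g in gammas)
    def objective(theta):
        betas, gammas = theta[:p], theta[p:]  # split theta into the two halves
        return qaoa_energy(betas, gammas)
    return objective

p = 2
obj = get_black_box_objective(p)
res = minimize(obj, x0=np.zeros(2 * p), method="COBYLA",
               options={"maxiter": 500})
print(np.round(res.x, 2))  # all four parameters end up near 0.5
```

Swapping the placeholder for a real circuit-evaluation routine does not change the outer loop at all, which is the point of treating QAOA as a black box.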
Now we can look at how good the results are by just plotting how many samples we get for each energy level, and we see that more than 70% of our samples correspond to the optimal solution, which has energy six. This is pretty good. We can also take all of our samples and pick the best one, the one with the lowest energy, and by coloring the graph accordingly we can see that this is indeed the optimal solution: we cut all the edges in the graph. So that's good. In that example, we estimated the QAOA energy from samples. Now, what happens if we use the entire quantum state to estimate the energy? For that we can use Qiskit's statevector simulator, which I think Kevin discussed on Monday. What the statevector simulator gives you is the full quantum state. For example, if our state is a Bell state, then the simulator will spit out the vector one over square root of two, zero, zero, one over square root of two. Now we have to invert the statevector to account for the Qiskit ordering of qubits, and I have a helper function here to do that. I'm also going to reimplement the method for building the QAOA circuit, because we shouldn't measure if we're using the statevector. Finally, the last thing we need is a method to convert our statevector to the energy estimate, and for this I'm just going to take the statevector and convert it into counts. Now we can put this all together, plug in the parameters we got previously, and see what comes out. Let's make sure that our implementation is correct by verifying that the energy is equal. So we plug the previously optimized parameters into the statevector simulation, look at the value of the energy from the statevector, and it is indeed correct.
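The statevector inversion for Qiskit's little-endian ordering amounts to bit-reversing each basis-state index. A minimal NumPy sketch of such a helper (my own version, not the talk's exact code):

```python
import numpy as np

def invert_statevector(sv):
    """Reorder amplitudes from little-endian qubit ordering (qubit 0 is the
    least significant bit of the index) to big-endian, by bit-reversal."""
    n = int(np.log2(len(sv)))
    out = np.empty_like(sv)
    for idx, amp in enumerate(sv):
        rev = int(format(idx, f"0{n}b")[::-1], 2)  # reverse the index's bits
        out[rev] = amp
    return out

sv = np.array([0.0, 1.0, 2.0, 3.0])
print(invert_statevector(sv))  # indices 01 and 10 swap: [0., 2., 1., 3.]
```

The endpoints (all-zeros and all-ones indices) are palindromes, so only the mixed indices move; for measurement counts the analogous fix is simply reversing each bitstring key.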
It's the same value as we got from samples, so this is good: we didn't have any bugs. Great question, Alex. So Alex has a question: I understand that all of these tutorial problems have been run in ideal simulators, but just how much does the noise in actual quantum computers affect results such as these? One thing I can point you to is this example of what noise does to you. Here we're running QAOA circuits for MAXCUT on this five-qubit device, using four of the qubits, and we're running on a star graph. Our graph is a four-node star graph that looks like this, we're solving MAXCUT on it, and we're mapping it onto a line segment of the IBM device (indistinct). The green line is the noiseless energy, and as you can see, as we increase the QAOA depth p, the noiseless energy approaches the optimum, and for p equals three we solve the problem exactly: we achieve the optimal value of three. Now, if we run on hardware and start increasing the depth from p equals one to p equals two, we get a little improvement in the expected value of the objective function, but it's well below the noiseless case. And if we go from p equals two to p equals three, the expected value of the objective function actually drops. So this is what noise does to you: noise effectively imposes an upper limit on the number of QAOA steps that you can do. I hope this answers your question; thank you for an excellent question. And I would recommend this paper; it shows a method for mitigating these errors by leveraging the structure of the QAOA ansatz, and I can drop the link in the chat. Okay, so now we were using the full statevector; let's just see what happens if we run the optimization loop with the full statevector. Now our optimizer gets the energy computed from the full state, so it's an exact energy as opposed to an energy estimated from samples.
And if we do that, it's going to take a little longer, because the statevector simulator is a little slower than the QASM sampling-based simulator. So we're running it, giving the same budget of function evaluations; the only difference is that we have the full energy instead of the estimate from samples. I'm going to answer the question from Ish while this is running. Ish is asking: "When doing experiments on IBM (indistinct), for example, say we have a graph that is complete. To simulate the cost operator, the operator corresponding to the cost Hamiltonian, we need CNOTs between all qubits, which of course does not match the hardware connectivity. How do we implement CNOTs between qubits which cannot be entangled on hardware?" This is a great question, Ish. What we do is insert swap gates. For example, in the example I've shown you, which had three edges, you need only six CNOTs in the ideal case, but because we only have this line connectivity, we have to insert a swap here. As a result, we get nine CNOT gates instead of six. So to run QAOA when it requires entangling qubits that are not directly connected on hardware, we insert swap gates to move the qubits next to each other. In general, this can be done with at most linear overhead by using something called swap networks. Coming back, I see that our statevector simulation has finished, and you can see that our objective function is much better: we got an energy of 5.95. To compare with what we had from samples, that was about 5.29; I guess I could print more digits. Yes, so we went from 5.29 to 5.95. In fact, by running with the full statevector, we can get arbitrarily close to the optimum. Don't try it now, because it'll take forever, but if we plug in 10,000 iterations here, we will get something like 5.999.
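The arithmetic behind "nine CNOTs instead of six" follows from the fact that one SWAP decomposes into three CNOTs, which is easy to verify in NumPy (my own check, not the talk's code):

```python
import numpy as np

# CNOT with control qubit 0 (most significant bit) and target qubit 1
CNOT01 = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
# CNOT with control qubit 1 and target qubit 0
CNOT10 = np.array([[1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0]], dtype=float)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=float)

# alternating the control direction three times exchanges the two qubits
print(np.allclose(CNOT01 @ CNOT10 @ CNOT01, SWAP))
```

So each swap inserted for connectivity costs three extra entangling gates, and in the three-edge example above one swap takes the count from six CNOTs to nine.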
And you might wonder why this is happening: why do I get so much worse energy, so much worse parameters, and so much worse solutions when using the sample estimate of the energy, as opposed to the full statevector? The reason is that COBYLA gets stuck at a low-quality point because of the sampling noise. You can see this by verifying that the parameters are quite different. Now, it's important to point out that the point COBYLA got stuck in is not a local minimum. If we start at this point and run the optimizer with the full statevector from there, it actually goes downhill. And here we're setting COBYLA's initial step size to be very small, so we're staying in a tiny neighborhood of that point. So this point is not a local minimum; it's simply a fake local minimum that we get stuck in because the finite sampling error is throwing our optimizer off. And, maybe to Alex's question, this only gets worse on hardware, because on hardware the error in your objective is even larger. So you really need to think about the stochasticity of your objective when optimizing it. This makes it difficult to use some of the off-the-shelf deterministic optimizers like COBYLA, which have a baked-in assumption that the objective function value is always correct. Okay, so far I kind of walked you through, pedagogically, what it takes to implement QAOA, so you have a good understanding of it. Now, let me show you how you can skip some of these steps using Qiskit. To recap what we did: we chose a problem class, MAXCUT, and a problem instance, in this case this five-node graph; we built the general Hamiltonian for the problem class by sitting down and deriving it; we built the Hamiltonian for the problem instance; from the Hamiltonian we constructed the QAOA circuit; we wrapped the QAOA circuit in a black-box function; and we used SciPy methods to optimize the QAOA parameters.
And finally, we sampled the solution by running QAOA with the optimized parameters. In general, all seven of these steps can be handled by Qiskit with its high-level declarative interface, but today I'm only going to show you how to do steps three and four using Qiskit's Hamiltonian functions. So what we're going to do, and I just copied this from Qiskit optimization, is use this method implemented in Qiskit. What it does is build the cost Hamiltonian as a Pauli operator, that is, a sum of Paulis, where each Pauli is a product of two Pauli Zs. The notation here may look unfamiliar, but it is simply the stabilizer tableau notation: you have one string encoding Pauli X and one string encoding Pauli Z, and if both the X and Z bits are one, then it is Y. So now what we can do is pass a graph to this get-cost-operator function, and it will spit out our Hamiltonian. By matching it against the Hamiltonian that we know to be correct, you can tell that this is indeed the right Hamiltonian: it has terms corresponding to each edge with the right coefficient, 0.5. Now we can make sure that the Hamiltonian is correct by simply using an eigensolver, like the classical NumPyEigensolver, to find its lowest eigenvalue and eigenvector. So we get some eigenvector, and we have to convert it to a solution of the MAXCUT problem, which Qiskit provides a function to do. We take the eigenvector, convert it, and obtain the solution corresponding to it using Qiskit, which is indeed the correct solution. So the Hamiltonian we've built in Qiskit is correct. This is something you can do if you're running QAOA on your own problem: simply build your Hamiltonian the same way and then check it using an exact eigensolver. And of course you can simply build your Hamiltonian as a matrix.
You know that your Hamiltonian is the sum of ZZ terms, and each of them is a Z tensor product with identities, so you can just build this in NumPy, add it up, and use this operator in Qiskit. Well, actually, first you probably want to test it: here we take this operator, get its eigenvalues and eigenvectors, take the lowest eigenvalue and the corresponding eigenvector, and then, using Qiskit's method to get the corresponding solution, it's indeed correct. So the Hamiltonian we built in NumPy is also correct. And finally, we can build the QAOA circuit from this Hamiltonian by just using the QAOA variational form implemented in Qiskit. Then, the same way as before, we can take this variational form, use its method to construct the circuit, plug in some parameters, measure, run it in Qiskit, and get some counts out, similarly to how we did with the circuit we built ourselves from gates. By comparing this to our circuit — it doesn't fit on my screen, so you'll have to believe me — it's the same circuit. Let me zoom out and see if you can see it. Yes, it's the layer of Hadamards and then the RZs; they're in a slightly different order, but you can see that they match. The only thing is we may have to rescale the parameters correctly — actually, we don't, that's not needed here. Yes, and the two circuits match. In fact, we can check that they produce the same unitary. What we can do is compute the unitary corresponding to the circuit we built from gates by running the circuit through Qiskit's unitary simulator and grabbing the unitary. And what I'm doing here is removing some errors that come from numerical instability: I'm throwing away entries that are smaller than ten times the machine epsilon.
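Building the Hamiltonian as a NumPy matrix via Kronecker products can be sketched as follows (my own illustration; the edge list assumes the five-node bipartite example with partitions {0, 1, 2} and {3, 4}, and I include the identity offset so the ground energy equals minus the maximum cut):

```python
import numpy as np

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def zz_term(n, i, j):
    # Z on qubits i and j, identity on all other qubits, via Kronecker products
    m = np.eye(1)
    for k in range(n):
        m = np.kron(m, Z if k in (i, j) else I2)
    return m

# assumed example graph: complete bipartite on {0,1,2} and {3,4}
edges = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
Hc = sum(-0.5 * (np.eye(32) - zz_term(5, i, j)) for i, j in edges)

diag = np.real(np.diag(Hc))   # Hc is diagonal in the computational basis
best = int(np.argmin(diag))   # index of (one of) the ground states
print(diag[best], format(best, "05b"))  # -6.0 and the optimal cut 00011
```

Because the cost Hamiltonian is diagonal, reading off the minimum of the diagonal replaces a full eigensolve, and the resulting bitstring is exactly the MAXCUT solution one would decode from the ground-state eigenvector.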
And when you do that, you can check that the two unitaries are equal up to a global phase. And note that even the Hamiltonian you have built in NumPy can be used in Qiskit, but it will be slow in general, so it's probably only recommended for prototyping. And that's all I have for you today. Thank you for listening, and I'm happy to take any questions. I see there is a question from VJ asking what other black-box optimization techniques are available. The answer is there are many in SciPy, and there are further ones in Qiskit. If we go back here, instead of COBYLA we can use something like (indistinct), and we'll get some solution as well, but it may be worse. In general, you can go to SciPy minimize, and there's a whole list of methods available there; I'll drop this in the chat. Another good library to use is NLopt, which also has implementations of many different methods. Some of these NLopt implementations are in general better, though they sometimes suffer from weird memory issues or numerical instabilities. So I'll drop this in the chat as well. And you can see that when we ran it with Nelder-Mead, we actually got a better solution, 5.999. The lesson is to try different methods; some may work better than others. - [Yuri] Thank you very much, Ruslan. It was a very nice presentation and (indistinct) session as usual. Again, I'll see everybody tomorrow at 10 o'clock. Okay, bye. - [Ruslan] Thank you guys.
Info
Channel: Argonne National Laboratory Training
Id: 4JfwuAX_cvU
Length: 84min 58sec (5098 seconds)
Published: Thu Aug 12 2021