- [Yuri] Hello everybody. This is the Argonne Quantum Tutorial. Today, our speaker Ruslan Shaydulin is going to talk about how to solve combinatorial optimization problems on quantum computers. Ruslan is a scholar at Argonne; he won the prestigious Director's Fellowship at Argonne. Go ahead, Ruslan. - [Ruslan] Thank you,
Yuri, for the introduction. Okay, my name is Ruslan. I'm going to talk today mostly about the quantum approximate optimization algorithm (QAOA), but some more general points about solving optimization problems on quantum computers will also come up. In the first part of my talk, I'm going to give you some theoretical background, and that will be about an hour. In the second part, we're going to do a hands-on using Qiskit: I will show you how to implement QAOA and some of the practical pitfalls associated with running it. At any point during the presentation, feel free to put your questions in the chat. I think Casey is monitoring the chat, or Yuri, or maybe both, and I will try to keep an eye on it myself too. So if there's a question in the chat, I will try to address it as soon as I see it. With that, let me start by saying a few words about
the big-picture considerations that are relevant here. When we're thinking about solving problems, optimization problems in particular, what we're usually interested in is the asymptotic complexity of solving the problem. Specifically, what this means is that we're interested in how the resource requirements, that is, the time and memory requirements, grow with the problem size. This is what those famous classes of problems refer to: P, the class of problems solvable in polynomial time; NP, the class of problems where a proof can be verified in polynomial time; and PSPACE, where the problem can be solved in polynomial space but with unlimited time. These are the famous classical complexity classes. In the quantum space, the complexity class we're most interested in is BQP, bounded-error quantum polynomial time, which roughly corresponds to polynomial-depth quantum circuits. This class contains problems that are solvable on a quantum computer in polynomial time with an error probability of at most 1/3 for all instances. It is the quantum analogue of BPP, bounded-error probabilistic polynomial time. Now, what is the relationship
between these classes? Well, the big one is PSPACE: if you can solve something in polynomial time, you're necessarily solving it in polynomial space, so everything here is contained in PSPACE. And if you can solve a problem in polynomial time, then you can certainly verify a proof in polynomial time as well. So P is contained in NP, and of course P is contained in BQP as well. Now the question is, what is the relationship
between NP and BQP? Let me give an example of a problem that lies in each of these regions of this Venn diagram. If you're thinking about something like finding the shortest path between two nodes in a graph, that's a polynomial-time problem, and it's in P. Then there is the famous example of integer factorization, which is in BQP because there exists a polynomial-time quantum algorithm for it, Shor's algorithm, yet it's not known to be in P; that is, there is no known classical polynomial-time algorithm. And one of the problems we're going to talk about today, the maximum cut problem, is in general NP-complete, that is, it's as hard as any problem in NP. However, it is NP-complete only in the worst case. On specific instances, say solving the maximum cut problem on a simple graph like a cycle graph, the problem is actually trivial and in P. The reason I kind of
bring up this taxonomy is that when talking about how quantum computers can speed up solving optimization problems, we have to be very careful about what we mean by that. Worst-case instances are not always trivial to construct: saying that maximum cut is NP-complete does not mean it's easy to saturate that bound, and maybe not all practical problems are like that; perhaps no practically interesting instances are like that. And proving asymptotic performance, especially of quantum algorithms, is notoriously difficult. So this is something to keep in mind here. Really, the main takeaway is that we are not going to solve NP-complete problems in polynomial time today, or probably ever. It is a common folk assumption that NP is not contained in BQP, so there are some problems here that are not solvable on quantum computers in polynomial time. Okay, so we're not going to solve NP-complete
problems in polynomial time. So what are we doing here? Well, we're going to look at things slightly differently today, and think about two kinds of algorithms. On the one hand are algorithms with proven performance. For example, Dijkstra's algorithm for shortest paths is a polynomial-time algorithm that is always guaranteed to give you the shortest path between two nodes, in something like O(|E| + |V| log |V|) time. In the quantum case, there are also algorithms like that, algorithms that are guaranteed to give you a solution of a given quality, often the exactly correct solution, within a certain running time. Shor's algorithm, for example, is guaranteed to give you the factors of your integer in polynomial time. However, the most powerful algorithms, including the most powerful
optimization algorithms, especially in the classical case, are often heuristic methods. What I mean by heuristic here is that there is no guarantee of getting the optimal solution, or a solution of a given quality, in the worst case. In the worst case, you either have no promise about the quality of the solution, or no promise about the running time of the algorithm. The famous example is using gradient descent on problems that you cannot prove to be convex. That's a heuristic because gradient descent is only provably guaranteed to work for convex problems; if you have a weird non-convex landscape, or you cannot show that it's convex, then it's a heuristic because you cannot guarantee its success. What I'm going to argue today is that quantum heuristic algorithms, such as the quantum approximate optimization algorithm that I'll be talking about, are the most promising quantum algorithms for optimization. And it's especially promising because the quantum hardware, such as the noisy intermediate-scale quantum (NISQ) hardware that is becoming available today, provides a unique opportunity to benchmark these heuristic methods and actually answer the questions
about their performance. I see there's a question in the chat from Ish. I'm going to read it out and then answer it. Ish is asking about VQE, an algorithm that was discussed yesterday, which involves a lot of optimization loops, that is, outer-loop parameter optimization: how do we then classify it in terms of BQP when we don't know how many iterations are required? This is exactly what I'm getting at. VQE here is a heuristic algorithm, at least the way it is discussed today; that is, we don't know how many outer-loop iterations you would require, right? So you cannot say that it will take, for example, at most polynomial time, that is, a number of iterations polynomial in the problem size. And because you don't have this guarantee, it can be exponential. In fact, there is a result showing that parameter optimization, even for QAOA, is NP-hard, even for problems that are classically easy. So the parameter optimization problem can be more difficult than the underlying problem. So we do not classify these algorithms in this sense. I mean, they're all, I guess, in PSPACE, right? But they are heuristics, so they're in the right-hand bucket. I hope this answers Ish's question. Any other questions at this point? This would be a good time, because we're at the end of the first part. In the next part of my talk, I'm going to discuss how to take a classical optimization problem and map it onto a quantum computer. And I'm going to talk about the maximum cut problem specifically because it's a commonly used,
kind of paradigmatic example for quantum optimization, and you'll see a lot of papers that discuss maximum cut. But I will also talk later about the general rules for building quantum Hamiltonians representing arbitrary Boolean functions, so that you can map any optimization problem you would like onto a quantum computer, as long as it's defined on the Boolean cube. In the maximum cut problem, the goal is to split the set of vertices of a graph into two disjoint sets such that as many edges as possible are cut. For example, if we're looking at this graph on the right, there are four nodes in this graph, one, two, three, four, and there are five edges. If we let the color denote the partition, that is, we partition the set of vertices into green ones and red ones, then we cut the four edges
between these vertices. We can write it out as an optimization problem in the following way: we sum over all edges the quantity (1 - s_i s_j) / 2, where s_i is minus one or plus one and encodes the part assignment of vertex i. Here we're saying that all green vertices are assigned the label minus one, and all red vertices are assigned the label plus one. It's easy to verify that this is correct if you look at this term, the contribution of each edge to the cut. If both variables have the same sign, then s_i s_j = 1 and the term becomes zero, so there's no contribution to the cut. And if they have different signs, then s_i s_j becomes minus one, so the whole term becomes (1 - (-1)) / 2 = 1, and the cut has increased by one. So this is the correct formulation of the maximum cut problem. Okay, so we are starting with this classical optimization
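Just to make the objective concrete, it is easy to evaluate classically. Here is a minimal Python sketch; the graph, labels, and function name are illustrative, not from the slides:

```python
# Maximum cut objective: C(s) = sum over edges of (1 - s_i * s_j) / 2,
# where s_i in {-1, +1} encodes which part vertex i belongs to.
def cut_value(edges, s):
    return sum((1 - s[i] * s[j]) / 2 for i, j in edges)

# A 4-node graph with 5 edges, like the example on the slide.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
s = [-1, +1, -1, +1]  # green = -1, red = +1
print(cut_value(edges, s))  # this assignment cuts 4 of the 5 edges
```

Each edge whose endpoints carry opposite labels contributes exactly one to the sum, matching the argument above.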
problem on the left: we're trying to find some binary configuration s that maximizes our objective function. All of the problems we're going to consider today are going to have a form like that, and what we want to do is solve them somehow on a quantum computer. On a high level, what we do, and this is not the only way to do it, but it is the way I'm talking about today, is take this classical optimization problem and convert it into a problem of characterizing some Hamiltonian. The spectrum of that Hamiltonian will encode the objective function, and the highest-energy eigenstate of this Hamiltonian will encode the optimal configuration, the configuration that maximizes the objective function. Let me quickly unpack this. So we're going to take
our objective function and construct some Hamiltonian, a Hermitian operator, a Hermitian matrix, that encodes this objective function in its spectrum, that is, in its eigenvalues. In the classical case, our solution is some configuration, a string of minus ones and plus ones of length n, where n is the number of variables. In the quantum case, the solution becomes encoded in the highest-energy eigenstate or, in more linear-algebra terms, the eigenvector with the largest eigenvalue of this operator, and the observable corresponding to the value of the objective function in a given configuration is the energy of the Hamiltonian in the given quantum state. Now, the nice thing is that because our Hamiltonian is classical, that is, a diagonal operator, this highest-energy eigenstate will be a computational basis state. So if we measure it, we will get the solution to our problem with probability one. In an ideal setting where we solve the problem exactly, there is no randomness to it. Okay, so what does this
Hamiltonian actually look like? What we want is a Hamiltonian that acts on a computational basis state |x⟩ as f(x)|x⟩. What it looks like is a matrix of size 2^N by 2^N that has on its diagonal all the values of the objective function on all possible binary strings. It is easy to check that if it acts on, for example, |00...0⟩, which is represented by a vector with a one in the first entry and zeroes elsewhere, then the action of this operator on this vector will be exactly f(00...0) times that vector. The same is true in the general case: if we look at the action of this operator on an arbitrary computational basis state, encoded by a vector with a one in the entry corresponding to the binary string x and zeros elsewhere, the action of our Hamiltonian will be exactly f(x), that is, it multiplies the state by a factor of f(x). Now, this matrix that I wrote is too large, 2^N by 2^N, so it's too large to construct explicitly. In fact, if we were to construct it explicitly, we would have already found the solution of our optimization problem, because you could just look at the diagonal and pick the largest value. So that would be a little silly.
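For a tiny instance, where building the full matrix is still feasible, this can be made concrete with a numpy sketch; `f` here is a stand-in for any objective on bit strings, not something from the slides:

```python
import numpy as np

# Any objective on binary strings; here, a toy example on 3 bits.
def f(bits):
    return sum(bits)  # stand-in objective; replace with your own

n = 3
# Diagonal Hamiltonian: H[x, x] = f(x) for every n-bit string x.
diag = [f([int(b) for b in format(x, f"0{n}b")]) for x in range(2**n)]
H = np.diag(diag).astype(float)

# Acting on the basis vector for x = 011 returns f(011) times that vector.
x = 0b011
e_x = np.zeros(2**n); e_x[x] = 1.0
assert np.allclose(H @ e_x, f([0, 1, 1]) * e_x)
```

This also illustrates the point above: the diagonal lists every objective value, so writing the matrix out explicitly already solves the problem.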
So the question is: how do you construct this Hamiltonian efficiently? The way we're going to do it is to build it from Pauli Z matrices. But first, let's quickly recap the notation we're going to be using. I think it's the same notation used by the speakers the last two days, but it's always good to make sure we're all on the same page. When I write the ket |x⟩, this is the analogue of the classical bold x, or x with an arrow over it; it's just notation for a column vector. When I write the bra ⟨y|, this is the conjugate transpose of |y⟩, a row vector whose elements are the conjugates of the entries of the column vector. ⟨y|x⟩ simply denotes the inner product between y and x, and |x⟩⟨y| denotes the outer product between x and y. And this, I think, is all the notation we're going to use today. Let's go back to our MAXCUT objective. What I'm going to do
is show you that we can take these variables s that are plus or minus one and construct the Hamiltonian by simply mapping the binary variables s_i onto the eigenvalues of the Pauli Z matrix. What this means is that all we do is mechanically take the formula and replace each s_i with a Pauli Z acting on the corresponding qubit, that is, Pauli Z acting on the i-th qubit. Next, I'm going to show you that this is indeed correct, and walk you through what's going on here to give you a sense of why. What we want to show is that this operator acts on the computational basis state |x⟩ as C(x)|x⟩, where C(x) is the value of the objective function on that binary string. So let's show that. The Pauli Z operator is the diagonal matrix with one and minus one on its diagonal. It's easy to see that its eigenvalues are plus one and minus one, with eigenvectors being the computational basis states. You can just apply it to the ket |0⟩, the computational basis state zero, which is the vector (1, 0), and it stays |0⟩. And if you have the computational basis state |1⟩, which is the vector (0, 1), then the action of the Pauli Z matrix on this vector gives you a minus one phase. In short, the action of the Pauli Z operator on the computational basis state |x⟩ is adding a phase of minus one to the power of x, for x either zero or one. Now, if we have Pauli
Z acting on the i-th qubit, what we mean is an operator acting on all n qubits that is the identity everywhere else: a tensor product of identities on both ends, with Z acting on the i-th qubit. Here I'm assuming that Z is somewhere in the middle; in general, if Z is, say, on the first qubit, there are no identities on the left, but I'll write it this way for simplicity of notation. The action of this operator is then easy to see: it adds a phase of minus one to the power of x_i, where x_i is the i-th entry of the binary string corresponding to this computational basis state. Furthermore, if we have two Pauli Zs, Z_i Z_j, then the action of this operator, which is a product of two Pauli Zs, is adding a phase of (-1)^x_i times (-1)^x_j. And here, once again, I'm assuming for the sake of notation that we reorder the qubits so that they are adjacent. Okay, so this is the action of the ZZ operator. Now let me show you that
this is the right action. What we're going to do is take our MAXCUT objective and move it from s_i in {-1, +1} to x_i in {0, 1}, which is trivial: we can just say s_i = (-1)^x_i. And this should already start looking familiar. Our reformulated objective now is that the contribution of each edge is (1 - (-1)^x_i (-1)^x_j) / 2. Remember, what we want to show is that our MAXCUT Hamiltonian acts on the computational basis state |x⟩ as C(x)|x⟩, with C(x) the value of the objective function on the corresponding binary string. From what you've seen before, it should be clear that this is exactly what it does; let's just walk through it. We apply our Hamiltonian to some computational basis state; then, by linearity, we can open the brackets. The first term is just the identity and contributes one, and the second term gives us (-1)^x_i times (-1)^x_j, which together is exactly the value of the objective function. This shows that the Hamiltonian does indeed correctly implement our objective. So what we've shown is
that the Hamiltonian encoding our MAXCUT objective function can be constructed by simply mapping the binary variables s_i in {-1, +1} onto the eigenvalues of Pauli Z. And note that even though I only walked you through the example of MAXCUT, the very same procedure will work for any objective that is built from products of binary variables. They need not be quadratic: you can have a product of five s variables, and it would become a product of five Pauli Zs. Any questions on the construction of the Hamiltonian representing the MAXCUT objective function?
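As a sanity check of this construction (not part of any efficient algorithm, since the matrix is exponentially large), one can build the MAXCUT Hamiltonian explicitly for a small graph from Kronecker products of Pauli Z and compare its diagonal to the classical objective; the graph below is just illustrative:

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])

def z_on(i, n):
    """Pauli Z on qubit i (0-indexed, most significant first), identity elsewhere."""
    out = Z if i == 0 else I2
    for k in range(1, n):
        out = np.kron(out, Z if k == i else I2)
    return out

n = 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]  # illustrative graph
# C = sum over edges of (I - Z_i Z_j) / 2
H = sum((np.eye(2**n) - z_on(i, n) @ z_on(j, n)) / 2 for i, j in edges)

# The diagonal must equal the classical objective on every bit string.
for x in range(2**n):
    bits = [int(b) for b in format(x, f"0{n}b")]
    s = [(-1)**b for b in bits]
    cut = sum((1 - s[i] * s[j]) / 2 for i, j in edges)
    assert np.isclose(H[x, x], cut)
```

The substitution s_i → Z_i really is mechanical: the same loop works for products of any number of variables.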
The size of this Hamiltonian is 2^N by 2^N: even though we write Z_i Z_j compactly, it is an operator acting on all N qubits, so it is a 2^N by 2^N operator. So, as I said earlier, we cannot write it out explicitly. How do we build this Hamiltonian in general? Well, maybe a better question is why we can always build it. The method that I'm going to show you is due to an excellent paper by Stuart Hadfield,
which I highly recommend. What we want to do now is construct a Hamiltonian that acts on the computational basis state |x⟩ by multiplying it by f(x). First, let me give you a quick primer on the analysis of Boolean functions. Consider some arbitrary Boolean function; here, just to show the generality of this, I'm considering the maximum function, which is not even a typical objective function of an optimization problem. And I'm going to go somewhat freely between x being in {-1, +1} and x being in {0, 1}; as I've shown before, we can always go between the two by a change of variables. This maximum function can be expressed as a multilinear polynomial of this form. Now, how can we construct this multilinear polynomial? Note that any Boolean objective function can be written as a sum, over configurations a, of f(a) times the indicator polynomial for that configuration, which is zero if x is not equal to a, and one if x is equal to a. And this indicator polynomial can be constructed explicitly. For example, for our maximum function, by building the indicator polynomials explicitly and adding up the corresponding values of the function, we can build the multilinear polynomial representation. Now, let me quickly add some notation. I'm going to write x^S for the monomial corresponding to a set S, namely the product of x_i over indices i in the set S. This multilinear polynomial is actually a Fourier expansion
of this Boolean function, and these coefficients are the Fourier coefficients of the Boolean function. There is a result showing that this multilinear polynomial always exists and is unique, that is, for a given objective function we have a unique multilinear polynomial representing it.
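As a small worked example in the {0,1} convention (the polynomial here is derived for illustration, not taken from the slides), max(x1, x2) has the multilinear form x1 + x2 - x1*x2, and the indicator-polynomial construction recovers it by brute force:

```python
from itertools import product

# Multilinear representation of max over {0,1}^2: max(x1, x2) = x1 + x2 - x1*x2.
def max_poly(x1, x2):
    return x1 + x2 - x1 * x2

# Built from indicator polynomials: sum over a of f(a) * 1[x == a],
# where 1[x == a] = prod_i (x_i if a_i == 1 else 1 - x_i).
def from_indicators(f, n, *x):
    total = 0
    for a in product([0, 1], repeat=n):
        indicator = 1
        for xi, ai in zip(x, a):
            indicator *= xi if ai == 1 else 1 - xi
        total += f(*a) * indicator
    return total

for x1, x2 in product([0, 1], repeat=2):
    assert max_poly(x1, x2) == max(x1, x2)
    assert from_indicators(max, 2, x1, x2) == max(x1, x2)
```

Expanding the indicator products and collecting terms is exactly what yields the unique multilinear polynomial.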
action of the Hamiltonian is actually represent-- You can get this action
by building a Hamiltonian as a sum of F of X times projector onto the computational
basis state backs, right? This holds because this
projector will give you one if you're projecting that. If X is itself, is X, if you were projecting X sum
two X, it will give you zero if you're projecting some eigenstate, some computational basis
that is not equal to X onto X because computational based states are an orthogonal basis. And just comparing these two, you can already see that
this is how you do it. So you go from in classical
case, indicator polynomial to projector operators in the quantum. And finally, just to-- So, okay. And then know that the because products of Pauli Z
implement parody functions. So products of Pauli Z acts on the computational-based state X acts as by adding a phase next to the power of S, then taking the Fourier expansion off your classical and Boolean objective function and taking those Fourier coefficients, lagging them into this
formula will give you a unique N-qubit Hamiltonian representing F. So, in the short the algorithm, the recipe for constructing a Hamiltonian for an arbitrary optimization, arbitrary objective function is you start by performing
a Fourier expansion of the function, and then you replace the monomials x^S by products of Pauli Zs. Now, this is a general recipe, in a sense a proof that this representation always exists, but it is in general hard to compute the Fourier expansion. In practice, you can build this Hamiltonian from simple building blocks, and I would again recommend Stuart's paper, which walks you through how this can be done for common Boolean functions. Finally, in practice, your objective function usually already comes as a polynomial; for example, in MAXCUT we started out with a polynomial. And if it's a polynomial, it's easy to convert to a Hamiltonian by replacing the products of x with products of Pauli Zs, as I've shown you for MAXCUT. So now I'm going to talk
about solving this problem. We've converted the classical optimization problem into a form that is convenient to solve on a quantum computer. So how do we actually solve it? How do we find the highest-energy eigenstate of this Hamiltonian? This we're going to do using the quantum approximate optimization algorithm (QAOA), a heuristic algorithm for solving combinatorial optimization problems. The way it works is that it prepares the following parameterized trial state, commonly referred to as an ansatz. We start in the all-zeros state, apply a Hadamard to each qubit to prepare the uniform superposition, and then apply a sequence of alternating operators, where one operator is the exponent of minus i times some free parameter gamma times the problem Hamiltonian C, and the other is the exponent of minus i times some free parameter beta times B, the mixer Hamiltonian. The problem Hamiltonian is the combination of Pauli Z terms that we just learned to construct, and B is the mixer Hamiltonian; the commonly used one is the sum of Pauli X on all qubits. Then what is typically done is we use some method to find parameters beta, gamma that maximize the expected value of the objective function. Now, why do we think it's a good ansatz? If we have p, where p is
the number of these pairs of operators that we apply in sequence, then we know that in the limit of an infinite number of these operators, QAOA can approximate arbitrarily well something called the adiabatic quantum algorithm, which is guaranteed to find the exact optimal solution in the limit of potentially exponentially large running time. Now, for small p the picture is somewhat mixed, but there are some results indicating a potential for quantum advantage. For example, there are demonstrations that, due to a different mechanism, QAOA can outperform simulated annealing and quantum annealing on certain classes of problems. Now, one thing you might wonder about is how we implement this in gates. The math is easy, but it may not be obvious to you. And maybe this is something
that Dmitry talked about yesterday, but I'm going to reiterate it anyway because this is very similar. When you're simulating the Hamiltonian, that is, implementing the exponent of minus i gamma C, this is simply a Hamiltonian evolution circuit. I'm going to assume a pretty standard gate set: you have your Paulis, your Hadamard, and your RZ gate; we don't actually need the phase gate. Okay, let's start with a simple operator and work our way up to evolving the cost Hamiltonian. The simple operator we're going to start with is just Pauli Z on one qubit. What I'm going to show you is that the exponent of minus iZt, for t some free parameter, is equal to a rotation around the Z axis with angle two times t. You can see it by just examining the formula for the RZ gate: the RZ gate is the exponent of minus i theta Z over two, which is exactly the operation that we want to implement. So the circuit for implementing the simple operator is a rotation around Z by an angle of two t. I see there is a question in the chat. The question is: can the mixer be a sum of Zs, so that the start state is all zeros, which is easier to prepare
implementing the simple operator is rotation on Z or attempts to T. I see there was is question in the chat. The question is, can the mixer be sum of Z, so the start state is all zeros, which is easier to prepare
than a superposition state, is the question is that we want, the question is if I understand correctly, that in B the mixer to also be, to be for example, sum over all, single qubit Pauli Z on all qubits. Okay, well, the issue with that would be that both the mixer and phase separate
Hamiltonian will be diagonal in the computational basis. And so what this means is that no matter what state you start with, if you only have diagonal operators, they do not change the
probability amplitudes of your outcome. Therefore, this does not, no matter how you vary, it
would vary the parameters, you would get the same samples out. So you need something
that is non diagonal. More specifically, you need
something that doesn't compute. So C and B should not compute, otherwise you will not have interference. But you can, theoretically you can plug in, I think some of Pauli Ys here, and it would also be non-trivial. So in general, any non-computing
operator here works. Some are better than others. Thank you for the question. Okay, so we figured out how
to implement the simple exponent of minus iZt, for a single-qubit Pauli Z. Now, what happens if we have a two-body term, a product of two Pauli Zs? To see the action, recall that Z has eigenvectors |0⟩ and |1⟩ with eigenvalues plus one and minus one, and recall from your linear algebra class that the exponent of a matrix A acting on an eigenvector v is equal to the exponent of the corresponding eigenvalue times v. From this, you can simply write out the action of this operator on all computational basis states, and you can see that it adds a phase whose sign depends on the parity: it adds a phase factor of the form minus one to the power of (a XOR b). And the circuit implementing
that is CNOT, RZ, CNOT. You can see this simply by tracing what it does on some input |ab⟩. The CNOT sets the second register to a XOR b; then RZ, by definition, adds a phase equal to the value of this register; and then we add a second CNOT to reset the state of the second register. So this sequence adds a phase equal to the desired phase, and this is the correct circuit to implement the exponent of minus iZZt. That gives us the two-body terms.
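This identity can be checked numerically; the sketch below is just a verification of the matrix algebra, with an arbitrary angle:

```python
import numpy as np

t = 0.37  # arbitrary angle for the check
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

# exp(-i t Z⊗Z) is easy to form directly because Z⊗Z is diagonal.
ZZ = np.kron(Z, Z)
target = np.diag(np.exp(-1j * t * np.diag(ZZ)))

# CNOT (control = qubit 0), and RZ(2t) = exp(-i t Z) on qubit 1.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
RZ = np.diag([np.exp(-1j * t), np.exp(1j * t)])
circuit = CNOT @ np.kron(I2, RZ) @ CNOT

assert np.allclose(circuit, target)
```

Tracing the phases on each basis state |ab⟩ reproduces the argument above: the phase depends only on a XOR b.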
Now, the last thing we want is to handle the many ZZ terms in our MAXCUT Hamiltonian. Because they commute, we can represent the exponent of their sum as the product of the corresponding exponents: if you have a sum of two commuting terms in the exponent, it is simply the product of their exponents, meaning we can just concatenate the corresponding circuits. And this is sufficient to build a circuit to simulate our MAXCUT Hamiltonian, that is, our problem Hamiltonian. This is half of our circuit, all of the phase-separation terms. Now, the second part of our circuit is the mixer Hamiltonian, which is the exponent of minus i beta times the sum over all qubits of Pauli X. Each of these terms, an exponent of minus iXt, can be simulated by noticing that if you sandwich an exponent of minus iZt between two Hadamards, it equals the exponent of minus iXt. This is simply because
Hadamard Z Hadamard is equal to X, and by just looking at the power series expansion, you can see that this is indeed the case. What that means is that to simulate the exponent of minus iXt, all we do is apply Hadamard, RZ, Hadamard. And this gives us a circuit to simulate the entire QAOA ansatz.
Okay, so we can compile the QAOA ansatz into gates. Now, finally, let me say a few words about an intuition you may have for this method. You can think of the initial state, the uniform superposition, as some vector in the 2^n-dimensional Hilbert space, and of our target state, the highest-energy eigenstate for a maximization problem, as some other vector. What we are doing with our trial state is that each time we plug in some parameters, we get a different vector, and we're trying to vary the parameters beta, gamma to bring this trial state as close as possible to the target state. This may look familiar to some of you, resembling Grover's algorithm, in the sense that it's a similar idea: you add a phase factor, and then you do something to cancel out the phases, where the phase factor is the exponent of the cost Hamiltonian and the canceling out of amplitudes is the mixer Hamiltonian. From this picture, hopefully it's clear that the quality of the QAOA solution depends heavily on our ability to find
good QAOA parameters. This is something I'm going to touch on in the hands-on part of the talk. Now, let me say a few words about the connection to the adiabatic quantum algorithm, which is something that commonly comes up as a question. The adiabatic quantum algorithm is a way to find some desired eigenstate of the problem Hamiltonian, for example, the ground state or the highest-energy eigenstate. It's inspired by the adiabatic theorem, which states roughly that if you start in some eigenstate, such as the ground state, of a time-dependent Hamiltonian, and you evolve this Hamiltonian slowly enough, and this potentially means exponentially slowly, then you will remain in an instantaneous eigenstate of the system. So if you start in the ground state, you will always remain in the ground state, meaning that if you evolve your system into the problem Hamiltonian, then your final state will be the ground state of the problem Hamiltonian, and you can read out the solution by simply measuring the system at the end of the evolution. Your state throughout the evolution is governed by this Hamiltonian, H(t) = (1 - s(t)) B + s(t) C, where s(t) is some smooth function that is zero at the beginning, so you begin with some simple Hamiltonian B, and one at the end, so in the end you have purely the problem Hamiltonian C. In our MAXCUT case, we're going to use the MAXCUT Hamiltonian as the problem Hamiltonian, and our initial, simple Hamiltonian is just going to be a sum of Pauli X over all qubits, which has the uniform superposition state as its ground state. So how can we simulate
this on a quantum computer? What I'm going to do is
I'm going to show you how the simulation circuit, where the simulation circuit for this is, and I'm going to show you how it can be considered a special case of QAOA. Okay, so let's go to the basics. Well, we have some
quantum state evolution. We assume that we start at time t = 0, and we're also going to absorb Planck's constant into H. So we start in some state and evolve with some unitary, and we get our time-dependent final state psi(t). If we're evolving with some Hamiltonian H(t), then the evolution is governed by the Schrodinger equation. And the solution to this simple differential equation, if our Hamiltonian is time dependent but commutes with itself at different points in time, is simply the exponent of the integral of the Hamiltonian over the time period. For our time-independent Hamiltonian, the solution is even simpler: it's just the exponent of minus iHt. Now let's get beyond this time-
independent Hamiltonian. First, if we have two non-commuting Hamiltonians, A and B, then by the Trotter formula, if you take this product of e to the minus iAt over N times e to the minus iBt over N, all to the power N, and you send N to infinity, then this is equal to the exponent, the time evolution, of the sum of these two Hamiltonians. And as a corollary, truncating at a finite N gives you a controllable approximation.
Now let's consider our time-dependent Hamiltonian and break our evolution down into small chunks. We're going to set some small delta t and break up our time evolution such that we can assume the Hamiltonian is approximately constant over each interval of length delta t. So our evolution is now a product over these chunks, where in each chunk the Hamiltonian is approximately constant. Its two terms don't necessarily commute, so we apply the Lie-Trotter-Suzuki decomposition, and now we get this product of exponents of minus i (1 - s_j) delta t H_B and minus i s_j delta t H_C, because our Hamiltonian is (1 - s(t)) H_B plus s(t) H_C. And this hopefully resembles the quantum approximate
optimization algorithm ansatz. If we put them side by side: on the left, we have the QAOA ansatz, this product of exponents of minus i beta B and minus i gamma C. And on the right, we have the same product, but instead of beta we have (1 - s_j) delta t, and instead of gamma we have s_j delta t. And so by simply setting the parameters appropriately, and setting the number of steps to be sufficiently high, we can recreate, we can simulate, the adiabatic quantum algorithm using QAOA circuits. This is what I meant when I said that in the limit of p approaching infinity, you can exactly simulate adiabatic quantum computation.
Why the need to discretize the time evolution in AQC at all? Well, fundamentally the reason is that in general we cannot just exponentiate this Hamiltonian, because it has non-commuting terms, so we cannot directly simulate it. To do that, we have to break it down using the Trotter formula, into chunks simulated with small time steps. If, for example, the two terms were commuting, if the problem Hamiltonian were, for example, only single-qubit Pauli Zs, then we could just exponentiate everything directly, but not in this case. I hope this answers it, okay? The next question: the choice of delta t needs to be roughly proportional to the spectral gap of H; is there some way of estimating this quantity without running
several trial experiments? This is a good question. I think the overall running time scales roughly as one over the spectral gap squared, so yes to the first part. To the second part, I'm not sure; I think for some problems you can, but this is not something I'm familiar with. And in any case, in QAOA we don't have to deal with this, because in QAOA we simply let these things be free parameters: in a sense, we fix the number of steps and let the delta t's be free parameters, and then we optimize them using an outer-loop optimizer. Now let me switch gears, and I think this is the last
part of my theory presentation- - [Yuri] There is a question from Kadir. I don't know, did you answer it already? - [Ruslan] Yeah. - [Yuri] Okay.
- [Ruslan] The question about the delta t, the choice of delta t. - [Yuri] All right, excellent. - [Ruslan] Now, in the final part of my talk, I'm going to talk briefly about the challenge of optimizing the QAOA parameters. As I said before, the parameters beta and gamma, which set how long you evolve the system with each Hamiltonian, are chosen differently than in adiabatic methods, where you would choose them by leveraging your knowledge about the spectral gap. Here, we just use some classical outer-loop method to optimize them as a black box. Now, one nice property of QAOA is that the energy, the expected value of the objective, can in some cases be evaluated without using a quantum computer: you can do it purely classically. And then you can use this to find optimal QAOA parameters purely classically, and only run QAOA on a quantum computer to sample the optimal solutions. So, to show this, let's
look at our QAOA ansatz, and let's denote by U_B the exponent of the mixer, and by U_C the exponent of the phase separator. Then write out the QAOA energy for p equals one for MAXCUT. The identity commutes out, and what we're left with is the sum over all edges of ZZ terms, with these U_B and U_C operators on each side, so the contribution of each edge is equal to this. And if you go through it from the middle out, you can see, by noticing that U_B is a product of exponents of minus i beta times a single-qubit X_m, that this will commute through the ZZ term and cancel with itself on the right whenever the qubit the X is acting on is not one of these two qubits. So whenever the qubit m is not j or k, we can commute the term through and cancel it out. The only thing left is the mixer terms acting on the qubits corresponding to this edge, the same qubits on which those Pauli Zs act. Now, for the phase separation operator, it's easy to see in the same way that everything will commute through, except the terms that touch one of these two qubits. So in this case, for p equals one, the only terms that come into play are the terms in the one-neighborhood of Z_j Z_k. And so the number of qubits that you need to simulate is only equal to the size of the one-neighborhood of Z_j Z_k, which can be much smaller than the total number of qubits in the problem. For example, if you have a three-regular graph, you will have no more than six nodes in the one-neighborhood of this term. So even if you have a million-node graph, you can still evaluate the energy of QAOA, and you can still train the parameters purely classically in simulation by leveraging this observation. And for higher p, all you do is keep growing your neighborhood to include everything in what is sometimes called the reverse causal cone of this operator.
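To make the classical-evaluation point concrete, here is a NumPy sketch of mine (not the notebook's code) that evaluates the p = 1 QAOA energy for MAXCUT on a single edge entirely classically and brute-forces the parameters; for a single isolated edge, p = 1 QAOA reaches the optimal cut of 1:

```python
import numpy as np

# MAXCUT on one edge: basis states 00, 01, 10, 11 cut 0, 1, 1, 0 edges
cut = np.array([0.0, 1.0, 1.0, 0.0])
X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def qaoa_energy(beta, gamma):
    psi = np.full(4, 0.5, dtype=complex)     # uniform superposition |+>|+>
    psi = np.exp(-1j * gamma * cut) * psi    # phase separator exp(-i gamma C)
    rx = np.cos(beta) * I2 - 1j * np.sin(beta) * X
    psi = np.kron(rx, rx) @ psi              # mixer exp(-i beta X) on each qubit
    return float(np.sum(np.abs(psi) ** 2 * cut))  # expected cut value

# brute-force the p = 1 parameter landscape, no quantum computer needed
best = max(qaoa_energy(b, g)
           for b in np.linspace(0, np.pi, 65)
           for g in np.linspace(0, 2 * np.pi, 129))
print(best)
```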
Okay, to sum up what I've talked about today, briefly. First, QAOA will not solve NP-complete problems in polynomial time. However, it is a promising heuristic for the noisy intermediate-scale quantum era, that is, for near-term quantum hardware. For any Boolean function on n bits, we can construct the corresponding Hamiltonian, and therefore we can solve it using QAOA. QAOA has a deep connection to the adiabatic quantum algorithm, which can sometimes be helpful and can sometimes be misleading. And finally, we can often train QAOA purely classically in simulation. And to the question of Alvin: the presentation slides are on GitHub, the same GitHub that was used in the last two days; I think it's slide.PDF in the Wednesday folder. Okay, and now I'm going to
move on to the hands-on part. Thank you, Yuri, for posting the link. Okay, so what we're going to do is solve the MAXCUT problem on a simple graph. Recall that our objective is this sum over edges of one minus s_i s_j. And we're going to drop some constants, because the only thing that matters to our solution is this part of the objective; we don't actually care about the identity. So we're going to reformulate our objective as the sum over edges of minus s_i s_j. This gives us a maximization problem which, as I've shown you earlier, maps to a Hamiltonian that is a sum over edges of minus Z_i Z_j: you simply take the binary variables and map them onto the eigenvalues of the Pauli Z matrix. So if this is our Hamiltonian, then the QAOA circuit is: we start in the all-zeros state, apply our layer of Hadamards, and then apply a sequence of alternating operators. So what we need to do to run QAOA is implement
these two operators. So let's start by importing everything. And we're going to consider this graph: it's a graph on five nodes, a bipartite graph, and this is what we're going to be solving. Let's start by building the circuit for the cost operator. Because the cost operator is a sum over edges of commuting terms, its exponent can be unrolled into simply a product of exponents of minus i gamma Z_i Z_j. Therefore, what we need to do is implement the ZZ term by itself, and then, because the terms commute, we can concatenate the corresponding circuits. The code to implement the ZZ term is simply CNOT, RZ with angle two gamma, CNOT. And then, to implement the entire cost operator circuit, all we do is, for each edge, append the circuit implementing that ZZ term. This gives us the entire cost operator circuit, and we can check that it indeed produces the correct output: we get CNOT, RZ, CNOT for each ZZ term corresponding to each edge.
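The CNOT, RZ, CNOT construction can be verified against the matrix exponential directly in NumPy (my check, independent of Qiskit):

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
ZZ = np.kron(Z, Z)
gamma = 0.8

# exp(-i gamma Z(x)Z): ZZ is diagonal, so exponentiate elementwise
exp_zz = np.diag(np.exp(-1j * gamma * np.diag(ZZ)))

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

# CNOT, then RZ(2*gamma) on the target qubit, then CNOT again
circuit = CNOT @ np.kron(np.eye(2), rz(2 * gamma)) @ CNOT
assert np.allclose(circuit, exp_zz)
```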
And here I just plugged in some random parameters to get some nice output. Okay, secondly, we need to build the mixer operator. The mixer operator is just the exponent of minus i beta X on each qubit. To implement the exponent of minus i beta X, we simply use RX, and then the entire mixer operator circuit is: for each qubit, for each node, we append this RX term. This gives us the mixer circuit. We can plot it, and it indeed looks correct: it's an RX on each qubit, okay? And with this, we can build the entire circuit. We start with an empty circuit, then we apply the layer of Hadamards, so we apply a Hadamard on each qubit. And then, p times, we apply first the cost operator circuit, which is this, and then the mixer operator circuit, just this. And in the end we measure. And this gives us the QAOA circuit. We can draw it; it doesn't
quite fit on the screen, but it looks correct. We first have this layer of Hadamards, then we apply our phase separation circuit, and then the mixer operator circuit. Okay, and now that we have it, we can run it. I'm only going to be talking about running in simulators today, but it's very easy to switch to actual quantum hardware by just changing the backend in Qiskit. Know, though, that even this small example may not run very nicely on a real quantum computer. Now, one thing to note is that Qiskit uses an ordering where the zeroth qubit is the rightmost, least significant bit in the bitstring, which is different from the traditional textbook notation where your zeroth element is the leftmost bit in the binary string. So what we have to do is, for each sample we receive from Qiskit, invert it.
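Such a helper (my sketch of it) just reverses every measured bitstring in the counts dictionary:

```python
def invert_counts(counts):
    """Reverse Qiskit's little-endian bitstrings into textbook order.

    counts: dict mapping bitstring -> number of shots, as returned by
    get_counts(); after inversion, qubit 0 is the leftmost character.
    """
    return {bits[::-1]: n for bits, n in counts.items()}

# qubit 0 measured as 1, qubits 1 and 2 measured as 0
print(invert_counts({"001": 7}))  # {'100': 7}
```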
I have a little helper function to do that. So we can take the circuit we've built earlier and run it in the simulator, and we get some results out. It doesn't look like much, because all the bitstrings seem to have roughly the same counts; this is the binary string, and this is the number of times we've sampled that binary string. So how good are our results? We can kind of tell here that they are not that great, but to answer how good they are exactly, we need to be able to evaluate the value of the objective function on the samples. Here's a simple MAXCUT objective that takes the graph G, and all it does is go through all the edges of the graph and check whether the corresponding bits are equal; if they're not equal, that means the edge is cut. For example, the MAXCUT objective of 00011 is six, or rather minus six, because by convention we are minimizing.
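A MAXCUT objective along these lines can be sketched as follows; note that the edge list here is my assumption that the five-node bipartite graph is K_{2,3} with parts {0, 1, 2} and {3, 4}, which is consistent with the stated optimum of 6 and the solutions 00011 and 11100:

```python
def maxcut_obj(bitstring, edges):
    """Negative number of cut edges (negative because we minimize by convention)."""
    return -sum(1 for i, j in edges if bitstring[i] != bitstring[j])

# hypothetical K_{2,3} edge list standing in for the tutorial's graph
EDGES = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]
print(maxcut_obj("00011", EDGES))  # -6: all six edges are cut
```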
Now we can compute the energy of our sample, this value, the expectation of C in the state psi. What we do is go through our counts, and for each measurement outcome we compute its energy, multiply by the number of times we've obtained that measurement outcome, and then divide by the total number of counts. This gives us the expected value of the objective function.
We get something like three point something. Remember that our optimum is
six, so this is not too great. Now to get better solutions, what we need to do is we need
to find good parameters beta and gamma for QAOA. So what we're going to do is simply use the SciPy implementation of constrained optimization by linear approximation, COBYLA, which is a simple black-box optimizer that builds a linear model within some trust region and uses it to go downhill. To use this SciPy minimize method, what we need to do is wrap our QAOA in some black-box objective function that the optimizer can optimize. And this is a simple function that does that: it returns a function that takes in parameters theta, interprets the first half of theta as the betas and the second half as the gammas, builds the QAOA circuit, runs it, and returns the energy. Then COBYLA can take this function and find parameters theta that minimize the energy. And now we can put it all together. I'm going to set p equals five, grab the objective, and, from some initial point, run the SciPy
minimize with COBYLA. I'm going to set the number of iterations to two and a half thousand. And what we get is an energy of 5.29, which is not too far from the optimal number; our optimum is six. This is much better than what we had previously, which was something like three. Okay, now, this only gave
us the optimal parameters. Now what we need to do is read the solution out, and for that, we need to take the circuit, run it with the optimal parameters, and get the counts. And what we can see here is that the most frequent counts are the counts corresponding to the solution. Our optimal solution is either 00011 or 11100, and those are the two most frequent samples. So that's good: we frequently get the optimal solution; this is what we want. Now we can look at how good the results are by just plotting how many samples we get for each energy level, and we see that, for the optimal solution, which is energy six, more than 70% of our samples correspond to the optimal solution. This is pretty good. And we can take all of our samples and just take the best one by picking the one with the lowest energy, and by coloring the graph, we can see that this is indeed an optimal solution: we cut all the edges in the graph. So that's good. In that example, we estimated the QAOA energy from samples. Now, what happens if we use the entirety of the quantum state
to estimate the energy? For that we can use Qiskit's statevector simulator, which I think Kevin discussed on Monday. What the statevector simulator gives you is the full quantum state. So, for example, if our state is a Bell state, then the statevector simulator will spit out the vector one over square root of two, zero, zero, one over square root of two. Now, we have to invert the statevector to account for the Qiskit ordering of qubits, and I have some helper function here to do that. I'm also going to reimplement the method for building the QAOA circuit, because we shouldn't measure if we're using the statevector. And finally, the last thing we need is a method to convert our statevector to the energy estimate; for this, I'm just going to take the statevector and convert it into counts. And now we can put this all together: we can just plug in the parameters that we got previously, run it, and see what comes up. Now, let's make sure that
our implementation is correct by verifying that the energies are equal. So we plug in the previously optimized parameters, we look at the value of the energy computed from the state vector, and it is indeed correct: it's the same value as we got from samples. So this is good; we didn't have any bugs. Great question, Alex. So Alex asks: I understand that all of these tutorial problems have been run in ideal simulators, but just how much does the noise in actual quantum computers affect results such as these? One thing I can point you to is an example of
what noise does to you. Here we're running QAOA circuits for MAXCUT on this five-qubit device, using these four qubits, and we're running on a star graph: our graph is a four-node star graph that looks like this. We're solving MAXCUT on this graph and mapping it onto a line segment of the IBM qubits (indistinct). Here, the green line is the noiseless energy, and as you can see, as we increase the QAOA depth p, the noiseless energy approaches the optimum, and for p equals three we solve the problem exactly: we achieve the optimal value of three. Now, on hardware, as we increase the depth from p equals one to p equals two, we get a little bit of improvement in the expected value of the objective function, but it's well below the noiseless case. And if we go from p equals two to p equals three, our expected value of the objective function actually drops. So this is what noise does to you: noise effectively imposes an upper limit on the number of QAOA steps that you can do. I hope this answers your question; thank you for an excellent question. And I would recommend this paper: it shows a method for mitigating these errors by leveraging the structure of the QAOA ansatz. I can drop this link in the chat. Okay, so now we were using
the full state vector; now let's see what happens if we run the optimization loop with the full state vector. So now our optimizer gets the energy computed from the full state, an exact energy, as opposed to the energy estimated from samples. If we do that, it's going to take a little longer, because the statevector simulator is a little slower than the QASM, sampling-based simulator. So we're running it, and we're giving it the same budget of function evaluations; the only difference is that we have the exact energy instead of the estimate from samples. I'm going to answer the question from Ish while this is running. So Ish is asking, "When
doing experiments on IBM (indistinct), for example, say we have a graph that is complete. Then, to simulate the cost operator, the operator corresponding to the cost Hamiltonian, we need CNOTs between all qubits, which of course does not match the hardware connectivity. How do we implement CNOTs between qubits which cannot be entangled on hardware?" This is a great question, Ish. What we do is we insert swap gates. For example, in that example I've shown you, which had three edges, you need only six CNOTs in the ideal case, but because we only have this line connectivity, what we have to do is insert a swap here. As a result, we get nine CNOT gates instead of six. So, to run a QAOA circuit that requires entangling qubits that are not directly connected on hardware, we insert swap gates to move the qubits next to each other. And in general, this can be done at a cost that is at most linear, by using something called swap networks. Well, coming back, I see that our statevector
simulation has finished, and you can see that our objective function is much better: we got an energy of 5.95. To compare with what we had from samples, it was about 5.3; I guess I could print more digits. So we went from 5.29, 5.3, to 5.95. And in fact, by running with the full state vector here, we can get arbitrarily close to the optimum. Don't try it now, because it will take forever, but if you plug in 10,000 iterations here, you will get something like 5.999. And you might wonder, well,
why is this happening: why do I get so much worse energy, so much worse parameters, and so much worse solutions when using the estimate of the energy from samples, as opposed to the full state vector? The reason is that COBYLA gets stuck in a low-quality point because of the sampling noise. You can see this by verifying that the parameters are quite different. Now, it's important to point out that the point COBYLA got stuck in is not a local minimum. If we start at this point and run the optimizer with the full state vector from there, it actually goes downhill, and here we're setting COBYLA's initial step size to be very small, so we're staying in a tiny neighborhood of that point. So this point is not a local minimum; it's simply a fake local minimum that we get stuck in because of the finite sampling error in the objective we are giving our optimizer. And, maybe to Alex's question, this is only going to get worse on hardware, because on hardware the error in your objective is even larger. So you really need to think about the stochasticity of your objective when optimizing it. This makes it difficult to use some of the off-the-shelf deterministic optimizers like COBYLA, which have a baked-in assumption that the objective function value is always correct.
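You can reproduce this failure mode with no quantum machinery at all; here is a toy of mine where additive noise stands in for shot noise on a smooth objective:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def exact(x):
    # smooth objective with its minimum of 0 at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def noisy(x):
    # the same objective plus noise, mimicking finite-shot energy estimates
    return exact(x) + rng.normal(scale=0.05)

clean = minimize(exact, x0=[0.0, 0.0], method="COBYLA")
shaky = minimize(noisy, x0=[0.0, 0.0], method="COBYLA")

# COBYLA trusts every function value, so noise can strand it far from the optimum
print("noiseless:", exact(clean.x), " noisy:", exact(shaky.x))
```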
Okay, so far I've kind of walked you through, pedagogically, what it takes to implement QAOA, so that you have a good understanding of it. Now, let me show you how you can skip some of these steps using Qiskit. To recap what we did: we chose a problem class, MAXCUT, and a problem instance, in this case this five-node graph; we built the general Hamiltonian for the problem class by sitting down and deriving it, and then built the Hamiltonian for the problem instance; from the Hamiltonian we constructed the QAOA circuit, wrapped the QAOA circuit in a black-box function, and used some SciPy method to optimize the QAOA parameters; and finally, we sampled the solution by running QAOA with the optimized parameters. In general, all of these seven steps can be handled by Qiskit with its high-level declarative interface, but today I'm only going to show you steps three and four, using Qiskit's Hamiltonian functions. So what we're going to do, and this is, I just copied this from
Qiskit's optimization module. So this is a method implemented in Qiskit, and what it does is build the cost Hamiltonian, which is a Pauli operator: it is a sum of Paulis, and each Pauli is a product of two Pauli Zs. The notation here may look unfamiliar, but it is simply similar to the stabilizer tableau notation: you have a string encoding the Pauli Xs and a string encoding the Pauli Zs, and if both the X and Z entries are one, then it's a Y. This is exactly like the stabilizer tableau notation. So now what we can do is pass a graph to this cost-operator function, and it will spit out our Hamiltonian. And by matching this Hamiltonian to the one that we know to be correct, you can tell that it is indeed the right Hamiltonian: it has terms corresponding to each edge, with the right coefficient of 0.5. Now, we can also make sure the Hamiltonian is correct by simply using an eigensolver, like the classical NumPyEigensolver, to find its lowest eigenvalue and eigenvector. And so we get some eigenvector; now we have to convert
them to the solution of the MAXCUT problem, which Qiskit provides a function to do. So we take this eigenvector and convert it, and we obtain the solution corresponding to this eigenvector using Qiskit, and it is indeed the correct solution. So the Hamiltonian we've built in Qiskit is correct. This is something you can do if you are running QAOA on your own problem: you can simply build your Hamiltonian the same way and then check it using an exact eigensolver. And of course, you can also simply build your Hamiltonian as a matrix: you know that your Hamiltonian is a sum of ZZ terms, and each of them is a Z tensored with identities. So you can just build this in NumPy, add it all up, and use this operator in Qiskit.
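A NumPy construction of this kind might look like the following; the K_{2,3} edge list is my stand-in for the tutorial's five-node graph (consistent with its optimum of 6 and solutions 00011 / 11100):

```python
import numpy as np
from functools import reduce

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def zz_term(i, j, n):
    """Z on qubits i and j, identity elsewhere (qubit 0 is the leftmost factor)."""
    return reduce(np.kron, [Z if k in (i, j) else I2 for k in range(n)])

EDGES = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]  # hypothetical K_{2,3}
H = sum(0.5 * zz_term(i, j, 5) for i, j in EDGES)

# H is diagonal in the computational basis, so its spectrum sits on the diagonal
energies = np.diag(H)
ground = int(np.argmin(energies))
print(energies[ground], format(ground, "05b"))  # -3.0 00011
```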
Well, actually, first you probably want to test it. Here we take this operator and get its eigenvalues and eigenvectors; we get the lowest eigenvalue and the corresponding eigenvector, and then, if we use Qiskit's method to get the corresponding solution, it is indeed correct. So even the Hamiltonian we built in NumPy is correct. And finally, we can build the QAOA circuit from this Hamiltonian by just using the QAOA variational
form implemented in Qiskit. And then, the same way as before, we can just take this and construct the circuit: the variational form has a method to construct the circuit, and you have to plug in some parameters. We measure, run it in Qiskit, and get some counts out, similarly to how we did with the circuit that we built ourselves from gates. And comparing this to our circuit: it doesn't fit on my screen, so you'll have to believe me, but it's the same circuit. Let me zoom out and see if you can see it. Yes: it's the layer of Hadamards and then the RZs; they're in a slightly different order, but you can see that they match. And the only thing is that we have to rescale the parameters correctly; actually, we don't, that doesn't need to happen. Yes, the two circuits match. And in fact, we can check that they produce the same unitary. What we can do is compute the unitary using Qiskit's unitary simulator: we compute the unitary corresponding to the circuit that we've built from gates by running the circuit through the unitary simulator and grabbing the unitary. What I'm doing here is removing some errors that come from numerical instability: I'm throwing away entries that are smaller than ten times the machine epsilon. And when you do that, you can check that the two unitaries are equal up to a global phase. Note also that even the Hamiltonian you have built in NumPy you can use in Qiskit, but it will be slow in general, so it's probably only recommended for prototyping. And that's all I have for you today. Thank you for listening, and I'm happy to take any questions.
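A check of "equal up to a global phase" can be sketched like this (my helper, not the notebook's):

```python
import numpy as np

def equal_up_to_global_phase(u, v, tol=1e-8):
    """True if u = exp(i*phi) * v for some real phase phi."""
    k = np.unravel_index(np.argmax(np.abs(v)), v.shape)  # a reliably nonzero entry
    phase = u[k] / v[k]
    return np.allclose(u, phase * v, atol=tol)

# the same Z rotation written with two different global phases
rz = np.diag([np.exp(-0.5j), np.exp(0.5j)])
u = np.diag([1.0, np.exp(1.0j)])
assert equal_up_to_global_phase(rz, u)
assert not equal_up_to_global_phase(rz, np.eye(2))
```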
I see there is a question from VJ asking what other black-box optimization techniques are available. The answer is that there are many in SciPy, and there are further ones in Qiskit. If we go back here, instead of COBYLA we can use something like (indistinct), and we'll get some solution as well, but it may be worse. In general, you can go to SciPy minimize, and there's a whole list of methods available there; I'll drop this in the chat. Another good library to use is NLopt, which also has implementations of many different methods. And some of these NLopt
implementations are in general better, though they sometimes suffer from weird memory issues or numerical instabilities. So I'll drop this in the chat as well. And you can see that when we ran it with Nelder-Mead, we actually got a better solution, 5.999. The lesson is: try different methods; some may work better than others. - [Yuri] Thank you very much, Ruslan. It was a very nice presentation (indistinct) session as usual. Again, I'll see everybody
tomorrow at 10 o'clock. Okay, bye.
- [Ruslan] Thank you guys.