- [Alex] Okay, good morning everyone. Let's change topic a little bit: now I'm going to talk more about applications of quantum computing, in particular in machine learning. This is going to be a machine learning talk, and to prepare it I did this exercise: I took the last period of 365 days and looked at what was in the news about machine learning. You know, every once in a while you hear some news that something has been done, but doing this retrospective really shocked me, because so much has been done this year. We've gone from generating high-resolution images of faces of people that do not exist, to using machine learning in medicine to help predict diseases, to using it as a tool to make actual discoveries in other areas of research. And the list goes on and on: we now have presenters that are not real, we have AI creating art, we have AI writing coherent text, and even a couple of days ago we had a machine competing in a debate competition against humans. Seeing this with a bit of perspective makes you think: wow, a couple of years from now there's nothing that deep learning is not going to be able to do, right? Well, actually, in this talk I want to convince you that this may not be quite the case. Let me give you an example.
Suppose you are an expert in deactivating bombs, okay, something that most of you probably are. And well, you want to innovate, you want to make your work easier by using machine learning. So you develop this deep learning algorithm that takes information about the particular bomb, for instance, a very important thing, as everyone knows: the color of the cables. And then you use the algorithm to predict which cable you have to cut, and the answers you get are something like this, okay. Cool, you train it, it works fine, perfect. But then it goes into a real application: you are faced with a real bomb that explodes if you cut the wrong cable, and you just get this information. Well, I don't know about you, but I would like a bit more information, right? Maybe: how sure are you about this? Were you more or less equally sure, just a little bit less, that it wasn't the blue cable? I'd just want to know a bit more. And it turns out that questions like "how sure is the algorithm about a specific prediction" are very difficult to answer in the standard framework of deep learning. The reason is that deep learning as we know it now is mostly based on optimization, on calculus, and these questions do not fit that framework well. People are aware of this, and there are other frameworks in which machine learning can operate that are a more natural fit for these sorts of questions, like probability theory. And that's what I'm going to talk about: the Bayesian approach
to machine learning. The formal version, and I don't want to scare people here, just uses these sorts of theorems about probability distributions. This is essentially Bayes' theorem, which tells you the probability of some event occurring given that we have some previous information A, and how to compute it from other information that is more easily accessible. And this has a very nice application in machine learning: you can think of the probability of a label, given that I know some data and have some previous experience from training. And I can compute that from quantities that are more accessible in my data set, for example.
Now the kinds of answers we get are still maybe not completely convincing, but at least we're getting a bit more information about the solution being output. With something like this I would be a bit more convinced about cutting the right cable, you know. Okay, so one approach is doing it on classical computers. People have been working on this, doing research, and this kind of Bayesian training of deep neural networks can be done. Here I have to warn you that the boring math comes now, but I will try to keep it simple. Essentially, the way of doing Bayesian training of deep neural networks is thanks to an analogy between each layer in the network and something that is
called a Gaussian process. The important thing about Gaussian processes is that we assume there is a global Gaussian distribution underlying the outputs of each layer. And then we want to compute what is called the posterior distribution, which is essentially the probability distribution of some label y* given some input x* and some training set with instances and labels. And if we assume this Gaussian process, it has this form here: it's just a normal distribution, a Gaussian distribution, with a mean and a variance given by this formula.
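For reference, here is the standard Gaussian process regression posterior, my reconstruction of the formula on the slide:

```latex
p(y_* \mid x_*, X, y) \;=\; \mathcal{N}\!\left( k_*^{\top} K^{-1} y,\;\; k_{**} - k_*^{\top} K^{-1} k_* \right)
```

where K is the covariance matrix over the training inputs, k_* collects the covariances between x_* and the training inputs, and k_{**} = k(x_*, x_*). Note the K^{-1}: that inversion will matter in a moment.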
The important thing here, and I wonder if I can point with this... no, no, okay, anyway: this K here is an important quantity called the covariance matrix. It's essentially a matrix that you build out of your data by applying what is called a covariance function to each combination of data points. And the very, very nice thing is that for each layer you can compute this covariance matrix using just the information from the previous layers, so you can do this training in a recursive way, as in the sketch below.
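As an illustration, here is a minimal classical sketch of that layer-to-layer recursion for a ReLU network, using the known closed form for the Gaussian expectation of ReLU activations; the weight and bias variances and the first-layer kernel are my assumptions, not values from the talk:

```python
import numpy as np

def relu_layer_kernel(K_prev, sigma_w2=1.0, sigma_b2=0.1):
    # One step of the covariance recursion: the covariance matrix of layer l
    # from that of layer l-1, for ReLU activations (arc-cosine kernel form).
    diag = np.sqrt(np.diag(K_prev))            # per-point standard deviations
    norm = np.outer(diag, diag)                # sqrt(K_ii * K_jj)
    cos_t = np.clip(K_prev / norm, -1.0, 1.0)  # correlations, clipped for safety
    theta = np.arccos(cos_t)
    # E[relu(u) relu(v)] for (u, v) jointly Gaussian with covariance K_prev:
    expectation = norm * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
    return sigma_b2 + sigma_w2 * expectation

# Assumed first-layer kernel built from raw inputs X (8 points, 3 features).
X = np.random.randn(8, 3)
K = 0.1 + X @ X.T / X.shape[1]
for _ in range(4):          # recurse through four hidden layers
    K = relu_layer_kernel(K)
```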
Awesome, so this thing exists; so why isn't everyone using it? Well, it turns out that it's not super hard, it's not NP-hard, to compute this inverse. Remember that we need this covariance matrix, but raised to the power of minus one: we have to invert that big matrix. And this inversion, yeah, it's not super hard, but still, for very big datasets, for a large number of points, the number of operations one has to do grows with the third power of the number of data points, and at some point this becomes intractable. So what can we do? Here is the point where quantum computing can help. Why don't we do something like this: we encode these vectors y and this k* into quantum states, and we interpret our matrix as a quantum operator. Can we now do something easier? Well, it turns out that yes. Luckily, there is this
algorithm by Harrow, Hassidim and Lloyd (HHL), developed in 2009, that allows us to do exactly this. You have a linear system of equations, A x = b, and there exists a quantum algorithm that retrieves the solution, this vector x, which has a very similar form to this K to the minus one times y. So we can do that part, on the one hand.
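To make concrete what HHL produces, here is a hedged classical emulation: it works in the eigenbasis of A and inverts each eigenvalue, returning the normalized vector proportional to A⁻¹b that the algorithm would encode in a quantum state (the toy matrix and vector are mine):

```python
import numpy as np

def hhl_emulated(A, b):
    # Classical stand-in for the HHL output state |x> ~ A^{-1} b,
    # computed via the same spectral picture the algorithm exploits.
    # No quantum speedup here, of course.
    lam, V = np.linalg.eigh(A)      # A must be Hermitian, as HHL requires
    beta = V.conj().T @ b           # expand b in the eigenbasis of A
    x = V @ (beta / lam)            # invert each eigenvalue
    return x / np.linalg.norm(x)    # quantum states are normalized

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # toy 2x2 covariance matrix
y = np.array([1.0, -1.0])
print(hhl_emulated(K, y))
```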
And on the other hand, we also have quantum algorithms to perform this inner product in an efficient way. So we can do this. And these are the sorts of results we were connecting in order to have an end-to-end quantum algorithm for this Bayesian training of deep neural networks. What we did, essentially, and this is in this paper over here that we released about half a year ago, requires just two ingredients. First, the recursive formula for the covariance matrix of a layer as a function of the covariance matrix of the previous layer. And second, and it's true that this is not a trivial thing, the initial covariance matrix, the covariance matrix of the first layer, encoded as a quantum state. In principle you could, for instance, compute it classically and then prepare such a state; in any case, at least in this project, we don't care too much about that step. Given these two things, what we were able to do is build an approximation of the covariance matrix of the last layer. We build this, and then we also built the time-evolution operator under this approximation. This could be encoded into a quantum circuit, or simulated via Hamiltonian simulation, and applied in the HHL algorithm to do the matrix inversion. So essentially, we take this state encoding the initial covariance matrix, we apply the time-evolution operator that allows us to do the matrix inversion in a quantum way, and we compute the inner products to obtain the parameters of the distribution that we want to fit the data to.
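Putting the pieces together, here is a purely classical sketch of that pipeline, reusing relu_layer_kernel from the sketch above; the two np.linalg.solve calls stand in for the quantum matrix inversion and inner-product estimation, and the data is a toy example of mine:

```python
import numpy as np

X = np.random.randn(8, 3)              # toy training inputs
y = np.sign(np.random.randn(8))        # toy training labels
x_star = np.random.randn(1, 3)         # toy test point

Z = np.vstack([X, x_star])             # joint inputs, test point last
K = 0.1 + Z @ Z.T / Z.shape[1]         # assumed first-layer kernel
for _ in range(4):                     # recursion up to the last layer
    K = relu_layer_kernel(K)

K_train, k_star = K[:-1, :-1], K[:-1, -1]
alpha = np.linalg.solve(K_train, y)    # the inversion HHL would replace
mean = k_star @ alpha                  # posterior mean (an inner product)
var = K[-1, -1] - k_star @ np.linalg.solve(K_train, k_star)
print(mean, var)
```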
That was the more theoretical part, but we also did some sort of experiments. Bear in mind that these are experiments done by theoreticians, so they may not satisfy real experimentalists. We coded the core part of the algorithm, the HHL part, implementing it in various frameworks, in Rigetti's Forest and in IBM Q. And we did simulations of runs of the algorithm, inverting matrices as big as 4 by 4, running the protocols in these simulators under different kinds of noise. In this figure I have both gate noise, which is an X operator applied after every gate of the circuit with some probability, and you can see this is awful, essentially because the number of gates in the circuit is quite big, so even for low probabilities you get a lot of X operators acting on your state. And then we have measurement noise, which is just a readout error when you do measurements, and this is not that bad.
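As a sketch of how one might reproduce those two noise channels in a present-day simulator, here is a Qiskit Aer noise model; the gate set and the probabilities are my assumptions, not the ones used in the paper:

```python
from qiskit_aer.noise import NoiseModel, ReadoutError, pauli_error

p_gate, p_meas = 0.01, 0.05    # assumed error probabilities

noise_model = NoiseModel()

# Gate noise: an X flip applied after every gate with probability p_gate.
bit_flip = pauli_error([("X", p_gate), ("I", 1 - p_gate)])
noise_model.add_all_qubit_quantum_error(bit_flip, ["u1", "u2", "u3"])
noise_model.add_all_qubit_quantum_error(bit_flip.tensor(bit_flip), ["cx"])

# Measurement noise: a symmetric readout error on every measurement.
readout = ReadoutError([[1 - p_meas, p_meas],
                        [p_meas, 1 - p_meas]])
noise_model.add_all_qubit_readout_error(readout)
```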
Not only that, we also did runs on real quantum computers, both IBM's and Rigetti's, and in the case of IBM we got particularly nice results. In particular, here I'm reporting the probability of success under a swap test, just not to make too much fuss about it. This translates into a fidelity with the desired target state, and in the case of IBM we get fidelities of about 78%, which is brilliant.
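For reference, the standard swap test relation between the success probability and the fidelity of the two compared states, added here for completeness, is:

```latex
P_{\mathrm{success}} \;=\; \frac{1}{2} + \frac{1}{2}\,\lvert\langle \psi \mid \phi \rangle\rvert^{2}
\qquad\Longrightarrow\qquad
F \;=\; \lvert\langle \psi \mid \phi \rangle\rvert^{2} \;=\; 2\,P_{\mathrm{success}} - 1
```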
So yeah, that's essentially all I wanted to tell you. Just to wrap up quickly: the takeaway message is that not all machine learning is deep learning. There are other frameworks, other ways of doing machine learning, that may be more useful for particular applications. In this context, Bayesian deep learning based on Gaussian processes is useful, you can train very large networks, but it's also classically hard. Nevertheless, for the hard parts we can resort to quantum computing and have some sort of hybrid classical-quantum algorithm to do the full training. And in this respect, the experiments that we have conducted are encouraging, as I said, especially on the IBM platform, but there's still a lot to be done: the matrices that we could invert on real computers were not bigger than two by two, so it would probably take less time to do that by hand. But anyway, all the tools are there. We did everything open source, all the code, so it's available for generalization or modification, and I guess it's a matter of time before we see applications of these algorithms in more realistic scenarios. And that's all, thank you very much.