James Zou: "Deep learning for genomics: Introduction and examples"

Captions
I was supposed to give a research talk, but I enjoyed the tutorial talks yesterday so much that I spent a bit of last night changing this into a hybrid: the first part will be more of a tutorial, and I'll talk about some more recent work from my group in the second half, so feel free to ask questions and jump in at any point. I'll be talking about applications and challenges of deep learning in genomics.

Many of you have heard about the quite impressive achievements of deep learning recently across a lot of different applications: some of the earlier applications from a few years ago in image recognition, and more recently a lot of interesting advances in natural language processing — speech recognition, image captioning, speech translation — and behind the scenes of these are deep neural networks. There are also more ambitious plans going forward. There is a lot of activity around self-driving cars, which involve several different pieces — really robust image recognition, robotics, reinforcement learning systems — and behind the scenes of those are, again, a lot of these deep networks. Many of you have probably also heard about AI systems that can play games at a very high level: in the last year or so, researchers at DeepMind developed systems based on deep reinforcement learning to play Go and other games, such as Atari games, with really impressive performance surpassing the best human players.

When we think about these kinds of applications of deep learning, there are basically two broad classes. The first class is the one we are more familiar with: making predictions, what we call supervised learning in machine learning. You have some input example — here it could be natural images; in other settings it could be genomic sequences or medical images — and you specify the categories, classes, or labels that you want to predict: for each image, which class does it belong to? For this image, what is the probability that it is a person, or a car? That is the supervised learning approach, and it makes up the bulk of machine learning. The second category, which in more recent years has also become quite important, is the unsupervised learning approach. Here the goal is not to make a prediction — we are not trying to classify an image or classify a genomic sequence — we are trying to model the whole distribution, in a way that allows us to generate new images or new DNA sequences. Here is a cute example of this: as input, a human draws the outlines of different kinds of handbags — those cartoon outlines are the input to the algorithm, and the ground truth is the middle column. The algorithm learns the whole distribution of these different handbags, and the network can then draw a conditional sample given the sketch.
The right column shows samples drawn conditionally on those cartoon sketches, and you can see that the samples are actually very realistic, but also have interesting variations from the actual ground-truth images. That is an unsupervised learning example, and I'll talk about both classes of applications of neural networks.

In the last few years it has become clear that there are a lot of interesting ideas in deep neural networks, and people have started to explore applications to biomedical data. The most natural place to look is applications that involve medical imaging, since there you are already dealing with images and you can see how the algorithms can essentially be ported over, and people have had some quite impressive achievements in medical imaging. This example is from students and colleagues at Stanford looking at digital pathology. You are given slides of different kinds of tumors or tissue samples; currently in hospitals it is a relatively manual, labor-intensive process for a doctor to look at each slide individually and classify whether it is benign or malignant. This research group obtained a very large dataset of labeled examples and used it to train an algorithm that is able to surpass human pathologists in accuracy. Related approaches can be applied to other problems; another example is from Andrew Ng's group, where they take EEG signals and try to detect abnormal patterns. My group has also done recent work along these lines. One of the big problems in building cardiovascular models is that if you want the three-dimensional shapes of the different vessels, it is currently very much a manual process: a human has to go in and do a lot of the segmentation and annotation by hand, which takes many days to build one model. Together with Stanford colleagues and students, we developed a system that can automatically generate these three-dimensional models and segmentations.

These are all imaging applications, and for those it is somehow easier to see why methods developed for natural image recognition can be applied to biological images. What I really want to talk about today is how these methods apply to genomic data, which seems to have quite different characteristics and properties — would these methods even work? So the talk will have two parts. The first part is a tutorial on how these neural networks work, going into a bit more detail for a particular class called convolutional neural networks. In the second part I'll illustrate some of the main lessons and messages we have learned from working with these neural networks in genomics applications. There is a lot of excitement and hype about neural networks and deep learning, so when thinking about applying them to new settings, I think it is really important to ask what the real content of the message is — what is the signal on top of the hype.
This is still very much work in progress, but I want to get across three messages or lessons that we have learned from our experience, and I'll illustrate each with a concrete example based on our recent research. The first message is that when applying these models to genomic problems, it is really about the training data, much more so than the specific model or algorithm — how do you get the proper training data? I'll go through this in much more detail. The second message is that being able to make predictions is good, but it is not really the most exciting application. These generative models — I'll give an example and try to convince you — are actually much more exciting than simply making predictions, even though most people are working on predictive models; I think generative models are an exciting future direction. The third message is maybe the more provocative one. Everybody thinks about overfitting, which you would expect with these neural networks — networks with millions, up to hundreds of millions, of parameters — so overfitting should be a very big problem. What I want to convey is that overfitting is actually not the main problem; it is basically a solved problem for these networks. But there is a main bottleneck, which is what we call the fragility of these networks, and I'll define precisely what fragility means.

So here is a relatively quick introduction to these neural networks, so that we are on the same page. The basic unit of a neural network is the artificial neuron, so let's start from the very beginning. Think of a neuron as taking a weighted linear combination of its input: the input to a neuron is a vector of real numbers, the neuron has a weight assigned to each input, and it computes a simple weighted sum of these inputs. That weighted sum is then passed through a threshold nonlinearity, which in this case sets everything below zero to zero: if the weighted sum is less than zero, the output is zero; otherwise it passes through. So the unit of these networks, the artificial neuron, is extremely simple: it takes its inputs, computes a weighted sum, thresholds it, and emits the output.
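In code, the neuron just described amounts to a weighted sum followed by a threshold at zero. Here is a minimal NumPy sketch; the weights and inputs are made up purely for illustration:

```python
import numpy as np

def neuron(x, w, b=0.0):
    """One artificial neuron: weighted sum of the inputs, thresholded at zero (ReLU)."""
    weighted_sum = np.dot(w, x) + b
    return max(0.0, weighted_sum)

# Toy example: three inputs with hand-picked weights.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, 0.3])
print(neuron(x, w))  # 0.25: the weighted sum is positive, so it passes through
```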
The whole idea of a deep neural network is to take modular compositions of many of these individual neurons, arranged into different architectures. A particularly popular architecture for working with images — and, as we'll see, for genomic data — is the convolutional architecture, and I'll explain it with an example. Suppose I have an image that I want to classify. The image is stored as pixels: 32 by 32, with three color channels (RGB), so the input is a 32 x 32 x 3 three-dimensional array, where each entry is the pixel value at that position. What a convolutional neural network does is define a lot of local convolutional filters; think of each filter as dealing with a small patch, and each filter is exactly the artificial neuron we saw on the previous slide. So here I have a 5 x 5 filter — the 5 x 5 filter is the artificial neuron from before — and we scan this patch across the image: at every position where we place the patch, it takes the weighted combination of the pixel values there and spits out a number. The operation is very simple: I just scan this filter, my artificial neuron, across the whole image — that is the convolution operation. The output is a slightly smaller matrix, 28 by 28; it is a bit smaller because some of the edges are left out when I do the convolution. So each filter — one of these artificial neurons scanned across the image — gives an output which is basically another image; you can think of it as another matrix. That was one filter, but I can define many different filters. Each filter has a different set of weights, so it takes a different linear combination of the input values, and each gives me a different 28 x 28 output. With many filters I get a bunch of different outputs, one per filter, and this is where modularity becomes very important: the stack of outputs after the convolution is just another three-dimensional matrix — basically another three-dimensional image — and I can apply an additional convolution to that. This convolutional layer is the basic unit of operation. Any questions about that?
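A rough NumPy sketch of that scanning operation, using random weights and a random image just to show the shapes (32 x 32 x 3 in, 28 x 28 feature maps out, one per filter):

```python
import numpy as np

def convolve(image, filters):
    """Scan each 5x5x3 filter across the image (no padding) and apply the ReLU threshold.

    image:   (32, 32, 3) array of pixel values
    filters: (num_filters, 5, 5, 3) array of weights
    returns: (28, 28, num_filters) stack of feature maps
    """
    h, w, _ = image.shape
    n, fh, fw, _ = filters.shape
    out = np.zeros((h - fh + 1, w - fw + 1, n))
    for k in range(n):                      # one feature map per filter
        for i in range(h - fh + 1):
            for j in range(w - fw + 1):
                patch = image[i:i + fh, j:j + fw, :]
                out[i, j, k] = max(0.0, np.sum(patch * filters[k]))
    return out

image = np.random.rand(32, 32, 3)
filters = np.random.randn(6, 5, 5, 3)        # six filters -> six 28x28 feature maps
print(convolve(image, filters).shape)        # (28, 28, 6)
```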
There are basically two essential operations when building these networks. One is the convolution operation we just saw. The second is basically downsampling — people give it the fancier name of pooling — taking averages or downsampling, so that if you have a relatively large image, you reduce it to a smaller one. Here is an example of how you would apply this on real natural images. This is my input — it is a picture, but you can think of it as a three-dimensional matrix of pixel values — and here I define a convolutional network with a few different layers, where each layer is this module: a convolution operation, exactly as before, followed by a pooling operation that does some downsampling or averaging. It is actually quite interesting, and a lot of fun, to look at the outputs of each of these layers. In the beginning, the convolution captures some of the edges: after the convolution and the thresholding, you are finding edges of the image of the cars, so each convolution is basically a little edge detector looking for local patterns. As you go further, the patterns become harder to interpret: the later layers are taking combinations of these local edges. At the very end, the last layer is basically doing something like a logistic regression on whatever features were learned, assigning a probability to each class — what is the probability that the whole image is a car, or an airplane, and so on. And we train this network via backpropagation: you have examples where you know the actual label, and you essentially do stochastic gradient descent to train the weights of the network — the parameters, the weights assigned to each of these neurons.

That is a very brief overview of how these neural networks work. Conceptually, it is helpful to keep a few points in mind. There are lots of different architectures people think about, and lots of different training algorithms, but in the end you should think of the neural network as a very flexible function approximator: we have some input and some output, and whether we are building predictions from imaging data or genomic data, the goal is to learn a mapping — some function from input to output. The neural network is nothing but a nicely parametrized, flexible class of functions that approximates your true mapping. There is no magic here: we can't just put in garbage and expect to get nice results out. In other kinds of machine learning — say, Bayesian methods — you have to put your prior information into the structure of the graphical model or into the prior; here you have to put your domain knowledge, your priors, into how you build the architecture of the network. In that sense it is not fundamentally different from other classical machine learning models. In particular, where did we put in the prior knowledge here? We put in the prior knowledge that we think there are local edges that should be roughly spatially invariant — that is why these convolutional filters, this particular architecture, are appropriate for this setting. If you use the wrong architecture, that is analogous to putting in the wrong priors, and your models can end up much worse off. Probably the biggest advantage of using these neural networks is that they are a very well-engineered platform for large-scale optimization. There are other machine learning techniques that allow flexible function approximation and let you build in priors, but these neural networks have been extremely well engineered, partly because they are used by a lot of large companies, so as academics we get to leverage the results of that very large engineering effort — that is one of the main benefits. The price is that training these models typically — not always, but typically — requires a large number of well-labeled training examples.

So that was for images — I showed the image example because it is easier to interpret and to see what is going on — but what we are really interested in today is how to apply these methods to genomic data. For this, I think it is helpful to keep an analogy in mind.
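Putting the pieces together — convolution, pooling, a final classification layer, and gradient-descent training — here is a minimal PyTorch sketch of such a network on 32 x 32 RGB inputs with ten classes; the layer sizes and random stand-in data are illustrative, not the specific network from the slides:

```python
import torch
import torch.nn as nn

# Convolution -> ReLU -> pooling, repeated, then a final linear layer that acts
# like a logistic regression over the learned features.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(32 * 5 * 5, 10),          # class scores; softmax is applied inside the loss
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a labeled image batch (e.g. car vs. airplane vs. ...).
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

for step in range(100):                 # stochastic gradient descent on the weights
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```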
In the image setting, the images were the input; in the genomics context we want to think of the DNA sequence as being the image — think of the DNA sequence as essentially a very long one-dimensional image. We saw that for images these algorithms look for local edges with local filters; the analogue for sequences is looking for local subsequences, local structures — what we call motifs. On the output side, rather than labels like cats, dogs, or airplanes, what is relevant for us is whether this particular stretch of DNA is bound by a transcription factor, or whether it may be relevant for a particular disease — those are the outputs. To make this analogy very concrete, here is a simple illustration of how you would apply this to DNA sequence data, taking basically the same convolutional architecture we saw applied to images. As we said, think of the DNA sequence as the input, a one-dimensional image. In images you have three color channels, RGB; in a DNA sequence, at every position there are four possible nucleotides, so analogously the sequence is encoded as one-hot vectors — a very long matrix, exactly as before. We have local convolutional filters, and each filter scans across the sequence looking for particular substructures; they are spatially invariant, so they are applied the same way everywhere. In this example, each row corresponds to a different filter; different filters have different weights, so they look for different subsequences — essentially different motifs. The output of each filter, after thresholding, is some quantification of whether that particular motif is detected at that particular location. Then there may be some additional combinations of those outputs, and we can iterate this, and at the end the prediction is: what is the probability that, for this particular sequence, a protein — a transcription factor — is bound to it? Conceptually the model is very similar to what we saw for images.
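A minimal sketch of that analogy in PyTorch — one-hot encoding the sequence as a 4-channel 1-D "image" and scanning learnable motif filters over it; the filter count, filter length, and toy sequence are illustrative:

```python
import torch
import torch.nn as nn

BASES = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def one_hot(seq):
    """Encode a DNA string as a 4 x length matrix (one channel per nucleotide)."""
    x = torch.zeros(4, len(seq))
    for i, base in enumerate(seq):
        x[BASES[base], i] = 1.0
    return x

# Each Conv1d filter spans ~12 bases and acts as a learnable motif scanner;
# global max pooling asks "was this motif detected anywhere in the sequence?",
# and the final linear layer turns the motif hits into a binding probability.
model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=12), nn.ReLU(),
    nn.AdaptiveMaxPool1d(1), nn.Flatten(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

seq = "ACGT" * 250                               # a toy 1,000 bp candidate regulatory region
prob_bound = model(one_hot(seq).unsqueeze(0))    # shape (1, 1): P(bound by the TF)
print(prob_bound.item())
```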
You might say that this is not all that different from what people have been doing — which is true. People in computational biology and genomics have long worked with motif scanners, which are essentially what these convolutional filters are, so it is very similar. The main difference is that by allowing additional compositions of the convolutions — additional layers — we get a bit more flexibility to learn not just different motifs but combinations of and interactions between different motifs; this is where having a flexible, modular architecture becomes quite useful. And for the outputs, once you have this setup you can go to town: it does not have to be just predicting whether the sequence is bound by a protein or not; you might also want to predict, say, the expression of particular genes, or of nearby genes.

So now I want to go through the three messages in more detail. The first message is that, especially when dealing with genomic data, it is really much more about building the right training data than about a specific model or algorithm. Recall the model we looked at before, whose goal is to predict whether a sequence is bound by a protein or not. Here we actually have a lot of data, basically from large-scale ChIP-seq studies. The input is on the order of a 1,000-nucleotide (1 kb) DNA sequence — a particular candidate regulatory region — and the output is whether this sequence is bound by the TF or not. What is nice is that we have a lot of ChIP-seq data, across many cell lines and many TFs, that can be used as gold-standard labels for what is bound and what is not. Typically you get on the order of millions of positive examples of actual binding events from these biological ChIP-seq studies, and for the negative training set — regions that are not bound — people usually take genomic sequences that do not overlap the histone modification sites or the ChIP-seq peaks as negative controls. So this is actually a very large training dataset, it has a very clear binary label of bound versus not bound, and it is roughly of the same order of magnitude as the data used to train the large image models — you actually have enough well-labeled training data to make progress. I should mention that in a lot of our experience, and our colleagues' experience, exactly how you set up the architecture ends up not being so important. It is important to build in the priors — using these convolutional structures and setting the filters to the right length, about ten to twelve bases — but beyond that it really does not matter much for the predictive accuracy whether you use three layers or five layers, or exactly how you train the model. This is where it is much more important to have well-curated training datasets, where it is clear what the positive examples are and what the labels are — given that, most of the standard architectures work relatively well, and it does not need to be a very deep architecture.
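A rough sketch of how such a labeled set might be assembled — positives taken from peak intervals, negatives sampled from windows that do not overlap any peak. The genome string and peak coordinates below are made-up toy data, not real ChIP-seq files:

```python
import random

# Hypothetical inputs: a chromosome sequence and ChIP-seq peak intervals (start, end).
genome = "".join(random.choices("ACGT", k=100_000))
peaks = [(10_000, 11_000), (40_000, 41_000), (75_000, 76_000)]
WINDOW = 1_000

def overlaps_peak(start):
    return any(start < p_end and start + WINDOW > p_start for p_start, p_end in peaks)

# Positives: sequences under the peaks (label 1 = bound).
positives = [(genome[s:e], 1) for s, e in peaks]

# Negatives: random windows that do not overlap any peak (label 0 = not bound).
negatives = []
while len(negatives) < len(positives):
    start = random.randrange(0, len(genome) - WINDOW)
    if not overlaps_peak(start):
        negatives.append((genome[start:start + WINDOW], 0))

training_set = positives + negatives
random.shuffle(training_set)
print(len(training_set), "labeled 1 kb examples")
```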
Let me go through a more concrete application example of this, something our group has been working on recently in collaboration with colleagues. The setup is that we have reporter assays: synthetic sequences, perhaps taken from different parts of the genome or simply generated, which we can place in front of different reporter genes, giving us a readout of the expression driven by each whole construct. What is nice about this is that it can be done very much in parallel: in one set of experiments you can get on the order of a hundred thousand labeled examples, where each labeled example is one construct — one DNA sequence — together with an output, which is the expression or activity level of the reporter gene. You can test a lot of different DNA sequences, with different combinations of reporter genes, and also test them in different cell types. Translating this into our framework: the input is the sequence — shorter than 1 kb, on the order of a couple hundred base pairs — and the target for prediction is the activity level of the reporter assay. We have 100,000 to 300,000 training examples, so quite good training data, and the setup is conceptually straightforward: we train a convolutional neural network, similar to what I showed before, on these constructs to predict, for each construct, the activity score of the reporter gene when paired with that regulatory element. The nice thing is that once you train this model — and you can get quite high prediction accuracy on held-out validation and test data — you can use it to make interesting interpretations. Here is one kind of interpretation. Once the network is trained, its weights are fixed, so it is just a deterministic function from input to output. I can apply the network to a particular regulatory element, where it predicts, say, a relatively low activity level, and then computationally do in silico mutagenesis very efficiently: take a particular nucleotide of interest, change it, say, to a C rather than a T, feed the mutated sequence back into the prediction algorithm, and see whether the predicted activity changes. Maybe with this single-nucleotide change — keeping everything else the same — the model now predicts a very high activity score. What is nice is that I can do this very systematically across all the regulatory sequences; there are computationally efficient ways to do these in silico mutageneses.
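A minimal sketch of that in silico mutagenesis loop. It assumes some already-trained `predict_activity` function (any sequence-to-activity model, such as the CNN above with a regression head) and simply measures how much each single-base change moves the prediction:

```python
# In silico saturation mutagenesis: change one nucleotide at a time, re-run the
# trained predictor, and record how much the predicted activity moves.
# `predict_activity` is a stand-in for any trained sequence-to-activity model.

def importance_scores(seq, predict_activity):
    baseline = predict_activity(seq)
    scores = []
    for i in range(len(seq)):
        effects = []
        for base in "ACGT":
            if base == seq[i]:
                continue
            mutated = seq[:i] + base + seq[i + 1:]
            effects.append(predict_activity(mutated) - baseline)
        # Importance of position i = largest absolute change caused by any mutation there.
        scores.append(max(effects, key=abs))
    return scores
```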
What this gives us is a simple way to interpret what the network is essentially doing: it assigns an importance score to each nucleotide based on how much the predicted activity level changes if you mutate that particular nucleotide in silico. And that actually has quite strong agreement with — yes, that is a good question. What one can do is use control sequences, background sequences — I mentioned these negative sets — which do not have any measured activity, and make random mutations there; the assumption is that most of those mutations are unlikely to actually change anything, so they should mostly be neutral point mutations, and you can then check whether you get false positives. There are also other ways to assess the statistical significance of these predicted scores.

One way to look at this: the x-axis here is the importance score of each nucleotide — you can apply the network across the whole human genome and compute an importance score for every nucleotide — and the y-axis is the sequence conservation of that nucleotide. These are sequences the model never saw during training, and you see a very reasonable pattern: nucleotides whose scores are either very positive or very negative — predicted to be highly influential in up- or down-regulating expression — also have much higher conservation scores. Another way to interpret the results is to ask how these importance scores correspond to different epigenetic signals. The idea is that there are specific chromatin marks — without going into too much detail — where a high signal for the mark suggests regulatory activity. So one can bin the true genomic sequences by their observed chromatin signal, from the highest peaks to the lowest, and we see that the bins with the highest predicted importance scores also have the highest epigenetic signal — a very clear monotonic relation between the importance scores predicted by the algorithm and the experimentally measured chromatin signal.

So hopefully you have now seen some examples of how to apply these models to make predictions: if you have a reasonable training dataset, this gives you nice machinery for prediction, and, by doing these in silico mutageneses, for interpreting the networks' predictions. The second message I want to discuss is that making these predictions is nice, but it is not really the most exciting application of these methods; in many ways these predictive models are fairly standard models that we all know how to do pretty well. I want to go back to the generative models, which I think are actually more exciting and potentially the more powerful application of these neural networks.
Again, in this generative example, what is quite striking is that you give it a sketch of the kind of thing you are looking for — a rough specification — and the network has learned the distribution of these objects, the handbags, well enough that it can generate synthetic samples that satisfy your specification while filling in a lot of the details. Why might this be exciting? Imagine that you want to design particular proteins or DNA sequences and you have some specification — maybe I want this sequence to bind very specifically to certain proteins. Those specifications are the input, and I would like the network to automatically generate sequences that satisfy them, and to generate a diverse set of them. Currently it is not clear how to do this by hand in a principled way.

So I want to give you some intuition for how these generative models work; these are much more recent advances in the community, and I think they are quite exciting. The idea of a generative model is that I would like to design a neural network that can capture and model a very complicated high-dimensional distribution. Suppose my distribution is a set of images — a very complicated distribution in a high-dimensional space — and I would like a neural network that models that whole distribution: we are not trying to make predictions, we are trying to model the distribution. The idea turns out to be very simple. Start with a distribution we are all very comfortable with — say, a high-dimensional Gaussian — which I can sample from very easily. What we would like to do is learn a deterministic mapping, a function f, from my Gaussian to this really complicated high-dimensional space, such that the distribution of the Gaussian, after applying this mapping, actually captures the density of my images or my DNA sequences. That is the idea — and we know these networks are good function approximators, so we parametrize this function f by a modular composition of neural network layers. But there is still a bottleneck: the whole difficulty is that if I am trying to model the space of sequences or the space of images, I do not have any explicit form for that distribution — that is the whole crux of the problem. So how do I even know that I am transforming my Gaussian in a way that gives a reasonable probability distribution? The very nice idea here is that we try to learn that too. Here is how it works. Suppose I draw a sample from my Gaussian — some point — and, for the moment, my mapping is fixed. I pass this point through the mapping and it generates a sample; in this case a pretty good sample that actually looks like a real digit, while a different point gives me a different example. The idea is that I want to be able to tell whether a point from my Gaussian, after passing through this mapping, produces a realistic sample or not. If it does, that means my mapping — this transformation f — has captured a lot of the interesting density and structure of my set of images; if not, then f needs to be improved.
So how do I do that? Why not have another classifier whose goal is to predict whether the synthetic sample generated by this algorithm is a real sample or a fake one? The classifier is given access to a set of actual real examples — real images — and tries to predict whether each sample is real or not. If the classifier is confused — the image generated by the algorithm looks realistic enough that it cannot tell the difference — then that particular parameter setting of the generator is rewarded, given a little gradient boost. If instead the generator produces an example that is clearly not a real human-drawn image, the classifier predicts it to be fake, and that parameter setting is punished — pushed down the gradient. Again, this is conceptually very simple, but by iterating through this process it gives you a nice modular way to learn a setting of the parameters such that the original Gaussian distribution maps to something that closely approximates your full distribution of images. Once you have that, sampling is extremely easy: sample from a Gaussian, pass it through the learned deterministic function f, and out comes my image — or my protein. The name for this whole system is a generative adversarial network — adversarial because you can think of it as a game between the generator, which generates images, and a classifier trying to say whether the generated images are real or fake; there is a competition between the two.

Recently we have been exploring how to apply this to genomics and synthetic biology. Our genomics example is along the lines I mentioned before: say I want to design DNA sequences and I have some specifications — maybe one of my synthetic biology colleagues comes in and says, we want DNA sequences that bind very specifically to protein X and do not bind to protein Y. I would like the network to take these specifications, sample from the conditional distribution, and give us a nice, diverse set of DNA sequences that satisfy the specification — that would be really powerful for a lot of biotech applications. The strategy we use is similar to what we saw before: we have a generator that takes a Gaussian and maps it to approximate the more complex probability distribution of DNA sequences, and we have a discriminator — a classifier — whose job is to reward or punish the generated examples by saying whether they are real or synthetic. Here, as a simple proof-of-concept goal, we ask whether we can generate sequences coding for proteins that actually look realistic: the generator produces DNA sequences that code for proteins, and the discriminator tries to classify whether each looks real or fake compared to an existing database of human proteins.
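A compact sketch of that adversarial game for sequences. The sequence length, network sizes, and random stand-in "real" data are illustrative, and real sequence GANs typically need extra machinery (conditioning on the specification, tricks for discrete outputs such as Gumbel-softmax or Wasserstein objectives) that this sketch omits; it only shows the reward/punish loop between generator and discriminator:

```python
import torch
import torch.nn as nn

SEQ_LEN, LATENT = 50, 32

# Generator: Gaussian noise -> per-position distribution over A/C/G/T.
G = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                  nn.Linear(128, SEQ_LEN * 4))
# Discriminator: (soft) one-hot sequence -> probability that it is a real sequence.
D = nn.Sequential(nn.Flatten(), nn.Linear(SEQ_LEN * 4, 128), nn.ReLU(),
                  nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()

real_seqs = torch.eye(4)[torch.randint(0, 4, (64, SEQ_LEN))]   # stand-in for real one-hot data

for step in range(1000):
    z = torch.randn(64, LATENT)
    fake = torch.softmax(G(z).view(64, SEQ_LEN, 4), dim=-1)

    # Discriminator step: rewarded for telling real from generated sequences.
    opt_D.zero_grad()
    d_loss = bce(D(real_seqs), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_D.step()

    # Generator step: rewarded when the discriminator is fooled.
    opt_G.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_G.step()
```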
There is a lot of work still to be done in this area, but we showed, as a proof of concept, that you can actually run this system and generate proteins that look very realistic. There are a lot of different ways to measure how realistic a generated protein is; one simple way is a foldability score — does the protein actually fold into a stable structure? — which is a pretty simple but computationally efficient way to evaluate it. In this work we showed that this network can automatically generate artificial proteins whose foldability scores look as good as those of real human proteins — in fact it achieves even higher foldability scores, so somewhat more stable structures than the distribution of naturally occurring human proteins. I should give a shout-out to my student who built this, an undergraduate at Stanford, who won a national collegiate invention competition for this generative model.

The last part of the talk, the last message, concerns fragility. The conventional thinking is that overfitting should be one of the main bottlenecks with these neural networks, especially deep ones, but I want to show you that overfitting is really not the main problem — we know essentially how to deal with it — and that there is a more subtle problem, the fragility of these networks: they are intrinsically very fragile objects. This is very much work in progress, and I think there are lots of interesting research ideas here. We basically know how to prevent these networks from overfitting, and by the standard criterion — comparing test accuracy to training accuracy — a network like this is not overfitting, because its test accuracy is essentially the same as its training accuracy on a clean held-out test set. If I give it an example it has never seen before from my test data, it predicts that this is a picture of a panda with very high confidence — so it is not overfitting by the standard definition — but it is extremely fragile in the following sense. If I apply a very small perturbation — not a random perturbation, but one I can tell you precisely how to construct — then the same network now predicts this image to be a gibbon, a type of ape, with almost 99% confidence, even though to a human eye clearly nothing has really changed in the image.

So why does this happen? We think these networks are actually creating a lot of distortion of the input space. Here is an example from a colleague: if you take your input space and pass it through compositions of functions, then as you increase the number of layers — the number of compositions — you get incredible distortions of the input space. Why does this matter? Because two points that are close to each other in the original input space — an image and something very close to it — can be mapped to places that are very far apart in the output space, and the distortion grows with the depth of the function.
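One standard way to construct the kind of tiny, targeted perturbation described above is the fast gradient sign method; the talk does not say which construction was used, so this is only an illustrative sketch, assuming some already-trained classifier `model` over images scaled to [0, 1]:

```python
import torch
import torch.nn as nn

def adversarial_example(model, image, label, epsilon=0.01):
    """Fast gradient sign method: nudge every pixel slightly in the direction that
    most increases the loss for the true label. The change is invisible to a human
    but can flip the network's prediction with high confidence."""
    image = image.clone().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(image), label)
    loss.backward()
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.detach().clamp(0, 1)

# Usage (assuming `model` is any trained classifier and `x` a 3x32x32 image tensor):
# x_adv = adversarial_example(model, x.unsqueeze(0), torch.tensor([true_class]))
```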
We also see that the distortion actually increases during training. This is some nice work from one of my students: we begin with a two-dimensional input with a few different classes — this is the input distribution we want to model — and what is shown here is the two-dimensional output of the mapping defined by the network as you train it. Over time the network learns to deform the space, mapping the input space in more and more complex ways, and again you see a lot of distortion: points that are nearby in the beginning end up being sent to places that are very far apart. I think this fragility is a huge problem for a lot of applications of this approach: if I am building a classifier to predict whether an image from a pathology slide is malignant or healthy, and a very small perturbation changes my prediction with high confidence, that is clearly problematic — and similarly for genomic prediction examples. There is a lot here that I am not going to get into, but I think this is a very interesting and open area of research, and I am happy to talk more about it if you are interested. Finally, I want to give credit to my students, who did most of the work. [Applause] [Music]
Info
Channel: Institute for Pure & Applied Mathematics (IPAM)
Views: 16,811
Rating: 4.9389315 out of 5
Keywords: computational genomics, genomics, cgsi, ucla, ipam, james zou, deep learning, machine learning
Id: JYt1IqdDAPc
Length: 49min 0sec (2940 seconds)
Published: Fri Aug 04 2017