J.J. Allaire - Machine Learning with TensorFlow and R

Captions
[Applause] Thanks, Jared. Let me see if I can get the mic working... okay, great.

Tonight we're going to talk about machine learning with TensorFlow and R. Some of you may have heard of TensorFlow. It's an open source library from Google that came out a little over two years ago, and from the time it was released I have been very excited about what we could do with TensorFlow from R; hopefully I'll communicate that well to you tonight. I've been working for the past 18 months or so with a couple of other folks from RStudio on building our interface to TensorFlow, and I'll talk a lot about that work tonight. I'll start, though, by giving you an introduction to what exactly TensorFlow is: define it, cover its core components, and give you some intuition about what it might be good for. The principal application of TensorFlow, though definitely not the exclusive one, is deep learning, so I'll talk a little bit about deep learning: what it is, how it works, what it's useful for and not useful for. Then I'll get into the work we've done in R to let you do deep learning and other types of machine learning using TensorFlow.

So first, what is TensorFlow? Some of you might answer, "that's a deep learning library that Google created," and that is true, but it's actually significantly more than that. TensorFlow is a general-purpose numerical computing library; its applications go quite far beyond deep learning, or even just machine learning. TensorFlow is open source, as I said, and one of its most interesting characteristics, from our standpoint in the R community, is that it's hardware independent. We in R have been wrapping numerical computing libraries for a long time: the original S language was motivated by wrapping Fortran numerical computing libraries, we've been wrapping the BLAS libraries you can see listed there, we wrapped the C++ Eigen library. We love wrapping numerical computing libraries in R. The cool thing about TensorFlow is that it's a numerical computing library that runs, and runs well, on all of the hardware we have today, and potentially on future hardware. It runs on the CPU, takes advantage of all the CPU cores, and takes some advantage of parallel processing using SIMD instructions. It runs on GPUs. It runs on a new piece of hardware called a tensor processing unit, which Google came up with: hardware specifically designed to run TensorFlow programs, or TensorFlow models. That hardware independence is really critical. It also supports automatic differentiation, which is used extensively in the deep learning interfaces within TensorFlow. And the whole system was built from the beginning for scale: it was built for deployment, for distributed parallel execution, and to work with very large datasets. So it's a numerical computing library that's really powerful, interesting, and built to scale.

Why should we care? Well, for all the reasons I just talked about: it's a new numerical computing library for us to build beautiful R interfaces to. And there are a couple of other things I didn't mention.
First, the optimization algorithms in TensorFlow: the whole system is built with the presumption that all the data is not going to be in RAM, so the optimizers don't require that all the data is in RAM. Most optimizers we use with machine learning assume all the data is in RAM; that assumption is not there with TensorFlow. The other thing, which I'll get into a little later, is that when you deploy a TensorFlow model you don't bring along any R code or Python code with it; it's deployed against a C++ runtime. Traditionally in R we do exploratory work, we build models, and then when we deploy them it can be a challenge to bring R along. TensorFlow's whole model of deployment is that you don't need to: you just use a C++ runtime for deployment.

All right, so some of the basics of TensorFlow: what is a tensor, what's flowing, how does it work, and what are some uses of it? Tensors: you're actually all already familiar with tensors, and you use them all the time in R. They're just arrays, specifically multi-dimensional arrays. A vector, which pretty much all of us use all the time in R, is a 1D tensor; a matrix is a 2D tensor; and R supports 3D, 4D, and N-dimensional arrays using the array function. A 0D tensor is a scalar, which R does not actually have as a data type, but you can think of a scalar conceptually as a vector that's always of length 1. So tensors are what we already use in R to do most of our computation; that's really nothing new.

Some examples of tensors and how they play out in TensorFlow. One thing I want you to note is that the first dimension is always samples: there are always n of these things that we're considering. When we think about vector data like a data frame, we can convert the data frame to a matrix, and that's a 2D tensor: samples are the rows and features are the columns. That's very common in R. A time series is an example of a 3D tensor. We do represent time series objects in R in 2D, but if you think about it, there are the features and then the steps, like a data frame, and a time series model is actually considering a sequence of observations and how they relate to each other, so the data structure the model considers is really 3D: it's considering multiple steps at one time. Images are an example of 4D tensors. You might think of an image as a 3D data structure, height by width by color channel (red, green, and blue), but in TensorFlow it's a 4D tensor because you have multiple images: I'm going to feed in, say, 10 images, each of which has 3 dimensions, so the whole tensor becomes four-dimensional. All the data you feed into these models is one of these: 2D, 3D, 4D, or, in the case of video, a 5D tensor, because video is just a sequence of images. That's just defining the term; usually when you're actually coding you're dealing with matrices and arrays that stack multiple matrices together.
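To make those shapes concrete, here is a small sketch in plain R; the dimensions are made-up examples, not anything from the slides:

    # 2D tensor: 100 samples x 10 features (what a data frame becomes)
    x_tabular <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
    dim(x_tabular)  # 100 10

    # 3D tensor: 100 series, each with 24 timesteps of 3 features
    x_series <- array(rnorm(100 * 24 * 3), dim = c(100, 24, 3))

    # 4D tensor: 10 images, each 28 x 28 pixels with 3 color channels
    x_images <- array(runif(10 * 28 * 28 * 3), dim = c(10, 28, 28, 3))
    dim(x_images)   # 10 28 28 3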
So what's the flow part of this, what does that represent? TensorFlow models, or TensorFlow programs, are not like ordinary scripts in R or other programming languages. A TensorFlow program is a data flow graph: the tensors, the data, flow through the graph, and the graph has operations in it. The operations are sort of like functions; they're matrix multiplications, or additions of bias terms, or taking a gradient. So the program is represented as this data flow graph rather than as a script, as you might be accustomed to, and I'll get into why that's actually beneficial in a second. When you're actually working in R, though, typically with the high-level interfaces to TensorFlow, you're not building the graph directly. I'll explain this code later, but this is some R code that builds a Keras model, a neural network, and here you're just declaring the layers of the neural network; there's nothing about graphs or nodes or operations. The picture is the actual TensorFlow graph that's created as a result of this model. You don't usually program the graph directly, although you can if you want to: there are interfaces for interacting directly with it, and more innovative applications of TensorFlow might want to take advantage of that.

So what is the motivation for this data flow graph? Why have a graph at all? I can think of a couple of other analogs in R of intermediate representations that make this a little more intuitive. When you write a Shiny application, you have a set of inputs and outputs and reactives; that is effectively a graph of how the inputs and outputs of your application relate to each other, and when Shiny runs your application it can be smart: "this input changed, so I only need to execute this reactive and this output." So Shiny is an example where, by knowing the structure of your user interface as a graph, it can run it more efficiently. When you write dplyr code that works against a relational database, the dplyr code generates SQL; the SQL is an intermediate representation of your query, and that gets fed to a SQL optimizer that runs the query really fast. The motivation for the TensorFlow graph is basically the same: speed, performance, scalability, portability. By getting your model into this graph, it can be run in parallel, it can be run across multiple machines, the operations can be compiled in an intelligent way, like fusing together related operations, and we get the portability attribute where, when you deploy your model, it's just running with C++; it's not running with R or Python. That's really, at the lowest level, what's going on with TensorFlow.
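By the way, you can see that dplyr intermediate representation for yourself. This little sketch has nothing to do with TensorFlow; it just illustrates the analogy, using an in-memory SQLite database (it assumes the DBI, RSQLite, and dbplyr packages are installed):

    library(dplyr)

    con       <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    mtcars_db <- copy_to(con, mtcars)

    mtcars_db %>%
      filter(cyl == 4) %>%
      summarise(avg_mpg = mean(mpg)) %>%
      show_query()   # prints the SQL that would be sent to the database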
So what are people doing with this? I'm going to get into some of these in more detail later, but we have a gallery on our TensorFlow for R website with a bunch of blog posts that write up different applications. People are doing a lot of deep learning with TensorFlow, which might involve computer vision, natural language processing, speech recognition, or the analysis of really noisy time series data. People are also doing conventional machine learning, things like support vector machines, with TensorFlow. So there are lots of things people are doing with it, but because it's a general-purpose numerical computing library there are other applications that you may not have thought of, and I'm actually quite excited for the R community to get their heads around this and figure out what might be possible.

I'll give you one example of this that's already happened in the R community. There's a project called greta, which is similar in aims to BUGS and Stan: the idea is writing statistical models and fitting them with MCMC. But the way greta approaches the problem is that you write R code. Rather than writing in a special language to define the model, you just write regular R code; you can see a comparison of some greta code and some BUGS code. That greta code gets compiled into a TensorFlow graph, and therefore it can train on huge datasets, can train fast on multiple CPUs and GPUs, and can be deployed. greta is written using the R interface to TensorFlow, and it's taking advantage of the underlying numerical computing and automatic differentiation to do something that really has nothing to do with deep learning. So hopefully we'll see many more applications like this.
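To give a flavor of what that looks like, here's a minimal greta-style linear regression, adapted from the kind of getting-started example in greta's documentation; treat the exact arguments as illustrative rather than definitive:

    library(greta)

    # data (ordinary R vectors wrapped as greta data)
    x <- as_data(iris$Petal.Length)
    y <- as_data(iris$Sepal.Length)

    # priors, written as regular R objects
    intercept <- normal(0, 5)
    slope     <- normal(0, 3)
    sd        <- cauchy(0, 3, truncation = c(0, Inf))

    # likelihood
    mu <- intercept + slope * x
    distribution(y) <- normal(mu, sd)

    # compile to a TensorFlow graph and sample with MCMC
    m     <- model(intercept, slope, sd)
    draws <- mcmc(m, n_samples = 1000)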
So now I want to get into what deep learning is, what it's useful for, and whether we should care about it now or later; the motivation for why we'd even spend time learning about doing deep learning with R.

At the very highest level, what is deep learning? Here's an example: a grayscale image of a handwritten digit, and we're trying to build a model that will predict what that digit is. Really, what deep learning is, is this: I have an input domain, like an image, and I have an output domain, a prediction. I have inputs and outputs, observations and predictions, and I'm going to change the representation of that image to get closer and closer to my output domain. It's stacking layers together that progressively transform the representation of the data until it gets close to the output, or the prediction, that you want. That's the fundamental mechanic of deep learning.

So what are layers? Some of you may have seen descriptions in terms of neurons activating other neurons; I don't talk about that at all here. You can think of a layer as just a data transformation function that is parameterized by coefficients, or weights. They're just big functions: a layer might have a hundred coefficients, it does some kind of transformation of the data, and we have to learn what the weights are to do the correct transformation. A layer is a geometric transformation parameterized by a set of weights.

And when I talk about representations, what am I talking about? Here's a really trivial example of how changing a representation gets us closer to the prediction domain, the output domain we're looking for. I have a bunch of points, and I want to predict whether a new point is going to be white or black. By changing the coordinate system I've made the prediction quite trivial, because in the new coordinate system, if x is greater than zero the point is black, and if x is less than zero the point is white. If you do machine learning you might know this as feature engineering, where you transform your inputs, or features, into forms that are amenable to modeling. In a deep learning model the feature engineering is basically learned: we don't do a lot of hand-tuning of features before we feed them into the model; instead, the layers are the feature engineering. The transformations probably aren't this trivial, but they are transformations like this, again to get us closer to the prediction domain.

Going back to the example of the handwritten grayscale digit: as we go through the layers of the model, we discard the raw appearance of the digit and learn filters that let us see, okay, there's shading here, there's an edge there, there's a slanted edge. As we go through the layers (I'll explain a little more how this works later) we go from the appearance of the digit to something closer and closer to the actual prediction, which we get on the final layer. We're distilling the data by stripping out irrelevant information, trying to get closer to the essential thing we're trying to predict.

That is where the term "deep" comes from in deep learning. It's not deeper insight or deeper models; it's layers, many layers. You could equally well call it layered representation learning or hierarchical representation learning; those might be slightly more precise terms. But it's called deep learning because of these layers, and beyond the few layers I showed here, there could be ten, fifty, a hundred layers, depending on the application. Typical machine learning models tend to be feature engineering plus one or two layers; deep learning has many, many layers. That's why it's called deep learning.

And what can we do with this technique, this different approach to mapping inputs to outputs, observations to predictions? Deep learning has accomplished quite a few things in what I would call the perceptual domain: computer vision, image classification, speech recognition, handwriting transcription, some things in natural language processing. Some of these have been composed together to do things like autonomous driving. Deep learning is able to learn very complex functions, functions we haven't been able to learn with previous machine learning techniques. How is this possible? To give you some intuition, think about a crumpled paper ball. Trying to write a linear equation for uncrumpling a paper ball would be really difficult, but think about what a human being uncrumpling a paper ball does: they do 30 or 40 little simple transformations to pull the paper ball apart until it's flat. In a way, that's really what a deep learning model is doing: it learns a sequence of simple transformations that ultimately do something very complex, that learn a very complex function. That's why I think it's worked better on some of these very complex perceptual tasks.

So why should we care about deep learning? I think there are two axes to consider. One is that deep learning plays in domains that we don't do a lot of in R: things like computer vision and speech recognition, plus some reinforcement learning applications, models that learn to interact with their environment to play games or do other things. We don't do much of that in R; we definitely do some computer vision, certainly in geospatial applications and some biomedical applications, but not a ton of it. So being able to use deep learning from R opens up some new problem domains to us. But I think what's more interesting is to ask whether these tools give us improved techniques for the domains traditionally of interest to R users. Is there data we have that has very complex sequence or spatial dependencies that's hard to model, or too noisy to model? Is there data where we have to do a ton of feature engineering, which may be kind of brittle, to get good models out of? Maybe this will provide some techniques for our traditional domains.
Spoiler: it might, and it might not. It's not proven that these techniques are going to revolutionize any of the things we traditionally do in statistical modeling and in a lot of machine learning. It's definitely proven to be very good, way better than what came before, on these perceptual tasks, but elsewhere it's a work in progress. It is work that a lot of people are engaged in, though: a lot of people are asking whether we can use these tools that have done so remarkably well on perceptual tasks to help us with other data analysis tasks. I'll get into more of that later in the talk.

Before I do that, I want to describe a little bit of the mechanics of how deep learning models are trained, go through that handwritten digit example again in more detail, and talk about the training loop, the basic mechanic. Hopefully in doing this I'll convince you that this is actually a really straightforward mechanism that, once it's been scaled up, can do these really remarkable things.

A word first about statistical modeling versus machine learning. I'm not going to get deeply into this topic, but I think it's important to state upfront that statistics is often focused on explanation, on understanding, on figuring out the process by which data is generated, whereas machine learning, and deep learning in particular, is heavily focused on prediction. It's not something that really yields explanations or understanding, and you'll get more intuition about why that is later. There are a number of links here to read more about that, but it is a different way of thinking about data analysis problems than the one many of us who predominantly do statistical modeling are used to.

So let's go back to handwritten digit recognition. Basically, in R we're going to define a model that is composed of a set of layers: certain types of layers, with certain parameters. We're trying to define the right layers that will be able to learn the task, here predicting a digit, and then those layers will be trained to do the representational transformations I talked about before. How does that work exactly? When a deep learning model starts, we have layers and we have weights; the weights are the coefficients we're trying to learn. At the beginning the weights are randomly initialized, so they're complete garbage, and the predictions you get on the first batch of samples are gibberish. So we start training the model with randomly initialized weights, we feed data into the model, and we see how good the predictions are; they're not going to be very good. What we do is measure the quality of the predictions using a loss function. For the data we feed into the model we actually know the answer: we know the image is a four, and we see that the model predicted a five, or said "maybe a four, but probably a five." We compare the predictions to the true targets, generate a loss score, and use that loss to update the weights, to learn the coefficients of the model. That's the job of the optimizer, which translates the loss score into a set of updates to the weights.
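None of this loop is TensorFlow-specific. As a toy illustration of the mechanic just described (predict, score with a loss, nudge the weights), here is plain-R gradient descent on a one-variable linear model; the data and learning rate are made up for the sketch:

    # learn w and b in y = w*x + b by gradient descent
    set.seed(1)
    x <- runif(1000)
    y <- 3 * x + 2 + rnorm(1000, sd = 0.1)

    w <- 0; b <- 0; lr <- 0.1
    for (step in 1:2000) {
      y_hat  <- w * x + b                   # predictions from current weights
      loss   <- mean((y_hat - y)^2)         # loss function: mean squared error
      grad_w <- mean(2 * (y_hat - y) * x)   # gradient of the loss w.r.t. w
      grad_b <- mean(2 * (y_hat - y))       # gradient of the loss w.r.t. b
      w <- w - lr * grad_w                  # "optimizer": step downhill
      b <- b - lr * grad_b
    }
    round(c(w = w, b = b), 2)               # close to the true values 3 and 2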
That's the basic mechanic. We feed through thousands and thousands of batches of data; we start with weights that are random, and we eventually learn weights that cause the model to do what we want it to do. If you think about it, this is a really simple mechanic, and the mathematics behind it are really straightforward. There's a tweet from François Chollet, the creator of the Keras library, saying these are not neurons, they're not even really networks; they're just chains of functions trained with this really straightforward gradient descent algorithm. So it's a straightforward mechanic, but it's done all these incredible things; we've created a mathematical machine for uncrumpling these really complicated paper balls.

So why does this work so well? The ideas behind neural networks were invented decades ago; we've known about all of this for a long time, and it didn't start working well until very recently. It turns out that the key to making these models work well is to have really large models trained on lots and lots of data. The reason we can have really large models is partly some algorithmic advances, but also GPUs and modern CPUs that can be used to train them, and the internet and the digitization of many elements of modern life have created lots and lots of data. Hardware, algorithmic advances, and more data have made these models work.

When I say large, lots of parameters, let me give you some intuition. A simple grayscale digit recognizer, the same one I showed you the code for earlier, has 1.2 million coefficients, 1.2 million parameters, and that's a really simple model: it's just grayscale digits, it's not doing much at all. If I build a model that can recognize everyday objects, a model that can look at color images and say is it a cat, a dog, a table, a chair, a cloud, that's going to have about 138 million parameters, and that model is trained on 14 million images. So when I say large, that's what I mean: not a hundred parameters, 138 million. That's one of the things that enabled this breakthrough: really large models trained on lots and lots of data.

What's happened is that there are frontiers now, especially in computer vision, where we're able to do things we could never do before, and people are really excited about that. We've started to see some frontiers emerge in natural language processing and natural language translation, and people are working in other fields to see whether these techniques apply: can we reach other frontiers, can we find other breakthroughs? Some people are working on time series, and things are going on in biomedical applications. So that's the situation for us: some of the frontiers are already explored and we can benefit from them, and some frontiers are there for us to explore.

It's really important to note that computer vision is what touched off the intense interest in these techniques across computing and data science. There's a competition called the ImageNet challenge that ran from 2010 to 2017, with 3.2 million labeled images (it's a car, it's a cat, it's a dog, it's a chair, and so on), and machine learning researchers competed to see who could build the best models for identifying what's in an image.
When the contest started in 2010, the best accuracy was in the low 70s, I think 72 percent; that was the best machine learning researchers could do in 2010. Then in 2012, one team entered the contest using deep learning and beat the entire field by 10.8 percent. Subsequent to that, everybody started trying to figure out how to do computer vision with deep learning, and in 2017 we were at 97.3 percent. In the span of seven years we went from 72 percent to 97 percent, and that is really what got people so excited, and the reason people are turning over every rock they can, trying to figure out whether these techniques work well for what they're doing.

So people are looking hard at natural language processing. Here's one indicator; it could be indicating a fad, or it could be indicating a fundamental shift. The graph on the right shows the percentage of deep learning papers at computational linguistics conferences, and over that same time period it's gone from the mid 30s up to 70 percent, so there's a huge amount of activity there. These slides will be available (I'll share a link after the talk), and you can find a reference to a paper that does a roundup of some of that work. It finds that progress is definitely being made; the field is not being revolutionized, but progress is being made in a number of areas of natural language processing. One area in particular where huge progress has been made is language translation. Google published, I think in late 2016 or sometime in 2017, their neural machine translation system, which beat their previous phrase-based translation system significantly and is getting closer to human levels of translation quality. So that's another huge win. That's just language translation; there are many other things people do in natural language processing, and people are hard at work trying to figure out where else deep learning can do interesting things with language.

Some of the same things are going on in time series. Honestly, almost all time series analysis today uses the conventional time series statistical modeling tools we have, but in certain areas, with really noisy datasets and things like overlapping seasonality, people feel deep learning models could be better. There's a paper that tries to use convolution, the technique typically used for computer vision, on some time series and compares it to conventional methodologies, and another paper that uses a different approach, an autoencoder, to try to predict financial data, I think stock prices. So there's research and experimentation going on in time series, but this field is not being turned upside down by deep learning. In general, if people are getting wins here, they're probably spending several weeks on their model before getting the win; it's not "let me just try a deep learning model, oh look, it's better." That doesn't happen.

In the biomedical field there's a lot people are looking at, everything from patient classification to treatment recommendations to investigating various biological phenomena, and there's another paper that rounds up what's going on there.
Again, it's not changing the whole field, but there have been lots of promising advances, so it's worth paying attention to if you work in that or a related field.

There's a very interesting study, a paper that just came out last month, that indicates the very different approach deep learning models take compared to conventional statistical modeling techniques. This is a study that Google, Stanford, the University of Chicago School of Medicine, and the University of California San Francisco did together, looking at electronic health records. Those records are very messy: there's all kinds of missing data, the data is not well normalized, it's collected one way at one hospital or in one geography and another way in another. Typically, when you want to build statistical models against electronic health records, you have to curate and normalize the data extensively: you pick out a subset of features you're sure are properly normalized, you impute certain data. It's a very labor-intensive process of picking out the data you're going to analyze so you can actually build a model against it, and in doing that you discard a huge amount of data. All that messy data you discarded is gone; it's not part of your analysis. The notes a physician makes? Gone. What they did in this study was say: let's take all of that data and have the deep learning model consider the entire medical record. We're not going to clean anything, we're not normalizing, we're going to include the physician's notes, and the network is going to do its own normalization, its own culling of the data, and see if it can learn a better model. In this study they got predictions that were better than the traditional predictive models they tried. Those baselines could be a bit of a strawman, but they did find better predictions. If you look at who's on this paper, one of the authors is the architect of TensorFlow, so it's not as if just anybody can get these kinds of results; I think Google tried pretty hard to get them. But it may be that this is an example of a technique we'll gain a broader understanding of how to apply, and it could change the way we do certain types of prediction.

As you'd imagine, there are many problems with deep learning models. I referred to one, which is that they are black-box models: with 138 million coefficients, you can't look at the coefficients and say "now we understand this phenomenon much better." They're total black boxes. They can also be very brittle. If you Google for adversarial examples, there's one you'll probably find which is a picture of a panda next to another panda; they look identical, but one of them has been doctored to trick the image classifier into thinking it's not a panda, by changing little things that would never fool a human brain. The human visual system has a far more sophisticated set of mechanisms at work; to us it's obviously a panda, it's not even a question. But these models can be fragile, so you can trick them, and there are approaches being pursued to overcome adversarial examples. Think of autonomous driving: you want to make sure there isn't an adversarial example of a road sign that gets misinterpreted. So this is a serious problem for the field, but it's out there.
Deep learning models also typically require a lot of training data to perform well. There are ways to do what's called transfer learning, where you take the knowledge from one model and reuse it on a smaller dataset (I'll talk a little about that later), but mostly they require a lot of data, and they're very computationally expensive to train. So there are a lot of problems and a lot of reasons they don't work well for certain things.

One of the biggest problems is actually the hype that has emerged around these techniques, driven by some of the big successes in perceptual tasks. The current generation of tools makes deep learning accessible to people who've never done statistics or modeling, so they can build a model and be really excited about it, and it's "deep" so it must be good, even when it's way more expensive to train than a statistical model and way worse in accuracy. We're all going to be trolled by this in the years ahead. On the one hand we could say the deep learning stuff is snake oil, don't pay attention to it, it probably doesn't work; or we can try to have a more balanced dialogue about where it can be used and how it can be made better. I think that's the thing to do, because in spite of the problems it's really useful. You can see at Google, on the chart here starting around Q3 of 2014, that the use of this inside Google has skyrocketed; they've found uses for it in many products and many applications. So I think we have to help deep learning find its appropriate place in the domains we work in, rather than saying there's so much wrong with it that it's not worth spending any time or bandwidth on.

So now I want to talk about the work we've done in R to let you build deep learning models and use TensorFlow from R. We have some very high-level R interfaces, and you've seen a little bit of that already; we have low-level interfaces to enable new applications like greta; and we have some other tools to help with your workflow. We've also invested quite a bit in educational resources so you can learn more about how this stuff works.

There are three sets of high-level APIs to TensorFlow from R. There's the Keras API, which is the high-level interface for neural networks. There's the Estimator API, which covers classic machine learning, things like linear regressors and linear classifiers; those APIs look almost like R modeling functions. And there's the Core API, which is low-level access to the graph, which was used for greta and can be used for other innovative applications and research. This ends up as seven R packages that give you interfaces to TensorFlow: one for each of those interfaces, a package called tfdatasets that deals with giving TensorFlow models access to very large datasets, and some supporting tools for deployment, experiment management, and cloud training. I'll talk a little bit about all of this in a minute.
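If you want to try this yourself, getting set up looks roughly like the following. This is a hedged sketch of the usual install path for the CRAN keras package (which pulls in the tensorflow package); check the installation docs on the TensorFlow for R website for current details and GPU options:

    install.packages("keras")   # also installs the tensorflow R package
    library(keras)
    install_keras()             # installs the Keras + TensorFlow backend
                                # (options exist for GPU and conda/virtualenv installs)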
The Keras API is the one we'll talk about quite a bit today. It's the high-level API: you just say "these are the layers I want," and then you fit your model from there; I'll drill into this more later. That's really the preferred API. Then there's the Estimator API, which looks like traditional R modeling: you say "I want to create a linear regressor" or "a linear classifier." You can actually do neural networks with it, but they're specified at a really high level; you say "I want six layers of this width." It's nothing like the granularity you get with Keras. So the way to do neural networks is with Keras, and the way to do the more straightforward models is with Estimators. And then the Core API lets you compute directly on the TensorFlow graph, at a much lower level: you can see in this code that I'm defining my loss function explicitly and writing my training loop explicitly. I don't recommend people use this interface unless they want to interact directly with the graph or do things like greta, but we have it for people who want to use it.
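To give a sense of what that lower level looks like, here is a rough sketch in the style of the graph-era (TensorFlow 1.x) Core API that the R tensorflow package exposes via `tf$`. The tiny linear model and variable names are mine, and current TensorFlow 2.x has replaced the Session/placeholder style shown here with eager execution, so treat this as illustrative only:

    library(tensorflow)

    # a tiny graph: explicit predictions, loss, and training op
    x      <- tf$placeholder(tf$float32, shape(NULL, 1L))
    y_true <- tf$placeholder(tf$float32, shape(NULL, 1L))
    W      <- tf$Variable(tf$zeros(shape(1L, 1L)))
    b      <- tf$Variable(tf$zeros(shape(1L)))
    y_hat  <- tf$matmul(x, W) + b

    loss     <- tf$reduce_mean(tf$square(y_hat - y_true))
    train_op <- tf$train$GradientDescentOptimizer(0.1)$minimize(loss)

    sess <- tf$Session()
    sess$run(tf$global_variables_initializer())
    for (step in 1:1000) {
      batch_x <- matrix(runif(100), ncol = 1)   # made-up data: y = 3x + 2
      batch_y <- 3 * batch_x + 2
      sess$run(train_op,
               feed_dict = reticulate::dict(x = batch_x, y_true = batch_y))
    }
    sess$run(list(W, b))   # learned weight and bias, close to 3 and 2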
So let's talk a little more about Keras, step through an example, and cover some of the components of the Keras API. Before I do that, a little motivation for why Keras. Google is now emphasizing Keras, especially in the last six or nine months, as the preferred interface to TensorFlow. On the left is web search interest in deep learning frameworks, and you can see that in the last year or two the interest in TensorFlow and Keras has taken off. On the right, from about a month ago, is data from arXiv, where research in deep learning is generally published: you see a lot of TensorFlow, but you also see a huge amount of Keras. It's interesting that Keras is this really popular, easy-to-use API for practitioners, and it's also used quite a bit by researchers, so it's quite flexible and adaptable. That's why we put a lot of energy into the R interface to Keras.

I'm going to walk you through a Keras training script and its components, for the application of recognizing grayscale images of handwritten digits. The first step is always going to be preprocessing: taking your data, whether it's sitting in a CSV file or a directory full of images or sound files or text documents, and somehow getting it into tensors. You turn whatever your raw data is into vectors, matrices, 3D arrays, and so on. Then there's a step where you define the model, by saying "these are the layers I'm going to use." The model definition looks like "I just have to type in these five layers," but in practice, to get a good model, you're going to play with those layers hundreds of times. It's not a lot of code, but it is a lot of effort to get a model that works well; which layers, stacked how, with what parameters, really determines the quality of the model. After you've defined the model, you compile it, by saying what loss function you want to use, what optimizer you want to use, and what metrics you want to collect to judge how your model is performing.

I want to make one comment about this compile statement. Notice that I'm not assigning the result of that statement back to the model: I'm modifying the model object in place, and that is not typically how things are done in R. Typically R objects are passed by value: you pass an object in, and if changes are made to it, a copy is returned. Keras models don't work that way, because they're actually complex acyclic graphs of layers; layers can appear in more than one place in a model, and those representations are shared. So they're objects that are modified in place: I call compile on the model, and that actually changes the model. You may not quite understand how this code works unless you know that piece about the objects being modified in place.

All right, so we've defined our model and compiled our model, and now we're going to train it by calling fit. Fit takes the data, the x and y, and then a few other parameters. One says how many samples at a time to give to the model, in this case 128 samples at a time. Another says how many times to traverse the input dataset; I have to look at the data many, many times over to get a good model, so here I'm saying scan through all of the data ten times. The other thing I specify is to hold out 20 percent of the data for validation, because if a deep learning model is big enough, it will just memorize your data if you let it, and a model that memorizes is not a model that generalizes well. So we hold out 20 percent and constantly test the model against the held-out data to make sure it's not just memorizing the input data.

When we fit the model, notice that we get back a history object, and we can plot the history to see how accuracy and loss changed over all the epochs of training. This is a really good model, because accuracy and loss are converging and the validation accuracy is the same as the training accuracy, so we're generalizing well. A lot of times it looks nothing like this: you'll see overfitting, you'll see your validation metrics getting worse even as your training accuracy gets better. So this is a diagnostic tool you use to figure out how well training is going and how well your model is working. When you're done with that, there's an evaluate step, using yet another bit of held-out data that the model has never seen and that you haven't used for tuning parameters, which you can use to assess the model. And then you can generate predictions, just as you normally would from a model in R: there's a predict method, and there's predict_classes.

Let me give you a quick demo of how that looks inside RStudio. This is the same training script. I start by loading the MNIST dataset of grayscale digit images, which comes with Keras; I just load that up, then I take the data and reshape and scale it in various ways until, as you can see, I've got a few matrices that I'm going to feed into the model. Then I define the model and compile it, and I can print a summary showing the layers and how many parameters the model has, about 235,000, so it's a less complex model than the other one I showed you. And then I fit the model.
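Pulled together, the script I'm describing looks roughly like this. It's a sketch in the spirit of the standard keras-for-R MNIST example; the exact layer sizes and hyperparameters here are illustrative rather than a transcription of the slides:

    library(keras)

    # preprocessing: 28x28 images -> vectors of 784, scaled to [0, 1];
    # labels -> one-hot matrices
    mnist   <- dataset_mnist()
    x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 784)) / 255
    x_test  <- array_reshape(mnist$test$x,  c(nrow(mnist$test$x), 784)) / 255
    y_train <- to_categorical(mnist$train$y, 10)
    y_test  <- to_categorical(mnist$test$y, 10)

    # define the model: a stack of dense layers with dropout
    model <- keras_model_sequential() %>%
      layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
      layer_dropout(rate = 0.4) %>%
      layer_dense(units = 128, activation = "relu") %>%
      layer_dropout(rate = 0.3) %>%
      layer_dense(units = 10, activation = "softmax")

    summary(model)   # roughly 235,000 parameters

    # compile: choose loss, optimizer, and metrics (modifies the model in place)
    model %>% compile(
      loss      = "categorical_crossentropy",
      optimizer = optimizer_rmsprop(),
      metrics   = c("accuracy")
    )

    # fit: batches of 128, 10 passes over the data, 20% held out for validation
    history <- model %>% fit(
      x_train, y_train,
      batch_size = 128, epochs = 10, validation_split = 0.2
    )

    plot(history)                        # accuracy / loss over the epochs
    model %>% evaluate(x_test, y_test)   # final check on held-out test data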
As I fit the model, you see real-time metrics, and that's really important, because a lot of times training doesn't go well: you start to see the validation accuracy get worse, or the model plateaus. It's really important to get real-time feedback about how the model is doing as it trains, because usually you'll then interrupt it and try something else. Once you get the history back you can plot it, which gives you that same plot, or you can get a data frame back and do your own plots. Evaluation works as I showed on the slide, and then we generate some predictions. So that's the basic workflow. This is a pretty simple Keras training script; they can get a lot more complicated than this, but it demonstrates the basic mechanics of training. This model actually performs at about 98 percent accuracy on handwritten digits, which is pretty good for such a simple and straightforward model.

Okay, so getting a little more into the Keras API. We've talked about layers; the art of building these models is figuring out which layers to use and how to compose them. There are actually 65 different layers built into Keras that do different things (I'm not going to enumerate them all), and you can also write your own custom layers. Just to give you a sense of the categories of layers, and a flavor of the differences:

Dense layers are the classic, 1970s-intro neural network of perceptrons: one layer of fully connected units connected to another layer, connected to another. They're the staple of neural networks, and even more complex models will typically have dense layers at the bottom; they're the workhorse, basic neural network layer.

Convolutional layers are typically used for computer vision. The idea behind them is that there are patterns in an image, or in some other kind of data structure, that are location or translation invariant, like edges or shadows, or at a higher level, eyes. If you see an edge or a shadow or an eye in one part of the image, you should be able to recognize it in other parts of the image. Convolutional layers learn these filters, and the filters are then slid over the whole image to look for those patterns: once a filter has learned to find an eye, it can recognize it whether the eye is in the upper right-hand quadrant or the upper left-hand quadrant. These are used a lot for finding patterns in computer vision applications, but you'll also see people using them for time series and that sort of thing.

Recurrent layers are layers that can maintain state. If I'm looking at the current set of input data, I'm trying to make a prediction based on it, but in the case of sequence data, time series data or text, it also matters what I've seen before. Recurrent layers can accumulate, can learn, a vector of state information, and that's used to condition the predictions passed to the next layer. Recurrent layers are used in time series and in a lot of natural language processing applications.
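As a hedged illustration of how those layer types look in the R interface (dense layers already appeared in the MNIST sketch above), here are two toy model definitions; the shapes and sizes are arbitrary:

    # a small convolutional model for 28x28 grayscale images
    conv_model <- keras_model_sequential() %>%
      layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                    input_shape = c(28, 28, 1)) %>%
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%
      layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%
      layer_flatten() %>%
      layer_dense(units = 10, activation = "softmax")

    # a small recurrent model for sequences of 24 timesteps with 3 features each
    seq_model <- keras_model_sequential() %>%
      layer_lstm(units = 32, input_shape = c(24, 3)) %>%
      layer_dense(units = 1, activation = "sigmoid")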
Embedding layers will be familiar to those of you who do natural language processing. The idea is that you could treat text as just a classification: I have the word "dog" and the word "cat," and they're just two different words among 10,000 different words. Or you could learn a representation of each word that reflects the fact that there are semantic relationships: cat and dog are both house pets, they're both mammals. Embedding layers try to learn richer representations of categorical data, in this case text. Embedding layers can learn those vectorizations as you train, or you can take pre-trained word embeddings that were developed using other techniques, like word2vec, and load them into embedding layers. So that gives you an idea of just a few of the layer types there are, and really the trick of building these models is to recognize what kind of problem you're solving and what kinds of layers, and what composition of those layers, will make sense.

The next step is compiling the model, and I've talked about this a little before, but what's really happening in Keras when we compile a model is that we're converting the layers into a TensorFlow graph. Once the layers are converted into a graph, we take the loss function and add it to the graph, we take an optimizer and add that to the graph, and then we say what metrics we want to collect and add those to the graph. Compiling is really just building the graph we're going to execute. The Keras API has a whole bunch of loss functions, and you can write your own loss functions, as you might expect; different tasks require different loss functions. It has a bunch of different optimizers, some of which work better for different types of data and different tasks. And it has a bunch of built-in metrics, and as you'd expect you can write your own metrics that let you evaluate the efficacy of your model as you're training it.
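For example, writing your own metric looks roughly like this; `custom_metric()` and the `k_*` backend functions come from the keras R package, but treat the exact snippet as a sketch rather than something lifted from the slides:

    # a custom metric written with Keras backend ops: mean absolute error
    metric_mean_abs_error <- custom_metric("mean_abs_error", function(y_true, y_pred) {
      k_mean(k_abs(y_true - y_pred))
    })

    model %>% compile(
      loss      = "mse",
      optimizer = optimizer_rmsprop(),
      metrics   = list(metric_mean_abs_error)
    )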
All of what I just talked about is articulated much better in the Keras for R cheat sheet. You don't need to get this link down, because I'll give you a link to the slides, and it's also easy to find on the TensorFlow for R website. The cheat sheet gives you a great 10,000-foot view of everything I just covered, plus the syntax for it.

So what are some examples of actually using R to solve problems with the Keras tools I just described? We have a gallery on the TensorFlow for R website, and the gallery is a bunch of in-depth blog posts that talk through problems: the motivation, the dataset, how the model was trained, what went right, what went wrong, intermediate data visualizations for diagnostics, speculation about how to improve it. They're really in-depth posts about different application areas, all using the Keras for R interface. Once you learn the basics, this is a great place to go next to learn about deeper applications of the techniques.

The first one I want to mention uses the idea of transfer learning, where you can train on a very small dataset and still get a model that performs really well. It's an image classification example that tries to predict whether an image is a dog or a cat, and it was trained on only 2,000 images. The reason it could do that is that it reused a chunk of a model that was trained on a much larger image dataset, a model that has a lot of knowledge about shapes and objects related to dogs and cats, and that also knows about other things: vehicles, household appliances, and so on. So we were able to build a model on only 2,000 input images using this transfer learning technique, and the article describes how transfer learning works.

There's another article that gives a primer on the different ways you might do time series forecasting; it uses weather time series data, trying to predict temperature. It uses the recurrent layers I mentioned, including stacking recurrent layers, and something called bidirectional recurrent layers that look at the data both forward and backward; so it's a good primer on techniques you would use for time series.

There's a really interesting example of classifying peptides for cancer immunotherapy: when we introduce cells to treat cancer, how are they going to bind to existing cells? It's a classification problem, looking at the peptide and predicting how the cell will bind, and it compares a regular feed-forward neural network against a random forest, and also tries a convolutional model, which ends up not being better, so there's some discussion of what else could be tried to get a better result.

There's an example looking at a dataset of about 300,000 credit card transactions, only about 400 of which are fraudulent, so a typical logistic regression might have trouble picking up enough signal from those 400 fraudulent transactions. This one uses a technique called an autoencoder as an alternate way to figure out which transactions are fraudulent.

There's a natural language processing example that takes a dataset of 400,000 pairs of questions from Quora and tries to figure out whether they're duplicates; it also demonstrates a Shiny application as a front end to a Keras model, which is actually quite straightforward to build.

Another example uses a basic feed-forward neural network to predict customer churn, and it demonstrates a couple of other interesting things, including a package called lime. While we can't look at a deep learning model with millions of parameters and say "this is how this phenomenon works," you can take individual predictions and figure out what features contributed to each prediction. The post includes a dashboard, also built with Shiny, that shows, for a given customer, their percent risk of churn, and then which attributes of that customer contributed to that prediction.

And finally there's an example of learning word embeddings, which I talked about before: it uses a database of Amazon fine food reviews and tries to learn a richer, vectorized representation of words than just treating words as categories, which can then be used later for things like parsing and sentiment analysis.

I want to address this issue of explainability a little more, because again, these models have really no explanatory power when you look at the coefficients, but there are ways to try to understand how an individual prediction was made.
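As a very rough sketch of the lime approach mentioned above, usage looks something like the following. This assumes a version of lime that knows how to talk to Keras models (the original churn post wired that up with a couple of small helper methods), so treat it as illustrative only; `x_train`, `x_test`, and `model` stand in for your own data and fitted model:

    library(lime)

    explainer   <- lime(x_train, model)
    explanation <- explain(x_test[1:4, ], explainer, n_labels = 1, n_features = 4)
    plot_features(explanation)   # which features pushed each prediction up or down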
Here's an example with a model that's predicting what's in an image, and it's saying it's an African elephant. You can look at the gradients of the intermediate layers in the model, draw a heat map from them, and juxtapose that heat map over the image to show which parts of the image actually contributed to the prediction. It's similar to what I talked about with lime, where for an individual prediction you can say which features contributed to it. When you're using these models to do things like approve or deny credit, or make a medical recommendation, it's really important, even though the model itself functions as a black box, to be able to highlight the reasons, the things about the situation that caused the model to go one way or the other.

So, back to these deep learning frontiers. There are many fields of inquiry where I think people are going to take a run at improving the state of the art using deep learning. In most fields we work in now, unless you're in computer vision or maybe natural language processing, traditional statistical methods are going to be cheaper, more accurate, more reliable, and easier. Yet there are these frontiers, and that's very compelling for a lot of people: could we do better, could we analyze data that we're not currently able to analyze? I think you can take a few different approaches. You can take a wait-and-see approach and say, when it's really obvious that people in my field are using deep learning a lot, I'll use it then. You can follow research, and not just formal research: lots of people on the internet are experimenting and writing blog posts, and you can follow that, evaluate it, and decide whether you want to try it in your work. Or you can do your own research, really get your head around how all this stuff works, and be one of the people trying to find that frontier. I want to highlight again that in a lot of fields, other than these staples, it's really not there yet, and if you go looking for the big wins they might be really hard to find.

What does it look like if you are trying to approach that frontier? Typically, when people succeed with building a deep learning model, it can take a matter of weeks to try all the different architectures and hyperparameters required to get the model to work well. When people start using these tools they don't have an intuition, and that intuition is developed over time. You tend to need a lot of compute power, and you tend to need tools that help you manage experiments and get the most out of the compute power you have. So it's hard, and in many ways the wisest thing may be to wait until you see people in your field succeeding with it before getting involved. But it's really interesting stuff to play with and try out, so I think a lot of people will be tempted to spend the time on it, and I think a lot of good things will come out of that.

We've built a bunch of supporting tools to help with this.
Convolutional and recurrent models are really slow on CPUs, and there are a lot of different ways to use GPUs. If you have a recent high-end NVIDIA GPU on your local workstation you can use that, but the vast majority of people don't. You can use GPUs in the cloud in various ways, which is probably the easiest route, or you can buy a box with a GPU in it; that's another way to go. The cloud is a great way to do this because you can work locally on your laptop and then, when you need a lot of training compute, get it from the cloud. We've done a couple of things to enable that. We have an RStudio Server with TensorFlow image on the Amazon marketplace; it's free in the sense that you only pay for the Amazon compute cost, and it has everything pre-installed: RStudio, TensorFlow, the NVIDIA libraries, the tidyverse. So it's a great way to play around with a GPU with very little fixed-cost investment. There's also a cloudml package that lets you do training on Google Cloud ML using their GPUs; that's more of a batch-submission type of service. So there are different ways to use GPUs from R, and the TensorFlow for R website covers them in detail. We have a package called tfruns; I'm not going to spend very much time on it, but it's used to manage training experiments. At the highest level, you're going to run hundreds of training runs trying to figure out which hyperparameters and which model structure are best, so instead of sourcing the R script that runs the training, you call a training_run() function, and we collect data on every run: the outputs of the run, the metrics, the model, the source code. That gives you a way of managing all your runs; you can view reports on the runs, compare different runs, and establish flags for runs and use those for grid searches. The tfruns package is really good once you get past the learning stage and you're actually trying to tune a model. The tfdeploy package has tools for deploying models. As I said, when we deploy a TensorFlow model there's no R or Python code, it's just C++ code, so we have tools for working with that. I won't go into a lot of detail, but the process is that you export your model, and once you have an exported model you can serve it with a web server that gives you a REST interface; you can serve it using an open source project from Google called TensorFlow Serving, with RStudio Connect, or with Cloud ML. It's a binary, language- and runtime-independent format that can be served in lots and lots of different ways, and you can put these models inside a Shiny application pretty easily. One of the cool things is that this idea of model serialization that isn't tied to a language means these models can run not just on servers but on embedded devices. Apple iOS has a thing called Core ML, and you can feed Keras models directly to Core ML, so you can use R to train a Keras model and then deploy it on an iPhone with no R on the iPhone. There's a library called Keras.js that lets you deploy these models as JavaScript in the browser.
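A minimal, hedged sketch of that workflow, assuming a hypothetical training script train.R that declares its hyperparameters with tfruns flags and that `model` is the trained Keras model you want to deploy:

```r
library(tfruns)

# Run the training script once; tfruns records the metrics, output, and source
training_run("train.R")

# Try a small grid of hyperparameter flags (train.R reads them via flags())
tuning_run("train.R", flags = list(
  dropout = c(0.2, 0.3, 0.4),
  units   = c(32, 64, 128)
))

ls_runs()         # list the recorded runs
# compare_runs()  # side-by-side comparison of runs

# Deployment side: export the trained Keras model (`model` is hypothetical here)
# to the SavedModel format, then serve it locally with a REST interface
library(tfdeploy)
export_savedmodel(model, "savedmodel")
serve_savedmodel("savedmodel")
```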
There's a lot going on with taking these models and making them deployable in lots of places, and that's really interesting, so we'll try to build tools that keep up with that and let you do all of it easily from within R. The biggest things to take away: yes, TensorFlow is a deep learning library, but it's also a hardware-independent, really fast numerical computing library that we can do a lot with in the R community. Deep learning has done a lot of great things with perception, and I think it's going to make an impact in some fields, some sooner, some later, some not at all. And probably most importantly, R now has the best deep learning tools out there, Keras and TensorFlow; we have a great set of APIs for them and a bunch of supporting tools. So if you want to get involved in this stuff and you use R, there's really wonderful support for that now. A lot of what I talked about tonight you can find at the TensorFlow for R website, the easy-to-remember tensorflow.rstudio.com; I'm going to give you a link to these slides, so you'll have that link there. That's a good place to start, and the gallery I talked about and the documentation for all the packages are all there. As Jared said, there are two books I'd recommend reading if you're interested in this, and they're very different books. The Deep Learning book on the right is the definitive work in the field in terms of the concepts and theory behind deep learning; it has no code in it at all, it's all concepts and math. The Deep Learning with R book was written by the creator of the Keras library, and I worked with him; my name is on the cover, but I didn't write any of the prose. He wrote the original Deep Learning with Python, which used Keras, and I adapted it to the R interface to Keras, so it became Deep Learning with R. That book is a really good conceptual introduction to deep learning and has no math, all code; everything is done with R. So one has no math and all code, the other has no code and all math. Depending on how you learn and how you like to approach problems, read them in whatever order you think is appropriate, but if you read those two books you'll know just about everything you need to get started. So thank you very much. The slides I just went through are available at the URL here, and I'll keep the slide up. If you're interested in this we have a blog, the TensorFlow for R blog, where we post announcements about packages and other software releases; we also post all those gallery entries and examples there. So if you want to stay up to date and see what's happening and what kinds of interesting examples people in the community are coming up with, I'd urge you to visit and subscribe to the blog. Thank you very much. [Applause] Questions? [Audience question] Yeah, TensorFlow uses CUDA and cuDNN under the hood, and typically what you do is run NVIDIA's diagnostic tools to see what your memory and GPU utilization is. You can't interact directly with CUDA, TensorFlow is doing that, but you can see whether the model is taking advantage of the GPU or not.
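For what it's worth, a minimal way to do that kind of check from an R session (a hedged sketch; nvidia-smi is NVIDIA's standard diagnostic CLI, and shelling out to it is just one convenient option):

```r
# Show current GPU memory use and utilization while a model is training
system("nvidia-smi")

# The tensorflow R package can also confirm whether TensorFlow sees a GPU
# (API of the TensorFlow version current at the time of this talk)
library(tensorflow)
tf$test$is_gpu_available()
```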
One of the things that can happen is that if you get a model training really fast, the bottleneck can actually become ingestion of the data, and that's one of the things the tfdatasets package addresses: it tries to parallelize reading and pre-processing the data so it happens while you're training. Otherwise you can have a model that's only using 20% of the GPU because all the time is spent in pre-processing and you're not getting data to the GPU fast enough. [Audience question] Yeah, so the autoencoder is learning an alternate representation of the data, and I think the way that one worked is that it took the non-fraudulent transactions and auto-encoded them into another representation; then it fed data through, looked at the prediction error, and said that high error indicates the transaction is probably not like the things it learned to encode in the first place, and is therefore likely fraudulent. [Audience question] Unfortunately I think a lot of it is like witchcraft; a lot of it is really people just trying stuff, and there isn't much theory. In fact I've heard talks from people at Baidu saying the way they train people is to give them hundreds of examples, like a cockpit simulation, and they just try things. In a given field it might get narrower than that: in computer vision, it's like, okay, these kinds of layers with these kinds of parameters tend to work well; but for something like this I don't think there's a theory or a literature that says this is how it's done. There probably will be, if there's anything to find here, in a few years, but there isn't right now; it's bleeding edge. [Audience question] That's right, it should be exactly the same, because it's really just using all the same libraries. [Audience question] I think the key selling point is that you like using R for data analysis, you like the whole R ecosystem, you like working in RStudio; it's really that you prefer R, and now you can do all the same things in R that you can do in Python. If you already prefer Python there's not a clear motivation, because it's really the exact same library, just with a Python skin here and an R skin there. [Audience question] Yes, in fact Google has a thing called AutoML which literally starts from no model and learns the whole model: the layer diagram, the parameters, everything. I think they have a service called AutoML now that does some variation of that for computer vision. It's probably the future: all the machine learning engineers sitting and trying grid searches and witchcraft of this and that is probably not the best way to approach this; it's probably some combination of the intuitions you develop over time and machines searching the space of models. [Audience question] Yes, Keras has a thing called Keras applications, which are pre-trained models; you'll see them under that name. Those are almost all computer vision models, there are a bunch of them, and they're the easiest to use. TensorFlow also has, I don't know if it's called a model zoo, but a big collection of models; those are honestly harder to use from R than the Keras models.
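As a small, hedged illustration of those pre-trained Keras applications, following the feature-extraction pattern used in the cats-and-dogs transfer learning example mentioned earlier (the 150x150 input size and the two-class sigmoid output are just illustrative assumptions):

```r
library(keras)

# Load VGG16 pre-trained on ImageNet as a convolutional base, without its classifier head
conv_base <- application_vgg16(weights = "imagenet", include_top = FALSE,
                               input_shape = c(150, 150, 3))
freeze_weights(conv_base)   # keep the pre-trained weights fixed

# Stack a small classifier on top; only the new dense layers get trained
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)
```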
Keras has ported most of the interesting models, so yes, with Keras there's something similar; I don't know whether it's comparable in depth or breadth to the model zoo that Caffe has, but it's there. [Audience question] That's right. It's a good question. When you're writing a custom layer, well, if you're just composing layers, really you just have to train the model and see what happens. If you're writing a custom layer or a custom loss function or a custom metric, then you're basically writing the same sort of code you would write in the lower-level interface, just inside a custom loss function or metric or optimizer. The debugging of those I think will get better: there's a thing coming in TensorFlow called eager execution. One of the problems with the graph is that everything is deferred, whereas in R nothing is deferred: you write a matrix multiplication, it happens, you see the result. Eager execution will give you the same semantics, so when you're building a custom Keras layer or loss function you'll be able to call it like a function, test it, and see what it does. So I'd say building custom things is very similar to working in the lower-level interface. [Audience question] I didn't show this, but Keras has a concept of a multi-input model, so you can have a model that takes in some vector data, some image data, and some sequence data. I showed you the sequential model, which is just a stack of layers and is the most common case, but you can build a graph of any level of complexity. You'll typically have multiple inputs, each gets processed down to a certain point, and then you concatenate them in some way, maybe weighting the concatenation; you get all the data, flatten it, and concatenate it together. That's how you do that. [Audience question] It's one model with multiple inputs. I showed you feeding input data, like here's the x data; instead you can pass a list, so you can have a model with three heads: this one takes an image, this one takes some vector data, and this one takes a sequence of text, and you feed them in as a list. You can actually have multiple outputs as well, so models can be multi-input and multi-output. Say you have a single output: at some point those branches are going to get concatenated together somehow, weighted or otherwise combined. What typically happens is you standardize them, maybe just flatten them, so a 2D or 3D tensor becomes a bunch of numbers, concatenate everything into a big soup of numbers, and feed that through the dense layers after that. It's not like you're writing a program that very explicitly normalizes them and puts them together; you just slam them together, put some weights on them, and let the network see whether that lets it build good predictions. [Audience question] Can't hear you. You're asking about computer vision?
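A hedged sketch of that multi-input pattern with the Keras functional API (the input shapes, layer sizes, and single sigmoid output are illustrative assumptions, not from the talk):

```r
library(keras)

# Two heads: an image branch and a plain vector branch
image_input  <- layer_input(shape = c(64, 64, 3), name = "image")
vector_input <- layer_input(shape = c(10), name = "vector")

image_features <- image_input %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_flatten()

# Concatenate the flattened branches into one "soup of numbers", then dense layers
combined <- layer_concatenate(list(image_features, vector_input)) %>%
  layer_dense(units = 64, activation = "relu")

output <- combined %>% layer_dense(units = 1, activation = "sigmoid")

model <- keras_model(inputs = list(image_input, vector_input), outputs = output)

# Training feeds the inputs as a list, in the same order as `inputs`:
# model %>% compile(optimizer = "rmsprop", loss = "binary_crossentropy")
# model %>% fit(list(image_data, vector_data), labels, epochs = 10)
```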
In computer vision I think people are using these kinds of models, I don't want to say exclusively, but it's really the predominant way people build computer vision systems now; it wasn't five years ago. That's not necessarily the case in natural language processing, where it's still mostly traditional techniques and people are experimenting with this. [Audience question] I don't think so, from what I can see, and people are trying really hard. I think deep learning performs really well with really messy data: data that's hard to encode, hard to feature-engineer, and has a lot of noise in it. That's the kind of data it does well on. But if the data is well-structured, well-understood vector data, it's almost impossible to beat the tools we already have, and deep learning doesn't really add much in those cases. [Audience question] Oh, that, yeah: people play with that validation split. Depending on the type of data and the type of model you're building, how much data you need to hold out varies. [Audience question] I don't know, I'm not sure. [Applause]
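For reference, the hold-out mentioned in that last answer is the validation_split argument to Keras's fit(); a minimal, hedged sketch assuming a compiled model and in-memory training data:

```r
library(keras)

# Reserve the last 20% of the training data for validation; the right fraction
# depends on how much data you have and how noisy it is
history <- model %>% fit(
  x_train, y_train,
  epochs = 20,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)   # training vs. validation metrics by epoch
```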
Info
Channel: Lander Analytics
Views: 890
Rating: 5 out of 5
Keywords: rstats, nyhackr, rstudio, tensorflow, keras, machine learning
Id: eryaBs2dJzk
Length: 74min 17sec (4457 seconds)
Published: Fri Feb 16 2018