Machine Learning with R and TensorFlow

Reddit Comments

This was an awesome talk. It's rare that I see a deep learning talk that's not sensationalized, but JJ did a killer job.

👍 8 · u/coffeecoffeecoffeee · Feb 12 2018

Full day 2 stream is available here:

https://youtu.be/Ol1FjFR2IMU

For the folks who clicked on comments because they saw ML:

2:01:54-ish

Michael Quinn and Sina Chavoshi from Google talk on TF/CloudML/BigQuery.

2:19:24

Javier Luraschi from RStudio talks about TFDeploy, deploying TF models to RStudio Connect, and KerasJS

2:40:53-ish

Kevin Kuo from RStudio talks about Spark ML machine learning pipelines in sparklyr.

2:59:14

Ali Zaidi talks about reinforcement learning and Minecraft

Day 1: https://youtu.be/ogy7rHWlsQ8

👍 5 · u/scottmmjackson · Feb 13 2018

I'm currently into Deep Learning with R, by JJ Allaire and François Chollet.

The book is excellent. I know nothing about deep learning, or even machine learning in general, and it provides both an introduction for newbies and an in-depth presentation of many concepts. I'm going to read it from start to finish. Highly recommended.

👍 4 · u/MarcelChafouin · Feb 13 2018

great talk β€” very approachable and honest. i particularly like that he de-hypes things and yet gives a very good overview of what is available. good times ahead for the R community.

👍 3 · u/iconoclaus · Feb 13 2018
Captions
Thanks very much, Hadley, and thank all of you for coming to the conference. I hope you've had a great first few days, and a great night last night. It's a great privilege for us at RStudio to be here with all of you this week.

Today I'm going to talk about machine learning with TensorFlow and R. For those of you who don't know, TensorFlow is an open source project from Google that came out a little over two years ago, and from the day it came out I have been excited about what we could do with it from R. For roughly the last eighteen months, I and some others at RStudio have been working on R interfaces to TensorFlow, and I'm really excited today to share a lot of that work with all of you.

I'm going to start with an introduction to TensorFlow: what it is, what its core constituent parts are, and how it works at a low level. The principal application of TensorFlow, though by no means the exclusive one, is deep learning, so I also want to talk about what deep learning is, how it works, and what it is and isn't useful for. Then I'll get into the work we've done in R to provide interfaces to TensorFlow, both for deep learning and for other types of machine learning.

So what is TensorFlow? Offhand you might say it's a deep learning library. It is that, but it's considerably more: TensorFlow is a very general-purpose numerical computing library. In R we have a long history of providing interfaces to numerical computing libraries; the original motivation for the S language was to provide interfaces to Fortran numerical computing libraries, and today R wraps code from BLAS, the Eigen C++ library, and the Armadillo C++ library. In R we love numerical computing libraries, and we love creating lovely interfaces to them.

TensorFlow is a new numerical computing library with some interesting attributes. First, as I said, it's open. Second, it's hardware independent: a TensorFlow program can run equally well on a CPU, taking advantage of all the cores (using the same Eigen and BLAS libraries we already use from R); on a GPU or multiple GPUs; or even on a TPU, a tensor processing unit, which is hardware Google created specifically to run TensorFlow programs, and there will be other instances of that kind of hardware in the future. Another really useful attribute for numerical computing is that it supports automatic differentiation, which is used extensively in the deep learning parts of TensorFlow. Finally, it was built from the ground up for scalable deployment: it supports distributed execution and very large datasets. So it's a numerical computing library we can do a lot of things with in R, and that's the first reason R users should care about it.

One of the other interesting things is that TensorFlow has many built-in optimization algorithms that don't require all of the data to be in RAM. Typically the optimization algorithms we use require that we manage to get the whole dataset into memory,
but with TensorFlow we can have, say, a 10-gigabyte dataset, feed the model a small batch of data at a time, and the optimizer still works correctly. That lets us do modeling and machine learning on very large datasets without having a huge amount of RAM.

Another interesting piece is deployment. Typically, when we build a model in R and want to deploy it, we need to bring our R code along in the deployment. The whole design of TensorFlow is that once you build a model you can deploy it entirely separately: you don't need R or Python or any other code, just a C++ runtime. And finally, we're really good in R at writing interfaces to all kinds of things, and I think we can do a lot of wonderful things with R interfaces to TensorFlow.

Now some basics. Where does the name TensorFlow come from? What are tensors, and what's flowing? Everyone here already works with tensors pretty much all the time: tensors are just multi-dimensional arrays. In R, the core data types, vectors and matrices, are tensors. A one-dimensional tensor is an R vector; a two-dimensional tensor is an R matrix; and then you get into 3D and 4D arrays, which R also supports. A zero-dimensional tensor is a scalar. R doesn't have a scalar data type, but you can think of it conceptually as a vector that's always of length one. So you're going to be dealing with tensors, and the good news is we already deal with them all the time; there's really nothing new to learn there.

Here are some examples of how tensor data plays out in TensorFlow. One thing to remember: just as each row of a data frame is an observation, there is always a dimension dedicated to observations, or samples. In the case of a 2D tensor, I can take a data frame, turn it into a matrix, and now I have a 2D tensor; that's something we're all familiar with in R. Time series, or sequence data, is an example of a 3D tensor, even though it's sometimes represented in 2D. If you think about a time series, you're considering not just the observations but what happens to the observations over time, so it's really a 3D entity: features, time steps, and then groupings of those. Image data is an example of a 4D tensor. You might think of an image as 3D, with height, width, and depth (the color channels, red, green, and blue), but once you add the samples dimension you end up with a 4D tensor. Similarly, video would be a 5D tensor, because it's multiple frames, multiple images. That's how tensors end up being represented.

So what about the "flow"? TensorFlow programs are not just scripts that execute the way an R script does. You build up a dataflow graph, and the tensors, the data, flow through the operations in the graph. An example of an operation inside a TensorFlow graph is a matrix multiplication, the addition of a bias term, taking gradients, or applying some kind of optimizer. They're like functions that operate on the data, and it's all put together inside a graph.
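To make those shape conventions concrete, here is a small base R illustration (my own sketch, not code from the talk) of how the samples dimension comes first for 2D, 3D, and 4D tensors:

# A 2D tensor: 150 samples x 4 features (a data frame turned into a matrix)
x_2d <- as.matrix(iris[, 1:4])
dim(x_2d)            # 150 4

# A 3D tensor for sequence data: samples x time steps x features
x_3d <- array(rnorm(32 * 24 * 3), dim = c(32, 24, 3))

# A 4D tensor for grayscale images: samples x height x width x channels
x_4d <- array(runif(16 * 28 * 28 * 1), dim = c(16, 28, 28, 1))
dim(x_4d)            # 16 28 28 1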
I'll explain why that's beneficial in a moment. When you work with TensorFlow you don't usually program the graph explicitly, although there are interfaces for doing that; typically you write much higher-level code. I'm going to explain this R code in detail shortly, but this is an example of a model I've written in R, and on the right is what it actually looks like as a TensorFlow graph. When you use the high-level APIs to TensorFlow the graph is generated for you, and you don't have to reason at the level of the graph, but the graph is still there.

Why is the graph beneficial? We have other examples in R of these intermediate representations. Think of a Shiny application: when you build inputs, outputs, and reactives, you're actually creating a graph, and Shiny executes your graph in a way that's optimal and efficient; because Shiny knows the structure of your program, it can run it really well. When you write dplyr code against a database, it generates SQL; SQL is an intermediate language that gets fed to a query optimizer, which then executes the query efficiently. Similarly, what Google wanted with TensorFlow was this: if we can get the representation of your model or program into a graph, we can run it really fast. We can run it in parallel, run it distributed, look at the operations in your graph and fuse them together where possible, and run it without R or Python, just with C++. So the benefits of the graph are really about portability, performance, and scalability. It does create some indirection, but I think it has a lot of benefits.

What are people using TensorFlow for? These examples are all on the TensorFlow for R website, written up as long-form blog posts that get into the things people are doing. A lot of people are using it for deep learning, but people are also using it for a lot of classical machine learning. You can see some text classification work, examples of making predictions on really noisy data, computer vision applications, and time series applications. I'm actually hopeful, and confident, that the R community, given access to this library, is going to do some really interesting things that surprise us.

One example is a project called greta, which is similar in aims to BUGS or Stan: the idea is to write statistical models and fit them with MCMC. The way greta works is that I write a statistical model in R, not in some separate specialized language, and that model is compiled to a TensorFlow graph, using the R tensorflow package. The benefits are that (a) I'm writing in R instead of a specialized language, and (b) because it's using TensorFlow, I can train it on really large datasets, I can train it with GPUs, and I can actually deploy the model. So this is an example of something that has nothing to do with deep learning that people have already built on the R interface to TensorFlow, and I'm hopeful there will be more examples of this sort of thing.
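To give a flavor of what that looks like, here is a minimal linear regression in greta, adapted from my recollection of the package's getting-started example; the function names are those of the greta package as I remember them and may differ between versions:

library(greta)

# data
x <- iris$Petal.Length
y <- iris$Sepal.Length

# priors
int  <- normal(0, 5)
coef <- normal(0, 3)
sd   <- lognormal(0, 3)

# likelihood
mu <- int + coef * x
distribution(y) <- normal(mu, sd)

# build the model (a TensorFlow graph under the hood) and sample from it
m <- model(int, coef, sd)
draws <- mcmc(m, n_samples = 1000)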
OK, now let's talk about deep learning, since it's probably the main thing people do with TensorFlow. I think it's important to understand what it is, what it's useful for and not useful for, how much we in the R community should care about it, and how it works. I'll cover some of that ground, and then I'll talk about the R interfaces to TensorFlow.

At a really high level, deep learning is taking some input and transforming it into some output via successive layers of representation. "Input" and "output" are abstract terms; what I mean is observations, the X data, and predictions: it's X to Y. How do we do that? In this example the original input is a grayscale image of a handwritten number four, and the output is a prediction about what the digit is. We have to successively transform that input until we get close to the output. That's the basic mechanic of how deep learning models work.

I've been talking about layers, and I showed four of them there. What exactly is a layer? I'm not going to talk about the mechanics some of you may have seen before, neurons talking to other neurons and all that. All you really need to consider is that a layer is a function: a data transformation function. It does a geometric transformation of the data, and it's parameterized by a set of weights and a bias, just like a linear equation. You can think of each layer as a successive transformation of the data. That's all a deep learning model is: taking some data and chaining together a bunch of transformations of that data until we actually get an output, in this case a prediction.

When I say "representations," what am I talking about? We want to take our data and transform it into a form that's closer to the domain of prediction we're looking for. Here's a really simple example. I've got some raw data, a bunch of points, and I want to be able to predict whether a point is black or white. If I change the coordinate system, all of a sudden the problem becomes really simple: everything with x greater than zero is a black point, and everything with x less than zero is a white point. So the idea is to transform our data so it gets easier and easier to reach the actual output, the prediction, that we're looking for. If you work with conventional machine learning models you'll know this as feature engineering, where we try to transform our data into a form that works better for the prediction task. In deep learning, the feature engineering is done in the layers: it's learned rather than hand-coded.
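As a toy illustration of that coordinate-change idea (my own sketch, not code from the talk): the two classes below can't be separated by a threshold on either raw axis on its own, but after rotating into a new representation, a single threshold on the first coordinate separates them perfectly:

set.seed(1)
n <- 200
x <- runif(n, -1, 1)
y <- runif(n, -1, 1)
class <- ifelse(x + y > 0, "black", "white")   # not separable by x or y alone

# new representation: rotate the axes by 45 degrees
u <- (x + y) / sqrt(2)
v <- (y - x) / sqrt(2)

# in the new coordinates the problem is trivial: u > 0 means "black"
table(predicted = ifelse(u > 0, "black", "white"), actual = class)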
Back to the example of the handwritten digits (I'll explain a little later how this works): we take raw data, and the layers try to filter out irrelevant information, like exactly how the image looks, and find other things, like where the edges are and what angle they're at, getting closer and closer to the output, the prediction. You can think of it as an information distillation pipeline.

That gives us an intuition about where the word "deep" comes from. We don't call it deep learning because it gives us deeper insight or deeper models; it's really just referring to multiple layers of representation. A traditional machine learning model might have some feature engineering and then one or two layers; a deep learning model could have 20 layers, or 30, or 100. A more accurate term might be layered representation learning or hierarchical representation learning; "deep learning" maybe implies more than it should, but it's really just talking about the stacking up of layers.

What has this method achieved? You can see from the list here that it's done really well on a lot of perceptual tasks like speech recognition and image classification, and those have in turn been composed together to do things like building autonomous driving systems and reinforcement learning systems that play games. So it has achieved a lot with a fairly straightforward mechanism. To give you an intuition for why that might be: think about a crumpled-up paper ball, and think about trying to write a linear equation for how to uncrumple it. Uncrumpling the whole ball is a really complex geometric transformation, and writing it down directly would be really difficult. But if you decompose it into a set of simpler geometric transformations, i.e. layers, it becomes straightforward: a human being can uncrumple that paper ball with a series of really simple movements. That's really how deep learning models work: they learn simple geometric transformations that, composed together, can perform very complex transformations, and therefore fit very complex functions.

Why should we care about this? I actually think the domains where deep learning has proven to perform well are often not the ones of interest to R users. These perceptual tasks, vision and speech recognition, and the reinforcement learning applications aren't things most R users are concerned with. As an R user you might think it's cool that you can now do state-of-the-art computer vision from R, but it's probably more interesting to ask: does deep learning provide improvements on the techniques we have for our traditional domains? Is there data we analyze that has very complex sequence or spatial dependencies that's hard to model with traditional techniques? Is there data that requires a huge amount of potentially brittle feature engineering to model effectively? So I think the more interesting question is where, when, and how deep learning is applicable to the things we traditionally do, and I'll get into some examples of that in a little bit. It's important to note that it's definitely proven effective at these perceptual tasks, but it's not yet proven to be of widespread benefit in other domains, though certainly people are working hard at it.

I want to talk a little bit about the mechanism by which these models are trained, and try to convince you that it's actually a pretty simple and straightforward mechanism; it's kind of surprising that it's able to solve the kinds of problems it solves. I'll start with the basics of how machine learning algorithms work, and then give you an example of how the training loop for a deep learning model works. Let me define some terms first.
The way machine learning algorithms typically work is that you start out with a bunch of data, X and a known Y, and you feed batches of that data into the model. The model incrementally improves its coefficients by examining its predictions, asking how close each prediction was to the actual value, and adjusting the weights. Training is an iterative process that uses a loss function to evaluate the model and then tweak it over time.

There are significant differences in orientation between statistical modeling and machine learning, and I'm not going to go into a lot of depth about that; I have some links here for you to take a look at. But one difference you'll notice right away is that statistics is often focused on explanation and understanding, on inferring the process by which the data is generated, whereas machine learning is in a lot of cases exclusively concerned with the efficacy of prediction: can we predict things? We don't need to explain or understand the phenomenon; we just want to predict it. As you're going to see, these deep learning models are complete, total black boxes. They do nothing to help you explain or understand phenomena, but they work well for prediction. That's where that orientation comes from.

So let's take a look at a model. I showed this before, and I'll get into more detail later: here's the definition of the layers of a model in R. There are different types of layers, I'm going to compose them together, and I'm going to hope I can train this model to recognize handwritten digits. There's also an example of what the model is going to learn: filters that help go from an image to a prediction.

How does that actually happen? When a deep learning model starts, its layers have weights, and those weights are, interestingly, randomly initialized. At inception the model literally has random weights, so the predictions it makes are garbage. Input is fed into the model and predictions come out, initially badly wrong. The predictions are then measured against the true, known targets with a loss function, which gives us an assessment of how good the predictions were; again, initially quite bad. That loss is then used to update the weights, and that is the job of an optimizer. As I mentioned before, these optimizers can work with just little batches of data, say 128 elements at a time; they don't require that all the data is in memory. So we feed the data into the model, find out how bad the predictions are, use that to tweak the weights, and then repeat thousands and thousands of times until we have a model that performs in a satisfactory way.

The actual mathematical mechanisms at work here are really straightforward, and the basic mechanics of the whole thing are simple. There's a tweet from the creator of the Keras library making exactly the point that there's really nothing complicated going on; but once you take this simple mechanism and scale it up, it ends up looking like magic.
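To make that loop concrete, here is a deliberately tiny sketch in base R (my own illustration, not the code from the talk): a single dense layer, meaning one set of weights and a bias, randomly initialized, a mean squared error loss, and gradient descent updates applied one small batch at a time:

set.seed(42)
n <- 1000; p <- 3
X <- matrix(rnorm(n * p), n, p)
true_w <- c(2, -1, 0.5)
y <- X %*% true_w + 3 + rnorm(n, sd = 0.1)

# randomly initialized weights and bias, just like a freshly created layer
w <- rnorm(p); b <- 0
lr <- 0.1; batch_size <- 128

for (epoch in 1:20) {
  idx <- sample(n)                            # shuffle each epoch
  for (start in seq(1, n, by = batch_size)) {
    batch <- idx[start:min(start + batch_size - 1, n)]
    Xb <- X[batch, , drop = FALSE]; yb <- y[batch]
    pred <- Xb %*% w + b                      # forward pass
    err  <- pred - yb                         # residuals used by the MSE loss
    grad_w <- t(Xb) %*% err / length(batch)   # gradient of the loss w.r.t. weights
    grad_b <- mean(err)
    w <- w - lr * grad_w                      # optimizer step
    b <- b - lr * grad_b
  }
}
round(c(w, b), 2)   # should end up close to 2, -1, 0.5, 3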
Returning to the geometric interpretation from earlier, it's basically a mathematical machine for taking really complex manifolds of high-dimensional data and uncrumpling them.

The ideas behind this actually originated about thirty years ago; we've known about this for a long, long time, and until about 2012 it didn't work very well in practice. What changed is that it turned out we needed to be able to have really large models, with lots of big layers, and train them on lots and lots of data. GPUs allowed us to train much, much larger models, and the internet allowed us to collect a lot more data. That's what changed and made deep learning actually work, even though it was invented conceptually decades earlier.

What do I mean by sufficiently large parametric models? The grayscale digit recognizer I just showed you has about 1.2 million coefficients, or weights, or parameters, and grayscale digit recognition is a lot easier than many computer vision tasks. A bigger model that can recognize all kinds of everyday objects in color images has about 138 million coefficients. So when we say "sufficiently large," we're talking about potentially tens of millions or more of parameters that need to be learned, and you can see why these models have very little explanatory power, very little ability to help with understanding: you're going to get 138 million parameters, and how do you interpret that?

That scale made deep learning work very well for computer vision, and what's happening now is that people are beginning to apply it to other areas: natural language processing, as I mentioned, but also time series, biomedical applications, and some novel approaches to prediction in other domains. I want to talk a little bit about those frontiers and where they stand, because I think that illuminates what our posture toward this should be in R.

Computer vision is the poster child for what deep learning has accomplished. There was a competition that ran from 2010 to 2017 called the ImageNet challenge, where machine learning researchers tried to build models that would predict what an image contains: dog, cat, table, chair, and so on. When the competition started in 2010, no entrants used deep learning, and the best model that year had, I believe, 71.8% prediction accuracy. In 2012 the very first team to use deep learning entered the competition and beat the rest of the field by 11 percent, and that basically ignited everybody to start looking at deep learning for computer vision. By 2017, accuracy on this task had gone from 71.8% to 97.3%, which is phenomenal. That, more than anything, got people excited about deep learning, and the competition isn't going to be run anymore after 2017 because this particular task is basically considered solved. So it clearly works really well for computer vision and image classification, and you see a lot of applications there.

People are also trying to apply it to natural language processing,
and it's not yet clear how well these techniques work there; there's a lot of research going on. The plot on the right shows the percentage of papers at computational linguistics conferences that use deep learning over roughly the same timeframe; it went from around 35% all the way up to 70%. Again, some of that could just be a fad, a trend, everyone trying to see what they can find out. But if you read the paper I link to (these slides will be available after the talk), you'll see that people are making progress on a range of natural language processing tasks. Clearly a lot of progress has been made in language translation: Google has a new neural machine translation system that significantly beats the previous state-of-the-art phrase-based translation and is approaching human-level translation quality. But there are lots of other things you want to do in natural language processing, and while people are starting to apply deep learning there, with some interesting early results, it's not proven that it's going to completely revolutionize that field.

Similarly, people are looking at really complex time series forecasting problems. One paper uses convolutional neural networks, the same kind of networks used for images, to look for spatial dependencies that are invariant across the whole time series; they're applying techniques used for computer vision to time series, and on the datasets in that paper they were able to beat some benchmarks set by conventional modeling. But I have not seen deep learning suddenly revolutionize time series, and it's not the case that you should always use it there; on the contrary, I think there are certain types of time series it might work a lot better for, and that's really a matter of exploration and research. I have one other time series example using a whole different neural network architecture, in this case forecasting stock prices. There are a bunch of articles like this, and if you work with a lot of time series data it might be interesting to read these papers and see where deep learning is performing well and where it's struggling.

There's also a lot of work going on in biomedical applications, which could be everything from patient treatment and patient classification to fundamental biological processes. A paper that just came out this month is a roundup of the work going on there, and I think it concludes similarly: there's not evidence that we're revolutionizing these fields, but there are promising advances on the state of the art worth paying attention to.

Here's a very interesting example. This paper was just published this month, a study done by Google, Stanford, the University of Chicago School of Medicine, and the University of California, San Francisco. They wanted to see if they could provide better predictions about patient outcomes based on electronic health records. As they point out, electronic health records are really messy and uneven: there are different features for different patients, and the data is collected in inconsistent ways. So typically a huge amount of effort goes into selecting a set of features and cleaning up and normalizing those features,
and then making predictions using conventional machine learning or statistical models. What they point out in the paper is that when you do that, you're actually discarding a huge amount of data: all the features you excluded because they weren't coded consistently, things like the handwritten notes of the physicians, all of that is thrown out. So they asked: what if we built a deep learning model that considered the entire medical record of every patient, including the physicians' notes and the fields that aren't normalized against each other, and see whether we could beat conventional methods of prediction? In this case they did find that they could beat state-of-the-art statistical models in a bunch of categories. I don't think it was necessarily easy to get this result; if you look at the paper, Jeff Dean is on it, and he's essentially the inventor of TensorFlow, so I don't know that anybody could just walk in and get these kinds of results. But it's interesting that it's possible. It's a very, very different approach to prediction: it's basically saying that feature normalization and feature extraction will be done by the layers of the deep learning model, not by human reasoning about the data and human composition of a sensible model; we're just going to learn what the model is. It's a really interesting paper to read, and again it came out in the last couple of weeks.

There are quite a few problems with deep learning models, and I've alluded to some of them already. The fact that they're black boxes that you can't interpret excludes them from a whole bunch of things we want to do with statistics and machine learning in R. They can also be brittle: if you look up adversarial examples, there's a really classic one with two pictures of a panda that look completely identical to the human eye, but one has been tweaked in really subtle ways to force the model to make a totally different prediction. People are trying to overcome adversarial examples with various techniques, but it's important to remember that when a human being looks at a panda and just knows what it is, something far more complex is going on; for perceptual tasks these models work, but they're brittle. They also typically need a large amount of data to perform well; as I said, we've known about deep neural networks for decades, and they didn't really work well until now. There are ways to transfer knowledge from a model built on a larger dataset to a model trained on a smaller one, so deep learning can be used with smaller datasets, but typically these models need a lot of data, and they're very computationally expensive to train as well.

That creates another problem, which I think is going to get worse and be annoying to all of us: there's a lot of hype about deep learning. There's also a lot of hype about AI, which as far as I can tell sometimes means "I have an if-then statement, so it's AI." It's awful. And unfortunately, the current tools for building deep learning models have become so good that software engineers, or even laypeople with no training in modeling or statistics, can build these models and genuinely think they've built something really great.
They'll say: look, I've got 84 percent accuracy, I've built this model; I know nothing about modeling, nothing about statistics, nothing about probability, but here's a model, and it's deep, so it's obviously better than whatever anybody else has. But typically you can't outperform traditional statistical modeling techniques without a lot of effort, so usually these models are going to be bad compared to the models we build. We're going to have to deal with that and patiently explain why maybe that model isn't the best one for us to use. I don't think this is a reason to throw up our hands and say we want nothing to do with any of it; what we need to do is promote a more balanced, knowledgeable, and nuanced dialogue about what these things are good for and what they're not good for. Because in spite of these problems, deep learning does a lot of really useful things. There's a slide Google uses showing how many model description files, which are basically TensorFlow models, are sitting in repositories all over Google, and you can see that since the release of TensorFlow that number has really accelerated across all areas of the company. So these models, in spite of their brittleness and their black-box nature, can do super useful things, and we want to figure out what useful things they can do for us in R.

That brings me to what we've done in R to let you use TensorFlow. We have a bunch of different interfaces, high level and low level; tools to help you be more productive with your workflow and with managing experiments; ways for you to easily use GPUs for training; and hopefully an adequate amount of learning resources so you can understand how to use these tools in your own work.

At the top level, we have three different APIs for TensorFlow. One is the Keras API, which I'm going to talk about quite a bit; it's a very high-level interface for defining neural networks, and the R code I've shown you so far has been from the Keras API. There's another API called the estimator API, which provides more classic classifiers and regressors, support vector machines, random forests, the kind of classic statistical machine learning models, for TensorFlow. And then the core API, which is what greta used, gives you full access to the entire computational graph; it's really aimed at library builders. I'm not going to talk much about the estimator API or the core API.

This maps out into a suite of, today, seven different R packages: three for the interfaces to TensorFlow, a package for working with large datasets called tfdatasets, and then, as I said, supporting packages for deployment, for managing experiments, and for interfacing with Google Cloud ML. I'll talk about those supporting tools in a little bit.

I've shown you this code before: this is the Keras API, the high-level API for neural networks, and I'll go into a lot more detail about it in a minute. The estimator API gives you high-level functions that look almost like regular R modeling functions; they do a regression or a classification, and that's what estimators are about. The core API is you interacting directly with the graph: here you can see I'm building the loss function manually and writing the training loop manually, so I don't recommend people use this API unless you're actually
trying to build tools for other people, or to do something novel with TensorFlow from R, as greta does.

So let's talk about Keras. I'll give you an example and talk about the different components of the Keras API. First, a quick note about why Keras: at this point, and especially over the last six months, Keras is being promoted by Google as the preferred interface to TensorFlow, and it works great for end-user applications. On the left is Google search interest in deep learning frameworks over the last three or four years, and you can see both TensorFlow and Keras pulling away from the other frameworks. On the right you can see citations of deep learning frameworks in research papers: TensorFlow has the most citations, but Keras, even though it's a really easy-to-use high-level API, is also used quite a bit in research. So you can go pretty deep with Keras as well.

I want to walk through what Keras code in R looks like, using the MNIST handwritten digit recognition example again. The first thing we do is data pre-processing, which really just means reshaping and scaling the data into tensors. A lot of the time these are matrices; you get your data off disk or out of a data frame and turn it into tensors. Then you define your model. I've shown this before: the model definition basically says what my layers are and how they're going to behave. Then you compile the model, which says what loss function and optimizer I want to use during training and what metrics I want to collect. That's really the heart of making a Keras model: define the layers, then compile.

I want to make a quick note about that second statement, the call to compile(): you'll notice I do not assign the result of that expression back to the model. I'm actually modifying the model in place, which is not conventional R. Typically R objects are passed by value: you modify the object and return a copy that has the modifications. But Keras models are acyclic graphs of layers; they're stateful, they get updated during training, and (I haven't shown this yet) a layer can actually be repeated at multiple places within a model. So Keras models are by-reference objects, and the code you'll see that mutates them, whether compiling or fitting, does it all in place. I wanted to make that clear, because it's different from other R code you may have seen.

Training is done through the fit() function. We basically say: here's our data; feed 128 samples at a time to the model (we might have millions of samples, but only 128 go in at a time); and traverse the whole input dataset ten times, which is what epochs means. Since we're iteratively learning on batches, it's not enough for the model to see the data once; it needs to see it multiple times. And as we're fitting, we hold out 20% of the data and use it to validate that we're not just overfitting to our dataset. A deep learning model can essentially just memorize the data, which gives you a function that isn't useful, because all it has done is memorize the dataset; so we hold out data and test against it to make sure we're not merely memorizing.
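Pulling those steps together, here is a compact sketch of the whole workflow with the keras R package, closely following the standard MNIST example from the package documentation; the particular layer sizes and hyperparameters are illustrative rather than the exact ones on the slides:

library(keras)

# 1. Data pre-processing: reshape and rescale into tensors
mnist <- dataset_mnist()
x_train <- array_reshape(mnist$train$x, c(nrow(mnist$train$x), 784)) / 255
x_test  <- array_reshape(mnist$test$x,  c(nrow(mnist$test$x),  784)) / 255
y_train <- to_categorical(mnist$train$y, 10)
y_test  <- to_categorical(mnist$test$y, 10)

# 2. Define the model: a stack of layers
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

# 3. Compile (modifies the model in place): loss, optimizer, metrics
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(),
  metrics = c("accuracy")
)

# 4. Fit: batches of 128, 10 passes over the data, 20% held out for validation
history <- model %>% fit(
  x_train, y_train,
  batch_size = 128, epochs = 10, validation_split = 0.2
)
plot(history)

# 5. Evaluate on data the model has never seen, then generate predictions
model %>% evaluate(x_test, y_test)
preds <- model %>% predict(x_test)   # class probabilities; take the max per row for labels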
That's what training looks like. When training is done, you can see we assigned the result back to a history object; we can plot the history and see how training proceeded, how our accuracy and loss improved (or didn't) during training. Later we can evaluate the model, which takes yet another set of held-out data, data the model has never seen, and tests how good the predictions are: did the model really overfit, or does it generalize well? And then we can use predict, or predict classes, to actually generate predictions from the model.

I want to show you a little of what that looks like in RStudio; this is the complete end-to-end script I just showed you in slides. The MNIST dataset is actually built into Keras, so I get the raw data in and then turn it into matrices that I can feed into the model; you can see on the side that I've reshaped the data into a set of matrices. Then I define the model and compile it. I can print a summary of the model to see how many parameters I have, what the layers are, and how they work. Finally I fit the model, and as the model is being fit, RStudio shows you the training metrics in real time. This is extremely useful, because a lot of the time when you fit a model you're overfitting, your accuracy isn't improving, the model doesn't work, and it's really good to be able to see that as you fit so you can stop training and say: no, that architecture doesn't work, those hyperparameters don't work, I need to try something else. I can also plot the history, and you can get the data frame behind it if you want to do various custom visualizations of your training history. Then I evaluate, to see what my accuracy looks like on data the model hasn't seen before, and generate predictions. Models can get more complicated, but some Keras scripts really are this simple. That's not to say the task of designing and building the model and getting it to work is simple; a lot of the time it takes a huge amount of iteration.

Layers are the core construct in Keras. There are actually 65 different layers available, and you can create your own; this is just a sampling of some of them, and I'm going to talk categorically about some of the types. Dense layers are the staple of neural networks; they work the same way as the traditional neural networks that were written about and used in the 1970s, and they're part of nearly every deep learning model: even if you use more sophisticated layers, at the end of the model you'll usually have dense layers. A dense layer is really just a set of weights and biases that are applied and cascade through other dense layers. Convolutional layers are used most commonly for computer vision; they try to find spatial features,
in a way that's transferable across the image: when the layer learns, for example, what an edge or an eyeball looks like, that knowledge transfers to other parts of the image. If I see an eye in one quadrant of the image and later see another eye in a different quadrant, I can still recognize it as an eye. The reason they're called convolutional is that they build up filters that recognize patterns, and they involve sliding the filter across the entire image; you can see an example of that here. There are much better explanations of how this works, but you can think of convolutional layers as really good for image processing, spatial dependencies, that sort of thing. Recurrent layers are layers that have some memory, some state. If you're dealing with sequence data, like time series or text, where it matters not only what I'm seeing now but what I've seen in the past, recurrent layers are able to maintain that state, so they're often used in sequence-oriented applications. Embedding layers, for those of you who do natural language processing, are a way of vectorizing text. One way to vectorize text is to treat an instance of the word "cat" and an instance of the word "dog" as distinct classes, just cat versus dog; another is to use a vector that represents semantic relationships between words, so that along one of the axes of that vector there's something that says cat and dog are both animals. Embedding layers do that kind of vectorization, and you can either learn the embedding jointly while building the model, or load a pre-trained word embedding, like the word2vec embeddings we conventionally use, into an embedding layer. Those are just a few of the layer types, but again there are 65, and a lot of doing deep learning is figuring out the right composition and behavior of those layers to get a model that works well.

Once we've defined our layers in Keras, we compile the model. Compiling really means converting the layers into a TensorFlow graph and then attaching the loss function and the optimizer to it; that's what the compile step does. There's a wide variety of loss functions available in Keras, and you can write your own, so depending on what you're doing you might use a custom loss or one or more of the built-in losses. There are many built-in optimizers (again, these optimizers don't require that all the data is in RAM), and many built-in metrics, and you can write your own metrics; these are ways of assessing how the model is training, how it's behaving, and how well it's doing. All of this is summarized in a cheat sheet we have. Don't worry about copying the URL down, because I'm going to share the link to these slides, and it's available from the TensorFlow for R website. The cheat sheet goes through a lot of the same content I just covered, but honestly in a much clearer way; it's really excellent.
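To show how a few of those layer types look in the keras R package, here are two small, illustrative model skeletons (my own sketch; the layer sizes are arbitrary): a small convolutional network for images, and an embedding-plus-recurrent model for text sequences:

library(keras)

# A small convnet: convolutional and pooling layers, then dense layers on top
convnet <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

# A text model: an embedding layer that vectorizes word indices,
# followed by a recurrent (LSTM) layer that maintains state across the sequence
text_model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 32, input_length = 100) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")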
Now I want to talk about some examples we've written up of using the R Keras API to do deep learning. These are all available on the gallery page of the TensorFlow for R website, and each one is a blog post that writes up in detail the motivation, what we did, what worked and what didn't, and intermediate data visualizations, so it's a really good way of exploring these different use cases.

The first one demonstrates how to do what I mentioned earlier: transfer learning. We try to predict whether an image is a cat or a dog while training on only 2,000 input images, and the way that works is that we take a model that was already built, one that already knows about edges and other things related to identifying objects like cats and dogs, and transfer that model's knowledge into our new model. So this is an example of using deep learning with a smaller dataset.

Another example covers time series forecasting of weather temperature data, and it walks through different techniques you can try to make a time series model perform better, including stacking multiple recurrent layers together and using bidirectional recurrent layers, which look at both temporal orderings of the data. It's a good introductory primer on doing time series.

There's a really cool example where, for cancer immunotherapy, we classify peptides to figure out how they're going to bind. We show the same classification task with deep learning, with a random forest, and with a simple feed-forward neural network, so it's a really good introduction to how you might use deep learning with this sort of data.

Another example looks for credit card fraud. What's interesting here is that out of something like half a million transactions, only around two thousand are fraudulent, so this is about finding signal in those relatively few examples of fraud and still getting good predictive accuracy even though we don't have much fraudulent data to train on.

There's a text classification example based on a Kaggle dataset: given pairs of questions from Quora, are they duplicates or not? One interesting thing about it is that we also show how to create a Shiny application as a front end for the model; it's actually quite straightforward to do, and this gives you a simple introduction to doing that.

Another example uses deep learning to predict customer churn, and it demonstrates a couple of other interesting things, one of which is the lime package, which I'll talk about in a minute. lime isn't a way to understand how the model as a whole works, but it lets you understand, for an individual prediction, which features contributed to that prediction. We also demonstrate, again, building a Shiny application on top of a deep learning model.
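As an aside on that churn example, here is a minimal sketch of the per-prediction explanation workflow the lime package provides, written from memory of the lime R API; x_train, x_test, and model are placeholders for your own training data, new cases, and fitted classifier, and argument details may differ by version:

library(lime)

# x_train: the feature data the model was trained on (a data frame)
# model:   a fitted classifier, e.g. a keras model with a binary output
explainer   <- lime(x_train, model)
explanation <- explain(x_test[1:4, ], explainer,
                       n_labels = 1,    # explain the top predicted class
                       n_features = 5)  # show the 5 most influential features
plot_features(explanation)              # which features drove each prediction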
Another example learns word embeddings from Amazon reviews; I think these are the fine food reviews from Amazon, and we're trying to figure out which words are semantically similar. The post shows a plot of part of the resulting embedding (it ends up being a multi-dimensional embedding), as an example of learning word embeddings.

Finally, there are a couple of examples that address explainability. Again, looking at a 138-million-parameter model tells you nothing about the phenomenon, but it is often possible to understand why a given prediction was made by a neural network. In one example we classify images and have predicted that a particular image is an African elephant; by inspecting the gradients of the intermediate layers it's possible to pull out a heat map of the parts of the image that most caused that classification. We then juxtapose the heat map over the original image, and we can actually see which parts of the image contributed to the model saying this was an African elephant. Similarly, lime lets you take an individual prediction and ask which features contributed to it. So there is work going on in the explainability of predictions. It's not the same as looking at a model and having it explain the phenomenon, but a lot of these models, if they're being used for things like patient care, recommendations about surgical procedures, or approving or denying loans, have to be explainable: we have to be able to say, we have a machine learning algorithm, and this is why it's coming out the way it is.

Coming back to this idea of frontiers: many fields have a deep learning frontier that has not yet been reached, or even well approached, and in most of those cases the traditional statistical modeling and machine learning methods are cheaper and more accurate. Once you get outside computer vision and language translation, a lot of this is at the research stage, the speculative stage, and there are different approaches you can take to that. Right now, creating an image classifier with very high accuracy is essentially trivial; that frontier is explored, understood, and applicable. In other cases the frontier is far away, and you can wait and see: for this type of time series forecasting it's not proven that deep learning helps me, so I'll wait until it is proven and then apply it. Another approach is to follow contemporary research, both formal research and what people are publishing on the web, evaluate it, and try it with your own data and your own work to see if it improves things. Yet another approach is to try, in your own field of endeavor, to approach the frontier yourself and see if you can find novel applications for deep neural nets.

It's great to show examples that do interesting things, but all of this requires a lot of work, patience, and iteration to get these models to work well. In practice there's a huge amount of experimentation with model architectures and hyperparameters, and what happens is that people gain better intuition about what sorts of models and layers work well for the class of data problems they have. It tends to require a lot of iteration, a lot of compute power, and some related tools to help streamline your workflow and make the most of the compute power you have. That's the next piece I want to talk about: the tools we've built to help you with that process.

First and foremost is using GPUs. Some deep learning models work fine on CPUs, but ones that use convolutional or recurrent layers tend to be very slow on CPUs. If you go to the TensorFlow for R website there are a bunch of resources about different ways to use GPUs. If you have a high-end NVIDIA graphics card, you can use a GPU on your local system.
There are also cloud services that let you run batch jobs on GPUs; you can set up a cloud server running RStudio Server on Google Compute Engine, Amazon, or Azure; and there are even ways to have a virtual cloud Linux desktop with a GPU. All of that is covered on our GPU page. I won't talk about it further, but it's definitely something to look into if you want to explore.

We have another package called tfruns, which stands for TensorFlow runs. The idea is that because a huge amount of experimentation is required with deep learning models, you're going to run a ton of training jobs. Instead of using the source() function to run those jobs, you use a function called training_run(), and what training_run() does is record everything about that job: the source code, the output, the performance of the model, and the hyperparameters used. You end up with a data frame of all the training runs you've ever done, and you can compute on that data frame to understand what's working well and what isn't. It also gives you a report on each run that summarizes the code, the model, the metrics, and the performance; every single time you do a training run, one of these reports is generated. And then you can compare runs: if I examine my data frame and find my two best-performing models, what was different about them? In this case it shows a source code diff: I changed two things in the source code, and that accounts for the difference in performance.

There's another piece called flags, which says: if there are important aspects of my model, it's good to externalize them, to get them out of my source code so I can pass them to my script as parameters. You can see I'm defining a set of flags, then using the flags when I define a layer, and then passing one of the flags when I train the model. They're basically command-line parameters for your script that let you easily vary things. That in turn enables a construct called a tuning run, which says: I've got two kinds of dropout I want to vary, in this case from 0.2 to 0.4, and I want to try every combination of those two hyperparameters and be told which model is best. If the number of combinations gets very large, into the thousands, you can sample from the combinations and run only, say, 50 training jobs rather than thousands of them to evaluate what the best model is. These sorts of tools end up being critical to getting deep learning models that perform, because you have to try so many different things.
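Here is a small sketch of that workflow with the tfruns package; the script name and flag names are made up for illustration, and the function calls are the ones the package provides as I recall them:

library(tfruns)

# Inside mnist_mlp.R, hyperparameters are externalized as flags:
#   FLAGS <- flags(
#     flag_numeric("dropout1", 0.4),
#     flag_numeric("dropout2", 0.3)
#   )
#   ... layer_dropout(rate = FLAGS$dropout1) ...

# Record everything about a single training job (code, output, metrics, flags)
training_run("mnist_mlp.R")

runs <- ls_runs()   # all recorded runs as a data frame you can compute on
compare_runs()      # compare the two most recent runs (source diff + metrics)

# Try combinations of hyperparameters and find the best model
tuning_run("mnist_mlp.R", flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))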
The cloudml package is an R interface to Google Cloud ML. Cloud ML basically lets you train models by submitting batch jobs; the job runs remotely and the model trains there. What's really cool is that they have GPUs, including some really fast GPUs and machines with multiple fast GPUs, and they also have a service for the hyperparameter tuning I just talked about. The idea is that you work from the comfort of your laptop with smaller, downsampled data, and when you want to do the training run that's going to take three hours, you submit it to Cloud ML and hopefully it takes twenty minutes instead of three hours. Similar to training_run(), there's a function called cloudml_train(). When you give an R script to cloudml_train(), it uploads the contents of your working directory to Cloud ML, figures out which packages you're using, and makes sure those packages are installed on the Cloud ML server. When I do this I can also specify the kind of hardware I want: in one case a standard GPU, in another the latest and greatest NVIDIA Tesla GPU. Once a training job finishes you can collect the results, retrieve all the files that were generated, and get a report about how it did, and you can see all your jobs in a data frame. And if you're using the flags I talked about, you can do hyperparameter tuning: I define my flags, use them in composing my model, and then write a YAML file that I give to Cloud ML saying here are the parameters, here's how I want to vary them, here's the metric I'm trying to optimize (in this case validation accuracy), and here's how many trials I want to run. This is a way to search over huge spaces of models, all done remotely in the cloud, and you're only charged for the compute used in the execution of those jobs. When the job is done you get a summary of all the trials, what the flags were, and what the accuracy was, so you can figure out which architecture and hyperparameters work best for your model.

I talked a little earlier about how TensorFlow models can be deployed without an R runtime dependency, and the tfdeploy package has tools that facilitate doing that. I won't get into all of it here; there's a talk in the interop session that goes into more detail. But the idea is that you take a model, export it, and then you can, for example, serve that model over a REST HTTP API, serve it with an open source project from Google called TensorFlow Serving, serve it from RStudio Connect, or serve it from Cloud ML. The nice thing is that these models can be served from a wide variety of places, and again there's no R runtime dependency for serving them. It's also very easy to embed these models inside Shiny applications, and there are a couple of examples of that in the gallery. What's really cool is that you can deploy these models not just to servers but to embedded devices and mobile phones, and you can even deploy them into the browser: there's a library called Keras.js and a library called deeplearn.js that let you take TensorFlow and Keras models and run them directly in JavaScript. This idea that the model is a graph, independent of the programming language used to create it, is a really powerful one, and hopefully one we can take great advantage of in R. Again, there's a talk about deployment in the interop session.
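To make the export-and-serve step concrete, here is a minimal sketch, assuming you already have a fitted Keras model object named model; the directory name and port are illustrative.

    library(keras)
    library(tfdeploy)

    # Export the fitted model as a TensorFlow SavedModel: a language-
    # neutral graph plus weights, with no dependency on an R runtime
    export_savedmodel(model, "savedmodel")

    # Serve the exported model locally over a REST HTTP API for testing;
    # the same SavedModel directory can also be deployed to TensorFlow
    # Serving, RStudio Connect, or Cloud ML
    serve_savedmodel("savedmodel", port = 8089)

    # predict_savedmodel() can then be used to score new data against
    # the exported model or a running endpoint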
So what are the big takeaways I hope you internalize from this talk? One is to think of TensorFlow not just as a library people use for deep learning: it is a general-purpose numerical computing library, it has a lot to offer us, and we should be exploring those possibilities, like what has been done with the greta project. Deep learning has made a lot of great progress in some fields, a lot of people are working hard at making progress in other fields, and it's likely to increase in importance in those fields. It's hard to say exactly how much, but it's something a lot of people are working on, and it's going to be relevant to our work. And I think most importantly, with the work we've done on the TensorFlow R interfaces, we have a great set of APIs at different levels, both high level and low level. Doing deep learning in R is now really easy, and you have the state-of-the-art tools, the best tools used by researchers and practitioners today, as well as low-level interfaces for doing more innovative things.

There are a couple more talks in the interop track. At 11:00 there's a talk from Michael Quinn, who's from Google, about modeling predictions on Google Store data using the tools we talked about today, and then the deployment talk comes immediately after that, so check those out if you're interested. There's also a website, tensorflow.rstudio.com, that has a lot of the content I covered in this talk, obviously in much more depth; it covers pretty much everything you need to know about TensorFlow and R.

If you want to get into this, there are two books I highly recommend, and they're very different books. One is Deep Learning with R, which just came out a couple of days ago. My name is on the cover, but I didn't write any of the prose in the book; it's written by François Chollet, the gentleman who created Keras, and I took his Keras book, which covers the Python interface, and worked with him to adapt it to R. It's mostly a conceptual book that covers the ideas behind deep learning, but all of the examples are done in R using the Keras interface. There's actually no math in it at all; all of the mathematical intuition is provided by showing you R code. The other book is, I think, the most important book in the field right now, providing the conceptual foundation for deep learning, and it has no code at all; it's all math. So depending on how you learn and how you like to approach things, read one or the other first, but they're both essential.

Thank you very much. These slides are available at the URL displayed here, and the other thing I would urge you to do, if you're interested in this stuff, is subscribe to the TensorFlow for R blog, where we're posting new examples and case studies all the time; it's a great way to keep up with the work that we and others are doing. So thanks very much, and I'd love to hear what questions people have. [Applause] [Music]
Info
Channel: RStudio
Views: 77,560
Rating: 4.9796214 out of 5
Keywords: Machine Learning, TensorFlow, Keras, Deep Learning, RStudio, rstudio::conf
Id: atiYXm7JZv0
Length: 59min 44sec (3584 seconds)
Published: Tue Feb 06 2018