TensorFlow for JavaScript (Google I/O '18)

Captions
Hi everyone, thanks for coming to see our talk. My name is Daniel, my name is Nikhil, and I'm Nick, and today we're very happy to talk about JavaScript.

All right, so if you're in the world of machine learning, if you're training a model or doing anything with machine learning, you're almost certainly dealing with Python, and for good reason: Python has been one of the mainstream languages for scientific computing for the last decade, and it has a lot of tools. But it doesn't have to end there. Today we're here to convince you that the browser and JavaScript have a lot to offer to the world of machine learning, and TensorFlow Playground is a great example of that. How many people here have seen this visualization? Quite a few, I'm glad. For those of you that haven't seen it, TensorFlow Playground is an in-browser neural network visualization that shows you what is happening inside a neural network as it trains. When we released this it was a fun, small project, and it had tremendous success, especially in the educational domain. Even today we get emails from high schools and universities around the world thanking us for building it, and they use it to teach beginners about machine learning. When we saw the success of TensorFlow Playground we started wondering why it was so successful, and we think the browser and JavaScript have a lot to do with it. One thing that's very special about the browser: there are no drivers and no installations. You can share your app with anyone, and all they have to do is click on a link to see your application. Another important thing about the browser is that it's highly interactive; in the Playground app we have all these dropdown controls that you can change to quickly run different experiments. Another nice thing about the browser: it runs on laptops and it runs on mobile devices, and these devices have sensors like the microphone and the camera and the
accelerometer, all behind standardized web APIs that you can easily access in your web app. We didn't take advantage of this in the Playground, but we'll show you some demos later. And most importantly, when you're building web apps, these apps run client-side, which makes it easy to have the data stay on the client and never be sent back to a server. You can do processing on-device, and this is important for privacy.

Now, to come back to the Playground example: the library that powers that visualization is not TensorFlow. It's a small neural network library, about 300 lines of JavaScript, that we wrote for this visualization. It wasn't meant to be reusable or to scale to bigger neural networks, but when the Playground became so successful and popular, it became clear to us that we should go and build such a library. So we built deeplearn.js, which we released in August 2017. We figured out a way to make it fast and scalable by utilizing the GPU of laptops and cell phones through WebGL. For those of you who aren't familiar, WebGL is a technology originally meant for rendering 3D graphics, and the library we built allows for both inference and training entirely in the browser. When we released deeplearn.js we had incredible momentum: the community instantly ran with it and took pre-trained models from Python and ported them to the browser. One example I want to show you is the style transfer demo. Someone took a pre-trained model, and the demo has a source image on the left and artists on the right, and it can mash them together; they made this into a really interesting application. In another demo, people took models that had read a lot of text and could generate new sentences, ported them to the browser, and explored novel interfaces for exploring all the different endings of a sentence. In the educational domain, people took standard convolutional neural nets and built this fun little game where
you can train your own image recognition model just by using the webcam, and this was very popular. And lastly, as another example, researchers took a font generation model that can generate new fonts; it had previously been trained on a lot of font styles, and they built a novel, highly interactive interface in the browser where you can explore different types of fonts.

Now, building on that incredible momentum we had with deeplearn.js, about a month ago at the TensorFlow Dev Summit we announced that we're joining the TensorFlow family, and with that we introduced a new ecosystem of tools and libraries around JavaScript and machine learning called TensorFlow.js. Before we dive into the details, I want to go over three main use cases of what you can do with TensorFlow.js today. You can write models directly in the browser; we have sets of APIs for that. You can also take pre-trained models that were trained in Python or other languages and port them to the browser for inference. And you can take those existing pre-trained models and retrain them, doing transfer learning right there on-device. To give you a schematic view of the library: we have the browser, which does all the computation using WebGL. TensorFlow.js has two sets of APIs that sit on top of this: a Core API, which gives you low-level building blocks, linear algebra operations like multiply and add, and on top of it a Layers API, which gives you high-level building blocks and best practices for building neural nets. On top of that, because there are so many models written in Python today, we have tools that take an existing model, a Keras model or a TensorFlow SavedModel (two formats that are very popular server-side), and automatically convert it to run in the browser.

Now, to give you an example of our Core API, we're going to walk through code that trains a model to fit a polynomial curve, where we have to learn the three
coefficients a, b, and c. This example is pretty simple, but the code walks you through all the steps of how you would train such a model, and these steps generalize across different models. We import TensorFlow.js from '@tensorflow/tfjs'; for those of you who aren't familiar, this is a standard ES6 module import. We have our three variables that we're trying to learn, a, b, and c. We mark them as variables, which means the optimizer, the machinery that runs and trains our network, can change these variables for us. We have our function f(x): given some data, it computes the output; this is just a quadratic polynomial function. On top of the standard API, calls like tf.add and tf.multiply, we also have a chaining API. Chaining has been very popular in the JavaScript world, so you can call these mathematical methods on the tensor itself, and this reads better, closer to how we write math. So that's our model. To train it, we need a loss function, and in this case we're just measuring the distance between the model's prediction and the label, the ground-truth data. We need an optimizer, the machinery that can optimize and find those coefficients, and we specify a learning rate there. Then, for some number of epochs (passes over our data), we call optimizer.minimize() with our loss, our f(x), and the labels y. So that's our model.

Now, this is clearly not how everyone writes machine learning models today. Over the years we've developed best practices and high-level building blocks, and new APIs emerged, like tf.layers and Keras, that make it much easier to write these models. For that, I want to walk through our Layers API to show you how easy it is. We're going to go through a simple neural network that sums two numbers, but what's special about this network is that the input comes in as a string, character by character. So "90 + 10" is the input to this network, fed as a string, and the network has an internal memory where it
encodes this information; it has to save it, and then on the other end the network has to output the sum, "100", again character by character. Now, you might wonder why you'd go through such trouble to train a neural network like this, but this example forms the basis of modern machine translation, and that's why we're going over it. To show you the code: we import TensorFlow.js, and we have our model. We say tf.sequential(), which means it's just a linear stack of layers. I'm not going to go into details on the first two layers, but those are building blocks that take these strings into an internal representation, a memory, and the last three layers take that internal representation and turn it into numbers. That's our model. To train it, we need to compile it with a loss, an optimizer, and a metric we want to monitor, in this case accuracy, and we call model.fit() with our data. One thing I want to point out about model.fit(): training for this example can take 30 or 40 seconds in the browser, and while that's running we don't want to block the main UI thread; we want our app to be responsive. This is why model.fit() is an asynchronous call, and we get a callback once it's done, with a history object that has our accuracy as it evolved over time.

Now, I went through examples of how you write these models in the browser, but there are also a lot of models that have already been written in Python, and for those we have tools that allow you to import them automatically. Before we dive into the details, I want to show you a fun little game that our friends at Google Brand Studio built, called Emoji Scavenger Hunt. This game takes advantage of a pre-trained model, a convolutional neural network that can detect 400 items. I'm going to walk over to a Pixel phone and open up a browser, just to show you that TensorFlow.js can also run in a mobile browser, because we're using standard WebGL, and I'm going to ask Nikhil here on my
right to help me out here, because I'm going to need some help. Now, to give you a few details about the game: it shows you an emoji, and then you have to run around with your camera and find the real-world version of that emoji before the time runs out, and there's a neural network that has to detect it. All right, shall we start? Let me see, we're going to play it here live. We have to find a watch, 20 seconds. All right, that's great! Let me see what's next. We need a shoe, come on, do things, buddy. Yay! Let's see what our next item is: banana. We have 30 seconds to find a banana. Does anyone have a banana? Anyone have a banana? Oh, awesome, we got a banana over here. There we go, awesome. So our high score is going up. Let's see what our next item is: beer? Beard? Daniel, it's 12:30 in the middle of I/O! Let's get back to the talk, man.

All right, so let's talk a little bit about how we actually built that game; switch back to the slides here. What we did was train a model in Python to predict, from images, 400 different classes that would be good for an emoji scavenger hunt game: things like a banana, a watch, and a shoe. The way we did this was we took a pre-trained model called MobileNet. If you don't know what MobileNet is, it's a state-of-the-art computer vision model that's designed for edge devices, for mobile phones. So what we did was take that model, reuse the features that it had learned, and do a transfer learning task to our 400-class classifier. Once we've done that, we have an object detector, and this object detector lives entirely in the Python world. So the next step of the process is to take that and convert it into a format that we can ingest on the web, and then we skin the game and add sound effects and that kind of thing. So let's talk a little bit about the details of actually going through that process.
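The transfer learning idea just described, reusing the features a pre-trained network has learned and fitting only a small classifier on top, can be sketched in plain JavaScript. This is an illustration only, not the actual game code: the "feature vectors" below are made-up stand-ins for the activations a model like MobileNet would produce, and the classifier is a simple nearest-centroid rule rather than the real retrained layer.

```javascript
// Average the feature vectors seen for each class ("nearest centroid").
// In the real game, `features` would come from a pre-trained MobileNet.
function trainCentroids(examples) {
  const sums = {};
  for (const { label, features } of examples) {
    if (!sums[label]) sums[label] = { sum: new Array(features.length).fill(0), n: 0 };
    features.forEach((v, i) => (sums[label].sum[i] += v));
    sums[label].n += 1;
  }
  const centroids = {};
  for (const [label, { sum, n }] of Object.entries(sums)) {
    centroids[label] = sum.map(v => v / n);
  }
  return centroids;
}

// Classify a new feature vector by squared distance to each centroid.
function classify(centroids, features) {
  let best = null;
  let bestDist = Infinity;
  for (const [label, center] of Object.entries(centroids)) {
    const dist = features.reduce((acc, v, i) => acc + (v - center[i]) ** 2, 0);
    if (dist < bestDist) { bestDist = dist; best = label; }
  }
  return best;
}

// Tiny fabricated "feature vectors" standing in for MobileNet activations.
const centroids = trainCentroids([
  { label: 'banana', features: [0.9, 0.1, 0.0] },
  { label: 'banana', features: [0.8, 0.2, 0.1] },
  { label: 'watch',  features: [0.1, 0.9, 0.3] },
  { label: 'watch',  features: [0.2, 0.8, 0.2] },
]);

console.log(classify(centroids, [0.85, 0.15, 0.05])); // prints "banana"
```

The point of the sketch is why transfer learning is cheap: the hard part (turning pixels into discriminative features) is already done by the pre-trained network, so only a tiny model has to be fit to the new classes.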
So in Python, when we're checkpointing and training our model, we have to save it to disk, and there are a couple of ways to do this. The common way with TensorFlow is to use what's called a SavedModel; the details aren't important for this talk, but the idea is that there are files on disk that you need to write. Daniel also mentioned that we support importing from Keras. Keras is a high-level library that lives on top of TensorFlow and gives you a higher-level API; again, details unimportant, it also uses files on disk to checkpoint. All right, so we have a set of files, and the next step is to convert them to a format that we can ingest in a web page. We have released a tool on pip called tensorflowjs, and inside that tool are some conversion scripts. All you do is run the script, point it at those saved files you had on disk and at an output directory, and you get a set of static build artifacts that you can use on the web. The same flow holds for Keras models: you point it at your input HDF5 file, and out pops a directory of static build artifacts. You take those static build artifacts and host them on your website, the same way you would host PNGs or CSS or anything of that sort. Once you've done that, we provide APIs in TensorFlow.js to load those static build artifacts. It looks something like this: for a TensorFlow SavedModel, we load the model up and get back a model object, and that model object can make predictions with TensorFlow.js tensors right in the browser. The same flow holds for Keras models: we point at those static build artifacts and get a model we can make predictions with. Okay, so under the covers there's actually a lot going on when we convert these files to a format we can ingest on the web. We prune nodes off the graph that aren't needed to make the prediction; this makes the network
transfer much smaller and our predictions much faster. We're also taking those weights and sharding and packing them into 4-megabyte chunks, which means that the next time the browser loads the page, your weights will be cached, so it's super snappy. We also support about 90 of the most commonly used TensorFlow ops today, and we're working hard to continue supporting more. On the Keras side, we support 32 of the most commonly used Keras layers during that importing phase, and we also support training and evaluation of those models, so computing accuracy, once you get them in; of course, you can also make predictions as well.

All right, I want to show you a demo before I bore you any more. This demo was built by our friends at Creative Lab as a collaboration between them and a few researchers at Google, so I'm going to go back over here to this laptop. Okay, so the idea of this model is that it takes a 2D image of a human being and estimates a set of key points that relate to their skeleton: things like your wrists, the centers of your eyes, your shoulders, and that kind of thing. I'm just going to turn the demo on here. When I do that, the webcam turns on and it starts predicting key points for me, and I'm going to step back so you can see the full thing; as I move around you'll see the skeleton change and make predictions about me. All right, so there's obviously a lot you can do with this, and we're really excited to show you a fun little demo. What's going to happen is, when I click this slider, we move to a separate mode where it looks for another image on the internet of a person with the same pose as me. Okay, let's turn that on. Is it going to work? Of course it's not working now. Okay, well, we have a physical installation of this which you can go check out; it's at the Experiments tent, and it's really fun, a full-screen version of this where you can see another version of you. We
have released this model on NPM, so you can go and use it, and you need no machine learning experience to do so. The API lets you point at an image, and out pops an array of key points; it's that easy. So we're really excited to see what you do with that.

Okay, so there's a lot you can do just by porting models to the browser for inference, but since the beginning of deeplearn.js and TensorFlow.js we've made it a high priority to be able to train directly in the browser. This opens the door for education and interactivity, as well as retraining with data that never leaves the client. So I'm going to show you another demo of that, back on the laptop over here. Daniel, you want to come help me? Here we go, cool. Okay, so the game is in three phases. In phase one, we're going to collect frames from the webcam, and we're going to use those frames to play a game of Pac-Man. So Daniel, once he starts collecting frames, is going to collect frames for up, down, left, and right, and those are going to be associated with the four controls for the Pac-Man game itself. As he's collecting those, we're saving some of the images locally; we're not actually training anything yet. Once he's done collecting those frames, we're going to train the model, and again, this is going to be trained entirely in the browser, with no requests to a server anywhere. Okay, so when we train that model, what happens is we take a pre-trained MobileNet model that's in the page right now, and we do a little retraining phase with the data he has just collected. So why don't you press that Train Model button. Awesome, our loss value is going down; it looks like we've learned something. Okay, so phase three of this game is to actually play, so when he presses that button, we're going to take frames from that webcam and we're
going to make predictions with the model we just trained. Why don't you press that Play button and we'll see how it goes. If you look in the bottom right of the screen, you'll see the predictions happening: it's highlighting the control it thinks it is, and you'll see him actually playing the Pac-Man game now. So obviously this is just a game, but we're really excited about opportunities for accessibility; you can imagine a Chrome extension that lets you train a model to scroll the page and click. All of this code is online and available for you to go and fork and build your own applications with, and we're really excited to see what you do with it. All right, man, we've got to get back to the talk.

Okay, so let's chat a little bit about performance. What we're looking at here is a benchmark of MobileNet 1.0 running with TensorFlow in Python, classic TensorFlow, not TensorFlow.js. We're thinking about this in the context of a batch size of one, and the reason we want to think like that is because we're thinking of an interactive application like Pac-Man, where you can only read one sensor frame at a time, so you can't really batch that data. In the first row, we're looking at TensorFlow running with CUDA on a GTX 1080; this is a beefy machine, and it's getting about 3 milliseconds per frame (and I want to mention that the smaller the bar, the faster it is). In the second row, we're looking at TensorFlow on CPU with AVX2 instructions on one of these MacBook Pros; we're getting about 60 milliseconds per frame there. All right, so where does TensorFlow.js fit into this picture? Well, it depends. Running on that GTX 1080, that beefy machine, we're getting about 11 milliseconds per frame with TensorFlow.js; running on the integrated graphics card of one of these laptops, we're getting about 100 milliseconds per frame. I just want to point out that 100 milliseconds is actually not so bad: that Pac-Man game was running
with this model, and you can really build something interactive with that. And the web is only going to get faster and faster: there are new standards, like WebGL compute shaders and WebGPU, that give you much closer-to-the-metal access to the GPU. But the web has its limitations: you live in a sandboxed environment, and you can only really get access to the GPU through these APIs. So how do we scale beyond those limitations? With that, I'm going to hand it off to Nick, who's going to talk about how we scale. Thanks, Nikhil. Today we're launching TensorFlow support for Node.js. Thank you! We're really excited to bring an easy-to-use, high-performance machine learning library to JavaScript developers. The open source community around Node.js and npm is really awesome; there's incredible movement in the space, a ton of libraries and packages for developers to use, and now we're bringing ML to that front. The engine that runs Node.js, V8, is super fast; it's had tons of resources put into it by companies like Google, and we've seen the interpreter be up to ten times as fast as Python, so there's lots of room for performance improvements. Also, using TensorFlow gives us access to really high-end machine learning hardware, like GPU devices and TPUs in the cloud; look for support for that soon. Let's step back and look at the architecture we highlighted earlier. We have a Layers API and, a little bit lower level, a Core API that has our ops. This whole runtime is powered by WebGL in the browser, but today, through npm, we're shipping a package that gives you TensorFlow, which gives you access to those TPUs, the GPU, and the CPU, all through our npm package. To show you how easy it is to use our Node bindings, I want to show you a little code snippet. This application function right here is a very common server-side request/response handler for an endpoint; those who have worked with the Express framework know exactly what's going on here. Our endpoint listens on /model and takes input from a request,
which we pass into a TensorFlow.js model, and that output is pushed out into the response. Now, to turn on high-performance TensorFlow code, we only need two lines: an import, which loads our binding, and an existing API call to set our backend to TensorFlow. Now this model is running with the performance of TensorFlow. What works today, out of the box? We can take pre-existing Python models and run them natively in Node.js; the models we've shown off today will all run in our Node runtime. There's no need to bring up a Python stack in your Node infrastructure; just run it in JavaScript. Our npm package today ships an off-the-shelf CPU build; there are no additional drivers to install, just install our package and you're up and running. And our whole API, everything we ship in TensorFlow.js, will work with our Node runtime: every API we've showcased works today, out of the box.

Now, we've built a little demo using Major League Baseball data and Node.js to showcase what you can do with machine learning, Node.js, and JavaScript. We've used Major League Baseball Advanced Media's PITCHf/x dataset to do some machine learning. The PITCHf/x dataset is a large library of sensor data about pitches that baseball players have thrown in actual baseball games. For those who aren't super familiar with baseball, a little context: a pitcher throws different types of pitches to fool the player who's trying to hit the ball. There are pitches with higher velocities, and pitches that are a little slower but have more movement. In this example we've highlighted the fastball, the changeup, and the curveball; those are all types of pitches, but don't get too hung up on the details of baseball. What we're really solving here is a very classic machine learning problem: taking sensor information and drawing a classification from it. So for that, I'm going to actually showcase this demo. On one side of my screen I have a terminal, which I'm going to start my
Node application with, and on the left I have a web browser. We've built a very simple UI that listens over sockets to what our server is doing; those who have used the Socket.IO library with Node know what this interaction is doing. So I'm just going to start my server with node, and now my model is up and running and training. Every time we take a pass through our dataset, we report it over the socket to our client, and you can see the blue bars moving a little closer to 100%; that's our model learning to tell the difference between a curveball and a fastball. As you can see, every step moves a little differently, and our model is having a little trouble at the moment with the fastball sinker, but it has only looked at the data for a few passes; the longer this server runs, the better it gets at training. All of the training data we've shown is historical baseball data from 2015 to 2017. Now I'm going to hit this Test Live button, and this uses the Node framework to go out to Major League Baseball, pull in some newer, live data, and run the evaluation. Once this data comes in, we see the orange bars, and the orange bars show how good we are at predicting data our model has never seen before. These are live pitches, and we're really good at identifying that curveball, and still not so great at that fastball sinker. So let's jump back and continue to our next slide. This is the architecture of what we were doing in that demo: we've built a very simple Node server that hosts our JavaScript models, and when we hit that Live button, it reaches out to Major League Baseball, pulls in that live data, runs inference on the new pitches, and reports the results to any connected clients in the browser. Where does this stack up in performance? In that training set we were looking at 7,000 pitches, and we were training on those 7,000 pitches every couple of seconds. So that's an interesting
benchmark, but let's actually put it next to the MobileNet benchmark we showcased earlier. These are the numbers for Python TensorFlow GPU and CPU inference time. Now, we're just getting started; we've just launched an npm package, and we have a long way to go, but we have some promising early numbers to showcase: TensorFlow in Node.js is exactly as fast as Python TensorFlow. So with that, I'm going to hand it off to Nikhil to wrap up. Thanks, Nick. Exciting stuff.

Okay, so let's recap some of the APIs, libraries, and tools that we showed today with TensorFlow.js. We have a low-level Core API, which is a set of accelerated linear algebra kernels plus a layer for automatic differentiation, and we saw an example of that with the polynomial regression demo. We also have a high-level Layers API that encodes machine learning best practices into an API that's much easier to use, and we saw an example of that with the addition RNN demo. We also showed you a couple of ways you can take pre-trained models from Python, via SavedModel or via Keras models, and port those to the browser for inference, retraining, or computing accuracy. And of course, we also showed you the new Node.js support for TensorFlow.js today, and we're really excited about that.

Okay, so this project, TensorFlow.js, was not just the three of us on stage here: it was a cross-team collaboration between many amazing people at Google, and we also have some amazing open source contributors that I want to send a shout-out to. This project literally would not have been possible without them, so thank you. All of the things we talked about today, the demos, all the code, are open source on our website, js.tensorflow.org; all of the code is also open-sourced on GitHub at github.com/tensorflow/tfjs. We also have a community mailing list, a place where people can go and ask questions, post their demos, and that kind of thing. We have some office hours today at 3:30
and some office hours tomorrow at 9:30, and I invite you to come and talk to us in person. We also have some tents: there's the Experiments tent, which is where the full-screen version of that Move Mirror demo will be, and we'll be at the AI tent as well. So we are really excited to build the next chapter of machine learning in JavaScript with you. Thank you. [Music]
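The Core API walkthrough from earlier in the talk (learn the coefficients a, b, c of a quadratic by minimizing mean squared error) can be sketched in plain JavaScript, with the gradient descent updates that tf.js's optimizer.minimize() would perform written out by hand. This is an illustration, not TensorFlow.js code; the learning rate, epoch count, and training data below are all made up for the example.

```javascript
// Fit f(x) = a*x^2 + b*x + c by gradient descent on mean squared error.
function fitQuadratic(xs, ys, learningRate = 0.01, epochs = 2000) {
  let a = 0, b = 0, c = 0; // the "variables" the optimizer is allowed to change
  const n = xs.length;
  for (let epoch = 0; epoch < epochs; epoch++) {
    let da = 0, db = 0, dc = 0;
    for (let i = 0; i < n; i++) {
      const err = a * xs[i] ** 2 + b * xs[i] + c - ys[i]; // prediction - label
      // Partial derivatives of (1/n) * sum(err^2) w.r.t. a, b, c.
      da += (2 / n) * err * xs[i] ** 2;
      db += (2 / n) * err * xs[i];
      dc += (2 / n) * err;
    }
    // One "minimize" step: move each coefficient against its gradient.
    a -= learningRate * da;
    b -= learningRate * db;
    c -= learningRate * dc;
  }
  return { a, b, c };
}

// Synthetic data generated from y = 2x^2 + 3x + 1.
const xs = [-2, -1, -0.5, 0, 0.5, 1, 2];
const ys = xs.map(x => 2 * x ** 2 + 3 * x + 1);
const { a, b, c } = fitQuadratic(xs, ys);
console.log(a.toFixed(2), b.toFixed(2), c.toFixed(2)); // prints "2.00 3.00 1.00"
```

These are exactly the steps the talk lists for the Core API example: variables the optimizer may change, a model f(x), a loss measuring prediction-vs-label distance, and repeated minimize calls over the data.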
Info
Channel: TensorFlow
Views: 39,752
Keywords: type: Conference Talk (Full production);, pr_pr: Google I/O, purpose: Educate
Id: OmofOvMApTU
Length: 33min 39sec (2019 seconds)
Published: Wed May 09 2018