Machine Learning for JavaScript Developers 101

Video Statistics and Information

Captions
Hello everyone, my name is Jason Mayes. I'm the developer advocate for TensorFlow.js here at Google, which basically means that if you're using machine learning in JavaScript in some shape or form out in the wild, there's a good chance we'll cross paths at some point. With that, today I'm going to talk to you about using machine learning in JavaScript, of course, so let's get started.

First up, I want to talk about how machine learning has the potential to revolutionize every industry, not just the tech ones but all of them. In fact, we could be standing right here at the beginning of a new age. We've already been through the industrial and scientific revolutions, but what about the future? There could be a machine learning one too, and we could be at the very beginning of that right now. This is a really exciting time to start learning about machine learning, as you can jump on the bandwagon early and really get involved and have impact.

Of course, before I get started on that: what's the difference between artificial intelligence, machine learning, and deep learning? I'm sure many of you today have very different backgrounds, and it's important to understand what this is all about, where it comes from, and what all these key terms mean, so we can understand what we're going to be making later on.

First off, I want to start with artificial intelligence, also known as AI. This is essentially the science of making things smart, or more formally, human intelligence exhibited by machines. But this is a very broad term, and right now we're actually in a place of narrow AI. This basically means that the system can do one or a few things just as well as a human counterpart could do in that niche area, such as recognizing objects. A great example of that is when people in the medical industry are trying to understand what brain tumors look like. Nowadays experts use machine learning to work alongside them to help point out what parts of an image may contain a brain tumor
for example, and this leads to better results, because sometimes it's just too grainy for the human eye to see, but ML can pick up on these fine differences, which leads to better results for both the patient and, of course, the doctor.

Now machine learning, on the other hand, or ML for short, is an approach to achieve the artificial intelligence that we just spoke about on the previous slide. The key part about these systems is that they can be reused, and this is done by creating systems that can learn to find patterns in the data presented to them. This is at the implementation level, if you will. So if you have an ML system that is trained to recognize cats, you can use the same system to recognize dogs just by giving it different sample training data.

If we just roll back to traditional programming, as you can see on this slide here, in the old days we'd use lots of conditional statements in order to find spam emails, for example: if the email contains a certain word, mark it as spam. This is not very efficient, because the spammer can just change the word slightly and get around those conditional statements. Fast forward to today, and machine learning programs essentially get tons of emails to classify, which are marked as spam by you, and the program tries to find what attributes of those emails led to them being classified as spam, all by itself. So now there's no battle between programmer and spammer, and the end user can concentrate on making great software instead.

So what common use cases are there then? Well, actually there are quite a few. These are the typical use cases I see machine learning being used for; there are others, of course. We've got things like computer vision, like the object detection example we just spoke about. We've got numerical things like regression, predicting a number. We've got natural language, for example text toxicity or sentiment analysis. We've got audio, for speech commands for example. And my personal favorite is generative, which is essentially
things like style transfer and the creative kind of applications of ML. You can see on this slide an example from NVIDIA whereby they are generating human faces, and these faces do not actually exist in the real world. It's been trained on celebrities in this case, and you can see how this research can now produce very cool imagery.

So what about deep learning? Essentially, deep learning is a technique for implementing the machine learning that we just spoke about on the previous slide, and one such deep learning technique is known as deep neural networks. So you can think of deep learning as the algorithm you might choose to use in your machine learning program. If you haven't heard of deep neural networks, don't worry. Essentially these are just programming structures that are arranged in layers and that are loosely trying to mimic how we believe the human brain to work, essentially learning patterns of patterns, and we'll get into that in more detail later in the talk.

So in summary, here you can see how all these terms are actually interlinked. We have the deep learning that feeds into the machine learning, so the algorithm that goes into the implementation, and that machine learning gives us this grand illusion of artificial intelligence, which is what we're trying to aim for longer term. These terms actually go back to the 1950s and 60s; it's not anything new. It's just that now we have the power, with all the cheap processors and memory, to actually make use of these techniques at scale with all the data that we now have. This previously wasn't possible in the older days.

So how do we train machine learning systems? That's a great question. Essentially we need features and attributes, and you can see here from this example, if we just pretend to be farmers for a second trying to classify apples and oranges, two features or attributes you might want to use would be weight and color. These things are easy to measure digitally and can be accessed at scale, so
once you've got those, if we go back to our high school maths, we can try to plot those features and attributes on this 2D graph here. We've got weight on the y axis and color on the x, and you can see how the green apples and red apples kind of cluster together there at the bottom in their respective color spectrums, and then the oranges, because they're juicy, are actually slightly higher up on the weight axis. We can draw a line to separate the apples and oranges, and in a way this is actually a very naive form of machine learning, if we could get a computer to figure out the equation of that line. Because if we now classify a new piece of fruit, we take its weight and its color and we plot it on this graph. If it falls above the line, we can say with some level of confidence that that piece of fruit is an orange, and if it falls below the line, we can assume it's probably an apple. That's kind of what is going on in all of these systems: the machine learning is essentially just trying to figure out the best way to separate the data so that it can classify it later on.

What about bad features and attributes? It's not always obvious what you should choose here, and here is a great example: ripeness and number of seeds. This could lead to a scatter plot as you see on the chart right now, and there's no easy way to separate this data with a straight line, or even a curved line for that matter. This is a good example of a bad choice of features and attributes. You might be like, "Well, why, Jason, would you choose such things?" But it's not always as simple as apples and oranges. Imagine those brain tumors we were talking about earlier on. What features and attributes would you use to be able to distinguish a positive from a negative result in that case? It gets very hard very quickly, and this is known as feature engineering: finding the set of features and attributes that give you the best separation in the data. That's what folks get paid a lot of money to figure
out properly.

But what about higher dimensions? In our simple example we had just two dimensions. Let's assume we had three. In that case we'd need to plot it on a three-dimensional graph, as you can see on the right-hand side, and here, instead of using a line, we need a plane, or a rectangle in 3D space if you will, to be able to separate the data in a meaningful way. It's interesting to note that most machine learning problems are actually using much higher dimensions than three. Unfortunately our human brains just can't comprehend what that looks like, but you have to trust me that the math is actually the same. Instead of using a plane, you're using something called a hyperplane, and that just means it has one dimension less than the number of dimensions that you're working with. But the math works out the same, and you're using this high-dimensional space and dividing it up in much the same way.

So it should be easy, right? We've got a dog, we've got a mop, what could possibly go wrong? Well, some dogs look like mops and vice versa. My point for bringing this up is that you've got to be aware of the bias in your training data. One of the biggest challenges you'll face is not finding enough training data that is unbiased for the situations you want to use it in. So in the case of recognizing a cat, something as simple as a cat, you might need to have 10,000 images of cats of different breeds, at different stages of the life cycle, of different shapes and sizes, in different environments and lighting conditions, taken on different cameras. All this is required to have the best chance of understanding what cat pixels actually are, and without that you may end up having biases in your machine learning model, which would be very bad. The other point to note here is that data is not always imagery. It could be tables of data with text, or sensor recordings, sound samples, and pretty much anything else you can think of. As long as it can be represented numerically, we can use it in an ML
system.

So that brings us, of course, to JavaScript. Why would you want to do machine learning in JavaScript? That is a great question too. In fact, JavaScript can run pretty much everywhere: in the web browser, on the server side, desktop, mobile, and even Internet of Things. If we dive into each one of those, you can see many of the technologies that you already know and love. On the left-hand side there we have the popular web browsers you might use. On the server side we have Node.js. For mobile we can support React Native and also other things like WeChat and progressive web apps, of course. For desktop, Electron can be used to write native desktop applications, and of course there's Raspberry Pi for Internet of Things. JavaScript is the only language that can be used across all of these devices with ease, without any extra add-ons and plug-ins, and that is a very unique point about JavaScript on its own, which I'm sure you're already aware of. And of course with TensorFlow.js you can run, you can retrain via transfer learning, and you can write your machine learning models completely from scratch if you so desire, just like you could do in Python, if you're familiar with machine learning in Python. That allows you to dream up basically anything you might want, from augmented reality, gesture or sound recognition, conversational AI, whatever it might be. You can do that in JavaScript now as well, giving you superpowers in the browser and beyond.

So there are three ways you can go about using machine learning in JavaScript, and I'm going to go through all of those now. The first one is pre-trained models. These are essentially really easy-to-use JavaScript classes for common use cases, and you can see we have many of these already, from object detection, to body segmentation, which allows you to find where the body is in an image, to pose estimation to detect the skeleton, and we've got speech commands and much, much more. In fact, among some of our newer models on the right-hand side there, you can see we now support face
mesh, which can recognize 468 landmarks on the human face. We've got hand pose, which can detect similar things for your hand, and also the Q&A model, which allows you to do question-and-answer-based natural language processing, all in the web browser. So let's see some of these in action and see how they perform.

First up, I want to talk about object recognition. This is using COCO-SSD, which is the name of the machine learning model that we're using to power this, and it has been trained on 90 object classes, such as these dogs on the right-hand side. So 90 common objects can be recognized out of the box. What's important is that this also gives back the bounding box data, which allows you to localize the object in the image, and that's why we call this object recognition instead of image recognition. Image recognition is where you know that the thing exists but you don't know where it is. So this is a pretty cool one to start with. I'm going to show you how we can write code to make this actually work ourselves, so let's dive into the code now.

First up, let's look at the HTML. This is pretty boilerplate stuff. We're simply going to import a style sheet there, style.css, and then in our main body we're going to have a demo section that initially is going to be invisible, so you can see the class invisible is set at the very beginning there. Then we have some images that we want to be able to classify on click, so these all have the class classifyOnClick, with an image contained within that containing div. These could be any images you want. Then at the end there, you can see we simply have three script imports. The first one is essentially bringing in the TensorFlow.js bundle, the second one is bringing in the COCO-SSD machine learning model, and the third one is, of course, the JavaScript we're going to write to get all of this working.

So looking at the first lines of the JavaScript: first of all, we're just going to define a constant called demosSection, and that's just going
to get a reference to the demo area where all of our images are living. We're going to define a variable modelHasLoaded and set it to false, and also define a variable for the model, to store it once it has loaded. Next we need to load the model, of course, so all we need to do is call cocoSsd.load, and because this is an async function we use the then method to call back an anonymous function with the results. You can see that anonymous function simply takes the loaded model as a parameter, and we can then assign that to our more global variable called model, and we set modelHasLoaded to true so we know that things are ready to use. Finally we remove the invisible class from our demo section to make sure it's now visible and not grayed out like it was before.

Next we get a reference to the image containers, i.e. all the divs that had that classifyOnClick class. We can then loop through all of those and essentially add a click handler to each, so that we can decide what to do when the image within it is clicked. And here we go, here's the handleClick definition. We simply check if the model has loaded. If it hasn't, we return straight away, because there's no point doing anything unless the model is available to use. If it is available, we call model.detect and pass it the image that was clicked, so the event target in this case. Again this is an async operation, so we use then to call our other function, handlePredictions, once it's ready.

In handlePredictions you can see we're passed a predictions object, which we can simply log if we wish to inspect it, but essentially this contains all the machine learning predictions that came back for that single image that we tried to classify. So we can loop through those predictions, and we can create a new paragraph element for each and set its text to what we saw along with its confidence, and then we can also set the
margin of this paragraph so it sits nicely at the bottom of the bounding box. Then of course this thing called highlighter is essentially the bounding box that I've created, and we're just setting the x, y, width, and height coordinates of that element so that it sits in the right place in the context of its parent div. Then we just add these two elements to the DOM, and they should now be visible. And finally, the CSS is pretty self-explanatory; it just styles the various elements we're changing.

So if we put it all together, this is what we get. As you can see, this is the code running, and I can now click on one of these images, and you can see instantly I get results coming back, with the bounding boxes showing the items that were found in each image. I've actually added a little extra bit of code here to do the same thing but with the webcam, and if I enable this, you can see that I can now see myself too. Notice how the performance is pretty cool. It's running at a high number of frames per second, and all of this is running live in the web browser, which means of course that your privacy is also preserved, because this data is not being sent to a server for classification.

The next thing I want to talk about is face mesh. You can see here how it can recognize 468 unique points on the human face, and it's just three megabytes in size. In fact, many people are starting to use this in creative ways, such as ModiFace, which is part of the L'Oréal group, who are using it for AR makeup try-on. As you can see from the image on the right, this lady is not wearing any makeup on her lips. In fact, the lipstick is being chosen dynamically at runtime in the browser, and then we're applying it because we know where the lips are from face mesh. Pretty cool. But let's see it running for real using my face, so I can explain a little bit more.

Okay, so now you can see my face in the web browser, and as I open and close my mouth you can see it reacts really well. It's running at a high number of frames per second,
but this is just running on the CPU. I can actually switch at the top right, and we can get even better performance by running on my graphics card. In addition to doing the machine learning in real time, because JavaScript is obviously great at graphics, we're also rendering a 3D point cloud that we can tinker with at the same time. As you can see, I can move my face around on the 3D point cloud too, so you can use this to make pretty much anything you want.

Next up is body segmentation. This model allows you to distinguish 24 unique body areas across multiple bodies in real time, as you can see from the animation on the bottom here. You can see how well that segments, and it even gives you an estimation of the pose of each body too, i.e. where it thinks the skeleton is, which can be used to do gesture recognition or much, much more. Models such as BodyPix can be used in really delightful ways too. Here are two examples that I created in just a couple of days that allow you to do some powerful things. On the left-hand side you can see how I remove myself from the webcam feed in real time, rendering myself invisible, much like a Harry Potter cloak or something like this, and as I get on the bed you can see how the bed still deforms even though I'm removed from the cam feed in real time. On the right-hand side you can see another demo I created that allows me to measure my body size in real time. I don't know about you, but whenever I'm buying clothes I never know what size I am, so I made this to help me find my size for different brands on the websites that I use, and in under 15 seconds I can get a result back for my chest measurement, my inside leg, and all that kind of fun stuff, in a much more frictionless way. And of course all of this runs in the web browser, so my privacy is preserved; none of these images are going to a server.

And of course all this can give you superpowers too. What if you combine TensorFlow.js with something like WebGL shaders? In that case you
can get an effect like this, which was made by one of the guys in our community in the USA, and which can shoot lasers from your mouth and eyes, all in real time at a buttery smooth 60 frames per second. But let's not stop there. If we combine it with WebXR, a very emerging web standard, you can now even project people from magazines into your room in real time too. This guy is using this on his phone, and then he can walk up to the person and kind of meet them in real life, virtually speaking. So that's pretty cool, and I thought, well, if I can do this, then why not go one step further and combine it with WebRTC to teleport myself in real time? You can see here how I can project myself from my bedroom into another living space. It could be somewhere else in the world, to meet my friends and family, such that I can be closer to them even when I'm not. Having tried this myself, it actually does feel better than a regular video call, because you can walk up to the person and move around them and all this kind of stuff, which you just don't get with a regular video call.

Now the next way you can use TensorFlow.js is by transfer learning. This is where you retrain existing models to work with your own data, and this is the next logical step after using our pre-trained models, to make things more customized to your needs. If you are an ML expert you can of course code all this stuff yourself, but I want to show you two ways today to do this in a super simple fashion. The first one is Teachable Machine. This is a website created by Google that allows you to retrain models in the web browser for very common tasks, like recognizing an object, or speech recognition, or pose estimation, for example, and in just a few clicks you can make your own ML model. So let's try this out right now and see how easy it is to use for something like a prototype.

So here's Teachable Machine. We can click on Image Project to start, and I can click on Webcam, and you can see now that I'm just going to take a
few samples of my head in front of the webcam. Then I'm going to do the same thing for class 2 and take a similar number of samples, but this time I'm going to use this deck of cards. We've got a similar number of images, as you can see. I'm now going to click on Train Model, and essentially that means it's retraining the top layers of the model that we're using, so that we can classify new data using things it learnt from before. In just a few seconds this process will be complete, and we can now see a live prediction coming from the webcam. Hopefully we can see that class 1 is predicted right now, and if I put the deck of cards in front it should now show class 2. Class 1, class 2. Look how responsive that is. It's really, really fast, and you can get this great performance in just a matter of seconds. I think in something like 30 seconds we've made a custom machine learning model, so do try that out in your spare time. You can use this in a prototype: you can simply hit Export Model at the top right there, and you can save the JSON files that you need to then load this model on your own custom website later on to do something more useful. So maybe I can show a deck of cards and reveal a YouTube video, or whatever I want to do.

Now the next method I want to show you is for when you want to do something more for a production use case, which is more than just a prototype. You might have a lot more data, and of course in the web browser you're limited by the RAM that you can use in a single tab, in Chrome of course. So if you have gigabytes of data, you can use Cloud AutoML, and this allows you to train custom vision models in the cloud, which you can then export to TensorFlow.js just like we did before. Here you can see I've just uploaded lots of data of flowers in this case, lots of different folders of different types of flowers, and all you need to do is then specify if you want to train for higher accuracy or faster predictions. Of course with machine
learning there's always a trade-off between these two things, but you can choose which you prefer. You click Next, and then after a few hours of training it will give you the option to export to TensorFlow.js, as you see on this slide. It's super simple to use this exported JSON file. In fact, here's the code all on one slide. All we need to do is include the TensorFlow.js library at the top here. We then include the AutoML library as well, and then below this we have a new image that the model has never seen before. This is just a daisy image I found on the internet, and we can use it as the image we want to classify. Then in just three lines of JavaScript below, we can classify the image.

The first thing we do is wait for the model to load, so we use tf.automl.loadImageClassification and simply pass it a reference to the model.json file that you would have downloaded from Cloud AutoML, which can be hosted on your CDN or your website or wherever you so desire. Because this is an asynchronous operation we use the await keyword, of course, and then the result gets assigned to model when it's ready. We then get a reference to our daisy image, which is the new image we want to classify in this case, and we then simply use model.classify, pass it the image, and await the results. Once those are assigned to the predictions object, which is just a JSON object, we can parse through it and see all the predictions that came back from the ML model for that single image. And of course you can call model.classify multiple times once the model has loaded, so if you wanted to use this with a webcam, you could do that instead and have it running in real time on webcam data.

The third way to use TensorFlow.js, of course, is to write your own code from scratch. Now this is for the machine learning experts out there, or people who want to go more hands-on and low level. Of course going into that would be too much for a 30-minute presentation today, but
there are plenty of tutorials on our website, which I'll share with you later, to get started with this. Today, though, I'm going to go through the superpowers and performance benefits you can get by running in JavaScript and Node, for example.

First up, I want to talk about the different APIs we have available. There are two APIs. The first one is the Layers API, which is essentially like Keras, if you've used Python in the past, and that is a high-level API that's super easy to use. Below this we have the Ops API, which is much more mathematical. This is like the original TensorFlow stuff, if you will, and it allows you to do all the funky linear algebra and all this kind of stuff. So depending on which way you want to go, there are two flavors of TensorFlow.js you can use here, based on your experience and capabilities.

You can see how this comes together. Essentially we've got our models at the top there, based upon the Layers API, and that sits upon the Core or Ops API just below it. That can talk to different environments, such as the client side, and within the client side you might have different environments as well, like the browser, WeChat, or React Native, for example. Each one of these environments knows how to talk to different backends, such as the CPU, which is always available, but also other things like WebGL if you want graphics card acceleration on the front end, or WASM (WebAssembly) if you want better CPU performance. There's a similar story, of course, for the back end on the server side with Node.js, and here it's important to note that we actually have the same performance as Python land. Here we're actually calling the same TensorFlow CPU and GPU bindings that Python has to the C libraries that TensorFlow itself is written in, and that allows us to get the same CUDA acceleration and AVX support for the processor, to make sure things are running as fast as possible. In fact, if for some reason your machine learning team is still using Python, then
of course you can load in saved Python models via the Layers API if they're using Keras, and you can use the TensorFlow SavedModel format via our Ops API to load that directly into Node.js without conversion. You can just take a SavedModel and then use it in Node.js. If you want to use any one of those saved models on the client side, then you have to use our command-line TensorFlow.js converter, and that will convert the model into the JSON format we need to run in the web browser.

So let's look at performance then. Here is TensorFlow.js versus Python running MobileNet, and these are the inference times, i.e. how long it takes to classify the thing we're looking for in the image. At the top there you can see that running on the graphics card takes 7.98 milliseconds in Python and just 8.81 milliseconds in Node.js, so that's within a certain margin of error anyway, and it's pretty much the same for all intents and purposes. Where it gets interesting, of course, is if you have a lot of pre- and post-processing, which a lot of ML models do, because in order for the model to digest the data you need to manipulate the original data into something that is usable in machine learning land. Then you can actually get further performance increases in Node.js because of the just-in-time compiler of JavaScript itself. In fact, we've seen with people like Hugging Face, who are quite famous for making natural language processing models, that they've seen a two-times performance boost just by switching to Node.js for their machine learning pre- and post-processing.

Now if we focus on the client side for just a second, here are five superpowers you get which are hard or impossible to achieve on the server side. The first one is privacy. As I kind of hinted at before, all of these machine learning models are running in the web browser on the client machine. That means at no point is any of the sensor data going to a third-party server for classification, and that's really
important in today's world, where privacy is always top of mind, and with TensorFlow.js you can get that for free. Linked to this is lower latency: because no server is involved when you're running on the client side, we don't have that round trip time from the mobile device, let's say, to the server, which could be 100 milliseconds or more on a bad mobile network connection. Of course that leads to lower cost. If you have a reasonably popular website, you might be spending tens of thousands of dollars on graphics cards and beefy processors to run those machine learning models. By running on the client side, all of that hardware is no longer needed, and you can just execute directly on the client machine. As you all know, interactivity is a big thing for JavaScript. It's kind of been designed for that from day one, so we have a much richer ecosystem for graphics and charting and all that kind of fun stuff. And the final point is reach and scale, which we all know and love, being web developers ourselves. Essentially anyone can click on a link in the web browser and have the machine learning loaded for free, versus trying to do this in other ways on the server side, which would require you to first of all understand Linux and install Linux, then install the TensorFlow stuff and the drivers for CUDA from NVIDIA, then clone the GitHub repo and compile it and make sure it runs with the environment on the server side. All of that hassle goes away when you're running on the client side, and that can get you more eyes on your machine learning research, which could be very valuable if you're a researcher, for example. Maybe that means 10,000 people can try your model out instead of the five people in your lab, and that can maybe uncover bugs or biases in your model that you can then fix before prime time.

Now flipping to the server side for just a second, there are also some benefits there too, of course, if you choose to use
Node.js. Obviously we can use the TensorFlow SavedModel without conversion, as we spoke about. We can also run larger models than we can on the client side, due to the per-tab memory limitations in Chrome. And of course it allows you to write code in just one language, which is of course JavaScript, and needless to say a lot of devs use JavaScript. According to the Stack Overflow survey of 2019, I believe 67% of people are now using JavaScript in some capacity, which is pretty cool. And then there are the performance benefits, of course, that you can get from the just-in-time compiler boost in Node.js over using machine learning in Python, for example.

So with that, I would like to talk to you a little bit about the resources you can use to get started, if you're interested. If there's one slide you want to bookmark today, let it be this one, and the next one actually. Essentially here are some tutorials you can use to get started. These are codelabs: you can walk through them step by step and learn as you go. These are really robust ways to learn some of the things with TensorFlow.js, and machine learning principles in general. And then of course this slide has pretty much everything else. Here's our website to get started. The models that you've seen in this demonstration, and many more, are available on our GitHub there, and we have a Google Group to answer any more technical questions that you may have or may think of later on. Then finally we have CodePen and Glitch, which have boilerplate code you can use to get started. On the right-hand side is our recommended reading material. This is a great book that covers everything, even if you have no machine learning background at all. That's completely fine. As long as you know some basic JavaScript, this book will take you through everything you need to know to get your machine learning chops up to scratch.

And with that, please come join our community. In fact, here are just a few more examples of what people have been
making in just the last few weeks, and this is growing every week. If you check out the #MadeWithTFJS hashtag on Twitter or LinkedIn, you can find what people are making right now, and please do contribute your own for a chance to be featured at future show-and-tell sessions, or even conferences and such, in the future.

The final thing I want to leave you with is this last demo from a guy in Tokyo, Japan. He is actually a dancer, and he's now used machine learning with TensorFlow.js to make his next hip hop video, as you can see here. It's really great to see creative folks starting to embrace machine learning as well. It's no longer just for the one percent of people with PhDs; it's now for everyone, and hopefully TensorFlow.js can make this even more accessible to all in the future. I'm really excited to see what you will make, and please do tag us with #MadeWithTFJS if you do make anything, so we can share it with the team.

So with that, please do stay in touch. I'm happy to answer your questions after the talk, or connect with me on LinkedIn or Twitter and I'm happy to answer questions over there as well. Thank you very much for watching, and see you next time.
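
The object recognition walkthrough earlier in the talk (handleClick calling model.detect, then handlePredictions adding a label and a bounding box) can be sketched roughly as follows. This is a minimal sketch and not the code shown on the slides: formatPrediction is a hypothetical helper, the label wording and the 'highlighter' class name are assumptions, and it assumes each prediction has a bbox of [x, y, width, height], a class name, and a score between 0 and 1, which is the shape COCO-SSD's detect call resolves with.

```javascript
// Hypothetical helper (not part of coco-ssd): turns one prediction into
// the label text and the CSS needed to position its bounding box.
// Assumed prediction shape: { bbox: [x, y, width, height], class, score }.
function formatPrediction(prediction) {
  const [x, y, width, height] = prediction.bbox;
  const percent = Math.round(prediction.score * 100);
  return {
    label: `${prediction.class} - with ${percent}% confidence.`,
    boxStyle: `left: ${x}px; top: ${y}px; width: ${width}px; height: ${height}px;`,
  };
}

// Browser-only wiring that mirrors the steps described in the talk.
// `model` is whatever cocoSsd.load() resolved with; it is passed in here
// rather than kept in a global, which is a choice made for this sketch.
function handleClick(model, event) {
  if (!model) {
    return; // Nothing to do until the model has loaded.
  }
  model.detect(event.target).then((predictions) => {
    for (const prediction of predictions) {
      const { label, boxStyle } = formatPrediction(prediction);

      // Paragraph showing what was seen and how confident the model is.
      const p = document.createElement('p');
      p.innerText = label;

      // Bounding box element, positioned relative to the parent div.
      const highlighter = document.createElement('div');
      highlighter.setAttribute('class', 'highlighter');
      highlighter.style.cssText = boxStyle;

      event.target.parentNode.appendChild(highlighter);
      event.target.parentNode.appendChild(p);
    }
  });
}
```

Keeping the formatting in a plain function means it can be checked without a browser or a loaded model, while handleClick only runs on a page that has already included the TensorFlow.js and COCO-SSD scripts.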
Info
Channel: Coding Tech
Views: 11,208
Rating: 4.9330854 out of 5
Keywords: tensorflow, machine learning, javascript
Id: 47xeOwLLWDc
Length: 32min 23sec (1943 seconds)
Published: Wed Feb 03 2021