Hardware Acceleration for AI at the Edge

Video Statistics and Information

Captions
Today on the IoT Show we have Ted Way from the AI platform team, who came to talk to us about hardware acceleration for ML at the edge. [Music]

Hi everyone, this is the IoT Show, and this is Olivier, your host, thanks for watching. We have Ted Way with us today; Ted came here to talk about hardware acceleration for machine learning, right? That's right. So hardware acceleration is something that's needed, but there are tons of things the AI platform team you belong to is doing. Before jumping into what you're doing and how you're helping data scientists and developers, how about you introduce yourself to our audience and tell us a bit more about what your team does, and then we'll jump into the topic.

All right, great, thanks Olivier. My name is Ted Way and I'm a program manager on the AI platform team, specifically on Azure Machine Learning. Azure Machine Learning is your end-to-end data science platform, for everything from building and training your models, to hyperparameter tuning, experimentation, building out your pipelines, operationalization and deployment, and then integrating that into an end-to-end DevOps loop. Specifically, where I sit, we take that model and operationalize it for you, and not just operationalize it, but actually make it run faster on specialized hardware.

OK. So hardware acceleration, I think people have a notion of what that means and why it's needed. We're going to dive into a bit more detail, and I'd like to highlight, in the context of IoT scenarios and edge computing, where we need to bring the intelligence all the way down to the hardware, for various reasons like no connectivity or privacy concerns, how important that hardware acceleration is, and how we're helping developers take the models running today in the cloud and run them at the edge, on devices.

So today, what you're familiar with in IoT is probably numbers, right?
You might have some sensors, maybe a temperature sensor or a humidity sensor, and you can take that data and do things like anomaly detection or forecasting. Those are really cool things you can do on relatively inexpensive hardware. So now the question is: how do you process unstructured data, like images, or voice, or acoustic data? What's powering all of that is deep neural networks, and these are really complex AI models. Traditional neural networks might be one to three layers deep, but deep neural networks can be as many as 50, 100, 150 layers deep. A typical deep neural network like ResNet-50 takes 8 billion calculations, so running 8 billion calculations just to figure out what's in an image becomes really expensive really fast. On an edge device we need to find ways to run that super fast, and that's why we're here: to accelerate it using the hardware.

Makes sense. And we'll see that we're talking about edge, but it's also super important to be able to accelerate these algorithms in the cloud, right? Yeah. So is there cross-pollination? How does that work, what are the various options, and how does hardware acceleration work, especially for us at Microsoft?

From a Microsoft perspective, we look at the whole spectrum of hardware that's available to us. It's not an either-or decision; it's really about what your needs are and what the best hardware options are to meet those needs. If you look at the scale of efficiency and flexibility: on the flexibility side you have your CPUs, super flexible, they run everything, but they're sequential, so they might not run as fast as you'd like. Next up you have your GPUs. GPUs are the workhorses of AI today, great at parallel processing, but they're a bit pricier and they draw a lot of power.
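To give a rough sense of scale for the "8 billion calculations" figure quoted above, a quick back-of-the-envelope calculation; the 30 fps camera rate is an assumption for illustration, not a number from the interview:

```python
# Rough cost of running ResNet-50-class inference on a live video stream.
# 8e9 is the per-image operation count quoted above; 30 fps is an assumed
# camera frame rate used only for illustration.
ops_per_image = 8e9
frames_per_second = 30

ops_per_second = ops_per_image * frames_per_second
print(f"roughly {ops_per_second / 1e12:.2f} trillion operations per second sustained")
```

Sustaining a quarter of a trillion operations per second is comfortably out of reach for the small CPUs on typical sensor-class devices, which is the motivation for specialized accelerators.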
So that's the thing about GPUs. From an efficiency perspective, all the way at the other end are your ASICs, application-specific integrated circuits. These are chips that run whatever you program them to run, but you can't change them; they're perfect for doing one thing, and they're hardened, so once you create that chip and put it into a device, nothing changes, forever.

Another one of the options we're looking at are FPGAs. FPGAs are field-programmable gate arrays; these are chips you can essentially reconfigure, with logic blocks whose routing you can rewire. So once you deploy that chip to your oil rig in the middle of the ocean, or to your super-secret laboratory, you never need to touch the hardware again; you can update it over software. Imagine you have a model that does image recognition, say for workplace safety, and six months later you have an updated model: you can just reconfigure that chip over software, reprogram it with your newest model, and then run the new model directly on that hardware.

OK, so there's a huge variety. Which ones are used in the cloud, and which on the edge? Absolutely. In the cloud, in Azure, we offer CPUs and GPUs for training, and for inferencing we have FPGAs in addition to CPUs and GPUs. So you now have the option to figure out what you're optimizing for: do you want to spend less money but wait longer for your results, or spend a bit more and get your results in a snap? The nice thing about FPGAs is that we're trying to be both cheap and fast: the model runs super fast, and it's very competitively priced. So that's what we have in the cloud.
We also have those options on the edge, but on the edge we additionally have ASICs, and the first one we've been working with is a partnership around the Qualcomm Snapdragon family of chips. They have a digital signal processing unit that accelerates a lot of the processing, so what we do is take your model, put it onto the Qualcomm device, and use that chip to make AI run faster directly on it. And that's what we have here: this is the Vision AI Developer Kit. Let me jump to that slide. On the developer kit we have that Qualcomm chip inside, running Yocto Linux, so it's just an edge device like any other Linux device, and we have the ability to convert a model and deploy it onto the camera.

Sure. So all these different options for hardware acceleration, I would assume, present some challenges, because they're different in nature: what you put on them, the code, differs from one to another. So when you train your model as a data scientist, you need to export it in a format that works for each of these, right?

That's right. If you look at what you'd have to do today: you'd take your model, just like you said, your TensorFlow model or your PyTorch model, and if you decided to deploy it to hardware made by a certain vendor, you'd have to incorporate their SDK, use their SDK to convert that model into a form that can be deployed and run on that hardware. Then you decide to use a different chip from a different vendor: another SDK, another conversion. That's the challenge today with running things on the edge; you have a lot of vendors for GPUs, a lot of vendors for neural accelerators, and so that's why we're starting our partnerships with Qualcomm and with Intel.
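The per-vendor situation just described, with each accelerator needing its own SDK and model format, is what a unified train → convert → package → deploy pipeline has to hide. A hypothetical sketch of that shape; the function names, registry URL, and stand-in conversion logic are all illustrative, not real Azure Machine Learning or Snapdragon SDK calls:

```python
# Hypothetical sketch of the train / convert / package / deploy pipeline
# discussed in the interview. Every function here is an illustrative
# stand-in, not an actual SDK API.

def train_model(framework: str) -> str:
    # The data scientist trains in their framework of choice.
    return f"model.{framework}"

def convert_for_vendor(model: str, vendor_sdk: str) -> str:
    # The platform runs the hardware vendor's SDK behind the scenes,
    # e.g. producing the DLC format that Qualcomm chips consume.
    return model + (".dlc" if vendor_sdk == "snapdragon" else ".bin")

def package_as_container(converted: str) -> str:
    # Packaged like any other model image, pushed to a container registry.
    return f"acr.example/image:{converted}"

def deploy_via_iot_edge(image: str, device: str) -> str:
    # IoT Edge pulls the container over the air onto the device.
    return f"{device} running {image}"

model = train_model("tensorflow")
dlc = convert_for_vendor(model, "snapdragon")
image = package_as_container(dlc)
print(deploy_via_iot_edge(image, "vision-ai-devkit"))
```

The design point is that only `convert_for_vendor` changes per accelerator; the data scientist's side of the pipeline stays identical whichever chip is targeted.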
What we want to do is essentially give you that pipeline, so that you as a data scientist have to worry about, actually, nothing at all. You do what you're doing today: you build your model, you train your model in TensorFlow, PyTorch, Cognitive Toolkit, whatever framework you have. We have a model converter that takes your model and runs the SDK of that hardware vendor. For Qualcomm, we've incorporated their Snapdragon SDK, which converts the model into the DLC format that the chip uses. Then we package it into a container, just like any other Azure Machine Learning model, and we can deploy it to this developer kit and run it on that accelerator. And if you have the IoT Edge runtime on the device itself, deployment over the air is super simple and easy.

So basically, I imagine the experience would look like: I'm the data scientist working on my model, and it's like an export with a drop-down choice of format, you select it, done? Essentially, yeah; we're taking steps towards that goal. Let's talk about what things look like exactly.

We'll start with FPGAs. What we have on the FPGA are optimized models, and going back to that idea of a ResNet-50 model, the way I like to think about what these optimized models do is in the context of, say, a fruit-sniffing dog. Let's say you work at the airport and you have a dog that's supposed to sniff out fruit, to make sure nobody introduces these things into your country. A fruit-sniffing dog, at the end of the day, is a German Shepherd, right? When a German Shepherd is born, it's not trained to sniff out the kinds of fruit you're looking for, but it has the infrastructure, if you will, to take in smells and distinguish among them in a very, very sensitive way.
So now you can start training that German Shepherd: you can say, hey, this smells like that contraband fruit, this doesn't; this smells like that fruit, this doesn't. Three weeks later, and 50 boxes of kibble later, you have yourself a fruit-sniffing dog. So is that how you reward your backend for learning your model, boxes of kibble? That's what we're doing here today!

Now that you have this fruit-sniffing dog, training it wasn't that difficult; making the German Shepherd is the hard part. And if you think about German Shepherds, that's essentially what we have on the FPGAs today: ResNet-50 is a German Shepherd, DenseNet-121 is a German Shepherd, VGG-16 is a German Shepherd. Now you can take in data and, applying the concept of transfer learning, train it to do different things. What I'm showing now is just an example of how you would train an image classification model with a dataset of cats and dogs. All you're doing is pointing to a storage location where you have pictures of cats and dogs, and then you're just using Python and TensorFlow; you don't need to know anything about FPGAs. You run through our sample notebook and deploy the model onto an FPGA.

Bringing it back to the IoT world, that's similar to what we did with Jabil. Jabil is a manufacturer: they have assembly lines, with AOI cameras taking pictures of the products as they come off the line, and today a human looks at those pictures and decides whether to send the product to the next assembly-line step, scrap it, or rework it. I can imagine that process takes a while, because you have several metal components on that board; it takes a long time for all of that to be processed, right?
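The transfer-learning recipe described above, a frozen "German Shepherd" featurizer plus a small trainable head, can be sketched in a few lines of NumPy. Here a fixed random projection stands in for the frozen ResNet-50 featurizer and a toy labeled dataset stands in for the cat and dog images; only the logistic-regression head is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "German Shepherd" featurizer: a fixed random projection plus
# ReLU, standing in for ResNet-50. These weights are never retrained.
W_frozen = rng.normal(size=(64, 128)) / np.sqrt(64)

def featurize(x):
    return np.maximum(x @ W_frozen, 0.0)

# Toy stand-in for the cats/dogs dataset: the label depends on one
# direction of the input (real images would use the real featurizer).
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# Trainable "head": logistic regression on the frozen features,
# fitted with plain gradient descent.
feats = featurize(X)
w, b = np.zeros(128), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.5 * feats.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = float(np.mean((feats @ w + b > 0) == (y == 1)))
print(f"training accuracy: {acc:.2f}")
```

The point of the sketch is that gradient descent only ever touches `w` and `b`; the featurizer weights stay fixed, which is why retraining for a new task is cheap once the "German Shepherd" exists.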
And so, as we think about Microsoft and how we empower people and organizations to achieve more: let's have the human do the high-value work of understanding the real cause of the defects; they don't need to spend all day looking at those pictures. So we built an AI model with Jabil's data scientists, and as you can see here, we have a picture of one of those components. What I'm going to do is send this picture to my model running on the FPGA, and it comes back, as you can see, in about 12 milliseconds. The actual ResNet-50 running on the chip takes only 1.8 milliseconds; the rest is just a little overhead from sending data across the network and so on. 1.8 milliseconds is an amazing way to get what we call real-time AI, because you're able to process data at that rate.

That's how you can take advantage of FPGAs in the cloud, and we're also enabling this at the edge. Every single Microsoft Data Box Edge server will have an FPGA card in it, and what you'll be able to do is go through the Azure Machine Learning process: train your model, containerize it, and deploy it to that Data Box Edge device with its FPGA. This device can now sit in your factory; you can take all of your security video, all of your manufacturing-defect video, feed it to the server, and do your inferencing there. AI in a box! Yeah, that's exactly it.

In a very similar way, let's talk about what we're doing with the Qualcomm camera. Same idea: we're also using the concept of transfer learning, except instead of ResNet-50 we have another model called MobileNet, which has been optimized to run on mobile devices. What I have now is a dataset of flowers, and I'm just running a very similar Jupyter notebook.
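Taking the latency numbers quoted above at face value, the breakdown of the FPGA demo works out as:

```python
# Latency breakdown for the FPGA demo quoted above.
round_trip_ms = 12.0  # end-to-end time seen by the client
on_chip_ms = 1.8      # ResNet-50 execution time on the FPGA itself

overhead_ms = round_trip_ms - on_chip_ms
print(f"network and serialization overhead: {overhead_ms:.1f} ms")
print(f"compute share of the round trip: {on_chip_ms / round_trip_ms:.0%}")
```

In other words, most of the 12 ms is transport, which is also why moving the same model onto an on-premises device like Data Box Edge, or onto the camera itself, shrinks the end-to-end latency further.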
I'm going to upload my data, train the model on it, and then add that extra step of conversion: I register the model, and I convert it into the format that the Qualcomm chip uses. After the conversion is done, I create my Docker container, and that Docker container image is stored in my Azure Container Registry. And then you've got the IoT Hub configuration set up by IoT Edge, for the device to download the container and run it locally.

That's right, running it locally. So let's take a look at my camera; I'm just going to connect to it using my adb shell here. You'll see that right now I have my IoT Edge agent, the Edge Hub module, and my MobileNet flowers module, which was downloaded to the device just like any other IoT Edge module. Now we're going to switch to the camera and take a look at what it's actually seeing. The camera is taking images and sending them to the model, the model is accelerated on the Qualcomm chip, and we'll see results from that. This is how we're enabling you to just take your Python, your TensorFlow model, and not worry about anything else: we convert it for you, we containerize it and set it up for you, you deploy it, and now you can run it directly on the camera. Everything happens down there, privacy included; nothing gets sent to the cloud unless you decide to send it. Exactly, maybe just a 1 or a 0 depending on the result, if that's all you want to send. OK, let's switch to the camera here.

All right, cool, let's see what the camera actually sees and what kind of inferencing it's doing. I have some pictures of flowers here; I trained this on the flowers dataset.
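The "IoT Hub configuration set up by IoT Edge" step mentioned above is driven by a deployment manifest. A minimal, hypothetical excerpt of what the module entry for the flowers model might look like, built as a Python dict; the module name, registry URL, and tag are made up, and a real manifest also declares schema versions and the edgeAgent/edgeHub system modules:

```python
import json

# Hypothetical excerpt of an IoT Edge deployment manifest: the entry
# telling the device's edge agent to pull and run the model container.
# Module name, registry URL, and image tag are illustrative only.
manifest = {
    "modulesContent": {
        "$edgeAgent": {
            "properties.desired": {
                "modules": {
                    "mobilenet-flowers": {
                        "type": "docker",
                        "status": "running",
                        "restartPolicy": "always",
                        "settings": {
                            "image": "myregistry.azurecr.io/mobilenet-flowers:1.0"
                        },
                    }
                }
            }
        }
    }
}
print(json.dumps(manifest["modulesContent"]["$edgeAgent"], indent=2))
```

Pushing this manifest through IoT Hub is also what makes the "10,000 cameras" scenario later in the interview work: the same manifest targets a whole device fleet at once.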
You should be able to see that it's looking at pictures of roses, and if we switch over to pictures of tulips, we should see that it recognizes tulips. That's the basic idea: the processing is done directly on this camera, everything on this camera and the Qualcomm chip. Awesome. And it's been packaged super straightforwardly from the cloud, so if you have 10,000 of these cameras, you have your IoT Hub deploy the same model to all 10,000 of them. And you can update that model, too. Yes.

Awesome. Well, thanks Ted for that overview of what hardware acceleration means for ML, or AI, at the edge. I hope to see you soon for more AI-at-the-edge topics. Thanks for watching, and don't forget to subscribe to the IoT Show. [Music]
Info
Channel: Microsoft Developer
Views: 7,899
Keywords: ai, artificial intelligence, iot, internet of things, dev, software, developer, programming, tech, technology, microsoft, windows, channel9, iotedge, azure
Id: OmOV_4MZ2aM
Length: 16min 13sec (973 seconds)
Published: Mon Mar 04 2019