Embedded Deep Learning with NVIDIA Jetson

Video Statistics and Information

Captions
Hi everyone, I'm Alexandra DeLeon, content coordinator for NVIDIA. On behalf of NVIDIA and today's presenter, Dustin Franklin, NVIDIA's Jetson developer evangelist, I want to welcome all of you to today's webinar, Embedded Deep Learning with NVIDIA Jetson. All attendees will be muted during the webinar; however, you can ask questions throughout by typing them in the chat area provided on your screen, and we encourage you to do so. Dustin will answer questions verbally at the conclusion of his presentation. We are recording this webinar and making it available for download; you will receive an email shortly after the conclusion of the webinar with a link to the recording and a PDF of the slides. So without further delay, here is Dustin.

Hello everyone, thanks for joining us today. We're excited to be here to talk about how to get started with deep learning and how to deploy it into real-world embedded systems. I'm Dustin, a community developer with the embedded Jetson team at NVIDIA. We've been working to put the power of deep learning into the hands of developers everywhere, and we're excited to share these results with you today. First we're going to clear up what deep learning really is and how anyone can get started with it, introducing the workflow of training and deploying deep neural networks. Then we'll show which networks are already available and ready to use today for accomplishing things like flexible and robust computer vision and perception AI, and we'll talk about advanced learning methods on the horizon for embedded systems, like reinforcement learning and online simulation.

By and large, the main reason we're here today is that the embedded space is growing fast, as robotic platform developers are able to tap into vast onboard compute resources compared to previously accessible levels of performance and programmability. Deep neural networks are well matched to the massively parallel architecture of GPUs and are ideal at adapting across the variety of changing conditions and environments that machines and devices may encounter in the field. Each robot pictured here is NVIDIA-powered and incorporates some aspect of deep learning, like the Starship delivery-service rover on the right, home companion bots from Toyota and IIT, and an urban delivery drone. Deep learning provides them the flexibility and robustness to go out and operate safely in the real world. Over the coming years, billions more of these intelligent machines will be deployed, increasingly in need of onboard processing for high-definition sensors. As we'll see, a lot of recent advances in AI are powered by parallel computing and NVIDIA graphics technology underneath, and since GPUs also tap into billions of other consumer and gaming applications, the rate of growth and development year over year is tremendous.

Neural networks have been around for a long time, since the 1960s, but have only recently exploded in growth thanks to the accessibility of teraflops and petaflops of compute, allowing them to contain orders of magnitude more neurons and express greater intelligence. And recently, with the increasing performance and prevalence of recurrent neural networks, or RNNs, which form memory loops between neurons acting as internal random-access buffers and storage, networks can be deployed that are Turing complete, able to express and evaluate the same range of programs that a human can with traditional languages like C and Python.

One thing about neural networks that's important to remember: at heart they act as big function approximators, undergoing training to produce the desired output from inputs. Pattern matching and recognition is the basis for many such applications in machine learning. You take a dataset of inputs and annotate it with training data, i.e. the desired outputs, and the training system iteratively solves, or trains, the weights to reproduce the output with the least error. The runtime phase of processing networks is called inference, and that's where live sensor inputs get evaluated through the neurons and the previously trained weights to produce the learned result.
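To make that training-versus-inference split concrete, here is a minimal sketch in Python/NumPy of a network acting as a trained function approximator: a toy two-layer network fit by gradient descent. This is illustrative only (the webinar uses DIGITS and Caffe for real training), and all names, sizes, and the target function are made up for the example.

```python
import numpy as np

# Toy dataset: inputs x annotated with desired outputs y (the "labels").
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (256, 2))
y = x[:, :1] * x[:, 1:]              # the function we want to approximate

# Randomly initialized weights for a tiny two-layer network.
w1, b1 = rng.normal(0, 0.5, (2, 16)), np.zeros(16)
w2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)

def forward(x):
    h = np.tanh(x @ w1 + b1)         # hidden layer of "neurons"
    return h, h @ w2 + b2            # network output

# Training: iteratively adjust the weights to minimize the output error.
lr = 0.1
for step in range(2000):
    h, out = forward(x)
    err = out - y                    # error against the desired output
    # Backpropagate the error into gradients for each weight.
    g_w2 = h.T @ err / len(x)
    g_b2 = err.mean(0)
    g_h = (err @ w2.T) * (1 - h ** 2)
    g_w1 = x.T @ g_h / len(x)
    g_b1 = g_h.mean(0)
    w1 -= lr * g_w1; b1 -= lr * g_b1
    w2 -= lr * g_w2; b2 -= lr * g_b2

# Inference: evaluate a new input through the previously trained weights.
_, prediction = forward(np.array([[0.3, -0.7]]))
print(prediction)                    # should approximate 0.3 * -0.7 = -0.21
```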
You can deploy deep neural networks onto Jetson, NVIDIA's high-performance embedded computing module, to implement real-time applications in sensor fusion, vision, and many other areas like smart signal and video analytics. The deeper a network is, and the more artificial neurons it contains, the more intelligent the inferencing that can be encoded within its layers. With the added compute horsepower and efficiency of NVIDIA GPU technology, neural networks can be scaled up in complexity and depth to contain several hundred layers and billions of network parameters, and these deep networks with additional neurons are proven to achieve ever-increasing rates of accuracy and precision. NVIDIA GPUs are able to evaluate and train neural networks up to 20 times more efficiently than traditional CPU systems, and as we'll see, when deployed on small GPU-powered embedded systems, that creates more headroom for bigger networks and faster training. The result is that with GPU acceleration, developers and scientists can create, train, and deploy networks of ever-increasing complexity and intelligence in just hours, where it used to take days or weeks, shortening the development cycle and further accelerating progress.

The first breakthrough came in 2012, when researchers from the University of Toronto released their formative publication "ImageNet Classification with Deep Convolutional Neural Networks," detailing their newfound deep architecture, which they called AlexNet, designed for accurately classifying a variable set of pictures, places, and objects, a task called image recognition. AlexNet employed many more layers and numbers of neurons than were previously seen in a functioning network, coining the term "deep learning" due to the depth of the network and its expanded number of layers. AlexNet delivered a huge jump in classification accuracy versus the existing state-of-the-art approaches in computer vision at the time, and each year since, researchers have steadily improved the classification rates and robustness of recognition networks by employing new and novel network topologies stemming from the initial AlexNet deep convolutional network. Another breakthrough came in 2015, when Microsoft's new network, ResNet, or deep residual networks, surpassed expert human vision and recognition: effectively, neural networks were now more accurate than the best humans at image classification tasks, and speech, audio, and natural language processing networks quickly followed suit. What's more, with a series of hardware and software advances from NVIDIA, including FP16 half-precision floating point and runtime network graph optimizations, NVIDIA has continually improved what was already the leading platform for deep learning, introducing key performance gains measuring up to 20 times more efficient than traditional processing architectures.

In short, there's been no lack of breakthroughs in AI recently, and the number of them powered by deep learning has been tremendous. Starting with the moment that machines surpassed humans at image recognition tasks, AI researchers have proceeded to scale deep learning to greater and greater depths, achieving ever-increasing levels of accuracy and robustness and leading to new applications like Berkeley's pixels-to-actions robot BRETT, which learns interactively from experience, just like people do, to pick up and play with toys and household objects.
In addition to advances in vision, deep learning is breaking down barriers between languages and enables automated speech translation during conversations. And as many of you probably noticed earlier in the summer, in a moment reminiscent of when Deep Blue defeated chess grandmaster Garry Kasparov, Google DeepMind's AlphaGo agent bested world Go champion Lee Sedol four games to one, except in this case, due to the extremely complex combinatorics and the enormous state space, the victory represented a much more advanced accomplishment for AI.

With the rollout of parallel embedded GPU hardware from NVIDIA, these advances are making their way from the lab into the field. The same advances in GPU computing which led to the first deep learning revolution are now available in real-world embedded systems, and they lead to very exciting applications in robotics, automation, and analytics. Recently the WEpod group from the Netherlands entered service with their autonomous people-mover shuttle, which uses deep learning and NVIDIA GPU tech onboard to host its self-driving capabilities. This year there have also been significant advances in delivery and intelligent pick-and-pack machines: Jetson-powered Starships have begun deliveries in five countries, and the top three teams among the Amazon Picking Challenge finalists all used deep learning solutions running on NVIDIA GPUs.

The NVIDIA ecosystem is unique in that NVIDIA provides the highest performance across an incredible range of footprints, while all the processors share the same common CUDA architecture underneath. This makes it easy to scale and deploy applications at the most efficient performance and power level, all the way from the 10-watt, 1-teraflop Jetson module up to the 170-teraflop DGX-1 server and DRIVE PX, all running Linux and sharing CUDA-accelerated applications. At this point so many CUDA libraries have already been released that developers often don't need to write their own custom CUDA code, or be particularly well versed in low-level GPU programming at all, but can simply use libraries that already take advantage of CUDA acceleration under the covers. In fact the CUDA Deep Neural Network library, called cuDNN, is already heavily used in the community and provides high-performing network layers that the deep learning frameworks build on. Most modern deep learning applications use a framework that promotes ease of use and reconfigurability of different network topologies and training schemes, so you aren't hand-coding the network each time. Today all the major frameworks incorporate NVIDIA GPU acceleration and are used heavily by researchers and scientists worldwide to solve problems in computer vision, speech, and NLP that were previously considered impossible. cuDNN has already been integrated into the leading frameworks, like Berkeley's Caffe, Microsoft CNTK, Google TensorFlow, Theano, and Torch from NYU and Facebook. cuDNN includes support for the convolutional network layers that are really the workhorse of deep learning, alongside the Turing-complete recurrent RNN layers that we talked about, and most recently LSTMs, or long short-term memory networks, which are like a more advanced RNN.
In addition to cuDNN, NVIDIA has released TensorRT, our optimized inference library, and DIGITS, the interactive web training system. DIGITS, an open-source project contributed by NVIDIA, is an interactive web tool that runs on a PC or server with GPUs. DIGITS makes it easy to create and train customized DNN models very quickly. It also makes it easy to import custom data into the databases used during the training phase; a variety of formats are supported for importing training data and labels and for specifying the validation and test subsets of the data. After the dataset is in place, you create a new network model and initialize the training parameters. All of this is done through your web browser with just a few mouse clicks; DIGITS actually runs on the GPU-accelerated PC or server, and you just access it through a web portal.

The network topology, which specifies the types and dimensions of the layers, is typically written in prototxt, or protobuf, format, and many example functioning networks are available online in open-source model zoos on GitHub. In addition to the source prototxt for different classes of networks, these model zoos frequently contain pre-trained checkpoints that are ready to use out of the box as a starting point. Refining a pre-existing network model downloaded from the zoo can very often be much faster and require less training time than training a fresh network model from scratch.

Whether or not you're starting with a pre-existing network, during the training phase DIGITS interactively charts network performance and visualizes output layers so you can keep an eye on progress. Analyzing the time series of the network's performance metrics can be key to deciphering the impact that various training parameters have on convergence. When creating new types of neural networks, it's often good to start with a setup that's previously been shown to work and then iteratively tweak and add new features from there, whereas if you're just retraining the network with new custom data, to support recognizing new types of objects for example, you can generally use stock pre-existing network setups like GoogLeNet or DetectNet. In fact, once you get a network training successfully on a particular dataset, it becomes much easier to adapt it to new things simply by feeding in the new data, without changing the underlying network configuration. It's this process of adding new objects to the network that exposes the true power and advantage of deep learning: being able to reconfigure applications for new problem domains and subject matter with minimal effort. Referred to as the DIGITS workflow, it enables the network to recognize additional outputs, or classification categories, by training on new data, without changing code or overhauling the network topology and training parameters. Once you get the network set up and working, new objects can be introduced to the neural network with relative ease; this is done by presenting it with pre-labeled positive and negative training examples, like a collection of images captured from cameras or sensors, or gathered by a web-crawler script.
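Under the hood, DIGITS drives Caffe, so the fine-tune-from-a-checkpoint step it automates looks roughly like the following in pycaffe. This is a hedged sketch, not DIGITS's actual code; the file names (`solver.prototxt`, `bvlc_googlenet.caffemodel`) are placeholders for whatever solver you generate and whatever model-zoo checkpoint you download.

```python
import caffe

caffe.set_mode_gpu()                  # train on the GPU

# The solver config points at the train/val network prototxt and the
# training parameters (learning rate, epochs, snapshot interval, ...).
solver = caffe.SGDSolver('solver.prototxt')

# Fine-tuning: instead of starting from random weights, initialize any
# layers whose names match from a pre-trained model-zoo checkpoint.
solver.net.copy_from('bvlc_googlenet.caffemodel')

# Iterate; Caffe snapshots *.caffemodel checkpoints as it goes, which is
# what you later copy onto the Jetson for inference.
solver.step(10000)
```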
After the initial network is trained and deployed to Jetson, which is as simple as copying the model checkpoint file from the DIGITS server's hard drive over to the Jetson, the DIGITS workflow becomes very powerful and self-reinforcing: platforms in the field begin collecting and reporting back additional training examples with which to refine the network. Thus the networks can rapidly and iteratively evolve, and the more data-collecting inference platforms an organization deploys over time, the more training data they have, and the more intelligent and powerful their network models become. This is where the benefits of big data really start kicking in; and no matter how vast the volumes of data reserved for training, the networks can be deployed on the small, efficient Jetson TX1 module.

Deep learning and intelligent machines are a big reason why we built Jetson TX1. It's a full system-on-module smaller than a credit card, with over 1 teraflop of peak performance at 10 watts of power consumption: the best combination of performance, power efficiency, and size in an embedded package available today. It runs Linux, is compatible with existing software and GPU applications, and provides standard ports and host interfaces like USB 3, HDMI, and MIPI CSI direct chip-to-chip serial cameras; in fact it can ingest up to three 4K or six HD cameras simultaneously, with hardware-offloaded pre-processing and image enhancement via the onboard dual ISP engines. Additionally the Jetson module includes onboard Wi-Fi and Bluetooth, Ethernet, PCI Express expansion lanes, and a nice complement of low-level general-purpose protocols like UART, SPI, I2C, PWM, and GPIO. The underside of the module is fitted with a high-density 400-pin board-to-board connector that carries all the signals, and a carrier board breaks out the I/O: the reference dev kit, the miniature credit-card-sized carriers available from ecosystem partners, or a custom design of your own. All the documentation needed to create designs around the module is publicly available, so there are practically endless possibilities. Although typically geared for applications in the embedded space, where one or two modules may be deployed onboard a mobile battery-powered platform, Jetson is also applicable to green cloud computing; for example, Connect Tech has a slim 1U server that packs an amazing 24 Jetson modules.

Software updates and packages for Jetson are easily available through NVIDIA JetPack. The latest version, JetPack 2.3 with TensorRT, brings up to a 2x improvement in deep learning performance and power efficiency: NVIDIA took what was already the highest-performing inference solution at 10 watts and doubled the performance. All JetPack software upgrades are free for everyone, and Jetson is now twenty times more power-efficient than an Intel Core i7 6700K Skylake system. Jetson packs enthusiast-grade desktop and server-class capability into a small embeddable form factor, with all the software you need available to get started.

TensorRT is a new addition in JetPack 2.3 and is driving these big performance gains in deep learning. As noted in the DIGITS workflow before, TensorRT is used for the runtime inference phase of deployment. TensorRT provides up to double the performance or more by optimizing the network graph structure, fusing CUDA kernels, and reducing memory bandwidth demands; in short, it makes the most efficient use of the GPU hardware possible, including FP16 half-precision mode and onboard kernel auto-tuning to choose the fastest execution path for each network and platform. These performance improvements from TensorRT push inference into real-time territory, meeting a critical threshold for hardware-in-the-loop applications like vision.
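To give a rough feel for the inference side, here is what evaluating a trained Caffe checkpoint on the GPU looks like with pycaffe. To be clear, this is a framework-level stand-in, not the TensorRT API itself (in this JetPack release, TensorRT is a C++ library that consumes the same prototxt/caffemodel pair); the file names, blob names, and the 224x224 input size are assumptions matching a GoogLeNet-style classifier.

```python
import numpy as np
import caffe

caffe.set_mode_gpu()   # run inference on the Jetson's integrated GPU

# Deploy-time network definition plus the trained weights copied
# over from the DIGITS server.
net = caffe.Net('deploy.prototxt', 'snapshot_iter_10000.caffemodel',
                caffe.TEST)

# A stand-in camera frame; a real app would grab this from the CSI camera
# or GStreamer, then resize/mean-subtract to match the training data.
frame = np.zeros((3, 224, 224), dtype=np.float32)

net.blobs['data'].data[0] = frame     # load the input blob
output = net.forward()                # evaluate through the trained weights

probs = output['prob'][0]             # softmax over the trained classes
print('class', probs.argmax(), 'confidence', probs.max())
```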
Deep-learning vision primitives are becoming more and more advanced over time and are starting to operate in higher and higher dimensions, like 3D and 4D, and even higher once you take time series into account. The jump to 3D with ShapeNets allows robots to precisely identify the depth and orientation of 3D models they've been trained to recognize; similar to how the previous primitives were trained on datasets of example images, ShapeNets are trained on a database of CAD models. They're being used frequently in picking and warehouse bin-picking applications, and when you combine them with registration, the ability to align images or point clouds together, you can build and navigate maps, for example a map of your warehouse; this is called SLAM, or simultaneous localization and mapping. Registration gives you the affine transformation between frames or inputs, which is very useful for identifying the underlying platform motion in stabilization and visual odometry applications.

All of the vision primitives shown here are based on deep learning and use convolutional neural networks in various dimensions. The most basic is image recognition: you feed in an image, and it gives back a class for what it thinks that image best represents. The next step up from that is object detection and localization, where a bounding box is returned in addition to the classification result and a confidence value indicating how confident the network is that it actually recognized the object. That's very useful, because it doesn't just give you back the object it detected but also how certain it is, so you can weigh the results appropriately. The next big step up from 2D object detection and localization, where you get the bounding boxes, is called segmentation; this is where a very high-density per-pixel mask is output from the network, giving you well-defined blobs for all the different classes. In the example image, you can see that it identifies sky as one color and vegetation as another. This is very good to use in tandem with object detection and localization, because sometimes you might not care about finding just one specific object in an environment; you might want to know about all the potential hazards and other things you might run into. A drone, for example, would want to home in on the open sky regions, and then it could use object detection to find landing targets where it might deliver packages. All of these deep vision primitives have been under development since deep learning started scaling up and coming online, each year they become more and more advanced, and developers are able to integrate them into their robotic applications. We'll talk a bit more about the ones available to use onboard Jetson today in real time.
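To illustrate how an application consumes these primitives, here is a small, hedged sketch of post-processing detection-style output: keeping only the bounding boxes whose confidence clears a threshold, as described above. The array layout `[x1, y1, x2, y2, class_id, confidence]` is an assumption for the example, not DetectNet's actual output format.

```python
import numpy as np

# Hypothetical raw detections from one frame:
# each row is [x1, y1, x2, y2, class_id, confidence].
detections = np.array([
    [120,  80, 340, 420, 0, 0.94],   # pedestrian, high confidence
    [500, 200, 560, 260, 0, 0.31],   # weak hit, likely noise
    [ 30, 300, 180, 450, 1, 0.77],   # second class, decent confidence
])

CONF_THRESHOLD = 0.5   # weigh results by how certain the network is

for x1, y1, x2, y2, cls, conf in detections:
    if conf < CONF_THRESHOLD:
        continue                     # discard low-confidence boxes
    w, h = x2 - x1, y2 - y1
    print(f'class {int(cls)}: box {int(w)}x{int(h)} at '
          f'({int(x1)},{int(y1)}), confidence {conf:.2f}')
```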
To get started, there are two major resources you should be aware of in the embedded world with Jetson. On the left is our developer forum and community; this is where anybody can get in touch and ask questions, the community responds, and NVIDIA responds. I'm the forum moderator as well, so you can find me there; this is where technical support and all the technical discussions go down, and there's also a sticky thread there about this webinar today, so if you'd like to continue the discussion after the talk, you can find it there. The additional community resource alongside the DevTalk forum is the eLinux wiki, which contains a lot of the systems programming examples and tutorials, plus links to the documentation you might want to start with initially, since Jetson comes with such a wealth of open-source material and thousands of pages of documentation.

All Jetson TX1 users get open access to the NVIDIA Embedded Developer Zone. NVIDIA provides everything a developer needs to get from an idea to a final product, including complete hardware design collateral, reference designs, drivers, and libraries; everything is available through this portal, and when we update documents they're automatically posted there. If you register for the NVIDIA Registered Developer Program, you'll automatically receive an email notification when we update the datasheet or other technical documents, like the OEM design guide, which documents in great detail all of the pins, routing, schematics, and everything else you'd need if you're making your own TX1 design. All of these you can download for free; anybody can access them, because Jetson is an open platform.

In addition to all the Jetson and embedded developer resources, NVIDIA has also launched the Deep Learning Institute, where developers can go to learn the ins and outs of deep learning. In addition to instructor-guided hands-on labs in coordination with the NVIDIA GPU Technology Conferences, self-paced online courses are available to run in the cloud via the likes of Microsoft Azure or Amazon AWS; you can run them locally as well, but you can very easily run them in the cloud without any real up-front investment, so anybody can do it remotely. The Deep Learning Institute has also partnered with Udacity and Coursera to launch certified nanodegree programs, including Udacity's self-driving car course, where participants collaborate on open-source driving software and gain valuable experience developing vehicle perception and navigation; NVIDIA is heavily involved in that course as well. Twelve thousand people signed up for a course originally meant for 500, so Udacity is now kicking off a new instance of the course each month to meet demand. For AI startups and businesses, NVIDIA also runs the worldwide Inception program for gaining access to the latest strategic information and resources, including roadmaps and global sales and marketing channels; you can sign your organization up today at nvidia.com/inception.

If you're eager to get down to brass tacks, the most concrete way to get started today is with a new series NVIDIA has available called Two Days to a Demo, and the ten steps to deep learning. It's an open-source repo on GitHub that contains step-by-step guides along with example TensorRT code and pre-trained network models, including example imageNets and detectNets for locating pedestrians in live video camera feeds. The reason for the name Two Days to a Demo is that anyone can take the GitHub repo and quickly integrate it into their platform of choice to come up with a quick proof of concept. The community has already been using this since we launched it last month, and three such example projects are shown here, including the Jet robotics teaching kit, which uses our support for ROS, the Robot Operating System. With the ROS nodes, they were able to drop real-time recognition into Jet in less than a day, just a couple of hours including testing on the robot. That's because ROS is modular and componentized.
By providing our deep learning ROS nodes, anybody can easily drop them into robots that use ROS and quickly play around with networks on the robot, which can be a lot of fun when the deep learning networks start exhibiting human-like traits and quirks. Speaking of fun, the folks from Make Magazine built this awesome weekend project, an automated cat-toy laser system: when the Jetson's onboard camera detects kitty, it begins shining the laser on a pan-tilt mount, and endless entertainment ensues. They got started with the GitHub repo and hooked it up to their laser, which made for a lot of fun. More advanced integrations, like IIT's new version of the iCub companion humanoid, called the R1, may technically take longer than two days, but the spirit remains the same: getting deep learning deployed in just a fraction of the time, comparatively. After initially using the provided pre-trained models in the repo, like the detectNet pedestrian and facial detectors, the ten steps to deep learning include instructions for retraining the networks with custom data, the last piece of the puzzle for making the application truly your own; and just like that, you have an application ready to deploy with deep learning.

Let's take a look at some additional applications with a more progressed level of integration. Horus is a wearable-devices company from Milan, Italy, that specializes in assistance for the blind and visually impaired. The Horus device heavily leverages deep learning to convert visual inputs from its cameras into verbal audio cues that its users can hear. For minimal latency, everything is processed on board using Tegra, implementing a variety of critical features including landmark recognition, face and identity recognition, reading books and text, and, importantly, identifying obstacles in the environment. Another innovative company using the power of deep learning and GPU tech is Dutch industrial drone specialist Aerialtronics, who recently debuted their automated industrial inspection system at GTC Europe in Amsterdam the week before last. They worked with AI startup Neurala to integrate on-the-fly tracking and detectNet localization of cell phone towers and wind turbines, and the way this was done uses training datasets very similar to what we saw in Two Days to a Demo: essentially you can go out and fly your drone, collect the data you need, and iteratively refine the network model to extract the features required for your particular application.

The deep learning examples and networks we've discussed up to this point use labeled training data to learn to produce the desired outputs. But what if training data isn't available, or isn't easily quantifiable for a particular problem? End-to-end learning may be appropriate in these situations, where there's nothing between the raw sensor inputs and the outputs other than the neural network. For example, in robotic picking and grasping it's a challenge for a human to articulate precisely the correct training stimuli, especially considering the number of possible actions and training iterations; some of the advanced grippers and manipulators that go on the end of robotic arms have upwards of 20 degrees of freedom once you account for all the joints and knuckles, just like a real human hand, so it's very complicated to train a supervised learner in that fashion. Supervised learning is where all of the training data is pre-labeled, most often by a human in the loop. Instead, some robotic applications may benefit from learning from experience and environmental feedback, the same way people and animals do.
A technique in learning theory called reinforcement learning uses a series of rewards; it's similar to how you train a dog, or to playing a game of hot-and-cold. When the reinforcement learner covers the entire end-to-end pipeline, it's called pixels-to-actions, referring to the network's ability to take raw sensor data and choose the action it thinks will best maximize its reward; i.e., there's no code between the raw sensor inputs and the robotic controls other than the neural network. The reward is how you tell the network which behaviors you want it to optimize itself for, and although the network is learning for itself, choosing actions and collecting experiences, it would be a leap to say it's thinking for itself or has some element of free will, because the network's policy is to explicitly follow the rewards. By giving the dog a treat, so to speak, you can train it to do the task you desire, so it is still "programmed," quote-unquote, but at a much higher level than before, and it can optimize its behavior to best achieve that reward. Regardless, over time reinforcement learners show an uncanny knack for developing intuitive, human-like behaviors, like learning to walk, or peeking around corners when they're unsure about what might be on the other side. It's a great extension of the new deep learning compute model, because here there truly is no other human code operating the robot, and that leaves the network to handle all the corner cases that traditional inference encounters; when it loses track of an object, for example, where is it supposed to look? The reinforcement learner can automatically learn to handle that very intuitively, as I mentioned before. Reinforcement learners also naturally incorporate elements of exploration and knowledge gathering, which makes them good at imitating behaviors and performing path planning. Although a robot that learns on the fly from its own experiences may not be completely desirable in a highly structured environment, such as with self-driving cars, drones and off-road robots operating in unstructured environments tend to greatly benefit from reinforcement learners being able to make sense of surroundings that are hard to quantify in advance; for example, all the possible obstacles a robot should avoid. Reinforcement learning is a very intuitive way to express these real-world problems in a form machines can understand. We have a GitHub repo available for reinforcement learning on Jetson as well; the link will be at the end.

Once you get it running on a robot, the next step quickly tends to be getting a simulation of sorts up and running, because reinforcement learners can take many millions of training iterations depending on how complex the behavior is; for tasks that involve chaining multiple behaviors together, it can very easily take multiple millions of training iterations. That can be problematic on robots with a lower MTBF, where the motor servos might die before the network gets fully trained. That's where physically accurate simulators like Gazebo, OpenRAVE, and other simulators widely available in the robotics community come into play, because you can create a virtual version of your robot, which really gets you most of the way there.
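As a concrete toy of this reward-following idea, here is a minimal tabular Q-learning sketch in Python with epsilon-greedy exploration. It's deliberately tiny, a gridworld stand-in rather than pixels-to-actions (which would swap the table for a deep network over raw camera input), and every name and parameter here is illustrative.

```python
import random

N_STATES, N_ACTIONS = 16, 4           # toy 4x4 gridworld, 4 move directions
GOAL = 15                             # reaching this state earns the reward
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1 # learning rate, discount, exploration

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # learned action values

def step(state, action):
    """Toy environment: move on the grid, reward only at the goal."""
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    r, c = divmod(state, 4)
    r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    nxt = r * 4 + c
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: usually exploit the best-known action, but
        # sometimes explore so the learner doesn't settle into lazy habits.
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        # The reward signal is the only "programming": nudge the value of
        # (state, action) toward reward plus discounted future value.
        Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt])
                                     - Q[state][action])
        state = nxt

print('greedy action from start:', max(range(N_ACTIONS), key=lambda a: Q[0][a]))
```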
It's very similar to the DIGITS workflow, where it was better to start with a pre-trained network even though that network wasn't exactly what you wanted, because it got the training weights much closer. When neural networks are first created, they're initialized with random weights, but if you initialize them with weights learned in simulation, that's a much closer approximation, and the network can then continue learning in the field, in the real world, to bridge the gap, so to speak. A lot of times we've found that in simulation, the fidelity of the visual elements, the cameras and such, is actually less important than the physical interactions, having objects respond in a physically accurate way. That's because these reinforcement learners learn to parameterize the physics of whatever environment they're operating in; it's very popular to see reinforcement learners applied to the OpenAI Gym and to Atari and other games, and in that case the learner learns the physics of the game, so to speak, while in the real world it learns the physics of the real world.

Simulators are also very effective for deploying reinforcement learning in parallel, which is work being actively undertaken in the research community right now: collecting experiences from many robots in parallel and training a central neural network that aggregates all of the experiences. When robots are doing reinforcement learning they're essentially exploring on their own, which means an individual robot might not get into a precise situation very often; so when it does, it's very good to have many robots operating in parallel, because it really raises the coverage of the environmental space, so to speak. A lot of reinforcement learners, like Q-learners and actor-critic methods, have an exploration element built into the system to prevent the neural networks from getting stuck in local minima, or stuck in lazy habits where they never find the optimal method because they've already converged on a suboptimal one. These reinforcement learners are very good at going out and exploring all the possibilities to find the best ones, and it's for that reason that they can be very surprising sometimes and exhibit human-like, intuitive behaviors.

A big part of that recently has been the RNN and LSTM networks we talked about before. Since those are able to encode memory within them, they're very good at remembering what they saw and at functioning in what's called a partially observable environment, which nearly all real-world scenarios are. Partially observable just means that your sensors can't instantly sense the entirety of the environment all at once. In an Atari game, for example, the entire screenshot of the game space is sent to the learner, so that's a fully observable environment, because you can see the whole game; but in a first-person view, or with most camera systems that have a limited field of view, you only capture a limited subset of the environment at any one instant. The internal memory cells within the RNNs and LSTMs can remember what they saw in previous time steps, and that enables much more intelligent, long-term behavior, chaining together multiple types of tasks and the like. These LSTMs and RNNs are also GPU-accelerated as of cuDNN version 5 and JetPack 2.3, so all of this is now very fast at runtime.
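To show where that memory lives, here is a hedged NumPy sketch of a single LSTM cell carrying its hidden and cell state across time steps. These are the standard gate equations, not cuDNN's implementation, and the sizes and names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid = 8, 32                      # input features, memory cells

# One weight matrix per gate, acting on [hidden, input] concatenated.
W = {g: rng.normal(0, 0.1, (n_hid + n_in, n_hid)) for g in 'ifog'}
b = {g: np.zeros(n_hid) for g in 'ifog'}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """One time step: gates decide what to forget, store, and output."""
    z = np.concatenate([h, x])
    i = sigmoid(z @ W['i'] + b['i'])     # input gate: admit new info
    f = sigmoid(z @ W['f'] + b['f'])     # forget gate: decay old memory
    o = sigmoid(z @ W['o'] + b['o'])     # output gate: expose memory
    g = np.tanh(z @ W['g'] + b['g'])     # candidate memory contents
    c = f * c + i * g                    # cell state: the persistent memory
    h = o * np.tanh(c)                   # hidden state: this step's output
    return h, c

# The state (h, c) persists across the sequence; this is what lets the
# network remember observations from a limited field of view over time.
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):                      # e.g., 10 sensor frames
    x = rng.normal(size=n_in)            # stand-in observation features
    h, c = lstm_step(x, h, c)
print(h[:4])
```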
You can run these advanced LSTM memory networks onboard the Jetson. To get started, you can grab a Jetson TX1 Developer Kit; this is the reference breakout board I mentioned before. It's like a little mini-ITX desktop system that includes the Jetson TX1 compute module, a 1080p camera module, the reference carrier board, and everything else you need to get up and running, and it takes just a couple of minutes to get started with, because you just plug in normal HDMI and USB 3, just like a little mini desktop; but it delivers over a teraflop of performance while drawing only ten watts. I don't even really use my desktop much anymore; I just run these, and I have a much lower power bill.

That brings us to the conclusion of the presentation portion of this webinar. At this time I'll answer questions from the chat box, so let me scroll to the bottom here.

There was a question: what simulator is in the bottom left? That is the Grand Theft Auto 5 engine, which is used in a lot of automotive scenarios because it's a great driving simulator. The thing with simulators is that each has a unique use case it's best for; I personally use the Gazebo simulator a lot, because it's good for ground robots and industrial arms and already comes with a lot of tutorials.

There was a question about stereo on Jetson. Traditional vision stereo implementations are certainly available, from not only NVIDIA but a wealth of others who have done stereo disparity calculations, but I don't yet know of a supervised stereo network that takes in the raw stereo pair and just gives you back the disparity map. What people seem to be doing with stereo is pixels-to-actions, where they feed in the raw stereo imagery and out comes the action, and it's up to the network to internally extract and use that depth information somehow. That can actually turn out to be much better, because the way humans judge depth is not only by the disparity field but also by relative size, so these reinforcement-learner networks can learn to exploit all sorts of other cues in the stereo field. There was a really impressive network that came out last year with a drone navigating around the woods: they recorded raw stereo input from a human hiker, and when the human turned left to follow the trail a certain way, the drone learned to turn left as well. That was raw stereo-to-actions, and it utilized depth information very well.

Let me look for some other questions here. There was a question: have you deployed TensorFlow on Jetson TX1? The answer is that there have been ongoing efforts to support TensorFlow and get it compiling, because TensorFlow was developed predominantly for a cloud-type environment and has dependencies on precompiled Bazel and other build-system pieces for the Oracle JVM and the like, which are normally provided only for x86 platforms. But since Jetson is ARM, there's been a community effort to get that working, and fortunately patches were just recently posted to the TensorFlow GitHub, and people have now been able to install TensorFlow 0.9, and also 0.11 I believe. Go to the TensorFlow GitHub and look under the issues, and you'll find a Jetson TX1 issue at the very bottom; it's been a very long-running issue to get working, but people have reported success with JetPack 2.3, so that's great.
There was a question: do the virtual environments, like the simulators, work natively on Jetson? The answer is, it depends on the simulator you're using. Generally the open-source ones do, because they can be recompiled for ARM; for example, Grand Theft Auto 5 is not going to work on Jetson, because Rockstar does not provide an ARM version of Grand Theft Auto 5, but Gazebo is, surprisingly, in the Ubuntu arm64 repo by default, so it's one command, sudo apt-get install gazebo7, and instantly you have that simulator running on the Jetson. A lot of times you would run it on a PC or server platform as well, in addition, to do hyper-real-time training; but I'll tell you, having access to a simulator onboard the platform while it's remote is actually quite useful too, because if it gets into a situation it doesn't recognize, it can simulate it and figure it out from there, which is very exciting.

Here's a question: can you call reinforcement learning a form of supervised learning, or would you prefer semi-supervised learning, since we are still learning the intermediate states through annotated outputs? The questioner is right that these are all various shades of supervision: there's supervised learning, there's semi-supervised learning, there's unsupervised learning, and then generally there's reinforcement learning. A lot of times you can set up scenarios where the reinforcement-learning reward is actually distributed automatically in the real world, for example by using another traditional inference network to do the reward-giving. So there can actually be dual networks, where one is giving the reward and the other is receiving it, and there's a lot of work being done now on actor-critic methods that separate the reward-giving and the policy from the network evaluation, introducing new concepts there. NVIDIA has also recently published work on semi-supervised learning, and we've worked on unsupervised autoencoders before, so there are a lot of different training methodologies out there for networks, and the cool thing about GPUs, since they actually run software underneath, is that it's very easy to set up new training methodologies and scenarios and still have the solvers and network layers be very fast and GPU-accelerated.

OK, let me scroll to the bottom here. There's a question: what desktop GPU can I use to run cuDNN or a DNN? The answer is that any GPU since 2008 that supports CUDA can run these, at vastly accelerated rates versus a CPU system; all NVIDIA GPUs support CUDA now, and they all have the ability to run neural networks at very fast rates.

There was a question here: what about SLAM on NVIDIA GPUs? NVIDIA is interested in SLAM, both the traditional way, which would be done through, say, the Point Cloud Library or other CV methods, and those are GPU-accelerated as well; lots of researchers work on SLAM, and it really comes down to point-cloud alignment and registration techniques, like iterative closest point for example. But new work is being done on deep learning for SLAM too; again, it largely focuses on the registration aspect, where essentially you're aligning the instantaneous sensor data to your existing world map, and neural networks can be trained to automatically learn to align input data, making it much faster than the iterative-closest-point versions, for example.
Here's a good question: how many training images should we have for retraining DetectNet with DIGITS? The rule of thumb is 10,000 images if you want to do it really well; you can go as low as a thousand images, but if you're capturing full-motion video from a robotic platform, that goes by quite fast when you're capturing at 30 Hz or 15 Hz or so. If you're doing supervised learning, a good way to label all of that data without having to do it yourself is to use Amazon Mechanical Turk, or you can use other traditional CV methods to label it as closely as possible; this is assuming a totally custom dataset. There are also a lot of existing datasets out there you can leverage, in addition to mining Google Images and the like; Microsoft COCO is one of the largest open-source image datasets available, and it has segmentation masks and bounding boxes for all types of advanced neural networks, so you can get started with those, and if not you can collect the data from a test platform.

Are you using half-precision math in the DNN implementation for TX1? The answer is yes, we use half-precision 16-bit floating-point optimizations in convolutional neural networks and other deep neural network layers, because we found that the final output of the network is essentially no different than if it were running with full float32, and it can be up to 2x faster using FP16. In the future NVIDIA is introducing all types of new hardware units, like INT8, to run neural networks even faster, because scientists have realized they don't even need full FP16 all the time, so NVIDIA has a really great roadmap for scaling neural networks even faster than we have to date.

There's a question here: how good is Jetson TX1 for training a DNN? You would not want to train with DIGITS on Jetson, because it could take many times longer; what might take you an hour with DIGITS on a server might take 10 or 12 hours on Jetson, because that kind of supervised training is really server- and HPC-grade work. But on the side of reinforcement learning and other online learning algorithms that are more optimized for real-time learning, Jetson is very much a capable learning platform for on-the-fly learning. Generally speaking, that doesn't involve terabytes of pre-labeled image data, which is what forces DIGITS onto a server: the ImageNet training database, for example, is about 1.4 terabytes, and training has to iterate over that dataset at least 30 times (each pass is called an epoch), so it has to process something like 50 terabytes of data in all. But for reinforcement learners, unsupervised learners, and semi-supervised learners, Jetson is very adept at those types of embedded online learning.

There was a question: is there an example script in the GitHub repo that we can use to convert any Caffe model into something TensorRT can understand? The answer is yes; it's in what's called the tensorNet class in that GitHub repo, which contains all of the TensorRT code common to every network type, imageNet, detectNet, segNet, and so on. That's where you'll find the procedure that takes in the Caffe prototxt and does the optimization; in fact, the repo will run the optimization automatically for you the first time it boots up, if it doesn't find the optimized TensorRT output in the cache.
The optimization can take up to a minute on Jetson for DetectNet; for imageNet it takes just a couple of seconds, but the more complex the network, the longer the optimization takes, so the optimization outputs get cached on disk. All of that lives in the tensorNet class, and each JetPack 2.3-flashed Jetson also comes with things like single-line TensorRT examples you can follow.

OK, here's a good question: for an industrial high-speed image inspection system, is it reasonable to connect one Jetson TX1 to a 200-frames-per-second high-speed camera and get DNN classification results at 200 Hz, and how complex would that trained DNN be? Jetson would be the perfect platform for doing that in an embedded scenario, because it can process imageNet at multi-hundred-frame-per-second rates; I'd have to check whether it's exactly 200, but it's in that neighborhood, and you can use batch processing or the FP16 mode I mentioned. In a lot of the systems we test, the deep learning is much faster than the cameras themselves, and the imageNet classifiers, for example GoogLeNet or AlexNet, run very fast, so you can do high-speed recognition that way. I'm not sure what type of camera it is, but there are a variety of methods to get that high-speed camera data into the Jetson, including PCI Express, Ethernet, or the camera serial ports.

Here's a question: can a GTX 1060 be used for learning? The answer is yes, you very easily can. I actually have one of those nice little cards; it takes longer to train on than, say, a Titan X, but since the GTX 1060 has 6 gigabytes of onboard GDDR5, it's more than capable of training networks as well. You can run DIGITS on any NVIDIA CUDA-capable GPU, actually.

OK, let me scroll to the bottom and get the latest questions. I see a lot of questions about where to get the presentation and things like that; there's a link on this slide we see right now for accessing the slides. You can just go to that repo on my GitHub, that's where I put all of the PDFs, you can contact me on LinkedIn or via email there, and you'll be able to watch the recording afterwards.

There was a question: can I deploy a model trained on the AWS cloud to the Jetson, and what type of environment do I need on AWS? That is very easy to do, because with DIGITS and Caffe, each training epoch, or iteration through the dataset, saves a network checkpoint that includes the current training parameters and all the weights; it's normally a couple of megabytes, up to maybe 30 or 50 megabytes, as an archive. Then you just copy that over to the Jetson, and in your TensorRT program you load that network model binary, which contains all of the hundreds of thousands up to billions of trained neural network weights. As for the type of AWS instance, I would recommend the GPU instances; you can get them with the Tesla cards, and they've most recently upgraded to the Tesla K80s, I think that's the p2.16xlarge, a new instance type, and that would be very fast.

Here was a question: is it possible to feed a couple of camera inputs to the Jetson TX1 for parallel processing into DetectNet? The answer is yes; you can ingest up to six HD cameras or three 4K cameras simultaneously. A lot of times the networks take a smaller input; the detectNets, for example, take a 1024 by 512 input, and that's considered HD for a neural network,
because GoogLeNet and AlexNet, for example, take in a 224 by 224 input. The networks don't need that many raw pixels, because they're able to extract such good information from these lower-resolution images, and a lot of times that means you can process multiple cameras in parallel.

OK, I think we're almost out of time here, but let me see if I can get one last question: can the Jetson dev kit support a solid-state drive? That's a good question. The Jetson module comes with 16 gigabytes of eMMC, an internal chip that serves as the flash-memory hard disk, but a lot of applications might need external storage, and yes, the Jetson module does have SATA ports that come out over that 400-pin board-to-board connector, and the dev kit also exposes a SATA port. There are also the M.2 SSD keys, which get used often on the miniature credit-card-sized carriers, or SD cards as well; those M.2 SSDs are actually quite popular in that they're very small and easily deployed onto an embedded system.

OK, well, thanks everyone for joining us, and if I didn't get a chance to answer your question, please feel free to reach out, or post in the DevTalk forum thread listed there to engage with the community and discuss it later. Thanks so much for joining us all today; we're very excited about where everyone is heading with this type of technology, and we hope you take a look and try learning it for yourself. Thanks.

Thanks, Dustin, for presenting, and thank you all for attending. As mentioned earlier, we've recorded this presentation and will send it along with a PDF of the slides later today. Thank you again for joining us; we hope to see you at our next event.
Info
Channel: NVIDIA Developer
Views: 71,724
Keywords: Deep Learning, NVIDIA, Jetson, Jetson TX1, Embedded Computing, AI, JetPack, DIGITS, TensorRT
Id: _4tzlXPQWb8
Length: 59min 42sec (3582 seconds)
Published: Fri Dec 09 2016