Animashree Anandkumar - Next-generation frameworks for Large-scale AI | JupyterCon 2020

Captions
Welcome, everybody. We now have our keynote speaker, Anima Anandkumar, and I am very, very pleased to be able to introduce her to this community. She is Bren Professor of Computing and Mathematical Sciences at Caltech, where I did my PhD, so it's my alma mater, and she is also Director of Machine Learning at NVIDIA. She was the youngest named chair professor at Caltech and has received many, many awards in a short time span. But beyond this, Anima is a relentlessly outspoken and influential leader of our times, having spearheaded many campaigns to improve the machine learning community. Her campaign for renaming NeurIPS, one of the flagship conferences in machine learning, got her recognition in the 2018 Good Tech Awards of the New York Times. In terms of research, of course, she is a pioneer of tensor methods and a force to be reckoned with in the machine learning community. So with that, I would like to pass it on to her; she is going to present, and I will hide my video at that point.

Thank you, Lorena, that was just such a great way to connect. I hope you had a great time at Caltech; we continuously strive to have more women and underrepresented communities at Caltech and to have them help us further enhance our research and other aspects of community life there. I'm really pleased to be here today. Project Jupyter has been close to my heart: I've been using notebooks all through my research career, and now with my team, this is the way we prototype, and this is the way we then share the knowledge with the world. Open sourcing is a great aspect of the current culture in machine learning; the democratization of AI is built on the foundation of open source, and Project Jupyter has been enabling all this and beyond. First is the aspect of notebooks and prototyping, but reproducibility and transparency in AI have also been so crucial, and over the last few years there has been an immense push towards enforcing this at our conferences. For instance, NeurIPS this year has a reproducibility challenge; I think all camera-ready papers have to come with code now; and arXiv added a tab where you can have code along with the papers. Machine learning and AI are no longer in the theoretical realm: once they start getting into the practical realm, source code is such an important aspect, pre-trained models and model zoos are important, testing tools are important, and profiling and monitoring tools are important. So this whole ecosystem of the AI stack and infrastructure becomes a critical part of enabling this in practice, and NVIDIA has indeed been at the forefront of building this ecosystem: building platforms and frameworks, in addition to the GPUs, to enable this AI revolution.

So let me start with that. I'll give you an overview of some of the latest frameworks and platforms we're building, along with the algorithmic research that goes with it, because my background, as Lorena was saying, involves looking at tensor methods and other methods first from the theoretical lens, understanding when they do well and when they don't. Seeing them in practice has been a dream come true, as has seeing how we can enable new algorithmic research and have it be synergistic with the development of these frameworks and platforms. With that, I'll start sharing my slides now. Okay, great, awesome. I imagine people can ask questions in the chat, and Lorena, if there is some immediate clarification that doesn't need to wait until the end, feel free to interrupt me.

I will be looking at the Q&A, at the chat here in WebEx, and also in the forum, and I will make sure to transmit any questions to you.

Got it, thanks so much. So with this, I'd like to start with the exciting deep learning revolution we are seeing over the last decade, and that has been possible due to what I call the Trinity of AI. AI is not just the algorithms, not just deep learning: it is built on the foundation of having large datasets, such as ImageNet, and large-scale compute from GPUs. This confluence has led to the ability to have large-scale neural networks deployed in a variety of applications. Indeed, we've seen exciting results all the way from AlphaGo beating the human champion, to NVIDIA's GANs passing the Turing test and generating photorealistic, high-resolution images of faces of people that never existed, or that do not exist. Having this ability goes much further than what any of us had expected when we started working on this a decade ago.

But even with this, there is a lot still left to be done. There's been so much excitement around autonomous vehicles and autonomous systems, but we haven't yet seen one deployed at large scale in the streets, and the reason is that safety criticality is so difficult for current systems: there's a long tail, looking at all possible scenarios in the real world is hard, and any mistake could lead to life-and-death situations, so that aspect makes it very challenging. Language understanding is another area where we've seen very impressive developments with the availability of large-scale language models, but still, given the pandemic we are facing, when there was an attempt to replace the human content checkers with AI, it wasn't ready, and it still isn't ready, because understanding all possible uses of language is still beyond the realm of current models. And when it comes to robotics, we are still far from having adaptive and intelligent beings even like our pet dogs: they may fumble, but they attempt to learn new skills, they interact with us, they attempt to understand us, whereas most of robotics is pre-programmed and cannot adapt with data. So robot learning is another cutting-edge field, and I'll show you some platforms and frameworks where we are developing new algorithms.

So the question, broadly, is: given where AI is today, how do we push it to the next realm? How do we make it adaptive? How do we make it learn with just a few examples? How do we make it robust to noise, the way humans are able to? These are the questions we want to ask to lead it beyond the current realm of narrow AI to generalizable AI. Here I want to break down any AI algorithm into two parts. One aspect is the learning; the other is decision making, because you can learn on some data, but you then also want to decide how to use that model. You can think of decision making as covering what tasks you should design for, how you should design the objectives and the deployment of these models, as well as what actions to take if this is an interactive environment, like a robot interacting with the environment around it. Within learning there is a further division: we mostly think of learning as using data, but there are also priors, priors about the world around us, and currently deep learning models don't use many of those priors, so it's an open question how to build better ones. If you think about current deep learning methods, most of them require massive labeled datasets with expensive human labeling, and they have only weak priors: for instance, for all images we use convolution, but we don't make it domain specific. In autonomous driving it's very important not to make mistakes on signs such as the stop sign, and experiments have shown that even if you add simple blocks like these, you can fool the system into confusing what's in the sign. This brittleness is now a real problem for many of these AI models, and the question is: can we design better and more robust priors?

Much of my recent work has focused on this, whether it's getting inspiration from neuroscience and adding feedback into feed-forward neural networks, utilizing existing controllers in robotics, or using domain knowledge from quantum chemistry to get good, robust features that are transferable to larger molecules. I won't talk about all of that today, but if you go to my website you can see great examples of where adding the right domain-specific priors makes models much more robust and transferable compared to the baseline models. Lastly, we have to be mindful of the tasks on which these models are being trained. We researchers are attuned to this culture of leaderboard chasing, where we all accept a benchmark and try our best to beat the current methods on it, but that can lead to dangerous consequences. For instance, it's now well known that the face recognition models available from some companies are extremely biased against darker skin tones, especially Black women and men. If they're being used in law enforcement and other sensitive applications, you can imagine all the bias amplification happening; it could even lead to life-and-death situations. So it's important, when you're doing your research or putting it into deployment, to think about what kind of benchmarks we should design for training and testing, and whether we can go beyond the current culture of accuracy being the only metric.

These are all aspects to think about, and where I see the next frontiers in AI involves moving from supervised learning to unsupervised learning. A lot of research in my groups, both at NVIDIA and Caltech, is focused on how to do practical unsupervised learning at scale and how to derive inspiration from how humans do disentanglement learning: how humans can disentangle different variations and learn new concepts. We have some exciting new works in this area, but today I won't be talking about that. I'll focus on the priors and ask how to design better, robust priors, whether by incorporating domain knowledge or adding structural priors, and we'll see how tensor methods can give a great set of priors for training good models. On the task side, we've been looking at adaptive tasks that span multiple domains and are dynamic, but these again are aspects I won't be covering today. I wanted to give you an overall view of where we are headed in the coming decades of AI development: more focus on unsupervised learning, development of better and more robust priors, and design of more adaptive tasks. Indeed, a lot of inspiration comes from studying humans, and especially infants: how do they do this amazing learning, predominantly as unsupervised learning? All of those aspects should inspire us to do better and more robust AI.

So now I want to get into the frameworks, and in particular show how we've been working on tensor methods over the last decade, and what frameworks enable them to scale efficiently so you can develop new code and new methods without even knowing all the underlying details. That's where TensorLy comes in, an open-source framework we've been extensively developing for tensor methods, which, as I said, enables good structural priors. The idea is that tensors, graphs, laws of nature, and other forms of priors are really important in many applications. So what is a tensor? It's the extension of a matrix to higher dimensions. Thanks to this cartoon, because dogs are always a great way to describe abstract math and make it more friendly, the idea is really how to expand beyond just rows and columns. But a tensor is not just a multi-dimensional array: just as we multiply matrices, just as we have low-rank matrices, we can ask how to contract different tensors together, how to manipulate them, and how to express low-rank tensor structure. Putting all this together means we can extend matrix algebra to tensor algebra. There has been extensive work on this over the past century, but bringing it to the practical realm, and to machine learning, has been an incredible journey for me and my collaborators.

Indeed, so much of what we see in machine learning involves tensors. If you go to TensorFlow or PyTorch or any of these, that's a data tensor; the name is right there in TensorFlow and in the Tensor Cores of NVIDIA GPUs. But the understanding of tensors there is just in terms of having multiple dimensions. The question is whether we can think beyond that and ask how to retain information across these multiple dimensions efficiently: what kinds of low-rank representations can we design, and why should a neural network have only matrix layers when the input is a tensor? Can we go beyond and design tensor computations in our neural network layers as well? These are the aspects we've been building in TensorLy.

TensorLy is an open-source framework that was first started by Jean Kossaifi and now enjoys an extensive open community. The idea is to make this accessible to a wide community, not just in machine learning but also in quantum chemistry and other areas that require tensor computations, by supporting many different backends: somebody in the quantum field may prefer NumPy, whereas a machine learning researcher will prefer PyTorch, or even the latest one, JAX, and CuPy enables GPU acceleration for NumPy-style code. You can easily switch between all these backends, and there is a stack of operations, from simple manipulations on tensors all the way up to ready-made, tensorized neural network layers, so you can start using them in various applications. There is a whole variety of notebooks on GitHub; I won't be running them here due to lack of time, I'll just show some code snippets, but please go ahead and access them and play with them. You can install this, it's open source, all the instructions are available, and we've just released a new version, so take a look.

The building block for all this is to extend the matrix product to more dimensions. Showing this pictorially hopefully clarifies it: just as you can multiply a matrix with a vector and think of that as a linear combination of columns, you can multiply a tensor along its different dimensions and form those combinations. In a matrix you can only multiply on two sides, whereas in a tensor you can multiply along all the different dimensions, and in TensorLy, as you can see, this is just one line of code: you can directly express these contraction operations. With this building block we can design methods that express low-rank structure in tensors. The most popular version is known as the CP decomposition: unlike matrices, where there is only one notion of rank, tensors of dimension three and beyond have more than one notion, and the CP decomposition is one of the popular ones. In TensorLy you can simply ask to recover the CP form: given any tensor, just say "give me the CP decomposition," and it's a simple operation to recover these low-rank decompositions.

You can also keep the tensor in low-rank form. Even in the case of a matrix, a low-rank factorization is just a tall factor and a wide factor, but if you multiply them out you can get a huge matrix, and it's really wasteful to operate directly on that large matrix. Similarly, with a tensor you want to keep the low-rank form throughout the chain of manipulations; if you expand it, you will many times run out of memory. TensorLy enables this with end-to-end operations, which also speeds things up, because you're now operating directly on the low-rank forms, and you can keep the object as a CP tensor through all the operations.

So the question is how to use these forms of tensors in different networks. The first work we did, my first collaboration with Jean Kossaifi and others, asked how we can change matrix operations in neural networks into tensor operations, and the first one we tried was the Tucker decomposition. Tucker is another form of low-rank decomposition: as you can see here, there is a core tensor, of smaller size than the original, and factor matrices that expand it back to the original dimensions, and again, in TensorLy, it's just one operation to get the Tucker decomposition. We used the Tucker form to compress fully connected layers: in the last layer we can use tensor regression, where we express the weights in Tucker form, so it is low rank and has a much lower number of parameters than the original. By doing this we got space savings as high as 65 percent on standard ImageNet benchmarks and architectures, without any loss of performance. This shows that tensors can compress your layers by huge amounts and speed up operations, and that there is a much richer set of architectures beyond just matrix operations. Jean also applied this to MRI data, because you can do regression on all forms of data, and since this is 3D data there's even more potential for compression as well as robustness. Another interesting feature is that tensor representations have much better built-in robustness compared to the baselines. You can also use this for domain adaptation: if you have multiple domains, an efficient way to transfer information is through a core tensor that is common to the different domains, which should be much better than the simple vector representations people typically use in practice.

Jean also looked into incorporating tensors into other layers, in this case convolutional layers. Earlier we did fully connected layers, but there's a lot of potential to compress and speed up convolutional layers as well, especially with 3D convolution: full 3D convolution is too expensive, so can you express it as a CP decomposition? You can think of this as a sum of separable convolutions, and the hope is that with enough such factors you can still correctly capture the relevant features in images and videos to get good performance. They did this for emotion estimation, across a whole range of emotional states, on carefully collected data, and what they were able to show is that the model has very good performance with a tiny number of parameters compared to the other state-of-the-art models: about one fifth of the parameters, with better performance. I think that's what is surprising in all these models: bigger is not better, and you can get better performance with a smaller model, which is not seen in any other deep learning setting and is what makes tensors so intriguing.

That's also what we found in another work, where the goal was to incorporate better temporal information. If you use LSTM models, there is only short-term memory, so predicting far into the future fails; on the other hand, transformers are still too expensive to run on large-scale videos end to end and capture all the temporal correlations. That's where we designed tensor trains as an intermediate point between transformers and LSTMs, because you can factorize the temporal hidden states over a window and capture their correlations effectively. The idea of a tensor train is that you have a train of factors, with possibly many blocks in the middle, and you go from the beginning to the end. We built that into convolutional LSTMs: the tensor train takes a window of hidden states, looks at their higher-order correlations, and expresses them as a tensor train, which is low rank and hence compact. When you train this end to end, our model has an extremely small number of parameters compared to the state of the art, so you see really impressive compression while still doing better, whether in terms of perceptual scores, SSIM, or other standard scores for video prediction. We also saw this when you want to detect activity early: the convolutional LSTM with tensor train detected the activity much earlier than the other models. So it's not just a compact model, it's able to detect activity early, and that's so important in so many settings, whether it's disruptions in chemical plants and other processes or suspicious activity; all these require early detection, and that's where these models can be effective.

We did this work at NVIDIA, so of course the question was how to make this model efficient and make use of the infrastructure and tools we have to get good GPU utilization. That required automatic mixed precision, so in addition to having compact models to begin with, we can quantize further and get even better performance and memory savings; GPU optimizations such as fusing different kernels; and LSTM activation checkpointing, which led to huge savings in memory; and then, ultimately, multi-node training, model parallelism, and the use of multiple streams. The idea is that we get much faster training, much better utilization of GPUs, and better GPU memory usage, and all these aspects can be combined as we develop new algorithms such as this one.

At NVIDIA, that's the key question: how do we make efficient use of GPUs and design good, effective primitives? To do that, we are moving up the chain, so to speak: if we can block computations and define primitives that compute more complex operations, but do so effectively and in parallel, we can get much better savings. That's where tensors are also really effective in better utilization of hardware. Once we have those primitives, tensor models will be much faster than the ones with matrix operations, because they allow more effective blocking and parallelization. NVIDIA cuTENSOR is a library providing a high-performance set of tensor primitives, such as tensor contraction, and we're now building it into TensorLy, so you can access these high-performance primitives without worrying about low-level details and program directly in Python. You get both good performance and ease of use by having this ecosystem of frameworks come together. And indeed, with the Ampere Tensor Cores there's even further parallelization and blocking of operations using tensors, which is where a CUDA library like cuTENSOR can be very effective.

To summarize this thread: what I showed is that tensors provide higher-order primitives for deep learning, and this is very effective because we no longer need to restrict our neural network layers to matrix operations. With this flexible approach to designing tensor operations, we can compress existing networks by huge amounts and in fact get better performance, so we have smaller models with better accuracy and other performance metrics, and in addition, by building good GPU primitives, we can also get great speedups with it.
so in that sense you're winning in every way by using tensors so i encourage you to go try out tensorly and contribute to the open source community there so that was the aspect i wanted to showcase you know with tensorly and tensors i want to now show you a few other examples of how we are combining algorithmic research with new frameworks and good infrastructure support at nvidia and one of them is nvidia isaac which is the platform for robot learning indeed these are exciting times where we are envisioning robots that can be autonomous that can interact with us that can do all kinds of challenging tasks and uh to do that the isaac platform what it enables us is not only to train on real robot data which is little to come by and expensive to collect but also build highly realistic physically valid simulations so that you can train in simulations to a much greater extent compared to the real world and this is built on top of the egx stack which is the ai stack for the edge and so you can directly deploy these models also onto the edge devices that run on jetson and xavier for instance and there's also uh the exciting 5g capability available with the aerial part of the egx platform so you can have good connectivity you can have good intelligence with ai training and you can have good edge performance so all these are part of the ecosystem for edge devices and the aspect where we are contributing extensively is on the development of the ai algorithms and one example is this paper that got accepted to coral just yesterday and what it shows is if you just have a baseline robot model that tries to walk on surfaces it's never seen it'll fall right so this banana peel is a surface with no friction whereas we trained a reinforcement learning based control that not only can avoid falling off on a banana peel but also do that in the real world robot so here i only have a short video the long one is available both on my website and both of you google this uh there's even 
articles on this and the idea is how we can train good controllers like this in simulation and then deploy them in the real world and the way we went about this was doing it in a hierarchical way so the rl controller only decides how to mix the primitives so the primitives could be like whether it's walking or trotting or just standing so how to mix them up so that's what reinforcement learning is used for and so on the real robot as long as these low level controllers are good and they have the robustness properties because you know that comes from control theory so you can now add learning and adaptivity seamlessly and do it through simulations so that's been the exciting part and so in summary uh nvidia isaac is a great platform for us to try out new reinforcement learning and other adaptive learning methods but it also has physically valid simulations so you can then port them to the real robots and it's also gpu accelerated so we can run reinforcement learning at scale and we can now you know build all kinds of new algorithms so we can do lots of interesting sim to real methods of through these platforms so i'm going to like now take some time to discuss about how an ai stack should look like and how nvidia is approaching this so if you think about it uh you know in the beginning there was only cuda right but that was great uh universal platform because you could like program or you know gpus and then deploy them on so many variety of applications but now with ai and machine learning right you need higher order primitives and that's where frameworks like pycharg tensorflow right will be effective in enabling developers to do that but it's not just about writing deep learning algorithms if you think about end-to-end pipelines for machine learning it involves data analytics right it could involve graph and other kinds of special structure methods so those are aspects as well we want to worry about and that's where rapids is reimagining the ai workflow so not 
just to focus on model training which is what the rest of the frameworks do but add in the data preparation visualization and accelerate it end to end on gpus and then through das have like the ability to scale it out to multi-gpu and multi-node so as you can see like qdf is uh you know is built on the data frames uh framework but now accelerated on kudo and similarly for graph analytics uh even classical machine learning in addition to deep learning and so the main principle here is we avoid the cpu to gpu communications you know the basic hadoop has just not even in memory processing right so each time you write it up to disk read it i mean this is just impossible like you need stage first you query you do etl you do ml train very expensive and all on cpu then spark made it in memory so it avoided the reads and writes between querying and etl but on the other hand it's still on cpu and now to bring it on gpu you can make it efficient but you still have to write you know read it back into the gpu and write it back to cpu right so that was the traditional one and what rapids does is get rid of all of this so keep everything on gpu directly from querying etl all the way to train and then visualization so by avoiding the cpu gpu communication you can get 100x or more improvement and there's lots of real world use cases where this has been demonstrated uh you know xg boost which is very popular uh method which wins uh probably the largest number of kaggle competitions uh the ra the rapids version of it has great speed ups there and you can also as i said scale it out with dash and get performance even on multiple gpus and multiple nodes so you can access more resources this is an open source project so you know notebooks and examples are all available online so check it out uh there's uh an active community on slack twitter and others meet other platforms so now i encourage you to go check it out if you're running different ai workflows this is uh a great way to 
utilize the GPU acceleration, but with minimal code changes. Literally, it can be as little as one line of adding RAPIDS: if you have existing scikit-learn or pandas workflows, you can now seamlessly get GPU acceleration with RAPIDS.

In addition to what I showed, there are many other great verticals where NVIDIA is saying: let's have a platform with pre-trained models, good transfer-learning toolkits, and good ways to deploy the models, and make it easy for domain specialists who don't have an AI background to start using AI. That's where NVIDIA Clara has been a great success story. It started with medical imaging, where annotations are expensive. Can AI assist with the annotations? Can we do good transfer learning? And can it be deployed on all kinds of devices, including the edge, such as scanners and other medical devices? At the recent GTC we had many exciting announcements, including using this platform for drug discovery, and federated learning with many hospitals to do lung scans for COVID. So it's very timely and important to enable AI in the healthcare space. This is an example of how we can take our AI knowledge, build good infrastructure, make these models efficient, and make them easily usable by people without deep AI expertise.

Omniverse is another example of a platform, aimed at merging graphics and deep learning, the two facets of NVIDIA, and making that very usable by artists and other graphics designers. The ability to use all kinds of different interfaces and tools there and get GPU speedups makes it a great platform, and we are using it to do AI-based speedups of rendering and other processes in graphics. It's an interesting question: will GANs one day take over traditional graphics, or which aspects will they take over? There's an exciting interplay between AI and graphics where we'll see more results in the coming years.

The other aspect is conversational AI, which involves multiple modalities. Most conversational AI today is just language and text, whereas in the real world, human interaction, seeing somebody's face and their body language, is so important. Even just doing this virtually on a screen, we know we already miss so many of those non-verbal cues; it's not the same as seeing me in real life. There is so much more to human conversation than just text, and that's what Jarvis enables: multimodality, having video and sound and text all come together. You can then have gesture recognition, chatbots, and multi-speaker transcription, all these different modules coming together, again with a great set of pre-trained models as well as the ability to train your own models efficiently.

So I gave you an overview of different frameworks and platforms for some of the exciting areas in AI. I started with tensor methods, and how TensorLy enables the development of architectures with tensor layers that lead not only to compression but to better accuracy and other performance metrics. We're also looking at better primitives for GPU acceleration, so we are getting gains on many fronts by going beyond matrix operations to tensors. Then I talked about RAPIDS enabling end-to-end GPU acceleration, all the way from data access and preparation to visualization, and making it seamless for existing workflows to be brought onto GPUs: whether it's scikit-learn or other frameworks, with almost no code change you can move that to RAPIDS. I also showed many other vertical platforms. For instance, for robot learning, Isaac has enabled us to demonstrate impressive tasks on real robots, like walking on slippery surfaces a robot has never seen,
because we could train extensively in simulations to get this adaptive behavior, and the simulations are physically valid and of high fidelity. And Clara is doing the same for healthcare, enabling medical imaging, genomics, and drug discovery to go forward, which is especially relevant in these times of the pandemic. Thank you so much.

Thank you, Anima, thank you so much. I see a few questions that have come in, and one question that came early on, before you talked about TensorLy, was asking: in your opinion, what are the best open-source libraries for working with tensors, for operations and visualization, using Python?

Yeah, so the answer is TensorLy. That question came in at 2:39, so I guess they were thinking along the right lines; I'm glad. There are many other libraries, but most of them are focused more on the quantum domain, because that's where tensor methods were popular before getting on to the use cases in machine learning. So they don't work with the deep learning frameworks, and many times they don't have GPU acceleration. All of these are aspects that TensorLy aims to solve in one shot.

I knew the answer to that one, even though I'm not an expert in this topic. Okay, I have another question here; someone actually grabbed a question from Twitter and posted it in the Q&A. It reads: in the trinity of AI, should domain knowledge be added there, or somewhere else in the process?

That's a good question. For me, that comes with the algorithms facet, because I broke down algorithms in terms of data priors, task design, and action design. So I view it within the framework of algorithm design.

And I think this is a follow-up question, or no, it seems it comes from a different person: having said that generalized learning and recognition is the goal, who is leading in the field right now? And if you were to compare the current state of the art to the age of a human baby and its ability to learn and make decisions, what age are we at?

That's a good question. I guess we are not even at the age of single cells, in many ways, because if you think about even single-celled organisms, including the virus that has brought humanity to its knees: they're highly intelligent and can rapidly evolve. These are not abilities any of the current AI methods have. Getting to humans is so far off; we should probably start with viruses and bacteria, or maybe C. elegans, the tiny worm with, I guess, just a few hundred neurons. We are not even there, because of their ability to adapt, the way they interact with the environment and are robust to so many kinds of unseen scenarios. So it's a good question: are there good ways to simulate that and see whether the current AI methods do well?

We did release one such challenge, called the Bongard-LOGO challenge, inspired by the Bongard problems from the '60s. The idea is: if you look at the shapes, they're extremely simple-looking, so you would think a deep learning method would get 100% accuracy. But the catch is that it's few-shot learning, with just six positive and negative examples in each instance, and each instance has a different concept: for instance, how many lines are in this image, or whether the shape is convex or concave. There are many concepts like that, and tens of thousands of problems to solve. When you give these kinds of challenges, even the best meta-learning and other sophisticated AI methods perform very poorly compared to humans. That shows it doesn't have to be very challenging, rich scenes; even extremely simple shapes can challenge AI methods, because concept learning and reasoning are still very primitive. They're not there yet.

I wonder if you'd like to stop sharing your screen, so that we get a
better sense of you. Yeah, okay, here we are. So another question here; well, by the way, the first part of that earlier question asked for your opinion on who is leading in the field right now in generalized learning, and you skipped that.

I skipped it on purpose: since I personally work on that, I think my answer will inevitably be biased. I believe we are doing exciting work both at NVIDIA and at Caltech, especially talking to Doris, with whom I'm closely collaborating. But indeed there are many others doing very important work in this area; Josh Tenenbaum at MIT comes to mind, for instance. But I don't want to leave anyone out.

Of course. We know that you are one of the leaders in the field, so we are lucky to have you. I have a couple more questions that have come in. One person is asking, short and sweet: do you know of any interesting applications of tensors in finance?

Indeed. For instance, the tensor-train model that I displayed: we used convolutions there because it was images, but the earlier version of it was purely for forecasting. And forecasting is the bread and butter of finance, whether short-term or long-term, so that would be the most immediate use case of an existing model.

Great. And then I have a question that says: you touched on the topic of safety, for example in self-driving cars, and ethics, for example black-box criminal evidence and judges' bias in training models. Who, in your opinion, should be leading the way in ensuring that we don't go in a completely dangerous and ethically wrong direction with ML and AI, and what efforts are there?

I think that's a great question, and I'm always saying it should be a community-led effort. The tech-centric view that tech can solve everything, that engineers are the modern-day heroes who can answer every question, is wrong, and I think that's what has led to so many blind spots. We should talk to policy experts, psychologists, anthropologists, and other areas too, to work out who the stakeholders are: who is AI affecting here, and what is the backup plan? It also requires time to think this deeply, so the Silicon Valley culture of "move fast and break things" is really not suitable for AI. That formula worked for scaling up the internet, it worked for consumer social platforms, it worked for enterprise tools, but AI is not like that. I see many companies trying to use that same formula, but AI is so much deeper than building infrastructure, deeper than saying okay, we've deployed this, it's done, we don't need anything beyond basic maintenance. It's constantly evolving, and we are so far from the final answers. I think any company that treats AI as traditional software will not be around for long.

I'm very glad you mentioned community. It reminded me of the Turing lecture by Ben Shneiderman at the University of Maryland; he's a pioneer in human-computer interaction. He was proposing the idea that, just as there is a National Transportation Safety Board, where if there's an aviation crash they all come in and do the studies, there should be a sort of safety board for algorithms: the idea of algorithmic accountability through constant monitoring, and also investigation after an issue has occurred. So that's very interesting.

Indeed, but the hope is we won't be doing it only after some catastrophic result. Hopefully not. And it's also sometimes very hard, even for the public, to glean that something was a bad outcome. We've seen social media attempting to destroy democracies, and so many people don't even realize that, whereas a car crash
or an airplane crash is so visceral: you know something is wrong there.

Yes, and now I remember he was looking at three different stages. The first was the planning stage: for example, as in construction, if you want to make a building you have to write your plans and have those plans approved. To think about the possibility of planning out your algorithmic research or product and then having it approved by a process is so far from what we experience now. A person from the audience agrees on the community approach, and there's another question about applications of tensors, this time in genomics. Do you know of any?

In the past we've been looking at the use of tensor methods for learning good topic models. Topic models, also called admixture models in genomics, are a popular class for looking at what the underlying topics or components are, without the hard-clustering requirements of separability. That would be an immediate application of an existing method. But there can be many other aspects: not just genomic data but, say, medical imaging and other modalities, where you can look at the interaction between the modalities using tensors as another application.

It seems that tensors are everywhere. I remember studying tensors in my solid mechanics class at Caltech, using index notation, all in the old-fashioned way, and now we have computations for all that. So I want to imagine how the physical models we build in my world in engineering, based on PDEs and the standard discretization methods, might be re-implemented, rethought, rewritten to take advantage of tensor algebra and tensor hardware.

Absolutely, and you're right on target there, because that's another area we're working on. In fact, we've used graph neural networks to learn neural operators and, inspired by multipole methods, built multi-level graph networks, so you get the benefits of these traditional methods but with the speedup and better efficiency of deep learning. To me that's just the beginning: how we can reimagine these scientific applications by taking the inspiration, and some domain-specific constraints, from the domain, while adding the flexibility of data-driven methods. A lot of tensor algebra is natural to those domains, so I see a lot of excitement going forward.

There's one more minute left, and I don't see any new questions, so I'm going to ask one more question that I have thought about for you. I know you have a background in engineering; your first degree is in engineering, in India. I don't know how the regular engineering courses are in India, but here (I'm an engineer as well) the linear algebra class in a classical engineering education is, for some historical reason that I'd like to find out, the first proof-based class. So engineers don't get a feeling for the applications and all the wonderful usefulness of linear algebra; instead they prove theorems for a whole semester that they want to forget and never go back to. How would you redesign the linear algebra class for engineering?

That's a great question. Linear algebra is now at the foundation of machine learning, so having that theoretical foundation is good, because you need to understand what rank means, and not just think of a matrix as rows and columns. That's the main difference: people who have done the theory of linear algebra well know that a matrix is an operator. I think that's a good point of view, because knowing how to design good low-rank representations and good manipulations, treating it with that end-to-end view rather than as just an array, makes a difference. But I think
integrating that course with all these programmable tools is important, whether it's TensorLy or, even just for basic linear algebra, the GPU acceleration we have with CuPy. So why not have some quick exercises? For instance, in my class at Caltech on foundations of machine learning, we cover a whole range of topics, and tensors are one small portion, but I ask students to write some simple programs in TensorLy and try out some deep learning models. That way there's a connection to practice, and I think that's important.

I'm so glad you said that. And speaking of engineering, I did electrical engineering, and I'm so glad I did, because having a signal processing background, thinking of the Fourier representation of a signal, going from there to a neural network representing any signal is just so natural to me. In fact, that's how the proof works: you first look at the Fourier decomposition, then you replace each of the sinusoids with rectified linear or other units, and you can make the bases adaptive. So that was a great foundation for me. And now, with the data science major and minor at Caltech, and at other places too, there's this more comprehensive view combining CS, EE, and other aspects of math and engineering, and I think that's a much better foundation for working in AI and machine learning.

That sounds wonderful, and I'm so glad to hear your answer. Okay, everybody, I see we've just gone a little bit past the time, so I'm going to stop the recording. Thank you very, very much, Anima, for accepting our invitation to keynote at JupyterCon 2020; we really appreciate it.

Thank you so much, Lorena. You did so much work in the background for this, and it's a great community. I hope for more usage of Jupyter and more contributions to Project Jupyter. I wish you all the best. Thank you.
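The "almost no code change" idea described for RAPIDS in the talk can be sketched as follows. This is a minimal illustration, not NVIDIA's own example: the DataFrame contents and column names are made up, and the drop-in swap is hedged to simple cases where cuDF mirrors the pandas API.

```python
# Runs on CPU with pandas; on a machine with RAPIDS installed, swapping
# the import for `import cudf as pd` is, in simple cases like this one,
# the only change needed to run the same workflow on the GPU.
# The data below is invented purely for illustration.
import pandas as pd  # with RAPIDS: import cudf as pd

df = pd.DataFrame({
    "sensor": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
})

# A typical ETL step: group by a key and aggregate.
# cuDF exposes the same groupby/mean interface as pandas.
means = df.groupby("sensor")["value"].mean()
print(means["a"], means["b"])  # 1.5 4.0
```

More complex pipelines may need additional changes, but this is the seamless-migration pattern the talk refers to.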
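A quick exercise of the kind suggested in the teaching discussion above, illustrating the "matrix as an operator" and low-rank view of linear algebra. It is written with NumPy; CuPy exposes a near-identical API, so in many cases `import cupy as np` would run the same exercise on a GPU. The matrix sizes and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 50x40 matrix that is exactly rank 2: a sum of two outer
# products. Seeing a matrix as an operator assembled from rank-1
# pieces, rather than as a grid of numbers, is the point of view
# discussed above.
u1, v1 = rng.standard_normal(50), rng.standard_normal(40)
u2, v2 = rng.standard_normal(50), rng.standard_normal(40)
A = np.outer(u1, v1) + np.outer(u2, v2)

# The singular values reveal that structure: only two are
# (numerically) nonzero, so the numerical rank is 2.
s = np.linalg.svd(A, compute_uv=False)
numerical_rank = int(np.sum(s > 1e-10 * s[0]))
print(numerical_rank)  # 2
```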
Info
Channel: JupyterCon
Views: 526
Rating: 5 out of 5
Id: U2aqdYrJh-I
Length: 60min 25sec (3625 seconds)
Published: Wed Nov 18 2020