Don’t we have ChatGPT? Problems and Challenges in Machine Learning and Robotics

Captions
[Pre-webinar chat] So, how was your hiking? It was in Scotland, and it was eventful and at the same time a little boring; it's good that we didn't see the Loch Ness monster, but other than that it was very uneventful. I'm sure you didn't expect to see one, though maybe you were hoping against hope. We are almost set, a minute left. When does your university reopen? Amsterdam reopens in September for students, so that's when the real teaching starts; I start teaching in the second quarter, around mid November, so I have a little longer away from teaching, though there is never really a vacation. And Floris, you don't have such things, you keep doing your research, am I right? When was the last time I went out of Japan? Just for a conference in London a few months ago. I think we have reached the right time, so maybe we should start. Yes, let's go.

Ladies and gentlemen, good evening to those of you joining from India and Japan, and a very good morning to those who have joined from other parts of the world, the US and Europe. I am Saidi Pratnam, chief operating officer at MIJSC at IIMB. The Mizuho India Japan Study Centre is a centre of excellence at the Indian Institute of Management Bangalore, and we are delighted to welcome all of you to MIJSC's fifth webinar in our panel discussion series, also known as expert conversations. In the overall scheme this is the 23rd webinar in the series, which we started right at the peak of the coronavirus pandemic in August 2020. This webinar aims to foster a discussion on a specific topic among a group of esteemed panelists from India and Japan, who will share their diverse, and sometimes perhaps converging, perspectives with a large audience. Our panelists, experts in their respective fields, will provide valuable insights across various areas of management, technology, and of course society and culture.

Before going into the discussion itself, let me give a brief introduction to our centre and to today's topic. As I mentioned, the centre is a centre of excellence at IIMB, set up to increase collaborative initiatives among academia, industry, policy agencies, and society at large. Its focus is to strengthen India-Japan bonds by promoting academic, industry, and societal linkages.

Today we are honoured to have two very distinguished panelists with us. Our first panelist is an associate professor in the Department of Business Analytics at the University of Amsterdam. He did his Master's at the Indian Institute of Science in Bangalore, India, and his PhD, on coordination in software development, at the University of Twente in the Netherlands.
His research interests include business intelligence through machine learning, open source development, and mining of software repositories, and he has applied these analytics in line with the UN's Sustainable Development Goals. Professor Amrit has an extensive publication record, with over 70 research articles to his name. He serves as a department editor at IEEE Transactions on Engineering Management, is a coordinating editor for the Information Systems Frontiers journal, and is an associate editor at the PeerJ Computer Science journal. He is an active participant in the academic community and regularly chairs tracks at the European Conference on Information Systems and many other such conferences the world over. So thank you, Dr. Chintan, for accepting to be a part of this webinar today.

Thank you for inviting me, it is a real pleasure to be here.

Our next panelist is Dr. Floris Erich. He is a permanent researcher at the National Institute of Advanced Industrial Science and Technology (AIST) in Japan. His work centres around bridging the gap between the virtual and physical worlds, and he develops tools and techniques to model real-world conditions for verifying the correct behaviour of robotic systems. Floris has contributed to various projects sponsored by organisations like the New Energy and Industrial Technology Development Organization as well as the Japan Science and Technology Agency. He earned his PhD in Human Informatics from the University of Tsukuba, and his research focus, starting from his PhD thesis, has been on reactive programming using procedural parameters for end-user development and operation of robot behaviour control. A very interesting background, and he is now fully in a research environment; as we were discussing earlier, he has no teaching load, I presume, so he can focus full time on research. So thank you again, Dr. Floris, for agreeing to be part of today's discussion.

Thank you so much, it's an honour to be here.

Our topic is problems and challenges in machine learning and robotics. Data-driven machine learning models have become pivotal in addressing real-world challenges within corporate businesses. Nevertheless, a prevalent hurdle arises: these models often face limitations with respect to data, both in quality and in quantity. Quality becomes an issue, including problems of sample size, scarcity, noise, and incompleteness of coverage, that is, whether the data covers only part of the aspect we are exploring. This presents a significant barrier for those striving to develop accurate predictive machine learning models. Of course, ChatGPT is one such model, but we are going to generalise the topic, focusing not on ChatGPT but on predictive models in general and the challenges around them, because this is the way the world is going and we need to understand it. Hopefully, by the end of this panel discussion, we will have a good idea of the nuanced distinctions between practical machine learning models and their deep neural network counterparts. The crux of this disparity lies in the differing scales of data availability: practical scenarios entail small, sparse, and noisy data sets. This is the challenge our two experts will be addressing today.
The central theme of the dialogue will revolve around comprehensively exploring the multi-faceted landscape of data-related challenges. The panelists will unravel the inherent problems and perils that arise when dealing with such data, and they will hopefully shed light on potential strategies to overcome these challenges in the context of machine learning and model building. Domain adaptation will take centre stage in this discussion: this strategic approach involves learning from one domain and applying those insights to another, distinct domain. It is a powerful way to learn quickly, and its potential lies in its ability to address data-related obstacles and facilitate cross-domain learning and adaptation. We hope that the conversation will shift from machine learning to its cousin, its application in robotics, and here Dr. Floris will focus on generative adversarial networks. Both he and Professor Chintan will elucidate how GANs can ingeniously create relevant data, thereby enhancing the effectiveness of these models even when faced with limited or insufficient data. So this discussion promises to offer valuable insights into the challenges posed by data limitations in machine learning, innovative strategies for mitigating them, and the potential impact of GANs in robotics as well as in predictive modelling.

The structure for this discussion will be a 15-minute context setting by each speaker, followed by about 35 to 40 minutes of discussion with me as the moderator, and finally we will open it up to the audience. As the audience, please jot down your questions as they occur to you during the presentations and put them in the chat box; we will address them once the discussions are over. We eagerly await the thought-provoking contributions in today's webinar. May I ask Professor Chintan to start?

Thank you very much for the wonderful introduction, both about me and about the topic in general. In fact, you did such a great job I was thinking there is no need for a presentation, but I will go ahead and share the presentation I have made. Are you pulling my leg? No, no, I was really honest about it; it was a great summary of everything we had planned to present today.

Much of this work, I have to say, is under review and yet to be published, many working papers, and much of it has been done in collaboration with PhD students and other researchers at different universities; I will mention that as we go through the slides. Before I start, a disclaimer: this talk is not about ChatGPT. We put that in the title to attract more people, so I hope you are not disappointed. Rather, the talk is about understanding, as Mr. Pratnam said, the problems and challenges in practical machine learning: problems that are actually faced by different companies and institutions, educational and commercial, in solving their own business problems. That is what we are focusing on, at least in my part of the talk.
Dr. Erich will talk about robotics and something similar later. Before I go ahead, I thought I would give a very brief introduction to what machine learning is about, except for the part Dr. Erich will cover later, which is more like reinforcement learning and has much to do with robotics; I will be looking at standard, or somewhat more standard, machine learning. On the left here is what we call supervised learning, which deals with labelled data: you already know which part of the data is which. Say you are trying to predict whether a shape is a square, a hexagon, or a triangle, and most of the examples are labelled. Let's assume this is a complex task; it could be anything, like predicting whether a particular patient has cancer, or whether a suspect is really a criminal based on data about that person in a particular case. Here we keep it simple: we are trying to predict the shape of an object and we have labels, shown in green below: hexagon, triangle, square. We train a machine learning model by feeding it the labelled data and asking it to learn that six sides is a hexagon, three sides a triangle, and four sides a square. Then we give it unlabelled data, what we call test data, and see how well it has done. If it correctly predicts that four sides with right angles is a square, and that three sides, perhaps not an equilateral triangle but something else, is a triangle, then it is doing well; we have learned something from the data and the supervised learning algorithm works.

The other kind, on the right, deals with somewhat more complex data that may not even be easy to label. In this example humans can label it easily, but when the data is more complex, with more features or more aspects, it can be very hard for humans to even explain. If you have all the features and they are small and simple, supervised learning is fine, but sometimes you need the computer, the algorithm, to learn by itself by looking at the data, understand the patterns by itself, and then predict what kind of object it is. That is unsupervised learning: you give it unlabelled data, without saying which is a cat and which is a dog, and it works out that the way the pixels of a dog are arranged is really different from a cat, so that finally it can output "this is a bunch of cats, this is a bunch of dogs", because the pixels are arranged more like one or the other. I am simplifying for the sake of one slide, a ten-thousand-metre view of the domain of machine learning, but it hopefully gives you an idea, or a better idea if you did not have one before, of what we are trying to do with supervised and unsupervised learning.
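As a minimal illustration (my own sketch, not from the slides), here is what the two settings just described look like in scikit-learn on a toy "shapes" data set; the features and labels are made up for the example.

```python
# Supervised learning trains on labelled examples; unsupervised learning groups
# unlabelled ones by similarity without ever seeing a label.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Hypothetical features: [number_of_sides, has_right_angles]
X_labelled = np.array([[6, 0], [3, 0], [4, 1], [6, 0], [3, 0], [4, 1]])
y_labelled = ["hexagon", "triangle", "square", "hexagon", "triangle", "square"]

# Supervised: learn the mapping features -> label, then predict on unseen data.
clf = DecisionTreeClassifier().fit(X_labelled, y_labelled)
print(clf.predict([[4, 1], [3, 0]]))       # expect ['square', 'triangle']

# Unsupervised: no labels at all, just let the algorithm find groups.
X_unlabelled = np.array([[6, 0], [3, 0], [4, 1], [6, 0], [4, 1], [3, 0]])
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_unlabelled)
print(clusters)                             # three groups, but no names attached
```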
Very quickly, I will now go into what we are doing in this area. Those two pictures, by the way, are taken from the javatpoint website, not generated by me, and from now on I will give citations for everything I take from elsewhere. As was said, the focus of this talk is on why the domain is such a crucial thing in data analysis. Here is a nice figure, I think generated by a PhD student of mine who graduated in 2016. The diagram says that the world consists of many systems and many domains, and the data we see is a filtered version of that world. Sometimes the data you see for a particular problem contains the important aspects you can actually work with, and sometimes the filter through which you see your data does not contain those important aspects that are so essential for making a good model, supervised or unsupervised, like in the previous slide.

Let me explain a little more about what we mean by domain. We have something called domain intelligence, shown in the figure on the right. Domain intelligence is a concept from Professor Longbing Cao in Australia, and it includes two important aspects of the data you are looking at: domain knowledge and prior knowledge. Domain knowledge gives you insight into what exactly is happening in that particular case. As I will show in a few examples in the next slides, in some cases it is very important to understand why certain things are happening the way they are. For example, in a case where you are trying to predict the arrival time of trucks at a distribution centre, it is really important to know why the trucks are being delayed, or how the drivers think and work. That inherent knowledge behind the particular case is what is encapsulated in the term domain knowledge: all the case-related aspects that matter for that case. Prior knowledge, on the other hand, is more like the mathematical knowledge behind applying a machine learning model to that domain, for instance knowing that for this kind of problem this is the algorithm that works best. Knowledge that is purely algorithmic or mathematical is called prior background knowledge. So there is a distinction between these two concepts, and this distinction was also brought out very well in a paper we saw from Van Gordon. I'm afraid I don't have those citations on this slide, but if you send me an email later I can share them; they are also in a working paper that is not yet published but is under review or being prepared for submission. Much of what I am explaining today is at the working-paper stage, not yet peer reviewed, so there is potential for improvement in many of these concepts.

So let us start with some examples with data. The concepts so far are at a very high level, so to make them clearer I have examples from actual cases, which will hopefully make them more practical. One of the places where we applied these concepts is AIS data.
With AIS we tried to understand what is going on with the data itself. AIS is the data that all ships, barges, and boats that want to enter any harbour need to have and deal with. It is essentially a transponder on the ship that broadcasts its current location, a bit like a GPS transponder. They could have used a mobile phone, but they wanted a dedicated system so that the port knows it is really talking to a ship and not to somebody walking down the street, so there are specific AIS transponders for this; the devices, I believe, are not very expensive, and I may not be using the exact technical terms, so bear with me. You then need a way to actually catch that data: there are servers, and you can build your own receiver with a simple antenna. These antennas let you gather the current location of the ship or barge, its speed, its direction of travel, and its size; you can get a lot of such data with AIS. But it is quite noisy. At one point, when we were dealing with AIS data, we saw a ship located inland, on a highway, which is naturally not possible, so there is a lot of noise even in the GPS positions that ships report. In the paper cited below, we looked into all the possible perils you can face when dealing with such data. On the right side of the slide we tried to list all the perils in something like a mind map, to make them more understandable, though there is probably more nuance than the picture captures. Peril number one is everything around "it adds noise": a simplified way of saying that equipment quality, meaning the equipment on the boat, the receiver, and the server, all adds to the noise in the data. Then there are external conditions and human input, because a lot of AIS data is also entered by humans, and the data source itself adds noise. Much of this noise leads to incomplete tracks of the boat, so the complete track cannot be seen, and sometimes it leads to too much information, an information overload for the person trying to understand the AIS data from the port, for example, and to dense traffic, which is another peril, and so on. Of course, there are also promises in the data: it is very nice that AIS data exists, and it can be used to prevent vessels from colliding and even to predict what time a vessel is expected at a terminal or a port. So that gives you an idea of the data and of what I am trying to look at in our research.
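As a rough illustration of the kind of sanity checks one might run on raw AIS records to catch some of these perils (implausible speeds, incomplete tracks), here is a small sketch with made-up records and hypothetical column names; it is not the paper's code.

```python
import pandas as pd

# Synthetic stand-in for raw AIS records (real data would come from the receivers).
ais = pd.DataFrame({
    "mmsi": [111, 111, 111, 222, 222],
    "timestamp": pd.to_datetime([
        "2023-05-01 10:00", "2023-05-01 10:05", "2023-05-01 12:30",
        "2023-05-01 10:00", "2023-05-01 10:02"]),
    "lat": [51.90, 51.91, 51.95, 51.80, 51.81],
    "lon": [4.40, 4.41, 4.45, 4.30, 4.31],
    "speed_knots": [8.0, 80.0, 7.5, 5.0, 5.2],
})

# Drop physically implausible speeds (a barge reporting 80 knots is noise).
ais = ais[ais["speed_knots"].between(0, 40)]

# Flag incomplete tracks: large time gaps between consecutive reports per vessel.
ais = ais.sort_values(["mmsi", "timestamp"])
ais["gap_minutes"] = ais.groupby("mmsi")["timestamp"].diff().dt.total_seconds() / 60
incomplete = ais.loc[ais["gap_minutes"] > 60, "mmsi"].unique()
print("vessels with gaps over an hour:", list(incomplete))

# A position check ("ship on the highway") would additionally need a land/water
# polygon test, e.g. with a coastline shapefile; omitted here.
```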
But I want to distinguish this kind of data, sensor data that is streaming, from what comes later, because there can be a lot of it. Even though this data is noisy and has all these perils, there can be a great deal of it, and that size, which is very important, can to some extent compensate for the noise. There was a paper long ago, I think by Google researchers, with a title something like "the unreasonable effectiveness of data": if you have a large amount of data, eventually you can get interesting patterns from it and do something with it. That is the amazing thing about big data. Just as an example, without going into much detail, when we have such data we can use something like deep learning. Deep learning is where you have many layers in a neural network and each layer does something with the data. In this case we pre-process the data in one layer and feed it to another layer that tries to understand the patterns visually in the data, which is why we use a convolutional neural network. Then there is the newer concept of attention, which many of you may be aware of, and which many of the LLMs also use; we use that here too. Then there is dropout, which basically means the network is not fully connected: some of the nodes are not really used, so we drop those links. We concatenate the data at some point, and then we use what is known as an LSTM, a kind of RNN, a recurrent neural network, which is a way to remember older data. Because this data is sequential, the AIS data, the GPS positions over a period of time, we use an LSTM to remember the previous locations of the ship so that we can more easily predict the next location. We concatenate all of this, use a standard feed-forward layer, and with all these layers we can predict the arrival time. I won't go into much more detail, but this is possible when the data is large. What we see in many of our practical projects, though, is that the data is not large. The data is small, and this is where the real problems come.
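For reference, here is a rough Keras sketch of the kind of layered network just described for the large-data setting: convolution, attention, dropout, LSTM over the sequence, concatenation with static features, and a feed-forward head. The layer sizes, input shapes, and feature counts are illustrative assumptions, not the actual model from the paper.

```python
from tensorflow.keras import layers, Model

seq_len, n_feat = 50, 6                    # e.g. 50 past AIS points, 6 features each

inp = layers.Input(shape=(seq_len, n_feat))
x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(inp)  # local patterns
x = layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)                  # attention over the sequence
x = layers.Dropout(0.2)(x)                                                    # dropout: not all units are used
seq = layers.LSTM(64)(x)                   # LSTM remembers earlier positions in the sequence

static_inp = layers.Input(shape=(4,))      # e.g. vessel size, type, destination berth (assumed)
merged = layers.Concatenate()([seq, static_inp])
out = layers.Dense(64, activation="relu")(merged)                             # feed-forward head
out = layers.Dense(1)(out)                 # predicted arrival time, e.g. minutes ahead

model = Model([inp, static_inp], out)
model.compile(optimizer="adam", loss="mae")
model.summary()
```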
Here we did a project with the World Food Programme in the Sahel region of Africa, where we tried to figure out the extent of malnutrition faced by children below five years of age; I might be a bit off there, I think it starts from around six months. Basically, we look at the burden of what is called wasting. Wasting is a medical condition assessed by measuring the circumference of the upper arm, to determine whether, when the child is eating, the food is actually being absorbed by the body or is passing through without being absorbed. The idea was to identify the burden, the extent of acute child malnutrition, in the Sahel region. The Sahel, as the figure shows, is made up of countries with a lot of food insecurity: with the Sahara Desert to the north they get hot winds and not much grows there, so food insecurity is severe, and because of that the World Food Programme has to deliver food to many of these countries over time. They also have wars and other situations that increase food insecurity, leading to acute child malnutrition.

What we have seen over this period, and some of this could be because data was not available, and some of it because of COVID, is that acute malnutrition actually increased from 2019 to 2021, which is the last year we had data for. So we have three years, 2019, 2020, and 2021, and the darker a country is shown, the higher the burden of child malnutrition in that country for that year. This is the kind of hotspot data we had, and what we did with it was try to predict the extent of child malnutrition: given the data of one year, can we do it for the next? Given the 2019 and 2020 data, can we predict 2021 and then check against the 2021 data how well our algorithms did? The use of this is that the World Food Programme could potentially use it in future to divert its supply chains to the countries where child malnutrition is higher; actually, since it is a medical condition rather than purely a food issue, it may mean sending more doctors to that region. In short, sending help to the hotspots first.

What we did here was come up with two ways to handle the data: we used supervised methods, like I showed on the first slide, and unsupervised methods, and from that we hope we arrived at a decision support system. Again, this is under preparation for submission to a journal, so it is not yet fully peer reviewed. What we saw was that, as I said, this is small data: we had very little data for each of the years. How do you deal with that? Even the ground truth in the data, the ground truth being the burden data I mentioned, the number of children with acute child malnutrition, could be noisy. The burden information is collected through surveys, among other means, and that itself could have some noise. So what do we do with it, and how do we figure out that it has noise? When we predict, we get very low direct accuracy across the different machine learning methods we tried. If we use one-away accuracy instead, where we have three levels, high, medium, and low, and we do not insist on predicting exactly the right level but accept a prediction one level away, so we don't mind if low and medium get mixed up, then things look better. That got us thinking: since much of the input data does not seem very noisy when we analyse it, maybe the ground truth itself is noisy, the actual child malnutrition data, the target variable. So suppose we generate the ground truth ourselves. What we did was use unsupervised clustering, simple k-means clustering, which clusters data based on how far the data points are from each other, a distance metric: if points are very close to each other they belong to one cluster, pretty much the plain-English understanding of clustering. When we cluster the data based on the distances between data points and then use this generated ground truth, we see that our accuracies go up. Of course, you might think this is cheating, because you use the same data to generate the dependent variable and then to predict that variable you just created, but we manage this by using different parts of the data: one part for generating and another part for testing. That way we get around the problem of circularity in the logic. So that is what we did, and we see the results improve, because the ground truth itself could be noisy in this case. That is one way you might deal with a noisy ground truth, when the target variable itself is noisy.
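A minimal sketch (made-up data, not the WFP study's code) of the idea just described: generate a three-level ground truth by clustering on one part of the data, train a supervised model, and score it with both exact and one-away accuracy. Ordering the clusters by a feature mean is my own simplification standing in for a burden proxy.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 8))                   # hypothetical region-level survey features

X_gen, X_eval = train_test_split(X, test_size=0.3, random_state=0)

# Step 1: generate a 3-level label by clustering one part of the data, then order
# the clusters (by the mean of the first feature here) so 0/1/2 act like low/medium/high.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_gen)
order = np.argsort(km.cluster_centers_[:, 0])
relabel = {old: new for new, old in enumerate(order)}
y_gen = np.array([relabel[c] for c in km.labels_])
y_eval = np.array([relabel[c] for c in km.predict(X_eval)])

# Step 2: train on the generating part, test on the held-out part (the circularity guard).
clf = RandomForestClassifier(random_state=0).fit(X_gen, y_gen)
pred = clf.predict(X_eval)

def one_away_accuracy(y_true, y_pred):
    """Count a prediction as correct if it is at most one ordinal level away."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)) <= 1))

print("exact accuracy:   ", float(np.mean(pred == y_eval)))
print("one-away accuracy:", one_away_accuracy(y_eval, pred))
```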
Now something else, a slightly different domain, going from child health to supply chain. Here we were trying to predict the arrival time of trucks and ships; on the right are the barges and ships, like I showed earlier with the AIS data, but here we were not dealing with very large amounts of data, because when you have to get data from actual terminals where trucks arrive, they do not have a very good data handling policy. They do not have data lakes or large data warehouses; they keep data in spreadsheets. So now we are dealing with spreadsheets. Just as we dealt with large organisations like the UN's World Food Programme, here too, in the terminals, the data sits in spreadsheets rather than in large data warehouses, so again not very large amounts of data. In this case we used weather, traffic, driving style, cargo planning, distance, and whether the truck had a trailer, all these variables, what we call features, to see whether we could predict the arrival time of trucks at one of these distribution centres. This slide looks rather large and complex: we tried different combinations of these variables, on the idea that some features are more important than others for predicting truck arrival times, and we also tried combinations of our dependent variable by binning it, saying we are only interested in a three-hour window, so a truck can be early, on time, or late; we bin it into three classes. We tried many combinations like this, but we still see that the prediction accuracy, with the different algorithms shown in the columns, is not very high. Even with only three classes we are not doing very well; these algorithms are not super accurate.
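A tiny sketch of the three-class binning just described, with hypothetical numbers; treating the "three-hour window" as plus or minus 1.5 hours around the planned slot is my assumption.

```python
import pandas as pd

# Deviation of actual arrival from the planned slot, in minutes (made-up values).
deviation_minutes = pd.Series([-240, -90, -20, 0, 35, 150, 400])

classes = pd.cut(
    deviation_minutes,
    bins=[-float("inf"), -90, 90, float("inf")],   # assumed +/- 1.5 h window
    labels=["early", "on time", "late"],
)
print(classes.value_counts())
```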
So we tried to figure out why this could be happening. Of course the data is noisy, and we have to take that into account, but there could be another reason. When we tried to understand the domain, why so many trucks were actually late, we see in the distribution of the trucks that very few are really on time; they are either early or late. When we asked the drivers why they were coming late, some of them said there was no real incentive to come on time, because if they are too early or too late they actually get very nice treatment: a warm cup of coffee and a chat with their fellow truckers. It is easier for them to do that than to be exactly on time and get rushed through the terminal. Understanding the domain gives us more insight into why these predictive algorithms do not really work, especially when the data is small. By domain understanding I mean human concepts like the truckers not being incentivised to come on time. Based on that, what the PhD student came up with is a rather complex data mining method, which basically boils down to this: how much domain knowledge can you gather while building the predictive algorithm? The more domain knowledge you can get, and the more theories you can create (that is stage two, built on stage one), the better your predictive model eventually becomes. That is again a very high-level view of the research, to keep to time. The good thing about this research is that he also applied it to cases where we do not use the method and cases where we do. This is again at a terminal, where we are trying to predict the turnaround time of the trucks, and when we apply the method versus when we do not, we see a clear difference. That is what this slide is about; I won't go into the details, but if you apply this long-drawn-out method and gather all the domain knowledge you need, the predictive model performs better, again underlining the idea that when you are dealing with small amounts of data, domain knowledge is critical. These are the results: S6 is where the domain knowledge is really used, and with S6 we get better results than the others.

We then took this whole concept of learning from the domain: when is domain knowledge really important, and do we use it in the data selection process? First I have to explain that the flow you see here is the typical KDD process, knowledge discovery in databases. This was the first process model that researchers came up with for machine learning, or for any kind of data analysis. The question is: at which point is the domain information most important, in data selection, data pre-processing, data transformation, or the data mining itself? And what we are thinking about now is the validation: do we also use domain knowledge there?
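To make the KDD stages concrete, here is a schematic scikit-learn pipeline (hypothetical columns and synthetic data, not from the talk) with comments on where domain knowledge could enter each stage.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Selection: which columns to keep is itself a domain decision (the truckers'
# incentive story suggests features you would not think of as an outsider).
numeric_cols = ["distance_km", "traffic_index", "rain_mm", "planned_slot_hour"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # pre-processing
        ("scale", StandardScaler()),                    # transformation
    ]), numeric_cols),
])

kdd_pipeline = Pipeline([
    ("prep", preprocess),
    ("mine", RandomForestRegressor(random_state=0)),    # data mining step
])

# Synthetic stand-in data so the sketch runs end to end.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "distance_km": rng.uniform(5, 300, 100),
    "traffic_index": rng.uniform(0, 1, 100),
    "rain_mm": rng.uniform(0, 20, 100),
    "planned_slot_hour": rng.integers(0, 24, 100),
})
y = X["distance_km"] * 1.2 + rng.normal(0, 15, 100)     # fake turnaround minutes

# Validation: the open question in the talk is whether domain knowledge should
# also shape this step (e.g. which error is actually tolerable for the terminal).
scores = cross_val_score(kdd_pipeline, X, y, scoring="neg_mean_absolute_error", cv=5)
print("MAE per fold:", -scores)
```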
This point about where domain knowledge is really required is also related to another question: can we learn from an existing predictive model that has already been developed for a domain? Say someone has made a predictive model for the arrival time of trucks at this distribution centre, and we are trying to create another predictive model for the arrival time of trucks at a completely different distribution centre: a different country, maybe, with different truckers, different kinds of trucks, different traffic conditions, different weather. It is the same domain, truck arrival, but a different case setting. The question is whether we can learn from the first: if we have very noisy data in the same domain but a different case, then, having learned which variables are important from that one predictive model, can we use the same knowledge in the new predictive model? That is the whole idea behind what we call cross-domain analysis, where we learn from one predictive model and apply it to another. This, again, is work done in collaboration with a PhD student, Maya Ghosh, myself, and a co-supervisor. We found that we can indeed learn: we did this with trucks arriving at two different distribution centres in the Netherlands, where for one we used traffic and weather data, and for the other the traffic and weather data were not as good, but we could learn that the traffic data was more influential on the predictive model than the weather data, and so on. Even with noisy data, can we learn from a predictive model in one domain and apply it to another? That is what we call cross-domain learning, and I think Floris will also explain something similar in robotics later on.

This is the overall model we have thought about. One part simply explains how different the two domains are compared to each other: we have something called domain complexity analysis, where we come up with a metric for differentiating the two domains we are comparing. Then the modelling process itself produces a predictive score for each domain with broadly similar factors, because for trucks you will always try to predict arrival based on weather, road conditions, traffic, type of truck, and the driver's driving style; those factors never change. What changes is the country, the traffic conditions there, the weather there, and maybe how drivers behave in that particular country, state, or location. That is why we can leverage the insights gained from creating a predictive model in one domain and apply them to another. We did this for what is called road feeder service in Europe and for distribution centres in the Netherlands. This slide just presents the same model more clearly: we found we can pick out the differences more clearly, and we can also determine which features are important by comparing the results of these analyses; here we simply used an ANOVA to compare the results of the predictive models. I won't go into the details; I just want you to know that this is possible.
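A simplified, synthetic sketch of the comparison just described: fit the same kind of model in two settings, compare which features carry the signal, and run a one-way ANOVA on the error distributions. All data, weights, and feature names here are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
features = ["traffic", "rain", "distance", "truck_type"]

def make_domain(traffic_weight):
    """Toy domain where traffic matters more or less depending on data granularity."""
    X = rng.random((400, len(features)))
    y = traffic_weight * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.1, 400)
    return X, y

def fit_and_report(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    errors = np.abs(model.predict(X_te) - y_te)
    return dict(zip(features, model.feature_importances_)), errors

# Domain A: traffic known at fine granularity (strong signal);
# Domain B: traffic only coarsely known (weaker signal).
imp_a, err_a = fit_and_report(*make_domain(traffic_weight=1.0))
imp_b, err_b = fit_and_report(*make_domain(traffic_weight=0.2))

print("importances A:", imp_a)
print("importances B:", imp_b)
print("one-way ANOVA on error distributions:", f_oneway(err_a, err_b))
```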
I think I am running out of time, so let me go to the key insights from this rather long presentation, which was meant for 15 minutes but overshot; apologies for that. The one key insight I hope was clear is that domain understanding is very important for the particular case you are creating the predictive model for. The type of machine learning algorithm matters too, not just for analysis. There are already lots of papers and lots of discussion about which kind of predictive model is useful; some people say LightGBM is great, and random forest is a nice baseline, so you have many predictive models, but the question is whether you have actually looked at the data you are dealing with. If it is small data, you naturally cannot use very sophisticated deep learning algorithms; you need more traditional ones. And when you are dealing with small data, your ground truth itself could be noisy, which is an important consideration: you have to check how trustworthy the ground truth is, whether it is really reliable, and maybe apply different techniques to check that, one of which is unsupervised learning, as I showed with the World Food Programme case. Then maybe these techniques help: one is complexity-aware data analysis, where we do quite sophisticated analysis to understand what the domain is telling us, by interviewing people, truckers in the case of arrival times, or the people working at the distribution centre; the other is comparing predictive models across domains, where we can gain insights from one predictive model and apply them to another. That was about it. I know I overshot the time, apologies again, and if you want more information about anything I presented you can always send me an email. I will stop sharing my screen now.

Thank you. Your passion for data is so obvious, and what you have shared is not just the models and data that are already available but more or less the state of the art, since some of it is in the pipeline for publication. Thank you for sharing this; I think many have benefited from your overview, and you are pardoned for overshooting because you kept everyone's attention riveted. May I request Dr. Floris to share his thoughts on the subject and his research?

Yes. Can you see my screen? Okay. Thank you so much for asking me to talk at this seminar. I will be talking mostly about robotics. To start off, let's think a bit about what makes a robot intelligent. In the past century we have had great success with industrial robots, and if I asked an industrial robot "can you make me a sandwich?", well, if I pre-program it, an industrial robot can make maybe a thousand sandwiches per hour. But an intelligent robot in my home will face a lot of different issues. First of all, the robot might ask itself: what is a sandwich, and what kind of sandwich does the user want me to make? What goes into a sandwich, and how can I compose the ingredients? Where should I make the sandwich, and when should it be ready?
Many of these questions require the user to have a dialogue with the robot. The robot needs to understand words, it needs to understand where things are located and which ingredients we have in the house, how to navigate through the house, how to manipulate objects with high precision, how to reason about tasks, and how to handle dangerous objects such as knives. And we actually do not want to pre-program the robot to do all this, because there are so many different tasks we want it to be able to do that we cannot pre-program them all. I think this is the challenge of the 21st century: how to make intelligent robots that can handle fairly simple tasks like this in a way that is acceptable.

I quickly want to introduce our research centre here at AIST. We do research into industrial cyber-physical systems, and we are focused mostly on the societal issue of ageing, a very pressing issue for Japan: our working population is expected to decline from 87 million to 60 million. That is a massive decline, which will lead to labour shortages and a lot of difficulty in passing on technology and know-how. Our centre is focused on improving this situation by increasing the productivity per person, increasing the number of production workers, and efficiently transferring know-how from person to person. In our centre we have different test beds to validate the adoption of technologies, and we focus on factories, logistics, and drug discovery, industries that are labour intensive, so there can be a big benefit from introducing robotics. I am personally in the automation research team, which focuses mostly on retail store and factory scenarios.

Let's get deeper into some details. I will present some work that I presented earlier this year at the ICRA conference, one of the top conferences in robotics, on learning to complete the depth of transparent objects using augmented unpaired data. For robots it is very useful to have a depth map of the environment, because it allows the robot to estimate how far away it is from objects and how to grasp them. However, with transparent objects we face various difficulties: the depth map might contain missing depth pixels, it might report the background depth instead of the foreground depth, and it might report noisy depth pixels for transparent objects. In this image, the left and middle pictures are what the robot might see, but the rightmost picture is how we actually want the depth map to look. In our research we explored how to create a data set of augmented objects that do not exhibit these problematic characteristics, which we do by applying stone spray paint to glasses. This solves at least the depth detection problem, but of course in the real world we cannot apply stone spray to all our glasses, so instead we want to collect data sets from the real environment. We collect two data sets: one of the original transparent objects and one of the opaque objects, that is, the objects we spray painted. We also collect small testing and validation data sets, which do have paired data, but the training has to be done on unpaired training data. So in this research we try to find a network design and training process that can learn to complete the depth of transparent objects using only unpaired data.
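A small synthetic sketch of what the missing and spurious depth pixels just described look like in practice, and the validity mask a completion method has to fill in; the array sizes and depth values are made up.

```python
import numpy as np

depth = np.full((480, 640), 0.80, dtype=np.float32)   # background roughly 0.8 m away
depth[180:300, 260:380] = 0.0                          # transparent glass: sensor returns no depth
depth[200:220, 300:320] = 1.5                          # or spurious background depth behind the glass

valid = depth > 0
print(f"missing pixels: {100 * (1 - valid.mean()):.1f}% of the image")

# Depth completion aims to replace the invalid region with the true foreground
# depth, i.e. what the spray-painted, opaque version of the object would return.
```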
Here is our collection method: we just roughly place the objects in a dishwasher and collect the data with a depth sensor. With this method we can collect around 500 samples per minute, and in this way we collected 30,000 samples for each domain, across 60 different scenes that we recorded. We also collect small validation and test data sets by manually aligning the objects: first you place the opaque object and then align the transparent object to it. Getting an accurate alignment can take a minute for a single sample.

So now we have an unpaired training data set, but we also need a method for training on unpaired data, and for this we use a technique called CycleGAN, cycle-consistent generative adversarial networks. In CycleGAN we have one generator that transforms elements from domain X into elements that could statistically have been drawn from domain Y, and we also introduce an inverse generator, which we can use to enforce structural consistency. We also have discriminators, which learn to separate generated elements from sampled elements. By requiring the generated elements to be reversible back to the input data, we encourage the generators not to forget the details of the input data and not to hallucinate new details. I see someone has raised their hand; shall we take questions afterwards? (Yes, carry on, we will put it in the chat box.) Okay, thank you.

We applied this technique to two modalities: depth-to-depth, where we have a depth map as input and want to generate a new depth map, and RGBD-to-RGBD, where we have a depth map and a colour image as input and convert them into a completed depth map and a colour image. Applying our technique, we got these results. The first two columns show the input colour and input depth. The next column shows the results of our method; these are the best three samples we could get, with an error of around 2.5 centimetres in ideal cases. In the fourth column is a state-of-the-art method called ClearGrasp, the fifth is a statistical method, and the final column is the ground truth depth. If you compare our results with the ground truth, you can see that in many cases our method produces an accurate depth map. We can also analyse this quantitatively using the validation and test sets we recorded: our method has an error of around 4 centimetres on average, which improves on the state-of-the-art ClearGrasp at 5.7 centimetres and is also better than the statistical technique. This is a video of applying our method: first we select which object we want to grasp, we segment that object out of the scene, we use our method to estimate its depth, and then we use that depth to generate a grasp. Using this method we can unload the whole dishwasher with much higher accuracy than with other methods.
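As a generic sketch of the cycle-consistency objective just described, here is a compact PyTorch example; the networks, shapes, and loss weight are placeholders, not the paper's actual models or training code.

```python
import torch
import torch.nn as nn

def tiny_net(in_ch, out_ch):
    # Placeholder convolutional network standing in for generators/discriminators.
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, out_ch, 3, padding=1))

G, F = tiny_net(1, 1), tiny_net(1, 1)            # G: X -> Y, F: Y -> X (depth-to-depth)
D_X, D_Y = tiny_net(1, 1), tiny_net(1, 1)        # patch-style discriminators (placeholder)
l1, mse = nn.L1Loss(), nn.MSELoss()

x = torch.rand(4, 1, 64, 64)                     # unpaired batch from domain X (transparent)
y = torch.rand(4, 1, 64, 64)                     # unpaired batch from domain Y (opaque)

fake_y, fake_x = G(x), F(y)

# Adversarial terms (least-squares GAN style): try to fool the discriminators.
adv = mse(D_Y(fake_y), torch.ones_like(D_Y(fake_y))) + \
      mse(D_X(fake_x), torch.ones_like(D_X(fake_x)))

# Cycle-consistency: translating there and back should reproduce the input,
# which discourages the generators from forgetting or hallucinating detail.
cycle = l1(F(fake_y), x) + l1(G(fake_x), y)

gen_loss = adv + 10.0 * cycle                    # 10.0 is a commonly used cycle weight
gen_loss.backward()
```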
Now I want to talk a bit about how to combine vision and language in the age of foundation models. Foundation models are models trained on very large training data sets that promise to generalise to scenes they were not originally trained on. Maybe the first of these methods is CLIP, which stands for Contrastive Language-Image Pre-training. With CLIP we train a text encoder, which encodes a text string, and an image encoder, to which we give images or image patches, and we want the encodings of the images and the text to be close together in embedding space if they actually match, and far apart if the text and the image do not match. The great thing about this is that we can now query an image using text strings, check whether an image matches a text string, generate a text string for an image, and also combine this kind of technique with diffusion models to generate new images from text strings.

CLIP works on whole images, but in our case we are often interested in identifying parts of an image. One way to do this is by creating a patch-based vision-language embedding. We construct this by generating a feature pyramid of the image and convolving the pyramid across the image, so that for each pixel we get a language embedding. You can see some examples of patches at different sizes. The reason we take different patch sizes is that, depending on the size of the patch, we might see an entirely different scene: the first patch might show only the top of a bottle, whereas the third patch shows a much larger scene with multiple objects in it. So depending on which part of the image we are generating embeddings for, we might need a bigger or smaller patch size, and by creating these pyramids we make sure we cover a variety of patch sizes across the image. We can use this technique to create language embeddings for the whole scene and then query the scene with text: for example, "where in the image is the drill located?", and we generate heat maps showing where the image features most closely match the text features; you can see it quite clearly shows where the drill is located. It also works for objects like Soft Scrub, which is a dishwashing detergent, for finding the mustard in the image, the potato chips, or even the Pringles. We can switch between using brand names and category names; for some objects the brand name might not work so well. For example, "French's" does not accurately find the French's mustard, I think because the word is confusing: France is of course a country, so the network can get confused by that kind of term.
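A minimal sketch of querying an image with text strings using a pretrained CLIP checkpoint through the Hugging Face transformers library; the image path and query list are placeholders, and in the talk this scoring is applied per patch rather than to a whole image.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("shelf_patch.jpg")            # e.g. one patch from the pyramid (placeholder file)
queries = ["a drill", "mustard", "potato chips", "dish soap"]

inputs = processor(text=queries, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher probability means the text and image embeddings are closer in the shared space.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for q, p in zip(queries, probs):
    print(f"{q:>14}: {p:.2f}")
```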
This lets us create image embeddings in 2D space, but we also want to go to 3D space. To do this we are exploring a technique called neural radiance fields, which basically interpolate between different image viewpoints. In our convenience store we recorded a video of one of the shelves filled with products, sampled different images from the video, and can interpolate between these images to create what looks like a 3D model of the shelf, though it should be noted that none of these objects are actual 3D models. It is a very easy, cheap technique for generating a data set of a 3D scene without having to manually model all the objects in it. Now we can combine a NeRF representation of a scene with CLIP embeddings, creating something called Language Embedded Radiance Fields, a paper published this year at ICCV. We can then query the scene using text strings like "utensils" to find the utensils in the scene, or "wooden spoon" to find a specific utensil; you can even use more abstract queries like "where can I find electricity?". We can read text strings in the scene, find very specific objects like the blue dish soap, the Waldo figure, or the paper towel roll, and even query the scene for visual properties, like "what in the scene is yellow?". We applied this to our convenience store as well: we can query for "cup noodle" and it shows where in the scene all the cup noodles are, or where we can find the Pringles or the macadamia.

We can also combine a model like CLIP with Segment Anything, to first segment the image into different regions and then find the objects in the scene. In contrast with our previous method, where we convolve over the entire scene, in this case we can find specific objects: for example where the drill is most likely located, or the French's, which actually works better here than in the convolution case, or the mustard, the potato chips, the Pringles, and the Soft Scrub. By combining these foundation models we can get very good results, and one important thing I should mention is that we do not train them at all at our site: we just take pre-trained networks and combine different pre-trained networks to get useful results, like locating objects in a scene.
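A rough sketch (my own illustration, not the speaker's code) of the CLIP plus Segment Anything combination just mentioned: segment the scene into regions, then score each region's crop against a text query with CLIP. The checkpoint paths, image file, and query string are placeholders.

```python
import numpy as np
from PIL import Image
import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from transformers import CLIPModel, CLIPProcessor

image = np.array(Image.open("store_shelf.jpg").convert("RGB"))   # placeholder image

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
masks = SamAutomaticMaskGenerator(sam).generate(image)            # list of region proposals

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

query = "a can of Pringles"
best_score, best_box = -1.0, None
with torch.no_grad():
    text_emb = clip.get_text_features(**proc(text=[query], return_tensors="pt", padding=True))
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    for m in masks:
        x, y, w, h = m["bbox"]                                    # XYWH region from SAM
        crop = Image.fromarray(image[y:y + h, x:x + w])
        img_emb = clip.get_image_features(**proc(images=crop, return_tensors="pt"))
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        score = float((img_emb @ text_emb.T).item())
        if score > best_score:
            best_score, best_box = score, (x, y, w, h)

print(f"best match for '{query}' at {best_box} (cosine similarity {best_score:.2f})")
```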
So what does the future hold? New technologies such as GPT allow us to control robots through voice and dialogue, new foundation models allow robots to perceive the world and understand the relationship between vision and language, and models like GPT also allow robots to reason about complex tasks. I think these developments will really help us realise our dream of having robots that autonomously learn, adapt to their environment, evolve in intelligence, and act alongside human beings, which is the goal of our Moonshot research project: we want intelligent robots working with us by 2050. I would like to thank my colleagues at AIST, and let's start some interesting discussion.

Thank you. The two talks contrast in an interesting way, because one focused on how machines learn from data and the other showed that, even without explicit teaching, a robot can do some tasks, through a process of learning of course. The success stories you presented, in a very accessible way, have given the audience a view of what the future has in store for us, making everyday life much simpler, especially, as you said, for an ageing Japanese society, where one could sit back and order the Pringles to come; perhaps the noodles are the better example. Thank you both for sharing these overviews. If I may address the first question to Professor Chintan: you mentioned your key insights about managing data, and you mentioned complexity awareness as one strategy, so my two-part question is this. What are the primary challenges faced in managing data for impactful AI models, and what are one or two strategies to overcome these challenges?

That is a very interesting question, and maybe very useful for practitioners. I think the main challenge is to understand the key insights from the domain. What many companies and institutions generally need to do is gather all the relevant, important data they need and manage it in a sustainable way. Typically that could mean multiple databases stored in one large data warehouse they can use for analysis, or nowadays data lakes. But this needs to be done continuously, and in a way that is not an excessive burden on the company when they want to analyse their data. The key question, the key concept, is whether this set of features, this data, is sufficient for the target they have. If they want to predict, say, profit or turnover, is the data they have gathered sufficient to make a good predictive model, or did they lose out by not collecting some other data that was lying around? That is always going to be a challenge, and it is what we try to address with this cross-domain analysis and, really, with both things, one being the complexity-aware data analytics method. Suppose you approach this as someone who is not working in that company or institute or dealing with that particular domain; you come in as an outsider, and you really need to understand what the experts themselves know. I think that is the essential first step in understanding any complex domain. If you have a very simple domain, for example a vacuum chamber where you want to know what happens to a gas when you increase the temperature, that the pressure or the volume will increase, that is a very closed environment, which I would say is low in complexity, though many physicists might disagree and say it also has high complexity. But any domain out in the real world has higher complexity in general; that is why many economic models do not really work the way they are supposed to, because they deal with a complex domain, and in some cases even medicines do not work on a human body, because again it is a complex domain. There are too many variables, and understanding which are the critical, important variables takes a lot of time and effort. That is where the complexity-aware data analytics method might help: getting those insights from the relevant experts, because it is almost impossible to gather them as an outsider without much exposure to the domain. The next step is whether we can learn from something existing. Someone might have tried this in the past on that same domain and built a predictive model; can we gain insights from that particular predictive model and apply them to our particular problem? That is what I explained as cross-domain analysis, where we try to learn from one particular instance of a predictive model that has been created, or from many instances, which features were used and how important those features were, so that we can look for something similar in the new domain and see whether it is available.
You did give one or two examples earlier about cross-domain learning applications and the power of that approach. For a deeper understanding of the subject, and from your vast experience in this field, can you give one or two more examples of how cross-domain learning has helped improve outcomes?

Yes. So what we are currently doing, and this is again work in progress, is the following. The whole cross-domain idea comes from the large number of machine learning models that have already been built; the idea is to leverage those when you are trying to create a new model. We are currently applying it in the area of supply chain, specifically the arrival time of trucks, which is quite important for many distribution centers. When they are dealing with logistics, they want to know when the trucks will come, or when the ships will come so that the trucks are ready, so that the cargo is loaded and unloaded quickly and efficiently. Right now we are only looking at this particular case, the domain of truck arrival times, but we could potentially use it, for example, in robotics, like what Floris is doing in his research, or in any area where there are data challenges and the domain is complex. Here we have people managing trucks, people managing ships that are sailing, and people managing the whole process in the distribution center. Whenever humans are involved in the process, you bring in a certain level of complexity, because things are never as straightforward as in a simulation, or as when you have robots interacting with each other, which can be largely pre-programmed. You can bring in some complexity there too, but human complexity is a different level of complexity.

So whenever we see a predictive model in a complex domain, there is a chance that we can learn critical insights from it. In our case, one of the insights we got from the predictive model for truck arrival times was that some of the drivers were not even incentivized to arrive on time: sometimes, if they arrive late or even early, they get better treatment at the distribution center than when they arrive on time. This is the kind of thing you only understand when you talk to the drivers, or to the people at the distribution center, to understand what is happening in the domain. These insights can then be used in a new model where some aspects of the domain are missing. For example, perhaps in the new model we don't have traffic information at the same granularity: in the first model we had traffic at an hourly level, and in the second model only at a daily level. The very fact that one of the features is so different brings an element of noise into the model, and we need to take that into account. If it is possible to get the traffic data at an hourly level, and it just takes a little more effort to do so, the cross-domain analysis would tell us that is probably the way to go to improve the model, because we have already seen in the earlier predictive model that hourly data makes a big difference. In the same way, if you have weather data only at a weekly level rather than a daily level, daily weather data makes a big difference, because if it rains heavily the trucks will be delayed more on that particular day than if the same amount of rain fell over the whole week. So this kind of understanding, gained from analyzing one particular case, can be leveraged when we are trying to create a predictive model for the same domain, truck arrival times, but in a different setting. Typically you could use this in many different cases: for creating a predictive model in any complex domain where it is not very clear what the effect of a particular feature is. That is where I think this cross-domain analysis is most effective.

Okay, I got it.
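The cross-domain idea just described, reusing what an earlier truck-arrival model learned about feature importance to decide what is worth collecting in a new setting, could look roughly like the following sketch. The datasets, feature names and the choice of a random forest are illustrative assumptions, not the actual models from the project.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def source_feature_importance(df: pd.DataFrame, target: str) -> pd.Series:
    """Fit a simple model on the existing (source) case and rank its features."""
    X, y = df.drop(columns=[target]), df[target]
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

def transfer_checklist(importance: pd.Series, new_columns: list[str], top_k: int = 5) -> pd.DataFrame:
    """For the top-k features that mattered in the source model, flag whether the
    new setting has them at all -- a hint about what is worth collecting."""
    top = importance.head(top_k)
    return pd.DataFrame({
        "source_importance": top,
        "available_in_new_domain": [c in new_columns for c in top.index],
    })

# Hypothetical usage with invented feature names:
# source = pd.read_csv("dc_a_truck_arrivals.csv")   # has hourly traffic, daily weather, ...
# imp = source_feature_importance(source, target="delay_minutes")
# print(transfer_checklist(imp, new_columns=["traffic_daily", "weather_weekly", "distance_km"]))
```

If an hourly traffic feature dominates the source model but the new site only logs daily traffic, the checklist makes that gap explicit, which is essentially the hourly-versus-daily argument made above.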
Just one question, Floris: you did mention, in a very broad way, employing generative adversarial networks. Can you throw a little more light on how this can help generate relevant data for AI models in your area, which is robotics, and what are the potential applications that will benefit society in general?

Right. So I just showed you an application using CycleGAN, which is a specific way of applying a GAN. Usually when you have a GAN you have some supervised data, and one way you can use it is, for example, to make a simulation look more realistic. I might make a simulation of a task that looks very basic, like very low-quality computer graphics, but I also record real scene data of the task, and then I use a GAN to learn how to transfer from one domain to the other, from the simulated domain to the realistic domain. This is a great way to create datasets for computer vision, and that is where I have mostly seen it applied; this kind of GAN technique is used mostly in the computer vision domain, to generate new vision samples. There are some studies that apply it to other, lower-dimensional datasets, but it is really mostly used for vision.

You can also use a GAN not just to transform existing data but to generate entirely new data. In the example I showed you, we take as input the RGB-D data of the transparent objects and generate the corresponding data as if the objects were opaque. But you can replace the input with a random input vector: we just generate random inputs and use the GAN to generate outputs. That is actually the traditional way in which GANs were used, and it allows you to generate essentially novel data just by training on existing data. Various researchers have then tried to find out: if I have a GAN trained on random inputs, can I somehow make the data generation process controllable? For example, they introduce sliders to control aspects of the scene: you can generate new images of cars and have a slider that changes the color of the car, or one that changes its size, or maybe you can configure the number of windows or the number of wheels the car has. This technique has been used a lot in artistic applications, but we are also applying it in more task-specific applications like robotics.
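For readers who want to see the basic mechanism behind what Floris describes, a random input vector pushed through a generator and judged against real data by a discriminator, here is a minimal vanilla GAN on toy 2-D data. It is deliberately simplified and is not the CycleGAN setup used in the transparent-object work; the architecture sizes and training schedule are arbitrary.

```python
import torch
import torch.nn as nn

# Toy "real" data: 2-D points on a noisy circle, standing in for real samples.
def real_batch(n=128):
    theta = torch.rand(n, 1) * 2 * torch.pi
    pts = torch.cat([torch.cos(theta), torch.sin(theta)], dim=1)
    return pts + 0.05 * torch.randn(n, 2)

G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))   # noise -> sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Discriminator step: distinguish real points from generated ones.
    real = real_batch()
    fake = G(torch.randn(real.size(0), 8)).detach()
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: produce samples the discriminator calls real.
    fake = G(torch.randn(128, 8))
    g_loss = bce(D(fake), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, novel samples come from random noise vectors, as described above.
novel = G(torch.randn(10, 8))
```

CycleGAN adds a second generator/discriminator pair and a cycle-consistency loss so that unpaired simulated and real images can be mapped onto each other, but the adversarial core is the same as in this toy loop.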
I think that has given us some idea about what is a relatively complex subject, thank you. Chintan, you did mention the deep learning aspects; can you throw a little more light on the role of what are called deep neural networks in handling large datasets? How do these networks help extract meaningful insights and patterns from large datasets, and what are the implications for creating models with them, ChatGPT being one such example? You are on mute.

Yes, so we have applied deep neural nets on a couple of projects, and one of the main challenges, as you mentioned, is making them more understandable. This whole issue of explainability is, I think, almost a field in itself, explainable AI is a field nowadays, and it is a challenge to make predictive models understandable, especially with deep neural nets, which involve multiple layers of neural networks. Some of it is also engineered, which is something I did not appreciate until we started reading about how these neural networks, or how deep learning, really works: for some tweaks of the network it is not obvious why a particular layer needs to be in that form, or why the next one should follow it. It is a little bit engineered; what people have seen over time is that this layer following that one, connected in this particular way, works best for this kind of data. These insights have been gained over a period of time, and that is how the architectures have developed. It is not the case that, given a particular dataset, the right architecture is evident; it is a somewhat cyclical process of doing and then figuring out that maybe it can be tweaked better another way. That is true for all machine learning, but especially for deep neural nets: it is not a given that a particular architecture will work best with a particular kind of data.

Having said that, on explainability, and maybe Floris can join in here, there are certain techniques with which we can try to figure out how a model is actually working, and I think some of these techniques might even be easier with large language models than with other kinds of structured data. One family of techniques says: these are the results we got from the model; can we approximate that result with a more understandable model, like a linear model? When we do that, we can figure out which features have what kind of effect on the target variable. So we approximate it with a linear model and ask whether that makes it clearer; that is one kind of approach.
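A brief sketch of the surrogate idea just mentioned: fit an interpretable model to the predictions of a black-box model and read the effects off the simpler model. The data and models here are synthetic stand-ins, not the department's tooling; the shallow decision-tree surrogate at the end anticipates the decision-tree approach mentioned a little later, and libraries such as shap implement the Shapley-value approach discussed next.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical tabular data; think of the truck-arrival features from earlier.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["traffic", "weather", "distance"])
y = 3 * X["traffic"] - 2 * X["weather"] + rng.normal(scale=0.1, size=500)

black_box = GradientBoostingRegressor().fit(X, y)      # the hard-to-read model
preds = black_box.predict(X)

# Global surrogate 1: a linear model fitted to the black box's own predictions;
# the coefficients give the sign and rough size of each feature's effect.
linear = LinearRegression().fit(X, preds)
print(dict(zip(X.columns, linear.coef_.round(2))))

# Global surrogate 2: a shallow decision tree, readable as if/else rules.
tree = DecisionTreeRegressor(max_depth=3).fit(X, preds)
print(export_text(tree, feature_names=list(X.columns)))
```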
The other family of approaches towards making models more understandable uses a kind of game-theoretic reasoning; I think these are called Shapley values. There you look at the features and the effect each particular feature has on the final model, and you can compare the effect of different features by removing or adding a feature. This total effect can be visualized very nicely, and we can see which features have how much effect on the target variable. Of course there are other techniques as well; one of the ways in which our department is trying to make more explainable models is by approximating not with linear models but with decision trees. They are busy trying to build somewhat more complex decision trees, and the reasoning is that decision trees are easy to understand because they are essentially a bunch of if-then-else statements: if this, or not this, you go down this branch or that branch, and that makes the model more understandable. So that is also an approach. I am just giving you very high-level ideas about how understandability can be improved; I think I answered your question, but I am not sure.

To both of you, maybe this is the last question, or the last but one. What we want to understand, because you did say that we don't really know whether a particular step in a model is good or not, only that it gives a certain desired result, is this: does that mean these are not really optimal models but could be sub-optimal ones? How does one work out that a particular model is the best for a particular dataset?

Okay, yes. Actually, the large challenge in machine learning is that we have a dataset we train on, and this dataset is just a representation of the real data we want to apply the model to. We can build networks that are completely optimized for our training data; the only problem is that we don't want them to be optimized for the training data, we want them to be optimized for the whole data distribution, the whole real-world set, so that for any data we might feed them we get a good result. So during training we are not looking for the optimal model for the training data; we are trying to find a model that works well on the training data but also generalizes well to unseen data, and by definition that model will not be the fully optimized model for the training data. Do you understand?

Yes, and I think the important thing here is that if you take every part of your training data and use it, there is a risk of the model being over-trained, in the sense that it knows every aspect of the training set and might not perform very well on your test set. This is the whole issue in machine learning of bias and variance, how biased your model is and how much variance it has, and that is related to the concept of over-training your model, or even under-training it. If it is over-trained, it essentially says, I know everything about this training set, and that relates to the bias-variance trade-off, though not entirely to one or the other. And that is why we can't really be sure, because we use a part of the data for training, a part for testing, and maybe a part for validation.
Especially in deep neural nets, one can't really be sure that this is the exact optimum model; what we can see is that compared to a baseline model we are performing better. It is always these small improvements that people make, and that is why it is very important to establish a good baseline, and sometimes very simplistic, understandable models are used as the baseline. In fact, talking about understandable models, many companies don't really like very complex deep neural nets, because it is not just about building the network but also about maintaining it, and even at run time it is quite heavy on server capacity and so on. What they prefer is a model that is more understandable, maybe with fewer layers, which they can handle better; that is what many organizations go for. So sometimes a simple model can be better in a given setting than a more complex one. I think that is a general rule of thumb.

Just one last question, if I may, because we are slightly short on time. How does one actually know, is there some metric by which you can say this amount of training is adequate, whether we are talking about a robot or an app? How do you know the model is under-trained or over-trained, and when do you cut off the training?

Well, one benefit you have is that when you train a model you can store previous versions of the model. So basically you can train your model until it starts to overfit the data, that is, until it starts to perform better on the training data than on the validation or test data, and then you just take the previous model that you saved and use that as your final model. So it is not a huge problem to pick the best-performing model. And usually I think you aim to overfit your model at first; in fact, you can keep making your model deeper until it starts to overfit.
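The checkpoint-and-stop recipe just described, keep the best model seen so far and stop once validation performance degrades, can be written in a few lines. This is a generic sketch on synthetic data; the architecture, patience value and epoch count are arbitrary choices, not the settings used by the panelists.

```python
import copy
import torch
import torch.nn as nn

# Synthetic regression data split into training and validation sets.
X = torch.randn(1000, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(1000, 1)
X_tr, y_tr, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_val, best_state, patience, bad_epochs = float("inf"), None, 10, 0
for epoch in range(500):
    model.train()
    opt.zero_grad()
    loss_fn(model(X_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(X_val), y_val).item()

    if val < best_val:                 # validation still improving: keep this checkpoint
        best_val, best_state, bad_epochs = val, copy.deepcopy(model.state_dict()), 0
    else:                              # training loss keeps falling but validation is
        bad_epochs += 1                # getting worse: the model has started to overfit
        if bad_epochs >= patience:
            break

model.load_state_dict(best_state)      # roll back to the saved, best-generalizing model
```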
Wonderful. Thank you, gentlemen, both of you, for joining today's session. It has given us, I would say, a state-of-the-art view of all the things that are happening. As you explained with a disclaimer at the beginning, ChatGPT may be a sort of overkill here: yes, it deals with large datasets, and ChatGPT does use large datasets, whereas in more practical, day-to-day settings we use small datasets, like the truck example and the other examples you gave. The datasets are limited, but within these limited datasets the noisy data can be addressed, and it is very refreshing to hear that you are working on cleaning this up and addressing these issues, not quickly, but in a systematic manner, so that society at large will have confidence in these types of systems. So thank you very much again.

Just one point I wanted to bring to your notice: at this Center we are working on a working paper series based on these webinars. Due to time constraints we have discussed only a limited number of topics today, but we would like to explore them in more depth in these working papers, and we request your involvement in producing them, which will benefit not only the academics but also the practitioners, I was about to put it the other way around, but since both of you are academics, both ways, it should help us to translate. So I hope you will agree to continue to work with this Center of Excellence at IIM. Thank you.

Oh yes, thank you for having us. Thank you for having us, it has been a real honor and a pleasure. Thank you, bye-bye.
Info
Channel: IIM Bangalore
Views: 1,845
Id: nnr810kYG9Q
Length: 103min 30sec (6210 seconds)
Published: Fri Aug 11 2023