Applications of Machine Learning in the Supply Chain

Captions
Today we have the pleasure, I think, of having with us a good friend of mine, Professor Sebastian Pokutta from the School of Industrial and Systems Engineering. His background has many facets; one I'll mention is that he has been an investment banker, so he has a very interesting perspective on things. Sebastian is at the crossing of industrial and systems engineering, operations research, and now very much machine learning. At any time of the year you have to ask where on planet Earth he is going to be and try to pinpoint him; last month he's probably been in Europe and is now coming back here, so his life is pretty hectic. Along those lines, talking and interacting with many companies, he is one of the leaders in machine learning, and he is very stimulated by getting many students involved with companies. He has prepared for us an overall synopsis of his research and its links with industrial and systems engineering in the supply chain and logistics domain. I will not talk more. As always, feel free with questions and comments; it's open and free-flowing. Thanks for coming.

Thank you very much for the kind introduction. I'm very happy to be here today to talk about machine learning, artificial intelligence, and applications of those technologies in supply chain and logistics. The talk today will very much be an overview type of talk. In the first half I'll talk about machine learning: what is out there, why people are so excited about it, where all the hype comes from, and so on. I will also talk a bit about the risks and challenges involved in deploying these technologies in the real world. In the second half of the talk I will look at three specific applications where we have deployed machine learning and AI within the supply chain and logistics context. Almost everything I'm talking about that we do is actually implemented and running in the real world; there is one very new project that we just started, which is still at the beginning and has not yet been deployed in an industry context.

All right, so where does all the hype come from? The basic idea is that if you talk to people in machine learning, they think of machine learning and AI as being something like electricity: the same impact that electricity had on industries a hundred years ago, people expect machine learning to have across various industries. There is this idea that if you deploy machine learning and AI (and I will use both of these terms interchangeably today) within an industry context, you can significantly transform whatever you are doing by adding a much more dynamic and smarter view on things, and that is really what motivated this statement here by Andrew Ng. Of course, if you look at machine learning, we are all aware that there is a lot of hype out there; everybody is talking about it and everybody is doing machine learning to some extent. If you actually talk to people, there is a running joke right now, and it goes roughly like this: whenever you go out and try to raise funds, everybody is talking about AI. When you actually hire PhD students, you tone it down a notch and you're hiring ML PhD students or ML talent. When you actually start to implement it, at the end of the day it's very often not that sophisticated, so you use technologies that we have had for many, many years, like linear regression models and so on, which is not what people would really consider AI. And at the end of the day, when you actually debug your code, it's just a stupid printf type of statement; that's what it really comes down to.
That's the running gag in the Valley these days. Also, if you look at the Gartner hype cycle, which is always a very good indication of where we are in terms of technologies, you will see that machine learning is right up here at the peak: it is considered the most overhyped technology out there. And yes, there is a lot of hype, but don't worry, once we're done with machine learning it is going to go down; there is general-purpose machine intelligence down here, so we have the next thing in store, and we will not be unemployed anytime soon. Having said that, of course there's a lot of hype, but I urge you not to underestimate the potential of a technology simply because there's a lot of hype in the system. It's very easy to dismiss something by saying it's all hype, nothing is going on, people are just talking. At the end of the day, what we have seen in many real-world applications is that machine learning and AI can make a big difference irrespective of the hype factor. So I urge you to take a serious look at the technology and decide whether it can make a difference for you.

So why is everybody so excited about machine learning and AI? The basic idea is what I call the accelerated Pareto principle. If you look at the last twenty years of how industries have been working, you will see that we have more and more companies, such as Google, Facebook, or Uber, where a single company controls a very large part of the market, because you have economies of scale, and then you have the Matthew effect and the whole more-gains-more type of setup. Very few companies are extremely dominant. The idea people have is that AI will amp this up even further, so that you will have essentially one single entity, or very few entities, controlling most if not all of the artificial intelligence and machine learning infrastructure. And you see this already happening in some sense: all these big companies are ramping up their AI infrastructure, they have these huge clusters, and if you want to build something on the same scale by yourself, it's next to impossible to get the resources together to build something as competitive in terms of infrastructure. So we already see what is called the AI arms race starting to happen, and I think that is really where the excitement comes from: there is a technology out there that has the potential to transform many, many industries, and that's what people are very excited about.

And we have come a long way. If you look at what we have done so far, we have been extremely successful with machine learning. Take autonomous vehicles: this is my favorite example of a cyber-physical system, where you take a machine and add intelligence and sensors to it to make it smart. But there is something truly interesting about the example of autonomous vehicles: look at the companies that are successful in that space. These are very often not your traditional car makers. Why is this so? Because building an autonomous vehicle is not an engineering problem anymore, it's a software problem, and there is a huge disconnect between building a car, which we already know how to do, and building a self-driving car, where we have no idea how to make it truly intelligent, although we have figured out many things; highway autonomy and things like that we can basically consider solved. That's why companies like Uber and Google are extremely strong in the machine learning space when it comes to autonomous vehicles. On top of that, what is also interesting in that space is that there is a big trend away from building the autonomous car as a company by yourself; the trend goes toward building retrofit kits, with autonomous sensors and software technology that you retrofit onto an existing car or something like this.
That's, for example, what Uber does: you reuse the already existing knowledge of how to build a car and in-source it in a smart way, and in fact the car builder, or manufacturer, in some sense becomes something like a supplier to the AI companies that build the autonomous vehicle. So that's the autonomous vehicle space, and we have come a very long way there. But we have also come a very long way in specific applications where people thought we were many, many years away from being successful. My example here is deep reinforcement learning, or deep learning more broadly. Why is this? AI has always been very elusive; the idea was always that AI is exactly what you cannot yet do. Fifty years ago, or thirty, or twenty, it doesn't really matter, people said: we can build all these computer programs and it's amazing, but the moment we can build a program that can play chess, that is artificial intelligence. We just need a computer to beat the human champion at chess, and that's intelligence. And then we did this, and we figured out that playing chess is actually not so hard, because you can basically just enumerate all the interesting moves, and you get very strong chess computers. And then people said: maybe that's not so intelligent after all, it's much more like a search problem that we can solve with current technology. And it went on and on; there were many other things along the way. But there has always been one very interesting problem out there, and that has been Go. Why? Because Go is a game where you need to reason and be creative over a very long, extended planning horizon, and you cannot solve this problem by pure enumeration: the number of moves you can make, the combinations, are so high that the traditional approaches with which we have solved such problems simply do not apply anymore.

If you had asked people, before this breakthrough, how long it would take until an artificial intelligence would be able to beat the world champion at Go, everybody would have told you we are like 20 years out, it's not going to happen anytime soon. And then DeepMind showed up: they first beat the European champion, and then actually teamed up with the European champion, using him to train the system, to then take on the world champion. It was done with a deep-learning and deep-reinforcement-learning-based approach, with many other tricks to make it work at the end of the day, but that doesn't matter. The important message here is that it is unclear, for many of these things, how far we have come already, and if you ask experts where we are, it's not always clear that their predictions are well founded. We have come a very long way, we have very strong technology out there, and most of these things you can now get as libraries out of the box. Sure, you have to tune them, and if you want to push things you have to be very careful to make them work, but the vanilla versions of many of these methods are available as libraries to everyone out there.

Good. Now, having said that, people then say: that's all fair and square, you have all these problems that you can now solve with the technology, but there is one thing that we will never be able to put into a computer. This discussion has been going on for many, many years, and it's this whole idea that there is a notion of creativity that is very inherent to human thinking and that you cannot put into a computer. Just to give you a bit of perspective on this (and I'm not saying you can do this in all possible ways), let's look at this work, which is called style imitation. What you see on the left here is a picture
taken in Tübingen, I think. What they did is they trained a deep neural network to imitate the styles of known artists and redraw the picture. Let's look at the examples. The first one is in the style of Vincent van Gogh: it is the same picture, just redrawn by the AI, using a deep neural network, to make it look like a van Gogh painting. And I would say, if this is the original, this looks pretty good in terms of getting close. Similarly, it was then redrawn in Picasso's style, which you see down here, and another one in Kandinsky's style. So it's the same picture, redrawn by an AI using deep learning. Now you will tell me again: oh, this is not AI, because now you know how it works; or this is not creative, because I can tell you what the algorithm does, and now that I know that, maybe it's not what I would call creativity anymore. But the point I want to make is that a lot of things we think are very unique and special are actually maybe not that special at the end of the day, whereas there are other things, of course, that will remain very special after all. All right, so this is just to give you an idea of what you can do with AI and machine learning right now, on the very broad side.

Having said this, of course not everything is great when it comes to AI and machine learning, and in fact, if you talk to some people, they will tell you that they consider AI to be the biggest existential threat to humanity. Now, I actually very much subscribe to that statement, but for completely different reasons. I'm not so much scared of the Terminator scenario, that we build a Terminator that is going to wipe us all out. What I'm much more concerned with is the impact on society: we have this technology, and this technology has the potential to significantly transform our workforce and how we do automation. If you look at the numbers, they are actually quite staggering. Let me start with the graphic down here. What you see is something that has been monitored over many, many years: the development of employment and productivity. There has been a very nice correlation for many years, and then something happened around 2000, which is what people call the great decoupling. Productivity continued to rise significantly, as you see up there; however, employment remained flat. Why is this? Because we started to deploy industrial robots much more aggressively and at very large scale. You see it here: in Q1 2017, 32% more industrial robots were deployed, and between 1990 and 2007 every robot displaced about six human jobs. Let's be realistic: you have to come up with a way of managing this; you cannot just do it and say somebody else will deal with the issue. If you look, for example, at the Adidas factory here in Georgia, this small factory is run by 160 human workers. That is extremely impressive from a technological perspective, but in the grander scheme of things, in terms of workforce transformation and societal impact, it is something we should be extremely aware of and think about in terms of how we want to move forward. I'm not suggesting there is a specific solution, or that I have the answer; all I'm saying is that we need to be aware of the potential impact. And let's also be very frank about this: it is something that is very strongly enabled by things like machine learning and AI. Industrial robots beforehand were very good at very precisely defined tasks, and it took a very long time to train a robot to do a specific task or to set it up. Now that game is very different. Just to give you an idea of how this is done nowadays, this is one of KUKA's robots; let's have a look at how it's trained.
I'm not going to run through the whole thing, but the idea is essentially that you teach the robot to perform a task by demonstration. How does it go? You take the arm of the robot and guide it to specific control points, have the robot memorize those control points, and perform a certain task, in this case picking up the element here. Once you're done with training the task, the robot will execute the task by itself. There are versions out there where, with AI, you refine the task over time to make it nicer, and so on, but the point is that you now have a way of setting up an industrial robot which is much, much faster than what we would have done in the old days. I don't know if any one of you has seen how you used to train and set up these robots: you program them, you have to be very precise, the elements all have to be in place, and if there is a bit of variation you have a hard time. Now these things have become very robust; there are versions that use computer vision to detect shifts in position and so on, and this is very fast in terms of training. That is just to give you an idea of where we are in terms of using this technology, on the workforce transformation side. But there are also many ethical challenges that I would like to mention. I don't want to go too much into detail, but I think one thing that is very important to understand is how the technology can be used, and whether we are happy with that. One potential use, of course, is to use AI to predict whether somebody is going to commit a crime or not. Here is the example of China, but there are many other countries and many other entities that do exactly the same thing and use machine-learning-type methodology to predict various human behaviors. That's on the ethical side. Again, I'm not saying that I want to suggest a solution; I'm just saying that we should be aware of these challenges.
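The record-and-replay idea behind teaching by demonstration can be sketched in a few lines of Python. To be clear, the `RobotArm` class and its methods below are hypothetical stand-ins invented for illustration, not KUKA's actual interface; a real controller would interpolate trajectories and handle safety, which this sketch omits.

```python
# Minimal sketch of "teaching by demonstration": record the control points
# an operator guides the arm through, then replay them autonomously.
# The RobotArm class and all poses are hypothetical, not a vendor API.

class RobotArm:
    def __init__(self):
        self.position = (0.0, 0.0, 0.0)  # current (x, y, z) of the end effector
        self.waypoints = []              # memorized control points

    def guide_to(self, x, y, z):
        """Operator physically guides the arm to a pose (simulated here)."""
        self.position = (x, y, z)

    def memorize(self):
        """Store the current pose as a control point of the task."""
        self.waypoints.append(self.position)

    def replay(self):
        """Execute the task by revisiting the memorized control points."""
        trajectory = []
        for wp in self.waypoints:
            self.position = wp           # a real controller would interpolate
            trajectory.append(wp)
        return trajectory

arm = RobotArm()
for pose in [(0.0, 0.0, 0.5), (0.3, 0.1, 0.2), (0.3, 0.1, 0.0)]:  # pick motion
    arm.guide_to(*pose)
    arm.memorize()

print(arm.replay())  # the robot re-executes the demonstrated motion
```

The contrast with the old way of programming robots is exactly the point the talk makes: instead of coding precise coordinates up front, the demonstration itself becomes the program.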
And now let's talk a bit about the actual risks of deploying these systems in the real world, because we always talk about how amazing these systems are, but I think we also need to be very precise and honest about their vulnerabilities. It's fine if I deploy something at Google to classify some images and something goes wrong; it's not a big deal, there are no human lives at stake. If I deploy these systems in the real world, for example in autonomous vehicles, I care very much how safe these systems are and whether they can be hacked or not. Let me give you an example of what you can actually do, starting with the one on the left here. This is work by one of my former students, who worked on it together with his cohort and is now at Google. The idea is what they call an adversarial patch. How does it work? What you have up here is a VGG16 network; this is an image classifier: you feed it an image and it tells you what it sees. You put in this image here; there's a bit of a notebook in it, but clearly you see there is a banana in the picture, and the classifier also tells you: with probability of almost one, I'm essentially sure that what I see is a banana. Good. Now what you can do is build these pictures here. They look like small coasters, like the ones you put under your glass, and you compute them in a special way so that, when you put them into the image, they hack the image processing that is happening within the AI. That's an adversarial patch. So you take this coaster (you see the size here roughly: this is the banana and this is the coaster), you put it on the table next to the banana, and now the classifier thinks that what it is actually seeing is not a banana anymore but a toaster.
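The underlying trick, finding a small input change that flips a classifier's output by following the gradient of its loss, can be illustrated at toy scale. This is not the adversarial-patch construction from that work, just a minimal fast-gradient-sign-style attack on a hand-rolled logistic classifier; all weights and inputs are made-up numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A tiny fixed "trained" linear classifier: predicts class 1 ("banana")
# when sigmoid(w . x) > 0.5. Weights and input are invented toy values.
w = [1.2, -0.8, 0.5, 2.0]
x = [0.9, -0.4, 0.3, 0.7]          # a "banana" input, confidently classified

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Gradient-sign attack: for logistic loss with true label 1, the sign of
# the loss gradient w.r.t. the input is -sign(w), so nudging each feature
# a small step against the weights pushes the score toward class 0.
def attack(x, eps):
    return [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

print(round(predict(x), 3))        # well above 0.5: confident "banana"
x_adv = attack(x, eps=0.8)
print(round(predict(x_adv), 3))    # now below 0.5: the label flips
```

Each coordinate moved by at most 0.8, yet the decision flips; in image space the analogous perturbation can be small enough to be barely visible, which is what makes these attacks worrying.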
Okay, think about where we use this technology: we use it, for example, in autonomous vehicles to decide whether we see a stop sign or not. And now it gets even more interesting. You might say this only works with coasters and toasters, but you can actually do this in the real world: there is a so-called stop sign attack. What you do is place white and black stickers with certain patterns, in a smart way, on a stop sign. Now, if you take a standard image classifier that has been trained to detect traffic signs, it will actually tell you that this is not a stop sign but, for example, a 65-miles-per-hour speed limit sign. Think about this: if you have an autonomous vehicle that relies on parsing this signage information for decision-making, and I tell it this is not a stop sign, it might actually speed up. That can potentially be a disaster. It means we have to address these things in a different way; we probably need additional map information and so on. All I want to say is that it's not so straightforward to deploy this technology: you cannot just take it, put it somewhere, and hope that things work out the way we want.

All right, that's basically all I wanted to say as an introduction to where we are in machine learning and what the risks and possibilities are. What I would like to do now is talk about machine learning and AI specifically in supply chain and logistics. If you think about it, there are several very natural application areas for this technology in a supply chain and logistics context. Of course there is the all-time favorite that everybody comes up with immediately: forecasting. It becomes even more interesting now that we have all these short-lived SKUs that are seasonal, where you would like to find a way to deal with them. But this talk is not going to be about forecasting; I'll have one slide just to give you a bit of perspective. The other example that might come to mind very early on is dynamic routing. We face more and more setups where we need to dynamically route something through a network, be it cars or packets, it doesn't really matter; these networks are more and more volatile, we get information more and more dynamically, and we would like to use that information dynamically to make better decisions. That is what I'm going to talk about, and it's also something we implemented a few years back that has been running since then. The next one I'd like to talk about is worker performance assessment, and you will see from what angle I'm going to look at this. The idea is basically that if you have, say, a fulfillment center with workers, maybe in the old days it was sufficient to do some time studies and then compute something like picks per hour as a metric. But tasks nowadays are very complex and diverse, and it is very hard to assess whether a worker did a good job on a task or not. And it's not so much about the actual performance of the worker, as we will see later, but much more about the overall efficiency of, say, the fulfillment center or the warehouse. Afterwards I'm going to talk about something new that we just started: dynamic inventory management. The idea there is that you go away from the more stationary optimal policies derived from the many equations that we have all seen probably a hundred times; instead we rip this whole thing out and replace it with deep reinforcement learning, which is a much more dynamic approach.
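To make the reinforcement-learning idea concrete, here is a minimal sketch on a toy inventory problem: a tabular Q-learning agent learns how much to order given its current stock level. This is not the deep RL used in the actual project; the demand distribution, costs, and all parameters are invented for illustration.

```python
import random

random.seed(0)

MAX_STOCK, MAX_ORDER = 6, 3
HOLDING_COST, STOCKOUT_COST, UNIT_PROFIT = 1.0, 4.0, 2.0

def step(stock, order):
    """One period: receive the order, see random demand, collect a reward."""
    stock = min(stock + order, MAX_STOCK)
    demand = random.randint(0, 2)            # invented demand distribution
    sold = min(stock, demand)
    reward = (UNIT_PROFIT * sold
              - HOLDING_COST * (stock - sold)      # pay to hold leftovers
              - STOCKOUT_COST * (demand - sold))   # pay for unmet demand
    return stock - sold, reward

# Tabular Q-learning with epsilon-greedy exploration over (stock, order).
Q = {(s, a): 0.0 for s in range(MAX_STOCK + 1) for a in range(MAX_ORDER + 1)}
alpha, gamma, eps = 0.1, 0.9, 0.1

stock = 0
for t in range(50000):
    if random.random() < eps:                          # explore
        a = random.randint(0, MAX_ORDER)
    else:                                              # exploit
        a = max(range(MAX_ORDER + 1), key=lambda b: Q[(stock, b)])
    nxt, r = step(stock, a)
    best_next = max(Q[(nxt, b)] for b in range(MAX_ORDER + 1))
    Q[(stock, a)] += alpha * (r + gamma * best_next - Q[(stock, a)])
    stock = nxt

policy = {s: max(range(MAX_ORDER + 1), key=lambda b: Q[(s, b)])
          for s in range(MAX_STOCK + 1)}
print(policy)   # learned order quantity for each stock level
```

Nothing here encodes a base-stock formula; the ordering behavior emerges from experience, which is exactly why the approach can also pick up demand shifts and seasonality that a stationary policy bakes in once and forever.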
It allows you to dynamically adjust to changes in demand patterns, dynamically learn seasonality, and things like this. Last but not least, there is of course another all-time favorite, pricing, but that's a bit out of the scope of this talk, so I'm not going to talk about pricing; that's for some other time.

All right, let's start with forecasting. My message is very simple: for machine learning and forecasting, let's maybe not talk so much about forecasting. In fact, there are extremely good libraries out there that do extremely high-quality forecasts for you, for free; they are open source. One example is Facebook's Prophet library: they open-sourced it, it's a great library, it includes seasonalities, you can put in calendar information, and so on. The nice thing is that it will significantly outperform the traditional Holt-Winters and ARIMA models; it is very easy to use and it has been hardened. So that's the only point I want to make about forecasting: if you want to do forecasting, try that library first, or one of the many similar ones, and if it does not work, then come back and we can talk about actual machine learning, because this thing solves many interesting cases that we previously believed were not solvable that easily. It uses some type of machine learning, just maybe not the deep learning you would expect. So much for forecasting.

Now let's talk a bit about the, at least for me, more interesting things, such as dynamic routing. What is the overall task we face, the problem we want to solve? The idea is that you want to route tours through a network. Maybe you need to deliver goods to outlets, maybe you need to route packets in a network; it doesn't really matter, that's your task. And let's be realistic: in the networks we care about these days, you have a lot of feedback effects. There will be congestion, there will be delays, there will be all types of problems; links break down, links slow down, and so on. That's what we're actually facing nowadays. What would have been the traditional approach? The traditional approach, from when I worked at IBM ILOG for example, would have been: take a lot of historical data, run some form of regression to get a rough idea of how long it takes to traverse an edge, then set up an optimization model, a very fancy one of course, because you want to use CPLEX, and once you've done that, solve your optimization model and repeat. At some point you would realize: well, my solutions somehow don't look so great anymore; maybe I should collect more data, rerun my regression, and repeat. That would have been the traditional approach in many cases. What I suggest, and what we did back then, is a different approach: the idea is that you should do online learning. In online learning, the learning piece that you usually do in the beginning and the optimization piece that you do afterwards are done simultaneously, and that has many advantages, for example that your learning never stops. In the traditional approach I first learn, then I stop learning, then I optimize; here both are integrated, so you never stop learning: you get more information, new things happen, and you can adjust to them. That's the basic idea. In a nutshell: in the traditional approach the idea is to get a good solution every time; here we relax this a bit. We say: look, we don't need the best solution every time, we just want a very good solution almost all the time. Why do I want to relax this? Because I can use the other times to do a bit of exploration, to figure out whether there are ways to traverse my network that are actually more efficient, so I can actually learn. Of course, you have to trade off.
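One simple way to implement this trade-off is an epsilon-greedy scheme: most of the time take the route you currently believe is best, and a small fraction of the time try another one so you keep learning. A toy sketch, where the two routes and their noisy travel times are invented for illustration:

```python
import random

random.seed(1)

# Two invented routes with unknown noisy travel times; B is truly faster.
TRUE_MEAN = {"A": 30.0, "B": 24.0}

def observe_cost(route):
    return TRUE_MEAN[route] + random.uniform(-3, 3)   # noisy observation

estimates = {"A": 0.0, "B": 0.0}
counts = {"A": 0, "B": 0}
EPSILON = 0.05          # explore 5% of the time, exploit 95% of the time

for day in range(2000):
    if random.random() < EPSILON or 0 in counts.values():
        route = random.choice(["A", "B"])             # explore
    else:
        route = min(estimates, key=estimates.get)     # exploit current belief
    cost = observe_cost(route)
    counts[route] += 1
    # running average of the observed cost per route
    estimates[route] += (cost - estimates[route]) / counts[route]

print(min(estimates, key=estimates.get))              # learned best route: "B"
```

The 5% "wasted" days are what buy you robustness: if route B suddenly degrades, the occasional exploratory trips on A are how the system ever finds out.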
You only have limited time: say you optimize 95% of the time, and the other 5% of the time you use for exploration, to learn new things. That's the basic idea, and then there is of course this exploration-exploitation trade-off, which occurs almost everywhere: you have to decide when you want to learn and when you want to use what you have learned. Good, so that's the basic setup. I'm not going to bore you with formulas here; I just want to give you the very fundamental idea behind this, which is what people call the regret. The regret says: I want to find a dynamic policy that over time gets as good as the best stationary policy I could have come up with, where the stationary policy corresponds to your first-regress-then-optimize scenario. Why do you want this? Think about it: in the traditional setup you first have to collect data, you have to wait half a year until you have enough data before you can even start the decision-making; here you do everything together. What this equation really does is the following: the f_t is the realization of reality, think of it as the cost that you see in your network; the x_t is your routing decision at time step t. You benchmark the cost that you incur on average against the average cost you would have incurred if you had traditionally just picked a single solution after doing the regression. And what you get is that this difference goes to zero at a rate of one over the square root of T. It means that over time your dynamic policy will be at least as good as the stationary one, and in reality it will often be much better, because you adjust much more dynamically to changes in the demand patterns, or in the delays of the network, and so on. So you get a very nice dynamic routing approach that is extremely fast.
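Written out, the regret guarantee just described takes the following standard form, where f_t is the cost function revealed at time t, x_t the routing decision made at time t, and X the set of feasible (stationary) decisions:

```latex
R_T \;=\; \frac{1}{T}\sum_{t=1}^{T} f_t(x_t)
\;-\; \min_{x \in X} \frac{1}{T}\sum_{t=1}^{T} f_t(x)
\;=\; O\!\left(\frac{1}{\sqrt{T}}\right)
```

The first term is the average cost of the dynamic policy; the second is the average cost of the best single decision in hindsight; the bound says the gap vanishes as T grows, which is exactly the "at least as good as the stationary policy over time" statement from the talk.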
routing approach that is extremely fast and among the best out there. There are many algorithms that do this; we have known about them for something like 20 years. But the nicest thing is that there is a very interesting version of the algorithm, the so-called follow-the-perturbed-leader algorithm, which is actually due to Kalai and Vempala, here at Georgia Tech. And why is this such an amazing algorithm? Because you can reuse your old routing solver. Most people have already set up some way of solving their routing decisions, and if I tell you now, I have this new approach, it is much better than what you have, but you have to throw your old routing model away, it's not going to happen. Let's be realistic: if you try to roll out something in reality, where you have processes in place and people have been working with a model for 20 years, you don't just show up at the door and say, hey, I have something better for you. It's just not going to work. But the nice idea is that this algorithm allows you to reuse your old model: you add a learning scheme around your already-running algorithm, and you make that algorithm smart just by adding something to it. And that is much nicer, because you can actually implement it, and people are actually willing to do so, because it is a very simple wrapper that runs around what is already there and makes it smart just by adding something to it.

All right, good. Let me give you an example of how this looks. I apologize ahead of time that the graphics are a bit messy; they are a few years old and I could not regenerate them. So what do you see here? We look at a TSP instance. What does that mean? In a graph, I have a truck, I want to deliver product to certain outlets, and the truck has to come home at the end of the day; that's my task. The network I took is a simple one with 16 nodes. The blue lines are the optimal tours; this is an overlay of two tours. The green lines are the tours, or the edges, that the algorithm traverses while trying to learn the optimal tours. So we know the blue ones, but the algorithm does not know the blue ones when it runs. And what is going to happen is that at time step 500 we switch to a completely different regime. Think of something like: for whatever reason the traffic changes, or some streets break down, and so on and so forth, and suddenly a completely different solution is the optimal one, not what you had before. So you learn something, you are extremely confident that it is the optimal solution, then suddenly something happens in your network and you need to readjust to something new. That is a scenario we face very often. So let's see what happens; let me see if I can do this in a smart way. One second. Let me stop here for a moment. It is a bit hard to see, but this dashed line here corresponds to the optimal way of traversing the graph at the current point in time; it continues here, here, here. That is the current optimal solution, which the algorithm does not know. The green lines are the tours the algorithm is taking: it actually traverses the graph while learning the edge weights at the same time. The thick edges are edges that are used more often, so they are more likely to belong to the optimal tour; the thinner ones are taken only once in a while to make sure you are not missing something, that is, you are actively exploring new solutions. Okay, now let's see what happens. This is around time step 56. If you look at it right now, we have essentially already learnt the optimal tour; after very few iterations we have found something extremely close to the optimal tour, and you see only very
occasionally some of these green lines, which appear when we explore and check whether there are other possibilities of finding a new tour that could be better. Okay, let's keep this running. As it continues, you see that the green lines become fewer and fewer, because we explore less and less over time. And then time step 500 comes; that happens in a few seconds, like now. What you see now is that a completely different tour is optimal: there are blue lines now that were not there beforehand, and you can also see that the algorithm has not taken this tour even once. It is a completely different tour; think of some streets breaking down, or a bridge collapsing like we had in Atlanta, so you have to take a completely different route. But the algorithm does not know that this is now the optimal tour. So what it does is continue, figure out that the current tour is not good anymore, and start to learn which new tour is actually competitive. And you see that we are starting to traverse these new edges again; the green ones here mean that we take those edges in our routing. And this continues, and so on and so forth. Now, how good is this? Let's look at the actual cost. It is an ugly graph, but let me tell you what you see. The red line here is the cost of taking the optimal route in each time step. And we calibrated the costs so that they are the same for all these tours. Why? Because if there is a big difference between the costs of the tours, it is very easy to learn the new tour. So we make sure that you cannot just look at the cost of a tour to tell whether it is optimal or not. We took a lot of the reward signal away, which is very important; it makes the problem much, much harder. So that is the red line: the optimal cost in every time step for what you are trying to achieve. Then we have the blue line, which is the cost incurred by our algorithm, and the green one, which is the deviation from the optimal cost that would have been achievable with perfect foresight into the future. So what do we see? At the beginning we start out with some random cost, because we just took a random tour; we have to explore. Then the cost decreases very sharply to a point where we are essentially optimal; you see this, we are very close to the red line. And I say essentially optimal because we are actively exploring: the exploration rate was probably between 3 and 5 percent, so 3 to 5 percent of the time we take a suboptimal tour just to make sure we are not missing something. If I were not doing this, I would completely miss this spike here. So what happens at time step 500? Something happens in my network, meaning I suddenly change the costs in the network drastically from one step to the next. So my old tour, which the algorithm is still taking most of the time, because it believes it is the right tour to keep the cost down, suddenly becomes very, very expensive. That is what you see here: the cost jumps up, by more than a factor of three. It is a very expensive tour now, but then the algorithm realizes that it is not optimal anymore and starts again to learn that the new tour is optimal. After another 150 iterations or so, we get back to a point where the cost incurred by the dynamically learnt tour is essentially the same as taking the optimal tour, which you would only know with perfect foresight, and in reality you don't. So what does this tell you? It tells you that there is a learning algorithm you can use for routing, and this one is actually running in reality on real-world large-scale networks, that learns which tours are optimal to take when you do not know the edge costs, or the times for traversing the graph, ahead of time.
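To make the wrapper idea concrete, here is a minimal sketch of follow-the-perturbed-leader for online routing, under simplifying assumptions not stated in the talk: full cost feedback after each round, a uniform perturbation, and a brute-force `solve_tsp` standing in for whatever routing solver is already in place (the point being that the solver is treated as a black box).

```python
import itertools
import random

def solve_tsp(costs, n):
    """Stand-in for the routing solver already in place: FTPL only needs
    a black box that returns a good tour for given edge costs."""
    best_tour, best_cost = None, float("inf")
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        c = sum(costs[tour[i], tour[i + 1]] for i in range(n))
        if c < best_cost:
            best_tour, best_cost = tour, c
    return best_tour

def ftpl_route(edge_costs_at, n, eta=1.0, rounds=200):
    """Follow-the-perturbed-leader: each round, perturb the cumulative
    observed edge costs and hand them to the existing solver."""
    cum = {(i, j): 0.0 for i in range(n) for j in range(n) if i != j}
    total = 0.0
    for t in range(rounds):
        perturbed = {e: c + random.uniform(0.0, eta) for e, c in cum.items()}
        tour = solve_tsp(perturbed, n)   # commit to a tour...
        costs = edge_costs_at(t)         # ...then this round's costs are revealed
        total += sum(costs[tour[i], tour[i + 1]] for i in range(n))
        for e in cum:                    # accumulate observed edge costs
            cum[e] += costs[e]
    return total
```

On a toy instance with fixed costs, the wrapper locks onto an optimal tour after the first round; pointed at a production solver, the same loop is the "smart wrapper around the already-running algorithm" described above.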
Now, the interesting thing about these algorithms is that if you look at the convergence rate, for those of you that are into numbers, it does not look so great: it is just one over root T. The key point, however, is that it is dimension-independent. That means the method is essentially as fast on my 16-node example as it is on a 600-million-node example; there is a very small trade-off in a logarithmic term in the number of nodes, but otherwise it is extremely fast. So these algorithms scale very well, from my toy example here to real-world deployment, and that is why people love to use them in real-world applications. So that is it in terms of dynamic routing.

What I would like to talk about now is worker performance assessment. That is a completely different type of application where we use machine learning, and the idea is essentially, and I will talk about why you want to do that in a second, that you want to measure and assess worker performance, for example in a fulfillment center. Now why do you want to do this? I'll come to that in a second, but let's first talk about what people have been doing in that space more traditionally. You come up with some metric, maybe picks per hour; maybe you adjust it a bit by the task at hand; maybe you take the geometry of the warehouse into consideration; maybe you do a couple of time studies, and so on and so forth. Then you try to derive a model from this.

[Audience question about whether more recent rounds should get more weight than coefficients obtained earlier.] I know what you mean. The theoretical answer is no, you don't need to; you actually take the average over all rounds that you have seen so far, and you can still show that this works. What you do in reality to speed up learning is use a discount factor, similar to what you do in reinforcement learning, to put more weight on more recent decisions, but it is typically a very modest factor, maybe 0.95 or something. [In response to a follow-up question:] No, this was completely without that, yes.

Good, so let's talk a bit about this worker performance assessment example. So typically you come up with some metrics and try to assess the performance of workers. Now the reality, if you go to a modern fulfillment center, is very different. Why? Well, first of all it is very high-paced, and we have very small orders these days: if you take your favorite e-commerce company, most of the orders will actually be single-item orders, so you pick only one item per order, but you pick multiple orders into a tote. So there are a lot of challenges coming with this. When you put away product, what correlations do you want to consider to keep products close together, so that you can reduce the time to pick things, and so on and so forth? Then there are questions about where you pick: do you pick from the golden zone in the middle, do you have to kneel down, do you have to take a ladder to go up? How heavy is the item, what are its dimensions, is it fragile, is it maybe a beauty product? There are many, many things to consider in terms of picking that impact how fast you are. There are also things that depend on how long you have been working on the shift: have I made you pick heavy things for the last two hours? Then at some point your performance is going to go down. So there are all these contextual things, these environmental things, that you want to take into consideration to make a fair assessment of performance. Okay, so why do you want to do this? Well, at the end of the day you would like to learn a model that gives you the following possibility: you tell me what the task is, you tell me a bit of the history of the worker, and the model gives you a very good prediction of how long the task should take. In math speak, you would like to have something like an unbiased estimator: it gives you a fair estimate of the time in expectation, and maybe it gives you some
standard deviation that you should expect, for this to make sense. Okay, now why would you care? And the key thing is that it is not what you might think. The first thing people think of when I say you want to assess worker performance is evaluating whether a person is over- or under-performing, so that you can make hire-or-fire decisions. That is actually not the point; it is only a very small component of a bigger puzzle. In the cases we worked on, you care about being able to assess performance for completely different reasons. So first of all, you would like to detect whether you have issues in your fulfillment center: maybe at peak times around the holidays you suddenly realize that there are issues in the fulfillment center, and the reason the performance of specific workers is going down is that there is a problem in a certain area, for example. The other thing is that you would like to do A/B testing on associate training: what strategy do you teach the associates in terms of picking, replenishing, and so on, and what is the impact? Once you can assess performance, you can A/B test new WMS policies: how you pick, how you store, how you replenish, and so forth. And, something that became more and more important, you can do real-time workload shaping on a per-worker basis. What does that mean? It means that if you can track performance over time and you realize that somebody is getting tired, for example, you can assign that person to easier tasks for some time. Or you can say: this person seems to be very good at dealing with beauty items and gift-wrapping them, and this person does not seem to be so good at it; maybe you can actually assign the right person to the right task, and if you do this you have a win-win in the system. What this gives rise to, in this case, is a large optimization model in which you want to maximize the throughput of the system; you solve this optimization problem and you get an optimal assignment.
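The assignment step just described can be illustrated with a toy version. The predicted task times would come from a task-time prediction model; here they are made-up numbers, and brute force over permutations stands in for the proper assignment solver (an LP or the Hungarian algorithm) you would use at scale.

```python
import itertools

def best_assignment(predicted_times):
    """Pick the one-to-one worker-to-task assignment minimizing total
    predicted time; rows are workers, columns are tasks. Brute force is
    fine for a handful of workers and stands in for an LP or the
    Hungarian algorithm at scale."""
    n = len(predicted_times)
    best_perm, best_total = None, float("inf")
    for perm in itertools.permutations(range(n)):
        total = sum(predicted_times[w][perm[w]] for w in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total
```

With made-up predictions, where worker 0 is fast at task 0 and worker 1 at task 1, `best_assignment([[3, 8], [7, 2]])` returns `((0, 1), 5)`: each worker gets the task they are predicted to do fastest.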
But very often you are also interested in doing this type of performance tracking over longer periods of time. Why? First of all, you want to prevent worker burnout over time: if you see that over months a worker's performance is going down, then maybe you want to pay very close attention to what is going on. That is one component. The other component is that if you introduce a new technology, maybe new guns for scanning, then the short-term impact is typically negative, because people have to get used to it. So you have to track over a longer period whether the change actually makes sense; that is why you would like to be able to track this over a longer period of time. And last but not least, there are many changes that you can make to your fulfillment center or warehouse: replenishment, tote sizes, and so on. If you make bigger totes, you have to carry more, and that slows you down; it is a trade-off between carrying a lot at once and carrying a little, often, and things like this. So what did we do? Traditionally you would probably have done some form of regression-type approach, maybe a bit of nonlinear regression: you cook up your feature vectors, you come up with something that somehow represents what is going on in your warehouse or in a fulfillment center, and then, when you run your first regression, you will realize that you get an arbitrarily bad R-squared. Why? Because this thing is highly nonlinear. Just think about it: if you want to pick something and you don't pick it from the middle but have to bend down, then that costs you time; but let's say the second item you pick is right next to it; you have already bent down, so it is not going to cost you any additional time. So all these things are extremely dynamic and nonlinear. If you
just run your traditional regression, you will get a very, very bad R-squared. So what we did is we came up with something else, which is much more complicated, where you look at the full context and do not try to boil everything down into feature vectors. And with a combination of deep learning, extra trees, and random forests, what you get is a generalized R-squared, because with these machine learning models you no longer have residuals in the traditional form, of 0.6 to 0.7. Now the question is whether that is a good or a bad R-squared. If you take your traditional examples, you would say that it could be higher. I argue that it actually should not be much higher than that. Why? Because you want a model that learns the base characteristics of the tasks you are looking at, but you do not want this model to start to overfit, because you do not want the model to incorporate worker variation; that is precisely what you want to measure. If I want an unbiased estimator, my estimator should be blind to which worker is doing the task, and the variation that creates is something it should not be explaining with its R-squared. Similarly, there will be many tasks, especially at peak times, where the execution is not nominal: maybe an item is dropped, maybe there is a miss because the item is supposed to be there but is not, so the task takes much longer than it was supposed to. You cannot expect to explain these things, and you should not try, because if you did explain them you would be overfitting. So an R-squared of 0.6 to 0.7 is actually very good in that context. And we trained this on about 10 million individual data points, and it gives you a very finely grained model; again, this is also something that is deployed and in use as of now.
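The "generalized R-squared" here is, in effect, the ordinary coefficient of determination computed directly on model predictions (from forests or networks) rather than from a linear regression's residual decomposition; a minimal sketch, with made-up durations:

```python
def generalized_r2(y_true, y_pred):
    """Coefficient of determination computed straight from predictions,
    so it applies to any model (random forest, neural net), not just
    linear regression."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)   # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot
```

A model predicting nominal task times stays well below 1 on data containing irregular tasks (dropped items, misses), which is exactly the behavior argued for above: the unexplained remainder is worker variation plus irregularities that an unbiased estimator should not explain.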
Just to give you an idea of how this looks at the end of the day: all these dots here are tasks. Down here you see the true duration, how long it actually took to perform the task, and here you see how long the model predicted the task would take. What you see is that everything is very nicely spread around this line here, but you also see some very interesting things: for example, you have nothing here but a lot up here. Why is that so? Well, these are tasks that should have taken, say, 40 seconds, but took much longer because there was some irregularity with the task: maybe somebody dropped an item, maybe something was not in place, and so on and so forth. So the model captures exactly what you want. All the irregularities are up here, where things take longer than expected, while the bulk of the points is nicely centered around the main diagonal, which would be the perfect prediction line, and you see that most of the density is concentrated there. And as I said, this is in use, and it works very well to assess performance; unfortunately I cannot give you exact numbers, precisely because it is in use. Good, so that is what I wanted to say in terms of worker performance assessment.

Then my last example is something that we very recently started, and it is about inventory management. So what is inventory management? Let me give you my layman's definition: you want to provide product for production, maintenance, or service at the minimum stock level with the highest possible service level. So your trade-off is service level versus stock level. If I put everything in my warehouse, I don't care about cost, and I have a perfect service level; if I put nothing in my warehouse, I will always stock out and have the worst service level, but I have minimum cost, at least if I don't account for stock-out cost. Good, so what's the idea here? The traditional approach,
in most applications, is the following: you look at the demand, you make some assumptions about the demand, and then you run a bit of forecasting, maybe you try to guess the distribution of the demand. Then you take some policy that you derive from the Bellman equations, or some stationary policy, and that becomes the inventory management policy, something like (s,S), and you run this. And if you look at various WMS software packages out there, they actually use these things nowadays; it is what I would consider the default approach if you don't do something smarter. Good, so what is the problem with this? The problem is that these policies are typically derived under very, very strong assumptions. You assume things like: my costs are linear, my demand is independent; many, many assumptions that in reality almost never hold. And that is a problem when you deploy these things, because you don't even know how bad your policy is, and very often it cannot solve the actual inventory management problem in real time. So what we started to do, and as I said this is ongoing work with one of my students, who is also sitting over there, is deep reinforcement learning for inventory control. What's the idea? The idea is that you feed this model what it can see, and it automatically learns what the optimal control policy is, given the observed demand in the system. Okay, why do you want to do this? Well, first of all, because you don't need to know anything: you literally just dump in the data and let it happen. Sure, there is a lot of model tuning going on, and so on and so forth, but you don't need to make any assumptions, and that is a very, very big difference. The only thing you actually observe is the state of the system: how much do you have in your warehouse? Then you have your actions: how much can I reorder if I want to, what's the lead time, and so on and so forth. And then you have rewards, which is essentially me telling you that you made a good decision or a bad decision. If you stocked out, you probably did something wrong in the past, so you should pay for this; if you hold too much inventory, that is expensive, so you should pay for this; if you run minimum inventory, that is great, and you don't pay for this. And what we want to do is maximize, say, the reward. Of course there are issues, as with all these things: convergence can be slow, and for many of the real-world problems, where the problem is highly nonlinear, it might in principle happen that it does not converge at all. Reality is a bit different, because of the stochasticity and so on and so forth; typically it converges very nicely in applications. And the way you should think about deep reinforcement learning is this: think of the same thing people did beforehand by solving the Bellman equations to obtain policies, but now you do this numerically and dynamically, with an algorithm, using the data. So deep reinforcement learning solves the same problem that your Bellman equation solves; it is just a numerical algorithm that uses the data to do this dynamically, in a model-free context. Good. Let's first talk about the setup. The typical setup is that you have an agent, the inventory controller, say, and it can do something: an action. The action is, order 10 more units at time step t. Then the environment does something with it: you execute your order, product arrives with a certain lead time, and so you go to a new system state. And from that new system state s_{t+1} you get a reward, and the reward in our case is just the inventory cost that we had at the last time step. That is a fair reward, because we can roughly estimate what it is: we know what we have in the warehouse.
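The state-action-reward loop just described can be sketched as a toy single-item environment; the cost parameters, zero lead time, and uniform demand are all simplifying assumptions (not numbers from the talk), and the classical (s,S) rule is included as the baseline policy mentioned above.

```python
import random

class InventoryEnv:
    """Toy single-item inventory loop: state is on-hand stock, the action
    is an order quantity, and the reward is minus the incurred cost.
    Zero lead time and uniform demand are simplifying assumptions."""
    def __init__(self, holding=1.0, stockout=9.0, max_demand=10, seed=0):
        self.holding, self.stockout = holding, stockout
        self.max_demand = max_demand
        self.rng = random.Random(seed)
        self.stock = 0

    def step(self, order):
        self.stock += order                     # order arrives immediately
        demand = self.rng.randint(0, self.max_demand)
        unmet = max(demand - self.stock, 0)     # stock-outs are penalized
        self.stock = max(self.stock - demand, 0)
        reward = -(self.holding * self.stock + self.stockout * unmet)
        return self.stock, reward

def ss_policy(stock, s=5, S=15):
    """Classical (s,S) rule: when stock falls below s, order up to S."""
    return S - stock if stock < s else 0
```

Running the (s,S) rule against this environment incurs far less cost than, say, never ordering; a reward stream of exactly this kind is what the learning agent is trained on.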
We know how often we stocked out, and so on and so forth; we only use past information, never future information, obviously. Good, so how do you do this? Again, I am not going to go into the details. There are some algorithms that do this; the typical one in reinforcement learning is Q-learning, a stochastic-gradient-descent type of algorithm. You maintain this table Q, which tells you, for a given state, approximately what the reward of the actions would be if you took them. That is what you learn: this magic table Q. It is a lookup table: if I have so many things in my warehouse, and my demand in the last ten periods was like this, what am I going to do? That is what the table contains. As you can imagine, when I say it like this, in reality this thing will be huge; it is a big lookup table that tells you, if the state is this, do that. The problem is that you cannot maintain that table, and that is actually why the traditional reinforcement learning approaches, although we have known about them for a very long time, have never really been used in that context. Now the idea is to do something different. You say: look, I take this Q-table that I cannot maintain, I just rip it out, and I replace it by a deep neural network. So I learn an approximation of this table, but it is good enough for my actual learning task. It is not as comprehensive as the full table, but I don't need the full table; I just need a good approximation of it, and I let the network do this for me. That is the idea of deep reinforcement learning in a nutshell: you throw out the Q-table and you put in a deep neural network to replace it. Okay, so what can you do? Let me first give you a simple example that should convince you that this actually does something meaningful. So let's look at something that we understand: the traditional setup where an (s,S) policy would be optimal. That is the setup here.
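The Q-table update just described can be sketched in a few lines of tabular Q-learning; a DQN keeps exactly this update but swaps the lookup table for a neural network that approximates it. The loop below is a generic illustration, not the talk's implementation.

```python
import random
from collections import defaultdict

def q_learning(env_step, actions, episodes=200, steps=50,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning: Q[state][action] estimates long-run reward.
    A DQN replaces this table with a neural network, since the real
    table would be far too large to maintain."""
    rng = random.Random(seed)
    Q = defaultdict(lambda: defaultdict(float))
    for _ in range(episodes):
        state = 0
        for _ in range(steps):
            if rng.random() < eps:                      # explore
                action = rng.choice(actions)
            else:                                       # exploit
                action = max(actions, key=lambda a: Q[state][a])
            nxt, reward = env_step(state, action)
            best_next = max(Q[nxt][a] for a in actions)
            # temporal-difference update toward reward + gamma * max Q'
            Q[state][action] += alpha * (reward + gamma * best_next
                                         - Q[state][action])
            state = nxt
    return Q
```

On a toy chain of states 0 to 3 where moving right earns a reward, for example, the learned table quickly prefers the "right" action over the "left" one, even though it starts from all zeros and is driven only by the reward signal.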
What you see here are four pictures, and here is how to read them. On the left is what the DQN does, the deep-reinforcement-learning-based algorithm, and here on the right is what the (s,S) policy does. Blue is the demand, yellow is the inventory that we maintain; down here you see the order sizes, where the yellow, or orange, is what the (s,S) policy does and blue is what the network did. And then here you see the reward signal that was fed to the DQN; only the DQN gets a reward signal, the (s,S) policy does not have one, but we plotted its rewards as well so that you get an idea of what is going on. Okay, this is after 30 episodes or so, essentially nothing; we just started, and this is just random guessing. You see that my (s,S) policy has a reward of about 700 in this case, and my DQN gets a reward of minus 2500; you can look at it, it's terrible. Now let's see what happens after we start training. Now we are after about 500 episodes: my reward is already 419 compared to 651, and whereas before I was stocking up all the time, which inflates my inventory, you see that I am already not stocking up that much anymore. It still does not look as nice as I would like it to be, but we are slowly getting there, and you see that my order sizes are already somewhat aligned with what the (s,S) policy would be doing, which is not completely unexpected, because the (s,S) policy is optimal here. Just to be very clear about how we came up with the numbers: the (s,S) policy is actually more optimal than possible, in some sense, because we optimized its parameters after the fact. You would never get the parameters that we used here for the (s,S) policy in practice, because we tuned them in hindsight to make sure that there is nothing better out there in terms of (s,S) policies. Okay, good. So now let's see what happens after 850 iterations: after 850 iterations we essentially have the same reward that the (s,S) policy has. Sure, our reordering patterns here, our order sizes, are a bit different; they are a bit more random. That is because we still explore once in a while, and we order a bit less or a bit more, but essentially you see that the type of ordering very closely matches what the (s,S) policy does, and that is also what you see up here in terms of the reward signal. Now, if you look at it over time: in the upper row you see what happens during the training process, and in the lower one what happens in deployment, after you are done with training, if you just took the model as-is and deployed it. You usually don't do this, but it is just for illustration. So you start, and the green line now is the random policy: we just randomly order. That is your benchmark, the worst possible benchmark you can somehow come up with; if you don't know anything about reality, you just order randomly, or you just order the average, or something like this. You see that the neural network, because at the beginning it does not know anything either, starts at the same point, but it very quickly picks up and gets close to what the (s,S) policy does; you see it is converging to it, then there is a dip, and I will talk about this dip in a second, and then it continues, and so on and so forth. Now if you look at this in terms of prescriptiveness: prescriptiveness is a number, a coefficient that captures how much of the information in the data you actually used. It is like R-squared, but for learning problems: how much of the data did you use, of what was in your model? Then what you see is that the prescriptiveness goes very close to 1; we cannot go over 1 in this example, because the (s,S) policy is optimal. Then there is this drop, which I will talk about in a second, and in the trained-out model you see that we are essentially very, very close to 1. So we get very close to this hyper-optimal policy that you could never come up with in reality, just by learning from the
real-world data and feeding it in; there is no model behind it, I just feed in the state and the reward signal, that's it. Okay, so what is this dip? That is actually a very interesting example of what happens when the algorithm explores. Once in a while it says: hey, what would happen if I just don't order anything and simply incur the stock-out cost, maybe with a single order once in a while? You see this in the green line here. And then it figures out: maybe not such a good idea, let me go back to where I was before. And that is how this whole training setup works; it is highly nonlinear and non-convex, but the algorithm explores and finds new solutions to your problem. Good, so this just gives you an idea: if you deploy it the right way, you are essentially as good as what you would get if you derived these policies under the same assumptions. That is, of course, under the assumptions where (s,S) is optimal; if you did not have those assumptions, the comparison would make no sense. Good. Now you can take this whole thing much further. You take the same setup, but say you no longer have demand that satisfies the requirements for (s,S) optimality; let's say now I add seasonality: every second week, on Sunday, the demand suddenly goes up randomly. This is what you see in the orange spikes, and again it is the same setup here, with slightly different graphics because these were different runs. But what you see here is that in the beginning, again, the network essentially just orders some random number all the time, something like half of the demand, and you see the reward is pretty bad. And what happens over time is that after 50 episodes it has already gone up significantly; forget what the reward means in numbers, it is not so important. What is important, if you look at the picture, is what is happening here: the model already starts to get much closer to covering these peaks. And if we repeat this a bit further, you see that the orders already get some structure; they are not completely random anymore, and we start to somehow mimic the same spikes that we have in the seasonality. The peaks are still a bit wide, because the model has not really figured out yet when the spike is going to come, so it orders widely around the spike to make sure it has the inventory when it needs it; that is why they are so wide here. But what happens then, after a few more iterations, is that the model actually figures out that every second Sunday there is this extra demand, for whatever reason, without you ever telling the model, and it learns an order policy, which you can see here, that bumps up the inventory the day before the demand comes. Now, there was no lead time involved here, and so on and so forth, but you can do the same thing with any lead time. Okay, good, very nice. Now you say: well, this is great, but these are just academic examples. Let's see what happens if you take the same thing and now deploy it to something else. So we take the same model, literally the same model; I just stop the timer, and now I switch the environment. I took the environment that I had before, but now the costs change: say, for whatever reason, my holding cost goes down dramatically and the profit that I get for selling an item goes up significantly. Why does this matter? Because if you look here at the last picture, you see that, given how the inventory is handled, I never build up inventory early, which I would usually do if I expect a big spike. Why? Because my holding costs were really high here. Now let's see what happens if I change the holding cost so that it actually starts to make sense to build up inventory. So what happens is the following: suddenly the cost changes, and the model gets very confused. Why? Because it has a prediction of what the reward should be for an action, and now it gets a completely different reward, so its
expectation of what reality will be and what it actually gets back from the environment are very different. So the model essentially gets very confused. What happens is that it starts to explore new policies, and then, after again roughly 800 iterations, it essentially finds policies that build up inventory early and then sell it off, because now it is very cheap to build up inventory. Again, I have not made any changes: I run the same model, I feed the same data, nothing has changed except the environment, and the model is blind to the change; it only sees, in terms of the reward, that the cost has changed, but nothing else. Okay, good. Now, those of you who think about this for five minutes will say: that is all fair and square, but let's be realistic. You tell me you need 850 training episodes. In the real world, a training episode is a day, because you need a day's worth of inventory demand; that is 850 days. That is not feasible; it is never going to happen. By the time you have learned something with this amazing algorithm, reality has changed and it does not make any sense anymore. And that actually is one of the biggest challenges for most machine learning systems in the physical world, because getting the data is typically a slow process: it costs money to generate the data, and it is slow, so there is a limit on that resource. It is not like Google or Facebook, where you have a hundred billion pictures that you can use for training; it is not like this. So you have to come up with something, and the idea, of course, would be: build a simulator, build a good simulation, and use the simulation to pre-train the model. Now, if you think about this for a second as well, that is a great idea, but it is going to fail. Why? Because if you train in a simulated environment, you are over-optimizing to that environment; the moment I put you in the real world, what you have learned in the simulated environment is not going to fit
any more to the real world and you're actually performing bad and that's a very valid concern people are very much aware of this in the reinforcement learning but what it turns out what you can show is you can make reinforcement learning robust so you can robust defy this training process so that you can actually start to train in a simulated environment and then once you're done with free training in a simulated environment you can deploy it in the real world environment and that of course you can do much faster because in the simulated environment 850 episodes are nothing I do this in two minutes okay the reword it takes me three years almost three ok so that's why this is a very nice type of thing and it's a very simple set up just to give you an idea for those of you that know q-learning what it really comes down to in the end is that you add an uncertainty reward term in your training equation in a smart way and you can do the same thing for for TDE for q-learning for so these are the standard learning algorithms that you might know from the textbooks but you can also do this for newer network-based ones like the ones that I'm using here by robust defying the training process and then you actually can pre train in a simulated environment and then you can deploy in an environment that is very different not very for them it has to be similar but it's Nene don't need to get all the details right and then what happens is that the policy will be fine-tuned by real word deployment all right good so these were the example that I wanted to talk about let me finish with a very quick summary so I think if you think about using AI and machine learning and supply chain and logistics there's many many different ways of how we can use it and I would say that the journey really has just started and what I felt very important is when you look at new technologies there's a lot of new technologies out there but very often they don't scale to real-world applications and that 
makes them very exciting in the classroom or for research, but not really something that can drive innovation in the real world. I think that is the reason people are so excited about machine learning: these things tend to scale very well if done right, and that's the case here. All the things we've seen are implementable, or implemented, depending on the application.

What is also very interesting is that for most of these approaches, if done right, you can actually reuse the infrastructure that is already present. That's important: nobody is going to rip out their ERP and WMS just because you show up, it's not going to happen. Nobody is going to throw out their algorithms and say, we'll just reset, start fresh, throw out all the IT and do our own IT. It's not going to happen. So if you have something that can integrate nicely with the systems that are already in place, it has a much, much higher probability of actually being rolled out or implemented, and many of these machine learning approaches can do exactly that. Reinforcement learning, for example, assumes nothing: it just observes what is going on in the system and what it can do, and you tell it whether a decision was good or bad. It's pretty simple. Same with the dynamic routing: you take the routing that is already in place at the company, and you just add this machine learning layer around it, which is a Python script, 50 lines of Python code, and that works. Okay, good.

And why do you care? Because at the end of the day it's always about money, and if you can get something out fast, you get a faster return on investment. That's the nice thing. But as I said, I think we are really only at the beginning, and there are very exciting things out there. For example, generative adversarial networks will have many, many applications. The way you should think of GANs is as two networks playing against each other, so you can look at very interesting situations
where, for example, you look at a company and its competitors and you have them play it out, but you don't play them out with static agents, you play with agents that learn while doing it, because that's what you really care about. Whenever we do these competitive analyses, right, we come up with something the competition could do, but we don't know. That's different here, because you have two networks with essentially the same power playing against each other and figuring it out. And what we also see, of course, is much higher integration these days with data. Data is much more available, and if you can integrate your inference mechanism with the data stream, you can do real-time predictions. There are many, many applications where you can use this: you can cut down cycles, re-optimize in real time, and so on and so forth. All right, that's all I had to say. Thank you very much. [Applause]

[Audience question] Yeah, hey, so you mentioned that earning 1% of a thousand can be very big... So, that's a biased statement obviously, but let me try to unbias it a little bit. I do think the impact can be very high, because with the companies I'm working with, one of the biggest challenges I see is a significant increase in the speed with which things are happening, and traditional planning approaches don't apply anymore. The other thing I see is a significant increase in the diversity of the problems that have to be solved. In the old days, when I started, I built an optimization model for a problem and it was a very pure type of thing. The problems nowadays involve data, simulation, decision-making, and so on and so forth. So I think there is a huge place for leveraging these technologies in supply chain and logistics.

[Audience question about manufacturing] Yes, exactly, very good point. This applies the same way to manufacturing as well. I took the example that we worked on and implemented, but yeah, sure, you can use this the
same way for manufacturing-related performance assessment questions. Thanks.

[Audience question about batch ordering, stock-outs, lead times and seasonality] So, in the examples I showed here I kept things simple, but you can, I mean, you just change the action space. The action doesn't have to be ordering a single item; you can say you can only order in batches of 20, for example, or things like this. We do have stock-out costs in the example I showed you, lead time is a non-issue too, and to add seasonality: everything that you can map into the state space or the action space can be done.

[Audience question about scalability] It's not that much, honestly. I think you can run this for very, very large-scale problems. In fact, we ran the same algorithms to play computer games, and there your state space is huge: the state is the image that you observe on the screen, so there is a very, very large number of possible images. Sure, more costs more at the end of the day in terms of resources, but I do think it scales; we have implemented other similar things with the same technology, with much, much larger state and action spaces, and it works. To be very precise, it's not so much a question of whether it takes more time because of the size of the state space, since I define the size of the network; the question is rather how good the approximation is. So it's a trade-off on the network size. Here we used three layers with 16 nodes per layer, and we have separate action and value networks, but I could use many more complex network architectures, and that is what costs more time, not the size of the state space or the action space, because we force it into the same network size.

[Audience question] When we look at it, essentially one of the messages of this talk is that you just applied this very generic neural network, something of that style, and fed it a lot of data; in fact you can almost forget all the domain knowledge. Do you foresee that this will
be good for 90 percent of problems, or is it going to be a combination?

Okay, that's a very complicated question; let me answer it in a twofold way. I of course picked my examples to make that point: I picked examples where I can take out-of-the-box things, to show what you can do. If I told you I can get it done, but I had engineered it for five months, you would not be impressed, right? It works by taking something out of the box. But I think the real opportunity is to take this and actually combine it with domain knowledge, because then you get the domain knowledge for everything that you know ahead of time, which is static, and then you add intelligence on top of this to make it flexible in the areas that you cannot foresee. So you get the best of both worlds, and I think that is the way to go down the road. I mean, you don't get the training data in reality, and it works when you have a physical model that to some extent helps you map that and improve training. So I think the real opportunity is to combine both.

[Audience comment] We build large-scale, high-granularity simulations that are very close to real data, say one to three years of data from which we generate a bunch of things, but there is a lot of complexity going on. When you were talking about training, I think that training in such simulations can somehow alleviate a bit of the issues you mentioned about getting enough data, since you can spend as long as you like in the simulation. — Yes, and that's exactly what you do. You build a simulation that captures, I mean, it's a complex simulation, it's not just a toy, it's a very complex simulation that tries to get very close to what is happening in the physical processes, but still it's just a simulation. And then you do this form of robust training that is essentially blind to these very small changes, and that makes a big difference. That is, I think, also the way to go to generate the data.

Thank you all for coming here, and we will see you next time. This was, as always, very nice. [Applause]
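The robustified Q-learning idea in the talk is only described at a high level ("add an uncertainty reward term to your training equation in a smart way"). As a minimal sketch of the pre-train-in-simulation, fine-tune-in-reality workflow it describes, here is a toy tabular version on a single-item inventory problem. The environment, every cost parameter, and the count-based 1/sqrt(n) pessimism penalty are illustrative assumptions, not the speaker's actual formulation.

```python
import random
from collections import defaultdict

MAX_INV, MAX_ORDER = 10, 5  # assumed toy capacity and maximum order size

def make_env(holding_cost):
    """Toy single-item inventory environment: state is on-hand inventory,
    action is the order quantity, demand is uniform on 0..4."""
    def step(inv, order):
        demand = random.randint(0, 4)
        inv = min(inv + order, MAX_INV)          # receive the order, capped
        sold = min(inv, demand)
        unmet = demand - sold                    # lost sales
        inv -= sold
        # illustrative economics: revenue 5/unit, order cost 1/unit,
        # holding cost per unit left over, stock-out penalty 3/unit
        reward = 5 * sold - 1 * order - holding_cost * inv - 3 * unmet
        return inv, reward
    return step

def train(step, episodes, Q=None, counts=None, kappa=1.0):
    """Q-learning with a count-based uncertainty penalty: rarely visited
    state-action pairs get a pessimistic reward, so the policy learned in
    the simulator does not over-commit to quirks of that environment."""
    Q = Q if Q is not None else defaultdict(float)
    counts = counts if counts is not None else defaultdict(int)
    alpha, gamma, eps = 0.1, 0.95, 0.1
    actions = range(MAX_ORDER + 1)
    for _ in range(episodes):
        inv = 0
        for _ in range(30):                      # 30 ordering decisions per episode
            if random.random() < eps:
                a = random.randint(0, MAX_ORDER)
            else:
                a = max(actions, key=lambda x: Q[(inv, x)])
            nxt, r = step(inv, a)
            counts[(inv, a)] += 1
            r -= kappa / counts[(inv, a)] ** 0.5  # the uncertainty penalty term
            best_next = max(Q[(nxt, x)] for x in actions)
            Q[(inv, a)] += alpha * (r + gamma * best_next - Q[(inv, a)])
            inv = nxt
    return Q, counts

random.seed(0)
# Pre-train cheaply in the simulator, then fine-tune in a "real world"
# whose holding cost differs from what the simulator assumed.
Q, counts = train(make_env(holding_cost=0.5), episodes=850)
Q, counts = train(make_env(holding_cost=1.5), episodes=50, Q=Q, counts=counts)
policy = {inv: max(range(MAX_ORDER + 1), key=lambda a: Q[(inv, a)])
          for inv in range(MAX_INV + 1)}
print(policy)
```

The penalty shrinks as a state-action pair is visited more often, so the policy stays conservative about rarely seen situations during the cheap simulated pre-training (850 episodes in seconds) and can then be fine-tuned with only a handful of episodes under the changed cost structure.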
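The talk also mentions wrapping a roughly 50-line Python "learning layer" around whatever routing is already in place, without giving details. One way such a layer could look, purely as a hedged sketch, is an epsilon-greedy bandit that chooses between the incumbent routing and an alternative heuristic and learns from realized route cost; both heuristics and all parameters below are invented for illustration.

```python
import math
import random

def route_cost(points, order):
    """Total travel distance of visiting points in the given order, starting from a depot at (0, 0)."""
    prev, total = (0.0, 0.0), 0.0
    for i in order:
        total += math.dist(prev, points[i])
        prev = points[i]
    return total

def incumbent(points):
    """Stand-in for the routing already deployed: visit stops in request order."""
    return list(range(len(points)))

def nearest_neighbor(points):
    """Alternative heuristic: always drive to the closest unvisited stop."""
    todo, prev, order = set(range(len(points))), (0.0, 0.0), []
    while todo:
        nxt = min(todo, key=lambda i: math.dist(prev, points[i]))
        order.append(nxt)
        prev = points[nxt]
        todo.remove(nxt)
    return order

# Epsilon-greedy learning layer around the existing routing: try each
# heuristic, keep a running average of realized cost, favour the cheaper one.
heuristics = [incumbent, nearest_neighbor]
avg_cost, n_used = [0.0, 0.0], [0, 0]
random.seed(1)
for day in range(200):
    stops = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(8)]
    if random.random() < 0.1 or 0 in n_used:
        arm = random.randrange(2)                      # explore
    else:
        arm = min((0, 1), key=lambda a: avg_cost[a])   # exploit the cheaper arm
    cost = route_cost(stops, heuristics[arm](stops))
    n_used[arm] += 1
    avg_cost[arm] += (cost - avg_cost[arm]) / n_used[arm]  # incremental mean
best = min((0, 1), key=lambda a: avg_cost[a])
print("preferred heuristic:", heuristics[best].__name__)
```

The point of the design is the one made in the talk: nothing already in place gets replaced. The layer only observes what the existing routing does, scores it against alternatives on realized cost, and gradually shifts traffic toward whichever performs better.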
Info
Channel: GTSCL
Views: 45,256
Keywords: machine learning, supply Chain, logistics, optimization, Georgia Tech
Id: pzzFvhJ6-LI
Length: 74min 5sec (4445 seconds)
Published: Mon Mar 05 2018