Anna Nicanorova: Optimizing Life Everyday Problems Solved with Linear Programing in Python

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi guys my name is Anna I lead rapid innovation and prototype and yet an elect which is advertising technology company I know that I'm the last doc for a day so I'm going to try to keep it light short and entertaining and informative infirmity in a way that we're going to be solving important life problems things like worry to go on vacation or how to construct your reading list important things because we're all data people here we run a lot of algorithms every day the question is can we run an algorithm to help us solve everyday problems and about one of the such algorithms I'm going to talk today this is George dancing in 1947 he invents an algorithm called simplex it's named top ten algorithms of the 20th century and this algorithm is used to solve linear problems linear problems what the algorithm basically does it goes around a feasible region constrained by different linear functions and it finds the first intersection the optimal point between your objective function and feasible region linear problems are used widely across industries they're used in routing and logistics financial planning how to allocate money across your portfolio in manufacturing how to figure out the perfect drilling on the chip and in product mix and linear programs have been available in Excel for quite some time in fact it's just an add-on on your machine however if you're on a Mac like me good luck using solver on Excel if you have a problem with more than 200 decision variables Excel is not an option and if you want to integrate it into production with some interesting web application Excel is in our way together this is PI data so obviously we will be solving this problem with Python so during the next 10-15 minutes I'm going to give you a crash course into linear programming we're going to look at the implementation in Python and then we're going to dive in into important examples of picking destinations for you vacation so to solve a linear problem you need to have three ingredients you need to have an objective function what you want to achieve we want to maximize profit we want to minimize time what do we want to do we need to have decision variables things that we will be modifying in order to achieve our objective and we live in the linear in the real world so we have a lot of constraints we never have enough time enough money other things so to give you a small example of how linear problem was going to look like if we are to pick up what we're going to eat for dinner and we want to satisfy a minimum intake of protein of four units per day we have two choices steak of peanut butter and we want to have the minimum cost of dinner this is how it's going to look like in linear program we'll have x1 for peanut butter x2 for steak our objective function of cost and constraints of satisfying minimum protein intake this is how we're going to solve for it graphically if our y-axis is steak and x axis is peanut butter these are all possible options the moment you bring constraints you limit your feasible region then you start moving your objective function and the first time your linear objective function is going to cross feasible region voila this is a solution so our optimal dinner solution is going to be two steaks and $6 to translate this problem in Python you need to have two components you need to have a solver and you need to have a modeling framework as I've mentioned linear programs is a very well researched area in operation research so you have a lot of solvers out there open source and commercial solvers so we'd highly discourage you from coding your own solver a different thing is the modeling framework this is where you will bring all of your equations together there are three main packages that people use around stipe I buy almond pulp by Omo doesn't allow you to write your linear problem in Python it requires you to write it in a mathematical language and then it compiles it for you so if you want to do a very pythonic implementation with a lot dynamic variables it's not a good option SAP I will require you to translate all of your data into numpy arrays so if you want to have very complicated mattresses it's also not a great solution my personal choice is pulp it had it's open source it has very good licensing very easy installation but most important feature its interoperability that means that you can very easily switch from one solver to another if you acquire a very expensive solver moving forward you can very easily change it so that's how the problem is going to look in Pulp first we are going to define a pulp object which is LP problem we are going to append to it our decision variables and then our objective and constraints to solve the problem it's just going to be one line Rob that solve and the solution is going to appear like this so enough of theory let's look how solver actually how linear problems actually solved if you want to follow along I put two ipython notebooks on github so weird you go on vacation there are so many options out there you have so many deal websites but the problem is that we want to keep our costs down we also have a limited number of days that we can go on vacation so for this example I scraped a data set from a website which is called the climb com they offer you multiple packages from mountain biking in Peru to climbing Basecamp of Everest the data set looks like this we're going to have destinations duration of the trip the cost of the trip and a short description of what our vacation is going to be if we are plotting the data side the packages vary from somewhere from two days to up to two weeks and the cost will be from $200 to $3500 coding this problem in pulp we're going to first create our LP problem we're going to specify if it's a minimization problem or a maximization error case we keeping budget down and the data set is loaded as a panda's data frame so we're going to look through the data set and for every row we are going to create a decision variable and append it into decision variable array our objective function will be multiplying every decision variable by the cost of the trip and that's what we will be minimizing and then constraints whatever we pick we want to have all of our decisions to be equal to ten days and we can go overboard if you have more than 10 to 20 decision variables your objective functions and your constraints become extremely big so bob has this very nice function of exploiting your problem into a file with that LP extension where you can get a very good overview of how your function is going to look like so this is our problem defined every decision variable x cost subjects to constrain what every we pecks should be equal to 10 and a list of our decision variables we do prop that solve and assert that we reached the optimal decision making and we want to print out what the final cost is so our vacation for 10 days is going to cost us only $762 which is really not bad and then no matter which modeling framework you're using they're going to give you this very cumbersome list and you have to scroll and figure out ok what my decisions 24 was so I just wrote a small parser that is going to loop again three decision variables and put it very neatly into a data frame so the final decision where we're going on vacation we're going to go to Oregon for about six days and to Maine under $762 for 10 days I think it's really not bad another problem that I want to show you is how to pick reading lists and it sorry yeah vacations that together it's cackling yes that's a problem and you can do a lot of variations on it in case if you have some destination in mind for example but you want to keep the cost down you can put it as an ax constraint so the good part about cleaning a problem is it's really up to you and up to your creativity as long as you follow the rules and construct equations it can be a different problem every time exactly if you want main on destinations for sure and but just keep my cost down it can be one of your constraints for reading list problem we I don't know about you guys but every year promise to myself that I'm going to read more so with my friends we coded a problem where we maximize for the number of books the problem is that we all have limited amount of days that we can dedicate to reading books our reading speed is a constraint and and to go through this problem I scraped Goodreads that provides a list of New York Times bestsellers it gives you a very neat data set where you get the author of the book the name and the number of pages again it's a panda's data frame if we are going to plot it most of the books on average have about 300 pages and ratings are pretty much the same for a New York Times bestsellers in a similar fashion we're going to first define and an object we are going to specify that it's an LP maximize problem again we're going to look through the decision make through every row and assign every book as a decision variable I think I didn't mention but pulp allows you to specify the lower bound and upper bound and can it gives you an option to transform your linear problem into integer problem that means that every decision variable is going to be binary then we're going to parse through the box and we want to whenever a number of books that we want to pick we want to achieve the maximum number and our constraint is that we only have five hours per day to read and we can read it most 60 pages per hour that at least my speed if you have another one feel free to modify and whatever we can pick we can't really exceed this limit so after we defines a problem we are going to export it as an LP file it's going to give us a neat overview of how our linear problem looks like and after running this and parsing it we're going to get to a very nice data set where we will get one for the books that get scheduled so it gives us about 46 books which my opinion is a great gulf rise next year this is all the examples that I actually have for you guys the promise to keep it short I know that linear problems is probably not as sexy as deep learning but the most interesting part is that most simple algorithms sometimes the most efficient ones we use at my company linear problems to optimize across brands media assets it has more than billion rows and linear problems work magic for us thank you do you guys have any questions or can we go inside drinking hi yes it's actually involved it runs under zero point 17 seconds for about fourteen million rows of media across 30 brands so you get about a matrix with 30 brands and 14 million media assets for them and 0.17 seconds done ah we are we're using GU LP as the sole reasons we used and its death so it's fully populated it really depends I think your performance is going to depend on how many constraints you have if you have a feasible region big enough it figures it out pretty fast if you have a lot of constraints if you're saying okay those brands cannot play with this or if we're looking at the destinations if you will give it a lot of constraints and it starts it takes more time to think about it so funny the chest I talked about this in Portland and after this a couple of people coded with linear programming brewery visits in Portland so I was a judge a selling traveling salesman problem where they took into account the distance and the times that it takes and if you have two days visiting Portland how many breweries can you actually visit which are close enough a lot of people use it on fine examples I think once you start bringing the distance into it and closing opening times you can use it for scheduling how many museums you want to visit when you're traveling or covering all of the landmarks grocery store lists can be a very useful problem solve between your programming it's that's where you do that cert if you assert for your solution it's just one line here when you say did you achieve that optimal result is just going to tell you no no solution has been found yes exactly it's going to speed it admittedly if if there's a junior old times that it takes to solve your problems pretty fast it's going to tell you immediately I have not found the solution we use this long like a staffing try to see if we can staff customer service using linear optimization we found that there are so many constraints that it would take 8 to 10 hours and just to run so do you have any thoughts on just generally the approach to how do you speed up in linear optimization problem did you guys use Python to do this or you use Python Oh interesting I think my first thing would be how can you restructure a data set if I can instead of putting into constraints if I can redefine an objective function it will be it will save so much more time so instead of putting this in to constrain try already pre calculate things and just feed it a different data set so if you can solve for constraints beforehand I know this all is a scheduling problem we used it for this as well but I used Excel for that a long time ago cool thanks guys have a great day
Info
Channel: PyData
Views: 40,658
Rating: 4.9352226 out of 5
Keywords:
Id: 7yZ5xxdkTb8
Channel Id: undefined
Length: 16min 26sec (986 seconds)
Published: Fri Dec 04 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.