How to Forecast Sales Using Prophet and Python in 15 Lines of Code

Captions
Cool, thank you so much. Just a heads up, it doesn't look like I can get video on the live stream, but that's fine, I'm still here for all of you. For those of you that don't know me, my name is Nicholas Renotte. I'm a data science specialist at IBM, as Nathan was mentioning, and today we're going to be going through sales forecasting with Prophet. If you've got any questions at all, absolutely anything, do mention it in the Q&A or on YouTube chat; I'll be monitoring it as we go. You can see that we've got a bunch of people flowing in, welcome. In terms of what we're going to be doing today, we're going to step through (oops, we've skipped forward) some basics of sales forecasting. In terms of our game plan, we're going to start out by loading some data into a notebook; we're going to be using a Jupyter notebook for this. Then we're going to train a Prophet time series model. Prophet is a library developed by the in-house data science team at Facebook, and it's ridiculously good for time series forecasting. This could be forecasting weather, which is a little bit tricky to predict, but it might also be used for forecasting sales, or anything which really has a time-series-based trend. Once we've trained that model, we're going to make a bunch of predictions and evaluate our performance, so we'll be able to perform a training run and test this out in real time. I've used this in a bunch of places and, funnily enough, I was talking to people on YouTube about this: it's one of the first Python libraries that I started to learn when I was kicking off my data science journey, so it feels like I'm coming full circle. In terms of how this is going to work, let's go to our next slide. It doesn't seem to want to go to the next one, but that's fine. So what we're going to
do is load some data into our Jupyter notebook, then train our model with the Facebook Prophet package, and last but not least make a forecast, so we'll evaluate our time series trend and make a bunch of predictions. Again, if you've got any questions at all throughout this, by all means mention it in the Q&A or just hit me up on the YouTube live chat; I'm more than happy to answer any of your questions. So let's take a look at our data to begin with. I've got a CSV document here which has a bunch of information in it. Specifically I've got three columns, column A, column B and column C. In my first column I've got a column called date. It's really important whenever you're working with time series data that you've at least got some form of date or timestamp column, because this is what you need in order to forecast the trend and generate a time series forecast. Then I've got a store and product combination. You might get data sets where store and product are separated into multiple columns; in this particular case we are going to train an individual model per store and product combination. You can see here that I've managed to get my hands on some elusive Tesla model sales data. This is obviously made up, but what we've got is a combination of our store, so in this case this store is LA, and a product, in this case a Tesla Model X. All right, it looks like we've got a question: do the dates need to be continuous, or can there be gaps? Great question, Bill. Ideally you want a continuous data set, but it can have gaps. Prophet will be able to impute certain amounts of data, so it's key that you don't have giant gaps, but you can have some missing data and Prophet can impute that. Just a key thing to note. Another
question: is this a GitHub repository? It sure is, and I will share the link a little bit later, so you'll be able to pick up all this code and take it away. I'll include the data as well. Okay, so we've got our store and product combination, and then we've got our values. You can see here that we've got a bunch of data, and I think I've got data for around three years. It starts out in 2018, and if we look at one time series, let's filter this. All right, you can see that we've got data for three different product/store combos: LA Tesla Model S, LA Tesla Model X and San Fran Tesla Model S. If we take a look at one time series, you can see we're starting at 2018 and we can go all the way to the end. Where are we ending up? It looks like we've got data up until around the end of 2020, so we should be okay there. Now I'm going to release that filter. Okay, so we've got a date column, a store/product column and a value column. Keep in mind this CSV, I'm just going to move this out of my way, is called dataset.csv. Let me open up my Data Science Dojo folder. This is what our folder looks like: I've got one file called dataset.csv, and then I've got our Jupyter notebook. Right, so now it's time to get into a little coding. Great question, Gustavo: is there a specific format for the date column? Yes, and what you'll see in a sec is that we're going to pre-process our data to make sure that it is in a datetime format, so that's going to help us out a little bit. Ideally you just want to make sure that you're able to extract the year, the month and the day. I'll show you how to do this in a second, so you'll see how you need to process that.
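On the question about gaps, one quick way to see whether a series has missing days is to compare its dates against a full daily calendar. This is only a minimal sketch: the column names `Date`, `Store/Product` and `Value`, the store label and the sample values are assumptions for illustration, not the contents of the real dataset.csv, and `missing_dates` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical rows in the three-column layout described above.
# Column names, the store label and the values are assumptions, not the real file.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-05"]),
    "Store/Product": ["Los_Angeles-Tesla_Model_X"] * 4,
    "Value": [100, 110, 120, 130],
})

def missing_dates(series_df, date_col="Date"):
    """Return the calendar days between the first and last date that have no row."""
    expected = pd.date_range(series_df[date_col].min(),
                             series_df[date_col].max(), freq="D")
    return expected.difference(pd.DatetimeIndex(series_df[date_col]))

gaps = missing_dates(df)
print(gaps)  # the days with no observation for this series
```

For a file with several store/product combinations, the check would be run once per combination, since each one is its own time series.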
Okay, so this is what we're going to be working through today. First up we're installing our dependencies, then in step 2 we're going to load up some data. Then in step 3, and this is for you, Gustavo, we're going to apply some data pre-processing and convert a date column to a datetime format with our pandas data frame. Then we're going to create a time series model, we'll evaluate it, so we'll make some predictions, and then I've got some bonus code for you as well. Keep in mind that whenever you're creating time series models, what you tend to do is create one model per time series. We've got multi-time-series data and we've got univariate data, so we've only got one column that we want to predict and one column that's going to be used as our input, but we've got different time series that we need to deal with. The way that we normally deal with this is to create a different model per time series, so we're going to create one for each of our product and store combos. I'll show you that right at the end. Okay, now let's quickly take a look at the library that we're going to be using. We're going to be using Prophet. There's a whole bunch of information in here and it's really well documented, so if you want to see how this has been built up, or if you've got specific things that you need to look at, there's a whole bunch of information there. And whose question was this? MW, you're asking if this is a GitHub repo. It sure is. You can go to my GitHub, so github.com/nicknochnack, forward slash, what have I named this, "multi-store product forecasting with Prophet". I was naming that this morning, probably a weird name, but that's fine. You've got all of the data there, and that includes the Jupyter notebook and the data set, so you're able to
pick up all of that and leverage it if you want. Cool. So what we're going to do now is kick this thing off. First up we need to import a few dependencies. We've got two key dependencies: we need pandas and we also need Prophet. There is one additional dependency that you can work with, and that is Plotly, which allows you to have interactive visualizations. I haven't got that set up in this particular case, but that's fine, it will still work. So let's go ahead and import our dependencies. Those are our initial dependencies now imported. Sorry, I missed a question: is the value sales? Correct, the value is sales. I probably should have called that out: this value is our sales value. Okay, so those are our two dependencies imported. Let's zoom in on this and make it a bit bigger. The first line that we've written is import pandas as pd, so this is going to import pandas. If you've ever worked with much tabular data, pandas is a really popular tabular data processing library available in Python, and it basically allows you to bring in any sort of tabular data. Say you've got a CSV, say you've got Excel, say you've got a JSON string or JSON object that you want to pre-process into a table-like object: pandas is the library for that. The second line that we've written is from fbprophet import Prophet. This is going to allow us to import the Prophet library into our Jupyter notebook, so you'll be able to leverage that when it comes to forecasting. If we add a comment there: this is going to be the Prophet modeling library. Cool, so reasonably straightforward. We've only got two key dependencies that we need to bring in, and on that note, step one is now done: we've imported our dependencies. Now the
next thing that we need to do, under step two, is bring in our data. So far we don't have any data inside our notebook; we've just imported our dependencies. What we're going to do is use pandas, so pd, to read in our CSV. Actually, let's take a look at the read_csv function first. If I type in read_csv followed by a question mark, you can see there's a whole bunch of options that we can leverage when bringing data in using the read_csv function, but basically its core function is to read a comma-separated values file into a data frame. That's exactly what we're going to do, and if you cast your minds back, our data set is in CSV format, so that should be good to go. So let's do this. We're going to name our data frame df and bring in our data set. I'm just going to type in the full path, so dataset.csv, and that's pretty much it for bringing in our data set. All we need to do is pass through the full path to our CSV document and that's going to come into our Jupyter notebook. So I've written df = pd.read_csv and then passed through the full path to the data set that we want to bring in. Now if I type in df.head(), you can see that this gives us the first five rows of data inside our CSV data set, and this looks really similar to what we were seeing when we opened it up in Excel. Now, who was it that asked this? I think the comments have gone now, but Gustavo, you were asking whether our columns need to be in a specific format. They definitely do, and right now it's not in the right format. If I type in df.dtypes, you can see that our date right now is
just an object, which means it's being treated as a string. So as of right now we don't have any of the richness of having a date column inside our data frame; it's just being stored as a string. We want to go ahead and convert this into a datetime format, and that is exactly the first data pre-processing step that we're going to do in step three. I'll pause there. Are there any additional questions? It doesn't look like we've got any that are unanswered. Okay cool, we can keep going. So let's do a little data pre-processing. The main form of pre-processing is that we are going to convert our date column into a datetime format. We can do this using the to_datetime method from pandas. If I type in pd.to_datetime, basically the core thing is that this is going to convert a particular argument or column into a datetime format within pandas, so it's really straightforward to get this done. Looks like we've got a question on YouTube chat: will the data set be shared? It sure will, it's available via GitHub, so you'll be able to pick it up there. So what were we doing? We're going to use the to_datetime function to convert this from a string to a datetime format. What we also need to do is filter out one store and product, because remember the Prophet package is only going to work with one time series at a time, so we need to filter down and single out one specific store and product. One of the nuances of working with Prophet is that you need the date column to be called ds and you need the sales, or whatever your time series value column is, to be called y, so we're also going to do that. So basically there are a few different steps: we're going to convert our date column into a datetime object, we're going to filter on the store and drop this column, and we're then going
to rename our columns as well, so this will become ds and this will become y. All right, let's do the first thing and convert our date column. I'm going to overwrite our existing date column. Remember, when we're working with pandas we can select a single column just by identifying the column name, so typing df and then passing through the index date allows us to select it. Then I can apply some processing on that, or overwrite it effectively. We're going to use pd.to_datetime and pass through the column that we want to apply that datetime formatting on. If I type in df.date, you can see that that has now pre-processed our column. You can't see anything yet because I haven't shown you, but if I type in df.dtypes, this has now converted our date column into a datetime object, so we can begin to work with that. Let's make sure that we've still got all of our data. If we type in df, you can see that our dates still maintain some semblance of structure. You can see down the bottom the last value is now reading as 2020-12-16. I wish I'd kept it the same up here so I could have shown you, but the first row is still valid: you can see that the first value over here is the 1st of January 2018, and that's exactly the date format that we've got over here. In this particular case it's going to be year, day, month, so the 2nd of January 2018, the 3rd of January 2018.
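One thing worth checking at this step is how ambiguous dates such as 02/01/2018 get parsed. The following is a minimal sketch, assuming the CSV stores dates as day/month/year text; the inline CSV, its column names and its values are stand-ins, not the real dataset.csv. By default pandas reads such strings month-first, so stating the format explicitly avoids silently swapping day and month.

```python
import io
import pandas as pd

# Stand-in for dataset.csv. Column names, values and the day/month/year
# layout are assumptions for illustration.
csv_text = """Date,Store/Product,Value
01/01/2018,Los_Angeles-Tesla_Model_X,100
02/01/2018,Los_Angeles-Tesla_Model_X,110
03/01/2018,Los_Angeles-Tesla_Model_X,120
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # Date is still plain text at this point

# Default parsing reads 02/01/2018 month-first, i.e. as 1 February.
default_parse = pd.to_datetime(df["Date"])

# An explicit format reads it as 2 January, matching a day-first file.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
print(df.dtypes)  # Date is now a datetime column
print(df["Date"].dt.day.tolist())  # day component of each parsed date
```

If consecutive rows of a daily file appear to jump a month at a time after conversion, that is usually a sign the day and month were read the wrong way round.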
So on and so forth, so we are good to go there. Now the next thing that we want to do is isolate our data for one specific store and product combo, because remember, I was saying we want to isolate a single store/product combo, since that's what we're going to generate our forecast on. So let's do that. We'll apply some filtering and take a copy of the specific part of our data frame that we need. We're going to call it frame. We're going to create a filter on our store and product combo, and let's just say for now we're going to grab this store/product combo, so LA and the Tesla Model X. We don't have the Roadster in stock yet. So what are we doing there? I've written frame equals and then df. Let me show you this: this is akin to just filtering. I've written df, and then inside of here is my filter, so I'm basically saying on the df Store/Product column I want to filter and only bring back rows which meet the condition of being Los_Angeles-Tesla_Model_X. This is effectively going to filter our data frame. With our filter on we've got 1080 rows; if I take the filter off, we've got 3240 rows, so you can see that a filter definitely has been applied. And if we take a look at the unique values on store again, you can see that we've only got a single store/product combination there, so we are good to go in that respect. Now what we want to do is save this as a new variable. Rather than having just an in-memory capture, we want to store this as a new variable so we can begin to work with it. So we're going to call
this slice. Actually, let's not call it slice, because that's reserved; we're going to call it frame. Our frame is effectively going to be a copy, or a slice, of our entire data frame. To make sure that we're not just applying this in place, we're going to create a copy of this particular data frame by passing through .copy() over here. Now if we take a look at our data frame, this is our data, so we are good to go: we've at least got a slice of data that we can begin to work with. Let's take a look at what we've done so far on our data pre-processing. We've converted our date to a datetime object, and we've taken a slice of our data and made a copy of that. The next thing that we want to do is drop this column, because we don't need it anymore. It doesn't add any value, because we've only got one store and product combo. We can use the drop method for that, and I'm going to say which column I want to drop, so Store/Product. Because we're dropping a column, we need to set the axis: by setting axis=1 we're effectively saying drop a column, not a row. And we're going to apply this in place, and that should perform our drop. Let's take a look at what we wrote there. I wrote frame.drop and specified the column that I want to drop, so that's our first argument, and then I've specified a couple of additional keyword arguments: axis=1, because we want to drop a column, not a row, and inplace=True, because we want to apply it on this data frame rather than create another copy of it. If we run this now we can't see anything yet, but if we take a look at our frame you can see that we have in fact dropped that
column, so we're now good to go. All right, so we've got our date and our value. We've applied three different sets of data pre-processing: we've created our datetime column, we've created a copy of our frame, and we have also dropped our store/product combo. Let's take a quick look at our questions. "Do I need to install Facebook Prophet? I seem to run into an error when I run the first line. If you install Facebook Prophet, do I have to install PyStan?" Yep, you've got to install PyStan. If you're running on a Windows machine, take a look at the MinGW compiler; hopefully that helps it go through correctly. Joe asks: did the date conversion happen correctly? The original dates were daily, but now they look like they're monthly. I checked this previously and it has happened correctly. In this particular data frame I had it as day, month, year; when we apply this data pre-processing it goes to year, month, day. So again it should be fine; you'll see the impact once we produce the forecast. Cool. All right, so what do we have to do now? The next thing is to rename our columns. Right now they're specified as date and value, but we need to rename them. So let's perform this last data pre-processing step: we're going to set frame.columns equal to ds and y, and that is our last data pre-processing step done. If I delete this, you can see that we've now got our date column and our sales column, so we are effectively good to go with our forecasting. Okay, let's quickly recap. In terms of our data pre-processing, we've converted our date column to a datetime format, we've then performed a filter on our data frame to
single out one specific time series, we've dropped our store and product column to make sure that we've only got one time series that we're working with, and last but not least we renamed our columns. This is our output: we've taken this over here and transformed it into this, which is now ready for us to pass to our time series model. So let's go ahead and do this. Right now we don't have anything under step 4, but we're ready to produce our model. First up, we're going to create an instance of the Prophet class. Cool, I'm just going to grab a sip of coffee. Okay, let's take a look at what we wrote there. I've written m, and m is going to represent our model in this case, and we've set that equal to the Prophet class. We've also passed through one argument: we've set interval_width to 0.95, so 95 percent. You can play around with this if you want to, but in this case this is sort of the baseline model setting that you might choose to tune. The next thing that we want to do is train our model. Pretty similar to how you might train a scikit-learn model or a TensorFlow model, it's going to be fit and predict. We're going to store our training results inside a variable called model, or this could be called training_run, and we are going to run m.fit on our data frame, which remember we named frame over here. If we run this now, that's our model trained. You can see that it went reasonably quickly, so that's pretty much it in terms of training. I've written training dot, or rather training
underscore run, and I've set that equal to m.fit, and then I've passed through our data frame. No, I'm not doing a train/test split over here, Sean. We're going to evaluate on basically everything that we've got and then push it out. You could split it out, but in this case we're going to produce a future data frame and push this out. Again, you could make this run significantly longer if you wanted to, but in this case we're just going to train on all of our data. Okay, the next thing that we want to do is make some predictions. First up, what we need to do is make a future data frame. Question: what is Prophet doing in the background, is it using a neural network? No, this is using specific time series algorithms that have been developed by the Facebook team. Again, there's a whole bunch of information on this if you want to see how it's been built. And who was it that mentioned the dates, Joe? You can see that this is the date format that Prophet was expecting, so YYYY-MM-DD, which is exactly what I had, and then it went and transformed it. But if you do pick up something which is not quite right, do mention it in the comments below and I'll update the code. In this particular case we're going to power forward. The next thing that we want to do is evaluate our model. First up, we're going to make a future data frame and store that inside a variable called future, then we'll make a prediction, and then we'll push this data out. So let's do that. In order to make our future data frame I'm going
to create a variable called future and set that equal to m, which in this case is our model from up here, and we're going to use the make_future_dataframe method to do that. We can then set how many periods we want to generate, so I'm going to set periods equal to, I don't know, let's set it to 200. This is effectively going to forecast 200 days forward. Then I can specify my frequency, so I'm going to set that to daily, and that's our future data frame now created. If I type in future, you can see we've got all of our future dates. Let's look at our tail: that's out to 2021. So if we take a look at our tail, you can see the dates that are appended to the end, and if we take a look at our head, this includes our initial training data dates. So you can see we go from 2018 all the way out to 2021.
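The steps walked through so far (filter one store/product series, copy it, drop the key column, rename to ds and y, fit the model, build the future frame) can be sketched as two small functions. This is a sketch under assumptions rather than the notebook's exact code: the column names `Date`, `Store/Product` and `Value` and the helper names `prepare_series` and `fit_and_extend` are hypothetical. The import is tried under both package names, since the library is published as `prophet` in current releases and as `fbprophet` in older ones.

```python
import pandas as pd

def prepare_series(df, combo, key_col="Store/Product",
                   date_col="Date", value_col="Value"):
    """Filter one store/product series and rename its columns to ds / y.

    Column names are assumptions about the CSV layout.
    """
    frame = df[df[key_col] == combo].copy()   # copy, so df itself is untouched
    frame = frame.drop(columns=[key_col])     # same effect as drop(key_col, axis=1)
    frame = frame.rename(columns={date_col: "ds", value_col: "y"})
    return frame[["ds", "y"]]

def fit_and_extend(frame, periods=200):
    """Fit Prophet on every row of frame, then build history plus future dates."""
    try:
        from prophet import Prophet       # current package name
    except ImportError:
        from fbprophet import Prophet     # older name, as used in the walkthrough
    m = Prophet(interval_width=0.95)
    m.fit(frame)
    future = m.make_future_dataframe(periods=periods, freq="D")
    return m, future

# Usage, assuming df was loaded from the CSV and its dates already converted:
# frame = prepare_series(df, "Los_Angeles-Tesla_Model_X")
# m, future = fit_and_extend(frame, periods=200)
```

Returning a new frame instead of using inplace=True keeps the original data frame available for the other store/product combinations.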
Now you can see that if you wanted to forecast for longer periods, you could push this out to 400, and you can see that we're now generating future periods which are further out. Again, you can push this out a whole heap further if you wanted to. In terms of generating the forecast, remember to train it's just m.fit; to forecast forward it's just m.predict. So we can create a new variable called forecast to hold our forecast, set that to m.predict, and pass in our future data frame to generate that forecast. And that is our forecast now generated. If I take a look at our tail, so forecast.tail(), you can see that we now have our future forecast pushing out there. We've got a whole bunch of information here. We've got ds, which is our date column, and most importantly we've got our yhat column, but you've also got a whole bunch of additional information. You're able to take a look at the trend, the upper and lower bounds, and the specific trend components: trend_lower, trend_upper, your additive terms, your weekly components (weekly, weekly_lower, weekly_upper), yearly, yearly_lower, yearly_upper, and you've also got multiplicative terms. But the most important column is going to be this yhat column, which is effectively what represents your forecast in this case. Now if I take a look at our head values, and scroll out, we've got our baseline value. It doesn't look like we've got that, so let's bring it in with df.head or frame.head, so you've got the original frame that we actually trained on.
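Rather than comparing the two tables by eye, the forecast can be joined back onto the training frame by date. The sketch below is hedged: `compare_to_actuals` is a hypothetical helper, and the two small data frames are made-up stand-ins for `frame` and for the output of `m.predict(future)`, using only the ds, yhat, yhat_lower and yhat_upper columns named above.

```python
import pandas as pd

def compare_to_actuals(frame, forecast):
    """Join actual y onto the forecast by date; future rows have no actual value."""
    cols = ["ds", "yhat", "yhat_lower", "yhat_upper"]
    merged = forecast[cols].merge(frame[["ds", "y"]], on="ds", how="left")
    merged["error"] = merged["y"] - merged["yhat"]
    return merged

# Toy stand-ins for frame and for m.predict(future); all values are made up.
frame = pd.DataFrame({
    "ds": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "y": [100.0, 110.0],
})
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "yhat": [98.0, 113.0, 120.0],
    "yhat_lower": [90.0, 105.0, 110.0],
    "yhat_upper": [106.0, 121.0, 130.0],
})

result = compare_to_actuals(frame, forecast)
# In-sample mean squared error over the dates that have an actual value.
in_sample_mse = (result["error"].dropna() ** 2).mean()
print(result)
print(in_sample_mse)
```

Because this error is measured on the same rows the model was trained on, it tends to be optimistic; holding back the last part of the series as a test set would give a fairer figure.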
So in this particular case our y value was 2926 and our forecast was saying around 2802, so not perfect, but when we visualize this you'll be able to see the trend behind it. So let's go ahead and visualize it. If I type in plot, let's save this, so plot1 equals m.plot. Let's take a look at what comments we've got. "Why did I use 0.95 as our interval_width?" That's just the confidence interval that we want to use, or how confident we want our model to be. "What type of model is being used in the back end?" I haven't taken a look at what models are used in the back end, Daniel; I've just used the package. I can get you some information on that if you want more detail. What else do we have, have I missed anything? Ah, good to know. Sorry, I was just reading the comments: "Prophet uses an autoregressive model but uses Fourier transform on the training data to estimate the best parameters." Thanks, Mandy. You can also take a look at another Prophet-like package called NeuralProphet, and the cool thing about that is it uses an AR-Net neural network behind the scenes, so it's based on this but it's using deep learning in the background. "Can you show the data and prediction line on a graph before the future data frames?" Yeah, this is exactly what I'm going to do now; I'm going to show you what this looks like. So what do we have? We've got our plot, and all we need to do is pass through our forecast, and there you go. You can now begin to see how our model is generating a prediction. You can see that we've got a large number of forecasts sitting towards the bottom here, it's now shooting up, and keep in
mind this is our training data, so all the way up to here is our training data. Cool, and you can see that it's mimicking this trend, so a large majority of our forecasts are towards the upper bound. This blue line is our forecast; that's Prophet doing that forecast. You can see it's starting to mimic the trend that it's seeing in the actual data there. Cool, any questions on that? We've got a question there: "Have you tried NeuralProphet, is it faster in model fitting?" It's actually a bit slower, Medi, so that's just a key thing to note, but it's still pretty cool as well. It's not as well supported as Facebook Prophet, but there's quite a fair bit that you can do with it. Okay, so let's take a look at what we've done here. Under our evaluate model step we've made our future data frame, we've taken a look at our results, and we've also plotted them out. The last thing that we're going to do is take a look at our different trend components. We can break down the trend that we've got in our time series using the plot_components method. If I type in plot2, so we're going to save this, plot2 equals m, and then, let me zoom in on this, plot_components, and if we pass through our forecast again, this will break down our sales forecast into its individual components. You can see here that overall we've got an upwards trend, and that's replicated when you take a look at our data here: it starts out a bit lower and then we end up heading towards an upward trend. If we take a look at our weekly trend, what is this, yep, weekly trend, you can see that on Sunday we're sort
of starting up, we're dropping a little bit further down, and then on Wednesday we're hitting our low. Thursday we spike, and then it looks like Friday and Saturday are spiking too. So this might be indicative that people are buying more Teslas on Thursdays, Fridays and Saturdays; they've got a little bit more time and they can actually go and spend more time shopping around. On that note, that is pretty much the crux of this tutorial done. I'm going to show you how to do it with multiple time series in a sec, but let's take a look at the questions.

'You have more than one point per date.' Now, I believe we've got one, but Carlos, if you're picking up that we've got more than one data point per date, I might just need to fix up that data set. Good pickup though; I've got to double check for it.

So what else do we have? 'Have you tried splitting the data into test and training data?' Yeah, so normally I'll split this out, but in this particular case I've just gone and thrown everything at it. Best practice would be to split out your training data and testing data separately, and then potentially perform something like a mean squared error analysis on your testing partition versus what you've actually gone and predicted, so your yhat.

'Can you include other feature columns, for example ad buys, and see how they impact the model?' So that is beginning to get into the realm of multivariate forecasting. Let's take a look. You can add in a whole bunch of additional information into your forecast: you can add holiday effects and regressors. You can also add in... what was I looking at before? It might not be here, but I was taking a look at whether or not you
can apply... so ad spend is a really good one, but holidays is probably an even bigger one. For ad spend you might take a look at other techniques, but you could definitely add leading and lagging regressors, so that might be another thing that you choose to add in. What else do we have there? Seasonality, yep, so seasonality is sort of built in. I'm just looking at the comments here.

Okay, so where are we now? I've gone through this at a very high level, but you can definitely go into it in a whole heap of additional detail. You can add holiday events. You can add change points: say, for example, you know that there's going to be a change in trend, say you had COVID and that was obviously going to throw out your data. I believe it's here, under Trend Changepoints. This gives you the ability to go, hey, in this particular time period something's going to shift, and we need to make sure that we cater for that so that we can adjust our time series going forward. So you can build that in as well.

'What's the score of the prediction?' So Francisco, that was your comment. You can get the yhat value out of here. In terms of performing something like an MSE test, you could just bring that in from scikit-learn and compare your yhat against what you've got inside of your base data frame. Keep in mind that when you go and produce this forecast, it's going to make a forecast for your training data as well. You'll see that there: these dates are actually part of our training data.

Any questions? I have an answer. So Sean, I'm just trying to think about your particular question: can you include other data features? I haven't done that myself. You could probably try to build that in as a leading regressor. I mean, to
be honest, if I've got some ad spend data I'd probably start doing that. It might be indicative of the fact that it's no longer a purely seasonal data set, so you might choose to model that slightly differently. But again, hit me up in the comments below; I'd be interested to delve into that one in a little bit more detail. It's not something that I've traditionally done with Prophet.

Okay, what's the last thing that we need to do? We've now gone and done a bunch of stuff, so let's quickly recap. We went and imported our dependencies, and we went and loaded in our data. I didn't split it into training and testing, but you could definitely do that if you wanted to. Just keep in mind that when you're splitting into training and testing you're not going to be doing a standard random split. You'd be partitioning on segments of your data, so you'd need continuous data sets. What you might choose to do is train on, I don't know, two years of data and then go and forecast out on the rest of your data.

Another question from Sean: 'Really appreciate the Q responses. Do you know if Prophet automatically discounts the value of older data?' No. What it actually does is factor that into the trend, so even though we've got a lower trend, it's estimating what's happening over time. Older data is still very much important when you're building this into the model. The key thing to note, though, is if you've got change points. I'm trying to think of a business that has probably been hit. Say, for example, I'm a hospitality retailer; say I'm a pub, even better. I know that I'm going to sell 3,000 pints on average per day. I'm obviously
going to have a bunch of seasonality in there; I'm going to sell more on weekends. Now COVID hits, and you know that there's going to be a huge shutdown. What you might choose to do is configure a change point and go, hey, once COVID hit we're expecting a change in our trend. So you might choose to allow Prophet to retrain over that change point period. This means that you'll be able to see the impacts of a downwards trend, or COVID hitting, during that period. So again, there's a whole bunch of additional stuff that you could do there.

Okay, so the last thing that I wanted to go through was how we can scale this up. Remember when we took a look at our data set, we had three different product lines. We had stores and products, and we only took a look at Los_Angeles and our Model X's. So how would we scale this up? I'm not going to write this code from scratch, because there's a bit here and we're already at, what, 44 minutes. What we can do is extract all of our different product and store combinations, and remember I did this a little bit earlier. If we type in df, pass through the index that we want to process, and then type in .unique, we can grab all the unique values. You can see that by doing that I've extracted all of the unique product combinations: in this particular case LA Tesla Model X, LA Tesla Model S, San Fran Tesla Model S. Now here we're dealing with Tesla, so it's an automotive company, but this could just as easily be a whole bunch of different SKUs in a supermarket. It could be drinks; we were talking about pubs. Say, for example, you've got a whole bunch of pubs that you want to forecast sales for; you could definitely do
that, and forecast food and beverage sales versus other streams of revenue. So you've definitely got the ability to do that.

Great question, I just saw this pop up on the YouTube chat: 'If it was sales data, would you choose to transform or normalize the 2020 data during COVID in some way?' Talking to a large number of my clients, what they're choosing to do is exclude the 2020 data entirely from their sales data, because it was an extraordinary period, and ideally, if everything starts shifting back to normal, they're not expecting that to repeat. So you've got the ability to do that. You could configure a change point, but the only thing is that once COVID ends you might expect the trend to go back. So you might choose to configure one change point for COVID hitting and one change point for the impact of lockdowns opening back up. That's potentially what I'd be doing: configuring two change points, or excluding the data entirely, depending on what type of business it is. Normally you'd go and tune depending on how you see your model coming out.

Okay, what are we up to now? So we were doing this for multiple products. Let me explain what's actually happening here. We're just looping through all of our different products. If I show you this (let's zoom out) and type in for stockline in lines, what we're effectively doing is looping through each one of our different product and store combinations. You can see there that by looping through all of our different stock lines inside of lines, we're effectively just grabbing a string
in this particular case. Now what we're going to do is store our fit models inside of a dictionary called fit_models. This is pretty similar to how I'd go about training a number of machine learning pipelines when I'm using scikit-learn.

'Does Prophet give Facebook access to your proprietary data?' No, it's all open source, Sean, so that's a key thing to note. It's all stored locally. Once you've gone and installed the package you can disconnect from the internet and just run it locally, so you can safely use it.

What were we talking about? fit_models. So you can store your models inside of a dictionary called fit_models, and again this is similar to how I'd do it with scikit-learn. What we're then going to do is create a new instance of our Prophet class, exactly the same as what we were doing right up here, and then we're effectively just going to replicate our training pipeline. We create an instance of our Prophet class and then fit on our frame. And we're not actually filtering this out, so we need to do that. Should have taken a look at my data, Nick. We want to isolate our data and then drop our columns. So what we're doing is effectively just filtering our data: we set it equal to the stock line, which makes a copy of our data frame, and then we need to apply our transformation again and reset our column names. You can see here that we're effectively replicating how we went about training our initial model, but rather than doing it individually, this time we're going to go on
ahead and loop through and train on our individual stock lines. This first line that I've read in here is filtering on our store and product combinations. Remember, we're going to be looping through each stock line from up here, so we create a filter for each one of those different stock lines. Let's take a look. Say I copy this and print out df.tail... am I actually filtering on that? No, I'm not. What's happening there? Oh, frame.tail. So what we're doing now is looping through each one of these different product combinations, in this particular case store and product, and outputting the last couple of results from each one of those product lines. If we do that over here, we're effectively resetting our frame to the new product line each time we run through this. Then we drop our store and product columns and reset our column names as well, so ds is going to become our date column and y will be our value column. And then we fit on this frame, so let's set that there.

Oh, and this last line is quite important as well. Once we've gone and trained our model (remember we've got model = m.fit and we're passing through our frame), we then push our model into our fit_models dictionary and set that equal to m. If I run through this now, it looks like we've got an error: 'Dataframe has less than 2 non-NaN rows'. So it looks like we're getting errors there, but that's fine, you can effectively just go on
ahead and loop. I'm guessing there might be some issues with our data, so let's filter our lines. It looks like we might be missing some data in this data set, but that's fine, let's just loop through our first two: lines = lines[:2]. Okay, there you go, so I've only gone and trained two models. Somebody said that there might be some duplicate data, that we might have more than one data point for one particular date, so it looks like there might be something wrong with that data there. But that's fine, you can still go and loop through your data sets. I've gone and isolated the two data sets that we're going to work with. If we type in lines, you can see that I've got Los_Angeles, and we've now got our Model X and our Model S. If we loop through now, you can see that we don't have any issues. And if I take a look at our fit_models dictionary, you can see that I've now got two different models saved. So this is how you might loop through and generate a large number of different models.

Now if you wanted to access a specific model, we can isolate these. If I copy this particular one, you can see that I've now got one model, and as per usual we can make a future data frame, which is effectively what's happening here. What we're doing is first creating our future data frame. In this particular case I've typed in forward = fit_models and then we're isolating out the specific model that I want, because remember, inside of our fit_models dictionary we now have more than one Prophet model saved. This is a separate model, and this is a separate model over here as well. Now in this particular case we're going
to make a future data frame, no different to what we were doing over here, so these two lines are effectively the same. We're creating a future data frame and then performing a forecast on that. If we run this now, we'll have a forecast generated, and this is a forecast done using this model. If we wanted to do it on our Tesla Model S model, we could do that as well: I can grab this, paste it in there, and we've now got a new forecast generated. So we're now forecasting for Tesla Model S's instead of Tesla Model X's.

On that note, that is everything that I wanted to show, in a nutshell. Let's take a quick recap and then I'll answer a bunch of additional questions if you guys have any. First up, we imported our dependencies. We then loaded up some data and applied our data preprocessing, and remember we did four specific things there: we converted our date column to a date format, we isolated the specific time series that we want to forecast, we dropped our store and product columns, and we renamed our columns to ds and y, which you can see there. We then created a time series model and a bunch of future predictions. I didn't actually split out the data into training and testing, but you could definitely do that if you wanted to. Just keep in mind that you want to split on time, not just do a random split. So rather than splitting randomly, you'd potentially take the 2018 and 2019 years for training and test out on 2020.
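A minimal sketch of the time-based split and MSE check described in the recap, written with only the Python standard library so it runs anywhere. The toy values, the cutoff date, and the helper names split_on_date and mean_squared_error are assumptions for illustration, and the "forecast" here is a naive stand-in (repeat the last training value), not Prophet's yhat:

```python
from datetime import date


def split_on_date(rows, cutoff):
    """Split (ds, y) rows on time: rows before the cutoff train, the rest test."""
    ordered = sorted(rows, key=lambda row: row[0])
    train = [row for row in ordered if row[0] < cutoff]
    test = [row for row in ordered if row[0] >= cutoff]
    return train, test


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy series: two historical points, two points in the held-out year.
rows = [
    (date(2018, 1, 1), 10.0),
    (date(2019, 1, 1), 12.0),
    (date(2020, 1, 1), 15.0),
    (date(2020, 6, 1), 11.0),
]

# Train on everything before 2020, test on 2020 (no random shuffling).
train, test = split_on_date(rows, cutoff=date(2020, 1, 1))

# Stand-in forecast: repeat the last training value for every test date.
# With Prophet, these values would come from the yhat column instead.
yhat = [train[-1][1]] * len(test)

mse = mean_squared_error([y for _, y in test], yhat)
print(f"train rows: {len(train)}, test rows: {len(test)}, MSE: {mse}")
```

In the notebook itself, the same idea would be a boolean mask on the ds column of the data frame, with yhat taken from the forecast Prophet produces for the held-out dates and the metric imported from scikit-learn, as suggested in the session.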
And then we plotted out our forecast, plotted out our components, and I gave you an idea as to how you might scale this up if you had multiple time series that you needed to forecast.

All right, let's take a look at our questions. So Gary: do I use Prophet in my role at IBM? I work really closely with a piece of software that does data simulation and data modeling, and I use Prophet with the API for that quite a fair bit. So yes, I definitely do use it at IBM as well.

What other questions did we have? I'm just reading the comments. It looks like we've got a whole bunch of comments as to what you potentially could do, such as whether or not you could use a dummy variable for the 2020 data points. You definitely could do that too.

So this is an interesting question, and it's coming from Carlos. Carlos, I think you were the person that called out whether or not we've got more than one point per date. I'm going to check that, and if that is the case I'll fix up that data set for you guys so you can pick that up. 'In the case of forecasting a huge number of SKUs, say more than a thousand SKUs, would you say using this loop is going to be prohibitive in terms of running time, and would a GPU help?' To be honest, you can see that it's actually pretty quick. I've done this for up to 100 or 120 different SKU and product combinations, so you should still be okay. If you're getting to the point where you've got thousands of SKUs, you might choose to separate out your data frame, and rather than just doing a loop you might have a proper training pipeline set up. What else do we have? Any
other questions, guys? Nope? Okay. All right, let's quickly take a look at what we did. We went through a bunch of stuff: how we can load our data in a notebook, how we can train a Prophet time series model, and how we can make some predictions and evaluate our performance. Again, there are a number of improvements that you could make to this. You could split the data out into training and testing partitions. It's probably good practice to also run a bit of evaluation, so I'd suggest taking a look at MSE to see how far your predicted values differ from your true values. You can just use the MSE metric out of scikit-learn for that. In terms of how we went through it: we loaded our data into Jupyter, we trained our model using Facebook Prophet, and then we made some forecasts. All this code, including the data, is going to be made available, and if you've got any questions or any issues, by all means do hit me up. I do have a YouTube channel, so you can definitely go and take a look.

All right, one last question: 'Would you review what you need to have installed to be able to utilize Prophet?' In this particular case I'm operating inside of a Jupyter notebook, so you could do that, and potentially use Anaconda as well. You're going to need PyStan installed, and PyStan has a dependency on, I believe, a C++ compiler. If you're running on a Mac you should be good to go. If you're running on a Windows machine you need to install something called MinGW, which is the C++ compiler that you're going to need. It might just be easier to use the Windows Subsystem for Linux in that case. So what did I say: Jupyter, PyStan, and then you're going to need to install Facebook Prophet as well. You should be all good to go with that. On that note, that about wraps it up.
Nathan, I will throw it back to you.

Hey Nick, thank you so much. I hope everybody really enjoyed themselves, and based on the questions I think they all did. Nick, I just want to say thanks for being here. On Wednesday we're going to be hosting another webinar, an introduction to deep learning with PyTorch, and our presenter is going to be Jirka Borovec, a senior research engineer at Grid.ai. In that talk Jirka is going to give a quick overview of deep learning and then dive into some examples using PyTorch Lightning. If you're interested in attending, I'll be posting the link in chat here in just a second. Again, thank you Nick, and I hope everybody has a good rest of their day.

Thanks so much, Nathan. Thanks for having me, guys.
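For reference, the per-stock-line loop from the session, together with a guard for the "less than 2 non-NaN rows" error and the duplicate-date concern raised in chat, might be sketched roughly as follows. This is a standard-library stand-in, not the notebook's code: the record layout, the store and product labels, and the helper names (group_by_stock_line, is_usable, fit_mean_model, fit_all) are all assumptions, and the "model" is just an average so the example runs without Prophet installed:

```python
from collections import defaultdict
from datetime import date


def group_by_stock_line(records):
    """Group (store, product, ds, y) records into one series per store/product pair."""
    series = defaultdict(list)
    for store, product, ds, y in records:
        series[(store, product)].append((ds, y))
    return dict(series)


def is_usable(series):
    """Require at least two non-missing rows and at most one value per date."""
    rows = [(ds, y) for ds, y in series if y is not None]
    dates = [ds for ds, _ in rows]
    return len(rows) >= 2 and len(dates) == len(set(dates))


def fit_mean_model(series):
    """Stand-in 'model': the average of the observed values (not Prophet)."""
    values = [y for _, y in series if y is not None]
    return sum(values) / len(values)


def fit_all(series_by_line, fit_one):
    """Fit one model per stock line, storing results in a dictionary keyed by line."""
    fit_models, skipped = {}, []
    for stock_line, series in series_by_line.items():
        if is_usable(series):
            fit_models[stock_line] = fit_one(sorted(series, key=lambda r: r[0]))
        else:
            skipped.append(stock_line)  # too few rows or duplicate dates
    return fit_models, skipped


# Hypothetical records; the last stock line has a single row and gets skipped.
records = [
    ("Los_Angeles", "Tesla_Model_X", date(2020, 1, 1), 3.0),
    ("Los_Angeles", "Tesla_Model_X", date(2020, 1, 2), 5.0),
    ("Los_Angeles", "Tesla_Model_S", date(2020, 1, 1), 2.0),
    ("Los_Angeles", "Tesla_Model_S", date(2020, 1, 2), 4.0),
    ("San_Francisco", "Tesla_Model_S", date(2020, 1, 1), 1.0),
]

fit_models, skipped = fit_all(group_by_stock_line(records), fit_mean_model)
print(sorted(fit_models), skipped)
```

With Prophet, the fit_one callable would instead build a data frame with ds and y columns and return a fitted Prophet instance, which is the dictionary-of-models pattern used in the session. Checking each series before fitting lets the loop skip problem stock lines without truncating the list of lines by hand.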
Info
Channel: Data Science Dojo
Views: 2,026
Id: wXS9IzDjuZQ
Length: 55min 3sec (3303 seconds)
Published: Tue Jul 13 2021