Tidy Tuesday screencast: Analyzing US dairy consumption in R

Captions
Hi, I'm Dave Robinson, and welcome to another screencast where I'll be analyzing and exploring data in R that I haven't seen before. As usual, this dataset comes from the Tidy Tuesday project, which releases a new dataset every week. Let's see what we got this week. It's going to be a dataset on dairy production and consumption. All right, I'm excited. So we have a lot of CSVs this week. It comes from the US Department of Agriculture. It says the original data requires a lot of messing with Excel sheets. Will we get to that today? Maybe not; there are quite a few CSVs that we already have to work with. And it comes with two articles. Sometimes I check out the article near the end. I don't like to look at it yet, because it'll affect how I explore the data. I'd rather see what I find interesting about the data first.

All right, we have five CSVs, and it says we might need a few of the datasets. Okay, so one thing it's saying is that fluid milk sales, milk cow facts, and state milk production are good for just plotting, and milk products and clean cheese allow for some creativity in the data organization steps. Maybe I'll start with one of the milk ones. Let's see, milk cow facts looks like it doesn't have a lot of columns. I was right. Fluid milk sales also doesn't have a lot of columns. I could start with milk cow facts, then move on to one of the other ones. Ah, this is going to be interesting: cheese and milk products, I'd probably move on to those afterwards. Hmm, you know what, I think I will start with milk products facts. Let's start with a dataset that requires a little more creativity and exploration. Your mileage may vary; you could choose another dataset. I'm going to start with this one, milk products facts.

All right, what I do is go to the appropriate page, raw, and grab the link, and then I open up my Rmd. I'm going to save this one as US dairy. library(tidyverse), and I'll do read_csv: milk products facts. Let's look at what data we have. I put it into View. All right, so this is going to take a gathering step, because we can see it has one column for every type of dairy. Now, one trick is, let's see what the units are, because I'm seeing very different numbers. Average milk consumption... am I right that it's always pounds per person? Yes, looks like it. Okay, then this isn't too hard a tidying step. This is a classic case of gathering. I have one column, year, and really my two other variables should be type, or let's say product. So I want to gather product and pounds per person for everything except the year column. So that's my gather, my tidying step, after which I have year, product, pounds per person.

I could ask a lot of questions. So this is milk products tidy. I could ask just about the most recent year: filter year is, say, max(year). I didn't know what the max was yet; it goes to 2017. And I could look at the product and the pounds per person. This is me just interested in getting a feel for the data. Anytime I do a bar plot like this I want to flip the coordinates. I'm going to spread this out a little. And if you've seen my screencasts before, you know that I'm probably going to do an fct_reorder of product by pounds per person. This lets these items be sorted. The other step I want to do: I don't love it being in what's called snake case, with the underscores. I'm going to try making it title case. It has to be before I reorder: I do str_to_title, which should turn this into title case. Ah, but it doesn't remove the underscores. Oh okay, I get it. I think I would have to, inside, do str_replace_all, replace all the underscores with spaces, and then do str_to_title on that. Here it is. Can I do sentence case? I
actually don't know. Nope. That's okay. Maybe I could do str_to_upper, and that would be... oh no, not all upper case, I meant just the first letter. Yeah, not worth it. str_to_title. This looks pretty good. If I were doing this as a graph, I would remove the x-axis label and make the y-axis label "Pounds consumed per person in 2017". This just gives a general sense that, yeah, the average person in the US drinks lots of milk and eats various kinds of cheese and some yogurt. Okay, that's solid.

Another way to think of this: notice we have categories, fluid, cheese, evaporated, dry, and then a type within each, like cheese other. So we actually have some categories within this, which is going to be relevant when we look over time. I'm going to try taking a bit of a look at that. Is the category always the first word? Yes, it is. Okay, so in my milk products tidied, I'm going to do the processing in here. No, I take it back, I'm going to do the processing afterwards. What I'm thinking is that after I did a gather, rather than have fluid, cheese, dry and everything in one column, I actually want to separate those into two columns. I want to call them category and product. Oh, I'm doing it wrong: I separate product into two columns, category and product. The sep is underscore, and when I have extra pieces, I merge. What do I mean by extra? I mean cases like ice cream regular. I'm going to quickly count category and product (could have done distinct). I want to leave these all merged, so it's extra = "merge". And I also want to say fill = "right", which will allow me to say the category is butter and the product is also butter. So butter is a bit odd, because it's the only one where the category and the product are the same thing. So I might want to throw in one more step, which is: if there's no product, I want it to be coalesced between
the product and the category. So what does coalesce mean? coalesce means if it's an NA... my mistake, I need to write this as product, then category: if the product is NA, drop back to the category. So when product was NA, it used butter. Okay, so this is my cleaned version. Why did I go through the work of doing that? Why did I separate out the category? So I did that coalesce; I probably also want to do my str_to_title and str_replace_all on the product. I probably also want to say category is str_to_title of category. That's just going to make it look a little bit better: Fluid Milk and so on. If I distinct category and product now, it looks pretty great. Yeah, there we have our categories.

Why am I doing it like this? Well, I'm going to save this as milk products tidied. It's because I might want to aggregate across years. This is important because we have too many categories to make a line plot over time, but I can still do group by category and year. Well, the first thing I can do... let me quickly... oh yeah, I need this step. I don't know what's up here. Can I do this step? I get an error ending in "not meaningful". I have not seen this bug before, I don't think. Oh, I see, I accidentally added an extra greater-than sign up here. Okay, I reordered the product. The problem here is that there are multiple "other" products. Oh, if I reordered this with sum, and then I said fill = category... yeah, that's closer. What I'm looking for is to say there's other cheese and there's other frozen. Or should they really be combined? It's hard to say. But this is one way of visualizing one year by category. This is not the only kind of graph I want. I want to say group by category and year, and summarize pounds
per person is the sum of pounds per person. That combines the categories, everything from butter to frozen to fluid. And I wonder if people have been moving from fluid to evaporated, or from fluid to frozen. If I want to answer a question like that, I could say year, pounds per person, color is category, and geom_line. Whenever you get an error like that, it just means it couldn't find something. What's actually interesting about this is that it looks like fluid consumption, almost certainly milk, has been going down over time, and cheese consumption has been going up, frozen down a bit. Okay, so this was one way I wanted to aggregate all the data. We have a problem whenever we compare everything on one line plot, which is that the milk category is kind of taking over everything. Sometimes we don't have any way around it.

That's by category. I would call this "Dairy consumption by category". y is pounds per person, x is still year, but I can capitalize it if I like, and then I say "Based on US consumption (source: USDA)". You can add details like that to a subtitle. It's not a bad graph; it's not the most informative we could get. I'm going to quickly change our ggplot2 theme and come on back.

Okay, so that was one way of working with these many categories. I can suggest two other ways. One is fct_lump; I've used that a lot in these screencasts. I say product is fct_lump of product into the top six, plus an Other category, and I want to weight it by pounds per person. I want to group by both product and year, and summarize pounds per person as the sum of pounds per person. If I graph this now, it's a very similar graph, where I say year, pounds per person, but color equals product. What I get is American cheese... well, milk's been
going down, American cheese has been going up a bit, and yogurt looks like the real winner here. It looks to me like yogurt has been increasing; look at that. Okay, and it's tough having milk on this graph, just kind of taking everything over. I also realize that separating out the category can make it a little bit confusing, where it just says "American". American what? I can add a few more categories, but that's probably not really helpful.

Like I mentioned, there are two ways I can try to show everything. Another is... I certainly can't do color by product without doing a summary. Look at what this is going to look like: too many colors in the graph. In fact, we have two "Other" products because I did that separation. Even if we didn't, way too many categories. No good. The other thing I could do would be facet_wrap by product. It's still not going to look great, it's going to be too many graphs, too many points, but I could say scales are free. I've got to deal with this "Other"; I promise I'll deal with it in a second. And I'll throw in an expand_limits. Especially in a case like this, I need all the y-axes to start at zero. And yeah, the problem with "Other" is that it's only meaningful if it's like "Frozen Other" or "Cheese Other". So I'm going to fix that in the cleaning step. What I'm going to say is product is ifelse: if the product is other, then paste the product and the category (I realize I kind of like the name "Other Cheese" rather than "Cheese Other"); otherwise just leave it as the product. What did that do? If I try this faceting again, it's going to have seventeen groups, which I'm not crazy about. Nope, didn't work. Oh, "Other" only became capitalized after these other two steps. That was a little issue. When you do these multiple mutates in a row, you've really got to watch the order. I've kind of got a mental model going of what
they are at every step, but I still certainly make mistakes, and it can be useful to run one mutate at a time and look at what's in between. Okay, I don't love making a facet when there's something like an odd number, or a number that doesn't naturally lend itself to a grid like this. But this actually does get across a lot of the idea. What we can see is that yogurt's on the up-and-up, American cheese has been increasing, and milk consumption has certainly been decreasing. It's funny, because the 80s I associate with the "Got Milk?" ad campaign and the USDA's push for more milk. But we have other ones that are flat.

Whenever I see lots of trends and I just want to say which are going up and which are going down, I often want to fit a model to each. I'm not sure I'm going to do that in this case. Normally what I would often do would be fit linear models to many of these. Ooh, forecasting. I hadn't thought of doing forecasting. I think that's what I'll do next, actually. We might want to ask: what is the future of milk? What's the future of American cheese? What's the future of whey? In the process, you're going to have the great adventure of watching me remind myself how to do a forecasting model, because it's been a little bit of time. So this is going to be a great adventure for all of us, doing forecasting in a tidy way. Just like about a month ago, when we did some dolphin data and I did survival analysis, and it had been a while. Okay, so I'm going to remind myself of a couple of things. I'm going to remind myself of the forecast package. Here we go. I've done it some times before; it's been about a year and a half, I think, since I last did forecasting, and I'm certainly no expert. I'm also trying to remember if broom... if I do library(broom)... no, forecast tidiers. Hmm, I'm trying to remember what it is, whether you have broom... oh, not STM. I
actually wanted... no, it's not seasonal, I don't need it. What's its name... forecast... yes, the sweep package. This is fantastic. It's built by the people at, I think, Business Analytics, like Matt Dancho and Davis Vaughan. Fantastic package; I just need to remind myself how it works. Like I said, it's going to be a real adventure. Here we go. Do I have sweep? This is actually going to be a little adventure in seeing how I learn how to use a package. Last time I did forecasting, I don't believe I used the sweep package, but I am generally familiar with it. Let's see, I can look at its vignettes on CRAN. I'm going to start with forecasting at scale. Okay: forecast some groups. I want to nest. P.S., I'm moving a little fast through this documentation because I'm generally familiar; I just want to feel confident in the approach. Map onto the data... tk_ts. Okay, I'm seeing tk_ts is time-based.

Okay, what I'm going to do is actually try this on one example first. For example, filter product equals Milk; that should only be the one fluid milk category. What happens when I do tk_ts on it? Oh, I think I need the timetk package. It says no date or date-time column. Okay. I need to tell it that it's a year column. I think I may need to turn this year column... I'm not a hundred percent sure about this, again, I'm learning along with you, but I think I need to turn it into a date column. So it's as.Date, 1975... I'm actually going to do it a little bit differently. What I'm going to say is: let's take the year zero (there is no year 0, so take the year one) and add the number of years to it. years(), I think, is a lubridate function. Ah, here we go, we're quite close; I just subtract one year from it. So this was my slightly silly way of saying I want to turn every one of these into a date. Like I said, there could be a better way. What happens if I do tk_ts to this? Oh, you can pipe it into tk_ts. It keeps the date, it understands
the date, but it drops... no, let's see. tk_ts: what if I say start equals 1975? Oh, I put that in the wrong place. What is select? Give me a moment while I... I'm actually going to go back to the original vignette and see how to create a tk_ts object. Alcohol sales is going to have this date and price, and here I would say start and freq. I'd probably say freq equals one. I don't understand why it's saying... oh, it's dropping the non-numeric columns. Okay, product and category make sense to drop. I don't quite see why it's dropping year, because year is my date column. Well, maybe it just doesn't need year, because I already said start. What I'm not really understanding is... I feel like this is the date column, so why do we need to tell it which column is the date? Why is it dropping my date? If I remove the year column, would that change anything about it? No, it likes it. I'm a little confused. If I said date is this... no, it would then remove the date column. I don't know why it's saying year is a non-numeric column; I kind of just thought it would be a date. I'm just going to stick with this for now. But yeah, the gist so far is that when I have a year column, I probably want to change it to a date. So I have my category, I have my product, and this is the code I would use to turn year into a date. Why does that matter? It matters because I can nest by everything but category and product. Now I've got one data frame for each of these, and now I can do what the vignette was recommending. I can take that column and say that the time series would be map on data of that tk_ts function, with the two arguments start and frequency. I think the vignette took a slightly different approach, but it should be the same idea. So if I pull ts, I should get a long list: every one of these is a time series column.
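The nesting and conversion steps described above can be sketched roughly as follows. This is a minimal illustration, not the code typed in the screencast: the toy values, the column name lbs_per_person, and the as.Date(paste0(...)) conversion (used here in place of the lubridate trick mentioned above) are all assumptions. tk_ts() comes from the timetk package.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(timetk)

# Toy stand-in for the tidied data; the values are made up for illustration.
milk_products_tidied <- tibble(
  category = rep(c("Fluid", "Cheese"), each = 5),
  product = rep(c("Milk", "American"), each = 5),
  year = rep(1975:1979, times = 2),
  lbs_per_person = c(247, 247, 244, 241, 238, 8.1, 8.9, 9.2, 9.5, 9.6)
)

milk_product_ts <- milk_products_tidied %>%
  # Turn the integer year into a Date (January 1 of that year).
  mutate(year = as.Date(paste0(year, "-01-01"))) %>%
  # One row per category/product; the other columns go into a list column.
  nest(data = -c(category, product)) %>%
  # Convert each nested data frame to a yearly ts object. The non-numeric
  # date column is dropped by tk_ts(); silent = TRUE hides that warning.
  mutate(ts = map(data, ~ tk_ts(.x, start = 1975, frequency = 1,
                                silent = TRUE)))
```

Each element of the ts column is then a yearly series starting in 1975, which can be handed to a forecasting function one row at a time.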
Okay, so that, yeah, there we go, looking great. That gets my time series, so I can call this milk_product_ts. We now have a column for time series. We now have a row for every individual milk product that has this column whose class is ts, this special Business Analytics-defined class for a tidy time series. There are other tools for working with time series; I could have worked with them. These, I think, are generally well designed to work with their implementations of the broom tidy, augment and glance methods.

So why does this matter? Well, I can map on ts... I believe it's called sw_ for sweep... sw_glance... oh, no, my bad, I actually want to fit a model on every one of these time series. I could use ets. What is ets? I believe it's a forecasting... oh, exponential smoothing. Oh no, it's error, trend, seasonal: an exponential smoothing error-trend-seasonal model. That's exciting. Oh wow, we've just fit 17 models, just like that. We don't know yet what to do with them, but I could glance at all of them. I could find out the model parameters; that would be map model glance. You just told me that you could glance. Oh, you cannot. Oh, sw_glance, their version of glance. What this does is give me the model parameters for each of these fits. It includes... oh, this is interesting, it looks like a problem in printing the table. Okay, back to normal. This has everything from AIC to BIC, which are criteria for choosing between models, the root mean squared error, the mean absolute error. These are all ways of evaluating these fits. They're not the easiest way to compare between these; it'll be hard to say which of these models was better than the others, because the underlying data is on different scales. Milk has much larger numbers. So I wouldn't pay too much attention to this. But we could compare multiple models for the same data using, say, AIC and BIC. That's really exciting, because later we might try
a different forecast model on this data. So here I'm going to call it milk_product_ets. I can sw_glance it, but I can also unnest sw_augment. What that gives is the time, what they're calling index, the actual value, which is the original we put in, and then the fitted value according to the ETS model. So this makes it easy to visualize. I can say index on the x-axis, actual on the y, geom_line, and facet by, let's say, product. I think that'll have one panel for each. I'm not going to use categories yet. I'll do the same thing as before and free up the scales: scales equals free_y. So we've seen this graph before; this has one facet for each of the products. What we haven't seen is that I can now add a geom_line, let's say color is blue, where the y-axis, instead of using the actual, uses the fitted values, the predictions. So this is going to add a trend on top of each of them. As you can see, an ETS model actually gets really quite close in terms of the predicted value, really astonishingly close. We don't have seasonal trends in this data, so it feels overfit to me. Maybe we could try a different model besides ETS. What other forecasting models might we have? Let's see, I'm going to quickly try ARIMA. I don't remember exactly; ARIMA stands for autoregressive something. Anyway, here is my graph of my ETS predictions. I don't know a lot about the ETS model, so I'd have to look more into that. But if I try changing this to... nope, not arima. Let's see what happens when I augment it. Oh, it just gets the residuals. That's a bit odd. What would happen if I used... not STM. I'll try capital-A Arima. I don't know what the difference between those is, and why one has index and one doesn't. Maybe I'll create another model later. So ARIMA isn't able to see a trend in this case; it just fits a flat line.
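As a rough sketch of the ETS fitting and fitted-versus-actual plot described above: the two toy series and their values are assumptions made up for illustration (built here with base ts() rather than tk_ts), and the object names are only meant to echo the ones mentioned in the screencast. ets() is from the forecast package; sw_glance() and sw_augment() are from sweep.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(forecast)
library(sweep)
library(ggplot2)

# Toy yearly series standing in for the per-product ts column (made-up values).
milk_product_ts <- tibble(
  product = c("Milk", "Yogurt"),
  ts = list(
    ts(c(247, 244, 241, 238, 234, 230, 227, 225, 221, 218, 215, 212),
       start = 1975, frequency = 1),
    ts(c(2.0, 2.1, 2.3, 2.4, 2.4, 2.6, 2.9, 3.2, 3.6, 4.0, 4.2, 4.4),
       start = 1975, frequency = 1)
  )
)

# Fit one ETS model per product.
milk_product_ets <- milk_product_ts %>%
  mutate(model = map(ts, ets))

# One row of fit statistics (AIC, BIC, RMSE, ...) per model.
model_summaries <- milk_product_ets %>%
  mutate(glanced = map(model, sw_glance)) %>%
  select(product, glanced) %>%
  unnest(glanced)

# One row per observation, with index, .actual, .fitted and .resid columns.
augmented <- milk_product_ets %>%
  mutate(augmented = map(model, sw_augment)) %>%
  select(product, augmented) %>%
  unnest(augmented)

# Actual values in black, fitted values overlaid in blue, one panel per product.
p <- ggplot(augmented, aes(index, .actual)) +
  geom_line() +
  geom_line(aes(y = .fitted), color = "blue") +
  facet_wrap(~ product, scales = "free_y") +
  expand_limits(y = 0)
```

Because the products sit on very different scales, the statistics in model_summaries are more useful for comparing different models on the same series than for comparing one product against another.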
Oh, STL is the word I was looking for, not STM. "Univariate series"... that doesn't help. What else do I have? I'm looking for things that have an augment method. I don't know what HoltWinters is; oh, it must only work on periodic data. Let's see if there are any other recommendations. Ah, the problem with ETS that I'm not realizing is that this trend doesn't have... you expect ETS when you have lots and lots of, um, residuals. You also want some periodicity, which is not what we're looking at here. This is not a perfect dataset for forecasting, because we don't have periodicity. Oh well, that's a bummer. I'm probably going about this entirely the wrong way. Forecasting using multiple models... this might not be the right dataset for forecasting, or if it is, I'm probably doing it wrong. Hmm, quickly looking through what options they recommend. Yeah, they have a similar approach. Let me see: this is similar data to what we're looking at, and it's showing these models as ARIMA and ETS. Okay, I'm going to stick to the ETS approach for now. We're going to use ETS to forecast into the future. I was curious for a minute, and it's worth figuring out more. All right, what I would do is... I've got fits. Oh, this is actually really neat, that they're recommending multiple model types. I might come back to this approach, where we try a few different types of models for forecasting, but I'm going to start with just using ETS. All right, I'm learning along with you here. Here are my ETS models; you can see them in this column called model, each of which is of type ets. I can also create a forecast column where I map onto my model the forecast function. I usually say, I think, h equals 6; h, to my recollection, stands for horizon, and means we're forecasting six steps, in this case six years, into the future. So if I then unnest this column,
oh, sw_sweep. Oh man, I have to remember these prefixes. Here we are. Wow, this is perfect: what we get is the data so far, and then intervals for the future. Notice I'm jumping back and forth between the vignette a lot, because I have some sense of what each of these steps means. Yep, color is key, group is... what is f? Oh, f in this case is the model; we're not using that. Okay, so I've unnested, and now I have the actual data, and then what else is in the key column? Forecast. Perfect. I'm going to create a plot with the index on the x-axis, pounds per person on the y-axis, and the color is key, and the rest of the graphing code is going to look pretty similar. Oops, this line is gone; we're not comparing fitted to actual anymore. Here we go: we have actual data going up until the current time, and then we forecast the future. Six is actually not a lot; I'm going to try forecasting ten years. This is useful because I'm saying here's what we expect about the future. I'm going to zoom in a little bit. Notice the moment where it changes from being red, the actual data, to the forecasted data. But it's also important that we have confidence bounds, and we can get those with geom_ribbon. Let's start with the 80 percent interval. What I do is ymin is lo.80, ymax is hi.80, and I usually want it to be transparent. Let's give this a shot. What this should do is add confidence bounds to the future. It won't add confidence bounds to the past; there is no uncertainty in our data around the past. But this gives a sense of our confidence bounds for the future. So we can see that in some cases, this ETS model, at least on data that doesn't have seasonal trends, mostly will predict a flat line going into the future, sometimes with a positive slope, sometimes a
negative. In a case like dry whey, it looks like it kind of went down, but then it predicts a flat line, according to the ETS model. You know, this is a solid way of representing future forecasts. I could do it with multiple models. Another one is called the auto.arima function. What do they use to create that? Let's see, where's my auto.arima? You know what, damped equals TRUE looks like another argument they pass to ets; I'm going to go ahead and skip it. Where do they call the auto.arima function? This is what I'm trying to find, where this fit gets created. invoke_map... oh, that's an interesting approach. Okay, it's a very good vignette. What it's doing is creating a data frame where it takes the name of each function and invokes it with a list of parameters that go with it. I think that does mean that auto.arima is a function, and it might be the appropriate one. So right now I'm using ets; what would happen if I changed to auto.arima? I'm learning my way around this too. Hmm, it looks pretty similar to me, which I think is reasonable. But what if I wanted to include both? What I could do, and this is what they're effectively doing: we have our data that has one row for each of our datasets. I could add crossing: model name is either auto.arima or ets. And then, instead of mapping the same function onto all of them, I would map... invoke takes the function. Oh, invoke_map, I didn't even know that was a thing. I don't think invoke_map is going to work in my case. I wonder how invoke works. Can I say invoke mean on one, two, three? Oh, I can, but I'd have to say list of one through five. Okay, hmm, this is not going to be worth the effort. Nah, I'm not going to do it. Honestly, the models look really
similar, and I think I'm giving up. Ah, I'll do it. Okay, what I'm going to do here is say I want to apply each of these appropriate functions to every item in this list. So I'm going to say map2 on the ts data and on the name, so model name, and on each pair apply the invoke function. So this will say, for example, invoke auto.arima on the first ts. This would be an example of one of the things it was doing. Oh, it doesn't like that; it would need it to be a list, wouldn't it? Yeah, like I mentioned, this gets a little bit complicated. I need to wrap this function slightly. Was it worth the effort? Hard to say, but we're getting close. What I would do next is... well, yeah, I see, these models are actually quite different, so I'm going to be happy that I did this. Take a look at what the data looks like after I fit both the ARIMA and the ETS model and I've unnested: you also have your confidence bounds. So what I'm going to do is say color is model name, lty is key. I'm not sure how it's going to look yet. That's actually kind of the best that one could expect. I'm probably going to drop the key from the legend; I don't think it's as important. But what we're seeing here is that we have a model for the future in every single graph. This is actually quite interesting. The trends that are expected are similar. So ice cream reduced-fat has been flat for a while, and we expect it to keep being flat, but the ETS model has a wide range of possible futures and auto.arima is narrower. In contrast, for the nonfat milk, ARIMA has a very wide future and ETS a thinner future. So this is important, for example, if you used one of these forecasts and you published it in a paper, or you
gave it in a presentation to a group of statisticians, they might ask, "Why didn't you use this other model?" This is a really good graph to have in an appendix, where you can say, "Well, here's the difference between these two models." I don't know enough about the two models to compare them. I also think a lot of the advantages of forecasting come in once you have seasonal trends, which this data doesn't have. Maybe we'll see it in a future Tidy Tuesday. But I really appreciate getting an opportunity to learn how to use the sweep package and tk_ts in the timetk package to do forecasting of future data. In fact, this is a really quite neat graph, though I'm worried about its size and the number of items on the x-axis. I'm going to tidy it up just a little bit. I'm going to say x is "Year", y is average US consumption in pounds per person, and title is "Forecasted consumption of dairy products". So notice it wasn't that complicated a dataset. It took a little bit of tidying, a little bit of cleaning, but the interesting part came when we started doing multiple models for forecasting, when we said I want to predict future trends. I really appreciate the excellent vignette the Business Science people (yeah, Business Science, yes) put together for this forecasting approach, because it really helped me get a sense of how to use these tools. I have some experience using broom, and this is a very similar approach, but different enough to be worth thinking about. I probably need to adjust these x-axis labels while I'm tidying things up. I'm going to say scale_x_continuous, breaks are just every 20 years: 1980, 2000, 2020. And I'm also going to say scale_linetype_continuous, I think with guide equals FALSE, if I want to remove the legend that says actual versus forecast. Oh, I said continuous; it should have been discrete. Is it guide equals
false? Let's find out, let's find out together how well I remember this. Good. I probably also want to say color is model. I could change auto.arima's name to something that fits in more, so the graph is more presentation ready, but I don't think it's that essential, and this is a pretty solid visualization.

OK, so this was taking a look at forecasting. To reiterate the approach we were using on each of these categories: we nested the data so it had one row for every category and product. We turned each of those into a time series object. We then tried applying two different models, either auto.arima or ETS. We fit that model and forecast a time horizon of ten years into the future, and then we used sweep to turn that forecast into future predictions with 95 percent and 80 percent confidence intervals. That's something I'd probably add as a caption: based on USDA data, 1975 to 2017, showing 80 percent prediction intervals. I said confidence intervals; they're slightly different from prediction intervals. And this is our finalized version.

All right, we have a few minutes left. That was a really fun look at forecasting. I think it might be worth seeing what else is in the data that could be interesting. Let's see, clean cheese. Oh, there are all types of cheeses. I think we'd apply the exact same approach. I'm not 100% sure, but it looks like this is all milk products, and this is cheese. Oh, this is actually interesting. Let me see: price, price, price, price. Well, let's quickly pull out the cheese data; that's going to be best. My plan is to quickly go to clean cheese, and we'll apply some of the same approaches. Maybe we'll do forecasting, maybe we won't, but you know, cheese is pretty fun. All right, so let's say it's clean cheese. Sure, I'm just going to call it cheese. We know that some cleaning has already
been done, but I'm just going to call it cheese. Ooh, this is going to take a little more processing, I think. Oh, not necessarily, though we have one misspelling. This one doesn't have categories, so it's going to be a gather, just like the one before. We know there's a Year column. Gather type, and is this average consumption per year in pounds per person? Yes, every column is pounds per person. So pounds per person, and everything except for year. I'm going to rename the Year column to be lowercase to fit the others.

I just mentioned that there's a funny misspelling. If I distinct type, I can actually see it in total American cheese. I can use mutate, type is fct_recode on type. I want to rename the misspelled total American cheese entry so that it says "cheese", because that's going to bother me otherwise. Now it's cheese. Does anything else need to be recoded? I probably want to turn everything to title case, so I'm going to say type is str_to_title of type, because otherwise, and I'm a little bit precious about this, sometimes "other" is lowercase, and the same with "Foods and spreads". Whatever, this is pretty good.

OK, and sixteen types: that's going to be great in a faceted graph, seriously. So I can actually look at year, pounds per person, color equals type. It's the same graph, so I'm not going to go into the copy and paste. I could have scales equals free_y, and I needed an expand_limits, y equals zero. There are going to be 16 cheeses; it's really good for a facet. Maybe this is the prettier graph than the other one. Let's find out. In terms of forecasts, I really don't know what kinds of cheese have been increasing and decreasing. Oops, I could say color equals type, but then I definitely don't want an extra legend. I'm going to skip color equals type. Hey, most cheeses are on the rise. Blue cheese has some missing data. I don't know what brick
cheese is, but it's going down. OK. All right, most of this is really similar to my earlier graph, tremendously similar in fact. I could copy and paste this. Let me see, what year does it start? That's an important one. Oh, and I wonder if the missing data for blue is going to be a problem. We're going to find that out together.

I've got to call this cheese tidied. I graph it, and then I build a cheese time series on everything, separated by type. The year starts in 1970 instead of 1975. Cheese tidied, yeah. It's that odd column; I don't really know why it's dropping the date. I create some time series. I want to look at blue cheese, and I'm going to do that in a bit. I can basically copy and paste everything. That means that maybe what I should have done is combine the two datasets and run everything together. I'm not going to do that here because I do want them in separate visualizations. Let me see: model name, key, all of this is correct. I just need to change this to type. Boy, isn't that something?

Wow, this really nailed it in terms of what I wanted from blue cheese. So blue cheese is weird. Wow, blue cheese got really weird. I think auto.arima and ETS handled blue cheese differently because of the missing data. I don't have a great solution to that; I don't know immediately what to do with it. Nevertheless, this is separately a pretty cool visualization, where we would say Italian other is going up, mozzarella is going up, total Italian cheese is going up. Are all cheeses going up? Oh yeah, that's right, brick cheese, here it is. It's generally predicted to stay flat based on the current trends.

OK, so we applied the same thing to cheese that we did to milk. Now, that was one approach we took today, and an alternative we could have used would have been to create an interactive visualization. We could have done something
where you chose which cheeses you wanted to compare and they popped up on a Shiny plot, I think. Or, well, we could have done the same thing for milk. Both of those would have been ways to let people explore this data.

All right, so that was looking at dairy products in general: everything from dairy consumption by category to the most common types of dairy and how they might have been changing. Once we realized we wanted to facet it, we realized we really wanted a forecast, and we applied the forecasting method from the sweep and timetk packages. So again, I really recommend checking out the sweep and timetk tidy forecasting setup by Matt Dancho and Davis Vaughan. There are some really cool tools for working with tidy time series data.

All right, well, that concludes today's analysis. I was really excited to get to learn a bit about forecasting with all of you, and I'll see you next week.
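The recap above (nest one row per product, build a time series, fit auto.arima and ETS, forecast ten years ahead, then tidy with sweep) could be sketched roughly as follows. This is a minimal illustration, not the code typed in the screencast: the `toy` table is made-up random data rather than the USDA figures, the column names `product` and `lbs_per_person` are assumptions, and base R's `ts()` stands in for timetk's `tk_ts` to keep the example short.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(forecast)
library(sweep)

# Made-up toy data standing in for the tidied USDA table (NOT real figures).
set.seed(2019)
toy <- tibble(
  product = rep(c("Butter", "Yogurt"), each = 43),
  year = rep(1975:2017, times = 2),
  lbs_per_person = c(4 + cumsum(rnorm(43, 0.02, 0.1)),
                     2 + cumsum(rnorm(43, 0.30, 0.2)))
)

# One row per product, each holding an annual time series starting in 1975.
# Base ts() is used here in place of timetk::tk_ts (an assumption of this sketch).
nested <- toy %>%
  nest(data = c(year, lbs_per_person)) %>%
  mutate(series = map(data, ~ ts(.x$lbs_per_person, start = 1975, frequency = 1)))

# The two model-fitting functions compared in the recap.
model_funs <- list(auto.arima = auto.arima, ets = ets)

# For each model: fit per product, forecast 10 years ahead, tidy with sw_sweep().
forecasts <- imap(model_funs, function(fit_fun, fit_name) {
  nested %>%
    mutate(
      model_name = fit_name,
      swept = map(series, ~ sw_sweep(forecast(fit_fun(.x), h = 10)))
    ) %>%
    select(product, model_name, swept) %>%
    unnest(swept)
}) %>%
  bind_rows()
```

The resulting table has a `key` column separating actual from forecast rows and `lo.80`/`hi.80` columns holding the 80 percent prediction interval, which is what a line plus ribbon plot faceted by product would draw on.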
Info
Channel: David Robinson
Views: 3,784
Rating: 4.9603958 out of 5
Keywords: rstats, data science, tidy tuesday
Id: 13iG_HkEPVc
Length: 58min 53sec (3533 seconds)
Published: Tue Jan 29 2019