Fast Running Totals in Power Query (Complete Guide)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey friends welcome back for another fun video in this video I'm gonna delve with you into running totals and we're going to start basic and we're gonna see what kind of stuff we're gonna do in power query to make our running totals perform blazing fast so after this video You're Gonna know everything you need to know about running totals stay tuned [Music] so for those who don't know me my name is Rick from bi gorilla and in this video we're going to look a little bit into running totals and power query you might have some requirements where you want to calculate the cumulative sum of something and in the beginning when I started out with this I thought it was difficult but most of all it was annoying that those were always slow let's Dive Right In and see why that happens and what we can do about it so let's work with the data I have on the screen here I just have six different rows and an amount of the number of sales we've done so here we see the sales amount and let's say we want to create a running total here what kind of issues do you bump into when doing that okay let's first have a look at two methods that I see most used and that are generally pretty slow first of all whenever you work with a data set like this you always need to think about the sort order because if you have a running total you want it to be in the right order I'm gonna make a copy of this one here and we're going to call this one let's first look at the list first n and I cannot have a dot in the name here but we can just do list firsthand so the first important thing when doing running totals is to sort your data okay that's done the next step is we need to make sure to calculate the cumulative sum and what does that mean that means that on the first row we want to add the sum of just this one on the second row we want to add the sum on both 14 and 18 and on the third row we want 37 in there as well to get your cumulative sum now the easiest and slowest way to do this is to add a column from one and we're going to use this column to say I want to make a sum of the first value the first two values the first three values the first four values how does that work let's say we add a custom column now in this custom column if I can move it around I seem not to be able to can I do it now yes in this custom column we can use the list first n function to now the list first n function lets you retrieve a list of values and then just lets you tell how many of those values you actually want to see so what is the list of values we want to work with it's the values in the amount column and I can reference that value by first mentioning this name of the table step the name of the step which is a table it's called edit index and then we want to reference the amount column so we can write edit index and then between square brackets I write them out this is my first step running total I will call this so this creates a column that contains a list with the values of the amount column and we can see that by clicking on the white space next to the list and as you can see the value is 14 to 30 are all in here and that happens for each of the rows now you can work with this if you change your formula here and add list first n this function wants a list as input and lets you return only some of the values in the list so we first start again with this whole list of items and then we can say I want to have the for example the first one or the first two or the first three items conveniently we have this index column which we can reference so I'm going to reference index if I close my parentheses we're going to have the same column but then with different items in the list so our first row has only a single value the second one is two values three four five six so as you can see on each of the lines if we would want to do a running total we now have the values that we need to add up and the only thing left for us to do I'm going to put this in a new line is to wrap the list that we have here in the list sum function and by clicking we now already have our first running total here so 14 is actually the 14 right there and if we want to have let's see oh that doesn't look too good uh so we start with 14 and then we get to 14 plus 18 is 32 and then the first three together is 69 Etc that's our running total perfect you'd be done you think in a similar fashion I'm going to copy this query duplicate and we're going to change this around just slightly in a similar fashion we can also use the list range function so we can reference the the column name again by writing and here we go edit index and we have the amount column s and this first generates the list that we just had as well and now instead we can write list range and the list range function just like list first n it requires a list to start with and then you can give indicator range that you want to return now the second argument asks how many of the values you want to offset for your range I'm going to say I want to start at the beginning and then how many items do you want to return so similar as list version n i could write the index column because the index column will show us how many of the values we want to retrieve so similarly now we only retrieve the values we need for the running total and wrap this in the list sum function and we have a running total okay so Rick what's the problem you're asking me because we now actually have a running total right you're correct so in principle if you don't have too much data this farm this setup here it works just fine now let's say you have a lot of data there's two big problems with these approaches let me just rename this one as well here so we have list range the two big problems that we have I can indicate by just going back a step here so let's say we are back at the number of uh at the list values that we want to want to sum up so let's go back here so in the first line we want to sum up 14 then 14 and 18 14 37 Etc now the big problem here is first of all we always have to keep on retrieving this data so for each of these steps power query has to go back to the edit index step and find the values in the amount column and because it repeatedly has to do this it is going to cost a little bit of extra time to do this so the bigger problem here is that power query does not hold these values in memory now there's an easy fix for this let me show you so instead of referencing this uh this step name here called edit index and the amount what we can also do is we can just after this edit index step we're going to add another step here and over here we're first of all going to reference that amount column so far nothing changes we just have this list here but you can tell power query to keep this in memory and by telling it to keep it in memory you're going to already see a substantial speed benefit and you can do that by using the list buffer function so simply by wrapping this in list buffer the result stays the same but Power query puts it in memory for so for that the calculations actually can can make use of that in-memory list of values I can call this buffered values and in our latest step here instead of well first of all of course it now references the wrong step so if I want this to work I can go back and write edit index again and same for this one and it's index so this style still works but to make sure that it uses the the list we have in memory you now need to adjust the first argument of list range and over here you can now write buffered values and the good thing about that is is that the result here stays the same but that it uses the in-memory list one of the big performance benefits already okay so what is another problem of this approach so the first problem we solve now with the buffered values but maybe a bigger problem here is that we're doing a lot of unnecessary calculations what do I mean by that on the first step that we are that we have we need to sum up 14. that's okay on the second step we again need to sum of 14 and 18. then we need to sum up 14 18 and 37 14 18 37 and 35. and imagine doing this for a big data set where maybe you have a million of rows even if power query holds these in memory the calculation still requires you to add all of these numbers again and again and that's a big problem because these are unnecessary calculations that that costs you the compute resources of your computer so what can you do instead well let's imagine that after each of the running totals for each of the rows we could save the result of that so that when you get for example to number two you have 14 plus 18 is 32. but let's say you're at 32 the third line what if you could simply add 32 plus 37 so instead of adding three values you're now only add two values and what if you would get to the fourth line here and you would still remember what the addition was in the previous line what the running total was up to that moment and you would only have to add the latest number to it that would save you a lot a lot of compute resources especially when you get to let's say 9000 because instead of adding up a thousand separate steps it would just have to pick the running total from the previous step and add a single value to it now that's what we're going to look at now to make sure we can also work with very fast performing running totals so the slow running totals were at least first analyst range and now let's have a quick look at a quick performing one and the one I'm going to show you is from the list generate so let's generate here I'm going to take some steps back but it does not require us to have an index okay so with this data set in place I already see that the data is sorted by date and also for Less generate I'm also going to use the buffered value step now what's going to be different now is let's generate allows you to write away the sum of a certain step and the sum of that step I can reuse in The Next Step so let's give that a try so to get started with that I'm going to create a separate step here and it starts with list generate and if you don't know how that works I have a video about that and let's generate first wants you to create a starting value so we're going to create a record and we're going to create running a running total this is going to be the value of the running total and as a starting value my running total should equal the sum of only the first value here so I could write my routing total equals then I'm going to reference the buffered list from the previous step so I could write buffered values and only the first value and then something else I need I need something so power query so let's generate notes when to stop doing the running total calculations and I'm going to make use of a counter for this so at the starting value my counter is equal to zero but this is going to increase with each of the running total calculations we do so this in principle makes the the start or the initial value of let's generate argument requires us to write for the condition for how long the running total should run for so I could write something like okay I want to make a running total as long as this counter that we have that is going to increase with each step but as long as that one is smaller than the number of values in this list so I could write list count and I'm going to count the amount of values in the buffer values list so I write the buffers values okay that's all that we need for this and I'm saying smaller than because the power query index starts at zero so because we start at zero the index is going to end at five another six and that's why we need this list count to be smaller than things get exciting because the third argument in list generate requires us to indicate each of the steps that it takes what's your power query do so what I should do is of course Define the the height of this new value so I'm going to have to Define what the running total will be and our running total will be the value of the previous running total value plus an additional value of the next row so I could write here my running total value equals the running total of the previous row and I'm going to add here I want to go to the buffered values list and then I want to find the next value in the list and normally my first value is 0 here which equals the counter but now I actually want to find the next value which will always be the counter that I have with n plus 1 value so I could say pick the previous counter value which was Zero and add plus one and then by doing this you find the next value in your list so by summing up the previous value in your the previous running total and the next value that we have you find your next running total value and what's left is we still need to Define our counter and the counter is the value of counter and then plus one that's all that we need now with this we have the the draft ready for our list generate function now when we press OK the result of this is a list of Records so we started with a record and we also defined what the field should look like in the record but the result of this is also a record so our running total the first record chart with 14 then we go to 32. 69 104 138 and 168. so that in fact is the running total over here now if we only want to return one of the columns there is another argument that let's generate which is the selector it's optional but it's very useful if you want to return just a partial part of a record or if you want to do a calculation on top of the already generated list but in this case I'm just writing that I want to return the running total by writing each and then the running total column name and here I have an error why is that the field running total of the record wasn't found ah I have a space tether and I should remove the space okay so the buffered value step was earlier just like this and this now creates our the running total list great now the question you might ask yourself is okay list first analyst range resulted in me having another column in my base table and now we're looking at a list of values which is separate from my table so how are we going to make sure those fit together well before we get there let's first just have a quick look at why we did this and what the effect is so the reason why we did this is that list generate is able to save a value in its memory and only add the next value to it which is going to give us a lot of savings in calculation time especially for larger data sets that's why we're doing it another benefit is since we're only having to do this once we only since we only have to calculate the values one by one that's also a Time Saver but the big downside is we now only have a list that's separate and we still need to stick those together so how can we do that well let's have a quick look because we actually already have a source table here I'm going to call this one data set or uh main data why don't we call this main data and this main data what we want to do with this we actually want to attach the running tool list to it and you can do that in several ways but I'm going to show you a way that's pretty short and efficient and that's the following so if you have a main table like this and you want to attach something to it there is not an easy function for it because by adding a custom column you're actually and if you would add this list to it you would actually add all the all the values of the list but what we could also do is we first put all of those all of the columns from the main table we transform that one into a list of list values well we already had a list of running total values and then we can combine all of those together and transform it back into a table let me show you what I mean so if I go to a new Step here and our first step was would be from the main data part so this step is now called consolidations this data here I don't want to have it as a table because there's no function available where I can easily attach a list and then just hide this from the list in here so what I could do is I could write table to columns and what's going to happen with these three columns is I'm just going to have a list of lists here so this is the first column the date column this is the second column and the amount column now what I can do is on top of this I could write the Ampersand and then I can write running total list now that's not completely what I wanted to because this was a list with lists so by adding this list value here you have three list of lists and then the values so we need to wrap this in another query bracket here we go so so far so good and we have attached these together and I was left for us to do is to create to to transform these again to a table so what you can then do we can write table from the list and table from list table from list wants a list of lists which is what we just had here so and what they didn't want is to still make sure you get the the right column names but let's first try this um so we could for example say table dot column names the next argument is the column names so the column names of our main data and then we want to add another column called running total type function and now I'm gonna just have to quickly Peak because there's something wrong here so I go to my running total a table to column table column names I'm just going to peek a little bit here and put it back right there ah one mistake I made here we need to table from columns not table from list table from columns there we go so our table from columns here combines the main columns here with these and as you can tell most of the data types have been consistent so they remained so in my main data I add data types and those are still the same here but there is one thing that you might notice and that is that the running total here does not have a data type and the reason for that is that when we try when we actually transform these columns into a list they knew what data type was here but at the moment that we transformed this buffered value list into a list right here it didn't actually remember or it didn't actually know what kind of data type it was so what we can do is still make sure we Define a data type here and we can do that and I'm going to show you what code works for that by writing here we go you can use the value replace type and then actually give a type to this uh to the list that we have so I can incorporate it right here so we have value replace type and then we have the list that we had before and here comes the fun part we can then say type and as a type you could write type list but we don't Define what kind of items are in the list but you could also open a curly bracket and then say like I only have whole numbers in here so the N64 type and then you close your brackets again and you close your brackets again and let me see if I replace type I need to close these brackets and in principle you have now made sure that the list of values received the data type and by doing that that actually allows you to combine the different lists but also remain will also keep the data type or also Define the data type that you need okay so that is the first fast method that we have and it's fast because we don't have unnecessary calculations and we keep all the values that we use in memory as a buffer value here now I have another way in which you could do this and that this would list uh list accumulate very top formula so I'm just going to show it to you but the other way we could do with this like this so we go to just a running total step so let's accumulates is very similar to Let's generate the only difference being that we don't make a condition of when the generation of a list ends but we actually have a fixed amount of items in uh for which a list will be created and in this case a list accumulate starts with a starting list and this start this list we're going to add up I'm going to skip the first value here that's why I have it here I'm going to skip the first value because my starting value is already the first one so the 14 I already have it so the list we're going to iterate through I'm going to skip it but because I want my calculation to go on until the last one I just added an empty value at the end of the list okay second argument of list accumulates the second argument shows what the starting value or the the seed or initial value is now since we have multiple I create a record and my running total value it starts with nothing so it's empty at the start but then the current value that we're going to use is the first value in my buffered values from the previous step then we get to the third argument the third argument will let you know what to do with each of the steps that list accumulate iterates through and what it shows is that it starts with the current value and the current value is actually the sum so the value that the current value has the first one and then on top of that it adds the current the current part and this current one is always a value from the list that we have in the beginning here so whereas the first time we iterated we just have this first value the second time that I did write it it also got the first value from this list which resulted in our in adding things to our running total and at the second part here I'm saying that the running total result and we'll create a list so I'm starting with a value here which is the empty list and then I'm just as a list adding all the current values which will result in our running total list there's one thing that I haven't mentioned to you yet which is important to make your Solutions stable if we go back to this list generate example this was our definition of it but there's one problem with this if we go to our main source and over here I would remove the value 37. and then we change the data type to amount we're going to see that one of the values has a null value which is different from zero so instead of having a number that it can add up it has a missing value now missing values have a particular characteristic so if you pick like a value like 18 and you add plus null the result is null what does that mean for our function if I go to my running total list you will find that it takes 14 32 and after it reaches the null value no other additions are being done and I find this a a pitfall of the current setup of this function so one slight adjustment that I would do is instead of making sure to add numbers like we're doing in this part you can also use the list sum function and the list sum function has some error checking in there that at the moment you have a null value it actually allows you to treat it as if it was a zero or treat it as if it wasn't there so let's let's try that out I could write the list sum here and list some once you wants to add a list of values so you open a curly bracket then we don't need to add this plus sign but we can just add a comma because the second value that's right here is simply going to be an item within the list and all of the items within the list will be added anyway then we need to close our curly brackets and close our parenthesis and by doing that the null value that we had earlier here it actually doesn't impact our calculations and it just makes it a little bit more robust now that was a lot for a single video um what you learned was that there's pitfalls that they're slow and fast methods I recommend the list generate version and if this was a bit too quick for you on my website you can find an elaborate article and you can copy paste both the code and download the file now you might also be interested in doing this calculation not just for a data set that we have right here but maybe you want to do it by category or by group well the good news is I have a video coming out for that as well and the blog post is out already so if you want to if you don't want to miss that one make sure to subscribe to my channel and with that check out the next video to know uh how to get your group running downloads as always thanks for watching and I'll see you in the next one [Music]
Info
Channel: BI Gorilla
Views: 11,055
Rating: undefined out of 5
Keywords: power query, power query tutorial, power query for beginners, running total, running total power query, cumulative sum power query, create running total power bi, power bi runningt otal
Id: RIvKaqGGB3w
Channel Id: undefined
Length: 29min 16sec (1756 seconds)
Published: Wed Oct 26 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.