Power Query Running Totals - The Right Way!

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
in this video i'm going to show you a technique from phil my partner in excel crime on how to add a column containing a running total in power query that doesn't suffer from the performance issues you get with the common approaches you're likely to find on the internet we've tested it on hundreds of thousands of rows and it evaluates in a couple of seconds let's take a look we all know that a running total or cumulative sum is calculated by adding the previous value to the next and writing these formulas in excel is easy but creating a running total in power query is not so straightforward at least if you want a fast usable query that is i've got a very simple table here that only contains the values that i'll use to illustrate the different techniques notice there's a hundred thousand rows of data so i'll load the table into power query on the data tab from table range a common approach you're likely to come across is to use list dot sum and list dot first n functions but this approach is extremely slow when working with a lot of rows and because of that i'm going to filter to just keep the top 5000 rows and that will make it more manageable this technique requires an index column so i'm going to add column index column and i want it to start from 1. next i need to add a custom column for the running total we use list.sum with list.firstn and list.firstn takes two parameters and returns a list as its output the first parameter is a list which in this case is the values in the value column at the previous step which is added index the second parameter is a number or condition that tells list.first n how many elements in the list to return i'm supplying it with a number specified by the index column that i added i'll click ok and you can see there's my running total so what's happening here if we look at the third line where index is three list dot first to n returns a list comprised of the first three items they are values 36 63 and 89 and then list dot sum adds the values together to give the result 188 and as you can see by doing this for every row we generate the running total now what makes this approach slow is that for every row in the table the value and index lists are re-evaluated it's like it gets here and it forgets that the lists exist and then for every row it goes oh i better go and see what these lists are how many items in them and check that they contain valid data etc and you can imagine doing this 100 000 times for two lists takes a lot of time let's give this query a name i'll call it table with index now you can speed up the query by using list buffer so what i'll do is just duplicate this query and we'll modify it to include the buffer so let's give it a name and then in the advanced editor i can add a line to buffer the list so after this line here i want to add in my buffer we'll call this step buffered values so list.buffer is just referencing the previous step and it's grabbing the value column from that step then i can copy this name and in the last step i want to replace the reference to the added index step with my new buffered values list and that's all we need to do i'll click done it doesn't look any different but it runs significantly faster what's happening here is list stop buffer loads a list into memory where it stays while the query is running most importantly the query doesn't forget about the list so every time you refer to the list power query doesn't have to go and evaluate it all over again this makes using buffered lists significantly faster than non-buffered there are some trade-offs though buffering breaks query folding so queries to databases can actually be made slower by using buffering but in this scenario query folding isn't being used so there's no downside however while this query is a lot faster than the original non-buffered query there is still an issue because the query is using an index column to indicate how many rows or values list dot first n should return and hence be summed by list dot sum the query is still having to evaluate the index list for each row and trying to buffer the index list actually doesn't help even if you did create a buffered index list you'd need to explicitly keep track of what row the query is currently operating on and then use that variable to access the buffered index there's just no mechanism to do this in a query like this being able to work with lists that are all buffered and explicitly track what row you're working on will give you the best performance in this scenario and using list generate allows you to do this based on a set of initial values and rules that determine how those values change list generate creates a list of values then iteration or looping is controlled by a loop counter so you aren't evaluating a list on each iteration and you therefore don't need an index column for this approach now you may find this next example quite advanced but you can copy and paste the function i'm about to write and reuse it in your own files so no need to be able to write this yourself but i'll still explain it as i go we start by creating a custom function in a new blank query and then in the advanced editor the function will take an input i'll give it the name values as a placeholder this is a list and it will return a list there's just one step called rt which is short for running total and i'll use the list generate function which has four arguments first i need to provide the initial values and because power query starts from a zero base the value at position zero in the list is the first value i want to sum and because i don't have an index column we'll declare a variable called counter to use as an index for the values list and it also starts at zero while the counter value is less than the number of elements in the values list continue generating the running total in the next running total value equals the current running total plus the next value in the values list and it increments the counter by one and now add the new running total to the running total list so to summarize this will generate the running total and pass in the list of values as the only parameter the function returns the running total as a list so i'll click done now you can see the function in the list of queries let me rename it fx running total so i'll duplicate the previous query and we'll strip out the last two steps so we're back to just the first 5000 rows of the values and then in the advanced editor i'm going to add the code for my custom function so after the last step we add a comma and on the next line i need to put my values that i want the running total for into a buffered list to help with performance and i do this by referencing the previous step and specifying the value column which is the column containing the values i want the running total for now if your column name is different to value then just change it accordingly in this code so another comma and on to the next line here i'm going to add the running total with the table dot from list function we'll call this rt here we call the running total function and provided with the buffered values we created above the next argument is the splitter now this should be optional but the function fails if you use null here or you leave it blank so i'm going to use the splitter split by nothing function to satisfy this and lastly i'm going to call this column rt for running total now i need to join this new column to the original columns in the table albeit there is only one in this example however in your file there's bound to be more we'll call this step columns i'll use list.combine with table.2 columns to get the columns from the kept first row step and add it to the list created in the rt step above now i need to convert these two lists into a table and i'll use table dot from columns for this we'll call this step to convert it to table that's pretty self-explanatory table dot from columns is made up of the list called columns in the previous step and the column names are found in the kept first row step and i'll call the new column running total the last thing i need to do is simply copy the name of the last step and place it after the in i'll finish the hard coding here and we'll close the advanced editor now all i have left to do is set the data type here which is whole number now if we look at each step you can see the stages so let's start at buffer values this was the first line of code that i entered this extracts the values column and converts it to a buffered list that i can then reference in the custom function at the rt step i invoked the custom function that calculates the running total and you can see that here then i combine the original columns and the new running total into two lists if i click on them and you look down at the bottom let me make this a bit bigger you can see this is the list of individual values and this is the list of running total values i then converted that into a table and lastly set the data type for the running total to a whole number now this is currently only operating on 5000 records so let's see what happens when we change it to say a hundred thousand so now there's no filter on the list and if i jump to the last step you can see it's instantly calculated the running total so as promised this is lightning fast and although the code is quite advanced it should be a case of copying the code from this file which you can download from the link in the video description and simply changing the references to match your files column and step names well thanks to phil for this technique i hope you found it useful you can download the excel file for this lesson from the link here and if you like this video please give it a thumbs up and subscribe to my channel for more and why not share it with your friends who might also find it useful thanks for [Music] watching
Info
Channel: MyOnlineTrainingHub
Views: 24,835
Rating: 4.965909 out of 5
Keywords: power query, power query running total, running total, power bi
Id: 7kFZw8zimCU
Channel Id: undefined
Length: 11min 53sec (713 seconds)
Published: Mon Nov 23 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.