Implementing linear regression in Power BI

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
ciao friends in this video we will talk about the function for computing the linear regression index we have two functions Line St and line sdx but these two functions are particular because they return multiple values and in order to do that they return a table and in order to consume the different arguments we need usually in a formula in a measure to display a line in a chart we have to extract different values with the technique using select columns so something unusual that we never seen in other Dax functions so it deserves a video and now let's start with the demo the linear regression is a function we usually use in chart for example we have a line chart where we have the average price by line quantity so depending on the quantity amount we have in each transaction we can have a different price and of course we are considering the average price for all the transactions whereas probably you will have in your particular report as slicer selecting a product category or a product model or a product name now we are not interested into the interaction now just in the calculation how can we obtain in this chart the linear regression which is a line that basically applies the linear regression formula that in case of Dax can be obtained with a function it's called line sdx so if we take a look at the documentation of this function line sdx so we see that we can provide a table with the the list of the values we want to compute usually the list of the values for the x-axis however what we are going to get is a list of two arrays at least two arrays one that has the list of the expression for the y-axis and one that has the list of the expression for the x-axis now you see that this argument is repeatable and we're not going to see how this works because actually this is after the scope of the video how to obtain the linear regression with multiple values for the x-axis this is not what we want to do in this video but I suggest that you take a look at the examples to see more advanced examples on how you could use the linear regression in this case we start with the two arguments so expression y expression X which in our example correspond to what we have and let's go back to the initial data what we have in X and Y axis for each position so you see that we have basically 10 positions and for each one we have a number that corresponds to the value on the x-axis the number that you see one two three four five six seven eight nine ten and the value for the average price of course these values could be different we could have a very different a very different set of data and just to give you a reference about which algorithms are implemented what is the algorithm that is implemented by the linear regression function we have links that also we included in the article for the least squares algorithm that is used by this function and there is an interesting article on Wolfram on the website wall from where we have an explanation a more detailed explanation with graphical representation of the algorithm that is used so we're not going to see how this work our focus is that okay assuming that we know what we're doing and we want to use the result of the linear regression because at the end we just want to display and let me go back we just want to display this chart our goal is to obtain a single measure that generates the value the linear regression value that we see used to plot this line this line is almost horizontal but of course in in other cases it could be very very different so let's take a look at the function line sdx and in order to do in order to see how this function work I can create a table so I go in modeling and I go in new table because line STX is a function that manages a table and returns a table and the easiest way to see that is to just try to execute this function so line STX where we want to compute for each value of the x-axis we want to sorry for each value of the table we want to provide the different data points so first of all we provide the the table that we want to iterate and let's start with the all selected of the sales quantity because in this case our particular chart has this value on the x-axis and you see that the second argument is the expression we want to provide on the y-axis so for each for each row of the table passed in the first argument we want to provide the expression on the y-axis that in this case is the average price measure that we already computed before and on the x-axis or the x-axis we could have one or more values in this case we provide only one which is exactly sales quality and we use the column reference in this case for this quantity line sdx returns a table so the result of line STX is not something we can use in a measure indeed I created a table because I want to show that the result we have is a result that has multiple values you see in slope one intercept standard error in slope one solidarity intercept and so on there are many information provided as the result of this algorithm but of course we need to create a single measure what we have to do for the linear regression is that basically we have to create a formula that provides for a particular y we want to resolve the formula that has X multiplied by the slope Plus intercept so this is what we want to get as a result now if you look at the results provided by line sdx we have slope 1 and we have intercept which corresponds to this oops I made the mistake to this slope parameter that we multiply by the x value and the intercept offset that we have to provide in this part of the formula so the challenge at this point is that line STX Returns the table but we had to create a measure that uses two of The Columns of this table in order to generate the result so how can we do that so if I copy this function in a measure so let's go back to the to our initial design you have already seen the solution and I'm gonna create another measure to generate the same solution so let's call this linear regression I don't know if it is the same that I already used see if I can type this and I type the same line sdx now this will return a table and of course this is not what we need so what we do is we store the result of this linear regression measure into a variable and maybe instead of line I think we can just call it line so line equal to line sdx now at this point line is a table we have seen before that line is a table that has multiple values but what we need to get is the slope which will have a value let's store zero at the moment and intercept that we have to extract from the table that we obtained before and at the end we have to create a variable that is our y value which is equal to X multiplied by slope plus intercept so what is and and of course in the end we want to return y now what is our X again we can create a variable to complete also this part and our X is the value we have in sales quantity but we are creating a measure so the measure is evaluated in a particular position of the chart we can use our selected value function that Returns the value of sales quantity that is selected for the current position in the in the line chart the current state of the filter context basically so whenever we have a single value for sales quantity selected we have a value for x that it is not blank and now we have to extract from the line table we have defined here the value for slope and the value for intercept because we know that the table has only one row what we have to do is just to extract the column that we need and so we can do this with the function select columns select columns can get as a parameter the table that we want to pass so in this case line and the second argument is just the column that we want to return so we can write for example slope is equal to the value of slope one now you see that actually this version of power Bia desktop still doesn't recognize the result of line CX and line STX and this could be confusing because actually you don't understand that slope 1 actually exists in the result and actually I don't need the column name I just have to extract the column slope 1 from the result of line one I repeat this four also intercept and of course I replace this with the value of intercept and remember if it is one or not so let me save this measure I go back to the table definition of table 3 where I Define my table three and you see I have slope 1 and intercept as a name so actually it should be correct so let's go back to the measure that I Define and let's see what I have that is not valid oh return T is actually return y and now I click enter you see that now that the yellow line disappears The Measure is valid even though I see these errors in in the Dax editor of power bi desktop this is a bug of the current version of RB desktop that hopefully will be solved soon so now let's go back to our chart so my linear regression measure now should work I put the linear regression function here and I see my line displayed so the main takeaway is the following let me recap first of all line sdx returns a table so because it is a table the best way to understand how it works is to try to execute it index Studio or in a table expression using a calculated table just to see what is the Excel name of the results of the columns that we have in the result then when we have to consume these arguments usually we use at least a slope and intercept but we might want to use the other if you have a more complex statistical calculation to implement in the measure we do exactly the following we use a select columns to extract from the table that we stored in a variable so we don't execute lens STX multiple times the value of the column that we want to extract so this is the behavior of line sdx and Lan STX like Line St performs the calculation using the information provided usually we will use a mainly line sdx because we can control the table that we provide here otherwise you always apply the calculation over the entire table and only providing column references instead of measures like in this case or or just column reference like we did in this other case so Linus DX I think is more common if you have to implement these calculation but both line SD Line St and line STX Returns the same result a table with one row multiple columns we have to extract using select columns so we have seen that the line STX function is a statistical function that provides multiple results this function uses a single table with one row and multiple columns and the best way to manage the results is to store the result of this function in a variable and then you select columns to extract the individual values from this table this single row with the multiple results take a look at the documentation to see how to use the multiple parameters for the linear regression we usually only use two of the results but of course we have many more for more complex statistical evaluations and as usual enjoy ducks [Music]
Info
Channel: SQLBI
Views: 22,368
Rating: undefined out of 5
Keywords:
Id: EmsR8lc7w78
Channel Id: undefined
Length: 13min 52sec (832 seconds)
Published: Tue Jun 06 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.