Python in Excel Makes Power Query a MUST-HAVE in 2024!

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
make no mistake power query and Python and Excel are like chocolate and peanut butter they are better together while Excel power query is nothing new let's be completely honest the vast majority of Microsoft Excel users have never even heard of power query and even fewer actually use power query in their daily work and that's a crying shame because power query is extremely powerful however I think power query is going to be something that's a lot more popular in 2024 because of python and Excel due to the following five words the Microsoft Excel data scientist yeah yeah yeah you heard that correctly I'm expecting some hater comments as a result of me saying that with the combination of SQL power query and Python and Excel Microsoft Excel is actually now realistically a full data science stack it is a technology solution that is applicable to a huge number of real world data science and data analysis scenarios to prove this to you I am going to demonstrate a real world scenario I'm going to pull around 330,000 rows of data out of a database into Power query and then feed that into python to excel that'll get pushed up to the Microsoft cloud it'll chunk around for a little bit and come back down and then we can do some cool things with it like running some data visualizations on the data set so I'm going to admit this is a pretty cool video I'm pretty excited about it so let's go ahead and get started the first thing I need to do is level set you on the environment so I'm recording this video on a Dell laptop and it's about three 4 years old now it has 16 gigs of RAM reasonable processor it's connected to the internet using a decent but not spectacular internet connection and the reason why this is important is because of this Python and Excel code actually doesn't run locally in your Excel workbook it actually runs in Microsoft's Azure Cloud which means you have to move the data up from your local workbook to the cloud let it run there and get the results back at the time of this video recording you can only move 100 megabytes of data from your workbook to to Python and Excel and back so one of the first things that we need to do is estimate about how big our data set's going to be to make sure that it fits within that constraint so what I'm going to do here is show you one way to do this so what we're going to do is we're going to go up to data and we're going to go over here to get data and there are so many options in power query for sourcing data it's one of the things that makes it awesome by the way anything that you can Source from Power query you can push into Python and Excel but in terms of analytics the single most useful thing is is likely to be from a database because most valuable data is stored in databases I have SQL Server running locally on my laptop so I'm going to say from SQL server and I'm going to fire that up and what I'm going to do here is say hey look my database is running locally on my local host and it's in a database called python in Excel and I'm going to open up the advanced options here and I'm going to go ahead and paste in some SQL so I got my SQL and look what I'm doing here I'm using Microsoft SQL server and the variant of SQL supported by SQL Server allows you to do something called top 1000 it essentially allows you just to grab a thousand rows of data and this is a one way to estimate how big your entire set of data is going to be to see if it's going to fit within that 100 megabyte limitation so what I'm going to do is say okay cool I'm going to just pick the top 1,000 rows and I'm going to go ahead and click okay and that's going to run and it's going to show you some data here now this is a data set that represents flights that took off from New York City airports it's got about 3 36,000 rows of data total and it's got varying kinds of data it's got numeric data it's got string or text data and it's Got Dat time data so it's a reasonable data set to play around with so what I'm going to do is I'm going to say look I want to load this to I want to go to load two and I'm going to select only create a connection because that's all I need I just need power query to set up the connection and then I will leverage this connection in my Python and Excel code so I'm going to go ahead and click okay and I get back my query and I don't like this name so I'm going to go ahead and click right click on it and select rename here and I'm going to call this top 1000 flights now what's cool is that I can go ahead and close this down and actually use that connection name in my Python and Excel code so how I get started in Python and Excel is I use the new py function so equals py and as soon as I hit open parentheses I get python mode and what I'll do here is paste in some code to leverage that connection so that we can take a look at the first thousand rows and estimate how big they are okay now you can see my code here which says look I'm going to load a panda's data frame from SQL Server via power query a data frame is nothing more than the way python represents an entire table of data and what we can see here is I'm using the brand new Excel function the Excel function is your gateway into your workbook data and all I need to do is just pass it in the name of my query what'll happen is behind the scenes the Excel function will look through all the named entities in your workbook tables and named ranges and power query connections it will find that top 1,000 flights is available and it will then load that into a data frame called flights so what'll happen is power query will reach out to my local SQL Server run the query get the data push up to the Azure Cloud bring it back and then what we'll do is we'll get the information about the data frame and the information will include how much memory how big are the 1,000 rows and then we can use that to estimate our overall size of our data frame so to run this code I just hit control enter and this will take a second to run not surprisingly and we get back information in the Diagnostics Pane and if I scroll down what we can see here is my memory usage is about 66 about 67 kilobytes so this is not going to be a problem I can pull all the data in because 1,00 rows is this 300,000 rows is about 300 times that and that's well below the 100 megabyte limit so I'm good to go so what I'll do now is I'll create a new connection so I'm going to go ahead and create a new connection from SQL server and once again it's Local Host and my database is called Python and Excel and what I'll do here is put in my SQL and this time I will use the same SQL but I'll remove the top 1000 okay you can see here I have my code and I'm going to go ahead and remove the top 1000 here cuz I don't need that anymore and I'm going to go ahead and click okay and I'm going to go to load two and then I'm going to say only create a connection click okay and I get my new query here and what I'm going to do is I'm going to rename it and I'm going to call this one just flights DB so that tells me that I'm pulling all the records just a naming convention you can name your queries whatever you would like now that I've got this query set this connection set up I can run some python code just like we saw before but with a different connection name to pull all the data now here's the problem by default Python and Excel is set up to run every single python cell code every single single time and that gets kind of boring and annoying to be quite Frank so what we want to do is we want to turn that off so how we do that is we go up to formulas we go over to calculation options notice that's on automatic that's the default that means every python cell is executed every single time so I'm going to move this to manual and that means that only the current cell the current python formula that I want to run will get executed okay so we're cool there so I'm going to go ahead and load all the data now so equals py open parentheses and I'll paste in my code here and you can see here I'm now using the flight DB connection so this is going to pull in about 336,000 rows of data and it's going to push it up to the Microsoft cloud run it and then bring it back and then cash it locally this is going to take a little bit of time to run I'm not going to let you just sit and watch the video but what I'll do is I'll start timing it for you so that you know exactly how long it runs even though I'll do a cut scene here okay so we're back and what we can see here is that we have in our Diagnostics pane 336,000 rows for example and what we can see here at the very bottom is that my data frame is around 21.8 megabytes so I'm like well within the 100 megab limit which gives you some indication that you can push quite a bit of data up to Python and Excel in the Azure Cloud but what's really interesting is now that we've got the data let's go ahead and do something with it so I can show you the performance for example of doing a data visualization so one of the things that we can do is use the Seaborn Library which comes out of the box with python and Excel and it is a great way to create nice powerful data visualizations so I'm going to go ahead and say equals py open exclamation point and then I'm going to copy some code into here paste some code into the cell and you can see here I'm importing the Seaborn Library as SNS and I'm going to create something called account plot in caborn account plot essentially is a bar chart where you're counting up individual values typically what you're doing here is you're using some sort of categorical data so for example I'm using my new flights data frame here which is now all 336,000 rows of data and I'm saying look create a bar chart counting up how many flights took off from each of the three New York City airports in the data set so if I run this code I'm going to get back a python object that's not really what I want I actually want to see the visual inside of excel so what I do is I switch over to an Excel value and that reruns the cell and you can see here I get a little chart now one of the easiest ways to see the chart much larger is to right click on it go to picture and cell and create a reference and that shows you the image in a much larger form so usually what I end up doing most of the time is bring up the cell like this the image like this look at it and then and then when I'm done I hit contrl Z to put it back so what we can see here is the count plot once again let me make that a little bit bigger and what it's showing us is this is Newark this is LaGuardia this is JFK you know they're all about the same I mean there's a little variations in the total number of flights that took off from each airport but in general they're all around the same they're pretty equally distributed now another thing that we can do is we can create histograms if you remember histograms from your statistics class if your experience was anything like mine you were like oh histograms I'm not sure how they're actually useful turns out in the real world they're super super useful let me show you a reason let me show you an example of why so we're going to go ahead and go to this cell equals py and then I'm going to go ahead and paste in some code make this a little bit bigger so we can see it paste it in and here's what I'm doing I'm creating a histogram of the from the flight data frame and what I'm looking at is the distance okay the distance which is how far did the plane travel it took off from a New York City Airport how many miles did it fly to get to its destination and what I'm doing is I'm saying look create a histogram of distance but create one for each unique value of origin and then what I'm doing is I'm setting my buckets to be about 500 miles wide so flights from 0 to 500 miles would be in one bucket 500 to 1,000 miles will be in another bucket so on and so forth so if I run this code by hitting control enter I get back a nice histogram but I need to switch it over to an Excel value once again runs the code and then what I can do here is let me scooch this up and then picture and cell create a reference and then you can see here a very useful visualization which shows you for example that typically speaking the planes that are taking from L taking off from LaGuardia 2,000 mies or shorter and notice here JFK we have a lot more flights that are over 2,000 miles including some that are very very far which would be potentially interesting to say well where are these flights going why are they so far and then Newark is somewhere in between laguard and JFK so this is an example of how you can conduct powerful visual data analyses using Python and Excel and in many cases you can create visualizations in Python and Excel that are just not possible to do without of the Box Excel charts now this is a very simple demonstration I loaded up a big data set I created some visualizations however if you check out some of my other Python and Excel videos on my channel you would use the same technique for even more advanced analytics for example things like building machine learning models performing logistic regression analyses or doing a cluster analysis there you have it a quick demonstration of how power query and Python and Excel are better together I've created this video to demonstrate the true power the true potential of python and Excel but I'll be the first to admit that Python and Excel isn't for everyone Python and Excel is designed for professionals that want to differentiate themselves at work by providing new and Powerful insights that just aren't possible without of the Box Excel to continue my content on Python and Excel my next video is going to address a common question that I get should I learn VBA ual basic for applications or should I learn Python and Excel and in the next video I will answer this question when that video is ready I'll put a tile here on the screen for you so you just click it so you can see that video and in the meantime I'll put up another one of my Python and Excel videos that I think you might enjoy okay that's it until next time please stay healthy and I wish you very happy data slothing
Info
Channel: David Langer
Views: 35,552
Rating: undefined out of 5
Keywords: pythoninexcel, python excel, python in excel, python for excel, pythonforexcel, pythonexcel, power query, excel power query
Id: 0ICD9zMMzZ4
Channel Id: undefined
Length: 13min 18sec (798 seconds)
Published: Wed Feb 28 2024
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.