Python In Excel: Microsoft Changed EVERYTHING

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
that nerds Microsoft just released python in Excel this allows you to run python code within an Excel spreadsheet and do some pretty Advanced analytics not only does this unlock more amazing looking visualizations but it also allows you to do things that python is better at than Excel such as segmentation or even machine learning well that's at least what Microsoft is saying so let's dive into a data set and explore more about this feature here's an Excel sheet on data science job postings it includes things like the job title posting time company name salary and even skills we're going to perform some exploratory data analysis with python I can enter python mode either through the formula bar or simply by just typing py open parentheses so first I need to import this data into a data frame which is pretty easy all I have to do is select the data I want and it Imports it in using this XL function now with it loaded I can scroll over and see that this data frame is a representation of all the columns and data in this Excel sheet let's get to work with some descriptive statistics I referenced the previous cell with our data frame in it and then used this dot info method in order to see more about it here we see we have almost 55 000 job postings and looking at columns for missing values it looks like the work from home has only 8 000 which makes sense because it's only filled in if it's true next we can see the types remotes everything is a string or object along with a date time for the posting and float for the salary column let's actually dive into this one but I don't want to keep referencing the cell as I may move it so let's load this data frame into a variable called jobs DF now I can run the describe method on this variable and it provides me these statistics showing me the min max and also the median and I can see it by hovering over it but actually it's easier to see if I change this output to an Excel value now I can more easily see things like min max and median and do built-in Excel features like conditional format also python cheat codes like this helped me more quickly analyze the data Vice writing individual formulas for everything I need with this salary data I want to take a step further and visualize it with a histogram so I'll enter python mode which you can also use a shortcut of Ctrl shift alt p and from there I'll name the data frame along with the column and run the dot hiss method on it and I get this tinyass graph which I could expand this cell to make it bigger but then it jacks up all my formatting so I right click it and under picture and cell I select create reference I get this picture and it's much easier to move and format around but looking at this graph it's pretty ugly I can't really tell what the x-axis is and it only has a few bins so it's pretty much not usable and so now we're getting to the point where we need to use some more complex python in order to build out these visualizations that are actually usable so I just add about 11 lines of code and get this which I think is pretty damn good at showing how salaries Peak at around 120 000 for data nerds but this required me to use some pretty Advanced python in order to build this visualization so for those that don't know python is this feature pretty much useless well not so fast I actually lied to you I didn't generate that code for that visualization on my own I actually used Chachi BT and I think that this AI tool can be used by non-coders in order to work with python in Excel so I developed this prompt that tells this chatbot everything it needs to know to work with this new feature I also give it the spreadsheet itself so it's also familiar with the data set going back to our previous example all I have to do is go to chat gbt and now that it's primed ask it a simple question such as how do I generate a histogram with the job salary column and for my original prompt it not only generates that visualization I should expect to it see but also all the code that I need to use to generate this so I just copy and paste this code put it into Excel and voila and for those keyboard Warriors out there they're like Luke you literally have your results right there in Chachi BT why are you wasting your effort going back into Excel to that I'd say I've literally been trying to tell you guys about how easy it is to do data analysis in chat gbt for the past four uploads but nobody seems to be listening so I just take this unnecessary step of copy and pasting this code and paste it right into Excel and get the exact same results but on an also serious note who is this product for and why all of a sudden is Excel the number one software in the world used by 10 of the population teaming up with python the number one programming language in the world okay so section number two but HTML everybody knows that's not really a programming language well I think we need to test it more to understand better what are its actual capabilities and more importantly who this is actually for all right so it's been a couple days and I've been exploring more about the functionality of python and Excel and I found a lot of amazing things coming out of this product but I've also found quite a bit of limitations so let's start with the good before we get to the back one of my biggest pain points in Excel is generating visualizations that actually helped me with my job as a data analyst specifically I care about statistical analysis well python has libraries for this basically things you can install into python writing python in order to visualize graphs super simple my favorite is Seaborn and it helps you generate all of these visualizations with minimal amount of coding so let's look at a real world use case of that histogram example that we did previously the problem I have with this plot is that it doesn't tell the full picture it has all those different job titles wrapped up into this one single graph and this really isn't good for me so I just started by simply adding job title to the legend or hue I got this graph which is pretty hard to read I can't really see the distributions of n anything behind the senior data engineering role I even tried a violin plot yeah no but then I found this work of art on Seabourn for this Ridge plot and this thing is pretty complex I mean look at all the lines of code it takes to build this bad boy so I gave chat GPT the sample code and then with the results from it fed it back into Excel and generated this work of art and holy smokes I'm blown away by this thing it not only shows the salary distributions flow upwards from debt analysts to data scientist to Engineers but it also helps me spot anomalies like this huge Peak for senior data analysts at one hundred thousand dollars so I can use the functionality of excel of filtering down tables to dive into this data I find that Cox Communications is spamming jobs for this role sometimes they list the same posting as much as 28 times in a day I can go back into the original import of my data frame and specify the drop duplicates method in order to get rid of these duplicate Cox jobs or really any others and this is where python is pretty awesome now all of those visualizations that use this data frame will then be updated in order to see these different distributions for what they really should look like comparing the old graph to the new one we can see this one has more of a normal distribution which is something I'd expect out of this data as hard as it is to admit I did find some visualizations that Excel is better at than python can actually do for example I wanted to use this country column in order to plot the distribution of jobs from around the world they have a lot of the core libraries that I use day to day but this geography one they just did in half so I ended up using Excel just to plot this instead but moving on we're not just limited to Unique visualizations python actually has some very strong capabilities in processing data so I have a column in this data set on the skills required in these job postings it's in the format of a list which Excel isn't really good at handling this well kinda power query which is the the tool used in order to import in and also transform data is really powerful at doing this but it takes a lot of steps in order to get the data how I need it and then I'd have to establish a data model and connect it to that but with a few lines of python I can clean up this data set and get a visualization that fits my need and I get this graph showing the likelihood of a skill appearing in a data science job post and coincidentally this shows python as the number one most popular skill and Excel is a ninth but similar to that histogram issue this is for all job titles so I use some built-in functionality of excel to create this drop down list of all the different job titles here we have data analysts where SQL is number one but then Excel and python are in the number two and three spots so I think this visualization at least for data analysts helps understand the importance of this combo of excel in Python and this is pretty awesome because now I'm combining the power of python with the functionality of excel so I can build things that non-python users can also use all right so that's been a lot of the good let's actually get now into the bad the Nerds editor Luke here and when reviewing this footage I realized I missed a major aspect of python that it annihilates excel at and that's machine learning the data that we have is pretty unique in that we have a salary column that has attributes about it such as title skill schedule and country that can be used to predict salary and this is where machine learning models come in specifically I used a linear regression model from the scikit learn library in order to predict salary I can then use the functionality of excel to plug in things like the job title the job type the country that's located in along with a list of skills if we wanted to look at what a full-time data analyst makes in the United States with the skills of excel in Python we can find it also if we wanted to compare this to another job such as senior data analyst we can do this as well and as expected the salary is higher and this dashboard right here is blowing my mind because it makes it so easy now to build machine learning models with Python and Implement them into Excel to give out to my colleagues they're going to have a blast with this all right let's now get into those limitations and get back to past Luke oh and I'll start with this this is a beta feature in Excel so they're not going to have everything right there's going to be some limitations that we're going to run into and a lot of these I expect Excel to fix but onto the bad like when I run this d-types method and try to figure out what the types of data I have it just gives me a bunch of Errors also so you understand how this code is executing so python is not running locally on your computer instead they have what's called a Sandbox environment up in the cloud where python is running and then those results are fed back to you so sometimes this python environment crashes and you have to go in and actually reset it and it takes a little bit of time to get set back up next is data size limitations so I try to initially load in about a million rows into a data frame in Python and I got an error message saying I can't do this I was able to eventually get around 400 000 jobs in but the processing time in the sandbox environment was just so slow so my data set of around 50 000 jobs seems about perfect for what's needed for that environment the last major issue is that you can't use any python code to connect to external resources such as files or apis or even web pages typically with python I want to connect to things that have data in it like databases and csvs so this can't really do it but I don't think that's too major of a limitation because of the power of power query and we can use this instead to import and even clean our data and then use Python to connect to this data source so what's the big issue then well I just wanted to have three things to complain about anyway so who is this python in Excel for well I made a fancy graph for this that I want to show you in order to explain this so This x-axis right here is your skill level in Excel are you a newbie to this tool or are you on Bill Gates level similarly we have python on the y-axis and are you a newbie to this or are you on Guido's level we're going to start in this quadrant right here and this is for all the python nerds that are forced to work with all those colleagues that won't move out of excel this tool is great for you because now it allows you to flex your python skills at all your Excel colleagues and show them how inferior they are okay that was a joke mostly all right next up is this portion down here for Excel nerds that don't really know that much python you know these guys they sit there and brag about the solutions that they built in Excel for these nerds I don't see it really unlocking a lot of new features that they don't already know how to do in Excel so so did I'm using this too much so what about these hardcore Excel and also python data nerds well frankly they don't exist they're a unicorn they tell you exist they're lying to you run but what about those that are new to both Excel and also python well I'd still also recommend using this feature in Excel for data analysts specifically I would prioritize Excel over the python but I think it's still good to focus on learning both of these together and specifically using tools like chat gbt in order to build prompts as I have a theory about this let me explain all right so I think that Microsoft has coincidentally releasing this feature of python in Excel right near when they're also going to be releasing copilot for Excel and somehow I think these two are connected now co-pilot and Excel oh my gosh now copilot in Excel is going to be built on a very similar model as Chachi BT open AI is generating these models for child GPT and also for copilot so I think learning one is going to be transferable to the other so I think it pays dividends if you spend the time now to start learning chat gbt now this relates to python because now if we have this copilot in Excel it's going to be much easier for this large language model to write python code to generate different visualizations than actually trying to go in and control the Excel functionality even if this Theory isn't correct Microsoft is still going to be using a chat bot inside of excel so it pays to learn this so if you're curious to learn about how I use chat gbt as a data analyst check out this video right here and with that I'll see you in the next one
Info
Channel: Luke Barousse
Views: 445,737
Rating: undefined out of 5
Keywords: data viz by luke, business intelligence, data science, bi, computer science, data nerd, data analyst, data scientist, how to, data project, data analytics, portfolio project, sql, excel, python, power bi, tableau, data engineer
Id: cOhGrBwZZqE
Channel Id: undefined
Length: 14min 36sec (876 seconds)
Published: Wed Sep 06 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.