Automate Excel Reporting Using Python (Real-Life-Example) | Pandas, Plotly, Xlwings Tutorial

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Hey guys, In this video, I will be sharing my favourite ways to automate Excel using Python. In particular, we will be using the open-source tools: Pandas, xlwings & plotly. After this video, you will be able to create a custom Python script that allows you to combine excel files & create charts out of them. And without further ado, let us get started. First things first, we need to install the required dependencies. I will upload this Jupyter Notebook also to my GitHub page. So, if you want, you could remove the comments and run this cell to install the packages. Alternatively, you can install the libraries by typing pip install followed by the package's name in your command prompt or terminal. So pip install pandas, openpyxl, plotly-express, and pip install xlwings. Once we have sorted this out, let us start with a simple example. The main purpose of this example is to merge multiple Excel files, perform some calculations on the data and visualize our result. Let me show you what I mean by this. In the same directory of my Jupyter notebook, I have stored the examples. If we have a look at the input folder, we will see 5 excel files. Each Excel file contains the financial data for the respective country. For instance, if I open the file for Canada, we can see business transactions by Segment & Product. Additionally, we have got a lot more information, like the total sales, cost of goods, profit, and so on. The data for the other countries looks exactly the same. Out of the five countries, I would like to know which one has generated the highest Sales and Profit Margin. To find this out, I will switch back to my Jupyter Notebook. As the first step, I will import the libraries. Pathlib is a Standard Python module, so no additional installation is required. I will also import pandas as pd and plotly-express as px. Then, I am going to declare the path of the example files we just have seen. As we will also be exporting an interactive chart, I will also declare the output directory. To combine all Excel files, we only need three lines of code. As a first step, I will create an empty dataframe. Afterwards, I iterate over the input directory and append each excel file to our dataframe. Keep in mind that this only works, as all the files has the same structure. To validate if our dataframe contains all countries, I will print out the unique values of the country column. As we can see, we have got all five countries in the dataframe. Now, we could go ahead and use all the powerful tools pandas has to offer. For this example, I will be grouping the data and calculate the sum of the Sales & Profit column. After running this cell, our new dataframe looks like this. We could already identify which country generated the highest sales. Yet, it would also be good to know the Profit Margin for each country. We could easily do this by inserting a new column, dividing the Profit by Sales, and multiplying the result by 100 to return the Profit Margin in percent. To make the analysis easier, we could also plot our new dataframe. I am going to use a bar chart of the plotly express library. The first argument is our dataframe. On the x-axis, I would like to return the countries, which are currently used as the index of the dataframe. On the y-axis, I will return the Sales. The respective margin should represent the colour of the bars. The lowest margin should be red, whereas the highest one should be displayed as a green bar. Instead of the default plotly template, I am going to use the plotly white theme. For the title, I picked 'Profit Analysis'. Optionally, we could save the chart as an HTML file & export the grouped dataframe to an excel workbook. After running this cell, we can see that the US generated the highest sales but got the worst profit margin. France, on the other side, seems quite profitable. Looking at our folder, we can see that plotly exported the chart as an HTML file and the Excel workbook looks like this. Ok, and before moving on to the next example, I would like to empathize that this was a simplified example. In real life, the datasets you are dealing with might be not that well-structured. Nonetheless, I hope you can also see the power of Python, or more precise the options you have on hand, with all those great Python packages. Ok, with that said, let us move on to a more realistic business scenario. The purpose of the script will be very similar to before. We want to merge multiple Excel files, perform some calculations and visualize the results. Yet, this time we would like to keep the original formatting and formulas. So, in the input folder, we can see the 5 Excel files again. However, the worksheet looks like this. You will notice that the data does not start from the first row. Additionally, the Gross Sales, Net Sales and Profits are calculated with the help of Excel formulas. I now intend to merge all files, but I want to preserve the current formatting & formulas. Let us see how we can do this. For the first step, I am going to import the libraries. This time, I will also be importing xlwings as xw. As before, I am going to declare the input- & output paths. Instead of creating an empty dataframe, we will generate an empty workbook and rename the first worksheet to summary. After executing this cell, you should see a fresh workbook that looks like this. We will use this workbook to append the data from the other Excel files. However, this time we need to write a little bit more code. As before, I am going to iterate over each Excel file and open up the workbook. When opening the file, I want to suppress any potential Excel warning by setting display alerts to false. Optionally, you could also set the visibility to false. In that case, xlwings will open the workbooks in the background. Next, I am specifying the worksheet with our data. If we look at the excel file, we can see that the data starts from cell B5. You can get the dimensions of the Excel range dynamically through the expand method. It is like pressing CTRL+Shift+Down and the right arrow key on your keyboard. BTW, for now, I will ignore the headers. We will take of the headers in just a moment. As they have now the data selected, I will copy it to my clipboard. Before pasting it into our summary workbook, I need to determine the last empty row. As before, we can use the expand method, which gives us back a range object. From this range, I would like to return the last cell. In our current empty workbook, this line of code would return 1. However, when I append the data, I need to add one more row. Otherwise, I will overwrite the last row. Now that we know the last row, I can construct the cell range by concatenating it with Column letter A. After pasting the data, we can close the workbook. So, when I execute the cell, we will see that the workbooks are shortly flashing up. Once done, we are left with our summary workbook, which looks like this. As expected, the original formatting and formulas got preserved, but the header is still missing. We could also copy the original header from one of the workbooks, but I wanted to show also some more functions of the xlwings library. Therefore, I wrote down the column headers in a list. The header range will be from cell A1 to K1. I will insert the header names within that range, set the font to bold & white, and fill the background with a grey colour. After running this cell, we will see our header. One of the very cool features of xlwings is that it makes it very easy to create a pandas dataframe out of any given cell range. For instance, this line of code will transform the data from our summary sheet into a pandas dataframe. Once we have the dataframe, we can use pandas to perform any calculation we may need, like aggregating the sales by country. Afterwards, we can export the dataframe back to our Excel sheet. In my example, I will insert it into Cell M1. Xlwings also allows us to create Excel charts. In my Jupyter notebook, I will first insert a blank chart object into the top left corner of cell M8, with a width of 400 and a height of 200. I will then set the source data to the sales by country and choose a stacked bar chart on the chart object. After executing this cell, we have got now our Excel chart. However, you could also take advantage of the matplotlib, seaborn or pandas plotting capabilities. In this example, I am going to use the default horizontal bar chart from pandas. Xlwings will then convert the chart into a picture and insert it into our worksheet. As before, I can set the position and dimension of the chart. If I run this cell, we can see that the bar chart got added to our worksheet. Now, I kept the design pretty simple here, but you could, of course, customize the charts further. As the last step, I will adjust the column width and save the workbook before closing it. In fact, if this is the only workbook open, I will quit the Excel instance. Ok, guys, and if we go back to my folder, we will find our summary workbook we just have created. Now, all those steps might seem like a lot of effort just to merge the Excel data and do some basic plotting. But keep in mind, if you need to perform this kind of task daily, weekly or monthly, writing the script once might save you a lot of time in the long run. Especially if you want to keep the Excel formatting & formulas, xlwings might be the library you want to use. And if you are coming from a VBA background, the syntax might be very familiar to you, as xlwings is just a smart wrapper around the pywin32 library. As we have seen, xlwings also works nicely together with the pandas library. However, be aware, if you are dealing with many excel files or larger datasets, xlwings, or more precise Excel, might be not the best solution. If you want to learn more about xlwings, check out the videos in the info card. Before wrapping up the video, I want to give you my general take of Excel automation and my view on VBA vs Python. After seeing the possibilities of Python, you might be tempted to automate all your excel workflows. Yet, ask yourself if it is worth the time to spend hours on coding just to save you a couple of minutes doing the task manually. Also, keep in mind, there are already many great tools out there. You just need to find the right tool for the job. For instance, merging and cleaning up excel data can also be done with PowerQuery. Last but not least, I want to touch on the question if you should use VBA or Python for Excel automation. I am planning to create a separate video on this topic. But in a nutshell, programming languages are just like tools in a toolbox. And different tools are designed to solve different problems. Python is a general-purpose language, whereas Microsoft developed VBA to extend only office applications. I think one of the biggest advantages of VBA is the very intuitive syntax for Excel users. Just by looking at this VBA code, you can already guess what this code will do. Also, VBA comes already with office applications. There is no additional installation or setup required. This can be a huge advantage, as you can send your Excel macros to colleagues, and they can use it right away. Yet, VBA is mainly used for Office applications. With Python, on the other hand, you can build whatever you want. And most likely, there will be already a package available that makes your developer life easier. Also, you should keep in mind that Python is one of the fastest-growing programming languages. Therefore, learning Python might be an excellent investment that will pay off dividends in the future. That said, distributing your python scripts to colleagues is not as straightforward as VBA macros. There is, of course, the option to convert your python script to a standalone executable file. Yet, more often than not, those executable files are blocked by the IT security in larger corporations. So overall, Python might be more powerful than VBA, but as I said earlier, what matters is what you intend to achieve. There is no need to kill a mosquito with an atomic bomb when a simple spray can do the job. I sometimes find VBA more appropriate for a certain task, but I love the powerful Python packages and the unlimited possibilities Python offers. Let me know your thoughts on this topic or any questions you might have in the comments section below. Thanks for watching, and see you next time!
Info
Channel: Coding Is Fun
Views: 2,676
Rating: 5 out of 5
Keywords: coding is fun, Automate Excel Reporting Using Python (Real-Life-Example), python automation, automate excel, python excel automation, pandas tutorial, xlwings tutorial, plotly tutorial, python vs vba, python data analysis, data science, data analysis, merge excel files, combine excel files, merge workbooks, combine spreadsheets, excel automation, excel python script
Id: JoonRjMsSdY
Channel Id: undefined
Length: 12min 51sec (771 seconds)
Published: Sat Oct 09 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.