Hey guys,
In this video, I will be sharing my favourite ways to automate Excel using Python. In particular, we will be using the open-source
tools: Pandas, xlwings & plotly. After this video, you will be able to create
a custom Python script that allows you to combine excel files & create charts out of
them. And without further ado, let us get started. First things first, we need to install the
required dependencies. I will upload this Jupyter Notebook also to
my GitHub page. So, if you want, you could remove the comments
and run this cell to install the packages. Alternatively, you can install the libraries
by typing pip install followed by the package's name in your command prompt or terminal. So pip install pandas, openpyxl, plotly-express,
and pip install xlwings. Once we have sorted this out, let us start
with a simple example. The main purpose of this example is to merge
multiple Excel files, perform some calculations on the data and visualize our result. Let me show you what I mean by this. In the same directory of my Jupyter notebook,
I have stored the examples. If we have a look at the input folder, we
will see 5 excel files. Each Excel file contains the financial data
for the respective country. For instance, if I open the file for Canada,
we can see business transactions by Segment & Product. Additionally, we have got a lot more information,
like the total sales, cost of goods, profit, and so on. The data for the other countries looks exactly
the same. Out of the five countries, I would like to
know which one has generated the highest Sales and Profit Margin. To find this out, I will switch back to my
Jupyter Notebook. As the first step, I will import the libraries. Pathlib is a Standard Python module, so no
additional installation is required. I will also import pandas as pd and plotly-express
as px. Then, I am going to declare the path of the
example files we just have seen. As we will also be exporting an interactive
chart, I will also declare the output directory. To combine all Excel files, we only need three
lines of code. As a first step, I will create an empty dataframe. Afterwards, I iterate over the input directory
and append each excel file to our dataframe. Keep in mind that this only works, as all
the files has the same structure. To validate if our dataframe contains all
countries, I will print out the unique values of the country column. As we can see, we have got all five countries
in the dataframe. Now, we could go ahead and use all the powerful
tools pandas has to offer. For this example, I will be grouping the data
and calculate the sum of the Sales & Profit column. After running this cell, our new dataframe
looks like this. We could already identify which country generated
the highest sales. Yet, it would also be good to know the Profit
Margin for each country. We could easily do this by inserting a new
column, dividing the Profit by Sales, and multiplying the result by 100 to return the
Profit Margin in percent. To make the analysis easier, we could also
plot our new dataframe. I am going to use a bar chart of the plotly
express library. The first argument is our dataframe. On the x-axis, I would like to return the
countries, which are currently used as the index of the dataframe. On the y-axis, I will return the Sales. The respective margin should represent the
colour of the bars. The lowest margin should be red, whereas the
highest one should be displayed as a green bar. Instead of the default plotly template, I
am going to use the plotly white theme. For the title, I picked 'Profit Analysis'. Optionally, we could save the chart as an
HTML file & export the grouped dataframe to an excel workbook. After running this cell, we can see that the
US generated the highest sales but got the worst profit margin. France, on the other side, seems quite profitable. Looking at our folder, we can see that plotly
exported the chart as an HTML file and the Excel workbook looks like this. Ok, and before moving on to the next example,
I would like to empathize that this was a simplified example. In real life, the datasets you are dealing
with might be not that well-structured. Nonetheless, I hope you can also see the power
of Python, or more precise the options you have on hand, with all those great Python
packages. Ok, with that said, let us move on to a more
realistic business scenario. The purpose of the script will be very similar
to before. We want to merge multiple Excel files, perform
some calculations and visualize the results. Yet, this time we would like to keep the original
formatting and formulas. So, in the input folder, we can see the 5
Excel files again. However, the worksheet looks like this. You will notice that the data does not start
from the first row. Additionally, the Gross Sales, Net Sales and
Profits are calculated with the help of Excel formulas. I now intend to merge all files, but I want
to preserve the current formatting & formulas. Let us see how we can do this. For the first step, I am going to import the
libraries. This time, I will also be importing xlwings
as xw. As before, I am going to declare the input-
& output paths. Instead of creating an empty dataframe, we
will generate an empty workbook and rename the first worksheet to summary. After executing this cell, you should see
a fresh workbook that looks like this. We will use this workbook to append the data
from the other Excel files. However, this time we need to write a little
bit more code. As before, I am going to iterate over each
Excel file and open up the workbook. When opening the file, I want to suppress
any potential Excel warning by setting display alerts to false. Optionally, you could also set the visibility
to false. In that case, xlwings will open the workbooks
in the background. Next, I am specifying the worksheet with our
data. If we look at the excel file, we can see that
the data starts from cell B5. You can get the dimensions of the Excel range
dynamically through the expand method. It is like pressing CTRL+Shift+Down and the
right arrow key on your keyboard. BTW, for now, I will ignore the headers. We will take of the headers in just a moment. As they have now the data selected, I will
copy it to my clipboard. Before pasting it into our summary workbook,
I need to determine the last empty row. As before, we can use the expand method, which
gives us back a range object. From this range, I would like to return the
last cell. In our current empty workbook, this line of
code would return 1. However, when I append the data, I need to
add one more row. Otherwise, I will overwrite the last row. Now that we know the last row, I can construct
the cell range by concatenating it with Column letter A. After pasting the data, we can close
the workbook. So, when I execute the cell, we will see that
the workbooks are shortly flashing up. Once done, we are left with our summary workbook,
which looks like this. As expected, the original formatting and formulas
got preserved, but the header is still missing. We could also copy the original header from
one of the workbooks, but I wanted to show also some more functions of the xlwings library. Therefore, I wrote down the column headers
in a list. The header range will be from cell A1 to K1. I will insert the header names within that
range, set the font to bold & white, and fill the background with a grey colour. After running this cell, we will see our header. One of the very cool features of xlwings is
that it makes it very easy to create a pandas dataframe out of any given cell range. For instance, this line of code will transform
the data from our summary sheet into a pandas dataframe. Once we have the dataframe, we can use pandas
to perform any calculation we may need, like aggregating the sales by country. Afterwards, we can export the dataframe back
to our Excel sheet. In my example, I will insert it into Cell
M1. Xlwings also allows us to create Excel charts. In my Jupyter notebook, I will first insert
a blank chart object into the top left corner of cell M8, with a width of 400 and a height
of 200. I will then set the source data to the sales
by country and choose a stacked bar chart on the chart object. After executing this cell, we have got now
our Excel chart. However, you could also take advantage of
the matplotlib, seaborn or pandas plotting capabilities. In this example, I am going to use the default
horizontal bar chart from pandas. Xlwings will then convert the chart into a
picture and insert it into our worksheet. As before, I can set the position and dimension
of the chart. If I run this cell, we can see that the bar
chart got added to our worksheet. Now, I kept the design pretty simple here,
but you could, of course, customize the charts further. As the last step, I will adjust the column
width and save the workbook before closing it. In fact, if this is the only workbook open,
I will quit the Excel instance. Ok, guys, and if we go back to my folder,
we will find our summary workbook we just have created. Now, all those steps might seem like a lot
of effort just to merge the Excel data and do some basic plotting. But keep in mind, if you need to perform this
kind of task daily, weekly or monthly, writing the script once might save you a lot of time
in the long run. Especially if you want to keep the Excel formatting
& formulas, xlwings might be the library you want to use. And if you are coming from a VBA background,
the syntax might be very familiar to you, as xlwings is just a smart wrapper around
the pywin32 library. As we have seen, xlwings also works nicely
together with the pandas library. However, be aware, if you are dealing with
many excel files or larger datasets, xlwings, or more precise Excel, might be not the best
solution. If you want to learn more about xlwings, check
out the videos in the info card. Before wrapping up the video, I want to give
you my general take of Excel automation and my view on VBA vs Python. After seeing the possibilities of Python,
you might be tempted to automate all your excel workflows. Yet, ask yourself if it is worth the time
to spend hours on coding just to save you a couple of minutes doing the task manually. Also, keep in mind, there are already many
great tools out there. You just need to find the right tool for the
job. For instance, merging and cleaning up excel
data can also be done with PowerQuery. Last but not least, I want to touch on the
question if you should use VBA or Python for Excel automation. I am planning to create a separate video on
this topic. But in a nutshell, programming languages are
just like tools in a toolbox. And different tools are designed to solve
different problems. Python is a general-purpose language, whereas
Microsoft developed VBA to extend only office applications. I think one of the biggest advantages of VBA
is the very intuitive syntax for Excel users. Just by looking at this VBA code, you can
already guess what this code will do. Also, VBA comes already with office applications. There is no additional installation or setup
required. This can be a huge advantage, as you can send
your Excel macros to colleagues, and they can use it right away. Yet, VBA is mainly used for Office applications. With Python, on the other hand, you can build
whatever you want. And most likely, there will be already a package
available that makes your developer life easier. Also, you should keep in mind that Python
is one of the fastest-growing programming languages. Therefore, learning Python might be an excellent
investment that will pay off dividends in the future. That said, distributing your python scripts
to colleagues is not as straightforward as VBA macros. There is, of course, the option to convert
your python script to a standalone executable file. Yet, more often than not, those executable
files are blocked by the IT security in larger corporations. So overall, Python might be more powerful
than VBA, but as I said earlier, what matters is what you intend to achieve. There is no need to kill a mosquito with an
atomic bomb when a simple spray can do the job. I sometimes find VBA more appropriate for
a certain task, but I love the powerful Python packages and the unlimited possibilities Python
offers. Let me know your thoughts on this topic or
any questions you might have in the comments section below. Thanks for watching, and see you next time!