Hey guys,
This tutorial will show you how to convert a simple excel sheet into this interactive
dashboard using Python. In particular, we will be using Pandas, Plotly
& the streamlit library. The beauty of Streamlit is that you can create
web applications directly in Python without knowing HTML, CSS, or JavaScript. In the dashboard, the user has different options
to filter the dataset. The KPI's on top, and the charts below are
updated accordingly. While coding out the solution, you will learn
very cool tips and tricks for data manipulation in pandas, how to use plotly to create interactive
visualizations, and of course, how streamlit enables you to create a dashboard like this
one. Before coding out the solution, let us have
a quick look at the data. We have got here the supermarket sales for
three different locations. Additionally, we have the information on whether
the customer was a Member or not, the gender of the customer, the different Product lines
and the total sales, including taxes. Furthermore, we also have the time of the
purchase and the customer satisfaction rating on their overall shopping experience. At least, those are the columns & information
we are going to visualize in our dashboard. To work with the data in Python, I will be
transforming the Excel data into a Pandas dataframe. An easy way to do this is by using the following
add-in, called 'My Tool Belt'. I will leave the download link and the video
tutorial for using this add-in in the description box below. However, you do not need this add-in to follow
along with the tutorial. In just a moment, I will explain how to read
the Excel data by using pandas. But for now, let me go ahead and click the
following button. As the next step, I will select the data and
hit ok. By doing so, the add-in created a python file
in the workbook's directory. Before opening up this file, I will rename
it to app.py. In the Python file, you will see that the
tool belt uses the pandas' library and the read_excel method. If you execute this script, you will get back
a pandas dataframe. As mentioned, you could type the following
lines of code yourself to read in the Excel data. Yet, using the add-in saved me some time,
as the tool belt inserted already the excel file name, sheet name, how many rows it needs
to skip, which columns you want to use, and how many rows are included in your selection. To interact with excel, I am using the 'pandas'
& 'openpyxl' library. Make sure you have those libraries installed
by typing' pip install pandas and openpyxl' in your command prompt or terminal. To create the web app, we will be using the
streamlit library. You can install streamlit by typing 'pip install
streamlit'. And last but not least, to create the visualization,
we will be using plotly-express, which you can install by typing 'pip install plotly-express'. Once done, I will be importing the libraries
we just have installed. So, go ahead and type import plotly.express
as px and streamlit as st. As the first step, I will be setting up some
basic configuration of our web app by using st.set_page_config. After setting up the title, I will give our
web app also a favicon. The favicon is a small square image that symbolises
a website in web browsers. Instead of an image, streamlit also supports
Emojis. In my case, I will pick a bar chart. On the following website, you can find a vast
selection of emojis. Just select the one you like and copy the
codename into your streamlit application. I will also include the link to this website
in the script. Next, I will specify how the page content
should be laid out. The default is set to "centered". However, I would like to use the entire screen. We can do this by setting the layout to 'wide'. Before firing up our streamlit app, I will
display our dataframe onto the page instead of printing it to our console. With this change in place, I will go back
to my command prompt. Make sure you are in the same directory as
your python script and type streamlit run, followed by the name of your script. After hitting enter, we will see our web app,
which contains the page title, page icon and the dataframe. Ok, so far, so good. Now that we have your web app running, I will
build out the sidebar section. We will use the sidebar to gather the user's
filter criteria and apply those filters to our dataset. I am planning to filter our dataset based
on the different cities, customer types and gender information. I will be using the streamlit multiselect
component for all the filters and starting with the city. For the label, I will type 'Select the city',
and for the options, I will use the unique values in our Excel column 'city'. We can also set a default value when starting
the web app using the keyword argument 'default'. In my case, I would like to display all City
Names. Ok, with that in place, I will be saving our
script and refresh the page. As a result, we can now see our sidebar with
the different cities. So far, it has no functionality, but we will
take care of that in just a moment. Before that, let me copy and paste the city
selection and adjust the code to display the different customer types and gender information. After a quick refresh, we have got now our
filter selection in place. For each field, streamlit will return a list
with the selected options. Currently, those lists are stored in our variables,
city, customer type and gender. To filter our actual dataframe, I will be
using the query method and store the filtered dataframe in a variable called 'df_selection'. I can now go ahead and query our columns based
on your selection. Pandas should filter the City column based
on the city list. You can use @ to refer to a variable. I will do the same for the customer type and
gender. To test this out, I will be returning the
filtered dataset to our web app instead of the original dataframe. After saving the script and refreshing the
page, let me test the selection by removing the normal customer type. As a result, we will get back a dataframe,
which only contains 'member' as the customer type. Likewise, we could go ahead and also test
the other selection fields. Now that we have this in place, I will display
some KPI's on the main page instead of our dataframe. Before calculating the KPI's, I will insert
a title. As before, I will use the bar chart emoji
in the title. To separate the title from the KPIs, I will
insert a new paragraph using a markdown field. For the KPI's, I would like to display the
total sales, the average rating and the average sales by a transaction. The total sales will be simply the sum of
the 'total column'. This line of code will return a float number. But as those KPIs are intended to give the
user a first glance at the numbers, I will display the sales amount without decimal values
by converting it into an integer. For the average rating, I will take the mean
of the rating column and round it with one decimal. Next to the average rating, I would also like
to illustrate the rating score by emojis. Therefore, I will multiply the number of the
star emoji by the average rating. I will take a more simplified approach here
by rounding the rating. So, a result of 6.9 would be displayed as
seven stars, for instance. To multiply this number with the star emoji,
I will also convert it to an integer. Ok, and last but not least, I will calculate
the average sales of a transaction by applying the mean to the 'Total' column. With that in place, we can go ahead and insert
the figures next to each in three separate columns - a left, middle, and right column. To do so, I am using the streamlit columns
method. As the next step, I can now insert the content
into the different columns. Within the left column, I would like to display
the total sales. I am using here an f-string to concatenate
US-Dollar with the actual values we have calculated earlier. To make it easier for the user, I will display
the total sales with a thousand separators. Within the middle column, I will place the
information about the rating on the very right side of the page; I will put the average sales
per transaction. To separate those KPI's from the next section,
I will also insert a divider by using a markdown field with three hyphens. Ok, before moving on, let us test this by
saving the script and refreshing the web app. Ok, and here it is. As expected, when we change the filters, we
should also see our KPI's changing. In the next section of the web app, I will
include two bar charts. The first bar chart should plot the sales
by product line. Before plotting the graph, we will need to
do some calculations in pandas. To explain those calculations, I will shortly
switch to my Jupyter Notebook, and we will come back to our script in just a moment. Ok, here in my Jupyter Notebook, let us see
how to aggregate the sales by product lines. One way to do this is to use the pandas groupby
method. In our case, I would like to group the dataframe
by Product line, and I would like to sum up the values. After running this line, pandas will sum up
all numerical values by Product line. However, we are only interested in the Total
sales; therefore, I will filter the dataframe accordingly. As a result, the dataframe will look like
this. Optionally, we could also sort the values
within the new dataframe. By default, pandas will sort the values from
the highest to the lowest number. Ok, and this line of code will be the starting
point for our bar chart. So, back in our script, I will type out what
we have just seen in the Jupyter notebook. I will store the bar chart in a variable called
'fig_product sales'. We will use the plotly express library to
plot the data. To create the bar chart, we can type 'px.bar',
followed by the name of the dataframe. I plan to use a horizontal bar chart to display
the total sales on the x-axis and the different product lines on the y-axis. If you have a look at our dataframe, the product
lines are currently used as our index. Therefore, I have typed it like this here. As mentioned already, we use a horizontal
bar chart. You can also use HTML Formatting Elements
within the title, like displaying the title as bold text. To set the colours of the bars, you can use
'color_discrete_sqeuence'. I am using a little hack here by multiplying
the hexadecimal code with the length of the dataframe. On a side note, plotly comes already with
different template styles. I personally like the plotly white template,
which is a very clean & tidy chart design, in my opinion. Ok, now that we have our bar chart, I will
plot it in the web app using st.plotly_chart. Like before, whenever we change the filters,
the bar chart will be automatically updated. To polish up the graph, I would like to do
some minor tweaks. For instance, I would like to remove the gridlines
and the background colour of the chart. To do that, I will update the layout and set
the background colour to transparent and remove the grid from the x-axis. After this minor tweak, the updated chart
will look like this. Ok, let us now move on and create a second
chart to plot the sales by hours. The first challenge will be to separate the
hour from the time column. As before, I will shortly switch to my Jupyter
Notebook. If we inspect the dataframe, we can see that
the Time column has currently the datatype 'object', which is like a python string. That is why you will get an error if you try
to extract the hour information from the 'time' column. First, we need to convert the column into
a DateTime object using the build-in pandas method 'to DateTime'. Additionally, I also need to specify the current
format, so first, we have the hour, then minutes and then the seconds. Once we have transformed the 'time' column
into a DateTime object, I can easily extract the hour information. The new column will look like this. And if I print out the entire dataframe, we
will see the 'hour' column on the very right. Ok, so let me copy this line of code. Back in the script, I will paste it right
after we have loaded the dataframe. In fact, as we are already here, I will improve
the performance of our web app. You might have noticed already, but every
time we change the filters in our web app, streamlit will rerun our entire script and
load the dataframe again and again. We could avoid this by caching the dataframe. Think about it, like storing the dataframe
into a short term memory. Whenever we filter our dataframe, we will
not read in our excel file again. Instead, we will get the information from
your short term memory. To do that, I will wrap the following lines
into a separate function. Once we have the function defined, I will
use the streamlit cache decorator. With that in place, I will call the function
to store the returned dataframe in a variable. And that is all there is to it. Ok, now, back to the bar chart. The bar chart will be very similar to the
one we have plotted before. First, let me go ahead and group the sales
by hour. Then, I will construct the bar chart. There is not much change compared to the previous
one, except we are not using a horizontal bar chart this time. As before, I will also update the layout and
use a transparent background and turning off the grid lines. Additionally, I will set the tick mode to
linear. This change will ensure that all the tick
labels on the axis are displayed. Once done, I can plot the chart. Ok, now that we know that is also working,
I will place the two charts horizontally next to each other. So, let me delete the charts here and insert
a left- and right column. In the left column, I will display the hourly
sales and use the entire container width. And in the right column, I will place the
sales by-products. To finish up the report, I will apply some
styling. First, let me remove the hamburger menu icon,
the 'made with streamlit' footer note and the colourful header. We can achieve this by injecting some custom
CSS into our web app. In the CSS code, we will set the visibility
of the main menu, footer and header to hidden. After a quick refresh, we can see the changes
in place, and I think this looks much cleaner now. The last tweak will be in regards to the overall
streamlit appearance. Instead of the default streamlit colours,
we could use a custom theme. All you need to do is to navigate the root
folder of your streamlit application. Here, you want to create a new folder called
".streamlit". In this new folder, create a new file called
'config.toml'. Open this file with your preferred text editor
and paste the following code here. I will make sure to upload this project also
to my Github profile. You will find a link to that repo in the description
box. So you could just copy the code from there. Within the theme, you can set different parameters,
like the primary colour, background colour, and so. Ok, and after saving the file, we need to
rerun our server to reflect the changes in our web app. So, back in the command prompt, I press Ctrl
+ C to stop the current session and rerun our application. And now, you will see that streamlit changed
our overall theme. Ok, guys, and that's it for this video. You could now also go ahead and quickly deploy
your web app to the internet for free. I have created already a tutorial to show
you how to deploy your streamlit app. I will leave the link to that video in the
info card. As always, if you have any questions, please
let me know in the comments. Thanks for watching and see you next time.