PyGWalker - Python Data Visualization tool / Streamlit Integration

Video Statistics and Information

Captions
When you have a lot of data in your application, it's often important to visualize that data and analyze it in different ways so that users can get a better understanding of it. That's where we're going to introduce a Python library called PyGWalker. Now, when I first saw this library I thought you would pronounce that "Py-G-Walker", but apparently it is actually pronounced "Pig Walker". What it is is an open source alternative to an application like Tableau, which is used to analyze and visualize data. In this video we're going to take a data set and analyze and visualize that data with PyGWalker, and we're also going to see how to integrate PyGWalker into a Streamlit application. So let's get started.

I have the documentation for PyGWalker open and, as you can see, it allows you to turn your pandas data frame into a Tableau-style user interface for visual analysis. This is extremely easy to do, so we're just going to dive into it in this video. Let's start by going to a Jupyter notebook. What we have at the top here is a pip install command that you can execute in the notebook in order to install the pygwalker library into the notebook environment, so make sure you install that. What I'm going to do below that cell is import the pandas library as pd, and I'm going to load a bit of data into a data frame here by calling the read_csv function, and we're going to pass a file name to that function. Now, if we go to the desktop here and go to the directory that this Jupyter notebook is in, you can see we have a file here called kaggle_income.csv. I'm going to copy the name of that file and paste it into the read_csv function, and then we can output the head of this data frame. I'm going to show you where you can get this data set in a second, but as you can see we're getting a UnicodeDecodeError. Now, we're going to fix this error, but first of all let's have a look at the source of this data
set. It's data on US household income statistics, and this is from Kaggle. I'll leave a link to this below the video. You can download this data at the top right here, and once that's downloaded you will have a file called kaggle_income.csv. So this is the data set.

What we're going to do is go back to the read_csv function. It turns out we need to pass an encoding to this function in order to correctly read the data into a data frame. The read_csv function in pandas can take an encoding keyword argument, and we can set that equal to, in this case, latin1. That's the encoding that we want to use in order to read the data into the data frame. So when we execute this and get the head of the data frame back, we are now getting these rows, so the data frame is now loaded from that file. If we look at the data frame on household income statistics, you can see that we have, for example, states in the USA as well as counties and cities, and on the right-hand side we have some numerical data. For example, we have the mean and median incomes for that particular location, as well as the standard deviation of incomes over that region, and we also have latitudes and longitudes and some other numerical data.

What we're going to do is use PyGWalker in this video to create some visualizations on this data, and we're going to see how easy this is to achieve with a tool like PyGWalker. So let's scroll down to a new cell, and what I'm going to do in this cell is import pygwalker, and we can do that as, in this case, pyg. So we're importing pygwalker as pyg, and then when we have pyg imported we can call pyg.walk. This walk function takes as an argument the data frame, which can be a pandas data frame or a polars data frame, and when we call that pyg.walk function it will give us back an object that we can store in a variable called walker. So let's now execute this cell, and we're going to see what comes up on the Jupyter
notebook for this particular function, pyg.walk. When we scroll down, you can see we have this visualization tool that has become available for us in the notebook. What we have is the ability to drag some of the columns that you see on the left-hand side, which are coming from the pandas data frame, and drop them onto the x- and y-axes of a chart in order to build up some charts using this tool called PyGWalker.

So let's say that we wanted to look at the average, or the mean, income for each state in the USA. We can take the name of the states and add that to the x-axis of a chart, and what that's going to do is add every single state in the USA to the x-axis. On the y-axis of this bar chart that we're going to develop we want to put the mean income, so if we scroll down here you can see there is a column called mean. Let's use that mean column and drag it up to our y-axis, and you can see we get back this bar chart here. For each state in the USA it's showing the mean values, and these mean values have been summed up: we're performing an aggregation. So, for example, there might be many entries in this data frame for the state of Florida. What this is doing is taking all of the means for each entry and summing them up. To get the true mean over Florida, we can set the aggregation on this column to the mean, and this gives us back a bar chart with all of the states in the USA and the mean incomes for those states.

Now, you might want to view this from the highest to the lowest or vice versa. We can actually sort the chart by simply clicking these buttons. We can sort in ascending order, and we get back the state with the lowest mean income first, and in this case that's Puerto Rico, followed by Mississippi. We can also flip this round by clicking the descending order, and we can see that the District of Columbia apparently has the highest mean income, followed by Connecticut. So you
can see how easy it is to build these charts with PyGWalker. We can also add filters to the charts to customize what is shown on them. For example, if we drag the state name into the filters column, we then get this modal popping up, and we can select a bunch of states here, and it's going to add only the states that we select to the resulting chart. So what I'm going to do is unselect every single state, and then I'm going to select only some of the states in the south of the USA, for example Minnesota, and we also have North Carolina. Tennessee, let's select that one, as well as South Carolina, and I'm also going to select Florida, and if we scroll down a bit further we can also see Texas here. So let's select those, and we can then confirm that selection, and you can see the resulting bar chart now only contains the states that we selected. So we've applied a filter to this bar chart in order to customize what's shown on the chart.

Now, you may also want to add a legend here, or different colors for each state. We can do that by dragging the state name into the color field here, and what that's going to do is apply a different color to each state in the data that's visualized in this bar chart. When we do that, every bar in the chart has a different color, and there's also a legend telling us which color and which label belongs to each bar. We can also change the layout of this chart. For example, we have a tool here called layout mode, and that is set to auto, but we can change that to fixed and then we can drag and drop this chart across the PyGWalker UI, and you can see that customizes the length and the width of the chart and also the width of the bars that are displayed in that chart.

Now, it's important to note that at the moment this chart is in aggregation mode, and that's why we have not only the column of mean but we also perform an aggregation over that y-axis column. In this case we are getting the mean values for each state. We can also set that to
different aggregations, for example count, and you can see in this case the state of Texas has the most rows in the data that have a mean. Texas has 2,300 rows, followed by Florida, and that goes all the way down to South Carolina here. That's why, when we sum these up, you're expecting to see Texas at the top: it has the most rows and therefore the most mean values in the data, and when it sums those up it gives us back that. So when we are applying an aggregation, we want to apply the mean here in order to give a balanced representation of the actual mean values for each state. So that was a quick aside. I hope that makes sense. Let's now move on.

Another thing we can do here is transpose this chart, and that's very simple to do. We have this tool here, and when we click that we are transposing the bar chart, and now we have a horizontal bar chart rather than a vertical bar chart. I think this is a bit clearer for this data: we have the states on the left-hand side on the y-axis, and on the x-axis we have the mean values for each state. So we can see how easy it is to create these bar charts, to apply sorting, to transpose the chart, to change the layout of the charts and to apply filters to the charts.

What I want to do now is just show how to change the chart that's displayed here. Currently we have a bar chart. What I'm going to do is remove these filters from the chart, and I'm also going to undo the transpose of this chart as well. What we're going to do now is change the type of chart that's displayed here, and we can do that by clicking this icon here for the mark type. You can see the different options we have, for example the line chart, the area chart, scatter charts and other charts as well. So I'm going to click the area chart, and we're going to see how this affects the resulting visualization. Actually, you can see that has removed everything from the visualization. Let's also remove the color from this filter field here
and now you can see the chart, and you can see the different states that are contributing the highest mean here, from the District of Columbia down to Mississippi and Puerto Rico. Now, if we wanted to take the summation here rather than the mean, what we can do is change the aggregation to the sum, and you can now see very clearly that California has the highest sum of means in this data, and you can probably infer from that that California is the state that contributes the most to the US GDP.

Let's now move on and create another type of chart. This time I want to create a scatter plot in the PyGWalker UI, and I want to show on this scatter plot the mean values that we're getting from the data for two states, one being California and the other being Mississippi. Because we're going to show each observation from the data frame, I'm going to turn off this aggregation mode. What that does is remove the aggregation function from the y-axis, and it's then going to just use the raw data. You can see the plot that's created here for the area chart: you get to see every single point in the data, and it creates this rather crazy-looking area chart. So on the x-axis we have the state name and on the y-axis we have the mean. Let's now add a filter to this chart, and we're going to filter it down to only Mississippi and California. You can see that filter on the left-hand side: we have "one of" California and Mississippi, so it's only going to show the values for those states. This chart doesn't really make a lot of sense at the moment, so we're going to change the type of chart now to a scatter plot. We can select the scatter option here, and then on the right-hand side we now get this scatter plot. You can see from the scatter plot that California has a greater variety of values here, a greater standard deviation, and you can also see that it has much higher values in general than Mississippi. But we can see this better, actually, with a different type of chart. I'm
going to change the chart type here to a box plot, which you can select near the bottom, and that will give us back the kind of chart that you see here. You can see the interquartile range here, which is displayed within the box, and that range is much higher for California than it is for Mississippi. When you hover over the box you can see some of the statistics here. For example, the median value for Mississippi is 47,423, whereas for California it is 72,331, so the median value of the mean incomes for California is much higher than it is for Mississippi. We can also see at the top here some outliers in the California data, some very high numbers for the mean income, and this implies that there are certain areas in California where the mean income is much higher than the average. A box plot allows us to see this kind of data in a better way.

What we can also do, when we've generated the charts, is export them to a PNG or an SVG file. In order to do that you can click this icon here to export, and you get to select the type of file that you want, and that will export the resulting chart as one of those file types.

Let's move on. We are going to show one more example, and then I'm going to very quickly show how to integrate this with Streamlit at the end of the video. We're now going to show a different type of filter here. Rather than the state name, which is our "one of" filter, we're going to show a range filter now. So let's go back to aggregation mode, and on the y-axis again we're going to select the mean aggregation function over that column, and we're also going to go back to a bar chart here. So these are the aggregated values of the mean for California and for Mississippi, and you can see that California's mean is much higher over this data set than Mississippi's. I'm actually going to remove this state filter that we have here, and I'm going to add a different type of filter instead. So let's grab the latitude column here, and we're going to scroll up and
add that as a filter, and we can see a different type of filter here: it's the range filter. Now I'm just going to randomly select a lower boundary for the latitude, and we're going to go up to the maximum value for that latitude as well. So let's confirm that, and we're going to see the resulting chart. Now, I'm not sure if this has cut back on any of the states, so let's go back to the filter here, increase this latitude filter and confirm that. When we do that, we get back a single bar, and that's for Alaska. So obviously I've gone too far here: the lower boundary of the latitude is actually too high in this example, so the only US state that is north of that latitude is Alaska. Let's adjust this slightly and go down to, let's say, 41, and you can see we are now getting back more states. I'm just going to adjust this slightly again, and let's go up to 46 now and confirm, and the states that we're getting back this time include Alaska, Washington, North Dakota, Oregon and so on. So these are the most northerly states in the USA, and the purpose of this example is simply to show that we can use these range filters. It's a different type of filter that we can apply to the data in order to filter down what the resulting chart is showing.

So let's now move on, and what we're going to do to finish this video is show how to integrate PyGWalker with Streamlit. We're going to open up VS Code, and I have opened the directory that contains the data file, which is the CSV file, as well as the Jupyter notebook that we've been working with in this video. What I'm going to do now is create a new file, and let's call this file main.py. This is going to be the file that contains a Streamlit application. Now, once we've got that open, what I'm going to do is go back to the Jupyter notebook and copy the code that reads in the data frame into this particular file. So let's copy that in at the top, and we can remove the call to the data frame's head method. We don't need that, and
what I'm going to do is activate a terminal here, and within that terminal let's activate a virtual environment, and we're going to install pygwalker using pip. So let's run the pip install command, and we're going to pass the name of the library, pygwalker, to that command. Once that's installed, what we're going to do is go to the documentation here, and I'll link this page below the video. This is a page on integration between PyGWalker and Streamlit, and as it says in the "What is PyGWalker" section, this is a library that allows you to generate scatter plots, line plots, bar charts and histograms with simple drag-and-drop actions and no need for coding skills. We've already seen a little bit of that, but if we scroll down a bit further here we have a section on Streamlit and also on how to integrate the two. Of course, we need to have pandas, pygwalker and streamlit installed for this example.

Now, it's this section here on embedding PyGWalker in a Streamlit application. What I'm going to do is copy these imports at the top and bring them into VS Code. So let's go back to VS Code, and I'm going to replace the pandas import with all four of these. Let's go back to the documentation, and we're basically going to copy these two lines of code here, and again we'll bring them into VS Code. What we're doing here is using Streamlit and setting some page configuration. We're setting the layout to wide, which is what we want to do because we want to show this PyGWalker UI on the page, so the wider the page, the more room we have to work with for that UI. We're also setting the page title, which will be shown in the browser; we'll show that in a second. And finally, we're explicitly adding a title to the Streamlit application, and the title is "Use Pygwalker In Streamlit". Let's go back to the documentation, and we are going to copy this line of code here, and I'm going to paste that in and explain what's going on here. So if I paste that at the bottom here, we're
creating a variable called pyg_html, and again we're calling that walk function that we called in the Jupyter notebook, and we're passing the data frame to that function. But we can also pass a keyword argument to the function, and that's return_html, and we're setting that to True. Now, the default for this parameter is False. When we set it to True, what's going to happen is that when we call the walk function it's going to return the HTML for the PyGWalker UI, and then we can use that HTML within the Streamlit application and show this particular widget on the page. So we get back the PyGWalker HTML.

Let's go back to the documentation one more time, and again we're going to copy this line of code, and we'll paste that just below this HTML. What we're doing in this line of code is embedding the HTML into a Streamlit application, and we do that by calling the components.html function, and we pass the HTML into that along with a height and the scrolling parameter set to True. Now, what is this components object that we're using here to call an html function? If we scroll to the top here, you can see this import where we are importing streamlit.components.v1 as components, and this module has this html function that we can use in order to embed external HTML into our Streamlit page. If we go to the Streamlit documentation very quickly, there is a page on these components, and you can see this "Create a static component" section here: if your goal in creating a Streamlit component is solely to display HTML code or render a chart from Python, then Streamlit provides two methods that greatly simplify this process. That's the components.html function and the components.iframe function, and we are using the components.html function that's defined there.

So that explains what this code is doing, and all we need to do now is actually run the Streamlit application and see what kind of output we're getting on the page. Now, in order to run that Streamlit app we can use the command-line tool, and we
have a command called run here, and we can specify the name of the file, and that will then run that file and load it into the browser. When the page loads here you can see the title that's appearing, and after the loading is complete we should see the PyGWalker UI just below. You can see that has appeared in our Streamlit application, and it's embedded here using that components.html function. Again, we can drag and drop these columns from the data frame into the x- and y-axis sections here, and that is going to generate this chart. Again, we can change the aggregation function, for example by selecting in here, and as before we can also change the order in which these are displayed. So let's sort in descending order, and that gives us back the state with the highest mean income according to this data set.

So it's very simple to integrate this PyGWalker widget into a Streamlit application, and this might well work with any back-end framework, for example Django or Flask. That's because, if we go back to VS Code here, when we call the walk function we can simply get back HTML that we can then embed into another application.

So that's all for this video. We've learned how to use PyGWalker, and we've seen how to generate a variety of different charts, how to sort data and transpose data, and how to add filters to the charts to customize what's being shown at a given time. Finally, we've also seen how to bring PyGWalker into a Streamlit web application in order to show that widget in the context of a Streamlit application. It's very easy to do that. I'll leave all the links below the video for the particular pages that we've looked at. Thanks again for watching this video. If you've enjoyed it, please leave a like and subscribe to the channel, and we'll see you in the next video.
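The notebook steps described in the captions (reading the CSV with an explicit encoding, then handing the data frame to pyg.walk) might look roughly like the sketch below. It is not the presenter's exact notebook: the file name kaggle_income.csv is taken from the captions and may need adjusting to match the download, and load_income_data and explore are hypothetical helper names introduced only for this illustration.

```python
import pandas as pd


def load_income_data(path: str = "kaggle_income.csv") -> pd.DataFrame:
    """Read the household income CSV into a data frame.

    The captions report a UnicodeDecodeError with the default decoding,
    so the file is read as Latin-1 instead. The default path is an
    assumption based on the file name mentioned in the video.
    """
    return pd.read_csv(path, encoding="latin1")


def explore(df: pd.DataFrame):
    """Open the PyGWalker UI for df (intended for a Jupyter notebook cell)."""
    # Imported lazily so that loading the data does not require pygwalker.
    import pygwalker as pyg

    return pyg.walk(df)
```

In a notebook cell, `walker = explore(load_income_data())` would then correspond to the call shown in the video.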
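The aside in the captions about sum, count and mean aggregations can be reproduced outside the UI with a pandas groupby. The sketch below uses a few made-up rows rather than the real data set, and the column names State_Name and Mean are assumptions about how the columns are labelled, not something the captions confirm.

```python
import pandas as pd

# Toy rows standing in for the real data: one row per location, each with
# the mean income recorded for that location. All values are made up.
rows = pd.DataFrame(
    {
        "State_Name": ["Texas", "Texas", "Texas", "Florida", "Florida"],
        "Mean": [50_000, 60_000, 70_000, 80_000, 90_000],
    }
)

# The same three aggregations that are switched between in the video.
per_state = rows.groupby("State_Name")["Mean"].agg(["count", "sum", "mean"])
print(per_state)

# Summing favours the state with more rows; averaging does not.
top_by_sum = per_state["sum"].idxmax()
top_by_mean = per_state["mean"].idxmax()
print(top_by_sum, top_by_mean)  # Texas Florida
```

Here Texas leads on the sum only because it has three rows to Florida's two, which is the effect the presenter points out when recommending the mean aggregation for a per-state comparison.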
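The Streamlit integration walked through in the captions could be assembled along the lines of the sketch below. It follows the calls named in the video (pyg.walk with return_html=True, then components.html with a height and scrolling=True), but the function names embed_pygwalker and main, the height of 1000 and the CSV file name are assumptions made for illustration. The pygwalker API has changed across releases, so the return_html keyword should be checked against the documentation for the installed version.

```python
import pandas as pd

PAGE_TITLE = "Use Pygwalker In Streamlit"


def embed_pygwalker(df: pd.DataFrame, height: int = 1000) -> None:
    """Render the PyGWalker UI for df inside the current Streamlit page."""
    # UI libraries are imported lazily so this module imports without them.
    import pygwalker as pyg
    import streamlit.components.v1 as components

    # As described in the video, return_html=True makes walk() return the
    # UI as an HTML string instead of displaying it in a notebook.
    pyg_html = pyg.walk(df, return_html=True)

    # Embed the HTML in the page; scrolling lets the tall UI stay usable.
    components.html(pyg_html, height=height, scrolling=True)


def main(csv_path: str = "kaggle_income.csv") -> None:
    """Build the page: configuration, title, then the embedded UI."""
    import streamlit as st

    # A wide layout leaves more room for the drag-and-drop UI.
    st.set_page_config(page_title=PAGE_TITLE, layout="wide")
    st.title(PAGE_TITLE)
    embed_pygwalker(pd.read_csv(csv_path, encoding="latin1"))
```

Saved as main.py with a final `main()` call at the bottom, the app would be started from the terminal with `streamlit run main.py`.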
Info
Channel: BugBytes
Views: 17,055
Id: ogyxjkYRgPE
Length: 20min 7sec (1207 seconds)
Published: Wed Jul 12 2023