PyGWalker for Exploratory Data Analysis In Jupyter Notebooks

Video Statistics and Information

Captions
Creating effective and compelling data visualizations is a key part of the data science workflow. Many of us turn to Python to create visualizations using matplotlib, seaborn, plotly, etc., but all of this requires some level of coding. There are a number of libraries available that help with the exploratory data analysis workflow. Some of these I have already covered in my Medium articles, such as Sweetviz, D-Tale and many others, and these make it very easy to get a statistical analysis as well as visualizations of our data with very few lines of code.

One library that's just been released is PyGWalker, or as the authors like to pronounce it, "Pig Walker". PyGWalker is a Python library that can help speed up the data analysis and visualization workflow directly within our Jupyter notebook. It provides an interface similar to the very popular data analytics platform Tableau, where we can drag and drop features from our dataset into the workspace and create interactive visualizations, all with minimal effort.

So in today's video we're going to see how we can use PyGWalker on a well log dataset. We're not going to do an in-depth tutorial; this is more an overview of what it can do and some of the basic functionality. So let's go over to our Jupyter notebook and see how we can use the library.

To get started with PyGWalker we need to install it first. We could do this through the terminal, but we can do it directly from the Jupyter notebook using an exclamation mark followed by pip install and then the library name, in this case pygwalker. When we run that, it goes through the steps of installing the library and then tells us that it's done. In this case I've already installed it, so it's just telling me that the requirement is already satisfied, and we can move on to the next step. But if we end up with a lot of text here, which we may do when we're installing the library, we can click on this blue
bar on the left-hand side just to shrink that output down so we don't see it. We don't have to run this every time we come back to the notebook; we just need to run it once.

Next we need to import the libraries that we're going to use. The first is pandas, which is imported as pd, an abbreviation for pandas. This just makes it easier when we're referencing that library in the code: rather than typing out pandas we can just type pd. Similarly with PyGWalker, we import that as pyg. So we can run that to import the libraries, and then we're going to load the data into a DataFrame object, which we do using pandas' read_csv. For this example I'm using data from the Xeek and FORCE 2020 lithology machine learning competition, and we're just using one well within that dataset.

To then get PyGWalker working we call pyg.walk and pass in our DataFrame. So we can run that, and within a few seconds we get this interface here, where we've got a nice dark-looking interface, which is very nice; I prefer dark mode anyway.

On the top left here we have two tabs, Data and Visualization. The Data tab presents all of the data contained within that DataFrame. We have our index for the DataFrame as well as categorical variables such as the well name, group, formation and, further along, lithologies, and then we have our numeric variables such as caliper, deep resistivity, RHOB, gamma ray, etc. We can change these if they've been incorrectly assigned: we just click on the drop-down for each of these and switch it to a dimension or a measure. A dimension is where we've got a categorical variable, and a measure is where we've actually got a measurement, a continuous variable. There's not much functionality within here: we can scroll through and look at the values and that's pretty much it, but it provides a nicer-looking interface for viewing DataFrames compared to the pandas options.

The main part of the library is the
Visualization tab, and this is where we can visualize our data. If you're familiar with Tableau, which is a data analysis package and much more, we usually have our variables on the left and then we can drag and drop them into specific spaces such as the x-axis, the y-axis and filters, and we can even set the color, opacity, size, shape, etc. It's all done through drag and drop.

So if I take, for example, RHOB and plot that on the y-axis, we just get a single bar back, and this is just a summation of all the values within that particular curve. Then we can put NPHI on the x-axis. What we get back is a scatter plot, but we'll see that we've only got a single point on here, and this is because it's summing up all the values: by default, PyGWalker has aggregation switched on. So we can click this little cube icon in the menu to turn it off, and after a few seconds it comes back with the scatter plot where we can see the individual values being plotted.

Now this chart is a little small, but we can change that by clicking on the layout mode and changing it to fixed. We can see that the shape has changed, but we can also now easily select the edges of the chart with this double-headed arrow. We move that over to the right, then do the same with the bottom, and then we have a much bigger chart to look at. We can hover over each of the points and see the individual values for NPHI and RHOB, and we can see that change as we move through the data.

Now if we want to zoom in and move around the plot, we need to come to this little icon here for axis resizing, which is a double-headed arrow. We click on that and it turns a very dark shade of blue; it's hard to see in dark mode. When you do that you can click on the plot, hold the left mouse button and move around the plot, and we can also zoom in with the mouse wheel. At present it doesn't look like we can edit the axes to change the values, something which
is very popular within plotly, where you can just double-click on certain parts of the axis and change the values. Maybe that is something that will come in a future version, but for this version we can't really control that, other than being able to move around and zoom in.

We've got different ways that we can use the other variables. If we take, for example, the formation variable here and drag that into the color column, or row, here, we'll soon see the plot update with colors representing the different formations. We see we've got the Frigg Formation in orange here, which you can see is hiding behind all these points, and then we've got other formations over here on the left.

However, we can see that this is quite a lot of data. If we want to remove some of it, we can do that just by dragging and dropping that variable into the filters category here. It will automatically pop up with a dialog for editing the rule, and that lists all of the available formations. So if we are only interested in, say, the Balder and the Sele formations, we can unselect them. Now this may seem counterintuitive at first, but then we just click on the reverse selection button here and that reverses our selection. Alternatively, we could have unselected all of the categories and then selected the ones that we want. So there are a couple of different ways that we can work with this, which is nice. If we click on this little tick icon here we go back to our plot, and we'll see that we're left with the two formations, the Balder and the Sele formation.

So this is very nice. We can then view that plot, and if we want to we can color things by different variables, such as by the gamma ray instead of the formation. If I delete the formation here and drop the GR curve into the color column, we see that we've got a sequential range here from about zero up to about 120, and that gives us our different colors
for the variables. Again, it doesn't seem like we can change the color map that's used; by default it just comes with this white-to-blue color map.

If we want, we can also filter by values. So if I take the GR curve and drop it into the filters section, we then have a slider here, and this is just taking the values for the entire curve or measurement; we can see we're going from over six up to 804. At the moment it doesn't seem like we can control these manually in terms of being able to type the values in, but we can adjust the sliders. So if I adjust that slider all the way down to, say, around 80 API and then click on the little tick, we can see that the color scale has changed and points have also disappeared.

So let's just remove these and reset the chart, basically, so that we've got NPHI and RHOB. Now the nice thing about this is we can have multiple subplots without having to specify that I want two rows by three columns, like we would do in matplotlib. We can in fact just come in here, click and drag GR and drop that on the x-axis, and then we'll see another subplot appearing on the right, which is our gamma ray. We can do the same with PEF: we click that and drag it onto the y-axis, and then we'll see our two-by-two grid. So we'll have our NPHI versus RHOB up here and gamma ray versus RHOB here, and the two subplots at the bottom are PEF versus NPHI and, on the right, PEF versus gamma ray.

If we want to create a new visualization, we can click on the New button up here at the top, and then we get a new chart. For example, maybe we want to view a line plot instead of a scatter plot. So we could take our depth measurement and put that onto the x-axis, and then put, say, the RHOB curve onto the y-axis. Again, by default it sets aggregation on, which we can easily uncheck by clicking this button up here, and then we have our scatter plot of RHOB versus depth, which is not quite what we want; we want a
line plot. So let's first resize this so that we've got a bigger plot to look at. Once that's updated, we can change the scatter plot to a line plot by clicking on this little orange button up here. By default PyGWalker sets it to Auto; however, there are various other options in here such as bar charts, line charts, area charts, trails, scatter, circles, rectangles, arc (so your pie chart) and then a box plot at the end. For this one we're just going to select the line plot. Click on that, and now we have our line plot of RHOB versus depth.

And that's the very basics of using PyGWalker to analyze a well log dataset. There's so much more functionality and it's definitely worth exploring. If you enjoyed today's content, be sure to give it a thumbs up, and if you want to see more content from this channel, be sure to click on that subscribe button and ding that notification bell. So thanks for watching, and until next time, bye for now.
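
The notebook steps described in the captions (import pandas as pd and pygwalker as pyg, load a CSV into a DataFrame, call pyg.walk) can be sketched roughly as below. This is only an illustration under stated assumptions: the captions never give the CSV file name or show the column names, so the inline rows and column labels here are made up for the example, and the pandas filter is just a code-side analogue of the formation and GR filters applied in the interface, not something PyGWalker runs internally. The helper name open_walker is hypothetical.

```python
import io

import pandas as pd

# Made-up rows standing in for a single well. In the video the data is read
# from a CSV file with pd.read_csv; the real file name is not given, so an
# in-memory CSV with assumed column names is used here instead.
SAMPLE_CSV = """WELL,DEPTH_MD,FORMATION,GR,RHOB,NPHI,PEF
demo-well,1000.0,Frigg Fm.,45.0,2.10,0.32,2.5
demo-well,1000.5,Frigg Fm.,52.0,2.15,0.30,2.7
demo-well,1500.0,Balder Fm.,95.0,2.30,0.38,3.4
demo-well,1500.5,Balder Fm.,70.0,2.28,0.36,3.1
demo-well,1600.0,Sele Fm.,110.0,2.35,0.40,3.6
demo-well,1600.5,Sele Fm.,78.0,2.33,0.39,3.3
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# pandas analogue of the two filters shown in the interface:
# keep only two formations, then keep gamma ray values up to 80 API.
subset = df[df["FORMATION"].isin(["Balder Fm.", "Sele Fm."]) & (df["GR"] <= 80)]

try:
    import pygwalker as pyg  # installed in the video with: !pip install pygwalker
except ImportError:
    pyg = None  # the rest of the sketch still runs without the library


def open_walker(dataframe):
    """Open the drag-and-drop interface for a DataFrame (call from a notebook cell)."""
    if pyg is None:
        raise RuntimeError("pygwalker is not installed")
    return pyg.walk(dataframe)
```

In a Jupyter cell, open_walker(df) would then bring up the Data and Visualization tabs described in the captions; passing subset instead would start from the already filtered rows.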
Info
Channel: Andy McDonald
Views: 10,553
Keywords: andy mcdonald, geoscience, exploratory data analysis, data science, jupyter notebook, exploratory data analysis in python, pygwalker, gwalker, pygwalker in python, data visualization, data visualisation, data science tutorial, python data science, eda in python, python tutorial, data analytics, python programming, data science project, data preparation, exploratory data analysis jupyter notebook, jupyter notebook tutorial, machine learning, data wrangling, eda python
Id: 3WjWeH3HIMo
Length: 10min 59sec (659 seconds)
Published: Fri Mar 10 2023