Automatically Visualize Datasets with AutoViz in Python

Video Statistics and Information

Captions
What is going on guys, welcome back. In today's video we're going to learn how to automatically visualize data sets with the proper plot types and chart types using just a single line of code in Python. So let's get right into it. [Music]

All right, so we're going to learn how to automatically visualize data sets in Python today, and we're going to do that in a single line of code. Now when I say that, what I mean is that we're going to do the visualization step in a single line of code. So we're going to have one line of code that visualizes our data set automatically; it produces multiple plots and graphs of different plot types. But we're still going to need some extra lines to load the data and to prepare the data and so on. The actual visualization step will be one line of code.

For that we're going to need an external library called AutoViz. This is going to be the core library of today's video; this is going to do the automatic visualization. In addition to that, we're also going to install pandas, scikit-learn and seaborn. Those libraries are only going to be used to get some data sets and to feed them into AutoViz, so you don't have to install those if you already have a CSV file, for example, with the data set that you want to use. But AutoViz is the library we're going to use in today's video.

So the first thing is we're going to open up a command line and we're going to type pip install autoviz, like this. In my case this is already installed. Then we're also going to run pip install scikit-learn, only for the data sets, we're not going to do machine learning; also seaborn, only for the data sets; and pandas, in order to be able to combine columns into data sets and so on.

Now before we get into the code, one more thing to mention here: you should use a Jupyter notebook in today's video. Usually I say I'm going
to use one and you can use whatever you want, but AutoViz is actually programmed and developed for IPython notebooks, so make sure you use an IPython notebook for today's video. If you don't know what Jupyter notebooks are and how they work, I have videos on my channel about this: I have a Jupyter Notebook tutorial and also a JupyterLab tutorial where I show you how you can work with these things. So make sure you have a Jupyter notebook or an IPython notebook that you're working in.

So the first thing we want to do is import a data set from scikit-learn by saying from sklearn.datasets, datasets plural, import, and we're going to go with the iris data set, so load_iris, which is just a function that loads this flower data set. We can say here now data equals load_iris(), like this, and then if I run this and I show data, you can see that we have here the data key with the values, then we have the target, which has the target classes, basically the labels, and then we have the target names; those are the classes that we can have. We should somewhere have the feature names as well, but those are not going to be too relevant here. We don't really care about the actual data; we just want to see how we can automatically visualize the data.

What we're going to do now is say import numpy. Oh, by the way, numpy has to be installed as well, so just for the sake of completeness, pip install numpy, but if you install pandas you should have numpy installed already. So import numpy as np, and then we're gonna say here full_data is equal to np.append of data.data, so we get these features here, and also data.target, and we're gonna reshape the targets here to be (-1, 1), so we basically flip them into a column so that they can be appended to the data. The axis on which this is going to happen is axis one, so we're going to do this column-wise, and then we have here the features and the
class in one array here. Now what we can do is we can say import pandas as pd, and we can say that the data frame, df, is just the pandas DataFrame of the data. Then we're going to say that df.columns are going to be data.feature_names plus, and here we're going to say, type or class, whatever you want to call it. And now you can see what this data frame looks like.

Now all this is just preparation, this is not visualization. So if you have this done already, if you already have a CSV file, you don't need to do all this. This is just taking a scikit-learn data set and turning it into a simple pandas data frame, which we have here now: we have the four features and we have the type, the class so to say. What we need to do now in order to be able to use that with AutoViz is export it to a CSV file, so df.to_csv, and let's call this iris.csv, whatever. And now we have the CSV file here, as you can see.

All we need to do now for the actual visualization is import, from autoviz.AutoViz_Class, AutoViz_Class. Where's the auto completion? Come on. AutoViz_Class. Then we're gonna say here %matplotlib inline, which basically tells the notebook to display the plots inline here, and then we're gonna say av equals AutoViz_Class(). Now the actual visualization step is just one function call: it's a variable equals, and then av.AutoViz, like that, and here we specify the file name, iris.csv in this case. The separator is by default a comma, so that is that. Now this is still a single line of code if I write it like this, so I didn't lie to you: there it is, the single line of code for the visualization.

If I now run this, it will create all sorts of plots of the data set. You can see features plotted against each other, you can see if there's a correlation or something like that, you have box plots, you have distribution plots, you have probability plots, all that. We have violin plots, we
have this heat map here, this correlation heat map. I think it's a correlation heat map, right? Not sure. Yeah, I think so, because we have the ones here on the diagonal. But as you can see, we didn't do anything special here: we just passed the CSV file, and now we have a full visualization of everything that we could simply compare here. Of course we can do some more complicated plots as well, but we have box plots, distribution plots, features plotted against each other. It's very, very easy to create these visualizations with just a single line of code.

Now we can go ahead and also specify some more parameters. Let's maybe go ahead and copy this here. I can say, for example, that I want chart_format to be html, so I want to have it exported as HTML instead of plotted inline. Then, if I want to, I can say max_rows_analyzed is 200, for example, so I can limit the number of rows that we use for the visualization. The same can be done with columns. I'm not going to do this here now because there's no reason to do that here, but you can use max_rows_analyzed and max_cols_analyzed to limit the number of columns and rows to look at when it does the visualization; then it focuses on just a few rows that are important.

Let's just run this now as HTML. What happens is it doesn't show me the visualization here; I now have this directory here, and I have the individual HTML files. So I can open this one, for example, I can say I trust it, and... does it work? Come on. Let's open another one. I think this is just JupyterLab not opening the respective files. So actually I'm going to open a command line, I'm going to navigate to my Python directory, I'm going to go to the current directory that I'm working in, I'm gonna say explorer, and then I'm
gonna go here. There you go: heat maps. Now you can see we have an interactive heat map here, right? I can do the same thing for distribution plots; I can look at the distribution plot, I can look at the scatter plot. So I can plot, for example, the width against the width, which is just a line, and then I can choose all these settings here. You can see I didn't do anything that required any thinking or any brain power. I just said visualize the data set, and then it created interactive HTML files or it created inline plots.

I think we have a bunch more settings. For example, I think I should be able to say server, and then it opens them immediately here in the browser. This is also something that works. And of course I can also say png, jpg or something like that. So I can say png, and then they're plotted, I think, inline. I'm not sure if it saves them already... no, it doesn't save the images. If I pass the keyword verbose equals 2, though, like that, let me just zoom in so that you can see it, verbose 2, it should also save the images as image files here. So you can see we have an image with the plots, we have a heat map, we have the pair... oh, this is actually the HTML file. We have the violin plots and all that. So this is also something that you can do here.

And this works with different data sets. One thing that I want to show you here is how crazy this can get with another data set. What we can do, for example, is say import seaborn as sns, and then I can say data equals sns.load_dataset, and then I can load the titanic data set; I think this was the one that produced a lot of plots. I can say data.to_csv immediately, because if we look at it, this is already a pandas data frame, so I can say to_csv, titanic.csv. Then essentially I take the code that I have here and change just the file name to titanic, and let's say I want to keep the default format, so I'm just going to do it like this here. I'm not even sure
if this is going to happen fast enough. Was this actually the crazy data set? I'm not sure, but you can see that we now also have bar plots for the average values here. I think the crazy data set was actually a different one; let me see if it was the breast cancer data set. But you can see that we have different plot types depending on the data it finds.

So now let's see: from sklearn.datasets import load_breast_cancer, and then, let me collapse these cells here so that we can actually scroll through the code, I should be able to do the same thing essentially. So data equals load_breast_cancer(), then we have the full data, whatever it is, and then we have this here, so we make a pandas data frame out of it. Then we can say classification, because the classification is malignant or benign, and then we're going to export this df to CSV, cancer.csv. What was the problem here? I think the problem was that it had a different type. Which line was this? Plus classification. So I think this here has to be turned into a list, right? There you go. Okay, so df is this here, and we already exported it to CSV.

So if I now go ahead and do this... come on, let me just remove the c here... and if I change this now to cancer.csv, this should produce a lot of plots, because we have a lot of different variables, and it also exceeds the limit: taking the top 30 variables. You can see it's still loading, nothing happens. This produces a crazy amount of plots, so depending on the data set you will see a lot of graphs. You can again, as I said, limit this with max_rows_analyzed and max_cols_analyzed. I'm not even sure how long this will take; this might take quite some time. But yeah, this is a nice way to just go ahead and say: I have a data set, I don't want to explore it, I don't want to call the info function or the describe function, look at it, look at the documentation; just feed it into AutoViz and see what
happens, see what features you have, see what distributions you have. You don't really have to think at all about the data; you just throw it in there and see what happens. Maybe you want to limit the columns or the rows.

Now we have the plots here. You can see a lot of different plots, a lot of features plotted against each other, and then if I scroll down more... oh, this is quite a lot of plots, as you can see here. But if I go down even further, oh my god, this is huge. Here you also have box plots and distribution plots. Yeah, here it makes sense to just limit it. You also have a huge heat map, but here it makes sense maybe to limit it. We can actually try out the parameter: I think we should be able to limit here by saying max_cols_analyzed equals 2, for example; then we only have two columns analyzed here, I think the two most important columns. Taking top two variables. I can also go with five, and then we don't have so many plots. But yeah, this is how you can limit that. You can also limit the rows.

I think this is a nice way to automatically visualize data sets in Python. So that's it for today's video. I hope you enjoyed it and hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye. [Music]
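
The preparation and visualization steps described in the captions can be sketched roughly as follows. This is a reconstruction from the spoken description, not the code shown on screen: the helper names to_frame and visualize are hypothetical, index=False when writing the CSV is an assumption of this sketch, and the AutoViz import is deferred so that the data preparation runs even where AutoViz is not installed.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_iris


def to_frame(dataset, label_name):
    """Combine a scikit-learn dataset's features and target into one DataFrame.

    Hypothetical helper; the video does these steps directly in notebook cells.
    """
    # reshape(-1, 1) turns the 1-D target vector into a single column so it
    # can be appended to the feature matrix along axis 1 (column-wise).
    full_data = np.append(dataset.data, dataset.target.reshape(-1, 1), axis=1)
    df = pd.DataFrame(full_data)
    # feature_names is a list for iris but a NumPy array for breast cancer,
    # so convert to a list before concatenating the label column name.
    df.columns = list(dataset.feature_names) + [label_name]
    return df


def visualize(df, csv_path, **options):
    """Write df to csv_path and run AutoViz on that file.

    Hypothetical helper; requires `pip install autoviz`. Extra keyword
    arguments such as chart_format, verbose, max_rows_analyzed or
    max_cols_analyzed are passed through to AutoViz unchanged.
    """
    # Imported lazily so the preparation code above works without AutoViz.
    from autoviz.AutoViz_Class import AutoViz_Class

    # index=False is an assumption of this sketch; it keeps the row index
    # from showing up as an extra column in the plots.
    df.to_csv(csv_path, index=False)
    av = AutoViz_Class()
    # This call is the "single line" visualization step from the video.
    return av.AutoViz(csv_path, **options)


iris_df = to_frame(load_iris(), "type")
cancer_df = to_frame(load_breast_cancer(), "classification")
print(iris_df.shape, cancer_df.shape)
```

In a notebook, after running %matplotlib inline, a call such as visualize(iris_df, "iris.csv") should produce the inline plots, while visualize(cancer_df, "cancer.csv", max_cols_analyzed=5) limits the wide breast cancer data set to five columns, and chart_format="html" or verbose=2 can be passed the same way.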
Info
Channel: NeuralNine
Views: 16,534
Keywords: autoviz, python dataset visualization, python visualize datasets, python automatic dataset visualization, python visualize datasets automatically, python autoviz
Id: 68T3timvdt8
Length: 15min 38sec (938 seconds)
Published: Thu Jun 16 2022