Data Scientist Ranks Every Data Visualization

Captions
Hey y'all, it's Andrew Couch here, and this is not an actual Tidy Tuesday video. Instead, I wanted to make a video where I rank common data visualizations. I got the inspiration from doing some movie tier lists with a friend of mine, and while I was doing it I thought it'd be kind of fun to do the same with data visualizations. So I created a little tier list on tiermaker.com with a bunch of data visualizations that I got from the chart types in the R Graph Gallery, which I figured is a pretty good array of common data visualizations.

I'm pretty bad with actual data visualization names, so I'm going to keep this on the side so I can say the official name, and then I'll give my thoughts about the chart type and where I see it being ranked. As for the rankings, I'm going to rank by usefulness and effectiveness of a data visualization. But as we know, most data visualizations are important and they all have their own use cases, so this is definitely not an objective measurement saying you shouldn't use a certain type of data visualization. It's definitely a case-by-case basis.

Unfortunately, some of these pictures are pretty small, so I'll try to refer to my little chart table right here so we can see them in a bigger format, but I know these charts pretty well.

We can start with this one: right here we have a scatter plot. I think the scatter plot is basically one of the best visualizations you can have, and I would definitely rank it as S tier. When you're looking at numerical data, or continuous data in general, you will generally make a scatter plot to look at some type of pairwise relationship. So I think the scatter
plot is just a great plot to use as is.

Okay, next is this one. Is it a dependency plot, or what is it called? Let's see, it's called a dendrogram, and they look like this. I kind of get a dendrogram: it's all about hierarchical data. You have a tree, and you can see how these end nodes are related to this branch, and it's really just a central branch in general. Although I think it is kind of useful when you're looking at network analysis and stuff like that, I don't think it's a very common data visualization technique, and frankly it's a little confusing to actually visualize. So I'm going to rank this as a C. I don't think it's really commonly used, just because there are better ways to do it; for me, I definitely like a Sankey diagram right here. But I'm going to rank it at C for now, because in general it's kind of hard to plot in ggplot, so you have to go through other packages, and it's also kind of confusing to look at. From a layman's perspective, if you're not really into data, it's a little confusing when you see it.

Okay, the next one is the heat map. I think heat maps are actually pretty underutilized. Heat maps in general are pretty useful, especially when you're dealing with a lot of data. When I have a lot of data, generally I'll try to bin it or group it, and that allows me to see if there are any relationships I want to look at. Heat maps are definitely pretty easy to make: you just use geom_tile with the fill, and you can add a lot of stuff to it. Not only is it useful with two groups, it's also useful because you can condense it on both a
categorical variable and a numerical variable. Additionally, if you're working with a lot of geospatial stuff, heat maps are pretty useful because you've binned things into these little cells and you can see a lot of what's going on. So I really like heat maps. I don't show them a lot, just because usually I'll do a scatter plot, but heat maps in general I'm going to rank as A tier.

Okay, the next one is, I believe, a bubble plot, maybe? This thing right here. Let me pull it up. Oh, actually, I guess it's called circular packing or a circular treemap. I generally enjoy these types of plots from a data journalism standpoint. I think they're really compelling; you see them a lot in The Economist and places like that. But in general, if you're doing an EDA, I don't think they're very useful for the amount of effort you put into creating them. It does take a lot of effort, and you'd probably have to do it in D3 or something like that. So in general I think these are actually better than, say, this plot right here, but I wouldn't make one in my day-to-day EDAs. That being said, another variation of the plot is a beeswarm plot, which is very cool. So I would probably rank this at least above this plot. I do think these plots are really cool. ggplot does allow it; it's kind of like a geom_jitter, but it's really not the same thing.

Okay, this plot, I believe, is an area line chart, or a stacked area line chart. I really like these charts, not only because they're a good way to compare multiple groups on a time series basis, but because it's pretty easy, especially when you do a lot of reordering with the colors, to make a
pretty good insight and summarize all of your data in just one simple plot. You see an overall visual trend, and you also see what components are really driving that trend. So I'm going to rank this as maybe B tier. You don't see them every day, at least in my work, but I generally think these plots are really cool. One variation that I like is the stacked column chart, where it's basically like this but with columns instead of lines. In general these plots are very good and very compelling to use when you're trying to get some type of insight.

These plots right here, to my left, are I think called chord diagrams. Chord diagrams are again one of those plots you'd probably see in The Economist or from a data journalism standpoint, but when you're doing an exploratory data analysis, or just data science in general, they're kind of hard to create. I do think these are very cool. The use case is always about networks and relationships, and I think that's really useful. If you were to write an article about it, you could write a lot about how you can see these flows of relationships and their degrees of freedom between multiple groups. But if you have this plot in a presentation to business stakeholders, it takes a lot of time to figure out the "so what" of the plot. Even when I'm looking at this one, which I think is from the gapminder package or something like that, yeah, we can see South Asia and West Asia, there's something connected to it, but I don't really know what to get out of it from the other relationships. Oh, so through
migration, so I think that's kind of interesting. But again, it takes a lot of time to look at. Obviously you can see how the arrows show those relationships with each other, but it's a little confusing, and you wouldn't want to do it every day; you couldn't create that visualization very fast, at least in my opinion. So I'm going to rank this a little bit above this, maybe below the bubble plot, the circle tree chart.

You may notice that for this next plot we have two of the same scatter plots. One of these scatter plots is actually an animated scatter plot. One of the famous visualizations using animation is, I think, the Gapminder one of life expectancy or mortality rates. It shows the trend of all these countries changing through time and how we as an entire society have lowered our mortality rates. It's very interesting, it's very compelling, and it's a really good story and a good presentation, but animated charts just are not good. I really don't think animated charts have a place in a lot of data science workflows. They're mostly for a single presentation, and even then there are better ways to do it. One way is a basic scatter plot where you have A and B with a connected segment, and you can see how, hey, this point went to here in 10 years. That makes it a little easier, especially when you have a lot of frames or a lot of animation time, where if someone stops looking at it, after 10 seconds they lose their train of thought and have to go back and look at it again. It's not the best way to look at data and gain some insight, but when it works, it works very well. So just because it's D tier, it doesn't mean it's a
bad visualization; it's just really hard to pull off, especially in a data science workflow.

Here is your simple line or time series chart. Oh, I actually have two of these. For me, obviously you're just going to put it in S tier, like any type of relationship chart. These charts are simple, they get straight to the point, and you can visualize an overall trend and seasonal patterns. It's a solid chart; I don't think there's much to discuss. I will say a problem with a lot of line charts is that there's a lot of noise and a lot of gridline stuff, which takes away from it. One thing I will say about good line charts is to be aware of the scale of your y-axis. Sometimes people use a log scale on the axis, and sometimes people won't start at zero, they'll start at some other number. Make sure you tell whatever viewer is looking at the chart to understand the scale so that he or she won't be confused.

Okay, now we have the bar chart. Bar charts are my favorite chart in general, so I would consider it the best chart, just because you deal a lot with categorical variables and you generally do a lot of counting. When you think about data science, you think about complex neural networks, crazy Bayesian analysis, a lot of rigorous statistics, but a lot of it is just doing simple counts, and simple counts can give you a lot of insight into your own data set: for modeling, for insight, and for growth, what's driving things. So in general, column charts or bar charts are great. Personally, I prefer the flipped bar chart, where the bars go horizontally instead of vertically, just because your axis names can be a little bit longer. But again, when you don't have that many things, the
classical bar chart is great.

This chart right here is a 3D chart. Let me see if I can grab an example from my little reference. I guess, yeah, 3D chart right here. Actually, I think it's just better for me to search for it whenever I need it. So we have a 3D chart, and 3D charts like this are terrible. You need to have 3D charts that have a reason to be 3D. For me, this right here would just be a bar chart, right? Even though it's a pie chart, it should be a bar chart, because the scale or the dimensions don't mean anything here. But for this right here, where you're doing maybe a cluster analysis and you want to see a lot of interactions, I think a 3D chart is actually pretty useful, especially when you're dealing with surfaces too, with geospatial stuff. In general, when you're trying to do an EDA with clustering, I think 3D charts are very useful. I don't use them too much, because I just do a bunch of pairwise charts, but sometimes I'll do a 3D chart, and those are always very interesting to see. So I think in general they're pretty good. I'm going to put it right there.

Okay, this right here is, I think, called a spider chart. Let me see what it's actually called. Yeah, a spider or radar chart. In general I think spider charts or radar charts are interesting. A lot of times you could say this could just be a bar chart, because it's just categories and their values. But sometimes you want to compare multiple groups by multiple categories, and having a bunch of spider charts like this or that is actually pretty cool. I think for dashboards it's pretty cool to have, just because it looks a little fresher and you can store a lot more data, I guess per square inch of information,
with a radar chart. You see a lot of radar charts in sports, when you're showing, say in baseball, attributes of a player or attributes of teams and trying to compare the teams. So I think radar charts are pretty good actually; I think they're pretty underrated. They're very hard to create in ggplot, but I still think they have their use cases. So I'm going to put this kind of at the bottom of C, just because you're not going to make a lot of them in your analysis, but when you do need one, radar charts are pretty much perfect. So nothing really too negative about radar charts.

This chart right here is called a lollipop chart. I used to be a pretty big fan of lollipop charts because, from a data visualization standpoint, they solve a lot of issues with a column chart, where when you have a lot of groups it can be a lot of information to take in. With a lollipop chart you can have more columns, basically, and it won't be as distracting for a reader or audience. Additionally, since you have the little point, the end of the lollipop, you can add some more data to it, which can make it pretty useful. So you have a column chart, and you can also add a little bit more to the column chart. The flipped lollipop is also pretty good. A variation of it is the dumbbell plot, where we essentially have two lollipops going on both ends. I think the dumbbell plots are really good, obviously, for comparing things among multiple groups. So if you're doing ANOVAs and stuff like that, I think lollipop or dumbbell charts are a really good use of data visualization and something you really can't do very well with a bar chart. So I'm a huge fan of lollipop charts. I actually might put it above the heat map.
So just for now I'm probably going to go by here; I'll go through here again and maybe do a little rearranging and talk about why. But I think right now a lollipop chart is definitely an A tier chart.

Okay, this chart right here, I need to figure out what it's called. I guess it's called a circular bar plot. I don't like these at all. I think they look cool, so again very useful for data visualization, but it's just needlessly messy. It's kind of cool how we have different subgroups, A, B, C, D, and you can kind of see it right here, but it's hard to compare, say, Mr 11 in group B versus group C's Mr 48. Like, how is this one bigger? You do a lot of searching. I think this is very cool for maybe a cover of some data journalism magazine, but from a usability standpoint it's way more style over function. I think it's probably the worst plot to use. I don't think there's any reason why you would want to do a circular bar chart over a faceted bar chart unless it's pure aesthetics. So yeah, not a big fan of it. I don't think it's a very useful chart in general.

The donut chart. There are a lot of discussions, if you're into data visualization, that pie charts are the worst, pie charts are terrible. They're bad for comparing things. The only time a pie chart is kind of useful is when you have one large group with two other small groups, but in general a pie chart can be a bar chart. That being said, I think donut charts are a little bit better than pie charts, but only slightly. I don't think they're very good. I do think they look cool on dashboards, but I'm not really a fan of them. I'm going to put them basically below the animated charts, because sometimes you want a little more visual variety,
you know. You can't just have, well, you can, and I would actually prefer a dashboard that's just all bar charts, but sometimes, especially in finance, you might want to put a donut chart right there, and that's fine. It's nothing I'm going to get really angry about, but I think most people would rather just have a bar chart. But, you know, VPs probably like it.

With the pie chart, I've already kind of gone through my tirade on pie charts. I think they're pretty bad. I think they still have one very specific use case. So it's going to be below the donut chart, obviously, because even if you want to do a pie chart for its use case, a donut chart will be better. But I still think it's a little bit better than the circular bar chart, just because this could be a pie chart and it would look a little better. But realistically, they should all be bar charts.

Okay, this chart right here, I didn't really see it, or maybe I did. So this is a pairwise chart, which basically shows all the pairwise interactions and distributions. You can see it in GGally pretty well. Let me see if there's a little article: ggpairs. I love ggpairs. Generally, when I'm working with a lot of columns and I'm kind of frozen, or there are so many features I want to look at and understand the relationships between, but I don't really know where to start, like, maybe I should plot the distribution of this column, or maybe I should see the scatter plot between these two things, I think GGally is basically the thing you should be using, because you get a high-level summary of all the relationships and all the distributions between your columns, and you're able to color it and look at everything. So basically this is like one brief
EDA in one plot. It does take a while, but it's definitely worth doing, and it really alleviates all the manual plotting work you'd otherwise have to do. So I consider it S tier. It's not better than the line chart or the scatter plot, because those charts basically make up the pairwise plot, but it's still pretty good. I don't know why I have two of these. I think I'm just going to put them right next to each other, because I think these are the same thing. Yeah, they look like the same thing.

Okay, area charts. In general I think area charts are kind of interesting. Area charts like this one right here, I'm not really a huge fan of, unless you're doing a density, but I think this is mostly just a line chart with area; it's not necessarily a density plot. For that, I still think it's pretty good. What I will say is, when you have it crossing the axis like this, I think these charts are very useful, and a line chart won't really capture this kind of relationship, where you can see the periods of positive values and the periods of negative values. It's much easier to see with an area chart than with a line chart. So I still really like them. I don't see them a lot, but just because I don't see them a lot doesn't mean they're not useful. So I'm going to put it maybe right there.

Okay, this right here is a hex map, and I really like hex maps. A hexbin map solves a lot of geospatial problems. A lot of times when you're visualizing data geospatially, or anything to do with locations, you see a relationship that's basically tied to population or area. A good thing these hexbin maps do is that they convert all of the states or regions you want to show into the same area or same size of
a point. This makes it so you can look at it and not be skewed toward larger areas. Texas is a huge state, and so is California, but once we normalize them to these hexagons and then normalize by population, we get a clear picture of what we want to look at, like, say, marriage rates. So I'm a huge fan of these hexbin plots. I actually started off my data visualization journey creating hex plots for Iowa counties, so they always have a place in my heart. I guess the big issue with a lot of these hex plots is that there are not a lot of hex maps available. There's probably some type of library that can kind of condense it, sort of, but a lot of these state or county ones you have to make yourself by hand, and that takes a long time. So if you find a hex map that fits the data you need it for, then it's extremely useful, but sometimes you just don't have it, and that can be a problem. So I'm going to put it right there, just because you don't see them too often and a lot of times you won't have the ability to do a hex map, but when you can, they're very nice to have.

The next one is a network plot, I believe. Yep, we have a network plot. Networks, I think, are very cool and very interesting, and there's a lot of data visualization theory that goes into network plots. When you're plotting a network, where should A be relative to B, or where should E be relative to C? Unless you have specific coordinates, like for me, working in transportation, where we have these coordinates, it's kind of hard to determine that. Sometimes they'll just do it randomly like that, sometimes they'll make it a circle, and sometimes there are different
algorithms that will do it. So there's a lot of data visualization theory that goes into network plots. I find them very useful in my work, so it's definitely something you should always keep in mind when you're learning data science. You don't see them too often, but they are very useful and there's a lot of theory that goes into them. So I'm going to put it pretty high, probably just below the heat map. They're very cool plots. ggplot doesn't have great support for network plots, but igraph does, so if you ever need to do a lot of network analysis, look into the igraph library, which I believe is like a subset or an adjacent ggplot library.

Okay, the next one is the Sankey plot. I find Sankey plots to be very cool, but you don't use them a lot. If you're doing a lot of network or process analysis, then a Sankey plot would be very useful, so maybe for managers, or in general anything that has to do with a process. You see it a lot with, say, your salary and where it goes; you see it a lot on r/dataisbeautiful. That being said, I don't do a lot with Sankey plots. I don't even know how to make one in R. So although I think it's good, I don't use it a lot and I don't find it useful in my own work. I can see how it could be very useful for a very specific task. So I think I'm going to put this maybe below the 3D plot. Honestly, I'm going to switch these out right there, because the more I think about it, the radar or spider plot is going to be used a little bit more in a data science workflow than, I guess, the Sankey.

Okay, this next plot, I believe, is called a parallel plot or something like
that. Yeah, a parallel plot. A parallel plot is basically a line chart, but on your x-axis you have categorical variables. I find parallel plots to be pretty useful when trying to visualize distributions among many samples' characteristics. Right here they're showing it off with the iris data set: each line represents a sample, and the y-axis represents that feature's value for that sample. You can see the variation among samples and also the variation among species. So I find these plots to be very useful and honestly underutilized in our data science workflows. They're very nice, especially if you're doing a Bayesian analysis, since you're always trying to look at the variance of a class or a group. So I really like these plots. I make them probably once a month when I'm doing an EDA, so it's definitely something you should consider adding to your toolbox. I would consider this at the top of B tier.

The next one is, what's it called, a density, yeah, a 2D density plot. 2D density plots, again, I think they're pretty cool. They're basically the same thing as a heat map, but they have more to do with actual areas, so if you're doing a lot of geospatial stuff, they're pretty useful. I still think it's good. That being said, I have a preference for heat maps in general, just because the binning is kind of weird in ggplot and a lot of the stuff is kind of weird. But we can see there are a lot of variations of the 2D density plot, and you can get an insight, like what's going on in that hot spot, stuff like that. I don't like contour plots as
much, with these lines. Even though I think most data scientists and analysts can read them, I'd be hesitant to show one in a presentation, because a lot of times people will misread a contour plot, and it's better to just have, you know, the brighter the color, the higher the value, instead of the closer the lines, the more the value is increasing. So yeah, not a fan of contour plots.

The next is a, was it, choropleth? Yeah, choropleth. Again, if you're doing a lot of plotting and geospatial stuff, you're going to do this. Generally I would just call it a plot of a map. But one of the main problems you face with these choropleth plots is that when you have these colors, your eye kind of darts toward the larger countries and not really the actual colors, so it can mislead you, which is why we have the hex maps and stuff like that. So again, they are useful, I'm not saying they're not, but I would always prefer a hexbin plot.

Now we're at the NLP stuff, which is the word cloud. My friends know my reputation on word clouds, and the thing is, I really hate word clouds. I think word clouds are just terrible. I think they're kind of weird and they don't display a good use of data visualization. What I mean by that is, one, right here there are way too many colors. The colors obviously don't mean anything; they're just there to separate and distinguish the words. But the size of the words is very hard to distinguish. So "oil", I guess, is the biggest word, but a lot of times the size is also dictated by the number of letters a word has, which makes it very hard to compare tokens with each other. So maybe it's cool from a design standpoint, if you're trying to make
some artwork, but from a data visualization standpoint it's just terrible. I would rather have a column chart with a column for each word and then maybe some type of lumped "other" column. So yeah, not a fan of word clouds. I think they could be interesting if you want to do a dashboard or something for someone to browse, but for actual insight, it's not good. I would consider it maybe on the tier of a pie chart.

This plot right here, I believe, is kind of a network thing, a connection map. I think this is super niche. It's basically the same thing as a network plot, but used with maps. I think it's a fine plot, maybe a little above the hierarchy plots, but you're not going to use it every day. It's something you'll use if you're in, say, transportation. So I don't have much to say about that.

Okay, this plot right here, I think it's called a joy plot or a ridgeline plot, which is based on the Joy Division album, Unknown Pleasures. These plots are getting more common now in data science. I am a huge fan of the ridgeline chart, because a lot of times you're trying to compare distributions of some parameter or value among multiple groups, and although ggplot lets you use colors and decrease the alpha, having them as separate lines makes it way easier to compare values. I think it's honestly one of the better S-tier plots, and I use it most of the time. If you're doing Bayesian analysis, ridgeline plots are extremely useful; if you just want to compare distributions among groups or among multiple columns, again, ridgeline plots are great.

I believe this is a density plot, right? Let me see. Yep, a density plot. Oh, do I not have a histogram? Oh no, I
do uh i like density plots a lot um so right here this is a time where you know you don't really need a ridgeline plot but right here yeah a ridgeline plot would be pretty pretty useful because it's kind of hard to compare these things they're just kind of stacked together or you can facet it like that but i i really like density plots if you have uh things like this this is a time when ridgeline plots won't really make any sense so a mirrored density plot is pretty cool um yeah in general i like densities um sometimes i do prefer some binning just to remove the tail ends but i do enjoy a density plot so i'm going to put it at pretty top of a-tier here um and then the histogram right here uh histograms again very similar to a density plot but we have some binning uh i enjoy a a good histogram i don't like these histograms where you can't see any distinction between the bins so like i generally set color equals white uh so we have like kind of like this style but in general i i i do enjoy uh using i do like uh histograms in general sometimes i i prefer not to have any actual bins and i'll just do a simple bar chart but when you're doing bar charts and histograms they're relatively similar except histograms are binning it in groups so uh i i definitely consider it to be more at least top of uh the density plot i think for me personally i use density plots a little bit more than histograms but i i know a lot of other people will use histograms over density plots so uh nothing much to say um what is i'm curious what this is oh yep this is a bubble plot so bubble plots are just scatter plots where the size of the point is increased and i'm pretty against bubble plots or i'm in general i'm against changing the size as a variable because it's kind of hard to compare two circles uh for values uh because a circle twice the size of another circle doesn't really look exactly twice because you have like radiuses and stuff like that um so if you are using bubble charts it's more of
just kind of a general trend not to have a direct comparison right so it's like oh these two these these two bubbles right here are larger than the other ones what are those bubbles but if we want to compare how much bigger is you know that bubble compared to this bubble it's not a great thing to do but again they're very common uh they just can be misused which you can say about any plot but i think these plots are very easy to misuse and misinterpret where you can kind of um where it's harder to compare direct samples together okay so this plot right here i'm uh i'm actually not sure what this is oh an arc diagram that's what it's called um i i think these plots are kind of pretty but very hard to figure out what's going on uh i if i guess it's supposed to be like a network plot where you can see how like a is being has a relationship with all these you can see how b has a relationship with a and e i think that's cool um maybe but again this this plot is harder to understand compared to this plot uh so i don't really think i think again it's more the style over substance stuff i'm gonna put it right there it's still better than you know a donut chart no actually it probably isn't no i'm gonna put it below uh below that it's not as bad as this circular bar chart but it's pretty bad i i think it's it's it's almost like a pretentious like a data visualization where i think it's not a good way to analyze data okay um what what is this um oh yep this is a hierarchical edge bundling plot um i think this is pretty interesting i think again it's pretty much a variation of uh of the uh of the chord diagram i think this is a little bit cleaner just to see like oh we have a little group so i'll put it above right here but again i don't think maybe i guess these are pretty similar so i don't know why they would be this one is better than that one actually might switch them where uh oops or this is a little bit better than that plot
but again uh i'm not really a big fan of them i don't use them too often too so maybe i'm just missing something so the next one second last one is the violin plot uh i think violin plots are really cool because violin plots are essentially density plots flipped on its side and mirrored and one of the big criticisms of a violin plot is that the mirroring effect kind of is confusing additionally it's it's pretty hard to read for someone who's not really familiar with the density plot but i like density i like the violin plots because it shows a little bit more detail than a box plot while still maintaining a lot of the clarity of a box plot that being said um one of the biggest criticism is if you need more detail on a box plot just plot like a ggridges a ridgeline plot right from from there and i kind of agree with them i kind of think that violin plots were a hot trend for a year and then people realized that there's alternatives to it but i i still enjoy seeing a violin plot every now and then i just don't use them anymore uh i i definitely switch more towards ridgeline plots so i'm probably going to put them up to c i think they're above a sankey diagram because you'll probably see a violin plot more often than a sankey but but it's not not the best plot not now that there's more modern alternatives so lastly we have the box plot um box plot shows the quartiles or and then you can also see the mean median and outliers i love box plots i it's probably one of my favorite plots to look at and there's a lot of use cases with them right you can compare groups and their distributions you can also see like oh between two groups which one has like the higher median um that's very useful additionally what you can do is add in your geom_point or so you can kind of make almost like a scatter plot or whatever um and i think instead of having a violin plot it's actually nicer to have just a box plot and then you put in geom_jitter with a height uh at zero so the
jitter the points can only go widthwise and can't go vertically so that way we can see you know the the summary statistics of the outliers quantiles median while also looking where is the distribution of all of our samples so i love box plots i think they're incredibly useful and i use them literally every day so i'm going to put them right above here i'm going to look at these just a little bit more just see if i want to change anything um yeah so animation chart i think i'm maybe i was a little too too cruel on the animation chart um i think this chart is pretentious but actually still gives some type of insight whereas the the word cloud doesn't i do think i'm pretty against the circular bar chart and obviously we're gonna put uh the pie chart up below actually just because uh just uh because of data visualization i'm gonna put the pie chart all the way on the bottom um as for s tier i think bar charts are the best scatter plots are definitely good line charts right here are very good the ridgeline plot is it's extremely useful the box plot is extremely useful the pairwise plot is the thing that i i usually do a lot when i start my edas density and histograms are are very good i kind of yeah no i i definitely use ridgeline plots more than these guys and i use box plots more than the histogram stuff like that i think lollipop charts are again pretty good too networks are good area stuff like that i think one thing i'm thinking about seeing right here is a lot of these plots that i think are lower rated have to do with a lot of like relationships and processes um and for me i just don't do a lot of that stuff so i'm sure if we have someone who is watching this video who does like network analysis they might disagree with me but i think for a beginner you're not going to see that a lot um so yeah um here's my final tier list i'll probably i'll post the link to towards the uh towards the tiermaker website so you can make your own tier
list and yeah i'll see you guys next time and tidy on
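A note on the bubble plot remark above, that a circle "twice the size" of another doesn't really look exactly twice. Here is a minimal sketch of the arithmetic, not anything shown in the video. The function names `area_from_radius` and `radius_for_value` are made up for illustration. The assumption is that a reader judges a bubble by its area, so if the radius is mapped straight to the value, a value that is 2x bigger is drawn with 4x the area.

```python
import math


def area_from_radius(radius: float) -> float:
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


def radius_for_value(value: float, scale: float = 1.0) -> float:
    """Radius whose circle area equals value * scale (hypothetical helper).

    Sizing bubbles this way keeps the drawn area proportional to the data value.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(value * scale / math.pi)


if __name__ == "__main__":
    # Naive mapping: radius = value, so doubling the value quadruples the area.
    naive_ratio = area_from_radius(2.0) / area_from_radius(1.0)
    print(f"radius doubled, area ratio: {naive_ratio:.1f}")

    # Area mapping: doubling the value doubles the area,
    # and the radius only grows by sqrt(2).
    small = radius_for_value(10.0)
    large = radius_for_value(20.0)
    print(f"radius ratio: {large / small:.3f}")
    print(f"area ratio: {area_from_radius(large) / area_from_radius(small):.1f}")
```

Even with the area mapping, comparing two circles by eye is still harder than comparing two bar lengths, which is the point made in the video. The sketch only removes the 4x exaggeration.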
Info
Channel: Andrew Couch
Views: 736
Rating: 5 out of 5
Keywords: Rstudio, Tidyverse, R Programming, Data Science, Analytics, Statistics, Data Visualization, Data Viz, EDA, TidyTuesday, RStats, Data, Data Modeling, Tidy Tuesday, Learn Data Science, Learn Statistics, Learn Machine Learning, Machine Learning, ML, R Stats, R Studio, R Shiny, RShiny
Id: Nx7gs_GEswo
Length: 44min 28sec (2668 seconds)
Published: Tue May 04 2021