Get it Right in Black & White Episode 12 - Quantitative and Categorical

Captions
And we're live. Oh, my camera doesn't seem to be coming through in the video. Uh-oh, hang on, I want to leave this meeting and come right back. All right, I'm back and my video is working. Yeah, Google Meet grabbed my video, but now it's live on the stream. All right, let's see, we are live. Welcome, everyone. Oh wait, testing, testing; it looks like the sound was clipping a little. There we go. All right, welcome everyone to episode 11 of Get It Right in Black and White: Quantitative and Categorical. We've got a full house today: Kostov is here, Adil is here, Eric is here, Anita, and myself. Welcome, everybody! Are things going all right? Great, great, thank you. All is well. Yeah, the submissions for this week were quite cool. I'm looking forward to stepping through them, and maybe we can bring up any questions that have arisen while trying to do this exercise. So here we go, here are the forum submissions from last week. Oh, there was a typo in there: "make the menus work with species", that was the goal. So let's see what happened. Oh, can you all see my screen? Actually, I can't see your screen. There we go, sorry, now you should be able to see it. Oh, great. Yep. All right, so Kostov had some great discussion, and I was so impressed that you were able to use the little snippet embed feature of VizHub. That's so cool, because we can talk about the specific line numbers. Yeah, it was really helpful to have the snippet in the chat as well; thanks for the tip on how to do that. Nice, my pleasure. I'm happy to see the feature being used. It's been there for a while and we haven't really used it much, but it comes in super handy when we're asking questions about the code. So the question was: you're trying to add the click event listener to the option element. Yeah, see, this is actually what I did too the first time I tried to get the menu working, and I was also confused as to why it wasn't working, but it turns out that what you need to
do is actually add the event listener on the select element, not on the inner option elements, and once you do that it seems to work. Did that work out for you, Kostov? Yes, yes, that worked out. Instead of applying the listener on the option elements, I applied it on the select element, and it worked. Yep, excellent. And I think event.target.value works there. This is a great lead-in to some of the stuff we're going to do today. The follow-on question was: now that I have the column name back from the menu, how do I get the text of it, the label text? It's a great question. There are a couple of different approaches, but my preference is to keep metadata about the columns in a data structure that you define yourself, and this is also something I'd like to do in today's live coding. This way you can associate the value, the label, and the type for each of the columns, which lets you distinguish that, for example, species is categorical. You could do this by introspecting on the data, like checking the type, but I like to type it out explicitly so that we have this data structure. To be clear, you could also generate this data structure by looking at the first row of the data; having an intermediate data structure like this is a nice way to decouple those two activities. Then, to answer the question, you could look up the text by constructing a Map, which is a data structure that's essentially a dictionary for looking up key-value pairs. Yeah, this was really nice; I really liked the way you structured your metadata as a separate variable. Looking into it, I also found a subtle difference: when you said map, I think previously we also used the map function, which had a lowercase m, right? This Map has an uppercase M, for sure. So this uppercase-M Map is the dictionary, and the lowercase one is a functional programming construct, is that right?
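To illustrate the distinction raised in this question, here is a minimal plain-JavaScript sketch, not code from the stream. The column value strings and labels are assumptions for the iris data set. It shows lowercase-m Array.prototype.map producing a new array, and uppercase-M Map used as a lookup table from a column name (such as the event.target.value a menu returns) to its label and type.

```javascript
// Hypothetical column metadata for the iris data set. The value strings are
// assumed to match the CSV header; the labels are illustrative.
const columns = [
  { value: "sepal_length", text: "Sepal Length", type: "quantitative" },
  { value: "sepal_width", text: "Sepal Width", type: "quantitative" },
  { value: "petal_length", text: "Petal Length", type: "quantitative" },
  { value: "petal_width", text: "Petal Width", type: "quantitative" },
  { value: "species", text: "Species", type: "categorical" },
];

// Lowercase-m map: Array.prototype.map returns a NEW array holding whatever
// the callback returns for each element.
const columnNames = columns.map((d) => d.value);

// Uppercase-M Map: a key-value dictionary. Built from [key, value] pairs so
// each column name points at its full metadata object.
const columnsByValue = new Map(columns.map((d) => [d.value, d]));

// Given a column name (e.g. from event.target.value), look up its label and
// type. Optional chaining yields undefined for unknown names.
const getText = (columnName) => columnsByValue.get(columnName)?.text;
const getType = (columnName) => columnsByValue.get(columnName)?.type;

console.log(columnNames.join(", ")); // sepal_length, sepal_width, petal_length, petal_width, species
console.log(getText("petal_width")); // Petal Width
console.log(getType("species")); // categorical
console.log(columnsByValue.has("color")); // false
```

The Map is built once from the metadata array, so each later lookup is a single get call rather than a scan over the array.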
That's exactly right, and I can see how it's confusing, because they have the same name. Map is a data structure built into JavaScript since ES6, so it's a fairly new data structure. If I want to know about it, I just search for "javascript map mdn"; MDN is the best place to look for documentation. This page tells you all about uppercase-M Map: what it does and how to use it. It's different from regular objects in a number of ways, but it serves essentially the same purpose, and it has some nice methods on it. Hey, Felipe is here! Welcome, Felipe. So with Map, you can create a new one, you can set keys and values on it, and then you can check whether it has a certain key; all of this is available on maps. JavaScript's Array map, by contrast, is a method on arrays, which you can find out more about here. It is indeed a functional programming construct: on an array there's a method called map, you pass in a function that takes as input one element of the array and returns something, and you get back a new array of just the returned values. So that's the distinction between lowercase-m map and uppercase-M Map, and I think we'll end up using this data structure in today's live coding as well. Great, this was really helpful. Nice. So here's the submission. Oh, I like the styling on the menus, very nice, and it works. Fantastic! Beautiful, very nice work. Let me just take a quick look at how it's done. Nice, see, here's the trigger: it says if xName is not species, so for any column except species, we use scaleLinear just like we did before, but otherwise we use a point scale and set the domain to data.map(xValue). The point scale keeps only the unique values from that array and spreads them across the visible space, so that's why this works. Nice job. Let's see what else we have. Assad also took a stab at this and went a little bit beyond as well; this is quite impressive. So we've got sepal
length and petal width using linear scales, and species works using the point scale. Nice job adding the padding; see, it's not all the way down at the bottom. He also added an option to use a square root scale, which is amazing, or a power scale. What is this? So yeah, different types of scales in use here; it's brilliant. If we want to see what the scale does, we can pick petal width with linear, and then petal width with square root, and you can see it's a square root relationship. If I remember my high school math, that curve is half of a parabola lying on its side. But yeah, very nice work. I have a question about this one. Sure. Are there many circles on top of each other in one place? It looks like that. Yeah, it looks like there are. You can see that the edge on this one is sort of soft (I don't know if it comes through on the stream), but the edge on this one is really rough, which indicates that there's a bunch of circles in the exact same spot. And this makes sense, because there may be a bunch of data points that share the same value. What's the point of the square root scale? Ah, great question. Here I think it's just an experiment, so there's not really any point per se, except to figure out how to do it. However, the place where square root scales come in handy is defining the radius of circles based on data such that the area of each circle is proportional to the quantity in the data. I think we'll get to using it like that in some future stream where we use size, but that's the main use case for a square root scale. And then there's a log scale, which is not here, but that's useful when the data follows a power-law distribution, and I think we'll get into that in the future too. Great work here from Assad. Let's see how he did this: he added a type property to the various columns (linear, band, square root, power), and then in the scatter plot he hooks into that value, and he even made a separate function called getScale, which is very nice.
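The scale discussion above (point scales for species, square root scales for circle area, and a getScale function that dispatches on a type) can be sketched without any dependency. The following is a hand-rolled approximation for illustration only, not D3's implementation and not the code from either submission; in real code you would call d3.scaleLinear, d3.scaleSqrt and d3.scalePoint. The function names, the "quantitative"/"categorical" type strings, and the 0.5 padding are assumptions.

```javascript
// Hand-rolled scales that mimic the DEFAULT behaviour of d3.scaleLinear,
// d3.scaleSqrt and d3.scalePoint (no nice(), clamp() or round()).

// Linear: maps [d0, d1] proportionally onto [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Square root: takes sqrt before interpolating. With domain and range both
// starting at 0, a circle's radius grows with sqrt(value), so its AREA
// (pi * r^2) is proportional to the value.
function sqrtScale([d0, d1], [r0, r1]) {
  const s0 = Math.sqrt(d0);
  const s1 = Math.sqrt(d1);
  return (x) => r0 + ((Math.sqrt(x) - s0) / (s1 - s0)) * (r1 - r0);
}

// Point: spreads the UNIQUE domain values evenly across the range.
// `padding` is measured in multiples of the step between points, and the
// points are centered in the range (like scalePoint's default align of 0.5).
function pointScale(values, [r0, r1], padding = 0) {
  const domain = [...new Set(values)]; // duplicate values are ignored
  const n = domain.length;
  const step = (r1 - r0) / Math.max(1, n - 1 + 2 * padding);
  const start = r0 + (r1 - r0 - step * (n - 1)) / 2;
  const index = new Map(domain.map((v, i) => [v, i]));
  // Values outside the domain map to undefined.
  return (v) => (index.has(v) ? start + step * index.get(v) : undefined);
}

// Dispatch on the attribute type stored in the column metadata.
function getScale(type, values, range) {
  if (type === "categorical") return pointScale(values, range, 0.5);
  return linearScale([Math.min(...values), Math.max(...values)], range);
}

// Usage: species values (with duplicates) spread over a 400px axis.
const yScale = getScale("categorical", ["setosa", "versicolor", "virginica", "setosa"], [0, 400]);
console.log(yScale("setosa"), yScale("versicolor"), yScale("virginica"));

// Usage: a radius scale where a value 4x larger gets 4x the circle area.
const radius = sqrtScale([0, 100], [0, 20]);
console.log(radius(25), radius(100)); // 10 20
```

With padding, the first and last species sit inside the axis instead of on its ends, which matches the "not all the way down to the bottom" effect mentioned in the submission.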
Very nice refactoring; the code is very readable. If it's linear, use scaleLinear; if it's log, use a log scale; if it's square root, use a square root scale; otherwise use power scales or band scales. So yeah, great work. Let's see what else. There was some interesting discussion; using the options array approach works. Oh, Eric has one! Look at this, and Eric's here with us right now. Let's see what's going on here. Oh my gosh, they're blue, that's awesome, and it does filtering, look at that! Oh, this is brilliant. So it filters by each particular species. Eric, do you want to talk a little bit about your experience with this? Sure. I guess I misinterpreted the assignment, so I was just wanting to look at all the data, but stratified by species. This is awesome. Oh, nice. Oh yeah, Flexbox can be kind of tricky, and the way I did it last time was sort of a shortcut to just overlay them on top, but when you need more complex layout you need different tools; CSS Grid is an amazing tool to use. Nice, that's awesome. And I see that the colors change as well: now they're all blue, but if you go to one of these, it's red. Nice. So let's just take a look at how it was done. Options: we've got All, Setosa, Versicolor, and Virginica, which is this menu here, and then that's passed into the scatter plot, and here's where the filtering happens. This is great: it uses the .filter method on arrays, and if the species equals the z value of d, which is passed in through these accessors, then it includes that row. Very cool. And I like how you split it out into functions with .call; that's a nice approach. Yeah, it is kind of crazy. Yeah, that's the crazy thing about JavaScript, and one of the reasons why I love it: it's got elements of functional programming and elements of object-oriented programming, and you can make it... Yeah, you just
showed that map, lowercase and uppercase, and it's like, what? Yep, it's a great playground to be in; there are tons of tools in the toolbox, so to speak. Yeah, it's getting complicated. Yeah, it is; it's a journey of learning new things and using them. And this functional and object-oriented programming is kind of confusing, really; you don't know where one starts and which one is right. Yeah, and the other thing that's also interesting, I don't know if this is a style, right? Yeah, it does get confusing when a variable has the same name as a function, or a property has the same name as a function somewhere else. Yeah, it's totally a trade-off, picking variable names. The one piece of feedback I would have for this, in terms of visualization design, is that when you switch between Setosa and Versicolor, the dots animate. That's a cool effect; however, it communicates to the viewer that each dot is the same thing, in a sense. When a dot moves across the screen, it makes sense if it's the exact same iris flower as it was before. For example, this kind of animation makes perfect sense, because when the circle moves, its identity remains unchanged; it's the same flower that it was. However, when this other animation happens, it's actually kind of misleading, because a different flower is being represented by the same circle as it moves across the screen, you know what I mean? Well, I think it's just a matter of noticing that that's an issue, and we can go in and fix it. Here's how: I'll fork this so I can play around. I don't think I ever even talked about this, because we didn't need it, but there's a second argument that you can pass into .data, which is a function that defines the identity of each thing, and D3's internals will use it to distinguish between things that are the same and things that are different. So here's what we can do. We need
some kind of identity across the different iris flowers first, actually, before we do anything else. So here's what I propose. When we load in the data, there's no column that's a unique identifier for each individual flower; it's just not there. But we can easily make one by setting d.id. The simplest way to do it is to have an ID counter, like idCounter++. Oftentimes there is something in the data that's a unique ID, but since there isn't here, we can just use the index in the table as the ID. Just to see what this does... oh sorry, d.id, I had a typo. Wait, why is that not working? Somehow it's broken. Did I break it? I didn't really do anything. Huh, I'm having what I like to call a Twilight Zone moment, where I have no idea what's going on. I think I'm just going to delete this and try again; I'm getting a little sidetracked. The original one is not loading now either, which is weird. Maybe there's some outage, or... oh, there it comes. There it is. Maybe the data was just taking forever to load. Let me try again; I want to do this because it's a very useful thing to know. Okay, now it's working; it must have been a data latency issue or a network problem. So idCounter starts at zero, d.id is idCounter++, and console.log(d) just to see what we're dealing with here. It should be integers. See, this one has id 133, this one has id 134; the point is that it's unique to each of these different iris flowers. Now, we might actually get passed i as a second argument to parseRow; I don't remember, but if we do, then we don't need the ID counter. Oh, we do! Check that out, that's even easier: we don't need to do the incrementing ourselves. So d.id = i, done. We're just going to have integers as the IDs. Now in the scatter plot, when we do the data binding, we can pass as a second argument a function that takes as input one row and returns d.id.
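Why the key function fixes the misleading animation can be shown without a browser. The sketch below is not D3's implementation; it only mimics how selection.data(data, key) partitions rows into enter, update and exit groups by comparing keys, and it joins by index when the key is the index, which is what D3 does when no key function is given. The row function follows the d3.csv convention of receiving (row, index, columns); the field names and the toMarks helper are hypothetical, and the four rows are sample iris values.

```javascript
// Row conversion in the spirit of the function passed to d3.csv, which
// receives (row, index, columns). The index becomes a unique id.
// Field names are assumed to match the iris CSV header.
function parseRow(d, i) {
  return {
    id: i,
    sepal_length: +d.sepal_length,
    petal_width: +d.petal_width,
    species: d.species,
  };
}

// Illustration only: partition new data against old data by key and return
// the keys in each group (enter = new, update = kept, exit = removed).
function keyedJoin(oldData, newData, key) {
  const oldKeys = oldData.map((d, i) => key(d, i));
  const newKeys = newData.map((d, i) => key(d, i));
  const oldSet = new Set(oldKeys);
  const newSet = new Set(newKeys);
  return {
    enter: newKeys.filter((k) => !oldSet.has(k)),
    update: newKeys.filter((k) => oldSet.has(k)),
    exit: oldKeys.filter((k) => !newSet.has(k)),
  };
}

// Hypothetical intermediate "marks" transformation: it must carry the id
// through, otherwise d => d.id returns undefined for every mark and the join
// can no longer tell flowers apart.
const toMarks = (rows, xValue, yValue) =>
  rows.map((d) => ({ id: d.id, x: xValue(d), y: yValue(d) }));

const raw = [
  { sepal_length: "5.1", petal_width: "0.2", species: "setosa" },
  { sepal_length: "7.0", petal_width: "1.4", species: "versicolor" },
  { sepal_length: "6.4", petal_width: "1.5", species: "versicolor" },
  { sepal_length: "6.3", petal_width: "2.5", species: "virginica" },
];
const data = raw.map(parseRow);
const setosa = data.filter((d) => d.species === "setosa");
const versicolor = data.filter((d) => d.species === "versicolor");

// Joined by index: enter [1], update [0], exit [].
// The circle that showed a setosa flower is reused for a versicolor flower,
// so it animates as if one flower moved.
console.log(keyedJoin(setosa, versicolor, (d, i) => i));

// Joined by id: enter [1, 2], update [], exit [0].
// The setosa circle exits and the versicolor circles enter.
console.log(keyedJoin(setosa, versicolor, (d) => d.id));
```

The same reasoning explains the bug found during the stream: if an intermediate transformation drops the id, every mark gets the same undefined key and the join stops distinguishing flowers.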
Hopefully, with that one single change, it should solve the visualization design issue we were talking about. Okay, it sort of does. There's one that's not behaving correctly, but it's getting close, and this is more accurate, in a sense more honest about what's happening. I've seen this sort of thing so many times that it's sort of muscle memory at this point; I don't know why that one is misbehaving, though. I'm getting NaN from somewhere. Let's try console.log(d.id) just to see what it is. Yeah, so these are coming from our index.js; the NaN is coming from some other console.log somewhere else, but I think the IDs are good. Maybe they need to be strings or something, so if we do '' + i, they're going to be strings. Maybe zero is coming back falsy and that's causing problems? No, that doesn't do it. The problem is that there's one of these red ones... Yeah, good idea. You know, this is buggy too, because now they should animate smoothly. Yeah, it sure does; it looks like they're exiting and then entering again, which is not supposed to happen. Good idea. Ah, it does not. Yeah, see, I think I know what's going on: we're missing the ID in this transformation here, so that was a good idea, to look at this place. We can solve it by adding the ID to the resulting objects. This makes sense: the transformation decouples the data processing from the rendering, which is good, but it adds a place where we can forget something. Let's see if it works now. It still exits all the time, which is not what we want. Oh, there's something else: marks. There are different paths that it takes, so let me add it there too. Now this is working as I would expect: when you do it like that, the right thing happens, and now the right thing happens here as well. Okay, great, this gets to the bottom of it, and I think this is like the, you
know, "correct" (in quotes) design: if you change these columns, the animation makes sense, because the identity of the dot corresponds to that iris flower. But when you do the filtering, they don't animate anymore; they sort of disappear and then reappear. If you wanted to be complete about it, you would make the exit mirror the enter in terms of the animation, so instead of animating in, a circle would animate back to zero radius and then disappear. That, I think, would be the ultimate solution, but this is pretty awesome as it stands, so I'll leave that as an exercise for you, Eric, if you want to take it on. Yeah, this is great learning, so thank you. Awesome, my pleasure. And the way to do it, just a little teaser, would be to put the transition on the exit and have something like growRadius... oh, here's actually how you would do it: .transition(t).call(shrinkRadius), something like that. Oh, cool. Can I ask a quick question here? Sure. When you have things entering and exiting at the same time, which runs first in terms of animation? That's the elegance of using the same transition: they run at the exact same time. I see. And what would you do if you wanted things to exit first and then enter? Well, you would have to invoke the entire code multiple times with different data; that's the only way to trigger things that happen at different times. Or, now that I think of it, you could add a delay so that the transition is delayed by a fixed amount of time, if you wanted to. Yes, I see. So if I want them to exit first, I would add a delay to the enter, which would give them enough time to exit, and then the enter would be invoked. Oh, I see what you're trying to do. Yeah. And it almost pains me not to do it right now, because we're so close and we're discussing this. Let me just do it real quick. Thank you. So on exit we use
exit.call, so that we can use the transitions, and then we can call shrinkRadius and then .remove(). Calling .remove() on the transition removes the DOM elements only after the transition finishes, which is something D3 does really well; if you try to do transitions like this with other libraries, like React, it's such a pain. Now we just need to implement shrinkRadius, which should be pretty easy: just copy growRadius and make it transition r to zero, like this. Let's see if it works. It does, but the delay is problematic; it's too much. The delay is the fancy thing for moving x and y, but for exiting and entering I don't think we need any delay at all. Hmm, not sure what's happening here. Is shrinkRadius taking the enter selection as an input? shrinkRadius is just taking... well, it's called enter here, but the name doesn't matter. Oh wait a minute, it's adding the transition here. Well, if it's adding the transition there, we don't need to add the transition here as well, and we don't even need this call. So here's what we can do: just exit.call(shrinkRadius). Easy peasy. Let's see if that works. Oh, I think we're not growing the radius on update, and shrinkRadius really should remove the elements as well. Let's see if that works. Yeah, there we go, that's how it should be. This is exactly what I was hoping for. Okay, cool, problem solved, and it was an interesting journey to get there. Great. So now what I'd like to do is present my take on how we would implement this, just so that all the pieces are very clear, but before we do that I'll take a five-minute break, since we've been going for a little bit. Let me check if there are any questions in the chat. Oh, there's a good question: is let idCounter the same as var idCounter? Yeah, essentially, I mean
let is the modern version of the syntax. All right, I'm going to take a five-minute break, and when we come back we'll live code the solution to this problem of handling species. See you soon. All right, it's been five minutes; I'm back. I was just noticing that the problem we were just working on is not fully done. I'm not going to solve it now, but just to show you: if you wait for the transition to finish before you change the menu, everything works perfectly. However, if you change the menu two or three times before the transition ends, you can end up in a state like this, which is not correct. I think what's happening is that the exit transition starts, then you change the selection, which causes the filtering to change and everything to update, but that exit transition is still going on, and when it finishes it removes those DOM elements. So, Eric, if you want to take this on as a challenge, since this started from your work: the thing to do to really solve this would be to cancel the exit transition whenever you render the data again. That may be a little tricky to figure out, but it would be the ultimate solution. Okay, I'll give it a go. Yeah, give it a go and see if you can make it happen. I'm glad we got to this point, because it reveals some of the trickiest aspects of working with D3, and it's a good little puzzle to solve to really solidify the knowledge of everything. Great, thanks. Good luck with that. Okay, let's dig into today's live coding: handling species using scalePoint and manually adding metadata about the columns, like the type. Column name and type are what I'm thinking of as metadata. Before we dig in, though, I want to talk about the terms quantitative and categorical. These are terms I got from Tamara Munzner's textbook called
Visualization Analysis and Design, which is a really great book; I highly recommend it. It's used in a lot of classrooms: people teaching data visualization use it as the textbook, and I use it, for example, when I teach every fall. That book defines terminology that can be used across discussions of visualization design. A lot of different terms have been used over the years, and she sort of says, okay, these are the terms I'm going to use, so I'm going with her set of terms to define the different types of attributes in the data. Attribute is another term that means column. I often refer to columns as columns, because that's what they are to me, but in a data table sense, columns can also be called attributes, and attributes have types, much like variables have types. The type of attribute we're dealing with here, when we come up against this problem of handling species in our dropdown menu, is categorical. Categorical attributes contain values that are not numbers: they're strings, or identifiers for things that have identity, so what matters about the values in a categorical attribute is whether they're the same or different. In the case of this iris data set, we've got a bunch of quantitative columns that are numbers, which you see here on the left, but then we've got one column that is special and needs to be treated differently from all of those other columns: namely, species. The reason it's fundamentally different is that it's not numbers, it's strings. Each value is a different species of iris: setosa is one species, versicolor is another, and there's one more. The point is that they're strings, they're different things, and they need to be mapped to the visual space in a fundamentally different way, so that's what we'll get into here. Any questions so far about this?
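Earlier in the stream it was noted that column metadata could be generated by introspecting on the first row of the data rather than typed out by hand. Here is a minimal sketch of that idea using the quantitative/categorical terms just introduced. It assumes values are still raw strings (as d3.csv yields them when no row conversion function is given) and uses a heuristic of our own choosing: a value that parses as a finite number counts as quantitative. The function names and sample row are illustrative.

```javascript
// Heuristic: a raw string that parses as a finite number is treated as
// quantitative; anything else (including empty strings) as categorical.
function inferType(rawValue) {
  const trimmed = String(rawValue).trim();
  return trimmed !== "" && Number.isFinite(Number(trimmed))
    ? "quantitative"
    : "categorical";
}

// Build { value, type } metadata objects from the keys of a single row.
function metadataFromRow(row) {
  return Object.keys(row).map((value) => ({
    value,
    type: inferType(row[value]),
  }));
}

// First row of the iris data as raw strings (column names assumed).
const firstRow = {
  sepal_length: "5.1",
  sepal_width: "3.5",
  petal_length: "1.4",
  petal_width: "0.2",
  species: "setosa",
};

for (const { value, type } of metadataFromRow(firstRow)) {
  console.log(`${value}: ${type}`);
}
// sepal_length: quantitative
// sepal_width: quantitative
// petal_length: quantitative
// petal_width: quantitative
// species: categorical
```

One row can mislead (numeric-looking codes such as zip codes would be classified as quantitative), which is one argument for writing the metadata out explicitly, as done in the stream.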
I'll move on. Okay, so now we'll actually handle species, and we're going to use scalePoint, which is kind of like scaleLinear but deals with categorical attributes: you give it different strings, and it spreads the unique values across the space, which is how we want to do it. All right, let's dig in. The place I'm going to start from is Animated Scatter Plot with Menus, which is what we created last week. Just to give a quick recap: we added these menus with animated transitions, so you can change x and y to be the various quantitative attributes of the iris data set, but when you select species it just breaks; it crashes. The way we did this is we introduced a menu component that uses the D3 reusable charts pattern, but for a menu, and we added event infrastructure to it. Then in index.js we add a listener for the change event on these menus, which changes the xValue accessor of the scatter plot instance and re-renders it with svg.call. In the scatter plot, whenever it gets rendered, it redefines the x and y scales, and it's here, in the definition of the scales, that we're going to have to make some adjustments. To start, I'm going to fork this one. I'll call it Scatter Plot with Menus that handles... let me come up with a nice title... including species. Thank you, that's perfect. Okay, so what we want to have happen is that when you select species in this menu, for y for example, the different values should spread out across the y coordinates, but it doesn't do that right now. Let's see where we can jump in to solve this issue. To me it makes the most sense to start here, at the definition of the options for the menus. This defines the entries of our menu: essentially, that's the ID for the thing, and this is the text, the label, the display name that appears in the menu. I think the most sensible
approach would be to introduce another property on these objects called type. Using Tamara Munzner's terminology, I would say the type of the species column is categorical, and the type for all the others is quantitative. For completeness' sake, I'm going to fill all of those out and run Prettier. Unfortunately it gets to be long, but I think that's okay; it's explicit. So now, when we select a given option, we know what type it is, which is the information we need in order to change the type of scale that we use. Now that we've got this in hand, we need to tell the scatter plot what type of column it's dealing with. We could do that in a number of ways, but I think the most straightforward way would be to add another accessor, one of these getter-setter functions on the plot, called xType. We already have xValue, and it's working perfectly fine, so we don't want to overcomplicate that, but we can add another one called xType, and the invocation would be something like .xType(...). Then we need to figure out the type for the column. We have column, which is the name of the column, and we can use it to construct the accessor like this, but we also need to use it to get the type somehow. I'm not sure how, and because I'm not sure how, this is a perfect place to introduce a level of indirection: I'm going to call a function called getType. Yeah, just one question: I was trying to do this at home, but I was not able to. What I tried was, first, I rewrote getData and put it outside in a module, and then I tried to get the JSON that's in your GitHub, together with the data. On the site where I got the data, there's a JSON file that has all these options already, with the type. Really? Yeah, the gist here. Yes, and I tried to use the JSON, but I'm not able to get the data out of the scope. When I try to put it in an array or in a dictionary, I lose the data. Inside the
loop I have the data, right, but when I get outside of the loop, they vanish. So how could we use this, so we don't have to hard-code the type string or type number? Right, from this. And it's so funny, I actually forgot that I put this here. Let's see, when did I make this? This was years ago. At that time I was also thinking about metadata for columns, because it makes sense to be able to manipulate the metadata along with the data set, which is totally possible. We could use the raw URL here with d3.json to fetch this file and use it, but for simplicity's sake I'm just going to paste it into the code; because we have this data structure, we could potentially do it either way. Years ago I was thinking, oh, I'm going to develop a data publishing format that's a CSV file plus a little JSON file like this that describes each of the columns. Don't give up on that, right? [Laughter] It's not a bad idea; it would be super useful to have a standard that you could just plug in, so I'll keep that in mind. But yeah, the way to do it would be d3.json to fetch this file, and then Promise.all to fetch the CSV and the JSON at the same time and run some code after both of them have loaded. But no matter where this information comes from, we're going to have to implement this function getType for the column. I just... yes? Yep, sure. So let me just put that right here: getType is a function that takes as input the column and returns the type of the column. Now, the name column might be confusing, because conceptually a column is described by one of these objects, but what this really is, is the column value, meaning the name of the column. So it's just a string, not the entire object (if it were, that'd be simpler); it's just a string like sepal length. So, honestly, to make that more clear, I'm going to call it column
name, or column value. I like to think of it as column name, or column attribute, I don't know. In a way I want this to be name, like this, but that's a refactoring that would require updating the code elsewhere, so maybe I'll just leave it the way it was; we just have to understand it. Here's what I'll do: I'll add a comment saying column is a string corresponding to the value property on the metadata objects. Can I ask a question about the problem we're trying to solve here? Sure. Thank you. So am I right to understand that we have an array of objects, and we have a unique value for one of the properties of one of those objects somewhere in that array, and we want to reach another property of that same object within the array? That's exactly right. We have the string corresponding to the value property of one of these objects in this array, and what we want to get is the type, which is a string that comes from a different property of that object. But we have an array of objects, right? Correct. And we need to get the value from the object inside the array? Yeah: given the column name, for example species, this function should return the type, namely categorical. One way we could do this is to iterate through each of these entries, and when the value matches, we'll have access to the entire object and can just access its type. That would look something like options.find(d => d.value === column), using find, which is another method on arrays. This gives us the object, and then we can just access .type. That should work. Cool. Yes, and that enables us to jump from any property to any other property within the same object in an array of objects? Exactly, correct. Perfect. However, the .find method on arrays, the way it works is, it checks each and every one. It says okay,
let's take a look at this one and it passes that object into this thing and then this function runs it returns true or false so it says is dot value equal to column it checks the first one it says okay d dot value is battle width is that equal to column which is you know whatever column we selected if it is then it returns true and then array.find returns that one that matched but if it returns false array.find goes on to the next one and it passes in the next one and says okay is that one the same no it's not the same go on to the next one check it again if if it is the same then the define method returns that object so if it matches species for example it would return this entire object from this expression here that's why we can say dot type we could we could just as easily say text to access the text so because it iterates through each and every one i generally don't like to do it this way because it's big o of n algorithm wise that means the algorithm takes n steps where n is the number of columns in this case it's it's not that bad there's only like five so it's not really it's not really an issue so this would work perfectly well but i kind of do want to show the way i would do it which is to create a lookup table using the map data structure but any questions so far i just had a quick question around this find method yes so if like it's not present in this current table lookup example but say we had multiple columns or multiple columns with the same name so species was appearing twice so what find like return both those objects or will it just return the first uh occurrence of the of the selection selected column or the column that we're trying to find that's correct it will just return the first one okay okay yeah that's what fine does it just returns the first match that it encounters so if you did have species as the value for multiple of these which you wouldn't want to that would be a bug but if you did uh it would just return the first one and not the 
second one is there uh find all uh method um not the case here but uh just yeah this is what i would do i don't think there is but there is a radar filter which is essentially does the same thing yeah okay yeah exactly yeah so if you wanted to do that you would say options.filter and you wouldn't want to say dot type because the result would be an array this would find all of the matches it would so filter is essentially find all that match perfect okay clear now i just want to before moving on to the implementation with map i just want to make sure that this is actually working i have changed the filter for find oh thanks so to just check if it's working i'm going to say console.log get type of column this will let us just check if it's working and this is on the the x menu so when i change x it works it says quantitative excellent what about yeah so if i if i use species it outputs categorical perfect so it works quantitative quantitative quantitative quantitative categorical if it's species so that part works that's great x dot type is not a function but yeah we'll deal with that later but first because we're on this topic and this is such a common thing to have to do i just want to say this is not how i would actually implement this how i would actually implement this is to create a map i'll call it column 2 type is a new map data structure and this is built in to the browser you don't have to import any libraries or anything it's an es6 feature and map has a number of methods like set and get so column two type meaning the way i named it like that because it's a lookup table from column to type you know from column which you use as the keys to type which are going to be used as the values and so what we can do is loop through all of these options maybe something like options dot for each is a way to iterate through these and we could say for each of these options we can say we want to set the value and the key will be option.value and the value that the key 
maps to which is confusing because the value that we're using the word value here but the the value that the key maps to will be type so how do we put these together this is where we can say column to type which is an instance of the map data structure dot set and set takes as input two arguments the key and the value okay i think this uh solves the problem that i was having when i imported from the from the website the json ah because when i try to to set the what i did i create a external array like before the column to type and try to push the value inside this array but outside this scope when i try to get back the array it's empty inside the four each if i if i create a array outside and try to push the values inside this array when it's outside it's empty so i think this set may solve this yeah and it's important to connect the dots between the data structures because what you need to implement this get type is a a dictionary essentially a map a key value mapping where you give it the key gives it gives you the value but if you're starting from an array you have to do something else like dot find which traverses each element of the array which you don't want and you can't you can't say like options at index column which is you know you could use an object as a map as well but um but yeah i think this does solve the problem and once we have built up this map we can access it uh but first just to really comprehend what's happening let me say console.log column to type so we can see what this map and ended up looking like it's a map instance that has a bunch of entries and this is just how chrome presents it to you in the console it means the key is pedal width and the value is quantitative and this double arrow here means like it maps to you know if you give pedal width to the function dot get it will return quantitative and it'll do so efficiently it doesn't have to check each one internally it implements probably like a hash table kind of a lookup scheme 
where it's big o of one instead of big o of n in algorithmic terms it doesn't have to check all of those it just gives you back instantly the one you that you want so if we say column to type dot get column inside of get type this should work as well and to test it we can change the column and observe that yeah okay it still prints out the right thing quantitative and if i switch to species it outputs categorical okay this is working this is working however this is not the best way it's not the simplest way to do it the simplest way to do it is to take advantage of a way of calling the map constructor where you can actually pass in an array of arrays and each of those inner arrays has only two elements the key and the value that way of doing it is much more concise and all you know although it is a bit more cryptic and the way that we would do that is is we can pass in options.map again the functional programming construct that lets you give it a function that function accepts each element and the return values end up as a new array we can return an array that has just two elements option.value and option.type and this should work as well let me see if i got it right yeah seems to work just fine so just to review what i did there options.map maps over all the options and it returns an array that has the value and the type the first thing being the key of the map the second thing being the value in the map and this is just an alternative syntax for doing the exact same thing as this other thing does namely looping through all of the entries and calling dot set and to simplify this even further we could use es6 destructuring to destructure value and type from the argument and then we don't have to say option dot and option dot so all of this is a roundabout way of you know exploring the ins and outs of constructing maps um but since it comes up so often i wanted to dig into this this level of detail because you know i think it's it's really important to fundamentally 
understand how to use maps in today's javascript world i see there are some questions let me see what is this and is it live yeah it is it's live why are you not using typescript well i'm not using typescript because typescript is a pain to use in my experience um then you don't have to write comments yeah i know yeah it's just it's just a pain is this microsoft monaco no it's not it's vizhub it's a thing that i made and you can use it too anyone can use it you can fork this stuff and get a link to it what are we doing here i love these questions these are great um it's a new visualization tool people can use yeah i mean this being vizhub yeah i created it about two years ago d3 has been around a while as larry points out thank you larry yeah d3 has been around a while but the apis have changed so that's why i'm doing this tutorial now to like use the most modern way of doing it okay thanks for those questions okay now we can move on to the next phase namely implementing x type but before we do that i just want to make sure are there any questions so far about what we've done here so far please go ahead that song oh she just said it's so complicated yeah i know there are a lot of details there are a lot of details to remember but um and the first time you see stuff like this i realize it can be overwhelming that's why i wanted to like yeah this happens all the time first time i see something it's like wow i'm not understanding it and once i get into it it's like oh it wasn't that bad yeah and console.log is your friend i mean if you were on your own you can say console.log options.map value type and see what it is and and use use console.log to interrogate what you see here anything that's confusing to unpack it and understand what's what's happening in that intermediate stage see it's an array of arrays and i realize when i describe it it's a bit abstract but when you console.log and you see it you can understand oh it's an array of arrays with the keys and the
values and also if you're on your own and find the stuff confusing it's good to consult mdn which is the de facto standard documentation for the built-in stuff in javascript such as map and it says right here it describes how the map constructor creates a new map object and it says right here that you can pass an iterable into the constructor which is what we're doing and it just it documents that right here it says what is this iterable it's an array whose elements are key value pairs in the form of arrays so here's an example that maps numbers to strings so yeah if you if you're ever looking at this stuff on your own and feel oh my gosh this is so confusing use console.log and and and just do google searches for the data structures that we use like map because the documentation is great and also the same goes for d3 the d3 documentation is really good so if you see a d3 method that you don't know like just google it find the documentation um i do it all the time but it takes such a long time it does it does there are a number of um there are a lot of rabbit holes that you could go down and get distracted really time consuming and just takes a lot of time yeah everything it does yeah it does work yep i would say it's well worth it it's an investment in your future yeah i agree that's why you and you have every other week yeah i changed and that makes it difficult because for example i'm so behind i'm supposed to catch up yeah i wish it was every other week well actually i was going to announce it at the end but since you brought it up i did change it to be every other week not every week so going forward it's going to be every other week thank god i updated the meetup page why because yeah it's for me too it's a bit hectic to do this every week anyway are there any other questions i just wanted to ask what what would happen if you have two entries with the same keys so you could like while you were trying to iterate you're passing two keys two exact keys into your map
object that's a fantastic idea well if we if we go back to this other variant that i did here with this variant it's easy to understand what the answer to that question would be if you understand the semantics of dot set if you call dot set multiple times with the same key it will change the value and so in this case let's say we had two of these where species was the same what this algorithm would do is call dot set passing the key species and it would set the value to be this one here however the next time around you know in the for each when it gets evaluated with the last option it's going to call dot set again and so when it calls set the second time with the same key the way that maps work it's going to overwrite the first version of it and so we're going to end up with a map that only has one value for species and it's going to be the last one encountered in this array so that's the complete opposite of find in a way exactly it's the opposite of find yes yes that's a great insight the way find works is it checks each one and it returns the first match the way this works is it sets up entries in the map for each of these entries in the options array one at a time and if it does encounter the key multiple times it overwrites it so essentially it's putting the last occurrence of the match in the original array as the value in the map that you get thank you yeah my pleasure and i'm not so certain what the behavior is when you use the constructor like this but it may well be the same yeah it may well take the last match although i'm not 100 percent sure so you know what we can do we can actually test it out and this is the beautiful thing about coding too you can use the code to ask questions about the code for example um oh there's a breakage column to type is not defined i'm sorry forgot to uncomment uh sorry i'm just a little disoriented but yeah let's do this little experiment to figure out what the answer is all right so when we use this variant that uses for
each goes through all of them i would expect it's going to give us the last one and if we select species it outputs categorical which is the last one see the first one was quantitative and we actually have the same key here multiple times now let's answer the other question of what if we use the map constructor like this is it the same so if i select species it outputs categorical so the answer is yes it is the same it takes the last one great question great question i love how it digs in a little deeper yeah thank you okay now let's go ahead and solve the next piece of this puzzle now that we know what the type should be when we select it we need to pass it into x type and you know while we're at it let's pass it into y type as well y type is get type of column oh there's some error there's a syntax error i don't know what that was about so i'm going to also call y type here okay it's fine nothing was there was no problem so now from the outer sort of view of things we're invoking it where we want to be invoking it we're setting y type when we change the y menu and we're setting x type when we change the x menu now the task at hand is to implement that method in our scatter plot so let's go into our scatter plot code in scatter plot dot js we've got a bunch of these getter setter accessors and let's just make a few more we've got x value and y value i'm just going to copy paste these and change value to type so x type is going to be x type change y value to y type and we're referring to the variable y type and x type those don't exist yet so let's make those at the top of the file like this now these are available to us when we render our scatter plot let's just make sure because this is where we're going to want to use those let's make sure it's available with console.log x type here so we get undefined initially which makes sense because we're only passing it in when we change the menu but when we do change the menu we get quantitative for these and if
we select species we get categorical which is exactly what we want okay this is great and i think what we can do is say if x type is categorical using this um ternary syntax we create a linear scale otherwise now this is where we can use category exactly yeah yeah good call so i got the order reversed so it's going to be scale linear if it's not categorical but if it is categorical then it's going to use scale point just use prettier to format that and if it is a scale point we can set the domain to be data.map x value and what this what this does is it returns an array of all the different x values including duplicates so it's going to return like versicolor versicolor versicolor setosa setosa setosa but then when you when you pass that array into dot domain the scale will internally figure out that there are duplicates and it will deduplicate it so the domain will only end up having three entries the unique values of the species column and then dot range um actually should be the same because we want it to span across the same space in pixels so this should work let's see if it does if i change x to species there's some breakage let's see what it is oh scale point is not defined yeah i forgot to import it we just need to import that from d3 along with this other stuff now it should work okay great check it out it works hooray we solved the problem yep everything's working the quantitative stuff is working and when we hit species it transitions to this which is the behavior of a point scale by the way it just takes the values that it sees in the order it receives them and identifies the unique ones and spreads those across the space across the screen it's exactly what we want and i must say i am impressed by the way that d3 axis handles the transition see how it fades d3 axis is brilliantly implemented because if it is given another linear scale see how it animates the numbers it animates the ticks but if you pass into d3 axis with the
transition a different type of scale it does this nice fade animation which is just brilliant brilliant but anyway there's a little bit of cleanup work to do here but um any questions so far yeah just um one question here is eric um just curious uh how hard would it be to actually show all the data that are in each of those categories as dot plots so in other words in other words some of those as you mentioned earlier are on top of each other but you would really have sort of a frequency is this what you mean by dot plot yep yep so so so what you're trying to show is uh almost the distribution across those categories so that you can see where the peaks are so that's hidden right now yep but it's a sort of a natural transition but as i'm thinking about it just trying to get a sense from your expertise how deep one would have to go to actually do that not that we do it here but i'm just curious yeah it's a great question go ahead i i just want to suggest an idea for doing that and and let you see whether it works or not so we yes that's right um to implement that what you described eric the first thing i would do is just look at it from from a bird's eye view and say like what should the architecture of this be should it be one visualization or should it be a parent visualization and a child visualization where the child is one of these reusable components like the scatter plot but for a single dot plot got it you know what i mean and then i would i would get it to work that's probably how i would do it i would get it to work for a single dot plot and like you said kostov that requires a step of binning where you take these um well in this case you can just use the the different values that are present here i think it increments by 0.1 that's the resolution of the data so for each one of these unique values you would want to bin them and count how many occurrences there are for each of these unique values and there's a feature in d3 called d3 bin i think or
d3 histogram if you google search d3 histogram you can see this in action so that's a d a data processing step so first you need to do that binning so that for each of these numbers you know how many occurrences there are and then based on that after that binning happens you can visualize that as a dot plot got it and then that would be one of these components and then what you're doing essentially is called small multiples dot plots where you would want to actually have three dot plots one for each species and so then you would need to change this scatter plot code completely so that for each of the values across the x-axis here it would iterate through those and for each one invoke that reusable dot plot instance so you'd have three instances of a reusable dot plot that's one way to do it okay that was that that sort of just curious how you'd approach it so um not that obviously we tackle it but it's it yeah so that's how you would do that but there's a uh if you step back a little there's another question what is the best visualization design to show this type of data and small multiples dot plots is one option but there are a number of different options there's actually one option that's very low hanging fruit for us that we could do right now and i love doing this because it's so simple on the circles we could set the fill opacity to 0.2 in the css um i don't think that actually worked maybe it's just opacity yeah there we go that worked so if we set the opacity on the circles and we subdivide them by species you can see it's doing it right there you see what i'm saying yeah yeah so this is a simple modification we can do to make the visualization communicate the information of density like how many overlapping dots there are which is the same thing that would be communicated by the dot plots yep yes so this is bird's eye view yep okay so the bird's eye view meaning what i meant by bird's eye view is you step back and look at the design space of the 
visualization this is one other option there are so many other options i mean you could have small multiples box plots for example you could have small multiples histograms like bar style histograms you could have small multiples violin plots so many different ways i mean once you get this data structure and you want to visualize it there are so many options but you can frame it as small multiples meaning you want to implement one instance of it and then just multiply it across the different values that's really cool nice yeah and r by the way ggplot in r does all this stuff okay it's brilliant the way it does it so yeah stuff to look into thank you yeah my pleasure all right let's finish this up here it's almost done but we've only done it for x and not y and this code itself can be cleaned up whenever i see duplicated logic like this dot range dot range copy pasted it's exact same thing i asked myself how can we get rid of this duplicated logic in this case it's fairly simple because this expression if you put parentheses around it returns a scale it could be a point scale it could be a linear scale but it returns the scale and so we can essentially factor out the call to dot range so that it gets applied to the returned scale whatever type of scale it happens to be then i just run prettier on that and this is what we get yeah i would prefer this just because it has less duplication so to be clear this set of parentheses creates the scale you know a different type of scale depending on whether x type is categorical or not it sets the domain but it does not set the range but whatever scale comes back from this expression we call dot range on that scale whatever it is so it's just a simpler way to do it now that we've got that we can just do the exact same thing for y i'm just going to copy paste it change x to y all over the place uh so instead of x value is going to be y value [Music] but with the y scale we need to be careful about the range because it's 
different it goes from height minus margin.bottom to margin.top so i'm going to take this definition of the range and use it here and get rid of our original y scale definition and this should do the trick for both x and y let's see if it does x species works y species works excellent and one last little thing that i don't like about this is that there's no padding it goes all the way up to the edge maybe it's just personal preference or stylistic but i always like to put a little padding and the way we can put that is right here scale point dot domain dot padding that's a function on these scales i'll say 0.2 yeah so if we look at it now we get this nice padding it doesn't go all the way up to the edge which i just find kind of i don't know distasteful it has space to breathe the labels you can read all the labels now so virginica used to be off the screen and just as the final step i'm going to call dot padding in the case of y as well so now it works for both x and y and the labels get cut off yeah let me just change the margin to address that problem because what i want to do is finish today with a complete product that works for all the cases that you select but where's my margin margin oh it's right here margin left let me set it to 150 pixels and see if that's enough for species okay that's a little too much maybe 120.
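As a rough illustration of the point-scale behavior described above (the duplicates in data.map(xValue) collapse to unique domain entries, which are spread evenly across the range, and .padding(0.2) leaves room at both ends), here is a plain JavaScript sketch. pointScaleSketch is a hypothetical helper, not a d3 function; it assumes the layout math of d3.scalePoint with its default alignment of 0.5 and no rounding, and in the actual viz you would keep using d3.scalePoint().

```javascript
// HYPOTHETICAL helper, not part of d3: a plain-JS approximation of how
// d3.scalePoint() lays out a categorical domain (align 0.5, no rounding).
function pointScaleSketch(values, range, padding = 0) {
  // Like passing data.map(xValue) to .domain(): duplicates collapse and the
  // unique values keep the order in which they first appear.
  const domain = Array.from(new Set(values));
  const n = domain.length;
  const [r0, r1] = range;
  const reverse = r1 < r0; // e.g. a y range like [height - bottom, top]
  let start = reverse ? r1 : r0;
  const stop = reverse ? r0 : r1;
  // n points sit n - 1 steps apart; padding adds `padding` steps of empty
  // space at each end of the range.
  const step = (stop - start) / Math.max(1, n - 1 + padding * 2);
  // Center the row of points within the range.
  start += (stop - start - step * (n - 1)) * 0.5;
  const positions = domain.map((_, i) => start + step * i);
  if (reverse) positions.reverse();
  const lookup = new Map(domain.map((d, i) => [d, positions[i]]));
  const scale = (value) => lookup.get(value); // undefined for unknown values
  scale.domain = () => domain.slice();
  return scale;
}

// Usage: the species column from the iris data, spread across 0..100 px.
const species = ["setosa", "setosa", "versicolor", "versicolor", "virginica"];
const noPadding = pointScaleSketch(species, [0, 100]);
console.log(noPadding.domain()); // [ 'setosa', 'versicolor', 'virginica' ]
console.log(noPadding("virginica")); // 100
const padded = pointScaleSketch(species, [0, 100], 0.2);
console.log(padded("setosa").toFixed(2)); // 8.33
```

Without padding the last category lands exactly on the end of the range, which is why a label centered on it can spill past the edge; with a padding of 0.2 each end gets 0.2 of a step of empty space.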
all right there we have it any questions so one question i had was with those drop downs there um i couldn't find where one um does this or if it's even possible to change the font and the size of those drop downs oh is that a css thing that's a great question you know the html select element and the options it's notoriously difficult to style with css so if you're working on a product where you need styled menus the best approach might be to go seeking out some third-party library that implements a drop-down widget that you can style with css okay yeah explains the hunt yeah they're right it's it's a hunt yeah and like you just it's very difficult to style these the way you want um but again it's a whole other level of complexity to like evaluate the different libraries and pick one and figure out how to use it however with this d3 reusable chart pattern you could implement a menu component just like this having the same api you know having the same methods and everything but internally it could use that third-party menu library that would be the approach that i would suggest okay thank you but that said i was actually surprised to see that in one of the submissions the menus were styled let me see if i can find which one that was you kostov yes kostov how did you do this i think i used normal css to do this this is beautiful it has that custom font that's awesome yeah i just wanted to make the font consistent with axes and menu so i just used this select uh html and applied this property here brilliant so you know there you go all right i was trying to figure that out yeah there you go so it looks like it looks like you had to style the select the label and option to get it to work yep so i think the labels were for these uh axis uh labels and the option and select work for the drop down right yep so there's your answer eric should work thank you and i was actually quite surprised to see this working because um i've struggled in the past to try to do
this but it looks like this works yeah i was just playing around with it i totally forgot that it worked for my case that's awesome and i wonder if it would behave correctly if you try more advanced css like setting the background color or the you know the roundedness of the edges i think you might run into a wall beyond which you can't customize but worth exploring for sure right but yeah this is very nice and i i have to say it's really nice how these animate or how your labels animate really nice work all right well i think that's all for today i thank you so much yeah my pleasure thank you all thank you very much thank you yeah thank you very much and uh yeah like i said it's i changed it to every other week so just mark your calendars it's not going to be every week and i will leave you all with an exercise for these next two weeks i want you to find a compelling data set look around online for different data sets and yeah try to find a compelling public data set and just search around you know i would suggest coming up with some idea of like data that you want to see visualized that you've maybe seen in the past or thought about deeply like some sort of existential like question about society or i don't know climate change the keeling curve would be a good one but yeah find a compelling data set that interests you and the way to do it is just search for the topic see if you can find any visualizations that pertain to the topic track down their data source uh you know follow links until you can get to a downloadable data file like an excel file and then export it as csv or just try to find a csv or a json file load that up you know fork the scatter plot we made with the menus load in that data update the data parsing logic and i want you to actually explore the data using this scatter plot with menus because now that now that the scatter plot has the transparency uh we can see the density of things and now that we made it handle
species it can handle any column that's categorical and uh i want you to write up the key insights that you discover in the readme of the viz so fork this modify the readme.md file to write up like bullet points of like this is cool um this other thing is cool you know interesting insights that you find about the data not about the viz or the coding but i want you to like actually explore some data and uh yeah share your results in the forum i've made an entry for today episode 12 here it is all right so now it's goodbye for real thanks for joining and i'll see you all in two weeks yep have a great weekend thanks take care all right bye bye take care thank you thank you bye
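The column-type lookup worked through earlier in the stream (options.find, options.filter, and a Map built either with .set inside forEach or by passing [key, value] pairs to the Map constructor) can be condensed into a small standalone sketch. The column names in options are assumed iris-style values, not copied from the original viz, and getTypeWithFind, findAll, viaSet, withDuplicate and dupMap are hypothetical names for this sketch; columnToType and getType follow the names used on the stream. Everything else is built-in JavaScript.

```javascript
// Column metadata in the shape used on the stream: value, text, type.
// ASSUMPTION: these value/text strings are illustrative iris-style names,
// not copied from the original viz.
const options = [
  { value: "sepal_length", text: "Sepal Length", type: "quantitative" },
  { value: "sepal_width", text: "Sepal Width", type: "quantitative" },
  { value: "petal_length", text: "Petal Length", type: "quantitative" },
  { value: "petal_width", text: "Petal Width", type: "quantitative" },
  { value: "species", text: "Species", type: "categorical" },
];

// Option 1: Array.prototype.find checks each object in turn (O(n)) and
// returns the FIRST one whose value matches, or undefined if none does.
const getTypeWithFind = (column) =>
  options.find((d) => d.value === column)?.type;

// Array.prototype.filter is the "find all" variant: it returns an array
// of every matching object, so there is no single .type to read.
const findAll = (column) => options.filter((d) => d.value === column);

// Option 2a: build the lookup table with forEach and Map#set.
const viaSet = new Map();
options.forEach((option) => viaSet.set(option.value, option.type));

// Option 2b: pass an array of [key, value] pairs to the Map constructor,
// destructuring value and type from each option.
const columnToType = new Map(options.map(({ value, type }) => [value, type]));

// Map#get looks the key up directly instead of scanning the array.
const getType = (column) => columnToType.get(column);

console.log(getTypeWithFind("species")); // categorical
console.log(getType("petal_width")); // quantitative

// Duplicate keys: find keeps the first match, while a Map (built either
// way) overwrites earlier entries, so the last occurrence wins.
const withDuplicate = [
  { value: "species", type: "quantitative" },
  { value: "species", type: "categorical" },
];
const dupMap = new Map(withDuplicate.map(({ value, type }) => [value, type]));
console.log(withDuplicate.find((d) => d.value === "species").type); // quantitative
console.log(dupMap.get("species")); // categorical
console.log(dupMap.size); // 1
```

The duplicate-key lines reproduce the experiment from the stream: find returns the first matching object, while both ways of building the Map keep only the last value seen for a repeated key.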
Info
Channel: Curran Kelleher
Views: 503
Id: tbvJbgBgvPM
Length: 110min 42sec (6642 seconds)
Published: Sat Jun 12 2021