Get it Right in Black & White Episode 8 - Scatter Plot

Captions
All right, we're live. Hello, hello. We're live on YouTube; I'm just going to give a couple of minutes for other folks to join. Today I'm going to try an experiment: I'm opening up the live audio to anyone who wants to join. In the VizHub forum, on Episode 8 - Scatter Plot, I dropped the Google Meet info, so if you happen to be watching and you want to participate, as in ask me questions live via audio, feel free to join this meeting. Everyone is welcome to join today, and we'll see how it goes. I'll give folks a couple more minutes, and I'll drop this link in the YouTube chat too.

All right, looks like it's just me today; no live guests to stop me and ask questions. But anyway, I dropped the link... oh, here's someone. Yeah, so I dropped the link into the YouTube chat. If anyone wants to join, just go ahead and join; all are welcome today. So I'm going to get started. Welcome to episode eight of Get It... oh wait a minute, there's an audio glitch. Okay, there we go. Sorry.

All right, welcome everyone to episode 8 of Get It Right in Black and White. Today the goal is to build a D3 scatter plot from scratch. Introductions for new folks, if any: I see someone joined the call and left, but you're all welcome to join. If anybody does join, I'll come back and we can do introductions.

I'd like to begin by reviewing some of the submissions from the last episode, which was two weeks ago now. Let's take a look at what folks did. That was the one on JavaScript modules and build tools. Here's a nice Rollup example by floating per; very nice. And here's another one that uses Vite. This looks kind of interesting; I've never used Vite, but it seems to work. And here's another one that seems to be just the straight VizHub export, which works too. I'm really glad to see folks are able to get their stuff out of VizHub into GitHub. So
that's one of the most commonly asked things: how do you get out of VizHub? So that's how, and hopefully that's satisfactory. By the way, I recorded episode 7 by myself a couple of days ago, and I'm going to release it as a video soon. I was doing this work anyway, making an open source product out of something that's in VizHub, and I just recorded it. So I'm going to post that eventually; it's not posted yet, but that's why we're on episode 8 now, not episode 7.

Metal Guitar Covers, hello! Everyone, feel free to join the audio if you'd like; it'd be a lot of fun.

Today we'll build a scatter plot. The goal here is to create a vanilla HTML implementation of a D3 scatter plot that uses modern D3 conventions and implements this pattern of decoupling rendering from data processing. The goal of that is to make it easier down the line to swap out the rendering logic, if you want to, with another framework or Canvas or what have you. I hope to get out of this a forkable template that can be a go-to resource. If anybody asks "how do I make a scatter plot with D3?", you could just send them a link to this template, and boom, that's it. It's probably less than 100 lines, and it works, and it's basic, and it's vanilla. And I mean truly vanilla: we're not going to use ES6 modules even, because I want to show what that option looks like.

Okay, I'm going to start by creating a viz in VizHub from this vanilla HTML starter. I'm going to fork this and call it "D3 Scatter Plot". So what have we got here? It's a basic HTML template with some space for JavaScript. I'm just going to gut this, delete everything, but I'll keep the places to put things like the script and the style, because I know we're going to need that stuff.

Okay, step one: let's load in D3. There are a number of ways to do this, but since we're doing vanilla HTML, let's just load it in from a CDN. I'm going to use unpkg. As
a script tag at the end of the head, I'm going to make a new script tag, open script, close script, and the source will be D3 somehow. The way that we get that: I like to go to unpkg.com/d3, and it resolves to this minified build of D3. So I'm going to paste that URL here. Now we have access to D3, which we can prove by saying console.log(d3) and taking a look in the console. We get this object that has all of the D3 methods.

All right, the next step is to load in some data. Since this is going to be a template, I like to use the Iris data set. If you just do a Google search for the Iris data set, there's even a Wikipedia article about it that helps to give some context. It's also known as Fisher's Iris data set. It's a multivariate data set from 1936 about iris flowers. These are the iris flowers, and it's got petal width, petal length; they're just measurements in centimeters of these particular flowers. So it's a classic data set. I'm going to use it because we end up with nice scatter plots, but part of the goal is to develop a template where you could easily swap out the data for something else, so in a sense it doesn't really matter what data we use to start.

Anyway, if you scroll down in the search results, I've got this gist up in GitHub that contains the Iris data set as a CSV file. Publishing data in GitHub gists like this is a pretty common practice for visualizations that are hosted on the web on various platforms. It's great because GitHub essentially hosts the stuff for free and lets you pull it down from any program. If we click this "Raw" link right here, we get to see the content of this file, which is a CSV file. CSV stands for comma-separated values. This is the header row, and these are all the rows that contain the data. It's got sepal length, and the sepal length of the first flower is 5.1. It's got sepal width, and the sepal width of the first flower is 3.5 centimeters.
It's got petal length, and the petal length of the first flower is 1.4. It's got petal width, and the petal width of the first flower is 0.2. And it's got species, and the species are Iris setosa, Iris versicolor, and Iris virginica; these are different species of iris.

So we can take this URL right here, copy it, and paste it into our program. I'd like to make a variable for this, so I'm going to call it csvUrl, and I'm just going to paste it right there. At this juncture, for me, I don't really like seeing these long lines that wrap around in an unwieldy way. So here's a little trick you can use to clean up the code around strings: I'm going to make it an array of strings, so we can see the different components of this URL. I'm just going to split it up here, close out the array, and then we can say .join, which is a method on JavaScript arrays. Then we can use Prettier to format that nicely. Now I feel this is a lot more readable. We know that it's coming from GitHub Gist; this is the user name; this part of the URL is the ID of the gist; this part is the commit; and this part is the file name. So that's how you can split a long string into multiple lines in a somewhat elegant and readable way in JavaScript. Just to make sure we're still getting the right thing, we can console.log(csvUrl), and we should see that URL. Yeah, there it is, and we could even click on it and it resolves to the right thing.

Okay, now that we've got this CSV URL, we have to fetch the CSV file. To do that we can use d3.csv. At this juncture we can make a decision: do we want to have "d3." all over the place in our code, or do we want to extract the D3 stuff into local variables? The latter would be my preference, because that's how it ends up when you use ES6 imports. So, to make this vanilla JavaScript more closely resemble the code you would write if you were to use ES6 imports, I'm going to make a decision to destructure all this stuff out
of the d3 global, like this. We can say const { csv } = d3, and I'm going to do it at the top, because traditionally that's where the imports go. What this is doing is exactly the same as const csv = d3.csv. It's exactly the same thing, but it uses ES6 destructuring, which is a nice language feature.

Now that we've got csv, we can use it. We can say csv and pass in csvUrl. What this does is make a network request using HTTP for that particular URL. The csv function returns a Promise, which is a construct in ES6 JavaScript to deal with asynchronous control flow. When you run this code, it makes a request, that request takes some time to come back, and after it comes back the Promise resolves, and that's when we want to get the data. To do that we can say .then, which is a method on Promises, and we can pass in a function that takes data as input. Then we can say console.log(data) to see if it worked. Indeed it did: we get this array of 150 elements that also has a property on it called columns. This is the result of d3.csv doing its thing. Internally it loads the file and also parses it: it goes through that big string of text, splits it into rows, and constructs one object per row. So what we get here is an array of objects, and each of these objects has a bunch of properties, like sepal length and sepal width. However, these are strings, and ideally they would be numbers.

So the next step is to parse these strings into numbers. d3.csv takes a second argument; I'm going to call it parseRow. This is a function that takes as input a single row, which I'm going to call d, and it returns some object that will replace that row. What I'm going to do here is return d, but before doing that I'm going to mutate d and say d dot sepal width... well, actually, let me just copy the set of columns out, because I can't remember all of them. And we
only want to do this for the things that are numbers. Species is not a number, so we don't need to parse it. By the way, the reason it's good practice to parse strings into numbers is that when you do math on strings, you don't always get the right result. As a brief detour, let me show you: if you add "5" + "5" as strings you get "55", but if you add 5 + 5 as numbers you get 10. So how do we get from strings to numbers? We can use this operator called unary plus. If you put a plus right before the string, it parses the string into a number. That's what we can do for all of these numeric fields. I'm going to paste that here, and I'm going to use macros to write this code quicker. We can say d dot sepal length equals plus d dot sepal length, then set up for the next one, and with these macros I can automate that. Then I'll use Prettier to format the code. There we go: we've got numbers instead of strings. Let's see if that worked. In the console we can see that indeed these are numbers now.

We've got some questions in the YouTube chat. By the way, anyone can join the live audio; it'd be higher bandwidth. "So csv is converting the file to a JSON structure?" Yeah, you could think of it like that. It's converting it to a JavaScript object in memory, which has the same structure as a JSON file, but JSON is a serialization of JavaScript objects, so it's not exactly JSON, but it's similar. It's an array of objects, that's true. "Doesn't D3 add columns?" Yep, totally does. D3 does all that magic internally where it adds this columns property.

Okay, what's next? Let's start using D3 to build up this graphic using SVG. The first step is to create an SVG element on the page. We can do that right here, I suppose: const svg = d3.select('body').append('svg'). This will append a new SVG element to the body. However, since we don't want to see d3
dot sprinkled throughout our code, I'm going to also destructure select from d3 up there. For SVGs to work we need to give them a width, so I'm going to say attr('width'), and we also need to give them a height. I'm going to give it width as the value of a local variable called width that I haven't created yet, and the same for height: height gets height. Let's define width to be window.innerWidth. This measures the size of the browser at the time the program runs. I like doing it like this so it's generic and not a hard-coded number. Okay, let's see, did this work? We got these undesirable scroll bars, which we can address with some styling in the CSS. We can say: on the body element, set the margin to be zero (this is a common trick that you need to do all the time if you want full-screen SVGs), and overflow is hidden, which will hide those pesky scroll bars. Okay, that looks better.

All right, now that we've got this SVG, what do we want to put in here? Well, a scatter plot is made of circles, so let's go at it that way: let's add some circles to this SVG based on the data. The thing is, though, the data is only available inside of this callback here, and this .then, I don't really like it, because there is a more elegant construct called async and await. My preference is to use that. The way you can do this is to introduce a function; I'm going to call it main, and this is going to be an async function. What that means is we can say const data = await csv, with all this stuff here. This strikes me as a little more elegant; I'd prefer to use it like this. Then we have to invoke main, of course. Let's see if it still works after this console.log(data). Indeed, there it is. So that's how you can use async and await instead of the .then syntax for Promises, which is a more modern approach.

Now we can add circles to our SVG with the D3 general update pattern:
svg.selectAll('circle').data(data).join('circle'). And now we can set our attributes on the circle. attr('cx') is going to be... now, what is it going to be? This is where we need to do a lot of work, actually, to figure out what that x position should be. It brings up the point that I would like to have the concerns of data processing and rendering separated, so I feel like we're getting a bit ahead of ourselves. We're not quite ready for the rendering, because we haven't done the data processing. By data processing, what I mean is figuring out which x and y coordinate each circle needs to get based on the data, using this construct called scales, linear scales in particular. So let's do the data processing first and then circle back to this rendering logic.

All right, I'm going to comment this out. To figure out where x should go, we need to use a scale, and I'm going to just call it x. This is going to be the x scale; this is a convention that I've seen in recent times with D3 programs. This is going to be an instance of scaleLinear. It would be d3.scaleLinear, but we're also going to destructure scaleLinear out of d3 up here.

But what is scaleLinear? Why do we need it? I've prepared a diagram to explain this. scaleLinear has a domain and a range. The domain is typically the values that you see in the data; in our case this would be the width of the sepal or whatever. Typically the domain goes from the minimum to the maximum value that we see in the data. So let's say we have some data set where the minimum value that we want to use for, say, the x position is 0 and the maximum value is 10.
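[Editor's sketch] The mapping that a linear scale performs can be approximated in a few lines of plain JavaScript. This is not D3's implementation; the name linearScale is made up for illustration, and the pixel range of 0 to 50 is an example value paired with the 0 to 10 domain mentioned here.

```javascript
// Hypothetical stand-in for d3.scaleLinear, for illustration only.
// Maps a value from the domain (data space) to the range (pixel space).
function linearScale([d0, d1], [r0, r1]) {
  return (value) => {
    // Fraction of the way through the domain, from 0 to 1.
    const t = (value - d0) / (d1 - d0);
    // The same fraction of the way through the range.
    return r0 + t * (r1 - r0);
  };
}

// Domain 0..10 (data values), range 0..50 (pixels, an assumed example).
const x = linearScale([0, 10], [0, 50]);
console.log(x(5)); // 25, since 5 is halfway between 0 and 10
```

With the real library, the equivalent setup would be scaleLinear().domain([0, 10]).range([0, 50]).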
And the purpose of scaleLinear in this case is to transform data space into pixel space. The range of the scale is the pixel space, and in the case of the x coordinate this is going to be the lowest and the highest x pixel coordinate that we want to see. Whoops, hang on. So when we have a data value and we want to project it onto the screen, we give this scale a value from the domain, and the scale gives us as output a value in the range. For example, if we were to have the number 5 in our data, we could say, "Okay, Mr. scaleLinear, my data value is 5; where should that be on the screen?" And scaleLinear would say, "Oh, I know, that should be at 25." Because think about it: 5 is halfway between 0 and 10, and 25 is halfway between 0 and 50. So that's the purpose of scaleLinear.

I see someone has joined. Hey, Felipe is here! Hello, hello.

Hey, I'm late. It's really difficult for me to make it.

Oh, no worries, I'm happy you could join.

Yeah, nice to meet you. I'm also glad that I can.

Nice. I can't remember if you introduced yourself last time you came on, but maybe just give a brief intro to yourself here.

Okay. I'm Felipe. I'm an engineer by profession. I don't code; I just love to code, but it's not my job. I'm trying to learn as much as I can to start coding and make things more easily.

Very nice.

I'm really enjoying making things.

Awesome. Yeah, I've been very impressed with your work. I'm so happy that you've been following the series and doing the exercises; it's great to see, and I'm so happy you could join me today. So here's what I'm going to do: I'm going to share my screen in the Google Meet, so that you can follow along there, and you can close the YouTube stream, because there's a lag there. This is great, because we can have discussions in real time if you have any questions. So feel free to interrupt me if anything's not clear, okay? We can dive
deeper. But in the meantime, please mute yourself, because there's a little bit of background noise. Oh my gosh, Adil is here too! Hey, Adil, how are you? You're muted.

Hello everyone. Hi Curran, how are you?

Hello! So glad you could join today.

Sorry about the last-minute arrival; I had some babysitting duties.

Oh, no worries, no worries. This is great, two folks joined at once. And Felipe just joined. Felipe, have you met Adil?

Not personally, but nice to meet you.

Nice. Hi Felipe.

Hi, how are you?

Great. Well, let me give a quick recap, because you're all just joining. This is what we've got so far. It's not that much: it loads D3, fetches a CSV, parses the rows from the CSV, constructs an SVG element, and constructs a scaleLinear. The goal here is to make a scatter plot. I just went through this slide about linear scales. The purpose of linear scales is to convert values from a domain to a range. The domain is like the input; the range is like the output. The domain is the data space and the range is the screen space.

So in our case we want to position these circles in the scatter plot in the x direction. What should that be, though? We have to pick one of these columns to use. This is where I like to define an accessor that I call xValue, yValue, colorValue, sizeValue, whatever; it's a naming convention that I like to use. It's a function that takes as input one row and gives you back some value from the data that we should use. So here are our various options. I don't think it really matters at this point which one, because I just want to get something to show up, so I'll pick petal length. This is going to be our xValue. And the same for y: let's try sepal length for y. So we're going to build a scatter plot where petal length determines the x position of the circles and sepal length determines the y position of the circles.

Now that we have that in hand, we can define the domain of our x scale. The way that you do this is:
first, you construct an instance of scaleLinear by invoking d3.scaleLinear as a function. This gives you back an object that has methods on it, such as domain, and domain can be used to set the domain. The domain, remember, is just two values: a minimum and a maximum. So this is going to be min and max. min and max are actually functions from D3, so we could say d3.min, give it the data, and give it the xValue accessor, and this will iterate through all of the different values and compute the minimum. We can do the same for max: d3.max(data, xValue), and it will compute the maximum x value. Let's inspect what happened there.

xValue is the same as x?

xValue is just this function here that returns the petal length for each row. We could tweak it here if we wanted to change which column to use, but that's all it is: just a function that returns the petal length for each row. So what we're doing here is passing that function to d3.min, which expects a function that takes as input a single row, and we're also passing in the data. When we run this d3.min call, it checks every value returned from this function for every row, keeps track of the minimum one it saw, and returns the minimum of all the values. Same thing for max, but for the maximum. We can check if this worked by saying x.domain(). The thing about scale.domain is that if you pass it a value, it acts as a setter, meaning it sets the value of the domain, but if you don't pass it any value, like this, it acts as a getter and returns the value that you set previously. I just used Prettier there to format. If we look at the result, we can see an array of two values, and check it out: it goes between 1 and 6.9. So d3.min and d3.max are great and all, but there's actually a function that does both of them for you at the same time, which is called extent.
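[Editor's sketch] The accessor-plus-min/max idea can be illustrated without D3. The extent function below is a simplified, hand-rolled stand-in for what d3.extent does, not the library's code; the property name petal_length and the three sample rows are assumptions chosen so the result matches the 1 to 6.9 domain seen in the console.

```javascript
// Hypothetical rows, already parsed into numbers (values made up for illustration).
const data = [
  { petal_length: 1.4 },
  { petal_length: 6.9 },
  { petal_length: 1.0 },
];

// Accessor: takes one row, returns the value to use for x.
const xValue = (d) => d.petal_length;

// Simplified stand-in for d3.extent: one pass that tracks both
// the minimum and the maximum. Unlike D3, it does not skip missing values.
function extent(rows, accessor) {
  let min = Infinity;
  let max = -Infinity;
  for (const row of rows) {
    const v = accessor(row);
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return [min, max];
}

const xDomain = extent(data, xValue);
console.log(xDomain); // minimum 1, maximum 6.9
```

Because the accessor is passed in, switching the x column only means changing xValue; the extent call stays the same.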
So instead of this we can use d3.extent. Oops. d3.extent actually returns an array with two elements, so when this runs we can see that it's the same end result. As a last piece of cleanup, instead of using d3.extent, I'm going to destructure extent from d3 up here.

Okay, so now we've got the domain of our x scale figured out: it's the minimum and maximum value for that column in the data. Now we need to figure out what the range should be for our x scale. We can set the range in much the same way that we set the domain, with a function called .range. By the way, note how it's calling .range on the return value from .domain. The D3 API uses this technique called method chaining. What this means is that if you use, for example, domain as a setter like this, it returns the instance of the scale, which is the same thing as x. So we could say: x is a scaleLinear, then x.domain is this, and x.range is this other thing, and this all would work. But since it's developed with this method-chaining API, we can use this shorthand, and this works as well.

So let's figure out what the range should be. The range is going to be an array of two numbers, and I think to start, let's just make it go between 0 and width, to fill up the screen. We can see what that is by saying x.range(), like this, and it's an array of 0 to 960. That seems right. Okay, that's our x scale; it's finished.

Now let's do the same for y. I'm just going to copy all that and change x to y. Then we want the range to go from 0 to height... but not actually, because the y coordinate is flipped in SVG: 0 is at the top, not the bottom. We want the lowest data value to be at the bottom of the screen, so we actually need to flip this: the range will go from height to 0.

Now we can do our data processing. I'm going to create this thing called marks, and it's going to be data.map, where we invoke a function for each and every row of
our data set, and we can return an object. This object can have on it the properties x and y. x will be x, meaning this x scale, of the value from the domain. By the way, these linear scales are functions, and you can pass in a value from the domain. So the value from the domain would be d dot, what is it, petal length. Oops. However, since I don't want to hard-code these values in here, and we already have this accessor function, we can just say xValue(d). That way it's nice and generic. Same thing for y: just replace x with y. There we go. Now let's console.log(marks) to see what we ended up with. Oh, I've got two console.logs here, which is confusing. Let me get rid of the other one, this console.log(data). So now we're just going to see the marks, and we have objects that have x and y in pixel coordinates. All right, fantastic. Now we're ready to move on to the rendering step. Because we did the data processing separately, the rendering step is very straightforward: you just map x and y from these objects directly onto cx and cy of the circles.

Pixels: for me, I have the impression that they should be integer numbers, not floating-point numbers. Why is that?

Yeah, that's an interesting point: pixels being integers versus floating-point numbers with decimals. It's interesting. SVG is scalable vector graphics. When you're looking at it on the screen, let's say on a high-DPI display, like a Retina display on a really expensive MacBook, and you set a pixel value like 100.5, on a high-DPI display that has double (or quadruple, rather) the pixel density of a regular display, incrementing the pixel coordinate by 0.5 will actually increment it by one physical pixel. And within SVG you can do transforms, and even with CSS you can do scaling transforms. So SVG actually does accept pixel coordinates as floating-point numbers; they don't need to be
integers. By the way, interestingly, this is different with Canvas. HTML5 Canvas is a raster-based system. If you draw a line in Canvas and give it a floating-point number, I'll have to double check, but I think it takes the floor of the number, so you can only get down to a single pixel in the canvas, which is a raster image, because it's made up of pixels. However, SVG is more detailed than that. So, somewhat surprisingly, you can use fractional values for pixels, and also for things like line width. If you set the line width to 0.5, it makes it slightly lighter when it gets anti-aliased onto the display, but if you were to view it on a high-DPI display, it would actually be one physical pixel as opposed to two physical pixels, because it depends on the, what's it called, the device pixel ratio or something like that. But long story short, SVG accepts floating-point values.

Thank you.

Yeah, my pleasure. I remember it blew my mind when I learned that, actually. And it makes sense in a way, because let's say you print an SVG: when you print something it's like 300 DPI, so you can actually get sub-pixel precision when you print it, or when you view it on a high-res display.

Curran, when it comes to choosing a scale, does that depend on what kind of metric we're dealing with? We've chosen a linear scale in this case. Is that because there's something about the original metrics that makes a linear scale appropriate?

Yes, it does, and I will be getting into this in the future. Long story short, a linear scale is just one type of scale; it's a particular type of continuous scale. These continuous scales include other ones, like a power scale (which includes a square root scale), a log scale, a time scale, a radial scale. All of these continuous scales are meant to deal with numbers as input, so anytime you have a column of data that just has numbers in
it, you can use this type of scale. However, there are other types of scales that make sense for other types of values. Time scales, for example, work with dates. A time scale is categorized as a continuous scale because it acts like scaleLinear but for dates; it treats a date as a point in time. There are also ordinal scales. In our Iris data we have one column, species, and it has three distinct values that are strings, not numbers. When you have strings in your data, you can use ordinal scales: either an ordinal scale directly, which is essentially a dictionary that maps an input value to an output value, or, within this space of ordinal scales, band scales and point scales. A band scale maps discrete input values to discrete output values that are arranged as bands on the screen, so it's useful for bar charts, for example. Then you have point scales. Point scales can be used to represent data that has discrete input values, like different strings, and project that onto a space. So if you wanted a scatter plot where, for example, y is the species, you could use scalePoint to map species to a y value. But since we're just mapping numbers to the screen, that's why we're using a linear scale. We will start to use all these different scales in the future, for all the various visualization techniques.

Yeah, thanks, that makes sense. Thank you.

My pleasure. Okay, let's get these circles to show up and make a basic scatter plot. I'm going to bring back this rendering logic that we started on earlier by uncommenting it, and I'm going to change it to work with marks instead of the original data. Now, if we want to set the cx attribute, we can give it a function that takes as input one entry in the marks array and returns d.x. Same thing for y: cy, which is the center y coordinate of the circle, can return d.y. Let's see if it worked.

Sorry, they are
missing...

Yeah, good call, good call. Yes, that's it. If you don't give a circle a radius, it doesn't have a radius and it doesn't show up. So let's say 5. All right, there we go. Boom, we've got a basic scatter plot. Sweet.

I'll see if there are any questions in the YouTube chat. "That's tiny." "Input is the domain value, output is the range." Yep. "Display density." Yeah. "Thank you." "r." Yep, you're on it. All right, cool.

Let's see, what's the time? I think we can take a break. I'll take a five-minute break, and after the break we can add axes to this scatter plot. Okay, see you in five.

Okay, we're back. How's everybody doing?

Good.

You know, Felipe, I've always wondered: how do you pronounce your last name? Nice, nice.

Okay, let's add axes to this scatter plot. I think I'll fork this at this point, because this is a nice state to have as a reference. So this is "D3 Partial"; I'm going to make it "D3 Scatter Plot".

One question: are the projects that we make in VizHub stored somehow in GitHub?

No, they're not; it's a separate database. I developed VizHub over the past couple of years, and it's all stored in a big MongoDB database.

Okay.

But, not to get too sidetracked, I was thinking about making a feature where VizHub could automatically mirror your stuff to GitHub Gist, because the export from VizHub is compatible with bl.ocks.org. So you could have a backup of all your stuff from VizHub in GitHub gists, in case VizHub were to ever go down or whatever. But yeah, it's separate.

So you would save space on your server, right? Use only GitHub to store the data.

Yeah, that's true, but then it would be difficult to implement the real-time collaboration features.

Oh, Anita is joining. Perfect timing. Hello there! Hi. Let me recap what we've done so far. Adil is here, Felipe's here. We've made this scatter plot so far, and we're just about to add axes to it. Okay, so how do
we add axes to a scatter plot? First of all, we need to make room for the axes using a margin. In the D3 world there's this thing called the margin convention. Actually, let me show that, because it's really good. If you google "D3 margin convention": Mike Bostock, the author of D3, has this piece in Observable, which is the D3 margin convention. The idea behind this is to make room for axes on the left and on the bottom, or on the top and the right, wherever you want to put them. The whole idea is that we define this margin object that has top, right, bottom, and left, and our code uses that margin to know where to put things on the screen. So it impacts how we compute the range of our scales: instead of going from 0 to width, for example, we would want to go from margin.left to width minus margin.right. That's what this example does here.

Back in our code, let's define a margin. margin is going to be an object that has top (I'll just set them all to 20 for now; we can change it later), right is 20, bottom is 20, and left is 20. By the way, this is an actual convention, to go top, right, bottom, left, because it's clockwise around the screen. I think sometimes in CSS you can specify things as one big string, and it goes top, right, bottom, left like that. That's why I chose that ordering.

Anyway, now that we have this margin, we can tweak our scales. Instead of going from 0 to width for our x scale, we can make it go from margin.left to width minus margin.right. Now we can see that there is actually a margin being applied. Let's do the same for our y scale: instead of going from height to 0, it's going to go from height minus margin.bottom to margin.top, like that. Everything else flows from the scales, so it just works. We can verify that it's working by testing each of these one by one. What if we set the top to 200?
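[Editor's sketch] The effect of the margin on the two scale ranges can be worked out in plain JavaScript. The helper name rangesFor is hypothetical, the width of 960 matches the range seen earlier, and the height of 500 is an assumed value for illustration.

```javascript
// Sketch of the margin convention: derive both scale ranges from
// the overall size and a margin object. The function name is hypothetical.
function rangesFor(width, height, margin) {
  return {
    // x runs left to right, inset by the left and right margins.
    xRange: [margin.left, width - margin.right],
    // y is flipped so the lowest data value sits at the bottom.
    yRange: [height - margin.bottom, margin.top],
  };
}

const margin = { top: 20, right: 20, bottom: 20, left: 20 };
const { xRange, yRange } = rangesFor(960, 500, margin);
console.log(xRange); // 20 to 940
console.log(yRange); // 480 down to 20

// Setting top to 200 only moves the upper end of the y range.
const tall = rangesFor(960, 500, { ...margin, top: 200 });
console.log(tall.yRange); // 480 down to 200
```

Since the circles and the axes both read their positions from the scales, changing one margin value is enough to shift everything consistently.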
see that it works we have a big gap on the top what if we set right to 200 it works we have a big gap on the right how about bottom yep got a gap on the bottom and left yes i have a gap on the left that's how you can implement the margin convention now that we have space for axes let's add some axes when we destructure stuff out of d3 let's also pull out axis left and axis bottom and the way that we can use those is on our svg we can append a group element for each of our axes so this is going to be let's do the y-axis first svg dot append g dot call and this is where we can construct a new axis axis left and when we construct this axis we can pass in the scale namely uh y like that and we're not seeing anything because it's sort of off to the side of the screen what we need to do is move this group element to the right a little bit actually we need to move it to the right by margin.left we can do that by saying dot attr and we can set the transform to be translate and this is an es6 string template literal we want to translate by margin dot left in the x direction and zero in the y direction and now it shows up see get some tick marks uh the numbers seem to be cut off though so let me just tweak the margin i'm going to tweak the left margin to be let's say 50.
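A hedged sketch of the axis setup being described. The function names `yAxisTransform`, `xAxisTransform`, and `renderAxes` are assumptions made for illustration; the D3 calls used (`axisLeft`, `axisBottom`, `append`, `attr`, `call`) are the ones named in the stream. The bottom axis follows the same pattern as the left one, shifted down by `height - margin.bottom` instead of right by `margin.left`.

```javascript
// Hypothetical helpers: build the transform strings with ES6 template literals.
// The left axis moves right by margin.left and stays at y = 0.
const yAxisTransform = (margin) => `translate(${margin.left},0)`;
// The bottom axis stays at x = 0 and moves down to the bottom of the plot area.
const xAxisTransform = (height, margin) =>
  `translate(0,${height - margin.bottom})`;

// Sketch of adding both axes. `d3` and `svg` are passed in as arguments so
// this snippet has no imports; `x` and `y` are the scales that also position
// the circles, which is why the ticks line up with the data.
const renderAxes = (d3, svg, { x, y, height, margin }) => {
  const { axisLeft, axisBottom } = d3;

  svg
    .append('g')
    .attr('transform', yAxisTransform(margin))
    .call(axisLeft(y));

  svg
    .append('g')
    .attr('transform', xAxisTransform(height, margin))
    .call(axisBottom(x));
};

console.log(yAxisTransform({ left: 50 })); // translate(50,0)
console.log(xAxisTransform(500, { bottom: 20 })); // translate(0,480)
```

In a page with D3 loaded this would be invoked as `renderAxes(d3, svg, { x, y, height, margin })` after the scales are defined.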
and now we can see this axis shows up all right that's how you can add an axis with d3 let's do the same for the x-axis now i'm just going to copy this code and modify it to be for the x-axis which i would like to put at the bottom so we can call axis bottom and pass in x like this but the transform is not quite right anymore see we've got all these numbers at the top we want these to be at the bottom so what we need to do is translate by 0 in the x direction but in the y direction we want to translate it to be all the way down at the bottom and so that's going to be height minus margin dot bottom yep and that works all right very good that's how we can add x and y axes to our scatter plot one last little tweak i would like yeah question uh i'm sorry i'm not sure how do you ensure that the mark in the axis is exactly matching the value of the data because since you can transpose the axis you could be ending the number in a different place i don't know if i made myself clear oh sure yeah yeah i mean the short answer is because they're derived from the same scale but yeah if you wanted to you know you could translate it by let's say like 50 pixels in the x direction and make it misaligned like now it's not lined up but if you don't do that um it is perfectly aligned just because of how um the scales are set up so because the axis takes as input the x scale the axis actually uses the range of the scale to position itself and since it's the same scale that drives the x and y of the circles it ends up aligning perfectly yeah and we're translating it by notice how we're not translating the x-axis in the x-direction at all we're just translating the x-axis in the y-direction to put it at the bottom and so all yeah so all the x-positioning is just strictly coming from the x scale okay okay nice nice and same with y by the way we're translating y in the x direction just to align it with the
x scale so all of the y positioning is just strictly from the y scale here so that's why they all align correctly and you can sort of see that they align because the lowest x value which is the circle here the center of the circle is exactly on that line of the axis can i say something yeah we can check the value for each circle by putting title on it and we will see it matches with the axis oh we totally could yeah that's fairly easy to do let's just do it that's a great idea so there's this thing called title um you know what that might be a little more complicated because we need to tap into the enter selection yeah i don't know that seems a little bit out of scope for today yeah we need to change this code around to access the enter selection so i don't know i don't really yeah it's no problem my question was wondering if we could mess it up but as you said we never transformed in the axis that we are actually moving exactly made sense completely nice yeah great great but yes we could potentially add tool tips that will show the numbers um which we could then verify that could conceivably work but i don't think i'd like to do that right now because i want to keep this example um nice and simple i have a question um if we wanted the axes to begin from zero uh what would we need to do oh yeah sure if you wanted the axes to begin at zero you could just use um instead of using extent you could use max d3.max and as the first value you could explicitly just pass in zero like this now the x-axis starts at zero very cool yeah thank you yeah that makes total sense nice yeah my pleasure and i love how you know d3 is designed in such a way where it's like you can fit things together like lego blocks you know so you can just plug in whatever you want to for the domain but it provides utilities like this max and extent for the most common use cases which is you know deriving the domain from your data and
for a bar for a scatter plot it makes sense to use extent but for other visualization types um like a bar chart it always makes sense to start at zero so you know if once we make bar charts we'll always start the domain at zero and then use d3.max and same thing if you're assigning areas to circles like the size of a circle based on some numbers i mean we'll get there in the future but i'm saying like there are very concrete use cases where it makes sense to start the domain at zero and then go from zero to the max of the data and that's how you would do it so yeah i think we're pretty much done here i kind of want to just um you know give this a once-over oh one thing i wanted to do was make the ticks the tick marks bigger because um in my opinion the default size is pretty pretty small and i want to give a just a flavor of how you might customize some of the stuff on the axes which is a rabbit hole there's a lot of different things you could do there but the simplest thing you can do is just um use css to say for all text elements which includes the tick marks we can set the font size to be i don't know 24 pixels and that works it's it's bigger but as a best practice i like to scope the changes because if there's you know if there's other text this will apply to all the text which may be what you want but with d3 axes in the dom there is actually a structure that we can leverage namely that each tick mark is is composed of a group element that has a class of tick and then within that group element there's a line element and there's a text element so we can select the text elements inside of the ticks by leveraging the fact that this group element has a class of tick by saying dot tick space text what this what this does is it constructs a selector css selector that selects the tick elements and then the space signifies okay within within that look for children of that element that are of the type text so this is how you can set the font size of the tick elements 
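The stream does this with a CSS rule using the selector `.tick text`. As a sketch of the same scoping idea in JavaScript, the selector can be applied through a D3 selection instead; this swaps the stylesheet for D3's `selection.style`, and the helper name `styleTickLabels` is an assumption, not something from the episode.

```javascript
// Selects only <text> elements nested inside elements with class "tick",
// which is the structure a D3 axis generates for each tick.
const tickLabelSelector = '.tick text';

// Hypothetical helper: enlarges the tick labels without touching other text.
// `svg` is a D3 selection that already contains the rendered axes.
const styleTickLabels = (svg, fontSizePx) =>
  svg.selectAll(tickLabelSelector).style('font-size', `${fontSizePx}px`);

// Usage, after the axes have been appended:
//   styleTickLabels(svg, 24);
```

Because the selector is scoped to `.tick`, any other text added to the SVG later keeps its own font size.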
and the labels are getting cut off so i'm just going to tweak the margin again i tweak the bottom margin to be say 50. or maybe 40. yeah and um at this juncture i think the code is is done in a sense but now just to close this out i want to look through this code from the perspective of someone who wants to fork and modify this because that's what the assignment is to fork this and modify it to show some other data set and so you know in this direction what i want to look for is are all of the tweakable things in one place and all the generic stuff in some other place that's what i would aim for if if the goal is to create a reusable template so let's just look through and i'm going to make a comment here called tweakables and all this stuff should be data set specific so we got the csv url very much data set specific we got parse row very much data set specific x value and y value very much tweakable and actually at this point let me just show you how easy it is to change the meaning of of one of these um x or y instead of sepal length we can change it to sepal width and boom our scatter plot updates just like that and so when you load in another data set part of the process is just going to be trying out different values for the columns and see and see what pops out the margin very tweakable something you'd want to properly tweak a lot and the width and the height yeah this part is sort of something you might want to tweak like if you have some downstream code that where you want to position the svg inside something else so sure it's tweakable now this part is generic so i'm going to go through this and see is there anything that's specific to this data set in this code or not um so far it looks pretty generic you know this cut this same exact code could run on any data set oh there's one thing that's tweakable the radius so i'll pull that out into a tweakable variable i'll put it along with x value y value and margin radius is five yeah that's an arbitrary thing 
that uh you might want to tweak but um i think that's about it that's all i see so that's how that's how you can polish up a uh a forkable dataviz all right great i feel like we've set out the original goal to have something that's just vanilla html javascript it doesn't use any bundler or anything it approximates the feel of es6 modules by destructuring the stuff out of the d3 global but it's it's totally vanilla so anyone could you know export this and use it all right i think that's it any um any questions or things you want to discuss um i have a question about the call function you used sure let me just plug in my laptop there's the batteries running out yeah the question is about dot call i guess you could phrase the question as like what is that what is it really doing and it's it's really just a shorthand for invoking functions and there's a way that you could do this without using dot call let me show you what that looks like um let's let's look at for for axis left we could potentially pull that out like this into a variable and then pull out this group element as another variable so now we have i'll get rid of that dot call now we have the axis as a standalone thing and the group element as a standalone thing and we want to sort of inject that axis into that dom element and one way we could do it is y axis g dot call y axis which is the same as it was before but now things are split into variables the other way to do it is like this y-axis y-axis g like that and it still works so what's going on here is when you invoke dot call on a selection it expects that you pass in a function and it will invoke that function passing in the selection on which you invoked.call so it's exactly the same as this y-axis is actually a function that expects to be called with a d3 selection of a group element as the input so these two are exactly the same behavior-wise dot call is just it's just sort of a shorthand a convention it's convenient because you can chain stuff 
onto it you could say dot call that dot you know select um you know domain path or something and remove it just as an example use case where you might want to use dot call and see that removes the domain line that path which is that vertical thing and often i find myself wanting to remove the domain path because it's like i don't see much value that it provides but anyway yeah i hope that answers your question these two are equivalent dot call just invokes a function and passes the selection but i think i'll put it back to the way it was it's kind of confusing but it will work yeah yeah there's always a trade-off there's always a trade-off between writing concise code that might be a little cryptic versus verbose code that's like twice as long but it's easier to read yeah you gotta try to strike a balance the first argument to the uh yes the first argument of the first argument yeah maybe i should just keep all that around as a reference yeah just commenting will be nice so the you know it's like the argument of axis left function is the selection svg selection is that right yeah the argument that y-axis expects is a group element well rather is a d3 selection of a group element that's empty and the y-axis function is going to be like okay you give me an empty group element i'm going to put an axis into it and if there's an axis already there i'm going to you know update it to be accurate and it has to be group element it expects a group element yeah i mean you could conceivably pass in like the top level svg element but yeah i'd expect a group element but yeah it's a good question could you pass in something else i don't know i've only ever seen it used with a group element oh i broke it there we go so yeah i mean it's good to study this um because it is a lot of stuff sort of wrapped into one so what it does here is says okay i'm constructing an instance of an axis and i'm passing in the y scale so the return value
from this constructor is a function that we could call y axis if we wanted to that expects as input a d3 selection of a group element as its argument when it gets invoked and that's exactly what gets passed in when you when you use dot call and the dot call is is called on this group element that's been transformed but um yeah yeah that's how it all works it does take some time to to wrap your head around it um but oh in time i've come to prefer like this this sort of construct rather than making a bunch of variables but again it's just just personal preference either way works you know that another thing another while we're at it another way that you could potentially do this which might be even more confusing is that you can construct the axis and then pass in this selection here that works as well i i don't understand this to me it seems like uh could be because i'm more used to python but it seems that you're passing like uh the svg thing like a argument to the function that's it's left yeah it's not clear yeah that's right that's right so when you call axis left from d3 you give it the scale it returns an instance of a d3 axis but that instance of d3 axis is in fact a function that expects to be invoked with a single argument namely a d3 selection of a group element and so this statement here returns a function and that's why we can invoke that function like this so this part here we are invoking that function that gets returned from the axis left constructor and we're passing in a d3 selection of this group element because as soon as we call svg.append g it creates a brand new d3 selection of a brand new dom element which is a group element and then it calls dot attr on that group element and because it uses this method chaining api the return value from dot attr is the same as the return value from dot append g it's a d3 selection of the group element and so this whole expression here yields you know it returns a d3 selection of this group element and that's 
what gets passed into this axis left function okay when you inspect the element the result it's exactly the same right correct yeah all of these different ways of invoking the axes result in the exact same behavior they're just different ways of formulating it but yeah when you see it visually it's the same and yeah when you inspect the dom it's going to be exactly the same and from the perspective of the dom this thing here is the parent group element that we created with this code here svg dot append g set the transform to be translate this is that group element here and then when we pass that into the d3 axis the axis implementation adds all this stuff as children to it so there's a path which is the domain line and then there's group elements for each and every tick and within each tick you have a line and a text element so yeah no matter how you invoke it it ends up to be the exact same dom structure it's quite confusing yeah it looks like you can pass axis function in the svg selection and vice versa well you can that's exactly what dot call does it sort of inverts everything you know i think probably the clearest one to read is this variant here where we don't even use dot call we construct a variable called y-axis which is that return value from axis left and then we construct a group element by appending a group element this is a d3 selection of a group element and you can see the relationship very clearly right here y-axis is a function that expects as input a d3 selection of an empty group element and this is um i guess shows the border between them very clearly but yeah i get it that it's totally confusing that you can invert the order of those in the code with dot call yeah it takes a while to wrap your head around okay let me ask you one thing it's not quite related i think i got my head around this it's regarding the arrow function uh i don't need to use uh the return ever in the
arrow function it's always the last line that's returned right there it's only one line so it doesn't make sense my question but if i have um an object and it has more than one line um do i have to use return or not that's a good question yeah yeah let me show a variant of this one so with the arrow function if it immediately goes to an expression that could be an object literal could be a string could be a number it sort of activates this thing called implicit return it implicitly executes a return statement or the equivalent of a return statement and that's sort of one of the magical features of these arrow functions however if you start the arrow function with a curly brace like this then it opens up into a function body where you need to explicitly return so you could like you know run some code blah blah blah but then at the end of it if you just define an object like this it's not going to be returned you need to explicitly return it like this and so this function here is exactly equivalent to this function here but the one on the top uses that syntactic sugar of the implicit return that comes with arrow functions and that's by the way why it needs to be wrapped in these parentheses because the parentheses signify okay we are defining a literal object and that's why it gets returned if you leave out these parentheses uh it's not valid javascript and it breaks so like if we try to run that it's going to say it's not valid it's unexpected token because when the parser goes in it interprets it okay like we're starting a function body now and then this is like garbage doesn't make any sense so yeah that's why you need those uh parentheses okay i got this error a lot so it's good to know that this could be a reason nice the unexpected colon yep yeah that's one of the most common errors you got a syntax error then you need to track it down figure out where it is um and by the way you know a lot of folks who have
been maybe programming in javascript years ago you're used to this other syntax where it's like a function like this and this is like i like to call it the old school function notation this works too but the arrow function syntax is just it's a lot more concise and um that's what folks would use probably nowadays all right very good a lot of fun we've explored some ins and outs of um of the details here and we've ended up with this forkable template oops hold on so i'd like to wrap up and leave you all with an exercise the exercise for this week is fork the scatter plot that we made and i'll share the link and change it around to visualize a different data set and i've provided this link here to uh this data repository that i have where i've just been accumulating over the years various public data sets of interest and you don't have to necessarily derive any data from here you could just find you know do a search for data that's interesting to you and plug it in to this scatter plot and i've got this other video on youtube called preparing data for visualization and this walks through how you can create a gist and put a csv file in there and then once that's there you can follow the same steps that i did today to pull it into your code and visualize it and this is all there in the vishub forum so please submit your your work here looking forward to it okay thank you all right yeah my pleasure and uh thanks everyone for joining today it's a pleasure as always all right take care thank you bye bye take care bye
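For reference, the three function forms compared during the Q&A can be sketched side by side. The column names `sepal_length` and `sepal_width` and the function names are assumptions chosen to echo the iris columns mentioned in the stream; they are not copied from the episode's code.

```javascript
// 1. Arrow function with implicit return. The object literal is wrapped in
//    parentheses so the parser reads `{` as an object, not a function body.
const parseRowImplicit = (d) => ({
  sepal_length: +d.sepal_length,
  sepal_width: +d.sepal_width,
});

// 2. Arrow function with a body in curly braces. Nothing is returned
//    unless `return` is written explicitly.
const parseRowExplicit = (d) => {
  const row = {
    sepal_length: +d.sepal_length,
    sepal_width: +d.sepal_width,
  };
  return row;
};

// 3. The "old school" function notation, equivalent for this purpose.
function parseRowOldSchool(d) {
  return {
    sepal_length: +d.sepal_length,
    sepal_width: +d.sepal_width,
  };
}

// CSV fields arrive as strings; unary plus converts them to numbers.
const raw = { sepal_length: '5.1', sepal_width: '3.5' };
console.log(parseRowImplicit(raw)); // { sepal_length: 5.1, sepal_width: 3.5 }
```

Dropping the parentheses in the first form while keeping both properties makes the braces parse as a function body, which is where the "unexpected token" syntax error on the colon comes from.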
Info
Channel: Curran Kelleher
Views: 793
Id: dz6KLhurKMI
Length: 105min 20sec (6320 seconds)
Published: Sat May 01 2021