No Black Box Machine Learning Course – Learn Without Libraries

Captions
Welcome to this No Black Box Machine Learning course in JavaScript. You will gain a deep understanding of machine learning systems by coding without relying on libraries. This unique approach not only demystifies the inner workings of machine learning, but also significantly enhances software development skills. The course is designed and taught by Radu, who has a PhD in Computer Science and is a university lecturer. From building a drawing app and working with data to exploring feature extraction and implementing various classifiers, this course covers a wide array of topics that will equip you with the knowledge and expertise to create your own machine learning driven applications.
Hi and welcome to the No Black Box Machine Learning course in JavaScript. It's a course where we code without using libraries because it's the best way to learn all the inner workings of a machine learning system, and you'll greatly improve your software development skills as well. Actually, the course is a lot about software engineering, especially in the beginning, and the focus slowly shifts towards machine learning as it goes on. Our main goal is to build a web app that learns to recognize drawings. If you've seen Quick Draw from Google, it's something like that, with a few key differences I'll point out along the way.
For machine learning, we need the data, so I'll first teach you how to build a data collection tool. It's a drawing app that works on desktop and mobile devices. We'll later reuse the sketchpad component to build a recognizer as well. Next, I'll teach you how to process and visualize the data collected with the tool, the drawings you made when asked for help. If you missed that video and want your drawings to be used at some point, it's not too late to do it now. Then we move on to feature extraction and visualization. We build this chart component from scratch.
It's a relatively big project, but worth it, because in building it we practice much of the same math we later need for machine learning. We also customize it so much that it becomes an essential teaching tool for what follows. Speaking of which, this is where most machine learning courses typically start. They use some existing data and features, famous data sets that are publicly available, and that's okay, it helps dive into things much quicker. But I include this part because, in my experience, it's never that easy in real life. Being able to collect data, visualize it, clean it, and shape it into a useful form are all really important steps you need to do in practice. Not understanding data well is why people fail at machine learning. Even the most sophisticated libraries fail if the data is not what you expect. But this is our data, you made part of it, so I'm pretty sure we won't have that problem.
Now the simplest learning method I can think of is the nearest neighbor classifier. I'll teach you how to implement it and integrate it with the sketchpad, so it recognizes what we draw. It will work okay-ish, it's a simple method. But it can work even better if we apply data scaling, a step everybody performs when doing machine learning, but very few people can explain why, at least from what I've seen. I'll make sure you understand why it helps. We then implement the more advanced K nearest neighbors classifier and calculate its accuracy for different values of K. To understand things even better, I'll teach you how to compute decision boundaries and display them on our chart.
Now at work, I teach a version of this course in Python, and a poll I made recently shows that some of you are interested in that as well. So the last lesson will be a review where we do the same things in Python. Spoiler alert, I'll use libraries and it will be over quite fast. Now when I said last lesson, I mean of phase one. This is when the course will take a short break.
It will give you enough time to focus on the homework. Sorry, but if you really want to learn, following recipe videos like this is just not enough. But don't worry, the exercises are not that hard. High school math and some programming experience is all you need. If you need to brush up on those, these videos here can help. And if you get really stuck, join my Discord, where me and other students can help. We then begin phase two, where we learn more advanced methods like these. I actually got a pretty good accuracy using the same neural network we coded in the self-driving car course. There might be some kind of crossover happening. Let's see.
Let me show you how to build a drawing app for data collection. It will ask for some information first, and then tell you what you need to draw. The sketchpad also has undo and save functionalities, and it works on desktop and mobile devices as well. We'll implement everything in this course using Visual Studio Code. And to test, we'll use a web browser.
Today we build a web application. So let's begin by creating a new folder called web. And inside this folder, we'll create our first file, creator.html. It will be a web page that we use to create data for our data set. Now we begin to type basic HTML. In the head, I will give this page a title, let's call it Data Creator. And let's link an external style sheet, a CSS file that we will have to implement. This will hold all the styles that we need in this project. Let's close the head section and define a body where I'm going to have a div with an ID of content that will hold everything we'll implement today. That will include a title section, this Data Creator using an H1 tag. And let's have a line break and then a div with the ID sketchpad container. We'll implement the sketchpad today, and this container is going to hold it. Let's close our content div and our body and the HTML, and save the file. And now let's open the file in the browser. And we get this.
The only thing we see in the page is this H1 tag from here, because this line break and this container are invisible for now. We also get an error in the console here. I have the browser developer tools open and I recommend you do the same. The error here just means that the styles.css file doesn't exist yet. Let's create it by going here and typing styles.css, and inside I'm going to add basic styles for the page. For the body, I'm going to set the font family to Arial and the background color will be sky blue. I will also add some styles for this content. I want it to always be in the middle of the page. So the position will be absolute and I'm going to give it 50% from the top and 50% from the left. But this will make the top left corner of the content be in the middle of the page. So I want it to go half its width and half its height to the left and towards the top. So I'm going to write here transform, translate minus 50%, minus 50%. And text-align set to center. If I refresh this, I'm going to get Data Creator exactly in the middle of the page. The background is blue and the font is Arial.
Let's begin to implement our sketchpad. We'll do that in an external JavaScript file and we'll load this at the end of the body here, so it will be in a folder called js for JavaScript and we'll call it sketchpad.js. Let's close the script tag, and here we are going to instantiate our sketchpad in another script tag and we will define it to be on the sketchpad container like this. Let's close the script tag now and define this file. So first the js folder: I'm going to create a folder called js and inside of this folder, I'm going to create sketchpad.js. Make sure that it's really inside of the js folder and not at the same level as the other ones. And here we will use the class syntax. So let's define our Sketchpad class, and the constructor will have a container, in this case our div from earlier, and the size, let's say 400 by default.
To implement the sketchpad, we'll use an HTML canvas element and I will create it here as an attribute of this class by calling document.createElement with canvas like this. I will set the width of this canvas to be the size and the height of this canvas to be the size as well. And let's also give it a little bit of style. I will write this.canvas.style is equal to, and now I'm going to open a backtick to write a template literal, a string that works on multiple lines. And let's write here a background color of white and a box shadow with these properties, in black. I will close the backtick here and add the semicolon. Notice this backtick here; don't confuse it with the single quote, which won't work on multiple lines like this. Now let's go here and append this canvas to the container, to the div. And we can save the file, refresh the page, and here it is, our future sketchpad.
To draw on this canvas, we are going to use the canvas context, the 2D context, which we get like this, and I'm storing it here in this ctx attribute of the sketchpad. And we will have to add event listeners to detect the mouse actions. I will do this using a private method, add event listeners, here like this. So we will have to implement this method next. I will do it down here, and it's a private method. It has this hash sign in front. It means that it cannot be called from outside this class. And the first thing that we will do is add to our canvas an onmousedown event listener like this. We will detect the onmousedown action and we will figure out the coordinates. We will do this by first getting the rectangle of the canvas bounding area with this function. And then we will obtain the mouse coordinates by taking the clientX of this event minus the left side of this rectangle. In this way we get an X coordinate that is relative to the left side of the canvas. And the same thing for the Y coordinate, relative to the top part of the canvas. This is 0, 0.
Now notice here that for mouse I'm using an array syntax. So this will correspond to the X coordinate and this will correspond to the Y coordinate. We will be using this syntax everywhere because at some point we will start working with higher dimensions, and in this way we don't run out of letters to use. Now let's log this mouse location and see if this code works. Save the file, refresh the page, and if I click we get the coordinates here. Every time I click I see different coordinates. Closer to the top left corner we get coordinates very close to 0, and closer to the bottom right corner we get coordinates close to 400, 400, as we specified the size to be here. Now these are floating point numbers with incredibly high precision. We won't need any of that and integers are enough. So I'm just going to go here and add a rounding to these values. So let's just say Math.round like this and let's close this parenthesis as well. Refresh, and now when we are clicking we get these integer values that are easier to work with.
Let's remove this debugging and start creating the path that we draw on the sketchpad. So I will say here that this path is an array that contains the mouse. When we just click on the canvas that's what we get, one point added into this path array, and I will also mark here that the drawing has started like this. These will be attributes of the class, so let me go up here and define them. isDrawing is set to false by default and the path is empty. Now we are going to do the same thing here but for onmousemove, another event listener. I'm just going to copy everything and replace onmousedown with onmousemove like this, and we will process this only if we are drawing. We will be moving the mouse on top of the canvas all the time, but we only want to react if we are drawing. So only if isDrawing, then we do something here, and we get the coordinates of the mouse same as last time, but here we add it to the path.
So we write here dot push with this mouse location, and we know that isDrawing is true, otherwise we wouldn't be here. So let's just remove this line and log here the length of our path. Let's also add an event listener here for onmouseup, and the only thing that we do here is set isDrawing to false again, and we don't actually need this parameter here anymore. We don't care where the mouse is released because the path has already been modified by the last mouse move. So I'm going to save this, refresh, and now when I click and drag, the number that you see there is the number of points that have been added to the path that I'm drawing. We can't see the path just yet, but you can see it growing every time I'm moving the mouse. And if I release the mouse now and press it again and start dragging, I create a new path again. What happens is that here path is re-initialized with just the mouse location on mouse down.
Now before we continue, part of this code here needs to be fixed, because this is essentially the same thing as this. Let me just remove this part from here and extract a function for getting this mouse location. It will be a private function. Let's call it get mouse, given an event parameter. I'm going to paste this code here and start to align it. And here we can just return these coordinates directly like so. Now with this get mouse function we can write here mouse is equal to this dot get mouse with the event parameter, and I can copy this for the top part here as well. Let's refresh, and we have the same functionality as before, but now the code is simpler and much easier to read. All we need to do now is draw on the canvas instead of logging these numbers here. So let me write here this redraw. This will redraw everything on the canvas. It's a method that we'll implement next. And I'm going to write it here like so.
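The event logic described so far (a single path, the isDrawing flag, and the extracted mouse-location helper) can be sketched as below. This is only an illustration under some assumptions: the class name PathTracker and the stand-in fakeCanvas object are hypothetical and are there so the sketch runs in Node.js without a browser. In the course the same logic lives inside the Sketchpad class and uses a real canvas element.

```javascript
// Hypothetical stand-in for the sketchpad's event logic at this stage.
// `canvas` only needs getBoundingClientRect() and assignable on* handlers.
class PathTracker {
  constructor(canvas) {
    this.canvas = canvas;
    this.path = [];
    this.isDrawing = false;
    this.#addEventListeners();
  }

  #addEventListeners() {
    this.canvas.onmousedown = (evt) => {
      // A click starts a fresh path containing just the clicked point.
      this.path = [this.#getMouse(evt)];
      this.isDrawing = true;
    };
    this.canvas.onmousemove = (evt) => {
      // Moves are ignored unless the mouse button is held down.
      if (this.isDrawing) {
        this.path.push(this.#getMouse(evt));
      }
    };
    this.canvas.onmouseup = () => {
      this.isDrawing = false;
    };
  }

  // Mouse location relative to the canvas top-left corner, as integers.
  #getMouse(evt) {
    const rect = this.canvas.getBoundingClientRect();
    return [
      Math.round(evt.clientX - rect.left),
      Math.round(evt.clientY - rect.top),
    ];
  }
}

// Simulated interaction with made-up coordinates.
const fakeCanvas = {
  getBoundingClientRect: () => ({ left: 100.5, top: 50.25 }),
};
const tracker = new PathTracker(fakeCanvas);
fakeCanvas.onmousemove({ clientX: 110, clientY: 60 }); // ignored, not drawing
fakeCanvas.onmousedown({ clientX: 220.9, clientY: 75.25 });
fakeCanvas.onmousemove({ clientX: 230.5, clientY: 80.25 });
fakeCanvas.onmouseup();
console.log(tracker.path); // [ [ 120, 25 ], [ 130, 30 ] ]
```

Points are stored as [x, y] arrays rather than objects with x and y fields, matching the array syntax used throughout the course.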
We will start with clearing the canvas using the clearRect method, starting at 0, 0, the top left corner, and going all the way to the canvas width and canvas height. And then we are going to draw our path. For that I'm going to implement a draw utility object. It will have a function called path where we can specify the context and this path, the one that we want to draw. Now let's create this draw utility object inside of its own file. I'm just going to import it here first. It's going to be called draw, and let's create it in the JavaScript folder. So inside of the JavaScript folder here we create draw.js. We initialize our object called draw like this and let's add the path function to it. It's going to have a context, a path and a default color set to black. So the stroke style is going to be set to the given color, black in this case. And let's set here a line width of maybe three. We begin the path and now we're going to move to the first point in the path. Now this moveTo typically has two parameters, but this path of zero has the X and Y inside of an array. And this syntax here spreads that array out into its two components. So we can call this moveTo specifying just one item here. And then we will iterate from one all the way to the path length like this and do a lineTo with the same syntax and path of i like this. And then we stroke. Let's save this, refresh, and now you can see the line appearing there. And if I release the mouse and click again, it starts over. So it's not good, we would like to draw multiple paths, but it's something.
Before we get to multiple paths there is one thing that I don't like very much. Maybe I can show you if I zoom in. Can you see that when drawing there are some corners appearing when I suddenly change direction? I'm not sure if you can notice what I mean. But for example this corner, this corner, this corner, they appeared out of the blue somehow. And this is because of how the line join is done.
And also the line cap ending here is straight. I would like it to be more round. It looks better like that. So we can change these things like so. Let's save this, refresh, and now the line cap is round. And when we change direction like that we also have a round corner point here. So those problems don't happen anymore. Let me zoom back out.
Okay, to draw multiple paths I'm going to add a new function here. Again a context, paths this time, and then again a default color set to black. What we will do is go through all the paths, and for each path we draw a path with these properties. So what we do here is reuse this other function from above to draw multiple paths instead. And in the sketchpad we are going to go up and rename this path to paths. Now this will be an array of arrays. And here we are going to say paths.push with an array containing one point. And here where we want to add a point to the path, we now want to add the point to the last path. So let's get the last path first. It's going to be the last one in this paths array like this. And then we are going to push the mouse into this last path. And our redraw method is going to call draw paths this time with paths. Let's save this, refresh, and there is an error here because I forgot to change this path into paths as well. I should have used this right click, rename symbol feature to rename it everywhere. That way you don't make such mistakes. Now refresh.
Okay, but it's not yet perfect. If I press this button here, this is how it would look on a mobile device. And this is just not great. To fix this I can go to creator.html. And here at the top in the head section I'm going to say meta name equals viewport. It's a viewport meta tag with a content equal to width equals device-width like this. I'm going to save this, refresh, and notice how the width has changed. Here at the top you can choose multiple devices if you want to try them out. But this one has a width of 375.
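As a rough sketch of the draw utility described above, the two functions could look like this. The names draw.path and draw.paths follow the transcript; the exact placement of the lineCap and lineJoin settings is an assumption, and fakeCtx is a hypothetical stand-in that records calls so the example runs in Node.js. In the browser you would pass the real 2D context from canvas.getContext("2d").

```javascript
const draw = {};

// Draw a single path, given as an array of [x, y] points.
draw.path = (ctx, path, color = "black") => {
  ctx.strokeStyle = color;
  ctx.lineWidth = 3;
  ctx.lineCap = "round";  // rounded line endings
  ctx.lineJoin = "round"; // rounded corners on sudden direction changes
  ctx.beginPath();
  ctx.moveTo(...path[0]); // spread [x, y] into the two arguments
  for (let i = 1; i < path.length; i++) {
    ctx.lineTo(...path[i]);
  }
  ctx.stroke();
};

// Draw several paths by reusing draw.path for each of them.
draw.paths = (ctx, paths, color = "black") => {
  for (const path of paths) {
    draw.path(ctx, path, color);
  }
};

// Stand-in context that records the calls it receives.
const calls = [];
const fakeCtx = {
  beginPath: () => calls.push("beginPath"),
  moveTo: (x, y) => calls.push(`moveTo ${x},${y}`),
  lineTo: (x, y) => calls.push(`lineTo ${x},${y}`),
  stroke: () => calls.push("stroke"),
};

const paths = [
  [[10, 10], [20, 15], [30, 10]],
  [[50, 50], [60, 60]],
];
draw.paths(fakeCtx, paths);
console.log(calls.length); // 9
```

Keeping paths as an array of arrays means a new stroke is just a new inner array, and adding a point only ever touches the last inner array.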
And my canvas here is a little bit too large. You can actually fix this by specifying here a comma and then maximum-scale, something like 0.9. In this case it's going to rescale everything a little bit downwards to 90%, and my canvas fits even on these smaller screens. Now we don't want users to zoom in and out on this page. And we can prevent that by going here and specifying user-scalable equals 0.
But there is a big problem here: this sketchpad just doesn't work. I'm trying to draw now and it doesn't work. The reason for that is that the event listeners for touch are different from the event listeners for the mouse. So we're going to go to our sketchpad, to add event listeners. And here I'm going to write ontouchstart, a new event listener. I'm going to get the location from the event touches. And I'm going to just call onmousedown with this. So we reuse the same code from here. But we get the location from the first touch, because multitouch is also possible on devices like this. Now we do the same thing here for ontouchmove. And the same here for ontouchend. Now if I save this and refresh, it also works in this mobile device mode. But some people have mentioned to me that it sometimes doesn't work on Apple devices. When you try to draw, it moves the page up and down. I'm not sure if this entirely fixes the problem, but you can go to the body and set overscroll-behavior to none. If somebody else knows how to make it even better than this, please let me know. Let's save this and I'm going to exit from this mode now.
Let's implement undo functionality. I'm going to go back to the sketchpad, at the top here. After we add this to the container, I will add a few more things, first a line break. And after that, a button, an undo button with an inner HTML set to undo. And let's append both of these to our container. Let's save this and refresh. And you can see the undo button here now. Let's add an event listener to it.
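The touch handling described in this step, where each touch handler forwards the first touch to the matching mouse handler, might be sketched like this. The function name addTouchListeners and the fakeCanvas object are hypothetical; they only exist so the forwarding can be exercised in Node.js. It relies on the fact that a touch object carries clientX and clientY just like a mouse event, which is why the mouse handlers can be reused unchanged.

```javascript
// Forward touch events on a canvas-like object to its mouse handlers.
function addTouchListeners(canvas) {
  canvas.ontouchstart = (evt) => {
    const loc = evt.touches[0]; // first finger only; other touches are ignored
    canvas.onmousedown(loc);
  };
  canvas.ontouchmove = (evt) => {
    const loc = evt.touches[0];
    canvas.onmousemove(loc);
  };
  canvas.ontouchend = () => {
    // No location is needed when the finger is lifted.
    canvas.onmouseup();
  };
}

// Stand-in canvas whose mouse handlers just log what they receive.
const log = [];
const fakeCanvas = {
  onmousedown: (e) => log.push(["down", e.clientX, e.clientY]),
  onmousemove: (e) => log.push(["move", e.clientX, e.clientY]),
  onmouseup: () => log.push(["up"]),
};
addTouchListeners(fakeCanvas);

fakeCanvas.ontouchstart({ touches: [{ clientX: 10, clientY: 20 }] });
fakeCanvas.ontouchmove({ touches: [{ clientX: 12, clientY: 24 }] });
fakeCanvas.ontouchend({ touches: [] });
console.log(JSON.stringify(log));
// [["down",10,20],["move",12,24],["up"]]
```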
I'm going to go here at the end of this function and say undoBtn, onclick. And we are going to pop from our paths and redraw everything. Refresh, draw some things, undo, and it works. But we shouldn't be able to undo things after this is empty. So I'm going to go here at the end of redraw and I will toggle the undo functionality. So if the length of this paths array is greater than zero, this undo button is going to be enabled. Otherwise this undo button is going to be disabled. Now if I refresh this and undo two times, this is a disabled button here.
We can make it look nicer. If we go to styles, I'm just going to add a general style for buttons and say I want the cursor to be pointer. Right now it's the default cursor that looks like an arrow, but the pointer is that kind of hand symbol and I like that over buttons. Let's remove the border, set the padding, and a slightly bigger height. A border radius, to make it a little bit curvy, and a dark blue color. Let's use navy, and a white color for the text. We're also going to add a different color for the hover state. I'm going to set the background color to medium blue. And let's add styles for the disabled state and also the disabled state on hover. Both of them are required. I will make this gray and then we can put the default cursor here. Save this, refresh, and now the button looks a little bit nicer. And when it's disabled, it looks like this and the cursor also changes. Now if I refresh this, the undo should be disabled here. So I'm actually going to go to the sketchpad and call this redraw here in the beginning. It's going to handle that as well. And it becomes enabled as I'm drawing something.
Now let's proceed to add the functionality that we need to create a data set. We have to record who the person is and have a button to advance to the next drawing. I'm going to go here to creator, and actually we also need to make this sketchpad invisible at the beginning while we ask people who they are.
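The undo behaviour just described (pop the last path, redraw, and enable the button only while there is something to undo) can be sketched as follows. The helper name makeUndo and the plain-object button are hypothetical, used so the logic runs in Node.js; in the course this code sits inside the Sketchpad class and undoBtn is a real button element.

```javascript
// Wire undo logic onto a button-like object.
// `redrawCanvas` is a hypothetical callback standing in for the canvas drawing.
function makeUndo(paths, undoBtn, redrawCanvas = () => {}) {
  const redraw = () => {
    redrawCanvas(paths);
    // Enabled only when there is at least one path left to remove.
    undoBtn.disabled = paths.length === 0;
  };
  undoBtn.onclick = () => {
    paths.pop(); // drop the most recent path
    redraw();
  };
  redraw(); // initial call so an empty sketchpad starts with undo disabled
  return redraw;
}

const paths = [
  [[0, 0], [5, 5]],
  [[20, 20], [25, 20]],
];
const undoBtn = {};
makeUndo(paths, undoBtn);
console.log(undoBtn.disabled); // false
undoBtn.onclick();
undoBtn.onclick();
console.log(paths.length, undoBtn.disabled); // 0 true
```

Calling redraw once at construction time is what makes the button start out disabled on a fresh page, as noted in the transcript.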
So let's start with that. I'm going to go here and say style visibility is hidden in the beginning. Let me just put this div on a new line there. And up here, I'm going to start to define the other objects that we need. An input field, I will call it student. It will be a text input field and let's have a placeholder here, type your name, like so. I'm calling it student because I sent this to my students to fill in. And a button to advance to another object, and for onclick let's set this to start. It will have a text just saying start in the beginning, but we will change that soon. You'll see. Now let's close this div, save the file, refresh the page, and now the sketchpad is invisible and we can type in here, and if we press start, nothing happens because this start function doesn't exist yet. There is one thing that we should consider at this stage though, and that is here in the head section: let's define another meta tag for the charset and say it should be UTF-8. Many people have names that have special characters, and also some people use emojis here as their name, and this will support that.
Now let's start to collect the data from those fields. I'm going to go here and write data, an object with three fields. The student, which is null by default. I'm also going to have a field for the session, and this is going to be a unique identifier. I'm going to use the Date and getTime for the session. And I'm also going to have an object here for the drawings. This is where we will store all the paths that people are drawing for each of the different drawings that we ask them to do. Now because my system is online and several students could theoretically be drawing at the same time, I have more complex code on the backend to make sure session IDs are unique. But so far this hasn't been needed; all timestamps are unique, and it's not a big surprise really given they have millisecond precision.
So I won't go into more details here and will keep it simple like this with just the timestamps. So let's implement our start function next. I will do it here at the bottom and say start, and check if there is a value provided in the student field, this input field from above. If it's empty, I'm going to say please type your name first in an alert, and I will return. Otherwise the code will continue here below and I will set the student from the data to be the value from this field. I will also make this input field disappear like this and I will make the sketchpad appear like this. Refresh, and if I press start, I'm getting this alert here. If I fill this in first and press start, I start to see the sketchpad, but this should be something else here.
So let's see what things we need to draw. I'm going to go up here and start by defining an index for which thing I'm going to draw from a list, and this list is going to have the labels: car, fish, house, tree, bicycle, guitar, pencil and clock, like this. We will have another field up here for instructions and I'm just going to use a span for this with ID instructions like so. And at the end of the start function here, I'm going to do a few things. I will get the label of the thing that we will draw first like this, and I will set the instructions in our HTML to be please draw a, followed by the label. And then we'll also modify the inner HTML of the advance button from start to next. I will also change the onclick event on the button to next, and this is going to be a new function that we will implement here. What we do inside it is increase the index, and we are going to copy the same things from here. We are going to get another label and we are going to update the instructions like this. Now before we increase this index, we are going to go up here and try to get the paths that were drawn using the sketchpad. If there are no paths, we warn and say draw something first, and return. So the code doesn't go further.
But if it goes here, we can get the label of the path that was drawn. So still this index. And here we can store in the data at the drawings of the given label, the paths coming from the sketchpad like this and the sketchpad will also need the function to reset itself to empty the paths and redraw everything. So let's implement this too, but before that, this is a problem here redefining this label. This is going to be the next label and I'm going to copy it here as well. Now in our sketchpad, I'm going to implement here our reset method, which is a public method. We are accessing it from outside and this is just going to do these things here actually. So I'm going to move them inside and let's call reset here this dot reset. So this should still work. Let's refresh, type something here, start. Please draw a car, let's draw something. Next please draw a fish and let's look at our data. So in the console here, I'm going to type data and we have collected student ASD, that's the name I entered, then session and then the drawings so far, only the car has two paths, one with 53 points and another 27 points. You can watch these individual points if you want, but I believe this works. Now there is a problem. If I'm going to go all the way to the end, we don't have functionality for that. So the clock is the last one and then please draw an undefined. So let's go back to creator HTML here and here we are going to update the instructions only if possible. So if the index is less than the labels length, then we do that. Otherwise we are pretty much done. So we say else sketchpad container visibility hidden, we hide it, maybe we are polite and we change this to thank you and the button is going to turn to say save. This is because we are going to save whatever paths we have drawn locally. In the original version, this was done on my server, but here I want you to be able to save these files on your local computer and the new function on click will be called save. 
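The collection flow described above, with the data object, the label index, the empty-drawing check and the end-of-list check, can be modelled without any page elements. This is a sketch under assumptions: createSession, getPaths and the returned helper names are hypothetical and not part of the course code, where the same logic is split across the start and next functions and reads the paths from the sketchpad.

```javascript
const labels = ["car", "fish", "house", "tree", "bicycle", "guitar", "pencil", "clock"];

// Hypothetical model of one data-collection session.
// `getPaths` stands in for reading the current paths from the sketchpad.
function createSession(student, getPaths) {
  const data = {
    student,
    session: new Date().getTime(), // millisecond timestamp used as session ID
    drawings: {},
  };
  let index = 0;

  return {
    data,
    isDone: () => index >= labels.length,
    instructions: () =>
      index < labels.length ? "Please draw a " + labels[index] : "Thank you!",
    // Store the current drawing under its label and advance.
    // Returns false when nothing was drawn or all labels are finished.
    next() {
      if (index >= labels.length) return false;
      const paths = getPaths();
      if (paths.length === 0) return false; // "draw something first"
      data.drawings[labels[index]] = paths;
      index++;
      return true;
    },
  };
}

let current = []; // pretend sketchpad content
const session = createSession("ASD", () => current);
console.log(session.instructions()); // Please draw a car
console.log(session.next());         // false (nothing drawn yet)

current = [[[10, 10], [20, 20]]];
while (!session.isDone()) session.next();
console.log(session.instructions()); // Thank you!
console.log(Object.keys(session.data.drawings).length); // 8
```

The guard on index is what prevents the "please draw an undefined" message once the last label, clock, has been handled.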
Let's implement this function as well. What we do here is completely hide the button that says save, and the instructions are going to say what to do with the file that you will download: take your downloaded file and place it alongside the others in the dataset. This will make more sense after the next lecture. Now to create and download the file from the browser, we can make an element here, an a element, and we can set the href attribute to data:text/plain with this UTF-8 charset, because we may have some UTF-8 characters there. And then we will use encodeURIComponent on the stringified version of the data. So we have to convert this into a string, and we will use JSON to stringify this data. So the data that we are collecting, the paths that people are drawing, are going to be saved as a JSON string in a file. JSON stands for JavaScript Object Notation, a standard format for storing data like what we have here. It's human-readable and most programming languages have support for it as well, like JavaScript. In JavaScript we use JSON.stringify to convert data into a JSON string and JSON.parse to convert the JSON string back into an object.
Now we are going to give a name to this file, and I want to use the session, the unique name that we generated from the timestamp. And now we are going to set the download attribute on this a element and specify the file name here as well. And finally we need to trigger this download action. So let me first make this element invisible, because I'm going to add it to the body of our document, I'm going to trigger the click, and then I'm going to quickly remove it from the document. I just don't know of a better way of doing this. If you know, let me know. And this is our save functionality. Let me save this, refresh, type a name here, start and then car, yeah, I'm not going to do it now. I already did it a while ago. Now house, tree, bicycle.
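The encoding part of the save step can be tried on its own, outside the browser. The sketch below builds the href value described above and then decodes it again to confirm that nothing is lost, including non-ASCII characters in the name. The function name toDataHref and the sample data (student name, session number, points) are made up for illustration. In the browser, this href would be set on the a element together with the download attribute before triggering the click.

```javascript
// Build a data URI holding the JSON-stringified data as UTF-8 plain text.
function toDataHref(data) {
  return (
    "data:text/plain;charset=utf-8," +
    encodeURIComponent(JSON.stringify(data))
  );
}

// Made-up example data in the same shape as the collected sessions.
const data = {
  student: "Zoë 🙂",
  session: 1663053145814,
  drawings: { car: [[[10, 20], [30, 40]]] },
};

const href = toDataHref(data);
const fileName = data.session + ".json"; // file named after the session ID

// Round trip: strip the prefix, decode, and parse back into an object.
const payload = href.slice(href.indexOf(",") + 1);
const restored = JSON.parse(decodeURIComponent(payload));

console.log(fileName);                                   // 1663053145814.json
console.log(restored.student === data.student);          // true
console.log(JSON.stringify(restored.drawings.car));      // [[[10,20],[30,40]]]
```

encodeURIComponent matters here because raw JSON contains characters such as quotes, spaces and brackets, plus any accented letters or emojis in the name, that are not safe to place directly inside a URI.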
I know some of you guys are much better artists than I am. Guitar, pencil and the clock. And now thank you, save, and when we press save, we get the file here. One small thing that I still want to show you: if I type the name here and access this sketchpad, if I draw here, it works fine. But if I move out of the page, release and go back in the page, this mouse up didn't trigger because it's attached to the canvas. And sometimes you might want that to happen. I mean, for example, if you want to draw maybe a line like a horizon and then your line just continues there, it's not a nice user experience. So to fix that, you can go to the sketchpad, and the event listener that you're going to add for onmouseup is going to be on the document instead, like this. And we have to do the same thing for the ontouchend event like this. Refresh, go here, release the mouse and then go back in. The onmouseup has triggered and now we can start to draw a new path. It would be nice to know if this system works on all devices, and I'd be grateful if you tested it for me. There's a spreadsheet in the description where you can fill in the device name, operating system and browser version, and report if anything is wrong. It would be great if you proposed solutions as well.
The data I collected from you is just a bunch of JSON files like the one we saw earlier. Each of these contains the session ID, student name and eight different drawings. At the moment there are about 500 files and you can get them from GitHub. The number is still growing and I'll be updating it from time to time. Now the drawings are grouped together by session and that's not convenient in the long run. So I'll teach you how to process this raw data into a more manageable form, a data set where each drawing is a sample. I'll also teach you how to make a data visualizer app, but the processing part we do at first doesn't really need an interface.
So I'll be working with Node.js, a back end environment for JavaScript. To follow along, install Node.js from the link in the description. To test if we have Node.js installed properly we're going to use a terminal. You can use the one here in Visual Studio Code if you want, but I'm old fashioned and I use the command prompt in Windows. We can type node --version and you should see it displayed there like so. We are going to be working more with the terminal today and we have to navigate to where our project is. So this is where we built the drawing app last time, and I'm going to go to this directory here by copying the path and here typing cd, space, and right click pastes it for me. And now we're in the same directory. If I type dir here we can see the web directory right here. To work with the data we need a place to store it, so I'm going to create a directory here with mkdir data, and you can see it appearing here in Visual Studio Code as well. If you prefer working with the user interface, that's also possible at this point. So this is the folder we just created. Inside it we are going to create a new folder called raw. This is going to contain the raw information collected from the drawing app. I'm going to go inside it and paste all the data I've collected from students over the past few months. There are almost 500 submissions, each with eight different drawings. It's really amazing. Now each of these files contains eight drawings and we're going to process these to create a data set where each sample is a drawing. I'll go back to the data directory here and create a new folder called dataset. And inside of this dataset folder I'm going to create two different folders. One will hold the JSON representation of our samples and the other one will be in image form. I want to visualize them as well. So a folder for JSON and a folder for images.
We'll do this processing using node.js and I'm going to create a directory here using the terminal node and move into this directory using cd node. In this directory we'll create our first script. Let's call it datasetgenerator.js. It's a JavaScript file. And in it I'm going to define some constants in an object called constants. I will write down the paths of all the folders we have created so far. So we have the data directory. We have the raw data directory inside of the data directory. We have the data set directory also inside of the data directory. And we have the JSON directory in the data set directory this time. And the image directory also in the data set directory. And here in the data set directory I also want to create a file where we'll store a kind of summary for the samples. Let's call this samples. It's going to be in the data set directory and it's going to be a JSON file. Samples.json like this. We're going to create this file next. I'm going to use the file system. And let's read the file names from the raw data directory like this. Then I'm going to create an array for the samples where we're going to store information about each sample. And I will initialize an ID with one. I want to give IDs to each individual sample. Now for each file name, let's call it FN for short. I'm going to extract the content from the raw directory concatenating with the name of the file like so. So this contains a string of characters, one for each of the files in that raw directory. From this string I'll extract the session information, the student information, and the drawings. By first parsing the string into JSON and then using the destructuring assignment here. Now we'll have a sample for each drawing. So I'm going to write here for label in drawings. The label is going to be what that drawing was of. I will push to the samples array, a new object with the ID, the label. And now I'm going to call student name what we previously had student. 
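The constants object described above can be sketched roughly like this. The exact key names and folder names (`dataset`, `json`, `img`) are assumptions, since the transcript only gives them as spoken words; the paths are relative to the node directory the script runs from.

```javascript
// A minimal sketch of the constants object. Key and folder names are
// assumptions based on the spoken description, not the author's exact code.
const constants = {};

constants.DATA_DIR = "../data";
constants.RAW_DIR = constants.DATA_DIR + "/raw";
constants.DATASET_DIR = constants.DATA_DIR + "/dataset";
constants.JSON_DIR = constants.DATASET_DIR + "/json";
constants.IMG_DIR = constants.DATASET_DIR + "/img";
// Summary file listing every sample (id, label, student).
constants.SAMPLES = constants.DATASET_DIR + "/samples.json";
```

Building each path from its parent keeps the folder layout in one place, so moving the data directory later only means changing `DATA_DIR`.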
And because student names are not necessarily unique, I'm going to also add here student_id, set to the session, like this. The sessions were made to be unique. Now let's increment this ID and move on to write the samples array into our samples file. I go down here and call writeFileSync on the samples file with a JSON stringified version of this array. Now to test this code, we're going to go to the terminal and write node datasetgenerator.js. And something happened. If we look here in the data and the dataset, there is a new file created called samples.json. And it looks like this. If you want this to look nicer, you can install a Visual Studio Code extension for JSON. Then you can press something like Ctrl Shift J and it will format it like this. But you can see now some of my drawings are here. There's an entry here for the car, an entry for the fish, an entry for the house, and so on, and then other users follow as well. Now this is just a summary. To generate the data associated with each sample, I'm going to go here, before moving on to the next item, and I'll call writeFileSync to write into the JSON directory. And the file will be named after the ID, dot json. Simple enough, we just have to stringify here the drawing of that specific label. Let's save this and rerun our script. In the terminal, you can simply press the up arrow and it will retype the last command. It took a little longer this time, but if we look inside of this JSON directory, you can see an entry for each sample. And if you click on it, you'll see all the data needed to make that drawing. It's the array containing all the paths, and each path contains points. Now for each of these entries, I also want to have an image representation. The images will go in this image directory, where we draw these paths, essentially this. These are the paths. Let's rename this to paths. And I'm going to extract here: paths is the drawings of label. So let's write the new code for doing that.
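The per-session to per-sample conversion described above can be sketched as a pure function. This is an illustration under assumptions: `flattenSessions` and `pathsById` are hypothetical names, and the real script reads each session with `fs.readFileSync` and writes results with `fs.writeFileSync` instead of working in memory.

```javascript
// Turn session objects ({session, student, drawings}) into one sample per
// drawing. Hypothetical helper; the course script does this inline while
// reading files from the raw directory.
function flattenSessions(sessions) {
  const samples = [];   // summary entries, as stored in samples.json
  const pathsById = {}; // drawing data, as stored in <id>.json
  let id = 1;
  for (const { session, student, drawings } of sessions) {
    for (const label in drawings) {
      samples.push({
        id,
        label,
        student_name: student,
        student_id: session // sessions are unique, names are not
      });
      pathsById[id] = drawings[label];
      id++;
    }
  }
  return { samples, pathsById };
}

// Example with one made-up session containing two drawings.
const exampleSessions = [
  {
    session: 1663053145814,
    student: "Radu",
    drawings: {
      car: [[[10, 10], [20, 20]]],
      fish: [[[5, 5], [6, 6], [7, 7]]]
    }
  }
];
const flattened = flattenSessions(exampleSessions);
```

In the actual generator, each `samples` entry ends up in the summary file and each `pathsById[id]` is stringified into its own JSON file.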
We'll make a function called generateImageFile and this will take the output file path in the image directory. The file will have the same name, but the extension will be PNG. And here we add the paths as well. Now we'll implement this generateImageFile function here at the bottom. It will have two parameters, the output file and the paths. And to draw them, we'll use a canvas and the draw paths function from our web application from last time. This one right here. So we're going to write draw.paths on the context with the paths. And we want to store these drawings as an image. So we can write here buffer is canvas.toBuffer, a PNG image, which we write to the output file like so. Now this won't work yet because this draw paths function needs to be referenced first. And we need to move it from here to a common place that both the web and our node scripts can access. So let's close these to avoid confusion. And here I'm going to create a new directory called common. And in it, I'm going to move this draw.js file from here. Now inside this file, for it to work in node.js, we need to export this. And we do that by typing module.exports is equal to draw. We want to export the entire object. And in our data set generator, all the way at the top, we have to include this file by typing draw is equal to require dot dot slash common slash draw.js. The dot dot here just means we go up a folder. So now draw exists, but we still need the canvas. And unlike in the browser, here we have to install the canvas separately. We go to the terminal and type npm install canvas. And now the node package manager is installing the canvas on our system. Now I basically installed a library, even though I said no libraries in the course intro. To make things worse, we also use, and will continue to use, the Math library. So I kind of lied, but the browser has the canvas there by default. And I think that we all take math and drawing functions for granted nowadays.
And if you don't, sorry, but I definitely won't teach you how to draw lines or how mathematical functions are implemented. Not in this course at least. You can see here in the node directory, some new files appeared and even a new folder. This package.json just says that the canvas was installed. This one gives more details, and this is where the binaries are. Now back in our data set generator at the top, we can require the createCanvas function from the canvas module and use it to create a canvas of 400 by 400, the same size we had in our drawing app. We can also get a reference to the drawing context, same as in the web version. And here at the bottom, we're using it to draw the paths on the canvas and then storing it as an image, but it's the same canvas for each of the samples. So I'm going to have to clear it before drawing the new paths. Now let me open this to show you what happens next. I'm going to go in data, data set, image, and in the terminal, let's type node datasetgenerator.js again. You can see it takes a while this time, and here our image files are being generated and the number is growing and growing and growing. It takes longer than generating just the JSON files. And if I click this up here, you can actually see what they look like. Okay, the script is done, but now there's a problem because we broke our drawing app from last time. Down here, where we have our draw.js included, we need to include it now from dot dot slash common. Save the file and let's try to test. You can see here, it says module is not defined. And that's because in draw.js, the browser doesn't know what module is. We can fix this by typing if typeof module is not undefined, then we do the export, which will make the node code work. Now I refresh and everything still works as before. We'll be using this kind of structure throughout the course. Let's apply the same to the constants right here. We're going to separate them into another file. I will require the file here.
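The shared-file pattern described above, one script that loads both in the browser and in Node.js, can be sketched as follows. The bodies of `draw.path` and `draw.paths` are assumptions about what the drawing app from the earlier part looks like; the part taken from the transcript is the `typeof module` guard at the end. `createRecordingContext` is a hypothetical stand-in for a canvas context, included only so the sketch runs without a browser or the `canvas` package.

```javascript
// common/draw.js style module (function bodies are assumed, not quoted).
const draw = {};

// Stroke one path, given as an array of [x, y] points.
draw.path = (ctx, path, color = "black") => {
  ctx.strokeStyle = color;
  ctx.lineWidth = 3;
  ctx.beginPath();
  ctx.moveTo(...path[0]);
  for (let i = 1; i < path.length; i++) {
    ctx.lineTo(...path[i]);
  }
  ctx.stroke();
};

// Stroke every path of a drawing.
draw.paths = (ctx, paths, color = "black") => {
  for (const path of paths) {
    draw.path(ctx, path, color);
  }
};

// Browsers have no `module`, so only export when running under Node.js.
if (typeof module !== "undefined") {
  module.exports = draw;
}

// Hypothetical fake context that records calls instead of drawing.
function createRecordingContext() {
  const calls = [];
  return {
    calls,
    beginPath: () => calls.push("beginPath"),
    moveTo: (x, y) => calls.push("moveTo " + x + "," + y),
    lineTo: (x, y) => calls.push("lineTo " + x + "," + y),
    stroke: () => calls.push("stroke")
  };
}
```

In the browser the file is loaded with a plain script tag and `draw` becomes a global; in Node.js the same file is loaded with `require`, which is why the export has to be conditional.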
It will be constants.js, and we will take out the constants here like so. And now I'm going to cut all this code from here. It's in my clipboard; save this file. And in common, I will add a new file called constants.js. I paste everything from earlier, and at the bottom we need to type: if the typeof module is not undefined, we export the constants. Now if we go back to the terminal and rerun the script, it works. But it's slow and it would be nice to have a progress indicator here. I'll implement one in a new file called utils. Let's write here utils.js and create the utils object. And in it, let's add a function called printProgress with two parameters. It will have a count and a max value. If the count reaches max, it's 100%. Now we'll use process.stdout to get the standard output. We clear the line and we move the cursor in the standard output back to zero. This will essentially delete the line in the console. Now we can calculate the percent. I'm going to implement the function for formatting a percent in just a second. So that will be formatPercent of the count divided by max. I just want it to look like a percent value, you'll see. And then we'll write to the standard output like so. First, how many out of the max there are. And then I'm going to concatenate this percent in parentheses like this. Now here I'm going to implement this formatPercent function, which just takes a value and returns the value multiplied by 100. And then I want to have two decimal precision here. I will also concatenate the percent symbol. Now let's export this object down here. If typeof module is not undefined, we add it to the exports. Save the file and let's include it next in our data set generator here at the top.
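The progress indicator described above can be sketched like this. Two things are additions for the sketch rather than part of the description: the `progressText` helper, which separates building the string from printing it, and the `isTTY` check, since `clearLine` and `cursorTo` are only available on `process.stdout` when it is attached to a terminal.

```javascript
const utils = {};

// 0.5 -> "50.00%"
utils.formatPercent = (n) => (n * 100).toFixed(2) + "%";

// Hypothetical helper: "count/max (percent)".
utils.progressText = (count, max) =>
  count + "/" + max + " (" + utils.formatPercent(count / max) + ")";

utils.printProgress = (count, max) => {
  if (process.stdout.isTTY) {
    // Overwrite the previous progress line in place.
    process.stdout.clearLine(0);
    process.stdout.cursorTo(0);
    process.stdout.write(utils.progressText(count, max));
  } else {
    // Output is piped to a file or another program: print one line per update.
    process.stdout.write(utils.progressText(count, max) + "\n");
  }
};

if (typeof module !== "undefined") {
  module.exports = utils;
}
```

Calling `utils.printProgress(id, fileNames.length * 8)` inside the generation loop would match the usage described in the video, assuming `fileNames` holds the names read from the raw directory.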
So copy this, utils, utils, and here I'm going to call printProgress according to this ID. So the ID is going to be the count, and the maximum value is the number of files in that directory multiplied by eight, because each file contains eight drawings. Let's save this, rerun the script, and now we have a nice progress indicator here. Now keep in mind that these images are already here, and what this script is doing is regenerating them over and over again every time we run it. I just did this to teach you about the file structure and the progress indicator. Okay, it's done. Now we're going to start building a web application that can display this data set. But before that we need to store it in a way that the browser can read it as well. I'm going to go here in common and create a folder called js_objects. This folder is going to contain files we use to communicate between our node scripts and the web apps we're going to make. I'm going to go to constants.js and add here a path to this js_objects directory like this, and here we're going to create the samples JavaScript file, similar to samples.json, but it will initialize a JavaScript object in it as well, you'll see. It goes inside of js_objects and it will be called samples.js this time. We write this file in the data set generator at the bottom by copying this, adding here samples_js, and instead of just stringifying here I'm going to add const samples is equal to, concatenate it with this string, and add a semicolon at the end. Now let's rerun the script. We can see the file now here. It's essentially the same as samples.json but with this beginning part right here and a semicolon at the end.
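The step of wrapping JSON in a JavaScript assignment, so a plain script tag can load it, can be sketched as below. `toJsObjectSource` is a hypothetical helper name; the course script builds the string inline before passing it to `fs.writeFileSync`.

```javascript
// Build the text of a .js file that defines a global constant holding `value`.
function toJsObjectSource(name, value) {
  return "const " + name + " = " + JSON.stringify(value) + ";";
}

const samplesSource = toJsObjectSource("samples", [{ id: 1, label: "car" }]);
// samplesSource is now: const samples = [{"id":1,"label":"car"}];
```

Because the file is ordinary JavaScript, the browser can load it from the local file system with a script tag, without fetching JSON from a server.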
Now this kind of structure I'm using here is not really standard and I won't go into much detail as to why I chose it for this course, but in short it's to avoid CORS. I thought a web server is too much trouble, and the VS Code Live Server extension is buggy with some of the things we'll do later, at least on my computer. But feel free to reorganize this if you have better ideas. We are going to create a viewer app for our data set here in the web part. Let's create a new file, viewer.html, and begin to write some basic HTML. In the head I will include this meta tag for supporting UTF characters, and the title will be Data Viewer. We'll also link the external style sheet from the same directory, styles.css. Let's close the head, and in the body I'm going to start by writing the title in an h1 tag, Data Viewer, and let's have here a div with an id of container. This will contain the data set. Let's close this div and include the samples JavaScript file we just created. We're going to need a few other JavaScript files from common. I'm just going to copy this two times, and here I'm going to include constants.js and here utils.js. Now here I'm going to open a script tag and let's just log those samples to see if we have them in the browser. Close the body and the html tag. I'm going to save this and open this viewer.html, and there they are in the console. Our samples, all 3968 of them. We can investigate them here: what they are, who made them and so on. Next I want to create a table with these samples so that each row holds all the samples from one specific user. I'm going to have to group these by the student ID. Let's go to utils.js and implement a groupBy function there. Here at the bottom let's type utils.groupBy with a given array and a given key. It's going to be a general function that can group by any key. In our case we'll use the student ID. Let's initialize the groups as an empty object, and for each object of the object array I'm going to get the value of that key.
If groups of that value is null, which in our case means the student ID was just encountered and we don't have any drawings from them yet, we initialize it with an empty array. Then we push the whole object to that student ID's group and return these groups. Now in viewer.html let's remove this log and first get the groups by grouping by the student ID, and let's log these ones now to see how it works. Save the file, refresh, and you can see the output is a little bit different this time. If we open it up I can see that for each student ID I have eight different samples. The car, fish, house, tree, bicycle. So these are per user, Miguel Fernandez in this case. Let's create the table next. So I'm going to loop over each student ID like this. I'm going to refer to the samples as the groups of this student ID, and let's take out the student name as well for clarity. I just take it from the first sample's student name like so. And let's create a function called createRow that, on a given container, in this case our div from above, will display a row with the student name on the left and then the samples one after the other towards the right. Let's close this for loop. We'll use this createRow function in other applications as well, so I'm going to write it in a different file. It's going to be a file called display.js in our JavaScript directory here. So let's create display.js, and the createRow function goes in here like this with the three parameters. First we create a row, which is going to be a simple div. We create the div and let's add the styles for it as well. We'll have to modify the CSS afterwards. I'm going to add a class for this row. Let's call it row and append this div to the container like so. I will continue now with the label, which is also going to be a div, and I'm going to set the HTML of this div to be the student name, the second parameter. Now also a class for the label.
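The grouping function described above can be sketched as follows, assuming the samples carry a `student_id` field as in the generated summary file. The two student names in the usage example come from the transcript; the IDs are made up.

```javascript
const utils = {};

// Group an array of objects by the value each one has under `key`.
utils.groupBy = (objArray, key) => {
  const groups = {};
  for (const obj of objArray) {
    const val = obj[key];
    if (groups[val] == null) {
      // First object seen with this key value.
      groups[val] = [];
    }
    groups[val].push(obj);
  }
  return groups;
};

// Usage: one group per student, each holding that student's samples.
const exampleSamples = [
  { id: 1, label: "car", student_name: "Radu", student_id: 100 },
  { id: 2, label: "fish", student_name: "Radu", student_id: 100 },
  { id: 3, label: "car", student_name: "Yasin", student_id: 200 }
];
const groups = utils.groupBy(exampleSamples, "student_id");
```

Note that object keys are strings, so when looping with `for (const studentId in groups)` the IDs come back as strings even if they were stored as numbers.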
Let's call this rowLabel, which we can control in CSS, and append the row label to the row. So our div from previously now has the label to begin with. And below we're going to loop through the samples and add each of them independently. For each sample of samples I will take out here just the ID and the label. They are really all that we need. And I'm going to create an image element next like so, and I will assign the source of this image to be the path of this image file. I'll take that from our constants, the image directory, and concatenate it with the ID dot png. That's where the file is. I'm also going to add a style for this image. Let's call this thumb. It will be thumbnail size. And append each of these images to our row. Let's close this createRow function, save the file, and if we refresh the page we can see something here. It's Radu and then some drawings here, and then another Radu; I actually drew twice. And Yasin and so on, but these don't look good without the CSS. I'm going to go to the bottom here and add the row using display flex, and the thumbnail will have a width of 10%, and the row label will be a little bit wider so that we can see the name of the person. Let's save this, refresh. Wow, it looks like the Finnish Prime Minister is here. Hello. I don't believe it's actually her. But who knows, in this day and age everyone learns coding. Okay, but we can do better. Long names like this don't really fit in. And also the names here don't really align with the center of the samples. I'm going to fix that by writing here align items center. And for the row label, I'm going to add an ellipsis. So text overflow set to ellipsis, I'm going to hide the overflow, and no wrap for the white space. Let's save this, refresh. Sorry Gustavo, but it's okay, when we watch this on a larger screen, all the text will appear there. Now I want these drawings to have a white background each.
So I'm going to go to display.js here and wrap them inside of a div, a sample container. And I'm going to add an ID to this div, we'll need it later. It's going to be sample, underscore, and the ID of the sample, so it's going to uniquely identify it. Also add a class to this, let's call it sampleContainer. And I'm also going to add the label, the true label of this object, what it is. So it's going to be another div, where we set the inner HTML to the label. And we append the label to this sample container first. Then, after we have the image, I'm also going to append it to the sample container. And below here, where we append to the row, we don't append the image anymore, we append the sample container. Okay, save the file, refresh the page, and it's worse than before. But that's okay, we'll fix it in CSS. I'm going to implement the class for the sample container first. We'll give it a white background, text align center, border radius 10%, because I want it to have round corners, and margin one pixel. Now, I'll explain this flex part a little bit later. Let me also modify this thumbnail here to be 100% this time. And the row label will have this property here. Let's save this, refresh. And what happens now is that horizontally, each of these eight different samples takes 10% of the space, and 20% of the space is taken by this label on the left. That's what this two here is for. And the other ones are just one, so that's what this one here is for. You can see that this bicycle doesn't fit very well here. And if I zoom in, others have issues as well. It's because they try to fit in the text here. I'm using a very small screen here for testing, so this is not really a big deal. This will be a desktop application, not a mobile app. But we can do a quick fix here by adding an overflow of hidden to the sample container. Save the file, refresh the page. And now the structure is good, but the text doesn't fit entirely.
But it's not a big deal, and at a normal zoom level and on a larger screen it's just fine. Now before we look a bit through these drawings, I have to talk a bit about flagged users. You see, some people made some drawings that I think might get me in trouble. I'm going to go to utils.js and add here who they are, flaggedUsers. And I'm going to paste here an array I created earlier that looks like this. These are their IDs. Now in display.js, I'm going to destructure here to take out the student ID as well. And then here where we have that image, I'm going to write: if the flagged users array includes this student ID, I will add the class to blur it. In CSS, we'll define this blur class where we just apply the blur filter with five pixels like this. Let's save this, refresh the page, and now we can have a look at the data. Let's make it bigger. Okay, my two drawings are here that I did in the beginning. And then wow. This might be Beau from freeCodeCamp. He also commented on YouTube using emojis. I got an idea from that. We'll see. Another me, awesome. Wow, this one is really nice. Oh my. This house is amazing. On text here, this is going to be tough. Wow, some of these drawings are... How long did you spend on this? This is funny. I think some of you thought that it says horse instead of house. I've noticed some of these problems. No big deal, it makes the data more challenging. Wow, this is a really nice car. Giving me mafia vibes. Okay, this is something. We'll have to talk about data cleaning at some point and how it improves the models. Wow. These are amazing. These are amazing. Look at that. You know who you are. Wow, very different looking tree here. Wow. And the Christmas tree, oh, this is fitting. It's soon Christmas when I'm filming this. And a couple more horses here where the house should be. I'm quite sure these are honest mistakes. Look at this, wow, basic coder, but not so basic drawings. I don't know why I'm thinking about Magikarp now.
Jin-Zon-Ning and Zhong-Nanago. As you can see, some of these drawings are really detailed and must have taken a really long time to draw. That makes our data slightly different from the Quick Draw data. They have a time limit. We also have an undo button to fix mistakes, so our drawings should be better on average. Think you can organize or style the page in a better way? Share your version with me and I'll showcase my favorites in a future video. We're going to implement functions to extract features from these samples in a file in common. We're going to call this file features.js and we'll add these functions to an object called features like so. The first function will get the path count from the given paths. It's a really simple one. We just return how many paths there are by outputting the length of the array. And another simple function will get the point count from the given paths. To do this, we're going to flatten the array of paths first. This just converts the array of paths into one big array with all of their points, and then we return points.length as before. We need to export this since we're working with node. But because we're going to use this in the web applications as well, I'm going to check if module exists before that. Let's save this file and move to the node directory, where we're going to create our feature extraction script. Let's create a new file here called featureextractor.js and begin by including the constants from common constants.js. And I'm going to also include the features file that we just created. Now I'm going to use the file system and begin by reading the samples. I'll do this in one line here by parsing whatever is inside the samples file. And then to extract the features, we are going to loop through all of the samples. And for each sample, we will get the paths by parsing whatever is inside the JSON file containing the paths, identified by the sample ID dot json.
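The two feature functions described above are small enough to sketch in full. The drawing format assumed here is the one from the generated JSON files: an array of paths, each path an array of `[x, y]` points.

```javascript
const features = {};

// Number of strokes in the drawing.
features.getPathCount = (paths) => paths.length;

// Total number of points across all strokes.
features.getPointCount = (paths) => {
  // flat() removes one level of nesting: paths -> points.
  const points = paths.flat();
  return points.length;
};

// Export only under Node.js so the same file also loads in the browser.
if (typeof module !== "undefined") {
  module.exports = features;
}

// Example drawing with two strokes of 2 and 3 points.
const examplePaths = [
  [[0, 0], [1, 1]],
  [[2, 2], [3, 3], [4, 4]]
];
```

`flat()` with its default depth of 1 is what keeps each `[x, y]` pair intact; a deeper flatten would count coordinates instead of points.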
I'm going to create the data point now, consisting of two feature values, the path count and the point count. But because we can have many features, I'm going to use an array here, so we don't need to worry about X, Y, Z and so on. So I'm going to type sample.point. I'm going to add this information to the sample and assign an array with getPathCount from the paths and getPointCount from the paths like so. Let me close this for loop. And before we can write this to a features file, I'm going to also mention here the names we want these features to have: path count, point count. And now I'm going to write all this information to a new file. We have to implement this constant soon, and I'm going to stringify a new object with the feature names and the samples. I just combine both of them in one file. Let's save this and define this constant next. So I'm going to constants, and under these samples here, I'm going to write features, also in the data set directory, another file called features.json. Let's save this file, go to our terminal and type node featureextractor.js. And this one ends quite quickly. I won't need the progress indicator for it, but I will go up here and write a log saying extracting features, so I know it's happening. And at the bottom, I will just output done. Now let's have a look in our data, data set, features to see what they look like. We can see the feature names as the first attribute and the samples, which is pretty much the same as samples.json, but with the point value here as well. Now strictly speaking, a feature file wouldn't contain things like this. They are kind of extra, maybe this ID as well. And that's okay. We can exclude them if we really need to. We can type here samples, remap them, so that given a sample s, we remap it to be an object containing just the point and just the label. Like this. If I rerun the script, we have just this now.
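The extraction loop and the trimmed-down feature file described above can be sketched as pure functions. Several things here are assumptions: the function names, the `loadPaths` callback standing in for reading and parsing each sample's JSON file, and the exact spelling of the feature names. Unlike the description, this sketch returns new sample objects instead of adding `point` to the existing ones.

```javascript
const getPathCount = (paths) => paths.length;
const getPointCount = (paths) => paths.flat().length;

// Attach a feature vector (`point`) to every sample.
// loadPaths(id) is a hypothetical callback returning that sample's paths.
function extractFeatures(samples, loadPaths) {
  const featureNames = ["Path Count", "Point Count"];
  const withPoints = samples.map((sample) => {
    const paths = loadPaths(sample.id);
    return {
      ...sample,
      point: [getPathCount(paths), getPointCount(paths)]
    };
  });
  return { featureNames, samples: withPoints };
}

// Keep only what a strict feature file needs: the point and the label.
function toMinimalFeatures({ featureNames, samples }) {
  return {
    featureNames,
    samples: samples.map((s) => ({ point: s.point, label: s.label }))
  };
}

// Example with in-memory data instead of files.
const pathsStore = { 1: [[[0, 0], [1, 1]], [[2, 2]]] };
const full = extractFeatures(
  [{ id: 1, label: "car", student_name: "Radu", student_id: 100 }],
  (id) => pathsStore[id]
);
const minimal = toMinimalFeatures(full);
```

Storing the features as an array rather than named fields means more features can be appended later without changing any code that consumes `point`.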
But for convenience, I'm also going to keep the previous version with all the data combined. It will help when building the web applications. I'm going to go to constants and here at the end, I will define a new file path for another JavaScript file, which will hold this information again as a JavaScript object. In the JavaScript file, let's save this. And in our feature extractor, at the end, I'm going to write in this new file features underscore JS, I'll start the template literal, so I'm using a back tick. In this way, I can write on multiple lines and I can also include the value coming from the feature names and the samples as JSON. So here, I'm including everything, not just the point and the label. Let me close this and the back tick and this function call, I'll save the file, rerun the script and now in JavaScript objects here, we can see features JS as well, which is the same thing from previously, but assigned to a features object. We'll update the web application first for viewing the data, our web viewer. And the first thing we do here is say that we want to use this features JS file instead of the samples JS. And since it has everything as the samples JS and more, we can take out the samples from there and our code here will still work. We can also take out the feature names now. From the features, global value and features JS, I'm going to save this. And if we refresh this, the code still works. But now if I'm going to open the console here and type feature names, they are here. And if I'm going to look at the samples, they contain some additional information like the feature values here in the point. These points are what we're going to visualize next using a Google chart. And yeah, I know I said no libraries, but this is a temporary thing, you'll see. We'll display the chart in a new div called chart container. And Google charts expect first here some options. I'm going to start to define these options in an object. 
The width, for example, 400 pixels, and the height. Now the horizontal axis title will come from feature names of zero. And the vertical axis title comes from feature names of one. I'm going to remove the legend because I think it takes too much space and my screen size here is limited. And I will load the most recent version of the corechart package like this. Now this load expects a callback. It's not going to happen instantly. And we can write that here, like so. And we can generate the data in a form that Google likes. So let's put here a new data table with two columns. One for the first feature value, a number, and we'll write here feature names of zero. And another column will hold another number, for feature names of one. These are not the actual values. The values go here in addRows, where we take all the samples and remap them to keep just the point value. This will work. We can create the chart now using the Google visualization scatter chart, where we call draw with the data and the options right here. Now if I save this and refresh the page, it takes a while, but eventually we have it here. And wow, we can see now some drawings have like 200 paths and 15,000 points. This is amazing. I don't think mine have anything like that. I'm really curious to see which ones these are in the data. And we'll get to that. But first, let's see if we can zoom into this place. I want to investigate even deeper, and by default this won't let me unless I go here and add a few more options, like explorer with actions dragToZoom. If I refresh, I can now zoom in by drawing this kind of region with the mouse. To zoom out, you need to add something more. It will happen on right click, and we can do that here with rightClickToReset. As far as I know, it's the only way possible. And once you zoom in, you can only zoom back out to the very beginning, which is a little bit inconvenient because maybe you want to zoom out only a little bit, but you can't.
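The chart options and data rows described above can be sketched as plain objects, which keeps the example runnable without loading Google Charts. The height of 400 is an assumption (the transcript only gives the width), and `buildScatterOptions` and `toChartRows` are hypothetical helper names; `hAxis`, `vAxis`, `legend` and `explorer.actions` are regular Google Charts configuration options.

```javascript
// Options for a Google Charts scatter chart, built from the feature names.
function buildScatterOptions(featureNames) {
  return {
    width: 400,
    height: 400, // assumed; only the width is stated explicitly
    hAxis: { title: featureNames[0] },
    vAxis: { title: featureNames[1] },
    legend: { position: "none" },
    // Drag a rectangle to zoom in, right click to reset the view.
    explorer: { actions: ["dragToZoom", "rightClickToReset"] }
  };
}

// Rows for addRows(): one [x, y] pair per sample.
function toChartRows(samples) {
  return samples.map((s) => s.point);
}

const scatterOptions = buildScatterOptions(["Path Count", "Point Count"]);
const chartRows = toChartRows([
  { point: [2, 30], label: "car" },
  { point: [5, 120], label: "fish" }
]);
```

In the page itself these values would be handed to the data table's `addRows` and to the chart's `draw(data, options)` inside the callback passed to the Google Charts loader.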
And another problem is I can't zoom in deep enough. I need to pass something more here, a maxZoomIn value of 0.01. This will be enough for our needs. And now I can zoom in very deep like so. We can also use different colors, one for each class. I'm going to go here and show you how to do that. We are going to add another column, which is going to be used for styling. It's going to be a string with the role of style. But because we now have three columns here and our point has just two values, we need to do something about it. So I'm going to generate here a new array and then I go inside it. The first thing I do is spread the point value of the sample into two different values. And then the third value will be a style that we will need to define. I'll define these in utils, one for each label. Let's save this, go to utils.js in common and implement the styles here at the top. Let's say the car should be gray, the fish red, the house yellow, the tree green, the bicycle cyan, the guitar blue, the pencil magenta, and the clock light gray. These are all primary colors and then dark gray, light gray; this should be okay. Let's save this, refresh. And now you see these different colors right here, wow. But one thing I really wanted to have here is transparency. It gives an idea of the density in different parts. Unfortunately, this wasn't as easy as I thought. You can't just put colors with alpha values here, it won't work. But what did work is to use a slightly different version of the Google chart, a different library which they call the Material charts, which supports transparency as well. To do that, you have to replace corechart here with scatter in our case, or you can add that one as well if you want to support both. And this bottom part right here will be a little bit different. Let me just comment it out for reference so you can see what changes. And here you type chart is google.charts.Scatter this time, so it's a little bit different.
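The style column described above needs each data row to carry a third value next to the two feature values. A minimal sketch, assuming the styles are stored as plain color strings keyed by label (the colors are the ones listed in the transcript; `toStyledRows` is a hypothetical helper name):

```javascript
// One color per class label, as listed in the video.
const styles = {
  car: "gray",
  fish: "red",
  house: "yellow",
  tree: "green",
  bicycle: "cyan",
  guitar: "blue",
  pencil: "magenta",
  clock: "lightgray"
};

// Rows for a data table with columns: number, number, and a string
// column declared with { type: "string", role: "style" }.
function toStyledRows(samples, styles) {
  return samples.map((s) => [...s.point, styles[s.label]]);
}

const styledRows = toStyledRows(
  [
    { point: [2, 30], label: "car" },
    { point: [5, 120], label: "fish" }
  ],
  styles
);
```

Spreading `s.point` keeps the row flat; passing the point array itself as the first element would give the table a nested array where it expects a number.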
And here we draw the data and now we need to convert these options in a new format. Luckily, Google has an option for this convert options from options to this new form that the new version of the charts expects. Now we save this refresh and it looks a little bit different and transparency is here. But unfortunately, a lot of things are not. I can't zoom in anymore, I don't have the colors there and there is no support for this. So I can either choose one chart variant or another one and I prefer the previous one actually. To get the best of both worlds and more, we'll use a different chart here, the one I've implemented. If you want to see how it's made, check out the video on my channel. Otherwise, you can skip that part. We will edit the code a bit later, but they are minor changes and you can handle it I think. Now get the chart code from the link in the description and let's add it to the project. I'm going to go here in the web and create a new folder, chart and here I'm going to add the chart code. We'll only need the JavaScript files here so I'm going to remove these other ones. So they are these files right here. Now we can remove the Google chart from here and replace it with our own script from Chart JS and we'll need to add the other two ones as well. The one for the chart graphics and the one for the math. Now here are the options I tried to simplify a little bit, all my charts are usually square in aspect ratio so I just put here size and the other ones I replaced with axis labels, just the feature names directly and then the styles I pass those from utils like so. And this part we can remove and define our chart using chart is equal to new chart on the chart container showing the samples with these options like so. I'm going to save this, refresh the page and this is our version. We can zoom in, we can drag. The transparency is not exactly what I wanted so I'm going to pass here to the options, this transparency and let's set it to a low value. 
It looks like there are a lot of points here overlapping and I need a low transparency to be able to see them, refresh and the data is really, really dense here. You can clearly see that now but we can do more stylistic changes. I'm going to go to utils and change these colors into an object. I'm going to add here for each of them a color attribute and that's because here we can also add a text field where I'm going to add emojis one by one. Now before we try this text let's see how the colors look and I think because the contrast is not so strong with the colors I want to set the transparency to 0.7. And to get the text appearing there the emojis we can type here icon text. So now we get these emojis here instead of the colors which are quite good because it means that we can see what these points represent without having a need for labels on the side. Labels make your eyes go always to the side to see what something is and colors on their own are problematic to many people who have trouble distinguishing them. I think this is a good idea but we could combine the colors with these shapes to make something more interesting so that's why there is support here also for the image. Where we generate the images for these different objects according to this text right here and the specified color I will generate these images by typing graphics generate images for these styles defined in utils. Now I refresh and you may see nothing here at first but if you're going to drag your mouse over this they will appear and now the colors of these different shapes are depending on the colors specified in utils right here. Now our chart supports another parameter here a callback function for what to do when selecting an item we'll use this to identify it in the table below. Let's call this function handle click and we'll implement it in display.js. 
At the bottom here, the handle click function, given a sample, is going to get the element by ID of sample and the sample ID with an underscore in between, so this is how we defined these IDs to be up here, and then it's going to add a class to this item called emphasize. Let's save this file and in style CSS I'm going to implement this emphasize class, where I just set the background color to yellow. Let's save everything, refresh, and now when we click on something like this car, the item should also be emphasized somewhere in this very long list. Aha, it took a while but here it is, so this is the car with the most points, Claudiu's car. I see it's because of all this shading. Claudiu, you really took drawing seriously here. But let's make it next so we don't have to scroll through this list; it's going to work automatically. We go back to display.js and after we emphasize we write scrollIntoView here for that element, with an auto behavior. You can also try smooth, but I like things to happen fast, and block center will make it scroll to be in the middle of the page. Let's close this, save the file, refresh, and now if we press the car we immediately jump to Claudiu's car, and if I go back up, let's check this house, and this is amazing, Claudiu has both items with the most points. And we also see a problem here: we shouldn't have two emphasized items anymore. Let's fix this. 
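As a rough sketch of the click handler described above, including the clearing of earlier highlights that the video turns to next: the doc parameter is an assumption added here so the function can be exercised outside a browser, and is not part of the original code. The element ID format (sample, underscore, ID) and the emphasize class name follow the transcript.

```javascript
// Sketch of the click handler. `doc` defaults to the browser's document;
// passing it in explicitly is an assumption made for testability.
function handleClick(sample, doc = document) {
  // Remove the highlight from anything that was selected earlier,
  // so only one item is emphasized at a time.
  [...doc.querySelectorAll(".emphasize")].forEach((e) =>
    e.classList.remove("emphasize")
  );

  // IDs were defined as "sample_" followed by the sample ID.
  const el = doc.getElementById("sample_" + sample.id);
  el.classList.add("emphasize");

  // Jump straight to the item and center it vertically on the page.
  el.scrollIntoView({ behavior: "auto", block: "center" });
}

// In the page this would be passed to the chart as its click callback,
// for example: handleClick(someSample);
```

Using behavior "smooth" instead of "auto" animates the scroll, which is the alternative mentioned in the video.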
I'm going to take all the emphasized items in an array like this, and then for each item in this array, for each element e, we are going to remove the emphasize class. Let's save, refresh. Now I click the car, now I go up and I click the house. Works. But I don't want to go up every time I need to select something new. Let's have the chart on the side and the other content on the left, and the chart will not move on scroll. I'll first go to style CSS and make our container have a width calculated as 100% minus the width of the chart, 400 pixels, like this. Make sure you have these spaces here, they're important. I'm going to save this, refresh, and if I zoom out a bit you can see the space here. This is where the chart is going to be. Back in CSS we can define the style for the chart container here: position will be fixed, right will be zero, top will be 50%, and this will just put its top part in the middle. I want the chart itself to be in the middle, so we translate by nothing horizontally but by minus 50% of its height vertically. Let's save this, refresh, and now if I press these items I don't need to scroll back to the chart anymore. I would also like to have selection happen the other way around, by clicking on these items. To do that we go to display JS at the top here and add our click handler to the sample container as well, like so. If we refresh and click this fish, it selects it but it also moves it here to the middle of the screen, and I don't want this behavior when I'm clicking in the table itself. I do want it for the chart, but not here. So let's specify a parameter here set to false. It's going to be do scroll, true by default, and only if it's true will it run this scroll into view section here. Let's save, refresh, and now clicking on these things emphasizes them as well, but not yet on the chart. For that we need to go back here and say chart select sample with the sample. We can save this, refresh, and now we can see where this fish is in the feature space. This is a nice fish by 
the way wow and James drew a Christmas tree I didn't think about doing something like that now there's actually one problem here that causes an error if I'm gonna click here in the empty space to select nothing this handle click now breaks because sample is null so we need to take care of this here at the top I'm going to do a quick check if sample is null then let's just de-emphasize everything just in case and I'm going to return this breaks because we are trying to access its ID here we can also de-select the sample by clicking on it again so we can check here if the element is already emphasized the one that we clicked on no matter in the chart or in the list we are going to remove the emphasize class from it we're also going to tell the chart to select nothing if we clicked on this from the list on the left and then we return we don't want to disturb the functionality when we're actually selecting something here which still needs this de-emphasize code from previously like so let's save this refresh and now if we click here no error if I'm gonna click on the house it's going to select it if I'm gonna click outside here it will de-select it both from here and from the list if I'll click this bicycle it will select it both here and there if I'm gonna click on it again it will de-select it and the same de-selection happens if we click on it again here on the chart so this is now the full chart functionality done let's test bigger I think we can have a little bit larger chart here so in viewer HTML I'm going to set this to maybe 500 save style CSS we also have to subtract here 500 and save let's refresh yeah I like this let's see what this tree is here it has so many paths okay Daniel here drew it so that it has many many many branches 200 something branches wow actually I think there's detail also here really nice let's see what else is here wow okay Nikon has these drawings here with a lot of content this will be tough wow such a nice car here so much detail I 
meant must have spent a lot of time on this Erminio's bicycle also has many points same kind of style here with the shading actually Nikon seems to use it too having a fast responsive system like this is a huge help when you inspect your data otherwise you'd have to do a lot of manual work like maybe get the ID somehow and look for it in the files it's slow and mistakes can happen try working smart all the time and avoid silly mistakes like that can you extract two other features from the samples and show them on the chart if you need inspiration you can watch my older video here it also teaches how to extract features from the pixels something will do a bit later now when you're done look at the structure this is any better useful features should group together samples of the same type share your code and screenshots with me and I'll include the best ones in a future video let's organize these imports here in viewer HTML we have at the top this features object containing information about the samples and their features now here we have the common code with the node js scripts the display part right here and these ones below are for the chart let's have the new script where we include the sketchpad for the sketchpad to work we also need to include the draw js file from common and we'll use the sketchpad as an input to draw something to be classified I'm gonna go up here and prepare a container for it let's say input container like so and at the very bottom I'm going to initialize a sketchpad on the input container and save the file let's refresh and there is our sketchpad right here and it works this is the benefit of working with components now let's add some styles for it I wanted to float this well over the data next to the chart in style CSS I'm gonna go at the bottom and define here a style for input container position fixed I'm gonna give it 500 pixels from the right that's how large the chart is and 50% from the top and we translate again not on the 
horizontal axis but by minus 50% vertically so it's centered on the screen to get that undo button in the middle I'll set the text align to center and let's give it also a margin of 10 pixels so it doesn't exactly touch the chart let's save this refresh and now it's here and when we scroll it stays in place but it's blocking our data I want to be able to hide it as well so in viewer HTML we'll need to add a control panel it's gonna be a div right here and the button inside it on click will toggle the input visible or not let's give it a name and this toggle input function I'm gonna implement in display JS at the bottom right here we will write toggle input if the input container display is none then we need to set it to be visible I'll set it to block otherwise I will set it to none let's save this refresh and the button is here and it works but let's put it up here over the chart there's a lot of empty space there in style CSS I'll add a style for our control panel position it to fixed write 0 top 0 I'm gonna give it the same width as the chart just so it aligns well and then text align to center so it's gonna be in the middle of that 500 pixel area and let's have it with the padding of 10 pixels refresh and it's right here now I want when this input is here to somehow hide what's in the background and I found a funny way to do this in viewer HTML below the sketchpad right here I'm going to give the canvas of the sketchpad style using CSS text I'm gonna append to its existing style it already has some things there an outline 10 pixels solid red for now and it looks like this but I'm gonna make it really big so really really big like this big and refresh you can see it's hiding everything there and if we change the color to transparent black it's gonna look like this and I think it's pretty nice it's only like that when it's on now when we draw something here I want to immediately show the features on the chart so we'll need that information here on update let's 
open the sketchpad and add here a second parameter on update which can be null let's store this callback function here as an attribute and in our redraw method at the very end here I'm going to check if this update callback exists and call it with all the paths let's save this and now in viewer HTML we can pass here a function on drawing update which I'm gonna implement right here what to do when the drawing updates with the paths and what we have to do is extract the features in the same way we did in the feature extractor here a node feature extractor we have this part right here where we're doing it let's paste it here and let's just define this as a constant and log it right here now if we save the file and refresh the page we're going to get an error here features get path count is not a function and the problem is that we have two objects called features this one here should be the one in common here it has the feature functions but it conflicts with the one from here in JS objects we also call features these samples that have also the feature values maybe this is not the best naming convention here but I think we can actually rename this one to feature functions everywhere so let's do it feature functions feature functions feature functions I will also rename the file to feature functions as well and we have to fix it in the feature extractor feature functions a rename symbol feature functions save the file and finally in viewer HTML let's separately include now feature functions and replace this as well save the file refresh the page and you can see here 0 0 because the sketchpad is redrawing itself on the beginning and the paths are empty so the path count is 0 and the point count is also 0 but if I'm going to start to draw something here you can see the first number is the path count and the second number is the point count how many points are being drawn if I'm going to start another path I have two and the point count is still going up and so on it also 
updates on undo we're back to 0 0 now let's close some of these files right here there are too many open right now and these folders right here let's show this point on the chart next and it's going to be a dynamic point that moves as we are drawing this is not a standard component charts have but we can add it to our own since we built it let's remove this log and say show dynamic point at this point location let's save this and in chart JS I'm going to go just below the constructor and add here show dynamic point as a public method given a point let's set an attribute to this value and redraw now this dynamic point is going to be here an attribute of the chart class initially it's going to be no and to draw it will go in this draw method at the end just before drawing the axes and if we have one of these dynamic points we'll first have to get the pixel location of this data point so we remap from the data bounds that's how the features are represented to the pixel bounds and here we can draw this point using our draw point function from the graphics file it's gonna be black for now let's save this refresh and there should be something at 0 0 but this chart starts now at 1 and 10 because that's apparently the smallest value for the path count and the point count we need to have at least one path to make it a drawing so if we drag this we should be able to see this dot right here that should be moving as we are drawing something so when I'm gonna start to draw it should immediately jump to this one location for the path count and then as the path grows it's gonna go up in this direction let's see yes and now if I'm gonna stop this path and create a new one it's gonna teleport to this right column here and continue going up from there but it's not very visible here and we can make it better you see before drawing the point in black here I'm going to draw another point the same location and it's gonna be a transparent white point with a very very very big size let's 
put here 10 million I basically want when I refresh to see this point on top of this faded background it's much more visible and the value needs to be so big because the chart can be zoomed in and I want that if you zoom in like this and then you go to some extreme corner this region doesn't run out so let's see now how it looks like when we are drawing much clearer what is happening there now if I'm gonna press toggle input it's still here and I want it gone above where we have our show dynamic input I'm just gonna copy it and right here hide dynamic input where we're gonna put dynamic point to null and we don't need this parameter here let's save this and in display JS when the input container is made invisible we also hide the dynamic point safe refresh and now when we press toggle input the data here becomes clear because that faded region goes away but I wanted to come back when I press the toggle input now it does appear again when we start drawing so if we can get this sketchpad to trigger an update it's gonna work let's save this and in sketchpad JS I'm gonna go here and implement the trigger update method where I just put this code from here like so and since we have now this method I'm going to write here trigger update so the logic doesn't change let's save this refresh look at that point creeping in here very very small movement there's a long way to go to 15,000 points yeah but it works now before we start to talk about classification we would need this data to look like something to have a structure and these features are just not very good it looks really mixed in my opinion so let's try to compute some other features I'm gonna go to common feature functions and let's calculate the width of the drawing for that I'm going to write a function called get width given the paths we take out the points as we did in the get point count function and now I'm going to focus just on the X coordinate by taking the zeroed value of the point to get the width of the 
drawing we'll need the minimum and the maximum the minimum can be found using the min function of the math library if we spread out this X array and the same thing for the maximum like so and the width will just be the maximum minus the minimum like this and since we have the width we can also get the height in a similar way they are really easy to understand features and usually they work well let me just copy this and this will be get height with the y here taking out p of 1 and the y here and here now we're going to need to extract these new features so we'll have to go to node feature extractor and change this from the path count and point count to the width and height also these names here need to be changed in viewer HTML we also need to change this because we need to change in so many places we need to find a better way to restructure this so that we don't get confused what we use for extracting what we use for displaying they should just point to the same resource so going back in feature functions I'm going to write here an object called in use which are the functions that will be in use and I will also add their names here how I want them to be referred to as so the name width and the function will be feature functions get width the name height feature functions get height notice we are not calling these functions here we are just passing them into this object now let's save this and in feature extractor js we can now take out those functions by looking at the feature functions in use and only focusing on the function from there and to calculate this sample point we can just do map for each of these functions we call the function on the paths like this now we don't need this code from here they do exactly the same but now with the new functions we have in use the code is also more general in the sense that you can pass any number of features making this a multi-dimensional point but I'll teach you about those later now let's save this file and regenerate 
the features in my terminal in the node directory I will type node feature extractor it will work also without the js and now in our dataset features we should see new values for the width and height and I think they're proper since our canvas is 400 pixels oh but the feature names are not good we have to handle these as well we go to feature extractor js and here for the names we just replace this with feature functions in use and we take out the name from there save the file regenerate the features I just press up and enter here and now the feature names are also correct to get them to appear on the website as well I'm gonna go to feature extractor js and just copy these two lines of code in viewer js right here this will be just point and we can remove this code we don't need to worry about the feature names because we just read them from this file right here now let's save this refresh and there they are now our node scripts and the web page are in sync and something is weird here isn't it look at this one drawing this one house it has a width of 900 something how can that be our canvas for drawing on the sketchpad was 400 maximum what is this place let's hide this input and uh-huh interesting erminio looks like this horizontal line somehow might extend out of the canvas it's possible that on some devices maybe if you make a quick line like that it still records the point I don't know but interesting and common I have to say in general when you collect data you have to expect outliers like this and I think there are more problems look this height also goes beyond the 400 which I expected to be here let's have a look at this highest tree here okay maybe let's not look at that tree one next to it okay so this is also above 400 of course when I'm drawing them here I'm drawing them on a 400 by 400 canvas but it looks like it might continue downwards like this it's strange maybe some of you can help me figure out what's going on there seems to be a clear border here 
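As a recap of the width and height features and the in-use list set up earlier in this section, here is a minimal sketch. The names featureFunctions, getWidth, getHeight and inUse follow the transcript; the use of flat() to collect the points and the example paths are assumptions made for illustration, with each path taken to be an array of [x, y] points.

```javascript
const featureFunctions = {};

// Width of the drawing's bounding box: max x minus min x over all points.
featureFunctions.getWidth = (paths) => {
  const points = paths.flat(); // assumption: paths is an array of point arrays
  const x = points.map((p) => p[0]);
  return Math.max(...x) - Math.min(...x);
};

// Height of the bounding box: the same idea on the y coordinate.
featureFunctions.getHeight = (paths) => {
  const points = paths.flat();
  const y = points.map((p) => p[1]);
  return Math.max(...y) - Math.min(...y);
};

// Single source of truth for which features are extracted and displayed.
// The functions are passed by reference here, not called.
featureFunctions.inUse = [
  { name: "width", function: featureFunctions.getWidth },
  { name: "height", function: featureFunctions.getHeight },
];

// Extraction then works for any number of features.
const functions = featureFunctions.inUse.map((f) => f.function);
const featureNames = featureFunctions.inUse.map((f) => f.name);

// Made-up drawing with two paths.
const paths = [
  [[10, 20], [110, 40]],
  [[50, 5], [60, 305]],
];
const point = functions.map((f) => f(paths));
console.log(featureNames, point);
```

Because these features only look at the extreme coordinates, a single stray point outside the 400 pixel canvas is enough to produce outliers like the ones seen on the chart.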
and I'm quite sure that this is the 400 mark because I guess on most devices the app should work yeah it's exactly on the 400 mark here and the height maximum height also I believe it's here yeah but it's actually very common for data to have problems in fact I'd be surprised that there were none I've seen things like this all the time like when working with location based data the most common issue there was to have coordinates switched but only in parts of the data another thing was that people use positive values in the southern hemisphere maybe that's where they live and they had a different convention anyway we have a few problematic samples and it's great we'll get to see how they affect look at all these pencils they're here because many people just draw pencils like a vertical line kind of thing which is nice and I'm quite sure that these here must be pencils drawn horizontally like that but there's even more structure here look at all these clocks in a line here and I'm quite sure this happens because many people draw clocks to have a square aspect ratio where the width is equal to the height guitars are also here probably because people draw them kind of like the pencil a vertical line but a little bit wide because of this part so they are similar to the pencil in a sense from their aspect ratio but a little bit wider a lot of trees appearing here as well nice we can definitely work with this let's see if our input still works for some reason that transparent overlay doesn't appear here it appears when we're drawing but not before let's debug real quick I'm going to log here our point save the file refresh the page it's minus infinity minus infinity because you can't calculate the width and height of nothing it's fine it can be like this not a problem now the most basic way of doing classification to figure out what this drawing is is to extract its features and then have a look in the feature space what's nearby in this case it's surrounded with trees so 
maybe we just classify it as the same thing that is nearby it that we learned from the data so we need to find the nearest sample relative to this point and we already have a function for that when we built the chart in math.js we have here the get nearest used to emphasize and select things on the chart and it also uses this distance function which is just the length of the hypotenuse in a triangle using the Pythagorean theorem so these are the two things that we need and instead of making this code modular in a way that the chart and what we do next will use the same I'm going to have a copy of it this time it's because I want the chart to be its own component and have these functions here so let's copy these in our utils at the end right here this math is going to be utils and we have to use it here as well and here as well otherwise I think it's the same let's have a quick look and remember how these work the get nearest function given a location and the set of points begins by initializing a minimum distance with the maximum possible number and then it will return an index the index of this point not the point itself we then loop through all of the points take out the point like this and calculate the distance from our given location to this point store it in D if this is less than the minimum distance we update this minimum distance because we found a better one and we keep a note of the index finally we return the index quite easy just the minimum search pretty much let's save this go back to our viewer and here we can remove this logging and instead of that attempt to classify this to get the label from the point what is this drawing and let's log this label our classify function I'm going to write it below right here for now and first we need to take out just the points from our samples so I'm going to use map from each sample the point now let's get the nearest the index using our get nearest function from utils like this and we can get the nearest sample 
from the samples at this index like so and we return the label from the sample let's save refresh and it already says car here for some reason let's try to draw something maybe a pencil because that's probably going to work yeah I'm really great at drawing a pencil but it seems to work it says it's a pencil because our point is here next to these other pencils and probably one of these is the closest I think this one we'll improve this chart to show the nearest neighbor soon but first let's display this somewhere on screen I'm going to make a place for it here inside the input container actually I'm going to type an ID predicted label container and let's close this ID I'm going to give it a quick style in CSS I want it to be white and relatively large relatively positioned let's save this and go back to viewer HTML at the bottom here instead of logging it in the console we are going to tell our predicted label container to set its inner HTML to a text it will be like is it concatenating with the label question mark make it a bit more funny refresh is it a car it says car when nothing is drawn but if we draw something it says now a house this is not really a house I wanted to draw a clock but okay now it's a clock let's undo back to the house here you can see it's quite sensitive these features are definitely not enough to make it reliable I mean I can do here anything I want and it's not going to change the classification or the aspect ratio this object can be anything depending on what I draw inside it just won't have a chance to classify it using these two features but it's a start and it kind of works try it out a fish seems to work because there's another fish next to this one let's update the chart so it looks nicer next I'm going to pass here to show dynamic point also the label and in chart js here and this dynamic point will have both pieces of information now it will have the point and the label so in this draw method we have to take out here the point and 
the label separately from this dynamic point, and I'm going to replace this with point, like this. Instead of a black dot here, I'm going to draw it using an image based on the predicted label, so we take the style from the label, the image from that, and place it at the pixel location with the draw image function from our graphics toolkit. Let's save this, refresh, and now you can see the point changes its appearance depending on what we are drawing here, apparently a fish. Let's make a line connecting to the nearest sample. We could figure out what the nearest sample is here again, but we already have it here, so I'm going to return the label and then also the nearest sample. A little bit redundant, but it's okay. We need to make sure that here, where we call classify, we take out both the label and the nearest sample, like this, and here in show dynamic point we can pass the nearest sample as well. In chart.js we can go up in the constructor and specify a field for this one, the nearest sample, set to null, and show dynamic point is going to have it here as well, where we can set it to the nearest sample attribute we just created. Now inside this draw function, in between these two, we can begin a new path. We don't have a function for drawing a line, so we just move to the pixel location with the context, and now we need to go to the nearest sample's point, but in pixel space, so we line to the point from the nearest sample, remapped from the data bounds to the pixel bounds, and let's stroke. This is not really a standard component and we're probably going to change things soon anyway, so I'm not worried about the code too much. Let's save this, refresh. Let's draw a car this time. It's, uh, now the bicycle, and now the fish, but let's see why it's a fish. It's a fish because it's mapped to this fish right here. If it had just a little bit more width, I think it would go next to this car right here. Let's see, let's try to draw very small. Yeah, it's a car but 
it's going to turn to a house if we draw a little bit more let's see okay it's the house now it's this fish let's try to get it to that car above there increase its height a little bit okay now let's get closer to this fish something's wrong can you see it let me make it bigger look at this it claims this fish is the nearest sample it can't be this car is definitely closer even this fish there's nothing wrong with our nearest neighbor search our chart is just lying to us bad chart it's because it's displaying the data to fit in this square aspect ratio here the width is squished like this and it gives the illusion that the car here is closer but it's not not in the data space and this is where people start to get confused they look at things like this and they don't understand why their machine learning methods work differently let's make our charts not squish or stretch the data and you'll understand better here where we get the minimum and maximum x and y i'm going to calculate a delta on x as the difference a delta on y as the difference and i'm going to store the maximum of these two and we'll recompute these maximum values here for the right will be minx plus delta and for the top will be miny plus delta like this let's save this refresh and now our chart looks like this with a lot of empty space here because the maximum delta the one from the width is applied on the height as well and that's why most charts don't do this we don't like empty space in general so stretching is default functionality here but it's very confusing if you don't normalize your data now if i bring my drawing back here and zoom in we can see that this fish is indeed the closest now that the squishing doesn't happen anymore but some kind of squishing or stretching is usually applied on the data and to understand why let's try to visualize the path count and point count from previously i'm going to go to feature functions here let's comment these out and type here name path count feature 
functions get path count, and the second one, point count, feature functions get point count. Save this, in our terminal re-extract the features, and look at that: all our data is essentially on one line. This path count feature isn't treated equally at all. We basically have just one-dimensional data here. Let's zoom in and see this better, and let's try drawing something. This is now our zero zero point. It's classified as a fish according to these features now, and if I start drawing I can see it going up very quickly, increasing the point count, and now a little bit to the right when the path count increased. But to draw anything you really generate a lot of points relative to the path count. And if we visualize the data as we did before (I'm going to keep this other variant for reference), you will see that this path count has almost no effect on the classification. Let's save, refresh, and let's zoom in here to see what happens. You can see these columns appearing again when zooming because of the squishing. Now if I draw something, I jump to this column, and now the point goes up. See how those lines appear, claiming that is the nearest sample instead of this? It is the nearest sample if you look at the range of this x-axis compared to the y-axis. And as you are drawing, you constantly get those lines to different columns there, because the path count just doesn't matter. But we want it to matter. In general we want to even the playing field. We want each of these features to have the same kind of significance, and that's why we need to do data scaling, to make these two features comparable. The width and height we calculated today are actually those of the bounding box, but some of the drawings are tilted like this, and I think we'd get a better result if we calculated them some other way. Think you can do it? Try also visualizing these rotated bounding boxes. That way you know your code is good. Share screenshots and your code with me and I'll include my favorite version in a 
future video. Data scaling is a step we do to give features the same importance when doing classification. Now, maybe we don't want exactly that, and we'll talk about adding weights to features later, but leveling the playing field is something you need to know how to do anyway. And there are two techniques people commonly use for this. The one I'll focus on is called normalization, where the values are remapped to be between zero and one. We'll normalize the data here, after we have the feature values, and we do that by calling a function we'll need to implement, the new utils normalize points, that takes the sample points. So from the samples, we remap them to extract just the point, like this. Now let's save this and go to utils in common and implement at the end of it the new normalize points function. Given the points, this function will change their values to be between zero and one. For that, we need to have here the minimum and maximum on each of the dimensions. Each feature corresponds to one dimension, and until now we've been using just two dimensions, but later I'll teach you how to use more, so let's prepare this function to be general enough. We'll initialize min and max to be the first point's values. I'm creating a new array here with the values so they don't accidentally get changed later, and same for the max. In the beginning they are just whatever the values of the first point are, but then we will loop through the remaining points, starting at one, and now we start to loop through the dimensions. Let's extract these dimensions up here so it's clearer how many there are. They are just going to be equal to the length of the first point, for example; all of them should be the same. Now back down here we can loop with another variable j through each of these dimensions and update min of j to be the minimum between the previous min of j and whatever the point of i and j is. Same thing for the max, but using the maximum function from the math library. And now we have our min and max. With them
we transform our data next. We iterate through all the points, starting at zero this time, and through all dimensions as before, and now our points of i and j will be modified to be between zero and one by just subtracting the minimum value and then dividing by the difference. But this is exactly what the inverse lerp function from our chart component is doing. I will call it here and implement a new version in utils, so between min and max we convert this given value into a percentage. Let's close this and quickly implement our inverse lerp function here above this one: return v minus a, divided by b minus a, like so. Save the file, and I think we forgot to include it in feature extraction, so let me go up here and add utils, require utils from common. Save the file, and in the terminal we type node feature extractor, and if we look in our data, dataset, features, now they are all values between zero and one. And if we refresh the page, you can see this reflected right here: zero and one on the x-axis, zero and one on the y-axis. And this is now the true distribution of the data; the chart is not tricking us anymore. But before we can test the classification, we need to normalize the features extracted from this drawing to the same space, using the same min max values computed earlier. Now my point just went somewhere very, very high up there. It's connected to this car right here, but I can't even fit it on screen anymore. To do that, we need to use these min max values from here, so let's return them from this function like this. Save this file, and in feature extractor we take them out here, and now at the end I'm going to write them in one of these JavaScript object files that we use to communicate with the interface. We need a new constant for that. We'll call it min max js, and it will just have min max is equal to, and we stringify the value of min max like this. Save this, and in constants js let's add a line for this at the end: constants min max js, also in the js objects, min max js. Save this file and
let's generate the file here in js objects. We need to use the terminal, extract the features again, and now the min max values are here in this file. We can now load them in our viewer html in the same way that we load our features up here, but with min max instead, and now it's a global variable. We'll need to use them to normalize this point from here before we attempt to classify it. So we go here and say utils normalize points. It accepts multiple points, but we can just wrap this one in an array like that; then it will work. But we need to update this normalize points function to support a given min max value, right? So back in utils, if we pass here also a min max value in addition to the points, we don't need to calculate them. So we just go here and check if min max exists, that parameter there; then min is min max dot min, max is min max dot max, and we close this. Else we have to recalculate these, like so. Now let's go to our web page, refresh, and when we draw something here, our point is here, right where it needs to be. Let's zoom in and see what happens locally. When we draw the next path, it will jump to the right column and then start to go upwards, but you can see now it's always mapped as you'd expect. We don't get those lines connecting the columns anymore. Good. But let's revert to our width and height features; they were much better at prediction than this. I'll comment these out so they're here if we need to use them again, and now we need to re-extract the features in the terminal, refresh the page, and here they are, between zero and one, both of them. And now if I input my car drawing from earlier, you can see that it is a car, and if we zoom in here, it's not mapped to that fish anymore. It's mapped to this nearest car, because now the width and height are treated equally. There's still kind of a strange thing here though. Look how this data looks because of this one house from here. What would happen if we remove it? Sorry, Erminio. Let's find the id of this sample. I'm just gonna
drag it up here, and it's 3107. I'm going to go to feature extractor, and here, after we load the samples, I will say dot filter, the samples whose id is not 3107. Let's save this and in the terminal re-extract the features, refresh the page, and you can see the data looks quite different now with just one sample missing, Erminio's house from here. Now let's bring back my drawing from earlier, and it's a fish, because if we zoom in here, it's now mapped to this fish. The data distribution is different, all because one point was removed. But it was an outlier point, a problematic point so to speak. But I like Erminio's house, so I'm gonna bring it back. There it is. Normalization is very sensitive to outlier points. One way to deal with this is to automatically detect outliers and remove them. Another way is to use standardization, a different data scaling technique where instead of min and max we compute the mean and standard deviation and then remap each feature by subtracting the mean and dividing by the standard deviation. Standardization is less sensitive to outliers and it would work better in our case. Think you can implement it? Share your code with me, and the first to get it right will get a shout-out in a future video. A generalization of the nearest neighbor classifier is the k nearest neighbors classifier, where k is the number of neighbors that will play a role in the classification. We'll decide the class based on the majority. Let's try drawing something here. Oh come on, I tried my best. Let's see what's around it. There is this one tree here that is closest, a very tall pointy tree from montana 57, but apart from that, all these neighbors are pencils. Maybe this will work better if we use more neighbors here and consider the majority. Let's see. In viewer html we need to update our nearest neighbor search here to return more items. Let's go to utils, to the get nearest function, and add here a third parameter, k. We set it to one by default, so by default we'll return just the nearest
neighbor, but otherwise the k nearest neighbors. And let's remove this code; we'll write a more general one instead. I want to return the indices here, so let's take out objects from our data points, so that we take their value and their index and return a new object like this. So obj now contains the same thing as points, but as an object where the point is the value and the other attribute is the index. Next, we can sort these with the following callback function: we return the distance between our given location and the value of a, minus the distance between our location and the value of b. This will sort them so that the nearest ones are first, and then we can take out just the indices using the map function like this and return the first k indices. Now let's save this, and in viewer html here we actually get indices now, so there are many of them, and here the first one is the nearest one. Let's see if the code still works now; it should be exactly as before. Refresh, let's draw something. If we zoom in, yeah, seems to work, it finds the nearest. Let's change it a little bit, increase its width. Yeah, all good. But now let's find more neighbors, let's say maybe 10, and handle this new logic with multiple values. Let's figure out the nearest samples next. They are going to be coming from the indices, if we map them so that each index becomes the sample at that index. And we can get the labels of these samples by typing labels is nearest samples, where we take out the label. Now we want to count how many of these are trees, how many are cars, and so on. So I'm going to prepare an object for counts, and for each label of labels I'm going to update these counts. I will say counts of label is equal to: if counts of label is defined, then I'm going to increase it by one; otherwise I'm going to set it to one, since it's the first time we found that item. Then we want to figure out the majority, so let's see what is the maximum value of these counts. I'm going to set max equal to the maximum of the values of
counts. I'm getting the values, not the keys, here, and this is an array, so we can spread it and pass it to our math max function. And finally we can get the label. The label will be searched from the labels as the label l where counts of that label is max. Now we need to fix our code in several places. Like here, we can just keep this label as such, and let's return all nearest samples here, so let's put an s here, and we also need to make sure up here, where we classify, that we get nearest samples as well. We will also be passing nearest samples to show dynamic point. We will draw lines to all of them, you'll see. We do that inside of the chart. First we need to update this nearest sample in the constructor to nearest samples, and here nearest samples, nearest samples, nearest samples, nearest samples. We will use these in our draw method down here. Instead of drawing this one line, we are going to loop through all the samples of these nearest samples, and let's extract the point from here. So this is going to be this part from here, where we convert this point from the data space to the pixel space. Just need to make sure that we're passing here sample point, and now we can reuse this point down here, point, like so. All of these go inside of this for loop. We close the for loop and save the file. Refresh the page, let's draw something and zoom in a bit, and look at all these lines right here. And even though the nearest one is probably the tree or the fish, in this case it selects the clock, because we have one, two, three clocks, which are more than everything else. Let's change it a bit. Wow, this is really nice. It looks like it's walking there, kind of like a spider. Now I'm having fun watching this instead of teaching you things. Let's see how it looks on my pencil from earlier, and it's a pencil. Let's see why. I'm going to go up here where it is in the data space, and it's a pencil because of all these other pencils around it. But is this really better, or does it only work in this particular case? We'll see.
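The majority vote described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the course's actual file: the sample format (objects with a point array and a label), the Euclidean distance helper, and the tiny data set at the bottom are all made up for the example.

```javascript
// A minimal sketch of the k nearest neighbors vote described above.
// Assumption: each sample looks like { point: [x, y], label: "tree" }.

// Euclidean distance between two points of equal dimension (helper assumed here).
function distance(p1, p2) {
  let sqDist = 0;
  for (let i = 0; i < p1.length; i++) {
    sqDist += (p1[i] - p2[i]) ** 2;
  }
  return Math.sqrt(sqDist);
}

// Returns the indices of the k points closest to loc, nearest first.
function getNearest(loc, points, k = 1) {
  // Pair every point with its index so the index survives sorting.
  const obj = points.map((val, ind) => ({ ind, val }));
  const sorted = obj.sort(
    (a, b) => distance(loc, a.val) - distance(loc, b.val)
  );
  return sorted.map((o) => o.ind).slice(0, k);
}

// Predicts the majority label among the k nearest samples.
function classify(point, samples, k) {
  const samplePoints = samples.map((s) => s.point);
  const indices = getNearest(point, samplePoints, k);
  const nearestSamples = indices.map((i) => samples[i]);
  const labels = nearestSamples.map((s) => s.label);

  // Count how many times each label occurs among the neighbors.
  const counts = {};
  for (const label of labels) {
    counts[label] = counts[label] ? counts[label] + 1 : 1;
  }
  const max = Math.max(...Object.values(counts));

  // labels is ordered nearest first, so on a tie the label whose
  // first occurrence is nearest to the point is returned.
  const label = labels.find((l) => counts[l] === max);
  return { label, nearestSamples };
}

// Made-up data: one tree is the single nearest sample,
// but the wider neighborhood is mostly pencils.
const demoSamples = [
  { point: [0.5, 0.5], label: "tree" },
  { point: [0.6, 0.6], label: "pencil" },
  { point: [0.4, 0.65], label: "pencil" },
  { point: [0.55, 0.7], label: "pencil" },
];
const withOneNeighbor = classify([0.5, 0.52], demoSamples, 1);
const withThreeNeighbors = classify([0.5, 0.52], demoSamples, 3);
console.log(withOneNeighbor.label, withThreeNeighbors.label);
```

With this made-up data the single nearest sample is the tree, while the three nearest samples are one tree and two pencils, so the prediction changes once more neighbors get a vote, which is the situation from the drawing above.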
Try to calculate the probability for the classification to be correct based on the number of neighbors of the predicted class. Most libraries have support for this feature and it's a useful thing to have. Other variants of the nearest neighbor classifier exist as well, like a version weighted by distance or one that considers a fixed-size neighborhood. Think you can implement any of these? Share your code with me and I'll showcase my favorite implementations in a future video. To objectively evaluate the classifier, we'll need to test it on a lot of data. We do this by splitting the data into a training set and a testing set. We classify all testing samples according to their nearest neighbors in the training set, but we know the correct labels for these already, so we can count how many classifications are correct and compute the accuracy. To follow along, make sure you have the code from last time or get my version from GitHub, and let's settle the score between these two. We'll split the data in feature extractor js. Before we start writing files, I'm going to log here that we will be generating splits. It's going to be a fast process; it's more like a comment for us to know what happens. And here let's decide the training amount, and I will set that to be 50% of the number of samples, and I will define an empty array for training and an empty array for testing. We loop through all the samples like this, and if we are less than the training amount, we add to the training. So the first half of the samples will be the training set and the second half will be the testing set. Now we're going to be writing these into files as well, same as what we did here with the features, both in the JSON file and the JavaScript object. Let's paste this here and rename for training. The samples here need to be the training array, and then training js. Let's say this object will be called training, and here I'm going to still refer to this attribute as samples but set the value of training. And let's copy this
again for testing. I need to be really careful here, because it's very common to make the mistake where you're testing on the training data or something like that. So let's rename this to testing and this here to testing; this will be testing js, this will be testing, and this is testing. Now we need to define all of these constants, so in common constants I will copy this features line, and write here training, training json, testing, testing json. And this features js here at the bottom I also duplicate, with training js, the training JavaScript object, and then testing js, the testing JavaScript file. Let's save this, and in our terminal we can type node feature extractor to get now this new data. In our dataset we can find training json and testing json. They both have the same format but the data is different. Let's load next these JavaScript objects for testing and training into our web application. I'm going up here and we'll copy this features js two times, once for training and once for testing, and below here we can take out those samples like so, from the training and from the testing. The files also have the feature names, but we have those already here. Now when we're testing, we pretend that we don't know the label. The labels are here; we just forget about them. So I'm going to loop through all of the test samples, and I'm going to store the value of the label in an attribute called truth and replace the label with a question mark. I'm storing the truth here because we'll need it when we calculate the accuracy, you'll see. But now let's divide this code into two, where we handle the training samples and testing samples separately. I'm going to group this and copy it below, and write here that these are going to be the training groups, coming from grouping the training samples. Let's copy these training groups here instead of groups, and here instead of groups. Now the same thing happens below, but with the testing groups coming from the testing samples, and let's copy here testing groups, testing
groups. Now if we scroll down, in the chart I want to visualize just the training data, the training samples, and in the classification, we need to be very careful here, we classify using only information from the training data, so training samples, and also here, this one, training samples. Let's save this and open the page. You can see the data looks different. It looks like Erminio's house is not part of the training data. It's still here, but we don't know the label for it. This is the case now for all testing samples in the second half of the page. This is the border line between the sets. Now before continuing, we need to talk about normalization one more time. You see these numbers here? They are coming because we normalized the whole data, and that's not proper. We have no idea what the testing set will be, so we should normalize only with the training data. In feature extractor, we go here where normalizing happens and we replace this with training. Then I'm going to copy this and normalize the testing, using here the min max from the training. This is proper, but it won't work unless we move this code below here, where training and testing exist. Let's save this, and in our terminal I will regenerate the features, refresh the page, and now we have this zero one here as expected. I won't be using this input so much anymore, so I'm going to disable it by default. In viewer html, under the sketchpad, we will say toggle input, so now if I save and refresh the page, it's off. And if we look for Erminio here and click on his house, something weird happens. I think we have an error, and it wants to display it on this chart with an image for that label, which we don't have. So let's go to utils in common, and I can't write here an attribute with the question mark, but I can using this other syntax, so we'll have a red question mark emoji here. Now let's refresh, click on the house again, and there's no error. But where is the house on the chart? It's actually where it should be. I'm going to zoom out and it's
there. Normalizing doesn't guarantee that your result will be in the zero one range, but most of the time it is there, especially if you have a large training set. Like if I click on any of the others here, it's probably going to be there. Next, what we'll do is classify all these unknown points at once. I'm going to go to viewer html, and all the way at the top here, where we don't know the label, we try to find it out using our classify function. We classify the test sample point. I'm destructuring here because the nearest samples also come with it, and then I'm going to set the test label to whatever comes from there, and I'm also going to set here an attribute for correct, which comes after comparing this label with the truth value from earlier. This will be useful when displaying the items; I want to use a different color. Let's go to js display js, up in this create row function. When we destructure here properties of the sample, let's add this correct attribute as well, and here where we are creating our sample container, I'm going to check if it's correct, and if it is, I'm going to style its background so it's light green. Let's save this, refresh, and look at this. Presumably this is where our test set starts. Can't be sure, because this could also be part of the test set with all of them wrong, but I remember it was this one. To be sure, we should add a subtitle here. Let's go to viewer html. Between these two pieces of code I'm going to write subtitle, creating an h2 tag with an inner html of testing. I will add this to the container as well. Let's save this, refresh, and now it's clear where it starts. Now if we click on some of these, they don't turn yellow anymore if they are these green ones, so we can fix that by going to style css. Inside of this emphasize, we can add here an important like this. This will make it appear even on top of this green color. Now I'm not going to count how many of these things are green; instead I'm going to calculate an accuracy. We'll go to viewer html and in
the control panel, after this toggle input button, let's have a statistics field. I'll just use a div for that, and in it we'll put some numbers that we calculate here. I'm going to count how many times we are correct and how many samples we have in total. Here I will just update the total count by one each time, and our correct count will grow depending on this correct value: if it is correct, it grows by one, otherwise it doesn't grow. Then below this I'm going to put inside of the statistics div an inner html starting with the bold accuracy and then a new line, where we are going to concatenate the correct count slash the total count, and I'm also going to put in parentheses the percentage. We have a function for that in utils already, like so. Let's save this, refresh, and here is our number. It could be bigger, even on my small testing page here. Let's go to style css at the bottom and add the field for these statistics, increasing the size and adding some padding. Okay, so 39.62, that's our accuracy when we use our k nearest neighbor classifier where the 10 nearest neighbors are considered. But this is really a parameter, and now we can compare if we extract it as k and move it up here at the top and set it to one. Let's see what the accuracy was without using multiple neighbors. It was actually much worse, so considering multiple neighbors was indeed a good idea; now we can say for sure. Let's try some other value, maybe 50. Wow, we got 43.6, almost 50%, with just these two basic features. Let's see how this input spider looks now with 50 lines here. Oh, creepy. We can see it better if we change the color to black for those lines. In chart JS let's set here ctx stroke style. I think gray is enough. I don't know why I like this so much. The accuracy is not amazing, but we're on the right track. It's much better than guessing. Or is it? What is the probability to guess correctly? Can you figure it out? Try also implementing a classifier that just guesses, and empirically test to see if you were right. Also,
there's a lot of duplicated code in feature extractor JS and viewer HTML. Can you find a better structure for it? Share it with me; I'll choose my favorite and refactor the code in that way. Decision boundaries are a useful way to understand the classifier. I'll show you how to generate decision boundary plots and how to display them on our chart. Now this spider is cool, but it's not very informative. I mean, to figure out why this is a house you would need to count all of these different things, and that's not very useful really. We can get more information from something called the decision boundary plot, and I'll teach you how to get to that. But first we need to refactor some of this code so we can do this evaluation also in our node environment. Let me close some of these things here, and in our node folder I'll create a new file, run evaluation JS, and let's begin by loading our constants and our utils. I'm just going to copy this and say utils. Now we're going to create a place to hold our classifiers; our KNN classifier will be in common, classifiers, KNN JS. Let's pretend we have it for now and just see what we would do with it if it was there. Sometimes it's good to think like that. Let's access the file system, and I'll put a log here for running classification. The first thing we need for classification is to get hold of our training samples. I'm just parsing the data in our training JSON file and I'm calling it training samples. What we would do next is instantiate our k nearest neighbor classifier as a new KNN on the training samples and a value of k, maybe 50. That's all we need. Then we test. We get our testing samples from the other JSON file like this and compute the accuracy, similar to what we did on the web page. I am going to initialize total count and correct count and go through all the samples of the testing samples, and for each of them I'm going to predict what the label is by calling KNN predict given this testing sample point. The correct count will increase if it's a
correct prediction, and the total count regardless. Then let's log in the console, similar to the web page, accuracy, colon, and then concatenate with correct count, total count, and format the percent using our utility function in parentheses. Okay, that's all we want in this file, but now let's implement this KNN class. We'll do it in common, and here we need to create a new folder for classifiers; eventually we'll have more than just the k nearest neighbor in it. We create a new file KNN JS and start defining our class. The constructor will take some samples and a k; it will store them, and the k. And we predict a given point. To implement this, I'm just going to take the code from viewer HTML, the one for our classification, paste it here and fix a few things. This training samples now is this samples, in both these places, so here and here. K is also this k, so I need to be careful with that. Otherwise we're fine. This is called point in both places, but the return is different: we are not returning just the label but also the nearest samples. It could be useful to have both of them, maybe even more data sometimes, like probabilities. So I'll keep this as such and change our run evaluation script accordingly by destructuring here label as predicted label, like so. Now we also need to remember in KNN.js to go down and export this object. We will use it also in the web, so we need to check if module exists there. And it uses get nearest from utils; because of that we have to require utils here, and because we use it on the web, I'm also going to check if utils doesn't already exist. Now in the terminal we will type node run evaluation js, and we get an accuracy, same as we did on the web page. No surprise there. Let's update the web page to use this new code that we wrote. I'm going to copy this and include the KNN classifier, and below here I will instantiate it using the training samples and k. Now here instead of classify we can just ask KNN to predict this point, and the same thing below for the
sketchpad, and we can remove this classify function completely. Let's refresh the page, and everything still works as before, but now the code is only in one place and we don't make silly mistakes like changing one thing in one and forgetting about the other. Refactoring like this is really important. Now we can go to this run evaluation, after printing the accuracy here, and I'll teach you how to generate a decision boundary plot. Let's log that we are starting to do that. And we'll need to use a canvas. We use the create canvas function to generate a canvas of 100 times 100, and let's get access to the drawing context and use it to draw our plot. It will be a pixel-based plot, where we take each individual pixel of this canvas and treat it as a feature between zero and one. We'll normalize it, and then we will color the pixel depending on the predicted value, you'll see. We loop first on the x-axis, pixel by pixel, and the same on the y-axis, and let's create a normalized point here, a point which will have the x component x divided by the canvas width, and the y component is y divided by the canvas height, but we put a one minus that, because eventually we'll put this plot on our chart here and it should start from the bottom and go upwards; otherwise it will be flipped like that. Now let's get the label from predicting the point and choose a color from our styles in utils according to this label, like so. Now we can set the fill style to this color and draw a small rectangle at the x y location with one and one width and height, like one pixel. I'm gonna close this and write this image into a file using a buffer. It's gonna be a png image. We write it in a place that we need to define the constant for, and let's log here that we are done. We just need to add this constant next, so in constants I'll put it here at the end: the decision boundary will be in our dataset directory, decision boundary dot png. Save the file, return to our terminal, and rerun the evaluation. You can see this
decision boundary takes some time, and it's done. We can see it here in the data, dataset, decision boundary, and Visual Studio Code can open it here as well. It's a small image; it's 100 times 100 pixels. Generating a larger one is possible, but it will take a long time, so we'll work with this for now and I'll show you higher resolution ones later. We now set this image as the background of our chart. We go to viewer html where we define our chart options here, and let's implement a new option which will be for the background, bg. It will be a new image, and I'm going to set its source to be that of the decision boundary plot. Our chart doesn't support this background yet, so I'm going to chart js to implement this feature. I'm going to take the background image out of the options here, store it as an attribute, and in our draw method, before drawing the samples here, I'm going to plot the image. I will first take the top left coordinate according to the data; it's the zero one point essentially, but we have to get the pixel value for it, so from the data bounds to the pixel bounds we can put here zero one. And the size of this image is going to be the size of the chart minus the margin on both sides, and then we have to also divide this according to how the transformation was done, how the scaling was done, using the second power. And then finally we can draw this background at the top left location, we can spread this out like so, and then size, size for the width and height; it's a square aspect ratio. Let's save this, open the page, and it actually works. The image will appear here as soon as my mouse hovers over it. I'm not waiting for the image to load, so hovering here redraws the chart after it has loaded, and I'm fine with this. You can see there are different colors on the background now, and they correspond to our samples, but zooming in, this image is really low resolution, so we could do with a better one. Also this smoothing effect that happens here can be disturbing a bit. We can remove it
actually, if we go to the chart constructor; after we get hold of this context, we can tell it to disable the smoothing by setting it to false. Let's save this, refresh, and now you can see these pixels here, sharp. But to understand things better, we'll need to look at an image with a higher resolution, and I've computed one in a few hours. It's 5,000 by 5,000 and it looks like this. And what it means is that actually we don't need to show the data anymore. Like if I go here in the draw method and comment out drawing the samples; actually, I think we can do without many things here. We don't need the samples or the hovered sample or the selected sample anymore, and here we don't even need this part. We just need to see where the point is. Let's refresh, and if I open this input and sketch something here, you can see the point here on the plot, and it's a fish because it's in the red area. But if this would go higher, it would instantly change to a bicycle, then to a clock, and then, let's see closer, quite a complex structure here. Reminds me of fractals or land masses; you could use this as a terrain generator. Now let's go even higher. So there it's a house, and then it's a tree, and so on. Much clearer in this way. We don't need to count how many nearest neighbors there are, how many labels, and so on. These colored regions tell us, and it's really interesting to look at how these different regions appear. Let me show you one for when k is set to one. Very different, isn't it? To understand better, let me add the points again, and now let's zoom in a bit. You can see this green patch here: every point inside it is closest to this tree, and this is the separating border between the tree and this pencil. If you know what the Voronoi diagram is, then these edges here are a subset of that. And if we go inside the data, where it's more complex, you can really see this pattern. Each of the training samples is in its own little region; sometimes multiple points make a bigger region like this. All right, you
now know about feature extraction, data scaling, classification and evaluation, and you have a lot of tools under your belt. Your task, if you choose to accept it, is to calculate the accuracy for all possible values of k and create a line chart. Conclude what the best value is and generate the high resolution decision boundary plot for it as well. Share these charts with me and I'll showcase the first correct answer in a future video. Let's review what we learned so far using Python and the scikit-learn library. Install these now to follow along. But first we still do a bit of JavaScript to prepare the data for Python. In JavaScript we use JSON a lot, like here where we are writing our features. It's possible to do this also in Python, but more common there is to use CSV files. So let's go to utils js and implement the function to convert our samples to CSV format. Let's call it to csv, and it will take as parameters the headers of the data (the names of the features) and the samples array. It will output a string, and let's initialize this with the headers separated by comma and then a new line, like this. We then loop through all the samples and we do the same thing for each sample, separating the feature values by a comma as well, and the new line at the end. Let's return this string, save the file, and in feature extractor js let's go where we are writing the training data and write new code for outputting it in CSV form. We write the file, and we'll have a separate constant for the CSV file. We output to csv our feature names but also the label. Notice here I'm sending as the first parameter a concatenation between the feature names and the label, so it will have three columns, not just two, and similarly for the second parameter I'm going to remap training to contain the point and the label of that point. Let's close this, copy this, and do the same thing for testing as well. Where we're writing the testing json file, beneath it I'm going to paste this and rename to testing csv, and training here to
testing as well. Now save the file and let's define these CSV constants. I'm just going to copy these training and testing JSON constants from here below and rename them to training CSV and testing CSV, and the extensions here will be csv and csv. Let's save the file, and in our terminal now we can run our feature extractor again. And if we look in the data, in dataset, two new files have appeared: testing.csv and training.csv. And if we open them, they look like this. This kind of format is very common when working in Python. Actually, even more common would be to replace the label names here with a number, an index for the label, but we'll leave it like this.

Now let me close these from here and let's start working with Python. To do that, I'm going to create a new folder, python, and inside this folder I will create the file called knn.py. We'll reimplement the k-nearest-neighbors classifier in Python, this time using libraries. We begin by opening the training CSV file from data for reading. Let's read the lines and let's print to see what form we get this in. I'm going to be testing in the terminal. Let me go out of this node folder and inside the python directory, and let's run the script by typing python knn.py. And we get this array where each item is a row represented as a string. The new line character is also there. Let's check to see how long this array is. Yeah, that looks about right.

Now let's parse this data into a more usable format. It's common to refer to the data as capital X in Python, like this, and to the labels as y. These are two empty arrays and we're going to populate them by looping through our lines, but I'm going to start at one because I want to skip the headers, like so. Now we are going to split by the comma and start appending values into X. We're going to append an array with the feature values, so we'll exclude the last entry here, which is the label. We also have to convert these values into floats; by default they're strings. So we can write float conversion of row of j, for j in range of the
length of the row minus one. We can close this, and then to y we append the label, the last value in this row after the split. But as I mentioned previously, it's more useful to have these as numbers, so I'm going to remap these using a classes object, you'll see, of row of minus one. This is a quick way to refer to the last item in Python. So this classes here is going to be an object that I'm going to define here at the top, and it's just going to say that for car use a value of zero, for fish use a value of one, for house use a value of two, for tree three, for bicycle four, for guitar five, pencil six, and clock seven. In Python, objects like this are called dictionaries.

Now if I save this and rerun the script, we get an error here, because car comes with a backslash n at the end. So let's fix this by stripping this string here. It removes white space. Let's save, restart the script, and no errors this time, but we just get the print from earlier. Let's remove this and print something else. Maybe we can print the value of the X array, but we could just print the first ten items, like this, so we don't overload the terminal. Let's save this, restart, and this is what we get. If we look in the data to compare, in the dataset training CSV, they do seem to match with the first ten values from here. Let's print also the y values here. Save, restart the script, and these look good as well, going from car to clock, which is a seven, and then back to zero, one, and so on.

Now let's remove this and do our k-nearest-neighbors classification. I will import from scikit-learn's neighbors module the KNeighborsClassifier and instantiate it like so, where we can pass how many neighbors we want to use. In the web app we used 50, so let's set n_neighbors here to 50. And there are many possible parameters here for all kinds of variants, but to get similar behavior as in our web app, I'm going to pass the brute force algorithm and uniform weights. We can then fit our data to this knn object and start testing. It's really that
easy. But we need to read the testing data as well, and instead of just writing this all over again, I'm going to extract this as a function. Let's call this read feature file, given a file path, like this. Let's replace this with the file path and indent everything inside. Indentation in Python is very important. And return here a tuple with X and y. Now here we can read our training data using the new function, so we have the X and y to pass to fit, and read the testing data next. I'm going to store it in the same variables here. We can then calculate the accuracy of our model with score of X and y, and let's print it in the terminal. I'm going to save this, rerun the script, and there it is.

Python can do much more than this. Try installing matplotlib and use it to display the feature values. Try customizing it the way you want. Can you make it display the decision boundaries as well? Share your code with me and I'll showcase it in a future video. Now the course will take a short break. Meanwhile, do your homework and reflect on what you learned. Next time we start phase two, where we bring the accuracy to a whole new level. See you guys!
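For readers following along without the video, the Python steps described above can be sketched roughly as follows. This is a minimal reconstruction from the narration, not the exact code shown on screen: the snake_case names `read_feature_file` and `knn_accuracy`, the `CLASSES` constant name, and the lazy import are choices made for this sketch. The `KNeighborsClassifier` parameters (`n_neighbors`, `algorithm="brute"`, `weights="uniform"`) are the ones mentioned in the narration.

```python
# Label-to-index mapping as listed in the narration.
CLASSES = {
    "car": 0, "fish": 1, "house": 2, "tree": 3,
    "bicycle": 4, "guitar": 5, "pencil": 6, "clock": 7,
}


def read_feature_file(file_path):
    """Return (X, y): feature rows as floats and labels as class indices."""
    with open(file_path) as f:
        lines = f.readlines()

    X, y = [], []
    for line in lines[1:]:  # start after the first line to skip the headers
        row = line.split(",")
        # Every entry except the last is a feature value; convert to float.
        X.append([float(row[j]) for j in range(len(row) - 1)])
        # The last entry is the label; strip() removes the trailing "\n".
        y.append(CLASSES[row[-1].strip()])
    return X, y


def knn_accuracy(train_path, test_path, k=50):
    """Fit k-nearest-neighbors on the training file, score on the testing file."""
    # Assumption for this sketch: imported lazily so that read_feature_file
    # can be used even when scikit-learn is not installed.
    from sklearn.neighbors import KNeighborsClassifier

    X, y = read_feature_file(train_path)
    knn = KNeighborsClassifier(n_neighbors=k, algorithm="brute", weights="uniform")
    knn.fit(X, y)

    # Reuse the same variables for the testing data, as in the narration.
    X, y = read_feature_file(test_path)
    return knn.score(X, y)
```

Calling something like `print(knn_accuracy("../data/dataset/training.csv", "../data/dataset/testing.csv"))` from the python folder would print the accuracy; those relative paths are a guess based on the folder names mentioned and may need adjusting.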
Info
Channel: freeCodeCamp.org
Views: 1,182,596
Id: vDDjtwQDw2k
Length: 231min 30sec (13890 seconds)
Published: Mon Apr 17 2023