How To Extract Data From Websites Using Chrome

Video Statistics and Information

Captions
All right, so in this one we're going to grab web pages and, using the browser's inspector, parse that data using array methods, convert it to arrays, and serialize it with JSON.stringify so we can use it in a machine learning or data processing project. We're also going to convert it to a CSV file, as you can see over here.

So let's start. Here we have freemusic.org, and we have a list of music. What we want is to have this list in a CSV file or in an array so we can better use it in other applications. So what we're going to do is press F12, or go to Inspect here, and select one of these rows. We can see that we have a div and the class names here. One cool thing is that you can reference the selected element in the console by typing dollar sign and zero ($0), and you can see that we have the selected element here.

So how do we go about selecting all of the songs? We're going to create a variable called songs, and I'm going to use document.querySelectorAll to select all the songs and put them in this variable. We can put element names, class names, and IDs here; I think that's it. For now we're going to select using the first class and see if we have all the items that we want. Okay, so as you can see, you can expand this, and if you hover over an item you can see that it gets selected in the page. We can see that we have 20 items in the list. Beautiful.

Now we want to change this NodeList to an array so we can use the array methods, map this data, and have it in a better form. In the later versions of JavaScript we can use the spread operator to convert this NodeList to an array. So what we can do is create an array and spread songs into it. This is going to evaluate here and assign it to the variable again.

So now that we have our songs array, we're going to start retrieving the information from it. To do that, we go to the element here and see where the data is, and we can see that it is in the second child of the parent.
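The selection step described above could be sketched roughly as follows. This is a minimal sketch, not the exact code typed in the video: the helper name selectAll and the selector ".some-class" are hypothetical, and the real class name depends on the page being inspected.

```javascript
// Spread the iterable NodeList returned by querySelectorAll into a real array,
// so that array methods such as map, filter and reduce become available.
// `root` is `document` in the browser console; the helper name is hypothetical.
function selectAll(root, selector) {
  return [...root.querySelectorAll(selector)];
}

// In the console (".some-class" is a placeholder, not the site's real class):
// const songs = selectAll(document, ".some-class");
// songs.length; // number of matched elements
```

Taking the root as a parameter is only a convenience so the helper can also be called on a single element (for example $0) to search inside it.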
We can expand this and see that the data sits in span elements. So what you can do is create another variable, which we're going to call songsData, grab songs, and use the map method. We're going to give it a function; this function accepts a single element, so a single song, and in this song we're going to go to song.children. To better illustrate this, let's first do this: it's not going to do much, but we go to songs, grab the first one, and get its children, and we can see the children elements of this one. Okay, so what we can do is grab the second one and call children on it as well, and as you can see, we have our data here.

So, going back to our map function, we're going to implicitly return the song's children at index one, then children again at index zero, and I think I need another one, and then innerText. I put this one wrong; it is supposed to be song. Okay, so we have an array of arrays with the first piece of data, the artist name. I'm going to repeat this for the other ones; we're going to have four. This is the first one, and the next ones are going to be one, two, three, and that's it. You press Enter, and you can see that we have our arrays and our data here.

So how can we retrieve this now? What we can do is use the JSON functions: we use the stringify function and pass it the songs data, and we get a string of our data. With the string, using for example VS Code, we go to another file and paste our string; first we need to copy it, so we copy it. What we can do now is grab, for example, these two characters right here and replace them with the same characters plus an Enter; you need to press Ctrl+Enter here and then Replace All. And we have our data ready to go. If we want to convert this back to an object, we can use JSON again, but this time we're
going to use parse. We're going to put it in a variable, songsJson for example, and press Enter. So I changed the first quote to a special quote in JavaScript, the backtick. This is the one that you use to put variables in a string, but this time we're using it because we have a little problem with the double quotes and the single quotes. So this is going to work. Okay.

So what can we do now to have this data in CSV form? For that I'm going to use the reduce function to reduce this data to a single string in the format of a CSV file. We're going to put it in songsCsv, and we grab songs (either one works, because they're the same) and call reduce. We give it the function and an initial state. For now this function is going to return nothing. The initial value, the header row of the comma-separated values in the CSV file, is going to be the artist, and we grab the rest from here, like the track. So we have our four headers as the initial state of the reduce function.

In the function we have the accumulator and the current value. What we do is grab the accumulator, which starts at the initial value, and add a new string to it. We're going to use the special JavaScript quotes, with the dollar sign and curly braces, and grab the first value here, current[0]. I put a comma after it, and at the start of the string we put these two characters here, which signify an Enter. Now you can copy this and repeat it four times, and we delete this one. Now we hit Enter, and as you can see, we have a string here.

What we can do with this string is copy it and go to, for example, VS Code. I'm going to replace the character here with an Enter; you hit Shift+Enter to do this. And we have our comma-separated values. We need to remove the string quotes, and that's it.

Now, another example is this one, which has a little bit
of a form, but you can see that it's not that structured. So we're going to start with this one. First of all, we clear the console. We start with our jokes variable here and use document.querySelectorAll again, and this time we go article.joke, close the quote, Enter, and we have our list of nodes here. We're going to go a little faster because we went slowly in the first one. We spread that into another variable, jokesArray, so dot dot dot jokes, and we have our array here. We're going to put the data in a new variable called jokesData, so jokesArray.map. Now this is going to look a little bit worse, because we need to go hunting for this data. The data is spread around the tree of elements here, so we need to hunt for it, more or less.

So we use the first trick that we learned: we grab the $0 element and go to children. We can see that we have a single child; it might be better to grab that one instead of the article, but we'll do it this way. We go to children again and see that we have a header, which we need because it's the title of the joke, and the content of the joke. So this one is the title of the joke, and this one is going to be the content, which is in the first one or the second one. So we have our two paths.

We go to the map function again. I'm going to name the parameter current joke and return an array with current and the path to the title, which was this one, so we grab this and it should be the same. Then I go current again and grab this one, which is going to be the body of the joke, and press Enter. And we have trouble here. What's the problem? The error here is that, because we selected all the articles, some articles are not jokes and don't have the body, but they do have children, so we are selecting ones that we don't want. But luckily we can showcase a cool little thing about JavaScript: it's called optional
chaining. So if we don't have this property, it's going to skip it instead of throwing an error. So, for example, here, when it doesn't find it, it puts undefined. To finish this section we use another array method, called filter, to filter out those that don't have a second value. So we have our filter here, and what we can do is grab the current one and say: keep it if the current one's second value is not equal to undefined. Press Enter, and we can see that we don't have those two that didn't have data. Okay, so the rest is the same as the other one.

All right, the last one of the video is going to be a quick one. We're going to grab all the images from Google Images here. One trick is to scroll down the page to load a bunch of images, and then go to the console. We go dogs equals, and you know the drill by now: this time we select the img elements. So we have all the images of the dogs here, and now we're going to extract the source of each image and put it in an array. So let's do that: dogsSrc, map, go into dog, dog.src. Oh, and I forgot to convert it to an array, so dogsSrc is going to be an array using the spread operator on dogs. Now we can use dogsSrc again to map it, and we have all the images. These are encoded in Base64. So, as a link, let's see if this works. We can see that it is working, and that's it; the rest is pretty simple. One trick here, if you want to parse Google Images, is that I think the first two are the logos of your account, so make sure to exclude those; the rest are images.

So that's about it. Let me know if you like it, and if you do, consider giving a like, that will help a lot, and subscribe if you like the content. And that's it.
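The map, optional chaining, filter, JSON, and reduce steps from the captions can be sketched end to end as below. This is a rough sketch under stated assumptions, not the code typed in the video: the el helper and the sample rows are hypothetical stand-ins for DOM elements so the snippet runs outside a browser, the child indexes follow the "second child, then its children" description, and the "album" and "genre" header names are assumed (only artist and track are named in the captions).

```javascript
// Hypothetical stand-in for a DOM element: only innerText and children are modeled.
// In the console, `songs` would instead be [...document.querySelectorAll(selector)].
const el = (innerText, children = []) => ({ innerText, children });

const songs = [
  el("", [el(""), el("", [el("Artist A"), el("Track 1"), el("Album X"), el("Rock")])]),
  el("", [el(""), el("", [el("Artist B"), el("Track 2"), el("Album Y"), el("Jazz")])]),
  el("", [el("")]), // a row without the data cell, like the articles that are not jokes
];

// Optional chaining: a missing node yields undefined instead of throwing a TypeError.
const cell = (song, i) => song.children[1]?.children[i]?.innerText;

const rows = songs
  // map: turn each element into an array of its four text values
  .map((song) => [cell(song, 0), cell(song, 1), cell(song, 2), cell(song, 3)])
  // filter: drop rows whose second value was not found
  .filter((row) => row[1] !== undefined);

// JSON round trip: stringify to copy the data out, parse to get arrays back.
const json = JSON.stringify(rows);
const parsed = JSON.parse(json);

// reduce: start from the header row and append one "\n"-prefixed line per record.
// Header names other than artist and track are assumptions.
const header = "artist,track,album,genre";
const csv = parsed.reduce(
  (acc, cur) => acc + `\n${cur[0]},${cur[1]},${cur[2]},${cur[3]}`,
  header
);

console.log(csv);
```

Note that this simple concatenation does not quote values that themselves contain commas or line breaks, so real data may need extra escaping before it is saved as a CSV file.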
Info
Channel: RamgenDeploy
Views: 167
Keywords: Tutorials, programming, computer science, javascript, javascript tutorial, javascript for beginners, learn javascript, data, js, js tutorial, artificial intelligence, deep learning, python
Id: -mZgTGkHjLU
Length: 12min 15sec (735 seconds)
Published: Sat Sep 11 2021