Web Scraping With JavaScript (Puppeteer Tutorial)

Captions
Hey guys, how's it going? I'm back with another video, and today I decided to make one that some of you have been requesting for almost a year now: how to web scrape using JavaScript and a very famous library called Puppeteer. I do have a five-video series on this topic that I made right at the beginning of when I started posting, and I really liked that series, and the people who watched it liked it too. But as many of my subscribers pointed out, I had to remove two or three of the five videos, because I scraped websites that didn't allow scraping, and those companies obviously let me know, since I posted it publicly on YouTube. So in this tutorial I won't be scraping any website that I feel would ask me to take the video down. I'll either be scraping my own old website, this one over here (I don't post on it anymore, but it's a site I created in the past), or other websites that allow scraping and won't cause any trouble. Before we get into the tutorial, if you could leave a like and subscribe, I would massively appreciate it, because it helps me push my videos to more people. Let's get into it.

Okay, so Puppeteer is a very famous library, as you can see over here. It's used for many different reasons: not just scraping projects, but also gathering data and even testing code. We're going to go over the basics first and then implement solutions to more interesting things.

To get started, I opened up VS Code, and I already created a very simple project. It doesn't have anything in it, just a file called index.js with nothing inside. What I did is: I opened my terminal in VS Code, ran npm init -y, and it created the package.json file. Then I ran npm install puppeteer. Installing this package might take a bit, because it also installs something called Chromium, which is the web browser that is going to be used to scrape, but it isn't supposed to take very long. Once it's installed, you should see the puppeteer dependency in your package.json, and you can compare the version I'm using with yours.

Then I created the file index.js, where we're going to write all of our code. First, how do you set up a project to be able to scrape? We need to initialize the Puppeteer library: const puppeteer = require("puppeteer"). Then, down at the bottom, the usual format of a Puppeteer application is an anonymous, immediately invoked function: you write an async arrow function wrapped in parentheses and call it with a second pair of parentheses at the end. It looks weird if you're not familiar with it, but we're just creating an anonymous asynchronous function, and most of our code will run inside it. The reason it's asynchronous is that most of the functions you use with Puppeteer return a promise, which makes sense: when you're web scraping, you're telling a bot to wait for something to happen and then take some action, and that's what async/await does. You're telling your bot to wait for something to finish before it moves on, which simulates real-life behavior.

Inside this async function, the first thing we always do is initialize our browser, that is, tell our bot to open a browser: const browser = await puppeteer.launch(...). The launch function exists inside Puppeteer and, obviously, launches a browser. You'll notice that almost everything we write gets an await, so the bot has a recipe of steps: first launch the browser; when it finishes launching, open a page; go to a website; and so on. One important thing about launch is that there are two ways of launching a browser in Puppeteer: headless and non-headless. Headless means the bot runs without actually opening a visible browser on your computer. You can scrape without seeing the browser open and navigate, and that's what most people do. But for this tutorial I want to see what the bot is doing as it runs, so we set the headless property to false, and we'll be able to watch the browser open.

Next, I want to open a new tab in the browser as soon as it opens: const page = await browser.newPage(). This function just opens a new tab, and it's pretty simple. Now that we have a page, we can tell it to go somewhere. You can basically tell the bot to type a link into the tab and navigate to it: await page.goto(...). As you can see, Puppeteer's function names are very related to what they do (newPage, launch, goto), which in my opinion is one of the best benefits of using Puppeteer: it's a really well-designed library. For goto, we just pass the link to the website we want to visit. As the first example, I'm going to scrape my own website. By the way, all the information on it is outdated, as I mentioned; it's stuff I did back in the day, my YouTube channel, my old projects, and so on. You can check it out if you want, but it's all outdated, and for that reason I'll be scraping it. So we tell our bot to go to https://machadopedro.com. Once it's there, the page variable lets us grab information from inside the page, which is what leads to scraping data.
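Putting the steps so far together, the scaffold looks roughly like this. The flow (launch, new page, goto, close) follows the video; I've written the recipe as a function that takes the library object as a parameter so the control flow can be exercised with a stub. This is a sketch, not the exact code on screen.

```javascript
// The standard Puppeteer recipe: launch a browser, open a tab, navigate, close.
// `puppeteerLike` is anything exposing `launch` -- pass the real puppeteer
// module, or a stub when you just want to check the sequence of steps.
async function run(puppeteerLike, url) {
  const browser = await puppeteerLike.launch({ headless: false }); // visible browser
  const page = await browser.newPage(); // open a new tab
  await page.goto(url); // navigate the tab to the site
  // ...scraping actions go here...
  await browser.close(); // close so the bot doesn't hang
  return "done";
}

// Real usage:
// const puppeteer = require("puppeteer");
// run(puppeteer, "https://machadopedro.com");
```

Injecting the library object is just for illustration here; in the actual tutorial code the require sits at the top of index.js and the steps live directly inside the async IIFE.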
Before we get into scraping data, I want to show you something quick. One of the simplest things you can do, but also one of the most interesting, is taking a screenshot of the page. It's one of the first examples the documentation shows, and I think it's really cool because it shows you the power of a web scraper. We say await page.screenshot(...) and give it a path to save to; let's call it mywebsite.png (or .jpeg, you can choose the extension). So we go to the website, take a screenshot, and it gets saved in the project under that name.

One more thing: we need to tell our browser to close at the end, because we don't want to keep it open. We want the bot to run, do all its tasks, and then close. As you might expect from how friendly the function names are, it's just await browser.close().

To run the project, since it's a Node.js project, I just run node index.js, the name of the file I created, and it runs everything automatically. When I press Enter, it launches a browser on my computer (I'm not touching anything), goes to my website, takes a screenshot, and closes the browser. Let's check whether it worked: if I open the project folder, we now have the mywebsite.png file, and when I open it, it is exactly a screenshot of my website, which is amazing. I'm going to close this and delete the file, because I don't actually need the picture.
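The screenshot step, pulled out as a small helper. Here pageLike stands in for the Puppeteer page object so the call can be exercised with a stub; the file name follows the one used in the video. A sketch under those assumptions.

```javascript
// Save a screenshot of whatever `pageLike` currently shows.
// With a real Puppeteer page, `screenshot` writes the file and resolves;
// the extension in `path` (.png or .jpeg) picks the image format.
async function takeScreenshot(pageLike, path) {
  await pageLike.screenshot({ path });
  return path;
}

// Real usage inside the async recipe:
// await takeScreenshot(page, "mywebsite.png");
```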
Now for the basics of getting data: selecting tags and grabbing actual HTML from the page. To access HTML tags inside the page and grab information, you need to create a function that evaluates the page. By "evaluate" I mean it takes a look at the page and grabs the information you want when you call it. As an example, just to show the basics, let's try to query this piece of text over here; I just want to grab it and console-log it through the bot. First we create a function with a name that makes sense, let's say grabParagraph, and set it equal to await page.evaluate(...). The evaluate function takes a callback, and inside that callback we write everything we want to do when we evaluate the page.

The first thing I want is to grab whatever text is inside that paragraph. With web scraping, you need to be really good at looking at a website and understanding its HTML. I'm on Chrome, so I can use the developer tools to inspect the element and look at the structure, the skeleton, that this HTML page is made of. I select the text I want, and I can see it's a paragraph tag, and it doesn't seem to have any classes or IDs attached to it; it's just a p tag on its own. Normally it's very important to have classes and IDs so we can identify which specific thing on the page we're trying to get. But one thing I can do: this paragraph is inside a div with a particular class name. So I can tell my bot to look in the page for a div with that class name and then grab the only paragraph inside it, the p tag.

So I copy the class name, come over to Visual Studio Code, and write const pgTag = document. ... and here's the important part: when you're web scraping, you access elements on the page through the document object that exists in JavaScript. You can use all the different methods, like getElementById or getElementsByClassName, or you can use querySelector, which is the one I love using most, because you can grab any kind of element directly with it. querySelector takes one piece of information: a string containing the tag, the ID, or the class of the element you want. It grabs that element, and we assign it to our variable.

Technically we want the element with the class name I just copied from the HTML, because that class name identifies the whole div, and then to access the text inside the paragraph tag, we grab the div and then the paragraph inside it. An important detail: when an element has more than one class name (as you can see, this one has two), we unite them by putting a dot between them, and even a single class name needs a dot at the front. We don't just want the div with that class, we also want the paragraph inside it, so after the class selector I put a space and then p, meaning "the p tag inside this class." If you're not familiar with accessing elements like this, it's a very important topic, widely used in CSS, so I'd recommend playing around with it; it's going to be really useful when you're web scraping.

So now we've set pgTag equal to that tag, and once we have the tag, we can access information about it. For example, each HTML tag, like a paragraph tag, has a property called innerHTML, which represents the text, or whatever markup, inside it. We can also use innerText; it doesn't even autocomplete, but it's something you can use. I'm going to use innerHTML so we can see what it looks like, and I want to return the tag's innerHTML from this grabParagraph function. At the end, after creating the function, I just console-log its result: console.log(grabParagraph). My expectation is that the bot opens the page (no screenshot this time, I deleted that), evaluates the page, grabs the paragraph tag containing that text, assigns it to pgTag, and returns whatever HTML is inside it from the callback. Let's see if it works: I clear my terminal, run node index.js, it opens the browser, goes to the page, waits for it to load, and you can see it works perfectly: it closes the browser and console-logs the result.
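The callback handed to page.evaluate runs inside the browser, where document is available. To make the extraction logic easy to show (and check) outside a browser, here it's written as a function that takes the document as a parameter. The class name .intro-text is a placeholder, since the real class from my site isn't spelled out in the captions.

```javascript
// Grab the inner HTML of the <p> inside a div with a given class.
// ".intro-text" is a placeholder class name; in practice you'd use the one
// copied from DevTools (multiple classes joined with dots, e.g. ".a.b p").
function grabParagraphFrom(doc) {
  const pgTag = doc.querySelector(".intro-text p"); // div.intro-text -> its <p>
  return pgTag ? pgTag.innerHTML : null; // use .innerText for text only
}

// Real usage in the bot (the callback uses the browser's own `document`):
// const grabParagraph = await page.evaluate(() => {
//   return document.querySelector(".intro-text p").innerHTML;
// });
// console.log(grabParagraph);
```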
There's the little paragraph I wrote. But notice it includes some tags: the word "coding" is apparently in bold, so there's a b tag in the output. If we only want the text, without any HTML inside it, let's try the innerText property I mentioned and run it again to see the difference. It opens the same page, waits for it to load, and now the output doesn't include the b tag, because it's not looking for HTML, purely for text. So those are the basics of grabbing elements on a page. We're going to do more advanced stuff now, but I'd suggest playing around with grabbing different elements: links, things with IDs instead of classes. I am going to show you how to select many elements at the same time, for example a bunch of elements with the same class name, and how to deal with that, but for now just play around with selecting one element and come back to the video.

Okay, as a first example of grabbing a lot of information at once, I still want to use my website. If you come over to the About Me part of the page, it has a couple of lists (technologies and so on), which is good, because we can query those lists. I want to grab each item individually and console-log it from the bot. How exactly would we do this? Well, the first step of every web scraping project: analyze the page. If we open Inspect Element and look at what this list is made of, we can see a ul tag and a bunch of li tags, one for each item in the list.
That's good, because we can query all of them. To specify which ones we want, we can see there's a class called row backend, and inside it is the ul with all the li items. So, to query all of those items, we'll write grabTechnologies, since that's what we're grabbing, and instead of querySelector we use querySelectorAll. The important thing about querySelectorAll is that it grabs every element on the page matching the identifier you give it, which is what we want: all the items inside the class name row backend, and we do need to join the two class names with a dot, like .row.backend. Then we want the items inside the ul tag, because all of these items are inside a ul, and then each li inside that. Notice that I only put a space when I want to say "grab something inside that element": for row backend there's no space, because they're two classes on the same element; but to say "a ul inside this div" I put a space and then ul, and the same again for li. With querySelectorAll I'm grabbing all of the list items from that list, so I'll rename the variable to techTags, meaning the technology tags.

Right now this returns a list of HTML tags, but that's not what I want to return; I want a list of just the items. So I create an empty array called technologies, then loop over the list of tags with forEach: for each tag, I push tag.innerText into the technologies array. What happens is: it loops through every tag matching the selector and pushes its inner text into the array, so at the end we can just return technologies. Now when we console-log grabTechnologies, we should see the list of technologies I have on the backend. I run it, it waits for the page to load, analyzes it, and as you can see, it works perfectly.

We can even change it. These were only the backend technologies, because that's how I made my website: I separated backend and frontend. If you want the frontend ones, you can see the frontend technologies have a class name of row frontend instead of row backend, so I just change the selector to .row.frontend, save, run again, and it scrapes the list of frontend technologies instead. It works perfectly. This is a very easy way of acquiring a lot of information, and again, if this is your first time learning it, I recommend playing around with it. You can also play around with other websites, like Google or Facebook; they're great places to practice, just don't record it and post it on YouTube like I did, because then they'll probably ask you to take it down. When you feel comfortable with this syntax, remember the pattern: query the information, loop through it, push it into an array, and return it back so you can see it.
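The query-loop-push-return pattern, with the extraction logic again written against an injected document so it can be checked without a browser. The selector .row.backend ul li matches the class names shown in DevTools in the video.

```javascript
// Collect the inner text of every <li> under a <ul> inside an element
// carrying both classes "row" and "backend" (note: no space between the
// two class selectors, because they sit on the same element).
function grabTechnologiesFrom(doc) {
  const techTags = doc.querySelectorAll(".row.backend ul li");
  const technologies = [];
  techTags.forEach((tag) => {
    technologies.push(tag.innerText); // keep just the text, not the tag
  });
  return technologies;
}

// Real usage:
// const grabTechnologies = await page.evaluate(() => { /* same body, with document */ });
// console.log(grabTechnologies);
```

Swapping .row.backend for .row.frontend is the only change needed to scrape the other list, which is why keeping the selector in one place is handy.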
Next we're going to get into more action-based things you can do with the bot. Web scraping isn't just grabbing information. What I mean is, Puppeteer isn't only for web scraping: web scraping means grabbing data from a website, but Puppeteer can also be used, and is mainly used, for browser automation. You can actually tell the bot to do stuff on the website: type into inputs, click buttons, change things on the page, which is really cool, and I'm going to show you some clear examples of how to do it.

Okay, I've opened a website here (the link will be in the description) that I used in the previous series I made, and I'm going to use it again because it's a great place to practice and I feel it will be really helpful for you. It's called Quotes to Scrape, and it's basically a site full of different things you can use to practice scraping. For example, there's a login page, but you don't even need an account: you put anything in and it logs you in. It's a place to practice your web scraping and browser automation skills.

The first thing I want to try is scraping the names of the authors of all these quotes; actually, the author's name and the quote text, just to get started. Over in the code, I'll delete what we had before, since we won't use it anymore, and change the link to this website, quotes.toscrape.com. Now, the first step, as I've been teaching you: analyze the web page. I inspect the element and take a look at what's happening. We have a div called col-md-8 that contains all the quotes, which is perfect, because we've now identified the container of all the quotes. Inside it, each item has a separate div with the class quote. So if I querySelectorAll with the class quote as the identifier, it should return a list of each of these items, which is amazing.

Let's start by creating a function called grabQuotes, set to await page.evaluate(...), since we need to evaluate the page, and inside the callback we write everything that gets elements out of the page. Initially: const quotes = document.querySelectorAll(".quote"). Remember, with querySelectorAll we put the identifier of the element that appears many times; there are many elements with the class quote, so it selects all of them. Inside each of these elements we can see a few things: two spans and a div. The div is just for the tags that exist on each quote, and the spans are divided into, first, the quote itself, and second, information about the author.

So we use the same format as before. We create an empty array called quotesArray, then loop through all the tags, say forEach(quoteTag => ...), and for each quote tag we want some specific information. Remember, quoteTag is equal to the quote's div, and it will have two spans inside it, because that's the format of the website. First we grab the spans: const quoteInfo = quoteTag.querySelectorAll("span"). Note that it's quoteTag.querySelectorAll, not document.querySelectorAll. The reason is that we don't want all the spans on the whole website; we want the spans inside this specific tag. When we use document, we're looking at the whole page, but when we select from inside another tag, it only looks for elements inside that specific tag. And we can use span as the selector just like we'd use a class name or an ID.

Whatever comes back from this querySelectorAll, the first element will always be the quote, because as you can see each quote contains two spans and the first one is the quote, so const actualQuote = quoteInfo[0]. We also know the second element is the author: const actualAuthor = quoteInfo[1]. Now we can push this information into the quotes array: quotesArray.push(...) with an object containing two pieces of information. The first is the quote; and since actualQuote right now is just an HTML tag, we grab the text inside it with .innerText, as I've already taught you. The second is the author.

For the author there's a minor difference: if you look at the author span, it contains the name inside a small tag, plus an "(about)" link, and we really just want the name. So we need to dig one level further. This is a good example of finding something in the markup we can use to identify the information we want: before doing anything else with the author, we grab whatever is inside the small tag, const authorName = actualAuthor.querySelector("small"), and that returns the element we want. Then, instead of actualAuthor, we use authorName.innerText to get the actual name.

So now we're pushing into quotesArray an object containing a quote and an author, which is a good format for our project. After all of this we console-log grabQuotes, but first, inside the callback, we need to return quotesArray, since that's what we'll console-log (I had typed "return" wrong, now fixed). Now that this is pretty much done, let's run node index.js. It opens Chromium, closes almost immediately, it went super fast, and as you can see it grabbed the correct information: we have the quote and we have the author, which is amazing.

Just as an overview of what we did: we evaluated the page, because we need to do that; we selected all the quote elements on the page; we created an array to hold the results we want back; we looped through each quote element; and since we really just wanted the author name and the quote, we played around with the tags. We realized that each quote element contains two spans, one with the quote and one with the author, so we queried the spans inside the tag, knowing the first was the quote and the second was the author. Then we had to narrow the author name down even further, because it wasn't in the shape we wanted, so we selected the small element that exists in every author span and took its text. At the end, we pushed that information as an object into our quotes array and returned it back so we could finally console-log it.
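The per-quote extraction can be written as a function over the quote elements, so the span indexing and the small-tag lookup can be checked with plain stub objects. The property names (quote, author) follow the video; the element shapes mimic the site's markup, and this is a sketch of the evaluate callback's body, not a verbatim copy.

```javascript
// Given an array of quote elements (each exposing querySelectorAll("span")
// and, on the author span, querySelector("small")), build the result objects.
function buildQuotesArray(quoteTags) {
  const quotesArray = [];
  quoteTags.forEach((quoteTag) => {
    const quoteInfo = quoteTag.querySelectorAll("span"); // only spans inside this quote
    const actualQuote = quoteInfo[0]; // first span: the quote text
    const actualAuthor = quoteInfo[1]; // second span: "by <small>Name</small> (about)"
    const authorName = actualAuthor.querySelector("small"); // just the name
    quotesArray.push({
      quote: actualQuote.innerText,
      author: authorName.innerText,
    });
  });
  return quotesArray;
}

// Real usage:
// const grabQuotes = await page.evaluate(() => {
//   const quotes = document.querySelectorAll(".quote");
//   /* same loop as above, then */ return quotesArray;
// });
```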
that information as an object into our quotes array, and we returned it back so that we could finally console.log it. So this is basically what we wanted to do, and it looks pretty nice. I hope this was a good exercise for getting even better at web scraping this website. Now I'm going to teach you a bit more about actions we can take on this website. For example, I find it interesting that there's a Login button over here, and when you click on it, it brings you to this login page where you can input a username and a password and then try to log in. What I want to do is come to this page, click on this button, put in some information for the username and password, click on the login button, and log in to some sort of account, and I want the bot to do all of that on its own. So we'll come over here, I'll delete everything I just did, and I'll start by showing you a function that is very useful for this part of Puppeteer, where you actually perform actions. The function is the click function, and by the name you already know it should be the one used to click this login button, and that's exactly what it does. Let's inspect the element over here and take a look at how we can get this login link and access it. There's a bunch of stuff here: most importantly there's this row header box, then a div (apparently this col-md-4 one), then a p tag, and finally an a tag. It's pretty difficult, right? It's actually pretty hard to even figure out how to identify this thing specifically. We can't just say we want to grab the element by this class, because there might be other elements on this page with the same class name. So one way we can do it is to grab elements and identify them with very
specific attributes, and one attribute I see on this link that I don't see anywhere else on this web page is the fact that it links to the login page, which is exactly what we want: we want to click on something that links us to the login page. And since we're not actually scraping anything here, we're just telling the bot to perform certain actions, we don't need to evaluate the page; the evaluate function is mostly used for scraping, and when we want to do something like clicking on buttons, we can do it outside of that function. So over here we want to call a function from Puppeteer that allows us to click on elements: page.click. This function is pretty simple: you just tell the page to click on an element, and you pass the selector over here. So, like I was saying, how are we going to select this link? The best way is to tell it to select an a tag with this specific href. The way we do it is we say we want to select an a tag, and then to specify an attribute on that HTML tag, we open and close square brackets and put the name of the attribute, href, set equal to whatever value that attribute has, which in our case is /login. So it's going to look in the page for the element that is an a tag with the href equal to this, and it's going to click on it. We can test this out by running node index.js: it opens up, clicks really quickly, and that was a bit pointless because we couldn't really see it, right? So I'm going to comment out browser.close so that the browser doesn't close when it finishes all its tasks. I'll open this up and run it again, and as you can see it works perfectly: it automatically went to the first web page and clicked on
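The attribute-selector click described above can be sketched like this; the `clickLoginLink` wrapper name is just for illustration, not from the video:

```javascript
// CSS attribute selector targeting the login link, as described above:
// a[href="/login"] matches an <a> tag whose href attribute is exactly "/login".
const LOGIN_LINK_SELECTOR = 'a[href="/login"]';

// Illustrative wrapper: page.click waits for the first element matching
// the selector and clicks it.
const clickLoginLink = async (page) => {
  await page.click(LOGIN_LINK_SELECTOR);
};
```

Using a unique attribute like the `href` avoids the class-name ambiguity mentioned above: several elements may share a class, but only one link on the page points at `/login`.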
the link, and now it's on the login page. Now that we have this working, I'm going to kill the terminal, put browser.close back, and then I want to work with whatever exists inside this login page. The two pieces of information I want are the inputs and the button. To grab the inputs, we inspect the element and take a look at how we can identify them, and right off the bat I can see that the username input has a name of username, or even better, an id of username. So what I can do is select the element with the id #username, and this will be the selector for the username. The password, I'm going to guess, is similar: it has an id of password, so I can select it the same way. That's how we'll do it, but how are we actually going to type into these inputs? As you can imagine, Puppeteer has a function called page.type, and the type function is a bit different from the click one, because you first pass a selector, so we pass the username selector, and then you pass whatever text you want to put inside of it. As an example, let's write "pedrotech" over here, and we'll do the same for the password, let's say "password123", and see what happens. Let's not close the browser this time (comment out browser.close) and see if it works. When we run the code, the bot opens up, goes to the page, and automatically fills in the inputs with this information, which is really cool. But if you actually want visual confirmation of the typing, you can add a delay: on page.type, I can pass a delay in an options object, and give it a delay of, for example, 100, and if I do this for both of them, it will actually show
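A minimal sketch of the typing step, using the placeholder credentials from the video; the `fillLoginForm` wrapper name is illustrative, not from the video:

```javascript
// Typing into the login form, as described above. The "#username" and
// "#password" id selectors come from inspecting the page; the credentials
// are the placeholder values used in the video.
const fillLoginForm = async (page) => {
  // the optional { delay } option pauses (in milliseconds) between
  // keystrokes, so the bot visibly "types" like a human would
  await page.type("#username", "pedrotech", { delay: 100 });
  await page.type("#password", "password123", { delay: 100 });
};
```

Dropping the `{ delay: 100 }` option makes Puppeteer type instantly, which, as noted above, is usually what you want when speed matters more than looking human.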
the bot typing on the inputs. Let's take a look: it goes to the page and starts writing as if it were an actual person. This is good because it might help prevent websites from detecting a bot, since it's writing like a human, but then again, if you're trying to scrape a lot of information or automate a lot of things, going as fast as possible is probably the best idea. Now that we've written these two pieces of information into the inputs, I want to click on this button. Let's come over here, inspect it, and see if we can find anything. Right off the bat you can see it is an input of type submit, and it has a value of Login. I'm going to make a guess and say there aren't many inputs with the value of Login other than this one, because it would be kind of weird to have a button whose text says Login and not be the login button. So we're going to use this as our identifier. I'll say await page.click, and inside of here, with the same idea we used to identify the link by accessing an attribute, we do the same thing: we grab the input whose value attribute is equal to Login. And since I'm obviously using double quotes twice, I'll make the outer quotes single quotes and keep the one inside as double quotes. We'll save this and see if it works. I'll run my bot again and see if it logs in, and it did! I don't know if you realized, but it actually did. The reason I know it logged in is that this button now says Logout, because it's already logged in, and if I click the back button, you'll see that it logged in as pedrotech. So it is working, and this is kind of like the basics of it: you're now able to scrape a lot of
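Putting the whole section together, a hypothetical end-to-end flow might look like the sketch below. `SITE_URL` is a placeholder for the quotes site used in the video, and the `Promise.all` with `page.waitForNavigation()` is an addition beyond the video, to make sure the login page has finished loading before the bot starts typing:

```javascript
// Hypothetical end-to-end login flow combining the steps in this section.
const SITE_URL = "https://example.com"; // placeholder: the quotes site from the video

const loginFlow = async () => {
  // required lazily so this file can be loaded without puppeteer installed
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch({ headless: false }); // visible browser, as in the video
  const page = await browser.newPage();
  await page.goto(SITE_URL);

  // click the login link and wait for the navigation it triggers
  await Promise.all([
    page.waitForNavigation(),
    page.click('a[href="/login"]'),
  ]);

  // fill in the form; { delay } makes the typing visible
  await page.type("#username", "pedrotech", { delay: 100 });
  await page.type("#password", "password123", { delay: 100 });

  // submit by clicking the input whose value attribute is "Login";
  // single quotes outside let us use double quotes inside the selector
  await page.click('input[value="Login"]');

  await browser.close();
};

// loginFlow(); // uncomment after `npm install puppeteer` and setting SITE_URL
```
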
information, you're able to click on things, type into things, and do a lot of stuff if you want to. This was also a brief introduction to Puppeteer as a whole, and I really hope you were able to get a lot of value from it. If you enjoyed it, please leave a like down below and comment what you want to see next. I just wanted to make this video because I felt like my old series was incomplete, since three out of the five videos were missing. If you still want to check out the other videos, I'm going to leave a link, or just show them here in the outro so you can click on them. There's actually a pretty advanced one where I show you how to build a bot that beats a typing-speed website: it actually types faster than humanly possible, and the website doesn't detect it. If you want to check out that video, I'm going to link it here at the end. And yeah, that's basically it. I hope you enjoyed it; if you did, please leave a like down below and comment what you want to see next, and I'll see you guys next time.
Info
Channel: PedroTech
Views: 6,009
Keywords: computer science, crud, css, databases, javascript, learn reactjs, mysql, nodejs, programming, reactjs, reactjs tutorial, typescript, react js crash course, node js, express js, pedrotech, traversy media, traversymedia, clever programmer, tech with tim, freecodecamp, deved, pedro tech, puppeteer, learn puppeteer, puppeteer tutorial, web scraping, web scraping tutorial, browser automation, web scraping javascript, web scraping nodejs, nodejs web scraping, node js project, node js tutorial
Id: Sag-Hz9jJNg
Length: 43min 35sec (2615 seconds)
Published: Mon Oct 11 2021