Web Scraping with Next.js 14+ | Puppeteer | Zustand | Server Actions, Tailwind and Export Data

Captions
Hi guys, welcome back to another video on Next.js 14. This one is a bit different from my previous videos, because in it I'm building an OLX scraper. Scraping means web scraping: without using any platform's APIs, the program automatically opens the pages of different websites, analyzes the data on each page, and returns it. In this OLX scraper I scrape the title, price, description, and the features listed on the product page. Let me copy the URL of any product on OLX, paste it here, and click Scrape. You can see it opens the page; it isn't using an API, it behaves like a real human accessing the website and checking the title, the price, the description, and the different features of this product on OLX. And this is what it returned: the title, price, description, all the features, and along with that the link, which we can click to verify which product was scraped. So let's try another product. We can also disable this browser window if we want the data scraped behind the scenes; it's only shown so you can watch how it analyzes the page. Once it completes, the result shows up automatically. Along with the scraping logic, in this video I'm going to build this UI. It's responsive: it works on mobile devices and on desktop as well. And one main feature: after you scrape the data, you can click Export and it exports everything that has been scraped. If I come back to my code, you can see the file name I gave it, my-data.json, and this is the exported format. With the procedure I'm going to show you, you can export the data in other formats too, CSV, Excel, or JSON; for now I've exported it as JSON. So this video covers all the major concepts you'd need to write a web scraper for any kind of data on any page.

All right, let's get started. I'm creating a new Next.js 14 project from scratch so I can explain everything step by step. I've opened an empty folder in VS Code, and I'll run npx create-next-app@latest . (the dot means don't create a new folder). I'll hit Enter and answer the prompts: TypeScript, yes; ESLint, yes; Tailwind CSS, yes, since that's what I'll use to build the responsive UI; src directory, no; App Router, yes, which is important because that's what Next.js 14 is all about; import alias, no. It's now installing all the dependencies required to run the Next.js project. I hope you have some understanding of how the Next.js 13/14 app directory works; if you don't and want to learn everything about Next.js 13 and 14, I've created a lot of videos. They name version 13, but between 13 and 14 there's no real difference in how files and the folder structure work. I've made crash courses and blog applications covering all the concepts you'd need to become a good Next.js 14 developer, and there's another playlist as well: debugging, performance, SEO, a crash course, building a blog application, building an e-commerce app,
theming, PWAs, making things responsive, pretty much everything, so you can check those out.

Let's get back; the project has been created. First let me verify it works: npm run dev. It's running, so I'll open localhost:3000 and refresh (this is still the old UI cached). Yes, it shows the default UI provided by Next.js, so the project is working, and it's time to start on the files. I'll open the app directory and remove everything from globals.css except the top three @tailwind lines, then save. Next I'll change the background color. First let's verify the current state, then I'll switch the body to a Tailwind class, bg-neutral-200/70 for this particular color, putting the value in the form Tailwind expects. Let me verify the color changed: yes, it's gray now. Then I'll remove everything inside the page: this div, this div, and this div as well, so only the empty main tag remains. That's all we need to remove from the existing folder structure and code.

Inside the main element we need the header "OLX Scraper" at the top, so I'll add an h1 with that text. Refresh, and it shows at the top of the page. It needs some styling as well, so I'll add className="absolute w-full text-center bg-gray-800 py-4 text-3xl font-bold text-white top-0". Let's see if it makes any change: yes, the OLX Scraper banner sits at the top, and let me verify it's responsive: it stays centered when I resize. We're doing well.

Now I'd like to modify a few classes on the main tag, because the content below the header needs some margin from the left, right, and top. We don't need the default min-h-screen flex flex-col items-center justify-between classes; instead I'll add horizontal padding (px means left and right) plus padding from the top. This is where I make it responsive: the base padding is for mobile devices, and the lg: prefix means a value only applies when the viewport is a large desktop resolution, so I'll add lg:px-40. On mobile the horizontal padding is 4, on large screens 40; padding-top on large screens is 28, and on xl screens (extra high resolutions) the horizontal padding is even greater. Save the file; you won't see any difference yet because there's no content inside.

Below the OLX Scraper header I'll add a div with className="w-full mb-8". It's going to contain a component I'll call SearchBar. It doesn't exist yet, so I'll create a new folder parallel to the app directory called components, and inside it a new file named search.tsx. Using a snippets extension in VS Code I'll scaffold the component and rename it to SearchBar; for now it just renders the text "search". Let me import it into the page: the error is gone, and the text shows up. Let me also verify the mobile-responsive classes I added are working: on mobile there's less margin on the left and right, and as I widen the viewport the margins grow even more. This is awesome.

This SearchBar is where I'll add most of the code to scrape the data; I'll actually be writing server-side code too, because Next.js supports it. First, a few classes on the wrapper: className="flex flex-col lg:flex-row w-full items-start gap-3". On smaller screens I want the search bar and the buttons on separate lines; on large screens they can share a single row. Now let's remove the placeholder text and add the input, with a few Tailwind classes: w-full p-3 border-4 border-neutral-200 (a bit lighter) rounded-lg text-gray-500. Then type="text" and placeholder="Search for an OLX product to scrape". By the way, I'm using the Tabnine AI assistant; it's an awesome VS Code extension you can install that suggests code snippets while you write.
So that's the placeholder. First let me verify the search bar is visible: yes, it looks great, and it has that border as well. Above it I'll create a state variable with the useState hook: const [searchPrompt, setSearchPrompt] = useState(""), with useState imported from React and an empty string as the default. Below the placeholder I'll add value={searchPrompt} and an onChange handler; you know how these work in React: (e) => setSearchPrompt(e.target.value), so the state updates as we keep typing.

There seems to be an error; let me hard refresh. "You're importing a component that needs useState. It only works in a Client Component." Sorry guys, I need to add "use client" at the top: in Next.js, any element that requires user interaction (a button, an input) and all hooks only work inside client components. By default every component in Next.js 13/14 is a server component, and to make one client-side we write "use client" at the top. If I refresh now, it works.

After that we need a couple of buttons. Below the input I'll first add a div that will contain both of them, then the button tag itself. I'll write its className as a template literal, because later we'll add a few conditions to it: cursor-pointer for the hover cursor, bg-gray-800, a width of 150 pixels, disabled:bg-gray-400 so the color changes while disabled, rounded-md, px-5, py-3, and text-white. Inside the button: {isLoading ? "Scraping..." : "Scrape"}. The isLoading state doesn't exist yet, so I'll copy the useState line and create const [isLoading, setIsLoading] = useState(false); by default the button is not loading. Save it, and this is how it looks.

There's a slight alignment issue if you look closely, so on the buttons' wrapper div I'll add flex gap-2 flex-2, and now it lines up properly with the input, which has its border, so that's fine. Next, the conditions on this button: the pointer cursor should only appear when the prompt isn't empty and nothing is loading, so inside the template literal I write ${searchPrompt !== "" && !isLoading ? "cursor-pointer" : ""}. Save it; that's how the condition looks. If I refresh, the search bar is empty and nothing is loading, but the button isn't actually disabled yet, so we need to add the disabled prop as well: disabled={searchPrompt === "" || isLoading}. Save, and now it's disabled, which is awesome; type something in the search bar and it's enabled again. I haven't implemented the scraping logic yet, but while it's loading the button will be disabled again and its label will change to "Scraping..." because of the ternary above.

Now let's add an onClick that calls handleSubmit, which I still need to create. Before that, I'll copy this button and paste it below as a second button; there will be an error until handleSubmit exists, so let's create it below the state variables: const handleSubmit = async (e: any) => { ... }. Inside it: e.preventDefault(), then setIsLoading(true), then a try/catch block where for now we simply log the error, and a finally block that sets isLoading back to false once the process is done. There's no scraping logic here yet; I still have to write it. Save, and the error is gone. The pasted button is disabled because it shares the same logic for now; it's going to be the Export button, and I'll actually rewrite its classes from scratch so I don't confuse myself: bg-gray-800, disabled:bg-gray-400, rounded-md, shadow-xs, px-5, py-3, text-white.
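The enable/disable rules for the two buttons are easy to get backwards, so here they are pulled out as two plain predicates. This is just a sketch with names of my own choosing; the Export rule (enabled once something has been scraped, and never while a scrape is running) comes from the demo at the start of the video.

```typescript
// Scrape button: disabled while the prompt is empty or a scrape is running.
export function scrapeDisabled(searchPrompt: string, isLoading: boolean): boolean {
  return searchPrompt === "" || isLoading;
}

// Export button: disabled until at least one product has been scraped,
// and also while a scrape is in progress.
export function exportDisabled(productCount: number, isLoading: boolean): boolean {
  return productCount === 0 || isLoading;
}
```

In the JSX these would appear as disabled={scrapeDisabled(searchPrompt, isLoading)} and so on; inlining the expressions, as the video does, works just as well.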
Let's see: you can see the button's width and the disabled styling. But this disabled logic shouldn't depend on searchPrompt for the Export button; I want to check whether any products have already been scraped below, and enable the button only if products exist. So at the top I'll add the products, and this won't be a plain state variable: it will be state from the Zustand store. Zustand is an awesome global state management system; I have a separate video on it. The store doesn't exist yet, so for now I'll store const products = [] as a placeholder. Down on the button, the check becomes: disable it unless products.length exists and the scraping logic isn't loading, so disabled={!products.length || isLoading}, and change the color accordingly. I'll also remove the onClick logic here, because I'll create a separate export function later, and the cursor condition becomes ${products.length && !isLoading ? "cursor-pointer" : ""}. There was an error to clean up, and later I'll create a component that shows all the product data; the same component will show the message that no products have been scraped yet. For now we're good with this component. Let me check it's responsive: yes, the buttons drop below the input when the screen is small, and move to the right on the same row when it's large. Awesome.

Now I'm going to create the store, which is essentially a hook. I'll make a hooks folder and inside it a new file, olxProducts.ts. The data in this particular Zustand store lives locally, but it's accessible from every component and file of this project. In the terminal, install Zustand first: npm install zustand (note the spelling). Then I'll import one very important function from Zustand: import { create } from "zustand". I have a separate video on advanced Zustand state management where I build a complete CRUD application with it, so check that out if you want to learn the full-fledged Zustand state management system. Inside create I define a products array, initially empty, and then addProduct; I don't need to edit or delete products, as I showed you in the demo at the start of this video. addProduct takes a product (typed any) and is an arrow function that calls set, which receives the current state and returns all the products we already have plus the new one, using the spread operator: set((state: any) => ({ products: [...state.products, product] })). That's the simplest possible logic, and using it across components means that if one component updates this products array, any other component can read it through the same hook. That's how cool Zustand is: we don't need to wrap any layout or create a context provider like the Context API or a Redux store. That's pretty much it for this file.

Now back in the SearchBar we can fetch the products from the Zustand store. We need to import that hook; the editor will try to auto-import it, and I'll write const products = useStore((state: any) => state.products).
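Here is the store described above in one place. This is a reconstruction, not the video's exact file: the file name olxProducts.ts and the Product shape are my best reading of the transcript, and the "add" updater is pulled out as a pure function so it can be read (and tested) even without Zustand installed.

```typescript
// hooks/olxProducts.ts (file name reconstructed from the video)
// import { create } from "zustand";   // uncomment once zustand is installed

// Shape of one scraped product, as returned by the server action.
export type Product = {
  url: string;
  title: string;
  price: string;
  description: string;
  features: string[];
};

type StoreState = { products: Product[] };

// Pure "add" updater: append the new product immutably with the spread operator.
export const withProduct = (state: StoreState, product: Product): StoreState => ({
  products: [...state.products, product],
});

// With zustand available, the whole store is just:
// export const useStore = create<StoreState & { addProduct: (p: Product) => void }>(
//   (set) => ({
//     products: [],
//     addProduct: (product) => set((state) => withProduct(state, product)),
//   })
// );
```

Because the updater returns a fresh array instead of mutating, every component subscribed through the hook re-renders when a product is added.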
It's showing some kind of error, though; hovering over it says "argument of type State...". That's because useStore was auto-imported from zustand itself, and we want the one from the olxProducts hook we created, so I'll remove the import and write it again, importing useStore from hooks/olxProducts. Now there's no error here. We can copy that line and import the addProduct function from the same store as well; it's unused for the moment. Let's come back inside handleSubmit's try block. I'm going to create an action, which will be server-side code: we can put the "use server" string at the top of that file, a very important and very awesome feature of Next.js that lets us write server-side code right alongside client-side code. In the try block we call a particular function I have yet to create: const product = await scrapOlxProduct(searchPrompt), passing the searchPrompt state variable. We log the product, then call addProduct(product) with whatever that function returns, using the addProduct from the Zustand store we created. Once this process of fetching the data is done, we call setSearchPrompt("") to make the search prompt empty again. I think that's pretty much everything we need in this file, except that handleSubmit has to be async, because we're using await inside it. So let's create that function: I'll make a new folder called actions, and inside it a new file called scrapProducts.ts.
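The submit flow just described (set loading, scrape, store, clear, always un-set loading) can be sketched framework-free. These dependency names are mine, not the video's: in the component, scrapeProduct maps onto the scrapOlxProduct server action, addProduct onto the Zustand action, and setLoading/clearPrompt onto the useState setters.

```typescript
export type SubmitDeps = {
  scrapeProduct: (url: string) => Promise<unknown | null>;
  addProduct: (product: unknown) => void;
  setLoading: (loading: boolean) => void;
  clearPrompt: () => void;
};

export async function submitFlow(prompt: string, deps: SubmitDeps): Promise<void> {
  deps.setLoading(true); // disables the button and flips its label to "Scraping..."
  try {
    const product = await deps.scrapeProduct(prompt);
    if (product) deps.addProduct(product); // the action returns null on failure
    deps.clearPrompt(); // empty the search box once the scrape is done
  } catch (err) {
    console.error(err); // the video just logs the error
  } finally {
    deps.setLoading(false); // always re-enable the button
  }
}
```

The finally block is the important detail: without it, a throw inside the try would leave the button stuck on "Scraping..." forever.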
Inside this file, the first line will be the "use server" directive, because I need Puppeteer, which runs on Node; server-side Node packages can only be used when we write the "use server" string at the top of the file, otherwise it would be considered a client component. I'll import puppeteer, which I still need to install, so I'll copy its name and run npm install puppeteer in my terminal. I'll also need revalidatePath from next/cache, because it will refresh the page automatically when a new product is scraped. Then the function: export async function scrapOlxProduct(url: string), matching the name we called in the SearchBar, with a try/catch block where we simply log the error and return null; for now I return null from the try block too so the error goes away. The puppeteer import error is gone as well, since we successfully installed it. Over in the SearchBar I import scrapOlxProduct from the actions folder, and that error is gone too. This is awesome.

Now for Puppeteer, the main part of this video (this video really teaches you Tailwind CSS, building the UI, and scraping the products). Inside the try block, the first few lines are very important. First we launch the browser, passing a few options: headless can be true, false, or, I believe, "new". For now I'll write headless: false so we can watch it work. After that I create a new page: const page = await browser.newPage(). Then I'm going to wait for navigation: when we open a particular page, sometimes the internet is slow and sometimes it's fast, and we need to handle both cases. So: const navigationPromise = page.waitForNavigation({ waitUntil: "networkidle0", timeout: ... }); if you don't set a timeout, slower connections will cause problems, so I'll give it a generous value, around two minutes. After this we can actually navigate to the URL passed into the function: await page.goto(url, ...), again with waitUntil: "networkidle0" and the same timeout for navigating to that page.

Next, jQuery. We don't strictly need to import it, but jQuery provides a lot of built-in functions that, to me at least, are more flexible than writing plain JavaScript document.getElementById or getElementsByClassName; the built-in DOM functions would work too, but I'll use jQuery here. To use it we add a script to the page; the syntax is await page.addScriptTag({ url: ... }) with the CDN URL we're importing it from. After that, we await the navigationPromise we created above. Then we can check whether jQuery was imported successfully; I'm just adding extra validation, it's not strictly required: const isJqueryLoaded = await page.evaluate(...), and if it isn't loaded we throw an error saying "jQuery is not loaded" (you can pause the video to copy this line). Everything written in the try block so far is common boilerplate for every kind of scraping you'll do, at least if you want to use jQuery; next we focus specifically on the OLX products.

Here I write const data = await page.evaluate(() => { ... }). Tabnine is assisting and suggesting a few things, but they're not correct. Inside the callback I create a few variables: the title, the price from the OLX page, the description, and the features, which will be of type string[] initialized to an empty array, and we simply return the title, price, description, and features. Outside this, we close the Puppeteer browser, and we call the revalidatePath we imported above so the page refreshes automatically; we pass it the path of the page that will show all the products.
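The boilerplate just described, collected in one place. This is a sketch, not the video's exact file: the jQuery CDN URL and the two-minute timeout are my stand-ins, and puppeteer is imported lazily so the snippet can sit in a project before the package is installed.

```typescript
// Shared navigation options: wait until the network goes idle, and give slow
// connections roughly two minutes before timing out (my value, not the video's).
export const NAV_OPTIONS = { waitUntil: "networkidle0" as const, timeout: 120_000 };

// Hypothetical jQuery CDN URL; the video pastes one from a CDN.
const JQUERY_URL = "https://code.jquery.com/jquery-3.7.1.min.js";

// Launch a visible browser, open the URL, and inject jQuery, mirroring the
// boilerplate described above.
export async function openPageWithJquery(url: string) {
  // Lazy import so this module can be loaded before `npm install puppeteer`.
  // @ts-ignore -- types exist only once puppeteer is installed
  const { default: puppeteer } = await import("puppeteer");

  const browser = await puppeteer.launch({ headless: false }); // watch it work
  const page = await browser.newPage();

  // Created before goto so it resolves with that same navigation.
  const navigationPromise = page.waitForNavigation(NAV_OPTIONS);
  await page.goto(url, NAV_OPTIONS);

  // Make $(...) available inside page.evaluate.
  await page.addScriptTag({ url: JQUERY_URL });
  await navigationPromise;

  // Optional sanity check that the injection worked.
  const isJqueryLoaded = await page.evaluate(
    () => typeof (globalThis as any).jQuery !== "undefined"
  );
  if (!isJqueryLoaded) throw new Error("jQuery is not loaded");

  return { browser, page };
}
```

The caller is responsible for browser.close() when scraping finishes, as the video does after the evaluate step.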
Then we can return the data, but we also need to return the URL that was scraped. The URL is, I think, already available on the client in searchPrompt, but you know we clear that field, so this is just to show you how to send the URL back from here. I'll remove this extra line, and back in the SearchBar the logged product should be empty data for now, since I haven't written any extraction logic yet. This part is important and needs your focus.

Let's scrape the title of the product first. I'll open any specific product and right-click what I think is the title of this particular product, then Inspect. You can see the title has a generated class, something like a38b81. I'm going to verify the class name on another product as well, so I know the title lives under the same class everywhere. Let me copy that class name; we can use it to get the inner text of this h1. There's another thing to make sure of: I'll search the page for this class to confirm no other element uses it. If the class I scrape also exists elsewhere on this page, in the footer for example, I won't be able to fetch the correct title, because one place might use it for the title and another for a description heading or the features. So I'm just verifying that. This looks like compiled, generated class data, but we can use it, so I'll copy the class name and come back to my code.

Here's the title, so let's use jQuery; I hope you have an idea of how jQuery works. We pass the class name, and another thing I'd like to mention: it's better to include the tag name as well, because the same class might be used on other elements. If it also appeared on some p tag, that p tag would be excluded even though the class matches, since we're requiring the class to exist on an h1. So: title, from the dollar sign, with "h1." plus the class name, then .text(), and then we can .trim() as well. When I save there's an error, "Cannot find '$'. Do you need to install the type definitions?"; let's ignore it for now and see whether it actually throws when we run the project; we haven't run it yet, so jQuery isn't loaded anyway. And in case there were another h1 with the same class name, one containing the title and one containing something else, you'd go back up and match on the parent; if the parents are the same as well, go to the grandparent, select that parent element, then select its children. That's how jQuery works; I'm not teaching jQuery here, but when we come back to the features you'll get a better idea.

For the price, I'll right-click the price: it has its own particular class name on a span, so I'll copy that class name and use it: the dollar sign, the span element, a dot, and the pasted class name inside the parentheses, then .text() and .trim() again. Now let's go back and copy the description: right-click, Inspect, and I think this is the class for it; let me open it up: I think it's this 0f1 class.
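Here is the extraction logic pulled out so the selector pattern is visible on its own. The class names are placeholders based on the generated classes seen in the video (a38b81, and invented stand-ins for price and description); OLX rotates these, so copy the current ones from DevTools. `$` is the injected jQuery, typed loosely here so the sketch stands alone.

```typescript
type JQueryLike = (selector: string) => { text(): string };

export function extractProduct($: JQueryLike) {
  return {
    // Scoping each class to its tag means a duplicate class on some other
    // element (a p tag, the footer) is ignored.
    title: $("h1.a38b81").text().trim(),
    price: $("span.priceClass").text().trim(),            // placeholder class
    description: $("div.descriptionClass").text().trim(), // placeholder class
  };
}
```

Note that page.evaluate serializes its callback, so in the real file this body is written inline inside the evaluate arrow function; a helper defined outside the callback would not exist in the browser context.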
I'll copy that class. When we call the .text() function on this element we'll get the description text we're after, so: the dollar sign, and here I'll use just the class name, then .text() and .trim(). Let's leave features as an empty array for now, save the file, and run the project to see whether it throws an error or not. In the Search component we're already calling this particular function and logging the product; we're not showing the product below the search bar yet, so let's right-click and open the console window, which is where the scraped data will appear. Let me copy a product URL and paste it in; you can see the Scrape button is now enabled, and the project is running and compiling (let me refresh so we know). I'll copy and paste again and click Scrape: it opens that particular page, successfully. This is great; now we wait for it to detect the title, price, and description based on the class names we added. And there it is: it has extracted the description, the price, and the title, and also the URL. This is awesome; we're successfully scraping the data for this product, everything except features, which we know we're returning empty. I'm using the Console Ninja extension, so the title, price, and description show up right here in the editor along with the URL. This also adds the product to our Zustand store, but we don't see that yet because we haven't created the products component that renders below the search bar. Awesome. Now let's go back to scrapProducts.
to fix this error — let me hover over it. We actually need to install the jQuery type definitions (`@types/jquery`), and that should remove this error — yes, you can see it's gone. Meanwhile, let me run the project again. One more thing I'd like to mention: OLX is an international-level website, so its markup is well structured, and most well-built sites follow a consistent format — all product titles will share the same class name. If a website is built with a messy code structure, it's hard to scrape, because there's a chance one product's title sits under one class name and another product's title under a different one; then for each case you'd need a separate selector, a separate class name, separate logic. The good websites — I'd say 99% of serious sites — follow the same class names and coding structure, but sometimes you have to verify that all the products, lists, and data really share the same classes; otherwise you need checks like "if it's this case, use this class name; if it's that case, use another". That's just a tip I'm giving you. Now let's come back to the features. I right-click a feature, and you can see we have these class names for it. I'm not going to use the feature's own class name, even though all the features share it — I'm going to use the parent's class name instead. jQuery gives us a `children()` function, so we can iterate through all the children of that div and fetch the inner text of each child element inside the parent div.
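That `children()` idea — walk every child of the features container and collect its inner text — boils down to something like this. The parsing below is faked with a regex over a hard-coded string (invented markup) so the snippet is self-contained; the real code is jQuery's `$(parentClass).children().each(function () { features.push($(this).text().trim()); })`:

```javascript
// Hard-coded stand-in for the scraped features container (markup invented).
const html =
  '<div class="features"><span> 4 beds </span><span>2 baths</span></div>';

// Stand-in for $(".features").children().each(...): collect each child
// span's text, trimmed, into the features array.
const features = [];
for (const match of html.matchAll(/<span>([^<]*)<\/span>/g)) {
  features.push(match[1].trim());
}
console.log(features); // [ '4 beds', '2 baths' ]
```

Using the parent's class plus `children()` means new feature rows are picked up automatically, without hard-coding each feature's own class.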
So I copy the class name of that parent div, come back here, write the jQuery dollar sign with the parent's class name, and call `children()`. Once those are fetched, at that point we have access to all the child span elements of that class. Then we can chain `.each()`, and inside it push into the `features` variable we already created: `features?.push(...)`. The `each` callback gives us the built-in `this` keyword, which refers to the current span element, so we write its `.text()` — we could `.trim()` as well if we wanted, but we'll skip that for the moment. Now `features` will contain all the features too. I think we're good to go; let's save the file and refresh so everything is up to date — Next.js usually hot-reloads automatically, but just in case it doesn't. We've already scraped the second product, so let's reload and try a third one to make sure it works across products. I paste the URL, click on scrape, and you can also see the button text at the back has changed to "scraping" — that's the logic we added in our search component. It takes some time, depending on the internet connection and on the website's speed as well. And here you can see we got seven features back — let me verify: yes, there are seven features on the page. This is awesome; we are successfully scraping the data. Now we need a UI component to show the results below the search bar, so I'll create a new component called products.
tsx. Let's use the React snippets: RFC, Products. First of all, inside it I'll use Zustand to fetch all the products — this is the benefit of Zustand: `useStore` coming from hooks/olx, with `(state: any) => state.products`. We don't need to pass the products down as props from a parent component to print them; it's handled automatically. Let's add a wrapper div with `className="w-full"` (full width with respect to the parent), and then check `products.length > 0 ?` — if so, render the `mt-...` container; otherwise render a p tag with `className="text-center text-gray-600"` saying "No products exist". Let's save it. Inside the first branch we show the data when there are products in the store: a div with `space-y-4`, and inside it we iterate through all the products with `products.map((product: any, index) => ...)` as an arrow function — we pass the index as well so we can set the `key` prop, which is required by `products.map`. Let me close that one as well. And here we check whether product.
title exists. Sometimes our logic in the scrape-products action throws an error — for example on a slower connection — and we aren't able to scrape the product; if the title exists, it was scraped, and in practice that means the other data was scraped successfully too, which is why we only check the title. If the title exists we show the data — the price, description, and features; otherwise we render `null`. There seems to be an error, so let me add a div and close it as well. On this div I set `key={index}` and the classes `border-4 border-neutral-200 bg-white p-5 rounded-lg`. Inside it I write another div that will contain the title and the link. Let's add the title first, inside an h3: `product.title`. After the title I'll add an anchor — I'll add a few classes to it later, but for now the `href` is product?.
url. The second prop is `rel` — this is actually required: `noopener noreferrer` — and then the className: `hover:bg-gray-100 border px-3 py-1 ml-2 rounded-md`, with a height of 35 pixels. So this is the anchor tag; it has the href, and we can give it `target="_blank"` as well so that clicking it opens a new tab in the browser. For the link icon I'll pick one from my emojis — I search "link", select this one, and there it is, visible now. Let's give the wrapper a few classes: `flex w-full items-start justify-between`; and the title: `text-xl font-bold text-gray-800`. Before going further and printing the features, I want to verify this works, but we need to render this component somewhere, so I'll add it to my page below the search bar: Products, imported from components. There's an error — something about the external store selector and `useRef` — because we need `"use client"` at the top of our products component. As I've mentioned, by default all components are Server Components, and whenever we use a hook we have to make the component a Client Component. Now let's verify whether we can see the products; currently it shows "No products exist". I'd also like to add the optional-chaining question mark, because if no products exist we shouldn't try to access product.
title. Let's save it, refresh the page, copy the URL of any product, paste it here, and click on scrape. It opens that URL — yes, it's open — and it should scrape all the data, but it will only show the title and the link for now, because that's all we're rendering at the moment. Let's see — it has to finish first, depending on the internet connection and the site's speed; it still says "scraping", which means the scrape is still in progress. All right — it's showing the title and the link, and the link is the same one we scraped. We've successfully scraped and displayed the data; now let's quickly add the price, description, and features. For the price I'll add a p tag — I'll just paste it in so we don't waste time — and you can see I've added it with these classes and `product.price`. After this we need the description, so let's add another p tag with product.
description, with slightly different class names. After this we need a couple of divs below the p tag — let me close these divs as well. Inside this div I iterate through the features: `product?.features`, then `.map((feature: any, index: number) => ...)`. Inside the map we render a span showing the feature name, which is a string. The reason it's giving an error is that we need to add the `key={index}`. One last thing is the className — I'll paste the classes in directly to save time: a blue background, some padding, `inline-block`, rounded. That's how the data will be displayed, and we've added all the different validations, so now let's go ahead and verify. It's showing up automatically because the data already existed in local storage — by local storage I mean the Zustand store's persisted storage. Another thing you can see is that the export button is enabled, because there is at least one product in the Zustand store; but clicking it does nothing yet. If we go to the search component, this is the button, and we haven't attached any function to it — that's the last thing to do, so let's do it. First I'll go up and create a new async function, `exportProducts`. In it I'll write `await exportData(...)`, which I still need to create; it will live in the same server-actions file as scrapeProducts. I
can create a separate file as well, but for now I'll put it there. Let me also add an `alert("Exported")` — we could use a toast library to show nicer notifications, but that's not our purpose right now — and a catch with `console.log(error)`. Let me copy the function name and wire up `onClick={exportProducts}`, so the button is now clickable. Clicking it calls `exportData`, which doesn't exist yet, so let me copy this line, come over to the actions file, paste it, rename the function to `exportData`, and have it take the data from the search component, typed `any`. Let's verify the error is gone — yes, it is. Inside it, first of all I'll use a built-in Node package called `fs`. We import `fs` from "fs"; we don't need to install it, because on the server side Node ships with it. `fs` means file system — we can read data from files, create new files, and store data in files; it provides a lot of built-in functions. Let's add a try/catch block, which is very important, with `console.error(error)`. Now I build the JSON content — this is how the fs module works, we need to pass it stringified data — so `JSON.stringify(data, ...)`, and we can add the optional parameters here as well. Then `fs.` — let me show you: these are all the functions it provides, and they're very useful. We call `writeFile`; we can give it any file name, so data.
json is the name I've given it. We pass the JSON content and then the encoding, "utf8". Finally we have the callback's error: if there's an error while exporting, we simply log or return it from here; otherwise, if there's no error, we can log the data — it's up to us. We could also push the data into a database if our application is connected to one, or store it in an Excel file. So: `console.log("JSON file has been created")` — or updated. Let me save, go to the search component, go to the page — it should refresh automatically, and unless I hard-refresh the page the Zustand store won't be emptied. Now I click on export — it says "Exported", which is good — and here it has created the JSON file, data.json. I click on it and it shows the title, price, description, features, and URL... but why is it not showing the price, and why is there a null entry? Let me verify — the price isn't shown in the UI either, so it wasn't able to fetch the price. I'll scrape another product now: copy the URL, paste it, click on scrape, and it needs to open that URL — let's wait and see what happens. Meanwhile, let's check the first listing: it turns out it simply didn't have a price under that class name — the element's inner text in the DOM was empty — which is why no price was shown. The new one is still in progress; let me also verify that it does have a price — yes, it does. Let's go over, and yes guys, you can see it's now showing the price, and the features, because it contained the
price. We could substitute some placeholder data when the price is missing — that's up to us. Now let's click on export; the file should be updated, since the function we used overwrites it. You can see it says exported; I click through, and it's actually showing the price data — but I don't know why there's still a null in between. Let me try once more after hard-refreshing the page, so the Zustand store gets emptied, and scrape the data again. Let's click on scrape — I believe the null appeared because I hadn't hard-reloaded, and the action sometimes returns null from here when a scrape fails. Let's go back — it should show the data — and once the scrape finishes I'll click the export button, which will update data.json. Let me also delete the file, so we can watch it get created again. That's been added; let's scrape this product as well, so we export a couple of products' data together. Once this is done — again, I want to explain that you can extend this logic and this functionality to any extent. You could pass the URL of a product listing page: if we scroll down to the related ads, there can be hundreds of products there. You can go through the classes, add a loop, iterate through all of them, fetch those products, and upload them to a database. Coming back: you can see a couple of products have now been exported, so let's click export — it exported quickly — click OK, and let's go back and see data.
json. There's no null this time — I was right — so it's showing the data perfectly. We can also export the data in CSV format: `fs` itself just writes whatever string you give it, so you can build CSV or almost any other format and write it the same way — check out the documentation of the fs module we used; it provides a lot of built-in functions. There are a few more tips I want to give you here. You know OLX is a very big e-commerce website, and other big sites — Amazon, Google, Facebook — have security checks in place: whenever a bot is accessing these websites, they detect that the site is being hit continuously by automated traffic. I've only used this for five or six products, not much, but if a bot accesses their website dozens or maybe hundreds of times, they detect it and ban the IP — the network's internet IP — from accessing their site. That kind of security check usually isn't added on smaller, local websites. The scraper we've written is a kind of bot: it's not a real human being accessing their website. So, to avoid this restriction on the big websites — because real web scrapers need to collect data for thousands of products from OLX, from Amazon, from the Google store, from any website — there is something called proxy servers. If you search "proxy server for web scraping", you'll see a lot of them. Alongside the Puppeteer setup we've used, you add some credentials for the proxy service you want to use, and what a proxy server does is keep changing the IP address while scraping the data — whenever a new page is opened or launched
over here, they switch to a fresh IP. They do this so the target website can't detect that a bot is accessing it, because the IPs keep changing across different locations of the world; with rotating IPs, the site can't tell the traffic is automated and can't block any single IP from accessing it. I haven't included this part in the tutorial because I didn't want to make it longer. One very famous provider is Oxylabs — I'd suggest you check it out; it isn't difficult to add Oxylabs, or any proxy service, to your Puppeteer configuration. You just create an account and add maybe a username and password while launching the page, or perhaps some ports. Besides Oxylabs there's Bright Data, which I use quite a lot — a very good toolkit you can configure alongside your scraper — and you can search around; there are a lot of proxy services you can use. A proxy server is very important if you want to scrape thousands of products from a single website, whether that's Amazon, Alibaba, AliExpress, OLX, or maybe movies from Netflix. Scraping matters when the website doesn't provide an API: an API is the ideal way to fetch data from any website, but for any site that doesn't have one, the only way is to scrape the data — this is what search engines do; they scrape data from different websites and show you the results. I think that's pretty much it for this video. If you have any questions about web scraping, you can ask me in the comments below. One more thing I'd like to mention: if you want any development services from me, check out my LinkedIn profile in the description of this video — you can give me work, and I can assist you and guide you in the technical
decision making, and give you my services in software development — just contact me on LinkedIn; the link is in the description of this video. So guys, that's pretty much it. Thank you so much for watching, do subscribe to my channel, and I'll be creating more videos in the coming days. See you in the next video.
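As a closing sketch of the proxy setup mentioned above: Chromium accepts a `--proxy-server` launch flag, and Puppeteer exposes `page.authenticate()` for the proxy's username and password. The host, port, and credentials below are placeholders — not a real Oxylabs or Bright Data configuration — and the actual `puppeteer.launch` call is left commented out so the snippet stands alone:

```javascript
// Placeholder proxy credentials -- swap in your provider's real values.
const proxy = {
  host: "proxy.example.com",
  port: 7777,
  username: "user",
  password: "pass",
};

// Launch options routing all browser traffic through the proxy.
const launchOptions = {
  headless: true,
  args: [`--proxy-server=http://${proxy.host}:${proxy.port}`],
};

// In the real scraper (requires the puppeteer package):
// const browser = await puppeteer.launch(launchOptions);
// const page = await browser.newPage();
// await page.authenticate({ username: proxy.username, password: proxy.password });

console.log(launchOptions.args[0]);
```

Rotating-proxy providers typically hand out a single gateway endpoint like this and rotate the exit IP behind it on each request or session.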
Info
Channel: Programming with Umair
Views: 1,208
Keywords: web scraping in next.js 14, web scraping with puppeteer, configure puppeteer in next.js 14, scrape olx products with puppeteer, server actions in next.js 14, export scraped data in next.js 14, build web scraper in next.js 14, configure jquery in puppeteer, nextjs, next.js 14 tutorial, build responsive UI with tailwindcss, javascript, react
Id: dIFcAgecqhA
Length: 72min 10sec (4330 seconds)
Published: Mon Feb 12 2024