Golang Tutorial: How to scrape Ebay with Golang and Goquery

Captions
In this Golang tutorial we will scrape the eBay website with Go and the goquery package. It's a small Golang project for today: I want to get from eBay the titles of ads, the URLs of these ads, and their prices. After that I want to save the scraped data to a CSV file. I also want to do this for each page of results for a certain search query. So in this tutorial we will look at how to use the goquery package for website scraping, how to scrape data from multiple pages, and how to write data to CSV files. Feel free to ask your questions and suggest anything in the comments, and please don't forget to leave a like and share the video on social networks; it helps to promote the channel.

Okay, first of all I want to create a new Go module: go mod init and the path, with home.com for example. If you have just started to learn the Go language, consider watching my Golang imports tutorial; it's about module and package paths in Go. Then I want to create a brand new main.go file, and its package name will be main too. Then let's declare the main function; it's the entry point of our program.

Now, to scrape the eBay website, first of all I have to perform a GET request to it and get the source code of the first page with the results for my search query. After that I have to parse this HTML code with the goquery library and save the scraped data to a CSV file. After that I want to scrape the URL of the next page, then get it and scrape its ads, and so on until I get the data of all ads.

So let's start with getting the HTML code of the main page of the eBay website. Let's define the url variable in the main function; it's a string. By the way, I want to use "beatles puzzle" as the search query. There are 700 results, and I also want to set 200 items per page. As the starting URL I want to use the one that's in the address bar of my browser, this one. First I want to scrape the ad data from this URL, and at the end I will scale the script to get data from the other pages. So we've got the first URL, and now I want to send a request to it to get the
source code of the page. To make a request we can use the net/http package; let's import it. Now let's create a function that will perform GET requests to a URL. I want to use a separate function because we have several pages to scrape, and to get the HTML code of each page I will call this function. Let's say it'll be getHTML, which takes a URL as an argument; it's a string. The function will return a response object; to be exact, it will return a pointer to the response object, *http.Response. To perform a GET request I have to call the Get function from the http package, http.Get. It takes the URL as an argument and returns two values: the server's response and an error if something goes wrong. So, response and error.

It's a good idea to check whether the error variable has a value or not. If it has a value, it means that something went wrong with the server and we didn't get the response. Anyway, I want to see the value of the error variable, so let's check the error: if error is not nil, I want to print this error. And of course I have to import the fmt package.

I've got the response from the server, and now I want to check the status code of this response. If it's not OK, that is, the status code is more than 400, I want to see it: if the response status code is more than 400, then again let's print it. As you probably know, if the server responds with 200 it means that everything is okay; 301 and 302 mean redirects, but that's still okay; 403 is forbidden; 404 is page not found; and the 500 codes are related to internal server errors. So if the status code is more than 400 I will see the message, and at last the function will return the response.

Now in the main function let's call the getHTML function. I need the response. The HTML source code of the website is stored in the Body field of the response variable, and I have to close the body of the response after I complete the scraping process. To do that I have to use the defer keyword and call the
Close method of the Body field. So defer, the response object, I need its Body, and I'm calling the Close method.

Okay, now let's install the goquery library, the package that will parse the HTML source code of the page. Let's add it to the import section, and to install it I have to execute the go get command. Now that we've got the HTML code of the page, we have to convert it to a tree of Go objects, a tree of structs, that will allow us to search through it. To do that I have to call the NewDocumentFromReader function from the goquery package and pass into it the body of the response object: goquery.NewDocumentFromReader. The NewDocumentFromReader function returns two values, the document and an error; let's unpack them. Let's check the error again: if error is not nil, let's print it.

Now we've got some code redundancy, because we have the error checking in two places. It's probably a better idea to create a separate function for that. So let's create the check function; it will take an error. Let's copy this code here, and here let's call the check function and pass the error into it. The same here.

Then I want to call the scrapePageData function, for example, to scrape the data and write it to a CSV file: scrapePageData, which gets the doc variable. So let's define the scrapePageData function. It will take the document as an argument; it's a goquery document, a pointer to the Document.

Now let's examine the website to see what exactly I have to search for. I want to get the title of each ad, its URL, and its price. So, inspect, and we can see that these ads are inside the ul tag with the srp-results CSS class, and each ad is a list item tag with the s-item class. So let's get them all. Inside scrapePageData let's call the Find method of the doc variable. I want to find the ul, unordered list, tag with the srp-results CSS class, and I want to get all its child tags, the list item tags with the s-item class. The dot here means a class, and the Find
method will return a Selection object, a set of elements, an iterable object that we can iterate through. For each element I want to get the title, the URL, and the price, and to do that I have to call the Each method. The Each method iterates through the Selection object and takes a function as an argument; it will execute this function for each element of the selection that we got from the Find method. The function has a standard signature: it takes the index of each element, which is an integer, and the item itself, which is a goquery Selection element. In the body of this function I have to describe what actions should be performed with the item, the element of the selection.

I want to get the title of an ad and its URL, so let's examine it. I need the a tag with the s-item__link class. On the item I'm calling the Find method, and I need the a tag with s-item__link; its text is the title I want to get, and the URL is the value of its href attribute. So let's get them. The title will be the text; I'm calling the Text method. Let's print it. Let's run the script, and we've got titles. Okay. I also want to get rid of whitespace at the beginning of the title and at the end. Golang has a TrimSpace function in the strings package; let's import strings and wrap the Text function call with strings.TrimSpace.

Okay, and then the URL. The URL is the value of the href attribute. To get the value of an attribute I have to call the Attr method, a.Attr, and specify the attribute I want to get. The Attr method also returns a boolean value that tells whether the attribute exists or not. I don't want to use it right now, so I will use the underscore.

I also want to get the price. The price is a span tag with the s-item__price CSS class. Let's get it: priceSpan, item.Find, span with s-item, underscore, underscore, price, and again I need its text. Let's print it. In my case eBay determines my location by my IP, and here we can see the currency for my
location, Russian rubles. I want to get rid of it. To do that I want to use the Trim function, which takes a string as an argument and the substring I want to get trimmed. So, the price variable. By the way, let's get rid of spaces: strings.TrimSpace, and here strings.Trim. I want to pass into it the price span as the first argument and the substring I want to get rid of. So we've got all the data for each element of the selection.

Now I want to combine them into a slice; it will be a slice of strings. So let's create a new variable, scrapedData for example, and it'll be a slice of strings: the title first, then the price and the URL. Now we are ready to save the scrapedData variable to a CSV file, so let's call the writeCSV function and pass the scrapedData variable into it.

Now let's define it. It will take scrapedData as an argument; it's a slice of strings. Let's import the csv package from encoding. Here let's create the fileName variable; let's say it'll be data.csv for example. Then I have to create this file if it doesn't exist, or just open it if it does exist and append the new portion of scraped data to it. To create or open a file in Go we can use the OpenFile function from the os package, so let's import it. Now let's call the OpenFile function. It takes the following arguments. First of all, I want to open the file with the file name. I want to create it, so os.O_CREATE, and I want to open it for writing, os.O_WRONLY, and for appending, os.O_APPEND. These three arguments are flags for opening the file, and the last argument is the permissions for the file. Let's say the file will have all permissions, 0777. The OpenFile function returns a file object opened for writing and an error, so it's file and error. Let's check the error, and let's defer closing the file.

Then let's create a writer; it's a special object that will write our data to the CSV file: csv.NewWriter, which gets the file object as an argument. After writing data we
have to be sure that the output buffer has been pushed to the file, so I have to call the Flush method. Right now I want to defer the call of the Flush method: writer.Flush. And at last let's call the Write method of our writer and pass the scrapedData variable into it. The Write method will return an error if something goes wrong, so let's check it too. As I already have an error variable here, I have to assign a new value to it, so I have to use just the equals sign here. Check the error, and that's it for the writeCSV function. Let's run the code. I've got data.csv; it's working.

Now let's scale the scraper to get the data from the other pages. We have to use a loop to iterate through the pages. We don't know how many iterations it will have, so it'll be an infinite loop. Then we have to find a way to get the URL of the next page, and we also need to find the condition for breaking the loop. So let's examine the pagination bar. It's a nav tag with a pagination class, and each page is an a tag inside the ordered list tag. I think that I need the next page button, this one. It has the link, the URL of the next page, page number two, and it has the pagination__next CSS class. Let's copy it. Then let's go to the last page. Right now we are on the last page, number four, and we can see that the next page button has the same URL as the current one. I think that the loop-breaking condition will be this one: if the URL of the next page button is the same as the URL of the current page, then we will break the loop.

So in my main function let's define a new variable; it will be the previousURL variable. No, it will be defined this way; it's a string. Then let's start an infinite for loop. In the first iteration of the for loop I want to use this initial address, get the response from the server, and scrape it. Let's move these lines inside the for loop, and all these actions will be performed in the first iteration with this initial
address. Then in this iteration I have to get the URL of the next page, so I need to get the nav tag with the pagination CSS class: href, underscore, doc.Find, nav.pagination, and I need its child a tag with the pagination__next class, and I want to get its href attribute. Okay, we've got the URL of the next page. Then I want to check whether the href variable has the same value as the previousURL variable, this one: if href is equal to previousURL. If that's true, it means that we are on the last page, and here I want to break the loop. Otherwise, else, the url variable, this one, will get a new value, the value of the href variable, which is the URL of the next page, and the previousURL variable also gets this value. The loop then goes to the second iteration with the new value of the url variable and the new value of previousURL.

So let's run the code. Okay, we've got 700 results, a bit more, but nevertheless the scraper is working. If you liked the video, please leave a like and subscribe to the channel. Thanks for watching.
Info
Channel: Red Eyed Coder Club
Views: 733
Keywords: golang, golang web scraping, golang scrape website, go programming, golang tutorial, go tutorial, red eyed coder club, golang tutorial 2021, golang ebay, golang ebay scraping, scrape ebay with golang
Id: mS74M-rnc90
Length: 29min 58sec (1798 seconds)
Published: Mon Nov 15 2021