Web Scraping Made Easy With Google Sheets

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
using Google Sheets for basic web scraping let's see how that works in this detailed yet clear tutorial hi I'm Lisa and as you might have already realized today I'll be covering the topic of Google Sheets web scraping we'll talk about the main reason for using this technique cover situations when it's suitable and present an example of scraping website data with Google Sheets Google Sheets web scraping can be a very effective technique While most ways of web scraping require you to write code web scraping with Google Sheets requires no coding or add-ons all you need to do is use a built-in function of Google Sheets thus it acts as a basic web scraper Google Sheets is a great tool for those who want to begin web scraping without writing any code this way you'll be collecting the data in a spreadsheet which is another Advantage however note that Google Sheets has limited capabilities and you would need to use or create a web scraper script for any Advanced Data Gathering operations and now we're ready to move on to the tutorial part import XML is a function that Imports data from various structured Pages including HTML it accepts two parameters the URL and the XPath quiring you can create an expat query directly from the browser open the web page in your preferred browser right click the element you want to extract and select inspect you'll see the developer tools open with the HTML element highlighted right click the HTML of the highlighted element then select copy and choose copy XPath this action copies the XPath and saves that element to your clipboard let's look at one more example we can extract the first quote from the web page by using a more complex XPath as you can see expat queries can become pretty complex in order to remedy that we suggest putting URLs and their related queries into cells and refer to those cells in our function arguments foreign overview of how import XML and XPath work press Ctrl F or comment F in the Chrome developer toolbar you've opened earlier any XPath you write will be highlighted by Chrome in the HTML code by analyzing the code you can specify the absolute location of an element in our case an anchor tag this is extremely inconvenient though as the structure of the HTML page constantly changes instead of an absolute path to the element you can just select all anchor tags no matter their location let's use the built-in Chrome function codex to see whether our query works as expected as you can see it selects all the links but by specifying an index in the square brackets you can narrow it down to only the first link next let's extract the text of the link by using the text function we can also extract attributes and even select HTML elements based on the values of those attributes now that we've learned some XPath it's time to use it in Google Sheets we will be scraping a website that contains books and their prices start by opening the website and finding out where the book titles are stored using Chrome devtools we can clearly see that the title of the book is stored inside of an attribute that belongs to an anchor tag the anchor tag itself resides within an H3 heading tag navigate to Google Sheets and create a new sheet enter the URL of the web page and the XPath in two cells entering these values in the cells enables us to create a formula that's easy to maintain now type in the function import XML inside of a new cell and pass the URL and XPath to it as an argument press enter to execute this function in a moment you'll get all the 20 book titles you can also extract the prices by passing the same URL and a different expat query to the same function apart from import XML a few other functions can be used for web scripting directly from the Google Sheets document again there is no need for add-ons as this are natively available these functions are as follows import HTML import feed import data you can use the import HTML function to extract data from tables and lists import data function can scrape data when you Target website URL contains data in a CSV or tsv format import feed function can import RSS or atom fits if your target page contains data in a table the import HTML function is perfect for you let's use this function to scrape a list of highest grossing films from Wikipedia start by creating a cell for the URL as usual then invoke the import HTML function specify the cell that refers to the URL a string table as the second argument and an index we can even extract a specific column from the table we fetched by using the index function which accepts a URL row number and column number since we want to extract all rows we skip the row number in the function arguments and only specify the column we wish to parse RSS and atom are two standard XML formats to generate website feeds even though scraping data from them using the import XML function is possible that would still require writing the expat queries thus A specialized function is more suitable there are a few third-party add-ons but we don't need any of them this is where the import fin function comes in it only needs one argument which is the URL of the feed this function will fetch the feed data and present it to you in Google Sheets let's take the example of the New York Times technology feeds to use this function in action let's update our function to include the headers by specifying true as the third parameter and also limit the number of items we fetch you can also extract specific information about the feed using the query parameter you can even narrow down to a title a description or a URL you might see an N A in your cell this means that the information you fetched is empty when you are ready to fetch actual data from the feed use items in your query if you have a website URL that contains a CSV file you can use the import data function to get the data let's fetch sends to CSV from the census.gov website and display it in Google Sheets foreign to answer the question of whether the data stays fresh if you keep your Google Sheets open these functions check for updated data every hour data will also be refreshed if you delete and add the same cell however data will not be refreshed if you refresh your sheet or if you copy paste a cell with these functions the key advantages of using import functions in Google Sheets are you don't need to learn coding data stays reasonably fresh automatically you can use somewhat Dynamic Imports as this formulas can be used as regular Google Sheets formulas which means this can reference other cells there are plenty of drawbacks as well this method is not scalable there is no option to customize the headers no option to send a post request and it's not possible to use a proxy for anything Advanced you would have to rely on either programming or professional Solutions for the last part I will cover a few errors that you can get while scraping using Google Sheets error every result was not expanded means you are accidentally attempting to override data in existing cells error result too large when using import XML narrow the query down more to avoid too many results error this function is not allowed to reference or sell with now Etc it means that you are trying to reference one of the volatile functions such as now rent or rent between in one of the parameters this references may be indirect or direct the import functions can't use most of the volatile functions the solution is to copy and paste values and then reference the values and that's it for today in this tutorial I demonstrated how to scrape website data using Google Sheets for your convenience we also have this tutorial covered in our blog as well as GitHub repository you can find the relevant links in the video description below if you have any questions about this or any other topic related to web scraping feel free to leave a comment below or contact us at hello at oxalabs.io if you enjoyed this video don't forget to like it and subscribe to our Channel thanks for your time and see you soon
Info
Channel: Oxylabs
Views: 3,979
Rating: undefined out of 5
Keywords: web scraping, web scraper, web scraper extension, web scraping tutorial, data scraping from websites into google sheets, data scraping tutorial, webscraper, website scraping, data scraping, oxylabs, oxylab, google sheets, no code web scraping, extract data to sheets, web scraping tool
Id: _T43JfW-T3s
Channel Id: undefined
Length: 10min 49sec (649 seconds)
Published: Thu Mar 09 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.