5 Functions for Web Scraping in Google Sheets

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey there today's video is a compilation of three previous videos that I released we're going through all five built-in functions to import data into your Google sheet I hope you find Value in this please hit the like button and subscribe to my channel for more content like this now let's go can you connect two Google Sheets data to one another yes and it's a built-in function that's easy to use it's called import range you need two things you need the URL of the spreadsheet that you want to pull data from and then you need the spreadsheet range on that spreadsheet that you want the data to come from check it out import range B2 this is a URL for the data over here which is just a bunch of Samsung Galaxy random data that I got on kaggle that's the URL if you want to type it out you need to put it in quotes and type out HTTP blah blah blah I find it easier and cleaner to just paste it into a cell and then reference that cell then also within quotes you need to reference the range of the spreadsheet you're pulling data from I've renamed this sheet tab to data that's why you see data exclamation mark A1 through K and that will as I hit enter pull every piece of data on the Samsung Galaxy phones into my spreadsheet now you can minimize this to say the first 10 rows if you want to just get those first 10 rows any range that you want to put in here what if you want to pull data from a website itself well there's an import HTML function also built in this is going to take you guessed it the URL a query which this is going to be one of two words you're going to either have the word list or the word table and then an index meaning the first table the second table like which number table if there's more than one on the page do you want to pull in now what are list and index those are HTML descriptions of the data on a page if we go quickly to the article that I'm referencing on Wikipedia here and we pull up the inspection developer tools we can see that this right here is formatted as a table in HTML in other cases you may come across a list an unordered or an ordered list of HTML data you want to bring in that's getting into the weeds a little bit let's just check out what our import does here we are again referencing the URL in this B3 cell we're typing table because that's the table we want we're going to get the zeroth table that's the first one on the sheet and check it out it is pulling over this table of information and stats into our spreadsheet let's import some more data into our Google Sheets shall we today let's look at import data and import feed though all of these are built-in functions two Google Sheets that you have access to right now import data all it wants is a website so I've found some data sets over here and I've found this in a CSV format it wants to see either CSV format or tsv format that's comma separated or tab separated values and then I've pasted that in here for our URL to use and all it wants is hey import data URL there it is there's the data it's doing the heavy lifting for us if for some reason you hit a snag you can always go in here and manually put the delimiter as a comma in our case here that's a CSV file that we are pulling in but it shouldn't need that something would have to be messed up for you to have to manually type that in that's it that's easy import data nothing to it import feed is about as simple it needs a URL for an RSS feed so here's the RSS feed page for one of my podcasts and here is the XML code from that actual page it's just a huge long document that keeps track of every single episode and all the pieces of information about a podcast so with import feed we can come down here and say import feed give it the URL then everything else is optional but all I wanted here was the title for each of the items which gives me each episode title and then I wanted the headers to be there so I wanted it to say title at the top and then I said give me 50 items and I actually don't have 50 podcasts but that just pulled in everything that I do have I did the same thing over here to pull in the URLs for each of those so if I went over here then it would actually take me to each individual podcast player page and that's all there is to it now if you wanted to get some more information on some of those options you can pull up the help box here and it'll tell you about those query options like the items and the URL that I used what all you can do with those all right let's talk about import XML this is the last of the five of the import functions that are built into Google Sheets I've gone over the others they're up here for your reference and they will be included in the Google sheet that I've Linked In the description below today though we're looking at a very powerful one but it's also complicated import XML let's start with a web page because we need a URL address and here we have it up here this is my author page at free codecamp it's just got a list of the articles that I've written the titles and the links to them Etc if you pull up the developer tools we can then inspect the page and see some things about all of these elements to use import XML you have to know a little bit of HTML which is how information is presented on a website you can see that down here as I highlight or as I scroll through different items it's highlighting them up there on the web page for instance this H2 class postcard title this is actually this section right here the title to that article as we'll find if we inspect others they are all H2 class postcard titles and then they simply have different text as well as what's called an anchor tag with an href that's the link to this particular article okay so you can find out tons of stuff just by right clicking a page and saying inspect and then it will pull up the developer tools down below let's go into our spreadsheet where I've pulled the address to that page for us to use and it's in a11 and then let's start out by writing something simple ourselves import XML a11 which is where the URL of that page is that's my author page and then XPath query so this is going to take a query that's written in double quotes with a double slash at the start of it and we can put in something as simple as H1 which is or an H2 rather which is a header element and it will pull in all of those titles because they're all H2 elements but it's also pulling this guy right here this little snippet about me I don't want that so in order to be a little bit more specific over here in our actual statement we're going to write a11 again slash slash asterisk that's gonna that's shorthand for saying I want all of these things and the things we want are enclosed in Brackets we're going to go at class postcard title okay that's going to say hey look for anything with the class postcard title and we found that in our page right here okay so once we do that it's going to pull in all of the titles to my articles in order to get the slug or the URL for the individual articles we're going to do something similar it starts out the same way so we're getting that class postcard title and then we're saying hey go one step further nested beneath there is an anchor tag so slash slash a slash at href it gets a little funky right but it's real powerful because you can get super specific on what information you want to pull into your Google sheet so that's saying look at those H2 elements the titles we just pulled and then attached to those is this anchor tag with a URL so bring that over here and sure enough it brings that over here now in order to turn that into a working link I just concatenated the freecodecamp.org part with that slug I just extracted okay this is the Bare Essentials of what you can do but it gives you a good introduction to the power of XPath query uh here is a cheat sheet this will be linked in the description also linked as that Google sheet that we just used this shows you I mean it goes way into the weeds you can find whatever you need to in terms of how to look up different things depending on how it's listed on the page that you're trying to look it up I found everything I needed here it took some problem solving some sleuth work in some cases but super powerful I hope it's helpful for you and importing data to your own spreadsheets let me know in the comments below click like And subscribe to the video really helps out the channel you're awesome thanks and goodbye [Music] foreign [Music]
Info
Channel: Eamonn Cottrell
Views: 9,144
Rating: undefined out of 5
Keywords: Google Sheets, Spreadsheets, Coding, Programming, Excel, Technical Tutorials, productivity, Web Scraping
Id: Hx1Uepq3lLI
Channel Id: undefined
Length: 10min 14sec (614 seconds)
Published: Thu Sep 07 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.