IMPORTHTML, IMPORTXML Functions - Google Sheets Tutorial to Extract from Web Pages to Spreadsheets 2

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
okay I'll be covering some functions today to import HTML and XML and generally we'll be talking about importing HTML but I'll show you how you could use XML to actually import HTML to because HTML is basically a subset of XML so first of all we're going to start with something very basic so I'm going to go to this page on Wikipedia and if I scroll down I'll see that there is this little table on this particular page in Wikipedia so if I scroll again there is the second table and the third table so basically these are community areas in Chicago and I'll show you how we could import this data to our Google sheet so I'm going to go to my brand new Google sheet and I'm going to start with my equal sign as usual and I'll start typing my functions so the first function I'll be talking about is import HTML so I'll be going over here and by the way today the way I'm picking the function is once you get your autocomplete results you can just arrow down and scroll and choose the function you like and just hit tab and it's just going to populate the rest for you so the first part of this function or the first argument is going to be the URL of the page where the information is coming from so I'm going to go ahead to this Wikipedia page copy the URL go back to my Google sheets and I will have to put the string in quotes because in Excel if you want to or Google sheets doing too much Excel classes apparently makes you say Excel all the time but anyway so in Google sheets we have our strings in quotes otherwise it's going to look for a function or a range that has this name which it does so there it is that's our string which is the page the URL of the so I'm going to hit comma that's going to move me to the second argument so the second argument is what type of form this information is in so import HTML function can include two types of forms a table and a list so a table would be this so this would be a table and that should be an HTML table so the way you can actually make sure that what you're looking at is an HTML table if I right click in my Google Chrome browser and I just hit inspect on this element if I look right here you'll see that the HTML element is the table for this so if the element is a table and we should be able to actually import that using a table so let's go ahead and write table again this is a string so it's going to be wrapped in codes and comma and the last argument is this index so the index is this so on this particular page we have multiple tables if you remember when I said if I scrolled from top down there was this one table then I scroll there's the second table the third table so basically the index is the index of the table on a pitch this is this seems like it's the first table on a page so don't think this point is a table I'm not going to inspect it I'll just go for it I think this is the first one so let's just go ahead and try it so I'm going to just put number 1 as the first table close my parenthesis and hit enter so and yes it turns out that's that table so if we go here and take a look that's the accurate table we've got it looking good and that's our table so if I go back to this function and switch this one to two we should be able now to import the second table from the list which is right here so let's take a look yep we have our information right here from the table so that's our import HTML with our table as a second argument so I'm going to I guess switch this back to number one you have the first table so now let's find an example awful list this one might be a list so let's take a look so again if I look at this see this is a list this is an unordered list this is the UL tag and we have list items and here so this should work for a list let's give it a try I don't think there is another list on top so this seems like it's the first list let's give it a shot so I'm going to go here right here we have some space so I'll go ahead and type my import HTML open my codes go ahead and copy my URL for the page paste it in and then I'll type list as the item I'm looking for and finally I'm going to give the index number one as the first list on the page and if we're not mistaken that is there it is so the first item here is community areas is this entire chunk is the first item and the second item the third item and the fourth and the fifth item so this is how you can use import HTML function so it's pretty straightforward easy to use and there we have our information so I'm going to rename this tab import HTML ok and now I'm going to start a new tab here now we'll be covering another function which is import XML so import XML is what I would say is much more powerful function that import HTML and we import XML you could actually do more and extract more information from the page if you want it to so let's try to use it so for example remember how we have all of this table the first table the second table third table the fourth table the fifth table so let's try something with our import XML functions so I'm going to copy this page because we'll need the URL for the page so I'll go ahead and type import XML that's my function and the first argument is again going to be our URL so I'll go ahead and use my URL then I'm going to use comma and now the second argument is the object from the page or what they call as an XPath query from the page we want so it needs to be in quotes again as a string so I'm going to try something like this I'm gonna give it a table as my strengths so let's see what happened here so I've exported quite an imported quite a bit of information here so that's number of communities so in your northside seems like we're importing the entire area in here as a table so that's kind of a little messy this seems like we've imported all the tables from there but they're very disorganized so this is not very helpful so let's take this a step further so in HTML if I explore my HTML tables you will notice that inside of the table the table structure in regular HTML is that you have the table and inside of the table you have this TR element which is basically the table row element so this is for example would be the first row see that's the headers on top that's the second row the third row the fourth row so what I'm going to do instead of just importing the table I'm going to do something else here I'm going to go ahead and after this table I will go ahead and add another - here and what I will use is my TR as the table brah so let's hit enter let's see what we get so that's interesting now we have a little better formatting we're including our table Bros that what I want you to see is that let's say this is Near North Side let's take a look so what's our first table here so there it is that's near North Side that we should have the loop then your South Side then what is the next one here so near side side and then we have the second table which it has a table bro which is the north center so oh so the first one is again it's not dead it's the header area here so there it is that's the header area and then the next table row so basically we're extracting every single table row from here and then there is apparently some more rows all the way down on a page again this is now much better but what I'm really trying to accomplish I just want to get all the community areas in Chicago and unfortunately for me they're not in a single table so if you look at this this is this community area three of them are listed here then there are more of them listed here then there are more of them listed here and so it goes so they are not very well organized in here but I think I've come up actually looking at this it seems like we've accomplished it with the exception that we have some chunk in this table so let's try to actually get this to look even better so what I'm going to do is look inside of my HTML to make this much cleaner and I'm going to look at our table structures so if we look at our table again the first one was tabled growth which is the element who I was extracted right now so I'm going to look inside a table row and if you see there's this th which is our table heading elements which is this basically the header section of this file and then if I look here this one table bro there are this TD elements so that's the table data element that we have inside of those there it is the second table data the third table data so these are our table data elements that hold the actual data and these are the th elements which are holding our table headings so what I'm going to do I'm going to take this a step further and I'm going to go to my table data elements so right now we're looking inside of the table and it's at a table or finding a table row which is the TR inside of the TR we're finding the TD okay so let's hit enter let's see what we got so right now it says 8:00 north side and whatever this is so let's see what we're extracting so what we're extracting is basically do we've skipped this because these are not table data anymore right and here where our first table data starts so this is the first one 0 8 then we have north side then we have this third box with all of this elements so there it is 8 north side and the third box with all this elements so all of them are just row by row and then we'll go to the next one which is 32 loop and all of these 32 loop all of this and so on so that's now we got two table data elements now what we really wanted was just the community area part right we didn't want all this 0 8 we didn't want all this neighborhood stuff so just the community area so what I'm going to do after this TD I will go with this square bracket opening and since it was the second one so if you look here this is the first one and this is the second one so since it was the second one I'm going to put two in here and close the square bracket I'm going to hit enter give it a second and look what we've got now a clean nice list of all of our neighbor neighborhoods loop New York North Side then North Center like view isn't that nice so that's the way we can get the entire list so this is I want you to note that this is not just one one table this is from all of this tables that are located all over this page but we were able to just grab exactly what we're looking for right here in the second row of each table so there you have it so if I did something like TD three I would grab this neighborhoods column and it would be like this guys so let's go ahead and give it a shot so I'm going going to just give copy this right I'm going to hit escape to get out of here without changing anything and right here I'll try to paste instead of TD - let's use TD three and let's give it a shot let's see what we got so this what this is doing it's doing the third row and as you can see it's just merging all of these together on a single line which is not really looking that pretty honestly but we were able to actually get them so that's the third data table and it's excluding all the mark-up out of it but I'm thinking we could go a step further because we're inside of the third data table and what we're really looking for is this seems like these are list items inside of this table let's take a look so it seems like this is the table data element then we have an unordered list and then we have a list item inside of it so let's try to use that structure to actually get better data so once I've extracted this I'm going to try ul and then I'm going to try a li which is inside of that ul and it looks like it works just fine so there it is the Gold Coast Goose Island look at the fits and smile isn't that great so all of these community areas with just extracting them right out of this box put them line by line right in our worksheet just like that so you could use this structured way of XML to actually grab any part of HTML page you need so for example if you had like a product grid of different products with their links to the product page links to the image links to the you know are not links but the price of the product or the title of the product you could actually go to that page just look at the HTML and look at a structural HTML and give our XML structure just like this and you should be able to grab all those elements out of the page just like that all right that's it that's our import HTML and import XML function and I'll see you next time
Info
Channel: Learn Google Spreadsheets
Views: 126,735
Rating: undefined out of 5
Keywords: importhtml, importxml, google sheets, spreadsheet, excel, tutorial, learn, class, help
Id: 7B4tPczv-H8
Channel Id: undefined
Length: 17min 9sec (1029 seconds)
Published: Fri Jan 27 2017
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.