Master Excel Web Scraping - Learn How to Import Data AND Images (with ZERO Coding)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Getting your data from a web page  into Excel isn't always easy. Well,   the good news is that Excel's Web Connector  recently got an upgrade, which makes it a lot   easier for us to import data from the web. Okay,  so here's the thing—easier is an understatement,   as ChatGPT would say; it's a game changer. So  check this out. I have this web page with some   header text. Then I have some blocks down here,  so we have some popular course package blocks,   and then we have separate blocks for individual  courses. Now, there are no HTML tables on this   page. What if I needed to bring the information  about each course as a separate row in Excel? So,   I want it to have a separate column for the  course category, another one for course name,   for lessons, and so on. Notice these blocks are  clickable, so it's going to be a nightmare to grab   this text. I have to find a way to copy this,  bring it to Excel, format it the way I want,   and then if I wanted to bring over the  links, it's going to add to that nightmare,   right? If I hover over this, we can see the  link on the bottom, and then on top of that,   the whole process is manual, right? If I add new blocks,  I'd have to repeat this nightmare over again. Well, the new Power Query Connector puts an end to  all this. Check out the end result. This is what   we can achieve. We get all the information  that we need from each block directly into   an Excel table that's connected to the web  page. We have clickable links, plus we get   to have our images here, and all of this is  dynamic. So, if the web page content updates,   all I have to do is refresh my query, and I  get the new clickable links added together   with the images. This is super cool, and as  you can see, I'm super excited about this. Originally, I had another web page in mind where I  wanted to demonstrate this Power Query Connector,   but that one didn't have links and images. Then I  saw a request come in that asked us for a list of   our courses, the number of lessons, and the course  link, and that got me thinking. I should be able   to use the Power Query Connector, and there must  be a way to bring over the links and the images,   right? When there's a will, there is a way.  So let me show you my process from scratch. Okay, so the first thing you need to do is grab  the URL of the page that you want to import. So,   I'm going to Ctrl+C this to copy it.  Go open a new blank Excel workbook,   go to Data, From Web, and paste in the  URL that you just copied and click on OK. Now, Power Query is going to create a connection  to that web page, and it's going to try to   identify any official tables and also any patterns  that it finds, and it's going to present those to   you as suggested tables. So, for example, under  Suggested Table 1, I can see this table was   picked up. Now, if we take a closer look at this,  these are actually our course blocks. We have the   course category here, then we have the name, we  have a short description, the number of lessons,   and then the Join Now button. So it starts with  Power BI and it ends with Google Sheets. Now,   if you want to compare this to the actual content  of the web page, you don't have to open up your   browser. You can just switch to Web View in the  same window. You get to see the content of the   page; the images just don't load, but that doesn't  matter. We can see what Table 1 contains. It   actually starts from this section right here,  under "Courses Designed to Help You Succeed,"   and then it ends right here with Google  Sheets, just before the eBook section. Now, you can take a look at the other suggested  tables. I'm going to switch to Table View again   to see if it has the contents that you  need. This one is information on cookies;   we don't need that. This one contains some more  information. This contains pretty much all the   blocks that we have. We also see the popular  course packages here and also the eBooks. Now,   one thing that all of these suggested tables  don't have is the course link. They also don't   have the course image. So, what are our other  options? Well, we have the HTML code here;   you might find this useful depending on the type  of analysis that you are doing. I don't need it in   this analysis. We also have the displayed text.  To grab exactly what I want, I need to go to   Add Table Using Examples. This allows me to give  Power Query a list of what I want, and it's going   to try and pick up a pattern. Let's say from this  page, I want the course category, the course name,   the number of lessons, the link to the course,  and the image that comes with this course. I'm going to start with the course name. I'll call  this column "Name." The name of the course is this   one, so you can start typing in. So, I'm going  to go with "Fast Track," and you're going to see   it pop up in this list, which makes it super  easy. So, we're just going to double-click to   select it. Next, I want the course category, and  this is going to be "Power BI"; that's this one   right here. Next, let's also add in the number of  lessons, and here it's 125 lessons. Now, we want   to grab the link for the course link; it's going  to start with "https"; we get a list of all the   links here, and we have to know what that link is.  So, this one right here is linked to this course,   so that's "Fast Track to Power BI"; that's what  we need. Last, I want to grab the image, and   the image is also a link. So, I'm going to start  with "https"; let's just scroll down to get to the   images. We can see these PNG/JPG files. The one  for Power BI is right here, so let's select that. So, we gave it one example; it hasn't picked up a  pattern. We need to give it another example. So,   I'm going to come here and start typing  in the next course, which is "Master Excel   Power Query Beginner to Pro." It's this  one right here. I'm going to press Enter,   and I can see a pattern was picked up. Next, which  category is this? This is in the Excel category.   When I select it and press Enter, a pattern is  picked up. The number of lessons here is 185,   and again, a pattern was picked up.  Now let's do that for the link as well;   select the link, press Enter, and let's  move to the images. This time I'm just   going to start typing in "PNG" to get all  the images. That's the one for Power Query. So now that it's picked up a pattern, this is  going to be dynamic. So, if I add a new block to   this section, it should be automatically picked  up. I click on OK, and I get a new custom table   created. I'm going to go ahead and transform the  data. Let's adjust the query name to "Courses."   Now, we can see that a couple of steps were  automatically added. So, that's the Source step;   that's the URL. Then, this part is extracted table  from HTML. So, if I click on this gear icon, I go   back to adding table using examples. Then I got  a Changed Type step automatically applied. Now at   this point, you can add any type of transformation  step that you want. For example, we could remove   "lessons" from this step, right? I can just select  this, go to Replace Values, find "Space lessons"   and replace it with nothing. Click on OK, and  then let's just apply a Changed Type step. I'm   actually going to push this Changed Type step down,  just going to drag and drop it down here, and I'll   just go and update to a whole number for this one  and let's replace current. So, these are links,   right, but there is no data type that's called  link or URL. So what's going to happen when I   send this to Excel? Let's go ahead and close  and load. Let's just load this to a new page. Well, I get my course information in a  nice table, but I don't have my images,   and the links that I get aren't clickable.  To get these to be clickable and have to   go into edit mode. So I'll have to  double click in cell, press Enter,   and then I get clickable links or  I have to use other workarounds,   but I don't want that. I want these links to be  clickable from the start. So if I get new course   blocks on that web page and I refresh my query,  these links are going to be immediately clickable,   plus I'm going to see the images of the courses  and not this link. All I have to do is make one   tweak to Power Query. So we're going to go and  edit the query, go to Transform, Structured Column,   and select Create Data Type. This feature is an  underused feature in Power Query, and it's super   helpful for these situations and any situation  where your data has some characteristics. So,   like if you're dealing with Master data, in this  case, our data type name is our courses, right?   That's what describes the data, Course or Courses.  Then, we decide on what are the characteristics of   this data type. So, we want to have the Category  as part of the course, the number of lessons,   so I'm just going to multi-select these  and add these together to this side. Okay, now you're going to see how cool  this feature is. I'm going to click on OK,   and I get my course data type here. So every  time I click to the side on the bottom,   we get the additional characteristics. I  see the name, category, lessons, and so on,   and I have the ability to expand this, but I  don't want that. All I want is the data type,   and I'm going to send this back  to Excel. Okay, let me just go   ahead and change the table design. I'm  not going to take it away completely;   I'll just go with this more minimalistic look, and  let's go ahead and remove the grid lines as well. Here comes the fun bit. What we're going to do  is either click on these cards, so these cards   show you the additional characteristics of each of  these courses. We see the name, category, lessons,   link, and image. So you can click on these  and add a new column. So let's say we want the   category as well. So I'm going to click on that  plus, and it's going to add a new column here.   Let's click on the plus here, add the number  of lessons as well. I want to have the link,   and I want to have the images. It kind of looks  like we are back to the same problem that we had,   right? None of these are clickable, and I don't  have my images. But wait a second, there is   something different this time. Look at the formula  bar; it's using a formula, right? It's referencing   the data type here, which is the course, and then  ".Link" grabs the link information, ".Image" grabs   the image information. Now, because I'm dealing  with formulas here and these are table formulas,   I can update this formula. I'm going to use the  HYPERLINK function to change this text into a   hyperlink, so I'm going to press Enter,  and the whole column updates immediately. What am I going to do with this? Well, I'm  going to use the IMAGE function to grab all   the images. All of these links are clickable,  so if I click on this, I go to the Essentials   page. Now, I just want to expand these  columns so I can see the images better,   and just to make sure that nothing shifts again  or nothing autofits again when I refresh this,   I'm going to go to Data, Properties  and let's uncheck Adjust Column Width. Okay, so that's it. We have everything we need.  All we need to do is test if it really works. So,   I've gone ahead and updated our course page by  including a new block for our newest course called   "Quick Wins with ChatGPT." The question is when  I update my Power Query table, am I going to see   this image and am I going to see a clickable link?  Let's test it out. Let's go back to our Excel   table; this is it. The last one is Google Sheets.  Let's right-mouse-click and refresh, and we get   "Quick Wins with ChatGPT." We have a clickable  course link right here, and we can see the course   image. This is super cool, and you can pretty much  use this technique on any web page that you come   across, as long as it doesn't have JavaScript  that changes the load behavior of the page, on the fly. Before we wrap up, if you've enjoyed the  sneak peek into the power of Power Query,   you're going to love what's in store for you in  our "Master Power Query Beginner to Pro" course.   Over 50,000 professionals have transformed their  Excel skills with our Power Query course. So,   whether you're just starting out or  looking to sharpen your Excel skills,   this course is going to be your ticket to  becoming an Excel Pro. Check out the link   in the description to start learning today.  Thank you for watching till the end. Thank   you for being here, and don't forget to like  this video if you found some value in it,   and do subscribe if you unsubscribed yet,  and I'll catch you in the next video.
Info
Channel: Leila Gharani
Views: 247,048
Rating: undefined out of 5
Keywords: XelplusVis, Leila Gharani, Advanced Excel tricks, Excel online course, Excel tips and tricks, Excel for analysts, Microsoft Excel tutorials, Microsoft Excel, Excel 2016, Excel 2013, Excel 2019, XelPlus, Microsoft 365, Excel 365, webscraping, web-scraping, excel import from web, import web page to excel, how to, web scraping, web scraping tutorial, data scraping, power query clickable links, power query import images, excel import web images, excel import URL
Id: EOhSODFLqic
Channel Id: undefined
Length: 14min 20sec (860 seconds)
Published: Thu Dec 14 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.