How to Create a Custom Recipe - Rows

Captions
In this video we'll demonstrate how to create your own custom recipe. The video is separated into four sections, each covering a specific topic in the recipe-building process: building rows, building columns, the nav and actions, and finally the find tool and advanced selectors. If you want to jump ahead, you can find those sections as separate videos further along in the help documents, or you can just keep watching this video. So let's jump right in.

First we'll open up Data Miner. I'd like to point out that you'll see an empty space here; this is where your recipes will be stored once you've created a few. Now I'll click on New Recipe, and the first thing you'll see is the Start tab, which is where you pick the type of recipe you need: a list recipe or a detail recipe. A list recipe is for when you're looking at multiple search results, usually spread across multiple pages. We're actually looking at a list page right now: it has multiple individuals and multiple pages, and the way you separate out the individuals is by building your rows. So let's click on the second tab.

As I was mentioning, building rows is where you establish containers. You're essentially telling the recipe to extract information only from within those containers, and that's what keeps each result's data separate from the others. So let's begin. I click the Find button, and you'll see that with your mouse you can now hover over and highlight different areas of the page, different sections or containers for the individuals. What we want to do is hover until the mouse is highlighting one entire example that we want to extract. Right about here is good. Now, on my keyboard, I'll hit Shift, and on the right here in the recipe creator you'll see that there are now a few suggested elements that we can use as
our selectors. The dotted line means the selection is just pending; it shows where we're focused. To actually lock in the container, you have to pick one of the classes or element types suggested here on the right. It looks like we have a class called row, so let's pick that. That doesn't look ideal, because we get multiple green lines, when really we want the green line to wrap around just the specific data we want. Let's try div instead, and div is even worse. Our goal is to create a selector that is specific to just the area you want to extract information from.

When nothing is a perfect fit right from the start, we can use Select Parent, which is a button here on the right. Clicking Select Parent moves the dotted focus line further up the HTML (let me get that out of the way so you can see better), and as you can see, each click moves it further and further out. Let me start over: we're at the row, then I click Select Parent, and now it gives us some more classes to choose from. We now have a class called person container, which sounds appropriate, so let's give that a try. Great: as you can see, the green line now wraps around just the information we want. Now all you have to do is press Confirm. At this point you've established your containers and you're pretty much good to go, so the next step is to build your columns.

All right, let's build some columns. Columns are the third tab here, and they're essentially the individual pieces of data, anything from price to name to URL. All you do is start building individual column cards. Let's fill in the first input field, the name: say we want to extract the URL, so you'll enter URL as the name, and then for the extraction type, rather than text, we
will select URL. Then, once again, we use the find tool. We hover, but instead of capturing an entire container, this is where we go after the specific piece of data. If we want the URL, it's typically on the image or on a title, such as a person's or a product's name. So let's hit Shift, and as you can see we have a few options: one class and one element type. If you're looking for a URL, it will almost always be inside an a element, so that's something to keep in mind. We do see an a element, so we could try that, but it matches multiple things, and remember, we need something specific. So let's use the class headline instead. Great, it's capturing just the person's name. Now we can press Confirm, and we can double-check our work by clicking on the 4 here. As you can see, we have the four URLs; this is a little eyeball preview you can open for each of your columns.

Let's move on to the second column. Say we just want the name this time, so we enter Name, and the extraction type will just be text, since it's plain text on the web page. Then we click the Find button, hover and tap Shift, and once again we have headline. We press Confirm. Let's do one more, just as an example. We'll give this one a name: say we want the image. Rather than text, we'll extract the image URL. We hit the Find button, hover with the mouse, and hit Shift on the keyboard, and it looks like we don't have any options there, so maybe that was a bad hover. Let's try again: hover and hit Shift. Great, now we have a class called person image, which looks much better. Let's confirm, and now we have an image URL. Perfect. Now we can close out of that preview, and that is pretty
much it for the columns video. Next we'll move on to the nav and actions. Thanks for watching, and I'll talk to you soon.

All right, welcome back. We're now going to focus on the nav section of the recipe-building process. This is where you use the same Find button, but rather than building rows or columns, you use it to find the next-page button on a web page. Essentially, you're telling the recipe to find this button and click it to reach each new page, which it then scrapes. To do that, we click the Find button once again, hover over the next-page button, and hit Shift. As you can see, there's once again a suggested class, and it's called next. I'd also like to point out that it is an a element, which means there's a URL here. That's good: we're not extracting that URL, but the recipe uses it to navigate to the next page. So let's press Confirm, and we can test it right here by pressing Test Navigation. Great, it took us to the second page, which means it's working, and there's nothing else we need to do here.

So we can move on to the next topic in this section, which is actions. Actions are typically used for more advanced pages. They're for whenever you need to click on an element to reveal something: say you have to click a show more button to reveal an email, or a display button to show a phone number. That's what you'd use the pre-scrape click for. Then we have an additional one called scroll to page end, for when you need the page to scroll to the bottom. This is in case the web page is long and doesn't load everything right away, or if there are images that use something called lazy
load, where the information won't display until you've scrolled down. So the purpose of that action is simply to scroll to the bottom of the page and let it fully load.

Now, I want to clarify that this action does not handle infinite scroll, meaning pages that keep adding more content to the bottom when you click a show more or load more button. Whenever a page loads more results onto the same page instead of paginating, that's when you use infinite scroll with click. With that option, rather than finding the next-page button, you find the show more button, put that selector in here, and then tell Data Miner how many times you want to click. By default it's 10 clicks, and the wait time is 3 seconds. That means Data Miner will look for the show more button at the bottom of the page, click it 10 times, wait 3 seconds for the full page to load, and then do one single scrape. With next-page pagination, by contrast, it scrapes after every click: click to page 3 and scrape, click to page 4 and scrape, then click to page 5 and scrape. With infinite scroll it clicks once, twice, three times, all the way up to 10 clicks, until all the data is showing, and only then scrapes. Otherwise you would get repeated data, since you're still scraping the same page. I know that was a lot, so hopefully it made sense, but if you don't quite understand it, feel free to reach out and we'd be happy to explain it via email support.

Now that I've gone through the three actions, I'll quickly demonstrate how to add them. I'm going to change things up and go to another page here. Cool, this is where we can practice some actions real quick. Rather than having a Find button for every individual
action, you start from the top here: click the Find button, then once again hover over the item you want a selector for and hit Shift. As you can see, we now have a class called email button, so we'll select that and press Confirm. Then we just copy the selector and paste it where we want it. Since we're clicking a show email button, that's a click, so we paste it here into Click on Element. You can test it to see how it works. Perfect, that's working. Then all you have to do is add it, and once it's added you're good to go.

Just for the sake of demonstrating, I'll do one more example: scroll to page end. All you have to do here is simply add it. Now we have our two actions. You can drag them around if you want to change the order, but I think click on element and then scroll is good. Just so this page works, let's review it real quick; I think it should still work even though we changed pages. We'll save it, naming it Test Scrape One with Click, and you can also add a description, say, "show email and scroll." Cool, now we have our recipe and we can save it. At this point we'll run the recipe here from the middle. Actually, I should show you this: it already clicked, so I'll try that one more time so you can see it. Let's go back to edit, come back here, hide that, and see if I can demonstrate it quickly. We'll click Run Recipe and move that out of the way. Great, hopefully you saw that click, and it's now scrolling. Once it gets to the bottom it might give us some data. Actually, it won't give us any data, because we built the recipe for the previous page, so sorry if that is confusing.
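The rows-and-columns model from the first two sections can be pictured in a few lines of Python. This is a minimal sketch, not Data Miner's implementation. It assumes a tiny, well-formed markup snippet, because ElementTree cannot parse arbitrary real-world HTML. It also uses ElementTree's limited XPath syntax as a stand-in for the CSS-style selectors the extension suggests. The class names person-container, headline and person-image echo the ones picked in the video, but the markup and the names/paths in the code are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified list page. Real pages are HTML, and ElementTree
# only parses well-formed markup, so this is a toy stand-in.
PAGE = """
<div class="results">
  <div class="row">
    <div class="person-container">
      <a class="headline" href="/people/jane-doe">Jane Doe</a>
      <img class="person-image" src="/img/jane-doe.png" />
    </div>
    <div class="person-container">
      <a class="headline" href="/people/john-smith">John Smith</a>
      <img class="person-image" src="/img/john-smith.png" />
    </div>
  </div>
</div>
"""

# Row selector: one match per result (the "container").
ROW = ".//div[@class='person-container']"

# Column cards: (column name, path relative to the row, extraction type).
COLUMNS = [
    ("URL", ".//a[@class='headline']", "url"),
    ("Name", ".//a[@class='headline']", "text"),
    ("Image", ".//img[@class='person-image']", "image"),
]


def extract(el, kind):
    """Return the text, link (href) or image URL (src) of an element."""
    if el is None:
        return ""
    if kind == "url":
        return el.get("href", "")
    if kind == "image":
        return el.get("src", "")
    return "".join(el.itertext()).strip()


def scrape(markup):
    root = ET.fromstring(markup.strip())
    table = []
    for container in root.findall(ROW):
        # Columns are searched only inside the current container, so one
        # person's fields never mix with another person's.
        table.append({name: extract(container.find(path), kind)
                      for name, path, kind in COLUMNS})
    return table


if __name__ == "__main__":
    for record in scrape(PAGE):
        print(record)
```

Swapping ROW for the broader `.//div` would match three elements here, including the outer row wrapper. That is the same problem as the multiple green lines shown in the video.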
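The nav step can be sketched the same way. This is a toy under stated assumptions, not a real crawler: the three-page site is a hypothetical Python dict, and `scrape_with_pagination` is an illustrative name. The scraper reads the rows on each page, then looks for the element with class next and follows its href, which is why it matters that the next button is an a element. As described for next-page pagination, it scrapes after every page.

```python
import xml.etree.ElementTree as ET

# Hypothetical site: URL -> page markup. Every page except the last has an
# <a class="next"> whose href points at the following page.
SITE = {
    "/list?page=1": '<div><p class="person">Jane</p>'
                    '<a class="next" href="/list?page=2">Next</a></div>',
    "/list?page=2": '<div><p class="person">John</p>'
                    '<a class="next" href="/list?page=3">Next</a></div>',
    "/list?page=3": '<div><p class="person">Mary</p></div>',
}


def scrape_with_pagination(start_url, max_pages=50):
    results, visited, url = [], set(), start_url
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        root = ET.fromstring(SITE[url])
        # Each page holds new rows, so scrape before moving on.
        results += [p.text for p in root.findall(".//p[@class='person']")]
        # Nav selector: the "next" element is an <a>, so its href is the
        # address of the next page. No "next" element means we are done.
        nxt = root.find(".//a[@class='next']")
        url = nxt.get("href") if nxt is not None else None
    return results


print(scrape_with_pagination("/list?page=1"))  # ['Jane', 'John', 'Mary']
```

The `visited` set and `max_pages` cap are defensive choices for the sketch. They keep a link that points back to an earlier page from looping forever.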
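The difference between pagination and infinite scroll with click comes down to when the scrape happens. The simulation below makes several assumptions and is not Data Miner's code:

- `LoadMorePage`, `ITEMS_PER_LOAD` and both functions are hypothetical.
- The defaults of 10 clicks and a 3-second wait simply mirror the values mentioned in the video.
- Following the video's description, the sketch waits once after all the clicks. A real page may also need a pause after each click for new content to appear.

```python
import time

ITEMS_PER_LOAD = 2  # hypothetical: each "show more" click reveals two rows


class LoadMorePage:
    """Toy page where "show more" appends rows to the SAME page."""

    def __init__(self, total_items):
        self.total = total_items
        self.visible = ITEMS_PER_LOAD

    def click_show_more(self):
        self.visible = min(self.total, self.visible + ITEMS_PER_LOAD)

    def scrape(self):
        return [f"item {i}" for i in range(1, self.visible + 1)]


def infinite_scroll_with_click(page, clicks=10, wait_seconds=3.0, sleep=time.sleep):
    """Click `clicks` times, wait, then scrape ONCE."""
    for _ in range(clicks):
        page.click_show_more()
    sleep(wait_seconds)  # give the fully expanded page time to finish loading
    return page.scrape()


def scrape_after_every_click(page, clicks=10):
    """Pagination-style scraping applied to the wrong kind of page."""
    results = page.scrape()
    for _ in range(clicks):
        page.click_show_more()
        results += page.scrape()  # re-reads rows that were already collected
    return results


if __name__ == "__main__":
    once = infinite_scroll_with_click(LoadMorePage(6), sleep=lambda s: None)
    repeated = scrape_after_every_click(LoadMorePage(6))
    print(len(once), len(repeated), len(set(repeated)))  # 6 60 6
```

On this six-item toy page, scraping once returns six rows. Scraping after every click returns 60 rows with only six distinct values, which is the "repeated data" problem from the video.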
But what we can do is edit it once again. Let's go back to our original sandbox page and get rid of the actions, since you've already seen them, and then save the recipe again. I'll clear that out, since this will be just a basic scrape, and then click Save As. Now we have two separate recipes that we just built. Just so you can see this run from the original recipe viewer, I'll open up Data Miner once again, and as you can see, we now have our two recipes: Test Scrape One with the actions, and Test Scrape One without any actions. You can also see the indicators here on the right. Let's just run it, and now that we're on the proper page, you can see the information we extracted: the URL, the name, and the image URL. At this point you can download it as a CSV or Excel file, or copy it to your clipboard. That is pretty much it for this process.

We have one more section to cover, which is the find tool and advanced selectors. What we've covered so far is pretty much all you need to build a recipe, but I still recommend watching that last video, because it covers the details and the tips and tricks you'll need to tackle more challenging websites. I hope this video has been helpful so far, and I'll talk to you again in a moment.

All right, welcome back. We're on our last section, which is the find tool and advanced selectors. As you can see, we have our two recipes from before. What we'll do now is edit one of them and go through the find tool in a little more detail, and I'll cover some advanced selectors you'll need for more challenging websites. First, let's just quickly
review. We established our rows: we have four rows, using person container. Then we have the individual columns: URL, name, and image. Then we did our nav, and we also added actions from the previous page, but we got rid of those because they aren't for the page we're currently looking at. Cool. Now let's jump back to the columns, because this is where you'll spend most of your time using the find tool and advanced selectors.

Let's create a new column, name it, say, testing, and click the find tool. As I mentioned in the earlier videos, the find tool activates the hover, so you can highlight different elements by moving your mouse over them. Once you hit Shift, the element is in focus, but you need to pick a selector here on the right to lock it in. A little information about the options on the right: we have classes, element types, and one other item you'll sometimes see, called IDs. This is all taken directly from the HTML, and what we essentially do is use that information as anchors, so Data Miner knows where to find the data based on the classes or elements. Sometimes the classes or elements will be random-looking strings of numbers and letters; other times they'll be organized into readable names. It really depends on the site. Ultimately, your first step should be to click around and use trial and error to see what works best, and the way to gauge what's working is simply to look at what's highlighted on the website. For example, if I try the p element, you'll see that we have multiple p's, which might give a disorganized output when you finally scrape. That may not be ideal, especially if you're only going after this one piece of information right here
that I hit Shift on. So let's try title instead, and as you can see, title is much more specific to what I was going after, and it has four items. That is essentially the process for finding the right selector. If there aren't any good ones, you use Select Parent to move up, which gives you more options, though it may also take you further away from the data you're after.

Actually, sorry, let me backtrack and go back to just the p. As you can see, we have multiple p elements. Let's say this is your only option and you have to use this element type as your selector. What you can do is use the arrows here, labeled "Choose a sibling element." This essentially creates an ordered list of all the p elements, and with the arrows you can say "I want the first one" or "I want the second one" by clicking up or down. If we keep clicking up, you'll see we go through all the different p's on this page, so you can pick the one you want just by using the arrows. A quick note about the sibling arrows, though: it's all based on position. Let's say you have titles like title one, title two, and title three, and title one happens to be missing on one page. Then your other two p's will shift, so they'll be misaligned by one cell. Since it's based on order, it always picks the first or second element; it can't produce a structured format where the missing value leaves an empty cell. Instead, everything just shifts. I hope that makes sense.

Now, back to what I was saying before about the parent element: you can use it to move further up the page to get more options. As you can see, we now have basic info and one called container style, and then we
have padding and margin. As you can see, the further up you go, the further out from the information you get. So you can use the parent button as a way to get more selector options, or you can use it to create a kind of road map, meaning you combine different selectors into a specific one that works for you.

Let's play with that a little bit. I'll clear this out and start fresh with a specific example. We create a new column; let's say I want to capture the experience. I hover and hit Shift, and as you can see we now have an experience class. Let's try that. It looks good, so we can press Confirm, but as you can see, we're also capturing the word "Experience," which is not ideal if you want just the actual number of years. So let's go back in (actually, it's already there). As I mentioned, you can hover and hit Shift, and you'll see we have a strong element here, which could have been an option, but on its own it matches multiple things. As I mentioned earlier, you can combine multiple selectors to create a road map. So we use the original class, experience, together with strong. Essentially you're saying: I want the experience mini-container here, and then, in the element selector box, we type the word strong, because that's what focuses on just this one piece of information. So again: we selected the class experience, then typed in the element strong, and that gets us just the information we're after.

Let's do that one more time with a different example. Say we want the industry; let's give the column that name. Cool. We'll click Find and hit Shift, and once again strong has too many other things it's associated
with. If we use industry, we get the whole word "Industry" along with the actual industry itself. So we select industry, press Confirm, then type strong, and now we're getting just the industry. Cool.

Now I'll do one more specific example with that title scenario. Say we want the clearance, but let's pretend there's no clearance class and you're stuck with just the p element. Obviously that gives way more information than you need if you want just the clearance. What we can do is use an advanced selector called contains. We type a colon, immediately followed by the word contains, then open and close parentheses, and inside them you type the word you're searching for: not the data you're actually after, but the label you're looking for. So let's type Clearance. What we've done is tell Data Miner to look for the word Clearance on this page, and now we can once again type the word strong to get just that one piece of information. This works for any of them: we can replace it with Experience or with Industry, and pretty much any word works with contains, as long as it's on the page.

Let's see, I feel like we should do one more example for these selectors. One more thing I'll show you is more focused on the recipe creator itself. Let's select the title again, and say that for some reason we don't have any good classes or element types. What we can do at this point is go to the bottom of the recipe creator and select the option to view the element's HTML. This shows us the full HTML available, all the raw information that Data Miner is pulling data from.
And even though we built this site to be easy, with everything available, we can pretend it's a bit more difficult. Let's say there was no class and no a tag. What we can do is go into this HTML, highlight something, and copy it. For anything raw taken from the HTML, all you have to do is put it in square brackets and paste the full thing inside, and it works just as if it were one of the suggested selectors. Even though we already provide you with classes here, we could pretend this was something like data-attribute equals headline, or maybe title equals headline; really, anything from here works. In fact, I wonder if we could try the unique URL, the href. I wouldn't recommend this, because it's going to be specific to just this one person, but it should work. Cool. Essentially what we did here was say: highlight anything with this href, so it's getting the name and the image.

I think that is about it. There are more selector options in a table further down on the help page, and they all work in a similar way. Essentially, you want to become familiar with those, and as you learn to use them you'll be able to create more advanced selectors. At that point you can use them on pretty much any site, because even though sites differ, the same tips and tricks still apply. I hope these videos were helpful. Thank you for watching, and of course, if you have any questions, feel free to reach out to us via email support. Thanks, and I'll see you later.
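Pressing Select Parent can be pictured as walking up an element's ancestors, with each step offering the classes of a wider container. This sketch uses Python's ElementTree on hypothetical markup. ElementTree elements don't keep a link to their parent, so the illustrative helper `parent_chain` builds a child-to-parent map first, a common ElementTree idiom. It shows the idea only and is not how the extension works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a <p> with no useful class inside two classed wrappers.
CARD = """
<div class="person-container">
  <div class="basic-info">
    <p>Engineer</p>
  </div>
</div>
"""


def parent_chain(root, el):
    """Return el's ancestors, nearest first (like pressing Select Parent repeatedly)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    while el in parent_of:
        el = parent_of[el]
        chain.append(el)
    return chain


root = ET.fromstring(CARD.strip())
start = root.find(".//p")
# Each step up offers the class of a wider container as a selector option.
options = [ancestor.get("class") for ancestor in parent_chain(root, start)]
print(options)  # ['basic-info', 'person-container']
```

As the video notes, the further up the chain you go, the more the selected container covers beyond the data you actually want.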
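The caveat about the sibling arrows is easy to reproduce. In this hypothetical two-card example, the second card is missing its first p, so picking "the second p" by position returns different fields for the two cards. ElementTree's positional predicate `p[2]` stands in here for whatever positional selector the arrows actually generate; that mapping is an assumption.

```python
import xml.etree.ElementTree as ET

# Two hypothetical result cards; the second one is missing "Title one".
CARDS = """
<div>
  <div class="card"><p>Title one</p><p>Title two</p><p>Title three</p></div>
  <div class="card"><p>Title two</p><p>Title three</p></div>
</div>
"""


def nth_p(card, n):
    """Return the text of the n-th <p> child (1-based), or '' if absent."""
    el = card.find(f"p[{n}]")
    return el.text if el is not None else ""


root = ET.fromstring(CARDS.strip())
cards = root.findall("div[@class='card']")

# Position-based: the same column ends up holding different fields per card.
second_p = [nth_p(card, 2) for card in cards]
print(second_p)  # ['Title two', 'Title three']
```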
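Combining a class with an element type works like a descendant selector: first the container, then the element inside it. In CSS this would be written roughly as `.experience strong`. The sketch below expresses the same idea in ElementTree's path syntax on hypothetical markup. It compares the whole container's text with the text of just the strong element. The `text_of` helper is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical card: each label and its value share one container.
CARD = ('<div class="person-container">'
        '<p class="experience">Experience: <strong>5 years</strong></p>'
        '<p class="industry">Industry: <strong>Retail</strong></p>'
        '</div>')

root = ET.fromstring(CARD)


def text_of(path):
    """Join all text inside the first element matching `path`."""
    el = root.find(path)
    return "".join(el.itertext()) if el is not None else ""


# Class alone: label and value together.
print(text_of(".//*[@class='experience']"))          # Experience: 5 years
# Class, then the <strong> inside it: just the value.
print(text_of(".//*[@class='experience']//strong"))  # 5 years
print(text_of(".//*[@class='industry']//strong"))    # Retail
```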
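The contains selector matches an element by the text it holds rather than by its class. Data Miner's exact syntax is best checked against its own help page. The sketch below only emulates the behavior described in the video: find a p whose text includes a label, then take the strong inside it. It uses a hypothetical helper, `p_contains_strong`, on made-up markup. Like jQuery's `:contains()`, it treats the label as a case-sensitive substring, and it returns the first match.

```python
import xml.etree.ElementTree as ET

# Hypothetical card where no field has a useful class, only bare <p> tags.
CARD = """
<div class="person-container">
  <p>Experience: <strong>5 years</strong></p>
  <p>Industry: <strong>Retail</strong></p>
  <p>Clearance: <strong>Secret</strong></p>
</div>
"""


def p_contains_strong(root, label):
    """Emulate a selector such as p:contains(label) strong.

    Returns the text of the <strong> inside the first <p> whose full text
    contains `label` (case-sensitive substring), or None if nothing matches.
    """
    for p in root.iter("p"):
        if label in "".join(p.itertext()):
            strong = p.find(".//strong")
            if strong is not None:
                return strong.text
    return None


root = ET.fromstring(CARD.strip())
print(p_contains_strong(root, "Clearance"))  # Secret
print(p_contains_strong(root, "Industry"))   # Retail
print(p_contains_strong(root, "Salary"))     # None
```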
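Square-bracket selectors match on any attribute copied from the element's HTML. In CSS, `[data-attribute="headline"]` selects elements whose data-attribute equals headline. ElementTree's `[@attr='value']` predicate is a close analogue, used below on hypothetical markup. The second query mirrors the href example from the video and shows why it is too narrow: it can only ever match one person.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with a shared data attribute and a per-person href.
PAGE = """
<div>
  <div class="person-container">
    <a data-attribute="headline" href="/people/jane-doe">Jane Doe</a>
  </div>
  <div class="person-container">
    <a data-attribute="headline" href="/people/john-smith">John Smith</a>
  </div>
</div>
"""

root = ET.fromstring(PAGE.strip())

# Roughly CSS [data-attribute="headline"]: matches every person's headline.
by_data_attr = [a.text for a in root.findall(".//*[@data-attribute='headline']")]

# Roughly CSS [href="/people/jane-doe"]: works, but only for one person.
by_href = [a.text for a in root.findall(".//*[@href='/people/jane-doe']")]

print(by_data_attr)  # ['Jane Doe', 'John Smith']
print(by_href)       # ['Jane Doe']
```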
Info
Channel: Data Miner
Views: 113,270
Keywords: Data Miner, Data Mining, Data Scraping, Data Scraper, Scraper, Scraping, Scape, Excel, CSV
Id: DAMiHZauQDI
Length: 27min 32sec (1652 seconds)
Published: Fri Apr 06 2018