Watch Me Build An Instagram Scraper & Generate +400K DMs/Day With AI

Captions
what's going on everybody, it's Nick, and in this video I want to run you through how to scrape Instagram effectively, scalably, and with data that lets you customize DMs and outreach using AI. I'm going to walk you through my entire personal funnel from end to end, show you a couple of software platforms, and give you a behind-the-scenes look at how to do this realistically within a company. If that sounds like something you're interested in, stay tuned and let's get into it.

Okay, first things first: I've got my Instagram profile open here, and just so nobody yells at me, I'm only going to use my own profile, and maybe some random bot profiles I find, for the purposes of the scraping. When you go on my profile you see a couple of things: a little profile description, a link to Threads, the first name and last name, the spot where the bio or description goes, some linked websites, and then a bunch of images. Essentially, over the course of the next few minutes, I want to show you how to get all of that information in JSON format and then use it in an AI flow. In particular, we're going to feed some image post data into AI and have it customize the first line of a message that we're sending out to people. In my experience (and I do cold outreach basically for a living at this point), the best cold outreach is customized, but it's also incidental; the way it comes across is almost like you're talking to a friend at the bar, or just texting somebody on your phone. It should be "hey X, saw your post about Y, wanted to talk about Z," if that makes sense. That's the format I'm going to follow here.

Now, I should make it known that Instagram has changed their API, so you can't actually send DMs (I was going to say "legally," but it's not like Instagram's the government... not yet) just using the Instagram API. So we can't do the sending of the DMs on our end, unfortunately, or at least I'm not going to show you how in this video. There are a variety of web-browser automation tools that allow you to do that sort of thing, and if you're interested in having me run you through how to actually go and send a DM (a bot that opens a browser, clicks on messages, clicks on a DM list, and so on), I can do that for you. Just note that Instagram is probably the world's most sophisticated web service, so they're very, very secure and very, very difficult to automate; this is a problem they've basically been solving for the last 15 years, or however long they've been around. Regardless, we're going to get a bunch of data, and for the purposes of automation I'm just going to dump it all into a big Google Sheet. Then you can do whatever you want with that Google Sheet data: add it to your own CRM with a task for a virtual assistant or a salesperson, run some additional analysis on it, whatever. At the end of it we're going to get the images, the profile, the URL, and then something to tell them. That's what we're going to develop over the course of the next few minutes.

So, let's talk tools. You can use a variety of tools for Instagram scraping; there are probably 30 or 40 of them out there now. What I'm going to show you here is my favorite, which is called Phantom Buster. Phantom Buster is basically... I don't really know exactly what you
want to call it; they use browser automation under the hood, so they're going out and clicking on links for you, scraping the page, paginating, and all that. But they're probably the most secure, or "safe," I want to say, because you obviously need to log into a profile in order to do the scraping, and as a result of using Phantom Buster you can get very, very safe data reasonably cheap. The way they do their billing is by execution time: they have cloud servers hosted somewhere else on the web, and instead of billing you per run like make.com does, they bill you per minute used. I just get the Starter plan, I run probably four or five businesses out of it, and it's perfectly fine; I basically never run out of operations. So you're probably good with a Starter plan, and you could also just use a trial if you wanted to build out something relatively similar to this video. But yeah, I'm going to use Phantom Buster for this, and there are a variety of other tools as well.

We're also obviously going to be using make.com. If you're unfamiliar with make.com, I have a whole tutorial on how to use that platform; it's an extremely powerful automation platform that I've made a ton of money with, and you can too. And then we're just going to be using Instagram. Let me walk you through what a flow here might look like. I'm going to start off by using Phantom Buster to get a bunch of example data; with that example data I'll feed it into AI, we'll do the prompt, and after that I'll show you how to import all of that into Make.

First things first, let me just see which Google account I have access to here, because we're going to be adding a row to a Google Sheet. Okay, so this is going to be on my other email; I have so many emails open here, it's bonkers. We're going to call this "Instagram Phantom Buster scraping." I'm not going to fill in any of the fields right now; I'm just going to share it with an email address. This is the less glamorous part of a lot of these automations: in order to access the correct sheet and all that, you just need to make sure permissions are good.

Next up, I'm going to go into Phantom Buster. I'll assume you've already signed up; when you do, you get a page that looks something like this, and you see a bunch of "Phantoms." I guess that's a cool branding quirk: a Phantom is just a bot, they're just calling it a Phantom instead of a bot. If you click "Use a new Phantom" and type "Instagram" in the top left, you'll see there are a variety of Phantoms, or a variety of bots, we can use to get our data, and the data returned by each of these Phantoms is quite different, so pay close attention to what we're doing. There's a Follower Collector, which extracts the followers of an Instagram account; if you click "more," you'll see what you give and what you get. You can see we get the profile URL, username, image URL, full name, ID, isPrivate, isVerified. This is probably what I would use if I were doing cold outreach for people in a particular niche: I would find a followee, an authority or influencer that's sort of definitive in that niche (maybe Marques Brownlee or something like that for tech products), and then I'd extract or collect their followers based off the little followers button. It goes through and finds all that information, and then what I'd do is feed every profile into another Phantom that helps me get image data and post data and that sort of
thing. Because I'm doing this just on my own profile, I'm only going to use one. You can see there's auto-comment, auto-follow, auto-liker; you can do a lot of active things with this as well, and if you want me to do a video on any of that, just let me know, I'm happy to. I figured I'd keep it simple here, though. Okay, "Profile Post Extractor," that might be the one... oops, I don't actually want to use this; I'm going to go back here and just click "more." We'll look at the Profile Scraper (this might be the one we're going to use), there's a Story Auto Watcher (that's pretty interesting), Story Extractor, Story Viewers Export, Tag Post Extractor. I'm pretty sure the one I want is called the Profile Post Extractor. We're not going to use the Photo Liker, obviously. Let me just see what we can get with the Profile Scraper: you can get the bio, the website, all that stuff, but no, we're going to want to use the Profile Post Extractor. I've actually already taken one of the Phantoms here, so I'm just going to use the one I have right over here, "Instagram photo extractor"; just find that and use it. Okay, great. I'm going to call this one "YouTube Instagram photo extractor."

Once you're done with that, click into the Phantom and you'll see a page like this. It shows your run history; you can see I ran this on my profile a while ago, probably when I was thinking about doing this video, and it showed me information like the comment count and the like count. You get a ton of information, which is pretty nice. Next, you've got to click on "Setup," and it's going to ask you for your Instagram session cookie.

If you're unfamiliar with what a session cookie is: when you log into a service and put in your username and your password, then in order to prevent you from having to resend your username and password every time you want to do anything on the platform (any time you want to like a post, say), Instagram will basically do some big mathematical operation and give you a token. Now, the fact that I'm exposing my session cookie to you right now... basically everybody would say, "Nick, this is stupid, you shouldn't be doing this." But I know the session cookie turns over every so often, so I'm not personally worried about anybody using mine. If you're using this in a company or somewhere you have a personal Instagram account, though, just make sure you hide it and you're not showing it to other people in the business. Odds are they wouldn't know what to do with it anyway, but it's something to keep in mind. A session cookie is like a secret code they give you as a result of putting your username and password in. You can't recreate the username and password from it, but anybody who has it can log into your Instagram account and do what they want with it.

Anywho, I'm going to click "login," and we'll do instagram.com/nxra; that's just the URL of my profile. When we scale this up we're going to do this on multiple profiles, so we're not actually going to be using just this one; we'll feed profiles in one by one. If you scroll down a bit, it says you can give your profile URLs in one of the following formats: the URL of a single Instagram profile, the URL of a Google Sheet containing a list of Instagram profile URLs, or the URL of a CSV file containing a list of Instagram profile URLs. So we can scale this up any way we want. We could run this once per Instagram profile, but that's probably a little inefficient
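To make the session-cookie idea above concrete, here's a minimal sketch (my own illustration, not Phantom Buster's or Instagram's actual code): the cookie stands in for your username and password on every request, and since it expires, you also want a simple rule for deciding when it's due for a refresh. The `sessionid` cookie name and the two-hour lifetime here are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed values, for illustration only.
SESSION_COOKIE = "paste-your-session-cookie-here"
COOKIE_TTL = timedelta(hours=2)  # assumed lifetime; the real one varies

def auth_headers(session_cookie: str) -> dict:
    """The cookie stands in for username+password on every request."""
    return {"Cookie": f"sessionid={session_cookie}"}

def cookie_needs_refresh(last_refreshed: datetime,
                         now: Optional[datetime] = None) -> bool:
    """True once the cookie is older than its assumed lifetime."""
    now = now or datetime.now(timezone.utc)
    return now - last_refreshed >= COOKIE_TTL

# Hypothetical usage with the requests library:
# requests.get(profile_url, headers=auth_headers(SESSION_COOKIE))
```

This is the same logic the "authentication" sheet later in the video implements manually: store the cookie next to a "date last accessed" column and refresh it once it's stale.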
What we're probably going to do is use a Google Sheet or a CSV file; you can generate a Google Sheet every time you want to run this and then pass that Google Sheet's URL in. Pretty straightforward stuff. You have some other things you can define over here, but since we're not using a Google Sheet that's not a big deal. You can set the number of posts to extract per profile; to keep automation limits as low as humanly possible, I'm just going to extract three, have AI tell me something about those three, combine their descriptions, and use that to say something. The number of profiles to process per launch is going to be one. There are some advanced settings down here, but usually you don't have to worry about those. We'll click save, I'm going to run it once just for the purposes of this example, I'll say "no thanks" here, and then I'm going to launch. Now it's actually going to go out, log in to my Instagram by feeding in the session cookie, navigate to the URL, scrape the JSON from that whole page, and feed it back to me in a format I can use.

That took not even five seconds, now that I'm looking at it. It's a successful authentication; it says "no new posts found" just because I ran this a few days ago when I was figuring out what I wanted to talk about in this video. But the end result is you get this information: the post URL, with columns for description, comment count, like count, location, location ID (Port Moody, British Columbia maps to a specific location ID), publication date (you can see I haven't posted anything in like five years, Jesus), liked by viewer, isSidecar (I don't know what that means, but it means something), type, caption, profile URL, username, full name, image URL, post ID, query, timestamp, tagged full name 1, and tagged username 1.

So what I'm going to do here, just for the purposes of this demonstration, is download this file (it's going to be a CSV), and then, remember that Google Sheet from earlier? I'm going to upload what I just downloaded into that Google Sheet, and it'll just recreate it. Okay, great, and now I have the data in a format I can work with. Realistically, I did three images here, but thinking about the quantity of data coming in, we might just want to do one and have it tell us something about the last image; that's probably smarter. We could loop over three, but the token cost might be a bit much, so why don't I just cut these other two out and use one. The reason I'm doing this, to be clear, is that I want the data in a format I can then use to call a Make scenario, and Google Sheets is just the simplest and easiest, and I think most people here intuitively understand it. I'm just using it as an example; we may not actually have a Google Sheet at the end of this.

Great, so now let's actually go out and do something with this. I think I've shared this to the correct place... yes, I have. Let's go to make.com. I'm just going to use Google Sheets as the source, and for the purposes of this I'm going to say "search rows," although in the future maybe we won't use search rows, I don't really know. It looks like there's some issue validating my account, so let me see if I can find this spreadsheet here; it should just be "Instagram"... yeah, "Instagram Phantom Buster scraping," beautiful. We're going to pick "results," our wonderfully descriptive sheet name, and run this puppy as a test to get the data. You should always be running things before you build out your full flow, just to make
sure that everything you're doing is necessary and you're getting all the data you need. So you see I got my post URL, the description, the comment count, and there's an image URL here. If I copy and paste this image URL, it says "URL signature expired." That's kind of weird... oh, you know what, I think I know why this happened: I haven't run this in a couple of days, so when it scraped, it probably didn't pull a new URL. Instagram updates its URLs all the time using what are called URL signatures, just to make sure people aren't scraping data, storing it, and then using it, I don't know, six months later; they're very, very careful about this. And when I ran this, in order to preserve my resources, Phantom Buster used a sort of cache: it said "we've already scraped this image, so I'm not going to scrape it again." So we may just need to scrape the image one more time. Okay, so this should work: one new post extracted. If you remember, last time it said there was no new data to update, but if I go to the image URL it pulled now, I bet you it'll work. Yeah, there you go: that's me absolutely demolishing a Subway sandwich.

Okay, so I'm just going to download this new results file, go back over here, and re-import it, just so we're all clear on what you would do if you were testing this out on your own. You just drag and drop it here, and I'm going to replace the spreadsheet, because this is the old data and I want the new data. Okay, cool, and now I have an accessible image URL with a valid signature. The signature, I believe, is just in the URL after the question mark ("stp" and then this whole long string); it takes the current date and time into account and makes sure you're not using the URL six months from now or whatever. I don't know if you're familiar with how that works, but there were a bunch of services out there that tried scraping Instagram and hosting Instagram alternatives for a while, and this is just how they shut all that stuff down.

So now that we have the data, why don't we do something cool with it: let's feed it into AI. To do that, I'm going to go to "add" and type "OpenAI," and... I'm not going to do "create a completion," I'm going to do "analyze images." Let's do that. The image we're going to supply is going to be an image URL, and in particular this image URL (it's literally called "image URL"). Then I'm going to say "the following is an Instagram image." That shouldn't be a problem, but sometimes when you feed stuff like this in with brand names, GPT-4 or GPT-3 or whatever is doing the image analysis will be like, "I'm sorry, I can't scrape from Instagram. I'm so sorry, Dave," or whatever the HAL 9000 meme is. So we'll say "the following is an Instagram image," then "I've included a caption below as well," then the caption, and then: "use the image and the caption to write a customized one-line introduction. Use the following format." We could do JSON... you know what, let's just see how it goes without actually doing JSON formatting and stuff like that. I'm doing this off the cuff; I haven't run through this specific example, so we can sort of learn together and iterate, and hopefully this will show you at least some insight into my thought process when I'm designing these things. I'm providing it some examples because I want to steer it in a direction where I'm just not going to
hate the first output. Yeah, that looks pretty good; we'll just roll with that. We're going to supply the image URL. Max tokens: 300 is probably overkill, we probably need like 150. Are there any advanced settings I need? Let's decrease the temperature; top p looks good. So we'll give that a little run... "unable to parse range." Oh, right, because I just re-uploaded this file, so I've got to go back to the sheet name and point it at the new results file. This should be okay now. So we're going to run this puppy again: we've got the image URL here, so it's currently reading the image, it's also taking the caption, and it's telling us something about it. Let's see what the result was: "your latest snap is epic, taking a big bite with those cool shades on." Okay, that's kind of lame. Let's just add "using a no-frills tone of voice" and run this puppy again. "Going all in on that burger, I see. Hope it tasted as good as it looks." That's a pretty good output, which looks nice.

Now, there are a couple of different ideas here. You can use JSON like I've used in the past: I'd say something like "write in JSON format," give it an example, have it output JSON, and then parse the JSON. I usually do that when it's outputting multiple things: if it's outputting, say, an icebreaker, plus a reason, plus maybe a heading and a subheading, I'll do it in JSON because it's really easy to index into afterwards. In this case all I'm really doing is generating a one-line icebreaker, so I don't really know if I need anything else; I just wanted to walk you through what my thought process is there. It looks like there are quotes in what it generates, and the quotes seem syntactically important; that's probably because I provided quotes in the examples, just to disambiguate them from the previous text. So I'll leave the quotes for now and then just trim or remove them as necessary.

I think that's all I need to do there, so next I'm going to go back into this Google Sheet. There are a couple of different ways you can do this. You could hypothetically create a new Google Sheet called "Instagram Phantom Buster DM list" or something like that, but I think anybody I get to do this task, like a virtual assistant, could use this existing information to improve the quality of the message if they need to. So what I'm going to do is add a new column all the way at the end, called something like "DM icebreaker," then copy in a template I've set up here to sell my service and make it very easily copy-and-pasteable. If I could go back in time, I might do this in Airtable; Airtable is just a lot easier to build these formulas and such in, and I think it's more straightforward for virtual assistants too, but I know not everybody here has access to Airtable, so maybe this is the better call. For this column I'll just write "DM"; we can't camel-case "DM," so I'll just say "DM." Then I'm going to go back to the integration in particular; we're not going to add a row, I think we need to update a row or something like that. Let me see... it's probably "update a row." First thing we've got to do is run this again, because we have a new structure here with a new column. I'm going to connect this, and I need to use a different account; I just have so many of these and I haven't really gone through and swapped them, and
then, what am I doing here? I'm looking for Instagram... no, this might be under "shared with me." We'll do Instagram... okay, great. The sheet name is going to be whatever the new results file is, and the row number, in our case, is just the row number from the previous Google Sheets module. What we're updating here is the DM, and it's going to be "hi" plus the person's first name. The way I'm going to do the first name is really hacky: I'll take the full name, split it on the space, and take the first result. DMs are usually one line, so I don't actually need a newline. I want to keep the DM as short as possible because I think there's a DM character limit on Instagram; let me see what it is... a thousand characters, that's actually pretty long, we're probably not going to hit it. Then I'm going to feed in the icebreaker; we should just replace all of the quotes with an empty string, and that should work. So this will be our one-line DM. Let me check whether my one-line DM is going to include punctuation: we've got an exclamation point there, a period here, and an exclamation point, so it's probably going to include punctuation. "I thought you might be interested in X. We do Y and offer Z. Let me know if this might be worth a chat." Okay, cool.

Now I'm going to run this puppy again to test it out and make sure it updated that row. I guess since it's all on one line we don't really need to capitalize that first letter, but anyway, that seems pretty good. What we can do now is populate this "Instagram Phantom Buster scraping" sheet with a massive list of people and posts and that sort of thing. We have a profile URL column here and a username, so in the future we should be able to use this information to index any other database we set up with Instagram information: if it's a list of profiles, we can use this as a primary key and use that to sift through; if it's anything else, we can make it pretty simple. And now, for your SOP for a VA or someone like that, you might have another column here like "status," where they can just add a check mark or something: they go through this one by one, copy it, paste it into Instagram DMs, send it, check it as done, and move on to the next one.

There are really a couple of benefits to this sort of approach, if you're interested in how to systematize this more generally. A lot of the time, virtual assistants live in much lower cost-of-living countries, like the Philippines, where the average wage is maybe five or six dollars an hour (actually, I think it's substantially lower than that; five or six dollars an hour is more like the virtual-assistant average wage). So you can pay a significantly lower hourly wage, but you also don't really have to suffer any of the typical drawbacks of offshoring, which are usually some type of language incompatibility or a slight difference in culture or attitudes toward work, because this is a very simple, straightforward, streamlined process: you get to leverage AI for the customization aspect, and all the virtual assistant's time goes to copying and pasting. And because we're using humans for this purpose, we don't necessarily have to suffer anywhere near the same rate-limit sorts of issues from Instagram
that we would if we were using bots as the tech. And presumably, if you're selling something via DM, it's probably some type of high-ticket offer (a lot of people do coaching via DM, for example), so that $5 or $6 an hour might net you several hundred DMs. With a conversion rate of maybe half a percent to one percent, you can usually convert at least a sales conversation or two for four or five dollars. And just for your understanding, being able to generate a sales opportunity, or a sale, or a booked meeting, or whatever you want to call it, for $4 or $5 is insane; most businesses will spend anywhere between 30 to 40 times that for qualified opportunities. Now, there are some issues with Instagram DMing, for sure; you now land in the requests pile, for one, so I'm not necessarily recommending that everybody drop what they're doing and use Instagram DMs or anything like that. I've found a little bit of success with it for my own business (a very tiny bit), and I've also built systems like this for other businesses that have found substantially more success, particularly in the coaching niche. I just wanted to give you a sense of the order-of-magnitude value in a system like this, even when it's not fully automated, and even when the Instagram API has sort of tried to stamp down on a very, very potentially profitable channel.

So the question now is: okay, I showed you how to do this manually; how do you do this on an automatic basis? Well, there are a couple of caveats. As I mentioned before, because of the whole Instagram DM API thing, we have a little problem to solve first. This session cookie lasts a certain amount of time, which means that for the system to run completely autonomously, you need to update the session cookie every so often. Everybody making videos on this stuff will not tell you this part, because it looks really glamorous if you could just run a Phantom Buster loop over and over and over again. But in reality (and reality is always more complicated than fiction), you need to take a couple of additional steps to make sure this works. Think about it: let's say the Instagram session cookie expires after two hours. That means every two hours you want this thing to run, you need to make sure the session cookie is right, and if it's not, you need to go and update it. How do you update it? You can either manually go to this page and click "connect to Instagram," or you can use a third-party browser automation tool, like what I was mentioning earlier, to log into Instagram, retrieve the session cookie from the browser's cookies, and use it to update some resource.

Why do I say this? Because if I were building this out in practice, I would have a Google Sheet, and I'd call it "authentication." It would have a list of the profiles you're going to be using to scrape; I'd build it out so that maybe I'm using nxra to scrape. I'd then build it out so we have a column here called "session cookie," and also some "date last accessed" column; if I go back to my new results file, I can just pull a random timestamp for that. What I'd be doing is, at the beginning of every automatic execution of this Phantom Buster scenario (which we can trigger through Make), I'll look up the session cookie for the profile I'm using to access, then attempt to run with that session cookie. If the session cookie is valid, it'll just continue down the flow, and if the session cookie is invalid, there'll be some
Using that error handler, I'll run another scenario: my browser-automation software will go out to instagram.com, spin up a web page, and try to log in. When it logs in, it's not even going to do anything else — it just logs in, pulls the session cookie, and uses it to update the sheet. It then calls back to the original Instagram scraper scenario and runs the PhantomBuster module again with the fresh session cookie. That's how all of this looks from a high level, at least on the Make.com side of things.

Now, to do this in practice, you go down to PhantomBuster here and use the module called "Launch a Phantom" — and this is going to be a multi-scenario sort of deal. If I go to map, you'll see there's an "Instagram Photo Extractor" — I believe this is the one I was using — and its fields are: session cookie, spreadsheet URL, profile URLs column name, number of posts per profile, number of profiles per launch, and CSV name (that's the output file). So we'd have a Google Sheet; we can just copy this "search rows" module, make it the trigger, and run it on some schedule — maybe every 60 minutes; let's do 120 minutes. We connect it, and if you remember, I added a new page called "authentication". I only have one row there, which makes this easy, but you may have several profiles you're using. Anyway, we run this puppy, and you see there's a big session cookie — so we feed that session cookie in. I have to reconnect this, then cancel it and open it again — Make can be finicky with this sort of thing, so keep that in mind. Then I'm going to run the Instagram Photo Extractor: go down to the session cookie field and feed the cookie in. If we had a big spreadsheet of profile URLs, I'd use that, but for this demonstration I can just use this profile URL and it should function similarly. The column name only applies if you're using a spreadsheet URL. Number of posts per profile: one. Number of profiles per launch: one. CSV name: I'll leave it blank. I'm just going to run this to show you how it works. "You can only get first posts" — that's interesting; I guess that's an additional option they expose via the API. There's watcher mode, filter results — you can add some filters here — "save argument, manual launch", that's kind of funny. Whatever, we don't need that. Okay: it's gone through, gotten our authentication, and run the Phantom — but as you can see, it didn't provide anything in the output.

The reason is that Phantoms take a variable amount of time to execute. We don't actually know when the Phantom will be done — we can't just wait for it to finish, because it might be five minutes or forty minutes depending on how many profiles you're going through. So they separate this into a two-step design pattern: a first scenario that launches the Phantom, and then a set of other modules you can build new scenarios around that watch for output. Notice that I can't connect the "watch output" module into my flow, because it's a trigger only. So to make this scenario work, we need to make it multi-step: step one is "launch PhantomBuster Instagram scraper". I'll save this, go back to my example builds, and create a new one. This two-step design pattern, in case you're interested in developing these sorts of things more often, is something you use basically every time you work with a scraping or browser-automation application — because, again, there's a variable amount of time between when you launch something and when you can pull its output. Extremely common. So: back in example builds we have "launch PhantomBuster Instagram scraper", and I'm going to create a new scenario called "watch output of PhantomBuster Instagram scraper" and copy everything over. Something's preventing me from pasting — oh, it might be because I copied two things at once; yeah, there you go, I was trying to copy the title as well. Anyhow, we feed in this "watch an output" module. Notice that when you connect it, you just have the Phantom ID — that's all you put in; all you're doing is watching to see if that Phantom runs. You'll also notice this is not an instant trigger, which means it has to poll every so often to check whether your PhantomBuster scrape has concluded. That's pretty annoying, and it usually leads to much higher operations usage. In practice, I'd run this watcher at the same interval as the first scenario — if that's set to 120 minutes, I'd set mine to 120 minutes too, but offset by a few minutes, so if I know the scrape finishes around the 120-minute mark, I catch it just after. You can also use webhooks, but for that you'd need to know roughly how long the run takes: you'd make an HTTP request down here, connect it, and on the receiving end — I think you use PhantomBuster's "get an output" module.
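The launch-then-poll pattern above can be sketched generically. This is not Make- or PhantomBuster-specific code — `fetch_status` is a hypothetical callable standing in for whatever "check the container" call your platform exposes; the sketch only shows the shape of the pattern: launch, then poll with a timeout instead of blocking forever.

```python
import time

def wait_for_phantom(fetch_status, poll_seconds: int = 30,
                     timeout_seconds: int = 2400) -> bool:
    """Poll until a launched Phantom's container reports 'finished'.

    fetch_status() stands in for a 'fetch container status' API call and is
    assumed to return a dict like {"status": "running" | "finished"}.
    Returns False if the run doesn't finish within timeout_seconds.
    """
    waited = 0
    while waited < timeout_seconds:
        if fetch_status().get("status") == "finished":
            return True
        time.sleep(poll_seconds)
        waited += poll_seconds
    return False
```

The 40-minute timeout mirrors the worst case mentioned above; in the Make build, the "watch output" module is doing exactly this loop for you, which is why it burns operations.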
Now that I'm thinking about it — "get a phantom", "download a result", or "get an output"? Okay, I take that back: we don't actually need "watch an output" at all. We can just use "get an output", and for the trigger we can use a webhook: set up our own custom one, call it "PhantomBuster Instagram scrape completed", and call that webhook from the other scenario, which is pretty neat. When we receive that request, we use it to trigger the "get an output" module — and we don't even need the Phantom ID, because we know what we want: the Instagram Photo Extractor. Obviously, we're not going to call this immediately; we need some type of sleep or wait first. I wish there were a way to scale this up properly, but you could just use a sleep rule of thumb — multiply the number of rows by some number of seconds — and that might be a good proxy for how long the scrape will take. In our case, I'll use a flat 60-second wait, which should be fine since I'm only doing one profile. So now I make the request — okay, what am I doing? I'm calling this URL — fantastic, this is the webhook — so I'm triggering it after 60 seconds, then catching it and feeding it into PhantomBuster "get an output". From there — actually, why don't we just run this first and see how it goes. I'll run it and change the CSV name, because I know that will make it run fresh. Okay, great: I've triggered it, and now we're sleeping for 60 seconds. If I go back to my homepage, you'll see this is actually running now. Because we're only doing a single page, it should run fairly quickly — probably ten seconds, not sixty — but I want to be really safe. It says one post extracted, and the file is just result.csv. Maybe I need to refresh — looks like it's caching results or something, but that's all good. Anyhow, looks like we're 15, 30 seconds in — I don't actually know what proportion of the wait this represents; I think it's the full 60, so we're about halfway done. What we do next is catch that webhook and use it to trigger the PhantomBuster output step. I love screwing around with these more sophisticated — well, not APIs, but services like Instagram or TikTok — because they put in so many safeguards to prevent you from accessing their information that to do it successfully you need to develop very creative strategies, and when you do, the results are outsized, because nobody else can execute at that level. Very cool. Anyway, what ended up happening is we got an output — there's a sort of text output, a container log, which is nice — and then there's a container ID from which you get the output. We should be able to download a results file; I don't know exactly where it would be, so I'm just going to grab the container ID and feed it in manually for now, to verify the format of the output. Okay — it's a buffer, so we need to convert this buffer into text. I think we need something like "download a file"; I've since completely forgotten how to do it, so maybe we'll try the HTTP module.
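The sleep rule of thumb mentioned above — scale the wait with the row count — is simple enough to pin down. A sketch, with assumed numbers: the 5 seconds/row figure and the floor/ceiling are guesses to tune against your own runs, not anything PhantomBuster documents.

```python
def estimated_wait_seconds(num_rows: int, seconds_per_row: float = 5.0,
                           floor: int = 60, ceiling: int = 3600) -> int:
    """Rule-of-thumb sleep before fetching a Phantom's output: scale with
    row count, but never below a safety floor (tiny runs) or above a
    sanity ceiling (huge runs should use webhooks instead)."""
    return int(min(max(num_rows * seconds_per_row, floor), ceiling))
```

For the single-profile test run this returns the flat 60 seconds used in the video; a 50-row scrape would wait about four minutes.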
This gets a file from a given URL, but we need to convert the binary back into JSON. Maybe there's a "parse JSON" module — "parse binary"? Let me see here — maybe one of these, but I'm not entirely sure, if I'm honest. This is a binary string, so I doubt it. Let me check — no, it accepts any object. I had a flow here that was pretty similar; I've just forgotten how to dump this into a format that works. I remember there was a download module of some kind — yeah, this is an example of a similar system that was built in Airtable. Oh, okay — I guess you can just parse it. So we'll just drop a "parse JSON" module here. For the purposes of this, let me check how long the output bundle is — I think that's probably going to be okay, not entirely sure. We'll run this puppy with the JSON string — no, "source is not valid JSON". I'll feed this in manually for testing, and map the data field, because I think the input was mis-formatted. Yeah, it was mis-formatted. Okay, great — now I have the result object. It parsed it, but it didn't parse the inner part as JSON, which is interesting. I wonder if doing this would be enough — let's run this puppy again and ignore the hell out of these warnings. Okay — no, that did not work; we may just have to parse this twice, honestly. Let me see what happens if I feed this same thing into a second parse afterwards, because PhantomBuster doesn't really hand you objects — it hands you strings. Okay: we map the `resultObject` field from the first parse, not the raw object — yeah, okay, now we have access to everything. It's sort of annoying that we have to parse this twice; I remember I came up with a solution once that made it so we didn't have to, but whatever. So now we have a bunch of values we can use, and we can use them to update the Google Sheet. Now that we have access to all this stuff — yeah, we should dump it into a Google Sheet regardless, so why don't we run the OpenAI step first: the image URL goes right here, and instead of updating a URL, we'll add the whole thing to the sheet. And you know what? I have to do the annoying Sheets authentication again. We'll add a new row, go through "shared with me", and it'll be "Instagram scrape". Actually, we shouldn't call the tab "new results file" either — let's call it "profiles". We'll go over here and refresh, because we just changed the name of the sheet — and if you don't update the sheet name in the Google Sheets module, the API call will be mis-formatted. Then we enumerate over all the fields: location, location ID, pub date, like count, sidecar type, caption, profile URL, username, image URL, post ID, query, timestamp — and the DM column will be the AI result. Actually, no, it's not just going to be the raw result — oh, I could have just used this field directly; what the hell, I'm silly, man. Okay: we grab the full name, and the result here is the DM text. Great, that should be fine. Now we test this just once on this flow to make sure it works — we should get a new row in our sheet with basically the same information, plus the generated DM.
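The double-parse dance above comes down to one thing: the output payload is JSON whose result field is itself a JSON-encoded *string*, so it has to be decoded twice. A minimal sketch — the field name `resultObject` is what I saw mapped in the Make modules, so treat it as an assumption and verify against your own payload.

```python
import json

def parse_phantom_output(raw: str) -> list:
    """Decode a Phantom's output payload.

    Assumes the payload looks like {"...": ..., "resultObject": "<json string>"},
    i.e. the rows are JSON-encoded *inside* the JSON — hence two parses.
    """
    outer = json.loads(raw)          # first parse: the envelope
    return json.loads(outer["resultObject"])  # second parse: the actual rows
```

This is exactly what chaining two "Parse JSON" modules in Make accomplishes.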
The DM came out as: "Seems you're enjoying your burger in your latest photo." So the copy isn't the best in the whole wide world, but I still think it's reasonable, and that's probably more than enough, if I'm being honest — I've seen much shittier campaigns run extremely well. Now that we have this, we obviously need a way to do multiple profiles. The way I'd do it is set up another sheet called "to scrape", and to fill it, I'd run another Phantom. Back on the dashboard — let me delete all this stuff, annoying — we'll make a new one called "Instagram Follower Collector" and use that Phantom. I'm going to do this manually first as an example so you can see. We're going to go get a giant list of followers: I'll search "Marques Brownlee Instagram" — it is, unsurprisingly, MKBHD — and use him for this purpose. I'm going to do 500 — no, actually, let's just do 100. You'll see it allows you to extract a large number of followers: 5,000 to 9,000 every 15 minutes, so you do the math — what's 24 × 4 × 5,000? You'd probably do around 400K a day, which is pretty huge. Then "number of profiles to process per launch" — you can have it rerun over and over and over again, but we're not going to do that, because that's specific to a Google Sheet — and we're good. Again, I'm going to run this puppy manually; let me rename it so it's simpler — we'll call it "bulk Instagram follower collector" — and give this thing a run. As I mentioned, it's going to extract a big list of followers, and it does them pretty quickly, because all it's basically doing is going to the profile, clicking "followers", and scrolling through really, really fast. Pretty straightforward. Anyway, the result is a file with a bunch of profile URLs.
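The back-of-envelope math above, made explicit — using the conservative end of the 5,000–9,000-per-15-minutes figure quoted on the Phantom's page, this is a theoretical ceiling, not a promise (rate limits and cookie refreshes will eat into it):

```python
def followers_per_day(batch_size: int = 5000, minutes_between: int = 15,
                      hours: int = 24) -> int:
    """Ceiling on follower-extraction throughput:
    5,000 every 15 minutes -> 4 batches/hour -> 96 batches/day."""
    batches_per_hour = 60 // minutes_between
    return batch_size * batches_per_hour * hours
```

That works out to 480,000 a day at the low end, which is where the "400K+" figure in the title comes from once you leave some slack.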
That's nice, because what we can do then is download this as a CSV, go to our "to scrape" sheet, and import it, just so we understand the data format. I'll click import — and I'm not going to replace the spreadsheet, I'm going to append to the current sheet, so I don't blow away the name and everything. One thing you're probably noticing, if you're still with me: when I do these designs, I start at the end and work my way back. What was the end we wanted? The AI-generated DM. So we start with that and reverse-engineer everything up the chain: to get that, I need a single profile; to get a bunch of profiles, I need a list somewhere that I'm referencing; to do that, I need to store a Google Sheet. Logically, this is first-principles thinking, and it's probably the number-one thing to lean on when you're building these sorts of systems. Looks like you also get the profile image URL for some of these people, which is pretty cool, but it's pretty blurry — I don't know if that's going to be enough for AI to do anything with. Probably not; it's just a picture of a tree, and it's something like 160 by 160 pixels. Regardless, from here we get the username and the profile URL — and if you remember, the one thing we need to query a profile is the profile URL. We also get the full name — but not all of these have full names, so we probably need some type of exception for when they don't. It also looks like not all of them are "first last" like mine — I'm clearly a square because I call myself Nick — these people have "full names" that are handles and whole phrases: "incredible India", that's a hell of a name, or literally "may I have a jar of Americano please, sure my love". So that's kind of annoying. We could set a character limit — that's probably the simple, easy way to handle this at scale. Say the limit is, whatever, 10: if the first token after splitting the full name is over 10 characters, we don't want to use it — we just leave it empty, so the message says "hi" instead of "hi" plus some garbled handle. That's probably what I'd do. But anyway — okay, great, so this is us launching the scraper for a single sheet, I think. What we need now is another scenario. Let's call these ones three and four — the individual PhantomBuster scrapers. Then we're going to make a new scenario — actually, probably two new scenarios now that I'm thinking about it — and we may need to make a Google Sheet; maybe one, maybe two. So: I'm going to create "launch bulk PhantomBuster Instagram scraper" and copy some of the modules over, because the authentication piece is pre-created for us. Just be careful you're not also copying the module name — essentially, what happens when you copy a module under the hood is you're copying the JSON of that flow, and if a name gets added to it, the JSON breaks and Make won't let you paste it in. Okay, great. Now, we don't need to reference a Google Sheet — actually, we do need to reference a Google Sheet, what am I saying: we need to get the authentication cookie. So I'll paste that in as my trigger. How often you run this depends on how big the scrapes are going to be; I set mine to 100 just as an example, so we're not going to run it very often.
Maybe this runs once a day or something — runs in the morning, and then we go through the list of results later on. Basically: you'd search the sheet, get the session cookie, then launch the Phantom here, pulling the session cookie from that lookup. For the profile URLs, you can either store that information somewhere else, or, if you're like me and just using Marques — I really don't know how to say that name — Marques Brownlee, you feed his URL in here. Actually, we need to make sure we're launching the right Phantom; I think I'm using the wrong one, so let me switch to the right one — "bulk Instagram follower collector", that's right. Okay, so the session cookie goes here. We don't need a spreadsheet URL — oh, it doesn't actually let me enter just one profile? That's interesting; I don't know why not. I wonder what happens if I just feed the profile URL into "spreadsheet URL", or maybe "queries" — my intuition is that this will work, but we should try it, because my intuition has been known to be completely wrong before. We need a session cookie too, so why don't I go back to "authentication" and steal the session cookie. Let's run this puppy, go back here, and see whether the run is okay or whether we really need a Google Sheet — because if we need a Google Sheet, that's really annoying. Okay, yeah — I was right. A lot of the time, these scraping services are developed by people without a very strong grasp of English, so it's a little harder for them to pick good field names — I've found there's usually a bit of give in what the titles of some of these fields actually accept. What happened in this specific case is that instead of calling the field something like "input", they called it "spreadsheet URL", so you'd think you need to supply a Google Sheet — but instead you can just supply the URL of an Instagram profile. That's cool, man — if I were working in French or whatever, I'd probably have no idea what I was doing, so it's completely understandable. Anyway, in our case I just thought back to the fact that inside PhantomBuster you don't need a spreadsheet URL — you can also supply a direct link. What I imagine is happening under the hood is a dispatch: if the input is a Google Sheet, run the sheet scraper over every row; if it's an Instagram profile, scrape just that one profile with the profile scraper. That's sort of what happened here. So, anyway: we run this, we sleep 60, and once we've launched the Phantom, we follow the same design pattern we had before — we get the results, then use them to update some sheet. I'll create a second scenario: "watch output of bulk PhantomBuster Instagram scraper". We need to set up a webhook again — this will be a custom webhook; drop that puppy in and create a new one called "bulk Instagram PhantomBuster scraper" — let me be consistent in my formatting and terminology here — and feed its URL in as the one we'll be calling. This is probably going to take a little bit longer, so the sleep-60 probably still makes sense as a minimum. And basically, the flow runs on a schedule — say, once a day at 8:37 a.m.
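The dispatch I'm guessing at above — "is this input a sheet or a profile?" — can be sketched in a few lines. To be clear, this is speculation about PhantomBuster's internals, not their actual code; the sketch just shows why one field can accept both kinds of URL.

```python
from urllib.parse import urlparse

def classify_input(url: str) -> str:
    """Guess at the dispatch a scraper might do on a single 'spreadsheet
    URL' field: a Sheets link means 'scrape every row in the sheet',
    an instagram.com link means 'scrape just this one profile'."""
    host = urlparse(url).netloc.lower()
    if "docs.google.com" in host:
        return "sheet"
    if "instagram.com" in host:
        return "profile"
    return "unknown"
```

Whatever the real implementation looks like, the practical takeaway stands: the field name is looser than it sounds, so try a direct profile URL before building a whole sheet around one input.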
We call our Google Sheet, use it to pull the session cookie, then launch the bulk PhantomBuster scraper on MKBHD's profile. I'm using his profile as a static resource, but you can imagine how in the Google Sheet you might have another column, or another sheet, called something like "profiles to scrape" — in our case, Marques Brownlee probably has more than enough followers to keep us occupied for several lifetimes. Then — before my microphone runs out of battery here — I sleep for 60 seconds and make an HTTP request to the next step in the flow, which is the second scenario: it catches the webhook and uses it to download the result. Now I want to select the container by name — the name is going to be "bulk Instagram follower collector" — so I'll download that. Hmm, "containers created on March 30" — oh, I get it: every time you run the Phantom, there's a new container. So what we actually need to do is pass the container ID in the webhook call: you can append "container ID equals" to the webhook URL and map the value in, and then on the receiving side grab it as `container_id` from the webhook payload. Okay, great — so now we select that, download the result, and parse the JSON a couple of times: that's the first parse, and this is the second parse on the inner result object (I don't remember exactly what it's called, but whatever it's called). Then all we do is go into Google Sheets and add rows. Because of the way this is set up, we're probably going to have to do this one at a time — there are ways to add a hundred or a thousand rows to a Google Sheet without consuming a hundred operations, but that's beyond the purposes of this video, so I'm not going to cover it right now. So what's going to happen is we'll have a big list of profiles — man, this is way more work than I thought it would be to delete all these things, Jesus — a big list of profiles under "to scrape", with profile URLs. Then, once every two hours or so, we go through that profile list and scrape every profile in it using the individual scraper, which is over here. We again launch another Phantom for the individual side — we're sleeping 60 here, but we don't strictly need to — then make a request to the other scenario to actually go download those results. From there, we feed the container ID into the "download result" module, parse it a couple of times, feed it into AI, and at the very end add it as a row to our Google Sheet under a sheet called "profiles", where there's also a status column and a DM column that someone can just copy and paste into the DM box. This is how you automate what I'd imagine is several hundred hours of scraping work per month — and in doing so, save a lot of money that you can then spend on just having a VA or somebody like that be responsible for the sending. That's my idea behind this. Obviously, there's still a bit more that needs to be fleshed out, so let me make sure everything looks good: we're calling the container ID here, we run this, and then we're waiting 60 seconds — so let me see how much I can do in 60 seconds. I think we're in Sheets now, and what we actually need to do is add a sheet.
Yeah, so I need to add a sheet, and I don't think I'll be able to do it before the run comes back, so I'm just going to set a block here. It looks about halfway done. Basically, every time this runs, we're going to add a new sheet called "to scrape", generate the list of 100 profiles into it, and after we're done scraping them, delete the sheet and regenerate it next time. That's probably the simplest way to do this at scale — sort of annoying to create a new sheet every time, but that's okay. Okay, great, I blocked it. So where are we: we launched the Phantom, made the request to the other webhook with the container ID, and then downloaded the result — but because this is multiple results, something seems off; the data came back empty, and I'm not really sure what's going on. The file name is result.json, so let's just grab this container ID and run it manually. The fact that it's empty is weird — oh, maybe the data on the PhantomBuster side is empty because I'm running the same search twice. I hate when that happens. Yeah: "input already processed". So that's pretty annoying. I think there's a way to disable this setting so every run is fresh — let me see — no, this "watcher mode" only looks for new followers, so we don't want that. But: "if you rename your results file at any point between launches, the Phantom will create a new results file." So I think the way to do this is to supply a different CSV name on every single launch, just to make sure that every time it runs, it's running off new data. Alternatively, we could add a guard: if the run isn't new and the results file is null — yeah, actually, that's probably what we should do: if the length of the buffer is less than some threshold, don't proceed, because the run didn't actually do a new scrape or dump any new followers in. That's probably what we'd do in reality. For now, though, I don't really care, because I still want the data from the last container — so let me do this manually so we can test now and do a full run later. This is "bulk Instagram follower collector", and there should be a couple of runs — we ran it at :45 and also at :35 — so let me check whether this one is short. No, this one is nice and long, so we're going to run with that puppy. And what do we get? A list of profile URLs — great. We parse that and get a bunch of bundles — great. Every one of these bundles is a profile, and we can just add them. Okay, so what we should actually do is add the sheet before any of this, then reference that sheet and use it to add the profiles. I'm going to move this over here — we're not going to have a block anymore — so it's: create the "to scrape" sheet, then add the rows into it. It's the same spreadsheet every time, I believe, so we should be able to just feed the spreadsheet ID in — oh, I guess we can just select it from the list; it's probably going to be under "shared with me", and we'll probably need to reauthenticate. Okay: "Instagram PhantomBuster", we call the tab "to scrape", and we create it fresh basically every time this scenario runs. Actually, we should probably create the sheet after the PhantomBuster "download result" step — what am I doing — or maybe even after the first JSON parse, because that way we'll know whether the scenario actually worked; if it didn't work, the sheet creation won't run. We can parse the second JSON afterwards, and after that we have a bunch of bundles.
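The two fixes just discussed — a unique results-file name per launch, and a guard that skips downstream steps when the container returns nothing new — can be sketched together. The 100-character threshold is an arbitrary assumption (it just needs to be below any real payload and above an empty/error response), and `fresh_csv_name` simply exploits the documented PhantomBuster behavior that renaming the results file forces a new one.

```python
import time

# Below this, assume "input already processed" / an empty run — tune to taste.
MIN_PAYLOAD_CHARS = 100

def is_fresh_result(buffer: str, min_chars: int = MIN_PAYLOAD_CHARS) -> bool:
    """Gate downstream parsing/sheet-writing on the payload actually
    containing rows, instead of erroring out mid-flow on an empty run."""
    return len(buffer) >= min_chars

def fresh_csv_name(prefix: str = "followers") -> str:
    """Unique result-file name per launch, so the Phantom writes a new
    results file instead of replaying the cached one."""
    return f"{prefix}-{int(time.time())}.csv"
```

In Make, the equivalent is a filter on `length(...)` right after the first "Parse JSON" module, plus mapping a timestamp into the CSV-name field of "Launch a Phantom".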
bundles and so now we can just do add row and then we have to unfortunately go through the rig roll of selecting the spreadsheet ID again it's probably the man if I had 5 seconds for every time I tried to find a spreadsheet ID okay and then the sheet name um looks like we can only manually select the sheet name that's kind of annoying so we're just going to call this to scrape and then the spreadsheet ID that we feed in here is going to be the same same as this column range I'm just going to say A to Z and then I imagine the data is just going to be profile URL uh username full name image URL ID is private is verified query timestamp okay this should be good what I'm going to do as a test is I'm going to delete this we're probably not going to have header now that I'm thinking about it but that's okay then I'm going to run this on the same container we're going to parse the Json if this Json is a long string it'll continue otherwise it'll error out which is nice we could have another check we could say does not exist um actually that's not going to be correct uh let's just do length of whatever is greater than let's just say 100 characters that's a good hack for now then two scrape sheet we'll create the sheet called two scrape we'll parse the rest Json and then we'll iterate through every bundle and then add it to the sheet that looks pretty good okay great let's run this puppy see what happens so it's now adding to the Google sheet if we go to two scrape you'll see that it's populating it quite nicely reasonably quickly I was would say uh we set it to 50 I think so we're we're only doing 50 awesome so that part of the flow looks good assuming that the container that we get is the right one then uh yeah everything should work there and then what we want to do is we want to we can't um select any of the stuff manually now we have to do it all automatic like using IDs so let me just see what the spreadsheet ID is here um the spreadsheet here it says Instagram 
Phantom Buster scraping, so we're going to need to do this manually. Yeah, this looks to be the same sheet, which is nice. The sheet name is going to be "to scrape," because we just created one called "to scrape." Awesome, it's going to run — and then I'm just going to run this, and we see a bunch of results here, which is nice. Now, actually, I don't even know if we need this anymore, to be honest, because we can just feed in the spreadsheet URL. Hmm, I wonder what the Google Sheets spreadsheet URL here is going to be — well, it needs to be public, so we can probably just feed this in, right? We do that, and this is going to be the spreadsheet ID. We're not going to feed in profile URLs anymore; we're going to do one post per profile, one profile per launch — actually, no, we don't want one profile per launch, we want more. Then I'm just going to empty all of this. Sleep 60 — we're probably going to do more than 60. For testing purposes — you know, we've got 50 here, but for testing purposes why don't we just do like five? Anything that works with five should work with 50. And then we may need headers — this is really my only question, I don't know if we need headers. If we do, we're screwed, because then we have to go through and add a line first with the headers, and then go and do everything, which is super freaking annoying. "A Google Sheet containing a list of..." — oh, okay, maybe we don't actually need anything except for the profile URLs. Yeah, we might only need this. Okay — yeah, I actually feel a little bit more confident about this now. Okay, let's do this. Sleep — we're not going to sleep 60, we're probably going to need to sleep substantially more than that, and this is where you start getting into the question of like,
should I use a third-party service to store my webhooks? Right — I'm not just doing this for one profile anymore, I'm doing it for like 100 or 500 or 5,000, and the amount of time it takes to scrape is going to be so variable that it would be ludicrous for me to try to do it on a simple flat wait time. Okay, anyway — so we fed in the "to scrape" sheet. What I'm going to do is run this puppy and then see how the Phantom Buster run looks for the individual scraper — this one here, the photo extractor, I should say. So we're starting here, and I'm just going to run this. Yeah, so actually we don't need to feed this in at all, not even a tiny bit. Oh right, right — yeah, what we can do is just click this and run it with the ID of the spreadsheet of interest, and we can just hardcode that ID in. Oh — sorry, I lied. We actually do need that, because we need the session cookie, don't we? I lied, okay. We'll go back here, and we're not going to do any of that — we're just going to scrape authentication and grab the session cookie. The syntax of this one looks the same, so I think I can just use it here and that should be the same thing. Excellent. Okay, so this may be scraping right now, I don't know — yeah, "missing cookie," because we didn't supply one. So let's run this again with the correct cookie. Let's go back here — it's running, which is nice. I love how you get instant feedback with this tool. We're now successfully authenticating; it's now going out and pulling in photos for each of these, I would think. It may be running still, or it may have fed in just my profile instead of the Google Sheet. Let's see what happened here: we fed in the spreadsheet URL, we fed in one post per profile — yeah, that should be working. Let's go
back here now that we don't have my profile hardcoded in, and run it again to see what happens. Okay, it's saying that it's running — but it's also saying the input is already processed. Let's try hardcoding this and see how that works. Okay, yeah, that seems to be working — no, wait, it seems like it's getting no posts from these profiles. It's possible that none of these profiles have any posts, but I consider that unlikely. Oh, right — because they're private. Not all of them are going to be private, though. Yeah, I mentioned that this works primarily with public profiles — the profiles need to be public, of course. In my experience, probably about 30% of profiles are public, so the other 70% are not going to be. But that's okay, because we're working with such big numbers here that it doesn't really matter whether they're public or private — if you're running 10,000 followers or whatever, you still have 3,000 to go with. It just looks like none of the ones that I supplied back there had a public profile, which is sort of annoying. Yeah, that's freaking annoying, man. Maybe we'll use one of mine — do I have another profile? I think I have another profile; I'm not entirely sure if it's public. Maybe I should just source all of the people that have DMed me on Instagram and just use you guys — that'd be pretty fun. No, I don't think I'm going to do that. We'll just put these two in for testing purposes and then I'll just run this anyway — I think one of those might be accessible, maybe not both. So let's run this again, and then go back over here. It's saying "input already processed," I think because this is just my name, so I'm going to define a different output name — I'm just going to call this "x." We're going to run this again, and I'm going to connect that to my sleep 60. So it's run this, it's now running and getting a new input with successful authentication. Looks like we got a couple of files here,
which is nice. Okay, and now we're waiting. What we're going to do on this end — this is number three, right? — so now we're going to turn on number four, and number four is going to catch that output, get the output, download the result like we did before, do the parsing, and then dump it into the OpenAI module to get that custom DM. And yeah, I believe that's it — I think that's all we need to do. Clearly it works with the individual URLs; let me just double-check to see if there's anything else. Oh, the timing — we should make sure the scheduling is correct. Yes, so the bulk Instagram Phantom Buster scraper — let's just say it runs at like 8 a.m. These Phantoms — yeah, I mean, I've hardcoded in the name of the profile, but as I mentioned earlier, you can do it elsewhere. We're scraping a very fixed number of profiles with fixed wait times, which is unfortunate, but obviously that'll still work. We're checking, kind of hackily, to see if the results object is just greater than 100 characters — if it's empty, it'll just be less than 100 characters and probably error out here or something. So that's just another hacky solution, but obviously it works. We're then creating the "to scrape" sheet — oh, we're currently dumping in all of the information; I don't think we need it, we can do just the profile URLs, clearly, because that's the only thing that matters. And then I'm going to allow you to download this blueprint. There's the authentication there; the spreadsheet URL is obviously hardcoded to be my spreadsheet URL, so if you guys are screwing around with this in the blueprint, just keep that in mind. I'm adding a fake CSV name just called "x" here, but that's okay. And then I'm calling the container ID — oh, and it looks like we had an error, but I'll cover that in a second. Then we're catching, and we are getting the output of this, and it looks like there is an output, then
downloading the result. So we are parsing this — looks like the object that we got was "no posts found." Oh, I think we're pulling the old container; that's annoying, but presumably that won't happen when you have the new result. And then there are three queries: one says "no posts found" — actually, all of them say "no posts found," one just says "post URL" for some reason. And when we feed it into OpenAI, obviously it's not going to work, because we're not feeding in an image URL. So yeah, there's that — but I know the reason why this is happening. If you think about it logically, the bulk run that we ran at the very beginning, where we have the "to scrape" profiles, only includes my two Nick profiles, and I think the second one isn't working — I thought I had an additional profile, but I don't think I do. And because we've already pulled my profile what, five times at this point, it doesn't want to keep doing it unless we keep providing a new name, which I'm not doing. You can obviously keep providing a new name — you could just randomize a number and use the number as the name, or generate a new UUID or something like that. That's probably a good idea, now that I'm thinking about it — why don't we use that as the name? Yeah, we'll generate a new UUID for the CSV name, and we'll also do this here. That should fix this problem, because the likelihood of a hash collision — or any collision — is so low as to be meaningless. Yeah, okay — honestly, that should work now. We should probably put in some error handling so that if the JSON is empty and we run through this, we're not actually... you know, we might want to add a little bit of a sleep here as well. So why don't we call this, and then why don't we just add a couple-second sleep? You don't have to do this, but if you are on a relatively low OpenAI plan —
like, I'm on the highest plan, so I can technically do — I don't know — five trillion queries a minute or something like that; the number of tokens I can do per minute is honestly quite incredible. But if you're on the beginning plan or something, then you'll probably run into rate limits unless you sleep a couple of seconds between each of these. So I'm just going to say sleep two, and then we should have some type of error handler where, if the error is "no posts found," it just doesn't run and doesn't convert that into the next row. But yeah, aside from that — assuming that you're using good data, and assuming that the followers you're getting are on a profile that's big enough — you can use my hardcoded approach, or you can add some additional context if you'd like. But yeah, that's how you build an Instagram scraper. What I'll do now is download the blueprints of each of these scenarios and put them up on my "Make.com for People Who Want to Make Real Money" doc. I'm also going to create a Google Drive which contains all of the blueprints that I've generated so far, because a lovely follower of mine told me that, as it is right now, it's sort of unintuitive — you have to open up a blueprint, which is then a bunch of JSON, and you've got to download it manually, which is stupid. So I'll try to make it as simple as humanly possible. But in short, this is how you build an automated Instagram scraping machine. As I mentioned earlier, there are a couple of caveats — the session cookie issue is one that you can solve, and I've solved it before for a bunch of people, but you just need to develop a third-party browser automation to basically sign in to Instagram (or whatever service you want) every X hours. So if there's interest there, just let me know and I'm more than happy to record a video on it. Aside from that — yeah, I hope you guys enjoyed. That was
pretty interesting, and hopefully I showed you guys my thought process when designing these things. I didn't really expect it to be a four-scenario flow, but as I was developing it and starting to put the pieces together — I didn't do any of the mapping ahead of time, which usually makes this a little bit easier — I was like, yeah, we're going to need one to launch, we're going to need one to watch, and then we're going to have a two-step, so we'll need to multiply all of that by two. Anywho — thanks so much for watching. If you guys have any specific requests, just leave them down below; otherwise, like, comment, subscribe, do all that fun stuff. I will catch you in the next video. Bye-bye!
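The flow itself lives in Make.com, but the glue logic described above — the UUID naming trick that dodges "input already processed," the "is the result JSON real" check (a sturdier version of the greater-than-100-characters hack), the private-profile filter, and the "no posts found" guard with a rate-limit sleep — can be sketched in a few lines of Python. This is a hedged illustration only: `generate_dm` is a stand-in for the OpenAI module, and the field names (`isPrivate`, `postUrl`, `error`) are assumptions mirroring the sheet columns shown in the video, not a guaranteed Phantom Buster schema.

```python
import json
import time
import uuid


def fresh_csv_name(prefix: str = "scrape") -> str:
    """A new output name per launch, so the scraper never sees a duplicate input."""
    return f"{prefix}-{uuid.uuid4()}"  # collision odds are negligible


def parse_results(raw: str) -> list:
    """Replace the 'longer than 100 characters' hack with an actual parse.

    Returns the list of result bundles, or [] if the payload is empty or garbage.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    return data if isinstance(data, list) else []


def public_only(bundles: list) -> list:
    """Drop private profiles — roughly 70% of them, per the rule of thumb above."""
    return [b for b in bundles if not b.get("isPrivate", True)]


def build_dms(rows: list, generate_dm, delay: float = 2.0) -> list:
    """Personalize one DM per row, skipping 'no posts found' rows and sleeping
    between calls to stay under per-minute rate limits on lower-tier plans."""
    out = []
    for row in rows:
        if row.get("error") == "no posts found" or not row.get("postUrl"):
            continue  # nothing to personalize from — skip this profile
        out.append(generate_dm(row))
        time.sleep(delay)
    return out


# Tiny end-to-end demo with fake data standing in for the real scraper output
raw = json.dumps([
    {"profileUrl": "https://instagram.com/a", "isPrivate": False,
     "postUrl": "https://instagram.com/p/abc"},
    {"profileUrl": "https://instagram.com/b", "isPrivate": True},
    {"profileUrl": "https://instagram.com/c", "isPrivate": False,
     "error": "no posts found"},
])
rows = public_only(parse_results(raw))
dms = build_dms(rows, lambda r: f"hey, saw your post: {r['postUrl']}", delay=0)
print(fresh_csv_name())  # e.g. scrape-9f1c...
print(len(dms))          # 1
```

In a Make.com scenario these pieces map onto individual modules (a UUID for the CSV-name field, a length/parse filter after "download result," a filter on the is-private column, and a sleep before the OpenAI call); the sketch just makes the logic explicit in one place.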
Info
Channel: Nick Saraev
Views: 9,770
Keywords: automation, make.com, content creation, ai content, google sheets, chatgpt, wordpress, openai, gpt-3.5, blogging, integromat, make, automating, automate, gpt-4, gpt, openai api, indie hacking, small business, $20K/mo, make money online, make.com for people who want to make real money, make.com money, make.com entrepreneur, make.com guide, make.com tutorial, make.com money guide, make.com course, instagram, instagram dm, automate instagram, instagram automation, make.com instagram
Id: klozZP1Wf8g
Length: 83min 10sec (4990 seconds)
Published: Sat Mar 30 2024