Scraping JSON data from SCRIPT tag | Extracting latitude and longitude from Zillow properties

Captions
What's going on, guys, welcome to yet another web scraping tutorial on demand. Today we're going to be scraping data out of script tags. Here I have a comment by unlimited sky zero zero. Hello there, man; if you're listening to me and watching this video, this one is dedicated to you, just like the previous one.

He wants to scrape the latitude and longitude data, but he's looking for it within the article class in the source code of the page, and he can only see five matches, while basically all the coordinates are shown on the map. Let me demonstrate what he's talking about. The task is to find the latitude and longitude coordinates for each particular apartment, each particular property. As you can see on the map, there are lots of dots, which stand for the particular coordinates.

If we go to the source code of the page and search for "latitude" (spelled correctly this time), we find a few matches near the article tag he was mentioning: one here, two, three. All the other latitude coordinates are located within a script tag. You can see this script tag with a type of application/json. In other words, this is the exact source we're supposed to get our coordinates from. You can see we have the URLs for the exact properties we've been scraping already. In the previous version of the source code we already managed to scrape the URL for each property, so we'll be able to make use of that to find the coordinates within the script tag. This will take quite a bit of effort, to be honest, but still
it's pretty doable, to my mind. In this video we're supposed to be extracting these coordinates for each particular address. Let me demonstrate: if we grab the address, copy it, paste it into the search, and go slightly up, we'll see that we already have these addresses within the elements with the article class. So we can use this URL string to reference the script tag: we can split the entire script tag content by this URL and then extract the latitude and longitude element. We'll probably be splitting by a closing curly brace followed by a comma, and then we can just call json.loads on the string. Once we've extracted that, we can pull the latitude and the longitude out of it. At least that's the theory I'm going to apply in this tutorial, so let's see whether it works for us.

In the current working directory I've copied the Zillow rent file, which was covered in one of my previous videos, where I was making a Zillow scraper of properties for rent in Atlanta, GA. I'll be using that code as a basis, but I've created an exact copy of it, called it Zillow rent coords (a .py file), and here we'll be adding some new logic to extract the latitude and longitude. So in the items we'll also add the latitude and longitude. That's what we're supposed to do in this video.

A couple of other things to consider. First, I want to limit crawling to just the very first page in order to avoid torturing the server. Second, we need to store our response so we'll be able to extract the script tag's data, that is, the latitude
and the longitude as well. Let me recall the names of the methods here: save response and load response. So here we don't need to parse for a while; instead we need to say self.store_response, and we also want to break after crawling the first page. I hope to see the response HTML file at the very end. So go into the terminal, open the current working directory, type python3 followed by the rent coords .py file, and hit Enter to invoke my scraper. Okay, I probably misspelled it somehow. Oh, it's save_response, not store_response; I'm sorry for this. Not "store" but "save". Now it succeeded in making a GET request to the particular URL endpoint.

So here we have our response, and if we try to look for the latitude here... for some reason my computer is so laggy. Hold on a sec. It doesn't really find it; the strings are too long, which is why the editor is going crazy. So I'll just close this tab and reference the source code in the browser instead in order to create the extraction logic.

Let me think a little about what we're supposed to do from now on. I can restore the indentation like this. Now we don't need to make requests at all, and we don't need to store information to JSON either. The only thing we need is to say self.load_response, and I believe this would be a plain Python string; at least I hope so. Let me quickly check. Yes, it returns the HTML, which is a Python string. Great. Now we can simply parse that, and after parsing we'll be printing all the extracted elements as well. I need to say self.parse. As far as we're really debugging the parse method here, we don't need
anything else here. The response, like this; we can probably call it html, because it's literally an HTML string. Then it gets converted to a BeautifulSoup object and parsed. So I hope to see... okay, my JSON data. Great. Now we have the list of URLs that we can use as references to find the coordinates within the script tag.

The very first thing to consider is that we need to find the actual script tag that has those coordinates. We probably don't need to do this within the loop over the cards, because the script is loaded just once. Let me add a couple of comments: here we parse the response, here we extract the cards, and here we extract, let's call it, the coordinates script. Let's call the variable script; that's fine, I believe. We need to say content.find_all, and we're looking for the script tag.

Now let's look at the script tag itself. If it has some sort of unique class or attribute, that would be nice. Maybe this one is worth considering. What about this "mobile search" one; are there lots of those? The first one here: is it a script tag? No, it's a div. Okay, I think we can use this element as the unique identifier for the exact script we're looking for. So copy it, specify the key name and the value (the "mobile search page store" one), and paste it here. We don't need find_all; we need to find the only element, so find, and we also need the text attribute to get just the text.

Now I hold my breath. Let's avoid printing the JSON responses for a while and instead just print our script, to make sure we got the exact script containing our coordinates. So I just
want to say print(script) and run the code one more time. Okay, it seems like it does contain them. Let's quickly search for "lat" again; I hope it finds some. Okay: latLong, latLong. It seems to be exactly the same data we were looking at in the browser, and it also has this detailUrl. That's the URL we're supposed to use to find the latitude and longitude.

So now let's start our experiments. You know, I didn't really prepare for this video; I'm doing this for the very first time, to be honest, so let's start experimenting, guys. What do we need here? Here we append to the list. Just before appending, we'll say... well, maybe it won't be a try/except, but still, let's call this "try to extract coordinates from script". I believe we can make use of items referenced by the url key, so let's print items referenced by the url key and comment this other stuff out, just to make sure we have access to the URLs of the particular properties. It seems we do. Great. So we won't be printing this for a while.

Now we can use this items URL to split the entire script string. Let's create a variable, coords, equal to script.split, splitting by the items URL. I'm not sure which element exactly we need; probably the last one. Let's check: print coords here and run the code again. Let's add a couple of newlines just to see where one result ends and the next begins. Okay, so "status type, for rent"; that doesn't seem to be what we need. Let's try the very first element. Okay, why is it so long? Hold on a sec. Okay, I
would now try to make it a little more precise. I'll specify detailUrl followed by the colon and the double quotes, so let me copy this. Close the single quotes, plus items referenced by url, and then we need to enclose this within double quotes again. Let's print our splitter; let's call it splitter, to be a bit more precise. Save, and try one more time. Okay: detailUrl and the rest. I think this definitely should work now. So we'll be splitting by our splitter, and let's have a look at the coordinates again.

Just to explain what I'm trying to achieve here, guys: I want to split this entire string by detailUrl followed by the specific URL, and then I want to see the string that starts from this latLong and runs to the very end. Then we'll be able to extract this part of the string. That's what I'm trying to achieve, but for some reason it doesn't work yet.

Let's try one more time. I'm not sure about the indexing, to be honest; maybe I should just get rid of the indexes. It's a really long response, and I'm not doing this in the best possible way. For some reason it gives us this queryState mapBounds, which is not really what we need. I'm trying to see where exactly it goes. Okay, so it gives us the very first part; that was because we used index 0. Let's try index -1, the very last one; -1 is the last element in Python list indexing. And now it's again not that great: it gives us the status type "for rent" and other stuff. It's really strange, to be honest. Let me check that as well. Okay, it gives us this, while we were
looking for the latitude. Okay, so here is the detailUrl... hold on. Okay, I'm sorry, guys; it finally seems I've achieved the correct result. First, I added this little break statement to print the data for only the first element within the cards, just the very first property. And yes, the -1 index was actually correct, and here are our coords. I've also added this extra string to the splitter: if you have a look, right after the link itself comes latLong, so this makes it easier to extract the latitude and longitude. Let me run this one more time to show you how it works: run it, scroll up, and the first line is exactly the latitude and longitude we need. It takes quite a bit of time, but still.

Now we need to extract this part of the string. What are we supposed to use as the delimiter, as the splitter, this time? We can make use of the closing curly brace followed by the comma. A comma alone wouldn't do the trick, but this should. So let's say coords equals coords.split (we could probably make this a one-liner, but let's keep it separate for the moment), and we want to split by the closing curly brace followed by the comma. This time I believe we want the very first element. Let's hold our breath and run this one more time. Great, now we have our longitude and latitude. In order to be able to parse this value using the json module, we need to add an extra closing curly brace, so here we can simply say
plus a closing curly brace, like this. Great. Now we can say json.loads, passing our coords, so it should now be treated as a JSON object. Now we can add these coords to our items: items referenced by coordinates would be equal to this, and even better, we put json.loads right here and just say coordinates there. That's kind of it. Now we can print the result; we probably don't need our break statement anymore. I hope to see the coordinates everywhere.

Okay, so we have the coordinates for most of the properties, but here it fails. That's why I considered typing a try here: this line basically fails. So let's simply say try and except, and in the except we'll say the coordinates are "not available". I'd also like to print this value to see what it's actually extracting there; I'm wondering why it isn't JSON serializable. Let's add a couple of newlines to see it clearly. Oh, I'm sorry, I executed the wrong command by accident; cancel.

So here: coordinates not available, error. The problem is that it doesn't have the particular coordinates but just some sort of west, east, south, north; I'm not sure what this stands for. You could definitely extract these things as well, the mapBounds. I'm not sure whether that's really a good idea, but maybe let's try. So how are we supposed
to parse this kind of thing? We can split this by... maybe this; nope, not a good idea. I just want to understand whether this pattern is repeated regularly or not, so let me quickly check: copy, paste, and yes, it seems to be there all the way. So we can split by this. Copy it, and here we'll call this mapBounds... well, this is the splitter, actually. It's probably more correct to call it a delimiter; I'm not sure, it doesn't matter. Obviously we need to enclose this within single quotes.

Here we're working with the coordinates, right? Let's make another variable: map_bounds would be equal to coords.split by this splitter, and, I'm not sure, probably the -1 index again. Let's print map_bounds here, with a label saying "map bounds", and run this one more time. Okay, map_bounds seems to be pretty good; this is already JSON serializable. So here we can again say json.loads on it, and this time we don't need the items coordinates to equal "not available"; instead we can make them equal to map_bounds.

I hold my breath, and I hope to see exactly this difference: either the particular coordinates or just the map bounds. Let's have a look. Here we have the coordinates with west, north, south and all that, and here we have the latitude and longitude. This is pretty nice, to my personal understanding. It's really just the map bounds; I didn't specify that it's not the latitude and longitude but the map bounds, but these are still coordinates, so I consider this to be good enough, at least
for this kind of tutorial. So, unlimited sky zero zero, if you're still watching this, please feel free to change this logic and the names of the variables. I just want to show the gist of how to proceed with extracting data out of script tags.

It seems we did everything we needed to do, so it's probably time to go back to our usual logic of crawling all the pages, all four pages in our case. So this is the Zillow rent file; let me delete this, and now I'll try to crawl all the pages again and see what happens, whether we get the proper coordinates. Okay, it seems something goes wrong again. Let's see what's wrong in the scraper. The problem is within this line, so let's enclose it in a try/except again. We probably don't need the response anymore. I could just say except and pass there, leaving the "not available" value, but I want to have a look first. So this line results in an error: try, and in the except I want to print these coordinates, to make it clear exactly why it gives us an error. Probably it's the string format, or maybe there's a None value or something like that. We'll probably find that out; at least I hope so.

So, crawling again, and somewhere here it should give us an error. Okay, here is the error; the others were extracted successfully. At some point it couldn't do much; it probably didn't scrape anything, as there is no information regarding the coordinates themselves here. I can simply handle this case in the except by typing, oh my god,
where is this? Okay, so here we'll simply copy and paste, and we just say that the coordinates are not available. Let's delete the entire output file again, and I'll run this one more time. Now I hope it's going to be fine: either the coordinates or the map bounds, which is pretty nice. Okay, and here the coordinates are not available, which I believe is just fine: one value is missing.

Well, guys, I think this is it for this video. I hope you learned something interesting about how to extract data out of script tags. Unlimited sky zero zero, again I want to say thank you, man, for providing this really nice and interesting topic for the upcoming videos on this Code Monkey King channel. That's it from my side, guys. I wish you to keep calm in this coronavirus time and not go crazy like the world does: stay at home, do your job, learn things, try to form your own understanding of the situation out there, and basically do your job and live your life, and everything's going to be just fine. So until next time, take care.
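
The steps walked through in the captions (split the script text by the property URL, cut at the first closing brace followed by a comma, restore the brace, call json.loads, fall back to the map bounds, and otherwise report "not available") can be sketched roughly as follows. This is a minimal sketch and not the code from the video: the function name extract_coordinates, the field names "detailUrl", "latLong" and "mapBounds", and the SAMPLE_SCRIPT text are all assumptions inferred from what is said in the captions, and the sample data is made up for illustration. The sketch works on the script tag's text as a plain string, so it uses only the standard library.

```python
import json

NOT_AVAILABLE = "not available"

# Made-up stand-in for the text of the JSON script tag (hypothetical layout).
SAMPLE_SCRIPT = (
    '{"queryState":{"mapBounds":{"west":-84.5,"east":-84.3,'
    '"south":33.6,"north":33.8},"isMapVisible":true},'
    '"listResults":['
    '{"detailUrl":"https://example.com/a",'
    '"latLong":{"latitude":33.7,"longitude":-84.4},"price":"$1"},'
    '{"detailUrl":"https://example.com/c","price":"$3"}'
    ']}'
)


def extract_coordinates(script_text, detail_url):
    """Return coordinates for one property from the script tag's raw text.

    The key names used below are assumptions, not a documented format.
    """
    # Splitter: the property's URL followed by the key that opens its coordinates.
    splitter = '"detailUrl":"' + detail_url + '","latLong":'

    # Text after the last occurrence of the splitter. If the splitter is
    # absent, split() returns the whole text as its only element.
    tail = script_text.split(splitter)[-1]

    # Cut at the first "}," and put back the closing brace the split removed.
    coords = tail.split("},")[0] + "}"

    try:
        return json.loads(coords)
    except json.JSONDecodeError:
        pass

    # Fallback: keep whatever follows a "mapBounds" key in the same fragment.
    map_bounds = coords.split('"mapBounds":')[-1]
    try:
        return json.loads(map_bounds)
    except json.JSONDecodeError:
        return NOT_AVAILABLE


if __name__ == "__main__":
    for url in ("https://example.com/a", "https://example.com/c"):
        print(url, extract_coordinates(SAMPLE_SCRIPT, url))
```

As in the captions, the fallback does not return the property's own position: when the URL is not followed by a latLong object, the fragment that gets parsed comes from the start of the script, so the result is the map bounds of the whole search. String splitting like this depends on the exact key order and spacing of the page; loading the whole script text with json.loads and walking the resulting dictionary would likely be more robust, but the full layout of that JSON is not shown in the video.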
Info
Channel: Code Monkey King
Views: 958
Rating: 5 out of 5
Id: 14lsy2lKVLs
Length: 35min 13sec (2113 seconds)
Published: Mon Mar 30 2020